This project contains a Python app for data analysis using PySpark SQL.
- Uninstall all Python distributions older than version 3 (keep only the latest).
- Install Java (installer included). On Windows, set the JAVA_HOME variable to the installation folder itself, not its /bin subfolder.
- A link to the Java installer for Linux and macOS is included.
- Clone the repo with git clone.
- Go to the PSSQL-auto directory in a terminal/CMD.
- On Windows, type
prereq.py
in CMD and press Enter.
- On Linux, run
chmod +x ./prereq.py
in the terminal, and then run
./prereq.py
- On macOS, run
sudo chmod +x ./prereq.py
and then
./prereq.py
P.S. You no longer need to run more than two different commands on Windows and Linux.
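The prerequisites above (Python 3 and a JAVA_HOME that points to the installation folder rather than its bin subfolder) can be checked with a short script before running anything else. This is only a minimal sketch and not part of the repository; the function name `check_java_home` is hypothetical, and it only inspects the variable's value, not whether Java actually works.

```python
import os
import re
import sys


def check_java_home(java_home):
    """Return a list of problems found in a JAVA_HOME value (empty if none)."""
    if not java_home:
        return ["JAVA_HOME is not set"]
    problems = []
    # Split on both / and \ so Windows-style paths are handled on any OS.
    last_part = re.split(r"[\\/]", java_home.rstrip("\\/"))[-1]
    if last_part.lower() == "bin":
        problems.append(
            "JAVA_HOME should point to the installation folder, not its bin subfolder"
        )
    return problems


if __name__ == "__main__":
    if sys.version_info < (3,):
        print("Python 3 is required")
    for problem in check_java_home(os.environ.get("JAVA_HOME")):
        print(problem)
```

If the script prints nothing, both checks passed.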
- On Windows, go to the cloned directory (PSSQL-auto) and run
python pssqla.py
in CMD.
- On Linux, go to the cloned directory and run the following commands in the terminal:
sudo chmod +x ./pssqla.py
sudo ./pssqla.py
- On macOS, run
sudo chmod +x ./pssqla.py
and then
./pssqla.py
- As you may notice, some files and folders are not accessible from the main folder.
- To delete them, use clean.py (on Linux and macOS, grant execute permission with chmod; on Windows, type python before the script path).
- To copy or move them, use administrator privileges (on Linux and macOS, use sudo).
- You can now add custom PySpark SQL queries, then save and export the resulting DataFrames ;)
- You can now create PySpark pipelines (similar to an ML model).
- Input-Columns and Predict inputs must have a numeric dtype; columns with a str, object, or none dtype will not be accepted.
- You can give multiple Input-Columns (enter how many you want at the start), while the Predict column takes a single column.
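To see ahead of time which columns would qualify as Input-Columns or as the Predict column, the numeric-dtype rule above can be approximated in a few lines. This is a sketch under assumptions, not the app's own validation code: `numeric_columns` and `NUMERIC_TYPES` are hypothetical names, and the input is assumed to have the shape of PySpark's `DataFrame.dtypes`, a list of (column name, type string) pairs.

```python
# Type strings PySpark reports for its integer and floating-point types.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def is_numeric(dtype):
    """True for numeric Spark type strings, including decimal(p,s)."""
    return dtype in NUMERIC_TYPES or dtype.startswith("decimal")


def numeric_columns(dtypes):
    """Keep only the column names whose type string is numeric."""
    return [name for name, dtype in dtypes if is_numeric(dtype)]


if __name__ == "__main__":
    # Hypothetical schema, written by hand instead of read from a DataFrame.
    sample = [("age", "int"), ("name", "string"), ("salary", "double")]
    print(numeric_columns(sample))
```

With a real DataFrame `df`, the call would be `numeric_columns(df.dtypes)`. Exact matching is used for the simple types so that a non-numeric type such as `interval` is not mistaken for `int`.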