Revature Big Data Project 1 - Python version
Project Requirements: Translate your project 0 or project 1 from Scala to Python.
This version of my project 1 is a data analytics application translated from Scala to Python. It demonstrates how a program in either language, using Apache Spark and Hive, can serve as a useful tool for business analysis.
The subject of this application demo is an up-and-coming coffee shop with 9 branches. My goal was to create a user-friendly command-line interface and experiment with Spark SQL and DataFrame analysis to assist with internal research.
-VS Code v. 1.65.2
-Python v. 3.10.3
-Apache Spark v. 3.1.2
-Apache Hive v. 2.3.9
-Microsoft Excel v. 2103
-Interactive CLI
-Functional Spark Warehouse & Hive Metastore
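As a rough illustration of how the two features above might fit together, here is a minimal sketch of a numbered menu loop alongside a Hive-enabled Spark session. The names WAREHOUSE_DIR, APP_NAME, build_session, and run_menu are hypothetical and are not taken from this project's source; only the PySpark builder calls are real API.

```python
from typing import Callable, Dict

# Hypothetical settings; the project's real paths and names may differ.
WAREHOUSE_DIR = "spark-warehouse"
APP_NAME = "coffee-shop-analytics"


def build_session(warehouse_dir: str = WAREHOUSE_DIR):
    """Create a Hive-enabled SparkSession (requires PySpark to be installed)."""
    # Imported lazily so the menu logic below loads even without Spark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(APP_NAME)
        .config("spark.sql.warehouse.dir", warehouse_dir)
        .enableHiveSupport()
        .getOrCreate()
    )


def run_menu(actions: Dict[str, Callable[[], None]], read=input, write=print) -> int:
    """Show a numbered menu until the user enters 'q'; return how many actions ran."""
    labels = list(actions)
    ran = 0
    while True:
        for number, label in enumerate(labels, start=1):
            write(f"{number}. {label}")
        choice = read("Select a scenario (q to quit): ").strip().lower()
        if choice == "q":
            return ran
        try:
            index = int(choice) - 1
        except ValueError:
            index = -1  # anything non-numeric is treated as invalid
        if 0 <= index < len(labels):
            actions[labels[index]]()
            ran += 1
        else:
            write("Invalid choice, try again.")
```

In a real run, each menu entry would wrap a query against the session returned by build_session(), for example a function that calls spark.sql(...) on a Hive table and then .show(). Passing read and write as parameters is only a design choice that lets the loop be exercised without a terminal.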
The repo can be cloned easily using the GitHub CLI, the GitHub Desktop app, or VS Code's GitHub integration. The source code should work well in any IDE, as it has no IDE-specific dependencies.
Beyond Python and a package manager such as pip or Conda, this application should not need anything else. PySpark must be installed for it to work, however.
To install PySpark:
pip install pyspark
pip install findspark
findspark is optional. It helps Python locate the Spark installation, as PySpark is not on the path by default.
A requirements.txt and a batch file are included for easy pip installation for Windows users.
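Because findspark is optional, one way to use it is behind a guard, so the application still starts when the package is absent. The sketch below is an assumption about how that could look, not this project's actual startup code; init_spark_path is a hypothetical helper name.

```python
def init_spark_path() -> bool:
    """Try to make a local Spark installation importable; return True on success."""
    try:
        import findspark  # optional dependency
    except ImportError:
        return False  # findspark is not installed
    try:
        # Looks at SPARK_HOME and common install locations, then updates sys.path.
        findspark.init()
    except Exception:
        # findspark raises if it cannot locate a usable Spark installation.
        return False
    return True


found = init_spark_path()
print("findspark located Spark" if found else "findspark not used; relying on an importable pyspark")
```

Call this once before the first import of pyspark. If it returns False, the import only works when PySpark is already importable, for example after pip install pyspark.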
This project should be ready to plug and play. As it is an ongoing project, there are a few known bugs, such as in scenario 6. More features will be added later for better versatility.
Jacob Nottingham
Unlicensed