
A conversion of Revature Project 1 to Python from Scala


HamingnottT/proj1-python


proj1-python

Revature Big Data Project 1 - Python version

Project Description

Project Requirements: Translate your project 0 or project 1 from Scala to Python.

This version of my project 1 is a data analytics application translated from Scala to Python. It demonstrates how a program in either language, using Apache Spark and Hive, can serve as a useful tool for business analysis.

The subject of this application demo is an up-and-coming coffee shop with 9 branches. My goal was to create a user-friendly command-line interface and experiment with Spark SQL and DataFrame analysis to assist with internal research.
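As a rough illustration of the two analysis styles mentioned above, the sketch below answers the same question once with Spark SQL and once with the DataFrame API. It is not taken from the project's source: the table name `sales`, the columns `branch` and `quantity`, the sample rows, and all function names are assumptions made for this example.

```python
def build_session(app_name="coffee-shop-demo"):
    """Create a Hive-enabled SparkSession (app name is a placeholder)."""
    # Imported lazily so this module still loads where PySpark is absent.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()
        .getOrCreate()
    )


def sales_by_branch_sql(table="sales"):
    """Return a Spark SQL query totalling units sold per branch.

    The table and column names are hypothetical.
    """
    return (
        "SELECT branch, SUM(quantity) AS total_sold "
        f"FROM {table} GROUP BY branch ORDER BY total_sold DESC"
    )


def sales_by_branch_df(df):
    """Same aggregation expressed with the DataFrame API."""
    from pyspark.sql import functions as F

    return (
        df.groupBy("branch")
        .agg(F.sum("quantity").alias("total_sold"))
        .orderBy(F.desc("total_sold"))
    )


def demo():
    """Run both variants on a tiny in-memory data set (made-up rows)."""
    spark = build_session()
    rows = [("Branch1", 3), ("Branch2", 5), ("Branch1", 4)]
    df = spark.createDataFrame(rows, ["branch", "quantity"])
    df.createOrReplaceTempView("sales")
    spark.sql(sales_by_branch_sql("sales")).show()
    sales_by_branch_df(df).show()
    spark.stop()
```

Calling `demo()` requires a working PySpark and Java installation. The SQL form suits ad hoc questions, while the DataFrame form is easier to compose inside Python functions.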

Technologies Used

-VS Code v. 1.65.2

-Python v. 3.10.3

-Apache Spark v. 3.1.2

-Apache Hive v. 2.3.9

-Microsoft Excel v. 2103

Features

-Interactive CLI

-Functional Spark Warehouse & Hive Metastore
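One way an interactive CLI of this kind could be organized is a menu that maps a typed option to a handler function. The sketch below is only an illustration under that assumption: the scenario labels, handler names, and return strings are hypothetical placeholders, not the project's actual scenarios.

```python
from typing import Callable, Dict, Tuple


def scenario_1() -> str:
    # Placeholder: a real handler would run a Spark query here.
    return "Scenario 1 would run its query here."


def scenario_2() -> str:
    # Placeholder: a real handler would run a Spark query here.
    return "Scenario 2 would run its query here."


# Maps the key the user types to a (label, handler) pair.
MENU: Dict[str, Tuple[str, Callable[[], str]]] = {
    "1": ("Scenario 1", scenario_1),
    "2": ("Scenario 2", scenario_2),
}


def render_menu() -> str:
    """Build the menu text shown before each prompt."""
    lines = [f"{key}) {label}" for key, (label, _) in MENU.items()]
    lines.append("q) Quit")
    return "\n".join(lines)


def handle_choice(choice: str) -> str:
    """Dispatch one menu choice and return the text to display."""
    key = choice.strip().lower()
    if key == "q":
        return "quit"
    entry = MENU.get(key)
    if entry is None:
        return f"Unknown option: {choice!r}"
    _, handler = entry
    return handler()


def run_cli(read=input, write=print) -> None:
    """Loop until the user quits; read/write are injectable for testing."""
    while True:
        write(render_menu())
        result = handle_choice(read("Select an option: "))
        if result == "quit":
            break
        write(result)
```

Calling `run_cli()` starts the loop on the terminal. Keeping the dispatch in `handle_choice` separate from the input loop means each scenario can be exercised without typing into a prompt.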

Getting Started

The repo can be cloned using the GitHub CLI, the GitHub Desktop app, or VS Code's GitHub integration. The source code should work in any IDE, as it has no IDE-specific dependencies.

Beyond Python and a package manager such as pip or Conda, the only requirement is PySpark, which must be installed for the application to run.

To install PySpark:

pip install pyspark

pip install findspark

findspark is optional. It locates an existing Spark installation and makes PySpark importable when it is not already on Python's import path.
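The optional role of findspark can be sketched as a small startup check: use PySpark directly if it already imports, and fall back to `findspark.init()` only when it does not. The helper names below are assumptions for illustration and are not part of the project.

```python
import importlib
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def ensure_pyspark_importable() -> bool:
    """Try to make PySpark importable; return whether it is afterwards."""
    if module_available("pyspark"):
        return True  # e.g. installed with pip, nothing else to do
    if not module_available("findspark"):
        return False  # no fallback available
    import findspark

    try:
        # Looks for a Spark installation (for example via SPARK_HOME)
        # and adds its Python libraries to sys.path.
        findspark.init()
    except Exception:
        return False  # findspark could not locate Spark
    importlib.invalidate_caches()
    return module_available("pyspark")
```

A script could call `ensure_pyspark_importable()` before importing `pyspark` and print an installation hint when it returns `False`.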

A requirements.txt and a batch file are included for easy pip installation on Windows.

Usage

This project should be ready to plug and play. As this is an ongoing project, there are a few known bugs, such as in scenario 6. More features will be added later for better versatility.

Contributors

Jacob Nottingham

License

Unlicensed
