Airlines Classification using PySpark ML

Spark ML

Applying traditional machine learning techniques to large datasets drawn from diverse sources is difficult on a single machine.
Spark is a distributed, in-memory processing engine that generalizes the MapReduce programming model, which makes it well suited to big data processing of this kind.

Objective

This project focuses on classification and clustering with Spark MLlib, using airlines data.

The implementation covers three algorithms: a decision tree classifier, a random forest classifier, and K-Means clustering.

Business Overview of Airlines Industry

S3 Link for Dataset

s3://airlines123/airline/data.zip

Tech Stack

Language: Python
Package: PySpark
Services: Spark

Code Overview

File Names:
- DecisionTree.ipynb
- RandomForest.ipynb
- K_means.ipynb
Datasets:
- data.zip
- Social_Network_Ads.csv
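Because the airline data ships as data.zip, it usually has to be unpacked before Spark reads it, since Spark's built-in file readers generally do not open .zip archives. A minimal sketch using only the standard library follows; the helper name `extract_dataset` and the output directory are hypothetical, and nothing is assumed about the files inside the archive.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, out_dir):
    """Unpack zip_path into out_dir and return the paths of the extracted files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
        # Skip directory entries so only real files are returned.
        names = [name for name in archive.namelist() if not name.endswith("/")]
    return [out / name for name in names]
```

For example, `extract_dataset("data.zip", "extracted")` would return the extracted file paths, which can then be passed to a Spark reader such as `spark.read.csv`.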

Steps to Run

Command Prompt

Run a Python script with spark-submit:

<spark_path> spark-submit <file_path>

<spark_path>: Path to the bin directory of the Spark installation, from which the command is run
<file_path>: Path to the script file

Example:

C:\Users\admin\Desktop\spark\bin>spark-submit C:\Users\admin\Desktop\sparkml\DecisionTree.py
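If the same command needs to be launched from Python (for instance, to run several scripts in a row), it could be wrapped roughly as follows. This is a sketch under assumptions: the helper names `spark_submit_command` and `run_with_spark` are hypothetical, and the launcher is taken to be `spark-submit.cmd` on Windows and `spark-submit` elsewhere, which matches the files Spark ships in its bin directory.

```python
import os
import subprocess
from pathlib import Path


def spark_submit_command(spark_bin_dir, script_path, *script_args):
    """Build the argument list that runs script_path through spark-submit."""
    # Assumption: Windows installs use spark-submit.cmd, other systems the shell script.
    launcher = "spark-submit.cmd" if os.name == "nt" else "spark-submit"
    return [str(Path(spark_bin_dir) / launcher), str(script_path), *script_args]


def run_with_spark(spark_bin_dir, script_path, *script_args):
    """Run the script and raise CalledProcessError if spark-submit exits with an error."""
    command = spark_submit_command(spark_bin_dir, script_path, *script_args)
    return subprocess.run(command, check=True)
```

Calling `run_with_spark(r"C:\Users\admin\Desktop\spark\bin", r"C:\Users\admin\Desktop\sparkml\DecisionTree.py")` would then be equivalent to the command-prompt example above.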

IPython

Modular Code
- Create a virtual environment
- Install requirements: pip install -r requirements.txt
- Run code: python DecisionTree.py
- Check output for all visualizations


AjNavneet/Airlines-Classification-Clustering-PySparkML
