This project analyzes the On-Time Performance dataset. The report can be found here.
The On-Time Performance data consists of airport and airline data across the USA for the past 30 years. The dataset is roughly 60 GB, split across 330+ CSV files, each with 100+ fields, only a few of which are relevant. The dataset therefore requires significant cleaning before the relevant information can be extracted.
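The kind of cleaning involved can be sketched as follows. This is a minimal illustration only: the column positions and field names here are hypothetical, not the real On-Time Performance schema, and the actual project does this at scale in Hadoop/Spark rather than in plain Java.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of per-record cleaning. Column indices are illustrative,
// NOT the real On-Time Performance schema.
public class CleanSketch {
    // Hypothetical positions of the few relevant fields (e.g. year, carrier, origin)
    private static final int[] RELEVANT = {0, 1, 2};

    // Keep only the relevant fields of a raw CSV record; return null for malformed rows
    public static List<String> cleanRow(String line) {
        String[] fields = line.split(",", -1);
        List<String> out = new ArrayList<>();
        for (int i : RELEVANT) {
            if (i >= fields.length) return null; // row too short: drop it
            out.add(fields[i].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cleanRow("2001,AA,JFK,extra,fields")); // [2001, AA, JFK]
        System.out.println(cleanRow("short")); // null
    }
}
```

The same select-a-few-columns-and-drop-malformed-rows step is what makes the 100+ raw fields per record manageable downstream.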
The project examines seasonal trends across airports and airline delays. Delays over the past 30 years were visualized for the five airports and airlines with the largest delays, and the insights captured matched the world events that led to those delays.
The project functionality was replicated in both Hadoop and Spark on AWS.
Pre-requisites:
- Java (v1.8+)
- Scala (v2.11.x)
- Spark (v2.2.0, compiled with Scala 2.11.x and Hadoop 2.7)
- Hadoop (v2.8.1)
- A `.Renviron` file must exist to generate the report in R
This project can produce its results using either Spark or Hadoop. To run the Hadoop version, use `make build` and `make run`; for the Spark version, use `make sbuild` and `make srun`.

The Hadoop version may require setup on your local machine. You can use `make setup` to push the whole input directory into HDFS. If you already have the input directory set up locally or in HDFS, you can run the whole program directly with `make all`, which builds the JAR, cleans the HDFS output, and runs the JAR with Hadoop.
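The `setup` and `run` targets might look roughly like the sketch below. This is hypothetical: the JAR name, main class, and HDFS paths are placeholders, and the real recipes live in the project's Makefile.

```make
# Hypothetical sketch only -- see the project's actual Makefile for the real recipes
setup:                        # push the whole input directory into HDFS
	hdfs dfs -mkdir -p input
	hdfs dfs -put input .

run:                          # clean HDFS output, then run the JAR with hadoop
	hdfs dfs -rm -r -f output
	hadoop jar project.jar Main input output
```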
Before running the project:
- Update the variables `SCALA_HOME`, `SPARK_HOME`, `HADOOP_HOME`, `MY_CLASSPATH`, and `SC_CLASSPATH` in the Makefile as per your configuration.
- Copy the input files to be processed into an `input` folder at the same level as the `src` folder.
- For both the Hadoop and Spark versions, the output will be located in `output`.
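For reference, the variable block in the Makefile might look like the following. All paths here are example values for a typical local install, and the two classpath variables are guesses at intent (Hadoop and Scala library jars); adjust everything to your own installation.

```make
# Example values only -- adjust to your installation
SCALA_HOME   = /usr/local/scala-2.11.12
SPARK_HOME   = /usr/local/spark-2.2.0
HADOOP_HOME  = /usr/local/hadoop-2.8.1
# Hypothetical classpath contents; check the project's Makefile for what these must hold
MY_CLASSPATH = $(HADOOP_HOME)/share/hadoop/common/*
SC_CLASSPATH = $(SCALA_HOME)/lib/*
```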
More details can be found in report.pdf and diff.pdf.