
A streaming application that processes the Data Expo 2009 Airline On-Time Performance data for the years 1987 to 2008 (approximately 120 million records) as streams, applies queries to those streams, and displays the resulting visualizations for analysis. Built with Spark Structured Streaming and Flask.


Jaini8/Airline_On_Time_Streaming_Data_Processing_And_Analysis

CS543 Massive Data Storage and Retrieval and Deep Learning (Project 1)

Airline_On_Time_Streaming_Data_Processing_And_Analysis

Team members (Group 6): Arwa El-Hawwat, Jaini Patel, Rahul Dev Ellezhuthil.

About:

In this project, we designed and developed a stream processing application that processes the airline data from the 2009 Data Expo (Airline On-Time Performance) in a streaming fashion and applies database queries to the streams to answer several questions. The dataset contains about 120M records (roughly 10 GB) and comes with 4 supporting CSV files covering airports, carriers, planes, and dataset metadata. The tools and technologies used in this project are Spark Structured Streaming, Flask, HTML, amCharts (JavaScript), and CSS.
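Spark Structured Streaming treats incoming data as an unbounded input table and updates query results incrementally with each micro-batch. The plain-Python sketch below imitates that idea without Spark, as an illustration only: it reads CSV rows in fixed-size batches and carries a running per-airport flight count across batches. The batch size, the sample rows, and the column names (assumed from the public Data Expo schema: `Origin`, `Dest`, `UniqueCarrier`, `ArrDelay`) are assumptions; the project's actual Spark code may be structured quite differently.

```python
import csv
import io
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator


def micro_batches(rows: Iterable, batch_size: int) -> Iterator[list]:
    """Yield successive lists of at most batch_size rows, like stream triggers."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_stream(csv_text: str, batch_size: int = 2) -> list:
    """Count flights per origin airport, recording running totals after each batch."""
    state = Counter()  # aggregate state carried across batches
    snapshots = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for batch in micro_batches(reader, batch_size):
        state.update(row["Origin"] for row in batch)
        # Full result table after this batch, similar to Spark's "complete" output mode.
        snapshots.append(dict(state))
    return snapshots


# Hypothetical sample input; the real data arrives as yearly CSV files.
SAMPLE = """Year,Month,UniqueCarrier,Origin,Dest,ArrDelay
2008,1,WN,LAX,SFO,5
2008,1,AA,ORD,LAX,-3
2008,2,WN,LAX,PHX,20
"""

if __name__ == "__main__":
    for i, snap in enumerate(run_stream(SAMPLE), start=1):
        print(f"after batch {i}: {snap}")
```

With a batch size of 2, the first snapshot counts LAX and ORD once each, and the second adds the remaining LAX flight to the same running state instead of recomputing from scratch.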

Goal:

Our main objective for this project is to process the streaming flight data with Spark Structured Streaming and to answer the following questions using database queries on the data streams and visualizations of their results: Which airline carrier is the most reliable in terms of punctuality? What were historically the worst months to fly? What are the busiest airports and routes in the United States? We aim to organize and present our findings through a simple processing pipeline and web application.
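Each of these questions maps onto a grouped aggregation (in PySpark, a `groupBy(...).agg(...)` over the stream). The sketch below shows comparable aggregation logic in plain Python on a few in-memory rows. Counting a flight as on time when it arrives at most 15 minutes late, skipping rows whose `ArrDelay` is `NA`, ranking months by mean arrival delay, and the sample rows themselves are all assumptions for this example, not necessarily the metrics the project used.

```python
from collections import Counter, defaultdict

# Assumption: a flight counts as "on time" if it arrives at most 15 minutes late.
ON_TIME_THRESHOLD = 15


def parse_delay(value):
    """Return ArrDelay in minutes, or None for missing values ('NA' or empty)."""
    return None if value in ("", "NA") else int(value)


def carrier_on_time_rate(rows):
    """Rank carriers by the share of flights (with a known delay) that arrived on time."""
    flights, on_time = Counter(), Counter()
    for r in rows:
        delay = parse_delay(r["ArrDelay"])
        if delay is None:
            continue
        flights[r["UniqueCarrier"]] += 1
        if delay <= ON_TIME_THRESHOLD:
            on_time[r["UniqueCarrier"]] += 1
    rates = {c: on_time[c] / flights[c] for c in flights}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


def worst_months(rows):
    """Rank months by mean arrival delay, worst first."""
    total, count = defaultdict(int), Counter()
    for r in rows:
        delay = parse_delay(r["ArrDelay"])
        if delay is not None:
            month = int(r["Month"])
            total[month] += delay
            count[month] += 1
    means = {m: total[m] / count[m] for m in count}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


def busiest_airports(rows, n=3):
    """Count departures plus arrivals per airport."""
    traffic = Counter()
    for r in rows:
        traffic[r["Origin"]] += 1
        traffic[r["Dest"]] += 1
    return traffic.most_common(n)


def busiest_routes(rows, n=3):
    """Count flights per directed (Origin, Dest) pair."""
    return Counter((r["Origin"], r["Dest"]) for r in rows).most_common(n)


# Hypothetical sample rows using column names from the Data Expo schema.
ROWS = [
    {"Month": "1", "UniqueCarrier": "WN", "Origin": "LAX", "Dest": "SFO", "ArrDelay": "5"},
    {"Month": "1", "UniqueCarrier": "AA", "Origin": "ORD", "Dest": "LAX", "ArrDelay": "40"},
    {"Month": "2", "UniqueCarrier": "WN", "Origin": "LAX", "Dest": "SFO", "ArrDelay": "20"},
    {"Month": "2", "UniqueCarrier": "AA", "Origin": "LAX", "Dest": "SFO", "ArrDelay": "NA"},
]

if __name__ == "__main__":
    print(carrier_on_time_rate(ROWS))  # [('WN', 0.5), ('AA', 0.0)]
    print(worst_months(ROWS))          # [(1, 22.5), (2, 20.0)]
    print(busiest_airports(ROWS))      # [('LAX', 4), ('SFO', 3), ('ORD', 1)]
    print(busiest_routes(ROWS))        # [(('LAX', 'SFO'), 3), (('ORD', 'LAX'), 1)]
```

Because every metric here reduces to sums and counts keyed by carrier, month, airport, or route, the same state can be updated batch by batch rather than recomputed over all 120M records.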

The expected users of this dashboard are airline staff, route planners, pilots, and US domestic travellers.

Steps to run:

  1. git clone https://github.com/Jaini8/Airline_On_Time_Streaming_Data_Processing_And_Analysis.git
  2. cd Airline_On_Time_Streaming_Data_Processing_And_Analysis/flask_app
  3. python flask_app.py

Running instance: http://ilab1.cs.rutgers.edu:9996/

Video of the working application: Airline On-Time Streaming Data Processing And Analysis
