Skip to content

Exploratory Data Analysis of Airline crashes using ETL method,data storage using PostgresSQL and displaying data in a clean interactive dashboard.

Notifications You must be signed in to change notification settings

RynnAethelWulf/Dashboard-Airplane-Crashes-EDA

Repository files navigation

Project logo

Exploratory Data Analysis of Airline crashes using ETL method,data storage using PostgresSQL, displaying data in a clean interactive dashboard and deploypment using amazon ec2 cloud service.

Status GitHub Issues License


The project kicked-off with an intention to analyze the available data of airline crashes from 1908 till 2019.

📝 Table of Contents

🧐 Team Members

  1. Divya
  2. Warren
  3. Kelly
  4. Bimal

📒 Objective

Exploratory data analysis for plane crashes between 1908 and 2019 investigating on underlying reasons, location of events and finding trends if any.

Hypothesis

  1. Is there any link between the different time periods and the fatality rate?

  2. What has been the overall trend for air travel fatalities by country?

  3. What are the most dangerous aircraft operators?

  4. Which Aircraft types were most represented in accidents?

  5. Which countries had the most accidents?

Extraction

Data was extracted from a Kaggle csv and contained data from 1908 to 2020.

Transforming

The following cleaning was done using Jupyter notebooks:

  • Dropping columns for readability.
  • Formatting of dates using timestamp
  • Calculation and addition of a fatality rate column
  • Binning by 20-year periods with a corresponding column for sorting.
  • String splitting to extract origin airports for origin column.
  • Processing using GeoPY module for geolocation data of countries.
  • Dropping of null values as well as fixing location data, as some entries were null due to crashes in international waters.

Loading

  • The data was loaded into Postgres using SQL alchemy
  • API Endpoints were created using flask.

Visualization techniques

Dashboard Gif

  1. Python Flask-powered API -Endpoints created with server-side rendering.
  2. HTML , Plot.ly ,Geopy ,Apex.js and D3 used in combination for creating visuals including graphs and charts.

🏁 Getting Started

Data Api Endpoints

Admin
API endpoints generated by Flask Environment

Admin
Crashes reported has increased after the period of 1940.
Notably between 1970-1990

  • Highest fatality reported in 1948
  • Total crashes -77
  • Total fatalities -1264

Search Bar

Admin

Live Serach bar to display data analysis between 1908 to 2019

Conclusion

  1. An Increase in the number of fatalities recorded after 1945 observed. This could be the increased no. of flights and high mobility after WW2.

  2. The countries with high frequency of flights and busier routes like US; rough terrain and weather conditions(Russia) and poor flight safety standards or regulatory regime(Colombia) had largest fatalities reported.

  3. Operators - Aeroflot and Air France came out as the notorious ones for causing maximum fatalities in the crashes. Further investigations required to ascertain the reasons for the above.

  4. Aircraft types caused most fatalities- Douglas Dc-3; followed by De Havilland Canada

  5. The following years are noted for the largest no. of fatalities reported:-
    1972 – 5166 fatalities; 62 incidents

  • Avg fatality rate per incident = 83.32% 1985 – 4860 fatalities 1996 – 4376 fatalities 2000 – 2786 fatalities; 57 incidents
  • Avg fatality rate per incident = 48.87% 2014 - 2248 fatalities

Deployment

🎉 Acknowledgements

  • UWA Data Science
  • Kaggle
  • Team Members

About

Exploratory Data Analysis of Airline crashes using ETL method,data storage using PostgresSQL and displaying data in a clean interactive dashboard.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published