# Basic Information
Project Title: Investigating Trends and Risk Factors in Aircraft Incidents<br>
Team Members: Asael Horne, Soumya Chava, Monica Regulagadda<br>
E-mail Addresses: asael.horne@utah.edu, u1538585@utah.edu, u1504835@utah.edu<br>
UIDs: u1165839, u1538585, u1504835<br>
Course: Introduction to Data Science<br>
Institution: University of Utah<br>
Date: March 4, 2025<br>

# Background and Motivation
Aviation incidents remain a crucial area of study, as understanding their causes can lead to significant improvements in safety protocols, regulatory policies, and technology. Aircraft accidents and malfunctions not only pose risks to human life but also have economic and operational consequences. By analyzing historical incident data, we can identify patterns associated with incidents and develop insights that may help reduce future occurrences.

Our interest in this topic stems from both academic curiosity and the opportunity to apply our data science skills. Through this project, we aim to provide data-driven insights into trends in aviation incidents and to identify key factors contributing to their occurrence.

# Project Objectives
This project seeks to analyze aircraft incident data and answer the following questions:<br>
1.	How does the number of aircraft incidents vary by year or month?<br>
2.	What are the most common causes of aircraft incidents, and how frequently do they occur?<br>
3.	Are weather conditions (wind speed, visibility, precipitation) and aircraft age correlated with aircraft incidents?<br>
4.	Do more incidents occur during certain lighting conditions (e.g., daytime vs. nighttime)?<br>
5.	How does altitude affect the likelihood of incidents?<br>
6.	Which phases of flight (takeoff, cruise, landing) have the highest number of incidents?<br>
7.	Where do aircraft incidents occur most frequently, and are there geographic patterns in crash occurrences?<br>
8.	How does the amount of fuel on board affect the likelihood and severity of aircraft incidents?<br>
9.	What percentage of injuries are fatal?<br>
   
By addressing these questions, we hope to:<br>
•	Identify patterns and trends in aviation incidents over time.<br>
•	Understand the role of environmental factors, such as weather and lighting conditions, in aviation incidents.<br>
•	Evaluate the impact of operational factors like aircraft age and altitude on incident likelihood.<br>
•	Provide data-driven insights that could inform aviation safety policies and preventive measures.<br>
•	Apply data science techniques, including statistical analysis, correlation analysis, and machine learning, to extract meaningful insights from aviation data.<br>

# Data Description and Acquisition
The dataset for this study comes from the aviation safety database maintained by the National Transportation Safety Board (NTSB), a U.S. government agency. It consists of structured records with attributes such as:<br>
•	Incident Details: Date, time, location, and event type.<br>
•	Weather Conditions: Wind speed, visibility, precipitation, temperature.<br>
•	Aircraft Information: Manufacturer, model, year of manufacture, registration details.<br>
•	Flight Phases: Takeoff, cruise, landing.<br>
•	Injury Data: Severity and number of injuries or fatalities.<br>
The data is available for download from the official NTSB website. The files are Microsoft Access Database files containing more than 100,000 entries and more than 200 attributes. Because the data is a public download, no web scraping is required and no API access restrictions apply, which keeps data collection within ethical guidelines.<br>
 
# Ethical Considerations
Ethical concerns include:<br>
•	Ensuring responsible use of aviation incident data without misrepresenting findings.<br>
•	Avoiding misleading conclusions that might unfairly attribute blame to individuals, airlines, or aircraft manufacturers without sufficient evidence.<br>
•	Ensuring compliance with data privacy and ethical research guidelines.<br>
•	Recognizing that, depending on the results, students may become less likely to fly, which could reduce revenue for aviation companies.<br>

# Data Cleaning and Processing
To ensure the dataset is suitable for analysis, we will perform substantial cleaning and processing, including the following steps:<br>
•	Handling missing data: Using imputation techniques for missing values.<br>
•	Data normalization: Standardizing categorical and numerical data.<br>
•	Removing inconsistencies: Identifying and eliminating duplicate or incorrect records.<br>
The attributes that we will extract are ev_date, ev_time, latitude, longitude, mid_air, on_ground_collision, light_cond, sky_ceil_ht, wind_vel_kts, gust_kt, altimeter, wx_int_precip, fuel_on_board, acft_model, acft_make, damage, Altitude, inj_person_count, and finding_description.<br>
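
The cleaning steps above could look roughly like the following sketch. It assumes the selected attributes have already been exported from the Access files into a pandas DataFrame; the sample values are invented, the helper name `clean_events` is hypothetical, and median imputation and zero-filling are only two of several reasonable choices.

```python
import pandas as pd

# Column names follow the attribute list above; every value here is invented.
raw = pd.DataFrame({
    "ev_date": ["2020-01-15", "2020-01-15", "2021-07-04", "2021-12-30"],
    "wind_vel_kts": [10.0, 10.0, None, 25.0],
    "acft_make": [" cessna", " cessna", "PIPER ", "Boeing"],
    "inj_person_count": [0, 0, 2, None],
})


def clean_events(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicates, normalize text, parse dates, and impute gaps."""
    out = df.drop_duplicates().copy()
    # Parse dates; unparseable values become NaT instead of raising.
    out["ev_date"] = pd.to_datetime(out["ev_date"], errors="coerce")
    # Normalize a categorical column to one canonical spelling.
    out["acft_make"] = out["acft_make"].str.strip().str.upper()
    # Median imputation for a numeric weather column (one possible choice).
    out["wind_vel_kts"] = out["wind_vel_kts"].fillna(out["wind_vel_kts"].median())
    # Treat a missing injury count as zero; this is an assumption to revisit.
    out["inj_person_count"] = out["inj_person_count"].fillna(0).astype(int)
    return out.reset_index(drop=True)


cleaned = clean_events(raw)
print(cleaned)
```

Whether a missing injury count really means zero injuries depends on how the source records were filled in, so that step in particular should be checked against the database documentation before it is relied on.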

# Exploratory Analysis
To gain initial insights into the dataset, we will use:<br>
•	Time-series analysis to track trends in aviation incidents.<br>
•	Bar charts and pie charts to visualize the most common causes of incidents.<br>
•	Correlation matrices and scatter plots to analyze relationships among weather conditions, aircraft age, and incident occurrence.<br>
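
As a rough illustration of the first and third items, the sketch below counts incidents per year and per month and computes a small correlation matrix. It assumes a cleaned pandas DataFrame with a parsed `ev_date` column; the rows are invented and only two weather columns are shown.

```python
import pandas as pd

# Invented sample rows; real data would come from the cleaned NTSB extract.
events = pd.DataFrame({
    "ev_date": pd.to_datetime([
        "2019-03-01", "2019-11-20", "2020-06-05", "2020-06-18", "2020-12-25",
    ]),
    "wind_vel_kts": [5.0, 12.0, 20.0, 8.0, 30.0],
    "gust_kt": [0.0, 15.0, 28.0, 10.0, 40.0],
})

# Incidents per calendar year (the basis for a time-series plot).
per_year = events.groupby(events["ev_date"].dt.year).size()

# Incidents per calendar month, pooled across years, to expose seasonality.
per_month = events.groupby(events["ev_date"].dt.month).size()

# Pearson correlation matrix between numeric weather attributes.
corr = events[["wind_vel_kts", "gust_kt"]].corr()

print(per_year)
print(per_month)
print(corr)
```

If matplotlib is installed, `per_year.plot()` would turn the yearly counts into a line chart. Note that correlations computed only on incident records describe relationships among incidents, not incident risk relative to all flights.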

# Analysis Methodology
The project will employ:<br>
1.	Descriptive Statistics: Box plots, pie charts, histograms, and scatter plots will be used to summarize and visualize key patterns in the data.<br>
2.	Correlation and Group Comparisons: Correlation coefficients will be used to explore relationships between numeric factors (e.g., weather, aircraft age, and incident frequency), and t-tests and ANOVA tests will be used to compare means across groups.<br>
3.	Regression Models: Linear regression models built using the Ordinary Least Squares method will be used to determine the impact of aircraft age, weather, and altitude on incident likelihood.<br>
4.	Chi-squared Tests: Chi-squared tests will be used to assess associations between categorical variables such as flight phase and lighting conditions, and to handle variables that violate the basic assumptions of ANOVA and t-tests.<br>
5.	Machine Learning Models: Basic unsupervised clustering models will be used to group incidents with similar characteristics and help identify high-risk conditions in the historical data.<br>
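
To make two of the methods above concrete, here is a minimal sketch using NumPy only. The first part computes Pearson's chi-squared statistic for a contingency table of lighting condition by flight phase; the second fits a one-predictor linear model by Ordinary Least Squares. All counts and values are invented, and the helper name `chi_squared_statistic` and the variable names are hypothetical.

```python
import numpy as np

# Invented counts: rows are lighting conditions (day, night),
# columns are flight phases (takeoff, cruise, landing).
observed = np.array([[30.0, 10.0, 60.0],
                     [20.0, 10.0, 70.0]])


def chi_squared_statistic(table):
    """Return Pearson's chi-squared statistic and its degrees of freedom."""
    row_totals = table.sum(axis=1, keepdims=True)
    col_totals = table.sum(axis=0, keepdims=True)
    # Expected counts under independence of rows and columns.
    expected = row_totals @ col_totals / table.sum()
    stat = float(((table - expected) ** 2 / expected).sum())
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return stat, dof


stat, dof = chi_squared_statistic(observed)
print(f"chi-squared = {stat:.3f}, dof = {dof}")

# OLS fit of y = b0 + b1 * x; both arrays are invented placeholders
# (for example, x could be aircraft age and y a numeric outcome).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column plus predictor
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"intercept = {coef[0]:.3f}, slope = {coef[1]:.3f}")
```

The statistic is compared against a chi-squared distribution with the given degrees of freedom to decide significance; with 2 degrees of freedom the 0.05 critical value is about 5.991. A statistics library would normally supply the p-value and regression diagnostics that this sketch omits.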

# Project Schedule
| Date | Task |
| --- | --- |
| March 11, 2025 | Data acquisition and preliminary cleaning |
| March 18, 2025 | Data exploration and visualization |
| March 25, 2025 | Correlation and trend analysis |
| April 1, 2025 | Model development and hypothesis testing |
| April 8, 2025 | Refinement and documentation |
| April 15, 2025 | Final submission and presentation preparation |

