
1. Data Exploration
Objective: Understand the data structure, clean it, and prepare it for analysis.

Steps:

Load the datasets (flight.csv, airports.csv, airlines.csv).
Inspect the datasets to identify missing values, outliers, and anomalies.
Clean the data: Handle missing values, remove duplicates, and correct any inconsistent data.
Generate summary statistics and visualizations to understand data distribution.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load datasets
flights = pd.read_csv('flight.csv')
airports = pd.read_csv('airports.csv')
airlines = pd.read_csv('airlines.csv')

# Inspect the first few rows
print(flights.head())
print(airports.head())
print(airlines.head())

# Check for missing values
print(flights.isnull().sum())
print(airports.isnull().sum())
print(airlines.isnull().sum())

# Visualize data distribution
sns.histplot(flights['delay_minutes'], kde=True)
plt.show()
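
The cleaning step listed above (missing values, duplicates, outliers) is not covered by the cell, so here is a minimal sketch of one way to do it. It runs on a small made-up DataFrame rather than flight.csv; the values, the choice to drop rows with a missing delay, and the 1.5 * IQR outlier rule are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for flight.csv (made-up values); column names follow the cell above.
flights_demo = pd.DataFrame({
    "airline": ["AA", "AA", "DL", "DL", "UA", "UA"],
    "origin_airport": ["JFK", "JFK", "ATL", "ATL", "ORD", "ORD"],
    "delay_minutes": [5.0, 5.0, np.nan, 12.0, 600.0, 8.0],
})

# 1. Remove exact duplicate rows.
cleaned = flights_demo.drop_duplicates().copy()

# 2. Drop rows with no recorded delay (assumption: imputing is not wanted here).
cleaned = cleaned.dropna(subset=["delay_minutes"])

# 3. Flag, rather than delete, unusually long delays using the 1.5 * IQR rule.
q1, q3 = cleaned["delay_minutes"].quantile([0.25, 0.75])
upper_bound = q3 + 1.5 * (q3 - q1)
cleaned["is_outlier"] = cleaned["delay_minutes"] > upper_bound

print(cleaned)
```

Flagging outliers instead of removing them keeps long delays available for the delay analysis, where they may be the most interesting rows.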


2. Analysis of Delays
Objective: Identify patterns in flight delays.

Steps:

Calculate the average delay for each airline, airport, and day of the week.
Identify the most significant sources of delays.
Create visualizations to highlight these findings.


In [None]:
# Calculate average delay by airline
avg_delay_airline = flights.groupby('airline')['delay_minutes'].mean().reset_index()

# Visualize average delay by airline
sns.barplot(x='airline', y='delay_minutes', data=avg_delay_airline)
plt.title('Average Delay by Airline')
plt.show()
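
The cell above covers airlines only. The same groupby pattern extends to airports and days of the week, as in the sketch below. It uses made-up data, and the `flight_date` column is a hypothetical name: the real dataset may store the date differently.

```python
import pandas as pd

# Toy data (made-up values); 'flight_date' is a hypothetical column name.
flights_demo = pd.DataFrame({
    "origin_airport": ["JFK", "JFK", "ATL", "ATL"],
    "flight_date": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-08"],
    "delay_minutes": [10, 30, 20, 40],
})

# Derive the weekday name from the date column.
flights_demo["flight_date"] = pd.to_datetime(flights_demo["flight_date"])
flights_demo["day_of_week"] = flights_demo["flight_date"].dt.day_name()

# Average delay by origin airport, worst first.
avg_delay_airport = (
    flights_demo.groupby("origin_airport")["delay_minutes"]
    .mean()
    .sort_values(ascending=False)
    .reset_index()
)

# Average delay by day of the week.
avg_delay_day = (
    flights_demo.groupby("day_of_week")["delay_minutes"].mean().reset_index()
)

print(avg_delay_airport)
print(avg_delay_day)
```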


3. Geocoding and Visualization
Objective: Visualize airport locations and their associated delay statistics.

Steps:

Merge the flight data with the airport data based on airport codes.
Plot the airports on an interactive map using their latitude and longitude.
Create visualizations showing the distribution of delays across different airports.


In [None]:
import folium

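# Average delay per origin airport, so each airport gets a single marker
avg_delay_airport = (
    flights.groupby('origin_airport')['delay_minutes']
    .mean()
    .reset_index()
)

# Merge the per-airport averages with the airport metadata
flights_airports = pd.merge(avg_delay_airport, airports, left_on='origin_airport', right_on='iata_code')

# Skip airports without coordinates, which folium cannot place
flights_airports = flights_airports.dropna(subset=['latitude', 'longitude'])

# Create a map
m = folium.Map(location=[40, -95], zoom_start=4)

# Add one marker per airport, with its average delay in the popup
for _, row in flights_airports.iterrows():
    folium.Marker(
        [row['latitude'], row['longitude']],
        popup=f"{row['airport_name']} - Avg Delay: {row['delay_minutes']:.1f} min",
    ).add_to(m)

# Save the map to an HTML file
m.save('airports_map.html')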
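
An inner merge silently drops flights whose airport code has no match in airports.csv. The sketch below shows one way to surface those codes before mapping, using the `indicator` and `validate` options of pandas `merge`. The data is made up, the coordinates are approximate, and "XXX" is a deliberately invalid code.

```python
import pandas as pd

# Toy data (made-up delays, approximate coordinates); "XXX" has no airport entry.
flights_demo = pd.DataFrame({
    "origin_airport": ["JFK", "JFK", "ATL", "XXX"],
    "delay_minutes": [10, 30, 20, 5],
})
airports_demo = pd.DataFrame({
    "iata_code": ["JFK", "ATL", "ORD"],
    "latitude": [40.64, 33.64, 41.98],
    "longitude": [-73.78, -84.43, -87.90],
})

# Left merge keeps every flight; validate raises if an airport code is duplicated
# in airports_demo; indicator adds a '_merge' column describing each row's match.
merged = flights_demo.merge(
    airports_demo,
    how="left",
    left_on="origin_airport",
    right_on="iata_code",
    validate="many_to_one",
    indicator=True,
)

# Airport codes that appear in flights but not in the airport table.
unmatched = (
    merged.loc[merged["_merge"] == "left_only", "origin_airport"].unique().tolist()
)
print("Unmatched airport codes:", unmatched)
```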


4. Machine Learning Model for Delay Prediction
Objective: Predict flight delays using machine learning.

Steps:

Prepare the data: Convert categorical variables, split the data into features and labels.
Train a machine learning model (e.g., Random Forest, XGBoost).
Evaluate the model’s performance (e.g., accuracy, precision, recall).


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

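# Convert categorical variables using one-hot encoding
flights_encoded = pd.get_dummies(flights, columns=['airline', 'origin_airport', 'destination_airport'])

# Split the data into features and target
# delay_minutes is dropped too: if delay_binary was derived from it, keeping it would leak the label
X = flights_encoded.drop(columns=['delay_binary', 'delay_minutes'])  # Assuming delay_binary is your target
# Keep only numeric/boolean columns; any remaining text columns would make fit() fail
X = X.select_dtypes(include=['number', 'bool'])
y = flights_encoded['delay_binary']
# stratify=y keeps the delayed/on-time ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)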

# Train a Random Forest model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
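
The cell above assumes a `delay_binary` column already exists. If the dataset only has `delay_minutes`, the target could be derived as in this sketch. The 15-minute cutoff is an assumption (a common convention in on-time reporting), not something the dataset specifies, and the delay values are made up.

```python
import pandas as pd

# Assumption: a flight counts as delayed at 15 minutes or more.
DELAY_THRESHOLD_MINUTES = 15

# Toy delays in minutes (made-up values); negative means an early flight.
flights_demo = pd.DataFrame({"delay_minutes": [-3, 0, 14, 15, 42, 120]})

# 1 = delayed, 0 = on time.
flights_demo["delay_binary"] = (
    flights_demo["delay_minutes"] >= DELAY_THRESHOLD_MINUTES
).astype(int)

# Check class balance before training; a heavily skewed target makes
# accuracy misleading and precision/recall more informative.
class_share = flights_demo["delay_binary"].value_counts(normalize=True)
print(class_share)
```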


5. Customer Feedback Analysis
Objective: Analyze customer reviews and predict sentiment.

Steps:

Perform sentiment analysis on the review text.
Identify common positive and negative feedback themes.
Visualize the distribution of sentiments across different airlines.

In [None]:
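from textblob import TextBlob

# Load the reviews; 'reviews.csv' is a placeholder filename, adjust it to your data
reviews = pd.read_csv('reviews.csv')

# Classify sentiment from TextBlob polarity, which ranges from -1 to 1
def get_sentiment(text):
    polarity = TextBlob(str(text)).sentiment.polarity
    if polarity > 0:
        return 'positive'
    if polarity < 0:
        return 'negative'
    return 'neutral'

reviews['sentiment'] = reviews['review_text'].apply(get_sentiment)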

# Visualize sentiment distribution
sns.countplot(x='airline', hue='sentiment', data=reviews)
plt.title('Sentiment Analysis of Airline Reviews')
plt.show()
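
The second step, identifying common feedback themes, is not covered by the cell above. A rough first pass is to count the most frequent words per sentiment label, as in this sketch. The reviews, the stopword list, and the `top_terms` helper are all hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# Small, hand-picked stopword list (an assumption; a real analysis would use a fuller one).
STOPWORDS = {"a", "an", "and", "the", "was", "were", "is", "it", "to", "of"}

# Made-up (sentiment, review_text) pairs standing in for the labeled reviews.
reviews_demo = [
    ("positive", "Friendly crew and a comfortable seat"),
    ("positive", "Crew was friendly and boarding was quick"),
    ("negative", "Long delay and lost luggage"),
    ("negative", "Another delay, rude staff"),
]

def top_terms(rows, sentiment, n=3):
    """Return the n most frequent non-stopword tokens for one sentiment label."""
    counts = Counter()
    for label, text in rows:
        if label != sentiment:
            continue
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts.most_common(n)

print("Positive themes:", top_terms(reviews_demo, "positive"))
print("Negative themes:", top_terms(reviews_demo, "negative"))
```

Word counts only hint at themes; grouping related words (for example "delay" and "late") would need a further step.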
