
# ✈️ Who's Happy in the Sky? Predicting Passenger Satisfaction to Reduce Churn

This project builds a logistic regression model that predicts airline passenger satisfaction from a single service feature, the inflight entertainment rating. It simulates how an airline could identify dissatisfied passengers early and act to improve their experience and reduce churn.


## 📂 Load Dataset

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

import warnings
warnings.filterwarnings("ignore")

# Download dataset if running on Colab
if "google.colab" in str(get_ipython()):
    !wget https://raw.githubusercontent.com/Rafsun-Chowdhury/Airline-Customer-Satisfaction-Prediction-with-Logistic-Regression/main/Invistico_Airline.csv

df = pd.read_csv("Invistico_Airline.csv")
df.head()


## 🧹 Data Preparation

In [None]:

# Drop initial NaNs
df = df.dropna().reset_index(drop=True)

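# Map satisfaction to binary values; both spellings of the negative label are
# accepted because it differs between airline satisfaction datasets
df['satisfaction'] = df['satisfaction'].map({'neutral or dissatisfied': 0, 'dissatisfied': 0, 'satisfied': 1})

# Drop any rows whose label was not covered by the mapping, then store labels as integers
df = df.dropna(subset=['satisfaction'])
df['satisfaction'] = df['satisfaction'].astype(int)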

# Confirm class distribution
print("Class balance (0 = not satisfied, 1 = satisfied):")
print(df['satisfaction'].value_counts())

# Convert inflight entertainment to float
df['Inflight entertainment'] = df['Inflight entertainment'].astype(float)
df[['Inflight entertainment', 'satisfaction']].head()
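
Because `Series.map` turns every label that is missing from the dictionary into `NaN`, it can be worth listing the unmapped labels before dropping those rows. The sketch below uses a small made-up series (the label `"unknown"` and the variable names are illustrative assumptions, not values from the dataset):

```python
import pandas as pd

# Toy labels, made up for illustration only
labels = pd.Series(["satisfied", "dissatisfied", "satisfied", "unknown"])

# Hypothetical mapping for this toy example
mapping = {"dissatisfied": 0, "satisfied": 1}
encoded = labels.map(mapping)

# Any label that is not a key of the mapping becomes NaN
unmapped = labels[encoded.isna()].unique().tolist()
print("Unmapped labels:", unmapped)
print("Rows that a dropna would remove:", int(encoded.isna().sum()))
```

If the list of unmapped labels is not empty, the rows removed by `dropna` are worth a second look, since an entire class could disappear silently.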


## 🤖 Train Logistic Regression Model

In [None]:

X = df[['Inflight entertainment']]
y = df['satisfaction']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Classification Report:")
print(classification_report(y_test, y_pred))

ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
plt.show()
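
One way to read a fitted single-feature logistic regression is through its odds ratio: the exponential of the coefficient is the factor by which the odds of satisfaction change per one-point increase in the rating. The sketch below is a minimal illustration on synthetic data (the generated ratings, the assumed true relationship, and the name `demo_model` are all assumptions, not results from the airline dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic ratings from 0 to 5 and labels drawn from an assumed logistic relationship
rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=2000).astype(float)
true_prob = 1.0 / (1.0 + np.exp(-(ratings - 2.5)))
satisfied = (rng.random(2000) < true_prob).astype(int)

# Fit the same kind of model as in the notebook, on the synthetic data
demo_model = LogisticRegression()
demo_model.fit(ratings.reshape(-1, 1), satisfied)

# exp(coefficient) is the multiplicative change in odds per rating point
slope = demo_model.coef_[0][0]
odds_ratio = np.exp(slope)
print(f"Coefficient: {slope:.3f}")
print(f"Odds ratio per rating point: {odds_ratio:.2f}")
```

The same two lines at the end can be applied to the notebook's `model` to interpret the coefficient learned from the real data.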


## 📊 Visualizing Logistic Curve

In [None]:

sns.regplot(x='Inflight entertainment', y='satisfaction', data=df, logistic=True, ci=None)
plt.title("Logistic Curve: Inflight Entertainment vs. Satisfaction")
plt.grid(True)
plt.show()
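
The curve in a plot like this is the sigmoid function applied to a linear score, `1 / (1 + exp(-(b0 + b1 * rating)))`. As a rough sketch, the snippet below recomputes that curve by hand from a model's intercept and coefficient and compares it with `predict_proba`. It fits on synthetic data, so the numbers are illustrative only, and `curve_model`, `b0`, and `b1` are names chosen for this example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic ratings and labels, generated only to have something to fit
rng = np.random.default_rng(42)
ratings = rng.integers(0, 6, size=1000).astype(float)
true_prob = 1.0 / (1.0 + np.exp(-(ratings - 2.5)))
satisfied = (rng.random(1000) < true_prob).astype(int)

curve_model = LogisticRegression().fit(ratings.reshape(-1, 1), satisfied)

# Intercept and slope of the linear score
b0 = curve_model.intercept_[0]
b1 = curve_model.coef_[0][0]

# Evaluate the sigmoid by hand on each possible rating
grid = np.arange(0, 6, dtype=float)
manual = 1.0 / (1.0 + np.exp(-(b0 + b1 * grid)))

# The probability of class 1 reported by the model for the same ratings
from_model = curve_model.predict_proba(grid.reshape(-1, 1))[:, 1]

for rating, p in zip(grid, manual):
    print(f"rating {rating:.0f}: P(satisfied) = {p:.3f}")
```

For a binary model with default settings, the hand-computed values and the `predict_proba` column should agree up to floating-point error.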


## 💡 Predict Satisfaction Likelihood

In [None]:

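def predict_satisfaction_risk(entertainment_rating):
    """Print the predicted probability that a passenger is satisfied."""
    # Build the input with the training column name so it matches the fitted model
    features = pd.DataFrame({'Inflight entertainment': [float(entertainment_rating)]})
    proba = model.predict_proba(features)[0][1]
    print(f"Satisfaction likelihood at entertainment rating {entertainment_rating}/5: {proba*100:.1f}%")

# Example
predict_satisfaction_risk(2.0)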
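
To turn a likelihood into a flag for follow-up, one option is to compare the predicted probability with a cutoff. The sketch below is a minimal illustration: the function `flag_at_risk`, the 0.5 threshold, and the intercept and slope values are hypothetical and would be replaced by `model.intercept_[0]` and `model.coef_[0][0]` from the fitted model.

```python
import numpy as np

def flag_at_risk(ratings, intercept, slope, threshold=0.5):
    """Return True for each rating whose predicted satisfaction probability is below the threshold."""
    ratings = np.asarray(ratings, dtype=float)
    proba = 1.0 / (1.0 + np.exp(-(intercept + slope * ratings)))
    return proba < threshold

# Hypothetical coefficients, chosen only for this example
HYPOTHETICAL_INTERCEPT = -2.5
HYPOTHETICAL_SLOPE = 1.0

flags = flag_at_risk([1, 2, 3, 4, 5], HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_SLOPE)
print(flags.tolist())  # [True, True, False, False, False]
```

With these assumed coefficients the predicted probability crosses 0.5 at a rating of 2.5, so ratings of 1 and 2 are flagged. The right threshold depends on how costly a missed dissatisfied passenger is compared with an unnecessary follow-up.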



## ✅ Conclusion

This project demonstrates how an airline can use logistic regression to estimate the likelihood that a passenger is satisfied from service feedback such as the inflight entertainment rating. The simplified prediction function could be integrated into a customer feedback dashboard to flag likely dissatisfied passengers before they churn.
