
# ✈️ Who's Happy in the Sky? Predicting Passenger Satisfaction to Reduce Churn

This project builds a logistic regression model that predicts airline passenger satisfaction from a single service feature, the inflight entertainment rating. It simulates how an airline could identify dissatisfied passengers early and act to improve their experience and reduce churn.


## 📂 Load Dataset

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

import warnings
warnings.filterwarnings("ignore")

# Download dataset if on Colab
if "google.colab" in str(get_ipython()):
    !wget https://raw.githubusercontent.com/Rafsun-Chowdhury/Airline-Customer-Satisfaction-Prediction-with-Logistic-Regression/main/Invistico_Airline.csv

df = pd.read_csv("Invistico_Airline.csv")
df.head()


## 🧹 Clean and Prepare Data

In [None]:

# Drop missing values in key columns
df = df.dropna(subset=['Inflight entertainment', 'satisfaction']).reset_index(drop=True)

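# Map satisfaction to binary (1 = satisfied, 0 = not).
# Assumption: the negative label is spelled either 'dissatisfied' or
# 'neutral or dissatisfied'; any other label becomes NaN and is dropped.
label_map = {'neutral or dissatisfied': 0, 'dissatisfied': 0, 'satisfied': 1}
df['satisfaction'] = df['satisfaction'].str.strip().str.lower().map(label_map)
df = df.dropna(subset=['satisfaction']).reset_index(drop=True)
df['satisfaction'] = df['satisfaction'].astype(int)

# Convert inflight entertainment to numeric
df['Inflight entertainment'] = df['Inflight entertainment'].astype(float)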

# Confirm class distribution
print("Class balance:")
print(df['satisfaction'].value_counts())
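
Because `.map` turns any label missing from the dictionary into NaN, and the `dropna` that follows then discards those rows, a whole class can disappear without an error. The helper below is a minimal sketch of a check that could be run on the raw label column before mapping. The name `unmapped_labels` and the toy list are illustrative assumptions, not part of the original notebook.

```python
def unmapped_labels(values, mapping):
    """Return the distinct values that `mapping` does not cover, sorted as strings."""
    # `v == v` is False only for NaN, so missing values are skipped here
    return sorted({v for v in values if v == v and v not in mapping}, key=str)

# Toy labels for illustration only (not taken from the dataset)
label_map = {'neutral or dissatisfied': 0, 'satisfied': 1}
raw_labels = ['satisfied', 'dissatisfied', 'satisfied']

print(unmapped_labels(raw_labels, label_map))  # ['dissatisfied']
```

In the notebook this could be called on the column right after loading the CSV, for example `unmapped_labels(df['satisfaction'], label_map)`. An empty list means every label is covered by the mapping.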


## 🤖 Train Logistic Regression Model

In [None]:

X = df[['Inflight entertainment']]
y = df['satisfaction']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Classification Report:")
print(classification_report(y_test, y_pred))
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
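
The accuracy in the classification report is easier to judge next to a baseline that always predicts the most common class. The sketch below computes that baseline in plain Python. The function name `majority_baseline_accuracy` and the toy labels are assumptions made for illustration.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Toy labels for illustration only: three satisfied, two not
toy_labels = [1, 1, 1, 0, 0]
print(f"Baseline accuracy: {majority_baseline_accuracy(toy_labels):.2f}")  # 0.60
```

Calling `majority_baseline_accuracy(y_test)` in the notebook would give the figure the model's test accuracy needs to beat to be useful.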


## 📊 Logistic Curve Visualization

In [None]:

sns.regplot(x='Inflight entertainment', y='satisfaction', data=df, logistic=True, ci=None)
plt.title("Inflight Entertainment vs Satisfaction (Logistic Regression Curve)")
plt.grid(True)
plt.show()
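
The curve in this plot is the logistic function p = 1 / (1 + e^-(b0 + b1·x)), where x is the rating, b0 the intercept, and b1 the coefficient. The sketch below evaluates that formula by hand and finds the rating at which the predicted probability crosses 50%. The intercept and coefficient used here are made-up illustrative values, not the fitted ones, and both function names are hypothetical.

```python
import math

def satisfaction_probability(rating, intercept, coef):
    """Logistic function: probability of satisfaction at a given rating."""
    z = intercept + coef * rating
    return 1.0 / (1.0 + math.exp(-z))

def decision_boundary(intercept, coef):
    """Rating at which the probability equals 0.5, i.e. where z = 0."""
    return -intercept / coef

# Illustrative values only, NOT the coefficients learned by the model
intercept, coef = -3.0, 1.0

for rating in range(6):
    p = satisfaction_probability(rating, intercept, coef)
    print(f"rating {rating}: {p:.3f}")

print("50% crossover at rating", decision_boundary(intercept, coef))
print("odds multiplier per rating point:", math.exp(coef))
```

With the fitted scikit-learn model, the real values would come from `model.intercept_[0]` and `model.coef_[0][0]`.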


## 🔮 Predict Satisfaction from Rating

In [None]:

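def predict_satisfaction_risk(entertainment_rating):
    """Print the predicted probability that a passenger with this rating is satisfied."""
    # Pass a DataFrame with the training column name so the input matches what the model was fit on
    features = pd.DataFrame({'Inflight entertainment': [entertainment_rating]})
    proba = model.predict_proba(features)[0][1]  # column 1 = probability of class 1 (satisfied)
    print(f"Satisfaction likelihood at rating {entertainment_rating}/5: {proba*100:.1f}%")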

# Example usage
predict_satisfaction_risk(3.0)
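
To turn single predictions into something actionable, the same probability can be checked for every possible rating and compared with a cutoff. The sketch below is one way to do that. `ratings_below_threshold` and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in for the fitted model so the example runs on its own.

```python
def ratings_below_threshold(predict_fn, ratings, threshold=0.5):
    """Return the ratings whose predicted satisfaction probability is below `threshold`."""
    return [r for r in ratings if predict_fn(r) < threshold]

def toy_predict(rating):
    """Stand-in for the fitted model: maps a 0-5 rating linearly to 0.0-1.0."""
    return rating / 5

at_risk = ratings_below_threshold(toy_predict, range(6), threshold=0.5)
print(at_risk)  # [0, 1, 2]
```

In the notebook, `predict_fn` could instead wrap the trained model, for example `lambda r: model.predict_proba([[r]])[0][1]`. The 0.5 cutoff is an assumption and would normally be chosen from the cost of missing a dissatisfied passenger.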



## ✅ Conclusion

With a single input (the inflight entertainment rating), this logistic regression model predicts passenger satisfaction with over 80% accuracy on the held-out test set. A tool this simple could help airlines respond proactively to low satisfaction in real time.
