import matplotlib.pyplot as plt
import pandas as pd
import pandera as pa
import seaborn as sns
from sklearn.compose import make_column_transformer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer, precision_score
from sklearn.model_selection import GridSearchCV, cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# **Predicting US Airline Passenger Satisfaction: A Data-Driven Approach**

### **Summary**

## **Introduction**
In the highly competitive field of air transport management, passenger satisfaction plays a critical role in building customer loyalty, providing operational insights, enhancing financial performance, and ensuring compliance with regulations and rankings (Sadegh Eshaghi et al., 2024). While numerous studies have examined factors that influence customer satisfaction, such as service quality (Namukasa, 2023), being able to predict satisfaction accurately is also important for understanding where to improve and for making better decisions. In this study, we aim to build a reliable model that predicts US airline passenger satisfaction with high performance.

### **Dataset Overview**
The dataset includes a variety of features such as:
- **Flight Information**: Flight distance, departure delay, arrival delay, etc.
- **Passenger Demographics**: Age, gender, travel class, etc.
- **Flight Service Quality**: Satisfaction ratings on aspects such as in-flight entertainment, seat comfort, and food quality.

The goal is to predict **passenger satisfaction** (either satisfied or dissatisfied) from these features.

## **Preprocessing**

1. **Removing a Redundant Feature**  
   A feature was removed because it contained missing values and was highly correlated with another feature, making it redundant.

2. **Encoding Categorical Variables**  
   Categorical variables were encoded so that machine learning algorithms can use them. One feature was converted into a binary variable, and the other categorical variables were one-hot encoded into numerical format.

3. **Scaling Numerical Features**  
   Numerical features were standardized or scaled so that they have similar ranges. This prevents variables with larger ranges from dominating the model. Different transformations were applied based on the type of data:
   - One-hot encoding for categorical variables.
   - Min-Max scaling for ordinal variables.
   - Standard scaling for numerical variables.

4. **Column Transformer Setup**  
   A `ColumnTransformer` was used to apply appropriate transformations to each feature type, ensuring the data is ready for machine learning models. Categorical variables were one-hot encoded, ordinal variables were min-max scaled, and numerical variables were standardized.
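
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration on toy data, not the exact code used in the analysis: the column names (`departure_delay`, `arrival_delay`, `customer_type`), the values, and the 0.9 correlation threshold are all assumptions made for the example, since the text does not name the feature that was dropped or the one that was binarized.

```python
import pandas as pd

# Toy data; every column name and value here is hypothetical.
df = pd.DataFrame({
    "departure_delay": [0, 10, 25, 40, 5, 60],
    "arrival_delay": [2.0, 12.0, None, 43.0, 4.0, 65.0],
    "customer_type": ["loyal", "disloyal", "loyal", "loyal", "disloyal", "loyal"],
})

# Step 1: inspect missing values and the correlation between the two columns.
missing_counts = df.isna().sum()
# Series.corr skips rows where either value is missing.
delay_corr = df["departure_delay"].corr(df["arrival_delay"])

# Drop the column only if it has gaps AND is nearly redundant.
# The 0.9 threshold is an illustrative choice, not taken from the analysis.
if missing_counts["arrival_delay"] > 0 and delay_corr > 0.9:
    df = df.drop(columns=["arrival_delay"])

# Step 2: convert a two-valued feature into a binary 0/1 variable.
df["customer_type"] = (df["customer_type"] == "loyal").astype(int)
```

Checking both conditions before dropping keeps the decision explicit: the column is removed because its information is largely carried by another feature, not merely because it has missing entries.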

This preprocessing pipeline ensures the dataset is properly prepared for analysis and modeling, with consistent scaling and encoding applied across the features.
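
A preprocessor of this kind might look like the sketch below. It uses `make_column_transformer` with the three transformers named above; the column names and toy values are hypothetical placeholders, and `sparse_threshold=0` is an added convenience so the result is always a dense array.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Toy feature table; column names and values are hypothetical.
X = pd.DataFrame({
    "travel_class": ["eco", "business", "eco", "eco_plus"],  # categorical
    "seat_comfort": [1, 5, 3, 4],                            # ordinal rating
    "flight_distance": [300, 2500, 800, 1200],               # numerical
})

categorical_cols = ["travel_class"]
ordinal_cols = ["seat_comfort"]
numeric_cols = ["flight_distance"]

# Each (transformer, columns) pair is applied to its own subset of columns;
# the outputs are concatenated in the order listed here.
preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    (MinMaxScaler(), ordinal_cols),
    (StandardScaler(), numeric_cols),
    sparse_threshold=0,  # always return a dense array
)

X_transformed = preprocessor.fit_transform(X)
print(X_transformed.shape)
```

In a modeling workflow, the preprocessor would normally be fitted on the training split only (for example inside `make_pipeline` with an estimator), so that scaling statistics from the test data do not leak into training.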


## **Results and Discussion**


### **References**
