In [2]:
import pandas as pd
import sqlite3

# Connect to the processed SQLite database
conn = sqlite3.connect("../data/processed/flight_customer.db")

# Preview the first five rows of the customer table
pd.read_sql("SELECT * FROM all_customers_loyalty LIMIT 5", conn)

| | id | gender | customer_type | age | type_of_travel | class | flight_distance | inflight_wifi_service | departure/arrival_time_convenient | ease_of_online_booking | ... | inflight_entertainment | on_board_service | leg_room_service | baggage_handling | checkin_service | inflight_service | cleanliness | departure_delay_in_minutes | arrival_delay_in_minutes | satisfaction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19556 | Female | Loyal Customer | 52 | Business travel | Eco | 160 | 5 | 4 | 3 | ... | 5 | 5 | 5 | 5 | 2 | 5 | 5 | 50 | 44.0 | satisfied |
| 1 | 90035 | Female | Loyal Customer | 36 | Business travel | Business | 2863 | 1 | 1 | 3 | ... | 4 | 4 | 4 | 4 | 3 | 4 | 5 | 0 | 0.0 | satisfied |
| 2 | 12360 | Male | disloyal Customer | 20 | Business travel | Eco | 192 | 2 | 0 | 2 | ... | 2 | 4 | 1 | 3 | 2 | 2 | 2 | 0 | 0.0 | neutral or dissatisfied |
| 3 | 77959 | Male | Loyal Customer | 44 | Business travel | Business | 3377 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 3 | 1 | 4 | 0 | 6.0 | satisfied |
| 4 | 36875 | Female | Loyal Customer | 49 | Business travel | Eco | 1182 | 2 | 3 | 4 | ... | 2 | 2 | 2 | 2 | 4 | 2 | 4 | 0 | 20.0 | satisfied |


### What is the target variable?
(usually dissatisfaction → churn risk)
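The heading treats dissatisfaction as a stand-in for churn risk. One way to make that target explicit is a binary flag. A minimal sketch, assuming "neutral or dissatisfied" is the positive class; the `churn_risk` column name and the four rows are hypothetical:

```python
import pandas as pd

# Hypothetical rows; in the notebook this column would come from
# pd.read_sql("SELECT satisfaction FROM all_customers_loyalty", conn)
df = pd.DataFrame({
    "satisfaction": [
        "satisfied",
        "neutral or dissatisfied",
        "neutral or dissatisfied",
        "satisfied",
    ]
})

# churn_risk (hypothetical name) is 1 for "neutral or dissatisfied", else 0
df["churn_risk"] = (df["satisfaction"] == "neutral or dissatisfied").astype(int)

print(df["churn_risk"].tolist())  # [0, 1, 1, 0]
print(df["churn_risk"].mean())    # share of rows flagged: 0.5
```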

In [15]:
tv = """ 
SELECT 
    gender, 
    type_of_travel, 
    satisfaction,
    COUNT(satisfaction) AS total_dissatisfied
FROM 
    all_customers_loyalty
WHERE 
    satisfaction = 'neutral or dissatisfied' 
GROUP BY 
    type_of_travel,
    gender
ORDER BY 
    total_dissatisfied DESC
"""

target_variable = pd.read_sql(tv, conn)
target_variable


| | gender | type_of_travel | satisfaction | total_dissatisfied |
|---|---|---|---|---|
| 0 | Female | Business travel | neutral or dissatisfied | 19519 |
| 1 | Female | Personal Travel | neutral or dissatisfied | 18005 |
| 2 | Male | Personal Travel | neutral or dissatisfied | 17982 |
| 3 | Male | Business travel | neutral or dissatisfied | 17719 |


### Analysis

Our largest group of dissatisfied customers is our female travelers: 37,524 in total, of whom 19,519 were traveling on business and 18,005 for personal travel (compared with 35,701 dissatisfied male travelers).

We need to find ways to make our female customers feel better while traveling.
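The comparison above uses raw counts of dissatisfied customers, which also depend on how many customers are in each segment. A minimal sketch of the same grouping expressed as a rate, assuming the column names shown in the first output; the in-memory table and its six rows are toy data, not results from this dataset:

```python
import sqlite3
import pandas as pd

# In-memory database with hypothetical rows standing in for
# all_customers_loyalty; only the columns the query needs are included
conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "type_of_travel": ["Business travel"] * 6,
    "satisfaction": [
        "neutral or dissatisfied", "neutral or dissatisfied", "satisfied",
        "neutral or dissatisfied", "satisfied", "satisfied",
    ],
}).to_sql("all_customers_loyalty", conn, index=False)

# In SQLite a comparison evaluates to 0 or 1, so SUM counts the dissatisfied
# rows and AVG gives their share of the segment
rate_sql = """
SELECT
    gender,
    type_of_travel,
    COUNT(*) AS total_customers,
    SUM(satisfaction = 'neutral or dissatisfied') AS total_dissatisfied,
    AVG(satisfaction = 'neutral or dissatisfied') AS dissatisfied_rate
FROM all_customers_loyalty
GROUP BY gender, type_of_travel
ORDER BY dissatisfied_rate DESC
"""
rates = pd.read_sql(rate_sql, conn)
print(rates)
```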

---


### Which features provide the strongest signal?

In [27]:
sfs = """ 
-- All dissatisfied female customers; both travel types are combined,
-- so type_of_travel is not selected (it is not in the GROUP BY)
SELECT 
    gender, 
    satisfaction,
    COUNT(satisfaction) AS total_dissatisfied,
    AVG(leg_room_service) AS leg_room_service,
    AVG(checkin_service) AS checkin_service, 
    AVG(inflight_service) AS inflight_service, 
    AVG(arrival_delay_in_minutes) AS arrival_delay, 
    AVG(departure_delay_in_minutes) AS departure_delay, 
    AVG(seat_comfort) AS seat_comfort, 
    AVG(food_and_drink) AS food_and_drink,
    AVG(inflight_wifi_service) AS wifi_services,
    AVG(online_boarding) AS online_boarding
FROM 
    all_customers_loyalty
WHERE 
    satisfaction = 'neutral or dissatisfied'
    AND gender = 'Female'
GROUP BY 
    satisfaction,
    gender
"""
strongest_feature_signal = pd.read_sql(sfs, conn)
strongest_feature_signal

| | gender | satisfaction | total_dissatisfied | leg_room_service | checkin_service | inflight_service | arrival_delay | departure_delay | seat_comfort | food_and_drink | wifi_services | online_boarding |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | neutral or dissatisfied | 37524 | 2.886713 | 3.032672 | 3.296743 | 16.926607 | 16.153369 | 3.14119 | 2.993951 | 2.396013 | 2.811321 |


### Analysis

The lowest-rated services among our dissatisfied female customers are inflight wifi (2.40), online boarding (2.81), leg room (2.89), and food and drink (2.99).
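The ranking above can be reproduced by sorting the rating columns of the query result. A minimal sketch that copies the averages from the output into a Series; the two delay columns are left out because they are measured in minutes rather than on the rating scale. In the notebook the same Series could be selected from the first row of `strongest_feature_signal`:

```python
import pandas as pd

# Average ratings for dissatisfied female customers, copied from the output above
service_scores = pd.Series({
    "leg_room_service": 2.886713,
    "checkin_service": 3.032672,
    "inflight_service": 3.296743,
    "seat_comfort": 3.14119,
    "food_and_drink": 2.993951,
    "wifi_services": 2.396013,
    "online_boarding": 2.811321,
})

# Sort ascending so the weakest services come first, then keep four
lowest_four = service_scores.sort_values().head(4)
print(lowest_four.index.tolist())
# ['wifi_services', 'online_boarding', 'leg_room_service', 'food_and_drink']
```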

For leg room, we can introduce different or special seating options, but for legal reasons we cannot offer them specifically to women.

For online boarding, we need to work with the engineering team on making the platform's online boarding process easier.

For food and drink, we can offer a wider variety on board.

For wifi, we need to ask the IT department whether there is anything we can do to improve inflight connectivity.

---

### Which service features appear most important?

In [28]:
mis = """ 
-- Same aggregates as the previous query, with the service ratings listed
-- before the delays; type_of_travel is not selected (it is not in the GROUP BY)
SELECT 
    gender, 
    satisfaction,
    COUNT(satisfaction) AS total_dissatisfied,
    AVG(leg_room_service) AS leg_room_service,
    AVG(checkin_service) AS checkin_service, 
    AVG(inflight_service) AS inflight_service, 
    AVG(seat_comfort) AS seat_comfort, 
    AVG(food_and_drink) AS food_and_drink,
    AVG(inflight_wifi_service) AS wifi_services, 
    AVG(online_boarding) AS online_boarding,
    AVG(arrival_delay_in_minutes) AS arrival_delay, 
    AVG(departure_delay_in_minutes) AS departure_delay
FROM 
    all_customers_loyalty
WHERE 
    satisfaction = 'neutral or dissatisfied'
    AND gender = 'Female'
GROUP BY 
    satisfaction,
    gender
"""
most_important_signal = pd.read_sql(mis, conn)
most_important_signal

| | gender | satisfaction | total_dissatisfied | leg_room_service | checkin_service | inflight_service | seat_comfort | food_and_drink | wifi_services | online_boarding | arrival_delay | departure_delay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | neutral or dissatisfied | 37524 | 2.886713 | 3.032672 | 3.296743 | 3.14119 | 2.993951 | 2.396013 | 2.811321 | 16.926607 | 16.153369 |


### Analysis 

The strengths we need to maintain are inflight service (3.30) and seat comfort (3.14), the two highest-rated categories among dissatisfied female customers. Check-in service (3.03) is the only other category scoring above 3.
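"Above average" depends on which average is meant. A minimal sketch that checks the same seven ratings (copied from the output above) against two possible cut-offs, a fixed score of 3 and the mean of the seven category averages; neither cut-off comes from the original analysis:

```python
import pandas as pd

# Average ratings for dissatisfied female customers (from the output above)
scores = pd.Series({
    "leg_room_service": 2.886713,
    "checkin_service": 3.032672,
    "inflight_service": 3.296743,
    "seat_comfort": 3.14119,
    "food_and_drink": 2.993951,
    "wifi_services": 2.396013,
    "online_boarding": 2.811321,
})

# Cut-off 1: a fixed score of 3
above_three = scores[scores > 3].sort_values(ascending=False)
# Cut-off 2: the mean of the seven category averages (about 2.94)
above_mean = scores[scores > scores.mean()].sort_values(ascending=False)

print(above_three.index.tolist())
# ['inflight_service', 'seat_comfort', 'checkin_service']
print(above_mean.index.tolist())
# ['inflight_service', 'seat_comfort', 'checkin_service', 'food_and_drink']
```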

---

### Are delays significant churn predictors?

In [53]:
dsc = """ 
SELECT 
    gender, 
    type_of_travel, 
    satisfaction,
    COUNT(satisfaction) AS total_dissatisfied,
    AVG(leg_room_service) AS leg_room_service,
    AVG(checkin_service) AS checkin_service, 
    AVG(inflight_service) AS inflight_service, 
    AVG(seat_comfort) AS seat_comfort, 
    AVG(food_and_drink) AS food_and_drink,
    AVG(inflight_wifi_service) AS wifi_services, 
    AVG(online_boarding) AS online_boarding,
    AVG(arrival_delay_in_minutes) AS arrival_delay, 
    AVG(departure_delay_in_minutes) AS departure_delay
FROM 
    all_customers_loyalty
WHERE 
    satisfaction = 'neutral or dissatisfied'
GROUP BY 
    satisfaction,
    gender, 
    type_of_travel
ORDER BY 
    total_dissatisfied DESC
"""

delay_satisfaction_churn = pd.read_sql(dsc, conn)
delay_satisfaction_churn

| | gender | type_of_travel | satisfaction | total_dissatisfied | leg_room_service | checkin_service | inflight_service | seat_comfort | food_and_drink | wifi_services | online_boarding | arrival_delay | departure_delay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | Business travel | neutral or dissatisfied | 19519 | 2.887033 | 2.755469 | 3.158922 | 2.94513 | 2.895282 | 2.399457 | 2.63825 | 17.838772 | 16.90312 |
| 1 | Female | Personal Travel | neutral or dissatisfied | 18005 | 2.886365 | 3.333185 | 3.446154 | 3.353735 | 3.100916 | 2.39228 | 2.998945 | 15.93774 | 15.340572 |
| 2 | Male | Personal Travel | neutral or dissatisfied | 17982 | 3.225114 | 3.338728 | 3.754199 | 3.022912 | 3.026082 | 2.399844 | 2.441553 | 15.373596 | 14.864364 |
| 3 | Male | Business travel | neutral or dissatisfied | 17719 | 2.972177 | 2.764942 | 3.216491 | 2.836955 | 2.814944 | 2.40228 | 2.556465 | 19.037361 | 18.233535 |


### Analysis 

Based on this breakdown, delays do not appear to be a significant churn predictor. The largest dissatisfied segment, female business travelers (19,519), does not have the longest average delays (17.8 minutes on arrival, 16.9 on departure).

The longest average delays belong to male business travelers (19.0 minutes on arrival, 18.2 on departure), yet they form the smallest dissatisfied segment (17,719).
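The breakdown above only looks at dissatisfied customers, so it cannot show whether delays differ between dissatisfied and satisfied customers. A minimal sketch of that comparison, grouping all customers by `satisfaction`; the four in-memory rows are toy data, so the numbers are illustrative only:

```python
import sqlite3
import pandas as pd

# Hypothetical rows standing in for all_customers_loyalty
conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "satisfaction": [
        "satisfied", "satisfied",
        "neutral or dissatisfied", "neutral or dissatisfied",
    ],
    "arrival_delay_in_minutes": [0.0, 10.0, 30.0, 20.0],
    "departure_delay_in_minutes": [0, 5, 25, 15],
}).to_sql("all_customers_loyalty", conn, index=False)

# Average delays for every customer, split by outcome rather than filtered
# to the dissatisfied group only
delay_by_outcome = pd.read_sql("""
SELECT
    satisfaction,
    COUNT(*) AS customers,
    AVG(arrival_delay_in_minutes) AS avg_arrival_delay,
    AVG(departure_delay_in_minutes) AS avg_departure_delay
FROM all_customers_loyalty
GROUP BY satisfaction
ORDER BY satisfaction
""", conn)
print(delay_by_outcome)
```

If the dissatisfied group's averages on the real table are not clearly higher than the satisfied group's, that would support the conclusion above more directly than comparing segment sizes.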

---

### Which customers are most likely to churn?
(e.g., long delays + low seat comfort)
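The example in the heading, long delays combined with low seat comfort, can be expressed as a row filter. A minimal sketch in pandas; the rows and both thresholds (`LONG_DELAY_MIN`, `LOW_SEAT_COMFORT`) are hypothetical and would need to be chosen against the real data:

```python
import pandas as pd

# Hypothetical rows; in the notebook these would come from all_customers_loyalty
customers = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "arrival_delay_in_minutes": [45.0, 5.0, 60.0, 0.0],
    "seat_comfort": [2, 1, 4, 2],
})

# Hypothetical thresholds for "long delay" and "low seat comfort"
LONG_DELAY_MIN = 30
LOW_SEAT_COMFORT = 2

# Keep only customers who meet both conditions
at_risk = customers[
    (customers["arrival_delay_in_minutes"] >= LONG_DELAY_MIN)
    & (customers["seat_comfort"] <= LOW_SEAT_COMFORT)
]
print(at_risk["id"].tolist())  # [1]
```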

In [55]:
mlc = """ 
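SELECT 
    -- Caution: age and class are neither aggregated nor in the GROUP BY,
    -- so SQLite takes their values from an arbitrary row in each group
    age, 
    class,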
    gender, 
    type_of_travel, 
    satisfaction,
    COUNT(satisfaction) AS total_dissatisfied,
    AVG(leg_room_service) AS leg_room_service,
    AVG(checkin_service) AS checkin_service, 
    AVG(inflight_service) AS inflight_service, 
    AVG(seat_comfort) AS seat_comfort, 
    AVG(food_and_drink) AS food_and_drink,
    AVG(inflight_wifi_service) AS wifi_services, 
    AVG(online_boarding) AS online_boarding,
    AVG(arrival_delay_in_minutes) AS arrival_delay, 
    AVG(departure_delay_in_minutes) AS departure_delay
FROM 
    all_customers_loyalty
WHERE 
    satisfaction = 'neutral or dissatisfied'
GROUP BY 
    satisfaction,
    gender, 
    type_of_travel
ORDER BY 
    total_dissatisfied DESC
"""
most_likely_churn = pd.read_sql(mlc, conn)
most_likely_churn

| | age | class | gender | type_of_travel | satisfaction | total_dissatisfied | leg_room_service | checkin_service | inflight_service | seat_comfort | food_and_drink | wifi_services | online_boarding | arrival_delay | departure_delay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 33 | Business | Female | Business travel | neutral or dissatisfied | 19519 | 2.887033 | 2.755469 | 3.158922 | 2.94513 | 2.895282 | 2.399457 | 2.63825 | 17.838772 | 16.90312 |
| 1 | 43 | Eco | Female | Personal Travel | neutral or dissatisfied | 18005 | 2.886365 | 3.333185 | 3.446154 | 3.353735 | 3.100916 | 2.39228 | 2.998945 | 15.93774 | 15.340572 |
| 2 | 50 | Eco | Male | Personal Travel | neutral or dissatisfied | 17982 | 3.225114 | 3.338728 | 3.754199 | 3.022912 | 3.026082 | 2.399844 | 2.441553 | 15.373596 | 14.864364 |
| 3 | 20 | Eco | Male | Business travel | neutral or dissatisfied | 17719 | 2.972177 | 2.764942 | 3.216491 | 2.836955 | 2.814944 | 2.40228 | 2.556465 | 19.037361 | 18.233535 |


### Analysis 

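Our customers at highest risk of churning are female business travelers, the largest dissatisfied segment (19,519), followed by female personal travelers (18,005). The age and class values in this output are taken from an arbitrary row of each group, because those columns are neither aggregated nor grouped, so they should not be read as the typical age or class of these segments.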

We need to improve our services to better satisfy our female customer base.
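Because age and class are bare columns in the query above, a segment profile needs them grouped or aggregated. A minimal sketch, assuming the same table and column names: `class` moves into the `GROUP BY` and `age` is summarized with `AVG`, `MIN`, and `MAX`. The in-memory rows are toy data, so the numbers do not describe this dataset:

```python
import sqlite3
import pandas as pd

# Hypothetical rows standing in for all_customers_loyalty
conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Male"],
    "type_of_travel": ["Business travel"] * 4,
    "class": ["Business", "Business", "Eco", "Eco"],
    "age": [30, 40, 50, 20],
    "satisfaction": ["neutral or dissatisfied"] * 4,
}).to_sql("all_customers_loyalty", conn, index=False)

# Every selected column is either grouped or aggregated, so each value
# describes the whole segment rather than one arbitrary row
segment_sql = """
SELECT
    gender,
    type_of_travel,
    class,
    COUNT(*) AS total_dissatisfied,
    AVG(age) AS avg_age,
    MIN(age) AS min_age,
    MAX(age) AS max_age
FROM all_customers_loyalty
WHERE satisfaction = 'neutral or dissatisfied'
GROUP BY gender, type_of_travel, class
ORDER BY total_dissatisfied DESC, gender, class
"""
segments = pd.read_sql(segment_sql, conn)
print(segments)
```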

---

### What actions should the loyalty team take to retain them?

### Analysis 

We need to adjust our services to make our female customer base more satisfied.

The lowest-rated categories are inflight wifi, online boarding, leg room, and food and drink.

We need to work with the IT department to improve inflight connectivity.

We need to work with engineering to make online boarding easier in the app.

We can offer special seating options for individual passengers who want more leg room.

We can offer a wider variety of food and drinks.

We need to maintain our inflight service at its current level, because it is currently the highest-rated category for this demographic.
