# Airline Passenger Satisfaction – EDA & KPI Design

## 1. Business context & objectives

In this notebook, we analyse an open airline passenger satisfaction dataset to understand the drivers of satisfaction and dissatisfaction.

The perspective is that of a **Technical Product Owner (TPO)** working in an Analytics Centre of Excellence (ACoE) for an airline. The focus is on:
- Designing meaningful satisfaction and experience KPIs.
- Performing exploratory data analysis (EDA) by key segments (class, travel type, loyalty, delays).
- Translating findings into product and analytics questions that an ACoE would support.

**Key business questions:**
- What proportion of passengers are satisfied vs neutral/dissatisfied?
- How does satisfaction vary by customer type, travel purpose, and class?
- Which service dimensions (e.g., online booking, wifi, seat comfort) differentiate satisfied from dissatisfied passengers the most?
- How do departure and arrival delays affect satisfaction?

## 2. Data overview

In this section, we:
- Load the airline passenger satisfaction dataset from Kaggle.
- Inspect the schema (columns, datatypes, missing values).
- Perform basic sanity checks on ranges and categories.

**Notes on the dataset:**
- Each row represents one passenger’s flight and survey response.
- Target variable: overall satisfaction (e.g. `satisfied` vs `neutral or dissatisfied`).
- Predictors include:
  - Demographics (e.g. Age, Gender).
  - Travel details (Customer Type, Type of Travel, Class, Flight Distance).
  - Service ratings (seat comfort, inflight wifi, online booking, on-board service, cleanliness, etc.).
  - Operational metrics (Departure Delay in Minutes, Arrival Delay in Minutes).
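
The schema inspection and sanity checks described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual loading code: the inline CSV is a made-up three-row sample standing in for the Kaggle file, the `satisfaction` column name and the category spellings are assumptions, and `schema_summary` is a hypothetical helper.

```python
import io

import pandas as pd

# Made-up sample standing in for the Kaggle CSV (values are illustrative only).
# In the notebook, this would be replaced by pd.read_csv on the downloaded file.
SAMPLE_CSV = """Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay in Minutes,Arrival Delay in Minutes,satisfaction
Male,34,Loyal Customer,Business travel,Business,1200,0,0.0,satisfied
Female,22,disloyal Customer,Personal Travel,Eco,450,25,,neutral or dissatisfied
Female,58,Loyal Customer,Business travel,Eco Plus,800,5,12.0,neutral or dissatisfied
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))


def schema_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: dtype, missing count and distinct count per column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(),
    })


summary = schema_summary(df)
print(summary)

# Basic sanity checks on ranges and categories.
checks = {
    "age_positive": bool((df["Age"] > 0).all()),
    "delays_non_negative": bool((df["Departure Delay in Minutes"] >= 0).all()),
    "known_classes": set(df["Class"]) <= {"Business", "Eco", "Eco Plus"},
}
print(checks)
```

Returning the summary as a DataFrame keeps it easy to display in a notebook cell and to filter, for example down to the columns with missing values.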

## 3. Data cleaning & feature preparation

Here we:
- Handle missing values (e.g., delays, service ratings) using sensible rules.
- Normalise category values where needed.
- Create a few helper features for analysis, such as:
  - Age groups with non-overlapping boundaries (e.g. `<25`, `25–39`, `40–59`, `60+`).
  - Delay buckets (e.g. `0`, `1–15`, `16–60`, `>60` minutes).
  - Binary indicators where helpful (e.g. `is_delayed`).

The aim is not heavy feature engineering but ensuring the data is clean and interpretable for KPI design.
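
A possible sketch of these helper features, using `pd.cut` on a small made-up frame. The imputation rule (fill a missing arrival delay with the departure delay) is only one candidate "sensible rule" and is an assumption here, as are the derived column names `age_group`, `dep_delay_bucket`, `arr_delay_bucket` and `is_delayed`. The bucket labels use plain hyphens and non-overlapping boundaries so that each value falls into exactly one group.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; column names follow the dataset notes above.
df = pd.DataFrame({
    "Age": [19, 25, 40, 60, 72],
    "Departure Delay in Minutes": [0, 10, 16, 61, 0],
    "Arrival Delay in Minutes": [0.0, np.nan, 20.0, 75.0, np.nan],
})

# Assumed rule: a missing arrival delay is filled with the departure delay.
df["Arrival Delay in Minutes"] = df["Arrival Delay in Minutes"].fillna(
    df["Departure Delay in Minutes"]
)

# Age groups; right=False makes the bins left-closed:
# [0, 25), [25, 40), [40, 60), [60, inf).
df["age_group"] = pd.cut(
    df["Age"],
    bins=[0, 25, 40, 60, np.inf],
    labels=["<25", "25-39", "40-59", "60+"],
    right=False,
)

# Delay buckets; the default right=True gives
# (-1, 0], (0, 15], (15, 60], (60, inf].
delay_bins = [-1, 0, 15, 60, np.inf]
delay_labels = ["0", "1-15", "16-60", ">60"]
df["dep_delay_bucket"] = pd.cut(
    df["Departure Delay in Minutes"], bins=delay_bins, labels=delay_labels
)
df["arr_delay_bucket"] = pd.cut(
    df["Arrival Delay in Minutes"], bins=delay_bins, labels=delay_labels
)

# Binary indicator: 1 if the flight departed late at all, else 0.
df["is_delayed"] = (df["Departure Delay in Minutes"] > 0).astype(int)

print(df)
```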

## 4. KPI definition
_(TPO perspective)_

In this section we define and compute the core KPIs that an airline ACoE might expose to stakeholders.

**Example KPIs:**
- Overall passenger satisfaction rate.
- Satisfaction rate by:
  - Customer Type (loyal vs disloyal).
  - Type of Travel (business vs personal).
  - Class (Business, Eco, Eco Plus).
- Satisfaction vs delay:
  - Satisfaction rate by departure delay bucket.
  - Satisfaction rate by arrival delay bucket.
- Satisfaction vs service ratings:
  - Average rating per service dimension for satisfied vs dissatisfied passengers.
  - The difference between the two averages, to highlight the largest gaps.

Each KPI will be presented as:
- A short table.
- One or two clear visualisations (bar charts, grouped bars, or simple heatmaps).
- A short narrative comment from a product/operations perspective.
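
As a rough sketch of how the satisfaction-rate KPIs above might be computed: the rows below are made up, the `satisfaction` column name and its two labels are assumptions, and `satisfaction_rate_by` is a hypothetical helper rather than an established function.

```python
import pandas as pd

# Made-up rows for illustration only; they do not reflect the real dataset.
df = pd.DataFrame({
    "Class": ["Business", "Business", "Eco", "Eco", "Eco", "Eco Plus"],
    "satisfaction": [
        "satisfied",
        "satisfied",
        "neutral or dissatisfied",
        "neutral or dissatisfied",
        "satisfied",
        "neutral or dissatisfied",
    ],
})

# Assumed target encoding: 1 if the label is exactly "satisfied", else 0.
df["is_satisfied"] = (df["satisfaction"] == "satisfied").astype(int)

# KPI: overall passenger satisfaction rate.
overall_rate = df["is_satisfied"].mean()


def satisfaction_rate_by(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: satisfaction rate and passenger count per segment."""
    return (
        frame.groupby(column)["is_satisfied"]
        .agg(satisfaction_rate="mean", passengers="count")
        .sort_values("satisfaction_rate", ascending=False)
    )


by_class = satisfaction_rate_by(df, "Class")
print(f"Overall satisfaction rate: {overall_rate:.1%}")
print(by_class)
```

Reporting the passenger count next to each rate helps stakeholders judge how much weight a small segment's rate deserves. The same helper could be called with `Customer Type`, `Type of Travel`, or a delay bucket column.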

## 5. Exploratory data analysis (segmentation & drivers)

This section examines the patterns behind the KPIs in more depth.

### 5.1 Segment analysis

We explore satisfaction across important segments, such as:
- Class vs Type of Travel.
- Customer Type vs Class.
- Age groups vs satisfaction.

For each segment, we:
- Visualise the distribution of satisfied vs dissatisfied.
- Comment on where dissatisfaction is concentrated.
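
One way to look at a two-way segment such as Class vs Type of Travel is a pivot of satisfaction rates alongside the share of all dissatisfied passengers per cell. The sketch below uses made-up rows; the `satisfaction` column name and the category spellings are assumptions.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Class": ["Business", "Business", "Eco", "Eco", "Eco", "Eco Plus"],
    "Type of Travel": [
        "Business travel",
        "Business travel",
        "Personal Travel",
        "Personal Travel",
        "Business travel",
        "Personal Travel",
    ],
    "satisfaction": [
        "satisfied",
        "satisfied",
        "neutral or dissatisfied",
        "neutral or dissatisfied",
        "satisfied",
        "neutral or dissatisfied",
    ],
})
df["is_satisfied"] = (df["satisfaction"] == "satisfied").astype(int)

# Satisfaction rate per (Class, Type of Travel) cell; empty cells stay NaN.
rate_grid = df.pivot_table(
    index="Class", columns="Type of Travel", values="is_satisfied", aggfunc="mean"
)

# Passenger counts per cell, to judge how reliable each rate is.
count_grid = pd.crosstab(df["Class"], df["Type of Travel"])

# Share of ALL dissatisfied passengers that falls into each cell,
# which shows where dissatisfaction is concentrated in absolute terms.
dissatisfied = df[df["is_satisfied"] == 0]
dissat_share = pd.crosstab(
    dissatisfied["Class"], dissatisfied["Type of Travel"], normalize="all"
)

print(rate_grid)
print(count_grid)
print(dissat_share)
```

The rate grid and the share grid answer different questions: a small segment can have a very low satisfaction rate yet contribute little to the total number of dissatisfied passengers.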

### 5.2 Service quality analysis

We:
- Compare distributions of service ratings for satisfied vs dissatisfied passengers.
- Identify which service dimensions have the largest difference between groups.
- Highlight 3–5 service areas that appear most critical (e.g. online boarding, inflight wifi, seat comfort, on-board service).
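
The gap between group averages could be computed along these lines. The ratings are made up, and the exact column names (`Online boarding`, `Seat comfort`, `Inflight wifi service`, `satisfaction`) are assumptions about how the dataset labels them.

```python
import pandas as pd

# Made-up ratings on a 1-5 scale, for illustration only.
df = pd.DataFrame({
    "Online boarding": [5, 4, 2, 1],
    "Seat comfort": [5, 4, 3, 2],
    "Inflight wifi service": [4, 4, 3, 3],
    "satisfaction": [
        "satisfied",
        "satisfied",
        "neutral or dissatisfied",
        "neutral or dissatisfied",
    ],
})

service_cols = ["Online boarding", "Seat comfort", "Inflight wifi service"]

# Mean rating per service dimension within each satisfaction group.
group_means = df.groupby("satisfaction")[service_cols].mean()

# Gap = satisfied mean minus dissatisfied mean, largest gap first.
gap = (
    group_means.loc["satisfied"] - group_means.loc["neutral or dissatisfied"]
).sort_values(ascending=False)

# Shortlist of the dimensions with the largest gaps.
top_gaps = gap.head(3)
print(top_gaps)
```

A large gap indicates a dimension that separates the two groups well; on its own it does not show that improving the rating would change satisfaction.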

### 5.3 Delay impact analysis

We:
- Examine the relationship between departure/arrival delays and satisfaction.
- Use delay buckets to see whether, and by how much, satisfaction drops as delays increase.
- Comment on which delay thresholds are most harmful to experience.
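
A minimal sketch of the bucketed delay analysis, assuming the bucket edges from section 3 and a `satisfaction` column with the two labels shown. The rows are made up, so the resulting numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Departure Delay in Minutes": [0, 0, 5, 12, 20, 30, 45, 60, 90, 120],
    "satisfaction": ["satisfied"] * 5 + ["neutral or dissatisfied"] * 5,
})
df["is_satisfied"] = (df["satisfaction"] == "satisfied").astype(int)

# Ordered delay buckets: (-1, 0], (0, 15], (15, 60], (60, inf].
df["dep_delay_bucket"] = pd.cut(
    df["Departure Delay in Minutes"],
    bins=[-1, 0, 15, 60, np.inf],
    labels=["0", "1-15", "16-60", ">60"],
)

# Satisfaction rate per bucket; observed=False keeps empty buckets visible.
rate_by_bucket = df.groupby("dep_delay_bucket", observed=False)["is_satisfied"].mean()

# Change in rate from each bucket to the next; the most negative step
# marks the threshold where satisfaction falls the most.
step_change = rate_by_bucket.diff()
worst_step = step_change.idxmin()

print(rate_by_bucket)
print(f"Largest drop occurs when moving into bucket: {worst_step}")
```

The same pattern applies to `Arrival Delay in Minutes`. Comparing the two curves may indicate whether passengers react more to leaving late or to arriving late.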

## 6. Key insights for stakeholders

Here we summarise the most important findings in business language rather than statistical terms.

Examples of statements to capture:
- Which segments (e.g. economy business travellers, disloyal customers) are at greatest risk of dissatisfaction.
- Which service dimensions are associated with the largest differences in satisfaction.
- How strongly delays correlate with dissatisfaction.

For each insight, we briefly outline:
- Potential impact (e.g. churn risk, NPS impact, operational escalations).
- Which stakeholder(s) should care (e.g. operations, digital product, customer experience).

## 7. Implications for product & analytics
_(TPO perspective)_

In this final section of the EDA notebook, we connect the insights to product and analytics work.

We outline:
- Candidate dashboard views / KPIs that the ACoE should maintain (e.g., satisfaction by class and travel type, delay vs satisfaction trends).
- Data quality improvements that may be needed (e.g., better capture of delay reasons).
- Open questions for further analysis (to feed into the backlog).

We do not yet define detailed user stories here; instead, we prepare input for the more structured backlog section in the modelling notebook.