# 📊 Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an approach to data analysis that aims to investigate and understand the characteristics of a dataset without making strong assumptions or applying formal statistical methods.

EDA helps data analysts <b>identify patterns, trends, anomalies, and other important information in a dataset</b>.

# **🤖 EDA Steps**

1. Understand the structure of the data, then check for and handle missing values or anomalies.
2. Compute descriptive statistics to get an initial picture of the data.
3. Use visualizations such as charts and plots to understand distributions and relationships between variables.
4. Clean the data by removing or filling invalid or missing values.
5. Analyze correlations between numeric variables.
6. Analyze categorical variables by computing their frequencies and distributions.
7. Where possible, segment the data by a given category and analyze each subset.
8. Identify trends, seasonal patterns, or other patterns in the data.
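
The steps above can be sketched on a tiny, made-up DataFrame. The column names mirror the airline dataset used later in this notebook, but the values are invented for illustration, and filling with the median is only one possible choice, not a recommendation.

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only
toy = pd.DataFrame({
    'Class': ['Business', 'Economy', 'Economy', 'Business', 'Economy Plus', 'Economy'],
    'Age': [48, 35, 41, 50, 28, 33],
    'Departure Delay': [2, 26, 0, 0, 2, 10],
    'Arrival Delay': [5.0, 39.0, None, 0.0, 3.0, 12.0],
})

# Steps 1 and 4: count missing values, then fill them (median is an assumption)
missing_per_column = toy.isna().sum()
toy_clean = toy.fillna({'Arrival Delay': toy['Arrival Delay'].median()})

# Step 2: descriptive statistics
summary = toy_clean.describe()

# Step 5: correlation between numeric variables
correlation = toy_clean[['Age', 'Departure Delay', 'Arrival Delay']].corr()

# Step 6: frequency of a categorical variable
class_counts = toy_clean['Class'].value_counts()

# Step 7: segment by category and analyze each subset
delay_by_class = toy_clean.groupby('Class')['Arrival Delay'].mean()

print(missing_per_column, summary, correlation, class_counts, delay_by_class, sep='\n\n')
```

Steps 3 and 8 (visualization and trend detection) are left out here because they depend on the plotting library and on whether the data has a time dimension.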

# **🗿 DQ-Challenge!**

<img src="https://raw.githubusercontent.com/bachtiyarmawork/DQLab-Project/main/EDA%20-%20DQChallenge.png">

# **✈️ Dataset**

The **Airline Passenger Satisfaction** dataset, or customer satisfaction scores, records more than 100,000 airline passengers. It includes additional information about each passenger, their flight, and their type of travel, as well as their ratings of factors such as cleanliness, comfort, service, and overall experience.

Columns in the dataset:
* ID : Unique ID for each record
* Gender : Passenger gender
* Age : Passenger age
* Customer Type : Type of customer
* Type of Travel : Type of travel
* Class : Flight class
* Flight Distance : Flight distance
* Departure Delay : Departure delay
* Arrival Delay : Arrival delay
* Departure and Arrival Time Convenience : Convenience of the departure and arrival times
* Ease of Online Booking : Ease of booking tickets online
* Check-in Service : Quality of the check-in service
* Online Boarding : Experience of the online boarding process
* Gate Location : Rating of the departure gate location
* On-board Service : Quality of the on-board service
* Seat Comfort : Level of seat comfort
* Leg Room Service : Quality of the leg room service
* Cleanliness : Rating of aircraft cleanliness
* Food and Drink : Quality of food and drink
* In-flight Service : Quality of the in-flight service
* In-flight Wifi Service : Quality of the in-flight WiFi service
* In-flight Entertainment : Quality of the in-flight entertainment
* Baggage Handling : Quality of baggage handling
* Satisfaction : Overall passenger satisfaction rating

<br>

Data source: https://raw.githubusercontent.com/bachtiyarmawork/DQLab-Project/main/airline_passenger_satisfaction.csv

## **⛏️ Extract Data**

The data is extracted from the source with the pandas library, using the <a href="https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html">read_csv</a> function and passing the URL listed above (the data source) as *filepath_or_buffer*.

In [1]:
# Ignore warning messages
import warnings
warnings.filterwarnings("ignore")

In [2]:
# Import the required libraries
import pandas as pd
pd.set_option('display.max_columns', None)

# Extract the data
url = 'https://raw.githubusercontent.com/bachtiyarmawork/DQLab-Project/main/airline_passenger_satisfaction.csv'
data_airline = pd.read_csv(
    url,
    sep = ',',
    dtype = {
        'ID' : 'category'
    }
)

# Display the data
display(data_airline)

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,3,4,3,3,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,2,3,5,2,5,4,5,5,3,5,2,5,5,Satisfied
2,3,Male,41,Returning,Business,Business,853,0,0.0,4,4,4,5,4,3,5,3,5,5,3,4,3,3,Satisfied
3,4,Male,50,Returning,Business,Business,1905,0,0.0,2,2,3,4,2,5,5,5,4,4,5,2,5,5,Satisfied
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,3,3,5,3,3,4,4,5,4,3,3,3,3,Satisfied
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129875,129876,Male,28,Returning,Personal,Economy Plus,447,2,3.0,4,4,4,4,2,5,1,4,4,4,5,4,4,4,Neutral or Dissatisfied
129876,129877,Male,41,Returning,Personal,Economy Plus,308,0,0.0,5,3,5,3,4,5,2,5,2,2,4,3,2,5,Neutral or Dissatisfied
129877,129878,Male,42,Returning,Personal,Economy Plus,337,6,14.0,5,2,4,2,1,3,3,4,3,3,4,2,3,5,Neutral or Dissatisfied
129878,129879,Male,50,Returning,Personal,Economy Plus,337,31,22.0,4,4,3,4,1,4,4,5,3,3,4,5,3,5,Satisfied


## **🔍 Understanding Data Types**

Checking the data type of each column matters because it confirms that the actual types match the expected ones, so that the analysis applied suits the kind of data at hand. Use the following syntax to get general information about the data:
<p align="center"><code>DataFrame.info()</code></p>

docs : https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html

In [3]:
data_airline.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129880 entries, 0 to 129879
Data columns (total 24 columns):
 #   Column                                  Non-Null Count   Dtype   
---  ------                                  --------------   -----   
 0   ID                                      129880 non-null  category
 1   Gender                                  129880 non-null  object  
 2   Age                                     129880 non-null  int64   
 3   Customer Type                           129880 non-null  object  
 4   Type of Travel                          129880 non-null  object  
 5   Class                                   129880 non-null  object  
 6   Flight Distance                         129880 non-null  int64   
 7   Departure Delay                         129880 non-null  int64   
 8   Arrival Delay                           129487 non-null  float64 
 9   Departure and Arrival Time Convenience  129880 non-null  int64   
 10  Ease of Online Booking          

#### 👨‍💻 Task 1

Based on the general information obtained above, write a conclusion or a simple analysis of the results!

Conclusion:

The airline passenger satisfaction dataset holds information about the satisfaction of airline passengers. According to `info()`, it has 129,880 rows and 24 columns, and among the columns shown `Arrival Delay` has 129,487 non-null values, so 393 values are missing. The data covers variables such as gender, age, customer type, type of travel, flight class, flight distance, departure and arrival delays, convenience of departure and arrival times, ease of online booking, check-in service, online boarding, gate location, on-board service quality, seat comfort, leg room service, cleanliness, food and drink, in-flight services such as wifi and entertainment, baggage handling, and the passenger's satisfaction level.

This data can be used to analyze the factors that affect passenger satisfaction, to evaluate the quality of the services the airline offers, and to identify areas where improvements could raise the passenger experience.
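
As a rough sketch of how the `info()` findings can be checked programmatically, the example below works on a hypothetical miniature stand-in for `data_airline` (the values are invented). It derives missing counts from the non-null counts and casts text columns to `category`; which columns to cast is an assumption made by hand here.

```python
import pandas as pd

# Hypothetical miniature stand-in for data_airline; values are invented
sample = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Class': ['Business', 'Economy', 'Economy', 'Business'],
    'Age': [48, 35, 41, 50],
    'Arrival Delay': [5.0, 39.0, None, 0.0],
})

# info() reports non-null counts; subtracting them from the row count
# gives the number of missing values per column
missing = len(sample) - sample.count()

# Cast low-cardinality text columns to the category dtype
categorical_columns = ['Gender', 'Class']  # chosen by hand for this sketch
converted = sample.astype({col: 'category' for col in categorical_columns})

print(missing)
print(converted.dtypes)
```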

---

#### 👨‍💻 Next Task : **IT'S YOUR TIME TO SHINE**

<img src="https://media.makeameme.org/created/its-your-time-xadirv.jpg">



Using the Airline Satisfaction data above, carry out an EDA, formulate at least 5 to 10 business questions, and provide actionable insights or business recommendations!

In [5]:
data_airline.head()

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,3,4,3,3,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,2,3,5,2,5,4,5,5,3,5,2,5,5,Satisfied
2,3,Male,41,Returning,Business,Business,853,0,0.0,4,4,4,5,4,3,5,3,5,5,3,4,3,3,Satisfied
3,4,Male,50,Returning,Business,Business,1905,0,0.0,2,2,3,4,2,5,5,5,4,4,5,2,5,5,Satisfied
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,3,3,5,3,3,4,4,5,4,3,3,3,3,Satisfied


## How does flight class affect customer satisfaction?

In [7]:
# Effect of flight class on passenger satisfaction
# Flight class: Class
# Customer satisfaction: Satisfaction

# Count passengers per class and satisfaction level
data_airline.groupby(['Class', 'Satisfaction'], as_index = False).agg(jumlah = ('ID', 'count'))

Unnamed: 0,Class,Satisfaction,jumlah
0,Business,Neutral or Dissatisfied,18994
1,Business,Satisfied,43166
2,Economy,Neutral or Dissatisfied,47366
3,Economy,Satisfied,10943
4,Economy Plus,Neutral or Dissatisfied,7092
5,Economy Plus,Satisfied,2319


In [20]:
satisfaction_by_class = data_airline.groupby(['Class', 'Satisfaction'], as_index=False).agg(Total=('Satisfaction', 'count'))
class_totals = satisfaction_by_class.groupby('Class')['Total'].sum()
satisfaction_by_class['Percentage'] = (satisfaction_by_class['Total'] / satisfaction_by_class['Class'].map(class_totals) * 100).round(2)

pvt_satisfaction_by_class = satisfaction_by_class.pivot_table(
    values = 'Percentage',
    index = 'Class',
    columns = ['Satisfaction']
)

pvt_satisfaction_by_class = pvt_satisfaction_by_class[['Satisfied', 'Neutral or Dissatisfied']]
pvt_satisfaction_by_class = pvt_satisfaction_by_class.sort_values(by = ['Class'], ascending = False)

display(pvt_satisfaction_by_class)

Satisfaction,Satisfied,Neutral or Dissatisfied
Class,Unnamed: 1_level_1,Unnamed: 2_level_1
Economy Plus,24.64,75.36
Economy,18.77,81.23
Business,69.44,30.56
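
The same percentages can be reached more directly. The sketch below rebuilds the counts from the earlier groupby output as a small table (so it runs without the full dataset) and divides each row by its class total.

```python
import pandas as pd

# Counts copied from the Class x Satisfaction groupby output shown earlier
counts = pd.DataFrame({
    'Class': ['Business', 'Business', 'Economy', 'Economy', 'Economy Plus', 'Economy Plus'],
    'Satisfaction': ['Neutral or Dissatisfied', 'Satisfied'] * 3,
    'jumlah': [18994, 43166, 47366, 10943, 7092, 2319],
})

# One row per class, one column per satisfaction level
wide = counts.pivot(index='Class', columns='Satisfaction', values='jumlah')

# Divide each row by its total to get percentages within a class
share = wide.div(wide.sum(axis=1), axis=0).mul(100).round(2)
print(share)
```

With the full DataFrame loaded, `pd.crosstab(data_airline['Class'], data_airline['Satisfaction'], normalize='index')` should give the same shares as fractions in a single call.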


In [21]:
import plotly.graph_objects as go

top_labels = pvt_satisfaction_by_class.columns.tolist()
x_data = pvt_satisfaction_by_class.values.tolist()
y_data = pvt_satisfaction_by_class.index.tolist()
colors = ['#9ADE7B', '#FF6D60']

fig = go.Figure()

for i in range(0, len(x_data[0])):
    for xd, yd in zip(x_data, y_data):
        fig.add_trace(go.Bar(
            x=[xd[i]], y=[yd],
            orientation='h',
            marker=dict(
                color=colors[i],
                line=dict(color='rgb(248, 248, 249)', width=1)
            )
        ))

fig.update_layout(
    height = 500,
    width = 900,
    title = dict(
        text = '''
        <b><span style="color:#D9DD6B"><i>Satisfaction Score</i></span> by Passenger Class</b><br><sup><sup>Economy and Economy Plus Need to \'Shape Up\' Soon</sup></sup>
        ''',
        font = dict(
            family = 'sans serif',
            size = 30,
            color = '#00917C'
        ),
        y = 0.9,
        x = 0.5 #0.025
    ),
    xaxis=dict(
        showgrid=False,
        showline=False,
        showticklabels=False,
        zeroline=False,
        domain=[0.02, 1]
    ),
    yaxis=dict(
        showgrid=False,
        showline=False,
        showticklabels=False,
        zeroline=False,
    ),
    barmode='stack',
    paper_bgcolor='rgb(248, 248, 255)',
    plot_bgcolor='rgb(248, 248, 255)',
    margin=dict(l=120, r=10, t=140, b=40),
    showlegend=False,
)

# Set the hover text
fig.update_traces(
    hovertemplate='<b>%{label} Class</b><br>%{value}%'
)

annotations = []

for yd, xd in zip(y_data, x_data):
    # labeling the y-axis
    annotations.append(dict(xref='paper', yref='y',
                            x=0, y=yd,
                            xanchor='right',
                            text=str(yd),
                            font=dict(family='Arial', size=14,
                                      color='rgb(67, 67, 67)'),
                            showarrow=False, align='right'))
    # labeling the first percentage of each bar (x_axis)
    annotations.append(dict(xref='x', yref='y',
                            x=xd[0] / 2, y=yd,
                            text=str(xd[0]) + '%',
                            font=dict(family='Arial', size=14,
                                      color='rgb(248, 248, 255)'),
                            showarrow=False))
    # labeling the first Likert scale (on the top)
    if yd == y_data[-1]:
        annotations.append(dict(xref='x', yref='paper',
                                x=xd[0] / 2, y=1.05,
                                text=top_labels[0],
                                font=dict(family='Arial', size=14,
                                          color='rgb(67, 67, 67)'),
                                showarrow=False))
    space = xd[0]
    for i in range(1, len(xd)):
            # labeling the rest of percentages for each bar (x_axis)
            annotations.append(dict(xref='x', yref='y',
                                    x=space + (xd[i]/2), y=yd,
                                    text=str(xd[i]) + '%',
                                    font=dict(family='Arial', size=14,
                                              color='rgb(248, 248, 255)'),
                                    showarrow=False))
            # labeling the Likert scale
            if yd == y_data[-1]:
                annotations.append(dict(xref='x', yref='paper',
                                        x=space + (xd[i]/2), y=1.05,
                                        text=top_labels[i],
                                        font=dict(family='Arial', size=14,
                                                  color='rgb(67, 67, 67)'),
                                        showarrow=False))
            space += xd[i]

fig.update_layout(annotations=annotations)

fig.show()

Insight:
- Most Economy (81.23%) and Economy Plus (75.36%) passengers are neutral or dissatisfied
- Most Business class passengers (69.44%) are satisfied

Actionable Insight:
- Service should be optimal in every class
- What sets the classes apart should be the facilities, which naturally differ from class to class



---



## Which factors most affect customer satisfaction?

In [8]:
data_airline.head()

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,3,4,3,3,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,2,3,5,2,5,4,5,5,3,5,2,5,5,Satisfied
2,3,Male,41,Returning,Business,Business,853,0,0.0,4,4,4,5,4,3,5,3,5,5,3,4,3,3,Satisfied
3,4,Male,50,Returning,Business,Business,1905,0,0.0,2,2,3,4,2,5,5,5,4,4,5,2,5,5,Satisfied
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,3,3,5,3,3,4,4,5,4,3,3,3,3,Satisfied


In [17]:
# Compute the mean of every numeric column
data_airline.describe().loc['mean']

Unnamed: 0,mean
Age,39.427957
Flight Distance,1190.316392
Departure Delay,14.713713
Arrival Delay,15.091129
Departure and Arrival Time Convenience,3.057599
Ease of Online Booking,2.756876
Check-in Service,3.306267
Online Boarding,3.252633
Gate Location,2.976925
On-board Service,3.383023


In [18]:
list_columns_survey = [
    'Departure and Arrival Time Convenience',
    'Ease of Online Booking',
    'Check-in Service',
    'Online Boarding',
    'Gate Location',
    'On-board Service',
    'Seat Comfort',
    'Leg Room Service',
    'Cleanliness',
    'Food and Drink',
    'In-flight Service',
    'In-flight Wifi Service',
    'In-flight Entertainment',
    'Baggage Handling'
]

mean_values = data_airline[list_columns_survey].mean(axis=0)

# Create a new DataFrame to store the mean values
mean_df = pd.DataFrame(
    mean_values,
    columns=['Mean']
)

mean_df = mean_df.reset_index(names = 'Survey Type')
mean_df['Mean'] = mean_df['Mean'].round(2)
mean_df = mean_df.sort_values(by = ['Mean'], ignore_index = True)

mean_df['Mean Star'] = '★ ' + mean_df['Mean'].astype(str)
# Print the mean DataFrame
display(mean_df)

Unnamed: 0,Survey Type,Mean,Mean Star
0,In-flight Wifi Service,2.73,★ 2.73
1,Ease of Online Booking,2.76,★ 2.76
2,Gate Location,2.98,★ 2.98
3,Departure and Arrival Time Convenience,3.06,★ 3.06
4,Food and Drink,3.2,★ 3.2
5,Online Boarding,3.25,★ 3.25
6,Cleanliness,3.29,★ 3.29
7,Check-in Service,3.31,★ 3.31
8,Leg Room Service,3.35,★ 3.35
9,In-flight Entertainment,3.36,★ 3.36


In [19]:
import plotly.express as px

mean_df['Color'] = 'Other'
mean_df.loc[:2, 'Color'] = 'Bottom'

fig = px.bar(
    mean_df,
    x = 'Mean',
    y = 'Survey Type',
    orientation = 'h',
    color = 'Color',
    color_discrete_map = {
        'Bottom': '#FF6D60',
        'Other': '#9ADE7B'
    },
    text_auto = True
)

fig.update_layout(
    width = 1200,
    height = 600,
    title = '<b>Average Airline Service Ratings</b>',
    xaxis_title = '',
    yaxis_title = '',
    showlegend = False,
    paper_bgcolor = 'rgba(255, 255, 255, 1)',
    plot_bgcolor = 'rgba(255, 255, 255, 0)',
)

fig.update_traces(
    texttemplate = '%{value}/5',
    textposition = 'outside',
    hovertemplate = '<b>%{label}</b><br>★%{value} / 5.00'
)

fig.show()

- Top 3: In-flight Service, Baggage Handling, Seat Comfort
- Needs improvement: In-flight Wifi Service (lowest mean rating, 2.73 out of 5)
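
Mean ratings show which services score low, but not which ones move together with overall satisfaction. One possible way to look at that, sketched below on hypothetical toy ratings (values are invented), is to encode `Satisfaction` as 0/1 and correlate each rating column with it. Correlation here only indicates association, not cause.

```python
import pandas as pd

# Hypothetical toy ratings; values are invented for illustration only
toy = pd.DataFrame({
    'Online Boarding':        [5, 4, 2, 1, 5, 2, 4, 1],
    'In-flight Wifi Service': [4, 5, 2, 2, 5, 1, 3, 2],
    'Gate Location':          [3, 2, 3, 4, 2, 3, 4, 3],
    'Satisfaction': ['Satisfied', 'Satisfied', 'Neutral or Dissatisfied',
                     'Neutral or Dissatisfied', 'Satisfied', 'Neutral or Dissatisfied',
                     'Satisfied', 'Neutral or Dissatisfied'],
})

# Encode the target as 1 (Satisfied) or 0 (Neutral or Dissatisfied)
is_satisfied = (toy['Satisfaction'] == 'Satisfied').astype(int)

# Correlate every rating column with the encoded target, strongest first
rating_columns = ['Online Boarding', 'In-flight Wifi Service', 'Gate Location']
correlation_with_target = (
    toy[rating_columns]
    .corrwith(is_satisfied)
    .sort_values(ascending=False)
)
print(correlation_with_target)
```

On the real data, `rating_columns` could be replaced by `list_columns_survey` from the cell above.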

---

## Flight class choice by age

In [31]:
average_age_per_class = data_airline.groupby('Class', as_index=False).agg(
    average_Age = ('Age', 'mean'),
    median_age = ('Age', 'median')
).round(0)
average_age_per_class

Unnamed: 0,Class,average_Age,median_age
0,Business,42.0,42.0
1,Economy,37.0,36.0
2,Economy Plus,39.0,38.0


In [33]:
import plotly.express as px

fig = px.bar(
    average_age_per_class,
    x = 'Class',
    y = 'average_Age'
)

fig.show()
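
Average age per class hides how class choice varies across age ranges. A minimal sketch of an alternative view, using hypothetical toy passengers and arbitrary age cut points (both are assumptions), bins the ages and computes the share of each class within every age group.

```python
import pandas as pd

# Hypothetical toy passengers; values are invented for illustration only
toy = pd.DataFrame({
    'Age': [22, 27, 34, 38, 45, 52, 58, 63, 19, 41],
    'Class': ['Economy', 'Economy', 'Economy Plus', 'Business', 'Business',
              'Business', 'Business', 'Economy', 'Economy', 'Business'],
})

# Bin ages into groups; the cut points are an arbitrary choice for this sketch
toy['Age Group'] = pd.cut(
    toy['Age'],
    bins=[0, 30, 45, 60, 120],
    labels=['<=30', '31-45', '46-60', '>60'],
)

# Share of each class within every age group (each row sums to 100)
class_share_by_age = (
    pd.crosstab(toy['Age Group'], toy['Class'], normalize='index')
    .mul(100)
    .round(1)
)
print(class_share_by_age)
```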


---

## Which gender gives poorer ratings for delays?

In [35]:
data_airline.head(2)

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,3,4,3,3,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,2,3,5,2,5,4,5,5,3,5,2,5,5,Satisfied


In [34]:
data_airline.groupby(['Gender']).agg(jumlah = ('ID', 'count'))

Unnamed: 0_level_0,jumlah
Gender,Unnamed: 1_level_1
Female,65899
Male,63981


In [36]:
data_airline.groupby(['Gender', 'Departure and Arrival Time Convenience'], as_index=False).agg(
    departure_satisfied=('Departure and Arrival Time Convenience', 'count')
)

Unnamed: 0,Gender,Departure and Arrival Time Convenience,departure_satisfied
0,Female,0,3470
1,Female,1,9858
2,Female,2,11050
3,Female,3,11427
4,Female,4,16108
5,Female,5,13986
6,Male,0,3211
7,Male,1,9551
8,Male,2,10484
9,Male,3,10951


Hypothesis: female passengers are more concerned about departure and arrival times.

Test the hypothesis statistically.

First check whether the numbers of female and male passengers are actually balanced, or whether there are in fact more male passengers.


In [37]:
data_airline.groupby(['Gender']).agg(jumlah = ('ID', 'count'))

Unnamed: 0_level_0,jumlah
Gender,Unnamed: 1_level_1
Female,65899
Male,63981


In [38]:
import pandas as pd
from scipy.stats import chi2_contingency

# Data: gender and satisfaction (illustrative frequencies, not taken from data_airline)
data = {
    'Gender': ['Male', 'Male', 'Female', 'Female'],
    'Satisfaction': ['Satisfied', 'Dissatisfied', 'Satisfied', 'Dissatisfied'],
    'Frequency': [30, 20, 25, 5]
}

# Build the DataFrame
df = pd.DataFrame(data)

# Build the contingency table
tabel_kontingensi = df.pivot(index='Gender', columns='Satisfaction', values='Frequency').fillna(0)

# Show the contingency table
print("Contingency Table:")
print(tabel_kontingensi)

# Run the Chi-Square test
chi2, p, dof, expected = chi2_contingency(tabel_kontingensi)

# Show the results
print(f"\nChi-Square value: {chi2:.4f}")
print(f"p-value: {p:.4f}")
print(f"Degrees of freedom: {dof}")
print("Expected frequencies:")
print(expected)

# Decide based on the p-value
alpha = 0.05
if p < alpha:
    print("\nReject H0: there is a significant association between gender and satisfaction.")
else:
    print("\nFail to reject H0: there is no significant association between gender and satisfaction.")

Contingency Table:
Satisfaction  Dissatisfied  Satisfied
Gender                               
Female                   5         25
Male                    20         30

Chi-Square value: 3.7275
p-value: 0.0535
Degrees of freedom: 1
Expected frequencies:
[[ 9.375 20.625]
 [15.625 34.375]]

Fail to reject H0: there is no significant association between gender and satisfaction.
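
The test above runs on hand-typed frequencies. To test the hypothesis on row-level data, the contingency table can be built with `pd.crosstab`. The sketch below uses hypothetical toy responses (values are invented), and the split of ratings into 'Low' (1 to 2) and 'High' (3 to 5) is an assumed threshold. With so few rows the chi-square approximation is unreliable, so the numbers only illustrate the mechanics.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy responses; values are invented for illustration only
toy = pd.DataFrame({
    'Gender': ['Female'] * 6 + ['Male'] * 6,
    'Departure and Arrival Time Convenience': [1, 2, 2, 4, 5, 5, 1, 1, 2, 2, 4, 5],
})

# Label ratings of 1-2 as 'Low' and 3-5 as 'High' (an assumed threshold)
toy['Rating Level'] = toy['Departure and Arrival Time Convenience'].apply(
    lambda score: 'Low' if score <= 2 else 'High'
)

# Build the contingency table directly from the rows
contingency = pd.crosstab(toy['Gender'], toy['Rating Level'])

# chi2_contingency applies Yates' continuity correction to 2x2 tables by default
chi2, p_value, dof, expected = chi2_contingency(contingency)

print(contingency)
print(f"chi2={chi2:.4f}, p={p_value:.4f}, dof={dof}")
```

On the real data, the same two calls could be applied to `data_airline['Gender']` and the rating column, either binned as here or with all rating levels kept as separate columns.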


---

## Proportion of Customer Type in each flight class

In [39]:
data_airline['Customer Type'].unique()

array(['First-time', 'Returning'], dtype=object)

In [40]:
data_airline['Customer Type'].value_counts()

Unnamed: 0_level_0,count
Customer Type,Unnamed: 1_level_1
Returning,106100
First-time,23780


In [41]:
data_airline.groupby(['Class', 'Customer Type']).agg(jumlah = ('ID', 'count'))

Unnamed: 0_level_0,Unnamed: 1_level_0,jumlah
Class,Customer Type,Unnamed: 2_level_1
Business,First-time,9231
Business,Returning,52929
Economy,First-time,13634
Economy,Returning,44675
Economy Plus,First-time,915
Economy Plus,Returning,8496


Insight:
- Returning customers dominate every class, so marketing should be improved to attract more first-time customers
- Maintain service quality so that returning customers keep using our service
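
Because the heading asks for proportions while the table shows raw counts, a short sketch of the conversion may help. It reuses the counts from the groupby output above and divides each count by its class total with `groupby(...).transform('sum')`; the `share_pct` column name is made up for this example.

```python
import pandas as pd

# Counts copied from the Class x Customer Type groupby output above
counts = pd.DataFrame({
    'Class': ['Business', 'Business', 'Economy', 'Economy', 'Economy Plus', 'Economy Plus'],
    'Customer Type': ['First-time', 'Returning'] * 3,
    'jumlah': [9231, 52929, 13634, 44675, 915, 8496],
})

# Divide each count by its class total to get the within-class share
class_total = counts.groupby('Class')['jumlah'].transform('sum')
counts['share_pct'] = (counts['jumlah'] / class_total * 100).round(2)
print(counts)
```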

---

## How are passenger ages distributed?

In [43]:
import plotly.express as px

fig = px.histogram(
    data_airline,
    x = 'Age',
    color_discrete_sequence  = ['#3498db'],
    nbins = 10
)

fig.update_traces(
      marker_line_width = 1,
      marker_line_color = 'white'
)

fig.update_layout(
    plot_bgcolor = 'rgba(0, 0, 0, 0)',
    title = dict(
        text = "<b>Customer <span style='color:#3498db'>Age</span> Distribution</b><br><sup><sup>Airline</sup></sup>",
        font = dict(
            size = 28,
            color = '#757882'
        ),
        y = 0.92,
        x = 0.5
    ),
    yaxis = dict(
        title = '',
        showgrid = False,
        showline = False,
        showticklabels = False,
        zeroline = False,
    ),
    margin = dict(
        t = 80,
        b = 10,
        r = 20
    )
)

fig.show()