<a id="Prep_Air's_Flow_Card"></a>
### Prep Air's Flow Card
At Preppin' Data we use a number of (mock) companies to look at the challenges they have with their data. For January, we're going to focus on our own airline, Prep Air. The airline has introduced a new loyalty card called the Flow Card. We need to clean up a number of data sets to determine how well the card is doing. 

The first task sets some context for later weeks by establishing how popular the Flow Card is. Our stakeholder would like two data sets about our passengers: one for card users and one for those who don't use the card.
![image.png](attachment:bed16600-b12b-4abc-8f86-bad5192eca49.png)

In [330]:
import pandas as pd

In [331]:
df = pd.read_csv('PD 2024 Wk 1 Input.csv')
df.head()

Unnamed: 0,Flight Details,Flow Card?,Bags Checked,Meal Type
0,2024-07-22//PA010//Tokyo-New York//Economy//2380,1,0,Egg Free
1,2024-09-28//PA008//Perth-New York//Economy//1855,0,2,Vegetarian
2,2024-04-20//PA002//New York-London//Economy//3490,1,1,Vegan
3,2024-01-23//PA010//Tokyo-New York//Premium Eco...,1,1,Vegetarian
4,2024-10-01//PA008//Perth-New York//Business Cl...,0,0,Vegetarian


In [332]:
df['Flight Details'][0]

'2024-07-22//PA010//Tokyo-New York//Economy//2380'

### Solution: Split the Flight Details Field

To prepare the data for analysis, the `Flight Details` field is split in three steps:

1. **Define the Column Structure:**  
   A list of column names gives the split values a fixed order:  
   `['Date', 'Flight Number', 'From-To', 'Class', 'Price']`.

2. **Parse Flight Details:**  
   Each `Flight Details` string is split on the delimiter `//`. The result is a list whose values line up, position by position, with the column names above.  

   Example:  
   - **Input String:** `"2024-07-22//PA010//Tokyo-New York//Economy//2380"`  
   - **Parsed Output:** `['2024-07-22', 'PA010', 'Tokyo-New York', 'Economy', '2380']`

3. **Extract Origin and Destination:**  
   The `From-To` value (index 2 of the parsed list) is split again on `-`, so the flight's origin (`From`) and destination (`To`) become two separate columns. This relies on no city name containing a hyphen.  

   Example:  
   - **Input:** `"Tokyo-New York"`  
   - **Output:**  
     - `From: "Tokyo"`  
     - `To: "New York"`
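
The three steps above can also be sketched with pandas' vectorized string methods. This is a minimal sketch, not the notebook's own approach: `sample` is a hypothetical two-row stand-in built from rows shown earlier, and `parts` is an illustrative name. It assumes, as the data here suggests, that the origin city never contains a hyphen.

```python
import pandas as pd

# Hypothetical stand-in: two 'Flight Details' values copied from the input above.
sample = pd.DataFrame({
    "Flight Details": [
        "2024-07-22//PA010//Tokyo-New York//Economy//2380",
        "2024-09-28//PA008//Perth-New York//Economy//1855",
    ]
})

# Step 1: column names, in the order the values appear in the string.
new_cols = ["Date", "Flight Number", "From-To", "Class", "Price"]

# Step 2: split on '//' and spread the pieces into one column each.
parts = sample["Flight Details"].str.split("//", expand=True)
parts.columns = new_cols

# Step 3: split the route on the first '-' only, then drop the combined field.
parts[["From", "To"]] = parts["From-To"].str.split("-", n=1, expand=True)
parts = parts.drop(columns="From-To")

print(parts)
```

Splitting with `n=1` keeps a hyphenated destination intact, although a hyphenated origin would still be split incorrectly.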

In [334]:
new_cols = ['Date', 'Flight Number', 'From-To', 'Class', 'Price']
# Split each 'Flight Details' string once; every row becomes a list of five values
parts = df['Flight Details'].str.split('//')
for col_index, col in enumerate(new_cols):
    if col == 'From-To':
        # The route is split again on '-' into origin and destination columns
        route = parts.str[col_index].str.split('-')
        df['From'] = route.str[0]
        df['To'] = route.str[1]
    else:
        df[col] = parts.str[col_index]
        

In [335]:
df.head()

Unnamed: 0,Flight Details,Flow Card?,Bags Checked,Meal Type,Date,Flight Number,From,To,Class,Price
0,2024-07-22//PA010//Tokyo-New York//Economy//2380,1,0,Egg Free,2024-07-22,PA010,Tokyo,New York,Economy,2380.0
1,2024-09-28//PA008//Perth-New York//Economy//1855,0,2,Vegetarian,2024-09-28,PA008,Perth,New York,Economy,1855.0
2,2024-04-20//PA002//New York-London//Economy//3490,1,1,Vegan,2024-04-20,PA002,New York,London,Economy,3490.0
3,2024-01-23//PA010//Tokyo-New York//Premium Eco...,1,1,Vegetarian,2024-01-23,PA010,Tokyo,New York,Premium Economy,825.0
4,2024-10-01//PA008//Perth-New York//Business Cl...,0,0,Vegetarian,2024-10-01,PA008,Perth,New York,Business Class,634.8


In [336]:
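# Reorder: the newly parsed columns first, then the original fields.
# Column 0 ('Flight Details') is dropped because it has been fully split out.
df = df[df.columns[4:].tolist() + df.columns[1:4].tolist()]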

In [337]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3778 entries, 0 to 3777
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Date           3778 non-null   object
 1   Flight Number  3778 non-null   object
 2   From           3778 non-null   object
 3   To             3778 non-null   object
 4   Class          3778 non-null   object
 5   Price          3778 non-null   object
 6   Flow Card?     3778 non-null   int64 
 7   Bags Checked   3778 non-null   int64 
 8   Meal Type      3189 non-null   object
dtypes: int64(2), object(7)
memory usage: 265.8+ KB


In [338]:
# The split values are still strings, so convert them to proper types
df['Date'] = pd.to_datetime(df['Date'])
df['Price'] = pd.to_numeric(df['Price'])

In [339]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3778 entries, 0 to 3777
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Date           3778 non-null   datetime64[ns]
 1   Flight Number  3778 non-null   object        
 2   From           3778 non-null   object        
 3   To             3778 non-null   object        
 4   Class          3778 non-null   object        
 5   Price          3778 non-null   float64       
 6   Flow Card?     3778 non-null   int64         
 7   Bags Checked   3778 non-null   int64         
 8   Meal Type      3189 non-null   object        
dtypes: datetime64[ns](1), float64(1), int64(2), object(5)
memory usage: 265.8+ KB


In [340]:
# Map the 1/0 flag to readable labels; any unmapped value is left unchanged
flow_card_labels = {1: 'Yes', 0: 'No'}
df['Flow Card?'] = df['Flow Card?'].map(lambda x: flow_card_labels.get(x, x))

In [341]:
df['Flow Card?'].value_counts()

Flow Card?
No     1895
Yes    1883
Name: count, dtype: int64

In [342]:
# Split the passengers into the two requested data sets
flow_card = df[df['Flow Card?'] == 'Yes']
no_flow_card = df[df['Flow Card?'] == 'No']

In [343]:
flow_card.head()

Unnamed: 0,Date,Flight Number,From,To,Class,Price,Flow Card?,Bags Checked,Meal Type
0,2024-07-22,PA010,Tokyo,New York,Economy,2380.0,Yes,0,Egg Free
2,2024-04-20,PA002,New York,London,Economy,3490.0,Yes,1,Vegan
3,2024-01-23,PA010,Tokyo,New York,Premium Economy,825.0,Yes,1,Vegetarian
6,2024-06-05,PA006,Tokyo,London,First Class,618.0,Yes,3,Vegan
8,2024-03-30,PA004,Perth,London,First Class,446.0,Yes,1,Nut Free


In [344]:
no_flow_card.head()

Unnamed: 0,Date,Flight Number,From,To,Class,Price,Flow Card?,Bags Checked,Meal Type
1,2024-09-28,PA008,Perth,New York,Economy,1855.0,No,2,Vegetarian
4,2024-10-01,PA008,Perth,New York,Business Class,634.8,No,0,Vegetarian
5,2024-03-04,PA007,New York,Perth,Business Class,458.4,No,3,Nut Free
7,2024-02-25,PA010,Tokyo,New York,Premium Economy,1435.0,No,0,
13,2024-03-29,PA004,Perth,London,Economy,2730.0,No,2,Vegan


In [345]:
flow_card.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1883 entries, 0 to 3777
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Date           1883 non-null   datetime64[ns]
 1   Flight Number  1883 non-null   object        
 2   From           1883 non-null   object        
 3   To             1883 non-null   object        
 4   Class          1883 non-null   object        
 5   Price          1883 non-null   float64       
 6   Flow Card?     1883 non-null   object        
 7   Bags Checked   1883 non-null   int64         
 8   Meal Type      1594 non-null   object        
dtypes: datetime64[ns](1), float64(1), int64(1), object(6)
memory usage: 147.1+ KB


In [346]:
no_flow_card.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1895 entries, 1 to 3776
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Date           1895 non-null   datetime64[ns]
 1   Flight Number  1895 non-null   object        
 2   From           1895 non-null   object        
 3   To             1895 non-null   object        
 4   Class          1895 non-null   object        
 5   Price          1895 non-null   float64       
 6   Flow Card?     1895 non-null   object        
 7   Bags Checked   1895 non-null   int64         
 8   Meal Type      1595 non-null   object        
dtypes: datetime64[ns](1), float64(1), int64(1), object(6)
memory usage: 148.0+ KB
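
Since the stakeholder asked for two separate data sets, a natural last step would be to write each subset to its own file. The following is a minimal sketch under stated assumptions: the file names, the temporary output directory, and the small `cleaned` frame are all hypothetical stand-ins, and CSV is assumed as the output format.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the cleaned frame; the notebook would use `df`.
cleaned = pd.DataFrame({
    "Flight Number": ["PA010", "PA008", "PA002"],
    "Flow Card?": ["Yes", "No", "Yes"],
})

# Assumed output location and file names (not taken from the challenge).
out_dir = Path(tempfile.mkdtemp())
outputs = {"Yes": "flow_card.csv", "No": "no_flow_card.csv"}

for answer, file_name in outputs.items():
    # Keep only the rows whose flag matches, then write them without the index.
    subset = cleaned[cleaned["Flow Card?"] == answer]
    subset.to_csv(out_dir / file_name, index=False)
```

Passing `index=False` keeps the original row numbers out of the files, so each output contains only the passenger columns.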
