# Government Aviation - Denied Boarding

The data is taken from [Data.gov](https://catalog.data.gov/dataset/commercial-aviation-involuntary-denied-boarding) and records the number of air travel passengers boarded and denied boarding, by month.

In [33]:
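# Import packages and load the data
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Confirm the working directory contains Commercial_Aviation.csv
# (a notebook cell only echoes its last expression, so print both explicitly)
print(os.getcwd())
print(os.listdir())
raw_data = pd.read_csv('Commercial_Aviation.csv')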
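The `os.getcwd()`/`os.listdir()` calls are a manual check that the CSV is where `read_csv` expects it. A minimal sketch of making that check explicit, assuming a hypothetical helper named `load_csv_checked` (not part of the original notebook) that fails with a readable message when the file is missing:

```python
import os

import pandas as pd


def load_csv_checked(path: str) -> pd.DataFrame:
    """Read a CSV, raising a clear error if the file is not present.

    Hypothetical helper standing in for the manual os.listdir() inspection.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path!r} not found in {os.getcwd()!r}; "
            "download it from Data.gov first"
        )
    return pd.read_csv(path)
```

Usage would be `raw_data = load_csv_checked('Commercial_Aviation.csv')`.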

## Data Exploration

In [34]:
raw_data.info()
raw_data.shape

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 899 entries, 0 to 898
Data columns (total 29 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   YEAR                     899 non-null    int64  
 1   MONTH                    899 non-null    int64  
 2   QUARTER                  899 non-null    int64  
 3   MKT_CARRIER              682 non-null    object 
 4   MKT_CARRIER_AIRLINE_ID   684 non-null    float64
 5   MKT_CARRIER_NAME         684 non-null    object 
 6   MKT_UNIQUE_CARRIER       684 non-null    object 
 7   MKT_UNIQUE_CARRIER_NAME  684 non-null    object 
 8   OP_CARRIER               899 non-null    object 
 9   OP_CARRIER_AIRLINE_ID    898 non-null    float64
 10  OP_CARRIER_NAME          898 non-null    object 
 11  OP_UNIQUE_CARRIER        898 non-null    object 
 12  OP_UNIQUE_CARRIER_NAME   898 non-null    object 
 13  PAX_ALT_TRANS            899 non-null    int64  
 14  PAX_NO_ALT_TRANS         8

(899, 29)
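`info()` reports non-null counts, so the number of missing values per column has to be worked out by subtraction. A minimal sketch that counts them directly with `isna().sum()`, shown on a small made-up frame rather than the real `raw_data`:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for raw_data (made-up values, not the real dataset)
toy = pd.DataFrame({
    "YEAR": [2018, 2018, 2018],
    "MKT_CARRIER": ["UA", np.nan, "DL"],
    "OP_CARRIER_NAME": ["Air Wisconsin Airlines Corp", "Toy Carrier", np.nan],
})

# Missing values per column: the complement of info()'s "Non-Null Count"
missing = toy.isna().sum()
# Keep only the columns that actually have gaps, largest count first
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```

On `raw_data` itself, the `info()` output above implies 217 missing `MKT_CARRIER` values (899 - 682).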

## Filtering a subset of columns

In [38]:
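# Keep only the columns needed for the analysis; .copy() makes new_set an
# independent DataFrame, so later column assignments do not raise SettingWithCopyWarning
new_set = raw_data[['YEAR', 'MONTH', 'MKT_CARRIER', 'MKT_CARRIER_NAME', 'OP_CARRIER',
                    'OP_CARRIER_NAME', 'TOT_DEN_BOARDING', 'TOT_BOARDING']].copy()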
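Assigning a new column to a subset taken with `raw_data[[...]]` can trigger pandas' `SettingWithCopyWarning`, because pandas cannot always tell whether the subset is meant to be independent of the original. A minimal sketch, on a made-up two-row frame, showing that an explicit `.copy()` avoids the warning and leaves the source frame untouched:

```python
import warnings

import pandas as pd

# Made-up stand-in for raw_data
raw = pd.DataFrame({"YEAR": [2018, 2018], "MONTH": [3, 6],
                    "TOT_BOARDING": [762892, 1046874]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # .copy() makes the subset its own DataFrame, so adding a column
    # to it is unambiguous and never writes back into raw
    subset = raw[["YEAR", "MONTH"]].copy()
    subset["DAY"] = 1

print(subset.columns.tolist())
```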

In [46]:
# Combine YEAR and MONTH into a monthly Period (the day is fixed to 1 only to build a valid date)
new_set['Date'] = pd.to_datetime(new_set[['YEAR', 'MONTH']].assign(DAY=1)).dt.to_period('M')

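`pd.to_datetime` can assemble dates from a DataFrame whose columns name the date parts; upper-case names such as `YEAR` and `MONTH` are accepted, as the cell above shows. A minimal sketch on made-up rows, checking that the result is a `period[M]` column:

```python
import pandas as pd

# Made-up YEAR/MONTH rows for illustration
df = pd.DataFrame({"YEAR": [2018, 2018, 2019], "MONTH": [3, 12, 3]})

# DAY=1 is a placeholder: only the month matters once converted to a Period
dates = pd.to_datetime(df[["YEAR", "MONTH"]].assign(DAY=1))
df["Date"] = dates.dt.to_period("M")
print(df["Date"].tolist())
```

Using a monthly `Period` rather than a timestamp avoids implying a specific day that the source data does not record.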


In [53]:
del new_set['YEAR']

In [54]:
del new_set['MONTH']
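The two deletion cells can also be collapsed into a single call. A minimal sketch on a made-up one-row frame, using `DataFrame.drop` with the `columns=` keyword:

```python
import pandas as pd

df = pd.DataFrame({"YEAR": [2018], "MONTH": [3], "TOT_BOARDING": [762892]})

# drop() returns a new frame; both helper columns are removed in one step
df = df.drop(columns=["YEAR", "MONTH"])
print(df.columns.tolist())  # ['TOT_BOARDING']
```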

In [55]:
new_set.head(5)

|   | MKT_CARRIER | MKT_CARRIER_NAME | OP_CARRIER | OP_CARRIER_NAME | TOT_DEN_BOARDING | TOT_BOARDING | Date |
|---|---|---|---|---|---|---|---|
| 0 | UA | United Air Lines Inc. | ZW | Air Wisconsin Airlines Corp | 1 | 762892 | 2018-03 |
| 1 | UA | United Air Lines Inc. | ZW | Air Wisconsin Airlines Corp | 0 | 1046874 | 2018-06 |
| 2 | UA | United Air Lines Inc. | ZW | Air Wisconsin Airlines Corp | 0 | 979383 | 2018-09 |
| 3 | UA | United Air Lines Inc. | ZW | Air Wisconsin Airlines Corp | 0 | 997846 | 2018-12 |
| 4 | UA | United Air Lines Inc. | ZW | Air Wisconsin Airlines Corp | 1 | 926950 | 2019-03 |


## Checking uniqueness 

In [56]:
new_set.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 899 entries, 0 to 898
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype    
---  ------            --------------  -----    
 0   MKT_CARRIER       682 non-null    object   
 1   MKT_CARRIER_NAME  684 non-null    object   
 2   OP_CARRIER        899 non-null    object   
 3   OP_CARRIER_NAME   898 non-null    object   
 4   TOT_DEN_BOARDING  899 non-null    int64    
 5   TOT_BOARDING      899 non-null    int64    
 6   Date              899 non-null    period[M]
dtypes: int64(2), object(4), period[M](1)
memory usage: 49.3+ KB


In [57]:
new_set['MKT_CARRIER'].describe()

count     682
unique     13
top        UA
freq      158
Name: MKT_CARRIER, dtype: object
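`describe()` counts unique values after dropping NaN, so its `unique` figure and the length of `unique()` can differ by one. A minimal sketch on a made-up Series showing how `nunique` and `value_counts` behave with and without `dropna`:

```python
import numpy as np
import pandas as pd

# Made-up carrier codes with gaps
s = pd.Series(["UA", "UA", np.nan, "DL", np.nan], name="MKT_CARRIER")

print(s.nunique())              # 2: NaN excluded, like describe()'s "unique"
print(s.nunique(dropna=False))  # 3: NaN counted as a value of its own
print(s.value_counts(dropna=False))
```

For the real column, `describe()` reports 13 unique codes while `unique()` lists 14 entries; the extra entry is NaN, covering 899 - 682 = 217 rows.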

## Checking for NaN values

In [59]:
new_set['MKT_CARRIER'].unique()

array(['UA', 'FL', 'AS', nan, 'G4', 'AA', 'DL', 'F9', 'F10', 'HA', 'B6',
       'WN', 'NK', 'VX'], dtype=object)

In [62]:
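# Replace missing carrier codes with the placeholder 'NA' and store the column as categorical.
# Caution: pd.read_csv parses the string 'NA' as missing by default.
char_mkt_carrier = np.where(new_set['MKT_CARRIER'].isnull(), 'NA', new_set['MKT_CARRIER'])
new_set['MKT_CARRIER'] = pd.Categorical(char_mkt_carrier)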

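The `np.where` plus `pd.Categorical` pattern can be written with pandas methods alone. A minimal sketch on a made-up column, using `fillna` followed by `astype("category")`, which should give the same sorted categories as the cell above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"MKT_CARRIER": ["UA", np.nan, "AS"]})

# Fill gaps with the placeholder, then store the column as categorical
df["MKT_CARRIER"] = df["MKT_CARRIER"].fillna("NA").astype("category")
print(df["MKT_CARRIER"].cat.categories.tolist())  # ['AS', 'NA', 'UA']
```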


In [63]:
# Replace missing carrier names with 'UNKNOWN' and store the column as categorical
char_mkt_carrier_name = np.where(new_set['MKT_CARRIER_NAME'].isnull(),
                                 'UNKNOWN',
                                 new_set['MKT_CARRIER_NAME'])
new_set['MKT_CARRIER_NAME'] = pd.Categorical(char_mkt_carrier_name)

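Since both marketing-carrier columns get the same treatment with different placeholders, they could be handled together. A minimal sketch on a made-up frame, passing a per-column dict to `fillna` and to `astype`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "MKT_CARRIER": ["UA", np.nan],
    "MKT_CARRIER_NAME": ["United Air Lines Inc.", np.nan],
})

# One placeholder per column, matching the two cells above
placeholders = {"MKT_CARRIER": "NA", "MKT_CARRIER_NAME": "UNKNOWN"}
df = df.fillna(placeholders).astype({col: "category" for col in placeholders})
print(df)
```

Keeping the placeholders in one dict makes it easy to see, and change, which label stands for "missing" in each column.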


In [64]:
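# index=False: the RangeIndex carries no information and would otherwise
# come back as an 'Unnamed: 0' column when the file is read again
new_set.to_csv('Aviation_read.csv', index=False)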
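The placeholder `'NA'` used for missing carrier codes does not survive a CSV round trip unchanged: `pd.read_csv` treats the literal string `NA` as a missing value by default. A minimal sketch using an in-memory buffer instead of `Aviation_read.csv`, showing the default behaviour and the `keep_default_na=False` option:

```python
import io

import pandas as pd

df = pd.DataFrame({"MKT_CARRIER": ["UA", "NA"]})
buf = io.StringIO()
df.to_csv(buf, index=False)
text = buf.getvalue()

# Default parsing turns the placeholder "NA" back into NaN
default = pd.read_csv(io.StringIO(text))
# keep_default_na=False keeps it as the string "NA"
kept = pd.read_csv(io.StringIO(text), keep_default_na=False)

print(default["MKT_CARRIER"].isna().tolist())  # [False, True]
print(kept["MKT_CARRIER"].tolist())            # ['UA', 'NA']
```

Note that `keep_default_na=False` also stops empty fields from becoming NaN, so it affects every column in the file.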