# Pandas challenge

The company **BestAirportsEver** has contacted us to help them understand how to improve their services quality in the area of Los Angeles, United States. They want we create two products for them:    
    
1. **Carriers on time**: given a destination and a departure date, returns the three carriers with a lower departure delay in average.
2. **Crowd avoidance**: given a destination and a month, recommend which are the 5 best days and time range (morning, afternoon, evening and night) where there is a low number of flights departing from the airport, so there will be less people on there. The time ranges will be:
    * *Morning*:     5 am to 12 pm (noon)
    * *Afternoon*:   12 pm to 5 pm
    * *Evening*:     5 pm to 9 pm
    * *Night*:       9 pm to 4 am

To do so, we have received a dataset that contains the information from all the flights leaving the LA ariports in 2018:

Los Angeles Airports are:
* Los Angeles International Airport (LAX)
* Ontario International Airport (ONT)
* John Wayne Airport (SNA)
* Hollywood Burbank Airport (BUR)
* Long Beach Airport (LGB)

Dataset information:
* fl_datetime: flight date time
* tailnum: Plane tail number.
* carrier: Two letter carrier abbreviation.
* origin, dest: Origin and destination.
* dep_time, arr_time: Actual departure and arrival times (format HHMM or HMM), local tz.
* dep_delay, arr_delay: Departure and arrival delays, in minutes. Negative times represent early departures/arrivals.
* air_time: Amount of time spent in the air, in minutes.

The output of this functions should be two functions called `carriers_on_time()` and `crowd_avoidance()`.

In [1]:
import pandas as pd
url = "https://drive.google.com/file/d/1nLL-p0gYoJosgRpoNTAV7fvDvl66yKmi/view?usp=sharing"
path = "https://drive.google.com/uc?export=download&id="+url.split("/")[-2]
df = pd.read_csv(path)

df['fl_datetime'] = pd.to_datetime(df['fl_datetime'])

In [2]:
df

Unnamed: 0,fl_datetime,tail_num,op_carrier,origin,dest,dep_time,arr_time,dep_delay,arr_delay,air_time
0,2018-01-01 10:24:00,302NV,G4,LAX,MFR,1024,1227,24,29,92
1,2018-01-01 08:24:00,332NV,G4,LAX,CVG,824,1521,84,74,208
2,2018-01-01 21:31:00,N510NK,NK,LAX,BWI,2131,511,15,11,256
3,2018-01-01 23:23:00,N615NK,NK,LAX,CLE,2323,701,-7,9,238
4,2018-01-01 08:13:00,N650NK,NK,LAX,IAH,813,1321,-6,-14,158
...,...,...,...,...,...,...,...,...,...,...
326733,2018-08-31 09:53:00,N877NN,AA,LAX,IAD,953,1804,3,11,270
326734,2018-08-31 08:34:00,N125AA,AA,LAX,LAS,834,958,-6,-8,46
326735,2018-08-31 11:39:00,N998AN,AA,LAX,ORD,1139,1758,24,24,213
326736,2018-08-31 10:05:00,N958NN,AA,LAX,ATL,1005,1716,13,-7,230


In [3]:
def carriers_on_time(dest, dep_date):
    """
    Given a destination and a departure date, return the 3 carriers with the least delay.
    Args:
        * dest: string format specifying the destination airport code
        * dep_date: string with a format of "Year-Month-Day". Ex: "2018-03-08"
    Returns:
        A list of strings containing the 3 carriers with the least departure delay, sorted in ascending order.
    """

    # Convert departure date to datetime object
    dep_date = pd.to_datetime(dep_date)

    # Extract date from flight datetime and convert to datetime object
    df['date'] = pd.to_datetime(df['fl_datetime']).dt.date
    df['date'] = pd.to_datetime(df['date'])

    # Filter DataFrame for specified destination and departure date
    filtered_df = df[(df['dest'] == dest) & (df['date'] == dep_date)]

    # Group by carrier and calculate median departure delay
    grouped_carriers = filtered_df.groupby('op_carrier')['dep_delay'].median()

    # Sort carriers by median delay and select top 3
    top_carriers = grouped_carriers.sort_values().head(3)

    print('The three carriers with the least departure delay are (already sorted):')

    # Return only the carrier codes
    return list(top_carriers.index)

carriers_on_time('JFK', '2018-08-27')

The three carriers with the least departure delay are (already sorted):


['AA', 'AS', 'DL']

In [6]:
def time_range(datetime):
    hour = int(datetime.strftime('%H'))
    if 5 <= hour and hour <= 12:
        return 'Morning'
    elif 12 < hour and hour <= 17:
        return 'Afternoon'
    elif 17 < hour and hour <= 21:
        return 'Evening'
    else:
        return 'Night'

def crowd_avoidance(dest, month):
    """
    Recommend the best times to go to the airport on a given day and month to avoid crowds.

    Args:
        dest (str): The destination airport code.
        month (str): The month as a string (e.g., 'June').
    """
    # Use .copy() to explicitly create a copy of the DataFrame
    filtered_df = df.copy()

    # Filter flights by destination and month
    filtered_df = filtered_df[(filtered_df['dest'] == dest) & (filtered_df['fl_datetime'].dt.month_name() == month)]

    # Assign time ranges to flights
    filtered_df.loc[:, 'time_range'] = filtered_df['fl_datetime'].apply(time_range)

    # Group by date and time range, and count the number of flights

    filtered_df = (filtered_df
                    .groupby(['date','time_range'])['tail_num']
                    .count()
                    .reset_index()
                    .sort_values('tail_num')
                    .drop(['tail_num'], axis=1)
                    .head(5)
    )
    # Print recommendation
    print(f"We recommend going to the airport on the following day and time ranges in {month} (already sorted):")
    print(filtered_df)

crowd_avoidance('JFK', 'June')

We recommend going to the airport on the following day and time ranges in June (already sorted):
          date time_range
117 2018-06-30    Evening
51  2018-06-13      Night
77  2018-06-20    Evening
63  2018-06-16      Night
35  2018-06-09      Night


We recommend going to the airport on the following day and time ranges in June (already sorted):
          date time_range
117 2018-06-30    Evening
51  2018-06-13      Night
77  2018-06-20    Evening
63  2018-06-16      Night
35  2018-06-09      Night
