# Robin Marte – Project 1  
## Comparing Airline Arrival Delays


This project analyzes arrival delays for two airlines, Alaska and AM West, across five destinations:

- Los Angeles  
- Phoenix  
- San Diego  
- San Francisco  
- Seattle  

The purpose of this analysis is to compare the arrival performance of the two airlines using pandas and to determine whether one airline appears to perform better overall and within each destination.

In [2]:
import pandas as pd

In [3]:
# load dataset
df = pd.read_excel("airlineDelays.xlsx")

# preview
df.head()

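|   | Airline | Destination | Status | Flights |
|---|---------|-------------|---------|---------|
| 0 | Alaska | Los Angeles | On Time | 497 |
| 1 | Alaska | Los Angeles | Delayed | 62 |
| 2 | Alaska | Phoenix | On Time | 221 |
| 3 | Alaska | Phoenix | Delayed | 12 |
| 4 | Alaska | San Diego | On Time | 212 |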


## Overall Delay Comparison

First, I will compare the airlines using total flights across all destinations.

To do this:
1. Group by Airline and Status
2. Sum the number of flights
3. Calculate delay rate

Delay Rate = Delayed Flights / Total Flights
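
As a quick worked example of this formula, the sketch below applies it to the two Alaska rows for Los Angeles shown in the preview (62 delayed, 497 on time). The helper name `delay_rate` is hypothetical and is used only for illustration; it is not part of the notebook's own code.

```python
def delay_rate(delayed: int, on_time: int) -> float:
    """Return delayed / (delayed + on_time) for one group of flights."""
    total = delayed + on_time
    return delayed / total


# Alaska flights to Los Angeles, taken from the preview rows
alaska_la = delay_rate(delayed=62, on_time=497)
print(f"{alaska_la:.6f}")  # 62 / 559
```

The same calculation is applied to whole columns at once in the next cell.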

In [4]:
# Group by airline and status
overall = df.groupby(["Airline", "Status"])["Flights"].sum().unstack()

# Calculate total flights
overall["Total Flights"] = overall["Delayed"] + overall["On Time"]

# Calculate delay rate
overall["Delay Rate"] = overall["Delayed"] / overall["Total Flights"]

overall

| Airline | Delayed | On Time | Total Flights | Delay Rate |
|---------|---------|---------|---------------|------------|
| AM West | 787 | 6438 | 7225 | 0.108927 |
| Alaska | 501 | 3274 | 3775 | 0.132715 |


### Overall Results

This table shows the total number of delayed and on-time flights for each airline, along with the delay rate.

Based on the overall delay rate, AM West appears to perform better when all flights are combined, since its rate (about 10.9%) is lower than Alaska's (about 13.3%).

## Delay Comparison by Destination

Next, I will compare delay rates within each destination.

This allows for a more detailed comparison of performance in specific cities.

In [5]:
# Group by airline and destination
by_destination = df.groupby(["Airline", "Destination", "Status"])["Flights"].sum().unstack()

# Calculate totals and delay rates
by_destination["Total Flights"] = by_destination["Delayed"] + by_destination["On Time"]
by_destination["Delay Rate"] = by_destination["Delayed"] / by_destination["Total Flights"]

by_destination

| Airline | Destination | Delayed | On Time | Total Flights | Delay Rate |
|---------|---------------|---------|---------|---------------|------------|
| AM West | Los Angeles | 117 | 694 | 811 | 0.144266 |
| AM West | Phoenix | 415 | 4840 | 5255 | 0.078972 |
| AM West | San Diego | 65 | 383 | 448 | 0.145089 |
| AM West | San Francisco | 129 | 320 | 449 | 0.287305 |
| AM West | Seattle | 61 | 201 | 262 | 0.232824 |
| Alaska | Los Angeles | 62 | 497 | 559 | 0.110912 |
| Alaska | Phoenix | 12 | 221 | 233 | 0.051502 |
| Alaska | San Diego | 20 | 212 | 232 | 0.086207 |
| Alaska | San Francisco | 102 | 503 | 605 | 0.168595 |
| Alaska | Seattle | 305 | 1841 | 2146 | 0.142125 |


## What I notice from the results

From the overall table, AM West has a lower delay rate than Alaska when all flights are combined.  
AM West's delay rate is about 0.109 (10.9%), while Alaska's is about 0.133 (13.3%).

However, when the data is broken down by destination, Alaska has a lower delay rate in each of the five cities. This pattern is consistent across Los Angeles, Phoenix, San Diego, San Francisco, and Seattle.
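
One way to check this city-by-city claim is to put the two airlines' delay rates side by side. The sketch below is a minimal, self-contained version: it rebuilds the counts by hand from the by-destination output instead of reading the Excel file, and the names `counts`, `table`, and `side_by_side` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# (airline, destination, delayed, on_time) counts copied from the
# by-destination results in this notebook
counts = [
    ("AM West", "Los Angeles", 117, 694),
    ("AM West", "Phoenix", 415, 4840),
    ("AM West", "San Diego", 65, 383),
    ("AM West", "San Francisco", 129, 320),
    ("AM West", "Seattle", 61, 201),
    ("Alaska", "Los Angeles", 62, 497),
    ("Alaska", "Phoenix", 12, 221),
    ("Alaska", "San Diego", 20, 212),
    ("Alaska", "San Francisco", 102, 503),
    ("Alaska", "Seattle", 305, 1841),
]
table = pd.DataFrame(
    counts, columns=["Airline", "Destination", "Delayed", "On Time"]
)
table["Delay Rate"] = table["Delayed"] / (table["Delayed"] + table["On Time"])

# One row per destination, one column per airline
side_by_side = table.pivot(
    index="Destination", columns="Airline", values="Delay Rate"
)

# True where Alaska's delay rate is below AM West's for that city
side_by_side["Alaska lower"] = side_by_side["Alaska"] < side_by_side["AM West"]

print(side_by_side.round(3))
```

If every value in the `Alaska lower` column is `True`, Alaska has the lower delay rate in all five destinations.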

So overall, AM West looks better when totals are combined, but Alaska looks better when comparing performance within each destination.

## Conclusion

The overall comparison and the destination-level comparison lead to different interpretations of the data. Looking only at the total delay rates suggests that AM West performs better, but examining the results within each destination shows Alaska with lower delay rates in every city.

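This shows why breaking the data into categories was an important step in the analysis. The two airlines fly very different mixes of routes: about 73% of AM West's flights (5,255 of 7,225) go to Phoenix, where delays are least common for both airlines, while about 57% of Alaska's flights (2,146 of 3,775) go to Seattle, where delays are more common. The combined totals reflect this difference in route mix as well as each airline's performance, so the grouped results provide a clearer comparison than the overall totals alone.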
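
A possible way to see why the two comparisons disagree is to treat each airline's overall delay rate as a weighted average of its destination-level rates, where the weights are the share of that airline's flights going to each city. A reversal of this kind is often described as Simpson's paradox. The sketch below is an illustration under that framing, not part of the original analysis; it rebuilds the counts by hand from the by-destination results, and the column names `Share` and `Weighted` are assumptions made for this example.

```python
import pandas as pd

# (airline, destination, delayed, on_time) counts copied from the
# by-destination results in this notebook
counts = [
    ("AM West", "Los Angeles", 117, 694),
    ("AM West", "Phoenix", 415, 4840),
    ("AM West", "San Diego", 65, 383),
    ("AM West", "San Francisco", 129, 320),
    ("AM West", "Seattle", 61, 201),
    ("Alaska", "Los Angeles", 62, 497),
    ("Alaska", "Phoenix", 12, 221),
    ("Alaska", "San Diego", 20, 212),
    ("Alaska", "San Francisco", 102, 503),
    ("Alaska", "Seattle", 305, 1841),
]
table = pd.DataFrame(
    counts, columns=["Airline", "Destination", "Delayed", "On Time"]
)
table["Total"] = table["Delayed"] + table["On Time"]
table["Delay Rate"] = table["Delayed"] / table["Total"]

# Share of each airline's flights that go to each destination
airline_totals = table.groupby("Airline")["Total"].transform("sum")
table["Share"] = table["Total"] / airline_totals

# Overall rate = sum over destinations of (share * destination rate)
table["Weighted"] = table["Share"] * table["Delay Rate"]
overall = table.groupby("Airline")["Weighted"].sum()

print(table[["Airline", "Destination", "Share", "Delay Rate"]].round(3))
print(overall.round(6))
```

The `Share` column shows where each airline's flights are concentrated, and the weighted sums in `overall` should match the delay rates in the overall table.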

This project helped me better understand how to use pandas to organize data, calculate summary statistics, and compare results across multiple variables.