# American Airlines & United Airlines

![](banner_airlines.jpg)

Data Retrieval, Descriptive Statistics, Histogram

In [None]:
# Look for setup.R in this directory, then in successive parent directories, and source it.
f = "setup.R"; for (i in 1:10) { if (file.exists(f)) break else f = paste0("../", f) }; source(f)

## Situation

* **Role:** Business analyst
* **Business Decision:** How to schedule employees' business travel?
* **Approach:** Explore data to gain insight about possible airline price discrimination.
* **Dataset:** The dataset that you will use for this assignment is called `airline_ticket_prices.csv`. These data are a random sample of all recent roundtrip airline tickets purchased for travel between New York’s John F. Kennedy Airport (JFK) and Los Angeles (LAX). First-class fares have been excluded. For each ticket, the data includes the carrier (i.e. the airline), the roundtrip fare, and several other variables.

## Data

Retrieve the data in the file called `airline_ticket_prices.csv`.

In [None]:
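data = read.csv("airline_ticket_prices.csv", header=TRUE)  # load the ticket sample
size(data)  # dimensions; size() is not base R, presumably a helper from setup.R (dim(data) is the base equivalent)
data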

Take a look at the variables `advance`, `busclass`, and `nonrefundable`.
* Question: What do you think each variable measures?
* Answer:
  * **advance:** Measures how many days in advance each ticket was purchased.

  * **busclass:** Indicates whether a ticket purchased was a business class ticket. 1 indicates that a ticket was a business class ticket and 0 indicates that it was not. This shows up in the fares: tickets coded 1 are much more expensive than tickets coded 0. Note: We call this a "dummy variable" or an indicator variable. We will discuss these sorts of variables in more detail later in the course.

  * **nonrefundable:** Indicates whether a ticket purchased was a nonrefundable ticket. 1 indicates that a ticket was nonrefundable and 0 indicates that it was not (that is, it was refundable). We can guess at this because tickets coded 1 are much less expensive than tickets coded 0.
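
One way to check this reading of a 0/1 variable is sketched below. It uses a small made-up data frame called `toy` (the values are invented for illustration and are not taken from `airline_ticket_prices.csv`) and only base R functions. It confirms that the column contains only 0s and 1s, shows that the mean of an indicator is the share of rows coded 1, and compares the mean fare in each category.

```r
# Toy data: invented values for illustration only, not from the real file.
toy = data.frame(rtrip_fare = c(250, 310, 1200, 280, 1500, 330),
                 busclass   = c(0, 0, 1, 0, 1, 0))

# An indicator (dummy) variable should take only the values 0 and 1.
is_indicator = all(toy$busclass %in% c(0, 1))

# Number of tickets in each category.
counts = table(toy$busclass)

# The mean of a 0/1 variable is the share of rows coded 1.
share_business = mean(toy$busclass)

# Mean fare within each category of the indicator.
fare_by_class = tapply(toy$rtrip_fare, toy$busclass, mean)

print(is_indicator)
print(counts)
print(share_business)
print(fare_by_class)
```

The same calls should work on the real `data` object by replacing `toy` with `data`, assuming the column names match those described above.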

## Descriptive Statistics & Histograms

List summary statistics for the variable `rtrip_fare`.  Note that the mean is larger than the median.
* Question: What does this indicate?
* Answer: These data are skewed to the right. A small number of very high fares pull the mean up, while the median is barely affected by them.

In [None]:
describe(data$rtrip_fare)
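
The link between right skew and a mean above the median can be seen on a tiny example. The sketch below uses an invented vector `fares` (not the real data) in which one fare is far higher than the rest, and compares the two statistics with base R.

```r
# Toy fares: invented values, mostly moderate with one very high fare.
fares = c(200, 250, 300, 350, 400, 2500)

fare_mean   = mean(fares)    # pulled upward by the single 2500 fare
fare_median = median(fares)  # midpoint of the sorted values, barely affected

print(c(mean = fare_mean, median = fare_median))

# A mean above the median is consistent with a right-skewed distribution.
print(fare_mean > fare_median)
```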

Construct a histogram of round trip fares.
* Question: Why is there so much white space on the right side of the histogram?
* Answer: There are a small number of very high fares, i.e., above \$1000.

In [None]:
ggplot(data) + ggtitle("Round Trip Fares") + xlab("round trip fare ($)") +
geom_histogram(aes(rtrip_fare), bins=100)

Construct another histogram using only tickets under \$1000. Be sure to label the axes.

In [None]:
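# Keep only tickets with a round trip fare strictly below 1000.
data.cheap = data[data$rtrip_fare < 1000, ]

# Label both axes; without ylab() the y-axis would show the default label "count".
ggplot(data.cheap) + ggtitle("Round Trip Fares < $1000") +
  xlab("round trip fare ($)") + ylab("number of tickets") +
  geom_histogram(aes(rtrip_fare), bins=100)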
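
Before reading the restricted histogram, it can help to know how many tickets the filter removed. The sketch below applies the same strict `< 1000` condition to a made-up data frame `toy` (invented values, not the real data) and counts the rows kept and dropped. Note that a fare of exactly 1000 does not pass a strict inequality.

```r
# Toy data: invented fares, including one exactly at the cutoff.
toy = data.frame(rtrip_fare = c(220, 480, 990, 1000, 1750))

# Logical vector: TRUE for fares strictly below 1000.
keep = toy$rtrip_fare < 1000

# drop = FALSE keeps the result a data frame even though toy has one column.
toy.cheap = toy[keep, , drop = FALSE]

n_kept    = sum(keep)   # rows that pass the filter
n_dropped = sum(!keep)  # rows at or above the cutoff

print(c(kept = n_kept, dropped = n_dropped))
```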

Go back to the full dataset.
* Question: Is one of the airlines much cheaper on average than the other?
* Answer: No. On average, both airlines' fares are just over $368.

In [None]:
data.AA = data[data$carrier == "AA", ]
data.UA = data[data$carrier == "UA", ]

m.AA = mean(data.AA$rtrip_fare)
m.UA = mean(data.UA$rtrip_fare)

data.frame(price_mean_AA=m.AA, price_mean_UA=m.UA)
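
The per-carrier means can also be computed in one step rather than by building a separate data frame for each airline. The sketch below is one possible alternative using base R's `aggregate()` on a made-up data frame `toy` (invented values, not the real data); it is not necessarily how the course intends the calculation to be done.

```r
# Toy data: invented carriers and fares for illustration only.
toy = data.frame(carrier    = c("AA", "UA", "AA", "UA", "AA"),
                 rtrip_fare = c(300, 350, 400, 390, 380))

# Mean round trip fare for each carrier, returned as a data frame
# with one row per carrier.
means = aggregate(rtrip_fare ~ carrier, data = toy, FUN = mean)

print(means)
```

With more than two carriers this approach needs no extra code, which is the main reason to prefer it over one subset per airline.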

The `satstayover` variable is a 0/1 variable. Take a look at the prices of tickets with `satstayover` equal to 0 and equal to 1.
* Question: What do you think this variable measures and how do you interpret the price difference?
* Answer: This variable indicates whether a roundtrip ticket includes a Saturday stayover in the destination location. Tickets with values of 1 are much less expensive than those with zeroes, \$252 versus \$402. This is classic price discrimination. Business travelers are less likely to have a Saturday stayover and have higher willingness-to-pay, in part because they are typically buying their tickets with company funds.

In [None]:
data.0 = data[data$satstayover == 0, ]
data.1 = data[data$satstayover == 1, ]

m.0 = mean(data.0$rtrip_fare)
m.1 = mean(data.1$rtrip_fare)

data.frame(price_mean_no_stayover=m.0, price_mean_stayover=m.1)
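
The gap between the two averages can also be expressed in relative terms. The sketch below takes the two rounded means quoted in the answer above (402 without a Saturday stayover, 252 with one) as given, rather than recomputing them from the file; the variable names are illustrative.

```r
# Rounded mean fares as quoted in the answer above.
mean_no_stayover = 402
mean_stayover    = 252

# Absolute difference in dollars.
diff_dollars = mean_no_stayover - mean_stayover

# Discount for a Saturday stayover, as a percentage of the no-stayover mean.
discount_pct = 100 * diff_dollars / mean_no_stayover

print(diff_dollars)
print(round(discount_pct, 1))
```

On these rounded figures, tickets with a Saturday stayover are about 37% cheaper on average than tickets without one.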

<p style="text-align:left; font-size:10px;">
Copyright (c) Berkeley Data Analytics Group, LLC
<span style="float:right;">
Document revised January 21, 2020
</span>
</p>