# Your turn: Data Wrangling

[DataFrames.jl documentation](https://juliadata.github.io/DataFrames.jl/stable/)

First, load the `DataFrames` and `RData` packages. Then load the `nycflights13` data, which is stored in an R data file.

### This notebook contains solutions.

In [None]:
using DataFrames
using RData
nycflights = load("../data/nycflights13.RData")

We'll work with the `flights` dataset from `nycflights13`.

In [None]:
flights = nycflights["flights"]
describe(flights)

## Find all flights that

Departed from LaGuardia (LGA). (You should find 104,662 flights.)

In [None]:
flights[flights.origin .== "LGA", :]

Flew to Houston (IAH or HOU). (You should find 9,313 flights.)

*Hint:* You may need to add parentheses around certain expressions, depending on how you do this one.

In [None]:
flights[in(["IAH", "HOU"]).(flights.dest), :]
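The hint about parentheses applies when the filter is written with two element-wise comparisons joined by `.|` instead of `in`. In Julia, `|` binds more tightly than `==`, so each comparison needs its own parentheses. The sketch below uses a small made-up `dest` vector (an assumption for illustration, not the real `flights` column) to show that both spellings produce the same mask.

```julia
# Toy stand-in for flights.dest (hypothetical values, for illustration only).
dest = ["IAH", "ORD", "HOU", "MIA"]

# Each comparison is parenthesized because `.|` has higher precedence than `.==`.
mask_or = (dest .== "IAH") .| (dest .== "HOU")

# Same mask using the curried form of `in`, as in the solution cell.
mask_in = in(["IAH", "HOU"]).(dest)

println(mask_or == mask_in)
```

Without the parentheses, `dest .== "IAH" .| dest .== "HOU"` is parsed as a chained comparison against `"IAH" .| dest`, which is not the intended filter.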

Had an arrival delay of two or more hours. (You should find 10,200 flights.)

In [None]:
flights[coalesce.(flights.arr_delay, 0) .>= 120, :]
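The `coalesce` call is there because `arr_delay` contains missing values, and comparing `missing` with a number yields `missing` rather than `true` or `false`. A mask containing `missing` cannot be used directly for row indexing. The sketch below uses a small made-up vector (an assumption, not the real column) to show two ways of turning such a comparison into a plain Boolean mask.

```julia
# Toy stand-in for flights.arr_delay (hypothetical values, for illustration only).
arr_delay = [130, missing, 15, 120]

# The raw comparison propagates `missing`.
raw = arr_delay .>= 120

# Option 1: replace missing delays with 0 before comparing (as in the solution cell).
mask_a = coalesce.(arr_delay, 0) .>= 120

# Option 2: compare first, then treat a missing result as `false`.
mask_b = coalesce.(raw, false)

println(mask_a == mask_b)
```

Both options drop rows whose delay is unknown. Option 2 does not depend on the replacement value being on the "false" side of the threshold.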

Were operated by United, American, or Delta (UA, AA, DL). (You should find 139,504 flights.)

In [None]:
flights[in(["UA", "AA", "DL"]).(flights.carrier), :]

Arrived more than two hours late but didn't leave late. (You should find 29 flights.)

In [None]:
flights[(coalesce.(flights.arr_delay, 0) .> 120) .& (coalesce.(flights.dep_delay, 0) .<= 0), :]

## Grouped summaries

Which carrier had the worst average arrival delay? *Hint:* look at the final example in the data wrangling notebook.

[Query.jl documentation](https://www.queryverse.org/Query.jl/stable/)

In [None]:
using Query
using Statistics

In [None]:
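# `aggregate` was removed in DataFrames.jl 1.0, so this cell uses `combine` instead.
# Missing arrival delays are counted as 0 when computing the mean.
flights |>
  @select(:carrier, :arr_delay) |>
  DataFrame |>
  (df -> groupby(df, :carrier)) |>
  (gdf -> combine(gdf, :arr_delay => (x -> mean(coalesce.(x, 0))) => :mean_arrival_delay)) |>
  @orderby_descending(_.mean_arrival_delay)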

Answer: "F9" (Frontier Airlines Inc.)
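Note that the solution treats a missing arrival delay as a delay of 0, which pulls each carrier's mean toward zero. An alternative is to leave missing values out of the average with `skipmissing`. The sketch below uses a small made-up vector (an assumption, not the real data) to show how the two choices differ. Whether the choice changes the carrier ranking for `flights` is not checked here.

```julia
using Statistics

# Toy stand-in for one carrier's arr_delay values (hypothetical, for illustration only).
delays = [10, missing, 20]

# Missing counted as a delay of 0: (10 + 0 + 20) / 3
mean_zero_filled = mean(coalesce.(delays, 0))

# Missing left out of the average: (10 + 20) / 2
mean_skipped = mean(skipmissing(delays))

println((mean_zero_filled, mean_skipped))
```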