# Day 1 Recap

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
flights = pd.read_csv("data/ny-flights.csv.gz",
                      parse_dates=["fl_date", "arr", "dep"])
first = flights.groupby("unique_carrier").first()
first.head()

## Data Structures

1. `DataFrame`: 2-dimensional labeled array
2. `Series`: 1-dimensional labeled array
3. `Index`: label containers
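
As a minimal sketch of how the three structures relate, the cell below builds a small frame and pulls a `Series` and an `Index` out of it. The values are hypothetical toy data, not rows from the flights dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the flights dataset
df = pd.DataFrame(
    {"origin": ["JFK", "LGA"], "dep_delay": [5.0, -2.0]},
    index=["AA", "DL"],
)

col = df["dep_delay"]   # selecting a single column gives a Series
labels = df.index       # the row labels are stored in an Index

print(type(df).__name__, type(col).__name__, type(labels).__name__)
```

The `Series` carries the same `Index` as the frame it came from, which is what later makes label alignment possible.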

## Indexing

Use `[]` (that is, `__getitem__`) to select columns

In [None]:
first[['origin', 'dest']].head()

Use `.loc` for label indexing

In [None]:
first.loc[['AA', 'DL'], ['origin', 'dest']]

Use `.iloc` for positional indexing

In [None]:
first.iloc[[0, 2], [4, 5]]

All indexers accept a *boolean mask*

In [None]:
flights[flights.dep.isnull()].head()
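
A short sketch of the same mask passed to each indexer, using a hypothetical toy frame (`toy` and its values are made up for illustration). One caveat, as far as I know: `.iloc` wants a plain boolean array rather than a labeled `Series`, so the mask is converted with `.to_numpy()` first.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing departure time
toy = pd.DataFrame(
    {"dep": [1.0, np.nan, 3.0], "carrier": ["AA", "DL", "UA"]},
    index=["x", "y", "z"],
)

mask = toy["dep"].isnull()            # boolean Series, True where dep is missing

by_getitem = toy[mask]                # rows where the mask is True
by_loc = toy.loc[mask, ["carrier"]]   # same rows, restricted to one column
by_iloc = toy.iloc[mask.to_numpy()]   # positional indexer takes the raw boolean array
```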

## Alignment

In [None]:
df1 = pd.DataFrame({"A": [1, 2, 3]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({"A": [2, 4, 6]}, index=['b', 'a', 'd'])

In [None]:
df1

In [None]:
df2

Pandas *aligns* by label, then does the operation.

In [None]:
df1 + df2

This saves you from having to write the join yourself.
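
A sketch of what alignment does with labels that appear in only one operand, reusing the two frames from above. Labels without a partner produce `NaN`; `DataFrame.add` with `fill_value=0` is one way to treat the missing side as zero instead.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"A": [2, 4, 6]}, index=["b", "a", "d"])

# Rows are matched by label, not by position: 'a' pairs 1 with 4, 'b' pairs 2 with 2
total = df1 + df2

# 'c' and 'd' exist in only one frame, so the plain sum is NaN there;
# fill_value=0 substitutes 0 for the missing operand before adding
filled = df1.add(df2, fill_value=0)

print(total)
print(filled)
```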

## Groupby

1. Split by some array
2. Apply some function
3. Combine the results
    - `.agg`: 1 output row per input group
    - `.transform`: 1 output row per input row

In [None]:
flights.groupby("unique_carrier").dep_delay.mean()
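
To make the `.agg` versus `.transform` distinction concrete, here is a minimal sketch on a hypothetical three-row frame that borrows the column names from `flights` (the values are invented).

```python
import pandas as pd

# Hypothetical toy data using the same column names as the flights frame
toy = pd.DataFrame(
    {"unique_carrier": ["AA", "AA", "DL"], "dep_delay": [10.0, 20.0, 5.0]}
)

grouped = toy.groupby("unique_carrier")["dep_delay"]

per_group = grouped.agg("mean")       # one value per carrier
per_row = grouped.transform("mean")   # group mean broadcast back to every input row

# transform keeps the original index, so it lines up with the source column
demeaned = toy["dep_delay"] - per_row
```

Because the `.transform` result has the same index as the input, it can be combined directly with the original column, as in the demeaning step.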