# Pythonic Pandas - A Refresher

This is a quick notebook to help tie some of what was covered in `cheatsheet.py` back to the way we use Pandas to analyze data. Use this document as a reminder that the tools and abstractions provided by Pandas are built on the features that make Python Python.

> Note: This isn't necessarily the best, most pandas-y notebook ever. See the other notebooks for that.

In [2]:
import pandas as pd

`import` statements pull in modules, which are full of functions and classes that give us easy access to complex features. `as` lets us use a shorter, friendlier name for our modules if we'd like.
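To see that `as` only changes the local name, here is a minimal sketch using the standard library's `math` module. The module and the names `m`, `same_module`, and `root` are chosen for illustration and are not part of the original notebook.

```python
import math
import math as m  # same module, bound to a shorter local name

# Both names refer to the very same module object.
same_module = m is math

# Functions inside the module are reached through whichever name we chose.
root = m.sqrt(16)

print(same_module, root)
```

The same thing happens with `import pandas as pd`: `pd` is just a shorter name for the pandas module.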

In [3]:
df = pd.read_csv('oscar_winners.csv')

*Assignment* happens when we use `=` to give a variable a value. Here, we used a *function* included in the pandas *module* to set the value of the brand new *variable* `df` (short for dataframe) to whatever the `read_csv` function *returns*.

> Remember: An assignment doesn't show any output unless there's a problem. Function calls and other plain old *references* are the way to look at stuff in Jupyter.

In [4]:
type(df) # call the type function

pandas.core.frame.DataFrame

In this case, what got returned was an *instance* of the `DataFrame` class that the authors of pandas kindly created for us. (By convention, only class names are written in CapWords, like `DataFrame`.)
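A small sketch of the class/instance relationship, assuming a tiny hand-made frame (`example`, with made-up values) in place of the Oscar data:

```python
import pandas as pd

# A hypothetical two-row frame standing in for the real data.
example = pd.DataFrame({'Award': ['Actor', 'Actress'], 'Win': [1.0, None]})

# isinstance() asks whether an object was built from a given class.
is_frame = isinstance(example, pd.DataFrame)

# type() hands back the class itself, and the class knows its own name.
class_name = type(example).__name__

print(is_frame, class_name)
```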

Let's take a look at it.

In [5]:
df # plain old reference

|    | 1934 | 7 | Actor | 1 | Clark Gable | It Happened One Night |
|----|------|---|-------|---|-------------|-----------------------|
| 0 | 1934 | 7 | Actor | NaN | Frank Morgan | The Affairs of Cellini |
| 1 | 1934 | 7 | Actor | NaN | William Powell | The Thin Man |
| 2 | 1934 | 7 | Actress | 1.0 | Claudette Colbert | It Happened One Night |
| 3 | 1934 | 7 | Actress | NaN | Bette Davis | Of Human Bondage |
| 4 | 1934 | 7 | Actress | NaN | Grace Moore | One Night of Love |
| 5 | 1934 | 7 | Actress | NaN | Norma Shearer | The Barretts of Wimpole Street |
| 6 | 1934 | 7 | Art Direction | NaN | The Affairs of Cellini | Richard Day [Third] |
| 7 | 1934 | 7 | Art Direction | NaN | The Gay Divorcee | Van Nest Polglase Carroll Clark [Second] |
| 8 | 1934 | 7 | Art Direction | 1.0 | The Merry Widow | Cedric Gibbons Fredric Hope |
| 9 | 1934 | 7 | Assistant Director | NaN | Cleopatra | Cullen Tate [Second] |


Wow! Look at that wonderful data-- **NOT!** Our first row of data got caught in the header row. :(

Since `df` is an *instance* of a class, it probably has some useful functions and variables (or more accurately, methods and attributes) for us to use...
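As an aside, one way to keep that first row is to tell `read_csv` up front that the file has no header. This is a sketch, not what the notebook does next: the two sample rows are copied from the output above and fed in through `io.StringIO` in place of `oscar_winners.csv`, and the names in `names` are the ones this notebook assigns a little further down.

```python
import io
import pandas as pd

# Two rows copied from the output above, standing in for the real file.
raw = io.StringIO(
    "1934,7,Actor,1,Clark Gable,It Happened One Night\n"
    "1934,7,Actor,,Frank Morgan,The Affairs of Cellini\n"
)

names = ['Year', 'Ceremony', 'Award', 'Win', 'Recipient', 'Film']

# header=None says "the first line is data"; names supplies the column labels.
fixed = pd.read_csv(raw, header=None, names=names)

first_recipient = fixed.loc[0, 'Recipient']
print(fixed.shape, first_recipient)
```

Read this way, the Clark Gable row stays in the data as row 0 instead of becoming the column labels.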

In [6]:
df.columns # an instance of the `Index` class

Index(['1934', '7', 'Actor', '1', 'Clark Gable', 'It Happened One Night'], dtype='object')

This reminds me of something... A list! It's *list-like*! `[]` Row and column labels in Pandas resemble lists in native Python. Let's replace that list with a new one!
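A quick sketch of what "list-like" means here, using a hand-built `Index` (the variable names are illustrative). It also shows one difference from a real list: the labels can be replaced as a whole, but not edited one item at a time.

```python
import pandas as pd

columns = pd.Index(['1934', '7', 'Actor'])

first = columns[0]        # positional indexing, like a list
count = len(columns)      # length, like a list
as_list = list(columns)   # iteration, like a list

# Unlike a list, an Index does not allow item assignment.
try:
    columns[0] = 'Year'
    mutable = True
except TypeError:
    mutable = False

print(first, count, as_list, mutable)
```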

In [7]:
df.columns = ['Year', 'Ceremony', 'Award', 'Win', 'Recipient', 'Film']
df.columns

Index(['Year', 'Ceremony', 'Award', 'Win', 'Recipient', 'Film'], dtype='object')

Excellent. That makes a lot more sense. Let's take a look at a column. We're gonna go ahead and *access* a column with its name, kind of like accessing a dict `{}` with a key. You guessed it: `DataFrame`s in Pandas are *dict-like*.

In [8]:
type(df['Recipient'])

pandas.core.series.Series

And the values at the 'keys' in the dict-like dataframe are list-like objects called `Series`.
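The dict comparison can be pushed a bit further. The sketch below uses a small hand-made frame (`example`, with two rows taken from the data above) and a made-up missing column name, `'Director'`, to try a few dict-style operations.

```python
import pandas as pd

example = pd.DataFrame({
    'Recipient': ['Clark Gable', 'Frank Morgan'],
    'Film': ['It Happened One Night', 'The Affairs of Cellini'],
})

has_column = 'Recipient' in example   # membership checks the column names
keys = list(example.keys())           # keys() gives back the column labels
missing = example.get('Director')     # get() returns None for an absent column

# items() yields (column name, Series) pairs, like a dict's (key, value) pairs.
kinds = {name: type(col).__name__ for name, col in example.items()}

print(has_column, keys, missing, kinds)
```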

In [11]:
df['Recipient'][1:5]  # a key/column is a string

1       William Powell
2    Claudette Colbert
3          Bette Davis
4          Grace Moore
Name: Recipient, dtype: object
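The `[1:5]` slice above behaves like a list slice: it stops before position 5. A sketch of the two explicit spellings, using a hand-built `Series` of the first six recipients shown earlier, makes the difference between slicing by position and slicing by label visible.

```python
import pandas as pd

recipients = pd.Series(
    ['Frank Morgan', 'William Powell', 'Claudette Colbert',
     'Bette Davis', 'Grace Moore', 'Norma Shearer'],
    name='Recipient',
)

# .iloc slices by position and excludes the end, like a list slice.
by_position = recipients.iloc[1:5]

# .loc slices by label and includes the end label.
by_label = recipients.loc[1:5]

print(len(by_position), len(by_label))
```

Spelling out `.iloc` or `.loc` avoids having to remember which rule a bare `[1:5]` follows.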

We'll stop here, since now we remember that:

- Pandas is a module that we can import and reference.
```python
import pandas as pd
```

- Since it's a module, it's filled with convenient functions and other tools.
```python
df = pd.read_csv('a_wonderful_csv.csv')
```

- As we use Pandas, we run into more convenient abstractions, such as the `DataFrame` class.
```python
df.columns # shows us a list-like representation of the header row
df['ColumnName'] # we can use strings to access columns, like looking up dict values by key
```

Each of these has more utilities attached to it, and we'll explore them in context next.

Onward to the next notebook for more in-depth work with Pandas!