# Pandas

Let's practice with Pandas dataframes and series!

In [None]:
# Run these imports first
import numpy as np
import pandas as pd

### Series

Let's make a Series that includes one missing value, written as `np.nan` (NaN, "Not a Number").


In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
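Because `np.nan` is a float, pandas stores the whole Series as `float64`. Here is a short sketch of how the missing value behaves with a few standard Series methods (`isna`, `count`, `mean`, `fillna`):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.dtype)             # float64: the NaN forces a floating-point dtype
print(s.isna().sum())      # number of missing values
print(len(s), s.count())   # len counts every entry; count skips NaN
print(s.mean())            # NaN is skipped by default, so this averages 5 values
print(s.fillna(0).sum())   # replace NaN with 0 before summing
```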

In [None]:
# Access an element of the series by its index label (label 4 is the fifth element here)

s[4]
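With the default integer index, `s[4]` looks up the *label* 4, which happens to coincide with position 4. To make the intent explicit, pandas offers `iloc` (by position) and `loc` (by label). A small sketch; the lettered Series `t` is a hypothetical example where labels and positions differ:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.iloc[4])     # by position: the fifth element
print(s.loc[4])      # by label: the element labelled 4
print(s.iloc[-1])    # negative positions count from the end
print(s.iloc[1:3])   # positions 1 and 2 (the end is excluded)

# Hypothetical Series with string labels
t = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(t.loc["b"])    # by label
print(t.iloc[1])     # by position: the same element
```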

### Dataframes

Here we define a dataframe from a dictionary that maps column names to lists of values.

In [None]:
df = pd.DataFrame({'date' : ['2016-01-01', '2016-01-02', '2016-01-03'],
                    'qty': [20, 30, 40]})
df
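Each column has its own dtype. Here the `date` column holds plain strings rather than dates. A sketch of inspecting the frame and parsing that column with `pd.to_datetime`:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2016-01-01', '2016-01-02', '2016-01-03'],
                   'qty': [20, 30, 40]})

print(df.shape)    # (rows, columns)
print(df.dtypes)   # 'date' holds strings, 'qty' holds integers

# Parse the date strings into real timestamps
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)
print(df['date'].dt.day_name())  # the .dt accessor works once the column is datetime
```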


Larger datasets are usually loaded from a file, such as a CSV file.

In [None]:
rain = pd.read_csv('https://s3.amazonaws.com/elephantscale-public/data/rainfall/rainfall.csv')
rain
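The remote file can't be reproduced here, so this sketch feeds `read_csv` an in-memory CSV through `io.StringIO`. The column names `City`, `Month` and `Rainfall` match the ones used later in this notebook; the rows and values are made up for illustration. `shape`, `columns` and `head` are handy first checks on any freshly loaded file.

```python
import io
import pandas as pd

# Hypothetical stand-in for rainfall.csv (values are illustrative only)
csv_text = """City,Month,Rainfall
San Francisco,Jan,4.5
San Francisco,Feb,3.8
Los Angeles,Jan,3.1
Los Angeles,Feb,3.8
Seattle,Jan,12.3
"""

rain = pd.read_csv(io.StringIO(csv_text))
print(rain.shape)          # (rows, columns)
print(list(rain.columns))  # column names parsed from the header line
print(rain.head(3))        # first three rows
```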

In [None]:
# Select a single column by name (returns a Series)
rain['City']
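Single brackets with one name return a Series; passing a *list* of names returns a dataframe with just those columns. A sketch on a small hypothetical frame with the same column names:

```python
import pandas as pd

# Hypothetical sample with the notebook's column names
rain = pd.DataFrame({'City': ['San Francisco', 'Los Angeles'],
                     'Month': ['Jan', 'Jan'],
                     'Rainfall': [4.5, 3.1]})

city = rain['City']                  # one name -> Series
subset = rain[['City', 'Rainfall']]  # list of names -> DataFrame
print(type(city).__name__, type(subset).__name__)
print(subset.columns.tolist())
```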

We can also get a column by its position (counting from zero) using `iloc`.

In [None]:
rain.iloc[:,0]
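`iloc[:, 0]` means "all rows, column at position 0". It returns the same data as selecting that column by name, and a list of positions selects several columns at once. A sketch on a hypothetical frame:

```python
import pandas as pd

rain = pd.DataFrame({'City': ['San Francisco', 'Los Angeles'],
                     'Month': ['Jan', 'Jan'],
                     'Rainfall': [4.5, 3.1]})

first_col = rain.iloc[:, 0]                    # all rows, column at position 0
print(rain.columns[0])                         # name of the column at position 0
print(first_col.equals(rain['City']))          # same data as selecting by name
print(rain.iloc[:, [0, 2]].columns.tolist())   # columns at positions 0 and 2
```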

We can also get a row by its position.

In [None]:
# Select the row at position 1 (the second row) as a one-row dataframe
rain.iloc[[1]]
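The double brackets matter: `iloc[1]` returns the row as a Series indexed by column names, while `iloc[[1]]` returns a one-row dataframe. A sketch on a hypothetical frame:

```python
import pandas as pd

rain = pd.DataFrame({'City': ['San Francisco', 'Los Angeles', 'Seattle'],
                     'Month': ['Jan', 'Jan', 'Jan'],
                     'Rainfall': [4.5, 3.1, 5.6]})

row_series = rain.iloc[1]    # second row as a Series
row_frame = rain.iloc[[1]]   # second row as a one-row DataFrame
print(row_series['City'])    # values are looked up by column name
print(row_frame.shape)
```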

Or a group of rows by position (the end of the slice is excluded):

In [None]:
rain.iloc[0:3]  # rows at positions 0, 1 and 2
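Slices in `iloc` follow Python's rules: the start is included and the end is excluded, so `iloc[0:3]` gives three rows while `iloc[0:1]` gives just one. `head(n)` is a shortcut for the first n rows. A sketch on a hypothetical frame:

```python
import pandas as pd

rain = pd.DataFrame({'City': ['San Francisco', 'Los Angeles', 'Seattle', 'Portland'],
                     'Rainfall': [4.5, 3.1, 5.6, 4.9]})

print(len(rain.iloc[0:1]))                   # 1 row: position 0 only
print(len(rain.iloc[0:3]))                   # 3 rows: positions 0, 1, 2
print(rain.iloc[0:3].equals(rain.head(3)))   # head(3) returns the same rows
print(rain.iloc[::2]['City'].tolist())       # every other row, using a step
```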

We can even get a sub-dataframe using `iloc`, by giving a row slice and a column slice.

```python
rain.iloc[0:2, 1:3]  # first two rows (positions 0-1), columns 1 and 2 (counting from 0)
```
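One difference worth knowing: `iloc` slices exclude the end position, but `loc` slices by label *include* the end label, for rows and columns alike. A sketch with a hypothetical frame that keeps the default integer index:

```python
import pandas as pd

rain = pd.DataFrame({'City': ['San Francisco', 'Los Angeles', 'Seattle'],
                     'Month': ['Jan', 'Jan', 'Jan'],
                     'Rainfall': [4.5, 3.1, 5.6]})

by_position = rain.iloc[0:2, 1:3]             # positions: rows 0-1, columns 1-2 (end excluded)
by_label = rain.loc[0:1, 'Month':'Rainfall']  # labels: rows 0..1, Month..Rainfall (end included)
print(by_position.shape, by_label.shape)
print(by_position.equals(by_label))
```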

In [None]:
# TODO: How would you get rows 1 through 3, with only the first 2 columns?



### Filtering

We can filter rows with a boolean condition on a column:

In [None]:
# Find all rows where Rainfall is less than 10

rain[rain['Rainfall'] < 10]
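The condition itself is a boolean Series, and conditions can be combined with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. A sketch on hypothetical values:

```python
import pandas as pd

rain = pd.DataFrame({'City': ['San Francisco', 'Los Angeles', 'Seattle', 'Seattle'],
                     'Month': ['Jan', 'Jan', 'Jan', 'Feb'],
                     'Rainfall': [4.5, 3.1, 12.3, 9.2]})

mask = rain['Rainfall'] < 10
print(mask.tolist())  # one True/False per row

# Between 4 and 10: both conditions must hold
moderate = rain[(rain['Rainfall'] >= 4) & (rain['Rainfall'] < 10)]
# Very dry or very wet: either condition may hold
extreme = rain[(rain['Rainfall'] < 4) | (rain['Rainfall'] >= 10)]
# Everything that is NOT below 10
wet = rain[~mask]
print(len(moderate), len(extreme), len(wet))
```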

### Filter Exercises

Complete the following:

In [None]:
# How would we find all readings for April (in all cities)?

In [None]:
# How would we find all readings for Los Angeles?

### Naming Rows

We can give rows names (labels) instead of numbers, for example by combining the City and Month columns.

In [None]:
rain.set_index(rain['City'] + rain['Month'])


### Setting the index

In Pandas, every dataframe has an index: a label for each row. We use `loc` to get rows by their index label:

```python
rain.loc[0]    # row with index label 0, as a Series
rain.loc[[0]]  # row with index label 0, as a one-row dataframe
```
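`loc` looks rows up by index *label*, `iloc` by *position*. With the default index the two agree, but they diverge once rows are reordered. A sketch on hypothetical data, reordered with `sort_values`:

```python
import pandas as pd

rain = pd.DataFrame({'City': ['San Francisco', 'Los Angeles', 'Seattle'],
                     'Rainfall': [4.5, 3.1, 5.6]})

by_rain = rain.sort_values('Rainfall')  # rows reordered; original labels kept
print(by_rain.index.tolist())           # labels are now in the order 1, 0, 2
print(by_rain.loc[0, 'City'])           # label 0: still San Francisco
print(by_rain.iloc[0]['City'])          # position 0: the driest city
print(type(by_rain.loc[0]).__name__, type(by_rain.loc[[0]]).__name__)
```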

Just as columns can be accessed by either position or name, rows can be accessed by either position (`iloc`) or label (`loc`).

By default, the index is simply the row number starting from zero, but it can be replaced, for example with the values of an existing column:

```python
rain.set_index("colname")  # use the values of the column "colname" as the index
```
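Passing a column name moves that column into the index (by default it is removed from the regular columns), and `reset_index` moves it back. A sketch with a hypothetical frame; each call returns a new dataframe, so the results are stored in new variables:

```python
import pandas as pd

rain = pd.DataFrame({'City': ['San Francisco', 'Los Angeles'],
                     'Rainfall': [4.5, 3.1]})

by_city = rain.set_index('City')                 # 'City' values become the row labels
print(by_city.columns.tolist())                  # 'City' is no longer a regular column
print(by_city.loc['Los Angeles', 'Rainfall'])    # look a row up by its new label

restored = by_city.reset_index()                 # move the index back into a column
print(restored.columns.tolist())
```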

In [None]:
rain.set_index(rain['City'] + rain['Month'])

### Pandas and changes in-place

**Most** (but not all) changes to a dataframe do not happen in-place. Instead, the method returns a modified copy of the data and leaves the original dataframe untouched.
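A quick way to see this is to call a method and then inspect the original. A sketch on a hypothetical frame, using `set_index` and `drop`, both of which return new dataframes by default:

```python
import pandas as pd

rain = pd.DataFrame({'City': ['San Francisco', 'Los Angeles'],
                     'Month': ['Jan', 'Jan'],
                     'Rainfall': [4.5, 3.1]})

labelled = rain.set_index(rain['City'] + rain['Month'])
trimmed = rain.drop(columns=['Month'])

print(rain.index.tolist())      # the original still has the default labels 0, 1
print(rain.columns.tolist())    # and still has all three columns
print(labelled.index.tolist())  # the new labels exist only in the returned copy
print(trimmed.columns.tolist())
```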

Let's try looking up a row by one of the new labels after calling `set_index` as above.

In [None]:
# This won't work: set_index above did not change rain itself
rain.loc['San FranciscoJan']  # raises KeyError

`set_index`, like many other methods, returns a new dataframe and does NOT change the original in-place. To keep the change, we assign the result back to the original variable:

In [None]:
rain = rain.set_index(rain['City'] + rain['Month'])
rain

In [None]:
rain.loc['San FranciscoJan']  # works now

Many methods, including `set_index`, also accept an optional `inplace=True` parameter that modifies the dataframe directly instead of returning a copy.

```python
rain.set_index(rain['City'] + rain['Month'], inplace=True)
```
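One pitfall: with `inplace=True` the method modifies the dataframe and returns `None`, so writing `rain = rain.set_index(..., inplace=True)` would replace the data with `None`. A sketch on a hypothetical frame:

```python
import pandas as pd

rain = pd.DataFrame({'City': ['San Francisco', 'Los Angeles'],
                     'Month': ['Jan', 'Jan'],
                     'Rainfall': [4.5, 3.1]})

result = rain.set_index(rain['City'] + rain['Month'], inplace=True)
print(result)                                  # None: the change was applied to rain itself
print(rain.loc['Los AngelesJan', 'Rainfall'])  # rain now has the new labels
```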
