# Agenda

1. Recap
2. Address book
3. More with reading from and writing to files
4. Cleaning data with `nan` and interpolating
5. Analysis with data frames
    - Cutting and categorizing
    - Sorting
    - Grouping
    - Concatenating data frames together
    - Joining data frames
    

# Recap

When we use Pandas, we're mainly using two different data structures:

- Series, which is basically a 1D NumPy array with a nice set of wrappers around it.  Each series has a single dtype.  Pandas usually infers a sensible dtype, but you can set it explicitly with the `dtype` argument, just as you did with NumPy arrays.
- Data frame, which is basically a glorified 2D NumPy array.  Each column in a data frame is a separate series, which means that each column has its own dtype.
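
Here is a minimal sketch of both points: setting a dtype explicitly on a series, and seeing that each column of a data frame carries its own dtype. The `people` data frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# pandas infers an integer dtype from a list of integers
s = pd.Series([10, 20, 30])

# an explicit dtype, just as with np.array(..., dtype=...)
s_small = pd.Series([10, 20, 30], dtype=np.int8)
s_float = pd.Series([10, 20, 30], dtype=np.float64)

# each column of a data frame is its own series, with its own dtype
# (hypothetical data, for illustration only)
people = pd.DataFrame({'name': ['ann', 'bob', 'cat'],
                       'age': [25, 32, 47],
                       'height': [1.62, 1.80, 1.75]})

print(s.dtype, s_small.dtype, s_float.dtype)
print(people.dtypes)
```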

Both a series and a data frame have an *index*, which describes the rows. An index can contain any type of values at all -- integers, strings, dates, or anything else.  Integers and strings are most common.  The values can even repeat.
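
To see what a repeated index value means in practice, here is a small sketch with a made-up series in which the label `'a'` appears twice. Asking for a label that appears once gives back a single value; asking for a repeated label gives back a series with every match.

```python
import pandas as pd

# hypothetical series whose index repeats the label 'a'
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'a', 'c'])

one = s.loc['b']     # label appears once: a single value
many = s.loc['a']    # label appears twice: a series with both matches

print(one)
print(many)
```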

A data frame, in addition to an index, has a `columns` attribute, which holds the names of the columns.

We can retrieve values from either a series or a data frame by index label using `.loc`, or by numeric position using `.iloc`.
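
As a quick side-by-side sketch, the following uses a small made-up data frame to fetch the same row by label and by position. One difference worth remembering: a label slice with `.loc` includes its endpoint, while a position slice with `.iloc` excludes it, just like a regular Python slice.

```python
import pandas as pd

# hypothetical data frame with string labels for the rows
df = pd.DataFrame({'a': [10, 20, 30], 'b': [40, 50, 60]},
                  index=['x', 'y', 'z'])

by_label = df.loc['y']       # the row whose index label is 'y'
by_position = df.iloc[1]     # the row at position 1 (the second row)

# label slices include the endpoint; position slices exclude it
label_slice = df.loc['x':'y']
position_slice = df.iloc[0:2]

print(by_label)
print(label_slice)
```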

In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [3]:
df = DataFrame(np.random.randint(0, 1000, [5,6]),
              index=list('vwxyz'),
              columns=list('abcdef'))
df

     a    b    c    d    e    f
v  772  582  393  320   11  773
w  400  535  723  139  423  244
x  475  892  999  438  333  382
y  610  323  559  372  365  336
z  770  201   77   18  935  138


In [4]:
# I can retrieve an entire row via .loc and an index label

df.loc['x']

a    475
b    892
c    999
d    438
e    333
f    382
Name: x, dtype: int64

In [5]:
df.loc['x', 'd']   # retrieve row x, column d

438

In [6]:
df.loc['x', 'd'] = 12.34
df   # the dtype for d has changed - now it's np.float64

     a    b    c       d    e    f
v  772  582  393  320.00   11  773
w  400  535  723  139.00  423  244
x  475  892  999   12.34  333  382
y  610  323  559  372.00  365  336
z  770  201   77   18.00  935  138
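
A note on versions: assigning a float into an integer column relies on pandas upcasting the column for you. Recent pandas releases warn about this implicit upcast, and newer ones may reject it, depending on the version you have installed. A minimal sketch of the explicit alternative, using a small made-up data frame, is to convert the column to float first and then assign:

```python
import numpy as np
import pandas as pd

# hypothetical data frame with two integer columns
small = pd.DataFrame(np.arange(6).reshape(3, 2),
                     index=list('xyz'), columns=list('ab'))

# convert the column to float first, so the assignment below
# does not depend on implicit upcasting
small['b'] = small['b'].astype(np.float64)
small.loc['x', 'b'] = 12.34

print(small.dtypes)
```

Column `a` is untouched and keeps its integer dtype; only `b` becomes `float64`.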


In [7]:
df.dtypes  # show me all dtypes for all columns

a      int64
b      int64
c      int64
d    float64
e      int64
f      int64
dtype: object

In [8]:
# d is now a float64 column
# but what if I retrieve row x?

df.loc['x']

a    475.00
b    892.00
c    999.00
d     12.34
e    333.00
f    382.00
Name: x, dtype: float64
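
The row comes back as `float64` because a row is returned as a series, and a series has a single dtype. When the columns have different numeric dtypes, pandas picks one that can hold all of the values, so the integers are shown as floats. The data frame itself is not changed by this. A small sketch with made-up data:

```python
import pandas as pd

# hypothetical data frame: one integer column, one float column
df = pd.DataFrame({'a': [1, 2], 'd': [1.5, 2.5]}, index=['x', 'y'])

row = df.loc['x']    # a row mixes values from both columns
col = df['a']        # a single column keeps its own dtype

print(row.dtype)
print(col.dtype)
```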