# Agenda

1. Pandas in general
2. Series
    - Creating 
    - Retrieving from them
    - Methods
    - Working with `nan`
    - Broadcasting
    - Indexes
3. Data frames
    - Creating
    - Retrieving from them
    - Applying methods across all columns
4. Reading data from files
    - CSV
    - (Excel, a little)
    - JSON
    - Retrieving from the network

# Pandas data structures

There are two main data structures you need to know about with Pandas:

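- `Series` -- basically a 1D NumPy array, with a label (the index) attached to each value
- `DataFrame` -- basically a 2D NumPy array, with labels on both the rows and the columns

The rows of a data frame correspond to the rows of a 2D NumPy array.

Each column of a data frame is a `Series` object, so different columns can have different dtypes.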
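
To make the relationship between the two structures concrete, here is a minimal sketch (it assumes pandas is already installed). The column names `a` and `b` and their values are made up for illustration; they don't come from any real data set.

```python
import pandas as pd

# Hypothetical column names and values, chosen only for illustration
df = pd.DataFrame({"a": [10, 20, 30], "b": [1.5, 2.5, 3.5]})

col = df["a"]                  # selecting one column returns a Series
print(type(df).__name__)       # DataFrame
print(type(col).__name__)      # Series
print(df.ndim, col.ndim)       # 2 1
print(df.shape, col.shape)     # (3, 2) (3,)
```

Notice that each column keeps its own dtype: `a` holds integers and `b` holds floats.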

# Installing pandas

You can install pandas, like many other Python packages, with `pip` (a command-line program; run it in a terminal, not inside Python or Jupyter):

    python3 -m pip install -U pandas
    
Once it's installed, we need to import it into our program.

In [2]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [3]:
# let's create a Pandas Series

s = Series([10, 20, 30, 40, 50, 60, 70])

In [4]:
type(s)

pandas.core.series.Series

In [5]:
s

0    10
1    20
2    30
3    40
4    50
5    60
6    70
dtype: int64

In [6]:
# I can do many of the same things with a series that I did with NumPy arrays

s[0]

10

In [7]:
s[3]

40
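
One thing to keep in mind: `s[3]` works here like it does in NumPy because the series has the default index (0, 1, 2, ...), so the labels happen to match the positions. A small sketch of the explicit, position-based alternative, `.iloc`, using the same values as above:

```python
from pandas import Series

s = Series([10, 20, 30, 40, 50, 60, 70])

print(s[3])                   # 40 -- looks up the label 3
print(s.iloc[3])              # 40 -- looks up position 3
print(s.iloc[-1])             # 70 -- negative positions count from the end
print(s.iloc[1:4].tolist())   # [20, 30, 40]
```

With the default index the two agree; they can differ once a series has a custom index.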

In [8]:
s.sum()

280

In [9]:
s.mean()

40.0

In [10]:
# pass dtype to get floats instead of the default int64
s = Series([10, 20, 30, 40, 50, 60, 70], dtype=np.float64)

In [11]:
s

0    10.0
1    20.0
2    30.0
3    40.0
4    50.0
5    60.0
6    70.0
dtype: float64

In [12]:
# behind the scenes, the data is stored in a NumPy array
s.values

array([10., 20., 30., 40., 50., 60., 70.])
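
As a side note, `.values` still works, but pandas also offers the `to_numpy()` method for getting the underlying array. A short sketch, using the same float series as above:

```python
import numpy as np
from pandas import Series

s = Series([10, 20, 30, 40, 50, 60, 70], dtype=np.float64)

arr = s.to_numpy()                      # the same data, as a NumPy array
print(type(arr).__name__)               # ndarray
print(arr.dtype)                        # float64
print(np.array_equal(arr, s.values))    # True
```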

In [13]:
# I can create a Pandas Series based on a 1D NumPy array

np.random.seed(0)
s = Series(np.random.randint(0, 100, 10))

s

0    44
1    47
2    64
3    67
4    67
5     9
6    83
7    21
8    36
9    87
dtype: int64

In [14]:
s.min()

9

In [15]:
s.max()

87

In [16]:
s.mean()

52.5

In [17]:
s.std()

25.67424130654432
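
If you compare this with NumPy, you'll get a different answer. That's because `Series.std` computes the sample standard deviation by default (`ddof=1`), while `np.std` computes the population standard deviation (`ddof=0`). Here's a sketch that uses the ten values from the output above, typed in by hand so it doesn't depend on the random number generator:

```python
import numpy as np
from pandas import Series

# the values shown in the series above
s = Series([44, 47, 64, 67, 67, 9, 83, 21, 36, 87])

sample_std = s.std()                # pandas default: ddof=1
population_std = s.std(ddof=0)      # same calculation NumPy does by default
numpy_std = np.std(s.to_numpy())    # NumPy default: ddof=0

print(sample_std)
print(population_std)
print(np.isclose(population_std, numpy_std))   # True
```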

In [18]:
s.sum()

525

In [20]:
s.count()   # how many non-NaN values are there in this series?

10
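
In the series above, nothing is missing, so `count` just gives us the length. The difference shows up once there are `nan` values. A small sketch with made-up values:

```python
import numpy as np
from pandas import Series

# hypothetical values, with two of them missing
s = Series([10, np.nan, 30, np.nan, 50])

print(len(s))       # 5   -- total number of elements
print(s.count())    # 3   -- only the non-NaN elements
print(s.sum())      # 90.0 -- NaN values are skipped by default
print(s.mean())     # 30.0 -- computed over the 3 non-NaN values
```

Because of the `nan` values, the dtype here is `float64` even though the other numbers were written as integers.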

In [None]:
# can I create a bool