# Pandas Tutorial

## Starting with key imports

In [1]:
import numpy as np
import pandas as pd

from pandas import Series, DataFrame

## Series Data

In [2]:
obj = pd.Series([4, 7, -5, 3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

When displayed interactively, a Series shows its index on the left and its values on the right. Because we did not specify an index for the data, pandas creates a default one consisting of the integers 0 through N - 1, where N is the length of the data.

In [3]:
obj.array

<PandasArray>
[4, 7, -5, 3]
Length: 4, dtype: int64
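As a rough illustration of the default index described above, the sketch below inspects the index and the values separately. The names `default_index` and `values` are introduced here for illustration only.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# With no index given, pandas builds a RangeIndex running from 0 to N - 1.
default_index = obj.index
print(default_index)

# The values can also be pulled out as a plain NumPy array.
values = obj.to_numpy()
print(values)
```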

## Series with a label for each data point

In [4]:
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
obj2

d    4
b    7
a   -5
c    3
dtype: int64

In [5]:
obj2.index

Index(['d', 'b', 'a', 'c'], dtype='object')
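Once a Series has labels, they can be used to select or assign values. The following is a minimal sketch of that idea; the names `single` and `subset` are hypothetical and not part of the original notebook.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# A single label returns the scalar stored under it.
single = obj2["a"]

# A list of labels returns a new Series in the requested order.
subset = obj2[["c", "a", "d"]]

# Assigning through a label updates that entry of obj2.
obj2["d"] = 6
print(obj2)
```

Note that the list of labels is itself wrapped in square brackets, so the expression has two pairs of brackets.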

NumPy functions and NumPy-like operations, such as filtering with a boolean array, scalar multiplication, or applying math functions, preserve the index-value link:

In [6]:
obj2[obj2 > 0]

d    4
b    7
c    3
dtype: int64

In [7]:
obj2 * 2

d     8
b    14
a   -10
c     6
dtype: int64

In [8]:
np.exp(obj2)

d      54.598150
b    1096.633158
a       0.006738
c      20.085537
dtype: float64

Another way to think about a Series is as a fixed-length, ordered dictionary, as it is a mapping of index values to data values.  It can be used in many contexts where you might use a dictionary:

In [9]:
"b" in obj2

True

In [10]:
"e" in obj2

False

In [11]:
sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)
obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

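When building a Series from a dictionary, you can pass an `index` argument to control the order of the entries. Labels in the index that are missing from the dictionary get NaN, and dictionary keys that are not in the index are dropped: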

In [13]:
states = ["California", "Ohio", "Oregon", "Texas"]

obj4 = pd.Series(sdata, index=states)

obj4

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
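The output above can be checked piece by piece. The sketch below rebuilds `obj4` and tests each effect of the explicit index; the names `utah_kept`, `california_missing`, and `result_dtype` are introduced here for illustration.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]
obj4 = pd.Series(sdata, index=states)

# "Utah" is a key in the dictionary but not a label in `states`, so it is left out.
utah_kept = "Utah" in obj4

# "California" is a label in `states` but not a key in the dictionary, so it holds NaN.
california_missing = pd.isna(obj4["California"])

# NaN is a floating-point value, so the integer data is stored as float64.
result_dtype = obj4.dtype

print(utah_kept, california_missing, result_dtype)
```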

### Missing data

The `isna` and `notna` functions in pandas should be used to detect missing data:

In [16]:
pd.isna(obj4)  # top-level function form
obj4.isna()    # equivalent method form; only this last result is displayed

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

In [17]:
pd.notna(obj4)  # top-level function form
obj4.notna()    # equivalent method form; only this last result is displayed

California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool
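The boolean masks returned by `isna` and `notna` are typically used to act on the missing entries. As a hedged sketch of two common follow-ups, the example below filters the missing value out and, alternatively, fills it in. `dropna` and `fillna` are standard Series methods, but they do not appear in the original notebook, and the choice of 0 as the fill value is only an assumption for illustration.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj4 = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])

# Keep only the entries that have a value, using the boolean mask directly.
present = obj4[obj4.notna()]

# dropna() applies the same filter in one call.
dropped = obj4.dropna()

# fillna() replaces missing entries with a chosen value instead of removing them.
filled = obj4.fillna(0)

print(present)
print(filled)
```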

A Series can be converted back to a dictionary with its `to_dict` method:

In [12]:
obj3.to_dict()


{'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
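As a quick check that the conversion loses nothing in this case, the sketch below round-trips `obj3` through a dictionary and back. The names `as_dict` and `rebuilt` are hypothetical.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)

# Series -> dict: index labels become keys, data values become values.
as_dict = obj3.to_dict()

# dict -> Series: keys become the index again, in insertion order.
rebuilt = pd.Series(as_dict)

print(rebuilt.equals(obj3))
```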

### Naming

In [19]:
obj4.name = "population"
obj4.index.name = "State"
obj4

State
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: population, dtype: float64
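One place where these names become visible is when a Series is turned into a table. The sketch below assumes you want the index as a regular column and uses `reset_index` for that; the name `frame` is introduced here for illustration.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj4 = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])
obj4.name = "population"
obj4.index.name = "State"

# reset_index() moves the index into a column named after index.name,
# and the values into a column named after the Series name.
frame = obj4.reset_index()
print(frame.columns.tolist())
```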

## DataFrame