# Pandas Exercises

#### Make sure you have a clear understanding of Pandas' functionality, as well as why and when you would want to use it! Feel free to explore more of the library on your own too.

### Import NumPy and Pandas

In [None]:
import numpy as np
import pandas as pd

### Series
A Series is similar to a one-dimensional NumPy array, except that it can be indexed by labels as well as by integer position. A Series can also store any Python object.
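
As a small sketch of both points, the snippet below reads one element by label and by position, and then stores non-numeric objects in a Series. The names `prices` and `mixed` and their values are made up for illustration and are not part of the exercises.

```python
import pandas as pd

# Hypothetical example data, used only for illustration
prices = pd.Series([1.5, 0.25, 3.0], index=['apple', 'banana', 'cherry'])

by_label = prices['banana']      # label-based access
by_position = prices.iloc[1]     # position-based access to the same element

# With object dtype, a Series can hold arbitrary Python objects
mixed = pd.Series([len, 'text', 3.14], index=['func', 'string', 'number'])
print(by_label, by_position, mixed.dtype)
```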

A Series can be created from several data structures:

In [None]:
# Index labels
labels = ['one', 'two', 'three']

In [None]:
# Standard List
arr = [5, 10, 15]

pd.Series(data=arr, index=labels)

In [None]:
# NumPy Array
np_arr = np.array([2, 4, 6])

pd.Series(data=np_arr, index=labels)

In [None]:
# Dictionary
d = {'a': 5, 'b': 10, 'c': 15}

pd.Series(data=d)

### Indexing

In [None]:
ser1 = pd.Series([4, 6, 2, 8], ['Apples', 'Bananas', 'Oranges', 'Mango'])
ser1['Bananas']

In [None]:
ser2 = pd.Series([8, 2, 6, 4], ['Apples', 'Bananas', 'Oranges', 'Pineapple'])
ser2['Pineapple']

In [None]:
ser1 + ser2
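
Arithmetic between two Series aligns on index labels, not on position, and a label present in only one operand produces NaN. The sketch below repeats the two Series from this section and also shows `Series.add` with `fill_value=0`, one possible way to treat a missing label as zero; whether that is appropriate depends on the data.

```python
import pandas as pd

ser1 = pd.Series([4, 6, 2, 8], ['Apples', 'Bananas', 'Oranges', 'Mango'])
ser2 = pd.Series([8, 2, 6, 4], ['Apples', 'Bananas', 'Oranges', 'Pineapple'])

total = ser1 + ser2                     # 'Mango' and 'Pineapple' become NaN
filled = ser1.add(ser2, fill_value=0)   # a missing label counts as 0 instead

print(total)
print(filled)
```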

### DataFrames

Pandas DataFrames are dictionary-like containers for Series objects, with labeled axes (rows and columns). The DataFrame is the primary data structure in Pandas.
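
To make the "container of Series" idea concrete, here is a minimal sketch that builds a DataFrame from a dictionary of Series and pulls one column back out. The names `col_a`, `col_b`, and `small_df` are illustrative only.

```python
import pandas as pd

# Hypothetical columns, each a Series sharing the same row labels
col_a = pd.Series([1, 2, 3], index=['r1', 'r2', 'r3'])
col_b = pd.Series([10.0, 20.0, 30.0], index=['r1', 'r2', 'r3'])

# Dictionary keys become column names
small_df = pd.DataFrame({'c1': col_a, 'c2': col_b})

# Selecting a single column hands back a Series
col = small_df['c2']
print(type(col).__name__, list(small_df.columns), list(small_df.index))
```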

In [None]:
from numpy.random import randn
np.random.seed(5)

In [None]:
df = pd.DataFrame(data=randn(5, 4),
                  index='r1 r2 r3 r4 r5'.split(),
                  columns='c1 c2 c3 c4'.split())
df

### Indexing

In [None]:
df['c4']

In [None]:
# Indexing multiple columns requires a list
df[['c1', 'c2']]

#### Selecting Rows by Label

In [None]:
df.loc['r2']

#### Selecting Rows by Integer Position

In [None]:
df.iloc[2]

#### Selecting Rows and Columns

In [None]:
df.loc['r1', 'c1']

In [None]:
df.loc[['r1', 'r2'], ['c1', 'c2']]
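
The same row-and-column selection can also be written with integer positions through .iloc, or with label slices through .loc. The sketch below uses a small stand-in frame (`demo`, an illustrative name) with predictable values instead of the random `df`.

```python
import numpy as np
import pandas as pd

# Stand-in frame with known values 0..11
demo = pd.DataFrame(np.arange(12).reshape(3, 4),
                    index=['r1', 'r2', 'r3'],
                    columns=['c1', 'c2', 'c3', 'c4'])

by_label = demo.loc[['r1', 'r2'], ['c1', 'c2']]   # lists of labels
by_position = demo.iloc[0:2, 0:2]                 # positional slice, end excluded
label_slice = demo.loc['r1':'r2', 'c1':'c2']      # label slice, end included

print(by_label)
```

Note the asymmetry: positional slices stop before the end position, while label slices include the end label.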

### Conditional Selection

In [None]:
df > 0

In [None]:
df[df>0]

In [None]:
df['c2'] > 0

In [None]:
df[df['c2']>0]

In [None]:
df[df['c2']>0]['c3']

In [None]:
df[df['c2']>0][['c2', 'c3']]
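
Filtering rows and then selecting columns in a second pair of brackets is a two-step (chained) selection. A single .loc call that takes the Boolean mask and the column list together returns the same rows and columns, and is usually the safer form if the result will later be assigned to. A minimal sketch on a stand-in frame (`demo` and its values are illustrative):

```python
import pandas as pd

demo = pd.DataFrame({'c2': [1.0, -2.0, 3.0],
                     'c3': [10.0, 20.0, 30.0]},
                    index=['r1', 'r2', 'r3'])

mask = demo['c2'] > 0                    # Boolean Series, one value per row

chained = demo[mask][['c2', 'c3']]       # filter rows, then pick columns
single = demo.loc[mask, ['c2', 'c3']]    # rows and columns in one step

print(single)
```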

For multiple conditions, combine them with & (and) or | (or), and wrap each condition in parentheses.

In [None]:
df[(df['c1']<0) & (df['c3']>0)]
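
The parentheses matter because & and | bind more tightly than comparison operators, and the Python keywords `and` / `or` raise a ValueError on a Series. The sketch below shows &, |, and ~ (not) side by side on a stand-in frame with hand-picked values (`demo` is an illustrative name).

```python
import pandas as pd

demo = pd.DataFrame({'c1': [-1.0, 2.0, -3.0, 4.0],
                     'c3': [5.0, -6.0, -7.0, 8.0]},
                    index=['r1', 'r2', 'r3', 'r4'])

neg_c1 = demo['c1'] < 0
pos_c3 = demo['c3'] > 0

both = demo[neg_c1 & pos_c3]          # rows where both conditions hold
either = demo[neg_c1 | pos_c3]        # rows where at least one holds
neither = demo[~(neg_c1 | pos_c3)]    # rows where neither holds

print(list(both.index), list(either.index), list(neither.index))
```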

#### Let's replace the index!

In [None]:
# Returns a new DataFrame; df itself is left unchanged
df.reset_index()
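
By default, reset_index returns a new DataFrame and moves the old index into a column named 'index'; passing drop=True discards the old index instead. A short sketch on a stand-in frame (`demo` is an illustrative name):

```python
import pandas as pd

demo = pd.DataFrame({'c1': [1, 2]}, index=['r1', 'r2'])

moved = demo.reset_index()             # old index kept as an 'index' column
dropped = demo.reset_index(drop=True)  # old index discarded

print(list(moved.columns), list(dropped.index), list(demo.index))
```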

In [None]:
# df has five rows, so five labels are needed ('WY' fills the missing slot)
new_index = 'CA NY WY OR CO'.split()
df['States'] = new_index

In [None]:
df

In [None]:
df.set_index('States', inplace=True)
df

### Creating New Columns

In [None]:
# One random value per row
df['c5'] = randn(5)
df

### Removing a column

In [None]:
# axis=0 for rows, axis=1 for columns
df.drop('c5', axis=1, inplace=True)

In [None]:
df

In [None]:
# The index now holds the state labels, so rows are dropped by state
df.drop('CO', axis=0, inplace=True)

In [None]:
df
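
As an alternative to remembering which axis number is which, drop also accepts the `columns=` and `index=` keywords. The sketch below uses a stand-in frame (`demo` is an illustrative name) and leaves out inplace=True, so each call returns a new DataFrame.

```python
import pandas as pd

demo = pd.DataFrame({'c1': [1, 2, 3], 'c5': [7, 8, 9]},
                    index=['r1', 'r2', 'r3'])

no_c5 = demo.drop(columns='c5')       # same as demo.drop('c5', axis=1)
no_r3 = demo.drop(index='r3')         # same as demo.drop('r3', axis=0)
axis_form = demo.drop('c5', axis=1)

print(list(no_c5.columns), list(no_r3.index))
```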

### Multi-Indexing / Hierarchy

In [None]:
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)

In [None]:
hier_index

In [None]:
df = pd.DataFrame(data=randn(6, 2), index=hier_index, columns=['A', 'B'])
df

In [None]:
df.loc['G1']

In [None]:
df.loc['G2']

.loc[] is used to index rows by label (note the square brackets: it is an indexer, not a method). For columns, plain [] will suffice. Remember, axis=0 for rows, axis=1 for columns.
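
With a MultiIndex, a tuple passed to .loc addresses both index levels at once, and a column label can follow it. The sketch below uses a stand-in frame with known values (`demo` is an illustrative name) so the results are easy to check.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('G1', 1), ('G1', 2), ('G2', 1), ('G2', 2)])
demo = pd.DataFrame(np.arange(8).reshape(4, 2),
                    index=index, columns=['A', 'B'])

outer = demo.loc['G1']             # all rows under 'G1', indexed by inner level
row = demo.loc[('G1', 2), :]       # one row, returned as a Series
cell = demo.loc[('G2', 1), 'B']    # a single value
col = demo['A']                    # a column, still carrying the MultiIndex

print(outer)
```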

In [None]:
df.index.names = ['Group', 'Num']

In [None]:
df

In [None]:
df.loc['G1']

An alternative to .loc[] is the .xs() method (cross-section). Functionality is mostly the same, but .xs() can also select on an inner index level through its level argument.

In [None]:
df.xs('G1')

In [None]:
# Pass a tuple for a multi-level key; recent pandas versions reject a list here
df.xs(('G1', 1))

In [None]:
df.xs(1, level='Num')
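
By default, xs removes the level it selected on from the result. Passing drop_level=False keeps the full MultiIndex, which can help when the selection is later combined with other rows. A minimal sketch on a stand-in frame with named levels (`demo` is an illustrative name):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('G1', 1), ('G1', 2), ('G2', 1), ('G2', 2)],
    names=['Group', 'Num'])
demo = pd.DataFrame(np.arange(8).reshape(4, 2),
                    index=index, columns=['A', 'B'])

inner = demo.xs(1, level='Num')                   # 'Num' level is dropped
kept = demo.xs(1, level='Num', drop_level=False)  # both levels are kept

print(inner)
print(kept)
```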