## S14a: Lab 2 - Pandas

Pandas plays nicely with NumPy, making for a smooth transition from our previous work with n-dimensional arrays. In this notebook we will use two structures for combining data: Series and DataFrames.

This notebook offers just a glimpse of Pandas before we move on to applying it in an example. It's worth learning more outside of lab: if you've only got 10 minutes, go [here](https://pandas.pydata.org/docs/getting_started/10min.html); otherwise check out the learning [options](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html). Much of this can be learned piecemeal through practical experience cleaning and pre-processing data.

### Building a data structure: starting with [Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html?highlight=series#pandas.Series)

In [1]:
# First import libraries

import numpy as np
import pandas as pd

In [3]:
# Scenario 1: Rustle up some simple key-value mock data
keys = ['rad', 'bad', 'sad', 'mad', 'fad']
values = np.random.randint(1, 11, len(keys))
print(values)

# Wrap the values in a pandas Series; with no index given, pandas labels them 0..n-1
data_auto_i = pd.Series(values)
print('Auto increment:')
print(data_auto_i)
# Pass the keys as the index so each value gets its own label
data_manu_i = pd.Series(values, index=keys)
print('\nManual increment:')
print(data_manu_i)

[5 3 9 8 1]
Auto increment:
0    5
1    3
2    9
3    8
4    1
dtype: int64

Manual increment:
rad    5
bad    3
sad    9
mad    8
fad    1
dtype: int64


In [4]:
# Find particular value by index/key

print(data_auto_i[2])
print(data_manu_i['sad'])

9
9
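
The lookups above rely on plain square brackets, which pandas interprets as either a label or a position depending on the index. As a minimal sketch, the same lookups can be made explicit with `.loc` (by label) and `.iloc` (by position). The values here are fixed by hand rather than drawn at random, and the name `scores` is only for illustration.

```python
import pandas as pd

# Hypothetical fixed values so the result is reproducible
scores = pd.Series({'rad': 5, 'bad': 3, 'sad': 9, 'mad': 8, 'fad': 1})

by_label = scores.loc['sad']    # look up by index label
by_position = scores.iloc[2]    # look up by integer position (third entry)

print(by_label, by_position)
```

Both lookups land on the same entry because `'sad'` is the third label in the index.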


In [5]:
# Scenario 2: Rustle up data using dictionaries

arr_dicts = [
    {'name': 'Zona', 'color': 'red', 'intensity': np.random.randint(0, 256)},
    {'name': 'Aleksander', 'color': 'green', 'intensity': np.random.randint(0, 256)},
    {'name': 'Fred', 'color': 'blue', 'intensity': np.random.randint(0, 256)},
    {'name': 'Brian', 'color': 'cyan', 'intensity': np.random.randint(0, 256)},
    {'name': 'Jared', 'color': 'magenta', 'intensity': np.random.randint(0, 256)}
]
teamdata1 = pd.Series(arr_dicts)
teamdata1

0    {'name': 'Zona', 'color': 'red', 'intensity': ...
1    {'name': 'Aleksander', 'color': 'green', 'inte...
2    {'name': 'Fred', 'color': 'blue', 'intensity':...
3    {'name': 'Brian', 'color': 'cyan', 'intensity'...
4    {'name': 'Jared', 'color': 'magenta', 'intensi...
dtype: object

In [6]:
teamdata1[0]

{'color': 'red', 'intensity': 138, 'name': 'Zona'}

### Excel-esque with [DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html?highlight=dataframe#pandas.DataFrame)

In [7]:
# Row and column labels
rows = np.array(['1', '2', '3'])
cols = np.array(['a', 'b', 'c', 'd', 'e'])

# Build a DataFrame of random integers from 1 to 100, labelled with the arrays above
dataframe = pd.DataFrame(np.random.randint(1, 101, (len(rows), len(cols))), index=rows, columns=cols)
dataframe

    a   b   c   d   e
1  70  75  57  80  80
2  66   5   6  95  96
3  26  13  42  91  66
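
A DataFrame can also be built from a list of dictionaries like the one in Scenario 2, in which case each dictionary becomes a row and each key becomes a column. The sketch below assumes a shortened list with fixed intensities (the notebook draws them at random); `records` and `team_df` are illustrative names.

```python
import pandas as pd

# Hypothetical records with hand-picked intensities for reproducibility
records = [
    {'name': 'Zona', 'color': 'red', 'intensity': 138},
    {'name': 'Fred', 'color': 'blue', 'intensity': 42},
]

# Each dict becomes a row; the dict keys become the column labels
team_df = pd.DataFrame(records)

print(team_df.columns.tolist())
print(team_df['intensity'].max())
```

Compared with a Series of dictionaries, this layout lets you work on a whole column at once, for example to find the largest intensity.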


In [15]:
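# Grab a column by its label

print('COL:')
print(dataframe['c'])

# Grab a single value by row label and column label

print('\nPOS:')
print(dataframe.loc['2', 'c'])

# Grab a row by its integer position (second row)
print('\nrow')
print(dataframe.iloc[1, :])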

COL:
1    57
2     6
3    42
Name: c, dtype: int64

POS:
6

row
a    66
b     5
c     6
d    95
e    96
Name: 2, dtype: int64
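
To make the difference between label-based and position-based access concrete, here is a small sketch using the same values as the output shown earlier, fixed by hand so the results are reproducible. The variable names are illustrative.

```python
import pandas as pd

# Fixed copy of the example values (the notebook generates these at random)
df = pd.DataFrame(
    [[70, 75, 57, 80, 80], [66, 5, 6, 95, 96], [26, 13, 42, 91, 66]],
    index=['1', '2', '3'],
    columns=['a', 'b', 'c', 'd', 'e'],
)

value_by_label = df.loc['2', 'c']    # row label '2', column label 'c'
value_by_position = df.iloc[1, 2]    # second row, third column
row_by_label = df.loc['2']           # the whole row, returned as a Series

print(value_by_label, value_by_position)
print(row_by_label.tolist())
```

Note that the row labels here are the strings `'1'`, `'2'`, `'3'`, so `.loc['2']` and `.iloc[1]` refer to the same row.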


In [22]:
# !!!YOUR TURN!!!

# Slicing! Grab the first 2 rows and first 2 columns from dataframe
print(dataframe.iloc[:2, :2])


    a   b
1  70  75
2  66   5
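
One detail worth knowing when slicing: position-based slices with `.iloc` exclude the end position, as in NumPy, while label-based slices with `.loc` include the end label. The sketch below assumes a smaller, hand-fixed DataFrame to show that both forms can select the same block.

```python
import pandas as pd

# Small fixed DataFrame with string row labels, as in the lab
df = pd.DataFrame(
    [[70, 75, 57], [66, 5, 6], [26, 13, 42]],
    index=['1', '2', '3'],
    columns=['a', 'b', 'c'],
)

by_position = df.iloc[:2, :2]          # end positions are excluded
by_label = df.loc['1':'2', 'a':'b']    # end labels are included

print(by_position.equals(by_label))
```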


### Change gears

There is a lot to learn about pandas, but you will learn more by example in the next notebook.