# DMA Lab: Pandas Basics Demo Session

## Pandas

* *pandas* is a Python library for data analysis. 
* It offers a number of data exploration, cleaning and transformation operations that are critical in working with data in Python. 

* *pandas* builds on *numpy*, providing easy-to-use data structures and data manipulation functions with integrated indexing.

* The main data structures *pandas* provides are *Series* and *DataFrame*.


**Recommended Resources:**
* *pandas* Documentation: http://pandas.pydata.org/pandas-docs/stable/
* *Python for Data Analysis* by Wes McKinney
* *Python Data Science Handbook* by Jake VanderPlas

Let's get started with our first *pandas* notebook!

### Import Pandas and Numpy Libraries

In [None]:
import pandas as pd
import numpy as np

### Introduction to pandas Data Structures

* *pandas* has two main data structures: *Series* (one-dimensional) and *DataFrame* (two-dimensional).

### pandas Series
- *pandas Series* is a one-dimensional labeled array.

### Creating a Series by passing an array of values, letting pandas create a default integer index

In [None]:
data = np.array([1, 3, 5, np.nan, 6, 8])
s = pd.Series(data)
s

In [None]:
s.index

In [None]:
print(s[:2])
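
As a small sketch of what the cells above show, the example below rebuilds the same Series and checks three things: the `np.nan` entry means the values are stored as `float64`, the default index is a `RangeIndex`, and the first two elements can be taken by position. The variable names (`dtype_name`, `missing_count`, and so on) are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same data as in the cell above; np.nan is a float, so the array is float64.
s = pd.Series(np.array([1, 3, 5, np.nan, 6, 8]))

dtype_name = str(s.dtype)                             # dtype of the stored values
index_is_range = isinstance(s.index, pd.RangeIndex)   # default index created by pandas
missing_count = int(s.isna().sum())                   # number of missing entries
first_two = s.iloc[:2].tolist()                       # first two elements by position

print(dtype_name, index_is_range, missing_count, first_two)
```

Using `.iloc` for positional access makes the intent explicit and behaves the same way regardless of what the index labels are.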

### Creating a Series with string labels as the index

In [None]:
ser = pd.Series([100, 'foo', 300, 'bar', 500], index=['tom', 'bob', 'nancy', 'dan', 'eric'])

In [None]:
print(ser)

In [None]:
ser.index

In [None]:
ser.loc[['nancy', 'bob']]  # label-based indexer: select by index label

In [None]:
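ser.iloc[[4, 3, 1]]  # select by position; plain ser[[4, 3, 1]] is ambiguous on a label index and deprecated in recent pandas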

In [None]:
ser.iloc[2]  # integer-position indexer: select by position
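
The difference between `.loc` and `.iloc` is easiest to see side by side. The sketch below reuses the `ser` Series from above; the result names (`by_label`, `label_slice`, and so on) are made up for illustration. One detail worth noticing is that a label slice includes its end label, while a positional slice excludes its stop position.

```python
import pandas as pd

ser = pd.Series([100, 'foo', 300, 'bar', 500],
                index=['tom', 'bob', 'nancy', 'dan', 'eric'])

by_label = ser.loc['nancy']          # look up by index label
by_position = ser.iloc[2]            # look up by integer position (0-based)

label_slice = ser.loc['bob':'dan']   # label slice: both endpoints are included
position_slice = ser.iloc[1:3]       # positional slice: stop position is excluded

print(by_label, by_position)
print(list(label_slice.index))
print(list(position_slice.index))
```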

In [None]:
'bob' in ser

In [None]:
ser

In [None]:
ser * 2

In [None]:
ser[['nancy', 'eric']] ** 2
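
Because `ser` mixes integers and strings, pandas stores it with the generic `object` dtype and applies arithmetic element by element using ordinary Python semantics. The sketch below illustrates this under that assumption: multiplying by 2 doubles the numbers and repeats the strings, while squaring is only applied to the two numeric entries, since Python strings do not support `**`. The names `doubled` and `squared` are illustrative.

```python
import pandas as pd

ser = pd.Series([100, 'foo', 300, 'bar', 500],
                index=['tom', 'bob', 'nancy', 'dan', 'eric'])

# Element-wise multiplication: ints are doubled, strings are repeated.
doubled = ser * 2

# Select only the numeric entries by label before squaring.
squared = ser[['nancy', 'eric']] ** 2

print(ser.dtype)
print(doubled.tolist())
print(squared.tolist())
```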

### pandas DataFrame
- *pandas DataFrame* is a 2-dimensional labeled data structure.

### Create DataFrame from dictionary of Python Series

In [None]:
d = {'one' : pd.Series([100., 200., 300.], index=['apple', 'ball', 'clock']),
     'two' : pd.Series([111., 222., 333., 4444.], index=['apple', 'ball', 'cerill', 'dancy'])}
print(d)

In [None]:
df = pd.DataFrame(d)
df

In [None]:
df.index

In [None]:
df.columns

In [None]:
pd.DataFrame(d, index=['dancy', 'ball', 'apple'])

In [None]:
pd.DataFrame(d, index=['dancy', 'ball', 'apple'], columns=['two', 'five'])
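
The cells above rely on index alignment: when a DataFrame is built from a dictionary of Series, the row index is the union of the Series indexes, and any label missing from a Series becomes `NaN` in that column. The following sketch checks this on the same dictionary `d`; the names `full` and `subset` are illustrative only.

```python
import pandas as pd

d = {'one': pd.Series([100., 200., 300.], index=['apple', 'ball', 'clock']),
     'two': pd.Series([111., 222., 333., 4444.],
                      index=['apple', 'ball', 'cerill', 'dancy'])}

# Row labels are the union of both Series indexes.
full = pd.DataFrame(d)

# Passing index/columns selects (and orders) rows and columns;
# a column name with no data, such as 'five', is filled with NaN.
subset = pd.DataFrame(d, index=['dancy', 'ball', 'apple'], columns=['two', 'five'])

print(full)
print(subset)
```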

### Create DataFrame from list of Python dictionaries

In [None]:
data = [{'alex': 1, 'joe': 2}, {'ema': 5, 'dora': 10, 'alice': 20}]
data

In [None]:
pd.DataFrame(data)

In [None]:
pd.DataFrame(data, index=['orange', 'red'])

In [None]:
pd.DataFrame(data, columns=['joe', 'dora','alice'])
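
When a DataFrame is built from a list of dictionaries, each dictionary becomes one row and the dictionary keys become column names; keys that are absent from a row show up as `NaN`. The sketch below verifies this on the same `data` list, with illustrative variable names (`by_rows`, `labeled`, `selected`).

```python
import pandas as pd

data = [{'alex': 1, 'joe': 2}, {'ema': 5, 'dora': 10, 'alice': 20}]

by_rows = pd.DataFrame(data)                                   # default integer row labels
labeled = pd.DataFrame(data, index=['orange', 'red'])          # custom row labels
selected = pd.DataFrame(data, columns=['joe', 'dora', 'alice'])  # keep only these columns

print(by_rows)
print(labeled)
print(selected)
```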

### Basic DataFrame operations

In [None]:
df

In [None]:
df['one']

In [None]:
df['three'] = df['one'] * df['two']
df

In [None]:
df['flag'] = df['one'] > 250
df

In [None]:
three = df.pop('three')

In [None]:
three

In [None]:
df

In [None]:
del df['two']

In [None]:
df
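
The two removal styles used above differ in what they give back. As a minimal sketch on a small made-up frame (the column values here are hypothetical), `pop` removes a column and returns it as a Series, `del` removes a column and returns nothing, and `drop` leaves the original untouched and returns a new DataFrame.

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame({'one': [1., 2.], 'two': [3., 4.], 'three': [5., 6.]})

popped = df.pop('three')            # removes 'three' in place and returns it
del df['two']                       # removes 'two' in place, returns nothing
remaining = list(df.columns)

dropped = df.drop(columns=['one'])  # returns a new frame; df itself is unchanged
still_there = list(df.columns)

print(popped.tolist(), remaining, list(dropped.columns), still_there)
```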

In [None]:
df.insert(2, 'copy_of_one', df['one'])
df

In [None]:
df['one_upper_half'] = df['one'][:2]
df
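
The last two cells can be summarized in one sketch: `insert` places a new column at a given position, while plain assignment appends it at the end and aligns the assigned Series on the row index, so rows without a matching label receive `NaN`. The frame below is a reduced version of `df` rebuilt for illustration, and the insert position `1` is chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'one': pd.Series([100., 200., 300.],
                                    index=['apple', 'ball', 'clock'])})

df['flag'] = df['one'] > 250                 # boolean column from a comparison
df.insert(1, 'copy_of_one', df['one'])       # new column at position 1
df['one_upper_half'] = df['one'].iloc[:2]    # only 'apple' and 'ball' have values

column_order = list(df.columns)
print(df)
```

Only the first two labels exist in the assigned slice, so the `clock` row of `one_upper_half` is `NaN` after alignment.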

## Reference on Pandas for more details
* *pandas* Documentation: http://pandas.pydata.org/pandas-docs/stable/