## This Jupyter Notebook is Brought to You by Doug Purcell
- software engineer, corporate trainer, and 5X author in SF
- My latest book is on Amazon, titled "Code Cool Stuff With Python"
- [Book Available in e-book/paperback editions](https://www.amazon.com/Code-Cool-Stuff-Python-Purcell/dp/0997326271/ref=tmm_pap_swatch_0?_encoding=UTF8&qid=&sr=)
![image info](book_cover.jpg)   

### Data Manipulation In Python Using Pandas 🐼🐼🐼
- Learn how to install the pandas library
- Learn the core data structures and their differences 
- Learn how to source the web for data sets
- Learn how to read data sets using pandas
- Learn functions in pandas for getting quick stats 
- Learn how to access the rows/columns of a data set
- Learn how to create charts in pandas
- Learn how to emulate the Excel pivot table in pandas 

### How to Install Pandas 

- pip install pandas
- conda install pandas 
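
After either command finishes, a quick way to confirm the install is to import the library and print its version. A minimal check (the variable name `installed_version` is only for illustration):

```python
import pandas as pd

# The version string depends on what pip or conda installed
installed_version = pd.__version__
print(installed_version)
```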

## The Two Primary Data Structures in Pandas
- Series: A 1D array-like structure with an index, where all values share one data type.
- DataFrame: A 2D labeled table whose columns can each hold a different data type.
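
The difference is easiest to see by inspecting dtypes. A minimal sketch, where the names `numbers` and `table` and their values are made up for illustration:

```python
import pandas as pd

# A Series reports a single dtype for all of its values
numbers = pd.Series([1, 2, 3])
print(numbers.dtype)

# A DataFrame keeps a separate dtype for each column
table = pd.DataFrame({"name": ["apple", "guava"], "price": [1.5, 2.0]})
print(table.dtypes)
```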

# Series Data Structure 
-  s = pd.Series(data, index=index)
- data can be many different things
  - A Python dict
  - An ndarray
  - A list 

## Series Example # 1

In [None]:
import numpy as np
import pandas as pd
s = pd.Series(np.random.rand(5), index=['a', 'b', 'c', 'd', 'e'])
s

## Series Example # 2

In [None]:
pd.Series([1, 2, 3], index=['a', 'b', 'c'])

## Series Example # 3

In [None]:
fruits = {'a': 'apple', 'c': 'coconut', 'g': 'guava', 'p': 'pineapple'}
series = pd.Series(fruits)
series

# Now It's Time to Get to The DataFrame

In [None]:
from IPython.display import Image
Image(filename="giphy.gif")

## The Class Details for a DataFrame
- class pandas.DataFrame(data=None, index: Optional[Collection] = None, columns: Optional[Collection] = None, dtype: Union[str, numpy.dtype, ExtensionDtype, None] = None, copy: bool = False)
- data: can be an ndarray, Iterable, dict, or DataFrame
- index: optional; defaults to a RangeIndex when omitted
- columns: labels to use for the resulting frame
- dtype: a single data type to force on the data
- copy: whether to copy data from the inputs 
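
A small sketch of these parameters in action. The array `values` and the column labels are hypothetical, chosen only to show the default index and the effect of `dtype` and `copy`:

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4]])

# No index given, so pandas falls back to a RangeIndex (0, 1, ...)
default_index = pd.DataFrame(values, columns=["left", "right"])
print(default_index.index)

# dtype forces one type on every column; copy=True detaches the frame from `values`
as_float = pd.DataFrame(values, columns=["left", "right"], dtype="float64", copy=True)
values[0, 0] = 99   # changing the source array...
print(as_float)     # ...does not change the copied frame
```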

## DataFrame Example # 1

In [None]:
# building a dataframe from a dictionary
df = pd.DataFrame({'evens': [2, 4, 6, 8], 'odds':[1, 3, 5, 7]})
df

## DataFrame Example # 2

In [None]:
# Creating a dataframe from a numpy ndarray

df2 = pd.DataFrame(np.array([[2, 4, 6], [8, 10, 12], [14, 16, 18]]),  columns=['a', 'b', 'c'], index=['X1', 'X2', 'X3'])
df2

## DataFrame Example # 3

In [None]:
# Randomly create a 10 x 4 dataframe of integers from 1 to 99
import numpy as np
import pandas as pd
matrix = pd.DataFrame(np.random.randint(1, 100, size=(10, 4)), columns=list('ABCD'))
matrix


## Statistical Functions in Pandas



In [None]:
# descriptive stats: statistics that summarize a data set
# describe(): quick summary of various stats
# get count, mean, std, min, percentiles, and max of all numeric columns
matrix.describe()

In [None]:
# corr(): compute pairwise correlation of columns 
matrix.corr()

In [None]:
# returns max from each column in the entire DataFrame 
matrix.max()

In [None]:
# returns min from each column in the entire DataFrame
matrix.min()

In [None]:
# gets the median from each column
matrix.median()

In [None]:
# gets the mode(s) of each column; ties produce extra rows
matrix.mode()
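
describe() bundles several of the functions above into one table. The check below uses a fixed, hypothetical frame called `scores` instead of the random `matrix`, so the numbers are predictable:

```python
import pandas as pd

scores = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

summary = scores.describe()

# Each row of the summary matches the stand-alone function
print(summary.loc["max"])   # same values as scores.max()
print(summary.loc["50%"])   # the 50th percentile is the median
print(scores.corr())        # B is exactly 10 * A, so the two columns are perfectly correlated
```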

## How to Index and Select data from a DataFrame
There are various ways to access data in a DataFrame.
Let's look at how to access the rows and columns of a table.


In [None]:
# access the columns of a DataFrame
matrix.columns


In [None]:
# use subscript notation to access a single column
matrix['A']

In [None]:
# combine that with some of the statistical functions we learned
matrix['A'].max()

In [None]:
# sum all of the values of A
matrix['A'].sum()

In [None]:
# count the non-null values in A
matrix['A'].count()

In [None]:
# combining multiple columns together
# yes, you will need to use double square brackets 
matrix[['A', 'B']]

In [None]:
# you can perform mathematical operations on 
# columns just like you would on number types 
# For example, you can add, subtract, divide, or multiply columns 


matrix['A'] - matrix['B']

In [None]:
# multiply two columns 
matrix['C'] * matrix['D']

In [None]:
# divide two columns
matrix['A'] / matrix['B']
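
The results of column arithmetic can be stored back on the frame. A minimal sketch on a hypothetical `prices` frame, showing both in-place assignment and `assign()`:

```python
import pandas as pd

prices = pd.DataFrame({"A": [10, 20, 30], "B": [1, 2, 3]})

# Arithmetic is element-wise and aligned on the row index
prices["difference"] = prices["A"] - prices["B"]

# assign() returns a new frame and leaves `prices` unchanged
with_ratio = prices.assign(ratio=prices["A"] / prices["B"])
print(with_ratio)
```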

In [None]:
# accessing the rows
# use the iloc indexer (position-based)

matrix.iloc[0:5] # gets the first 5 rows


In [None]:
# access every second row
matrix.iloc[::2]

In [None]:
# reverse the rows in the dataframe
matrix[::-1]
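
iloc selects by position; its label-based counterpart is loc. A short sketch on a hypothetical frame with string row labels, to show how the two differ at the end of a slice:

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [2, 8, 14], "b": [4, 10, 16]},
    index=["X1", "X2", "X3"],
)

by_position = frame.iloc[0:2]         # positions 0 and 1; the end is excluded
by_label = frame.loc["X1":"X2"]       # labels X1 through X2; the end is included
single_value = frame.loc["X3", "b"]   # one cell by row label and column label
```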

In [None]:
# iterating through the dataframe

for index, row in matrix.iterrows():
    print('row {} ---> {}'.format(index, row['A']))
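
iterrows() is convenient for printing, but it builds a Series for every row, which gets slow on large frames. A sketch of two common alternatives, on a small hypothetical frame:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# itertuples yields lightweight namedtuples; columns become attributes
for row in frame.itertuples():
    print('row {} ---> {}'.format(row.Index, row.A))

# Many loops can be replaced by a single column-wise expression
row_totals = frame["A"] + frame["B"]
```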

## Creating charts in Pandas

You can use pandas to create various plot styles:

- line 
- bar
- hist
- box
- kde, or density plots
- area
- scatter
- hexbin
- pie

In [None]:
# creating charts from pandas
matrix.plot()

In [None]:
matrix.plot(kind='bar')

In [None]:
matrix.plot(kind='hist')

In [None]:
matrix.plot(kind='box')

In [None]:
matrix.plot(kind='kde')

In [None]:
matrix.plot(kind='area')

In [None]:
matrix.plot(kind='pie', subplots=True)
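
The scatter and hexbin styles from the list need explicit x and y columns. A minimal sketch, assuming matplotlib is installed and using a hypothetical random frame called `points`:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts; drop this line inside a notebook
import numpy as np
import pandas as pd

points = pd.DataFrame(np.random.randint(1, 100, size=(50, 2)), columns=["A", "B"])

# Both plot kinds take the column names to put on each axis
scatter_axes = points.plot(kind="scatter", x="A", y="B")
hexbin_axes = points.plot(kind="hexbin", x="A", y="B", gridsize=10)
```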

In [None]:
# Pivot Tables
# --------------
# pandas can emulate pivot tables
# For example, let's take a look at the A column
# We can pivot on A so that the tall table becomes wide
# You can also pass in values (e.g. values='B') to zoom in on one column

pd.pivot_table(matrix, columns='A')
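
Pivoting is easier to see on data with repeated categories. The sketch below uses a small hypothetical `sales` table (not part of this notebook's data) with the `index`, `columns`, `values`, and `aggfunc` parameters of `pd.pivot_table`:

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["east", "east", "west", "west", "east"],
    "product": ["tea", "coffee", "tea", "coffee", "tea"],
    "units":   [10, 20, 30, 40, 5],
})

# Rows become regions, columns become products, cells hold summed units
wide = pd.pivot_table(sales, index="region", columns="product",
                      values="units", aggfunc="sum")
print(wide)
```

The two "east / tea" rows collapse into a single cell, which is the same summarizing step an Excel pivot table performs.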

In [None]:
matrix

## You Can Read in Large Datasets With Pandas!

There are many open datasets available. Here are some resources:
- Kaggle: [https://www.kaggle.com/datasets](https://www.kaggle.com/datasets)
- Azure: [https://azure.microsoft.com/en-us/services/open-datasets](https://azure.microsoft.com/en-us/services/open-datasets/)
- Data.gov: [https://www.data.gov](https://www.data.gov/)
- Awesome public datasets: https://github.com/awesomedata/awesome-public-datasets

In [6]:
# downloaded hospital bed capacity dataset
# url: https://www.kaggle.com/mrmorj/hospital-bed-capacity-and-covid19
import pandas as pd  # needed when this cell runs in a fresh kernel

read = pd.read_csv("data/hrr.csv")
read

In [None]:
read.head()

In [None]:
read.head(10)

In [None]:
read.tail()

In [None]:
read.tail(5)
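
If `data/hrr.csv` is not on disk, the same calls can be tried on an in-memory CSV. The column names and values below are made up for illustration and are not taken from the Kaggle file:

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a downloaded file
csv_text = """city,beds
A,100
B,250
C,75
D,310
E,90
F,120
"""

sample = pd.read_csv(io.StringIO(csv_text))

first_rows = sample.head()   # head() defaults to the first 5 rows
last_two = sample.tail(2)    # pass n to change how many rows come back
```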

## This is Not Even Scratching The Surface of Pandas!

Check out the official pandas documentation: https://pandas.pydata.org/docs 

In [None]:
read.info()

In [None]:
read.describe()

In [None]:
read.plot()

## Thanks for Being a Lovely Audience!

![dancing parrot](dancing_parrot.gif)
