# Pandas

- Pandas is a library for data analysis
- It provides an extremely powerful table system (the DataFrame) built on top of NumPy
- Fantastic documentation: https://pandas.pydata.org/docs/

What can we do with Pandas?
- Tools for reading and writing data in many formats (CSV, Excel, SQL databases, HTML tables)
- Intelligently grab data based on indexing, logic, subsetting, and more
- Handle missing data
- Adjust and restructure data
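As a minimal sketch of a few of these capabilities in one place, the cell below reads a CSV, fills a missing value, filters rows with a condition, and writes CSV back out. The CSV text is made-up example data held in memory via `io.StringIO` so the cell needs no file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV data kept in memory in place of a file on disk;
# Mexico's year is deliberately left blank to create a missing value.
csv_text = "country,year\nUSA,1776\nCanada,1867\nMexico,\n"

# Reading: parse the CSV text into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# Missing data: the blank field is read as NaN
missing_count = int(df["year"].isna().sum())

# Handle it by filling in a known value, then restore an integer dtype
df["year"] = df["year"].fillna(1821).astype(int)

# Grab data with logic: keep only rows where year > 1800
recent = df[df["year"] > 1800]

# Writing: serialize the table back to CSV text (no index column)
out_text = df.to_csv(index=False)
print(out_text)
```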


## Series
- A data structure in Pandas that holds an array of information along with a named index
- The named index differentiates this from a simple NumPy array
- Formal definition: a one-dimensional ndarray with axis labels
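To make the definition concrete, here is a small sketch that wraps a NumPy array in a Series and inspects the pieces: the values are still a one-dimensional array, and the index supplies the axis labels that a plain array lacks.

```python
import numpy as np
import pandas as pd

# A plain NumPy array: elements are reachable only by integer position
arr = np.array([1776, 1867, 1821])

# A Series wraps the same values and adds axis labels (the index)
ser = pd.Series(arr, index=['USA', 'Canada', 'Mexico'])

print(ser.ndim)        # 1 -> one-dimensional
print(ser.to_numpy())  # the underlying values as an ndarray
print(ser.index)       # the axis labels
print(ser['Canada'])   # lookup by label
```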

Import Pandas and NumPy

In [1]:
import numpy as np

In [2]:
import pandas as pd

In [3]:
# Create a Python list of index labels
myindex = ['USA', 'Canada', 'Mexico']

In [4]:
mydata = [1776, 1867, 1821]

In [8]:
myser = pd.Series(data=mydata, index=myindex)

In [9]:
myser

USA       1776
Canada    1867
Mexico    1821
dtype: int64

In [10]:
type(myser)

pandas.core.series.Series

In [32]:
# Create an array of 4 random integers from 0 up to (but not including) 100
ran_data = np.random.randint(0,100,4)

In [33]:
ran_data

array([88, 10, 11, 45])

In [34]:
names = ['Andrew','Bobo','Claire','David']

In [35]:
# Combine the data and names into a Series called ages (names become the index)
ages = pd.Series(ran_data, names)

In [36]:
ages

Andrew    88
Bobo      10
Claire    11
David     45
dtype: int64
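Because `np.random.randint` draws new values on every run, the ages above will differ each time the notebook is executed. A sketch of one way to make them reproducible, assuming NumPy's newer `default_rng` Generator API is acceptable here (the seed `42` is arbitrary):

```python
import numpy as np
import pandas as pd

names = ['Andrew', 'Bobo', 'Claire', 'David']

# Seeded generator: the same seed yields the same "random" ages on every run.
# The seed value 42 is an arbitrary choice.
rng = np.random.default_rng(seed=42)
ran_data = rng.integers(0, 100, size=4)  # integers in [0, 100); 100 is excluded

ages = pd.Series(ran_data, index=names)
print(ages)
```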

In [12]:
# Grab by integer position with .iloc (plain myser[0] is deprecated for a label index in recent pandas)
myser.iloc[0]

1776

In [14]:
# Or with index label
myser['USA']

1776
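The distinction between position and label matters most when the index itself is made of integers. This sketch uses the explicit `.iloc` (position) and `.loc` (label) accessors, and a made-up integer-indexed Series, to show that plain `[]` then follows the label, not the position.

```python
import pandas as pd

myser = pd.Series([1776, 1867, 1821], index=['USA', 'Canada', 'Mexico'])
print(myser.iloc[0])      # by position -> 1776
print(myser.loc['USA'])   # by label    -> 1776

# Hypothetical Series with an integer index in reverse order:
# here [] looks up the *label* 0, not the first position.
nums = pd.Series([10, 20, 30], index=[2, 1, 0])
print(nums[0])            # label 0    -> 30
print(nums.iloc[0])       # position 0 -> 10
```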

In [15]:
# Create a Python dictionary (this reuses the name ages, replacing the Series above)
ages = {'Sam':5, 'Frank':10, 'Spike':7}

In [16]:
pd.Series(ages)

Sam       5
Frank    10
Spike     7
dtype: int64
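When a dictionary is passed together with an explicit `index`, the index decides which keys are used and in what order. A small sketch (the label `'Tom'` is hypothetical and has no matching key):

```python
import pandas as pd

ages = {'Sam': 5, 'Frank': 10, 'Spike': 7}

# The index selects and orders the keys; 'Spike' is dropped,
# and 'Tom' has no matching key, so its value becomes NaN (dtype float64).
subset = pd.Series(ages, index=['Frank', 'Sam', 'Tom'])
print(subset)
```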

In [17]:
# Imaginary sales data for the 1st and 2nd quarters of a global company
q1 = {'Japan': 80, 'China': 450, 'India': 200, 'USA': 250}
q2 = {'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260}

In [21]:
sales_q1 = pd.Series(q1)

In [22]:
sales_q2 = pd.Series(q2)

In [23]:
sales_q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [24]:
sales_q2

Brazil    100
China     500
India     210
USA       260
dtype: int64

In [25]:
# Grab a single value by its label
sales_q1['Japan']

80

In [26]:
# keys() returns the index labels, like dict.keys()
sales_q1.keys()

Index(['Japan', 'China', 'India', 'USA'], dtype='object')
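A Series behaves somewhat like a dictionary whose keys are the index labels. The sketch below checks that `keys()` matches `.index`, that `in` tests labels rather than values, and converts the Series back to a plain dict with `to_dict()`.

```python
import pandas as pd

sales_q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

# keys() is an alias for the .index attribute
print(sales_q1.keys().equals(sales_q1.index))  # True

# `in` checks the index labels, not the values (dict-like behaviour)
print('Japan' in sales_q1)  # True
print(80 in sales_q1)       # False: 80 is a value, not a label

# Convert back to a plain Python dict
print(sales_q1.to_dict())
```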

In [27]:
sales_q1 * 2

Japan    160
China    900
India    400
USA      500
dtype: int64
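Scalar arithmetic like `sales_q1 * 2` is applied element-wise and returns a new Series, which is why `sales_q1` is unchanged in the next cell. Comparisons work the same way and give a boolean Series that can be used as a filter; the threshold `150` below is just an illustrative value.

```python
import pandas as pd

sales_q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

# Element-wise arithmetic returns a new Series; sales_q1 is not modified
doubled = sales_q1 * 2

# Element-wise comparison -> boolean Series -> used as a filter
big_markets = sales_q1[sales_q1 > 150]
print(big_markets)
```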

In [30]:
sales_q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [31]:
sales_q2

Brazil    100
China     500
India     210
USA       260
dtype: int64

In [28]:
# Half-year totals: addition aligns on labels, so Brazil and Japan (each in only one quarter) become NaN
sales_q1 + sales_q2

Brazil      NaN
China     950.0
India     410.0
Japan       NaN
USA       510.0
dtype: float64
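The NaN values come from label alignment: the result's index is the union of both indexes, and any label present in only one Series has nothing to add to. A sketch that checks this with `Index.union` and `Index.symmetric_difference`:

```python
import pandas as pd

sales_q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
sales_q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

# Addition aligns on labels; the result index is the union of both indexes
total = sales_q1 + sales_q2
print(total.index.equals(sales_q1.index.union(sales_q2.index)))  # True

# Labels present in only one Series are exactly the ones that come out as NaN
only_one = sales_q1.index.symmetric_difference(sales_q2.index)
print(only_one.tolist())                   # ['Brazil', 'Japan']
print(total[total.isna()].index.tolist())  # ['Brazil', 'Japan']
```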

In [29]:
# Treat a label missing from one Series as 0 so it does not produce NaN
sales_q1.add(sales_q2, fill_value=0)

Brazil    100.0
China     950.0
India     410.0
Japan      80.0
USA       510.0
dtype: float64
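To see roughly what `fill_value=0` does, the sketch below spells it out by hand: extend both Series to the union of labels with `reindex`, filling gaps with 0, then add. This is an equivalent manual approach for illustration, not necessarily how pandas implements `.add` internally.

```python
import pandas as pd

sales_q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
sales_q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

# Extend both Series to every label, filling the gaps with 0, then add
all_labels = sales_q1.index.union(sales_q2.index)
manual = sales_q1.reindex(all_labels, fill_value=0) + sales_q2.reindex(all_labels, fill_value=0)

builtin = sales_q1.add(sales_q2, fill_value=0)

# Same labels and totals; only the dtype may differ (this manual version stays integer)
pd.testing.assert_series_equal(manual, builtin, check_dtype=False)
print(manual)
```

Note that `fill_value` only fills a label missing from *one* side; a label missing from both would still produce NaN.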