## Clean & Organize Data / Exploratory Data Analysis


##### * Series and DataFrames
##### * Conditional Filtering and Useful Methods
##### * Missing Data
##### * Group By Operation
##### * Combining DataFrames
##### * Text Methods and Time Methods
##### * Inputs and Outputs

### Series Part I

######  * A Series is a pandas data structure that holds an array of values along with a named index
######  * The named index is what distinguishes a Series from a plain NumPy array
######  * Formal definition: a one-dimensional ndarray with axis labels
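To make the "ndarray with axis labels" definition concrete, here is a minimal sketch that wraps the same three numbers as a bare NumPy array and as a Series (the names `arr` and `ser` are illustrative only). It shows that the Series exposes its raw values as an ndarray plus a separate Index of labels.

```python
import numpy as np
import pandas as pd

# The same three numbers, once as a plain NumPy array and once as a Series
arr = np.array([1776, 1867, 1821])
ser = pd.Series(arr, index=["USA", "Canada", "Mexico"])

# A Series keeps its raw values as a NumPy array...
print(type(ser.to_numpy()))   # <class 'numpy.ndarray'>
# ...plus an Index object holding the axis labels
print(ser.index)

# The labels are what a bare ndarray lacks: values can be looked up by name
print(ser["Canada"])          # 1867
print(arr[1])                 # 1867, but only by position
```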

In [1]:
import numpy as np
import pandas as pd

In [2]:
# help(pd.Series)

In [3]:
my_index = ["USA", "Canada", "Mexico"]
my_data  = [1776, 1867, 1821]
my_ser   = pd.Series(data=my_data, index=my_index)      # pair each value with its country label

In [4]:
my_ser

USA       1776
Canada    1867
Mexico    1821
dtype: int64

In [5]:
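my_ser.iloc[0]  # by position: the first value (plain my_ser[0] is deprecated for label indexes in recent pandas)
my_ser['USA']   # by label: the same value (only this last line's result is displayed)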

1776
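The two lookups above correspond to pandas' explicit indexers: `.iloc` for integer positions and `.loc` for index labels. A minimal sketch, rebuilding `my_ser` so it runs on its own, also shows the one slicing difference that commonly trips people up.

```python
import pandas as pd

my_ser = pd.Series(data=[1776, 1867, 1821], index=["USA", "Canada", "Mexico"])

# .iloc selects by integer position, .loc selects by index label
first_by_position = my_ser.iloc[0]      # 1776
first_by_label = my_ser.loc["USA"]      # 1776

# Slicing differs: .iloc excludes the stop position, .loc includes the stop label
print(my_ser.iloc[0:2])                 # USA and Canada
print(my_ser.loc["USA":"Canada"])       # USA and Canada (stop label included)
```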

In [6]:
ages = {'Sam':5, 'Frank':10, 'Spike':7}

In [7]:
pd.Series(ages)

Sam       5
Frank    10
Spike     7
dtype: int64
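When a Series is built from a dict, the keys become the index. Passing `index=` as well selects and orders entries by key; the sketch below assumes the same `ages` dict, and `'Rex'` is a hypothetical name with no matching key.

```python
import pandas as pd

ages = {'Sam': 5, 'Frank': 10, 'Spike': 7}

# Passing index= together with a dict picks out (and orders) the entries by key
subset = pd.Series(ages, index=['Spike', 'Sam'])
print(subset)          # Spike 7, Sam 5

# A label with no matching key ('Rex' is hypothetical) becomes NaN,
# so the dtype is upcast from int64 to float64
with_missing = pd.Series(ages, index=['Sam', 'Rex'])
print(with_missing)    # Sam 5.0, Rex NaN
```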

### Series Part II

In [10]:
# Imaginary sales data for the 1st and 2nd quarters of a global company
q1 = {'Japan': 80, 'China': 450, 'India': 200, 'USA': 250}
q2 = {'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260}

In [11]:
sales_q1 = pd.Series(q1)
sales_q2 = pd.Series(q2)

In [12]:
sales_q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [13]:
sales_q2

Brazil    100
China     500
India     210
USA       260
dtype: int64

In [14]:
sales_q1['Japan']

80

In [16]:
sales_q1.keys()             # returns the index labels (same as sales_q1.index)

Index(['Japan', 'China', 'India', 'USA'], dtype='object')
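Because the index plays the role of dict keys, a Series also answers membership tests against its labels. A minimal sketch, assuming the same Q1 data:

```python
import pandas as pd

sales_q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

# Series.keys() is an alias for the index
print(sales_q1.keys().equals(sales_q1.index))   # True

# Like a dict, the `in` operator tests the index labels, not the values
print('Japan' in sales_q1)   # True
print(80 in sales_q1)        # False: 80 is a value, not a label
```

To search the values instead, test against `sales_q1.to_numpy()` or `sales_q1.values`.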

In [18]:
np.array([1, 2]) * 2 

array([2, 4])

In [20]:
sales_q1 * 2              # scalar arithmetic is applied element-wise to the NumPy values; the index labels are kept

Japan    160
China    900
India    400
USA      500
dtype: int64
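A quick check of what the scalar operation actually touches: the sketch below, assuming the same Q1 data, confirms the values are multiplied exactly as a NumPy array would be, while the index passes through untouched.

```python
import numpy as np
import pandas as pd

sales_q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

doubled = sales_q1 * 2

# The operation is applied element-wise to the underlying values...
print(np.array_equal(doubled.to_numpy(), sales_q1.to_numpy() * 2))   # True
# ...while the index labels are carried over unchanged
print(doubled.index.equals(sales_q1.index))                          # True
```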

In [21]:
sales_q2 / 100

Brazil    1.0
China     5.0
India     2.1
USA       2.6
dtype: float64
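Note the dtype change in the output above: true division `/` always returns floats, even when every result is a whole number. A small sketch, assuming the same Q2 data, contrasts it with floor division `//`, which keeps integers by discarding the remainder.

```python
import pandas as pd

sales_q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

# True division always produces floats
print((sales_q2 / 100).dtype)       # float64
# Floor division keeps an integer dtype and drops the remainder
print((sales_q2 // 100).tolist())   # [1, 5, 2, 2]
```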

In [23]:
sales_q1 + sales_q2        # values are matched by index label; labels in only one Series (Brazil, Japan) become NaN

Brazil      NaN
China     950.0
India     410.0
Japan       NaN
USA       510.0
dtype: float64
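The NaN entries come from index alignment. A minimal sketch of roughly what `+` does here: take the union of the two indexes, reindex both Series onto it (missing labels become NaN), then add position by position. This reproduces the result above for this data.

```python
import pandas as pd

sales_q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
sales_q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

total = sales_q1 + sales_q2

# The result's index is the union of both indexes (sorted, since the labels differ)
print(total.index.tolist())   # ['Brazil', 'China', 'India', 'Japan', 'USA']

# By hand: put both Series on the union index, then add element-wise
union = sales_q1.index.union(sales_q2.index)
manual = sales_q1.reindex(union) + sales_q2.reindex(union)
print(manual.equals(total))   # True (equals() treats NaN in the same spot as equal)

# The NaN entries are exactly the labels present in only one Series
print(total[total.isna()].index.tolist())   # ['Brazil', 'Japan']
```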

In [26]:
sales_q1.add(sales_q2, fill_value=0)    # add the two Series, treating a label missing from one side as 0

Brazil    100.0
China     950.0
India     410.0
Japan      80.0
USA       510.0
dtype: float64
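Two follow-ups on `fill_value`, sketched below under the same sales data. First, once no NaN remains the result can be cast back to integers. Second, `fill_value` only covers a label missing on one side; the hypothetical Series `a` and `b` share a label `'y'` that is NaN in both, and it stays NaN. The sibling methods `sub`, `mul`, and `div` accept the same `fill_value` argument.

```python
import numpy as np
import pandas as pd

sales_q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
sales_q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

# No NaN survives the filled addition, so the result can go back to integers
total = sales_q1.add(sales_q2, fill_value=0).astype(int)
print(total.tolist())   # [100, 950, 410, 80, 510]

# Hypothetical Series: 'y' is missing in BOTH, so fill_value does not apply there
a = pd.Series({'x': 1.0, 'y': np.nan})
b = pd.Series({'y': np.nan, 'z': 3.0})
print(a.add(b, fill_value=0).tolist())   # [1.0, nan, 3.0]
```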