# Analysis with Pandas

In this section of the course, we will learn how to use pandas for data analysis. The topics are:

* Series
* DataFrames
* Missing Data
* GroupBy
* Merging, Joining, and Concatenating
* Operations
* Data Input and Output

## Series

The first main data type we will learn about in pandas is the **Series** data type. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
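As a minimal sketch of that difference (the names `values`, `ser`, and `mixed` are illustrative and not part of the course notebook), the same data can be read by position from a NumPy array and by label from a Series:

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30])                  # plain NumPy array: positions only
ser = pd.Series(values, index=['a', 'b', 'c'])   # same data, plus axis labels

by_position = values[1]   # NumPy: integer position
by_label = ser['b']       # Series: label lookup
print(by_position, by_label)

# A Series can also hold mixed Python objects; the dtype then becomes object
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)
```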

Let's explore this concept through some examples:

In [1]:
import numpy as np
import pandas as pd

In [2]:
# from pandas import Series

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [3]:
labels = ['a','b','c']
my_list = [10,20,30]
arr = np.array([10,20,30, 40])
d = {'a':10,'b':20,'c':30}

**Using Lists**

In [4]:
my_list

[10, 20, 30]

In [5]:
pd.Series(my_list)

0    10
1    20
2    30
dtype: int64

In [6]:
labels

['a', 'b', 'c']

In [5]:
pd.Series(my_list, labels)

a    10
b    20
c    30
dtype: int64

In [7]:
pd.Series(labels, my_list)

10    a
20    b
30    c
dtype: object

In [None]:
pd.Series(data=my_list, index=labels)

In [None]:
labels1 = ['a', 'b', 'c', 'd']

**NumPy Arrays**

In [8]:
arr

array([10, 20, 30, 40])

In [10]:
pd.Series(data=arr, index=[1, 2, 3, 4])

1    10
2    20
3    30
4    40
dtype: int32

In [None]:
# Use the pd. prefix, since Series is not imported on its own
pd.Series(arr)

In [None]:
pd.Series(labels1, arr)

In [None]:
my_list.append(10.2)

In [None]:
my_list

In [None]:
pd.Series(my_list, index=labels1)

In [None]:
pd.Series(index=labels1, data=arr)

**Dictionary**

In [None]:
d

In [None]:
pd.Series(d)
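When a dictionary is converted, its keys become the index. The sketch below also shows what happens when `index=` is passed together with a dictionary; the label `'z'` is a made-up key used only to illustrate a missing entry:

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

ser_d = pd.Series(d)
print(list(ser_d.index))   # the dictionary keys become the index

# Passing index= alongside a dict selects and orders by those labels;
# a label that is not a key in the dict ('z' here) gets NaN
subset = pd.Series(d, index=['c', 'a', 'z'])
print(subset)
```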

### Data in a Series

A pandas Series can hold a variety of object types:

In [None]:
pd.Series(data=labels)

In [None]:
# Even functions (although unlikely that you will use this)
pd.Series([sum,print,len])

## Using an Index

The key to using a Series is understanding its index. pandas uses these index labels or numbers for fast lookups of information (the index works like a hash table or dictionary).
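The dictionary analogy can be sketched as follows. The Series `ser` is an illustrative stand-in with the same country labels used in this section, and `'Italy'` is deliberately a label that is absent from it:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

value = ser['Germany']            # label lookup, like dict[key]
has_japan = 'Japan' in ser        # membership tests check the index, not the values
fallback = ser.get('Italy', 0)    # .get returns a default instead of raising KeyError
pair = ser.loc[['USA', 'Japan']]  # several labels at once

print(value, has_japan, fallback)
print(pair)
```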

Let's see some examples of how to grab information from a Series. Let us create two Series, ser1 and ser2:

In [None]:
labels2 = ['USA', 'Germany','USSR', 'Japan']

In [None]:
labels2[0]

In [None]:
labels2[0:3:2]

In [None]:
ser1 = pd.Series([1,2,3,4], index=['USA', 'Germany', 'USSR', 'Japan'])

In [None]:
ser1['USA':'Japan']
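One detail worth noting about the label slice above: slicing by label includes the end label, while slicing by position excludes the stop position. A small sketch, using an illustrative Series `ser` with the same labels as `ser1`:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

by_label = ser['USA':'USSR']   # label slice: 'USSR' is included
by_position = ser.iloc[0:2]    # positional slice: position 2 is excluded

print(by_label)
print(by_position)
```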

In [None]:
ser6 = pd.Series([1,2,3,4])
ser6

In [None]:
# ser6 must be defined before it is used; ser1 and ser6 share no labels, so every result is NaN
ser1 + ser6

In [None]:
np.arange(0, 20, 2)

In [None]:
ser6[0:3:2]

In [None]:
ser6

In [None]:
ser5 = pd.Series([1, 2, 3, 4])
ser5

In [None]:
(ser5+ser6)*2

In [None]:
my_list

In [None]:
# For plain lists, + concatenates and * repeats; there is no element-wise arithmetic
(my_list+my_list)*2

In [None]:
ser2 = pd.Series([1,2,5,4], index=['USA', 'Germany', 'Italy', 'Japan'])

In [None]:
ser1['USA']

Operations between Series are also aligned on the index:

In [None]:
ser1 + ser2
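To make the alignment explicit, the sketch below repeats the same two Series and inspects the result. The `fill_value=0` variant is one possible way to treat a missing label as zero and is an addition here, not something the notebook itself uses:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Values are matched by label, not by position
total = ser1 + ser2
print(total)
# Labels present in only one Series ('Italy', 'USSR') come out as NaN,
# and the result is float64 because NaN is a float

# Series.add with fill_value treats a missing label as 0 instead
filled = ser1.add(ser2, fill_value=0)
print(filled)
```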