# Introduction to Pandas
+ Pandas is an open source library built on top of NumPy.
+ It allows for fast analysis, data cleaning, and data preparation.
+ It excels in performance and productivity.
+ It also has built-in visualization features.
+ It can work with data from a wide variety of sources.

This introduction to Pandas covers:
+ Series
+ DataFrames
+ Missing Data
+ GroupBy
+ Merging, Joining, and Concatenating
+ Operations
+ Data Input and Output

<hr>

# Series
A Series is a Pandas data type that is similar to a NumPy array.
The difference between a NumPy array and a Series is that a Series can have axis labels, meaning it can be indexed by a label instead of only by an integer position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
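
As a minimal sketch of that difference, the snippet below looks up the same value by position in a NumPy array and by label in a Series. The variable names (`values`, `ser`, `mixed`) are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# A plain NumPy array is addressed by integer position only.
values = np.array([10, 20, 30])
first_from_array = values[0]

# A Series carries the same values plus an index of labels.
ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
first_by_label = ser['a']        # lookup by label
first_by_position = ser.iloc[0]  # lookup by integer position

# A Series can also hold non-numeric Python objects;
# mixed contents are stored with dtype object.
mixed = pd.Series(['text', 3.5, None])
print(first_from_array, first_by_label, first_by_position, mixed.dtype)
```

Using `.iloc` for positional access keeps the intent explicit when a Series has non-integer labels.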

In [2]:
import numpy as np
import pandas as pd

## Creating a Series
We can convert a list, NumPy array, or dictionary to a Series:


In [3]:
labels = ['a','b','c']
my_list = [10,20,30]
arr = np.array([10,20,30])
d = {'a':10, 'b':20, 'c':30}

**Using lists**

In [4]:
pd.Series(data=my_list)

0    10
1    20
2    30
dtype: int64

In [5]:
pd.Series(data=my_list, index=labels)

a    10
b    20
c    30
dtype: int64

In [6]:
pd.Series(my_list, labels)

a    10
b    20
c    30
dtype: int64

**NumPy arrays**

In [7]:
pd.Series(arr)

0    10
1    20
2    30
dtype: int32

In [12]:
pd.Series(arr, labels)

a    10
b    20
c    30
dtype: int32

**Dictionary**

In [13]:
pd.Series(d)

a    10
b    20
c    30
dtype: int64
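
When a dictionary is passed, its keys become the index. The sketch below also shows what happens if an explicit `index` is supplied alongside a dictionary: entries are selected by key, and a key that is absent from the dictionary yields `NaN`. The label `'z'` is a hypothetical missing key chosen for illustration.

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# Dictionary keys become the index; values become the data.
from_dict = pd.Series(d)

# Passing an explicit index selects and orders entries by key.
# 'z' is a hypothetical key absent from d, so its value is NaN
# and the dtype is upcast to float64.
picked = pd.Series(d, index=['c', 'a', 'z'])
print(picked)
```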

## Data in a Series
A pandas Series can hold a variety of object types:

In [14]:
pd.Series(data=labels)

0    a
1    b
2    c
dtype: object

In [16]:
# it can even store references to python functions
pd.Series(data=[sum,len,len])

0    <built-in function sum>
1    <built-in function len>
2    <built-in function len>
dtype: object

## Using an Index
Pandas uses index names or numbers to allow fast lookups of information (similar to a hash map or dictionary).

In [19]:
ser1 = pd.Series([1,2,3,4], index=['one', 'two', 'three', 'four'])
ser1

one      1
two      2
three    3
four     4
dtype: int64

In [20]:
ser2 = pd.Series([1,2,5,4], index=['one', 'two', 'five', 'four'])
ser2

one     1
two     2
five    5
four    4
dtype: int64

In [21]:
ser1['two']

2
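
Beyond square-bracket access, a few other lookup styles behave much like dictionary operations. The following is a small sketch using the same `ser1` values; the missing label `'five'` and the default value `0` are chosen only for illustration.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['one', 'two', 'three', 'four'])

by_label = ser1.loc['two']           # explicit label-based lookup
by_position = ser1.iloc[1]           # explicit position-based lookup
several = ser1.loc[['one', 'four']]  # a list of labels returns a Series

# Membership tests check the index, like keys in a dict.
has_five = 'five' in ser1

# .get returns a default instead of raising KeyError for a missing label.
fallback = ser1.get('five', 0)
print(by_label, by_position, has_five, fallback)
```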

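Operations between Series are also aligned on the index. A label that appears in only one of the two Series produces `NaN`, and the result is converted to float to hold it: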

In [22]:
ser1 + ser2

five     NaN
four     8.0
one      2.0
three    NaN
two      4.0
dtype: float64
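
If the `NaN` entries are not wanted, one option is the `add` method with `fill_value`, which substitutes a value for a label missing from one side before adding. This is a sketch of one possible approach, not something the original notebook does; using `0` as the fill value is an assumption that suits addition.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['one', 'two', 'three', 'four'])
ser2 = pd.Series([1, 2, 5, 4], index=['one', 'two', 'five', 'four'])

# The + operator aligns on labels; unmatched labels become NaN.
aligned = ser1 + ser2

# add() with fill_value treats a label missing from one Series as 0,
# so 'three' and 'five' keep the value from the Series that has them.
total = ser1.add(ser2, fill_value=0)
print(total)
```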

<hr>