# Introduction to Pandas

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

This introduction covers the following Pandas topics:

* Series
* DataFrames
* Missing Data
* GroupBy
* Merging, Joining, and Concatenating
* Operations
* Data Input and Output

## Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic way to create a Series is to call:

**s = pd.Series(data, index=index)**

A Series differs from a NumPy array mainly in that it carries a labeled index, so its values can be accessed by label as well as by position. It can also hold arbitrary Python objects.
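The sketch below illustrates that difference with made-up values (the names `values`, `np_arr` and `s` are only for illustration): the NumPy array is reached by integer position, while the Series can be reached both by label and, through `.iloc`, by position.

```python
import numpy as np
import pandas as pd

values = [10, 20, 30]

# A NumPy array is indexed by integer position only
np_arr = np.array(values)

# A Series stores an explicit index of labels next to the values
s = pd.Series(values, index=['x', 'y', 'z'])

print(np_arr[1])        # positional access on the array
print(s['y'])           # label-based access on the Series
print(s.iloc[1])        # positional access on the Series via .iloc
print(list(s.index))    # the labels themselves
print(s.to_numpy())     # the underlying values as a NumPy array
```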

```python
import numpy as np
import pandas as pd
```

### Creating a Series

We can convert a list, a NumPy array, or a dictionary to a Series.

```python
labels = ['Sun', 'Mon', 'Tue', 'Wed', 'Thur', 'Fri', 'Sat']
my_list = [1, 2, 3, 4, 5, 6, 7]
arr = np.array([1, 2, 3, 4, 5, 6, 7])
d = {'Sun': 1, 'Mon': 2, 'Tue': 3, 'Wed': 4, 'Thur': 5, 'Fri': 6, 'Sat': 7}
```

**Using Lists**

```python
pd.Series(data=my_list)
```

```
0    1
1    2
2    3
3    4
4    5
5    6
6    7
dtype: int64
```
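When no index is passed, as in the cell above, pandas generates a default integer index. A quick check (a sketch; the name `s_default` is illustrative) shows that this index is a `RangeIndex` running from 0 up to, but not including, the length of the data:

```python
import pandas as pd

my_list = [1, 2, 3, 4, 5, 6, 7]
s_default = pd.Series(data=my_list)

# The default index is a RangeIndex: 0, 1, ..., len(my_list) - 1
print(type(s_default.index).__name__)
print(list(s_default.index))
```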

```python
pd.Series(data=my_list, index=labels)
```

```
Sun     1
Mon     2
Tue     3
Wed     4
Thur    5
Fri     6
Sat     7
dtype: int64
```

We can also pass the data and the index positionally, in that order, without naming the arguments.

```python
pd.Series(my_list, labels)
```

```
Sun     1
Mon     2
Tue     3
Wed     4
Thur    5
Fri     6
Sat     7
dtype: int64
```
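To confirm that the positional and keyword forms build the same Series, we can compare them with `Series.equals`. The sketch also shows why the order matters: swapping the two arguments makes the numbers the index and the day names the data. The names `by_keyword`, `by_position` and `swapped` are illustrative.

```python
import pandas as pd

labels = ['Sun', 'Mon', 'Tue', 'Wed', 'Thur', 'Fri', 'Sat']
my_list = [1, 2, 3, 4, 5, 6, 7]

by_keyword = pd.Series(data=my_list, index=labels)
by_position = pd.Series(my_list, labels)  # data first, then index

# .equals compares both the values and the index labels
print(by_keyword.equals(by_position))

# Swapping the arguments: the numbers become labels, the names become data
swapped = pd.Series(labels, my_list)
print(swapped.loc[1])  # label 1 now maps to 'Sun'
```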

**Using NumPy Arrays**

```python
pd.Series(arr)
```

```
0    1
1    2
2    3
3    4
4    5
5    6
6    7
dtype: int64
```

```python
pd.Series(arr, labels)
```

```
Sun     1
Mon     2
Tue     3
Wed     4
Thur    5
Fri     6
Sat     7
dtype: int64
```
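Two details are worth knowing when building a Series from a NumPy array: the Series takes its dtype from the array, and the index must have the same length as the data. The sketch below uses a hypothetical float array `float_arr` to show both; the mismatched index raises a `ValueError`.

```python
import numpy as np
import pandas as pd

float_arr = np.array([1.5, 2.5, 3.5])

# The Series takes its dtype from the array (float64 here)
s_float = pd.Series(float_arr, index=['a', 'b', 'c'])
print(s_float.dtype)

# The index must be as long as the data; a mismatch raises ValueError
length_mismatch_raised = False
try:
    pd.Series(float_arr, index=['a', 'b'])
except ValueError as err:
    length_mismatch_raised = True
    print("ValueError:", err)
```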

**Using a Dictionary**

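```python
pd.Series(d)
```

```
Sun     1
Mon     2
Tue     3
Wed     4
Thur    5
Fri     6
Sat     7
dtype: int64
```

The dictionary keys become the index labels. Pandas 0.23 and later (on Python 3.6+) keep the keys in insertion order, as shown; older versions sorted them alphabetically.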
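If an `index` is passed along with a dictionary, pandas keeps only the listed labels, in the order given, and fills any label that is not a key with NaN. The sketch below uses a shortened, illustrative dictionary; because NaN is a float, the resulting dtype is float64.

```python
import pandas as pd

d = {'Sun': 1, 'Mon': 2, 'Tue': 3}

# Only the listed labels are kept, in this order.
# 'Fri' is not a key of d, so its value becomes NaN.
subset = pd.Series(d, index=['Tue', 'Sun', 'Fri'])
print(subset)
```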

### Data in a Series can be anything (integers, strings, floating point numbers, Python objects, etc.)


```python
# Strings
pd.Series(data=labels)
```

```
0     Sun
1     Mon
2     Tue
3     Wed
4    Thur
5     Fri
6     Sat
dtype: object
```
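Strings are stored with the generic `object` dtype, and the same happens when a single Series mixes types, since no single numeric dtype fits every element. A small sketch with an illustrative list `mixed` shows that each element keeps its own Python type:

```python
import pandas as pd

mixed = pd.Series([1, 'two', 3.0])

# No single numeric dtype fits an int, a str and a float, so pandas uses object
print(mixed.dtype)

# Each element keeps its original Python type
print([type(x).__name__ for x in mixed])
```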

```python
# Python Objects
pd.Series([min, max, print, len])
```

```
0      <built-in function min>
1      <built-in function max>
2    <built-in function print>
3      <built-in function len>
dtype: object
```
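Because the stored objects are ordinary Python callables, we can still use them. The sketch below (the names `funcs`, `data` and `results` are illustrative) uses `Series.apply` to call each stored function on the same list:

```python
import pandas as pd

funcs = pd.Series([min, max, len], index=['min', 'max', 'len'])
data = [3, 1, 2]

# Each element is a function, so apply can call it on the same input
results = funcs.apply(lambda f: f(data))
print(results)
```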

## Using an Index

A Series works much like a hash table or a Python dictionary: its index maps each label to a value, so values can be looked up quickly by label.

```python
series1 = pd.Series([1, 2, 4, 5], index=['a', 'b', 'd', 'e'])
```

```python
series1
```

```
a    1
b    2
d    4
e    5
dtype: int64
```

```python
series2 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
```

```python
series2
```

```
a    1
b    2
c    3
d    4
dtype: int64
```

```python
series1['a']
```

```
1
```
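The dictionary analogy goes further than `[]` lookup. In the sketch below, `in` checks the index labels (just as `in` on a dict checks keys, not values), `.get` returns a default for a missing label instead of raising `KeyError`, and a list of labels selects several entries at once.

```python
import pandas as pd

series1 = pd.Series([1, 2, 4, 5], index=['a', 'b', 'd', 'e'])

print('a' in series1)         # True: 'a' is an index label
print(5 in series1)           # False: 5 is a value, not a label
print(series1.get('z', 0))    # 0: default returned for a missing label
print(series1[['a', 'e']])    # select several entries by label
```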

We can also add two Series. Pandas aligns them on their index labels, so values that share a label are added together.

```python
series1 + series2
```

```
a    2.0
b    4.0
c    NaN
d    8.0
e    NaN
dtype: float64
```

Note that when a label appears in only one of the two Series, the result for that label is NaN (Not a Number). Because NaN is a floating-point value, the result has dtype float64 even though both inputs were int64.
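If NaN is not the result we want, two common options are sketched below. `Series.add` with `fill_value=0` treats a label that is missing from one side as 0 before adding, while `dropna()` on the plain sum keeps only the labels present in both Series. The names `summed`, `filled` and `common_only` are illustrative.

```python
import pandas as pd

series1 = pd.Series([1, 2, 4, 5], index=['a', 'b', 'd', 'e'])
series2 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Plain + gives NaN for labels found in only one Series
summed = series1 + series2

# .add with fill_value substitutes 0 for the missing side before adding
filled = series1.add(series2, fill_value=0)

# Alternatively, drop the labels that produced NaN
common_only = summed.dropna()

print(filled)
print(common_only)
```

Which option fits depends on the data: `fill_value=0` suits counts or totals, where a missing entry genuinely means zero, while `dropna()` is safer when a missing entry means "unknown".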