# Introduction to Pandas

![alt text](http://i1.wp.com/blog.adeel.io/wp-content/uploads/2016/11/pandas1.png?zoom=1.25&fit=818%2C163)

Pandas is an open-source, BSD-licensed Python library that provides high-performance, easy-to-use data structures and data analysis tools. You can think of pandas as an extremely powerful version of Excel, with a lot more features.

## About IPython Notebooks

IPython Notebooks are interactive coding environments embedded in a webpage. You will be using IPython notebooks in this class. You only need to write code between the `### START CODE HERE ###` and `### END CODE HERE ###` comments. After writing your code, you can run the cell either by pressing "SHIFT"+"ENTER" or by clicking "Run Cell" (denoted by a play symbol) in the left bar of the cell.

**In this notebook you will learn about:**

* Series
* DataFrames
* Missing Data
* GroupBy
* Merging, Joining and Concatenating
* Operations
* Data Input and Output

## Importing Pandas
To import Pandas under the name **pd** write the following:

In [0]:
import numpy as np
import pandas as pd

# Series

The first main pandas data type we will learn about is the Series.

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
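To make the difference concrete, here is a minimal sketch that compares position-based access on a NumPy array with label-based access on a Series. The names `values` and `labeled` are illustrative only and are not used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Illustrative data: the same three numbers, with and without labels.
values = np.array([10, 20, 30])
labeled = pd.Series(values, index=['a', 'b', 'c'])

# A NumPy array is addressed by integer position only.
print(values[1])          # 20

# A Series can be addressed by label (.loc) or by position (.iloc).
print(labeled.loc['b'])   # 20
print(labeled.iloc[1])    # 20

# The underlying data is still available as a NumPy array.
print(labeled.to_numpy())
```

Using `.loc` and `.iloc` explicitly keeps it clear whether a lookup is by label or by position, which matters once the labels themselves are integers.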

The basic method to create a Series is to call:

s = pd.Series(data, index=index)

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [0]:
labels = ['a','b','c']
my_list = [10,20,30]
arr = np.array([10,20,30])
d = {'a':10,'b':10,'c':30}

**Using Lists**

In [3]:
pd.Series(data=my_list)

0    10
1    20
2    30
dtype: int64

In [4]:
pd.Series(data=my_list,index=labels)

a    10
b    20
c    30
dtype: int64

In [5]:
pd.Series(my_list,index=['x','y','z'])

x    10
y    20
z    30
dtype: int64

In [6]:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s

a   -0.355909
b    0.533525
c    1.445525
d    0.554169
e    1.119680
dtype: float64

**NumPy Arrays**

If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].

In [7]:
pd.Series(arr)

0    10
1    20
2    30
dtype: int64

In [8]:
pd.Series(arr,labels)

a    10
b    20
c    30
dtype: int64
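The length rule for ndarray data can be checked directly. The following is a small sketch, assuming the same three-element array as above; `default_index` and `mismatch_raised` are names introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])

# Without an index, pandas generates 0, 1, ..., len(data) - 1.
default_index = pd.Series(arr).index.tolist()
print(default_index)  # [0, 1, 2]

# An index whose length differs from the data is rejected.
try:
    pd.Series(arr, index=['a', 'b'])   # 3 values, only 2 labels
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```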

**Dictionary**

In [9]:
pd.Series(d)

a    10
b    10
c    30
dtype: int64
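When a dictionary is passed together with an explicit index, pandas pulls out the values whose keys match the index labels. The sketch below illustrates this with a hypothetical label `'z'` that is not a key in the dictionary; `from_dict` and `selected` are illustrative names.

```python
import pandas as pd

d = {'a': 10, 'b': 10, 'c': 30}

# With no index, the dictionary keys become the index labels.
from_dict = pd.Series(d)

# With an explicit index, values are looked up by key.
# 'z' is not a key in d, so its entry becomes NaN
# (which also turns the dtype into float64).
selected = pd.Series(d, index=['c', 'a', 'z'])
print(selected)
```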

**From scalar value**

If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

In [10]:
pd.Series(3., index=['a', 'b', 'c', 'd', 'e'])

a    3.0
b    3.0
c    3.0
d    3.0
e    3.0
dtype: float64

**Exercise 1.1:**

Create a pandas Series from a NumPy array **arr** of the odd numbers between 40 and 50.

In [12]:
### START CODE HERE ### 
import numpy as np
import pandas as pd

arr = np.arange(41,51,2)
pd.Series(arr)

### END CODE HERE ###

0    41
1    43
2    45
3    47
4    49
dtype: int64

**Expected Output:**

0    41

1    43

2    45

3    47

4    49

dtype: int64

### Data in a Series

Like a NumPy array, a pandas Series has a dtype.

s.dtype

A pandas Series can hold a variety of object types:

In [0]:
pd.Series(data=labels)

0    a
1    b
2    c
dtype: object

In [13]:
# Even functions (although unlikely that you will use this)
pd.Series([sum,print,len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object
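As a rough illustration of how the dtype follows the data, the sketch below builds a few small Series and inspects their dtypes. The variable names are illustrative, and the exact integer width may depend on the platform, so no specific output is shown.

```python
import pandas as pd

ints = pd.Series([10, 20, 30])     # all integers
floats = pd.Series([1.5, 2.5])     # all floats
mixed = pd.Series([1, 'a', 2.5])   # no common numeric type, stored as Python objects

print(ints.dtype)
print(floats.dtype)
print(mixed.dtype)

# astype returns a new Series with the requested dtype.
converted = ints.astype('float64')
print(converted.dtype)
```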

## Using an Index

The key to using a Series is understanding its index. Pandas uses these index names or numbers to allow fast lookups of information (the index works like a hash table or dictionary).

Let's see some examples of how to grab information from a Series. Let us create two Series, ser1 and ser2:

In [0]:
ser1 = pd.Series([1,2,3,4], index=['USA', 'Germany', 'USSR', 'Japan'])

In [15]:
ser1

USA        1
Germany    2
USSR       3
Japan      4
dtype: int64

In [0]:
ser2 = pd.Series([1,2,5,4], index=['USA', 'Germany', 'Italy', 'Japan'])

In [17]:
ser2

USA        1
Germany    2
Italy      5
Japan      4
dtype: int64

In [18]:
ser1['USA']

1
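The dictionary-like behavior of the index goes beyond single-label lookups. Here is a short sketch of a few related operations, reusing the same `ser1` data; `subset` is an illustrative name.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

# Membership tests check the index labels, not the values.
print('Japan' in ser1)    # True
print('Italy' in ser1)    # False

# Looking up a missing label with [] raises KeyError ...
try:
    ser1['Italy']
except KeyError:
    print('Italy is not in the index')

# ... while .get returns a default value instead.
print(ser1.get('Italy', 0))   # 0

# A list of labels selects several entries at once.
subset = ser1[['USA', 'Japan']]
print(subset)
```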

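Operations between Series are also aligned on the index. Labels that appear in only one of the two Series produce NaN, which is why the result below has dtype float64: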

In [19]:
ser1 + ser2

Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64
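If the NaN entries are not wanted, there are a couple of common ways to handle them. The sketch below shows two options under the assumption that a missing label should either count as 0 or be dropped; which one is appropriate depends on the data. `total` and `overlap` are illustrative names.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Option 1: treat a label missing from one Series as 0 before adding.
total = ser1.add(ser2, fill_value=0)
print(total)

# Option 2: keep only the labels present in both Series.
overlap = (ser1 + ser2).dropna()
print(overlap)
```

With `fill_value=0`, Italy and USSR keep the value from the one Series that contains them; with `dropna()`, they are removed from the result.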

**Exercise 1.2**

Using NumPy, create another pandas Series from an array **arr2** of even numbers from 20 to 30, and add it to the Series built from **arr** in the previous exercise.

In [24]:
### START CODE HERE ### 
import numpy as np
import pandas as pd

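# Odd numbers from the previous exercise
arr = np.arange(41,51,2)
print(arr)
print('\n')
# Even numbers starting at 20 (the stop value 30 is excluded)
arr2 = np.arange(20,30,2)
print(arr2)
print('\n')
p1 = pd.Series(arr)
p2 = pd.Series(arr2)

# Both Series share the default index 0..4, so values add position by position
print(p1+p2)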

### END CODE HERE ###

[41 43 45 47 49]


[20 22 24 26 28]


0    61
1    65
2    69
3    73
4    77
dtype: int64


Let's stop here for now and move on to DataFrames, which will expand on the concept of Series!
# Great Job!