# Pandas

In this section of the course, we will learn how to use pandas for data analysis. You can think of pandas as a much more powerful version of Excel with many more features. Go through the notebooks in this order:

* Series
* DataFrames
* Missing Data
* GroupBy
* Merging, Joining, and Concatenating
* Operations
* Data Input and Output

In [2]:
import pandas as pd
import numpy as np

# Series

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
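
As a minimal sketch of the two points above, the snippet below looks up the same element by label (`.loc`) and by position (`.iloc`), then builds a Series of unrelated Python objects. The `prices` and `mixed` names and their contents are made up for illustration.

```python
import pandas as pd

# Hypothetical data: three prices labelled by fruit name.
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "plum"])

by_label = prices.loc["pear"]    # label-based lookup
by_position = prices.iloc[1]     # position-based lookup of the same element

# A Series is not limited to numbers: with dtype 'object' it can hold
# arbitrary Python objects, here an int, a str, a float, None and a function.
mixed = pd.Series([1, "two", 3.0, None, len])

print(by_label, by_position)
print(mixed.dtype)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the labels themselves are integers.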

In [3]:
# The first column of the output is the index
# The second column holds the values
pd.Series([10,88,3,4,5])

0    10
1    88
2     3
3     4
4     5
dtype: int64

In [4]:
seri = pd.Series([10,88,3,4,5])
#type(seri)
seri.index = ['a','b','c','d','e']

In [6]:
print(seri)

a    10
b    88
c     3
d     4
e     5
dtype: int64


In [7]:
seri.ndim

1

In [10]:
seri.dtype

dtype('int64')

In [12]:
seri.size

5

In [14]:
# Draw five random integer labels from 1 to 9; the labels differ on every run
i = np.random.randint(1,10,5)
seri1 = pd.Series([99,23,76,2323,98], index = i)
seri1

3      99
4      23
1      76
8    2323
8      98
dtype: int64

In [16]:
# The label 8 occurs twice in this run's index, so both matching rows are returned
seri1[8]

8    2323
8      98
dtype: int64
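
The lookup above returns two rows because index labels do not have to be unique. The following sketch, which hard-codes the labels from that run rather than drawing them randomly, shows how to check for duplicates and how the result type depends on them. The name `dup` is only illustrative.

```python
import pandas as pd

# Same values as above, with the index labels from that run written out.
dup = pd.Series([99, 23, 76, 2323, 98], index=[3, 4, 1, 8, 8])

has_unique_labels = dup.index.is_unique   # False, because 8 appears twice

single = dup.loc[3]    # a unique label gives back a scalar
several = dup.loc[8]   # a repeated label gives back a Series

print(has_unique_labels)
print(single)
print(several)
```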

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [17]:
seri2 = pd.Series([99,23,76,2323,98], index = ["a","b","c","d","e"])
seri2

a      99
b      23
c      76
d    2323
e      98
dtype: int64

In [19]:
seri2["a"]

99

In [21]:
seri2["a":"c"]

a    99
b    23
c    76
dtype: int64

In [23]:
import numpy as np

arr = np.array([10,20,30])
pd.Series(arr)

0    10
1    20
2    30
dtype: int32

In [25]:
labels = ["a","b","c"]
pd.Series(arr,labels)

a    10
b    20
c    30
dtype: int32

In [27]:
# Create a dictionary
dic1 = {"reg":10, "log":11,"cart":12}

In [29]:
series = pd.Series(dic1)

In [31]:
series

reg     10
log     11
cart    12
dtype: int64
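
When a dictionary is passed together with an explicit `index`, pandas keeps only the requested keys and fills any label that is missing from the dictionary with `NaN`. The sketch below illustrates this; the label `"svm"` is a hypothetical key that is deliberately absent from the dictionary.

```python
import pandas as pd

dic1 = {"reg": 10, "log": 11, "cart": 12}

# "svm" is not a key of dic1, so its value becomes NaN.
# Because NaN is a float, the whole Series is stored as float64.
subset = pd.Series(dic1, index=["reg", "svm"])

print(subset)
```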

### Data in a Series

A pandas Series can hold a variety of object types. Its labels and values can be inspected with the following attributes and methods:

In [46]:
# The index attribute returns the labels of the Series.
series.index

Index(['reg', 'log', 'cart'], dtype='object')

In [48]:
# items() yields (label, value) pairs, like a dictionary's items().
list(series.items())

[('reg', 10), ('log', 11), ('cart', 12)]

In [50]:
series.values

array([10, 11, 12], dtype=int64)

In [52]:
# keys() returns the index labels, like a dictionary's keys().
series.keys()

Index(['reg', 'log', 'cart'], dtype='object')

In [60]:
"log" in series

True

In [62]:
# "a" is not one of the labels of series (reg, log, cart)
"a" in series

False

In [64]:
# Assigning to a label that does not exist yet adds a new entry to the Series
seri["log"] = 130
seri["log"]

130
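
The membership tests and the assignment above can be summarised in one small sketch: `in` checks the labels rather than the values, and assigning to a label either overwrites an existing entry or appends a new one. The Series `s` is a shortened stand-in for `seri`.

```python
import pandas as pd

s = pd.Series([10, 88], index=["a", "b"])

s["a"] = 11      # existing label: the value is overwritten
s["log"] = 130   # new label: the Series grows by one entry

label_found = "log" in s            # True, membership looks at labels
value_found = 130 in s              # False, 130 is a value, not a label
value_in_values = 130 in s.values   # True, search the underlying array instead

print(s)
```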


# Creating a DataFrame

In [66]:
# A NumPy array has a single dtype, so it cannot hold categorical and numeric data side by side.
# A pandas DataFrame can, because each column has its own dtype.
l = [1,2,23,345,7,8,3]
l

[1, 2, 23, 345, 7, 8, 3]

In [68]:
pd.DataFrame(l,columns = ["numbers"])

   numbers
0        1
1        2
2       23
3      345
4        7
5        8
6        3
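
To make the remark about mixed data concrete, here is a small sketch comparing the two containers. The column names `count` and `colour` and their values are invented for illustration.

```python
import numpy as np
import pandas as pd

# NumPy coerces mixed inputs to one common dtype; here the integers become strings.
arr = np.array([[1, "red"], [2, "blue"]])

# A DataFrame stores each column separately, so each keeps its own dtype.
df_mixed = pd.DataFrame({"count": [1, 2], "colour": ["red", "blue"]})

print(arr.dtype)
print(df_mixed.dtypes)
```

The numeric column stays numeric in the DataFrame, so arithmetic such as `df_mixed["count"].sum()` works without any conversion.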


In [70]:
from numpy.random import randn


df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
df

          W         X         Y         Z
A  1.012633 -1.472837 -0.138886 -0.377776
B  2.692810  0.038520  0.785329 -2.434070
C  0.331398  0.480349  0.978148  1.612179
D -0.647020  0.147289  1.549425 -0.410200
E  0.863492 -0.579668 -0.218061  0.209027


In [72]:
m = np.arange(1,10).reshape((3,3))
m

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

## Rename Columns and Rows

In [75]:
df = pd.DataFrame(m, columns=["var1","var2","var3"])
df.head()

   var1  var2  var3
0     1     2     3
1     4     5     6
2     7     8     9


In [77]:
df.columns

Index(['var1', 'var2', 'var3'], dtype='object')

In [79]:
df.columns = ["deg1","deg2","deg3"]

In [81]:
df

   deg1  deg2  deg3
0     1     2     3
1     4     5     6
2     7     8     9


In [83]:
df.index

RangeIndex(start=0, stop=3, step=1)

In [85]:
df

   deg1  deg2  deg3
0     1     2     3
1     4     5     6
2     7     8     9


In [87]:
df.index = ['a','b','c']

In [89]:
df

   deg1  deg2  deg3
a     1     2     3
b     4     5     6
c     7     8     9
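
Assigning to `df.columns` or `df.index` replaces every label at once. When only some labels need to change, `DataFrame.rename` with a mapping is an alternative; the sketch below assumes a fresh copy of the 3x3 frame, here called `df_demo`, so that it runs on its own.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                       columns=["var1", "var2", "var3"])

# Rename one column and one row; labels missing from the mappings are left alone.
# rename returns a new DataFrame and does not modify df_demo.
renamed = df_demo.rename(columns={"var1": "deg1"}, index={0: "a"})

print(renamed)
```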


In [91]:
df.describe()

       deg1  deg2  deg3
count   3.0   3.0   3.0
mean    4.0   5.0   6.0
std     3.0   3.0   3.0
min     1.0   2.0   3.0
25%     2.5   3.5   4.5
50%     4.0   5.0   6.0
75%     5.5   6.5   7.5
max     7.0   8.0   9.0


In [93]:
df.T

      a  b  c
deg1  1  4  7
deg2  2  5  8
deg3  3  6  9


In [95]:
type(df)

pandas.core.frame.DataFrame

In [97]:
df.axes

[Index(['a', 'b', 'c'], dtype='object'),
 Index(['deg1', 'deg2', 'deg3'], dtype='object')]

In [99]:
df.shape

(3, 3)

In [101]:
df.ndim

2

In [103]:
df.size

9

In [105]:
df.values

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [107]:
type(df.values)

numpy.ndarray
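
Besides the `values` attribute, pandas offers the `to_numpy()` method, which its documentation recommends for converting a DataFrame to an array. The sketch below rebuilds the 3x3 frame as `df_demo` and adds a hypothetical mixed-type frame to show what happens when the columns do not share a dtype.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                       columns=["deg1", "deg2", "deg3"])

# All columns are integers, so the result is a plain integer array.
as_array = df_demo.to_numpy()

# With mixed column types, NumPy needs one dtype that fits everything,
# so the result falls back to dtype 'object'.
df_mixed = pd.DataFrame({"count": [1, 2], "colour": ["red", "blue"]})
mixed_array = df_mixed.to_numpy()

print(as_array.dtype, mixed_array.dtype)
```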

In [109]:
df.head()

   deg1  deg2  deg3
a     1     2     3
b     4     5     6
c     7     8     9


In [111]:
df.tail(1)

   deg1  deg2  deg3
c     7     8     9


In [115]:
array_2d = np.arange(0,100).reshape(10,10)
index = 'r1 r2 r3 r4 r5 r6 r7 r8 r9 r10'.split()
columns = 'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10'.split()
df = pd.DataFrame(data = array_2d, index = index, columns = columns)
df['newind']=['a','b','c','d','e','f','g','h','i','j'] 
df

     c1  c2  c3  c4  c5  c6  c7  c8  c9  c10  newind
r1    0   1   2   3   4   5   6   7   8    9       a
r2   10  11  12  13  14  15  16  17  18   19       b
r3   20  21  22  23  24  25  26  27  28   29       c
r4   30  31  32  33  34  35  36  37  38   39       d
r5   40  41  42  43  44  45  46  47  48   49       e
r6   50  51  52  53  54  55  56  57  58   59       f
r7   60  61  62  63  64  65  66  67  68   69       g
r8   70  71  72  73  74  75  76  77  78   79       h
r9   80  81  82  83  84  85  86  87  88   89       i
r10  90  91  92  93  94  95  96  97  98   99       j


In [117]:
df.reset_index(inplace = True)
df

   index  c1  c2  c3  c4  c5  c6  c7  c8  c9  c10  newind
0     r1   0   1   2   3   4   5   6   7   8    9       a
1     r2  10  11  12  13  14  15  16  17  18   19       b
2     r3  20  21  22  23  24  25  26  27  28   29       c
3     r4  30  31  32  33  34  35  36  37  38   39       d
4     r5  40  41  42  43  44  45  46  47  48   49       e
5     r6  50  51  52  53  54  55  56  57  58   59       f
6     r7  60  61  62  63  64  65  66  67  68   69       g
7     r8  70  71  72  73  74  75  76  77  78   79       h
8     r9  80  81  82  83  84  85  86  87  88   89       i
9    r10  90  91  92  93  94  95  96  97  98   99       j


In [119]:
df.set_index('newind', inplace = True)
df

        index  c1  c2  c3  c4  c5  c6  c7  c8  c9  c10
newind
a          r1   0   1   2   3   4   5   6   7   8    9
b          r2  10  11  12  13  14  15  16  17  18   19
c          r3  20  21  22  23  24  25  26  27  28   29
d          r4  30  31  32  33  34  35  36  37  38   39
e          r5  40  41  42  43  44  45  46  47  48   49
f          r6  50  51  52  53  54  55  56  57  58   59
g          r7  60  61  62  63  64  65  66  67  68   69
h          r8  70  71  72  73  74  75  76  77  78   79
i          r9  80  81  82  83  84  85  86  87  88   89
j         r10  90  91  92  93  94  95  96  97  98   99
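
The cells above modify `df` in place. As a sketch of the non-mutating form, the snippet below calls `reset_index` and `set_index` without `inplace=True`, so each call returns a new DataFrame, and it shows `drop=True` for discarding the old labels. The two-row frame `df_demo` is a cut-down stand-in for the 10x10 example.

```python
import pandas as pd

df_demo = pd.DataFrame({"c1": [0, 10], "newind": ["a", "b"]},
                       index=["r1", "r2"])

# The old row labels move into a new column named 'index'.
kept = df_demo.reset_index()

# With drop=True the old row labels are discarded instead.
dropped = df_demo.reset_index(drop=True)

# The 'newind' column becomes the index and is removed from the columns.
reindexed = df_demo.set_index("newind")

print(kept)
print(dropped)
print(reindexed)
```

Because none of these calls use `inplace=True`, `df_demo` itself still has its original `r1`, `r2` labels afterwards.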
