# Pandas Tutorial

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Agenda

- What is a DataFrame?
- What is a Series?
- Different operations in Pandas

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Creating a DataFrame: the integers 0-19 arranged as 5 rows x 4 columns,
# with custom row labels (index) and column labels
df = pd.DataFrame(np.arange(0, 20).reshape(5, 4),
                  index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
                  columns=["Column1", "Column2", "Column3", "Column4"])

In [3]:
df.head()  # first 5 rows by default

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19
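The same frame can also be built from a dictionary that maps each column label to its values. A minimal sketch (the name `df_from_dict` is only illustrative) that also inspects the three attributes describing a DataFrame's layout: `shape`, `index` and `columns`.

```python
import numpy as np
import pandas as pd

# Same 5x4 integer grid as above, built column by column from a dict.
values = np.arange(0, 20).reshape(5, 4)
df_from_dict = pd.DataFrame(
    {f"Column{i + 1}": values[:, i] for i in range(4)},
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
)

print(df_from_dict.shape)          # (5, 4) -> (rows, columns)
print(list(df_from_dict.index))    # ['Row1', 'Row2', 'Row3', 'Row4', 'Row5']
print(list(df_from_dict.columns))  # ['Column1', 'Column2', 'Column3', 'Column4']
```

Both constructors give the same labelled grid; the dict form is handy when each column comes from a different source.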


In [4]:
df.to_csv('Test1.csv')  # write the DataFrame to a CSV file; the row labels become the first column
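A minimal sketch of a write/read round trip, using an in-memory `io.StringIO` buffer instead of `Test1.csv` so nothing touches the disk (the names `buffer`, `naive` and `restored` are illustrative). It shows why reading the file back needs `index_col=0`: without it, the saved row labels come back as an ordinary column named `Unnamed: 0`.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])  # ,Column1,Column2,Column3,Column4 (empty field for the index)

# Reading it back without index_col turns the row labels into a data column.
naive = pd.read_csv(io.StringIO(csv_text))
print(list(naive.columns))  # ['Unnamed: 0', 'Column1', 'Column2', 'Column3', 'Column4']

# index_col=0 restores the first column as the row index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(restored.index))  # ['Row1', 'Row2', 'Row3', 'Row4', 'Row5']
```

Passing `index=False` to `to_csv` is the other option when the row labels are not worth keeping.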

In [5]:
df['Column3']  # a single column label returns a Series

Row1     2
Row2     6
Row3    10
Row4    14
Row5    18
Name: Column3, dtype: int32

In [6]:
df[['Column3', 'Column4']]  # a list of column labels returns a DataFrame

      Column3  Column4
Row1        2        3
Row2        6        7
Row3       10       11
Row4       14       15
Row5       18       19
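What decides the return type is the bracket, not the number of columns: a bare label gives a Series, while a list of labels (even a list with one label) gives a DataFrame. A minimal sketch that rebuilds the same `df` and checks this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

one_col = df["Column3"]                # single label -> Series
one_col_df = df[["Column3"]]           # list with one label -> DataFrame
two_cols = df[["Column3", "Column4"]]  # list with two labels -> DataFrame

print(type(one_col).__name__, one_col.shape)        # Series (5,)
print(type(one_col_df).__name__, one_col_df.shape)  # DataFrame (5, 1)
print(type(two_cols).__name__, two_cols.shape)      # DataFrame (5, 2)
```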


In [7]:
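# Accessing the elements

# There are two main indexers:
# 1. .loc  -> label-based: selects by row/column labels, e.g. df.loc['Row1']
# 2. .iloc -> position-based: selects by integer row/column positions, e.g. df.iloc[0]
# Both accept a row selector and, optionally, a column selector: df.loc[rows, cols]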

df.loc['Row1']

Column1    0
Column2    1
Column3    2
Column4    3
Name: Row1, dtype: int32
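To make the label/position distinction concrete, here is a minimal sketch that rebuilds the same `df` and selects one cell and one block both ways. The one behavioural difference worth remembering is slicing: `.loc` includes the end label, while `.iloc` excludes the end position, like a Python list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

# .loc takes labels: row 'Row2', column 'Column3'.
print(df.loc["Row2", "Column3"])  # 6
# .iloc takes positions: the same cell is row 1, column 2 (counting from 0).
print(df.iloc[1, 2])              # 6

# Label slice: 'Row2' through 'Row3' INCLUSIVE, 'Column1' through 'Column2' inclusive.
by_label = df.loc["Row2":"Row3", "Column1":"Column2"]
# Position slice: rows 1 and 2, columns 0 and 1 (end positions 3 and 2 are EXCLUDED).
by_position = df.iloc[1:3, 0:2]

print(by_label.shape, by_position.shape)  # (2, 2) (2, 2)
```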

In [8]:
type(df.loc['Row1'])

# DataFrame - two-dimensional: labelled rows and columns
# Series    - one-dimensional: a single row or a single column

pandas.core.series.Series
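The split is really about dimensions, not about how many rows there are: a DataFrame with a single row is still a DataFrame. A minimal sketch using `ndim`, comparing `df.loc['Row1']` with `df.loc[['Row1']]`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

row_series = df.loc["Row1"]   # one label -> 1-D Series, indexed by the column names
row_frame = df.loc[["Row1"]]  # list of labels -> 2-D DataFrame with one row

print(row_series.ndim, row_frame.ndim)  # 1 2
print(row_series.name)                  # Row1
print(list(row_series.index))           # ['Column1', 'Column2', 'Column3', 'Column4']
print(row_frame.shape)                  # (1, 4)
```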

In [9]:
df.iloc[:, :]  # all rows, all columns

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19


In [10]:
type(df.iloc[:,:])

pandas.core.frame.DataFrame

In [11]:
df.iloc[:, 0]  # all rows, first column (position 0)

Row1     0
Row2     4
Row3     8
Row4    12
Row5    16
Name: Column1, dtype: int32

In [12]:
type(df.iloc[:,0])

pandas.core.series.Series
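`.iloc` accepts the same position tricks as Python lists and NumPy arrays. A minimal sketch showing a list of positions (which keeps the result two-dimensional), negative positions (counted from the end) and a step slice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

first_col = df.iloc[:, 0]       # Series: all rows, column position 0
first_col_df = df.iloc[:, [0]]  # DataFrame: a list of positions keeps it 2-D
last_row = df.iloc[-1]          # negative position: last row
every_other = df.iloc[::2, :]   # step slice: row positions 0, 2, 4

print(first_col.tolist())                # [0, 4, 8, 12, 16]
print(first_col_df.shape)                # (5, 1)
print(last_row.name, last_row.tolist())  # Row5 [16, 17, 18, 19]
print(list(every_other.index))           # ['Row1', 'Row3', 'Row5']
```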

In [13]:
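# Convert a DataFrame into a NumPy array (all rows, columns from position 1 onward);
# the row and column labels are dropped. .to_numpy() is the method the pandas docs now recommend over .values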

df.iloc[:,1:].values

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11],
       [13, 14, 15],
       [17, 18, 19]])

In [14]:
df.iloc[:,1:].values.shape

(5, 3)
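A minimal sketch of the same conversion with `to_numpy()`, including its optional `dtype` argument for getting, say, a float array directly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

arr = df.iloc[:, 1:].to_numpy()  # plain ndarray, labels dropped
print(type(arr).__name__, arr.shape)  # ndarray (5, 3)
print(arr[0].tolist())                # [1, 2, 3]

# Ask for a specific dtype during the conversion.
arr_float = df.iloc[:, 1:].to_numpy(dtype="float64")
print(arr_float[-1].tolist())         # [17.0, 18.0, 19.0]
```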

In [15]:
df.isnull().sum()  # number of missing (NaN/None) values in each column

Column1    0
Column2    0
Column3    0
Column4    0
dtype: int64
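The frame above has no missing values, so every count is 0. A minimal sketch with a small hypothetical frame (`data`, made up for illustration) shows what the counts look like when some cells are `NaN`/`None`, plus two common follow-ups, `dropna()` and `fillna()`:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column 'a' and one in column 'b'.
data = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [None, "x", "y"],
        "c": [7, 8, 9],
    }
)

missing_per_column = data.isnull().sum()
print(missing_per_column.tolist())     # [1, 1, 0]
print(int(missing_per_column.sum()))   # 2 -> total missing cells

# dropna() keeps only the rows with no missing values at all.
print(data.dropna().shape)             # (1, 3)
# fillna() replaces missing values with a chosen value instead.
print(data["a"].fillna(0).tolist())    # [1.0, 0.0, 3.0]
```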

In [16]:
df

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19


In [17]:
# Unique values: value_counts() counts how often each distinct value occurs

df['Column1'].value_counts()

0     1
4     1
8     1
12    1
16    1
Name: Column1, dtype: int64

In [18]:
df['Column1'].unique()

array([ 0,  4,  8, 12, 16])
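Because every value in `Column1` is distinct, each count above is 1. With repeated values the two methods differ more visibly. A minimal sketch on a hypothetical Series of short category codes:

```python
import pandas as pd

# Hypothetical values with repeats.
s = pd.Series(["k", "az", "az", "t", "az", "k"])

counts = s.value_counts()  # sorted by count, most frequent first
print(counts.to_dict())    # {'az': 3, 'k': 2, 't': 1}
print(list(s.unique()))    # ['k', 'az', 't'] -> order of first appearance
print(s.nunique())         # 3 -> number of distinct values
```

`value_counts()` answers "how many of each", `unique()` answers "which ones", and `nunique()` answers "how many different ones".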

In [19]:
df = pd.read_csv('mercedesbenz.csv')  # load a CSV file into a new DataFrame (replaces the df above)

In [20]:
df.head()

   ID       y  X0  X1  X2  X3  X4  X5  X6  X8  ...  X375  X376  X377  X378  X379  X380  X382  X383  X384  X385
0   0  130.81   k   v  at   a   d   u   j   o  ...     0     0     1     0     0     0     0     0     0     0
1   6   88.53   k   t  av   e   d   y   l   o  ...     1     0     0     0     0     0     0     0     0     0
2   7   76.26  az   w   n   c   d   x   j   x  ...     0     0     0     0     0     0     1     0     0     0
3   9   80.62  az   t   n   f   d   x   l   e  ...     0     0     0     0     0     0     0     0     0     0
4  13   78.02  az   v   n   f   d   h   d   n  ...     0     0     0     0     0     0     0     0     0     0


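- CSV stands for Comma-Separated Values: a plain-text format with one record per line and the fields of each record separated by commas.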
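Since `mercedesbenz.csv` may not be at hand, here is a minimal sketch that feeds `read_csv` a tiny CSV held in a string. The three rows reuse a few values from the `df.head()` output above, and the names `sample` and `semi` are illustrative. `read_csv` infers column types from the text, and its `sep` parameter handles files that use another delimiter.

```python
import io

import pandas as pd

# Hypothetical CSV text: a header line, then one record per line,
# with fields separated by commas.
csv_text = """ID,y,X0
0,130.81,k
6,88.53,k
7,76.26,az
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)             # (3, 3)
print(sample.columns.tolist())  # ['ID', 'y', 'X0']
print(sample["X0"].tolist())    # ['k', 'k', 'az']

# The same parser reads other delimiters through the sep parameter.
semi = pd.read_csv(io.StringIO("a;b\n1;2\n3;4\n"), sep=";")
print(semi.shape)               # (2, 2)
```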