# Pandas Tutorial

Pandas is an open-source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

### Agenda
- What is a Data Frame?
- What is a Data Series?
- Different operations in Pandas

In [1]:
## First, import the pandas and NumPy libraries

import pandas as pd
import numpy as np

In [2]:
## Create a DataFrame from 2-D data (5 rows x 4 columns),
## with explicit row labels (index) and column names

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
    columns=['Column1', 'Column2', 'Column3', 'Column4'],
)

In [4]:
df.head()

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19


In [5]:
## Accessing data in a DataFrame
## 1. .loc[] indexer: select by row label
## 2. .iloc[] indexer: select by integer position

df.loc['Row1'] # a single row comes back as a Series

Column1    0
Column2    1
Column3    2
Column4    3
Name: Row1, dtype: int64

In [8]:
df.iloc[0] # use this when you don't know the row label but know its position

Column1    0
Column2    1
Column3    2
Column4    3
Name: Row1, dtype: int64
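
The difference between label-based and position-based access is easiest to see when the row labels are integers that do not match the row order. The following is a minimal sketch; the `scores` frame and its values are hypothetical and not part of the tutorial data.

```python
import pandas as pd

# Hypothetical frame whose integer labels do not match row positions.
scores = pd.DataFrame({"score": [90, 80, 70]}, index=[2, 0, 1])

by_label = scores.loc[0, "score"]   # the row whose label is 0 (second row)
by_position = scores.iloc[0, 0]     # the first row, first column

print(by_label)     # 80
print(by_position)  # 90
```

With string labels such as `Row1`, the two indexers cannot be confused, but with integer labels `.loc[0]` and `.iloc[0]` may return different rows.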

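As the next two cells show, the result type depends on how the column is selected, not on how many columns come back: a scalar position such as `0` returns a Series, while a slice such as `0:1` returns a DataFrame, even though it also contains only one column.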

In [18]:
type(df.iloc[0:2,0])

pandas.core.series.Series

In [19]:
type(df.iloc[0:2, 0:1])

pandas.core.frame.DataFrame
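
The same rule applies to label-based selection. As a small sketch that rebuilds the tutorial's `df` (the variable names `as_series` and `as_frame` are illustrative only), a scalar selector drops a dimension while a list or slice keeps it:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

as_series = df["Column1"]     # scalar label -> Series (1-D)
as_frame = df[["Column1"]]    # list of labels -> DataFrame (2-D)

print(type(as_series).__name__, as_series.shape)  # Series (5,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (5, 1)

# The positional equivalents behave the same way.
print(type(df.iloc[:, 0]).__name__)    # Series
print(type(df.iloc[:, [0]]).__name__)  # DataFrame
```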

In [9]:
## Export the DataFrame to another format, such as CSV
df.to_csv('test_df_to_csv.csv')
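
When the CSV is read back, the row labels are stored as an unnamed first column. The sketch below round-trips the data through an in-memory string instead of a file (an assumption made here so that the example leaves nothing on disk) and shows how `index_col=0` restores the labels.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv()

# Read naively: the row labels become an ordinary column named "Unnamed: 0".
naive = pd.read_csv(io.StringIO(csv_text))
print(naive.columns[0])  # Unnamed: 0

# Read with index_col=0: the first column is used as the row index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(restored.index))  # ['Row1', 'Row2', 'Row3', 'Row4', 'Row5']
```

Passing `index=False` to `to_csv` is the alternative when the row labels are not worth keeping.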

In [10]:
## Check the type of a single row
type(df.loc['Row1'])

pandas.core.series.Series

In [11]:
df.iloc[:,:]

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19


In [12]:
## Take the elements from row 3 (position 2, because positions start at 0)
df.iloc[2, :]

Column1     8
Column2     9
Column3    10
Column4    11
Name: Row3, dtype: int64

In [16]:
df.iloc[1:3, :]

      Column1  Column2  Column3  Column4
Row2        4        5        6        7
Row3        8        9       10       11
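
Note that `.iloc` slices exclude the end position, as ordinary Python slices do, whereas `.loc` slices include the end label. A short sketch on the tutorial's `df` (the names `by_position` and `by_label` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

by_position = df.iloc[1:3, :]        # positions 1 and 2; position 3 is excluded
by_label = df.loc["Row2":"Row3", :]  # labels Row2 through Row3; Row3 is included

print(list(by_position.index))       # ['Row2', 'Row3']
print(list(by_label.index))          # ['Row2', 'Row3']
print(by_position.equals(by_label))  # True
```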


In [13]:
## Take the elements from column 4 (position 3); the slice 3: keeps the result a DataFrame
df.iloc[:,3:]

      Column4
Row1        3
Row2        7
Row3       11
Row4       15
Row5       19


In [17]:
df.iloc[1:3, 2:4]

      Column3  Column4
Row2        6        7
Row3       10       11


In [22]:
## Convert part of the DataFrame to a NumPy array

df.iloc[:,1:].values

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11],
       [13, 14, 15],
       [17, 18, 19]])

In [25]:
df.iloc[:,1:].values.shape

(5, 3)
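
The pandas documentation recommends `DataFrame.to_numpy()` over the `.values` attribute for this conversion. A minimal sketch of the equivalent call on the tutorial's `df` (the name `features` is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

# Every column except the first, as a plain 2-D NumPy array.
features = df.iloc[:, 1:].to_numpy()

print(type(features).__name__)  # ndarray
print(features.shape)           # (5, 3)
print(features[0].tolist())     # [1, 2, 3]
```

The row and column labels are dropped in the conversion; only the values remain.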

In [30]:
## Checking for nulls and NAs (isnull and isna are aliases)

df.isnull().sum()

Column1    0
Column2    0
Column3    0
Column4    0
dtype: int64

In [31]:
df.isna().sum()

Column1    0
Column2    0
Column3    0
Column4    0
dtype: int64
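
Because `df` has no missing values, every count above is zero. To see non-zero counts, here is a small sketch on a hypothetical frame, `df_missing`, that does contain `NaN` entries:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps; not part of the tutorial's df.
df_missing = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

null_counts = df_missing.isna().sum()  # missing values per column
total_missing = int(null_counts.sum()) # missing values in the whole frame

print(null_counts["a"])  # 1
print(null_counts["b"])  # 2
print(total_missing)     # 3
```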

In [26]:
df['Column1'].value_counts()

Column1
0     1
4     1
8     1
12    1
16    1
Name: count, dtype: int64

In [32]:
df['Column1'].unique() # returns an array of the unique values

array([ 0,  4,  8, 12, 16])
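
Every value in `Column1` occurs exactly once, so the counts above are all 1. The sketch below uses a hypothetical Series with repeated values, `colors`, to show what `value_counts()` and `unique()` return in a more typical case.

```python
import pandas as pd

# Hypothetical categorical data with repeats.
colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])

counts = colors.value_counts()   # sorted by count, most frequent first
distinct = list(colors.unique()) # distinct values in order of first appearance

print(counts["red"])     # 3
print(counts["blue"])    # 2
print(counts.index[0])   # red
print(distinct)          # ['red', 'blue', 'green']
print(colors.nunique())  # 3
```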

In [28]:
# df=pd.read_csv('mercedesbenz.csv')

In [34]:
df.loc[:,['Column1', 'Column4']]

      Column1  Column4
Row1        0        3
Row2        4        7
Row3        8       11
Row4       12       15
Row5       16       19


In [35]:
df[['Column1', 'Column4']]

      Column1  Column4
Row1        0        3
Row2        4        7
Row3        8       11
Row4       12       15
Row5       16       19
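
The two forms above return the same columns. The `.loc` form becomes useful when rows need to be chosen by label at the same time, as in this minimal sketch (the name `subset` is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

# Both forms select the same two columns.
print(df.loc[:, ["Column1", "Column4"]].equals(df[["Column1", "Column4"]]))  # True

# .loc can also restrict the rows by label in the same call.
subset = df.loc[["Row1", "Row3"], ["Column1", "Column4"]]
print(subset.values.tolist())  # [[0, 3], [8, 11]]
```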
