# Playing With Data

Here we are going to use the iris dataset and apply some <i>pandas</i> magic to it.

## But!!! What is pandas??

Pandas is an open-source library providing high-performance, easy-to-use data structures and data analysis tools for our lovely Python. Now let's import it and see what it can do.

In [1]:
import pandas as pd  # pd is pandas' favourite nickname :->

To check the version of a library in Python, we can usually print its <code>__version__</code> attribute (most libraries, pandas included, provide one):

In [2]:
print("Pandas Version: ",pd.__version__)

Pandas Version:  0.23.0


## Getting our data

Hey wait... I have downloaded the CSV file, but how am I going to use it here? <br>
Your panda is there to make your life easier: <code>pd.read_csv</code> loads the file into a dataframe.

In [3]:
dataframe = pd.read_csv('iris.csv')

Hey wait... Ahhhh.. what is a <b>dataframe</b>? <br>
It is a 2-dimensional labeled data structure with columns of potentially different types... or you can say it's a cool word for your everyday spreadsheet format.
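
To make "columns of potentially different types" concrete, here is a minimal sketch. It does not read the real <code>iris.csv</code>; instead it parses a small in-memory string (the name `csv_text` is just for illustration) holding the first two rows shown in this notebook, so it runs without any file on disk.

```python
import io

import pandas as pd

# Stand-in for iris.csv: same header, only two rows, kept in memory
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

# Every column carries its own type: integer Id, float measurements, text Species
print(sample.dtypes)

# Rows are labeled by the index (0, 1, ...), columns by their names
print(sample.index.tolist())
print(sample.columns.tolist())
```

The same call with a file name, as used in this notebook, gives the same kind of object, just with all 150 rows.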

Now let's have a look at the dataframe.

In [4]:
dataframe

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
5,6,5.4,3.9,1.7,0.4,Iris-setosa
6,7,4.6,3.4,1.4,0.3,Iris-setosa
7,8,5.0,3.4,1.5,0.2,Iris-setosa
8,9,4.4,2.9,1.4,0.2,Iris-setosa
9,10,4.9,3.1,1.5,0.1,Iris-setosa


Isn't it annoying to scroll all the way to the end? Don't worry, your pal pandas is here.

In [5]:
dataframe.head() # It shows the first five rows of the df

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


In [6]:
dataframe.tail() # It shows the last five rows of the df

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica
149,150,5.9,3.0,5.1,1.8,Iris-virginica


In case you want some more

In [7]:
dataframe.head(10) # You can specify the number of rows you want to see

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
5,6,5.4,3.9,1.7,0.4,Iris-setosa
6,7,4.6,3.4,1.4,0.3,Iris-setosa
7,8,5.0,3.4,1.5,0.2,Iris-setosa
8,9,4.4,2.9,1.4,0.2,Iris-setosa
9,10,4.9,3.1,1.5,0.1,Iris-setosa


Now how do I know the number of rows of the dataset?

In [8]:
print("Rows: ", dataframe.shape[0]) # df.shape returns (rows, columns), so df.shape[0] is the number of rows
# Or
print("Rows: ", len(dataframe))
# Columns
print("Columns: ", dataframe.shape[1])

Rows:  150
Rows:  150
Columns:  6
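
Since <code>shape</code> is an ordinary tuple, it can also be unpacked in one go. The sketch below uses a tiny made-up dataframe (`toy` and its values are illustrative, not part of the iris data).

```python
import pandas as pd

# Small made-up frame: 4 rows, 2 columns
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]})

# shape is a plain tuple (rows, columns), so tuple unpacking works
n_rows, n_cols = toy.shape
print("Rows: ", n_rows)
print("Columns: ", n_cols)

# size is the total number of cells, i.e. rows * columns
print("Cells: ", toy.size)
```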


Get the names of all the columns:

In [9]:
dataframe.columns

Index(['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm',
       'Species'],
      dtype='object')

In [10]:
dataframe.columns[3] # If you wanna be column specific

'PetalLengthCm'
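
Going the other way, from a column name to its position, is also possible. This is a small sketch on an empty made-up frame (`toy`) that only borrows a few of the iris column names.

```python
import pandas as pd

# Empty frame: only the column labels matter for this example
toy = pd.DataFrame(columns=["Id", "SepalLengthCm", "SepalWidthCm", "PetalLengthCm"])

# Index -> plain Python list of names
names = toy.columns.tolist()
print(names)

# Name -> integer position (for unique column names)
position = toy.columns.get_loc("PetalLengthCm")
print(position)
```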

Get the value at the nth row and mth column (both counted from 0)<br>
<code>df.values[n][m]</code>

In [11]:
dataframe.values[50][3]

4.7
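
Going through <code>.values</code> first converts the whole dataframe to a NumPy array. pandas also has positional accessors that look up a single cell directly; the sketch below compares them on a made-up three-row frame (`toy` and its numbers are illustrative only).

```python
import pandas as pd

toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "PetalLengthCm": [1.4, 4.7, 6.0],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

n, m = 1, 1  # second row, second column (positions start at 0)

via_values = toy.values[n][m]  # whole frame -> NumPy array, then index it
via_iloc = toy.iloc[n, m]      # position-based lookup
via_iat = toy.iat[n, m]        # position-based lookup for a single scalar
via_loc = toy.loc[n, "PetalLengthCm"]  # label-based; works here because the row labels are 0, 1, 2

print(via_values, via_iloc, via_iat, via_loc)
```

All four read the same cell; <code>iat</code> and <code>iloc</code> avoid building the intermediate array.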

Now let's split our data in two ways:<br>
- Column wise
- Row wise

<b>Column wise</b><br>
Let's say we want to separate the Species column from the dataset and store it in another variable.

In [39]:
species = dataframe.iloc[:, -1:]
features = dataframe.iloc[:, :-1]

In [40]:
dataframe.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


In [41]:
features.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm
0,1,5.1,3.5,1.4,0.2
1,2,4.9,3.0,1.4,0.2
2,3,4.7,3.2,1.3,0.2
3,4,4.6,3.1,1.5,0.2
4,5,5.0,3.6,1.4,0.2


In [44]:
species.head()

Unnamed: 0,Species
0,Iris-setosa
1,Iris-setosa
2,Iris-setosa
3,Iris-setosa
4,Iris-setosa
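
One detail worth knowing: in <code>iloc</code>, a slice such as <code>-1:</code> keeps the result a dataframe (here with one column), while a single position such as <code>-1</code> returns a Series. The two behave differently with <code>list()</code>, <code>set()</code> and <code>unique()</code>. A minimal sketch on made-up data (`toy`, `as_frame` and `as_series` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({
    "PetalWidthCm": [0.2, 1.4, 2.3, 0.2],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
})

as_frame = toy.iloc[:, -1:]  # slice of columns -> DataFrame with one column
as_series = toy.iloc[:, -1]  # single column position -> Series

print(type(as_frame).__name__)
print(type(as_series).__name__)

# Iterating a DataFrame yields its column names ...
print(list(as_frame))
# ... while iterating a Series yields its values
print(list(as_series))

# unique() is a Series method; values come back in order of first appearance
print(as_series.unique())
```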


In [25]:
# species is a one-column DataFrame, so select the column (a Series) first
print(species['Species'].unique())
print(set(species['Species']))
print(dataframe['Species'].unique())

['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']
{'Iris-setosa', 'Iris-virginica', 'Iris-versicolor'}
['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']


Say I want to convert the Species column to a list:

In [17]:
list_species = list(species['Species'])  # list(species) alone would only give the column names
print(list_species)

['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'I

<b>Row wise</b>