## About the dataset

This is perhaps the best-known dataset in the pattern recognition literature. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other.

Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class: 
      -- Iris Setosa
      -- Iris Versicolor
      -- Iris Virginica
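
As a rough illustration of the separability claim, a single threshold on petal length is already enough to split the Setosa rows from the Virginica rows on a small sample. This is only a sketch: the ten rows are copied from the outputs shown later in this notebook, the column names `petal_length` and `label` are illustrative, and the 2.5 cm threshold is an assumption chosen by eye, not a value from the dataset description.

```python
import pandas as pd

# Ten rows copied from the head and tail of the dataset shown in this notebook.
sample = pd.DataFrame({
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 5.6, 5.1, 5.1, 5.9, 5.7],
    "label": ["Iris-setosa"] * 5 + ["Iris-virginica"] * 5,
})

THRESHOLD_CM = 2.5  # assumed cut-off, chosen for illustration only

# A threshold on one feature is the simplest possible linear decision rule.
predicted_setosa = sample["petal_length"] < THRESHOLD_CM
actual_setosa = sample["label"] == "Iris-setosa"

# True when the rule classifies every sample row correctly.
print((predicted_setosa == actual_setosa).all())
```

The sample contains no Versicolor rows, so it says nothing about the two classes that are not linearly separable from each other.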

### Read the dataset and store it in the dataframe named Iris

In [1]:
import numpy as np
import pandas as pd

In [2]:
Iris = pd.read_csv('iris.csv')

### Find the data type of every column

In [14]:
Iris.dtypes

Sepal Length (in cm)    float64
Sepal Width in (cm)     float64
Petal length (in cm)    float64
Petal width (in cm)     float64
Class                    object
dtype: object

### Print top 10 & bottom 10 samples from the dataframe

In [15]:
Iris.head(10)

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa


In [28]:
Iris.tail(10)

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
140,6.7,3.1,5.6,2.4,Iris-virginica
141,6.9,3.1,5.1,2.3,Iris-virginica
142,5.8,2.7,5.1,1.9,Iris-virginica
143,6.8,3.2,5.9,2.3,Iris-virginica
144,6.7,3.3,5.7,2.5,Iris-virginica
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica
149,5.9,3.0,5.1,1.8,Iris-virginica


### Find the shape of the dataset

In [30]:
Iris.shape

(150, 5)

### Set the index of the dataframe to be the first column

In [41]:
Iris.index = Iris[Iris.columns[0]]
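
Assigning a column to `Iris.index` copies it into the index but also leaves it in the frame, which is why Sepal Length appears twice in the outputs that follow. The sketch below contrasts that with `set_index`, which moves the column into the index by default. The frame `df` and its short column names are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7], "sepal_width": [3.5, 3.0, 3.2]})

# Same approach as the cell above: the column is copied into the index and kept.
kept = df.copy()
kept.index = kept[kept.columns[0]]
print(list(kept.columns))

# set_index moves the column into the index (drop=True is the default).
moved = df.set_index("sepal_length")
print(list(moved.columns))
```

Which variant is preferable depends on whether the values are still needed as a regular column afterwards.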

### Use the iloc indexer to print all the rows of the 3rd column

In [79]:
# ":" selects every row; 2:3 selects the 3rd column and keeps the result a DataFrame
Iris.iloc[:, 2:3]

Sepal Length (in cm),Petal length (in cm)
5.1,1.4
4.9,1.4
4.7,1.3
4.6,1.5
5.0,1.4
5.4,1.7
4.6,1.4
5.0,1.5
4.4,1.4
4.9,1.5
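
With `iloc`, a slice such as `2:3` returns a one-column DataFrame, while a bare integer such as `2` returns a Series. A minimal sketch on a hypothetical three-column frame (`df` and its column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

as_frame = df.iloc[:, 2:3]  # slice: one-column DataFrame
as_series = df.iloc[:, 2]   # integer: Series

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)
```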


### Slicing
Print only the Sepal Length and Sepal Width for the first 10 rows.

In [93]:
# Columns 0 and 1 are Sepal Length and Sepal Width; 0:2 excludes position 2
Iris.iloc[0:10, 0:2]

Sepal Length (in cm),Sepal Length (in cm),Sepal Width in (cm)
5.1,5.1,3.5
4.9,4.9,3.0
4.7,4.7,3.2
4.6,4.6,3.1
5.0,5.0,3.6
5.4,5.4,3.9
4.6,4.6,3.4
5.0,5.0,3.4
4.4,4.4,2.9
4.9,4.9,3.1


### Using Logical statements for indexing
Print all the columns of the rows whose class name is "Iris-setosa"

In [119]:
Iris.loc[Iris['Class'] == "Iris-setosa"]

Sepal Length (in cm),Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
5.1,5.1,3.5,1.4,0.2,Iris-setosa
4.9,4.9,3.0,1.4,0.2,Iris-setosa
4.7,4.7,3.2,1.3,0.2,Iris-setosa
4.6,4.6,3.1,1.5,0.2,Iris-setosa
5.0,5.0,3.6,1.4,0.2,Iris-setosa
5.4,5.4,3.9,1.7,0.4,Iris-setosa
4.6,4.6,3.4,1.4,0.3,Iris-setosa
5.0,5.0,3.4,1.5,0.2,Iris-setosa
4.4,4.4,2.9,1.4,0.2,Iris-setosa
4.9,4.9,3.1,1.5,0.1,Iris-setosa
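
Boolean masks can also be combined. The sketch below uses four rows copied from the outputs in this notebook; the 0.3 cm cut-off is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Petal width (in cm)": [0.2, 0.4, 2.4, 1.8],
    "Class": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

is_setosa = df["Class"] == "Iris-setosa"
wide_petal = df["Petal width (in cm)"] > 0.3  # arbitrary illustrative cut-off

# & means "and", | means "or"; inline comparisons need their own parentheses.
setosa_and_wide = df.loc[is_setosa & wide_petal]

# isin() tests membership in a list of labels.
either_class = df.loc[df["Class"].isin(["Iris-setosa", "Iris-virginica"])]

print(setosa_and_wide)
print(len(either_class))
```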


### Multiply Sepal Length and width and store it under the column name "SepalExtra" in the same Iris dataframe

In [145]:
# Element-wise product of the two sepal columns, stored as a new column
Iris['SepalExtra'] = Iris['Sepal Length (in cm)'] * Iris['Sepal Width in (cm)']
Iris

Sepal Length (in cm),Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class,SepalExtra
5.1,5.1,3.5,1.4,0.2,Iris-setosa,17.85
4.9,4.9,3.0,1.4,0.2,Iris-setosa,14.70
4.7,4.7,3.2,1.3,0.2,Iris-setosa,15.04
4.6,4.6,3.1,1.5,0.2,Iris-setosa,14.26
5.0,5.0,3.6,1.4,0.2,Iris-setosa,18.00
5.4,5.4,3.9,1.7,0.4,Iris-setosa,21.06
4.6,4.6,3.4,1.4,0.3,Iris-setosa,15.64
5.0,5.0,3.4,1.5,0.2,Iris-setosa,17.00
4.4,4.4,2.9,1.4,0.2,Iris-setosa,12.76
4.9,4.9,3.1,1.5,0.1,Iris-setosa,15.19
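
The same product can also be computed with `DataFrame.assign`, which returns a new frame and leaves the original untouched. A minimal sketch on three rows copied from the output above (the names `df` and `with_extra` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9, 4.7],
    "Sepal Width in (cm)": [3.5, 3.0, 3.2],
})

# assign() builds a new DataFrame with the extra column; df itself is not modified.
with_extra = df.assign(
    SepalExtra=df["Sepal Length (in cm)"] * df["Sepal Width in (cm)"]
)

print(with_extra.round(2))
print("SepalExtra" in df.columns)
```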


### Find out the mean and variance for each column except the Class column

In [167]:
desiredCols = ['Sepal Length (in cm)', 'Sepal Width in (cm)', 'Petal length (in cm)',
       'Petal width (in cm)', 'SepalExtra']
# DataFrame.mean() reduces each column by default
Iris[desiredCols].mean()

Sepal Length (in cm)     5.843333
Sepal Width in (cm)      3.054000
Petal length (in cm)     3.758667
Petal width (in cm)      1.198667
SepalExtra              17.806533
dtype: float64

In [169]:
# ddof=0 gives the population variance (divide by n), as shown in the output
Iris[desiredCols].var(ddof=0)

Sepal Length (in cm)     0.681122
Sepal Width in (cm)      0.186751
Petal length (in cm)     3.092425
Petal width (in cm)      0.578532
SepalExtra              11.272436
dtype: float64
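
The variances above follow the population formula (divide by n, `ddof=0`). pandas' own `var()` defaults to the sample formula (divide by n - 1, `ddof=1`), so the two can differ slightly. A small sketch using the first five sepal lengths from the dataset:

```python
import pandas as pd

# First five sepal lengths from the dataset
s = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0])

population_var = s.var(ddof=0)  # divide by n
sample_var = s.var()            # pandas default, ddof=1: divide by n - 1

print(population_var, sample_var)
```

With only five values the gap is large; on all 150 rows the two differ by a factor of 149/150.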

### Write a function that accepts two numbers as input and prints them - Pass the Sepal length and sepal width of 5th row and print the output

In [233]:
def printTwoInputs(x, y):
    """Print the two inputs, one per line."""
    print("Input1:", x)
    print("Input2:", y)

printTwoInputs(Iris.iloc[4]['Sepal Length (in cm)'], Iris.iloc[4]['Sepal Width in (cm)'])


Input1: 5.0
Input2: 3.6


### Find the range of all the columns in the dataset

*Range = Max value - Min value (in the column)*

In [238]:
Iris[desiredCols].max() - Iris[desiredCols].min()

Sepal Length (in cm)     3.60
Sepal Width in (cm)      2.40
Petal length (in cm)     5.90
Petal width (in cm)      2.40
SepalExtra              20.02
dtype: float64
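
To avoid listing the numeric columns by hand, the range calculation can be wrapped in a small helper that skips non-numeric columns such as Class. This is a sketch under stated assumptions: `column_range` is a hypothetical helper name, and the four rows are copied from the outputs in this notebook.

```python
import pandas as pd

def column_range(frame: pd.DataFrame) -> pd.Series:
    """Return max - min for every numeric column of `frame`."""
    numeric = frame.select_dtypes(include="number")
    return numeric.max() - numeric.min()

df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9, 6.7, 5.9],
    "Petal length (in cm)": [1.4, 1.4, 5.6, 5.1],
    "Class": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

ranges = column_range(df)
print(ranges)
```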

### Sort the entire dataset according to the column Petal width

In [251]:
Iris.sort_values('Petal width (in cm)')

Sepal Length (in cm),Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class,SepalExtra
5.2,5.2,4.1,1.5,0.1,Iris-setosa,21.32
4.3,4.3,3.0,1.1,0.1,Iris-setosa,12.90
4.9,4.9,3.1,1.5,0.1,Iris-setosa,15.19
4.9,4.9,3.1,1.5,0.1,Iris-setosa,15.19
4.8,4.8,3.0,1.4,0.1,Iris-setosa,14.40
4.9,4.9,3.1,1.5,0.1,Iris-setosa,15.19
5.1,5.1,3.5,1.4,0.2,Iris-setosa,17.85
5.2,5.2,3.5,1.5,0.2,Iris-setosa,18.20
5.2,5.2,3.4,1.4,0.2,Iris-setosa,17.68
4.7,4.7,3.2,1.6,0.2,Iris-setosa,15.04
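
`sort_values` returns a new, sorted frame and leaves the original row order untouched; `ascending=False` reverses the order. A minimal sketch on four rows copied from the dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "Petal length (in cm)": [1.4, 1.7, 1.5, 5.1],
    "Petal width (in cm)": [0.2, 0.4, 0.1, 1.8],
})

# Largest petal width first; df itself keeps its original order.
by_width_desc = df.sort_values("Petal width (in cm)", ascending=False)

print(by_width_desc)
print(df)
```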


### Remove the new column "SepalExtra" from the dataframe

In [254]:
# drop() returns a new DataFrame; Iris itself is unchanged unless the result is assigned back
Iris.drop(columns = ['SepalExtra'])

Sepal Length (in cm),Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
5.1,5.1,3.5,1.4,0.2,Iris-setosa
4.9,4.9,3.0,1.4,0.2,Iris-setosa
4.7,4.7,3.2,1.3,0.2,Iris-setosa
4.6,4.6,3.1,1.5,0.2,Iris-setosa
5.0,5.0,3.6,1.4,0.2,Iris-setosa
5.4,5.4,3.9,1.7,0.4,Iris-setosa
4.6,4.6,3.4,1.4,0.3,Iris-setosa
5.0,5.0,3.4,1.5,0.2,Iris-setosa
4.4,4.4,2.9,1.4,0.2,Iris-setosa
4.9,4.9,3.1,1.5,0.1,Iris-setosa


### Print only the rows whose class is "Iris-setosa"

In [256]:
Iris.loc[Iris['Class'] == "Iris-setosa"]

Sepal Length (in cm),Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class,SepalExtra
5.1,5.1,3.5,1.4,0.2,Iris-setosa,17.85
4.9,4.9,3.0,1.4,0.2,Iris-setosa,14.7
4.7,4.7,3.2,1.3,0.2,Iris-setosa,15.04
4.6,4.6,3.1,1.5,0.2,Iris-setosa,14.26
5.0,5.0,3.6,1.4,0.2,Iris-setosa,18.0
5.4,5.4,3.9,1.7,0.4,Iris-setosa,21.06
4.6,4.6,3.4,1.4,0.3,Iris-setosa,15.64
5.0,5.0,3.4,1.5,0.2,Iris-setosa,17.0
4.4,4.4,2.9,1.4,0.2,Iris-setosa,12.76
4.9,4.9,3.1,1.5,0.1,Iris-setosa,15.19


### Take only the top 10 rows of the dataset with only the first 3 columns and store it in a dataframe named "IrisSubset"

In [283]:
# Select rows 0-9 and columns 0-2 by position, then replace the index with 0..9
IrisSubset = Iris.iloc[0:10, 0:3].reset_index(drop=True)

In [289]:
IrisSubset

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm)
0,5.1,3.5,1.4
1,4.9,3.0,1.4
2,4.7,3.2,1.3
3,4.6,3.1,1.5
4,5.0,3.6,1.4
5,5.4,3.9,1.7
6,4.6,3.4,1.4
7,5.0,3.4,1.5
8,4.4,2.9,1.4
9,4.9,3.1,1.5
