## About the dataset

This is perhaps the best-known dataset in the pattern recognition literature. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class: 
      -- Iris Setosa
      -- Iris Versicolour
      -- Iris Virginica

### Read the dataset and store it in a dataframe (named `data` in the code)

In [4]:
import numpy as np
import pandas as pd

In [50]:
data = pd.read_csv("Iris.csv")

### Find out the datatypes of each and every column

In [ ]:
data.dtypes
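To see what `dtypes` reports for a file laid out like `Iris.csv`, here is a minimal sketch that inlines three sample rows (one per class) with `io.StringIO`, so it runs without the file. It assumes the CSV header matches the column names shown in this notebook's outputs.

```python
import io

import pandas as pd

# Three sample rows in the same layout as Iris.csv (inlined so the example is self-contained)
csv_text = """Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""
sample = pd.read_csv(io.StringIO(csv_text))

# dtypes maps each column name to its inferred type: the four measurements
# parse as float64, Class is non-numeric (object, or a string dtype in newer pandas)
print(sample.dtypes)
```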

### Print top 10 & bottom 10 samples from the dataframe

In [6]:
print("First 10 Rows :\n" + str(data.head(10)))
print("Last 10 Rows :\n" + str(data.tail(10)))

First 10 Rows :
   Sepal Length (in cm)  Sepal Width in (cm)  Petal length (in cm)  \
0                   5.1                  3.5                   1.4   
1                   4.9                  3.0                   1.4   
2                   4.7                  3.2                   1.3   
3                   4.6                  3.1                   1.5   
4                   5.0                  3.6                   1.4   
5                   5.4                  3.9                   1.7   
6                   4.6                  3.4                   1.4   
7                   5.0                  3.4                   1.5   
8                   4.4                  2.9                   1.4   
9                   4.9                  3.1                   1.5   

   Petal width (in cm)        Class  
0                  0.2  Iris-setosa  
1                  0.2  Iris-setosa  
2                  0.2  Iris-setosa  
3                  0.2  Iris-setosa  
4                  0.2 
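`head(n)` and `tail(n)` are shorthand for positional slices. A minimal sketch on a hypothetical 25-row frame (not the Iris data) confirming the equivalence, using `to_string()` so no rows are elided in the printout:

```python
import pandas as pd

# Hypothetical stand-in frame with 25 rows (the real Iris data has 150)
df = pd.DataFrame({"value": range(25)})

first_10 = df.head(10)  # same rows as df.iloc[:10]
last_10 = df.tail(10)   # same rows as df.iloc[-10:]

# to_string() renders every row instead of truncating long frames
print("First 10 Rows :\n" + first_10.to_string())
print("Last 10 Rows :\n" + last_10.to_string())
```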

### Find the shape of the dataset

In [7]:
data.shape

(150, 5)

### Set the index of the dataframe to be the first column

In [8]:
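# Use the values of the first column (Sepal Length) as the row index; the column itself is kept
data.set_index(data.iloc[:, 0], inplace=True)

In [9]:
type(data.index)  # not displayed: a cell only echoes its last expression
data.index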

Float64Index([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9,
              ...
              6.7, 6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9],
             dtype='float64', name='Sepal Length (in cm)', length=150)
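Passing a Series (as `data.iloc[:,0]` does) sets the index but leaves the column in place, which is why sepal length later appears both as the index and as a column. Passing the column label instead moves the column into the index (`drop=True` by default). Note also that sepal length is not unique (4.6, 4.9 and 5.0 each appear twice in the first ten values above), so a `.loc` lookup on this index can return several rows. A sketch with a hypothetical three-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9, 4.7],
    "Class": ["Iris-setosa"] * 3,
})

# Passing the column *values* (a Series): the index is set, the column stays
by_series = df.set_index(df.iloc[:, 0])

# Passing the column *label*: the column is moved into the index
by_label = df.set_index("Sepal Length (in cm)")

print(by_series.columns.tolist())
print(by_label.columns.tolist())
```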

### Use `iloc` to print all the rows of the 3rd column

In [21]:
data.iloc[:,2]

0      1.4
1      1.4
2      1.3
3      1.5
4      1.4
5      1.7
6      1.4
7      1.5
8      1.4
9      1.5
10     1.5
11     1.6
12     1.4
13     1.1
14     1.2
15     1.5
16     1.3
17     1.4
18     1.7
19     1.5
20     1.7
21     1.5
22     1.0
23     1.7
24     1.9
25     1.6
26     1.6
27     1.5
28     1.4
29     1.6
      ... 
120    5.7
121    4.9
122    6.7
123    4.9
124    5.7
125    6.0
126    4.8
127    4.9
128    5.6
129    5.8
130    6.1
131    6.4
132    5.6
133    5.1
134    5.6
135    6.1
136    5.6
137    5.5
138    4.8
139    5.4
140    5.6
141    5.1
142    5.1
143    5.9
144    5.7
145    5.2
146    5.0
147    5.2
148    5.4
149    5.1
Name: Petal length (in cm), Length: 150, dtype: float64
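Because `iloc` positions start at 0, `iloc[:, 2]` is the third column. A small sketch with a hypothetical two-row frame (reusing this notebook's column names) checking that the positional and label-based selections return the same Series:

```python
import pandas as pd

df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9],
    "Sepal Width in (cm)": [3.5, 3.0],
    "Petal length (in cm)": [1.4, 1.4],
})

# Position 2 is the third column, because iloc positions start at 0
third_by_position = df.iloc[:, 2]

# The same column selected by its label
third_by_label = df["Petal length (in cm)"]
```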

### Slicing
Print only the Sepal Length and Sepal Width for the first 10 rows

In [23]:
print(data.iloc[:10, :2])

   Sepal Length (in cm)  Sepal Width in (cm)
0                   5.1                  3.5
1                   4.9                  3.0
2                   4.7                  3.2
3                   4.6                  3.1
4                   5.0                  3.6
5                   5.4                  3.9
6                   4.6                  3.4
7                   5.0                  3.4
8                   4.4                  2.9
9                   4.9                  3.1
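Positional slices with `iloc` exclude the stop position, while label slices with `loc` include it, so the two differ by one row. A sketch on a hypothetical 20-row frame with the default integer index:

```python
import pandas as pd

# Hypothetical 20-row frame with the default RangeIndex 0..19
df = pd.DataFrame({"a": range(20), "b": range(20), "c": range(20)})

by_position = df.iloc[:10, :2]     # positions 0..9 (stop excluded), first two columns
by_label = df.loc[:9, ["a", "b"]]  # labels 0..9 (stop included), named columns

print(by_position.shape)
```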


### Using logical statements for indexing
Print all the columns of the rows whose class is "Iris-setosa"

In [44]:
# Boolean mask: True for rows whose Class is Iris-setosa
data.loc[data["Class"] == "Iris-setosa", :]

|   | Sepal Length (in cm) | Sepal Width in (cm) | Petal length (in cm) | Petal width (in cm) | Class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
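The comparison `data["Class"] == "Iris-setosa"` builds a boolean Series that `loc` uses as a row mask. A sketch on a hypothetical four-row frame, also showing `isin()` for matching several classes at once:

```python
import pandas as pd

df = pd.DataFrame({
    "Petal width (in cm)": [0.2, 1.4, 2.5, 0.4],
    "Class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
})

# A boolean Series: True where the class matches
is_setosa = df["Class"] == "Iris-setosa"
setosa_rows = df.loc[is_setosa, :]

# isin() tests membership in a list, here keeping two of the three classes
setosa_or_versicolor = df.loc[df["Class"].isin(["Iris-setosa", "Iris-versicolor"])]
```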


### Multiply Sepal Length and width and store it under the column name "SepalExtra" in the same Iris dataframe

In [58]:
sepalextra = data["Sepal Length (in cm)"]*data["Sepal Width in (cm)"]

In [59]:
data["SepalExtra"]=sepalextra
data.iloc[0:4,:]

|   | Sepal Length (in cm) | Sepal Width in (cm) | Petal length (in cm) | Petal width (in cm) | Class | SepalExtra |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa | 17.85 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa | 14.7 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa | 15.04 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa | 14.26 |
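The two cells above can be collapsed into one with `DataFrame.assign`, which returns a new frame instead of modifying the original. A sketch on a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9],
    "Sepal Width in (cm)": [3.5, 3.0],
})

# assign() returns a new DataFrame with the extra column; df itself is unchanged
with_extra = df.assign(
    SepalExtra=df["Sepal Length (in cm)"] * df["Sepal Width in (cm)"]
)
print(with_extra)
```

To keep the result under the original name, reassign it (`df = df.assign(...)`).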


### Find the mean and variance of every column except the Class column

In [27]:
data1 = data.drop("Class", axis=1)
print("Mean of all the columns: \n\n" + str(data1.mean()))
print("\nVariance of all the columns: \n\n" + str(data1.var()))

Mean of all the columns: 

Sepal Length (in cm)     5.843333
Sepal Width in (cm)      3.054000
Petal length (in cm)     3.758667
Petal width (in cm)      1.198667
SepalExtra              17.806533
dtype: float64

Variance of all the columns: 

Sepal Length (in cm)     0.685694
Sepal Width in (cm)      0.188004
Petal length (in cm)     3.113179
Petal width (in cm)      0.582414
SepalExtra              11.348090
dtype: float64
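Two details matter when reading these numbers. pandas' `var()` is the sample variance (`ddof=1`, divisor n-1), while NumPy's `np.var` defaults to the population variance (`ddof=0`, divisor n). And instead of dropping `Class` by hand, `mean(numeric_only=True)` skips non-numeric columns. A sketch on toy values, not the Iris data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "Class": ["a", "a", "b", "b"]})

# numeric_only=True ignores the string column instead of dropping it by hand
means = df.mean(numeric_only=True)

s = df["x"]
sample_var = s.var()                  # ddof=1: summed squared deviations / (n - 1)
population_var = s.var(ddof=0)        # ddof=0: summed squared deviations / n
numpy_default = np.var(s.to_numpy())  # NumPy's default is ddof=0

print(means["x"], sample_var, population_var, numpy_default)
```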


### Write a function that accepts two numbers as input and prints them; pass the sepal length and sepal width of the 5th row and print the output

In [28]:
def write(input1, input2):
    print("input1 is "+ str(input1))
    print("input2 is "+ str(input2))
write(data.iloc[4,0], data.iloc[4,1])

input1 is 5.0
input2 is 3.6


### Find the range of all the columns in the dataset

*Range = Max value - Min value (in the column)*

In [36]:
# Named col_range so Python's built-in range() is not shadowed
col_range = data1.max() - data1.min()
col_range

Sepal Length (in cm)     3.60
Sepal Width in (cm)      2.40
Petal length (in cm)     5.90
Petal width (in cm)      2.40
SepalExtra              20.02
dtype: float64
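Naming the result `range` hides Python's built-in `range()` for the rest of the session, so a different name is safer. The sketch below uses hypothetical values (matching the extremes implied above for two columns) and also shows `agg()` keeping the min and max next to the range:

```python
import pandas as pd

df = pd.DataFrame({
    "Sepal Length (in cm)": [4.3, 5.8, 7.9],
    "Petal width (in cm)": [0.1, 1.3, 2.5],
})

# Range = max - min, computed column by column
col_range = df.max() - df.min()

# agg() returns a frame with one row per statistic; add the range as a third row
summary = df.agg(["min", "max"])
summary.loc["range"] = summary.loc["max"] - summary.loc["min"]
print(summary)
```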

### Sort the entire dataset according to the column Petal width

In [40]:
data.sort_values(by=["Petal width (in cm)"])

| Sepal Length (in cm) (index) | Sepal Length (in cm) | Sepal Width in (cm) | Petal length (in cm) | Petal width (in cm) | Class | SepalExtra |
|---|---|---|---|---|---|---|
| 5.2 | 5.2 | 4.1 | 1.5 | 0.1 | Iris-setosa | 21.32 |
| 4.3 | 4.3 | 3.0 | 1.1 | 0.1 | Iris-setosa | 12.90 |
| 4.9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 15.19 |
| 4.9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 15.19 |
| 4.8 | 4.8 | 3.0 | 1.4 | 0.1 | Iris-setosa | 14.40 |
| 4.9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa | 15.19 |
| 5.1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa | 17.85 |
| 5.2 | 5.2 | 3.5 | 1.5 | 0.2 | Iris-setosa | 18.20 |
| 5.2 | 5.2 | 3.4 | 1.4 | 0.2 | Iris-setosa | 17.68 |
| 4.7 | 4.7 | 3.2 | 1.6 | 0.2 | Iris-setosa | 15.04 |
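Many flowers share the same petal width (0.1 and 0.2 recur above), and with the default `kind="quicksort"` pandas does not guarantee any particular order among tied rows. A sketch on a hypothetical four-row frame showing two ways to make the order deterministic: a secondary sort key, or a stable sort:

```python
import pandas as pd

df = pd.DataFrame({
    "Petal length (in cm)": [1.5, 1.1, 1.4, 1.3],
    "Petal width (in cm)": [0.1, 0.1, 0.2, 0.1],
    "Class": ["Iris-setosa"] * 4,
})

# Sort by petal width first, then break ties by petal length (both ascending)
by_width = df.sort_values(by=["Petal width (in cm)", "Petal length (in cm)"])

# kind="stable" keeps tied rows in their original order when sorting on one column
stable = df.sort_values(by="Petal width (in cm)", kind="stable")
```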


### Remove the new column "SepalExtra" from the dataframe

In [61]:
print(data)
data.drop(columns = "SepalExtra", inplace = True)
print(data)

     Sepal Length (in cm)  Sepal Width in (cm)  Petal length (in cm)  \
0                     5.1                  3.5                   1.4   
1                     4.9                  3.0                   1.4   
2                     4.7                  3.2                   1.3   
3                     4.6                  3.1                   1.5   
4                     5.0                  3.6                   1.4   
5                     5.4                  3.9                   1.7   
6                     4.6                  3.4                   1.4   
7                     5.0                  3.4                   1.5   
8                     4.4                  2.9                   1.4   
9                     4.9                  3.1                   1.5   
10                    5.4                  3.7                   1.5   
11                    4.8                  3.4                   1.6   
12                    4.8                  3.0                  
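Running this cell twice raises a `KeyError`, because the column is already gone. A sketch on a hypothetical frame of the reassignment style (`drop` without `inplace`) and of `errors="ignore"`, which turns the repeat into a no-op:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "SepalExtra": [17.85, 14.7]})

# Without inplace=True, drop() returns a new frame; reassign to keep the result
df = df.drop(columns="SepalExtra")

# errors="ignore" skips labels that are missing instead of raising KeyError
df = df.drop(columns="SepalExtra", errors="ignore")
```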

### Print only the rows whose class is "Iris-setosa"

In [63]:
print(data.loc[data["Class"]=="Iris-setosa",:])

    Sepal Length (in cm)  Sepal Width in (cm)  Petal length (in cm)  \
0                    5.1                  3.5                   1.4   
1                    4.9                  3.0                   1.4   
2                    4.7                  3.2                   1.3   
3                    4.6                  3.1                   1.5   
4                    5.0                  3.6                   1.4   
5                    5.4                  3.9                   1.7   
6                    4.6                  3.4                   1.4   
7                    5.0                  3.4                   1.5   
8                    4.4                  2.9                   1.4   
9                    4.9                  3.1                   1.5   
10                   5.4                  3.7                   1.5   
11                   4.8                  3.4                   1.6   
12                   4.8                  3.0                   1.4   
13    
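The same filter can be written with `DataFrame.query`. Column names containing spaces or parentheses, like the ones in this file, must be wrapped in backticks inside the query string. A sketch on a hypothetical three-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Sepal Width in (cm)": [3.5, 3.2, 3.0],
    "Class": ["Iris-setosa", "Iris-versicolor", "Iris-setosa"],
})

# query() takes the condition as a string; plain column names are used directly
setosa = df.query('Class == "Iris-setosa"')

# Names with spaces or parentheses are wrapped in backticks inside the query string
wide_setosa = df.query('Class == "Iris-setosa" and `Sepal Width in (cm)` > 3.2')
```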

### Take only the top 10 rows of the dataset with only the first 3 columns and store them in a dataframe named "IrisSubset"

In [70]:
IrisSubset = data.iloc[:10,:3]

In [71]:
IrisSubset

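|   | Sepal Length (in cm) | Sepal Width in (cm) | Petal length (in cm) |
|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 |
| 1 | 4.9 | 3.0 | 1.4 |
| 2 | 4.7 | 3.2 | 1.3 |
| 3 | 4.6 | 3.1 | 1.5 |
| 4 | 5.0 | 3.6 | 1.4 |
| 5 | 5.4 | 3.9 | 1.7 |
| 6 | 4.6 | 3.4 | 1.4 |
| 7 | 5.0 | 3.4 | 1.5 |
| 8 | 4.4 | 2.9 | 1.4 |
| 9 | 4.9 | 3.1 | 1.5 |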
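If `IrisSubset` will be modified later, taking an explicit `.copy()` makes it independent of `data`, so edits never propagate between the two and pandas versions that emit `SettingWithCopyWarning` have nothing to warn about. A sketch on a hypothetical three-row frame:

```python
import pandas as pd

data = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9, 4.7],
    "Sepal Width in (cm)": [3.5, 3.0, 3.2],
    "Petal length (in cm)": [1.4, 1.4, 1.3],
    "Petal width (in cm)": [0.2, 0.2, 0.2],
})

# .copy() gives IrisSubset its own data, independent of the original frame
IrisSubset = data.iloc[:2, :3].copy()

# Changing the subset leaves the original frame untouched
IrisSubset.loc[0, "Sepal Length (in cm)"] = 99.0
```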
