## About the dataset

This is perhaps the best-known dataset in the pattern recognition literature. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other.

Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class: 
      -- Iris Setosa
      -- Iris Versicolour
      -- Iris Virginica

### Read the dataset and store it in a dataframe named `iris`

In [1]:
import pandas as pd
iris = pd.read_csv('Iris.csv')

### Find the data type of each column

In [2]:
iris.dtypes

Sepal Length (in cm)    float64
Sepal Width in (cm)     float64
Petal length (in cm)    float64
Petal width (in cm)     float64
Class                    object
dtype: object

### Print the top and bottom samples from the dataframe

The cells below use the default of 5 rows; `iris.head(10)` and `iris.tail(10)` would print 10.

In [3]:
iris.head()

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [4]:
iris.tail()

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica
149,5.9,3.0,5.1,1.8,Iris-virginica


### Find the shape of the dataset

In [5]:
iris.shape

(150, 5)

### Set the index of the dataframe to be the first column

In [6]:
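# Use the first column as the index; inplace=True modifies iris directly
# (without it, set_index returns a new dataframe and leaves iris unchanged)
iris.set_index('Sepal Length (in cm)', inplace=True)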

In [7]:
iris.head()

Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
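
The difference between the in-place call and the version that returns a new dataframe can be sketched on a small stand-in frame. The name `df` and the three rows are assumptions for illustration (values copied from the first rows shown above), not the full dataset.

```python
import pandas as pd

# Toy stand-in for the iris frame; values copied from the first rows above.
df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9, 4.7],
    "Sepal Width in (cm)": [3.5, 3.0, 3.2],
})

# Without inplace=True, set_index returns a new frame and leaves df unchanged.
indexed = df.set_index("Sepal Length (in cm)")

print(list(df.columns))       # both columns are still regular columns in df
print(list(indexed.columns))  # the index column is no longer a regular column
print(indexed.index.name)
```

Assigning the result to a new name keeps the original frame available, which can make a notebook cell safe to re-run.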


### Use the `iloc` indexer to print all the rows of the 3rd column

In [8]:
iris.iloc[:,2]

Sepal Length (in cm)
5.1    0.2
4.9    0.2
4.7    0.2
4.6    0.2
5.0    0.2
5.4    0.4
4.6    0.3
5.0    0.2
4.4    0.2
4.9    0.1
5.4    0.2
4.8    0.2
4.8    0.1
4.3    0.1
5.8    0.2
5.7    0.4
5.4    0.4
5.1    0.3
5.7    0.3
5.1    0.3
5.4    0.2
5.1    0.4
4.6    0.2
5.1    0.5
4.8    0.2
5.0    0.2
5.0    0.4
5.2    0.2
5.2    0.2
4.7    0.2
      ... 
6.9    2.3
5.6    2.0
7.7    2.0
6.3    1.8
6.7    2.1
7.2    1.8
6.2    1.8
6.1    1.8
6.4    2.1
7.2    1.6
7.4    1.9
7.9    2.0
6.4    2.2
6.3    1.5
6.1    1.4
7.7    2.3
6.3    2.4
6.4    1.8
6.0    1.8
6.9    2.1
6.7    2.4
6.9    2.3
5.8    1.9
6.8    2.3
6.7    2.5
6.7    2.3
6.3    1.9
6.5    2.0
6.2    2.3
5.9    1.8
Name: Petal width (in cm), Length: 150, dtype: float64
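
The output above is the petal width rather than the petal length because the index is not counted as a column. A minimal sketch of that behaviour, assuming a two-row stand-in frame named `df` (values copied from the rows shown earlier):

```python
import pandas as pd

# Toy stand-in for the iris frame, indexed by sepal length as in the cell above.
df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9],
    "Sepal Width in (cm)": [3.5, 3.0],
    "Petal length (in cm)": [1.4, 1.4],
    "Petal width (in cm)": [0.2, 0.2],
}).set_index("Sepal Length (in cm)")

# iloc counts only the remaining columns, starting at 0,
# so position 2 is the third non-index column.
third = df.iloc[:, 2]
print(third.name)
```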

### Slicing
Print only the sepal length and sepal width for the first 10 rows

In [9]:
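# Restore the default integer index so rows can be sliced by label 0, 1, 2, ...
iris = iris.reset_index()
# .loc slices by label and includes the end label, so :9 returns 10 rows
iris.loc[:9, ['Sepal Length (in cm)', 'Sepal Width in (cm)']]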

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm)
0,5.1,3.5
1,4.9,3.0
2,4.7,3.2
3,4.6,3.1
4,5.0,3.6
5,5.4,3.9
6,4.6,3.4
7,5.0,3.4
8,4.4,2.9
9,4.9,3.1
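
The slice `:9` returns 10 rows because `.loc` includes its end label, whereas `.iloc` follows the usual Python convention of excluding the end position. A small sketch with a hypothetical frame `df` (the column names `a` and `b` and the values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with a default 0..19 integer index.
df = pd.DataFrame({"a": range(20), "b": range(20, 40)})

by_label = df.loc[:9, ["a"]]      # label-based: end label 9 is included
by_position = df.iloc[:10, [0]]   # position-based: end position 10 is excluded

print(len(by_label), len(by_position))
print(by_label.equals(by_position))
```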


### Using logical statements for indexing
Print all the columns of the rows whose class is "Iris-setosa" (only the first 10 of the 50 matching rows are shown below)

In [10]:
iris[iris['Class']=='Iris-setosa']

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa
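
The comparison inside the brackets produces a boolean Series, and indexing with it keeps the rows where it is `True`. A minimal sketch on a four-row stand-in frame `df` (an assumption for illustration; values copied from rows shown in this notebook), including how two conditions might be combined:

```python
import pandas as pd

# Toy stand-in for the iris frame; values copied from rows shown in this notebook.
df = pd.DataFrame({
    "Petal width (in cm)": [0.2, 0.2, 2.3, 1.9],
    "Class": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

# The comparison yields one True/False value per row.
mask = df["Class"] == "Iris-setosa"
setosa = df[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
wide_virginica = df[(df["Class"] == "Iris-virginica") & (df["Petal width (in cm)"] > 2.0)]

print(mask.tolist())
print(setosa)
print(wide_virginica)
```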


### Multiply sepal length by sepal width and store the result under the column name "SepalExtra" in the same `iris` dataframe

In [11]:
iris['SepalExtra'] = iris['Sepal Length (in cm)'] * iris['Sepal Width in (cm)']

In [12]:
iris.head()

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class,SepalExtra
0,5.1,3.5,1.4,0.2,Iris-setosa,17.85
1,4.9,3.0,1.4,0.2,Iris-setosa,14.7
2,4.7,3.2,1.3,0.2,Iris-setosa,15.04
3,4.6,3.1,1.5,0.2,Iris-setosa,14.26
4,5.0,3.6,1.4,0.2,Iris-setosa,18.0
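
As an alternative to assigning the column in place, `DataFrame.assign` returns a new frame with the derived column added. This is only a sketch on a two-row stand-in frame `df`; the name `with_extra` is hypothetical.

```python
import pandas as pd

# Toy stand-in for the iris frame; values copied from the first rows above.
df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9],
    "Sepal Width in (cm)": [3.5, 3.0],
})

# assign() returns a new frame with the extra column; df is left untouched.
with_extra = df.assign(
    SepalExtra=df["Sepal Length (in cm)"] * df["Sepal Width in (cm)"]
)

print(with_extra)
print("SepalExtra" in df.columns)
```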


### Find the mean and variance of each column except the Class column

In [13]:
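# numeric_only=True skips the string-valued Class column
# (pandas 2.0+ raises a TypeError without it)
iris.mean(numeric_only=True)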

Sepal Length (in cm)     5.843333
Sepal Width in (cm)      3.054000
Petal length (in cm)     3.758667
Petal width (in cm)      1.198667
SepalExtra              17.806533
dtype: float64

In [14]:
# Sample variance (ddof=1 by default), again skipping the Class column
iris.var(numeric_only=True)

Sepal Length (in cm)     0.685694
Sepal Width in (cm)      0.188004
Petal length (in cm)     3.113179
Petal width (in cm)      0.582414
SepalExtra              11.348090
dtype: float64
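
Two details of these aggregations are worth a sketch: restricting them to numeric columns, and the fact that `var` computes the sample variance (`ddof=1`) unless told otherwise. The three-row frame `df` below is a stand-in for illustration only.

```python
import pandas as pd

# Toy stand-in for the iris frame; values copied from the first rows above.
df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9, 4.7],
    "Class": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
})

# numeric_only=True leaves the string column out of the aggregation.
means = df.mean(numeric_only=True)

# var() divides by n - 1 by default; ddof=0 divides by n instead.
sample_var = df.var(numeric_only=True)
population_var = df.var(numeric_only=True, ddof=0)

print(means)
print(sample_var)
print(population_var)
```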

### Write a function that accepts two numbers as input and prints them. Pass the sepal length and sepal width of the 5th row and print the output

In [15]:
def sepal(s_length,s_width):
    print(f"Sepal Length: {s_length}\nSepal Width: {s_width}")

# The 5th row has index label 4 under the default 0-based index
sepal(iris.loc[4, 'Sepal Length (in cm)'], iris.loc[4, 'Sepal Width in (cm)'])

Sepal Length: 5.0
Sepal Width: 3.6


### Find the range of all the columns in the dataset

*Range = Max value - Min value (in the column)*

In [16]:
# max() and min() also cover the string Class column, so drop it before subtracting
iris_range = iris.max().drop(labels='Class') - iris.min().drop(labels='Class')
iris_range

Sepal Length (in cm)      3.6
Sepal Width in (cm)       2.4
Petal length (in cm)      5.9
Petal width (in cm)       2.4
SepalExtra              20.02
dtype: object
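
The result above has dtype `object` because `max()` and `min()` were computed with the string Class column still present. One possible alternative, sketched here on a three-row stand-in frame `df`, is to select the numeric columns first so the result stays `float64`. The names `numeric` and `value_range` are hypothetical.

```python
import pandas as pd

# Toy stand-in for the iris frame; values copied from the first rows above.
df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9, 4.7],
    "Petal length (in cm)": [1.4, 1.4, 1.3],
    "Class": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
})

# Keep only numeric columns, then compute max - min per column.
numeric = df.select_dtypes(include="number")
value_range = numeric.max() - numeric.min()

print(value_range)
print(value_range.dtype)
```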

### Sort the entire dataset according to the column Petal width

In [17]:
# Returns a sorted copy; iris itself keeps its original row order
iris.sort_values('Petal width (in cm)')

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class,SepalExtra
32,5.2,4.1,1.5,0.1,Iris-setosa,21.32
13,4.3,3.0,1.1,0.1,Iris-setosa,12.90
37,4.9,3.1,1.5,0.1,Iris-setosa,15.19
9,4.9,3.1,1.5,0.1,Iris-setosa,15.19
12,4.8,3.0,1.4,0.1,Iris-setosa,14.40
34,4.9,3.1,1.5,0.1,Iris-setosa,15.19
0,5.1,3.5,1.4,0.2,Iris-setosa,17.85
27,5.2,3.5,1.5,0.2,Iris-setosa,18.20
28,5.2,3.4,1.4,0.2,Iris-setosa,17.68
29,4.7,3.2,1.6,0.2,Iris-setosa,15.04
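
`sort_values` returns a new frame, and its default sorting algorithm does not promise to keep tied rows in their original order, which may explain the order of the tied rows above. A minimal sketch, assuming a four-row stand-in frame `df`, of requesting a stable sort so ties keep their original order:

```python
import pandas as pd

# Hypothetical frame with tied values in the sort column.
df = pd.DataFrame({"Petal width (in cm)": [0.2, 0.1, 0.2, 0.1]})

# kind="mergesort" is a stable algorithm: tied rows keep their original order.
ordered = df.sort_values("Petal width (in cm)", kind="mergesort")

# ascending=False reverses the direction of the sort.
descending = df.sort_values("Petal width (in cm)", ascending=False, kind="mergesort")

print(ordered.index.tolist())
print(descending.index.tolist())
print(df.index.tolist())  # the original frame is not reordered
```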


### Remove the new column "SepalExtra" from the dataframe

In [18]:
iris.drop('SepalExtra', axis=1, inplace=True)

In [19]:
iris.head()

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
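
Because the drop above is done in place, running the same cell a second time raises a `KeyError`, since the column no longer exists. A sketch of a variant that returns a new frame and tolerates a missing column, using a two-row stand-in frame `df` (the names `trimmed` and `again` are hypothetical):

```python
import pandas as pd

# Toy stand-in for the iris frame with the derived column present.
df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9],
    "SepalExtra": [17.85, 14.7],
})

# drop() without inplace=True returns a new frame; df keeps the column.
trimmed = df.drop(columns="SepalExtra")

# errors="ignore" makes the call a no-op when the column is already gone.
again = trimmed.drop(columns="SepalExtra", errors="ignore")

print(list(df.columns))
print(list(trimmed.columns))
print(list(again.columns))
```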


### Print only the rows whose class is "Iris-setosa" (only the first 10 of the 50 matching rows are shown below)

In [20]:
iris[iris['Class']=='Iris-setosa']

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa


### Take only the top 10 rows of the dataset with only the first 3 columns and store them in a dataframe named `iris_subset`

In [21]:
iris_subset = iris.iloc[:10,:3]

In [22]:
iris_subset

Unnamed: 0,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm)
0,5.1,3.5,1.4
1,4.9,3.0,1.4
2,4.7,3.2,1.3
3,4.6,3.1,1.5
4,5.0,3.6,1.4
5,5.4,3.9,1.7
6,4.6,3.4,1.4
7,5.0,3.4,1.5
8,4.4,2.9,1.4
9,4.9,3.1,1.5
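
If the subset is going to be modified later, taking an explicit copy is a common precaution so that changes cannot affect, or be confused with, the original frame. A minimal sketch on a three-row stand-in frame `df` (the name `subset` is hypothetical):

```python
import pandas as pd

# Toy stand-in for the iris frame; values copied from the first rows above.
df = pd.DataFrame({
    "Sepal Length (in cm)": [5.1, 4.9, 4.7],
    "Sepal Width in (cm)": [3.5, 3.0, 3.2],
    "Petal length (in cm)": [1.4, 1.4, 1.3],
    "Petal width (in cm)": [0.2, 0.2, 0.2],
})

# First 2 rows and first 3 columns by position, as an independent copy.
subset = df.iloc[:2, :3].copy()

# Editing the copy leaves the original frame unchanged.
subset.iloc[0, 0] = 0.0

print(subset.shape)
print(df.iloc[0, 0])
```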
