# Iris

### Introduction:

This exercise may seem a little strange at first, but work through it to the end.

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

pd.set_option("display.float_format", "{:,.2f}".format)
pd.set_option("display.max_columns", None)

### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data). 

In [2]:
path = "../../data/iris.data"
df = pd.read_csv(path, header=None)
df.head()

Unnamed: 0,0,1,2,3,4
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


### Step 3. Assign it to a variable called iris

In [3]:
iris = df.copy()
iris

Unnamed: 0,0,1,2,3,4
0,5.10,3.50,1.40,0.20,Iris-setosa
1,4.90,3.00,1.40,0.20,Iris-setosa
2,4.70,3.20,1.30,0.20,Iris-setosa
3,4.60,3.10,1.50,0.20,Iris-setosa
4,5.00,3.60,1.40,0.20,Iris-setosa
...,...,...,...,...,...
145,6.70,3.00,5.20,2.30,Iris-virginica
146,6.30,2.50,5.00,1.90,Iris-virginica
147,6.50,3.00,5.20,2.00,Iris-virginica
148,6.20,3.40,5.40,2.30,Iris-virginica


### Step 4. Create columns for the dataset

In [4]:
cols = [ "sepal_length (in cm)"
       , "sepal_width (in cm)"
       , "petal_length (in cm)"
       , "petal_width (in cm)"
       , "class"]
iris.columns = cols
iris.head()

Unnamed: 0,sepal_length (in cm),sepal_width (in cm),petal_length (in cm),petal_width (in cm),class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
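
As an alternative sketch, the column names can be supplied when the file is read, through the `names` parameter of `pd.read_csv`. The in-memory sample below stands in for `iris.data` so that the snippet runs without the file; that substitution, and the names `sample` and `iris_named`, are assumptions made for illustration.

```python
import io

import pandas as pd

# Three rows copied from the head of the dataset, standing in for the real file.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

cols = [
    "sepal_length (in cm)",
    "sepal_width (in cm)",
    "petal_length (in cm)",
    "petal_width (in cm)",
    "class",
]

# header=None says the data has no header row; names= labels the columns in the same call.
iris_named = pd.read_csv(sample, header=None, names=cols)
print(iris_named.columns.tolist())
```

This does the work of Steps 2 and 4 in one call; assigning to `iris.columns` afterwards, as above, gives the same labels.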


### Step 5. Are there any missing values in the DataFrame?

In [5]:
iris.isna().sum()

sepal_length (in cm)    0
sepal_width (in cm)     0
petal_length (in cm)    0
petal_width (in cm)     0
class                   0
dtype: int64
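
When a single yes/no answer is enough, the per-column counts can be collapsed further. The following is a minimal sketch on a small made-up frame (`demo` is hypothetical, not the iris data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with exactly one missing value.
demo = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})

per_column = demo.isna().sum()                 # missing values counted per column
has_missing = bool(demo.isna().any().any())    # True if any value anywhere is missing
total_missing = int(demo.isna().sum().sum())   # total count over the whole frame

print(per_column)
print(has_missing, total_missing)
```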

### Step 6. Let's set the values of rows 10 to 29 of the column 'petal_length (in cm)' to NaN

In [6]:
iris.loc[10:29, "petal_length (in cm)"] = np.nan

In [7]:
iris.isna().sum()

sepal_length (in cm)     0
sepal_width (in cm)      0
petal_length (in cm)    20
petal_width (in cm)      0
class                    0
dtype: int64
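
The count of 20 follows from `.loc` slicing by label with both endpoints included, whereas `.iloc` slices by position and excludes the end. A small sketch on a hypothetical 40-row frame (`demo`) shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a default 0..39 integer index.
demo = pd.DataFrame({"x": np.arange(40, dtype=float)})

by_label = demo.copy()
by_label.loc[10:29, "x"] = np.nan       # labels 10 through 29, both ends included

by_position = demo.copy()
by_position.iloc[10:29, 0] = np.nan     # positions 10 through 28, end excluded

n_label = int(by_label["x"].isna().sum())
n_position = int(by_position["x"].isna().sum())
print(n_label, n_position)  # 20 19
```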

### Step 7. Good, now let's replace the NaN values with 1.0

In [8]:
iris.fillna(1
            , inplace=True)
iris.isna().sum()

sepal_length (in cm)    0
sepal_width (in cm)     0
petal_length (in cm)    0
petal_width (in cm)     0
class                   0
dtype: int64
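
Calling `fillna` with a scalar fills every column. If only one column should be touched, a dictionary keyed by column name can be passed instead. This is a sketch on a hypothetical two-column frame, and it returns a new frame rather than using `inplace=True`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a gap in each column.
demo = pd.DataFrame(
    {
        "petal_length (in cm)": [1.4, np.nan, 1.3],
        "petal_width (in cm)": [0.2, np.nan, 0.2],
    }
)

# Only the column named in the dict is filled; the other keeps its NaN.
filled = demo.fillna({"petal_length (in cm)": 1.0})
print(filled.isna().sum())
```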

### Step 8. Now let's delete the column class

In [9]:
iris.drop(columns=["class"]
          , inplace=True)
iris.head()

Unnamed: 0,sepal_length (in cm),sepal_width (in cm),petal_length (in cm),petal_width (in cm)
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2
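
Because the cell above modifies `iris` in place, running it a second time raises a `KeyError`: the column is already gone. A hedged sketch of a re-runnable variant, on a hypothetical frame `demo`, uses the `errors="ignore"` option of `drop` and keeps the original untouched:

```python
import pandas as pd

# Hypothetical two-row frame with a "class" column.
demo = pd.DataFrame(
    {
        "sepal_length (in cm)": [5.1, 4.9],
        "class": ["Iris-setosa", "Iris-setosa"],
    }
)

without_class = demo.drop(columns=["class"])  # returns a new frame; demo is unchanged

# Dropping again would normally fail; errors="ignore" skips labels that are absent.
again = without_class.drop(columns=["class"], errors="ignore")
print(again.columns.tolist())
```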


### Step 9. Set the first 3 rows to NaN

In [10]:
iris.iloc[:3, :] = np.nan
iris.head()

Unnamed: 0,sepal_length (in cm),sepal_width (in cm),petal_length (in cm),petal_width (in cm)
0,NaN,NaN,NaN,NaN
1,NaN,NaN,NaN,NaN
2,NaN,NaN,NaN,NaN
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2


### Step 10. Delete the rows that contain NaN

In [11]:
iris.dropna(axis=0
            , how="any"
            , inplace=True)
iris.head()

Unnamed: 0,sepal_length (in cm),sepal_width (in cm),petal_length (in cm),petal_width (in cm)
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2
5,5.4,3.9,1.7,0.4
6,4.6,3.4,1.4,0.3
7,5.0,3.4,1.5,0.2
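
Here `how="any"` and `how="all"` give the same result, because the three affected rows are entirely NaN. On partially missing rows they differ, as this sketch on a hypothetical frame `demo` illustrates:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is fully missing, row 1 partially, row 2 complete.
demo = pd.DataFrame({"a": [np.nan, 1.0, 2.0], "b": [np.nan, np.nan, 3.0]})

any_dropped = demo.dropna(how="any")        # drops rows with at least one NaN
all_dropped = demo.dropna(how="all")        # drops only rows where every value is NaN
subset_dropped = demo.dropna(subset=["a"])  # looks for NaN in column "a" only

print(any_dropped.index.tolist())     # [2]
print(all_dropped.index.tolist())     # [1, 2]
print(subset_dropped.index.tolist())  # [1, 2]
```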


### Step 11. Reset the index so it begins with 0 again

In [12]:
iris.reset_index(inplace=True, drop=True)
iris.head()

Unnamed: 0,sepal_length (in cm),sepal_width (in cm),petal_length (in cm),petal_width (in cm)
0,4.6,3.1,1.5,0.2
1,5.0,3.6,1.4,0.2
2,5.4,3.9,1.7,0.4
3,4.6,3.4,1.4,0.3
4,5.0,3.4,1.5,0.2
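
The `drop=True` argument matters: without it, `reset_index` keeps the old labels as a new column named `index`. A minimal sketch on a hypothetical frame `demo` whose index starts at 3:

```python
import pandas as pd

# Hypothetical frame whose index starts at 3, as after dropping the first rows.
demo = pd.DataFrame({"x": [4.6, 5.0, 5.4]}, index=[3, 4, 5])

kept = demo.reset_index()              # old labels are moved into a column called "index"
dropped = demo.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())     # ['index', 'x']
print(dropped.index.tolist())    # [0, 1, 2]
```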


### BONUS: Create your own question and answer it.