# Iris

### Introduction:

This exercise may seem a little strange at first, but stick with it: each step practices a common data-cleaning operation.

### Step 1. Import the necessary libraries

In [2]:
import pandas as pd
import numpy as np


### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data).

### Step 3. Assign it to a variable called iris

In [3]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
iris = pd.read_csv(url, header=None)
iris.head()


|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


### Step 4. Create columns for the dataset

In [4]:
iris.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
iris.head()


|   | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
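
As an alternative to assigning `iris.columns` after loading, the column names can be supplied at read time through the `names` parameter of `pd.read_csv`. The sketch below is only an illustration: it reads a small in-memory sample (the hypothetical `sample` buffer, standing in for the first rows of the file) rather than the real URL, and `sample_df` and `column_names` are names introduced here.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the first rows of iris.data
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

# header=None: the first line is data, not a header row
# names=...: labels to use for the columns
sample_df = pd.read_csv(sample, header=None, names=column_names)
print(sample_df)
```

Passing the names up front avoids ever having a frame with integer column labels, which can matter if later cells refer to columns by name.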


### Step 5. Are there any missing values in the DataFrame?

In [5]:
iris.isnull().sum()


| Column | Missing values |
|---|---|
| sepal_length | 0 |
| sepal_width | 0 |
| petal_length | 0 |
| petal_width | 0 |
| class | 0 |
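
The per-column count can be reduced to a single total or a yes/no answer when only that is needed. A minimal sketch on a made-up frame (`demo` and its values are hypothetical, not taken from the iris data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value in each column
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

missing_per_column = demo.isna().sum()       # one count per column
total_missing = int(missing_per_column.sum())  # single number for the whole frame
has_missing = bool(demo.isna().any().any())    # True if any cell is missing

print(missing_per_column)
print(total_missing, has_missing)
```

`isna` and `isnull` are aliases in pandas, so either spelling gives the same result.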


### Step 6. Let's set rows 10 to 29 of the column 'petal_length' to NaN

In [6]:
iris.loc[10:29, "petal_length"] = np.nan
iris.iloc[10:30]


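|   | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 10 | 5.4 | 3.7 | NaN | 0.2 | Iris-setosa |
| 11 | 4.8 | 3.4 | NaN | 0.2 | Iris-setosa |
| 12 | 4.8 | 3.0 | NaN | 0.1 | Iris-setosa |
| 13 | 4.3 | 3.0 | NaN | 0.1 | Iris-setosa |
| 14 | 5.8 | 4.0 | NaN | 0.2 | Iris-setosa |
| 15 | 5.7 | 4.4 | NaN | 0.4 | Iris-setosa |
| 16 | 5.4 | 3.9 | NaN | 0.4 | Iris-setosa |
| 17 | 5.1 | 3.5 | NaN | 0.3 | Iris-setosa |
| 18 | 5.7 | 3.8 | NaN | 0.3 | Iris-setosa |
| 19 | 5.1 | 3.8 | NaN | 0.3 | Iris-setosa |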
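
The cell above assigns with `.loc[10:29]` but displays with `.iloc[10:30]`. Both select the same 20 rows here because label slices include the end label while positional slices exclude the end position. A small sketch on a hypothetical frame (`demo`) with the default integer index:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..39
demo = pd.DataFrame({"value": range(40)})

by_label = demo.loc[10:29]      # label-based: both endpoints are included
by_position = demo.iloc[10:30]  # position-based: the end position is excluded

print(len(by_label), len(by_position))
print(by_label.equals(by_position))
```

The two only coincide while the index labels match the row positions; after rows are dropped and before the index is reset, they can select different rows.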


### Step 7. Good, now let's replace the NaN values with 1.0

In [7]:
# Assign the result back to the column. The chained form
# iris["petal_length"].fillna(1.0, inplace=True) emits a FutureWarning in
# recent pandas 2.x releases and will not modify iris in pandas 3.0.
iris["petal_length"] = iris["petal_length"].fillna(1.0)
iris.iloc[10:30]


|   | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 10 | 5.4 | 3.7 | 1.0 | 0.2 | Iris-setosa |
| 11 | 4.8 | 3.4 | 1.0 | 0.2 | Iris-setosa |
| 12 | 4.8 | 3.0 | 1.0 | 0.1 | Iris-setosa |
| 13 | 4.3 | 3.0 | 1.0 | 0.1 | Iris-setosa |
| 14 | 5.8 | 4.0 | 1.0 | 0.2 | Iris-setosa |
| 15 | 5.7 | 4.4 | 1.0 | 0.4 | Iris-setosa |
| 16 | 5.4 | 3.9 | 1.0 | 0.4 | Iris-setosa |
| 17 | 5.1 | 3.5 | 1.0 | 0.3 | Iris-setosa |
| 18 | 5.7 | 3.8 | 1.0 | 0.3 | Iris-setosa |
| 19 | 5.1 | 3.8 | 1.0 | 0.3 | Iris-setosa |
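
One way to fill a single column while leaving the others untouched is to pass a dict to `DataFrame.fillna`, mapping column names to fill values. The following is a minimal sketch on a hypothetical two-column frame (`demo`), not the iris data itself:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a gap in each column
demo = pd.DataFrame({
    "petal_length": [1.4, np.nan, 1.3],
    "petal_width": [0.2, np.nan, 0.2],
})

# Only petal_length is filled; petal_width keeps its NaN
filled = demo.fillna({"petal_length": 1.0})

print(filled)
```

Because `fillna` returns a new frame here, `demo` itself is unchanged; assign the result back to keep it.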


### Step 8. Now let's delete the column class

In [8]:
iris.drop(columns="class", inplace=True)
iris.head()


|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
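
Re-running a cell that drops a column in place fails the second time, because the column no longer exists. A hedged sketch of one way around this, using a hypothetical toy frame (`demo`): take the non-inplace result and, where re-runs are expected, pass `errors="ignore"`.

```python
import pandas as pd

# Hypothetical toy frame with a "class" column
demo = pd.DataFrame({"a": [1, 2], "class": ["x", "y"]})

# Without inplace=True, drop returns a new frame and leaves demo untouched
without_class = demo.drop(columns="class")

# errors="ignore" makes the call a no-op when the column is already gone
dropped_again = without_class.drop(columns="class", errors="ignore")

print(list(demo.columns), list(without_class.columns), list(dropped_again.columns))
```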


### Step 9. Set the first 3 rows to NaN

In [9]:
iris.iloc[0:3] = np.nan
iris.head(5)


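|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |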


### Step 10. Delete the rows that contain NaN

In [10]:
iris = iris.dropna()
iris.head()


|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 |
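
By default `dropna` removes a row if any of its values is missing. The `how` and `subset` parameters change that rule; the sketch below compares the three variants on a hypothetical toy frame (`demo`):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is entirely missing, row 1 is partly missing
demo = pd.DataFrame({
    "sepal_length": [np.nan, 4.6, 5.0],
    "sepal_width": [np.nan, np.nan, 3.6],
})

any_missing = demo.dropna()                          # default how="any"
all_missing = demo.dropna(how="all")                 # drop only fully empty rows
subset_only = demo.dropna(subset=["sepal_length"])   # check one column only

print(any_missing.index.tolist())
print(all_missing.index.tolist())
print(subset_only.index.tolist())
```

In this exercise the first three rows are entirely NaN and no other gaps remain, so the default and `how="all"` would remove the same rows.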


### Step 11. Reset the index so it begins with 0 again

In [11]:
iris.reset_index(drop=True, inplace=True)
iris.head()


|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 4.6 | 3.1 | 1.5 | 0.2 |
| 1 | 5.0 | 3.6 | 1.4 | 0.2 |
| 2 | 5.4 | 3.9 | 1.7 | 0.4 |
| 3 | 4.6 | 3.4 | 1.4 | 0.3 |
| 4 | 5.0 | 3.4 | 1.5 | 0.2 |
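
The `drop=True` argument matters: without it, `reset_index` keeps the old labels as a new column named `index`. A short sketch on a hypothetical frame (`demo`) whose index starts at 3, as it does after the previous step:

```python
import pandas as pd

# Hypothetical frame whose index no longer starts at 0
demo = pd.DataFrame({"value": [10, 20, 30]}, index=[3, 4, 5])

dropped = demo.reset_index(drop=True)  # old labels are discarded
kept = demo.reset_index()              # old labels become an "index" column

print(dropped)
print(kept)
```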


### BONUS: Create your own question and answer it.

In [12]:
avg_sepal_length = iris["sepal_length"].mean()
print(f"Average sepal_length after cleaning: {avg_sepal_length:.2f}")


Average sepal_length after cleaning: 5.86
