# Iris

### Introduction:

This exercise may seem a little strange at first, but stick with it.

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data). 

### Step 3. Assign it to a variable called iris

In [6]:
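url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# index_col expects labels that already exist in the file, so passing the new
# column names there raised "ValueError: Index sepal_length invalid".
# Read the file as-is; the column names are assigned in Step 4.
# Note: the file has no header row, so this call consumes the first record as
# column labels. Pass header=None to keep that record as data.
iris = pd.read_csv(url)

iris.head()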
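Because `iris.data` has no header row, how `read_csv` treats the first line matters. The following is a minimal sketch of the difference, using a small inline sample instead of the real download. The sample rows and the names `sample`, `cols`, `with_default`, and `with_names` are illustrative assumptions, not part of the exercise.

```python
import io
import pandas as pd

# Hypothetical three-record sample in the same headerless layout as iris.data.
sample = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
cols = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

# Default behaviour: the first record is consumed as the header row.
with_default = pd.read_csv(io.StringIO(sample))

# header=None keeps every record as data; names= supplies the column labels.
with_names = pd.read_csv(io.StringIO(sample), header=None, names=cols)

print(len(with_default), len(with_names))
```

With `header=None` and `names=`, Steps 3 and 4 could be done in a single call, at the cost of hiding the column-assignment step the exercise asks for.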

### Step 4. Create columns for the dataset

In [7]:
# 1. sepal_length (in cm)
# 2. sepal_width (in cm)
# 3. petal_length (in cm)
# 4. petal_width (in cm)
# 5. class


iris.columns = ['sepal_length','sepal_width', 'petal_length', 'petal_width', 'class']
iris.head()

| | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 3 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |


### Step 5. Are there any missing values in the DataFrame?

In [11]:
iris["class"] = iris["class"].astype("category")
iris.isna().sum()
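`isna().sum()` gives one count per column. A minimal sketch of a few related checks on a small made-up frame follows; the frame `df` and the names `per_column`, `any_missing`, and `total` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# Missing values per column (what iris.isna().sum() reports).
per_column = df.isna().sum()

# A single yes/no answer for the whole frame.
any_missing = df.isna().any().any()

# Total number of missing cells.
total = int(df.isna().sum().sum())

print(per_column)
print(any_missing, total)
```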

### Step 6. Let's set the values of rows 10 to 29 of the column 'petal_length' to NaN

In [21]:
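# .loc slices by label and includes the end label, so 10:29 covers rows 10 to 29
iris.loc[10:29, "petal_length"] = np.nan
# .iloc slices by position and excludes the end, so 10:30 shows the same rows
iris.iloc[10:30]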

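| | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 10 | 4.8 | 3.4 | NaN | 0.2 | Iris-setosa |
| 11 | 4.8 | 3.0 | NaN | 0.1 | Iris-setosa |
| 12 | 4.3 | 3.0 | NaN | 0.1 | Iris-setosa |
| 13 | 5.8 | 4.0 | NaN | 0.2 | Iris-setosa |
| 14 | 5.7 | 4.4 | NaN | 0.4 | Iris-setosa |
| 15 | 5.4 | 3.9 | NaN | 0.4 | Iris-setosa |
| 16 | 5.1 | 3.5 | NaN | 0.3 | Iris-setosa |
| 17 | 5.7 | 3.8 | NaN | 0.3 | Iris-setosa |
| 18 | 5.1 | 3.8 | NaN | 0.3 | Iris-setosa |
| 19 | 5.4 | 3.4 | NaN | 0.2 | Iris-setosa |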
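Label-based `.loc` slices include the end label, while position-based `.iloc` slices exclude the end position, so the two need different end values to cover rows 10 to 29. A minimal sketch on a made-up 40-row frame follows; the frame and the names `by_label`, `by_position`, and `too_many` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a default RangeIndex (labels 0..39).
df = pd.DataFrame({"value": np.arange(40, dtype=float)})

by_label = df.copy()
by_label.loc[10:29, "value"] = np.nan    # labels 10..29, end label included

by_position = df.copy()
by_position.iloc[10:30, 0] = np.nan      # positions 10..29, end position excluded

too_many = df.copy()
too_many.loc[10:30, "value"] = np.nan    # labels 10..30: one row more than intended

print(by_label["value"].isna().sum(),
      by_position["value"].isna().sum(),
      too_many["value"].isna().sum())
```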


### Step 7. Good, now let's replace the NaN values with 1.0

In [28]:
# Assign the result back instead of calling fillna(inplace=True) on a column
# selection: that form raises a FutureWarning, because the selected column
# behaves as a copy and the call will stop working in pandas 3.0.
iris["petal_length"] = iris["petal_length"].fillna(1.0)
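Calling `fillna(..., inplace=True)` on a column selected from a DataFrame is unreliable because the selection can behave as a copy. A minimal sketch of the two alternatives the pandas warning suggests, on a made-up frame, follows; the frame and the names `filled_a` and `filled_b` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values in two columns.
df = pd.DataFrame({
    "petal_length": [1.4, np.nan, 1.5],
    "petal_width": [0.2, np.nan, 0.2],
})

# Option 1: fill the column and assign the result back.
filled_a = df.copy()
filled_a["petal_length"] = filled_a["petal_length"].fillna(1.0)

# Option 2: call fillna on the frame with a {column: value} mapping.
# Only the listed column is filled; petal_width keeps its NaN.
filled_b = df.fillna({"petal_length": 1.0})

print(filled_a)
```

Both options leave `df` itself unchanged, which makes the cell safe to re-run.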


### Step 8. Now let's delete the column class

In [30]:
iris = iris.drop("class", axis = 1)
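`drop` returns a new DataFrame, which is why the result is assigned back to `iris`. As a small sketch on a made-up frame, the `columns=` keyword is an equivalent spelling of `axis=1`; the frame and the names `by_axis` and `by_keyword` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical two-column frame.
df = pd.DataFrame({
    "petal_width": [0.2, 0.4],
    "class": ["Iris-setosa", "Iris-setosa"],
})

by_axis = df.drop("class", axis=1)       # positional label plus axis
by_keyword = df.drop(columns="class")    # same result, more explicit

print(list(by_axis.columns), list(df.columns))
```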

### Step 9. Set the first 3 rows to NaN

In [31]:
iris.iloc[:3, :] = np.nan

### Step 10. Delete the rows that contain NaN

In [35]:
# dropna returns a new DataFrame, so assign the result to keep the change
iris = iris.dropna(axis=0)
iris

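| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 3 | 5.0 | 3.6 | 1.4 | 0.2 |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 |
| 5 | 4.6 | 3.4 | 1.4 | 0.3 |
| 6 | 5.0 | 3.4 | 1.5 | 0.2 |
| 7 | 4.4 | 2.9 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 144 | 6.7 | 3.0 | 5.2 | 2.3 |
| 145 | 6.3 | 2.5 | 5.0 | 1.9 |
| 146 | 6.5 | 3.0 | 5.2 | 2.0 |
| 147 | 6.2 | 3.4 | 5.4 | 2.3 |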
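`dropna` does not modify the frame it is called on; it returns a new one. A minimal sketch on a made-up frame shows this and the difference between the default `how="any"` and `how="all"`; the frame and the names `any_dropped` and `all_dropped` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is entirely NaN, row 1 partly NaN, row 2 complete.
df = pd.DataFrame({
    "a": [np.nan, 1.0, 2.0],
    "b": [np.nan, np.nan, 5.0],
})

# Default how="any": drop a row if at least one value is missing.
any_dropped = df.dropna()

# how="all": drop a row only if every value is missing.
all_dropped = df.dropna(how="all")

print(list(any_dropped.index), list(all_dropped.index), len(df))
```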


### Step 11. Reset the index so it begins with 0 again

In [37]:
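# Drop the NaN rows (a no-op if they are already gone), renumber from 0,
# and assign the result so the change is kept
iris = iris.dropna().reset_index(drop=True)
iris

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.0 | 3.6 | 1.4 | 0.2 |
| 1 | 5.4 | 3.9 | 1.7 | 0.4 |
| ... | ... | ... | ... | ... |
| 141 | 6.7 | 3.0 | 5.2 | 2.3 |
| 142 | 6.3 | 2.5 | 5.0 | 1.9 |
| 143 | 6.5 | 3.0 | 5.2 | 2.0 |
| 144 | 6.2 | 3.4 | 5.4 | 2.3 |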
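`reset_index` also returns a new DataFrame, and its `drop` argument decides what happens to the old labels. A minimal sketch on a made-up frame whose index starts at 3 follows; the frame and the names `renumbered` and `kept` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical frame whose index no longer starts at 0.
df = pd.DataFrame({"a": [10, 20, 30]}, index=[3, 4, 5])

# drop=True discards the old labels and numbers the rows 0..n-1.
renumbered = df.reset_index(drop=True)

# Without drop=True the old labels are kept as a new column named "index".
kept = df.reset_index()

print(list(renumbered.index), list(kept.columns), list(df.index))
```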


### BONUS: Create your own question and answer it.