# Iris

### Introduction:

This exercise may seem a little strange at first, but work through every step.

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data).

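In [2]:
# This solution loads the copy of the iris data from the seaborn-data repository, which already has a header row.
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
iris = pd.read_csv(url)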

### Step 3. Assign it to a variable called iris

In [3]:
print(iris.head())

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


### Step 4. Create columns for the dataset

In [4]:
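# Assigning to .columns relabels every column by position; here only 'species' changes (to 'class').
iris.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
print("Updated column names:", iris.columns)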

Updated column names: Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'], dtype='object')
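
The UCI file linked in Step 2 has no header row, which is why the exercise asks for column names. As a minimal sketch, assuming a headerless CSV, the names can also be supplied at load time. The two sample rows and the `raw` and `sample` names are illustrative stand-ins, not the real file:

```python
import io
import pandas as pd

# Two illustrative rows in a headerless layout (stand-in for the real file).
raw = io.StringIO("5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n")

names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# header=None says the file has no header row; names= supplies the labels.
sample = pd.read_csv(raw, header=None, names=names)
print(sample.columns.tolist())
```

Without `header=None` or `names=`, pandas would use the first data row as the column labels.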


### Step 5. Are there any missing values in the DataFrame?

In [5]:
print("Missing values before modification:\n", iris.isnull().sum())


Missing values before modification:
 sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
class           0
dtype: int64
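
The per-column count is one of several ways to ask this question. The sketch below shows a few equivalent checks on a small hypothetical frame (`toy` is not part of the exercise data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two missing entries in column 'b'.
toy = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [np.nan, 5.0, np.nan]})

per_column = toy.isna().sum()                # missing count per column
any_missing = toy.isna().any().any()         # True if at least one value is missing
total_missing = int(toy.isna().sum().sum())  # missing count over the whole frame

print(per_column)
print(any_missing, total_missing)
```

`isna()` and `isnull()` are aliases, so either spelling gives the same result.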


### Step 6. Let's set the values of rows 10 to 29 of the column 'petal_length' to NaN

In [6]:
iris.loc[10:29, 'petal_length'] = np.nan
print("Missing values after setting NaN:\n", iris.isnull().sum())

Missing values after setting NaN:
 sepal_length     0
sepal_width      0
petal_length    20
petal_width      0
class            0
dtype: int64
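
The count is 20 rather than 19 because `.loc` slices by label and includes both endpoints. A small sketch of the difference from positional slicing, using a hypothetical frame `toy`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the default integer index 0..39.
toy = pd.DataFrame({'x': np.arange(40, dtype=float)})

by_label = toy.loc[10:29, 'x']     # label slice: includes both 10 and 29
by_position = toy.iloc[10:29, 0]   # positional slice: stops before position 29

print(len(by_label), len(by_position))
```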


### Step 7. Good, now let's replace the NaN values with 1.0

In [7]:
# Assign the result back: pandas warns that fillna(..., inplace=True) on a
# column selection acts on a copy and will stop working in pandas 3.0.
iris['petal_length'] = iris['petal_length'].fillna(1.0)
print("Missing values after filling NaN:\n", iris.isnull().sum())

Missing values after filling NaN:
 sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
class           0
dtype: int64
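
Two ways to fill a single column without calling an inplace method on a column selection are sketched below, on a small hypothetical frame (`toy`, `a`, and `b` are illustrative names):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'petal_length': [1.4, np.nan, 1.3],
    'petal_width': [0.2, np.nan, 0.2],
})

# Option 1: assign the filled column back to the frame.
a = toy.copy()
a['petal_length'] = a['petal_length'].fillna(1.0)

# Option 2: pass a {column: value} mapping to DataFrame.fillna;
# columns not named in the mapping are left untouched.
b = toy.fillna({'petal_length': 1.0})

print(a)
print(b)
```

Both options leave the NaN in `petal_width` alone, since only `petal_length` is targeted.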


### Step 8. Now let's delete the column class

In [8]:
iris.drop(columns=['class'], inplace=True)
print("Dataset after dropping 'class' column:\n", iris.head())

Dataset after dropping 'class' column:
    sepal_length  sepal_width  petal_length  petal_width
0           5.1          3.5           1.4          0.2
1           4.9          3.0           1.4          0.2
2           4.7          3.2           1.3          0.2
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2


### Step 9. Set the first 3 rows to NaN

In [9]:
iris.iloc[:3] = np.nan
print("Dataset after setting first 3 rows to NaN:\n", iris.head())

Dataset after setting first 3 rows to NaN:
    sepal_length  sepal_width  petal_length  petal_width
0           NaN          NaN           NaN          NaN
1           NaN          NaN           NaN          NaN
2           NaN          NaN           NaN          NaN
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2


### Step 10. Delete the rows that have NaN

In [10]:
iris.dropna(inplace=True)
print("Dataset after dropping NaN rows:\n", iris.head())

Dataset after dropping NaN rows:
    sepal_length  sepal_width  petal_length  petal_width
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2
5           5.4          3.9           1.7          0.4
6           4.6          3.4           1.4          0.3
7           5.0          3.4           1.5          0.2
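
By default `dropna` removes a row if any of its values is missing. The sketch below contrasts that default with the `how` and `subset` parameters on a hypothetical frame `toy`:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'a': [np.nan, 1.0, 2.0],
    'b': [np.nan, np.nan, 3.0],
})

any_nan = toy.dropna()               # default how='any': drops rows 0 and 1
all_nan = toy.dropna(how='all')      # drops only row 0, which is entirely NaN
subset_a = toy.dropna(subset=['a'])  # checks column 'a' only: drops row 0

print(any_nan.index.tolist(), all_nan.index.tolist(), subset_a.index.tolist())
```

In the exercise the first three rows are entirely NaN, so `how='any'` and `how='all'` would remove the same rows there.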


### Step 11. Reset the index so it begins with 0 again

In [11]:
iris.reset_index(drop=True, inplace=True)
print("Dataset after resetting index:\n", iris.head())

Dataset after resetting index:
    sepal_length  sepal_width  petal_length  petal_width
0           4.6          3.1           1.5          0.2
1           5.0          3.6           1.4          0.2
2           5.4          3.9           1.7          0.4
3           4.6          3.4           1.4          0.3
4           5.0          3.4           1.5          0.2
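
The `drop=True` argument matters here: without it, the old index is kept as a new column. A brief sketch on a hypothetical frame `toy`:

```python
import pandas as pd

# Hypothetical frame whose index starts at 3, as after dropping rows.
toy = pd.DataFrame({'x': [10, 20, 30]}, index=[3, 4, 5])

dropped = toy.reset_index(drop=True)  # old labels are discarded
kept = toy.reset_index()              # old labels move into a new 'index' column

print(dropped)
print(kept)
```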


### BONUS: Create your own question and answer it.