
# Iris

### Introduction:

This exercise may look a little unusual at first, but work through every step.

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd

### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data).

### Step 3. Assign it to a variable called iris

In [11]:
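url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = pd.read_csv(url)  # the file has no header row, so the first record is misread as column names
iris.columns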

Index(['5.1', '3.5', '1.4', '0.2', 'Iris-setosa'], dtype='object')

### Step 4. Create columns for the dataset

In [None]:
# 1. sepal_length (in cm)
# 2. sepal_width (in cm)
# 3. petal_length (in cm)
# 4. petal_width (in cm)
# 5. class

In [None]:
'''
header=None: Tells pandas that the CSV file has no header row. When no names are passed, pandas treats the first row as column names by default; iris.data has no such row, so header=None keeps the first record as data. Passing names already implies this behavior, so header=None here mainly makes the intent explicit.
names=columns: Sets the column names of the DataFrame to the entries of the list columns, which is defined in the next cell.
'''

In [27]:
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
iris = pd.read_csv(url, header=None, names=columns)
iris.head()


   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
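To see what `header=None` and `names` change, the sketch below parses a headerless sample both ways. The inline `sample` string is a made-up three-row stand-in for the real file, used only so the example runs without a download.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same headerless layout as iris.data.
sample = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# Default parsing: the first record is consumed as the header, so one row is lost.
inferred = pd.read_csv(io.StringIO(sample))

# Explicit names with header=None: every record is kept as data.
named = pd.read_csv(io.StringIO(sample), header=None, names=columns)

print(len(inferred), list(inferred.columns))
print(len(named), list(named.columns))
```

With the default settings the frame has two rows and its column labels are the values of the first record; with explicit names all three rows survive.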


### Step 5. Are there any missing values in the DataFrame?

In [28]:
iris.isnull().any().any()

False
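The check above collapses the whole frame into a single boolean. As a rough illustration of the other levels of detail available, the sketch below runs the same idea on a made-up two-column frame (`df` and its values are assumptions, not part of the iris data).

```python
import numpy as np
import pandas as pd

# Toy frame with exactly one missing value (made-up numbers).
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

has_missing = df.isnull().any().any()  # one True/False for the whole frame
per_column = df.isnull().sum()         # number of missing values in each column
total = int(df.isnull().sum().sum())   # number of missing cells overall

print(has_missing)
print(per_column)
print(total)
```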

### Step 6. Let's set the values in rows 10 to 29 of the column 'petal_length' to NaN

In [29]:
import numpy as np

iris.loc[10:29, 'petal_length'] = np.nan
print(iris.isnull().sum())

sepal_length     0
sepal_width      0
petal_length    20
petal_width      0
class            0
dtype: int64
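The count of 20 follows from how `.loc` slices: with labels, both endpoints are included, unlike ordinary Python slicing. A minimal sketch on a made-up Series `s` with the default integer index:

```python
import pandas as pd

# Made-up Series with the default RangeIndex 0..39.
s = pd.Series(range(40), dtype='float64')

by_label = s.loc[10:29]      # label slice: both endpoints are included
by_position = s.iloc[10:29]  # positional slice: the stop position is excluded

print(len(by_label), len(by_position))
```

The label-based slice covers 20 rows, the positional one only 19.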


### Step 7. Good, now let's replace the NaN values with 1.0

In [30]:
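# Assign the result back instead of calling fillna(inplace=True) on the selected column, which newer pandas versions warn about
iris['petal_length'] = iris['petal_length'].fillna(1.0)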
print(iris.isnull().sum())
print(iris.loc[10:29, 'petal_length'])

sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
class           0
dtype: int64
10    1.0
11    1.0
12    1.0
13    1.0
14    1.0
15    1.0
16    1.0
17    1.0
18    1.0
19    1.0
20    1.0
21    1.0
22    1.0
23    1.0
24    1.0
25    1.0
26    1.0
27    1.0
28    1.0
29    1.0
Name: petal_length, dtype: float64
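One possible alternative, shown here only as a sketch: `DataFrame.fillna` also accepts a dict that maps column names to fill values, so a single column can be filled without selecting it first. The frame `df` below uses made-up values.

```python
import numpy as np
import pandas as pd

# Toy frame with a gap in each column (made-up numbers).
df = pd.DataFrame({'petal_length': [1.4, np.nan, 1.3],
                   'petal_width': [0.2, np.nan, 0.2]})

# Only columns named in the dict are filled; the result is a new DataFrame.
filled = df.fillna({'petal_length': 1.0})

print(filled)
```

Here `petal_width` keeps its NaN, and `df` itself is left unchanged.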


### Step 8. Now let's delete the column class

In [32]:
iris.drop('class', axis=1, inplace=True)

In [33]:
iris.head()

   sepal_length  sepal_width  petal_length  petal_width
0           5.1          3.5           1.4          0.2
1           4.9          3.0           1.4          0.2
2           4.7          3.2           1.3          0.2
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2
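Dropping a column does not have to modify the frame in place. As a sketch on a made-up two-row frame, `drop(columns=...)` does the same job as passing the label with `axis=1` and returns a new DataFrame:

```python
import pandas as pd

# Toy frame (made-up rows) with a 'class' column to remove.
df = pd.DataFrame({'sepal_length': [5.1, 4.9],
                   'class': ['Iris-setosa', 'Iris-setosa']})

# columns= is equivalent to giving the label together with axis=1.
without_class = df.drop(columns='class')

print(list(without_class.columns))
print(list(df.columns))  # the original frame still has both columns
```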


### Step 9. Set the first 3 rows to NaN

In [35]:
iris.iloc[0:3, :] = np.nan
print(iris.head())

   sepal_length  sepal_width  petal_length  petal_width
0           NaN          NaN           NaN          NaN
1           NaN          NaN           NaN          NaN
2           NaN          NaN           NaN          NaN
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2


### Step 10.  Delete the rows that have NaN

In [36]:
iris.dropna(inplace=True)
print(iris.head())

   sepal_length  sepal_width  petal_length  petal_width
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2
5           5.4          3.9           1.7          0.4
6           4.6          3.4           1.4          0.3
7           5.0          3.4           1.5          0.2
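By default `dropna` removes every row that contains at least one NaN. The sketch below, on a made-up frame `df`, shows how the `how` and `subset` parameters narrow that rule.

```python
import numpy as np
import pandas as pd

# Toy frame: row 0 is entirely NaN, row 1 is partly NaN, row 2 is complete.
df = pd.DataFrame({'a': [np.nan, 1.0, 2.0],
                   'b': [np.nan, np.nan, 3.0]})

any_nan = df.dropna()                # default how='any': rows 0 and 1 are dropped
all_nan = df.dropna(how='all')       # only rows where every value is NaN: row 0
subset_a = df.dropna(subset=['a'])   # only column 'a' is checked: row 0

print(list(any_nan.index), list(all_nan.index), list(subset_a.index))
```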


### Step 11. Reset the index so it begins with 0 again

In [39]:
iris.reset_index(drop=True, inplace=True)
iris.head()

   sepal_length  sepal_width  petal_length  petal_width
0           4.6          3.1           1.5          0.2
1           5.0          3.6           1.4          0.2
2           5.4          3.9           1.7          0.4
3           4.6          3.4           1.4          0.3
4           5.0          3.4           1.5          0.2
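The `drop=True` argument matters here: without it, `reset_index` keeps the old labels as an extra column. A small sketch on a made-up frame whose index starts at 3:

```python
import pandas as pd

# Toy frame whose index starts at 3, as it would after rows 0-2 were removed.
df = pd.DataFrame({'x': [10, 20]}, index=[3, 4])

kept = df.reset_index()              # old labels move into a column named 'index'
dropped = df.reset_index(drop=True)  # old labels are discarded

print(list(kept.columns), list(dropped.columns), list(dropped.index))
```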


### BONUS: Create your own question and answer it.

In [37]:
# What is the shape of the resulting dataset?
iris.shape

(147, 4)