# Multivariate Outlier Detection - Interactive Exercises

In this notebook, you will practice detecting outliers in the Iris dataset using **multivariate methods**.
For each step, try to complete the code, and use the collapsible hints if you get stuck. A **collapsed solution** is provided at the end for self-checking.

## Step 1: Import Libraries
Import the necessary Python libraries: `pandas`, `matplotlib.pyplot`, `pylab.rcParams`, and `seaborn`.

In [None]:
# YOUR CODE HERE


<details>
<summary>Hint</summary>
Use `import pandas as pd`, `import matplotlib.pyplot as plt`, `from pylab import rcParams`, and `import seaborn as sns`.
</details>

## Step 2: Set Plotting Parameters
Set the figure size and seaborn style for better visualization.
- Figure size: `(5,4)`
- Style: `'whitegrid'`
- Enable `%matplotlib inline` to show plots inline.

In [None]:
# YOUR CODE HERE


<details>
<summary>Hint</summary>
Use `rcParams['figure.figsize'] = 5,4` and `sns.set_style('whitegrid')`.
Remember `%matplotlib inline` for Jupyter notebooks.
</details>

## Step 3: Load the Iris Dataset
Load the CSV file into a pandas DataFrame, assign column names, and inspect the first 5 rows.
- Columns: `['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species']`

In [None]:
# YOUR CODE HERE


<details>
<summary>Hint</summary>
Use `pd.read_csv()` with `header=None`.
Then assign `df.columns = [...]`.
Use `df.head()` to inspect the first 5 rows.
</details>
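
To see what `header=None` changes before running it on the real file, here is a minimal sketch that reads from an in-memory string instead of the CSV. The three rows and the name `demo` are made up for illustration; they are not taken from the dataset file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: three made-up rows, no header line.
raw = io.StringIO(
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
    "6.3,3.3,6.0,2.5,virginica\n"
)

# header=None tells pandas that the first line is data, not column names,
# so the columns receive default integer labels.
demo = pd.read_csv(raw, header=None)
print(list(demo.columns))

# Replace the integer labels with descriptive names.
demo.columns = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species']
print(demo.head())
```

Without `header=None`, pandas would treat the first data row as the header and that row would be lost from the data.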

## Step 4: Create a Boxplot
Plot `Sepal Length` vs `Species` using a boxplot to identify potential outliers.
- Use `hue='Species'` and `palette='hls'`

In [None]:
# YOUR CODE HERE


<details>
<summary>Hint</summary>
Use `sns.boxplot(x='Species', y='Sepal Length', data=df, hue='Species', palette='hls', legend=False)`
</details>
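
By default, the boxplot draws any point more than 1.5 × IQR beyond the quartiles of its own box as an individual marker. The sketch below reproduces that rule numerically within each group, so you can list the flagged rows instead of reading them off the plot. The `toy` frame and its group labels `a` and `b` are invented for illustration; in the notebook you would apply the same steps to `df`.

```python
import pandas as pd

# Made-up measurements for two hypothetical groups; this is not the Iris data.
toy = pd.DataFrame({
    'Species': ['a'] * 6 + ['b'] * 6,
    'Sepal Length': [5.0, 5.1, 4.9, 5.2, 5.0, 7.5,
                     6.0, 6.1, 5.9, 6.2, 6.0, 6.1],
})

# Compute the quartiles separately for each group and broadcast them to the rows.
grouped = toy.groupby('Species')['Sepal Length']
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# Flag values beyond 1.5 * IQR from the quartiles of their own group.
flags = (toy['Sepal Length'] < q1 - 1.5 * iqr) | (toy['Sepal Length'] > q3 + 1.5 * iqr)
print(toy.loc[flags])
```

Computing the quartiles per group matters: a value that is unremarkable across the whole dataset can still be unusual for its own species.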

## Step 5: Scatterplot Matrix
Generate a scatterplot matrix to detect multivariate outliers.
- Use `sns.pairplot(df, hue='Species', palette='hls')`

In [None]:
# YOUR CODE HERE


<details>
<summary>Hint</summary>
Call `sns.pairplot(df, hue='Species', palette='hls')`
Check the scatterplots for points that don't follow cluster patterns.
</details>

## Step 6: Tukey's Outlier Detection
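Calculate descriptive statistics for the four numeric features and use them to identify outliers with Tukey's IQR method: a value is flagged when it lies below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 - Q1.
- Extract the feature values first: `x = df.iloc[:, 0:4].values`
- Use `X_df = pd.DataFrame(x, columns=[...])`
- Use `X_df.describe()`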

In [None]:
# YOUR CODE HERE


<details>
<summary>Hint</summary>
Set pandas display float format for readability: `pd.options.display.float_format = '{:.1f}'.format`
Create the DataFrame: `X_df = pd.DataFrame(x, columns=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'])`
Then call `X_df.describe()` to view the min, max, and quartile values; Q1 and Q3 appear as the `25%` and `75%` rows.
</details>
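
The `describe()` table gives the quartiles, but the Tukey fences still have to be computed from them. The following is a minimal sketch of that last step, assuming the usual 1.5 × IQR multiplier. The `toy_X` frame holds made-up numbers for illustration only, and the names `fences` and `outside` are hypothetical; in the notebook `X_df` would take the place of `toy_X`.

```python
import pandas as pd

# Made-up numbers for illustration only; in the notebook this would be X_df.
toy_X = pd.DataFrame({
    'Sepal Width': [3.0, 3.1, 2.9, 3.2, 3.0, 2.8, 3.1, 4.6],
    'Petal Width': [1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.2, 1.3],
})

# Q1 and Q3 are the '25%' and '75%' rows of the describe() table.
stats = toy_X.describe()
q1, q3 = stats.loc['25%'], stats.loc['75%']
iqr = q3 - q1

# One row per feature: the fences next to the observed min and max.
fences = pd.DataFrame({'lower': q1 - 1.5 * iqr, 'upper': q3 + 1.5 * iqr})
fences['min'] = stats.loc['min']
fences['max'] = stats.loc['max']
fences['has_outliers'] = (fences['min'] < fences['lower']) | (fences['max'] > fences['upper'])
print(fences)

# Rows with at least one value outside the fences of its column.
outside = toy_X.lt(fences['lower'], axis='columns') | toy_X.gt(fences['upper'], axis='columns')
print(toy_X[outside.any(axis=1)])
```

Comparing `min` and `max` against the fences shows only whether a feature has any outliers; the row-level mask in the last step shows which observations they are.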

## Step 7: Collapsed Solution
Click below to reveal full solutions for all exercises.

In [None]:
# Solution

# Step 1: import libraries
import pandas as pd
import matplotlib.pyplot as plt
from pylab import rcParams
import seaborn as sns

# Step 2: plotting parameters
%matplotlib inline
rcParams['figure.figsize'] = 5,4
sns.set_style('whitegrid')

# Step 3: load the data (the file has no header row)
address = '/workspaces/python-for-data-science-and-machine-learning-essential-training-part-1-3006708/data/iris.data.csv'
df = pd.read_csv(address, header=None, sep=',')
df.columns = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species']
x = df.iloc[:,0:4].values  # the four numeric features
y = df.iloc[:,4].values    # the species labels
print(df.head())           # printed explicitly: only a cell's last expression is displayed automatically

# Step 4: boxplot of Sepal Length by Species
sns.boxplot(x='Species', y='Sepal Length', data=df, hue='Species', palette='hls', legend=False)
plt.show()

# Step 5: scatterplot matrix
sns.pairplot(df, hue='Species', palette='hls')
plt.show()

# Step 6: descriptive statistics for Tukey's method
pd.options.display.float_format = '{:.1f}'.format
X_df = pd.DataFrame(x, columns=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'])
X_df.describe()