# Data Preparation for Iris Classification

In this notebook, we'll prepare the Iris dataset for our machine learning model. We'll perform the following steps:
1. Load the dataset
2. Explore the data
3. Perform basic data cleaning
4. Visualize the data
5. Prepare the data for model training

## 1. Load the necessary libraries and the dataset

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from azureml.core import Workspace, Dataset

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']], 
                  columns=iris['feature_names'] + ['target'])

print(df.head())

## 2. Explore the data

In [None]:
# Display basic information about the dataset
# (df.info() prints directly and returns None, so it is not wrapped in print)
df.info()

# Display summary statistics
print(df.describe())

# Check for missing values
print(df.isnull().sum())
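
Summary statistics over the whole frame can hide differences between classes. The following is a minimal sketch of a class-balance check and per-class feature means. It reloads the data so it runs on its own, and the names `features`, `labels`, `class_counts`, and `per_class_means` are illustrative rather than part of the notebook above.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
features = pd.DataFrame(iris.data, columns=iris.feature_names)
labels = pd.Series(iris.target, name="target")

# Number of samples per class, ordered by class code
class_counts = labels.value_counts().sort_index()
print(class_counts)

# Mean of each feature within each class
per_class_means = features.groupby(labels).mean()
print(per_class_means)
```

A balanced target matters later on, because it means plain accuracy is a reasonable first metric for the classifier.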

## 3. Perform basic data cleaning

In [None]:
# Add a categorical 'species' column with readable class names
# (the numeric 'target' column is kept alongside it)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Check for duplicates
print(f"Number of duplicates: {df.duplicated().sum()}")

# Remove duplicates if any
df = df.drop_duplicates()

print(df.head())
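
Before dropping duplicates it can help to look at which rows are affected. The sketch below is one way to do that, assuming exact row matches are the only kind of duplicate of interest. It works on a fresh copy of the data, and `iris_frame`, `raw`, `duplicate_groups`, and `deduped` are hypothetical names, so it does not touch the notebook's `df`.

```python
from sklearn.datasets import load_iris

iris_frame = load_iris(as_frame=True)
raw = iris_frame.frame  # four feature columns plus 'target'

# keep=False marks every member of a duplicate group, not only the repeats
duplicate_groups = raw[raw.duplicated(keep=False)]
print(duplicate_groups)

# Drop the repeats and rebuild a gap-free index
deduped = raw.drop_duplicates().reset_index(drop=True)
print(f"{len(raw)} rows before, {len(deduped)} rows after")
```

Resetting the index is optional, but it avoids gaps in the row labels that can cause surprises when the frame is later combined with positional arrays.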

## 4. Visualize the data

In [None]:
# Pairplot to visualize relationships between features
# (drop the numeric 'target' column so only the four features are plotted)
sns.pairplot(df.drop(columns='target'), hue='species')
plt.show()

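# Correlation heatmap
# (restricted to the numeric feature columns; the categorical 'species'
# column cannot be correlated and raises an error in recent pandas versions)
plt.figure(figsize=(10, 8))
sns.heatmap(df[iris.feature_names].corr(), annot=True, cmap='coolwarm')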
plt.title('Correlation Heatmap of Iris Features')
plt.show()

## 5. Prepare the data for model training

In [None]:
# Split features and target
X = df.drop(['target', 'species'], axis=1)
y = df['species']

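# Connect to your Azure ML workspace (expects a config.json for the workspace)
ws = Workspace.from_config()

# The upload target must be a datastore, not the workspace itself
datastore = ws.get_default_datastore()

# Register the DataFrame as a tabular dataset
dataset = Dataset.Tabular.register_pandas_dataframe(
    dataframe=df,
    target=(datastore, 'iris'),
    name='iris',
    description='Iris dataset for classification'
)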

print("Dataset registered.")
print(f"Dataset name: {dataset.name}, id: {dataset.id}")
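
Separating features from the target is usually followed by holding out a test set. A minimal sketch of a stratified split is shown here. The 80/20 ratio and `random_state=42` are assumptions chosen for illustration, not values taken from this notebook, and the `*_demo` names are hypothetical so the notebook's `X` and `y` are left untouched.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_frame = load_iris(as_frame=True)
X_demo = iris_frame.data
# Map integer class codes to species names
y_demo = pd.Series(
    iris_frame.target_names[iris_frame.target.to_numpy()], name="species"
)

# stratify keeps the class proportions the same in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_test.value_counts())
```

Stratifying is a cheap safeguard on a small dataset like this one, where a purely random split could leave one species underrepresented in the test set.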

## Conclusion

We've loaded the Iris dataset, explored its characteristics, performed basic cleaning, visualized the data, and separated the features from the target. The dataset is now registered in our Azure ML workspace and ready for the next notebook, where we'll train our machine learning model.