
# Example of Supervised Learning with the Iris Dataset

What is the **Iris dataset**?

Imagine you have several flowers and, for each one, you record four measurements (in centimeters):
- The **length** of the petal (the colorful part of the flower).
- The **width** of the petal.
- The **length** of the sepal (the outer, leaf-like part that encloses the bud and sits beneath the petals).
- The **width** of the sepal.

For each flower, you also know its **type** (its species). The Iris dataset contains 3 types of flowers:
1. **Setosa**
2. **Versicolor**
3. **Virginica**

So, the **Iris dataset** is a table with these four measurements for 150 flowers (50 of each type), together with the type of each flower.

Example of a row in this table (the first flower in the dataset):
- Petal length: 1.4 cm
- Petal width: 0.2 cm
- Sepal length: 5.1 cm
- Sepal width: 3.5 cm
- Flower type: Setosa
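
To look at the real table rather than a description of it, a short sketch like the one below prints the column names, the table size, and the first flower. It assumes scikit-learn is installed and relies on `load_iris`, the same loader the notebook code uses; the variable names (`iris`, `first_row`, `first_type`) are chosen only for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Column names, in the order the measurements are stored
print(iris.feature_names)

# One row per flower, one column per measurement
n_flowers, n_measurements = iris.data.shape
print(n_flowers, n_measurements)

# First flower: its four measurements (in cm) and its type
first_row = iris.data[0]
first_type = iris.target_names[iris.target[0]]
print(first_row, first_type)
```

Note that the dataset stores the sepal measurements first and the petal measurements second, so the column order differs from the order of the bullet list above.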

What does **supervised learning** do?

**Supervised learning** means the computer learns from examples that already have the **correct answer**.

In this case, the computer will learn to **identify the type of flower** (Setosa, Versicolor, or Virginica) based on its characteristics (sepal and petal length and width). That is, you already have the input data (the flower measurements) and the answers (the flower type).
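
The idea of learning from examples that already carry the correct answer can be sketched in a few lines of plain Python. The sketch below does not use Random Forest; it uses a much simpler rule (give a new flower the type of the most similar known flower), and the six labelled rows are hand-typed for illustration only. The function name `guess_type` is hypothetical.

```python
import math

# Labelled examples: (sepal length, sepal width, petal length, petal width) in cm.
# Hand-typed for illustration; the notebook itself learns from the full dataset.
labelled_flowers = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]

def guess_type(measurements):
    """Return the type of the closest labelled example (1-nearest-neighbour)."""
    closest = min(labelled_flowers, key=lambda pair: math.dist(pair[0], measurements))
    return closest[1]

# Two new flowers whose type is not given
print(guess_type((5.0, 3.4, 1.5, 0.2)))
print(guess_type((6.0, 2.9, 4.5, 1.5)))
```

The inputs are the measurements, the answers are the types, and the "learning" here is nothing more than remembering the labelled examples. A Random Forest builds a more elaborate set of rules from the same kind of input and answer pairs.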

What is a **Random Forest classifier**?

A **Random Forest** is a model the computer uses to **make predictions** based on the data it has learned from.

Imagine the computer needs to decide, for example, whether a flower is "Setosa" or "Versicolor". It will use the **Random Forest**, which is like a team of "decision trees". Each tree is trained on a different random sample of the data and makes its own prediction. In the end, the team (the "Random Forest") combines those predictions, roughly by majority vote, to give the final answer, i.e., the type of flower.
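
The "team decision" can be pictured as a simple vote count. The sketch below is a simplification under stated assumptions: the five votes are made up, and `forest_decision` is a hypothetical helper, not a scikit-learn function. scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities, which often, but not always, agrees with a plain vote.

```python
from collections import Counter

def forest_decision(tree_votes):
    """Return the type that most trees voted for, plus the vote counts."""
    counts = Counter(tree_votes)
    winner, _ = counts.most_common(1)[0]
    return winner, counts

# Hypothetical votes from five trees for a single flower
votes = ["versicolor", "setosa", "versicolor", "virginica", "versicolor"]
winner, counts = forest_decision(votes)
print(winner, dict(counts))
```

Because each tree sees slightly different data, individual trees can disagree; combining many of them tends to smooth out the mistakes of any single tree.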

How does the code work?

Now that we know what the Iris dataset, supervised learning, and the Random Forest classifier are, let’s go through the code step by step:

1. **Load the flower data**:
   The code starts by loading the Iris dataset, meaning it retrieves the information about the flowers and their characteristics (like petal size) and the flower type.

2. **Split the data into two groups**:
   The code divides this data into two groups:
   - **Training group**: where the computer learns the characteristics of the flowers and their types (80% of the data, i.e., 120 flowers).
   - **Testing group**: where the computer tries to guess the type of each flower, and we check whether it got it right (20% of the data, i.e., 30 flowers).

3. **Train the model**:
   Using the training group, the **Random Forest** starts "learning". It looks at the characteristics of the flowers and tries to determine which flower type is associated with each set of characteristics.

4. **Make predictions**:
   After learning, the **Random Forest** uses what it has learned to try to guess the types of flowers in the **testing group** (which it has never seen before).

5. **Check if it got it right**:
   Finally, the code compares the computer’s predictions with the correct answers (the flower types in the testing group) and calculates the **accuracy**, which is the percentage of test flowers the computer classified correctly.
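
Two of the steps above are plain arithmetic and can be sketched without any machine learning library: the 80/20 split and the accuracy calculation. The sketch below is only an illustration of the idea; the notebook code delegates these steps to scikit-learn's `train_test_split` and `accuracy_score`. The helper `accuracy` and the five example labels are hypothetical.

```python
import random

n_flowers = 150
test_fraction = 0.2

# Shuffle the row numbers, then cut them into a testing part and a training part
indices = list(range(n_flowers))
random.Random(42).shuffle(indices)
n_test = round(n_flowers * test_fraction)
test_indices = indices[:n_test]
train_indices = indices[n_test:]
print(len(train_indices), len(test_indices))

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the correct answers."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Hypothetical correct answers and guesses for five test flowers
true_labels = ["setosa", "versicolor", "virginica", "versicolor", "setosa"]
predicted = ["setosa", "versicolor", "versicolor", "versicolor", "setosa"]
score = accuracy(true_labels, predicted)
print(f"{score * 100:.2f}%")
```

Shuffling before cutting matters because the Iris table is sorted by flower type; taking the last 30 rows without shuffling would put only one type in the testing group.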

### In summary:
- The **Iris dataset** is a table containing measurements of 150 flowers (petal and sepal length and width) and their types.
- **Supervised learning** is a type of learning where the computer learns from data that already has the correct answer (flower type).
- The **Random Forest** is a computer model that uses multiple "trees" to predict the flower type based on the characteristics it has learned.

The code trains this model and, at the end, calculates the percentage of times the model correctly guessed the flower types in the testing group.


In [None]:

# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a RandomForest model (fixed random_state so the result is reproducible)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict the results
y_pred = model.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy * 100:.2f}%")
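
Once the model is trained, it can also be asked about a flower that is not in the table. The sketch below repeats the loading and training steps so that it runs on its own, then predicts the type of one flower and converts the numeric answer back into a name with `data.target_names`. The measurements in `new_flower` are an assumption chosen to resemble a Setosa; they are not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Measurements in the dataset's column order:
# sepal length, sepal width, petal length, petal width (all in cm)
new_flower = [[5.1, 3.5, 1.4, 0.2]]

# predict() returns a number (0, 1, or 2); target_names maps it to a type name
predicted_index = model.predict(new_flower)[0]
predicted_name = data.target_names[predicted_index]
print(f"Predicted type: {predicted_name}")
```

The model works with numeric labels internally, so translating the prediction back through `target_names` is what makes the output readable.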
