## Machine Learning: Iris Example
There are three main types of machine learning:
1. **Supervised Learning**: Learning from labeled data. The system is trained on a dataset that contains both the input features (X) and the expected output (y).
2. **Unsupervised Learning**: Learning from unlabeled data. The system identifies patterns and relationships without predefined labels.
3. **Reinforcement Learning**: Learning by interacting with an environment. The system learns to make decisions based on rewards and penalties.

Let's start with a simple example of supervised learning using a popular dataset: the Iris dataset.


In [None]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import pandas as pd

# Load the Iris dataset
iris = load_iris()

# Convert the dataset to a Pandas DataFrame for better readability
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target  # Add the target variable as a new column

# Display the DataFrame
print(iris_df.head())  # Show the first few rows for a quick preview

# Print the dimensions of the DataFrame
print("Dimensions of the DataFrame:", iris_df.shape)


In [None]:
# Features (X) and labels (y)
X = iris.data    # Feature matrix: four measurements per flower (sepal length, sepal width, petal length, petal width)
y = iris.target  # Label vector: the species of each flower, encoded as an integer, which we want the model to predict
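
To make the shapes of the features and labels concrete, here is a minimal sketch that inspects them. The variable `class_counts` and the use of `np.bincount` are illustrative additions, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# X is a 2-D array: one row per flower, one column per measurement
print("X shape:", X.shape)
# y is a 1-D array holding one integer class code per row of X
print("y shape:", y.shape)

# Count how many samples carry each class code and show the species name
class_counts = np.bincount(y)
for code, (name, count) in enumerate(zip(iris.target_names, class_counts)):
    print(f"{code} -> {name}: {count} samples")
```

Each integer in `y` is an index into `iris.target_names`, so the numeric labels can always be translated back to species names.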

In [None]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

- **Data Splitting**: The `train_test_split` function from Scikit-Learn splits the dataset into training and testing sets.
  - **`test_size=0.3`**: Reserves 30% of the data (45 of the 150 samples) for testing and leaves 70% (105 samples) for training.
  - **`random_state=42`**: Sets a seed for reproducibility, ensuring that the split is consistent across different runs.
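
The effect of both parameters can be checked directly. The sketch below confirms the split sizes and that reusing the seed reproduces the same split. The last part uses `stratify=y`, an optional `train_test_split` argument that the original code does not use, to show one way of keeping the class proportions equal in both sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same call as in the notebook: 30% test, fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print("Training samples:", X_train.shape[0])
print("Testing samples:", X_test.shape[0])

# Repeating the call with the same seed yields the identical split
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
same_split = np.array_equal(X_test, X_test_again)
print("Same split with the same seed:", same_split)

# Optional (not in the original code): stratify=y preserves class proportions
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
strat_counts = np.bincount(y_test_strat, minlength=3)
print("Stratified test-set class counts:", strat_counts)
```

Without `stratify`, the class counts in the test set depend on the random shuffle and are usually slightly uneven.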


In [None]:
# Initialize the Random Forest classifier
clf = RandomForestClassifier(random_state=42)  # Fixed seed so the trained model is the same on every run

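- **Random Forest Classifier**: The `RandomForestClassifier` constructor creates an untrained Random Forest model. A Random Forest is an ensemble learning method that trains many decision trees, each on a random sample of the data and features, and combines their predictions. This usually gives better accuracy and robustness than a single tree.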
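
To see the "multiple decision trees" part in practice, the following sketch builds a small forest and inspects it after fitting. The name `forest` and the values `n_estimators=25` and `random_state=0` are arbitrary choices for illustration. The model is fitted on the full dataset here only so the trees exist to be inspected.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators sets how many trees are built; random_state fixes the randomness
forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X, y)

# After fitting, the individual trees are available in estimators_
print("Number of trees:", len(forest.estimators_))
print("Type of one tree:", type(forest.estimators_[0]).__name__)
```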


In [None]:
# Train the model
clf.fit(X_train, y_train)

- **Model Training**: The `fit` method trains the Random Forest model on the training data (`X_train` and `y_train`). During training, each tree learns decision rules that map feature values to species labels, and the model later uses these rules to predict labels for new samples.
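
One way to get a rough view of what the model has learned is to look at the attributes that `fit` sets. This sketch, which goes slightly beyond the original notebook, prints the classes seen during training and the `feature_importances_` scores. The `random_state=42` passed to the classifier is an assumption made here for repeatable output.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Attributes ending in an underscore are set by fit()
print("Classes seen during training:", clf.classes_)

# Importance scores sum to 1; higher means the feature was more useful for splitting
importances = clf.feature_importances_
ranked = sorted(
    zip(iris.feature_names, importances), key=lambda pair: pair[1], reverse=True
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```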

In [None]:
# Make predictions
y_pred = clf.predict(X_test)

- **Predictions**: `clf.predict(X_test)` uses the trained model to predict the labels for the test set (`X_test`). These predictions are stored in `y_pred`.
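
The predictions are integer class codes. As a small sketch, they can be mapped back to species names through `iris.target_names`, and `predict_proba` shows how confident the model is in each class. Both are additions for illustration, and `random_state=42` on the classifier is assumed for repeatable output.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# One predicted class code per test sample
y_pred = clf.predict(X_test)
print("Predicted codes:", y_pred[:5])

# Translate the integer codes into species names
predicted_names = iris.target_names[y_pred]
print("Predicted species:", predicted_names[:5])

# Class probabilities for the first five test samples (each row sums to 1)
proba = clf.predict_proba(X_test[:5])
print(np.round(proba, 2))
```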

In [None]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy * 100:.2f}%")

- **Calculate Accuracy**: `accuracy_score(y_test, y_pred)` compares the model’s predictions (`y_pred`) with the actual labels in the test set (`y_test`) and returns the accuracy: the fraction of predictions that are correct, a value between 0 and 1.
- **Print Accuracy**: The code multiplies this fraction by 100 and prints it as a percentage, formatted to two decimal places.
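
Accuracy is simple enough to compute by hand, which helps clarify what `accuracy_score` returns. The sketch below compares the library result with a manual calculation. The names `manual_accuracy` and `correct` are illustrative, and `random_state=42` on the classifier is assumed for repeatable output.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Library result: a fraction between 0 and 1
accuracy = accuracy_score(y_test, y_pred)

# Manual result: number of matching labels divided by number of test samples
correct = int(np.sum(y_pred == y_test))
manual_accuracy = correct / len(y_test)

print(f"accuracy_score: {accuracy:.4f}")
print(f"manual:         {manual_accuracy:.4f} ({correct} of {len(y_test)} correct)")
```

Passing `normalize=False` to `accuracy_score` returns the number of correct predictions instead of the fraction.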


## Explanation of the Code

1. **Loading the Dataset**: We used the `load_iris()` function from `sklearn.datasets` to load the Iris dataset, a popular dataset for classification tasks. It contains 150 samples with 4 features (sepal length, sepal width, petal length, and petal width), and the goal is to classify each sample as one of three Iris species (setosa, versicolor, or virginica).

2. **Train-Test Split**: We split the data into two parts: 70% for training and 30% for testing. Holding out the test set lets us measure how well the model performs on data it did not see during training.

3. **Random Forest Classifier**: We used a Random Forest classifier, which is an ensemble method that combines multiple decision trees to improve prediction accuracy.

4. **Training the Model**: The model is trained on the training set using the `.fit()` method.

5. **Making Predictions**: The trained model makes predictions on the test set using `.predict()`.

6. **Evaluating the Model**: Finally, we evaluate the accuracy of the model by comparing the predicted labels to the actual labels using `accuracy_score()`.

## Key Takeaways

- Machine Learning models require data to learn patterns and relationships.
- In **Supervised Learning**, we provide labeled data to the model.
- The **Train-Test Split** lets us estimate how well the model performs on data it has not seen.

In future lessons, we will explore more advanced topics such as model evaluation, hyperparameter tuning, and real-world applications.
