# Task
Train and evaluate a decision tree classifier and a random forest classifier on the dataset "/content/iris.csv".

## Load data

### Subtask:
Load the iris dataset from the CSV file into a pandas DataFrame.


**Reasoning**:
Import pandas, load the dataset into a DataFrame, and inspect the first five rows and the column summary to verify the load.



In [1]:
import pandas as pd

df = pd.read_csv('/content/iris.csv')
display(df.head())
# info() prints its summary itself and returns None, so call it directly instead of wrapping it in display()
df.info()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
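`df.head()` and `df.info()` are used here to check the row count, the non-null values per column, and the class labels. The same checks can be done with the standard-library `csv` module, which is what the pure-Python sketch below shows. It only illustrates the checks and does not reflect how pandas works internally. The inline `SAMPLE` string is an assumption standing in for `/content/iris.csv`; it reuses the five rows displayed above.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for /content/iris.csv: the header plus the five rows shown above.
SAMPLE = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
"""


def summarize(text):
    """Return (row count, non-empty count per column, class counts)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = rows[0].keys() if rows else []
    # A value counts as present when its field is not an empty string.
    non_null = {col: sum(1 for r in rows if r[col] != "") for col in columns}
    classes = Counter(r["species"] for r in rows)
    return len(rows), non_null, classes


n_rows, non_null, classes = summarize(SAMPLE)
print(n_rows)  # 5
print(non_null)
print(classes)
```

To run this against the real file, replace `io.StringIO(text)` with `open('/content/iris.csv', newline='')`. The non-null counts should then match the 150 reported by `df.info()`.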

## Prepare data

### Subtask:
Separate the features from the target variable and split the data into training and testing sets.


**Reasoning**:
Use the four measurement columns as features and `species` as the target, then hold out 20% of the rows as a test set with a fixed random seed.



In [2]:
from sklearn.model_selection import train_test_split

X = df.drop('species', axis=1)
y = df['species']

# Hold out 20% of the rows (30 of 150) for testing; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

display(X_train.head())
display(X_test.head())
display(y_train.head())
display(y_test.head())

|    | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 22 | 4.6 | 3.6 | 1.0 | 0.2 |
| 15 | 5.7 | 4.4 | 1.5 | 0.4 |
| 65 | 6.7 | 3.1 | 4.4 | 1.4 |
| 11 | 4.8 | 3.4 | 1.6 | 0.2 |
| 42 | 4.4 | 3.2 | 1.3 | 0.2 |


|     | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 73 | 6.1 | 2.8 | 4.7 | 1.2 |
| 18 | 5.7 | 3.8 | 1.7 | 0.3 |
| 118 | 7.7 | 2.6 | 6.9 | 2.3 |
| 78 | 6.0 | 2.9 | 4.5 | 1.5 |
| 76 | 6.8 | 2.8 | 4.8 | 1.4 |


|    | species |
|---|---|
| 22 | setosa |
| 15 | setosa |
| 65 | versicolor |
| 11 | setosa |
| 42 | setosa |


|     | species |
|---|---|
| 73 | versicolor |
| 18 | setosa |
| 118 | virginica |
| 78 | versicolor |
| 76 | versicolor |
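Conceptually, `train_test_split(..., test_size=0.2, random_state=42)` shuffles the row indices with a seeded generator and holds out a fraction of them as the test set. The pure-Python sketch below illustrates that idea. It is not scikit-learn's implementation, which uses NumPy's random number generator, so the same seed will not reproduce the exact indices shown above. It does show how 150 rows become 120 training and 30 test samples. The function name `shuffled_split` is hypothetical.

```python
import math
import random


def shuffled_split(n_samples, test_size=0.2, seed=42):
    """Return (train_indices, test_indices) for a seeded random hold-out split."""
    # Round the test count up, as scikit-learn does for a float test_size;
    # every remaining row goes to the training set.
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = shuffled_split(150)
print(len(train_idx), len(test_idx))  # 120 30
```

This split is not stratified, and the notebook's split isn't either, which is why its test set ends up with 10, 9, and 11 samples per class. Passing `stratify=y` to `train_test_split` would keep the class proportions the same in both sets.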


## Train decision tree model

### Subtask:
Train a Decision Tree Classifier model on the training data.


**Reasoning**:
Import the DecisionTreeClassifier and train the model.



In [3]:
from sklearn.tree import DecisionTreeClassifier

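# Defaults: Gini impurity as the split criterion and no depth limit (max_depth=None),
# so nodes keep splitting until the leaves are pure or cannot be split further.
dt_classifier = DecisionTreeClassifier(random_state=42)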
dt_classifier.fit(X_train, y_train)
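By default, `DecisionTreeClassifier` scores candidate splits with Gini impurity (`criterion='gini'`). The pure-Python sketch below shows how a single split is scored. It computes the Gini impurity of each child node and picks the threshold on one feature that gives the lowest weighted impurity. This is a simplified illustration, not scikit-learn's optimized tree builder. The toy data reuses the `petal_length` values and species labels of the ten rows shown in the train/test previews above, and the helper names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k**2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Try midpoints between adjacent sorted values; return (threshold, weighted Gini)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


# petal_length and species for the ten rows shown in the train/test previews.
petal_length = [1.0, 1.5, 4.4, 1.6, 1.3, 4.7, 1.7, 6.9, 4.5, 4.8]
species = ["setosa", "setosa", "versicolor", "setosa", "setosa",
           "versicolor", "setosa", "virginica", "versicolor", "versicolor"]

print(round(gini(species), 2))  # 0.58 (impurity before splitting)
threshold, score = best_threshold(petal_length, species)
print(round(threshold, 2), round(score, 2))  # 3.05 0.16
```

The best split isolates all five setosa rows in a pure left child. A full tree repeats this search over every feature at every node.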

## Evaluate decision tree model

### Subtask:
Evaluate the performance of the Decision Tree model on the testing data.


**Reasoning**:
Evaluate the performance of the trained Decision Tree model using accuracy, classification report, and confusion matrix.



In [4]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred_dt = dt_classifier.predict(X_test)

accuracy_dt = accuracy_score(y_test, y_pred_dt)
report_dt = classification_report(y_test, y_pred_dt)
matrix_dt = confusion_matrix(y_test, y_pred_dt)

print("Decision Tree Model Performance:")
print(f"Accuracy: {accuracy_dt:.4f}")
print("\nClassification Report:")
print(report_dt)
print("\nConfusion Matrix:")
print(matrix_dt)

Decision Tree Model Performance:
Accuracy: 1.0000

Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30


Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
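Here is how to read the printed matrix. In scikit-learn's `confusion_matrix`, entry `[i, j]` counts the samples whose true class is the i-th label and whose predicted class is the j-th label. The labels are in sorted order, which here means setosa, versicolor, virginica. The pure-Python sketch below recomputes accuracy, the matrix, and per-class precision and recall under that convention. The notebook's real predictions contain no errors, so the `y_true`/`y_pred` lists are made up to include one. The helper names are hypothetical.

```python
def confusion(y_true, y_pred):
    """Rows = true labels, columns = predicted labels, labels in sorted order."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(matrix):
    """Per-class (precision, recall): diagonal over column sum and over row sum."""
    n = len(matrix)
    out = []
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[i][k] for i in range(n))  # column sum
        actual = sum(matrix[k])                          # row sum
        out.append((tp / predicted if predicted else 0.0,
                    tp / actual if actual else 0.0))
    return out


# Hypothetical labels: one virginica sample predicted as versicolor.
y_true = ["setosa", "versicolor", "virginica", "virginica", "versicolor"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica", "versicolor"]

labels, matrix = confusion(y_true, y_pred)
print(labels)  # ['setosa', 'versicolor', 'virginica']
for row in matrix:
    print(row)
print(accuracy(y_true, y_pred))  # 0.8
print(precision_recall(matrix))
```

A perfect classifier puts every sample on the diagonal, as in the decision tree's matrix above. All per-class precision and recall values are then 1.0.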


## Train random forest model

### Subtask:
Train a Random Forest Classifier model on the training data.


**Reasoning**:
Import the RandomForestClassifier and train the model on the training data.



In [5]:
from sklearn.ensemble import RandomForestClassifier

# Defaults: 100 trees (n_estimators=100), each fit on a bootstrap sample of the training rows.
rf_classifier = RandomForestClassifier(random_state=42)
rf_classifier.fit(X_train, y_train)
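A random forest fits many decision trees. Each tree is trained on a bootstrap sample (rows drawn with replacement) and considers a random subset of features at each split. According to the scikit-learn documentation, `RandomForestClassifier` combines its trees by averaging their predicted class probabilities, not by a hard majority vote. The pure-Python sketch below illustrates two of those ingredients, bootstrap sampling and probability averaging, and shows a hard vote for comparison. The per-tree probabilities are made-up numbers, not output from this notebook, and the helper names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices uniformly with replacement (one tree's training rows)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def average_proba(tree_probas):
    """Soft voting: average per-tree class probabilities, return (best class, averages)."""
    classes = sorted(tree_probas[0])
    avg = {c: sum(p[c] for p in tree_probas) / len(tree_probas) for c in classes}
    return max(classes, key=avg.get), avg


def majority_vote(tree_probas):
    """Hard voting, for comparison: each tree votes for its most probable class."""
    votes = Counter(max(p, key=p.get) for p in tree_probas)
    return votes.most_common(1)[0][0]


rng = random.Random(42)
sample = bootstrap_indices(120, rng)
# Repeats are expected: on average only about 63% of the 120 rows appear at least once.
print(len(sample), len(set(sample)))

# Hypothetical class probabilities from three trees for one flower.
trees = [
    {"setosa": 0.0, "versicolor": 0.55, "virginica": 0.45},
    {"setosa": 0.0, "versicolor": 0.55, "virginica": 0.45},
    {"setosa": 0.0, "versicolor": 0.0, "virginica": 1.0},
]
label, avg = average_proba(trees)
print(label)                 # virginica (average 0.633 vs 0.367)
print(majority_vote(trees))  # versicolor (2 votes to 1)
```

Fully grown trees usually end in pure leaves, so each tree's probabilities are mostly 0 or 1. In that case, soft and hard voting tend to pick the same class.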

## Evaluate random forest model

### Subtask:
Evaluate the performance of the Random Forest model on the testing data.


**Reasoning**:
Evaluate the performance of the trained Random Forest model using the testing data and print the evaluation metrics.



In [6]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred_rf = rf_classifier.predict(X_test)

accuracy_rf = accuracy_score(y_test, y_pred_rf)
report_rf = classification_report(y_test, y_pred_rf)
matrix_rf = confusion_matrix(y_test, y_pred_rf)

print("Random Forest Model Performance:")
print(f"Accuracy: {accuracy_rf:.4f}")
print("\nClassification Report:")
print(report_rf)
print("\nConfusion Matrix:")
print(matrix_rf)

Random Forest Model Performance:
Accuracy: 1.0000

Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30


Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
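The `macro avg` and `weighted avg` rows of `classification_report` differ only in how the per-class scores are combined. The macro average is the plain mean over classes. The weighted average weights each class by its support, meaning its number of true test samples. Both rows print 1.00 here because every per-class score is 1.00. The sketch below uses hypothetical F1-scores together with the actual test-set supports (10, 9, 11) to show how the two averages diverge. The helper names are hypothetical.

```python
def macro_average(scores):
    """Unweighted mean of per-class scores."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean of per-class scores weighted by each class's support."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


supports = [10, 9, 11]             # setosa, versicolor, virginica in the test set
f1_hypothetical = [1.0, 0.9, 0.8]  # made-up scores, not results from this notebook

print(round(macro_average(f1_hypothetical), 4))               # 0.9
print(round(weighted_average(f1_hypothetical, supports), 4))  # 0.8967
```

In this example the weighted average is pulled down because virginica, the class with the lowest score, also has the largest support.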


## Summary

### Data Analysis Key Findings

*   The Iris dataset was successfully loaded, containing 150 entries and 5 columns with no missing values.
*   The data was split into training (80%) and testing (20%) sets, resulting in 120 training and 30 testing samples.
*   Both the Decision Tree and Random Forest models classified all 30 test samples correctly, for an accuracy of 1.0000.
*   Both models achieved perfect precision, recall, and F1-scores (1.00) for all three species (setosa, versicolor, and virginica) on the test set.
*   The confusion matrices for both models indicated zero misclassifications on the test data.


