# Problem Statement

Consider a class teacher who has a small dataset of students that includes:

- Hours studied per day  
- Attendance percentage  
- Final exam result (Pass or Fail)

A new student approaches the teacher and asks:

**“If I study this much and attend this much, will I pass the exam?”**

Instead of making a guess, the goal is to use **machine learning** to learn patterns from previous students' data.  
By analyzing the relationship between study hours, attendance, and outcomes, the system can **predict whether the new student is likely to pass or fail**.

This represents a simple and practical example of **classification**, where the objective is to predict a category.


# Create or Load a Small Dataset

In [1]:
import pandas as pd

# Tiny dataset: study hours, attendance, pass/fail (1 = pass, 0 = fail)
data = {
    "study_hours": [1, 2, 3, 4, 5, 6, 7],
    "attendance":  [50, 55, 60, 65, 70, 80, 90],
    "result":      [0, 0, 0, 1, 1, 1, 1]
}

df = pd.DataFrame(data)

X = df[["study_hours", "attendance"]]   # input features
y = df["result"]                        # target label


## Dataset Creation (Student Performance Prediction)

- A tiny student dataset is created with **two features**:
  - **Study hours**
  - **Attendance percentage**
- The **result** is encoded as:
  - `1` → Pass  
  - `0` → Fail  
- `X` contains the **input features** that the model will learn from.
- `y` contains the **target labels** that the model must predict.
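A quick sanity check of the feature/label split can help before training. The snippet below is a minimal sketch that rebuilds the same dataset and prints the shapes and the class balance; the variable name `class_counts` is introduced here only for illustration.

```python
import pandas as pd

# Same tiny dataset as above (1 = pass, 0 = fail)
df = pd.DataFrame({
    "study_hours": [1, 2, 3, 4, 5, 6, 7],
    "attendance":  [50, 55, 60, 65, 70, 80, 90],
    "result":      [0, 0, 0, 1, 1, 1, 1],
})

X = df[["study_hours", "attendance"]]   # input features
y = df["result"]                        # target label

print(X.shape)   # (7, 2): 7 students, 2 features
print(y.shape)   # (7,): one label per student

# Count how many students fall into each class
class_counts = y.value_counts().sort_index()
print(class_counts)
```

With 3 failing and 4 passing students, the classes are roughly balanced, so accuracy is a reasonable first metric for this demo.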


# Train a Classification Model

In [2]:
from sklearn.tree import DecisionTreeClassifier

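# random_state fixes the tie-breaking between equally good splits, so reruns give the same tree
model = DecisionTreeClassifier(random_state=42)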
model.fit(X, y)


## Model Training (Decision Tree)

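- A **Decision Tree** is used: a simple and intuitive machine learning algorithm.
- It learns **easy-to-understand rules** in the form of threshold tests on the features, such as:  
  *“If study_hours > 3.5 → pass”*
- On this dataset a single split is enough, because either feature alone separates the passing students from the failing ones; the tree may therefore split on `attendance` instead of `study_hours`.
- `model.fit()` trains the model on the student dataset, allowing it to learn these patterns.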
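To see which rule the tree actually learned, scikit-learn's `export_text` can print the fitted tree as plain text. This is a minimal sketch; the `random_state=0` value and the variable name `rules` are choices made here for illustration, not part of the original notebook.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.DataFrame({
    "study_hours": [1, 2, 3, 4, 5, 6, 7],
    "attendance":  [50, 55, 60, 65, 70, 80, 90],
    "result":      [0, 0, 0, 1, 1, 1, 1],
})
X = df[["study_hours", "attendance"]]
y = df["result"]

# Assumption: a fixed seed so the same feature is chosen on every run
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Render the learned if/else rules as indented text
rules = export_text(model, feature_names=list(X.columns))
print(rules)
print("Tree depth:", model.get_depth())
```

The printed tree shows one threshold test followed by two leaves, one per class. Which feature appears in that test depends on how the tie between the two equally good splits is broken.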


# Evaluate the Model

In [3]:
accuracy = model.score(X, y)
print("Accuracy:", accuracy)


Accuracy: 1.0


## Model Evaluation (Accuracy Score)

- `model.score()` returns the **accuracy**: the fraction of predictions that match the true labels. Here it is computed on the same data the model was trained on.
- An accuracy of `1.0` is expected in this setting, because an unconstrained decision tree can **memorize** such a small dataset. It does not show how well the model will perform on students it has not seen.
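Because the score above is measured on the training rows themselves, a held-out estimate gives a more honest picture. One option for a dataset this small is leave-one-out cross-validation, sketched below with scikit-learn's `LeaveOneOut` and `cross_val_score`. This is a swapped-in evaluation technique, not what the notebook does, and `random_state=0` is an assumption for reproducibility.

```python
import pandas as pd
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "study_hours": [1, 2, 3, 4, 5, 6, 7],
    "attendance":  [50, 55, 60, 65, 70, 80, 90],
    "result":      [0, 0, 0, 1, 1, 1, 1],
})
X = df[["study_hours", "attendance"]]
y = df["result"]

# Train on 6 students, test on the 1 left out, and repeat for every student
scores = cross_val_score(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    cv=LeaveOneOut(),
)

print("Per-student results (1 = correct, 0 = wrong):", scores)
print("Held-out accuracy:", scores.mean())
```

Each entry in `scores` is either 0 or 1, since every test fold contains a single student. The mean can come out lower than the training accuracy, which is the gap that memorization hides.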


# Predict Result for a New Student

In [6]:
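# Example: student studies 4.5 hours per day and has 75% attendance
# Pass a DataFrame with the training column names to avoid a feature-name warning
new_student = pd.DataFrame([[4.5, 75]], columns=["study_hours", "attendance"])
prediction = model.predict(new_student)
print("Predicted (1=Pass, 0=Fail):", prediction[0])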


Predicted (1=Pass, 0=Fail): 1
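Alongside the hard 0/1 label, the classifier can also report class probabilities through `predict_proba`. The following is a minimal sketch under the same setup; `new_student` and `proba` are illustrative names, and `random_state=0` is an assumption.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "study_hours": [1, 2, 3, 4, 5, 6, 7],
    "attendance":  [50, 55, 60, 65, 70, 80, 90],
    "result":      [0, 0, 0, 1, 1, 1, 1],
})
X = df[["study_hours", "attendance"]]
y = df["result"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# New student: 4.5 study hours per day, 75% attendance
new_student = pd.DataFrame([[4.5, 75]], columns=["study_hours", "attendance"])

label = model.predict(new_student)[0]
proba = model.predict_proba(new_student)   # shape (1, n_classes)

print("Predicted label:", label)
# Columns of proba follow the order of model.classes_
for cls, p in zip(model.classes_, proba[0]):
    print(f"P(result={cls}) = {p:.2f}")
```

For a fully grown tree on data this clean, every leaf contains a single class, so the reported probabilities are all-or-nothing and should not be read as calibrated confidence.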




# Workflow Summary

### 1. **Create a Tiny Dataset**
- Include study hours, attendance percentage, and pass/fail results.

### 2. **Separate Features and Labels**
- `X` → study hours + attendance  
- `y` → result (pass or fail)

### 3. **Train a Decision Tree Classifier**
- Fit the model to learn simple rules from the data.

### 4. **Evaluate Using Accuracy**
- Measure the fraction of predictions the model gets correct.

### 5. **Predict for a New Student**
- Use the trained model to predict whether a new student is likely to pass or fail.


---

# Libraries Used

## 1. **pandas**
- Creates and manages data in a **DataFrame** format.
- Makes it easy to extract columns for **features (X)** and **labels (y)**.

## 2. **scikit-learn (sklearn)**
Used for performing machine learning tasks:

- `DecisionTreeClassifier` → learns pass/fail rules from the dataset  
- `.fit()` → trains the model  
- `.score()` → evaluates model accuracy  
- `.predict()` → predicts outcomes for new students  
