# Practical Exercise: Predicting Exam Success with Logistic Regression
In this exercise, we will use logistic regression to predict whether a student will pass an exam based on three features:
- Hours spent studying (`study_hours`)
- Attendance record (`attendance`: 1 if the student attended regularly, 0 otherwise)
- Past academic performance (`past_performance`: 1 if the student had good past grades, 0 otherwise)

The target variable is `passed_exam`, which is 1 if the student passed the exam and 0 otherwise.
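Logistic regression models the probability of passing as the logistic (sigmoid) function of a weighted sum of the three features. The coefficient symbols $\beta_0, \dots, \beta_3$ are notation introduced here. Their values are learned when the model is fitted in Step 4:

$$
P(\text{passed\_exam} = 1 \mid \mathbf{x}) = \sigma(z) = \frac{1}{1 + e^{-z}},
\qquad
z = \beta_0 + \beta_1\,\text{study\_hours} + \beta_2\,\text{attendance} + \beta_3\,\text{past\_performance}
$$

The model predicts a pass (class 1) when $\sigma(z) > 0.5$, which is equivalent to $z > 0$.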

## Step 1: Import Required Libraries
We begin by importing the necessary Python libraries:
- `pandas` and `numpy` for data handling
- `train_test_split` to divide the dataset
- `LogisticRegression` for the model
- `accuracy_score` and `classification_report` to evaluate performance

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
print("✅ Libraries loaded successfully.")

✅ Libraries loaded successfully.


## Step 2: Load and Explore the Dataset
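Next, we load the dataset from `data/student_exam_data.csv` and inspect it. `.head()` shows the first five rows, and `.describe()` reports summary statistics for each column.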

In [2]:
df = pd.read_csv('data/student_exam_data.csv')
print("📄 First few rows of the dataset:")
print(df.head())

print("\n📊 Summary statistics:")
print(df.describe())

📄 First few rows of the dataset:
   study_hours  attendance  past_performance  passed_exam
0          6.0           1                 0            0
1          4.7           1                 1            1
2          6.3           0                 0            0
3          8.0           1                 0            1
4          4.5           0                 0            0

📊 Summary statistics:
       study_hours  attendance  past_performance  passed_exam
count   200.000000  200.000000        200.000000    200.00000
mean      4.918000    0.555000          0.440000      0.49500
std       1.847959    0.498213          0.497633      0.50123
min       0.000000    0.000000          0.000000      0.00000
25%       3.600000    0.000000          0.000000      0.00000
50%       5.000000    1.000000          0.000000      0.00000
75%       6.000000    1.000000          1.000000      1.00000
max      10.000000    1.000000          1.000000      1.00000
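The summary already hints at data quality. Every column has a `count` of 200, so no values are missing. The `passed_exam` mean of 0.495 means the two classes are nearly balanced, which makes accuracy a sensible headline metric here.

The sketch below makes those checks explicit and adds a per-feature pass rate. It builds a hypothetical synthetic DataFrame with the same column names, so it runs without the CSV. On the real data, you would run only the three checks on `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data/student_exam_data.csv: same column names,
# randomly generated values (NOT the real dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "study_hours": np.clip(rng.normal(5, 1.8, n), 0, 10).round(1),
    "attendance": rng.integers(0, 2, n),
    "past_performance": rng.integers(0, 2, n),
})
# Hypothetical labelling rule, used only so the demo has a target column.
logit = -4 + 0.6 * df["study_hours"] + df["attendance"] + df["past_performance"]
prob_pass = 1 / (1 + np.exp(-logit.to_numpy()))
df["passed_exam"] = (rng.random(n) < prob_pass).astype(int)

# 1. Missing values per column: LogisticRegression raises an error on NaNs.
missing = df.isna().sum()
print(missing)

# 2. Class balance of the target (the fractions sum to 1).
balance = df["passed_exam"].value_counts(normalize=True)
print(balance)

# 3. Pass rate for each value of the binary features.
print(df.groupby("attendance")["passed_exam"].mean())
print(df.groupby("past_performance")["passed_exam"].mean())
```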


## Step 3: Prepare the Data
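We now separate the features (`X`) from the target (`y`) and split the data into training and testing sets. We hold out 30% of the 200 rows (60 students) for testing. Setting `random_state=42` makes the split reproducible.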

In [3]:
X = df[['study_hours', 'attendance', 'past_performance']]
y = df['passed_exam']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("✅ Data split into training and testing sets.")

✅ Data split into training and testing sets.
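The split above is not stratified, so the pass/fail ratio in the 60-row test set can drift from the overall ratio by chance. A common variation passes `stratify=y` to `train_test_split`, which keeps that ratio approximately equal in both sets.

The sketch below uses hypothetical random data with the notebook's column names. Applying `stratify=y` to the real notebook would produce a different split, so the scores in Step 5 would change.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data with the notebook's column names (NOT the real CSV).
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "study_hours": rng.uniform(0, 10, n).round(1),
    "attendance": rng.integers(0, 2, n),
    "past_performance": rng.integers(0, 2, n),
})
y = pd.Series(rng.integers(0, 2, n), name="passed_exam")

# stratify=y keeps the pass/fail ratio (approximately) the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(len(X_train), len(X_test))  # 140 60
print(f"overall pass rate: {y.mean():.3f}")
print(f"train pass rate:   {y_train.mean():.3f}")
print(f"test pass rate:    {y_test.mean():.3f}")
```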


## Step 4: Train the Logistic Regression Model
Next, we create a `LogisticRegression` model with scikit-learn's default settings and fit it to the training data with `.fit()`.

In [4]:
# scikit-learn defaults: L2-regularized (C=1.0), fitted with the lbfgs solver
model = LogisticRegression()
model.fit(X_train, y_train)
print("✅ Model trained.")

✅ Model trained.
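After fitting, the learned parameters are available as `model.coef_` (one weight per feature) and `model.intercept_`. Because the default settings apply L2 regularization, these weights are shrunk somewhat toward zero.

The sketch below does three things:
- prints the parameters;
- converts them to odds ratios;
- checks that `predict_proba` equals the sigmoid of the linear score.

It trains on hypothetical synthetic data, because the actual coefficient values depend on the real CSV.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical training data with the notebook's feature names (NOT the real CSV).
rng = np.random.default_rng(1)
n = 140
X_train = pd.DataFrame({
    "study_hours": rng.uniform(0, 10, n).round(1),
    "attendance": rng.integers(0, 2, n),
    "past_performance": rng.integers(0, 2, n),
})
logit = -3 + 0.5 * X_train["study_hours"] + 0.8 * X_train["attendance"] + 0.8 * X_train["past_performance"]
y_train = (rng.random(n) < 1 / (1 + np.exp(-logit.to_numpy()))).astype(int)

model = LogisticRegression().fit(X_train, y_train)

# beta_0 (intercept) and beta_1..beta_3 (one coefficient per feature).
coefs = pd.Series(model.coef_[0], index=X_train.columns)
print(f"intercept: {model.intercept_[0]:.3f}")
print(coefs.round(3))

# exp(beta_j): factor by which the odds of passing change for a one-unit
# increase in feature j, holding the other features fixed.
print(np.exp(coefs).round(3))

# Reproduce predict_proba by hand for one student: sigma(beta_0 + x . beta).
student = X_train.iloc[[0]]
z = model.intercept_[0] + student.to_numpy() @ model.coef_[0]
manual_prob = 1 / (1 + np.exp(-z))
sk_prob = model.predict_proba(student)[0, 1]
print(manual_prob[0], sk_prob)
```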


## Step 5: Evaluate the Model
Let's make predictions on the test set and evaluate them using accuracy and a classification report.
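Each row of the classification report treats one class as the positive class. The metrics are computed from counts of true positives ($TP$), false positives ($FP$), false negatives ($FN$), and true negatives ($TN$). `support` is the number of test rows whose true label is that class.

$$
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$

$$
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$

`macro avg` is the unweighted mean over the two classes, and `weighted avg` weights each class by its support.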

In [5]:
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Accuracy: 0.73

Classification Report:
              precision    recall  f1-score   support

           0       0.74      0.78      0.76        32
           1       0.73      0.68      0.70        28

    accuracy                           0.73        60
   macro avg       0.73      0.73      0.73        60
weighted avg       0.73      0.73      0.73        60
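All the per-class numbers in the report come from the confusion matrix of the test predictions. The sketch below uses hypothetical toy labels, not the notebook's test set. It extracts the four counts and checks the hand-computed metrics against scikit-learn's metric functions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical toy labels (NOT the notebook's test set).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

# For labels [0, 1], rows are true classes and columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]  with class 1 (passed) as the positive class.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)  # [[3 1]
           #  [2 2]]

precision_1 = tp / (tp + fp)     # 2 / 3
recall_1 = tp / (tp + fn)        # 2 / 4 = 0.5
accuracy = (tp + tn) / cm.sum()  # 5 / 8 = 0.625

# The hand-computed values match scikit-learn's metric functions.
print(precision_1, precision_score(y_true, y_pred))
print(recall_1, recall_score(y_true, y_pred))
print(accuracy, accuracy_score(y_true, y_pred))
```

In the notebook itself, `confusion_matrix(y_test, y_pred)` would give the actual counts behind the report above.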



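The accuracy of 0.73 above comes from a single 60-row test set, so it depends partly on which students happened to land in that split. K-fold cross-validation refits the model on several different splits and reports one score per fold, which shows how much the score varies.

The sketch below uses scikit-learn's `cross_val_score` with `StratifiedKFold`. It runs on hypothetical synthetic data rather than the notebook's CSV.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical data with the notebook's column names (NOT the real CSV).
rng = np.random.default_rng(7)
n = 200
X = pd.DataFrame({
    "study_hours": rng.uniform(0, 10, n).round(1),
    "attendance": rng.integers(0, 2, n),
    "past_performance": rng.integers(0, 2, n),
})
logit = -4 + 0.6 * X["study_hours"] + X["attendance"] + X["past_performance"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit.to_numpy()))).astype(int)

# 5 stratified folds: each row lands in a test fold exactly once, and every
# fold keeps roughly the same pass/fail ratio.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="accuracy")

print(scores.round(3))
print(f"mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

On the real data, passing the notebook's `X` and `y` in place of the synthetic ones would give the corresponding estimate.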
## Step 6: Try Your Own Example
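Finally, let's predict the outcome for a custom student profile. The cell below prompts for each feature value, then prints the predicted class along with the model's estimated probability of passing (from `predict_proba`). A student is labeled likely to pass when that probability exceeds 0.5.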

In [7]:
print("🔍 Enter student details:")
study_hours = float(input("Study hours: "))
attendance = int(input("Attendance (0 or 1): "))
past_perf = int(input("Past performance (0 or 1): "))

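# One-row DataFrame with the same column names the model was trained on
new_student = pd.DataFrame([[study_hours, attendance, past_perf]], columns=X.columns)
# predict() returns the class label (0 or 1)
prediction = model.predict(new_student)[0]
# predict_proba() returns [P(fail), P(pass)]; column 1 is the probability of passing
prob = model.predict_proba(new_student)[0][1]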

result = '🟢 Likely to pass' if prediction == 1 else '🔴 At risk of failing'
print(f"Prediction: {result} (probability: {prob:.2f})")

🔍 Enter student details:
Study hours:  5
Attendance (0 or 1):  1
Past performance (0 or 1):  1
Prediction: 🟢 Likely to pass (probability: 0.70)
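Because `input()` blocks, the interactive cell cannot run unattended, for example in a test or a scheduled job. It also accepts any integer for the two binary fields.

The sketch below defines a hypothetical helper, `predict_student`, which is not part of the original notebook. It validates the inputs and makes the decision threshold an explicit parameter. For the demo, it fits a hypothetical model on synthetic data in place of the notebook's `model`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["study_hours", "attendance", "past_performance"]


def predict_student(model, study_hours, attendance, past_performance, threshold=0.5):
    """Hypothetical helper: validate one student's features and return
    (predicted label, estimated probability of passing)."""
    if study_hours < 0:
        raise ValueError(f"study_hours must be non-negative, got {study_hours!r}")
    for name, value in (("attendance", attendance), ("past_performance", past_performance)):
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {value!r}")

    # Same column names as the training data, as in the notebook's cell.
    row = pd.DataFrame([[study_hours, attendance, past_performance]], columns=FEATURES)
    prob = model.predict_proba(row)[0, 1]
    # threshold=0.5 reproduces model.predict(); other values shift the cut-off.
    return int(prob > threshold), prob


# Hypothetical model trained on synthetic data (NOT the notebook's model).
rng = np.random.default_rng(3)
n = 200
X = pd.DataFrame({
    "study_hours": rng.uniform(0, 10, n).round(1),
    "attendance": rng.integers(0, 2, n),
    "past_performance": rng.integers(0, 2, n),
})
logit = -4 + 0.6 * X["study_hours"] + X["attendance"] + X["past_performance"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit.to_numpy()))).astype(int)
model = LogisticRegression().fit(X, y)

label, prob = predict_student(model, 5, 1, 1)
row = pd.DataFrame([[5, 1, 1]], columns=FEATURES)
print(label, round(prob, 2))
# Compare with model.predict() at the default threshold of 0.5.
print(label == model.predict(row)[0])
```

Raising `threshold` above 0.5 flags more students as at risk of failing, which raises recall for class 0 at the cost of more false alarms.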
