# Student Pass/Fail Prediction Model
This notebook presents a machine learning model that predicts whether a student will pass or fail from their academic record: grades in the first two periods, study time, past failures, and absences.

# Dataset Description
The dataset used for this model is student performance data from a Portuguese secondary school, focusing on mathematics. It contains various features that can affect a student's academic success; the ones used in this notebook are listed below:

G1: Grade in the first period (0-20 scale).

G2: Grade in the second period (0-20 scale).

G3: Final grade (0-20 scale). This is the target variable we will use to determine pass/fail status.

studytime: Amount of study time, represented on a scale from 1 (very low) to 4 (very high).

failures: Number of past class failures.

absences: Number of school absences.

The target variable, pass_fail, is derived from the final grade (G3). A student is classified as passing if their final grade is greater than or equal to 10, and failing otherwise. This binary classification allows us to use supervised learning techniques to predict student outcomes based on the available features.

# Model Overview
The model uses the Random Forest algorithm, an ensemble learning method that builds multiple decision trees during training and outputs the mode of the trees' class predictions (for classification) or the mean of their predictions (for regression). Combining many trees usually improves accuracy and reduces overfitting compared with a single decision tree.
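
To make the ensemble idea concrete, here is a minimal sketch on synthetic data (generated with `make_classification`, not the student dataset). It illustrates that scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities and then picking the most probable class, a soft variant of the vote described above. The variable names and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption: NOT the student dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X_demo, y_demo)

# Average the class probabilities predicted by each individual tree
tree_probas = np.mean(
    [tree.predict_proba(X_demo) for tree in forest.estimators_], axis=0
)

# The forest's own probabilities should match this average
forest_probas = forest.predict_proba(X_demo)
probas_match = np.allclose(tree_probas, forest_probas)

# The predicted class is the one with the highest averaged probability
manual_pred = forest.classes_[np.argmax(tree_probas, axis=1)]
preds_match = np.array_equal(manual_pred, forest.predict(X_demo))

print(probas_match, preds_match)
```

Each entry of `forest.estimators_` is a fitted decision tree, so the same loop can be used to inspect how much individual trees disagree on a given sample.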


In [None]:

# Import the required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('./student-mat.csv', sep=';')

# Convert final grades to binary Pass/Fail labels
df['pass_fail'] = (df['G3'] >= 10).astype(int)

# Select features for the model
features = df[['G1', 'G2', 'studytime', 'failures', 'absences']]
target = df['pass_fail']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3, random_state=42)

# Create and train the Random Forest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Calculate accuracy on the training set
y_pred_train = model.predict(X_train)
train_accuracy = accuracy_score(y_train, y_pred_train)

# Calculate accuracy on the test set
y_pred_test = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred_test)

# Display accuracy
print(f"Training Accuracy: {train_accuracy * 100:.2f}%")
print(f"Test Accuracy: {test_accuracy * 100:.2f}%")
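
Accuracy alone can hide which kind of mistake the model makes (for example, predicting PASS for students who actually fail). As a hedged sketch, the hypothetical helper `summarize_predictions` below breaks predictions down with scikit-learn's `confusion_matrix`. It is demonstrated on a small hand-written example rather than on the real data; in this notebook it could be called as `summarize_predictions(y_test, y_pred_test)`.

```python
from sklearn.metrics import confusion_matrix

def summarize_predictions(y_true, y_pred):
    """Return counts of each outcome type, using 1 = PASS and 0 = FAIL."""
    # With labels=[0, 1], rows are true classes and columns are predicted classes
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        'true_fail': int(tn),    # failed, predicted FAIL
        'false_pass': int(fp),   # failed, predicted PASS
        'false_fail': int(fn),   # passed, predicted FAIL
        'true_pass': int(tp),    # passed, predicted PASS
    }

# Toy labels for illustration only (not results from the student dataset)
toy_true = [1, 1, 0, 0, 1]
toy_pred = [1, 0, 0, 1, 1]
summary = summarize_predictions(toy_true, toy_pred)
print(summary)
```

The key names in the returned dictionary are illustrative choices, not scikit-learn terminology.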



## Data Analysis
Next, let's plot the correlations between the features and the target to get a better understanding of the data.


In [None]:

# Correlation matrix
plt.figure(figsize=(10,6))
sns.heatmap(df[['G1', 'G2', 'studytime', 'failures', 'absences', 'pass_fail']].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()



## Model Prediction
Now we'll define a function that predicts whether a student will pass or fail from their feature values.


In [None]:

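# Function to predict pass/fail
def predict_pass_fail(g1, g2, studytime, failures, absences):
    # Build a one-row DataFrame with the training column names so that
    # scikit-learn does not warn about missing feature names
    input_data = pd.DataFrame([[g1, g2, studytime, failures, absences]], columns=features.columns)
    prediction = model.predict(input_data)
    return 'PASS' if prediction[0] == 1 else 'FAIL'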

# Example prediction
example = predict_pass_fail(15, 14, 3, 0, 5)
print(f"Prediction for the example student: {example}")
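
A hard PASS/FAIL label hides how confident the model is. The sketch below shows one possible variant that returns the estimated probability of passing via `predict_proba`. The helper `predict_pass_probability`, the constant `FEATURE_COLUMNS`, and the synthetic training data (with an arbitrary labeling rule) are all assumptions so that the example runs on its own; in this notebook the fitted `model` could be passed in instead of `demo_model`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURE_COLUMNS = ['G1', 'G2', 'studytime', 'failures', 'absences']

def predict_pass_probability(fitted_model, g1, g2, studytime, failures, absences):
    """Return the estimated probability of the PASS class (label 1)."""
    row = pd.DataFrame([[g1, g2, studytime, failures, absences]], columns=FEATURE_COLUMNS)
    # Locate the column of predict_proba that corresponds to label 1
    pass_index = list(fitted_model.classes_).index(1)
    return float(fitted_model.predict_proba(row)[0, pass_index])

# Synthetic stand-in data so the sketch runs on its own (NOT the student dataset)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'G1': rng.integers(0, 21, size=300),
    'G2': rng.integers(0, 21, size=300),
    'studytime': rng.integers(1, 5, size=300),
    'failures': rng.integers(0, 4, size=300),
    'absences': rng.integers(0, 31, size=300),
})
# Arbitrary labeling rule, chosen only for the demo
demo_labels = (demo['G2'] >= 10).astype(int)

demo_model = RandomForestClassifier(n_estimators=50, random_state=0)
demo_model.fit(demo[FEATURE_COLUMNS], demo_labels)

p_strong = predict_pass_probability(demo_model, 15, 14, 3, 0, 5)
p_weak = predict_pass_probability(demo_model, 5, 4, 1, 2, 20)
print(f"Strong student: {p_strong:.2f}, weak student: {p_weak:.2f}")
```

For a random forest, this probability is the average of the per-tree class probabilities, so it is best read as a rough confidence score rather than a calibrated likelihood.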
