## Creating and Viewing the Dataset

We start by creating a small dataset representing Nigerian IT students.  
Each student has features like study hours, attendance percentage, assignments submitted, and whether they are part of a study group.  
Our goal is to predict if the student will pass based on these features.

After installing `joblib`, which is used later to save the trained model, we build the dataset and look at the first few records.

In [1]:
%pip install joblib





In [20]:
import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: Create the dataset
data = {
    "study_hours_per_week": [4, 10, 15, 3, 20, 6, 8, 12, 5, 7, 16, 18, 9, 2, 14, 11, 17, 1, 13, 19],
    "attendance_percent": [55, 75, 90, 50, 98, 65, 70, 85, 60, 58, 95, 92, 68, 40, 88, 76, 91, 35, 83, 99],
    "assignments_submitted": [2, 4, 5, 1, 5, 3, 3, 4, 2, 2, 5, 5, 3, 1, 4, 4, 5, 1, 4, 5],
    "study_group": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
    "passed": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1]
}

df = pd.DataFrame(data)
df.head(10)  # show the first 10 of the 20 records

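|   | study_hours_per_week | attendance_percent | assignments_submitted | study_group | passed |
|---|---|---|---|---|---|
| 0 | 4 | 55 | 2 | 0 | 0 |
| 1 | 10 | 75 | 4 | 1 | 1 |
| 2 | 15 | 90 | 5 | 1 | 1 |
| 3 | 3 | 50 | 1 | 0 | 0 |
| 4 | 20 | 98 | 5 | 1 | 1 |
| 5 | 6 | 65 | 3 | 0 | 0 |
| 6 | 8 | 70 | 3 | 0 | 0 |
| 7 | 12 | 85 | 4 | 1 | 1 |
| 8 | 5 | 60 | 2 | 0 | 0 |
| 9 | 7 | 58 | 2 | 0 | 0 |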
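
Before modelling, it can help to check how the classes are distributed and whether any single feature already gives the answer away. The sketch below rebuilds the same dataset and runs two quick checks. The names `class_counts` and `group_matches_label` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same dataset as in the notebook cell above
data = {
    "study_hours_per_week": [4, 10, 15, 3, 20, 6, 8, 12, 5, 7, 16, 18, 9, 2, 14, 11, 17, 1, 13, 19],
    "attendance_percent": [55, 75, 90, 50, 98, 65, 70, 85, 60, 58, 95, 92, 68, 40, 88, 76, 91, 35, 83, 99],
    "assignments_submitted": [2, 4, 5, 1, 5, 3, 3, 4, 2, 2, 5, 5, 3, 1, 4, 4, 5, 1, 4, 5],
    "study_group": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
    "passed": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
}
df = pd.DataFrame(data)

# How many students passed (1) versus failed (0)
class_counts = df["passed"].value_counts()
print(class_counts)

# Does study-group membership coincide with the label in every row?
group_matches_label = bool((df["study_group"] == df["passed"]).all())
print("study_group identical to passed:", group_matches_label)
```

In this dataset 11 students passed and 9 failed, and `study_group` equals `passed` in every row, so that one column fully determines the label. Real data would rarely be this clean.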


## Preparing Features and Splitting Data

We separate the data into input features (X) and the target variable (y), which indicates if a student passed.  
Then, we split the dataset into training and testing sets.  
The model will learn from the training data and be evaluated on the test data.

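In [3]:
# Features are every column except the target
X = df.drop("passed", axis=1)
y = df["passed"]

# Hold out 20% of the rows (4 of 20) for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.head()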

|   | study_hours_per_week | attendance_percent | assignments_submitted | study_group |
|---|---|---|---|---|
| 8 | 5 | 60 | 2 | 0 |
| 5 | 6 | 65 | 3 | 0 |
| 11 | 18 | 92 | 5 | 1 |
| 3 | 3 | 50 | 1 | 0 |
| 18 | 13 | 83 | 4 | 1 |
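
With only 20 rows, a purely random split can leave the test set with very few examples of one class. One possible variation, not used in the original notebook, is to pass `stratify=y` so that both splits keep roughly the same pass/fail ratio. The sketch below rebuilds the dataset and prints the resulting shapes.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same dataset as in the notebook
data = {
    "study_hours_per_week": [4, 10, 15, 3, 20, 6, 8, 12, 5, 7, 16, 18, 9, 2, 14, 11, 17, 1, 13, 19],
    "attendance_percent": [55, 75, 90, 50, 98, 65, 70, 85, 60, 58, 95, 92, 68, 40, 88, 76, 91, 35, 83, 99],
    "assignments_submitted": [2, 4, 5, 1, 5, 3, 3, 4, 2, 2, 5, 5, 3, 1, 4, 4, 5, 1, 4, 5],
    "study_group": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
    "passed": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
}
df = pd.DataFrame(data)
X = df.drop("passed", axis=1)
y = df["passed"]

# stratify=y is an assumption of this sketch; the notebook splits without it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("train shape:", X_train.shape)  # 16 rows, 4 feature columns
print("test shape:", X_test.shape)    # 4 rows, 4 feature columns
print(y_test.value_counts())          # class mix in the held-out rows
```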


## Training the Logistic Regression Model

We train a logistic regression model using the training data.  
Logistic regression is a simple yet effective algorithm used to predict binary outcomes like pass or fail.

In [15]:
# Train the model on the training split
model = LogisticRegression()
model.fit(X_train, y_train)
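
To make the "binary outcome" idea concrete: a fitted logistic regression computes a weighted sum of the features plus an intercept, then passes it through the sigmoid function to get a probability of passing. The sketch below reproduces that calculation by hand and compares it with scikit-learn's `predict_proba`. It fits on all 20 rows and sets `max_iter=1000` to keep the example short and avoid convergence warnings; both choices are assumptions of this sketch, not the notebook's setup.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same dataset as in the notebook
data = {
    "study_hours_per_week": [4, 10, 15, 3, 20, 6, 8, 12, 5, 7, 16, 18, 9, 2, 14, 11, 17, 1, 13, 19],
    "attendance_percent": [55, 75, 90, 50, 98, 65, 70, 85, 60, 58, 95, 92, 68, 40, 88, 76, 91, 35, 83, 99],
    "assignments_submitted": [2, 4, 5, 1, 5, 3, 3, 4, 2, 2, 5, 5, 3, 1, 4, 4, 5, 1, 4, 5],
    "study_group": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
    "passed": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
}
df = pd.DataFrame(data)
X = df.drop("passed", axis=1)
y = df["passed"]

# Fit on all rows for brevity (the notebook fits on X_train only)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Linear score: one weight per feature plus an intercept
z = X.to_numpy() @ model.coef_[0] + model.intercept_[0]

# Sigmoid turns the score into a probability of class 1 (pass)
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is the probability of class 1
sklearn_proba = model.predict_proba(X)[:, 1]
print("manual matches predict_proba:", np.allclose(manual_proba, sklearn_proba))
```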


In [23]:
# Save the trained model to disk so it can be loaded later (for example, by the Streamlit app)
joblib.dump(model, "student_model.pkl")


['student_model.pkl']
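
A quick way to gain confidence in a saved model is to load it back and confirm it predicts the same values as the in-memory one. The sketch below does this round trip in a temporary directory so it does not touch the notebook's own `student_model.pkl`. The variable names `restored` and `same_predictions` are illustrative, and the model is fitted on all rows only to keep the example short.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same dataset as in the notebook
data = {
    "study_hours_per_week": [4, 10, 15, 3, 20, 6, 8, 12, 5, 7, 16, 18, 9, 2, 14, 11, 17, 1, 13, 19],
    "attendance_percent": [55, 75, 90, 50, 98, 65, 70, 85, 60, 58, 95, 92, 68, 40, 88, 76, 91, 35, 83, 99],
    "assignments_submitted": [2, 4, 5, 1, 5, 3, 3, 4, 2, 2, 5, 5, 3, 1, 4, 4, 5, 1, 4, 5],
    "study_group": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
    "passed": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
}
df = pd.DataFrame(data)
X = df.drop("passed", axis=1)
y = df["passed"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Save to a temporary folder, then load the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "student_model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should give exactly the same predictions
same_predictions = bool(np.array_equal(model.predict(X), restored.predict(X)))
print("round trip preserved predictions:", same_predictions)
```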

## Predicting and Evaluating the Model

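After training, we test the model by making predictions on unseen test data.  
We then calculate the accuracy, the fraction of correct predictions, to see how well the model performs.  
Because the test set holds only 4 students (20% of 20), each misclassified student changes the accuracy by 0.25.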

In [5]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Prediction Accuracy: {accuracy:.2f}")

Prediction Accuracy: 0.50
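
Accuracy is simply the number of correct predictions divided by the total number of predictions. The sketch below computes it by hand and with `accuracy_score`. The labels in `y_true` and `y_hat` are hypothetical values chosen for illustration; they are not the notebook's actual test results.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels for illustration only (not the notebook's test set)
y_true = np.array([1, 0, 1, 1])
y_hat = np.array([1, 0, 0, 1])

# Fraction of positions where the prediction equals the true label
manual_accuracy = (y_true == y_hat).mean()

# scikit-learn's helper computes the same quantity
library_accuracy = accuracy_score(y_true, y_hat)

print(f"manual:  {manual_accuracy:.2f}")   # 3 of 4 correct -> 0.75
print(f"sklearn: {library_accuracy:.2f}")  # 0.75
```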


## Making Predictions for a New Student

We use the trained model to predict whether a new student with specific features will pass.  
This simulates how the model can be used in real life to estimate student outcomes.

In [6]:
# Features for a new student, using the same column names as the training data
new_student = pd.DataFrame({
    "study_hours_per_week": [13],
    "attendance_percent": [84],
    "assignments_submitted": [4],
    "study_group": [1],
})

# Make prediction
prediction = model.predict(new_student)

# Display result
prediction[0]

1

The output is the predicted class: 1 means the student is predicted to pass, and 0 means the student is predicted to fail.
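
A class label alone hides how confident the model is. As a possible extension, `predict_proba` returns the estimated probability of each class for the same student. The sketch below repeats the notebook's split and training, with `max_iter=1000` added as an assumption to avoid convergence warnings, and prints both probabilities without assuming particular values.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same dataset as in the notebook
data = {
    "study_hours_per_week": [4, 10, 15, 3, 20, 6, 8, 12, 5, 7, 16, 18, 9, 2, 14, 11, 17, 1, 13, 19],
    "attendance_percent": [55, 75, 90, 50, 98, 65, 70, 85, 60, 58, 95, 92, 68, 40, 88, 76, 91, 35, 83, 99],
    "assignments_submitted": [2, 4, 5, 1, 5, 3, 3, 4, 2, 2, 5, 5, 3, 1, 4, 4, 5, 1, 4, 5],
    "study_group": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
    "passed": [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
}
df = pd.DataFrame(data)
X = df.drop("passed", axis=1)
y = df["passed"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The same new student as in the notebook
new_student = pd.DataFrame({
    "study_hours_per_week": [13],
    "attendance_percent": [84],
    "assignments_submitted": [4],
    "study_group": [1],
})

# One row per student, one column per class in model.classes_ order (0 = fail, 1 = pass)
proba = model.predict_proba(new_student)
print("P(fail):", round(float(proba[0, 0]), 3))
print("P(pass):", round(float(proba[0, 1]), 3))
```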

## Conclusion
In this mini project, we built a logistic regression model to predict whether students pass or fail based on their study habits and academic behavior.

What we did:

- Created a small synthetic dataset with features like study hours, attendance, assignment submissions, and study group participation.
- Trained a machine learning model using LogisticRegression.
- Evaluated the model's accuracy on held-out test data to understand how well it performs.
- Made a prediction for a new student.

This simple predictive model can be improved with more data, feature engineering, or different algorithms. Even with a small dataset, it shows how machine learning can support decision-making.


## Next Step: Build a Simple Streamlit App
Below is a basic Streamlit app that lets users enter student data and get a pass/fail prediction from the saved logistic regression model.

Note: This code is for demonstration only. Save it as app.py in the same folder as student_model.pkl and run it outside the notebook, from a terminal, with the command:
streamlit run app.py

In [None]:
import streamlit as st
import pandas as pd
import joblib

# Load the model saved earlier in the notebook
model = joblib.load("student_model.pkl")

st.title("Student Pass Prediction")

study_hours = st.number_input("Study Hours per Week", 0, 50)
attendance = st.slider("Attendance Percentage", 0, 100)
assignments = st.number_input("Assignments Submitted", 0, 10)
study_group = st.selectbox("In Study Group?", ["No", "Yes"])
study_group_value = 1 if study_group == "Yes" else 0

if st.button("Predict"):
    # Use the same column names and order as the training data
    features = pd.DataFrame({
        "study_hours_per_week": [study_hours],
        "attendance_percent": [attendance],
        "assignments_submitted": [assignments],
        "study_group": [study_group_value],
    })
    prediction = model.predict(features)[0]
    if prediction == 1:
        st.success("Prediction: Pass")
    else:
        st.error("Prediction: Fail")
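
The model was trained on a DataFrame, so it expects its inputs in the same column order as the training data. One way to keep that logic checkable without launching Streamlit is to move it into a small plain function. The helper `build_features` and the constant `FEATURE_COLUMNS` below are hypothetical names introduced for this sketch; they are not part of the original app.

```python
import pandas as pd

# Column names and order used when the model was trained
FEATURE_COLUMNS = [
    "study_hours_per_week",
    "attendance_percent",
    "assignments_submitted",
    "study_group",
]


def build_features(study_hours, attendance, assignments, in_study_group):
    """Return a one-row DataFrame in the column order the model was trained on."""
    row = [[study_hours, attendance, assignments, 1 if in_study_group else 0]]
    return pd.DataFrame(row, columns=FEATURE_COLUMNS)


# Example: the same new student used earlier in the notebook
features = build_features(13, 84, 4, True)
print(features)
```

Inside the app, the result of `build_features(...)` could be passed straight to `model.predict`.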