
**Orison Technologies**

Task-4: Build a Student Performance Prediction Model 🎓📊

🎯 Objective:
Develop a machine learning model to predict students' academic performance (e.g., final grades) based on various features such as attendance, study habits, and socio-economic factors.

**Step 1: Problem Understanding & Objective Definition**


Define the problem clearly. In this case, we want to predict student performance (final grades) based on various features like attendance, study habits, and socio-economic factors. This is a supervised regression problem because the target variable (final grade) is continuous.



**Step 2: Data Collection**

Gather a dataset that includes:

Target Variable: Final grade or a similar performance metric.

Feature Variables: Variables that could influence the final grade, such as:

Attendance: number of absences, attendance percentage, etc.

Study Habits: daily study hours, time spent on assignments, etc.

Socio-Economic Factors: family income, parents' education level, etc.

Personal Information: age, gender.
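Before any modelling, it can help to write the expected schema down explicitly. The sketch below checks that a loaded dataset contains one column per feature group listed above. It uses the column names that the synthetic dataset later in this notebook generates. The helper `missing_columns` is hypothetical and not part of any library.

```python
import pandas as pd

# Column names assumed to match the synthetic dataset generated in Step 4.
FEATURE_COLUMNS = [
    "attendance_percentage",      # Attendance
    "daily_study_hours",          # Study habits
    "assignment_hours_per_week",  # Study habits
    "family_income",              # Socio-economic factors
    "parent_education_years",     # Socio-economic factors
    "age",                        # Personal information
    "gender_numeric",             # Personal information (0 = female, 1 = male)
]
TARGET_COLUMN = "final_grade"


def missing_columns(df: pd.DataFrame) -> list[str]:
    """Return the expected feature/target columns that are absent from df."""
    expected = FEATURE_COLUMNS + [TARGET_COLUMN]
    return [col for col in expected if col not in df.columns]


# Usage: a frame that has every feature but no target column
sample = pd.DataFrame({col: [0.0] for col in FEATURE_COLUMNS})
print(missing_columns(sample))  # → ['final_grade']
```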

**Step 3: Import the Necessary Libraries**

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


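**Step 4: Data Preprocessing**

No real dataset is loaded here. The cell below generates a synthetic dataset of 2000 students with NumPy, saves it as a CSV file, and reloads it. Preprocessing then separates the features from the target and splits the data into training and test sets.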


In [5]:
import pandas as pd
import numpy as np

# Number of rows
num_rows = 2000

# Generate random data for each continuous feature
data = {
    "attendance_percentage": np.random.uniform(50, 100, num_rows),  # Attendance in percentage (50-100%)
    "daily_study_hours": np.random.uniform(0, 8, num_rows),         # Daily study hours (0-8 hours)
    "assignment_hours_per_week": np.random.uniform(0, 20, num_rows), # Assignment hours per week (0-20 hours)
    "family_income": np.random.uniform(20000, 100000, num_rows),    # Family income in USD (20,000 - 100,000)
    "parent_education_years": np.random.uniform(8, 20, num_rows),   # Years of education (8-20 years)
    "age": np.random.uniform(14, 25, num_rows),                     # Age of students (14-25 years)
    "gender_numeric": np.random.choice([0, 1], num_rows),           # Gender as numeric (0 for female, 1 for male)
    "final_grade": np.random.uniform(0, 100, num_rows)              # Final grade as a percentage (0-100); drawn independently of the features above
}

# Create a DataFrame
df = pd.DataFrame(data)

# Save to CSV
df.to_csv("student_performance_data.csv", index=False)

print("CSV file 'student_performance_data.csv' created successfully with 2000 rows.")



CSV file 'student_performance_data.csv' created successfully with 2000 rows.
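The cell above draws `final_grade` independently of every feature and sets no random seed. As a result, the data changes on every run and the models have no real signal to learn. The following sketch is one possible alternative under stated assumptions. It uses a seeded `np.random.default_rng` so that the data is reproducible. It also builds `final_grade` from attendance and study time plus noise. The function name `make_student_data` and the weights (0.4, 4.0, 0.5) and noise level are purely illustrative assumptions, not taken from any real study.

```python
import numpy as np
import pandas as pd


def make_student_data(num_rows: int = 2000, seed: int = 42) -> pd.DataFrame:
    """Hypothetical generator: same columns as the notebook, but a feature-dependent target."""
    rng = np.random.default_rng(seed)  # seeded generator -> identical data on every run
    df = pd.DataFrame({
        "attendance_percentage": rng.uniform(50, 100, num_rows),
        "daily_study_hours": rng.uniform(0, 8, num_rows),
        "assignment_hours_per_week": rng.uniform(0, 20, num_rows),
        "family_income": rng.uniform(20000, 100000, num_rows),
        "parent_education_years": rng.uniform(8, 20, num_rows),
        "age": rng.uniform(14, 25, num_rows),
        "gender_numeric": rng.integers(0, 2, num_rows),  # 0 or 1 (upper bound is exclusive)
    })
    # ASSUMPTION: illustrative weights only; the grade rises with attendance and study time
    signal = (
        0.4 * df["attendance_percentage"]
        + 4.0 * df["daily_study_hours"]
        + 0.5 * df["assignment_hours_per_week"]
    )
    noise = rng.normal(0, 5, num_rows)
    df["final_grade"] = np.clip(signal + noise, 0, 100)  # keep grades in the 0-100 range
    return df


students = make_student_data()
print(students.shape)  # → (2000, 8)
print(students.corr()["final_grade"].round(2))
```

With a target built this way, the regression models in Step 5 would have a real relationship to recover.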


In [7]:
df = pd.read_csv("student_performance_data.csv")
print(df.head())


   attendance_percentage  daily_study_hours  assignment_hours_per_week  \
0              91.297691           5.891451                  16.793444   
1              98.047257           5.911881                  14.377697   
2              59.272797           1.892092                  16.874802   
3              63.757378           3.058175                  16.360923   
4              90.986835           4.954716                   5.668192   

   family_income  parent_education_years        age  gender_numeric  \
0   88011.959946               13.253112  21.140897               0   
1   52861.003202               14.573945  17.827931               1   
2   54378.791502               11.421577  14.888880               1   
3   25409.534977               19.597205  17.062375               0   
4   22019.079741               18.992281  23.949462               0   

   final_grade  
0    77.711779  
1     8.763874  
2    20.224343  
3    97.586957  
4    93.784523  
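`df.head()` only shows five rows. A few cheap sanity checks run before modelling can reveal whether the features carry any information about the target at all. The sketch below regenerates a seeded stand-in for the notebook's data. It draws the same independent uniform values as the notebook, but uses `np.random.default_rng(0)` as an assumption for repeatability. It then checks for missing values and computes the Pearson correlation of each feature with `final_grade`. Because the target is drawn independently, every correlation should be close to zero.

```python
import numpy as np
import pandas as pd

# Seeded stand-in for the notebook's data: independent uniform draws, same column names
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "attendance_percentage": rng.uniform(50, 100, n),
    "daily_study_hours": rng.uniform(0, 8, n),
    "assignment_hours_per_week": rng.uniform(0, 20, n),
    "family_income": rng.uniform(20000, 100000, n),
    "parent_education_years": rng.uniform(8, 20, n),
    "age": rng.uniform(14, 25, n),
    "gender_numeric": rng.integers(0, 2, n),
    "final_grade": rng.uniform(0, 100, n),
})

# 1) Missing values across the whole frame
print(df.isna().sum().sum())  # → 0

# 2) Value ranges per column
print(df.describe().loc[["min", "max"]].round(1))

# 3) Pearson correlation of each feature with the target
correlations = df.drop(columns="final_grade").corrwith(df["final_grade"])
print(correlations.round(3))
```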


In [8]:

# Define features and target variable
X = df.drop("final_grade", axis=1)  # Features (everything except 'final_grade')
y = df["final_grade"]               # Target (final grade)

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
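As a quick check on the split above, `test_size=0.2` on 2000 rows should leave 1600 rows for training and 400 for testing. Fixing `random_state` should also reproduce exactly the same split. The sketch uses placeholder values, since only the shapes and row indices matter here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

columns = [
    "attendance_percentage", "daily_study_hours", "assignment_hours_per_week",
    "family_income", "parent_education_years", "age", "gender_numeric", "final_grade",
]
# Placeholder values: only the shape of the data matters for this check
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 1, size=(2000, len(columns))), columns=columns)

X = df.drop("final_grade", axis=1)
y = df["final_grade"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # → (1600, 7) (400, 7)

# The same random_state selects exactly the same training rows
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.index.equals(X_train_again.index))  # → True
```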


**Step 5: Initialize and Train the Models**

Let's try both a Linear Regression model and a Random Forest Regressor to predict the final grade. Each of the next two cells initializes one model and trains it on the training data.

In [9]:
# Initialize and train Linear Regression model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
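A fitted `LinearRegression` exposes one weight per feature in `coef_` and a constant term in `intercept_`. In this notebook, `final_grade` is generated independently of the features, so the fitted `lr_model.coef_` values would be expected to sit close to zero. The sketch below shows how to read the coefficients on a small hypothetical dataset. Its target is built from known weights (5.0 per study hour, 0.3 per attendance point) as an assumption, so the recovered values can be checked.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = pd.DataFrame({
    "daily_study_hours": rng.uniform(0, 8, 500),
    "attendance_percentage": rng.uniform(50, 100, 500),
})
# ASSUMPTION: target built from known weights plus small noise, so the fit can be verified
y = 5.0 * X["daily_study_hours"] + 0.3 * X["attendance_percentage"] + rng.normal(0, 1, 500)

lr = LinearRegression().fit(X, y)

# Pair each learned weight with its feature name
coefs = pd.Series(lr.coef_, index=X.columns)
print(coefs.round(2))
print(round(lr.intercept_, 2))
```

In the notebook, the equivalent call would be `pd.Series(lr_model.coef_, index=X_train.columns)`.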


In [10]:
# Initialize and train Random Forest model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
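A trained `RandomForestRegressor` also reports impurity-based `feature_importances_`, which sum to 1 across features. These importances can be biased toward features with many distinct values, so they are best treated as a rough guide. The sketch uses a hypothetical dataset in which, by assumption, only `daily_study_hours` drives the grade and `age` is pure noise. The study-hours feature should then dominate.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 1000
X = pd.DataFrame({
    "daily_study_hours": rng.uniform(0, 8, n),
    "age": rng.uniform(14, 25, n),
})
# ASSUMPTION: the grade depends on study hours only; age carries no signal
y = 10.0 * X["daily_study_hours"] + rng.normal(0, 2, n)

rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# Importances sorted from most to least influential
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.round(3))
```

For the notebook's model, `pd.Series(rf_model.feature_importances_, index=X_train.columns)` gives the same view. On its independently generated target, no feature should stand out strongly.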


**Step 6: Make Predictions on Test Data**

Now, we’ll use both models to make predictions on the test set.

In [11]:
# Make predictions
lr_predictions = lr_model.predict(X_test)
rf_predictions = rf_model.predict(X_test)
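Putting the actual grades next to both models' predictions makes the errors visible row by row. The sketch below uses a handful of hypothetical values in place of `y_test`, `lr_predictions` and `rf_predictions`. The values are made up for illustration and are not outputs of this notebook. A residual is computed as actual minus predicted.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for y_test, lr_predictions and rf_predictions
y_test = pd.Series([70.0, 45.0, 90.0, 20.0])
lr_predictions = np.array([52.0, 49.0, 51.0, 50.0])
rf_predictions = np.array([60.0, 40.0, 75.0, 35.0])

comparison = pd.DataFrame({
    "actual": y_test.to_numpy(),
    "linear_regression": lr_predictions,
    "random_forest": rf_predictions,
})
# Residual = actual - predicted (positive means the model under-predicted)
comparison["lr_residual"] = comparison["actual"] - comparison["linear_regression"]
comparison["rf_residual"] = comparison["actual"] - comparison["random_forest"]
print(comparison.round(2))
```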


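**Step 7: Evaluate Model Performance**

Evaluate the models' performance using Mean Absolute Error (MAE), Mean Squared Error (MSE), and the R^2 score. Lower MAE and MSE are better. An R^2 of 1 means perfect predictions; 0 means the model does no better than always predicting the mean grade; a negative value means it does worse than that.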
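For true grades y_i and predictions ŷ_i, the three metrics are defined as follows:

- MAE = mean(|y_i − ŷ_i|)
- MSE = mean((y_i − ŷ_i)^2)
- R^2 = 1 − SS_res / SS_tot, where SS_res = Σ(y_i − ŷ_i)^2 and SS_tot = Σ(y_i − mean(y))^2

The sketch below computes them by hand with NumPy so they can be compared against scikit-learn. The helper `regression_metrics` and the four sample values are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Hand-rolled MAE, MSE and R^2, matching the definitions above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                 # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2


y_true = [70.0, 45.0, 90.0, 20.0]
y_pred = [60.0, 40.0, 75.0, 35.0]
mae, mse, r2 = regression_metrics(y_true, y_pred)
print(mae, mse)  # → 11.25 143.75

# Always predicting the mean of y_true gives R^2 = 0, the reference point for "no skill"
_, _, r2_mean = regression_metrics(y_true, [np.mean(y_true)] * len(y_true))
print(r2_mean)  # → 0.0
```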

**Evaluation for Linear Regression**

In [12]:
print("Linear Regression Model Performance:")
print("MAE:", mean_absolute_error(y_test, lr_predictions))
print("MSE:", mean_squared_error(y_test, lr_predictions))
print("R^2 Score:", r2_score(y_test, lr_predictions))


Linear Regression Model Performance:
MAE: 25.28742106956771
MSE: 849.967626308891
R^2 Score: -0.022237662257066804


**Evaluation for Random Forest Regressor**

In [13]:
print("\nRandom Forest Model Performance:")
print("MAE:", mean_absolute_error(y_test, rf_predictions))
print("MSE:", mean_squared_error(y_test, rf_predictions))
print("R^2 Score:", r2_score(y_test, rf_predictions))



Random Forest Model Performance:
MAE: 25.889852650591674
MSE: 900.4647010431785
R^2 Score: -0.08296940077205694
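Both models score a negative R^2, which means they predict unseen grades slightly worse than a constant guess would. This is the expected outcome here, because Step 4 draws `final_grade` independently of every feature, so there is no relationship for either model to learn. A constant baseline such as scikit-learn's `DummyRegressor(strategy="mean")` makes this explicit. The sketch below reproduces the situation on a seeded, smaller stand-in dataset; the three columns and the seed are assumptions. It compares the baseline with Linear Regression on the same split.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "attendance_percentage": rng.uniform(50, 100, n),
    "daily_study_hours": rng.uniform(0, 8, n),
    "age": rng.uniform(14, 25, n),
})
# As in the notebook, the target is drawn independently of the features
y = pd.Series(rng.uniform(0, 100, n), name="final_grade")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: always predicts the mean grade of the training set
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

baseline_r2 = r2_score(y_test, baseline.predict(X_test))
linear_r2 = r2_score(y_test, linear.predict(X_test))
print(f"Baseline (training mean) R^2: {baseline_r2:.3f}")
print(f"Linear Regression R^2: {linear_r2:.3f}")
```

Both scores should land near zero. A model is only worth keeping if it clearly beats the baseline on held-out data.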


**Example: Predicting the Final Grade for a New Student**

In [16]:
import pandas as pd

# Example feature values for a new student
new_student_data = {
    "attendance_percentage": 85.0,       # Example: 85% attendance
    "daily_study_hours": 3.5,            # Example: 3.5 hours of study per day
    "assignment_hours_per_week": 10.0,   # Example: 10 hours per week spent on assignments
    "family_income": 45000.0,            # Example: $45,000 family income
    "parent_education_years": 16.0,      # Example: 16 years of education
    "age": 17.0,                         # Example: 17 years old
    "gender_numeric": 1                  # Example: Male (1 for male, 0 for female)
}

# Convert new student data to a DataFrame format (to match model input)
new_student_df = pd.DataFrame([new_student_data])

# Predict with both trained models
rf_prediction = rf_model.predict(new_student_df)[0]
lr_prediction = lr_model.predict(new_student_df)[0]

# Output the predicted final grades, labelled by model
print(f"Predicted final grade (Random Forest): {rf_prediction:.2f}")
print(f"Predicted final grade (Linear Regression): {lr_prediction:.2f}")


Predicted final grade (Random Forest): 60.62
Predicted final grade (Linear Regression): 49.50
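When several new students are scored at once, or the input comes from a source with its own column order, the input should have exactly the training columns in the training order. Recent scikit-learn versions compare feature names against those seen during `fit`, and may warn or raise on a mismatch depending on the version. The sketch below defines a hypothetical helper `to_model_input` that rejects unexpected or missing keys and reorders the columns. The second student record is invented for illustration.

```python
import pandas as pd

# Column order used at training time (X_train.columns in the notebook)
training_columns = [
    "attendance_percentage", "daily_study_hours", "assignment_hours_per_week",
    "family_income", "parent_education_years", "age", "gender_numeric",
]


def to_model_input(records: list[dict], columns: list[str]) -> pd.DataFrame:
    """Build a DataFrame with exactly the training columns, in training order."""
    frame = pd.DataFrame(records)
    unknown = set(frame.columns) - set(columns)
    if unknown:
        raise ValueError(f"Unexpected columns: {sorted(unknown)}")
    missing = [c for c in columns if c not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return frame[columns]  # select and reorder to match training


students = [
    # Same student as above, keys deliberately in a different order
    {"age": 17.0, "gender_numeric": 1, "attendance_percentage": 85.0, "daily_study_hours": 3.5,
     "assignment_hours_per_week": 10.0, "family_income": 45000.0, "parent_education_years": 16.0},
    # Hypothetical second student
    {"attendance_percentage": 95.0, "daily_study_hours": 5.0, "assignment_hours_per_week": 12.0,
     "family_income": 60000.0, "parent_education_years": 14.0, "age": 18.0, "gender_numeric": 0},
]
batch = to_model_input(students, training_columns)
print(list(batch.columns) == training_columns)  # → True
# In the notebook, rf_model.predict(batch) would return one predicted grade per row.
```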
