# **Student Performance Prediction**


### **Abstract:**

In the realm of education, understanding and predicting student performance is paramount for educators and policymakers alike. By leveraging various factors such as socio-economic background, prior academic achievements, and personal attributes, predictive models can offer insights into student outcomes. This paper aims to explore the efficacy of machine learning algorithms in forecasting student performance and to identify key predictors that influence academic success. We utilize a dataset comprising demographic information, academic records, and socio-economic indicators to train and evaluate predictive models. Our findings indicate that certain features such as parental education, attendance rates, and study habits significantly impact student performance. The predictive models developed in this study exhibit promising accuracy and can serve as valuable tools for educational institutions in identifying at-risk students and implementing targeted interventions.


### **Introduction:**

In today's educational landscape, the ability to anticipate student performance has profound implications for educational institutions, policymakers, and stakeholders. Understanding the factors that contribute to academic success or failure enables educators to implement proactive measures aimed at enhancing student outcomes and narrowing achievement gaps. While traditional methods of assessment and evaluation provide valuable insights, the advent of machine learning techniques offers new avenues for predictive analytics in education.

Predictive modeling of student performance involves analyzing many factors beyond academic aptitude alone. Socio-economic background, familial support structures, learning environments, and personal attributes all shape student achievement. Machine learning algorithms let researchers find patterns in large datasets and build models that forecast student performance from those patterns.

This study endeavors to explore the predictive capabilities of machine learning algorithms in forecasting student performance. By leveraging a comprehensive dataset encompassing demographic information, academic records, and socio-economic indicators, we seek to identify key predictors that influence academic outcomes. Through rigorous analysis and model evaluation, we aim to elucidate the complex interplay between various factors and student success.

The remainder of this paper is organized as follows: Section 2 provides an overview of related work in the field of predictive modeling in education. Section 3 outlines the methodology employed in this study, including data preprocessing, feature selection, and model training. Section 4 presents the results of our analysis, including model performance metrics and key findings. Finally, Section 5 discusses the implications of our findings and avenues for future research in the realm of student performance prediction.

# **Data Preprocessing**

**Step 1:** Import the necessary libraries

**Step 2:** Read the data from Kaggle and print the first 5 and last 5 rows.



Dataset: https://www.kaggle.com/datasets/spscientist/students-performance-in-exams

**Step 3:** Print the rows and columns of the dataset.

**Step 4:** Find null values.

**Step 5:** What is the type of each column of data?

**Step 6:** Count the number of male and female students.

**Step 7:** Count the number of options that come under 'parental level of education'.

**Step 8:** What other columns have categorical values?

**Step 9:** Create a column called 'average' to find the mean of reading, writing and math score.
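The preprocessing steps above can be sketched as follows. This is a minimal sketch, not the original notebook's code: the file name `StudentsPerformance.csv`, the `gender` and `lunch` column names, and the `make_sample` helper are assumptions. The helper builds a synthetic stand-in so the cell still runs when the Kaggle file has not been downloaded.

```python
from pathlib import Path

import numpy as np
import pandas as pd

# Assumed local file name for the Kaggle download; adjust to your own path.
CSV_PATH = Path("StudentsPerformance.csv")


def make_sample(n=100, seed=0):
    """Hypothetical helper: synthetic rows in the column layout assumed for the Kaggle file."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "gender": rng.choice(["female", "male"], n),
        "race/ethnicity": rng.choice(["group A", "group B", "group C"], n),
        "parental level of education": rng.choice(
            ["high school", "some college", "bachelor's degree"], n),
        "lunch": rng.choice(["standard", "free/reduced"], n),
        "test preparation course": rng.choice(["none", "completed"], n),
        "math score": rng.integers(30, 101, n),
        "reading score": rng.integers(30, 101, n),
        "writing score": rng.integers(30, 101, n),
    })


# Step 2: read the data (synthetic fallback if the file is absent), then peek at both ends.
df = pd.read_csv(CSV_PATH) if CSV_PATH.exists() else make_sample()
print(df.head())
print(df.tail())

# Step 3: number of rows and columns.
print(df.shape)

# Step 4: null values per column.
print(df.isnull().sum())

# Step 5: data type of each column.
print(df.dtypes)

# Steps 6 and 7: counts per category.
print(df["gender"].value_counts())
print(df["parental level of education"].value_counts())

# Step 8: every column that holds non-numeric (categorical) values.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print(categorical_cols)

# Step 9: row-wise mean of the three scores.
score_cols = ["math score", "reading score", "writing score"]
df["average"] = df[score_cols].mean(axis=1)
```

Selecting categorical columns by dtype avoids hard-coding their names, which helps if the file's layout differs from the one assumed here.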

# **Exploratory Data Analysis**

**Step 1:** Plot a histogram of the math score, reading score and writing score as 3 subplots.

**Step 2:** Use seaborn's 'pairplot' function on all the numerical data.

**Step 3:** Plot a bar plot taking 'race/ethnicity' as x-axis and 'average' as the y-axis

**Step 4:** Plot a bar chart to show parental education vs. average.

**Step 5:** What is the parental education with the lowest average?

**Step 6:** For each of the 3 subjects (math, reading, writing), create a new column marking the students who passed (P) and failed (F).

**Step 7:** Print the new dataframe

**Step 8:** Print the number of passed and failed students in each subject
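A minimal sketch of Steps 6 to 8 follows. The pass mark of 40 and the `... result` column names are assumptions chosen for illustration; the source does not state either. The five rows are made-up scores.

```python
import numpy as np
import pandas as pd

PASS_MARK = 40  # assumed threshold; the source does not give one
SUBJECTS = ["math", "reading", "writing"]

# Made-up scores for illustration.
df = pd.DataFrame({
    "math score": [72, 35, 90, 47, 38],
    "reading score": [72, 41, 95, 57, 39],
    "writing score": [74, 44, 93, 30, 40],
})

# Step 6: one P/F column per subject (hypothetical column names).
for subject in SUBJECTS:
    df[f"{subject} result"] = np.where(
        df[f"{subject} score"] >= PASS_MARK, "P", "F")

# Step 7: print the new dataframe.
print(df)

# Step 8: passed and failed counts in each subject.
result_counts = {
    subject: df[f"{subject} result"].value_counts().to_dict()
    for subject in SUBJECTS
}
print(result_counts)
# → {'math': {'P': 3, 'F': 2}, 'reading': {'P': 4, 'F': 1}, 'writing': {'P': 4, 'F': 1}}
```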

**Step 9:** Plot double bar graphs of parental level of education against math, reading and writing score. Construct 3 graphs as subplots.

**Step 10:** Plot double bar graphs of test preparation course against math, reading and writing score. Construct 3 graphs as subplots.
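One possible reading of "double bar graph" in Steps 9 and 10 is a pair of bars (passed, failed) for each category. The sketch below follows that reading, which is an assumption, along with the pass mark of 40, the `... result` column names, the `plot_pass_fail_by` helper, and the synthetic data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

PASS_MARK = 40  # assumed threshold
SUBJECTS = ["math", "reading", "writing"]

# Synthetic stand-in for the Kaggle data (assumption).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "parental level of education": rng.choice(
        ["high school", "some college", "bachelor's degree"], n),
    "test preparation course": rng.choice(["none", "completed"], n),
})
for subject in SUBJECTS:
    df[f"{subject} score"] = rng.integers(20, 101, n)
    df[f"{subject} result"] = np.where(
        df[f"{subject} score"] >= PASS_MARK, "P", "F")


def plot_pass_fail_by(data, group_col):
    """Hypothetical helper: 3 subplots, each with P and F bars per category."""
    fig, axes = plt.subplots(1, 3, figsize=(18, 4), sharey=True)
    for ax, subject in zip(axes, SUBJECTS):
        sns.countplot(data=data, x=group_col, hue=f"{subject} result",
                      hue_order=["P", "F"], ax=ax)
        ax.set_title(f"{subject} by {group_col}")
        ax.tick_params(axis="x", rotation=30)
    fig.tight_layout()
    return fig


# Step 9: parental level of education vs. pass/fail in each subject.
fig_parent = plot_pass_fail_by(df, "parental level of education")

# Step 10: test preparation course vs. pass/fail in each subject.
fig_prep = plot_pass_fail_by(df, "test preparation course")
```

If the intended chart compares mean scores instead of pass/fail counts, `sns.barplot` with a `hue` argument is the closer fit.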



# **Building the Model**

Let's predict the writing score for a particular student.

**Step 1:** Define the X and y

**Step 2:** Identify the various categorical columns present in the dataset.

**Step 3:** Convert the categorical variables into numerical variables by one hot encoding.

**Step 4:** Split the data into training and test sets.

**Step 5:** Implement various models to see which ones have the highest accuracy.
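The five model-building steps can be sketched as below. Because the writing score is a continuous target, this sketch uses regressors and reads "accuracy" as the R² score on held-out data; that reading, the choice of three scikit-learn models, the 80/20 split, and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in (assumption): writing score tracks reading score plus noise.
rng = np.random.default_rng(0)
n = 300
reading = rng.integers(30, 101, n)
df = pd.DataFrame({
    "gender": rng.choice(["female", "male"], n),
    "test preparation course": rng.choice(["none", "completed"], n),
    "math score": np.clip(reading + rng.integers(-15, 16, n), 0, 100),
    "reading score": reading,
    "writing score": np.clip(reading + rng.integers(-5, 6, n), 0, 100),
})

# Step 1: features and target.
X = df.drop(columns="writing score")
y = df["writing score"]

# Step 2: categorical (non-numeric) feature columns.
categorical_cols = X.select_dtypes(exclude="number").columns.tolist()

# Step 3: one-hot encode them as 0/1 integer columns.
X = pd.get_dummies(X, columns=categorical_cols, dtype=int)

# Step 4: hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Step 5: fit several models and compare their R^2 on the test set.
models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=42),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 for regressors

best_name = max(scores, key=scores.get)
print(scores)
print("best:", best_name)
```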

# **Model Evaluation:**

**Step 1:** Calculate the best model's accuracy.

**Step 2:** Make predictions

**Step 3:** Construct a confusion matrix for the predictions
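A hedged sketch of the evaluation steps follows. Accuracy and a confusion matrix are defined for class labels, while the writing score is continuous, so this sketch reports R² and mean absolute error on the raw predictions and then bins actual and predicted scores into pass (P) and fail (F) before computing accuracy and the confusion matrix. The pass mark of 40, the use of linear regression as the "best" model, and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, r2_score)
from sklearn.model_selection import train_test_split

PASS_MARK = 40  # assumed threshold

# Synthetic stand-in (assumption): writing score tracks reading score plus noise.
rng = np.random.default_rng(0)
n = 300
reading = rng.integers(20, 101, n)
df = pd.DataFrame({
    "test preparation course": rng.choice(["none", "completed"], n),
    "reading score": reading,
    "writing score": np.clip(reading + rng.integers(-5, 6, n), 0, 100),
})
X = pd.get_dummies(df.drop(columns="writing score"),
                   columns=["test preparation course"], dtype=int)
y = df["writing score"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Stand-in for the best model from the previous section.
model = LinearRegression().fit(X_train, y_train)

# Step 2: predictions on the held-out rows.
y_pred = model.predict(X_test)

# Step 1: regression quality on the raw scores.
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R^2: {r2:.3f}  MAE: {mae:.2f}")

# Bin scores into P/F so that classification metrics apply.
actual_label = np.where(y_test >= PASS_MARK, "P", "F")
pred_label = np.where(y_pred >= PASS_MARK, "P", "F")
acc = accuracy_score(actual_label, pred_label)
print(f"pass/fail accuracy: {acc:.3f}")

# Step 3: rows are actual labels, columns are predicted labels, in the order P, F.
cm = confusion_matrix(actual_label, pred_label, labels=["P", "F"])
print(cm)
```

Passing `labels=["P", "F"]` fixes the row and column order, so the matrix keeps its 2x2 shape even if one class is missing from the test split.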