# Description
A program that predicts a student's math performance from the following demographic information:
- gender
- race/ethnicity
- parental level of education
- lunch
- test preparation course

The prediction is whether or not a student has the potential to get a math score **above 69** (on a scale of 0 to 100).

# Libraries Used
- Pandas
- Scikit Learn

# Prediction Method
- Logistic Regression (binary classification)
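The model trained below is scikit-learn's `LogisticRegression`, a classifier. For a binary target it estimates the probability of class 1 as `p = 1 / (1 + exp(-(w·x + b)))`, where `w` are the fitted weights (`coef_`) and `b` the bias (`intercept_`). A minimal sketch, assuming a small synthetic dataset rather than the exams data, that checks this formula against `predict_proba`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data (hypothetical; not the exams dataset)
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 5)).astype(float)
y = (X[:, 0] + X[:, 3] + rng.normal(0, 1, size=200) > 4).astype(int)

model = LogisticRegression()
model.fit(X, y)

# Linear score w·x + b for each row, using the fitted weights and bias
linear_score = X @ model.coef_[0] + model.intercept_[0]

# The sigmoid maps the linear score to P(y = 1 | x)
manual_proba = 1.0 / (1.0 + np.exp(-linear_score))

# Column 1 of predict_proba holds the probability of class 1
sklearn_proba = model.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```

Only the final 0/1 label depends on a threshold; the probabilities themselves come straight from this formula.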

In [None]:
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# List the data files available in the read-only Kaggle input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, confusion_matrix, classification_report, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split

In [None]:
students_demographics = pd.read_csv('/kaggle/input/student-performance-in-mathematics/exams.csv')
students_demographics

# Encoding categorical data
The table above mixes text categories (such as `female` or `group B`) with numeric scores. The model needs numeric input, so each category in a column is assigned an integer code, producing a table that contains only numbers.

For example, the `gender` values `male` and `female` are replaced with `0` and `1`:


| gender | number representation  |
|--------|------------------------|
| male   | 0                      |
| female | 1                      |


In [None]:
race_ethnicity_dict = {
    "group A": 0,
    "group B": 1,
    "group C": 2,
    "group D": 3,
    "group E": 4
}

parental_education_dict = {
    "some college": 0,
    "associate's degree": 1,
    "some high school": 2,
    "bachelor's degree": 3,
    "master's degree": 4,
    "high school": 5
}

gender_dict = {
    "male": 0,
    "female": 1
}

lunch_dict = {
    "free/reduced": 0,
    "standard": 1
}

preparation_course = {
    "none": 0,
    "completed": 1
}
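The integer codes above impose an order that has no meaning for nominal categories: for example, `"high school"` (5) ranks above `"master's degree"` (4), and a linear model will treat that difference as a magnitude. One common alternative, not used in this notebook, is one-hot encoding, which gives each category its own 0/1 column. A minimal sketch with `pandas.get_dummies` on a made-up two-row frame:

```python
import pandas as pd

# Hypothetical sample rows using the dataset's raw category strings
sample = pd.DataFrame({
    "parental level of education": ["high school", "master's degree"],
    "lunch": ["standard", "free/reduced"],
})

# One 0/1 column per category; dtype=int yields 0/1 instead of booleans
one_hot = pd.get_dummies(sample, dtype=int)
print(one_hot)
```

With this encoding, a trained model would expect the generated dummy columns as input instead of the single integer columns.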

In [None]:
def transform_demographics(data):
    # Called once per row: replace each category string with its integer code
    data['gender'] = gender_dict[data['gender']]
    data['race/ethnicity'] = race_ethnicity_dict[data['race/ethnicity']]
    data['parental level of education'] = parental_education_dict[data['parental level of education']]
    data['lunch'] = lunch_dict[data['lunch']]
    data['test preparation course'] = preparation_course[data['test preparation course']]
    # Binarize the scores: 1 if above 69, otherwise 0
    data['math score'] = 1 if data['math score'] > 69 else 0
    data['reading score'] = 1 if data['reading score'] > 69 else 0
    data['writing score'] = 1 if data['writing score'] > 69 else 0

    return data
    
transformed_students_demographics = students_demographics.transform(transform_demographics, axis=1)
transformed_students_demographics
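`DataFrame.transform(..., axis=1)` calls `transform_demographics` once per row. That works, but it is slow on large tables and the result may keep `object` dtype columns, which is presumably why the target is cast with `.astype('int')` below. A column-wise sketch of the same encoding, using `Series.map` and a vectorized comparison, on a hypothetical two-row frame with only some of the columns:

```python
import pandas as pd

# Subset of the notebook's encoding dictionaries (copied so the sketch runs alone)
gender_dict = {"male": 0, "female": 1}
lunch_dict = {"free/reduced": 0, "standard": 1}

# Hypothetical raw rows in the same format as exams.csv
raw = pd.DataFrame({
    "gender": ["female", "male"],
    "lunch": ["standard", "free/reduced"],
    "math score": [70, 69],
})

encoded = raw.copy()
# Series.map replaces every value in the column at once using the dict
encoded["gender"] = encoded["gender"].map(gender_dict)
encoded["lunch"] = encoded["lunch"].map(lunch_dict)
# Vectorized threshold: 1 if the score is above 69, else 0 (so 69 maps to 0)
encoded["math score"] = (encoded["math score"] > 69).astype(int)
print(encoded)
print(encoded.dtypes)
```

One behavioral difference: `map` turns an unknown category into `NaN`, whereas the dictionary lookup in `transform_demographics` raises a `KeyError`.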

In [None]:
y_target_math_score = transformed_students_demographics['math score'].astype('int')

In [None]:
x_data = transformed_students_demographics.drop(columns=['math score', 'reading score', 'writing score'])
x_data

# Splitting and Model Training

In [None]:
X_train, X_test, y_train, y_test = train_test_split(x_data, y_target_math_score, test_size=0.3)
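`train_test_split` shuffles randomly, so the split, and every metric computed from it, changes from run to run unless `random_state` is fixed. Passing `stratify=y` also keeps the share of positive labels the same in both parts. The call above passes neither argument. A sketch on synthetic labels (the 1000-row size and 30% positive rate are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and binary target (hypothetical data)
X = np.arange(2000).reshape(1000, 2)
y = np.array([1] * 300 + [0] * 700)

# test_size=0.3 as in the notebook; random_state makes the shuffle reproducible,
# and stratify=y preserves the 30% positive rate in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)  # (700, 2) (300, 2)
print(y_train.mean(), y_test.mean())
```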

In [None]:
lr = LogisticRegression()
lr.fit(X_train, y_train)

In [None]:
y_result = lr.predict(X_test)
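`predict` returns hard 0/1 labels. For a binary `LogisticRegression` these match thresholding the class-1 column of `predict_proba` at 0.5. Keeping the probabilities also allows other thresholds, or `roc_auc_score`, which the first cell imports but never uses. A sketch on synthetic data (an assumption; it scores the training rows only for brevity):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical synthetic data standing in for the notebook's features and target
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

model = LogisticRegression().fit(X, y)

labels = model.predict(X)
# Probability that each row belongs to class 1
proba_pos = model.predict_proba(X)[:, 1]

# Thresholding at 0.5 reproduces predict() for a binary model
print(np.array_equal(labels, (proba_pos > 0.5).astype(int)))

# AUC is computed from the probabilities, not from the hard labels
print(roc_auc_score(y, proba_pos))
```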

# Result Metrics


In [None]:
print(classification_report(y_test, y_result))

In [None]:
confusion_matrix(y_test, y_result)
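scikit-learn's `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, so for 0/1 labels it reads `[[TN, FP], [FN, TP]]`. The class-1 precision and recall shown in the classification report follow directly from these counts. A small hand-checkable example with made-up labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[1 1]
           #  [1 2]]

# ravel() flattens the 2x2 matrix in row order: TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # 2 / 3
recall = tp / (tp + fn)     # 2 / 3

print(np.isclose(precision, precision_score(y_true, y_pred)))
print(np.isclose(recall, recall_score(y_true, y_pred)))
```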

# Playground
Predicting the outcome for a single student with the trained model

In [None]:
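student = pd.DataFrame(
    data={
        'gender': 1,                        # female
        'race/ethnicity': 4,                # group E
        'parental level of education': 5,   # high school
        'lunch': 1,                         # standard
        'test preparation course': 1        # completed
    },
    index=[0]
)

# 1 = predicted math score above 69, 0 = not
result = lr.predict(student)
result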
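The playground row is written directly in encoded form. Read back through the encoding dictionaries, it describes a female student from group E whose parental level of education is "high school", with standard lunch and a completed test preparation course. A hypothetical helper, `encode_student` (not part of the notebook), could build the same row from the raw category strings so that a new student can be described in words:

```python
import pandas as pd

# Copies of the notebook's encoding dictionaries so the sketch runs on its own
gender_dict = {"male": 0, "female": 1}
race_ethnicity_dict = {"group A": 0, "group B": 1, "group C": 2, "group D": 3, "group E": 4}
parental_education_dict = {
    "some college": 0, "associate's degree": 1, "some high school": 2,
    "bachelor's degree": 3, "master's degree": 4, "high school": 5,
}
lunch_dict = {"free/reduced": 0, "standard": 1}
preparation_course = {"none": 0, "completed": 1}


def encode_student(gender, race, parental_education, lunch, prep_course):
    """Hypothetical helper: build a one-row, model-ready DataFrame from raw strings."""
    return pd.DataFrame(
        data={
            "gender": gender_dict[gender],
            "race/ethnicity": race_ethnicity_dict[race],
            "parental level of education": parental_education_dict[parental_education],
            "lunch": lunch_dict[lunch],
            "test preparation course": preparation_course[prep_course],
        },
        index=[0],
    )


student = encode_student("female", "group E", "high school", "standard", "completed")
print(student)
# With the notebook's trained model in scope: lr.predict(student)
```

The column order matches the features the model was trained on, which scikit-learn checks when a DataFrame with column names is passed to `predict`.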