# Statistical Analysis: Student Performance

**File:** `Student_performance_data.csv`

**Kaggle link:** `https://www.kaggle.com/datasets/rabieelkharoua/students-performance-dataset`

**Objective:** Apply statistical inference concepts (hypothesis tests and confidence intervals, obtained by bootstrap or by analytical methods) using Python. This notebook covers data exploration, hypothesis formulation, test selection and application, bootstrap confidence intervals, and visualizations.
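
As a quick illustration of the bootstrap idea named in the objective, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. The helper `bootstrap_ci_mean`, its parameters, and the synthetic GPA-like sample are assumptions made for illustration; they are not taken from the notebook or from the real dataset.

```python
import numpy as np

def bootstrap_ci_mean(values, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.size
    # Resample with replacement: each row of `idx` indexes one bootstrap sample.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = values[idx].mean(axis=1)
    # The interval is bounded by the alpha/2 and 1 - alpha/2 quantiles.
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Synthetic stand-in for a GPA-like column (hypothetical values, not the real data).
rng = np.random.default_rng(42)
sample = rng.normal(loc=2.0, scale=0.9, size=500)
low, high = bootstrap_ci_mean(sample)
print(f"mean = {sample.mean():.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The percentile method is the simplest variant. Other constructions, such as BCa, may behave better for skewed statistics.
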

## Notebook structure

1. Load the data
2. Exploratory analysis (types, missing values, summary statistics)
3. Hypothesis formulation
4. Hypothesis tests with assumption checks
5. Bootstrap confidence intervals
6. Visualizations
7. Conclusions
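
Step 4 of the outline (testing with assumption checks) could look roughly like the sketch below. It checks normality and equality of variances first, then picks a two-sample test. The function `compare_two_groups`, the 0.05 threshold, and the synthetic groups are illustrative assumptions, not the notebook's actual procedure.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Check assumptions, then run a two-sample test chosen accordingly."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shapiro-Wilk: the null hypothesis is that the sample is normally distributed.
    _, p_norm_a = stats.shapiro(a)
    _, p_norm_b = stats.shapiro(b)
    # Levene: the null hypothesis is that both groups have equal variances.
    _, p_var = stats.levene(a, b)
    if p_norm_a > alpha and p_norm_b > alpha:
        equal_var = p_var > alpha
        stat, p_value = stats.ttest_ind(a, b, equal_var=equal_var)
        test_name = "Student t-test" if equal_var else "Welch t-test"
    else:
        # Rank-based fallback when normality is rejected.
        stat, p_value = stats.mannwhitneyu(a, b, alternative="two-sided")
        test_name = "Mann-Whitney U"
    return {"test": test_name, "statistic": float(stat), "p_value": float(p_value)}

# Synthetic groups (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=2.0, scale=0.9, size=200)
group_b = rng.normal(loc=2.5, scale=0.9, size=200)
outcome = compare_two_groups(group_a, group_b)
print(outcome)
```

With samples in the thousands, Shapiro-Wilk tends to reject normality even for small deviations, so a visual check such as a Q-Q plot is a useful complement to this kind of automatic rule.
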

In [None]:
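# Imports
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from IPython.display import display  # explicit import so the cells also run outside Jupyter
from scipy import stats

# Silences every warning for cleaner output; remove this line while debugging.
warnings.filterwarnings('ignore')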

In [5]:
# Load the data. The file name must match the name on disk exactly
# on case-sensitive file systems.
path = 'student_performance_data.csv'
df = pd.read_csv(path)
print('Data loaded. Shape:', df.shape)
df.head()

Data loaded. Shape: (2392, 15)


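| | StudentID | Age | Gender | Ethnicity | ParentalEducation | StudyTimeWeekly | Absences | Tutoring | ParentalSupport | Extracurricular | Sports | Music | Volunteering | GPA | GradeClass |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 17 | 1 | 0 | 2 | 19.833723 | 7 | 1 | 2 | 0 | 0 | 1 | 0 | 2.929196 | 2.0 |
| 1 | 1002 | 18 | 0 | 0 | 1 | 15.408756 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3.042915 | 1.0 |
| 2 | 1003 | 15 | 0 | 2 | 3 | 4.21057 | 26 | 0 | 2 | 0 | 0 | 0 | 0 | 0.112602 | 4.0 |
| 3 | 1004 | 17 | 1 | 0 | 3 | 10.028829 | 14 | 0 | 3 | 1 | 0 | 0 | 0 | 2.054218 | 3.0 |
| 4 | 1005 | 17 | 1 | 0 | 2 | 4.672495 | 17 | 1 | 3 | 0 | 0 | 0 | 0 | 1.288061 | 4.0 |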
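
The header spells the file name `Student_performance_data.csv`, while the loading cell uses `student_performance_data.csv`. On a case-sensitive file system only one of the two spellings will open. The sketch below shows one way to locate the file regardless of letter case. The helper `resolve_case_insensitive` is a hypothetical name introduced here, and the code assumes the CSV sits in the working directory.

```python
from pathlib import Path

def resolve_case_insensitive(filename, directory="."):
    """Return the file in `directory` whose name matches `filename` ignoring case, or None."""
    target = filename.lower()
    for candidate in Path(directory).iterdir():
        if candidate.is_file() and candidate.name.lower() == target:
            return candidate
    return None

csv_path = resolve_case_insensitive("Student_performance_data.csv")
if csv_path is None:
    print("CSV not found in the working directory; download it from Kaggle first.")
else:
    print("Using", csv_path)
```

The returned `Path` can be passed directly to `pd.read_csv`.
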


In [6]:
# Initial exploration: dtype, number of unique values, and missing values per column
display(df.dtypes.to_frame(name='dtype').assign(n_unique=df.nunique(), n_missing=df.isna().sum()))
print('\nStatistical summary (numeric):\n')
display(df.describe().T)
# Every column is numeric, so include='all' reproduces the numeric summary.
print('\nSummary (all columns):')
display(df.describe(include='all').T)

print('Columns:', df.columns.tolist())
# Heuristic: find grade-like columns by name.
possible_score_cols = [c for c in df.columns if any(s in c.lower() for s in ['gpa','grade','score','final','mark'])]
print('Grade-related columns found:', possible_score_cols)

# Prefer GPA as the score. The fallback averages every column matched by name,
# which can mix scales (GradeClass, for example, is a class label, not a score).
if 'GPA' in df.columns:
    df['avg_score'] = df['GPA']
elif possible_score_cols:
    df['avg_score'] = df[possible_score_cols].mean(axis=1)

if 'Gender' in df.columns:
    print('\nCounts by Gender:\n', df['Gender'].value_counts())

display(df.head())

| | dtype | n_unique | n_missing |
|---|---|---|---|
| StudentID | int64 | 2392 | 0 |
| Age | int64 | 4 | 0 |
| Gender | int64 | 2 | 0 |
| Ethnicity | int64 | 4 | 0 |
| ParentalEducation | int64 | 5 | 0 |
| StudyTimeWeekly | float64 | 2392 | 0 |
| Absences | int64 | 30 | 0 |
| Tutoring | int64 | 2 | 0 |
| ParentalSupport | int64 | 5 | 0 |
| Extracurricular | int64 | 2 | 0 |


Statistical summary (numeric):


| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| StudentID | 2392.0 | 2196.5 | 690.655244 | 1001.0 | 1598.75 | 2196.5 | 2794.25 | 3392.0 |
| Age | 2392.0 | 16.468645 | 1.123798 | 15.0 | 15.0 | 16.0 | 17.0 | 18.0 |
| Gender | 2392.0 | 0.51087 | 0.499986 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| Ethnicity | 2392.0 | 0.877508 | 1.028476 | 0.0 | 0.0 | 0.0 | 2.0 | 3.0 |
| ParentalEducation | 2392.0 | 1.746237 | 1.000411 | 0.0 | 1.0 | 2.0 | 2.0 | 4.0 |
| StudyTimeWeekly | 2392.0 | 9.771992 | 5.652774 | 0.001057 | 5.043079 | 9.705363 | 14.40841 | 19.978094 |
| Absences | 2392.0 | 14.541388 | 8.467417 | 0.0 | 7.0 | 15.0 | 22.0 | 29.0 |
| Tutoring | 2392.0 | 0.301421 | 0.458971 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| ParentalSupport | 2392.0 | 2.122074 | 1.122813 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 |
| Extracurricular | 2392.0 | 0.383361 | 0.486307 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |


Summary (all columns):


| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| StudentID | 2392.0 | 2196.5 | 690.655244 | 1001.0 | 1598.75 | 2196.5 | 2794.25 | 3392.0 |
| Age | 2392.0 | 16.468645 | 1.123798 | 15.0 | 15.0 | 16.0 | 17.0 | 18.0 |
| Gender | 2392.0 | 0.51087 | 0.499986 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| Ethnicity | 2392.0 | 0.877508 | 1.028476 | 0.0 | 0.0 | 0.0 | 2.0 | 3.0 |
| ParentalEducation | 2392.0 | 1.746237 | 1.000411 | 0.0 | 1.0 | 2.0 | 2.0 | 4.0 |
| StudyTimeWeekly | 2392.0 | 9.771992 | 5.652774 | 0.001057 | 5.043079 | 9.705363 | 14.40841 | 19.978094 |
| Absences | 2392.0 | 14.541388 | 8.467417 | 0.0 | 7.0 | 15.0 | 22.0 | 29.0 |
| Tutoring | 2392.0 | 0.301421 | 0.458971 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| ParentalSupport | 2392.0 | 2.122074 | 1.122813 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 |
| Extracurricular | 2392.0 | 0.383361 | 0.486307 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |


Columns: ['StudentID', 'Age', 'Gender', 'Ethnicity', 'ParentalEducation', 'StudyTimeWeekly', 'Absences', 'Tutoring', 'ParentalSupport', 'Extracurricular', 'Sports', 'Music', 'Volunteering', 'GPA', 'GradeClass']
Grade-related columns found: ['GPA', 'GradeClass']

Counts by Gender:
 Gender
1    1222
0    1170
Name: count, dtype: int64


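| | StudentID | Age | Gender | Ethnicity | ParentalEducation | StudyTimeWeekly | Absences | Tutoring | ParentalSupport | Extracurricular | Sports | Music | Volunteering | GPA | GradeClass | avg_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 17 | 1 | 0 | 2 | 19.833723 | 7 | 1 | 2 | 0 | 0 | 1 | 0 | 2.929196 | 2.0 | 2.929196 |
| 1 | 1002 | 18 | 0 | 0 | 1 | 15.408756 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3.042915 | 1.0 | 3.042915 |
| 2 | 1003 | 15 | 0 | 2 | 3 | 4.21057 | 26 | 0 | 2 | 0 | 0 | 0 | 0 | 0.112602 | 4.0 | 0.112602 |
| 3 | 1004 | 17 | 1 | 0 | 3 | 10.028829 | 14 | 0 | 3 | 1 | 0 | 0 | 0 | 2.054218 | 3.0 | 2.054218 |
| 4 | 1005 | 17 | 1 | 0 | 2 | 4.672495 | 17 | 1 | 3 | 0 | 0 | 0 | 0 | 1.288061 | 4.0 | 1.288061 |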
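
The numeric summary reports a mean of about 0.51 for `Gender` and 0.88 for `Ethnicity`. These columns are stored as `int64`, but they appear to hold category codes, so their means and standard deviations are not meaningful as quantities. One option is to cast such columns to the `category` dtype before summarizing. The sketch below uses a tiny made-up frame that borrows a few column names from the dataset. Which columns are treated as coded is an assumption to check against the dataset description.

```python
import pandas as pd

# Tiny synthetic frame reusing a few column names (values are made up).
demo = pd.DataFrame({
    "Gender": [0, 1, 1, 0, 1],
    "Tutoring": [1, 0, 0, 0, 1],
    "GPA": [2.9, 3.0, 0.1, 2.1, 1.3],
})

# Assumption: these integer columns are category codes, not quantities.
coded_cols = ["Gender", "Tutoring"]
demo = demo.astype({col: "category" for col in coded_cols})

print(demo.dtypes)
print(demo.describe())                    # numeric summary now covers GPA only
print(demo.describe(include="category"))  # count, unique, top, freq per coded column
print(demo.groupby("Gender", observed=True)["GPA"].agg(["count", "mean"]))
```

After the cast, `describe()` skips the coded columns by default, and group-wise summaries such as the mean GPA per `Gender` become the natural way to look at them.
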
