<a href="https://colab.research.google.com/github/EddyGiusepe/Remember_Statistics_for_Data_Science_with_Python/blob/main/Remember_statistics_Data_Science.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <h2 align='center'>Statistics for Data Science with Python</h2> 


**Junior Data Scientist:** Dr. Eddy Giusepe Chirinos Isidro

## Import the libraries

In [1]:
import numpy as np
import pandas as pd

import scipy.stats
from scipy.stats import ttest_ind, levene, f_oneway, chi2_contingency, pearsonr

import seaborn as sns
import matplotlib.pyplot as plt


## Load the data

In [2]:
df = pd.read_csv("/content/drive/MyDrive/4_teoria_IA_ML_DL_Eddy/Remember_Statistics_for_Data_Science/teachingratings.csv")
df.head(5)

,minority,age,gender,credits,beauty,eval,division,native,tenure,students,allstudents,prof,PrimaryLast,vismin,female,single_credit,upper_division,English_speaker,tenured_prof
0,yes,36,female,more,0.289916,4.3,upper,yes,yes,24,43,1,0,1,1,0,1,1,1
1,yes,36,female,more,0.289916,3.7,upper,yes,yes,86,125,1,0,1,1,0,1,1,1
2,yes,36,female,more,0.289916,3.6,upper,yes,yes,76,125,1,0,1,1,0,1,1,1
3,yes,36,female,more,0.289916,4.4,upper,yes,yes,77,123,1,1,1,1,0,1,1,1
4,no,59,male,more,-0.737732,4.5,upper,yes,yes,17,20,2,0,0,0,0,1,1,1


In [3]:
df.shape

(463, 19)

In [4]:
df.describe()

,age,beauty,eval,students,allstudents,prof,PrimaryLast,vismin,female,single_credit,upper_division,English_speaker,tenured_prof
count,463.0,463.0,463.0,463.0,463.0,463.0,463.0,463.0,463.0,463.0,463.0,463.0,463.0
mean,48.365011,6.27114e-08,3.998272,36.62419,55.177106,45.434125,0.203024,0.138229,0.421166,0.058315,0.660907,0.939525,0.779698
std,9.802742,0.7886477,0.554866,45.018481,75.0728,27.508902,0.402685,0.345513,0.49428,0.234592,0.473913,0.238623,0.414899
min,29.0,-1.450494,2.1,5.0,8.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,42.0,-0.6562689,3.6,15.0,19.0,20.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0
50%,48.0,-0.0680143,4.0,23.0,29.0,44.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0
75%,57.0,0.5456024,4.4,40.0,60.0,70.5,0.0,0.0,1.0,0.0,1.0,1.0,1.0
max,73.0,1.970023,5.0,380.0,581.0,94.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


## <font color="orange">T-Test</font>

**Null hypothesis**: the mean evaluation score is the same for female and male instructors (the score does not depend on the instructor's gender)

**Alternative hypothesis**: the mean evaluation score differs between female and male instructors

In [5]:
ttest_ind(df[df["gender"] == "male"]["eval"], df[df["gender"] == "female"]["eval"])

Ttest_indResult(statistic=3.249937943510772, pvalue=0.0012387609449522217)
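The p-value above (about 0.0012) is below the conventional 0.05 significance level, so the null hypothesis of equal mean evaluation scores would be rejected at that level. The decision rule can be sketched as follows. This is a minimal illustration on synthetic data, not the teaching ratings dataset; the names `male_eval`, `female_eval` and `alpha`, and the 0.05 threshold itself, are assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins for the two "eval" samples (NOT the teachingratings data).
rng = np.random.default_rng(0)
male_eval = rng.normal(loc=4.1, scale=0.5, size=200)
female_eval = rng.normal(loc=3.9, scale=0.5, size=200)

alpha = 0.05  # assumed significance level

# Two-sided independent-samples t-test (equal variances assumed by default).
statistic, pvalue = ttest_ind(male_eval, female_eval)

# Reject the null hypothesis of equal means when the p-value is below alpha.
reject_null = pvalue < alpha
print(f"t = {statistic:.3f}, p = {pvalue:.4f}, reject H0: {reject_null}")
```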

**Probability of getting a high or low teaching evaluation**

In [6]:
eval_mean = round(df["eval"].mean(), 3)
eval_std = round(df["eval"].std(), 3)

print(eval_mean, eval_std)

3.998 0.555


In [7]:
# P(eval > 4.5), treating eval as approximately normal with the sample mean and std
prob = scipy.stats.norm.cdf((4.5 - eval_mean) / eval_std)
print(1 - prob)

0.1828639734596742
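The value above is the upper-tail probability P(eval > 4.5) under a normal approximation. Since the scores are bounded (2.1 to 5.0 in this dataset), it is only an approximation. As a small sketch, the same quantity can be obtained with the survival function `norm.sf`, passing the mean and standard deviation directly instead of standardizing by hand. The name `p_above` is illustrative.

```python
from scipy.stats import norm

eval_mean = 3.998  # rounded sample mean printed above
eval_std = 0.555   # rounded sample standard deviation printed above

# Upper-tail probability P(eval > 4.5); sf(x) is defined as 1 - cdf(x).
p_above = norm.sf(4.5, loc=eval_mean, scale=eval_std)

# Same quantity via the standardized z-score, as in the cell above.
p_above_z = 1 - norm.cdf((4.5 - eval_mean) / eval_std)

print(p_above, p_above_z)
```

For a low evaluation, `norm.cdf(x, loc=eval_mean, scale=eval_std)` gives the lower-tail probability P(eval < x) in the same way.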


## <font color="orange">Levene-Test</font>

**Null hypothesis**: the population variances are equal

**Alternative hypothesis**: the population variances are not equal

In [8]:
levene(df[df["gender"] == "male"]["eval"], df[df["gender"] == "female"]["eval"], center = "mean")

LeveneResult(statistic=0.1903292243529225, pvalue=0.6628469836244741)
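With a p-value of about 0.66, the hypothesis of equal variances is not rejected at the usual 0.05 level, which is consistent with the default `equal_var=True` used by `ttest_ind`. One possible way to connect the two tests is sketched below, on synthetic data. The variable names and the 0.05 threshold are assumptions, and choosing the t-test variant from a preliminary Levene test is just one common convention, not the only option.

```python
import numpy as np
from scipy.stats import levene, ttest_ind

# Synthetic samples for illustration only (NOT the teachingratings data).
rng = np.random.default_rng(1)
group_a = rng.normal(loc=4.1, scale=0.5, size=150)
group_b = rng.normal(loc=3.9, scale=0.5, size=150)

alpha = 0.05  # assumed significance level

# Levene's test for equality of variances, centered on the mean.
levene_stat, levene_p = levene(group_a, group_b, center="mean")

# If equal variances are not rejected, use the pooled (Student) t-test;
# otherwise fall back to Welch's t-test (equal_var=False).
equal_var = bool(levene_p >= alpha)
result = ttest_ind(group_a, group_b, equal_var=equal_var)

print(f"Levene p = {levene_p:.3f}, equal_var = {equal_var}, t-test p = {result.pvalue:.4f}")
```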

## <font color="orange">ANOVA</font>

**Null hypothesis**: all groups have the same mean

**Alternative hypothesis**: at least one group mean differs from the others

In [9]:
def group(x):
    # Bin an age into three groups: <= 40, 41 to 56, and >= 57
    if x <= 40:
        return "younger"
    elif x < 57:
        return "middle"
    else:
        return "older"

df["age_group"] = df["age"].apply(group)
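The same binning can also be expressed with `pd.cut`, which avoids a Python-level function call per row. The sketch below assumes integer ages, so that the upper edge 56 reproduces the `x < 57` boundary of `group`. The sample values in `ages` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical integer ages chosen to hit every bin boundary.
ages = pd.Series([29, 40, 41, 56, 57, 73], name="age")

# Right-closed intervals: (-inf, 40], (40, 56], (56, inf).
age_group = pd.cut(
    ages,
    bins=[-np.inf, 40, 56, np.inf],
    labels=["younger", "middle", "older"],
)

print(age_group.tolist())
# ['younger', 'younger', 'middle', 'middle', 'older', 'older']
```

Note that `pd.cut` returns an ordered categorical rather than plain strings, which also makes grouped output sort as younger, middle, older instead of alphabetically.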

**Beauty statistics by age group**

In [10]:
age_stats = df.groupby("age_group")["beauty"].agg(["count", "mean", "std"])
age_stats

age_group,count,mean,std
middle,228,-0.035111,0.686637
older,122,-0.245777,0.74072
younger,113,0.336196,0.913748
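With the groups defined, a one-way ANOVA on `beauty` across the age groups could be run with `f_oneway`, which takes one array per group. The sketch below uses a synthetic frame named `demo` (an assumption, not the teaching ratings data), so its numbers say nothing about the real dataset. The group standard deviations in the table above also differ somewhat, and a Levene test across the three groups is one way to check the equal-variance assumption first.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Synthetic data for illustration only (NOT the teachingratings data).
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "age_group": np.repeat(["younger", "middle", "older"], 100),
    "beauty": np.concatenate([
        rng.normal(0.3, 0.9, 100),
        rng.normal(0.0, 0.7, 100),
        rng.normal(-0.2, 0.7, 100),
    ]),
})

# One array of beauty scores per age group.
samples = [g["beauty"].to_numpy() for _, g in demo.groupby("age_group")]

# One-way ANOVA: the null hypothesis is that all group means are equal.
f_stat, p_value = f_oneway(*samples)
print(f"F = {f_stat:.3f}, p = {p_value:.4g}")
```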
