## Apply hypothesis tests to irrelevant variables, for example

# Hypothesis testing is the area of statistics concerned with gathering statistical evidence about a hypothesis regarding a population.

## There are many hypothesis tests, each suited to a different goal. In this chapter I will explore:
- One-sample test for a mean
- p-value
- Type I error
- Type II error
- Types of tests

In [1]:
import pandas as pd
from scipy.stats import ttest_1samp, norm
from statsmodels.stats.weightstats import ztest
pd.set_option('display.max_columns', None)

In [2]:
late_shipments = pd.read_feather("dados/late_shipments.feather")
stack_overflow = pd.read_feather("dados/stack_overflow.feather")
repub_votes_potus_08_12 = pd.read_feather("dados/repub_votes_potus_08_12.feather")
dem_votes_potus_12_16 = pd.read_feather("dados/dem_votes_potus_12_16.feather")

In [3]:
stack_overflow.head(3)

Unnamed: 0,respondent,main_branch,hobbyist,age,age_1st_code,age_first_code_cut,comp_freq,comp_total,converted_comp,country,currency_desc,currency_symbol,database_desire_next_year,database_worked_with,dev_type,ed_level,employment,ethnicity,gender,job_factors,job_sat,job_seek,language_desire_next_year,language_worked_with,misc_tech_desire_next_year,misc_tech_worked_with,new_collab_tools_desire_next_year,new_collab_tools_worked_with,new_dev_ops,new_dev_ops_impt,new_ed_impt,new_job_hunt,new_job_hunt_research,new_learn,new_off_topic,new_onboard_good,new_other_comms,new_overtime,new_purchase_research,purple_link,newso_sites,new_stuck,op_sys,org_size,platform_desire_next_year,platform_worked_with,purchase_what,sexuality,so_account,so_comm,so_part_freq,so_visit_freq,survey_ease,survey_length,trans,undergrad_major,webframe_desire_next_year,webframe_worked_with,welcome_change,work_week_hrs,years_code,years_code_pro,age_cat
0,36.0,"I am not primarily a developer, but I write co...",Yes,34.0,30.0,adult,Yearly,60000.0,77556.0,United Kingdom,Pound sterling,GBP,Microsoft SQL Server;MongoDB;SQLite,IBM DB2;Microsoft SQL Server;MongoDB;SQLite,Data or business analyst;Data scientist or mac...,Some college/university study without earning ...,Employed full-time,White or of European descent,Man,Flex time or a flexible schedule;Office enviro...,Slightly satisfied,"I’m not actively looking, but I am open to new...",C#;Go;HTML/CSS;JavaScript;Python;SQL,C#;Go;HTML/CSS;Java;JavaScript;Python;R;SQL,Keras;Node.js;Pandas;TensorFlow,Node.js;Pandas,Confluence;Jira;Github;Slack;Trello,Confluence;Jira;Github;Slack;Trello,Not sure,Neutral,Somewhat important,Having a bad day (or week or month) at work;Cu...,,Every few months,No,Yes,No,Sometimes: 1-2 days per month but less than we...,,"Hello, old friend",Stack Overflow (public Q&A for anyone who code...,Visit Stack Overflow;Go for a walk or other ph...,Windows,"1,000 to 4,999 employees",Linux;MacOS;Windows,MacOS;Windows,I have little or no influence,Straight / Heterosexual,Yes,"Yes, somewhat",Less than once per month or monthly,Multiple times per day,Easy,Appropriate in length,No,"Computer science, computer engineering, or sof...",Express;React.js,Express;React.js,Just as welcome now as I felt last year,40.0,4.0,3.0,At least 30
1,47.0,I am a developer by profession,Yes,53.0,10.0,child,Yearly,58000.0,74970.0,United Kingdom,Pound sterling,GBP,PostgreSQL;SQLite,Microsoft SQL Server;Oracle;PostgreSQL;SQLite,Data scientist or machine learning specialist;...,"Other doctoral degree (Ph.D., Ed.D., etc.)",Employed full-time,White or of European descent,Man,Remote work options;How widely used or impactf...,Very satisfied,"I’m not actively looking, but I am open to new...",Bash/Shell/PowerShell;Java;Python;SQL,Bash/Shell/PowerShell;C#;Java;JavaScript;Pytho...,Pandas,.NET;.NET Core,Github;Gitlab,Confluence;Jira;Github;Gitlab;Microsoft Azure;...,Yes,Neutral,Not at all important/not necessary,Just because;Having a bad day (or week or mont...,"Read company media, such as employee blogs or ...",Once a year,No,Onboarding? What onboarding?,Yes,Occasionally: 1-2 days per quarter but less th...,Start a free trial;Ask developers I know/work ...,"Hello, old friend",Stack Overflow (public Q&A for anyone who code...,Call a coworker or friend;Visit Stack Overflow...,Linux-based,10 to 19 employees,Arduino;Docker;Linux;Raspberry Pi,Arduino;AWS;Linux;Microsoft Azure;Raspberry Pi,I have some influence,Straight / Heterosexual,Yes,"Yes, definitely",A few times per week,A few times per week,Neither easy nor difficult,Appropriate in length,No,"A natural science (such as biology, chemistry,...",Flask;Spring,Flask;Spring,Just as welcome now as I felt last year,40.0,43.0,28.0,At least 30
2,69.0,I am a developer by profession,Yes,25.0,12.0,child,Yearly,550000.0,594539.0,France,European Euro,EUR,PostgreSQL,MongoDB,Data scientist or machine learning specialist;...,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Employed full-time,White or of European descent,Man,Flex time or a flexible schedule;How widely us...,Very satisfied,I am not interested in new job opportunities,Python;Rust;Scala;SQL,HTML/CSS;Python,Keras;Pandas;TensorFlow,Keras;Pandas;TensorFlow,"Github;Slack;Google Suite (Docs, Meet, etc)",Confluence;Jira;Github;Slack;Google Suite (Doc...,Yes,Extremely important,Very important,Curious about other opportunities;Better compe...,"Read company media, such as employee blogs or ...",Once a year,No,No,No,Sometimes: 1-2 days per month but less than we...,Ask developers I know/work with;Visit develope...,"Hello, old friend",Stack Overflow (public Q&A for anyone who code...,Call a coworker or friend;Visit Stack Overflow...,MacOS,20 to 99 employees,Kubernetes;Linux,Linux;Microsoft Azure,I have some influence,Bisexual,Yes,"Yes, somewhat",A few times per month or weekly,A few times per week,Easy,Too short,No,"Computer science, computer engineering, or sof...",Django;Flask,Django;Flask,Just as welcome now as I felt last year,40.0,13.0,3.0,Under 30


# Test the hypothesis that data scientists earn more than 110K dollars per year

In [4]:
mean_comp_samp = stack_overflow['converted_comp'].mean()

## Compute the standard deviation of the sample (taken directly from the data, not from a bootstrap distribution)

In [5]:
std_comp_samp = stack_overflow['converted_comp'].std()

## Check whether this matches the z-score (note that it divides by the standard deviation, not by the standard error)

In [6]:
(mean_comp_samp - 110000)/std_comp_samp

0.03605535085945209

0         77556.0
1         74970.0
2        594539.0
3       2000000.0
4         37816.0
          ...    
2256     145000.0
2257      33972.0
2258      97284.0
2259      72000.0
2260     180000.0
Name: converted_comp, Length: 2261, dtype: float64
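
For comparison, a z statistic divides the difference by the standard error of the mean, `std / sqrt(n)`, rather than by the standard deviation itself. The following is a minimal sketch, assuming a synthetic sample (`comp`, generated only for illustration) in place of `stack_overflow['converted_comp']`; the variable names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for stack_overflow['converted_comp']; the values are synthetic.
rng = np.random.default_rng(seed=0)
comp = rng.lognormal(mean=11.3, sigma=0.8, size=2261)

hypothesized_mean = 110000  # value of the mean under the null hypothesis

mean_comp = comp.mean()
std_comp = comp.std(ddof=1)  # sample standard deviation, as pandas .std() computes it
n = comp.size

# Standard error of the mean: how much the sample mean varies from sample to sample
std_error = std_comp / np.sqrt(n)

# Difference measured in standard deviations (what the cell above computes)
diff_in_std = (mean_comp - hypothesized_mean) / std_comp

# z statistic: the same difference measured in standard errors
z_score = (mean_comp - hypothesized_mean) / std_error

print(diff_in_std, z_score)
```

The two quantities differ by a factor of `sqrt(n)`. With the 2261 rows in this dataset, 0.03605535 × sqrt(2261) ≈ 1.7144, which is the statistic that `ztest` returns further down.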

## The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. The smaller the p-value, the stronger the evidence against the null hypothesis; a large p-value means only that the data are consistent with it. We compare the p-value with the significance level alpha (the probability of a type I error) and reject the null hypothesis when the p-value is at most alpha. Common values of alpha are 0.1, 0.05, and 0.01.
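
As a rough illustration of the comparison described above, the sketch below turns a z statistic into a right-tailed p-value with `scipy.stats.norm` and checks it against the three usual significance levels. The z value is the one `ztest` reports in this notebook; the `decisions` dictionary is a hypothetical helper introduced here.

```python
from scipy.stats import norm

z_score = 1.7144309855393731  # z statistic reported by ztest in this notebook

# Right-tailed test: probability of a z at least this large if the null hypothesis is true
p_value = 1 - norm.cdf(z_score)  # equivalently norm.sf(z_score)

# Reject the null hypothesis when the p-value is at most alpha
decisions = {alpha: bool(p_value <= alpha) for alpha in (0.1, 0.05, 0.01)}

print(p_value)
for alpha, reject in decisions.items():
    outcome = "reject" if reject else "fail to reject"
    print(f"alpha={alpha}: {outcome} the null hypothesis")
```

With a p-value of about 0.043, the null hypothesis is rejected at alpha 0.1 and 0.05 but not at 0.01, so the conclusion depends on the significance level chosen before looking at the data.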

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)
![image-4.png](attachment:image-4.png)
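
The meaning of alpha as the probability of a type I error can be checked by simulation. The sketch below assumes a hypothetical normal population in which the null hypothesis really is true (all parameter values are made up for illustration), draws many samples, and counts how often a right-tailed z-test rejects anyway.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=42)

mu0 = 110000    # mean under the null hypothesis (true in this simulation)
sigma = 50000   # hypothetical population standard deviation
n = 200         # hypothetical sample size
n_sims = 5000   # number of simulated samples
alpha = 0.05    # significance level

# Each row is one sample drawn from a population where the null hypothesis holds
samples = rng.normal(loc=mu0, scale=sigma, size=(n_sims, n))

# z statistic and right-tailed p-value for every simulated sample
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)
z_scores = (samples.mean(axis=1) - mu0) / std_errors
p_values = norm.sf(z_scores)

# Rejecting a true null hypothesis is a type I error
type_1_rate = np.mean(p_values <= alpha)
print(type_1_rate)
```

The rejection rate should land close to `alpha`, since every rejection here is by construction a false positive.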

## Run the test using the `ztest` function

In [9]:
ztest(stack_overflow['converted_comp'], value=110000, alternative='larger')

(1.7144309855393731, 0.04322480095025359)
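
The `alternative` argument determines the type of test. As a sketch of how the same z statistic maps to each tail, the p-values can be reproduced with `scipy.stats.norm`; the three variable names are hypothetical and mirror the `'larger'`, `'smaller'`, and `'two-sided'` options of `ztest`.

```python
from scipy.stats import norm

z_score = 1.7144309855393731  # statistic from the ztest call above

p_larger = norm.sf(z_score)              # right-tailed: alternative='larger'
p_smaller = norm.cdf(z_score)            # left-tailed: alternative='smaller'
p_two_sided = 2 * norm.sf(abs(z_score))  # two-tailed: alternative='two-sided'

print(p_larger, p_smaller, p_two_sided)
```

The two-sided p-value is twice the right-tailed one (about 0.086), so the same data that reject the null hypothesis at alpha 0.05 in a right-tailed test would not reject it in a two-tailed test.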