# Project 1 - Data Science
##### Beatriz Muniz de Castro e Silva


## Introduction



In this project, proposed in class, the goal is to carry out an exploratory analysis of a dataset obtained from the GapMinder site.

The general theme of the project is to examine how membership in an international organization (UN, MERCOSUR, G7, etc.) may or may not affect some aspect of its member countries. To do so, we compare countries over time, before and after they join the chosen organization, and also compare them with non-member countries.

## Question to Be Answered

Do countries that join the European Union (EU) show greater relative growth in the number of Internet users than countries outside the organization?
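
One way to make "relative growth" concrete is the year-over-year relative change in the share of Internet users. The sketch below uses made-up numbers and a hypothetical helper, `relative_growth`; it illustrates one possible reading of the question and is not necessarily the metric used in the rest of the analysis.

```python
import pandas as pd


def relative_growth(series: pd.Series) -> pd.Series:
    """Year-over-year relative change: (x_t - x_{t-1}) / x_{t-1}."""
    return series / series.shift(1) - 1


# Made-up values for illustration only (percentage of the population online)
users = pd.Series([10.0, 15.0, 30.0], index=[2003, 2004, 2005])
growth = relative_growth(users)
print(growth)
```

The first year has no predecessor, so its growth is undefined (NaN). Comparing this quantity between countries is less sensitive to their different starting levels than comparing raw percentage-point differences.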

## Hypothesis

After a country joins the EU, more of its population gains access to the Internet.

## Mechanism

The European Union operates a common market and a customs union among its member countries, which benefits them financially. In addition, most countries in the bloc had a very high HDI (Human Development Index) in 2018, which indicates a high quality of life and a high level of economic development.

Given these factors, the economy and the level of development of the joining countries should grow, and with economic growth more people should gain access to the Internet.

## Datasets and Sources

The datasets used were obtained from the GapMinder site (https://www.gapminder.org/data/).

Source of "Internet user per 100.xlsx": The World Bank (https://data.worldbank.org/indicator/IT.NET.USER.ZS) - Percentage of each country's population that uses the Internet

Source of "indicator gapminder gdp_per_capita_ppp.xlsx": multiple sources, but the data used come from The World Bank (https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.KD) - GDP per capita data

https://europa.eu/european-union/about-eu/history_pt - Information on the EU common market and customs union

http://www.br.undp.org/content/brazil/pt/home/idh0/rankings/idh-global.html - Global HDI ranking for 2018, divided into very high, high, medium, and low HDI

https://europa.eu/european-union/about-eu/countries_en#tab-0-1 - List of EU member countries and their dates of entry

#### Loading the Data


In [17]:
#import the required packages
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [41]:
#read the data
dados = pd.read_excel('Internet user per 100.xlsx')
gdp = pd.read_excel('indicator gapminder gdp_per_capita_ppp.xlsx')

#### Initial Organization of the Data

The GDP per capita data will be used further below as a criterion for selecting countries to compare with the countries analyzed. The data were transposed so that the countries become the columns, which makes them easier to navigate, and only the data for 1990-2011 were kept, since that is the time span for which we have Internet user data.

In [42]:
#reorganize the GDP per capita data (countries as columns, years 1990-2011 only)
gdp = gdp.set_index("Country").T
gdp = gdp[190:212]
gdp

Country,Abkhazia,Afghanistan,Akrotiri and Dhekelia,Albania,Algeria,American Samoa,Andorra,Angola,Anguilla,Antigua and Barbuda,...,North Yemen (former),South Yemen (former),Yemen,Yugoslavia,Zambia,Zimbabwe,Åland,South Sudan,nan,nan.1
1990,,1028.0,,4350.0,10113.0,,28417.0,4232.0,,17154.0,...,,,3441.0,,2407.0,2532.0,,2013.0,,
1991,,1022.0,,3081.0,9748.0,,28029.0,4056.0,,17361.0,...,,,3482.0,,2348.0,2604.0,,2089.0,,
1992,,941.0,,2877.0,9693.0,,27218.0,3656.0,,17226.0,...,,,3578.0,,2253.0,2316.0,,2137.0,,
1993,,810.0,,3172.0,9279.0,,26011.0,2663.0,,17753.0,...,,,3536.0,,2351.0,2292.0,,2141.0,,
1994,,725.0,,3457.0,9006.0,,25907.0,2669.0,,18400.0,...,,,3598.0,,2098.0,2456.0,,2125.0,,
1995,,872.0,,3941.0,9168.0,,26143.0,2859.0,,17167.0,...,,,3644.0,,2106.0,2416.0,,2132.0,,
1996,,895.0,,4326.0,9375.0,,27180.0,3091.0,,17830.0,...,,,3676.0,,2180.0,2619.0,,2183.0,,
1997,,921.0,,3909.0,9322.0,,29729.0,3246.0,,18186.0,...,,,3746.0,,2203.0,2645.0,,2338.0,,
1998,,947.0,,4434.0,9646.0,,30819.0,3376.0,,18483.0,...,,,3857.0,,2136.0,2680.0,,2407.0,,
1999,,972.0,,4912.0,9810.0,,31940.0,3389.0,,18780.0,...,,,3892.0,,2176.0,2625.0,,2518.0,,
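
The positional slice `gdp[190:212]` depends on the row order of the spreadsheet. A label-based selection is an alternative that states the year range explicitly. The sketch below uses a small made-up frame (`toy_gdp`, a hypothetical name) and assumes the year labels are integers, which may not match the actual file.

```python
import pandas as pd

# Made-up frame shaped like the transposed GDP table:
# years as the index, countries as the columns
toy_gdp = pd.DataFrame(
    {"Albania": [100.0, 200.0, 300.0], "Algeria": [400.0, 500.0, 600.0]},
    index=[1989, 1990, 1991],
)
toy_gdp.columns.name = "Country"

# .loc slices by label and includes both endpoints
selected = toy_gdp.loc[1990:1991]
print(selected)
```

If the year labels were strings instead, the same idea would apply with `toy_gdp.loc["1990":"1991"]`.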


In [43]:
#reorganize the data on the percentage of the population using the Internet
dados = dados.set_index("Country").T
dados

Country,Abkhazia,Afghanistan,Akrotiri and Dhekelia,Albania,Algeria,American Samoa,Andorra,Angola,Anguilla,Antigua and Barbuda,...,Antarctica,"Virgin Islands, British",Hawaiian Trade Zone,U.S. Pacific Islands,Wake Island,Bonaire,Sark,Chinese Taipei,Saint Eustatius,Saba
1990,,0.0,,0.0,0.0,0.0,0.0,0.0,,0.0,...,,,,,,,,,,
1991,,,,,,,,,,,...,,,,,,,,,,
1992,,,,,,,,,,,...,,,,,,,,,,
1993,,,,,,,,,,,...,,,,,,,,,,
1994,,,,,0.000361,,,,,,...,,,,,,,,,,
1995,,,,0.011169,0.001769,,,,,2.200769,...,,,,,,,,,,
1996,,,,0.032197,0.001739,,1.526601,0.000776,,2.85845,...,,,,,,,,,,
1997,,,,0.048594,0.010268,,3.050175,0.005674,,3.480537,...,,,,,,,,,,
1998,,,,0.065027,0.020239,,6.886209,0.018454,,4.071716,...,,,,,,,,,,
1999,,,,0.081437,0.199524,,7.635686,0.071964,,5.300681,...,,,,,,,,,,


To analyze the data more easily, we create two new DataFrames containing the EU countries. These DataFrames are not combined because, since they are two separate time series, doing so would cause more confusion than clarity.

In [44]:
#select the EU countries (percentage of Internet users)
pue = dados.loc[:,['Austria','Belgium','Bulgaria','Croatia','Cyprus','Czech Republic','Denmark','Estonia','Finland','France','Germany','Greece','Hungary','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Netherlands','Poland','Portugal','Romania','Slovak Republic','Slovenia','Spain','Sweden','United Kingdom']]
pue


Country,Austria,Belgium,Bulgaria,Croatia,Cyprus,Czech Republic,Denmark,Estonia,Finland,France,...,Malta,Netherlands,Poland,Portugal,Romania,Slovak Republic,Slovenia,Spain,Sweden,United Kingdom
1990,0.130245,0.001005,0.0,0.0,0.0,0.0,0.097279,0.0,0.401088,0.051358,...,0.0,0.33305,0.0,0.0,0.0,0.0,0.0,0.012886,0.584192,0.087305
1991,0.257902,0.020016,,,,,0.194084,,1.396163,0.136293,...,,0.528847,0.005218,0.100014,,,,0.025713,1.160432,0.174063
1992,0.637697,0.099636,,,0.05006,,0.386964,0.065253,1.884176,0.271338,...,,1.312927,0.05197,0.250731,,,,0.076882,1.499701,0.260366
1993,0.758953,0.19841,0.002361,0.096982,0.056367,0.580761,0.578636,0.301363,2.5659,0.574087,...,,1.957085,0.129471,0.452217,0.003733,0.128074,0.40166,0.127735,1.720367,0.519429
1994,1.386068,0.691999,0.019541,0.268816,0.111035,1.257874,1.345829,1.163321,4.9132,0.874352,...,,3.244027,0.3873,0.724185,0.026393,0.318953,1.055578,0.280238,3.416398,1.036123
1995,1.887201,0.986093,0.118962,0.514029,0.410121,1.452462,3.82565,2.786853,13.900337,1.590891,...,0.234801,6.457731,0.644342,1.508423,0.074942,0.523883,2.864506,0.381186,5.09803,1.894456
1996,6.910418,2.95133,0.717462,0.890082,0.673344,1.939395,5.706452,3.535403,16.782053,2.5088,...,1.098336,9.640759,1.287641,3.013829,0.221117,0.784376,5.028554,1.333241,9.05017,4.122936
1997,9.538177,4.905171,1.203075,1.749807,4.378272,2.913393,11.366027,5.721171,19.456495,4.129441,...,4.090527,14.061157,2.059488,5.01198,0.443696,1.17482,7.552927,2.805178,23.749063,7.385629
1998,15.41992,7.829148,1.816698,3.332665,8.890539,3.889977,22.649215,10.829198,25.440306,6.13081,...,6.774804,22.220455,4.070753,9.990346,2.224249,2.69282,10.092528,4.363312,33.47385,13.67019
1999,23.022317,13.667105,2.857298,4.391857,11.342399,6.817172,30.584504,14.546308,32.273341,8.862279,...,8.086648,39.085436,5.410683,14.918747,2.67485,5.443129,12.606697,7.089058,41.407585,21.289628
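
When a long list of country names is passed to `.loc`, a misspelled label is easy to miss. A quick check of the requested labels against the columns can catch it before selecting. This is a minimal sketch on a made-up frame; `toy`, `wanted`, `missing` and `present` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Made-up frame standing in for the transposed indicator table
toy = pd.DataFrame(
    {"Austria": [1.0, 2.0], "Luxembourg": [3.0, 4.0], "Malta": [5.0, 6.0]},
    index=[1990, 1991],
)

wanted = ["Austria", "Luxemborg", "Malta"]  # note the misspelled label

# Labels that were requested but do not exist among the columns
missing = pd.Index(wanted).difference(toy.columns)
print(list(missing))

# Select only the labels that exist, preserving the requested order
present = [name for name in wanted if name in toy.columns]
subset = toy.loc[:, present]
```

Printing `missing` before the selection makes a typo visible immediately, instead of producing a silent all-NaN column or a `KeyError`, depending on the pandas version.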


In [46]:
#select the EU countries (GDP per capita)
puegdp = gdp.loc[:,['Austria','Belgium','Bulgaria','Croatia','Cyprus','Czech Republic','Denmark','Estonia','Finland','France','Germany','Greece','Hungary','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Netherlands','Poland','Portugal','Romania','Slovak Republic','Slovenia','Spain','Sweden','United Kingdom']]
puegdp

Country,Austria,Belgium,Bulgaria,Croatia,Cyprus,Czech Republic,Denmark,Estonia,Finland,France,...,Malta,Netherlands,Poland,Portugal,Romania,Slovak Republic,Slovenia,Spain,Sweden,United Kingdom
1990,31053.0,30798.0,9333.0,17890.0,23802.0,19839.0,33256.0,13260.0,28599.0,29476.0,...,16596.0,32534.0,10088.0,20282.0,11449.0,14366.0,19147.0,24126.0,30901.0,26424.0
1991,31802.0,31247.0,8630.0,14020.0,23366.0,17577.0,33601.0,12008.0,26761.0,29707.0,...,17459.0,33066.0,9347.0,21216.0,10059.0,12223.0,17370.0,24684.0,30340.0,26017.0
1992,32113.0,31596.0,8089.0,12269.0,24891.0,17470.0,34152.0,10523.0,25726.0,30033.0,...,18100.0,33377.0,9553.0,21464.0,9253.0,11364.0,16351.0,24831.0,29813.0,26062.0
1993,32017.0,31170.0,8033.0,11182.0,24467.0,17462.0,34008.0,9870.0,25414.0,29719.0,...,18732.0,33562.0,9884.0,21000.0,9406.0,11536.0,16748.0,24498.0,29028.0,26688.0
1994,32660.0,32077.0,8207.0,11764.0,25380.0,17964.0,35766.0,9870.0,26301.0,30303.0,...,19616.0,34348.0,10386.0,21146.0,9794.0,12204.0,17576.0,25015.0,30000.0,27691.0
1995,33480.0,32774.0,8479.0,12543.0,26445.0,19093.0,36670.0,10464.0,27303.0,30823.0,...,20720.0,35244.0,11093.0,21975.0,10516.0,12879.0,18240.0,25645.0,31044.0,28317.0
1996,34237.0,33217.0,8659.0,13797.0,26450.0,19934.0,37521.0,11245.0,28210.0,31141.0,...,21373.0,36152.0,11776.0,22658.0,10969.0,13720.0,18894.0,26270.0,31465.0,28998.0
1997,34952.0,34377.0,8617.0,14463.0,26704.0,19821.0,38584.0,12709.0,29884.0,31756.0,...,22344.0,37414.0,12602.0,23555.0,10329.0,14525.0,19887.0,27167.0,32360.0,29662.0
1998,36157.0,34992.0,8976.0,14965.0,27721.0,19777.0,39297.0,13705.0,31423.0,32764.0,...,23347.0,38816.0,13225.0,24560.0,9855.0,15085.0,20585.0,28238.0,33709.0,30614.0
1999,37382.0,36209.0,8516.0,14652.0,28698.0,20082.0,40321.0,13723.0,32743.0,33707.0,...,24330.0,40306.0,13824.0,25371.0,9752.0,15039.0,21655.0,29353.0,35208.0,31474.0


Below we separate the data for the countries that joined the EU in 2004. Since we have few data points, we chose the 2004 entrants because they have a reasonable amount of data both before and after joining the EU.
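
A minimal sketch of a before/after split around the accession year, on made-up numbers for two of the 2004 entrants. The names `toy`, `entry_year`, `before` and `after` are illustrative. The sketch assumes integer year labels and treats the accession year itself as part of the "after" period, which is a modeling choice rather than something fixed by the data.

```python
import pandas as pd

# Made-up values for two of the 2004 entrants; years as the index
toy = pd.DataFrame(
    {"Estonia": [40.0, 50.0, 60.0, 66.0], "Poland": [20.0, 30.0, 36.0, 45.0]},
    index=[2002, 2003, 2004, 2005],
)

entry_year = 2004

# Rows strictly before accession, and rows from the accession year onward
before = toy.loc[toy.index < entry_year]
after = toy.loc[toy.index >= entry_year]

print(before.index.tolist(), after.index.tolist())
```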