# **Information Visualization**
Escola de Matemática Aplicada - Fundação Getúlio Vargas

Students: *Edilton Brandão & Erick Brito*

---

# Final Project: Exploratory Analysis

We use the following dataset for this project: [International football results from 1872 to 2022](https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017?select=results.csv).

We use the table *results.csv*, which lists the results of 44,152 international football matches, from the first official match in 1872 up to 2022 (the copy loaded below has 44,206 rows). The matches range from the FIFA World Cup to the FIFI Wild Cup and regular friendlies. They are strictly men's full internationals, and the data does not include Olympic Games or matches in which at least one of the teams was the country's B team, U-23 team or a league select team.

Since we are also interested in the number of wins of each country, we also need the table *shootouts.csv*, which lists the winners of penalty shootouts in knockout matches that ended in a draw. This table has only the columns *date*, *home_team*, *away_team* and *winner*.

The columns of these datasets are described in the table below:

<br />

<div align='center'>

| Column | Description |
| :-------- | :-------: |
| date | Date of the match |
| home_team | Name of the home team |
| away_team | Name of the away team |
| home_score | Full-time score of the home team, including extra time, not including penalty shootouts |
| away_score | Full-time score of the away team, including extra time, not including penalty shootouts |
| tournament | Name of the tournament |
| city | Name of the city/town/administrative unit where the match was played |
| country | Name of the country where the match was played |
| neutral | TRUE/FALSE column indicating whether the match was played at a neutral venue |
| winner | Winner of the penalty shootout |

</div>

<br />


Of these, the most interesting for us are the columns *date*, *home_team*, *away_team*, *home_score* and *away_score*, since they show which teams scored, how many goals they scored in a match, and when the matches took place.



In [261]:
import numpy as np
import pandas as pd
import plotly.express as px
import scipy.stats
import plotly.io as pio
from datetime import date, time, datetime
import plotly.graph_objects as go

# Reading the data
url = "https://raw.githubusercontent.com/Erickslb/copa-do-mundo-vis/main/data/results.csv"
url2 = "https://raw.githubusercontent.com/Erickslb/copa-do-mundo-vis/main/data/shootouts.csv"

df = pd.read_csv(url)
penalties = pd.read_csv(url2)

In [262]:
# Viewing the data
display(df)

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
0,1872-11-30,Scotland,England,0.0,0.0,Friendly,Glasgow,Scotland,False
1,1873-03-08,England,Scotland,4.0,2.0,Friendly,London,England,False
2,1874-03-07,Scotland,England,2.0,1.0,Friendly,Glasgow,Scotland,False
3,1875-03-06,England,Scotland,2.0,2.0,Friendly,London,England,False
4,1876-03-04,Scotland,England,3.0,0.0,Friendly,Glasgow,Scotland,False
...,...,...,...,...,...,...,...,...,...
44201,2022-12-01,Canada,Morocco,1.0,2.0,FIFA World Cup,Doha,Qatar,True
44202,2022-12-02,Serbia,Switzerland,,,FIFA World Cup,Doha,Qatar,True
44203,2022-12-02,Cameroon,Brazil,,,FIFA World Cup,Lusail,Qatar,True
44204,2022-12-02,Ghana,Uruguay,,,FIFA World Cup,Al Wakrah,Qatar,True


In [263]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44206 entries, 0 to 44205
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   date        44206 non-null  object 
 1   home_team   44206 non-null  object 
 2   away_team   44206 non-null  object 
 3   home_score  44202 non-null  float64
 4   away_score  44202 non-null  float64
 5   tournament  44206 non-null  object 
 6   city        44206 non-null  object 
 7   country     44206 non-null  object 
 8   neutral     44206 non-null  bool   
dtypes: bool(1), float64(2), object(6)
memory usage: 2.7+ MB


## Preprocessing

In this part we do the preprocessing needed before moving on to the analyses. We also create new DataFrames containing information derived from the data, to support the visualizations and analyses.

In [264]:
# Checking for null values
df.isnull().sum()

date          0
home_team     0
away_team     0
home_score    4
away_score    4
tournament    0
city          0
country       0
neutral       0
dtype: int64

Note that there is little missing data, and the missing values are all from 2022, since the World Cup is still in progress. In addition, since there are not that many World Cups, we checked that the number of matches is correct, that is, that no match is missing from the dataset, and everything is in order. (There was no World Cup in 1942 or 1946 because of the Second World War.)
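
A check of this kind can be sketched as follows. This is only an illustration: the `toy` DataFrame is a hypothetical miniature of *results.csv*, and the names `matches_per_year` and `missing_per_year` are not part of the notebook.

```python
import pandas as pd

# Hypothetical miniature of results.csv, for illustration only
toy = pd.DataFrame({
    "date": ["1930-07-13", "1930-07-30", "1934-05-27", "2022-12-02"],
    "tournament": ["FIFA World Cup"] * 4,
    "home_score": [4.0, 4.0, 7.0, None],
    "away_score": [1.0, 2.0, 1.0, None],
})

# Year of each match, taken from the ISO date string
toy["year"] = pd.to_datetime(toy["date"]).dt.year

# Number of matches per tournament year
matches_per_year = toy.groupby("year").size()

# Number of matches per year that still lack a score
missing_per_year = toy["home_score"].isna().groupby(toy["year"]).sum()

print(matches_per_year.to_dict())
print(missing_per_year.to_dict())
```

With the real data, the per-year counts could then be compared against the known number of matches of each tournament.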

In this project we deal only with the FIFA World Cup, so we remove the rows for all other tournaments and keep only FIFA World Cup matches:

In [265]:
df = df[df["tournament"] == "FIFA World Cup"].reset_index()

Another change that helps the analyses and visualizations is converting the values of the *date* column, which are strings, into `datetime.date` values, a type that lets us manipulate dates. With that, we create the *year* column with the year in which each match was played.

In [266]:
df_copas = df.copy()
df_copas['date'] = df_copas.loc[:,'date'].apply(lambda x: date.fromisoformat(x))

df_copas['year'] = [i.year for i in df_copas.loc[:,'date']]

df_copas

penalties['date'] = penalties.loc[:,'date'].apply(lambda x: date.fromisoformat(x))

penalties['year'] = [i.year for i in penalties.loc[:,'date']]

penalties



Unnamed: 0,date,home_team,away_team,winner,year
0,1967-08-22,India,Taiwan,Taiwan,1967
1,1971-11-14,South Korea,Vietnam Republic,South Korea,1971
2,1972-05-07,South Korea,Iraq,Iraq,1972
3,1972-05-17,Thailand,South Korea,South Korea,1972
4,1972-05-19,Thailand,Cambodia,Thailand,1972
...,...,...,...,...,...
505,2022-09-23,Iraq,Oman,Oman,2022
506,2022-09-25,Malaysia,Tajikistan,Tajikistan,2022
507,2022-11-16,Lithuania,Iceland,Iceland,2022
508,2022-11-16,Latvia,Estonia,Latvia,2022
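
As an alternative sketch for the date conversion, `pd.to_datetime` parses a whole column at once and the `.dt` accessor exposes the year. Note that this yields pandas `datetime64` values rather than `datetime.date` objects, so it is a variation on the approach used in this notebook, not the same thing. The `toy` frame is illustrative.

```python
import pandas as pd

toy = pd.DataFrame({"date": ["1872-11-30", "1930-07-13", "2022-12-02"]})

# Parse every ISO date string in one vectorized call
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d")

# The .dt accessor gives access to date components such as the year
toy["year"] = toy["date"].dt.year

print(toy["year"].tolist())
```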


In [267]:
home_scores = df_copas[['home_team', 'home_score', 'year']].rename({'home_team':'team', 'home_score':'score'}, axis='columns')
away_scores = df_copas[['away_team', 'away_score', 'year']].rename({'away_team':'team', 'away_score':'score'}, axis='columns')

scores_fifa = pd.concat([home_scores, away_scores]).reset_index(drop=True).fillna(0)

# number of goals scored by each team per year

scores_fifa.groupby(['team','year']).score.sum().unstack().fillna(0)

year,1930,1934,1938,1950,1954,1958,1962,1966,1970,1974,...,1986,1990,1994,1998,2002,2006,2010,2014,2018,2022
team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Algeria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0,0.0
Angola,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
Argentina,18.0,2.0,0.0,0.0,0.0,5.0,2.0,4.0,0.0,9.0,...,14.0,5.0,8.0,10.0,2.0,11.0,10.0,8.0,6.0,5.0
Australia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,5.0,3.0,3.0,2.0,3.0
Austria,0.0,7.0,0.0,0.0,17.0,2.0,0.0,0.0,0.0,0.0,...,0.0,2.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
United Arab Emirates,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
United States,7.0,1.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,2.0,3.0,1.0,7.0,2.0,5.0,5.0,0.0,2.0
Uruguay,15.0,0.0,0.0,15.0,16.0,0.0,4.0,2.0,4.0,1.0,...,2.0,2.0,0.0,0.0,4.0,0.0,11.0,4.0,7.0,0.0
Wales,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [268]:
home_conceded = df_copas[['home_team', 'away_score', 'year']].rename({'home_team':'team', 'away_score':'conceded'}, axis='columns')
away_conceded = df_copas[['away_team', 'home_score', 'year']].rename({'away_team':'team', 'home_score':'conceded'}, axis='columns')

conceded_fifa = pd.concat([home_conceded, away_conceded]).reset_index(drop=True).fillna(0)

# number of goals conceded by each team per year
conceded_fifa.groupby(['team','year']).conceded.sum().unstack().fillna(0)

year,1930,1934,1938,1950,1954,1958,1962,1966,1970,1974,...,1986,1990,1994,1998,2002,2006,2010,2014,2018,2022
team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Algeria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,5.0,0.0,0.0,0.0,0.0,0.0,2.0,7.0,0.0,0.0
Angola,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0
Argentina,9.0,3.0,0.0,0.0,0.0,10.0,3.0,2.0,0.0,12.0,...,5.0,4.0,6.0,4.0,2.0,3.0,6.0,4.0,9.0,2.0
Australia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,...,0.0,0.0,0.0,0.0,0.0,6.0,6.0,9.0,5.0,4.0
Austria,0.0,7.0,0.0,0.0,12.0,7.0,0.0,0.0,0.0,0.0,...,0.0,3.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
United Arab Emirates,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,11.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
United States,6.0,7.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,8.0,4.0,5.0,7.0,6.0,5.0,6.0,0.0,1.0
Uruguay,3.0,0.0,0.0,5.0,9.0,0.0,6.0,5.0,5.0,6.0,...,8.0,5.0,0.0,0.0,5.0,0.0,8.0,6.0,3.0,2.0
Wales,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0


In [269]:
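# Building a DataFrame with the World Cup champions
# The same idea can be used for wins and losses
# Caveat: this takes the first match on the last date of each tournament as the final.
# When several matches share that date, the row picked may not be the final
# (Italy, not Brazil, won the 1938 World Cup, so the 1938 row below may be such a case).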
campeoes = []

for i in df_copas['year'].unique()[:-1]:
  data_ultimo = (df_copas[df_copas['year'] == i])['date'].max()
  ultimo_jogo = df_copas[df_copas['date'] == data_ultimo].reset_index(drop=True)
  if ultimo_jogo['away_score'][0] < ultimo_jogo['home_score'][0]:
    campeoes.append((i,(ultimo_jogo['home_team'])[0]))
  elif ultimo_jogo['away_score'][0] > ultimo_jogo['home_score'][0]:
    campeoes.append((i,(ultimo_jogo['away_team'])[0]))
  else:
    vencedor_penaltis = penalties[penalties['date'] == data_ultimo].reset_index(drop=True)['winner'][0]
    campeoes.append((i,vencedor_penaltis))

campeoes = pd.DataFrame(campeoes, columns = ['year', 'team'])

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
campeoes = pd.concat([campeoes, pd.DataFrame([{'year': 2022, 'team': '?'}])], ignore_index = True)

campeoes

Unnamed: 0,year,team
0,1930,Uruguay
1,1934,Italy
2,1938,Brazil
3,1950,Uruguay
4,1954,Germany
5,1958,Brazil
6,1962,Brazil
7,1966,England
8,1970,Brazil
9,1974,Germany
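
Deciding the winner of a drawn match, as the `else` branch above does for finals, can also be sketched for many matches at once with a left merge against the shootouts table. This assumes a shootout row can be matched to its match by `date`, `home_team` and `away_team`; the two toy rows and the column name `match_winner` are illustrative, not part of the notebook.

```python
import pandas as pd

matches = pd.DataFrame({
    "date": ["1994-07-17", "2014-07-13"],
    "home_team": ["Brazil", "Germany"],
    "away_team": ["Italy", "Argentina"],
    "home_score": [0, 1],
    "away_score": [0, 0],
})
shootouts = pd.DataFrame({
    "date": ["1994-07-17"],
    "home_team": ["Brazil"],
    "away_team": ["Italy"],
    "winner": ["Brazil"],
})

# Attach the shootout winner (NaN when the match had no shootout)
merged = matches.merge(shootouts, on=["date", "home_team", "away_team"], how="left")

home_won = merged["home_score"] > merged["away_score"]
away_won = merged["home_score"] < merged["away_score"]

# Start from the shootout winner, then overwrite with the team that won on goals
merged["match_winner"] = (
    merged["winner"]
    .where(~home_won, merged["home_team"])
    .where(~away_won, merged["away_team"])
)

print(merged[["date", "match_winner"]])
```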


In [280]:
# Building a DataFrame with the number of wins and losses
# of each team in each World Cup

def WL_year(year: int):
  copa_ano = df_copas[df_copas['year'] == year].reset_index()
  teams = copa_ano[['away_team', 'home_team']].to_numpy()
  teams = pd.DataFrame(np.unique(teams.reshape((2*len(teams),1))))

  vitorias = np.zeros(len(teams))
  derrotas = np.zeros(len(teams))

  teams = teams.rename({0: "team"}, axis = 1)
  teams['vitorias'] = vitorias
  teams['derrotas'] = derrotas

  iterador = teams.to_numpy()
  
  for i in range(len(copa_ano)):
    if copa_ano.loc[i,:]['away_score'] > copa_ano.loc[i,:]['home_score']:
      for j in iterador:
        if copa_ano.loc[i,:]['away_team'] in j:
          j[1] += 1
        if copa_ano.loc[i,:]['home_team'] in j:
          j[2] += 1
    if copa_ano.loc[i,:]['away_score'] < copa_ano.loc[i,:]['home_score']:
      for j in iterador:
        if copa_ano.loc[i,:]['away_team'] in j:
          j[2] += 1
        if copa_ano.loc[i,:]['home_team'] in j:
          j[1] += 1    
  # Build the result once, after every match has been counted
  teams = pd.DataFrame(iterador, columns = ['team', 'wins', 'losses'])

  teams['year'] = year
    
  return teams

concatenate = []

for i in df_copas['year'].unique():
  concatenate.append(WL_year(i))  

vitorias_derrotas = pd.concat(concatenate, ignore_index=True)

vitorias_derrotas

### Adding goals scored and goals conceded to this DataFrame

a = conceded_fifa.groupby(['team','year']).conceded.sum().unstack()
b = scores_fifa.groupby(['team','year']).score.sum().unstack()
vitorias_derrotas

gols_marcados = np.zeros(len(vitorias_derrotas))
gols_recebidos = np.zeros(len(vitorias_derrotas))
for i in range(0,len(vitorias_derrotas)):
  time = vitorias_derrotas.loc[i,:]['team']
  ano = vitorias_derrotas.loc[i,:]['year']
  
  if a.loc[time,ano] >= 0:
    gols_recebidos[i] = a.loc[time,ano]
  
  if b.loc[time,ano] >= 0:
    gols_marcados[i] = b.loc[time,ano]

vitorias_derrotas['score'] = gols_marcados
vitorias_derrotas['conceded'] = gols_recebidos

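# The counters come out of the object array as Python objects; make them numeric.
# (df_final is only defined below, so the conversion reads from vitorias_derrotas.)
vitorias_derrotas['losses'] = pd.to_numeric(vitorias_derrotas['losses'])
vitorias_derrotas['wins'] = pd.to_numeric(vitorias_derrotas['wins'])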

df_final = vitorias_derrotas.copy()

df_final

Unnamed: 0,team,wins,losses,year,score,conceded
0,Argentina,4.0,1.0,1930,18.0,9.0
1,Belgium,0.0,2.0,1930,0.0,4.0
2,Bolivia,0.0,2.0,1930,0.0,8.0
3,Brazil,1.0,1.0,1930,5.0,2.0
4,Chile,2.0,1.0,1930,5.0,3.0
...,...,...,...,...,...,...
484,Switzerland,1.0,1.0,2022,1.0,1.0
485,Tunisia,1.0,1.0,2022,1.0,1.0
486,United States,1.0,0.0,2022,2.0,1.0
487,Uruguay,0.0,1.0,2022,0.0,2.0
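
The same win/loss tally can be sketched without explicit loops by stacking each match into two rows, one per team. This is an alternative formulation, not the code used above; `per_team` and `tally` are illustrative names, and the teams and scores are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "year": [1930, 1930, 1930],
    "home_team": ["A", "B", "A"],
    "away_team": ["B", "C", "C"],
    "home_score": [2, 1, 0],
    "away_score": [1, 1, 3],
})

# One row per (match, team): goals for (gf) and goals against (ga)
home = toy.rename(columns={"home_team": "team", "home_score": "gf", "away_score": "ga"})
away = toy.rename(columns={"away_team": "team", "away_score": "gf", "home_score": "ga"})
cols = ["year", "team", "gf", "ga"]
per_team = pd.concat([home[cols], away[cols]], ignore_index=True)

# A win is more goals for than against; a loss is the opposite; draws count as neither
per_team["wins"] = (per_team["gf"] > per_team["ga"]).astype(int)
per_team["losses"] = (per_team["gf"] < per_team["ga"]).astype(int)

tally = per_team.groupby(["year", "team"])[["wins", "losses"]].sum().reset_index()
print(tally)
```

Summing `gf` and `ga` in the same `groupby` would give goals scored and conceded in one pass.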


In [271]:
vitorias_derrotas.groupby(['team','year']).wins.sum().unstack().fillna(0)

year,1930,1934,1938,1950,1954,1958,1962,1966,1970,1974,...,1986,1990,1994,1998,2002,2006,2010,2014,2018,2022
team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Algeria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
Angola,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Argentina,4.0,0.0,0.0,0.0,0.0,1.0,1.0,2.0,0.0,1.0,...,6.0,2.0,2.0,3.0,1.0,3.0,4.0,5.0,1.0,2.0
Australia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,2.0
Austria,0.0,2.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
United Arab Emirates,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
United States,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,2.0,0.0,1.0,1.0,0.0,1.0
Uruguay,4.0,0.0,0.0,3.0,3.0,0.0,1.0,1.0,2.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,3.0,2.0,4.0,0.0
Wales,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Summary statistics and interesting visualizations

In [272]:
# Summary statistics
df_copas.describe()

Unnamed: 0,index,home_score,away_score,year
count,948.0,944.0,944.0,948.0
mean,19997.541139,1.556144,1.258475,1988.691983
std,12948.089267,1.494076,1.304585,23.832796
min,1311.0,0.0,0.0,1930.0
25%,9201.75,0.0,0.0,1974.0
50%,18703.5,1.0,1.0,1994.0
75%,32471.25,2.0,2.0,2010.0
max,44205.0,10.0,8.0,2022.0


In [273]:
fig = px.histogram(df_copas, x="home_score")
fig.update_layout(bargap = 0.1)
fig.show()

In [274]:
fig = px.histogram(df_copas, x="away_score")
fig.update_layout(bargap = 0.1)
fig.show()

Here we see that teams most often score around 1 goal per match. High-scoring results are rare, as expected.

Another interesting piece of information is which team scored the most goals in each World Cup, and how many, that is, the best attack of each World Cup:

In [275]:
gols_marcados = scores_fifa.groupby(['team','year']).score.sum().unstack().fillna(0)

a = list(gols_marcados.max().index.values)
b = list(gols_marcados.max())
c = []

for i in df_copas['year'].unique():
  linha = gols_marcados[i].argmax()
  c.append(gols_marcados.index.values[linha])


maximos_feitos = pd.DataFrame({"maximo_gols": b, "ano": a, "pais": c })

fig = px.bar(maximos_feitos, x="ano", y="maximo_gols", color = "pais", labels = {'ano':'Year', 'maximo_gols':'Most goals scored'}, text = campeoes['team'])
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.update_layout(bargap = 0.2)
fig.show()
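
The lookup of the top-scoring team per year can also be sketched with `idxmax`, which returns the row label of each column's maximum directly. The pivot table below is restricted to three teams and two years, with values copied from the goals table shown earlier, so its result is not the real answer for those tournaments; `goals` and `best_attack` are illustrative names.

```python
import pandas as pd

# Small team-by-year table of goals scored (three teams only, for illustration)
goals = pd.DataFrame(
    {1930: [18.0, 7.0, 15.0], 1934: [2.0, 1.0, 0.0]},
    index=pd.Index(["Argentina", "United States", "Uruguay"], name="team"),
)

best_attack = pd.DataFrame({
    "year": goals.columns,
    "team": goals.idxmax().to_numpy(),      # label of the row holding each column's maximum
    "max_goals": goals.max().to_numpy(),    # the maximum itself
})
print(best_attack)
```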

Along the same lines: which teams conceded the most goals in each World Cup, and how many? That is, which were the worst defenses of each World Cup?

In [276]:
# Answering the question
gols_recebidos = conceded_fifa.groupby(['team','year']).conceded.sum().unstack().fillna(0)

a = list(gols_recebidos.max().index.values)
b = list(gols_recebidos.max())
c = []

for i in df_copas['year'].unique():
  linha = gols_recebidos[i].argmax()
  c.append(gols_recebidos.index.values[linha])


maximos_recebidos = pd.DataFrame({"maximo_gols_recebidos": b, "ano": a, "pais": c })

fig = px.bar(maximos_recebidos, x="ano", y="maximo_gols_recebidos", color = "pais", labels = {'ano':'Year', 'maximo_gols_recebidos':'Most goals conceded'},
             text = campeoes['team'])
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.update_layout(bargap = 0.2)
fig.show()

Continuing in this direction, which teams conceded the fewest goals in each World Cup? That is, which teams had the best defense of each World Cup?

In [277]:
gols_recebidos = conceded_fifa.groupby(['team','year']).conceded.sum().unstack().fillna(0)

gols_recebidos = gols_recebidos[gols_recebidos >= 1]

a = list(gols_recebidos.min().index.values)
b = list(gols_recebidos.min())
c = []

for i in df_copas['year'].unique():
  linha = gols_recebidos[i].argmin()
  c.append(gols_recebidos.index.values[linha])

minimo_recebidos = pd.DataFrame({"minimo_gols_recebidos": b, "ano": a, "pais": c })

fig = px.bar(minimo_recebidos, x="ano", y="minimo_gols_recebidos", color = "pais", labels = {'ano':'Year', 'minimo_gols_recebidos':'Fewest goals conceded'},
             text = campeoes['team'])
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)

fig.update_layout(bargap = 0.2)
fig.show()


['Brazil', 'Romania', 'Norway', 'England', 'France', 'Brazil', 'Germany', 'Argentina', 'Russia', 'Scotland', 'Spain', 'Cameroon', 'Brazil', 'Brazil', 'Norway', 'France', 'Argentina', 'Angola', 'Portugal', 'Costa Rica', 'Denmark', 'Croatia']
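
One caveat of the filter `gols_recebidos >= 1` is that a team that took part and conceded no goals is discarded together with the teams that did not take part, because both show up as 0 after `fillna(0)`. A sketch of an alternative, assuming absent teams are left as `NaN` instead of being filled with 0 (the teams and numbers are invented):

```python
import pandas as pd

conceded = pd.DataFrame({
    "team": ["A", "B", "C", "A", "C"],
    "year": [2006, 2006, 2006, 2010, 2010],
    "conceded": [0.0, 2.0, 6.0, 4.0, 1.0],
})

# No fillna(0): teams absent from a tournament stay NaN in the pivot table
table = conceded.groupby(["team", "year"])["conceded"].sum().unstack()

# min and idxmin skip NaN by default, so a genuine 0 is kept
best_defense = pd.DataFrame({
    "year": table.columns,
    "team": table.idxmin().to_numpy(),
    "min_conceded": table.min().to_numpy(),
})
print(best_defense)
```

Here team B did not play in 2010 and is ignored for that year, while team A's clean record in 2006 is kept.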


**In the three charts above, the name on top of each bar is the champion of that year.**

Relationship between goals scored and wins:

In [287]:
fig = px.scatter(df_final, x = "score", y = "wins", labels = {'score': 'Goals scored', 'wins' : 'Wins'})
fig.show()

In [290]:
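correlacao = scipy.stats.linregress(df_final['score'], df_final['wins'])
# linregress returns (slope, intercept, rvalue, ...): index 0 is the slope of the
# fitted line; the Pearson correlation coefficient is correlacao.rvalue
print("The slope of the regression line is", correlacao[0])
fig = px.scatter(df_final, x = "score", y = "wins", labels = {'score': 'Goals scored', 'wins' : 'Wins'}, trendline = 'ols')
fig.show()

The slope of the regression line is 0.2843749085573587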
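
`scipy.stats.linregress` returns the slope of the fitted line as its first field and the Pearson correlation coefficient as `rvalue`, and the two generally differ. A small sketch with made-up numbers (the arrays `x` and `y` are not taken from the dataset):

```python
import numpy as np
from scipy import stats

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # stand-in for goals scored
y = np.array([0.0, 0.0, 1.0, 1.0, 2.0])   # stand-in for wins

result = stats.linregress(x, y)

# Slope: expected change in y per unit of x
print("slope:", result.slope)

# Pearson correlation coefficient, between -1 and 1
print("correlation coefficient r:", result.rvalue)

# Cross-check of r with numpy
print("r via numpy:", np.corrcoef(x, y)[0, 1])
```

The slope depends on the units of both variables, while `rvalue` measures only how strong the linear relationship is.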


### Conclusions

It is interesting to note that the teams with the best attack or the best defense were not always the World Cup champions in the year they achieved those marks. For goals conceded, almost none of the champions was the team that conceded the fewest goals. For goals scored, it does not always happen, but the champion was more often the team that scored the most goals (although this is not a requirement).

We also observe a positive relationship between the number of goals a team scores in a World Cup and its number of wins, which is expected.

One problem with bivariate analyses of losses is that a World Cup has few matches: a team can be eliminated after losing 3 matches, for example, so there is little variability and no room for a more detailed analysis of this aspect. We therefore present only analyses of wins.

## About this document

This exploratory analysis is admittedly not very deep; its focus is on data processing. Our main goal in this document is to prepare the data for the final project, where we will present other interesting visualizations.

