# Data analysis with pandas

In [1]:
import pandas as pd

### Opening the CSV

In [2]:
df = pd.read_csv("C:\\Users\\daniel.martinezcarre\\Desktop\\datasets\\datos_covid2021_3paises.csv")
df

| | Unnamed: 0 | dia | pais | contaxios | mortes |
| --- | --- | --- | --- | --- | --- |
| 0 | 0 | 2021-01-01 | Spain | 18047 | 148 |
| 1 | 1 | 2021-01-02 | Spain | 0 | 0 |
| 2 | 2 | 2021-01-03 | Spain | 0 | 0 |
| 3 | 3 | 2021-01-04 | Spain | 30579 | 241 |
| 4 | 4 | 2021-01-05 | Spain | 23700 | 352 |
| ... | ... | ... | ... | ... | ... |
| 1090 | 1090 | 2021-12-27 | France | 29614 | 245 |
| 1091 | 1091 | 2021-12-28 | France | 179316 | 294 |
| 1092 | 1092 | 2021-12-29 | France | 202293 | 173 |
| 1093 | 1093 | 2021-12-30 | France | 206243 | 180 |
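The `Unnamed: 0` column repeats the row index, which typically happens when a DataFrame is saved together with its index and then read back. A minimal sketch, assuming that is what happened here, of reading such a file with `index_col=0` and parsing `dia` as dates. The inline `csv_text` is a hypothetical stand-in for the real file, built from the first rows shown above.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: the first column has no header,
# as happens when a DataFrame is written together with its index.
csv_text = (
    ",dia,pais,contaxios,mortes\n"
    "0,2021-01-01,Spain,18047,148\n"
    "1,2021-01-02,Spain,0,0\n"
    "2,2021-01-03,Spain,0,0\n"
)

# index_col=0 uses the unnamed first column as the index instead of
# keeping it as a data column; parse_dates converts "dia" to datetime64.
toy = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=["dia"])

print(toy.dtypes)
```

With the real file, the same two arguments could be passed to the `pd.read_csv` call used in this notebook.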



### Clean null and/or invalid/empty values

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1095 entries, 0 to 1094
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  1095 non-null   int64 
 1   dia         1095 non-null   object
 2   pais        1095 non-null   object
 3   contaxios   1095 non-null   int64 
 4   mortes      1095 non-null   int64 
dtypes: int64(3), object(2)
memory usage: 42.9+ KB


Every column has 1095 non-null entries out of 1095 rows, so there are no null values.
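As a complement to `df.info()`, the null count per column can be computed directly. The sketch below uses a small hypothetical DataFrame (`toy`, with one missing value added on purpose) rather than the real data. It also counts zero values, since a zero is not a null; whether the zeros in this dataset represent days without reporting is an assumption that would need checking against the data source.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one NaN inserted on purpose to show the check.
toy = pd.DataFrame({
    "pais": ["Spain", "Spain", "Spain", "Spain"],
    "contaxios": [18047, 0, np.nan, 30579],
    "mortes": [148, 0, 2, 241],
})

# Number of missing values in each column.
nulls_per_column = toy.isna().sum()

# Zeros are valid numbers for pandas, so they must be counted separately.
zero_days = (toy["contaxios"] == 0).sum()

# Drop the rows that contain at least one missing value.
cleaned = toy.dropna()

print(nulls_per_column)
print(zero_days, len(cleaned))
```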

### 1. Compute the mean number of new cases per day

In [4]:
df.describe()

| | Unnamed: 0 | contaxios | mortes |
| --- | --- | --- | --- |
| count | 1095.0 | 1095.0 | 1095.0 |
| mean | 547.0 | 12094.346119 | 98.001826 |
| std | 316.243577 | 22131.97109 | 156.600004 |
| min | 0.0 | 0.0 | 0.0 |
| 25% | 273.5 | 740.0 | 6.0 |
| 50% | 547.0 | 4197.0 | 29.0 |
| 75% | 820.5 | 14774.5 | 126.5 |
| max | 1094.0 | 232200.0 | 1163.0 |
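The answer is the `mean` row of the `contaxios` column in the summary above, about 12094.35 new cases per row (one row per country and day). A minimal sketch, on a hypothetical three-row DataFrame, showing that this figure can also be read from `describe()` by label or obtained directly with `Series.mean()`:

```python
import pandas as pd

# Hypothetical sample values, chosen only to make the mean easy to check.
toy = pd.DataFrame({"contaxios": [10, 20, 60], "mortes": [1, 2, 3]})

# Pick the single cell of interest out of the describe() summary.
mean_from_describe = toy.describe().loc["mean", "contaxios"]

# Compute the same statistic directly on the column.
mean_direct = toy["contaxios"].mean()

print(mean_from_describe, mean_direct)
```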


### 2. Compute the mean number of deaths per day

In [11]:
print(df["mortes"].mean())

98.00182648401827


### 3. Compute the maximum number of cases

In [12]:
print(df["contaxios"].max())

232200
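The maximum alone does not say when or where it occurred. A short sketch, assuming one wants the full row, using `idxmax()` to locate it. The `toy` DataFrame holds three France rows copied from outputs shown in this notebook.

```python
import pandas as pd

# Three France rows taken from the outputs displayed in this notebook.
toy = pd.DataFrame({
    "dia": ["2021-12-29", "2021-12-30", "2021-12-31"],
    "pais": ["France", "France", "France"],
    "contaxios": [202293, 206243, 232200],
})

# idxmax() returns the index label of the largest value;
# .loc then retrieves the whole row at that label.
row_of_max = toy.loc[toy["contaxios"].idxmax()]

print(row_of_max)
```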


### 4. Compute the mean number of cases by group

In [5]:
df_media_paises = df.groupby("pais")["contaxios"].mean()
df_media_paises

pais
France      20638.731507
Portugal     2666.789041
Spain       12977.517808
Name: contaxios, dtype: float64

### 5. Is there a correlation between the number of new cases and deaths?

In [6]:
correlacion_cont_mortes = df["contaxios"].corr(df["mortes"])
print(correlacion_cont_mortes)

0.3852333025521261


### 6. If there is a relationship, in which country is it strongest?

In [7]:
correlacion_paises = df.groupby("pais")["contaxios"].corr(df["mortes"])
print(correlacion_paises)

pais
France      0.298286
Portugal    0.563438
Spain       0.323957
Name: contaxios, dtype: float64


The correlation between cases and deaths is strongest in **Portugal** (about 0.56, versus 0.32 in Spain and 0.30 in France).
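An alternative way to get one coefficient per country, sketched here on hypothetical data, is to correlate the two columns inside each group explicitly. This makes it visible that only rows of the same country are compared. The country labels `A` and `B` and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: country A is perfectly correlated, country B is not.
toy = pd.DataFrame({
    "pais": ["A", "A", "A", "B", "B", "B"],
    "contaxios": [1, 2, 3, 1, 2, 3],
    "mortes": [2, 4, 6, 3, 1, 2],
})

# For each country, correlate its own cases with its own deaths.
corr_by_country = (
    toy.groupby("pais")[["contaxios", "mortes"]]
    .apply(lambda g: g["contaxios"].corr(g["mortes"]))
)

# Country whose coefficient has the largest absolute value.
strongest = corr_by_country.abs().idxmax()

print(corr_by_country)
print(strongest)
```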

### 7. What is the median number of confirmed cases in Portugal?

In [8]:
mediana_portugal = df[ df["pais"] == "Portugal" ]["contaxios"].median()
print(mediana_portugal)

1186.0


### 8. What are the Q1 and Q3 quartiles of the number of cases in France?

In [9]:
import numpy as np
# The 25th and 75th percentiles are the first and third quartiles.
q1_france, q3_france = np.percentile(df[df["pais"] == "France"]["contaxios"], [25, 75])
print(f"Q1: {q1_france}, Q3: {q3_france}")

Q1: 4256.0, Q3: 25087.0
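The same quartiles can be obtained without NumPy through `Series.quantile`, which also interpolates linearly by default. A minimal sketch on a hypothetical five-value series, including the interquartile range as a derived quantity:

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen so the quartiles are easy to verify.
toy = pd.Series([1, 2, 3, 4, 5], name="contaxios")

# quantile() takes fractions between 0 and 1.
q1, q3 = toy.quantile([0.25, 0.75])

# np.percentile() takes percentages between 0 and 100.
np_q1, np_q3 = np.percentile(toy, [25, 75])

# Interquartile range: spread of the central half of the data.
iqr = q3 - q1

print(q1, q3, iqr)
```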


### 9. How many distinct values does the daily number of cases take in each country?

In [16]:
df_valores_diferentes = df.groupby("pais")["contaxios"].unique()
df_valores_diferentes

pais
France      [19143, 3359, 12489, 4022, 20280, 25020, 0, 41...
Portugal    [6951, 3241, 3384, 4369, 4956, 10027, 9927, 10...
Spain       [18047, 0, 30579, 23700, 42360, 25456, 61422, ...
Name: contaxios, dtype: object
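`unique()` lists the distinct values, while the question asks how many there are. A sketch of counting them with `nunique()`, shown on a small sample built from values that appear in the outputs above; the real counts are not reproduced here.

```python
import pandas as pd

# Small sample: Spain repeats the value 0, France has no repeats.
toy = pd.DataFrame({
    "pais": ["France", "France", "France", "Spain", "Spain", "Spain"],
    "contaxios": [19143, 3359, 12489, 18047, 0, 0],
})

# nunique() counts the distinct values within each group.
distinct_counts = toy.groupby("pais")["contaxios"].nunique()

print(distinct_counts)
```

On the real data, the equivalent call would be `df.groupby("pais")["contaxios"].nunique()`.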

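### 10. Minimum and maximum number of daily cases for each country

> The **agg** function computes the minimum and maximum of the "contaxios" column for each group. Here each group is one (`dia`, `pais`) pair, so it contains a single row and the two values coincide.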

In [18]:
df_agrupado = df.groupby(["dia", "pais"])
df_min_max = df_agrupado["contaxios"].agg(["min", "max"])

df_min_max

| dia | pais | min | max |
| --- | --- | --- | --- |
| 2021-01-01 | France | 19143 | 19143 |
| 2021-01-01 | Portugal | 6951 | 6951 |
| 2021-01-01 | Spain | 18047 | 18047 |
| 2021-01-02 | France | 3359 | 3359 |
| 2021-01-02 | Portugal | 3241 | 3241 |
| ... | ... | ... | ... |
| 2021-12-30 | Portugal | 28659 | 28659 |
| 2021-12-30 | Spain | 161688 | 161688 |
| 2021-12-31 | France | 232200 | 232200 |
| 2021-12-31 | Portugal | 30829 | 30829 |
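Grouping by `pais` alone yields one row per country with its minimum and maximum over all days. A minimal sketch of that variant, using four rows whose values appear in the outputs of this notebook:

```python
import pandas as pd

# Two days for two countries, with values taken from the outputs above.
toy = pd.DataFrame({
    "dia": ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"],
    "pais": ["France", "France", "Spain", "Spain"],
    "contaxios": [19143, 3359, 18047, 0],
})

# One group per country, so min and max summarize all of its days.
min_max_by_country = toy.groupby("pais")["contaxios"].agg(["min", "max"])

print(min_max_by_country)
```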
