STAGE 3: Advanced visualizations
* Objective: Deepen the visual analysis by comparing regions and trends.
* Time range: 2 years of data (2021–2022).
* Difficulty: Medium-high (handling large datasets and building aggregated charts).

In [30]:
# %pylab is discouraged because it overwrites names in the namespace; %matplotlib inline is enough here
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import requests
from datetime import datetime

In [31]:
base_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"

In [32]:
start_date = datetime(2021,1,1)
end_date = datetime(2022,12,31)

dates = pd.date_range(start=start_date, end=end_date, freq="D")
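
Before downloading, it can help to check that the date range and the file names match the repository's `MM-DD-YYYY.csv` naming. A minimal sketch, where `report_filenames` is a hypothetical helper that is not part of the notebook:

```python
from datetime import datetime

import pandas as pd


def report_filenames(start, end):
    """Return one daily-report file name (MM-DD-YYYY.csv) per calendar day."""
    days = pd.date_range(start=start, end=end, freq="D")
    return [d.strftime("%m-%d-%Y") + ".csv" for d in days]


names = report_filenames(datetime(2021, 1, 1), datetime(2022, 12, 31))
print(len(names), names[0], names[-1])
```

Comparing `len(names)` with the counter `a` after the download loop shows how many files failed to load.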

In [33]:
dframes = []
a = 0  # number of daily files loaded successfully

for d in dates:
    archivo = d.strftime("%m-%d-%Y") + ".csv"  # daily reports are named MM-DD-YYYY.csv
    url = base_url + archivo
    try:
        df = pd.read_csv(url)
        df["source_file"] = archivo  # keep track of the originating file
        dframes.append(df)
        a += 1
    except Exception as e:
        print(f"Error reading {archivo}: {e}")

In [34]:
DF2021_2022 = pd.concat(dframes, ignore_index=True)

In [35]:
DF2021_2022.head()

,FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio,source_file
0,,,,Afghanistan,2021-01-02 05:22:33,33.93911,67.709953,52513,2201,41727.0,8585.0,Afghanistan,134.896578,4.191343,01-01-2021.csv
1,,,,Albania,2021-01-02 05:22:33,41.1533,20.1683,58316,1181,33634.0,23501.0,Albania,2026.409062,2.025173,01-01-2021.csv
2,,,,Algeria,2021-01-02 05:22:33,28.0339,1.6596,99897,2762,67395.0,29740.0,Algeria,227.809861,2.764848,01-01-2021.csv
3,,,,Andorra,2021-01-02 05:22:33,42.5063,1.5218,8117,84,7463.0,570.0,Andorra,10505.403482,1.034865,01-01-2021.csv
4,,,,Angola,2021-01-02 05:22:33,-11.2027,17.8739,17568,405,11146.0,6017.0,Angola,53.452981,2.305328,01-01-2021.csv


In [36]:
DF2021_2022.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2931108 entries, 0 to 2931107
Data columns (total 15 columns):
 #   Column               Dtype  
---  ------               -----  
 0   FIPS                 float64
 1   Admin2               object 
 2   Province_State       object 
 3   Country_Region       object 
 4   Last_Update          object 
 5   Lat                  float64
 6   Long_                float64
 7   Confirmed            int64  
 8   Deaths               int64  
 9   Recovered            float64
 10  Active               float64
 11  Combined_Key         object 
 12  Incident_Rate        float64
 13  Case_Fatality_Ratio  float64
 14  source_file          object 
dtypes: float64(7), int64(2), object(6)
memory usage: 335.4+ MB


In [37]:
# Parse timestamps; errors="coerce" turns unparseable values into NaT instead of raising
DF2021_2022["Last_Update"] = pd.to_datetime(DF2021_2022["Last_Update"], errors="coerce")
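
The effect of `errors="coerce"` can be seen on a tiny example. This is an illustrative sketch with made-up values, not data from the reports:

```python
import pandas as pd

raw = pd.Series(["2021-01-02 05:22:33", "not a date", None])

# Unparseable strings and missing values both end up as NaT.
parsed = pd.to_datetime(raw, errors="coerce")

# Values that were present in the input but could not be parsed.
n_failed = int(parsed.isna().sum() - raw.isna().sum())
print(parsed)
print("lost to parsing:", n_failed)
```

Taking the same difference of null counts before and after converting `Last_Update` would show how many timestamps were lost to parsing rather than missing in the source files.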

In [38]:
DF2021_2022.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2931108 entries, 0 to 2931107
Data columns (total 15 columns):
 #   Column               Dtype         
---  ------               -----         
 0   FIPS                 float64       
 1   Admin2               object        
 2   Province_State       object        
 3   Country_Region       object        
 4   Last_Update          datetime64[ns]
 5   Lat                  float64       
 6   Long_                float64       
 7   Confirmed            int64         
 8   Deaths               int64         
 9   Recovered            float64       
 10  Active               float64       
 11  Combined_Key         object        
 12  Incident_Rate        float64       
 13  Case_Fatality_Ratio  float64       
 14  source_file          object        
dtypes: datetime64[ns](1), float64(7), int64(2), object(5)
memory usage: 335.4+ MB


In [39]:
DF2021_2022.isnull().sum()

FIPS,546018
Admin2,542820
Province_State,130650
Country_Region,0
Last_Update,37
Lat,66158
Long_,66158
Confirmed,0
Deaths,0
Recovered,2554964


In [40]:
DF2021_2022.drop(columns=["FIPS","Lat","Long_"],inplace=True)
DF2021_2022.head()

,Admin2,Province_State,Country_Region,Last_Update,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio,source_file
0,,,Afghanistan,2021-01-02 05:22:33,52513,2201,41727.0,8585.0,Afghanistan,134.896578,4.191343,01-01-2021.csv
1,,,Albania,2021-01-02 05:22:33,58316,1181,33634.0,23501.0,Albania,2026.409062,2.025173,01-01-2021.csv
2,,,Algeria,2021-01-02 05:22:33,99897,2762,67395.0,29740.0,Algeria,227.809861,2.764848,01-01-2021.csv
3,,,Andorra,2021-01-02 05:22:33,8117,84,7463.0,570.0,Andorra,10505.403482,1.034865,01-01-2021.csv
4,,,Angola,2021-01-02 05:22:33,17568,405,11146.0,6017.0,Angola,53.452981,2.305328,01-01-2021.csv


In [41]:
DF2021_2022.columns = DF2021_2022.columns.str.lower().str.replace(" ", "_").str.replace("-", "_")

In [42]:
DF2021_2022.columns

Index(['admin2', 'province_state', 'country_region', 'last_update',
       'confirmed', 'deaths', 'recovered', 'active', 'combined_key',
       'incident_rate', 'case_fatality_ratio', 'source_file'],
      dtype='object')

In [43]:
DF2021_2022.loc[DF2021_2022["country_region"] == "US", "country_region"] = "United States"

In [44]:
DF2021_2022["last_update"] = DF2021_2022["last_update"].dt.date
DF2021_2022["last_update"].head()

,last_update
0,2021-01-02
1,2021-01-02
2,2021-01-02
3,2021-01-02
4,2021-01-02
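
In the sample above, the rows from `01-01-2021.csv` carry a `last_update` of 2021-01-02, so the update timestamp does not necessarily match the report day. A possible alternative, sketched here under the assumption that the file name is the reporting date, is to derive a date column from `source_file` (`report_date` is a hypothetical column name):

```python
import pandas as pd

# Toy frame with the same source_file format used in the notebook.
df = pd.DataFrame({"source_file": ["01-01-2021.csv", "12-31-2022.csv"]})

# Strip the extension and parse the remaining MM-DD-YYYY string.
df["report_date"] = pd.to_datetime(
    df["source_file"].str.replace(".csv", "", regex=False),
    format="%m-%d-%Y",
)
print(df)
```

Keeping the column as `datetime64` rather than converting it with `.dt.date` leaves date-based grouping and resampling available for the time-series charts.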


In [45]:
# Rows without a "recovered" value yield NaN here
DF2021_2022["active_cases"] = DF2021_2022["confirmed"] - DF2021_2022["deaths"] - DF2021_2022["recovered"]
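
Subtraction propagates missing values, and the null counts earlier in the notebook show that `recovered` is empty in most rows. The sketch below, on invented numbers, contrasts the plain subtraction with a variant that treats a missing `recovered` as zero. That substitution is an assumption, and it overstates active cases wherever recoveries happened but were not reported:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "confirmed": [100, 200],
    "deaths": [5, 10],
    "recovered": [60.0, np.nan],  # second row has no recovery figure
})

# Plain subtraction: NaN in "recovered" makes the result NaN.
strict = toy["confirmed"] - toy["deaths"] - toy["recovered"]

# Assumption: a missing recovery figure counts as zero recoveries.
lenient = toy["confirmed"] - toy["deaths"] - toy["recovered"].fillna(0)

print(strict.tolist())
print(lenient.tolist())
```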

1. Global time evolution of confirmed, active, and fatal cases (line chart).

2. Comparison of the Top 10 countries with the most confirmed cases (bar chart).

3. Heatmap of correlations between relevant columns (confirmed, deaths, active, ratio).

4. Horizontal bar chart comparing case fatality rates by continent.
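
The daily reports hold cumulative counts with several rows per country, so the first two charts need an aggregation step before plotting. A minimal sketch on a toy frame, assuming a `report_date` column parsed from `source_file` (the column and the three helper functions are hypothetical names, not part of the notebook):

```python
import pandas as pd

# Toy stand-in for DF2021_2022: cumulative counts, several rows per country and day.
toy = pd.DataFrame({
    "report_date": pd.to_datetime(["2021-01-01"] * 3 + ["2021-01-02"] * 3),
    "country_region": ["A", "A", "B", "A", "A", "B"],
    "confirmed": [10, 5, 7, 12, 6, 9],
    "deaths": [1, 0, 1, 1, 1, 1],
})


def global_daily_totals(df):
    """Sum the cumulative counts of every row for each report day."""
    return df.groupby("report_date")[["confirmed", "deaths"]].sum().sort_index()


def top_countries(df, n=10):
    """Countries with the most confirmed cases on the last available day."""
    latest = df[df["report_date"] == df["report_date"].max()]
    return latest.groupby("country_region")["confirmed"].sum().nlargest(n)


def plot_global_evolution(totals):
    """Line chart of the daily totals (matplotlib is imported only when plotting)."""
    import matplotlib.pyplot as plt

    ax = totals.plot(figsize=(10, 5), title="Global cumulative cases")
    ax.set_xlabel("Report date")
    ax.set_ylabel("Cumulative count")
    plt.tight_layout()
    return ax


totals = global_daily_totals(toy)
top = top_countries(toy)
print(totals)
print(top)
```

In the notebook, `plot_global_evolution(totals)` would draw the lines for item 1 and `top.plot.bar()` the bars for item 2. For item 3, a correlation matrix from `DataFrame.corr()` can be passed to `sns.heatmap(..., annot=True)`.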

In [46]:
# Case fatality rate (%) per row; NaN where there are no confirmed cases
DF2021_2022["tasa_letalidad"] = np.where(DF2021_2022["confirmed"] > 0, (DF2021_2022["deaths"] / DF2021_2022["confirmed"]) * 100, np.nan)
# Average of the per-row rates for each country, highest first
tasa_letalidad = (DF2021_2022.groupby("country_region")["tasa_letalidad"].mean().sort_values(ascending=False))
tasa_letalidad.head(10)

country_region,tasa_letalidad
"Korea, North",600.0
MS Zaandam,22.222222
Yemen,19.737305
Vanuatu,9.076662
Mexico,7.750671
Belgium,7.488178
Sudan,7.330057
Peru,6.93995
Syria,6.149274
Egypt,5.321751


5. Map or geographic chart showing incidence by continent or country (optional).
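
The columns listed earlier include no continent field, so any per-continent comparison needs an external mapping. A minimal sketch using the five rows from the `head()` output and a hand-written, deliberately incomplete dictionary (`CONTINENT_BY_COUNTRY` is hypothetical; the real data would need a full lookup table):

```python
import pandas as pd

# Hand-written mapping for illustration only; it covers just the sample rows.
CONTINENT_BY_COUNTRY = {
    "Afghanistan": "Asia",
    "Albania": "Europe",
    "Algeria": "Africa",
    "Andorra": "Europe",
    "Angola": "Africa",
}

# Values copied from the 01-01-2021 sample shown earlier in the notebook.
sample = pd.DataFrame({
    "country_region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "confirmed": [52513, 58316, 99897, 8117, 17568],
    "deaths": [2201, 1181, 2762, 84, 405],
})

sample["continent"] = sample["country_region"].map(CONTINENT_BY_COUNTRY)

# Rate from summed totals, so large countries weigh more than small ones.
by_continent = sample.groupby("continent")[["confirmed", "deaths"]].sum()
cfr_by_continent = (
    by_continent["deaths"] / by_continent["confirmed"] * 100
).sort_values(ascending=False)
print(cfr_by_continent.round(2))
```

`cfr_by_continent.plot.barh()` would then draw the horizontal bars. Countries missing from the dictionary get a NaN continent and are left out by `groupby`, so the coverage of the mapping should be checked first.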