The goal of this assignment is to build an extensive set of interactive visualizations with the Plotly library.

Data Acquisition and Preparation:

Download COVID-19 data from Our World in Data or a similar source.
Load the data into a Pandas DataFrame and clean it as needed, including handling missing values and converting data types. Optionally compute additional statistics if they are useful for the visualizations.
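
A blanket `fillna(0)` treats every missing report as a true zero, which can distort cumulative columns. The following is a minimal sketch of a more selective cleanup. It assumes the OWID column names used in this notebook and works on a small made-up frame (`raw`) rather than the real CSV.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the OWID CSV (values are illustrative only).
raw = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "date": ["2021-01-03", "2021-01-10", "2021-01-17", "2021-01-03", "2021-01-10"],
    "new_cases": [10, np.nan, 30, 5, 8],
    "total_vaccinations": [np.nan, 100, np.nan, 50, np.nan],
    "population": [1000, 1000, 1000, 2000, 2000],
})

clean = raw.copy()
clean["date"] = pd.to_datetime(clean["date"])  # convert text dates to datetime
clean = clean.sort_values(["location", "date"]).reset_index(drop=True)

# Per-period counts: a missing report is treated as "nothing reported".
clean["new_cases"] = clean["new_cases"].fillna(0)

# Cumulative counts: carry the last known value forward within each country,
# and use 0 only before the first report.
clean["total_vaccinations"] = (
    clean.groupby("location")["total_vaccinations"].ffill().fillna(0)
)

# Derived statistic: new cases per 100,000 inhabitants.
clean["cases_per_100k"] = clean["new_cases"] / clean["population"] * 100_000
```

Whether a gap in `new_cases` really means zero is itself an assumption; dropping those rows is an equally defensible choice.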

Creating Interactive Visualizations:

*     Area Line Chart: Create a chart tracking changes in the number of new cases and deaths over time for several regions at once, with an option to shade the areas between the lines.
*     Interactive Heat Map: Create a chart showing the global spread of cases per 100,000 inhabitants that lets the viewer display the data for a selected day.
*     Pie Chart: Create a chart showing the proportions of different case categories (new cases, new deaths, recoveries) in a selected country.
*     Bar Chart: Build a chart comparing the number of vaccinations administered per day in different countries, with an option to change the axis scale dynamically.
*     Waterfall Chart: Analyze the changes in the number of cases and deaths between consecutive weeks in a selected country.
*     Scatter Plot with Trend Line: Create a scatter plot showing the relationship between new cases and new deaths, with an overlaid trend line, for selected countries.



In [12]:
import pandas as pd
import numpy as np
import plotly.graph_objs as go
import plotly.io as pio
import plotly.offline as pyo
import plotly.express as px
from scipy.stats import linregress

# Initialize Plotly's offline notebook mode
pyo.init_notebook_mode(connected=True)

pio.renderers.default = "notebook"

# Load the COVID-19 data
df = pd.read_csv("data/owid-covid-data.csv")

df['date'] = pd.to_datetime(df['date'])

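# Keep only the columns used in this notebook; .copy() avoids SettingWithCopyWarning
df = df[['location', 'iso_code', 'date', 'new_cases', 'new_deaths',
         'total_vaccinations', 'population', 'people_vaccinated',
         'people_fully_vaccinated', 'new_tests', 'total_cases']].copy()

# Handle missing values (note: this also turns gaps in cumulative columns into zeros)
df = df.fillna(0)

# Keep only rows with reported new cases and a known population,
# so the per-capita rate never divides by zero
df = df[(df['new_cases'] > 0) & (df['population'] > 0)].copy()

# Add a column with new cases per 100,000 inhabitants
df['cases_per_100k'] = (df['new_cases'] / df['population']) * 100000

df.head(10)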


index,location,iso_code,date,new_cases,new_deaths,total_vaccinations,population,people_vaccinated,people_fully_vaccinated,new_tests,total_cases,cases_per_100k
56,Afghanistan,AFG,2020-03-01,1.0,0.0,0.0,41128772,0.0,0.0,0.0,1.0,0.002431
70,Afghanistan,AFG,2020-03-15,6.0,0.0,0.0,41128772,0.0,0.0,0.0,7.0,0.014588
77,Afghanistan,AFG,2020-03-22,17.0,0.0,0.0,41128772,0.0,0.0,0.0,24.0,0.041334
84,Afghanistan,AFG,2020-03-29,67.0,2.0,0.0,41128772,0.0,0.0,0.0,91.0,0.162903
91,Afghanistan,AFG,2020-04-05,183.0,3.0,0.0,41128772,0.0,0.0,0.0,274.0,0.444944
98,Afghanistan,AFG,2020-04-12,247.0,10.0,0.0,41128772,0.0,0.0,0.0,521.0,0.600553
105,Afghanistan,AFG,2020-04-19,387.0,15.0,0.0,41128772,0.0,0.0,0.0,908.0,0.940947
112,Afghanistan,AFG,2020-04-26,422.0,13.0,0.0,41128772,0.0,0.0,0.0,1330.0,1.026046
119,Afghanistan,AFG,2020-05-03,841.0,21.0,0.0,41128772,0.0,0.0,0.0,2171.0,2.044797
126,Afghanistan,AFG,2020-05-10,1392.0,41.0,0.0,41128772,0.0,0.0,0.0,3563.0,3.384492


In [6]:
# Countries to analyze
selected_countries = ["United States", "India", "Brazil", "United Kingdom", "Germany"]

# Filter the data to the selected countries
df_filtered = df[df['location'].isin(selected_countries)]

fig = go.Figure()

for country in selected_countries:
    country_data = df_filtered[df_filtered['location'] == country]

    # New cases: solid line, area filled down to zero
    fig.add_trace(go.Scatter(
        x=country_data['date'],
        y=country_data['new_cases'],
        mode='lines',
        name=f"New cases - {country}",
        fill='tozeroy',
        line=dict(width=2)
    ))

    # New deaths: dotted line, area filled down to zero
    fig.add_trace(go.Scatter(
        x=country_data['date'],
        y=country_data['new_deaths'],
        mode='lines',
        name=f"New deaths - {country}",
        fill='tozeroy',
        line=dict(width=2, dash='dot')
    ))

fig.update_layout(
    title="New COVID-19 cases and deaths over time",
    xaxis_title="Date",
    yaxis_title="Count (cases / deaths)",
    template="plotly_dark",
    hovermode="x",
    xaxis=dict(showgrid=True),
    yaxis=dict(showgrid=True),
)

fig.show(renderer="browser")


In [7]:
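# A log scale keeps a few extreme outbreaks from washing out the color range
df['log_cases_per_100k'] = np.log10(df['cases_per_100k'] + 1)

# Sort by date so the animation frames appear in chronological order
# (Plotly Express orders frames by first appearance in the data)
df_reported_days = df.sort_values('date').copy()
df_reported_days['date_str'] = df_reported_days['date'].dt.strftime("%Y-%m-%d")

min_value = df["log_cases_per_100k"].min()
max_value = df["log_cases_per_100k"].max()

# Build the map
fig = px.choropleth(df_reported_days,
                    locations="iso_code",
                    color="log_cases_per_100k",
                    hover_name="location",
                    animation_frame="date_str",  # date slider
                    title="Spread of COVID-19 over time (log10 of new cases per 100,000)",
                    color_continuous_scale="Reds",
                    projection="natural earth",
                    range_color=(min_value, max_value + 0.1),
                    )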

fig.show(renderer="browser")


In [8]:
# Country to analyze
selected_country = "Poland"

# Filter the data to the selected country
df_country = df[df['location'] == selected_country].copy()

total_cases = df_country['new_cases'].sum()
total_deaths = df_country['new_deaths'].sum()

if 'new_recoveries' in df.columns:
    total_recoveries = df_country['new_recoveries'].sum()
else:
    total_recoveries = total_cases - total_deaths  # Approximate recoveries (when the data has no such column)

data = {
    "Category": ["New cases", "New deaths", "Recoveries"],
    "Count": [total_cases, total_deaths, total_recoveries]
}

fig = px.pie(
    names=data["Category"],
    values=data["Count"],
    title=f"COVID-19 case categories in {selected_country}",
    color=data["Category"],
    color_discrete_map={"New cases": "blue", "New deaths": "red", "Recoveries": "green"}
)

fig.show(renderer="browser")

index,location,iso_code,date,new_cases,new_deaths,total_vaccinations,population,people_vaccinated,people_fully_vaccinated,new_tests,total_cases,cases_per_100k,log_cases_per_100k
56,Afghanistan,AFG,2020-03-01,1.0,0.0,0.0,41128772,0.0,0.0,0.0,1.0,0.002431,0.001055
70,Afghanistan,AFG,2020-03-15,6.0,0.0,0.0,41128772,0.0,0.0,0.0,7.0,0.014588,0.006290
77,Afghanistan,AFG,2020-03-22,17.0,0.0,0.0,41128772,0.0,0.0,0.0,24.0,0.041334,0.017590
84,Afghanistan,AFG,2020-03-29,67.0,2.0,0.0,41128772,0.0,0.0,0.0,91.0,0.162903,0.065543
91,Afghanistan,AFG,2020-04-05,183.0,3.0,0.0,41128772,0.0,0.0,0.0,274.0,0.444944,0.159851
...,...,...,...,...,...,...,...,...,...,...,...,...,...
429385,Zimbabwe,ZWE,2024-06-16,9.0,0.0,0.0,16320539,0.0,0.0,0.0,266374.0,0.055145,0.023312
429392,Zimbabwe,ZWE,2024-06-23,4.0,0.0,0.0,16320539,0.0,0.0,0.0,266378.0,0.024509,0.010516
429399,Zimbabwe,ZWE,2024-06-30,6.0,0.0,0.0,16320539,0.0,0.0,0.0,266384.0,0.036763,0.015680
429406,Zimbabwe,ZWE,2024-07-07,1.0,0.0,0.0,16320539,0.0,0.0,0.0,266385.0,0.006127,0.002653


In [9]:
selected_countries = ["United States", "India", "Brazil", "United Kingdom", "Germany"]

df_vaccination = df[df['location'].isin(selected_countries)]

# total_vaccinations is a cumulative count; drop the zeros introduced by fillna(0),
# which cannot be drawn on a log axis
df_vaccination = df_vaccination[df_vaccination['total_vaccinations'] > 0]

df_vaccination_grouped = df_vaccination.groupby(['date', 'location'], as_index=False)['total_vaccinations'].sum()

fig = px.bar(
    df_vaccination_grouped,
    x="date",
    y="total_vaccinations",
    color="location",
    title="Cumulative COVID-19 vaccinations in selected countries",
    labels={"total_vaccinations": "Total vaccinations", "date": "Date", "location": "Country"},
    barmode="group"
)

# Dropdown that switches the y-axis between linear and logarithmic scale
fig.update_layout(
    updatemenus=[
        {
            "buttons": [
                {"label": "Linear", "method": "relayout", "args": [{"yaxis.type": "linear"}]},
                {"label": "Logarithmic", "method": "relayout", "args": [{"yaxis.type": "log"}]},
            ],
            "direction": "down",
            "showactive": True,
            "x": 0.1,
            "xanchor": "left",
            "y": 1.15,
            "yanchor": "top",
        }
    ],
    yaxis_type="linear",
)

fig.show(renderer="browser")
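
The task asks for vaccinations per day, while `total_vaccinations` is a running total. One way to approximate daily figures is to difference the cumulative series within each country. This is a sketch on made-up numbers; the `daily_vaccinations` column name is an assumption, and it only gives sensible results when gaps in the cumulative column have not been replaced with zeros.

```python
import pandas as pd

# Made-up cumulative counts for two hypothetical countries (illustrative only).
cumulative = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "total_vaccinations": [100, 250, 250, 10, 40, 90],
})

cumulative = cumulative.sort_values(["location", "date"])

# Difference within each country; the first row of a country has no predecessor,
# so its change is set to 0. Negative values (data corrections) are clipped to 0.
cumulative["daily_vaccinations"] = (
    cumulative.groupby("location")["total_vaccinations"]
    .diff()
    .fillna(0)
    .clip(lower=0)
)

print(cumulative)
```

The resulting column could then be passed as `y` to `px.bar` in place of the cumulative total.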


In [10]:
selected_country = "Poland"

df_country = df[df['location'] == selected_country].copy()

# Label each row with its calendar week
df_country['week'] = df_country['date'].dt.to_period('W')

# Weekly totals of new cases and new deaths
df_weekly = df_country.groupby('week')[['new_cases', 'new_deaths']].sum().reset_index()

# Week-over-week change; the first week has no predecessor, so its change is set to 0
df_weekly['cases_diff'] = df_weekly['new_cases'].diff().fillna(0)
df_weekly['deaths_diff'] = df_weekly['new_deaths'].diff().fillna(0)

fig_cases = go.Figure(go.Waterfall(
    x=df_weekly['week'].astype(str),
    y=df_weekly['cases_diff'],
    connector={"line":{"color":"rgb(63, 63, 63)", "dash":"solid", "width":1}},
))

fig_cases.update_layout(
    title=f"Week-over-week change in COVID-19 cases ({selected_country})",
    xaxis_title="Week",
    yaxis_title="Change in cases",
)

fig_deaths = go.Figure(go.Waterfall(
    x=df_weekly['week'].astype(str),
    y=df_weekly['deaths_diff'],
    connector={"line":{"color":"rgb(63, 63, 63)", "dash":"solid", "width":1}},
))

fig_deaths.update_layout(
    title=f"Week-over-week change in COVID-19 deaths ({selected_country})",
    xaxis_title="Week",
    yaxis_title="Change in deaths",
)

fig_cases.show(renderer="browser")
fig_deaths.show(renderer="browser")


In [13]:
selected_country = 'Poland'

df_country = df[df['location'] == selected_country].copy()

# Guard against missing values before fitting (a no-op after the earlier fillna)
df_country = df_country.dropna(subset=['new_cases', 'new_deaths'])

fig = px.scatter(df_country, x='new_cases', y='new_deaths',
                 title=f'New cases vs. new deaths ({selected_country})',
                 labels={'new_cases': 'New cases', 'new_deaths': 'New deaths'})

# Ordinary least-squares fit of new deaths against new cases
slope, intercept, r_value, p_value, std_err = linregress(df_country['new_cases'], df_country['new_deaths'])

# Draw the fitted line over the sorted x values so it is traced once, left to right
x_sorted = df_country['new_cases'].sort_values()
fig.add_scatter(x=x_sorted,
                y=slope * x_sorted + intercept,
                mode='lines',
                name=f'Trend line (r = {r_value:.2f})',
                line=dict(color='red', dash='dash'))

fig.show(renderer="browser")
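
The task mentions a trend line for selected countries, while the cell above fits a single country. A per-country fit could look like the sketch below. It uses `numpy.polyfit` with degree 1 instead of `scipy.stats.linregress` to keep the example dependency-light, and the data is made up.

```python
import numpy as np
import pandas as pd

# Made-up points for two hypothetical countries (illustrative only).
points = pd.DataFrame({
    "location": ["A"] * 4 + ["B"] * 4,
    "new_cases": [100, 200, 300, 400, 100, 200, 300, 400],
    "new_deaths": [3, 5, 7, 9, 1, 2, 3, 4],
})

# Fit new_deaths = slope * new_cases + intercept separately for each country.
trends = {}
for country, group in points.groupby("location"):
    slope, intercept = np.polyfit(group["new_cases"], group["new_deaths"], deg=1)
    trends[country] = (slope, intercept)

for country, (slope, intercept) in trends.items():
    print(f"{country}: slope={slope:.4f}, intercept={intercept:.4f}")
```

Each `(slope, intercept)` pair can then be drawn with `fig.add_scatter` in the same way as the single-country trend line.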
