[Data Source](https://srhdpeuwpubsa.blob.core.windows.net/whdh/COVID/WHO-COVID-19-global-hosp-icu-data.csv)

### FYI
#### Weekly COVID-19 hospitalizations and ICU admissions by date reported to WHO
Updated weekly. Users should note that, in addition to capturing new hospitalizations and ICU admissions reported on any given week, 
updates are made retrospectively to correct counts on previous weeks as needed based on subsequent information received.

# Post-Pandemic COVID-19 Weekly Hospitalizations and ICU Admissions

In [279]:
import warnings
warnings.filterwarnings('ignore')

In [281]:
import requests

In [283]:
# Download a file from its source URL and save it locally
def download_csv(url: str, output_file: str):
    try:
        # A timeout keeps the call from hanging indefinitely on a stalled connection
        response = requests.get(url, timeout=60)
        # Raise an error if the server returns a bad status code
        response.raise_for_status()

        with open(output_file, 'wb') as f:
            f.write(response.content)

        print(f"Download complete: {output_file}")
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")

In [None]:
# Now, we use the function above to download our file and save it in the current directory
data_url = "https://srhdpeuwpubsa.blob.core.windows.net/whdh/COVID/WHO-COVID-19-global-hosp-icu-data.csv"
output_filename = "WHO_COVID19_Hospital_ICU_Data.csv"
download_csv(data_url, output_filename)

In [None]:
data_url = "https://srhdpeuwpubsa.blob.core.windows.net/whdh/COVID/vaccination-data.csv"
output_filename = "vaccination-data.csv"
download_csv(data_url, output_filename)

In [None]:
!ls -lh

In [288]:
# Now, we import pandas and use it to load our dataset into a pandas DataFrame
import pandas as pd

In [None]:
df = pd.read_csv("WHO_COVID19_Hospital_ICU_Data.csv")

df.info()
df.head()

In [131]:
# Unfortunately, this dataset doesn't have many fields to work with.
# Below are some of the analyses we can perform on it

**Time Series Analysis**
  1) **Trend analysis** over time for:
     * Total hospitalizations/ICU admissions per date globally or per country
  2) **Rolling averages** (e.g., 7-day, 28-day) to smooth out volatility
  3) **Seasonal patterns** in hospitalization and ICU spikes

**Geographic Analysis**
  1) **Compare by WHO region or country**:
     * Regional trends in hospital or ICU admissions
     * Top countries with the highest or lowest 7-day/28-day averages
  2) **Heatmap** or **choropleth** showing geographic variation in recent hospitalizations

**Comparative/Segmented Analysis**
  1) **Compare metrics** across time windows (7-day vs 28-day trends)
  2) **Regional comparisons** (e.g., AFRO vs EURO)
  3) **Country** performance over **time**, highlighting outbreaks or recovery trends

**Trend Detection & Alerts**
  1) **Spike detection** for sharp increases in ICU or hospital admissions
  2) **Threshold monitoring**, e.g., countries exceeding certain ICU admission rates
  3) Ratio metrics, like ICU to hospitalization ratio, to monitor severity

**Predictive Modeling**
  1) **Forecast future hospital/ICU admissions**
  2) **Identify predictors** of ICU surges using regression or machine learning


## Time Series Analysis

In [291]:
import plotly.graph_objects as go

In [None]:
# We prepare our data
df['Date_reported'] = pd.to_datetime(df['Date_reported'])

# Then, we group by date
trend_global = df.groupby('Date_reported')[[
    'Covid_new_hospitalizations_last_7days',
    'Covid_new_icu_admissions_last_7days',
    'Covid_new_hospitalizations_last_28days',
    'Covid_new_icu_admissions_last_28days'
]].sum().fillna(0)

# Next, we create interactive figure
fig = go.Figure()

fig.add_trace(go.Scatter(x=trend_global.index, y=trend_global['Covid_new_hospitalizations_last_7days'],
                         mode='lines', name='7-day Hospitalizations'))

fig.add_trace(go.Scatter(x=trend_global.index, y=trend_global['Covid_new_icu_admissions_last_7days'],
                         mode='lines', name='7-day ICU Admissions'))

fig.add_trace(go.Scatter(x=trend_global.index, y=trend_global['Covid_new_hospitalizations_last_28days'],
                         mode='lines', name='28-day Hospitalizations'))

fig.add_trace(go.Scatter(x=trend_global.index, y=trend_global['Covid_new_icu_admissions_last_28days'],
                         mode='lines', name='28-day ICU Admissions'))

fig.update_layout(
    title='Global COVID-19 Hospitalizations and ICU Admissions Over Time',
    xaxis_title='Date',
    yaxis_title='Count',
    template='plotly_white',
    hovermode='x unified'
)

fig.show()

The chart above shows the global trend of **COVID-19 hospitalizations** and **ICU admissions** over time, broken down by **7-day** and **28-day** intervals.
We can clearly observe fluctuations in admissions, with some pronounced **spikes** that likely correspond to specific waves of the pandemic.

In [140]:
import plotly.express as px

In [None]:
# Next, we group by WHO region and date
trend_by_region = df.groupby(['WHO_region', 'Date_reported'])[
    ['Covid_new_hospitalizations_last_7days',
     'Covid_new_icu_admissions_last_7days',
     'Covid_new_hospitalizations_last_28days',
     'Covid_new_icu_admissions_last_28days']
].sum().reset_index().fillna(0)

# Then, we plot 7-day hospitalizations
fig = px.line(
    trend_by_region,
    x='Date_reported',
    y='Covid_new_hospitalizations_last_7days',
    color='WHO_region',
    title='7-Day COVID-19 Hospitalizations by WHO Region Over Time',
    labels={
        'Date_reported': 'Date',
        'Covid_new_hospitalizations_last_7days': 'Hospitalizations (7-day)'
    },
    template='plotly_white'
)

fig.update_layout(hovermode='x unified')
fig.show()

In [None]:
# Again, we group by WHO region and date
trend_by_region = df.groupby(['WHO_region', 'Date_reported'])[
    ['Covid_new_hospitalizations_last_28days']
].sum().reset_index().fillna(0)

# And, we plot 28-day hospitalizations
fig = px.line(
    trend_by_region,
    x='Date_reported',
    y='Covid_new_hospitalizations_last_28days',
    color='WHO_region',
    title='28-Day COVID-19 Hospitalizations by WHO Region Over Time',
    labels={
        'Date_reported': 'Date',
        'Covid_new_hospitalizations_last_28days': 'Hospitalizations (28-day)'
    },
    template='plotly_white'
)

fig.update_layout(hovermode='x unified')
fig.show()

From the **7-day** and **28-day** hospitalization trends by WHO region, we can draw the following conclusions:
**Short-Term** vs. **Long-Term Trends**
  * **7-day** data shows short-term fluctuations, which can highlight recent outbreaks, data corrections, or reporting spikes.
  * **28-day** data offers a smoothed perspective, helping to observe sustained trends or waves without the noise of week-to-week variation.

**Regional Variation**
  1) Some WHO regions consistently report higher hospitalization volumes. This may be due to:
     * Population size
     * Testing/reporting infrastructure
     * Healthcare access and hospital admission policies
  2) Other regions may show sharp peaks followed by rapid declines, which could suggest:
     * Contained outbreaks
     * Policy changes (lockdowns, restrictions)
     * Underreporting followed by catch-up data

**Pattern Repetition**
  * If both **7-day** and **28-day** trends show parallel movement, that reinforces the validity of observed trends (e.g., an actual surge, not a one-off report).
  * If the **7-day** trend fluctuates while the **28-day** trend remains flat, that may indicate inconsistent reporting or short-lived events.

**Actionable Insights**
  * Regions with rising 28-day hospitalizations deserve further attention — they may be in the middle of or entering a new wave.
  * Regions with declining or stable trends in both 7-day and 28-day measures may be recovering or maintaining control.
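
The "rising 28-day hospitalizations" check can be sketched in a few lines. This is a minimal illustration on made-up numbers: the helper name `rising_regions` and the choice to compare only the two most recent reports are assumptions, not part of the analysis above.

```python
import pandas as pd

def rising_regions(trend: pd.DataFrame, value_col: str) -> list:
    """Return regions whose latest value exceeds the value of the previous report."""
    ordered = trend.sort_values(['WHO_region', 'Date_reported'])
    # Change versus the previous reporting date within each region
    change = ordered.groupby('WHO_region')[value_col].diff()
    # Keep the last row per region and test whether its change is positive
    latest = ordered.assign(change=change).groupby('WHO_region').tail(1)
    return sorted(latest.loc[latest['change'] > 0, 'WHO_region'])

# Toy data (hypothetical numbers, same column names as the notebook)
toy = pd.DataFrame({
    'WHO_region': ['EUR', 'EUR', 'AFR', 'AFR'],
    'Date_reported': pd.to_datetime(['2024-01-07', '2024-01-14'] * 2),
    'Covid_new_hospitalizations_last_28days': [100, 140, 50, 40],
})
rising = rising_regions(toy, 'Covid_new_hospitalizations_last_28days')
print(rising)
```

On the real data, the same function could be applied to `trend_by_region`; a stricter rule (for example, several consecutive increases) may be more robust to retrospective corrections.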


In [None]:
# Once again, we group globally by date
trend_global = df.groupby('Date_reported')[
    ['Covid_new_hospitalizations_last_7days',
     'Covid_new_icu_admissions_last_7days',
     'Covid_new_hospitalizations_last_28days',
     'Covid_new_icu_admissions_last_28days']
].sum().fillna(0)

# Then, we compute rolling averages
# Note: an integer `window` counts rows (reporting dates), not calendar days
rolling_avg = trend_global.copy()
rolling_avg['Hosp_7d_avg'] = rolling_avg['Covid_new_hospitalizations_last_7days'].rolling(window=7).mean()
rolling_avg['ICU_7d_avg'] = rolling_avg['Covid_new_icu_admissions_last_7days'].rolling(window=7).mean()
rolling_avg['Hosp_28d_avg'] = rolling_avg['Covid_new_hospitalizations_last_28days'].rolling(window=28).mean()
rolling_avg['ICU_28d_avg'] = rolling_avg['Covid_new_icu_admissions_last_28days'].rolling(window=28).mean()

# And we plot the rolling averages
fig = go.Figure()
fig.add_trace(go.Scatter(x=rolling_avg.index, y=rolling_avg['Hosp_7d_avg'], mode='lines', name='7-Day Avg Hospitalizations'))
fig.add_trace(go.Scatter(x=rolling_avg.index, y=rolling_avg['ICU_7d_avg'], mode='lines', name='7-Day Avg ICU Admissions'))
fig.add_trace(go.Scatter(x=rolling_avg.index, y=rolling_avg['Hosp_28d_avg'], mode='lines', name='28-Day Avg Hospitalizations'))
fig.add_trace(go.Scatter(x=rolling_avg.index, y=rolling_avg['ICU_28d_avg'], mode='lines', name='28-Day Avg ICU Admissions'))

fig.update_layout(
    title='Rolling Averages of COVID-19 Hospitalizations and ICU Admissions (Global)',
    xaxis_title='Date',
    yaxis_title='7-Day & 28-Day Rolling Average',
    template='plotly_white',
    hovermode='x unified',
    legend_title='Metric'
)

fig.show()
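
A note on the `rolling` calls above: an integer window counts rows, so on data with one row per week `window=7` spans seven reports rather than seven days. The sketch below, on a made-up weekly series, contrasts an integer window with a time-offset window; the values and variable names are illustrative assumptions.

```python
import pandas as pd

# Toy weekly series (hypothetical values), indexed by reporting date
dates = pd.date_range('2024-01-07', periods=8, freq='7D')
weekly = pd.Series([10, 20, 30, 40, 50, 60, 70, 80], index=dates)

# Integer window: averages the last 4 rows, i.e. 4 weekly reports
by_rows = weekly.rolling(window=4).mean()

# Offset window: averages every report inside the trailing 28 calendar days
# (requires a DatetimeIndex; partial windows at the start are allowed)
by_days = weekly.rolling('28D').mean()

print(pd.DataFrame({'by_rows': by_rows, 'by_days': by_days}))
```

With evenly spaced weekly reports the two agree once four reports are available; they differ at the start of the series and whenever reporting dates are irregular.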

## Geographic Analysis

In [None]:
# Now, we group by month
df['Date_reported'] = pd.to_datetime(df['Date_reported'])
df['Month'] = df['Date_reported'].dt.month
seasonal_trend = df.groupby('Month')[
    ['Covid_new_hospitalizations_last_7days', 'Covid_new_icu_admissions_last_7days']
].mean().reset_index()

# Then, we create an interactive seasonal pattern plot
fig = px.line(
    seasonal_trend,
    x='Month',
    y=['Covid_new_hospitalizations_last_7days', 'Covid_new_icu_admissions_last_7days'],
    labels={
        'value': 'Average Admissions',
        'variable': 'Metric',
        'Month': 'Month'
    },
    title='Seasonal Pattern: Average COVID-19 Hospitalizations and ICU Admissions by Month for All Years',
    template='plotly_white'
)

fig.update_layout(
    xaxis=dict(
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    hovermode='x unified'
)

fig.show()

In the above chart, we see the seasonal trends in average **COVID-19 hospitalizations** and **ICU admissions** (mean of the 7-day counts) aggregated by calendar month across all years.
**Insights from the Seasonal Pattern**: 
  1) **Hospitalizations** and ICU spikes tend to cluster in certain months, which may reflect:
     * **Winter surges** (often in colder Northern Hemisphere months like December–February due to seasonal respiratory illness overlap)
     * **Holiday travel** or gatherings (increased exposure during Nov–Jan)
     * Lag effects following infection surges

  2) **Plateaus** or **declines** in spring and summer months may reflect **improved immunity**, **fewer gatherings**, or **lower virus transmission** conditions.

In [None]:
df['Date_reported'] = pd.to_datetime(df['Date_reported'])

# Get latest data snapshot
latest_date = df['Date_reported'].max()
latest_snapshot = df[df['Date_reported'] == latest_date]

# Select and clean
country_avg = latest_snapshot[['Country', 'WHO_region',
                               'Covid_new_hospitalizations_last_7days',
                               'Covid_new_hospitalizations_last_28days']].fillna(0)

# Top 10 by 7-day hospitalizations
top_countries_7day = country_avg.sort_values(by='Covid_new_hospitalizations_last_7days', ascending=False).head(10)

# Plot
fig_7day = px.bar(
    top_countries_7day,
    x='Covid_new_hospitalizations_last_7days',
    y='Country',
    orientation='h',
    color='WHO_region',
    title='Top 10 Countries with Highest 7-Day COVID-19 Hospitalizations (Latest Date)',
    labels={'Covid_new_hospitalizations_last_7days': 'Hospitalizations (7-day)'},
    template='plotly_white'
)
fig_7day.update_layout(yaxis={'categoryorder':'total ascending'})
fig_7day.show()

In [156]:
# Import matplotlib and seaborn for the static heatmap
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Plot a heatmap of 28-day hospitalizations, which may have more valid data than the 7-day figures
valid_heatmap_data_28 = latest_snapshot[
    (latest_snapshot['Covid_new_hospitalizations_last_28days'].notna()) &
    (latest_snapshot['Covid_new_hospitalizations_last_28days'] > 0)
][['Country', 'Covid_new_hospitalizations_last_28days']]

# Take top 30 countries
top_30_28day = valid_heatmap_data_28.sort_values(by='Covid_new_hospitalizations_last_28days', ascending=False).head(30)

# Prepare matrix
heatmap_matrix_28 = top_30_28day.set_index('Country').T

# Plot the heatmap if valid data is available
if not heatmap_matrix_28.empty:
    plt.figure(figsize=(14, 4))
    sns.heatmap(heatmap_matrix_28, annot=True, fmt='.0f', cmap='OrRd', cbar_kws={'label': 'Hospitalizations (28-day)'})
    plt.title('Top 30 Countries by 28-Day COVID-19 Hospitalizations (Latest Date)')
    plt.xlabel('Country')
    plt.ylabel('')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
else:
    print("No valid 28-day hospitalization data available to plot for the latest date.")

## Segmented Analysis

In [None]:
df['Date_reported'] = pd.to_datetime(df['Date_reported'])

# Aggregate and calculate ratios
time_window_compare = df.groupby('Date_reported')[
    ['Covid_new_hospitalizations_last_7days', 'Covid_new_hospitalizations_last_28days']
].sum().reset_index().fillna(0)

time_window_compare['7d_vs_28d_ratio'] = (
    time_window_compare['Covid_new_hospitalizations_last_7days'] /
    time_window_compare['Covid_new_hospitalizations_last_28days']
).replace([float('inf'), -float('inf')], 0).fillna(0)

# Create figure
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=time_window_compare['Date_reported'],
    y=time_window_compare['Covid_new_hospitalizations_last_7days'],
    mode='lines',
    name='7-Day Hospitalizations'
))

fig.add_trace(go.Scatter(
    x=time_window_compare['Date_reported'],
    y=time_window_compare['Covid_new_hospitalizations_last_28days'],
    mode='lines',
    name='28-Day Hospitalizations'
))

fig.add_trace(go.Scatter(
    x=time_window_compare['Date_reported'],
    y=time_window_compare['7d_vs_28d_ratio'],
    mode='lines',
    name='7d / 28d Ratio',
    yaxis='y2',
    line=dict(dash='dot', color='black')
))

fig.update_layout(
    title='Comparison of 7-Day vs 28-Day COVID-19 Hospitalizations (Global)',
    xaxis_title='Date',
    yaxis=dict(title='Hospitalizations Count'),
    yaxis2=dict(title='7d/28d Ratio', overlaying='y', side='right', showgrid=False),
    template='plotly_white',
    hovermode='x unified'
)

fig.show()

In the above **chart**, we see that:
  * The 7-day count is part of the 28-day count, so the **7-day** line normally sits below the **28-day** line; when the gap between them narrows, **short-term** hospitalizations are surging.
  * The **7d/28d** ratio serves as an early warning indicator: it sits near 0.25 (7/28) when admissions are steady and spikes when recent admissions sharply increase relative to the long-term trend.
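
To calibrate how the 7d/28d ratio reads, a small worked example helps. It assumes that the 7-day count is the most recent week of the 28-day window (an assumption based on the column names), and the weekly numbers are made up.

```python
import pandas as pd

# Hypothetical weekly admissions: four steady weeks, then a surge
weekly = pd.Series([100, 100, 100, 100, 300])

last_7d = weekly                             # the most recent week
last_28d = weekly.rolling(window=4).sum()    # trailing four weeks, including the latest
ratio = last_7d / last_28d

print(ratio.round(3).tolist())
```

Under this assumption a steady rate gives 0.25, and the surge week pushes the ratio to 0.5, so the useful reference line for "recent acceleration" is 0.25 rather than 1.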


In [None]:
df['Date_reported'] = pd.to_datetime(df['Date_reported'])

# Group and calculate ICU trends
icu_time_window = df.groupby('Date_reported')[
    ['Covid_new_icu_admissions_last_7days', 'Covid_new_icu_admissions_last_28days']
].sum().reset_index().fillna(0)

icu_time_window['7d_vs_28d_ratio'] = (
    icu_time_window['Covid_new_icu_admissions_last_7days'] /
    icu_time_window['Covid_new_icu_admissions_last_28days']
).replace([float('inf'), -float('inf')], 0).fillna(0)

# Plot ICU segmented analysis
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=icu_time_window['Date_reported'],
    y=icu_time_window['Covid_new_icu_admissions_last_7days'],
    mode='lines',
    name='7-Day ICU Admissions'
))

fig.add_trace(go.Scatter(
    x=icu_time_window['Date_reported'],
    y=icu_time_window['Covid_new_icu_admissions_last_28days'],
    mode='lines',
    name='28-Day ICU Admissions'
))

fig.add_trace(go.Scatter(
    x=icu_time_window['Date_reported'],
    y=icu_time_window['7d_vs_28d_ratio'],
    mode='lines',
    name='7d / 28d ICU Ratio',
    yaxis='y2',
    line=dict(dash='dot', color='black')
))

fig.update_layout(
    title='Comparison of 7-Day vs 28-Day COVID-19 ICU Admissions (Global)',
    xaxis_title='Date',
    yaxis=dict(title='ICU Admissions Count'),
    yaxis2=dict(title='7d/28d ICU Ratio', overlaying='y', side='right', showgrid=False),
    template='plotly_white',
    hovermode='x unified'
)

fig.show()

In the above chart, we see that:
  * **Rapid** spikes in the **7d/28d** **ICU ratio** may indicate worsening conditions or **overwhelmed** health systems.
  * **A stable** or declining ratio suggests steady or improving conditions over time.

In [None]:
df['Date_reported'] = pd.to_datetime(df['Date_reported'])

# Group by region and date
regional_comparison = df.groupby(['WHO_region', 'Date_reported'])[
    ['Covid_new_hospitalizations_last_7days', 'Covid_new_hospitalizations_last_28days']
].sum().reset_index().fillna(0)

# Compute 7d/28d ratio
regional_comparison['7d_vs_28d_ratio'] = (
    regional_comparison['Covid_new_hospitalizations_last_7days'] /
    regional_comparison['Covid_new_hospitalizations_last_28days']
).replace([float('inf'), -float('inf')], 0).fillna(0)

# Focus on the AFR and EUR regions
selected_regions = regional_comparison[regional_comparison['WHO_region'].isin(['AFR', 'EUR'])]

# Create interactive line plot
fig = px.line(
    selected_regions,
    x='Date_reported',
    y='7d_vs_28d_ratio',
    color='WHO_region',
    title='7-Day vs 28-Day Hospitalization Ratio: AFR vs EUR',
    labels={'7d_vs_28d_ratio': '7d / 28d Hospitalization Ratio', 'Date_reported': 'Date'},
    template='plotly_white'
)

fig.update_layout(hovermode='x unified')
fig.show()

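From the chart above, we can say:

1. **Short-Term** vs **Long-Term** Trends (Ratio Interpretation)
  * The **7d/28d ratio** acts as a momentum indicator. Because the 7-day window is part of the 28-day window, steady admissions give a ratio near **0.25** (7/28):
    * **Ratio > 0.25**: a recent spike in hospitalizations (the last 7 days are worse than the average of the previous 3 weeks).
    * **Ratio < 0.25**: a decline in new hospitalizations (the recent week is lighter than the 28-day average).
    * **Ratio > 1**: only possible when the 28-day figure is missing or incomplete, so it points to a data issue rather than a surge.
    * **Assumption**: Sudden increases in this ratio could indicate new waves, variants, or policy relaxations.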

2. **Regional Dynamics** 
  * **EUR** may show more frequent and pronounced spikes:
    * Could indicate better or more timely reporting.
    * Might reflect higher transmission in certain seasons (e.g., winter).
  * **AFR** trends may be flatter or delayed:
    * Suggests either **lower** reported cases or potential data reporting **lags** or **gaps**.
    * Could also reflect under-detection due to **limited healthcare** access or **testing infrastructure**.

3. **Regional Outbreak Monitoring** 
  * The ratio trendlines can help predict: 
    * Upcoming hospitalization surges before they are visible in raw counts. 
    * Wave **decline** when ratios consistently drop **below 0.25**, the steady-state value (7/28).

4. **Data Reliability Considerations** 
  * The difference in patterns between **AFR** and **EUR** may not only be due to **epidemiological factors** but also: 
    * **Reporting capacity** 
    * **Frequency** of **data updates** 
    * **Public health infrastructure** 

In [None]:
df['Date_reported'] = pd.to_datetime(df['Date_reported'])

# Aggregate ICU data by region and date
icu_regional = df.groupby(['WHO_region', 'Date_reported'])[
    ['Covid_new_icu_admissions_last_7days', 'Covid_new_icu_admissions_last_28days']
].sum().reset_index().fillna(0)

# Calculate 7d vs 28d ICU ratio
icu_regional['7d_vs_28d_ratio'] = (
    icu_regional['Covid_new_icu_admissions_last_7days'] /
    icu_regional['Covid_new_icu_admissions_last_28days']
).replace([float('inf'), -float('inf')], 0).fillna(0)

# Filter for AFR and EUR
icu_selected = icu_regional[icu_regional['WHO_region'].isin(['AFR', 'EUR'])]

# Plot
fig = px.line(
    icu_selected,
    x='Date_reported',
    y='7d_vs_28d_ratio',
    color='WHO_region',
    title='7-Day vs 28-Day ICU Admission Ratio: AFR vs EUR',
    labels={
        '7d_vs_28d_ratio': '7d / 28d ICU Admission Ratio',
        'Date_reported': 'Date',
        'WHO_region': 'Region'
    },
    template='plotly_white'
)

fig.update_layout(hovermode='x unified')
fig.show()

**Interpretation of the Ratio**: 
  * **Ratio ≈ 0.25**: Steady ICU admissions over the past month (one week is a quarter of the 28-day window).
  * **Ratio > 0.25**: ICU admissions in the last week were above the 28-day average, which suggests a recent surge or outbreak.
  * **Ratio < 0.25**: ICU admissions have slowed; the trend may be declining or stabilizing.

**Regional Comparison**: **AFR** vs **EUR**: 
  1) **EUR (Europe)**:
     * Likely to show more pronounced fluctuations due to:
       * More complete and frequent reporting.
       * Higher population density and travel volume.
       * Potential to reflect seasonal surges (e.g., winter).
  2) **AFR (Africa)**:
     * May show flatter or delayed spikes due to:
       * Data lags or underreporting.
       * Different pandemic wave timing.
       * Potential healthcare system limitations.


In [None]:
df['Date_reported'] = pd.to_datetime(df['Date_reported'])

# Group by country and date
country_trends = df.groupby(['Country', 'Date_reported'])[
    ['Covid_new_hospitalizations_last_7days', 'Covid_new_hospitalizations_last_28days']
].sum().reset_index().fillna(0)

# Compute ratio
country_trends['7d_vs_28d_ratio'] = (
    country_trends['Covid_new_hospitalizations_last_7days'] /
    country_trends['Covid_new_hospitalizations_last_28days']
).replace([float('inf'), -float('inf')], 0).fillna(0)

# Choose top 4 countries with highest total 7-day hospitalizations
top_countries = country_trends.groupby('Country')['Covid_new_hospitalizations_last_7days'].sum().nlargest(4).index.tolist()
filtered_trends = country_trends[country_trends['Country'].isin(top_countries)]

# Plot
fig = px.line(
    filtered_trends,
    x='Date_reported',
    y='7d_vs_28d_ratio',
    color='Country',
    title='Country Performance Over Time: Outbreak vs Recovery (7d/28d Ratio)',
    labels={
        '7d_vs_28d_ratio': '7d / 28d Hospitalization Ratio',
        'Date_reported': 'Date',
        'Country': 'Country'
    },
    template='plotly_white'
)

fig.update_layout(hovermode='x unified')
fig.show()

### In the above chart, we observe that:
1) **Outbreaks**
  * When the **7d/28d** ratio rises well above its steady-state value of **0.25** (7/28), it indicates a recent surge in **hospitalizations**, often a sign of an **outbreak** or **worsening** trend.
2) **Recovery**
  * When the ratio consistently stays below **0.25**, the country is likely in a decline phase, with hospitalizations slowing.
3) **Volatility** 
  * **Fluctuations** may indicate **irregular reporting**, **policy changes**, or **data backlogs**.


In [None]:
df['Date_reported'] = pd.to_datetime(df['Date_reported'])

# Group by country and date for ICU admissions
icu_country_trends = df.groupby(['Country', 'Date_reported'])[
    ['Covid_new_icu_admissions_last_7days', 'Covid_new_icu_admissions_last_28days']
].sum().reset_index().fillna(0)

# Compute 7d vs 28d ICU ratio
icu_country_trends['7d_vs_28d_ratio'] = (
    icu_country_trends['Covid_new_icu_admissions_last_7days'] /
    icu_country_trends['Covid_new_icu_admissions_last_28days']
).replace([float('inf'), -float('inf')], 0).fillna(0)

# Top 4 countries by ICU volume
top_icu_countries = icu_country_trends.groupby('Country')['Covid_new_icu_admissions_last_7days'].sum().nlargest(4).index.tolist()
icu_filtered_trends = icu_country_trends[icu_country_trends['Country'].isin(top_icu_countries)]

# Plot
fig = px.line(
    icu_filtered_trends,
    x='Date_reported',
    y='7d_vs_28d_ratio',
    color='Country',
    title='Country Performance Over Time: ICU Outbreak vs Recovery (7d/28d Ratio)',
    labels={
        '7d_vs_28d_ratio': '7d / 28d ICU Admission Ratio',
        'Date_reported': 'Date',
        'Country': 'Country'
    },
    template='plotly_white'
)

fig.update_layout(hovermode='x unified')
fig.show()

#### Methodology

1. **Ratio Calculation**:
   - For each country-date pair, we compute:

     ```
     7d/28d ICU Ratio = ICU admissions (7 days) / ICU admissions (28 days)
     ```

   - This ratio tells us whether the **recent ICU admission trend is rising or falling**.

2. **Country Selection**:
   - Selected the **top 4 countries** with the highest total 7-day ICU admissions to ensure meaningful and dense trendlines.

3. **Visualization**:
   - An interactive **line chart** was created to display ICU trend ratios over time for each of the top countries.
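
The ratio step can also be sketched with explicit handling of zero or missing denominators. The notebook cell above replaces infinite and undefined ratios with 0, which draws "no data" as a ratio of zero; the variant below keeps them as NaN, which Plotly line traces render as gaps by default. The helper name `window_ratio` and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def window_ratio(recent: pd.Series, baseline: pd.Series) -> pd.Series:
    """Divide recent by baseline, leaving NaN where the baseline is zero or missing."""
    safe_baseline = baseline.where(baseline > 0)  # zero or NaN becomes NaN
    return recent / safe_baseline

# Toy rows (hypothetical values, same column names as the notebook)
toy = pd.DataFrame({
    'Covid_new_icu_admissions_last_7days': [5, 0, 3],
    'Covid_new_icu_admissions_last_28days': [20, 0, np.nan],
})
toy['7d_vs_28d_ratio'] = window_ratio(
    toy['Covid_new_icu_admissions_last_7days'],
    toy['Covid_new_icu_admissions_last_28days'],
)
print(toy)
```

Whether to show undefined ratios as gaps or as zeros is a presentation choice; gaps avoid suggesting a sharp drop that is really an absence of reports.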

#### Ratio Interpretation

| Ratio Value     | Meaning                                                  |
|-----------------|----------------------------------------------------------|
| **> 0.25**      | Recent ICU surge or **outbreak**                         |
| **≈ 0.25**      | Steady ICU admission trend (one week is 7/28 of the window) |
| **< 0.25**      | ICU admissions slowing → **recovery**                    |
| **Sharp drop**  | Possible data lag or rapid change                        |

#### Country-Level Insights

- **Spiking curves** → Countries with **emerging ICU crises**
- **Flattening curves** → Countries showing signs of **recovery**
- **Country-to-country comparison** helps reveal:
  - Timing of waves
  - National response effectiveness
  - Healthcare infrastructure stress

#### Limitations

- ICU trends are affected by:
  - Testing and diagnosis practices
  - Reporting frequency and lag
  - Hospital capacity and admission policies

## Trend Detection & Alerts

In [None]:
df['Date_reported'] = pd.to_datetime(df['Date_reported'])

# Group by date
trend_data = df.groupby('Date_reported')[
    ['Covid_new_hospitalizations_last_7days', 'Covid_new_icu_admissions_last_7days']
].sum().reset_index().fillna(0)

# Compute changes between consecutive reporting dates
trend_data['Hosp_Change'] = trend_data['Covid_new_hospitalizations_last_7days'].diff()
trend_data['ICU_Change'] = trend_data['Covid_new_icu_admissions_last_7days'].diff()

# Define spike thresholds
hosp_thresh = trend_data['Hosp_Change'].mean() + 2 * trend_data['Hosp_Change'].std()
icu_thresh = trend_data['ICU_Change'].mean() + 2 * trend_data['ICU_Change'].std()

# Mark spikes
trend_data['Hosp_Spike'] = trend_data['Hosp_Change'] > hosp_thresh
trend_data['ICU_Spike'] = trend_data['ICU_Change'] > icu_thresh

# Plot
fig = go.Figure()

fig.add_trace(go.Scatter(x=trend_data['Date_reported'], y=trend_data['Covid_new_hospitalizations_last_7days'],
                         mode='lines', name='Hospitalizations (7-day)'))

fig.add_trace(go.Scatter(x=trend_data['Date_reported'], y=trend_data['Covid_new_icu_admissions_last_7days'],
                         mode='lines', name='ICU Admissions (7-day)'))

fig.add_trace(go.Scatter(x=trend_data.loc[trend_data['Hosp_Spike'], 'Date_reported'],
                         y=trend_data.loc[trend_data['Hosp_Spike'], 'Covid_new_hospitalizations_last_7days'],
                         mode='markers', name='Hospitalization Spike',
                         marker=dict(color='red', size=8, symbol='star')))

fig.add_trace(go.Scatter(x=trend_data.loc[trend_data['ICU_Spike'], 'Date_reported'],
                         y=trend_data.loc[trend_data['ICU_Spike'], 'Covid_new_icu_admissions_last_7days'],
                         mode='markers', name='ICU Spike',
                         marker=dict(color='orange', size=8, symbol='diamond')))

fig.update_layout(
    title='Spike Detection in COVID-19 Hospitalizations and ICU Admissions (7-day)',
    xaxis_title='Date',
    yaxis_title='Admissions',
    template='plotly_white',
    hovermode='x unified'
)

fig.show()

#### Methodology

1. **Data Aggregation**:
   - Grouped by `Date_reported` to sum total **7-day hospitalizations** and **7-day ICU admissions** globally.

2. **Change Calculation**:
   - Calculated differences between consecutive reporting dates:
     ```
     Hosp_Change = current_report_hosp - previous_report_hosp
     ICU_Change = current_report_icu - previous_report_icu
     ```

3. **Spike Threshold**:
   - Defined using the statistical rule:
     ```
     Spike Threshold = Mean Change + 2 * Standard Deviation
     ```
   - Used to flag significant increases in report-to-report changes.

4. **Flagging Spikes**:
   - Reporting dates with changes above the threshold were flagged as spikes:
     - `Hosp_Spike`, `ICU_Spike`

5. **Visualization**:
   - Line chart of 7-day hospital and ICU admissions.
   - Markers:
     - Red Star: Hospitalization Spike
     - Orange Diamond: ICU Spike

#### Interpretation

| Marker         | Meaning                                   |
|----------------|-------------------------------------------|
| Red Star       | Sharp increase in **hospital admissions** |
| Orange Diamond | Sharp increase in **ICU admissions**      |

#### Insights

- **Spikes can indicate**:
  - New COVID-19 variant activity
  - Lifting of public health restrictions
  - Superspreader events
  - Reporting backlogs

#### Limitations

- Some spikes may result from **delayed data entry**.
- A fixed mean + 2 standard deviations threshold implicitly assumes the changes are roughly normally distributed.
- For more accuracy, use rolling z-scores or anomaly detection models.
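
A rolling z-score, as suggested in the last point, can be sketched as follows. This is one possible formulation, not the method used in the cell above: the function name, the window length, and the choice to build the baseline from preceding changes only are all assumptions, and the series is made up.

```python
import pandas as pd

def rolling_zscore_spikes(values: pd.Series, window: int = 8, z: float = 2.0) -> pd.Series:
    """Flag points whose change exceeds the rolling mean by more than z rolling std devs."""
    change = values.diff()
    # Baseline statistics use only the preceding `window` changes (shifted by one),
    # so a spike does not inflate its own baseline
    base_mean = change.shift(1).rolling(window).mean()
    base_std = change.shift(1).rolling(window).std()
    zscore = (change - base_mean) / base_std
    return zscore > z

# Hypothetical totals with one abrupt jump at position 9
totals = pd.Series([100, 102, 101, 104, 103, 105, 104, 106, 107, 160, 158])
spikes = rolling_zscore_spikes(totals, window=6)
print(spikes[spikes].index.tolist())
```

Unlike the global threshold, the baseline here adapts to the recent level of volatility, at the cost of producing no flags until enough history has accumulated.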

In [None]:
import plotly.express as px

In [None]:
# Set a lower threshold to detect more ICU alerts
icu_threshold = 5

# Get the latest date
latest_date = df['Date_reported'].max()
latest_snapshot = df[df['Date_reported'] == latest_date].copy()

# Select ICU data and flag threshold exceedance
icu_alert_data = latest_snapshot[['Country', 'WHO_region', 'Covid_new_icu_admissions_last_7days']].fillna(0)
icu_alert_data['ICU_Alert'] = icu_alert_data['Covid_new_icu_admissions_last_7days'] > icu_threshold

# Filter countries exceeding the threshold
icu_alert_countries = icu_alert_data[icu_alert_data['ICU_Alert']].sort_values(
    by='Covid_new_icu_admissions_last_7days', ascending=False)

# Plot interactive bar chart
if not icu_alert_countries.empty:
    fig = px.bar(
        icu_alert_countries,
        x='Covid_new_icu_admissions_last_7days',
        y='Country',
        orientation='h',
        color='WHO_region',
        title=f'Countries Exceeding ICU Threshold ({icu_threshold} ICU Admissions) on {latest_date.date()}',
        labels={'Covid_new_icu_admissions_last_7days': 'ICU Admissions (7-day)'},
        template='plotly_white'
    )
    fig.update_layout(yaxis={'categoryorder': 'total ascending'})
    fig.show()
else:
    print(f"No countries exceeded the ICU threshold of {icu_threshold} on {latest_date.date()}.")

#### Methodology

1. **Latest Snapshot**:
   - The analysis focuses on the **most recent reporting date** from the dataset.

2. **Threshold Definition**:
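   - An ICU alert threshold of **50 new ICU admissions** over the past 7 days was used for the result reported below. The code cell above sets a lower value (`icu_threshold = 5`) to detect more alerts.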

3. **Country Filtering**:
   - For each country, if `ICU admissions (7-day) > threshold`, it is flagged as an alert.

4. **Visualization**:
   - An interactive **horizontal bar chart** was generated to display countries with ICU alerts.
   - The bars are colored by WHO region to highlight **regional patterns**.

#### Result

- On the latest reporting date (**2025-04-27**), **no countries exceeded the ICU threshold of 50**.
- This may indicate:
  - A **low-severity period** in the pandemic.
  - **Reporting delays** or **incomplete data** for the latest date.
  - Effectiveness of control measures or treatment improvements.

#### Recommendation

- Repeat the analysis for:
  - **Earlier reporting dates** with more complete data
  - **Lower thresholds** for finer sensitivity
  - **Time series alerts** to detect ICU threshold exceedance across any point in time
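
The time-series variant of the alert can be sketched by applying the same threshold to every reporting date instead of only the latest one. The helper name `threshold_alerts` and the toy rows are assumptions for illustration.

```python
import pandas as pd

def threshold_alerts(data: pd.DataFrame, value_col: str, threshold: float) -> pd.DataFrame:
    """Return every (country, date) row whose value exceeds the threshold."""
    exceeded = data[data[value_col] > threshold]  # missing values never exceed it
    return exceeded.sort_values(['Date_reported', value_col], ascending=[True, False])

# Toy rows (hypothetical values, same column names as the notebook)
toy = pd.DataFrame({
    'Country': ['A', 'B', 'A', 'B'],
    'Date_reported': pd.to_datetime(['2024-01-07', '2024-01-07', '2024-01-14', '2024-01-14']),
    'Covid_new_icu_admissions_last_7days': [60, 10, 40, None],
})
alerts = threshold_alerts(toy, 'Covid_new_icu_admissions_last_7days', threshold=50)

# Number of alerting countries per reporting date
alerts_per_date = alerts.groupby('Date_reported')['Country'].nunique()
print(alerts_per_date)
```

Plotting `alerts_per_date` over time would show when threshold exceedance was common, which is less sensitive to an incomplete latest report than a single snapshot.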

#### Limitations

- Threshold-based monitoring assumes timely and complete data.
- ICU admission figures may be underreported or delayed.
- National healthcare capacity varies, so a fixed threshold may not reflect local burden accurately.

In [None]:
df['Date_reported'] = pd.to_datetime(df['Date_reported'])

# Group and compute totals
severity_trend = df.groupby('Date_reported')[
    ['Covid_new_hospitalizations_last_7days', 'Covid_new_icu_admissions_last_7days']
].sum().reset_index().fillna(0)

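# ICU-to-hospitalization ratio.
# Dates with zero reported hospitalizations have no defined ratio, so they stay
# NaN (a gap in the chart) instead of being forced to 0, which would read as "mild".
hosp_total = severity_trend['Covid_new_hospitalizations_last_7days']
severity_trend['ICU_to_Hosp_Ratio'] = (
    severity_trend['Covid_new_icu_admissions_last_7days'] / hosp_total.where(hosp_total > 0)
)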

# Flag alerts
severity_threshold = 0.3
severity_trend['Severity_Alert'] = severity_trend['ICU_to_Hosp_Ratio'] > severity_threshold

# Plot
fig = go.Figure()

fig.add_trace(go.Scatter(x=severity_trend['Date_reported'],
                         y=severity_trend['ICU_to_Hosp_Ratio'],
                         mode='lines', name='ICU to Hospitalization Ratio'))

fig.add_trace(go.Scatter(x=severity_trend.loc[severity_trend['Severity_Alert'], 'Date_reported'],
                         y=severity_trend.loc[severity_trend['Severity_Alert'], 'ICU_to_Hosp_Ratio'],
                         mode='markers', name='Severity Alert',
                         marker=dict(color='red', size=8, symbol='star')))

fig.update_layout(
    title='ICU-to-Hospitalization Ratio Over Time (Severity Monitoring)',
    xaxis_title='Date',
    yaxis_title='ICU / Hospitalization Ratio',
    template='plotly_white',
    hovermode='x unified'
)

fig.show()

#### Methodology

1. **Data Aggregation**:
   - Summed `Covid_new_hospitalizations_last_7days` and `Covid_new_icu_admissions_last_7days` globally for each date.

2. **Severity Ratio Calculation**:
   - Calculated the ratio:
     ```
     ICU-to-Hospitalization Ratio = ICU Admissions (7d) / Hospitalizations (7d)
     ```
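   - This ratio is a rough proxy for the **proportion of hospitalized patients requiring intensive care**: both counts are new admissions in the same 7-day window, not a follow-up of the same patients.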

3. **Severity Alert Threshold**:
   - A threshold of **0.3 (30%)** was used to flag potential severity alerts.
   - If the ratio exceeds this threshold, it is marked as a potential red flag for health system stress.

4. **Visualization**:
   - A line chart tracks the ratio over time.
   - Dates where the ratio exceeds the threshold are marked with red stars.

#### Interpretation

| Ratio Range         | Interpretation                       |
|---------------------|--------------------------------------|
| < 0.15              | Mild to moderate severity            |
| 0.15 - 0.3          | Moderate concern                     |
| > 0.3               | High severity alert / ICU strain     |


#### Insights

- **Spikes in the ratio** mean that ICU admissions rose relative to hospital admissions in that reporting period.
- This may point to:
  - More severe outbreaks
  - Hospitalization counts that arrive later than ICU counts, which shrinks the denominator
  - Regional healthcare overload

#### Limitations

- Incomplete or delayed ICU reporting can skew the ratio.
- Hospital admission criteria vary between countries.
- Ratio does not account for **absolute volume** (a high ratio with low case numbers may not be alarming).
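
One way to address the absolute-volume caveat is to require a minimum number of hospitalizations before a ratio can raise an alert. The sketch below uses made-up totals; the `min_hosp=100` floor and the helper `flag_severity` are assumptions for illustration, not values used in the analysis above.

```python
import pandas as pd

HOSP_COL = "Covid_new_hospitalizations_last_7days"
ICU_COL = "Covid_new_icu_admissions_last_7days"

# Toy global totals per reporting date (made-up values).
trend = pd.DataFrame({
    "Date_reported": pd.to_datetime(
        ["2025-03-30", "2025-04-06", "2025-04-13", "2025-04-20"]
    ),
    HOSP_COL: [1000, 800, 10, 0],
    ICU_COL: [200, 280, 4, 0],
})

def flag_severity(frame, ratio_threshold=0.3, min_hosp=100):
    """Flag dates whose ratio is high AND whose volume is large enough to matter."""
    out = frame.copy()
    # A zero denominator gives NaN rather than 0, so missing weeks are not read as mild.
    out["ratio"] = out[ICU_COL] / out[HOSP_COL].where(out[HOSP_COL] > 0)
    out["alert"] = (out["ratio"] > ratio_threshold) & (out[HOSP_COL] >= min_hosp)
    return out

flagged = flag_severity(trend)
print(flagged[["Date_reported", "ratio", "alert"]])
```

In this toy series the third date has a ratio of 0.4 but only 10 hospitalizations, so it is not flagged.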

## Predictive Modeling

In [None]:
from statsmodels.tsa.holtwinters import ExponentialSmoothing

In [None]:
# Group and prepare
forecast_df = df.groupby('Date_reported')[
    ['Covid_new_hospitalizations_last_7days', 'Covid_new_icu_admissions_last_7days']
].sum().reset_index().fillna(0)
forecast_df.set_index('Date_reported', inplace=True)

# Fit models
hosp_model = ExponentialSmoothing(forecast_df['Covid_new_hospitalizations_last_7days'], trend='add', seasonal=None).fit()
icu_model = ExponentialSmoothing(forecast_df['Covid_new_icu_admissions_last_7days'], trend='add', seasonal=None).fit()

# Forecast 30 steps ahead. Each step is one reporting period of the series
# (one week if Date_reported is weekly), not one calendar day.
n_steps = 30
hosp_forecast = hosp_model.forecast(n_steps)
icu_forecast = icu_model.forecast(n_steps)

# Space the forecast dates by the most common gap between reporting dates
step = forecast_df.index.to_series().diff().mode().iloc[0]
forecast_dates = pd.date_range(start=forecast_df.index.max() + step, periods=n_steps, freq=step)

# Merge results (list() drops the forecast's own index so rows align by position)
forecast_output = pd.DataFrame({
    'Date': forecast_dates,
    'Hospitalizations_Forecast': list(hosp_forecast),
    'ICU_Admissions_Forecast': list(icu_forecast)
})
historical = forecast_df.reset_index().rename(columns={
    'Date_reported': 'Date',
    'Covid_new_hospitalizations_last_7days': 'Hospitalizations',
    'Covid_new_icu_admissions_last_7days': 'ICU_Admissions'
})
combined = pd.concat([historical, forecast_output], ignore_index=True)

# Plot
fig = go.Figure()
fig.add_trace(go.Scatter(x=combined['Date'], y=combined['Hospitalizations'], name='Observed Hospitalizations'))
fig.add_trace(go.Scatter(x=combined['Date'], y=combined['ICU_Admissions'], name='Observed ICU Admissions'))
fig.add_trace(go.Scatter(x=forecast_output['Date'], y=forecast_output['Hospitalizations_Forecast'],
                         name='Forecast Hospitalizations', line=dict(dash='dash')))
fig.add_trace(go.Scatter(x=forecast_output['Date'], y=forecast_output['ICU_Admissions_Forecast'],
                         name='Forecast ICU Admissions', line=dict(dash='dash')))
fig.update_layout(title='Forecast of COVID-19 Hospital and ICU Admissions (Next 30 Reporting Periods)',
                  xaxis_title='Date', yaxis_title='Admissions', template='plotly_white')
fig.show()

#### Methodology

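1. **Data Preparation**:
   - Totals for each reporting date were aggregated across all countries:
     - `Covid_new_hospitalizations_last_7days`
     - `Covid_new_icu_admissions_last_7days`
   - Dates were used as a time index for modeling.

2. **Forecasting Technique**:
   - The analysis uses **Holt’s linear trend method** (exponential smoothing with an additive trend and no seasonal component):
     - Captures linear trends in the data
     - Suitable for short-term forecasts where seasonality is not dominant
     - Weights recent observations more heavily, so the forecast follows the latest level and slope

3. **Forecast Horizon**:
   - The model forecasts the next **30 time steps** of:
     - Hospital admissions
     - ICU admissions
   - Each step is one reporting period of the input series. If the series is weekly, as the source description indicates, 30 steps cover about 30 weeks rather than 30 days.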
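
The level and trend updates behind Holt’s method can be sketched in plain Python. This is a minimal illustration with fixed smoothing parameters; `holt_linear_forecast`, the values `alpha=0.5` and `beta=0.5`, and the sample series are assumptions. The `statsmodels` model used above instead estimates its parameters and initial state when `fit()` is called, so its numbers will differ.

```python
def holt_linear_forecast(series, alpha, beta, horizon):
    """Holt's linear trend method with fixed smoothing parameters.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    forecast at step h = level_T + h * trend_T
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Simple initialization: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Made-up series that rises by 10 per step.
weekly_admissions = [100, 110, 120, 130, 140]
forecast = holt_linear_forecast(weekly_admissions, alpha=0.5, beta=0.5, horizon=3)
print(forecast)  # [150.0, 160.0, 170.0]
```

On a perfectly linear series the method simply extends the line, which is why it suits short horizons but cannot anticipate turning points.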

#### Visualization

- **Historical trends** were shown as solid lines.
- **Forecasted values** were represented as **dashed lines**.
- Separate lines were plotted for:
  - Hospitalizations
  - ICU admissions

#### Interpretation

- Rising forecast lines suggest **potential surges** in hospital or ICU demand.
- Flat or declining trends may indicate a **plateau or recovery phase**.
- These forecasts assist in:
  - **Policy response planning**
  - **Resource allocation**
  - **Public health preparedness**

#### Limitations

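- Exponential smoothing as configured here (additive trend, `seasonal=None`) does not account for:
  - Seasonality
  - Vaccination, variants, or policy changes
- Highly dependent on the **completeness and recency** of input data; recent weeks may still be revised retrospectively
- Best suited for **short-term trend extrapolation**; a linear trend projected many steps ahead can drift to implausible values, including negative counts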

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error
from sklearn.preprocessing import StandardScaler

In [None]:
# Select features and target for ICU prediction
features = [
    'Covid_new_hospitalizations_last_7days',
    'Covid_new_hospitalizations_last_28days',
    'Covid_new_icu_admissions_last_28days'
]

target = 'Covid_new_icu_admissions_last_7days'

# Drop missing values and prepare data
ml_df = df[features + [target]].dropna()

# Feature scaling
X = ml_df[features]
y = ml_df[target]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

# Collect results
regression_summary = {
    'Model': 'Linear Regression',
    'R2 Score': r2,
    'Mean Absolute Error': mae,
    'Feature Coefficients': dict(zip(features, model.coef_)),
    'Intercept': model.intercept_
}

regression_summary

In [None]:
# Train a more advanced model: Random Forest
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Predict and evaluate
rf_pred = rf_model.predict(X_test)
rf_r2 = r2_score(y_test, rf_pred)
rf_mae = mean_absolute_error(y_test, rf_pred)

# Feature importance from Random Forest
feature_importance = dict(zip(features, rf_model.feature_importances_))

# Collect results
rf_summary = {
    'Model': 'Random Forest Regressor',
    'R2 Score': rf_r2,
    'Mean Absolute Error': rf_mae,
    'Feature Importance': feature_importance
}

rf_summary

## Dataset & Target

- **Target Variable**: `Covid_new_icu_admissions_last_7days`
- **Predictor Features**:
  - `Covid_new_hospitalizations_last_7days`
  - `Covid_new_hospitalizations_last_28days`
  - `Covid_new_icu_admissions_last_28days`

## Models Compared

### 1. Linear Regression
- A simple and interpretable model to evaluate linear relationships.
- **R² Score**: `0.9801` on the held-out 20% test split
- **Mean Absolute Error**: `5.37` ICU admissions

#### Coefficients:
| Feature                              | Coefficient |
|--------------------------------------|-------------|
| Hospitalizations (7-day)             | +254.47     |
| Hospitalizations (28-day)            | -290.11     |
| ICU Admissions (28-day)              | +295.90     |

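**Insight**: Because the features were standardized with `StandardScaler`, each coefficient is the change in predicted 7-day ICU admissions per one standard deviation of that feature. The 7-day hospitalization count and the 28-day ICU total have positive coefficients, while the 28-day hospitalization total has a negative one. The features overlap heavily (each 28-day window contains the corresponding 7-day window), so the negative sign may reflect collinearity rather than saturation or recovery trends.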
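
The coefficients above were fitted on features scaled with `StandardScaler`, so they are not in units of admissions. The sketch below shows how such coefficients convert back to raw units. It uses NumPy least squares on synthetic data rather than the notebook's fitted model; the data, the true coefficients `3.0` and `-0.5`, and the intercept `7.0` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features on different scales and a noiseless linear target.
X = rng.normal(loc=[50.0, 200.0], scale=[10.0, 40.0], size=(200, 2))
y = 3.0 * X[:, 0] - 0.5 * X[:, 1] + 7.0

# Standardize the same way StandardScaler does (population std, ddof=0).
mean = X.mean(axis=0)
scale = X.std(axis=0)
X_scaled = (X - mean) / scale

# Ordinary least squares on the standardized features, with an intercept column.
design = np.column_stack([np.ones(len(X_scaled)), X_scaled])
params, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept_scaled, coef_scaled = params[0], params[1:]

# Convert back: y = b + sum(c_i * (x_i - m_i) / s_i)
coef_raw = coef_scaled / scale
intercept_raw = intercept_scaled - np.sum(coef_scaled * mean / scale)

print(coef_raw, intercept_raw)
```

With the notebook's objects, the equivalent conversion would be `model.coef_ / scaler.scale_`, which gives the change in ICU admissions per one additional unit of each feature.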

### 2. Random Forest Regressor
- An ensemble-based model that captures non-linear relationships and interactions.
- **R² Score**: `0.744`
- **Mean Absolute Error**: `6.63` ICU admissions

#### Feature Importance:
| Feature                              | Importance |
|--------------------------------------|------------|
| ICU Admissions (28-day)              | ~80%       |
| Hospitalizations (7-day)             | ~10%       |
| Hospitalizations (28-day)            | ~10%       |

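**Insight**: The model places dominant weight on the 28-day ICU total. That total includes the 7-day target window, so part of its importance is mechanical; the rest is consistent with ICU admissions persisting from week to week. Hospitalization features play a smaller, supporting role.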

## Summary

| Model                   | R² Score | MAE  | Best Use                                     |
|-------------------------|----------|------|----------------------------------------------|
| Linear Regression       | 0.9801   | 5.37 | Simplicity, interpretability                 |
| Random Forest Regressor | 0.744    | 6.63 | Handling non-linearity, feature interactions |

## Limitations

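- Feature set is limited to recent hospitalization/ICU counts.
- The 28-day ICU feature includes the 7-day target window, so the models partly reconstruct the target from an overlapping total rather than predict it from independent information.
- Rows were split at random across countries and dates, so the test scores do not measure performance on future, unseen weeks.
- External factors like policy changes, demographics, or variants are not included.
- Models assume data completeness and consistent reporting.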
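
Because `train_test_split` shuffles rows by default, an alternative worth considering is a chronological split that trains on earlier reporting dates and tests on the most recent ones. The sketch below is one possible version on made-up data; `chronological_split` and the `Country` column are hypothetical, and the scores reported above were not produced this way.

```python
import pandas as pd

# Toy panel: 10 weekly dates for two countries (made-up values).
dates = pd.date_range("2025-02-23", periods=10, freq="7D")
toy = pd.DataFrame({
    "Date_reported": list(dates) * 2,
    "Country": ["A"] * 10 + ["B"] * 10,
    "Covid_new_icu_admissions_last_7days": list(range(20)),
})

def chronological_split(frame, date_col="Date_reported", test_fraction=0.2):
    """Put the most recent reporting dates in the test set, the rest in train."""
    ordered_dates = frame[date_col].drop_duplicates().sort_values().reset_index(drop=True)
    n_test = max(1, round(len(ordered_dates) * test_fraction))
    cutoff = ordered_dates.iloc[-n_test]  # first date that belongs to the test set
    train = frame[frame[date_col] < cutoff]
    test = frame[frame[date_col] >= cutoff]
    return train, test

train, test = chronological_split(toy)
print(len(train), len(test))  # 16 4
```

Every test row is then later than every training row, which is closer to how a forecast would be used in practice.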

## Conclusion

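> On this test split, the linear regression fits better (R² `0.9801`, MAE `5.37`) than the random forest (R² `0.744`, MAE `6.63`) and has interpretable coefficients. Both models give the most weight to the 28-day ICU total, which overlaps the 7-day target, so these scores show how well the 7-day figure can be recovered from overlapping totals rather than how well future ICU surges can be forecast.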