## 📌 Workflow Roadmap

### **1. Data Preparation**

* **Tasks**: Parse dates, handle missing values, rename columns.
* **Python tools**:

  * `pd.to_datetime(df['ObservationDate'])`
  * `df.fillna(0)` or `df.dropna()`
  * `df.rename(columns={...})`
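
The three preparation steps can be sketched together as below. This is a minimal illustration on made-up rows, not the real file: the stray whitespace in the column names, the `%m/%d/%Y` date format, and the choice to fill missing counts with zero are all assumptions to adapt to the actual data.

```python
import pandas as pd

# Toy rows standing in for the real file; all values are invented.
raw = pd.DataFrame({
    "ObservationDate": ["01/22/2020", "01/23/2020", "01/23/2020"],
    " Country/Region": ["A", "A", "B"],
    "Confirmed ": [10.0, 25.0, None],
    "Deaths": [0.0, 1.0, None],
})

# rename() accepts a function as well as a dict; here it trims whitespace.
df = raw.rename(columns=str.strip)

# An explicit format avoids day/month ambiguity (assumed format).
df["ObservationDate"] = pd.to_datetime(df["ObservationDate"], format="%m/%d/%Y")

# Treat missing counts as zero; use dropna() instead if zeros would mislead.
count_cols = ["Confirmed", "Deaths"]
df[count_cols] = df[count_cols].fillna(0).astype(int)
```

Filling with zero keeps every row, while `dropna()` removes incomplete rows; which is appropriate depends on whether a missing count really means "none reported".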

---

### **2. Global Overview**

* **Question**: Which countries had the most confirmed cases and deaths overall?
* **Python tools**:

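  * `groupby('Country/Region').sum()` on the latest date's rows (if `Confirmed` and `Deaths` are cumulative, summing across all dates overcounts)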
  * `nlargest()`
  * `matplotlib` / `seaborn.barplot`
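
A minimal sketch of the ranking step, on invented numbers. It assumes the counts are cumulative, so only the most recent snapshot is summed; the country labels `A`, `B`, `C` are placeholders.

```python
import pandas as pd

# Toy data: two snapshots for three placeholder countries.
df = pd.DataFrame({
    "ObservationDate": pd.to_datetime(["2020-03-01", "2020-03-02"] * 3),
    "Country/Region": ["A", "A", "B", "B", "C", "C"],
    "Confirmed": [100, 150, 40, 90, 10, 300],
    "Deaths": [1, 3, 2, 9, 0, 4],
})

# Counts are assumed cumulative, so keep only the latest snapshot.
latest = df[df["ObservationDate"] == df["ObservationDate"].max()]

# Summing still matters when a country is split into several rows.
totals = latest.groupby("Country/Region")[["Confirmed", "Deaths"]].sum()

top_confirmed = totals.nlargest(2, "Confirmed")
top_deaths = totals.nlargest(2, "Deaths")
```

Either result can then be passed to a bar chart, for example `top_confirmed["Confirmed"].plot.barh()`.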

---

### **3. Volatility in Daily Cases**

* **Question**: Which country had the highest variability in daily new cases?
* **Python tools**:

  * `df['new_cases'] = df.groupby('Country/Region')['Confirmed'].diff()` (after sorting by `ObservationDate`, with one row per country and date)
  * `groupby().std()` or `.var()`
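
The two steps above might look like this on toy data. The sketch assumes one row per country and date; if the real data has several rows per country per day, aggregate them first.

```python
import pandas as pd

dates = pd.date_range("2020-03-01", periods=4)
df = pd.DataFrame({
    "ObservationDate": list(dates) * 2,
    "Country/Region": ["A"] * 4 + ["B"] * 4,
    "Confirmed": [10, 20, 30, 40, 10, 50, 55, 200],  # cumulative, invented
})

# diff() depends on row order, so sort within each country first.
df = df.sort_values(["Country/Region", "ObservationDate"])
df["new_cases"] = df.groupby("Country/Region")["Confirmed"].diff()

# Standard deviation of daily new cases per country (NaN rows are skipped).
volatility = df.groupby("Country/Region")["new_cases"].std()
most_volatile = volatility.idxmax()
```

Standard deviation grows with outbreak size, so dividing by the mean of `new_cases` may give a fairer comparison between large and small countries.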

---

### **4. Peak Timing**

* **Question**: When did the top 5 countries hit their peak of daily new cases?
* **Python tools**:

  * `idxmax()` on `new_cases` per country
  * Subtract the first-case date from the peak date → `Timedelta`
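
A sketch of the peak lookup on invented data. It treats the first date with `Confirmed > 0` as the first-case date, which is an assumption, and it leaves out the step of restricting to the top 5 countries.

```python
import pandas as pd

dates = pd.date_range("2020-03-01", periods=5)
df = pd.DataFrame({
    "ObservationDate": list(dates) * 2,
    "Country/Region": ["A"] * 5 + ["B"] * 5,
    "Confirmed": [0, 5, 50, 60, 65, 2, 4, 8, 40, 45],  # cumulative, invented
})
df = df.sort_values(["Country/Region", "ObservationDate"]).reset_index(drop=True)
df["new_cases"] = df.groupby("Country/Region")["Confirmed"].diff()

# idxmax() returns row labels, which are used to look up the peak dates.
peak_rows = df.loc[df.groupby("Country/Region")["new_cases"].idxmax()]
peak_date = peak_rows.set_index("Country/Region")["ObservationDate"]

# First date with at least one confirmed case, per country.
first_case = (
    df[df["Confirmed"] > 0].groupby("Country/Region")["ObservationDate"].min()
)

# Both Series are indexed by country, so the subtraction aligns by name.
days_to_peak = (peak_date - first_case).dt.days
```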

---

### **5. Outlier Detection**

* **Question**: Which countries were outliers in death-to-case ratio?
* **Python tools**:

  * `df['death_rate'] = df['Deaths'] / df['Confirmed']`
  * `scipy.stats.zscore()` or IQR method
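
Both outlier rules can be sketched with pandas alone. The z-score here is computed by hand with the population standard deviation (`ddof=0`), which matches the default of `scipy.stats.zscore`. The cutoff of 2 and the 1.5 × IQR fences are conventional choices, not values taken from the data, and the totals are invented.

```python
import pandas as pd

# Invented per-country totals; F is deliberately extreme.
totals = pd.DataFrame(
    {
        "Confirmed": [1000, 2000, 1500, 1200, 1800, 1000],
        "Deaths": [20, 42, 30, 25, 35, 150],
    },
    index=pd.Index(list("ABCDEF"), name="Country/Region"),
)
totals["death_rate"] = totals["Deaths"] / totals["Confirmed"]
rate = totals["death_rate"]

# IQR rule: flag values beyond 1.5 * IQR outside the quartiles.
q1, q3 = rate.quantile(0.25), rate.quantile(0.75)
iqr = q3 - q1
iqr_outliers = rate[(rate < q1 - 1.5 * iqr) | (rate > q3 + 1.5 * iqr)]

# Z-score rule: flag values more than 2 standard deviations from the mean.
z = (rate - rate.mean()) / rate.std(ddof=0)
z_outliers = rate[z.abs() > 2]
```

With only a handful of countries the z-score cannot get very large, so the IQR rule is often the more robust of the two on small samples.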

---

### **6. Recovery vs. Mortality**

* **Question**: What’s the relationship between recovery rate and death rate?
* **Python tools**:

  * `df['Recovered'] / df['Confirmed']`, `df['Deaths'] / df['Confirmed']`
  * `seaborn.scatterplot()` or `sns.regplot()`
  * `df.corr()`
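
A minimal sketch of the rate calculation and correlation, using invented per-country totals. Plotting is left out so the example needs only pandas.

```python
import pandas as pd

# Invented per-country totals.
totals = pd.DataFrame(
    {
        "Confirmed": [1000, 2000, 500, 800],
        "Recovered": [900, 1600, 350, 480],
        "Deaths": [20, 100, 50, 120],
    },
    index=pd.Index(list("ABCD"), name="Country/Region"),
)

rates = pd.DataFrame({
    "recovery_rate": totals["Recovered"] / totals["Confirmed"],
    "death_rate": totals["Deaths"] / totals["Confirmed"],
})

# Pearson correlation matrix between the two rates.
corr_matrix = rates.corr()
r = corr_matrix.loc["recovery_rate", "death_rate"]
```

Because recoveries and deaths are both shares of the same confirmed total, some negative correlation is built in; the coefficient is best read alongside the scatter plot rather than on its own.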

---

### **7. Case Fatality Dynamics**

* **Question**: How did fatality rates change over time for the top 3 countries?
* **Python tools**:

  * `groupby(['Country/Region','ObservationDate']).sum()`, then `Deaths / Confirmed` for each date
  * Line plot with `sns.lineplot()`
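
One way to build the fatality-rate series, sketched on invented data. The net change between the first and last date is used here as a rough stand-in for "fastest increase or decrease"; that measure is an assumption, and a fitted slope would be another option.

```python
import pandas as pd

dates = pd.date_range("2020-03-01", periods=3)
df = pd.DataFrame({
    "ObservationDate": list(dates) * 2,
    "Country/Region": ["A"] * 3 + ["B"] * 3,
    "Confirmed": [100, 200, 400, 50, 100, 100],  # cumulative, invented
    "Deaths": [1, 4, 12, 1, 2, 5],
})

# One row per country and date, in case a country spans several rows.
daily = (
    df.groupby(["Country/Region", "ObservationDate"])[["Confirmed", "Deaths"]]
    .sum()
    .reset_index()
)
daily["cfr"] = daily["Deaths"] / daily["Confirmed"]

# Wide layout: one column per country, ready for a line plot.
cfr_wide = daily.pivot(
    index="ObservationDate", columns="Country/Region", values="cfr"
)
cfr_change = cfr_wide.iloc[-1] - cfr_wide.iloc[0]
```

`cfr_wide.plot()` draws one line per country, and the long-format `daily` frame can be passed to `sns.lineplot(data=daily, x="ObservationDate", y="cfr", hue="Country/Region")`. Dates with zero confirmed cases produce undefined rates and may need filtering first.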

---

### **8. Growth Analysis**

* **Question**: Which countries grew fastest in the first 30 days?
* **Python tools**:

  * Align with "Day 0" using `groupby()['ObservationDate'].transform('min')` (each country's first-case date, independent of row order) or reindex
  * Compute daily growth rate: `pct_change()`
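
A sketch of the alignment and growth calculation on toy data. It assumes "first case" means the first date with `Confirmed > 0`, and it summarizes each country by its mean daily growth rate, which is only one of several reasonable summaries.

```python
import pandas as pd

dates = pd.date_range("2020-03-01", periods=5)
df = pd.DataFrame({
    "ObservationDate": list(dates) * 2,
    "Country/Region": ["A"] * 5 + ["B"] * 5,
    "Confirmed": [0, 10, 20, 40, 80, 5, 6, 7, 8, 9],  # cumulative, invented
})

# Drop dates before the first case, then order rows within each country.
df = df[df["Confirmed"] > 0].sort_values(["Country/Region", "ObservationDate"])

# Day 0 is each country's first remaining date.
first_date = df.groupby("Country/Region")["ObservationDate"].transform("min")
df["day"] = (df["ObservationDate"] - first_date).dt.days

# Keep the first 30 days and compute day-over-day growth per country.
window = df[df["day"] < 30].copy()
window["growth"] = window.groupby("Country/Region")["Confirmed"].pct_change()
mean_growth = window.groupby("Country/Region")["growth"].mean()
fastest = mean_growth.idxmax()
```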

---

### **9. Comparing First Waves**

* **Question**: How fast did each country move from 100 → 10,000 cases?
* **Python tools**:

  * Mask data where `Confirmed >= 100`
  * Find index where `Confirmed >= 10000`
  * Subtract dates
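
The three steps above can be sketched with a small helper. The function name `first_date_reaching` is hypothetical, and the numbers are invented.

```python
import pandas as pd

dates = pd.date_range("2020-03-01", periods=5)
df = pd.DataFrame({
    "ObservationDate": list(dates) * 2,
    "Country/Region": ["A"] * 5 + ["B"] * 5,
    "Confirmed": [50, 120, 900, 11000, 20000, 100, 400, 2000, 6000, 9000],
})

def first_date_reaching(frame, threshold):
    """Earliest date per country on which Confirmed >= threshold."""
    hit = frame[frame["Confirmed"] >= threshold]
    return hit.groupby("Country/Region")["ObservationDate"].min()

start = first_date_reaching(df, 100)
end = first_date_reaching(df, 10_000)

# Countries that never reach 10,000 are missing from `end` and come out NaN.
days_100_to_10k = (end - start).dt.days
```

Sorting `days_100_to_10k.dropna()` in ascending order then ranks countries from fastest to slowest.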

---

### **10. Lag Analysis**

* **Question**: What’s the lag between the peak of daily confirmed cases and the peak of daily deaths?
* **Python tools**:

  * Find `idxmax()` of `new_cases` and `new_deaths` per country (`new_deaths` comes from `diff()` on `Deaths`, like `new_cases`)
  * Compute difference in days
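
A minimal sketch of the lag calculation on invented data. The helper `peak_date` is a hypothetical name, and the raw daily series is used as-is.

```python
import pandas as pd

dates = pd.date_range("2020-03-01", periods=6)
df = pd.DataFrame({
    "ObservationDate": list(dates) * 2,
    "Country/Region": ["A"] * 6 + ["B"] * 6,
    "Confirmed": [0, 10, 60, 80, 90, 95, 0, 5, 10, 40, 45, 50],
    "Deaths": [0, 0, 1, 2, 9, 10, 0, 0, 0, 1, 6, 7],
})
df = df.sort_values(["Country/Region", "ObservationDate"]).reset_index(drop=True)

# Daily increments from the cumulative columns, per country.
grouped = df.groupby("Country/Region")
df["new_cases"] = grouped["Confirmed"].diff()
df["new_deaths"] = grouped["Deaths"].diff()

def peak_date(frame, column):
    """Date of the maximum of `column` for each country."""
    rows = frame.loc[frame.groupby("Country/Region")[column].idxmax()]
    return rows.set_index("Country/Region")["ObservationDate"]

# Positive values mean the death peak came after the case peak.
lag_days = (peak_date(df, "new_deaths") - peak_date(df, "new_cases")).dt.days
```

Raw daily counts are noisy, so taking the peaks of a rolling average instead may give more stable lags.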

---

### **11. Time to Plateau**

* **Question**: Which countries flattened the curve earliest?
* **Python tools**:

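  * `df['7d_avg'] = df.groupby('Country/Region')['new_cases'].transform(lambda s: s.rolling(7).mean())` (per country, so windows do not span two countries)
  * Compare to each country's peak 7-day average (threshold: below 10% of the peak)
  * Find the first date after the peak that meets the condition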
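
The plateau rule can be sketched for a single country as follows. The function name `plateau_date` and its parameters are hypothetical, and the series is invented. Only dates after the peak are considered, since the early days of an outbreak are also far below the peak.

```python
import pandas as pd

def plateau_date(new_cases, window=7, fraction=0.10):
    """First date after the peak where the rolling mean is below
    `fraction` of its peak; NaT if that never happens."""
    avg = new_cases.rolling(window).mean()
    peak_day = avg.idxmax()
    after_peak = avg[avg.index > peak_day]
    below = after_peak[after_peak < fraction * avg.max()]
    return below.index[0] if not below.empty else pd.NaT

# Invented daily new cases: a rise, a week at 64 per day, then 4 per day.
values = [0, 1, 2, 4, 8, 16, 32] + [64] * 7 + [4] * 8
series = pd.Series(
    values, index=pd.date_range("2020-03-01", periods=len(values))
)

result = plateau_date(series)
```

For several countries, one option is `df.set_index('ObservationDate').groupby('Country/Region')['new_cases'].apply(plateau_date)`, after which the earliest dates identify the countries that flattened first.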

---

### **12. Continental Trends**

* **Question**: Which continent led the second wave?
* **Python tools**:

  * Add continent column (`pycountry_convert` or manual map)
  * `groupby('Continent').sum()`
  * Compare first vs. second wave periods
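
A sketch of the continent comparison using a manual map. The country-to-continent mapping, the wave boundary date, and all numbers are assumptions made for illustration; real wave periods have to be chosen from the data.

```python
import pandas as pd

# Hypothetical manual map from placeholder countries to continents.
continent_of = {"A": "Europe", "B": "Europe", "C": "Asia"}

# Assumed boundary between the first and second wave.
second_wave_start = pd.Timestamp("2020-03-03")

dates = pd.date_range("2020-03-01", periods=4)
df = pd.DataFrame({
    "ObservationDate": list(dates) * 3,
    "Country/Region": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "new_cases": [10, 20, 5, 5, 10, 10, 30, 40, 50, 50, 20, 20],
})

df["Continent"] = df["Country/Region"].map(continent_of)
df["wave"] = (df["ObservationDate"] >= second_wave_start).map(
    {False: "first", True: "second"}
)

# Total new cases per continent and wave, one column per wave.
by_wave = df.groupby(["Continent", "wave"])["new_cases"].sum().unstack("wave")
by_wave["ratio"] = by_wave["second"] / by_wave["first"]
second_wave_leader = by_wave["second"].idxmax()
```

The `ratio` column shows how each continent's second wave compares with its first, while `second_wave_leader` is the continent with the most second-wave cases in absolute terms.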


# **Details of the questions**

## **1. Data Preparation**

**Question:** How can I clean and prepare my dataset so that all dates, columns, and missing values are consistent, making it ready for analysis?

---

## **2. Global Overview**

**Question:** Which countries experienced the **highest total number of confirmed COVID-19 cases and deaths** overall, and how can I visualize this comparison to highlight the most affected countries?

---

## **3. Volatility in Daily Cases**

**Question:** Which country had the **largest fluctuations in daily new confirmed cases**, indicating highly unstable outbreak patterns, and how can I quantify this variability using statistical measures like standard deviation or variance?

---

## **4. Peak Timing**

**Question:** For the **top 5 countries with the most cases**, on which dates did they experience their **highest number of daily new cases**, and how many days after their first reported case did these peaks occur?

---

## **5. Outlier Detection**

**Question:** Which countries had **unusually high or low death-to-case ratios** compared to the global average, and how can I detect these outliers using statistical methods like Z-scores or the interquartile range (IQR)?

---

## **6. Recovery vs. Mortality Relationship**

**Question:** For the top countries, is there a **relationship between recovery rate and death rate**? Do countries with higher recovery rates generally have lower death rates, and how can this relationship be visualized or measured using correlation analysis?

---

## **7. Case Fatality Dynamics**

**Question:** How did the **case fatality rate** (deaths ÷ confirmed cases) evolve over time for the **top 3 most affected countries**, and which country experienced the **fastest increase or decrease** in fatality rate?

---

## **8. Growth Analysis**

**Question:** During the **first 30 days after the first confirmed case**, which countries experienced the **fastest growth in confirmed cases**, and how can I compute and compare the daily growth rates to determine the speed of the outbreak?

---

## **9. Comparing First Waves**

**Question:** After reaching their **first 100 confirmed cases**, which country reached **10,000 cases the fastest**, and how can I calculate the number of days it took to reach this milestone to compare the speed of the first wave among countries?

---

## **10. Lag Analysis**

**Question:** For each country, what is the **time lag between the peak of daily new confirmed cases and the peak of daily deaths**, and what does this tell us about the progression from infection to fatality in different countries?

---

## **11. Time to Plateau**

**Question:** Which countries **flattened the curve earliest**, defined as when the **7-day rolling average of new cases dropped below 10% of its peak**, and how can I identify the date when this condition was first met?

---

## **12. Continental Trends**

**Question:** Across continents, which region contributed the **most to the second wave of COVID-19**, and how does the **growth of confirmed cases during the second wave compare to the first wave** for each continent?