Step 1 - Load dataset with pandas

In [2]:
import pandas as pd
file_path = "COVID-19_Daily_Counts_of_Cases,_Hospitalizations,_and_Deaths_20251114.csv"

df = pd.read_csv(file_path)
df.head()

Unnamed: 0,date_of_interest,CASE_COUNT,PROBABLE_CASE_COUNT,HOSPITALIZED_COUNT,DEATH_COUNT,CASE_COUNT_7DAY_AVG,ALL_CASE_COUNT_7DAY_AVG,HOSP_COUNT_7DAY_AVG,DEATH_COUNT_7DAY_AVG,BX_CASE_COUNT,...,SI_CASE_COUNT,SI_PROBABLE_CASE_COUNT,SI_HOSPITALIZED_COUNT,SI_DEATH_COUNT,SI_PROBABLE_CASE_COUNT_7DAY_AVG,SI_CASE_COUNT_7DAY_AVG,SI_ALL_CASE_COUNT_7DAY_AVG,SI_HOSPITALIZED_COUNT_7DAY_AVG,SI_DEATH_COUNT_7DAY_AVG,INCOMPLETE
0,02/29/2020,1,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,03/01/2020,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,03/02/2020,0,0,2,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,03/03/2020,1,0,7,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,03/04/2020,5,0,2,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Step 2 — Select a numeric column

In [3]:
numeric_cols = df.select_dtypes(include="number").columns.tolist()
numeric_cols


['DEATH_COUNT',
 'DEATH_COUNT_7DAY_AVG',
 'BX_HOSPITALIZED_COUNT',
 'BX_DEATH_COUNT',
 'BX_HOSPITALIZED_COUNT_7DAY_AVG',
 'BX_DEATH_COUNT_7DAY_AVG',
 'BK_HOSPITALIZED_COUNT',
 'BK_DEATH_COUNT',
 'BK_HOSPITALIZED_COUNT_7DAY_AVG',
 'BK_DEATH_COUNT_7DAY_AVG',
 'MN_PROBABLE_CASE_COUNT',
 'MN_HOSPITALIZED_COUNT',
 'MN_DEATH_COUNT',
 'MN_PROBABLE_CASE_COUNT_7DAY_AVG',
 'MN_HOSPITALIZED_COUNT_7DAY_AVG',
 'MN_DEATH_COUNT_7DAY_AVG',
 'QN_HOSPITALIZED_COUNT',
 'QN_DEATH_COUNT',
 'QN_HOSPITALIZED_COUNT_7DAY_AVG',
 'QN_DEATH_COUNT_7DAY_AVG',
 'SI_PROBABLE_CASE_COUNT',
 'SI_HOSPITALIZED_COUNT',
 'SI_DEATH_COUNT',
 'SI_PROBABLE_CASE_COUNT_7DAY_AVG',
 'SI_HOSPITALIZED_COUNT_7DAY_AVG',
 'SI_DEATH_COUNT_7DAY_AVG']
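Several count columns that appear in df.head(), such as CASE_COUNT and HOSPITALIZED_COUNT, are missing from the list above. One possible cause, not verified against this file, is that larger counts are written with thousands separators (for example "1,234"). A single such value makes pandas read the whole column as text. The hypothetical helper comma_formatted_columns below is a minimal standard-library sketch that flags columns containing values in that format. The sample CSV is made up for illustration.

```python
import csv
import io
import re

# Integers written with thousands separators, e.g. "1,234" or "12,345,678"
COMMA_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+")


def comma_formatted_columns(csv_text):
    """Return the sorted names of columns where any value uses a thousands separator."""
    flagged = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col, val in row.items():
            # DictReader can yield None or lists for malformed rows; skip those
            if isinstance(val, str) and COMMA_NUMBER.fullmatch(val.strip()):
                flagged.add(col)
    return sorted(flagged)


# Hypothetical sample, NOT taken from the real dataset
sample = 'date,A_COUNT,B_COUNT\n01/01/2000,"1,234",10\n01/02/2000,999,12\n'
print(comma_formatted_columns(sample))  # → ['A_COUNT']
```

If the real file turns out to use this format, pd.read_csv(file_path, thousands=",") tells pandas to strip the separators. Those columns should then show up in numeric_cols.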

In [4]:
numeric_col = "DEATH_COUNT"  # the variable analyzed below; any name from numeric_cols works
numeric_col


'DEATH_COUNT'

Step 3 — Clean and show summary

In [5]:
series = df[numeric_col].dropna()
series.describe()


count    2054.000000
mean       22.954236
std        75.237910
min         0.000000
25%         2.000000
50%         5.000000
75%        14.750000
max       832.000000
Name: DEATH_COUNT, dtype: float64
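The 25%, 50%, and 75% rows of describe() are quantiles. A fractional value such as 14.75 comes from interpolating between two neighboring sorted values, presumably 14 and 15 here. pandas' default for Series.quantile is linear interpolation. The hypothetical linear_quantile function below is a minimal sketch of that rule, run on a small made-up sample rather than the real column. The standard library's statistics.quantiles with method="inclusive" should give the same cut points.

```python
import math
import statistics


def linear_quantile(sorted_vals, q):
    """Quantile q (0 <= q <= 1) of an already-sorted list, using linear interpolation."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q  # fractional index into the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


# Hypothetical sample, not the COVID data
sample = sorted([0, 1, 1, 2, 5, 9, 14, 15])
print([linear_quantile(sample, q) for q in (0.25, 0.5, 0.75)])  # → [1.0, 3.5, 10.25]
# Standard-library cross-check using the same interpolation rule
print(statistics.quantiles(sample, n=4, method="inclusive"))    # → [1.0, 3.5, 10.25]
```

Applied to sorted(values) from Step 5, linear_quantile(..., 0.75) should reproduce the 14.75 shown above.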

Step 4 — Mean, Median, Mode with pandas

In [6]:
mean_pandas = series.mean()
median_pandas = series.median()
mode_pandas = series.mode().tolist()  # .tolist() gives plain Python numbers

print("Pandas mean:", mean_pandas)
print("Pandas median:", median_pandas)
print("Pandas mode(s):", mode_pandas)


Pandas mean: 22.95423563777994
Pandas median: 5.0
Pandas mode(s): [1]


PART 2 — The Hard Way!!
Step 5 — Read CSV manually

In [7]:
import csv

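values = []
skipped = 0  # non-empty cells that could not be parsed as numbers
with open(file_path, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        v = row.get(numeric_col)
        if v is None or v.strip() == "":
            continue  # treat blank cells as missing, like dropna()
        try:
            # remove thousands separators such as "1,234" before converting
            values.append(float(v.replace(",", "")))
        except ValueError:
            skipped += 1  # count bad values instead of hiding them

if skipped:
    print("Skipped", skipped, "unparseable values")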

print("Loaded", len(values), "values")
values[:10]


Loaded 2054 values


[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

Step 6 — Manual mean

In [8]:
n = len(values)
total = 0
for v in values:
    total += v
mean_manual = total / n
mean_manual


22.95423563777994
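The loop above divides by n without checking it, so an empty values list would raise ZeroDivisionError. Below is a minimal sketch of a guarded version, cross-checked against the standard library's statistics.fmean and math.fsum. math.fsum tracks partial sums to avoid rounding loss, so on long float lists it can differ from a plain loop in the last few digits. manual_mean is a hypothetical helper name.

```python
import math
import statistics


def manual_mean(values):
    """Arithmetic mean with an explicit error for empty input."""
    if not values:
        raise ValueError("mean requires at least one value")
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


sample = [0.0, 1.0, 1.0, 2.0, 5.0, 9.0]  # hypothetical values
print(manual_mean(sample))               # → 3.0
print(math.fsum(sample) / len(sample))   # → 3.0
print(statistics.fmean(sample))          # → 3.0
```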

Step 7 — Manual median

In [9]:
vals_sorted = sorted(values)
n = len(vals_sorted)

if n % 2 == 1:
    median_manual = vals_sorted[n // 2]
else:
    median_manual = (vals_sorted[n//2 - 1] + vals_sorted[n//2]) / 2

median_manual


5.0
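The same odd/even rule can be wrapped in a function and checked against statistics.median from the standard library. For even-length input, statistics.median also averages the two middle values. manual_median and the samples below are hypothetical.

```python
import statistics


def manual_median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    vals_sorted = sorted(values)
    n = len(vals_sorted)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return vals_sorted[mid]
    return (vals_sorted[mid - 1] + vals_sorted[mid]) / 2


odd = [9, 0, 5, 1, 2]        # sorted: 0, 1, 2, 5, 9
even = [9, 0, 5, 1, 2, 14]   # sorted: 0, 1, 2, 5, 9, 14
print(manual_median(odd), statistics.median(odd))    # → 2 2
print(manual_median(even), statistics.median(even))  # → 3.5 3.5
```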

Step 8 — Manual mode

In [10]:
counts = {}

for v in values:
    counts[v] = counts.get(v, 0) + 1

max_count = max(counts.values())
manual_modes = [v for v,c in counts.items() if c == max_count]

manual_modes


[1.0]
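collections.Counter performs the same dictionary counting in one call. statistics.multimode (Python 3.8+) returns every most-common value, so either can serve as a cross-check for the loop above. Sorting the result mirrors pandas' Series.mode, which returns its modes in sorted order. The helper name and sample are hypothetical.

```python
import statistics
from collections import Counter


def manual_modes(values):
    """All most-frequent values, sorted; empty list for empty input."""
    counts = Counter(values)
    if not counts:
        return []
    max_count = max(counts.values())
    return sorted(v for v, c in counts.items() if c == max_count)


sample = [1.0, 0.0, 1.0, 2.0, 2.0, 5.0]      # hypothetical: 1.0 and 2.0 tie
print(manual_modes(sample))                  # → [1.0, 2.0]
print(sorted(statistics.multimode(sample)))  # → [1.0, 2.0]
```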

Step 9 — Compare results

In [11]:
print("==== COMPARISON ====")
print("Mean:   pandas =", mean_pandas,  " manual =", mean_manual)
print("Median: pandas =", median_pandas, " manual =", median_manual)
print("Mode(s):")
print("  pandas:", mode_pandas)
print("  manual:", manual_modes)


==== COMPARISON ====
Mean:   pandas = 22.95423563777994  manual = 22.95423563777994
Median: pandas = 5.0  manual = 5.0
Mode(s):
  pandas: [1]
  manual: [1.0]
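The printed means match digit for digit here. In general, though, pandas and a Python loop may add the numbers in a different order, so floating-point results can differ in the last digits. A tolerance-based check with math.isclose is a safer comparison. The mode lists [1] and [1.0] already compare equal in Python, because 1 == 1.0. The helper names below are hypothetical.

```python
import math


def values_match(a, b, rel_tol=1e-9):
    """True if two numeric results agree within a relative tolerance."""
    return math.isclose(float(a), float(b), rel_tol=rel_tol)


def modes_match(modes_a, modes_b):
    """Compare mode lists as sorted floats, so [1] and [1.0] count as equal."""
    return sorted(map(float, modes_a)) == sorted(map(float, modes_b))


# Values copied from the comparison output above
print(values_match(22.95423563777994, 22.95423563777994))  # → True
print(values_match(5.0, 5.0))                              # → True
print(modes_match([1], [1.0]))                             # → True
# Why exact == is fragile for floats in general
print(0.1 + 0.2 == 0.3, values_match(0.1 + 0.2, 0.3))      # → False True
```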


PART 3 — Visualization
Step 10 — Pick a date column

In [15]:

date_cols = [c for c in df.columns if "date" in c.lower()]
date_cols


['date_of_interest']

In [14]:
date_col = date_cols[0]  


Step 11 — Prepare 30-day subset

In [16]:
labels = df[date_col].astype(str).head(30).tolist()
values_30 = df[numeric_col].head(30).tolist()

len(labels), len(values_30)


(30, 30)
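head(30) takes the first 30 rows. That equals the first 30 days only if the file is sorted by date with one row per day. Below is a minimal standard-library sketch that selects rows by parsed date instead. date_window is a hypothetical helper, and the %m/%d/%Y format is inferred from the date_of_interest values shown above. The sample rows reuse death counts from the chart in Step 12, in deliberately shuffled order. With pandas, pd.to_datetime(df[date_col], format="%m/%d/%Y") does the same parsing for a whole column.

```python
from datetime import date, datetime, timedelta


def date_window(rows, start, n_days, fmt="%m/%d/%Y"):
    """Return (labels, values) for rows dated in [start, start + n_days), sorted by date."""
    end = start + timedelta(days=n_days)
    selected = []
    for label, value in rows:
        d = datetime.strptime(label, fmt).date()
        if start <= d < end:
            selected.append((d, label, value))
    selected.sort(key=lambda item: item[0])  # order by the parsed date only
    return [label for _, label, _ in selected], [value for _, _, value in selected]


# (date label, death count) pairs, shuffled on purpose
rows = [("03/16/2020", 9), ("03/14/2020", 3), ("03/17/2020", 12), ("03/15/2020", 4)]
window_labels, window_values = date_window(rows, date(2020, 3, 14), 3)
print(window_labels)  # → ['03/14/2020', '03/15/2020', '03/16/2020']
print(window_values)  # → [3, 4, 9]
```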

Step 12 — ASCII bar chart (STANDARD LIBRARY)

In [17]:
def ascii_bar_chart(labels, values, max_width=40):
    """Print one row per label, with bar length proportional to value."""
    # Scale against the largest value; use 1 if the list is empty or all zeros
    max_val = max(values, default=0) or 1

    for label, value in zip(labels, values):
        bar_len = int(value / max_val * max_width)  # truncate to whole stars
        bar = "*" * bar_len
        short_label = label[:10]  # show only first 10 chars
        print(f"{short_label:>10} | {bar} ({value})")

ascii_bar_chart(labels, values_30)


02/29/2020 |  (0)
03/01/2020 |  (0)
03/02/2020 |  (0)
03/03/2020 |  (0)
03/04/2020 |  (0)
03/05/2020 |  (0)
03/06/2020 |  (0)
03/07/2020 |  (0)
03/08/2020 |  (0)
03/09/2020 |  (0)
03/10/2020 |  (0)
03/11/2020 |  (1)
03/12/2020 |  (0)
03/13/2020 |  (0)
03/14/2020 |  (3)
03/15/2020 |  (4)
03/16/2020 | * (9)
03/17/2020 | * (12)
03/18/2020 | ** (24)
03/19/2020 | *** (27)
03/20/2020 | ***** (49)
03/21/2020 | ***** (51)
03/22/2020 | ***** (52)
03/23/2020 | ********** (96)
03/24/2020 | ************ (107)
03/25/2020 | **************** (143)
03/26/2020 | ************************* (229)
03/27/2020 | ***************************** (262)
03/28/2020 | ************************************ (323)
03/29/2020 | **************************************** (356)


PART 4 — Markdown Cells You Should Add
INTRODUCTION

For this project, I chose to work with the COVID-19 Daily Counts of Cases, Hospitalizations, and Deaths dataset. I picked this dataset because it’s highly relevant, easy to understand, and contains clear numeric variables that can be analyzed using both pandas and Python’s standard library.

In this notebook, I focus on daily death counts as the main numeric variable. This variable is straightforward but meaningful — it provides a direct view of how the pandemic evolved over time and how severe certain periods were.

PANDAS SUMMARY STATISTICS
Using pandas, I calculated the mean, median, and mode of daily death counts.

Mean shows the average number of deaths per day.

Median represents the middle value of the distribution, revealing what a “typical” day looks like.

Mode shows the most frequently occurring daily death count.

Interpreting these together gives a better picture of the distribution than any single statistic.
In this dataset the mean (about 22.95 deaths per day) is much higher than the median (5.0). This indicates that a relatively small number of high-death days pulled the average upward (a right-skewed distribution).
The low mode (1) and the 75th percentile of 14.75 show that most days saw relatively few deaths, even though the worst day reached 832.
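One way to quantify how far the mean sits above the median is Pearson's second (median) skewness coefficient, 3 * (mean - median) / std. The coefficient is positive for right-skewed data. The sketch below plugs in the values printed by describe() in Step 3. The coefficient is an illustrative extra, not a statistic used elsewhere in this notebook.

```python
# Summary values copied from series.describe() for DEATH_COUNT
mean = 22.954236
median = 5.0
std = 75.237910  # pandas' default sample standard deviation (ddof=1)

# Pearson's second skewness coefficient: positive means right-skewed
pearson_median_skew = 3 * (mean - median) / std
print(round(pearson_median_skew, 2))  # → 0.72
```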

THE HARD WAY
Re-calculating these statistics using only the Python standard library helped me understand what pandas does behind the scenes.

Manually computing the mean was simple: add everything and divide by the count.
The median required sorting and checking whether the number of observations was odd or even.
The mode was the most work because I had to manually count frequencies using a dictionary.

This part made me realize how much convenience libraries like pandas provide — especially for larger datasets. Even though the logic is simple, implementing it manually requires a lot more steps and careful handling of missing or invalid values.

VISUALIZATION
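For the visualization, I created an ASCII bar chart using only the Python standard library. Each line represents one of the first 30 days in the dataset (02/29/2020 to 03/29/2020). The number of stars is proportional to that day's death count, scaled so that the largest value in the window (356) gets the full 40 stars.

Even though it’s a very simple visualization, it still reveals a clear pattern. From 03/14/2020 onward the daily death count rises every day, from 3 to 356 on 03/29/2020, so the bars grow steadily longer toward the bottom of the chart. This shows how quickly deaths climbed at the start of the pandemic. Longer bars mark days with higher death counts, and short or empty bars mark calmer days.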

This text-based view makes it easier to spot spikes and compare the scale of different days, even without external plotting libraries.

CONCLUSION
This project helped me practice both high-level data analysis using pandas and low-level computation using only built-in Python tools. Working through the manual calculations reinforced how summary statistics are actually computed and why libraries are so useful.

The visualization exercise also showed that even simple ASCII output can communicate meaningful trends. Overall, the project strengthened my understanding of basic data manipulation, computational thinking, and the value of clean code structure.