# T4 - Working with data

Modeling without data is like riding a bicycle while blindfolded: rarely dull, but often you don't get where you want to go. This tutorial shows how to use data with Covasim.

## Data scrapers

Covasim includes a script to automatically download time series data on diagnoses, deaths, and other information from several major sources of COVID-19 data. These include the [Corona Data Scraper](https://coronadatascraper.com), the [European Centre for Disease Prevention and Control](https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases), and the [COVID Tracking Project](https://covidtracking.com). These scrapers provide data for a large number of locations (over 4000 at the time of writing), including the US down to the county level and many other countries down to the district level. The data they download is already in the correct format for Covasim.

## Data input and formats

The correct input data format for Covasim looks like this:

```python
import pandas as pd
df = pd.read_csv('example_data.csv')
print(df)
```

The data can be in CSV, Excel, or JSON format. There **must** be a column named `date` (not "Date" or "day" or anything else). For Covasim to use any other column, its label must start with `new_` (daily) or `cum_` (cumulative), followed by one of `tests`, `diagnoses`, `deaths`, `severe` (corresponding to hospitalizations), or `critical` (corresponding to ICU admissions). Other columns can be included and will be loaded, but Covasim won't parse them. If you provide a `new_` (daily) column, Covasim will automatically calculate the corresponding `cum_` (cumulative) column for you.
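The naming rules above are easy to check before handing a file to Covasim. The sketch below is a hypothetical helper, not part of Covasim: `classify_columns` and `add_cumulative` are names made up for this example. It assumes the five quantities listed above are the only ones you care about, and it mirrors (rather than calls) Covasim's automatic daily-to-cumulative conversion, assuming an existing `cum_` column is left untouched. It works on a plain dict of column lists so it runs with the standard library alone.

```python
from itertools import accumulate

# Assumption: these are the only prefixes and quantities of interest (as listed above).
PREFIXES = ('new_', 'cum_')
QUANTITIES = ('tests', 'diagnoses', 'deaths', 'severe', 'critical')
RECOGNIZED = {prefix + quantity for prefix in PREFIXES for quantity in QUANTITIES}


def classify_columns(columns):
    """Hypothetical helper: split labels into (parsed, ignored), requiring an exact 'date' column."""
    if 'date' not in columns:
        raise ValueError("data must contain a column named exactly 'date'")
    parsed = [col for col in columns if col in RECOGNIZED]
    ignored = [col for col in columns if col != 'date' and col not in RECOGNIZED]
    return parsed, ignored


def add_cumulative(data):
    """Hypothetical helper: add a running-total cum_ column for each new_ column that lacks one."""
    result = dict(data)  # shallow copy, so the input dict is not modified
    for col, values in data.items():
        if col.startswith('new_'):
            cum_col = 'cum_' + col[len('new_'):]
            # Assumption: an existing cum_ column is kept as-is rather than overwritten.
            result.setdefault(cum_col, list(accumulate(values)))
    return result


# Toy values for illustration only
data = {
    'date': ['2020-03-01', '2020-03-02', '2020-03-03'],
    'new_diagnoses': [2, 5, 3],
    'notes': ['', '', 'holiday'],
}
print(classify_columns(list(data)))           # (['new_diagnoses'], ['notes'])
print(add_cumulative(data)['cum_diagnoses'])  # [2, 7, 10]
```

Note that a column labeled `Date` is treated as an ordinary ignored column, which is exactly the mistake the rule above warns against.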

<div class="alert alert-info">
    
**Note:** Sometimes date information fails to be read properly, especially when loading from Excel files. See Tutorial 8 for help on fixing this.
    
</div>
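One common way dates go wrong with Excel files is that they arrive as serial day numbers (for example `43891`) or as full timestamps instead of plain dates. The following is a minimal sketch of normalizing such values to `datetime.date` objects with the standard library. `normalize_date` is a hypothetical name, not a Covasim function, and the sketch assumes Excel's default 1900 date system, in which serial numbers count days from 1899-12-30 (correct for dates from March 1900 onward).

```python
from datetime import date, datetime, timedelta

# Assumption: Excel's default 1900 date system. Counting from 1899-12-30 gives the
# right calendar date for any serial number from 61 (1900-03-01) onward.
EXCEL_EPOCH = date(1899, 12, 30)


def normalize_date(value):
    """Hypothetical helper: convert a date-like value to a datetime.date."""
    if isinstance(value, datetime):  # check before date, since datetime subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, (int, float)):  # Excel serial day number; any time-of-day fraction is dropped
        return EXCEL_EPOCH + timedelta(days=int(value))
    if isinstance(value, str):
        return date.fromisoformat(value.strip()[:10])  # expects 'YYYY-MM-DD', optionally followed by a time
    raise TypeError(f'cannot interpret {value!r} as a date')


raw = [43891, '2020-03-02', datetime(2020, 3, 3, 0, 0)]
print([normalize_date(v).isoformat() for v in raw])
# ['2020-03-01', '2020-03-02', '2020-03-03']
```

Ambiguous strings such as `03/01/2020` are deliberately rejected here rather than guessed, since they could mean either March 1 or January 3.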

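This example shows how a simulation loads the data and plots it automatically. It also uses the `cv.test_num` intervention to test the number of people given by the daily test counts in the data. (We'll cover interventions properly in the next tutorial.)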

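```python
import covasim as cv
cv.options.set(dpi=100, show=False, close=True, verbose=0) # Standard options for Jupyter notebook

# Simulation parameters: start and end dates, plus the transmission rate
pars = dict(
    start_day = '2020-02-01',
    end_day   = '2020-04-11',
    beta      = 0.015,
)

# Load the data file, and each day test as many people as the data's daily test counts
sim = cv.Sim(pars=pars, datafile='example_data.csv', interventions=cv.test_num(daily_tests='data'))
sim.run()

# Plot the simulated results; the loaded data are overlaid automatically
sim.plot(to_plot=['cum_tests', 'cum_diagnoses', 'cum_deaths'])
```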

As you can see, this is not a great fit to the data, but we'll come to calibration in Tutorial 7.
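To put a number on "not a great fit", one crude option is to compare simulated and observed values on the dates they share. The sketch below computes a normalized mean absolute error. The function name `normalized_mae` and the choice of metric are assumptions for illustration, not how Covasim measures goodness of fit. Missing data points are given as `None` and skipped.

```python
def normalized_mae(sim_dates, sim_values, data_dates, data_values):
    """Mean absolute error on shared dates, divided by the largest observed value.

    Hypothetical helper: 0 means a perfect match on every overlapping date.
    """
    sim_by_date = dict(zip(sim_dates, sim_values))
    pairs = [
        (sim_by_date[d], observed)
        for d, observed in zip(data_dates, data_values)
        if d in sim_by_date and observed is not None  # skip dates outside the sim and missing data
    ]
    if not pairs:
        raise ValueError('simulation and data share no dates with observed values')
    scale = max(abs(observed) for _, observed in pairs) or 1  # avoid dividing by zero
    return sum(abs(simulated - observed) for simulated, observed in pairs) / (len(pairs) * scale)


# Toy series for illustration only (not the example data)
sim_dates = ['2020-03-01', '2020-03-02', '2020-03-03']
sim_cum_deaths = [1, 3, 6]
data_dates = ['2020-03-02', '2020-03-03', '2020-03-04']
data_cum_deaths = [2, 6, 9]
print(round(normalized_mae(sim_dates, sim_cum_deaths, data_dates, data_cum_deaths), 3))  # 0.083
```

Lower is better. Because the error is scaled by the largest observed value, it gives a rough way to compare fits across quantities of very different sizes, such as tests and deaths.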