# Finding data of interest

The Epidata API includes numerous data streams -- medical claims data, cases and
deaths, mobility, and many others -- covering different geographic regions. This
can make it a challenge to find the data stream that you are most interested in.
This page provides some advice on how to locate data that may be useful to
you.

## Using the Delphi Epidata API documentation

The Delphi Epidata API documentation lists all the available data sources and
signals for
[COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html)
and for [other
diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters).
The site also includes a search tool if you have a keyword (e.g. "Taiwan") in
mind. Generally, each endpoint listed in the Delphi Epidata API documentation
has an associated function in this client, named by prefixing the API endpoint
name with either `pub_` or `pvt_`, e.g. `pub_covidcast` or `pvt_twitter`.
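
The naming convention above can be sketched as a small lookup helper. This is a minimal illustration, not part of epidatpy: `client_function_name` and the `FakeClient` stand-in class are hypothetical names introduced here, and in practice you would pass the real client object instead.

```python
from typing import Optional

# Prefixes described above for client functions.
PREFIXES = ("pub_", "pvt_")


def client_function_name(client: object, endpoint: str) -> Optional[str]:
    """Return the name of the callable `pub_<endpoint>` or `pvt_<endpoint>` on `client`, if any."""
    for prefix in PREFIXES:
        candidate = f"{prefix}{endpoint}"
        if callable(getattr(client, candidate, None)):
            return candidate
    return None


class FakeClient:
    """Hypothetical stand-in for the real client, used only for illustration."""

    def pub_covidcast(self) -> str:
        return "covidcast"

    def pvt_twitter(self) -> str:
        return "twitter"


demo = FakeClient()
print(client_function_name(demo, "covidcast"))  # pub_covidcast
print(client_function_name(demo, "twitter"))  # pvt_twitter
print(client_function_name(demo, "unknown"))  # None
```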

## Epidata data sources

The parameters available for each data source are documented on the linked
source-specific API pages. The epidatpy client also expects certain fields,
depending on the endpoint; the Delphi Epidata API documentation contains more
information about the accepted ranges of values for each field.

A dynamically generated list of all available data sources can be obtained by
using the built-in `available_endpoints()`:

In [None]:
# Hidden cell (set in the metadata for this cell)
import pandas as pd

# Set common options and context
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", 10)
pd.set_option("display.width", 1000)

In [None]:
from IPython.display import HTML

from epidatpy import available_endpoints

table = available_endpoints()
HTML(table.to_html(index=False))
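
If the table is long, a keyword filter can help narrow it down. The sketch below works on plain row dictionaries so that it does not assume particular column names; `search_rows` and the sample rows are hypothetical and only mimic the shape of such a table. With the real table, one option would be to pass `table.to_dict("records")`.

```python
from typing import Any, Dict, List


def search_rows(rows: List[Dict[str, Any]], keyword: str) -> List[Dict[str, Any]]:
    """Return rows where any value contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [
        row
        for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


# Hypothetical sample rows; the real column names and contents may differ.
sample_rows = [
    {"endpoint": "pub_covidcast", "description": "COVID-19 signals"},
    {"endpoint": "pvt_twitter", "description": "Twitter-based flu signals"},
]

matches = search_rows(sample_rows, "covid")
print([row["endpoint"] for row in matches])  # ['pub_covidcast']
```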

## Covidcast source and signal metadata

The `CovidcastEpidata` class provides a way to access information about the data
in the `pub_covidcast` endpoint directly from within the client. The cell below
demonstrates how to access this metadata by using the `source_df` property, which
returns a Pandas DataFrame of metadata describing all data streams publicly
accessible from the COVIDcast endpoint of the Delphi Epidata API. This mirrors
the information found in the [COVIDcast signals
endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html).

In [None]:
from epidatpy import CovidcastEpidata

epidata = CovidcastEpidata()
epidata.source_df

This DataFrame contains the following columns:

- `source` - API-internal source name.
- `name` - Human-readable source name.
- `description` - Description of the source.
- `reference_signal` - Geographic level for which this signal is available, such as county, state, msa, hhs, hrr, or nation. Most signals are available at multiple geographic levels and will hence be listed in multiple rows with their own metadata.
- `license` - The license that applies to the data source.
- `dua` - Link to the Data Use Agreement.
- `signals` - List of signals available from this data source.

The `signal_df` DataFrame can also be used to obtain information about the signals
that are available, for example the time range they cover and when they were
last updated.

In [None]:
epidata.signal_df

This DataFrame contains one row for each available signal, with the following columns:

- `source` - Data source name.
- `signal` - API-internal signal name.
- `name` - Human-readable signal name.
- `active` - Whether the signal is currently being updated. Signals may be inactive because the sources have become unavailable, other sources have replaced them, or additional work is required for us to continue updating them.
- `short_description` - Brief description of the signal.
- `description` - Full description of the signal.
- `geo_types` - Spatial resolution of the signal (e.g., `county`, `hrr`, `msa`, `dma`, `state`). More detail about all `geo_types` is given in the [geographic coding documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html).
- `time_type` - Temporal resolution of the signal (e.g., day, week; see [date coding details](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_times.html)).
- `time_label` - The time label ("Date", "Week").
- `value_label` - The value label ("Value", "Percentage", "Visits", "Visits per 100,000 people").
- `format` - The value format ("per100k", "percent", "fraction", "count", "raw").
- `category` - The signal category ("early", "public", "late", "other").
- `high_values_are` - What higher values of the signal indicate ("good", "bad", "neutral").
- `is_smoothed` - Whether the signal is smoothed.
- `is_weighted` - Whether the signal is weighted.
- `is_cumulative` - Whether the signal is cumulative.
- `has_stderr` - Whether the signal has a `stderr` statistic.
- `has_sample_size` - Whether the signal has a `sample_size` statistic.
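
The columns above lend themselves to simple filtering. As a minimal sketch, the helper below selects active, smoothed signals available at a given geographic level. It operates on plain row dictionaries; `select_signals` and the sample rows are hypothetical, and the assumption that `geo_types` holds a collection of level names may not match the actual DataFrame.

```python
from typing import Any, Dict, List


def select_signals(rows: List[Dict[str, Any]], geo_type: str) -> List[str]:
    """Return 'source:signal' names for active, smoothed signals covering `geo_type`."""
    return [
        f"{row['source']}:{row['signal']}"
        for row in rows
        if row["active"] and row["is_smoothed"] and geo_type in row["geo_types"]
    ]


# Hypothetical rows that only mimic the documented columns.
sample_signals = [
    {"source": "src-a", "signal": "sig_smoothed", "active": True,
     "is_smoothed": True, "geo_types": ["county", "state"]},
    {"source": "src-a", "signal": "sig_raw", "active": True,
     "is_smoothed": False, "geo_types": ["county", "state"]},
    {"source": "src-b", "signal": "sig_old", "active": False,
     "is_smoothed": True, "geo_types": ["state"]},
]

print(select_signals(sample_signals, "county"))  # ['src-a:sig_smoothed']
print(select_signals(sample_signals, "nation"))  # []
```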
