# USGS dataretrieval Python Package `get_qwdata()` Examples

This notebook provides examples of using the Python dataretrieval package to retrieve water quality sample data for United States Geological Survey (USGS) monitoring sites. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).

### Install the Package

Use the following code to install the package if it doesn't exist already within your Jupyter Python environment.

In [1]:
!pip install dataretrieval

Defaulting to user installation because normal site-packages is not writeable




Load the package so you can use it along with other packages used in this notebook.

In [2]:
from dataretrieval import nwis
from IPython.display import display

### Basic Usage

The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the `get_qwdata()` function to retrieve water quality sample data for USGS monitoring sites from NWIS. The following arguments are supported:

Arguments (Additional arguments, if supplied, will be used as query parameters)

* **sites** (string or list of strings): A list of USGS site identifiers for which to retrieve data. If the qwdata parameter site_no is supplied, it will overwrite the sites parameter.
* **parameterCd** (string or list of strings): A list of USGS parameter codes for which to retrieve data.
* **start** (string): The beginning date for a period for which to retrieve data. If the qwdata parameter begin_date is supplied, it will overwrite the start parameter.
* **end** (string): The ending date for a period for which to retrieve data. If the qwdata parameter end_date is supplied, it will overwrite the end parameter.
* **datetime_index** (boolean): If True, create a datetime index.
* **wide_format** (boolean): If True, return data in wide format, with one row per sample time and one column per observed variable.
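
The override rule described for `sites`, `start`, and `end` can be pictured with a small sketch. The `build_query` helper below is hypothetical and is not the package's implementation; the `parameter_cd` key is also an assumption made for illustration. It only shows the idea that raw query parameters passed as additional arguments are applied last and therefore win.

```python
def build_query(sites=None, parameterCd=None, start=None, end=None, **kwargs):
    """Hypothetical helper: merge named arguments with raw query parameters."""
    query = {
        "site_no": sites,
        "parameter_cd": parameterCd,  # assumed key name, for illustration only
        "begin_date": start,
        "end_date": end,
    }
    # Drop arguments that were not supplied.
    query = {key: value for key, value in query.items() if value is not None}
    # Raw query parameters are applied last, so they take precedence.
    query.update(kwargs)
    return query


# begin_date is supplied directly, so it replaces the value given as start.
query = build_query(sites="10109000", start="2012-01-01", begin_date="2015-06-01")
print(query)
```

In this sketch the printed dictionary carries `begin_date` set to `2015-06-01`, because the directly supplied query parameter replaces the value passed as `start`.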

#### Example 1: Get all water quality sample data for a single monitoring site

In [3]:
siteID = '10109000'
wq_data = nwis.get_qwdata(sites=siteID)
print('Retrieved data for ' + str(len(wq_data[0])) + ' samples.')



KeyError: 'sample_start_time_datum_cd'

### Interpreting the Result

The result of calling the `get_qwdata()` function is an object that contains a Pandas data frame and an associated metadata object. The data frame contains the water quality sample data for the requested sites, observed variables, and time frame.

Once you have the data frame, there are several useful things you can do to explore the data.

Display the data frame as a table. By default (wide_format=True), this function returns a wide, cross-tabulated table with a column for each observed variable and a row for each sample date.

In [4]:
display(wq_data[0])

NameError: name 'wq_data' is not defined

Show the data types of the columns in the resulting data frame.

In [5]:
print(wq_data[0].dtypes)

NameError: name 'wq_data' is not defined

The other part of the result returned from the `get_qwdata()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines the contents of the response and can be helpful in interpreting them.

In [6]:
print('The query URL used to retrieve the data from NWIS was: ' + wq_data[1].url)

NameError: name 'wq_data' is not defined
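
Once a query URL is available, its parameters can be inspected with the standard library. The sketch below is illustrative only: `example_url` is a made-up placeholder, not the URL that the package assembles, and in practice the string would come from `wq_data[1].url`.

```python
from urllib.parse import urlsplit, parse_qs

# Illustrative URL only; the real one would come from wq_data[1].url.
example_url = (
    "https://example.org/nwis/qwdata"
    "?site_no=10109000&begin_date=2012-01-01&format=rdb"
)

parts = urlsplit(example_url)
params = parse_qs(parts.query)  # maps each parameter name to a list of values

# Print each query parameter on its own line.
for name, values in params.items():
    print(name, "=", ", ".join(values))
```

Listing the parameters this way makes it easier to confirm which sites, dates, and options were actually sent to the web service.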

### Additional Examples

#### Example 2: Get water quality sample data for multiple sites for a single parameter

In [7]:
site_ids = ['04024430', '04024000']
parameter_code = '00065'
wq_multi_site = nwis.get_qwdata(sites=site_ids, parameterCd=parameter_code)
print('Retrieved data for ' + str(len(wq_multi_site[0])) + ' samples.')
display(wq_multi_site[0])



KeyError: 'sample_start_time_datum_cd'

The following example is the same as the previous example, but with the multi-index turned off (multi_index=False).

In [8]:
site_ids = ['04024430', '04024000']
parameter_code = '00065'
wq_multi_site = nwis.get_qwdata(sites=site_ids, parameterCd=parameter_code, multi_index=False)
print('Retrieved data for ' + str(len(wq_multi_site[0])) + ' samples.')
display(wq_multi_site[0])



KeyError: 'sample_start_time_datum_cd'
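
To illustrate the difference between an indexed and a flat layout in general terms, here is a minimal pandas sketch on synthetic data. It assumes that the multi-index keys rows by site number and date-time; the column names and values are made up and do not come from NWIS.

```python
import pandas as pd

# Synthetic records; site numbers reuse the ones above, values are made up.
records = pd.DataFrame(
    {
        "site_no": ["04024000", "04024430", "04024430"],
        "datetime": pd.to_datetime(["2012-05-01", "2012-05-01", "2012-06-01"]),
        "value": [0.9, 1.2, 1.4],
    }
)

# Multi-index layout: rows are keyed by (site_no, datetime).
multi = records.set_index(["site_no", "datetime"])
one_site = multi.loc["04024430"]  # select every row for one site

# Flat layout: site_no stays an ordinary column, datetime is the only index.
flat = records.set_index("datetime")
same_site = flat[flat["site_no"] == "04024430"]

print(multi.index.nlevels, flat.index.nlevels)
```

With a multi-index, one site can be selected by label with `.loc`; with a flat index, the same selection is a filter on the `site_no` column.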

#### Example 3: Retrieve water quality sample data for multiple sites, including a list of parameters, within a time period defined by start and end dates

In [9]:
site_ids = ['04024430', '04024000']
parameterCd = ['34247', '30234', '32104', '34220']
startDate = '2012-01-01'
endDate = ''
wq_data2 = nwis.get_qwdata(sites=site_ids, parameterCd=parameterCd,
                           start=startDate, end=endDate)
print('Retrieved data for ' + str(len(wq_data2[0])) + ' samples.')
display(wq_data2[0])




KeyError: 'sample_start_time_datum_cd'

The following example is the same as the previous example, but with the multi-index turned off (multi_index=False).

In [10]:
site_ids = ['04024430', '04024000']
parameterCd = ['34247', '30234', '32104', '34220']
startDate = '2012-01-01'
endDate = ''
wq_data2 = nwis.get_qwdata(sites=site_ids, parameterCd=parameterCd,
                           start=startDate, end=endDate, multi_index=False)
print('Retrieved data for ' + str(len(wq_data2[0])) + ' samples.')
display(wq_data2[0])



KeyError: 'sample_start_time_datum_cd'

#### Example 4: Retrieve water quality sample data for one site in serial format

Each row in the resulting table represents a single observation of a single parameter. Each sample may be analyzed for multiple parameters, so a single water quality sample can produce multiple rows in serial format.

In [11]:
siteID = '10109000'
wq_data = nwis.get_qwdata(sites=siteID, wide_format=False)
print('Retrieved data for ' + str(len(wq_data[0])) + ' sample results.')
display(wq_data[0])



KeyError: 'sample_start_time_datum_cd'
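
The relationship between serial and wide format can be sketched with pandas on a small synthetic table. The column names (`sample_dt`, `parm_cd`, `result_va`) and all values below are assumptions chosen for illustration, not data retrieved from NWIS.

```python
import pandas as pd

# Synthetic serial-format results; column names and values are made up.
serial = pd.DataFrame(
    {
        "sample_dt": ["2012-05-01", "2012-05-01", "2012-06-01"],
        "parm_cd": ["00010", "00400", "00010"],
        "result_va": [12.5, 7.8, 15.1],
    }
)

# One row per sample date, one column per parameter code.
wide = serial.pivot(index="sample_dt", columns="parm_cd", values="result_va")
print(wide)
```

Three serial rows collapse into two wide rows here, and the parameter that was not measured on the second date appears as a missing value.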