# USGS dataretrieval Python Package `get_qwdata()` Examples

This notebook provides examples of using the Python dataretrieval package to retrieve water quality sample data for United States Geological Survey (USGS) monitoring sites. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).

### Install the Package

Use the following code to install the package if it doesn't exist already within your Jupyter Python environment.

In [1]:
!pip install dataretrieval

Defaulting to user installation because normal site-packages is not writeable
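
If you would rather check whether the package is already available before installing it, the standard library can do that. The following is a minimal sketch; the helper name `is_installed` is made up for this illustration and is not part of dataretrieval.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# Only suggest the install step when the package is missing.
if not is_installed("dataretrieval"):
    print("dataretrieval was not found; run the pip install cell above.")
```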




Load the package so you can use it along with other packages used in this notebook.

In [2]:
from dataretrieval import nwis
from IPython.display import display

### Basic Usage

The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the `get_qwdata()` function to retrieve water quality sample data for USGS monitoring sites from NWIS. The following arguments are supported. Any additional arguments, if supplied, are used as query parameters.

* **sites** (string or list of strings): A list of USGS site identifiers for which to retrieve data. If the qwdata parameter site_no is supplied, it will overwrite the sites parameter.
* **parameterCd** (string or list of strings): A list of USGS parameter codes for which to retrieve data.
* **start** (string): The beginning date for a period for which to retrieve data. If the qwdata parameter begin_date is supplied, it will overwrite the start parameter.
* **end** (string): The ending date for a period for which to retrieve data. If the qwdata parameter end_date is supplied, it will overwrite the end parameter.
* **datetime_index** (boolean): If True, create a datetime index.
* **wide_format** (boolean): If True, return data in wide format, with one row per sample time and the results for multiple parameters in each row.
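
The override rules in the list above (for example, `site_no` taking precedence over `sites`) can be sketched in plain Python. This is an illustration of the documented behavior only, not the dataretrieval implementation; the function name `resolve_query` is hypothetical.

```python
def resolve_query(sites=None, start=None, end=None, **kwargs):
    """Illustrative only: mirror the documented precedence rules.

    The qwdata parameters site_no, begin_date, and end_date, when supplied
    as extra keyword arguments, win over sites, start, and end.
    """
    query = dict(kwargs)  # additional arguments pass through as query parameters
    query["site_no"] = kwargs.get("site_no", sites)
    query["begin_date"] = kwargs.get("begin_date", start)
    query["end_date"] = kwargs.get("end_date", end)
    # Drop entries that were never supplied.
    return {key: value for key, value in query.items() if value is not None}


# sites and start are used because no overriding parameters are given.
print(resolve_query(sites="10109000", start="2012-01-01"))

# site_no is supplied, so it replaces the value passed as sites.
print(resolve_query(sites="10109000", site_no="04024000"))
```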

#### Example 1: Get all water quality sample data for a single monitoring site

In [3]:
siteID = '10109000'
wq_data = nwis.get_qwdata(sites=siteID)
print('Retrieved data for ' + str(len(wq_data[0])) + ' samples.')



Retrieved data for 345 samples.




### Interpreting the Result

The result of calling the `get_qwdata()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the water quality sample data for the requested site and, where specified, for the requested observed variables and time frame.
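
Because the cells in this notebook read the two parts as `wq_data[0]` and `wq_data[1]`, it can be more readable to unpack them into named variables. The sketch below uses a stand-in result so that it runs without contacting NWIS; the placeholder URL and the helper name `describe_result` are assumptions made for illustration.

```python
from types import SimpleNamespace


def describe_result(result):
    """Unpack a (data, metadata) pair and report its size and query URL."""
    data, metadata = result  # equivalent to result[0] and result[1]
    return len(data), metadata.url


# Stand-in for the object returned by nwis.get_qwdata(): a list of rows in
# place of the data frame, and a simple object with a url attribute in place
# of the metadata. Both are placeholders, not real NWIS output.
stand_in = (
    [{"site_no": "10109000", "sample_dt": "1967-09-13"}],
    SimpleNamespace(url="https://example.invalid/nwis/qwdata?site_no=10109000"),
)

row_count, query_url = describe_result(stand_in)
print(f"{row_count} row(s) retrieved from {query_url}")
```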

Once you've got the data frame, there are several useful things you can do to explore the data.

Display the data frame as a table. The default data frame for this function is a wide, cross-tabulated table, with a column for each observed variable and a row for each sample date (wide_format=True).

In [4]:
display(wq_data[0])

datetime,agency_cd,site_no,sample_dt,sample_tm,sample_end_dt,sample_end_tm,sample_start_time_datum_cd,tm_datum_rlbty_cd,coll_ent_cd,medium_cd,...,p70300,p70301,p70302,p70303,p71851,p71999,p81903,p82398,p84164,p99111
1967-09-13 13:35:00+00:00,USGS,10109000,1967-09-13,07:35,,,-0600,T,,WS,...,196.0,,108.0,0.27,0.1,,,,,
1968-01-18 19:20:00+00:00,USGS,10109000,1968-01-18,12:20,,,-0700,T,,WS,...,210.0,,63.5,0.29,0.5,,,,,
1968-05-15 18:30:00+00:00,USGS,10109000,1968-05-15,12:30,,,-0600,T,,WS,...,155.0,156.0,151.0,0.21,0.6,,,,,
1968-07-26 20:40:00+00:00,USGS,10109000,1968-07-26,14:40,,,-0600,T,,WS,...,189.0,188.0,135.0,0.26,0.3,,,,,
1972-12-08 23:15:00+00:00,USGS,10109000,1972-12-08,16:15,,,-0700,T,,WS,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
NaT,USGS,10109000,1988-11-28,,,,-0700,T,USGS-WRD,WS,...,,,,,,,,,,
NaT,USGS,10109000,1989-04-27,,,,-0600,T,USGS-WRD,WS,...,,,,,,,,,,
NaT,USGS,10109000,1989-07-28,,,,-0600,T,USGS-WRD,WS,...,,,,,,,,,,
NaT,USGS,10109000,1989-08-31,,,,-0600,T,USGS-WRD,WS,...,,,,,,,,,,
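
In the table above, the `datetime` index appears to be the local `sample_dt` and `sample_tm` values shifted to UTC by the offset in `sample_start_time_datum_cd`, and rows without a sample time show `NaT`. The sketch below reproduces that relationship with the standard library. It is an interpretation of the displayed output, not the package's own code, and the function name `sample_time_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def sample_time_utc(sample_dt, sample_tm, datum_offset):
    """Combine a local date, a local time, and a +/-HHMM offset into UTC.

    Returns None when the time is missing, which mirrors the NaT rows.
    """
    if not sample_tm:
        return None
    sign = -1 if datum_offset.startswith("-") else 1
    # zfill restores a leading zero if the offset was read as a number (-600).
    digits = datum_offset.lstrip("+-").zfill(4)
    offset = sign * timedelta(hours=int(digits[:2]), minutes=int(digits[2:]))
    local = datetime.strptime(f"{sample_dt} {sample_tm}", "%Y-%m-%d %H:%M")
    return local.replace(tzinfo=timezone(offset)).astimezone(timezone.utc)


# First row of the table: 07:35 local time at -0600.
print(sample_time_utc("1967-09-13", "07:35", "-0600"))

# A row with no sample time has no timestamp.
print(sample_time_utc("1988-11-28", "", "-0700"))
```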


Show the data types of the columns in the resulting data frame.

In [5]:
print(wq_data[0].dtypes)

agency_cd                      object
site_no                        object
sample_dt                      object
sample_tm                      object
sample_end_dt                 float64
sample_end_tm                 float64
sample_start_time_datum_cd     object
tm_datum_rlbty_cd              object
coll_ent_cd                    object
medium_cd                      object
project_cd                     object
aqfr_cd                       float64
tu_id                         float64
body_part_id                  float64
hyd_cond_cd                    object
samp_type_cd                    int64
hyd_event_cd                    int64
sample_lab_cm_txt             float64
p00003                        float64
p00004                        float64
p00009                        float64
p00010                        float64
p00020                         object
p00028                        float64
p00060                        float64
p00061                        float64
p00065      
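
In the listing above, the columns that hold results are named `p` followed by a five-digit parameter code, while the remaining columns describe the sample. A minimal sketch for separating the two groups by name follows; the helper `parameter_columns` is hypothetical, and the column list is a short excerpt copied from the output above.

```python
import re

# A result column is the letter p followed by exactly five digits.
PARAMETER_COLUMN = re.compile(r"p\d{5}")


def parameter_columns(column_names):
    """Return only the names that look like parameter-code columns."""
    return [name for name in column_names if PARAMETER_COLUMN.fullmatch(name)]


columns = ["agency_cd", "site_no", "project_cd", "samp_type_cd",
           "p00003", "p00010", "p00060", "p00065"]
print(parameter_columns(columns))
```

With a real result, the same helper could be given `wq_data[0].columns`, since iterating over a data frame's columns yields the column names.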

The other part of the result returned from the `get_qwdata()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines the contents of the response and can be helpful in interpreting them.

In [6]:
print('The query URL used to retrieve the data from NWIS was: ' + wq_data[1].url)

The query URL used to retrieve the data from NWIS was: https://nwis.waterdata.usgs.gov/nwis/qwdata?site_no=10109000&qw_sample_wide=qw_sample_wide&agency_cd=USGS&format=rdb&pm_cd_compare=Greater+than&inventory_output=0&rdb_inventory_output=file&TZoutput=0&rdb_qw_attributes=expanded&date_format=YYYY-MM-DD&rdb_compression=value&submitted_form=brief_list
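
To see which query parameters were sent, the URL can be split with the standard library. The sketch below uses an abbreviated copy of the URL printed above (several parameters are omitted for brevity); with a real result you would pass `wq_data[1].url` instead.

```python
from urllib.parse import parse_qs, urlsplit

# Abbreviated copy of the query URL shown in the output above.
query_url = (
    "https://nwis.waterdata.usgs.gov/nwis/qwdata?site_no=10109000"
    "&qw_sample_wide=qw_sample_wide&agency_cd=USGS&format=rdb"
    "&pm_cd_compare=Greater+than&date_format=YYYY-MM-DD"
)

parts = urlsplit(query_url)
# parse_qs returns a list per key; each key appears once here, so keep the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)
for key, value in params.items():
    print(f"{key} = {value}")
```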


### Additional Examples

#### Example 2: Get water quality sample data for multiple sites for a single parameter

In [7]:
site_ids = ['04024430', '04024000']
parameter_code = '00065'
wq_multi_site = nwis.get_qwdata(sites=site_ids, parameterCd=parameter_code)
print('Retrieved data for ' + str(len(wq_multi_site[0])) + ' samples.')
display(wq_multi_site[0])



Retrieved data for 281 samples.


site_no,datetime,agency_cd,sample_dt,sample_tm,sample_end_dt,sample_end_tm,sample_start_time_datum_cd,tm_datum_rlbty_cd,coll_ent_cd,medium_cd,project_cd,aqfr_cd,tu_id,body_part_id,hyd_cond_cd,samp_type_cd,hyd_event_cd,sample_lab_cm_txt,p00065
04024000,1984-04-24 14:30:00+00:00,USGS,1984-04-24,08:30,,,-0600,T,USGS-WRD,WS,,,,,9,9,9,,4.62
04024000,1984-06-19 15:15:00+00:00,USGS,1984-06-19,10:15,,,-0500,T,USGS-WRD,WS,,,,,9,9,9,,6.43
04024000,1984-08-22 15:45:00+00:00,USGS,1984-08-22,10:45,,,-0500,T,USGS-WRD,WS,,,,,9,9,9,,3.24
04024000,1985-02-11 20:30:00+00:00,USGS,1985-02-11,14:30,,,-0600,T,USGS-WRD,WS,,,,,9,9,9,,3.51
04024000,2010-10-07 15:45:00+00:00,USGS,2010-10-07,10:45,2010-11-02,10:15,-0500,K,USGS-WRD,WS,GR11NQ00E,,,,4,9,9,,3.11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
04024430,1983-04-19 16:00:00+00:00,USGS,1983-04-19,10:00,,,-0600,T,USGS-WRD,WS,,,,,A,9,9,,8.83
04024430,1983-05-24 21:10:00+00:00,USGS,1983-05-24,16:10,,,-0500,T,USGS-WRD,WS,,,,,9,9,9,,5.64
04024430,1983-07-06 17:00:00+00:00,USGS,1983-07-06,12:00,,,-0500,T,USGS-WRD,WS,,,,,5,9,9,,12.59
04024430,1983-08-16 19:20:00+00:00,USGS,1983-08-16,14:20,,,-0500,T,USGS-WRD,WS,,,,,9,9,9,,5.23


#### Example 3: Retrieve water quality sample data for multiple sites, including a list of parameters, within a time period defined by start and end dates

In [8]:
site_ids = ['04024430', '04024000']
parameterCd = ['34247', '30234', '32104', '34220']
startDate = '2012-01-01'
endDate = ''
wq_data2 = nwis.get_qwdata(sites=site_ids, parameterCd=parameterCd,
                           start=startDate, end=endDate)
# Report the number of rows returned by this query (wq_data2), not by Example 2
print('Retrieved data for ' + str(len(wq_data2[0])) + ' samples.')
display(wq_data2[0])



site_no,datetime,agency_cd,sample_dt,sample_tm,sample_end_dt,sample_end_tm,sample_start_time_datum_cd,tm_datum_rlbty_cd,coll_ent_cd,medium_cd,project_cd,...,tu_id,body_part_id,hyd_cond_cd,samp_type_cd,hyd_event_cd,sample_lab_cm_txt,p30234,p32104,p34220,p34247
4024000,2012-01-25 14:40:00+00:00,USGS,2012-01-25,08:40,,,-600,K,USGS-WRD,WS,GR12NK00E,...,,,4,9,B,A-0270113 **Second GCC bottle (sch 4433) is an...,< 0.16,< 0.16,< 0.02,< 0.02
4024000,2012-02-22 14:45:00+00:00,USGS,2012-02-22,08:45,,,-600,K,USGS-WRD,WS,GR12NK00E,...,,,4,9,B,A-0540032 **Second GCC bottle (sch 4433) is an...,< 0.16,< 0.16,< 0.02,< 0.02
4024000,2012-03-12 13:55:00+00:00,USGS,2012-03-12,08:55,,,-500,K,USGS-WRD,WS,GR12NK00E,...,,,4,9,B,A-0740087 ** Second GCC bottle (Sch 4433) is a...,< 0.16,0.01 bt,< 0.02,< 0.02
4024000,2012-03-27 13:55:00+00:00,USGS,2012-03-27,08:55,,,-500,K,USGS-WRD,WS,GR12NK00E,...,,,9,9,A,A-0900013 **Second GCC bottle (sch 4433) is an...,< 0.16,< 0.16,< 0.02,< 0.02
4024000,2012-04-17 14:20:00+00:00,USGS,2012-04-17,09:20,,,-500,K,USGS-WRD,WS,GR12NK00E,...,,,8,9,J,A-1110047 **Second GCC bottle (sch 4433) is an...,< 0.16,< 0.16,< 0.02,< 0.02
4024000,2012-04-25 15:47:00+00:00,USGS,2012-04-25,10:47,,,-500,K,USGS-WRD,WS,GR12NK00E,...,,,5,9,9,A-1180039 **Second GCC bottle (SCH4433) is and...,< 0.16,< 0.16,< 0.02,< 0.02
4024000,2012-05-17 13:40:00+00:00,USGS,2012-05-17,08:40,,,-500,K,USGS-WRD,WS,GR12NK00E,...,,,5,9,9,A-1390189 **Second GCC bottle (sch 4433) is an...,< 0.16,< 0.16,< 0.02,< 0.02
4024000,2012-05-24 18:23:00+00:00,USGS,2012-05-24,13:23,,,-500,K,USGS-WRD,WS,GR12NK00E,...,,,8,9,J,A-1500157 **Second GCC bottle (sch 4433) is an...,< 0.16,< 0.16,< 0.02,< 0.02
4024000,2012-06-14 14:15:00+00:00,USGS,2012-06-14,09:15,,,-500,K,USGS-WRD,WS,GR12NK00E,...,,,5,9,9,A-1710114 **Second GCC bottle (sch. 4433) is a...,< 0.16,< 0.16,< 0.02,< 0.02
4024000,2012-06-20 18:43:00+00:00,USGS,2012-06-20,13:43,,,-500,K,USGS-WRD,WS,GR12NK00E,...,,,7,9,J,A-1740217 **Second GCC bottle (sch 4433) is an...,< 0.16,< 0.16,< 0.02,< 0.02
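
The result columns in this table hold text such as `< 0.16` and `0.01 bt` rather than plain numbers, so they need to be split before any numeric analysis. The following is a minimal sketch of one way to do that. The helper name `split_result` is hypothetical, and the trailing text (for example `bt`) is kept as-is without interpreting it.

```python
import re

# Optional < or > remark, then a number, then any trailing qualifier text.
RESULT_PATTERN = re.compile(r"^\s*([<>])?\s*(-?\d+(?:\.\d+)?)\s*(.*?)\s*$")


def split_result(text):
    """Split a result string into remark, numeric value, and trailing text.

    Returns None for missing values or text that does not start with a number.
    """
    if not isinstance(text, str):
        return None
    match = RESULT_PATTERN.match(text)
    if match is None:
        return None
    remark, number, trailing = match.groups()
    return {"remark": remark or "", "value": float(number), "qualifiers": trailing}


print(split_result("< 0.16"))
print(split_result("0.01 bt"))
```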


#### Example 4: Retrieve water quality sample data for one site in serial format

Each row in the resulting table represents a single observation of a single parameter. Each sample may be analyzed for multiple parameters, so a single water quality sample can result in multiple rows in serial format.

In [9]:
siteID = '10109000'
wq_data = nwis.get_qwdata(sites=siteID, wide_format=False)
print('Retrieved data for ' + str(len(wq_data[0])) + ' sample results.')
display(wq_data[0])



Retrieved data for 2465 sample results.




datetime,agency_cd,site_no,sample_dt,sample_tm,sample_end_dt,sample_end_tm,sample_start_time_datum_cd,tm_datum_rlbty_cd,coll_ent_cd,medium_cd,...,dqi_cd,rpt_lev_va,rpt_lev_cd,lab_std_va,prep_set_no,prep_dt,anl_set_no,anl_dt,result_lab_cm_tx,anl_ent_cd
1967-09-13 13:35:00+00:00,USGS,10109000,1967-09-13,07:35,,,-0600,T,,WS,...,A,,,,,,,,,
1967-09-13 13:35:00+00:00,USGS,10109000,1967-09-13,07:35,,,-0600,T,,WS,...,A,,,,,,,,,
1967-09-13 13:35:00+00:00,USGS,10109000,1967-09-13,07:35,,,-0600,T,,WS,...,A,,,,,,,,,
1967-09-13 13:35:00+00:00,USGS,10109000,1967-09-13,07:35,,,-0600,T,,WS,...,A,,,,,,,,,
1967-09-13 13:35:00+00:00,USGS,10109000,1967-09-13,07:35,,,-0600,T,,WS,...,A,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
NaT,USGS,10109000,1989-10-05,,,,-0600,T,USGS-WRD,WS,...,A,,,,,,,,,
NaT,USGS,10109000,1989-10-05,,,,-0600,T,USGS-WRD,WS,...,A,,,,,,,,,
NaT,USGS,10109000,1989-10-05,,,,-0600,T,USGS-WRD,WS,...,A,,,,,,,,,
NaT,USGS,10109000,1989-10-05,,,,-0600,T,USGS-WRD,WS,...,A,,,,,,,,,
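
The relationship between the wide and serial layouts can be sketched by expanding one wide row into one row per parameter. The values below come from the first row of the Example 1 table. The output field names `parm_cd` and `result_va` are assumptions made for illustration (the columns elided by `...` in the table above may be named differently), and `wide_to_serial` is a hypothetical helper, not a dataretrieval function.

```python
def wide_to_serial(wide_row, id_fields):
    """Expand one wide row into a list of rows, one per non-missing parameter."""
    ids = {field: wide_row[field] for field in id_fields}
    serial_rows = []
    for name, value in wide_row.items():
        if name in id_fields or value is None:
            continue  # skip sample descriptors and parameters with no result
        # Drop the leading "p" to recover the five-digit parameter code.
        serial_rows.append({**ids, "parm_cd": name[1:], "result_va": value})
    return serial_rows


# One sample in wide form: descriptors plus one column per parameter.
wide_row = {
    "site_no": "10109000",
    "sample_dt": "1967-09-13",
    "p70300": 196.0,
    "p70301": None,
    "p70302": 108.0,
    "p70303": 0.27,
    "p71851": 0.1,
}

for row in wide_to_serial(wide_row, id_fields=("site_no", "sample_dt")):
    print(row)
```

Here a single sample with four reported parameters becomes four serial rows, which is why the serial result has many more rows than the wide one.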
