# Table of Contents
<div class="toc" style="margin-top: 1em;"><ul class="toc-item" id="toc-level0"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Setup-the-Working-Environment" data-toc-modified-id="Setup-the-Working-Environment-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Setup the Working Environment</a></span></li><li><span><a href="#Summary-Information" data-toc-modified-id="Summary-Information-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Summary Information</a></span></li><li><span><a href="#PAD-US-Archive-Management" data-toc-modified-id="PAD-US-Archive-Management-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>PAD-US Archive Management</a></span></li><li><span><a href="#Data-sources" data-toc-modified-id="Data-sources-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Data sources</a></span><ul class="toc-item"><li><span><a href="#ArcGIS-Services" data-toc-modified-id="ArcGIS-Services-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>ArcGIS Services</a></span></li><li><span><a href="#ScienceBase" data-toc-modified-id="ScienceBase-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>ScienceBase</a></span></li></ul></li></ul></div>

# Introduction
The Biogeographic Characterization Branch (BCB) of CSASL partners with many organizations to produce the official inventory of protected open space in the United States, the Protected Areas Database of the United States (PAD-US). Working with federal, state, local, national and nongovernmental organizations, the PAD-US group assembles, checks and produces integrated information that describes public open space and other protected areas and delineates their boundaries. The resulting national inventory is a key resource for informing decisions about conservation, recreation or land use planning at different scales and across administrative boundaries.

This notebook provides an overview of that inventory's assets and foundational data management practices. It also provides an entry point for exploring the interplay of managed lands, national conservation policy and resource management decisions. One goal of this notebook is to provide easy access to BCB data assets associated with protected areas and managed lands in general, to document programmatic ways to quickly inventory and explore those assets, and to provide code examples for working with and analyzing these data.

# Setup the Working Environment
This section of the notebook is a convenient place to set up the Python modules we plan to use. It is also a good place to set up (and document) the ScienceBase and GC2 base URLs, as well as the ScienceBase item for PAD-US v1.4 that we'll use throughout the notebook.

In [1]:
import requests
import pysb
import datetime
from IPython.display import display
from IPython.display import HTML

In [2]:
# To access private items in ScienceBase, we need to establish a connection using the pysb package.
# The copy of this notebook posted to GitHub shows output from previous runs, but we take care
# not to display particularly sensitive information from in-review items.

sb = pysb.SbSession()
username = input("Username: ")
# loginc prompts for the password interactively, so it is never stored in the notebook
sb.loginc(username)

Username: saulenbach@usgs.gov
········


<pysb.SbSession.SbSession at 0x106a3ba58>

In [3]:
# Set up some parameters for this notebook
_gc2BaseURL = "https://gc2.datadistillery.org/api/v1/sql/bcb"
_sbCatalogBaseURL = "https://www.sciencebase.gov/catalog/item/"

# ScienceBase item for PAD-US (currently v1.4)
_padusCollectionItem = "56bba648e4b08d617f657960"

# Summary Information
Before we start digging deeply into our online Protected Areas data assets, let's pull down the basic summary information we provide to our users; let's see what they see. We generally store this type of information in ScienceBase and use it to populate our online tools, such as the PAD-US Map Viewer (https://maps.usgs.gov/padus/) and the National Biogeographic Map (https://maps.usgs.gov/biogeography/). We'll use the PAD-US ScienceBase Collection item we set up above to build a brief summary report.

In [4]:
# Get the PAD-US (v1.4) ScienceBase collection item for summary
padusCollection = sb.get_item(_padusCollectionItem,{'fields':'title,body,purpose,contacts'})

# Display a few summary fields for the collection in a lightweight report format
_shortReport = "<h3>"+padusCollection["title"]+"</h3>"
_shortReport = _shortReport+"<h4>Abstract</h4><p>"+padusCollection["body"]+"</p>"
_shortReport = _shortReport+"<h4>Purpose</h4><p>"+padusCollection["purpose"]+"</p>"
_shortReport = _shortReport+"<h4>Contacts</h4>"
for contact in padusCollection["contacts"]:
    _shortReport = _shortReport+"<div>"
    _shortReport = _shortReport+contact["name"]+" ("+contact["type"]+")"
    _shortReport = _shortReport+"</div>"
    
HTML(_shortReport)
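The report-building cell above concatenates strings and assumes every field is present. A minimal alternative sketch, assuming the item is a plain dictionary with the `title`, `body`, `purpose` and `contacts` fields requested above: it collects pieces in a list, skips missing fields, and HTML-escapes the title and contact text. The abstract and purpose are inserted unescaped on the assumption that they may already carry markup. The function name `build_summary_report` and the example dictionary are hypothetical.

```python
from html import escape

def build_summary_report(item):
    """Return a small HTML report for a ScienceBase-style item dictionary."""
    parts = ["<h3>" + escape(item.get("title", "")) + "</h3>"]
    # Abstract and purpose may already contain markup, so they are inserted as-is
    for label, key in (("Abstract", "body"), ("Purpose", "purpose")):
        if item.get(key):
            parts.append("<h4>" + label + "</h4><p>" + item[key] + "</p>")
    contacts = item.get("contacts", [])
    if contacts:
        parts.append("<h4>Contacts</h4>")
        for contact in contacts:
            name = escape(contact.get("name", "Unnamed contact"))
            role = contact.get("type")
            # Only show the role in parentheses when the contact has one
            parts.append("<div>" + name + (" (" + escape(role) + ")" if role else "") + "</div>")
    return "".join(parts)

# Hypothetical example item, for illustration only
example = {
    "title": "Example collection",
    "purpose": "Illustration only",
    "contacts": [{"name": "A & B Lab", "type": "Point of Contact"}, {"name": "No Role"}],
}
print(build_summary_report(example))
```

In the notebook, `HTML(build_summary_report(padusCollection))` would render the same kind of report from the live item.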

# PAD-US Archive Management

PAD-US has a number of older versions in play somewhere on our file systems. We need to work out how we manage older versions of the data, what we do with older files, and how we portray the archive online. One area we are working to clean up and refine is the large set of files sitting in the Amazon S3 storage bucket (now referred to as usgs-gap-data in the USGS CHS cloud). That bucket holds many files that look like PAD-US files (see below). We need to work on:
* What we need to keep
* What we should do with the stuff we need to keep
* How the stuff we keep should be better documented so that we and others know what's there and why we kept it

The following code block scans a local text file dump of the usgs-gap-data S3 bucket's directory listing, provided by Ivan Fetch.

In [5]:
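import mmap
import re

# Capture each path fragment containing "PADUS", from at most one character before it
# up to the next period (so file extensions are cut off in the printed output)
pattern = re.compile(rb'(\.\W+)?([^.]?PADUS[^.]*?\.)')

# Memory-map the listing so the bytes pattern can be searched without loading it as a string
with open("usgs-gap-data-ls.txt", "rb") as gapfiles:
    with mmap.mmap(gapfiles.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for match in pattern.findall(m):
            # Group 2 holds the path fragment; flatten any embedded newlines
            print(match[1].replace(b'\n', b' '))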

b' PADUS/ByLCC/PAD-US_LCC_00.'
b' PADUS/ByLCC/PAD-US_LCC_01.'
b' PADUS/ByLCC/PAD-US_LCC_02.'
b' PADUS/ByLCC/PAD-US_LCC_03.'
b' PADUS/ByLCC/PAD-US_LCC_04.'
b' PADUS/ByLCC/PAD-US_LCC_05.'
b' PADUS/ByLCC/PAD-US_LCC_06.'
b' PADUS/ByLCC/PAD-US_LCC_07.'
b' PADUS/ByLCC/PAD-US_LCC_08.'
b' PADUS/ByLCC/PAD-US_LCC_09.'
b' PADUS/ByLCC/PAD-US_LCC_10.'
b' PADUS/ByLCC/PAD-US_LCC_11.'
b' PADUS/ByLCC/PAD-US_LCC_12.'
b' PADUS/ByLCC/PAD-US_LCC_13.'
b' PADUS/ByLCC/PAD-US_LCC_14.'
b' PADUS/ByLCC/PAD-US_LCC_15.'
b' PADUS/ByLCC/PAD-US_LCC_16.'
b' PADUS/ByLCC/PAD-US_LCC_17.'
b' PADUS/ByLCC/PAD-US_LCC_19.'
b' PADUS/ByLCC/PAD-US_LCC_20.'
b' PADUS/ByLCC/PAD-US_LCC_21.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_00.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_01.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_02.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_03.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_04.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_05.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_06.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_07.'
b' PA
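To help decide what to keep, it may be more useful to see the PAD-US files grouped by directory, with counts and total sizes, than as a flat list. A minimal sketch, assuming the listing follows `aws s3 ls --recursive` output (`date time size key` per line); the real dump may use a different layout, in which case the parsing would need adjusting. The function name and the sample lines are hypothetical.

```python
from collections import defaultdict

def summarize_padus_keys(listing_lines, marker="PADUS"):
    """Group object keys containing `marker` by directory prefix.

    Assumes each line looks like 'YYYY-MM-DD HH:MM:SS <size> <key>';
    lines that do not fit that shape (e.g. 'PRE folder/' entries) are skipped.
    """
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for line in listing_lines:
        fields = line.split(maxsplit=3)
        if len(fields) != 4 or not fields[2].isdigit():
            continue
        size, key = int(fields[2]), fields[3].strip()
        if marker.upper() not in key.upper():
            continue
        # Everything before the last slash is the "directory"
        prefix = key.rsplit("/", 1)[0] if "/" in key else ""
        summary[prefix]["files"] += 1
        summary[prefix]["bytes"] += size
    return dict(summary)

# Hypothetical listing lines, invented for illustration
sample = [
    "2015-06-11 10:22:33    1048576 PADUS/ByLCC/PAD-US_LCC_00.zip",
    "2015-06-11 10:22:35    2097152 PADUS/ByLCC/PAD-US_LCC_01.zip",
    "2016-01-04 08:00:00     524288 PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_00.zip",
    "2016-01-04 08:00:01        100 landcover/readme.txt",
]
for prefix, stats in sorted(summarize_padus_keys(sample).items()):
    print(prefix, stats["files"], stats["bytes"])
```

If the dump matches that layout, `with open("usgs-gap-data-ls.txt") as f: summarize_padus_keys(f)` would produce the same summary for the real bucket.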

# Data sources
Source data documentation is one aspect of PAD-US data management we need to focus on, for several reasons. Little of it is visible right now; mostly we have to go by clues within the final database. We should catalog the actual sources in a ScienceBase collection, as we are doing for other products, where we can record details that are currently available only offline to backend data managers.

## ArcGIS Services
The following code block queries one of the PAD-US ArcGIS services for the distinct values of GIS_Src. The first 1000 values returned show that this information, on its own, is of little real use to downstream users.

In [6]:
padGISSources = requests.get("https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Protected_Areas_by_Manager/MapServer/0/query?where=0%3D0&outFields=GIS_Src&returnGeometry=false&returnTrueCurves=false&returnIdsOnly=false&returnCountOnly=false&returnZ=false&returnM=false&returnDistinctValues=true&f=pjson").json()

for feature in padGISSources["features"]:
    print(feature["attributes"]["GIS_Src"])

Sumter_Co_Parcels.shp
Ducks Unlimited - digitized by eye from NLT map on website
The Nature Conservancy -Texas Chapter
COMBINATION OF DIGITAL BOUNDARIES PROVIDED BY ST. JOHNS RIVER WATER MANAGEMENT DISTRICT. 05/2012, A*
SCANNED TAX MAP
Chesapeake Bay Foundation
DEED + SURVEY
New York State Office of Parks Recreation and Historic Preservation
TPWD_LWRCRP2012.shp/Harris, County of
Town of York Digital Parcel Data
Carteret_Parcels2011.shp
PPL
ScenicGalvestonINC_parcels.shp/GalvestonCAD
Broomfield County Open Space and Trails
New Jersey Department of Environmental Protection
lee_parcels_2013_03_05.shp
Pima County D.O.T. Technical Services
Neil Jordan - The Nature Conservancy SC Chapter
8.4, 7.12
Unita_County_Bear_River_Park_2012.gdb
CHAGRIN VALLEY ENGINEERING
West Virginia Agricultural Land Protection Authority (Matt Monroe)
GPS, DXF
Survey, York Parcel Data, Color DOQ
DIGITAL BOUNDARIES PROVIDED BY SOUTH FLORIDA WATER MANAGEMENT DISTRICT. 04/2008 AND 07/2008
New Hampshire Water Supply Lan
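The query above returns only a single page of results, capped by the service's record limit. ArcGIS query endpoints flag a truncated response with `exceededTransferLimit` and, on servers that support pagination, accept `resultOffset` and `resultRecordCount`. A minimal sketch that pages through every distinct value, assuming this PADUS1_4 layer supports pagination; it uses only the standard library, and the `fetch` argument is injectable so the paging logic can be tried without network access. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Same layer endpoint as the cell above
PADUS_QUERY_URL = ("https://gis1.usgs.gov/arcgis/rest/services/"
                   "PADUS1_4/Protected_Areas_by_Manager/MapServer/0/query")

def http_get_json(url, params):
    """GET url with URL-encoded query parameters and decode the JSON response."""
    with urllib.request.urlopen(url + "?" + urllib.parse.urlencode(params)) as resp:
        return json.load(resp)

def distinct_values(field, url=PADUS_QUERY_URL, page_size=1000, fetch=http_get_json):
    """Collect every distinct value of `field`, one page at a time.

    Assumes the layer supports resultOffset/resultRecordCount and sets
    exceededTransferLimit while more records remain.
    """
    values, offset = [], 0
    while True:
        page = fetch(url, {
            "where": "0=0",
            "outFields": field,
            "returnGeometry": "false",
            "returnDistinctValues": "true",
            "orderByFields": field,  # a stable sort order keeps pages from overlapping
            "resultOffset": offset,
            "resultRecordCount": page_size,
            "f": "json",
        })
        features = page.get("features", [])
        values.extend(feature["attributes"][field] for feature in features)
        # Stop when the server returns an empty page or reports no more records
        if not features or not page.get("exceededTransferLimit"):
            return values
        offset += len(features)
```

Against the live service, `len(distinct_values("GIS_Src"))` would show how many source strings the single-page query left out.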

"Agg_Src" is another attribute that contains potentially useful source information; the following code block pulls its distinct values for review. These values also seem less than fully informative and not very standardized. We should look at replacing all of this with URI pointers to ScienceBase items that document source material in standardized ways. From those items we could pull individual attributes to include in the data, while also giving people and software links back to more detail.

In [7]:
padAggregatorSources = requests.get("https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Protected_Areas_by_Manager/MapServer/0/query?where=0%3D0&outFields=Agg_Src&returnGeometry=false&returnTrueCurves=false&returnIdsOnly=false&returnCountOnly=false&returnZ=false&returnM=false&returnDistinctValues=true&f=pjson").json()

for feature in padAggregatorSources["features"]:
    print(feature["attributes"]["Agg_Src"])

PADUS_State_Parks_and_Historic_Sites_2012.gdb
GAP_PADUS1_4Fee_USFS_ALP_S_USA.BasicOwnership.gdb/BasicOwnership
GAP_PADUS1_4Designation_FWSSpecialDesignation_preprocess
GAP_PADUS1_4Designation_USFS_ALP_S_USA.WildScenicRiver
Alabama Department of Conservation and Natural Resources (ADCNR)_WMAOutlines2013.shp
GAP_PADUS1_4Easements_NRCS_easement_a_extract
The Trust for Public Land
PADUS_Campbell_County_2012.gdb
NPS_Lands_nps_tracts.shp
PADUS_City_of_Sheridan_2012.gdb
AGRC_SGID10_Archive.CADASTRE.PADUS_Submission2012.sde
PADUS_WGFD_2012.gdb
CALMIT
GAP_PADUS1_4Designation_Proclamation_NPS_Boundary.shp
PADUS_City_of_Casper_2012.gdb
TPL_Conservation_Almanac_State_Template/Conservation_Almanac_Database_US_Nov2011.gdb
USGS_Pacific_Protected_Areas_Database
UGA_NARSAL_GAConservationLands2012.gdb
GAP_PADUS1_4Designation_BLM_NOC_WSR
MFC_SchoolTrustLands2012.shp
Alabama Department of Conservation and Natural Resources (ADCNR)_PublicFishingLakes2013.shp
Missouri Resource Assessment Partnership (MoRAP)
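The URI-pointer idea can be sketched as a lookup from raw `Agg_Src` strings to ScienceBase item IDs, which splits the values into those that already resolve to a catalog link and those that still need a source item. This is a minimal sketch under that assumption: the lookup table, its placeholder IDs and the function name `link_sources` are hypothetical; no such source items are known to exist yet.

```python
# Same catalog base URL as the parameters cell near the top of the notebook
SB_CATALOG_BASE = "https://www.sciencebase.gov/catalog/item/"

# Hypothetical lookup from raw Agg_Src strings to ScienceBase item IDs (placeholder IDs)
SOURCE_ITEMS = {
    "The Trust for Public Land": "placeholder-id-tpl",
    "Missouri Resource Assessment Partnership (MoRAP)": "placeholder-id-morap",
}

def link_sources(values, lookup=SOURCE_ITEMS, base=SB_CATALOG_BASE):
    """Split raw source strings into catalog URIs and values that still need a source item."""
    linked, unmatched = {}, []
    for value in values:
        key = (value or "").strip()
        if key in lookup:
            linked[key] = base + lookup[key]
        elif key:
            # Non-empty values without a documented source item are queued for cataloging
            unmatched.append(key)
    return linked, unmatched
```

With the results from the cell above, `link_sources(f["attributes"]["Agg_Src"] for f in padAggregatorSources["features"])` would list which aggregator sources remain undocumented.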

## ScienceBase
We also store many PAD-US assets in ScienceBase. This code block uses the PAD-US ScienceBase Collection item from above to query ScienceBase for the number of child items associated with that item. It then retrieves each child item and produces a custom summary report for our use.

In [9]:
# Report on the current number of PAD-US items in ScienceBase
padusIDs = sb.get_child_ids(_padusCollectionItem)
_shortReport = "<div>The number of PAD-US v1.4 items in the ScienceBase collection is currently <strong>"+str(len(padusIDs))+"</strong></div>"

# Build a bulleted list of the child items, linking each title to its catalog page
_shortReport = _shortReport+"<ul>"
for padusID in padusIDs:
    # Request only the title to keep each response small
    padusItem = sb.get_item(padusID, {'fields': 'title'})
    pid = _sbCatalogBaseURL+padusID
    _shortReport = _shortReport+"<li><a href='"+pid+"'>"+padusItem["title"]+"</a> ("+str(padusID)+")</li>"
_shortReport = _shortReport+"</ul>"

HTML(_shortReport)
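The cell above makes one request per child item. A sketch of an alternative that pulls child titles in pages, assuming the ScienceBase catalog search at `/catalog/items` accepts `parentId`, `fields`, `format`, `max` and `offset` and answers with `items` and `total` keys. Because this uses anonymous standard-library requests, in-review children visible through the logged-in `sb` session would not appear, so the count may be lower. The function names are hypothetical and `fetch` is injectable for offline testing.

```python
import json
import urllib.parse
import urllib.request

SB_ITEMS_URL = "https://www.sciencebase.gov/catalog/items"

def http_get_json(url, params):
    """GET url with URL-encoded query parameters and decode the JSON response."""
    with urllib.request.urlopen(url + "?" + urllib.parse.urlencode(params)) as resp:
        return json.load(resp)

def child_titles(parent_id, page_size=100, fetch=http_get_json):
    """Return (id, title) pairs for the public children of a ScienceBase item.

    Assumes the catalog search honors parentId/fields/max/offset and
    reports the overall match count in 'total'.
    """
    results, offset = [], 0
    while True:
        page = fetch(SB_ITEMS_URL, {"parentId": parent_id, "fields": "title",
                                    "format": "json", "max": page_size, "offset": offset})
        items = page.get("items", [])
        results.extend((item["id"], item.get("title", "")) for item in items)
        offset += len(items)
        # Stop on an empty page or once every reported match has been collected
        if not items or offset >= page.get("total", 0):
            return results
```

Comparing `len(child_titles(_padusCollectionItem))` with `len(padusIDs)` would indicate how many child items are not yet public.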