# Data Download 
This notebook troubleshoots the programmatic download of the GBIF and city parks datasets.

Accessing GBIF occurrence records for the City of Vancouver requires providing the regional boundary limits. A facet grid boundary obtained from the [Vancouver Open Data Portal](https://opendata.vancouver.ca/explore/dataset/facet-grid-boundaries/information/?location=11,49.24271,-123.16189) covers the City of Vancouver as well as UBC and the Endowment Lands, which are technically outside the jurisdiction of the City of Vancouver.

An attempt was made to download GBIF data using `pygbif`, the Python wrapper for the GBIF API; the intent was to use the city grid boundary to restrict the request to records inside that boundary. However, the data request ended up querying data from all across Canada for the past 11 years (2009 to 2019). Once the GBIF data was downloaded, it was clipped to the city boundary.
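The clipping step can be illustrated without any geospatial libraries. The sketch below is a plain-Python ray-casting test, not the method used in this project (which would more naturally rely on geopandas); the rectangular boundary and the two records are made up for illustration, and `decimalLongitude` / `decimalLatitude` are assumed to be the coordinate fields in the GBIF export.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def clip_records(records, ring):
    """Keep only the records whose coordinates fall inside the ring."""
    return [
        r for r in records
        if point_in_ring(r["decimalLongitude"], r["decimalLatitude"], ring)
    ]


# Hypothetical rectangle roughly around Vancouver (not the real facet grid).
boundary = [(-123.3, 49.2), (-123.0, 49.2), (-123.0, 49.32), (-123.3, 49.32)]

# Made-up occurrence records, for illustration only.
records = [
    {"species": "A", "decimalLongitude": -123.12, "decimalLatitude": 49.28},
    {"species": "B", "decimalLongitude": -79.38, "decimalLatitude": 43.65},
]

clipped = clip_records(records, boundary)
```

Only record `A` lies inside the rectangle, so it is the only one kept. For the real boundary, which is made of many grid cells, a spatial join in geopandas is likely the more practical choice.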

The parks dataset was obtained through the [BC Open Data Catalogue](https://catalogue.data.gov.bc.ca/dataset/local-and-regional-greenspaces) and included regional and city parks for the entire province of British Columbia. This dataset was chosen instead of the city parks dataset offered through City of Vancouver Open Data so that regional parks like `Pacific Spirit Park` could be included in the analysis. The BC parks dataset was then filtered to include only parks contained within the municipality of Vancouver. 

# import libraries 


In [16]:
from pygbif import occurrences as occ
import geopandas as gpd
from shapely.geometry.multipolygon import MultiPolygon
import shapely.geometry as geom
import shapely.wkt
import pandas as pd
from zipfile import ZipFile
from keplergl import KeplerGl

# load the city boundary 

In [17]:
url = 'https://opendata.vancouver.ca/explore/dataset/facet-grid-boundaries/download/?format=shp&timezone=America/Los_Angeles&lang=en'
city_boundary = gpd.read_file(url)

In [18]:
city_boundary.head()

| | facet_text | geometry |
|---|---|---|
| 0 | X04 | POLYGON ((-123.01232 49.30354, -123.01232 49.2... |
| 1 | X05 | POLYGON ((-123.01232 49.29905, -123.01232 49.2... |
| 2 | X07 | POLYGON ((-123.02332 49.29005, -123.01232 49.2... |
| 3 | W15 | POLYGON ((-123.02331 49.25407, -123.02330 49.2... |
| 4 | X18 | POLYGON ((-123.01231 49.24058, -123.01231 49.2... |


In [25]:
# combine the facet grid polygons into a single MultiPolygon
city_boundary_multipolygon = MultiPolygon(list(city_boundary.geometry))
# WKT string of the combined boundary, for use in the GBIF query
geometry = city_boundary_multipolygon.wkt
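A MultiPolygon built from every grid cell yields a very long WKT string. A simpler alternative, similar in shape to the rectangular `within` geometry that appears in the download request further down, is a bounding box. The sketch below is an assumption about one way to build it, not the approach taken here; it uses only the first vertices visible in the `head()` output above as sample input, and it orders the corners counter-clockwise, which GBIF's documentation asks for in polygon queries.

```python
def bbox_wkt(coords):
    """Return a counter-clockwise WKT rectangle enclosing (lon, lat) pairs."""
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)
    # Corner order: SW, SE, NE, NW, then back to SW to close the ring.
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),
    ]
    return "POLYGON((" + ",".join(f"{x} {y}" for x, y in corners) + "))"


# A few vertices taken from the city_boundary.head() output (sample only).
sample = [(-123.01232, 49.30354), (-123.02332, 49.29005), (-123.01231, 49.24058)]
wkt = bbox_wkt(sample)
```

With geopandas, the same four numbers are available from `city_boundary.total_bounds`, which returns `(minx, miny, maxx, maxy)`.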

# download GBIF data 

In [72]:
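# NOTE: this call fails as written (see the exception below).
# 'geometry =  geometry' sends the literal text rather than the WKT string
# held in the `geometry` variable, and the credentials are passed as query
# predicates instead of through the user, pwd, and email keyword arguments.
occ.download(['geometry =  geometry',
             'hasCoordinate = True',
             'year = 2019',
             'user = GBIF_USER', 
             'pwd = GBIF_PWD', 
             'email = GBIF_EMAIL'])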

Exception: error: Instantiation of [simple type, class org.gbif.api.model.occurrence.predicate.EqualsPredicate] value failed: <value> may not be empty (through reference chain: org.gbif.api.model.occurrence.predicate.EqualsPredicate["value"]), with error status code 400check your number of active downloads.
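The "`<value>` may not be empty" message suggests that at least one predicate reached GBIF without a value. One way to catch this before submitting is to build the request body explicitly and check it. The sketch below is not the pygbif code path: it assembles a dictionary in the predicate style visible in the `download_list` output, the top-level field names (`creator`, `notificationAddresses`, `sendNotification`, `format`) are assumed from GBIF's download API documentation, and the user name and email are placeholders. Nothing is sent over the network.

```python
import json


def build_download_request(wkt, year, creator, email):
    """Assemble a GBIF-style download request body (assumed field names)."""
    return {
        "creator": creator,
        "notificationAddresses": [email],
        "sendNotification": True,
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "within", "geometry": wkt},
                {"type": "equals", "key": "HAS_COORDINATE", "value": "true"},
                {"type": "equals", "key": "YEAR", "value": str(year)},
            ],
        },
    }


def empty_predicates(request):
    """Return the predicates whose value or geometry is missing or blank."""
    bad = []
    for pred in request["predicate"]["predicates"]:
        payload = pred.get("value", pred.get("geometry", ""))
        if not str(payload).strip():
            bad.append(pred)
    return bad


# Bounding box copied from the download_list output; credentials are placeholders.
bbox = ("POLYGON((-123.39727 49.02333,-122.53448 49.02333,"
        "-122.53448 49.4198,-123.39727 49.4198,-123.39727 49.02333))")
request = build_download_request(bbox, 2019, "example_user", "user@example.org")
body = json.dumps(request)
problems = empty_predicates(request)
```

An empty `problems` list means every predicate carries a value; a request built from a blank geometry string would be flagged before it ever reached the API.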

In [73]:
occ.download_list(user = GBIF_USER, pwd = GBIF_PWD)

{'meta': {'offset': 0, 'limit': 20, 'endofrecords': True, 'count': 1},
 'results': [{'key': '0019909-190415153152247',
   'doi': '10.15468/dl.txhrd1',
   'license': 'http://creativecommons.org/licenses/by-nc/4.0/legalcode',
   'request': {'predicate': {'type': 'and',
     'predicates': [{'type': 'equals', 'key': 'COUNTRY', 'value': 'CA'},
      {'type': 'and',
       'predicates': [{'type': 'greaterThanOrEquals',
         'key': 'YEAR',
         'value': '2009'},
        {'type': 'lessThanOrEquals', 'key': 'YEAR', 'value': '2019'}]},
      {'type': 'or',
       'predicates': [{'type': 'within',
         'geometry': 'POLYGON((-180 -90,180 -90,180 90,-180 90,-180 -90))'},
        {'type': 'within',
         'geometry': 'POLYGON((-123.39727 49.02333,-122.53448 49.02333,-122.53448 49.4198,-123.39727 49.4198,-123.39727 49.02333))'}]},
      {'type': 'equals',
       'key': 'DATASET_KEY',
       'value': '50c9509d-22c7-4a22-a47d-8c48425ef4a7'},
      {'type': 'equals', 'key': 'HAS_COORDINATE

In [75]:
occ.download_get(key = '0019909-190415153152247', 
                 path = '/Users/lesley/data_science_portfolio/vancouver_park_biodiversity/data/')

Download file size: 56878354 bytes
On disk at /Users/lesley/data_science_portfolio/vancouver_park_biodiversity/data//0019909-190415153152247.zip


{'path': '/Users/lesley/data_science_portfolio/vancouver_park_biodiversity/data//0019909-190415153152247.zip',
 'size': 56878354,
 'key': '0019909-190415153152247'}

In [81]:
# extract the data (context manager closes the archive; avoids shadowing the built-in zip)
with ZipFile('/Users/lesley/data_science_portfolio/vancouver_park_biodiversity/data/0019909-190415153152247.zip') as zf:
    zf.extractall(path='data/')

# load in city parks data

In [12]:
# extract the data (context manager closes the archive; avoids shadowing the built-in zip)
with ZipFile('/Users/lesley/data_science_portfolio/vancouver_park_biodiversity/data/local_and_regional_greenspaces.zip') as zf:
    zf.extractall(path='data/')

In [13]:
# load the data 
parks = gpd.read_file('data/GBA_LOCAL_REG_GREENSPACES_SP.geojson')

In [30]:
parks.head(3)

| | LOCAL_REG_GREENSPACE_ID | PARK_NAME | PARK_TYPE | PARK_PRIMARY_USE | REGIONAL_DISTRICT | MUNICIPALITY | CIVIC_NUMBER | CIVIC_NUMBER_SUFFIX | STREET_NAME | LATITUDE | ... | WEBSITE_URL | LICENCE_COMMENTS | FEATURE_AREA_SQM | FEATURE_LENGTH_M | OBJECTID | SE_ANNO_CAD_DATA | SHAPE.AREA | SHAPE.LEN | fme_feature_type | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17 | 55D - Greenbelt | Local | Green Space | Metro Vancouver | Surrey | 16152.0 | | 76A Ave | 49.141291 | ... | http://www.surrey.ca/culture-recreation/2015.aspx | Contains information licensed under the Open G... | 1935.4529 | 198.8813 | 1558482 | | 0 | 0 | WHSE_BASEMAPPING.GBA_LOCAL_REG_GREENSPACES_SP | POLYGON ((-122.77514 49.14156, -122.77504 49.1... |
| 1 | 18 | 55E - Greenbelt | Local | Green Space | Metro Vancouver | Surrey | 16322.0 | | 77 Ave | 49.141708 | ... | http://www.surrey.ca/culture-recreation/2015.aspx | Contains information licensed under the Open G... | 32681.9914 | 1081.1908 | 1558483 | | 0 | 0 | WHSE_BASEMAPPING.GBA_LOCAL_REG_GREENSPACES_SP | POLYGON ((-122.76772 49.14230, -122.76780 49.1... |
| 2 | 19 | 55F - Greenbelt | Local | Green Space | Metro Vancouver | Surrey | 7634.0 | | 164 St | 49.141307 | ... | http://www.surrey.ca/culture-recreation/2015.aspx | Contains information licensed under the Open G... | 41307.6204 | 1215.1427 | 1558484 | | 0 | 0 | WHSE_BASEMAPPING.GBA_LOCAL_REG_GREENSPACES_SP | POLYGON ((-122.76391 49.14150, -122.76434 49.1... |


In [46]:
# filter parks data to the municipality of Vancouver
vancouver_parks = parks.query('MUNICIPALITY == "Vancouver"')
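Because the source file is GeoJSON, the same filter can also be expressed on the raw feature collection, which makes it easy to allow more than one `MUNICIPALITY` value if some parks of interest turn out to be recorded under a different name. The sketch below is a standard-library illustration only; the "Example Park" feature is made up, and the geometries are left as `null` to keep the sample short.

```python
def filter_by_municipality(collection, names):
    """Return a copy of a GeoJSON FeatureCollection limited to the given municipalities."""
    wanted = set(names)
    features = [
        feature for feature in collection["features"]
        if feature["properties"].get("MUNICIPALITY") in wanted
    ]
    return {**collection, "features": features}


# Tiny stand-in for the greenspaces file; the second feature is hypothetical.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"PARK_NAME": "55D - Greenbelt", "MUNICIPALITY": "Surrey"}},
        {"type": "Feature", "geometry": None,
         "properties": {"PARK_NAME": "Example Park", "MUNICIPALITY": "Vancouver"}},
    ],
}

vancouver_only = filter_by_municipality(sample, ["Vancouver"])
kept_names = [f["properties"]["PARK_NAME"] for f in vancouver_only["features"]]
```

Checking `parks['MUNICIPALITY'].unique()` on the real data before filtering would show which names are actually present.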

In [48]:
# save the vancouver parks as new shp file 
vancouver_parks.to_file('data/vancouver_parks.shp')
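One caveat with the shapefile format is that attribute field names are limited to 10 characters, so several of the column names shown above will be shortened on write. The sketch below uses a naive 10-character cut to flag names that would collide; the renaming actually applied by the underlying driver may differ, so this is only a rough preview. The column list is a subset of the `parks.head(3)` header.

```python
from collections import defaultdict


def find_truncation_collisions(columns, limit=10):
    """Group column names by their first `limit` characters and keep the clashes."""
    groups = defaultdict(list)
    for name in columns:
        groups[name[:limit]].append(name)
    return {short: names for short, names in groups.items() if len(names) > 1}


# Subset of the columns shown in the parks.head(3) output.
columns = [
    "LOCAL_REG_GREENSPACE_ID", "PARK_NAME", "PARK_TYPE", "PARK_PRIMARY_USE",
    "REGIONAL_DISTRICT", "MUNICIPALITY", "CIVIC_NUMBER", "CIVIC_NUMBER_SUFFIX",
    "STREET_NAME", "LATITUDE", "FEATURE_AREA_SQM", "FEATURE_LENGTH_M",
]

collisions = find_truncation_collisions(columns)
```

Here `CIVIC_NUMBER` and `CIVIC_NUMBER_SUFFIX` share the same first 10 characters. If preserving the full names matters, writing to a format without this limit, such as GeoPackage via `to_file(..., driver='GPKG')`, is one option.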