# RW API Import

This notebook collects dataset records and their metadata from the Resource Watch (RW) API and flattens them for import into [API Highways](https://apihighways.org).

In [74]:
import requests
from IPython.display import HTML, display
import tabulate
import csv
import time
import collections

In [75]:
# Fetch an initial list of datasets (one page of up to 1000 records) with their metadata embedded
all_datasets = requests.get("https://api.resourcewatch.org/v1/dataset?page[size]=1000&page[number]=1&env=production,preproduction&includes=metadata&application=rw,gfw,aqueduct,prep").json()['data']

summary_info = map(lambda dataset: [
    # str.join covers one or many applications, so no length check is needed
    ', '.join(dataset['attributes']['application']),
    dataset['attributes']['name'],
    dataset['attributes']['provider'],
    dataset['attributes']['type'],
    dataset['attributes']['connectorUrl'],
    len(dataset['attributes']['metadata']),
    dataset['id'],
], all_datasets)

display(HTML(tabulate.tabulate(summary_info, tablefmt='html')))

0,1,2,3,4,5,6
prep,Historical Precipitation -- U.S. (Puget Sound Lowlands),csv,tabular,https://raw.githubusercontent.com/fgassert/PREP-washington-observed-data/master/observed_precip.csv,1,06c44f9a-aae7-401e-874c-de13b7764959
prep,Social Vulnerability Index (2006-2010) -- U.S. (Minnesota),featureservice,tabular,https://coast.noaa.gov/arcgis/rest/services/sovi/sovi_tracts2010/MapServer/16?f=pjson,1,125b181c-653a-4a27-8a5f-e507c5a7c530
gfw,"GLAD alerts summarized by month, iso, adm1 within IDN moratorium areas",json,,http://gfw2-data.s3.amazonaws.com/alerts-tsv/output/to-api/output.json,0,01cbf8d0-bfba-46b0-b713-20f566d980a8
prep,Social Vulnerability Index (2006-2010) -- U.S. (New Hampshire),featureservice,tabular,https://coast.noaa.gov/arcgis/rest/services/sovi/sovi_tracts2010/MapServer/19?f=pjson,1,0706f039-b929-453e-b154-7392123ae99e
"aqueduct, gfw, rw, forest-atlas, data4sdgs, gfw-climate",Subnational Political Boundaries,cartodb,tabular,https://wri-01.carto.com/tables/gadm28_adm1/public,2,098b33df-6871-4e53-a5ff-b56a7d989f9a
prep,USGS Land Cover - Impervious Surface (2001) -- U.S. (Alaska),wms,raster,https://raster.nationalmap.gov/arcgis/rest/services/LandCover/USGS_EROS_LandCover_NLCD/MapServer/25?f=pjson,1,0c630feb-8146-4fcc-a9be-be5adcb731c8
prep,Coastal Energy Facilities (2012) -- U.S.,featureservice,tabular,https://coast.noaa.gov/arcgis/rest/services/MarineCadastre/OceanEnergy/MapServer/0?f=pjson,1,0f6c1472-0c16-4ffb-8313-2468d941f52c
prep,Social Vulnerability Index (2006-2010) -- U.S. (Indiana),featureservice,tabular,https://coast.noaa.gov/arcgis/rest/services/sovi/sovi_tracts2010/MapServer/10?f=pjson,1,06ef7a04-eef3-42a9-8356-f9fb927909ce
gfw,GLAD alerts summarized by important places,json,,,0,0abaf944-1bef-4a7c-90c7-b3cfc1eed563
gfw,2010 Hansen Tree Cover Extent by GADM2.8 Boundaries,json,,http://gfw2-data.s3.amazonaws.com/alerts-tsv/output/to-api/output.json,0,0ef4a861-930f-4f56-865d-89f5c0c6aef0
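
The query string in the request above packs pagination, environment, and application filters into one long literal. A minimal sketch of building the same URL from a dictionary follows, using only the standard library. The function name `build_dataset_list_url` and the constant `BASE_URL` are hypothetical and not part of the notebook; the parameter values are copied from the request above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.resourcewatch.org/v1/dataset"  # endpoint used in the cell above


def build_dataset_list_url(page_size=1000, page_number=1):
    """Build the dataset listing URL (hypothetical helper, not in the notebook)."""
    params = {
        "page[size]": page_size,
        "page[number]": page_number,
        "env": "production,preproduction",
        "includes": "metadata",
        "application": "rw,gfw,aqueduct,prep",
    }
    # safe="[]," leaves brackets and commas unescaped so the result
    # matches the hand-written URL character for character
    return f"{BASE_URL}?{urlencode(params, safe='[],')}"


print(build_dataset_list_url())
```

Keeping the parameters in a dictionary makes it easier to change the page number or the application list without editing the literal by hand.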


In [76]:
all_datasets[0]

{'attributes': {'application': ['prep'],
  'attributesPath': None,
  'blockchain': {},
  'clonedHost': {},
  'connectorType': 'document',
  'connectorUrl': 'https://raw.githubusercontent.com/fgassert/PREP-washington-observed-data/master/observed_precip.csv',
  'dataPath': None,
  'env': 'production',
  'errorMessage': None,
  'geoInfo': False,
  'layerRelevantProps': [],
  'legend': {'country': [], 'date': [], 'nested': [], 'region': []},
  'mainDateField': None,
  'metadata': [{'attributes': {'application': 'prep',
     'citation': 'Vose, R. S. et al., 2014. Improved historical temperature and precipitation time series for US climate divisions. Journal Of Applied Meteorology and Climatology, 53(5), 1232-1251.',
     'createdAt': '2016-12-13T10:02:28.337Z',
     'dataset': '06c44f9a-aae7-401e-874c-de13b7764959',
     'description': 'These data were derived from the U.S. Climate Divisional Dataset, developed by the National Centers for Environmental Information (NCEI). NCEI provides lon
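
The next cell reads dataset ids from `./data/single_metadata.csv`, a file that was curated by hand. As a rough starting point for that kind of file, the sketch below selects the datasets that carry exactly one metadata object and writes their ids one per row. The helper name `single_metadata_ids` is hypothetical, the sample records are stubs that reuse three ids and metadata counts from the summary table, and the manual review step is not reproduced.

```python
import csv
import io


def single_metadata_ids(datasets):
    """Return ids of datasets with exactly one metadata object (hypothetical helper)."""
    return [
        d["id"]
        for d in datasets
        if len(d["attributes"].get("metadata") or []) == 1
    ]


# Stub records: ids come from the summary table, metadata contents are placeholders
sample = [
    {"id": "06c44f9a-aae7-401e-874c-de13b7764959", "attributes": {"metadata": [{}]}},
    {"id": "01cbf8d0-bfba-46b0-b713-20f566d980a8", "attributes": {"metadata": []}},
    {"id": "098b33df-6871-4e53-a5ff-b56a7d989f9a", "attributes": {"metadata": [{}, {}]}},
]

selected = single_metadata_ids(sample)

# Write to an in-memory buffer here; a real run would open a file instead
buffer = io.StringIO()
writer = csv.writer(buffer)
for dataset_id in selected:
    writer.writerow([dataset_id])

print(buffer.getvalue())
```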

In [49]:
# After inspecting the list and manually dropping some datasets, fetch the remaining ones, each of which has a single metadata object.

datasets = dict()

with open('./data/single_metadata.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        dataset_id = row[0]
        dataset_url = f"https://api.resourcewatch.org/v1/dataset/{dataset_id}"
        metadata_url = f"https://api.resourcewatch.org/v1/dataset/{dataset_id}/metadata"
        dataset = requests.get(dataset_url).json()['data']
        # Keep the first (and only) metadata object for this dataset
        dataset['metadata'] = requests.get(metadata_url).json()['data'][0]
        datasets[dataset_id] = dataset

        # Sleep for a second between datasets to go easy on the API
        time.sleep(1)

datasets

{'018719a6-009b-4b04-970d-08b93c8192cb': {'attributes': {'application': ['rw'],
   'attributesPath': None,
   'blockchain': {},
   'clonedHost': {},
   'connectorType': 'document',
   'connectorUrl': 'http://wri-api-backups.s3.amazonaws.com/raw/1520975652559_WorldData-Withdrawal_eng.csv',
   'dataPath': '',
   'env': 'production',
   'errorMessage': '',
   'geoInfo': False,
   'layerRelevantProps': [],
   'legend': {'country': [], 'date': [], 'nested': [], 'region': []},
   'mainDateField': None,
   'name': 'Water withdrawals by sector',
   'overwrite': False,
   'protected': False,
   'provider': 'csv',
   'published': False,
   'slug': 'Water-withdrawals-by-sector',
   'status': 'saved',
   'subtitle': '',
   'tableName': 'index_018719a6009b4b04970d08b93c8192cb_1520975655098',
   'taskId': '/v1/doc-importer/task/bdebdda7-42ce-4fa2-8100-8906f2f295ae',
   'type': 'tabular',
   'updatedAt': '2018-03-13T21:14:26.740Z',
   'userId': '57d021e329309063404573a8',
   'verified': False,
   'wi
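
The fetch loop above indexes `['data'][0]` directly, which would raise `IndexError` for a dataset whose metadata list is empty. A hedged sketch of a more defensive variant follows. The function `fetch_dataset_with_metadata` and its `get_json` parameter are hypothetical; the HTTP call is injected so the example runs without network access, and the stubbed responses are made-up values, not real API output.

```python
def fetch_dataset_with_metadata(dataset_id, get_json):
    """Return the dataset record with its first metadata object attached.

    get_json is any callable that takes a URL and returns parsed JSON
    (hypothetical parameter, introduced for this sketch).
    """
    base_url = f"https://api.resourcewatch.org/v1/dataset/{dataset_id}"
    dataset = get_json(base_url)["data"]
    metadata_list = get_json(f"{base_url}/metadata")["data"]
    # Assumption: a dataset without metadata gets None instead of raising IndexError
    dataset["metadata"] = metadata_list[0] if metadata_list else None
    return dataset


def fake_get_json(url):
    """Stubbed responses with made-up values, standing in for the real API."""
    if url.endswith("/metadata"):
        return {"data": []}
    return {"data": {"id": "example-id", "attributes": {"name": "Example"}}}


result = fetch_dataset_with_metadata("example-id", fake_get_json)
print(result["metadata"])
```

In the notebook itself, `get_json` could be something like `lambda url: requests.get(url, timeout=30).json()`, so that a stalled request does not block the loop indefinitely.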

In [77]:
# 'info' has a free schema - let's see what fields are used
all_metadata_fields = [
    list(dataset['metadata']['attributes']['info'].keys())
    for dataset in datasets.values()
]

from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

def flatten(l):
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

unique_metadata_fields = set(flatten(all_metadata_fields))
unique_metadata_fields

{'cautions',
 'citation',
 'dataDownload',
 'data_download_link',
 'data_download_original_link',
 'data_type',
 'date_of_content',
 'description',
 'endpoint',
 'frequency_of_updates',
 'function',
 'functions',
 'geographic_coverage',
 'language',
 'learn_more_link',
 'license',
 'license_link',
 'loca',
 'name',
 'nexgddp',
 'organization',
 'organization-long',
 'published_date',
 'rwId',
 'service',
 'short-description',
 'source',
 'sources',
 'spatial_resolution',
 'subtitle',
 'technical_title',
 'title',
 'wri_rw_id'}
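
The set above shows which `info` keys exist, but not how often each one is used, and it contains near-duplicates such as `function`/`functions` and `source`/`sources`. A small sketch that counts occurrences per field may help decide which keys are worth mapping. The helper `count_info_fields` is hypothetical and the sample records are invented for illustration; only the nesting (`metadata` → `attributes` → `info`) follows the notebook.

```python
from collections import Counter


def count_info_fields(datasets):
    """Count how many datasets use each key of the free-schema 'info' object."""
    counts = Counter()
    for dataset in datasets.values():
        # Assumption: a missing or empty 'info' object contributes nothing
        info = dataset["metadata"]["attributes"].get("info") or {}
        counts.update(info.keys())
    return counts


# Invented sample records that mimic the nesting used in the notebook
sample = {
    "a": {"metadata": {"attributes": {"info": {"name": "A", "function": "x"}}}},
    "b": {"metadata": {"attributes": {"info": {"name": "B", "functions": "y"}}}},
    "c": {"metadata": {"attributes": {}}},
}

field_counts = count_info_fields(sample)
for field, count in field_counts.most_common():
    print(field, count)
```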

In [99]:
from functools import reduce
def get_field(dataset, fields):
    # Walk the nested keys; give up with None if any level is not a dict
    try:
        result = reduce(lambda c, k: c.get(k, None), fields, dataset)
    except AttributeError:
        result = None
    return result

def gen_main_url(apps, dataset):
    # The first listed application decides which platform URL is used
    app = apps[0]
    dataset_id = dataset['id']
    try:
        main_url = {
            'prep': f'https://www.prepdata.org/dataset/{dataset_id}',
            'gfw': f'https://production-api.globalforestwatch.org/v1/dataset/{dataset_id}',
            'rw': f'https://api.resourcewatch.org/v1/dataset/{dataset_id}',
            'aqueduct': 'http://www.wri.org/applications/maps/aqueduct-atlas/'
        }[app]
    except KeyError:
        # No known platform URL for this application
        return ''

    return main_url

def get_fields(dataset):
    apps = dataset['attributes']['application']
    out = {
        "dset_id": dataset['id'],
        "dset_name": dataset['attributes']['name'],
        #"dset_desc": dataset['attributes']['description'],
        "dset_provider": dataset['attributes']['provider'],
        "dset_type": dataset['attributes']['type'],
        "dset_published": dataset['attributes']['published'],
        "dset_connector": dataset['attributes']['connectorType'],
        "dset_curl": dataset['attributes']['connectorUrl'],
        "dset_env": dataset['attributes']['env'],
        "dset_last_update": dataset['attributes']['updatedAt'],
        # str.join covers one or many applications alike
        "dset_apps": ', '.join(apps),
        "metadata_source": get_field(dataset, ['metadata', 'attributes', 'source']),
        "metadata_desc": get_field(dataset, ['metadata', 'attributes', 'description']),
        "platform_url": gen_main_url(apps, dataset)
    }
    # applications = dataset['attributes']['application'] if len(dataset['attributes']['application']) >= 2 else dataset['attributes']['application']
    return out

final_res = [get_fields(dataset) for dataset in datasets.values()]
display(HTML(tabulate.tabulate(final_res, tablefmt='html')))

0,1,2,3,4,5,6,7,8,9,10,11,12
b7bf012f-4b8b-4478-b5c9-6af3075ca1e4,All Worldwide Crops,cartodb,,True,rest,https://wri-01.carto.com/tables/crops,production,2017-02-07T17:56:43.581Z,aqueduct,MapSPAM 2005,"These crops were selected based on their importance in the global commodities market and for food security. 'All crops' represent all of the crops that are included in the tool as displayed in the menu. Pixels are shaded if they contain at least 10 hectares of crop area within the 10x10 km pixel. If there are multiple crops meeting this criteria per pixel, the predominant crop (based on production) is displayed. If a single crop is selected, and the pixel colors are shaded by level of production. The crop layers displayed on the map reflect 2005 data regardless of the timeframe selected.",http://www.wri.org/applications/maps/aqueduct-atlas/
8ab33548-78f5-4276-a5d7-d9a4143698db,Change in net trade over time,cartodb,,True,rest,https://wri-rw.carto.com/tables/combined01_prepared/public,production,2017-08-21T14:10:50.152Z,aqueduct,,This figure displays the change in the selected country's net trade of the selected crop over time.,http://www.wri.org/applications/maps/aqueduct-atlas/
3174a5df-4892-44ca-b780-4c91373a32d1,Change in net trade over time for top 5 crops in highest demand,cartodb,,True,rest,https://wri-rw.carto.com/tables/combined01_prepared/public,production,2017-08-14T16:36:27.875Z,aqueduct,,This figure displays the change over time in net trade for the 5 crops in highest demand in the country selected.,http://www.wri.org/applications/maps/aqueduct-atlas/
9c450642-f976-40eb-96b4-0c904d519578,Drought Severity Soil Moisture,cartodb,,True,rest,https://wri-rw.carto.com/api/v2/sql?q=SELECT * FROM water_risk_indicators_v3,production,2017-08-02T11:06:21.665Z,aqueduct,WRI Aqueduct 2018 (forthcoming),Estimates the average magnitude of droughts based on the severity and frequency of periods of time during which soil moisture remains low. Baseline values are generated using hydrological modeling of long-term trends from 1960 to 2014.,http://www.wri.org/applications/maps/aqueduct-atlas/
e38d74e2-dabc-44b0-b9d8-d2da677ea7f2,Global dataset for one crop,cartodb,,True,rest,https://wri-rw.carto.com/tables/combined01_prepared/public,production,2017-08-11T10:11:25.102Z,aqueduct,,This figure displays the top 5 net exporting and net importing countries for the selected crop and timeframe.,http://www.wri.org/applications/maps/aqueduct-atlas/
5be16fea-5b1a-4daf-a9e9-9dc1f6ea6d4e,Global water stress,cartodb,tabular,True,rest,https://wri-01.carto.com/tables/crops/public,production,2017-10-20T09:09:18.067Z,aqueduct,,"This figure displays the percent of all crop area facing each level of water stress (vertical axis) and the volume of the demand (width of bars) for maize, rice, soybean, and wheat for the year selected.",http://www.wri.org/applications/maps/aqueduct-atlas/
cbd0d0f8-edf9-47bc-93ef-71c1a5e5fed7,Groundwater decline trend,cartodb,,True,rest,https://wri-rw.carto.com/api/v2/sql?q=SELECT * FROM water_risk_indicators_v3,production,2017-08-02T07:59:47.977Z,aqueduct,WRI Aqueduct 2018 (forthcoming),Measures trends in the decline of the groundwater table. The slope of the decline correlates to the severity of the trend. Baseline values are generated using hydrological modeling from 1990 to 2014.,http://www.wri.org/applications/maps/aqueduct-atlas/
b53ef8c5-8554-4a4a-a100-fdf5a022cb4e,Groundwater stress,cartodb,,True,rest,https://wri-rw.carto.com/api/v2/sql?q=SELECT * FROM water_risk_indicators_v3,production,2017-08-01T09:55:31.081Z,aqueduct,WRI Aqueduct 2018 (forthcoming),Measures the relative ratio of groundwater withdrawal to recharge rate. Values above one indicate that groundwater is being depleted faster than it is being restored. Unsustainable groundwater consumption could affect groundwater availability and groundwater-dependent ecosystems. Baseline values are generated using hydrological modeling of long-term trends from 1990 to 2014.,http://www.wri.org/applications/maps/aqueduct-atlas/
bf657e60-de9c-4b7e-8736-d573d38e3ce1,Inter Annual Variability,cartodb,,True,rest,https://wri-rw.carto.com/api/v2/sql?q=SELECT * FROM water_risk_indicators_v3,production,2017-08-02T09:03:35.912Z,aqueduct,WRI Aqueduct 2018 (forthcoming),Measures the variability in water supply from year to year. It is an indicator of the unpredictability of supply. Baseline values are generated using hydrological modeling of long-term trends from 1960 to 2014.,http://www.wri.org/applications/maps/aqueduct-atlas/
345cfef3-ee8a-46bc-9bb9-164c406dfd2c,Interanual variability,cartodb,,True,rest,https://wri-01.carto.com/tables/aqueduct_projections_20150309/public,production,2016-12-14T16:34:36.076Z,aqueduct,,Interannual variability measures the variability in water supply from year to year.,http://www.wri.org/applications/maps/aqueduct-atlas/
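
The rendered table above labels its columns 0 to 12 because no header row was supplied, although each row is a dictionary whose keys already name the columns. A minimal sketch of exporting such rows to CSV with a header row taken from the keys follows. The helper `rows_to_csv` is hypothetical, and a CSV export is an assumption about what a downstream import might accept, not something the notebook states. The sample row reuses three values from the last table row.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of flat dicts to CSV text, using the dict keys as header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One shortened sample row; a real run would pass final_res instead
sample_rows = [
    {
        "dset_id": "345cfef3-ee8a-46bc-9bb9-164c406dfd2c",
        "dset_name": "Interanual variability",
        "dset_apps": "aqueduct",
    },
]

csv_text = rows_to_csv(sample_rows)
print(csv_text)
```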


In [96]:
gen_main_url(['rw'], datasets['09b01386-7614-47fb-ac9f-c9cccf86bc9d'])

'https://api.resourcewatch.org/v1/dataset/09b01386-7614-47fb-ac9f-c9cccf86bc9d'