<h1 align = "center">Aqueduct Floods Hazard Maps</h1>

---

The data is available at [this link](http://wri-projects.s3.amazonaws.com/AqueductFloodTool/download/v2/index.html), listed on the page as anchor tags (`<a>...</a>`). According to the problem statement and the URL, each dataset is provided either as a `pickle` file or as a `tif` file. Working in `python`, I have loaded the pickled files: the page at the given URL is **web-scraped** to collect all the anchor attributes, and every link that points to pickled data is downloaded and parsed.

Steps I've followed:
* Fetch the hyperlink attributes from the page with **`utilities.getContent()`**, a user-defined function written for this purpose (see the `utilities` file for more information).
* Once all the links are available, print one sample record in the notebook. From it I can conclude the following:
  - the data is of type `dict`
  - nested dictionaries are present
* Based on this structure, I have used two for-loops: the outer one walks the top-level keys, and the inner one handles a nested dictionary, appending each value to a list. The resulting `dictionary` can either be converted directly into a `pd.DataFrame`, or the necessary data can first be separated and then converted.
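The source of `utilities.getContent()` is not shown here, so the following is only a minimal sketch of how the link-fetching step could work, using the standard library's `html.parser`. The names `AnchorCollector`, `extract_links` and `get_content_sketch` are hypothetical, and the page fragment is made up; the only thing taken from the notebook is that each returned item supports `link["href"]`.

```python
from html.parser import HTMLParser
import urllib.request


class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a> tag that carries an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            attributes = dict(attrs)
            if attributes.get("href"):
                self.links.append(attributes)


def extract_links(html):
    """Return one dict per anchor tag, so that link["href"] works."""
    collector = AnchorCollector()
    collector.feed(html)
    return collector.links


def get_content_sketch(url):
    """Hypothetical stand-in for utilities.getContent(url); not the author's code."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html)


# Offline check on a made-up page fragment (the file names are illustrative).
page = (
    '<a href="http://example.com/a.pickle">a.pickle</a> '
    '<a href="http://example.com/a.tif">a.tif</a>'
)
links = extract_links(page)
print([link["href"] for link in links])
```

Keeping the parsing in `extract_links` separate from the download makes it possible to check the logic on a saved copy of the page without touching the network.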

In [48]:
import json
import pickle
import urllib.request
from tqdm import tqdm as TQ
from utilities import getContent # custom module; keep utilities.py in the same directory or on PYTHONPATH

In [2]:
url = "http://wri-projects.s3.amazonaws.com/AqueductFloodTool/download/v2/index.html"
dataLinks = getContent(url)

A sample record, to see what the data looks like:

In [13]:
sample = pickle.load(urllib.request.urlopen("http://wri-projects.s3.amazonaws.com/AqueductFloodTool/download/v2/inuncoast_historical_nosub_hist_rp0001_5.pickle"))
sample

{'properties_from_filename': {'subsidence': 'nosub',
  'returnperiod': 'rp0001',
  'year': 'hist',
  'climate': 'historical',
  'returnperiod_decimal': '5',
  'floodtype': 'inuncoast'},
 'root': '/volumes/data/Y2018M08D08_RH_S3_EC2_V01/output_V02/coastal/inun_subsidence_v2_95/HIST',
 'variable_attributes': {'inun_units': 'm',
  'inun_comment': 'water_surface_reference_datum_altitude is given in file /p/1209884-aqueduct/Datasets/MERIT_1km/MERIT_glob_1km_waterp50Mask.tif',
  'inun__FillValue': -9999.0,
  'inun_long_name': 'Coastal flooding',
  'inun_coordinates': 'lat lon',
  'inun_standard_name': 'water_surface_height_above_reference_datum'},
 'global_attributes': {'title': 'Aqueduct Coastal hazard layer',
  'references': 'http://floods.wri.org/',
  'project': 'Aqueduct Global Flood Analyzer',
  'config_file': '/p/1209884-aqueduct/Coastal_inun/settings_h6/coastal_inun.ini',
  'Conventions': 'CF-1.6',
  'history': 'Created by: $Id: coastal_inun.py 219 2017-11-13 11:12:22Z eilan_dk $,\n  

In [29]:
print(json.dumps(sample, default = str, indent = 4)) ## pretty print using json

{
    "properties_from_filename": {
        "subsidence": "nosub",
        "returnperiod": "rp0001",
        "year": "hist",
        "climate": "historical",
        "returnperiod_decimal": "5",
        "floodtype": "inuncoast"
    },
    "root": "/volumes/data/Y2018M08D08_RH_S3_EC2_V01/output_V02/coastal/inun_subsidence_v2_95/HIST",
    "variable_attributes": {
        "inun_units": "m",
        "inun_comment": "water_surface_reference_datum_altitude is given in file /p/1209884-aqueduct/Datasets/MERIT_1km/MERIT_glob_1km_waterp50Mask.tif",
        "inun__FillValue": "-9999.0",
        "inun_long_name": "Coastal flooding",
        "inun_coordinates": "lat lon",
        "inun_standard_name": "water_surface_height_above_reference_datum"
    },
    "global_attributes": {
        "title": "Aqueduct Coastal hazard layer",
        "references": "http://floods.wri.org/",
        "project": "Aqueduct Global Flood Analyzer",
        "config_file": "/p/1209884-aqueduct/Coastal_inun/settings_h6/co

In [24]:
sample.keys() # keys, and their type can be seen from previous cell

dict_keys(['properties_from_filename', 'root', 'variable_attributes', 'global_attributes', 'filename'])
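In the sample shown earlier, the values under `properties_from_filename` line up with the underscore-separated parts of the file name in the URL (`inuncoast_historical_nosub_hist_rp0001_5.pickle`). The sketch below illustrates that mapping. The field order is inferred from this single sample and is an assumption; other file families on the page may follow a different pattern, and `parse_coastal_filename` is a hypothetical helper, not part of the dataset's tooling.

```python
import os

# Field order inferred from one sample file name; treat it as an assumption.
FIELDS = [
    "floodtype",
    "climate",
    "subsidence",
    "year",
    "returnperiod",
    "returnperiod_decimal",
]


def parse_coastal_filename(filename):
    """Split a file name like the sample's into named parts."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected number of parts in {filename!r}")
    return dict(zip(FIELDS, parts))


parsed = parse_coastal_filename("inuncoast_historical_nosub_hist_rp0001_5.pickle")
print(parsed)
```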

In [38]:
for link in dataLinks[:5]: # print each value's type to check that it is consistent across files
    if "pickle" in link["href"]:
        curLink = link["href"]
        _json = pickle.load(urllib.request.urlopen(curLink))
        
        for key, value in _json.items():
            print(key, type(value))

properties_from_filename <class 'dict'>
root <class 'str'>
variable_attributes <class 'dict'>
global_attributes <class 'dict'>
filename <class 'str'>
properties_from_filename <class 'dict'>
root <class 'str'>
variable_attributes <class 'dict'>
global_attributes <class 'dict'>
filename <class 'str'>


In [47]:
len(dataLinks) # pretty long data

1380
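Since the link list mixes `pickle` and `tif` files, it can help to see how many links of each kind there are before starting a long download loop. The following is a small sketch that counts links by file extension; `count_by_extension` is a hypothetical helper and `example_links` is made-up stand-in data, assuming only that each item exposes an `"href"` key as `dataLinks` does.

```python
import os
from collections import Counter
from urllib.parse import urlparse


def count_by_extension(links):
    """Count links by the file extension found in the URL path."""
    return Counter(
        os.path.splitext(urlparse(link["href"]).path)[1] for link in links
    )


# Made-up stand-in for dataLinks; only the "href" key is used.
example_links = [
    {"href": "http://example.com/a.pickle"},
    {"href": "http://example.com/a.tif"},
    {"href": "http://example.com/b.tif"},
]
counts = count_by_extension(example_links)
print(counts)
```

Matching on the parsed extension is slightly stricter than the `"pickle" in link["href"]` test used in the notebook, which would also match the word anywhere else in the URL.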

In [49]:
data = dict()
for link in TQ(dataLinks):
    if "pickle" not in link["href"]:
        continue # skip links that do not point to pickled data

    # download and un-pickle one file; the response is closed afterwards
    with urllib.request.urlopen(link["href"]) as response:
        _json = pickle.load(response)

    for key, value in _json.items():
        if isinstance(value, dict):
            # nested dictionary: keep one list per inner key
            inner = data.setdefault(key, dict())
            for k2, v2 in value.items():
                inner.setdefault(k2, []).append(v2)
        else:
            # top-level values are strings in the sampled files
            column = data.setdefault(key, [])
            if isinstance(value, str):
                column.append(value)

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1380/1380 [08:46<00:00,  2.62it/s]


The code works correctly on `dataLinks[:5]`, as seen in the output below. I then processed the full list of links and used `tqdm` to measure the execution time, which is about **8:46** (8 minutes 46 seconds).
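The accumulation loop above can also be exercised without any downloads. The sketch below restates the same idea as a function and adds one possible way to flatten the result into flat columns, which is the shape `pd.DataFrame` accepts. `merge_records` and `flatten_columns` are hypothetical names, the two records are shortened made-up examples, and unlike the notebook loop this version appends every non-dict value rather than only strings.

```python
def merge_records(records):
    """Collect values from many records into lists (one level of nesting)."""
    merged = {}
    for record in records:
        for key, value in record.items():
            if isinstance(value, dict):
                inner = merged.setdefault(key, {})
                for k2, v2 in value.items():
                    inner.setdefault(k2, []).append(v2)
            else:
                merged.setdefault(key, []).append(value)
    return merged


def flatten_columns(merged, sep="."):
    """Turn {'outer': {'inner': [...]}} into {'outer.inner': [...]}."""
    flat = {}
    for key, value in merged.items():
        if isinstance(value, dict):
            for k2, column in value.items():
                flat[f"{key}{sep}{k2}"] = column
        else:
            flat[key] = value
    return flat


# Two shortened, made-up records shaped like the sample shown earlier.
records = [
    {"properties_from_filename": {"returnperiod": "rp0001"}, "root": "/data/HIST"},
    {"properties_from_filename": {"returnperiod": "rp0002"}, "root": "/data/HIST"},
]
merged = merge_records(records)
columns = flatten_columns(merged)
print(columns)
```

If every column ends up with the same length, `pd.DataFrame(columns)` should give one row per file. A key that is missing from some files would produce columns of unequal length, so that case would need handling first.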

In [46]:
# data

{'properties_from_filename': {'subsidence': ['nosub', 'nosub'],
  'returnperiod': ['rp0001', 'rp0002'],
  'year': ['hist', 'hist'],
  'climate': ['historical', 'historical'],
  'returnperiod_decimal': ['5', '0'],
  'floodtype': ['inuncoast', 'inuncoast']},
 'root': ['/volumes/data/Y2018M08D08_RH_S3_EC2_V01/output_V02/coastal/inun_subsidence_v2_95/HIST',
  '/volumes/data/Y2018M08D08_RH_S3_EC2_V01/output_V02/coastal/inun_subsidence_v2_95/HIST'],
 'variable_attributes': {'inun_units': ['m', 'm'],
  'inun_comment': ['water_surface_reference_datum_altitude is given in file /p/1209884-aqueduct/Datasets/MERIT_1km/MERIT_glob_1km_waterp50Mask.tif',
   'water_surface_reference_datum_altitude is given in file /p/1209884-aqueduct/Datasets/MERIT_1km/MERIT_glob_1km_waterp50Mask.tif'],
  'inun__FillValue': [-9999.0, -9999.0],
  'inun_long_name': ['Coastal flooding', 'Coastal flooding'],
  'inun_coordinates': ['lat lon', 'lat lon'],
  'inun_standard_name': ['water_surface_height_above_reference_datum'