# CSV to GeoJSON with Indicator Attributes

This Jupyter Notebook walks you through converting a CSV file (exported from your DevInfo database and cleaned or otherwise transformed) into GeoJSON files with full attributes and geometry.

To export the resulting GeoJSON files to a File Geodatabase, follow the instructions in the **Export GeoJSON to File Geodatabase** notebook.

### **Important**

You must run the **Export DevInfo Access Database Shapefiles to GeoJSON** notebook **before** running this notebook, because this notebook needs the folder of GeoJSON files that it produces.

## Import the needed Python libraries
Install the GDAL Python library (needed for the `ogr` import below) by opening the Anaconda Prompt and running `conda install gdal`.

In [None]:
import os
import sys
import json
import csv
from osgeo import ogr
from IPython.display import display

### Set your working directory
example: *C:\users\me\working_directory\my_country*

In [None]:
output_base = r'C:\Users\adam6475\devinfo\tanzania'

### Other defaults for outputs

Location of the GeoJSON files created by running the **Export DevInfo Access Database Shapefiles to GeoJSON** Jupyter notebook (see the *Important* note above).


In [None]:
in_geojson_folder = r'C:\Users\adam6475\devinfo\tanzania\geojson'

Where do you want to write the output geojson files?

In [None]:
output_geojson_folder = r'C:\Users\adam6475\devinfo\tanzania\geojson_full'
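The write step at the end of this notebook opens files inside the output folder, which fails if the folder does not exist yet. A minimal sketch of creating it up front follows; the helper name `ensure_folder` is hypothetical, and the demo uses a temporary directory so that it runs anywhere.

```python
import os
import tempfile

def ensure_folder(path):
    """Create the folder (and any missing parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)

# Demonstrated against a temporary directory; in this notebook you would
# pass output_geojson_folder instead.
demo_folder = os.path.join(tempfile.mkdtemp(), 'geojson_full')
print(ensure_folder(demo_folder))
```

Calling the helper a second time on the same path is harmless because `exist_ok=True` suppresses the error for an existing folder.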

## Check for the `Esri Shapefile` driver
The DevInfo Access database stores its geometry as Shapefiles. This cell checks that the OGR driver needed to read them is available.

In [None]:
# 'ESRI Shapefile' is the name under which OGR registers this driver
shp_driver_lbl = 'ESRI Shapefile'
shp_driver = ogr.GetDriverByName(shp_driver_lbl)
if shp_driver is None:
    print ('{} driver not available.'.format(shp_driver_lbl))
else:
    print ('{} driver IS available.'.format(shp_driver_lbl))

## Specify a CSV to use as your input.

This can be the output of the **Export All Data to Single CSV** notebook.

### Get the path to your CSV file

In [None]:
csv_file_path = r'C:\Users\adam6475\devinfo\tanzania\tz_out_new.csv'

### Parse the CSV file and store the rows

In [None]:
# Set the encoding.
# Some CSV files start with a byte order mark (BOM), which shows up as a stray
# '\ufeff' character at the front of the first field name.
# If you see unexpected behavior related to this, either re-save your file with "utf-8" encoding
# or use 'utf-8-sig' as your encoding value below.
# Stack Overflow reference: https://stackoverflow.com/questions/17912307/u-ufeff-in-python-string/17912811#17912811
encoding = 'utf-8'
# newline='' is what the csv module documentation recommends for files passed to a csv reader
with open(csv_file_path, encoding=encoding, newline='') as file:
    reader = csv.DictReader(file)
    rows = list(reader)

print ('done reading rows from CSV.')

# print the first row of data to validate that the header row was parsed without encoding issues
print (rows[0])
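Eyeballing the first row works, but a header check can make the encoding problem described above explicit. The sketch below is one possible approach, not part of the original workflow: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the column list is simply the set of fields this notebook reads later. The demo feeds in a header that starts with a byte order mark to show how the first column goes "missing".

```python
import csv
import io

# Assumption: these are the columns the rest of this notebook reads.
REQUIRED_COLUMNS = [
    'INDICATOR_ID', 'INDICATOR', 'OBS_VALUE', 'UNIT', 'REF_AREA_ID',
    'REF_AREA', 'REF_AREA_PARENT_ID', 'TIME_PERIOD', 'SUBGROUP', 'SOURCE',
]

def missing_columns(fieldnames, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    present = set(fieldnames or [])
    return [name for name in required if name not in present]

# Demo: an in-memory CSV whose header begins with a BOM character.
sample = io.StringIO('\ufeffINDICATOR_ID,INDICATOR,OBS_VALUE\n1,Population,100\n')
demo_reader = csv.DictReader(sample)
demo_missing = missing_columns(demo_reader.fieldnames)
print(demo_missing)
```

In the notebook itself you could call `missing_columns(reader.fieldnames)` right after building `rows`; if `INDICATOR_ID` is reported missing even though it is in the file, switching the encoding to `'utf-8-sig'` is the likely fix.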

## Chunk up the data
Chunk up the data by Indicator and, within each Indicator, by parent layer. This will let us create one layer per Indicator for each parent layer.

In [None]:
chunks = {}
for row in rows:
    
    ind_id = row['INDICATOR_ID']
    parent_id = row['REF_AREA_PARENT_ID']
    
    if ind_id not in chunks:
        chunks[ind_id] = {}
    
    if parent_id not in chunks[ind_id]:
        chunks[ind_id][parent_id] = {}
        chunks[ind_id][parent_id]['rows'] = []
    
    chunks[ind_id][parent_id]['rows'].append(row)

print ('done chunking data by indicator, by parent layer')
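Before moving on, it can help to see how many rows landed in each group. The following is a small sketch that assumes the same nested shape as `chunks` (indicator id, then parent id, then a `'rows'` list); `summarize_chunks` and `demo_chunks` are hypothetical names used only for illustration.

```python
def summarize_chunks(chunks):
    """Return (indicator_id, parent_id, row_count) tuples, sorted for stable output."""
    summary = []
    for ind_id, parents in chunks.items():
        for parent_id, group in parents.items():
            summary.append((ind_id, parent_id, len(group['rows'])))
    return sorted(summary)

# Demo data in the same shape as the chunks dictionary built above.
demo_chunks = {
    '1': {'2': {'rows': [{}, {}]}, '3': {'rows': [{}]}},
    '5': {'2': {'rows': [{}]}},
}
for ind_id, parent_id, count in summarize_chunks(demo_chunks):
    print('indicator {} / parent {}: {} rows'.format(ind_id, parent_id, count))
```

Each tuple corresponds to one output file that the final step will attempt to write.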

## Set up fields for the output GeoJSON

In [None]:
field_map = [
    'INDICATOR_ID',
    'INDICATOR',
    'OBS_VALUE',
    'UNIT',
    'REF_AREA_ID',
    'REF_AREA',
    'REF_AREA_PARENT_ID',
    'TIME_PERIOD',
    'SUBGROUP',
    'SOURCE'
]
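Note that `csv.DictReader` returns every value as a string, so `OBS_VALUE` will be written to the GeoJSON as text. If numeric values are wanted downstream, one option is to convert while copying the fields. This is a sketch under that assumption, not the notebook's own behavior; `to_number` and `build_properties` are hypothetical helpers.

```python
def to_number(value):
    """Return value as a float when it parses cleanly, otherwise return it unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value

def build_properties(row, fields, numeric_fields=('OBS_VALUE',)):
    """Copy the listed fields from a CSV row, converting the numeric ones."""
    props = {}
    for name in fields:
        value = row.get(name)
        props[name] = to_number(value) if name in numeric_fields else value
    return props

# Demo row; the EXTRA column is dropped because it is not in the field list.
demo_row = {'INDICATOR_ID': '5', 'OBS_VALUE': '12.5', 'UNIT': 'Percent', 'EXTRA': 'ignored'}
demo_props = build_properties(demo_row, ['INDICATOR_ID', 'OBS_VALUE', 'UNIT'])
print(demo_props)
```

Values that do not parse as numbers (for example an empty string) are passed through unchanged rather than raising an error.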

## Join the Attribute Data & Spatial Data
Finally, we will step through our data to create an individual GeoJSON file for each layer of each indicator.

In [None]:
# store the already referenced geometry in memory for faster recall
geom_cache = {}

for c in chunks:
    
    # test with just one indicator
#     if float(c) != 5:
#         continue
        
    ind = c
    lyrs = chunks[c]
    
    for lyr in lyrs:
        
        feature_collection = {
            'type' : 'FeatureCollection',
            'features': []
        }
    
        rows = lyrs[lyr]['rows']
        for row in rows:
            parent_id = row['REF_AREA_PARENT_ID']
            area_id = row['REF_AREA_ID']
            
            # test with just a few layers
#             if parent_id not in ['2', '3','4']:
#                 continue
            
            geom = None
            if area_id not in geom_cache:
                # look for the already created geojson file
                gj_file_path = os.path.abspath(os.path.join(in_geojson_folder, 'parent_{}.geojson'.format(parent_id)))

                # check to see if we were able to get the geojson file
                # TODO: add logging rather than just printing an exception
                gj_file = os.path.isfile(gj_file_path)
                if not gj_file:
                    print ('{} geojson file not found'.format(gj_file_path))
                    continue

                fc = None
                with open(gj_file_path, 'r') as gj_opened:
                    fc = json.load(gj_opened)

                feature = None

                for f in fc['features']:
                    if f['properties']['REF_AREA_ID'] == area_id:
                        feature = f

                if feature is None:
                    print ('unable to get feature for parent layer {} :: where area_id is {}'.format(parent_id, area_id))
                    continue

                geom = feature['geometry']

                #store in cache
                geom_cache[area_id] = geom

            else:
                geom = geom_cache[area_id]
    
            # set up the new feature
            feature = {
                'type': 'Feature',
                'properties': {},
                'geometry': geom
            }
            
            # loop through the field_map variable and add each attribute and its value for the row
            for f in field_map:
                feature['properties'][f] = row[f]
                            
            # add the feature to the features array of the current layer
            feature_collection['features'].append(feature)
        
        # create filename for output geojson file
        # use lyr (the parent layer id for this group) rather than the parent_id left over from the row loop
        new_ind_id = str(c).replace('.', '_')
        layer_name = 'indicator_{}_layer_{}.geojson'.format(new_ind_id, lyr)

        # full path for the output geojson file
        full_path = os.path.abspath(os.path.join(output_geojson_folder, layer_name))

        print ('writing {} features to {}'.format(len(feature_collection['features']), layer_name))
        with open(full_path, 'w') as file:
            json.dump(feature_collection, file)

del geom_cache
print ('done')
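The loop above reopens and rescans the parent GeoJSON file every time it meets an area that is not yet cached. One possible alternative, sketched here with hypothetical names (`load_geometry_index`, `demo_fc`), is to read each parent file once and index its geometries by `REF_AREA_ID`. The keys are converted to strings on the assumption that the CSV side always supplies string ids, which is what `csv.DictReader` produces.

```python
import json
import os
import tempfile

def load_geometry_index(gj_file_path, id_field='REF_AREA_ID'):
    """Read a GeoJSON file once and map each feature's id to its geometry."""
    with open(gj_file_path, 'r', encoding='utf-8') as gj_opened:
        fc = json.load(gj_opened)
    # str() keeps lookups working even if the file stores ids as numbers.
    return {str(f['properties'][id_field]): f['geometry'] for f in fc['features']}

# Demo: write a tiny FeatureCollection to a temporary file, then index it.
demo_fc = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'REF_AREA_ID': 'TZA001'},
         'geometry': {'type': 'Point', 'coordinates': [39.2, -6.8]}},
        {'type': 'Feature', 'properties': {'REF_AREA_ID': 7},
         'geometry': {'type': 'Point', 'coordinates': [35.7, -6.2]}},
    ],
}
demo_path = os.path.join(tempfile.mkdtemp(), 'parent_2.geojson')
with open(demo_path, 'w', encoding='utf-8') as demo_file:
    json.dump(demo_fc, demo_file)

geom_index = load_geometry_index(demo_path)
print(sorted(geom_index))
```

With an index like this, the per-row lookup becomes `geom_index.get(area_id)`, and a missing area shows up as `None` instead of requiring a scan of every feature.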