# City of Cape Town - Data Science Unit Code Challenge
Author: Cobus Louw

## Introduction
This notebook serves as the application's main entry point and "glue" code. It is responsible for setting up the necessary objects and for coordinating the various components of the application. The notebook is dependent on the `cptcc` package, developed for this project, which contains the necessary classes and functions for the application. The reason for packaging the application in this way is to make it easier to reuse the code and to write unit tests.

Each section of the notebook is preceded by a markdown cell that describes the purpose of the section. The code cells that follow are usually preceded by a comment that describes the purpose of the code. The section are grouped in a similar manner as the questions in the project README.MD file.

We start of by installing the `cptcc` package along with its dependencies. You may have to restart the notebook.

## Install cptcc package

In [1]:
!pip install . > /dev/null

You should consider upgrading via the '/Users/cobus/.pyenv/versions/3.8.16/envs/cptcc/bin/python3.8 -m pip install --upgrade pip' command.[0m[33m
[0m

Next we import the necessary packages and classes, including the `cptcc` package.

Note the structure of the `cptcc` package:

```bash
src
└── cptcc
    ├── __init__.py
    ├── anonymize.py
    ├── cptcc.py
    ├── distance.py
    ├── utils.py
    └── wind.py
```

In [3]:
!pytest

platform darwin -- Python 3.8.16, pytest-7.2.2, pluggy-1.0.0
rootdir: /Users/cobus/Documents/personal/git-repos/ds_code_challenge, configfile: pytest.ini
plugins: anyio-3.6.2
collected 9 items                                                              [0m[1m

tests/test_anonymize.py::test_add_distance [32mPASSED[0m[32m                        [ 11%][0m
tests/test_anonymize.py::test_add_random_distance [32mPASSED[0m[32m                 [ 22%][0m
tests/test_cptcc.py::test_get_geojson_records 
[1m-------------------------------- live log setup --------------------------------[0m
[32mINFO    [0m botocore.credentials:credentials.py:1124 Found credentials in environment variables.
[32mINFO    [0m botocore.credentials:credentials.py:1124 Found credentials in environment variables.
[1m-------------------------------- live log call ---------------------------------[0m
[32mINFO    [0m cptcc.utils:utils.py:18 Function get_geojson Took 1.95 seconds
[32mINFO    [0m botocore.c

In [4]:
import yaml
import pandas as pd
import folium
import logging

import cptcc
from cptcc.wind import WindData
from cptcc import CPTDataLoader


logger = logging.getLogger('cptcc')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logging.getLogger().addHandler(handler)


from dotenv import load_dotenv
load_dotenv()

with open('config.yaml') as f:
    config = yaml.load(f, Loader=yaml.FullLoader)
config

data_loader = CPTDataLoader(config['bucket'])


## Question 1
### Data Extraction (if applying for a Data Engineering Position)

<i>
Use the AWS S3 SELECT command to read in the H3 resolution 8 data from `city-hex-polygons-8-10.geojson`. Use the `city-hex-polygons-8.geojson` file to validate your work.

Please log the time taken to perform the operations described, and within reason, try to optimise latency and computational resources used. Please also note the comments above about the nature of the code that we expect.
</i>

For question 1 we use developed a function that allows to specify the H3 resolution and the file to read. The function returns a geopandas dataframe with the data. The response from S3 is directly piped to Geopandas to utilise its low level C code, avoiding python `for loops` - improving performance. The function also logs the time taken to read the data. A utility decorator, `timeit` is used to log the time taken to execute the function. 

In [5]:
gdf = data_loader.get_geojson_gdf(
    'city-hex-polygons-8-10.geojson', resolution=8)
gdf.drop(columns=['resolution'], inplace=True)
gdf.head()

2023-04-08 23:06:18,925 - INFO - Function get_geojson Took 1.91 seconds
2023-04-08 23:06:19,372 - INFO - Function get_geojson_gdf Took 2.35 seconds


Unnamed: 0,index,centroid_lat,centroid_lon,geometry
0,88ad361801fffff,-33.859427,18.677843,"POLYGON ((18.68119 -33.86330, 18.68357 -33.859..."
1,88ad361803fffff,-33.855696,18.668766,"POLYGON ((18.67211 -33.85957, 18.67450 -33.855..."
2,88ad361805fffff,-33.855263,18.685959,"POLYGON ((18.68931 -33.85914, 18.69169 -33.855..."
3,88ad361807fffff,-33.851532,18.676881,"POLYGON ((18.68023 -33.85541, 18.68261 -33.851..."
4,88ad361809fffff,-33.867322,18.678806,"POLYGON ((18.68215 -33.87120, 18.68454 -33.867..."


## Question 2
### Initial Data Transformation (if applying for a Data Engineering and/or Science Position and Visualisation Engineer)
<i>
Join the equivalent of the contents of the file `city-hex-polygons-8.geojson` to the service request dataset, such that each service request is assigned to a single H3 resolution level 8 hexagon. Use the `sr_hex.csv.gz` file to validate your work.

For any requests where the `Latitude` and `Longitude` fields are empty, set the index value to `0`.

Include logging that lets the executor know how many of the records failed to join, and include a join error threshold above which the script will error out. Please motivate why you have selected the error threshold that you have. Please also log the time taken to perform the operations described, and within reason, try to optimise latency and computational resources used.
</i>

We start off by downloading the service requests (`sr.csv.gz`) compressed file. A general function is implemented to allow to download any `.csv.gz` file from the S3 bucket. The body of the response is again directly fed to pandas along with the `compression` argument set to `gzip`. This allows to read the file directly into a pandas dataframe without having to decompress the file first.

The next step is to join the service requests to the H3 hexagons. We use the `assign_sr_to_gdf` function from the `cptcc` package. The function takes the service requests dataframe and the H3 resolution as arguments. The function returns a dataframe with the service requests joined to the H3 hexagons.

In [6]:
sr_df = data_loader.get_csv_gz_df(key='sr.csv.gz')
sr_gdf = data_loader.assign_sr_to_gdf(gdf, sr_df)
sr_gdf.head()

2023-04-08 23:06:26,762 - INFO - Function get_csv_gz_df Took 7.35 seconds
2023-04-08 23:06:51,435 - INFO - Failed to assign 212367 records (22.55%) of service requests to a hexagon
2023-04-08 23:06:52,837 - INFO - Function assign_sr_to_gdf Took 26.07 seconds


Unnamed: 0,notification_number,reference_number,creation_timestamp,completion_timestamp,directorate,department,branch,section,code_group,code,cause_code_group,cause_code,official_suburb,latitude,longitude,h3_level8_index
0,400583534,9109492000.0,2020-10-07 06:55:18+02:00,2020-10-08 15:36:35+02:00,URBAN MOBILITY,Roads Infrastructure Management,RIM Area Central,District: Blaauwberg,TD Customer complaint groups,Pothole&Defect Road Foot Bic Way/Kerbs,Road (RCL),Wear and tear,MONTAGUE GARDENS,-33.872839,18.522488,88ad360225fffff
1,400555043,9108995000.0,2020-07-09 16:08:13+02:00,2020-07-14 14:27:01+02:00,URBAN MOBILITY,Roads Infrastructure Management,RIM Area East,District : Somerset West,TD Customer complaint groups,Manhole Cover/Gully Grid,Road (RCL),Vandalism,SOMERSET WEST,-34.078916,18.84894,88ad36d5e1fffff
2,400589145,9109614000.0,2020-10-27 10:21:59+02:00,2020-10-28 17:48:15+02:00,URBAN MOBILITY,Roads Infrastructure Management,RIM Area East,District : Somerset West,TD Customer complaint groups,Manhole Cover/Gully Grid,Road (RCL),Vandalism,STRAND,-34.102242,18.821116,88ad36d437fffff
3,400538915,9108601000.0,2020-03-19 06:36:06+02:00,2021-03-29 20:34:19+02:00,URBAN MOBILITY,Roads Infrastructure Management,RIM Area North,District : Bellville,TD Customer complaint groups,Paint Markings Lines&Signs,Road Markings,Wear and tear,RAVENSMEAD,-33.920019,18.607209,88ad361133fffff
4,400568554,,2020-08-25 09:48:42+02:00,2020-08-31 08:41:13+02:00,URBAN MOBILITY,Roads Infrastructure Management,RIM Area South,District : Athlone,TD Customer complaint groups,Pothole&Defect Road Foot Bic Way/Kerbs,Road (RCL),Surfacing failure,CLAREMONT,-33.9874,18.45376,88ad361709fffff


## Question 5
### Further Data Transformations (if applying for a Data Engineering Position)
### Q5.1

<i>
Create a subsample of the data by selecting all of the requests in `sr_hex.csv.gz` which are within 1 minute of the centroid of the BELLVILLE SOUTH official suburb. You may determine the centroid of the suburb by the method of your choice, but if any external data is used, your code should programmatically download and perform the centroid calculation. Please clearly document your method.
</i>

We start by loading the `sr_hex.csv.gz` from S3 using the `CPTDataLoader` class. The `creation_timestamp` and `completion_time` columns dtypes are converted to datetimes. 

In [7]:
sr_hex_df = data_loader.get_csv_gz_df(key='sr_hex.csv.gz')

sr_hex_df['creation_timestamp'] = pd.to_datetime(
    sr_hex_df['creation_timestamp'], utc=True)
    
sr_hex_df['completion_timestamp'] = pd.to_datetime(
    sr_hex_df['completion_timestamp'], utc=True)

sr_hex_df.head(2)

2023-04-08 23:07:00,343 - INFO - Function get_csv_gz_df Took 7.46 seconds


Unnamed: 0,reference_number,creation_timestamp,completion_timestamp,directorate,department,branch,section,code_group,code,cause_code_group,cause_code,official_suburb,latitude,longitude,h3_level8_index
0,9109492000.0,2020-10-07 04:55:18+00:00,2020-10-08 13:36:35+00:00,URBAN MOBILITY,Roads Infrastructure Management,RIM Area Central,District: Blaauwberg,TD Customer complaint groups,Pothole&Defect Road Foot Bic Way/Kerbs,Road (RCL),Wear and tear,MONTAGUE GARDENS,-33.872839,18.522488,88ad360225fffff
1,9108995000.0,2020-07-09 14:08:13+00:00,2020-07-14 12:27:01+00:00,URBAN MOBILITY,Roads Infrastructure Management,RIM Area East,District : Somerset West,TD Customer complaint groups,Manhole Cover/Gully Grid,Road (RCL),Vandalism,SOMERSET WEST,-34.078916,18.84894,88ad36d5e1fffff


Next we obtain the Centroid of Bellville South


In [8]:
coords = data_loader.get_geoloc('Bellville South, Cape Town')
print(f'Bellville South Centroid: {(coords)}')

2023-04-08 23:07:05,327 - INFO - Function get_geoloc Took 0.88 seconds


Bellville South Centroid: (-33.9161111, 18.6444444)


Next we filter entries based on a one minute thresholda around the longitude and latidute of the centroid obtained in the previous step. We use a helper function defined in `cptcc.distance`

In [9]:
minute = 1 / 60
boundary = (coords[0] - minute, coords[0] + minute,
            coords[1] - minute, coords[1] + minute)
subsample_df = sr_hex_df[sr_hex_df[['latitude', 'longitude']].
                      apply(lambda x: cptcc.distance.filter_lon_lat(*x, *boundary), axis=1)]
subsample_df.head(2)

Unnamed: 0,reference_number,creation_timestamp,completion_timestamp,directorate,department,branch,section,code_group,code,cause_code_group,cause_code,official_suburb,latitude,longitude,h3_level8_index
6,,2020-10-23 08:33:48+00:00,2020-10-26 12:16:49+00:00,URBAN MOBILITY,Roads Infrastructure Management,RIM Area North,District : Bellville,TD Customer complaint groups,Pothole&Defect Road Foot Bic Way/Kerbs,,,GLENHAVEN,-33.917996,18.658031,88ad361a19fffff
12,9108689000.0,2020-04-22 05:37:53+00:00,2020-07-05 10:11:35+00:00,,,,,TD Customer complaint groups,Paint Markings Lines&Signs,,,BELLVILLE CBD,-33.901032,18.631005,88ad361ac7fffff


### Q5.2

<i>
Augment your filtered subsample of `sr_hex.csv.gz` from (1) with the appropriate wind direction and speed data for 2020 from the Bellville South Air Quality Measurement site, from when the notification was created. All of the steps for downloading and preparing the wind data, as well as the join should be performed programmatically within your script.
</i>

1. The scripts starts by downloading the `.ods` file and loading it into a pandas dataframe.
2. Since the formatting is a bit untidy we clean the dataframe by using the `clean_wind_data` function defined in `WindData`.
   1. This involves stripping the redundant rows at the end and start of the dataframe.
   2. Changing the datatype of the `date_time` column to `pd.datetime`.
   3. Reformatting columns to be lower case and remove spaces as well as convert to MultiIndex for column names.


In [10]:
wind_df = WindData.get_df(config['wind_data'])
wind_df = WindData.clean_wind_data(wind_df)
wind_df.head(2)

Unnamed: 0_level_0,date_time,atlantis_aqm_site,atlantis_aqm_site,bellville_south_aqm_site,bellville_south_aqm_site,bothasig_aqm_site,bothasig_aqm_site,goodwood_aqm_station,goodwood_aqm_station,khayelitsha_aqm_site,khayelitsha_aqm_site,somerset_west_aqm_site,somerset_west_aqm_site,tableview_aqm_site,tableview_aqm_site
Unnamed: 0_level_1,Unnamed: 1_level_1,wind_dir_v(deg),wind_speed_v(m/s),wind_dir_v(deg),wind_speed_v(m/s),wind_dir_v(deg),wind_speed_v(m/s),wind_dir_v(deg),wind_speed_v(m/s),wind_dir_v(deg),wind_speed_v(m/s),wind_dir_v(deg),wind_speed_v(m/s),wind_dir_v(deg),wind_speed_v(m/s)
0,2020-01-01 00:00:00+00:00,173.0,4.1,191.0,2.5,163.7,5.3,247.8,19.2,34.2,1.3,135.0,3.8,179.8,5.2
1,2020-01-01 01:00:00+00:00,177.7,4.0,209.7,1.6,159.0,5.4,247.0,17.9,34.9,1.1,132.7,2.1,177.9,5.2


Since the structure of the dataframe suitable for further processing, we can easily extract both the `wind_dir` and `wind_speed` for the `bellville_south_aqm_site` with a simple lookup. 

In [11]:
SUBURB = 'bellville_south_aqm_site'
bs_wind_df = wind_df[['date_time', SUBURB]]
bs_wind_df.set_index('date_time', inplace=True, drop=True)
bs_wind_df.columns = bs_wind_df.columns.droplevel(0)
bs_wind_df.reset_index(inplace=True)
bs_wind_df.columns = ['wind_timestamp'] + bs_wind_df.columns[1:].to_list()
bs_wind_df.head(5)

Unnamed: 0,wind_timestamp,wind_dir_v(deg),wind_speed_v(m/s)
0,2020-01-01 00:00:00+00:00,191.0,2.5
1,2020-01-01 01:00:00+00:00,209.7,1.6
2,2020-01-01 02:00:00+00:00,202.5,1.4
3,2020-01-01 03:00:00+00:00,224.7,1.2
4,2020-01-01 04:00:00+00:00,244.3,1.3


The final step is to join the wind data on the `creation_timestamp` column. We use the `pd.merge_asof` function to perform the join. The reason for using `merge_asof` is that the wind data is not available for every timestamp in the `sr_hex.csv.gz` file. We, therefore, use the `nearest` option to get the closest wind data for each timestamp in the `sr_hex.csv.gz` file.

In [12]:
# merge dataframes based on closest hour
subsample_df = subsample_df.sort_values('creation_timestamp')
bs_wind_df = bs_wind_df.sort_values('wind_timestamp')

subsample_df = pd.merge_asof(subsample_df, bs_wind_df,
                             left_on='creation_timestamp',
                             right_on='wind_timestamp',
                             direction='nearest')

subsample_df.head(2)

Unnamed: 0,reference_number,creation_timestamp,completion_timestamp,directorate,department,branch,section,code_group,code,cause_code_group,cause_code,official_suburb,latitude,longitude,h3_level8_index,wind_timestamp,wind_dir_v(deg),wind_speed_v(m/s)
0,9108191000.0,2019-12-31 23:49:58+00:00,2020-01-02 05:31:40+00:00,WATER AND SANITATION,Distribution Services,Reticulation,,SEWER,Sewer: Blocked/Overflow,,,LABIANCE,-33.91172,18.654891,88ad361a11fffff,2020-01-01 00:00:00+00:00,191.0,2.5
1,9108191000.0,2020-01-01 00:19:27+00:00,2020-01-29 13:26:22+00:00,WATER AND SANITATION,Distribution Services,Reticulation,Reticulation Water Distribution,WATER,Leak at Valve,,,BELLVILLE SOUTH,-33.918821,18.642055,88ad361127fffff,2020-01-01 00:00:00+00:00,191.0,2.5


### Q5.3

<i>
Write a script which anonymises your augmented subsample from (2), but preserves the following precisions (You may use H3 indice or lat/lon coordinates for your spatial data):
   * location accuracy to within approximately 500m
   * temporal accuracy to within 6 hours
Please also remove any columns which you believe could lead to the resident who made the request being identified. We expect in the accompanying report that you will justify as to why this data is now anonymised. Please limit this commentary to less than 500 words. If your code is written in a code notebook such as Jupyter notebook or Rmarkdown, you can include this commentary in your notebook.
</i>

We start by anonymising the location data by rounding the `longitude` and `latitude` columns to 3 decimal places. This will give us a precision of approximately 500m.

In [13]:
from cptcc.anonymize import add_random_distance

MIN_DISTANCE = 100  # meters
MAX_DISTANCE = 500  # meters

subsample_df[['latitude', 'longitude',]] = \
    subsample_df[['latitude', 'longitude',]].apply(lambda x:
                                                   add_random_distance(lat=x[0],
                                                                       lon=x[1],
                                                                       min_distance=MIN_DISTANCE,
                                                                       max_distance=MAX_DISTANCE))
subsample_df.head(3)


Unnamed: 0,reference_number,creation_timestamp,completion_timestamp,directorate,department,branch,section,code_group,code,cause_code_group,cause_code,official_suburb,latitude,longitude,h3_level8_index,wind_timestamp,wind_dir_v(deg),wind_speed_v(m/s)
0,9108191000.0,2019-12-31 23:49:58+00:00,2020-01-02 05:31:40+00:00,WATER AND SANITATION,Distribution Services,Reticulation,,SEWER,Sewer: Blocked/Overflow,,,LABIANCE,-33.918727,18.640138,88ad361a11fffff,2020-01-01 00:00:00+00:00,191.0,2.5
1,9108191000.0,2020-01-01 00:19:27+00:00,2020-01-29 13:26:22+00:00,WATER AND SANITATION,Distribution Services,Reticulation,Reticulation Water Distribution,WATER,Leak at Valve,,,BELLVILLE SOUTH,-33.913164,18.651795,88ad361127fffff,2020-01-01 00:00:00+00:00,191.0,2.5
2,,2020-01-01 04:42:56+00:00,2020-01-02 05:34:55+00:00,ENERGY,Electricity Generation and Distribution,Electricity Retail Management,Customer Support Services and Rev Man,ELECTRICITY TECHNICAL COMPLAINTS,Identify Cables,,,BELLVILLE SOUTH,,,88ad361ac9fffff,2020-01-01 05:00:00+00:00,245.0,1.6


Next we round all temporal data to the closest 6 hours

In [14]:
ROUNDING = '6H'
subsample_df['wind_timestamp'] = subsample_df['wind_timestamp'].dt.round(ROUNDING)
subsample_df['creation_timestamp'] = subsample_df['creation_timestamp'].dt.round(ROUNDING)
subsample_df['completion_timestamp' ] = subsample_df['completion_timestamp'].dt.round(ROUNDING)
subsample_df.head(10)

Unnamed: 0,reference_number,creation_timestamp,completion_timestamp,directorate,department,branch,section,code_group,code,cause_code_group,cause_code,official_suburb,latitude,longitude,h3_level8_index,wind_timestamp,wind_dir_v(deg),wind_speed_v(m/s)
0,9108191000.0,2020-01-01 00:00:00+00:00,2020-01-02 06:00:00+00:00,WATER AND SANITATION,Distribution Services,Reticulation,,SEWER,Sewer: Blocked/Overflow,,,LABIANCE,-33.918727,18.640138,88ad361a11fffff,2020-01-01 00:00:00+00:00,191.0,2.5
1,9108191000.0,2020-01-01 00:00:00+00:00,2020-01-29 12:00:00+00:00,WATER AND SANITATION,Distribution Services,Reticulation,Reticulation Water Distribution,WATER,Leak at Valve,,,BELLVILLE SOUTH,-33.913164,18.651795,88ad361127fffff,2020-01-01 00:00:00+00:00,191.0,2.5
2,,2020-01-01 06:00:00+00:00,2020-01-02 06:00:00+00:00,ENERGY,Electricity Generation and Distribution,Electricity Retail Management,Customer Support Services and Rev Man,ELECTRICITY TECHNICAL COMPLAINTS,Identify Cables,,,BELLVILLE SOUTH,,,88ad361ac9fffff,2020-01-01 06:00:00+00:00,245.0,1.6
3,9108191000.0,2020-01-01 06:00:00+00:00,2020-01-02 06:00:00+00:00,WATER AND SANITATION,Commercial Services,Customer Services (Water),Meter Management,WATER MANAGEMENT DEVICE,No Water WMD,,,BELRAIL,,,88ad361ac5fffff,2020-01-01 06:00:00+00:00,194.5,1.7
4,9108191000.0,2020-01-01 06:00:00+00:00,2020-01-16 12:00:00+00:00,WATER AND SANITATION,Distribution Services,Reticulation,,SEWER,Sewer: Blocked/Overflow,,,BELRAIL,,,88ad361ac5fffff,2020-01-01 06:00:00+00:00,194.5,1.7
5,9108191000.0,2020-01-01 06:00:00+00:00,2020-01-01 06:00:00+00:00,WATER AND SANITATION,Distribution Services,Reticulation,,SEWER,Sewer: Blocked/Overflow,,,BELRAIL,,,88ad361ac5fffff,2020-01-01 06:00:00+00:00,194.5,1.7
6,9108191000.0,2020-01-01 06:00:00+00:00,2020-01-02 06:00:00+00:00,WATER AND SANITATION,Distribution Services,Reticulation,,SEWER,Sewer: Blocked/Overflow,General,Roots,LABIANCE,,,88ad361a11fffff,2020-01-01 06:00:00+00:00,176.6,1.5
7,9108191000.0,2020-01-01 12:00:00+00:00,2020-01-07 12:00:00+00:00,URBAN MOBILITY,Transport Planning & Network Management,Transport Network Facilitation and Dev.,Transport Network Development,TRAFFIC SIGNALS,Stage Stuck,,,BELGRAVIA -BELLVILLE,,,88ad361ae9fffff,2020-01-01 12:00:00+00:00,199.7,3.7
8,9108191000.0,2020-01-01 12:00:00+00:00,2020-01-02 06:00:00+00:00,WATER AND SANITATION,Commercial Services,Customer Services (Water),Meter Management,WATER MANAGEMENT DEVICE,No Water WMD,,,BELRAIL,,,88ad361ac5fffff,2020-01-01 12:00:00+00:00,187.9,3.7
9,9108192000.0,2020-01-01 18:00:00+00:00,2020-01-01 18:00:00+00:00,WATER AND SANITATION,Distribution Services,Reticulation,,WATER,Leak at Valve,,,BELLVILLE SOUTH,,,88ad361127fffff,2020-01-01 18:00:00+00:00,192.7,4.2


Incase the reference number is used to identify the resident, we replace it with an untraceable UUID.

In [15]:
# count occurences of each reference number
import uuid
r_number = list(subsample_df['reference_number'].unique())
r_to_id = {r: uuid.uuid4() for r in r_number}
subsample_df['reference_number'] = subsample_df['reference_number'].map(r_to_id)
subsample_df.head(2)

Unnamed: 0,reference_number,creation_timestamp,completion_timestamp,directorate,department,branch,section,code_group,code,cause_code_group,cause_code,official_suburb,latitude,longitude,h3_level8_index,wind_timestamp,wind_dir_v(deg),wind_speed_v(m/s)
0,49544128-5c37-4b0c-affa-7cc2e089a8ba,2020-01-01 00:00:00+00:00,2020-01-02 06:00:00+00:00,WATER AND SANITATION,Distribution Services,Reticulation,,SEWER,Sewer: Blocked/Overflow,,,LABIANCE,-33.918727,18.640138,88ad361a11fffff,2020-01-01 00:00:00+00:00,191.0,2.5
1,2e8c7c50-884b-4494-aef3-309a344ad197,2020-01-01 00:00:00+00:00,2020-01-29 12:00:00+00:00,WATER AND SANITATION,Distribution Services,Reticulation,Reticulation Water Distribution,WATER,Leak at Valve,,,BELLVILLE SOUTH,-33.913164,18.651795,88ad361127fffff,2020-01-01 00:00:00+00:00,191.0,2.5


## Supplementary

In [16]:
m = folium.Map(location=[coords[0], coords[1]], zoom_start=13)
folium.Marker(location=[coords[0], coords[1]]).add_to(m)
m