# <center>Text Description to Centreline Geometry Automation Official Doc</center>
## <center>August 2019</center>

## Intro

This is a `README` for the `text_to_centreline(highway, fr, t)` function, which is written in `postgresql` (currently living in the `crosic` schema). The function takes a text description of a street location in the City of Toronto and returns the centreline segments that match the description. The input descriptions typically state the street that the bylaw applies to and the two intersections between which the bylaw applies. For example, you could use the function to get the centreline segments of Bloor Street between Royal York Road and St George Street.

The function is mainly intended for City of Toronto [transportation bylaw data](https://open.toronto.ca/dataset/traffic-and-parking-by-law-schedules/). Previous versions of this process have already been used for [posted speed limits](https://github.com/CityofToronto/bdit_data-sources/tree/master/gis/posted_speed_limit_update) on streets in the City, locations of [community safety zones](https://github.com/CityofToronto/bdit_vz_programs/tree/master/safety_zones/commuity_safety_zone_bylaws), [turn restrictions](https://github.com/CityofToronto/bdit_vz_programs/blob/master/notebooks/Turn%20Restrictions.ipynb), etc. The function can handle most bylaws, including the more advanced cases. Any limitations that we are currently aware of are listed elsewhere in this document. Note also that some bylaws are themselves incorrect (for many reasons, such as a described intersection not existing), and the function probably cannot match many of these because the source data is wrong.

## Function Inputs

The function takes three inputs, called `highway`, `frm`, and `to`. The names emulate the column names in certain bylaw documents. Bylaws are written in one of two formats, and the format determines what you pass to the function.

If you have a bylaw written like: 

|highway|from|to|
|--------|---|---|
| Bloor Street |Royal York Road | St George Street |

then the `highway`, `from`, and `to` columns map directly to the `highway`, `frm`, and `to` parameters.

However, there is a second format:

|highway|between|
|--------|---|
| Bloor Street|Between Royal York Road and St George Street |

In this case you would input:
- `Bloor Street` as `highway` 
- `Between Royal York Road and St George Street` as `frm`
- `NULL` as `to`.
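
As a rough illustration of the two layouts, the Python sketch below maps a row from either format to the three arguments. The helper name `to_function_args` and the dictionary keys are hypothetical, chosen only to mirror the column headers above; the actual function is called from SQL.

```python
def to_function_args(row):
    """Map a bylaw row (a dict) to the (highway, frm, to) arguments.

    Hypothetical helper: the keys mirror the column headers shown above.
    """
    if "between" in row:
        # Single-column layout: the whole description goes in frm, to is NULL.
        return (row["highway"], row["between"], None)
    # Three-column layout: the columns map one-to-one.
    return (row["highway"], row["from"], row["to"])


three_col = {"highway": "Bloor Street", "from": "Royal York Road", "to": "St George Street"}
one_col = {"highway": "Bloor Street", "between": "Between Royal York Road and St George Street"}

args_three = to_function_args(three_col)
args_one = to_function_args(one_col)
```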

# How the Function Works

The main steps for every type of input (complex or not so complex) are:

1. Clean the data
2. Match location text data to intersections
3. Create lines from matched intersections and match lines to centreline segments

## Step 1: Clean the data

The first step is to clean the location description data so it can easily be matched to IDs in the `gis.centreline_intersection` table.

We want to be able to extract: 

1. `highway2`: the street name where the bylaw occurs (written in a format that is easily matchable to the intersections table)
2. `btwn1` and `btwn2`: the name of both the streets between which the bylaw occurs
3. `metres_btwn1` and `metres_btwn2`: null for everything except [special cases 1 and 2](#Special-Cases). These are the distances in metres from the intersections (the intersection of `highway2` and `btwn1` and/or of `highway2` and `btwn2`) at which the bylaw starts or ends
4. `direction_btwn1` and `direction_btwn2`: null for everything except [special cases 1 and 2](#Special-Cases). These are the directions away from the intersections (the intersection of `highway2` and `btwn1` and/or of `highway2` and `btwn2`) in which the bylaw starts or ends

The data can be input in two different formats ([see above](#Function-Inputs)), and each format has to be cleaned differently, hence there is a lot of code like: `CASE WHEN t IS NULL THEN ` .... `ELSE`. One way to make this cleaner in the future would be to write two separate functions for cleaning inputs.

The `gis.abbr_street2` function is called a lot in the cleaning process. It is a function that replaces string segments such as ' Street' with ' St', and ' North' with ' N'. 
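
The kind of replacement described above can be sketched in Python as follows. This is only an illustration, not the SQL function itself: `abbr_street_sketch` is a hypothetical name, and only the two replacements named in this document are included.

```python
# Only the two replacements mentioned above; the real function covers
# further cases that are not listed in this document.
ABBREVIATIONS = [(" Street", " St"), (" North", " N")]


def abbr_street_sketch(name):
    """Apply each long-form to short-form replacement in turn."""
    for long_form, short_form in ABBREVIATIONS:
        name = name.replace(long_form, short_form)
    return name


abbreviated = abbr_street_sketch("St George Street")
```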

## Step 2: Match the location text data to intersections

The function `get_intersection_geom` is the main function called to get the geometry of the intersections between which the bylaw is in effect. It returns an array containing the geometry of the intersection and the `objectid` (unique `ID`) of the intersection. If the `direction` and `metres` values passed to the function are not `NULL`, it returns a translated intersection geometry (translated in the specified direction by the specified number of metres). The function also takes a value `not_int_id` as an input. This is an `int_id` (intersection `ID`) that we do not want the function to return. It is a parameter because streets can sometimes intersect more than once, and we do not want the algorithm to match the same intersection twice. We use `int_id` instead of `objectid` because some intersection points are in exactly the same location but have different `objectid` values.

In most cases, the function `get_intersection_geom` calls another function, `get_intersection_id`. This function returns the `objectid` and intersection `ID` of the intersection, as well as how close the match was (where closeness is measured by Levenshtein distance). The query in this function gathers all of the streets from the City of Toronto intersection streets table whose names are the same as, or very similar to, the two streets that intersect to form an end point of the bylaw location. If more than one street in this subset shares the same unique intersection ID, then both streets in the endpoint of the bylaw have been matched to streets at that City of Toronto intersection. We can use a `HAVING` clause (i.e. `HAVING COUNT(intersections.street) > 1`) to ensure that only the intersections that have been matched to both street names are chosen. The `gis.centreline_intersection_streets` view (which is called in this query) assigns a unique ID to each intersection in the City of Toronto (`gis.centreline_intersection`). Each row contains one street name from an intersection and the ID associated with the intersection.
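
The matching idea can be sketched in Python as below. This is not the query itself: the function names, the `(int_id, street)` row layout, and the `max_dist` threshold are assumptions made for illustration, and the sample street names are made up.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match_intersections(rows, street_a, street_b, max_dist=3):
    """rows holds one (int_id, street) pair per street at each intersection.

    max_dist is an assumed similarity threshold, not taken from the source.
    """
    hits = defaultdict(list)
    for int_id, street in rows:
        dist = min(levenshtein(street, street_a), levenshtein(street, street_b))
        if dist <= max_dist:
            hits[int_id].append(dist)
    # Counterpart of HAVING COUNT(intersections.street) > 1
    return {int_id: sum(d) for int_id, d in hits.items() if len(d) > 1}


rows = [
    (1, "Bloor St W"), (1, "Royal York Rd"),
    (2, "Bloor St W"), (2, "St George St"),
    (3, "Royal York Rd"), (3, "Dundas St W"),
]
matches = match_intersections(rows, "Bloor St W", "Royal York Rd")
```

Only intersection 1 has both of its streets matched, so it is the only one kept; the value stored against it is the summed edit distance, which plays the role of the match closeness.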

If the names for `highway` and `btwn` are the same, `get_intersection_geom` calls the function `get_intersection_id_highway_equals_btwn` instead. This function is intended for cases where the intersection that we want is a cul de sac, a dead end, or a pseudo intersection. In these cases the intersection is described by just the name of the street. `not_int_id` is a parameter of this function as well, since some streets both start and end with an intersection that is a cul de sac or pseudo intersection. The process to find the appropriate intersection is very similar to the `get_intersection_id` function, except it does not have the `HAVING COUNT(intersections.street) > 1` clause. This is because cul de sac and pseudo intersections only have one entry in the `gis.centreline_intersection_streets` view (since there is only one road involved in the intersection).
    
The `oid1_geom` and `oid2_geom` values that are assigned in the `text_to_centreline` function represent the (sometimes translated) geometry of the intersections.

## Step 3: Create lines from matched intersections and match lines to centreline segments
Lines were created between the intersection points found in the previous step. Lines with a length greater than 3000 metres or less than 11 metres were filtered out, since one of the intersections was most likely matched incorrectly.

A buffer that is 3 times the length of the line was placed around the line. All centreline segments that are at least 90% contained in the buffer and have a very similar (if not exactly the same) street name (i.e. segments whose street name has a [Levenshtein distance](https://www.cuelogic.com/blog/the-levenshtein-algorithm/) of less than 4 from the bylaw street name) were captured into the `vz_safety_programs_staging.community_safety_zones_centrelines` table.
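
The thresholds in this step can be summarised with a small Python sketch. The function and constant names are hypothetical, the inputs are assumed to be precomputed (line length in metres, fraction of the segment inside the buffer, edit distance between street names), and treating the 11 m, 3000 m, and 90% bounds as inclusive is an assumption.

```python
MIN_LINE_M = 11         # shorter lines are treated as mismatches
MAX_LINE_M = 3000       # longer lines are treated as mismatches
MIN_COVERAGE = 0.90     # share of the segment that must fall inside the buffer
MAX_NAME_DISTANCE = 4   # exclusive upper bound on the Levenshtein distance


def line_is_plausible(length_m):
    """Keep lines between the two matched intersections that are 11 m to 3000 m long."""
    return MIN_LINE_M <= length_m <= MAX_LINE_M


def segment_matches(fraction_in_buffer, name_distance):
    """Keep segments mostly inside the buffer that have a near-identical street name."""
    return fraction_in_buffer >= MIN_COVERAGE and name_distance < MAX_NAME_DISTANCE


kept = segment_matches(0.95, 1)
```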


# Special Cases

There are about 100 records that contain community safety zones that start and/or end a certain number of metres away from an intersection. Previously, matching these to a geom was a manual process (which could take over a day). To save time in the future and increase the accuracy of our geoms, we automate further and create a method to assign a geom to the locations of these bylaws.

### Explore the data/cases

**Case 1** 

`btwn2` is formatted like: "a point (insert number here) metres (direction - north/south/east/west)"

The point is located on the street in the `highway` column a certain number of metres away from the intersection identified in `btwn1`.

These records can be filtered with the WHERE clause: `btwn2 LIKE '%point%'`

There are 23 records with this case.


**Case 2** 

`btwn1` or `btwn2` is formatted like: "(number) metres (direction) of (insert street name that is not btwn1 or highway)"

The point is located a certain number of metres away from the specified intersection. The intersection is the intersection between the street in the `highway` column and the (insert street name that is not btwn1 or highway).

These records can be filtered with the WHERE clause: `btwn LIKE '%metres%of%'`

Example:

- street = "Watson Avenue"
- btwn = "Between St Marks Road and 100 metres north of St Johns Road"

There are 57 records with this case.

For this case, we need to find the intersection of St Marks Road and Watson Avenue, and the intersection of St Johns Road and Watson Avenue. Then we find a point that is 100 metres north of the intersection of St Johns Road and Watson Avenue.
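
To make the Case 2 format concrete, here is a minimal Python sketch that pulls the distance, direction, and street out of such a description. The regular expression and the name `parse_offset` are illustrative assumptions, not the cleaning code used by the function.

```python
import re

# "<number> metres <direction> of <street>", e.g. "100 metres north of St Johns Road"
OFFSET_PATTERN = re.compile(
    r"(?P<metres>\d+(?:\.\d+)?)\s*metres\s+"
    r"(?P<direction>north|south|east|west)\s+of\s+(?P<street>.+)",
    re.IGNORECASE,
)


def parse_offset(text):
    """Return (metres, direction, street), or None if the text has no offset."""
    match = OFFSET_PATTERN.search(text)
    if match is None:
        return None
    return (
        float(match.group("metres")),
        match.group("direction").lower(),
        match.group("street").strip(),
    )


parsed = parse_offset("Between St Marks Road and 100 metres north of St Johns Road")
```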


### Workflow for Special Case 1:
1. make a point that is x metres away from intersection in the cardinal direction indicated
2. make a line from the intersection to that point
3. make a buffer around the line and catch centreline segments in the buffer with the correct street name
4. dissolve the centreline segments for each community safety zone into one geom 
5. cut the dissolved segments to be from the intersection described in `btwn1` column to x metres away from that intersection in the direction specified in `direction_btwn2`. 
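
Steps 1 and 2 amount to a simple translation, sketched here in Python. The sketch assumes planar coordinates in a projected, metre-based system, and the function names are hypothetical.

```python
# Unit vectors (dx, dy) per cardinal direction, assuming a projected
# coordinate system whose units are metres.
UNIT_VECTORS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


def translate_point(x, y, metres, direction):
    """Step 1: move the intersection point by `metres` in the given direction."""
    dx, dy = UNIT_VECTORS[direction.lower()]
    return (x + dx * metres, y + dy * metres)


def helper_line(x, y, metres, direction):
    """Step 2: the line from the intersection to the translated point."""
    return [(x, y), translate_point(x, y, metres, direction)]


moved = translate_point(1000.0, 2000.0, 100, "north")
```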

### Workflow for Special Case 2: 
1. make a point that is x metres away from intersection in the cardinal direction indicated
2. make a line from the intersection to that point
3. make a buffer around the line and catch centreline segments in the buffer with the correct street name
4. dissolve the centreline segments for each community safety zone into one geom 
5.  
 - **High level:** cut the dissolved segment to either be from `btwn1` intersection to x metres away from `btwn2` in a certain direction **OR** from x metres away from `btwn1` intersection in a specified direction to the `btwn2` intersection **OR** x metres away from intersection `btwn1` in specified direction to y metres away from `btwn2` intersection in a specified direction. 
 - I used the [ST_LineSubstring function](https://postgis.net/docs/ST_LineSubstring.html), which takes a line geom and two fractions between 0 and 1 inclusive, and returns the part of the line between the first fraction and the second (i.e. with fractions 0 and 0.20, ST_LineSubstring returns the first 20% of the input line geom). To find the fraction at which the point x metres away from the original intersection is located, I calculated the fraction of the line at which the specified intersection is located, then added or subtracted `x metres/total length of dissolved centreline` (depending on the direction and on where the intersection sits on the dissolved centreline geom). One of my concerns was: **how do I know if I should add or subtract the fraction of the line that x metres represents?**
 - The [ST_LineLocatePoint function](https://postgis.net/docs/ST_LineLocatePoint.html) was used to find the location of points as a fraction of the length of the dissolved centreline geom. To figure out whether to add or subtract the fraction of the line that is x metres away from the intersection, I used the line from step 2: I found the closest point on the dissolved centreline to the end of that line.
 - If the fraction returned by ST_LineLocatePoint for the original intersection is smaller than the fraction returned by ST_LineLocatePoint for the closest point on the dissolved centreline to the end of the line from step 2, then the situation looks like this (assuming that the line goes from left to right, so points on the left have a lower fraction value):
![](jpg/figure1_case2.jpg)
 - In the situation in the above image, to get the fraction of the point x metres away from the original intersection, you would add `x metres/total length of dissolved centreline` to the fraction returned by ST_LineLocatePoint for the original intersection.
 ![](jpg/figure2_case2.jpg)
 - In the situation in the above image (assuming the line is drawn from left to right, so that the fractions from ST_LineLocatePoint of the points on the left are smaller), to get the fraction of the point x metres away from the original intersection, you would subtract `x metres/total length of dissolved centreline` from the fraction returned by ST_LineLocatePoint for the original intersection.
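
The add-or-subtract decision can be written as a short Python sketch. It assumes the two fractions have already been obtained (for example from ST_LineLocatePoint) and that lengths are in metres. The function names are hypothetical, and clamping the result to the 0 to 1 range is an added assumption so that the result is always a valid fraction.

```python
def offset_fraction(intersection_frac, helper_end_frac, metres, total_length_m):
    """Fraction along the dissolved line of the point `metres` from the intersection.

    helper_end_frac is the fraction of the closest point on the dissolved line
    to the end of the helper line drawn from the intersection.
    """
    step = metres / total_length_m
    if helper_end_frac > intersection_frac:
        target = intersection_frac + step   # first figure: add
    else:
        target = intersection_frac - step   # second figure: subtract
    # Assumption of this sketch: clamp so the result stays a valid fraction.
    return min(1.0, max(0.0, target))


def substring_bounds(frac_a, frac_b):
    """Order two fractions so that the smaller one comes first."""
    return (min(frac_a, frac_b), max(frac_a, frac_b))


added = offset_fraction(0.5, 0.7, 100, 1000)
subtracted = offset_fraction(0.5, 0.3, 100, 1000)
```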
 

## Quality Control (QC) 

QC was done at various stages throughout the process. It is important to find efficient ways to conduct QC so we can verify that the streets matched to the large number of bylaw locations are correct. The QC involved a lot of manual checks of the final dataset. A manually corrected final dissolved segment should go into a `manual_geom` file, along with the unique ID assigned to the bylaw. If a change needs to be made, changing the original table (i.e. changing the intersection IDs) or adding an entry to the table of manual entries are the best ways to make corrections. The checks that I did included:
- After finding the `objectid` of the two intersections that each bylaw occurs between, I checked for cases where the two `objectid`s were equal, because a bylaw cannot occur between a point and itself.
- Manually checked all the bylaw locations with final geoms over 2 km that occurred between unfamiliar intersections (or intersections between which I was uncertain that the distance was over 2 km).
- Looked at the centreline segments matched to bylaws that do not have the exact same street name as the street that the bylaw occurs on (i.e. the street in the highway column).
- Looked at final lines that were of type `ST_MultiLineString` (this means that the lines were not continuous, or the line was a circular shape, or that there was a fork in the line).
- Looked at final bylaw lines that overlapped with a different bylaw's line.
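
Some of these checks are easy to automate as flags for manual review. The Python sketch below is one possible way to do so; the record keys (`oid1`, `oid2`, `length_m`, `geom_type`, `matched_street`, `highway`) and the sample values are hypothetical and are not taken from the actual tables.

```python
def qc_flags(record):
    """Return the list of QC checks that a matched bylaw record should go through."""
    flags = []
    if record["oid1"] == record["oid2"]:
        flags.append("same_intersection")   # both ends matched to one point
    if record["length_m"] > 2000:
        flags.append("over_2km")            # long geoms get a manual look
    if record["geom_type"] == "ST_MultiLineString":
        flags.append("multiline")           # gap, loop, or fork in the line
    if record["matched_street"] != record["highway"]:
        flags.append("name_mismatch")       # segment name differs from bylaw street
    return flags


sample = {
    "oid1": 10, "oid2": 10, "length_m": 2500.0,
    "geom_type": "ST_LineString",
    "matched_street": "Bloor St W", "highway": "Bloor St W",
}
sample_flags = qc_flags(sample)
```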