# Assignments 3 and 4

The focus of the following exercises is to create geometric objects and functions from simple text files that contain coordinate locations (in this case, latitude and longitude values).  
Python is an excellent tool for this kind of task: it can read data from (almost) any input format (CSV, text, Excel, GPX, various databases).

Data files are commonly read using [pandas](https://pandas.pydata.org/), while the geometric analysis is done with the [shapely](https://shapely.readthedocs.io/) library.

## Sample data set

For this exercise, we read simulated data from a file that lists travel times between different locations.
The data is stored in a semicolon-separated text file [`travel_times.txt`](data/travel_times.txt).

The first four rows of our data look like this:

```
from_id;to_id;fromid_toid;route_number;at;from_x;from_y;to_x;to_y;total_route_time;route_time;route_distance;route_total_lines
5807270;5814548;5807270_5814548;1;08:10;18.0715972;59.3091487;18.0900702;59.3274161;62.8095746;63.8796211;9120.9728828;3
5800860;5813108;5800860_5813108;1;08:10;18.0658924;59.3877461;18.0421721;59.3079419;83.8267506;85.1239344;10918.6411340;2
5805390;5817158;5805390_5817158;4;08:10;18.0027096;59.3265600;18.0276828;59.3085658;142.8083640;138.4989656;11327.4226781;2
5805191;5817400;5805191_5817400;2;08:10;18.0221972;59.3129515;18.0592350;59.3894191;72.4615612;117.2012476;11165.3161290;1
```

In this exercise, we are interested in the following columns:

| Column name        | Description                                              |
|:------------------ |:-------------------------------------------------------- |
| `from_x`           | x-coordinate of the **origin** location (longitude)      |
| `from_y`           | y-coordinate of the **origin** location (latitude)       |
| `to_x`             | x-coordinate of the **destination** location (longitude) |
| `to_y`             | y-coordinate of the **destination** location (latitude)  |
| `total_route_time` | Travel time with public transportation along the route   |
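
As a minimal sketch of how such a semicolon-separated file can be parsed, the snippet below reads two of the sample rows shown above from an in-memory string instead of from disk. The names `sample_text`, `columns_of_interest` and `sample` are made up for this illustration; only `pandas.read_csv()` with its `sep` and `usecols` parameters comes from the library.

```python
import io

import pandas as pd

# Header and two data rows copied from the excerpt above, kept in memory for illustration.
sample_text = """from_id;to_id;fromid_toid;route_number;at;from_x;from_y;to_x;to_y;total_route_time;route_time;route_distance;route_total_lines
5807270;5814548;5807270_5814548;1;08:10;18.0715972;59.3091487;18.0900702;59.3274161;62.8095746;63.8796211;9120.9728828;3
5800860;5813108;5800860_5813108;1;08:10;18.0658924;59.3877461;18.0421721;59.3079419;83.8267506;85.1239344;10918.6411340;2
"""

# Hypothetical helper list: the columns described in the table above.
columns_of_interest = ["from_x", "from_y", "to_x", "to_y", "total_route_time"]

# sep=";" tells pandas the delimiter; usecols keeps only the listed columns.
sample = pd.read_csv(io.StringIO(sample_text), sep=";", usecols=columns_of_interest)
print(sample.shape)  # (2, 5)
```

Reading from a real file works the same way: pass the file path instead of the `io.StringIO` object.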


NOTE: In the cells where your code is checked (marked `# CODE CELL FOR TESTING YOUR SOLUTION`), you might need to adjust the variable names if you used different ones. In general, you can adapt any part of the code and customize it as you wish, as long as you fulfil the primary objectives.



----

## Assignment 3: Reading coordinates from a text file, and creating geometries

In this problem, your task is to read data from the file described above, and create two lists of points representing 
the origins and destinations of the routes described in the data set.

This task entails multiple steps:

1. Read the data into a `pandas.DataFrame`
2. Discard all unnecessary columns (this is good practice, as it helps reduce the memory footprint of a program)
3. Create two lists of `shapely.geometry.Point`s

Let’s go step-by-step. 



----

#### (1)

First, use `pandas` to read the file into a variable `data`. Consult the [pandas documentation](https://pandas.pydata.org/docs/user_guide/) to find the best way to do this.

In [2]:
# Import the libraries used in this exercise
import pandas as pd
from shapely.geometry import Point, LineString

# Read the file, keeping only the columns needed for this exercise
columns_of_interest = ["from_x", "from_y", "to_x", "to_y", "total_route_time"]

data = pd.read_csv("simulated_travel_times_stockholm.txt", sep=";", usecols=columns_of_interest)
print(data)

        from_x     from_y       to_x       to_y  total_route_time
0    18.071597  59.309149  18.090070  59.327416         62.809575
1    18.065892  59.387746  18.042172  59.307942         83.826751
2    18.002710  59.326560  18.027683  59.308566        142.808364
3    18.022197  59.312951  18.059235  59.389419         72.461561
4    18.023107  59.388875  18.091236  59.319187         87.216702
..         ...        ...        ...        ...               ...
235  18.022444  59.357867  18.099126  59.360188        101.938586
236  18.022382  59.343862  18.012839  59.358242         70.488127
237  18.053697  59.372526  18.010411  59.374807        137.783008
238  18.059294  59.348667  18.072434  59.381177        137.958186
239  18.058009  59.387342  18.057839  59.365648        137.057842

[240 rows x 5 columns]


As a little sanity check, print the number of rows and columns of the data set:

In [4]:
# RUN YOUR OWN CODE HERE
print(data.shape)

(240, 5)


If you loaded the data set successfully, the following code cell will print the last few rows of the data:

In [5]:
# CODE CELL FOR TESTING YOUR SOLUTION
data.tail()

|     | from_x    | from_y    | to_x      | to_y      | total_route_time |
|:--- |:--------- |:--------- |:--------- |:--------- |:---------------- |
| 235 | 18.022444 | 59.357867 | 18.099126 | 59.360188 | 101.938586       |
| 236 | 18.022382 | 59.343862 | 18.012839 | 59.358242 | 70.488127        |
| 237 | 18.053697 | 59.372526 | 18.010411 | 59.374807 | 137.783008       |
| 238 | 18.059294 | 59.348667 | 18.072434 | 59.381177 | 137.958186       |
| 239 | 18.058009 | 59.387342 | 18.057839 | 59.365648 | 137.057842       |



----
#### (2)

Now, select the four columns that contain coordinate information (**`from_x`**, **`from_y`**, **`to_x`**, **`to_y`**), and store them in the DataFrame **`data`**
(i.e., update the variable `data` to contain only these four columns).

In [3]:
# Select specific columns and update data frame
data = data[["from_x", "from_y", "to_x", "to_y"]]

Run the following code cell to test whether you have successfully replaced `data` with only the required data columns: it prints an error if you haven’t.

In [4]:
#  CODE CELL FOR TESTING YOUR SOLUTION
assert list(data.columns) == ["from_x", "from_y", "to_x", "to_y"], "Error: `data` does not (or not only) contain the four columns it should"
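
To illustrate why discarding columns helps, here is a rough sketch that compares the memory reported by `DataFrame.memory_usage()` before and after the selection. The tiny `full` table is a made-up stand-in for the real data set, and the variable names are assumptions for this example only.

```python
import pandas as pd

# Hypothetical stand-in for the full table: two extra columns plus the four coordinates.
full = pd.DataFrame({
    "from_id": [5807270, 5800860],
    "at": ["08:10", "08:10"],
    "from_x": [18.0715972, 18.0658924],
    "from_y": [59.3091487, 59.3877461],
    "to_x": [18.0900702, 18.0421721],
    "to_y": [59.3274161, 59.3079419],
})

# Keep only the coordinate columns, as in step (2).
coordinates = full[["from_x", "from_y", "to_x", "to_y"]]

# deep=True also counts the memory held by string (object) columns.
full_bytes = full.memory_usage(deep=True).sum()
coordinate_bytes = coordinates.memory_usage(deep=True).sum()
print(full_bytes, coordinate_bytes)
```

The exact byte counts depend on the pandas version and platform, but the reduced table should always report less memory than the full one.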


----

#### (3)

Finally, create two lists called **`origin_points`** and **`destination_points`** that contain `shapely.geometry.Point` objects created using the coordinates from `data`. 

In particular, the origin points in `origin_points` should be based on columns `from_x` and `from_y`, and the destination points in `destination_points` on columns `to_x` and `to_y`.

There are many ways to achieve this. Two possible approaches are described below (you can implement either one of them):

##### **Approach A**

- Create two empty lists for the origin and destination points, respectively
- Use a for-loop to iterate over the rows of your dataframe:
    - For each row, create a `shapely.geometry.Point` object based on the coordinate columns
    - Append the point objects to the `origin_points` and `destination_points` lists


##### **Approach B (more advanced)**

- Make use of the `.apply()` function of the `pandas.DataFrame` to operate on all rows at once (see its [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html), *HINT:* you might want to use the `axis` parameter)
- Use the `shapely.geometry.Point` constructor directly, or wrap it into a [lambda function](https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7)
- Finally, convert the output `pandas.Series` into `list`s

In [11]:
# Helper function that creates a Point from an x and a y coordinate
def create_point_geometry(x, y):
    return Point(x, y)

# Create empty lists
origin_points = []
destination_points = []

# Approach A: loop over the rows once and create both points for each row
for _, row in data.iterrows():
    origin_points.append(create_point_geometry(row["from_x"], row["from_y"]))
    destination_points.append(create_point_geometry(row["to_x"], row["to_y"]))

# Approach B (alternative): build both lists with DataFrame.apply()
# origin_points = data.apply(lambda row: Point(row["from_x"], row["from_y"]), axis=1).tolist()
# destination_points = data.apply(lambda row: Point(row["to_x"], row["to_y"]), axis=1).tolist()
#
# print(origin_points[:5])
# print(destination_points[:5])


**NOTE: After you have solved this problem, there might be some left-over variables around.<br />We recommend you *restart the kernel and run all cells* from the toolbar or JupyterLab’s menu.**



Use the following code cell to test whether your solution works:

In [12]:
# CODE CELL FOR TESTING YOUR SOLUTION

# This should print the first origin and destination coordinates in the two lists:
print("ORIGIN X Y:", origin_points[0].x, origin_points[0].y)
print("DESTINATION X Y:", destination_points[0].x, destination_points[0].y)

# Check that you created the correct number of points:
assert len(origin_points) == len(data), "Number of origin points must be the same as number of rows in the original file"
assert len(destination_points) == len(data), "Number of destination points must be the same as number of rows in the original file"

ORIGIN X Y: 18.0715972 59.3091487
DESTINATION X Y: 18.0900702 59.3274161
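
Besides the two approaches described above, a third option is a list comprehension over zipped columns. The sketch below is only one possible variant; it uses a small made-up DataFrame `coords` (filled with the first two sample rows) so that it runs on its own, and the names `origins` and `destinations` are chosen for this example.

```python
import pandas as pd
from shapely.geometry import Point

# Made-up two-row table using the first two sample rows of the data set.
coords = pd.DataFrame({
    "from_x": [18.0715972, 18.0658924],
    "from_y": [59.3091487, 59.3877461],
    "to_x": [18.0900702, 18.0421721],
    "to_y": [59.3274161, 59.3079419],
})

# zip() pairs the x and y values row by row; each pair becomes one Point.
origins = [Point(x, y) for x, y in zip(coords["from_x"], coords["from_y"])]
destinations = [Point(x, y) for x, y in zip(coords["to_x"], coords["to_y"])]

print(origins[0].x, origins[0].y)
```

This avoids creating a `pandas.Series` object for every row, which is what `iterrows()` and `apply(..., axis=1)` do internally.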




### Done!

That’s it. Now you are ready to continue to Assignment 4.


----

## Assignment 4: Creating LineStrings that represent the movements

This problem continues where we left off after completing *Assignment 3*. 

The task is to:

1. create a list of lines (`shapely.geometry.LineString`) between each pair of origin and destination points, and
2. calculate the overall total length of all those lines.

Store the list of lines in a variable called `lines`, and the sum of lengths in a variable called `total_length`.

Once you have working solutions for both tasks, 

3. create functions for them so you can apply them to other similar data sets in the future (see instructions below).

#### (1)

To create the `shapely.geometry.LineString`s for each pair of origins and destinations, you need to loop over both lists at the same time.

Again, there are many ways to achieve this. Here are two suggestions:

- (alternative 1) Use the `zip()` function that allows you to iterate over multiple lists at the same time. See this week’s exercise hints!
- (alternative 2) Use the *for-range loop*  and an index variable to access the same value in both lists
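
The two alternatives above can be sketched side by side on two made-up point lists. The names `starts`, `ends`, `lines_by_index` and `lines_by_zip` are hypothetical and exist only in this example.

```python
from shapely.geometry import LineString, Point

# Two made-up lists of points (coordinates taken from the first two sample rows).
starts = [Point(18.0715972, 59.3091487), Point(18.0658924, 59.3877461)]
ends = [Point(18.0900702, 59.3274161), Point(18.0421721, 59.3079419)]

# Alternative 2: a for-range loop with an index variable.
lines_by_index = []
for i in range(len(starts)):
    lines_by_index.append(LineString([starts[i], ends[i]]))

# Alternative 1: zip() yields one (start, end) pair per iteration.
lines_by_zip = [LineString([a, b]) for a, b in zip(starts, ends)]

print(len(lines_by_index), len(lines_by_zip))  # 2 2
```

One practical difference: `zip()` silently stops at the end of the shorter list, whereas the index-based loop raises an `IndexError` if `ends` is shorter than `starts`.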


In [10]:
# Helper function that creates a LineString from two points
def create_linestring_geometry(A, B):
    lns = LineString([A, B])     # The points MUST be passed as one list, hence the []
    return lns

# Empty list
lines = []

# Loop over both lists at the same time using zip()
for A, B in zip(origin_points, destination_points):
    line = create_linestring_geometry(A, B)
    lines.append(line)

print(lines)

[<LINESTRING (18.072 59.309, 18.09 59.327)>, <LINESTRING (18.066 59.388, 18.042 59.308)>, <LINESTRING (18.003 59.327, 18.028 59.309)>, <LINESTRING (18.022 59.313, 18.059 59.389)>, <LINESTRING (18.023 59.389, 18.091 59.319)>, <LINESTRING (18.067 59.396, 18.021 59.332)>, <LINESTRING (18.002 59.386, 18.062 59.323)>, <LINESTRING (18.01 59.381, 18.063 59.335)>, <LINESTRING (18.08 59.366, 18.073 59.307)>, <LINESTRING (18.018 59.355, 18.013 59.352)>, <LINESTRING (18.065 59.309, 18.072 59.307)>, <LINESTRING (18.024 59.341, 18.091 59.38)>, <LINESTRING (18.01 59.337, 18.018 59.323)>, <LINESTRING (18.024 59.326, 18.024 59.354)>, <LINESTRING (18.072 59.372, 18.097 59.388)>, <LINESTRING (18.086 59.35, 18.018 59.365)>, <LINESTRING (18.083 59.308, 18.085 59.353)>, <LINESTRING (18.04 59.322, 18.049 59.332)>, <LINESTRING (18.067 59.368, 18.025 59.333)>, <LINESTRING (18.02 59.308, 18.087 59.367)>, <LINESTRING (18.029 59.385, 18.045 59.399)>, <LINESTRING (18.09 59.35, 18.051 59.366)>, ...


**NOTE: After you have solved this problem, there might be some left-over variables around.<br />We recommend you *restart the kernel and run all cells* from the toolbar or JupyterLab’s menu.**


In [11]:
# CODE CELL FOR TESTING YOUR SOLUTION

# Test that the list has correct number of LineStrings
assert len(lines) == len(data), "There should be as many lines as there are rows in the original data"


----

#### (2)

Create a variable called **`total_length`**, and store the total (Euclidean) length of all the origin-destination LineStrings that we just created in that variable.

*Hint*: A simple solution is to start with a `total_length` of `0`, and add each line’s length while iterating over the list of lines.
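
The accumulation described in the hint can be sketched on two made-up lines with easy-to-check lengths. The same total can also be obtained with Python's built-in `sum()`; the names `example_lines`, `running_total` and `summed` are assumptions for this illustration.

```python
from shapely.geometry import LineString

# Two made-up lines with lengths that are easy to verify by hand.
example_lines = [
    LineString([(0, 0), (3, 4)]),  # 3-4-5 triangle: length 5.0
    LineString([(0, 0), (0, 2)]),  # vertical line: length 2.0
]

# Start at 0 and add each line's length while iterating.
running_total = 0
for line in example_lines:
    running_total += line.length

# Equivalent one-liner using sum() with a generator expression.
summed = sum(line.length for line in example_lines)

print(running_total, summed)  # 7.0 7.0
```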


In [12]:
# Total length 
total_length = 0 

In [13]:
# Helper function that returns the length of a line
def get_length(line_input):
    lng = line_input.length
    return lng

# Loop over the lines and add each length to the running total
for i in lines:
    lng = get_length(i)
    total_length = lng + total_length
  
print(total_length)

12.565719181847934
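
Because the coordinates are longitudes and latitudes, this total is expressed in decimal degrees, not in metres or kilometres. As a small check of what `.length` computes, the sketch below reproduces the length of the first origin-destination line with the Pythagorean theorem. The variable names are chosen for this example only.

```python
import math

from shapely.geometry import LineString

# Coordinates of the first sample row (longitude, latitude).
origin = (18.0715972, 59.3091487)
destination = (18.0900702, 59.3274161)
first_line = LineString([origin, destination])

# Euclidean distance by hand: sqrt(dx^2 + dy^2), in decimal degrees.
dx = destination[0] - origin[0]
dy = destination[1] - origin[1]
by_hand = math.hypot(dx, dy)

print(first_line.length, by_hand)
```

Both values should agree up to floating-point rounding, which confirms that shapely treats the coordinates as plain planar x/y values.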



----

#### (3)

Now, create functions that automate the functionality you implemented for part (1) and part (2) of this problem:

- `create_od_lines()`: accepts two `list`s of `shapely.geometry.Point`s and returns a `list` of `shapely.geometry.LineString`s 
- `calculate_total_distance()`: takes a `list` of `shapely.geometry.LineString` geometries and returns their total length

You can copy and paste the code you have written earlier into the functions. Be sure to add a **docstring** to each function.
Below, you can find a code cell for testing your functions (you should get the same result as earlier).

In [16]:
# Function that creates origin-destination LineStrings
def create_od_lines(list_a, list_b):
    """
    Generate a list of LineStrings from two lists of points.

    Input: list_a, list_b (two lists of `shapely.geometry.Point`)

    Return: lines (one list of `shapely.geometry.LineString`)

    """
    lines = []
    for pt_a, pt_b in zip(list_a, list_b):
        line = LineString([pt_a, pt_b])     # Pass the two POINTS to LineString, NOT the lists!
        lines.append(line)

    return lines


# Function that calculates the total length
def calculate_total_distance(list_c):
    """
    Calculate the total length of all lines in a list.

    Input: list_c (one list of `shapely.geometry.LineString`)

    Return: total_length (float)

    """
    total_length = 0  # Initial length = 0
    for line in list_c:
        total_length = line.length + total_length   # cumulative total length

    return total_length


In [17]:
# CODE CELL FOR TESTING YOUR SOLUTION

# Create origin-destination lines
od_lines = create_od_lines(origin_points, destination_points)


# Calculate the total distance
tot_dist = calculate_total_distance(od_lines)

print("Total distance", round(tot_dist,2))
assert tot_dist == total_length


Total distance 12.57
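
The exact `==` comparison in the test cell works when both totals are accumulated in the same order. If the summation order could differ, floating-point rounding may cause tiny discrepancies, and a tolerance-based comparison is more robust. A minimal sketch with made-up lines, assuming a relative tolerance of `1e-9` is acceptable:

```python
import math

from shapely.geometry import LineString

# Three made-up line segments.
segments = [
    LineString([(0, 0), (0.1, 0.2)]),
    LineString([(1, 1), (1.3, 1.1)]),
    LineString([(2, 2), (2.2, 2.7)]),
]

# Sum the same lengths in two different orders.
forward = sum(s.length for s in segments)
backward = sum(s.length for s in reversed(segments))

# math.isclose() tolerates differences caused by floating-point rounding.
assert math.isclose(forward, backward, rel_tol=1e-9)
print(round(forward, 2), round(backward, 2))
```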



## Well done!

## Sources

This lesson is inspired by the [Programming in Python lessons](http://swcarpentry.github.io/python-novice-inflammation/) from the [Software Carpentry organization](http://software-carpentry.org) and has adapted or reused material from the University of Helsinki [Automating GIS processes course](https://autogis-site.readthedocs.io/en/latest/course-info/license.html) under a [Creative Commons Attribution-ShareAlike 4.0 International licence](https://creativecommons.org/licenses/by-sa/4.0/deed.en).
