# Assignments 3 and 4

The focus of the following exercises is to create geometric objects and functions from simple text files that contain coordinate locations, in this case latitude and longitude values.
Python is an excellent tool for this kind of a task: it can read data from (almost) any input format (CSV, text, Excel, GPX, various databases).

The data files are commonly read using [pandas](https://pandas.pydata.org/), while the geometric analysis is done with the [shapely](https://shapely.readthedocs.io/) library.

## Sample data set

For this exercise, we read simulated data from a file that lists travel times between different locations.
The data is stored in a semicolon-separated text file [`travel_times.txt`](data/travel_times.txt).

The first four rows of our data look like this:

```
from_id;to_id;fromid_toid;route_number;at;from_x;from_y;to_x;to_y;total_route_time;route_time;route_distance;route_total_lines
5807270;5814548;5807270_5814548;1;08:10;18.0715972;59.3091487;18.0900702;59.3274161;62.8095746;63.8796211;9120.9728828;3
5800860;5813108;5800860_5813108;1;08:10;18.0658924;59.3877461;18.0421721;59.3079419;83.8267506;85.1239344;10918.6411340;2
5805390;5817158;5805390_5817158;4;08:10;18.0027096;59.3265600;18.0276828;59.3085658;142.8083640;138.4989656;11327.4226781;2
5805191;5817400;5805191_5817400;2;08:10;18.0221972;59.3129515;18.0592350;59.3894191;72.4615612;117.2012476;11165.3161290;1
```

In this exercise, we are interested in the following columns:

| Column name        | Description                                              |
|:------------------ |:-------------------------------------------------------- |
| `from_x`           | x-coordinate of the **origin** location (longitude)      |
| `from_y`           | y-coordinate of the **origin** location (latitude)       |
| `to_x`             | x-coordinate of the **destination** location (longitude) |
| `to_y`             | y-coordinate of the **destination** location (latitude)  |
| `total_route_time` | Travel time by public transport along the route          |
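
As a minimal sketch of how a file in this layout can be parsed, the snippet below reads the header and the first two data rows shown above from an in-memory string instead of from disk (an assumption made only to keep the example self-contained). It uses the `sep` and `usecols` parameters of `pandas.read_csv` to keep just the five columns of interest; the names `sample_text`, `columns_of_interest` and `sample` are illustrative.

```python
import io

import pandas as pd

# Header and first two data rows, copied from the excerpt above
sample_text = """from_id;to_id;fromid_toid;route_number;at;from_x;from_y;to_x;to_y;total_route_time;route_time;route_distance;route_total_lines
5807270;5814548;5807270_5814548;1;08:10;18.0715972;59.3091487;18.0900702;59.3274161;62.8095746;63.8796211;9120.9728828;3
5800860;5813108;5800860_5813108;1;08:10;18.0658924;59.3877461;18.0421721;59.3079419;83.8267506;85.1239344;10918.6411340;2
"""

columns_of_interest = ["from_x", "from_y", "to_x", "to_y", "total_route_time"]

# sep=";" matches the semicolon-separated layout,
# usecols drops every column that is not listed
sample = pd.read_csv(io.StringIO(sample_text), sep=";", usecols=columns_of_interest)

print(sample.shape)
print(sample.dtypes)
```

Reading only the needed columns is one possible alternative to loading everything and discarding columns afterwards.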


NOTE: In the cells where your code is checked (marked `# CODE CELL FOR TESTING YOUR SOLUTION`), you might need to adjust the variable names if you used different ones. In general, you can adapt and customize any part of the code as you wish, as long as you fulfil the primary objectives.



----

## Assignment 3: Reading coordinates from a text file, and creating geometries

In this problem, your task is to read data from the file described above, and create two lists of points representing
the origins and destinations of the routes described in the data set.

This task entails multiple steps:

1. Read the data into a `pandas.DataFrame`
2. Discard all unnecessary columns (this is good practice, as it helps reduce the memory footprint of a program)
3. Create two lists of `shapely.geometry.Point`s

Let’s go step-by-step.



----

#### (1)

First, use `pandas` to read the file into a variable `data`. Consult the [pandas documentation](https://pandas.pydata.org/docs/user_guide/) to find the best way to do this.

In [6]:
# RUN YOUR OWN CODE HERE
import pandas as pd
# The file is semicolon-separated; the path follows the link given above
data = pd.read_csv("data/travel_times.txt", sep=";")

As a little sanity check, print the number of rows and columns of the data set:

In [7]:
# RUN YOUR OWN CODE HERE
print(data.shape)

(240, 13)


If you loaded the data set successfully, the following code cell will print the first few rows of the data:

In [8]:
# CODE CELL FOR TESTING YOUR SOLUTION
data.head()

|     | from_id | to_id | fromid_toid | route_number | at | from_x | from_y | to_x | to_y | total_route_time | route_time | route_distance | route_total_lines |
|:--- |:--- |:--- |:--- |:--- |:--- |:--- |:--- |:--- |:--- |:--- |:--- |:--- |:--- |
| 0 | 5807270 | 5814548 | 5807270_5814548 | 1 | 08:10 | 18.071597 | 59.309149 | 18.09007 | 59.327416 | 62.809575 | 63.879621 | 9120.972883 | 3 |
| 1 | 5800860 | 5813108 | 5800860_5813108 | 1 | 08:10 | 18.065892 | 59.387746 | 18.042172 | 59.307942 | 83.826751 | 85.123934 | 10918.641134 | 2 |
| 2 | 5805390 | 5817158 | 5805390_5817158 | 4 | 08:10 | 18.00271 | 59.32656 | 18.027683 | 59.308566 | 142.808364 | 138.498966 | 11327.422678 | 2 |
| 3 | 5805191 | 5817400 | 5805191_5817400 | 2 | 08:10 | 18.022197 | 59.312951 | 18.059235 | 59.389419 | 72.461561 | 117.201248 | 11165.316129 | 1 |
| 4 | 5805734 | 5813170 | 5805734_5813170 | 3 | 08:10 | 18.023107 | 59.388875 | 18.091236 | 59.319187 | 87.216702 | 42.716742 | 18614.165038 | 3 |



----
#### (2)

Now, select the four columns that contain coordinate information (**`from_x`**, **`from_y`**, **`to_x`**, **`to_y`**), and store them back in the variable **`data`**, so that `data` contains only these four columns.

In [9]:
# RUN YOUR OWN CODE HERE
data = data[["from_x", "from_y", "to_x", "to_y"]]

Run the following code cell to test whether you have successfully replaced `data` with only the required data columns: it prints an error if you haven’t.

In [10]:
#  CODE CELL FOR TESTING YOUR SOLUTION
assert list(data.columns) == ["from_x", "from_y", "to_x", "to_y"], "Error: `data` does not (or not only) contain the four columns it should"


----

#### (3)

Finally, create two lists called **`origin_points`** and **`destination_points`** that contain `shapely.geometry.Point` objects created using the coordinates from `data`.

In particular, the origin points in `origin_points` should be based on columns `from_x` and `from_y`, and the destination points in `destination_points` on columns `to_x` and `to_y`.

There are many ways to achieve this. Two possible approaches are described below (you can implement either one of them):

##### **Approach A**

- Create two empty lists for the origin and destination points, respectively
- Use a for-loop to iterate over the rows of your dataframe:
    - For each row, create two `shapely.geometry.Point` objects based on the origin and destination coordinate columns
    - Append them to the `origin_points` and `destination_points` lists, respectively


##### **Approach B (more advanced)**

- Make use of the `.apply()` method of the `pandas.DataFrame` to operate on all rows at once (see its [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html); *HINT:* you might want to use the `axis` parameter)
- Use the `shapely.geometry.Point` constructor directly, or wrap it into a [lambda function](https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7)
- Finally, convert the output `pandas.Series` into `list`s
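
Approach B could look roughly like the following sketch. It builds a small two-row DataFrame with the same coordinate columns as `data` (the values are taken from the excerpt above; the variable names prefixed with `example_` are illustrative) and is only one of several ways to use `.apply()`.

```python
import pandas as pd
from shapely.geometry import Point

# Two illustrative rows with the same coordinate columns as `data`
example = pd.DataFrame(
    {
        "from_x": [18.0715972, 18.0658924],
        "from_y": [59.3091487, 59.3877461],
        "to_x": [18.0900702, 18.0421721],
        "to_y": [59.3274161, 59.3079419],
    }
)

# axis=1 passes each row (as a pandas.Series) to the lambda function
origin_series = example.apply(lambda row: Point(row["from_x"], row["from_y"]), axis=1)
destination_series = example.apply(lambda row: Point(row["to_x"], row["to_y"]), axis=1)

# Convert the resulting pandas.Series into plain Python lists
example_origins = origin_series.tolist()
example_destinations = destination_series.tolist()

print(example_origins[0].x, example_origins[0].y)
```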





In [11]:
# RUN YOUR OWN CODE HERE
from shapely.geometry import Point
origin_points = []
destination_points = []
for idx, row in data.iterrows():
    origin_points.append(Point(row["from_x"], row["from_y"]))
    destination_points.append(Point(row["to_x"], row["to_y"]))


**NOTE: After you have solved this problem, there might be some left-over variables around.<br />We recommend you *restart the kernel and run all cells* from the toolbar or JupyterLab’s menu.**



Use the following code cell to test whether your solution works:

In [12]:
# CODE CELL FOR TESTING YOUR SOLUTION

# This should print the coordinates of the first origin and destination points in the two lists:
print("ORIGIN X Y:", origin_points[0].x, origin_points[0].y)
print("DESTINATION X Y:", destination_points[0].x, destination_points[0].y)

# Check that you created the correct number of points:
assert len(origin_points) == len(data), "Number of origin points must be the same as number of rows in the original file"
assert len(destination_points) == len(data), "Number of destination points must be the same as number of rows in the original file"

ORIGIN X Y: 18.0715972 59.3091487
DESTINATION X Y: 18.0900702 59.3274161




### Done!

That’s it. Now you are ready to continue with Assignment 4.


----

## Assignment 4: Creating LineStrings that represent the movements

This problem continues where we left off after completing *Assignment 3*.

The task is to:

1. create a list of lines (`shapely.geometry.LineString`) between each pair of origin and destination points, and
2. calculate the overall total length of all those lines.

Store the list of lines in a variable called `lines`, and the sum of lengths in a variable called `total_length`.

Once you have working solutions for both tasks,

3. create functions for them so you can apply them to other similar data sets in the future (see instructions below).

#### (1)

To create the `shapely.geometry.LineString`s for each pair of origins and destinations, you need to loop over both lists at the same time.

Again, there are many ways to achieve this. Here are two suggestions:

- (alternative 1) Use the `zip()` function that allows you to iterate over multiple lists at the same time. See this week’s exercise hints!
- (alternative 2) Use a *for-range loop* and an index variable to access the same position in both lists
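
The second alternative might be sketched as follows. The points use made-up, easy-to-check coordinates rather than the real data, and the `example_` variable names are illustrative.

```python
from shapely.geometry import LineString, Point

# Illustrative points with easy-to-check coordinates
example_origins = [Point(0, 0), Point(1, 1)]
example_destinations = [Point(3, 4), Point(1, 2)]

example_lines = []
# The index i picks the matching origin and destination from both lists
for i in range(len(example_origins)):
    line = LineString([example_origins[i], example_destinations[i]])
    example_lines.append(line)

print(len(example_lines))
print(example_lines[0].wkt)
```

This version assumes both lists have the same length; if the destination list were shorter, the indexing would raise an `IndexError`, whereas `zip()` would silently stop at the shorter list.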


In [13]:
# RUN YOUR OWN CODE HERE
from shapely.geometry import LineString
lines = [LineString([origin, destination]) for origin, destination in zip(origin_points, destination_points)]


**NOTE: After you have solved this problem, there might be some left-over variables around.<br />We recommend you *restart the kernel and run all cells* from the toolbar or JupyterLab’s menu.**


In [14]:
# CODE CELL FOR TESTING YOUR SOLUTION

# Test that the list has the correct number of LineStrings
assert len(lines) == len(data), "There should be as many lines as there are rows in the original data"


----

#### (2)

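Create a variable called **`total_length`**, and store the total (Euclidean) length of all the origin-destination LineStrings that we just created in that variable. Because the coordinates are longitude and latitude values in decimal degrees, this length is expressed in degrees, not in metres.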

*Hint*: A simple solution is to start with a `total_length` of `0`, and add each line’s length while iterating over the list of lines.
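
The accumulation described in the hint can be sketched as below, using two made-up lines whose lengths are easy to verify by hand (the names `simple_lines` and `example_total` are illustrative).

```python
from shapely.geometry import LineString

# Illustrative lines with lengths that are easy to verify by hand:
# (0, 0) -> (3, 4) has length 5, and (1, 1) -> (1, 2) has length 1
simple_lines = [
    LineString([(0, 0), (3, 4)]),
    LineString([(1, 1), (1, 2)]),
]

example_total = 0
for line in simple_lines:
    # .length is the planar length in the units of the coordinates
    example_total += line.length

print(example_total)  # 6.0
```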


In [15]:
# RUN YOUR OWN CODE HERE
total_length = sum(line.length for line in lines)

In [21]:
# CODE CELL FOR TESTING YOUR SOLUTION
print(round(total_length,2))

12.57



----

#### (3)

Now, create functions that automate the functionality you implemented for part (1) and part (2) of this problem:

- `create_od_lines()`: accepts two `list`s of `shapely.geometry.Point`s and returns a `list` of `shapely.geometry.LineString`s
- `calculate_total_distance()`: takes a `list` of `shapely.geometry.LineString` geometries and returns their total length

You can copy and paste the code you have written earlier into the functions. Be sure to add a **docstring** to each function.
Below, you can find a code cell for testing your functions (you should get the same result as earlier).

In [22]:
# RUN YOUR OWN CODE HERE
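def create_od_lines(origin_points, destination_points):
    """Return a list of LineStrings connecting each origin to its destination."""
    return [LineString([o, d]) for o, d in zip(origin_points, destination_points)]

def calculate_total_distance(lines):
    """Return the sum of the lengths of all LineStrings in `lines`."""
    return sum(line.length for line in lines)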

In [23]:
# CODE CELL FOR TESTING YOUR SOLUTION

# Create origin-destination lines
od_lines = create_od_lines(origin_points, destination_points)

# Calculate the total distance
tot_dist = calculate_total_distance(od_lines)

print("Total distance", round(tot_dist,2))
assert tot_dist == total_length

Total distance 12.57


## Well done!

## Sources

This lesson is inspired by the [Programming in Python lessons](http://swcarpentry.github.io/python-novice-inflammation/) from the [Software Carpentry organization](http://software-carpentry.org) and has adapted or reused material from the University of Helsinki’s [Automating GIS processes course](https://autogis-site.readthedocs.io/en/latest/course-info/license.html) under a [Creative Commons Attribution-ShareAlike 4.0 International licence](https://creativecommons.org/licenses/by-sa/4.0/deed.en).
