## Problem 3: How far have individuals travelled? (8 points)

In this problem the aim is to calculate the distance in meters that the individuals have travelled according to their social
media posts (Euclidean distances between points).
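As a reference for what "Euclidean distances between points" means here, the sketch below sums the straight-line segment lengths over an ordered list of planar coordinates. The helper name `path_length` and the sample coordinates are illustrative assumptions, not part of the exercise, and the coordinates are assumed to be in a metric projection already.

```python
import math


def path_length(coords):
    """Sum of straight-line distances between consecutive (x, y) pairs."""
    return math.fsum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )


# Hypothetical positions in meters, already ordered by time
route = [(0.0, 0.0), (3.0, 4.0), (3.0, 8.0)]
print(path_length(route))          # 9.0
print(path_length([(5.0, 5.0)]))   # 0.0, a single point has no segments
```

This is the same quantity that the `length` attribute of a shapely `LineString` reports for planar coordinates.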

We use the shapefile "Kruger_posts.shp" generated in Problem 2 as input.

**In your code, you should first:**
 - Read in the shapefile as a GeoDataFrame
 - Reproject the data from WGS84 into `EPSG:32735`, which stands for UTM Zone 35S (UTM zone for South Africa), so that the coordinates are in meters
 - Group the data by userid


In [28]:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, LineString, Polygon


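fp = "Kruger_posts.shp"
data = gpd.read_file(fp)
#- read_file already returns a GeoDataFrame. Passing crs= here only assigns the
#- EPSG:32735 label; it does not transform the coordinates (that requires to_crs).
data = gpd.GeoDataFrame(data, geometry='geometry', crs={'init': 'epsg:32735'})
data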

Unnamed: 0,lat,lon,timestamp,userid,geometry
0,-24.980792,31.484633,2015-07-07 03:02,66487960,POINT (31.484633302 -24.980792492)
1,-25.499225,31.508906,2015-07-07 03:18,65281761,POINT (31.508905612 -25.499224667)
2,-24.342578,30.930866,2015-03-07 03:38,90916112,POINT (30.930866066 -24.342578456)
3,-24.854614,31.519718,2015-10-07 05:04,37959089,POINT (31.519718439 -24.85461393)
4,-24.921069,31.520836,2015-10-07 05:19,27793716,POINT (31.520835558 -24.921068894)
5,-24.047833,31.687000,2015-05-07 05:55,88751696,POINT (31.687 -24.047833333)
6,-24.044000,31.687000,2015-03-07 05:57,88751696,POINT (31.687 -24.044)
7,-25.498371,30.983844,2015-09-07 06:21,52431146,POINT (30.983844086 -25.498371054)
8,-24.342279,30.990728,2015-07-07 07:15,11505530,POINT (30.990728478 -24.342279492)
9,-25.413386,31.159930,2015-11-07 07:17,24990235,POINT (31.15993 -25.4133857)


In [29]:
# Check that the crs is correct (should be epsg:32735)
print(data.crs)


{'init': 'epsg:32735'}
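Assigning a CRS when constructing a GeoDataFrame only labels the coordinates; it does not change them. A printed CRS of `epsg:32735` therefore does not by itself show that the coordinates are in meters. The following is a minimal sketch of an actual reprojection with `to_crs`, assuming the input is stored in WGS84 (`EPSG:4326`). The two sample points are the posts of user 88751696 from the table above; the variable names are illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two posts by the same user (lon, lat in degrees)
posts = gpd.GeoDataFrame(
    {"userid": [88751696, 88751696]},
    geometry=[Point(31.687, -24.047833333), Point(31.687, -24.044)],
    crs="EPSG:4326",  # assumption: the shapefile stores WGS84 coordinates
)

# to_crs() recomputes every coordinate in the target projection
posts_utm = posts.to_crs(epsg=32735)

print(posts_utm.crs.to_epsg())      # 32735
print(posts_utm.geometry.iloc[0])   # coordinates are now in meters, not degrees

# Straight-line distance between the two posts, in meters
step = posts_utm.geometry.iloc[0].distance(posts_utm.geometry.iloc[1])
print(round(step))
```

After a real reprojection the geometry column shows large easting and northing values instead of longitude and latitude pairs, which is a quick visual check.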


**Then:**
- Create an empty GeoDataFrame called `movements`
- For each user: 
    - [sort](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html) the rows by timestamp 
    - create a LineString object based on the user's points
    - [add](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html) the geometry and the userid into the GeoDataFrame you created in the last step
- Set the CRS of the ``movements`` GeoDataFrame as ``EPSG:32735`` (epsg code: 32735)
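The steps above can be sketched with `groupby`, as one possible approach. The sample data, column names `x` and `y`, and the variable names are hypothetical, and the coordinates are assumed to be in meters already. Rows are collected in a list and turned into a GeoDataFrame at the end, because `DataFrame.append` was removed in pandas 2.0.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString

# Hypothetical posts with coordinates already in meters (not the Kruger data)
posts = pd.DataFrame({
    "userid": [1, 1, 1, 2],
    "timestamp": ["2015-01-02 10:00", "2015-01-01 09:00",
                  "2015-01-03 08:00", "2015-01-01 12:00"],
    "x": [3.0, 0.0, 3.0, 10.0],
    "y": [4.0, 0.0, 8.0, 10.0],
})

rows = []
for userid, group in posts.groupby("userid"):
    # Order each user's posts in time before connecting them
    ordered = group.sort_values("timestamp")
    coords = list(zip(ordered["x"], ordered["y"]))
    if len(coords) < 2:
        # A LineString needs two points; repeat a lone point to get a zero-length line
        coords = coords * 2
    rows.append({"userid": userid, "geometry": LineString(coords)})

# Build the GeoDataFrame once and set the CRS at the same time
movements = gpd.GeoDataFrame(rows, geometry="geometry", crs="EPSG:32735")
print(movements)
```

Sorting inside the loop keeps the ordering logic next to the line construction, so the result does not depend on how the input was sorted beforehand.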

In [104]:
movements = gpd.GeoDataFrame()
data = data.sort_values(by=['userid','timestamp'])
data

Unnamed: 0,lat,lon,timestamp,userid,geometry
30535,-24.759508,31.371200,2015-02-08 06:18,16301,POINT (31.3712 -24.759508333)
30770,-24.749845,31.338317,2015-02-09 08:09,16301,POINT (31.338316667 -24.749845)
38235,-24.995803,31.592000,2015-03-13 10:59,16301,POINT (31.592 -24.995803333)
38232,-24.791483,31.865172,2015-05-13 10:51,16301,POINT (31.865171667 -24.791483333)
30512,-24.760170,31.339430,2015-06-08 04:34,16301,POINT (31.33943 -24.76017)
38909,-25.102336,31.894695,2015-08-16 14:27,16301,POINT (31.894695 -25.102336167)
30545,-24.774158,31.380342,2015-09-08 06:58,16301,POINT (31.380341667 -24.774158333)
38911,-24.985142,31.625662,2015-09-16 14:30,16301,POINT (31.625661667 -24.985141667)
38913,-25.122811,31.911867,2015-11-16 14:31,16301,POINT (31.911866667 -25.122811167)
61781,-25.493720,30.985914,2015-02-25 01:54,26589,POINT (30.985913773 -25.493720243)


In [138]:
userids = data.userid.unique()
#- Array of unique userid values (data is already sorted by userid and timestamp)

lines = []

for ID in userids:
    points = tuple(data.loc[data['userid'] == ID, 'geometry'])
    #- Tuple of this user's points in timestamp order
    if len(points) < 2:
        #- LineString requires at least two points, so a user with a single post
        #- gets THE SAME point twice. The resulting zero-length line lets us treat
        #- the movements dataframe with consistency.
        points = points * 2
    lines.append(LineString(points))
    #- Transforms the tuple of points into a LineString object and appends it to a list of lines

movements['userids'] = userids
movements['geometry'] = lines

**Finally:**
- Calculate the lengths of the lines into a new column called ``distance`` in ``movements`` GeoDataFrame.
- Save the movements into a Shapefile called ``Some_movements.shp``
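A minimal sketch of these two steps, using hypothetical lines in a metric CRS rather than the Kruger data. The file is written to a temporary folder here so the example has no side effects; in the exercise the path would simply be `Some_movements.shp`.

```python
import os
import tempfile

import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical movement lines in a metric CRS (not the Kruger data)
movements = gpd.GeoDataFrame(
    {"userid": [1, 2]},
    geometry=[LineString([(0, 0), (3, 4), (3, 8)]),
              LineString([(10, 10), (10, 12)])],
    crs="EPSG:32735",
)

# .length is measured in CRS units, so meters for EPSG:32735
movements["distance"] = movements.geometry.length

# Temporary output location; the exercise asks for Some_movements.shp
out_fp = os.path.join(tempfile.mkdtemp(), "Some_movements.shp")
movements.to_file(out_fp)

# Summary statistics straight from the new column
print(movements["distance"].min(),
      movements["distance"].mean(),
      movements["distance"].max())
```

Storing the lengths in a column means the minimum, mean and maximum can be read with the usual pandas methods instead of recomputing the lengths each time.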

In [139]:
movements = gpd.GeoDataFrame(movements, geometry='geometry', crs={'init': 'epsg:32735'})

In [140]:
print(movements.crs)
movements

{'init': 'epsg:32735'}


Unnamed: 0,userids,geometry
0,16301,"LINESTRING (31.3712 -24.759508333, 31.33831666..."
1,26589,"LINESTRING (30.985913773 -25.493720243, 30.985..."
2,29322,"LINESTRING (31.726971969 -25.493266016, 31.726..."
3,42181,"LINESTRING (31.080977814 -25.30595914, 31.0809..."
4,45136,"LINESTRING (31.025820481 -25.321310937, 31.025..."
5,48971,"LINESTRING (30.95611 -25.47135, 30.95611 -25.4..."
6,50136,"LINESTRING (31.394466839 -24.769847681, 31.592..."
7,50530,"LINESTRING (31.307684567 -24.275886434, 31.307..."
8,66129,"LINESTRING (30.892143333 -22.771206667, 30.892..."
9,74329,"LINESTRING (31.428991822 -24.786132156, 31.428..."


At the end you should be able to print answers to the following questions:

 - What was the shortest distance travelled in meters?
 - What was the mean distance travelled in meters?
 - What was the maximum distance travelled in meters?

In [122]:
# Print solutions:


In [148]:
import numpy as np

#- .length is reported in the units of the geometry coordinates
lengths = movements['geometry'].length

minimumDist = min(lengths)
meanDist = np.mean(lengths)
maximumDist = max(lengths)

print("The minimum distance is %s meters, the average distance is %s meters and the maximum distance is %s meters." %(minimumDist,meanDist,maximumDist))

The minimum distance is 0.0 meters, the average distance is 0.6048310274269603 meters and the maximum distance is 63.947497652366735 meters.
