## Problem 3: How long distance individuals have travelled? (8 points)

In this problem the aim is to calculate the distance in meters that the individuals have travelled according the social media posts (Euclidean distances between points). In this problem, we will need the `userid` -column an the points created in the previous problem. You will need the shapefile `Kruger_posts.shp` generated in Problem 2 as input file.

Our goal is to answer these questions based on the input data:

 - What was the shortest distance travelled in meters?
 - What was the mean distance travelled in meters?
 - What was the maximum distance travelled in meters?

**In your code, you should first:**
 - Import required modules
 - Read in the shapefile as a geodataframe called `data`

In [4]:
# YOUR CODE HERE
import geopandas as gpd
import math 
import os


try: 
    outfolder = r"C:\Users\paul-\Desktop\AutoGIS\Lesson2\Exercise\data"
    filename = "Kruger_posts.shp"
    fp = os.path.join(outfolder, filename)
    data = gpd.read_file(fp)
    print(data.crs)
    
except:
    raise NotImplementedError()

epsg:4326


 - Check the crs of the input data. If this information is missing, set it as epsg:4326 (WGS84).
 - Reproject the data from WGS84 to `EPSG:32735` -projection which stands for UTM Zone 35S (UTM zone for South Africa) to transform the data into metric system. (don't create a new variable, update the existing variable `data`!)

In [5]:
# YOUR CODE HERE
from pyproj import CRS

try:
    data_wgs84 = data.copy()
    
    data_utm35s = data.to_crs(epsg=32735)
    data = data_utm35s
    print(data.crs)
    
except:
    raise NotImplementedError()

epsg:32735


In [6]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
print(data.head())

         lat        lon         timestamp    userid  \
0 -24.980792  31.484633  2015-07-07 03:02  66487960   
1 -25.499225  31.508906  2015-07-07 03:18  65281761   
2 -24.342578  30.930866  2015-03-07 03:38  90916112   
3 -24.854614  31.519718  2015-10-07 05:04  37959089   
4 -24.921069  31.520836  2015-10-07 05:19  27793716   

                            geometry  
0  POINT (-4695752.719 14973674.275)  
1  POINT (-4748939.258 15014098.837)  
2  POINT (-4672729.591 14859391.193)  
3  POINT (-4679391.656 14969037.444)  
4  POINT (-4686373.982 14973910.589)  


In [7]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
# Check that the crs is correct after re-projecting (should be epsg:32735)
print(data.crs)

epsg:32735


 - Group the data by userid

In [41]:
# YOUR CODE HERE
from shapely.geometry import Point, LineString
try:
    grouped = data.groupby("userid")
    group1 = grouped.get_group(37959089 )
    group1_sort = group1.sort_values(by=['timestamp'])
    print(group1_sort)
    line = LineString(list(group1_sort['geometry']))
    print(line.bounds)
except:
    raise NotImplementedError()

             lat        lon         timestamp    userid  \
11543 -24.920770  31.522062  2015-01-04 13:18  37959089   
19774 -24.921250  31.520842  2015-01-09 03:37  37959089   
12343 -24.921069  31.520836  2015-01-12 03:11  37959089   
12639 -24.920897  31.521201  2015-01-15 02:20  37959089   
26746 -24.946092  31.469872  2015-01-15 14:54  37959089   
...          ...        ...               ...       ...   
31763 -24.897742  31.509882  2015-12-14 10:59  37959089   
12583 -24.920902  31.521211  2015-12-14 12:08  37959089   
21069 -24.921016  31.521449  2015-12-17 13:13  37959089   
27548 -24.918002  31.523565  2015-12-21 05:01  37959089   
14373 -25.210852  31.035177  2015-12-28 14:11  37959089   

                                geometry  
11543  POINT (-4686240.069 14974041.654)  
19774  POINT (-4686392.690 14973924.281)  
12343  POINT (-4686373.982 14973910.589)  
12639  POINT (-4686325.270 14973943.802)  
26746  POINT (-4693284.175 14969359.364)  
...                              

In [9]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
assert len(grouped.groups) == data["userid"].nunique(), "Number of groups should match number of unique users!"

**Then:**
- Create an empty GeoDataFrame called `movements`
- Create a for-loop where you iterate over the grouped object. For each user's data: 
    - [sort](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html) the rows by timestamp 
    - create a LineString object based on the user's points
    - add the geometry and the userid into the `movements` dataframe (one userid per row). You can achieve this either by using the `.at` indexer, or the `append` method. See hints for more help.
- Set the CRS of the ``movements`` GeoDataFrame as ``EPSG:32735`` 

In [46]:
# YOUR CODE HERE
from shapely.geometry import Point, LineString

d = {'geometry':[], 'userid':[]}
movements = gpd.GeoDataFrame(data=d)

for key, group in grouped:
    if len(group) >= 2:
        group.sort_values(by=['timestamp'])
        line = LineString(list(group['geometry']))
        movements.at[key,'userid'] = int(key)
        movements.at[key,'geometry'] = line


    

In [48]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
movements.head()

Unnamed: 0,geometry,userid
16301,"LINESTRING (-4684246.015 14939886.378, -468155...",16301.0
45136,"LINESTRING (-4770692.230 14940874.449, -477069...",45136.0
50136,"LINESTRING (-4687987.866 14987928.782, -468073...",50136.0
88775,"LINESTRING (-4773713.345 14938272.132, -477371...",88775.0
88918,"LINESTRING (-4699374.159 14988142.858, -468798...",88918.0


**Finally:**
- Check once the crs definition of your dataframe (should be epsg:32735, define the correct crs if this information is missing)
- Calculate the lenghts of the lines into a new column called ``distance`` in ``movements`` GeoDataFrame.

In [54]:
# YOUR CODE HERE
from pyproj import CRS

try:
    movements.set_crs(epsg=32735, inplace=True)
    print(movements.crs)
except:
    raise NotImplementedError()

epsg:32735


In [55]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
movements.head()

Unnamed: 0,geometry,userid
16301,"LINESTRING (-4684246.015 14939886.378, -468155...",16301.0
45136,"LINESTRING (-4770692.230 14940874.449, -477069...",45136.0
50136,"LINESTRING (-4687987.866 14987928.782, -468073...",50136.0
88775,"LINESTRING (-4773713.345 14938272.132, -477371...",88775.0
88918,"LINESTRING (-4699374.159 14988142.858, -468798...",88918.0


You should now be able to print answers to the following questions: 

 - What was the shortest distance travelled in meters?
 - What was the mean distance travelled in meters?
 - What was the maximum distance travelled in meters?

In [85]:
movements["line_length"] = movements['geometry'].length

max_length = movements['line_length'].max()
mean_length = movements['line_length'].mean()
min_length = movements['line_length'].min()

print("Max area: {maximum} meters".format(maximum=round(max_length, 0)))
print("Min area: {minimum} meters".format(minimum=round(min_length, 0)))
print("Mean area: {mean} meters".format(mean=round(mean_length, 0)))

Max area: 5486426.0 meters
Min area: 0.0 meters
Mean area: 90257.0 meters


- Finally, save the movements of into a Shapefile called ``some_movements.shp``

In [90]:
# YOUR CODE HERE

try:
    out_folder = r"C:\Users\paul-\Desktop\AutoGIS\Lesson2\Exercise\data"
    fn = "some_movements.shp"
    output_fp = os.path.join(out_folder, fn)
    movements.to_file(output_fp)
except:
    raise NotImplementedError()

In [91]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
import os
assert os.path.isfile(fp), "output shapefile does not exits"

That's all for this week!