# Assignment 1   
#### Student: Bakhtiyar Garashov

**Working with Geometric Objects**

In this lesson we will practice creating geometric objects with the
Shapely module and reading useful attributes from those geometries. We
will also build on what we learned earlier, specifically functions,
which you should use to make common GIS operations easier to reuse in
the future. We will also use Pandas to read data from a file.

Complete this notebook and submit your completed functional notebook in Moodle.

- Don't forget to check out the hints for this lesson's assignment at the end if you're having trouble.
- Scores on this exercise are out of **10 points**.

## Sections

- Problem 1: Creating basic geometries
- Problem 2: Attributes of geometries
- Problem 3: Reading coordinates from a file and creating geometries
- Problem 4 (optional): Creating LineStrings that represent the movements


### Problem 1: Creating basic geometries (3 Points)

1. Create a function called `createPointGeom()` that has two parameters
(x_coord, y_coord). The function should create a Shapely Point geometry
object and return it. Demonstrate the usage of the function by
creating 3 different Point objects with the function.

2. Create a function called `createLineGeom()` that takes a list of
Shapely Point objects as a parameter and returns a LineString object of
those input points. Ideally, the function should check that the
input list really contains Shapely Points. Demonstrate the usage of
the function by creating 2 different LineString objects with the function (one
with coordinate tuples, and one with the list of Shapely Points from above).

3. Create a function called `createPolyGeom()` that takes a list of
coordinate tuples **OR** a list of Shapely Point objects and
creates and returns a Polygon object from the input data. Both ways of
passing the data to the function should work. Demonstrate the usage of the
function by passing data first as coordinate tuples and then as
Point objects.

In [1]:
# importing all of the necessary objects and libraries
from shapely.geometry import Point, LineString, Polygon
import pandas as pd
import pyproj

In [2]:
# createPointGeom()

# function declaration
def createPointGeom(x_coord,y_coord):
    point=Point(x_coord,y_coord)
    return point

# invoke functions and store results as variables
point1=createPointGeom(2,4.2)
point2=createPointGeom(7.2, -25.1)
point3=createPointGeom(9.26, -2.456)


#print out results
print(point1)
print(point2)
print(point3)


POINT (2 4.2)
POINT (7.2 -25.1)
POINT (9.26 -2.456)


In [3]:
# createLineGeom()

# function declaration
def createLineGeom(points):
    for point in points:
        # accept Shapely Points as well as plain coordinate tuples
        if not isinstance(point, (Point, tuple)):
            # raise stops the function right away, so no extra return is needed
            raise TypeError("{} is not a correct input".format(point))
    line = LineString(points) # if the input is correct, create the LineString object
    return line # and return it

# invoke function and store result as variable
line1=createLineGeom([(1.8,4.6),(4.3,7.9),(0.5,9.6)])
line2=createLineGeom([point1,point2,point3])


#print result of function
print(line1)
print(line2)


LINESTRING (1.8 4.6, 4.3 7.9, 0.5 9.6)
LINESTRING (2 4.2, 7.2 -25.1, 9.26 -2.456)


In [4]:
# createPolyGeom()

# function declaration
def createPolyGeom(points):
    for point in points:
        # each element must be a Shapely Point or a coordinate tuple
        if not isinstance(point, (Point, tuple)):
            # raise stops the function right away, so no extra return is needed
            raise TypeError("{} is not a correct input".format(point))
    polygon=Polygon(points) # if the input is correct, create the Polygon object
    return polygon # and return it

# function invocation
polygon1=createPolyGeom([point1,point2,point3])
polygon2=createPolyGeom([(0,0),(0,1),(1,1),(1,0)])

#print result of function
print(polygon1)
print(polygon2)


POLYGON ((2 4.2, 7.2 -25.1, 9.26 -2.456, 2 4.2))
POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))


### Problem 2: Attributes of geometries (3 Points)

1.  Create a function called `getCentroid()` that takes any kind of
    Shapely geometric object as input and returns the centroid of that
    geometry. Demonstrate the usage of the function.
2.  Create a function called `getArea()` that takes a Shapely Polygon
    object as input and returns the area of that geometry. Demonstrate
    the usage of the function.
3.  Create a function called `getLength()` that takes either a Shapely
    LineString or Polygon object as input. The function should check the
    type of the input and return the length of the line if the input is a
    LineString, or the length of the exterior ring if the input is a
    Polygon. If something else is passed to the function, it should tell
    the user: `"Error: LineString or Polygon geometries required!"`.
    Demonstrate the usage of the function.

In [5]:
# getCentroid()

# function declaration
def getCentroid(geom_object):
    if hasattr(geom_object,'geom_type'): # a 'geom_type' attribute is taken as the sign of a Shapely geometry
        object_centroid=geom_object.centroid
        return object_centroid
    else:
        raise TypeError("{} is not a geometric object".format(geom_object)) # otherwise, raise an exception with a message

# invoke the function
centroid_point=getCentroid(point1)
centroid_line=getCentroid(line1)
centroid_poly=getCentroid(polygon1)

# printing result
print(centroid_point)
print(centroid_line)
print(centroid_poly)

POINT (2 4.2)
POINT (2.724104286785056 7.503445050826704)
POINT (6.153333333333334 -7.785333333333334)


In [6]:
# getArea()

# function declaration
def getArea(geom_object):
    if hasattr(geom_object,'geom_type'):
        if geom_object.geom_type=='Polygon': # validate that the object is a polygon
            polygon_area=geom_object.area
            return polygon_area
        else:
            raise TypeError("Input object is not a polygon") # if not, raise an exception
    else:
        raise TypeError("Input object is not a geometric object") # if the input is not a geometry at all, raise another exception
    

# function invocation
area1=getArea(polygon1)
area2=getArea(polygon2)

# printing result
print(area1)
print(area2)

89.0534
1.0


In [7]:
# getLength()

# function declaration
def getLength(geom_object):
    if hasattr(geom_object,'geom_type'):
        if geom_object.geom_type=="LineString": # a LineString has a length of its own
            return geom_object.length
        elif geom_object.geom_type=="Polygon": # polygon.exterior.length returns the length of the exterior ring
            return geom_object.exterior.length
        else:
            raise TypeError("Error: LineString or Polygon geometries required!") # any other geometry type: inform the user
    else:
        raise TypeError("Input object is not a geometric object") # the input is not a geometry at all



# function invocation
length_line=getLength(line1)
length_poly=getLength(polygon1)

#print the results
print(length_line)
print(length_poly)

8.30297996819662
62.34472776867281


### Problem 3: Reading coordinates from a file and creating the geometries (4 Points)

One of the "classical" problems in GIS is the situation where you have a
set of coordinates in a file and you need to get them onto a map (or
into GIS software). Python is a really handy tool for solving this
problem, because it can read data from practically any kind of input
file (such as CSV, TXT, Excel, or GPX files with GPS data) and from
various databases. So far, I haven't come across any kind of data or
file that would be impossible to read with Python.

Let's see how we can read data from a file and create Point objects
from it that can be saved, e.g., as a new Shapefile (we will
learn this in the next lesson). Our dataset **[Years.2015-2017.ibtracs_wmo.storms.csv](https://moodle.ut.ee/pluginfile.php/1615808/mod_assign/introattachment/0/Years.2015-2017.ibtracs_wmo.storms.csv?forcedownload=1)** consists of the tracked paths of tropical storms, hurricanes, etc. in the
years 2015-2017. The first four lines of the file (a header plus three storms) look like this:

    Name,Serial_Num,year,Basin,Sub_basin,Num,Latitude_first,Longitude_first,Latitude_last,Longitude_last,ISO_time_first,ISO_time_last,Nature
    ADJALI,2014319S06066,2015, SI, MM,1,-6.7,66.4,-11.9,51.4,2014-11-15 06:00:00,2014-11-24 06:00:00, TS
    0220142015:TWO,2014327S08077,2015, SI, MM,2,-8.0,77.3,-28.9,62.5,2014-11-23 06:00:00,2014-12-02 00:00:00, TS
    KATE,2014356S08101,2015, SI, WA,4,-7.5,100.5,-30.0,89.6,2014-12-21 15:00:00,2015-01-04 12:00:00, NR

Thus, we have many columns of data, but the few important ones are:

| Column           | Description                                            |
| ---------------- | ------------------------------------------------------ |
| Longitude_first | Longitude-coordinate of the **first** time of tracking |
| Latitude_first  | Latitude-coordinate of the **first** time of tracking  |
| Longitude_last  | Longitude-coordinate of the **last** time of tracking  |
| Latitude_last   | Latitude-coordinate of the **last** time of tracking   |

**Tasks**

1.  Save the `Years.2015-2017.ibtracs_wmo.storms.csv` file to your
    computer.
2.  We will use only 4 columns from the data, i.e. 'Longitude_first' (x),
    'Latitude_first' (y), 'Longitude_last', and 'Latitude_last'.
3.  Iterate over the rows of your DataFrame and create Shapely Point
    objects for `orig_points` and `dest_points`, representing the origin
    and destination locations of the storm path in each row. To do so,
    create two additional columns called `orig_points` and `dest_points`
    by applying a function that creates Shapely points from the
    coordinates. Think through how to build a step-by-step approach
    based on the lecture and the hints provided.

In [8]:
# load csv into pandas dataframe ... etc

# declare a function that creates point geom objects from appropriate columns
def make_point(row,longitude,latitude):
    return Point(row[longitude], row[latitude])

# reading data from csv and making pandas dataframe object
df = pd.read_csv('Years.2015-2017.ibtracs_wmo.storms.csv', header=0, sep=',', encoding='latin1')

# apply make_point function to make origin and destination points
df['Origin']=df.apply(make_point, axis=1,args=('Longitude_first','Latitude_first')) # args used to pass extra arguments to function
df['Destination']=df.apply(make_point, axis=1,args=('Longitude_last','Latitude_last'))

df


Unnamed: 0,Name,Serial_Num,year,Basin,Sub_basin,Num,Latitude_first,Longitude_first,Latitude_last,Longitude_last,ISO_time_first,ISO_time_last,Nature,Origin,Destination
0,ADJALI,2014319S06066,2015,SI,MM,1,-6.7,66.4,-11.9,51.4,2014-11-15 06:00:00,2014-11-24 06:00:00,TS,POINT (66.40000000000001 -6.7),POINT (51.4 -11.9)
1,0220142015:TWO,2014327S08077,2015,SI,MM,2,-8.0,77.3,-28.9,62.5,2014-11-23 06:00:00,2014-12-02 00:00:00,TS,POINT (77.3 -8),POINT (62.5 -28.9)
2,KATE,2014356S08101,2015,SI,WA,4,-7.5,100.5,-30.0,89.6,2014-12-21 15:00:00,2015-01-04 12:00:00,NR,POINT (100.5 -7.5),POINT (89.59999999999999 -30)
3,BANSI,2015009S19054,2015,SI,MM,5,-18.9,53.9,-30.1,88.0,2015-01-08 12:00:00,2015-01-19 00:00:00,TS,POINT (53.9 -18.9),POINT (88 -30.1)
4,MEKKHALA,2015012N09146,2015,WP,MM,1,8.7,142.9,17.3,125.2,2015-01-13 00:00:00,2015-01-20 18:00:00,TS,POINT (142.9 8.699999999999999),POINT (125.2 17.3)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
196,ERNIE,2017095S11113,2017,SI,WA,4,-10.9,113.2,-18.6,103.1,2017-04-05 00:00:00,2017-04-10 06:00:00,NR,POINT (113.2 -10.9),POINT (103.1 -18.6)
197,NONAME,2017096S08135,2017,SP,EA,3,-7.6,135.3,-15.8,115.1,2017-04-06 00:00:00,2017-04-16 00:00:00,NR,POINT (135.3 -7.6),POINT (115.1 -15.8)
198,MUIFA,2017113N09144,2017,WP,MM,1,8.6,143.9,22.7,141.8,2017-04-22 18:00:00,2017-04-29 06:00:00,TS,POINT (143.9 8.6),POINT (141.8 22.7)
199,FRANCES,2017114S08137,2017,SP,EA,4,-7.8,136.5,-13.4,122.2,2017-04-24 00:00:00,2017-04-30 00:00:00,NR,POINT (136.5 -7.8),POINT (122.2 -13.4)


### Problem 4: Creating LineStrings that represent the movements (optional task for advanced students, additional max 3 points)

This is an optional extra task for those who like to learn even more.

1.  Create an additional column called `lines`: iterate over the
    dataframe again, row by row, and use the origin and destination
    fields from above to create a Shapely LineString object between
    the origin and destination points, then add it as a new column to
    your dataframe.
2.  Find the mean distance in km of all the origin-destination
    LineStrings that we just created, and print it; see
    [Pandas calculate mean for a dataframe.](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mean.html)

Consider: so far we have only worked with Lat/Lon geographic coordinates, which are expressed in degrees. We want to calculate distances in km (see Lesson 1: https://kodu.ut.ee/~kmoch/geopython2020/L1/Geometric-Objects.html#side-note-on-distances-in-gis)

Consider: to make things more reusable, put the creation of the
LineString and the calculation of the distances into dedicated functions
and use them.
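
To illustrate why degrees cannot simply be read as kilometres, here is a minimal sketch of a great-circle distance using the haversine formula. It is not the method used in the solution cell that follows (which relies on pyproj's `Geod` and the WGS84 ellipsoid); it assumes a perfectly spherical Earth with a mean radius of 6371.0088 km, and the function name `haversine_km` is made up for this example. The result should be close to an ellipsoidal calculation, but not identical.

```python
import math

# Assumption: spherical Earth with the mean radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    # Trigonometric functions expect radians, not degrees.
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# First and last tracked position of the first storm in the dataset (ADJALI).
distance = haversine_km(66.4, -6.7, 51.4, -11.9)
print("Approximate distance in km:", round(distance, 1))
```

Note the argument order (longitude first, then latitude), which matches the x/y order used for the Shapely points in this notebook.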

In [9]:
# declare a function that makes linestring from origin and destination points

geod = pyproj.Geod(ellps='WGS84')

def make_line(row):
    return LineString([row['Origin'],row['Destination']])
   

def calculate_distance(row):
    x1=row['Origin'].x
    y1=row['Origin'].y
    x2=row['Destination'].x
    y2=row['Destination'].y
    
    # geod.inv returns the forward azimuth, the back azimuth and the distance in meters
    fwd_azimuth,back_azimuth,distance = geod.inv(x1, y1, x2, y2)
    return distance/1000

# here I have found another way of calculating distance. 
# For more info, see: https://pyproj4.github.io/pyproj/dev/api/geod.html#pyproj.Geod.geometry_length
def calc_distance(row):
    distance_meters=geod.geometry_length(row['Lines'])
    return distance_meters/1000
        
df['Lines']=df.apply(make_line,axis=1)
df["Distance(km)"]=df.apply(calculate_distance,axis=1) # Replace calculate_distance with calc_distance to try the other function.
                                                       # Both produce the same result.

mean_distance=df["Distance(km)"].mean()


print("Mean value of all distances is",mean_distance)

df

Mean value of all distances is 2406.354962892864


Unnamed: 0,Name,Serial_Num,year,Basin,Sub_basin,Num,Latitude_first,Longitude_first,Latitude_last,Longitude_last,ISO_time_first,ISO_time_last,Nature,Origin,Destination,Lines,Distance(km)
0,ADJALI,2014319S06066,2015,SI,MM,1,-6.7,66.4,-11.9,51.4,2014-11-15 06:00:00,2014-11-24 06:00:00,TS,POINT (66.40000000000001 -6.7),POINT (51.4 -11.9),"LINESTRING (66.40000000000001 -6.7, 51.4 -11.9)",1744.774567
1,0220142015:TWO,2014327S08077,2015,SI,MM,2,-8.0,77.3,-28.9,62.5,2014-11-23 06:00:00,2014-12-02 00:00:00,TS,POINT (77.3 -8),POINT (62.5 -28.9),"LINESTRING (77.3 -8, 62.5 -28.9)",2785.486904
2,KATE,2014356S08101,2015,SI,WA,4,-7.5,100.5,-30.0,89.6,2014-12-21 15:00:00,2015-01-04 12:00:00,NR,POINT (100.5 -7.5),POINT (89.59999999999999 -30),"LINESTRING (100.5 -7.5, 89.59999999999999 -30)",2738.911299
3,BANSI,2015009S19054,2015,SI,MM,5,-18.9,53.9,-30.1,88.0,2015-01-08 12:00:00,2015-01-19 00:00:00,TS,POINT (53.9 -18.9),POINT (88 -30.1),"LINESTRING (53.9 -18.9, 88 -30.1)",3655.110870
4,MEKKHALA,2015012N09146,2015,WP,MM,1,8.7,142.9,17.3,125.2,2015-01-13 00:00:00,2015-01-20 18:00:00,TS,POINT (142.9 8.699999999999999),POINT (125.2 17.3),"LINESTRING (142.9 8.699999999999999, 125.2 17.3)",2140.751463
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
196,ERNIE,2017095S11113,2017,SI,WA,4,-10.9,113.2,-18.6,103.1,2017-04-05 00:00:00,2017-04-10 06:00:00,NR,POINT (113.2 -10.9),POINT (103.1 -18.6),"LINESTRING (113.2 -10.9, 103.1 -18.6)",1380.659898
197,NONAME,2017096S08135,2017,SP,EA,3,-7.6,135.3,-15.8,115.1,2017-04-06 00:00:00,2017-04-16 00:00:00,NR,POINT (135.3 -7.6),POINT (115.1 -15.8),"LINESTRING (135.3 -7.6, 115.1 -15.8)",2379.339274
198,MUIFA,2017113N09144,2017,WP,MM,1,8.6,143.9,22.7,141.8,2017-04-22 18:00:00,2017-04-29 06:00:00,TS,POINT (143.9 8.6),POINT (141.8 22.7),"LINESTRING (143.9 8.6, 141.8 22.7)",1576.366714
199,FRANCES,2017114S08137,2017,SP,EA,4,-7.8,136.5,-13.4,122.2,2017-04-24 00:00:00,2017-04-30 00:00:00,NR,POINT (136.5 -7.8),POINT (122.2 -13.4),"LINESTRING (136.5 -7.8, 122.2 -13.4)",1682.251294


# Assignment 1 hints

Add below some tips for working on Exercise 1 if needed.

## Checking what type an object is

The `isinstance(actual_object, variable_type)` function is a Python
built-in function that tells you whether `actual_object`
is of the type `variable_type`. Basic types are for example:

  - str: String
  - int: Integer number
  - float: Floating point numbers
  - list: List of things, written with square brackets `[ ]`
  - dict: Dictionaries, Python's versatile data structure of key-value
    pairs, in which you address the values via named fields
    (see [Python recap lecture](https://kodu.ut.ee/~kmoch/geopython2020/L0/recap-python.html) )

An example:

```python

    a_string_var = "I am a string"
    an_int_var = 42
    a_float_var = 3.5
    a_boolean_true_false_var = True

    is_string = isinstance(a_string_var, str)
    print(is_string)
    is_int = isinstance(an_int_var, int)
    print(is_int)
    is_float = isinstance(a_float_var, float)
    print(is_float)
    true_or_false = isinstance(a_float_var, str)
    print(true_or_false)

```
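
`isinstance` also accepts a tuple of types, which is handy when several kinds of input are acceptable. The sketch below checks every element of a list in one go; the helper name `all_of_type` is made up for this example and is not part of Python or Shapely.

```python
def all_of_type(items, allowed_types):
    """Return True if every element of items is an instance of allowed_types."""
    # allowed_types may be a single type or a tuple of types
    return all(isinstance(item, allowed_types) for item in items)


coords = [(1.8, 4.6), (4.3, 7.9), (0.5, 9.6)]
print(all_of_type(coords, tuple))                # every element is a tuple
print(all_of_type([1, 2.5, "3"], (int, float)))  # the string "3" does not match
```

The same pattern works with Shapely classes, for example by passing `Point` (or a tuple such as `(Point, tuple)`) as `allowed_types`.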

## Control flow for checks with `if` and `else`

If you want to make a "left" or "right" decision, you can use Python's
`if`/`else` construct. For that you check a *condition* (or a fact)
to see whether it is *true* or *false*. If it is true, only the first
block runs; if it is false, only the `else` block runs.

```python

    initial_demo_output = 0

    if 3 > 2:
        print("3 is larger than 2")
        initial_demo_output = 3
    else:
        print("3 not larger than 2")
        initial_demo_output = 2

    # guess the final value of ``initial_demo_output`` ?
    print(initial_demo_output)

```

For more details and practice, see [Python recap
lecture](https://kodu.ut.ee/~kmoch/geopython2020/L0/recap-python.html)
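
Type checks and `if`/`elif`/`else` are often combined to validate input before using it. The following is a small sketch with a made-up function `hemisphere`; it also shows that `raise` leaves the function immediately, so nothing after it in the same branch is executed.

```python
def hemisphere(latitude):
    """Name the hemisphere for a latitude given in decimal degrees."""
    if not isinstance(latitude, (int, float)):
        # raise ends the function here; no return statement is needed afterwards
        raise TypeError("latitude must be a number")
    if latitude < -90 or latitude > 90:
        raise ValueError("latitude must be between -90 and 90")

    if latitude > 0:
        return "northern"
    elif latitude < 0:
        return "southern"
    else:
        return "equator"


print(hemisphere(-6.7))
print(hemisphere(8.7))
```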


## Reading a CSV file into Pandas

With Python it is possible to read data from practically any kind of
input file (such as CSV or TXT files). The widely used library Pandas
can easily read a file with tabular data and present it to us as a
so-called DataFrame:

```python

    import pandas as pd

    # make sure you have the correct path to your working file, ideally in the same folder 

    df = pd.read_csv('data/L1/global-city-population-estimates.csv', sep=';', encoding='latin1')

    pd.set_option('display.max_columns', 20)
    print(df.head(5))

```
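
As a self-contained sketch that needs no file on disk, the same call can be tried on the three sample storm rows quoted in Problem 3, wrapped in `io.StringIO`. The options `skipinitialspace` and `usecols` are regular `pd.read_csv` parameters; choosing them here is a suggestion for this dataset, not something the assignment requires.

```python
import io

import pandas as pd

# Header and three rows copied from the storm dataset shown in Problem 3.
sample_csv = """Name,Serial_Num,year,Basin,Sub_basin,Num,Latitude_first,Longitude_first,Latitude_last,Longitude_last,ISO_time_first,ISO_time_last,Nature
ADJALI,2014319S06066,2015, SI, MM,1,-6.7,66.4,-11.9,51.4,2014-11-15 06:00:00,2014-11-24 06:00:00, TS
0220142015:TWO,2014327S08077,2015, SI, MM,2,-8.0,77.3,-28.9,62.5,2014-11-23 06:00:00,2014-12-02 00:00:00, TS
KATE,2014356S08101,2015, SI, WA,4,-7.5,100.5,-30.0,89.6,2014-12-21 15:00:00,2015-01-04 12:00:00, NR
"""

# io.StringIO lets read_csv treat the string like an open file.
storms = pd.read_csv(
    io.StringIO(sample_csv),
    sep=",",
    skipinitialspace=True,  # drop the blank after a comma, so " SI" becomes "SI"
    usecols=["Name", "Basin", "Longitude_first", "Latitude_first",
             "Longitude_last", "Latitude_last"],  # keep only the columns we need
)

print(storms.head())
```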

## Applying a function to every row of a Pandas dataframe

```python

    # we make a function, that takes a row object coming from Pandas.
    # The single fields per row are addressed by their column name.
    
    def increase_by_factor_2(row):
        field_value = row['Population_2015']
        calc_value = field_value * 2
        return calc_value

    # Go through every row, and calculate the value for a new column
    # `Population_doubled`, by **apply**ing the function from above (downwards
    # row by row -> axis=1)
    
    df['Population_doubled'] = df.apply(increase_by_factor_2, axis=1)

    print(df.head(5))

```
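
If the function needs more information than the row itself, `DataFrame.apply` can pass extra positional arguments through its `args` parameter. Below is a minimal sketch with made-up data; the DataFrame `cities` and the function `scale_column` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Made-up example data, not taken from the lesson's CSV file.
cities = pd.DataFrame({
    "City": ["A", "B"],
    "Population_2015": [1000, 2500],
})


def scale_column(row, column_name, factor):
    """Multiply the value found in column_name of this row by factor."""
    return row[column_name] * factor


# args supplies the values for column_name and factor on every call
cities["Population_tripled"] = cities.apply(
    scale_column, axis=1, args=("Population_2015", 3)
)

print(cities)
```

This keeps one function reusable for several columns instead of writing a nearly identical function per column.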

## Tricky directory path names in windows

In the lesson and the Exercise 1 hints for reading a CSV file in Pandas,
a few students got a very cryptic error message, something about
"decoding of sequence UCXXXXX not possible". The error occurs in the
line with `pd.read_csv`, and you have likely used the complete path:

```python
    df = pd.read_csv('c:\users\alex\geopython\2020\L1\global-city-population-estimates.csv', sep=';', encoding='latin1')
```

Windows uses backslashes `\` as folder separators. However, inside a
normal Python string a backslash starts an escape sequence; here the
`\u` in `c:\users` is read as the beginning of a Unicode escape, which
is what triggers the error. Therefore in Python we put an `r` for "raw"
in front of the quotes of the string with the path name to the file,
like so:

```python
    df = pd.read_csv(r'c:\users\alex\geopython\2020\L1\global-city-population-estimates.csv', sep=';', encoding='latin1')
```

You could also just omit the long path and use only the filename. For
that, the file must be saved in the same folder as your Jupyter
Notebook (`*.ipynb`) file.
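
Another option is to write the path with forward slashes, which Python generally accepts for Windows paths as well. The sketch below uses the standard library's `pathlib.PureWindowsPath` only to compare the two spellings; it does not touch the file system, and the path is the illustrative one from above.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept exactly as typed.
raw_path = r'c:\users\alex\geopython\2020\L1\global-city-population-estimates.csv'

# Forward slashes need no special treatment in a Python string.
forward_path = 'c:/users/alex/geopython/2020/L1/global-city-population-estimates.csv'

# Interpreted as Windows paths, both spellings point to the same location.
print(PureWindowsPath(raw_path) == PureWindowsPath(forward_path))
print(PureWindowsPath(raw_path).name)
```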
