# Defining functions and working with time in Python: Exercise solutions

**Author**: Andrea Ballatore (Birkbeck, University of London)

**Abstract**: Learn how to use and define functions and how to manipulate temporal information in Python.

## Setup
Run this cell to check that your environment is set up correctly. It should print 'env ok'; any warnings can be ignored.

In [1]:
# Test geospatial libraries
# check environment
import os
print("Conda env:", os.environ['CONDA_DEFAULT_ENV'])
assert os.environ['CONDA_DEFAULT_ENV'] == 'geoprogv1'
# spatial libraries 
import fiona as fi
import geopandas as gpd
import pandas as pd
import pysal as sal

# create output folders (exist_ok avoids an error if they already exist)
folders = ['tmp']
for f in folders:
    os.makedirs(f, exist_ok=True)

print('env ok')

Conda env: geoprogv1
env ok


-----
## Exercises

When you are in doubt about how a package or a function works, use the Python website (https://docs.python.org/3.9/) and **Google** to find relevant documentation.

a. Write and test a function that takes the coordinates of two points and returns the Euclidean distance. Test the function with at least 4 pairs of points.

In [3]:
def euclidean_distance(pt_a, pt_b):
    """ Calculate the Euclidean distance between two 2D points given as (x, y) pairs """
    pt_a_x = pt_a[0]
    pt_a_y = pt_a[1]
    pt_b_x = pt_b[0]
    pt_b_y = pt_b[1]
    
    # squared differences along each axis
    squares = [(pt_b_x-pt_a_x)**2, (pt_b_y-pt_a_y)**2]
    # the square root of their sum is the distance
    dist = sum(squares) ** .5
    return dist

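# test function with 4 pairs of points
print(euclidean_distance([1,2],[2,3]))
print(euclidean_distance([0,0],[0,0]))
print(euclidean_distance([0,0],[3,4]))
print(euclidean_distance([1.5,0],[1.5,-2]))

1.4142135623730951
0.0
5.0
2.0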


In [4]:
# In reality, you do not need to write these well-known functions yourselves.
# the package numpy provides a very fast implementation of the euclidean distance
# for multi-dimensional points.

import numpy as np

# numpy arrays 
point1 = np.array((1, 2, 3)) 
point2 = np.array((1, 1, 1)) 
  
# calculating Euclidean distance 
# using linalg.norm() 
dist = np.linalg.norm(point1 - point2) 
  
# printing Euclidean distance 
print(dist)

2.23606797749979
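
If NumPy is not available, the standard library covers the same case: `math.dist` (Python 3.8 and later) computes the Euclidean distance between two points with any number of dimensions. The sketch below wraps it in a hypothetical helper, `euclidean_distance_nd`, which is not part of the original solution.

```python
import math

def euclidean_distance_nd(pt_a, pt_b):
    """ Euclidean distance between two points with the same number of dimensions """
    # math.dist raises a ValueError if the two points differ in dimensions
    return math.dist(pt_a, pt_b)

# a 2D pair and the 3D pair used in the numpy cell
print(euclidean_distance_nd([0, 0], [3, 4]))
print(euclidean_distance_nd((1, 2, 3), (1, 1, 1)))
```

The first call covers the same 2D case as `euclidean_distance`, while the second shows that no code change is needed for 3D points.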


b. Given a list of values representing GDP annual growth, write a function that classifies them with the following categories:

- `bust`: < -4
- `negative`: < -.05
- `zero`: [-.05,.05]
- `positive`: > .05
- `boom`: > 4

In [5]:
def classify_growth(growth_rate):
    """ Classify growth rate """
    if growth_rate < -4:
        return 'bust'
    elif growth_rate < -.05:
        # here we know that growth_rate >= -4
        return 'negative'
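    elif growth_rate <= .05:
        # here we know that growth_rate >= -.05; the 'zero' interval
        # [-.05,.05] includes both of its ends
        return 'zero'
    elif growth_rate <= 4:
        # here we know that growth_rate > .05; exactly 4 is still
        # 'positive' because 'boom' requires > 4
        return 'positive'
    else:
        assert growth_rate > 4, "check if value is > 4"
        return 'boom'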

growth_rates = [-4.103, 10.53, -.4, 0.1, .5, 4.56, -2.45]

# use list comprehension to call the function
growth_rates_classified = [classify_growth(g) for g in growth_rates]

growth_rates_classified

['bust', 'boom', 'negative', 'positive', 'positive', 'boom', 'negative']
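
The behaviour at the class boundaries (-4, -.05, .05 and 4) is the easiest part of this exercise to get wrong. As a cross-check, here is a minimal sketch of the same classification using the standard `bisect` module instead of an `if`/`elif` chain. The names `classify_growth_bisect`, `LABELS`, `LOWER_BOUNDS` and `UPPER_BOUNDS` are hypothetical and not part of the original solution.

```python
from bisect import bisect_left, bisect_right

LABELS = ['bust', 'negative', 'zero', 'positive', 'boom']
# a value equal to a lower bound belongs to the class above it
LOWER_BOUNDS = [-4, -.05]
# a value equal to an upper bound belongs to the class below it
UPPER_BOUNDS = [.05, 4]

def classify_growth_bisect(growth_rate):
    """ Classify a growth rate by counting how many bounds it has passed """
    idx = (bisect_right(LOWER_BOUNDS, growth_rate)
           + bisect_left(UPPER_BOUNDS, growth_rate))
    return LABELS[idx]

# boundary values are the interesting test cases
for rate in [-4.103, -4, -.05, 0, .05, 4, 4.56]:
    print(rate, classify_growth_bisect(rate))
```

Testing the exact boundary values, as in the loop above, is a quick way to confirm that a classifier matches the intervals stated in the exercise.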

c. Write a utility function that, given a string, trims it, replaces all spaces with `_`, and converts it to lower case.

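In [17]:
# quick experiments before writing the function
"\n  London. \t".strip()
"   Greater   London   ".strip().replace(" ","_")
import re
ins = "   Greater                    London   "
# Regular expressions: replace each run of 2+ whitespace characters with "_"
# (raw string, so that \s is not treated as a string escape)
re.sub(r"\s\s+","_",ins)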

'_Greater_London_'

In [11]:
def clean_string(s):
    # chain the operations on one line
    clean_s = s.strip().lower().replace(' ','_')
    return clean_s

# test function
clean_strings = []
for input_s in [' City of London ', 'Hammersmith ', '  Lambeth', 'ENFIELD ', 'KensinGton and ChelsEa  ']:
    clean_strings.append(clean_string(input_s))

print(clean_strings)

['city_of_london', 'hammersmith', 'lambeth', 'enfield', 'kensington_and_chelsea']
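
`clean_string` replaces every single space, so an input with repeated internal spaces or tabs would produce repeated underscores. A possible variant, combining `strip()` with the regular expression idea explored above, is sketched below. The name `clean_string_ws` is hypothetical, and collapsing runs of whitespace is an assumption that goes slightly beyond what the exercise asks for.

```python
import re

def clean_string_ws(s):
    """ Trim, lower-case, and replace each run of whitespace with a single '_' """
    return re.sub(r'\s+', '_', s.strip().lower())

# repeated spaces and tabs collapse into one underscore
print(clean_string_ws('   Greater                    London   '))
print(clean_string_ws('Kensington\tand  Chelsea'))
```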


d. Given a data frame with data about cities, write a function to calculate the population growth rate between 2000 and 2020 (e.g., 10% or -4.2%). Round values to the second decimal digit.

In [21]:
cities_df = pd.DataFrame({"city_id" : [1,2,3,4],
    "city_name" : ["London", "Lagos", "Hong Kong", "Lima"],
    "population_2000" : [7195000,  7281000, 6665000,  7294000],
    "population_2020" : [8982000, 14368000, 7451000, 10719000],
    "area_km2" : [1572, 1171, 1106, 2672]})
cities_df

|   | city_id | city_name | population_2000 | population_2020 | area_km2 |
|---|---|---|---|---|---|
| 0 | 1 | London | 7195000 | 8982000 | 1572 |
| 1 | 2 | Lagos | 7281000 | 14368000 | 1171 |
| 2 | 3 | Hong Kong | 6665000 | 7451000 | 1106 |
| 3 | 4 | Lima | 7294000 | 10719000 | 2672 |


In [25]:

def calculate_growth(row):
    """ Calculate growth rate (aka percentage change) between pop 2000 and 2020. """
    # row is a single row of the data frame
    # extract fields
    pop00 = row['population_2000']
    pop20 = row['population_2020']
    # calculate growth
    growth = ((pop20-pop00)/pop00)*100
    return round(growth,2)

# call calculate_growth on cities_df with `apply` on each row
cities_df['growth_rate'] = cities_df.apply(calculate_growth, axis=1)

#cities_df['proj_pop40'] = cities_df['population_2020'] * (1+cities_df['growth_rate']/100)

cities_df

|   | city_id | city_name | population_2000 | population_2020 | area_km2 | growth_rate |
|---|---|---|---|---|---|---|
| 0 | 1 | London | 7195000 | 8982000 | 1572 | 24.84 |
| 1 | 2 | Lagos | 7281000 | 14368000 | 1171 | 97.34 |
| 2 | 3 | Hong Kong | 6665000 | 7451000 | 1106 | 11.79 |
| 3 | 4 | Lima | 7294000 | 10719000 | 2672 | 46.96 |
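
The growth formula divides by the starting population, so it is undefined when that value is 0. The sketch below isolates the calculation in a plain function with an explicit check. The name `percentage_change` and its `ndigits` parameter are hypothetical, and raising a `ValueError` is just one reasonable way to handle a zero starting value.

```python
def percentage_change(old_value, new_value, ndigits=2):
    """ Percentage change from old_value to new_value, rounded to ndigits """
    if old_value == 0:
        raise ValueError('percentage change is undefined when the starting value is 0')
    return round((new_value - old_value) / old_value * 100, ndigits)

print(percentage_change(7195000, 8982000))  # London, 2000 to 2020
print(percentage_change(100, 95.8))         # a decrease gives a negative rate
```

A function of this kind could be called from inside `calculate_growth`, which would keep the row handling and the arithmetic separate.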


e. A function `convert_area` has to be able to convert areas between m$^2$, km$^2$, and mi$^2$. It should handle all combinations. If `in_unit == out_unit`, just return the same value.

In [26]:
def convert_area(area, in_unit, out_unit):
    # check input 
    if area < 0:
        raise ValueError('area cannot be negative.')
    
    # valid units: 'sqm','sqkm','sqmi'
    if in_unit == out_unit:
        # no conversion needed
        return area
    # init converted area to None
    conv_area = None
    
    # handle all combinations; the factors are kept consistent
    # with each other (1 sqmi = 2.58999 sqkm, 1 sqkm = 0.386102 sqmi)
    if in_unit=='sqm':
        if out_unit=='sqkm':
            conv_area = area * 1e-6
        elif out_unit=='sqmi':
            conv_area = area * 3.86102e-7
    elif in_unit=='sqmi':
        if out_unit=='sqkm':
            conv_area = area * 2.58999
        elif out_unit=='sqm':
            conv_area = area * 2.58999e6
    elif in_unit=='sqkm':
        if out_unit=='sqmi':
            conv_area = area * 0.386102
        elif out_unit=='sqm':
            conv_area = area * 1e6

    if conv_area is None: 
        # this should never be reached if the input is correct
        raise ValueError("conversion failed")
    return conv_area


print(convert_area(10,'sqm','sqkm'))
print(convert_area(100,'sqmi','sqkm'))

9.999999999999999e-06
258.99899999999997
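
Listing every pair of units by hand grows quickly as units are added. One alternative, sketched below, is to store a single factor per unit (square metres per unit) and convert through that base unit. The names `SQM_PER_UNIT` and `convert_area_via_sqm` are hypothetical. The square mile factor is derived from the exact definition of the mile (1609.344 m), so the results differ slightly from those obtained with the rounded factors used in `convert_area`.

```python
# square metres per unit; 1 mile = 1609.344 m exactly
SQM_PER_UNIT = {'sqm': 1.0, 'sqkm': 1e6, 'sqmi': 1609.344 ** 2}

def convert_area_via_sqm(area, in_unit, out_unit):
    """ Convert an area between units by going through square metres """
    if area < 0:
        raise ValueError('area cannot be negative.')
    if in_unit not in SQM_PER_UNIT or out_unit not in SQM_PER_UNIT:
        raise ValueError('unknown unit')
    if in_unit == out_unit:
        # no conversion needed
        return area
    return area * SQM_PER_UNIT[in_unit] / SQM_PER_UNIT[out_unit]

print(convert_area_via_sqm(3, 'sqkm', 'sqm'))
print(convert_area_via_sqm(100, 'sqmi', 'sqkm'))
```

With this design, supporting a new unit only requires one new dictionary entry rather than a new branch for every existing unit.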


f. Modify and test `convert_area` to give an error if it is supplied a negative value (areas cannot be smaller than 0). Then create a list with 6 areas in km$^2$. Use `convert_area` to convert them to m$^2$ and mi$^2$. 

In [27]:
areas_km2 = [3943,1243,135,419,34458,14701]
print("km2",areas_km2)

areas_m2 = [convert_area(a,'sqkm','sqm') for a in areas_km2]
print("m2",areas_m2)

areas_mi2 = [convert_area(a,'sqkm','sqmi') for a in areas_km2]
print("mi2",areas_mi2)

km2 [3943, 1243, 135, 419, 34458, 14701]
m2 [3943000000.0, 1243000000.0, 135000000.0, 419000000.0, 34458000000.0, 14701000000.0]
mi2 [1522.400186, 479.924786, 52.12377, 161.776738, 13304.302716, 5676.085502]


Units are so important in scientific computing that packages have been created to declare units for variables explicitly. See for example `pint`, in which you can write `3 * ureg.meter + 4 * ureg.cm` and obtain a quantity of 3.04 metres: https://pint.readthedocs.io. Errors caused by incorrect unit conversions have caused [very expensive disasters](https://www.mentalfloss.com/article/25845/quick-6-six-unit-conversion-disasters) in the aerospace sector.

e. Write a function that checks the validity of a lon/lat pair by checking that each value is within its range (e.g., -180 to 180 for longitude). The function should return `False` if the pair is invalid, and `True` otherwise. The function should rely on two sub-functions `is_lat_valid` and `is_lon_valid`.

In [31]:
def is_lat_valid(lat):
    valid = -90 <= lat <= 90
    return valid

def is_lon_valid(lon):
    valid = -180 <= lon <= 180
    return valid

def is_latlon_valid(xlon, ylat):
    # This is an example of building functions using other functions.
    # both have to be True.
    valid = is_lon_valid(xlon) and is_lat_valid(ylat)
    return valid

print(is_latlon_valid(42.3, 5.53))
print(is_latlon_valid(542.3, -5.53))
print(is_lat_valid(-493))
print(is_lon_valid(2.54))

# Typically, this kind of function is used to validate the input and
# output of a programme. If you are making calculations 
# with lat/lon coordinates, it is reassuring to "assert":

lat = 54.215
lon = -0.347
# some calculations
# ...
# this line will fail and raise an error if the result is invalid!
assert is_latlon_valid(lon,lat),'invalid lon/lat'

# These assertions are widely used to develop professional 
# scientific and commercial programmes.

True
False
False
True
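
When validating many coordinates at once (for example a list of GPS fixes), stopping at the first failure with `assert` is not always what you want. The following sketch separates valid from invalid pairs instead. It is self-contained, so it re-implements the range check in a hypothetical `is_lonlat_pair_valid`, and `split_valid_points` is likewise an illustrative name rather than part of the original solution.

```python
def is_lonlat_pair_valid(lon, lat):
    """ True if lon is in [-180, 180] and lat is in [-90, 90] """
    return -180 <= lon <= 180 and -90 <= lat <= 90

def split_valid_points(points):
    """ Split a list of (lon, lat) pairs into valid and invalid ones """
    valid, invalid = [], []
    for lon, lat in points:
        if is_lonlat_pair_valid(lon, lat):
            valid.append((lon, lat))
        else:
            invalid.append((lon, lat))
    return valid, invalid

fixes = [(-0.347, 54.215), (542.3, -5.53), (12.5, -93.0)]
valid, invalid = split_valid_points(fixes)
print('valid:', valid)
print('invalid:', invalid)
```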


f. Given a list of timestamps (for example representing GPS fixes), write a function that sorts them and then calculates the interval (_timedelta_) between each consecutive pair. Observe the structure of the result of `datetime.now()` to understand the `datetime` type. Enter some dates of notable events as specified below:

In [2]:
from datetime import datetime
# from <package> import <object>

import time

# build some example timestamps with sleep between them
example_timestamps = []
# format: datetime(year, month, day, hour, minute, second, microsecond)

# Fall of the Berlin wall
example_timestamps.append(datetime(1989, 11, 9, 0,0,0))

# Beginning of the Iraq War
example_timestamps.append(datetime(2003, 3, 20, 0,0,0))

# 9/11 attacks
example_timestamps.append(datetime(2001, 9, 11, 0,0,0))

# Beginning of the Syrian Civil War
example_timestamps.append(datetime(2011, 3, 15, 0,0,0))

# add now
example_timestamps.append(datetime.now())
time.sleep(2)
example_timestamps.append(datetime.now())

example_timestamps

[datetime.datetime(1989, 11, 9, 0, 0),
 datetime.datetime(2003, 3, 20, 0, 0),
 datetime.datetime(2001, 9, 11, 0, 0),
 datetime.datetime(2011, 3, 15, 0, 0),
 datetime.datetime(2021, 2, 2, 19, 6, 2, 462838),
 datetime.datetime(2021, 2, 2, 19, 6, 4, 467314)]

In [3]:
def time_intervals(timestamps):
    """ 
    @ timestamps is a list of datetimes 
    @ returns a list of timedeltas 
    """
    # sort datetime objects
    sorted_timestamps = sorted(timestamps)
    intervals = []
    # count timestamps
    n = len(sorted_timestamps)
    
    # scan timestamps getting an element (i) and 
    # the next one (i+1). For this reason we 
    # stop the range at n-1 and not at n: 
    # with zero-based indexing, index n does not exist.
    for i in range(n-1):
        t_1 = sorted_timestamps[i]
        t_2 = sorted_timestamps[i+1]
        delta = t_2 - t_1
        intervals.append(delta)
        
    # the number of intervals is equal to n-1
    # (or 0 when the input list is empty)
    assert len(intervals) == max(n-1, 0)
    return intervals

intervals = time_intervals(example_timestamps)
print(intervals)
# for example, the delta between the fall of the Berlin Wall 
# and the 9/11 attacks is 4324 days.

[datetime.timedelta(days=4324), datetime.timedelta(days=555), datetime.timedelta(days=2917), datetime.timedelta(days=3612, seconds=68762, microseconds=462838), datetime.timedelta(seconds=2, microseconds=4476)]
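
The index-based loop can also be written by pairing the sorted list with itself shifted by one position. The sketch below uses `zip` for this. `time_intervals_zip` is a hypothetical name, and the three events are a subset of the example timestamps so that the output is reproducible.

```python
from datetime import datetime

def time_intervals_zip(timestamps):
    """ Return the timedeltas between consecutive sorted timestamps """
    sorted_ts = sorted(timestamps)
    # zip stops at the shorter sequence, so the last element is never
    # paired with a non-existent successor
    return [later - earlier for earlier, later in zip(sorted_ts, sorted_ts[1:])]

events = [datetime(1989, 11, 9), datetime(2003, 3, 20), datetime(2001, 9, 11)]
deltas = time_intervals_zip(events)
# a timedelta exposes its length in days or as total seconds
print([d.days for d in deltas])
print([d.total_seconds() for d in deltas])
```

Because `zip` handles the end of the list, this version returns an empty list for an empty or single-element input without any special case.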


g. Using `pytz`, write a function that, given a datetime in UTC, returns the time shifted to all common time zones (`pytz.common_timezones`). The method `astimezone()` supports the conversion between different time zones. Return the results in a pandas data frame with the following columns: `time_zone`,`time_iso`,`time_ctime`,`hours`,`minutes`. In the tests, save the results to a CSV file to inspect them more easily. Some ideas are discussed on [StackOverflow](https://stackoverflow.com/questions/25264811/pytz-converting-utc-and-timezone-to-local-time).

In [4]:
import pytz

def all_time_zones(a_utc_datetime):
    """ 
    Generates a data frame with a_utc_datetime in all common time zones.
    @returns a data frame 
    """
    # collect one dictionary per time zone; the data frame is built
    # once at the end, which is faster than growing it row by row
    rows = []
    
    for time_zone_name in pytz.common_timezones:
        # get the time zone from name
        tz = pytz.timezone(time_zone_name)
        # apply time zone shift to a_utc_datetime
        shifted_time = tz.fromutc(a_utc_datetime)
        offset = shifted_time.strftime('%z')
        rows.append({'time_zone':time_zone_name, 'time':shifted_time, 'offset':offset})
        
    # note: DataFrame.append was removed in pandas 2.0, so the rows
    # are passed to the DataFrame constructor instead
    times_df = pd.DataFrame(rows, columns=['time_zone','time','offset'])
        
    # sort by offset and time_zone
    times_df = times_df.sort_values(by=['offset','time_zone'])
    return times_df

now_world_df = all_time_zones(datetime.utcnow())
print("now_world_df rows:",len(now_world_df))
now_world_df.to_csv('tmp/all_time_zones.csv', index=False)

now_world_df

now_world_df rows: 439


|   | time_zone | time | offset |
|---|---|---|---|
| 0 | Africa/Abidjan | 2021-02-02 19:06:13.542163+00:00 | +0000 |
| 1 | Africa/Accra | 2021-02-02 19:06:13.542163+00:00 | +0000 |
| 5 | Africa/Bamako | 2021-02-02 19:06:13.542163+00:00 | +0000 |
| 7 | Africa/Banjul | 2021-02-02 19:06:13.542163+00:00 | +0000 |
| 8 | Africa/Bissau | 2021-02-02 19:06:13.542163+00:00 | +0000 |
| ... | ... | ... | ... |
| 426 | Pacific/Tahiti | 2021-02-02 09:06:13.542163-10:00 | -1000 |
| 435 | US/Hawaii | 2021-02-02 09:06:13.542163-10:00 | -1000 |
| 414 | Pacific/Midway | 2021-02-02 08:06:13.542163-11:00 | -1100 |
| 416 | Pacific/Niue | 2021-02-02 08:06:13.542163-11:00 | -1100 |
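
The exercise text asks for the columns `time_iso`, `time_ctime`, `hours` and `minutes`, while the solution above stores the shifted datetime and its offset string. As a rough sketch of how those four values could be derived from a time-zone-aware datetime, the code below uses only the standard library. The function name `describe_shifted_time` is hypothetical, and the fixed +05:30 and -10:00 offsets are stand-ins for real `pytz` time zones.

```python
from datetime import datetime, timedelta, timezone

def describe_shifted_time(aware_dt):
    """ Build the requested fields from a time-zone-aware datetime """
    offset = aware_dt.utcoffset()
    total_minutes = int(offset.total_seconds() // 60)
    # split the offset into signed hours and unsigned minutes
    # (simplification: an offset between -1h and 0 would lose its sign)
    sign = -1 if total_minutes < 0 else 1
    hours, minutes = divmod(abs(total_minutes), 60)
    return {
        'time_iso': aware_dt.isoformat(),
        'time_ctime': aware_dt.ctime(),
        'hours': sign * hours,
        'minutes': minutes,
    }

utc_dt = datetime(2021, 2, 2, 19, 6, 13, tzinfo=timezone.utc)
# fixed offsets used as stand-ins for real time zones
plus_0530 = timezone(timedelta(hours=5, minutes=30))
minus_1000 = timezone(timedelta(hours=-10))

row_east = describe_shifted_time(utc_dt.astimezone(plus_0530))
row_west = describe_shifted_time(utc_dt.astimezone(minus_1000))
print(row_east)
print(row_west)
```

A dictionary of this shape could be merged into each row built inside `all_time_zones`, with the time zone name added as `time_zone`.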


----
End of notebook.
