## 🌍<b> NB01 - City Selection </b>

## <center> Selection Criteria </center>


| What | Why and How |
| :--: | :--: |
| In the Northern Hemisphere | to roughly standardize climate over my chosen time period (i.e., the seasons fall in the same months in all of my chosen cities) |
| Have a reputation for being rainy | this is decided intuitively and through general searches for famous rainy movie scenes (e.g., "It rains nine months a year in Seattle", Sleepless in Seattle) |
| 5 cities in total | to have a sample big enough to compare cities meaningfully, without the number of cities becoming overwhelming |
| One of the cities is London | it is the focus of my research! |

#### Based on my criteria, these are the cities I have chosen: 

1) London, UK 🇬🇧
2) Oslo, Norway 🇳🇴
3) Seattle, USA 🇺🇸
4) Munich, Germany 🇩🇪
5) Kyoto, Japan 🇯🇵

#### The movies that prominently feature rain in these cities respectively are:
1) Notting Hill/Bridget Jones's Diary/Four Weddings and a Funeral 📔💐
2) Oslo, August 31st (<u> NOTE: </u> Mostly no rain in the movie but it rains a lot in Oslo/Norway, which is why it's on the list!) 🚲🗓
3) Sleepless in Seattle ☎️📻
4) Suspiria 💃🩸
5) Rashomon ⚔️💧

In [1]:
## Importing the packages that I'll be using throughout this assignment
import json
import os

import numpy as np
import pandas as pd
import requests

## Uncomment to install the geo packages if they are missing
#!pip install geopandas
#!pip install geodatasets
import geodatasets
import geopandas as gpd
from shapely.geometry import Point

from lets_plot import *
from lets_plot.geo_data import *
LetsPlot.setup_html()

## My own helper functions for this assignment
from functions import *

from IPython.display import *

The geodata is provided by © OpenStreetMap contributors and is made available here under the Open Database License (ODbL).


I have written a helper function, ```print_location_lat_lon```, that prints the output of ```get_lat_lon```. I have imported it here and will use it to get the latitude and longitude of all five of my cities efficiently!

In [2]:
print_location_lat_lon('GB', 'London')

The latitude and longitude of GB, London is (51.50853, -0.12574)


In [19]:
print_location_lat_lon('NO', 'Oslo')

The latitude and longitude of NO, Oslo is (59.91273, 10.74609)


In [20]:
print_location_lat_lon('US', 'Seattle')

The latitude and longitude of US, Seattle is (47.60621, -122.33207)


In [21]:
print_location_lat_lon('DE', 'Munich')

The latitude and longitude of DE, Munich is (48.13743, 11.57549)


In [22]:
print_location_lat_lon('JP', 'Kyoto')

The latitude and longitude of JP, Kyoto is (35.02107, 135.75385)


I am saving the city data for January 01, 2021 to January 01, 2024 in a CSV file. I am doing this to keep a neater working environment and also to limit my API calls, which have a daily cap. It is also more convenient to work with the data locally.
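
The idea of "fetch once, then work from the local file" can be sketched as below. This is only an illustration using the standard library: ```load_or_fetch``` and ```fake_fetch``` are hypothetical names, the row passed through it is a placeholder rather than real rain data, and my actual notebook uses pandas instead.

```python
import csv
import tempfile
from pathlib import Path


def load_or_fetch(csv_path, fetch_rows, fieldnames):
    """Read rows from csv_path, calling fetch_rows() only if the file is missing."""
    path = Path(csv_path)
    if not path.exists():
        # Cache miss: this is the only place the (rate-limited) fetch happens
        rows = fetch_rows()
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    # Always read back from disk so both code paths return the same types (strings)
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


# Demo with a fake fetcher that counts how often it is called
calls = []


def fake_fetch():
    calls.append(1)
    # Placeholder row, not real data
    return [{"city": "London", "date": "2021-01-01", "rain_sum": 0.0}]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "data" / "rain.csv"
    columns = ["city", "date", "rain_sum"]
    first = load_or_fetch(cache, fake_fetch, columns)
    second = load_or_fetch(cache, fake_fetch, columns)

print(f"fetch was called {len(calls)} time(s)")
```

The second call finds the file on disk, so the fetcher is not called again.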

In [3]:
## Creating a list with my chosen country codes and cities
cities = [
    ("GB", "London"),
    ("JP", "Kyoto"),
    ("DE", "Munich"),
    ("US", "Seattle"),
    ("NO", "Oslo")
]

## Defining start and end dates
start_date = '2021-01-01'
end_date = '2024-01-01'

## Creating an empty list to hold the data
all_city_data = []

## Looping over the cities: fetching the JSON response for each one, then storing its dates and daily rain sums in a dataframe
for country_code, city_name in cities:
    json_data = get_historical_data(country_code, city_name, start_date, end_date)
    
    city_data = {
        "country": country_code,
        "city": city_name,
        "date": json_data['daily']['time'],
        "rain_sum": json_data['daily']['rain_sum']
    }
    
    city_df = pd.DataFrame(city_data)
    all_city_data.append(city_df)

## Combining city rain sum data into a dataframe
final_df = pd.concat(all_city_data, ignore_index=True)
## Saving dataframe into the data file as CSV
final_df.to_csv('../data/historical_city_rain_data.csv', index=False)
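
A quick arithmetic check on the size of the saved file: assuming the API returns exactly one row per city per day and includes both the start and end dates (I have not confirmed this from the function itself), the expected number of rows can be worked out as follows.

```python
from datetime import date

start = date.fromisoformat("2021-01-01")
end = date.fromisoformat("2024-01-01")
n_cities = 5

# +1 because both endpoints are assumed to be included in the response
days_per_city = (end - start).days + 1
expected_rows = days_per_city * n_cities

print(days_per_city)   # 1096
print(expected_rows)   # 5480
```

Comparing ```expected_rows``` with ```len(final_df)``` would then flag missing or duplicated days.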

### Plotting the selected cities

In [4]:
## Obtaining city geocodes/information
## This is necessary to ensure the right points are being plotted, as I previously encountered an issue with Munich, where it was being plotted in the US instead of Germany
geocode_cities(['munich', 'seattle', 'kyoto', 'london', 'oslo'])\
    .countries(['DE', 'USA', 'JP', 'GB', 'NO'])\
    .get_geocodes()

|  | id | city | found name | country | centroid | position | limit |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| 0 | 1700534808 | munich | Munich | DE | [11.5258078608938, 48.1545735150576] | [11.3607765734196, 48.0616249144077, 11.722908... | [11.3607765734196, 48.0616249144077, 11.722908... |
| 1 | 29546940 | seattle | Seattle | USA | [-122.313062421052, 47.6189685612917] | [-122.436020672321, 47.4955514073372, -122.224... | [-122.436020672321, 47.4955514073372, -122.224... |
| 2 | 533681139 | kyoto | Kyoto | JP | [135.755607113242, 35.0210405141115] | [135.746623724699, 35.0136838853359, 135.76459... | [135.746623724699, 35.0136838853359, 135.76459... |
| 3 | 107775 | london | London | GB | [-0.144055305103075, 51.4893338084221] | [-0.510374754667282, 51.2867599725723, 0.33401... | [-0.510374754667282, 51.2867599725723, 0.33401... |
| 4 | 20981158 | oslo | Oslo | NO | [10.775728858116, 59.9723978340626] | [10.4891645908356, 59.8093114793301, 10.951389... | [10.4891645908356, 59.8093114793301, 10.951389... |
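
One way to catch a mis-geocoded city (like Munich landing in the US) is to compare the geocoder's centroid with the coordinates printed earlier by ```print_location_lat_lon```. The sketch below uses the haversine formula for this. The 50 km threshold is an arbitrary assumption for illustration, and note that the geocoder's centroids are listed as [longitude, latitude], so they are reordered to (latitude, longitude) here.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


# (lat, lon) as printed by print_location_lat_lon
api_coords = {
    "London": (51.50853, -0.12574),
    "Oslo": (59.91273, 10.74609),
    "Seattle": (47.60621, -122.33207),
    "Munich": (48.13743, 11.57549),
    "Kyoto": (35.02107, 135.75385),
}

# Geocoder centroids from the table above, reordered to (lat, lon)
geocoder_coords = {
    "London": (51.4893338084221, -0.144055305103075),
    "Oslo": (59.9723978340626, 10.775728858116),
    "Seattle": (47.6189685612917, -122.313062421052),
    "Munich": (48.1545735150576, 11.5258078608938),
    "Kyoto": (35.0210405141115, 135.755607113242),
}

distances = {
    city: haversine_km(*api_coords[city], *geocoder_coords[city])
    for city in api_coords
}

for city, km in distances.items():
    print(f"{city}: {km:.1f} km between the two sources")

# Assumed threshold: anything further apart than this is probably the wrong place
THRESHOLD_KM = 50
suspicious = [city for city, km in distances.items() if km > THRESHOLD_KM]
print("Suspicious cities:", suspicious)
```

A city geocoded to the wrong country would be thousands of kilometres away from its reference point, so it would show up in ```suspicious``` straight away.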


In [63]:
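## Creating the centroids dataframe, then replacing Munich's point with the centroid obtained from the geocoding cell above
## NOTE: `data` is not defined in the cells shown here; it is assumed to be a dataframe with a "city" column
centroids = geocode_cities(data["city"]).get_centroids()

## Point takes (longitude, latitude), in that order
correct_point = Point(11.52580, 48.15457)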

centroids.loc[centroids['city'] == 'Munich', 'geometry'] = correct_point

centroids.to_csv('../data/centroid_data.csv', index=False)

centroids


|  | city | found name | geometry |
| :--: | :--: | :--: | :--: |
| 0 | Munich | Munich | POINT (11.5258 48.15457) |
| 1 | London | London | POINT (-0.14406 51.48933) |
| 2 | Kyoto | Kyoto | POINT (135.75561 35.02104) |
| 3 | Oslo | Oslo | POINT (10.77573 59.9724) |
| 4 | Seattle | Seattle | POINT (-122.31306 47.61897) |
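
Since my first selection criterion is that every city is in the Northern Hemisphere, the centroids can be checked against it directly. The sketch below parses the point strings from the output above with plain Python (```parse_point``` is a small helper written just for this example). The points are written as (longitude latitude), so the latitude is the second number.

```python
# Point strings copied from the centroids output above
centroid_wkt = {
    "Munich": "POINT (11.5258 48.15457)",
    "London": "POINT (-0.14406 51.48933)",
    "Kyoto": "POINT (135.75561 35.02104)",
    "Oslo": "POINT (10.77573 59.9724)",
    "Seattle": "POINT (-122.31306 47.61897)",
}


def parse_point(wkt):
    """Return (longitude, latitude) from a 'POINT (lon lat)' string."""
    inner = wkt[wkt.index("(") + 1 : wkt.index(")")]
    lon, lat = (float(value) for value in inner.split())
    return lon, lat


latitudes = {city: parse_point(wkt)[1] for city, wkt in centroid_wkt.items()}

# A positive latitude means the point is north of the equator
all_northern = all(lat > 0 for lat in latitudes.values())
southernmost = min(latitudes, key=latitudes.get)

print("All cities in the Northern Hemisphere:", all_northern)
print("Southernmost city:", southernmost)
```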


### Figure 1. 
#### <i> Interactive map of selected cities </i>

In [65]:
## Creating an interactive map, where selected cities are visualized as different coloured points
centroids = geocode_cities(centroids["city"]).get_centroids()

centroids.loc[centroids['city'] == 'Munich', 'geometry'] = Point(11.5258, 48.15457)

p = ggplot() + ggsize(800, 500)

## Latitude 0 (the equator), drawn as a dashed line to show that all cities are in the Northern Hemisphere
lats = [0]

plot = (
    p + 
    geom_livemap(zoom=2) +  
    geom_hline(aes(yintercept=lats), color='#e0218a', linetype=2, size=1) +  
    geom_point(aes(color='city'), 
               data=centroids,  
               size=5,  
               show_legend=True,  
               tooltips=layer_tooltips().title("@city"))  
)

plot