In [3]:
# Import dependencies
%matplotlib inline
import json
import gzip
import urllib.request
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import random
import time
import openweathermapy.core as owm
from datetime import datetime,timedelta
#from scipy.interpolate import lagrange
#from numpy.polynomial.polynomial import Polynomial
#from urllib.error import HTTPError
from IPython.display import display, HTML
import config

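# Collect configuration problems and stop early if any are found
errors = []
if config.owm_api_key == '':
    errors.append('Openweathermap API key is not set in the config module')
if errors:
    error_message = '\\n'.join(errors)
    js = f'<script>alert("{error_message}");</script>'
    display(HTML(js))
    exit()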
print('Openweathermap API key is properly configured')

Openweathermap API key is properly configured


## Limitations

The challenge restricts us to the weather data available to a free account at `openweathermap.org`, which is limited to 60 API calls per minute. Generally, two types of data are available:

- #### Current Weather at any specified city

    The disadvantage of this approach is that we end up comparing weather across different time zones, that is, at different local times of day, so the data will naturally be more scattered. Also, a single measurement per city is noisy because it reflects temporary weather conditions. The advantage is that, knowing the internal IDs of the cities (more on where to get them later), we can bundle up to 25 cities into one list, and each bundle counts as a single API call.

- #### Forecast for any specified city

    The forecast covers the next 5 days in 3-hour intervals. The natural disadvantage is that this is a prediction rather than observed data. The advantage is that we could take all 40 data points (5 days * 24 hours / 3-hour interval) and average each metric, reducing the noise caused by temporary conditions.

Ideally, we would perform this kind of research using historical averages for the same period, expressed in each time zone's local time, but historical weather is not available for the free account. For this challenge, let's stick to the 'current weather' approach.
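
The 60-calls-per-minute limit mentioned above can be respected by spacing requests out. The following is only a minimal sketch of one way to do that, not the method this notebook uses: the helper name `throttled` and its parameters are hypothetical, and it assumes each API request is wrapped in a zero-argument callable.

```python
import time

def throttled(calls, max_per_minute=60, clock=time.monotonic, sleep=time.sleep):
    """Run zero-argument callables one by one, spacing their start times.

    Hypothetical helper: `calls` is any iterable of callables, each of which
    would perform one API request. Results are yielded in order.
    """
    min_interval = 60.0 / max_per_minute  # seconds between two call starts
    last_start = None
    for call in calls:
        if last_start is not None:
            # Sleep only for the part of the interval that has not elapsed yet
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        yield call()
```

The `clock` and `sleep` parameters exist only so the helper can be exercised without real waiting. With one call per bundle, a run of 300 bundles at this rate would take roughly five minutes.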

## Building Cities List

Our goal here is to pick at least 500 cities for analysis, evenly distributed across the whole range of latitudes. But since representing the weather at a given latitude by just one city is error-prone, let's follow the algorithm below:

1. Load all available cities from `openweathermap.org`
2. Sort them by latitude
3. Take the min and max available latitudes and calculate the available latitude range
4. Split this range into N/X (N>=500, 1 <= X <= 25) subranges where N is the maximum number of cities we are going to take and X is a number of cities we request weather for per one API call
5. For every latitude subrange pick X random cities that fall into it (up to X if the actual number of cities in this range is less)
6. For every city bundle perform an API call for current weather and calculate the average metrics (a median will probably serve better than a mean in this case)

This way we end up with N/X data points, one per city group, each of which is much more representative than a single data point per city. The resulting charts will also look much cleaner and give a higher probability of discovering meaningful trends.
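
To illustrate why step 6 prefers a median, here is a small sketch that collapses one bundle of readings into a single value per metric. The function name `summarize_bundle`, the metric keys `temp` and `humidity`, and the sample readings are all made up for illustration; they are not the actual response format or real measurements.

```python
from statistics import mean, median

def summarize_bundle(readings, metrics=('temp', 'humidity')):
    """Collapse per-city readings of one bundle into one median per metric.

    Hypothetical helper: `readings` is a list of dicts keyed by metric name.
    """
    return {m: median(r[m] for r in readings if m in r) for m in metrics}

# Made-up readings for one bundle; the last city is an outlier
readings = [
    {'temp': 20.1, 'humidity': 60},
    {'temp': 21.0, 'humidity': 62},
    {'temp': 19.5, 'humidity': 58},
    {'temp': 20.4, 'humidity': 61},
    {'temp': 35.0, 'humidity': 20},
]
summary = summarize_bundle(readings)
mean_temp = mean(r['temp'] for r in readings)
```

In this example the median temperature stays close to the four typical cities, while the mean is pulled toward the outlier.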

This particular analysis uses **N=7500** and **X=25** (300 intervals).
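
Steps 4 and 5 can be sketched roughly as follows. This is an illustrative outline under stated assumptions rather than the notebook's actual implementation: the function name `bundle_by_latitude` is hypothetical, each city is assumed to be a dict with a `coord.lat` field as in the downloaded city list, and subranges that contain no cities are simply dropped.

```python
import random

def bundle_by_latitude(cities, cities_count=7500, cities_per_bundle=25, seed=None):
    """Split the latitude range into equal subranges and sample cities in each.

    Hypothetical helper: returns a list of bundles (lists of city dicts),
    ordered from south to north, each holding at most `cities_per_bundle`.
    """
    rng = random.Random(seed)
    n_bins = cities_count // cities_per_bundle  # N/X subranges
    lats = [city['coord']['lat'] for city in cities]
    lo, hi = min(lats), max(lats)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for city in cities:
        idx = int((city['coord']['lat'] - lo) / width) if width else 0
        idx = min(idx, n_bins - 1)  # the northernmost city falls into the last bin
        bins[idx].append(city)
    # Pick up to X random cities per subrange; skip empty subranges
    return [rng.sample(b, min(cities_per_bundle, len(b))) for b in bins if b]
```

Because empty subranges are skipped in this sketch, the number of bundles returned may be smaller than N/X when the cities are unevenly spread over the latitudes.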

In [11]:
# Initial configuration
cities_count = 7500
cities_per_bundle = 25
city_list_url = 'http://bulk.openweathermap.org/sample/city.list.min.json.gz'

In [16]:
# 1. Load all available cities from openweathermap.org
print(f'Loading cities list from {city_list_url} ...')
cities = []
start_time = time.monotonic()
with urllib.request.urlopen(city_list_url) as response:
    with gzip.GzipFile(fileobj=response) as cities_file:
        cities = json.load(cities_file)
elapsed_time = timedelta(seconds=(time.monotonic() - start_time))
print(f'It took {elapsed_time} to load {len(cities)} cities')

Loading cities list from http://bulk.openweathermap.org/sample/city.list.min.json.gz ...
It took 0:00:11.203000 to load 209579 cities


In [21]:
# 2. Sort them by latitude
cities.sort(key=(lambda x: x['coord']['lat']))

In [23]:
# 3. Take the min and max available latitudes and calculate the available latitude range
northernmost = cities[-1]
southernmost = cities[0]
min_latitude = southernmost['coord']['lat']
max_latitude = northernmost['coord']['lat']
latitude_range = max_latitude - min_latitude
print(f'The northernmost point is {northernmost["name"]},{northernmost["country"]}',end='')
print(f' at latitude {max_latitude}°')
print(f'The southernmost point is {southernmost["name"]},{southernmost["country"]}',end='')
print(f' at latitude {min_latitude}°')
print(f'All available latitudes are covered by the range of {latitude_range:.6f}°')

The northernmost point is Isachsen,CA at latitude 78.785301°
The southernmost point is Antarctica, at latitude -78.158562°
All available latitudes are covered by the range of 156.943863°
