# Create a map of Countries with current travel restrictions

In [128]:

from dateutil.parser import parse
import config as cfg
import pandas as pd

**Define and get the website (estonian version)**

We are using [requests](https://requests.readthedocs.io/en/master/) to download the webpage, and the [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) package to parse the source code of the webpage.

In [124]:
import requests
from bs4 import BeautifulSoup

url = r'https://vm.ee/et/teave-riikide-ja-liikumispiirangute-kohta-eestisse-saabujatele'

r = requests.get(url)
soup = BeautifulSoup(r.text, features="html.parser")

**Regex to find dates and current timeframe**

The data on the website is subject to change. The format can change with a new update, so the information we are looking for might be in a different position, or it has to be extracted from text. Right now, the website stores the last-update information on and the valid time frame in the text, in the format of: 

```DD.MM-DD.MM.YYYY (seisuga DD.MM.YYYY)```

The text might change at some point, but for now we can use regular expressions to extract the data.

In [None]:
import re

These are the too regular expressions we need. The first one extracts the time the data has been released/updated. The second contains the time interval of validity. Regular expression have been drafted on [RegExr](https://regexr.com/). Note, that as soon the text changes on the webite, the regular expression might need to be updated. They are somewhat dynamic (leading 0), but not entirely.

In [None]:
# Compiling to search for "seisuga DD.MM.YYYY"
p1 = re.compile(r'seisuga\s+((?:[0]?[1-9]|[1|2][0-9]|[3][0|1])[.](?:[0]?[1-9]|[1][0-2])[.](?:[0-9]{4}|[0-9]{2}))?')

# Compiling to search for "DD.MM-DD.MM.YYYY"
p2 = re.compile(r'((?:[0]?[1-9]|[1|2][0-9]|[3][0|1])[.](?:[0]?[1-9]|[1][0-2])[.]?(?:[0-9]{4}|[0-9]{2})?)-\s*((?:[0]?[1-9]|[1|2][0-9]|[3][0|1])[.](?:[0]?[1-9]|[1][0-2])[.](?:[0-9]{4}|[0-9]{2}))?')

Finding and parsing the Dates.

In [126]:
# Day of release (last update)
up_date = datetime.datetime.strptime(p1.findall(li)[0], '%d.%m.%Y')

# Validity period
li = soup.find(text=p1)
valid_from = parse(p2.findall(li)[0][0])
valid_to = parse(p2.findall(li)[0][1])

# Print to check if it has been extracted properly
print("The website has been updated on {}, the current values are valid from {} to {}.".format(up_date.strftime('%d.%m.%Y'), 
                                                                                             valid_from.strftime('%d.%m.%Y'),
                                                                                             valid_to.strftime('%d.%m.%Y')))

The website has been updated on 11.09.2020, the current values are valid from 14.09.2020 to 20.09.2020.


## Finding countries in the text

In the config.py file in the same folder a dictionary has been created containing all the countries, their English and Estonian translation, matched together by their 2-digit Country code. ([en](https://en.wikipedia.org/wiki/ISO_3166-1), [ee](https://et.wikipedia.org/wiki/ISO_maakoodide_loend)).

The file can be imported as a package, and the containing dictionary is assigned to a variable.

In [142]:
import config as cfg
a2_dct = cfg.a2_dct

dict(list(a2_dct.items())[0:5])

{'AF': {'ee': 'Afganistan', 'en': 'Afghanistan'},
 'AX': {'ee': 'Ahvenamaa', 'en': 'Åland Islands'},
 'AL': {'ee': 'Albaania', 'en': 'Albania'},
 'DZ': {'ee': 'Alžeeria', 'en': 'Algeria'},
 'AS': {'ee': 'Ameerika Samoa', 'en': 'American Samoa'}}

In [107]:
res = ul1.find(text=re.compile( 'Itaalia' + '\s*[\d]{1,3}([,]\d)?'))
res

'Itaalia 27'

### Finding the section containing EU-countries

The webpage has different sections, which are all subject to change. The sections are defined by [```<ul>```-tags](https://www.w3schools.com/tags/tag_ul.asp) (unordered lists), containing [```<li>```-items](https://www.w3schools.com/tags/tag_li.asp) (list items). We can use BeautifulSoup to find the unordered lists in the website.

In [147]:
uls = soup.find_all('ul')
ils = soup.find_all('li')
print("There are {} unordered lists defined in the website, in total they contain {} list items.".format(len(uls),len(ils)))

There are 33 unordered lists defined in the website, in total they contain 232 list items.


At this stage a bit of manual data exploration is advised. We need to find the unordered list, which contains the data we are looking for (list of EU-countries and their respective active cases / 100k inhabitants). This can certainly be automized, but as long it doesn't change permanently, it is easier to just select the right list. 

In [145]:
# The list we are after is currently the 20 list out of 33.
ul1 = soup.find_all('ul')[20]
ul1

<ul>
<li><span class="node-text-color-red"><strong>Andorra 266,5</strong></span></li>
<li><span class="node-text-color-red"><strong>Austria 59,7</strong></span></li>
<li><span class="node-text-color-red"><strong>Belgia 47,1</strong></span></li>
<li><span class="node-text-color-red"><strong>Bulgaaria 24,1</strong></span></li>
<li><span class="node-text-color-red"><strong>Hispaania 265,5</strong></span></li>
<li><span class="node-text-color-red"><strong>Holland 57,2</strong></span></li>
<li><span class="node-text-color-red"><strong>Horvaatia 91,4</strong></span></li>
<li><span class="node-text-color-red"><strong>Iirimaa 38,9</strong></span></li>
<li>
<p><span class="node-text-color-red"><strong>Island 19,6</strong></span></p>
</li>
<li><span class="node-text-color-red"><strong>Itaalia 31,9</strong></span></li>
<li><span class="node-text-color-red"><strong>Kreeka 27,2</strong></span></li>
<li><strong>Küpros 4,1</strong></li>
<li><strong>Leedu 15,6</strong></li>
<li><strong>Liechtenstein 7

In the printed section we can find all the data we are looking for. Unfortunately the formatting is not homogeneous, which will make extraction a bit more tedious. There are different ```<span>``` classes, and various use cases of the ```<strong>``` tag, which indicate whether the 2-week quarantine rule applies or not. 

Before we are going to loop through all the countries, we are testing our detection. The following command is used to find the country and the number. Note, that the regular expression contains the option for several spaces ```\s*``` (sometimes there is more than one), and the option for decimal number or not ```([,]\d)?```.

In [150]:
p1 = re.compile( 'Prantsusmaa' + '\s*[\d]{1,3}([,]\d)?')
res = ul1.find(text=p1)
res

'Prantsusmaa 140,6'

In the next step we want to extract the decimal value. At the same time we can look for a present asterisk, highlighting that there is a special treatment for this case. 

In [151]:
p2 = re.compile(r'(\d*\,?\d+)(\*)?')
res2 = p2.findall(res)[0]
res2

('140,6', '')

In case the country is highlighted in<font color=red> **bold and red ** </font> returning from the country implies a 14 day quarantine period. Whether or not the formatting exists can be checked like this.

In [153]:
if res.find_parent('span', {'class': 'node-text-color-red'}):
    print(True)
    
# or 
if res.find_parent('strong', {'style' : 'color: rgb(189, 73, 50);'}):
    print(True)


True


In [154]:
# Creating dictionaries and lists where the collected information is stored
a2_status = {}  # 
found_countries = [] # countries that were found on the webpage
quarantine = [] # countries needing quarantine on re-entry
no_quarantine = [] # countries without quarantine rule

for a2 in a2_dct.keys(): # looping through all country codes
    cntry = a2_dct[a2]['ee'] # get the estonian translation
    cntry_en = a2_dct[a2]['en'] # get the english translation
    # find the country in the selected unordered list
    res = ul1.find(text=p1) # for p1 see above
    
    if res: # if it is found
        found_countries.append(cntry_en) # append it to the list of countries found
        res2 = p2.findall(res)[0] # search for the value (see above)
        val = float(res2[0].replace(',','.')) # convert value from string decimal (komma) to float
        a2_status[a2] = {'val': val} # assign it to the dictionary
        
        # the following is checking whether a asterisk is present 
        try:
            note = res2[0][1]
        except:
            note = ''
        if note == '':
            a2_status[a2]['note'] = False # assign to the dictionary
        else:
            a2_status[a2]['note'] = True
            
        print(res, val, end = '')
        
        # check whether quarantine rules apply or not
        if res.find_parent('span', {'class': 'node-text-color-red'}) or res.find_parent('strong', {'style' : 'color: rgb(189, 73, 50);'}):
            print('####')
            a2_status[a2]['fom'] = False # assign to the dictionary
            quarantine.append(cntry_en) # add to the list of countries that need to quarantine
        else:
            print('')
            a2_status[a2]['fom'] = True
            no_quarantine.append(cntry_en) # add to the list of countries that do not need to quarantine
    else:
        pass

Andorra 266,5 266.5####
Austria 59,7 59.7####
Belgia 47,1 47.1####
Bulgaaria 24,1 24.1####
Hispaania 265,5 265.5####
Holland 57,2 57.2####
Horvaatia 91,4 91.4####
Iirimaa 38,9 38.9####
Island 19,6 19.6####
Itaalia 31,9 31.9####
Kreeka 27,2 27.2####
Küpros 4,1 4.1
Leedu 15,6 15.6
Liechtenstein 7,8 7.8
Luksemburg 88,8 88.8####
Läti 4,3 4.3
Malta 68,9 68.9####
Monaco 124 124.0####
Norra 23,3 23.3####
Poola 20,5 20.5####
Portugal 53,1 53.1####
Prantsusmaa 140,6 140.6####
Rootsi 22,7 22.7####
Rumeenia 85,2 85.2####
Saksamaa 20,9 20.9####
San Marino 72,6 72.6####
Slovakkia 25,6 25.6####
Sloveenia 30,4 30.4####
Soome 8,2 8.2
Šveits 55,0 55.0####
Taani 39,6 39.6####
Tšehhi 85,6 85.6####
Ungari 49,2 49.2####
Vatikan 0,0 0.0
Ühendkuningriik 41,7 41.7####


In [110]:
        
ul2 = soup.find_all('ul')[21]
for a2 in a2_dct.keys():
    cntry = a2_dct[a2]['ee']
    cntry_en = a2_dct[a2]['en']
    res = ul2.find(text=re.compile('(' + cntry + ')'))
    if res:
        a2_status[a2] = {'val': None}
        print(res, end = '')
        found_countries.append(cntry_en)
        if res.find_parent('strong'):
            print('####')
            a2_status[a2]['fom'] = False
            quarantine.append(cntry_en)
        else:
            print('')
            a2_status[a2]['fom'] = True
            no_quarantine.append(cntry_en)
        
        p = re.compile('([\*])')
        res2 = p.findall(res)
        if note == '':
            a2_status[a2]['note'] = False
        else:
            a2_status[a2]['note'] = True
            
print('Found countries: {}'.format(len(found_countries)))
print(found_countries)
print('Countries that need quarantine: {}'.format(len(quarantine)))
print(quarantine)
print('Countries that need NO quarantine: {}'.format(len(no_quarantine)))
print(no_quarantine)

Austraalia
Gruusia
Jaapan
Kanada
Lõuna-Korea
Rwanda
Tai
Tuneesia
Uruguay
Uus-Meremaa
Found countries: 45
['Andorra', 'Austria', 'Belgium', 'Bulgaria', 'Spain', 'Netherlands', 'Croatia', 'Ireland', 'Iceland', 'Italy', 'Greece', 'Cyprus', 'Lithuania', 'Liechtenstein', 'Luxembourg', 'Latvia', 'Malta', 'Monaco', 'Norway', 'Poland', 'Portugal', 'France', 'Sweden', 'Romania', 'Germany', 'San Marino', 'Slovakia', 'Slovenia', 'Finland', 'Switzerland', 'Denmark', 'Czechia', 'Hungary', 'Holy See', 'United Kingdom', 'Australia', 'Georgia', 'Japan', 'Canada', 'Korea, Republic of', 'Rwanda', 'Thailand', 'Tunisia', 'Uruguay', 'New Zealand']
Countries that need quarantine: 29
['Andorra', 'Austria', 'Belgium', 'Bulgaria', 'Spain', 'Netherlands', 'Croatia', 'Ireland', 'Iceland', 'Italy', 'Greece', 'Luxembourg', 'Malta', 'Monaco', 'Norway', 'Poland', 'Portugal', 'France', 'Sweden', 'Romania', 'Germany', 'San Marino', 'Slovakia', 'Slovenia', 'Switzerland', 'Denmark', 'Czechia', 'Hungary', 'United Kingdom

In [111]:
a2_status

{'AD': {'val': 266.5, 'note': True, 'fom': False},
 'AT': {'val': 59.7, 'note': True, 'fom': False},
 'BE': {'val': 47.1, 'note': True, 'fom': False},
 'BG': {'val': 24.1, 'note': True, 'fom': False},
 'ES': {'val': 265.5, 'note': True, 'fom': False},
 'NL': {'val': 57.2, 'note': True, 'fom': False},
 'HR': {'val': 91.4, 'note': True, 'fom': False},
 'IE': {'val': 38.9, 'note': True, 'fom': False},
 'IS': {'val': 19.6, 'note': True, 'fom': False},
 'IT': {'val': 31.9, 'note': True, 'fom': False},
 'GR': {'val': 27.2, 'note': True, 'fom': False},
 'CY': {'val': 4.1, 'note': True, 'fom': True},
 'LT': {'val': 15.6, 'note': True, 'fom': True},
 'LI': {'val': 7.8, 'note': True, 'fom': True},
 'LU': {'val': 88.8, 'note': True, 'fom': False},
 'LV': {'val': 4.3, 'note': True, 'fom': True},
 'MT': {'val': 68.9, 'note': True, 'fom': False},
 'MC': {'val': 124.0, 'note': True, 'fom': False},
 'NO': {'val': 23.3, 'note': True, 'fom': False},
 'PL': {'val': 20.5, 'note': True, 'fom': False},
 'PT

In [112]:
a2_status['GB'] = a2_status['UK']
a2_status['EE'] = {'val': None, 'note': False, 'fom': True}
a2_status

{'AD': {'val': 266.5, 'note': True, 'fom': False},
 'AT': {'val': 59.7, 'note': True, 'fom': False},
 'BE': {'val': 47.1, 'note': True, 'fom': False},
 'BG': {'val': 24.1, 'note': True, 'fom': False},
 'ES': {'val': 265.5, 'note': True, 'fom': False},
 'NL': {'val': 57.2, 'note': True, 'fom': False},
 'HR': {'val': 91.4, 'note': True, 'fom': False},
 'IE': {'val': 38.9, 'note': True, 'fom': False},
 'IS': {'val': 19.6, 'note': True, 'fom': False},
 'IT': {'val': 31.9, 'note': True, 'fom': False},
 'GR': {'val': 27.2, 'note': True, 'fom': False},
 'CY': {'val': 4.1, 'note': True, 'fom': True},
 'LT': {'val': 15.6, 'note': True, 'fom': True},
 'LI': {'val': 7.8, 'note': True, 'fom': True},
 'LU': {'val': 88.8, 'note': True, 'fom': False},
 'LV': {'val': 4.3, 'note': True, 'fom': True},
 'MT': {'val': 68.9, 'note': True, 'fom': False},
 'MC': {'val': 124.0, 'note': True, 'fom': False},
 'NO': {'val': 23.3, 'note': True, 'fom': False},
 'PL': {'val': 20.5, 'note': True, 'fom': False},
 'PT

In [113]:
from arcgis.gis import GIS
from copy import deepcopy

In [114]:
gis = GIS(profile="COVDemo")

In [115]:
fom_lu = {True: 1, False: 2}
fom_lu[True]

1

In [116]:
current_id = r'abe1b723785f4ecfbb602697d50872a0'

In [117]:
import pytz
from dateutil.tz import tzlocal

In [118]:
date

datetime.datetime(2020, 9, 11, 0, 0)

In [120]:
regq = gis.content.get(current_id)
regions_flayer = regq.layers[0]
regions_fset = regions_flayer.query()
sdf = regions_fset.sdf
all_features = regions_fset.features

features_to_update = []
#a2_status = {}

if len(a2_status) == 0 or pd.isnull(sdf.lastUpdate.max()) or date.astimezone(pytz.utc) > sdf.lastUpdate.max().replace(tzinfo=pytz.utc):
    for a2 in sdf.ISO_2DIGIT.values:
        print(a2, end=', ')
        original_features = [f for f in all_features if f.attributes['ISO_2DIGIT'] == a2] # query the layer
        for original_feature in original_features:
            feature_to_be_updated = deepcopy(original_feature) # copy the original thing
            del feature_to_be_updated.attributes['SHAPE']
            print(a2, end = ' ')
            if a2 in a2_status:
                print('found')
                curVal = a2_status[a2]['val']
                prevVal = feature_to_be_updated.attributes['activeCasesp100k']
                feature_to_be_updated.attributes['activeCasesp100k'] = curVal
                feature_to_be_updated.attributes['prevActiveCp100k'] = prevVal
                feature_to_be_updated.attributes['addInfo'] = int(a2_status[a2]['note'])
                feature_to_be_updated.attributes['StatusCode'] = fom_lu[a2_status[a2]['fom']]
                feature_to_be_updated.attributes['prevUpdate'] = feature_to_be_updated.attributes['lastUpdate']
                feature_to_be_updated.attributes['lastUpdate'] = date
                feature_to_be_updated.attributes['validFrom'] = valid_from
                feature_to_be_updated.attributes['validTo'] = valid_to
                if prevVal and curVal:
                    feature_to_be_updated.attributes['activeCasesTrend'] = 2 if round(prevVal,1) > round(curVal,1) else 0
                    feature_to_be_updated.attributes['activeCasesTrend'] = 1 if round(prevVal,1) < round(curVal,1) else 0
                else:
                    feature_to_be_updated.attributes['activeCasesTrend'] = 0
            else:
                feature_to_be_updated.attributes['activeCasesp100k'] = None
                feature_to_be_updated.attributes['prevActiveCp100k'] = None
                feature_to_be_updated.attributes['addInfo'] = None
                feature_to_be_updated.attributes['StatusCode'] = 9
                feature_to_be_updated.attributes['prevUpdate'] = None
                feature_to_be_updated.attributes['lastUpdate'] = None
                feature_to_be_updated.attributes['validFrom'] = None
                feature_to_be_updated.attributes['validTo'] = None
                feature_to_be_updated.attributes['activeCasesTrend'] = 0
            feature_to_be_updated2 = {}
            feature_to_be_updated2['attributes'] = feature_to_be_updated.attributes
            features_to_update.append(feature_to_be_updated2)
    print('')
    update_result = regions_flayer.edit_features(updates=features_to_update)

AS, AS UM, UM UM UM UM UM UM CK, CK PF, PF UM, UM UM UM UM UM UM UM, UM UM UM UM UM UM NU, NU PN, PN WS, WS TK, TK TO, TO WF, WF UM, UM UM UM UM UM UM UM, UM UM UM UM UM UM SV, SV GT, GT MX, MX CA, CA found
AR, AR FK, FK CL, CL EC, EC PE, PE BO, BO BR, BR PY, PY UY, UY found
GS, GS AQ, AQ FJ, FJ SH, SH AI, AI AG, AG AW, AW BS, BS BB, BB BZ, BZ BM, BM BQ, BQ BQ BQ VG, VG KY, KY CO, CO CR, CR CU, CU CW, CW DM, DM DO, DO GF, GF GD, GD GP, GP GY, GY HT, HT HN, HN JM, JM MQ, MQ MS, MS NI, NI PA, PA PR, PR BQ, BQ BQ BQ BL, BL BQ, BQ BQ BQ KN, KN LC, LC MF, MF PM, PM VC, VC SX, SX SR, SR TT, TT TC, TC VI, VI VE, VE BF, BF CV, CV CI, CI GM, GM GH, GH GI, GI GN, GN GW, GW LR, LR ML, ML MR, MR MA, MA PT, PT found
SN, SN SL, SL GL, GL GG, GG IE, IE found
IM, IM JE, JE GB, GB found
IS, IS found
FO, FO SJ, SJ SJ BV, BV NZ, NZ found
AO, AO BW, BW BI, BI KM, KM CG, CG CD, CD GA, GA KE, KE LS, LS MW, MW TF, TF TF TF MZ, MZ NA, NA RW, RW found
ST, ST ZA, ZA SZ, SZ TZ, TZ ZM, ZM ZW, ZW IO, IO TF, TF TF 

In [None]:
features_to_update