# Extraction of quarantine information from NOTAMs


In this notebook, we use the preprocessed NOTAMs dataset to search the messages for the terms "quarantine" and "isolation" and copy the matching messages to a separate column in the dataframe. We also run Named Entity Recognition (NER) on the filtered text to identify DATE entities, on the assumption that a DATE entity in such a message corresponds to the quarantine duration.


**Input**

To generate the input dataset, refer to the notebook ws2_snr_NOTAMs_1_data_preparation.

Preprocessed datasets

    - valid_airport_notams_xx.csv
    - valid_airspace_notams_xx.csv

**Output**

    - valid_airport_notams_with_quarantine_xx.csv
    - valid_airspace_notams_with_quarantine_xx.csv

Datasets with additional columns corresponding to quarantine related text


The following steps are carried out:

1. Read the preprocessed dataset

2. Extract quarantine related text

3. Save the file

4. Review the observations

In [1]:
import requests


import spacy

from collections import Counter, defaultdict

import pandas as pd
import os
import csv
import itertools
import re
import json
import numpy as np
import matplotlib.pyplot as plt
import datetime
import ast

from spacy_langdetect import LanguageDetector
import plac
from spacy.lang.en import English
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc, Span, Token

from nltk.tokenize import word_tokenize
from nltk.tokenize import sent_tokenize
from nltk.tokenize import regexp_tokenize
from nltk.tokenize import TweetTokenizer
from nltk.stem import WordNetLemmatizer

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.naive_bayes import MultinomialNB

from wordcloud import WordCloud
from spacy import displacy
import seaborn as sbs
import geonamescache


plt.style.use('fivethirtyeight')
%matplotlib inline

**1. Read the preprocessed dataset**

In [2]:
apt_df = pd.read_csv("/project_data/data_asset/ws2/notams/valid_airport_notams_20200717.csv")
asp_df = pd.read_csv("/project_data/data_asset/ws2/notams/valid_airspace_notams_20200717.csv")

**2. Extract quarantine related text**

- Load spacy

- Filter for text containing the terms "quarantine" and "isolation"

- Extract DATE entities that mention a day or a week from the filtered text
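
The keyword filter depends on how the `tokens` column is stored. The sketch below assumes that `tokens` comes back from the CSV as the string form of a Python list, which is what the dataframe output further down suggests. `has_quarantine_term` and `QUARANTINE_TERMS` are hypothetical names used for illustration; they are not part of this notebook.

```python
import ast

# Terms searched for in this notebook
QUARANTINE_TERMS = {"quarantine", "isolation"}

def has_quarantine_term(tokens):
    """Return True if any quarantine term appears as an exact token.

    `tokens` may be a list, or its string form as stored in a CSV
    (assumption: the string is a valid Python list literal).
    """
    if isinstance(tokens, str):
        tokens = ast.literal_eval(tokens)  # "['a', 'b']" -> ['a', 'b']
    return not QUARANTINE_TERMS.isdisjoint(tokens)

print(has_quarantine_term("['covid', 'quarantine', 'day']"))
print(has_quarantine_term(['flight', 'isolation']))
print(has_quarantine_term("['quarantined', 'area']"))
```

A plain `'quarantine' in row['tokens']` on the raw string is a substring test, so it would also match a token such as `quarantined`. Parsing the list first restricts the filter to exact tokens; which behaviour is preferable depends on how the tokens were lemmatized upstream.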

In [3]:
############
#Stop words#
############

nlp_ = spacy.load('en_core_web_md')

# Adding stop words
new_stop_words = ["create","source","euecyiyn",'etczyoyx','tel']

# Add airport codes to stop words
new_stop_words.extend([ac.lower() for ac in list(apt_df.airportCode.values)])

for new_word in new_stop_words:
    nlp_.vocab[new_word].is_stop = True

In [4]:
def extract_quarantine_info(df):
    """Return a copy of df with 'quarantine_text' and 'quarantine_days' columns added."""
    quarantine_duration_df = df.copy()
    quarantine_duration_df['quarantine_text'] = ""
    quarantine_duration_df['quarantine_days'] = ""
    for idx, row in quarantine_duration_df.iterrows():
        # 'tokens' is a string when read back from CSV, so this is a substring check
        if ('quarantine' not in row['tokens']) and ('isolation' not in row['tokens']):
            continue
        # Run NER only on messages that pass the keyword filter
        message = row['cleaned_message']
        doc_ = nlp_(message)
        # Keep DATE entities that mention days or weeks
        quarantine_days = [ent.text for ent in doc_.ents
                           if ent.label_ == "DATE"
                           and ("DAY" in ent.text.upper() or "WEEK" in ent.text.upper())]
        #spacy.displacy.render(doc_, jupyter=True, style='ent',options={'ents':['DATE']})
        if quarantine_days:
            quarantine_duration_df.loc[idx, 'quarantine_days'] = ",".join(quarantine_days)
            # Store the message once, rather than once per matched entity
            quarantine_duration_df.loc[idx, 'quarantine_text'] = message

    return quarantine_duration_df

In [5]:
apt_df = extract_quarantine_info(apt_df)
asp_df = extract_quarantine_info(asp_df)

**3. Save the file**

In [6]:
apt_df.to_csv("/project_data/data_asset/ws2/notams/valid_airport_notams_with_quarantine_20200717.csv",index=False,quoting=csv.QUOTE_NONNUMERIC)
asp_df.to_csv("/project_data/data_asset/ws2/notams/valid_airspace_notams_with_quarantine_20200717.csv",index=False,quoting=csv.QUOTE_NONNUMERIC)

**4. Observations**

In [7]:
apt_df[apt_df.quarantine_days != '']

Unnamed: 0,message,Qcode,createdDate,Closed,airportName,airportCode,cityName,countryCode,countryName,latitude,longitude,tokens,cleaned_message,quarantine_text,quarantine_days
40,CONT:3.ACFT REQ RON MUST OBTAIN DGCA AIR TRANS...,XXXX,2020-06-20 06:22:00+00:00,False,Kuwait Intl,OKBK,Kuwait,KWT,Kuwait,29.226767,47.979953,"['cont', 'aircraft', 'request', 'ron', 'obtain...",cont 3 aircraft request ron must obtain dgca a...,cont 3 aircraft request ron must obtain dgca a...,14 day
41,1. PER FCG & IAW GOVT OF KW COUNTER-COVID-19 P...,XXXX,2020-06-20 06:20:00+00:00,False,Kuwait Intl,OKBK,Kuwait,KWT,Kuwait,29.226767,47.979953,"['fcg', 'iaw', 'govt', 'counter', 'covid', 'pa...",1 per fcg iaw govt of kw counter covid 19 pand...,1 per fcg iaw govt of kw counter covid 19 pand...,14 day quarantine
54,COVID-19: ORDERS OF THE STATE GOVERNMENT OF BE...,FAXX,2020-06-29 08:22:00+00:00,False,Tegel,EDDT,Berlin,DEU,Germany,52.559686,13.287711,"['covid', 'order', 'state', 'government', 'ber...",covid 19 orders of the state government of ber...,covid 19 orders of the state government of ber...,14 days
88,"FOLLOWING THE DECLARATION OF WHO, MINISTRY OF ...",OEXX,,False,Kigali Intl,HRYR,Kigali,RWA,Rwanda,-1.968447,30.138386,"['follow', 'declaration', 'ministry', 'health'...",following the declaration of who ministry of h...,following the declaration of who ministry of h...,"7 days,the day"
290,AMEND APP/TWR HR SER: 2200 MON TO 0330 TUE 220...,STAH,2020-06-19 18:30:00+00:00,False,Rarotonga Intl,NCRG,Rarotonga I,COK,Cook Is,-21.200875,-159.79495,"['amend', 'app', 'aerodrome', 'control', 'towe...",amend app aerodrome control tower hour service...,amend app aerodrome control tower hour service...,"2200 monday,tuesday,thursday,friday 2200 frida..."
293,- FLIGHTS TO VIETNAM: NOT ALLOWED TO CARRY PAS...,AFXX,2020-06-16 04:26:00+00:00,False,Tan Son Nhat Intl,VVTS,Ho Chi Minh,VNM,Vietnam,10.820556,106.660833,"['flight', 'vietnam', 'allow', 'carry', 'passe...",flights to vietnam not allowed to carry passen...,flights to vietnam not allowed to carry passen...,14 days
321,"DUE TO THE COVID-19 OUTBREAK, THE GOVERNMENT O...",XXXX,2020-06-29 14:06:00+00:00,False,Bradshaw Intl,TKPK,St Kitts I.,KNA,St Kitts And Nevis,17.311181,-62.718686,"['covid', 'outbreak', 'government', 'kitts', '...",due to the covid 19 outbreak the government of...,due to the covid 19 outbreak the government of...,"fourteen 14 day quarantine,14 days"
345,BAILIWICK OF GUERNSEY COVID-19 PROCEDURES: EXE...,FAXX,2020-06-30 09:04:00+00:00,False,Guernsey,EGJB,Guernsey,GBR,UK,49.434931,-2.6028,"['bailiwick', 'guernsey', 'covid', 'procedure'...",bailiwick of guernsey covid 19 procedures exer...,bailiwick of guernsey covid 19 procedures exer...,"14 days,14 days,14 days"
379,COVID-19: ORDERS OF THE STATE GOVERNMENT OF BR...,FAXX,2020-06-15 15:58:00+00:00,False,Schonefeld,EDDB,Berlin,DEU,Germany,52.362247,13.500672,"['covid', 'order', 'state', 'government', 'bra...",covid 19 orders of the state government of bra...,covid 19 orders of the state government of bra...,14 days


Airport NOTAMs

* NOTE: for some rows, the quarantine_days column does not hold the quarantine duration but some other DATE entity in the text. Row 290 (Rarotonga Intl), for example, captures weekdays from a service-hours amendment.

* In the dataframe above, the extracted duration is 14 days in most cases. The message text still has to be read to get the exact quarantine regulations.

* Further work is needed to identify the different quarantine restrictions.
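
As a possible first step towards that further work, the extracted entity text could be converted into a number of days, discarding entities that carry no explicit duration. This is a minimal sketch under stated assumptions: `duration_in_days`, `DURATION_RE`, `WORD_RE` and the small `WORD_NUMBERS` lookup are hypothetical and deliberately incomplete, and the sample strings are taken from the quarantine_days values shown in this notebook.

```python
import re

# Digits followed by a day/week unit, e.g. "14 days", "2 weeks"
DURATION_RE = re.compile(r"\b(\d{1,3})\s*(day|week)s?\b")

# Illustrative lookup for spelled-out numbers (assumption: not exhaustive)
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "seven": 7, "ten": 10, "fourteen": 14}
WORD_RE = re.compile(r"\b(" + "|".join(WORD_NUMBERS) + r")\s+(day|week)s?\b")

def duration_in_days(entity_text):
    """Return the duration in days, or None if no explicit duration is found."""
    text = entity_text.lower()
    match = DURATION_RE.search(text)
    if match:
        value = int(match.group(1))
    else:
        match = WORD_RE.search(text)
        if match is None:
            return None
        value = WORD_NUMBERS[match.group(1)]
    # Convert weeks to days so that all results share one unit
    return value * 7 if match.group(2) == "week" else value

for text in ["14 days", "two week", "14th day", "the day", "2200 monday"]:
    print(text, "->", duration_in_days(text))
```

With this rule, "14 days" and "two week" both map to 14, while "14th day", "the day" and "2200 monday" map to `None`, because none of them states a number followed directly by a day or week unit. It does not decide whether a duration actually refers to quarantine; that still requires reading the message.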

In [8]:
asp_df[asp_df.quarantine_days != '']

Unnamed: 0,message,Qcode,createdDate,Closed,FIRcode,FIRname,countryCode,countryName,tokens,cleaned_message,quarantine_text,quarantine_days
16,- FLIGHTS TO VIETNAM: NOT ALLOWED TO CARRY PAS...,AFXX,2020-06-16 04:26:00+00:00,False,VVVV,HANOI,VNM,Viet Nam,"['flight', 'vietnam', 'allow', 'carry', 'passe...",flights to vietnam not allowed to carry passen...,flights to vietnam not allowed to carry passen...,14 days
59,COVID-19. PASSENGER RESTRICTIONS: ALL FLIGHT A...,OEXX,2020-07-15 10:44:00+00:00,False,LIMM,MILANO,ITA,Italy,"['covid', 'passenger', 'restriction', 'flight'...",covid 19 passenger restrictions all flight arr...,covid 19 passenger restrictions all flight arr...,14 days
63,"COVID19, TRAVEL RESTRICTIONS: NORWAY HAS START...",XXXX,2020-07-10 12:50:00+00:00,False,ENOB,BODO OCEANIC,NOR,Norway,"['travel', 'restriction', 'norway', 'start', '...",covid19 travel restrictions norway has started...,covid19 travel restrictions norway has started...,10 days
87,COVID-19. PASSENGER RESTRICTIONS: ALL FLIGHT A...,OEXX,2020-07-15 10:44:00+00:00,False,LIRR,ROMA,ITA,Italy,"['covid', 'passenger', 'restriction', 'flight'...",covid 19 passenger restrictions all flight arr...,covid 19 passenger restrictions all flight arr...,14 days
124,COVID-19: CREWS/PASSENGERS REQUIREMENTS THE RE...,AFXX,2020-07-13 22:27:00+00:00,False,LOVV,WIEN,AUT,Austria,"['covid', 'crew', 'passenger', 'requirement', ...",covid 19 crews passengers requirements the rep...,covid 19 crews passengers requirements the rep...,"the last 14 days,older than 4 days,14 days,old..."
167,COVID-19: ALL PERSONS ENTERING THE TERRITORY O...,AFXX,2020-07-15 20:34:00+00:00,False,LBSR,SOFIA,BGR,Bulgaria,"['covid', 'person', 'enter', 'territory', 'rep...",covid 19 all persons entering the territory of...,covid 19 all persons entering the territory of...,14 days
171,IN ORDER TO PREVENT FURTHER OUTBREAK OF COVID-...,AFXX,2020-07-12 09:23:00+00:00,False,OIIX,TEHRAN,IRN,Iran (Islamic Republic of),"['order', 'prevent', 'outbreak', 'covid', 'inf...",in order to prevent further outbreak of covid ...,in order to prevent further outbreak of covid ...,"two week,14th day"
189,- FLIGHTS TO VIETNAM: NOT ALLOWED TO CARRY PAS...,AFXX,2020-06-16 04:26:00+00:00,False,VVTS,HO-CHI-MINH,VNM,Viet Nam,"['flight', 'vietnam', 'allow', 'carry', 'passe...",flights to vietnam not allowed to carry passen...,flights to vietnam not allowed to carry passen...,14 days
190,"FOLLOWING THE DECLARATION OF WHO, MINISTRY OF ...",OEXX,,False,HRYR,KIGALI,RWA,Rwanda,"['follow', 'declaration', 'ministry', 'health'...",following the declaration of who ministry of h...,following the declaration of who ministry of h...,"7 days,the day"
211,COVID-19: AUSTRALIA - TRAVEL RESTRICTIONS AND ...,PCCA,2020-07-05 21:53:00+00:00,False,YBBB,BRISBANE,AUS,Australia,"['covid', 'australia', 'travel', 'restriction'...",covid 19 australia travel restrictions and pas...,covid 19 australia travel restrictions and pas...,14 days


Airspace NOTAMs


- Norway has a quarantine duration of 10 days.

- South Africa has a quarantine duration of up to 21 days.

- In some cases, both 7 and 14 days are mentioned in the message. The message has to be read to understand the exact quarantine regulation.
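
Messages that mention more than one duration could be flagged automatically so that they are read first. The sketch below is one way to do this, assuming the comma-joined format of the quarantine_days column; `distinct_day_counts` and `DAYS_RE` are hypothetical names, and the sample values are copied from the output above (the fifth is only the visible part of a truncated cell).

```python
import re
from collections import Counter

# "N day" or "N days", where N is a 1-3 digit number
DAYS_RE = re.compile(r"\b(\d{1,3})\s*days?\b")

def distinct_day_counts(quarantine_days):
    """Return the sorted distinct 'N day(s)' values in a comma-joined string."""
    return sorted({int(n) for n in DAYS_RE.findall(quarantine_days.lower())})

samples = [
    "14 days",
    "10 days",
    "7 days,the day",
    "14 days,14 days,14 days",
    "the last 14 days,older than 4 days,14 days",
]

for value in samples:
    days = distinct_day_counts(value)
    label = "ambiguous" if len(days) > 1 else "single"
    print(f"{value!r}: {days} ({label})")

# Most frequent duration among rows with exactly one distinct value
counts = Counter(d[0] for d in map(distinct_day_counts, samples) if len(d) == 1)
print(counts.most_common(1))
```

Only the last sample is flagged as ambiguous, since it contains both 4 and 14 days. Repeated mentions of the same value, as in the Guernsey row, count as a single duration.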

**Author**

* Shri Nishanth Rajendran - AI Development Specialist, R² Data Labs, Rolls Royce