# .gov.ua Website Outage

This notebook analyzes data collected by the `monitor.py` program, which checks which .gov.ua websites (obtained from Wikidata) can be connected to.

In [None]:
! pip install pandas plotly ipyleaflet python-geoip-geolite2 python-geoip-python3

In [1]:
import pandas

df = pandas.read_csv('https://raw.githubusercontent.com/edsu/gov-ua/main/data.csv.gz', parse_dates=['run', 'time'])
df

Unnamed: 0,run,time,url,error
0,2022-02-26 15:21:11.457890,2022-02-26 15:21:19.130888,http://zborivrayrada.gov.ua,HTTPConnectionPool(host='zborivrayrada.gov.ua'...
1,2022-02-26 15:21:11.457890,2022-02-26 15:21:53.422481,http://www.adm-pl.gov.ua,"HTTPConnectionPool(host='www.adm-pl.gov.ua', p..."
2,2022-02-26 15:21:11.457890,2022-02-26 15:21:55.066555,http://pogrda.gov.ua,"HTTPConnectionPool(host='pogrda.gov.ua', port=..."
3,2022-02-26 15:21:11.457890,2022-02-26 15:21:42.104289,http://www.oda.te.gov.ua,"HTTPSConnectionPool(host='oda.te.gov.ua', port..."
4,2022-02-26 15:21:11.457890,2022-02-26 15:21:45.803619,http://www.vberez.gov.ua,"HTTPConnectionPool(host='www.vberez.gov.ua', p..."
...,...,...,...,...
355651,2022-03-12 11:47:24.832034,2022-03-12 11:55:50.505424,http://www.vinrada.gov.ua,"HTTPConnectionPool(host='www.vinrada.gov.ua', ..."
355652,2022-03-12 11:47:24.832034,2022-03-12 11:54:16.047733,http://www.volynrada.gov.ua,HTTPConnectionPool(host='www.volynrada.gov.ua'...
355653,2022-03-12 11:47:24.832034,2022-03-12 11:54:48.945356,http://www.vru.gov.ua,"HTTPConnectionPool(host='www.vru.gov.ua', port..."
355654,2022-03-12 11:47:24.832034,2022-03-12 11:54:51.365681,http://www.yalta-gs.gov.ua,"HTTPConnectionPool(host='www.yalta-gs.gov.ua',..."


In [None]:
counts = df.groupby('run').count()
counts

In [None]:
from plotly import express as xp

# x and y are passed as arrays, so no data_frame argument is needed.
xp.line(
    x=counts.index,
    y=counts.error,
    labels={'x': 'Time (30 minute intervals)', 'y': 'Sites unreachable'},
    title='Ukrainian Government Websites Down (.gov.ua)'
)
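
The jump visible in a chart like this can also be located numerically by looking for the largest increase between consecutive runs. The following is a minimal sketch in plain Python; the `largest_jump` helper and the sample counts are hypothetical illustrations, not values from the dataset.

```python
from datetime import datetime


def largest_jump(counts_by_run):
    """Return (run, increase) for the biggest rise between consecutive runs.

    counts_by_run maps each run timestamp to the number of unreachable sites.
    """
    runs = sorted(counts_by_run)
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    # Pair each run with the one before it and measure the change in errors.
    increases = [
        (counts_by_run[later] - counts_by_run[earlier], later)
        for earlier, later in zip(runs, runs[1:])
    ]
    increase, run = max(increases)
    return run, increase


# Hypothetical counts for illustration only.
sample = {
    datetime(2022, 3, 3, 14, 17): 210,
    datetime(2022, 3, 3, 14, 47): 212,
    datetime(2022, 3, 3, 15, 17): 640,
    datetime(2022, 3, 3, 15, 47): 655,
}
run, increase = largest_jump(sample)
print(run, increase)
```

With the `counts` frame above, `counts['error'].to_dict()` would be one way to build the input mapping.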

While there have been blips here and there, it looks like a sustained outage began on March 3 at 15:17. Can we zoom in to see which websites these are? We can take the observations from the two hours before that point and everything after it, and see which hostnames differ.

In [None]:
from datetime import datetime

after = df[df['run'] >= datetime(2022, 3, 3, 15, 17, 0)]
just_before = df[(df['run'] >= datetime(2022, 3, 3, 13, 17, 0)) & (df['run'] < datetime(2022, 3, 3, 15, 17, 0))]
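
The same two windows can be derived from a single outage timestamp rather than typed out twice. This is a sketch under the assumption that `run` values are timezone-naive datetimes, as in the comparisons above; `OUTAGE_START`, `LOOKBACK`, and `classify_run` are names introduced here for illustration.

```python
from datetime import datetime, timedelta

OUTAGE_START = datetime(2022, 3, 3, 15, 17, 0)
LOOKBACK = timedelta(hours=2)


def classify_run(run):
    """Label a run timestamp as 'before', 'after', or None (outside both windows)."""
    if run >= OUTAGE_START:
        return "after"
    if run >= OUTAGE_START - LOOKBACK:
        return "before"
    return None


print(classify_run(datetime(2022, 3, 3, 13, 17)))  # start of the look-back window
print(classify_run(datetime(2022, 3, 3, 15, 17)))  # first run of the outage
print(classify_run(datetime(2022, 3, 3, 9, 0)))    # earlier than the look-back window
```

Changing `LOOKBACK` adjusts the comparison window without touching the filtering logic.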

We can get the website URLs for each period:

In [None]:
urls_before = just_before['url'].unique()
urls_after = after['url'].unique()

Now, with a bit of set logic, we can see which website URLs weren't down before but were down after.

In [None]:
urls_down = set(urls_after) - set(urls_before)
urls_down

In [None]:
len(urls_down)
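
The same set operations can answer related questions, such as which sites recovered or which were down in both periods. A small sketch using made-up placeholder URLs (not taken from the dataset):

```python
# Placeholder URLs for illustration; the real sets come from the dataframe.
sample_before = {"http://a.example", "http://b.example", "http://c.example"}
sample_after = {"http://b.example", "http://c.example", "http://d.example"}

newly_down = sample_after - sample_before    # down after, but not before
recovered = sample_before - sample_after     # down before, but not after
down_in_both = sample_before & sample_after  # down in both periods

print(sorted(newly_down))
print(sorted(recovered))
print(sorted(down_in_both))
```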

Scanning the list makes it clear that a large number of these are hostnames involving `gromada.gov.ua`. Gromada in Ukrainian translates to Community in English. Here is one example from the Wayback Machine:

https://web.archive.org/web/20220228201105/https://ezupilska-gromada.gov.ua/
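
Rather than scanning by eye, the hostnames can be tallied by a shared suffix. The sketch below is one way to do this with the standard library; the `count_suffix` helper is hypothetical, and the sample list reuses hostnames that appear elsewhere in this notebook.

```python
from urllib.parse import urlparse


def count_suffix(urls, suffix):
    """Count URLs whose hostname ends with the given suffix."""
    return sum(
        1 for url in urls
        if (urlparse(url).hostname or "").endswith(suffix)
    )


sample_urls = [
    "https://ezupilska-gromada.gov.ua/",
    "https://zhovtanetska-gromada.gov.ua",
    "https://zhuravnenska-gromada.gov.ua",
    "https://zp.gov.ua",
]
print(count_suffix(sample_urls, "gromada.gov.ua"))
```

Passing `urls_down` in place of `sample_urls` would give the count for the outage itself.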

We can see if it's possible to get a sense of where these hostnames are hosted. First we need an IP address for the host:

In [None]:
from socket import gethostbyname

gethostbyname('ezupilska-gromada.gov.ua')

And then we need to see if we can find geo information for that IP:

In [None]:
from geoip import geolite2

geolite2.lookup('195.248.234.252')

Let's write a function and apply it to our dataset.

In [None]:
from urllib.parse import urlparse 

def geo(url):
    uri = urlparse(url)
    try:
        hostname = uri.netloc
        ip = gethostbyname(hostname)
        loc = geolite2.lookup(ip)
        return loc.location
    except Exception as e:
        print(f"Failed to lookup {url}: {e}")
        return None

geo('https://ezupilska-gromada.gov.ua/')
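
One caveat with `netloc`: it keeps any port or credentials that appear in the URL, and `gethostbyname` cannot resolve a string in that form. The URLs in this dataset look like bare homepages, so this may never matter here, but `urlparse(...).hostname` is a safer choice in general. A quick illustration with a made-up URL:

```python
from urllib.parse import urlparse

# Made-up URL with an explicit port, for illustration only.
uri = urlparse("http://example.gov.ua:8080/index.html")

print(uri.netloc)    # host plus port
print(uri.hostname)  # host only
```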

In [None]:
df2 = pandas.DataFrame({"url": list(urls_down)})
df2

In [None]:
df2['geo'] = df2.url.map(geo)
df2

Unpack the lat/lon into separate columns:

In [None]:
df3 = df2[df2['geo'].notna()].copy()

df3['lat'] = df3['geo'].map(lambda a: a[0])
df3['lon'] = df3['geo'].map(lambda a: a[1])
df3 = df3.drop(columns=['geo'])
df3

Save it so we don't need to recalculate:

In [None]:
df3.to_csv('notebook.csv', index=False)

In [None]:
df3 = pandas.read_csv('notebook.csv')

from ipywidgets import Layout
from ipyleaflet import Map, Marker, MarkerCluster, basemaps

center = (50.44676, 30.51313)

m = Map(center=center, zoom=4, basemap=basemaps.CartoDB.Positron, layout=Layout(height='800px'))


marker = Marker(location=center, draggable=False, title="Kyiv")
m.add_layer(marker)

markers = []
for i, row in df3.iterrows():
    markers.append(Marker(location=(row['lat'], row['lon']), draggable=False, title=row['url']))

marker_cluster = MarkerCluster(markers=markers)
m.add_layer(marker_cluster)
    
m

In [None]:
m.save('outage-map.html', title='Website Outage 2022-03-03')

## All the Websites

It could be useful to get IP addresses and geolocation for the entire dataset, even though these could change over time. To do this we need to pull apart the `geo` function:

In [None]:
from urllib.parse import urlparse 

def ip(url):
    uri = urlparse(url)
    try:
        hostname = uri.netloc
        ip = gethostbyname(hostname)
        print(f'{hostname} -> {ip}')
        return ip
    except Exception as e:
        print(f"Failed to lookup {url}: {e}")
        return None

ip('https://ezupilska-gromada.gov.ua/')

In [None]:
websites = pandas.DataFrame({"homepage": df['url'].unique()})
websites = websites.sort_values('homepage')
websites

In [None]:
websites['ip'] = websites['homepage'].map(ip)
websites

In [None]:
def location(ip):
    try:
        loc = geolite2.lookup(ip)
        if loc:
            print(f'{ip} -> {loc.location}')
            return loc.location
        else:
            print(f'{ip} no location')
    except Exception as e:
        print(e)
    return None

location('77.87.197.41')

In [None]:
websites['location'] = websites['ip'].map(location)

In [None]:
websites['lat'] = websites['location'].map(lambda a: a[0] if a else None)
websites['lon'] = websites['location'].map(lambda a: a[1] if a else None)
websites = websites.drop(columns=['location'])
websites

In [None]:
websites.to_csv('websites.csv', index=False)

## .gov.ua map

We can put all the known website locations on a map.

In [2]:
websites = pandas.read_csv('websites.csv')
websites = websites.dropna()
websites

Unnamed: 0,homepage,ip,lat,lon
0,http://2001.ukrcensus.gov.ua,194.44.147.62,50.4500,30.5233
1,http://academia.gov.ua,176.103.56.62,49.4859,28.3482
3,http://academy.kvs.gov.ua,193.19.229.52,50.4333,30.5167
4,http://adm.od.court.gov.ua,212.90.190.139,50.4500,30.5233
5,http://akim.gov.ua,178.20.153.53,50.4500,30.5233
...,...,...,...,...
1179,https://zhovtanetska-gromada.gov.ua,195.248.234.252,49.2328,28.4810
1180,https://zhuravnenska-gromada.gov.ua,195.248.234.252,49.2328,28.4810
1181,https://zp.gov.ua,80.254.6.205,50.4500,30.5233
1182,https://zpa.court.gov.ua,212.90.190.139,50.4500,30.5233
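
Several of the rows above share an IP address, which suggests that some sites are served from the same host. A sketch of how the sites could be tallied per IP with the standard library, using only the rows visible in the output above:

```python
from collections import Counter

# (homepage, ip) pairs copied from the rows shown above.
rows = [
    ("http://2001.ukrcensus.gov.ua", "194.44.147.62"),
    ("http://academia.gov.ua", "176.103.56.62"),
    ("http://academy.kvs.gov.ua", "193.19.229.52"),
    ("http://adm.od.court.gov.ua", "212.90.190.139"),
    ("http://akim.gov.ua", "178.20.153.53"),
    ("https://zhovtanetska-gromada.gov.ua", "195.248.234.252"),
    ("https://zhuravnenska-gromada.gov.ua", "195.248.234.252"),
    ("https://zp.gov.ua", "80.254.6.205"),
    ("https://zpa.court.gov.ua", "212.90.190.139"),
]

# Count how many homepages resolve to each IP address.
sites_per_ip = Counter(ip_address for _, ip_address in rows)
for ip_address, n in sites_per_ip.most_common(2):
    print(ip_address, n)
```

On the full frame, `websites['ip'].value_counts()` would give the same kind of tally.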


In [3]:
from ipywidgets import Layout
from ipyleaflet import Map, Marker, MarkerCluster, basemaps

center = (50.44676, 30.51313)

website_map = Map(center=center, zoom=4, basemap=basemaps.CartoDB.Positron, layout=Layout(height='800px'))

markers = []
for i, row in websites.iterrows():
    markers.append(Marker(location=(row['lat'], row['lon']), draggable=False, title=row['homepage']))

marker_cluster = MarkerCluster(markers=markers)
website_map.add_layer(marker_cluster)
website_map

Map(center=[50.44676, 30.51313], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', '…

In [9]:
#website_map.save('website-map.html', title='Ukrainian Government Website Hosting')
help(website_map.save)

Help on method save in module ipyleaflet.leaflet:

save(outfile, **kwargs) method of ipyleaflet.leaflet.Map instance
    Save the Map to an .html file.
    
    Parameters
    ----------
    outfile: str or file-like object
        The file to write the HTML output to.
    kwargs: keyword-arguments
        Extra parameters to pass to the ipywidgets.embed.embed_minimal_html function.


