# Sanctions on Russia

This notebook contains prototype code for a scraper that loads a list of sanctions imposed on Russia.

Data source: https://www.ashurst.com/en/news-and-insights/hubs/sanctions-tracker/


## Autoreload

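The `autoreload` extension reloads imported Python modules before each cell is executed, so code changes take effect without restarting the kernel.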

In [None]:
%load_ext autoreload
%autoreload 2

## Working directory

Let's make sure we are at the root working directory (not notebooks).

In [None]:
%pwd

In [None]:
%cd ..
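
Running `%cd ..` more than once moves above the project root. A minimal sketch of an idempotent alternative follows, assuming the notebooks live in a folder named `notebooks` directly under the root; the helper name `ensure_project_root` is hypothetical.

```python
import os
from pathlib import Path


def ensure_project_root(notebook_dir_name="notebooks"):
    """Move to the parent directory only when the current one is the notebooks folder."""
    cwd = Path.cwd()
    if cwd.name == notebook_dir_name:
        # Assumption: the notebooks folder sits directly under the project root.
        os.chdir(cwd.parent)
    return Path.cwd()


print(ensure_project_root())
```

Calling it again once the working directory is already the root leaves the directory unchanged.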

## Logging
This sets up basic logging: every message at `DEBUG` level or above is written to standard output.

In [None]:
import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)
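
Re-running the cell above attaches another handler to the root logger, so each message is then printed more than once. A minimal sketch of a guard against that is shown here; the function name `setup_logging`, the marker attribute and the logger name `notebook` are illustrative choices, not part of the project.

```python
import logging
import sys


def setup_logging(level=logging.DEBUG):
    """Attach one stdout handler to the root logger; repeated calls reuse it."""
    root = logging.getLogger()
    root.setLevel(level)
    # Reuse the handler marked by a previous call instead of adding a new one.
    for handler in root.handlers:
        if getattr(handler, "_notebook_handler", False):
            handler.setLevel(level)
            return handler
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    )
    handler._notebook_handler = True  # marker attribute used only by this sketch
    root.addHandler(handler)
    return handler


setup_logging()
setup_logging()  # the second call finds and reuses the existing handler
logging.getLogger("notebook").debug("logging is configured")
```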

## Factory

The scraper factory provides a simple, programmatic way to load scrapers dynamically, without having to declare an import for each one.

In [None]:
from iea_scraper.core import factory
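
How `iea_scraper` implements dynamic loading is not shown in this notebook. As a rough illustration of the general idea of loading code by name, the sketch below uses the standard library's `importlib`. The function `load_attribute` is hypothetical and is not part of `iea_scraper`.

```python
import importlib


def load_attribute(module_name, attribute_name):
    """Import a module by its dotted name and return one attribute from it."""
    module = importlib.import_module(module_name)
    return getattr(module, attribute_name)


# Stand-in example: a standard-library class is used instead of a scraper package.
decoder_cls = load_attribute("json.decoder", "JSONDecoder")
print(decoder_cls().decode('{"full_load": true}'))
```

The caller only supplies strings, so no import statement for the target module is needed at the call site.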

### Loading a scraper

The command to load a scraper using the factory is:

```
    job = factory.get_scraper_job(<module name>, <scraper package name>, [<optional parameters>, ...])
```
For example:

In [None]:
job = factory.get_scraper_job('br_gov_anp', 'br_oil_prod', full_load=True)

In [None]:
del job

## Checking website

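Let's inspect the website first. The cell below uses `pandas.read_html`, which relies on an HTML parser such as lxml or BeautifulSoup, to load each table on the page into a DataFrame.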

In [None]:
import pandas as pd

countries = ['UK', 'EU', 'Japan', 'Australia']
url = 'https://www.ashurst.com/en/news-and-insights/hubs/sanctions-tracker/'

dfs = pd.read_html(url)

if len(countries) != len(dfs):
    raise ValueError(f'The number of tracked countries ({len(countries)}) is different from what we get from the website ({len(dfs)}). Please check the website.')

for c, df in zip(countries, dfs):
    df['country'] = c

df = pd.concat(dfs)

def convert_date(date):
    """Return the date text with any trailing '(note)' removed."""
    if '(' not in date:
        return date.strip()
    return date.split('(')[0].strip()


# extract the parenthesised note first, while the raw text is still available
df['Notes'] = df['Date of imposition'].apply(lambda d: None if '(' not in d else d.split('(')[1].split(')')[0])

# then drop the note from the date text and parse what remains
df['Date of imposition'] = df['Date of imposition'].apply(convert_date)
df['Date of imposition'] = pd.to_datetime(df['Date of imposition'], errors='coerce', format='%d %B %Y')


display(df)

In [None]:
df['Notes'].drop_duplicates()

In [None]:
from datetime import datetime

date = '24 February 2022'
d = datetime.strptime(date, '%d %B %Y')

In [None]:
d
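
The split-and-parse logic above can be checked offline on a few strings. This is a minimal sketch using only the standard library. The helper name `split_date_note` and the second and third sample values are made up for illustration, and the `'<date> (<note>)'` layout is assumed from the cleaning code above rather than confirmed against the website.

```python
from datetime import datetime


def split_date_note(text, fmt="%d %B %Y"):
    """Split '<date> (<note>)' into a (datetime or None, note or None) pair."""
    date_part, sep, rest = text.partition("(")
    # A note exists only when an opening parenthesis was found.
    note = rest.split(")")[0].strip() if sep else None
    try:
        parsed = datetime.strptime(date_part.strip(), fmt)
    except ValueError:
        parsed = None  # mirrors errors='coerce' in pd.to_datetime
    return parsed, note


# Hypothetical sample values, not taken from the tracker.
for sample in ["24 February 2022", "15 March 2022 (amended)", "TBC"]:
    print(split_date_note(sample))
```

Unparseable text yields `None` for the date instead of raising, which makes it easy to spot rows that need a different format.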

## Test scraper

In [None]:
job = factory.get_scraper_job('com_ashurst', 'russia_sanctions')

In [None]:
job.run()