# Scraping sanctions information on Russia - chronologically

I scraped a list of the sanctions applied to the Russian Federation since the beginning of the conflict; the date range used below covers February 2022 through June 2023. The list is published on the Minfin index site:
* https://index.minfin.com.ua/en/russian-invading/sanctions/

In [None]:
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta

In [2]:
base_url = 'https://index.minfin.com.ua/en/russian-invading/sanctions/'
start_date = datetime.datetime(2022, 2, 1)  # first month to scrape
end_date = datetime.datetime(2023, 7, 1)    # exclusive: the last month scraped is June 2023
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
}

In [3]:
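# building one URL per month; end_date itself is not included
url_list = []

current_date = start_date
while current_date < end_date:
    url = base_url + current_date.strftime('%Y-%m') + '/'
    url_list.append(url)
    current_date += relativedelta(months=1)

# downloading and parsing each monthly page
soup_list = []

for url in url_list:
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # fail on HTTP errors instead of parsing an error page
    soup = bs(response.content, 'html.parser')
    soup_list.append(soup)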
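
As a quick offline sanity check of the month loop, the sketch below rebuilds the same URL list using only the standard library and prints how many pages the date range covers. The helper name `month_urls` is hypothetical and not part of the notebook; it assumes, as the loop above does, that `end_date` is an exclusive bound.

```python
import datetime

# Same constants as in the notebook cell above.
base_url = 'https://index.minfin.com.ua/en/russian-invading/sanctions/'
start_date = datetime.date(2022, 2, 1)
end_date = datetime.date(2023, 7, 1)  # exclusive upper bound


def month_urls(start, end, base):
    """Return one URL per calendar month in the half-open range [start, end).

    Hypothetical helper, written for illustration only.
    """
    urls = []
    year, month = start.year, start.month
    while (year, month) < (end.year, end.month):
        urls.append(f'{base}{year:04d}-{month:02d}/')
        # Advance by one month, rolling over to January after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return urls


urls = month_urls(start_date, end_date, base_url)
print(len(urls))  # number of monthly pages to request
print(urls[0])    # first month in the range
print(urls[-1])   # last month in the range
```

Checking the first and last entries before sending any requests makes it easy to confirm that the range ends where intended.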

**I then iterate over every BeautifulSoup object created above and extract the date, subject, branch, and description of each entry using the bs4 `find` and `find_all` methods.**

In [4]:
# one dictionary per sanction entry
data = []

# iterating over each bs4 object in soup_list and extracting the desired information
for soup in soup_list:
    items = soup.find_all('li', class_='gold')

    # iterating over each item
    for item in items:
        date_element = item.find('big', class_='gold')
        subject_element = item.find('div', class_='invading-subject')
        branch_element = item.find('div', class_='invading-branch')
        description_element = item.find('p')

        # checking if the elements are found before accessing their 'text' attribute
        date = date_element.text.strip() if date_element else ''
        subject = subject_element.text.strip() if subject_element else ''
        branch = branch_element.text.strip() if branch_element else ''
        description = description_element.text.strip() if description_element else ''

        # appending the extracted information to the data list as a dictionary
        data.append({
            "Date": date,
            "Subject": subject,
            "Branch": branch,
            "Description": description
        })

# creating a dataframe from the data list
df = pd.DataFrame(data)
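
To see how the selectors and the `None` guard behave without hitting the network, here is a minimal sketch that runs the same extraction on a small HTML fragment. The fragment is hypothetical: its tag and class names are taken from the selectors above, but the nesting and the text are invented, so the real pages may be structured differently. The helper `text_or_empty` is also an assumption made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: tag and class names mirror the selectors used in
# the notebook, but the layout and the text are invented for illustration.
sample_html = """
<ul>
  <li class="gold">
    <big class="gold">01.01.2022</big>
    <div class="invading-subject">Example Corp (Retail)</div>
    <div class="invading-branch">Example restriction</div>
    <p>Example description.</p>
  </li>
  <li class="gold">
    <big class="gold">02.01.2022</big>
    <p>Entry without subject or branch.</p>
  </li>
</ul>
"""


def text_or_empty(element):
    """Return the stripped text of a tag, or '' when find() returned None."""
    return element.get_text(strip=True) if element is not None else ''


sample_soup = BeautifulSoup(sample_html, 'html.parser')
sample_rows = []
for item in sample_soup.find_all('li', class_='gold'):
    sample_rows.append({
        'Date': text_or_empty(item.find('big', class_='gold')),
        'Subject': text_or_empty(item.find('div', class_='invading-subject')),
        'Branch': text_or_empty(item.find('div', class_='invading-branch')),
        'Description': text_or_empty(item.find('p')),
    })

for row in sample_rows:
    print(row)
```

The second entry has no subject or branch element, so `find` returns `None` for both and the guard stores an empty string instead of raising an `AttributeError`.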

In [5]:
df

| | Date | Subject | Branch | Description |
|---|---|---|---|---|
| 0 | 28.02.2022 | Volvo Cars (Passenger transport) | Restrictions on goods transportation to and fr... | suspend all shipments to Russia |
| 1 | 28.02.2022 | Pekao SA (Banking) | Restrictions on the provision of services in r... | Transactions in Russian ruble suspended. |
| 2 | 28.02.2022 | Agrokoncernas (Agricultural industry) | Suspension of cooperation with russia and cert... | The group of companies "Agroconcernas", which ... |
| 3 | 28.02.2022 | Czech Republic | Restrictions on russian banks | The Czech National Bank today launched steps t... |
| 4 | 28.02.2022 | Almaty (IT companies) | Restrictions on the russian media | A company providing Internet and television se... |
| ... | ... | ... | ... | ... |
| 2079 | 10.06.2023 | Canada | Seizure (confiscation) of russian property | Canada confiscated the An-124 transport plane ... |
| 2080 | 09.06.2023 | Estonia | Travel and visa bans for russians | The Minister of Foreign Affairs of Estonia, Ma... |
| 2081 | 06.06.2023 | Finland | Expulsion of russian diplomats | The government of Finland decided to expel fro... |
| 2082 | 01.06.2023 | Nintendo (Gaming industry) | Temporary cease of production in russia | The Japanese company developing game consoles ... |
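
The `Date` column holds strings in day.month.year form, and within a month the rows appear in descending order, so the frame is not yet strictly chronological. One possible way to fix that, sketched here on a small stand-in frame rather than on the scraped `df`, is to parse the strings with `pd.to_datetime` and sort. The first three rows reuse dates and subjects from the output above; the fourth row, with an empty date, is hypothetical and stands in for entries where no date element was found.

```python
import pandas as pd

# Small stand-in for the scraped frame. The last row is hypothetical and
# mimics the '' fallback used when no date element is found.
sample = pd.DataFrame({
    'Date': ['28.02.2022', '10.06.2023', '01.06.2023', ''],
    'Subject': ['Czech Republic', 'Canada', 'Nintendo (Gaming industry)', 'No date'],
})

# Parse day.month.year strings; errors='coerce' turns values that do not
# match the format (such as '') into NaT instead of raising.
sample['Date'] = pd.to_datetime(sample['Date'], format='%d.%m.%Y', errors='coerce')

# Mergesort is stable, so entries sharing a date keep their original order.
# Rows with a missing date (NaT) are placed last by default.
sample = sample.sort_values('Date', kind='mergesort').reset_index(drop=True)

print(sample)
```

The same two steps should apply to the full `df`, assuming every non-empty date follows the `dd.mm.yyyy` pattern seen in the output.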
