# Legal entities under sanctions with names and addresses

**At https://sanctions.blackseanews.net/en I found a list of Russia-related sanctioned entities that includes their addresses and the dates on which each set of sanctions took effect.
Among other data transformations, this script transliterates Cyrillic text into the Latin alphabet.**

In [None]:
from bs4 import BeautifulSoup as bs
import requests
import urllib.request
import re
import pandas as pd

from transliterate import translit

In [None]:
url = 'https://sanctions.blackseanews.net/en?page=1&itemsPerPage=3000'
url2 = 'https://sanctions.blackseanews.net/en?page=2&itemsPerPage=3000'
url3 = 'https://sanctions.blackseanews.net/en?page=3&itemsPerPage=3000'

table1 = pd.read_html(url)
table2 = pd.read_html(url2)
table3 = pd.read_html(url3)

df1 = table1[0]
df2 = table2[0]
df3 = table3[0]
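
The three URLs above differ only in the `page` query parameter, so they could be generated instead of typed out. A minimal sketch, assuming the site keeps the `page` and `itemsPerPage` parameters shown above; `page_url` is a hypothetical helper, not part of the original notebook:

```python
from urllib.parse import urlencode

BASE_URL = 'https://sanctions.blackseanews.net/en'

def page_url(page, items_per_page=3000):
    """Build the listing URL for one page (hypothetical helper)."""
    query = urlencode({'page': page, 'itemsPerPage': items_per_page})
    return f'{BASE_URL}?{query}'

# One URL per page of results, pages 1 to 3
urls = [page_url(page) for page in range(1, 4)]
```

Each URL can then be passed to `pd.read_html` in the same way as in the cell above.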

## Data transformation

Because the scraped tables are messy, I preprocess them as follows:

* Rename the columns.
* Reduce each sanctions column to the date it contains.
* Concatenate the three dataframes (one per page of results from the same site).
* Fill NaN values.
* Strip leftover labels and quote characters from the text columns.
* Transliterate Ukrainian Cyrillic text into the Latin alphabet.

In [3]:
dataframes = [df1, df2, df3]

# new column names
column_mapping = {'Sort by': 'ID', 'Unnamed: 1': 'name', 'Unnamed: 2': 'country', 'Unnamed: 3': 'region',
                  'Unnamed: 4': 'address', 'Unnamed: 5': 'UA sanctions', 'Unnamed: 6': 'US sanctions',
                  'Unnamed: 7': 'US export restrictions', 'Unnamed: 8': 'EU sanctions',
                  'Unnamed: 9': 'UK sanctions', 'Unnamed: 10': 'Canadian sanctions'}

for df in dataframes:
    df.rename(columns=column_mapping, inplace=True)

In [4]:
dataframes = [df1, df2, df3]

# columns to apply the function
columns = ['UA sanctions', 'US sanctions', 'US export restrictions', 'EU sanctions', 'UK sanctions', 'Canadian sanctions']

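# return the first YYYY-MM-DD date found in a cell, or None if there is none
def extract_date_from_string(string):
    if not isinstance(string, str):  # empty cells arrive as NaN (a float)
        return None
    date_match = re.search(r"\d{4}-\d{2}-\d{2}", string)  # look for a date pattern
    if date_match:
        return date_match.group()
    return None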


for df in dataframes:
    for col in columns:
        df[col] = df[col].apply(extract_date_from_string)
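
To make the effect of the date extraction on individual cells concrete, here is a small standalone sketch. The cell values are made up for illustration, and `first_iso_date` is a hypothetical name for the same idea: keep the first `YYYY-MM-DD` substring and treat everything else, including missing cells, as `None`.

```python
import re

DATE_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2}')

def first_iso_date(cell):
    """Return the first YYYY-MM-DD substring in cell, or None."""
    if not isinstance(cell, str):  # NaN from an empty table cell is a float
        return None
    match = DATE_PATTERN.search(cell)
    return match.group() if match else None

# Hypothetical cell values, made up for illustration
samples = ['UA sanctions 2017-05-15', 'no date here', float('nan')]
results = [first_iso_date(cell) for cell in samples]
```

Note that the pattern only checks the shape of the date (four digits, two digits, two digits); it does not verify that the month or day is valid.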

In [5]:
df = pd.concat([df1, df2, df3], ignore_index=True)

In [6]:
df.fillna('', inplace=True)

In [8]:
#columns = ['country', 'region', 'address']

#def translate_ru(russian_text):
 #   roman_text = translit(russian_text, 'ru', reversed=True)
  #  return roman_text

#for col in columns:
 #   df[col] = df[col].apply(translate_ru)

In [9]:
def clean_string(value):
    cleaned_value = re.sub(r'^Name in Ukrainian|TOB', '', value) # remove a leading "Name in Ukrainian" label and any "TOB"
    cleaned_value = re.sub(r'[«»"]', '', cleaned_value) # remove guillemets and double quotes
    return cleaned_value.strip() # return the cleaned value

df['name'] = df['name'].apply(clean_string)

In [10]:
# transliterating from Ukrainian Cyrillic to the Latin alphabet
def translate_ua(ukr):
    en = translit(ukr, 'uk', reversed=True)
    return en

df['name'] = df['name'].apply(translate_ua)
df['country'] = df['country'].apply(translate_ua)
df['region'] = df['region'].apply(translate_ua)
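
The `transliterate` package does the real work in the cell above. The underlying idea is a character-by-character lookup, which the toy sketch below illustrates with the standard library only. The mapping table is deliberately tiny and hypothetical; it is not the package's actual table and covers only the letters in the sample word.

```python
# Deliberately tiny, hypothetical mapping covering only the letters used below
TOY_TABLE = str.maketrans({
    'Ш': 'Sh',
    'т': 't',
    'у': 'u',
    'р': 'r',
    'м': 'm',
})

def toy_transliterate(text):
    """Replace each mapped Cyrillic character; leave everything else untouched."""
    return text.translate(TOY_TABLE)

latin = toy_transliterate('Штурм')
```

Characters that are missing from the table pass through unchanged, so text that is already in the Latin alphabet is not affected.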

In [11]:
def extract_country(value):
    country = value.replace('Country', '') # remove the "Country" label (str.replace drops every occurrence, not only a leading one)
    return country.strip()

df['country'] = df['country'].apply(extract_country)

In [12]:
def extract_region(value):
    region = value.replace('Region', '') # remove the "Region" label (str.replace drops every occurrence, not only a leading one)
    return region.strip()

df['region'] = df['region'].apply(extract_region)

In [13]:
def extract_address(value):
    address = value.replace('Address', '') # remove the "Address" label (str.replace drops every occurrence, not only a leading one)
    return address.strip() # return the extracted address

# apply the cleanup to the 'address' column
df['address'] = df['address'].apply(extract_address)
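
The three `extract_*` functions above share one pattern: drop a label, then trim whitespace. A minimal sketch of a single helper that covers all three, assuming Python 3.9+ for `str.removeprefix` and assuming the label always sits at the start of the cell. `strip_label` and the sample values are hypothetical.

```python
def strip_label(value, label):
    """Remove label from the start of value, then trim whitespace."""
    return value.removeprefix(label).strip()

# Hypothetical raw cell values, made up for illustration
country = strip_label('Country Ukraine', 'Country')
region = strip_label('Region Crimea', 'Region')
unlabelled = strip_label('Crimea', 'Region')
```

Unlike `str.replace`, `removeprefix` touches only a leading label, so a value that happens to contain the word elsewhere is left intact. With pandas, the helper could be applied as `df['region'].apply(strip_label, args=('Region',))`.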

### Full dataset

In [14]:
df

| | ID | name | country | region | address | UA sanctions | US sanctions | US export restrictions | EU sanctions | UK sanctions | Canadian sanctions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ID1 | DUP RK Peredhir'ja | Ukraine | Crimea | 32 Pervomaiska St, Krynychne Village, Bilohirs... | 2017-05-15 | | | | | |
| 1 | ID2 | Kryms'ke respublikans'ke pidpryyemstvo Azovs'k... | Ukraine | Crimea | 40 Zaliznychna St, Azovske Urban-Type Settleme... | 2019-03-19 | 2015-12-22 | 2015-12-28 | 2014-07-25 | 2014-07-25 | 2014-08-06 |
| 2 | ID3 | Seljans'ke fermers'ke hospodarstvo Arija-N | Ukraine | Crimea | 25 Vynohradna St, Voskhod Village, Krasnohvard... | 2019-03-19 | | | | | |
| 3 | ID4 | TOV Shturm Perekopu | Ukraine | Crimea | 4 Konstytutsii St, Illinka Village, Krasnopere... | 2019-03-19 | | | | | |
| 4 | ID5 | DBU RK Kryms'kyj ryborozplidnyk | Ukraine | Crimea | 12A Peremohy St, Novorybatske Village, Krasnop... | 2017-05-15 | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7086 | ID7101 | AT AB Kholdinh | Russia | Moscow | г. Москва, ул. Маши Порываевой, 7, стр. В | 2023-07-05 | | | | | |
| 7087 | ID7102 | EjBiEjch Fajnenshynal Limited | Republic of Cyprus | Nicosia | Vizantiou, 5, Spyrides Tower, Strovolos, 2064,... | 2023-07-05 | | | | | |
| 7088 | ID7103 | Al'fastrakhovaniye Kholdynhz Limited | Republic of Cyprus | Nicosia | Vizantiou, 5, Spyrides Tower, Strovolos, 2064,... | 2023-07-05 | | | | | |
| 7089 | ID7104 | EjBiEjch Jukrejn Limited | Republic of Cyprus | Nicosia | Vizantiou, 5, Spyrides Tower, Strovolos, 2064,... | 2023-07-05 | | | | | |
