**This is an example notebook for querying data from the OpenSanctions API**

*Procedure:*

1. Query the individual sanctions lists from the website and write a loop to access every day from July 2021 onward

2. Parse the results into a dataframe that is filtered and cleaned

3. Write a function that compares each day with the previous day and flags new additions and deletions. Store these in a new column indicating the addition or removal date.

4. Merge all dataframes for all lists (UK, EU and US) together and aggregate onto a monthly level 

5. Create a separate dataframe for all designations concerning Russian entities 
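
Step 3 of the plan above can be sketched with plain set arithmetic on entity IDs. This is only an illustration under assumptions: the function names `diff_snapshots` and `collect_changes` are hypothetical, the input is assumed to be one collection of entity IDs per day, and the toy data is made up.

```python
def diff_snapshots(previous_ids, current_ids):
    """Compare two daily snapshots given as iterables of entity IDs."""
    previous_ids, current_ids = set(previous_ids), set(current_ids)
    return {
        "added": sorted(current_ids - previous_ids),
        "removed": sorted(previous_ids - current_ids),
    }


def collect_changes(snapshots):
    """snapshots: dict mapping 'YYYYMMDD' -> iterable of entity IDs.

    Returns (date, entity_id, change) tuples, where date is the first
    day on which the change is visible.
    """
    changes = []
    dates = sorted(snapshots)  # YYYYMMDD strings sort chronologically
    for prev_day, day in zip(dates, dates[1:]):
        delta = diff_snapshots(snapshots[prev_day], snapshots[day])
        changes += [(day, eid, "added") for eid in delta["added"]]
        changes += [(day, eid, "removed") for eid in delta["removed"]]
    return changes


# Toy data for illustration only, not real sanctions entries
toy = {
    "20220101": ["a", "b"],
    "20220102": ["a", "b", "c"],
    "20220103": ["b", "c"],
}
changes = collect_changes(toy)
print(changes)
```

Comparing IDs only is a simplification: it would miss entities whose properties change while the ID stays the same.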

 
datasets of interest:

us_ofac_sdn

us_ofac_cons

us_bis_denied


eu_fsf

eu_sanctions_map


Loop through the dated versions of the JSON file, not the API.
The API doesn't have historical data, but all the historical data is still available online. You have to request the file for each date you want to check, download it, and then read through it and filter for what you're looking for.
E.g. the latest OFAC file is https://data.opensanctions.org/datasets/20240115/us_ofac_cons/entities.ftm.json
You can then loop back on the date in the URL; the date is in the format YYYYMMDD,
e.g.  https://data.opensanctions.org/datasets/20240114/us_ofac_cons/entities.ftm.json and so on
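
The URL pattern described in the note can be wrapped in a small helper. This is a minimal sketch: `snapshot_url` and `urls_going_back` are hypothetical names, and the dataset list simply repeats the datasets of interest named earlier in this notebook.

```python
from datetime import date, timedelta

BASE_URL = "https://data.opensanctions.org/datasets"
# Datasets of interest listed at the top of the notebook
DATASETS = ["us_ofac_sdn", "us_ofac_cons", "us_bis_denied", "eu_fsf", "eu_sanctions_map"]


def snapshot_url(dataset, day):
    """Build the URL of one dated file; the date is formatted as YYYYMMDD."""
    return f"{BASE_URL}/{day:%Y%m%d}/{dataset}/entities.ftm.json"


def urls_going_back(dataset, latest, n_days):
    """Return n_days URLs, starting at `latest` and stepping back one day at a time."""
    return [snapshot_url(dataset, latest - timedelta(days=k)) for k in range(n_days)]


urls = urls_going_back("us_ofac_cons", date(2024, 1, 15), 2)
for u in urls:
    print(u)
```

Whether a file actually exists for a given date still has to be checked when requesting it.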

In [1]:
#import packages
import requests
import json
import pandas as pd
import numpy as np
import os
import re
from urllib.request import urlopen

**start with the SDN list**

In [6]:
#first check if the website is working
url = "https://data.opensanctions.org/datasets/20240121/us_ofac_sdn/entities.ftm.json"
response = requests.get(url)
print(response)
#a response code of 200 means it is working

<Response [200]>


In [8]:
# I used e.g. a list and a dict here. Use whichever suits you better.
response_entities_list = []
response_entities_map = {}
# response.text gives you the string; you can then iterate over the lines with splitlines() and parse each one individually.
for line in response.text.splitlines():
    entity = json.loads(line)
    response_entities_list.append(entity)
    response_entities_map[entity['id']] = entity

In [4]:
response_entities_list

[{'id': 'NK-28ZcFDmHBF9L3WkDBBwH6H',
  'caption': 'GAZPROMBANK LEASING ZAO',
  'schema': 'Organization',
  'properties': {'country': ['ru'],
   'addressEntity': ['addr-ef04fdefe829ee677e4720eb1b5772d32edebb5f',
    'addr-92f69e81040047e41a13209e4a25e5be75fa2648'],
   'address': ['D. 40 Ulitsa Miklukho-Maklaya, 117342 Moscow',
    "Proektiruyemiy proezd No 4062, building 6, structure 16, BTs 'Port Plaza', 115432 Moscow"],
   'alias': ['CLOSED JOINT-STOCK COMPANY GAZPROMBANK LIZING'],
   'website': ['http://www.gpbl.ru/'],
   'notes': ['For more information on directives, please visit the following link: https://www.treasury.gov/resource-center/sanctions/Programs/Pages/ukraine.aspx#directives'],
   'name': ['GAZPROMBANK LEASING ZAO'],
   'topics': ['sanction'],
   'sourceUrl': ['https://sanctionssearch.ofac.treas.gov/Details.aspx?id=20297'],
   'taxNumber': ['7728294503'],
   'registrationNumber': ['1037728033606']},
  'referents': ['permid-5000057165',
   'ua-nazk-company-4707',
   'ru-
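
Records shaped like the one above could be flattened into a dataframe roughly as follows. This is a sketch under assumptions: the helper names, the chosen columns, and the decision to drop `Address` records are illustrative choices, not something the notebook prescribes. The sample records are trimmed copies of entries that appear in this notebook's output.

```python
import pandas as pd

COLUMNS = ["id", "caption", "schema", "countries", "topics"]


def entity_to_row(entity):
    """Flatten one entity dict; multi-valued properties are joined with ';'."""
    props = entity.get("properties", {})
    return {
        "id": entity.get("id"),
        "caption": entity.get("caption"),
        "schema": entity.get("schema"),
        "countries": ";".join(props.get("country", [])),
        "topics": ";".join(props.get("topics", [])),
    }


def entities_to_frame(entities, drop_schemas=("Address",)):
    """Build a dataframe, skipping schemas that are not of interest (assumption)."""
    rows = [entity_to_row(e) for e in entities if e.get("schema") not in drop_schemas]
    return pd.DataFrame(rows, columns=COLUMNS)


# Trimmed sample records
sample = [
    {"id": "NK-28ZcFDmHBF9L3WkDBBwH6H", "caption": "GAZPROMBANK LEASING ZAO",
     "schema": "Organization", "properties": {"country": ["ru"], "topics": ["sanction"]}},
    {"id": "addr-0010092418001e47d82da9b20a53adb1580bb1ec", "caption": "Havana Street, Juba",
     "schema": "Address", "properties": {"country": ["ss"]}},
]
df = entities_to_frame(sample)
# Entities with 'ru' among their country codes
russian = df[df["countries"].str.split(";").apply(lambda codes: "ru" in codes)]
print(russian)
```

Filtering on the `country` property is only one possible definition of a "Russian entity"; other fields may be needed in practice.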

In [2]:
#the goal is a loop over the data files from Jul 2021 to Dec 2023; a short test range (1-15 Jan 2022) is used here
#first create a list of date strings to use in the for loop
date_list = pd.date_range(start='20220101',end='20220115',freq='D').strftime('%Y%m%d')
date_list

Index(['20220101', '20220102', '20220103', '20220104', '20220105', '20220106',
       '20220107', '20220108', '20220109', '20220110', '20220111', '20220112',
       '20220113', '20220114', '20220115'],
      dtype='object')

In [3]:
#build the URLs for all the dates we need the data for
#keep each day as an individual list so we can compare and match them
websites = []

for i in date_list:
    test = 'https://data.opensanctions.org/datasets/'+(i)+'/us_ofac_sdn/entities.ftm.json'
    websites.append(test)
print(websites)

['https://data.opensanctions.org/datasets/20220101/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220102/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220103/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220104/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220105/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220106/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220107/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220108/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220109/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220110/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220111/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20220112/us_ofac_sdn/entities.ftm.json', 'https://data.o

In [4]:
response_entities_list = []
date_pattern = r'/datasets/(\d{8})/'  # regular expression to extract the date from the URL
for site in websites:
    response = requests.get(site)
    print(response)
    # skip dates without a successful response
    if response.status_code != 200:
        continue
    match = re.search(date_pattern, site)  # extract the date from the URL
    # fall back to 'unknown' if the date cannot be extracted
    date_stamp = match.group(1) if match else 'unknown'

    # each line of the file is one JSON-encoded entity
    for line in response.text.splitlines():
        entity = json.loads(line)
        # note: date_stamp is not attached yet, so the day each entity came from is lost
        response_entities_list.append([entity])

<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>


In [8]:
response_entities_list

[[{'caption': 'Tehran, Block 6, Petrochemical Zone Site 2, Special Economic Zone, Imam Khomeini Port, Mahshahr, 1965754351',
   'datasets': ['us_ofac_sdn'],
   'first_seen': '2021-09-27T09:09:30',
   'id': 'addr-00067850bf1025c9ff398079419bfd50b6c1289b',
   'last_seen': '2022-01-17T07:40:14',
   'properties': {'city': ['Tehran'],
    'country': ['ir'],
    'full': ['Tehran, Block 6, Petrochemical Zone Site 2, Special Economic Zone, Imam Khomeini Port, Mahshahr, 1965754351'],
    'postalCode': ['1965754351'],
    'street': ['Block 6, Petrochemical Zone Site 2, Special Economic Zone, Imam Khomeini Port, Mahshahr']},
   'referents': ['addr-00067850bf1025c9ff398079419bfd50b6c1289b'],
   'schema': 'Address',
   'target': False}],
 [{'caption': 'Havana Street, Juba',
   'datasets': ['us_ofac_sdn'],
   'first_seen': '2021-09-27T09:09:30',
   'id': 'addr-0010092418001e47d82da9b20a53adb1580bb1ec',
   'last_seen': '2022-01-17T07:40:14',
   'properties': {'city': ['Juba'],
    'country': ['ss'],


In [9]:
#dump the data into a file to save
with open("response_entities_list.json","w") as file:
    json.dump(response_entities_list, file)

OSError: [Errno 28] No space left on device
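
Errno 28 means the disk ran out of space while the file was being written. The list holds a full copy of every entity for every day, so the dump is large. One way to shrink it, sketched below under assumptions, is to write each day to its own gzip-compressed, line-delimited file and keep only the fields that are needed. The field selection in `KEEP_FIELDS` and the function names are hypothetical, and this may still not be enough if the disk is nearly full.

```python
import gzip
import json
import os
import tempfile

# Assumption: only these top-level fields are needed for the later comparison
KEEP_FIELDS = ("id", "caption", "schema")


def slim(entity):
    """Keep only the selected fields of an entity."""
    return {key: entity.get(key) for key in KEEP_FIELDS}


def write_day(entities, date_stamp, out_dir):
    """Write one day's entities as compressed JSON lines and return the path."""
    path = os.path.join(out_dir, f"{date_stamp}.jsonl.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for entity in entities:
            fh.write(json.dumps(slim(entity)) + "\n")
    return path


def read_day(path):
    """Read a file written by write_day back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]


# Demo with a made-up entity in a temporary directory
demo_dir = tempfile.mkdtemp()
demo_entity = {"id": "x1", "caption": "Example", "schema": "Organization",
               "properties": {"country": ["ru"]}}
demo_path = write_day([demo_entity], "20220101", demo_dir)
restored = read_day(demo_path)
print(restored)
```

Writing one file per day also means a failed run does not lose the days that were already saved.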

Repeat with the EU and other lists

**OFAC Consolidated list**

In [None]:
websites = []

for i in date_list:
    test = 'https://data.opensanctions.org/datasets/'+(i)+'/us_ofac_cons/entities.ftm.json'
    websites.append(test)
print(websites)

In [None]:
contents_list=[]
json_data = []
response_entities_list = []
response_entities_map = {}
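for site in websites:
    response = requests.get(site)
    print(response)
    # skip dates without a successful response (same check as in the SDN loop)
    if response.status_code != 200:
        continue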
    for line in response.text.splitlines():
        entity = json.loads(line)
        response_entities_list.append(entity)
        response_entities_map[entity['id']] = entity

**US BIS List**

In [None]:
websites = []

for i in date_list:
    test = 'https://data.opensanctions.org/datasets/'+(i)+'/us_bis_denied/entities.ftm.json'
    websites.append(test)
print(websites)

In [None]:
contents_list=[]
json_data = []
response_entities_list = []
response_entities_map = {}
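for site in websites:
    response = requests.get(site)
    print(response)
    # skip dates without a successful response (same check as in the SDN loop)
    if response.status_code != 200:
        continue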
    for line in response.text.splitlines():
        entity = json.loads(line)
        response_entities_list.append(entity)
        response_entities_map[entity['id']] = entity

**EU FSF**

In [None]:
websites = []

for i in date_list:
    test = 'https://data.opensanctions.org/datasets/'+(i)+'/eu_fsf/entities.ftm.json'
    websites.append(test)
print(websites)

In [None]:
contents_list=[]
json_data = []
response_entities_list = []
response_entities_map = {}
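for site in websites:
    response = requests.get(site)
    print(response)
    # skip dates without a successful response (same check as in the SDN loop)
    if response.status_code != 200:
        continue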
    for line in response.text.splitlines():
        entity = json.loads(line)
        response_entities_list.append(entity)
        response_entities_map[entity['id']] = entity

**EU Sanctions Map**

In [None]:
websites = []

for i in date_list:
    test = 'https://data.opensanctions.org/datasets/'+(i)+'/eu_sanctions_map/entities.ftm.json'
    websites.append(test)
print(websites)

In [None]:
contents_list=[]
json_data = []
response_entities_list = []
response_entities_map = {}
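for site in websites:
    response = requests.get(site)
    print(response)
    # skip dates without a successful response (same check as in the SDN loop)
    if response.status_code != 200:
        continue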
    for line in response.text.splitlines():
        entity = json.loads(line)
        response_entities_list.append(entity)
        response_entities_map[entity['id']] = entity
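
Once additions and removals have been flagged for each list, the planned merge and monthly aggregation could look roughly like this. It is a sketch on made-up data: the column names (`dataset`, `date`, `entity_id`, `change`) are assumptions about how the change records would be stored, not an existing format.

```python
import pandas as pd

# Made-up change records, one row per flagged addition or removal
changes = pd.DataFrame(
    [
        ("us_ofac_sdn", "20220103", "e1", "added"),
        ("us_ofac_sdn", "20220110", "e2", "added"),
        ("eu_fsf", "20220105", "e3", "removed"),
        ("eu_fsf", "20220204", "e4", "added"),
    ],
    columns=["dataset", "date", "entity_id", "change"],
)

# Derive a YYYY-MM month label from the YYYYMMDD date string
changes["month"] = pd.to_datetime(changes["date"], format="%Y%m%d").dt.strftime("%Y-%m")

# Count additions and removals per dataset and month
monthly = (
    changes.groupby(["dataset", "month", "change"])
    .size()
    .unstack("change", fill_value=0)
    .reset_index()
)
print(monthly)
```

Change records from the individual lists can be stacked with `pd.concat` before this step, as long as they share the same columns.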

In [None]:
eu_travel_bans