**this is an example notebook for querying from the Opensantions API**

*Proceeding:* 

1. Query the individual sanctions lists from the website and write a loop to access all the days from July 2021 onward 

2. Parse into a dataframe, that is filtered and cleaned 

3. Write a function that matches a day with the previous day and flags the new additions or deletions. Parse those as a new column indicating removal or addition date.  

4. Merge all dataframes for all lists (UK, EU and US) together and aggregate onto a monthly level 

5. Create a separate dataframe for all designations concerning Russian entities 

 
datasets of interest:

us_ofac_sdn

us_ofac_cons

us_bis_denied


eu_fsf

eu_sanctions_map


loop though versions of the json not the API
ok - I just confirmed, the API doesn't have historical data - but all the historical data is available online still. You would have to request the file you want to check for each date, download the file, and then read through it and filter for what you're looking for
14:38 Uhr
e.g. the latest OFAC file is https://data.opensanctions.org/datasets/20240115/us_ofac_cons/entities.ftm.json
14:39 Uhr
you can then loop back on the date in the URL - the date is in the format YYYYMMDD
14:39 Uhr
e.g.  https://data.opensanctions.org/datasets/20240114/us_ofac_cons/entities.ftm.json and so on

In [1]:
#import packages
import requests
import json
import pandas as pd
import numpy as np
import os
from urllib.request import urlopen

**start with the SDN list**

In [2]:
#first check if the website is working
url = "https://data.opensanctions.org/datasets/20240115/us_ofac_cons/entities.ftm.json"
response = requests.get(url)
print(response)
#if we get a response code 200 its working

<Response [200]>


In [46]:
jsonurl =requests.get('https://data.opensanctions.org/datasets/20240115/us_ofac_cons/entities.ftm.json').json()
print(jsonurl)

JSONDecodeError: Extra data: line 2 column 1 (char 1268)

In [None]:
#solution? 
import json

with open("some_file.txt", "r") as f:
    content = f.read()
parsed_values = []
decoder = json.JSONDecoder()
while content:
    value, new_start = decoder.raw_decode(content)
    content = content[new_start:].strip()
    # You can handle the value directly in this loop:
    print("Parsed:", value)
    # Or you can store it in a container and use it later:
    parsed_values.append(value)
    

In [None]:
You could first read the file into a string, then create a loop that applies raw_decode, and then either directly handle the parsed value or store it in a container and handle all of them later. Should I update my answer with an example? – 
pschill
 Feb 14, 2019 at 7:58
That would be really helpful if you could update it for reading a file which has thousands of json objects (just like shown in my code above) which are NOT separated by comma . I'm very new to this parsing json format and I've spent much time on clicking every link on Google to find out how to do so. So your help would be greatly appreciated.Thanks in advance – 
Sohbet
 Feb 14, 2019 at 20:40
This fails if content has leading whitespace, but a simple content.lstrip() prior to the main loop sets things aright. content[new_start:].strip() can also be lstrip() although it's not a big deal either way. – 
ggorlen
 Oct 30, 2022 at 23:34
Also, content = f.read() slurps the whole file into memory, but a typical JSON stream is used because it may not fit into memory. – 
ggorlen
 Dec 6, 2023 at 5:56

In [19]:
#now I need to create a loop to access the datafiles from Jul 2021 to Dec 2023
#first create a list of datetimes to use in the for loop
date_list = pd.date_range(start='20210801',end='20231231',freq='D').strftime('%Y%m%d')
date_list

Index(['20210801', '20210802', '20210803', '20210804', '20210805', '20210806',
       '20210807', '20210808', '20210809', '20210810',
       ...
       '20231222', '20231223', '20231224', '20231225', '20231226', '20231227',
       '20231228', '20231229', '20231230', '20231231'],
      dtype='object', length=883)

In [26]:
#get all the urls for all the dates we need the data for
#make each day an individuallist so we can compare and match them
websites = []

for i in date_list:
    test = 'https://data.opensanctions.org/datasets/'+(i)+'/us_ofac_sdn/entities.ftm.json'
    websites.append(test)
print(websites)

['https://data.opensanctions.org/datasets/20210801/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210802/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210803/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210804/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210805/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210806/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210807/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210808/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210809/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210810/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210811/us_ofac_sdn/entities.ftm.json', 'https://data.opensanctions.org/datasets/20210812/us_ofac_sdn/entities.ftm.json', 'https://data.o

In [44]:
contents_list=[]
json_data = []
for site in websites:
    r= requests.get(site).json()
    #json_data = r.read().decode('utf-8')
    contents_list.append(r)

display(contents_list)

JSONDecodeError: Extra data: line 2 column 1 (char 555)

In [None]:
#merge and parse into one df
#from json import json_normalize
#df = pd.json_normalize(contents_list)
#df
data_ofac = pd.DataFrame.from_dict(contents_list,orient='columns')
data_ofac

In [None]:
#filter and clean up

repeat with EU and other lists

**OFAC Consolidated list**

In [None]:
websites = []

for i in date_list:
    test = 'https://data.opensanctions.org/datasets/'+(i)+'/us_ofac_cons/entities.ftm.json'
    websites.append(test)
print(websites)

In [35]:
from urllib.request import urlopen
contents_list=[]
json_data=[]
for site in websites:
    websites = site
    r = urlopen(websites)
    json_data = r.read().decode('utf-8', 'replace')
    contents_list.append(json_data)

display(contents_list)

AttributeError: 'set' object has no attribute 'timeout'

**US BIS List**

In [None]:
websites = []

for i in date_list:
    test = 'https://data.opensanctions.org/datasets/'+(i)+'/us_bis_denied/entities.ftm.json'
    websites.append(test)
print(websites)

In [None]:
from urllib.request import urlopen
contents_list=[]
json_data=[]
for site in websites:
    websites = site
    r = urlopen(websites)
    json_data = r.read().decode('utf-8', 'replace')
    contents_list.append(json_data)

display(contents_list)

**EU FSF**

In [None]:
websites = []

for i in date_list:
    test = 'https://data.opensanctions.org/datasets/'+(i)+'/eu_fsf/entities.ftm.json'
    websites.append(test)
print(websites)

In [None]:
from urllib.request import urlopen
contents_list=[]
json_data=[]
for site in websites:
    websites = site
    r = urlopen(websites)
    json_data = r.read().decode('utf-8', 'replace')
    contents_list.append(json_data)

display(contents_list)

**EU Sanctions Map**

In [None]:
websites = []

for i in date_list:
    test = 'https://data.opensanctions.org/datasets/'+(i)+'/eu_sanctions_map/entities.ftm.json'
    websites.append(test)
print(websites)

In [None]:
from urllib.request import urlopen
contents_list=[]
json_data=[]
for site in websites:
    websites = site
    r = urlopen(websites)
    json_data = r.read().decode('utf-8', 'replace')
    contents_list.append(json_data)

display(contents_list)

In [None]:
eu_travel_bans