# Apply Models to Parse Data

Code to apply GPT models to extract information from dockets at the docket entry level.

NOTE: The data for the article was parsed using OpenAI's Ada model. As of January 2024, that model is no longer available, so this notebook uses fine-tuned GPT-3.5 Turbo chat models instead.

### Setup

In [1]:
# General Setup
import pandas as pd
import re
from datasets import load_dataset
from tqdm import tqdm
tqdm.pandas()
import time
import httpx

In [2]:
# Setup openai

# Load OpenAI API key
# NOTE: need to first place the key in .env file
# like this: OPENAI_API_KEY=sk-xxxxxx...
from dotenv import load_dotenv
load_dotenv()

# Load openai
from openai import OpenAI
client = OpenAI(timeout=httpx.Timeout(15.0, read=5.0, write=10.0, connect=3.0))



In [24]:
# Settings to change

# Select the years (year case filed) of interest

start_year = 2012
end_year = 2022

# Stem for file names
completions_file = 'output/stays_2012-22'
completions_file_deprecated = 'output/DEPRECATED-ADA-stays_2012-22'

# pick models to use
notice_type_model = 'ft:gpt-3.5-turbo-0613:refugee-law-lab:stays-notice-types:8uthOYkO' 
order_type_model = 'ft:gpt-3.5-turbo-0613:refugee-law-lab:stays-order-types:8uuXKz4l'
order_outcome_model = 'ft:gpt-3.5-turbo-0613:refugee-law-lab:stays-outcomes:8v442Ukp'
judge_model = 'ft:gpt-3.5-turbo-0613:refugee-law-lab:stays-judges:8v4XnuHv'

# Select number of cases to be parsed
# indicate 0 for all the refugee cases in the years selected above
# indicate another number for random sample from the refugee cases in the years selected above

# ____________________________________________________________________________________
# ____________ CAREFUL: Setting this wrong could make things expensive! ______________
# ____________________________________________________________________________________

sample_size = 0  # set to 0 if you want full df, otherwise specify sample size

# ____________________________________________________________________________________
# ____________________________________________________________________________________
# ____________________________________________________________________________________

### Load dockets

In [4]:
# Load dockets from Hugging Face
df = load_dataset("refugee-law-lab/luck-of-the-draw-iii", split="train").to_pandas()
print("Length of full dataset:", len(df))
df.head()

Length of full dataset: 218639


Unnamed: 0,citation,year,name,date_filed,city_filed,nature,class,track,documents,source_url,scraped_timestamp
0,IMM-10085-12,2012,EDITH VICTORIA CASTRO RODRIGUES v. MCI,2012-10-01,Toronto,Imm - Appl. for leave & jud. review - IRB - Re...,Non-Action,Immigration Leave & Judicial Review,"[{'DOCNO': None, 'DOC_DT': '2013-04-25', 'RECO...",https://www.fct-cf.gc.ca/en/court-files-and-de...,2022-11-23
1,IMM-10182-12,2012,ABDOU KHADIR SECK c. MCI,2012-10-04,Montréal,Imm - Appl. for leave & jud. review - IRB - Re...,Non-Action,Immigration Leave & Judicial Review,"[{'DOCNO': None, 'DOC_DT': '2013-03-19', 'RECO...",https://www.fct-cf.gc.ca/en/court-files-and-de...,2022-11-23
2,IMM-10196-12,2012,CYRIL JOHN DA SILVA v. MCI,2012-10-04,Toronto,Imm - Appl. for leave & jud. review - IRB -Imm...,Non-Action,Immigration Leave & Judicial Review,"[{'DOCNO': None, 'DOC_DT': '2017-07-25', 'RECO...",https://www.fct-cf.gc.ca/en/court-files-and-de...,2022-11-23
3,IMM-10211-12,2012,ALISA POGORELOVSKY ET AL v. MCI,2012-10-05,Toronto,Imm - Appl. for leave & jud. review - IRB - Re...,Non-Action,Immigration Leave & Judicial Review,"[{'DOCNO': None, 'DOC_DT': '2013-02-15', 'RECO...",https://www.fct-cf.gc.ca/en/court-files-and-de...,2022-11-23
4,IMM-10212-12,2012,DARIUSZ GLOWACKI ET AL v. MCI,2012-10-05,Toronto,Imm - Appl. for leave & jud. review - IRB - Re...,Non-Action,Immigration Leave & Judicial Review,"[{'DOCNO': 17.0, 'DOC_DT': '2014-05-15', 'RECO...",https://www.fct-cf.gc.ca/en/court-files-and-de...,2022-11-23


In [5]:
print(df.iloc[0]['documents'])

[{'DOCNO': None, 'DOC_DT': '2013-04-25', 'RECORDED_ENTRY': " Memorandum to file from Ann Murphy dated 25-APR-2013 further to phone conversations with the Law Society concerning the death of Applicant's counsel and the Applicant's dismissed order, I have been advised that the Law Society has advised the Applicant that she should retain new counsel and contact the Federal.  The Law Society will not provide the registry with the address of the Applicant in order for the registry to send out the dismissed order.  The Law Society will not advise  the Applicant that her Application was dismissed. placed on file.", 'RE_NO': 14}
 {'DOCNO': None, 'DOC_DT': '2013-04-16', 'RECORDED_ENTRY': " Memorandum to file from Ann Murphy dated 16-APR-2013 I have contacted the Law Society of Upper Canada concering status of Mr. Makepeace's legal file, for this Applicant in light of the fact he is now deceased.  They will call me back. BF 25-apr-2013 placed on file.", 'RE_NO': 13}
 {'DOCNO': None, 'DOC_DT': '2

In [6]:
# Filter for years
years_included = list(range(start_year, end_year+1))
df = df.loc[df['year'].isin(years_included)]
print("Length of dataset filtered by years sought", len(df))


Length of dataset filtered by years sought 87776


In [7]:
# Filter for dockets that include words related to stays

REstays = re.compile('stay|sursis|surseoir')

def check_stays(x):
    for doc in x:
        if REstays.search(doc['RECORDED_ENTRY']):
            return True
    return False

df=df[df['documents'].apply(lambda x: check_stays(x))]
df=df.reset_index(drop=True)

print("Length of filtered dataset", len(df))


Length of filtered dataset 7045
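
A quick check of what the stay filter matches may help when interpreting the count above. Unlike the prefix patterns used later in the notebook, `REstays` is compiled without `re.I`, and `search` matches substrings anywhere in an entry. The entries below are made-up examples for illustration, not rows from the dataset.

```python
import re

REstays = re.compile('stay|sursis|surseoir')  # same pattern as the filter above

# Hypothetical docket entries, for illustration only
examples = [
    'Notice of motion for a stay of removal',
    'Avis de requête pour sursis',
    'STAY OF REMOVAL motion filed',      # upper case only
    'Hearing stayed pending decision',   # substring match on "stay"
]

# Map each example to whether the filter pattern finds a match
matches = {text: bool(REstays.search(text)) for text in examples}
for text, hit in matches.items():
    print(hit, '|', text)
```

If case-insensitive matching is wanted, passing `re.I` to `re.compile` would be one way to get it, though that would change the filtered counts reported here.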


In [8]:
# Filter for sample size (if set) & reset index

if sample_size > 0:
    df = df.sample(sample_size, random_state=42)

df=df.reset_index(drop=True)

print("Length of final dataset, including sample", len(df))

Length of final dataset, including sample 7045


In [9]:
# Flatten the dataframe (each row is a docket entry)
cases_list=[]
for case_num, case_citation in enumerate(df['citation']):
    for dock_num, docket_entry in enumerate (df.iloc[case_num]['documents']):
        docket={}
        docket['citation']=case_citation
        docket['year'] = df.iloc[case_num]['year']
        docket['city_filed'] = df.iloc[case_num]['city_filed']
        docket['nature'] = df.iloc[case_num]['nature']
        docket['RECORDED_ENTRY']=docket_entry['RECORDED_ENTRY'].replace('\n', ' ').replace('\t', ' ').strip()
        docket['RE_NO']=docket_entry['RE_NO']
        docket['DOCNO']=docket_entry['DOCNO']
        docket['DOC_DT']=docket_entry['DOC_DT']
        cases_list.append(docket)
df_flat = pd.DataFrame(cases_list)

print('Length of flat df:', len(df_flat))
df_flat.head()

Length of flat df: 188584


Unnamed: 0,citation,year,city_filed,nature,RECORDED_ENTRY,RE_NO,DOCNO,DOC_DT
0,IMM-10486-12,2012,Toronto,Imm - Appl. for leave & jud. review - Pre-remo...,Certified French translation of the reasons fo...,45,27.0,2013-11-26
1,IMM-10486-12,2012,Toronto,Imm - Appl. for leave & jud. review - Pre-remo...,Acknowledgment of Receipt received from Applic...,44,,2013-10-16
2,IMM-10486-12,2012,Toronto,Imm - Appl. for leave & jud. review - Pre-remo...,(Final decision) Reasons for Judgment and Jud...,43,26.0,2013-10-16
3,IMM-10486-12,2012,Toronto,Imm - Appl. for leave & jud. review - Pre-remo...,Toronto 09-OCT-2013 BEFORE The Honourable Mr....,42,,2013-10-09
4,IMM-10486-12,2012,Toronto,Imm - Appl. for leave & jud. review - Pre-remo...,Affidavit of IAN MCILWAIN on behalf of the res...,41,25.0,2013-10-07


### Parse dockets using regex and GPT-3.5 models

In [10]:
# Regex to help parse dockets

exclude_start_list = [
    r'^\(Décision',
    r'^\(Final',
    '^Accusé',
    '^Acknowledgment',
    '^Affidavit',
    '^Amended',
    '^Application',
    '^Carte',
    '^Certificate',
    '^Certified',
    '^Communication',
    '^Confirmation',
    '^Draft',
    '^First',
    '^Letter',
    '^Lettre',
    '^Memorandum',
    '^Second',
    '^Traduction',
    r'^\*\*\*\*'
]

REexclude_start = re.compile('|'.join(exclude_start_list), re.I)

hearing_start_list = [
    '^Calgary', 
    '^Charlottetown', 
    '^Edmonton',
    '^Fredericton',
    '^Halifax',
    '^Hamilton',
    '^Iqaluit',
    '^Montréal',
    '^Montreal',
    '^Ottawa',
    '^Québec',
    '^Quebec',
    '^Regina',
    '^Saskatoon',
    r'^St\. John',
    '^St John',
    '^Saint John',
    '^Toronto',
    '^Vancouver',
    '^Whitehorse',
    '^Winnipeg',
    '^Yellowknife'
]

REhearing_start = re.compile('|'.join(hearing_start_list), re.I)

REnotice_start =  re.compile('^Notice of motion|^Avis de requête', re.I)

order_start_list = [
    '^Order',
    '^Ordonnance',
    '^judgment',
    '^judgement',
    '^jugement',
    '^reasons',
    '^motifs'
]

REorder_start =  re.compile('|'.join(order_start_list), re.I)

copy_order_start_list = [
    '^Copy of order',
    '^Copie de l\'ordonnance',
    '^Copy of judgment',
    '^Copy of judgement',
    '^Copy of jugement',
    '^Copy of reasons',
    '^Copie des motifs'
]

REcopy_order_start =  re.compile('|'.join(copy_order_start_list), re.I)

directions_start_list = [
    '^oral direction',
    '^copy of oral direction',
    '^written direction',
    '^copy of written direction',
    '^direction',
    '^copy of direction',
    '^directives verbale',
    '^copie des directives verbale',
    '^directives écrite',
    '^copie des directives écrite'
]

REdirections_start =  re.compile('|'.join(directions_start_list), re.I)
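
To see how these prefix patterns behave, here is a minimal sketch using reduced versions of the lists above (only a few prefixes each). The helper `first_match` and the sample entries are hypothetical and exist only for this illustration. It shows that matching is case-insensitive, that the `^` anchor means only the start of the entry counts, and that leading whitespace defeats the match, which is why entries are stripped when the dataframe is flattened.

```python
import re

# Reduced versions of the pattern lists above (assumption: a few prefixes suffice to illustrate)
REexclude_start = re.compile('|'.join([r'^Affidavit', r'^Certified', r'^Letter']), re.I)
REorder_start = re.compile('|'.join([r'^Order', r'^Ordonnance']), re.I)
REcopy_order_start = re.compile('|'.join([r'^Copy of order']), re.I)

def first_match(entry):
    """Return the label of the first pattern that matches the start of entry."""
    checks = [
        ('exclude', REexclude_start),
        ('order', REorder_start),
        ('copy_order', REcopy_order_start),
    ]
    for label, pattern in checks:
        if pattern.match(entry):
            return label
    return 'exclude'  # anything unmatched is dropped

# Hypothetical entries, for illustration only
labels = {
    'ORDER dated 05-JAN-2016': first_match('ORDER dated 05-JAN-2016'),
    'Copy of Order dated 05-JAN-2016': first_match('Copy of Order dated 05-JAN-2016'),
    'Certified copy of the order': first_match('Certified copy of the order'),
    ' Order dated 05-JAN-2016': first_match(' Order dated 05-JAN-2016'),
}
print(labels)
```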


In [11]:
# system messages

notice_type_system_message = """You are given a notice of motion. Return 'stay of removal' if the notice is about a stay of removal.
Otherwise return 'other'. Note that stays of release from detention are not stays of removal so should be categorized as other."""

order_type_system_message = """You are given a Federal Court docket entry about an order. Return 'motion #' (with the actual number) where
the order refers to a motion number. If no motion number is indicated, and the order clearly involves a stay of removal, return 'stay of removal'.
Otherwise return 'other'. Note that stays of release from detention are not stays of removal and so should be categorized as other."""

order_outcome_system_message = """You are given a Federal Court docket entry about an order. Return 'granted' if the entry reports that a motion or an application
has been granted. Return 'dismissed' if the entry reports that a motion or an application has been dismissed. Return 'other' if the outcome is unclear
or if the motion or application is clearly only procedural in nature (e.g. scheduling, documents, adjournment, etc.)."""

judge_system_message = """You are given a Federal Court docket entry. If the docket entry includes a specific judge identified by name
then return the judge's name. If the docket entry does not include a specific judge identified by name, or if anything is 
unclear then return 'other'.""" 


In [12]:
# Helper functions to parse dockets

def categorize_dockets(x):
    
    #Remove excluded dockets
    if re.match(REexclude_start, x):
        return 'exclude'

    # Keep hearing dockets
    if re.match(REhearing_start, x):
        return 'hearing'
    
    # Keep notice of motion dockets
    if re.match(REnotice_start, x):
        return 'notice'
    
    # Keep order dockets
    if re.match(REorder_start, x):
        return 'order'
    
    # Keep copy order dockets
    if re.match(REcopy_order_start, x):
        return 'copy_order'   

    # Keep directions dockets
    if re.match(REdirections_start, x):
        return 'directions'
    
    return 'exclude'

def apply_model(docket_prompt, system_message, model_to_use, max_tokens):
    
    for attempt in range(10):
        try:
            model_output = client.chat.completions.create(
                model=model_to_use,
                max_tokens = max_tokens,
                temperature=0,
                stop = "\n",
                messages=[
                    {"role": "system", "content": system_message},
                    {"role": "user", "content": docket_prompt}])
            return model_output.choices[0].message.content.strip()
        except Exception:  # avoid swallowing KeyboardInterrupt / SystemExit
            print('OpenAI error. Attempt: ' + str(attempt+1))
            print('Docket: ' + docket_prompt)
            print('Model: ' + model_to_use)
            time.sleep(30)

            print()
    
    # Generate error if all 10 attempts fail
    raise ValueError('OpenAI error, 10 attempts failed. Docket: ' + docket_prompt + ' Model: ' + model_to_use)
    
def get_completion(docket_entry, system_message, model_to_use, max_tokens=3):
    return apply_model(docket_entry['RECORDED_ENTRY'].replace('\n',' ').strip(),
        system_message,
        model_to_use,
        max_tokens)

def parse_docket_entry(docket_entry):
    docket_entry['type'] = ''
    docket_entry['outcome'] = ''
    docket_entry['judge'] = ''

    if docket_entry['category'] == 'exclude':
        return docket_entry

    if docket_entry['category'] == 'notice':
        docket_entry['type'] =  get_completion(docket_entry, notice_type_system_message, notice_type_model)
    
    if docket_entry['category'] == 'order' or docket_entry['category'] == 'copy_order':        
        docket_entry['type'] =  get_completion(docket_entry, order_type_system_message, order_type_model)
        
        if docket_entry['type'] == 'stay of removal' or docket_entry['type'].startswith('motion'):
        
            docket_entry['outcome']=  get_completion(docket_entry, order_outcome_system_message, order_outcome_model)
            docket_entry['judge'] =  get_completion(docket_entry, judge_system_message, judge_model, max_tokens=10)


    return docket_entry
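
`apply_model` retries with a fixed 30-second pause. One possible variant, sketched here under assumptions, is exponential backoff: wait briefly after the first failure and double the pause each time, up to a cap. The helper `call_with_backoff`, its parameters, and the stub `flaky` are hypothetical names introduced for illustration; the stub stands in for an API call so that nothing is sent to OpenAI.

```python
import time

def call_with_backoff(fn, attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn() until it succeeds, doubling the pause after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as err:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(base_delay * 2 ** attempt, max_delay)
            print(f'Attempt {attempt + 1} failed ({err}); retrying in {delay:.0f}s')
            sleep(delay)

# Stub that fails twice and then succeeds (stands in for an API call)
calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('temporary failure')
    return 'granted'

# Record the delays instead of actually sleeping, so the demo runs instantly
delays = []
result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Injecting `sleep` as a parameter is a design choice that makes the retry logic easy to exercise without waiting.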


In [13]:
# Categorize dockets using regex
df_flat['category']=df_flat['RECORDED_ENTRY'].apply(lambda x: categorize_dockets(x))
print(df_flat['category'].value_counts())

category
exclude       156214
order           8776
directions      8402
notice          7564
hearing         6761
copy_order       867
Name: count, dtype: int64
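
Before running the next cell, a rough bound on the number of API calls can be worked out from these counts. This sketch assumes the routing in `parse_docket_entry` as written above: one type call per `notice`, `order` and `copy_order` entry, plus at most two further calls (outcome and judge) per order or copy of an order. It estimates the number of calls only, not cost.

```python
# Category counts copied from the output above
counts = {
    'exclude': 156214,
    'order': 8776,
    'directions': 8402,
    'notice': 7564,
    'hearing': 6761,
    'copy_order': 867,
}

orders = counts['order'] + counts['copy_order']   # entries that may need outcome + judge calls
type_calls = counts['notice'] + orders            # one type call each
max_calls = type_calls + 2 * orders               # upper bound if every order is stay-related

print('Total entries:', sum(counts.values()))
print('Type calls:', type_calls)
print('Upper bound on calls:', max_calls)
```

The actual number will be lower, since outcome and judge calls are made only for orders typed as a stay of removal or as a numbered motion.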


In [14]:
# Apply model  - WARNING: CHECK TO MAKE SURE YOU WANT TO RUN THIS B/C IT RESULTS IN OPENAI CHARGES

df_flat = df_flat.progress_apply(lambda docket_entry: parse_docket_entry(docket_entry), axis=1)


100%|██████████| 188584/188584 [2:03:05<00:00, 25.53it/s]  


In [None]:
#filter df for category = 'notice' and get value counts for type
df_flat[df_flat['category']=='notice']['type'].value_counts()



In [15]:
# Save results of applying model

df_flat.to_excel(completions_file+'.xlsx', index=False)
df_flat.to_json(completions_file+'.jsonl', orient='records', lines=True, force_ascii=False)


In [33]:
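# Assumption: df_completion is not defined in the cells shown here, so start from
# the entry-level results in df_flat (the original run may have reloaded the saved file)
df_completion = df_flat.copy()

# Make DOCNO usable
df_completion['DOCNO'] = df_completion['DOCNO'].fillna(0).astype(int)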

# Make doc date usable
#df_completion['DOC_DT'] = pd.to_datetime(df_completion['DOC_DT'], format="%Y-%m-%d %H:%M:%S")

# Collapse completion df, one row per case
df_completion = df_completion.fillna('')
df_completion = df_completion.groupby(['citation'], as_index=False).agg({
    'year':'first',
    'city_filed': 'first',
    'nature': 'first',
    'RE_NO': lambda x: list(x),
    'DOCNO': lambda x: list(x),
    'DOC_DT': lambda x: list(x),
    'category': lambda x: list(x),
    'type': lambda x: list(x),
    'outcome': lambda x: list(x),
    'judge': lambda x: list(x),
})
print('Len of df_completion: ' + str(len(df_completion)))
df_completion

Len of df_completion: 7045


Unnamed: 0,citation,year,city_filed,nature,RE_NO,DOCNO,DOC_DT,category,type,outcome,judge
0,IMM-1-16,2016,Montréal,Imm - Appl. for leave & jud. review - Other Ar...,"[24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 1...","[0, 0, 0, 0, 11, 0, 10, 0, 0, 0, 9, 8, 7, 0, 0...","[2016-03-31 00:00:00, 2016-03-23 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , stay of removal, , , , , , , , , ...","[, , , , , , dismissed, , , , , , , , , , , , ...","[, , , , , , St-Louis, , , , , , , , , , , , ,..."
1,IMM-1-17,2017,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 3...","[26, 25, 0, 0, 24, 0, 0, 0, 0, 23, 22, 0, 0, 2...","[2020-01-17 00:00:00, 2017-12-15 00:00:00, 201...","[exclude, exclude, directions, exclude, exclud...","[, , other, , , , , , , , , , , other, , , , ,...","[, , , , , , , , , , , , , granted, , , , , , ...","[, , , , , , , , , , , , , Gleeson, , , , , , ..."
2,IMM-1-22,2022,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5...","[0, 0, 10, 9, 0, 8, 0, 0, 0, 7, 6, 5, 4, 3, 0,...","[2022-09-06 00:00:00, 2022-09-06 00:00:00, 202...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , stay of removal, , , other, , , , ,...","[, , , , , dismissed, , , , , , , , , , , ]","[, , , , , Brown, , , , , , , , , , , ]"
3,IMM-10-12,2012,Vancouver,Imm - Appl. for leave & jud. review - Pre-remo...,"[16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,...","[0, 0, 0, 0, 9, 8, 0, 0, 0, 7, 6, 5, 4, 3, 2, 1]","[2012-03-05 00:00:00, 2012-02-15 00:00:00, 201...","[exclude, exclude, exclude, directions, exclud...","[, , , other, , , , , , , , , stay of removal,...","[, , , , , , , , , , , , , , , ]","[, , , , , , , , , , , , , , , ]"
4,IMM-10-14,2014,Toronto,Imm - Appl. for leave & jud. review - IRB - Re...,"[21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 1...","[0, 0, 0, 13, 12, 11, 10, 0, 0, 0, 0, 9, 8, 7,...","[2014-07-10 00:00:00, 2014-03-24 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , , , , , , , , , stay of removal, ...","[, , , , , , , , , , , , , , , , , , , , ]","[, , , , , , , , , , , , , , , , , , , , ]"
...,...,...,...,...,...,...,...,...,...,...,...
7040,IMM-9965-22,2022,Toronto,Imm - Appl. for leave & jud. review - Pre-remo...,"[17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5...","[13, 12, 11, 0, 10, 0, 0, 9, 8, 0, 7, 6, 5, 4,...","[2022-11-16 00:00:00, 2022-11-16 00:00:00, 202...","[exclude, exclude, exclude, exclude, order, he...","[, , , , stay of removal, , , , , , , , , , st...","[, , , , dismissed, , , , , , , , , , , , ]","[, , , , Elliott, , , , , , , , , , , , ]"
7041,IMM-9968-22,2022,Ottawa,Imm - Appl. for leave & jud. review - Other Ar...,"[21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 1...","[0, 13, 12, 11, 10, 0, 9, 8, 7, 0, 0, 0, 0, 0,...","[2022-10-24 00:00:00, 2022-10-24 00:00:00, 202...","[exclude, exclude, exclude, exclude, order, he...","[, , , , stay of removal, , , , , , , , , othe...","[, , , , dismissed, , , , , , , , , , , , , , ...","[, , , , Grammond, , , , , , , , , , , , , , ,..."
7042,IMM-999-12,2012,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 1...","[0, 17, 0, 0, 0, 0, 0, 16, 0, 0, 15, 14, 13, 1...","[2012-09-25 00:00:00, 2012-09-25 00:00:00, 201...","[exclude, exclude, hearing, exclude, exclude, ...","[, , , , , , , other, , , , , , , , , stay of ...","[, , , , , , , granted, , , , , , , , , grante...","[, , , , , , , Hughes, , , , , , , , , Hughes,..."
7043,IMM-9992-12,2012,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]","[10, 0, 9, 8, 7, 6, 5, 4, 3, 2, 1]","[2012-10-09 00:00:00, 2012-09-27 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , , , stay of removal, , ]","[, , , , , , , , , , ]","[, , , , , , , , , , ]"


In [34]:
# Prepare docket level summary data based on processed docket entries

# Note: returns on the first stay notice found in list order; the RE_NO lists
# shown above run newest-first, so this should be the most recent stay notice
def stay_notice(x):
    if 'notice' in x['category']:
        docket_indexes = [i for i, n in enumerate(x['category']) if n == 'notice']
        for dock in docket_indexes:
            if 'stay of removal' in x['type'][dock]:
                x['stay_of_removal'] = True
                x['notice_docno'] = 'motion '+ str(x['DOCNO'][dock])
                x['notice_date'] = x['DOC_DT'][dock]
                return x
    return x

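# Note: returns on the first qualifying order found in list order; the RE_NO
# lists run newest-first, so this should be the most recent stay order
def stay_order(x):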

    if not x['stay_of_removal']:
        return x

    if 'order' in x['category'] or 'copy_order' in x['category']:
        docket_indexes = [i for i, n in enumerate(x['category']) if n in ['order','copy_order']]
        for dock in docket_indexes:
            if x['type'][dock] == 'stay of removal' and x['outcome'][dock] in ['granted','dismissed'] and x['judge'][dock] not in ['', 'other']:
                x['stay_outcome']= x['outcome'][dock]
                x['stay_outcome_date']= x['DOC_DT'][dock]
                x['stay_judge']= x['judge'][dock]
                return x
            if x['type'][dock] == x['notice_docno'] and x['outcome'][dock] in ['granted','dismissed'] and x['judge'][dock] not in ['', 'other']:
                x['stay_outcome']= x['outcome'][dock]
                x['stay_outcome_date']= x['DOC_DT'][dock]
                x['stay_judge']= x['judge'][dock]
                return x
    
    return x

consolidate_judges ={
    'S. Noël':'Noël',
    'S Noël':'Noël',
    'John Norris':'Norris',
    'Elizabeth Walker':'Walker',
    'Angela Furlanetto': 'Furlanetto',
}

df_completion['stay_of_removal']=False
df_completion['notice_docno']=''
df_completion['notice_date']=pd.NaT
df_completion['stay_outcome']=''
df_completion['stay_outcome_date']=pd.NaT
df_completion['stay_judge']=''

df_completion=df_completion.apply(lambda x: stay_notice(x), axis=1)
df_completion=df_completion.apply(lambda x: stay_order(x), axis=1)
df_completion['stay_judge']=df_completion['stay_judge'].replace(consolidate_judges)

df_completion

Unnamed: 0,citation,year,city_filed,nature,RE_NO,DOCNO,DOC_DT,category,type,outcome,judge,stay_of_removal,notice_docno,notice_date,stay_outcome,stay_outcome_date,stay_judge
0,IMM-1-16,2016,Montréal,Imm - Appl. for leave & jud. review - Other Ar...,"[24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 1...","[0, 0, 0, 0, 11, 0, 10, 0, 0, 0, 9, 8, 7, 0, 0...","[2016-03-31 00:00:00, 2016-03-23 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , stay of removal, , , , , , , , , ...","[, , , , , , dismissed, , , , , , , , , , , , ...","[, , , , , , St-Louis, , , , , , , , , , , , ,...",True,motion 2,2016-01-04,dismissed,2016-01-07,St-Louis
1,IMM-1-17,2017,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 3...","[26, 25, 0, 0, 24, 0, 0, 0, 0, 23, 22, 0, 0, 2...","[2020-01-17 00:00:00, 2017-12-15 00:00:00, 201...","[exclude, exclude, directions, exclude, exclud...","[, , other, , , , , , , , , , , other, , , , ,...","[, , , , , , , , , , , , , granted, , , , , , ...","[, , , , , , , , , , , , , Gleeson, , , , , , ...",True,motion 3,2017-01-03,granted,2017-01-06,Gleeson
2,IMM-1-22,2022,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5...","[0, 0, 10, 9, 0, 8, 0, 0, 0, 7, 6, 5, 4, 3, 0,...","[2022-09-06 00:00:00, 2022-09-06 00:00:00, 202...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , stay of removal, , , other, , , , ,...","[, , , , , dismissed, , , , , , , , , , , ]","[, , , , , Brown, , , , , , , , , , , ]",True,motion 3,2022-01-04,dismissed,2022-01-05,Brown
3,IMM-10-12,2012,Vancouver,Imm - Appl. for leave & jud. review - Pre-remo...,"[16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,...","[0, 0, 0, 0, 9, 8, 0, 0, 0, 7, 6, 5, 4, 3, 2, 1]","[2012-03-05 00:00:00, 2012-02-15 00:00:00, 201...","[exclude, exclude, exclude, directions, exclud...","[, , , other, , , , , , , , , stay of removal,...","[, , , , , , , , , , , , , , , ]","[, , , , , , , , , , , , , , , ]",True,motion 4,2012-01-20,,NaT,
4,IMM-10-14,2014,Toronto,Imm - Appl. for leave & jud. review - IRB - Re...,"[21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 1...","[0, 0, 0, 13, 12, 11, 10, 0, 0, 0, 0, 9, 8, 7,...","[2014-07-10 00:00:00, 2014-03-24 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , , , , , , , , , stay of removal, ...","[, , , , , , , , , , , , , , , , , , , , ]","[, , , , , , , , , , , , , , , , , , , , ]",True,motion 6,2014-03-18,,NaT,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7040,IMM-9965-22,2022,Toronto,Imm - Appl. for leave & jud. review - Pre-remo...,"[17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5...","[13, 12, 11, 0, 10, 0, 0, 9, 8, 0, 7, 6, 5, 4,...","[2022-11-16 00:00:00, 2022-11-16 00:00:00, 202...","[exclude, exclude, exclude, exclude, order, he...","[, , , , stay of removal, , , , , , , , , , st...","[, , , , dismissed, , , , , , , , , , , , ]","[, , , , Elliott, , , , , , , , , , , , ]",True,motion 3,2022-10-19,dismissed,2022-10-25,Elliott
7041,IMM-9968-22,2022,Ottawa,Imm - Appl. for leave & jud. review - Other Ar...,"[21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 1...","[0, 13, 12, 11, 10, 0, 9, 8, 7, 0, 0, 0, 0, 0,...","[2022-10-24 00:00:00, 2022-10-24 00:00:00, 202...","[exclude, exclude, exclude, exclude, order, he...","[, , , , stay of removal, , , , , , , , , othe...","[, , , , dismissed, , , , , , , , , , , , , , ...","[, , , , Grammond, , , , , , , , , , , , , , ,...",False,,NaT,,NaT,
7042,IMM-999-12,2012,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 1...","[0, 17, 0, 0, 0, 0, 0, 16, 0, 0, 15, 14, 13, 1...","[2012-09-25 00:00:00, 2012-09-25 00:00:00, 201...","[exclude, exclude, hearing, exclude, exclude, ...","[, , , , , , , other, , , , , , , , , stay of ...","[, , , , , , , granted, , , , , , , , , grante...","[, , , , , , , Hughes, , , , , , , , , Hughes,...",True,motion 4,2012-02-15,granted,2012-02-21,Hughes
7043,IMM-9992-12,2012,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]","[10, 0, 9, 8, 7, 6, 5, 4, 3, 2, 1]","[2012-10-09 00:00:00, 2012-09-27 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , , , stay of removal, , ]","[, , , , , , , , , , ]","[, , , , , , , , , , ]",True,motion 3,2012-09-28,,NaT,


### Export 

In [22]:
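# Export docket level summary data to excel
# NOTE: output_file is not defined in the settings cell above; the stem below is
# an assumed placeholder so the cell runs, adjust as needed
output_file = completions_file + '_summary'
df_completion.to_excel(output_file+'.xlsx', index=False)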

### NOTES

SPEED

Throughput could be improved substantially by (1) running requests in parallel or,
especially, (2) batching. For (1), dask dataframes are an option but would likely exceed
the OpenAI API rate limits, and it is not clear whether the number of workers can be
capped. Alternatively, we could iterate over the df and use dask delayed, which lets us
specify the number of workers. Option (2) would be better but needs some additional
architecture.
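
As a rough illustration of option (1) with a cap on concurrency, the sketch below uses the standard library's `ThreadPoolExecutor` rather than dask, since `max_workers` gives a direct limit on the number of calls in flight. The names `classify_stub` and `run_bounded` are hypothetical; the stub stands in for a call such as `apply_model(...)` so that the example makes no API requests.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_stub(entry_text):
    # Placeholder for a model call; no network access here
    return 'stay of removal' if 'stay' in entry_text else 'other'

def run_bounded(entries, worker, max_workers=4):
    """Apply worker to each entry with at most max_workers calls in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(worker, entries))

# Hypothetical entries, for illustration only
entries = [
    'Notice of motion for a stay of removal',
    'Notice of motion for an extension of time',
]
results = run_bounded(entries, classify_stub, max_workers=2)
print(results)
```

A small `max_workers` value would be the main lever for staying under the rate limit; the right value depends on the account's limits and is not something this sketch can determine.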

REVISING INFRASTRUCTURE FOR UPDATING

If we want to run this periodically and update when we get revised dockets, without paying
to reapply the models to everything, we could record the name of the model applied and a
timestamp of when it was applied. Before sending a docket entry to GPT, we would load those
records and check whether we already have a completion for that identical entry from the
same model; if so, we reuse it rather than calling the API. We could also switch to
langchain, which has caching functions built in, but langchain is developing rapidly, so
the code might need to be revised soon.
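
A minimal sketch of the caching idea, assuming a plain dictionary keyed on the model name and the exact entry text. The names `cached_completion` and `fake_model` are hypothetical, and `fake_model` stands in for the real API call.

```python
import json
import time

def cached_completion(cache, model, entry_text, call_model):
    """Return a stored completion for (model, entry_text), calling the model only on a miss."""
    key = json.dumps([model, entry_text], ensure_ascii=False)  # string key, so the cache is JSON-serializable
    if key not in cache:
        cache[key] = {
            'completion': call_model(entry_text),
            'model': model,
            'timestamp': time.time(),  # when the model was applied
        }
    return cache[key]['completion']

# Stub model that records each call it receives
calls = []
def fake_model(text):
    calls.append(text)
    return 'granted'

cache = {}
first = cached_completion(cache, 'model-a', 'Order granting the motion', fake_model)
second = cached_completion(cache, 'model-a', 'Order granting the motion', fake_model)  # cache hit
third = cached_completion(cache, 'model-b', 'Order granting the motion', fake_model)   # different model: miss
print(len(calls), 'model calls for 3 requests')
```

Because the keys are strings, the cache could be written to disk with `json.dump` between runs and reloaded with `json.load` before the next one.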



