# Final Parsing

Code to build docket-level summary data from the docket-entry data obtained via GPT.

NOTE: The data for the article was parsed using OpenAI's ADA model. However, as of January 2024, that model is no longer available. The data obtained using the deprecated process is available in this repo. In the settings cell below users can choose whether they want to use the deprecated data or the current data.

### Setup

In [1]:
# General Setup
import pandas as pd
from datasets import load_dataset
from tqdm import tqdm
tqdm.pandas()


In [24]:
# Settings to change

#############################

# if you want to use deprecated_data, set the following to True, otherwise set to False

deprecated_data = True

###########################

# Stem for file names
completions_file = 'output/stays_2012-22'
completions_file_deprecated = 'output/DEPRECATED-ADA-stays_2012-22'
parsed_file ='output/stays_parsed_2012-22'
parsed_file_deprecated ='output/DEPRECATED-ADA-stays_parsed_2012-22'

### Apply docket-level logic to the parsed docket entries

In [28]:
# Load Data

if deprecated_data:

    # Using ADA

    # set output file
    output_file = parsed_file_deprecated

    # Load completion df
    df_completion = pd.read_excel(completions_file_deprecated+'.xlsx')
    df_completion         

    # Load full dataset to get city and type
    df = load_dataset("refugee-law-lab/luck-of-the-draw-iii", split="train").to_pandas()

    # keep only df.citation, df.year, df.city_filed, df.nature
    df = df[['citation', 'year', 'city_filed', 'nature']]

    # merge df_completion with df, on citation, keeping only matches
    df_completion = df.merge(df_completion, on='citation', how='inner')

    # get values type
    df_completion['type'].value_counts()

    # replace missing type values with empty strings (the 'other' recoding is applied in the ADA cell below)
    df_completion['type'] = df_completion['type'].fillna('')

else:

    # Using GPT-3.5 

    # set output file
    output_file = parsed_file

    # Load completion df
    df_completion = pd.read_excel(completions_file+'.xlsx')




type
stay of removal    10491
other               4901
motion 3             438
motion 2             246
motion 4             244
                   ...  
motion 154             1
motion 633             1
motion  to             1
motion 213             1
motion 451             1
Name: count, Length: 94, dtype: int64

In [32]:
# Using ADA (legacy, used for article)
output_file = parsed_file_deprecated

# Load completion df
df_completion = pd.read_excel(completions_file_deprecated+'.xlsx')
df_completion         

# Load full dataset to get city and type
df = load_dataset("refugee-law-lab/luck-of-the-draw-iii", split="train").to_pandas()

# keep only df.citation, df.year, df.city_filed, df.nature
df = df[['citation', 'year', 'city_filed', 'nature']]

# merge df_completion with df, on citation, keeping only matches
df_completion = df.merge(df_completion, on='citation', how='inner')

# get values type
df_completion['type'].value_counts()

# if type is not 'stay of removal' and does not start with 'motion', set to 'other'
df_completion['type'] = df_completion['type'].fillna('')
df_completion['type'] = df_completion['type'].apply(lambda x: 'other' if x not in ('', 'stay of removal') and not x.startswith('motion') else x)

# get values type
df_completion['type'].value_counts()


type
                   162975
other               13445
stay of removal     10534
motion 3              412
motion 4              233
                    ...  
motion 133              1
motion 154              1
motion 36               1
motion 91               1
motion 69               1
Name: count, Length: 75, dtype: int64

In [33]:
# Make DOCNO usable
df_completion['DOCNO'] = df_completion['DOCNO'].fillna(0).astype(int)

# Make doc date usable
#df_completion['DOC_DT'] = pd.to_datetime(df_completion['DOC_DT'], format="%Y-%m-%d %H:%M:%S")

# Collapse completion df, one row per case
df_completion = df_completion.fillna('')
df_completion = df_completion.groupby(['citation'], as_index=False).agg({
    'year':'first',
    'city_filed': 'first',
    'nature': 'first',
    'RE_NO': lambda x: list(x),
    'DOCNO': lambda x: list(x),
    'DOC_DT': lambda x: list(x),
    'category': lambda x: list(x),
    'type': lambda x: list(x),
    'outcome': lambda x: list(x),
    'judge': lambda x: list(x),
})
print('Len of df_completion: ' + str(len(df_completion)))
df_completion

Len of df_completion: 7045


Unnamed: 0,citation,year,city_filed,nature,RE_NO,DOCNO,DOC_DT,category,type,outcome,judge
0,IMM-1-16,2016,Montréal,Imm - Appl. for leave & jud. review - Other Ar...,"[24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 1...","[0, 0, 0, 0, 11, 0, 10, 0, 0, 0, 9, 8, 7, 0, 0...","[2016-03-31 00:00:00, 2016-03-23 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , stay of removal, , , , , , , , , ...","[, , , , , , dismissed, , , , , , , , , , , , ...","[, , , , , , St-Louis, , , , , , , , , , , , ,..."
1,IMM-1-17,2017,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 3...","[26, 25, 0, 0, 24, 0, 0, 0, 0, 23, 22, 0, 0, 2...","[2020-01-17 00:00:00, 2017-12-15 00:00:00, 201...","[exclude, exclude, directions, exclude, exclud...","[, , other, , , , , , , , , , , other, , , , ,...","[, , , , , , , , , , , , , granted, , , , , , ...","[, , , , , , , , , , , , , Gleeson, , , , , , ..."
2,IMM-1-22,2022,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5...","[0, 0, 10, 9, 0, 8, 0, 0, 0, 7, 6, 5, 4, 3, 0,...","[2022-09-06 00:00:00, 2022-09-06 00:00:00, 202...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , stay of removal, , , other, , , , ,...","[, , , , , dismissed, , , , , , , , , , , ]","[, , , , , Brown, , , , , , , , , , , ]"
3,IMM-10-12,2012,Vancouver,Imm - Appl. for leave & jud. review - Pre-remo...,"[16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,...","[0, 0, 0, 0, 9, 8, 0, 0, 0, 7, 6, 5, 4, 3, 2, 1]","[2012-03-05 00:00:00, 2012-02-15 00:00:00, 201...","[exclude, exclude, exclude, directions, exclud...","[, , , other, , , , , , , , , stay of removal,...","[, , , , , , , , , , , , , , , ]","[, , , , , , , , , , , , , , , ]"
4,IMM-10-14,2014,Toronto,Imm - Appl. for leave & jud. review - IRB - Re...,"[21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 1...","[0, 0, 0, 13, 12, 11, 10, 0, 0, 0, 0, 9, 8, 7,...","[2014-07-10 00:00:00, 2014-03-24 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , , , , , , , , , stay of removal, ...","[, , , , , , , , , , , , , , , , , , , , ]","[, , , , , , , , , , , , , , , , , , , , ]"
...,...,...,...,...,...,...,...,...,...,...,...
7040,IMM-9965-22,2022,Toronto,Imm - Appl. for leave & jud. review - Pre-remo...,"[17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5...","[13, 12, 11, 0, 10, 0, 0, 9, 8, 0, 7, 6, 5, 4,...","[2022-11-16 00:00:00, 2022-11-16 00:00:00, 202...","[exclude, exclude, exclude, exclude, order, he...","[, , , , stay of removal, , , , , , , , , , st...","[, , , , dismissed, , , , , , , , , , , , ]","[, , , , Elliott, , , , , , , , , , , , ]"
7041,IMM-9968-22,2022,Ottawa,Imm - Appl. for leave & jud. review - Other Ar...,"[21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 1...","[0, 13, 12, 11, 10, 0, 9, 8, 7, 0, 0, 0, 0, 0,...","[2022-10-24 00:00:00, 2022-10-24 00:00:00, 202...","[exclude, exclude, exclude, exclude, order, he...","[, , , , stay of removal, , , , , , , , , othe...","[, , , , dismissed, , , , , , , , , , , , , , ...","[, , , , Grammond, , , , , , , , , , , , , , ,..."
7042,IMM-999-12,2012,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 1...","[0, 17, 0, 0, 0, 0, 0, 16, 0, 0, 15, 14, 13, 1...","[2012-09-25 00:00:00, 2012-09-25 00:00:00, 201...","[exclude, exclude, hearing, exclude, exclude, ...","[, , , , , , , other, , , , , , , , , stay of ...","[, , , , , , , granted, , , , , , , , , grante...","[, , , , , , , Hughes, , , , , , , , , Hughes,..."
7043,IMM-9992-12,2012,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]","[10, 0, 9, 8, 7, 6, 5, 4, 3, 2, 1]","[2012-10-09 00:00:00, 2012-09-27 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , , , stay of removal, , ]","[, , , , , , , , , , ]","[, , , , , , , , , , ]"


In [34]:
# Prepare docket level summary data based on processed docket entries

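def stay_notice(x):
    # Flags the docket if any notice entry is typed as a stay of removal and
    # records that notice's document number and date. Returns on the first match
    # in list order; the displayed rows list entries newest first (RE_NO
    # descending), so this should be the most recent stay notice.
    if 'notice' in x['category']:
        docket_indexes = [i for i, n in enumerate(x['category']) if n == 'notice']
        for dock in docket_indexes:
            if 'stay of removal' in x['type'][dock]:
                x['stay_of_removal'] = True
                x['notice_docno'] = 'motion ' + str(x['DOCNO'][dock])
                x['notice_date'] = x['DOC_DT'][dock]
                return x
    return x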

def stay_order(x):
    # Looks for the order deciding the stay motion. Returns on the first matching
    # order in list order (newest first in the displayed rows, so this should be
    # the most recent order).

    if not x['stay_of_removal']:
        return x

    if 'order' in x['category'] or 'copy_order' in x['category']:
        docket_indexes = [i for i, n in enumerate(x['category']) if n in ['order', 'copy_order']]
        for dock in docket_indexes:
            # The order must refer to the stay either by type or by the notice's
            # motion number, and must have a usable outcome and judge.
            if (x['type'][dock] in ['stay of removal', x['notice_docno']]
                    and x['outcome'][dock] in ['granted', 'dismissed']
                    and x['judge'][dock] not in ['', 'other']):
                x['stay_outcome'] = x['outcome'][dock]
                x['stay_outcome_date'] = x['DOC_DT'][dock]
                x['stay_judge'] = x['judge'][dock]
                return x

    return x

consolidate_judges ={
    'S. Noël':'Noël',
    'S Noël':'Noël',
    'John Norris':'Norris',
    'Elizabeth Walker':'Walker',
    'Angela Furlanetto': 'Furlanetto',
}

df_completion['stay_of_removal']=False
df_completion['notice_docno']=''
df_completion['notice_date']=pd.NaT
df_completion['stay_outcome']=''
df_completion['stay_outcome_date']=pd.NaT
df_completion['stay_judge']=''

df_completion = df_completion.apply(stay_notice, axis=1)
df_completion = df_completion.apply(stay_order, axis=1)
df_completion['stay_judge']=df_completion['stay_judge'].replace(consolidate_judges)

df_completion

Unnamed: 0,citation,year,city_filed,nature,RE_NO,DOCNO,DOC_DT,category,type,outcome,judge,stay_of_removal,notice_docno,notice_date,stay_outcome,stay_outcome_date,stay_judge
0,IMM-1-16,2016,Montréal,Imm - Appl. for leave & jud. review - Other Ar...,"[24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 1...","[0, 0, 0, 0, 11, 0, 10, 0, 0, 0, 9, 8, 7, 0, 0...","[2016-03-31 00:00:00, 2016-03-23 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , stay of removal, , , , , , , , , ...","[, , , , , , dismissed, , , , , , , , , , , , ...","[, , , , , , St-Louis, , , , , , , , , , , , ,...",True,motion 2,2016-01-04,dismissed,2016-01-07,St-Louis
1,IMM-1-17,2017,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 3...","[26, 25, 0, 0, 24, 0, 0, 0, 0, 23, 22, 0, 0, 2...","[2020-01-17 00:00:00, 2017-12-15 00:00:00, 201...","[exclude, exclude, directions, exclude, exclud...","[, , other, , , , , , , , , , , other, , , , ,...","[, , , , , , , , , , , , , granted, , , , , , ...","[, , , , , , , , , , , , , Gleeson, , , , , , ...",True,motion 3,2017-01-03,granted,2017-01-06,Gleeson
2,IMM-1-22,2022,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5...","[0, 0, 10, 9, 0, 8, 0, 0, 0, 7, 6, 5, 4, 3, 0,...","[2022-09-06 00:00:00, 2022-09-06 00:00:00, 202...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , stay of removal, , , other, , , , ,...","[, , , , , dismissed, , , , , , , , , , , ]","[, , , , , Brown, , , , , , , , , , , ]",True,motion 3,2022-01-04,dismissed,2022-01-05,Brown
3,IMM-10-12,2012,Vancouver,Imm - Appl. for leave & jud. review - Pre-remo...,"[16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,...","[0, 0, 0, 0, 9, 8, 0, 0, 0, 7, 6, 5, 4, 3, 2, 1]","[2012-03-05 00:00:00, 2012-02-15 00:00:00, 201...","[exclude, exclude, exclude, directions, exclud...","[, , , other, , , , , , , , , stay of removal,...","[, , , , , , , , , , , , , , , ]","[, , , , , , , , , , , , , , , ]",True,motion 4,2012-01-20,,NaT,
4,IMM-10-14,2014,Toronto,Imm - Appl. for leave & jud. review - IRB - Re...,"[21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 1...","[0, 0, 0, 13, 12, 11, 10, 0, 0, 0, 0, 9, 8, 7,...","[2014-07-10 00:00:00, 2014-03-24 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , , , , , , , , , stay of removal, ...","[, , , , , , , , , , , , , , , , , , , , ]","[, , , , , , , , , , , , , , , , , , , , ]",True,motion 6,2014-03-18,,NaT,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7040,IMM-9965-22,2022,Toronto,Imm - Appl. for leave & jud. review - Pre-remo...,"[17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5...","[13, 12, 11, 0, 10, 0, 0, 9, 8, 0, 7, 6, 5, 4,...","[2022-11-16 00:00:00, 2022-11-16 00:00:00, 202...","[exclude, exclude, exclude, exclude, order, he...","[, , , , stay of removal, , , , , , , , , , st...","[, , , , dismissed, , , , , , , , , , , , ]","[, , , , Elliott, , , , , , , , , , , , ]",True,motion 3,2022-10-19,dismissed,2022-10-25,Elliott
7041,IMM-9968-22,2022,Ottawa,Imm - Appl. for leave & jud. review - Other Ar...,"[21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 1...","[0, 13, 12, 11, 10, 0, 9, 8, 7, 0, 0, 0, 0, 0,...","[2022-10-24 00:00:00, 2022-10-24 00:00:00, 202...","[exclude, exclude, exclude, exclude, order, he...","[, , , , stay of removal, , , , , , , , , othe...","[, , , , dismissed, , , , , , , , , , , , , , ...","[, , , , Grammond, , , , , , , , , , , , , , ,...",False,,NaT,,NaT,
7042,IMM-999-12,2012,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 1...","[0, 17, 0, 0, 0, 0, 0, 16, 0, 0, 15, 14, 13, 1...","[2012-09-25 00:00:00, 2012-09-25 00:00:00, 201...","[exclude, exclude, hearing, exclude, exclude, ...","[, , , , , , , other, , , , , , , , , stay of ...","[, , , , , , , granted, , , , , , , , , grante...","[, , , , , , , Hughes, , , , , , , , , Hughes,...",True,motion 4,2012-02-15,granted,2012-02-21,Hughes
7043,IMM-9992-12,2012,Toronto,Imm - Appl. for leave & jud. review - Other Ar...,"[11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]","[10, 0, 9, 8, 7, 6, 5, 4, 3, 2, 1]","[2012-10-09 00:00:00, 2012-09-27 00:00:00, 201...","[exclude, exclude, exclude, exclude, exclude, ...","[, , , , , , , , stay of removal, , ]","[, , , , , , , , , , ]","[, , , , , , , , , , ]",True,motion 3,2012-09-28,,NaT,


### Export 

In [22]:
# Export docket level summary data to excel
df_completion.to_excel(output_file+'.xlsx', index=False)

### NOTES

VERIFICATION

I compared the data for IMM-x-2020 (ADA) with the data from PAT who was looking at the 2020 CanLII
stay cases. My process caught each of the applications in his dataset, so it is at least that
inclusive. 

I informally checked the stay of removal True/False column (ADA), and it is working really well: over 98% accuracy, and the mistakes are not unreasonable. Errors involved typos (IMM-104-20), informal requests
(IMM-1638-20 & IMM-2826-20) and notices dealing with multiple types of motions at the same time
(IMM-5691-20). One thing to keep an eye on: I did see a hallucination (IMM-2826-20 invented a
motion doc # for the hearing docket entry). Will see when I apply cross-references whether this
is common.

I had a research assistant review the output on 200 dockets (ADA); she also identified the ones that are in French, and I reviewed those myself. Out of 200 cases there were errors in 2. Both errors involved failing to identify the case as involving a stay of removal because of unusual ways of recording the notice of motion. There were no errors in parsing the data for the cases that were identified as involving stays of removal. That is a 99% accuracy rate, which is really good.
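
The accuracy figure above is simple arithmetic. As a minimal sketch, assuming a hypothetical review sheet with one row per reviewed docket and a boolean `correct` column (neither exists in this repo; the citations are placeholders), it could be recomputed like this:

```python
import pandas as pd

# Hypothetical review sheet: one row per reviewed docket, with a boolean
# `correct` column filled in by the reviewer. The counts mirror the note
# above (200 dockets reviewed, 2 errors); the citations are placeholders.
review = pd.DataFrame({
    "citation": [f"docket-{i}" for i in range(200)],
    "correct": [False, False] + [True] * 198,
})

n_reviewed = len(review)
n_errors = int((~review["correct"]).sum())
accuracy = 1 - n_errors / n_reviewed

print(f"{n_reviewed} reviewed, {n_errors} errors, accuracy = {accuracy:.1%}")
```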

ERRORS re: Informal requests

A small number of motions for stays of removal don't have notices of motion, and instead proceed
with an informal request that takes a wide variety of forms (letters, communications, etc.). These are
not caught by the system, which relies on notices of motion. The problem is that other docket
entries are often inconclusive about the type of motion (e.g. "motion doc 4", or "stay of execution"),
so we can't rely on them. If we want to catch these, we probably need to run most docket entries
through GPT, which becomes more expensive. My feeling is that we can live with missing these
instances.
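
One inexpensive way to gauge how many such dockets are missed, without running more entries through GPT, might be to flag dockets where no notice was detected but an order entry is nonetheless typed 'stay of removal' (IMM-9968-22 in the output above looks like this). The sketch below assumes the collapsed one-row-per-docket layout used in this notebook; the helper name `order_without_notice` and the toy rows are illustrative only, and flagged dockets would still need manual review.

```python
import pandas as pd

def order_without_notice(row):
    """True if no stay notice was detected but an order is typed 'stay of removal'."""
    if row["stay_of_removal"]:
        return False
    return any(
        cat in ("order", "copy_order") and typ == "stay of removal"
        for cat, typ in zip(row["category"], row["type"])
    )

# Toy rows in the collapsed layout (one row per docket, list-valued columns).
toy = pd.DataFrame({
    "citation": ["A-1", "A-2", "A-3"],
    "stay_of_removal": [True, False, False],
    "category": [["order", "notice"], ["order", "exclude"], ["exclude", "exclude"]],
    "type": [["stay of removal", "stay of removal"], ["stay of removal", ""], ["", ""]],
})

toy["possible_informal_request"] = toy.apply(order_without_notice, axis=1)
flagged = toy.loc[toy["possible_informal_request"], "citation"].tolist()
print(flagged)
```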

ERRORS re: (Final Decision)

A small number of dockets have the stay decision included in a docket entry with the (Final Decision)
marker; such entries are excluded from our models. This is rare, and I don't think it is worth trying to catch.

NEW GPT3.5 MODEL

Did not systematically compare accuracy of GPT3.5 vs ADA. Outcomes are broadly similar, but with a 1 to 2 percent
difference. If relying on GPT3.5, additional accuracy checking should be undertaken.
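
As a starting point for that checking, the two docket-level outputs could be merged on `citation` and compared column by column. This is a minimal sketch with toy data standing in for the two exported files; the helper `agreement_rate` is hypothetical and not part of this repo.

```python
import pandas as pd

def agreement_rate(ada, gpt35, column):
    """Share of dockets present in both frames where `column` has the same value."""
    both = ada.merge(gpt35, on="citation", how="inner", suffixes=("_ada", "_gpt35"))
    left, right = both[f"{column}_ada"], both[f"{column}_gpt35"]
    # Treat two missing values as agreement (NaN == NaN is False in pandas).
    same = (left == right) | (left.isna() & right.isna())
    return same.mean()

# Toy stand-ins for the two docket-level summary files.
ada = pd.DataFrame({
    "citation": ["A-1", "A-2", "A-3", "A-4"],
    "stay_of_removal": [True, True, False, True],
    "stay_outcome": ["granted", "dismissed", "", ""],
})
gpt35 = pd.DataFrame({
    "citation": ["A-1", "A-2", "A-3", "A-4"],
    "stay_of_removal": [True, True, False, False],
    "stay_outcome": ["granted", "granted", "", ""],
})

for col in ["stay_of_removal", "stay_outcome"]:
    print(col, agreement_rate(ada, gpt35, col))
```

Dockets where the two disagree would be natural candidates for manual review.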

HEARING OUTCOMES

One possible way to improve the coding would be to look at hearing outcomes. We can pretty reliably link
the hearing with the motion (because hearing docket entries almost always refer to the relevant notice
docket entry), and in cases where we haven't detected an order we could look at the hearing entry to 
see whether it includes an outcome. But in most cases the outcome is reserved (maybe 70%), so it would
only help in a small proportion of cases, and we are not missing outcomes very often anyway, so I am 
inclined not to try to do this.
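
For reference, a rough sketch of what that fallback could look like; it is not implemented in this notebook. For dockets with a detected notice but no detected order, it looks for a hearing entry linked to the notice and takes its outcome if it is 'granted' or 'dismissed'. It assumes, without confirmation from the data, that hearing entries carry a usable `outcome` value and a `type` matching the notice; `hearing_outcome` is a hypothetical helper.

```python
import pandas as pd

def hearing_outcome(row):
    """Return an outcome from a linked hearing entry, or '' if none is usable."""
    if not row["stay_of_removal"] or row["stay_outcome"] != "":
        return ""  # no notice detected, or an order already supplied the outcome
    for cat, typ, out in zip(row["category"], row["type"], row["outcome"]):
        linked = typ in ("stay of removal", row["notice_docno"])
        if cat == "hearing" and linked and out in ("granted", "dismissed"):
            return out
    return ""  # e.g. the outcome was reserved at the hearing

# Toy rows: the first hearing records an outcome, the second is reserved.
toy = pd.DataFrame({
    "citation": ["A-1", "A-2"],
    "stay_of_removal": [True, True],
    "notice_docno": ["motion 3", "motion 2"],
    "stay_outcome": ["", ""],
    "category": [["hearing", "notice"], ["hearing", "notice"]],
    "type": [["motion 3", "stay of removal"], ["motion 2", "stay of removal"]],
    "outcome": [["dismissed", ""], ["reserved", ""]],
})

toy["hearing_outcome"] = toy.apply(hearing_outcome, axis=1)
print(toy[["citation", "hearing_outcome"]])
```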




