## Notebook to examine readability in Federal Court immigration decisions

Citation: Sean Rehaag, "Notebook to examine readability in Federal Court immigration decisions" (2025), online: <https://github.com/a2aj-ca/canadian-legal-data>.

### Setup: Install & Load Packages

In [1]:
# !pip install datasets
# !pip install pandas
# !pip install textstat

In [2]:
from datasets import load_dataset
import pandas as pd
import re
import textstat

### Load Data

In [3]:
# load dataset
laws = load_dataset("a2aj/canadian-case-law", data_dir="FC", split="train")

# convert to df
df = laws.to_pandas()
df.head(5)

Unnamed: 0,dataset,citation_en,citation2_en,name_en,document_date_en,url_en,scraped_timestamp_en,unofficial_text_en,citation_fr,citation2_fr,name_fr,document_date_fr,url_fr,scraped_timestamp_fr,unofficial_text_fr,upstream_license
0,FC,2011 FC 1028,,Viera Algueta v. Canada (Citizenship and Immig...,2011-08-31 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2022-08-22 16:18:42.750000+00:00,Viera Algueta v. Canada (Citizenship and Immig...,2011 CF 1028,,Viera Algueta c. Canada (Citoyenneté et Immigr...,2011-08-31 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2022-08-27 11:08:11.259000+00:00,Viera Algueta c. Canada (Citoyenneté et Immigr...,"See upstream license, including non-commercial..."
1,FC,2011 FC 1029,,Pontbriand v. Federal Public Service Health Ca...,2011-08-31 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2022-08-22 16:18:34.993000+00:00,Pontbriand v. Federal Public Service Health Ca...,2011 CF 1029,,Pontbriand c. Administration du Régime de soin...,2011-08-31 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2022-08-27 11:08:05.211000+00:00,Pontbriand c. Administration du Régime de soin...,"See upstream license, including non-commercial..."
2,FC,2011 FC 1030,,Wang v. Canada (Citizenship and Immigration),2011-09-02 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2022-08-22 16:17:42.816000+00:00,Wang v. Canada (Citizenship and Immigration)\n...,2011 CF 1030,,Wang c. Canada (Citoyenneté et Immigration),2011-09-02 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2022-08-27 11:07:13.428000+00:00,Wang c. Canada (Citoyenneté et Immigration)\nB...,"See upstream license, including non-commercial..."
3,FC,2011 FC 1032,,Patry v. Canada (Attorney General),2011-08-31 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2022-08-22 16:18:28.655000+00:00,Patry v. Canada (Attorney General)\nCourt (s) ...,2011 CF 1032,,Patry c. Canada (Procureur général),2011-08-31 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2022-08-27 11:07:57.942000+00:00,Patry c. Canada (Procureur général)\nBase de d...,"See upstream license, including non-commercial..."
4,FC,2011 FC 1033,,Chaaban v. Canada (Correctional Service),2011-08-31 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2022-08-22 16:18:22.360000+00:00,Chaaban v. Canada (Correctional Service)\nCour...,2011 CF 1033,,Chaaban c. Canada (Service correctionnel),2011-08-31 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2022-08-27 11:07:51.228000+00:00,Chaaban c. Canada (Service correctionnel)\nBas...,"See upstream license, including non-commercial..."


### Clean Data

In [4]:
# drop rows where no unofficial_text_en
df = df.dropna(subset=['unofficial_text_en'])

# drop rows where document_date_en is before Aug 1, 2020
df = df[df['document_date_en'] >= '2020-08-01']

# drop rows that are not immigration files
#  (use docket: IMM-#(up to 5 digits)-##(two digits))
df = df[df['unofficial_text_en'].str.contains(r'IMM-\d{1,5}-\d{2}', na=False)]

# extract judge name from the "PRESENT:" line (empty string if there is no match)
judge_name_pattern = r'(?:\r\n|\r|\n)PRESENT:\s+(.*?)(?:\r\n|\r|\n|$)'
df['judge_name'] = (df['unofficial_text_en']
                    .str.extract(judge_name_pattern, flags=re.IGNORECASE)[0]
                    .str.strip()
                    .fillna(''))

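# Normalize judges' names to an upper-case surname (e.g., "Mr. Justice Diner" -> "DINER")
def simplify_name(name):
    # unify apostrophes and replace the Cyrillic "Ѐ" (U+0400) with the Latin "È"
    name = name.upper().strip().replace("’", "'").replace("Ѐ", "È")
    # decisions signed "The Chief Justice" are attributed to CRAMPTON
    name = name.replace("THE CHIEF JUSTICE", "CRAMPTON")
    # keep only the last token (the surname); empty names stay empty
    return name.split()[-1] if name else ''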

df['simplified_judge_name'] = df['judge_name'].apply(simplify_name)

df.head(5)

Unnamed: 0,dataset,citation_en,citation2_en,name_en,document_date_en,url_en,scraped_timestamp_en,unofficial_text_en,citation_fr,citation2_fr,name_fr,document_date_fr,url_fr,scraped_timestamp_fr,unofficial_text_fr,upstream_license,judge_name,simplified_judge_name
22,FC,2024 FC 1988,,Ahmed v. Canada (Public Safety and Emergency P...,2024-12-09 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2025-01-03 11:49:19.850000+00:00,Ahmed v. Canada (Public Safety and Emergency P...,,,,NaT,,NaT,,"See upstream license, including non-commercial...",The Honourable Madam Justice Heneghan,HENEGHAN
25,FC,2024 FC 1869,,Huraira v. Canada (Citizenship and Immigration),2024-11-22 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2025-01-03 12:18:41.293000+00:00,Huraira v. Canada (Citizenship and Immigration...,,,,NaT,,NaT,,"See upstream license, including non-commercial...",Madam Justice Go,GO
26,FC,2024 FC 1868,,Gurusamy v. Canada (Citizenship and Immigration),2024-11-22 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2025-01-03 12:18:48.493000+00:00,Gurusamy v. Canada (Citizenship and Immigratio...,,,,NaT,,NaT,,"See upstream license, including non-commercial...",Mr. Justice Diner,DINER
27,FC,2024 FC 1867,,San Juan Valdelamar v. Canada (Citizenship and...,2024-11-21 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2025-05-19 02:01:02.099000+00:00,San Juan Valdelamar v. Canada (Citizenship and...,,,,NaT,,NaT,,"See upstream license, including non-commercial...",Mr. Justice Pentney,PENTNEY
29,FC,2024 FC 1865,,Singh v. Canada (Citizenship and Immigration),2024-11-22 00:00:00+00:00,https://decisions.fct-cf.gc.ca/fc-cf/decisions...,2025-01-03 12:17:11.883000+00:00,Singh v. Canada (Citizenship and Immigration)\...,,,,NaT,,NaT,,"See upstream license, including non-commercial...",The Honourable Mr. Justice Duchesne,DUCHESNE
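To see what the cleaning steps do to a single decision, the sketch below applies the docket pattern, the `PRESENT:` pattern and the same surname normalization to a short header. The header text is invented for illustration and is not taken from the dataset; real decisions may format this line differently. The helper `parse_header` is hypothetical.

```python
import re

# Patterns mirror the cleaning cell above
DOCKET_PATTERN = r'IMM-\d{1,5}-\d{2}'
JUDGE_NAME_PATTERN = r'(?:\r\n|\r|\n)PRESENT:\s+(.*?)(?:\r\n|\r|\n|$)'


def simplify_name(name):
    # Same normalization as above: upper-case, fix characters, keep the surname
    name = name.upper().strip().replace("’", "'").replace("Ѐ", "È")
    name = name.replace("THE CHIEF JUSTICE", "CRAMPTON")
    return name.split()[-1] if name else ''


def parse_header(text):
    """Hypothetical helper: (is_immigration_file, raw_judge_name, simplified_name)."""
    is_imm = re.search(DOCKET_PATTERN, text) is not None
    match = re.search(JUDGE_NAME_PATTERN, text, re.IGNORECASE)
    raw = match.group(1).strip() if match else ''
    return is_imm, raw, simplify_name(raw)


# Invented header for illustration only (not from the dataset)
sample = "Docket: IMM-1234-23\nOttawa, Ontario\nPRESENT:\tMadam Justice Go\nBETWEEN:"
print(parse_header(sample))  # (True, 'Madam Justice Go', 'GO')
```

Because `\s+` also matches line breaks, a name placed on the line after `PRESENT:` is still captured.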


### Run analysis

In [5]:
# get judge gender

# infer judge gender from the honorific ("Mr." -> Male, "Madam" -> Female);
# word boundaries keep "mr" from matching inside other words (e.g., "Mrs.")
df['judge_gender'] = df['judge_name'].apply(
    lambda x: 'Male' if re.search(r'\bmr\b', x, re.IGNORECASE) else
              ('Female' if re.search(r'\bmadam\b', x, re.IGNORECASE) else 'Unknown'))

# manually fix some genders
def manual_gender_fix(row):
    if row['simplified_judge_name'] in ["COTTER", "HORNE", "CRAMPTON", "BATTISTA"]:
        return "Male"
    return row['judge_gender']

df['judge_gender'] = df.apply(manual_gender_fix, axis=1)

def fill_unknown_genders(df):
    # Create a mapping of simplified_judge_name to known gender
    # Filter out 'Unknown' values and get the first non-Unknown gender for each judge
    gender_mapping = (df[df['judge_gender'] != 'Unknown']
                      .groupby('simplified_judge_name')['judge_gender']
                      .first()
                      .to_dict())
    
    # Apply the mapping to fill Unknown values
    def fill_gender(row):
        if row['judge_gender'] == 'Unknown':
            # Look up the judge name in our mapping
            return gender_mapping.get(row['simplified_judge_name'], 'Unknown')
        else:
            return row['judge_gender']
    
    df['judge_gender'] = df.apply(fill_gender, axis=1)
    
    return df
    
df = fill_unknown_genders(df)

# print value counts judge_gender
print(df['judge_gender'].value_counts())

judge_gender
Male       2843
Female     2241
Unknown       8
Name: count, dtype: int64
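The fill step can also be written without a row-wise `apply`, which tends to be slow on thousands of rows. The sketch below is a hypothetical vectorized variant (`fill_unknown_genders_vectorized` is not part of the original notebook) intended to give the same result: each judge's first known gender is looked up with `Series.map`, and judges with no known gender remain `'Unknown'`. The judge names in the toy data are placeholders.

```python
import pandas as pd


def fill_unknown_genders_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical vectorized equivalent of fill_unknown_genders."""
    known = df[df['judge_gender'] != 'Unknown']
    # First known gender for each simplified judge name
    mapping = known.groupby('simplified_judge_name')['judge_gender'].first()
    unknown = df['judge_gender'] == 'Unknown'
    # Look up only the Unknown rows; judges never seen with a gender stay Unknown
    df.loc[unknown, 'judge_gender'] = (
        df.loc[unknown, 'simplified_judge_name'].map(mapping).fillna('Unknown')
    )
    return df


# Placeholder judges: A is Unknown once but Female elsewhere; C is never known
toy = pd.DataFrame({
    'simplified_judge_name': ['A', 'A', 'B', 'C'],
    'judge_gender': ['Unknown', 'Female', 'Male', 'Unknown'],
})
result = fill_unknown_genders_vectorized(toy.copy())
print(result['judge_gender'].tolist())  # ['Female', 'Female', 'Male', 'Unknown']
```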


In [6]:
# calculate word count and readability

# get word count for unofficial text
df['word_count'] = df['unofficial_text_en'].str.split().str.len()

# get Flesch Reading Ease score 
df['readability'] = df['unofficial_text_en'].apply(lambda x: textstat.flesch_reading_ease(x) if pd.notna(x) else None)
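For reference, the Flesch Reading Ease score combines average sentence length and average syllables per word; higher scores indicate easier text, and scores below 50 are conventionally read as difficult. A minimal sketch of the standard formula from raw counts is below. It assumes the counts are already known: `textstat` performs its own sentence splitting and syllable counting, so its scores on real decisions will not necessarily match hand counts. Note also that `word_count` above counts whitespace-separated tokens, so citation and paragraph numbers count as words.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease score computed from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts: 20 words per sentence, 1.5 syllables per word
print(round(flesch_reading_ease(words=100, sentences=5, syllables=150), 1))  # 59.6

# Longer sentences and longer words push the score down (harder to read):
# 30 words per sentence, 1.6 syllables per word
print(round(flesch_reading_ease(words=300, sentences=10, syllables=480), 1))  # 41.0
```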


In [7]:
# filtered_df: keep only judges with at least 50 IMM cases

filtered_df = df[df['simplified_judge_name'].map(df['simplified_judge_name'].value_counts()) >= 50]

# get median word count, number of cases, readability, by simplified_judge_name

median_word_count = filtered_df.groupby('simplified_judge_name')['word_count'].median()
number_of_cases = filtered_df.groupby('simplified_judge_name')['word_count'].count()
median_readability = filtered_df.groupby('simplified_judge_name')['readability'].median().round(1)

# print in a single table, sorted by word count
result = pd.DataFrame({
    'median_word_count': median_word_count,
    'median_readability': median_readability,
    'number_of_cases': number_of_cases,
})
print(result.sort_values(by='median_word_count', ascending=False))

                       median_word_count  median_readability  number_of_cases
simplified_judge_name                                                        
KANE                              4907.0                36.3               50
STRICKLAND                        4748.0                31.7              107
ROY                               4280.5                41.1               82
GASCON                            4219.0                38.5               96
BROWN                             4135.0                35.1              139
LITTLE                            3958.0                35.9              149
MCHAFFIE                          3486.0                36.6              137
FAVEL                             3061.0                34.2              117
PAMEL                             3036.0                42.4              101
PALLOTTA                          2943.0                37.0              116
NORRIS                            2905.0                35.9    
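The three separate `groupby` calls above can be collapsed into one using pandas named aggregation. The sketch below is a hypothetical helper (`judge_summary` and its `min_cases` parameter are illustrative, not from the original notebook) that should produce the same kind of table; the toy data are invented.

```python
import pandas as pd


def judge_summary(df: pd.DataFrame, min_cases: int = 50) -> pd.DataFrame:
    # Keep only judges with at least `min_cases` decisions
    counts = df['simplified_judge_name'].map(df['simplified_judge_name'].value_counts())
    subset = df[counts >= min_cases]
    # One groupby with named aggregation instead of three separate groupbys
    summary = subset.groupby('simplified_judge_name').agg(
        median_word_count=('word_count', 'median'),
        median_readability=('readability', 'median'),
        number_of_cases=('word_count', 'count'),
    )
    summary['median_readability'] = summary['median_readability'].round(1)
    return summary.sort_values('median_word_count', ascending=False)


# Invented data; judge C has only one decision and is filtered out
toy = pd.DataFrame({
    'simplified_judge_name': ['A', 'A', 'A', 'B', 'B', 'C'],
    'word_count': [1000, 3000, 2000, 500, 700, 9000],
    'readability': [30.0, 40.0, 35.0, 50.0, 45.0, 20.0],
})
print(judge_summary(toy, min_cases=2))
```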

In [None]:
# get median word count by judge_gender
median_word_count_by_gender = df[df['judge_gender'] != 'Unknown'].groupby('judge_gender')['word_count'].median()
print("Median Word Count by Judge Gender:")
print(median_word_count_by_gender)
print()

# get median readability by judge_gender
median_readability_by_gender = df[df['judge_gender'] != 'Unknown'].groupby('judge_gender')['readability'].median()
print("Median Readability by Judge Gender:")
print(median_readability_by_gender)


Median Word Count by Judge Gender:
judge_gender
Female    2274.0
Male      2679.0
Name: word_count, dtype: float64

Median Readability by Judge Gender:
judge_gender
Female    32.842595
Male      34.336968
Name: readability, dtype: float64
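These medians are computed over decisions, so judges with heavy caseloads count for more than judges with few decisions. One hedged way to check whether a gender gap reflects a few prolific judges is to compute each judge's median first and then take the median across judges of each gender. The helper `judge_level_median` and the toy data below are hypothetical.

```python
import pandas as pd


def judge_level_median(df: pd.DataFrame, value_col: str, min_cases: int = 1) -> pd.Series:
    """Median across judges of each judge's own median `value_col`, by gender."""
    known = df[df['judge_gender'] != 'Unknown']
    per_judge = known.groupby(['judge_gender', 'simplified_judge_name']).agg(
        judge_median=(value_col, 'median'),
        n=(value_col, 'count'),
    )
    # Optionally drop judges with too few decisions for a stable median
    per_judge = per_judge[per_judge['n'] >= min_cases]
    return per_judge.groupby(level='judge_gender')['judge_median'].median()


# Invented data: judge X writes many short decisions, judge Y one long one
toy = pd.DataFrame({
    'simplified_judge_name': ['X'] * 4 + ['Y', 'Z', 'Z'],
    'judge_gender': ['Female'] * 5 + ['Male'] * 2,
    'word_count': [1000, 1000, 1000, 1000, 3000, 2000, 2000],
})
print(toy[toy['judge_gender'] == 'Female']['word_count'].median())  # 1000.0 (decision level)
print(judge_level_median(toy, 'word_count'))  # judge level
```

Here the decision-level median for the female judges is 1,000 words, while the judge-level median is 2,000, because judge X's four decisions dominate the first calculation.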


### Conclusions

In immigration judicial reviews in Canada's Federal Court from August 2020 to August 2025, the median length of decisions varies substantially among judges who decided 50 or more cases: from approximately 800 words for Justice Heneghan to approximately 4,900 words for Justice Kane. Median Flesch Reading Ease scores (where higher scores indicate easier-to-read text) also vary, from 25.6 for Justice Nowak to 42.5 for Justice Grammond.

There are also some gendered patterns. Across all decisions (including those by judges with fewer than 50 cases), the median word count is 2,274 for female judges and 2,679 for male judges. Median readability also differs, though only slightly: 32.8 for female judges and 34.3 for male judges.

A review of the decisions suggests that some of these differences likely reflect variation in the proportion of decisions delivered orally from the bench.
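One way to probe this would be to flag decisions that appear to have been delivered orally and compare word counts with and without them. The sketch below is hypothetical: the phrases in `ORAL_PATTERN` are assumptions about how such decisions might be labelled, not a verified convention of Federal Court reasons, and would need to be checked against a sample of decisions before use.

```python
import re

import pandas as pd

# HYPOTHETICAL marker phrases; not verified against the dataset
ORAL_PATTERN = r'\bdelivered\s+orally\b|\boral\s+(?:judgment|reasons)\b|\bfrom\s+the\s+bench\b'


def flag_oral(df: pd.DataFrame, text_col: str = 'unofficial_text_en') -> pd.Series:
    # True where the text contains any of the assumed oral-delivery phrases
    return df[text_col].str.contains(ORAL_PATTERN, flags=re.IGNORECASE, regex=True, na=False)


# Invented snippets for illustration only
toy = pd.DataFrame({
    'unofficial_text_en': [
        'REASONS FOR JUDGMENT (Delivered orally from the Bench)',
        'JUDGMENT AND REASONS ... written reasons follow',
    ],
    'word_count': [900, 4000],
})
toy['oral'] = flag_oral(toy)
print(toy.groupby('oral')['word_count'].median())
```

Applied to the full data, `df.groupby('simplified_judge_name')['oral'].mean()` would then give each judge's share of flagged decisions.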

Nonetheless, some judges do appear to write shorter and less complex decisions than others.

To the extent that shorter, more readable decisions are valued (for example, for efficiency or accessibility), the Federal Court might find it helpful to examine more closely why scores differ across judges.

Finally, this analysis uses simple metrics to demonstrate the potential of legal datasets. Some scholarship proposes more sophisticated metrics for measuring readability in judicial decisions. See, for example:

Madden, Mike, "How Understandable are Adjudicative Decisions? Introducing and Applying Law’s Own Readability Formula" (January 24, 2025), forthcoming in (2026) 30 Legal Writing: The Journal of the Legal Writing Institute. Available at SSRN: https://ssrn.com/abstract=5110351 or http://dx.doi.org/10.2139/ssrn.5110351



