### Notebook to load and analyze Refugee Appeal Division cases

Sean Rehaag

License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). 

Dataset & Code to be cited as:

Sean Rehaag, "Refugee Appeal Division Bulk Decisions Dataset" (2023, updated 2025), online: Refugee Law Laboratory <https://github.com/Refugee-Law-Lab/rad_bulk_data>.

Notes:

(1) Data Source: Immigration and Refugee Board (IRB). In the fall of 2022, the IRB added the Refugee Law Laboratory (RLL) to its email distribution list for legal publishers of RAD decisions. The RLL therefore receives new RAD cases as the IRB releases them for publication. Also in the fall of 2022, the IRB provided the RLL with a full backlog of approximately 116k published decisions from all divisions (RAD, RPD, ID, IAD).

(2) Unofficial Data: The data are unofficial reproductions. For official versions, please contact the Immigration and Refugee Board. 

(3) Non-Affiliation / Endorsement: The data were collected and reproduced without any affiliation with, or endorsement from, the Immigration and Refugee Board.

(4) Upstream licensing: Users must comply with upstream licenses.

(5) Accuracy: The data were collected and processed programmatically for the purposes of academic research. While we make best efforts to ensure accuracy, data gathering of this kind inevitably involves errors. The data should therefore be viewed as preliminary information intended to prompt further research and discussion, rather than as definitive information.

Acknowledgements: Thanks to Rafael Dolores for coding the parsing scripts.

### Requirements:

    pip install pandas

### If using parquet

    pip install pyarrow

### If loading remotely (other than via Hugging Face)
    
    pip install requests

### If loading remotely via Hugging Face

    pip install datasets
    

(Written on Python 3.9.12)

### Load Data

Two options (see also API / MCP access: https://github.com/a2aj-ca/canadian-legal-data)

In [4]:
# OPTION 1: Load Hugging Face dataset

from datasets import load_dataset
import pandas as pd

dataset = load_dataset("a2aj/canadian-case-law", data_dir="RAD", split="train")

# convert to dataframe
df = pd.DataFrame(dataset)

df.head()


Unnamed: 0,dataset,citation_en,citation2_en,name_en,document_date_en,url_en,scraped_timestamp_en,unofficial_text_en,citation_fr,citation2_fr,name_fr,document_date_fr,url_fr,scraped_timestamp_fr,unofficial_text_fr,upstream_license
0,RAD,VB4-01844,,,2014-06-25 00:00:00+00:00,1575413.txt,2023-11-13 02:03:13.439000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01844,,,2014-06-25 00:00:00+00:00,1575406.txt,2023-11-13 02:03:13.102000+00:00,\nN° de dossier de la SAR / RAD File No.: VB4-...,The A2AJ has obtained these documents directly...
1,RAD,VB4-01843,,,2014-06-20 00:00:00+00:00,1561010.txt,2023-11-13 02:02:19.623000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01843,,,2014-06-20 00:00:00+00:00,1560978.txt,2023-11-13 02:02:18.144000+00:00,\nN° de dossier de la SAR / RAD File No.: VB4-...,The A2AJ has obtained these documents directly...
2,RAD,VB4-01787,,,2014-07-10 00:00:00+00:00,1582557.txt,2023-11-13 02:03:57.013000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01787,,,2014-07-10 00:00:00+00:00,1582571.txt,2023-11-13 02:03:57.584000+00:00,\nN° de dossier de la SAR / RAD File No.: VB4-...,The A2AJ has obtained these documents directly...
3,RAD,VB4-01769,,,2014-12-18 00:00:00+00:00,1825444.txt,2023-11-13 02:28:07.167000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01769,,,2014-12-18 00:00:00+00:00,1825445.txt,2023-11-13 02:28:07.292000+00:00,Commission de l'immigration\net du statut de r...,The A2AJ has obtained these documents directly...
4,RAD,VB4-01661,,,2015-06-11 00:00:00+00:00,2261500.txt,2023-11-13 09:20:43.152000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01661,,,2015-06-11 00:00:00+00:00,2261501.txt,2023-11-13 09:20:43.300000+00:00,Commission de l'immigration\net du statut de r...,The A2AJ has obtained these documents directly...


In [5]:
# OPTION 2: Load parquet data remotely from Hugging Face without cloning the repo
import pandas as pd
import requests
from io import BytesIO

url = 'https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/RAD/train.parquet'

# download the parquet file; stop with a clear error if the request fails
results = requests.get(url, timeout=60)
results.raise_for_status()

# convert to dataframe
# (if this fails, add engine='pyarrow' to read_parquet())
df = pd.read_parquet(BytesIO(results.content))

df

Unnamed: 0,dataset,citation_en,citation2_en,name_en,document_date_en,url_en,scraped_timestamp_en,unofficial_text_en,citation_fr,citation2_fr,name_fr,document_date_fr,url_fr,scraped_timestamp_fr,unofficial_text_fr,upstream_license
0,RAD,VB4-01844,,,2014-06-25 00:00:00+00:00,1575413.txt,2023-11-13 02:03:13.439000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01844,,,2014-06-25 00:00:00+00:00,1575406.txt,2023-11-13 02:03:13.102000+00:00,\nN° de dossier de la SAR / RAD File No.: VB4-...,The A2AJ has obtained these documents directly...
1,RAD,VB4-01843,,,2014-06-20 00:00:00+00:00,1561010.txt,2023-11-13 02:02:19.623000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01843,,,2014-06-20 00:00:00+00:00,1560978.txt,2023-11-13 02:02:18.144000+00:00,\nN° de dossier de la SAR / RAD File No.: VB4-...,The A2AJ has obtained these documents directly...
2,RAD,VB4-01787,,,2014-07-10 00:00:00+00:00,1582557.txt,2023-11-13 02:03:57.013000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01787,,,2014-07-10 00:00:00+00:00,1582571.txt,2023-11-13 02:03:57.584000+00:00,\nN° de dossier de la SAR / RAD File No.: VB4-...,The A2AJ has obtained these documents directly...
3,RAD,VB4-01769,,,2014-12-18 00:00:00+00:00,1825444.txt,2023-11-13 02:28:07.167000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01769,,,2014-12-18 00:00:00+00:00,1825445.txt,2023-11-13 02:28:07.292000+00:00,Commission de l'immigration\net du statut de r...,The A2AJ has obtained these documents directly...
4,RAD,VB4-01661,,,2015-06-11 00:00:00+00:00,2261500.txt,2023-11-13 09:20:43.152000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01661,,,2015-06-11 00:00:00+00:00,2261501.txt,2023-11-13 09:20:43.300000+00:00,Commission de l'immigration\net du statut de r...,The A2AJ has obtained these documents directly...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14017,RAD,MC2-18146,,,2023-03-31 00:00:00+00:00,MC2-18146ta.txt,2024-09-17 20:46:06.686000+00:00,\nRAD File No. / No de dossier de la SAR : MC2...,MC2-18146,,,2023-03-31 00:00:00+00:00,MC2-18146 f.txt,2024-09-17 20:46:06.443000+00:00,\nDossier de la SAR / RAD File : MC2-18146\nHu...,The A2AJ has obtained these documents directly...
14018,RAD,MC2-13211,,,2023-01-31 00:00:00+00:00,MC2-13211 a.txt,2024-09-17 20:46:03.632000+00:00,\nRAD File / Dossier de la SAR : MC2-13211\nMC...,MC2-13211,,,2023-01-31 00:00:00+00:00,MC2-13211tf.txt,2024-09-17 20:46:03.886000+00:00,\nDossier de la SAR / RAD File: MC2-13211\nMC2...,The A2AJ has obtained these documents directly...
14019,RAD,MC1-06191,,,2021-11-24 00:00:00+00:00,MC1-06191ta.txt,2024-09-11 15:10:40.219000+00:00,\nRAD File No. / No de dossier de la SAR : MC1...,MC1-06191,,,2021-11-24 00:00:00+00:00,MC1-06191 f.txt,2024-09-11 15:10:40.025000+00:00,\nDossier de la SAR / RAD File : MC1-06191\nHu...,The A2AJ has obtained these documents directly...
14020,RAD,VC4-07625,,,2024-07-22 00:00:00+00:00,VC4-07625 a.txt,2025-07-24 21:58:49.457000+00:00,\nRAD File / Dossier de la SAR : VC4-07625\nVC...,VC4-07625,,,2024-07-22 00:00:00+00:00,VC4-07625 tf.txt,2025-07-24 21:58:49.741000+00:00,\nDossier de la SAR / RAD File: VC4-07625\nVC4...,The A2AJ has obtained these documents directly...
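
Once the data is loaded, it may help to check how fully each column is populated, since columns such as `citation2_en` and `name_en` appear empty in the rows displayed above. The following is a minimal sketch, not part of the original notebook: `column_fill_rates` is a hypothetical helper, the `sample` frame uses placeholder values rather than real decisions, and it assumes that empty cells may be stored either as missing values or as empty strings.

```python
import pandas as pd


def column_fill_rates(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of populated cells per column, lowest first.

    A cell counts as populated when it is not missing and is not an empty
    or whitespace-only string (an assumption about how blanks are stored).
    """
    def is_filled(col: pd.Series) -> pd.Series:
        return col.notna() & (col.astype(str).str.strip() != "")

    return df.apply(is_filled).mean().sort_values()


# Placeholder rows that only mimic the dataset's column names
sample = pd.DataFrame({
    "citation_en": ["EXAMPLE-1", "EXAMPLE-2"],
    "name_en": ["", None],
    "unofficial_text_fr": ["texte", ""],
})

rates = column_fill_rates(sample)
print(rates)
```

With the real data, `column_fill_rates(df)` should give a quick view of which fields are worth using in later analysis.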


### Analyze Data

In [6]:
# View dataframe
df.head()

Unnamed: 0,dataset,citation_en,citation2_en,name_en,document_date_en,url_en,scraped_timestamp_en,unofficial_text_en,citation_fr,citation2_fr,name_fr,document_date_fr,url_fr,scraped_timestamp_fr,unofficial_text_fr,upstream_license
0,RAD,VB4-01844,,,2014-06-25 00:00:00+00:00,1575413.txt,2023-11-13 02:03:13.439000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01844,,,2014-06-25 00:00:00+00:00,1575406.txt,2023-11-13 02:03:13.102000+00:00,\nN° de dossier de la SAR / RAD File No.: VB4-...,The A2AJ has obtained these documents directly...
1,RAD,VB4-01843,,,2014-06-20 00:00:00+00:00,1561010.txt,2023-11-13 02:02:19.623000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01843,,,2014-06-20 00:00:00+00:00,1560978.txt,2023-11-13 02:02:18.144000+00:00,\nN° de dossier de la SAR / RAD File No.: VB4-...,The A2AJ has obtained these documents directly...
2,RAD,VB4-01787,,,2014-07-10 00:00:00+00:00,1582557.txt,2023-11-13 02:03:57.013000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01787,,,2014-07-10 00:00:00+00:00,1582571.txt,2023-11-13 02:03:57.584000+00:00,\nN° de dossier de la SAR / RAD File No.: VB4-...,The A2AJ has obtained these documents directly...
3,RAD,VB4-01769,,,2014-12-18 00:00:00+00:00,1825444.txt,2023-11-13 02:28:07.167000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01769,,,2014-12-18 00:00:00+00:00,1825445.txt,2023-11-13 02:28:07.292000+00:00,Commission de l'immigration\net du statut de r...,The A2AJ has obtained these documents directly...
4,RAD,VB4-01661,,,2015-06-11 00:00:00+00:00,2261500.txt,2023-11-13 09:20:43.152000+00:00,Immigration and\nRefugee Board of Canada\nRefu...,VB4-01661,,,2015-06-11 00:00:00+00:00,2261501.txt,2023-11-13 09:20:43.300000+00:00,Commission de l'immigration\net du statut de r...,The A2AJ has obtained these documents directly...
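
Because the full decision text sits in `unofficial_text_en` (and `unofficial_text_fr`), one simple kind of analysis is a keyword search across decisions. The sketch below is an illustration only: `find_decisions` is a hypothetical helper, and the `sample` frame contains invented placeholder text and citations, not real decisions.

```python
import pandas as pd


def find_decisions(df: pd.DataFrame, keyword: str,
                   text_col: str = "unofficial_text_en") -> pd.DataFrame:
    """Return rows whose text column contains `keyword` (case-insensitive).

    The keyword is matched literally (regex=False); rows with missing
    text are treated as non-matches (na=False).
    """
    mask = df[text_col].str.contains(keyword, case=False, na=False, regex=False)
    return df.loc[mask]


# Invented placeholder rows that only mimic the dataset's column names
sample = pd.DataFrame({
    "citation_en": ["EXAMPLE-1", "EXAMPLE-2", "EXAMPLE-3"],
    "unofficial_text_en": [
        "The appeal is dismissed.",
        "The appeal is allowed.",
        None,
    ],
})

hits = find_decisions(sample, "ALLOWED")
print(hits["citation_en"].tolist())
```

On the loaded dataset, the same call would be `find_decisions(df, "some phrase")`. Passing `text_col="unofficial_text_fr"` searches the French text instead.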


In [7]:
df.tail()

Unnamed: 0,dataset,citation_en,citation2_en,name_en,document_date_en,url_en,scraped_timestamp_en,unofficial_text_en,citation_fr,citation2_fr,name_fr,document_date_fr,url_fr,scraped_timestamp_fr,unofficial_text_fr,upstream_license
14017,RAD,MC2-18146,,,2023-03-31 00:00:00+00:00,MC2-18146ta.txt,2024-09-17 20:46:06.686000+00:00,\nRAD File No. / No de dossier de la SAR : MC2...,MC2-18146,,,2023-03-31 00:00:00+00:00,MC2-18146 f.txt,2024-09-17 20:46:06.443000+00:00,\nDossier de la SAR / RAD File : MC2-18146\nHu...,The A2AJ has obtained these documents directly...
14018,RAD,MC2-13211,,,2023-01-31 00:00:00+00:00,MC2-13211 a.txt,2024-09-17 20:46:03.632000+00:00,\nRAD File / Dossier de la SAR : MC2-13211\nMC...,MC2-13211,,,2023-01-31 00:00:00+00:00,MC2-13211tf.txt,2024-09-17 20:46:03.886000+00:00,\nDossier de la SAR / RAD File: MC2-13211\nMC2...,The A2AJ has obtained these documents directly...
14019,RAD,MC1-06191,,,2021-11-24 00:00:00+00:00,MC1-06191ta.txt,2024-09-11 15:10:40.219000+00:00,\nRAD File No. / No de dossier de la SAR : MC1...,MC1-06191,,,2021-11-24 00:00:00+00:00,MC1-06191 f.txt,2024-09-11 15:10:40.025000+00:00,\nDossier de la SAR / RAD File : MC1-06191\nHu...,The A2AJ has obtained these documents directly...
14020,RAD,VC4-07625,,,2024-07-22 00:00:00+00:00,VC4-07625 a.txt,2025-07-24 21:58:49.457000+00:00,\nRAD File / Dossier de la SAR : VC4-07625\nVC...,VC4-07625,,,2024-07-22 00:00:00+00:00,VC4-07625 tf.txt,2025-07-24 21:58:49.741000+00:00,\nDossier de la SAR / RAD File: VC4-07625\nVC4...,The A2AJ has obtained these documents directly...
14021,RAD,VC3-14129,,,2024-03-04 00:00:00+00:00,VC3-14129 a.txt,2024-07-12 18:44:43.974000+00:00,\nRAD File / Dossier de la SAR : VC3-14129\nPr...,VC3-14129,,,2024-03-04 00:00:00+00:00,VC3-14129tf.txt,2024-07-12 18:44:44.255000+00:00,\nDossier de la SAR / RAD File: VC3-14129\nHui...,The A2AJ has obtained these documents directly...
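
As one possible next step, decisions can be counted by year using `document_date_en`. The sketch below is illustrative rather than part of the original notebook: `decisions_per_year` is a hypothetical helper, and the small `sample` frame reuses three citation and date pairs from the output above so the block runs without a download.

```python
import pandas as pd


def decisions_per_year(df: pd.DataFrame,
                       date_col: str = "document_date_en") -> pd.Series:
    """Count rows per calendar year of `date_col`, sorted by year.

    Values that cannot be parsed as dates are dropped before counting.
    """
    dates = pd.to_datetime(df[date_col], utc=True, errors="coerce")
    return dates.dropna().dt.year.value_counts().sort_index()


# Three rows taken from the head() output shown earlier
sample = pd.DataFrame({
    "citation_en": ["VB4-01844", "VB4-01843", "VB4-01661"],
    "document_date_en": [
        "2014-06-25 00:00:00+00:00",
        "2014-06-20 00:00:00+00:00",
        "2015-06-11 00:00:00+00:00",
    ],
})

counts = decisions_per_year(sample)
print(counts)
```

With the full dataset loaded, `decisions_per_year(df)` should work unchanged, whether the date column arrives as strings or as timezone-aware timestamps.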
