### Notebook to load and analyze Canadian Human Rights Tribunal cases

Author: Sean Rehaag

License: Creative Commons Attribution-NonCommercial 4.0 International [(CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/). NOTE: Users must also comply with upstream [licensing](https://www.chrt-tcdp.gc.ca/transparency/terms-and-conditions-en.html) for the CHRT data source, and with the requests on source URLs that search engines not index the documents, which are made to protect privacy. As a result, users must not make the data available in formats or locations that search engines can index.

Dataset & Code to be cited as: 

    Sean Rehaag, "Canadian Human Rights Tribunal Bulk Decisions Dataset" (2023), online: Refugee Law Laboratory <https://refugeelab.ca/bulk-data/chrt>.

To load data, see [load_and_analyze_chrt_cases.ipynb](https://github.com/Refugee-Law-Lab/chrt_bulk_data/blob/master/load_and_analyze_chrt_cases.ipynb)

### Notes:

(1) Data Source: [Canadian Human Rights Tribunal](https://www.chrt-tcdp.gc.ca). 

(2) Unofficial Data: The data are unofficial reproductions of materials on the Canadian Human Rights Tribunal website. Links to official versions are included in the dataset.

(3) Non-Affiliation / Endorsement: The data have been collected and reproduced without any affiliation with or endorsement from the Canadian Human Rights Tribunal.

(4) Non-Commercial Use: As indicated in the license, the data may be used only for non-commercial purposes (with attribution). For commercial use, see the Canadian Human Rights Tribunal website's [Terms of Use](https://www.chrt-tcdp.gc.ca/transparency/terms-and-conditions-en.html).

(5) Accuracy: The data were collected and processed programmatically for the purposes of academic research. While we make best efforts to ensure accuracy, data gathering of this kind inevitably involves errors. As such, the data should be viewed as preliminary information intended to prompt further research and discussion, rather than as definitive information.

### Requirements:

    pip install pandas

### If using parquet

    pip install pyarrow

### If loading remotely (other than via Hugging Face)
    
    pip install requests

### If loading remotely via Hugging Face

    pip install datasets
    

(Written on Python 3.9.12)


### Load Data

Four Options: Local & Remote

In [None]:
# OPTION 1: Load Hugging Face dataset

from datasets import load_dataset
import pandas as pd

# data_dir selects the court or tribunal subset; "CHRT" is the Canadian Human Rights Tribunal
dataset = load_dataset("refugee-law-lab/canadian-legal-data", split="train", data_dir="CHRT")

# convert to dataframe
df = pd.DataFrame(dataset)
df


In [18]:
# OPTION 2: Load parquet data remotely from Hugging Face without cloning repo
import pandas as pd
import requests
from io import BytesIO

# the CHRT folder holds the Canadian Human Rights Tribunal subset
url = 'https://huggingface.co/datasets/refugee-law-lab/canadian-legal-data/resolve/main/CHRT/train.parquet'

# load data
results = requests.get(url)
results.raise_for_status()  # stop early if the download failed

# convert to dataframe
df = pd.read_parquet(BytesIO(results.content))

# (if code fails, add engine='pyarrow' to read_parquet() function)

In [4]:
# OPTION 3: Load json data remotely from GitHub without cloning repo

import pandas as pd
import requests

# Set variables
start_year = 2003  # First year of data sought (2003 or later)
end_year = 2023  # Last year of data sought (2023 or earlier)

base_url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/chrt_bulk_data/master/DATA/YEARLY/'

# load data (one json file per year)
results = []
for year in range(start_year, end_year + 1):
    url = base_url + f'{year}.json'
    response = requests.get(url)
    response.raise_for_status()  # stop early if a yearly file cannot be downloaded
    results.extend(response.json())

# convert to dataframe
df = pd.DataFrame(results)


In [2]:
# OPTION 4: Load json data locally via cloned repo

# First, clone git repo to local machine
# Then run this code to load data

import pandas as pd
import json
import pathlib

# Set variables
start_year = 2003  # First year of data sought (2003 or later)
end_year = 2023  # Last year of data sought (2023 or earlier)

# Set path to data (relative to the root of the cloned repo)
data_path = pathlib.Path('DATA/YEARLY/')

# load data (all years, json files)
results = []
for year in range(start_year, end_year + 1):
    # explicit encoding so French-language text loads the same way on every platform
    with open(data_path / f'{year}.json', encoding='utf-8') as f:
        results.extend(json.load(f))

# convert to dataframe
df = pd.DataFrame(results)

### Analyze Data

In [5]:
# View dataframe
df

Unnamed: 0,citation,citation2,dataset,year,name,language,document_date,source_url,scraped_timestamp,unofficial_text,other
0,2003 CHRT 1,,SCC,2003,"Communications, Energy and paperworkers union ...",en,2003-01-10,https://decisions.chrt-tcdp.gc.ca/chrt-tcdp/de...,2023-12-01,"Communications, Energy and paperworkers union ...",
1,2003 CHRT 10,,SCC,2003,Parisien v. Ottawa-Carleton Regional Transit,en,2003-03-06,https://decisions.chrt-tcdp.gc.ca/chrt-tcdp/de...,2023-12-01,Parisien v. Ottawa-Carleton Regional Transit\n...,
2,2003 CHRT 11,,SCC,2003,Hodgins v. Transport North American Express Inc.,en,2003-03-06,https://decisions.chrt-tcdp.gc.ca/chrt-tcdp/de...,2023-12-01,Hodgins v. Transport North American Express In...,
3,2003 CHRT 12,,SCC,2003,Day v. Department of National Defence and Mich...,en,2003-03-07,https://decisions.chrt-tcdp.gc.ca/chrt-tcdp/de...,2023-12-01,Day v. Department of National Defence and Mich...,
4,2003 CHRT 13,,SCC,2003,Day v. Canada (Department of National Defence),en,2003-03-12,https://decisions.chrt-tcdp.gc.ca/chrt-tcdp/de...,2023-12-01,Day v. Canada (Department of National Defence)...,
...,...,...,...,...,...,...,...,...,...,...,...
1657,2023 TCDP 51,,SCC,2023,Richards c. Service correctionnel Canada,fr,2023-11-06,https://decisions.chrt-tcdp.gc.ca/chrt-tcdp/de...,2023-12-01,Richards c. Service correctionnel Canada\nColl...,
1658,2023 TCDP 6,,SCC,2023,Dorais c. Canadian Armed Forces,fr,2023-02-09,https://decisions.chrt-tcdp.gc.ca/chrt-tcdp/de...,2023-12-01,Dorais c. Canadian Armed Forces\nCollection\nT...,
1659,2023 TCDP 7,,SCC,2023,Nienhuis c. Service correctionnel du Canada,fr,2023-02-07,https://decisions.chrt-tcdp.gc.ca/chrt-tcdp/de...,2023-12-01,Nienhuis c. Service correctionnel du Canada\nC...,
1660,2023 TCDP 8,,SCC,2023,Dicks c. Randall,fr,2023-03-02,https://decisions.chrt-tcdp.gc.ca/chrt-tcdp/de...,2023-12-01,Dicks c. Randall\nCollection\nTribunal canadie...,
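
Before going further, it can help to confirm that the loaded dataframe has the columns shown above and to parse `document_date` into real dates. The following is a minimal sketch on a tiny stand-in frame, not part of the original notebook: `EXPECTED_COLUMNS` and `check_and_parse` are hypothetical names, and the column list is read off the output above rather than taken from an official schema.

```python
import pandas as pd

# Columns as they appear in the dataframe view above
# (an assumption based on that output, not an official schema)
EXPECTED_COLUMNS = [
    'citation', 'citation2', 'dataset', 'year', 'name', 'language',
    'document_date', 'source_url', 'scraped_timestamp', 'unofficial_text', 'other',
]

def check_and_parse(frame):
    """Return (missing_columns, copy of frame with document_date as datetime)."""
    missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
    parsed = frame.copy()
    if 'document_date' in parsed.columns:
        # errors='coerce' turns unparseable dates into NaT instead of raising
        parsed['document_date'] = pd.to_datetime(parsed['document_date'], errors='coerce')
    return missing, parsed

# Tiny stand-in for df, using values from two rows of the output above
sample = pd.DataFrame({
    'citation': ['2003 CHRT 10', '2023 TCDP 8'],
    'year': [2003, 2023],
    'language': ['en', 'fr'],
    'document_date': ['2003-03-06', '2023-03-02'],
})
missing, parsed = check_and_parse(sample)
print(missing)  # columns absent from the stand-in frame
print(parsed['document_date'].dt.year.tolist())
```

With the full data, `check_and_parse(df)` would be expected to return an empty `missing` list if the columns match the view above.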


In [6]:
# language counts
df['language'].value_counts()

language
en    835
fr    827
Name: count, dtype: int64
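
The near-equal language counts suggest that most decisions appear once in each language. One way to find decisions that lack a counterpart is to group rows under a language-neutral citation key. This is a rough sketch only: it ASSUMES that the French version of a decision carries the same year and number as the English one, with `TCDP` in place of `CHRT`, as the citations in the dataframe view suggest. `pair_by_citation` and `unpaired` are hypothetical helper names, and the rows are illustrative.

```python
from collections import defaultdict

def pair_by_citation(rows):
    """Group (citation, language) pairs under a language-neutral key.

    ASSUMPTION: the French version of a decision has the same year and
    number as the English one, with 'TCDP' in place of 'CHRT'.
    """
    groups = defaultdict(set)
    for citation, language in rows:
        key = citation.replace('TCDP', 'CHRT')
        groups[key].add(language)
    return dict(groups)

def unpaired(groups):
    """Return keys that do not have both an 'en' and a 'fr' version."""
    return sorted(key for key, langs in groups.items() if langs != {'en', 'fr'})

# Illustrative rows only (not a full extract of the dataset)
rows = [
    ('2023 CHRT 6', 'en'),
    ('2023 TCDP 6', 'fr'),
    ('2023 CHRT 47', 'en'),
]
groups = pair_by_citation(rows)
print(unpaired(groups))
```

With the loaded data, `pair_by_citation(zip(df['citation'], df['language']))` would apply the same grouping to every row.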

In [7]:
# Yearly counts
year_counts = df['year'].value_counts().sort_index()
for year, count in year_counts.items():
    print(f'{year}: {count}')


2003: 90
2004: 78
2005: 92
2006: 115
2007: 112
2008: 100
2009: 73
2010: 68
2011: 46
2012: 60
2013: 71
2014: 69
2015: 46
2016: 41
2017: 77
2018: 66
2019: 102
2020: 80
2021: 86
2022: 88
2023: 102
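
The yearly and language counts can also be combined into a single table. The sketch below uses `pd.crosstab` on a small made-up frame standing in for `df`. The values are illustrative and are not drawn from the dataset.

```python
import pandas as pd

# Small made-up stand-in for df (illustrative values only)
sample = pd.DataFrame({
    'year': [2003, 2003, 2003, 2023, 2023],
    'language': ['en', 'fr', 'en', 'fr', 'fr'],
})

# Rows are years, columns are languages, cells are document counts;
# combinations that never occur are filled with 0
counts = pd.crosstab(sample['year'], sample['language'])
print(counts)
```

Running `pd.crosstab(df['year'], df['language'])` on the loaded data would give one row per year from 2003 to 2023.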


In [8]:
# select 5 random rows from df, iterate through them and print unofficial text
import random
random.seed(999)

random_rows = random.sample(range(0, len(df)), 5)
for row in random_rows:
    print('##################################')
    print(df.iloc[row]['citation'])
    print(df.iloc[row]['source_url'])
    print(df.iloc[row]['document_date'])
    print(df.iloc[row]['year'])
    print(df.iloc[row]['language'])
    print('##################################')
    print()
    print(df.iloc[row]['unofficial_text'])
    print()
    print()
    print('____________________________________________________________________________')
    print()
    print()
    print()


##################################
2023 CHRT 47
https://decisions.chrt-tcdp.gc.ca/chrt-tcdp/decisions/en/item/520993/index.do
2023-10-19
2023
en
##################################

Foley v. HSBC Bank Canada
Collection
Canadian Human Rights Tribunal
Date
2023-10-19
Neutral citation
2023 CHRT 47
File number(s)
T2492/4920
Decision-maker(s)
Khurana, Jennifer
Decision type
Ruling
Grounds
Disability
Sexual Orientation
Notes
This decision has been issued to the parties but not yet published on our website. It is currently available in its original language only: English. If you want a copy of this decision in its original language, please contact the Registry. The decision will be published in both official languages once the translation is finalized, as required by the Official Languages Act.
Summary:
Jonathan Foley wants the Tribunal to reopen his case. The Tribunal closed his case when Mr. Foley didn’t respond to the Tribunal’s messages.
Mr. Foley told the Tribunal he was in Spain and coul