### Notebook to load and analyze Federal Court cases

Sean Rehaag

License: Creative Commons Attribution-NonCommercial 4.0 International [(CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/)

Dataset & Code to be cited as: 

    Sean Rehaag, "Federal Court Bulk Decisions Dataset" (2023), online: Refugee Law Laboratory <https://refugeelab.ca/bulk-data/fc/>.

Notes:

(1) Data Source: [Federal Court](https://www.fct-cf.gc.ca). 

(2) Unofficial Data: The data are unofficial reproductions of materials on the Federal Court website. Links to official versions are included in the dataset.

(3) Non-Affiliation / Endorsement: The data have been collected and reproduced without any affiliation with, or endorsement by, the Federal Court.

(4) Non-Commercial Use: As indicated in the license, the data may be used for non-commercial purposes only (with attribution). For commercial use, see the Federal Court website's [Terms of Use](https://www.fct-cf.gc.ca/en/pages/important-notices).

(5) Accuracy: The data were collected and processed programmatically for the purposes of academic research. While we make best efforts to ensure accuracy, data gathering of this kind inevitably involves errors. As such, the data should be viewed as preliminary information intended to prompt further research and discussion, rather than as definitive information.

(6) Limitation: The dataset only includes cases with a neutral citation. Neutral citations began to be used in 2001.

(7) Delay: Decisions may take many months to be translated (sometimes over a year). As a result, in the most recent years, decisions may only be available in one language.

### Requirements:

    pip install pandas
    pip install requests

(Written on Python 3.9.12)


In [2]:
# import libraries
import pandas as pd
import json
import pathlib
import requests

# Set variables
start_year = 2001  # First year of data sought (2001 or later)
end_year = 2023  # Last year of data sought (2023 or earlier)
language = None  # Language of cases sought: 'en', 'fr', or None for both

# Convert the language setting to a list of languages to load
languages_sought = ['en', 'fr'] if language is None else [language]
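
The loading cells build file names directly from these three settings, so a mistyped year or language only surfaces later as a missing file. A minimal sketch of an up-front check follows. The helper name `check_settings` is hypothetical and not part of the original notebook, and the 2001 and 2023 bounds are taken from the comments above.

```python
def check_settings(start_year, end_year, language):
    """Validate the notebook settings and return the list of languages to load.

    Hypothetical helper: the year bounds mirror the comments in the settings cell.
    """
    if not (2001 <= start_year <= end_year <= 2023):
        raise ValueError(
            f'expected 2001 <= start_year <= end_year <= 2023, '
            f'got {start_year} and {end_year}'
        )
    if language not in (None, 'en', 'fr'):
        raise ValueError(f"language must be 'en', 'fr', or None, got {language!r}")
    # Same rule as the settings cell: None means both languages
    return ['en', 'fr'] if language is None else [language]


languages_sought = check_settings(2001, 2023, None)
print(languages_sought)
```

Raising early keeps a bad setting from turning into a confusing file or download error in the loading step.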


### Load Data

Two Options: Local & Remote

In [3]:
# OPTION 1: Load data locally via cloned repo

# First, clone git repo
# Then run this code to load data

# Set path to data
data_path = pathlib.Path('DATA/YEARLY/')

# load data
results = []
for year in range(start_year, end_year+1):
    # Use 'lang' as the loop variable so the 'language' setting is not overwritten
    for lang in languages_sought:
        with open(data_path / f'{year}_{lang}.json', encoding='utf-8') as f:
            results.extend(json.load(f))

# convert to dataframe
df = pd.DataFrame(results)
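
If a yearly file is absent from the local clone, the loop above stops with a `FileNotFoundError`. The sketch below shows one possible way to skip missing files and report them instead. `load_local` is a hypothetical helper, and it assumes the `{year}_{language}.json` naming used above and that each file holds a JSON list of records.

```python
import json
import pathlib


def load_local(data_path, start_year, end_year, languages):
    """Load yearly JSON files and return (records, missing_file_names)."""
    data_path = pathlib.Path(data_path)
    records, missing = [], []
    for year in range(start_year, end_year + 1):
        for lang in languages:
            file_path = data_path / f'{year}_{lang}.json'
            if not file_path.exists():
                # Record the gap and continue rather than aborting the whole load
                missing.append(file_path.name)
                continue
            with open(file_path, encoding='utf-8') as f:
                records.extend(json.load(f))
    return records, missing
```

The returned `records` list can be passed to `pd.DataFrame` as in the cell above, and `missing` can be printed to see which years or languages were not found.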


In [3]:
# OPTION 2: Load data remotely from GitHub without cloning repo
# Note: load time varies depending on internet connection (approx 1.5 GB of data for all years/languages)

base_url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/fc_bulk_data/master/DATA/YEARLY/'

# load data
results = []
for year in range(start_year, end_year+1):
    # Use 'lang' as the loop variable so the 'language' setting is not overwritten
    for lang in languages_sought:
        url = base_url + f'{year}_{lang}.json'
        response = requests.get(url)
        response.raise_for_status()  # stop on a failed download instead of parsing an error page
        results.extend(response.json())

# convert to dataframe
df = pd.DataFrame(results)

# filter by language if applicable
if language:
    df = df[df['language'] == language]
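
Since the full remote download is roughly 1.5 GB, re-running the cell repeats the whole transfer. One possible way to avoid that is to keep a local copy of each yearly file, sketched below. The helper `cached_json`, its `cache_dir` argument, and the injected `fetch` callable are all hypothetical and not part of the original notebook.

```python
import json
import pathlib


def cached_json(url, cache_dir, fetch):
    """Return parsed JSON for url, calling fetch(url) only when no cached copy exists."""
    cache_dir = pathlib.Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the cache file after the last URL segment, e.g. '2001_en.json'
    cache_file = cache_dir / url.rsplit('/', 1)[-1]
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding='utf-8'))
    data = fetch(url)
    cache_file.write_text(json.dumps(data), encoding='utf-8')
    return data
```

In the notebook, `fetch` could be something like `lambda u: requests.get(u, timeout=60).json()`, with the result of each `cached_json` call passed to `results.extend`. Passing the downloader in as an argument also makes the caching logic easy to try out without a network connection.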

### Analyze Data

In [4]:
# View dataframe
df.head()

| | citation | year | name | language | decision_date | source_url | scraped_timestamp | citation2 | unofficial_text |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2001 FCT 1 | 2001 | Adecon Ship Management Inc. v. Cuba | en | 2001-02-01 | https://decisions.fct-cf.gc.ca/fc-cf/decisions... | 2022-08-18 | | Adecon Ship Management Inc. v. Cuba\nCourt (s)... |
| 1 | 2001 FCT 10 | 2001 | Islam v. Canada (Minister of Citizenship and I... | en | 2001-02-02 | https://decisions.fct-cf.gc.ca/fc-cf/decisions... | 2022-08-18 | | Islam v. Canada (Minister of Citizenship and I... |
| 2 | 2001 FCT 100 | 2001 | Duterville v. Canada | en | 2001-02-20 | https://decisions.fct-cf.gc.ca/fc-cf/decisions... | 2022-08-18 | | Duterville v. Canada\nCourt (s) Database\nFede... |
| 3 | 2001 FCT 1000 | 2001 | LS Entertainment Group Inc. v. KALOS VISION LT... | en | 2001-09-07 | https://decisions.fct-cf.gc.ca/fc-cf/decisions... | 2022-08-18 | | LS Entertainment Group Inc. v. KALOS VISION LT... |
| 4 | 2001 FCT 1001 | 2001 | Ay v. Canada (Minister of Citizenship and Immi... | en | 2001-09-07 | https://decisions.fct-cf.gc.ca/fc-cf/decisions... | 2022-08-18 | | Ay v. Canada (Minister of Citizenship and Immi... |
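
The first rows show that citations are ordered as plain strings, so `2001 FCT 10` comes directly after `2001 FCT 1`. A minimal sketch of splitting a neutral citation into year, court code, and decision number, so that rows can be sorted numerically, is given below. The helper `parse_citation` and its pattern are assumptions based only on the citation shapes visible in this notebook (such as `2001 FCT 1` and `2023 CF 853`), and the `2001 FCT 2` entry is an illustrative value rather than a row taken from the output.

```python
import re

# Assumed shape: four-digit year, uppercase court code, decision number
CITATION_PATTERN = re.compile(r'^(\d{4}) ([A-Z]+) (\d+)$')


def parse_citation(citation):
    """Split a neutral citation such as '2001 FCT 10' into (2001, 'FCT', 10)."""
    match = CITATION_PATTERN.match(citation.strip())
    if match is None:
        raise ValueError(f'unrecognized citation: {citation!r}')
    year, court, number = match.groups()
    return int(year), court, int(number)


citations = ['2001 FCT 1', '2001 FCT 10', '2001 FCT 100', '2001 FCT 2']
# Sorting on the parsed tuple orders decision numbers numerically
print(sorted(citations, key=parse_citation))
# → ['2001 FCT 1', '2001 FCT 2', '2001 FCT 10', '2001 FCT 100']
```

Applied to the dataframe, something like `df['citation'].map(parse_citation)` would give a sortable key, provided every citation fits the assumed pattern.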


In [5]:
df.tail()

| | citation | year | name | language | decision_date | source_url | scraped_timestamp | citation2 | unofficial_text |
|---|---|---|---|---|---|---|---|---|---|
| 59945 | 2023 CF 853 | 2023 | Gosselin c. Canada (Procureur général) | fr | 2023-06-16 | https://decisions.fct-cf.gc.ca/fc-cf/decisions... | 2023-07-03 | | Gosselin c. Canada (Procureur général)\nBase d... |
| 59946 | 2023 CF 861 | 2023 | Stoica c. Canada (Sécurité puplique et Protect... | fr | 2023-06-20 | https://decisions.fct-cf.gc.ca/fc-cf/decisions... | 2023-07-03 | | Stoica c. Canada (Sécurité puplique et Protect... |
| 59947 | 2023 CF 893 | 2023 | Voltage Pictures, LLC c. Salna | fr | 2023-06-26 | https://decisions.fct-cf.gc.ca/fc-cf/decisions... | 2023-07-03 | | Voltage Pictures, LLC c. Salna\nBase de donnée... |
| 59948 | 2023 CF 98 | 2023 | Boloh 1(A) c. Canada | fr | 2023-01-20 | https://decisions.fct-cf.gc.ca/fc-cf/decisions... | 2023-06-27 | | Boloh 1(A) c. Canada\nBase de données – Cour (... |
| 59949 | 2023 FC 334 | 2023 | Agnant c. Canada (Sécurité publique et Protect... | fr | 2023-03-13 | https://decisions.fct-cf.gc.ca/fc-cf/decisions... | 2023-06-27 | | Agnant c. Canada (Sécurité publique et Protect... |


In [6]:
# language counts
df['language'].value_counts()

en    30490
fr    29460
Name: language, dtype: int64
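
Because translations can lag by many months (see note 7), the overall language totals may hide gaps in recent years. A per-year breakdown is one way to look for them. The sketch below uses `pd.crosstab` on a small made-up sample with the same `year` and `language` columns as `df`; the sample values are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Made-up sample with the same column names as the notebook's dataframe
sample = pd.DataFrame({
    'year': [2022, 2022, 2023, 2023, 2023],
    'language': ['en', 'fr', 'en', 'fr', 'fr'],
})

# Rows are years, columns are languages, cells are decision counts
by_year_language = pd.crosstab(sample['year'], sample['language'])

# A nonzero difference flags years where one language has more decisions
by_year_language['en_minus_fr'] = by_year_language['en'] - by_year_language['fr']
print(by_year_language)
```

Running the same two steps on `df` instead of `sample` would show, year by year, where the English and French counts diverge.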

In [7]:
# Yearly counts
year_counts = df['year'].value_counts().sort_index()  # sort chronologically by year
for year, count in year_counts.items():
    print(f'{year}: {count}')


2001: 2807
2002: 2639
2003: 2962
2004: 3509
2005: 3409
2006: 3071
2007: 2740
2008: 2798
2009: 2501
2010: 2580
2011: 2905
2012: 2748
2013: 2286
2014: 2190
2015: 2610
2016: 2481
2017: 2160
2018: 2263
2019: 2692
2020: 2195
2021: 2712
2022: 2868
2023: 824
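
As a quick consistency check, the yearly counts should add up to the same total as the language counts reported earlier. The sketch below copies both sets of figures from the outputs in this notebook and compares the sums; the variable names are illustrative only.

```python
# Yearly counts copied from the output above
yearly_counts = {
    2001: 2807, 2002: 2639, 2003: 2962, 2004: 3509, 2005: 3409, 2006: 3071,
    2007: 2740, 2008: 2798, 2009: 2501, 2010: 2580, 2011: 2905, 2012: 2748,
    2013: 2286, 2014: 2190, 2015: 2610, 2016: 2481, 2017: 2160, 2018: 2263,
    2019: 2692, 2020: 2195, 2021: 2712, 2022: 2868, 2023: 824,
}
# Language counts copied from the earlier value_counts() output
language_counts = {'en': 30490, 'fr': 29460}

total_by_year = sum(yearly_counts.values())
total_by_language = sum(language_counts.values())
print(total_by_year, total_by_language)
# → 59950 59950
```

With the dataframe loaded, the equivalent check would be comparing `year_counts.sum()` with `len(df)`.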
