### Notebook to load and analyze Supreme Court of Canada cases

Sean Rehaag

License: Creative Commons Attribution-NonCommercial 4.0 International [(CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/)

Dataset & Code to be cited as: 

    Sean Rehaag, "Supreme Court of Canada Bulk Decisions Dataset" (2023), online: Refugee Law Laboratory <https://refugeelab.ca/bulk-data/scc/>.

Notes:

(1) Data Source: [Supreme Court of Canada](https://www.scc-csc.ca). 

(2) Unofficial Data: The data are unofficial reproductions of materials on the Supreme Court of Canada website. Links to official versions are included in the dataset.

(3) Non-Affiliation / Endorsement: The data have been collected and reproduced without any affiliation with or endorsement from the Supreme Court of Canada.

(4) Non-Commercial Use: As indicated in the license, the data may be used for non-commercial purposes only, with attribution. For commercial use, see the Supreme Court of Canada website's [Terms of Use](https://www.scc-csc.ca/terms-avis/notice-enonce-eng.aspx).

(5) Accuracy: The data were collected and processed programmatically for the purposes of academic research. While we make best efforts to ensure accuracy, data gathering of this kind inevitably involves errors. The data should therefore be viewed as preliminary information intended to prompt further research and discussion, rather than as definitive information.

### Requirements:

    pip install pandas
    pip install requests

(Written on Python 3.9.12)


In [1]:
# import libraries
import pandas as pd
import json
import pathlib
import requests

# Set variables
start_year = 1887  # First year of data sought (1887 +)
end_year = 2022  # Last year of data sought (2022 -)
language = None  # language of cases sought ('en', 'fr', or None for both)


### Load Data

Two Options: Local & Remote

In [2]:
# OPTION 1: Load data locally via repo

# Set path to data
data_path = pathlib.Path('DATA/YEARLY/')

# load data (all years, json files)
results = []
for year in range(start_year, end_year+1):
    with open(data_path / f'{year}.json', encoding='utf-8') as f:
        results.extend(json.load(f))

# convert to dataframe
df = pd.DataFrame(results)

# filter by language if applicable
if language:
    df = df[df['language'] == language]
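
The loop above raises `FileNotFoundError` if any yearly file is absent, for example after a partial download of the repo. A minimal sketch of a more tolerant loader follows, assuming the same `{year}.json` layout as above; `load_local_years` is a hypothetical helper name, not something provided by the repo.

```python
import json
import pathlib


def load_local_years(data_path, start_year, end_year):
    """Load yearly JSON files and report which years were not found.

    Returns a tuple (records, missing_years).
    """
    data_path = pathlib.Path(data_path)
    records, missing = [], []
    for year in range(start_year, end_year + 1):
        file_path = data_path / f'{year}.json'
        if not file_path.exists():
            # Record the gap instead of aborting the whole load
            missing.append(year)
            continue
        with open(file_path, encoding='utf-8') as f:
            records.extend(json.load(f))
    return records, missing
```

The returned `records` list can be passed to `pd.DataFrame` exactly as `results` is above, and `missing` shows which years to fetch before rerunning.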

In [3]:
# OPTION 2: Load data remotely from GitHub
# Note: load time varies depending on internet connection (approx 1 GB of data)

base_url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/scc_bulk_data/master/DATA/YEARLY/'

# load data
results = []
for year in range(start_year, end_year+1):
    url = base_url + f'{year}.json'
    response = requests.get(url)
    response.raise_for_status()  # stop on a failed download instead of parsing an error page
    results.extend(response.json())

# convert to dataframe
df = pd.DataFrame(results)

# filter by language if applicable
if language:
    df = df[df['language'] == language]
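
Because the remote option transfers roughly 1 GB on every run, it may be worth caching each yearly file on disk after the first download. The sketch below is one way to do that under stated assumptions: `load_year_cached`, `cache_dir`, and `fetch_text` are hypothetical names introduced here for illustration, and `fetch_text` is any callable that takes a URL and returns the response body as a string.

```python
import json
import pathlib


def load_year_cached(year, base_url, cache_dir, fetch_text):
    """Return one year's records, downloading only when no cached copy exists."""
    cache_dir = pathlib.Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f'{year}.json'
    if not cache_file.exists():
        # First request for this year: download and store the raw JSON text
        body = fetch_text(base_url + f'{year}.json')
        cache_file.write_text(body, encoding='utf-8')
    return json.loads(cache_file.read_text(encoding='utf-8'))
```

With the `requests` library used in this notebook, `fetch_text` could be `lambda url: requests.get(url, timeout=60).text`; the function would then be called once per year in the same loop as above.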

### Analyze Data

In [4]:
# View dataframe
df

Unnamed: 0,citation,citation2,year,name,language,decision_date,source_url,scraped_timestamp,unofficial_text
0,(1887) 13 SCR 441,,1887,City of Winnipeg v. Wright,en,1887-05-11,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31,City of Winnipeg v. Wright\nCollection\nSuprem...
1,(1887) 13 SCR 469,,1887,Ball v. Crompton Corset Co.,en,1887-03-01,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31,Ball v. Crompton Corset Co.\nCollection\nSupre...
2,(1887) 13 SCR 577,,1887,St. Catharines Milling and Lumber Co. v. R.,en,1887-06-20,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31,St. Catharines Milling and Lumber Co. v. R.\nC...
3,(1887) 14 SCR 105,,1887,Canadian Pacific Ry. Co. v. Robinson,en,1887-06-20,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31,Canadian Pacific Ry. Co. v. Robinson\nCollecti...
4,(1887) 14 SCR 217,,1887,Fairbanks v. Barlow,en,1887-03-14,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31,Fairbanks v. Barlow\nCollection\nSupreme Court...
...,...,...,...,...,...,...,...,...,...
15229,2022 CSC 54,,2022,R. c. Beaver,fr,2022-12-09,https://decisions.scc-csc.ca/scc-csc/scc-csc/f...,2023-04-13,R. c. Beaver\nCollection\nJugements de la Cour...
15230,2022 CSC 6,,2022,Anderson c. Alberta,fr,2022-03-18,https://decisions.scc-csc.ca/scc-csc/scc-csc/f...,2022-09-01,Anderson c. Alberta\nCollection\nJugements de ...
15231,2022 CSC 7,,2022,R. c. White,fr,2022-03-18,https://decisions.scc-csc.ca/scc-csc/scc-csc/f...,2022-09-01,R. c. White\nCollection\nJugements de la Cour ...
15232,2022 CSC 8,,2022,R. c. Pope,fr,2022-03-21,https://decisions.scc-csc.ca/scc-csc/scc-csc/f...,2022-09-01,R. c. Pope\nCollection\nJugements de la Cour s...
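
The preview suggests that `decision_date` holds ISO-format date strings, which is an assumption worth checking against the full data. If so, converting the column to datetimes makes sorting and date arithmetic straightforward. The sketch below uses three rows copied from the preview above rather than the full dataframe.

```python
import pandas as pd

# Three rows taken from the preview above (name and decision_date only)
sample = pd.DataFrame({
    'name': [
        'City of Winnipeg v. Wright',
        'Ball v. Crompton Corset Co.',
        'R. c. Beaver',
    ],
    'decision_date': ['1887-05-11', '1887-03-01', '2022-12-09'],
})

# errors='coerce' turns unparseable values into NaT instead of raising
sample['decision_date'] = pd.to_datetime(sample['decision_date'], errors='coerce')
sample['decision_month'] = sample['decision_date'].dt.month

# Name of the earliest decision in the sample
earliest = sample.sort_values('decision_date').iloc[0]['name']
```

Applying the same `pd.to_datetime` call to `df['decision_date']` should work the same way, with any malformed dates showing up as `NaT`.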


In [5]:
# language counts
df['language'].value_counts()

en    10358
fr     4876
Name: language, dtype: int64
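
To see how the language split varies over time, the counts can be broken down by year with `pd.crosstab`. The sketch below runs on a small made-up sample (the rows are hypothetical, not taken from the dataset) so that the shape of the result is easy to follow.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    'year': [1887, 1887, 2022, 2022, 2022],
    'language': ['en', 'en', 'en', 'fr', 'fr'],
})

# Rows are years, columns are languages, cells are case counts
by_year_lang = pd.crosstab(sample['year'], sample['language'])
```

On the real data the equivalent call would be `pd.crosstab(df['year'], df['language'])`.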

In [6]:
# Yearly counts
year_counts = df['year'].value_counts().sort_index()
for year, count in year_counts.items():
    print(f'{year}: {count}')


1887: 44
1888: 43
1889: 55
1890: 48
1891: 71
1892: 90
1893: 54
1894: 70
1895: 99
1896: 67
1897: 80
1898: 70
1899: 70
1900: 65
1901: 64
1902: 79
1903: 70
1904: 72
1905: 82
1906: 72
1907: 82
1908: 77
1909: 64
1910: 65
1911: 63
1912: 51
1913: 56
1914: 59
1915: 62
1916: 71
1917: 79
1918: 96
1919: 81
1920: 74
1921: 63
1922: 65
1923: 47
1924: 67
1925: 78
1926: 68
1927: 97
1928: 92
1929: 83
1930: 78
1931: 82
1932: 70
1933: 71
1934: 72
1935: 48
1936: 47
1937: 57
1938: 44
1939: 44
1940: 58
1941: 47
1942: 47
1943: 53
1944: 55
1945: 47
1946: 43
1947: 34
1948: 48
1949: 54
1950: 44
1951: 49
1952: 50
1953: 62
1954: 62
1955: 76
1956: 80
1957: 72
1958: 72
1959: 88
1960: 79
1961: 86
1962: 80
1963: 94
1964: 77
1965: 83
1966: 74
1967: 97
1968: 114
1969: 127
1970: 204
1971: 196
1972: 190
1973: 208
1974: 228
1975: 230
1976: 218
1977: 250
1978: 232
1979: 274
1980: 224
1981: 224
1982: 236
1983: 180
1984: 126
1985: 168
1986: 152
1987: 188
1988: 212
1989: 266
1990: 286
1991: 218
1992: 216
1993: 278
1994: 222
1