### Notebook to load and analyze Canadian Federal Regulations

Sean Rehaag

License: Creative Commons Attribution-NonCommercial 4.0 International [(CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/). NOTE: Users must also comply with upstream [licensing](https://www.justice.gc.ca/eng/terms-avis/index.html) for the data source.

Dataset & Code to be cited as: 

    Sean Rehaag, "Federal Regulations Bulk Decisions Dataset" (2024), online: Refugee Law Laboratory <https://refugeelab.ca/bulk-data/regulations-fed/>.

Notes:

(1) Data Source: [Department of Justice GitHub](https://github.com/justicecanada/laws-lois-xml) & [Department of Justice Website](https://laws-lois.justice.gc.ca).

(2) Unofficial Data: The data are unofficial reproductions of materials available on the Department of Justice's Consolidated Acts and Regulations of Canada website. Official versions are available [here](https://laws-lois.justice.gc.ca/eng/regulations/).

(3) Non-Affiliation / Endorsement: The data have been collected and reproduced without any affiliation with or endorsement from the Government of Canada.

(4) Non-Commercial Use: As indicated in the license, data may be used for non-commercial purposes only (with attribution). For commercial use, see the Department of Justice website's [Terms of Use](https://www.justice.gc.ca/eng/terms-avis/index.html).

(5) Accuracy: Data were collected and processed programmatically for the purposes of academic research. While we make best efforts to ensure accuracy, data gathering of this kind inevitably involves errors. As such, the data should be viewed as preliminary information intended to prompt further research and discussion, rather than as definitive information.

### Requirements:

    pip install pandas

### If using parquet

    pip install pyarrow

### If loading remotely (other than via Hugging Face)
    
    pip install requests

### If loading remotely via Hugging Face

    pip install datasets
    

(Written on Python 3.11.4)
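The requirements above can be checked before running the loading cells. The following is a minimal sketch that assumes only the four package names listed above; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced for illustration.

```python
from importlib.util import find_spec

# Packages named in the requirements above, mapped to when each is needed.
REQUIRED = {
    "pandas": "always",
    "pyarrow": "reading parquet",
    "requests": "remote loading other than via Hugging Face",
    "datasets": "remote loading via Hugging Face",
}

def missing_packages(required=REQUIRED):
    """Return the names of packages that cannot be found in this environment."""
    return [name for name in required if find_spec(name) is None]

missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are installed.")
```

Only the packages needed for the chosen loading option have to be present; a missing `datasets` package, for example, does not matter when loading the JSON files directly.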


### Load Data

Four Options: Three Remote & One Local

In [None]:
# OPTION 1: Load Hugging Face dataset

from datasets import load_dataset
import pandas as pd

dataset = load_dataset("refugee-law-lab/canadian-legal-data", split="train", data_dir="LEGISLATION-FED")

# convert to dataframe (to_pandas() avoids building the frame row by row)
df = dataset.to_pandas()
df


In [None]:
# OPTION 2: Load parquet data remotely from Hugging Face without cloning repo
import pandas as pd
import requests
from io import BytesIO

url = 'https://huggingface.co/datasets/refugee-law-lab/canadian-legal-data/resolve/main/LEGISLATION-FED/train.parquet'

# download the parquet file (raises an error if the request fails)
results = requests.get(url)
results.raise_for_status()

# convert to dataframe
df = pd.read_parquet(BytesIO(results.content))

# (if code fails, add engine='pyarrow' to read_parquet() function)

In [8]:
# OPTION 3: Load json data remotely from GitHub without cloning repo

import pandas as pd

# load English data
url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/legislation-fed-bulk-data/main/DATA/df_acts_en.json'
df = pd.read_json(url, orient='records', lines=True)

# load French data
url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/legislation-fed-bulk-data/main/DATA/df_acts_fr.json'
df2 = pd.read_json(url, orient='records', lines=True)

# combine both dataframes
df = pd.concat([df, df2], ignore_index=True)

df



Unnamed: 0,citation,citation2,dataset,year,name,language,document_date,source_url,scraped_timestamp,unofficial_text,other
0,"SC 2019, c 10",A-0.6,LEGISLATION-FED,2019,Accessible Canada Act,en,2019-06-21,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,"# Accessible Canada Act\n\nSC 2019, c 10\n\nAn...",
1,"SC 2018, c 27, s 675",A-1.3,LEGISLATION-FED,2018,Addition of Lands to Reserves and Reserve Crea...,en,2018-12-13,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Addition of Lands to Reserves and Reserve Cr...,
2,"SC 2014, c 20, s 376",A-1.5,LEGISLATION-FED,2014,Administrative Tribunals Support Service of Ca...,en,2014-06-19,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Administrative Tribunals Support Service of ...,
3,"RSC 1985, c A-1",A-1,LEGISLATION-FED,1988,Access to Information Act,en,1988-12-12,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,"# Access to Information Act\n\nRSC 1985, c A-1...",
4,"RSC 1985, c 35 (4th Supp)",A-10.1,LEGISLATION-FED,1989,Air Canada Public Participation Act,en,1989-11-01,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Air Canada Public Participation Act\n\nRSC 1...,
...,...,...,...,...,...,...,...,...,...,...,...
1869,"LRC 1985, c Y-4",Y-4,LEGISLATION-FED,1988,Loi sur l'extraction du quartz dans le Yukon,fr,1988-12-12,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Loi sur l'extraction du quartz dans le Yukon...,
1870,"LC 1992, c 1",Z-0.91,LEGISLATION-FED,1992,Loi corrective de 1991,fr,1992-02-28,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,"# Loi corrective de 1991\n\nLC 1992, c 1\n\nLo...",
1871,"SRC 1952, c 89",Z-0.98,LEGISLATION-FED,1952,Loi fédérale sur les droits successoraux,fr,1952-01-01,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Loi fédérale sur les droits successoraux\n\n...,
1872,"SC 1950-51, c 2",Z-040,LEGISLATION-FED,1950,Loi de 1950 sur les forces canadiennes,fr,1950-09-09,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Loi de 1950 sur les forces canadiennes\n\nSC...,


In [1]:
# OPTION 4: Load json data locally via cloned repo

# First, clone git repo to local machine
# Then run this code to load data

import pandas as pd

# load English data
file_path = 'DATA/df_acts_en.json'
df = pd.read_json(file_path, orient='records', lines=True)

# load French data
file_path = 'DATA/df_acts_fr.json'
df2 = pd.read_json(file_path, orient='records', lines=True)

# combine both dataframes
df = pd.concat([df, df2], ignore_index=True)

df

Unnamed: 0,citation,citation2,dataset,year,name,language,document_date,source_url,scraped_timestamp,unofficial_text,other
0,"SC 2019, c 10",A-0.6,LEGISLATION-FED,2019,Accessible Canada Act,en,2019-06-21,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,"# Accessible Canada Act\n\nSC 2019, c 10\n\nAn...",
1,"SC 2018, c 27, s 675",A-1.3,LEGISLATION-FED,2018,Addition of Lands to Reserves and Reserve Crea...,en,2018-12-13,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Addition of Lands to Reserves and Reserve Cr...,
2,"SC 2014, c 20, s 376",A-1.5,LEGISLATION-FED,2014,Administrative Tribunals Support Service of Ca...,en,2014-06-19,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Administrative Tribunals Support Service of ...,
3,"RSC 1985, c A-1",A-1,LEGISLATION-FED,1988,Access to Information Act,en,1988-12-12,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,"# Access to Information Act\n\nRSC 1985, c A-1...",
4,"RSC 1985, c 35 (4th Supp)",A-10.1,LEGISLATION-FED,1989,Air Canada Public Participation Act,en,1989-11-01,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Air Canada Public Participation Act\n\nRSC 1...,
...,...,...,...,...,...,...,...,...,...,...,...
1869,"LRC 1985, c Y-4",Y-4,LEGISLATION-FED,1988,Loi sur l'extraction du quartz dans le Yukon,fr,1988-12-12,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Loi sur l'extraction du quartz dans le Yukon...,
1870,"LC 1992, c 1",Z-0.91,LEGISLATION-FED,1992,Loi corrective de 1991,fr,1992-02-28,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,"# Loi corrective de 1991\n\nLC 1992, c 1\n\nLo...",
1871,"SRC 1952, c 89",Z-0.98,LEGISLATION-FED,1952,Loi fédérale sur les droits successoraux,fr,1952-01-01,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Loi fédérale sur les droits successoraux\n\n...,
1872,"SC 1950-51, c 2",Z-040,LEGISLATION-FED,1950,Loi de 1950 sur les forces canadiennes,fr,1950-09-09,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Loi de 1950 sur les forces canadiennes\n\nSC...,


### Analyze Data

In [2]:
# View dataframe
df

Unnamed: 0,citation,citation2,dataset,year,name,language,document_date,source_url,scraped_timestamp,unofficial_text,other
0,"SC 2019, c 10",A-0.6,LEGISLATION-FED,2019,Accessible Canada Act,en,2019-06-21,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,"# Accessible Canada Act\n\nSC 2019, c 10\n\nAn...",
1,"SC 2018, c 27, s 675",A-1.3,LEGISLATION-FED,2018,Addition of Lands to Reserves and Reserve Crea...,en,2018-12-13,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Addition of Lands to Reserves and Reserve Cr...,
2,"SC 2014, c 20, s 376",A-1.5,LEGISLATION-FED,2014,Administrative Tribunals Support Service of Ca...,en,2014-06-19,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Administrative Tribunals Support Service of ...,
3,"RSC 1985, c A-1",A-1,LEGISLATION-FED,1988,Access to Information Act,en,1988-12-12,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,"# Access to Information Act\n\nRSC 1985, c A-1...",
4,"RSC 1985, c 35 (4th Supp)",A-10.1,LEGISLATION-FED,1989,Air Canada Public Participation Act,en,1989-11-01,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Air Canada Public Participation Act\n\nRSC 1...,
...,...,...,...,...,...,...,...,...,...,...,...
1869,"LRC 1985, c Y-4",Y-4,LEGISLATION-FED,1988,Loi sur l'extraction du quartz dans le Yukon,fr,1988-12-12,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Loi sur l'extraction du quartz dans le Yukon...,
1870,"LC 1992, c 1",Z-0.91,LEGISLATION-FED,1992,Loi corrective de 1991,fr,1992-02-28,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,"# Loi corrective de 1991\n\nLC 1992, c 1\n\nLo...",
1871,"SRC 1952, c 89",Z-0.98,LEGISLATION-FED,1952,Loi fédérale sur les droits successoraux,fr,1952-01-01,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Loi fédérale sur les droits successoraux\n\n...,
1872,"SC 1950-51, c 2",Z-040,LEGISLATION-FED,1950,Loi de 1950 sur les forces canadiennes,fr,1950-09-09,https://github.com/justicecanada/laws-lois-xml...,2024-04-19,# Loi de 1950 sur les forces canadiennes\n\nSC...,


In [3]:
# language counts
df['language'].value_counts()

language
en    937
fr    937
Name: count, dtype: int64
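Equal English and French counts suggest, but do not prove, that every act appears in both languages. The following is a minimal sketch of a pairing check. It assumes that `citation2` holds the same identifier for both language versions of an act, which the sample rows suggest but which is not confirmed for the full dataset. The toy rows and the `unpaired_acts` helper are hypothetical and stand in for the real dataframe.

```python
import pandas as pd

# Toy rows standing in for the real dataframe (values are illustrative only).
toy = pd.DataFrame({
    "citation2": ["A-1", "A-1", "Y-4", "Z-0.91"],
    "language": ["en", "fr", "fr", "en"],
})

def unpaired_acts(frame):
    """Return citation2 values that appear in fewer than two languages."""
    # Count the distinct languages recorded for each identifier.
    langs_per_act = frame.groupby("citation2")["language"].nunique()
    return sorted(langs_per_act[langs_per_act < 2].index)

print(unpaired_acts(toy))
```

On the loaded data, `unpaired_acts(df)` would list any identifiers that lack an English or French counterpart under this assumption.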

In [4]:
# Yearly counts, printed in chronological order
year_counts = df['year'].value_counts().sort_index()
for year, count in year_counts.items():
    print(f'{year}: {count}')


1870: 2
1871: 2
1882: 2
1908: 2
1909: 2
1911: 2
1912: 2
1916: 2
1919: 2
1920: 4
1921: 2
1924: 2
1927: 2
1928: 8
1929: 2
1930: 8
1931: 2
1934: 4
1936: 2
1943: 2
1947: 2
1948: 2
1950: 2
1952: 12
1958: 4
1959: 4
1960: 2
1961: 2
1963: 2
1964: 4
1970: 2
1971: 38
1972: 2
1973: 2
1974: 2
1975: 2
1976: 4
1977: 4
1978: 2
1979: 4
1980: 4
1981: 2
1982: 2
1983: 4
1984: 10
1985: 18
1986: 16
1987: 14
1988: 522
1989: 62
1990: 18
1991: 48
1992: 46
1993: 28
1994: 34
1995: 34
1996: 32
1997: 40
1998: 28
1999: 22
2000: 32
2001: 24
2002: 52
2003: 30
2004: 20
2005: 66
2006: 22
2007: 24
2008: 32
2009: 24
2010: 18
2011: 18
2012: 34
2013: 38
2014: 48
2015: 28
2016: 4
2017: 42
2018: 36
2019: 46
2020: 14
2021: 24
2022: 30
2023: 26
2024: 4
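
The year-by-year listing is long, so a coarser view can be easier to scan. The following is a minimal sketch that groups the `year` column into decades. The toy values and the `counts_by_decade` helper are hypothetical; with the real data, `df` would be passed in place of `toy`.

```python
import pandas as pd

# Toy years standing in for the real 'year' column (illustrative only).
toy = pd.DataFrame({"year": [1985, 1988, 1988, 1992, 2019, 2024]})

def counts_by_decade(frame):
    """Count rows per decade, returned in chronological order."""
    # Integer-divide by 10 and multiply back to map e.g. 1988 to 1980.
    decades = (frame["year"] // 10) * 10
    return decades.value_counts().sort_index()

decade_counts = counts_by_decade(toy)
for decade, count in decade_counts.items():
    print(f"{decade}s: {count}")
```

Because each act appears once per language, counts on the full dataframe include both the English and French rows.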
