### Notebook to load and analyze Canadian Federal Regulations

Sean Rehaag

License: Creative Commons Attribution-NonCommercial 4.0 International [(CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/). NOTE: Users must also comply with upstream [licensing](https://www.justice.gc.ca/eng/terms-avis/index.html) for the data source.

Dataset & Code to be cited as: 

    Sean Rehaag, "Federal Regulations Bulk Decisions Dataset" (2024), online: Refugee Law Laboratory <https://refugeelab.ca/bulk-data/regulations-fed/>.

Notes:

(1) Data Source: [Department of Justice GitHub](https://github.com/justicecanada/laws-lois-xml) & [Department of Justice Website](https://laws-lois.justice.gc.ca).

(2) Unofficial Data: The data are unofficial reproductions of materials available on the Department of Justice's Consolidated Acts and Regulations of Canada website. Official versions are available [here](https://laws-lois.justice.gc.ca/eng/regulations/).

(3) Non-Affiliation / Endorsement: The data have been collected and reproduced without any affiliation with or endorsement from the Government of Canada.

(4) Non-Commercial Use: As indicated in the license, data may be used for non-commercial purposes only (with attribution). For commercial use, see the Department of Justice website's [Terms of Use](https://www.justice.gc.ca/eng/terms-avis/index.html).

(5) Accuracy: Data were collected and processed programmatically for the purposes of academic research. While we make best efforts to ensure accuracy, data gathering of this kind inevitably involves errors. As such, the data should be viewed as preliminary information intended to prompt further research and discussion, rather than as definitive information.

### Requirements:

    pip install pandas

### If using parquet

    pip install pyarrow

### If loading remotely (other than via Hugging Face)

    pip install requests

### If loading remotely via Hugging Face

    pip install datasets
    

(Written on Python 3.11.4)
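
Before running the loading cells, it can help to confirm which of the packages listed above are actually installed. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the descriptions in `OPTIONAL_PACKAGES` are illustrative assumptions, not part of the original notebook.

```python
from importlib.util import find_spec

# Packages named in the requirements above, mapped to why they are needed.
OPTIONAL_PACKAGES = {
    "pandas": "required for every loading option",
    "pyarrow": "needed to read parquet files",
    "requests": "needed for remote loading outside Hugging Face",
    "datasets": "needed for loading via Hugging Face",
}


def missing_packages(packages):
    """Return the names in `packages` that cannot be found for import."""
    return [name for name in packages if find_spec(name) is None]


for name in missing_packages(OPTIONAL_PACKAGES):
    print(f"Missing: {name} ({OPTIONAL_PACKAGES[name]})")
```

Only the packages needed for the chosen loading option have to be present; the check prints nothing when all four are installed.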


### Load Data

Four Options: Local & Remote

In [1]:
# OPTION 1: Load Hugging Face dataset

from datasets import load_dataset
import pandas as pd

dataset = load_dataset("refugee-law-lab/canadian-legal-data", "REGULATIONS-FED", split="train")

# convert to dataframe
df = dataset.to_pandas()
df


Unnamed: 0,citation,citation2,dataset,year,name,language,document_date,source_url,scraped_timestamp,unofficial_text,other
0,"CRC, c 10",,REGULATIONS-FED,1979,Flying Accidents Compensation Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Flying Accidents Compensation Regulations\n\...,
1,"CRC, c 100",,REGULATIONS-FED,1979,Ottawa International Airport Zoning Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Ottawa International Airport Zoning Regulati...,
2,"CRC, c 101",,REGULATIONS-FED,1979,Penticton Airport Zoning Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,"# Penticton Airport Zoning Regulations\n\nCRC,...",
3,"CRC, c 1013",,REGULATIONS-FED,1979,Canada Industrial Relations Remuneration Regul...,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Canada Industrial Relations Remuneration Reg...,
4,"CRC, c 1015",,REGULATIONS-FED,1979,Fair Wages and Hours of Labour Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Fair Wages and Hours of Labour Regulations\n...,
...,...,...,...,...,...,...,...,...,...,...,...
9349,TR/99-80,,REGULATIONS-FED,1999,Décret de remise visant le directeur exécutif ...,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant le directeur exécuti...,
9350,TR/99-81,,REGULATIONS-FED,1999,Décret de remise visant le directeur de la Com...,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant le directeur de la C...,
9351,TR/99-82,,REGULATIONS-FED,1999,Décret de remise visant Télésat Canada,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant Télésat Canada\n\nTR...,
9352,TR/99-9,,REGULATIONS-FED,1999,Décret sur la renonciation aux terres réservée...,fr,1999-2-3,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret sur la renonciation aux terres réserv...,


In [2]:
# OPTION 2: Load parquet data remotely from Hugging Face without cloning repo
import pandas as pd
import requests
from io import BytesIO

url = 'https://huggingface.co/datasets/refugee-law-lab/canadian-legal-data/resolve/main/REGULATIONS-FED/train.parquet'

# download the parquet file and stop on HTTP errors
results = requests.get(url)
results.raise_for_status()

# convert to dataframe
# (if this fails, pass engine='pyarrow' to read_parquet())
df = pd.read_parquet(BytesIO(results.content))

df

Unnamed: 0,citation,citation2,dataset,year,name,language,document_date,source_url,scraped_timestamp,unofficial_text,other
0,"CRC, c 10",,REGULATIONS-FED,1979,Flying Accidents Compensation Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Flying Accidents Compensation Regulations\n\...,
1,"CRC, c 100",,REGULATIONS-FED,1979,Ottawa International Airport Zoning Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Ottawa International Airport Zoning Regulati...,
2,"CRC, c 101",,REGULATIONS-FED,1979,Penticton Airport Zoning Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,"# Penticton Airport Zoning Regulations\n\nCRC,...",
3,"CRC, c 1013",,REGULATIONS-FED,1979,Canada Industrial Relations Remuneration Regul...,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Canada Industrial Relations Remuneration Reg...,
4,"CRC, c 1015",,REGULATIONS-FED,1979,Fair Wages and Hours of Labour Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Fair Wages and Hours of Labour Regulations\n...,
...,...,...,...,...,...,...,...,...,...,...,...
9349,TR/99-80,,REGULATIONS-FED,1999,Décret de remise visant le directeur exécutif ...,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant le directeur exécuti...,
9350,TR/99-81,,REGULATIONS-FED,1999,Décret de remise visant le directeur de la Com...,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant le directeur de la C...,
9351,TR/99-82,,REGULATIONS-FED,1999,Décret de remise visant Télésat Canada,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant Télésat Canada\n\nTR...,
9352,TR/99-9,,REGULATIONS-FED,1999,Décret sur la renonciation aux terres réservée...,fr,1999-2-3,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret sur la renonciation aux terres réserv...,


In [3]:
# OPTION 3: Load json data remotely from GitHub without cloning repo

import pandas as pd

# load English data
url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/regulations-fed-bulk-data/main/DATA/df_regs_en.json'
df = pd.read_json(url, orient='records', lines=True)

# load French data
url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/regulations-fed-bulk-data/main/DATA/df_regs_fr.json'
df2 = pd.read_json(url, orient='records', lines=True)

# combine both dataframes
df = pd.concat([df, df2], ignore_index=True)

df



Unnamed: 0,citation,citation2,dataset,year,name,language,document_date,source_url,scraped_timestamp,unofficial_text,other
0,"CRC, c 10",,REGULATIONS-FED,1979,Flying Accidents Compensation Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Flying Accidents Compensation Regulations\n\...,
1,"CRC, c 100",,REGULATIONS-FED,1979,Ottawa International Airport Zoning Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Ottawa International Airport Zoning Regulati...,
2,"CRC, c 101",,REGULATIONS-FED,1979,Penticton Airport Zoning Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,"# Penticton Airport Zoning Regulations\n\nCRC,...",
3,"CRC, c 1013",,REGULATIONS-FED,1979,Canada Industrial Relations Remuneration Regul...,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Canada Industrial Relations Remuneration Reg...,
4,"CRC, c 1015",,REGULATIONS-FED,1979,Fair Wages and Hours of Labour Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Fair Wages and Hours of Labour Regulations\n...,
...,...,...,...,...,...,...,...,...,...,...,...
9349,TR/99-80,,REGULATIONS-FED,1999,Décret de remise visant le directeur exécutif ...,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant le directeur exécuti...,
9350,TR/99-81,,REGULATIONS-FED,1999,Décret de remise visant le directeur de la Com...,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant le directeur de la C...,
9351,TR/99-82,,REGULATIONS-FED,1999,Décret de remise visant Télésat Canada,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant Télésat Canada\n\nTR...,
9352,TR/99-9,,REGULATIONS-FED,1999,Décret sur la renonciation aux terres réservée...,fr,1999-2-3,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret sur la renonciation aux terres réserv...,


In [4]:
# OPTION 4: Load json data locally via cloned repo

# First, clone git repo to local machine
# Then run this code to load data

import pandas as pd

# load English data
file_path = 'DATA/df_regs_en.json'
df = pd.read_json(file_path, orient='records', lines=True)

# load French data
file_path = 'DATA/df_regs_fr.json'
df2 = pd.read_json(file_path, orient='records', lines=True)

# combine both dataframes
df = pd.concat([df, df2], ignore_index=True)

df

Unnamed: 0,citation,citation2,dataset,year,name,language,document_date,source_url,scraped_timestamp,unofficial_text,other
0,"CRC, c 10",,REGULATIONS-FED,1979,Flying Accidents Compensation Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Flying Accidents Compensation Regulations\n\...,
1,"CRC, c 100",,REGULATIONS-FED,1979,Ottawa International Airport Zoning Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Ottawa International Airport Zoning Regulati...,
2,"CRC, c 101",,REGULATIONS-FED,1979,Penticton Airport Zoning Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,"# Penticton Airport Zoning Regulations\n\nCRC,...",
3,"CRC, c 1013",,REGULATIONS-FED,1979,Canada Industrial Relations Remuneration Regul...,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Canada Industrial Relations Remuneration Reg...,
4,"CRC, c 1015",,REGULATIONS-FED,1979,Fair Wages and Hours of Labour Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Fair Wages and Hours of Labour Regulations\n...,
...,...,...,...,...,...,...,...,...,...,...,...
9349,TR/99-80,,REGULATIONS-FED,1999,Décret de remise visant le directeur exécutif ...,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant le directeur exécuti...,
9350,TR/99-81,,REGULATIONS-FED,1999,Décret de remise visant le directeur de la Com...,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant le directeur de la C...,
9351,TR/99-82,,REGULATIONS-FED,1999,Décret de remise visant Télésat Canada,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant Télésat Canada\n\nTR...,
9352,TR/99-9,,REGULATIONS-FED,1999,Décret sur la renonciation aux terres réservée...,fr,1999-2-3,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret sur la renonciation aux terres réserv...,
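
Options 3 and 4 pass `lines=True` to `pd.read_json`, which tells pandas to treat the file as JSON Lines: one JSON object per line rather than a single JSON array. The sketch below illustrates that layout with the standard library only. The two sample records are shortened stand-ins built from column names shown above, and `read_json_lines` is a hypothetical helper, not part of the repository.

```python
import io
import json


def read_json_lines(handle):
    """Parse one JSON object per non-empty line into a list of dicts."""
    return [json.loads(line) for line in handle if line.strip()]


# Two illustrative records (shortened; the real files carry more columns).
sample = io.StringIO(
    '{"citation": "CRC, c 10", "language": "en", "year": 1979}\n'
    '{"citation": "TR/99-80", "language": "fr", "year": 1999}\n'
)

records = read_json_lines(sample)
print(len(records), records[0]["citation"])
```

Reading line by line like this may be useful when only a few fields are needed and loading every column into a dataframe is unnecessary.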


### Analyze Data

In [5]:
# View dataframe
df

Unnamed: 0,citation,citation2,dataset,year,name,language,document_date,source_url,scraped_timestamp,unofficial_text,other
0,"CRC, c 10",,REGULATIONS-FED,1979,Flying Accidents Compensation Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Flying Accidents Compensation Regulations\n\...,
1,"CRC, c 100",,REGULATIONS-FED,1979,Ottawa International Airport Zoning Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Ottawa International Airport Zoning Regulati...,
2,"CRC, c 101",,REGULATIONS-FED,1979,Penticton Airport Zoning Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,"# Penticton Airport Zoning Regulations\n\nCRC,...",
3,"CRC, c 1013",,REGULATIONS-FED,1979,Canada Industrial Relations Remuneration Regul...,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Canada Industrial Relations Remuneration Reg...,
4,"CRC, c 1015",,REGULATIONS-FED,1979,Fair Wages and Hours of Labour Regulations,en,1979-08-15,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Fair Wages and Hours of Labour Regulations\n...,
...,...,...,...,...,...,...,...,...,...,...,...
9349,TR/99-80,,REGULATIONS-FED,1999,Décret de remise visant le directeur exécutif ...,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant le directeur exécuti...,
9350,TR/99-81,,REGULATIONS-FED,1999,Décret de remise visant le directeur de la Com...,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant le directeur de la C...,
9351,TR/99-82,,REGULATIONS-FED,1999,Décret de remise visant Télésat Canada,fr,1999-8-18,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret de remise visant Télésat Canada\n\nTR...,
9352,TR/99-9,,REGULATIONS-FED,1999,Décret sur la renonciation aux terres réservée...,fr,1999-2-3,https://github.com/justicecanada/laws-lois-xml...,2024-05-23,# Décret sur la renonciation aux terres réserv...,
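
The rows displayed above show `document_date` values in two styles: zero-padded (`1979-08-15`) and unpadded (`1999-8-18`). The sketch below is one possible way to bring them to a consistent `YYYY-MM-DD` form. It assumes the column holds plain strings, and `normalize_date` is a hypothetical helper rather than anything provided with the dataset.

```python
from datetime import date


def normalize_date(value):
    """Return `value` as a zero-padded YYYY-MM-DD string, or None if it cannot be parsed."""
    try:
        year, month, day = (int(part) for part in str(value).split("-"))
        return date(year, month, day).isoformat()
    except ValueError:
        # Covers missing values, wrong number of parts, and impossible dates.
        return None


print(normalize_date("1999-8-18"))
print(normalize_date("1979-08-15"))
```

Under the same assumption, the helper could be applied to the dataframe with `df['document_date'].map(normalize_date)` before any date-based sorting or filtering.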


In [6]:
# language counts
df['language'].value_counts()

language
en    4677
fr    4677
Name: count, dtype: int64
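
The equal totals suggest that each regulation appears once per language, but totals alone do not show whether that holds within each year. A cross-tabulation is one way to check. The sketch below uses a small made-up dataframe (the values are illustrative only, not taken from the dataset) with the same `year` and `language` column names.

```python
import pandas as pd

# Illustrative rows only: a small stand-in for the full dataframe.
sample = pd.DataFrame(
    {
        "year": [1979, 1979, 1999, 1999, 1999],
        "language": ["en", "fr", "en", "fr", "fr"],
    }
)

# Rows are years, columns are languages, cells are record counts.
by_year_language = pd.crosstab(sample["year"], sample["language"])
print(by_year_language)

# Years where the English and French counts differ.
unbalanced = by_year_language[by_year_language["en"] != by_year_language["fr"]]
print(unbalanced.index.tolist())
```

On the real data, replacing `sample` with `df` should give the same table shape; an empty `unbalanced` result would be consistent with every year having matching English and French counts.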

In [7]:
# Yearly counts
year_counts = df['year'].value_counts().sort_index()
for year, count in year_counts.items():
    print(f'{year}: {count}')


1945: 2
1951: 2
1954: 6
1955: 4
1956: 2
1957: 4
1958: 2
1960: 2
1961: 8
1962: 2
1964: 2
1965: 4
1966: 4
1967: 6
1970: 2
1972: 6
1973: 6
1974: 4
1975: 2
1976: 4
1977: 10
1978: 78
1979: 1214
1980: 96
1981: 126
1982: 102
1983: 132
1984: 86
1985: 114
1986: 182
1987: 140
1988: 206
1989: 136
1990: 220
1991: 136
1992: 258
1993: 268
1994: 204
1995: 218
1996: 188
1997: 202
1998: 222
1999: 182
2000: 160
2001: 318
2002: 206
2003: 260
2004: 172
2005: 190
2006: 242
2007: 160
2008: 158
2009: 112
2010: 164
2011: 182
2012: 200
2013: 166
2014: 178
2015: 148
2016: 200
2017: 160
2018: 228
2019: 270
2020: 136
2021: 198
2022: 122
2023: 172
2024: 58
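
The count for 1979 (1214) is far larger than for any other year, and the 1979 rows displayed earlier all carry citations beginning with `CRC`. Grouping records by citation prefix is one way to explore whether a single series accounts for the spike. The sketch below assumes the prefix is the leading run of letters in the `citation` column; `citation_prefix` is a hypothetical helper.

```python
import re
from collections import Counter


def citation_prefix(citation):
    """Return the leading run of letters in a citation, e.g. 'CRC' or 'TR'."""
    match = re.match(r"[A-Za-z]+", citation.strip())
    return match.group(0) if match else ""


# Citations copied from the rows displayed earlier in this notebook.
citations = ["CRC, c 10", "CRC, c 100", "TR/99-80", "TR/99-9"]

prefix_counts = Counter(citation_prefix(c) for c in citations)
print(prefix_counts)
```

With the full dataframe, `df['citation'].map(citation_prefix)` could be stored as a new column and combined with `year` to see how each prefix is distributed over time.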
