### A2AJ Canadian Legal Data Downloaded Via Parquet Files

Documentation:

- Canadian Case Law Dataset: https://huggingface.co/datasets/a2aj/canadian-case-law
- Canadian Laws Dataset: https://huggingface.co/datasets/a2aj/canadian-laws
- Pandas Documentation: https://pandas.pydata.org/docs/

Parquet is a columnar storage format with efficient compression and encoding schemes, which makes it well suited to storing and transferring large datasets. By accessing the A2AJ Canadian Legal Data directly through Parquet files, you can download and load specific datasets into pandas DataFrames with just a few lines of code. This approach gives you direct control over which datasets to download and when, and it requires only pandas plus a Parquet engine such as pyarrow or fastparquet.

The A2AJ (Access to Algorithmic Justice) Canadian Legal Data provides bulk access to Canadian legal documents, including court decisions from various courts and tribunals, as well as legislation and regulations. This free resource contains the full text of legal documents along with metadata such as dates, citations, and case names. The Parquet files are hosted on Hugging Face and optimized for efficient storage and fast loading.

This notebook demonstrates how to download and work with the A2AJ Canadian Legal Data using direct Parquet file access. You'll learn how to load data from specific courts or tribunals (like the Supreme Court of Canada), combine multiple datasets into a single DataFrame, and access different types of legal documents including court decisions, federal legislation, and regulations. The examples show the URL patterns for accessing different datasets and how to efficiently load them into pandas for analysis.
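
The URL pattern used throughout this notebook can be captured in a small helper. This is only a sketch: the name `parquet_url` and its parameters are illustrative and not part of any A2AJ or Hugging Face API. The pattern itself is copied from the URLs used in the cells of this notebook.

```python
# Hypothetical helper (not part of any library): builds a download URL
# from the pattern used in this notebook's cells.
BASE_URL = "https://huggingface.co/datasets/a2aj"


def parquet_url(repo: str, dataset: str) -> str:
    """Return the Parquet URL for one dataset folder in an A2AJ repository."""
    return f"{BASE_URL}/{repo}/resolve/main/{dataset}/train.parquet"


scc_url = parquet_url("canadian-case-law", "SCC")
acts_url = parquet_url("canadian-laws", "LEGISLATION-FED")
print(scc_url)
print(acts_url)
```

Either string can then be passed to `pd.read_parquet`.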

Important Note: This direct Parquet download approach may be appropriate when you want simple, straightforward access to specific datasets without additional dependencies. It's particularly useful for one-time downloads or when working in environments where you want minimal package requirements. However, if you need streaming capabilities for very large datasets, automatic caching, or more advanced data processing features, consider using the Hugging Face Datasets library as described in the accompanying Datasets notebook. For searching and retrieving specific documents rather than bulk analysis, the A2AJ API (described in the API notebook) may be more appropriate.

### Setup

In [1]:
# install required packages if not already installed
# (pandas needs a Parquet engine such as pyarrow to read these files)
# !pip install pandas pyarrow
import pandas as pd

#### Load all decisions for a specific court / tribunal

Current options are:

| Dataset | Court / Tribunal / Reporter |
|------|----------------------------|
| SCC | Supreme Court of Canada |
| FCA | Federal Court of Appeal |
| FC | Federal Court |
| TCC | Tax Court of Canada |
| CMAC | Court Martial Appeal Court of Canada |
| CHRT | Canadian Human Rights Tribunal |
| SST | Social Security Tribunal of Canada |
| RPD | Refugee Protection Division (IRB) |
| RAD | Refugee Appeal Division (IRB) |
| RLLR | Refugee Law Lab Reporter (RPD, IRB) |
| ONCA | Ontario Court of Appeal |

In [2]:
# load all decisions for a specific court / tribunal (e.g. Supreme Court of Canada, which is "SCC")

# set up the URL to the parquet file
url = "https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/SCC/train.parquet"

# download the parquet file into a pandas df
df = pd.read_parquet(url)
df.head(5)

Unnamed: 0,dataset,citation_en,citation2_en,name_en,document_date_en,url_en,scraped_timestamp_en,unofficial_text_en,citation_fr,citation2_fr,name_fr,document_date_fr,url_fr,scraped_timestamp_fr,unofficial_text_fr,upstream_license
0,SCC,[1958] SCR 425,[1958] SCR 425,The Queen v. Laboratoires Marois Limitée,1958-06-03 00:00:00+00:00,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31 17:46:18.027000+00:00,The Queen v. Laboratoires Marois Limitée\nColl...,,,,NaT,,NaT,,"See upstream license, including non-commercial..."
1,SCC,[1958] SCR 603,[1958] SCR 603,Lattoni and Corbo v. The Queen,1958-06-26 00:00:00+00:00,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31 17:44:52.934000+00:00,Lattoni and Corbo v. The Queen\nCollection\nSu...,,,,NaT,,NaT,,"See upstream license, including non-commercial..."
2,SCC,[1958] SCR 608,[1958] SCR 608,Validity of Section 92 (4) of The Vehicles Act...,1958-10-07 00:00:00+00:00,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31 17:44:08.293000+00:00,Validity of Section 92 (4) of The Vehicles Act...,,,,NaT,,NaT,,"See upstream license, including non-commercial..."
3,SCC,[1958] SCR 61,[1958] SCR 61,"Composers, Authors and Publishers Association ...",1957-12-19 00:00:00+00:00,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31 17:32:10.430000+00:00,"Composers, Authors and Publishers Association ...",,,,NaT,,NaT,,"See upstream license, including non-commercial..."
4,SCC,[1958] SCR 65,[1958] SCR 65,The City of Westmount v. Montreal Transporatio...,1957-12-19 00:00:00+00:00,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31 17:32:46.015000+00:00,The City of Westmount v. Montreal Transporatio...,,,,NaT,,NaT,,"See upstream license, including non-commercial..."
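
Once a court's decisions are loaded, a common next step is to narrow them by date or keyword. The following is a minimal sketch that assumes the column names shown in the output above (`document_date_en`, `unofficial_text_en`). The function `filter_decisions` and the small `sample` frame are hypothetical stand-ins, with placeholder rows, so that the block runs without downloading anything.

```python
import pandas as pd


def filter_decisions(df, start=None, end=None, keyword=None):
    """Return rows dated in [start, end) whose English text contains keyword."""
    mask = pd.Series(True, index=df.index)
    dates = pd.to_datetime(df["document_date_en"], utc=True)
    if start is not None:
        mask &= dates >= pd.Timestamp(start, tz="UTC")
    if end is not None:
        mask &= dates < pd.Timestamp(end, tz="UTC")
    if keyword is not None:
        # Literal, case-insensitive match; rows with missing text never match.
        mask &= df["unofficial_text_en"].str.contains(
            keyword, case=False, na=False, regex=False
        )
    return df[mask]


# Placeholder rows for illustration only (not real decisions).
sample = pd.DataFrame({
    "citation_en": ["A", "B", "C"],
    "document_date_en": pd.to_datetime(
        ["1958-06-03", "1957-12-19", "1958-10-07"], utc=True
    ),
    "unofficial_text_en": ["a dispute about excise tax", "a tax appeal", None],
})

result = filter_decisions(sample, start="1958-01-01", end="1959-01-01", keyword="tax")
print(result["citation_en"].tolist())
```

With the real data, the same call would be made on the `df` loaded above instead of `sample`.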


#### Load all cases for all courts / tribunals

NOTE: If RAM is limited, consider streaming the data via Hugging Face Datasets rather than loading every Parquet file into memory at once: https://huggingface.co/docs/datasets/en/stream
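
If you prefer to stay with pandas, one partial mitigation is to request only the columns you need, since `pd.read_parquet` accepts a `columns` argument. The sketch below is an assumption-laden illustration: `load_courts` and its `reader` parameter are hypothetical names, and the `reader` hook exists only so the function can be exercised without network access. The file itself may still be downloaded in full, so this mainly shrinks the DataFrame that is kept in memory.

```python
import pandas as pd

URL_PREFIX = "https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/"


def load_courts(datasets, columns, reader=pd.read_parquet):
    """Load the selected columns for each court and concatenate once at the end."""
    frames = []
    for dataset in datasets:
        url = f"{URL_PREFIX}{dataset}/train.parquet"
        frames.append(reader(url, columns=columns))
    return pd.concat(frames, ignore_index=True)


# Example call (downloads data, so it is left commented out):
# meta = load_courts(
#     ["SCC", "FCA"],
#     ["dataset", "citation_en", "name_en", "document_date_en"],
# )
```

Leaving out `unofficial_text_en` and `unofficial_text_fr` is likely to give the largest saving, because those columns hold the full text of each decision.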


In [3]:
# set up the URLs to the parquet files

datasets = ["SCC",
            "FCA",
            "FC",
            "TCC",
            "CMAC",
            "CHRT",
            "SST",
            "RPD",
            "RAD",
            "RLLR",
            "ONCA"
]

url_prefix = "https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/"

# download each parquet file, then combine them into a single pandas df
frames = []
for dataset in datasets:
    url = f"{url_prefix}{dataset}/train.parquet"
    print(f"Downloading {dataset} data from {url}")
    frames.append(pd.read_parquet(url))

# concatenate once at the end rather than inside the loop
df = pd.concat(frames, ignore_index=True)
df.head(5)

Downloading SCC data from https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/SCC/train.parquet
Downloading FCA data from https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/FCA/train.parquet
Downloading FC data from https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/FC/train.parquet
Downloading TCC data from https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/TCC/train.parquet
Downloading CMAC data from https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/CMAC/train.parquet
Downloading CHRT data from https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/CHRT/train.parquet
Downloading SST data from https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/SST/train.parquet
Downloading RPD data from https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/RPD/train.parquet
Downloading RAD data from https://huggingface.co/datasets/a2aj/canadian-case-law/resolve/main/RAD/trai

Unnamed: 0,dataset,citation_en,citation2_en,name_en,document_date_en,url_en,scraped_timestamp_en,unofficial_text_en,citation_fr,citation2_fr,name_fr,document_date_fr,url_fr,scraped_timestamp_fr,unofficial_text_fr,upstream_license
0,SCC,[1958] SCR 425,[1958] SCR 425,The Queen v. Laboratoires Marois Limitée,1958-06-03 00:00:00+00:00,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31 17:46:18.027000+00:00,The Queen v. Laboratoires Marois Limitée\nColl...,,,,NaT,,NaT,,"See upstream license, including non-commercial..."
1,SCC,[1958] SCR 603,[1958] SCR 603,Lattoni and Corbo v. The Queen,1958-06-26 00:00:00+00:00,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31 17:44:52.934000+00:00,Lattoni and Corbo v. The Queen\nCollection\nSu...,,,,NaT,,NaT,,"See upstream license, including non-commercial..."
2,SCC,[1958] SCR 608,[1958] SCR 608,Validity of Section 92 (4) of The Vehicles Act...,1958-10-07 00:00:00+00:00,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31 17:44:08.293000+00:00,Validity of Section 92 (4) of The Vehicles Act...,,,,NaT,,NaT,,"See upstream license, including non-commercial..."
3,SCC,[1958] SCR 61,[1958] SCR 61,"Composers, Authors and Publishers Association ...",1957-12-19 00:00:00+00:00,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31 17:32:10.430000+00:00,"Composers, Authors and Publishers Association ...",,,,NaT,,NaT,,"See upstream license, including non-commercial..."
4,SCC,[1958] SCR 65,[1958] SCR 65,The City of Westmount v. Montreal Transporatio...,1957-12-19 00:00:00+00:00,https://decisions.scc-csc.ca/scc-csc/scc-csc/e...,2022-08-31 17:32:46.015000+00:00,The City of Westmount v. Montreal Transporatio...,,,,NaT,,NaT,,"See upstream license, including non-commercial..."
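
With every court in one DataFrame, the `dataset` column identifies where each row came from. As a rough sketch, the hypothetical helper `decisions_per_year` below counts decisions by court and year, assuming the `dataset` and `document_date_en` columns shown above. The `sample` rows are placeholders so the block runs offline.

```python
import pandas as pd


def decisions_per_year(df):
    """Count decisions by court code and decision year (English date column)."""
    years = pd.to_datetime(df["document_date_en"], utc=True).dt.year
    return (
        df.assign(year=years)
        .dropna(subset=["year"])  # skip rows without an English date
        .astype({"year": int})
        .groupby(["dataset", "year"])
        .size()
        .rename("n_decisions")
        .reset_index()
    )


# Placeholder rows for illustration only.
sample = pd.DataFrame({
    "dataset": ["SCC", "SCC", "FCA", "FCA"],
    "document_date_en": pd.to_datetime(
        ["1958-06-03", "1958-06-26", "2001-01-01", None], utc=True
    ),
})

counts = decisions_per_year(sample)
print(counts)
```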


#### Load specific set of legislation / regulations

Current options are:

| Dataset | Type |
|---------|------|
| LEGISLATION-FED | Federal Legislation (Acts) |
| REGULATIONS-FED | Federal Regulations |


In [4]:
# load specific type of laws (e.g. Federal Legislation)

# set up the URL to the parquet file
url = "https://huggingface.co/datasets/a2aj/canadian-laws/resolve/main/LEGISLATION-FED/train.parquet"

# download the parquet file into a pandas df
df = pd.read_parquet(url)
df.head(5)

Unnamed: 0,citation_en,citation2_en,dataset,name_en,document_date_en,source_url_en,scraped_timestamp_en,unofficial_text_en,unofficial_sections_en,num_sections_en,citation_fr,citation2_fr,name_fr,document_date_fr,source_url_fr,scraped_timestamp_fr,unofficial_text_fr,unofficial_sections_fr,num_sections_fr,upstream_license
0,"SC 2019, c 10",A-0.6,LEGISLATION-FED,Accessible Canada Act,2019-06-21 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,"# Accessible Canada Act\n\nSC 2019, c 10\n\nAn...","{""1"": ""Short title This Act may be cited as th...",209,"LC 2019, c 10",A-0.6,Loi canadienne sur l’accessibilité,2019-06-21 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Loi canadienne sur l’accessibilité\n\nLC 201...,"{""1"": ""Titre abrégé Loi canadienne sur l’acces...",209,"See upstream license, including requirements r..."
1,"RSC 1985, c A-1",A-1,LEGISLATION-FED,Access to Information Act,1988-12-12 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,"# Access to Information Act\n\nRSC 1985, c A-1...","{""1"": ""Short title This Act may be cited as th...",172,"LRC 1985, c A-1",A-1,Loi sur l’accès à l’information,1988-12-12 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,"# Loi sur l’accès à l’information\n\nLRC 1985,...","{""1"": ""Titre abrégé Loi sur l’accès à l’inform...",172,"See upstream license, including requirements r..."
2,"SC 2018, c 27, s 675",A-1.3,LEGISLATION-FED,Addition of Lands to Reserves and Reserve Crea...,2018-12-13 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Addition of Lands to Reserves and Reserve Cr...,"{""1"": ""Short title This Act may be cited as th...",8,"LC 2018, c 27, art 675",A-1.3,Loi sur l’ajout de terres aux réserves et la c...,2018-12-13 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Loi sur l’ajout de terres aux réserves et la...,"{""1"": ""Titre abrégé Loi sur l’ajout de terres ...",8,"See upstream license, including requirements r..."
3,"SC 2014, c 20, s 376",A-1.5,LEGISLATION-FED,Administrative Tribunals Support Service of Ca...,2014-06-19 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Administrative Tribunals Support Service of ...,"{""1"": ""Short title This Act may be cited as th...",18,"LC 2014, c 20, art 376",A-1.5,Loi sur le Service canadien d’appui aux tribun...,2014-06-19 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Loi sur le Service canadien d’appui aux trib...,"{""1"": ""Titre abrégé Loi sur le Service canadie...",18,"See upstream license, including requirements r..."
4,"RSC 1985, c 35 (4th Supp)",A-10.1,LEGISLATION-FED,Air Canada Public Participation Act,1989-11-01 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Air Canada Public Participation Act\n\nRSC 1...,"{""1"": ""Short title This Act may be cited as th...",16,"LRC 1985, c 35 (4e suppl)",A-10.1,Loi sur la participation publique au capital d...,1989-11-01 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Loi sur la participation publique au capital...,"{""1"": ""Titre abrégé Loi sur la participation p...",16,"See upstream license, including requirements r..."
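
Judging from the preview above, `unofficial_sections_en` appears to hold a JSON string that maps section numbers to section text. That is an assumption based only on the truncated output, so check a full value before relying on it. Under that assumption, the hypothetical helper `get_section` below pulls out a single section. The sample value is placeholder text, not real legislation.

```python
import json


def get_section(sections, number):
    """Return the text of one section, or None if it is missing."""
    if isinstance(sections, str):
        sections = json.loads(sections)  # assumed: JSON object keyed by section number
    if not isinstance(sections, dict):
        return None  # e.g. a missing (NaN) value
    return sections.get(str(number))


# Placeholder value shaped like the preview above.
sample_sections = json.dumps({"1": "Short title (placeholder)", "2": "Definitions (placeholder)"})

print(get_section(sample_sections, 2))
print(get_section(sample_sections, "99"))
```

On the loaded DataFrame this could be applied per row, for example `df["unofficial_sections_en"].map(lambda s: get_section(s, 1))`.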


#### Load all legislation / regulations

In [5]:
# set up the URLs to the parquet files

datasets = ["LEGISLATION-FED",
            "REGULATIONS-FED"
]

url_prefix = "https://huggingface.co/datasets/a2aj/canadian-laws/resolve/main/"

# download each parquet file, then combine them into a single pandas df
frames = []
for dataset in datasets:
    url = f"{url_prefix}{dataset}/train.parquet"
    print(f"Downloading {dataset} data from {url}")
    frames.append(pd.read_parquet(url))

# concatenate once at the end rather than inside the loop
df = pd.concat(frames, ignore_index=True)
df.head(5)

Downloading LEGISLATION-FED data from https://huggingface.co/datasets/a2aj/canadian-laws/resolve/main/LEGISLATION-FED/train.parquet
Downloading REGULATIONS-FED data from https://huggingface.co/datasets/a2aj/canadian-laws/resolve/main/REGULATIONS-FED/train.parquet


Unnamed: 0,citation_en,citation2_en,dataset,name_en,document_date_en,source_url_en,scraped_timestamp_en,unofficial_text_en,unofficial_sections_en,num_sections_en,citation_fr,citation2_fr,name_fr,document_date_fr,source_url_fr,scraped_timestamp_fr,unofficial_text_fr,unofficial_sections_fr,num_sections_fr,upstream_license
0,"SC 2019, c 10",A-0.6,LEGISLATION-FED,Accessible Canada Act,2019-06-21 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,"# Accessible Canada Act\n\nSC 2019, c 10\n\nAn...","{""1"": ""Short title This Act may be cited as th...",209,"LC 2019, c 10",A-0.6,Loi canadienne sur l’accessibilité,2019-06-21 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Loi canadienne sur l’accessibilité\n\nLC 201...,"{""1"": ""Titre abrégé Loi canadienne sur l’acces...",209,"See upstream license, including requirements r..."
1,"RSC 1985, c A-1",A-1,LEGISLATION-FED,Access to Information Act,1988-12-12 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,"# Access to Information Act\n\nRSC 1985, c A-1...","{""1"": ""Short title This Act may be cited as th...",172,"LRC 1985, c A-1",A-1,Loi sur l’accès à l’information,1988-12-12 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,"# Loi sur l’accès à l’information\n\nLRC 1985,...","{""1"": ""Titre abrégé Loi sur l’accès à l’inform...",172,"See upstream license, including requirements r..."
2,"SC 2018, c 27, s 675",A-1.3,LEGISLATION-FED,Addition of Lands to Reserves and Reserve Crea...,2018-12-13 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Addition of Lands to Reserves and Reserve Cr...,"{""1"": ""Short title This Act may be cited as th...",8,"LC 2018, c 27, art 675",A-1.3,Loi sur l’ajout de terres aux réserves et la c...,2018-12-13 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Loi sur l’ajout de terres aux réserves et la...,"{""1"": ""Titre abrégé Loi sur l’ajout de terres ...",8,"See upstream license, including requirements r..."
3,"SC 2014, c 20, s 376",A-1.5,LEGISLATION-FED,Administrative Tribunals Support Service of Ca...,2014-06-19 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Administrative Tribunals Support Service of ...,"{""1"": ""Short title This Act may be cited as th...",18,"LC 2014, c 20, art 376",A-1.5,Loi sur le Service canadien d’appui aux tribun...,2014-06-19 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Loi sur le Service canadien d’appui aux trib...,"{""1"": ""Titre abrégé Loi sur le Service canadie...",18,"See upstream license, including requirements r..."
4,"RSC 1985, c 35 (4th Supp)",A-10.1,LEGISLATION-FED,Air Canada Public Participation Act,1989-11-01 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Air Canada Public Participation Act\n\nRSC 1...,"{""1"": ""Short title This Act may be cited as th...",16,"LRC 1985, c 35 (4e suppl)",A-10.1,Loi sur la participation publique au capital d...,1989-11-01 00:00:00+00:00,https://github.com/justicecanada/laws-lois-xml...,2025-07-29 00:00:00+00:00,# Loi sur la participation publique au capital...,"{""1"": ""Titre abrégé Loi sur la participation p...",16,"See upstream license, including requirements r..."
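
After combining Acts and regulations, a quick summary by `dataset` helps confirm that both files were loaded. The sketch below assumes the `dataset` and `num_sections_en` columns shown above. `summarize_laws` is a hypothetical helper, and the `sample` rows are placeholders so the block runs offline.

```python
import pandas as pd


def summarize_laws(df):
    """Per dataset: documents with a section count, and total English sections."""
    return df.groupby("dataset")["num_sections_en"].agg(
        documents="count",
        total_sections="sum",
    )


# Placeholder rows for illustration only.
sample = pd.DataFrame({
    "dataset": ["LEGISLATION-FED", "LEGISLATION-FED", "REGULATIONS-FED"],
    "num_sections_en": [209, 172, 5],
})

summary = summarize_laws(sample)
print(summary)
```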
