LexPredict Legal Dictionaries
Many academic and commercial applications of natural language processing and machine learning to text can benefit from a controlled lexicon of expert-selected terms (i.e., a dictionary). This is especially true of highly technical language such as legal text. However, no open-source, freely available dictionaries of this kind have existed.
Our goal is to change that. Our open-source document analytics platform, ContraxSuite, is just one of the projects that will benefit; we expect many other researchers and companies both to benefit and to contribute.
The contents and organization of this legal dictionary repository will change as the community participates and our products mature. Our long-term vision is to build multi-domain, multi-lingual, and cross-lingual resources for legal and regulatory text. The repository currently includes the following dictionaries:
- US GAAP
- UK GAAP
- US GASB
- US FASB
- IFRS FASB
- Common terms
- Geopolitical Actors and Bodies
- US Federal Regulators
- US State Regulators
- UK Regulators
- Australian Regulators
- Canadian Regulators
- Common Law (Black's 1910 Law Dictionary)
- Common Law Subset (1000 most frequent terms)
- United States Code
- US Code of Federal Regulations
- US Courts
- US Federal Acts
- US State Codes and Compiled Laws
- US State Courts
- Australian Courts
- Canadian Courts
- ICD (automated)
- CPT (automated)
- US Federal Hazardous Waste
- Multi-lingual (English, French, German, Spanish)
- Divisions (e.g., countries, states, provinces)
- German Courts
- Chemical Elements
- Chemical Compounds
LexPredict ContraxSuite - Software
This repository is also used by LexPredict ContraxSuite, our open-source contract analytics and document analytics platform.
License
The data in this repository is available under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0). LexPredict is proud to support open source in legal technology and innovation.
Sources of Data
Geographical coordinates in multi/geopolitical/geopolitical_divisions.csv are from geonames.org (Creative Commons Attribution 4.0 License).
Release History
- 1.0.2: October 1, 2017 - October release, including over 10 new knowledge sets and word embedding models
- 1.0.1: September 1, 2017 - September release, including over 20 new knowledge sets
- 1.0.0: August 1, 2017 - Initial open source release