Legal Data Clustering

This repository contains code to cluster legal network data. It is, inter alia, used to produce the results reported in the following publications:

Daniel Martin Katz, Corinna Coupette, Janis Beckedorf, and Dirk Hartung, Complex Societies and the Growth of the Law, Sci. Rep. 10 (2020), https://doi.org/10.1038/s41598-020-73623-x
Corinna Coupette, Janis Beckedorf, Dirk Hartung, Michael Bommarito, and Daniel Martin Katz, Measuring Law Over Time, to appear (2021)

Related Repositories:

Related Data:

Setup

It is assumed that you have Python 3.7 installed. (Other versions are not tested.)
Set up a virtual environment and activate it. (This is not required but recommended.)
Install the required packages with pip install -r requirements.txt.

Usage

Download or Generate the Data

One option is to generate the required data yourself using https://github.com/QuantLaw/legal-data-preprocessing (also available at https://doi.org/10.5281/zenodo.4070772).

Another option is to use the generated data from the related datasets (see above). This repository also contains the clustering results. To execute the clustering, you only need the following directories. Remove all other directories, because otherwise clustering steps might be skipped.

Required directories for Germany (paths relative to this repository):

../legal-networks-data/de/2_xml
../legal-networks-data/de/4_crossreference_graph
../legal-networks-data/de/5_snapshot_mapping_edgelist

Required directories for the USA (paths relative to this repository):

../legal-networks-data/us/2_xml
../legal-networks-data/us/4_crossreference_graph
../legal-networks-data/us/5_snapshot_mapping_edgelist

The combined data of statutes and regulations is located in the de_reg and us_reg folders next to the de and us folders.
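Before running the clustering, it can help to confirm that each dataset folder contains exactly the directories listed above. The following is a minimal sketch and not part of this repository: the function name check_data_dir is hypothetical, and it assumes the data lives in ../legal-networks-data as shown in the paths above.

```python
from pathlib import Path

# Directory names taken from the lists above.
REQUIRED = ("2_xml", "4_crossreference_graph", "5_snapshot_mapping_edgelist")


def check_data_dir(data_root, dataset):
    """Return (missing, extra) subdirectory names for one dataset folder.

    Hypothetical helper: `missing` lists required directories that are
    absent, `extra` lists directories that are present but not required.
    """
    dataset_dir = Path(data_root) / dataset
    if dataset_dir.is_dir():
        present = {p.name for p in dataset_dir.iterdir() if p.is_dir()}
    else:
        present = set()
    missing = sorted(set(REQUIRED) - present)
    extra = sorted(present - set(REQUIRED))
    return missing, extra


if __name__ == "__main__":
    # Assumed location of the data, relative to this repository.
    for dataset in ("de", "us"):
        missing, extra = check_data_dir("../legal-networks-data", dataset)
        print(dataset, "missing:", missing, "extra:", extra)
```

Directories reported as extra are the ones to move away before a fresh run, since their presence may cause clustering steps to be skipped.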

Run

Run ./run_example_configs.sh to preprocess the graphs in multiple configurations, cluster them, and map the clusterings over all available years.

The following steps will be executed:

Preprocessing: Simplify the graphs so that they can serve as input for clustering algorithms.
Cluster: Perform the clustering with infomap or louvain.
Cluster Texts: Collect the text for each cluster. (This step can only be performed if the text data is available in ../legal-networks-data/{us,de,us_reg,de_reg}/2_xml.)
Cluster Evolution Mappings: Map the clusters over time.
Cluster Evolution Graph: Create a graph with clusters as nodes and edges indicating the dynamics of nodes between snapshots.
Cluster Inspection: Inspect the content of individual clusters.
Cluster Evolution Inspection: Inspect the content of cluster families.
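To illustrate the idea behind the cluster evolution graph, the sketch below counts how many nodes move from each cluster in one snapshot to each cluster in the next. It is an illustration under assumptions, not the code used in this repository: the function name cluster_transitions, the dictionary-based inputs, and the toy data are all hypothetical.

```python
from collections import Counter


def cluster_transitions(clusters_before, clusters_after, node_mapping):
    """Count nodes moving between clusters of two consecutive snapshots.

    clusters_before, clusters_after: dict mapping node id -> cluster id.
    node_mapping: dict mapping a node id in the earlier snapshot to the
        corresponding node id in the later snapshot.
    Returns a Counter keyed by (cluster_before, cluster_after); each count
    can serve as the weight of an edge in a cluster evolution graph.
    """
    edges = Counter()
    for node, cluster in clusters_before.items():
        successor = node_mapping.get(node)
        if successor is None or successor not in clusters_after:
            # Node has no counterpart, or its counterpart is unclustered.
            continue
        edges[(cluster, clusters_after[successor])] += 1
    return edges


if __name__ == "__main__":
    # Toy data (hypothetical): three nodes, two clusters per snapshot.
    before = {"a": 0, "b": 0, "c": 1}
    after = {"a2": 0, "b2": 1, "c2": 1}
    mapping = {"a": "a2", "b": "b2", "c": "c2"}
    transitions = cluster_transitions(before, after, mapping)
    for (src, dst), weight in sorted(transitions.items()):
        print(f"earlier cluster {src} -> later cluster {dst}: {weight} node(s)")
```

In this toy example, the earlier cluster 0 splits: one of its nodes stays in cluster 0 and one moves to cluster 1.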
