IacobusKopiirefuto/slovak-law-word-count

slovak-law-word-count

This project aims to offer linguistic analysis of legal acts of the Slovak Republic and other documents available on Slov-Lex.

It was created as part of the report Analysis of law volume - revealing dormant potential (Analýza objemu zákonov - odhalenie spiaceho potenciálu, HTML, PDF) for The Institute of Economic and Social Studies (INESS). It was also briefly mentioned in the short news and in the economic newsletter & podcast of DennikN.

So far it offers these metrics and outputs:

  • word_count
  • char_count
  • word_count_stop
  • sent_count
  • avg_sent_length
  • type_token_ratio
  • FKGL
  • GFI
  • syllable_count
  • complex_word_count
  • tag_frequencies
  • lemma_counts
  • s_readability_word_count
  • tag_word_count
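For orientation, the readability metrics in this list can be sketched from raw counts. The block below is a minimal illustration using the standard published formulas for the Flesch-Kincaid Grade Level (FKGL) and the Gunning Fog Index (GFI), plus a plain type-token ratio. The function names and signatures are hypothetical and not taken from this repository, and the project's own calculation (for example its syllable counting for Slovak) may differ.

```python
def readability_metrics(words: int, sentences: int,
                        syllables: int, complex_words: int) -> dict:
    """Compute FKGL and GFI from raw counts (hypothetical helper).

    Uses the standard formulas, which were calibrated for English;
    the scores for Slovak text are only a rough relative indicator.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    avg_sent_length = words / sentences
    # Flesch-Kincaid Grade Level
    fkgl = 0.39 * avg_sent_length + 11.8 * (syllables / words) - 15.59
    # Gunning Fog Index: complex words are usually those with 3+ syllables
    gfi = 0.4 * (avg_sent_length + 100 * (complex_words / words))
    return {"avg_sent_length": avg_sent_length, "FKGL": fkgl, "GFI": gfi}


def type_token_ratio(tokens: list) -> float:
    """Share of distinct (case-insensitive) tokens among all tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) / len(lowered)


if __name__ == "__main__":
    print(readability_metrics(words=100, sentences=5,
                              syllables=150, complex_words=10))
    print(type_token_ratio(["the", "law", "The", "state"]))
```

Both scores grow with sentence length, which is why avg_sent_length is reported alongside them.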

Running project

Simply download the script and run either the quick analysis (almost instant results, low hardware requirements) or the analysis using the Stanza natural language analysis package (outputs more metrics but is also more resource-heavy).

Example of analysing the Constitution of the Slovak Republic:

  • quick analysis:
# from download_fun import download_links_from_table
from quick_analysis import q_analysis
from stop_words_default import default_stop_words

# download_links_from_table('https://www.slov-lex.sk/pravne-predpisy/SK/ZZ/1992/460/20230701', './Constitution')
q_analysis("./Constitution", default_stop_words, 'o Ústave SR')
  • stanza analysis:
# from download_fun import download_links_from_table
from stanza_analysis import s_analysis
from stop_words_default import default_stop_words

# download_links_from_table('https://www.slov-lex.sk/pravne-predpisy/SK/ZZ/1992/460/20230701', './Constitution')
s_analysis("./Constitution", default_stop_words, 'o Ústave SR')

To Do

  • fix download_fun.py: the function itself appears to work, but slov-lex.sk has started blocking it
    • currently gives error: HTTPSConnectionPool(host='www.slov-lex.sk', port=443): Max retries exceeded with url: /pravne-predpisy/SK/ZZ/1992/460/20230701 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)')))
    • but both curl https://www.slov-lex.sk/pravne-predpisy/SK/ZZ/1992/460/20230701 and download_links_from_table('https://example.org/', '../Constitution') work
    • you can also download all legal texts (11 GB, zip archive) from slov-lex.sk
  • add additional metrics:
    • longest (shortest) sentences from s_sentences()
    • longest (shortest) words from s_readability()
    • page count (conversion from character count)
    • number of internal and external hyperlinks in text
    • number of paragraphs
    • number of footnotes
  • generalize input
  • graph of interconnected legal texts (already available on Slov-Lex via 'Zobraziť graf k predpisu' ("Show graph for the regulation"), the textless pie-chart icon in the top right corner)
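
Two of the planned metrics above can be sketched without Stanza. The block below is only an illustration under stated assumptions: the function names are hypothetical, the sentences are passed in as a plain list of strings (the actual return type of s_sentences() is not documented here), and the page conversion assumes a standard page of 1800 characters including spaces, which is a common convention and not a value taken from this project.

```python
import math

# Assumption: one standard page ("normostrana") = 1800 characters incl. spaces.
CHARS_PER_STANDARD_PAGE = 1800


def page_count(char_count: int,
               chars_per_page: int = CHARS_PER_STANDARD_PAGE) -> int:
    """Convert a character count to a page count, rounding up."""
    if char_count < 0 or chars_per_page <= 0:
        raise ValueError("char_count must be >= 0 and chars_per_page > 0")
    return math.ceil(char_count / chars_per_page)


def extreme_sentences(sentences: list) -> tuple:
    """Return (longest, shortest) sentence, measured in whitespace-separated words.

    On ties, the first sentence in input order is returned.
    """
    if not sentences:
        raise ValueError("sentences must not be empty")

    def word_len(sentence: str) -> int:
        return len(sentence.split())

    return max(sentences, key=word_len), min(sentences, key=word_len)


if __name__ == "__main__":
    print(page_count(3601))
    print(extreme_sentences(["a b c", "a", "a b"]))
```

Rounding up treats a partially filled last page as a full page; a fractional page count would work just as well if the metric is only used for comparison between documents.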

About

Measuring Slovak law's growing word count (and other metrics)
