FabioCollado/meuqueridodiario

Segmentation of Legal Documents: unsupervised approach with BERT embeddings

Author: Fabio Collado

Advisors: Andre Assumpção, Roberto Lotufo and Rodrigo Nogueira

Final Project IA376

December 2021

A partnership with Meu Querido Diario

How to run the code:

1 - Extract the PDFs of the gazettes from https://github.com/rennerocha/querido-diario-api-wrapper and place them in the "pdfs" folder.

2 - Run the notebook "processing_pdfs.ipynb" to generate the file "queridodiario2.pkl".

3 - Run the notebook "Projeto_Final_Querido_Diario.ipynb" on Google Colab.
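
Before moving on to step 3, it can help to confirm that the "pdfs" folder is populated and that the pickle from step 2 loads. The snippet below is a minimal sketch, not part of the repository: the helper names `list_pdfs` and `load_pickle` are hypothetical, and the structure of the object stored in "queridodiario2.pkl" is not documented here, so the code only loads it and leaves inspection to you. If the notebook saved a pandas object, pandas must be installed for the load to succeed.

```python
import pickle
from pathlib import Path


def list_pdfs(folder="pdfs"):
    """Return the sorted PDF paths found in `folder` (empty list if it is missing)."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.suffix.lower() == ".pdf")


def load_pickle(path="queridodiario2.pkl"):
    """Load a pickle file and return whatever object it contains.

    Only unpickle files you generated yourself or otherwise trust.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    pdfs = list_pdfs()
    print(f"Found {len(pdfs)} PDF file(s) in 'pdfs'")
    if Path("queridodiario2.pkl").exists():
        data = load_pickle()
        # The type is an unknown here; print it to see what step 2 produced.
        print("Loaded object of type:", type(data).__name__)
```

The same `load_pickle` helper can be pointed at "validation_dataset.pkl" or "test_dataset.pkl" to check that those files open correctly.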

Contents:

  • Relatório Final.pdf - Final Report.
  • Relatório.zip - Final Report in LaTeX.
  • Projeto_Final_Querido_Diario.ipynb - Notebook to run the project.
  • processing_pdfs.ipynb - Notebook to extract the PDFs, preprocess the text and generate the file "queridodiario2.pkl".
  • preprocess.py - Code for pre-processing the text.
  • validation_dataset.pkl - Dataset for validation.
  • test_dataset.pkl - Dataset for test.
