Author: Fabio Collado
Advisors: Andre Assumpção, Roberto Lotufo and Rodrigo Nogueira
Final Project IA376
December 2021
A partnership with Meu Querido Diario
1 - Extract the PDFs of the gazettes from https://github.com/rennerocha/querido-diario-api-wrapper and place them in the "pdfs" folder.
2 - Run the notebook "processing_pdfs.ipynb" to generate the file "queridodiario2.pkl".
3 - Run the notebook "Projeto_Final_Querido_Diario.ipynb" on Google Colab.
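Before running the notebooks, it can help to confirm that the inputs from steps 1 and 2 are in place. The following is a minimal sketch, not part of the project: the helper name `check_inputs` is hypothetical, and it assumes the "pdfs" folder and "queridodiario2.pkl" sit in the same directory as the notebooks.

```python
from pathlib import Path


def check_inputs(base_dir="."):
    """Report whether the inputs expected by the notebooks are present.

    Assumption: "pdfs" and "queridodiario2.pkl" live directly under base_dir.
    """
    base = Path(base_dir)
    pdf_dir = base / "pdfs"
    # Only files ending in lowercase ".pdf" are counted; on case-sensitive
    # file systems, files named "*.PDF" would be missed.
    pdfs = sorted(pdf_dir.glob("*.pdf")) if pdf_dir.is_dir() else []
    return {
        "pdf_count": len(pdfs),
        "pickle_ready": (base / "queridodiario2.pkl").is_file(),
    }


if __name__ == "__main__":
    status = check_inputs()
    print(f"PDFs found in 'pdfs': {status['pdf_count']}")
    print(f"queridodiario2.pkl present: {status['pickle_ready']}")
```

If `pdf_count` is 0, step 1 has probably not been completed; if `pickle_ready` is false, step 2 still needs to be run before step 3.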
- Relatório Final.pdf - Final Report.
- Relatório.zip - LaTeX source of the Final Report.
- Projeto_Final_Querido_Diario.ipynb - Notebook to run the project.
- processing_pdfs.ipynb - Notebook to extract the PDFs, preprocess the text and generate the file "queridodiario2.pkl".
- preprocess.py - Code for pre-processing the text.
- validation_dataset.pkl - Dataset for validation.
- test_dataset.pkl - Dataset for test.
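The .pkl files listed above can be inspected outside the notebooks with Python's standard `pickle` module. This is a minimal sketch under assumptions: the helper names `load_pickle` and `describe` are hypothetical, and the internal structure of the datasets is not documented here, so the code only reports the type and length of whatever object each file contains. If the files hold pandas objects, pandas must be installed for unpickling to succeed. Only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load and return the object stored in a pickle file."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def describe(obj):
    """Return the object's type name and its length (None if it has no length)."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    # File names taken from the list above; missing files are skipped.
    for name in ("validation_dataset.pkl", "test_dataset.pkl"):
        if Path(name).is_file():
            kind, size = describe(load_pickle(name))
            print(f"{name}: type={kind}, length={size}")
        else:
            print(f"{name}: not found")
```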