# Topic Analysis

This notebook shows sample calls to the *main* function from the module *scripts/topic_analysis_main.py*.

*main* relies on the following classes:
- *Database* ('*scripts/database.py*'): For database operations
    - *fetch_single*: Fetches a single document for topic analysis
    - *fetch_all*: Fetches all documents for topic analysis
- *Process* ('*scripts/topic_analysis/text_processing.py*'): For text processing
    - *single_doc*: Processes text from a single document
    - *docs_parallel*: Processes text from several documents using parallel processing
- *Analysis* ('*scripts/topic_analysis/analysis.py*'): For analysis of processed documents
    - *analyze_docs*: Analyzes documents by topic
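
The flow described above (fetch, process, analyze) can be sketched as follows. This is a minimal illustration only: the three classes here are hypothetical stand-ins with assumed signatures, an in-memory dictionary replaces the database, and a simple term count replaces the actual topic model. The real implementations in *scripts/* are not shown in this notebook and may differ.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class Database:
    """Hypothetical stand-in: serves documents from an in-memory dict."""

    _DOCS = {
        1: "discrimination handicap discrimination",
        2: "racisme discrimination",
    }

    def fetch_single(self, document_id):
        return {document_id: self._DOCS[document_id]}

    def fetch_all(self):
        return dict(self._DOCS)


class Process:
    """Hypothetical stand-in: lowercases and splits text into tokens."""

    def single_doc(self, text):
        return text.lower().split()

    def docs_parallel(self, docs):
        # Assumption: a thread pool stands in for the real parallel processing.
        with ThreadPoolExecutor() as pool:
            tokens = list(pool.map(self.single_doc, docs.values()))
        return dict(zip(docs.keys(), tokens))


class Analysis:
    """Hypothetical stand-in: ranks terms by raw frequency."""

    def analyze_docs(self, tokenized, top_n=2):
        counts = Counter(tok for toks in tokenized.values() for tok in toks)
        return [term for term, _ in counts.most_common(top_n)]


def main_sketch(mode, document_id=None):
    """Dispatch on mode the way the notebook's calls to main suggest."""
    db, proc, analysis = Database(), Process(), Analysis()
    if mode == "single":
        docs = db.fetch_single(document_id)
        tokenized = {doc_id: proc.single_doc(text) for doc_id, text in docs.items()}
    elif mode == "all":
        tokenized = proc.docs_parallel(db.fetch_all())
    else:
        raise ValueError(f"Unknown mode: {mode!r}")
    return analysis.analyze_docs(tokenized)


single_terms = main_sketch("single", document_id=1)
all_terms = main_sketch("all")
```

The sketch omits the *lang* argument, since how the real *main* filters documents by language is not shown here.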

To execute this notebook, start by running the logging initialization cell (In [1]) below. You can then run and modify the other code cells according to your needs.

In [1]:
from datetime import datetime
from pathlib import Path
import logging

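# Set up logging constants
LOGS_DIR = Path("logs")
# Create the directory first: logging cannot open the file if it is missing
LOGS_DIR.mkdir(parents=True, exist_ok=True)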
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
LOGS_FILE = LOGS_DIR / f"topic_analysis_{timestamp}.log"

logging.basicConfig(
    filename=LOGS_FILE,
    level=logging.DEBUG,
    format='%(asctime)s:%(levelname)s:%(message)s'
)
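
One thing to keep in mind when re-running the initialization cell: `logging.basicConfig` does nothing if the root logger already has a handler, so a second run in the same kernel keeps writing to the first log file. The sketch below illustrates this with the standard `force=True` argument (Python 3.8+). The temporary directory and file names are assumptions made for the example and are not part of the project.

```python
import logging
import tempfile
from pathlib import Path

# Hypothetical log locations, used only for this demonstration
log_dir = Path(tempfile.mkdtemp())
first_log = log_dir / "first.log"
second_log = log_dir / "second.log"

# force=True removes any handlers already attached to the root logger
logging.basicConfig(filename=first_log, level=logging.DEBUG, force=True)
logging.debug("first message")

# Without force, this call is silently ignored: a handler already exists
logging.basicConfig(filename=second_log, level=logging.DEBUG)
logging.debug("still goes to first.log")

# With force=True, the old handler is closed and replaced
logging.basicConfig(filename=second_log, level=logging.DEBUG, force=True)
logging.debug("second message")

# Flush and close the handlers so both files can be read back safely
logging.shutdown()
```

Passing `force=True` in the initialization cell would therefore start a fresh log file on each run, at the cost of discarding any handlers configured elsewhere.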

## Analysis of a single document

In [2]:
from scripts.topic_analysis_main import main

main(lang="fr", mode='single', document_id=1)

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\nicol\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Processing Document 1: 100%|██████████| 1/1 [00:37<00:00, 37.08s/it]

Document 1 Results:
Topic 1: personnes, rapliq, handicap, handicapées, personnes handicapées, discrimination, racisme, discriminations, systémiques, mémoire

## Topics by language

In [3]:
from scripts.topic_analysis_main import main

main(lang="fr", mode='all')

Processing Documents: 146745it [3:04:57, 13.07it/s]          

In [None]:
from scripts.topic_analysis_main import main

main(lang="en", mode='all')

In [None]:
from scripts.topic_analysis_main import main

main(lang="bilingual", mode='all')