# Sentiment Analysis

This notebook demonstrates typical calls to the *SentimentAnalysis* class from the module *scripts/sentiment_analysis.py*.

*SentimentAnalysis* contains the following methods:
- *analyze_sentiment*: Analyzes the sentiment of a single document. Called internally by the other methods.
- *analyze_docs_by_language*: Returns the sentiment analysis of documents grouped by language.
- *analyze_all_docs*: Returns the sentiment analysis of documents across all language groups.
- *get_avg_sentiment_by_category*: Returns the average sentiment grouped by a user-specified data column.

To use this notebook, first run the initialization cell below. You can then run and modify the other code cells as needed.

In [1]:
# Import required libraries
from pathlib import Path
from datetime import datetime
from loguru import logger

from scripts.sentiment_analysis import SentimentAnalysis

# Define constants
LOGS_DIR = Path('logs')
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
LOGS_FILE = LOGS_DIR / f"sentiment_analysis_{timestamp}.log"
PDF_LIST = Path('data/pdf_list.csv')
DB_PATH = Path('data/database.db')

logger.add(
    LOGS_FILE,
    rotation="1 day",
    retention="7 days",
    level="DEBUG",
    format="{time:YYYY-MM-DD at HH:mm:ss} | {level} | {message}",
)

sa = SentimentAnalysis(DB_PATH)

2024-10-05 16:32:16.105 | INFO     | scripts.database:__init__:25 - Database connection successful.
2024-10-05 16:32:17.372 | INFO     | scripts.sentiment_analysis:__init__:35 - SentimentAnalysis initialized successfully


In [2]:
# Analyze all documents
results = sa.analyze_all_docs()
results.head()

2024-10-05 16:32:19.404 | INFO     | scripts.sentiment_analysis:analyze_docs_by_language:77 - Analyzing sentiment for 90 fr documents
100%|██████████| 90/90 [04:46<00:00,  3.18s/it]
2024-10-05 16:37:06.035 | INFO     | scripts.sentiment_analysis:analyze_docs_by_language:101 - Sentiment analysis for fr documents completed
2024-10-05 16:37:06.038 | INFO     | scripts.sentiment_analysis:analyze_docs_by_language:77 - Analyzing sentiment for 14 en documents
100%|██████████| 14/14 [00:54<00:00,  3.88s/it]
2024-10-05 16:38:00.378 | INFO     | scripts.sentiment_analysis:analyze_docs_by_language:101 - Sentiment analysis for en documents completed
2024-10-05 16:38:00.381 | INFO     | scripts.sentiment_analysis:analyze_docs_by_language:77 - Anal

,id,organization,document_type,category,clientele,knowledge_type,overall_sentiment,aspect_discrimination,aspect_inclusion,aspect_race,aspect_prejudice,aspect_equality,aspect_diversity,aspect_ethnicity,aspect_bias
0,1,Carrefour de ressources en interculturel (CRIC...,Mémoire,Organismes communautaires et à but non-lucratif,Personnes issues de l'immigration,Communautaire,0.7459,0.3046,0.489467,,,,,,
1,2,Culture Montréal,Mémoire,Organismes municipaux et paramunicipaux,Artistes,"Communautaire,Municipal",-0.00185,-0.35765,-0.07125,-0.07295,,,,,
2,3,Maurice Bakinde,Témoignage,Citoyens et particuliers,Personnes racisées,Citoyen,-0.0037,-0.399525,0.4794,,,,,,
3,5,Nafija Rahman,Témoignage,Citoyens et particuliers,S.O.,Citoyen,0.23985,,,,,,,,
4,6,François Picard,Mémoire,Citoyens et particuliers,S.O.,Citoyen,-0.0004,0.122772,,,,,,,
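
Many of the aspect columns in the preview above are empty. The sketch below counts how many documents have a score for each aspect. It rebuilds a small DataFrame from the five rows shown (the name `sample` is hypothetical and stands in for `results`) and assumes that an empty cell simply means no score was produced for that aspect, which the source does not state explicitly.

```python
import pandas as pd

# Rows copied from the preview above; None marks an empty aspect cell.
# `sample` is a hypothetical stand-in for the `results` DataFrame.
sample = pd.DataFrame({
    "id": [1, 2, 3, 5, 6],
    "overall_sentiment": [0.7459, -0.00185, -0.0037, 0.23985, -0.0004],
    "aspect_discrimination": [0.3046, -0.35765, -0.399525, None, 0.122772],
    "aspect_inclusion": [0.489467, -0.07125, 0.4794, None, None],
    "aspect_race": [None, -0.07295, None, None, None],
})

# Select every aspect column by its shared prefix.
aspect_cols = [col for col in sample.columns if col.startswith("aspect_")]

# Count the non-missing scores per aspect.
coverage = sample[aspect_cols].notna().sum()
print(coverage)
```

Running the same two lines on `results` itself (replacing `sample`) would give the coverage over all analyzed documents.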


In [3]:
# Get average sentiment by category
avg_sentiment = sa.get_avg_sentiment_by_category('category')
avg_sentiment

2024-10-05 16:38:38.620 | INFO     | scripts.sentiment_analysis:analyze_docs_by_language:77 - Analyzing sentiment for 90 fr documents
100%|██████████| 90/90 [04:42<00:00,  3.14s/it]
2024-10-05 16:43:21.115 | INFO     | scripts.sentiment_analysis:analyze_docs_by_language:101 - Sentiment analysis for fr documents completed
2024-10-05 16:43:21.118 | INFO     | scripts.sentiment_analysis:analyze_docs_by_language:77 - Analyzing sentiment for 14 en documents
100%|██████████| 14/14 [00:53<00:00,  3.85s/it]
2024-10-05 16:44:14.995 | INFO     | scripts.sentiment_analysis:analyze_docs_by_language:101 - Sentiment analysis for en documents completed
2024-10-05 16:44:14.997 | INFO     | scripts.sentiment_analysis:analyze_docs_by_language:77 - Anal

category
Personalités et organisations politiques           0.916533
Organismes communautaires et à but non-lucratif    0.569631
Organismes municipaux et paramunicipaux            0.557596
Chercheurs et experts                              0.306473
Citoyens et particuliers                           0.303685
Regroupements et réseaux                          -0.125112
Name: overall_sentiment, dtype: float64
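
The log above shows that this call analyzes every document again, which takes several minutes. If a DataFrame of results is already in memory, a similar average can be computed directly with pandas. The following is a minimal sketch, not the class's actual implementation: it assumes the method does little more than group by the chosen column and average `overall_sentiment`, and it uses a hypothetical `sample` DataFrame built from the five rows previewed earlier.

```python
import pandas as pd

# Hypothetical stand-in for `results`, using the five previewed rows.
sample = pd.DataFrame({
    "category": [
        "Organismes communautaires et à but non-lucratif",
        "Organismes municipaux et paramunicipaux",
        "Citoyens et particuliers",
        "Citoyens et particuliers",
        "Citoyens et particuliers",
    ],
    "overall_sentiment": [0.7459, -0.00185, -0.0037, 0.23985, -0.0004],
})

# Average the overall sentiment per category, highest first.
avg_by_category = (
    sample.groupby("category")["overall_sentiment"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_by_category)
```

Because the sample holds only five documents, its averages will not match the full output above; the point is the grouping pattern, which works for any column name in place of `"category"`.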

