In this notebook, sentiment analysis is performed with a pretrained BERT model. Because inference is computationally expensive, only the first 10,000 tweets are kept for the analysis. This reduces the processing load while still providing a meaningful sample for assessing the sentiment expressed in the selected tweets.

In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [None]:
!pip install torch torchvision torchaudio




In [None]:
!pip install transformers


Collecting transformers
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [None]:
!nvidia-smi


Thu Jun 29 11:44:27 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   45C    P8    12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests

In [None]:
import numpy as np
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/preprocesses_tweets.csv')
df.head()

Unnamed: 0,id,text,preprocessed_tweet
0,1323650510029791233,RT @WashingtonNFL: RT our digital sticker if y...,RT WashingtonNFL RT digital sticker exercised ...
1,1323650510013059073,RT @MisikoMichael: @WhiteHouse A vote for Trum...,RT MisikoMichael WhiteHouse vote Trump vote ch...
2,1323650509836849153,RT @justfivefoottwo: If you vote Trump tomorro...,RT justfivefoottwo vote Trump tomorrow make su...
3,1323650509765574657,@sammyliddell929 Trump,sammyliddell929 Trump
4,1323650509555802114,RT @matthewjdowd: Both Trump and Biden went to...,RT matthewjdowd Trump Biden went church choice...


In [None]:
df.drop('text', axis=1, inplace=True)
df.head()

Unnamed: 0,id,preprocessed_tweet
0,1323650510029791233,RT WashingtonNFL RT digital sticker exercised ...
1,1323650510013059073,RT MisikoMichael WhiteHouse vote Trump vote ch...
2,1323650509836849153,RT justfivefoottwo vote Trump tomorrow make su...
3,1323650509765574657,sammyliddell929 Trump
4,1323650509555802114,RT matthewjdowd Trump Biden went church choice...


In [None]:
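# Keep the first 10,000 tweets; .copy() makes df10k independent of df,
# so adding a column to it later does not raise SettingWithCopyWarning
df10k = df.head(10000).copy()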

In [None]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model.eval()  # inference mode (disables dropout)

# Assign a sentiment score (1 to 5 stars) to a tweet
def sentiment_score(tweet):
    # Truncate at the token level: the model accepts at most 512 tokens,
    # and slicing the string only limits characters, not tokens
    tokens = tokenizer.encode(tweet, return_tensors='pt', truncation=True, max_length=512)
    #tokens = tokenizer.encode(tweet, return_tensors='pt').to('cuda')

    with torch.no_grad():  # no gradients are needed for inference
        result = model(tokens)
    # Class indices 0..4 correspond to 1..5 stars
    return int(torch.argmax(result.logits)) + 1

# Add a "sentiment" column to df10k with the sentiment score of each tweet
df10k['sentiment'] = df10k['preprocessed_tweet'].apply(lambda x: sentiment_score(x[:512]))
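
The cell above scores tweets one at a time on the CPU, which is probably the main reason inference is slow. A possible alternative, sketched below, scores tweets in batches and moves the model to the GPU when one is available. The helper names `batched` and `score_tweets` and the default `batch_size=64` are assumptions made for illustration, not part of the original notebook, and the sketch has not been timed on this dataset.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_tweets(texts: Sequence[str], tokenizer, model, batch_size: int = 64) -> List[int]:
    """Return one star rating (1 to 5) per tweet, scoring `batch_size` tweets at a time."""
    import torch  # imported lazily so `batched` stays usable without PyTorch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    scores: List[int] = []
    for chunk in batched(texts, batch_size):
        # Pad to the longest tweet in the batch and truncate at the 512-token limit
        inputs = tokenizer(list(chunk), return_tensors="pt", padding=True,
                           truncation=True, max_length=512).to(device)
        with torch.no_grad():  # inference only, no gradients needed
            logits = model(**inputs).logits
        # Class indices 0..4 correspond to 1..5 stars
        scores.extend((logits.argmax(dim=-1) + 1).tolist())
    return scores
```

Under these assumptions, the column could be filled with `df10k['sentiment'] = score_tweets(df10k['preprocessed_tweet'].tolist(), tokenizer, model)`. A smaller `batch_size` may be needed if the GPU runs out of memory.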



In [None]:
df10k.head(10)

Unnamed: 0,id,preprocessed_tweet,sentiment
0,1323650510029791233,RT WashingtonNFL RT digital sticker exercised ...,5
1,1323650510013059073,RT MisikoMichael WhiteHouse vote Trump vote ch...,1
2,1323650509836849153,RT justfivefoottwo vote Trump tomorrow make su...,1
3,1323650509765574657,sammyliddell929 Trump,5
4,1323650509555802114,RT matthewjdowd Trump Biden went church choice...,1
5,1323650509396307968,RT MarciaJacobs13 Dont let trump fool tax Reme...,1
6,1323650509253840902,RT kaitlancollins campaign belief tonight land...,1
7,1323650509241278467,RT aricnesbitt Porter Township Hall morning Va...,1
8,1323650509144793088,RT COOLCHICBLONDE Election Day u care even lit...,5
9,1323650508976988165,RT gsjh59 NY State Local story Trump Train Ral...,1
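
In the ten rows displayed above, only the extreme ratings 1 and 5 appear. A quick tally is a simple sanity check on how the predictions are distributed. The sketch below uses only the ten scores shown in the output, so it says nothing about the remaining tweets.

```python
from collections import Counter

# Scores copied from the ten rows displayed above
scores = [5, 1, 1, 5, 1, 1, 1, 1, 5, 1]

counts = Counter(scores)
# Share of each star rating from 1 to 5, including ratings that never occur
shares = {star: counts.get(star, 0) / len(scores) for star in range(1, 6)}

for star, share in shares.items():
    print(f"{star} star(s): {counts.get(star, 0)} tweets ({share:.0%})")
```

On the full DataFrame, `df10k['sentiment'].value_counts()` gives the equivalent tally in pandas.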


In [None]:
df10k.to_csv('tweets_with_sentiments.csv', index=False)