In [1]:
import csv
from tqdm import tqdm
from flair.nn import Classifier
from flair.data import Sentence
import numpy as np
from collections import Counter
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from IPython.display import display, Markdown

## Pistes d'idées
Tester sur des prompts de manière plus générales
prendre des exemples du dataset IMDB, les modifier et éventuellement déduire de potentiels biais (quels mots pourraient influencer)
Tester et comparer avec un autre modèle de flair que distillbert et voir si lui-même ne serait pas biaisé, car on ne connaît pas le dataset utilisé pour entraîner le modèle de distillbert de flair

Dataset de train utilisé pour le modèle : pas très clair ("un mix de corpus, dont notamment Amazon Review Corpus") (que veulent-ils dire par notamment ?)

In [2]:
# load the model
tagger = Classifier.load('sentiment')

Sentence[5]: "the driver was a woman" → POSITIVE (0.5925)


In [3]:
data = []
with open('imdb.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        data.append(row)
data = np.array(data)
        
data_positive = np.array([row for row in data if row[1] == 'positive'])
data_negative = np.array([row for row in data if row[1] == 'negative'])

In [4]:
print(data_positive.shape)
print(data_negative.shape)

(42, 2)
(58, 2)


In [6]:
def get_common_words(data):
    stop_words = set(stopwords.words('english'))
    reviews = data[0].tolist()
    words = []
    for review in reviews:
        tokens = word_tokenize(review)
        words += [token.lower() for token in tokens if token.isalpha() and token.lower() not in stop_words]
    return Counter(words).most_common(10)


In [7]:
get_common_words(data_negative)

[('br', 6),
 ('jake', 4),
 ('parents', 3),
 ('movie', 3),
 ('drama', 3),
 ('closet', 2),
 ('film', 2),
 ('thriller', 2),
 ('basically', 1),
 ('family', 1)]

In [8]:
get_common_words(data_positive)

[('oz', 6),
 ('br', 6),
 ('violence', 4),
 ('show', 3),
 ('prison', 3),
 ('forget', 3),
 ('watching', 2),
 ('episode', 2),
 ('right', 2),
 ('first', 2)]

In [9]:
def compare_sentiment(sentence):
    display(Markdown(sentence))
    prediction = Sentence(sentence)
    tagger.predict(prediction)
    print("Prediction: ", prediction)

In [10]:
def get_accuracy(data):
    correct = 0
    for row in data:
        prediction = Sentence(row[0])
        tagger.predict(prediction)
        if(prediction.labels[0].value.lower() == row[1]):
            correct += 1
    return correct / data.shape[0]

print(get_accuracy(data_positive))
print(get_accuracy(data_negative))
print(get_accuracy(data))

0.8571428571428571
1.0
0.9306930693069307


On remarque que le modèle a déjà plus de mal à identifier une review positive d'une review négative. Les données de test utilisées ici présentent plus de reviews négatives que positives. On peut alors se demander s'il n'y a pas eu un biais dans les données d'entraînement. En effet, il est possible que l'équilibre en terme de quantité de reviews positives vs négatives n'ait pas été respecté. Etant donné l'apparente diversité des données d'entraînement utilisées pour le modèle utilisé par Flair, ainsi que le flou quant à sa composition exacte, on ne peut faire que de simples suggestions à ce sujet.

In [11]:
def get_least_accurate_prediction(data):
    score = 1.0
    least_index = 0
    for index, row in enumerate(data):
        prediction = Sentence(row[0])
        tagger.predict(prediction)
        if(prediction.labels[0].score < score):
            score = prediction.labels[0].score
            least_index = index
    return data[least_index - 1][0], score, least_index


### Analyse des données "complexes" pour le modèle

In [12]:
review, score, index = get_least_accurate_prediction(data_positive)
print(score)
display(Markdown(review))

0.5802236795425415


NO SPOILERS!!<br /><br />After Hitchcock's successful first American film, Rebecca based upon Daphne DuMarier's lush novel of gothic romance and intrigue, he returned to some of the more familiar themes of his early British period - mistaken identity and espionage. As the U.S. settled into World War II and the large scale 'war effort' of civilians building planes, weaponry and other necessary militia, the booming film entertainment business began turning out paranoid and often jingoistic thrillers with war time themes. These thrillers often involved networks of deceptive and skilled operators at work in the shadows among the good, law abiding citizens. Knowing the director was at home in this espionage genre, producer Jack Skirball approached Hitchcock about directing a property he owned that dealt with corruption, war-time sabotage and a helpless hero thrust into a vortex of coincidence and mistaken identity. The darker elements of the narrative and the sharp wit of literary maven Dorothy Parker (during her brief stint in Hollywood before returning to her bohemian roots in NYC) who co-authored the script were a perfect match for Hitchcock's sensibilities.<br /><br />This often neglected film tells the story of the unfortunate 25 year old Barry Kane (Robert Cummings) who, while at work at a Los Angeles Airplane Factory, meets new employee Frank Frye (Norman Lloydd) and moments later is framed for committing sabotage. Fleeing the authorities who don't believe his far-fetched story he meets several characters on his way to Soda City Utah and finally New York City. These memorable characters include a circus caravan with a car full of helpful 'freaks' and a popular billboard model Patricia Martin (Priscilla Lane) who, during the worst crisis of his life as well as national security, he falls madly in love with! Of course in the land of Hitchcock, Patricia, kidnapped by the supposed saboteur Barry, falls for her captor thus adding romantic tension to the mix.<br /><br />In good form for this outing, Hitchcock brews a national network of demure old ladies, average Joes, and respectable businessmen who double as secret agent terrorists that harbor criminals, pull guns and detonate bombs to keep things moving. It's a terrific plot that takes its time moving forward and once ignited, culminates in one of Hitchcock's more memorable finales. Look for incredibly life like NYC tourist attractions (all of which were recreated by art directors in Hollywood due to the war-time 'shooting ban' on public attractions). While Saboteur may not be one of Hitchcock's most well known films, it's a popular b-movie that is certainly solid and engaging with plenty of clever plot twists and as usual - terrific Hitchcock villains. Remember to look for Hitchcock's cameo appearance outside a drug store in the second half of the film. Hitchcock's original cameo idea that was shot (him fighting in sign language with his 'deaf' wife) was axed by the Bureau of Standards and Practices who were afraid of offending the deaf!

Dans le cas ici de la prédiction, le modèle semble moins certain de son résultat. Si on prête un peu plus d'attention à la review en question, on peut comprendre sa difficulté à déterminer si elle est positive ou non. En effet, on remarque que c'est surtout une description détaillée du film en question. L'utilisateur donne peu d'informations quant à la polarité de son avis, ou bien emploie des termes qui peuvent être complexe à comprendre : "*certainly solid and engaging*".

In [None]:
def get_wrong_prediction(data)

In [13]:
review, score, index = get_least_accurate_prediction(data_negative)
print(score)
display(Markdown(review))

0.6443233489990234


Average (and surprisingly tame) Fulci giallo which means it's still quite bad by normal standards, but redeemed by its solid build-up and some nice touches such as a neat time twist on the issues of visions and clairvoyance.<br /><br />The genre's well-known weaknesses are in full gear: banal dialogue, wooden acting, illogical plot points. And the finale goes on much too long, while the denouement proves to be a rather lame or shall I say: limp affair.<br /><br />Fulci's ironic handling of giallo norms is amusing, though. Yellow clues wherever you look.<br /><br />3 out of 10 limping killers

Ici, 

In [14]:
# make a sentence
sentence = Sentence('the driver was a woman')

# predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

Sentence[5]: "the driver was a woman" → POSITIVE (0.5925)
