In [13]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Load the data
data = pd.read_csv("IMDB_Sentiment/train.csv")

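# Drop rows with a missing review text or label
data = data.dropna(subset=['text', 'sentiment'])

# Label encoding used throughout this notebook: 0 = positive, 1 = negative
positive_reviews = data[data['sentiment'] == 0]
negative_reviews = data[data['sentiment'] == 1]

# Hold out 2500 reviews of each class, sampled without replacement
test_neg_indices = np.random.choice(negative_reviews.index, 2500, replace=False)
test_pos_indices = np.random.choice(positive_reviews.index, 2500, replace=False)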

# Build the test set
test_data = pd.concat([negative_reviews.loc[test_neg_indices], positive_reviews.loc[test_pos_indices]])

# The remaining rows form the training set
train_data = data.drop(test_data.index)

X_train, y_train = train_data['text'], train_data['sentiment']
X_test, y_test = test_data['text'], test_data['sentiment']

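# Bag-of-words features; the vocabulary is fitted on the training set only
vectorizer = CountVectorizer(stop_words='english')
X_train_vect = vectorizer.fit_transform(X_train)
X_test_vect = vectorizer.transform(X_test)

# Multinomial Naive Bayes on the word counts
clf = MultinomialNB()
clf.fit(X_train_vect, y_train)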
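
The cell above draws the held-out reviews with `np.random.choice` and no fixed seed, so each run produces a different split and a slightly different accuracy. A minimal sketch of a reproducible, class-balanced alternative using `train_test_split` from scikit-learn follows. The `toy` DataFrame and the value `random_state=42` are made up for illustration and are not part of the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for train.csv (made-up rows); 0 = positive, 1 = negative as in the notebook
toy = pd.DataFrame({
    "text": [f"review {i}" for i in range(20)],
    "sentiment": [0, 1] * 10,
})

# stratify keeps the class proportions equal in both parts;
# random_state makes the split identical on every run
train_toy, test_toy = train_test_split(
    toy,
    test_size=10,
    stratify=toy["sentiment"],
    random_state=42,
)

print(test_toy["sentiment"].value_counts().sort_index())
```

On the real data the same call with `test_size=5000` would give 2500 reviews per class only if the two classes are equally frequent in the file, which is an assumption to check first.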

In [14]:
predictions = clf.predict(X_test_vect)

correct_predictions = (predictions == y_test).sum()
total_predictions = len(y_test)
accuracy = (correct_predictions / total_predictions) * 100

print(f"Correctly classified examples: {correct_predictions} of {total_predictions}")
print(f"Classifier accuracy: {accuracy:.3f}%")


Correctly classified examples: 4248 of 5000
Classifier accuracy: 84.960%
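
A single accuracy figure does not show which class the errors fall into. As a hedged sketch, a confusion matrix from `sklearn.metrics` breaks the count down by true and predicted label. The arrays `y_true` and `y_pred` below are made-up toy values, not the notebook's results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration; 0 = positive, 1 = negative as in the notebook
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0])

acc = accuracy_score(y_true, y_pred)

# Rows are true labels, columns are predicted labels, both in the order [0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(f"Accuracy: {acc * 100:.3f}%")
print(cm)
```

In the notebook the same two calls could presumably be applied to `y_test` and `predictions` directly.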


In [11]:
# Find the misclassified examples (positions within the test set)
misclassified_indices = np.where(predictions != y_test)[0]

n = 5
random_misclassified_indices = np.random.choice(misclassified_indices, n, replace=False)

print("\nMisclassified examples:")
for i in random_misclassified_indices:
    actual_label = 'Positive' if y_test.iloc[i] == 0 else 'Negative'
    predicted_label = 'Positive' if predictions[i] == 0 else 'Negative'
    print(f"\nReview: {X_test.iloc[i]}")
    print(f"Actual sentiment: {actual_label}")
    print(f"Predicted sentiment: {predicted_label}")


Misclassified examples:

Review: Fashionably fragmented, yet infuriatingly half-realized character-study, an examination of the different personalities of two college roommates: a talented but undisciplined star basketball player, and a pot-smoking, womanizing rabble-rouser. We never learn why these young men are friends. They may share confusions about the world and their places in it, but they don't seem to have anything else in common. Making his directorial debut, Jack Nicholson--who also co-wrote the screenplay with Jeremy Larner, based upon Larner's book--doesn't introduce us to the characters with any clarity, nor he does shape the scenes to help us identify with anyone on the screen. There are some very decent performances here (particularly from newcomer William Tepper in the central role), but most of the picture is unformed (perhaps intentionally), sketchy or unsure. Bruce Dern plays the hard-driving basketball coach, Karen Black is the older, married lady Teppe

In [12]:
n = 5
random_indices = np.random.choice(len(X_test), n, replace=False)
# Use a separate name so the full test-set `predictions` array is not overwritten
sample_predictions = clf.predict(X_test_vect[random_indices])

print(f"\nResults for {n} random examples from the test set:")
for i, idx in enumerate(random_indices):
    print(f"\nReview: {X_test.iloc[idx]}")
    print(f"Predicted sentiment: {'Negative' if sample_predictions[i] == 1 else 'Positive'}")



Results for 5 random examples from the test set:

Review: Burt Reynolds came to a point in his career where he appeared to just be going thru the motions. He'd show up, party with his friends on film, and take home a big paycheck. It didn't seem to matter to him that the product he was representing was pure crap.  No film epitomized this more than "Stroker Ace" which makes "Cannonball Run" look like a classic and "Cannonball Run II" look watchable. Save for a few race scenes there is absolutely NOTHING worth seeing here. Even the beautiful Loni Anderson hams it up so bad as a dumb blonde it's embarrassing.  If the thought of Burt hamming it up with Jim Nabors and dressing like a chicken sounds funny then this is your movie. Otherwise pick almost any other film comedy and it won't be any worse.
Predicted sentiment: Negative

Review: I really enjoyed this one, and although the ending made me angry, I still give it 10 out of 10.  Four college girls (Baltron, Kelly, Stahl and