# Kapitel 11: Durchführen von Sentiment-Analysen für Textdaten

## Setup

Es werden die Verzeichnisse festgelegt. Wenn Sie mit Google Colab arbeiten: Die erforderlichen Dateien werden kopiert und die erforderlichen Bibliotheken installiert.

# Hinweise

Mit ### ergänzte Code-Zeilen geben Werte an, die angepasst werden können. 

Transformers Version 3.5.1 muss installiert werden: pip install transformers==3.5.1 

Pytorch (oder Tensorflow > 2) muss instaliert werden: pip install torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html


In [1]:
import sys, os
ON_COLAB = 'google.colab' in sys.modules

if ON_COLAB:
    GIT_ROOT = 'https://github.com/blueprints-for-text-analytics-python/blueprints-text/raw/master'
    os.system(f'wget {GIT_ROOT}/ch11/setup.py')

%run -i setup.py

You are working on a local system.
Files will be searched relative to "..".


## Python-Einstellungen laden

Allgemeine Importe, Standardwerte für die Formatierung in Matplotlib, Pandas usw.

In [2]:
# Pfad zum Importieren der Blueprint-Packages
sys.path.append(BASE_DIR + '/packages')

import pandas as pd
from sklearn import preprocessing
import nltk
nltk.download('opinion_lexicon')

[nltk_data] Downloading package opinion_lexicon to
[nltk_data]     C:\Users\kleme\AppData\Roaming\nltk_data...
[nltk_data]   Package opinion_lexicon is already up-to-date!


True

# Sentiment-Analyse

## Einführung in den Amazon-Kundenrezensionsdatensatz

In [3]:
file = "reviews_5_balanced.json.gz"
file = f"{BASE_DIR}/data/amazon-product-reviews/reviews_5_balanced.json.gz" ### real location
df = pd.read_json(file, lines=True)
df = df.drop(columns=['reviewTime','unixReviewTime']) ###
df = df.rename(columns={'reviewText': 'text'}) ###
df.sample(5, random_state=12)

Unnamed: 0,overall,verified,reviewerID,asin,text,summary
163807,5,False,A2A8GHFXUG1B28,B0045Z4JAI,Good Decaf... it has a good flavour for a deca...,Nice!
195640,5,True,A1VU337W6PKAR3,B00K0TIC56,I could not ask for a better system for my sma...,I could not ask for a better system for my sma...
167820,4,True,A1Z5TT1BBSDLRM,B0012ORBT6,good product at a good price and saves a trip ...,Four Stars
104268,1,False,A4PRXX2G8900X,B005SPI45U,I like the principle of a raw chip - something...,No better alternatives but still tastes bad.
51961,1,True,AYETYLNYDIS2S,B00D1HLUP8,"Fake China knockoff, you get what you pay for.",Definitely not OEM


# Blueprint 1: Sentiment-Analyse mit lexikonbasierten Ansätzen

## Bing Liu Lexikon

In [4]:
from nltk.corpus import opinion_lexicon
from nltk.tokenize import word_tokenize

print('Total number of words in opinion lexicon', len(opinion_lexicon.words()))
print('Examples of positive words in opinion lexicon',
      opinion_lexicon.positive()[:5])
print('Examples of negative words in opinion lexicon',
      opinion_lexicon.negative()[:5])

Total number of words in opinion lexicon 6789
Examples of positive words in opinion lexicon ['a+', 'abound', 'abounds', 'abundance', 'abundant']
Examples of negative words in opinion lexicon ['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable']


In [5]:
# Erstellung eines Wörterbuchs für die Bewertung unseres Rezensionstextes
# Dieser erste BEfehl braucht nur beim ersten Aufruf durchgeführt werden
nltk.download('punkt') ###
df.rename(columns={"reviewText": "text"}, inplace=True)
pos_score = 1
neg_score = -1
word_dict = {}

# Hinzufügen der positiven Wörter zum Wörterbuch
for word in opinion_lexicon.positive():
        word_dict[word] = pos_score
        
# Hinzufügen der negativen Wörter zum Wörterbuch
for word in opinion_lexicon.negative():
        word_dict[word] = neg_score
        
def bing_liu_score(text):
    sentiment_score = 0
    bag_of_words = word_tokenize(text.lower())
    for word in bag_of_words:
        if word in word_dict:
            sentiment_score += word_dict[word]
    return sentiment_score / len(bag_of_words)

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\kleme\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [6]:
df['Bing_Liu_Score'] = df['text'].apply(bing_liu_score)
df[['asin','text','Bing_Liu_Score']].sample(2, random_state=0)

Unnamed: 0,asin,text,Bing_Liu_Score
188097,B00099QWOU,As expected,0.0
184654,B000RW1XO8,Works as designed...,0.25


In [7]:
df['Bing_Liu_Score'] = preprocessing.scale(df['Bing_Liu_Score'])
df.groupby('overall').agg({'Bing_Liu_Score':'mean'})

Unnamed: 0_level_0,Bing_Liu_Score
overall,Unnamed: 1_level_1
1,-0.587784
2,-0.427183
4,0.345291
5,0.529736


# Ansätze des überwachten Lernens

## Aufbereitung von Daten für einen überwachten Lernansatz

In [8]:
pd.set_option('display.max_rows', None)  ###
pd.set_option('display.max_columns', None)  ###
pd.set_option('display.width', None)  ###
pd.set_option('display.max_colwidth', None)  ###

file = "reviews_5_balanced.json.gz"
file = f"{BASE_DIR}/data/amazon-product-reviews/reviews_5_balanced.json.gz" 
df = pd.read_json(file, lines=True)
df = df.rename(columns={'reviewText': 'text'})  ###

# Zuweisung eines neuen Zielklassen Lables [1,0] auf der Grundlage der Produktbewertung
df['sentiment'] = 0
df.loc[df['overall'] > 3, 'sentiment'] = 1
df.loc[df['overall'] < 3, 'sentiment'] = 0

# Unnötige Spalten entfernen, um einen einfachen Data Frame zu erhalten 
df.drop(columns=[
    'reviewTime', 'unixReviewTime', 'overall', 'reviewerID', 'summary'],
        inplace=True)
df.sample(3)

Unnamed: 0,verified,asin,text,sentiment
50820,True,B00BUIG6OK,Nasty.,0
55604,True,B00HC2EY9W,The stickiness does not hold as well on the recent purchase as the first ones I bought.,0
252495,True,B000NW4PJC,Easy to install and great quality.,1


# Blueprint 2: Vektorisierung von Textdaten und Anwendung von überwachtem Lernen

## Schritt 1: Datenvorbereitung

In [9]:
from blueprints.preparation import clean
df['text_orig'] = df['text'].copy()
df['text'] = df['text'].apply(clean)

In [10]:
# Durchführung der Tokenisierung und Lemmatisierung durch Wiederverwendung des Blueprints aus Kapitel 4 
# Dies kann aufgrund des Umfangs des Datensatzes länger dauern
import textacy
import spacy
from spacy.lang.en import STOP_WORDS as stop_words
nlp = spacy.load('en_core_web_sm')

def extract_lemmas(doc, **kwargs):
    return [t.lemma_ for t in textacy.extract.words(doc,
                                                    filter_stops = False,
                                                    filter_punct = True,
                                                    filter_nums = True,
                                                    include_pos = ['ADJ', 'NOUN', 'VERB', 'ADV'],
                                                    exclude_pos = None,
                                                    min_freq = 1)]

def clean_text(text):
    doc = nlp(text)
    lemmas = extract_lemmas(doc)
    return ' '.join(lemmas)

In [11]:
# Alternative Methode, die Wordnet POS-Tags anstelle von SpaCy verwendet - kann bei ähnlicher Genauigkeit schneller ablaufen
# Tokenisierung und Lemmatisierung unter Verwendung von wordnet. Wiederverwendung von Teilen des Blueprints aus Kapitel 4
# Verwendet wordnet POS-Tags anstelle von spaCy
# den Wert, der dem POS-Tag entspricht, des wordnet-Objekts zurückgeben
from nltk.corpus import wordnet

def get_wordnet_pos(pos_tag):
    if pos_tag.startswith('J'):
        return wordnet.ADJ
    elif pos_tag.startswith('V'):
        return wordnet.VERB
    elif pos_tag.startswith('N'):
        return wordnet.NOUN
    elif pos_tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN
    
import string
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.tokenize import WhitespaceTokenizer
from nltk.stem import WordNetLemmatizer
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('omw-1.4')

def clean_text(text):
    # Text in Kleinbuchstaben umwandlenlower text
    text = text.lower()
    # Text tokenisieren und Satzzeichen entfernen
    text = [word.strip(string.punctuation) for word in text.split(" ")]
    # Wörter, die Zahlen enthalten, entferne
    text = [word for word in text if not any(c.isdigit() for c in word)]
    # Stoppwörter entfernen
    stop = stopwords.words('english')
    text = [x for x in text if x not in stop]
    # leere Token entfernen
    text = [t for t in text if len(t) > 0]
    # POS-Tags rezeugen
    pos_tags = pos_tag(text)
    # Text lemmatisieren
    text = [WordNetLemmatizer().lemmatize(t[0], get_wordnet_pos(t[1])) for t in pos_tags]
    # Wörter mit nur einem Buchstaben entfernen
    text = [t for t in text if len(t) > 1]
    # alle verbinden
    text = " ".join(text)
    return(text)

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\kleme\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\kleme\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\kleme\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [12]:
### Achtung! Dauert lange!

df["text"] = df["text"].apply(clean_text)

## Entfernung aller Annotationen, die nach dem Reinigungsschritt leer sind
df = df[df['text'].str.len() != 0]

## Schritt 2: Aufteilung in Trainings- und Testdaten

In [13]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(df['text'],
                                                    df['sentiment'],
                                                    test_size=0.2,
                                                    random_state=42,
                                                    stratify=df['sentiment'])

print ('Size of Training Data ', X_train.shape[0])
print ('Size of Test Data ', X_test.shape[0])

print ('Distribution of classes in Training Data :')
print ('Positive Sentiment ', str(sum(Y_train == 1)/ len(Y_train) * 100.0))
print ('Negative Sentiment ', str(sum(Y_train == 0)/ len(Y_train) * 100.0))

print ('Distribution of classes in Testing Data :')
print ('Positive Sentiment ', str(sum(Y_test == 1)/ len(Y_test) * 100.0))
print ('Negative Sentiment ', str(sum(Y_test == 0)/ len(Y_test) * 100.0))

Size of Training Data  234108
Size of Test Data  58527
Distribution of classes in Training Data :
Positive Sentiment  50.90770071932612
Negative Sentiment  49.09229928067388
Distribution of classes in Testing Data :
Positive Sentiment  50.9081278726058
Negative Sentiment  49.09187212739419


## Schritt 3: Vektorisierung von Text

In [14]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(min_df = 10, ngram_range=(1,1))
X_train_tf = tfidf.fit_transform(X_train)
X_test_tf = tfidf.transform(X_test)

## Schritt 4: Training des Machine Learning-Modells

In [15]:
from sklearn.svm import LinearSVC

model1 = LinearSVC(random_state=42, tol=1e-5)
model1.fit(X_train_tf, Y_train)

LinearSVC(random_state=42, tol=1e-05)

In [16]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score

Y_pred = model1.predict(X_test_tf)
print ('Accuracy Score - ', accuracy_score(Y_test, Y_pred))
print ('ROC-AUC Score - ', roc_auc_score(Y_test, Y_pred))

Accuracy Score -  0.8658396979172006
ROC-AUC Score -  0.8660667427476778


In [17]:
sample_reviews = df.sample(5, random_state=22)
sample_reviews_tf = tfidf.transform(sample_reviews['text'])
sentiment_predictions = model1.predict(sample_reviews_tf)
sentiment_predictions = pd.DataFrame(data = sentiment_predictions,
                                     index=sample_reviews.index,
                                     columns=['sentiment_prediction'])
sample_reviews = pd.concat([sample_reviews, sentiment_predictions], axis=1)
print ('Some sample reviews with their sentiment - ')
sample_reviews[['text_orig','sentiment_prediction']]

Some sample reviews with their sentiment - 


Unnamed: 0,text_orig,sentiment_prediction
29500,"Its a nice night light, but not much else apparently!",1
98387,"Way to small, do not know what to do with them or how to use them",0
113648,"Didn't make the room ""blue"" enough - returned with no questions asked",0
281527,Excellent,1
233713,fit like oem and looks good,1


In [18]:
def baseline_scorer(text):
    score = bing_liu_score(text)
    if score > 0:
        return 1
    else:
        return 0
    
Y_pred_baseline = X_test.apply(baseline_scorer)
acc_score = accuracy_score(Y_pred_baseline, Y_test)
print (acc_score)

0.7525073897517385


## Speichern des trainierten Modells und des Vektorisierers zur späteren Verwendung mit der API

In [19]:
import pickle

pickle.dump(model1, open('models/sentiment_classification.pickle','wb'))
pickle.dump(tfidf, open('models/sentiment_vectorizer.pickle','wb'))

# Vorgefertigte Sprachmodelle mit Deep Learning (wieder-)verwenden

# Blueprint 3: Transfer-Learning-Techniken und vorab trainiertes Sprachmodell verwenden

In [20]:
# Dies ist ein optionaler Schritt, um den Umfang der Daten zu reduzieren, indem nur 40 % der Beobachtungen in die 
# Stichprobe aufgenommen werden. ACHTUNG!Eine größere Anzahl von Beobachtungen kann zu einer längeren Laufzeit und 
# zum automatischen Herunterfahren der Colab Free-Instanz führen.
df = df.sample(frac=0.4, random_state=42)

## Schritt 1: Laden von Modellen und Tokenisierung

In [21]:
from transformers import BertConfig, BertTokenizer
from transformers import BertForSequenceClassification

config = BertConfig.from_pretrained('bert-base-uncased', finetuning_task='binary')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

  return torch._C._cuda_getDeviceCount() > 0
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were

In [22]:
# Es gibt eine Änderung im Verhalten der Abtrennung beim Aufruf der Encode-Funktion.
# Dies führt zu einer Warnung und das Verhalten wird sich wahrscheinlich in Zukunft ändern
# Derzeit wird die Warnung wie beschrieben unterdrückt - https://github.com/huggingface/transformers/issues/5397
import warnings; ###
warnings.filterwarnings('ignore'); ###

def get_tokens(text, tokenizer, max_seq_length, add_special_tokens=True):
  input_ids = tokenizer.encode(text, 
                               add_special_tokens=add_special_tokens, 
                               max_length=max_seq_length,
                               pad_to_max_length=True)
  attention_mask = [int(id > 0) for id in input_ids]
  assert len(input_ids) == max_seq_length
  assert len(attention_mask) == max_seq_length
  return (input_ids, attention_mask)

text = "Here is the sentence I want embeddings for."
input_ids, attention_mask = get_tokens(text, 
                                       tokenizer, 
                                       max_seq_length=30, 
                                       add_special_tokens = True)
input_tokens = tokenizer.convert_ids_to_tokens(input_ids)
print (text)
print (input_tokens)
print (input_ids)
print (attention_mask)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Here is the sentence I want embeddings for.
['[CLS]', 'here', 'is', 'the', 'sentence', 'i', 'want', 'em', '##bed', '##ding', '##s', 'for', '.', '[SEP]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]']
[101, 2182, 2003, 1996, 6251, 1045, 2215, 7861, 8270, 4667, 2015, 2005, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


In [23]:
X_train, X_test, Y_train, Y_test = train_test_split(df['text_orig'],
                                                    df['sentiment'],
                                                    test_size=0.2,
                                                    random_state=42,
                                                    stratify=df['sentiment'])
X_train_tokens = X_train.apply(get_tokens, args=(tokenizer, 50))
X_test_tokens = X_test.apply(get_tokens, args=(tokenizer, 50))

In [24]:
import torch
from torch.utils.data import TensorDataset

input_ids_train = torch.tensor(
    [features[0] for features in X_train_tokens.values], dtype=torch.long)
input_mask_train = torch.tensor(
    [features[1] for features in X_train_tokens.values], dtype=torch.long)
label_ids_train = torch.tensor(Y_train.values, dtype=torch.long)

print (input_ids_train.shape)
print (input_mask_train.shape)
print (label_ids_train.shape)

torch.Size([93643, 50])
torch.Size([93643, 50])
torch.Size([93643])


In [25]:
input_ids_train[2]

tensor([  101, 10140,  2021,  2074,  2205,  2235,  2130,  2005, 10514,  9468,
        27581,  2015,  1012,   102,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0])

In [26]:
train_dataset = TensorDataset(input_ids_train,input_mask_train,label_ids_train)

In [27]:
input_ids_test = torch.tensor([features[0] for features in X_test_tokens.values], 
                              dtype=torch.long)
input_mask_test = torch.tensor([features[1] for features in X_test_tokens.values], 
                               dtype=torch.long)
label_ids_test = torch.tensor(Y_test.values, 
                              dtype=torch.long)
test_dataset = TensorDataset(input_ids_test, input_mask_test, label_ids_test)

## Schritt 2: Trainierne des Modells

In [28]:
from torch.utils.data import DataLoader, RandomSampler

train_batch_size = 64
num_train_epochs = 2

train_sampler = RandomSampler(train_dataset)
train_dataloader = DataLoader(train_dataset, 
                              sampler=train_sampler, 
                              batch_size=train_batch_size)
t_total = len(train_dataloader) // num_train_epochs

print ("Num examples = ", len(train_dataset))
print ("Num Epochs = ", num_train_epochs)
print ("Total train batch size  = ", train_batch_size)
print ("Total optimization steps = ", t_total)

Num examples =  93643
Num Epochs =  2
Total train batch size  =  64
Total optimization steps =  732


In [29]:
from transformers import AdamW, get_linear_schedule_with_warmup

learning_rate = 1e-4
adam_epsilon = 1e-8
warmup_steps = 0

optimizer = AdamW(model.parameters(), lr=learning_rate, eps=adam_epsilon)
scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps=warmup_steps, 
                                            num_training_steps=t_total)

In [30]:
### Achtung!!! Dauert sehr, sehr lange!!!


from tqdm import trange, notebook

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_iterator = trange(num_train_epochs, desc="Epoch")

## Modell in den Modus 'trainieren' versetzen
model.train()
    
for epoch in train_iterator:
    epoch_iterator = notebook.tqdm(train_dataloader, desc="Iteration")
    for step, batch in enumerate(epoch_iterator):

        ## Alle Gradienten zu Beginn jeder Iteration zurücksetzen
        model.zero_grad()
        
        ## HINWEIS zur Beschleunigung: Setzen Sie das Modell und die eingegebenen Beobachtungen auf die GPU
        model.to(device)
        batch = tuple(t.to(device) for t in batch)
        
        ## Identifizieren Sie die Inputs für das Modell
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2]}

        ## Vorwärtsdurchlauf durch das Modell. Eingabe -> Modell -> Ausgabe
        outputs = model(**inputs)

        ## Bestimmen Sie die Abweichung (Verlust)
        loss = outputs[0]
        print("\r%f" % loss, end='')

        ## Rückproportionierung des Verlustes (automatische Berechnung von Gradienten)
        loss.backward()

        ## Verhinderung explodierender Gradienten durch Begrenzung der Gradienten auf 1,0 
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        ## Aktualisierung der Parameter und der Lernrate
        optimizer.step()
        scheduler.step()

Epoch:   0%|          | 0/2 [00:00<?, ?it/s]

Iteration:   0%|          | 0/1464 [00:00<?, ?it/s]

0.684133

Epoch:  50%|█████     | 1/2 [3:27:27<3:27:27, 12447.11s/it]

Iteration:   0%|          | 0/1464 [00:00<?, ?it/s]

0.026598

Epoch: 100%|██████████| 2/2 [6:47:43<00:00, 12231.68s/it]  


In [31]:
model.save_pretrained('outputs')

## Schritt 3: Modellbewertung


In [32]:
import numpy as np
from torch.utils.data import SequentialSampler

test_batch_size = 64
test_sampler = SequentialSampler(test_dataset)
test_dataloader = DataLoader(test_dataset, 
                             sampler=test_sampler, 
                             batch_size=test_batch_size)

# Laden Sie das zuvor gespeicherte, trainierte Modell 
model = model.from_pretrained('outputs') ###

# Initialisierung der Vorhersage und der tatsächlichen Kennzeichnungen
preds = None
out_label_ids = None

## Modell in den "eval"-Modus versetzen
model.eval()

for batch in notebook.tqdm(test_dataloader, desc="Evaluating"):
    
    ## Setzen Sie das Modell und die eingegebenen Beobachtungen auf die GPU
    model.to(device)
    batch = tuple(t.to(device) for t in batch)
    
    ## Keine Gradienten verfolgen, da im 'eval'-Modus
    with torch.no_grad():
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2]}

        ## Vorwärtsdurchlauf durch das Modell
        outputs = model(**inputs)

        ## Wir erhalten Verlust, da wir die Labels bereitgestellt haben
        tmp_eval_loss, logits = outputs[:2]

        ## Der Testdatensatz enthält möglicherweise mehr als eine Batch von Artikeln.
        if preds is None:
            preds = logits.detach().cpu().numpy()
            out_label_ids = inputs['labels'].detach().cpu().numpy()
        else:
            preds = np.append(preds, logits.detach().cpu().numpy(), axis=0)
            out_label_ids = np.append(out_label_ids, 
                                      inputs['labels'].detach().cpu().numpy(), 
                                      axis=0)
    
## Endgültiger Verlust, Vorhersagen und Genauigkeit
preds = np.argmax(preds, axis=1)
acc_score = accuracy_score(preds, out_label_ids)
print ('Accuracy Score on Test data ', acc_score)

Evaluating:   0%|          | 0/366 [00:00<?, ?it/s]

Accuracy Score on Test data  0.9483148947076161
