# Notebook: Predict Complete Dataset

This notebook predicts the sentiment of every tweet in the complete dataset. The results are written to `PREDICTION_DIRECTORY_PATH` as one CSV file per account, grouped by party.
<br>**Contributors:** [Nils Hellwig](https://github.com/NilsHellwig/) | [Markus Bink](https://github.com/MarkusBink/)

## Packages

In [None]:
from simpletransformers.classification import ClassificationModel
import pandas as pd
import os

## Parameters

In [None]:
MODEL_TYPE = "bert"
MODEL_PATH = "../Trainings/Models/GermEval_and_Annotaded_it_0"
DATASET_PATH = "../Datasets/dataset/"
PREDICTION_DIRECTORY_PATH = "../Datasets/complete_dataset_predictions/"
USE_CUDA = False
PARTIES = ["CDU_CSU", "SPD", "AFD", "FDP", "GRUENE", "LINKE"]
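
Before a long prediction run, it can help to confirm that every entry in `PARTIES` has a folder under `DATASET_PATH`. The following is a minimal sketch: `missing_party_dirs` is a hypothetical helper that is not part of this notebook, and the temporary directory only stands in for the real dataset folder.

```python
import os
import tempfile


def missing_party_dirs(dataset_path, parties):
    """Return the parties that have no folder directly under dataset_path."""
    return [p for p in parties if not os.path.isdir(os.path.join(dataset_path, p))]


# Demonstration on a throwaway directory instead of the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "SPD"))
    missing = missing_party_dirs(tmp, ["SPD", "FDP"])
```

In the notebook itself, the call would presumably be `missing_party_dirs(DATASET_PATH, PARTIES)`; an empty list means all party folders are present.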

## Code

### 1. Create Directories for Predictions

In [None]:
# Create one prediction directory per party;
# exist_ok=True makes this a no-op if the directory already exists
for party in PARTIES:
    os.makedirs(PREDICTION_DIRECTORY_PATH + party, exist_ok=True)

### 2. Load Model

In [None]:
model = ClassificationModel(model_type=MODEL_TYPE, model_name=MODEL_PATH, use_cuda=USE_CUDA)

Test Model

In [None]:
# Two example tweets (one positive, one negative) as a quick sanity check
test_tweets = [
    "Das war ein super Tag in Köln! Vielen Dank euch alle, das war super!",
    "Diese scheiß Politik! Merkel die Hexe muss weg!",
]
predictions, raw_outputs = model.predict(test_tweets)

In [None]:
predictions

### 3. Predict Sentiment of Dataset

In [None]:
for party in PARTIES:
    for subdir, _, files in os.walk(DATASET_PATH + party):
        for file in files:
            if file.endswith('.csv') and subdir[len(DATASET_PATH):] in PARTIES:
                # Get username of CSV file
                username = file[:-4]
                
                print(f'Current account: {username}')
                
                # Read CSV file as pandas dataframe
                df = pd.read_csv(DATASET_PATH + party + "/" + file)
                
                # Use model to predict the sentiment of each tweet
                sentiment_predictions, raw_outputs = model.predict(df["tweet"].tolist())
                
                # Combine sentiment predictions and original dataframe and remove every column except the id and pred column
                df_pred = pd.concat([df, pd.DataFrame({'pred': sentiment_predictions})], axis=1).reset_index(drop=True).loc[:, ['id', 'pred']]
                
                # Save modified dataframe to a CSV file in the prediction directory
                df_pred.to_csv(PREDICTION_DIRECTORY_PATH + party + "/" + username + ".csv")
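
The loop above can be tried out without loading the model if the predictor is passed in as a function. The following is a minimal sketch under stated assumptions: `predict_party_files` and `dummy_predict` are hypothetical names, `dummy_predict` is a placeholder standing in for `model.predict` (it labels every text `0`), and the standard-library `csv` module replaces pandas so that the example stays self-contained.

```python
import csv
import tempfile
from pathlib import Path


def predict_party_files(dataset_dir, output_dir, parties, predict_fn):
    """Write one id/pred CSV per account file found directly under each party folder.

    predict_fn is a hypothetical stand-in for model.predict: it takes a list of
    texts and returns a tuple (predictions, raw_outputs).
    """
    written = []
    for party in parties:
        party_dir = Path(dataset_dir) / party
        target_dir = Path(output_dir) / party
        target_dir.mkdir(parents=True, exist_ok=True)
        for csv_path in sorted(party_dir.glob("*.csv")):
            with open(csv_path, newline="", encoding="utf-8") as handle:
                rows = list(csv.DictReader(handle))
            if not rows:
                continue  # nothing to predict for an empty file
            predictions, _ = predict_fn([row["tweet"] for row in rows])
            out_path = target_dir / csv_path.name
            with open(out_path, "w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                writer.writerow(["id", "pred"])
                for row, pred in zip(rows, predictions):
                    writer.writerow([row["id"], pred])
            written.append(out_path)
    return written


def dummy_predict(texts):
    # Placeholder predictor: labels every text 0. This is NOT the real model.
    return [0 for _ in texts], [[0.0] for _ in texts]


# Demonstration on a throwaway directory with one made-up account file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "dataset"
    (source / "SPD").mkdir(parents=True)
    (source / "SPD" / "example_account.csv").write_text(
        "id,tweet\n1,Hallo Welt\n2,Guten Tag\n", encoding="utf-8"
    )
    written_files = predict_party_files(
        source, Path(tmp) / "predictions", ["SPD"], dummy_predict
    )
    result_text = written_files[0].read_text(encoding="utf-8")
```

Unlike the `to_csv` call in the notebook, which also writes the dataframe index as an unnamed first column, this sketch writes only the `id` and `pred` columns. With the real model, `predict_fn` would presumably be `model.predict`.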