# Translate and back-translate the training comments to create augmented data

### Idea: 
So far, the training data contains only English texts, while the test data contains only non-English texts. Adding training texts in the same languages as the test data might therefore help the model.

We will create augmented training data by translating the original English texts into each of the six languages in the test data, and then translating those translations back into English (back-translation).

The generated texts are stored in separate columns, one per language and per direction (translation or back-translation).

For subsequent training, we can select individual combinations of augmented columns to find out which works best.
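
One possible way to turn a chosen combination of columns into training rows is to stack each selected text column under the same labels. The sketch below assumes the column names used in this notebook (`comment_text`, `trans_comment-{lang}`, `back-trans_comment-{lang}`) and the six Jigsaw label columns printed further below; the helper `build_training_set` and the stacking approach are hypothetical illustrations, not the project's confirmed training setup.

```python
import pandas as pd

# Label columns of the Jigsaw training data (as printed further below)
LABELS = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def build_training_set(df, langs, use_trans=True, use_back_trans=True, keep_original=True):
    """Hypothetical helper: stack the chosen text columns into one long DataFrame.

    Each selected column becomes its own block of rows, and every block keeps
    the labels of the original English comment.
    """
    text_cols = ['comment_text'] if keep_original else []
    for lang in langs:
        if use_trans:
            text_cols.append(f'trans_comment-{lang}')
        if use_back_trans:
            text_cols.append(f'back-trans_comment-{lang}')

    parts = []
    for col in text_cols:
        part = df[['id'] + LABELS].copy()
        part['comment_text'] = df[col]
        part['source'] = col  # remember which column each text came from
        parts.append(part)

    # Drop rows whose text is missing (e.g. a translation that was never filled in)
    return pd.concat(parts, ignore_index=True).dropna(subset=['comment_text'])
```

For example, `build_training_set(toxic_comments, ['tr', 'ru'], use_back_trans=False)` would keep the English originals plus the Turkish and Russian translations. Stacking rows is only one design choice; another would be to sample one augmented text per comment to keep the dataset size unchanged.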


### Import libraries

In [None]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)


### Get datasets

In [None]:
# Connect to Google Drive to access the data files

from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


Because the translation process is time-consuming and error-prone, we split the training data into 22 chunks of equal length and ran the translation and back-translation on one chunk at a time. For subsequent training, the chunks were joined again.
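
The splitting code itself is not part of this notebook. A minimal sketch of how such a split and re-join could look, assuming plain positional chunks; the helper names `split_into_chunks` and `join_chunks` are hypothetical:

```python
import pandas as pd

def split_into_chunks(df, n_chunks=22):
    """Split df into n_chunks consecutive pieces whose lengths differ by at most one row."""
    n = len(df)
    # Integer boundaries 0, n/k, 2n/k, ..., n (rounded down) avoid floating-point rounding issues
    bounds = [i * n // n_chunks for i in range(n_chunks + 1)]
    return [df.iloc[start:end] for start, end in zip(bounds[:-1], bounds[1:])]

def join_chunks(chunks):
    """Concatenate the processed chunks back into one DataFrame (the original index is kept)."""
    return pd.concat(chunks)
```

When the number of rows is a multiple of 22, all chunks have exactly the same length; otherwise some chunks are one row longer than others.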

In [None]:
# Load test dataset
test = pd.read_csv('/content/drive/MyDrive/JigsawProject/jigsaw-multilingual-toxic-comment-classification/test.csv')

# Load a chunk of the training dataset
df = pd.read_csv('/content/drive/MyDrive/JigsawProject/jigsaw-multilingual-toxic-comment-classification/df_tox_chunk_12.csv')


## Prepare Google Translator

In [None]:
!pip install deep-translator

Collecting deep-translator
  Downloading deep_translator-1.11.4-py3-none-any.whl.metadata (30 kB)
Downloading deep_translator-1.11.4-py3-none-any.whl (42 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.3/42.3 kB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: deep-translator
Successfully installed deep-translator-1.11.4
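
Since requests to the Google Translate web service can fail intermittently, one option is to wrap each call in a small retry loop with exponential backoff. This is a sketch under assumptions: the helper `with_retries` is hypothetical, it catches every `Exception` because the exact exception types raised by deep-translator are not assumed here, and the notebook below does not use it.

```python
import time

def with_retries(func, *args, retries=3, wait=2.0, **kwargs):
    """Hypothetical helper: call func(*args, **kwargs) and retry on any exception.

    The pause doubles after each failed attempt (exponential backoff).
    `retries` must be at least 1; after the last failed attempt the
    exception is re-raised.
    """
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(wait * 2 ** attempt)
```

Usage could look like `with_retries(GoogleTranslator(source='en', target='tr').translate_batch, comments)`. In practice, narrowing the `except` clause to network-related errors avoids retrying on genuine bugs.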


In [None]:
# Work on a copy so that the loaded chunk `df` remains unchanged
toxic_comments = df.copy()

# Print new dataset
print(toxic_comments.head())

                 id                                       comment_text  toxic  \
0  531e7faee73602a8  Usman Naveed \n\nhello usman you are a idiot a...      1   
1  5331dc6691fb5d6a  also i want sex with you S meh call this numbe...      1   
2  533a8b15b43256b0                Get a life you useless child rapist      1   
3  5348e52a99e31de7  hey will how bout u suck my dick D! try to fck...      1   
4  534bd27881491505  YOU FUCKING SUCK I HATE YOUR FUCKING SITE IT S...      1   

   severe_toxic  obscene  threat  insult  identity_hate  trans_comment-tr  \
0             0        0       0       1              0               NaN   
1             0        0       0       0              0               NaN   
2             0        0       0       0              0               NaN   
3             0        1       0       1              0               NaN   
4             1        1       0       1              0               NaN   

   back-trans_comment-tr  trans_comment-ru  back-t

In [None]:
# Print shape of the new dataset
print(toxic_comments.shape)

(972, 20)
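
The 20 columns are the 8 original ones (`id`, `comment_text` and six labels) plus a translation and a back-translation column for each of the six test languages. Because `DataFrame.update` only writes into columns that already exist, a chunk without these columns would silently stay unchanged. A minimal sketch that creates them as empty columns first; the helper name `add_augmentation_columns` is hypothetical:

```python
import numpy as np
import pandas as pd

def add_augmentation_columns(df, langs):
    """Hypothetical helper: return a copy of df with an empty translation and
    back-translation column for every language that does not have one yet."""
    df = df.copy()
    for lang in langs:
        for prefix in ('trans_comment', 'back-trans_comment'):
            col = f'{prefix}-{lang}'
            if col not in df.columns:
                # object dtype so that strings can be written into the column later
                df[col] = pd.Series(np.nan, index=df.index, dtype=object)
    return df
```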


In [None]:
# Import libraries
from deep_translator import GoogleTranslator
import time

# Languages of the test data (tr, ru, it, fr, pt, es)
# Assumes test.csv has a 'lang' column, as in the Jigsaw multilingual test set
unique_langs_list = test['lang'].unique().tolist()

# Define functions for translating and back-translating in batches

def translate_comment_batch(comments, lang):
    # translate_batch translates each comment in the list from English to `lang`
    return GoogleTranslator(source='en', target=lang).translate_batch(comments)

def back_translate_comment_batch(translated_comments, lang):
    # Translate the translated comments from `lang` back to English
    return GoogleTranslator(source=lang, target='en').translate_batch(translated_comments)

def translate_and_back_translate_batch(comments, langs):
    results = {}
    for lang in langs:
        print("x\nx\nLANGUAGE", lang, "x\nx\nx")
        # Translate batch for the given language
        translated_comments = translate_comment_batch(comments, lang)
        # Back-translate batch for the given language
        back_translated_comments = back_translate_comment_batch(translated_comments, lang)
        results[lang] = (translated_comments, back_translated_comments)
    return results

def batch_process_comments(df, langs, batch_size=50, delay=0):
    # Split the DataFrame into batches; .copy() makes each batch an independent
    # DataFrame, so adding columns to it no longer triggers SettingWithCopyWarning
    batches = [df.iloc[i:i + batch_size].copy() for i in range(0, len(df), batch_size)]

    for i, batch in enumerate(batches, start=1):
        print("x\nx\nbatch nr", i, "x\nx\nx")
        # Gather the comments for the current batch
        comments = batch['comment_text'].tolist()

        # Translate and back-translate the batch, one language after another
        results = translate_and_back_translate_batch(comments, langs)

        # Store the translated and back-translated comments in the batch
        for lang in langs:
            translated_comments, back_translated_comments = results[lang]
            batch[f'trans_comment-{lang}'] = translated_comments
            batch[f'back-trans_comment-{lang}'] = back_translated_comments

        # Write the results back to the DataFrame
        # (update() only fills columns that already exist in df)
        df.update(batch)

        # Optional pause between batches to reduce the risk of rate limiting (0 = no pause)
        time.sleep(delay)

    return df

# Perform Translation and Back-Translation

toxic_comments = batch_process_comments(toxic_comments, unique_langs_list)

# Print result
print(toxic_comments)

x
x
batch nr 1 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  batch[f'trans_comment-{lang}'] = translated_comments
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  batch[f'back-trans_comment-{lang}'] = back_translated_comments

x
x
batch nr 2 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 3 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 4 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 5 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 6 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 7 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 8 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 9 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 10 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 11 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 12 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 13 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 14 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 15 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 16 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 17 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 18 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 19 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x



x
x
batch nr 20 x
x
x
x
x
LANGUAGE tr x
x
x
x
x
LANGUAGE ru x
x
x
x
x
LANGUAGE it x
x
x
x
x
LANGUAGE fr x
x
x
x
x
LANGUAGE pt x
x
x
x
x
LANGUAGE es x
x
x
                   id                                       comment_text  \
0    531e7faee73602a8  Usman Naveed \n\nhello usman you are a idiot a...   
1    5331dc6691fb5d6a  also i want sex with you S meh call this numbe...   
2    533a8b15b43256b0                Get a life you useless child rapist   
3    5348e52a99e31de7  hey will how bout u suck my dick D! try to fck...   
4    534bd27881491505  YOU FUCKING SUCK I HATE YOUR FUCKING SITE IT S...   
..                ...                                                ...   
967  89f1f2cf4dfd3dc4  you fat asshole, why don't you leave my page a...   
968  8a02b677955ce409  You pathetic small town fucktard. You have no ...   
969  8a12e921ed17b6ca  Please keep in mind that he is an idiot and ma...   
970  8a15bc67790faeb9  "\nIt's not that you ""didn't manage the prope...   
971  8a1d7


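`translate_and_back_translate_batch` processes the six languages one after another, so each batch waits for twelve sequential translation runs. A sketch of running the languages concurrently with one worker thread per language, under stated assumptions: the helper `translate_all_languages` is hypothetical, `translate(texts, source, target)` stands for any function returning translated texts in input order, and concurrent requests may hit the service's rate limits sooner.

```python
from concurrent.futures import ThreadPoolExecutor

def translate_all_languages(comments, langs, translate, max_workers=6):
    """Hypothetical helper: translate and back-translate `comments` for all
    languages concurrently; returns {lang: (translated, back_translated)}."""
    def one_language(lang):
        translated = translate(comments, 'en', lang)
        back_translated = translate(translated, lang, 'en')
        return lang, (translated, back_translated)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the order of `langs`, so the dict order is stable
        return dict(executor.map(one_language, langs))
```

The returned dictionary has the same shape as the one built by `translate_and_back_translate_batch`. With deep-translator, `translate` could be `lambda texts, src, tgt: GoogleTranslator(source=src, target=tgt).translate_batch(texts)`.
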
## Save the augmented data as .csv file

In [None]:
import os
from datetime import datetime

# Base path for saving the file
path = '/content/drive/MyDrive/JigsawProject/jigsaw-multilingual-toxic-comment-classification/'

# Define the chunk number for the name of the .csv file
chunk_id = "chunk-12"

# Get the current timestamp in YYYYMMDD_HHMMSS format.
# The timestamp in the file name prevents accidentally overwriting the .csv file of a previous chunk
# if chunk_id is not updated before running the code on a different chunk.
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

# Build the full file path string
file_path = os.path.join(path, f"{chunk_id}_{timestamp}.csv")

# Save the data frame
toxic_comments.to_csv(file_path, index=False)

# Print the file path for confirmation
print(f"File saved to: {file_path}")


File saved to: /content/drive/MyDrive/JigsawProject/jigsaw-multilingual-toxic-comment-classification/chunk-12_20250127_143614.csv
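
Because every run writes a new timestamped file, a chunk that was processed more than once ends up with several files. Before joining the chunks, one could keep only the newest file per chunk by parsing the names written by the cell above. This is a sketch assuming the `chunk-<number>_<YYYYMMDD>_<HHMMSS>.csv` pattern shown in the output; the helper `latest_file_per_chunk` is hypothetical.

```python
import re

# File names written by the cell above look like "chunk-12_20250127_143614.csv"
CHUNK_FILE = re.compile(r'^chunk-(\d+)_(\d{8}_\d{6})\.csv$')

def latest_file_per_chunk(file_names):
    """Hypothetical helper: map each chunk number to its most recent file name."""
    latest = {}
    for name in file_names:
        match = CHUNK_FILE.match(name)
        if match is None:
            continue  # ignore unrelated files such as test.csv
        chunk, stamp = int(match.group(1)), match.group(2)
        # YYYYMMDD_HHMMSS strings sort chronologically, so string comparison suffices
        if chunk not in latest or stamp > latest[chunk][0]:
            latest[chunk] = (stamp, name)
    return {chunk: name for chunk, (stamp, name) in sorted(latest.items())}
```

The selected files could then be read and joined with `pd.concat([pd.read_csv(os.path.join(path, f)) for f in latest_file_per_chunk(os.listdir(path)).values()], ignore_index=True)`.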
