## Translate the Sarcasm on Reddit dataset into Russian

In [1]:
import os
import warnings

import pandas as pd

from tqdm.auto import tqdm

In [2]:
warnings.filterwarnings("ignore")

## Google Translate

In [3]:
from deep_translator import GoogleTranslator

In [4]:
translator = GoogleTranslator(source='en', target='ru')
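If the same parent comment appears in several rows (for example, several replies to one post), the translation loop below sends each copy to the service separately. The following is a minimal sketch of a memoising wrapper built on `functools.lru_cache`. `make_cached` is a hypothetical helper, and `fake_translate` only stands in for `translator.translate` so the example runs offline. `lru_cache` does not cache exceptions, so a text whose translation failed is simply retried the next time it occurs.

```python
from functools import lru_cache
from typing import Callable, Optional


def make_cached(translate: Callable[[str], str], maxsize: Optional[int] = None) -> Callable[[str], str]:
    """Wrap `translate` so each distinct text is sent at most once while it stays cached."""

    @lru_cache(maxsize=maxsize)  # maxsize=None means an unbounded cache
    def cached(text: str) -> str:
        return translate(text)

    return cached


# Stand-in for translator.translate that records every real call.
calls = []


def fake_translate(text: str) -> str:
    calls.append(text)
    return text[::-1]


cached_translate = make_cached(fake_translate)
for text in ["same parent", "same parent", "other parent"]:
    cached_translate(text)
print(len(calls))  # 2: the repeated text was served from the cache
```

In the notebook this would amount to `cached_translate = make_cached(translator.translate)` and calling `cached_translate(...)` instead of `translator.translate(...)`.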

## Sarcasm on Reddit

https://www.kaggle.com/danofer/sarcasm

In [5]:
path_to_data = '../data/Sarcasm_on_Reddit'

In [6]:
df = pd.read_csv(os.path.join(path_to_data, 'train-balanced-sarcasm.csv'))

# To load the already translated file instead:
# df = pd.read_csv(os.path.join(path_to_data, 'rus-train-balanced-sarcasm.csv'))

In [7]:
df.head()

| | label | comment | author | subreddit | score | ups | downs | date | created_utc | parent_comment |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | NC and NH. | Trumpbart | politics | 2 | -1 | -1 | 2016-10 | 2016-10-16 23:55:23 | Yeah, I get that argument. At this point, I'd ... |
| 1 | 0 | You do know west teams play against west teams... | Shbshb906 | nba | -4 | -1 | -1 | 2016-11 | 2016-11-01 00:24:10 | The blazers and Mavericks (The wests 5 and 6 s... |
| 2 | 0 | They were underdogs earlier today, but since G... | Creepeth | nfl | 3 | 3 | 0 | 2016-09 | 2016-09-22 21:45:37 | They're favored to win. |
| 3 | 0 | This meme isn't funny none of the "new york ni... | icebrotha | BlackPeopleTwitter | -8 | -1 | -1 | 2016-10 | 2016-10-18 21:03:47 | deadass don't kill my buzz |
| 4 | 0 | I could use one of those tools. | cush2push | MaddenUltimateTeam | 6 | -1 | -1 | 2016-12 | 2016-12-30 17:00:13 | Yep can confirm I saw the tool they use for th... |


In [8]:
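# Fill missing values and cast to str so every cell is text;
# the resulting empty strings are dropped by the length filter below.
df['comment'] = df['comment'].fillna('').astype(str)
df['parent_comment'] = df['parent_comment'].fillna('').astype(str)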
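Missing comments are read from the CSV as `NaN`, a float. Depending on the pandas version, a plain cast to `str` either turns them into the literal string `'nan'` or keeps them missing, so filling them explicitly first makes the next step predictable. A minimal sketch on an invented toy frame; the number is there only to show that non-string cells become text as well:

```python
import numpy as np
import pandas as pd

# Invented toy data: a normal comment, a missing one, and a numeric cell.
toy = pd.DataFrame({"comment": ["A perfectly normal comment.", np.nan, 42]})

# Fill missing values first, then cast, so every cell ends up as a Python str.
toy["comment"] = toy["comment"].fillna("").astype(str)
print(toy["comment"].tolist())  # ['A perfectly normal comment.', '', '42']
```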

### Drop comments that are too short or too long

In [9]:
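# Keep rows where both texts have 11 to 2999 characters, i.e. 10 < len < 3000
# (Series.between is inclusive on both ends by default).
comment_ok = df['comment'].str.len().between(11, 2999)
parent_ok = df['parent_comment'].str.len().between(11, 2999)
df = df[comment_ok & parent_ok]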
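The lower bound explains why the first row of the raw data, `NC and NH.` (exactly 10 characters), is missing from the later `df.head()` output. The upper bound may be there to keep requests under the service limit: as far as I know, deep-translator's `GoogleTranslator` rejects texts of 5000 characters or more, but the notebook does not state the reason. A hedged sketch of a pre-check that reports how many rows each bound would remove; `report_length_filter` is a hypothetical helper and the long string is invented:

```python
import pandas as pd


def report_length_filter(s: pd.Series, min_len: int = 11, max_len: int = 2999) -> dict:
    """Count strings below, inside, and above the inclusive [min_len, max_len] range."""
    lengths = s.str.len()
    return {
        "too_short": int((lengths < min_len).sum()),
        "kept": int(lengths.between(min_len, max_len).sum()),
        "too_long": int((lengths > max_len).sum()),
    }


toy = pd.Series(["NC and NH.", "Lube up and take it?", "x" * 4000])
print(report_length_filter(toy))  # {'too_short': 1, 'kept': 1, 'too_long': 1}
```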

In [10]:
df.reset_index(inplace=True, drop=True)
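Filtering leaves gaps in the index labels; `reset_index(drop=True)` renumbers the rows from 0 and discards the old labels. After that, label `ind` and position `ind` address the same row, which is what lets the translation loop mix an `enumerate` counter with row lookups safely. A small sketch on invented data:

```python
import pandas as pd

toy = pd.DataFrame({"comment": ["short", "long enough comment", "another long comment"]})
filtered = toy[toy["comment"].str.len() > 10]
print(filtered.index.tolist())  # [1, 2]: row 0 was dropped, the labels keep the gap

filtered = filtered.reset_index(drop=True)
print(filtered.index.tolist())  # [0, 1]

# After the reset, label 0 and position 0 refer to the same row.
print(filtered.loc[0, "comment"] == filtered["comment"].iloc[0])  # True
```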

### Add columns for the translations

In [11]:
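# Start from copies of the English text: if a translation fails below,
# the row keeps its original wording instead of an empty cell.
df['rus_comment'] = df['comment']
df['rus_parent_comment'] = df['parent_comment']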
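Because the Russian columns start as copies of the English ones, rows whose translation failed can be found afterwards by comparing the two columns, for example to run a second pass over them. A hedged sketch; `untranslated_rows` is a hypothetical helper, and in the toy frame the second row merely pretends that its translation failed. A text that legitimately translates to itself (a name, a URL) would be flagged too.

```python
import pandas as pd


def untranslated_rows(df: pd.DataFrame, src: str, dst: str) -> pd.Index:
    """Return the index labels where the translation column still equals the source."""
    return df.index[df[src] == df[dst]]


toy = pd.DataFrame({
    "comment": ["Lube up and take it?", "Does the diamond cap come w/ Tags?"],
    "rus_comment": ["Смазать и взять?", "Does the diamond cap come w/ Tags?"],
})
print(untranslated_rows(toy, "comment", "rus_comment").tolist())  # [1]
```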

In [12]:
# Labels equal positions after reset_index(drop=True), so .loc/.at with `ind` is safe.
# To resume after an interruption at index i, iterate over df.index[i:] instead.
for ind in tqdm(df.index):
    try:
        # Write through .loc rather than chained df['rus_comment'].iloc[ind] = ...,
        # which can silently modify a copy instead of df.
        df.loc[ind, 'rus_comment'] = translator.translate(df.at[ind, 'comment'])
    except Exception as exc:
        # On failure the English placeholder stays in rus_comment.
        print(f"Problem with ind={ind}, comment: {df.at[ind, 'comment']} ({exc})")
    try:
        df.loc[ind, 'rus_parent_comment'] = translator.translate(df.at[ind, 'parent_comment'])
    except Exception as exc:
        print(f"Problem with ind={ind}, parent comment: {df.at[ind, 'parent_comment']} ({exc})")
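The loop only prints failures, so a transient error (rate limiting, a dropped connection) leaves the English text in place. A minimal retry-with-exponential-backoff sketch, assuming any callable `translate(text) -> str` such as `translator.translate`. `translate_with_retry` and its parameters are hypothetical, and `flaky` only simulates a service that fails twice before answering.

```python
import time
from typing import Callable


def translate_with_retry(
    translate: Callable[[str], str],
    text: str,
    retries: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Call translate(text), retrying up to `retries` extra times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return translate(text)
        except Exception:
            if attempt == retries:
                raise  # give up and let the caller keep the English placeholder
            time.sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the defaults
    raise ValueError("retries must be non-negative")


calls = {"n": 0}


def flaky(text: str) -> str:
    """Stand-in translator that fails on its first two calls."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network error")
    return text.upper()


print(translate_with_retry(flaky, "hello", base_delay=0.01))  # HELLO
```

Inside the loop, `translator.translate(df.at[ind, 'comment'])` would become `translate_with_retry(translator.translate, df.at[ind, 'comment'])`; the surrounding `try`/`except` still catches the final failure.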

### Sample translations

In [13]:
ind = 33
print(df.comment.iloc[ind])
print(df.rus_comment.iloc[ind])

Lube up and take it?
Смазать и взять?


In [14]:
ind = 76
print(df.comment.iloc[ind])
print(df.rus_comment.iloc[ind])

I haven't tried henna.... Mainly because it's so permanent, I don't think I'm up for that kind of commitment haha.
Я не пробовала хну ... В основном потому, что она такая стойкая, не думаю, что я готов к таким обязательствам, ха-ха.


In [15]:
ind = 66
print(df.comment.iloc[ind])
print(df.rus_comment.iloc[ind])

Oh, I never realized it was so easy, why had I, and every other lonely person on earth never thought of that before?
О, я никогда не понимал, что это так просто, почему я и все одинокие люди на земле никогда не думали об этом раньше?


In [16]:
ind = 160
print(df.comment.iloc[ind])
print(df.rus_comment.iloc[ind])

Does the diamond cap come w/ Tags?
Бриллиантовая крышка поставляется с бирками?
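Checking hand-picked indices works, but a seeded random sample gives a less biased spot check that can still be reproduced later. A small sketch; `sample_pairs` is a hypothetical helper, and the toy frame reuses two of the pairs printed above.

```python
import pandas as pd


def sample_pairs(df: pd.DataFrame, src: str = "comment", dst: str = "rus_comment",
                 n: int = 5, seed: int = 0) -> pd.DataFrame:
    """Return up to n random (original, translation) rows; the seed makes the draw repeatable."""
    return df[[src, dst]].sample(n=min(n, len(df)), random_state=seed)


toy = pd.DataFrame({
    "comment": ["Lube up and take it?", "Does the diamond cap come w/ Tags?"],
    "rus_comment": ["Смазать и взять?", "Бриллиантовая крышка поставляется с бирками?"],
})
print(sample_pairs(toy, n=5))
```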


In [17]:
df.head()

| | label | comment | author | subreddit | score | ups | downs | date | created_utc | parent_comment | rus_comment | rus_parent_comment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | You do know west teams play against west teams... | Shbshb906 | nba | -4 | -1 | -1 | 2016-11 | 2016-11-01 00:24:10 | The blazers and Mavericks (The wests 5 and 6 s... | Вы ведь знаете, что западные команды играют пр... | Блейзеры и Mavericks (5-е и 6-е место на Запад... |
| 1 | 0 | They were underdogs earlier today, but since G... | Creepeth | nfl | 3 | 3 | 0 | 2016-09 | 2016-09-22 21:45:37 | They're favored to win. | Раньше они были аутсайдерами, но после того, к... | Они хотят побеждать. |
| 2 | 0 | This meme isn't funny none of the "new york ni... | icebrotha | BlackPeopleTwitter | -8 | -1 | -1 | 2016-10 | 2016-10-18 21:03:47 | deadass don't kill my buzz | Этот мем не смешной, как и «ниггер из Нью-Йорка». | ублюдок, не убивай мой кайф |
| 3 | 0 | I could use one of those tools. | cush2push | MaddenUltimateTeam | 6 | -1 | -1 | 2016-12 | 2016-12-30 17:00:13 | Yep can confirm I saw the tool they use for th... | Я мог бы использовать один из этих инструментов. | Да, могу подтвердить, что видел инструмент, ко... |
| 4 | 0 | I don't pay attention to her, but as long as s... | only7inches | AskReddit | 0 | 0 | 0 | 2016-09 | 2016-09-02 10:35:08 | do you find ariana grande sexy ? | Я не обращаю на нее внимания, но пока она зако... | ты находишь Ариану Гранде сексуальной? |


### Save to CSV

In [18]:
df.to_csv(
    os.path.join(path_to_data, 'rus-train-balanced-sarcasm.csv'),
    index=False
)
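Since the Cyrillic text has to survive the round trip, a quick check that a saved file reads back unchanged may be worth running; `to_csv` and `read_csv` both default to UTF-8. `index=False` also matters: with the default `index=True`, reading the file back adds an extra `Unnamed: 0` column. A minimal sketch on a one-row toy frame written to a temporary directory:

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({"comment": ["Lube up and take it?"], "rus_comment": ["Смазать и взять?"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rus-sample.csv")
    toy.to_csv(path, index=False)  # UTF-8 by default
    restored = pd.read_csv(path)   # read back while the temporary directory still exists

print(list(restored.columns))         # ['comment', 'rus_comment']
print(restored.at[0, "rus_comment"])  # Смазать и взять?
```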