
## Translate body text (training) <br>

In the following, we translate the extracted body text of the articles from the training dataset. As previously mentioned, the limitations of Google Translate API require extra care in ordere to split the text (in order not to reach the character limit) and the set of documents (in order not to reach the document limit).

In [1]:
import pandas as pd
from googletrans import Translator
import time

In [2]:
train_df = pd.read_csv('train/_TRAIN_details_in_df.csv')

The limit over characters (max 15k characters according to the official documentation, but 5k in our tests) can be overcome by subdividing the text in a way that preserves basic text units. <br><br>

Our idea was subdividing the text into paragraphs (thus using `\n\n` as delimiter) until their length is shorter than 5000 characters. However, some articles presented some critical peculiarities (absence of paragraphs, absence of formatting, absence of punctuation) and that is a reason to observe that there is no obvious way to cut the text that will not raise some problems. <br><br>

Other attempts which we made are: cutting the text at the last punctuation symbol before the 5000 character limit, brutally cutting the text at 5000 characters and gluing the parts together after translation.

In [5]:
translator = Translator()

def splitTranslator(file,text,lang,delimiter='\n\n'):
    
    # break down long text into smaller chuck by splitting every time it sees the '\n\n' 
    # and keep it in body list 
    body_list =  text.split(delimiter)
    
    # iterate over each text chunks to translate
    translated_list = []
    
    flag_long_paragraph = False
    
    for s in body_list:
        
        # check if any chunk is still over 5000 char
        if len(s) > 5000:
            flag_long_paragraph = True
            print (file,'is still over limit:')
            continue
            
        # if the chunk is under 5000 char, then translate, and keep in the translated list
        elif len(s) == 0:
            continue
        else:
            try:
                translated = translator.translate(s)
                lang = translated.src
                translated = translated.text
                translated_list.append(translated)
            except Exception:
                print (file, 'cannot translate')
            time.sleep(0.5)
    
    # put all the translated text together and connect them using space
    translated_body = ' '.join(translated_list)
    return translated_body,flag_long_paragraph

## main translation task
count = 0

# we used start and end to translate in batches because of googletran's timeout
start = -1 
end = len(train_df)

long_paragraph_list = []
for index, row in train_df.iterrows():
    
    if index < start or index > end:
        continue
    
    print('--------------',index,'-----------------')
    id1 = str(row['pair_id']).split('_')[0]
    id2 = str(row['pair_id']).split('_')[1]
    body1 = str(row['text1'])
    lang1 = row['url1_lang']
    body2 = str(row['text2'])
    lang2 = row['url1_lang']
    
    if lang1 != 'en':
        print('translating text1 of language ', lang1,' of length ', len(body1),'...')
        if len(body1) > 5000:
            split_text1 = splitTranslator(id1,body1,lang1)
            translated1 = split_text1[0]
            if split_text1[1]:
                long_paragraph_list.append(id1)
        elif pd.isnull(body1):
            continue
        else:
            translated1 = translator.translate(body1)
            lang1 = translated1.src
            translated1 = translated1.text
    else:
        translated1 = body1.replace("\n", " ")
        
    if lang2 != 'en':
        print('translating text2 of language ', lang2,' of length ', len(body2),'...')
        if len(body2) > 5000:
            split_text2 = splitTranslator(id2,body2,lang2)
            translated2 = split_text2[0]
            if split_text2[1]:
                long_paragraph_list.append(id2)
        elif pd.isnull(body2):
            continue
        else:
            translated2 = translator.translate(body2)
            lang2 = translated2.src
            translated2 = translated2.text
    else:
        translated2 = body2.replace("\n", " ")
    
    train_df.loc[index, "translated_body1"] = translated1
    train_df.loc[index, "translated_body2"] = translated2
    
    time.sleep(0.5)
    count += 1
    
    if (count+1)%50 == 0:
        print (count+1, 'rows done.')
        

print("The number of articles that have been cut due to absence of paragraphs is ", len(long_paragraph_list))

-------------- 0 -----------------
-------------- 1 -----------------
-------------- 2 -----------------
-------------- 3 -----------------
-------------- 4 -----------------
-------------- 5 -----------------
-------------- 6 -----------------
-------------- 7 -----------------
-------------- 8 -----------------
-------------- 9 -----------------
-------------- 10 -----------------
-------------- 11 -----------------
-------------- 12 -----------------
-------------- 13 -----------------
-------------- 14 -----------------
-------------- 15 -----------------
-------------- 16 -----------------
-------------- 17 -----------------
-------------- 18 -----------------
-------------- 19 -----------------
-------------- 20 -----------------
-------------- 21 -----------------
-------------- 22 -----------------
-------------- 23 -----------------
-------------- 24 -----------------
-------------- 25 -----------------
-------------- 26 -----------------
-------------- 27 -----------------
--

-------------- 233 -----------------
-------------- 234 -----------------
-------------- 235 -----------------
-------------- 236 -----------------
-------------- 237 -----------------
-------------- 238 -----------------
-------------- 239 -----------------
-------------- 240 -----------------
-------------- 241 -----------------
-------------- 242 -----------------
-------------- 243 -----------------
-------------- 244 -----------------
-------------- 245 -----------------
-------------- 246 -----------------
-------------- 247 -----------------
-------------- 248 -----------------
250 rows done.
-------------- 249 -----------------
-------------- 250 -----------------
-------------- 251 -----------------
-------------- 252 -----------------
-------------- 253 -----------------
-------------- 254 -----------------
-------------- 255 -----------------
-------------- 256 -----------------
-------------- 257 -----------------
-------------- 258 -----------------
-------------- 259 ----

-------------- 573 -----------------
-------------- 574 -----------------
-------------- 575 -----------------
-------------- 576 -----------------
-------------- 577 -----------------
-------------- 578 -----------------
-------------- 579 -----------------
-------------- 580 -----------------
-------------- 581 -----------------
-------------- 582 -----------------
-------------- 583 -----------------
-------------- 584 -----------------
-------------- 585 -----------------
-------------- 586 -----------------
-------------- 587 -----------------
-------------- 588 -----------------
-------------- 589 -----------------
-------------- 590 -----------------
-------------- 591 -----------------
-------------- 592 -----------------
-------------- 593 -----------------
-------------- 594 -----------------
-------------- 595 -----------------
-------------- 596 -----------------
-------------- 597 -----------------
-------------- 598 -----------------
600 rows done.
-------------- 599 ----

-------------- 921 -----------------
-------------- 922 -----------------
-------------- 923 -----------------
-------------- 924 -----------------
-------------- 925 -----------------
-------------- 926 -----------------
-------------- 927 -----------------
-------------- 928 -----------------
-------------- 929 -----------------
-------------- 930 -----------------
-------------- 931 -----------------
-------------- 932 -----------------
-------------- 933 -----------------
-------------- 934 -----------------
-------------- 935 -----------------
-------------- 936 -----------------
-------------- 937 -----------------
-------------- 938 -----------------
-------------- 939 -----------------
-------------- 940 -----------------
-------------- 941 -----------------
-------------- 942 -----------------
-------------- 943 -----------------
-------------- 944 -----------------
-------------- 945 -----------------
-------------- 946 -----------------
-------------- 947 -----------------
-

-------------- 1243 -----------------
-------------- 1244 -----------------
-------------- 1245 -----------------
-------------- 1246 -----------------
-------------- 1247 -----------------
-------------- 1248 -----------------
1250 rows done.
-------------- 1249 -----------------
-------------- 1250 -----------------
-------------- 1251 -----------------
-------------- 1252 -----------------
-------------- 1253 -----------------
-------------- 1254 -----------------
-------------- 1255 -----------------
-------------- 1256 -----------------
-------------- 1257 -----------------
-------------- 1258 -----------------
-------------- 1259 -----------------
-------------- 1260 -----------------
-------------- 1261 -----------------
-------------- 1262 -----------------
-------------- 1263 -----------------
-------------- 1264 -----------------
-------------- 1265 -----------------
-------------- 1266 -----------------
-------------- 1267 -----------------
-------------- 1268 --------------

-------------- 1475 -----------------
-------------- 1476 -----------------
-------------- 1477 -----------------
-------------- 1478 -----------------
-------------- 1479 -----------------
-------------- 1480 -----------------
-------------- 1481 -----------------
-------------- 1482 -----------------
-------------- 1483 -----------------
-------------- 1484 -----------------
-------------- 1485 -----------------
-------------- 1486 -----------------
-------------- 1487 -----------------
-------------- 1488 -----------------
-------------- 1489 -----------------
-------------- 1490 -----------------
-------------- 1491 -----------------
-------------- 1492 -----------------
-------------- 1493 -----------------
-------------- 1494 -----------------
-------------- 1495 -----------------
-------------- 1496 -----------------
-------------- 1497 -----------------
-------------- 1498 -----------------
1500 rows done.
-------------- 1499 -----------------
-------------- 1500 --------------

-------------- 1664 -----------------
translating text1 of language  de  of length  3 ...
translating text2 of language  de  of length  1944 ...
-------------- 1665 -----------------
translating text1 of language  de  of length  3915 ...
translating text2 of language  de  of length  1948 ...
-------------- 1666 -----------------
translating text1 of language  de  of length  9184 ...
translating text2 of language  de  of length  14363 ...
-------------- 1667 -----------------
translating text1 of language  de  of length  3559 ...
translating text2 of language  de  of length  2594 ...
-------------- 1668 -----------------
translating text1 of language  de  of length  1295 ...
translating text2 of language  de  of length  2061 ...
-------------- 1669 -----------------
translating text1 of language  de  of length  3767 ...
translating text2 of language  de  of length  6148 ...
-------------- 1670 -----------------
translating text1 of language  de  of length  1288 ...
translating text2 of 

-------------- 1738 -----------------
translating text1 of language  de  of length  1788 ...
translating text2 of language  de  of length  1361 ...
-------------- 1739 -----------------
translating text1 of language  de  of length  113 ...
translating text2 of language  de  of length  1088 ...
-------------- 1740 -----------------
translating text1 of language  de  of length  4251 ...
translating text2 of language  de  of length  1530 ...
-------------- 1741 -----------------
translating text1 of language  de  of length  4150 ...
translating text2 of language  de  of length  1441 ...
-------------- 1742 -----------------
translating text1 of language  de  of length  1651 ...
translating text2 of language  de  of length  4445 ...
-------------- 1743 -----------------
translating text1 of language  de  of length  1721 ...
translating text2 of language  de  of length  342 ...
-------------- 1744 -----------------
translating text1 of language  de  of length  2440 ...
translating text2 of 

-------------- 1812 -----------------
translating text1 of language  de  of length  950 ...
translating text2 of language  de  of length  593 ...
-------------- 1813 -----------------
translating text1 of language  de  of length  5552 ...
translating text2 of language  de  of length  4911 ...
-------------- 1814 -----------------
translating text1 of language  de  of length  761 ...
translating text2 of language  de  of length  1831 ...
-------------- 1815 -----------------
translating text1 of language  de  of length  1079 ...
translating text2 of language  de  of length  1831 ...
-------------- 1816 -----------------
translating text1 of language  de  of length  3 ...
translating text2 of language  de  of length  1678 ...
-------------- 1817 -----------------
translating text1 of language  de  of length  1651 ...
translating text2 of language  de  of length  4445 ...
-------------- 1818 -----------------
translating text1 of language  es  of length  1890 ...
translating text2 of lang

-------------- 1885 -----------------
translating text1 of language  es  of length  1276 ...
translating text2 of language  es  of length  1055 ...
-------------- 1886 -----------------
translating text1 of language  es  of length  1276 ...
translating text2 of language  es  of length  1055 ...
-------------- 1887 -----------------
translating text1 of language  es  of length  2486 ...
translating text2 of language  es  of length  2034 ...
-------------- 1888 -----------------
translating text1 of language  es  of length  2074 ...
translating text2 of language  es  of length  1283 ...
-------------- 1889 -----------------
translating text1 of language  es  of length  1446 ...
translating text2 of language  es  of length  1034 ...
-------------- 1890 -----------------
translating text1 of language  es  of length  1206 ...
translating text2 of language  es  of length  698 ...
-------------- 1891 -----------------
translating text1 of language  es  of length  5888 ...
translating text2 of

translating text1 of language  es  of length  2038 ...
translating text2 of language  es  of length  2150 ...
-------------- 2028 -----------------
translating text1 of language  es  of length  4224 ...
translating text2 of language  es  of length  1592 ...
-------------- 2029 -----------------
translating text1 of language  es  of length  1510 ...
translating text2 of language  es  of length  2299 ...
-------------- 2030 -----------------
translating text1 of language  es  of length  2742 ...
translating text2 of language  es  of length  1897 ...
-------------- 2031 -----------------
translating text1 of language  es  of length  3555 ...
translating text2 of language  es  of length  1846 ...
-------------- 2032 -----------------
translating text1 of language  es  of length  5501 ...
translating text2 of language  es  of length  3591 ...
-------------- 2033 -----------------
translating text1 of language  es  of length  797 ...
translating text2 of language  es  of length  1258 ...
---

-------------- 2096 -----------------
translating text1 of language  pl  of length  3 ...
translating text2 of language  pl  of length  3 ...
-------------- 2097 -----------------
translating text1 of language  pl  of length  1939 ...
translating text2 of language  pl  of length  2463 ...
-------------- 2098 -----------------
translating text1 of language  pl  of length  1100 ...
translating text2 of language  pl  of length  2475 ...
2100 rows done.
-------------- 2099 -----------------
translating text1 of language  pl  of length  2490 ...
translating text2 of language  pl  of length  2477 ...
-------------- 2100 -----------------
translating text1 of language  pl  of length  1004 ...
translating text2 of language  pl  of length  473 ...
-------------- 2101 -----------------
translating text1 of language  pl  of length  599 ...
translating text2 of language  pl  of length  459 ...
-------------- 2102 -----------------
translating text1 of language  pl  of length  1537 ...
translating 

-------------- 2165 -----------------
translating text1 of language  pl  of length  852 ...
translating text2 of language  pl  of length  1141 ...
-------------- 2166 -----------------
translating text1 of language  pl  of length  87 ...
translating text2 of language  pl  of length  3956 ...
-------------- 2167 -----------------
translating text1 of language  pl  of length  3 ...
translating text2 of language  pl  of length  809 ...
-------------- 2168 -----------------
translating text1 of language  pl  of length  4032 ...
translating text2 of language  pl  of length  2295 ...
-------------- 2169 -----------------
translating text1 of language  pl  of length  1000 ...
translating text2 of language  pl  of length  954 ...
-------------- 2170 -----------------
translating text1 of language  pl  of length  17947 ...
translating text2 of language  pl  of length  3174 ...
-------------- 2171 -----------------
translating text1 of language  pl  of length  3 ...
translating text2 of language

translating text1 of language  tr  of length  920 ...
translating text2 of language  tr  of length  648 ...
-------------- 2237 -----------------
translating text1 of language  tr  of length  1743 ...
translating text2 of language  tr  of length  838 ...
-------------- 2238 -----------------
translating text1 of language  tr  of length  1674 ...
translating text2 of language  tr  of length  673 ...
-------------- 2239 -----------------
translating text1 of language  tr  of length  600 ...
translating text2 of language  tr  of length  1464 ...
-------------- 2240 -----------------
translating text1 of language  tr  of length  3 ...
translating text2 of language  tr  of length  806 ...
-------------- 2241 -----------------
translating text1 of language  tr  of length  576 ...
translating text2 of language  tr  of length  733 ...
-------------- 2242 -----------------
translating text1 of language  tr  of length  1361 ...
translating text2 of language  tr  of length  1076 ...
-------------

-------------- 2312 -----------------
translating text1 of language  tr  of length  626 ...
translating text2 of language  tr  of length  1399 ...
-------------- 2313 -----------------
translating text1 of language  tr  of length  2449 ...
translating text2 of language  tr  of length  1488 ...
-------------- 2314 -----------------
translating text1 of language  tr  of length  3 ...
translating text2 of language  tr  of length  3 ...
-------------- 2315 -----------------
translating text1 of language  tr  of length  601 ...
translating text2 of language  tr  of length  6886 ...
-------------- 2316 -----------------
translating text1 of language  tr  of length  692 ...
translating text2 of language  tr  of length  806 ...
-------------- 2317 -----------------
translating text1 of language  fr  of length  2310 ...
translating text2 of language  fr  of length  1287 ...
-------------- 2318 -----------------
translating text1 of language  fr  of length  3096 ...
translating text2 of language

2400 rows done.
-------------- 2399 -----------------
-------------- 2400 -----------------
-------------- 2401 -----------------
-------------- 2402 -----------------
-------------- 2403 -----------------
-------------- 2404 -----------------
-------------- 2405 -----------------
-------------- 2406 -----------------
-------------- 2407 -----------------
-------------- 2408 -----------------
-------------- 2409 -----------------
-------------- 2410 -----------------
-------------- 2411 -----------------
-------------- 2412 -----------------
-------------- 2413 -----------------
-------------- 2414 -----------------
-------------- 2415 -----------------
-------------- 2416 -----------------
-------------- 2417 -----------------
-------------- 2418 -----------------
-------------- 2419 -----------------
-------------- 2420 -----------------
-------------- 2421 -----------------
-------------- 2422 -----------------
-------------- 2423 -----------------
-------------- 2424 --------------

translating text1 of language  de  of length  869 ...
translating text2 of language  de  of length  4246 ...
-------------- 2615 -----------------
translating text1 of language  de  of length  802 ...
translating text2 of language  de  of length  693 ...
-------------- 2616 -----------------
translating text1 of language  de  of length  2701 ...
translating text2 of language  de  of length  704 ...
-------------- 2617 -----------------
translating text1 of language  de  of length  1473 ...
translating text2 of language  de  of length  559 ...
-------------- 2618 -----------------
translating text1 of language  de  of length  1395 ...
translating text2 of language  de  of length  1086 ...
-------------- 2619 -----------------
translating text1 of language  de  of length  1124 ...
translating text2 of language  de  of length  710 ...
-------------- 2620 -----------------
translating text1 of language  de  of length  593 ...
translating text2 of language  de  of length  625 ...
----------

-------------- 2696 -----------------
translating text1 of language  de  of length  3218 ...
translating text2 of language  de  of length  5396 ...
-------------- 2697 -----------------
translating text1 of language  de  of length  2360 ...
translating text2 of language  de  of length  7806 ...
-------------- 2698 -----------------
translating text1 of language  de  of length  4429 ...
translating text2 of language  de  of length  1724 ...
2700 rows done.
-------------- 2699 -----------------
translating text1 of language  de  of length  519 ...
translating text2 of language  de  of length  506 ...
-------------- 2700 -----------------
translating text1 of language  de  of length  2565 ...
translating text2 of language  de  of length  2083 ...
-------------- 2701 -----------------
translating text1 of language  de  of length  781 ...
translating text2 of language  de  of length  1566 ...
-------------- 2702 -----------------
translating text1 of language  de  of length  3624 ...
transl

-------------- 2752 -----------------
translating text1 of language  es  of length  1703 ...
translating text2 of language  es  of length  1569 ...
-------------- 2753 -----------------
translating text1 of language  es  of length  1077 ...
translating text2 of language  es  of length  1272 ...
-------------- 2754 -----------------
translating text1 of language  es  of length  2111 ...
translating text2 of language  es  of length  1808 ...
-------------- 2755 -----------------
translating text1 of language  es  of length  4837 ...
translating text2 of language  es  of length  6335 ...
-------------- 2756 -----------------
translating text1 of language  es  of length  2353 ...
translating text2 of language  es  of length  2648 ...
-------------- 2757 -----------------
translating text1 of language  es  of length  2081 ...
translating text2 of language  es  of length  1570 ...
-------------- 2758 -----------------
translating text1 of language  es  of length  3841 ...
translating text2 o

-------------- 2839 -----------------
translating text1 of language  es  of length  1518 ...
translating text2 of language  es  of length  1775 ...
-------------- 2840 -----------------
translating text1 of language  es  of length  3777 ...
translating text2 of language  es  of length  1561 ...
-------------- 2841 -----------------
translating text1 of language  es  of length  1605 ...
translating text2 of language  es  of length  1754 ...
-------------- 2842 -----------------
translating text1 of language  es  of length  2398 ...
translating text2 of language  es  of length  1058 ...
-------------- 2843 -----------------
translating text1 of language  es  of length  1643 ...
translating text2 of language  es  of length  2070 ...
-------------- 2844 -----------------
translating text1 of language  es  of length  3081 ...
translating text2 of language  es  of length  1814 ...
-------------- 2845 -----------------
translating text1 of language  es  of length  1190 ...
translating text2 o

-------------- 2909 -----------------
translating text1 of language  es  of length  2175 ...
translating text2 of language  es  of length  1100 ...
-------------- 2910 -----------------
translating text1 of language  es  of length  2637 ...
translating text2 of language  es  of length  4516 ...
-------------- 2911 -----------------
translating text1 of language  es  of length  1842 ...
translating text2 of language  es  of length  1555 ...
-------------- 2912 -----------------
translating text1 of language  es  of length  934 ...
translating text2 of language  es  of length  1639 ...
-------------- 2913 -----------------
translating text1 of language  es  of length  1707 ...
translating text2 of language  es  of length  1262 ...
-------------- 2914 -----------------
translating text1 of language  es  of length  934 ...
translating text2 of language  es  of length  565 ...
-------------- 2915 -----------------
translating text1 of language  es  of length  3348 ...
translating text2 of l

-------------- 2974 -----------------
translating text1 of language  pl  of length  3015 ...
translating text2 of language  pl  of length  4913 ...
-------------- 2975 -----------------
translating text1 of language  pl  of length  2549 ...
translating text2 of language  pl  of length  2411 ...
-------------- 2976 -----------------
translating text1 of language  pl  of length  737 ...
translating text2 of language  pl  of length  905 ...
-------------- 2977 -----------------
translating text1 of language  pl  of length  1140 ...
translating text2 of language  pl  of length  1541 ...
-------------- 2978 -----------------
translating text1 of language  pl  of length  2109 ...
translating text2 of language  pl  of length  798 ...
-------------- 2979 -----------------
translating text1 of language  pl  of length  977 ...
translating text2 of language  pl  of length  716 ...
-------------- 2980 -----------------
translating text1 of language  pl  of length  1810 ...
translating text2 of lan

-------------- 3046 -----------------
translating text1 of language  pl  of length  1926 ...
translating text2 of language  pl  of length  1911 ...
-------------- 3047 -----------------
translating text1 of language  pl  of length  1143 ...
translating text2 of language  pl  of length  4998 ...
-------------- 3048 -----------------
translating text1 of language  pl  of length  1469 ...
translating text2 of language  pl  of length  1115 ...
3050 rows done.
-------------- 3049 -----------------
translating text1 of language  pl  of length  3 ...
translating text2 of language  pl  of length  2485 ...
-------------- 3050 -----------------
translating text1 of language  pl  of length  5369 ...
translating text2 of language  pl  of length  2101 ...
-------------- 3051 -----------------
translating text1 of language  pl  of length  2512 ...
translating text2 of language  pl  of length  2535 ...
-------------- 3052 -----------------
translating text1 of language  pl  of length  2182 ...
transl

-------------- 3111 -----------------
translating text1 of language  tr  of length  1102 ...
translating text2 of language  tr  of length  909 ...
-------------- 3112 -----------------
translating text1 of language  tr  of length  1934 ...
translating text2 of language  tr  of length  3 ...
-------------- 3113 -----------------
translating text1 of language  tr  of length  1934 ...
translating text2 of language  tr  of length  3 ...
-------------- 3114 -----------------
translating text1 of language  tr  of length  13134 ...
translating text2 of language  tr  of length  1828 ...
-------------- 3115 -----------------
translating text1 of language  tr  of length  1027 ...
translating text2 of language  tr  of length  1226 ...
-------------- 3116 -----------------
translating text1 of language  tr  of length  1426 ...
translating text2 of language  tr  of length  1036 ...
-------------- 3117 -----------------
translating text1 of language  tr  of length  3 ...
translating text2 of languag

-------------- 3187 -----------------
translating text1 of language  tr  of length  911 ...
translating text2 of language  tr  of length  1171 ...
-------------- 3188 -----------------
translating text1 of language  tr  of length  1071 ...
translating text2 of language  tr  of length  1048 ...
-------------- 3189 -----------------
translating text1 of language  tr  of length  1529 ...
translating text2 of language  tr  of length  1615 ...
-------------- 3190 -----------------
translating text1 of language  tr  of length  5523 ...
translating text2 of language  tr  of length  1686 ...
-------------- 3191 -----------------
translating text1 of language  tr  of length  1787 ...
translating text2 of language  tr  of length  931 ...
-------------- 3192 -----------------
translating text1 of language  tr  of length  1464 ...
translating text2 of language  tr  of length  2168 ...
-------------- 3193 -----------------
translating text1 of language  tr  of length  1730 ...
translating text2 of 

-------------- 3272 -----------------
translating text1 of language  ar  of length  1192 ...
translating text2 of language  ar  of length  657 ...
-------------- 3273 -----------------
translating text1 of language  ar  of length  520 ...
translating text2 of language  ar  of length  2869 ...
-------------- 3274 -----------------
translating text1 of language  ar  of length  1079 ...
translating text2 of language  ar  of length  553 ...
-------------- 3275 -----------------
translating text1 of language  ar  of length  1248 ...
translating text2 of language  ar  of length  1570 ...
-------------- 3276 -----------------
translating text1 of language  ar  of length  1388 ...
translating text2 of language  ar  of length  945 ...
-------------- 3277 -----------------
translating text1 of language  ar  of length  1538 ...
translating text2 of language  ar  of length  600 ...
-------------- 3278 -----------------
translating text1 of language  ar  of length  639 ...
translating text2 of lang

-------------- 3340 -----------------
translating text1 of language  ar  of length  2504 ...
translating text2 of language  ar  of length  1100 ...
-------------- 3341 -----------------
translating text1 of language  ar  of length  1040 ...
translating text2 of language  ar  of length  1143 ...
-------------- 3342 -----------------
translating text1 of language  ar  of length  984 ...
translating text2 of language  ar  of length  580 ...
-------------- 3343 -----------------
translating text1 of language  ar  of length  1229 ...
translating text2 of language  ar  of length  742 ...
-------------- 3344 -----------------
translating text1 of language  ar  of length  723 ...
translating text2 of language  ar  of length  1467 ...
-------------- 3345 -----------------
translating text1 of language  ar  of length  849 ...
translating text2 of language  ar  of length  920 ...
-------------- 3346 -----------------
translating text1 of language  ar  of length  225 ...
translating text2 of langu

-------------- 3415 -----------------
translating text1 of language  ar  of length  1030 ...
translating text2 of language  ar  of length  673 ...
-------------- 3416 -----------------
translating text1 of language  ar  of length  1630 ...
translating text2 of language  ar  of length  507 ...
-------------- 3417 -----------------
translating text1 of language  ar  of length  571 ...
translating text2 of language  ar  of length  520 ...
-------------- 3418 -----------------
translating text1 of language  ar  of length  790 ...
translating text2 of language  ar  of length  1155 ...
-------------- 3419 -----------------
translating text1 of language  ar  of length  1688 ...
translating text2 of language  ar  of length  1212 ...
-------------- 3420 -----------------
translating text1 of language  ar  of length  1785 ...
translating text2 of language  ar  of length  610 ...
-------------- 3421 -----------------
translating text1 of language  ar  of length  716 ...
translating text2 of langu

-------------- 3485 -----------------
translating text1 of language  pl  of length  613 ...
translating text2 of language  pl  of length  3428 ...
-------------- 3486 -----------------
translating text1 of language  pl  of length  1017 ...
translating text2 of language  pl  of length  1188 ...
-------------- 3487 -----------------
translating text1 of language  pl  of length  2868 ...
translating text2 of language  pl  of length  1547 ...
-------------- 3488 -----------------
translating text1 of language  pl  of length  994 ...
translating text2 of language  pl  of length  716 ...
-------------- 3489 -----------------
translating text1 of language  pl  of length  3846 ...
translating text2 of language  pl  of length  3265 ...
-------------- 3490 -----------------
translating text1 of language  pl  of length  983 ...
translating text2 of language  pl  of length  880 ...
-------------- 3491 -----------------
translating text1 of language  pl  of length  1651 ...
translating text2 of lan

translating text1 of language  tr  of length  7390 ...
translating text2 of language  tr  of length  1043 ...
-------------- 3568 -----------------
translating text1 of language  tr  of length  990 ...
translating text2 of language  tr  of length  796 ...
-------------- 3569 -----------------
translating text1 of language  tr  of length  2166 ...
translating text2 of language  tr  of length  1307 ...
-------------- 3570 -----------------
translating text1 of language  tr  of length  2164 ...
translating text2 of language  tr  of length  1444 ...
-------------- 3571 -----------------
translating text1 of language  tr  of length  2305 ...
translating text2 of language  tr  of length  2290 ...
-------------- 3572 -----------------
translating text1 of language  tr  of length  2558 ...
translating text2 of language  tr  of length  1838 ...
-------------- 3573 -----------------
translating text1 of language  tr  of length  286 ...
translating text2 of language  tr  of length  2684 ...
-----

-------------- 3634 -----------------
translating text1 of language  tr  of length  1907 ...
translating text2 of language  tr  of length  1127 ...
-------------- 3635 -----------------
translating text1 of language  tr  of length  1576 ...
translating text2 of language  tr  of length  1697 ...
-------------- 3636 -----------------
translating text1 of language  tr  of length  1422 ...
translating text2 of language  tr  of length  2005 ...
-------------- 3637 -----------------
translating text1 of language  tr  of length  1435 ...
translating text2 of language  tr  of length  1370 ...
-------------- 3638 -----------------
translating text1 of language  tr  of length  3527 ...
translating text2 of language  tr  of length  1612 ...
-------------- 3639 -----------------
translating text1 of language  tr  of length  2948 ...
translating text2 of language  tr  of length  2368 ...
-------------- 3640 -----------------
translating text1 of language  tr  of length  814 ...
translating text2 of

-------------- 3721 -----------------
translating text1 of language  de  of length  4487 ...
translating text2 of language  de  of length  1279 ...
-------------- 3722 -----------------
translating text1 of language  de  of length  3265 ...
translating text2 of language  de  of length  1266 ...
-------------- 3723 -----------------
translating text1 of language  de  of length  1141 ...
translating text2 of language  de  of length  675 ...
-------------- 3724 -----------------
translating text1 of language  de  of length  8718 ...
translating text2 of language  de  of length  2816 ...
-------------- 3725 -----------------
translating text1 of language  de  of length  1695 ...
translating text2 of language  de  of length  2603 ...
-------------- 3726 -----------------
translating text1 of language  de  of length  1114 ...
translating text2 of language  de  of length  1120 ...
-------------- 3727 -----------------
translating text1 of language  de  of length  5376 ...
translating text2 of

-------------- 3797 -----------------
translating text1 of language  de  of length  2227 ...
translating text2 of language  de  of length  1435 ...
-------------- 3798 -----------------
translating text1 of language  de  of length  947 ...
translating text2 of language  de  of length  1229 ...
3800 rows done.
-------------- 3799 -----------------
translating text1 of language  de  of length  975 ...
translating text2 of language  de  of length  1700 ...
-------------- 3800 -----------------
translating text1 of language  de  of length  17112 ...
translating text2 of language  de  of length  9083 ...
-------------- 3801 -----------------
translating text1 of language  de  of length  1930 ...
translating text2 of language  de  of length  3241 ...
-------------- 3802 -----------------
translating text1 of language  de  of length  5971 ...
translating text2 of language  de  of length  3944 ...
-------------- 3803 -----------------
translating text1 of language  de  of length  1287 ...
tran

-------------- 3876 -----------------
translating text1 of language  de  of length  3092 ...
translating text2 of language  de  of length  2431 ...
-------------- 3877 -----------------
translating text1 of language  de  of length  2303 ...
translating text2 of language  de  of length  1962 ...
-------------- 3878 -----------------
translating text1 of language  de  of length  4595 ...
translating text2 of language  de  of length  5701 ...
-------------- 3879 -----------------
translating text1 of language  de  of length  4742 ...
translating text2 of language  de  of length  2643 ...
-------------- 3880 -----------------
translating text1 of language  de  of length  999 ...
translating text2 of language  de  of length  1281 ...
-------------- 3881 -----------------
translating text1 of language  de  of length  2335 ...
translating text2 of language  de  of length  761 ...
-------------- 3882 -----------------
translating text1 of language  de  of length  1830 ...
translating text2 of 

-------------- 3942 -----------------
translating text1 of language  de  of length  3146 ...
translating text2 of language  de  of length  2190 ...
-------------- 3943 -----------------
translating text1 of language  de  of length  889 ...
translating text2 of language  de  of length  926 ...
-------------- 3944 -----------------
translating text1 of language  de  of length  6169 ...
translating text2 of language  de  of length  5036 ...
-------------- 3945 -----------------
translating text1 of language  de  of length  1179 ...
translating text2 of language  de  of length  1105 ...
-------------- 3946 -----------------
translating text1 of language  de  of length  2666 ...
translating text2 of language  de  of length  1703 ...
-------------- 3947 -----------------
translating text1 of language  de  of length  1027 ...
translating text2 of language  de  of length  736 ...
-------------- 3948 -----------------
translating text1 of language  de  of length  1862 ...
translating text2 of l

-------------- 4013 -----------------
translating text1 of language  de  of length  2506 ...
translating text2 of language  de  of length  659 ...
-------------- 4014 -----------------
translating text1 of language  de  of length  5583 ...
translating text2 of language  de  of length  5135 ...
-------------- 4015 -----------------
translating text1 of language  de  of length  4108 ...
translating text2 of language  de  of length  813 ...
-------------- 4016 -----------------
translating text1 of language  de  of length  2654 ...
translating text2 of language  de  of length  1286 ...
-------------- 4017 -----------------
translating text1 of language  de  of length  2172 ...
translating text2 of language  de  of length  1728 ...
-------------- 4018 -----------------
translating text1 of language  de  of length  752 ...
translating text2 of language  de  of length  803 ...
-------------- 4019 -----------------
translating text1 of language  de  of length  1134 ...
translating text2 of la

-------------- 4085 -----------------
translating text1 of language  de  of length  3914 ...
translating text2 of language  de  of length  1481 ...
-------------- 4086 -----------------
translating text1 of language  de  of length  905 ...
translating text2 of language  de  of length  1110 ...
-------------- 4087 -----------------
translating text1 of language  de  of length  1592 ...
translating text2 of language  de  of length  821 ...
-------------- 4088 -----------------
translating text1 of language  de  of length  780 ...
translating text2 of language  de  of length  903 ...
-------------- 4089 -----------------
translating text1 of language  de  of length  3731 ...
translating text2 of language  de  of length  2241 ...
-------------- 4090 -----------------
translating text1 of language  de  of length  4620 ...
translating text2 of language  de  of length  15885 ...
-------------- 4091 -----------------
translating text1 of language  de  of length  1800 ...
translating text2 of l

-------------- 4151 -----------------
translating text1 of language  es  of length  6105 ...
translating text2 of language  es  of length  1146 ...
-------------- 4152 -----------------
translating text1 of language  es  of length  814 ...
translating text2 of language  es  of length  749 ...
-------------- 4153 -----------------
translating text1 of language  es  of length  1276 ...
translating text2 of language  es  of length  1916 ...
-------------- 4154 -----------------
translating text1 of language  es  of length  33 ...
translating text2 of language  es  of length  51 ...
-------------- 4155 -----------------
translating text1 of language  es  of length  1904 ...
translating text2 of language  es  of length  969 ...
-------------- 4156 -----------------
translating text1 of language  es  of length  1107 ...
translating text2 of language  es  of length  823 ...
-------------- 4157 -----------------
translating text1 of language  es  of length  1551 ...
translating text2 of langua

translating text1 of language  es  of length  873 ...
translating text2 of language  es  of length  557 ...
-------------- 4242 -----------------
translating text1 of language  es  of length  3002 ...
translating text2 of language  es  of length  4217 ...
-------------- 4243 -----------------
translating text1 of language  es  of length  1338 ...
translating text2 of language  es  of length  1354 ...
-------------- 4244 -----------------
translating text1 of language  es  of length  2482 ...
translating text2 of language  es  of length  1645 ...
-------------- 4245 -----------------
translating text1 of language  es  of length  2726 ...
translating text2 of language  es  of length  1023 ...
-------------- 4246 -----------------
translating text1 of language  es  of length  954 ...
translating text2 of language  es  of length  996 ...
-------------- 4247 -----------------
translating text1 of language  es  of length  1291 ...
translating text2 of language  es  of length  1390 ...
------

translating text2 of language  de  of length  3762 ...
-------------- 4318 -----------------
translating text1 of language  de  of length  2403 ...
translating text2 of language  de  of length  545 ...
-------------- 4319 -----------------
translating text1 of language  de  of length  958 ...
translating text2 of language  de  of length  711 ...
-------------- 4320 -----------------
translating text1 of language  de  of length  3815 ...
translating text2 of language  de  of length  3424 ...
-------------- 4321 -----------------
translating text1 of language  de  of length  1254 ...
translating text2 of language  de  of length  3992 ...
-------------- 4322 -----------------
translating text1 of language  de  of length  1021 ...
translating text2 of language  de  of length  689 ...
-------------- 4323 -----------------
translating text1 of language  de  of length  2689 ...
translating text2 of language  de  of length  381 ...
-------------- 4324 -----------------
translating text1 of lan

-------------- 4389 -----------------
translating text1 of language  de  of length  1635 ...
translating text2 of language  de  of length  535 ...
-------------- 4390 -----------------
translating text1 of language  de  of length  3701 ...
translating text2 of language  de  of length  178 ...
-------------- 4391 -----------------
translating text1 of language  de  of length  1882 ...
translating text2 of language  de  of length  947 ...
-------------- 4392 -----------------
translating text1 of language  de  of length  519 ...
translating text2 of language  de  of length  1657 ...
-------------- 4393 -----------------
translating text1 of language  de  of length  4035 ...
translating text2 of language  de  of length  8836 ...
-------------- 4394 -----------------
translating text1 of language  de  of length  3558 ...
translating text2 of language  de  of length  2721 ...
-------------- 4395 -----------------
translating text1 of language  de  of length  3320 ...
translating text2 of la

-------------- 4461 -----------------
translating text1 of language  de  of length  70 ...
translating text2 of language  de  of length  919 ...
-------------- 4462 -----------------
translating text1 of language  de  of length  1019 ...
translating text2 of language  de  of length  660 ...
-------------- 4463 -----------------
translating text1 of language  de  of length  2599 ...
translating text2 of language  de  of length  1505 ...
-------------- 4464 -----------------
translating text1 of language  de  of length  2558 ...
translating text2 of language  de  of length  1695 ...
-------------- 4465 -----------------
translating text1 of language  de  of length  1132 ...
translating text2 of language  de  of length  1622 ...
-------------- 4466 -----------------
translating text1 of language  de  of length  3604 ...
translating text2 of language  de  of length  426 ...
-------------- 4467 -----------------
translating text1 of language  de  of length  1997 ...
translating text2 of lan

-------------- 4521 -----------------
translating text1 of language  de  of length  2856 ...
translating text2 of language  de  of length  276 ...
-------------- 4522 -----------------
translating text1 of language  de  of length  2628 ...
translating text2 of language  de  of length  2680 ...
-------------- 4523 -----------------
translating text1 of language  de  of length  4453 ...
translating text2 of language  de  of length  1048 ...
-------------- 4524 -----------------
translating text1 of language  de  of length  2667 ...
translating text2 of language  de  of length  3935 ...
-------------- 4525 -----------------
translating text1 of language  de  of length  2688 ...
translating text2 of language  de  of length  782 ...
-------------- 4526 -----------------
translating text1 of language  de  of length  2605 ...
translating text2 of language  de  of length  303 ...
-------------- 4527 -----------------
translating text1 of language  de  of length  1130 ...
translating text2 of l

-------------- 4587 -----------------
translating text1 of language  de  of length  4014 ...
translating text2 of language  de  of length  2045 ...
-------------- 4588 -----------------
translating text1 of language  de  of length  1243 ...
translating text2 of language  de  of length  545 ...
-------------- 4589 -----------------
translating text1 of language  de  of length  2356 ...
translating text2 of language  de  of length  1235 ...
-------------- 4590 -----------------
translating text1 of language  de  of length  1123 ...
translating text2 of language  de  of length  1720 ...
-------------- 4591 -----------------
translating text1 of language  de  of length  2816 ...
translating text2 of language  de  of length  1658 ...
-------------- 4592 -----------------
translating text1 of language  de  of length  2174 ...
translating text2 of language  de  of length  1961 ...
-------------- 4593 -----------------
translating text1 of language  de  of length  369 ...
translating text2 of 

translating text1 of language  de  of length  3585 ...
translating text2 of language  de  of length  3602 ...
-------------- 4650 -----------------
translating text1 of language  de  of length  2726 ...
translating text2 of language  de  of length  3 ...
-------------- 4651 -----------------
translating text1 of language  de  of length  1975 ...
translating text2 of language  de  of length  2827 ...
-------------- 4652 -----------------
translating text1 of language  de  of length  1823 ...
translating text2 of language  de  of length  2497 ...
-------------- 4653 -----------------
translating text1 of language  de  of length  4089 ...
translating text2 of language  de  of length  3666 ...
-------------- 4654 -----------------
translating text1 of language  de  of length  931 ...
translating text2 of language  de  of length  680 ...
-------------- 4655 -----------------
translating text1 of language  de  of length  2654 ...
translating text2 of language  de  of length  2236 ...
-------

-------------- 4708 -----------------
translating text1 of language  de  of length  4997 ...
translating text2 of language  de  of length  3050 ...
-------------- 4709 -----------------
translating text1 of language  de  of length  4331 ...
translating text2 of language  de  of length  833 ...
-------------- 4710 -----------------
translating text1 of language  de  of length  2167 ...
translating text2 of language  de  of length  1784 ...
-------------- 4711 -----------------
translating text1 of language  de  of length  1583 ...
translating text2 of language  de  of length  789 ...
-------------- 4712 -----------------
translating text1 of language  de  of length  726 ...
translating text2 of language  de  of length  879 ...
-------------- 4713 -----------------
translating text1 of language  de  of length  4487 ...
translating text2 of language  de  of length  1301 ...
-------------- 4714 -----------------
translating text1 of language  de  of length  6055 ...
translating text2 of la

translating text1 of language  de  of length  7591 ...
translating text2 of language  de  of length  1779 ...
-------------- 4778 -----------------
translating text1 of language  de  of length  1593 ...
translating text2 of language  de  of length  3615 ...
-------------- 4779 -----------------
translating text1 of language  de  of length  13946 ...
translating text2 of language  de  of length  3562 ...
-------------- 4780 -----------------
translating text1 of language  de  of length  4263 ...
translating text2 of language  de  of length  2635 ...
-------------- 4781 -----------------
translating text1 of language  de  of length  6449 ...
translating text2 of language  de  of length  2171 ...
-------------- 4782 -----------------
translating text1 of language  de  of length  11609 ...
translating text2 of language  de  of length  2592 ...
-------------- 4783 -----------------
translating text1 of language  de  of length  1640 ...
translating text2 of language  de  of length  1457 ...


-------------- 4846 -----------------
translating text1 of language  ar  of length  1817 ...
translating text2 of language  ar  of length  616 ...
-------------- 4847 -----------------
translating text1 of language  ar  of length  1220 ...
translating text2 of language  ar  of length  749 ...
-------------- 4848 -----------------
translating text1 of language  ar  of length  4420 ...
translating text2 of language  ar  of length  2342 ...
4850 rows done.
-------------- 4849 -----------------
translating text1 of language  ar  of length  1279 ...
translating text2 of language  ar  of length  1591 ...
-------------- 4850 -----------------
translating text1 of language  ar  of length  2038 ...
translating text2 of language  ar  of length  900 ...
-------------- 4851 -----------------
translating text1 of language  ar  of length  700 ...
translating text2 of language  ar  of length  1564 ...
-------------- 4852 -----------------
translating text1 of language  ar  of length  376 ...
translat

-------------- 4911 -----------------
translating text1 of language  pl  of length  2116 ...
translating text2 of language  pl  of length  9999 ...
-------------- 4912 -----------------
translating text1 of language  pl  of length  1182 ...
translating text2 of language  pl  of length  2696 ...
-------------- 4913 -----------------
translating text1 of language  pl  of length  2142 ...
translating text2 of language  pl  of length  1805 ...
-------------- 4914 -----------------
translating text1 of language  pl  of length  2130 ...
translating text2 of language  pl  of length  1429 ...
-------------- 4915 -----------------
translating text1 of language  pl  of length  1454 ...
translating text2 of language  pl  of length  2521 ...
-------------- 4916 -----------------
translating text1 of language  pl  of length  603 ...
translating text2 of language  pl  of length  2638 ...
-------------- 4917 -----------------
translating text1 of language  pl  of length  1549 ...
translating text2 of

We observe that only 11 documents have been truncated due to absence of formatting or paragraphs.

In [25]:
# checking translated result
train_df.loc[2482, "translated_body1"]

"A woman who lost her husband to her own mother has said she will never fully forgive her for taking him away and then having his child.  Lauren Wall, who's now 34, and from Twickenham, south-west London married airport worker Paul White when she was just 19.  Her mum Julie, who is now 53, paid for a £15,000 wedding and grateful Lauren took her along on her two-week honeymoon to Devon.  Eight weeks later, husband Paul moved out and nine months later, her mother Julie gave birth to his child announcing they were together.  Lauren said: 'Paul always got on really well with mum. I never thought it strange though, as she was his mum-in-law and he was just being friendly.  Lauren Wall, from Twickenham, south-west London, who's now 34 got married to husband Paul when she was just 19. Just weeks later her husband moved out and was soon with Lauren's mum, Julie  'They'd laugh a lot together. I didn't think to be worried at all. Who would?  'I couldn't wait to settle in to marriage but the ink 

In [26]:
path = 'train/_TRAIN_text_translated.csv'
train_df.to_csv(path,index=False)