This notebook is used for collecting Google Trends data. It:
- takes the countries from the UNHCR refugee dataset,
- gets the languages associated with each country,
- gets the two-letter language code for each language, and
- translates a list of English keywords into those languages.
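
The steps listed above can be sketched as a short chain of pandas operations. This is a minimal offline illustration, not the notebook's code: the three countries, their languages and the code table are hypothetical toy stand-ins for the UNHCR data, `country_language_dict` and `googletrans.LANGCODES`.

```python
import pandas as pd

# Hypothetical toy inputs standing in for the UNHCR countries,
# country_language_dict and googletrans.LANGCODES.
countries = pd.DataFrame({"country": ["Eritrea", "Burundi", "Syria"]})
country_language_dict = {
    "Eritrea": ["Tigrinya", "Arabic", "English"],
    "Burundi": ["Kirundi", "French"],
    "Syria": ["Arabic"],
}
langcodes = {"tigrinya": "ti", "arabic": "ar", "english": "en", "french": "fr"}

# country -> one row per language, lower-cased to match the code table's keys
country_langs = (
    countries.assign(language=countries["country"].map(country_language_dict))
    .explode("language")
    .assign(language=lambda df: df["language"].str.lower())
)

# language -> two-letter code; languages without a code get NaN
country_langs["code"] = country_langs["language"].map(langcodes)

# unique, non-English target codes to translate the keywords into
targets = sorted(set(country_langs["code"].dropna()) - {"en"})
print(targets)  # ['ar', 'fr', 'ti']
```

In this toy data `kirundi` has no code, mirroring the unmatched languages discussed further down.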

#### Read in relevant dictionaries, dataframes, modules

In [71]:
from country_abbrev import *     # project-local helper modules;
from country_language import *   # country_language_dict comes from these
from pytrends.request import TrendReq
import pandas as pd
import itertools
import googletrans
import swifter  # registers the .swifter accessor used further down

# get the list of all unique countries:
countries = pd.Series(pd.read_csv('../../data/data.csv', engine="pyarrow").Country_o.unique()).to_frame(name='country')

# list of all unique languages, lower-cased to match the googletrans names:
unique_languages = pd.Series(list(set(itertools.chain.from_iterable(country_language_dict.values()))), name='language').str.lower()

# language name -> two-letter code table from googletrans (the codes land in column 0)
langcodes = pd.DataFrame.from_dict(googletrans.LANGCODES, orient='index')

Merge the list of languages with the googletrans language codes:

In [72]:
refugee_lang = unique_languages.to_frame().merge(langcodes, left_on='language', right_index=True, how='left')

Of the approximately 190 languages, about 110 have no code matching the names we use. Some of this may be a data-cleaning issue (for example `slovene` in the sample below, which googletrans lists as `slovenian`), but most of the unmatched names appear to be less commonly used languages, so we skip them for now.

In [73]:
refugee_lang[refugee_lang[0].isna()].sample(10)

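| | language | 0 |
|---|---|---|
| 75 | slovene | NaN |
| 122 | sami | NaN |
| 162 | kirundi | NaN |
| 45 | bassa | NaN |
| 167 | tok pisin | NaN |
| 182 | forro | NaN |
| 26 | tongan | NaN |
| 146 | taiwanese hokkien | NaN |
| 121 | berber | NaN |
| 61 | ndebele | NaN |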
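
Before the unmatched rows are dropped, some spelling variants could be recovered with a hand-maintained alias table. The sketch below assumes a hypothetical `ALIASES` dict and toy inputs shaped like `unique_languages` and `langcodes`; it uses `merge(..., indicator=True)` so the names that still have no code can be listed for review.

```python
import pandas as pd

# Toy stand-ins: language names as they appear in the refugee data, and a
# name -> code table shaped like the notebook's `langcodes` (codes in column 0).
languages = pd.Series(["slovene", "italian", "kirundi", "tok pisin"], name="language")
langcodes = pd.DataFrame.from_dict({"slovenian": "sl", "italian": "it"}, orient="index")

# Hypothetical alias table for spelling variants; extend by hand as mismatches turn up.
ALIASES = {"slovene": "slovenian"}


def match_codes(languages, langcodes, aliases):
    """Merge languages with their codes, looking unmatched spellings up via aliases."""
    lookup = languages.replace(aliases)  # name to use when looking up the code
    merged = (
        languages.to_frame()
        .assign(lookup=lookup)
        .merge(langcodes, left_on="lookup", right_index=True, how="left", indicator=True)
    )
    # _merge == "left_only" marks names that still have no code
    unmatched = merged.loc[merged["_merge"] == "left_only", "language"].tolist()
    return merged.drop(columns=["lookup", "_merge"]), unmatched


matched, unmatched = match_codes(languages, langcodes, ALIASES)
print(unmatched)  # ['kirundi', 'tok pisin']
```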


In [74]:
refugee_lang.dropna(inplace=True)


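Set up the translators. `translate_keywords_slow` uses deep_translator's `GoogleTranslator`, whose `translate_batch` translates one keyword at a time; `translate_keywords` sends every keyword for a language in a single request to the unofficial `translate.googleapis.com` endpoint.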

In [122]:
from deep_translator import GoogleTranslator

def translate_keywords_slow(series, lang):
    # build a translator per target language instead of reassigning .target on a
    # shared instance, which not every deep_translator version picks up
    translator = GoogleTranslator(source='en', target=lang)
    # split '+'-joined groups into one keyword per row (the index repeats per group)
    series = series.str.split('+').explode()
    # translate_batch sends the keywords one at a time, hence "slow"
    series_translated = translator.translate_batch(series.tolist())
    # re-join the translations into their original '+' groups
    return pd.Series(series_translated, index=series.index, name=series.name).groupby(level=0).agg('+'.join)
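
The split, explode, translate and regroup pattern that both translation functions rely on can be checked offline with a stub in place of the translator. `fake_translate_batch` and `translate_groups` are hypothetical names; the stub only upper-cases words so the regrouping is easy to see.

```python
import pandas as pd


def fake_translate_batch(words):
    # Hypothetical stand-in for translator.translate_batch: upper-cases each word
    # so the sketch runs offline; a real call would return translations.
    return [w.upper() for w in words]


def translate_groups(series, translate_batch):
    """Translate every keyword in '+'-joined groups and rebuild the groups."""
    exploded = series.str.split("+").explode()  # one keyword per row, index repeated
    translated = translate_batch(exploded.tolist())  # one call for all keywords
    # the repeated index ties each translation back to its original group
    return (
        pd.Series(translated, index=exploded.index, name=series.name)
        .groupby(level=0)
        .agg("+".join)
    )


keywords = pd.Series(["advisers+advisors", "agent", "work visa"], name="list")
print(translate_groups(keywords, fake_translate_batch).tolist())
# ['ADVISERS+ADVISORS', 'AGENT', 'WORK VISA']
```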

In [183]:
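import requests

def translate_keywords(series, lang):
    # split '+'-joined groups into one keyword per row (the index repeats per group)
    series = series.str.split('+').explode()
    # unofficial Google Translate endpoint: all keywords go in one request, one per line
    url = "https://translate.googleapis.com/translate_a/single"
    params = {
        "client": "gtx",
        "sl": "en",   # source language: the keyword list is English
        "tl": lang,   # target language code
        "dt": "t",    # ask for the translated text
        "q": "\n".join(series.tolist()),
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    # the reply splits the text into segments that are not guaranteed to match our lines
    # one-to-one, so rejoin them and split on newlines to get one translation per keyword
    text = "".join(segment[0] for segment in response.json()[0] if segment[0])
    translated = [t.strip().lower() for t in text.strip('\n').split('\n')]
    if len(translated) != len(series):
        raise ValueError(f"{lang}: got {len(translated)} translations for {len(series)} keywords")
    # re-join the translations into their original '+' groups
    return pd.Series(translated, index=series.index, name=series.name).groupby(level=0).agg('+'.join)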
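
The response parsing can also be tested without a network call. The sketch assumes the same payload shape the notebook's code indexes (`payload[0]` is a list of segments whose first element is translated text); `parse_segments` and the payload below are hypothetical. Rejoining the segments before splitting on newlines, and checking the count, avoids silently misaligning keywords and translations.

```python
def parse_segments(payload, n_expected):
    """Rebuild one translation per input line from a translate_a/single-style payload.

    Assumes, as the notebook's code does, that payload[0] is a list of segments
    whose first element is translated text.
    """
    text = "".join(seg[0] for seg in payload[0] if seg and seg[0])
    lines = [line.strip().lower() for line in text.strip("\n").split("\n")]
    if len(lines) != n_expected:
        raise ValueError(f"expected {n_expected} translations, got {len(lines)}")
    return lines


# Hypothetical payload for the two keywords 'agent\nwork visa', returned as two segments.
payload = [[["Agent\n", "agent\n"], ["Work Visa", "work visa"]]]
print(parse_segments(payload, 2))  # ['agent', 'work visa']
```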

Read in the list of keywords from the paper:

In [152]:
boss_words = pd.read_csv('boss_words.csv')['list']

In [196]:
# drop English: the keywords are already in English
refugee_lang_not_en = refugee_lang[refugee_lang[0] != 'en']

# translate the keyword list into each remaining language (one row per language)
translated_keyword = refugee_lang_not_en[0].swifter.apply(lambda x: translate_keywords(series=boss_words, lang=x))
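
An alternative to the `swifter` apply plus transpose is a plain loop that builds one column per language code directly. In this sketch `translate_all` and the offline `tag` translator are hypothetical; any function with the `translate_keywords(series, lang)` signature can be passed in.

```python
import pandas as pd


def translate_all(keywords, codes, translate):
    """Return a DataFrame with the English keywords plus one column per language code.

    `translate(series, lang)` is any function with translate_keywords' signature;
    it is called once per language, in order.
    """
    columns = {"en": keywords}
    for code in codes:
        columns[code] = translate(keywords, code)
    return pd.DataFrame(columns)


# Hypothetical offline translator that just tags each keyword with the language code.
def tag(series, lang):
    return series + "_" + lang


keywords = pd.Series(["agent", "worker"], name="list")
table = translate_all(keywords, ["bg", "sw"], tag)
print(table)
```

With the notebook's objects this would be called as `translate_all(boss_words, refugee_lang_not_en[0], translate_keywords)`, which makes one request per language and names the columns by code without a transpose or rename.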

In [202]:
# English keywords alongside their translations, one column per language code
pd.concat([boss_words.rename('en'), translated_keyword.T.rename(refugee_lang[0], axis='columns')], axis=1)

| | en | bg | ur | si | te | ig | sq | bn | nl | it | ... | xh | ta | cy | yo | pa | ps | sw | be | ny | km |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | advisers+advisors | съветници+съветници | مشیر+مشیر | උපදේශකයන්+උපදේශකයන් | సలహాదారులు+సలహాదారులు | ndị ndụmọdụ+ndị ndụmọdụ | këshilltarët+këshilltarët | উপদেষ্টা+উপদেষ্টা | adviseurs+adviseurs | consiglieri+consiglieri | ... | abacebisi+abacebisi | ஆலோசகர்கள்+ஆலோசகர்கள் | cynghorwyr+cynghorwyr | awọn onimọran+olugbamoran | ਸਲਾਹਕਾਰ+ਸਲਾਹਕਾਰ | مشاورین+مشاورین | washauri+washauri | дарадцы+дарадцы | alangizi+alangizi | ទីប្រឹក្សា+ទីប្រឹក្សា |
| 1 | agent | агент | ایجنٹ | නියෝජිතයා | ఏజెంట్ | onye nnọchi anya | agjent | প্রতিনিধি | tussenpersoon | agente | ... | iarhente | முகவர் | asiant | oluranlowo | ਏਜੰਟ | اجنټ | wakala | агент | wothandizira | ភ្នាក់ងារ |
| 2 | aliens | извънземни | غیر ملکی | පිටසක්වල ජීවීන් | విదేశీయులు | ndị ọbịa | alienet | এলিয়েন | buitenaardse wezens | alieni | ... | abaphambukeli | வேற்றுகிரகவாசிகள் | estroniaid | awọn ajeji | ਪਰਦੇਸੀ | بهرنیان | wageni | іншапланецяне | alendo | ជនបរទេស |
| 3 | applicant+applicants+application+apply | кандидат+кандидати+приложение+приложи | درخواست گزار+درخواست دہندگان+درخواست+درخواست دیں | ඉල්ලුම්කරු+අයදුම්කරුවන්+අයදුම්පත+අයදුම් කරන්න | దరఖాస్తుదారు+దరఖాస్తుదారులు+అప్లికేషన్+దరఖాస్తు | onye ochoputa+ndị na-arịọ arịrịọ+ngwa+tinye | aplikanti+aplikantët+aplikacion+aplikoni | প্রার্থী+আবেদনকারীদের+আবেদন+আবেদন | aanvrager+aanvragers+sollicitatie+toepassen | richiedente+candidati+applicazione+fare domanda a | ... | umenzi-sicelo+abafaki zicelo+isicelo+faka isicelo | விண்ணப்பதாரர்+விண்ணப்பதாரர்கள்+விண்ணப்பம்+விண்... | ymgeisydd+ymgeiswyr+cais+gwneud cais | olubẹwẹ+awọn olubẹwẹ+ohun elo+waye | ਬਿਨੈਕਾਰ+ਬਿਨੈਕਾਰ+ਐਪਲੀਕੇਸ਼ਨ+ਲਾਗੂ ਕਰੋ | غوښتونکی+غوښتونکي+غوښتنلیک+درخواست کول | mwombaji+waombaji+maombi+kuomba | абітурыент+абітурыентаў+прымяненне+ўжываць | wofunsira+ofunsira+ntchito+gwiritsani ntchito | អ្នកដាក់ពាក្យ+អ្នកដាក់ពាក្យ+កម្មវិធី+អនុវត្ត |
| 4 | appointment | назначаване | تقرری | පත්වීම | నియామకం | nhọpụta | takim | অ্যাপয়েন্টমেন্ট | afspraak | appuntamento | ... | ukuqeshwa | நியமனம் | apwyntiad | ipinnu lati pade | ਮੁਲਾਕਾਤ | ملاقات | uteuzi | прызначэнне | kusankhidwa | ការណាត់ជួប |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 187 | wellbeing | благополучие | خیریت | යහපැවැත්ම | క్షేమం | ịdị mma | mirëqenien | সুস্থতা | welzijn | benessere | ... | intlalontle | நல்வாழ்வு | lles | alafia | ਤੰਦਰੁਸਤੀ | هوساینه | ustawi | дабрабыту | ubwino | សុខុមាលភាព |
| 188 | woes | неволи | پریشانیاں | දුක්ඛිත තත්වයන් | బాధలు | ahụhụ | hallet | দুর্ভোগ | ellende | guai | ... | yeha | துயரங்கள் | gwaeau | ègbé | ਦੁੱਖ | کړاوونه | matatizo | беды | tsoka | វេទនា |
| 189 | work visa | работна виза | کام کا ویزا | රැකියා වීසා | పని వీసా | visa ọrụ | vizë pune | কাজ ভিসা | werk visum | visto di lavoro | ... | i-visa yomsebenzi | வேலை விசா | fisa gwaith | fisa iṣẹ | ਕੰਮ ਦਾ ਵੀਜ਼ਾ | کاري ویزه | visa ya kazi | рабочая віза | visa ya ntchito | ទិដ្ឋាការការងារ |
| 190 | worker | работник | کارکن | සේවකයා | కార్మికుడు | onye ọrụ | punëtor | কর্মী | arbeider | lavoratore | ... | umsebenzi | தொழிலாளி | gweithiwr | osise | ਕਾਮਾ | کارګر | mfanyakazi | рабочы | wantchito | កម្មករ |
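
Several groups in the output repeat a term, for example `advisers+advisors` becoming `съветници+съветници` in Bulgarian, because English spelling variants collapse to one word. If the groups are later used as combined search terms, the repeats add nothing. A minimal sketch that removes them while keeping the first-seen order (`dedupe_group` is a hypothetical helper, applied here to cells copied from the table):

```python
import pandas as pd


def dedupe_group(group, sep="+"):
    """Drop repeated terms in a '+'-joined group, keeping first-seen order."""
    # dict.fromkeys keeps insertion order and discards duplicates
    return sep.join(dict.fromkeys(group.split(sep)))


translated = pd.Series(["съветници+съветници", "агент", "кандидат+кандидати+приложение+приложи"])
print(translated.map(dedupe_group).tolist())
# ['съветници', 'агент', 'кандидат+кандидати+приложение+приложи']
```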
