<div>
<img src="images/icon_important.jpg" width="50" align="left"/>
</div>
<br>
<br>

### __Important Legal Notice__
By running and editing this Jupyter notebook with the corresponding dataset, you agree that you will not use or store the data for purposes other than participating in the Champagne Coding with DNB & Women in Data Science, Oslo. You will delete the data and notebook after the event and will not attempt to identify any of the commenters.

## Translating into English and cleaning up the data

Most sentiment analysis libraries support only English. polyglot is an exception, but it is rather difficult to install. For that reason, we wrote a short script that translates the Norwegian reviews into English, so that we end up with a consistent set of review comments.

__Note__: if you get an ```HTTP Error``` caused by ```Too many requests```, you will need a VPN client to change your IP address before you can continue running the translation functions.
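Before reaching for a VPN, it may be worth slowing the requests down. The sketch below is only an illustration, not part of the original script: `call_with_backoff` is a hypothetical helper that retries any function with growing pauses, and `flaky_translate` is a stand-in for a real translation call. Whether waiting actually lifts the rate limit depends on the service.

```python
import time

def call_with_backoff(func, *args, retries=4, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying with exponentially growing pauses when it raises."""
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # pauses of 1s, 2s, 4s, ...

# Stand-in for a rate-limited translation call (hypothetical, for illustration).
attempts = {"count": 0}

def flaky_translate(text):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("Too many requests")
    return text.upper()

recorded_delays = []  # record the pauses instead of really sleeping
result = call_with_backoff(flaky_translate, "hei", sleep=recorded_delays.append)
```

In the notebook, the same wrapper could be placed around the translation call inside the translation functions, with the default `time.sleep` left in place.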

In [1]:
import pandas as pd
from pathlib import Path
current_directory = Path.cwd()
reviews_directory = Path(current_directory, 'reviews')

In [2]:
# Read data frame
df = pd.read_csv(Path(reviews_directory, 'dnb_reviews.csv'))

In [3]:
# Clean data frame
df.drop_duplicates(inplace=True)
df.dropna(inplace=True)
df['Review_Text'] = df['Review_Text'].map(lambda text: text.replace("...Full Review", ""))
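The translation log further down shows at least one review that still contains an HTML entity (`&gt;`), which suggests the scraped text was not unescaped. A minimal sketch of a slightly broader cleaning step, assuming a hypothetical helper named `clean_review` that could be passed to `.map()` in place of the lambda:

```python
import html

def clean_review(text: str) -> str:
    """Remove the scraper's '...Full Review' marker and decode HTML entities."""
    text = text.replace("...Full Review", "")
    return html.unescape(text).strip()

cleaned_entity = clean_review("Iphone x10 &gt; Android")
cleaned_marker = clean_review("Great app ...Full Review")
```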

In [4]:
# Import textblob and nltk.tokenize
from textblob import TextBlob
from nltk.tokenize import sent_tokenize # for tokenizing into sentences
import statistics

In [5]:
from langdetect import detect
print(detect("Har ikke root tilgang, kommer fortsatt ikke"))
print(detect("The best way to accses dnb"))

no
en


In [6]:
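def detect_lang(text):
    # Call detect() once: langdetect is not deterministic by default,
    # so two calls on the same text can disagree.
    # Anything not detected as English is treated as Norwegian.
    if detect(text) == 'en':
        return 'en'
    return 'no'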

In [7]:
df['Language'] = df['Review_Text'].apply(detect_lang)

In [8]:
df['Language'].unique()

array(['en', 'no'], dtype=object)
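Language detection on very short reviews is fragile, and to my knowledge langdetect raises an exception when a text has no usable features (for example an emoji-only review). The sketch below shows one defensive pattern. `detect_lang_safe` and `fake_detector` are hypothetical names used only for illustration, and the detector is passed in as an argument so the example runs without langdetect installed.

```python
def detect_lang_safe(text, detector, default="no"):
    """Return 'en' or `default`, calling the detector once and tolerating failures."""
    try:
        code = detector(text)
    except Exception:
        # Assumed failure mode: the detector raises on text it cannot analyse.
        return default
    return "en" if code == "en" else default

# Hypothetical stand-in for langdetect.detect, used only for this demonstration.
def fake_detector(text):
    if not text.strip():
        raise ValueError("No features in text.")
    return "en" if "the" in text.lower() else "sv"

english = detect_lang_safe("The best way to access dnb", fake_detector)
other = detect_lang_safe("Jag har inte pengar", fake_detector)
empty = detect_lang_safe("   ", fake_detector)
```

With langdetect available, the call would look like `df['Review_Text'].apply(lambda t: detect_lang_safe(t, detect))`. The langdetect README also describes setting `DetectorFactory.seed = 0` to make results repeatable.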

In [9]:
df[df['Language']=='en']['Review_Text'][:10]

0     app is making my phone hang sevaral time ..als...
1     "SIM tool launches before the page is done loa...
2     Complete trash. Used to be ok. Its hardly an a...
3     I have been using the old and new version of t...
4     Almost never works. Often try to log in but ge...
5     Jesus Christ why is there no option to change ...
6     Worst app ever? Slower and affords less privac...
7     Trying to login but the application is not res...
9     very unreliable, but when it works its pretty ...
10    After the recent update, I can now log into th...
Name: Review_Text, dtype: object

In [10]:
df[df['Language']=='en'].shape

(308, 6)

In [11]:
df[df['Language']=='no']['Review_Text'][:10]

8      Har ikke root tilgang, kommer fortsatt ikke in...
156                             New version is very good
183                                         Can't log in
199                           You're missing night mode.
202                                          didn't work
208                             horrible after update...
209                                         doesn't work
213                                  It does not work 🙄🙄
216                                   awful after update
218    Edit: fungerer etter å reinstallere. Fungerer ...
Name: Review_Text, dtype: object

In [12]:
df[df['Language']=='no'].shape

(589, 6)

In [13]:
from googletrans import Translator
translator = Translator()
print(translator.translate('Jeg har ikke penger', src='no').text)

I do not have money


In [14]:
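def translate_to_eng(text):
    # First attempt: translate the text as-is.
    try:
        return translator.translate(text, src='no').text
    except Exception:
        # Second attempt: drop non-ASCII characters (emoji, but also æ, ø, å) and retry.
        emoji_stripped_text = text.encode('ascii', 'ignore').decode('ascii')
        try:
            return translator.translate(emoji_stripped_text, src='no').text
        except Exception:
            # Give up and keep the original text so the row is not lost.
            print('Error,returning same string:\n', text)
            return text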
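The ASCII fallback in `translate_to_eng` removes emoji, but it also removes the Norwegian letters æ, ø and å, which can garble the text sent to the translator. A possible alternative, sketched here with a hypothetical helper `strip_symbols`, drops only characters that Unicode classifies as symbols. It is an approximation: some emoji sequences contain joiners or variation selectors that this filter does not remove.

```python
import unicodedata

def strip_symbols(text: str) -> str:
    """Drop emoji and other symbol characters while keeping letters such as æ, ø, å."""
    # "So" = other symbols (most emoji), "Sk" = modifier symbols (e.g. skin tones)
    kept = [ch for ch in text if unicodedata.category(ch) not in {"So", "Sk"}]
    return "".join(kept).strip()

ascii_only = "Skjønner ikke 🙂".encode("ascii", "ignore").decode("ascii").strip()
symbols_removed = strip_symbols("Skjønner ikke 🙂")
emoji_review = strip_symbols("It does not work 🙄🙄")
```

Here `ascii_only` loses the `ø`, while `symbols_removed` keeps it.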

In [15]:
df['Review_Eng'] = df[df['Language']=='no']['Review_Text'].apply(translate_to_eng)

Error,returning same string:
 Iphone x10 &gt; Android
Error,returning same string:
 Skjønner ikke hva de før meg klager på. Den fungerer jo helt supert. Redder dagen :)
Error,returning same string:
 Ubrukelig app.
PER DESIGN.
Error,returning same string:
 Ser ikke helt forskjell på denne appen og vanlig mobilbank, bortsett at skjermen ser rar ut på sensation.
Error,returning same string:
 Mangler mye funksjonalitet ift andre bank-app'er. Fortsetter dessuten å bruke gps i bakgrunnen etter jeg har gått ut. Desire HD
Error,returning same string:
 Fungerer fint på Desire HD. Perfekt å ha hvis du er på ferie uten tilgang til PC og må innom nettbanken for å betale en regning eller noe.
Error,returning same string:
 nye oppdateringen er noe av det mest plundrete og unødvendige jeg noensinne har brukt.... må konstant logge med inn på nytt, og det bruker tid og flere forsøk, og i tillegg får jeg da beskjed om å avinstallere + installere på nytt? Herregud, for et rot.
Error,returning same string

In [16]:
df.loc[df['Language'] == 'en', 'Review_Eng'] = df['Review_Text']

In [17]:
df[df['Language'] == 'no'][["Review_Eng", "Review_Text", "Language"]].sample(10)

Unnamed: 0,Review_Eng,Review_Text,Language
743,- Unable to log in with BankID for Mobile. Nor...,- Får ikke til å logge inn med Bank ID på Mobi...,no
750,It appears that this app is now not much more ...,Det ser ut at denne app er nå ikke mye mer enn...,no
266,s Poorly that d considered switching banks! th...,så dårlig at d vurderes å bytte bank! den forr...,no
412,Huge potential for this version. Hassle login ...,Stort potensiale for denne versjonen. Problemf...,no
665,It has ceased to function on Android Oreo on S...,Den har sluttet å fungere på Android Oreo på S...,no
890,ubrukelig app,ubrukelig app,no
527,Bad patch that is only in English too? Seems h...,Dårlig oppdatering som kun er på engelsk også?...,no
353,"it was good once, then came an update, also wa...","den var bra en gang, så kom det en oppdatering...",no
373,Oh. My. Lord. Getting hurt app utvecklarna but...,Oh. My. Lord. Får vondt av app-utvecklarna men...,no
424,It was a good app that worked and was easy to ...,det VAR en bra app som funket og var lett å br...,no


In [18]:
# Some reviews are still left untranslated.
df.query('(Review_Text == Review_Eng) and Language == "no"')[['Review_Text', 'Review_Eng']].shape

(66, 2)

In [19]:
# Let's try another library. It works for a small number of requests; otherwise it complains about "too many requests".
# Note: detect_language() and translate() were removed in TextBlob 0.18, so this needs an older release.
def translate_to_eng_textblob(text):
    text_blob = TextBlob(text)
    if text_blob.detect_language() != 'en':
        try:
            text_blob = text_blob.translate(to='en')
        except Exception:
            # Translation failed; fall through and return the original text
            pass
    return str(text_blob)

In [20]:
df['textblob_Translate'] = df.query('(Review_Text == Review_Eng) and Language == "no"')['Review_Text'].apply(translate_to_eng_textblob)

In [21]:
df.loc[df['textblob_Translate'].notnull(), 'Review_Eng'] = df['textblob_Translate']

In [22]:
# Looking at the values, most are English sentences that were misclassified as Norwegian.
df.query('(Review_Text == Review_Eng) and Language == "no"')[['Review_Text', 'Review_Eng', 'Language']]

Unnamed: 0,Review_Text,Review_Eng,Language
156,New version is very good,New version is very good,no
199,You're missing night mode.,You're missing night mode.,no
216,awful after update,awful after update,no
251,All good!,All good!,no
252,Excellent,Excellent,no
287,ok app,ok app,no
323,wow,wow,no
324,bah,bah,no
338,gooooood,gooooood,no
504,Eeeeuuuurrrgggg,Eeeeuuuurrrgggg,no


In [23]:
# Let's classify those as English.
df.loc[df.query('(Review_Text == Review_Eng) and Language == "no"').index, 'Language'] = 'en'
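The rule applied here is "tagged Norwegian, but the translation left the text unchanged, so it was probably English all along". The sketch below restates that rule on plain dictionaries so its behaviour is easy to inspect. `relabel_untranslated` is a hypothetical helper and the rows are small illustrative records. Note the limitation: a genuinely Norwegian review that both translators failed on would be relabelled as English too.

```python
def relabel_untranslated(rows):
    """Mark rows tagged 'no' as 'en' when translation left the text unchanged."""
    for row in rows:
        if row["Language"] == "no" and row["Review_Text"] == row["Review_Eng"]:
            row["Language"] = "en"
    return rows

rows = relabel_untranslated([
    {"Review_Text": "awful after update", "Review_Eng": "awful after update", "Language": "no"},
    {"Review_Text": "dårlig oppgradering", "Review_Eng": "bad upgrade", "Language": "no"},
    # Norwegian text whose translation failed: the rule mislabels it as English.
    {"Review_Text": "ubrukelig app", "Review_Eng": "ubrukelig app", "Language": "no"},
])
labels = [row["Language"] for row in rows]
```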

In [24]:
df[df['Language'] == 'en'][['Review_Text', 'Review_Eng']].sample(10)

Unnamed: 0,Review_Text,Review_Eng
9,"very unreliable, but when it works its pretty ...","very unreliable, but when it works its pretty ..."
132,I couldn't login in after update. It says it c...,I couldn't login in after update. It says it c...
683,Can't log in with lg g3,Can't log in with lg g3
184,"Very bad, the old one was better .","Very bad, the old one was better ."
192,the new update is very bad,the new update is very bad
42,Got error code 1027 during activation of the a...,Got error code 1027 during activation of the a...
15,Biometric login hardly ever works. The app is ...,Biometric login hardly ever works. The app is ...
145,"app doesn't work, can't log in","app doesn't work, can't log in"
230,I am using the app since. Last 5 month and i a...,I am using the app since. Last 5 month and i a...
34,fingerprint not working on the OnePlus 7 pro i...,fingerprint not working on the OnePlus 7 pro i...


In [25]:
df[df['Language'] == 'no'][['Review_Text', 'Review_Eng']].sample(10)

Unnamed: 0,Review_Text,Review_Eng
760,Jeg savner mer flyt i app bruk,I miss more fluidity in app use
8,"Har ikke root tilgang, kommer fortsatt ikke in...","Do not root access, still can not sign. Fr err..."
293,Etter den siste oppdatering fungerer den super...,After the last update it works super-fast! Fin...
431,Jeg får ikke aktivert appen. får bare error code.,I do not get activated app. only get error code.
244,Trenger lett tilgjengelig språk-innstilling. B...,Need readily accessible language setting. Auto...
506,Det nye designet er helt klart i riktig retnin...,The new design is clearly in the right directi...
496,dårlig oppgradering,bad upgrade
511,jeg må si jeg savner den gamle appen. Denne ny...,I must say I miss the old app. This new versio...
646,Etter siste oppdatering klarer ikke logge meg ...,After the last update can not log into online ...
332,Har reinnstallert og fortsatt problemer med fo...,Have reinnstallert and continued problems with...


In [26]:
df = df.drop(['textblob_Translate'], axis=1)

In [27]:
from faker import Faker
fake = Faker('no_NO')

def anonymous_name(text):
    # The original name is ignored; every call returns a new random Norwegian name.
    return fake.name()

df['Name'] = df['Name'].apply(anonymous_name)
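Because `fake.name()` is called once per row, a reviewer who wrote several reviews gets a different fake name each time. If per-reviewer consistency matters for the analysis, one option is to cache the replacement per original name. This is a sketch under that assumption: `make_pseudonymizer` is a hypothetical helper, and a simple counter stands in for Faker so the example runs on its own.

```python
import itertools

def make_pseudonymizer(name_factory):
    """Return a function that maps each original name to one stable fake name."""
    mapping = {}

    def pseudonymize(original):
        if original not in mapping:
            mapping[original] = name_factory()  # first time seen: draw a new name
        return mapping[original]

    return pseudonymize

# Stand-in name generator (with Faker this would be `fake.name`).
counter = itertools.count(1)
anonymize = make_pseudonymizer(lambda: f"Person {next(counter)}")
names = [anonymize("Kari"), anonymize("Ola"), anonymize("Kari")]
```

In the notebook this would be used as `df['Name'].apply(make_pseudonymizer(fake.name))`. The internal mapping links real names to fake ones, so it should not be saved alongside the data.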

In [28]:
df = df.drop(['Unnamed: 0'], axis=1)

In [29]:
df[df['Language'] == 'no'].sample(10)

Unnamed: 0,Name,Date,Review_Score,Review_Text,Language,Review_Eng
506,Dr. Stein Andresen,"January 18, 2019",4,Det nye designet er helt klart i riktig retnin...,no,The new design is clearly in the right directi...
783,Torill Edvardsen-Pettersen,"September 24, 2013",1,Virker ikke lenger etter oppdatering,no,No longer works after update
487,Nora Sandvik,"February 14, 2019",1,Mobil bank funker ikke,no,Mobile banking does not work
304,Roy Eliassen-Olsen,"April 4, 2019",1,Støtter fortsatt ikke rootade Androids. Kommer...,no,Supports still not rooted Androids. Going well...
656,Kim-Vegard Sørensen,"October 8, 2016",5,"Man får gjort det man trenger i banken, og kje...",no,"One gets done what you need in the bank, and f..."
285,Kåre Larsen,"April 17, 2019",3,Kommer seg!,no,Coming up!
400,Wenche-Anita Moe,"January 28, 2019",1,"Ingen mulighet for å sjekke saldo, eller overf...",no,"No option to check your balance, or transfer w..."
333,Siv Ødegård,"March 11, 2019",1,"Den Norske Bank, nå med ny og forbedret app. E...",no,"The Norwegian Bank, now with new and improved ..."
220,Morten Nilsen-Iversen,"August 21, 2019",3,kan man gi mindre enn 0 stjerner. 3 typer inlo...,no,one can give less than 0 stars. 3 types inlogg...
462,Arne Næss,"February 28, 2019",5,"Må logge inn med bankID på mobil hver gang, da...",no,Must log in with BankID for mobile phones ever...


In [30]:
df[df['Language'] == 'en'].sample(10)

Unnamed: 0,Name,Date,Review_Score,Review_Text,Language,Review_Eng
193,Astrid-Kristine Aasen,"March 14, 2019",5,easy for everything,en,easy for everything
73,Ida Christensen-Eide,"January 30, 2019",5,fingerprint login doesn't work. in the previou...,en,fingerprint login doesn't work. in the previou...
651,Ragnhild Sivertsen,"November 2, 2017",1,Doesnt even work with samsung note 8. Is this ...,en,Doesnt even work with samsung note 8. Is this ...
432,Ingeborg Dahl,"February 12, 2019",1,Slow and stopped logging in with biometric det...,en,Slow and stopped logging in with biometric det...
713,Sander Dahl,"October 8, 2015",5,Very useful and practical,en,Very useful and practical
38,Arild-Steinar Madsen,"May 16, 2019",4,"works so far, but how do i change it to show i...",en,"works so far, but how do i change it to show i..."
81,Mathias-Arild Martinsen,"February 3, 2019",1,Where is the old app? I want it back. This new...,en,Where is the old app? I want it back. This new...
91,May Haugen,"February 3, 2019",1,The app worked fine when it first updated to t...,en,The app worked fine when it first updated to t...
733,Bjørn-Stig Næss,"March 12, 2014",1,"Having been an avid user of the mobilbank, I w...",en,"Having been an avid user of the mobilbank, I w..."
107,Magne-Roy Lien,"January 14, 2019",2,I have to log in with bank id every time I loa...,en,I have to log in with bank id every time I loa...


In [31]:
df.to_csv(Path(reviews_directory, 'dnb_reviews_final.csv'))