<a href="https://colab.research.google.com/github/atlas-github/fi_analytics/blob/master/Chapter_7_Sentiment_analysis_using_Malaya.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I’ve converted part of the PDF document into a spreadsheet resembling the snapshot below. 

In [2]:
import pandas as pd
df = pd.read_csv("sample_parl.csv")
df

Unnamed: 0,Speaker,Text
0,Dato' Seri Haji Mukhriz Tun Dr. Mahathir [Jerlun],"minta Menteri Pelancongan, Seni dan Budaya men..."
1,"Menteri Pelancongan, Seni dan Budaya [Datuk Mo...","Bismillahi rahmani rahim, assalamualaikum wara..."
2,Dato' Seri Haji Mukhriz Tun Dr. Mahathir [Jerlun],Saya mengucapkan jutaan terima kasih kepada Ya...
3,"Menteri Pelancongan, Seni dan Budaya [Datuk Mo...",Tuan Yang di-Pertua. Untuk menjawab soalan tam...
4,Dato’ Seri Haji Mukhriz Tun Dr. Mahathir [Jerlun],Maksud saya mengenai pembangunan Yang Berhorma...
5,"Menteri Pelancongan, Seni dan Budaya [Datuk Mo...","Yang Berhormat, kita memang ada rancangan untu..."
6,Tuan Yang di-Pertua,Soalan tambahan.
7,Tuan Yusuf bin Abd Wahab [Tanjong Manis],"Tanjong Manis, Tanjong Manis."
8,Dato’ Jalaluddin bin Alias [Jelebu],Jelebu.
9,Tuan Yusuf bin Abd Wahab [Tanjong Manis],"Tanjong Manis. Tuan Yang diPertua, Tanjong Man..."
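
The later cells pick out a statement by position with `df.iloc[i, 1]`. The sketch below assumes the columns are `Speaker` and `Text` in that order, as in the snapshot, and shows that selecting by column name returns the same value. The two-row `sample` frame is a stand-in built for illustration, not the real file.

```python
import pandas as pd

# Stand-in frame with the same two columns as the snapshot (not the real data file).
sample = pd.DataFrame({
    "Speaker": ["Tuan Yang di-Pertua", "Dato’ Jalaluddin bin Alias [Jelebu]"],
    "Text": ["Soalan tambahan.", "Jelebu."],
})

by_position = sample.iloc[1, 1]    # row 1, second column
by_label = sample.loc[1, "Text"]   # row label 1, column named "Text"
print(by_position == by_label)
```

Selecting by name is a little more robust if the column order of the CSV ever changes.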


After uploading the dataset, the next step is to install the malaya library by running the following line.

In [3]:
!pip install malaya

Collecting malaya
  Downloading https://files.pythonhosted.org/packages/db/c1/2a9cc05f26f8622f323d8a8df8126e1d9937a08b45362d974d6304f86543/malaya-3.6.3-py3-none-any.whl (4.0MB)
     |████████████████████████████████| 4.0MB 2.8MB/s 
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
     |████████████████████████████████| 1.1MB 39.7MB/s 
Collecting herpetologist
  Downloading https://files.pythonhosted.org/packages/83/58/8ad7d0ccd94e6810c08baf040f622e44807afafc5fdc20c58f4a0a774e01/herpetologist-0.0.9-py3-none-any.whl
Collecting youtokentome
  Downloading https://files.pythonhosted.org/packages/a3/65/4a86cf99da3f680497ae132329025b291e2fda22327e8da6a9476e51acb1/youtokentome-1.0.6-cp36-cp36m-manylinux2010_x86_64.whl (1.7MB)
     |████████████████████████████████| 1.7MB 37.8MB/s 
Collecting unideco

Google Colab will print a rather long list of installation messages, which are safe to ignore. Next, we’ll view the emotion labels available in this library.

In [4]:
import malaya

#view the types of emotions available
malaya.emotion.label

['anger', 'fear', 'happy', 'love', 'sadness', 'surprise']

Which model to use depends on which one gives the best results on your data. For this demonstration, I’ll use the multinomial model. Readers are free to explore the other models in their own time.

In [5]:
model = malaya.emotion.multinomial()

downloading frozen /root/Malaya/emotion/multinomial model


  charset=Bar.ASCII if ascii is True else ascii or Bar.UTF)
101%|██████████| 6.00/5.95 [00:00<00:00, 8.62MB/s]


downloading frozen /root/Malaya/emotion/multinomial vector


116%|██████████| 3.00/2.59 [00:00<00:00, 5.66MB/s]


downloading frozen /root/Malaya/emotion/multinomial bpe


119%|██████████| 1.00/0.84 [00:00<00:00, 2.27MB/s]


You are likely to get a few warning messages here, which are safe to ignore. 

Let’s pick a random statement, say row 2, in the column Text. The statement is reproduced here:

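Saya mengucapkan jutaan terima kasih kepada Yang Berhormat Menteri di atas jawapan tersebut. Soalan tambahan saya ialah walaupun hutan merupakan sumber hasil pendapatan bagi kerajaan negeri dan pengusaha, contohnya melalui aktiviti pembalakan dan sebagainya, Kedah sangat prihatin soal kelestarian alam sekitar. Apakah bentuk bantuan yang boleh disalurkan kepada pihak swasta untuk membangunkan produk-produk pelancongan supaya kalau pun tidak dapat mengganti hasil yang boleh diperoleh daripada pembalakan contohnya, kurang-kurang dapat memberi hasil yang setimpal kepada pengusaha dan kerajaan negeri.

(English translation: I would like to express a million thanks to the Honourable Minister for that answer. My supplementary question is this: although forests are a source of revenue for the state government and operators, for example through logging and similar activities, Kedah is very concerned about environmental sustainability. What form of assistance can be channelled to the private sector to develop tourism products so that, even if they cannot replace the revenue obtainable from logging, for example, they can at least provide commensurate returns to operators and the state government?)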

~Dato' Seri Haji Mukhriz Tun Dr. Mahathir [Jerlun], 14 Oct 2019

Read on its own, the statement seems somewhere between neutral and slightly negative, judging from his request for reasonable compensation to operators and the state government.

To test how the multinomial model scores this statement, run the following lines of code.

In [6]:
model.predict([df.iloc[2, 1]])

['anger']

In [7]:
model.predict_proba([df.iloc[2, 1]])

[{'anger': 0.3168547733448728,
  'fear': 0.14767448881320275,
  'happy': 0.13088171980427257,
  'love': 0.10666615839911558,
  'sadness': 0.19795303520701094,
  'surprise': 0.09996982443152384}]

The multinomial model labels the statement above as angry, although the anger probability is only about 0.32. The scores for a single statement shouldn’t be read in isolation: compare them with the scores of the other statements to judge which statements are relatively angry, which are relatively fearful and so on.
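
To put the numbers in context, the sketch below takes the probabilities printed by `predict_proba` above, ranks them, and compares the top score with the runner-up and with the 1/6 that a uniform guess over six labels would give. The variable names are illustrative only.

```python
# Probabilities copied from the predict_proba output above.
scores = {
    'anger': 0.3168547733448728,
    'fear': 0.14767448881320275,
    'happy': 0.13088171980427257,
    'love': 0.10666615839911558,
    'sadness': 0.19795303520701094,
    'surprise': 0.09996982443152384,
}

# Rank labels from most to least probable.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
top_label, top_score = ranked[0]
runner_up, runner_score = ranked[1]

uniform = 1 / len(scores)  # probability of each label under a uniform guess

print("top:", top_label, round(top_score, 3))
print("runner-up:", runner_up, round(runner_score, 3))
print("margin over runner-up:", round(top_score - runner_score, 3))
print("margin over uniform:", round(top_score - uniform, 3))
```

A small margin suggests the label is a weak call, which is one reason to compare statements against each other rather than trust a single label.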

Now, I’ll use the multinomial model and run it on all statements using a for loop.


In [9]:
# Collect one dict of probabilities per statement, then build the table once.
# (DataFrame.append was removed in pandas 2.0, so rows are gathered in a list.)
rows = []

for i in range(len(df)):
  rows.extend(model.predict_proba([df.iloc[i, 1]]))

emotions = pd.DataFrame(rows, columns = ['anger', 'fear', 'happy', 'love', 'sadness', 'surprise'])

emotions

Unnamed: 0,anger,fear,happy,love,sadness,surprise
0,0.231862,0.131138,0.134627,0.124979,0.244913,0.132481
1,0.238093,0.116695,0.21778,0.123474,0.171392,0.132567
2,0.316855,0.147674,0.130882,0.106666,0.197953,0.09997
3,0.252482,0.157898,0.148688,0.127589,0.187247,0.126097
4,0.306371,0.146608,0.142913,0.120837,0.148703,0.134568
5,0.299077,0.128777,0.144333,0.14017,0.14979,0.137852
6,0.170148,0.152893,0.132502,0.136038,0.271495,0.136924
7,0.143107,0.154008,0.201308,0.173618,0.134847,0.193112
8,0.145088,0.153908,0.189621,0.207773,0.141876,0.161734
9,0.141875,0.160858,0.211531,0.16627,0.125977,0.193489
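
One way to read a table like this is to pull out the highest-scoring emotion for each row. The sketch below does that with `idxmax` on a few rows copied from the table above; `emotions_sample` is a hand-built stand-in, not the full `emotions` table.

```python
import pandas as pd

labels = ['anger', 'fear', 'happy', 'love', 'sadness', 'surprise']

# Rows 0, 2, 6 and 8 copied from the table above.
emotions_sample = pd.DataFrame(
    [
        [0.231862, 0.131138, 0.134627, 0.124979, 0.244913, 0.132481],
        [0.316855, 0.147674, 0.130882, 0.106666, 0.197953, 0.09997],
        [0.170148, 0.152893, 0.132502, 0.136038, 0.271495, 0.136924],
        [0.145088, 0.153908, 0.189621, 0.207773, 0.141876, 0.161734],
    ],
    columns=labels,
    index=[0, 2, 6, 8],
)

dominant = emotions_sample.idxmax(axis=1)  # column name of the largest value per row
top_score = emotions_sample.max(axis=1)    # the largest value itself

print(pd.DataFrame({'dominant': dominant, 'score': top_score}))
```

Very short statements such as row 8 ("Jelebu.") still receive a dominant label, so it helps to look at the score alongside the label.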


The last step of the analysis would be to join the table above with the statements table to generate some insights. 

In [11]:
result = pd.concat([df, emotions], axis = 1)
result

Unnamed: 0,Speaker,Text,anger,fear,happy,love,sadness,surprise
0,Dato' Seri Haji Mukhriz Tun Dr. Mahathir [Jerlun],"minta Menteri Pelancongan, Seni dan Budaya men...",0.231862,0.131138,0.134627,0.124979,0.244913,0.132481
1,"Menteri Pelancongan, Seni dan Budaya [Datuk Mo...","Bismillahi rahmani rahim, assalamualaikum wara...",0.238093,0.116695,0.21778,0.123474,0.171392,0.132567
2,Dato' Seri Haji Mukhriz Tun Dr. Mahathir [Jerlun],Saya mengucapkan jutaan terima kasih kepada Ya...,0.316855,0.147674,0.130882,0.106666,0.197953,0.09997
3,"Menteri Pelancongan, Seni dan Budaya [Datuk Mo...",Tuan Yang di-Pertua. Untuk menjawab soalan tam...,0.252482,0.157898,0.148688,0.127589,0.187247,0.126097
4,Dato’ Seri Haji Mukhriz Tun Dr. Mahathir [Jerlun],Maksud saya mengenai pembangunan Yang Berhorma...,0.306371,0.146608,0.142913,0.120837,0.148703,0.134568
5,"Menteri Pelancongan, Seni dan Budaya [Datuk Mo...","Yang Berhormat, kita memang ada rancangan untu...",0.299077,0.128777,0.144333,0.14017,0.14979,0.137852
6,Tuan Yang di-Pertua,Soalan tambahan.,0.170148,0.152893,0.132502,0.136038,0.271495,0.136924
7,Tuan Yusuf bin Abd Wahab [Tanjong Manis],"Tanjong Manis, Tanjong Manis.",0.143107,0.154008,0.201308,0.173618,0.134847,0.193112
8,Dato’ Jalaluddin bin Alias [Jelebu],Jelebu.,0.145088,0.153908,0.189621,0.207773,0.141876,0.161734
9,Tuan Yusuf bin Abd Wahab [Tanjong Manis],"Tanjong Manis. Tuan Yang diPertua, Tanjong Man...",0.141875,0.160858,0.211531,0.16627,0.125977,0.193489
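
`pd.concat` with `axis = 1` lines rows up by index label, not by position. That works here because both tables use the default 0, 1, 2, ... index. The sketch below uses two small made-up frames (`left` and `right` are hypothetical) to show what happens when the indexes differ and how `reset_index(drop=True)` avoids it.

```python
import pandas as pd

# Hypothetical frames: one with a non-default index, one with the default index.
left = pd.DataFrame({"Text": ["a", "b", "c"]}, index=[10, 11, 12])
right = pd.DataFrame({"anger": [0.1, 0.2, 0.3]})

# Labels do not overlap, so the result has one row per distinct label, padded with NaN.
misaligned = pd.concat([left, right], axis=1)

# Resetting the index first makes the rows pair up by position.
aligned = pd.concat([left.reset_index(drop=True), right], axis=1)

print(misaligned.shape, aligned.shape)
```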


Using these scores, let’s identify the five statements with the highest scores for anger, and then do the same for happy.

In [12]:
result.nlargest(5, ['anger']) 

Unnamed: 0,Speaker,Text,anger,fear,happy,love,sadness,surprise
15,Tuan Yang di-Pertua,Silakan Yang Berhormat Jelebu.,0.323224,0.126676,0.140412,0.164801,0.109307,0.135579
18,Tuan Yang di-Pertua,Cadangan yang baik. Tuan Yang di-Pertua pun me...,0.321994,0.175013,0.121242,0.119978,0.126815,0.134959
2,Dato' Seri Haji Mukhriz Tun Dr. Mahathir [Jerlun],Saya mengucapkan jutaan terima kasih kepada Ya...,0.316855,0.147674,0.130882,0.106666,0.197953,0.09997
4,Dato’ Seri Haji Mukhriz Tun Dr. Mahathir [Jerlun],Maksud saya mengenai pembangunan Yang Berhorma...,0.306371,0.146608,0.142913,0.120837,0.148703,0.134568
5,"Menteri Pelancongan, Seni dan Budaya [Datuk Mo...","Yang Berhormat, kita memang ada rancangan untu...",0.299077,0.128777,0.144333,0.14017,0.14979,0.137852


In [13]:
result.nlargest(5, ['happy']) 

Unnamed: 0,Speaker,Text,anger,fear,happy,love,sadness,surprise
22,Dato’ Seri Dr. Shahidan bin Kassim [Arau],"Tahniah Yang Berhormat Menteri, tahniah.",0.210811,0.125986,0.25416,0.135436,0.121685,0.151923
20,Tuan Abdul Latiff bin Abdul Rahman [Kuala Krai],"Tahniah Yang Berhormat Menteri, tahniah. Saya ...",0.217794,0.131846,0.235689,0.139696,0.118446,0.15653
1,"Menteri Pelancongan, Seni dan Budaya [Datuk Mo...","Bismillahi rahmani rahim, assalamualaikum wara...",0.238093,0.116695,0.21778,0.123474,0.171392,0.132567
9,Tuan Yusuf bin Abd Wahab [Tanjong Manis],"Tanjong Manis. Tuan Yang diPertua, Tanjong Man...",0.141875,0.160858,0.211531,0.16627,0.125977,0.193489
7,Tuan Yusuf bin Abd Wahab [Tanjong Manis],"Tanjong Manis, Tanjong Manis.",0.143107,0.154008,0.201308,0.173618,0.134847,0.193112
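
The `nlargest` calls above amount to sorting by one column in descending order and keeping the first few rows. The sketch below checks this on the anger scores of rows 0 to 9 only, copied from the combined table, so its result differs from the full-data output above, which includes later rows.

```python
import pandas as pd

# Anger scores for rows 0-9, copied from the combined table above.
anger = pd.DataFrame({'anger': [0.231862, 0.238093, 0.316855, 0.252482, 0.306371,
                                0.299077, 0.170148, 0.143107, 0.145088, 0.141875]})

top3 = anger.nlargest(3, 'anger')                              # three highest anger scores
same = anger.sort_values('anger', ascending=False).head(3)     # sort, then keep first three

print(top3.index.tolist())
print(top3.equals(same))
```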
