Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
187 lines (124 sloc) 6.9 KB

Sentiment

Polyglot has polarity lexicons for 136 languages. The scale of the words' polarity consisted of three degrees: +1 for positive words, and -1 for negatives words. Neutral words will have a score of 0.

Languages Coverage

from polyglot.downloader import downloader
print(downloader.supported_languages_table("sentiment2", 3))
  1. Turkmen                    2. Thai                       3. Latvian
  4. Zazaki                     5. Tagalog                    6. Tamil
  7. Tajik                      8. Telugu                     9. Luxembourgish, Letzeb...
 10. Alemannic                 11. Latin                     12. Turkish
 13. Limburgish, Limburgan...  14. Egyptian Arabic           15. Tatar
 16. Lithuanian                17. Spanish; Castilian        18. Basque
 19. Estonian                  20. Asturian                  21. Greek, Modern
 22. Esperanto                 23. English                   24. Ukrainian
 25. Marathi (Marāṭhī)         26. Maltese                   27. Burmese
 28. Kapampangan               29. Uighur, Uyghur            30. Uzbek
 31. Malagasy                  32. Yiddish                   33. Macedonian
 34. Urdu                      35. Malayalam                 36. Mongolian
 37. Breton                    38. Bosnian                   39. Bengali
 40. Tibetan Standard, Tib...  41. Belarusian                42. Bulgarian
 43. Bashkir                   44. Vietnamese                45. Volapük
 46. Gan Chinese               47. Manx                      48. Gujarati
 49. Yoruba                    50. Occitan                   51. Scottish Gaelic; Gaelic
 52. Irish                     53. Galician                  54. Ossetian, Ossetic
 55. Oriya                     56. Walloon                   57. Swedish
 58. Silesian                  59. Lombard language          60. Divehi; Dhivehi; Mald...
 61. Danish                    62. German                    63. Armenian
 64. Haitian; Haitian Creole   65. Hungarian                 66. Croatian
 67. Bishnupriya Manipuri      68. Hindi                     69. Hebrew (modern)
 70. Portuguese                71. Afrikaans                 72. Pashto, Pushto
 73. Amharic                   74. Aragonese                 75. Bavarian
 76. Assamese                  77. Panjabi, Punjabi          78. Polish
 79. Azerbaijani               80. Italian                   81. Arabic
 82. Icelandic                 83. Ido                       84. Scots
 85. Sicilian                  86. Indonesian                87. Chinese Word
 88. Interlingua               89. Waray-Waray               90. Piedmontese language
 91. Quechua                   92. French                    93. Dutch
 94. Norwegian Nynorsk         95. Norwegian                 96. Western Frisian
 97. Upper Sorbian             98. Nepali                    99. Persian
100. Ilokano                  101. Finnish                  102. Faroese
103. Romansh                  104. Javanese                 105. Romanian, Moldavian, ...
106. Malay                    107. Japanese                 108. Russian
109. Catalan; Valencian       110. Fiji Hindi               111. Chinese
112. Cebuano                  113. Czech                    114. Chuvash
115. Welsh                    116. West Flemish             117. Kirghiz, Kyrgyz
118. Kurdish                  119. Kazakh                   120. Korean
121. Kannada                  122. Khmer                    123. Georgian
124. Sakha                    125. Serbian                  126. Albanian
127. Swahili                  128. Chechen                  129. Sundanese
130. Sanskrit (Saṁskṛta)      131. Venetian                 132. Northern Sami
133. Slovak                   134. Sinhala, Sinhalese       135. Bosnian-Croatian-Serbian
136. Slovene
from polyglot.text import Text

Polarity

To inquiry the polarity of a word, we can just call its own attribute polarity

text = Text("The movie was really good.")
print("{:<16}{}".format("Word", "Polarity")+"\n"+"-"*30)
for w in text.words:
    print("{:<16}{:>2}".format(w, w.polarity))
Word            Polarity
------------------------------
The              0
movie            0
was              0
really           0
good             1
.                0

Entity Sentiment

We can calculate a more sphosticated sentiment score for an entity that is mentioned in text as the following:

blob = ("Barack Obama gave a fantastic speech last night. "
        "Reports indicate he will move next to New Hampshire.")
text = Text(blob)

First, we need split the text into sentneces, this will limit the words tha affect the sentiment of an entity to the words mentioned in the sentnece.

first_sentence = text.sentences[0]
print(first_sentence)
The movie was really good.

Second, we extract the entities

first_entity = first_sentence.entities[0]
print(first_entity)
[u'Obama']

Finally, for each entity we identified, we can calculate the strength of the positive or negative sentiment it has on a scale from 0-1

first_entity.positive_sentiment
0.9375
first_entity.negative_sentiment
0

Citation

This work is a direct implementation of the research being described in the Building sentiment lexicons for all major languages paper. The author of this library strongly encourage you to cite the following paper if you are using this software.

@inproceedings{chen2014building,
title={Building sentiment lexicons for all major languages},
author={Chen, Yanqing and Skiena, Steven},
booktitle={Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers)},
pages={383--389},
year={2014}}