## Using TextRazor for Named Entities (and topics)<a name="_using textrazor for named entities (and topics)"></a>

Here is the site: https://www.textrazor.com/

You need to run `pip install textrazor` at the command line.

You may use my API key, but it only allows a limited number of queries per day. Don't abuse it.

Documentation: https://www.textrazor.com/tutorials

In [1]:
KEY = "see code in Module on BS"
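Since the key is shared and should not be pasted into a notebook that gets passed around, one option is to read it from an environment variable. The sketch below is only a suggestion: the variable name `TEXTRAZOR_API_KEY` and the helper `load_api_key` are assumptions made for this example, not something the TextRazor client requires.

```python
import os


def load_api_key(env_var="TEXTRAZOR_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use any name you like.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (commented out so this block runs even when no key is set):
# KEY = load_api_key()
```

Set the variable in your shell before starting the notebook, then assign the result to `KEY` as in the cell above.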

In [2]:
import textrazor

import nlp_utilities as mytools

In [3]:
textrazor.api_key = KEY

In [4]:
client = textrazor.TextRazor(extractors=["entities", "topics"])
client.set_classifiers(["textrazor_newscodes"])

In [7]:
files = mytools.get_filenames("data/movie_reviews/positive/")

In [8]:
files

['data/movie_reviews/positive/cv670_tok-24009.txt',
 'data/movie_reviews/positive/cv671_tok-10077.txt',
 'data/movie_reviews/positive/cv672_tok-12350.txt',
 'data/movie_reviews/positive/cv673_tok-6552.txt',
 'data/movie_reviews/positive/cv674_tok-11591.txt',
 'data/movie_reviews/positive/cv675_tok-11864.txt',
 'data/movie_reviews/positive/cv676_tok-19999.txt',
 'data/movie_reviews/positive/cv677_tok-11867.txt',
 'data/movie_reviews/positive/cv678_tok-24352.txt',
 'data/movie_reviews/positive/cv679_tok-13972.txt',
 'data/movie_reviews/positive/cv680_tok-18142.txt',
 'data/movie_reviews/positive/cv681_tok-28559.txt',
 'data/movie_reviews/positive/cv682_tok-21593.txt',
 'data/movie_reviews/positive/cv683_tok-12295.txt',
 'data/movie_reviews/positive/cv684_tok-10367.txt',
 'data/movie_reviews/positive/cv685_tok-11187.txt',
 'data/movie_reviews/positive/cv686_tok-22284.txt',
 'data/movie_reviews/positive/cv687_tok-20347.txt',
 'data/movie_reviews/positive/cv688_tok-10047.txt',
 'data/movie_

In [9]:
texts = mytools.load_texts_as_string(files)
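`nlp_utilities` is a local helper module that is not shown here. If you do not have it, the sketch below is a rough stand-in. It assumes, based on how `files` and `texts` are used in this notebook, that the first helper returns a list of file paths and the second returns a dictionary mapping each path to that file's contents. The names `list_text_files` and `read_texts` are hypothetical and are not part of the real module.

```python
import os
import tempfile


def list_text_files(folder):
    """Return the sorted paths of all .txt files directly inside folder."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(".txt")
    )


def read_texts(paths):
    """Return a dict mapping each path to the file's contents as one string."""
    texts = {}
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            texts[path] = handle.read()
    return texts


# Small demo on a throwaway folder (not the movie review data).
demo_folder = tempfile.mkdtemp()
for name, body in [("b.txt", "second review"), ("a.txt", "first review"), ("notes.md", "skip me")]:
    with open(os.path.join(demo_folder, name), "w", encoding="utf-8") as handle:
        handle.write(body)

demo_files = list_text_files(demo_folder)
demo_texts = read_texts(demo_files)
print(demo_texts[demo_files[0]])
```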

In [10]:
texts[files[0]]

'rated on a 4-star scale screening venue : odoen ( liverpool city centre ) released in the uk by uip on april 7 , 2000 ; certificate 15 ; 126 minutes ; country of origin usa ; aspect ratio 1 . 85 : 1 directed by stephen soderbergh ; produced by danny devito , michael shamberg , stacey sher . written by susannah grant . photographed by ed lachmann ; edited by anne v . coates . it\'s astounding , how many of us turn into lawyers when we get wrapped up in cases . i remember the louise woodward affair , for example , when i knew all the evidence from the courtroom pictures on tv , and was able to reel off detailed arguments crushing people who had just looked at her and assumed she was guilty . passion about justice is what makes people consider law careers ; it chills me when i see sell-outs trying to condemn obviously innocent people , or using dishonest tactics to help the guilty . stephen soderbergh\'s " erin brockovich " is a brilliant story of anger turning into courtroom skills ; a 

In [11]:
response = client.analyze(texts[files[0]])
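Because the key only allows a limited number of queries per day, it can help to avoid sending the same text twice while you experiment. The sketch below caches results on disk, keyed by a hash of the text. The helper `cached_call` is hypothetical, and it assumes the function you pass in returns something JSON-serializable (for example a list of entity ids), not the response object itself.

```python
import hashlib
import json
import os
import tempfile


def cached_call(text, compute, cache_dir):
    """Return compute(text), reusing a saved result if this text was seen before."""
    os.makedirs(cache_dir, exist_ok=True)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, digest + ".json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    result = compute(text)  # only reached on a cache miss
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(result, handle)
    return result


# Demo with a stand-in function, so no API query is made here.
calls = []


def fake_compute(text):
    calls.append(text)
    return {"length": len(text)}


demo_dir = tempfile.mkdtemp()
first = cached_call("erin brockovich", fake_compute, demo_dir)
second = cached_call("erin brockovich", fake_compute, demo_dir)
print(first == second, len(calls))
```

With the real client you might pass something like `lambda t: [e.id for e in client.analyze(t).entities()]` as `compute`, so that only the first run for each review counts against the daily limit.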

In [12]:
list(response.entities())

[TextRazor Entity b'louise woodward' at positions [99, 100],
 TextRazor Entity b'ed lachmann' at positions [67, 68],
 TextRazor Entity b'coates' at positions [75],
 TextRazor Entity b'anne v' at positions [72, 73],
 TextRazor Entity b'Michael Shamberg' at positions [54, 55],
 TextRazor Entity b'roberts' at positions [693],
 TextRazor Entity b'United International Pictures' at positions [19],
 TextRazor Entity b'Aspect ratio' at positions [37, 38],
 TextRazor Entity b'Liverpool' at positions [10],
 TextRazor Entity b'4-star' at positions [3],
 TextRazor Entity b'odoen -LRB- liverpool' at positions [8, 9, 10],
 TextRazor Entity b'Solicitor' at positions [324],
 TextRazor Entity b'2000-04-07T00:00:00.000-04:00' at positions [21, 22, 23, 24],
 TextRazor Entity b'Edward L. Masry' at positions [329, 330],
 TextRazor Entity b'Albert Finney' at positions [332, 333],
 TextRazor Entity b'erin' at positions [382],
 TextRazor Entity b'david' at positions [194],
 TextRazor Entity b'Erin Brockovich 

In [13]:
entities = list(response.entities())
entities.sort(key=lambda x: x.relevance_score, reverse=True)
seen = set()
for entity in entities:
    if entity.id not in seen:
        print(entity.id, entity.relevance_score, entity.confidence_score, entity.freebase_types)
        seen.add(entity.id)

Erin Brockovich (film) 0.7564 13.43 ['/award/award_winning_work', '/award/award_nominated_work', '/award/ranked_item', '/film/film', '/media_common/netflix_title']
Pacific Gas and Electric Company 0.4017 2.287 ['/business/employer', '/business/business_operation', '/exhibitions/exhibition_sponsor', '/business/issuer', '/organization/organization']
Lawsuit 0.3914 7.133 ['/book/book_subject', '/media_common/quotation_subject', '/internet/website_category']
Steven Soderbergh 0.3777 6.674 ['/award/award_winner', '/film/cinematographer', '/people/person', '/book/author', '/tv/tv_program_creator', '/film/writer', '/film/editor', '/tv/tv_director', '/organization/organization_founder', '/celebrities/celebrity', '/film/person_or_entity_appearing_in_film', '/award/award_nominee', '/film/actor', '/tv/tv_actor', '/film/producer', '/film/director', '/theater/theater_director', '/tv/tv_producer']
Lawyer 0.3547 5.662 ['/book/book_subject', '/media_common/quotation_subject', '/business/industry', '/o
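The `freebase_types` lists make it possible to keep only certain kinds of entities, such as people. The sketch below shows one way to do that. The helper `entities_of_type` is hypothetical, and the demo uses stand-in objects with the same attribute names as above (values copied or abbreviated from the output) so that it runs without an API query.

```python
from types import SimpleNamespace


def entities_of_type(entities, type_prefix):
    """Return unique entities (by id) with a Freebase type starting with
    type_prefix, most relevant first."""
    seen = set()
    result = []
    for entity in sorted(entities, key=lambda e: e.relevance_score, reverse=True):
        if entity.id in seen:
            continue
        types = entity.freebase_types or []
        if any(t.startswith(type_prefix) for t in types):
            result.append(entity)
            seen.add(entity.id)
    return result


# Stand-in entities for the demo; the real ones come from response.entities().
demo_entities = [
    SimpleNamespace(id="Lawsuit", relevance_score=0.3914,
                    freebase_types=["/book/book_subject"]),
    SimpleNamespace(id="Steven Soderbergh", relevance_score=0.3777,
                    freebase_types=["/people/person", "/film/director"]),
    SimpleNamespace(id="Steven Soderbergh", relevance_score=0.3777,
                    freebase_types=["/people/person", "/film/director"]),
    SimpleNamespace(id="Erin Brockovich (film)", relevance_score=0.7564,
                    freebase_types=["/film/film"]),
]

people = entities_of_type(demo_entities, "/people/")
print([e.id for e in people])
```

On the real data you would call `entities_of_type(response.entities(), "/people/")`.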

In [14]:
for topic in response.topics():
    if topic.score > 0.5:
        print(topic.label, topic.score)

Erin Brockovich (film) 1
Erin Brockovich 0.6715
Steven Soderbergh 0.6689
Pacific Gas and Electric Company 0.6487
Lawyer 0.6281
Law 0.6175
Lawsuit 0.6075
Justice 0.5833
Social institutions 0.5271
Government 0.5047


In [15]:
for category in response.categories():
    print(category.category_id, category.label, category.score)

02002001 crime, law and justice>judiciary (system of justice)>lawyer 0.5589
02008001 crime, law and justice>trials>litigation 0.5073
02002000 crime, law and justice>judiciary (system of justice) 0.5013
01005000 arts, culture and entertainment>cinema 0.4047
02006000 crime, law and justice>laws 0.3924
02008000 crime, law and justice>trials 0.3915
02000000 crime, law and justice 0.3867
02009000 crime, law and justice>prosecution 0.3845
01013000 arts, culture and entertainment>photography 0.3839
04010003 economy, business and finance>media>cinema industry 0.3502
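The category labels above are hierarchical, with levels separated by `>`. If you only care about the broad subject, you can split the label and keep the best score per top-level category. This is a small sketch with hypothetical helper names; the demo pairs are copied from the output above.

```python
def split_category(label):
    """Split a label such as 'a>b>c' into its levels."""
    return [part.strip() for part in label.split(">")]


def top_level_scores(labelled_scores):
    """Map each top-level category to the highest score seen beneath it."""
    best = {}
    for label, score in labelled_scores:
        top = split_category(label)[0]
        best[top] = max(score, best.get(top, 0.0))
    return best


# (label, score) pairs copied from the output above.
demo_categories = [
    ("crime, law and justice>judiciary (system of justice)>lawyer", 0.5589),
    ("crime, law and justice>trials>litigation", 0.5073),
    ("arts, culture and entertainment>cinema", 0.4047),
    ("economy, business and finance>media>cinema industry", 0.3502),
]

summary = top_level_scores(demo_categories)
for name, score in summary.items():
    print(name, score)
```

On the real response, the pairs could be built with `[(c.label, c.score) for c in response.categories()]`.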


# Twitter API<a name="_twitter api"></a>

If you want to use Twitter's API to collect your own tweets about a subject, here is a good tutorial:

http://socialmedia-class.org/twittertutorial.html

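It requires you to create your own API keys with your own Twitter account.

Note: You can run `pip install twitter` instead of following the setup.py instructions shown there. The code below uses the tweepy library rather than the package from that tutorial; install it with `pip install tweepy`.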

In [20]:
import json
import tweepy


# Variables that contain the user credentials to access the Twitter API.
# Register for these at apps.twitter.com. You need to register to get your own.
ACCESS_TOKEN = 'yours here'
ACCESS_SECRET = 'yours here'
CONSUMER_KEY = 'yours here'
CONSUMER_SECRET = 'yours here'

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

api = tweepy.API(auth)


In [23]:
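query = '#python'
max_tweets = 10
# Cursor handles pagination; items() stops after max_tweets results.
# Note: tweepy 4.x renamed api.search to api.search_tweets.
searched_tweets = list(tweepy.Cursor(api.search, q=query).items(max_tweets))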

In [24]:
searched_tweets

[Status(metadata={'result_type': 'recent', 'iso_language_code': 'en'}, user=User(statuses_count=328, profile_background_image_url='http://abs.twimg.com/images/themes/theme1/bg.png', profile_text_color='333333', name='Rubén Valseca', profile_use_background_image=True, friends_count=185, default_profile=True, contributors_enabled=False, profile_image_url_https='https://pbs.twimg.com/profile_images/670737658444926976/f19pwCqw_normal.jpg', notifications=False, followers_count=216, id=885512094, translator_type='none', profile_background_color='C0DEED', time_zone=None, geo_enabled=False, protected=False, location='Madrid', _json={'statuses_count': 328, 'entities': {'description': {'urls': []}}, 'default_profile': True, 'profile_text_color': '333333', 'name': 'Rubén Valseca', 'profile_use_background_image': True, 'notifications': False, 'profile_link_color': '1DA1F2', 'friends_count': 185, 'profile_image_url': 'http://pbs.twimg.com/profile_images/670737658444926976/f19pwCqw_normal.jpg', 'con

In [30]:
# You will then have to parse the results to get the text you want.

for status in searched_tweets:
    print("User: " + status.user.screen_name, "Tweet:" + status.text)

User: rubnvp Tweet:RT @_pi0_: This is not an April fools joke! You can build @vuejs and @nuxt_js apps using #Python / HTML and CSS using this module 😲 Made by…
User: pythonbot_ Tweet:Mastering Social Media Mining With Python  https://t.co/yONK09frgl #learning #python #pythonbot_
User: LearntoPython Tweet:Python for Financial Analysis and Algorithmic Trading
☞ https://t.co/ob5qrSUz1a
#python
S1rhTw2IG https://t.co/4ImFjKJ9Lg
User: byLilyV Tweet:40 #bestseller #udemy #courses

13. #Machine Learning A-Z™: Hands-On #Python &amp; #R In #Data #Science… https://t.co/zNOX8lhuy7
User: Python_Udemy Tweet:Computer Programming For Beginners Learn Python Programming
☞ https://t.co/lkgQUAQnx9
#Python
BJbmiHFqLM https://t.co/XFwdm5yXYt
User: 4383hberaud Tweet:RT @juldanjou: Wondering how you can make your application 6x time faster with asyncio? Here's an example:
https://t.co/D9QEQrMg8Y
#python…
User: Kholdo Tweet:Segundo post!, continuando con la agrupación de datos con Python. https://t.co/rSOhKN2
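
The raw tweet text above still contains HTML entities such as `&amp;`, embedded newlines, and shortened `t.co` links. Before feeding tweets to other tools you may want to normalize them. The sketch below is one possible cleanup; the helper name `clean_tweet` and the choice to drop links entirely are assumptions made for this example.

```python
import html
import re

# Matches http:// or https:// links up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def clean_tweet(text):
    """Decode HTML entities, drop links, and collapse all whitespace."""
    text = html.unescape(text)        # "&amp;" becomes "&"
    text = URL_PATTERN.sub("", text)  # remove shortened links
    return " ".join(text.split())     # newlines and repeated spaces become one space


print(clean_tweet("Hands-On #Python &amp; #R\nhttps://t.co/zNOX8lhuy7"))
```

With the results above you could apply it as `clean_tweet(status.text)` for each status in `searched_tweets`.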

What's in Twitter data: https://dev.twitter.com/overview/api/tweets

Incredibly cool: https://dev.twitter.com/overview/api/entities-in-twitter-objects
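
Each tweet returned by the search API carries an `entities` dictionary with the hashtags, user mentions, and URLs already parsed out, so you do not need regular expressions to find them. The sketch below pulls those out of a tweet's JSON. The helper name `extract_entities` is hypothetical, and the sample tweet is made up for the demo rather than taken from the results above.

```python
def extract_entities(tweet_json):
    """Return the hashtags, mentions, and URLs listed in a tweet's JSON dict."""
    entities = tweet_json.get("entities", {})
    return {
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "urls": [u.get("expanded_url", u.get("url")) for u in entities.get("urls", [])],
    }


# A made-up tweet in the same shape as the API's JSON.
sample_tweet = {
    "text": "Learning #python with @example_user https://t.co/abc",
    "entities": {
        "hashtags": [{"text": "python", "indices": [9, 16]}],
        "user_mentions": [{"screen_name": "example_user", "indices": [22, 35]}],
        "urls": [{"url": "https://t.co/abc",
                  "expanded_url": "https://example.com/post",
                  "indices": [36, 52]}],
    },
}

print(extract_entities(sample_tweet))
```

For the tweepy results above, the JSON dict is available as `status._json`, so `extract_entities(status._json)` should work on each item in `searched_tweets`.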