## Getting Pocket Recommendations with Twilio, Pocket, & spaCy

I currently have 92 articles in my Pocket. I often have trouble deciding what to read next, so I end up spending more time than I wish picking an article out. Instead of wasting all that time searching, I thought, _maybe I can build a tool that does that for me?_ 

So I decided to build this idea out with Twilio. In this tutorial, we'll not only use the Twilio and Pocket APIs, but we'll also use Python's spaCy library to match a user's requested topic against the articles saved in Pocket.


### Environment Setup 

Before we get started, we have to set up our environment. This guide was written in Python 3.6. If you haven't already, download [Python](https://www.python.org/downloads/) and [Pip](https://pip.pypa.io/en/stable/installing/). Next, from the command line in your project directory, install the packages we'll use throughout this tutorial:

```
pip3 install pocket==0.3.6
pip3 install requests==2.5.0
pip3 install spacy==1.9.0
pip3 install flask
pip3 install twilio
```

We'll be using the Pocket API, which requires you first have an account. If you don’t already have one, well, first, you’ve been missing out! Secondly, sign up on their homepage [here](). This tutorial does become more exciting as you add more and more content. If you’re a new user, that’s okay! It’s up to you whether you want to add anything. 

Using [this]() link, we’ll make an application and generate API keys. You can name it whatever you want, but the platform we’ll be using is Web. For the purposes of this tutorial, we’ll also only need retrieval permission, but if you want to add more functionality for future reference, feel free!

This should redirect you to a My Applications page. Click on the application you just created, which will redirect you to a new page that contains your consumer key right at the top, as shown below.

Next, we need a request token, which we'll ultimately exchange for our access token. We can do this with [this link].
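To make the token exchange a little more concrete, here is a minimal sketch of the pieces involved, based on our understanding of Pocket's OAuth flow. The endpoint URLs, the field names, and the helper names (`request_token_payload`, `user_authorization_url`, `access_token_payload`) are assumptions for illustration, so check them against the Pocket developer documentation before relying on them. The helpers only build the data to send; they make no network calls.

```python
from urllib.parse import urlencode

# Assumed Pocket OAuth endpoints; verify against the Pocket developer docs.
POCKET_REQUEST_URL = "https://getpocket.com/v3/oauth/request"
POCKET_USER_AUTH_URL = "https://getpocket.com/auth/authorize"
POCKET_AUTHORIZE_URL = "https://getpocket.com/v3/oauth/authorize"


def request_token_payload(consumer_key, redirect_uri):
    """Step 1: body to POST to POCKET_REQUEST_URL to obtain a request token."""
    return {"consumer_key": consumer_key, "redirect_uri": redirect_uri}


def user_authorization_url(request_token, redirect_uri):
    """Step 2: URL to open in a browser so the user can approve the app."""
    query = urlencode({"request_token": request_token, "redirect_uri": redirect_uri})
    return POCKET_USER_AUTH_URL + "?" + query


def access_token_payload(consumer_key, request_token):
    """Step 3: body to POST to POCKET_AUTHORIZE_URL to get the access token."""
    return {"consumer_key": consumer_key, "code": request_token}
```

Each payload would be sent with something like `requests.post(url, json=payload, headers={"X-Accept": "application/json"})`, and the access token read from the final response.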

Finally, download spaCy's English language model, which we'll load later with `spacy.load('en')`:

```
python -m spacy download en
```

### Getting Started

Assuming you've successfully generated your Pocket API keys, we can create the Pocket client to begin. Your keys are tied to your account, so they are what give you access to the data needed for this exercise.

```python
from pocket import Pocket

# Replace these values with your own consumer key and access token.
p = Pocket(
    consumer_key='73820-b7626621174f19626b4f04fe',
    access_token='9bbd62c8-8fa2-f1ce-df16-c1d7c9'
)
```

If you look at the Pocket documentation, you'll see that the `get()` method accepts several optional parameters. For our purposes we only set `contentType` to `article`, which limits the results to articles. The function below then maps each article's resolved URL to its title and excerpt, the text we'll later mine for topics.

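```python
def get_articles():
    # Retrieve only the items that Pocket classifies as articles.
    api_call = p.get(contentType='article')
    # The first element of the response holds the items, keyed by item id.
    articles = api_call[0]['list']
    article_info = {}

    for item_id in articles:
        item = articles[item_id]
        # Map each article's URL to its title and excerpt.
        article_info[item['resolved_url']] = [item['given_title'], item['excerpt']]
    return article_info
```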
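To see what this mapping looks like without calling the API, here is a small sketch that runs the same transformation on a made-up response. The `sample_api_call` value is hypothetical: it only mirrors the fields the function above reads (`list`, `resolved_url`, `given_title`, `excerpt`), and the second tuple element stands in for response metadata that we ignore.

```python
# Hypothetical stand-in for the value returned by p.get(); the data is made up.
sample_api_call = (
    {
        "list": {
            "101": {
                "resolved_url": "https://example.com/a",
                "given_title": "Intro to NLP",
                "excerpt": "A gentle overview.",
            },
            "102": {
                "resolved_url": "https://example.com/b",
                "given_title": "Sending SMS with Twilio",
                "excerpt": "Texting from Python.",
            },
        }
    },
    {"status": "200"},  # placeholder metadata, unused here
)


def build_article_info(api_call):
    """Map each article's resolved URL to [title, excerpt]."""
    articles = api_call[0]["list"]
    return {
        item["resolved_url"]: [item["given_title"], item["excerpt"]]
        for item in articles.values()
    }


article_info = build_article_info(sample_api_call)
print(article_info["https://example.com/a"])
```

Keying by URL means the value we eventually text back to the user is already the link to the article.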


### Information Extraction

Information extraction is the process of computationally extracting meaning from text. To accomplish this, we take unstructured text and convert it into _structured_ data. There are a number of ways in which this can be done, but generally, information extraction consists of searching for specific types of entities and the relationships between those entities.

For example, consider the following text:

```
Martin received a 98% on his math exam, whereas Jacob received an 84%. Eli, who also took the same test, received an 89%. Lastly, Ojas received a 72%.
```

This is clearly unstructured. It requires reading for any logical relationships to be understood. Through the use of information extraction techniques, however, we could output structured data such as the following: 

```
Name     Grade
Martin   98
Jacob    84
Eli      89
Ojas     72
```
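As a toy illustration of that conversion, the sketch below pulls the name and grade pairs out of the example text with a hand-written regular expression. This is not how a general information extraction system works; the pattern and the `extract_grades` helper are assumptions tailored to this one sentence shape.

```python
import re

text = (
    "Martin received a 98% on his math exam, whereas Jacob received a 84%. "
    "Eli, who also took the same test, received an 89%. "
    "Lastly, Ojas received a 72%."
)

# A capitalized name, an optional ", ... ," aside, then "received a/an NN%".
GRADE_PATTERN = re.compile(r"([A-Z][a-z]+)(?:, [^.%]*?,)? received an? (\d+)%")


def extract_grades(raw_text):
    """Return (name, grade) pairs found in the text."""
    return [(name, int(grade)) for name, grade in GRADE_PATTERN.findall(raw_text)]


for name, grade in extract_grades(text):
    print(name, grade)
```

The rigid pattern breaks as soon as the wording changes, which is exactly why we reach for a trained NLP library in the next section.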

### Named Entity Extraction

Named entities are noun phrases that refer to specific individuals or things, such as organizations, people, and dates. The purpose of a named entity recognition (NER) system is to identify all textual mentions of these entities. More specifically, we'll build our named entity recognition step with `spaCy`, a Python library commonly used for natural language processing in industry.

```python
import spacy
```

Using spaCy, we'll load the built-in English tokenizer, tagger, parser, NER and word vectors. We indicate this with the parameter `'en'`:

```python
def get_entities(article_info):
    # Load spaCy's English pipeline (tokenizer, tagger, parser, NER, vectors).
    nlp = spacy.load('en')
    article_entities = {'Default': []}

    for url in article_info:
        # Combine the title and excerpt into a single string to analyze.
        doc = nlp(" ".join(article_info[url]))
        if len(doc.ents) == 0:
            # No entities found: file the article under 'Default'.
            article_entities['Default'].append(url)
            continue
        for ent in doc.ents:
            # Index the article under each lowercased entity it mentions.
            article_entities.setdefault(ent.text.lower(), []).append(url)
    return article_entities
```
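The result is an inverted index: each lowercased entity maps to the list of article URLs that mention it, with a `Default` bucket for articles where no entity was found. The sketch below shows the same structure without needing spaCy installed. The `toy_extractor` function and its `KNOWN_TOPICS` set are hypothetical stand-ins for the NER step, and the articles are made up.

```python
KNOWN_TOPICS = {"Twilio", "Python"}  # hypothetical stand-in for real NER output


def toy_extractor(text):
    """Pretend NER: return the known topics that appear as words in the text."""
    words = [word.strip(".,") for word in text.split()]
    return [word for word in words if word in KNOWN_TOPICS]


def build_entity_index(article_info, extract_entities):
    """Map each lowercased entity to the URLs of the articles mentioning it."""
    index = {"Default": []}
    for url, fields in article_info.items():
        entities = extract_entities(" ".join(fields))
        if not entities:
            index["Default"].append(url)
            continue
        for entity in entities:
            index.setdefault(entity.lower(), []).append(url)
    return index


sample_articles = {
    "https://example.com/a": ["Sending SMS with Twilio", "Texting made easy."],
    "https://example.com/b": ["Learning Python", "A primer for Twilio developers."],
    "https://example.com/c": ["ten tips for better sleep", "rest matters."],
}

entity_index = build_entity_index(sample_articles, toy_extractor)
print(entity_index)
```

Lowercasing the keys is what lets an incoming text like "twilio" or "Twilio" find the same list of articles.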

With the articles indexed by entity, all that remains is to answer incoming text messages. The Flask app below exposes an `/sms` route for Twilio to call: it reads the body of the message, looks up the requested topic among the extracted entities, and replies with a randomly chosen matching article, falling back to the `Default` list when there is no match.

```python
import random

from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse

app = Flask(__name__)


@app.route("/sms", methods=['GET', 'POST'])
def sms_reply():
    current_article_set = get_articles()
    article_entities = get_entities(current_article_set)

    # Twilio sends the text of the incoming message in the 'Body' field.
    body = (request.values.get('Body') or '').strip().lower()

    # Use the articles matching the requested topic, or fall back to 'Default'.
    candidates = article_entities.get(body) or article_entities['Default']

    resp = MessagingResponse()
    if candidates:
        # random.choice picks a valid element without any index arithmetic.
        resp.message(random.choice(candidates))
    else:
        resp.message("No matching articles found.")

    return str(resp)


if __name__ == "__main__":
    app.run(debug=True)
```
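The topic lookup can also be tried on its own, without Flask or Twilio. Below is a minimal sketch; `pick_article` is a hypothetical helper, not part of any library, and the index is made-up data. One detail worth knowing is that `random.randint(a, b)` includes `b`, so indexing a list with `randint(0, len(items))` can raise an `IndexError`; `random.choice` sidesteps that.

```python
import random


def pick_article(article_entities, body):
    """Return a random URL for the requested topic, else one from 'Default'."""
    topic = (body or "").strip().lower()
    candidates = article_entities.get(topic) or article_entities.get("Default", [])
    # None signals that there is nothing to recommend at all.
    return random.choice(candidates) if candidates else None


sample_index = {
    "Default": ["https://example.com/c"],
    "twilio": ["https://example.com/a", "https://example.com/b"],
}

print(pick_article(sample_index, "Twilio"))
print(pick_article(sample_index, "gardening"))
```

An empty or missing message body and an unknown topic both fall through to the `Default` list, so the user gets a recommendation either way.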