## All about (or a lot about... or some about) bots!

Today we wanted to give you a little overview of the kinds of bots that are circulating in the world, providing a taxonomy, if you will, of all the innovative work being done. But first, what is a bot? 

We'll draw mostly from [botnerds.com](http://botnerds.com) and the [article by Mark Sample](https://medium.com/@samplereality/a-protest-bot-is-a-bot-so-specific-you-cant-mistake-it-for-bullshit-90fe10b7fbaa#.ikymcdf6q). The former reference offers a practical definition...
>Bots are software programs that perform automated, repetitive, pre-defined tasks.  These tasks can include almost any interaction with software that has an API.

...while Mark Sample is more aspirational.
>A computer program that reveals the injustice and inequality of the world and imagines alternatives. A computer program that says who’s to praise and who’s to blame. A computer program that questions how, when, who and why. A computer program whose indictments are so specific you can’t mistake them for bullshit. A computer program that does all this automatically.

![Bot](http://21clradio.com/wp-content/uploads/2016/02/Kiddle-Logo.png)

First, the practical. Our first reference divides bots into "Good" and "Bad" categories (sorry Kasiana), which are interpreted largely in terms of their effect on our information ecosystem -- functionally, adding or detracting.

1. Good Bots
    * **Chatbots** "are designed to carry on conversations with humans, usually just for fun, and to test the limits of the technology."
    * **Crawlers** "run continuously in the background, primarily fetch data from other APIs or websites, and are *well-behaved* in that they respect directives you give them"
    * **Transactional bots** "act as agents on behalf of humans, and interact with external systems to accomplish a specific transaction, moving data from one platform to another."
    * **Informational bots** "surface helpful information, often as push notifications, and are also called *news bots*."
    * **Art bots** "are designed to be appreciated aesthetically."
    * **Game bots** "game bots function as characters, often for humans to play against or to practice and develop skills..."
2. Bad Bots
    * **Hackers** "distribute malware of all kinds"
    * **Spammers** post "promotional content around the web, and ultimately drive traffic to the spammer’s website"
    * **Scrapers** "steal content (email addresses, images, text, etc) from other website", often to republish it
    * **Impersonators** "mimic natural user characteristics, making them hard to identify" (they cite [political propaganda bots](http://www.businessinsider.com/political-bots-by-governments-around-the-world-2015-12/#mexico-1))
    
In terms of Bot Agency or Bot Intelligence, this framing presents examples along a spectrum -- Script Bots, Smart Bots and Intelligent Agents. An interaction with a Script Bot, they write, is

>based off of a pre-determined model (the “script”) that determines what the bot can and cannot do.  The “script” is a decision tree where responding to one question takes you down a specific path, which opens up a new, pre-determined set of possibilities. 
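
To make the "script as decision tree" idea concrete, here is a minimal sketch. The `SCRIPT` dictionary, its prompts and the `run_script` function are all made up for illustration; they are not taken from botnerds.com or from any real bot.

```python
# A tiny hypothetical "script": each node has a prompt and the replies it accepts.
SCRIPT = {
    'start': {
        'prompt': 'Do you want news or art?',
        'choices': {'news': 'news', 'art': 'art'},
    },
    'news': {
        'prompt': 'Here is a headline. Bye!',
        'choices': {},
    },
    'art': {
        'prompt': 'Here is a picture. Bye!',
        'choices': {},
    },
}


def run_script(script, replies, start='start'):
    """Walk the decision tree, returning everything the bot would say."""
    node = start
    said = [script[node]['prompt']]
    for reply in replies:
        choices = script[node]['choices']
        if reply not in choices:
            # the script has no branch for this reply, so the bot is stuck
            said.append("Sorry, I don't understand.")
            continue
        # follow the branch to the next pre-determined node
        node = choices[reply]
        said.append(script[node]['prompt'])
    return said


print(run_script(SCRIPT, ['art']))
```

Anything the script does not anticipate ends in "Sorry, I don't understand.", which is exactly the limitation that Smart Bots try to get past.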

Upping the autonomy a little, Smart Bots have access to other APIs that expand the universe of responses.

>Many bots have a heavy server-side processing component, which allows them access to massive computing power in understanding and responding to queries.  Couple that with the open-sourcing of AI software libraries like Theano and TensorFlow, and you have the ingredients for some amazing human-bot interactions.

This category also allows for human-assisted interactions. The bot need not act alone, but can invoke human intelligence or even direct consultation, redirecting the interaction to a responsible human. Finally, the Intelligent Agent is meant to act autonomously.

> If operating correctly, they should require no human intervention in order to perform their tasks correctly.  Google’s self-driving cars are designed without steering wheels for humans, because they shouldn’t be necessary.  x.ai has a bot that schedules meetings for you, Amy Ingram, and she manages all the back-and-forth with zero oversight.

Mark Sample provides a different, less practical characterization of bots, one drawn more from literary studies (where bots have been an object of fascination for some time). He focuses on one particular kind of bot (primarily active on Twitter) that he terms "Protest Bots" or "Bots of Conviction". Sample says they share at least five characteristics: 

* **Topical** - "They are about the morning news — and the daily horrors that fail to make it into the news."
* **Data-based** - "They don’t make this [stuff] up. They draw from research, statistics, spreadsheets, databases."
* **Cumulative** - "It is the nature of bots to do the same thing over and over again, with only slight variation...  The repetition builds on itself, the bot relentlessly riffing on its theme, unyielding and overwhelming, a pile-up of wreckage on our screens."
* **Oppositional** - "[B]ots of conviction challenge us to consider our own complicity in the wrongs of the world."
* **Uncanny** - "Protest bots often reveal something that was hidden; or conversely, they might purposefully obscure something that had been in plain sight."

The Protest Bots that Sample introduces are sometimes journalistic, but more often fall in the realm of advocacy. Still, the examples open up kinds of projects or actions that lie outside the more functional description of our first reference.

This weekend, you will be imagining and making your own bots and we hope this outline has helped prime the pump. We will be on this topic for another week or so and will, with Suman, venture into the area of conversational bots.




## Creating Our Bot Account & App

**1) Create a New Twitter User for Your Bot**

Before we get started, you'll want to create a new Twitter account for your bot! It's best to not create a bot out of your personal Twitter account, but I will leave that up to you!

Go to [https://twitter.com/signup](https://twitter.com/signup) (you'll have to log out of your normal account or go incognito) and create a new account. **You will need to use your phone number when signing up** or you won't be able to create a new Twitter app in the next step.


**2) Create a New Twitter App for Your Bot**

You are a pro at this! Once you have created your new Twitter account, create a new Twitter app for your bot.

1. Go to [https://apps.twitter.com](https://apps.twitter.com/) and log in with your *bot's* Twitter user account (the one you just created).
2. Click “Create New App”
3. Fill out the form, agree to the terms, and click “Create your Twitter application”
4. Click on “Keys and Access Tokens” tab, and copy your “API key” and “API secret”. Scroll down and click “Create my access token”, and copy your “Access token” and “Access token secret”.

Once you have your tokens, copy them below.

In [None]:
# insert your own keys and secrets here...or just use Mark's! he won't mind
CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""

In [None]:
# before we can make Twitter API calls, we need to initialize a few things...
from tweepy import OAuthHandler, API

# setup the authentication
auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# create an object we will use to communicate with the Twitter API
api = API(auth)

# call the "me" api to make sure you are using the Twitter api as your bot
print 'Ok, we are ready to tweet as ' + api.me().screen_name
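
If the authentication step fails, a common cause is a key or secret that was never pasted in. Here is a small sketch of a check you could run first. The `missing_credentials` helper is hypothetical (it is not part of tweepy), and the values in `example` are placeholders.

```python
def missing_credentials(credentials):
    """Return the names of any credentials that are empty or only whitespace."""
    return sorted(name for name, value in credentials.items() if not value.strip())


# placeholder values for illustration only; use your own keys and secrets
example = {
    'CONSUMER_KEY': 'abc123',
    'CONSUMER_SECRET': '',
    'ACCESS_TOKEN': 'xyz789',
    'ACCESS_TOKEN_SECRET': '   ',
}

missing = missing_credentials(example)
if missing:
    print('Still need to fill in: ' + ', '.join(missing))
else:
    print('All four credentials are set')
```

In the notebook you would build the dictionary from your own `CONSUMER_KEY`, `CONSUMER_SECRET`, `ACCESS_TOKEN` and `ACCESS_TOKEN_SECRET` variables instead of the placeholders.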

### Send a Tweet From Our Bot!

We will be using the `statuses/update` api to send the tweet: https://dev.twitter.com/rest/reference/post/statuses/update

In [None]:
# your bot's first tweet
api.update_status(status='hello world!')

## Great! Now let's take a quick detour....

In a few minutes, we are going to create a simple "news" bot: a bot that tweets out the latest stories from The New York Times. But, how are we going to get the NYTimes latest stories? Let's turn to [RSS](https://en.wikipedia.org/wiki/RSS).

Most news sites on the internet publish an RSS feed. Here are a few:

[New York Times "HomePage"](http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml)

[Wired](https://www.wired.com/feed/)

[WNYC Radio Lab Podcast](http://feeds.wnyc.org/radiolab)

To use RSS feeds in our code, we're going to use the Python module called [Feedparser](https://pypi.python.org/pypi/feedparser). `Feedparser` does the hard work of fetching and parsing the feeds for us. RSS feeds can be very messy and this module does an amazing job of dealing with the mess and handing us a nice Python object (`dictionary`) to work with!

Let's install the module and start working with some RSS feeds.

Use the following to install the `feedparser` module:

In [None]:
%%bash
pip install feedparser

Now let's look at a short example of how we can fetch the RSS feed for The New York Times "HomePage" stories.

The "HomePage" RSS feed can be found here: [http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml](http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml)

The code below uses the `feedparser` module to fetch the RSS feed (remember HTTP requests?), parse it and return it as a Python dictionary. This module does the hard work for us.

In [None]:
# let's fetch the New York Times Homepage RSS Feed
import feedparser

# the URL of the homepage stories RSS feed
nytimes_rss_url = 'http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml'

# fetch the RSS feed and parse it
feed = feedparser.parse(nytimes_rss_url)

# what type of object are we dealing with?
print type(feed)
print feed.keys()

Let's look at the `feed` information...

In [None]:
print feed['feed']

Now, let's look at the `entries`, which is a list of the stories found in the feed:

In [None]:
# now, let's print out the stories (titles and urls) in the RSS feed
for entry in feed['entries']:
    print entry['title']
    print entry['link']
    print '--'
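
To get a feel for what `feedparser` is sparing us, here is a rough sketch that pulls titles and links out of a tiny RSS document by hand, using only the standard library. `SAMPLE_RSS` is a made-up feed and `parse_items` is a hypothetical helper. This is not how `feedparser` works internally, and it would break on many real (messy) feeds.

```python
import xml.etree.ElementTree as ET

# a made-up, minimal RSS 2.0 document (real feeds are much messier)
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First story</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def parse_items(rss_text):
    """Pull the title and link out of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter('item'):
        entries.append({
            'title': item.findtext('title'),
            'link': item.findtext('link'),
        })
    return entries


# same loop shape as the feedparser example
for entry in parse_items(SAMPLE_RSS):
    print(entry['title'])
    print(entry['link'])
    print('--')
```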

Let's take a break from "news" and look at the RSS feed for [Atlas Obscura](http://www.atlasobscura.com/) (a lovely site about travel).

In [None]:
import feedparser

rss_url = 'http://www.atlasobscura.com/feeds/latest'
feed = feedparser.parse(rss_url)

for entry in feed['entries']:
    print entry['title']
    print entry['link']
    print '---'

## OK, Let's Get Back to Our Bot!

If we wanted to tweet out the latest story from Atlas Obscura, we combine our Twitter and our RSS/feedparser examples:

In [None]:
from tweepy import OAuthHandler, API
import feedparser

# setup the authentication
auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# create an object we will use to communicate with the Twitter API
api = API(auth)

# now, get the Atlas Obscura feed
rss_url = 'http://www.atlasobscura.com/feeds/latest'
feed = feedparser.parse(rss_url)

# let's take only the 1st story in our list
first_story = feed['entries'][0]

# now, create the text of the tweet using the story title and link/url
tweet_text = 'This is really interesting! ' + first_story['title'] + ' ' + first_story['link']
print tweet_text

In [None]:
# and now, tweet it!
api.update_status(status=tweet_text)
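
Headlines can be long, and Twitter rejects tweets that go over its character limit. Below is a minimal sketch of a helper that trims the title when needed. `build_tweet` is a hypothetical function, and the defaults are assumptions: a 280-character limit and links counted as 23 characters each (because Twitter shortens them). Check Twitter's current documentation before relying on either number.

```python
def build_tweet(comment, title, link, limit=280, link_length=23):
    """Combine comment, title and link, trimming the title if the tweet is too long.

    limit and link_length are assumptions about Twitter's rules; adjust as needed.
    """
    # room left for the title after the comment, the link and two spaces
    room = limit - len(comment) - link_length - 2
    if len(title) > room:
        # cut the title and mark the cut with '...'
        title = title[:max(room - 3, 0)].rstrip() + '...'
    return comment + ' ' + title + ' ' + link


# made-up title and link for illustration
print(build_tweet('This is really interesting!', 'A Short Title', 'http://example.com/x'))
```

You could then pass the result to `api.update_status(status=...)` just as before.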

## Next Up: Let's Clone the @twoheadlines Bot

[Darius Kazemi](https://twitter.com/tinysubversions) created a clever bot called [@twoheadlines](https://twitter.com/twoheadlines) where he combines two different headlines into a single tweet:

> Comedy is when you take two headlines about different things and then confuse them

Let's do a simple clone of the `@twoheadlines` bot by combining the first half of a New York Times headline with the second half of a Breitbart headline :-)

In [None]:
import feedparser

# fetch the nytimes and breitbart RSS feeds
nytimes_rss_url = 'http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml'
breitbart_rss_url = 'http://feeds.feedburner.com/breitbart'

nytimes_feed = feedparser.parse(nytimes_rss_url)
breitbart_feed = feedparser.parse(breitbart_rss_url)

# get the first story from each of the two feeds
nytimes_first_story = nytimes_feed['entries'][0]
breitbart_first_story = breitbart_feed['entries'][0]

print 'nyt: '+ nytimes_first_story['title']
print 'b: ' + breitbart_first_story['title']

In [None]:
# combine the two headlines into a single headline
nytimes_words = nytimes_first_story['title'].split(' ')
breitbart_words = breitbart_first_story['title'].split(' ')

# take the 1st half of the nytimes "words" plus the second half of the breitbart "words"
# (// is integer division, so the result is always a valid list index)
new_words = nytimes_words[:len(nytimes_words)//2] + breitbart_words[len(breitbart_words)//2:]

# this is python weirdness to take a list of words
# and join them together with a space between each word
new_headline = ' '.join(new_words)
print new_headline
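
If you want to reuse the mash-up with other feeds, one option is to wrap it in a function. This is just a sketch: `mash_headlines` is a name made up here, and the two headlines are invented for illustration.

```python
def mash_headlines(first, second):
    """Join the first half of one headline to the second half of another."""
    first_words = first.split()
    second_words = second.split()
    # // is integer division, so the result is a valid list index
    new_words = first_words[:len(first_words) // 2] + second_words[len(second_words) // 2:]
    return ' '.join(new_words)


# made-up headlines for illustration
print(mash_headlines('Local Bakery Wins National Award',
                     'Scientists Discover New Species of Frog'))
```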

**Now, your bot can tweet the combined headline!**

In [None]:
api.update_status(status=new_headline)

### OK, that's cute but how can we create a long-running bot?

Everything we've done up to now just runs once and then exits/stops. Let's look at how we can have something run forever - our bot doesn't need to sleep much!

Python has a great [`time`](https://docs.python.org/2/library/time.html) module, which handles various time-related functions (duh!). The `time` module also has a very helpful function called `sleep()`, which tells our program to sleep, or "pause", for a number of seconds. Let's take a look at it:


In [None]:
# the time module allows us to "sleep" or pause for a given number of seconds
import time

# loop 10 times, pausing for 1 second during each iteration
for number in range(0, 10):
    print number
    
    # sleep for one second
    time.sleep(1)
    
print 'done!'

We can add a simple "forever" loop to get our script to run until we stop it. The code below will loop forever, pausing for 1 second, until you hit the stop button in your notebook.

In [None]:
# the time module allows us to "sleep" or pause for a given number of seconds
import time

# loop forever!
while True:
    print 'hello'
    
    # sleep for one second
    time.sleep(1)
    
# to get this to stop, hit the Stop button in your notebook
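
While you are testing, a loop that stops on its own can be easier to live with than one you have to interrupt. Here is a small sketch that combines the two loops above. `run_every` and `say_hello` are hypothetical names, not part of the `time` module.

```python
import time


def run_every(task, seconds, max_runs=None):
    """Call task() repeatedly, sleeping between calls.

    Stops after max_runs calls, or runs forever if max_runs is None.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        time.sleep(seconds)
    return runs


def say_hello():
    print('hello')


# three quick iterations instead of an endless loop
run_every(say_hello, seconds=0.1, max_runs=3)
```

Leaving out `max_runs` gives you the same "forever" behavior as `while True`.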

### Let's put it all together and build our news bot

This is a very simple "news" bot, which will tweet out new top stories from The New York Times. The bot will check the NYTimes HomePage RSS feed every 10 seconds - if it sees a new story, it will tweet it.

I'm also adding some super complicated AI, to add some color-commentary to each story that our bot tweets.

This code uses a new module called [`random`](https://docs.python.org/2/library/random.html), which makes it easy to randomly select an item from a `list`.

*So you don't put extra stress on The New York Times servers, you should sleep for at least 60 seconds between checks. We are only sleeping for 10 seconds here for demo purposes.*

In [None]:
# this "bot" will tweet out any new stories published in the nytimes homepage
import time    
import feedparser
import random

insightful_things_to_say = [
    'this is really interesting',
    'great read -->',
    'hmmm....',
    'amazing',
    'how does this happen?',
]

nytimes_rss_url = 'http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml'

# keep track of the previous nytimes link/url that we tweeted
prev_tweeted_link = ''

# loop forever!
while True:
    
    # fetch and parse the NYTimes RSS feed
    nytimes_feed = feedparser.parse(nytimes_rss_url)

    # get the first story
    first_story = nytimes_feed['entries'][0]

    # take the link of the first story and see if we've tweeted it before
    link = first_story['link']
    if link != prev_tweeted_link:
        # it's new, lets tweet it out!
        print 'new story - lets tweet it: ' + link
  
        # build the text of our tweet
        tweet_text = random.choice(insightful_things_to_say) + ' ' + first_story['title'] + ' ' + first_story['link']
        
        # fire it off to twitter
        api.update_status(status=tweet_text)
        
        # keep track of the link that we just tweeted
        prev_tweeted_link = link
    else:
        # we've already tweeted this...no new stories
        # nothing to do
        print 'no new story...lets wait a little while'

    # sleep for a little while
    time.sleep(10)

    
# if you want to stop this script, hit the Stop button in your notebook
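
The bot above only remembers the last link it tweeted and only looks at the first story, so it can miss stories that appear between checks. One possible variation, sketched below, is to remember every link seen so far. `find_new_stories` is a hypothetical helper and the entries are made-up stand-ins for `nytimes_feed['entries']`.

```python
def find_new_stories(entries, seen_links):
    """Return entries whose link is not in seen_links, and remember their links."""
    new_stories = []
    for entry in entries:
        if entry['link'] not in seen_links:
            new_stories.append(entry)
            seen_links.add(entry['link'])
    return new_stories


# made-up entries standing in for two successive checks of a feed
seen = set()
first_check = [
    {'title': 'Story A', 'link': 'http://example.com/a'},
    {'title': 'Story B', 'link': 'http://example.com/b'},
]
second_check = [
    {'title': 'Story C', 'link': 'http://example.com/c'},
    {'title': 'Story A', 'link': 'http://example.com/a'},
]

print([story['title'] for story in find_new_stories(first_check, seen)])
print([story['title'] for story in find_new_stories(second_check, seen)])
```

Note that on the very first check every story counts as new, so a real bot would probably want to tweet only one or two of them rather than the whole list.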

## Some computing tools for programming with "language"

The Twitter bot Mike prepared relies on mashing up two headlines. Some of that might get better if we knew a little about what the headline described. What is the subject? What action is described? Some of these questions are addressed by a field of computer science (well, computational linguistics) called Natural Language Processing. There are plenty of tools in Python for making use of the fruits of this research. 

We will be using a package called [TextBlob](https://textblob.readthedocs.io/en/dev/) that is a simplified version of the Natural Language Toolkit in Python. (Sometimes tools become really powerful for practitioners and leave non-experts behind. That's what has happened, to some extent, with the NLTK. It's a little hard to just "jump in". And so TextBlob is like computational training wheels.) [Allison Parrish's Natural Language Basics with TextBlob](http://rwet.decontextualize.com/book/textblob/) is a great place to read about what TextBlob is good for. 

First, we need to install the package. Off to pip!

In [None]:
%%bash
pip install TextBlob

In [None]:
from nltk import download
download('brown')
download('punkt')
download('maxent_ne_chunker')
download('words')
download('conll2000')
download('maxent_treebank_pos_tagger')
download('averaged_perceptron_tagger')

Then, load the package for this session and bring in a headline from today's New York Times. We read it in as a string but preface the quotes with a "u". That tells Python the string is in Unicode -- publishers use fancy quotation marks, for example, that are not the simple " or '.

The TextBlob() function takes text and turns it into a "TextBlob" object.

In [None]:
from textblob import TextBlob

headline = u"After Election, Trump’s Professed Love for Leaks Quickly Faded"
tb = TextBlob(headline)

type(tb)

In [None]:
tb

The TextBlob object has a number of attributes that hold processed versions of the text. The simplest are lists of words and sentences. Here we pull just the words.

In [None]:
tb.words

This is obviously a better approach than the one we took when we just split a string on spaces -- a technique that didn't handle punctuation like commas and periods well. OK, that's a good trick but there are better ones! For example, TextBlob's language processing lets it estimate which words are part of noun phrases.

There are various techniques for doing this and none of them are perfect. To be fair, using a headline means using a text fragment and not a sentence. The language processing tools are usually trained on full sentences of text. Still, it's not bad.

In [None]:
tb.noun_phrases

Noun phrases are obtained by extracting information from a "tagged" version of the text. Here the tags represent parts of speech. You can see [a complete list of the tags here.](https://cs.nyu.edu/grishman/jet/guide/PennPOS.html) The parts of speech are stored as a list of word-tag pairs.

In [None]:
tb.tags

In [None]:
type(tb.tags[0])

The `.tags` attribute is a list. (See the square brackets?) The list elements are a new data type called a "tuple", which is like a list for our purposes. So you can take, say, the first element of the tags list and look at the first and second elements of the tuple (the word and its estimated part of speech).

In [None]:
tb.tags[0]

In [None]:
tb.tags[0][0]

In [None]:
tb.tags[0][1]
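
Because the tags are just a list of tuples, you can filter them with ordinary Python. The sketch below uses a hand-written list in the same shape as `tb.tags`. The pairs are illustrative and are not actual TextBlob output, and `words_with_tag_prefix` is a hypothetical helper.

```python
# hand-written (word, tag) pairs in the same shape as tb.tags;
# these are illustrative and not actual TextBlob output
sample_tags = [
    ('Trump', 'NNP'),
    ('Vows', 'VBZ'),
    ('to', 'TO'),
    ('Catch', 'VB'),
    ('Leakers', 'NNS'),
]


def words_with_tag_prefix(tags, prefix):
    """Return the words whose part-of-speech tag starts with prefix."""
    return [word for word, tag in tags if tag.startswith(prefix)]


# in the Penn Treebank tag set, noun tags start with 'NN' and verb tags with 'VB'
print(words_with_tag_prefix(sample_tags, 'NN'))
print(words_with_tag_prefix(sample_tags, 'VB'))
```

With a real TextBlob you would pass `tb.tags` in place of `sample_tags`.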

While I'm not wild about it, TextBlob also provides an estimate of the sentiment of the statement. That is, is the text expressing a positive or negative sentiment? I'll leave you to consult the Parrish blog post or the TextBlob documentation for this lovely feature.

In [None]:
tb.sentiment

Here we do the same thing to a different headline. Mashing them up might mean replacing one noun phrase with another. How might you do that?

In [None]:
headline2 = u"Trump Vows to Catch ‘Low Life Leakers’ in Washington D.C."
tb2 = TextBlob(headline2)
tb2.tags

In [None]:
tb2.noun_phrases
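
One possible answer to the "how might you do that?" question is a plain find-and-replace on the headline text. The sketch below assumes the noun phrases come back lowercased (which is how TextBlob's documentation examples show them), so it matches without regard to case. `swap_phrase` is a hypothetical helper, and the phrases are hand-picked for illustration rather than copied from real `noun_phrases` output.

```python
import re


def swap_phrase(headline, old_phrase, new_phrase):
    """Replace the first occurrence of old_phrase in headline, ignoring case."""
    pattern = re.compile(re.escape(old_phrase), re.IGNORECASE)
    # a function as the replacement keeps any backslashes in new_phrase literal
    return pattern.sub(lambda match: new_phrase, headline, count=1)


# the phrases below are hand-picked for illustration
example_headline = 'Trump Vows to Catch Low Life Leakers in Washington D.C.'
print(swap_phrase(example_headline, 'low life leakers', 'Professed Love'))
```

If the phrase is not found (for example, because of fancy quotation marks in the original headline), the headline comes back unchanged, so it is worth checking the result before tweeting it.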

One last thing. There are various methods to "parse" text -- different algorithms for tagging words in a sentence, for extracting noun phrases and for estimating sentiment. You can replace the default when you call TextBlob. The documentation describes other noun phrase extractors. Here's how you would use the ConllExtractor, based on a data set compiled for the Conference on Computational Natural Language Learning (CoNLL-2000).

In [None]:
from textblob.np_extractors import ConllExtractor
extractor = ConllExtractor()

tb = TextBlob(headline, np_extractor=extractor)
tb.noun_phrases