A little drill
-------------

As promised, your drill will have a little rudimentary work and then a much more open-ended assignment. The "basics" this time have to do with blocks of code and navigating dictionaries and lists. Skim this if you are feeling Python-powerful. Read more closely if you have been hanging on. 

**Blocks of code: Loops and conditional evaluation**

A block of code is nothing more than a group of Python commands. Typically, this group hangs together because, when executed in sequence, the commands perform a single high-level task. Most modern programming languages have some kind of block structure. Python identifies blocks through common indentation. This language requirement, forcing common indentation, also makes the code more readable.

![stack](http://www.python-course.eu/images/blocks.png)

"So, how does it work? All statements with the same distance to the right belong to the same block of code, i.e. the statements within a block line up vertically. The block ends at a line less indented or the end of \[your notebooks' code cell\]. If a block has to be more deeply nested, it is simply indented further to the right... There is another aspect of structuring in Python, which we haven't mentioned so far, which you can see in the example below. Loops and Conditional statements end with a colon ":" - the same is true for functions and other structures introducing blocks. So, we should have said Python structures by colons and indentation." (Cribbed from the [Python Tutorial](http://www.python-course.eu/python3_blocks.php))

**Conditional expressions**

We have seen examples in which blocks of code are executed only if certain conditions apply. These are called "conditional blocks" and we'd like to document them formally here. In the cell below we have a simple example -- the code that is indented to the same level (and the notebook helps you here) is all executed if the expression between the "if" and the colon ":" is true. You can put any Boolean expression in here, including operators like "and" and "or" and "not".

In [None]:
x = "Pecan Pie Vending Machine in Cedar Creek, Texas"

if "Pie" in x:
    x = x.replace("Pie","Sandies")
    x = x.upper()
    print x

In [None]:
x = "Pecan Pie Vending Machine in Cedar Creek, Texas"

"Pie" in x

Change x above to different strings and make sure it does what you think it should. Next, here is another example where we take one of two actions depending on whether the initial condition is true or false. There's a lot of indentation in this example, corresponding to **nested blocks** -- you only get to the "if" statement asking whether x is larger than 8, if x is larger than 5. Make sure you understand what is getting executed when.

In [None]:
x = 10

if x > 5:
    print "big number"
    if x>8:
        print "it's really big"
else:
    print "small number"

Again, change x to a few different values and make sure you understand what's happening. 

Finally, we can attach as many subconditions as we like to an "if" statement. That is, it's not just if-else, it's if-elif-elif-...elif-else. In the cell below, if x is larger than 25 we print out "really big". On the other hand, if it's just bigger than 10 (larger than 10 but no larger than 25), we print out "big" and otherwise (x is less than or equal to 10) we call it "small". Change the values of x to make sure you know what this is doing. Try adding other conditions.

Note that in this case, the subconditions are putting tighter constraints on the value of x. This is common.

In [None]:
x = 7

if x > 25:
    print "really big"
elif x > 10:
    print "big"
else:
    print "small"
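Python checks the conditions from top to bottom and runs only the first block whose test is true, so the order of the tests matters. Here is a small sketch of the same logic wrapped in a function so you can try several values at once. The name `size_label` is made up for illustration, and print() is written with parentheses so the cell runs under Python 2 or 3.

```python
# size_label is a hypothetical helper name, not something from class
def size_label(x):
    # conditions are tested top to bottom; only the first true one runs
    if x > 25:
        return "really big"
    elif x > 10:
        return "big"
    else:
        return "small"

# try values on either side of each cutoff
for value in [7, 10, 11, 25, 26]:
    print(str(value) + " is " + size_label(value))
```

If you swapped the first two tests, so that `x > 10` came first, no value would ever reach "really big", because anything larger than 25 is also larger than 10.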

**Loops**

Another common reason for using blocks is that we'd like to repeat the group of operations several times, with inputs that we specify. The "for loop" iterates over a list-like data set and executes the following code block once for each data point. For example, range() returns the integers in, well, a given range. You can specify a start, an end and an increment. **The result in each case is a list of integers.**

The list runs from the start value, up to but not including the end value, in steps of the increment.

In [None]:
# a list of 0 through 8
print range(9)

# a list from 5 to 19
print range(5,20)

# a list from 30 to 48 in steps of 2
print range(30,50,2)

The simplest "for loop" just iterates over a list. In the cell below, we have our source data as range(10), or the integers from 0 through 9. The loop proceeds by successively assigning the variable "i" to each element of the list of integers. First, "i" stands for the number 0, then "i" is 1, then "i" is 2 and so on up until 9. **With each new value, we execute the code in the block**, here just the single line that prints the value of "i".

Because it is a variable name, "i" could have been any name. "pineapple" or "sheep" or "diet_coke" would all work if you substituted every occurrence of the name "i" with your new choice.

In [None]:
for i in range(10):
    print i

Here we run through the integers 1 through 10 and test if the number is even or not, printing one thing if it is and another if it's odd. Notice that again we have **nested blocks**. The print conditional block is nested in the looping block. We also have a new operator here "%" -- a%b returns the remainder of the division of a by b. (And so an even number has 0 remainder after division by 2).

In [None]:
for i in range(1,11):
    
    if i % 2 == 0:
        print str(i) +" an even number"
        
    else:
        print str(i) + " an odd number"

Our print statement exhibits a string that is the sum of two parts. One is " an odd number" or " an even number". The other is str(i). The command str() takes any object and turns it into a string. So the number 1 becomes the string "1" that you can then concatenate with the other string in the printout statement.

If we didn't use str(), you'd get an error. Python doesn't know what the command below means -- the error that's printed makes this clear.

In [None]:
1+"even"
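For contrast, here is a minimal sketch of the conversions that do work. The variable names are made up; str() and int() are the built-in Python functions for going from a number to a string and back again.

```python
number = 1

# str() turns the integer 1 into the string "1" so the two pieces can be joined
label = str(number) + " is odd"
print(label)

# int() goes the other way, turning the string "41" into the number 41
total = int("41") + 1
print(total)
```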

So all of this is a little boring but it's a good start. You can iterate over lots of different kinds of things, chiefly lists. Remember, they store data in order so the loop below assigns each name from the list "students" successively to the variable name "student". It then carries that value into the loop block and creates a slightly lame sentence with the indicated name. So we start with "student" being "Ajibola" and end with "student" being "Siqi."

In [None]:
students = ["Ajibola","Elise","Megan","Jake","Inti","Kasiana","Siqi"]

for student in students:
    
    drill = student + " is learning about code blocks."
    print drill
    print "---"
    

Again, "student" is the name of a variable and is arbitrary -- we could have used anything. Replace "student" with the letter s or the word light_bulb. 

Before we finish iteration, there is one other kind of construction that loops. The "while" loop will continue executing until some condition is satisfied. You might want to run through a list of sentences and print out the first that is less than 140 characters, or one that contains the word "Pecan".
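To make the sentence search concrete, here is a rough sketch. The sentences are invented for illustration, and the counter name "i" is arbitrary. The loop keeps stepping forward as long as there are sentences left and the current one does not contain "Pecan".

```python
# made-up sentences, just for illustration
sentences = [
    "We drove out to Cedar Creek on Saturday.",
    "There is a Pecan Pie vending machine there.",
    "We bought two pies and drove home.",
]

i = 0
# keep stepping through the list until we find "Pecan" or run out of sentences
while i < len(sentences) and "Pecan" not in sentences[i]:
    i = i + 1

# if i ran off the end of the list, nothing matched
if i < len(sentences):
    found = sentences[i]
else:
    found = None

print(found)
```

The first half of the "while" test matters. Without it, a list with no match would push "i" past the end of the list and Python would stop with an error.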

Below we will use the command sample() from the "random" package. The package contains a number of tools for generating random variables. For example, sample() -- as its name might suggest -- takes a list as an argument and, in computer style, puts the contents of the list into a hat and pulls out some number of the elements at random, a number you specify. Here we take 3 from the list of integers from 0 to 9, or we take two student names from the list of students.

Execute this code several times to make sure you see what it's doing.

In [None]:
from random import sample

# 3 draws from the collection 0,...,9
print sample(range(10),3)

# 2 draws from our list of students
print sample(students,2)

Notice that sample() returns a list. So if we ask for 1 randomly selected element, we will get a list with one element. Often we don't want a list with one element, but we want the student name, say, that we selected. You can do this with the following command (where the square brackets ask for the entry with index 0, the first and only element in the list).

In [None]:
# 1 randomly selected student name
student = sample(students,1)
print student

# 1 randomly selected student name, but a string
print student[0]

We can do this in one line too. Here we select a single item from the list ["H","T"]. It's like a 50-50 coin toss every time you execute the code below. Each time you run it, Python puts the "T" and "H" in a hat, mixes it up and selects one. Try it a few times!

In [None]:
sample(["H","T"],1)[0]

Below we will use the command sample() from the "random" package to pick either "H" or "T" with 50% chance for each and print out how many "coin tosses" it took to get the first "H". We will use the counter "count" (again an arbitrary name, which we start at 1 to account for the first flip) and increment it each time we have to flip again.

The code starts with a flip. If it was "H", then we never execute the "while" loop. If it was tails, "T", we go into the loop and keep flipping until we get a "H". Got it? Execute this a few times and make sure you understand what it's doing.

In [None]:
flip = sample(["H","T"],1)[0]
print flip
count = 1

while flip == "T":
    flip = sample(["H","T"],1)[0]
    print flip
    count = count + 1
    
print "--->", count, "flip(s)"

This while-looping is a pretty common action. We want to do things until a condition is satisfied. We might search through words or sentences. We might want to add words to a string until we are at 140 characters and so on.
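Here is one way the "add words until we hit 140 characters" idea might look. This is only a sketch: the word list is made up, and the loop stops just before the word that would push the string over the limit.

```python
# a made-up word list, repeated so there are more words than will fit
words = ["pecan", "pie", "vending", "machine"] * 20

# start with the first word, then keep adding while the next one still fits
tweet = words[0]
i = 1
while i < len(words) and len(tweet) + 1 + len(words[i]) <= 140:
    # the extra 1 accounts for the space we put between words
    tweet = tweet + " " + words[i]
    i = i + 1

print(tweet)
print(len(tweet))
```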

**1. Write a loop (while or for) that runs through the names of students in our class (some subset) and takes action based on their Twitter ID number.**

We have also seen code blocks when we defined functions. Look back to your notebook from Tuesday. The body of the function is a code block that has other indented blocks for "if" statements and loops. It's all on display!

**Navigating complex nestings of dictionaries and lists**

As we have seen, APIs and other web services often return data in the form of JSON strings. Or, as in the case of "feedparser", data are transformed into an object that is made up of things like lists and dictionaries ("like" in the sense that you use the same mechanisms to extract data -- numerical indices in one case and "keys" in the other).
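As a reminder of how the two mechanisms chain together, here is a small sketch using an invented structure (a dictionary that holds a list of dictionaries, each of which holds a list). None of these names or values come from a real API.

```python
# a made-up structure shaped like parsed JSON
report = {
    "city": "Cedar Creek",
    "forecast": [
        {"day": "Mon", "temps": [61, 75]},
        {"day": "Tue", "temps": [58, 70]},
    ],
}

# keys pull values out of dictionaries, numerical indices pull them out of lists
second_day = report["forecast"][1]["day"]
monday_high = report["forecast"][0]["temps"][1]

# a loop can walk the list in the middle and collect one value from each entry
days = []
for entry in report["forecast"]:
    days.append(entry["day"])

print(second_day)
print(monday_high)
print(days)
```

Reading the chain left to right helps: `report["forecast"]` is a list, `[0]` is its first dictionary, `["temps"]` is a list inside that dictionary, and `[1]` is its second element.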

**2. Make a list of contact information. Each entry in the list is a dictionary representing a member of the class. These dictionaries should include your classmate's name, their Twitter handle, and a string that describes something you know about them. Call your list `contacts` and provide it with at least 3 entries, summaries of 3 of your classmates.**

**3. Create a dictionary from the third article in the current NYT RSS feed. The dictionary should contain 4 keys -- "author", "title", "summary" and "tags". In the first three cases, we want a single string as the value for each key. So the title is a string like "Something important happened today". The "tags" key is to hold a list of strings, one for each tag suggested by the NYT. So it might be ["Trump","Bannon","Pecan Pie"].**

OK that's the rudimentary stuff for this drill. Now, onto the more open-ended assignment. Bots!

The bot bazaar
--------------

**Yes, a Pecan Pie is on the line.**

We finished the last class session looking at a simple implementation of the Two Headlines bot. There are countless examples out there of bots that do practical work, bots that are informational, bots that protest, and bots as art pieces. We are looking for a journalistic Twitter bot. One that might tell a story in pieces (a serial bot) or one that might react to the passage of bills in Congress. Or one that might channel events in a given neighborhood, telling untold stories.

We'd like these first bots of yours to be simple and, at their most complicated, draw from some feed of data. They are not meant to respond to users or the audience. We will create so-called "conversational bots" next week. For now, they should just report in some way. See something, say something. Beyond that constraint, the level of complexity is up to you.

Let's pull together some of the code from Mike on Thursday.

In [None]:
import feedparser

# fetch the nytimes and breitbart RSS feeds
nytimes_rss_url = 'http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml'
breitbart_rss_url = 'http://feeds.feedburner.com/breitbart'

nytimes_feed = feedparser.parse(nytimes_rss_url)
breitbart_feed = feedparser.parse(breitbart_rss_url)

# get the first story from each of the two feeds
nytimes_first_story = nytimes_feed['entries'][0]
breitbart_first_story = breitbart_feed['entries'][0]

print 'nyt: '+ nytimes_first_story['title']
print 'b: ' + breitbart_first_story['title']

# combine the two headlines into a single headline
nytimes_words = nytimes_first_story['title'].split(' ')
breitbart_words = breitbart_first_story['title'].split(' ')

# take the 1st half of the nytimes "words" plus the second half of the breitbart "words"
# (// is integer division, so the halfway point is always a whole number)
new_words = nytimes_words[:len(nytimes_words)//2] + breitbart_words[len(breitbart_words)//2:]

# this is python weirdness to take a list of words
# and join them together with a space between each word
new_headline = ' '.join(new_words)
print "two headlines: "+new_headline
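If you want to experiment with the mashing step without hitting the RSS feeds each time, one option is to wrap it in a function and feed it any two strings. This is a sketch, not Mike's code: the name `mash_headlines` and the two sample headlines are made up.

```python
# mash_headlines is a hypothetical helper name
def mash_headlines(first, second):
    first_words = first.split(' ')
    second_words = second.split(' ')
    # first half of one headline plus second half of the other;
    # // is integer division, so the slice positions are whole numbers
    new_words = first_words[:len(first_words) // 2] + second_words[len(second_words) // 2:]
    return ' '.join(new_words)

# two invented headlines for testing
mashed = mash_headlines("City Council Approves New Budget", "Local Bakery Wins Pie Contest")
print(mashed)
```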

**An aside about "import" statements**

First, consider the statement

>`import feedparser`

This imports the "module" feedparser, creating a reference that lets us get to all the goodies it contains. That is, to use the function parse() we write the following.

>`import feedparser`
<br><br>
... and later ...
<br><br>
`feedparser.parse('http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml')`

The second form of this statement we've seen uses a "from". Here's what it looks like for our feedparser example.

>`from feedparser import parse`
<br><br>
... and later ...
<br><br>
`parse('http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml')`

We've been using this last form to make it clear that we are relying on specific functions or data from a module. This was true of Pandas and TextBlob and all the others we've been using. The first form of "import" has advantages in that we don't have to know in advance what we'll use from the module. We can access all the goodies without having to make another import statement. It's great for interactive work.

For example, in Pandas, we might import read_csv() and then later we find we want DataFrame(). With the second import statement, we either have to request both at one time (anticipating our need)...

>`from pandas import read_csv, DataFrame`
 
... or make two import calls, one for each function in different cells, as the need arises.

>`from pandas import read_csv`
<br><br>
... and later ...
<br><br>
`from pandas import DataFrame`

Whereas with the first style of "import" we are covered.

>`import pandas`
<br><br>
... then later ...
<br><br>
`pandas.read_csv("http://compute-cuj.org/unemployment.csv")`
<br><br>
... then later ...
<br><br>
`pandas.DataFrame(list_of_lists)`

Got it?
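If you'd like to convince yourself the two styles reach the same function, here is a tiny check using Python's built-in `math` module (chosen only because it needs no installation; it isn't otherwise part of this drill).

```python
import math
from math import sqrt

# first style: module name, a dot, then the function
a = math.sqrt(16)

# second style: the bare name works because we imported it directly
b = sqrt(16)

# both names point at the very same function
same = math.sqrt is sqrt

print(a)
print(b)
print(same)
```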

**Back to the bots!**

OK so we saw the Two Headlines bot that Mike recoded in a simple way. Everything we've done up to now just runs once and then exits/stops. Let's look at how we can have something run forever - our bot doesn't need to sleep much!

Python has a great [`time`](https://docs.python.org/2/library/time.html) module, which handles various time-related functions (duh!). The `time` module also has a very helpful method called `sleep()`, which tells our program to sleep, or "pause", for a number of seconds. Let's take a look at it:

In [None]:
# the time module allows us to "sleep" or pause for a given number of seconds
import time

# loop 5 times, pausing for 2 seconds during each iteration
for number in range(0, 5):
    print number
    
    # sleep for two seconds
    time.sleep(2)
    
print 'done!'

We can add a simple "forever" loop to get our script to run until we stop it. The code below will loop forever, pausing for 1 second on each pass, until you hit the stop button in your notebook.

In [None]:
# the time module allows us to "sleep" or pause for a given number of seconds
import time

# loop forever!
while True:
    print 'hello'
    
    # sleep for one second
    time.sleep(1)
    
# to get this to stop, hit the Stop button in your notebook

**Let's put it all together and build our news bot**

This is a very simple "news" bot, which will tweet out new top stories from The New York Times. The bot will check the NYTimes HomePage RSS feed every 10 seconds - if it sees a new story, it will tweet it.

I'm also adding some super complicated AI, to add some color-commentary to each story that our bot tweets.

This code uses the module called [`random`](https://docs.python.org/2/library/random.html), which we saw at the top of this notebook. Recall that it makes it easy to randomly select an item from a `list`.

*So you don't put extra stress on The New York Times servers, you should sleep for at least 60 seconds between checks. We are only sleeping for 10 seconds here for demo purposes.*

In [None]:
# this "bot" will tweet out any new stories published in the nytimes homepage
import time    
import feedparser
import random

# our list of pithy comments about the nyt articles
insightful_things_to_say = [
    'this is really interesting',
    'great read -->',
    'hmmm....',
    'amazing',
    'how does this happen?',
]

# the location of the nyt rss feed
nytimes_rss_url = 'http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml'

# keep track of the nytimes links/urls that we tweeted.
# we will put each link we tweet into this list
prev_tweeted_links = []

# loop forever!
while True:
    
    # fetch and parse the NYTimes RSS feed
    nytimes_feed = feedparser.parse(nytimes_rss_url)

    # get the first story
    first_story = nytimes_feed['entries'][0]

    # take the link of the first story and see if we've tweeted it before
    # (is the new link in our list of previously tweeted links? if not, tweet it!)
    link = first_story['link']
    
    if link not in prev_tweeted_links:
        
        # it's new, let's tweet it out!
        print 'new story - lets tweet it: ' + link 
  
        # build the text of our tweet -- a pithy comment, the title and a link to the story
        tweet_text = random.choice(insightful_things_to_say) + ' ' + first_story['title'] + ' ' + first_story['link']
        
        # print our tweet text as a test
        print "our tweet: " + tweet_text
        print "--"*10
        # fire it off to twitter by uncommenting the line below
        #api.update_status(status=tweet_text)
        
        # keep track of the link that we just tweeted
        prev_tweeted_links.append(link)
        
    else:
        
        # we've already tweeted this...no new stories
        # nothing to do
        print "no new story... let's wait a little while"

    # sleep for a little while
    time.sleep(10)
    
# if you want to stop this script, hit the Stop button in your notebook
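One thing to notice about the bot above: it only looks at the first story, so if two stories land between checks, the older of the two is never tweeted. Below is a sketch of one possible way to check every entry instead. The function name `find_new_links` is made up, and the entries are hand-typed stand-ins for what feedparser returns (only the 'link' key is used), so no network connection is needed to try it.

```python
# find_new_links is a hypothetical helper name
def find_new_links(entries, seen_links):
    new_links = []
    for entry in entries:
        link = entry['link']
        if link not in seen_links:
            # remember it so we don't report it again next time
            new_links.append(link)
            seen_links.append(link)
    return new_links

# made-up entries standing in for nytimes_feed['entries']
entries = [
    {'link': 'http://example.com/story-a'},
    {'link': 'http://example.com/story-b'},
]
seen = ['http://example.com/story-a']

first_check = find_new_links(entries, seen)
second_check = find_new_links(entries, seen)

print(first_check)
print(second_check)
```

The first call reports only story-b, since story-a was already in the list. The second call reports nothing, because both links have now been recorded.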

**Some computing tools for programming with "language"**

The Twitter bot Mike prepared relies on mashing up two headlines. Some of that might get better if we knew a little about what the headline described. What is the subject? What action is described? Some of these questions are addressed by a field of computer science (well, computational linguistics) called Natural Language Processing. There are plenty of tools in Python for making use of the fruits of this research. 

We will be using a package called [TextBlob](https://textblob.readthedocs.io/en/dev/) that is a simplified version of the Natural Language Toolkit (NLTK) in Python. (Sometimes tools become really powerful for practitioners and leave non-experts behind. That's what has happened, to some extent, with the NLTK. It's a little hard to just "jump in". And so TextBlob is like computational training wheels.) [Allison Parrish's Natural Language Basics with TextBlob](http://rwet.decontextualize.com/book/textblob/) is a great place to read about what TextBlob is good for. 

First, we need to install the package. Off to PIP! Then we'll install some data the NLTK will need. They are collections of texts and special functions that we'll explain shortly.

In [None]:
%%bash
pip install TextBlob

In [None]:
from nltk import download
download('brown')
download('punkt')
download('maxent_ne_chunker')
download('words')
download('conll2000')
download('maxent_treebank_pos_tagger')
download('averaged_perceptron_tagger')

Then, load the package for this session and bring in a headline from today's New York Times. We read it in as a string but preface the quotes with a "u". That tells Python the string is in Unicode -- publishers use fancy quotation marks, for example, that are not the simple " or '.

The TextBlob() function takes text and turns it into a "TextBlob" object.

In [None]:
from textblob import TextBlob

headline = u"After Election, Trump’s Professed Love for Leaks Quickly Faded"
tb = TextBlob(headline)

type(tb)

In [None]:
tb

The TextBlob object has a number of attributes that hold processed versions of the text. The simplest are lists of words and sentences. Here we pull just the words.

In [None]:
tb.words

This is obviously a better approach than the one we took when we just split a string on spaces -- a technique that didn't handle punctuation like commas and periods well. OK that's a good trick but there are better ones! For example, TextBlob's language processing lets it estimate which words are part of noun phrases.

There are various techniques for doing this and none of them are perfect. To be fair, using a headline means using a text fragment and not a sentence. The language processing tools are usually trained on full sentences of text. Still, it's not bad.

In [None]:
tb.noun_phrases

Noun phrases are obtained by extracting information from a "tagged" version of the text. Here the tags represent parts of speech. You can see [a complete list of the tags here.](https://cs.nyu.edu/grishman/jet/guide/PennPOS.html) The parts of speech are stored as a list of word-tag pairs.

In [None]:
tb.tags

In [None]:
type(tb.tags[0])

The .tags attribute is a list. (See the square brackets?) The list elements are a new data type called a "tuple" which is like a list, for our purposes. So you can take, say, the first element of the tags list and look at the first and second elements of the tuple (the word and its estimated part of speech).

In [None]:
tb.tags[0]

In [None]:
tb.tags[0][0]

In [None]:
tb.tags[0][1]
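Because each element is a (word, tag) pair, a "for loop" can hand you both pieces at once. Here is a sketch that keeps only the nouns. The pairs below are typed in by hand in the same shape as tb.tags (they are not actual TextBlob output), and the test relies on the fact that the Penn Treebank noun tags (NN, NNS, NNP, NNPS) all begin with "NN".

```python
# hypothetical word-tag pairs, typed in by hand in the shape of tb.tags
tags = [("Trump", "NNP"), ("Vows", "VBZ"), ("to", "TO"), ("Catch", "VB"), ("Leakers", "NNS")]

nouns = []
# each pass assigns the first element of the pair to word and the second to tag
for word, tag in tags:
    # Penn Treebank noun tags all start with "NN"
    if tag.startswith("NN"):
        nouns.append(word)

print(nouns)
```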

While I'm not wild about it, TextBlob also provides an estimate of the sentiment of the statement. That is, is the text expressing a positive or negative sentiment? I'll leave you to consult the Parrish blog post or the TextBlob documentation of this lovely feature.

In [None]:
tb.sentiment

Here we do the same thing to a different headline. Mashing them up might mean replacing one noun phrase with another. How might you do that?

In [None]:
headline2 = u"Trump Vows to Catch ‘Low Life Leakers’ in Washington D.C."
tb2 = TextBlob(headline2)
tb2.tags

In [None]:
tb2.noun_phrases
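As a starting point for the mash-up question, here is one rough sketch of swapping a phrase inside a headline using only string tools you already know. The helper name `swap_phrase` is made up and the phrases are typed in by hand rather than taken from TextBlob. The search is done on lowercased text on the assumption that the noun phrases you get back may not match the capitalization in the headline.

```python
# swap_phrase is a hypothetical helper name
def swap_phrase(text, old_phrase, new_phrase):
    # search in lowercase so "low life leakers" can match "Low Life Leakers"
    start = text.lower().find(old_phrase.lower())
    if start == -1:
        # the phrase isn't there, so hand the text back unchanged
        return text
    end = start + len(old_phrase)
    # everything before the phrase, the replacement, everything after
    return text[:start] + new_phrase + text[end:]

mashed = swap_phrase("Trump Vows to Catch Low Life Leakers", "low life leakers", "pecan pie")
print(mashed)
```

The replacement keeps whatever capitalization you give it, so you may want to tidy that up before tweeting.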

One last thing. There are various methods to "parse" text -- different algorithms for tagging words in a sentence, for extracting noun phrases and for estimating sentiment. You can replace the default when you call TextBlob. The documentation describes other noun phrase extractors. Here's how you would use the ConllExtractor, based on a data set compiled for the Conference on Computational Natural Language Learning (CoNLL-2000).

In [None]:
from textblob.np_extractors import ConllExtractor
extractor = ConllExtractor()

tb = TextBlob(headline,np_extractor=extractor)
tb.noun_phrases

**Your turn**

You now have some simple facility with text (this will only get beefier and more powerful as we go through the class) and you can pull data from APIs with ease. So, next, a Twitter bot! We'd like these first bots of yours **to be simple and, at their most complicated, draw from some feed of data.** (Like an API or an RSS feed or...) They are not meant to respond to users or the audience. We will create so-called "conversational bots" next week with Suman. Your bot might look to other bots or other Twitter accounts for input though. We just don't want you crafting something that responds to people "talking" to it. 

**4. Create a Twitter bot that, at its most complex, could report in some way. See something, say something. It might be simpler, of course. But beyond the constraint that you only respond to data or an API, and not users, you are free to create. A prize to the most journalistically interesting, or downright cool, bot!**

If you would like your bot to run 24/7 I can install it on one of our servers. Just let me know!