<img src=https://github.com/computationaljournalism/columbia2019/raw/master/images/chatbots-talking.jpg width=500>


### Conversations on Twitter ###

Today we are going to talk about the texture of conversation on sites like Twitter. With all the talk of efforts to promote civility and organic discussion, many of your projects are trying to understand how information flows between accounts on platforms like Twitter. We will look at a few techniques for assessing that flow and making it visible. This is just an introduction, and we will have much more to say as the semester progresses.

First, we are going to look at conversations that mention `@AOC`. Our first view will be through a platform called [Hoaxy](https://hoaxy.iuni.iu.edu). It's a platform, not a programming approach, which means you might outgrow it quickly, but we will show you how to create similar graphics from scratch. Let's have a look at a ["network diagram"](https://hoaxy.iuni.iu.edu/#query=%40AOC&sort=recent&type=Twitter) that maps out who is mentioning, retweeting or replying to whom.

<img src=https://github.com/computationaljournalism/columbia2019/raw/master/images/aoc.jpg width=500 style="border:1px solid black">

Hoaxy produces a network diagram in which each point represents a Twitter account. A connection between two points means the accounts are "in communication": one retweets, replies to or mentions the other. The relationship is *directional* -- I might mention you, but you might never mention me. The colors of the points represent a scale meant to describe how "bot-like" an account is.

<img src=https://github.com/computationaljournalism/columbia2019/raw/master/images/b1.jpg width=500 style="border:1px solid black">
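To make the *directional* connections described above concrete, here is a minimal sketch that stores each connection as a `(source, target, kind)` tuple and counts, for every account, how many connections it sends (its out-degree) and how many it receives (its in-degree). The account names are made up, and this is only an illustration of the idea, not how Hoaxy itself is implemented.

```python
from collections import Counter

# Made-up connections: (source, target, kind).
# ("ana", "ben", "mention") means ana mentioned ben, not the other way around.
connections = [
    ("ana", "ben", "mention"),
    ("carla", "ben", "retweet"),
    ("ben", "ana", "reply"),
    ("dev", "ben", "retweet"),
]

# out-degree: how many connections each account initiates
out_degree = Counter(source for source, _, _ in connections)

# in-degree: how many connections point at each account
in_degree = Counter(target for _, target, _ in connections)

print(out_degree["ben"], in_degree["ben"])  # ben sends 1 connection and receives 3
```

An account that receives many connections but sends few (like `ben` here) shows up as a hub in a network diagram.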

The good people at Indiana University have "learned" the characteristics of bot behavior, producing a score for any account. The learning (which we will return to formally later in the term) involves contrasting data from known bot accounts with data from "real" accounts. To make this comparison, each Twitter account is reduced to a set of characteristics, or features. The list below was taken from one of their academic papers.

<img src=https://github.com/computationaljournalism/columbia2019/raw/master/images/b8.jpg width=400 style="border:1px solid black">
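As a toy illustration of "contrasting data from known bot accounts and real accounts", here is a nearest-centroid rule over three made-up features. The features and numbers are invented for this example and are not Botometer's features, data or model; the real system uses many more characteristics and a far richer learned model.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Toy, made-up feature vectors: (tweets per day, followers/friends ratio, account age in years).
# These are NOT Botometer's features or training data.
known_bots = [(150.0, 0.1, 0.2), (200.0, 0.05, 0.5)]
known_humans = [(5.0, 1.2, 6.0), (12.0, 0.8, 3.0)]

def centroid(rows):
    """Average each feature across a list of equal-length tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))

bot_center = centroid(known_bots)
human_center = centroid(known_humans)

def looks_like_bot(features):
    """Nearest-centroid rule: is this account closer to the average bot than the average human?"""
    return dist(features, bot_center) < dist(features, human_center)

print(looks_like_bot((180.0, 0.2, 0.3)))  # True
print(looks_like_bot((8.0, 1.0, 4.0)))    # False
```

Notice that the unscaled tweets-per-day feature dominates the distance here, one of many details a real system has to handle.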

In general, this is a hard learning problem. But the color scale can give you a rough indication of whether a conversation involves authentic accounts or imitation ones. We want to know the difference because it helps us judge whether we are witnessing (or taking part in) a genuine exchange of opinions or are instead part of an amplification campaign of some kind. In a word, propaganda.

The group at Indiana has made their bot-or-not scoring available through an API, complete with a [Python interface](https://github.com/IUNetSci/botometer-python)! You can install the `botometer` package and then use it with your Twitter credentials and a key from RapidAPI.

In [None]:
%%sh
pip install botometer

You will need a key to use the API -- you are limited to 2,000 calls per day. You can sign up for one [here](https://rapidapi.com/OSoMe/api/botometer?utm_source=mashape&utm_medium=301). (RapidAPI is an API hosting service of sorts.) Then we can use our Twitter credentials and this new key to ask about the botness of the accounts that follow us...

<img src=https://github.com/computationaljournalism/columbia2019/raw/master/images/b2.jpg width=500 style="border:1px solid black">

...  or those we follow.

<img src=https://github.com/computationaljournalism/columbia2019/raw/master/images/b3.jpg width=500 style="border:1px solid black">
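With only 2,000 calls per day, it can pay to remember accounts you have already scored. A minimal sketch, assuming each result is a JSON-serializable dictionary; `check_with_cache` and the `botometer_cache.json` file name are hypothetical, not part of the `botometer` package. The `check` argument can be `meter.check_account` once the meter is set up.

```python
import json
import os

DAILY_LIMIT = 2000  # the per-day quota mentioned above

def check_with_cache(accounts, check, cache_path="botometer_cache.json", limit=DAILY_LIMIT):
    """Look up each account at most once, reusing results saved on disk.

    `check` is any function that takes a screen name and returns a
    JSON-serializable result (hypothetical helper, not a botometer API).
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)

    calls = 0
    for name in accounts:
        if name in cache:
            continue  # scored on an earlier run, so no API call needed
        if calls >= limit:
            break     # stop before going over the quota
        cache[name] = check(name)
        calls += 1

    # save everything (old and new) for next time
    with open(cache_path, "w") as f:
        json.dump(cache, f)

    # return only the accounts asked about that have a result
    return {name: cache[name] for name in accounts if name in cache}

# usage, once `meter` exists:
# results = check_with_cache(["@emilybell", "@MatthewAlbasi"], meter.check_account)
```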

In [None]:
# grab your keys from a previous notebook or https://apps.twitter.com

CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""

# your RapidAPI (formerly Mashape) key; don't publish a real key in a shared notebook
MashKey = ""

OK, let's kick the tires on this! We can look up one person or a group. Let's try a few...

In [None]:
from botometer import Botometer

meter = Botometer(wait_on_ratelimit=True,
                  mashape_key=MashKey,
                  consumer_key = CONSUMER_KEY,
                  consumer_secret = CONSUMER_SECRET,
                  access_token = ACCESS_TOKEN,
                  access_token_secret = ACCESS_TOKEN_SECRET)

In [None]:
# Check a single account by screen name
result = meter.check_account('@emilybell')
result

In [None]:
result = meter.check_account('@MatthewAlbasi')
result
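To look up a group instead of one person, the `botometer` package provides `check_accounts_in`, which yields `(screen_name, result)` pairs for a sequence of accounts. Here is a minimal sketch that collects the same `result["display_scores"]["english"]` field used later in this notebook. The helper name `score_accounts` is hypothetical, and we assume a failed lookup comes back as a dictionary with an `"error"` key.

```python
def score_accounts(meter, accounts):
    """Collect the English display score for several accounts (hypothetical helper).

    Assumption: a failed lookup yields a result dict containing an "error"
    key and no scores, so we record None for that account.
    """
    scores = {}
    for screen_name, result in meter.check_accounts_in(accounts):
        if "error" in result:
            scores[screen_name] = None
        else:
            scores[screen_name] = result["display_scores"]["english"]
    return scores

# usage, with the `meter` created above:
# score_accounts(meter, ["@emilybell", "@MatthewAlbasi", "@JamesKe54983151"])
```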

Let's look at one more -- James-ugh... [https://twitter.com/JamesKe54983151](https://twitter.com/JamesKe54983151)

<img src=https://github.com/computationaljournalism/columbia2019/raw/master/images/b9.jpg width=500 style="border:1px solid black">

In [None]:
result = meter.check_account('@JamesKe54983151')
result

This person seems so earnest! He might be genuine, but his assets have certainly made their way around the web...

<img src=https://github.com/computationaljournalism/columbia2019/raw/master/images/b7.jpg width=500 style="border:1px solid black">

Sometimes you win, sometimes you lose. Let's now look at a conversation and make a network diagram of our own. This is a useful exercise because we now have complete control over which features of an account we choose to highlight and which kinds of connections we want to establish. Let's set up the Twitter API and use the `Cursor` to pull a number of tweets on a topic -- I picked `#nationalemergency`, but today `#PresidentsDay` is awesome too.

So, let's set up the `API` object and then call the `Cursor` inside a list comprehension, as we did last time.

In [None]:
# before we can make Twitter API calls, we need to initialize a few things...
from tweepy import OAuthHandler, API, Cursor

# setup the authentication
auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# create an object we will use to communicate with the Twitter API
api = API(auth,wait_on_rate_limit=True,wait_on_rate_limit_notify=True)

In [None]:
searches = [status for status in Cursor(api.search,q='#nationalemergency', result_type='recent',timeout=999999).items(1000)]
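Pulling 1,000 tweets takes a while and counts against Twitter's rate limits, so it can help to save the raw data once and reload it later. A minimal sketch that writes each status's `._json` dictionary as one line of JSON; the function names and the file name in the usage line are hypothetical. Note that the reloaded tweets are plain dictionaries, so later code would use them directly instead of calling `._json`.

```python
import json

def save_statuses(statuses, path):
    """Write each Tweepy status's raw `_json` dict as one JSON line."""
    with open(path, "w") as f:
        for status in statuses:
            f.write(json.dumps(status._json) + "\n")

def load_statuses(path):
    """Read the file back as a list of plain dictionaries."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# usage:
# save_statuses(searches, "nationalemergency.jsonl")
# tweets = load_statuses("nationalemergency.jsonl")
```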

In [None]:
from pprint import pprint
pprint(searches[0]._json)

OK, so now we are going to compile the information we need from the basic Python objects (dictionaries and lists) that Tweepy stores in each status's `._json` attribute. Let's create a data frame that has one row per "connection" -- someone mentioning, retweeting or replying to someone else.

This is a moment where other forms of iteration break down: each tweet can produce zero, one or several connections, so a single list comprehension no longer fits and we have to use a loop. So, well, here we go! We are going to build a "dictionary of lists" form of a data frame, with one list for the "froms" (the account doing the tweeting), one for the "tos" (the account being retweeted, replied to or mentioned), and one for the "edges" (the kind of relationship).

Have a look...

In [None]:
froms = []
tos = []
edges = []

# iterate through the 1000 tweets we collected
for tweetobj in searches:

    # use the built-in python form of the data (dictionaries and lists)
    tweet = tweetobj._json
    
    # pull the screen name of the "from"
    screen_name = tweet["user"]["screen_name"]

    # the "link" list will hold the names of people we have included in our
    # graph already (so if someone retweets and mentions someone, we only get one of these)
    link = []
    
    # pull the "to's" that are retweets
    if "retweeted_status" in tweet:        
        retweet_name = tweet["retweeted_status"]["user"]["screen_name"]
        link.append(retweet_name)
        
        edges.append("retweet")
        tos.append(retweet_name)
        froms.append(screen_name)
        
    # pull the "to's" that are mentions
    if "user_mentions" in tweet["entities"]:
        
        for mention in tweet["entities"]["user_mentions"]:
            mention_name = mention["screen_name"]
            
            if mention_name not in link:
                link.append(mention_name)
                
                edges.append("mention")
                tos.append(mention_name)
                froms.append(screen_name)
    
    # pull the "to's" that are replies
    if tweet["in_reply_to_screen_name"]:
        reply_name = tweet["in_reply_to_screen_name"]

        if reply_name not in link:
            edges.append("reply")
            tos.append(reply_name)
            froms.append(screen_name)

Now turn the whole thing into a data frame. Again, we are using a dictionary of lists; pandas repeats the single string "Twitter" down every row of the `FromType` and `ToType` columns.

In [None]:
from pandas import DataFrame

df=DataFrame({"FromType":"Twitter","FromName":froms,"Edge":edges,"ToType":"Twitter","ToName":tos})
df.head(50)
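The body of the loop above can also be packaged as a function that handles one tweet dictionary and returns its connections as `(from, to, edge)` tuples. Once the per-tweet logic lives in its own function, a nested comprehension becomes an option again. A minimal sketch following the same rules as the loop; the name `extract_edges` is hypothetical.

```python
def extract_edges(tweet):
    """Return (from, to, edge) tuples for one tweet dictionary (a status's `_json`).

    Same rules as the loop above: retweet first, then mentions, then reply,
    and each "to" account is counted at most once per tweet.
    """
    screen_name = tweet["user"]["screen_name"]
    seen = set()
    rows = []

    if "retweeted_status" in tweet:
        name = tweet["retweeted_status"]["user"]["screen_name"]
        seen.add(name)
        rows.append((screen_name, name, "retweet"))

    for mention in tweet.get("entities", {}).get("user_mentions", []):
        name = mention["screen_name"]
        if name not in seen:
            seen.add(name)
            rows.append((screen_name, name, "mention"))

    reply_name = tweet.get("in_reply_to_screen_name")
    if reply_name and reply_name not in seen:
        rows.append((screen_name, reply_name, "reply"))

    return rows

# usage:
# rows = [row for status in searches for row in extract_edges(status._json)]
# edge_df = DataFrame(rows, columns=["FromName", "ToName", "Edge"])
```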

Oh, the things pandas helps us do! Here are the most retweeted accounts among the 1,000 tweets we pulled.

In [None]:
df[df["Edge"]=="retweet"]["ToName"].value_counts()

In [None]:
# score a few of the most retweeted accounts from the list above
for name in ["TrumperSeaney", "pollsofpolitics", "deanbc1"]:
    result = meter.check_account(name)
    print(name, result["display_scores"]["english"])

Oh, the [last one](http://twitter.com/deanbc1) looks bad.

Let's output this to a file called `test.csv` because I am running out of time before class and somehow I couldn't come up with a better name. Future me, forgive me. 

We will import this file into [Graph Commons](https://graphcommons.com), a handy, interactive and shareable tool for making network graphs. Make an account, upload your file and have a look...

In [None]:
df.to_csv("test.csv",index=False)
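For the network graph, repeated connections between the same two accounts can also be collapsed into a single weighted edge. A minimal sketch using pandas `groupby(...).size()` on a made-up edge list with the same `FromName`, `ToName` and `Edge` columns as `df`; the `Weight` column is our own addition.

```python
from pandas import DataFrame

# A tiny made-up edge list with the same column names as `df`
edges_df = DataFrame({
    "FromName": ["a", "a", "b", "c", "c"],
    "ToName":   ["x", "x", "x", "x", "y"],
    "Edge":     ["retweet", "retweet", "retweet", "mention", "mention"],
})

# Collapse repeated (from, to, edge) rows into one row with a count
weighted = (edges_df.groupby(["FromName", "ToName", "Edge"])
                    .size()
                    .reset_index(name="Weight")
                    .sort_values("Weight", ascending=False))
print(weighted)
```

Applied to the real `df`, the same chain gives one row per distinct connection, with `Weight` counting how often it occurred among the tweets we pulled.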