Our minions - Powerful? Engaging? Or an isolated horde?
------
Today we are going to probe the network of our bots a little. First, we will talk about your observations about the bots and how hard it was to tell them from actual users. You've read a little, but you've also had some experience looking at their profiles. To make discussion easier, here is a simple table. Feel free to add anything you want to help anchor our discussion.

![Bot](https://regmedia.co.uk/2015/06/04/terminator.jpg?x=648&y=348&crop=1)

The code below simply slurps up all the data from all the followers Rosie acquired. We are going to use a new set of data that you can [download here.](http://compute-cuj.org/rosie2.tar.gz) Uncompress it and put it in the same folder as this notebook. 

The code simply creates a data frame one row at a time -- each .json file representing a follower becoming a new row.

In [None]:
from json import loads
from pandas import DataFrame

# load in the list of followers and then reverse the order so now
# oldest followers are first and newest are last (we should have been
# doing that all along -- although it's good to know about negative indices)

followers = loads(open("rosie2/newfollowers.json").read())
followers.reverse()

# write the header for the csv
column_names = ["id","screen_name","name","location","lang","time_zone","url",
                "statuses_count","followers_count","friends_count","favourites_count","listed_count","created_at"]
rows = []

# loop over our follower ids
for id in followers:
    
    filename = "rosie2/users/"+str(id)+".json"
    rawuser = loads(open(filename).read())

    rows.append([rawuser["id"],rawuser["screen_name"].encode('utf-8'),rawuser["name"].encode('utf-8'),
                       rawuser["location"].encode('utf-8'),rawuser["lang"],rawuser["time_zone"],rawuser["url"],
                       rawuser["statuses_count"],rawuser["followers_count"],
                       rawuser["friends_count"],rawuser["favourites_count"],rawuser["listed_count"],
                       rawuser["created_at"]]) 

# turn the list of lists into a dataframe
df = DataFrame(rows,columns=column_names)

# and write it out!
df.to_csv("savefollowers.csv",index=False)

Have a look at the table in various ways. The rows are now sorted in acquisition order (oldest bots at the top, newest at the bottom), but there are other ways to sort this table.

In [None]:
df[:2500]

In [None]:
df.sort_values(by="followers_count",ascending=False)[0:40]
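
Another sort worth trying is by account age. As a minimal sketch, assuming `created_at` holds Twitter's usual text timestamps (always reported in UTC, hence the literal `+0000`), we can parse the strings into dates first so the sort is chronological rather than alphabetical. The toy table and its screen names below are made up for illustration.

```python
from pandas import DataFrame, to_datetime

# a toy stand-in for df; the screen names and dates are invented
toy = DataFrame({
    "screen_name": ["alpha", "bravo", "charlie"],
    "created_at": ["Wed Feb 08 17:01:02 +0000 2017",
                   "Mon Jan 05 09:30:00 +0000 2009",
                   "Sat Dec 31 23:59:59 +0000 2016"],
})

# parse the text timestamps into real dates (the +0000 offset is matched literally)
toy["created"] = to_datetime(toy["created_at"], format="%a %b %d %H:%M:%S +0000 %Y")

# newest accounts first
newest_first = toy.sort_values(by="created", ascending=False)
print(newest_first[["screen_name", "created"]])
```

The same two lines should carry over to `df`, provided every row has a timestamp in this format.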

Next, we are going to build a network of our bots' followers. Recall the following API calls that we can use to first get the id's of the followers and then their user information. This relies on Tweepy and so we have to fire up the old OAuth keys...

In [None]:
CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""

In [None]:
# before we can make Twitter API calls, we need to initialize a few things...
from tweepy import OAuthHandler, API

# setup the authentication
auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# create an object we will use to communicate with the Twitter API
api = API(auth)
type(api)

This might be an interesting user...

In [None]:
ids = api.followers_ids(screen_name="JoveinElyclaL",count=5000)
type(ids)

In [None]:
len(ids)

In [None]:
users = api.lookup_users(user_ids=ids)
type(users[0])

In [None]:
len(users)

Now, we are going to wrap this in a function. The function takes a list of user ids, our column of bots' ids and an API object from Tweepy that represents "twitter". Here's the function; we can go over it later. In the next cell we use it.

In [None]:
from pandas import DataFrame, concat

def grab_users(idlist,botlist,twitter):

    # These are the data frames we will output, one for nodes and one for edges
    big_edges = DataFrame()
    big_nodes = DataFrame()
    
    # These are the columns each will have
    node_names = ["Type","Name","Description","id","location","lang","time_zone","url",
                  "statuses_count","followers_count","friends_count","favourites_count",
                  "listed_count","created_at"]
    
    edge_names = ["From Type","From Name","Edge","To Type","To Name","Weight"]

    # iterate over each id in the idlist...
    for id in idlist:
        
        # looking up the id's followers on twitter 
        ids = twitter.followers_ids(user_id=id,count=5000)
        ids.append(id)
    
        # and then getting information about each one (a little funny business for users
        # with more than 100 followers -- not many)
        number_of_followers = len(ids)
        
        users = []
        for i in range(0, number_of_followers, 100):
    
            subset_ids = ids[i:i + 100]
            users = users+twitter.lookup_users(user_ids=subset_ids)
        
        # isolate the screen name, because we didn't pass that (we just gave id's)
        screen_name = [u.screen_name for u in users if u.id==id][0]
    
        # initialize a list of nodes and edges that will hold the rows for the data frames
        nodes = []
        edges = []
    
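        # "in" on a pandas Series tests the index labels, not the values,
        # so compare against a plain set of the bot ids instead
        bot_ids = set(botlist)

        # now iterate over the users and build up the nodes data frame for this id
        for u in users:
                
            if u.id in bot_ids: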
                
                node_type = "bot"
            
            else:
                
                node_type = "follower"
            
            nodes.append([node_type,u.screen_name.encode('utf-8'),u.name.encode('utf-8'),u.id,
                          u.location.encode('utf-8'),u.lang,u.time_zone,u.url,
                          u.statuses_count,u.followers_count,
                          u.friends_count,u.favourites_count,u.listed_count,
                          u.created_at]) 

            if not u.id== id:
            
                edges.append([node_type,u.screen_name,"follows","bot",screen_name,1])
        
        # combine the new edges and nodes into the big output
        if big_nodes.empty:
            big_nodes = DataFrame(nodes,columns=node_names)
        else:
            big_nodes = concat([big_nodes,DataFrame(nodes,columns=node_names)]).drop_duplicates()

        if big_edges.empty:
            big_edges = DataFrame(edges,columns=edge_names)
        else:
            big_edges = concat([big_edges,DataFrame(edges,columns=edge_names)])
    
    # and return both, wrapping them in a dictionary
    return({"nodes":big_nodes,"edges":big_edges})
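
The "funny business" in the function exists because Twitter's user lookup accepts at most 100 ids per request, so longer id lists have to be sent in batches. Here is that batching step on its own, as a small sketch with no API calls; `chunks` is a hypothetical helper name, not something from Tweepy.

```python
def chunks(items, size=100):
    """Yield consecutive slices of items, each holding at most size entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 250 made-up ids stand in for a follower list
fake_ids = list(range(250))

# each batch is what we would hand to a single lookup call
batch_sizes = [len(batch) for batch in chunks(fake_ids)]
print(batch_sizes)
```

The last batch is simply whatever is left over, which is why slicing past the end of the list causes no trouble.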

In [None]:
b = grab_users([270315089,406926562],df["id"],api)
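
One thing to keep in mind when a data frame column such as `df["id"]` is used as a list of ids: Python's `in` operator on a pandas Series looks at the index labels, not the values. A tiny illustration, using the two ids from the call above as stand-in data:

```python
from pandas import Series

# a two-element column; its index labels are 0 and 1
bot_ids = Series([270315089, 406926562])

print(270315089 in bot_ids)        # False: 270315089 is not an index label
print(0 in bot_ids)                # True: 0 is an index label
print(270315089 in set(bot_ids))   # True: the set holds the values
```

Converting the column to a `set` (or using `.isin`) is the usual way to test against the values themselves.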

The output stores an edges and a nodes data frame in a dictionary. We can output both and load them into Graph Commons.

In [None]:
b["edges"].head(10)

In [None]:
b["nodes"].head(5)

In [None]:
b["edges"].to_csv("edges.csv",index=False)
b["nodes"].to_csv("nodes.csv",index=False)
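
Before loading the files into Graph Commons, it can help to get a rough feel for the edge list with pandas alone. The sketch below uses a made-up edge table with the same columns as `b["edges"]`; the account names are invented. It counts followers per bot and picks out accounts that follow more than one bot.

```python
from pandas import DataFrame

edge_names = ["From Type", "From Name", "Edge", "To Type", "To Name", "Weight"]

# a toy edge list in the same layout as b["edges"]
toy_edges = DataFrame([
    ["follower", "ann", "follows", "bot", "bot_one", 1],
    ["follower", "bob", "follows", "bot", "bot_one", 1],
    ["bot", "bot_two", "follows", "bot", "bot_one", 1],
    ["follower", "ann", "follows", "bot", "bot_two", 1],
], columns=edge_names)

# how many accounts follow each bot
in_degree = toy_edges["To Name"].value_counts()

# accounts that follow more than one of the bots
out_degree = toy_edges["From Name"].value_counts()
shared = sorted(out_degree[out_degree > 1].index)

print(in_degree)
print(shared)
```

Accounts that show up in `shared` are the ones that tie the bots' neighborhoods together in the graph.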

Now, we want to look at the activity of the bots. One idea would be to see what each bot is retweeting to get a sense of what kind of overlap you have. Here is one bot user ("i" becomes "l") -- [jennlferrothman](https://twitter.com/jennlferrothman). One of her retweets is of a tweet by Chris Voss. Here is the tweet.

In [None]:
%%HTML
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Its all right letting yourself go, as long as you can get yourself back.  Mick Jagger</p>&mdash; CHRIS VOSS (@CHRISVOSS) <a href="https://twitter.com/CHRISVOSS/status/829470715317465088">February 8, 2017</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

Have a look at the other retweeters. Many have the patterns we look for, but none of them are in our collection of followers. This might be a way to expand the network we know about.

In the next cells, we use the API to give us information about "CHRISVOSS" and to pull his timeline. This was code Mike gave you last Tuesday.

In [None]:
sum(df["screen_name"].str.contains("IloveLadyGaga92|Shawnie__ovoxo|Miss__Candass|TheHlllHaveEyes|AlexisApparell",case=False))

In [None]:
# get a user's profile ('CHRISVOSS' in this case)
user = api.get_user(screen_name='CHRISVOSS')

# print out some of the user's information
print(user.screen_name)
print(user.id)
print(user.statuses_count)
print(user.friends_count)
print(user.followers_count)
print(user.description)

Here we pull the 50 most recent tweets from CHRISVOSS. If a tweet's retweet count is more than 5 and at most 20, we pull its retweets and, for each one, record the retweeter's id, screen name and real name along with the id of the tweet they retweeted. The rows are collected in a list and then stored in a data frame.

In [None]:
vosslist = []

for tweet in api.user_timeline(screen_name="CHRISVOSS",count=50):
    
    if tweet.retweet_count > 5 and tweet.retweet_count <= 20:
        retweets = api.retweets(tweet.id)
    
        for retweet in retweets:
            vosslist.append([retweet.user.id,retweet.user.screen_name,retweet.user.name,tweet.id])

voss = DataFrame(vosslist,columns=["id","screen_name","name","retweet_id"])

In [None]:
voss.head(10)

And you can see certain bots retweet this account more than others. Actually, are they all our bots?

In [None]:
voss['screen_name'].value_counts().head(25)

Here we pull our followers that retweeted CHRISVOSS.

In [None]:
df[df["screen_name"].isin(voss["screen_name"])]
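
To put a number on "are they all our bots?", we can flag each retweet row by whether the retweeter appears in our follower table. This is a sketch on toy data; `toy_followers` and `toy_retweeters` stand in for `df` and `voss`, and the names in them are invented.

```python
from pandas import DataFrame

# toy stand-ins for df (our followers) and voss (one row per retweet)
toy_followers = DataFrame({"screen_name": ["ann", "bob", "cat"]})
toy_retweeters = DataFrame({"screen_name": ["bob", "dan", "bob", "eve"]})

# True where the retweeter is one of our followers
toy_retweeters["our_follower"] = toy_retweeters["screen_name"].isin(
    toy_followers["screen_name"])

# share of retweets that came from accounts in our follower list
share = toy_retweeters["our_follower"].mean()

# retweeters we have not seen before
outsiders = sorted(
    toy_retweeters.loc[~toy_retweeters["our_follower"], "screen_name"].unique())

print(share)
print(outsiders)
```

The `outsiders` list is the interesting part if the goal is to expand the network we know about.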

The code below makes a large table of connections between Rosie's followers and their friends. It gives you a sense of how overlapping things are.

**DON'T EXECUTE THIS CODE** It takes a while. I gave you what you need in the file you downloaded. You can read_csv it in the next cell.

In [None]:
# ONLY DO THIS ONCE!! After this you can just read_csv("rosie2/network_2500.csv")!!!

from os.path import isfile
from json import loads
from pandas import DataFrame

# load in the list of followers and then reverse the order so now
# oldest followers are first and newest are last (we should have been
# doing that all along -- although it's good to know about negative indices)

followers = loads(open("rosie2/newfollowers.json").read())
followers.reverse()

# initialize our two empty columns
rosie_followers = []
their_friends = []

# loop over our follower ids for the first 2500 followers
for id in followers[:2500]:
    
    filename = "rosie2/friends/"+str(id)+".json"
    
    if not isfile(filename):
        continue
    
    friends_for_one_follower = loads(open(filename).read())

    their_friends = their_friends + friends_for_one_follower
    rosie_followers = rosie_followers + [id]*len(friends_for_one_follower)
    
# turn the two lists into a dataframe, each list is a column
network = DataFrame({"rosie follower":rosie_followers,"their friend":their_friends})

# and write it out!
network.to_csv("rosie2/network_2500.csv",index=False)

In [None]:
from pandas import read_csv

network = read_csv("rosie2/network_2500.csv")
network.head()

In [None]:
network["rosie follower"].value_counts().head(10)

In [None]:
network["their friend"].value_counts().head(10)
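
The two counts above hint at overlap, and we can make that a little more concrete. As a sketch on a toy table with the same two columns as `network` (the ids are made up), we count how many distinct followers share each friend, and then compute a Jaccard overlap (shared friends divided by all friends) for one pair of followers. The `jaccard` helper is a hypothetical name introduced here.

```python
from pandas import DataFrame

# a toy table in the same layout as network
toy_network = DataFrame({
    "rosie follower": [1, 1, 1, 2, 2, 3],
    "their friend":   [10, 11, 12, 10, 11, 10],
})

# number of distinct followers who follow each friend
popularity = toy_network.groupby("their friend")["rosie follower"].nunique()

def jaccard(a, b):
    """Size of the intersection divided by size of the union of two collections."""
    a, b = set(a), set(b)
    return len(a & b) / float(len(a | b))

# one set of friends per follower
friends = toy_network.groupby("rosie follower")["their friend"].apply(set)

# overlap between the friend lists of followers 1 and 2
overlap = jaccard(friends[1], friends[2])

print(popularity)
print(overlap)
```

A value near 1 means two followers are following almost exactly the same accounts, which is one of the patterns we might expect from bots run by the same operator.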