# Content Analytics & Impact

## Scenario  

Imagine you want to advise your managing editor on strategies to maximize your news organization's audience size and reach. (Pick a Twitter account to analyze as "your" news organization.)

**Question 1**  
What time of day should the social media editor share links to get the most retweets and/or favorites on Twitter? Does the optimal time differ between weekends and weekdays?

**Question 2**  
Should the social media editor embed an image in tweets in order to get more retweets?

**Question 3**  
Should the social media editor use more or fewer hashtags in tweets in order to get more retweets?

Answer the questions above and summarize your findings in a report.

In [63]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

# Makes it so that you can scroll horizontally to see all columns of an output DataFrame
pd.set_option('display.max_columns', None)
# Make it so urls and tweets won't get truncated when we print them out
pd.set_option('display.max_colwidth', None)  # None means no limit (pandas versions before 1.0 used -1 instead)

# This magic function allows you to see the charts directly within the notebook. 
%matplotlib inline

# This command will make the plots more attractive by adopting the common style of ggplot
matplotlib.style.use("ggplot")

In [None]:
import tweepy

# Set up and authenticate Tweepy - INSERT your keys here.
CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

# Fetch the authenticated user's profile to confirm that the keys work
me = api.verify_credentials()
print(me.screen_name)

### Collect Data

The next cell contains code to get historical tweets from an account. Set the `account` variable to the screen name of your news organization.

In [66]:
import sys

# According to Twitter docs: https://dev.twitter.com/rest/reference/get/statuses/user_timeline
# Can only get 3,200 tweets into the history of an account
# And you can get 200 at a time, so that's 16 sets of 200 that we can get. 

# Set the account you want to collect data for using screen name
account = "baltimoresun"
page_size = 200
all_tweets = [] # A list that will contain up to 3,200 tweets (we exclude RTs)

# Get the first page of tweets
max_id = sys.maxsize  # Start above any real tweet ID so the first comparison succeeds
tweets = api.user_timeline(screen_name=account, count=page_size, include_rts=False)
for tweet in tweets:
    if tweet.id < max_id:
        max_id = tweet.id
    all_tweets.append(tweet)

# Get subsequent pages of tweets
num_pages_to_collect = 16
for page in range(1, num_pages_to_collect):
    print("Collecting page %d" % (page + 1))
    # max_id is inclusive, so subtract 1 to avoid fetching the oldest tweet seen so far again
    tweets = api.user_timeline(screen_name=account, count=page_size, include_rts=False, max_id=max_id - 1)
    for tweet in tweets:
        if tweet.id < max_id:
            max_id = tweet.id
        all_tweets.append(tweet)

print("Total number of tweets collected: %d" % len(all_tweets))

Collecting page  2
Collecting page  3
Collecting page  4
Collecting page  5
Collecting page  6
Collecting page  7
Collecting page  8
Collecting page  9
Collecting page  10
Collecting page  11
Collecting page  12
Collecting page  13
Collecting page  14
Collecting page  15
Collecting page  16
Total number of tweets collected:  2048
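
The paging pattern used above can be tried out without network access. The following is a minimal sketch, not the Twitter API itself: `fake_user_timeline` and `FAKE_IDS` are hypothetical stand-ins for `api.user_timeline` and a real timeline, and they assume that `max_id` is inclusive, as the Twitter documentation describes. Because of that, the loop asks for IDs strictly below the oldest one it has already seen.

```python
# Hypothetical stand-in for api.user_timeline: serves made-up tweet IDs,
# newest first, and treats max_id as inclusive.
FAKE_IDS = list(range(1000, 0, -1))  # 1,000 made-up tweet IDs


def fake_user_timeline(count, max_id=None):
    eligible = [i for i in FAKE_IDS if max_id is None or i <= max_id]
    return eligible[:count]


def collect_ids(fetch, page_size, num_pages):
    """Page backwards through a timeline without fetching any ID twice."""
    collected = []
    max_id = None
    for _ in range(num_pages):
        page = fetch(count=page_size, max_id=max_id)
        if not page:
            break  # Ran out of history before using every page
        collected.extend(page)
        # max_id is inclusive, so request IDs strictly older than the oldest seen
        max_id = min(page) - 1
    return collected


ids = collect_ids(fake_user_timeline, page_size=200, num_pages=16)
print(len(ids))       # 1000
print(len(set(ids)))  # 1000, so no ID was collected twice
```

Stopping on an empty page also avoids spending requests once the available history is exhausted.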


In [56]:
# What does a single tweet look like?
# Documentation for what gets delivered in a tweet object: https://dev.twitter.com/overview/api/tweets
import pprint 

for i, tweet in enumerate(all_tweets):
    if i < 1:
        pprint.pprint(tweet._json)

{u'contributors': None,
 u'coordinates': None,
 u'created_at': u'Fri Mar 25 18:10:17 +0000 2016',
 u'entities': {u'hashtags': [],
               u'media': [{u'display_url': u'pic.twitter.com/liDRDt4u3F',
                           u'expanded_url': u'http://twitter.com/BaltSunTV/status/713378676851671042/photo/1',
                           u'id': 713378676729974785,
                           u'id_str': u'713378676729974785',
                           u'indices': [103, 126],
                           u'media_url': u'http://pbs.twimg.com/media/CeZuIHjUEAE1hnM.jpg',
                           u'media_url_https': u'https://pbs.twimg.com/media/CeZuIHjUEAE1hnM.jpg',
                           u'sizes': {u'large': {u'h': 684, u'resize': u'fit', u'w': 1024},
                                      u'medium': {u'h': 401, u'resize': u'fit', u'w': 600},
                                      u'small': {u'h': 227, u'resize': u'fit', u'w': 340},
                                      u'thumb': {u'h'

In [None]:
# For example, to get at a few relevant fields
for i, tweet in enumerate(all_tweets):
    # Comment out the next line if you want to loop through all the tweets in the all_tweets list
    if i < 1:
        print(tweet.favorite_count)
        print(tweet.retweet_count)
        print(tweet.text)
        print(tweet.created_at) # The time will be reported in UTC (Coordinated Universal Time)
        print("media" in tweet.entities) # Does the tweet have "media" e.g. an image associated?
        print(tweet.entities["hashtags"]) # List of hashtags that have been parsed out of the tweet
        print("")
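
For the analysis it is usually easier to work with one table than with a list of tweet objects. The sketch below shows one way to flatten the fields printed above into a pandas DataFrame. The helper name `tweets_to_frame`, the column names, and the `FakeTweet` sample objects are all assumptions made for illustration; the sample values are made up and are not real tweets.

```python
from collections import namedtuple
from datetime import datetime

import pandas as pd


def tweets_to_frame(tweets):
    """Flatten tweet objects into one row per tweet."""
    rows = []
    for tweet in tweets:
        rows.append({
            "created_at": tweet.created_at,  # Reported in UTC
            "text": tweet.text,
            "retweets": tweet.retweet_count,
            "favorites": tweet.favorite_count,
            "has_media": "media" in tweet.entities,
            "num_hashtags": len(tweet.entities["hashtags"]),
        })
    return pd.DataFrame(rows)


# Made-up stand-ins for tweepy tweet objects, for illustration only
FakeTweet = namedtuple(
    "FakeTweet", "created_at text retweet_count favorite_count entities")
sample = [
    FakeTweet(datetime(2016, 3, 25, 18, 10), "Story with a photo", 12, 20,
              {"hashtags": [], "media": [{"type": "photo"}]}),
    FakeTweet(datetime(2016, 3, 26, 2, 45), "Story with a #tag", 3, 5,
              {"hashtags": [{"text": "tag"}]}),
]

df = tweets_to_frame(sample)
print(df[["retweets", "favorites", "has_media", "num_hashtags"]])
```

In the notebook, calling `tweets_to_frame(all_tweets)` should produce the same columns for the collected tweets.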

**Question 1**  
What time of day should the social media editor share links to get the most retweets and/or favorites on Twitter? Does the optimal time differ between weekends and weekdays?
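
One possible starting point for this question is to convert each timestamp to the audience's local time and then summarize retweets by hour, separately for weekdays and weekends. The sketch below is only an illustration: the rows are made up, the column names are assumptions, the timestamps are assumed to be naive UTC values, and the choice of the `America/New_York` time zone is an assumption about where readers are.

```python
import pandas as pd

# Made-up rows for illustration only; in the notebook these values would
# come from the created_at and retweet_count fields of all_tweets
df = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2016-03-25 12:05", "2016-03-25 12:40", "2016-03-25 23:15",
        "2016-03-26 14:30", "2016-03-26 14:50", "2016-03-27 01:10",
    ]),  # Naive timestamps, understood to be UTC
    "retweets": [4, 10, 30, 8, 2, 16],
})

# Assumption: readers are mostly in the US Eastern time zone
local = df["created_at"].dt.tz_localize("UTC").dt.tz_convert("America/New_York")
df["hour"] = local.dt.hour
# dayofweek runs from Monday=0 to Sunday=6
df["day_type"] = local.dt.dayofweek.map(
    lambda d: "weekend" if d >= 5 else "weekday")

# Median retweets and number of tweets in each (day type, local hour) slot
summary = df.groupby(["day_type", "hour"])["retweets"].agg(["median", "count"])
print(summary)
```

Keeping the `count` column next to the median makes it easier to spot hours where the result rests on only a handful of tweets.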

**Question 2**  
Should the social media editor embed an image in tweets in order to get more retweets?
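
A simple first look at this question is to compare retweet counts for tweets with and without attached media. The sketch below uses made-up rows and assumed column names (`has_media`, `retweets`). It reports the median alongside the mean because a few viral tweets can pull the mean far from the typical value.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "has_media": [True, True, True, False, False, False],
    "retweets": [12, 7, 40, 3, 5, 9],
})

# Readable group labels instead of True / False
df["kind"] = df["has_media"].map({True: "image", False: "no image"})

by_media = df.groupby("kind")["retweets"].agg(["count", "median", "mean"])
print(by_media)
```

A gap between the two groups is not proof that images cause more retweets, since tweets with images may also differ in topic or posting time.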

**Question 3**  
Should the social media editor use more or fewer hashtags in tweets in order to get more retweets?
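
For this question, one option is to group tweets by the number of hashtags and to check whether retweets tend to rise or fall as that number grows. The sketch below uses made-up rows and assumed column names (`num_hashtags`, `retweets`). It computes a Spearman rank correlation by ranking both columns and taking the ordinary Pearson correlation of the ranks, which needs nothing beyond pandas.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "num_hashtags": [0, 0, 1, 1, 2, 3],
    "retweets": [10, 14, 9, 6, 5, 2],
})

# Typical retweet count for each number of hashtags
by_count = df.groupby("num_hashtags")["retweets"].agg(["count", "median"])
print(by_count)

# Spearman rank correlation: Pearson correlation of the ranked values.
# Negative values mean more hashtags go along with fewer retweets.
rho = df["num_hashtags"].rank().corr(df["retweets"].rank())
print(round(rho, 2))
```

A rank correlation is used here because retweet counts are heavily skewed, and ranks are less sensitive to a single outlier than the raw counts are.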

## Shareback + Discuss

What were your team's findings? What was the best time of day to tweet to get RTs and favorites? What are you really measuring (and not measuring) with these metrics?

What else could you possibly measure here? Are there other metrics that might be more telling?

What are some possible limitations of this approach to content optimization? 
