# Popular Twitter Accounts

## Instructions

* In this activity, you are given an incomplete CSV file of Twitter's most popular accounts. You will use this CSV file in conjunction with Tweepy's API to create a pandas DataFrame.

* Consult the Jupyter Notebook file for instructions, but here are your tasks:

* The "PopularAccounts.csv" file has columns whose info needs to be filled in.

* Create a pandas DataFrame and import the CSV file into it.

* Call Tweepy's API to retrieve the info for the missing columns in the starter CSV.

* Export the results to a new CSV file called "PopularAccounts_New.csv".

* Calculate the averages of the accounts' tweet, follower, following, and favorites counts, then create a DataFrame of the averages.

In [1]:
import tweepy
import json
import pandas as pd
import os
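# Your Twitter API keys, read from api_keys.json four directories above the working directory
api_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath('__file__'))))))
file_name = os.path.join(api_dir, "api_keys.json")
# Use a context manager so the key file is closed after reading
with open(file_name) as key_file:
    data = json.load(key_file)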

gkey = data['google_places_api_key']
consumer_key = data['twitter_consumer_key']
consumer_secret = data['twitter_consumer_secret']
access_token = data['twitter_access_token']
access_token_secret = data['twitter_access_token_secret']


auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())

## Iterrows

iterrows lets you loop over the rows of a pandas DataFrame.
It returns two values for each iteration:

* index = the index label of the current row
* data = the contents of that row (named row in the loop below). In the case below, 'name' and 'tweets'

In [31]:
my_tweets = [{'name':"Henry", 'tweets':100}, {'name':"Bob",'tweets':110}, {'name':'Sam','tweets':120}]
df = pd.DataFrame(my_tweets)
for index, row in df.iterrows():
    print("Index: " + str(index))
    print(row['name'])
    print(row['tweets'])
    print()

Index: 0
Henry
100

Index: 1
Bob
110

Index: 2
Sam
120



## Iterrows is quirky

If you need to change your DataFrame, you can't do it by changing the data variable in the code below, because iterrows may hand you a copy of each row.
You have to write the new value back to the DataFrame itself with df.at[index, 'tweets'] = value, as shown below. (Older notebooks use df.set_value(index, 'tweets', value), which was removed in pandas 1.0.)

In [32]:
my_tweets = [{'name':"Henry", 'tweets':100}, {'name':"Bob",'tweets':110}, {'name':'Sam','tweets':120}]
df = pd.DataFrame(my_tweets)
for index, data in df.iterrows():
    # can't do data['tweets'] = data['tweets'] + 123 or similar here. Must write to df with .at
    new_tweets = data['tweets'] + 123
    df.at[index, 'tweets'] = new_tweets

print(df)


    name  tweets
0  Henry     223
1    Bob     233
2    Sam     243
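
To see why assigning to the loop variable has no effect, the sketch below (not part of the original notebook) modifies the row object inside the loop and then checks the DataFrame. For a DataFrame with mixed column types like this one, iterrows builds a separate object for each row, so the change is lost. When every row gets the same arithmetic, a column-wise assignment is a common alternative that avoids the loop altogether. The variable names `unchanged` and `updated` are illustrative.

```python
import pandas as pd

my_tweets = [{'name': "Henry", 'tweets': 100},
             {'name': "Bob", 'tweets': 110},
             {'name': "Sam", 'tweets': 120}]
df = pd.DataFrame(my_tweets)

# Attempt to update through the row object: this only changes the copy
for index, row in df.iterrows():
    row['tweets'] = row['tweets'] + 123

unchanged = df['tweets'].tolist()
print(unchanged)  # [100, 110, 120]

# Column-wise update: the whole column is changed in one statement
df['tweets'] = df['tweets'] + 123
updated = df['tweets'].tolist()
print(updated)  # [223, 233, 243]
```

The loop with .at is still the right tool when each row needs its own lookup, as with the API calls later in this notebook.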


In [33]:
file_name = os.path.join("Resources", "PopularAccounts.csv")
popular_tweeters = pd.read_csv(file_name, dtype=str)

In [34]:
popular_tweeters.head()

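| | Screen Name | Real Name | Tweets | Followers | Following | Favorites Count |
|---|---|---|---|---|---|---|
| 0 | katyperry | NaN | NaN | NaN | NaN | NaN |
| 1 | justinbieber | NaN | NaN | NaN | NaN | NaN |
| 2 | BarackObama | NaN | NaN | NaN | NaN | NaN |
| 3 | Taylorswift13 | NaN | NaN | NaN | NaN | NaN |
| 4 | rihanna | NaN | NaN | NaN | NaN | NaN |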


In [50]:
# Iterate through DataFrame
for index, row in popular_tweeters.iterrows():

    # Grab the username
    target_user = row["Screen Name"]

    # Error handling
    try:
        # Use the username with the Twitter API's get_user
        # (the screen_name keyword is accepted by both older and newer Tweepy releases)
        user_account = api.get_user(screen_name=target_user)
        user_real_name = user_account["name"]
        print(user_real_name)

        # Get the specific column data
        user_tweets = user_account["statuses_count"]
        followers_count = user_account["followers_count"]
        favourites_count = user_account["favourites_count"]
        user_following = user_account["friends_count"]

        # Replace the row information for each column
        # (.at replaces set_value, which was removed in pandas 1.0)
        popular_tweeters.at[index, "Real Name"] = user_real_name
        popular_tweeters.at[index, "Tweets"] = user_tweets
        popular_tweeters.at[index, "Followers"] = followers_count
        popular_tweeters.at[index, "Favorites Count"] = favourites_count
        popular_tweeters.at[index, "Following"] = user_following

    # If an error is encountered, move on with the next iteration of the loop
    except Exception:
        print("target user skipped: " + target_user)
        continue

# Export the new CSV as "PopularAccounts_New.csv"
# Hint: use the to_csv() method
popular_tweeters.to_csv("PopularAccounts_New.csv", index=False)


KATY PERRY
Justin Bieber
Barack Obama
Taylor Swift
Rihanna
YouTube
Ellen DeGeneres
Lady Gaga
Twitter
Justin Timberlake
Britney Spears
Cristiano Ronaldo
Kim Kardashian West
Selena Gomez
CNN Breaking News
jimmy fallon
Ariana Grande
Shakira
Demi Lovato
Jennifer Lopez
Instagram
Oprah Winfrey
Drizzy
LeBron James
The New York Times
Bill Gates
CNN
Kevin Hart
Miley Ray Cyrus
One Direction
SportsCenter
ESPN
BBC Breaking News
Harry Styles.
P!nk
Lil Wayne WEEZY F
Wiz Khalifa
Niall Horan
Bruno Mars
Adele
Narendra Modi
Neymar Jr
target user skipped: kanyewest
Kaka
Neil Patrick Harris
Donald J. Trump
daniel tosh
Amitabh Bachchan
Alicia Keys
NBA
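
The field lookups in the loop can be pulled into a small helper, which makes them easy to check without calling the Twitter API. This is a sketch only: `extract_user_fields` and `FIELD_MAP` are hypothetical names, and the sample dictionary is made-up data that merely mimics the keys read in the loop (`name`, `statuses_count`, `followers_count`, `friends_count`, `favourites_count`).

```python
import pandas as pd

# Maps DataFrame column names to the keys read from the parsed user JSON
FIELD_MAP = {
    "Real Name": "name",
    "Tweets": "statuses_count",
    "Followers": "followers_count",
    "Following": "friends_count",
    "Favorites Count": "favourites_count",
}


def extract_user_fields(user_account):
    """Return {column name: value} for one user dictionary (hypothetical helper)."""
    # dict.get returns None for a missing key instead of raising KeyError
    return {column: user_account.get(key) for column, key in FIELD_MAP.items()}


# Made-up stand-in for a parsed API response
sample_user = {
    "name": "Example User",
    "statuses_count": 10,
    "followers_count": 250,
    "friends_count": 5,
    "favourites_count": 3,
}

# Build a one-row DataFrame from the screen name plus the extracted fields
row = {"Screen Name": "example_user", **extract_user_fields(sample_user)}
result = pd.DataFrame([row])
print(result)
```

Keeping the column-to-key mapping in one dictionary also makes the `friends_count` to "Following" and `favourites_count` to "Favorites Count" renames explicit.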


In [51]:
popular_tweeters

| | Screen Name | Real Name | Tweets | Followers | Following | Favorites Count |
|---|---|---|---|---|---|---|
| 0 | katyperry | KATY PERRY | 8809.0 | 108311217.0 | 205.0 | 5922.0 |
| 1 | justinbieber | Justin Bieber | 30639.0 | 105212482.0 | 317205.0 | 3442.0 |
| 2 | BarackObama | Barack Obama | 15492.0 | 98983315.0 | 625491.0 | 10.0 |
| 3 | Taylorswift13 | Taylor Swift | 83.0 | 85818984.0 | 0.0 | 115.0 |
| 4 | rihanna | Rihanna | 10062.0 | 85911363.0 | 1126.0 | 1024.0 |
| 5 | youtube | YouTube | 21570.0 | 70966233.0 | 1030.0 | 1732.0 |
| 6 | theellenshow | Ellen DeGeneres | 15773.0 | 76909443.0 | 35958.0 | 695.0 |
| 7 | ladygaga | Lady Gaga | 8655.0 | 76347598.0 | 128255.0 | 1825.0 |
| 8 | twitter | Twitter | 6489.0 | 62474506.0 | 145.0 | 5284.0 |
| 9 | jtimberlake | Justin Timberlake | 3771.0 | 64939098.0 | 260.0 | 50.0 |


In [54]:
# Calculate averages of the following: "Tweets"; "Followers"; "Following"; "Favorites Count"
# See "Create DataFrame" below for the variable names you should use
average_tweet_count = popular_tweeters["Tweets"].mean()
average_followers = popular_tweeters["Followers"].mean()
average_following_count = popular_tweeters["Following"].mean()
average_favorites_count = popular_tweeters["Favorites Count"].mean()

# Create DataFrame
averages = {"Average Tweet Count": average_tweet_count, 
            "Average Follower Count": average_followers, 
            "Average Following Count": average_following_count,
            "Average Favorites Count": average_favorites_count}

# Create a Dataframe of the averages
data_frame = pd.DataFrame(averages, index=["Top Accounts"])


In [55]:
data_frame

| | Average Favorites Count | Average Follower Count | Average Following Count | Average Tweet Count |
|---|---|---|---|---|
| Top Accounts | 1794.306122 | 49275530.0 | 32685.530612 | 31360.632653 |
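
Because the CSV was read with `dtype=str`, it can be safer to convert the count columns to numbers explicitly before averaging. The sketch below does that for all four columns at once with `pd.to_numeric` and builds the one-row summary from the result. The small `sample` frame is made-up data standing in for `popular_tweeters`, and its empty row mimics an account that was skipped.

```python
import pandas as pd

count_columns = ["Tweets", "Followers", "Following", "Favorites Count"]

# Made-up stand-in for popular_tweeters; the last row has no data
sample = pd.DataFrame({
    "Screen Name": ["a", "b", "c"],
    "Tweets": ["10", "30", None],
    "Followers": ["100", "300", None],
    "Following": ["1", "3", None],
    "Favorites Count": ["5", "15", None],
})

# Convert each count column to a numeric dtype; unparseable values become NaN
numeric = sample[count_columns].apply(pd.to_numeric, errors="coerce")

# mean() skips NaN by default, so the empty row does not affect the result
averages = numeric.mean()

summary = pd.DataFrame(
    {
        "Average Tweet Count": averages["Tweets"],
        "Average Follower Count": averages["Followers"],
        "Average Following Count": averages["Following"],
        "Average Favorites Count": averages["Favorites Count"],
    },
    index=["Top Accounts"],
)
print(summary)
```

Skipped accounts stay in the frame as NaN rather than being counted as zero, so each average is taken over the accounts that were actually retrieved.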
