# WeRateDogs Twitter Analysis

This project was done as a Udacity project to put together the concepts we learned. It analyzes the Twitter account [WeRateDogs](https://twitter.com/dog_rates). The key thing this account does, and the focus of this project, is rating dogs; the project extracts those ratings, most of which are 12/10 or some other score above 10.

As a disclaimer, Udacity is absolute garbage and I would not recommend it to anyone, as there is no support.

## Step 1 - Gathering the Data
There are three data sources, detailed below:
* `img_predictions`: This file was gathered from Udacity using the `requests` package. It contains tweet IDs with AI predictions of what the dogs in the images are.
* `post_archive`: This file was provided by Udacity for download to a local machine. It includes 5,000 posts and has been modified to pull the dog ratings out of the text strings of the posts. Udacity intentionally left key metrics out of this file.
* `tweet_data`: This data source is pulled directly from Twitter's API using the `tweet_id` variable from `post_archive`.

The code block below imports our Python modules as well as our secret API credentials, which are kept in a separate module so that this notebook can live in a public git repo.

In [123]:
import requests
import pandas as pd
import tweepy
from tweepy import OAuthHandler
import json
from timeit import default_timer as timer
import os.path
import API_Key_Credentials as creds
import matplotlib.pyplot as plt

consumer_key = creds.consumer_key
consumer_secret = creds.consumer_secret
access_token = creds.access_token
access_secret = creds.access_secret
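
The `API_Key_Credentials` module is not shown because it stays out of the repo. As a rough sketch of one way such a module could be written, the snippet below reads the four values from environment variables. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the `load_credentials` helper are assumptions for illustration, not the contents of the original file.

```python
import os

# Hypothetical environment variable names; the real module may store its values differently.
REQUIRED = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(env=None):
    """Return the four credentials as a dict, raising if any are missing."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in REQUIRED.items()}
```

Failing loudly on a missing value keeps a half-configured notebook from reaching the API calls with empty credentials.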

**Gathering** `img_predictions`

In [124]:
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(url)

In [125]:
response.status_code

200

In [126]:
if response.status_code == 200:
    print("We're in boys")
else:
    print("Houston, we have a problem, Error: ",response.status_code)

We're in boys


In [127]:
with open('image_predictions.tsv', 'wb') as file:
    file.write(response.content)

In [128]:
img_predictions = pd.read_csv('image_predictions.tsv', delimiter='\t')
img_predictions.sample(5)

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
1321,756303284449767430,https://pbs.twimg.com/media/Cn7tyyZWYAAPlAY.jpg,1,golden_retriever,0.981652,True,cocker_spaniel,0.00679,True,Labrador_retriever,0.004325,True
401,673689733134946305,https://pbs.twimg.com/media/CVltNgxWEAA5sCJ.jpg,1,Chesapeake_Bay_retriever,0.38222,True,American_Staffordshire_terrier,0.35014,True,seat_belt,0.098874,False
1316,755110668769038337,https://pbs.twimg.com/ext_tw_video_thumb/75511...,1,Labrador_retriever,0.708974,True,golden_retriever,0.114314,True,Great_Pyrenees,0.065813,True
1959,865718153858494464,https://pbs.twimg.com/media/DAOmEZiXYAAcv2S.jpg,1,golden_retriever,0.673664,True,kuvasz,0.157523,True,Labrador_retriever,0.126073,True
686,684097758874210310,https://pbs.twimg.com/media/CX5nR5oWsAAiclh.jpg,1,Labrador_retriever,0.627856,True,German_short-haired_pointer,0.173675,True,Chesapeake_Bay_retriever,0.041342,True


**Gathering** `post_archive`

In [129]:
post_archive = pd.read_csv('twitter-archive-enhanced.csv')
post_archive.sample(5)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
679,789137962068021249,,,2016-10-20 16:15:26 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Bo. He's a West Congolese Bugaboop Snu...,,,,https://twitter.com/dog_rates/status/789137962...,12,10,Bo,,,,
126,868552278524837888,,,2017-05-27 19:39:34 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Say hello to Cooper. His expression is the sam...,,,,"https://www.gofundme.com/3ti3nps,https://twitt...",12,10,Cooper,,,,
1741,679475951516934144,,,2015-12-23 01:37:45 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Evy. She doesn't want to be a Koala. 9...,,,,https://twitter.com/dog_rates/status/679475951...,9,10,Evy,,,,
102,872620804844003328,,,2017-06-08 01:06:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Monkey. She's supporting owners everyw...,,,,https://twitter.com/dog_rates/status/872620804...,13,10,Monkey,,,,
1256,710588934686908417,,,2016-03-17 22:09:38 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Beemo. He's a Chubberflop mix. 12/10 w...,,,,https://twitter.com/dog_rates/status/710588934...,12,10,Beemo,,,,


**Gathering** `tweet_data`

In [130]:
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth, wait_on_rate_limit=True)

tweet_ids = post_archive.tweet_id.values
len(tweet_ids)

# Query Twitter's API for JSON data for each tweet ID in the Twitter archive
count = 0
fails_dict = {}
start = timer()


# Save each tweet's returned JSON as a new line in a .txt file
if os.path.exists('tweet_json.txt'):
    print('Twitter Info Already Captured')
else:
    with open('tweet_json.txt', 'w') as outfile:
        # This loop will likely take 20-30 minutes to run because of Twitter's rate limit
        for tweet_id in tweet_ids:
            count += 1
            print(str(count) + ": " + str(tweet_id))
            try:
                tweet = api.get_status(tweet_id, tweet_mode='extended')
                print("Success")
                json.dump(tweet._json, outfile)
                outfile.write('\n')
            except tweepy.TweepError as e:
                print("Fail")
                fails_dict[tweet_id] = e
                pass
end = timer()


Twitter Info Already Captured
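
The loop above follows a general pattern: fetch each record, write it as one JSON document per line, and remember which IDs failed. The sketch below isolates that pattern so it can be tried without API access. `dump_json_lines` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for the `api.get_status(...)` call and its data is made up.

```python
import io
import json


def dump_json_lines(ids, fetch, outfile):
    """Write one JSON document per line; return a dict of the ids that failed."""
    failures = {}
    for item_id in ids:
        try:
            record = fetch(item_id)
        except Exception as error:  # the notebook catches tweepy.TweepError here
            failures[item_id] = error
            continue
        outfile.write(json.dumps(record) + "\n")
    return failures


# Stand-in for the API call: pretend the tweet with id 2 has been deleted.
def fake_fetch(item_id):
    if item_id == 2:
        raise LookupError("not found")
    return {"id": item_id, "retweet_count": 10 * item_id}


buffer = io.StringIO()
failed = dump_json_lines([1, 2, 3], fake_fetch, buffer)
lines = buffer.getvalue().splitlines()
```

Here `lines` holds the two successful records and `failed` maps the ID 2 to its error, mirroring `fails_dict` in the notebook.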


The code block below took a while to figure out, but it was my solution for loading the raw JSON from a text file into Python objects. It stores each tweet's JSON as a dictionary inside the `tweet_data` list. Note that `for line in f:` is the syntax for iterating over the text file one line at a time, so each line can be parsed on its own with `json.loads`.

In [131]:
with open('tweet_json.txt') as f:
    tweet_data = []
    for line in f:
        tweet_data.append(json.loads(line))
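
The same line-by-line parsing can be tried on an in-memory file. This is a minimal sketch with made-up data; `read_json_lines` is a hypothetical helper, and skipping blank lines is an extra safeguard that the notebook cell does not have.

```python
import io
import json


def read_json_lines(handle):
    """Parse a file-like object holding one JSON document per line."""
    records = []
    for line in handle:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            records.append(json.loads(line))
    return records


sample = io.StringIO('{"id": 1, "retweet_count": 5}\n\n{"id": 2, "retweet_count": 7}\n')
records = read_json_lines(sample)
```

Each element of `records` is a plain dictionary, just like the elements of `tweet_data`.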
    

In [132]:
type(tweet_data)

list

In [133]:
type(tweet_data[1])

dict

Example of the raw API response with all its attributes, before selectively choosing which ones to add to our master dataFrame:

In [134]:
tweet_data[4]

{'created_at': 'Sat Jul 29 16:00:24 +0000 2017',
 'id': 891327558926688256,
 'id_str': '891327558926688256',
 'full_text': 'This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f',
 'truncated': False,
 'display_text_range': [0, 138],
 'entities': {'hashtags': [{'text': 'BarkWeek', 'indices': [129, 138]}],
  'symbols': [],
  'user_mentions': [],
  'urls': [],
  'media': [{'id': 891327551943041024,
    'id_str': '891327551943041024',
    'indices': [139, 162],
    'media_url': 'http://pbs.twimg.com/media/DF6hr6AVYAAZ8G8.jpg',
    'media_url_https': 'https://pbs.twimg.com/media/DF6hr6AVYAAZ8G8.jpg',
    'url': 'https://t.co/AtUZn91f7f',
    'display_url': 'pic.twitter.com/AtUZn91f7f',
    'expanded_url': 'https://twitter.com/dog_rates/status/891327558926688256/photo/1',
    'type': 'photo',
    'sizes': {'medium': {'w': 720, 'h': 540, 'resize': 'fit'},
     'large': {'w': 720, 'h':

In [135]:
tweet_data[4]['retweet_count']

8837

## Step 2 - Creating Master dataFrame
This step is all about merging our three datasets and then going through the process of assessing and cleaning.

This code block transforms our list of dictionaries into a dataFrame. Some tweets did not contain media, which would throw an error. Because in this instance we are only interested in posts with photos, skipping those tweets with a simple `pass` statement solved the issue appropriately.

In [136]:
tweets = pd.DataFrame()

# Note: the range starts at 1, so tweet_data[0] is not processed
for i in range(1,len(tweet_data)):
    try:
        tweets = tweets.append({'tweet_id': tweet_data[i]['id'],
                                'retweets': tweet_data[i]['retweet_count'],
                                'favorite_counts': tweet_data[i]['favorite_count'],
                                'created_at': tweet_data[i]['created_at'],
                                'full_text':tweet_data[i]['full_text'].split('https://')[0],
                                'post_link':tweet_data[i]['full_text'].split('https://')[1],
                                'picture':tweet_data[i]['entities']['media'][0]['media_url'],
                                'retweeted':tweet_data[i]['retweeted']
                               }, ignore_index=True)
    # Skip tweets with no media (KeyError) or no link in the text (IndexError)
    except (KeyError, IndexError):
        pass
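
Appending to a dataFrame inside a loop copies the frame on every iteration, and `DataFrame.append` was removed in pandas 2.0. A minimal sketch of an alternative, assuming the same kind of tweet dictionaries: collect plain dictionaries in a list and build the dataFrame once. The `tweets_to_frame` helper and the two sample tweets are made up for illustration, and only a few of the notebook's columns are included.

```python
import pandas as pd


def tweets_to_frame(tweet_dicts):
    """Build a dataFrame from tweet dictionaries, skipping tweets without a photo."""
    rows = []
    for tweet in tweet_dicts:
        media = tweet.get("entities", {}).get("media")
        if not media:
            continue  # no photo attached, so not of interest here
        rows.append({
            "tweet_id": tweet["id"],
            "retweets": tweet["retweet_count"],
            "favorite_counts": tweet["favorite_count"],
            "picture": media[0]["media_url"],
        })
    return pd.DataFrame(rows)


# Two made-up tweets: only the first has media.
sample = [
    {"id": 1, "retweet_count": 3, "favorite_count": 9,
     "entities": {"media": [{"media_url": "http://example.com/a.jpg"}]}},
    {"id": 2, "retweet_count": 1, "favorite_count": 2, "entities": {}},
]
frame = tweets_to_frame(sample)
```

Building the frame from a list of dictionaries also keeps integer fields such as `tweet_id` as integers instead of converting them to floats.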

In [137]:
tweets.columns.tolist()

['created_at',
 'favorite_counts',
 'full_text',
 'picture',
 'post_link',
 'retweeted',
 'retweets',
 'tweet_id']

### Question:
Is it possible to merge three dataFrames at the same time in a pythonic fashion?
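
One possible answer, offered as a sketch rather than the only way: `pd.merge` joins two dataFrames at a time, but `functools.reduce` can fold it over a list so that any number of frames sharing a key are merged in one expression. The three small frames below are made-up stand-ins for `tweets`, `img_predictions`, and `post_archive`.

```python
from functools import reduce

import pandas as pd

a = pd.DataFrame({"tweet_id": [1, 2, 3], "retweets": [10, 20, 30]})
b = pd.DataFrame({"tweet_id": [1, 2, 4], "p1": ["pug", "chow", "boxer"]})
c = pd.DataFrame({"tweet_id": [2, 1], "name": ["Bo", "Evy"]})

# reduce applies the merge pairwise: ((a merge b) merge c)
merged = reduce(lambda left, right: pd.merge(left, right, on="tweet_id"), [a, b, c])
```

Because `pd.merge` defaults to an inner join, only the tweet IDs present in all three frames survive, which matches the behavior of the two separate merges in the cells that follow.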

In [138]:
master_df = pd.merge(tweets, img_predictions, on='tweet_id')
master_df

Unnamed: 0,created_at,favorite_counts,full_text,picture,post_link,retweeted,retweets,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
0,Mon Jul 31 00:18:03 +0000 2017,24183.0,This is Archie. He is a rare Norwegian Pouncin...,http://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg,t.co/wUnZnhtVJB,0.0,3925.0,8.918152e+17,https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg,1,Chihuahua,0.716012,True,malamute,0.078253,True,kelpie,0.031379,True
1,Sun Jul 30 15:58:51 +0000 2017,40629.0,This is Darla. She commenced a snooze mid meal...,http://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg,t.co/tD36da7qLQ,0.0,8161.0,8.916896e+17,https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg,1,paper_towel,0.170278,False,Labrador_retriever,0.168086,True,spatula,0.040836,False
2,Sat Jul 29 16:00:24 +0000 2017,38862.0,This is Franklin. He would like you to stop ca...,http://pbs.twimg.com/media/DF6hr6AVYAAZ8G8.jpg,t.co/AtUZn91f7f,0.0,8837.0,8.913276e+17,https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg,2,basset,0.555712,True,English_springer,0.225770,True,German_short-haired_pointer,0.175219,True
3,Sat Jul 29 00:08:17 +0000 2017,19532.0,Here we have a majestic great white breaching ...,http://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg,t.co/kQ04fDDRmh,0.0,2940.0,8.910880e+17,https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg,1,Chesapeake_Bay_retriever,0.425595,True,Irish_terrier,0.116317,True,Indian_elephant,0.076902,False
4,Fri Jul 28 00:22:40 +0000 2017,62928.0,When you watch your owner call another dog a g...,http://pbs.twimg.com/media/DFyBag_UQAAhhBC.jpg,t.co/v0nONBcwxq,0.0,17831.0,8.907292e+17,https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg,2,Pomeranian,0.566142,True,Eskimo_dog,0.178406,True,Pembroke,0.076507,True
5,Thu Jul 27 16:25:51 +0000 2017,26866.0,This is Zoey. She doesn't want to be one of th...,http://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg,t.co/9TwLuAGH0b,0.0,4041.0,8.906092e+17,https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg,1,Irish_terrier,0.487574,True,Irish_setter,0.193054,True,Chesapeake_Bay_retriever,0.118184,True
6,Wed Jul 26 00:31:25 +0000 2017,29605.0,This is Koda. He is a South Australian decksha...,http://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg,t.co/dVPW0B0Mme,0.0,6925.0,8.900066e+17,https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg,1,Samoyed,0.957979,True,Pomeranian,0.013884,True,chow,0.008167,True
7,Tue Jul 25 00:10:02 +0000 2017,26144.0,This is Ted. He does his best. Sometimes that'...,http://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg,t.co/f8dEDcrKSR,0.0,4251.0,8.896388e+17,https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg,1,French_bulldog,0.991650,True,boxer,0.002129,True,Staffordshire_bullterrier,0.001498,True
8,Mon Jul 24 00:19:32 +0000 2017,24343.0,This is Oliver. You're witnessing one of his m...,http://pbs.twimg.com/ext_tw_video_thumb/889278...,t.co/WpHvrQedPb,0.0,5063.0,8.892788e+17,https://pbs.twimg.com/ext_tw_video_thumb/88927...,1,whippet,0.626152,True,borzoi,0.194742,True,Saluki,0.027351,True
9,Sun Jul 23 00:22:39 +0000 2017,28106.0,This is Jim. He found a fren. Taught him how t...,http://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg,t.co/chxruIOUJN,0.0,4246.0,8.889172e+17,https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg,1,golden_retriever,0.714719,True,Tibetan_mastiff,0.120184,True,Labrador_retriever,0.105506,True


In [139]:
master_df = pd.merge(master_df, post_archive, on='tweet_id')
master_df.sample(20)

Unnamed: 0,created_at,favorite_counts,full_text,picture,post_link,retweeted,retweets,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
572,Mon May 16 00:31:53 +0000 2016,15283.0,This is Larry. He has no self control. Tongue ...,http://pbs.twimg.com/media/CiibOMzUYAA9Mxz.jpg,t.co/ghyT4Ubk1r,0.0,5575.0,7.320056e+17,https://pbs.twimg.com/media/CiibOMzUYAA9Mxz.jpg,1,English_setter,0.677408,True,Border_collie,0.052724,True,cocker_spaniel,0.048572,True,,,2016-05-16 00:31:53 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Larry. He has no self control. Tongue ...,,,,https://twitter.com/dog_rates/status/732005617...,11,10,Larry,,,,
772,Fri Feb 05 03:18:42 +0000 2016,4503.0,We normally don't rate unicorns but this one h...,http://pbs.twimg.com/media/Caa407jWwAAJPH3.jpg,t.co/f9qlKiv39T,0.0,1866.0,6.954464e+17,https://pbs.twimg.com/media/Caa407jWwAAJPH3.jpg,1,basenji,0.748904,True,Cardigan,0.121102,True,Pembroke,0.111767,True,,,2016-02-05 03:18:42 +0000,"<a href=""http://twitter.com/download/iphone"" r...",We normally don't rate unicorns but this one h...,,,,https://twitter.com/dog_rates/status/695446424...,12,10,,,,,
610,Tue Apr 12 01:51:36 +0000 2016,4659.0,This is Clyde. He's making sure you're having ...,http://pbs.twimg.com/media/CfznaXuUsAAH-py.jpg,t.co/y206kWHAj0,0.0,1481.0,7.197045e+17,https://pbs.twimg.com/media/CfznaXuUsAAH-py.jpg,1,home_theater,0.059033,False,window_shade,0.038299,False,bathtub,0.035528,False,,,2016-04-12 01:51:36 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Clyde. He's making sure you're having ...,,,,https://twitter.com/dog_rates/status/719704490...,12,10,Clyde,,,pupper,
222,Mon Dec 26 03:00:30 +0000 2016,19549.0,Here is Atlas. He went all out this year. 13/1...,http://pbs.twimg.com/media/C0khWkVXEAI389B.jpg,t.co/DVYIZOnO81,0.0,7726.0,8.132179e+17,https://pbs.twimg.com/media/C0khWkVXEAI389B.jpg,1,Samoyed,0.905972,True,Pomeranian,0.048038,True,West_Highland_white_terrier,0.035667,True,,,2016-12-26 03:00:30 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is Atlas. He went all out this year. 13/1...,,,,https://twitter.com/dog_rates/status/813217897...,13,10,Atlas,,,,
665,Sun Mar 13 15:43:18 +0000 2016,4870.0,This is Klevin. He's addicted to sandwiches (y...,http://pbs.twimg.com/media/CdcGBB3WwAAGBuU.jpg,t.co/7BkkVNu5pd,0.0,1685.0,7.090422e+17,https://pbs.twimg.com/media/CdcGBB3WwAAGBuU.jpg,1,hotdog,0.826579,False,Rottweiler,0.068179,True,Labrador_retriever,0.049218,True,,,2016-03-13 15:43:18 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Klevin. He's addicted to sandwiches (y...,,,,https://twitter.com/dog_rates/status/709042156...,9,10,Klevin,,,,
117,Sun Apr 02 00:03:26 +0000 2017,25138.0,Meet Odin. He's supposed to be giving directio...,http://pbs.twimg.com/media/C8XbDR1WAAAxND8.jpg,t.co/1pSqUbLQ5Z,0.0,4684.0,8.48325e+17,https://pbs.twimg.com/media/C8XbDR1WAAAxND8.jpg,1,malamute,0.544576,True,Siberian_husky,0.290268,True,Eskimo_dog,0.154421,True,,,2017-04-02 00:03:26 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Odin. He's supposed to be giving directio...,,,,https://twitter.com/dog_rates/status/848324959...,12,10,Odin,,,,
1072,Sun Dec 06 01:48:12 +0000 2015,874.0,Take a moment and appreciate how these two dog...,http://pbs.twimg.com/media/CVgbIoYVEAA9xMv.jpg,t.co/juX48bWpng,0.0,259.0,6.73318e+17,https://pbs.twimg.com/media/CVgbIobUYAEaeI3.jpg,2,miniature_pinscher,0.384099,True,bloodhound,0.079923,True,Rottweiler,0.068594,True,,,2015-12-06 01:48:12 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Take a moment and appreciate how these two dog...,,,,https://twitter.com/dog_rates/status/673317986...,10,10,,,,,
550,Thu Jun 02 16:10:29 +0000 2016,3324.0,"""Don't talk to me or my son ever again"" ...10/...",http://pbs.twimg.com/media/Cj9VEs_XAAAlTai.jpg,t.co/s96OYXZIfK,0.0,863.0,7.384024e+17,https://pbs.twimg.com/media/Cj9VEs_XAAAlTai.jpg,1,cocker_spaniel,0.346695,True,Blenheim_spaniel,0.193905,True,Chihuahua,0.078,True,,,2016-06-02 16:10:29 +0000,"<a href=""http://twitter.com/download/iphone"" r...","""Don't talk to me or my son ever again"" ...10/...",,,,https://twitter.com/dog_rates/status/738402415...,10,10,,,,,
345,Thu Sep 29 16:03:01 +0000 2016,21719.0,Idk why this keeps happening. We only rate dog...,http://pbs.twimg.com/media/CtiIj0AWcAEBDvw.jpg,t.co/ya7bviQUUf,0.0,5873.0,7.815247e+17,https://pbs.twimg.com/media/CtiIj0AWcAEBDvw.jpg,1,tennis_ball,0.994712,False,Chesapeake_Bay_retriever,0.003523,True,Labrador_retriever,0.000921,True,,,2016-09-29 16:03:01 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Idk why this keeps happening. We only rate dog...,,,,https://twitter.com/dog_rates/status/781524693...,12,10,,,,,
540,Tue Jun 07 16:09:13 +0000 2016,6869.0,This is getting incredibly frustrating. This i...,http://pbs.twimg.com/media/CkXEu2OUoAAs8yU.jpg,t.co/0yolOOyD3X,0.0,2028.0,7.40214e+17,https://pbs.twimg.com/media/CkXEu2OUoAAs8yU.jpg,1,Chesapeake_Bay_retriever,0.586414,True,Labrador_retriever,0.189782,True,vizsla,0.067607,True,,,2016-06-07 16:09:13 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is getting incredibly frustrating. This i...,,,,https://twitter.com/dog_rates/status/740214038...,10,10,getting,,,,


## Step 3 - Assessing the Master dataFrame

In [140]:
master_df.columns.tolist()

['created_at',
 'favorite_counts',
 'full_text',
 'picture',
 'post_link',
 'retweeted',
 'retweets',
 'tweet_id',
 'jpg_url',
 'img_num',
 'p1',
 'p1_conf',
 'p1_dog',
 'p2',
 'p2_conf',
 'p2_dog',
 'p3',
 'p3_conf',
 'p3_dog',
 'in_reply_to_status_id',
 'in_reply_to_user_id',
 'timestamp',
 'source',
 'text',
 'retweeted_status_id',
 'retweeted_status_user_id',
 'retweeted_status_timestamp',
 'expanded_urls',
 'rating_numerator',
 'rating_denominator',
 'name',
 'doggo',
 'floofer',
 'pupper',
 'puppo']

In [141]:
#Transforms the appropriate fields to datetime format
time_variables = ['created_at','timestamp','retweeted_status_timestamp']
for var in time_variables:
    master_df[var] = pd.to_datetime(master_df[var])
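
`pd.to_datetime` infers the layout of each string. To make the layout of Twitter's `created_at` field explicit, the sketch below parses one value from the raw API output with the standard library. The format string is my reading of that sample value, not something taken from Twitter's documentation; the same string could also be passed to `pd.to_datetime` through its `format` argument.

```python
from datetime import datetime, timezone

# Layout of a value such as 'Sat Jul 29 16:00:24 +0000 2017'
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

parsed = datetime.strptime("Sat Jul 29 16:00:24 +0000 2017", TWITTER_FORMAT)
```

The result is a timezone-aware datetime in UTC, consistent with the `datetime64[ns, UTC]` dtype reported for `created_at`.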

In [142]:
#Checking dTypes
master_df.dtypes

created_at                    datetime64[ns, UTC]
favorite_counts                           float64
full_text                                  object
picture                                    object
post_link                                  object
retweeted                                 float64
retweets                                  float64
tweet_id                                  float64
jpg_url                                    object
img_num                                     int64
p1                                         object
p1_conf                                   float64
p1_dog                                       bool
p2                                         object
p2_conf                                   float64
p2_dog                                       bool
p3                                         object
p3_conf                                   float64
p3_dog                                       bool
in_reply_to_status_id                     float64
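
The `dtypes` output lists `tweet_id` as `float64`. A 64-bit float carries 53 bits of precision, while these IDs need about 60, so some IDs cannot be stored exactly. The check below uses an ID that appears in the `img_predictions` sample; keeping the column as a 64-bit integer or a string is my suggestion, not something the notebook does.

```python
tweet_id = 673689733134946305  # an ID from the img_predictions sample

as_float = float(tweet_id)
round_trip = int(as_float)

# The float rounds to the nearest representable value, so the last digit changes.
changed = round_trip != tweet_id
```

Not every ID is affected (some happen to be exactly representable), which makes the problem easy to miss when spot-checking a few rows.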


In [143]:
#Searching for null values
master_df.isnull().sum()

created_at                       0
favorite_counts                  0
full_text                        0
picture                          0
post_link                        0
retweeted                        0
retweets                         0
tweet_id                         0
jpg_url                          0
img_num                          0
p1                               0
p1_conf                          0
p1_dog                           0
p2                               0
p2_conf                          0
p2_dog                           0
p3                               0
p3_conf                          0
p3_dog                           0
in_reply_to_status_id         1327
in_reply_to_user_id           1327
timestamp                        0
source                           0
text                             0
retweeted_status_id           1297
retweeted_status_user_id      1297
retweeted_status_timestamp    1297
expanded_urls                    0
rating_numerator    

### Let's do some manual assessing
Using the code below, we expand the dataFrame display for further exploration.

In [144]:
pd.set_option('display.max_columns', 500)
master_df

Unnamed: 0,created_at,favorite_counts,full_text,picture,post_link,retweeted,retweets,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,2017-07-31 00:18:03+00:00,24183.0,This is Archie. He is a rare Norwegian Pouncin...,http://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg,t.co/wUnZnhtVJB,0.0,3925.0,8.918152e+17,https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg,1,Chihuahua,0.716012,True,malamute,0.078253,True,kelpie,0.031379,True,,,2017-07-31 00:18:03+00:00,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,NaT,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
1,2017-07-30 15:58:51+00:00,40629.0,This is Darla. She commenced a snooze mid meal...,http://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg,t.co/tD36da7qLQ,0.0,8161.0,8.916896e+17,https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg,1,paper_towel,0.170278,False,Labrador_retriever,0.168086,True,spatula,0.040836,False,,,2017-07-30 15:58:51+00:00,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,NaT,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
2,2017-07-29 16:00:24+00:00,38862.0,This is Franklin. He would like you to stop ca...,http://pbs.twimg.com/media/DF6hr6AVYAAZ8G8.jpg,t.co/AtUZn91f7f,0.0,8837.0,8.913276e+17,https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg,2,basset,0.555712,True,English_springer,0.225770,True,German_short-haired_pointer,0.175219,True,,,2017-07-29 16:00:24+00:00,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,NaT,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,
3,2017-07-29 00:08:17+00:00,19532.0,Here we have a majestic great white breaching ...,http://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg,t.co/kQ04fDDRmh,0.0,2940.0,8.910880e+17,https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg,1,Chesapeake_Bay_retriever,0.425595,True,Irish_terrier,0.116317,True,Indian_elephant,0.076902,False,,,2017-07-29 00:08:17+00:00,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a majestic great white breaching ...,,,NaT,https://twitter.com/dog_rates/status/891087950...,13,10,,,,,
4,2017-07-28 00:22:40+00:00,62928.0,When you watch your owner call another dog a g...,http://pbs.twimg.com/media/DFyBag_UQAAhhBC.jpg,t.co/v0nONBcwxq,0.0,17831.0,8.907292e+17,https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg,2,Pomeranian,0.566142,True,Eskimo_dog,0.178406,True,Pembroke,0.076507,True,,,2017-07-28 00:22:40+00:00,"<a href=""http://twitter.com/download/iphone"" r...",When you watch your owner call another dog a g...,,,NaT,https://twitter.com/dog_rates/status/890729181...,13,10,,,,,
5,2017-07-27 16:25:51+00:00,26866.0,This is Zoey. She doesn't want to be one of th...,http://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg,t.co/9TwLuAGH0b,0.0,4041.0,8.906092e+17,https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg,1,Irish_terrier,0.487574,True,Irish_setter,0.193054,True,Chesapeake_Bay_retriever,0.118184,True,,,2017-07-27 16:25:51+00:00,"<a href=""http://twitter.com/download/iphone"" r...",This is Zoey. She doesn't want to be one of th...,,,NaT,https://twitter.com/dog_rates/status/890609185...,13,10,Zoey,,,,
6,2017-07-26 00:31:25+00:00,29605.0,This is Koda. He is a South Australian decksha...,http://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg,t.co/dVPW0B0Mme,0.0,6925.0,8.900066e+17,https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg,1,Samoyed,0.957979,True,Pomeranian,0.013884,True,chow,0.008167,True,,,2017-07-26 00:31:25+00:00,"<a href=""http://twitter.com/download/iphone"" r...",This is Koda. He is a South Australian decksha...,,,NaT,https://twitter.com/dog_rates/status/890006608...,13,10,Koda,,,,
7,2017-07-25 00:10:02+00:00,26144.0,This is Ted. He does his best. Sometimes that'...,http://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg,t.co/f8dEDcrKSR,0.0,4251.0,8.896388e+17,https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg,1,French_bulldog,0.991650,True,boxer,0.002129,True,Staffordshire_bullterrier,0.001498,True,,,2017-07-25 00:10:02+00:00,"<a href=""http://twitter.com/download/iphone"" r...",This is Ted. He does his best. Sometimes that'...,,,NaT,https://twitter.com/dog_rates/status/889638837...,12,10,Ted,,,,
8,2017-07-24 00:19:32+00:00,24343.0,This is Oliver. You're witnessing one of his m...,http://pbs.twimg.com/ext_tw_video_thumb/889278...,t.co/WpHvrQedPb,0.0,5063.0,8.892788e+17,https://pbs.twimg.com/ext_tw_video_thumb/88927...,1,whippet,0.626152,True,borzoi,0.194742,True,Saluki,0.027351,True,,,2017-07-24 00:19:32+00:00,"<a href=""http://twitter.com/download/iphone"" r...",This is Oliver. You're witnessing one of his m...,,,NaT,https://twitter.com/dog_rates/status/889278841...,13,10,Oliver,,,,
9,2017-07-23 00:22:39+00:00,28106.0,This is Jim. He found a fren. Taught him how t...,http://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg,t.co/chxruIOUJN,0.0,4246.0,8.889172e+17,https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg,1,golden_retriever,0.714719,True,Tibetan_mastiff,0.120184,True,Labrador_retriever,0.105506,True,,,2017-07-23 00:22:39+00:00,"<a href=""http://twitter.com/download/iphone"" r...",This is Jim. He found a fren. Taught him how t...,,,NaT,https://twitter.com/dog_rates/status/888917238...,12,10,Jim,,,,


Using manual and programmatic assessment, we found the issues below:

* `created_at` should be a datetime
* Remove unnecessary variables
* Transform `retweeted` to boolean
* Transform `retweets` to int
* Remove `jpg_url`
* Rename the prediction columns, e.g. `p1` to `prediction1`
* Make all predictions follow the same syntax (some are lower case, some upper case)
* Drop `timestamp`
* Drop `source`
* Drop `text`
* Drop `expanded_urls`
* Melt down `doggo`, `floofer`, `pupper`, and `puppo`
* Make spacing consistent in the dog predictions
* Make names consistent; if a value is not a name, let's go with NaN
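
Two of these issues, inconsistent prediction labels and non-names in the `name` column, can be sketched as small helper functions. Both helpers are hypothetical, and the rule that a real name starts with a capital letter is an assumption based on values such as `getting` in the sample above. `None` is returned for non-names because pandas treats it as a missing value in an object column.

```python
def normalize_label(label):
    """Lower-case a prediction label and replace underscores with spaces."""
    return label.replace("_", " ").lower().strip()


def clean_name(name):
    """Return None for values that are unlikely to be real names (assumed rule)."""
    if not name or name == "None" or not name[0].isupper():
        return None
    return name


labels = [normalize_label(p) for p in ["Labrador_retriever", "golden_retriever"]]
names = [clean_name(n) for n in ["Bo", "getting", "None"]]
```

Functions like these could be applied to the prediction columns and the `name` column with `Series.map`.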

## Step 4 - Cleaning the Master dataFrame
We will now go about cleaning the master dataFrame.

In [145]:
#Drop unnecessary columns
master_df.drop(columns={'timestamp','jpg_url','source','text','expanded_urls','in_reply_to_status_id','in_reply_to_user_id','retweeted_status_id','retweeted_status_user_id','retweeted_status_timestamp'}, inplace=True)

In [146]:
#Reassign data types
master_df.retweets = master_df.retweets.astype(int)
master_df.favorite_counts = master_df.favorite_counts.astype(int)
master_df.retweeted = master_df.retweeted.astype(int)
master_df.retweeted = master_df.retweeted.astype(bool)
master_df

Unnamed: 0,created_at,favorite_counts,full_text,picture,post_link,retweeted,retweets,tweet_id,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,2017-07-31 00:18:03+00:00,24183,This is Archie. He is a rare Norwegian Pouncin...,http://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg,t.co/wUnZnhtVJB,False,3925,8.918152e+17,1,Chihuahua,0.716012,True,malamute,0.078253,True,kelpie,0.031379,True,12,10,Archie,,,,
1,2017-07-30 15:58:51+00:00,40629,This is Darla. She commenced a snooze mid meal...,http://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg,t.co/tD36da7qLQ,False,8161,8.916896e+17,1,paper_towel,0.170278,False,Labrador_retriever,0.168086,True,spatula,0.040836,False,13,10,Darla,,,,
2,2017-07-29 16:00:24+00:00,38862,This is Franklin. He would like you to stop ca...,http://pbs.twimg.com/media/DF6hr6AVYAAZ8G8.jpg,t.co/AtUZn91f7f,False,8837,8.913276e+17,2,basset,0.555712,True,English_springer,0.225770,True,German_short-haired_pointer,0.175219,True,12,10,Franklin,,,,
3,2017-07-29 00:08:17+00:00,19532,Here we have a majestic great white breaching ...,http://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg,t.co/kQ04fDDRmh,False,2940,8.910880e+17,1,Chesapeake_Bay_retriever,0.425595,True,Irish_terrier,0.116317,True,Indian_elephant,0.076902,False,13,10,,,,,
4,2017-07-28 00:22:40+00:00,62928,When you watch your owner call another dog a g...,http://pbs.twimg.com/media/DFyBag_UQAAhhBC.jpg,t.co/v0nONBcwxq,False,17831,8.907292e+17,2,Pomeranian,0.566142,True,Eskimo_dog,0.178406,True,Pembroke,0.076507,True,13,10,,,,,
5,2017-07-27 16:25:51+00:00,26866,This is Zoey. She doesn't want to be one of th...,http://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg,t.co/9TwLuAGH0b,False,4041,8.906092e+17,1,Irish_terrier,0.487574,True,Irish_setter,0.193054,True,Chesapeake_Bay_retriever,0.118184,True,13,10,Zoey,,,,
6,2017-07-26 00:31:25+00:00,29605,This is Koda. He is a South Australian decksha...,http://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg,t.co/dVPW0B0Mme,False,6925,8.900066e+17,1,Samoyed,0.957979,True,Pomeranian,0.013884,True,chow,0.008167,True,13,10,Koda,,,,
7,2017-07-25 00:10:02+00:00,26144,This is Ted. He does his best. Sometimes that'...,http://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg,t.co/f8dEDcrKSR,False,4251,8.896388e+17,1,French_bulldog,0.991650,True,boxer,0.002129,True,Staffordshire_bullterrier,0.001498,True,12,10,Ted,,,,
8,2017-07-24 00:19:32+00:00,24343,This is Oliver. You're witnessing one of his m...,http://pbs.twimg.com/ext_tw_video_thumb/889278...,t.co/WpHvrQedPb,False,5063,8.892788e+17,1,whippet,0.626152,True,borzoi,0.194742,True,Saluki,0.027351,True,13,10,Oliver,,,,
9,2017-07-23 00:22:39+00:00,28106,This is Jim. He found a fren. Taught him how t...,http://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg,t.co/chxruIOUJN,False,4246,8.889172e+17,1,golden_retriever,0.714719,True,Tibetan_mastiff,0.120184,True,Labrador_retriever,0.105506,True,12,10,Jim,,,,


In [147]:
#Meltdown some columns for a Tidy dataFrame
melt = pd.melt(master_df, id_vars=(master_df.columns.tolist()[0:-4]), value_name='dog_category')
melt

Unnamed: 0,created_at,favorite_counts,full_text,picture,post_link,retweeted,retweets,tweet_id,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog,rating_numerator,rating_denominator,name,variable,dog_category
0,2017-07-31 00:18:03+00:00,24183,This is Archie. He is a rare Norwegian Pouncin...,http://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg,t.co/wUnZnhtVJB,False,3925,8.918152e+17,1,Chihuahua,0.716012,True,malamute,0.078253,True,kelpie,0.031379,True,12,10,Archie,doggo,
1,2017-07-30 15:58:51+00:00,40629,This is Darla. She commenced a snooze mid meal...,http://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg,t.co/tD36da7qLQ,False,8161,8.916896e+17,1,paper_towel,0.170278,False,Labrador_retriever,0.168086,True,spatula,0.040836,False,13,10,Darla,doggo,
2,2017-07-29 16:00:24+00:00,38862,This is Franklin. He would like you to stop ca...,http://pbs.twimg.com/media/DF6hr6AVYAAZ8G8.jpg,t.co/AtUZn91f7f,False,8837,8.913276e+17,2,basset,0.555712,True,English_springer,0.225770,True,German_short-haired_pointer,0.175219,True,12,10,Franklin,doggo,
3,2017-07-29 00:08:17+00:00,19532,Here we have a majestic great white breaching ...,http://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg,t.co/kQ04fDDRmh,False,2940,8.910880e+17,1,Chesapeake_Bay_retriever,0.425595,True,Irish_terrier,0.116317,True,Indian_elephant,0.076902,False,13,10,,doggo,
4,2017-07-28 00:22:40+00:00,62928,When you watch your owner call another dog a g...,http://pbs.twimg.com/media/DFyBag_UQAAhhBC.jpg,t.co/v0nONBcwxq,False,17831,8.907292e+17,2,Pomeranian,0.566142,True,Eskimo_dog,0.178406,True,Pembroke,0.076507,True,13,10,,doggo,
5,2017-07-27 16:25:51+00:00,26866,This is Zoey. She doesn't want to be one of th...,http://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg,t.co/9TwLuAGH0b,False,4041,8.906092e+17,1,Irish_terrier,0.487574,True,Irish_setter,0.193054,True,Chesapeake_Bay_retriever,0.118184,True,13,10,Zoey,doggo,
6,2017-07-26 00:31:25+00:00,29605,This is Koda. He is a South Australian decksha...,http://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg,t.co/dVPW0B0Mme,False,6925,8.900066e+17,1,Samoyed,0.957979,True,Pomeranian,0.013884,True,chow,0.008167,True,13,10,Koda,doggo,
7,2017-07-25 00:10:02+00:00,26144,This is Ted. He does his best. Sometimes that'...,http://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg,t.co/f8dEDcrKSR,False,4251,8.896388e+17,1,French_bulldog,0.991650,True,boxer,0.002129,True,Staffordshire_bullterrier,0.001498,True,12,10,Ted,doggo,
8,2017-07-24 00:19:32+00:00,24343,This is Oliver. You're witnessing one of his m...,http://pbs.twimg.com/ext_tw_video_thumb/889278...,t.co/WpHvrQedPb,False,5063,8.892788e+17,1,whippet,0.626152,True,borzoi,0.194742,True,Saluki,0.027351,True,13,10,Oliver,doggo,
9,2017-07-23 00:22:39+00:00,28106,This is Jim. He found a fren. Taught him how t...,http://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg,t.co/chxruIOUJN,False,4246,8.889172e+17,1,golden_retriever,0.714719,True,Tibetan_mastiff,0.120184,True,Labrador_retriever,0.105506,True,12,10,Jim,doggo,


The cells below check whether we can safely copy one of the melt columns into the other and drop the redundant one:
1. Check the unique values
2. Query those values for manual assessment
3. Copy the `variable` column into the `dog_category` column
4. Drop the `variable` column (in place)
5. Reassess
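
One caveat about step 3: copying `variable` over `dog_category` tags every row with the column it was melted from, whether or not the tweet actually used that word. As a minimal sketch of an alternative (not the approach used in this notebook), the melted rows can be filtered to those where a stage was really mentioned. The toy frame and its values are made up for illustration; only the `'None'` placeholder and the four stage names come from the data here.

```python
import pandas as pd

# Toy data: tweet ids and values are invented for illustration
toy = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "doggo":   ["doggo", "None", "None"],
    "floofer": ["None", "None", "None"],
    "pupper":  ["None", "pupper", "None"],
    "puppo":   ["None", "None", "None"],
})
stage_cols = ["doggo", "floofer", "pupper", "puppo"]

# Melt the four stage columns into (variable, dog_category) pairs
long = toy.melt(id_vars="tweet_id", value_vars=stage_cols,
                value_name="dog_category")

# Keep only the rows where a stage was actually mentioned
mentioned = long[long["dog_category"] != "None"][["tweet_id", "dog_category"]]

# Re-attach to the tweet list so tweets without a stage keep the 'None' placeholder
tidy = (toy[["tweet_id"]]
        .merge(mentioned, on="tweet_id", how="left")
        .fillna({"dog_category": "None"}))
print(tidy)
```

A tweet that mentions two stages would produce two rows here, so a real dataset would still need a rule for those cases.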

In [148]:
melt.dog_category.unique()

array(['None', 'doggo', 'floofer', 'pupper', 'puppo'], dtype=object)

In [149]:
melt.query("dog_category in ['doggo', 'floofer', 'pupper', 'puppo']")

Unnamed: 0,created_at,favorite_counts,full_text,picture,post_link,retweeted,retweets,tweet_id,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog,rating_numerator,rating_denominator,name,variable,dog_category
58,2017-06-09 00:02:31+00:00,26443,Here's a very large dog. He has a date later. ...,http://pbs.twimg.com/media/DB1m871XkAUbCkY.jpg,t.co/EMYIdoblMR,False,5158,8.729671e+17,2,Labrador_retriever,0.476913,True,Chesapeake_Bay_retriever,0.174145,True,German_short-haired_pointer,0.092861,True,12,10,,doggo,doggo
89,2017-05-01 00:40:27+00:00,15403,I have stumbled puppon a doggo painting party....,http://pbs.twimg.com/media/C-s5oYZXkAAMHHq.jpg,t.co/cUeDMlHJbq,False,3438,8.588435e+17,1,golden_retriever,0.578120,True,Labrador_retriever,0.286059,True,bloodhound,0.026917,True,13,10,,doggo,doggo
99,2017-04-22 18:31:02+00:00,45445,Here's a puppo participating in the #ScienceMa...,http://pbs.twimg.com/media/C-CYWrvWAAU8AXH.jpg,t.co/cMhq16isel,False,17833,8.558515e+17,1,flat-coated_retriever,0.321676,True,Labrador_retriever,0.115138,True,groenendael,0.096100,True,13,10,,doggo,doggo
104,2017-04-17 16:34:26+00:00,16239,"At first I thought this was a shy doggo, but i...",http://pbs.twimg.com/media/C9oNt91WAAAFSLS.jpg,t.co/TXdT3tmuYk,False,3168,8.540102e+17,1,English_springer,0.354733,True,collie,0.177538,True,Border_collie,0.131706,True,11,10,,doggo,doggo
124,2017-03-24 22:08:59+00:00,7792,Say hello to Mimosa. She's an emotional suppor...,http://pbs.twimg.com/media/C7t0IzLWkAINoft.jpg,t.co/L6mLzrd7Mx,False,1850,8.453971e+17,1,Dandie_Dinmont,0.394404,True,Maltese_dog,0.186537,True,West_Highland_white_terrier,0.181985,True,13,10,Mimosa,doggo,doggo
153,2017-02-16 17:00:25+00:00,11690,Say hello to Smiley. He's a blind therapy dogg...,http://pbs.twimg.com/ext_tw_video_thumb/832273...,t.co/SHAb1wHjMz,False,2431,8.322734e+17,1,Pembroke,0.134081,True,ice_bear,0.051928,False,pug,0.044311,True,14,10,Smiley,doggo,doggo
163,2017-02-08 22:00:52+00:00,10835,Here's a stressed doggo. Had a long day. Many ...,http://pbs.twimg.com/media/C4LMUf8WYAkWz4I.jpg,t.co/fmRS43mWQB,False,2113,8.294499e+17,1,Labrador_retriever,0.315163,True,golden_retriever,0.153210,True,Pekinese,0.132791,True,11,10,,doggo,doggo
171,2017-02-01 17:44:55+00:00,37787,This is Cupid. He was found in the trash. Now ...,http://pbs.twimg.com/media/C3mOnZ8WMAAQXRY.jpg,t.co/WS0Gha8vRh,False,10680,8.268488e+17,4,Great_Pyrenees,0.858764,True,golden_retriever,0.023526,True,Pekinese,0.017104,True,13,10,Cupid,doggo,doggo
189,2017-01-18 17:07:18+00:00,8764,This is Duchess. She uses dark doggo forces to...,http://pbs.twimg.com/media/C2d_vnHWEAE9phX.jpg,t.co/maDNMETA52,False,1737,8.217659e+17,1,golden_retriever,0.980071,True,Labrador_retriever,0.008758,True,Saluki,0.001806,True,13,10,Duchess,doggo,doggo
194,2017-01-13 15:08:56+00:00,13447,Here we have a doggo who has messed up. He was...,http://pbs.twimg.com/ext_tw_video_thumb/819924...,t.co/XdRNXNYD4E,False,5125,8.199242e+17,1,bathtub,0.100896,False,shower_curtain,0.091866,False,tub,0.049176,False,11,10,,doggo,doggo


In [150]:
melt.dog_category = melt.variable

In [151]:
melt.drop(columns='variable', inplace=True)

In [152]:
melt.sample(50)

Unnamed: 0,created_at,favorite_counts,full_text,picture,post_link,retweeted,retweets,tweet_id,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog,rating_numerator,rating_denominator,name,dog_category
972,2015-12-15 04:19:18+00:00,2939,I promise this wasn't meant to be a cuteness o...,http://pbs.twimg.com/media/CWPUB9TWwAALPPx.jpg,t.co/mpQl2rJjDh,False,993,6.766175e+17,1,Chihuahua,0.841084,True,Pomeranian,0.12053,True,Pekinese,0.0066,True,13,10,,doggo
1439,2017-04-24 15:13:52+00:00,11796,"THIS IS CHARLIE, MARK. HE DID JUST WANT TO SAY...",http://pbs.twimg.com/media/C-L-aIYXgAIR0jY.jpg,t.co/p1hBHCmWnA,False,1864,8.565266e+17,1,Old_English_sheepdog,0.798481,True,Tibetan_terrier,0.060602,True,standard_poodle,0.040722,True,14,10,,floofer
4840,2016-01-22 03:24:22+00:00,3346,This is Phred. He's an Albanian Flepperkush. T...,http://pbs.twimg.com/media/CZSz3vWXEAACElU.jpg,t.co/VpfFCKE28C,False,872,6.903744e+17,1,kuvasz,0.286345,True,Labrador_retriever,0.107144,True,ice_bear,0.085086,False,11,10,Phred,puppo
2722,2017-06-27 00:10:17+00:00,22657,This is Bailey. He thinks you should measure e...,http://pbs.twimg.com/media/DDSVWMvXsAEgmMK.jpg,t.co/IxM9IMKQq8,False,3013,8.79492e+17,1,German_short-haired_pointer,0.479896,True,vizsla,0.124353,True,bath_towel,0.07332,False,12,10,Bailey,pupper
5162,2015-11-30 03:06:07+00:00,1652,Pack of horned dogs here. Very team-oriented b...,http://pbs.twimg.com/media/CVBzbWsWsAEyNMA.jpg,t.co/U7DQQdZ0mX,False,1098,6.711633e+17,1,African_hunting_dog,0.733025,False,plow,0.119377,False,Scottish_deerhound,0.026983,True,8,10,,puppo
2006,2016-03-14 02:04:08+00:00,2485,"From left to right:\nCletus, Jerome, Alejandro...",http://pbs.twimg.com/media/CdeUKpcWoAAJAWJ.jpg,t.co/sedre1ivTK,False,662,7.091984e+17,1,borzoi,0.490783,True,wire-haired_fox_terrier,0.083513,True,English_setter,0.083184,True,45,50,,floofer
1071,2015-12-06 01:56:44+00:00,13417,This is Frankie. He's wearing blush. 11/10 rea...,http://pbs.twimg.com/media/CVgdFjOWwAAa1PP.jpg,t.co/iJABMhVidf,False,7891,6.733201e+17,3,Samoyed,0.978833,True,Pomeranian,0.012763,True,Eskimo_dog,0.001853,True,11,10,Frankie,doggo
3682,2015-12-12 02:23:01+00:00,17273,"I shall call him squishy and he shall be mine,...",http://pbs.twimg.com/media/CV_cnjHWUAADc-c.jpg,t.co/WId5lxNdPH,False,5898,6.755011e+17,1,dough,0.806757,False,bakery,0.027907,False,French_loaf,0.018189,False,13,10,,pupper
1706,2016-09-19 01:42:24+00:00,11729,"""Yep... just as I suspected. You're not flossi...",http://pbs.twimg.com/media/CsrjryzWgAAZY00.jpg,t.co/SuXcI9B7pQ,False,3093,7.776842e+17,1,cocker_spaniel,0.253442,True,golden_retriever,0.16285,True,otterhound,0.110921,True,12,10,,floofer
1830,2016-07-04 22:00:12+00:00,2245,This is Spanky. He was a member of the 2002 US...,http://pbs.twimg.com/media/Cmf5WLGWYAAcmRw.jpg,t.co/7tlZPrePXd,False,562,7.500868e+17,1,pug,0.978277,True,teddy,0.003134,False,Brabancon_griffon,0.003061,True,12,10,Spanky,floofer


Now it is time to fix some of the other problems mentioned earlier. I have been taught to tidy the data first and then clean up any messy data.

In [153]:
# Tidy the breed names: underscores instead of hyphens, all lowercase
p_columns = ['p1', 'p2', 'p3']
for pvar in p_columns:
    melt[pvar] = melt[pvar].str.replace('-', '_').str.lower()

melt.sample(150)

Unnamed: 0,created_at,favorite_counts,full_text,picture,post_link,retweeted,retweets,tweet_id,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog,rating_numerator,rating_denominator,name,dog_category
322,2016-10-14 16:13:10+00:00,27935,This is Rory. He's got an interview in a few m...,http://pbs.twimg.com/media/Cuvau3MW8AAxaRv.jpg,t.co/ibj5g6xaAj,False,8443,7.869631e+17,1,golden_retriever,0.915303,True,saluki,0.046213,True,labrador_retriever,3.750410e-02,True,12,10,Rory,doggo
557,2016-05-29 01:49:16+00:00,4399,This is Chadrick. He's gnarly af 13/10,http://pbs.twimg.com/media/CjlpmZaUgAED54W.jpg,t.co/447tyBN0mW,False,1789,7.367361e+17,1,schipperke,0.545502,True,groenendael,0.298622,True,labrador_retriever,3.098640e-02,True,13,10,Chadrick,doggo
1429,2017-05-03 03:17:27+00:00,18389,Sorry for the lack of posts today. I came home...,http://pbs.twimg.com/media/C-3wvtxXcAUTuBE.jpg,t.co/BArWupFAn0,False,1542,8.596078e+17,1,golden_retriever,0.895529,True,irish_setter,0.024099,True,labrador_retriever,1.928540e-02,True,13,10,,floofer
528,2016-06-15 22:36:19+00:00,3956,"Meet Kayla, an underground poker legend. Playe...",http://pbs.twimg.com/media/ClBqDuDWkAALK2e.jpg,t.co/EkLku795aO,False,1416,7.432106e+17,1,golden_retriever,0.930705,True,chesapeake_bay_retriever,0.025934,True,labrador_retriever,7.535360e-03,True,10,10,Kayla,doggo
2679,2015-11-16 01:01:59+00:00,108,Here is the Rand Paul of retrievers folks! He'...,http://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg,t.co/pYAJkAe76p,False,56,6.660586e+17,1,miniature_poodle,0.201493,True,komondor,0.192305,True,soft_coated_wheaten_terrier,8.208610e-02,True,8,10,the,floofer
4365,2016-10-03 01:00:34+00:00,7827,This is Deacon. He's the happiest almost dry d...,http://pbs.twimg.com/media/CtzgXgeXYAA1Gxw.jpg,t.co/C6fUMnHt1H,False,1479,7.827471e+17,1,golden_retriever,0.560699,True,otterhound,0.199482,True,clumber,4.068180e-02,True,11,10,Deacon,puppo
2452,2015-12-02 02:13:48+00:00,1241,This is Mia. She was specifically told not get...,http://pbs.twimg.com/media/CVL6op1WEAAUFE7.jpg,t.co/3J7wkwW4FG,False,551,6.718749e+17,1,china_cabinet,0.996031,False,entertainment_center,0.001986,False,bookcase,1.651810e-03,False,10,10,Mia,floofer
3677,2015-12-12 15:59:51+00:00,648,This is a Sizzlin Menorah spaniel from Brookly...,http://pbs.twimg.com/media/CWCXj35VEAIFvtk.jpg,t.co/7E0AiJXPmI,False,99,6.757066e+17,1,english_springer,0.990300,True,welsh_springer_spaniel,0.002080,True,cocker_spaniel,2.013780e-03,True,10,10,a,pupper
2856,2017-01-31 01:27:39+00:00,13787,We only rate dogs. Please don't send in any mo...,http://pbs.twimg.com/media/C3dlVMbXAAUd-Gh.jpg,t.co/srXL2s868C,False,2716,8.262405e+17,1,french_bulldog,0.903048,True,pug,0.096242,True,boston_bull,2.343640e-04,True,11,10,,pupper
5084,2015-12-07 03:07:12+00:00,1438,Large blue dog here. Cool shades. Flipping us ...,http://pbs.twimg.com/media/CVl2ydUWsAA1jD6.jpg,t.co/mcPd5AFfhA,False,565,6.737003e+17,1,water_bottle,0.614536,False,ashcan,0.050911,False,bucket,3.743190e-02,False,3,10,,puppo


The melted table looks good, so it becomes the new master DataFrame for further cleaning.
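
Two pandas details matter for the hand-off and de-duplication that follow: plain assignment only creates a second name for the same DataFrame, and `drop_duplicates` keeps the first row it sees for each key by default. The sketch below shows both on a toy frame; the data is made up and `toy_melt` is a hypothetical stand-in for the melted table.

```python
import pandas as pd

# Toy stand-in for the melted table (values are invented)
toy_melt = pd.DataFrame({
    "tweet_id": [10, 11, 10, 11],
    "dog_category": ["doggo", "doggo", "pupper", "pupper"],
})

alias = toy_melt               # same object, no data is copied
independent = toy_melt.copy()  # separate object with its own data

# keep='first' is the default: the first row seen for each tweet_id survives
independent.drop_duplicates(subset="tweet_id", inplace=True)

print(alias is toy_melt)                 # True
print(len(toy_melt), len(independent))   # 4 2
print(independent["dog_category"].tolist())
```

Because the first melted block wins, every surviving row in this toy example carries the label of the first stage column.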

In [154]:
master_df = melt.copy()  # copy so later in-place edits leave melt untouched

In [155]:
master_df.drop_duplicates(subset='tweet_id', inplace=True)
master_df.sort_values('favorite_counts', ascending=False)

Unnamed: 0,created_at,favorite_counts,full_text,picture,post_link,retweeted,retweets,tweet_id,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog,rating_numerator,rating_denominator,name,dog_category
246,2016-12-09 06:17:20+00:00,124553,This is Stephan. He just wants to help. 13/10 ...,http://pbs.twimg.com/ext_tw_video_thumb/807106...,t.co/DkBYaCAg2d,False,58901,8.071068e+17,1,chihuahua,0.505370,True,pomeranian,0.120358,True,toy_terrier,0.077008,True,13,10,Stephan,doggo
198,2017-01-11 02:15:36+00:00,90604,This is Bo. He was a very good First Doggo. 14...,http://pbs.twimg.com/media/C12whDoVEAALRxa.jpg,t.co/AdPKrI8BZ1,False,38761,8.190048e+17,1,standard_poodle,0.351308,True,toy_poodle,0.271929,True,tibetan_terrier,0.094759,True,14,10,Bo,doggo
88,2017-05-02 00:04:57+00:00,89061,We only rate dogs. This is quite clearly a smo...,http://pbs.twimg.com/ext_tw_video_thumb/859196...,t.co/g2nSyGenG9,False,29693,8.591970e+17,1,angora,0.224218,False,malamute,0.216163,True,persian_cat,0.128383,False,12,10,quite,doggo
944,2015-12-20 02:20:55+00:00,80817,This made my day. 12/10 please enjoy,http://pbs.twimg.com/ext_tw_video_thumb/678399...,t.co/VRTbo3aAcm,False,32605,6.783997e+17,1,swing,0.929196,False,bedlington_terrier,0.015047,True,great_pyrenees,0.014039,True,12,10,,doggo
63,2017-06-01 20:18:38+00:00,80570,This is Zoey. She really likes the planet. Wou...,http://pbs.twimg.com/media/DBQwlFCXkAACSkI.jpg,t.co/T1xlgaPujm,False,25363,8.703740e+17,1,golden_retriever,0.841001,True,great_pyrenees,0.099278,True,labrador_retriever,0.032621,True,13,10,Zoey,doggo
43,2017-06-22 03:54:17+00:00,76918,This is Aja. She was just told she's a good do...,http://pbs.twimg.com/media/DC5YqoPXgAA7Uph.jpg,t.co/lsPyyAiF1r,False,18048,8.777365e+17,2,chesapeake_bay_retriever,0.837956,True,labrador_retriever,0.062034,True,weimaraner,0.040599,True,13,10,Aja,doggo
979,2015-12-14 01:58:31+00:00,76421,This is Kenneth. He's stuck in a bubble. 10/10...,http://pbs.twimg.com/media/CWJqN9iWwAAg86R.jpg,t.co/uQt37xlYMJ,False,31481,6.762197e+17,1,bubble,0.997556,False,leafhopper,0.000159,False,whippet,0.000132,True,10,10,Kenneth,doggo
80,2017-05-10 00:08:34+00:00,73365,We only rate dogs. Please don't send perfectly...,http://pbs.twimg.com/media/C_bIo8MXkAA3xBK.jpg,t.co/nvZyyrp0kd,False,22694,8.620970e+17,2,chow,0.677589,True,pomeranian,0.270648,True,pekinese,0.038110,True,13,10,,doggo
159,2017-02-12 01:04:29+00:00,69209,This is Lilly. She just parallel barked. Kindl...,http://pbs.twimg.com/media/C4bTH6nWMAAX_bJ.jpg,t.co/SATN4If5H5,False,17521,8.305833e+17,1,labrador_retriever,0.908703,True,seat_belt,0.057091,False,pug,0.011933,True,13,10,Lilly,doggo
4,2017-07-28 00:22:40+00:00,62928,When you watch your owner call another dog a g...,http://pbs.twimg.com/media/DFyBag_UQAAhhBC.jpg,t.co/v0nONBcwxq,False,17831,8.907292e+17,2,pomeranian,0.566142,True,eskimo_dog,0.178406,True,pembroke,0.076507,True,13,10,,doggo


## Step 5 - Analyze the Data
Time for the good stuff! Our first analysis takes the dog type predictions with more than 80% confidence and checks whether people are biased toward any one type of dog.
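
As a rough sketch of that idea on toy data (the values are made up; the column names mirror `master_df`), the filter and the per-breed total can be written as:

```python
import pandas as pd

# Toy data: values are invented, column names follow master_df
toy = pd.DataFrame({
    "p1": ["pug", "pug", "samoyed", "bubble", "samoyed"],
    "p1_conf": [0.95, 0.60, 0.90, 0.99, 0.85],
    "p1_dog": [True, True, True, False, True],
    "favorite_counts": [100, 500, 300, 900, 50],
})

# Keep predictions above 80% confidence that are actually dogs
confident_dogs = toy[(toy["p1_conf"] > 0.8) & toy["p1_dog"]]

# Selecting the column before summing avoids aggregating unrelated columns
likes_per_breed = (confident_dogs.groupby("p1")["favorite_counts"]
                   .sum()
                   .sort_values(ascending=False))
print(likes_per_breed)
```

Note that a raw total favors breeds that simply appear more often, which is why a per-post figure is also worth computing.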

In [156]:
# Build a DataFrame that consists of just the high-confidence predictions
confident_predictions = master_df.query("p1_conf > .8")

# Find the total number of likes per dog type
favorite_dogs = (confident_predictions.query("p1_dog == True")
                 .groupby('p1')[['favorite_counts']].sum()
                 .sort_values('favorite_counts', ascending=False))
favorite_dogs

p1,favorite_counts
golden_retriever,635519
pembroke,357581
labrador_retriever,295286
french_bulldog,190984
chesapeake_bay_retriever,125052
samoyed,122784
pug,74074
basset,73758
pomeranian,65932
chihuahua,63981


In [157]:
# Get the number of posts per dog type
number_of_dog_posts = (confident_predictions.query("p1_dog == True")
                       .groupby('p1').size()
                       .to_frame('number_of_posts'))

In [None]:
most_loved_dogs = favorite_dogs.join(number_of_dog_posts)
most_loved_dogs['hearts_per_post'] = (most_loved_dogs.favorite_counts / most_loved_dogs.number_of_posts).astype(int)
most_loved_dogs.sort_values('hearts_per_post', ascending=False, inplace=True)
most_loved_dogs
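
The total and the post count can also be computed in a single pass with named aggregation, which avoids the separate join. This is only a sketch on made-up data, not the notebook's actual result:

```python
import pandas as pd

# Toy data: values are invented for illustration
toy = pd.DataFrame({
    "p1": ["pug", "pug", "samoyed"],
    "favorite_counts": [100, 300, 900],
})

# One groupby produces both the total likes and the number of posts
summary = toy.groupby("p1").agg(
    favorite_counts=("favorite_counts", "sum"),
    number_of_posts=("favorite_counts", "count"),
)

# Integer division mirrors the .astype(int) truncation used above
summary["hearts_per_post"] = summary["favorite_counts"] // summary["number_of_posts"]
summary = summary.sort_values("hearts_per_post", ascending=False)
print(summary)
```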

In [None]:
#Build a bar chart illustrating which dog types had the most likes/post
most_loved_dogs['hearts_per_post'].sort_values().plot(kind='barh', figsize=(10,30));
plt.title('Hearts Per Post');
plt.ylabel('Dog Type');