# Project: Wrangling and Analyze Data

## Data Gathering
In the cell below, gather **all** three pieces of data for this project and load them in the notebook. **Note:** the methods required to gather each data are different.
1. Directly download the WeRateDogs Twitter archive data (twitter_archive_enhanced.csv)

In [8]:
import tweepy
import pandas as pd
import numpy as np
import requests
import re
import json

df_1 = pd.read_csv('twitter_archive_enhanced.csv')
df_1.sample(25)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
289,838201503651401729,,,2017-03-05 01:36:26 +0000,"<a href=""http://twitter.com/download/iphone"" r...",RT @dog_rates: Meet Sunny. He can take down a ...,8.207497e+17,4196984000.0,2017-01-15 21:49:15 +0000,https://twitter.com/dog_rates/status/820749716...,13,10,Sunny,,,,
1457,695095422348574720,,,2016-02-04 04:03:57 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is just a beautiful pupper good shit evol...,,,,https://twitter.com/dog_rates/status/695095422...,12,10,just,,,pupper,
1270,709449600415961088,,,2016-03-14 18:42:20 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Karma. She's just a head. Lost body durin...,,,,https://twitter.com/dog_rates/status/709449600...,10,10,Karma,,,,
654,791821351946420224,,,2016-10-28 01:58:16 +0000,"<a href=""http://twitter.com/download/iphone"" r...",RT @dog_rates: This little fella really hates ...,6.84831e+17,4196984000.0,2016-01-06 20:16:44 +0000,"https://vine.co/v/eEZXZI1rqxX,https://vine.co/...",13,10,,,,pupper,
1286,708400866336894977,,,2016-03-11 21:15:02 +0000,"<a href=""http://vine.co"" rel=""nofollow"">Vine -...",RT if you are as ready for summer as this pup ...,,,,https://vine.co/v/iHFqnjKVbIQ,12,10,,,,,
346,831926988323639298,8.31903e+17,20683724.0,2017-02-15 18:03:45 +0000,"<a href=""http://twitter.com/download/iphone"" r...",@UNC can confirm 12/10,,,,,12,10,,,,,
1610,685532292383666176,,,2016-01-08 18:43:29 +0000,"<a href=""http://twitter.com/download/iphone"" r...","For the last time, WE. DO. NOT. RATE. BULBASAU...",,,,https://twitter.com/dog_rates/status/685532292...,9,10,,,,,
595,798701998996647937,,,2016-11-16 01:39:30 +0000,"<a href=""http://twitter.com/download/iphone"" r...",RT @dog_rates: We normally don't rate marshmal...,7.186315e+17,4196984000.0,2016-04-09 02:47:55 +0000,https://twitter.com/dog_rates/status/718631497...,10,10,,,,,
1997,672591271085670400,,,2015-12-04 01:40:29 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Lots of pups here. All are Judea Hazelnuts. Ex...,,,,https://twitter.com/dog_rates/status/672591271...,8,10,,,,,
844,766693177336135680,,,2016-08-19 17:47:52 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Brudge. He's a Doberdog. Going to be h...,,,,https://twitter.com/dog_rates/status/766693177...,11,10,Brudge,,,,


2. Use the Requests library to download the tweet image prediction (image_predictions.tsv)

In [7]:
# Use requests library to download tsv file from a website
import requests
url="https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv"
response = requests.get(url)

with open('image_predictions.tsv', 'wb') as file:
    file.write(response.content)

   #get tsv into pandas DF form called image_pred 
image_pred = pd.read_csv('image_predictions.tsv', sep='\t')
image_pred.head()

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
0,666020888022790149,https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg,1,Welsh_springer_spaniel,0.465074,True,collie,0.156665,True,Shetland_sheepdog,0.061428,True
1,666029285002620928,https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg,1,redbone,0.506826,True,miniature_pinscher,0.074192,True,Rhodesian_ridgeback,0.07201,True
2,666033412701032449,https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg,1,German_shepherd,0.596461,True,malinois,0.138584,True,bloodhound,0.116197,True
3,666044226329800704,https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg,1,Rhodesian_ridgeback,0.408143,True,redbone,0.360687,True,miniature_pinscher,0.222752,True
4,666049248165822465,https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg,1,miniature_pinscher,0.560311,True,Rottweiler,0.243682,True,Doberman,0.154629,True


3. Use the Tweepy library to query additional data via the Twitter API (tweet_json.txt)

In [3]:
# I have replaced Personal API keys, secrets, and tokens with placeholders
access_token = 'MY ACCESS TOKEN'
access_secret = 'MY ACCESS SECRET'
consumer_key = 'MY CONSUMER KEY'
consumer_secret = 'MY CONSUMER SECRET'

# I have created variables or tweepy query
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit = True)


# For loop made will add each available tweet to a new line of tweet_json.txt
with open('tweet_json.txt', 'a', encoding='utf8') as f:
    for tweet_id in df_1['tweet_id']:
        try:
            tweet = api.get_status(tweet_id, tweet_mode='extended')
            json.dump(tweet._json, f)
            f.write('\n')
        except:
            continue

In [9]:
# For loop to append each tweet into a list
tweets_data = []

tweet_file = open('tweet_json.txt', "r")

for line in tweet_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue
        
tweet_file.close()

# Create tweet_info DataFrame
df_tweet = pd.DataFrame()

# add in variables we want to df
df_tweet['id'] = list(map(lambda tweet: tweet['id'], tweets_data))
df_tweet['retweet_count'] = list(map(lambda tweet: tweet['retweet_count'], tweets_data))
df_tweet['favorite_count'] = list(map(lambda tweet: tweet['favorite_count'], tweets_data))
df_tweet.head(4)

Unnamed: 0,id,retweet_count,favorite_count


## Assessing Data
In this section, detect and document at least **eight (8) quality issues and two (2) tidiness issue**. You must use **both** visual assessment
programmatic assessement to assess the data.

**Note:** pay attention to the following key points when you access the data.

* You only want original ratings (no retweets) that have images. Though there are 5000+ tweets in the dataset, not all are dog ratings and some are retweets.
* Assessing and cleaning the entire dataset completely would require a lot of time, and is not necessary to practice and demonstrate your skills in data wrangling. Therefore, the requirements of this project are only to assess and clean at least 8 quality issues and at least 2 tidiness issues in this dataset.
* The fact that the rating numerators are greater than the denominators does not need to be cleaned. This [unique rating system](http://knowyourmeme.com/memes/theyre-good-dogs-brent) is a big part of the popularity of WeRateDogs.
* You do not need to gather the tweets beyond August 1st, 2017. You can, but note that you won't be able to gather the image predictions for these tweets since you don't have access to the algorithm used.



### Quality issues
1. We want to remove retweets as we only want original ratings - retweets may skew our results 

2. Many dog names have None - need to replace this with NaN

3. The ID fields e.g. tweet_id, should be objects, not integers or floats because they are not numeric and aren't intended to perform calculations.

4. Dog Names has many names that are not names, many are lowercase eg 'a' and 'the' and need to be removed

5. We want to remove any reply tweets as we are only interested in orginal tweets

6. The time stamp column is in string format, it's the wrong data type - convert to date time format

7. Some tweets have rating denominators that do not equal 10 - these need to be removed

8. Tweet with ID 810984652412424192 does not have a rating and therefore needs to be removed


### Tidiness issues
1. The 'type' of dog (doggo / floofer / pupper / puppo) are all variables that have their own column. This needs to be fixed under a single variable / column named 'stage'.

2. the data sets need merging - combine tweet_df and image_predictions to twitter_archive table (df_1).



## Cleaning Data
In this section, clean **all** of the issues you documented while assessing. 

**Note:** Make a copy of the original data before cleaning. Cleaning includes merging individual pieces of data according to the rules of [tidy data](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html). The result should be a high-quality and tidy master pandas DataFrame (or DataFrames, if appropriate).

In [10]:
# Make copies of original pieces of data
df_1_clean = df_1.copy()
image_pred_clean = image_pred.copy()
df_tweet_clean = df_tweet.copy()

#### Issue #1 tidy:

#### Define  The 'type' of dog (doggo / floofer / pupper / puppo) are all variables that have their own column. This needs to be fixed under a single variable / column named 'stage'.

#### Code

In [11]:
df_1_clean = df_1_clean
df_1_clean.doggo.replace('None', '', inplace=True)
df_1_clean.floofer.replace('None', '', inplace=True)
df_1_clean.pupper.replace('None', '', inplace=True)
df_1_clean.puppo.replace('None', '', inplace=True)

df_1_clean['stage'] = df_1_clean.doggo + df_1_clean.floofer + df_1_clean.pupper + df_1_clean.puppo
df_1_clean.loc[df_1_clean.stage == 'doggopupper', 'stage'] = 'doggo,pupper'
df_1_clean.loc[df_1_clean.stage == 'doggofloofer', 'stage'] = 'doggo,floofer'
df_1_clean.loc[df_1_clean.stage == 'doggopuppo', 'stage'] = 'doggo,puppo'
df_1_clean.loc[df_1_clean.stage == 'doggopupper', 'stage'] = 'doggo,pupper'

df_1_clean.drop(['doggo', 'floofer', 'pupper', 'puppo'],  axis=1)


Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,stage
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2351,666049248165822465,,,2015-11-16 00:24:50 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a 1949 1st generation vulpix. Enj...,,,,https://twitter.com/dog_rates/status/666049248...,5,10,,
2352,666044226329800704,,,2015-11-16 00:04:52 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a purebred Piers Morgan. Loves to Netf...,,,,https://twitter.com/dog_rates/status/666044226...,6,10,a,
2353,666033412701032449,,,2015-11-15 23:21:54 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a very happy pup. Big fan of well-main...,,,,https://twitter.com/dog_rates/status/666033412...,9,10,a,
2354,666029285002620928,,,2015-11-15 23:05:30 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a western brown Mitsubishi terrier. Up...,,,,https://twitter.com/dog_rates/status/666029285...,7,10,a,


#### Test

In [12]:
df_1_clean.head(5)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo,stage
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,,


#### Issue #2 tidy:

#### Define  the data sets need merging - combine tweet_df and image_predictions to twitter_archive table.


#### Code

In [13]:
df_1_clean = pd.merge(left=df_1_clean, right=df_tweet_clean, left_on='tweet_id', right_on='id', how='inner')
df_1_clean = df_1_clean.merge(image_pred_clean, on='tweet_id', how='inner')
df_1_clean = df_1_clean.drop('id', axis=1)



#### Test

In [15]:
df_1_clean.info()

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 31 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   in_reply_to_status_id       0 non-null      float64
 1   in_reply_to_user_id         0 non-null      float64
 2   timestamp                   0 non-null      object 
 3   source                      0 non-null      object 
 4   text                        0 non-null      object 
 5   retweeted_status_id         0 non-null      float64
 6   retweeted_status_user_id    0 non-null      float64
 7   retweeted_status_timestamp  0 non-null      object 
 8   expanded_urls               0 non-null      object 
 9   rating_numerator            0 non-null      int64  
 10  rating_denominator          0 non-null      int64  
 11  name                        0 non-null      object 
 12  doggo                       0 non-null      object 
 13  floofer                     0 non-null      object 


### Issue #1:

#### Define: remove retweets - drop all rows which contain rewteets

#### Code

In [16]:
df_1_clean[df_1_clean['retweeted_status_id'].isnull()]


Unnamed: 0,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,...,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog


#### Test

In [17]:
df_1_clean.sample(10)


ValueError: a must be greater than 0 unless no samples are taken

### Issue #2:

#### Define  2.  Many dog names have None - need to replace this with NaN

#### Code

In [34]:
import numpy as np
df_1_clean['name'] = df_1_clean['name'].replace('None', np.NaN)


#### Test

In [35]:
df_1_clean.sample(5)

ValueError: a must be greater than 0 unless no samples are taken

### Issue #3:

#### Define  The ID fields e.g. tweet_id, should be objects, not integers or floats because they are not numeric and aren't intended to perform calculations - so will convert these into object

#### Code

In [100]:
df_1_clean.astype({'tweet_id': 'object', 'in_reply_to_status_id': 'object',  'in_reply_to_user_id': 'object'})

#### Test

In [36]:
df_1_clean.info()

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 31 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   in_reply_to_status_id       0 non-null      float64
 1   in_reply_to_user_id         0 non-null      float64
 2   timestamp                   0 non-null      object 
 3   source                      0 non-null      object 
 4   text                        0 non-null      object 
 5   retweeted_status_id         0 non-null      float64
 6   retweeted_status_user_id    0 non-null      float64
 7   retweeted_status_timestamp  0 non-null      object 
 8   expanded_urls               0 non-null      object 
 9   rating_numerator            0 non-null      int64  
 10  rating_denominator          0 non-null      int64  
 11  name                        0 non-null      object 
 12  doggo                       0 non-null      object 
 13  floofer                     0 non-null      object 


#### Issue #4:

#### Define  Dog Names has many names that are not names, many are lowercase eg 'a' and 'the' and need to be removed


#### Code

In [132]:
#index for lower case names
lower_dog_name_index = df_1_clean[df_1_clean.name.str.islower()].index
#drop names
df_1_clean.drop(lower_dog_name_index, inplace=True)


#### Test

In [133]:
df_1_clean.head()

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,


#### Issue #5:

#### Define  We want to remove any reply tweets as we are only interested in orginal tweets - drop rows with reply tweets

#### Code

In [7]:
df_1_clean = df_1_clean[df_1_clean.in_reply_to_status_id.isna()]


#### Test

In [8]:
print("number of reply tweets:  {}".format(sum(df_1_clean.in_reply_to_status_id.notnull())))


number of reply tweets:  0


#### Issue #6:

#### Define   The time stamp column is in string format, it's the wrong data type - convert to date time format

#### Code

In [11]:
df_1_clean['timestamp'] = pd.to_datetime(df_1_clean.timestamp)
df_1_clean['timestamp'] = df_1_clean.timestamp.dt.floor('s')


#### Test

In [12]:
df_1_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2278 entries, 0 to 2355
Data columns (total 18 columns):
 #   Column                      Non-Null Count  Dtype              
---  ------                      --------------  -----              
 0   tweet_id                    2278 non-null   int64              
 1   in_reply_to_status_id       0 non-null      float64            
 2   in_reply_to_user_id         0 non-null      float64            
 3   timestamp                   2278 non-null   datetime64[ns, UTC]
 4   source                      2278 non-null   object             
 5   text                        2278 non-null   object             
 6   retweeted_status_id         181 non-null    float64            
 7   retweeted_status_user_id    181 non-null    float64            
 8   retweeted_status_timestamp  181 non-null    object             
 9   expanded_urls               2274 non-null   object             
 10  rating_numerator            2278 non-null   int64           

#### Issue #7:

#### Define   Some tweets have rating denominators that do not equal 10 - these need to be removed



#### Code

In [14]:
df_1_clean = df_1_clean[df_1_clean.rating_denominator == 10]


#### Test

In [15]:
df_1_clean.rating_denominator.value_counts().sort_index(ascending = False)


10    2260
Name: rating_denominator, dtype: int64

#### Issue #8

#### Define  Tweet with ID 810984652412424192 does not have a rating and therefore needs to be removed


#### Code

In [16]:
df_1_clean = df_1_clean[df_1_clean.tweet_id != 810984652412424192]


#### Test

In [17]:
df_1_clean[df_1_clean.tweet_id == 810984652412424192]


Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo,stage


## Storing Data
Save gathered, assessed, and cleaned master dataset to a CSV file named "twitter_archive_master.csv".

In [24]:
# Save cleaned DataFrame to csv file
df_1_clean.to_csv('twitter_archive_master.csv')

## Analyzing and Visualizing Data
In this section, analyze and visualize your wrangled data. You must produce at least **three (3) insights and one (1) visualization.**

In [27]:
# create a copy of the clean data to analyse
df_clean = df_1.copy()
df_clean.sample(5)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
878,760656994973933572,,,2016-08-03 02:02:14 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Rose. Her face is stuck like that. 11/...,,,,https://twitter.com/dog_rates/status/760656994...,11,10,Rose,,,,
743,780459368902959104,,,2016-09-26 17:29:48 +0000,"<a href=""http://twitter.com/download/iphone"" r...","This is Bear. Don't worry, he's not a real bea...",,,,https://twitter.com/dog_rates/status/780459368...,11,10,Bear,,,,
552,804413760345620481,,,2016-12-01 19:56:00 +0000,"<a href=""http://twitter.com/download/iphone"" r...",RT @dog_rates: This is Rusty. He's going D1 fo...,7.84826e+17,4196984000.0,2016-10-08 18:41:19 +0000,https://twitter.com/dog_rates/status/784826020...,13,10,Rusty,,,,
2143,669970042633789440,,,2015-11-26 20:04:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Julio. He was one of the original Ring...,,,,https://twitter.com/dog_rates/status/669970042...,10,10,Julio,,,,
1638,684188786104872960,,,2016-01-05 01:44:52 +0000,"<a href=""http://twitter.com/download/iphone"" r...","""Yo Boomer I'm taking a selfie, grab your stic...",,,,https://twitter.com/dog_rates/status/684188786...,10,10,,,,,


### Insights:
1. What is the most common stage for a dog?

2.

3.

#### Insight 1 - What is the most common stage for a dog?

In [29]:
df_clean.stage.value_counts()


AttributeError: 'DataFrame' object has no attribute 'stage'

### Visualization