# Data Wrangling Template

In [17]:
import numpy as np
import pandas as pd
import requests
import os
import tweepy
from tweepy import OAuthHandler
import json
from timeit import default_timer as timer
import matplotlib.pyplot as plt

## Gather

WeRateDogs Twitter archive

- WeRateDogs Twitter archive contains basic tweet data for all 5000+ of their tweets, but not everything. One column the archive does contain though: each tweet's text, which I used to extract rating, dog name, and dog "stage" (i.e. doggo, floofer, pupper, and puppo) to make this Twitter archive "enhanced." Of the 5000+ tweets, I have filtered for tweets with ratings only (there are 2356).

In [401]:
#Import csv file
tae = pd.read_csv('twitter-archive-enhanced.csv')

In [19]:
#Create directory 
folder_name = 'image_predictions'
if not os.path.exists(folder_name):
    os.makedirs(folder_name)

In [20]:
#Download file from url
url='https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'

response = requests.get(url)
response
with open(os.path.join(folder_name, url.split('/')[-1]), mode ='wb') as file:
    file.write(response.content)

In [21]:
#Import tsv file
ip = pd.read_csv('image_predictions/image-predictions.tsv', sep='\t')

Each tweet's retweet count and favorite ("like") count at minimum, and any additional data you find interesting. Using the tweet IDs in the WeRateDogs Twitter archive, query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file called tweet_json.txt file. Each tweet's JSON data should be written to its own line. Then read this .txt file line by line into a pandas DataFrame with (at minimum) tweet ID, retweet count, and favorite count. 


Image Predictions File

- The results: a table full of image predictions (the top three only) alongside each tweet ID, image URL, and the image number that corresponded to the most confident prediction (numbered 1 to 4 since tweets can have up to four images).

- tweet_id is the last part of the tweet URL after "status/" → https://twitter.com/dog_rates/status/889531135344209921
- p1 is the algorithm's #1 prediction for the image in the tweet → golden retriever
- p1_conf is how confident the algorithm is in its #1 prediction → 95%
- p1_dog is whether or not the #1 prediction is a breed of dog → TRUE
- p2 is the algorithm's second most likely prediction → Labrador retriever
- p2_conf is how confident the algorithm is in its #2 prediction → 1%
- p2_dog is whether or not the #2 prediction is a breed of dog → TRUE



## Assess

### Assessment of Twitter Archive Enhanced

In [22]:
tae.head()

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,


In [23]:
tae.sample(5)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
1148,726887082820554753,,,2016-05-01 21:32:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Blitz. He's a new dad struggling to co...,,,,https://twitter.com/dog_rates/status/726887082...,10,10,Blitz,,,,
2078,670832455012716544,,,2015-11-29 05:11:35 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Amy. She is Queen Starburst. 10/10 une...,,,,https://twitter.com/dog_rates/status/670832455...,10,10,Amy,,,,
1383,700847567345688576,,,2016-02-20 01:00:55 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Crouton. He's a Galapagos Boonwiddle. Has...,,,,https://twitter.com/dog_rates/status/700847567...,10,10,Crouton,,,,
1941,673715861853720576,,,2015-12-07 04:09:13 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a heavily opinionated dog. Loves walls...,,,,https://twitter.com/dog_rates/status/673715861...,4,10,a,,,,
491,813800681631023104,,,2016-12-27 17:36:16 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Sky. She's learning how to roll her R'...,,,,https://twitter.com/dog_rates/status/813800681...,12,10,Sky,,,,


In [24]:
tae.describe()

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,retweeted_status_id,retweeted_status_user_id,rating_numerator,rating_denominator
count,2356.0,78.0,78.0,181.0,181.0,2356.0,2356.0
mean,7.427716e+17,7.455079e+17,2.014171e+16,7.7204e+17,1.241698e+16,13.126486,10.455433
std,6.856705e+16,7.582492e+16,1.252797e+17,6.236928e+16,9.599254e+16,45.876648,6.745237
min,6.660209e+17,6.658147e+17,11856340.0,6.661041e+17,783214.0,0.0,0.0
25%,6.783989e+17,6.757419e+17,308637400.0,7.186315e+17,4196984000.0,10.0,10.0
50%,7.196279e+17,7.038708e+17,4196984000.0,7.804657e+17,4196984000.0,11.0,10.0
75%,7.993373e+17,8.257804e+17,4196984000.0,8.203146e+17,4196984000.0,12.0,10.0
max,8.924206e+17,8.862664e+17,8.405479e+17,8.87474e+17,7.874618e+17,1776.0,170.0


In [25]:
tae.info()

# tweet_id string?
# timestamp time
# source change text
# text analysis
# retweeted timestamp time
# name analysis
# doggo, floofer, pupper, puppo --> one column

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   tweet_id                    2356 non-null   int64  
 1   in_reply_to_status_id       78 non-null     float64
 2   in_reply_to_user_id         78 non-null     float64
 3   timestamp                   2356 non-null   object 
 4   source                      2356 non-null   object 
 5   text                        2356 non-null   object 
 6   retweeted_status_id         181 non-null    float64
 7   retweeted_status_user_id    181 non-null    float64
 8   retweeted_status_timestamp  181 non-null    object 
 9   expanded_urls               2297 non-null   object 
 10  rating_numerator            2356 non-null   int64  
 11  rating_denominator          2356 non-null   int64  
 12  name                        2356 non-null   object 
 13  doggo                       2356 

### Access rows, columns and build subsets

In [26]:
# access rows
tae.iloc[1224:1225]

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
1224,714214115368108032,,,2016-03-27 22:14:49 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Happy Easter from the squad! 🐇🐶 13/10 for all ...,,,,https://twitter.com/dog_rates/status/714214115...,13,10,,,,,


In [286]:
# select columns in position 1, 3, 5
tae.iloc[:, [1,3,5]]

Unnamed: 0,in_reply_to_status_id,timestamp,text
0,,2017-08-01 16:23:56 +0000,This is Phineas. He's a mystical boy. Only eve...
1,,2017-08-01 00:17:27 +0000,This is Tilly. She's just checking pup on you....
2,,2017-07-31 00:18:03 +0000,This is Archie. He is a rare Norwegian Pouncin...
3,,2017-07-30 15:58:51 +0000,This is Darla. She commenced a snooze mid meal...
4,,2017-07-29 16:00:24 +0000,This is Franklin. He would like you to stop ca...
...,...,...,...
2351,,2015-11-16 00:24:50 +0000,Here we have a 1949 1st generation vulpix. Enj...
2352,,2015-11-16 00:04:52 +0000,This is a purebred Piers Morgan. Loves to Netf...
2353,,2015-11-15 23:21:54 +0000,Here is a very happy pup. Big fan of well-main...
2354,,2015-11-15 23:05:30 +0000,This is a western brown Mitsubishi terrier. Up...


In [287]:
# select rows that meet logical condition and only specific columns
tae.loc[tae['tweet_id'] > 10, ['timestamp', 'text']]

Unnamed: 0,timestamp,text
0,2017-08-01 16:23:56 +0000,This is Phineas. He's a mystical boy. Only eve...
1,2017-08-01 00:17:27 +0000,This is Tilly. She's just checking pup on you....
2,2017-07-31 00:18:03 +0000,This is Archie. He is a rare Norwegian Pouncin...
3,2017-07-30 15:58:51 +0000,This is Darla. She commenced a snooze mid meal...
4,2017-07-29 16:00:24 +0000,This is Franklin. He would like you to stop ca...
...,...,...
2351,2015-11-16 00:24:50 +0000,Here we have a 1949 1st generation vulpix. Enj...
2352,2015-11-16 00:04:52 +0000,This is a purebred Piers Morgan. Loves to Netf...
2353,2015-11-15 23:21:54 +0000,Here is a very happy pup. Big fan of well-main...
2354,2015-11-15 23:05:30 +0000,This is a western brown Mitsubishi terrier. Up...


In [27]:
# access columns
tae[['tweet_id', 'timestamp']]

Unnamed: 0,tweet_id,timestamp
0,892420643555336193,2017-08-01 16:23:56 +0000
1,892177421306343426,2017-08-01 00:17:27 +0000
2,891815181378084864,2017-07-31 00:18:03 +0000
3,891689557279858688,2017-07-30 15:58:51 +0000
4,891327558926688256,2017-07-29 16:00:24 +0000
...,...,...
2351,666049248165822465,2015-11-16 00:24:50 +0000
2352,666044226329800704,2015-11-16 00:04:52 +0000
2353,666033412701032449,2015-11-15 23:21:54 +0000
2354,666029285002620928,2015-11-15 23:05:30 +0000


In [28]:
# build subset of table
tae[['tweet_id', 'timestamp']].iloc[1224:1228]

Unnamed: 0,tweet_id,timestamp
1224,714214115368108032,2016-03-27 22:14:49 +0000
1225,714141408463036416,2016-03-27 17:25:54 +0000
1226,713919462244790272,2016-03-27 02:43:58 +0000
1227,713909862279876608,2016-03-27 02:05:49 +0000


In [31]:
columns = tae.columns
columns

Index(['tweet_id', 'in_reply_to_status_id', 'in_reply_to_user_id', 'timestamp',
       'source', 'text', 'retweeted_status_id', 'retweeted_status_user_id',
       'retweeted_status_timestamp', 'expanded_urls', 'rating_numerator',
       'rating_denominator', 'name', 'doggo', 'floofer', 'pupper', 'puppo'],
      dtype='object')

In [86]:
# difference between loc (index) and iloc (new "index")?
tae.loc[0:4]

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,time,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,


In [285]:
# select all columns between tweet_id and timestamp
tae.loc[:, 'tweet_id':'timestamp']

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp
0,892420643555336193,,,2017-08-01 16:23:56 +0000
1,892177421306343426,,,2017-08-01 00:17:27 +0000
2,891815181378084864,,,2017-07-31 00:18:03 +0000
3,891689557279858688,,,2017-07-30 15:58:51 +0000
4,891327558926688256,,,2017-07-29 16:00:24 +0000
...,...,...,...,...
2351,666049248165822465,,,2015-11-16 00:24:50 +0000
2352,666044226329800704,,,2015-11-16 00:04:52 +0000
2353,666033412701032449,,,2015-11-15 23:21:54 +0000
2354,666029285002620928,,,2015-11-15 23:05:30 +0000


### Build dataframe

In [29]:
# build new dataframe
data = tae[['tweet_id', 'timestamp']].iloc[1224:1228]

### Query dataframe

In [30]:
# query dataframe
data.query('tweet_id == 714214115368108032')

Unnamed: 0,tweet_id,timestamp
1224,714214115368108032,2016-03-27 22:14:49 +0000


In [32]:
(len(tae)) - (tae.in_reply_to_status_id.isna().sum())

78

In [93]:
tae.query('tweet_id <= 884162670584377345')

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,time,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
43,884162670584377345,,,2017-07-09 21:29:42 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Yogi. He doesn't have any important dog m...,,,,https://twitter.com/dog_rates/status/884162670...,12,10,Yogi,doggo,,,
44,883838122936631299,,,2017-07-09 00:00:04 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Noah. He can't believe someone made th...,,,,https://twitter.com/dog_rates/status/883838122...,12,10,Noah,,,,
45,883482846933004288,,,2017-07-08 00:28:19 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Bella. She hopes her smile made you sm...,,,,https://twitter.com/dog_rates/status/883482846...,5,10,Bella,,,,
46,883360690899218434,,,2017-07-07 16:22:55 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Grizzwald. He may be the floofiest floofe...,,,,https://twitter.com/dog_rates/status/883360690...,13,10,Grizzwald,,floofer,,
47,883117836046086144,,,2017-07-07 00:17:54 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Please only send dogs. We don't rate mechanics...,,,,https://twitter.com/dog_rates/status/883117836...,13,10,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2351,666049248165822465,,,2015-11-16 00:24:50 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a 1949 1st generation vulpix. Enj...,,,,https://twitter.com/dog_rates/status/666049248...,5,10,,,,,
2352,666044226329800704,,,2015-11-16 00:04:52 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a purebred Piers Morgan. Loves to Netf...,,,,https://twitter.com/dog_rates/status/666044226...,6,10,a,,,,
2353,666033412701032449,,,2015-11-15 23:21:54 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a very happy pup. Big fan of well-main...,,,,https://twitter.com/dog_rates/status/666033412...,9,10,a,,,,
2354,666029285002620928,,,2015-11-15 23:05:30 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a western brown Mitsubishi terrier. Up...,,,,https://twitter.com/dog_rates/status/666029285...,7,10,a,,,,


In [100]:
tae.query("name == 'Phineas'")

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,time,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
2104,670668383499735048,,,2015-11-28 18:19:37 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a magical dog. Only appe...,,,,https://twitter.com/dog_rates/status/670668383...,10,10,Phineas,,,,


In [267]:
tae[tae.tweet_id > 890000070584377345]

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,time,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,
5,891087950875897856,,,2017-07-29 00:08:17 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a majestic great white breaching ...,,,,https://twitter.com/dog_rates/status/891087950...,13,10,,,,,
6,890971913173991426,,,2017-07-28 16:27:12 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Jax. He enjoys ice cream so much he gets ...,,,,"https://gofundme.com/ydvmve-surgery-for-jax,ht...",13,10,Jax,,,,
7,890729181411237888,,,2017-07-28 00:22:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",When you watch your owner call another dog a g...,,,,https://twitter.com/dog_rates/status/890729181...,13,10,,,,,
8,890609185150312448,,,2017-07-27 16:25:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Zoey. She doesn't want to be one of th...,,,,https://twitter.com/dog_rates/status/890609185...,13,10,Zoey,,,,
9,890240255349198849,,,2017-07-26 15:59:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Cassie. She is a college pup. Studying...,,,,https://twitter.com/dog_rates/status/890240255...,14,10,Cassie,doggo,,,


### Use loops

#### For loops

In [33]:
long_columns = []

for column in columns:
    length = len(column)
    if length > 10:
        long_columns.append(column)    

In [34]:
long_columns

['in_reply_to_status_id',
 'in_reply_to_user_id',
 'retweeted_status_id',
 'retweeted_status_user_id',
 'retweeted_status_timestamp',
 'expanded_urls',
 'rating_numerator',
 'rating_denominator']

In [35]:
id_columns = []

for column in columns:
    if 'id' in column:
        id_columns.append(column)    

In [36]:
id_columns

['tweet_id',
 'in_reply_to_status_id',
 'in_reply_to_user_id',
 'retweeted_status_id',
 'retweeted_status_user_id']

In [None]:
meet_list = []

for i in range(0, len(tae)):
    if 'meet' in tae.text.iloc[i]:
        meet = tae[['tweet_id', 'text']].iloc[i]
        meet_list.append(meet)

In [38]:
meet_list

[tweet_id                                   886680336477933568
 text        This is Derek. He's late for a dog meeting. 13...
 Name: 28, dtype: object,
 tweet_id                                   884162670584377345
 text        Meet Yogi. He doesn't have any important dog m...
 Name: 43, dtype: object,
 tweet_id                                   779123168116150273
 text        This is Reggie. He hugs everyone he meets. 12/...
 Name: 750, dtype: object,
 tweet_id                                   669353438988365824
 text        This is Tessa. She is also very pleased after ...
 Name: 2169, dtype: object,
 tweet_id                                   667806454573760512
 text        This is Filup. He is overcome with joy after f...
 Name: 2251, dtype: object]

In [64]:
meet_list

[tweet_id                                   886680336477933568
 text        This is Derek. He's late for a dog meeting. 13...
 Name: 28, dtype: object,
 tweet_id                                   884162670584377345
 text        Meet Yogi. He doesn't have any important dog m...
 Name: 43, dtype: object,
 tweet_id                                   779123168116150273
 text        This is Reggie. He hugs everyone he meets. 12/...
 Name: 750, dtype: object,
 tweet_id                                   669353438988365824
 text        This is Tessa. She is also very pleased after ...
 Name: 2169, dtype: object,
 tweet_id                                   667806454573760512
 text        This is Filup. He is overcome with joy after f...
 Name: 2251, dtype: object]

In [67]:
# create dataframe from list (column names need to match)
meet_df = pd.DataFrame(meet_list, columns=['tweet_id', 'text'])

In [39]:
meet_df = pd.DataFrame(meet_list, columns = ['tweet_id', 'text'])
meet_df

Unnamed: 0,tweet_id,text
28,886680336477933568,This is Derek. He's late for a dog meeting. 13...
43,884162670584377345,Meet Yogi. He doesn't have any important dog m...
750,779123168116150273,This is Reggie. He hugs everyone he meets. 12/...
2169,669353438988365824,This is Tessa. She is also very pleased after ...
2251,667806454573760512,This is Filup. He is overcome with joy after f...


In [40]:
for i in range(0, len(tae)):
    if 'meat' in tae.text.iloc[i]:
        print(tae.text.iloc[i])

"So... we meat again" (I'm so sorry for that pun I couldn't resist pls don't unfollow) 10/10 https://t.co/XFBrrqapZa


In [42]:
meat_list = []

for i in range(0, len(tae)):
    if 'meat' in tae.text_meat.iloc[i]:
        meat = tae[['tweet_id', 'text']].iloc[i]
        meat_list.append(meat)

In [43]:
meat_df = pd.DataFrame(meat_list, columns = ['tweet_id', 'text'])
meat_df

Unnamed: 0,tweet_id,text
28,886680336477933568,This is Derek. He's late for a dog meeting. 13...
43,884162670584377345,Meet Yogi. He doesn't have any important dog m...
750,779123168116150273,This is Reggie. He hugs everyone he meets. 12/...
995,748346686624440324,"""So... we meat again"" (I'm so sorry for that p..."
2169,669353438988365824,This is Tessa. She is also very pleased after ...
2251,667806454573760512,This is Filup. He is overcome with joy after f...


In [137]:
list = []

for i in range(0, len(tae)):
    if 'meet' in (tae.text.iloc[i]):
        list.append(tae.text.iloc[i])

list

["This is Derek. He's late for a dog meeting. 13/10 pet...al to the metal https://t.co/BCoWue0abA",
 "Meet Yogi. He doesn't have any important dog meetings today he just enjoys looking his best at all times. 12/10 for dangerously dapper doggo https://t.co/YSI00BzTBZ",
 'This is Reggie. He hugs everyone he meets. 12/10 keep spreading the love Reggie https://t.co/uMfhduaate',
 'This is Tessa. She is also very pleased after finally meeting her biological father. 10/10 https://t.co/qDS1aCqppv',
 'This is Filup. He is overcome with joy after finally meeting his father. 10/10 https://t.co/TBmDJXJB75']

In [133]:
df = pd.DataFrame(list, columns=['text'])
df

Unnamed: 0,text
0,This is Derek. He's late for a dog meeting. 13...
1,Meet Yogi. He doesn't have any important dog m...
2,This is Reggie. He hugs everyone he meets. 12/...
3,This is Tessa. She is also very pleased after ...
4,This is Filup. He is overcome with joy after f...


#### If statements

In [370]:
'9' in tae.tweet_id[0].astype(str)

True

In [387]:
tweet_id_list_8 = []
tweet_id_list_9 = []

for i in range(0, len(tae)):
    if '9' in tae.tweet_id[i].astype(str):
        tweet_id_list_9.append(tae.tweet_id[i])
    elif '8' in tae.tweet_id[i].astype(str):
        tweet_id_list_8.append(tae.tweet_id[i])
    else:
        print('Unrecognizeable')

Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable
Unrecognizeable


In [385]:
len(tweet_id_list_8)

377

In [386]:
len(tweet_id_list_9)

1943

In [56]:
# count words in tae.text - list

In [406]:
text_length = []

for i in range (0, len(tae)):
    x = len(tae.text[i].split(' '))
    text_length.append(x)

In [407]:
text_length

[18,
 26,
 24,
 17,
 27,
 20,
 23,
 26,
 24,
 21,
 20,
 22,
 21,
 18,
 19,
 23,
 20,
 29,
 13,
 19,
 25,
 23,
 19,
 17,
 17,
 23,
 20,
 19,
 15,
 20,
 17,
 17,
 5,
 25,
 19,
 12,
 20,
 21,
 19,
 22,
 14,
 24,
 20,
 25,
 27,
 22,
 26,
 21,
 20,
 20,
 23,
 21,
 27,
 11,
 19,
 15,
 19,
 24,
 13,
 23,
 20,
 19,
 21,
 16,
 3,
 17,
 16,
 24,
 24,
 23,
 18,
 19,
 9,
 23,
 21,
 22,
 22,
 18,
 17,
 24,
 26,
 23,
 21,
 23,
 25,
 22,
 24,
 25,
 17,
 22,
 18,
 25,
 23,
 25,
 27,
 19,
 23,
 23,
 22,
 28,
 22,
 22,
 19,
 25,
 26,
 24,
 22,
 26,
 26,
 24,
 6,
 23,
 28,
 7,
 23,
 25,
 24,
 24,
 25,
 19,
 26,
 26,
 22,
 17,
 22,
 18,
 21,
 21,
 11,
 26,
 10,
 26,
 24,
 11,
 24,
 22,
 23,
 23,
 27,
 22,
 23,
 20,
 24,
 20,
 22,
 16,
 24,
 23,
 24,
 27,
 23,
 21,
 25,
 25,
 17,
 19,
 13,
 23,
 15,
 23,
 8,
 22,
 19,
 26,
 27,
 9,
 27,
 24,
 29,
 24,
 27,
 25,
 26,
 18,
 27,
 27,
 26,
 23,
 24,
 4,
 23,
 17,
 25,
 25,
 25,
 18,
 27,
 22,
 15,
 24,
 20,
 21,
 25,
 25,
 22,
 22,
 21,
 18,
 21,
 28,
 27,
 23

In [456]:
# count words in tae.text - dict
text_length_dict = {}

for i in range (0, len(tae)):
    x = len(tae.text[i].split(' '))   
    new_variable = x
    text_length_dict[i] = new_variable

In [457]:
text_length_dict

{0: 18,
 1: 26,
 2: 24,
 3: 17,
 4: 27,
 5: 20,
 6: 23,
 7: 26,
 8: 24,
 9: 21,
 10: 20,
 11: 22,
 12: 21,
 13: 18,
 14: 19,
 15: 23,
 16: 20,
 17: 29,
 18: 13,
 19: 19,
 20: 25,
 21: 23,
 22: 19,
 23: 17,
 24: 17,
 25: 23,
 26: 20,
 27: 19,
 28: 15,
 29: 20,
 30: 17,
 31: 17,
 32: 5,
 33: 25,
 34: 19,
 35: 12,
 36: 20,
 37: 21,
 38: 19,
 39: 22,
 40: 14,
 41: 24,
 42: 20,
 43: 25,
 44: 27,
 45: 22,
 46: 26,
 47: 21,
 48: 20,
 49: 20,
 50: 23,
 51: 21,
 52: 27,
 53: 11,
 54: 19,
 55: 15,
 56: 19,
 57: 24,
 58: 13,
 59: 23,
 60: 20,
 61: 19,
 62: 21,
 63: 16,
 64: 3,
 65: 17,
 66: 16,
 67: 24,
 68: 24,
 69: 23,
 70: 18,
 71: 19,
 72: 9,
 73: 23,
 74: 21,
 75: 22,
 76: 22,
 77: 18,
 78: 17,
 79: 24,
 80: 26,
 81: 23,
 82: 21,
 83: 23,
 84: 25,
 85: 22,
 86: 24,
 87: 25,
 88: 17,
 89: 22,
 90: 18,
 91: 25,
 92: 23,
 93: 25,
 94: 27,
 95: 19,
 96: 23,
 97: 23,
 98: 22,
 99: 28,
 100: 22,
 101: 22,
 102: 19,
 103: 25,
 104: 26,
 105: 24,
 106: 22,
 107: 26,
 108: 26,
 109: 24,
 110: 6,
 111

In [493]:
# count words in tae.text - dataframe
text_length_df = pd.DataFrame(columns=['number', 'word_count'])

for i in range (0, len(tae)):
    x = len(tae.text[i].split(' '))
    df_append = pd.DataFrame([[i, x]], columns = ['number', 'word_count'])
    text_length_df = text_length_df.append(df_append)

text_length_df

Unnamed: 0,number,word_count
0,0,18
0,1,26
0,2,24
0,3,17
0,4,27
...,...,...
0,2351,19
0,2352,22
0,2353,21
0,2354,22


### Create new columns

In [41]:
# new column needs to be created
tae.text_meat = tae.text.str.replace('meet', 'meat')

  tae.text_meat = tae.text.str.replace('meet', 'meat')


In [53]:
tae.rating_sum = tae.rating_numerator/2

  tae.rating_sum = tae.rating_numerator/2


### Rename columns

In [44]:
# rename columns
meat_df = meat_df.rename(columns={'tweet_id':'tweat_id'})
meat_df

Unnamed: 0,tweat_id,text
28,886680336477933568,This is Derek. He's late for a dog meeting. 13...
43,884162670584377345,Meet Yogi. He doesn't have any important dog m...
750,779123168116150273,This is Reggie. He hugs everyone he meets. 12/...
995,748346686624440324,"""So... we meat again"" (I'm so sorry for that p..."
2169,669353438988365824,This is Tessa. She is also very pleased after ...
2251,667806454573760512,This is Filup. He is overcome with joy after f...


In [72]:
meet_df = meet_df.rename(columns={'tweet_id':'tweat_id', 'text':'test'})
meet_df

In [45]:
# rename columns (same output)
tae.rename(columns={'timestamp':'time'}, inplace=True)
tae = tae.rename(columns={'timestamp':'time'})

### Merge columns

In [54]:
dog_columns = ['doggo', 'floofer', 'pupper', 'puppo']

for column in dog_columns:
    tae[column] = tae[column].str.replace('None','')

In [197]:
new_dog = tae[['doggo', 'floofer', 'pupper', 'puppo']]
new_dog

Unnamed: 0,doggo,floofer,pupper,puppo
0,,,,
1,,,,
2,,,,
3,,,,
4,,,,
...,...,...,...,...
2351,,,,
2352,,,,
2353,,,,
2354,,,,


In [204]:
new_dog['new_column'] = new_dog['doggo'] + new_dog['floofer'] + new_dog['pupper'] + new_dog['puppo']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_dog['new_column'] = new_dog['doggo'] + new_dog['floofer'] + new_dog['pupper'] + new_dog['puppo']


In [208]:
new_dog['new_column'].value_counts()

                1976
pupper           245
doggo             83
puppo             29
doggopupper       12
floofer            9
doggopuppo         1
doggofloofer       1
Name: new_column, dtype: int64

In [217]:
new_dog['new_column'] = new_dog['new_column'].str.replace('doggopupper','undecided')
new_dog['new_column'] = new_dog['new_column'].str.replace('doggopuppo','undecided')
new_dog['new_column'] = new_dog['new_column'].str.replace('doggofloofer','undecided')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_dog['new_column'] = new_dog['new_column'].str.replace('doggopupper','undecided')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_dog['new_column'] = new_dog['new_column'].str.replace('doggopuppo','undecided')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_dog['new_column'] = new_dog['new

In [218]:
new_dog['new_column'].value_counts()

             1976
pupper        245
doggo          83
puppo          29
undecided      14
floofer         9
Name: new_column, dtype: int64

In [57]:
# replace NaN (as string or other method)

### Use index

In [144]:
meet_df

Unnamed: 0,tweat_id,test
28,886680336477933568,This is Derek. He's late for a dog meeting. 13...
43,884162670584377345,Meet Yogi. He doesn't have any important dog m...
750,779123168116150273,This is Reggie. He hugs everyone he meets. 12/...
2169,669353438988365824,This is Tessa. She is also very pleased after ...
2251,667806454573760512,This is Filup. He is overcome with joy after f...


In [139]:
meet_df.index

Int64Index([28, 43, 750, 2169, 2251], dtype='int64')

In [173]:
text_list = []

for i in range (0, len(meet_df.index)):
    x = meet_df.index[i]
    text_list.append(tae.text.iloc[x])

In [177]:
text = pd.DataFrame(text_list, columns=['text'])
text

Unnamed: 0,text
0,This is Derek. He's late for a dog meeting. 13...
1,Meet Yogi. He doesn't have any important dog m...
2,This is Reggie. He hugs everyone he meets. 12/...
3,This is Tessa. She is also very pleased after ...
4,This is Filup. He is overcome with joy after f...


### Create list from dataframe

In [181]:
tweet_id_list = tae.tweet_id.tolist()
tweet_id_list

[892420643555336193,
 892177421306343426,
 891815181378084864,
 891689557279858688,
 891327558926688256,
 891087950875897856,
 890971913173991426,
 890729181411237888,
 890609185150312448,
 890240255349198849,
 890006608113172480,
 889880896479866881,
 889665388333682689,
 889638837579907072,
 889531135344209921,
 889278841981685760,
 888917238123831296,
 888804989199671297,
 888554962724278272,
 888202515573088257,
 888078434458587136,
 887705289381826560,
 887517139158093824,
 887473957103951883,
 887343217045368832,
 887101392804085760,
 886983233522544640,
 886736880519319552,
 886680336477933568,
 886366144734445568,
 886267009285017600,
 886258384151887873,
 886054160059072513,
 885984800019947520,
 885528943205470208,
 885518971528720385,
 885311592912609280,
 885167619883638784,
 884925521741709313,
 884876753390489601,
 884562892145688576,
 884441805382717440,
 884247878851493888,
 884162670584377345,
 883838122936631299,
 883482846933004288,
 883360690899218434,
 883117836046

### Reshaping data

#### Melt

In [237]:
short_df = tae.query('tweet_id >= 890000070584377345')[['tweet_id', 'rating_numerator']]
short_df

Unnamed: 0,tweet_id,rating_numerator
0,892420643555336193,13
1,892177421306343426,13
2,891815181378084864,12
3,891689557279858688,13
4,891327558926688256,12
5,891087950875897856,13
6,890971913173991426,13
7,890729181411237888,13
8,890609185150312448,13
9,890240255349198849,14


In [238]:
melt_df = pd.melt(short_df)
melt_df

Unnamed: 0,variable,value
0,tweet_id,892420643555336193
1,tweet_id,892177421306343426
2,tweet_id,891815181378084864
3,tweet_id,891689557279858688
4,tweet_id,891327558926688256
5,tweet_id,891087950875897856
6,tweet_id,890971913173991426
7,tweet_id,890729181411237888
8,tweet_id,890609185150312448
9,tweet_id,890240255349198849


#### Concat

In [239]:
short_df_1 = short_df.iloc[0:5]
short_df_1

Unnamed: 0,tweet_id,rating_numerator
0,892420643555336193,13
1,892177421306343426,13
2,891815181378084864,12
3,891689557279858688,13
4,891327558926688256,12


In [240]:
short_df_2 = short_df.iloc[5:11]
short_df_2

Unnamed: 0,tweet_id,rating_numerator
5,891087950875897856,13
6,890971913173991426,13
7,890729181411237888,13
8,890609185150312448,13
9,890240255349198849,14
10,890006608113172480,13


In [243]:
concat_df = pd.concat([short_df_2, short_df_1])
concat_df

Unnamed: 0,tweet_id,rating_numerator
5,891087950875897856,13
6,890971913173991426,13
7,890729181411237888,13
8,890609185150312448,13
9,890240255349198849,14
10,890006608113172480,13
0,892420643555336193,13
1,892177421306343426,13
2,891815181378084864,12
3,891689557279858688,13


In [247]:
small_df_1 = short_df['tweet_id']
small_df_2 = short_df['rating_numerator']

In [251]:
concat_df_sideways = pd.concat([small_df_2, small_df_1], axis=1)
concat_df_sideways

Unnamed: 0,rating_numerator,tweet_id
0,13,892420643555336193
1,13,892177421306343426
2,12,891815181378084864
3,13,891689557279858688
4,12,891327558926688256
5,13,891087950875897856
6,13,890971913173991426
7,13,890729181411237888
8,13,890609185150312448
9,14,890240255349198849


#### Pivot

In [254]:
short_df.pivot(columns='rating_numerator', values='tweet_id')

rating_numerator,12,13,14
0,,8.924206e+17,
1,,8.921774e+17,
2,8.918152e+17,,
3,,8.916896e+17,
4,8.913276e+17,,
5,,8.91088e+17,
6,,8.909719e+17,
7,,8.907292e+17,
8,,8.906092e+17,
9,,,8.902403e+17


### Sort values

#### Dataframes

In [257]:
short_df.sort_values('tweet_id', ascending=True)

Unnamed: 0,tweet_id,rating_numerator
10,890006608113172480,13
9,890240255349198849,14
8,890609185150312448,13
7,890729181411237888,13
6,890971913173991426,13
5,891087950875897856,13
4,891327558926688256,12
3,891689557279858688,13
2,891815181378084864,12
1,892177421306343426,13


In [258]:
short_df.sort_index()

Unnamed: 0,tweet_id,rating_numerator
0,892420643555336193,13
1,892177421306343426,13
2,891815181378084864,12
3,891689557279858688,13
4,891327558926688256,12
5,891087950875897856,13
6,890971913173991426,13
7,890729181411237888,13
8,890609185150312448,13
9,890240255349198849,14


In [261]:
# moves index to column
longer_df = short_df.reset_index()

In [262]:
longer_df.drop(columns=['index'])

Unnamed: 0,tweet_id,rating_numerator
0,892420643555336193,13
1,892177421306343426,13
2,891815181378084864,12
3,891689557279858688,13
4,891327558926688256,12
5,891087950875897856,13
6,890971913173991426,13
7,890729181411237888,13
8,890609185150312448,13
9,890240255349198849,14


#### Lists

In [302]:
number_list = ['1', '5', '3']

['1', '5', '3']

In [305]:
sorted(number_list)

['1', '3', '5']

#### Dictonaries

In [310]:
new_dict = {'joni': 31, 'anne': 33, 'zwörgü':0.5}

In [320]:
import collections

new_dict_order = collections.OrderedDict(sorted(new_dict.items()))
new_dict_order

OrderedDict([('anne', 33), ('joni', 31), ('zwörgü', 0.5)])

In [316]:
new_dict_details = {'joni': {'age': 31, 'height':185}, 'anne': {'age': 33, 'height':165}, 'zwörgü': {'age':0.5, 'height':0.5}}
new_dict_details

{'joni': {'age': 31, 'height': 185},
 'anne': {'age': 33, 'height': 165},
 'zwörgü': {'age': 0.5, 'height': 0.5}}

### Dictonary

In [323]:
for keys, values in new_dict_order.items(): 
    print (keys, values)

anne 33
joni 31
zwörgü 0.5


In [336]:
mini_zwörgü = 0.1
new_dict['mini_zwörgü'] = mini_zwörgü
new_dict

{'joni': 31, 'anne': 33, 'zwörgü': 0.5, 'mini_zwörgü': 0.1}

In [324]:
new_dict_order.keys()

odict_keys(['anne', 'joni', 'zwörgü'])

In [325]:
new_dict_order.values()

odict_values([33, 31, 0.5])

In [326]:
new_dict_details.keys()

dict_keys(['joni', 'anne', 'zwörgü'])

In [327]:
new_dict_details.values()

dict_values([{'age': 31, 'height': 185}, {'age': 33, 'height': 165}, {'age': 0.5, 'height': 0.5}])

In [340]:
new_dict_details.items()

dict_items([('joni', {'age': 31, 'height': 185}), ('anne', {'age': 33, 'height': 165}), ('zwörgü', {'age': 0.5, 'height': 0.5}), ('mini_zwörgü', {'age': 0.1, 'height': 0.1})])

In [329]:
# access values for key
new_dict_details.get('anne')

{'age': 33, 'height': 165}

In [330]:
new_dict_details['anne']

{'age': 33, 'height': 165}

In [333]:
# create new object for dict
mini_zwörgü = {'age': 0.1, 'height':0.1}
new_dict_details['mini_zwörgü'] = mini_zwörgü
new_dict_details

{'joni': {'age': 31, 'height': 185},
 'anne': {'age': 33, 'height': 165},
 'zwörgü': {'age': 0.5, 'height': 0.5},
 'mini_zwörgü': {'age': 0.1, 'height': 0.1}}

### List

In [353]:
new_list = [1, 5, 3]

In [354]:
new_list.append(4)
new_list

[1, 5, 3, 4]

In [355]:
new_list

[1, 5, 3, 4]

In [356]:
new_list.remove(1)

In [357]:
new_list

[5, 3, 4]

### Create subsets

In [263]:
# select and order top entries
tae.nlargest(10, 'tweet_id')

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,time,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,
5,891087950875897856,,,2017-07-29 00:08:17 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a majestic great white breaching ...,,,,https://twitter.com/dog_rates/status/891087950...,13,10,,,,,
6,890971913173991426,,,2017-07-28 16:27:12 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Jax. He enjoys ice cream so much he gets ...,,,,"https://gofundme.com/ydvmve-surgery-for-jax,ht...",13,10,Jax,,,,
7,890729181411237888,,,2017-07-28 00:22:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",When you watch your owner call another dog a g...,,,,https://twitter.com/dog_rates/status/890729181...,13,10,,,,,
8,890609185150312448,,,2017-07-27 16:25:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Zoey. She doesn't want to be one of th...,,,,https://twitter.com/dog_rates/status/890609185...,13,10,Zoey,,,,
9,890240255349198849,,,2017-07-26 15:59:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Cassie. She is a college pup. Studying...,,,,https://twitter.com/dog_rates/status/890240255...,14,10,Cassie,doggo,,,


In [265]:
# select and order bottom entries
tae.nsmallest(10, 'tweet_id')

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,time,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
2355,666020888022790149,,,2015-11-15 22:32:08 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a Japanese Irish Setter. Lost eye...,,,,https://twitter.com/dog_rates/status/666020888...,8,10,,,,,
2354,666029285002620928,,,2015-11-15 23:05:30 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a western brown Mitsubishi terrier. Up...,,,,https://twitter.com/dog_rates/status/666029285...,7,10,a,,,,
2353,666033412701032449,,,2015-11-15 23:21:54 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a very happy pup. Big fan of well-main...,,,,https://twitter.com/dog_rates/status/666033412...,9,10,a,,,,
2352,666044226329800704,,,2015-11-16 00:04:52 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a purebred Piers Morgan. Loves to Netf...,,,,https://twitter.com/dog_rates/status/666044226...,6,10,a,,,,
2351,666049248165822465,,,2015-11-16 00:24:50 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a 1949 1st generation vulpix. Enj...,,,,https://twitter.com/dog_rates/status/666049248...,5,10,,,,,
2350,666050758794694657,,,2015-11-16 00:30:50 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a truly beautiful English Wilson Staff...,,,,https://twitter.com/dog_rates/status/666050758...,10,10,a,,,,
2349,666051853826850816,,,2015-11-16 00:35:11 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is an odd dog. Hard on the outside but lo...,,,,https://twitter.com/dog_rates/status/666051853...,2,10,an,,,,
2348,666055525042405380,,,2015-11-16 00:49:46 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a Siberian heavily armored polar bear ...,,,,https://twitter.com/dog_rates/status/666055525...,10,10,a,,,,
2347,666057090499244032,,,2015-11-16 00:55:59 +0000,"<a href=""http://twitter.com/download/iphone"" r...",My oh my. This is a rare blond Canadian terrie...,,,,https://twitter.com/dog_rates/status/666057090...,9,10,a,,,,
2346,666058600524156928,,,2015-11-16 01:01:59 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is the Rand Paul of retrievers folks! He'...,,,,https://twitter.com/dog_rates/status/666058600...,8,10,the,,,,


In [266]:
# query
tae[tae.tweet_id > 890000070584377345]

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,time,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,
5,891087950875897856,,,2017-07-29 00:08:17 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a majestic great white breaching ...,,,,https://twitter.com/dog_rates/status/891087950...,13,10,,,,,
6,890971913173991426,,,2017-07-28 16:27:12 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Jax. He enjoys ice cream so much he gets ...,,,,"https://gofundme.com/ydvmve-surgery-for-jax,ht...",13,10,Jax,,,,
7,890729181411237888,,,2017-07-28 00:22:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",When you watch your owner call another dog a g...,,,,https://twitter.com/dog_rates/status/890729181...,13,10,,,,,
8,890609185150312448,,,2017-07-27 16:25:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Zoey. She doesn't want to be one of th...,,,,https://twitter.com/dog_rates/status/890609185...,13,10,Zoey,,,,
9,890240255349198849,,,2017-07-26 15:59:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Cassie. She is a college pup. Studying...,,,,https://twitter.com/dog_rates/status/890240255...,14,10,Cassie,doggo,,,


In [268]:
tae.drop_duplicates()

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,time,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2351,666049248165822465,,,2015-11-16 00:24:50 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a 1949 1st generation vulpix. Enj...,,,,https://twitter.com/dog_rates/status/666049248...,5,10,,,,,
2352,666044226329800704,,,2015-11-16 00:04:52 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a purebred Piers Morgan. Loves to Netf...,,,,https://twitter.com/dog_rates/status/666044226...,6,10,a,,,,
2353,666033412701032449,,,2015-11-15 23:21:54 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a very happy pup. Big fan of well-main...,,,,https://twitter.com/dog_rates/status/666033412...,9,10,a,,,,
2354,666029285002620928,,,2015-11-15 23:05:30 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a western brown Mitsubishi terrier. Up...,,,,https://twitter.com/dog_rates/status/666029285...,7,10,a,,,,


In [281]:
tae.filter(regex='id$') # how it ends (columns)
tae.filter(regex='^r') # how it begins (columns)
tae.filter(regex='^r[1-5]$') # how it begins r and ends between 1-5 (columns)
tae.filter(regex='^(?!Species$).*') # except for species


Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,time,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2351,666049248165822465,,,2015-11-16 00:24:50 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a 1949 1st generation vulpix. Enj...,,,,https://twitter.com/dog_rates/status/666049248...,5,10,,,,,
2352,666044226329800704,,,2015-11-16 00:04:52 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a purebred Piers Morgan. Loves to Netf...,,,,https://twitter.com/dog_rates/status/666044226...,6,10,a,,,,
2353,666033412701032449,,,2015-11-15 23:21:54 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a very happy pup. Big fan of well-main...,,,,https://twitter.com/dog_rates/status/666033412...,9,10,a,,,,
2354,666029285002620928,,,2015-11-15 23:05:30 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a western brown Mitsubishi terrier. Up...,,,,https://twitter.com/dog_rates/status/666029285...,7,10,a,,,,


### To Dos

In [47]:
# create dict
# access different parts of dict

In [48]:
# timestamp delete first part

In [49]:
# last columns replace none with nan
# use one part while concat

In [50]:
# put two columns together

In [51]:
# change order of columns

In [52]:
# order rows with criteria

In [None]:
# write functions

In [None]:
# delete columns or rows

In [None]:
# change types (astype)

### Exercise

In [None]:
# create dataframe from list

In [None]:
# rename columns

In [None]:
# logic operators (in, not)

In [None]:
# query (string and no string)

In [None]:
# replace

In [None]:
# access column and columns

In [None]:
# access row and rows

In [None]:
# access index

### Issues Twitter Archive Enhanced

#### Tidiness

#### Quality


### Issues Image Predictions

#### Tidiness

#### Quality


# Clean


## Clean Twitter Archive Enhanced

#### Define

#### Code

#### Test

# Store, Analyze, and Visualize