### Wrangle and Analyze Data Project Details
1. `wrangle_act.ipynb`: code for gathering, assessing, cleaning, analyzing, and visualizing data
2. `wrangle_report.pdf` or `wrangle_report.html`: `300-600` word documentation of the data wrangling steps: gather, assess, and clean
3. `act_report.pdf` or `act_report.html`: `250` word minimum documentation of the analysis and insights into the final data
4. `twitter-archive-enhanced.csv`: file as given
5. `image-predictions.tsv`: file downloaded programmatically
6. `tweet_json.txt`: file constructed via API
7. `twitter_archive_master.csv`: combined and cleaned data
8. Any additional files
9. At least three `3` insights and one `1` visualization must be produced.


### Gathering

In [5]:
import pandas as pd
import requests
import os
import logging
import sys
import json
# Load the WeRateDogs archive that was provided with the project
df = pd.read_csv('twitter-archive-enhanced.csv')

#### Requests Library

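In [17]:
# Create a folder for the downloaded image-prediction file
folder_name = 'tweet_image_predictions'
if not os.path.exists(folder_name):
    os.makedirs(folder_name)

url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed

# Save the raw bytes under the file name taken from the end of the URL
with open(os.path.join(folder_name, url.split('/')[-1]), mode='wb') as file:
    file.write(response.content)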
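
Once the file is on disk it can be read as a tab-separated table. The snippet below is a minimal sketch of that step; it uses a small in-memory sample instead of the real download, and the helper name `load_predictions` and the sample column names are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for the downloaded file; the column
# names are illustrative assumptions, not confirmed by this notebook.
sample_tsv = (
    "tweet_id\tjpg_url\tp1\tp1_conf\n"
    "1\thttps://example.com/a.jpg\tpug\t0.9\n"
    "2\thttps://example.com/b.jpg\tcorgi\t0.8\n"
)


def load_predictions(source):
    """Read a tab-separated predictions file (a path or a file-like object)."""
    return pd.read_csv(source, sep="\t")


df_images = load_predictions(io.StringIO(sample_tsv))
print(df_images.shape)
```

In the notebook itself the same helper could be pointed at the saved file, for example `load_predictions(os.path.join(folder_name, 'image-predictions.tsv'))`.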

#### Twitter API

1. Query every tweet ID in the WeRateDogs Twitter archive, printing each tweet ID after it is queried.
2. Set the `wait_on_rate_limit` and `wait_on_rate_limit_notify` parameters to `True` when constructing `tweepy.API`.
3. Twitter stores tweet data in JSON format.
4. Set the `tweet_mode` parameter to `'extended'` in the `get_status` call, i.e., `api.get_status(tweet_id, tweet_mode='extended')`.
- Only original ratings (no retweets) that have images are wanted.
- Not every tweet in the archive is a dog rating, and some are retweets (`df.info()` reports 2,356 rows, 181 of them with a `retweeted_status_id`).
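
The "original ratings only" rule can be sketched as a boolean filter on the archive columns. This is an illustration on made-up rows, not the notebook's actual cleaning step. The function name is hypothetical, and using a non-missing `expanded_urls` as a rough stand-in for "has an image" is an assumption.

```python
import pandas as pd

# Made-up rows that mimic three columns of the archive.
archive = pd.DataFrame(
    {
        "tweet_id": [1, 2, 3],
        "retweeted_status_id": [None, 99.0, None],
        "expanded_urls": [
            "https://twitter.com/dog_rates/status/1/photo/1",
            "https://twitter.com/someone/status/99",
            None,
        ],
    }
)


def keep_original_tweets_with_links(df):
    """Drop retweets and rows without an expanded URL (assumed image proxy)."""
    is_original = df["retweeted_status_id"].isna()
    has_url = df["expanded_urls"].notna()
    return df[is_original & has_url]


originals = keep_original_tweets_with_links(archive)
print(originals["tweet_id"].tolist())
```

A stricter check for images would join against the image-prediction table instead of relying on `expanded_urls`.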

In [21]:
import tweepy
from tweepy import OAuthHandler
import json
from timeit import default_timer as timer

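# Placeholder credentials: replace with your own keys and keep them out of shared notebooks
consumer_key = 'API_KEY'
consumer_secret = 'SECRET'
access_token = 'TOKEN'
access_secret = 'SECRET'

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

# Let tweepy sleep automatically (and say so) whenever the rate limit is hit
api = tweepy.API(auth_handler=auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)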

In [25]:
tweet_ids = df.tweet_id.values
len(tweet_ids)

# Query each tweet ID and write its JSON to tweet_json.txt, one tweet per line
count = 0
fails_dict = {}  # tweet_id -> exception for tweets that could not be fetched
start = timer()
with open('tweet_json.txt', 'w') as outfile:
    for tweet_id in tweet_ids:
        count += 1
        print(str(count) + ": " + str(tweet_id))
        try:
            tweet = api.get_status(tweet_id, tweet_mode='extended')
            print("Got thru!")
            json.dump(tweet._json, outfile)
            outfile.write('\n')
        except tweepy.TweepError as e:
            # Record the failure and move on to the next tweet ID
            print("Did not get thru.")
            fails_dict[tweet_id] = e
end = timer()
print(end - start)  # elapsed time in seconds
print(fails_dict)

1: 892420643555336193
Got thru!
2: 892177421306343426
Got thru!
3: 891815181378084864
Got thru!
4: 891689557279858688
Got thru!
5: 891327558926688256
Got thru!
6: 891087950875897856
Got thru!
7: 890971913173991426
Got thru!
8: 890729181411237888
Got thru!
9: 890609185150312448
Got thru!
10: 890240255349198849
Got thru!
11: 890006608113172480
Got thru!
12: 889880896479866881
Got thru!
13: 889665388333682689
Got thru!
14: 889638837579907072
Got thru!
15: 889531135344209921
Got thru!
16: 889278841981685760
Got thru!
17: 888917238123831296
Got thru!
18: 888804989199671297
Got thru!
19: 888554962724278272
Got thru!
20: 888202515573088257
Did not get thru.
21: 888078434458587136
Got thru!
22: 887705289381826560
Got thru!
23: 887517139158093824
Got thru!
24: 887473957103951883
Got thru!
25: 887343217045368832
Got thru!
26: 887101392804085760
Got thru!
27: 886983233522544640
Got thru!
28: 886736880519319552
Got thru!
29: 886680336477933568
Got thru!
30: 886366144734445568
Got thru!
31: 8862670
...
245: 846042936437604353
Got thru!
246: 845812042753855489
Got thru!
247: 845677943972139009
Got thru!
248: 845459076796616705
Did not get thru.
249: 845397057150107648
Got thru!
250: 845306882940190720
Got thru!
251: 845098359547420673
Got thru!
252: 844979544864018432
Got thru!
253: 844973813909606400
Got thru!
254: 844704788403113984
Did not get thru.
255: 844580511645339650
Got thru!
256: 844223788422217728
Got thru!
257: 843981021012017153
Got thru!
258: 843856843873095681
Got thru!
259: 843604394117681152
Got thru!
260: 843235543001513987
Got thru!
261: 842892208864923648
Did not get thru.
262: 842846295480000512
Got thru!
263: 842765311967449089
Got thru!
264: 842535590457499648
Got thru!
265: 842163532590374912
Got thru!
266: 842115215311396866
Got thru!
267: 841833993020538882
Got thru!
268: 841680585030541313
Got thru!
269: 841439858740625411
Got thru!
270: 841320156043304961
Got thru!
271: 841314665196081154
Got thru!
272: 841077006473256960
Got thru!
273: 840761248
...
485: 814638523311648768
Got thru!
486: 814578408554463233
Got thru!
487: 814530161257443328
Got thru!
488: 814153002265309185
Got thru!
489: 813944609378369540
Got thru!
490: 813910438903693312
Got thru!
491: 813812741911748608
Got thru!
492: 813800681631023104
Got thru!
493: 813217897535406080
Got thru!
494: 813202720496779264
Got thru!
495: 813187593374461952
Got thru!
496: 813172488309972993
Got thru!
497: 813157409116065792
Got thru!
498: 813142292504645637
Got thru!
499: 813130366689148928
Got thru!
500: 813127251579564032
Got thru!
501: 813112105746448384
Got thru!
502: 813096984823349248
Got thru!
503: 813081950185472002
Got thru!
504: 813066809284972545
Got thru!
505: 813051746834595840
Got thru!
506: 812781120811126785
Got thru!
507: 812747805718642688
Did not get thru.
508: 812709060537683968
Got thru!
509: 812503143955202048
Got thru!
510: 812466873996607488
Got thru!
511: 812372279581671427
Got thru!
512: 811985624773361665
Got thru!
513: 811744202451197953
Got thru!
...
726: 782722598790725632
Got thru!
727: 782598640137187329
Got thru!
728: 782305867769217024
Got thru!
729: 782021823840026624
Got thru!
730: 781955203444699136
Got thru!
731: 781661882474196992
Got thru!
732: 781655249211752448
Got thru!
733: 781524693396357120
Got thru!
734: 781308096455073793
Got thru!
735: 781251288990355457
Got thru!
736: 781163403222056960
Got thru!
737: 780931614150983680
Got thru!
738: 780858289093574656
Got thru!
739: 780800785462489090
Got thru!
740: 780601303617732608
Got thru!
741: 780543529827336192
Got thru!
742: 780496263422808064
Got thru!
743: 780476555013349377
Got thru!
744: 780459368902959104
Got thru!
745: 780192070812196864
Got thru!
746: 780092040432480260
Got thru!
747: 780074436359819264
Got thru!
748: 779834332596887552
Got thru!
749: 779377524342161408
Got thru!
750: 779124354206535695
Got thru!
751: 779123168116150273
Did not get thru.
752: 779056095788752897
Got thru!
753: 778990705243029504
Got thru!
754: 778774459159379968
Got thru!
...
Rate limit reached. Sleeping for: 749
...
902: 758474966123810816
Got thru!
903: 758467244762497024
Got thru!
904: 758405701903519748
Got thru!
905: 758355060040593408
Got thru!
906: 758099635764359168
Got thru!
907: 758041019896193024
Got thru!
908: 757741869644341248
Got thru!
909: 757729163776290825
Got thru!
910: 757725642876129280
Got thru!
911: 757611664640446465
Got thru!
912: 757597904299253760
Got thru!
913: 757596066325864448
Got thru!
914: 757400162377592832
Got thru!
915: 757393109802180609
Got thru!
916: 757354760399941633
Got thru!
917: 756998049151549440
Got thru!
918: 756939218950160384
Got thru!
919: 756651752796094464
Got thru!
920: 756526248105566208
Got thru!
921: 756303284449767430
Got thru!
922: 756288534030475264
Got thru!
923: 756275833623502848
Got thru!
924: 755955933503782912
Got thru!
925: 755206590534418437
Got thru!
926: 755110668769038337
Got thru!
927: 754874841593970688
Got thru!
928: 754856583969079297
Got thru!
929: 754747087846248448
Got thru!
930: 754482103782404096
Got thru!
931:
...
1140: 728015554473250816
Got thru!
1141: 727685679342333952
Got thru!
1142: 727644517743104000
Got thru!
1143: 727524757080539137
Got thru!
1144: 727314416056803329
Got thru!
1145: 727286334147182592
Got thru!
1146: 727175381690781696
Got thru!
1147: 727155742655025152
Got thru!
1148: 726935089318363137
Got thru!
1149: 726887082820554753
Got thru!
1150: 726828223124897792
Got thru!
1151: 726224900189511680
Got thru!
1152: 725842289046749185
Got thru!
1153: 725786712245440512
Got thru!
1154: 725729321944506368
Got thru!
1155: 725458796924002305
Got thru!
1156: 724983749226668032
Got thru!
1157: 724771698126512129
Got thru!
1158: 724405726123311104
Got thru!
1159: 724049859469295616
Got thru!
1160: 724046343203856385
Got thru!
1161: 724004602748780546
Got thru!
1162: 723912936180330496
Got thru!
1163: 723688335806480385
Got thru!
1164: 723673163800948736
Got thru!
1165: 723179728551723008
Got thru!
1166: 722974582966214656
Got thru!
1167: 722613351520608256
Got thru!
1168: 7215
...
1376: 701889187134500865
Got thru!
1377: 701805642395348998
Got thru!
1378: 701601587219795968
Got thru!
1379: 701570477911896070
Got thru!
1380: 701545186879471618
Got thru!
1381: 701214700881756160
Got thru!
1382: 700890391244103680
Got thru!
1383: 700864154249383937
Got thru!
1384: 700847567345688576
Got thru!
1385: 700796979434098688
Got thru!
1386: 700747788515020802
Got thru!
1387: 700518061187723268
Got thru!
1388: 700505138482569216
Got thru!
1389: 700462010979500032
Got thru!
1390: 700167517596164096
Got thru!
1391: 700151421916807169
Got thru!
1392: 700143752053182464
Got thru!
1393: 700062718104104960
Got thru!
1394: 700029284593901568
Got thru!
1395: 700002074055016451
Got thru!
1396: 699801817392291840
Got thru!
1397: 699788877217865730
Got thru!
1398: 699779630832685056
Got thru!
1399: 699775878809702401
Got thru!
1400: 699691744225525762
Got thru!
1401: 699446877801091073
Got thru!
1402: 699434518667751424
Got thru!
1403: 699423671849451520
Got thru!
1404: 6994
...
1611: 685532292383666176
Got thru!
1612: 685325112850124800
Got thru!
1613: 685321586178670592
Got thru!
1614: 685315239903100929
Got thru!
1615: 685307451701334016
Got thru!
1616: 685268753634967552
Got thru!
1617: 685198997565345792
Got thru!
1618: 685169283572338688
Got thru!
1619: 684969860808454144
Got thru!
1620: 684959798585110529
Got thru!
1621: 684940049151070208
Got thru!
1622: 684926975086034944
Got thru!
1623: 684914660081053696
Got thru!
1624: 684902183876321280
Got thru!
1625: 684880619965411328
Got thru!
1626: 684830982659280897
Got thru!
1627: 684800227459624960
Got thru!
1628: 684594889858887680
Got thru!
1629: 684588130326986752
Got thru!
1630: 684567543613382656
Got thru!
1631: 684538444857667585
Got thru!
1632: 684481074559381504
Got thru!
1633: 684460069371654144
Got thru!
1634: 684241637099323392
Got thru!
1635: 684225744407494656
Got thru!
1636: 684222868335505415
Got thru!
1637: 684200372118904832
Got thru!
1638: 684195085588783105
Got thru!
1639: 6841
...
Rate limit reached. Sleeping for: 748
...
1801: 676975532580409345
Got thru!
1802: 676957860086095872
Got thru!
1803: 676949632774234114
Got thru!
1804: 676948236477857792
Got thru!
1805: 676946864479084545
Got thru!
1806: 676942428000112642
Got thru!
1807: 676936541936185344
Got thru!
1808: 676916996760600576
Got thru!
1809: 676897532954456065
Got thru!
1810: 676864501615042560
Got thru!
1811: 676821958043033607
Got thru!
1812: 676819651066732545
Got thru!
1813: 676811746707918848
Got thru!
1814: 676776431406465024
Got thru!
1815: 676617503762681856
Got thru!
1816: 676613908052996102
Got thru!
1817: 676606785097199616
Got thru!
1818: 676603393314578432
Got thru!
1819: 676593408224403456
Got thru!
1820: 676590572941893632
Got thru!
1821: 676588346097852417
Got thru!
1822: 676582956622721024
Got thru!
1823: 676575501977128964
Got thru!
1824: 676533798876651520
Got thru!
1825: 676496375194980353
Got thru!
1826: 676470639084101634
Got thru!
1827: 676440007570247681
Got thru!
1828: 676430933382295552
Got thru!
1829: 6762
...
2036: 671735591348891648
Got thru!
2037: 671729906628341761
Got thru!
2038: 671561002136281088
Got thru!
2039: 671550332464455680
Got thru!
2040: 671547767500775424
Got thru!
2041: 671544874165002241
Got thru!
2042: 671542985629241344
Got thru!
2043: 671538301157904385
Got thru!
2044: 671536543010570240
Got thru!
2045: 671533943490011136
Got thru!
2046: 671528761649688577
Got thru!
2047: 671520732782923777
Got thru!
2048: 671518598289059840
Got thru!
2049: 671511350426865664
Got thru!
2050: 671504605491109889
Got thru!
2051: 671497587707535361
Got thru!
2052: 671488513339211776
Got thru!
2053: 671486386088865792
Got thru!
2054: 671485057807351808
Got thru!
2055: 671390180817915904
Got thru!
2056: 671362598324076544
Got thru!
2057: 671357843010908160
Got thru!
2058: 671355857343524864
Got thru!
2059: 671347597085433856
Got thru!
2060: 671186162933985280
Got thru!
2061: 671182547775299584
Got thru!
2062: 671166507850801152
Got thru!
2063: 671163268581498880
Got thru!
2064: 6711
...
2272: 667495797102141441
Got thru!
2273: 667491009379606528
Got thru!
2274: 667470559035432960
Got thru!
2275: 667455448082227200
Got thru!
2276: 667453023279554560
Got thru!
2277: 667443425659232256
Got thru!
2278: 667437278097252352
Got thru!
2279: 667435689202614272
Got thru!
2280: 667405339315146752
Got thru!
2281: 667393430834667520
Got thru!
2282: 667369227918143488
Got thru!
2283: 667211855547486208
Got thru!
2284: 667200525029539841
Got thru!
2285: 667192066997374976
Got thru!
2286: 667188689915760640
Got thru!
2287: 667182792070062081
Got thru!
2288: 667177989038297088
Got thru!
2289: 667176164155375616
Got thru!
2290: 667174963120574464
Got thru!
2291: 667171260800061440
Got thru!
2292: 667165590075940865
Got thru!
2293: 667160273090932737
Got thru!
2294: 667152164079423490
Got thru!
2295: 667138269671505920
Got thru!
2296: 667119796878725120
Got thru!
2297: 667090893657276420
Got thru!
2298: 667073648344346624
Got thru!
2299: 667070482143944705
Got thru!
2300: 6670
...

In [26]:
df.head()

| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 | | | 2017-08-01 16:23:56 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Phineas. He's a mystical boy. Only eve... | | | | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas | | | | |
| 1 | 892177421306343426 | | | 2017-08-01 00:17:27 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Tilly. She's just checking pup on you.... | | | | https://twitter.com/dog_rates/status/892177421... | 13 | 10 | Tilly | | | | |
| 2 | 891815181378084864 | | | 2017-07-31 00:18:03 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Archie. He is a rare Norwegian Pouncin... | | | | https://twitter.com/dog_rates/status/891815181... | 12 | 10 | Archie | | | | |
| 3 | 891689557279858688 | | | 2017-07-30 15:58:51 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Darla. She commenced a snooze mid meal... | | | | https://twitter.com/dog_rates/status/891689557... | 13 | 10 | Darla | | | | |
| 4 | 891327558926688256 | | | 2017-07-29 16:00:24 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Franklin. He would like you to stop ca... | | | | https://twitter.com/dog_rates/status/891327558... | 12 | 10 | Franklin | | | | |


In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), ob

In [34]:
df.describe()

| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | retweeted_status_id | retweeted_status_user_id | rating_numerator | rating_denominator |
|---|---|---|---|---|---|---|---|
| count | 2356.0 | 78.0 | 78.0 | 181.0 | 181.0 | 2356.0 | 2356.0 |
| mean | 7.427716e+17 | 7.455079e+17 | 2.014171e+16 | 7.7204e+17 | 1.241698e+16 | 13.126486 | 10.455433 |
| std | 6.856705e+16 | 7.582492e+16 | 1.252797e+17 | 6.236928e+16 | 9.599254e+16 | 45.876648 | 6.745237 |
| min | 6.660209e+17 | 6.658147e+17 | 11856340.0 | 6.661041e+17 | 783214.0 | 0.0 | 0.0 |
| 25% | 6.783989e+17 | 6.757419e+17 | 308637400.0 | 7.186315e+17 | 4196984000.0 | 10.0 | 10.0 |
| 50% | 7.196279e+17 | 7.038708e+17 | 4196984000.0 | 7.804657e+17 | 4196984000.0 | 11.0 | 10.0 |
| 75% | 7.993373e+17 | 8.257804e+17 | 4196984000.0 | 8.203146e+17 | 4196984000.0 | 12.0 | 10.0 |
| max | 8.924206e+17 | 8.862664e+17 | 8.405479e+17 | 8.87474e+17 | 7.874618e+17 | 1776.0 | 170.0 |


In [6]:
with open('tweet_json.txt') as file:
    tweet_json_list = []
    for line in file:
        tweet_json_list.append(json.loads(line)) # cite 1

In [7]:
tweet_json_list[0]

{'created_at': 'Tue Aug 01 16:23:56 +0000 2017',
 'id': 892420643555336193,
 'id_str': '892420643555336193',
 'full_text': "This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU",
 'truncated': False,
 'display_text_range': [0, 85],
 'entities': {'hashtags': [],
  'symbols': [],
  'user_mentions': [],
  'urls': [],
  'media': [{'id': 892420639486877696,
    'id_str': '892420639486877696',
    'indices': [86, 109],
    'media_url': 'http://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
    'media_url_https': 'https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
    'url': 'https://t.co/MgUWQ76dJU',
    'display_url': 'pic.twitter.com/MgUWQ76dJU',
    'expanded_url': 'https://twitter.com/dog_rates/status/892420643555336193/photo/1',
    'type': 'photo',
    'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'},
     'medium': {'w': 540, 'h': 528, 'resize': 'fit'},
     'small': {'w': 540, 'h': 528, 'resize': 'fit'},
     'large': {'w': 

In [8]:
df_json = pd.DataFrame.from_records(tweet_json_list)
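
Loading every JSON key produces many nested columns. If only the engagement counts are needed, the relevant fields can be picked out while reading the file. The following is a minimal sketch on two made-up JSON lines; the function name `read_tweet_counts` and the choice of fields (`id`, `retweet_count`, `favorite_count`) are assumptions for illustration.

```python
import io
import json

import pandas as pd

# Two made-up lines in the same one-JSON-object-per-line layout as tweet_json.txt.
sample_lines = io.StringIO(
    '{"id": 1, "retweet_count": 5, "favorite_count": 10, "full_text": "a"}\n'
    '{"id": 2, "retweet_count": 7, "favorite_count": 20, "full_text": "b"}\n'
)


def read_tweet_counts(lines):
    """Build a DataFrame with one row per tweet, keeping only selected fields."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        records.append(
            {
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            }
        )
    return pd.DataFrame(records)


df_counts = read_tweet_counts(sample_lines)
print(df_counts)
```

With the real data, an open file handle on `tweet_json.txt` could be passed in place of `sample_lines`.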

In [10]:
df_json.sample(10)

| | contributors | coordinates | created_at | display_text_range | entities | extended_entities | favorite_count | favorited | full_text | geo | ... | quoted_status | quoted_status_id | quoted_status_id_str | quoted_status_permalink | retweet_count | retweeted | retweeted_status | source | truncated | user |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 184 | | | Sat Apr 22 16:18:34 +0000 2017 | [0, 110] | {'hashtags': [], 'symbols': [], 'user_mentions... | | 26556 | False | I HEARD HE TIED HIS OWN BOWTIE MARK AND HE JUS... | | ... | {'created_at': 'Sat Apr 22 05:36:05 +0000 2017... | 8.556564e+17 | 8.556564310050611e+17 | {'url': 'https://t.co/5BEjzT2Tth', 'expanded':... | 5402 | False | | `<a href="http://twitter.com/download/iphone" r...` | False | {'id': 4196983835, 'id_str': '4196983835', 'na... |
| 146 | | | Thu May 11 17:34:13 +0000 2017 | [0, 135] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 862722516858445824, 'id_str'... | 16784 | False | This is Dave. He passed the h*ck out. It's bar... | | ... | | | | | 3447 | False | | `<a href="http://twitter.com/download/iphone" r...` | False | {'id': 4196983835, 'id_str': '4196983835', 'na... |
| 490 | | | Sat Dec 24 17:18:34 +0000 2016 | [0, 94] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 812709052820099072, 'id_str'... | 6939 | False | This is Brandi and Harley. They are practicing... | | ... | | | | | 1512 | False | | `<a href="http://twitter.com/download/iphone" r...` | False | {'id': 4196983835, 'id_str': '4196983835', 'na... |
| 1243 | | | Wed Mar 16 00:37:03 +0000 2016 | [0, 140] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 709901249051754496, 'id_str'... | 691 | False | WeRateDogs stickers are here and they're 12/10... | | ... | | | | | 103 | False | | `<a href="http://twitter.com/download/iphone" r...` | False | {'id': 4196983835, 'id_str': '4196983835', 'na... |
| 2125 | | | Thu Nov 26 05:28:02 +0000 2015 | [0, 129] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 669749424290164736, 'id_str'... | 264 | False | Say hello to Clarence. Clarence thought he saw... | | ... | | | | | 65 | False | | `<a href="http://twitter.com/download/iphone" r...` | False | {'id': 4196983835, 'id_str': '4196983835', 'na... |
| 145 | | | Fri May 12 00:46:44 +0000 2017 | [0, 121] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 862831346963447808, 'id_str'... | 18842 | False | This is Zooey. She's the world's biggest fan o... | | ... | | | | | 5016 | False | | `<a href="http://twitter.com/download/iphone" r...` | False | {'id': 4196983835, 'id_str': '4196983835', 'na... |
| 966 | | | Fri Jul 01 20:31:43 +0000 2016 | [0, 112] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 748977397119258624, 'id_str'... | 11081 | False | What jokester sent in a pic without a dog in i... | | ... | | | | | 3527 | False | | `<a href="http://twitter.com/download/iphone" r...` | False | {'id': 4196983835, 'id_str': '4196983835', 'na... |
| 655 | | | Sat Oct 22 00:45:17 +0000 2016 | [0, 41] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 789628648391401472, 'id_str'... | 7888 | False | This is Eli. He can fly. 13/10 magical af http... | | ... | | | | | 1879 | False | | `<a href="http://twitter.com/download/iphone" r...` | False | {'id': 4196983835, 'id_str': '4196983835', 'na... |
| 31 | | | Sat Jul 15 02:45:48 +0000 2017 | [0, 50] | {'hashtags': [{'text': 'BATP', 'indices': [21,... | | 0 | False | RT @Athletics: 12/10 #BATP https://t.co/WxwJmv... | | ... | | 8.860534e+17 | 8.860534340754719e+17 | {'url': 'https://t.co/WxwJmvjfxo', 'expanded':... | 104 | False | {'created_at': 'Sat Jul 15 02:44:07 +0000 2017... | `<a href="http://twitter.com/download/iphone" r...` | False | {'id': 4196983835, 'id_str': '4196983835', 'na... |
| 1248 | | | Mon Mar 14 18:42:20 +0000 2016 | [0, 140] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 709449595638706176, 'id_str'... | 2286 | False | Meet Karma. She's just a head. Lost body durin... | | ... | | | | | 589 | False | | `<a href="http://twitter.com/download/iphone" r...` | False | {'id': 4196983835, 'id_str': '4196983835', 'na... |


### Assessing Data
- Assess and clean at least **eight `8` quality issues** and **two `2` tidiness issues** in this dataset.
- Cleaning includes merging individual pieces of data according to the rules of tidy data.
- Rating numerators that exceed their denominators do not need to be cleaned; this unique rating system is a big part of the popularity of WeRateDogs.
- Tweets posted after August 1st, 2017 do not need to be gathered.

#### Quality

**twitter-archive-enhanced**:
1. `NaN` values in `in_reply_to_status_id`, `in_reply_to_user_id`, `retweeted_status_id`, `retweeted_status_user_id`, `retweeted_status_timestamp`
2. The string `None` (counted as non-null by `df.info()`) instead of a missing value in `doggo`, `floofer`, `pupper`, `puppo`
3. Single-letter values in `name` that are not real names

**tweet_json.txt**:
4. `None` values in `contributors`, `coordinates`, `geo`
5. `NaN` values in `extended_entities`, `quoted_status`, `quoted_status_id`, `quoted_status_id_str`, `quoted_status_permalink`, `retweeted_status`
6. 
7. 
8. 
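
Two of the archive issues listed above can be checked and partly fixed in a few lines. This is a sketch on made-up rows, not the cleaning code for this project; the sample values and the variable names `cleaned` and `suspect_names` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows mimicking the `name` and dog-stage columns of the archive.
archive = pd.DataFrame(
    {
        "name": ["Phineas", "a", "None"],
        "doggo": ["None", "doggo", "None"],
        "pupper": ["None", "None", "pupper"],
    }
)

stage_cols = ["doggo", "pupper"]

# Work on a copy so the raw data stays available for comparison.
cleaned = archive.copy()

# Turn the literal string "None" into a real missing value.
cleaned[stage_cols] = cleaned[stage_cols].replace("None", np.nan)

# Flag single-letter names for manual review.
suspect_names = cleaned["name"].str.len() == 1

print(cleaned[stage_cols].isna().sum())
print(cleaned.loc[suspect_names, "name"].tolist())
```

Flagging rather than deleting the suspect names keeps the decision about what counts as a real name separate from the detection step.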

#### Tidiness

1. The `twitter-archive-enhanced` DataFrame and the `tweet_json` DataFrame need to be merged so that the most important, readily available metadata sits in one table.
2. The `user` column holds a nested dictionary for each tweet; extract the needed fields from it into plain string columns.
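
Both tidiness steps can be sketched together on made-up data: pull one field out of the nested `user` dictionary, then merge the API data into the archive on the tweet ID. The sample values, the `user_id` column name, and the choice of an inner join are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-ins for the archive and the API data.
archive = pd.DataFrame({"tweet_id": [1, 2, 3], "rating_numerator": [13, 12, 11]})
api_data = pd.DataFrame(
    {
        "id": [1, 3],
        "retweet_count": [5, 9],
        "favorite_count": [10, 30],
        "user": [{"id_str": "42"}, {"id_str": "42"}],
    }
)

# Extract a plain string field from the nested dictionary, then drop the dict column.
api_data["user_id"] = api_data["user"].apply(lambda u: u["id_str"])
api_flat = api_data.drop(columns="user").rename(columns={"id": "tweet_id"})

# An inner join keeps only tweets present in both tables.
merged = archive.merge(api_flat, on="tweet_id", how="inner")
print(merged)
```

An inner join silently drops tweets that failed during the API query; a left join would keep them with missing counts instead.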

### Storing, Analyzing, and Visualizing Data

In [27]:
from sqlalchemy import create_engine
engine = create_engine('sqlite:///tweets.db')

In [28]:
df.to_sql('master', engine, index=False)

In [29]:
df_gather = pd.read_sql('SELECT * FROM master', engine)

In [30]:
df_gather.head(3)

| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 | | | 2017-08-01 16:23:56 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Phineas. He's a mystical boy. Only eve... | | | | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas | | | | |
| 1 | 892177421306343426 | | | 2017-08-01 00:17:27 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Tilly. She's just checking pup on you.... | | | | https://twitter.com/dog_rates/status/892177421... | 13 | 10 | Tilly | | | | |
| 2 | 891815181378084864 | | | 2017-07-31 00:18:03 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Archie. He is a rare Norwegian Pouncin... | | | | https://twitter.com/dog_rates/status/891815181... | 12 | 10 | Archie | | | | |
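
Re-running a `to_sql` cell against a table that already exists raises an error unless `if_exists` is set. The sketch below shows a repeatable round trip on made-up data; it uses an in-memory database through the standard-library `sqlite3` module rather than the SQLAlchemy engine above, which is a substitution made to keep the example self-contained.

```python
import sqlite3

import pandas as pd

# Made-up stand-in for the master table.
df_master = pd.DataFrame({"tweet_id": [1, 2], "rating_numerator": [13, 12]})

conn = sqlite3.connect(":memory:")

# if_exists="replace" lets the cell run more than once without failing.
df_master.to_sql("master", conn, index=False, if_exists="replace")
df_master.to_sql("master", conn, index=False, if_exists="replace")

df_roundtrip = pd.read_sql("SELECT * FROM master", conn)
conn.close()

print(df_roundtrip)
```

The same `if_exists="replace"` argument can be passed when writing through the SQLAlchemy engine.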


Citations:
1. https://stackoverflow.com/questions/47889565/reading-json-objects-from-text-file-into-pandas