## Wrangling details 
### Gathering Data for this Project

### Assessing Data for this Project
After gathering each of the above pieces of data, assess them visually and programmatically for quality and tidiness issues. Detect and document at least eight (8) quality issues and two (2) tidiness issues in your wrangle_act.ipynb Jupyter Notebook. To meet specifications, the issues that satisfy the Project Motivation (see the Key Points header on the previous page) must be assessed.
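
A minimal sketch of the programmatic side of assessment, assuming pandas is available. The rows below are made-up sample data, not values from the real archive, and the expectation that a rating denominator is normally 10 is an assumption used only for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the gathered files.
sample = pd.DataFrame({
    "tweet_id": [1, 2, 2, 4],
    "name": ["Phineas", None, None, "a"],
    "rating_denominator": [10, 10, 10, 0],
})

# Completeness: count missing values in each column.
missing_per_column = sample.isnull().sum()

# Uniqueness: each tweet_id should appear only once.
duplicate_ids = int(sample["tweet_id"].duplicated().sum())

# Validity: flag denominators that differ from 10 (assumed to be the usual value).
odd_denominators = sample[sample["rating_denominator"] != 10]

print(missing_per_column)
print("duplicate tweet_id rows:", duplicate_ids)
print(odd_denominators)
```

Each check maps to one documented issue, which makes it easier to write the matching cleaning step later.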

### Cleaning Data for this Project
Clean each of the issues you documented while assessing. Perform this cleaning in wrangle_act.ipynb as well. The result should be a high quality and tidy master pandas DataFrame (or DataFrames, if appropriate). Again, the issues that satisfy the Project Motivation must be cleaned.
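
One way to organize cleaning is a define / code / test pass per issue, always working on a copy. The sketch below assumes pandas and uses made-up rows; the column names `timestamp` and `retweeted_status_id` match the archive, while the choice to drop retweets is an illustrative assumption.

```python
import pandas as pd

# Hypothetical rows using column names from the Twitter archive.
raw = pd.DataFrame({
    "tweet_id": [101, 102, 103],
    "timestamp": ["2017-08-01 16:23:56 +0000",
                  "2017-08-01 00:17:27 +0000",
                  "2017-07-31 00:18:03 +0000"],
    "retweeted_status_id": [None, 5.0e17, None],
})

# Define: retweets are mixed in with original tweets.
# Code: keep rows without a retweeted_status_id, on a copy of the raw data.
clean = raw[raw["retweeted_status_id"].isnull()].copy()
clean = clean.drop(columns=["retweeted_status_id"])

# Define: timestamp is stored as a string.
# Code: convert it to a timezone-aware datetime.
clean["timestamp"] = pd.to_datetime(clean["timestamp"], utc=True)

# Define: tweet_id is an identifier, not a quantity.
# Code: store it as a string.
clean["tweet_id"] = clean["tweet_id"].astype(str)

# Test: inspect the result.
print(clean.dtypes)
print(len(raw), "->", len(clean), "rows")
```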

## Other
### Storing, Analyzing, and Visualizing Data for this Project
Store the clean DataFrame(s) in a CSV file with the main one named twitter_archive_master.csv. If additional files exist because multiple tables are required for tidiness, name these files appropriately. Additionally, you may store the cleaned data in a SQLite database (which is to be submitted as well if you do).
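
A hedged sketch of the storing step, assuming pandas is available. The DataFrame contents, the temporary output directory, the `.db` file name, and the table name are all assumptions made for illustration; only `twitter_archive_master.csv` comes from the project text.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Hypothetical cleaned data standing in for the real master DataFrame.
master = pd.DataFrame({
    "tweet_id": ["892420643555336193", "892177421306343426"],
    "rating_numerator": [13, 13],
})

# A temporary directory keeps this sketch away from real project files.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "twitter_archive_master.csv")
db_path = os.path.join(out_dir, "twitter_archive_master.db")  # assumed file name

# index=False avoids writing the row index as an extra unnamed column.
master.to_csv(csv_path, index=False)

# Optional: store the same table in a SQLite database.
conn = sqlite3.connect(db_path)
master.to_sql("twitter_archive_master", conn, index=False, if_exists="replace")
conn.commit()

# Read both back; dtype=str keeps the long IDs as text in the CSV round trip.
from_csv = pd.read_csv(csv_path, dtype={"tweet_id": str})
from_db = pd.read_sql("SELECT * FROM twitter_archive_master", conn)
conn.close()

print(from_csv.shape, from_db.shape)
```

Reading the files back right after writing is a cheap way to confirm that the IDs were not altered on the way to disk.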

Analyze and visualize your wrangled data in your wrangle_act.ipynb Jupyter Notebook. At least three (3) insights and one (1) visualization must be produced.

### Reporting for this Project
Create a 300-600 word written report called wrangle_report.pdf or wrangle_report.html that briefly describes your wrangling efforts. This is to be framed as an internal document.

Create a 250-word-minimum written report called act_report.pdf or act_report.html that communicates the insights and displays the visualization(s) produced from your wrangled data. This is to be framed as an external document, like a blog post or magazine article, for example.

Both of these documents can be created in separate Jupyter Notebooks using the Markdown functionality of Jupyter Notebooks, then downloaded as PDF or HTML files. Alternatively, you may prefer a word processor such as Google Docs or Microsoft Word.
********

# Wrangle and Analyze Data
This project aims to use Twitter data to create interesting and trustworthy analysis and visualizations. 


## Table of Contents
<ul>
<li><a href="#Intro"> Part I: Introduction</a></li>
 <ul>
    <li><a href="#Datasource">1. Data Sources</a></li>
    <li><a href="#libraries"> 2. Frameworks and Libraries</a></li>
 </ul>
<li><a href="#T2">Part II: Data Wrangling</a></li>
<ul><li><a href="#T2_1">1. Data Gathering</a></li>
    <li><a href="#T2_2">2. Data Assessing</a></li>
    <li><a href="#T2_3">3. Data Cleaning</a></li>
</ul>
<li><a href="#Summary">Part III: Summary</a></li>
<ul><li><a href="#Visuals">1. Visualizations</a></li>
    <li><a href="#Conclusions">2. Conclusions</a></li>
</ul>
</ul>

<a id='Intro'></a>
## Part I: Introduction
<a id='Datasource'></a>
### 1. Data Sources
1. `twitter_archive_enhanced.csv`:   
The WeRateDogs Twitter archive [twitter_archive_enhanced.csv](https://d17h27t6h515a5.cloudfront.net/topher/2017/August/59a4e958_twitter-archive-enhanced/twitter-archive-enhanced.csv)

2. `image_predictions.tsv`:  
The tweet image predictions, i.e., what breed of dog (or other object, animal, etc.) is present in each tweet according to a neural network. This file (image_predictions.tsv) is hosted on Udacity's servers and should be downloaded programmatically using the Requests library and the [URL](https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv).

3. `tweet_json.txt`:   
Each tweet's retweet count and favorite ("like") count at minimum, and any additional data you find interesting. Using the tweet IDs in the WeRateDogs Twitter archive, query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file called tweet_json.txt. Each tweet's JSON data should be written to its own line. Then read this .txt file line by line into a pandas DataFrame with (at minimum) tweet ID, retweet count, and favorite count.
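
The line-by-line read described in item 3 can be sketched with the standard library alone. The two JSON lines below are made-up stand-ins for `tweet_json.txt`, and the output key `tweet_id` is a naming assumption; the input keys `id`, `retweet_count`, and `favorite_count` are the ones the Twitter API returns in each tweet object.

```python
import io
import json

# Two hypothetical lines standing in for tweet_json.txt (one JSON object per line).
fake_file = io.StringIO(
    '{"id": 1, "retweet_count": 7, "favorite_count": 30}\n'
    '\n'
    '{"id": 2, "retweet_count": 3, "favorite_count": 12}\n'
)

records = []
for line in fake_file:
    line = line.strip()
    if not line:
        # Skip blank lines instead of failing on them.
        continue
    tweet = json.loads(line)  # parse the JSON text into a dict
    records.append({
        "tweet_id": tweet["id"],
        "retweet_count": tweet["retweet_count"],
        "favorite_count": tweet["favorite_count"],
    })

print(records)
```

Passing `records` to `pd.DataFrame(records)` would then give one row per tweet with the three required columns.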

<a id='libraries'></a>
### 2. Frameworks and Libraries

In [32]:
import os
import pandas as pd
import numpy as np 
import requests 
import tweepy 
import json
import matplotlib.pyplot as plt 
import re 
import configparser
%matplotlib inline

**** 

<a id='T2'></a>
## Part II: Data Wrangling

<a id='T2_1'></a>
### 1. Data Gathering

#### Steps:
1. Read local data `twitter_archive_enhanced.csv`.   
2. Access online `image_predictions.tsv` data using the Requests library.   
3. Query Twitter API for `tweet_json.txt` data. 

#### 1.1  Read twitter_archive_enhanced data

In [18]:
df_twitter =  pd.read_csv(r'./Data/twitter-archive-enhanced.csv')
df_twitter.head(2)

|  | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 |  |  | 2017-08-01 16:23:56 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Phineas. He's a mystical boy. Only eve... |  |  |  | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas |  |  |  |  |
| 1 | 892177421306343426 |  |  | 2017-08-01 00:17:27 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Tilly. She's just checking pup on you.... |  |  |  | https://twitter.com/dog_rates/status/892177421... | 13 | 10 | Tilly |  |  |  |  |


#### 1.2 Read image_prediction data

In [26]:
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv' 

# get the response and stop early if the download failed
response = requests.get(url)
response.raise_for_status()

# write the raw bytes to a TSV file
with open(r'./Data/image-prediction.tsv', mode="wb") as file:
    file.write(response.content)

In [29]:
# read the tsv file into dataframe 
df_image = pd.read_csv(r'./Data/image-prediction.tsv', sep = '\t')
df_image.head(2)

|  | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | Welsh_springer_spaniel | 0.465074 | True | collie | 0.156665 | True | Shetland_sheepdog | 0.061428 | True |
| 1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | redbone | 0.506826 | True | miniature_pinscher | 0.074192 | True | Rhodesian_ridgeback | 0.07201 | True |


#### 1.3  tweet_json.txt from Twitter API

In [38]:
from config import consumer_key, consumer_secret, access_token, access_token_secret

# access the API 
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
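
The cell above imports its credentials from a local `config` module. As an alternative sketch, the `configparser` module imported earlier could load them from an INI file instead. The `[twitter]` section name, the key layout, and the placeholder values are all hypothetical.

```python
import configparser

# Hypothetical INI content; a real file would hold actual keys
# and stay out of version control.
INI_TEXT = """
[twitter]
consumer_key = YOUR_CONSUMER_KEY
consumer_secret = YOUR_CONSUMER_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET
"""

parser = configparser.ConfigParser()
# read_string parses text held in memory; parser.read(path) would read a file on disk.
parser.read_string(INI_TEXT)

# Copy the section into a plain dict for easy lookup.
credentials = dict(parser["twitter"])
print(sorted(credentials))
```

Either approach keeps the secrets out of the notebook itself, which matters if the notebook is shared or submitted.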

In [49]:
# get the tweet IDs in the Twitter archive then query Twitter API for tweet's JSON data 
twitter_ids = df_twitter.tweet_id.unique().tolist()

In [52]:
# -----------Query Twitter API for JSON data 
# with open(r'./Data/tweet_json.txt',"w") as file: 
#     for ele in twitter_ids: 
#         print(f"Gather id: {ele}") 
#         try:
#             #get all the twitter status 
#             tweet = api.get_status(ele, tweet_mode = "extended")
#             #dump the json data to file
#             json.dump(tweet._json, file)
#             #add a linebreak after each dump
#             file.write('\n')
#         except Exception as e:
#             print(f"Error - id: {ele}" + str(e))

Gather id: 892420643555336193
Gather id: 892177421306343426
Gather id: 891815181378084864
Gather id: 891689557279858688
Gather id: 891327558926688256
Gather id: 891087950875897856
Gather id: 890971913173991426
Gather id: 890729181411237888
Gather id: 890609185150312448
Gather id: 890240255349198849
Gather id: 890006608113172480
Gather id: 889880896479866881
Gather id: 889665388333682689
Gather id: 889638837579907072
Gather id: 889531135344209921
Gather id: 889278841981685760
Gather id: 888917238123831296
Gather id: 888804989199671297
Gather id: 888554962724278272
Gather id: 888202515573088257
Error - id: 888202515573088257[{'code': 144, 'message': 'No status found with that ID.'}]
Gather id: 888078434458587136
Gather id: 887705289381826560
Gather id: 887517139158093824
Gather id: 887473957103951883
Gather id: 887343217045368832
Gather id: 887101392804085760
Gather id: 886983233522544640
Gather id: 886736880519319552
Gather id: 886680336477933568
Gather id: 886366144734445568
Gather id:

Error - id: 845459076796616705[{'code': 144, 'message': 'No status found with that ID.'}]
Gather id: 845397057150107648
Gather id: 845306882940190720
Gather id: 845098359547420673
Gather id: 844979544864018432
Gather id: 844973813909606400
Gather id: 844704788403113984
Error - id: 844704788403113984[{'code': 144, 'message': 'No status found with that ID.'}]
Gather id: 844580511645339650
Gather id: 844223788422217728
Gather id: 843981021012017153
Gather id: 843856843873095681
Gather id: 843604394117681152
Gather id: 843235543001513987
Gather id: 842892208864923648
Error - id: 842892208864923648[{'code': 144, 'message': 'No status found with that ID.'}]
Gather id: 842846295480000512
Gather id: 842765311967449089
Gather id: 842535590457499648
Gather id: 842163532590374912
Gather id: 842115215311396866
Gather id: 841833993020538882
Gather id: 841680585030541313
Gather id: 841439858740625411
Gather id: 841320156043304961
Gather id: 841314665196081154
Gather id: 841077006473256960
Gather id:

Gather id: 813096984823349248
Gather id: 813081950185472002
Gather id: 813066809284972545
Gather id: 813051746834595840
Gather id: 812781120811126785
Gather id: 812747805718642688
Error - id: 812747805718642688[{'code': 144, 'message': 'No status found with that ID.'}]
Gather id: 812709060537683968
Gather id: 812503143955202048
Gather id: 812466873996607488
Gather id: 812372279581671427
Gather id: 811985624773361665
Gather id: 811744202451197953
Gather id: 811647686436880384
Gather id: 811627233043480576
Gather id: 811386762094317568
Gather id: 810984652412424192
Gather id: 810896069567610880
Gather id: 810657578271330305
Gather id: 810284430598270976
Gather id: 810254108431155201
Gather id: 809920764300447744
Gather id: 809808892968534016
Gather id: 809448704142938112
Gather id: 809220051211603969
Gather id: 809084759137812480
Gather id: 808838249661788160
Gather id: 808733504066486276
Gather id: 808501579447930884
Gather id: 808344865868283904
Gather id: 808134635716833280
Gather id:

Gather id: 777641927919427584
Gather id: 777621514455814149
Gather id: 777189768882946048
Gather id: 776819012571455488
Gather id: 776813020089548800
Gather id: 776477788987613185
Gather id: 776249906839351296
Gather id: 776218204058357768
Gather id: 776201521193218049
Gather id: 776113305656188928
Gather id: 776088319444877312
Gather id: 775898661951791106
Gather id: 775842724423557120
Gather id: 775733305207554048
Gather id: 775729183532220416
Gather id: 775364825476165632
Gather id: 775350846108426240
Gather id: 775096608509886464
Error - id: 775096608509886464[{'code': 144, 'message': 'No status found with that ID.'}]
Gather id: 775085132600442880
Gather id: 774757898236878852
Gather id: 774639387460112384
Gather id: 774314403806253056
Gather id: 773985732834758656
Gather id: 773922284943896577
Gather id: 773704687002451968
Gather id: 773670353721753600
Gather id: 773547596996571136
Gather id: 773336787167145985
Gather id: 773308824254029826
Gather id: 773247561583001600
Gather id:

Rate limit reached. Sleeping for: 767


Gather id: 758474966123810816
Gather id: 758467244762497024
Gather id: 758405701903519748
Gather id: 758355060040593408
Gather id: 758099635764359168
Gather id: 758041019896193024
Gather id: 757741869644341248
Gather id: 757729163776290825
Gather id: 757725642876129280
Gather id: 757611664640446465
Gather id: 757597904299253760
Gather id: 757596066325864448
Gather id: 757400162377592832
Gather id: 757393109802180609
Gather id: 757354760399941633
Gather id: 756998049151549440
Gather id: 756939218950160384
Gather id: 756651752796094464
Gather id: 756526248105566208
Gather id: 756303284449767430
Gather id: 756288534030475264
Gather id: 756275833623502848
Gather id: 755955933503782912
Gather id: 755206590534418437
Gather id: 755110668769038337
Gather id: 754874841593970688
Gather id: 754856583969079297
Gather id: 754747087846248448
Gather id: 754482103782404096
Gather id: 754449512966619136
Gather id: 754120377874386944
Gather id: 754011816964026368
Error - id: 754011816964026368[{'code': 

Gather id: 720340705894408192
Gather id: 720059472081784833
Gather id: 720043174954147842
Gather id: 719991154352222208
Gather id: 719704490224398336
Gather id: 719551379208073216
Gather id: 719367763014393856
Gather id: 719339463458033665
Gather id: 719332531645071360
Gather id: 718971898235854848
Gather id: 718939241951195136
Gather id: 718631497683582976
Gather id: 718613305783398402
Gather id: 718540630683709445
Gather id: 718460005985447936
Gather id: 718454725339934721
Gather id: 718246886998687744
Gather id: 718234618122661888
Gather id: 717841801130979328
Gather id: 717790033953034240
Gather id: 717537687239008257
Gather id: 717428917016076293
Gather id: 717421804990701568
Gather id: 717047459982213120
Gather id: 717009362452090881
Gather id: 716802964044845056
Gather id: 716791146589110272
Gather id: 716730379797970944
Gather id: 716447146686459905
Gather id: 716439118184652801
Gather id: 716285507865542656
Gather id: 716080869887381504
Gather id: 715928423106027520
Gather id:

Gather id: 696405997980676096
Gather id: 696100768806522880
Gather id: 695816827381944320
Gather id: 695794761660297217
Gather id: 695767669421768709
Gather id: 695629776980148225
Gather id: 695446424020918272
Gather id: 695409464418041856
Gather id: 695314793360662529
Gather id: 695095422348574720
Gather id: 695074328191332352
Gather id: 695064344191721472
Gather id: 695051054296211456
Gather id: 694925794720792577
Gather id: 694905863685980160
Gather id: 694669722378485760
Gather id: 694356675654983680
Gather id: 694352839993344000
Gather id: 694342028726001664
Gather id: 694329668942569472
Gather id: 694206574471057408
Gather id: 694183373896572928
Gather id: 694001791655137281
Gather id: 693993230313091072
Gather id: 693942351086120961
Gather id: 693647888581312512
Gather id: 693644216740769793
Gather id: 693642232151285760
Gather id: 693629975228977152
Gather id: 693622659251335168
Gather id: 693590843962331137
Gather id: 693582294167244802
Gather id: 693486665285931008
Gather id:

Gather id: 680115823365742593
Gather id: 680100725817409536
Gather id: 680085611152338944
Gather id: 680070545539371008
Gather id: 680055455951884288
Error - id: 680055455951884288[{'code': 144, 'message': 'No status found with that ID.'}]
Gather id: 679877062409191424
Gather id: 679872969355714560
Gather id: 679862121895714818
Gather id: 679854723806179328
Gather id: 679844490799091713
Gather id: 679828447187857408
Gather id: 679777920601223168
Gather id: 679736210798047232
Gather id: 679729593985699840
Gather id: 679722016581222400
Gather id: 679530280114372609
Gather id: 679527802031484928
Gather id: 679511351870550016
Gather id: 679503373272485890
Gather id: 679475951516934144
Gather id: 679462823135686656
Gather id: 679405845277462528
Gather id: 679158373988876288
Gather id: 679148763231985668
Gather id: 679132435750195208
Gather id: 679111216690831360
Gather id: 679062614270468097
Gather id: 679047485189439488
Gather id: 679001094530465792
Gather id: 678991772295516161
Gather id:

Rate limit reached. Sleeping for: 768


Gather id: 676975532580409345
Gather id: 676957860086095872
Gather id: 676949632774234114
Gather id: 676948236477857792
Gather id: 676946864479084545
Gather id: 676942428000112642
Gather id: 676936541936185344
Gather id: 676916996760600576
Gather id: 676897532954456065
Gather id: 676864501615042560
Gather id: 676821958043033607
Gather id: 676819651066732545
Gather id: 676811746707918848
Gather id: 676776431406465024
Gather id: 676617503762681856
Gather id: 676613908052996102
Gather id: 676606785097199616
Gather id: 676603393314578432
Gather id: 676593408224403456
Gather id: 676590572941893632
Gather id: 676588346097852417
Gather id: 676582956622721024
Gather id: 676575501977128964
Gather id: 676533798876651520
Gather id: 676496375194980353
Gather id: 676470639084101634
Gather id: 676440007570247681
Gather id: 676430933382295552
Gather id: 676263575653122048
Gather id: 676237365392908289
Gather id: 676219687039057920
Gather id: 676215927814406144
Gather id: 676191832485810177
Gather id:

Gather id: 670842764863651840
Gather id: 670840546554966016
Gather id: 670838202509447168
Gather id: 670833812859932673
Gather id: 670832455012716544
Gather id: 670826280409919488
Gather id: 670823764196741120
Gather id: 670822709593571328
Gather id: 670815497391357952
Gather id: 670811965569282048
Gather id: 670807719151067136
Gather id: 670804601705242624
Gather id: 670803562457407488
Gather id: 670797304698376195
Gather id: 670792680469889025
Gather id: 670789397210615808
Gather id: 670786190031921152
Gather id: 670783437142401025
Gather id: 670782429121134593
Gather id: 670780561024270336
Gather id: 670778058496974848
Gather id: 670764103623966721
Gather id: 670755717859713024
Gather id: 670733412878163972
Gather id: 670727704916926465
Gather id: 670717338665226240
Gather id: 670704688707301377
Gather id: 670691627984359425
Gather id: 670679630144274432
Gather id: 670676092097810432
Gather id: 670668383499735048
Gather id: 670474236058800128
Gather id: 670468609693655041
Gather id:

Gather id: 666051853826850816
Gather id: 666050758794694657
Gather id: 666049248165822465
Gather id: 666044226329800704
Gather id: 666033412701032449
Gather id: 666029285002620928
Gather id: 666020888022790149
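
The log above shows that some IDs fail with "No status found with that ID." A possible refinement, sketched here with the standard library only, is to collect the failed IDs in a list so they can be documented during assessment. The `gather` helper and the `fake_fetch` stand-in are hypothetical; in the notebook the fetch step would be the Tweepy call.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def gather(ids: Iterable[int],
           fetch: Callable[[int], dict]) -> Tuple[Dict[int, dict], List[int]]:
    """Fetch every ID, returning the successes and the IDs that failed."""
    fetched: Dict[int, dict] = {}
    failed: List[int] = []
    for tweet_id in ids:
        try:
            fetched[tweet_id] = fetch(tweet_id)
        except Exception:
            # Keep going; remember the ID so the gap can be reported later.
            failed.append(tweet_id)
    return fetched, failed


def fake_fetch(tweet_id: int) -> dict:
    """Hypothetical stand-in for the API call: ID 2 acts like a deleted tweet."""
    if tweet_id == 2:
        raise LookupError("No status found with that ID.")
    return {"id": tweet_id}


fetched, failed = gather([1, 2, 3], fake_fetch)
print(len(fetched), "fetched; failed IDs:", failed)
```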


In [None]:
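# Fields to extract from each tweet's JSON (source key in parentheses):
#   created_at      ("created_at")
#   twitter_id      ("id")
#   retweet_count   ("retweet_count")
#   favorite_count  ("favorite_count")
#   retweeted       ("retweeted")
#   text_range      ("display_text_range")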

In [79]:
# alias the JSON literals so the pasted JSON below evaluates as a Python dict
false = False
null = np.nan
true = True
sample_object = {"created_at": "Sun Jul 30 15:58:51 +0000 2017", "id": 891689557279858688, "id_str": "891689557279858688", "full_text": "This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ", "truncated": false, "display_text_range": [0, 79], "entities": {"hashtags": [], "symbols": [], "user_mentions": [], "urls": [], "media": [{"id": 891689552724799489, "id_str": "891689552724799489", "indices": [80, 103], "media_url": "http://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg", "media_url_https": "https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg", "url": "https://t.co/tD36da7qLQ", "display_url": "pic.twitter.com/tD36da7qLQ", "expanded_url": "https://twitter.com/dog_rates/status/891689557279858688/photo/1", "type": "photo", "sizes": {"thumb": {"w": 150, "h": 150, "resize": "crop"}, "small": {"w": 510, "h": 680, "resize": "fit"}, "medium": {"w": 901, "h": 1200, "resize": "fit"}, "large": {"w": 1201, "h": 1600, "resize": "fit"}}}]}, "extended_entities": {"media": [{"id": 891689552724799489, "id_str": "891689552724799489", "indices": [80, 103], "media_url": "http://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg", "media_url_https": "https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg", "url": "https://t.co/tD36da7qLQ", "display_url": "pic.twitter.com/tD36da7qLQ", "expanded_url": "https://twitter.com/dog_rates/status/891689557279858688/photo/1", "type": "photo", "sizes": {"thumb": {"w": 150, "h": 150, "resize": "crop"}, "small": {"w": 510, "h": 680, "resize": "fit"}, "medium": {"w": 901, "h": 1200, "resize": "fit"}, "large": {"w": 1201, "h": 1600, "resize": "fit"}}}]}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": {"id": 4196983835, "id_str": "4196983835", "name": "WeRateDogs\u00ae", "screen_name": "dog_rates", 
"location": "\u300c DM YOUR DOGS \u300d", "description": "Your Only Source For Professional Dog Ratings Instagram and Facebook \u27aa WeRateDogs partnerships@weratedogs.com \u2800\u2800\u2800\u2800\u2800\u2800\u2800\u2800\u2800\u2800\u2800\u2800", "url": "https://t.co/Wrvtpnv7JV", "entities": {"url": {"urls": [{"url": "https://t.co/Wrvtpnv7JV", "expanded_url": "https://blacklivesmatters.carrd.co", "display_url": "blacklivesmatters.carrd.co", "indices": [0, 23]}]}, "description": {"urls": []}}, "protected": false, "followers_count": 8784297, "friends_count": 17, "listed_count": 5614, "created_at": "Sun Nov 15 21:41:29 +0000 2015", "favourites_count": 145966, "utc_offset": null, "time_zone": null, "geo_enabled": true, "verified": true, "statuses_count": 12428, "lang": null, "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "000000", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, "profile_image_url": "http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/4196983835/1591077312", "profile_link_color": "F5ABB5", "profile_sidebar_border_color": "000000", "profile_sidebar_fill_color": "000000", "profile_text_color": "000000", "profile_use_background_image": false, "has_extended_profile": false, "default_profile": false, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 7817, "favorite_count": 39263, "favorited": False, "retweeted": False, "possibly_sensitive": False, 
"possibly_sensitive_appealable": False, "lang": "en"}

In [82]:
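df_api = []

with open(r'./Data/tweet_json.txt', "r") as file:
    for line in file:
        # each line is a JSON string; indexing it directly raises
        # "TypeError: string indices must be integers", so parse it first
        tweet = json.loads(line)
        df_api.append({"tweet_id": tweet["id"],
                       "created_at": tweet["created_at"],
                       "retweet_count": tweet["retweet_count"],
                       "favorite_count": tweet["favorite_count"]})

# build the DataFrame from the collected records
df_api = pd.DataFrame(df_api)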

<a id='T2_2'></a>
### 2. Data Assessing

**** 


<a id='T2_3'></a>
### 3. Data Cleaning

<a id='Summary'></a>
## Part III: Summary

<a id='Visuals'></a>
### 1. Visualizations

<a id='Conclusions'></a>
### 2. Conclusions


**** 