# Inferences from Aggregated Data

#### Welcome to the Inferences from Aggregated Data project for *Teaching Privacy*.

**This project is due on 00/00/00.**

This project will have you use Tweepy (a Python library to access the Twitter API) and pytumblr (a Python library to access the Tumblr API).

It is nearly common knowledge that companies and governments buy and sell user data. Less well known is the extent to which these groups collect it. Pulitzer Prize-winning journalist Andrew Lewis tweeted in 2010, "If you are not paying for it, you're not the customer; you're the product being sold."

Users divulge personal data when signing up for accounts, but they can also give information away without being conscious of it. Companies can fully or partially identify users through their device fingerprints, without any account being created or cookies being used. A device fingerprint is composed of the device being used, its operating system, the browser, the browser version, and browser plugins, among other things.
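
To make the idea concrete, here is a minimal sketch of how a handful of device attributes could be combined into a single identifier. It is an illustration only, not how any particular company does it: the `device_fingerprint` function and all attribute values are hypothetical, and real fingerprinting uses many more signals.

```python
import hashlib
import json

def device_fingerprint(attributes):
    """Return a short, stable hash of a dict of device attributes."""
    # Serialize with sorted keys so the same attributes always hash identically.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attribute values, for illustration only.
visitor = {
    "device": "MacBook Pro",
    "os": "macOS 10.11",
    "browser": "Firefox",
    "browser_version": "44.0",
    "plugins": ["Flash", "QuickTime"],
}
same_visitor_later = dict(visitor)
different_visitor = dict(visitor, browser_version="45.0")

# The same attributes give the same fingerprint, with no account or cookie involved.
print(device_fingerprint(visitor) == device_fingerprint(same_visitor_later))
# Changing a single attribute gives a different fingerprint.
print(device_fingerprint(visitor) == device_fingerprint(different_visitor))
```

The more attributes that go into the hash, the fewer devices share the same value, which is why a combination of individually harmless details can single out one user.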

In this project, you will aggregate data about Kai Peroc from Tumblr and Twitter.

Below are their social media profiles: <br>
https://twitter.com/kaiperoc <br>
https://kaiperoc.tumblr.com

## Part 1: Setting up API

### Tumblr API

To use the Tumblr API, we will use pytumblr. To install it, run the following in your terminal: <br>
*sudo pip install oauth oauth2 pytumblr*

### Twitter API

This project will have you use Tweepy, a Python library to access the Twitter API. The easiest way to install it is to run the following in your terminal:

*pip install tweepy*

If this does not work, check the readme file at https://github.com/tweepy/tweepy for the most up-to-date installation instructions.

Now run the cell below to import the appropriate packages.

In [1]:
import pytumblr
import tweepy
from tweepy import TweepError
import json
import re

### Getting Tumblr API Keys

Now we need to get our keys.
1. Go to https://www.tumblr.com/oauth/apps. Create a Tumblr account if you do not have one, or create a new one if you do not want to use your existing account.
2. Click on 'Register application' and fill out the required fields. For the callback URL, you can use https://www.google.com.
3. Under your application, you should see your OAuth consumer key and your secret key. Go to https://api.tumblr.com/console and paste these keys into the appropriate fields to get your OAuth token and OAuth secret key. <br> <br>
When using APIs that require tokens and keys for authentication, it is common practice to keep your keys in a separate JSON file to protect yourself and the application's users. This file should not be posted to public repositories, and you should **never** share your keys. <br> <br>
4. Create a **tumblr_keys.json** file in the same format as below with your keys in the empty quotation marks: <br> <br>
{ <br>
   "consumer_key": "", <br>
   "consumer_secret":  "", <br>
   "oauth_token": "", <br>
   "oauth_secret": "" <br>
}
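
A mistyped or empty entry in the key file usually only shows up later as an authentication error. One way to catch it earlier is to check the file as soon as it is loaded. The sketch below is an illustration under that assumption: `TUMBLR_KEY_NAMES`, `missing_keys`, and `load_keys` are hypothetical helpers written for this project, not part of pytumblr.

```python
import json

# The four entries expected in tumblr_keys.json.
TUMBLR_KEY_NAMES = ("consumer_key", "consumer_secret", "oauth_token", "oauth_secret")

def missing_keys(keys, required):
    """Return the required names that are absent or empty in keys."""
    return [name for name in required if not keys.get(name)]

def load_keys(path, required):
    """Load a JSON key file; raise ValueError if a required key is missing or empty."""
    with open(path) as f:
        keys = json.load(f)
    problems = missing_keys(keys, required)
    if problems:
        raise ValueError("Missing or empty keys in {}: {}".format(path, ", ".join(problems)))
    return keys
```

With this helper, `load_keys('tumblr_keys.json', TUMBLR_KEY_NAMES)` either returns the dictionary of keys or names the entries that still need to be filled in.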

### Getting Twitter API Keys

1. Go to https://apps.twitter.com and click 'Sign In'. If you do not have a Twitter account or do not want to use your current Twitter account, you will have to create one.
2. Click on 'Create New App'.
3. Give your app a Name, Description and a Website. For the website you are allowed to put a placeholder such as https://www.google.com.

Create a new text file named **twitter_keys.json** with the following format:

{ <br>
   "consumer_key":"", <br>
   "consumer_secret":  "", <br>
   "access_token": "", <br>
   "access_token_secret": "" <br>
}
<br>
1. Go to the app you created in the previous step and open the 'Keys and Access Tokens' tab.
2. Copy and paste the tokens and keys into the corresponding fields of your JSON file. <br>
    a. You will have to click 'Create my access token' the first time you create your app. <br>
    b. Make sure you copy and paste the tokens inside the quotation marks.
3. Run the cell below to assign your keys to the appropriate variable.

In [2]:
twitter_keys_file = 'twitter_keys.json'
tumblr_keys_file = 'tumblr_keys.json'
with open(twitter_keys_file) as file:
    twitter_keys = json.load(file)
with open(tumblr_keys_file) as file:
    tumblr_keys = json.load(file)

### Setting up Tumblr API

Now that you have set up your keys, run the cell below to establish your Tumblr API client.

In [3]:
client = pytumblr.TumblrRestClient(
    tumblr_keys['consumer_key'],
    tumblr_keys['consumer_secret'],
    tumblr_keys['oauth_token'],
    tumblr_keys['oauth_secret'],
)
client.info()

{'user': {'blogs': [{'admin': True,
    'ask': True,
    'ask_anon': True,
    'ask_page_title': 'Ask me anything',
    'can_send_fan_mail': True,
    'can_submit': True,
    'can_subscribe': False,
    'description': 'The Office, music, and friends. #CarpeDiem',
    'drafts': 0,
    'facebook': 'N',
    'facebook_opengraph_enabled': 'N',
    'followed': False,
    'followers': 396,
    'is_adult': False,
    'is_blocked_from_primary': False,
    'is_nsfw': False,
    'likes': 8807,
    'messages': 4,
    'name': 'ratchetmessiah',
    'posts': 16636,
    'primary': True,
    'queue': 0,
    'reply_conditions': '1',
    'share_likes': True,
    'submission_page_title': 'Submit a post',
    'submission_terms': {'accepted_types': ['text',
      'photo',
      'quote',
      'link',
      'video'],
     'guidelines': '',
     'tags': [],
     'title': 'Submit a post'},
    'subscribed': False,
    'title': 'Swagalicious Fergalicious Blog',
    'total_posts': 16636,
    'tweet': 'N',
    't

### Setting up Twitter API

Run the cell below to check if you have correctly set up the keys.

In [4]:
try:
    # The key names must match the ones used in twitter_keys.json.
    auth = tweepy.OAuthHandler(twitter_keys["consumer_key"], twitter_keys["consumer_secret"])
    auth.set_access_token(twitter_keys["access_token"], twitter_keys["access_token_secret"])
    api = tweepy.API(auth)
    print("You have correctly set up your API keys. Your username is:", api.auth.get_username())
except TweepError as e:
    print("Tweepy found an error. Revisit your twitter_keys.json file and make sure you have the correct keys.")

You have correctly set up your API keys. Your username is: returnCarlos


## Part 2: Obtaining Data

Now that you have set up the APIs, it is time to see what data you can obtain from Kai's social media accounts.

Start off by grabbing Kai's Tumblr posts. 

##### Hint: Not all posts have the same fields. Look for the body or the caption. Also, use a regex to remove the HTML tags from each post before appending it to your empty list.

In [5]:
tumblr_posts = []
tag_pattern = re.compile('<.*?>')  # matches any HTML tag
tumblr_call = client.posts("kaiperoc")
for post in tumblr_call['posts']:
    # Text posts keep their content in 'body'; other post types use 'caption'.
    raw_text = post.get('body') or post.get('caption', '')
    tumblr_posts.append(re.sub(tag_pattern, '', raw_text))
tumblr_posts

['Just got an iPhone 6S; text me!\xa0605-475-6961',
 'Woo! Accepted to UC Berkeley Class of 2019! Proud to be a bear! \u202a#\u200eBerkeleyBound\u202c']
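
The output above still contains a non-breaking space (`\xa0`) and invisible formatting characters (`\u202a`, `\u200e`, `\u202c`), which can get in the way of later text matching. The sketch below shows one possible way to normalize such text using only the standard library; `clean_text` is a hypothetical helper, not part of pytumblr, and the sample string is adapted from the post above.

```python
import html
import re
import unicodedata

TAG_PATTERN = re.compile(r"<[^>]+>")

def clean_text(raw):
    """Strip HTML tags, decode entities, and remove invisible formatting characters."""
    text = TAG_PATTERN.sub("", raw)
    # Decode HTML entities, e.g. &amp; becomes & and &nbsp; becomes \xa0.
    text = html.unescape(text)
    # Drop Unicode "format" characters (category Cf) such as \u202a and \u200e.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Collapse every whitespace run (including \xa0) into a single space.
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("<p>Proud to be a bear! \u202a#\u200eBerkeleyBound\u202c</p>"))
# prints: Proud to be a bear! #BerkeleyBound
```

Removing the hidden characters matters here because a pattern such as `#\w+` would otherwise fail to match the hashtag in the Tumblr post.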

Now grab Kai's most recent tweets and store them in a list.

In [6]:
# Social media site 2: fetch up to 100 of Kai's most recent tweets
twitter_posts = []
twitter_call = api.user_timeline(screen_name="kaiperoc", count=100)
for tweet in twitter_call:
    twitter_posts.append(tweet.text)
twitter_posts

['Fellow incoming #berkeley #classof19 there is a great sandwich spot on Shattuck called The Sandwich Spot! http://t.co/OJXDIbU8Mp',
 "Can't wait to see it live in person!  https://t.co/5e1kePRzXK",
 "Cut out of work early last night to go to the A's game. Totally worth it! #athletics #stomper http://t.co/zOLEhPmvHD",
 'I guess the question Tilt asked all those years ago is finally answered. It was condemned. https://t.co/U8Coat3L1w #berkeleypier',
 "So proud to be a part of the class of '19! #berkeleybound https://t.co/3NwpZS7B0o",
 '@TheBerkStaff @Student_Store I love walking around this beautiful campus on a gorgeous summer day! So many school colors on show! #gobears',
 'Ugh, parking around campus. #amiright #berkeley',
 'Good stuff! https://t.co/1zTyjHoKot',
 'Have you heard about CRISPR?! It could be a cure of EVERYTHING! #science #is #rad  https://t.co/J6BKr5GH6V',
 'omgosh so scary! What if you were in the presence of this guy?! #LionsTigersAndBears #OhMy! https://t.co/5bMClgYw

## Part 3: Aggregate Data

Combine the tweets and the Tumblr posts into a single list.

In [11]:
# Aggregate data: tweets first, then Tumblr posts
agg_posts = twitter_posts + tumblr_posts

agg_posts

['Fellow incoming #berkeley #classof19 there is a great sandwich spot on Shattuck called The Sandwich Spot! http://t.co/OJXDIbU8Mp',
 "Can't wait to see it live in person!  https://t.co/5e1kePRzXK",
 "Cut out of work early last night to go to the A's game. Totally worth it! #athletics #stomper http://t.co/zOLEhPmvHD",
 'I guess the question Tilt asked all those years ago is finally answered. It was condemned. https://t.co/U8Coat3L1w #berkeleypier',
 "So proud to be a part of the class of '19! #berkeleybound https://t.co/3NwpZS7B0o",
 '@TheBerkStaff @Student_Store I love walking around this beautiful campus on a gorgeous summer day! So many school colors on show! #gobears',
 'Ugh, parking around campus. #amiright #berkeley',
 'Good stuff! https://t.co/1zTyjHoKot',
 'Have you heard about CRISPR?! It could be a cure of EVERYTHING! #science #is #rad  https://t.co/J6BKr5GH6V',
 'omgosh so scary! What if you were in the presence of this guy?! #LionsTigersAndBears #OhMy! https://t.co/5bMClgYw
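
A flat list loses track of which service each post came from. If that matters for your analysis, one option is to label each post with its source before combining. This is a sketch only: `tag_posts` is a hypothetical helper, and the two sample lists stand in for `twitter_posts` and `tumblr_posts`.

```python
def tag_posts(posts, source):
    """Pair each post's text with the name of the service it came from."""
    return [{"source": source, "text": text} for text in posts]

# Short stand-in lists; in the notebook these would be twitter_posts and tumblr_posts.
sample_tweets = ["Ugh, parking around campus. #amiright #berkeley"]
sample_tumblr = ["Woo! Accepted to UC Berkeley Class of 2019!"]

labeled_posts = tag_posts(sample_tweets, "twitter") + tag_posts(sample_tumblr, "tumblr")
for post in labeled_posts:
    print(post["source"], "->", post["text"])
```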

## Part 4: New Context

Now that you have aggregated this user's data from multiple services, what inferences can you make about them, their interests, affiliations, and personality? What has changed with this new context? How could you, or a company, use this bigger picture of the user?
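
As a starting point for these questions, a simple count of hashtags and @-mentions already hints at interests and affiliations. The sketch below is one possible approach, not a required method: `summarize` is a hypothetical helper, and the sample posts are shortened versions of the tweets shown earlier.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def summarize(posts):
    """Count hashtags and @-mentions (case-insensitively) across a list of posts."""
    hashtags = Counter()
    mentions = Counter()
    for text in posts:
        hashtags.update(tag.lower() for tag in HASHTAG.findall(text))
        mentions.update(name.lower() for name in MENTION.findall(text))
    return hashtags, mentions

sample_posts = [
    "Ugh, parking around campus. #amiright #berkeley",
    "Fellow incoming #berkeley #classof19 there is a great sandwich spot on Shattuck",
    "@TheBerkStaff @Student_Store I love walking around this beautiful campus! #gobears",
]
hashtags, mentions = summarize(sample_posts)
print(hashtags.most_common(1))  # [('berkeley', 2)]
print(sorted(mentions))         # ['student_store', 'theberkstaff']
```

Running the same function on the full aggregated list shows which topics recur across both services.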

### Student Answer
Answers should touch on:
1. A fuller picture of the user.
2. We can see that they own an iPhone.
3. Lorem ipsum

## Part 5: Conclusion

lorem ipsum