# A picture says more than a thousand hashtags

Hashtags have had an impressive career: in the beginning they were merely a tool to index free text, but the leading pound sign (yes, that's what it was called back in the day) has risen to become the symbol of a generation, one that can carry powerful political messages (it can also be used for far less impactful purposes, but that's a story for another rainy day).

Coining a hashtag can have great benefits for political parties or companies: a hashtag offers a proxy for measuring a product's or campaign's impact on social media and an easy tool for tracking trends, geographical distribution and connotations.

Sentiment analysis of tweets and other free-text messages abounds, yet nowadays many users rely on photos to convey feelings and messages.

Here I demonstrate how to use the Instagram API and image recognition to see what motifs users associate with a hashtag. We first download the pre-trained Inception V3 network, which is trained on ImageNet to recognize 1000 different concepts. Using Instagram's API, we load images that users annotated with a certain tag, e.g. a company slogan or a political statement. These images are then analyzed with the neural network.

## Tech

Images are retrieved from [Instagram](http://instagram.com) and then classified with Google's [Inception V3](https://www.tensorflow.org/tutorials/image_recognition) neural network.

The implementation is built in Python, using _requests_, _PIL_ (or its fork Pillow on Python 3), and, of course, TensorFlow.

> **A note on running this notebook:**
>
> It is recommended to install Tensorflow in a virtual environment. Read [this post](http://anbasile.github.io/programming/2017/06/25/jupyter-venv/) on how to run Jupyter notebooks in virtualenvs.

## Setup

### Download pre-trained Inception-v3 model

```sh
# Download the checkpoint archive, unpack it, and move the checkpoint into place
curl http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz -o OUTFILE
tar -xvf OUTFILE
mv inception_v3.ckpt CHECKPOINT_DIR
```


In [None]:
import pickle
import urllib.request
from io import BytesIO

import requests
import numpy as np
from PIL import Image

import tensorflow.contrib.slim as slim
import tensorflow as tf
from tensorflow.contrib.slim.nets import inception

Verify import:

In [None]:
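# Accessing an attribute fails loudly if slim was not imported correctly;
# a separate name avoids shadowing Python's built-in eval()
evaluate_once = slim.evaluation.evaluate_once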

Set constants and reset any previously built graph:

In [None]:
checkpointfile = '/Users/andreashelfenstein/InceptV3-trained/inception_v3.ckpt' # Adjust to your own CHECKPOINT_DIR
im_size = 299 # Height and width of images for Inception

In [None]:
tf.reset_default_graph()

Create session and load pre-trained network:

In [None]:
sess = tf.Session()

inception.inception_v3.default_image_size = im_size
inception_v3 = inception.inception_v3
arg_scope = inception.inception_v3_arg_scope()
inputs = tf.placeholder(tf.float32, (None, im_size, im_size, 3))

with slim.arg_scope(arg_scope):
    logits, end_points = inception_v3(inputs, num_classes=1001,
                                      is_training=False)

saver = tf.train.Saver()

In [None]:
saver.restore(sess, checkpointfile)

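Download the plain-text labels for the ImageNet dataset from [here](https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a). The file holds the 1000 ImageNet classes, keyed 0 to 999, while the checkpoint predicts 1000 + 1 classes with index 0 reserved for a background class, so the labels are shifted up by one:

In [None]:
imagenet_labels = pickle.load(urllib.request.urlopen('https://gist.githubusercontent.com/yrevar/6135f1bd8dcf2e0cc683/raw/d133d61a09d7e5a3b36b8c111a8dd5c4b5d560ee/imagenet1000_clsid_to_human.pkl') )
# Shift every label up by one so that index 0 is free for the background class
labels = {idx + 1: name for idx, name in imagenet_labels.items()}
labels[0] = 'unused background'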

Prediction function:

In [None]:
def predict(img_array):
    """
    Predict image content with a pre-trained Inception-V3 network.

    Args:
    img_array (np.array): image as numpy array of dimensions 299 * 299 * 3.
                          RGB values are between 0 and 1 (not 255), so rescale
                          if necessary

    Returns:
    predictions (dict): class probabilities ('predict_values') and raw
                        logits ('logit_values') for each class

    See labels here:
    https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a
    """
    # Add a leading batch dimension so a single image can be fed to the network
    img_array = img_array.reshape(-1, im_size, im_size, 3)
    predict_values, logit_values = sess.run(
        [end_points['Predictions'], logits], feed_dict={inputs: img_array})
    return {'predict_values': predict_values, 'logit_values': logit_values}
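
`predict` returns the full probability vector, but when exploring a hashtag the few most likely classes are often more informative than the single best guess. The following is a minimal sketch of a hypothetical helper, `top_k`, which is not part of the original notebook. It assumes `probs` is one row of `predict_values` (any sequence of floats works) and that `labels` maps class indices to names; the toy values are made up.

```python
def top_k(probs, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first.

    Hypothetical helper, not part of the original notebook.
    probs:  1-D sequence of class probabilities (e.g. predict_values[0])
    labels: dict mapping class index -> human-readable name
    """
    # Sort class indices by descending probability and keep the first k
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Fall back to the bare index if a label is missing
    return [(labels.get(i, str(i)), float(probs[i])) for i in ranked]


# Toy example with made-up probabilities
toy_labels = {0: 'background', 1: 'pizza', 2: 'espresso'}
toy_probs = [0.1, 0.7, 0.2]
print(top_k(toy_probs, toy_labels, k=2))
```

In the notebook, this could be called as `top_k(predictions['predict_values'][0], labels)`.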

## The Instagram API


In order to use the API, you need an Instagram account and a registered client to call the API endpoints.

**Register as a developer**

[https://www.instagram.com/developer/register/](https://www.instagram.com/developer/register/)



The API uses OAuth to authorize access. Your client/app has both a **client_id** and a **client_secret**, both of which you can get from the Developer page >> [manage clients](https://www.instagram.com/developer/clients/manage/) >> Manage.

Also make sure your client can use the *public_content* scope, which is the scope requested in the authorization URL in Step 1.

You also need a **redirect_uri**: the address to which the user is redirected after authenticating. In this example, you can just use `localhost:5000`.


In [None]:
client_id = "CLIENT_ID"
client_secret = "CLIENT_SECRET"
redirect_uri = "localhost:5000"

**Step 1: Get Code**

The request returns an HTML page, which you can save on your local machine:

In [None]:
url = "https://api.instagram.com/oauth/authorize/?client_id=%s&redirect_uri=%s&response_type=code&scope=public_content" % (client_id, redirect_uri)
response = requests.get(url)
with open('ig_OAuth.html', 'w') as f:
    f.write(response.text)
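
The authorization URL above is assembled with plain string formatting, so special characters in the redirect URI are not escaped. As a hedged alternative, the query string can be built with `urllib.parse.urlencode` from the standard library. `build_authorize_url` is a hypothetical helper name, and the `http://localhost:5000` value is only an illustrative assumption; the parameter names are the ones used in the URL above.

```python
from urllib.parse import urlencode

AUTHORIZE_ENDPOINT = "https://api.instagram.com/oauth/authorize/"


def build_authorize_url(client_id, redirect_uri, scope="public_content"):
    """Assemble the authorization URL with properly escaped query parameters.

    Hypothetical helper; the parameter names mirror the URL used above.
    """
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": scope,
    })
    return AUTHORIZE_ENDPOINT + "?" + query


# The redirect URI here is an illustrative assumption
print(build_authorize_url("CLIENT_ID", "http://localhost:5000"))
```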

Open the file [ig_OAuth.html](file:///ig_OAuth.html) in your browser, enter your credentials and submit. If the submission fails, make sure the URL in the address bar starts with _instagram.com_ and not _file:///_ or _localhost:_.

You are then redirected to *localhost:5000?code=YOUR_CODE*. Copy the value of `code` from the address bar into the cell in Step 2.
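
Instead of picking the code out of the address bar by eye, the whole redirected URL can be pasted and parsed. This is a small sketch with the standard library's `urllib.parse`; `extract_code` is a hypothetical helper, and the example URL is made up to match the redirect described above.

```python
from urllib.parse import urlparse, parse_qs


def extract_code(redirected_url):
    """Return the value of the `code` query parameter from the redirect URL.

    Hypothetical helper; raises ValueError if no code is present.
    """
    query = parse_qs(urlparse(redirected_url).query)
    if "code" not in query:
        raise ValueError("no 'code' parameter in %r" % redirected_url)
    # parse_qs returns a list of values per key; the code appears only once
    return query["code"][0]


# Made-up example of what the address bar might contain after the redirect
print(extract_code("http://localhost:5000/?code=YOUR_CODE"))
```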

> N.B. there are probably better ways to achieve that, but for now let's just go with it

**Step 2: Use code to get access token**

Once you have your code, send the following POST request to get your access token:

In [None]:
code = "YOUR_CODE"
url = "https://api.instagram.com/oauth/access_token"
payload = {
    "client_id": client_id,
    "client_secret": client_secret,
    "grant_type": "authorization_code",
    "redirect_uri": redirect_uri,
    "code": code,
}

In [None]:
response = requests.post(url, payload)
id_token = response.json()
token = id_token['access_token']
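
If the exchange fails, for instance because the code was mistyped, the response will presumably not contain an `access_token` field, and the lookup above ends in a bare `KeyError`. A minimal sketch of a more informative check follows; `extract_token` is a hypothetical helper that makes no assumption about the shape of the error payload and simply reports whatever came back.

```python
def extract_token(token_response):
    """Return the access token from the decoded JSON response, or raise.

    Hypothetical helper; `token_response` is the dict from `response.json()`.
    """
    if "access_token" not in token_response:
        # Surface whatever the server sent back instead of a bare KeyError
        raise RuntimeError("no access token in response: %r" % (token_response,))
    return token_response["access_token"]


# Made-up payload for illustration only
print(extract_token({"access_token": "TOKEN"}))
```

In the notebook this would replace the last line of the cell: `token = extract_token(response.json())`.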

Now you have a token to authenticate all your future requests. You can use it to get all images tagged with 'food', for example:

> N.B. As long as your client is in sandbox mode, you can only retrieve photos you have uploaded yourself

**Step 3: Load tagged photos**

In [None]:
tag = "food"

url = "https://api.instagram.com/v1/tags/%s/media/recent?access_token=%s" % (tag, token)
response = requests.get(url)
content = response.json()
image_urls = [post['images']['low_resolution']['url'] for post in content['data']]
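
A single request only returns one page of posts. The sketch below shows one way to follow further pages. It rests on an assumption that is not confirmed by this notebook: that each response carries a `pagination` object with a `next_url` field when more results exist. `collect_image_urls` and `fetch_page` are hypothetical names; in the notebook, `fetch_page` could be `lambda u: requests.get(u).json()`.

```python
def collect_image_urls(first_page, fetch_page, max_pages=5):
    """Gather low-resolution image URLs across several result pages.

    Hypothetical helper. ASSUMPTION: each page is a dict with a 'data' list
    and, when more results exist, a 'pagination' dict holding 'next_url'.
    fetch_page: callable taking a URL and returning the next decoded page
    """
    urls, page = [], first_page
    for page_number in range(max_pages):
        urls.extend(post['images']['low_resolution']['url']
                    for post in page.get('data', []))
        next_url = page.get('pagination', {}).get('next_url')
        # Stop when there is no further page or the page limit is reached
        if not next_url or page_number == max_pages - 1:
            break
        page = fetch_page(next_url)
    return urls


# Made-up pages standing in for real API responses
fake_pages = {
    "page2": {"data": [{"images": {"low_resolution": {"url": "b.jpg"}}}]},
}
fake_first = {
    "data": [{"images": {"low_resolution": {"url": "a.jpg"}}}],
    "pagination": {"next_url": "page2"},
}
print(collect_image_urls(fake_first, fake_pages.__getitem__))
```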

## Load images and feed to the neural network

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

for e, image_url in enumerate(image_urls):
    response = requests.get(image_url)
    # Decode the downloaded bytes and bring the image into the network's input format
    img = Image.open(BytesIO(response.content))
    img = img.resize((im_size, im_size))
    img = img.convert('RGB')
    data = np.asarray(img, dtype=np.float32)
    data /= 255  # Rescale RGB values from 0..255 to 0..1
    predictions = predict(data)
    predicted_label = labels[predictions['predict_values'].argmax()]
    plt.imshow(data)
    plt.show()  # Render every image, not just the last one in the loop
    print('Prediction for image %s: %s' % (e, predicted_label))
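
The loop prints one prediction per image. To see which motifs dominate a hashtag, the predicted labels can be tallied. This is a small sketch, assuming the labels from the loop have been collected into a list; `summarize_motifs` is a hypothetical helper and the example labels are made up.

```python
from collections import Counter


def summarize_motifs(predicted_labels, n=3):
    """Count how often each predicted label occurs and return the n most common.

    Hypothetical helper, not part of the original notebook.
    """
    return Counter(predicted_labels).most_common(n)


# Made-up labels standing in for the predictions collected in the loop
example_labels = ['pizza', 'espresso', 'pizza', 'plate', 'pizza', 'espresso']
print(summarize_motifs(example_labels, n=2))
```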