# Getting data from Instagram and pulling the data into a Pandas DataFrame

Here we'll create an API request to extract data from Instagram.

In [1]:
import requests
from pandas import json_normalize  # the pandas.io.json import path was deprecated in pandas 1.0
import pandas as pd

Get your "CLIENT_ID" from your Instagram developer account [here](https://instagram.com/developer/).

We use requests to pull in the data and its json() method to convert the response into a Python dictionary we can read.

In [2]:
base_url = "https://api.instagram.com/v1"
CLIENT_ID = '768fcf1f36c94eb08506bae0a9caffa3' # not a valid client id
query = 'nyc'

url = '{0}/tags/{1}/media/recent?client_id={2}&count=30'.format(
    base_url, query, CLIENT_ID)

r = requests.get(url)
j = r.json()
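
The query string can also be assembled with the standard library instead of by hand, which takes care of escaping tags that contain spaces or other special characters. This is a minimal sketch, assuming Python 3 and the same endpoint layout as above; `build_recent_media_url` is a hypothetical helper name and `YOUR_CLIENT_ID` is a placeholder.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.instagram.com/v1"


def build_recent_media_url(tag, client_id, count=30):
    """Return the recent-media URL for a tag (hypothetical helper)."""
    # quote() escapes the tag for use in the path; urlencode() builds the query string.
    path = "{0}/tags/{1}/media/recent".format(BASE_URL, quote(tag, safe=""))
    params = urlencode({"client_id": client_id, "count": count})
    return "{0}?{1}".format(path, params)


url = build_recent_media_url("nyc", "YOUR_CLIENT_ID")
print(url)
```

The resulting string can be passed to `requests.get` exactly like the hand-built `url` above.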

The 'j' dictionary has three keys: pagination, meta and data. Everything we need is in "data". Let's pull that.

In [3]:
j.keys()

[u'pagination', u'meta', u'data']

In [4]:
j['data'] # hiding most of the output; this line displays a list of nested dictionaries

[{u'attribution': None,
  u'caption': {u'created_time': u'1443044203',
   u'from': {u'full_name': u'ronjansensolis',
    u'id': u'22652636',
    u'profile_picture': u'https://scontent.cdninstagram.com/hphotos-xtf1/t51.2885-19/10543992_264552953742412_1358742189_a.jpg',
    u'username': u'ronjansensolis'},
   u'id': u'1080655560584016946',
   u'text': u'Hey there stranger!!! Let me reintroduce myself... #fatgurlproblems #nyc #whatsmymotivation #chubbychaser'},
  u'comments': {u'count': 0, u'data': []},
  u'created_time': u'1443044203',
  u'filter': u'Amaro',
  u'id': u'1080655558050658263_22652636',
  u'images': {u'low_resolution': {u'height': 320,
    u'url': u'https://scontent.cdninstagram.com/hphotos-xaf1/t51.2885-15/s320x320/e35/11856733_913443538691413_1898206221_n.jpg',
    u'width': 320},
   u'standard_resolution': {u'height': 640,
    u'url': u'https://scontent.cdninstagram.com/hphotos-xaf1/t51.2885-15/s640x640/sh0.08/e35/11856733_913443538691413_1898206221_n.jpg',
    u'width':

1. Pulling "data" from the dictionary. 
2. Using json_normalize to clean up our data. 
3. Our data is then stored in a list. 

In [5]:
results = []
if 'data' in j: 
    data = j['data']
    df_instance = json_normalize(data)
    results.append(df_instance)
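
Collecting frames in a list pays off once more than one page is fetched. The sketch below assumes the response's `pagination` dictionary carries a `next_url` key pointing at the next page, which is not shown in the output above. `collect_pages` and `fetch_json` are hypothetical names, and the fetcher is passed in as an argument so the loop can be tried without network access.

```python
def collect_pages(first_url, fetch_json, max_pages=3):
    """Follow pagination links and return the 'data' list of every page."""
    pages = []
    url = first_url
    while url and len(pages) < max_pages:
        j = fetch_json(url)
        if 'data' in j:
            pages.append(j['data'])
        # Assumption: the next page, if any, is advertised under pagination['next_url'].
        url = j.get('pagination', {}).get('next_url')
    return pages


# Stand-in for requests.get(url).json(), backed by canned responses.
fake_responses = {
    'page1': {'data': [{'id': 'a'}, {'id': 'b'}], 'pagination': {'next_url': 'page2'}},
    'page2': {'data': [{'id': 'c'}], 'pagination': {}},
}
pages = collect_pages('page1', fake_responses.__getitem__)
print([len(p) for p in pages])  # [2, 1]
```

With real requests, `fetch_json` could be `lambda u: requests.get(u).json()`, and each page would go through `json_normalize` before being appended to `results`.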

In just one line we can pull the list we created into a DataFrame.
We're ap

In [6]:
df = pd.concat(results) if results else pd.DataFrame()  # DataFrame.append was removed in pandas 2.0
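
To see what combining a list of frames does, here is a small sketch with two hand-made frames (hypothetical data, not real API output) and `pd.concat`, which accepts a list of DataFrames. Columns missing from one frame are filled with NaN, which is why the video columns are empty for posts that are photos.

```python
import pandas as pd

# Two hand-made frames standing in for two pages of normalized results.
page_a = pd.DataFrame([{'id': '1', 'filter': 'Amaro'}])
page_b = pd.DataFrame([{'id': '2', 'filter': 'Normal',
                        'videos.low_resolution.width': 480}])

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each page's index.
combined = pd.concat([page_a, page_b], ignore_index=True)
print(combined.shape)  # (2, 3)
```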

In [7]:
df # have a look at our much cleaner DataFrame

Unnamed: 0,attribution,caption.created_time,caption.from.full_name,caption.from.id,caption.from.profile_picture,caption.from.username,caption.id,caption.text,comments.count,comments.data,...,users_in_photo,videos.low_bandwidth.height,videos.low_bandwidth.url,videos.low_bandwidth.width,videos.low_resolution.height,videos.low_resolution.url,videos.low_resolution.width,videos.standard_resolution.height,videos.standard_resolution.url,videos.standard_resolution.width
0,,1443044203,ronjansensolis,22652636,https://scontent.cdninstagram.com/hphotos-xtf1...,ronjansensolis,1080655560584016946,Hey there stranger!!! Let me reintroduce mysel...,0,[],...,[],,,,,,,,,
1,,1443044202,Julius McFly,227204109,https://scontent.cdninstagram.com/hphotos-xaf1...,julius_mcfly,1080655555443309554,We can still rooftop in the fall right?????? #...,0,[],...,[],,,,,,,,,
2,,1443044200,Brazilian In New York,19177740,https://scontent.cdninstagram.com/hphotos-xaf1...,helciojuniorr,1080655544108898223,#newyork #pier #highway #intrepid #sun #sunset...,0,[],...,[],,,,,,,,,
3,,1443044200,,1987565028,https://scontent.cdninstagram.com/hphotos-xaf1...,orologi.italy.2016,1080655541941859348,#rogerdubuis #lifestyle #hot #bracelets #carti...,0,[],...,[],,,,,,,,,
4,,1443044197,bpstormborn,27912380,https://scontent.cdninstagram.com/hphotos-xaf1...,bpstormborn,1080655542380476396,Deciding how to style this wig for my Jolteon ...,0,[],...,[],480.0,https://scontent.cdninstagram.com/hphotos-xaf1...,480.0,480.0,https://scontent.cdninstagram.com/hphotos-xaf1...,480.0,640.0,https://scontent.cdninstagram.com/hphotos-xaf1...,640.0
5,,1443044197,Chloe,346852698,https://igcdn-photos-f-a.akamaihd.net/hphotos-...,chloe_boram,1080655514648063796,모마에 가기 전\n힐튼호텔 건너 할랄가이즈형들을 만남🙊\n저 왜 여지껏 먹은 것 ...,0,[],...,[],,,,,,,,,
6,,1443044197,Nick Katen,395361864,https://igcdn-photos-e-a.akamaihd.net/hphotos-...,n_katen,1080655514252944797,Calm Brooklyn mornings from my backyard #brook...,0,[],...,[],,,,,,,,,
7,,1443044195,HIGH POINT LA,1967278665,https://igcdn-photos-c-a.akamaihd.net/hphotos-...,highpointla,1080655502376519871,Where's your bracelet? #leather #menswear #ins...,0,[],...,[],,,,,,,,,
8,,1443044195,MarshaB. TV | DMV Fashion News,179584768,https://scontent.cdninstagram.com/hphotos-xaf1...,dmvfashionnews,1080655495104726731,❤ BEFORE YOU SPEAK THINK!\n#MarshaB #dmvfashi...,0,[],...,[],,,,,,,,,
9,,1443044193,✊Activist👊 🎭Anonymous🎭,1556377631,https://scontent.cdninstagram.com/hphotos-xfa1...,anonymousnyc59,1080655480841904217,#plannedparenthood #genocide #organtraffickin...,0,[],...,[],,,,,,,,,
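
The dotted column names in the table, such as `caption.from.username`, reflect the nesting of the original dictionaries. The following is a rough standard-library sketch of that flattening idea, not pandas' actual implementation; `flatten` is a hypothetical helper, and the sample record is a trimmed copy of the first item shown earlier.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dictionaries into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key path along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists and scalars are kept as-is, like the comments.data column.
            flat[full_key] = value
    return flat


sample = {
    'filter': 'Amaro',
    'caption': {'from': {'username': 'ronjansensolis', 'id': '22652636'}},
    'comments': {'count': 0, 'data': []},
}
print(sorted(flatten(sample)))
```

Each key of the flattened dictionary corresponds to one column, and each record to one row.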


That's pretty much it. We got our CLIENT_ID key from Instagram, created a few variables to build the URL, used requests to pull in the data, json() to parse it, json_normalize to flatten it, and a DataFrame to get a clean view of all the data we received.

What's left is selecting the data we want to work with and cleaning it.
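
As one possible starting point for that step, the sketch below keeps a few columns and converts `caption.created_time` from a string of Unix seconds into a timestamp. The small frame is hand-made from values in the first table shown earlier, and the choice of columns and the new column names are only an illustration.

```python
import pandas as pd

# Small stand-in for df, using values from the first two rows of the first table.
df_small = pd.DataFrame({
    'caption.from.username': ['ronjansensolis', 'julius_mcfly'],
    'caption.created_time': ['1443044203', '1443044202'],
    'comments.count': [0, 0],
})

columns = ['caption.from.username', 'caption.created_time', 'comments.count']
subset = df_small[columns].copy()

# The timestamps arrive as strings of Unix seconds; convert them to datetimes.
subset['caption.created_time'] = pd.to_datetime(
    subset['caption.created_time'].astype('int64'), unit='s')

# Shorter column names are easier to work with.
subset = subset.rename(columns={
    'caption.from.username': 'username',
    'caption.created_time': 'created',
    'comments.count': 'comments',
})
print(subset.dtypes)
```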

In [13]:
df.head(2) # top two rows of our dataframe

Unnamed: 0,attribution,caption.created_time,caption.from.full_name,caption.from.id,caption.from.profile_picture,caption.from.username,caption.id,caption.text,comments.count,comments.data,...,users_in_photo,videos.low_bandwidth.height,videos.low_bandwidth.url,videos.low_bandwidth.width,videos.low_resolution.height,videos.low_resolution.url,videos.low_resolution.width,videos.standard_resolution.height,videos.standard_resolution.url,videos.standard_resolution.width
0,,1442947535,RED STAR MODEL MGMT.,665938155,https://scontent.cdninstagram.com/hphotos-xfa1...,redstarmodels,1079844655835730512,RED STAR model ANDREI shot by RICK DAY (@rickd...,0,[],...,[],,,,,,,,,
1,,1442947535,iArremate,468341613,https://igcdn-photos-f-a.akamaihd.net/hphotos-...,iarremate,1079844649584638545,Lote 0044\nPaulo Laender\nMandala Verde - 70 c...,0,[],...,[],,,,,,,,,
