## Scraping Facebook data using the Python-Facebook API


First step is to get a user access token. This is a temporary token that is used for having the user authorize an access token and to obtain the token. 

Facebook implements OAuth 2.0 as its standard authentication mechanism.
You need to get an access token by logging in to your Facebook account and go to https://developers.facebook.com/tools/explorer/ to obtain an ACCESS_TOKEN.
See http://facebook-sdk.readthedocs.io/en/latest/api.html

Once authenticated, the access token expires after a while. So you will need to keep authorizing repetedly. Once it expires, you'll see a message like this if you try to use an expired token. 

{<br>
 "error": <br>
     { "message": "Error validating access token: Session has expired on Wednesday, 25-Oct-17 10:00:00 PDT. The current time is Wednesday, 25-Oct-17 18:55:58 PDT.", <br>
  "type": "OAuthException",<br>
  "code": 190, <br>
  "error_subcode": 463, <br>
  "fbtrace_id": "CaF9PR122/j" <br>
  } <br>
 }

In [525]:
ACCESS_TOKEN = 'EAACEdEose0cBAKsQp7ZAQ65HTbl5X25FQz7NGMZBdQpz78xeOQXuyMfmWjkKgnyKW3IV9kvWe1lX0sd42UcF27ggpgfwLt5WQNmRrrpZBju1vcN7Woxk7aGQUJNxCmYgrnZCZBZAB8w0KaEJzZCZANslmtiGBQgWzueCREsQKWpuwCTB91jZBdSumJ7ZBEFVoliZApZCXsmO7ZBWUIQZDZD'      

### Making Graph API requests over HTTP


In [526]:
import requests # pip install requests
import json
import pandas as pd

base_url = 'https://graph.facebook.com/me'

# Specify which fields to retrieve
fields = 'id,name,likes'

url = '{0}?fields={1}&access_token={2}'.format(base_url, fields, ACCESS_TOKEN)
print(url)

https://graph.facebook.com/me?fields=id,name,likes&access_token=EAACEdEose0cBAKsQp7ZAQ65HTbl5X25FQz7NGMZBdQpz78xeOQXuyMfmWjkKgnyKW3IV9kvWe1lX0sd42UcF27ggpgfwLt5WQNmRrrpZBju1vcN7Woxk7aGQUJNxCmYgrnZCZBZAB8w0KaEJzZCZANslmtiGBQgWzueCREsQKWpuwCTB91jZBdSumJ7ZBEFVoliZApZCXsmO7ZBWUIQZDZD


### Querying the Graph API with Python

Facebook SDK for Python API reference: http://facebook-sdk.readthedocs.io/en/v2.0.0/api.html

In [527]:
import facebook # pip install facebook-sdk

# Create a connection to the Graph API with your access token

graph = facebook.GraphAPI(ACCESS_TOKEN, version='2.7')

### Extracting all the necessary data

In [424]:
# retrieving information (posts) of a particular page 'production comapnies' in this case)

def retrieve_page_feed(page_id, n_posts):
    """Retrieve the first n_posts from a page's feed in reverse
    chronological order."""
    feed = graph.get_connections(page_id, 'posts')
    posts = []
    posts.extend(feed['data'])

    while len(posts) < n_posts:
        try:
            feed = requests.get(feed['paging']['next']).json()
            posts.extend(feed['data'])
        except KeyError:
            # When there are no more posts in the feed, break
            print('Reached end of feed.')
            break
            
    if len(posts) > n_posts:
        posts = posts[:n_posts]

    print('{} items retrieved from feed'.format(len(posts)))
    return posts

In [425]:
# Counting total number of page fans (fan count/likes on the page)

def fan_count(page_id):
    return int(graph.get_object(id=page_id, fields=['fan_count'])['fan_count'])

In [426]:
# Measure the relative engagement of each post based on the number of likes, shares, comments

def post_engagement(post_id):
    likes = graph.get_object(id=post_id, 
                         fields=['likes.limit(0).summary(true)'])\
                         ['likes']['summary']['total_count']
    shares = graph.get_object(id=post_id, 
                         fields=['shares.limit(0).summary(true)'])\
                         ['shares']['count']
    comments = graph.get_object(id=post_id, 
                         fields=['comments.limit(0).summary(true)'])\
                         ['comments']['summary']['total_count']
    engagement = [likes,shares,comments]
    return engagement

In [427]:
#extracting the hastags from the 'message' of each post

import re
def get_tags(page_id):
    tags=[]
    for each in page_id:
        try:
            x=each['message']
            y = re.findall(r'#(\w+)',x) #using the regEx function, import thr library 're'
            tags.append(y[0])
        except:
            pass
    indices=[]
    for i,x in enumerate(tags):
        if len(x) ==0:
            indices.append(i)
    for index in sorted(indices, reverse=True):
        del(tags[index])
    return tags

In [None]:
#Gettig public posts

list_of_dict = []
for each in graph.search(type='posts', q = 'Incredibles'):
    public_post =dict()
    public_post['name'] = each['name']
    public_post['followers'] = each['followers']
    public_post['likes'] = each['likes']
    public_post['shares'] = each['shares']
    public_post['comments'] = each['comment']
    list_of_dict.append(public_post)

In [558]:
# Storing all the public data into dataframes

facebook_public_data = pd.DataFrame(list_of_dict)
facebook_public_data = facebook_public_data[['name','followers/page_likes','post','post_likes','shares','comments']]
facebook_public_data

Unnamed: 0,name,followers/page_likes,post,post_likes,shares,comments
0,Disney,50047182,🚨 50 days. 🚨 #disney #Incredibles2 in theatres...,2900,732,255
1,Helen Parr(Elastic Girl),2036311,See Elastigirl and The Incredibles in Disney H...,40,2,0
2,IGN,5173840,From Avengers: Infinity War to The Incredibles...,686,27,21
3,Rtr Maitri Vasa,2316,#Incredibles #November #disneystudios,13,0,0
4,Anjaneya Shetty,769,GameSpots PostThe Incredibles are almost back!...,14,0,0
5,Keerthana Subramanian,660,OMGOMGOMGOMG THEY ARE BACKKKKKK Madhuvanthi ht...,1,0,1
6,Ankit Dhame,1126,FINALLY Disneys PostThe Incredibles 2 trailer ...,12,1,2
7,Better believe I will unshamefully go see this...,341,Nick Adams,4,1,0
8,Tanner Tucker,1080,This new trailer is great 😂😱👏🏼 so pumped for t...,4,1,1
9,The Incredibles,7322483,Congratulations to The Royal Family on their I...,5600,450,95


### Getting data from 6 production companies page on FB

Just as we extracted public posts relevant to our domain, we will be follwoing similar procedure inorder to get the data from 6 production companies page on Facebook. The production companies are: Universal Studios, Columbia Pictures, Marvel, Paramount Pictures, Disney and Warner Bros Ent. The data obtained from individual pages is stored as separate datframes for each production companies page. 

#### UNIVERSAL

In [529]:
#retrieving information from the 'universal studios' page

universal_page = retrieve_page_feed('UniversalStudiosEntertainment',100) #limit the no. of posts to 100
universal_page

100 items retrieved from feed


[{'created_time': '2018-04-26T17:05:10+0000',
  'id': '215204471085_2422291497788419',
  'message': 'You could win a trip to tour Notting Hill & picnic with Hugh Grant! You’ll visit the blue door, stay at the Ritz (where Julia stayed!) and more. Support Red Nose Day USA & ENTER:  http://bit.ly/Hugh-Grant-Picnic-You',
  'story': 'Universal Studios Entertainment with Omaze.'},
 {'created_time': '2018-04-25T23:13:38+0000',
  'id': '215204471085_10155153205336086',
  'story': 'Universal Studios Entertainment shared a video.'},
 {'created_time': '2018-04-25T18:54:36+0000',
  'id': '215204471085_10155152711706086',
  'message': 'Enter the Tremors Fan Art contest! Voting will be open to the public and the final round will be judged by Michael Gross- Burt Gummer himself. http://uni.pictures/TremorsContest\n\nNO PURCHASE NECESSARY. Contest entry runs 04/16/2018 – 04/30/2018. Open only to legal residents of the 50 U.S. & D.C., 13+. Subject to Official Rules at http://uni.pictures/TremorsContest.

In [462]:
# storing and appending all the information for all the posts 
# which have atleast 1 like,share,comment in their separate lists

likes_universal =[]
shares_universal=[]
comments_universal=[]
posts_universal =[]
page_likes= fan_count('UniversalStudiosEntertainment')
pg_likes_universal = []
time_universal=[]
universal_tags= get_tags(universal_page)

for x, y in enumerate(universal_page):
    try:
        likes = post_engagement(universal_page[x]['id'])[0]
    except:
        likes=0
    try:
        shares = post_engagement(universal_page[x]['id'])[1]
    except:
        shares=0
    try:
        comments = post_engagement(universal_page[x]['id'])[2]
    except:
        comments=0
    try:
        posts = str(y['message'])
    except:
        posts = str(y['story'])
    try:
        time = y['created_time']
    except:
        time='N/A'
    time_universal.append(time)
    posts_universal.append(posts)
    pg_likes_universal.append(page_likes)
    likes_universal.append(likes)
    shares_universal.append(shares)
    comments_universal.append(comments)

In [463]:
for i in range(0,42):
    universal_tags.append('N/A')

In [466]:
#storing the posts, created time, tags, no. of likes,shares,comments,no. of page followers into the dataframe

df_1 = pd.DataFrame(posts_universal, columns=['Posts'])
df_1['Production_company'] = 'Universal Studios'
df_1['Likes'] = likes_universal
df_1['Shares'] = shares_universal
df_1['Comments'] = comments_universal
df_1['Page_likes'] = pg_likes_universal
df_1['Hashtags'] = universal_tags
df_1['Created_time'] = time_universal
df_1 = df_1[['Production_company','Page_likes','Posts','Hashtags','Likes','Shares','Comments','Created_time']]
df_1.head()

Unnamed: 0,Production_company,Page_likes,Posts,Hashtags,Likes,Shares,Comments,Created_time
0,Universal Studios,8253534,Universal Studios Entertainment shared a video.,FiftyShadesFreed,70,15,3,2018-04-25T23:13:38+0000
1,Universal Studios,8253534,Enter the Tremors Fan Art contest! Voting will...,TheMostIndestructibleManInTheWorld,29,12,8,2018-04-25T18:54:36+0000
2,Universal Studios,8253534,"From the Producers of Insidious, Sinister and ...",WasteLess,34,9,2,2018-04-24T16:00:01+0000
3,Universal Studios,8253534,Don't miss the climax - Own the final chapter....,EarthDay,38,2,7,2018-04-24T13:00:46+0000
4,Universal Studios,8253534,"""Always vigilant my friends..."" See #TheMostIn...",SkyscraperMovie,177,41,20,2018-04-23T18:55:00+0000


#### COLUMBIA

In [467]:
columbia_page = retrieve_page_feed('ColumbiaPicturesEntertainment',100) #limit the no. of posts to 100
columbia_page

Reached end of feed.
70 items retrieved from feed


[{'created_time': '2014-02-16T07:36:56+0000',
  'id': '378473832258341_482476885191368',
  'message': "Jeri Ryan joins the cast of Helix as Ilaria's Constance Sutton and she's taking no prisoners! If you thought there were tension and twists before… you ain't seen nothing yet. Tune-in to Syfy Tonight at 10/9c to see the fireworks! http://bit.ly/1eTaZF8"},
 {'created_time': '2014-01-23T12:23:27+0000',
  'id': '378473832258341_472911399481250',
  'message': 'With stunning performances that have wowed critics and viewers alike, Blue Jasmine is now available on Blu-ray. http://amzn.to/1eor3uT'},
 {'created_time': '2014-01-23T12:22:21+0000',
  'id': '378473832258341_472911249481265',
  'message': 'She’s always hot on the trail of danger. #Gwensday'},
 {'created_time': '2014-01-22T15:48:47+0000',
  'id': '378473832258341_472613769511013',
  'message': 'Captain Phillips just landed on Blu-ray and Digital. http://amzn.to/1d115xC'},
 {'created_time': '2014-01-22T15:46:55+0000',
  'id': '3784738

In [468]:
# storing and appending all the information for all the posts 
# which have atleast 1 like,share,comment in their separate lists

likes_columbia =[]
shares_columbia=[]
comments_columbia=[]
posts_columbia =[]
page_likes= fan_count('ColumbiaPicturesEntertainment')
pg_likes_columbia = []
time_columbia = []
columbia_tags= get_tags(columbia_page)

for x, y in enumerate(columbia_page):
    try:
        likes = post_engagement(columbia_page[x]['id'])[0]
    except:
        likes=0
    try:
        shares = post_engagement(columbia_page[x]['id'])[1]
    except:
        shares=0
    try:
        comments = post_engagement(columbia_page[x]['id'])[2]
    except:
        comments=0
    try:
        posts = str(y['message'])
    except:
        posts = str(y['story'])
    try:
        time = y['created_time']
    except:
        time='N/A'
    time_columbia.append(time)
    posts_columbia.append(posts)
    pg_likes_columbia.append(page_likes)
    likes_columbia.append(likes)
    shares_columbia.append(shares)
    comments_columbia.append(comments)

In [469]:
for i in range(0,39):
    columbia_tags.append('N/A')
len(columbia_tags)

70

In [470]:
df_2 = pd.DataFrame(posts_columbia, columns=['Posts'])
df_2['Production_company'] = 'Columbia Studios'
df_2['Likes'] = likes_columbia
df_2['Shares'] = shares_columbia
df_2['Comments'] = comments_columbia
df_2['Page_likes'] = pg_likes_columbia
df_2['Hashtags'] = columbia_tags
df_2['Created_time'] = time_columbia
df_2 = df_2[['Production_company','Page_likes','Posts','Hashtags','Likes','Shares','Comments','Created_time']]
df_2.head()

Unnamed: 0,Production_company,Page_likes,Posts,Hashtags,Likes,Shares,Comments,Created_time
0,Columbia Studios,5045,Jeri Ryan joins the cast of Helix as Ilaria's ...,Gwensday,63,12,44,2014-02-16T07:36:56+0000
1,Columbia Studios,5045,With stunning performances that have wowed cri...,MonumentsMen,38,9,8,2014-01-23T12:23:27+0000
2,Columbia Studios,5045,She’s always hot on the trail of danger. #Gwen...,Helix,37,22,6,2014-01-23T12:22:21+0000
3,Columbia Studios,5045,Captain Phillips just landed on Blu-ray and Di...,AmericanHustle,30,10,5,2014-01-22T15:48:47+0000
4,Columbia Studios,5045,Who are your real-life heroes? #MonumentsMen,RoboCop,29,9,3,2014-01-22T15:46:55+0000


#### MARVEL

In [471]:
marvel_page = retrieve_page_feed('MarvelStudios',100) #limit the no. of posts to 100
marvel_page

100 items retrieved from feed


[{'created_time': '2018-04-25T22:05:30+0000',
  'id': '134891530271801_435981660162785',
  'message': 'It’s all been leading to this. Avengers: Infinity War #DoctorStrange'},
 {'created_time': '2018-04-25T21:35:39+0000',
  'id': '134891530271801_435971983497086',
  'message': 'Try out the new Avengers: Infinity War Facebook AR Effects now! Swipe right to open your camera on mobile.'},
 {'created_time': '2018-04-25T18:00:01+0000',
  'id': '134891530271801_435869873507297',
  'message': 'See Avengers: Infinity War tomorrow night. \n\nGet tickets: http://www.fandango.com/infinitywar'},
 {'created_time': '2018-04-25T16:56:21+0000',
  'id': '134891530271801_10156241784152488',
  'message': "Peter Parker's latest suit will be coming to Marvel’s Spider-Man! Can't wait to see Spider-Man's Iron Spider suit in action? Avengers: Infinity War hits theaters this Friday, get your tickets now: http://www.fandango.com/infinitywar"},
 {'created_time': '2018-04-25T16:30:00+0000',
  'id': '13489153027180

In [475]:
# storing and appending all the information for all the posts 
# which have atleast 1 like,share,comment in their separate lists

likes_marvel =[]
shares_marvel=[]
comments_marvel=[]
posts_marvel =[]
page_likes= fan_count('ColumbiaPicturesEntertainment')
pg_likes_marvel = []
time_marvel = []
marvel_tags= get_tags(marvel_page)

for x, y in enumerate(marvel_page):
    try:
        likes = post_engagement(marvel_page[x]['id'])[0]
    except:
        likes=0
    try:
        shares = post_engagement(marvel_page[x]['id'])[1]
    except:
        shares=0
    try:
        comments = post_engagement(marvel_page[x]['id'])[2]
    except:
        comments=0
    try:
        posts = str(y['message'])
    except:
        posts = str(y['story'])
    try:
        time = y['created_time']
    except:
        time='N/A'
    time_marvel.append(time)
    posts_marvel.append(posts)
    pg_likes_marvel.append(page_likes)
    likes_marvel.append(likes)
    shares_marvel.append(shares)
    comments_marvel.append(comments)

In [477]:
for i in range(0,74):
    marvel_tags.append('N/A')
len(marvel_tags)

100

In [478]:
df_3 = pd.DataFrame(posts_marvel, columns=['Posts'])
df_3['Production_company'] = 'Marvel Studios'
df_3['Likes'] = likes_marvel
df_3['Shares'] = shares_marvel
df_3['Comments'] = comments_marvel
df_3['Page_likes'] = pg_likes_marvel
df_3['Created_time'] = time_marvel
df_3['Hashtags'] = marvel_tags
df_3 = df_3[['Production_company','Page_likes','Posts','Hashtags','Likes','Shares','Comments','Created_time']]
df_3.head()

Unnamed: 0,Production_company,Page_likes,Posts,Hashtags,Likes,Shares,Comments,Created_time
0,Marvel Studios,5045,It’s all been leading to this. Avengers: Infin...,DoctorStrange,1362,195,57,2018-04-25T22:05:30+0000
1,Marvel Studios,5045,Try out the new Avengers: Infinity War Faceboo...,ThanosDemandsYourSilence,1149,110,30,2018-04-25T21:35:39+0000
2,Marvel Studios,5045,See Avengers: Infinity War tomorrow night. \n\...,TheVision,3409,1678,204,2018-04-25T18:00:01+0000
3,Marvel Studios,5045,Peter Parker's latest suit will be coming to M...,StarLord,3437,483,204,2018-04-25T16:56:21+0000
4,Marvel Studios,5045,Be the first to see Avengers: Infinity War tom...,Hulk,8012,3507,642,2018-04-25T16:30:00+0000


#### DISNEY

In [479]:
disney_page = retrieve_page_feed('Disney',100) #limit the no. of posts to 100
disney_page

100 items retrieved from feed


[{'created_time': '2018-04-26T01:00:01+0000',
  'id': '11784025953_417305992066037',
  'message': 'Don’t freeze. Chadwick Boseman, Danai Gurira, Mark Ruffalo, and The Russo Brothers share their Marvel memories. See Avengers: Infinity War on April 27.',
  'story': 'Disney posted an episode of The Oh My Disney Show.'},
 {'created_time': '2018-04-25T23:00:03+0000',
  'id': '11784025953_10155533735240954',
  'message': 'Delicious news! The Disney Eats Collection is now in stores and online at shopDisney. 🍽'},
 {'created_time': '2018-04-25T21:16:28+0000',
  'id': '11784025953_10155533576350954',
  'message': "Entertainment Weekly has an exclusive new look at Winnie the Pooh and his friends from Disney's Christopher Robin.",
  'story': 'Disney shared a post.'},
 {'created_time': '2018-04-25T17:00:01+0000',
  'id': '11784025953_10155791163026936',
  'message': 'Think of the happiest thing! Peter Pan flies onto Digital and Movies Anywhere on May 29, and Blu-ray on June 5. Pre-order: http://di.

In [480]:
# storing and appending all the information for all the posts 
# which have atleast 1 like,share,comment in their separate lists

likes_disney =[]
shares_disney=[]
comments_disney=[]
posts_disney =[]
page_likes= fan_count('Disney')
pg_likes_disney = []
time_disney=[]
disney_tags= get_tags(disney_page)

for x, y in enumerate(disney_page):
    try:
        likes = post_engagement(disney_page[x]['id'])[0]
    except:
        likes=0
    try:
        shares = post_engagement(disney_page[x]['id'])[1]
    except:
        shares=0
    try:
        comments = post_engagement(disney_page[x]['id'])[2]
    except:
        comments=0
    try:
        posts = str(y['message'])
    except:
        posts = str(y['story'])
    try:
        time = y['created_time']
    except:
        time='N/A'
    time_disney.append(time)
    posts_disney.append(posts)
    pg_likes_disney.append(page_likes)
    likes_disney.append(likes)
    shares_disney.append(shares)
    comments_disney.append(comments)

In [481]:
for i in range(0,72):
    disney_tags.append('N/A')
len(disney_tags)

100

In [482]:
df_4 = pd.DataFrame(posts_disney, columns=['Posts'])
df_4['Production_company'] = 'Disney'
df_4['Likes'] = likes_disney
df_4['Shares'] = shares_disney
df_4['Comments'] = comments_disney
df_4['Page_likes'] = pg_likes_disney
df_4['Created_time'] = time_disney
df_4['Hashtags'] = disney_tags
df_4 = df_4[['Production_company','Page_likes','Posts','Hashtags','Likes','Shares','Comments','Created_time']]
df_4.head()

Unnamed: 0,Production_company,Page_likes,Posts,Hashtags,Likes,Shares,Comments,Created_time
0,Disney,51176895,"Don’t freeze. Chadwick Boseman, Danai Gurira, ...",ForcesOfDestiny,601,89,16,2018-04-26T01:00:01+0000
1,Disney,51176895,Delicious news! The Disney Eats Collection is ...,ad,862,165,89,2018-04-25T23:00:03+0000
2,Disney,51176895,Entertainment Weekly has an exclusive new look...,Incredibles2,2329,406,157,2018-04-25T21:16:28+0000
3,Disney,51176895,Think of the happiest thing! Peter Pan flies o...,Incredibles2,1228,286,175,2018-04-25T17:00:01+0000
4,Disney,51176895,"A galaxy far, far away gets a little closer wh...",DisneyAnimalKingdom20,582,43,30,2018-04-25T01:00:04+0000


#### PARAMOUNT

In [None]:


paramount_page = retrieve_page_feed('Paramount',100) #limit the no. of posts to 100
paramount_page

In [484]:
# storing and appending all the information for all the posts 
# which have atleast 1 like,share,comment in their separate lists

likes_paramount =[]
shares_paramount=[]
comments_paramount=[]
posts_paramount =[]
page_likes= fan_count('Paramount')
pg_likes_paramount = []
paramount_time = []
paramount_tags= get_tags(paramount_page)

for x, y in enumerate(paramount_page):
    try:
        likes = post_engagement(paramount_page[x]['id'])[0]
    except:
        likes=0
    try:
        shares = post_engagement(paramount_page[x]['id'])[1]
    except:
        shares=0
    try:
        comments = post_engagement(paramount_page[x]['id'])[2]
    except:
        comments=0
    try:
        posts = str(y['message'])
    except:
        posts = str(y['story'])
    try:
        time = y['created_time']
    except:
        time='N/A'
    paramount_time.append(time)
    posts_paramount.append(posts)
    pg_likes_paramount.append(page_likes)
    likes_paramount.append(likes)
    shares_paramount.append(shares)
    comments_paramount.append(comments)

In [485]:
for i in range(0,54):
    paramount_tags.append('N/A')
len(paramount_tags)

100

In [486]:
df_5 = pd.DataFrame(posts_paramount, columns=['Posts'])
df_5['Production_company'] = 'Paramount'
df_5['Likes'] = likes_paramount
df_5['Shares'] = shares_paramount
df_5['Comments'] = comments_paramount
df_5['Created_time'] = paramount_time
df_5['Page_likes'] = pg_likes_paramount
df_5['Hashtags'] = paramount_tags
df_5 = df_5[['Production_company','Page_likes','Posts','Hashtags','Likes','Shares','Comments','Created_time']]
df_5.head()

Unnamed: 0,Production_company,Page_likes,Posts,Hashtags,Likes,Shares,Comments,Created_time
0,Paramount,11459910,"No rules, no limits, just pure fun. Join Johnn...",WorldBookDay,257,68,9,2018-04-24T00:29:52+0000
1,Paramount,11459910,"This #WorldBookDay, we’re celebrating fun, fri...",1,105,35,9,2018-04-23T13:11:03+0000
2,Paramount,11459910,A Quiet Place is the #1 movie in America! Get ...,AQuietPlace,181,36,20,2018-04-09T21:20:00+0000
3,Paramount,11459910,Fans can't stay quiet about #AQuietPlace. Buy ...,AQuietPlace,66,12,12,2018-04-07T22:50:16+0000
4,Paramount,11459910,"Don't walk, run...to the theatre. Be the first...",AQuietPlace,61,11,6,2018-04-05T23:12:17+0000


#### WARNER BROS

In [487]:
warner_page = retrieve_page_feed('WarnerBrosEnt',100) #limit the no. of posts to 100
warner_page

100 items retrieved from feed


[{'created_time': '2018-04-25T17:15:00+0000',
  'id': '11640096627_1042939855846994',
  'message': 'The Perfect Date.'},
 {'created_time': '2018-04-25T16:39:40+0000',
  'id': '11640096627_2060202554222892',
  'message': 'Yeti or not, here they come. Watch the new trailer for #SMALLFOOT now!'},
 {'created_time': '2018-04-25T16:20:40+0000',
  'id': '11640096627_10156018255356628',
  'message': 'The Joker and Harley Quinn are double trouble together in #BatmanNinja! Own it on iTunes today!\n https://apple.co/2HaTVvW'},
 {'created_time': '2018-04-24T16:29:20+0000',
  'id': '11640096627_10156016131501628',
  'story': 'Warner Bros. Entertainment shared a post.'},
 {'created_time': '2018-04-24T16:00:00+0000',
  'id': '11640096627_10156008130531628',
  'message': 'Batman never backs down from a challenge! See your favorite hero in the world of anime! Own #BatmanNinja today on iTunes!\nhttps://apple.co/2HaTVvW'},
 {'created_time': '2018-04-23T22:39:46+0000',
  'id': '11640096627_171811192160288

In [488]:
# storing and appending all the information for all the posts 
# which have atleast 1 like,share,comment in their separate lists

likes_warner =[]
shares_warner=[]
comments_warner=[]
posts_warner =[]
time_warner = []
page_likes= fan_count('WarnerBrosEnt')
pg_likes_warner = []
warner_tags= get_tags(warner_page)

for x, y in enumerate(warner_page):
    try:
        likes = post_engagement(warner_page[x]['id'])[0]
    except:
        likes=0
    try:
        shares = post_engagement(warner_page[x]['id'])[1]
    except:
        shares=0
    try:
        comments = post_engagement(warner_page[x]['id'])[2]
    except:
        comments=0
    try:
        posts = str(y['message'])
    except:
        posts = str(y['story'])
    try:
        time = y['created_time']
    except:
        time = 'N/A'
    time_warner.append(time)
    posts_warner.append(posts)
    pg_likes_warner.append(page_likes)
    likes_warner.append(likes)
    shares_warner.append(shares)
    comments_warner.append(comments)

In [489]:
for i in range(0,26):
    warner_tags.append('N/A')
len(warner_tags)

100

In [490]:
df_6 = pd.DataFrame(posts_warner, columns=['Posts'])
df_6['Production_company'] = 'Warner Studios'
df_6['Likes'] = likes_warner
df_6['Shares'] = shares_warner
df_6['Comments'] = comments_warner
df_6['Created_time'] = time_warner
df_6['Page_likes'] = pg_likes_warner
df_6['Hashtags'] = warner_tags
df_6 = df_6[['Production_company','Page_likes','Posts','Hashtags','Likes','Shares','Comments','Created_time']]
df_6.head()

Unnamed: 0,Production_company,Page_likes,Posts,Hashtags,Likes,Shares,Comments,Created_time
0,Warner Studios,3465877,The Perfect Date.,SMALLFOOT,99,32,6,2018-04-25T17:15:00+0000
1,Warner Studios,3465877,"Yeti or not, here they come. Watch the new tra...",BatmanNinja,290,109,19,2018-04-25T16:39:40+0000
2,Warner Studios,3465877,The Joker and Harley Quinn are double trouble ...,BatmanNinja,231,21,13,2018-04-25T16:20:40+0000
3,Warner Studios,3465877,Warner Bros. Entertainment shared a post.,CrazyRichAsians,136,10,2,2018-04-24T16:29:20+0000
4,Warner Studios,3465877,Batman never backs down from a challenge! See ...,BatmanNinja,448,89,49,2018-04-24T16:00:00+0000


## Merging six dataframes into a common dataframe

The dataframes created for individual pages are then merged into one big dataframe: prod_comp_fb_df. <br>
This dataframe is our FB production companies table in the SQL schema shown below. This dataframe is then converted into an SQL table for further analysis.

In [491]:
prod_comp_fb_df = pd.concat([df_1,df_2,df_3,df_4,df_5,df_6])
prod_comp_fb_df.reset_index(drop=True)

Unnamed: 0,Production_company,Page_likes,Posts,Hashtags,Likes,Shares,Comments,Created_time
0,Universal Studios,8253534,Universal Studios Entertainment shared a video.,FiftyShadesFreed,70,15,3,2018-04-25T23:13:38+0000
1,Universal Studios,8253534,Enter the Tremors Fan Art contest! Voting will...,TheMostIndestructibleManInTheWorld,29,12,8,2018-04-25T18:54:36+0000
2,Universal Studios,8253534,"From the Producers of Insidious, Sinister and ...",WasteLess,34,9,2,2018-04-24T16:00:01+0000
3,Universal Studios,8253534,Don't miss the climax - Own the final chapter....,EarthDay,38,2,7,2018-04-24T13:00:46+0000
4,Universal Studios,8253534,"""Always vigilant my friends..."" See #TheMostIn...",SkyscraperMovie,177,41,20,2018-04-23T18:55:00+0000
5,Universal Studios,8253534,"Take another look, did you miss something?\nOw...",EarthDay,10,1,3,2018-04-23T16:00:39+0000
6,Universal Studios,8253534,A good chef knows how to use all parts of thei...,EarthDay,54,6,2,2018-04-23T01:00:58+0000
7,Universal Studios,8253534,Riding a bike is a more eco-friendly alternati...,SPK9,12,1,1,2018-04-22T22:01:06+0000
8,Universal Studios,8253534,Fighter. Father. Hero. There are no limits whe...,PacificRimUprising,2430,512,202,2018-04-22T20:56:06+0000
9,Universal Studios,8253534,"Help save coral reefs, one bottle of sunscreen...",FallenKingdom,26,8,1,2018-04-22T19:01:00+0000


In [492]:
%store prod_comp_fb_df

Stored 'prod_comp_fb_df' (DataFrame)
