# Facebook Data Collection

## Graph API Fundamentals

The Graph API is the main tool for fetching information about **users, pages, groups**, etc. It is also the only way to get (limited) information about users and the interactions between them.

The Facebook graph is made up of three main items:
- **nodes**: users, pages, groups, photos, etc.
- **fields**: information about a node. 
- **edges**: links between two nodes.
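
As a rough illustration of how these three items map onto a request, the sketch below composes a Graph API URL from a node, an optional edge, and an optional list of fields. It is written for Python 3 (unlike the Python 2 cells in this notebook). The helper name `build_graph_url` and the example field names are assumptions made for illustration, not something defined by the API or by this notebook.

```python
from urllib.parse import urlencode

GRAPH_ROOT = "https://graph.facebook.com"  # same host the notebook's requests use


def build_graph_url(node, edge=None, fields=None, access_token=None, api_version="v2.8"):
    """Compose a Graph API URL from a node, an optional edge, and optional fields."""
    path = [GRAPH_ROOT, api_version, node]
    if edge:
        path.append(edge)  # e.g. "feed", "likes", "comments"
    params = {}
    if fields:
        params["fields"] = ",".join(fields)  # comma-separated field names
    if access_token:
        params["access_token"] = access_token
    url = "/".join(path)
    return url + ("?" + urlencode(params) if params else "")


# Node only, then node + edge + fields (field names chosen for illustration).
print(build_graph_url("imdb"))
print(build_graph_url("imdb", edge="feed", fields=["message", "created_time"]))
```

The node goes into the path, the edge is appended to it, and the fields travel as a query parameter, which is the same layout the hand-built URLs later in this notebook follow.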

## Privacy and How to Fetch Data

Facebook's privacy rules are very **strict**, so the data we can fetch about users is very limited.

- User data can only be fetched with the user's permission.
- Default permissions are limited: they only give us the user's email, public profile (gender, profile photo, etc.), and the user's friends who also use our app.
- To get a user's posts, we need the `user_posts` permission.
- There is no way to get all the friends of a user.
- If we create a group, we can fetch all of its posts, but we cannot get the identity of the user who created a post.
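
A minimal Python 3 sketch of how an extra permission such as `user_posts` might be requested: the login dialog accepts a `scope` parameter that lists the permissions to ask for. The helper name `build_login_url` and the placeholder app id are assumptions for illustration. The redirect URI is the localhost address used elsewhere in this notebook.

```python
from urllib.parse import urlencode


def build_login_url(app_id, redirect_uri, permissions, api_version="v2.8"):
    """Build the login dialog URL that asks the user for the given permissions."""
    params = {
        "client_id": app_id,
        "redirect_uri": redirect_uri,
        "scope": ",".join(permissions),  # comma-separated permission names
    }
    return "https://www.facebook.com/" + api_version + "/dialog/oauth?" + urlencode(params)


# "YOUR_APP_ID" is a placeholder, not a real app id.
url = build_login_url("YOUR_APP_ID", "http://127.0.0.1:5000/", ["email", "user_posts"])
print(url)
```

Permissions beyond the default ones may still need approval from Facebook before ordinary users can grant them.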

The only way to fetch data about a user is to create an **application**. If the user grants permission to your app, you can start fetching information about that user, but what you get is still very limited. For example, there is no way to retrieve the user's full friends list, as discussed at https://www.quora.com/How-do-I-get-all-friends-on-Facebook-using-Facebook-API-and-Python.

On the other hand, you can read a user's public profile info, such as *age, gender*, and you can also get the user's friends who also use your application. This last piece of information should be enough to build a graph of the people in a community.
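
To make the community-graph idea concrete, here is a small Python 3 sketch that merges per-user friend lists into one undirected adjacency map. The function name `build_friend_graph` and the sample ids are hypothetical. The input is assumed to be a dict collected by calling the friends edge once per app user.

```python
from collections import defaultdict


def build_friend_graph(friends_by_user):
    """Turn {user_id: [friend_id, ...]} into an undirected adjacency map."""
    graph = defaultdict(set)
    for user_id, friend_ids in friends_by_user.items():
        graph[user_id]  # make sure users without app-using friends still appear
        for friend_id in friend_ids:
            graph[user_id].add(friend_id)
            graph[friend_id].add(user_id)  # friendship is mutual
    return dict(graph)


# Hypothetical ids, standing in for the `data` lists of friends responses.
sample = {"u1": ["u2", "u3"], "u2": ["u1"], "u4": []}
graph = build_friend_graph(sample)
print(sorted(graph["u1"]))
```

Because only friends who also use the app are returned, the resulting graph covers the app's user community rather than each user's full friend list.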

## Application Development

https://developers.facebook.com is the main place to find guidance and to develop an application. There are different kinds of application environments, such as web site, canvas, and mobile. For our case, web site seems the most relevant one, and the basic development in the following cells has been done in this environment.

### App Development Step 1: App Creation

1. Create a new app. Facebook will provide the required **app_id** and **app_secret** in the dashboard section.
2. Go to settings and add the web site environment with a URL.
3. Add a product (Facebook Login) to request the required permissions from a user. Facebook will ask you to enter valid OAuth redirect URIs. This part is important because we will receive some key values via these URLs and use them to get the access_token specific to each user.
4. Don't forget to publish the application.

### App Development Step 2: Permission Handling

Next, we need to ask people to share their information with us. This can be done by following the steps at https://developers.facebook.com/docs/facebook-login/manually-build-a-login-flow. A web app can be used to handle these steps and get information about the user.

In [43]:
import requests
import json
import urllib
import urllib2

In [2]:
#info about our app
api_version = "v2.8"
app_id = "YOUR APP_ID"
app_secret = "YOUR APP_SECRET"

In [45]:
# we need to produce an access_token for our app
r = requests.get('https://graph.facebook.com/oauth/access_token?grant_type=client_credentials&client_id='+app_id+'&client_secret='+app_secret)
app_access_token = r.text.split('=')[1]
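
The `split('=')` above relies on the response body being plain text of the form `access_token=...`. As far as I know, versioned token endpoints return JSON instead, so a more tolerant parser may be useful. The Python 3 sketch below accepts either shape. The helper name `parse_access_token` and the token values are made up for illustration.

```python
import json
from urllib.parse import parse_qs


def parse_access_token(body):
    """Extract access_token from a JSON body or a URL-encoded 'access_token=...' body."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None  # not JSON, fall through to the URL-encoded form
    if isinstance(payload, dict) and "access_token" in payload:
        return payload["access_token"]
    values = parse_qs(body).get("access_token")
    if not values:
        raise ValueError("no access_token found in response body")
    return values[0]


# Made-up token values, for illustration only.
print(parse_access_token('{"access_token": "123|abc", "token_type": "bearer"}'))
print(parse_access_token("access_token=123|abc"))
```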

In [46]:
# we will apply get and post methods in localhost for now
redirect_uri = 'http://127.0.0.1:5000/'

In [47]:
#this is the url a user should open to give permission to our app
signup_url = "https://www.facebook.com/" + api_version + "/dialog/oauth?client_id=" + app_id + "&redirect_uri=" + redirect_uri

When a user clicks this signup_url, Facebook shows a popup where they can accept or deny the permissions. After permission is granted, Facebook sends a code to the redirect URL. This code is then used to get the user access_token.
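
The code arrives as a `code` query parameter on the redirect URL, so the web app handling the redirect has to pull it out of the request. A minimal Python 3 sketch of that step, assuming the full redirected URL is available as a string; the helper name `extract_code` and the sample code value are hypothetical:

```python
from urllib.parse import urlparse, parse_qs


def extract_code(redirect_url):
    """Return the `code` query parameter from the URL Facebook redirected to."""
    query = parse_qs(urlparse(redirect_url).query)
    if "error" in query:
        # the user declined the dialog; the reason is reported in the query string
        raise ValueError("login was not approved: " + query["error"][0])
    if "code" not in query:
        raise ValueError("no code parameter in redirect URL")
    return query["code"][0]


# Hypothetical redirect URL with a made-up code value.
print(extract_code("http://127.0.0.1:5000/?code=AQD-example#_=_"))
```

In a real web app the framework usually exposes the query parameters directly, so this parsing would be replaced by a lookup on the request object.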

In [1]:
code = "CODE YOU GOT"

In [6]:
#by using code and other variables we request an access token for a user
request_url = 'https://graph.facebook.com/v2.8/oauth/access_token?client_id='+app_id+'&redirect_uri='+redirect_uri+'&client_secret='+app_secret+'&code='+code
url = urllib.urlopen(request_url).read()
result = json.loads(url)

In [8]:
user_access_token = result['access_token']

In [11]:
#by using user access token we request user id
request_url = 'https://graph.facebook.com/debug_token?input_token='+user_access_token+'&access_token='+app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
print result
user_id = result['data']['user_id']

{u'data': {u'scopes': [u'email', u'public_profile'], u'user_id': u'10209081109077283', u'app_id': u'330527133972801', u'expires_at': 1488624003, u'application': u'ShakerMaker3', u'is_valid': True, u'issued_at': 1483440003}}
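
The `issued_at` and `expires_at` values in this response are Unix timestamps, so the token's lifetime can be read straight from them. The Python 3 sketch below summarises the response printed above. The helper name `token_summary` is an assumption for illustration, and `now` is passed in explicitly so the check is reproducible.

```python
from datetime import datetime, timezone

# The relevant part of the debug_token response printed above.
debug_data = {
    "scopes": ["email", "public_profile"],
    "is_valid": True,
    "issued_at": 1483440003,
    "expires_at": 1488624003,
}


def token_summary(data, now):
    """Summarise validity, lifetime and granted scopes of a token at Unix time `now`."""
    lifetime_days = (data["expires_at"] - data["issued_at"]) / 86400.0
    usable = data["is_valid"] and now < data["expires_at"]
    return {
        "usable": usable,
        "lifetime_days": lifetime_days,
        "expires": datetime.fromtimestamp(data["expires_at"], tz=timezone.utc).isoformat(),
        "has_email": "email" in data["scopes"],
    }


summary = token_summary(debug_data, now=1483440003)
print(summary["lifetime_days"])  # 60.0
```

For this token the lifetime works out to 60 days, after which the user would have to go through the login flow again.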


In [12]:
#by using user access token and id we request the user's friends: data lists only friends who also use our app, summary gives the total number of friends
request_url = 'https://graph.facebook.com/v2.8/'+user_id+'/friends?access_token='+user_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
print result

{u'data': [], u'summary': {u'total_count': 278}}


In [14]:
#by using user access token and id we request info about user
request_url = "https://graph.facebook.com/v2.8/me?access_token="+user_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
print result

{u'name': u'Semih Akbayrak', u'id': u'10209081109077283'}


With the default settings, only limited info about a user can be fetched. To get more, you have to submit your application to Facebook, and they need to approve it. For all the fields that can be fetched via the Graph API, see https://developers.facebook.com/docs/graph-api/reference/user/

## How to Fetch Data from Pages

Fetching data from pages (any page, not necessarily ShakerMaker) is much simpler than fetching user data. Using just our application access token, we can get the posts published by a page, the comments on a post, and the likes on a post. We can also learn about the interaction between users by looking at the people who liked a comment on a post. The following cells show an example for the imdb page.

In [48]:
page_name = 'imdb' #Global variable

In [49]:
#General info about page
request_url = "https://graph.facebook.com/" + api_version + "/" + page_name + "/?access_token=" + app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
page_id = result['id']
print result

{u'name': u'IMDb', u'id': u'15925638948'}


### Posts in a Page

In [50]:
request_url = "https://graph.facebook.com/" + api_version + "/" + page_name + "/feed?access_token=" + app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
for post in result['data']:
    print "_"*100
    #not every post has a 'message' field, so guard the lookup
    if 'message' in post:
        print post['message']
    #print post['id']

____________________________________________________________________________________________________
Get planning: Find out when your favorite television shows premiere this month. http://imdb.to/2ix5PpA
____________________________________________________________________________________________________
These Globe nominee transformations are next level. http://imdb.to/2hN4fvL
____________________________________________________________________________________________________
What are you looking forward to watching this year? http://imdb.to/2iwM1mc
____________________________________________________________________________________________________
Looks like Harrelson is the top choice. http://imdb.to/2ixlxRo
____________________________________________________________________________________________________
Read on to learn more about what we discovered. http://imdb.to/2ivWfTU
____________________________________________________________________________________________________
2016 wa

### Post Likes

We can easily see who liked a post. In this case, we look at the newest post published by imdb.

In [21]:
request_url = "https://graph.facebook.com/" + api_version + "/" + page_name + "/feed?access_token=" + app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
post = result['data'][0]
post_id = post['id']

print result["data"][0]["message"]
print "_"*100

request_url = "https://graph.facebook.com/" + api_version + "/" + post_id + "/likes?access_token=" + app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
#print result
for liker in result['data']:
    print liker['id'], liker['name']

#Results are paginated: each response holds a limited number of items, so follow paging.next until it is absent
while 'next' in result.get('paging', {}):
    request_url = result['paging']['next']
    url = urllib.urlopen(request_url).read()
    result = json.loads(url)
    for liker in result['data']:
        print liker['id'],  liker['name']

Get planning: Find out when your favorite television shows premiere this month. http://imdb.to/2ix5PpA
____________________________________________________________________________________________________
1593960377297808 Perra Sandström
1645750259062327 Let Hooker
10206420136896999 Chris Smith
1175613725819319 Ceyda Nursena Dağhan
1472807702752175 Sheikh Zobaid Ur Rahman
10210772298211194 Roberto Rudge Fonseca
1170467463048617 Jelena Spendrup
10154440538346032 Jonna Kurppa
180512345756668 Mainak Paul Paul
1827328270814097 Tony Pham
10206462951172292 Gareth Goose Hunter
10208036677776702 Ballina Prishtina
930584137075435 Mo'men Ashraf
10208170424882938 Corrina Jackson
1608949742465515 Amherst Wu
222367721554150 Nizra Aziera
10154812588493936 Taina Tossavainen
1222306861171154 Minchaul Bu
10208748799219493 Maureen Sol Briones
10158110426775249 Nella Romanova
10154053271568204 Madalina Georgescu
10210932372207550 Anna István
1297893046935064 Vladimir Klimenko
10206119946996985 Nguyễn Hạnh
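
The page-following loop above can be factored into a reusable generator. The Python 3 sketch below takes the first parsed response and a function that turns a URL into the next parsed response, so it can be tried without network access. The names `iter_items` and `fetch_page` and the fake pages are assumptions for illustration.

```python
def iter_items(first_page, fetch_page):
    """Yield every item from a paginated response, following paging.next links."""
    page = first_page
    while True:
        for item in page.get("data", []):
            yield item
        next_url = page.get("paging", {}).get("next")
        if not next_url:
            break
        page = fetch_page(next_url)  # caller decides how a URL becomes a parsed dict


# Stand-in for real HTTP calls: fake pages keyed by made-up URLs.
fake_pages = {
    "url-2": {"data": [{"id": "3", "name": "C"}], "paging": {}},
}
first = {"data": [{"id": "1", "name": "A"}, {"id": "2", "name": "B"}],
         "paging": {"next": "url-2"}}
names = [liker["name"] for liker in iter_items(first, fake_pages.__getitem__)]
print(names)
```

With real requests, `fetch_page` would download the URL and parse the JSON body. The same generator then works for likes, comments, and feed results alike.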

### Post Comments

Fetch the comments that belong to the newest post.

In [26]:
#pagination can be handled as above, but in this case I will print only the first page of comments
request_url = "https://graph.facebook.com/" + api_version + "/" + post_id + "/comments?access_token=" + app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
for comment in result['data']:
    print comment['from']['name'],":"
    print comment['message']
    print "_"*100

Sergej Zuckermann :
Last seaspns of grimm and bones. Nice
____________________________________________________________________________________________________
Jules Loupos :
Julia & Peter do research
____________________________________________________________________________________________________
Nicky Boes :
Eline Bonduelle Blindspot!
____________________________________________________________________________________________________
Matthew Basile :
Chloe.......
____________________________________________________________________________________________________
Joy Wharton :
Emma Wharton
____________________________________________________________________________________________________
Ilse Van de Groep :
Michiel
____________________________________________________________________________________________________
Hilmar Helmundsson Højgaard :
Oddur Højgaard
____________________________________________________________________________________________________


### Likers of a Comment

In [35]:
request_url = "https://graph.facebook.com/" + api_version + "/" + post_id + "/comments?access_token=" + app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
comment_id = result['data'][2]['id']
request_url = "https://graph.facebook.com/" + api_version + "/" + comment_id + "/likes?access_token=" + app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
for liker in result['data']:
    print liker['id'], liker['name']

10212008261665184 Eline Bonduelle
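
Likes on comments are one way to derive interactions between users. The Python 3 sketch below turns comments and their likers into (liker, comment author) pairs. The input shape, where each comment carries the list from its likes edge under a `likes` key, is an assumption for illustration and would have to be assembled from separate requests like the ones above. The function name `comment_like_edges` is hypothetical.

```python
def comment_like_edges(comments):
    """Return (liker_name, author_name) pairs, one per like on each comment."""
    edges = []
    for comment in comments:
        author = comment["from"]["name"]
        for liker in comment.get("likes", []):
            edges.append((liker["name"], author))
    return edges


# Hand-assembled sample using names from the outputs above.
comments = [
    {"from": {"name": "Nicky Boes"}, "likes": [{"name": "Eline Bonduelle"}]},
    {"from": {"name": "Joy Wharton"}, "likes": []},
]
print(comment_like_edges(comments))
```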


## How to Fetch Data in Groups

Fetching data from groups is similar to fetching data from pages.

In [37]:
page_name = '164851960539049' #Global variable. An id can be used instead of a name, as in this case

In [38]:
#General info about group
request_url = "https://graph.facebook.com/" + api_version + "/" + page_name + "/?access_token=" + app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
#page_id = result['id']
print result

{u'id': u'164851960539049', u'name': u'Istanbul Startup Jobs', u'privacy': u'OPEN'}


We only change the page name; everything else stays the same, and we apply the same procedure. The only difference is that in groups other people can also create posts, and we can track these posts.

### Posts

The problem here is that it is not possible to get the id of the user who created a post.

In [40]:
request_url = "https://graph.facebook.com/" + api_version + "/" + page_name + "/feed?access_token=" + app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)
for post in result['data']:
    print "_"*100
    if 'message' in post:
        print post['message']
    #print post['id']

____________________________________________________________________________________________________
Kıdemli Backend Developer Arıyoruz!
2017 Yılı'nın ilk yarısında online olacak, internet pazar yeri projemizde tam zamanlı olarak yer alacak bir takım arkadaşı arıyoruz:
Minimum 2 yıl, özellikle e-ticaret platformu geliştirmede deneyim sahibisin (Daha önce e-ticaret ya da pazar yeri projelerinde uygulama geliştirdin)
Teknik konuları takip edebilecek seviyede İngilizce biliyorsun
Dünya çapında gelişime açık, çok heyecanlı bir projenin çekirdek ekibinde yer almaktan mutluluk duyarsın
Tercihen İstanbul Anadolu Yakası'nda yaşıyorsun (Ofisimiz Şerifali'de)
Öz disiplin sahibi, sektörel yenilikleri takip eden ve hızla uygulayan, kendine güvenen, meraklı, öğrenmeye açık, güler yüzlü, pozitif, hırslı, heyecanlı, esnek, çalışkan, enerjik, detaycı, yardım sever bir takım oyuncususun
İyi kalpli, iyiliksever, pozitif, sorumlu, dengeli, adil, hoşgörülü, insancıl, hayvansever ve şeffaf bir insansın
Ter

# Search

## Page Search

In [65]:
query = "machine learning"
#the query contains a space, so URL-encode it before putting it in the url
request_url = "https://graph.facebook.com/" + api_version + "/search?q=" + urllib.quote_plus(query) + "&type=page&access_token=" + app_access_token
url = urllib.urlopen(request_url).read()
result = json.loads(url)

In [66]:
pages = result['data']
for page in pages:
    print "_"*100
    print page['name']
    print page['id']

____________________________________________________________________________________________________
Machine Learning
119762488098825
____________________________________________________________________________________________________
Learning Machine
716181448412695
____________________________________________________________________________________________________
تعلم آلي
112440992104486
____________________________________________________________________________________________________
Machine Learning
107675749255490
____________________________________________________________________________________________________
Learning Machine
1063171083740014
____________________________________________________________________________________________________
Machine Learning Group ULB
250022468368925
____________________________________________________________________________________________________
Machine Learning Mastery
1429846323896563
__________________________________________________