<a href="https://colab.research.google.com/github/hlecuanda/FB-search-html-page-creator/blob/master/FBSearch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Facebook Search Notebook


## Release Notes



This is a reimplementation of @ahmel's Python script as a Jupyter
notebook, to streamline usage and make it more accessible to researchers.

The changes I've made are as follows:
- Converted existing comments to text cells so explanations are more readable.
- Changed some code to use idiomatic Python.
- Made everything run consistently on Python 3.
- Refactored the code to use Jinja2 templates instead of inlining HTML, which is so *PHP* from way back when.

If you've never used a Jupyter notebook or Google Colab: it's a Python execution environment running on a virtual machine, connected to your browser and Google Drive. (Neat, huh?)

Every code cell has a little "play" button in its upper-left corner (just mouse over it). When clicked, the code in the cell runs and its output is displayed below the cell. A lot of neat stuff can be done with Jupyter and Colab, especially explaining how code works. This style of programming is called Literate Programming, and it is excellent for teaching and academia.

Data science too.




## Purpose

This workbook reads a **JSON search result file**, sorts the results by time, creates links to the posts, and puts the links together with the corresponding messages and images in an HTML file. The notebook then displays the results as a webpage in the cell output or, if so selected, in a new tab of your current browser.


## Prerequisites

First we need to make sure all dependencies are installed. Just run the cell below; any missing dependencies will be installed.

```
%%bash

# Install the Jinja2 templating engine and the requests HTTP library
pip install --upgrade pip
pip install jinja2 requests

# Create a subdirectory to hold template files
export TDIR="template"
test -d "$TDIR" && echo 'templates dir ok' || mkdir -p "$TDIR"
```
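To confirm the installs worked before running the rest of the notebook, a quick check from Python can help. This is a minimal sketch; `missing_packages` is a hypothetical helper, not part of the original notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the names of top-level modules that cannot be found."""
    # find_spec returns None when a top-level module is not importable
    return [name for name in names if importlib.util.find_spec(name) is None]


# jinja2 and requests are the third-party packages this notebook relies on
missing = missing_packages(['jinja2', 'requests'])
print('missing:', missing or 'none')
```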

Now we set up a couple of pretty printers. They make development so much easier, especially when analyzing FB's JSON objects, which tend toward unnecessary verbosity.



```python
import pprint

# Standard pretty printer
pp = pprint.PrettyPrinter()

# 1- and 2-level pretty printers, so we can analyze nested objects easily
p1 = pprint.PrettyPrinter(depth=1)
p2 = pprint.PrettyPrinter(depth=2)
```
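The `depth` argument is what makes `p1` and `p2` useful: anything nested deeper than the limit is collapsed to `{...}` or `[...]`. A small illustration on a made-up record, shaped only loosely like a search result (the field names are examples, not FB's schema):

```python
import pprint

# A made-up, FB-like nested record (illustrative values only)
record = {'id': '123', 'actor': {'id': '456', 'name': {'first': 'A'}}}

p1 = pprint.PrettyPrinter(depth=1)
p2 = pprint.PrettyPrinter(depth=2)

p1.pprint(record)  # dicts below the first level are shown as {...}
p2.pprint(record)  # one more level is expanded before collapsing
```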

## How do I get this **JSON search results file** you speak of?

There used to be a nice, detailed explanation here, but I realised this thread is full of greedy non-sharers, shady accounts with no public repos (whom I suspect to be the same guy pushing devs to solve his problems for free), a bunch of freeloaders who are probably only interested in getting the pics their GF 'liked' so they can make a fuss about it, and whatnot.

Except for a few real devs here (e.g. @sowdust), I don't think anyone else in this thread is even interested in coding, much less in doing serious OSINT research.

Having said that, you can get the search result JSON using one of the APIs mentioned in this thread, which turns out to be just a repackaged FB endpoint. I wouldn't use it, though: it does a bunch of stuff loading frames and Facebook's arbiter, and I wouldn't be surprised if it were stealing access tokens from those who use it. (FB security is, and always has been, a joke.)

Since its source is not being shared, I deem it untrustworthy; use it at your own peril.

This workbook's purpose is only to show it's not really that hard to do things right.

```python
import json

from google.colab import files

# Opens a file picker; returns a dict mapping each file name to its contents as bytes
uploaded = files.upload()

for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))

# Parse the (last) uploaded file straight from the returned bytes
content = json.loads(uploaded[fn].decode('utf-8'))

print('\nJSON from {}:'.format(fn))
p2.pprint(content)
```
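Outside Colab, `google.colab.files` is not available. A minimal sketch for loading the same JSON from a local path instead; `load_search_results` and the `graphSearch.json` path are hypothetical and should be adjusted to your setup.

```python
import json
from pathlib import Path


def load_search_results(path):
    """Read a JSON search result file from disk and return the parsed object."""
    # Decode explicitly as UTF-8, since post messages often contain non-ASCII text
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


# Hypothetical local path; only loaded if the file actually exists
sample = Path('graphSearch.json')
if sample.exists():
    content = load_search_results(sample)
```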

Alternatively, we can get the Facebook data via an HTTP request. Because this is an educational notebook, we obtain the data from a mock request to this project's GitHub repo. If you uploaded a file using the previous cell, skip this cell and go straight to the next section, titled "[HTML template][1]".

[1]:https://colab.research.google.com/drive/1ujtQrGgApGemlyUzDojiETWiIWHBq2ps#scrollTo=xcXvTQKBTjzg&line=4&uniqifier=1

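```python
import requests

# Sample search results stored in this project's GitHub repo
dataurl = ('https://raw.githubusercontent.com/hlecuanda/'
           'FB-search-html-page-creator/master/testdata/graphSearch.json')
response = requests.get(dataurl, timeout=30)
response.raise_for_status()  # stop here on a 404 or other HTTP error

content = response.json()

p2.pprint(content)
```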
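The processing step assumes a specific shape: a top-level `success` key and a list under `data['result']` whose items carry `creation_time`, `id`, `actor.id`, `message` and optionally `image`. That shape is inferred from the code in this notebook, not from any FB documentation. A minimal sketch of a checker (the name `check_search_results` is hypothetical), with a hand-made mock payload for testing offline:

```python
def check_search_results(content):
    """Return a list of problems found in a search-result payload (empty if OK)."""
    if not isinstance(content, dict) or 'success' not in content:
        return ['missing top-level "success" key']
    data = content.get('data')
    results = data.get('result') if isinstance(data, dict) else None
    if not isinstance(results, list):
        return ['"data.result" is missing or not a list']
    problems = []
    for i, item in enumerate(results):
        if not isinstance(item, dict):
            problems.append('item {}: not an object'.format(i))
            continue
        # Fields the sorting code and the template rely on
        for key in ('creation_time', 'id', 'actor'):
            if key not in item:
                problems.append('item {}: missing "{}"'.format(i, key))
    return problems


# Hand-made mock payload (made-up values) mirroring the fields the template uses
mock_content = {
    'success': True,
    'data': {'result': [
        {'id': '111', 'creation_time': 1562002888, 'actor': {'id': '42'},
         'message': 'hello'},
    ]},
}
print(check_search_results(mock_content) or 'looks OK')
```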

## HTML template

To avoid an ugly mess of inlined HTML, which makes for unreadable code and stopped being cool sometime after PHP 3 was released back in 1998, we prepare a nice [Jinja2][2] template to hold the results of our data processing.

[2]:https://jinja.palletsprojects.com/en/2.10.x/

```
%%writefile template/SearchResults.html
<!DOCTYPE html>
<html>
    <head>
        <title>Ordered Search Results</title>
    </head>
    <body>
        <h2>Ordered Search Results</h2>
        Generated on {{ timestamp }}
        <hr/>
        <ul>
            {% for item in items %}
                <li>
                    {{ item.creation_time|tsformat }}:
                    <a href="https://fb.me/{{ item.actor.id }}/posts/{{ item.id }}">
                        {{ item.id }}
                    </a>:
                    {{ item.message }}<br/>
                    {% if item.image %}
                        <img src="{{ item.image }}" width="200"/>
                    {% endif %}
                </li>
            {% endfor %}
        </ul>
    </body>
</html>
```
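The template builds each post link by joining the actor id and the post id into `https://fb.me/<actor id>/posts/<post id>`. The same construction in plain Python can be handy for checking links before rendering. `post_url` is a hypothetical helper, and the URL pattern is simply the one used in this template, not a documented FB format.

```python
from urllib.parse import quote


def post_url(item, base='https://fb.me'):
    """Build the post link the template uses from an item's actor id and post id."""
    # quote() escapes any characters in the ids that are unsafe in a URL path
    actor_id = quote(str(item['actor']['id']), safe='')
    post_id = quote(str(item['id']), safe='')
    return '{}/{}/posts/{}'.format(base, actor_id, post_id)


print(post_url({'id': '111', 'actor': {'id': '42'}}))
```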

## Processing

First, check whether `success` is a top-level key of the retrieved JSON. If so, print how many items are in `data['result']`, sort those items by `creation_time` (newest first) into `sorted_dataset`, and pretty-print the first item so we have an idea of which fields each item contains.

If this cell prints an error, your JSON data is not in the expected format.

```python
if 'success' in content:
    results = content['data']['result']
    print('{} items in dataset.\n'.format(len(results)))

    # Newest first; int() guards against timestamps delivered as strings
    sorted_dataset = sorted(results,
                            key=lambda item: int(item['creation_time']),
                            reverse=True)

    # Show the first item so we know which fields are available
    if sorted_dataset:
        p2.pprint(sorted_dataset[0])

else:
    print('Error: check your JSON')
```
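If `creation_time` ever arrives as a string rather than an integer, sorting on the raw value compares the strings lexicographically and can misorder posts; converting with `int()` in the sort key avoids that. Whether FB sends strings or integers here is not stated in this notebook, so treat this as a defensive assumption. A small demonstration with made-up values:

```python
# Made-up items whose timestamps are strings of different lengths
items = [
    {'id': 'a', 'creation_time': '999999999'},   # September 2001, 9 digits
    {'id': 'b', 'creation_time': '1562002888'},  # July 2019, 10 digits
]

# Lexicographic comparison: '9...' > '1...', so the 2001 post wrongly comes first
by_string = sorted(items, key=lambda item: item['creation_time'], reverse=True)

# Numeric comparison gives the intended newest-first order
by_number = sorted(items, key=lambda item: int(item['creation_time']), reverse=True)

print([i['id'] for i in by_string], [i['id'] for i in by_number])
```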


Finally, render the template with `sorted_dataset` as source data and display the result.

```python
import datetime
from IPython.display import HTML, display
from jinja2 import Environment, FileSystemLoader, select_autoescape

env = Environment(
    loader=FileSystemLoader('./template/'),
    autoescape=select_autoescape(['html', 'xml'])
)

# Register a custom datetime filter to render FB timestamps (Unix epoch seconds)
# as human-readable UTC dates, e.g. 1562002888 -> 2019-07-01 17:41:28

def tsformat(timestamp):
    ts = int(timestamp)
    # Timezone-aware replacement for the deprecated utcfromtimestamp()
    dto = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
    return dto.strftime('%Y-%m-%d %H:%M:%S')

env.filters['tsformat'] = tsformat

# Load our previously defined template
template = env.get_template('SearchResults.html')

# Render the template as HTML and display it as this cell's output
generated = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
display(HTML(template.render(items=sorted_dataset, timestamp=generated)))
```
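To view the page on its own in a browser tab, one option is to save the rendered HTML to a file and open or download it. This is a minimal sketch; `save_html` and the output file name are hypothetical. In the notebook, the string passed in would be the result of `template.render(...)`, and in Colab `google.colab.files.download(path)` can then offer the file as a download.

```python
from pathlib import Path


def save_html(html_text, path='search_results.html'):
    """Write rendered HTML to disk as UTF-8 and return the resolved path."""
    out = Path(path)
    out.write_text(html_text, encoding='utf-8')
    return out.resolve()


# Placeholder page; in the notebook this would be template.render(...)
saved = save_html('<!DOCTYPE html><html><body>example</body></html>')
print('wrote', saved)
```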
