## Detecting canvas fingerprinting scripts
- This notebook demonstrates canvas fingerprinting detection using streaming analysis
- Canvas fingerprinting detection code is taken from https://github.com/sensor-js/OpenWPM-mobile
- For background on canvas fingerprinting see our [CCS'14](https://securehomes.esat.kuleuven.be/~gacar/persistent/) and [CCS'16](https://webtransparency.cs.princeton.edu/webcensus/) studies.



In [40]:
import re
import json
import sqlite3
import pandas as pd
from collections import defaultdict
from tqdm import tqdm

In [41]:
# import some analysis utilities from https://github.com/englehardt/crawl_utils
import sys
sys.path.append('./crawl_utils/')
import domain_utils as du
import analysis_utils as au

pd.set_option("display.max_colwidth",500)
pd.set_option("display.max_rows",500)


In [42]:
# use the sample sqlite
DB = '/home/marleensteinhoff/UNi/Projektseminar/Datenanalyse/sample_2018-06_1m_stateless_census_crawl.sqlite'

### Load JavaScript calls

In [43]:
con = sqlite3.connect(DB)

con.row_factory = sqlite3.Row
cur = con.cursor()
js = pd.read_sql_query("SELECT * FROM javascript", con)

print("Number of javascript calls", len(js))

Number of javascript calls 501207


In [44]:
# Add the helper column
js['script_ps1'] = js['script_url'].apply(lambda x: du.get_ps_plus_1(x) if x is not None else None)
js.head(3)

Unnamed: 0,id,crawl_id,visit_id,script_url,script_line,script_col,func_name,script_loc_eval,document_url,top_level_url,call_stack,symbol,operation,value,arguments,time_stamp,script_ps1
0,1,7,7,https://www.google.co.in/?gws_rd=ssl,1,3641,,,https://www.google.co.in/?gws_rd=ssl,https://www.google.co.in/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.880Z,google.co.in
1,2,7,7,https://www.google.co.in/?gws_rd=ssl,1,3731,,,https://www.google.co.in/?gws_rd=ssl,https://www.google.co.in/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.880Z,google.co.in
2,3,7,7,https://www.google.co.in/?gws_rd=ssl,1,3732,,,https://www.google.co.in/?gws_rd=ssl,https://www.google.co.in/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.882Z,google.co.in


### Breakdown of instrumented function calls

In [45]:
js[js.operation == "call"].symbol.value_counts().head(10)

window.Storage.getItem                  46851
window.Storage.setItem                  18104
window.Storage.removeItem               13812
CanvasRenderingContext2D.fill            7258
CanvasRenderingContext2D.save            7074
CanvasRenderingContext2D.restore         7070
HTMLCanvasElement.getContext             4208
window.Storage.key                       3689
CanvasRenderingContext2D.measureText     3103
CanvasRenderingContext2D.stroke          2393
Name: symbol, dtype: int64

### Canvas API calls
- Print the most common arguments to `CanvasRenderingContext2D.fillText`,
which draws text onto a canvas.
- `Cwm fjordbank glyphs vext quiz` is a [perfect pangram](https://en.wikipedia.org/wiki/Pangram#Short_pangrams)
that [we found](https://securehomes.esat.kuleuven.be/~gacar/persistent/#canvas-results) to be commonly used by canvas fingerprinters.

In [46]:
js[(js.operation == "call") &
   (js.symbol == "CanvasRenderingContext2D.fillText")
  ].arguments.value_counts().head(10)

{"0":"Cwm fjordbank glyphs vext quiz, 😃","1":4,"2":45}    74
{"0":"Cwm fjordbank glyphs vext quiz, 😃","1":2,"2":15}    74
{"0":"!image!","1":4,"2":17}                              39
{"0":"!image!","1":2,"2":15}                              39
{"0":"!H71JCaj)]# 1@#","1":4,"2":8}                       19
{"0":"Soft Ruddy Foothold 2","1":2,"2":2}                 19
{"0":"🇺​🇳","1":0,"2":0}                                   18
{"0":"🇺🇳","1":0,"2":0}                                    18
{"0":"🕴​♀️","1":0,"2":0}                                  14
{"0":"🕴‍♀️","1":0,"2":0}                                  14
Name: arguments, dtype: int64
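
Each `arguments` value is a JSON object keyed by argument position: for `fillText`, key `"0"` holds the text and `"1"` and `"2"` hold the coordinates. The following is a minimal sketch of extracting the text and measuring its length both with and without non-ASCII characters, which matters for the emoji strings listed above. The helper names `first_argument` and `ascii_length` are hypothetical, and the emoji is written as an escape sequence to keep the snippet encoding-safe.

```python
import json


def first_argument(arguments):
    """Return the first positional argument of a logged call as a string."""
    if not arguments:
        return ""
    return str(json.loads(arguments).get("0", ""))


def ascii_length(text):
    """Count only the ASCII characters in text."""
    return len(text.encode("ascii", "ignore"))


# Same shape as the most common fillText record above.
record = json.dumps(
    {"0": "Cwm fjordbank glyphs vext quiz, \U0001F603", "1": 4, "2": 45}
)
text = first_argument(record)

# Full length in code points vs. length of the ASCII part only.
print(len(text), ascii_length(text))
```

A string made only of emoji has an ASCII length of zero, so it can never pass a minimum text length check based on `ascii_length`.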

### Streaming analysis to detect canvas fingerprinting
- To detect potential canvas fingerprinters, we check for [a set of conditions](http://randomwalker.info/publications/OpenWPM_1_million_site_tracking_measurement.pdf#page=12). We use a slightly different set of conditions than the paper to reduce false negatives.


In [47]:
CANVAS_READ_FUNCS = [
    "HTMLCanvasElement.toDataURL",
    "CanvasRenderingContext2D.getImageData"
    ]

CANVAS_WRITE_FUNCS = [
    "CanvasRenderingContext2D.fillText",
    "CanvasRenderingContext2D.strokeText"
    ]

"""
Criterion 3 from Englehardt & Narayanan, 2016
"3. The script should not call the save, restore, or addEventListener
methods of the rendering context."

`addEventListener` is only called for HTMLCanvasElement, so we use that.
"""
CANVAS_FP_DO_NOT_CALL_LIST = ["CanvasRenderingContext2D.save",
                              "CanvasRenderingContext2D.restore",
                              "HTMLCanvasElement.addEventListener"]

In [48]:
MIN_CANVAS_TEXT_LEN = 10
MIN_CANVAS_IMAGE_WIDTH = 16
MIN_CANVAS_IMAGE_HEIGHT = 16


def get_canvas_text(arguments):
    """Return the string that is written onto canvas from function arguments."""
    if not arguments:
        return ""
    canvas_write_args = json.loads(arguments)
    try:
        # cast numbers etc. to a string
        return str(canvas_write_args["0"])
    except Exception:
        return ""


def are_get_image_data_dimensions_too_small(arguments):
    """Return True if the requested pixel area is below the min. dimensions."""
    # https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/getImageData#Parameters  # noqa
    get_image_data_args = json.loads(arguments)
    sw = int(get_image_data_args["2"])
    sh = int(get_image_data_args["3"])
    return (sw < MIN_CANVAS_IMAGE_WIDTH) or (sh < MIN_CANVAS_IMAGE_HEIGHT)
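
To illustrate the size filter, the sketch below repeats the check in a self-contained form under the hypothetical name `is_read_too_small` and applies it to two made-up `getImageData` argument records. The values are illustrative and not taken from the crawl.

```python
import json

MIN_CANVAS_IMAGE_WIDTH = 16
MIN_CANVAS_IMAGE_HEIGHT = 16


def is_read_too_small(arguments):
    """Return True if a getImageData call reads fewer than 16x16 pixels."""
    args = json.loads(arguments)
    # getImageData(sx, sy, sw, sh): width and height are arguments 2 and 3.
    sw, sh = int(args["2"]), int(args["3"])
    return sw < MIN_CANVAS_IMAGE_WIDTH or sh < MIN_CANVAS_IMAGE_HEIGHT


# Illustrative records: a single-pixel read and a 300x150 read.
single_pixel = json.dumps({"0": 0, "1": 0, "2": 1, "3": 1})
full_canvas = json.dumps({"0": 0, "1": 0, "2": 300, "3": 150})

print(is_read_too_small(single_pixel))
print(is_read_too_small(full_canvas))
```

Small reads like the first one are ignored because a few pixels are unlikely to carry enough rendering detail to be useful as a fingerprint.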



In [49]:
def get_canvas_fingerprinters(canvas_reads, canvas_writes, canvas_styles,
                              canvas_banned_calls, canvas_texts):
    canvas_fingerprinters = set()
    for script_address, visit_ids in canvas_reads.items():
        if script_address in canvas_fingerprinters:
            continue
        canvas_rw_visits = visit_ids.\
            intersection(canvas_writes[script_address])
        if not canvas_rw_visits:
            continue
        # canvas_styles is unused: the style/color condition is not applied
        for canvas_rw_visit in canvas_rw_visits:
            # Check whether the script called save, restore or
            # addEventListener during this visit. We skip such visits
            # to eliminate false positives.
            if canvas_rw_visit in canvas_banned_calls[script_address]:
                print("Excluding potential canvas FP script", script_address,
                      "visit#", canvas_rw_visit,
                      canvas_texts[(script_address, canvas_rw_visit)])
                continue
                continue
            canvas_fingerprinters.add(script_address)
            #print ("Canvas fingerprinter", script_address, "visit#",
            #       canvas_rw_visit,
            #       canvas_texts[(script_address, canvas_rw_visit)])
            break

    return canvas_fingerprinters
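
The selection rule above can be exercised on toy data. The sketch below is a condensed, hypothetical restatement (`flag_scripts` is not part of the notebook): a script is flagged if at least one visit has both a canvas read and a qualifying text write, and no banned call in that same visit. The script URLs and visit ids are invented.

```python
from collections import defaultdict


def flag_scripts(reads, writes, banned):
    """Return scripts with a read+write visit that makes no banned call."""
    flagged = set()
    for script, read_visits in reads.items():
        # Visits where the script both wrote text and read the canvas back.
        read_write_visits = read_visits & writes.get(script, set())
        # Drop visits with save/restore/addEventListener calls.
        if read_write_visits - banned.get(script, set()):
            flagged.add(script)
    return flagged


reads = defaultdict(set, {
    "https://a.example/fp.js": {1, 2},
    "https://b.example/chart.js": {3},
    "https://c.example/read-only.js": {4},
})
writes = defaultdict(set, {
    "https://a.example/fp.js": {2},
    "https://b.example/chart.js": {3},
})
banned = defaultdict(set, {"https://b.example/chart.js": {3}})

print(flag_scripts(reads, writes, banned))
```

Only the first script is flagged: the second is excluded by its banned call, and the third never writes text to the canvas.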



#### Start streaming analysis

In [50]:
query = """SELECT sv.site_url, sv.visit_id,
    js.script_url, js.operation, js.arguments, js.symbol, js.value
    FROM javascript as js LEFT JOIN site_visits as sv
    ON sv.visit_id = js.visit_id WHERE
    js.script_url <> ''
    """

canvas_reads = defaultdict(set)
canvas_writes = defaultdict(set)
canvas_texts = defaultdict(set)
canvas_banned_calls = defaultdict(set)
canvas_styles = defaultdict(lambda: defaultdict(set))

for row in tqdm(cur.execute(query)):
    # visit_id, script_url, operation, arguments, symbol, value = row[0:6]
    visit_id = row["visit_id"]
    site_url = row["site_url"]
    script_url = row["script_url"]
    operation = row["operation"]
    arguments = row["arguments"]
    symbol = row["symbol"]
    value = row["value"]

    # Exclude relative URLs, data urls, blobs
    if not (script_url.startswith("http://")
            or script_url.startswith("https://")):
        continue
    if symbol in CANVAS_READ_FUNCS and operation == "call":
        if (symbol == "CanvasRenderingContext2D.getImageData" and
                are_get_image_data_dimensions_too_small(arguments)):
            continue
        canvas_reads[script_url].add(visit_id)
    elif symbol in CANVAS_WRITE_FUNCS:
        text = get_canvas_text(arguments)
        # len() counts code points (or UTF-16 code units on narrow Python 2
        # builds), not visible characters, so strings containing emoji
        # sequences look longer than they are and cause false positives.
        # For instance length of "🏴󠁧", which is written onto canvas by
        # Wordpress to check emoji support, is returned as 13.
        # We ignore non-ascii characters to prevent false positives.
        # It may be worth logging such cases so that real fingerprinting
        # scripts do not slip through.
        if len(text.encode('ascii', 'ignore')) >= MIN_CANVAS_TEXT_LEN:
            canvas_writes[script_url].add(visit_id)
            # the following is used to debug false positives
            canvas_texts[(script_url, visit_id)].add(text)
    elif symbol == "CanvasRenderingContext2D.fillStyle" and\
            operation == "call":
        canvas_styles[script_url][visit_id].add(value)
    elif operation == "call" and symbol in CANVAS_FP_DO_NOT_CALL_LIST:
        canvas_banned_calls[script_url].add(visit_id)



500911it [00:03, 151208.28it/s]





In [51]:
canvas_fingerprinters = get_canvas_fingerprinters(canvas_reads,
                                                  canvas_writes,
                                                  canvas_styles,
                                                  canvas_banned_calls,
                                                  canvas_texts)


In [52]:
print(canvas_fingerprinters)

{'https://www.hotels.com/_bm/bd-1-30', 'http://www.fedex.com/_bm/bd-1-30', 'http://buckridge.link/js/lib.js', 'https://d2fbkzyicji7c4.cloudfront.net/?zkbfd=729305', 'https://www.cnet.com/akam/10/5e256288', 'http://security.iqiyi.com/static/cook/v1/cooksdk.js', 'https://client.perimeterx.net/PXZHh9f9x0/main.min.js', 'https://tags.tiqcdn.com/utag/cisco/home/prod/utag.56.js?utv=ut4.44.201806211837', 'https://client.perimeterx.net/PX8FCGYgk4/main.min.js', 'https://s0.bukalapak.com/ast/vendor-496a5b08a478b21701ebcc2a022639e6cdda398c2d82a84dd0a508f2d48f3e83.js', 'https://www.pof.com/', 'http://www.fedex.com/akam/10/42fcd055', 'https://secure.bankofamerica.com/login/sign-in/cc.go', 'https://client.perimeterx.net/PXSs13U803/main.min.js', 'https://gateway.foresee.com/code/19.5.0/fs.utils.js', 'https://yummy.consumable.com/3541/cnsmbl-audio-728x90-slider/widget/iframe.js?cb=1530109815282', 'https://cdnbigdata.azureedge.net/scripts/fingerprint2.min.js', 'https://i.hh.ru/jsbuild/640419b5-libs.js',

In [53]:
print(canvas_reads)

defaultdict(<class 'set'>, {'https://vk.com/js/cmodules/web/grip.js?4164501492': {14}, 'https://static.xx.fbcdn.net/rsrc.php/v3iYXl4/yu/l/en_US/aArLzhwqJVj.js': {3, 204}, 'https://static.licdn.com/scds/concat/common/js?h=a06jpss2hf43xwxobn0gl598m-44hhbxag3hinac547ym9vby09-5jratctnqzzuc1057yivxswgf-9zz2lhu3eq1epk7sq1t8cdb5s-eound1d1xhqm86h7g2p57b94l-edgsl2z4e4gk56cy2m5kbpp1q-acgipb6zomeaovod456pb7yjs-bctwwqj7p01tcj2smshz2bboe-88ec8b078z4fzj5q3z4qowg63-bftaa82sjwcbrohoe28skni7b-58m2n4boqb1vxfd6hgd34auwd-8ycvggo1571xgrdka3utvcyml-cfabcg4u1cj0em4yissh5mfxu': {16}, 'https://www.redditstatic.com/reddit-init.en.xE6foQQcI8M.js': {26}, 'http://statics.itc.cn/web/v3/static/js/main-f895b2f9d0.js': {18}, 'https://atanx.alicdn.com/t/tanxssp.js?_v=12': {654, 112, 18, 19, 697, 22, 601}, 'https://atanx2.alicdn.com/g/mm/tanx-cdn2/t/tanxssp.js?_v=12': {18}, 'http://a1.alicdn.com/creation/html/2016/06/20/creation-245057E3sJ6U0UZ8D-2830683.html#tanxdspv=https%3a%2f%2frdstat.tanx.com%2ftrd%3ff%3d%26k%3da09

In [54]:
# Mark canvas fingerprinting scripts in the dataframe
js["canvas_fp"] = js["script_url"].map(lambda x: x in canvas_fingerprinters)
# Extract first arguments of function calls as a separate column
js["arg0"] = js["arguments"].map(lambda x: json.loads(x)["0"] if x else "")
js.head(10)

Unnamed: 0,id,crawl_id,visit_id,script_url,script_line,script_col,func_name,script_loc_eval,document_url,top_level_url,call_stack,symbol,operation,value,arguments,time_stamp,script_ps1,canvas_fp,arg0
0,1,7,7,https://www.google.co.in/?gws_rd=ssl,1,3641,,,https://www.google.co.in/?gws_rd=ssl,https://www.google.co.in/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.880Z,google.co.in,False,
1,2,7,7,https://www.google.co.in/?gws_rd=ssl,1,3731,,,https://www.google.co.in/?gws_rd=ssl,https://www.google.co.in/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.880Z,google.co.in,False,
2,3,7,7,https://www.google.co.in/?gws_rd=ssl,1,3732,,,https://www.google.co.in/?gws_rd=ssl,https://www.google.co.in/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.882Z,google.co.in,False,
3,4,11,11,https://www.google.co.jp/?gws_rd=ssl,1,3641,,,https://www.google.co.jp/?gws_rd=ssl,https://www.google.co.jp/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.950Z,google.co.jp,False,
4,5,7,7,https://www.google.co.in/?gws_rd=ssl,1,5173,,,https://www.google.co.in/?gws_rd=ssl,https://www.google.co.in/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.882Z,google.co.in,False,
5,6,7,7,https://www.google.co.in/?gws_rd=ssl,39,36,,,https://www.google.co.in/?gws_rd=ssl,https://www.google.co.in/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.894Z,google.co.in,False,
6,7,7,7,https://www.google.co.in/?gws_rd=ssl,39,776,,,https://www.google.co.in/?gws_rd=ssl,https://www.google.co.in/?gws_rd=ssl,,window.navigator.platform,get,Linux x86_64,,2018-06-27T14:19:39.894Z,google.co.in,False,
7,8,7,7,https://www.google.co.in/?gws_rd=ssl,335,92,mp,,https://www.google.co.in/?gws_rd=ssl,https://www.google.co.in/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.932Z,google.co.in,False,
8,9,11,11,https://www.google.co.jp/?gws_rd=ssl,1,3731,,,https://www.google.co.jp/?gws_rd=ssl,https://www.google.co.jp/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.950Z,google.co.jp,False,
9,10,11,11,https://www.google.co.jp/?gws_rd=ssl,1,3732,,,https://www.google.co.jp/?gws_rd=ssl,https://www.google.co.jp/?gws_rd=ssl,,window.navigator.userAgent,get,Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0,,2018-06-27T14:19:39.950Z,google.co.jp,False,


## List canvas fingerprinting scripts

In [55]:
js[(js.canvas_fp) &
   (js.operation == "call") &
   (js.symbol == "CanvasRenderingContext2D.fillText")
  ].rename({"arg0": "canvas_text"}, axis='columns')[["top_level_url", "script_ps1", "canvas_text"]].\
        drop_duplicates()

Unnamed: 0,top_level_url,script_ps1,canvas_text
650,https://vk.com/,vk.com,Cwm fjordbank glyphs vext quiz
2241,https://www.linkedin.com/,licdn.com,92UV<v=Xd&N@Ig_P#1iqrWHBoclz>FZkyYu4xf(O^A8TJh)mbnGs$S]3-k!%j0Q{+w[RCKEat?L56}M~`D7e*
2719,https://www.reddit.com/,redditstatic.com,"Cwm fjordbank glyphs vext quiz, 😃"
5057,http://www.sohu.com/,alicdn.com,"Cwm fjordbank glyphs vext quiz, 😃"
5401,https://www.tmall.com/,alicdn.com,"Cwm fjordbank glyphs vext quiz, 😃"
5644,http://www.sina.com.cn/,alicdn.com,"Cwm fjordbank glyphs vext quiz, 😃"
17789,http://youku.com/,alicdn.com,"Cwm fjordbank glyphs vext quiz, 😃"
18456,https://www.pixnet.net/,pixanalytics.com,"Cwm fjordbank glyphs vext quiz, 😃"
29102,https://www.txxx.com/,txxx.com,"Cwm fjordbank glyphs vext quiz, 😃"
30501,https://www.rakuten.co.jp/,rakuten.co.jp,Soft Ruddy Foothold 2
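
One possible follow-up is to count how many distinct sites each fingerprinting script domain appears on. The sketch below uses only the standard library, with rows copied from the table above; `sites_per_domain` is a hypothetical helper name.

```python
from collections import defaultdict

# (top_level_url, script_ps1) pairs copied from the table above.
rows = [
    ("https://vk.com/", "vk.com"),
    ("https://www.linkedin.com/", "licdn.com"),
    ("https://www.reddit.com/", "redditstatic.com"),
    ("http://www.sohu.com/", "alicdn.com"),
    ("https://www.tmall.com/", "alicdn.com"),
    ("http://www.sina.com.cn/", "alicdn.com"),
    ("http://youku.com/", "alicdn.com"),
]


def sites_per_domain(pairs):
    """Map each script domain to the number of distinct sites embedding it."""
    sites = defaultdict(set)
    for top_level_url, script_domain in pairs:
        sites[script_domain].add(top_level_url)
    return {domain: len(urls) for domain, urls in sites.items()}


counts = sites_per_domain(rows)
# Most widely embedded domains first.
for domain, n_sites in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(domain, n_sites)
```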
