### E10S Experiment Beta: Top extensions

[Bug 1224518](https://bugzilla.mozilla.org/show_bug.cgi?id=1224518)

This analysis lists the top extensions in the Telemetry pings and compares them to the [whitelisted e10s addon list](https://wiki.mozilla.org/Electrolysis/Addons).

In [1]:
from __future__ import division

import ujson as json
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import plotly.plotly as py
import IPython

from moztelemetry.spark import get_pings, get_one_ping_per_client, get_pings_properties
from montecarlino import grouped_permutation_test

from whitelist import ADDON_WHITELIST

%pylab inline
IPython.core.pylabtools.figsize(16, 7)

Unable to parse whitelist (/home/hadoop/anaconda2/lib/python2.7/site-packages/moztelemetry/bucket-whitelist.json). Assuming all histograms are acceptable.
Populating the interactive namespace from numpy and matplotlib


In [2]:
sc.defaultParallelism

16

#### Get addons

In [3]:
dataset = sqlContext.read.load("s3://telemetry-parquet/e10s-experiment/e10s-enabled-beta-20151214@experiments.mozilla.org/generationDate=20160106", "parquet")

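Transform the DataFrame into an RDD of pings. Each row stores its addons as a JSON string, which is parsed and wrapped in the nested layout of a Telemetry ping.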

In [4]:
def row_2_ping(row):
    ping = {"environment": {"addons": json.loads(row.addons)}}
    return ping
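As a rough illustration of the resulting ping shape, the sketch below runs the same conversion without Spark. The `Row` namedtuple is a hypothetical stand-in for a Spark row, the addon ID is made up, and the standard-library `json` module replaces `ujson`.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for a Spark Row with a single JSON-encoded "addons" column.
Row = namedtuple("Row", ["addons"])

def row_2_ping(row):
    # Same conversion as in the notebook: parse the JSON string and nest it.
    return {"environment": {"addons": json.loads(row.addons)}}

# Made-up row for illustration only.
toy_row = Row(addons='{"activeAddons": {"a@example.com": {"name": "A"}}}')
toy_ping = row_2_ping(toy_row)
active = toy_ping["environment"]["addons"]["activeAddons"]
```

After this step, `active` is a dictionary keyed by addon ID, which is the structure the later cells iterate over.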

In [5]:
subset = dataset.rdd.map(row_2_ping)
subset_count = subset.count()

In [6]:
subset_count

450968

In [7]:
def ping_has_addons(ping, check_id_func):
    # True only if the ping has at least one active addon
    # and every active addon ID passes check_id_func.
    activeAddons = ping["environment"]["addons"].get("activeAddons", {})
    if not activeAddons:
        return False
    for k in activeAddons:
        if not check_id_func(k):
            return False
    return True
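To make the semantics concrete: the function returns `True` only when a ping has at least one active addon and every addon ID satisfies the predicate. The sketch below restates it in plain Python (no Spark, runs on Python 2 or 3) and applies it to three made-up pings. The `make_ping` helper, the addon IDs, and `toy_whitelist` are hypothetical and are not taken from `ADDON_WHITELIST`.

```python
def ping_has_addons(ping, check_id_func):
    # Equivalent restatement: non-empty addon set and all IDs pass the check.
    active_addons = ping["environment"]["addons"].get("activeAddons", {})
    if not active_addons:
        return False
    return all(check_id_func(addon_id) for addon_id in active_addons)

def make_ping(addon_ids):
    # Hypothetical helper: build a ping shaped like the output of row_2_ping.
    active = {addon_id: {"name": addon_id} for addon_id in addon_ids}
    return {"environment": {"addons": {"activeAddons": active}}}

toy_whitelist = {"good@example.com": "Good Addon"}  # hypothetical whitelist

pings = [
    make_ping([]),                                         # no addons
    make_ping(["good@example.com"]),                       # only whitelisted
    make_ping(["good@example.com", "other@example.com"]),  # mixed
]

has_any = [ping_has_addons(p, lambda k: True) for p in pings]
only_whitelisted = [ping_has_addons(p, lambda k: k in toy_whitelist) for p in pings]

# Pings with only whitelisted addons are a subset of pings with any addon,
# so the difference counts pings with at least one unwhitelisted addon.
unwhitelisted_count = sum(has_any) - sum(only_whitelisted)
```

The last line mirrors the subtraction used later in the notebook: it is valid because every ping that passes the whitelist check also passes the "any addon" check.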

How many clients had at least one addon?

In [8]:
any_subset = subset.filter(lambda p: ping_has_addons(p, lambda k: True))
any_subset_count = any_subset.count()

In [9]:
print "{:.2f}%".format(100.0 * any_subset_count / subset_count)

47.50%


How many clients had only whitelisted addons?

In [10]:
whitelisted_subset = subset.filter(lambda p: ping_has_addons(p, lambda k: k in ADDON_WHITELIST))
whitelisted_subset_count = whitelisted_subset.count()

In [11]:
print "{:.2f}%".format(100.0 * whitelisted_subset_count / subset_count)

9.40%


How many clients had at least one unwhitelisted addon?

In [12]:
print "{:.2f}%".format(100.0 * (any_subset_count - whitelisted_subset_count) / subset_count)

38.10%


In [13]:
def get_ping_addons(ping):
    activeAddons = ping["environment"]["addons"].get("activeAddons", {})
    for k, v in activeAddons.iteritems():
        if v.get("name"):
            yield (k, v["name"].encode("ascii", "ignore"))

addons = subset.flatMap(get_ping_addons)

In [14]:
addon_counts = addons.countByKey()
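The `flatMap` plus `countByKey` combination above can be approximated in plain Python as follows. This is a sketch under stated assumptions: the toy pings and addon IDs are made up, and the ASCII-encoding step is omitted so the example behaves the same on Python 2 and 3. It shows that each count is the number of pings listing that addon ID, and that addons without a name are dropped before counting.

```python
from collections import Counter
from itertools import chain

def get_ping_addons(ping):
    # Yield (addon ID, name) for every active addon that has a non-empty name.
    active_addons = ping["environment"]["addons"].get("activeAddons", {})
    for addon_id, info in active_addons.items():
        if info.get("name"):
            yield (addon_id, info["name"])

# Made-up pings for illustration only.
toy_pings = [
    {"environment": {"addons": {"activeAddons": {
        "a@example.com": {"name": "A"}, "b@example.com": {"name": "B"}}}}},
    {"environment": {"addons": {"activeAddons": {
        "a@example.com": {"name": "A"}, "c@example.com": {"name": ""}}}}},
    {"environment": {"addons": {}}},
]

# flatMap: concatenate the pairs produced for each ping.
pairs = list(chain.from_iterable(get_ping_addons(p) for p in toy_pings))

# countByKey: number of pairs per addon ID.
toy_counts = Counter(addon_id for addon_id, _ in pairs)
toy_total = sum(toy_counts.values())
```

Here `c@example.com` never reaches the counts because its name is empty, so totals derived this way cover named addons only.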

How many addons did the clients have installed in total?

In [15]:
total_addons = sum(addon_counts.values())
total_addons

492721

Which whitelisted addons did not appear in the pings?

In [16]:
for addon in ADDON_WHITELIST:
    if addon not in addon_counts:
        print ADDON_WHITELIST[addon]

Vertical Tabs


#### Top whitelisted addons

In [17]:
from collections import Counter

for addon, addon_count in Counter(addon_counts).most_common():
    if addon in ADDON_WHITELIST:
        print "{:.3f}%: {}".format(100.0 * addon_count / total_addons, ADDON_WHITELIST[addon])

7.589%: Adblock Plus
2.932%: IDM CC
2.070%: Avast Online Security
2.010%: Video DownloadHelper
1.327%: Download YouTube Videos as MP4
1.141%: Firebug
1.038%: 1-Click YouTube Video Downloader
1.014%: McAfee WebAdvisor
1.011%: YouTube Video and Audio Downloader
0.892%: Flash Video Downloader - YouTube HD Download [4K]
0.731%: DownThemAll!
0.712%: Kaspersky URL Advisor
0.614%: Greasemonkey
0.519%: Adblock Plus Pop-up Addon
0.442%: Yandex Visual Bookmarks
0.432%: Yandex Elements
0.381%: Avira Browser Safety
0.370%: FlashGot
0.355%: Ghostery
0.354%: MEGA
0.308%: LastPass
0.300%: WOT
0.292%: AVG SafeGuard toolbar
0.265%: NoScript
0.263%: Google Translator for Firefox
0.205%: Element Hiding Helper for Adblock Plus
0.205%: Adblock Edge
0.188%: Pin It button
0.186%: Tab Mix Plus
0.184%: Download Status Bar
0.181%: Flagfox
0.164%: IE Tab 2
0.164%: Stylish
0.135%: Personas Plus
0.118%: FireFTP
0.107%: Yahoo! Toolbar
0.091%: Xmarks
0.087%: Garmin Communicator
0.072%: Flashblock
0.071%: uBlock
0.06

#### Top unwhitelisted addons

In [18]:
# An addon ID might have multiple names. Pick the longer one because some addons appear to have
# invalid names (e.g. single space).
addon_names = addons.reduceByKey(lambda a, b: a if len(a) > len(b) else b).collectAsMap()
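The name-selection step can be sketched without Spark as a per-key reduction. The pairs below are made up, and `pick_longer` is a hypothetical name for the lambda used above. Note that when two names have equal length the function returns its second argument, so in Spark, where the combination order is not fixed, either of two equally long names may be chosen.

```python
from functools import reduce

def pick_longer(a, b):
    # Same rule as the lambda above: keep a only if it is strictly longer.
    return a if len(a) > len(b) else b

# Made-up (addon ID, name) pairs, including an invalid single-space name.
pairs = [
    ("x@example.com", " "),
    ("x@example.com", "X Addon"),
    ("y@example.com", "Y"),
    ("x@example.com", "X"),
]

# Group names by addon ID, then reduce each group to a single name.
grouped = {}
for addon_id, name in pairs:
    grouped.setdefault(addon_id, []).append(name)

toy_names = {addon_id: reduce(pick_longer, names)
             for addon_id, names in grouped.items()}
```

The single-space name loses to any longer candidate, which is the case the comment in the cell above is guarding against.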

In [19]:
for addon, addon_count in Counter(addon_counts).most_common(100):
    if addon not in ADDON_WHITELIST:
        print "{:.3f}%: {} ({})".format(100.0 * addon_count / total_addons, addon_names[addon], addon)

9.659%: Test Pilot (testpilot@labs.mozilla.com)
2.299%: Skype Click to Call ({82AF8DCA-6DE9-405D-BD5E-43525BDAD38A})
1.402%: VideoDownloadConverter (_4zMembers_@www.videodownloadconverter.com)
1.380%: SaveFrom.net - helper (helper-sig@savefrom.net)
1.371%: iLivid (LVD-SAE@iacsearchandmedia.com)
1.277%: FromDocToPDF (_65Members_@download.fromdoctopdf.com)
1.122%: Module de blocage des sites Internet dangereux (content_blocker@kaspersky.com)
1.059%: anonymoX (client@anonymox.net)
0.878%: Module de blocage des sites Internet dangereux (content_blocker_663BE84DBCC949E88C7600F63CA7F098@kaspersky.com)
0.877%: Virtuellt tangentbord (virtual_keyboard_07402848C2F6470194F131B0F3DE025E@kaspersky.com)
0.738%: PConverter (_dzMembers_@www.pconverter.com)
0.723%: Blokowanie banerw (anti_banner@kaspersky.com)
0.717%: Klawiatura wirtualna (virtual_keyboard@kaspersky.com)
0.717%: Sicherer Zahlungsverkehr (online_banking@kaspersky.com)
0.712%: Sicherer Zahlungsverkehr (online_banking_08806E753BE44495B44E