### e10s-beta45-withaddons: Top addons

[Bug 1224518](https://bugzilla.mozilla.org/show_bug.cgi?id=1224518)

This analysis lists the top addons in the Telemetry pings and compares them to the [whitelisted e10s addon list](https://wiki.mozilla.org/Electrolysis/Addons).

In [1]:
from __future__ import division

import ujson as json
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import plotly.plotly as py
import IPython

from moztelemetry.spark import get_pings, get_one_ping_per_client, get_pings_properties
from montecarlino import grouped_permutation_test

from whitelist import ADDON_WHITELIST

%pylab inline
IPython.core.pylabtools.figsize(16, 7)

Unable to parse whitelist (/home/hadoop/anaconda2/lib/python2.7/site-packages/moztelemetry/bucket-whitelist.json). Assuming all histograms are acceptable.
Populating the interactive namespace from numpy and matplotlib


In [2]:
sc.defaultParallelism

160

#### Get addons

In [3]:
dataset = sqlContext.read.load("s3://telemetry-parquet/e10s-experiment/e10s-beta45-withaddons@experiments.mozilla.org/generationDate=20160207", "parquet")

Transform the DataFrame into an RDD of pings.

In [4]:
def row_2_ping(row):
    ping = {"environment": {"addons": json.loads(row.addons)}}
    return ping
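
To see what `row_2_ping` produces without a Spark cluster, here is a minimal sketch. It assumes a `namedtuple` as a stand-in for a Spark `Row` and uses the standard-library `json` module instead of `ujson`. The `FakeRow` type and the sample addon payload are hypothetical, made up for illustration only.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for a Spark Row with a single JSON-encoded "addons" column.
FakeRow = namedtuple("FakeRow", ["addons"])

def row_2_ping(row):
    # Rebuild the nested layout that the rest of the analysis expects.
    return {"environment": {"addons": json.loads(row.addons)}}

# Made-up payload, not taken from the dataset.
sample_row = FakeRow(addons=json.dumps({
    "activeAddons": {"example@addon.invalid": {"name": "Example Addon"}}
}))

sample_ping = row_2_ping(sample_row)
active_ids = sorted(sample_ping["environment"]["addons"]["activeAddons"])
print(active_ids)
```

Wrapping each row this way keeps the downstream helpers working on the same `ping["environment"]["addons"]` path that raw Telemetry pings use.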

In [5]:
subset = dataset.rdd.map(row_2_ping)
subset_count = subset.count()

In [6]:
subset_count

959609

In [7]:
def ping_has_addons(ping, check_id_func):
    # True only if the ping has at least one active addon
    # and every active addon ID passes check_id_func.
    activeAddons = ping["environment"]["addons"].get("activeAddons", {})
    if not activeAddons:
        return False
    for k in activeAddons:
        if not check_id_func(k):
            return False
    return True
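
The predicate above is easy to misread, so here is a small sketch of its behaviour on plain dictionaries. It is an equivalent rewrite using `all()` that also runs on Python 3, not the notebook's own cell. The `make_ping` helper, the `TOY_WHITELIST` and the addon IDs are hypothetical.

```python
def ping_has_addons(ping, check_id_func):
    active_addons = ping["environment"]["addons"].get("activeAddons", {})
    # A ping without addons never matches; otherwise every ID must pass the check.
    return bool(active_addons) and all(check_id_func(k) for k in active_addons)

def make_ping(addon_ids):
    # Hypothetical helper that builds a minimal ping for the given addon IDs.
    return {"environment": {"addons": {"activeAddons": {i: {} for i in addon_ids}}}}

# Hypothetical whitelist; the real one is imported from the whitelist module.
TOY_WHITELIST = {"a@example.invalid": "Addon A"}

def is_listed(addon_id):
    return addon_id in TOY_WHITELIST

no_addons = make_ping([])
only_listed = make_ping(["a@example.invalid"])
mixed = make_ping(["a@example.invalid", "b@example.invalid"])

results = [ping_has_addons(p, is_listed) for p in (no_addons, only_listed, mixed)]
print(results)
```

A ping with no addons is rejected even when the check accepts everything, which is why passing `lambda k: True` selects pings with at least one addon.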

How many clients had at least one addon?

In [8]:
any_subset = subset.filter(lambda p: ping_has_addons(p, lambda k: True))
any_subset_count = any_subset.count()

In [9]:
print "{:.2f}%".format(100.0 * any_subset_count / subset_count)

99.29%


How many clients had only whitelisted addons?

In [10]:
whitelisted_subset = subset.filter(lambda p: ping_has_addons(p, lambda k: k in ADDON_WHITELIST))
whitelisted_subset_count = whitelisted_subset.count()

In [11]:
print "{:.2f}%".format(100.0 * whitelisted_subset_count / subset_count)

0.04%


How many clients had at least one unwhitelisted addon?

In [12]:
print "{:.2f}%".format(100.0 * (any_subset_count - whitelisted_subset_count) / subset_count)

99.25%
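
The subtraction works because clients with only whitelisted addons are a subset of clients with any addon. The reported figures agree with this: 99.29% minus 0.04% gives 99.25%. A minimal sketch of the arithmetic follows; the counts are made up for illustration and are not the notebook's values.

```python
def percent(part, whole):
    return 100.0 * part / whole

# Made-up counts, chosen only to illustrate the arithmetic.
total_clients = 1000
with_any_addon = 990      # at least one active addon
only_whitelisted = 40     # at least one addon, and all of them whitelisted

# "Only whitelisted" is a subset of "any addon", so the difference is the
# number of clients with at least one addon that is not on the whitelist.
with_unlisted = with_any_addon - only_whitelisted

share = "{:.2f}%".format(percent(with_unlisted, total_clients))
print(share)
```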


In [13]:
def get_ping_addons(ping):
    activeAddons = ping["environment"]["addons"].get("activeAddons", {})
    for k, v in activeAddons.iteritems():
        if v.get("name"):
            yield (k, v["name"].encode("ascii", "ignore"))

addons = subset.flatMap(get_ping_addons)

In [14]:
addon_counts = addons.countByKey()
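
For readers without a Spark context, the `flatMap` and `countByKey` steps can be approximated in plain Python as sketched below. This assumes the pings fit in memory, and the sample pings and addon IDs are made up. The ASCII re-encoding of names is left out so the sketch also runs on Python 3.

```python
from collections import Counter
from itertools import chain

def get_ping_addons(ping):
    active_addons = ping["environment"]["addons"].get("activeAddons", {})
    for addon_id, info in active_addons.items():
        # Addons without a usable name are skipped, as in the cell above.
        if info.get("name"):
            yield (addon_id, info["name"])

# Made-up pings for illustration.
pings = [
    {"environment": {"addons": {"activeAddons": {
        "a@example.invalid": {"name": "Addon A"},
        "b@example.invalid": {"name": "Addon B"}}}}},
    {"environment": {"addons": {"activeAddons": {
        "a@example.invalid": {"name": "Addon A"},
        "c@example.invalid": {"name": ""}}}}},  # empty name: skipped
    {"environment": {"addons": {}}},            # no activeAddons key
]

# flatMap equivalent: run the generator on every ping and flatten the results.
pairs = list(chain.from_iterable(get_ping_addons(p) for p in pings))

# countByKey equivalent: count how often each key (the addon ID) occurs.
addon_counts = Counter(addon_id for addon_id, _ in pairs)
print(dict(addon_counts))
```

Because unnamed addons are dropped before counting, the totals derived from `addon_counts` cover only addons that reported a non-empty name.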

How many addon installations (counting only addons with a non-empty name) did the clients have in total?

In [15]:
total_addons = sum(addon_counts.values())
total_addons

2103048

Which whitelisted addons did not appear in the pings?

In [16]:
for addon in ADDON_WHITELIST:
    if addon not in addon_counts:
        print ADDON_WHITELIST[addon]

Vertical Tabs


#### Top whitelisted addons

Percentages are shares of all counted addon installations (`total_addons`), not shares of clients.

In [17]:
from collections import Counter

for addon, addon_count in Counter(addon_counts).most_common():
    if addon in ADDON_WHITELIST:
        print "{:.3f}%: {}".format(100.0 * addon_count / total_addons, ADDON_WHITELIST[addon])

4.217%: Adblock Plus
1.231%: IDM CC
1.084%: Video DownloadHelper
1.017%: Avast Online Security
0.688%: Download YouTube Videos as MP4
0.613%: Firebug
0.553%: YouTube Video and Audio Downloader
0.543%: 1-Click YouTube Video Downloader
0.518%: McAfee WebAdvisor
0.491%: Flash Video Downloader - YouTube HD Download [4K]
0.371%: DownThemAll!
0.329%: Greasemonkey
0.321%: Kaspersky URL Advisor
0.294%: Avira Browser Safety
0.290%: Adblock Plus Pop-up Addon
0.228%: MEGA
0.220%: Yandex Visual Bookmarks
0.213%: Yandex Elements
0.211%: AVG SafeGuard toolbar
0.199%: Ghostery
0.196%: FlashGot
0.170%: WOT
0.154%: NoScript
0.150%: Google Translator for Firefox
0.126%: Pin It button
0.115%: Adblock Edge
0.114%: Element Hiding Helper for Adblock Plus
0.105%: Flagfox
0.101%: Yahoo! Toolbar
0.100%: Download Status Bar
0.099%: Tab Mix Plus
0.089%: IE Tab 2
0.084%: Stylish
0.072%: FireFTP
0.070%: Personas Plus
0.047%: Garmin Communicator
0.043%: Xmarks
0.039%: Flashblock
0.039%: uBlock
0.038%: ColorfulTabs


#### Top unwhitelisted addons

In [18]:
# An addon ID might have multiple names. Pick the longer one because some addons appear to have
# invalid names (e.g. single space).
addon_names = addons.reduceByKey(lambda a, b: a if len(a) > len(b) else b).collectAsMap()
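
The name-selection rule can be sketched without Spark as a single pass over `(id, name)` pairs. This is an in-memory approximation of the `reduceByKey` call under the assumption that the pairs fit in a list. The `longest_name_per_id` function and the sample pairs are hypothetical.

```python
def longest_name_per_id(pairs):
    names = {}
    for addon_id, name in pairs:
        current = names.get(addon_id)
        # Same rule as the reducer: keep the accumulated name only if it is
        # strictly longer than the incoming one; otherwise take the new name.
        if current is None or len(name) >= len(current):
            names[addon_id] = name
    return names

# Made-up pairs, including a single-space name like the invalid ones mentioned above.
pairs = [
    ("x@example.invalid", " "),
    ("x@example.invalid", "Example X"),
    ("y@example.invalid", "Y"),
]

addon_names = longest_name_per_id(pairs)
print(addon_names)
```

When two names for the same ID have equal length, which one survives depends on processing order, and that order is not fixed in a distributed reduce.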

In [19]:
for addon, addon_count in Counter(addon_counts).most_common(100):
    if addon not in ADDON_WHITELIST:
        print "{:.3f}%: {} ({})".format(100.0 * addon_count / total_addons, addon_names[addon], addon)

45.211%: Firefox Hello Beta (loop@mozilla.org)
4.637%: Test Pilot (testpilot@labs.mozilla.com)
2.617%: Skype Click to Call ({82AF8DCA-6DE9-405D-BD5E-43525BDAD38A})
0.744%: FromDocToPDF (_65Members_@download.fromdoctopdf.com)
0.672%: SaveFrom.net - helper (helper-sig@savefrom.net)
0.553%: iLivid (LVD-SAE@iacsearchandmedia.com)
0.502%: Module de blocage des sites Internet dangereux (content_blocker@kaspersky.com)
0.495%: anonymoX (client@anonymox.net)
0.443%: Kaspersky Bescherming (light_plugin_D772DC8D6FAF43A29B25C4EBAA5AD1DE@kaspersky.com)
0.426%: Virtualioji klaviatra (virtual_keyboard_07402848C2F6470194F131B0F3DE025E@kaspersky.com)
0.426%: Module de blocage des sites Internet dangereux (content_blocker_663BE84DBCC949E88C7600F63CA7F098@kaspersky.com)
0.409%: goMovix - Movies And More (caa1-aDOiCAxFFMOVIX@jetpack)
0.403%: MySmartPrice (@mysmartprice-ff)
0.399%: Facebook Messenger (www.facebook.com@services.mozilla.org)
0.395%: WeatherBlink (_gcMembers_@www.weatherblink.com)
0.383%: PCo