# Measure % of pageviews from browsers without JavaScript support

[Task](https://phabricator.wikimedia.org/T253033)

# Background

As part of an ongoing discussion of support levels for different browser capabilities (https://www.mediawiki.org/wiki/Compatibility#Browsers), we want to know the percent of pageviews from users without JS support.

Relevant background links:
- https://www.mediawiki.org/wiki/Analytics/Reports/Clients_without_JavaScript
- https://www.mediawiki.org/wiki/No-JavaScript_notes


# Approach

We used the compatibility table (https://www.mediawiki.org/wiki/Compatibility#General_information) to identify browsers for which we surface JS features (Grade A Browsers) vs. ones we do not (all below Grade A browsers).

We then compared pageviews from Grade A browsers vs. pageviews from below Grade A browsers to estimate the percentage of pageviews where we actively provide JavaScript support.

Pageview data came from the [pageview_hourly](https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly) table, with the [browser_general](https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Browser_general) table used as a cross-check.
Results were split by desktop vs. mobile web, and views from spiders and other automated traffic were removed. The analysis covers the last week of May 2020 (May 24 to May 30).




# Caveats

- This does not include information for modern (Grade A) browsers where users have turned off JS.
- Pageview counts and percentages are not the same as device counts; devices are the closest measure we have to "users".
- Data reflects traffic from browsers where we actively test and surface JS features. Per the [Compatibility page](https://www.mediawiki.org/wiki/Compatibility#General_information), some below Grade A browsers may have JS functionality, but we do not actively test or support them.
- Data assumes that the compatibility table is up to date.

# Identify Browsers 

Per the [compatibility table](https://www.mediawiki.org/wiki/Compatibility#General_information), there are three categories of browsers we support: Modern (Grade A), Basic (Grade C), and Unknown (Grade X).

"Grade A" is real, actively tested JS support.

"Grade C": Content is presented in a readable manner, and to some extent user actions can be performed, but these browsers do not get JavaScript features.  

"Grade X": includes browsers that are no longer developed or browsers not popular enough to justify the added maintenance cost for software development. These browsers are given the full feature set, which means HTTP, HTML, CSS and JS features may or may not be compatible with these browsers.

The following browsers are categorized as Modern (Grade A) browsers, where we actively test and support JS:
- Chrome (current and previous version) [1]
- Firefox (current and previous version)
- Opera (current and previous version). Note: Opera Mini is heavily fragmented across operating systems.
- Edge (current and previous version)
- Internet Explorer 11+
- Safari 5.1+
- iOS 6.1+ (Mobile Safari 6.0+)
- Android 4.1+

The Web team at the Wikimedia Foundation applies a narrower support matrix for mobile-specific skins. The [Compatibility page](https://www.mediawiki.org/wiki/Compatibility#Mobile) currently states that a modern experience (Grade A) is supported where browser usage is over 5%. I compiled the list of mobile web Grade A browsers based on the data provided by the [analytics user agent breakdown dashboard](https://analytics.wikimedia.org/dashboards/browsers/#mobile-site-by-browser/browser-family-and-major-hierarchical-view).

[1] Current and previous versions are those as of the week analyzed: May 24 to May 30, 2020.
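The minimum-version entries in the list above (Internet Explorer 11+, Safari 5.1+, Android 4.1+) could also be expressed as a rule rather than an enumerated list. The following is a minimal sketch in base R, not the method used in this analysis; `min_major` and `meets_minimum` are hypothetical names, and the rules are approximated at the major-version level because the browser labels used later only carry the major version.

```r
# Hypothetical lookup of minimum major versions, taken from the list above.
# Only the major version is available in a "Family-Major" label, so
# "Safari 5.1+" is approximated as major >= 5 and "Android 4.1+" as major >= 4.
min_major <- c("IE" = 11, "Safari" = 5, "Android" = 4)

# Hypothetical helper: TRUE when a "Family-Major" label meets its minimum.
# Families without a rule (e.g. Chrome, which is matched on current and
# previous versions instead) return FALSE here.
meets_minimum <- function(browser) {
  family   <- sub("-[^-]*$", "", browser)                             # text before the last hyphen
  major    <- suppressWarnings(as.numeric(sub("^.*-", "", browser)))  # text after the last hyphen
  required <- unname(min_major[family])                               # NA when no rule exists
  !is.na(required) & !is.na(major) & major >= required
}

labels <- c("Safari-13", "IE-10", "Android-4", "Other--", "Chrome-81")
result <- meets_minimum(labels)
print(data.frame(browser = labels, meets_minimum = result))
```

A rule like this would still need an explicit list for the "current and previous version" browsers, since those depend on the date of the analysis.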


# Collect Pageview Data 

In [4]:
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
    library(tidyverse)
})

In [1]:
#pageviews by browser and project using the pageview_hourly table

query <- "

SELECT 
    access_method,
    project,
    CONCAT(user_agent_map['browser_family'], '-', user_agent_map['browser_major']) AS browser,
    sum(view_count) AS pageviews
FROM wmf.pageview_hourly
WHERE 
    agent_type = 'user' AND
    access_method IN ('desktop', 'mobile web') AND
    CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) > '2020-05-23' AND
    CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) < '2020-05-31'
GROUP BY 
    access_method, 
    project,
    CONCAT(user_agent_map['browser_family'], '-', user_agent_map['browser_major'])
"


In [2]:
pageviews_bybrowser <- wmf::query_hive(query)
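The date filter in the query compares zero-padded `year-month-day` strings with exclusive bounds, which selects May 24 through May 30. The sketch below reproduces that comparison in base R to show which days fall inside the window and why the padding matters; `day_label` and `in_window` are hypothetical helpers, not part of the `wmf` package.

```r
# Build a zero-padded date label, mirroring
# CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0'))
day_label <- function(year, month, day) sprintf("%04d-%02d-%02d", year, month, day)

# Same exclusive bounds as the query's WHERE clause
in_window <- function(label) label > "2020-05-23" & label < "2020-05-31"

days <- day_label(2020, 5, c(23, 24, 30, 31))
window_flags <- in_window(days)
print(setNames(window_flags, days))

# Without padding, May 24 fails the upper bound because "5" sorts after "0"
unpadded_flag <- in_window("2020-5-24")
print(unpadded_flag)
```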

As a QA check, I reviewed top viewed browsers that week. We'd expect most pageviews to come from Grade A Browsers. 

In [58]:
desktop_browsers <- pageviews_bybrowser %>%
    filter(access_method == 'desktop') %>%
    group_by(access_method, browser)%>%
    summarise(pageviews = sum(as.numeric(pageviews)))%>%
    group_by(access_method) %>%
    mutate(total_pageviews_byaccess = sum(as.numeric(pageviews)),
           pct_pageviews = pageviews/total_pageviews_byaccess *100) %>%
    #filter(str_detect(browser, "^iOS")) %>%
    arrange(desc(pct_pageviews))

In [59]:
head(desktop_browsers, 10)

access_method,browser,pageviews,total_pageviews_byaccess,pct_pageviews
<chr>,<chr>,<dbl>,<dbl>,<dbl>
desktop,Chrome-81,411550012,1480458858,27.798815
desktop,Chrome-83,344803380,1480458858,23.290305
desktop,Firefox-76,180483510,1480458858,12.191052
desktop,Safari-13,121217471,1480458858,8.187831
desktop,Edge-18,65709670,1480458858,4.438466
desktop,IE-11,56265534,1480458858,3.800547
desktop,Opera-68,32362144,1480458858,2.185954
desktop,Chrome-70,24621263,1480458858,1.663083
desktop,Chrome-80,17767492,1480458858,1.200134
desktop,Edge-83,16594528,1480458858,1.120904


The top-viewed desktop browsers mostly match the browsers identified in the compatibility table. Several older versions of Chrome (Chrome 70, Chrome 80, and Chrome 77) accounted for over 1 percent of desktop pageviews during the reviewed period and are not considered Grade A.

In [56]:
#QA - review top viewed browsers. We'd expect most pageviews to come from Grade A Browsers.
mobileweb_browsers <- pageviews_bybrowser %>%
    filter(access_method == 'mobile web') %>%
    group_by(access_method, browser)%>%
    summarise(pageviews = sum(as.numeric(pageviews)))%>%
    group_by(access_method) %>%
    mutate(total_pageviews_byaccess = sum(as.numeric(pageviews)),
           pct_pageviews = pageviews/total_pageviews_byaccess *100) %>%
    #filter(str_detect(browser, "^Android")) %>%
    arrange(desc(pct_pageviews))

In [57]:
head(mobileweb_browsers, 10)

access_method,browser,pageviews,total_pageviews_byaccess,pct_pageviews
<chr>,<chr>,<dbl>,<dbl>,<dbl>
mobile web,Mobile Safari-13,675927676,2392161190,28.255942
mobile web,Chrome Mobile-81,652385864,2392161190,27.271819
mobile web,Chrome Mobile-83,152459098,2392161190,6.373279
mobile web,Samsung Internet-11,101881948,2392161190,4.258992
mobile web,Mobile Safari-12,89236248,2392161190,3.730361
mobile web,Chrome Mobile-80,76516572,2392161190,3.198638
mobile web,Google-107,69719083,2392161190,2.914481
mobile web,Chrome Mobile iOS-81,59087249,2392161190,2.470036
mobile web,Chrome Mobile-38,37895773,2392161190,1.584165
mobile web,Chrome-81,25334007,2392161190,1.059043


Mobile Safari-13, Chrome Mobile-81, and Chrome Mobile-83 each accounted for over 5% of mobile web pageviews and were included in the list of Grade A browsers for mobile-specific skins.
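The 5% selection can be written as a simple filter. This is a minimal base R sketch using the top five rows of the mobile web table above; the data frame `mobile` and the variable `over_threshold` are illustrative names only.

```r
# Top five rows copied from the mobile web output above
mobile <- data.frame(
  browser = c("Mobile Safari-13", "Chrome Mobile-81", "Chrome Mobile-83",
              "Samsung Internet-11", "Mobile Safari-12"),
  pct_pageviews = c(28.255942, 27.271819, 6.373279, 4.258992, 3.730361),
  stringsAsFactors = FALSE
)

# Keep browsers with more than 5% of mobile web pageviews
over_threshold <- mobile$browser[mobile$pct_pageviews > 5]
print(over_threshold)
```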

# Pageviews by Grade A (JS-Tested) and Below Grade A (Non-JS-Tested) Browsers

In [103]:
# Compiled list of Grade A browsers identified using https://www.mediawiki.org/wiki/Compatibility#General_information
grade_A_browsers <- c('Firefox-75', 'Firefox-76', 'Firefox-77', 'Firefox-68',
                      'Chrome-81', 'Chrome-83', 'Chrome Mobile-81', 'Chrome Mobile-83',
                      'Opera-67', 'Opera-68',
                      'Edge-18', 'Edge-17', 'Edge-83', 'Edge-81',
                      'IE-11',
                      'Safari-5', 'Safari-6', 'Safari-7', 'Safari-8', 'Safari-9', 'Safari-10', 'Safari-11', 'Safari-12', 'Safari-13',
                      'Mobile Safari-13',
                      'Android-4', 'Android-5', 'Android-6', 'Android-7', 'Android-8', 'Android-9', 'Android-10'
                      )


In [102]:
js_pageviews <- pageviews_bybrowser %>%
    mutate(category = ifelse(browser %in% grade_A_browsers, "Grade A", "Below A")) %>%
    group_by(access_method, category) %>%
    summarize(views = sum(as.numeric(pageviews))) %>%
    mutate(pct_views = views/sum(views) *100)
    
js_pageviews 

access_method,category,views,pct_views
<chr>,<chr>,<dbl>,<dbl>
desktop,Below A,204354230,13.80344
desktop,Grade A,1276104628,86.19656
mobile web,Below A,783498149,32.75273
mobile web,Grade A,1608663041,67.24727
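As a worked check of the percentages above, each share is the category's views divided by the total views for that access method. The base R sketch below recomputes them from the view counts in the table; the data frame name `js` is illustrative.

```r
# View counts copied from the output above
js <- data.frame(
  access_method = c("desktop", "desktop", "mobile web", "mobile web"),
  category      = c("Below A", "Grade A", "Below A", "Grade A"),
  views         = c(204354230, 1276104628, 783498149, 1608663041),
  stringsAsFactors = FALSE
)

# Share of each category within its access method
js$pct_views <- js$views / ave(js$views, js$access_method, FUN = sum) * 100
print(js)
```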


## Per Project Breakdown

I reviewed the pageview traffic from Grade A browsers on a per-project basis to determine whether the percentages varied significantly across projects of different sizes. I limited the review to Wikipedia projects.

### Wikipedia projects with highest percent traffic from Grade A Browsers on desktop

In [99]:

js_browsers_byproject_desktop <- pageviews_bybrowser %>%
    filter(str_detect(project, "wikipedia$"),
                     access_method == "desktop") %>% #look at only desktop and wikipedia
    mutate(category = ifelse(browser %in% grade_A_browsers, "Grade A", "Below_A")) %>%
    group_by(project, access_method, category) %>%
    summarize(views = sum(as.numeric(pageviews))) %>%
    group_by(project, access_method) %>%
    mutate(total_views = sum(views),
           pct_views = views/total_views *100) %>%
    filter(category == 'Grade A') %>%
    arrange(desc(pct_views))
    
head(js_browsers_byproject_desktop, 10)

project,access_method,category,views,total_views,pct_views
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
ja.wikipedia,desktop,Grade A,88524574,94814195,93.36637
pl.wikipedia,desktop,Grade A,32415865,35013400,92.58131
hu.wikipedia,desktop,Grade A,7451756,8091159,92.09751
pt.wikipedia,desktop,Grade A,26579705,28906963,91.94914
es.wikipedia,desktop,Grade A,95745216,104143204,91.93612
it.wikipedia,desktop,Grade A,41159164,45003519,91.45766
cs.wikipedia,desktop,Grade A,10071694,11104095,90.70252
th.wikipedia,desktop,Grade A,2789386,3081545,90.51907
fr.wikipedia,desktop,Grade A,60713531,67375607,90.11204
de.wikipedia,desktop,Grade A,87004955,96614984,90.05327


### Wikipedia projects with lowest percent traffic from Grade A Browsers on desktop

In [96]:
tail(js_browsers_byproject_desktop, 10)

project,access_method,category,views,total_views,pct_views
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
cdo.wikipedia,desktop,Grade A,4230,32268,13.108962
cv.wikipedia,desktop,Grade A,5808,44383,13.086092
yo.wikipedia,desktop,Grade A,4898,37646,13.010678
io.wikipedia,desktop,Grade A,6725,51961,12.942399
zh-min-nan.wikipedia,desktop,Grade A,9313,74025,12.580885
gv.wikipedia,desktop,Grade A,3483,27789,12.533736
hak.wikipedia,desktop,Grade A,3818,31050,12.296296
azb.wikipedia,desktop,Grade A,10441,103227,10.114602
ten.wikipedia,desktop,Grade A,78,1113,7.008086
ig.wikipedia,desktop,Grade A,2652,45228,5.863624


Larger projects had a higher percentage of traffic from Grade A browsers, with around 86 to 93 percent of desktop pageviews coming from Grade A browsers. Japanese Wikipedia had the highest percentage (93.4%). The ten lowest-ranked projects, all with comparatively little desktop traffic, ranged from about 6 to 13 percent.
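Because the lowest-ranked projects have very little traffic, their percentages are sensitive to a small number of pageviews. One option, sketched below in base R, is to apply a minimum-traffic cutoff before ranking. The cutoff `min_total_views` is a hypothetical value chosen for illustration and was not used in this analysis; the rows are copied from the desktop tables above.

```r
# A few rows copied from the desktop tables above (Grade A views and total views)
projects <- data.frame(
  project     = c("ja.wikipedia", "de.wikipedia", "azb.wikipedia", "ten.wikipedia", "ig.wikipedia"),
  views       = c(88524574, 87004955, 10441, 78, 2652),
  total_views = c(94814195, 96614984, 103227, 1113, 45228),
  stringsAsFactors = FALSE
)
projects$pct_views <- projects$views / projects$total_views * 100

# Hypothetical cutoff: ignore projects with fewer than 50,000 desktop pageviews in the week
min_total_views <- 50000
ranked <- projects[projects$total_views >= min_total_views, ]
ranked <- ranked[order(-ranked$pct_views), ]
print(ranked)
```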

### Wikipedia projects with highest percent traffic from Grade A Browsers on mobile web

In [94]:
# Top Projects with highest traffic from Grade A Browsers on Mobile Web
js_browsers_byproject_mobileweb <- pageviews_bybrowser %>%
    filter(str_detect(project, "wikipedia$"),
                     access_method == "mobile web") %>% #look at only mobile web and wikipedia.
    mutate(category = ifelse(browser %in% grade_A_browsers, "Grade A", "Below_A")) %>%
    group_by(project, access_method, category) %>%
    summarize(views = sum(as.numeric(pageviews))) %>%
    group_by(project, access_method) %>%
    mutate(total_views = sum(views),
           pct_views = views/total_views *100) %>%
    filter(category == 'Grade A') %>%
    arrange(desc(pct_views))
    
head(js_browsers_byproject_mobileweb, 10)

project,access_method,category,views,total_views,pct_views
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
sk.wikipedia,mobile web,Grade A,1686361,2280480,73.94763
ca.wikipedia,mobile web,Grade A,1186844,1641460,72.30417
hu.wikipedia,mobile web,Grade A,6676539,9277015,71.96861
el.wikipedia,mobile web,Grade A,3535879,4923241,71.82015
cs.wikipedia,mobile web,Grade A,6800201,9471317,71.79784
pt.wikipedia,mobile web,Grade A,43511101,61053555,71.2671
no.wikipedia,mobile web,Grade A,2633465,3703405,71.10929
sl.wikipedia,mobile web,Grade A,636746,902258,70.5725
he.wikipedia,mobile web,Grade A,6233762,8869646,70.28197
lt.wikipedia,mobile web,Grade A,1013316,1453126,69.73353


### Wikipedia projects with lowest percent of overall traffic from Grade A Browsers on mobile web

In [97]:
tail(js_browsers_byproject_mobileweb, 10)

project,access_method,category,views,total_views,pct_views
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
ha.wikipedia,mobile web,Grade A,1402,34956,4.0107564
st.wikipedia,mobile web,Grade A,149,3921,3.800051
mus.wikipedia,mobile web,Grade A,32,867,3.6908881
aa.wikipedia,mobile web,Grade A,57,1609,3.542573
hz.wikipedia,mobile web,Grade A,20,741,2.6990553
ho.wikipedia,mobile web,Grade A,20,945,2.1164021
cho.wikipedia,mobile web,Grade A,14,770,1.8181818
kj.wikipedia,mobile web,Grade A,15,1058,1.4177694
mh.wikipedia,mobile web,Grade A,9,747,1.2048193
ig.wikipedia,mobile web,Grade A,501,170627,0.2936229


Small and mid-size wikis had the highest percentage of pageviews coming from Grade A browsers on mobile web. Slovak Wikipedia (73.9%) and Catalan Wikipedia (72.3%) had the highest percentages of pageviews from Grade A mobile web browsers.

# Non-JS Pageviews via Browser General

I reviewed the [browser general](https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Browser_general) table as well to confirm results were similar.

In [64]:

query <- "
SELECT
   CONCAT(browser_family, '-', browser_major) AS browser,
   access_method AS access_method,
   SUM(view_count) AS pageviews
FROM wmf.browser_general
WHERE
   access_method IN ('desktop', 'mobile web') AND
   CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) > '2020-05-23' AND
   CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) < '2020-05-31'
GROUP BY
   CONCAT(browser_family, '-', browser_major),
   access_method"


In [65]:
pageviews_bybrowser_2 <- wmf::query_hive(query)

In [66]:
head(pageviews_bybrowser_2)

Unnamed: 0_level_0,browser,access_method,pageviews
Unnamed: 0_level_1,<chr>,<chr>,<int>
1,Amazon Silk-81,mobile web,3300607
2,Apple Mail-605,desktop,6028864
3,Chrome Mobile WebView-30,mobile web,267919
4,Chrome Mobile WebView-81,mobile web,8657806
5,Chrome Mobile WebView-83,mobile web,549165
6,Chrome Mobile iOS-80,mobile web,2679818


In [70]:
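percent_views <- pageviews_bybrowser_2 %>%
    filter(access_method == 'mobile web') %>%
    # convert to numeric before summing: the weekly total exceeds R's 32-bit integer maximum
    mutate(pct_pageviews = as.numeric(pageviews) / sum(as.numeric(pageviews)) * 100) %>%
    arrange(desc(pct_pageviews))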

head(percent_views, 10)

Unnamed: 0_level_0,browser,access_method,pageviews,pct_pageviews
Unnamed: 0_level_1,<chr>,<chr>,<int>,<dbl>
1,Mobile Safari-13,mobile web,675730689,28.247707
2,Chrome Mobile-81,mobile web,652385535,27.271805
3,Other--,mobile web,232303503,9.711031
4,Chrome Mobile-83,mobile web,151654709,6.339653
5,Samsung Internet-11,mobile web,101690548,4.250991
6,Mobile Safari-12,mobile web,89135852,3.726164
7,Chrome Mobile-80,mobile web,75554341,3.158413
8,Google-107,mobile web,69718754,2.914467
9,Chrome Mobile iOS-81,mobile web,59084509,2.469922
10,Chrome Mobile-38,mobile web,37870579,1.583112


In [105]:
js_pageviews <- pageviews_bybrowser_2 %>%
    mutate(category = ifelse(browser %in% grade_A_browsers, "Grade A", "Below A")) %>%
    group_by(access_method, category) %>%
    summarize(views = sum(as.numeric(pageviews))) %>%
    mutate(pct_pageviews = views/sum(views) *100)
    
js_pageviews 

access_method,category,views,pct_pageviews
<chr>,<chr>,<dbl>,<dbl>
desktop,Below A,225294343,15.21787
desktop,Grade A,1255164515,84.78213
mobile web,Below A,884848677,36.98951
mobile web,Grade A,1507312513,63.01049


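The percentages are close to those from pageview_hourly: Grade A browsers account for 84.8% of desktop pageviews (vs. 86.2%) and 63.0% of mobile web pageviews (vs. 67.2%). The `Other--` bucket, which holds 9.7% of mobile web pageviews in browser_general, cannot be matched to the Grade A list and is counted as Below A, which likely explains part of the gap.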
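As a rough check on how close the two sources are, the Grade A percentages from the two summary tables can be placed side by side. The base R sketch below uses the values reported above; the data frame name `grade_a` is illustrative.

```r
# Grade A percentages copied from the two summary tables above
grade_a <- data.frame(
  access_method   = c("desktop", "mobile web"),
  pageview_hourly = c(86.19656, 67.24727),
  browser_general = c(84.78213, 63.01049),
  stringsAsFactors = FALSE
)

# Difference in percentage points between the two sources
grade_a$difference <- grade_a$pageview_hourly - grade_a$browser_general
print(grade_a)
```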