# Measure % of edits coming from users without JS [T240697](https://phabricator.wikimedia.org/T240697)

We need to find out how many people edit without JavaScript (JS) support, either in a browser that lacks JS support or in a regular browser with JS turned off. See the parent task [T234695](https://phabricator.wikimedia.org/T234695) for more information. Ballpark numbers are enough here, so sampling is fine.

### Results for the year 2020
 - snapshot % of user edits done with no JS support: **26.75 %**
 - snapshot % of anon edits done with no JS support: **12.79 %**
 - snapshot % of all edits done with no JS support: **24.09 %**
 

### Results for the year 2019
 - snapshot % of user edits done with no JS support: **28.92 %**
 - snapshot % of anon edits done with no JS support: **13.98 %**
 - snapshot % of all edits done with no JS support: **25.84 %**
 
**Note:** The listed percentages of non-JS edits are much higher than expected, which suggests that a large portion of these edits come from users who have ad blockers installed and/or DNT (Do Not Track) enabled rather than from users without JS. As a result, we don't think this data is useful for determining the percentage of non-JS users, and we recommend adding dedicated instrumentation if more accurate numbers are needed.

### Breakdown by Editor Interface
    
    

| Year | Editor interface | % of user edits without JS | % of anonymous edits without JS | % of all edits without JS |
| -| -|-|-|-|
| 2020 | visualeditor | 0.02 % | 0.02 % | 0.02 % |
| 2020 | wikitext | 29.89 % | 14.91 % | 27.13 % |
| 2020 | wikitext-2017 | 0.03 % | 0.00 % | 0.03 % |
| 2019 | visualeditor | 0.10 % | 0.06 % | 0.09 % |
| 2019 | wikitext | 31.66 % | 16.07 % | 28.56 % |
| 2019 | wikitext-2017 | 0.04 % | 0.00 % | 0.04 % |



### Observations and questions raised by the no-JS edit proportions per editing interface
1. The wikitext-2017 editor is an opt-in beta preference, so the 0% share of anonymous non-JS edits made with it is as expected.
2. We excluded all oversampled sessions from `editattemptstep` to obtain these results.
3. These proportions may not apply equally to all wikis.
4. Assuming that VisualEditor always requires JS to load and enable editing, could the small proportion of non-JS edits that we see for VE above indicate users who have blocked client-side event logging?
5. Based on our understanding of how the events are recorded and the [schema documentation](https://meta.wikimedia.org/wiki/Schema:EditAttemptStep), this approach should really only work for the wikitext editor. A breakdown per editing interface helps clarify these numbers and confirms that only a small percentage of VisualEditor edits are classified as non-JS, which might be users with DNT enabled.



In [1]:
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter
import numpy as np
import pandas as pd
from tabulate import tabulate
from wmfdata import charting, hive, mariadb
from wmfdata.charting import comma_fmt, pct_fmt
from wmfdata.utils import df_to_remarkup, pct_str



Based on discussions with R Kaldari and D Lynch, here are the recommended steps for answering each of the three questions in the task description:  

Snapshot % of user edits done with no JS support:  
Within a specific timespan, take:  
number of sessions where user_id !== 0, integration === page, actions include init, saveSuccess, and not ready.  
Divide by:  
number of sessions where user_id !== 0, integration === page, actions include init, ready, and saveSuccess.

Snapshot % of anon edits done with no JS support:  
Within a specific timespan, take:  
number of sessions where user_id === 0, integration === page, actions include init, saveSuccess, and not ready.  
Divide by:  
number of sessions where user_id === 0, integration === page, actions include init, ready, and saveSuccess.  

Snapshot % of all edits done with no JS support:  
Within a specific timespan, take:  
number of sessions where integration === page, actions include init, saveSuccess, and not ready.  
Divide by:  
number of sessions where integration === page, actions include init, ready, and saveSuccess.  

Note that I've decided not to worry about which editor the user is using (editor_interface), which should simplify things a bit.
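The classification rule in the steps above can be sketched on a handful of made-up events. This is only an illustration in plain Python: the event tuples and the `classify_sessions` helper are hypothetical and are not part of the actual analysis, which runs as a Hive query against `event_sanitized.editattemptstep`.

```python
from collections import defaultdict

# Hypothetical toy events: (editing_session_id, user_id, integration, action)
events = [
    ("a", 5, "page", "init"), ("a", 5, "page", "ready"), ("a", 5, "page", "saveSuccess"),
    ("b", 0, "page", "init"), ("b", 0, "page", "saveSuccess"),
    ("c", 0, "page", "init"), ("c", 0, "page", "ready"), ("c", 0, "page", "saveSuccess"),
    ("d", 7, "page", "init"), ("d", 7, "page", "ready"),
    ("e", 9, "page", "init"), ("e", 9, "page", "saveSuccess"),
]


def classify_sessions(rows):
    """Map each saved page session to (is_anon, 'js' or 'nonjs')."""
    actions = defaultdict(set)
    user_of = {}
    for session_id, user_id, integration, action in rows:
        if integration != "page":
            continue
        actions[session_id].add(action)
        user_of[session_id] = user_id

    labels = {}
    for session_id, seen in actions.items():
        # Only sessions with both an init and a saveSuccess event count as edits.
        if not {"init", "saveSuccess"} <= seen:
            continue
        # A missing 'ready' event is what the steps above treat as "no JS".
        kind = "js" if "ready" in seen else "nonjs"
        labels[session_id] = (user_of[session_id] == 0, kind)
    return labels


labels = classify_sessions(events)
for session_id, (is_anon, kind) in sorted(labels.items()):
    print(session_id, "anon" if is_anon else "user", kind)
```

Session `d` never reaches `saveSuccess`, so it is left out entirely, which matches the intent of counting only saved edits.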




In [6]:
edit_sessions_query_2020= '''

-- We are only interested in sessions with saveSuccess events i.e. saved edits -- 

WITH saveSuccess_sessions as (
SELECT distinct event.editing_session_id AS ss_session_id,
  event.user_id AS user_id
FROM event_sanitized.editattemptstep
WHERE event.integration = 'page' 
  AND year = 2020  
  AND NOT event.is_oversample -- Taking out Oversampled edits
  AND event.action = 'saveSuccess' 
), 

init_sessions as (
SELECT event.editing_session_id AS init_session_id,
  event.action as init_action
FROM event_sanitized.editattemptstep eas
WHERE event.integration = 'page' 
  AND year = 2020  
  AND event.action='init'
),

-- Now we will use sessions without 'Ready' events to identify user edits done with no JS support -- 

ready_sessions as (
SELECT event.editing_session_id AS ready_session_id ,
  event.action as ready_action
FROM event_sanitized.editattemptstep eas
WHERE event.integration = 'page' 
  AND year = 2020  
  AND event.action='ready'
)


-- Main Query -- 
SELECT
  SUM(CAST(user_id!=0 AND ready_action is null AND init_action = 'init' AS int)) AS user_nonjs_edits,
  SUM(CAST(user_id=0 and ready_action is null AND init_action = 'init' AS int)) AS anon_nonjs_edits,
  SUM(CAST(user_id!=0 and ready_action='ready' AND init_action = 'init'  AS int)) AS user_js_edits,
  SUM(CAST(user_id=0 and ready_action='ready' AND init_action = 'init'  AS int)) AS anon_js_edits,
  SUM(CAST(ready_action is null AND init_action = 'init' AS int)) AS all_nonjs_edits,
  SUM(CAST(ready_action='ready' AND init_action = 'init' AS int)) AS all_js_edits

FROM (
  SELECT 
    user_id AS user_id , 
    ss_session_id AS ss_session_id,
    ready_action ,
    init_action
  FROM saveSuccess_sessions ss  
  LEFT JOIN ready_sessions ON ss.ss_session_id = ready_sessions.ready_session_id
  LEFT JOIN init_sessions ON ss.ss_session_id = init_sessions.init_session_id

        
  GROUP BY ss.user_id, 
    ss.ss_session_id, 
    ready_sessions.ready_action,
    init_sessions.init_action

    ) edit_sessions
'''

In [7]:
edit_sessions_2020= hive.run(
    [
        "SET mapreduce.map.memory.mb=4096", 
        "SET hive.mapred.mode=nonstrict",
        edit_sessions_query_2020 
    ]
)



Counts of JS and non-JS edits, under the following assumptions:  
non-JS edits: number of sessions whose event actions include init and saveSuccess, but not ready.  
JS edits: number of sessions whose event actions include init, ready, and saveSuccess.


In [8]:
edit_sessions_2020

| user_nonjs_edits | anon_nonjs_edits | user_js_edits | anon_js_edits | all_nonjs_edits | all_js_edits |
| -|-|-|-|-|-|
| 767423 | 86502 | 2101643 | 589680 | 853925 | 2691323 |


### To get the proportion of non-JS edits, we divide the number of non-JS edits by the sum of JS and non-JS edits
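As a worked example of that formula, the sketch below recomputes the three 2020 percentages from the counts returned by the query above. The `nonjs_share` helper is a hypothetical name introduced here for illustration; the notebook itself computes the same ratios directly on the result DataFrame.

```python
def nonjs_share(nonjs_edits, js_edits):
    """Percentage of non-JS edits among all classified (JS + non-JS) edits."""
    total = nonjs_edits + js_edits
    if total == 0:
        return float("nan")  # no edits in this group, share is undefined
    return 100.0 * nonjs_edits / total


# (non-JS count, JS count) pairs copied from the 2020 query result above.
counts_2020 = {
    "user": (767423, 2101643),
    "anon": (86502, 589680),
    "all": (853925, 2691323),
}

shares_2020 = {group: nonjs_share(*pair) for group, pair in counts_2020.items()}
for group, share in shares_2020.items():
    print(f"{group}: {share:.2f} %")
```

Dividing by the JS-only count instead, as the recommended steps literally read, would give a ratio of non-JS to JS sessions rather than a share of all edits; for registered users in 2020 that ratio is about 36.5 %.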

In [10]:
print ('Snapshot % of user edits done with no JS support in 2020: ',
       (100* (edit_sessions_2020['user_nonjs_edits'] / 
              ( edit_sessions_2020['user_js_edits'] + edit_sessions_2020['user_nonjs_edits'])))
      )

Snapshot % of user edits done with no JS support in 2020:  0    26.748182
dtype: float64


In [11]:
print ('Snapshot % of anon edits done with no JS support in 2020:',
       (100* (edit_sessions_2020['anon_nonjs_edits'] / 
              ( edit_sessions_2020['anon_js_edits'] + edit_sessions_2020['anon_nonjs_edits'])))
      )

Snapshot % of anon edits done with no JS support in 2020: 0    12.79271
dtype: float64


In [12]:
print ('Snapshot % of all edits done with no JS support in 2020:',
       (100* (edit_sessions_2020['all_nonjs_edits'] / 
              ( edit_sessions_2020['all_js_edits'] + edit_sessions_2020['all_nonjs_edits'])))
      )

Snapshot % of all edits done with no JS support in 2020: 0    24.086467
dtype: float64


**Now let's look at the same numbers for 2019**
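The 2019 query differs from the 2020 one only in its `year` filter. One way to avoid maintaining two copies is to build the query text from a template, as in the sketch below. The template is deliberately shortened to the first CTE, and `QUERY_TEMPLATE` and `build_query` are hypothetical names, not something this notebook defines.

```python
# Shortened, illustrative template: only the saveSuccess CTE is shown.
QUERY_TEMPLATE = """
WITH saveSuccess_sessions AS (
  SELECT DISTINCT event.editing_session_id AS ss_session_id,
    event.user_id AS user_id
  FROM event_sanitized.editattemptstep
  WHERE event.integration = 'page'
    AND year = {year}
    AND NOT event.is_oversample
    AND event.action = 'saveSuccess'
)
SELECT COUNT(*) AS saved_sessions
FROM saveSuccess_sessions
"""


def build_query(year):
    """Return the query text for one calendar year."""
    # Accept integers only, so arbitrary text cannot end up in the SQL.
    if not isinstance(year, int):
        raise TypeError("year must be an int")
    return QUERY_TEMPLATE.format(year=year)


queries = {year: build_query(year) for year in (2019, 2020)}
print(queries[2019])
```

The resulting strings could then be passed to `hive.run` in the same way as the hand-written queries in this notebook.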

In [13]:
edit_sessions_query_2019= '''

-- We are only interested in sessions with saveSuccess events i.e. saved edits -- 

WITH saveSuccess_sessions as (
SELECT distinct event.editing_session_id AS ss_session_id,
  event.user_id AS user_id
FROM event_sanitized.editattemptstep
WHERE event.integration = 'page' 
  AND year = 2019  
  AND NOT event.is_oversample -- Taking out Oversampled edits
  AND event.action = 'saveSuccess' 
), 

init_sessions as (
SELECT event.editing_session_id AS init_session_id,
  event.action as init_action
FROM event_sanitized.editattemptstep eas
WHERE event.integration = 'page' 
  AND year = 2019  
  AND event.action='init'
),

-- Now we will use sessions without 'Ready' events to identify user edits done with no JS support -- 

ready_sessions as (
SELECT event.editing_session_id AS ready_session_id ,
  event.action as ready_action
FROM event_sanitized.editattemptstep eas
WHERE event.integration = 'page' 
  AND year = 2019  
  AND event.action='ready'
)


-- Main Query -- 
SELECT
  SUM(CAST(user_id!=0 AND ready_action is null AND init_action = 'init' AS int)) AS user_nonjs_edits,
  SUM(CAST(user_id=0 and ready_action is null AND init_action = 'init' AS int)) AS anon_nonjs_edits,
  SUM(CAST(user_id!=0 and ready_action='ready' AND init_action = 'init'  AS int)) AS user_js_edits,
  SUM(CAST(user_id=0 and ready_action='ready' AND init_action = 'init'  AS int)) AS anon_js_edits,
  SUM(CAST(ready_action is null AND init_action = 'init' AS int)) AS all_nonjs_edits,
  SUM(CAST(ready_action='ready' AND init_action = 'init' AS int)) AS all_js_edits

FROM (
  SELECT 
    user_id AS user_id , 
    ss_session_id AS ss_session_id,
    ready_action ,
    init_action
  FROM saveSuccess_sessions ss  
  LEFT JOIN ready_sessions ON ss.ss_session_id = ready_sessions.ready_session_id
  LEFT JOIN init_sessions ON ss.ss_session_id = init_sessions.init_session_id

        
  GROUP BY ss.user_id, 
    ss.ss_session_id, 
    ready_sessions.ready_action,
    init_sessions.init_action

    ) edit_sessions
'''

In [14]:
edit_sessions_2019= hive.run(
    [
        "SET mapreduce.map.memory.mb=4096", 
        "SET hive.mapred.mode=nonstrict",
        edit_sessions_query_2019
    ]
)



In [15]:
edit_sessions_2019

| user_nonjs_edits | anon_nonjs_edits | user_js_edits | anon_js_edits | all_nonjs_edits | all_js_edits |
| -|-|-|-|-|-|
| 1594988 | 200541 | 3919261 | 1233940 | 1795529 | 5153201 |


In [16]:
print ('Snapshot % of user edits done with no JS support in 2019 :',
       (100* (edit_sessions_2019['user_nonjs_edits'] / 
              ( edit_sessions_2019['user_js_edits'] + edit_sessions_2019['user_nonjs_edits'])))
      )

Snapshot % of user edits done with no JS support in 2019 : 0    28.924845
dtype: float64


In [17]:
print ('Snapshot % of anon edits done with no JS support in 2019 :',
       (100* (edit_sessions_2019['anon_nonjs_edits'] / 
              ( edit_sessions_2019['anon_js_edits'] + edit_sessions_2019['anon_nonjs_edits'])))
      )

Snapshot % of anon edits done with no JS support in 2019 : 0    13.980039
dtype: float64


In [18]:
print ('Snapshot % of all edits done with no JS support in 2019 :',
       (100* (edit_sessions_2019['all_nonjs_edits'] / 
              ( edit_sessions_2019['all_js_edits'] + edit_sessions_2019['all_nonjs_edits'])))
      )

Snapshot % of all edits done with no JS support in 2019 : 0    25.839671
dtype: float64


Based on our understanding of how the events are recorded and the [schema documentation](https://meta.wikimedia.org/wiki/Schema:EditAttemptStep), the above approach should really only work for the wikitext editor.  
**Hence a breakdown by editing interface helps clarify these numbers and confirms that only a small percentage of VisualEditor edits are classified as non-JS, which might be users with DNT enabled.**

## Non-JS edit proportions by editing interface
### For the year 2020

In [19]:
edit_interface_query_2020= '''

-- We are only interested in sessions with saveSuccess events i.e. saved edits -- 

WITH saveSuccess_sessions as (
SELECT distinct event.editing_session_id AS ss_session_id,
  event.user_id AS user_id, 
  event.editor_interface AS editing_interface
FROM event_sanitized.editattemptstep
WHERE event.integration = 'page' 
  AND year = 2020  
  AND NOT event.is_oversample -- Taking out Oversampled edits
  AND event.action = 'saveSuccess' 
), 

init_sessions as (
SELECT event.editing_session_id AS init_session_id,
  event.action as init_action
FROM event_sanitized.editattemptstep eas
WHERE event.integration = 'page' 
  AND year = 2020  
  AND event.action='init'
),

-- Now we will use sessions without 'Ready' events to identify user edits done with no JS support -- 

ready_sessions as (
SELECT event.editing_session_id AS ready_session_id ,
  event.action as ready_action
FROM event_sanitized.editattemptstep eas
WHERE event.integration = 'page' 
  AND year = 2020  
  AND event.action='ready'
)

-- Main Query -- 
SELECT
  editing_interface AS editing_interface,  
  SUM(CAST(user_id!=0 AND ready_action is null AND init_action = 'init' AS int)) AS user_nonjs_edits,
  SUM(CAST(user_id=0 and ready_action is null AND init_action = 'init' AS int)) AS anon_nonjs_edits,
  SUM(CAST(user_id!=0 and ready_action='ready' AND init_action = 'init' AS int)) AS user_js_edits,
  SUM(CAST(user_id=0 and ready_action='ready' AND init_action = 'init' AS int)) AS anon_js_edits,
  SUM(CAST(ready_action is null AND init_action = 'init' AS int)) AS all_nonjs_edits,
  SUM(CAST(ready_action='ready' AND init_action = 'init' AS int)) AS all_js_edits

FROM (
  SELECT 
    user_id AS user_id , 
    ss_session_id AS ss_session_id,
    editing_interface,
    ready_action ,
    init_action
  FROM saveSuccess_sessions ss  
  LEFT JOIN ready_sessions ON ss.ss_session_id = ready_sessions.ready_session_id 
  LEFT JOIN init_sessions ON ss.ss_session_id = init_sessions.init_session_id
        
  GROUP BY ss.user_id, 
    ss.ss_session_id, 
    ss.editing_interface,
    ready_sessions.ready_action,
    init_sessions.init_action
    ) edit_sessions

GROUP BY editing_interface    
'''

In [21]:
edit_interface_sessions_2020= hive.run(
    [
        "SET mapreduce.map.memory.mb=4096", 
        "SET hive.mapred.mode=nonstrict",
        edit_interface_query_2020 
    ]
)


In [22]:
edit_interface_sessions_2020

| editing_interface | user_nonjs_edits | anon_nonjs_edits | user_js_edits | anon_js_edits | all_nonjs_edits | all_js_edits |
| -|-|-|-|-|-|-|
| visualeditor | 48 | 17 | 235233 | 96887 | 65 | 332120 |
| wikitext | 767978 | 86532 | 1801464 | 493867 | 854510 | 2295331 |
| wikitext-2017 | 21 | 0 | 68999 | 10 | 21 | 69009 |


In [23]:
edit_interface_sessions_2020=edit_interface_sessions_2020.set_index('editing_interface')

In [25]:
print ('Snapshot % of user edits done with no JS support in 2020 :',
       (100* (edit_interface_sessions_2020['user_nonjs_edits'] / 
              ( edit_interface_sessions_2020['user_js_edits'] + edit_interface_sessions_2020['user_nonjs_edits'])))
      )

Snapshot % of user edits done with no JS support in 2020 : editing_interface
visualeditor      0.020401
wikitext         29.888902
wikitext-2017     0.030426
dtype: float64


In [26]:
print ('Snapshot % of anon edits done with no JS support in 2020:',
       (100* (edit_interface_sessions_2020['anon_nonjs_edits'] / 
              ( edit_interface_sessions_2020['anon_js_edits'] + edit_interface_sessions_2020['anon_nonjs_edits'])))
      )

Snapshot % of anon edits done with no JS support in 2020: editing_interface
visualeditor      0.017543
wikitext         14.909054
wikitext-2017     0.000000
dtype: float64


In [27]:
print ('Snapshot % of all edits done with no JS support in 2020:',
       (100* (edit_interface_sessions_2020['all_nonjs_edits'] / 
              ( edit_interface_sessions_2020['all_js_edits'] + edit_interface_sessions_2020['all_nonjs_edits'])))
      )

Snapshot % of all edits done with no JS support in 2020: editing_interface
visualeditor      0.019567
wikitext         27.128671
wikitext-2017     0.030422
dtype: float64
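The three per-interface calculations above can also be done in a single pass. The sketch below is plain Python over the 2020 counts shown in the query result; the `counts` dictionary layout and the `share_table` helper are assumptions made for this illustration only.

```python
# (non-JS count, JS count) per editing interface and group, copied from the
# 2020 per-interface query result above.
counts = {
    "visualeditor": {"user": (48, 235233), "anon": (17, 96887), "all": (65, 332120)},
    "wikitext": {"user": (767978, 1801464), "anon": (86532, 493867), "all": (854510, 2295331)},
    "wikitext-2017": {"user": (21, 68999), "anon": (0, 10), "all": (21, 69009)},
}


def share_table(counts_by_interface):
    """Return {interface: {group: percentage of non-JS edits}}."""
    table = {}
    for interface, groups in counts_by_interface.items():
        table[interface] = {
            group: 100.0 * nonjs / (nonjs + js)
            for group, (nonjs, js) in groups.items()
        }
    return table


shares = share_table(counts)
print(f"{'interface':<15}{'user':>10}{'anon':>10}{'all':>10}")
for interface, row in shares.items():
    print(f"{interface:<15}{row['user']:>10.2f}{row['anon']:>10.2f}{row['all']:>10.2f}")
```

Keeping the non-JS and JS counts paired per group makes it harder to divide by the wrong denominator when more groups or years are added.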


**Now let's look at the numbers for the year 2019**

In [28]:
edit_interface_query_2019= '''

-- We are only interested in sessions with saveSuccess events i.e. saved edits -- 

WITH saveSuccess_sessions as (
SELECT distinct event.editing_session_id AS ss_session_id,
  event.user_id AS user_id, 
  event.editor_interface AS editing_interface
FROM event_sanitized.editattemptstep
WHERE event.integration = 'page' 
  AND year = 2019  
  AND NOT event.is_oversample -- Taking out Oversampled edits
  AND event.action = 'saveSuccess' 
), 

init_sessions as (
SELECT event.editing_session_id AS init_session_id,
  event.action as init_action
FROM event_sanitized.editattemptstep eas
WHERE event.integration = 'page' 
  AND year = 2019  
  AND event.action='init'
),

-- Now we will use sessions without 'Ready' events to identify user edits done with no JS support -- 

ready_sessions as (
SELECT event.editing_session_id AS ready_session_id ,
  event.action as ready_action
FROM event_sanitized.editattemptstep eas
WHERE event.integration = 'page' 
  AND year = 2019  
  AND event.action='ready'
)

-- Main Query -- 
SELECT
  editing_interface AS editing_interface,  
  SUM(CAST(user_id!=0 AND ready_action is null AND init_action = 'init' AS int)) AS user_nonjs_edits,
  SUM(CAST(user_id=0 and ready_action is null AND init_action = 'init' AS int)) AS anon_nonjs_edits,
  SUM(CAST(user_id!=0 and ready_action='ready' AND init_action = 'init' AS int)) AS user_js_edits,
  SUM(CAST(user_id=0 and ready_action='ready' AND init_action = 'init' AS int)) AS anon_js_edits,
  SUM(CAST(ready_action is null AND init_action = 'init' AS int)) AS all_nonjs_edits,
  SUM(CAST(ready_action='ready' AND init_action = 'init' AS int)) AS all_js_edits

FROM (
  SELECT 
    user_id AS user_id , 
    ss_session_id AS ss_session_id,
    editing_interface,
    ready_action ,
    init_action
  FROM saveSuccess_sessions ss  
  LEFT JOIN ready_sessions ON ss.ss_session_id = ready_sessions.ready_session_id 
  LEFT JOIN init_sessions ON ss.ss_session_id = init_sessions.init_session_id
        
  GROUP BY ss.user_id, 
    ss.ss_session_id, 
    ss.editing_interface,
    ready_sessions.ready_action,
    init_sessions.init_action
    ) edit_sessions

GROUP BY editing_interface    
'''

In [29]:
edit_interface_sessions_2019= hive.run(
    [
        "SET mapreduce.map.memory.mb=4096", 
        "SET hive.mapred.mode=nonstrict",
        edit_interface_query_2019 
    ]
)


In [30]:
edit_interface_sessions_2019

| editing_interface | user_nonjs_edits | anon_nonjs_edits | user_js_edits | anon_js_edits | all_nonjs_edits | all_js_edits |
| -|-|-|-|-|-|-|
| visualeditor | 377 | 104 | 372382 | 186774 | 481 | 559156 |
| wikitext | 1594572 | 200437 | 3441877 | 1047145 | 1795009 | 4489022 |
| wikitext-2017 | 39 | 0 | 105006 | 21 | 39 | 105027 |


In [31]:
edit_interface_sessions_2019=edit_interface_sessions_2019.set_index('editing_interface')

In [32]:
print ('Snapshot % of user edits done with no JS support in 2019 :',
       (100* (edit_interface_sessions_2019['user_nonjs_edits'] / 
              ( edit_interface_sessions_2019['user_js_edits'] + edit_interface_sessions_2019['user_nonjs_edits'])))
      )

Snapshot % of user edits done with no JS support in 2019 : editing_interface
visualeditor      0.101138
wikitext         31.660640
wikitext-2017     0.037127
dtype: float64


In [33]:
print ('Snapshot % of anon edits done with no JS support in 2019:',
       (100* (edit_interface_sessions_2019['anon_nonjs_edits'] / 
              ( edit_interface_sessions_2019['anon_js_edits'] + edit_interface_sessions_2019['anon_nonjs_edits'])))
      )

Snapshot % of anon edits done with no JS support in 2019: editing_interface
visualeditor      0.055651
wikitext         16.066038
wikitext-2017     0.000000
dtype: float64


In [34]:
print ('Snapshot % of all edits done with no JS support in 2019:',
       (100* (edit_interface_sessions_2019['all_nonjs_edits'] / 
              ( edit_interface_sessions_2019['all_js_edits'] + edit_interface_sessions_2019['all_nonjs_edits'])))
      )

Snapshot % of all edits done with no JS support in 2019: editing_interface
visualeditor      0.085949
wikitext         28.564611
wikitext-2017     0.037120
dtype: float64
