# Typing Speeds

How can you improve your typing speed?

The file `typing-speeds.csv` contains typing speed data from more than 168,000 participants, each of whom typed 15 sentences. The data was collected through a free online typing speed test.

In [68]:
import pandas as pd

df = pd.read_csv('typing-speeds.csv')
df

| | PARTICIPANT_ID | AGE | HAS_TAKEN_TYPING_COURSE | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | AVG_WPM_15 | ROR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 5 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.871080 | 72.8871 | 0.3675 |
| 2 | 7 | 13 | 0 | AU | qwerty | en | 7-8 | laptop | 6.685633 | 24.1809 | 0.0667 |
| 3 | 23 | 21 | 0 | IN | qwerty | en | 3-4 | full | 2.130493 | 24.7112 | 0.0413 |
| 4 | 24 | 21 | 0 | PH | qwerty | tl | 7-8 | laptop | 1.893287 | 45.3364 | 0.2678 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 168589 | 517932 | 20 | 0 | US | qwerty | en | 9-10 | laptop | 8.731466 | 24.9125 | 0.1842 |
| 168590 | 517936 | 25 | 0 | PL | qwerty | pl | 9-10 | laptop | 0.000000 | 66.2946 | 0.0639 |
| 168591 | 517943 | 38 | 1 | US | qwerty | en | 9-10 | laptop | 0.147929 | 75.6713 | 0.2021 |
| 168592 | 517944 | 28 | 0 | GB | qwerty | en | 9-10 | laptop | 0.278552 | 91.7083 | 0.5133 |




| **Variable**             | **Description**                                                                 |
|--------------------------|---------------------------------------------------------------------------------|
| `PARTICIPANT_ID`         | Unique ID of the participant                                                   |
| `AGE`                    | Age of the participant                                                         |
| `HAS_TAKEN_TYPING_COURSE`| Whether the participant has taken a typing course (1 = Yes, 0 = No)            |
| `COUNTRY`                | Country of the participant                                                     |
| `LAYOUT`                 | Keyboard layout used (QWERTY, AZERTY, or QWERTZ)                               |
| `NATIVE_LANGUAGE`        | Native language of the participant                                             |
| `FINGERS`                | Number of fingers used for typing (options: 1-2, 3-4, 5-6, 7-8, 9-10)          |
| `KEYBOARD_TYPE`          | Type of keyboard used (Full/desktop, laptop, small physical, or touch)         |
| `ERROR_RATE`             | Uncorrected error rate, in percent                                             |
| `AVG_WPM_15`             | Words per minute averaged over 15 typed sentences                              |
| `ROR`                    | Rollover ratio: fraction (0 to 1) of keypresses where a new key is pressed before the previous one is released |


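* Clean: drop `PARTICIPANT_ID`, which is not needed for the analysis.
* Rename: shorten the longer column names (for example, `AVG_WPM_15` becomes `WPM`).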

In [69]:
# Clean: drop the participant ID, which is not used in the analysis
df = df.drop(columns=['PARTICIPANT_ID'])

# Rename: shorten long column names
rename_map = {
    'HAS_TAKEN_TYPING_COURSE': 'COURSE',
    'NATIVE_LANGUAGE': 'LANGUAGE',
    'KEYBOARD_TYPE': 'KEYBOARD',
    'ERROR_RATE': 'ERROR',
    'AVG_WPM_15': 'WPM',
}
df = df.rename(columns=rename_map)
df

| | AGE | COURSE | COUNTRY | LAYOUT | LANGUAGE | FINGERS | KEYBOARD | ERROR | WPM | ROR |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.871080 | 72.8871 | 0.3675 |
| 2 | 13 | 0 | AU | qwerty | en | 7-8 | laptop | 6.685633 | 24.1809 | 0.0667 |
| 3 | 21 | 0 | IN | qwerty | en | 3-4 | full | 2.130493 | 24.7112 | 0.0413 |
| 4 | 21 | 0 | PH | qwerty | tl | 7-8 | laptop | 1.893287 | 45.3364 | 0.2678 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 168589 | 20 | 0 | US | qwerty | en | 9-10 | laptop | 8.731466 | 24.9125 | 0.1842 |
| 168590 | 25 | 0 | PL | qwerty | pl | 9-10 | laptop | 0.000000 | 66.2946 | 0.0639 |
| 168591 | 38 | 1 | US | qwerty | en | 9-10 | laptop | 0.147929 | 75.6713 | 0.2021 |
| 168592 | 28 | 0 | GB | qwerty | en | 9-10 | laptop | 0.278552 | 91.7083 | 0.5133 |
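By default, `DataFrame.rename` silently ignores mapping keys that do not match an existing column, so a typo in `rename_map` would leave a column with its old name and no warning. A minimal sketch on a small hypothetical frame (the rows and the misspelled key `ERROR_RTE` are made up for illustration): passing `errors="raise"` turns such a mismatch into a `KeyError`.

```python
import pandas as pd

# Hypothetical two-row frame using the original column names
raw = pd.DataFrame({
    "PARTICIPANT_ID": [3, 5],
    "HAS_TAKEN_TYPING_COURSE": [0, 0],
    "ERROR_RATE": [0.511945, 0.871080],
    "AVG_WPM_15": [61.9483, 72.8871],
})

rename_map = {
    "HAS_TAKEN_TYPING_COURSE": "COURSE",
    "ERROR_RATE": "ERROR",
    "AVG_WPM_15": "WPM",
}

# errors="raise" makes rename fail if any key is not an existing column
clean = raw.drop(columns=["PARTICIPANT_ID"]).rename(columns=rename_map, errors="raise")
print(list(clean.columns))  # ['COURSE', 'ERROR', 'WPM']

# A hypothetical misspelled key is reported instead of silently skipped
try:
    raw.rename(columns={"ERROR_RTE": "ERROR"}, errors="raise")
    typo_caught = False
except KeyError:
    typo_caught = True
print(typo_caught)  # True
```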


- Compare typing speeds across groups using different numbers of fingers, excluding the "10+" category for simplicity.

In [70]:
# Exclude the 10+ category
categories_selected = df.query('FINGERS != "10+"')

# Form groups
finger_speed = categories_selected.groupby('FINGERS')['WPM'].mean()
finger_speed

FINGERS
1-2     40.280812
3-4     41.004952
5-6     45.731789
7-8     50.057909
9-10    57.379572
Name: WPM, dtype: float64
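These means come out in finger order only because, with "10+" excluded, the remaining labels happen to sort correctly as strings; as plain strings, "10+" would sort between "1-2" and "3-4". Group means are also easier to judge next to group sizes. A sketch on a hypothetical sample that declares `FINGERS` as an ordered categorical (the category list is an assumption based on the labels seen in this notebook) and reports both mean `WPM` and row count per group:

```python
import pandas as pd

# Hypothetical sample; the notebook's df has more than 168,000 rows
sample = pd.DataFrame({
    "FINGERS": ["1-2", "10+", "3-4", "9-10", "1-2", "9-10"],
    "WPM": [40.0, 70.0, 41.0, 57.0, 42.0, 59.0],
})

# As plain strings, sorted() would give ['1-2', '10+', '3-4', '9-10']
finger_order = ["1-2", "3-4", "5-6", "7-8", "9-10", "10+"]
sample["FINGERS"] = pd.Categorical(sample["FINGERS"], categories=finger_order, ordered=True)

# Exclude "10+" as in the notebook, then report mean WPM with group size
subset = sample[sample["FINGERS"] != "10+"]
summary = subset.groupby("FINGERS", observed=True)["WPM"].agg(["mean", "count"])
print(summary)
```

With `observed=True`, categories that have no rows (here "5-6", "7-8", and the excluded "10+") are left out of the result instead of appearing as empty groups.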

- Control for confounding variables by filtering to a single profile with identical `AGE`, `LAYOUT`, `LANGUAGE`, and `KEYBOARD` values. `COURSE` is not filtered, so both course values remain.

In [71]:
# Given profile (example)
age = 20
layout = 'qwerty'
language = 'en'
keyboard = 'laptop'

filtered_df = df[
    (df['AGE'] == age) &
    (df['LAYOUT'] == layout) &
    (df['LANGUAGE'] == language) &
    (df['KEYBOARD'] == keyboard) 
]

filtered_df

| | AGE | COURSE | COUNTRY | LAYOUT | LANGUAGE | FINGERS | KEYBOARD | ERROR | WPM | ROR |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 20 | 1 | AF | qwerty | en | 7-8 | laptop | 3.127715 | 9.9978 | 0.0049 |
| 28 | 20 | 0 | US | qwerty | en | 1-2 | laptop | 2.394366 | 30.1761 | 0.3059 |
| 141 | 20 | 1 | US | qwerty | en | 9-10 | laptop | 0.414938 | 82.8916 | 0.6281 |
| 145 | 20 | 1 | IN | qwerty | en | 9-10 | laptop | 0.176367 | 30.2062 | 0.0661 |
| 198 | 20 | 0 | US | qwerty | en | 5-6 | laptop | 1.313682 | 37.3148 | 0.1829 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 168368 | 20 | 0 | US | qwerty | en | 9-10 | laptop | 1.157025 | 86.2190 | 0.2872 |
| 168426 | 20 | 0 | US | qwerty | en | 7-8 | laptop | 0.138122 | 85.3706 | 0.5489 |
| 168476 | 20 | 0 | PH | qwerty | en | 3-4 | laptop | 6.396256 | 27.4366 | 0.0046 |
| 168495 | 20 | 0 | AU | qwerty | en | 7-8 | laptop | 0.301205 | 84.5160 | 0.4455 |
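The profile filter can be wrapped in a small helper so that other profiles are easy to try. `filter_profile` is a hypothetical name, and the sample rows are made up for illustration; the helper simply ANDs together one equality test per entry in the profile dictionary.

```python
import pandas as pd

def filter_profile(data: pd.DataFrame, profile: dict) -> pd.DataFrame:
    """Keep rows whose columns equal every value in `profile` (hypothetical helper)."""
    mask = pd.Series(True, index=data.index)
    for column, value in profile.items():
        # Combine one equality condition per profile entry
        mask &= data[column] == value
    return data[mask]

# Hypothetical rows using the renamed columns
sample = pd.DataFrame({
    "AGE": [20, 20, 21, 20],
    "LAYOUT": ["qwerty", "qwerty", "qwerty", "azerty"],
    "LANGUAGE": ["en", "en", "en", "fr"],
    "KEYBOARD": ["laptop", "full", "laptop", "laptop"],
    "WPM": [30.2, 61.9, 24.7, 45.3],
})

profile = {"AGE": 20, "LAYOUT": "qwerty", "LANGUAGE": "en", "KEYBOARD": "laptop"}
print(filter_profile(sample, profile))
```

Only the first row matches all four values, so the printed frame has a single row with index 0.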


- Exclude participants with high error rates (`ERROR` of 3% or more) to focus on reliable data.

- Columns that hold only a single value after filtering (here `AGE`, `LAYOUT`, `LANGUAGE`, and `KEYBOARD`) carry no information and can be dropped.

In [72]:
filtered_df = filtered_df.query('ERROR < 3')
filtered_df

| | AGE | COURSE | COUNTRY | LAYOUT | LANGUAGE | FINGERS | KEYBOARD | ERROR | WPM | ROR |
|---|---|---|---|---|---|---|---|---|---|---|
| 28 | 20 | 0 | US | qwerty | en | 1-2 | laptop | 2.394366 | 30.1761 | 0.3059 |
| 141 | 20 | 1 | US | qwerty | en | 9-10 | laptop | 0.414938 | 82.8916 | 0.6281 |
| 145 | 20 | 1 | IN | qwerty | en | 9-10 | laptop | 0.176367 | 30.2062 | 0.0661 |
| 198 | 20 | 0 | US | qwerty | en | 5-6 | laptop | 1.313682 | 37.3148 | 0.1829 |
| 263 | 20 | 0 | PH | qwerty | en | 9-10 | laptop | 2.991453 | 34.2174 | 0.1988 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 168228 | 20 | 0 | UA | qwerty | en | 3-4 | laptop | 0.170940 | 36.6214 | 0.3151 |
| 168307 | 20 | 0 | US | qwerty | en | 1-2 | laptop | 0.000000 | 39.2971 | 0.0401 |
| 168368 | 20 | 0 | US | qwerty | en | 9-10 | laptop | 1.157025 | 86.2190 | 0.2872 |
| 168426 | 20 | 0 | US | qwerty | en | 7-8 | laptop | 0.138122 | 85.3706 | 0.5489 |
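A minimal sketch of dropping single-valued columns with `DataFrame.nunique()`, using hypothetical rows shaped like `filtered_df`. Note that `nunique()` ignores missing values by default, so a column holding one value plus `NaN` would also be dropped.

```python
import pandas as pd

# Hypothetical rows shaped like filtered_df after the profile and error filters
sample = pd.DataFrame({
    "AGE": [20, 20, 20],
    "LAYOUT": ["qwerty", "qwerty", "qwerty"],
    "COURSE": [0, 1, 0],
    "FINGERS": ["1-2", "9-10", "5-6"],
    "WPM": [30.1761, 82.8916, 37.3148],
})

# nunique() counts distinct non-missing values per column; keep columns with more than one
varying = sample.loc[:, sample.nunique() > 1]
print(list(varying.columns))  # ['COURSE', 'FINGERS', 'WPM']
```

In the notebook, the same idea would read `filtered_df.loc[:, filtered_df.nunique() > 1]`.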


- The Rollover Ratio (`ROR`) represents the proportion of keypresses where a new key is pressed before releasing the previous one.
- Compare typing speeds between participants with `ROR` ≤ 20% and those with `ROR` > 80%, keeping `AGE`, `KEYBOARD`, `FINGERS`, and other variables constant. The cell below first splits the data into these two `ROR` groups.
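To make the `ROR` definition concrete, here is a sketch that computes it from per-key timestamps. The `(press_time, release_time)` input format is an assumption, and so is the denominator: it divides by the number of keypresses that have a predecessor, since the first keypress cannot overlap anything. The dataset's own formula may differ.

```python
from typing import List, Tuple

def rollover_ratio(events: List[Tuple[float, float]]) -> float:
    """Share of keypresses that start before the previous key is released.

    `events` holds (press_time, release_time) pairs in press order. Dividing by
    the number of keypresses that have a predecessor is an assumption.
    """
    if len(events) < 2:
        return 0.0
    # Count consecutive pairs where the next press happens before the previous release
    overlaps = sum(
        1
        for (_, prev_release), (press, _) in zip(events, events[1:])
        if press < prev_release
    )
    return overlaps / (len(events) - 1)

# Hypothetical timestamps in milliseconds
keys = [(0, 90), (60, 150), (200, 260), (240, 330), (400, 470)]
print(rollover_ratio(keys))  # 0.5
```

Two of the four consecutive pairs overlap (60 < 90 and 240 < 260), which gives 0.5.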

In [None]:
# Columns to hold constant, plus ROR converted to a percentage
filtered_df2 = df[['AGE', 'KEYBOARD', 'FINGERS']].copy()
filtered_df2['ROR'] = df['ROR'] * 100

# Split into low- and high-rollover groups
ROR_under_20 = filtered_df2.query('ROR <= 20')
ROR_higher_80 = filtered_df2.query('ROR > 80')

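| | AGE | KEYBOARD | FINGERS | ROR |
|---|---|---|---|---|
| 2 | 13 | laptop | 7-8 | 6.67 |
| 3 | 21 | full | 3-4 | 4.13 |
| 6 | 20 | laptop | 7-8 | 0.49 |
| 7 | 26 | laptop | 5-6 | 8.11 |
| 8 | 22 | full | 3-4 | 10.03 |
| ... | ... | ... | ... | ... |
| 168582 | 18 | laptop | 5-6 | 7.04 |
| 168583 | 29 | laptop | 9-10 | 19.09 |
| 168584 | 26 | full | 1-2 | 3.72 |
| 168589 | 20 | laptop | 9-10 | 18.42 |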
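The split into `ROR` groups does not yet compare `WPM` or hold anything constant. One way to do both, sketched here on hypothetical rows, is to label the two `ROR` bands, average `WPM` per band within each `AGE`/`KEYBOARD`/`FINGERS` stratum, and keep only strata that contain both bands. In this sketch `ROR` is a fraction (thresholds 0.20 and 0.80), as in the raw data, rather than a percentage.

```python
import pandas as pd

# Hypothetical rows; ROR is a fraction here, as in the raw data
sample = pd.DataFrame({
    "AGE": [20, 20, 20, 20, 30, 30],
    "KEYBOARD": ["laptop", "laptop", "laptop", "laptop", "full", "full"],
    "FINGERS": ["9-10", "9-10", "9-10", "9-10", "7-8", "7-8"],
    "ROR": [0.10, 0.15, 0.85, 0.90, 0.05, 0.95],
    "WPM": [40.0, 44.0, 80.0, 84.0, 35.0, 65.0],
})

# Keep only the two bands of interest and label them
banded = sample[(sample["ROR"] <= 0.20) | (sample["ROR"] > 0.80)].copy()
banded["ROR_BAND"] = (banded["ROR"] > 0.80).map({True: "high", False: "low"})

# Mean WPM per band within each AGE/KEYBOARD/FINGERS stratum
by_stratum = banded.pivot_table(
    index=["AGE", "KEYBOARD", "FINGERS"],
    columns="ROR_BAND",
    values="WPM",
    aggfunc="mean",
)

# Drop strata missing either band, then compute high minus low
by_stratum = by_stratum.dropna()
by_stratum["DIFF"] = by_stratum["high"] - by_stratum["low"]
print(by_stratum)
```

Each `DIFF` value compares participants who share age, keyboard, and finger count, so differences in those variables cannot explain it.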


- Compare typing speeds between participants who have taken a typing course (`COURSE` = 1) and those who have not (`COURSE` = 0), holding other variables such as `KEYBOARD`, `AGE` range, and `FINGERS` constant. The cell below starts with the unadjusted average rollover ratio (`ROR`) of each group.

In [96]:
# Split participants by whether they have taken a typing course
took_course = df.query('COURSE == 1')
no_course = df.query('COURSE == 0')

# Unadjusted mean rollover ratio per group
course_mean = took_course['ROR'].mean()
nocourse_mean = no_course['ROR'].mean()
print(f'Average ROR with a typing course: {course_mean}')
print(f'Average ROR without a typing course: {nocourse_mean}')

Average ROR with a typing course: 0.28823871055074257
Average ROR without a typing course: 0.23893594993241046
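These are raw group averages of `ROR` with no variables held constant. A sketch of a stratified comparison of `WPM` instead, on hypothetical rows: ages are binned into bands (the cut points and labels are assumptions, since the notebook does not define an age range), and the course/no-course gap is computed within each `AGE_BAND`/`KEYBOARD`/`FINGERS` stratum that contains both groups.

```python
import pandas as pd

# Hypothetical rows; keyboard and finger count are already identical here
sample = pd.DataFrame({
    "AGE": [19, 22, 24, 30, 33, 35, 40],
    "KEYBOARD": ["laptop"] * 7,
    "FINGERS": ["9-10"] * 7,
    "COURSE": [1, 0, 0, 1, 0, 1, 0],
    "WPM": [70.0, 60.0, 62.0, 80.0, 66.0, 76.0, 55.0],
})

# Assumed age bands: (0, 17], (17, 25], (25, 35], (35, 120]
sample["AGE_BAND"] = pd.cut(
    sample["AGE"], bins=[0, 17, 25, 35, 120], labels=["<18", "18-25", "26-35", "36+"]
)

# Mean WPM for each COURSE value within each stratum
table = sample.pivot_table(
    index=["AGE_BAND", "KEYBOARD", "FINGERS"],
    columns="COURSE",
    values="WPM",
    aggfunc="mean",
    observed=True,
)

# Keep strata with both groups, then take course minus no-course
table = table.dropna()
course_gap = table[1] - table[0]
print(course_gap)
```

The "36+" stratum has no course-takers in this sample, so it is dropped; averaging the remaining per-stratum gaps (optionally weighted by stratum size) gives a single adjusted estimate.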
