# Typing Speeds

How can you improve your typing speed?

The file `typing-speeds.csv` contains typing-speed data from more than 168,000 participants, each of whom typed 15 sentences. The data was collected through an online typing test hosted on a free typing-speed assessment website.

In [8]:
# FOR GOOGLE COLAB ONLY.
# Uncomment and run the code below. A dialog will appear to upload files.
# Upload 'typing-speeds.csv'.

# from google.colab import files
# uploaded = files.upload()

In [1]:
import pandas as pd

df = pd.read_csv('typing-speeds.csv')
df

| | PARTICIPANT_ID | AGE | HAS_TAKEN_TYPING_COURSE | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | AVG_WPM_15 | ROR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 5 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.871080 | 72.8871 | 0.3675 |
| 2 | 7 | 13 | 0 | AU | qwerty | en | 7-8 | laptop | 6.685633 | 24.1809 | 0.0667 |
| 3 | 23 | 21 | 0 | IN | qwerty | en | 3-4 | full | 2.130493 | 24.7112 | 0.0413 |
| 4 | 24 | 21 | 0 | PH | qwerty | tl | 7-8 | laptop | 1.893287 | 45.3364 | 0.2678 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 168589 | 517932 | 20 | 0 | US | qwerty | en | 9-10 | laptop | 8.731466 | 24.9125 | 0.1842 |
| 168590 | 517936 | 25 | 0 | PL | qwerty | pl | 9-10 | laptop | 0.000000 | 66.2946 | 0.0639 |
| 168591 | 517943 | 38 | 1 | US | qwerty | en | 9-10 | laptop | 0.147929 | 75.6713 | 0.2021 |
| 168592 | 517944 | 28 | 0 | GB | qwerty | en | 9-10 | laptop | 0.278552 | 91.7083 | 0.5133 |




| **Variable**             | **Description**                                                                 |
|--------------------------|---------------------------------------------------------------------------------|
| `PARTICIPANT_ID`         | Unique ID of the participant                                                   |
| `AGE`                    | Age of the participant                                                         |
| `HAS_TAKEN_TYPING_COURSE`| Whether the participant has taken a typing course (1 = Yes, 0 = No)            |
| `COUNTRY`                | Country of the participant                                                     |
| `LAYOUT`                 | Keyboard layout used (QWERTY, AZERTY, or QWERTZ)                               |
| `NATIVE_LANGUAGE`        | Native language of the participant                                             |
| `FINGERS`                | Number of fingers used for typing (options: 1-2, 3-4, 5-6, 7-8, 9-10, 10+)     |
| `KEYBOARD_TYPE`          | Type of keyboard used (full/desktop, laptop, small physical, or touch)         |
| `ERROR_RATE`             | Uncorrected error rate, in percent                                             |
| `AVG_WPM_15`             | Words per minute averaged over 15 typed sentences                              |
| `ROR`                    | Rollover ratio                                                                 |


### Project Ideas
- Remove unnecessary columns, such as `PARTICIPANT_ID`, to streamline the dataset.

- Rename columns (e.g. `AVG_WPM_15` to `wpm`, `ROR` to `ror`, `HAS_TAKEN_TYPING_COURSE` to `course`) for brevity during analysis.
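Dropping and renaming can also be done in one chained expression, which avoids `inplace=True` and leaves the raw frame untouched. A minimal sketch; `raw` is a hypothetical stand-in holding a few columns from the first two rows of the output above.

```python
import pandas as pd

# Hypothetical stand-in: a subset of columns from the first two rows shown above
raw = pd.DataFrame({
    "PARTICIPANT_ID": [3, 5],
    "AGE": [30, 27],
    "HAS_TAKEN_TYPING_COURSE": [0, 0],
    "AVG_WPM_15": [61.9483, 72.8871],
    "ROR": [0.2288, 0.3675],
})

RENAMES = {"AVG_WPM_15": "wpm", "ROR": "ror", "HAS_TAKEN_TYPING_COURSE": "course"}

# drop() and rename() each return a new DataFrame, so raw is not modified
tidy = raw.drop(columns=["PARTICIPANT_ID"]).rename(columns=RENAMES)
print(tidy.columns.tolist())  # ['AGE', 'course', 'wpm', 'ror']
```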

#### Finger Count Analysis
- Compare typing speeds across groups using different numbers of fingers, excluding the "10+" category for simplicity.

- Control for consistency by first filtering to similar `AGE`, `LAYOUT`, `NATIVE_LANGUAGE`, `KEYBOARD_TYPE`, and `HAS_TAKEN_TYPING_COURSE` values.

- Exclude participants with high error rates (`ERROR_RATE` > 3, i.e. more than 3%) to focus on reliable data.

- After filtering, drop any column that holds only a single value.
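The error-rate filter and the single-value column cleanup might look like the sketch below. `drop_constant_columns` is a hypothetical helper, the 3% cut-off comes from the bullet above, and the native-language filter stands in for whichever controls you choose. The data are rows 0 to 4 of the first output, with `AVG_WPM_15` already renamed to `wpm`.

```python
import pandas as pd


def drop_constant_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop every column that holds a single distinct value (NaN counts as a value)."""
    keep = [col for col in frame.columns if frame[col].nunique(dropna=False) > 1]
    return frame[keep]


# Rows 0-4 from the first output above, used only as a stand-in
sample = pd.DataFrame({
    "AGE": [30, 27, 13, 21, 21],
    "LAYOUT": ["qwerty"] * 5,
    "NATIVE_LANGUAGE": ["en", "en", "en", "en", "tl"],
    "FINGERS": ["1-2", "7-8", "7-8", "3-4", "7-8"],
    "KEYBOARD_TYPE": ["full", "laptop", "laptop", "full", "laptop"],
    "ERROR_RATE": [0.511945, 0.871080, 6.685633, 2.130493, 1.893287],
    "wpm": [61.9483, 72.8871, 24.1809, 24.7112, 45.3364],
})

# ERROR_RATE is already a percentage, so "more than 3%" means ERROR_RATE > 3
reliable = sample.query("ERROR_RATE <= 3 and FINGERS != '10+'")
english = reliable.query("NATIVE_LANGUAGE == 'en'")

# LAYOUT and NATIVE_LANGUAGE are now constant and get dropped
print(drop_constant_columns(english).columns.tolist())
```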

#### Rollover Ratio Analysis
- The Rollover Ratio (`ROR`) represents the proportion of keypresses where a new key is pressed before releasing the previous one.

- Compare typing speeds between participants with `ROR` ≤ 0.2 (20%) and those with `ROR` > 0.8 (80%), keeping `AGE`, `KEYBOARD_TYPE`, `FINGERS`, and other variables constant.
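To make the definition concrete, here is a sketch that computes a rollover ratio from raw key events. The `Keystroke` type, the millisecond timestamps, and the choice of denominator (every keypress that has a predecessor) are assumptions for illustration; the dataset's own `ROR` may have been computed differently.

```python
from dataclasses import dataclass


@dataclass
class Keystroke:
    key: str
    press: float    # time the key went down (ms)
    release: float  # time the key came back up (ms)


def rollover_ratio(strokes: list[Keystroke]) -> float:
    """Share of keypresses made while the previously pressed key was still held."""
    ordered = sorted(strokes, key=lambda s: s.press)
    if len(ordered) < 2:
        return 0.0
    rollovers = sum(
        1 for prev, cur in zip(ordered, ordered[1:]) if cur.press < prev.release
    )
    # Denominator: keypresses that have a predecessor (an assumption)
    return rollovers / (len(ordered) - 1)


# Typing "the ": h goes down before t comes up; e and space do not overlap
strokes = [
    Keystroke("t", 0, 80),
    Keystroke("h", 60, 140),
    Keystroke("e", 200, 260),
    Keystroke(" ", 300, 350),
]
print(rollover_ratio(strokes))  # 0.3333333333333333
```

This version only checks the immediately preceding key; a stricter definition could count a rollover whenever any earlier key is still held.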

#### Influence of Typing Course
- Compare typing speeds between participants who have taken a typing course (`HAS_TAKEN_TYPING_COURSE` = 1) and those who have not (`HAS_TAKEN_TYPING_COURSE` = 0), holding other variables such as `KEYBOARD_TYPE`, `AGE` range, and `FINGERS` constant.


In [10]:
# YOUR CODE HERE (add additional cells as needed)

## Removing unnecessary columns

In [2]:
df.head(2)

| | PARTICIPANT_ID | AGE | HAS_TAKEN_TYPING_COURSE | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | AVG_WPM_15 | ROR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 5 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.87108 | 72.8871 | 0.3675 |


In [4]:
df = df.drop(columns=['PARTICIPANT_ID'])
df

| | AGE | HAS_TAKEN_TYPING_COURSE | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | AVG_WPM_15 | ROR |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.871080 | 72.8871 | 0.3675 |
| 2 | 13 | 0 | AU | qwerty | en | 7-8 | laptop | 6.685633 | 24.1809 | 0.0667 |
| 3 | 21 | 0 | IN | qwerty | en | 3-4 | full | 2.130493 | 24.7112 | 0.0413 |
| 4 | 21 | 0 | PH | qwerty | tl | 7-8 | laptop | 1.893287 | 45.3364 | 0.2678 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 168589 | 20 | 0 | US | qwerty | en | 9-10 | laptop | 8.731466 | 24.9125 | 0.1842 |
| 168590 | 25 | 0 | PL | qwerty | pl | 9-10 | laptop | 0.000000 | 66.2946 | 0.0639 |
| 168591 | 38 | 1 | US | qwerty | en | 9-10 | laptop | 0.147929 | 75.6713 | 0.2021 |
| 168592 | 28 | 0 | GB | qwerty | en | 9-10 | laptop | 0.278552 | 91.7083 | 0.5133 |


## Renaming columns

In [6]:
df.head(2)

| | AGE | HAS_TAKEN_TYPING_COURSE | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | AVG_WPM_15 | ROR |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.87108 | 72.8871 | 0.3675 |


In [10]:
rename_columns = {
    'AVG_WPM_15': 'wpm',
    'ROR': 'ror',
    'HAS_TAKEN_TYPING_COURSE': 'course',
}
df.rename(columns=rename_columns, inplace=True)

## Comparing typing speeds across finger-count groups

In [11]:
df.head(2)

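| | AGE | course | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | wpm | ror |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.87108 | 72.8871 | 0.3675 |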


In [17]:
df.groupby('FINGERS')[['wpm']].mean()

| FINGERS | wpm |
|---|---|
| 1-2 | 40.280812 |
| 10+ | 42.7899 |
| 3-4 | 41.004952 |
| 5-6 | 45.731789 |
| 7-8 | 50.057909 |
| 9-10 | 57.379572 |
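The group labels above come out in string order, so `10+` sorts between `1-2` and `3-4`. One way to get a natural order, and to drop `10+` as the project suggests, is to make `FINGERS` an ordered categorical. A sketch on a tiny frame whose `wpm` values are made up for illustration (they are not taken from the dataset); `FINGER_ORDER` is an assumed ordering.

```python
import pandas as pd

# Assumed natural order of the FINGERS categories (fewest to most fingers)
FINGER_ORDER = ["1-2", "3-4", "5-6", "7-8", "9-10", "10+"]

# Tiny stand-in frame with made-up wpm values; in the notebook this would be df
sample = pd.DataFrame({
    "FINGERS": ["9-10", "1-2", "10+", "3-4", "1-2"],
    "wpm": [57.0, 40.0, 43.0, 41.0, 42.0],
})
sample["FINGERS"] = pd.Categorical(sample["FINGERS"], categories=FINGER_ORDER, ordered=True)

# Exclude "10+"; observed=True keeps only categories that actually occur
by_fingers = (
    sample[sample["FINGERS"] != "10+"]
    .groupby("FINGERS", observed=True)["wpm"]
    .agg(["mean", "count"])
)
print(by_fingers)  # rows ordered 1-2, 3-4, 9-10
```

Reporting `count` next to `mean` also shows how many participants back each group.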


## Controlling for age, layout, native language, keyboard type, and typing course

In [19]:
df.head(2)

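| | AGE | course | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | wpm | ror |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.87108 | 72.8871 | 0.3675 |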


In [None]:
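# Illustrative controls (assumed values): QWERTY, English, laptop, no course, ages 20-30
df2 = df.query("LAYOUT == 'qwerty' and NATIVE_LANGUAGE == 'en' and KEYBOARD_TYPE == 'laptop' and course == 0 and 20 <= AGE <= 30")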
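A small helper can keep the "hold everything else constant" filters readable and reusable. `control_for` is a hypothetical name, the age window and fixed values are illustrative choices rather than recommendations, and the rows are copied from the outputs above (with `course` and `wpm` already renamed).

```python
import pandas as pd


def control_for(frame: pd.DataFrame, age_range: tuple[int, int], **fixed) -> pd.DataFrame:
    """Keep rows whose AGE is within age_range (inclusive) and whose columns equal `fixed`."""
    lo, hi = age_range
    mask = frame["AGE"].between(lo, hi)
    for column, value in fixed.items():
        mask &= frame[column] == value
    return frame[mask]


# Rows 0-4, 168589 and 168592 from the outputs above
sample = pd.DataFrame(
    {
        "AGE": [30, 27, 13, 21, 21, 20, 28],
        "course": [0, 0, 0, 0, 0, 0, 0],
        "LAYOUT": ["qwerty"] * 7,
        "NATIVE_LANGUAGE": ["en", "en", "en", "en", "tl", "en", "en"],
        "FINGERS": ["1-2", "7-8", "7-8", "3-4", "7-8", "9-10", "9-10"],
        "KEYBOARD_TYPE": ["full", "laptop", "laptop", "full", "laptop", "laptop", "laptop"],
        "wpm": [61.9483, 72.8871, 24.1809, 24.7112, 45.3364, 24.9125, 91.7083],
    },
    index=[0, 1, 2, 3, 4, 168589, 168592],
)

subgroup = control_for(
    sample, (20, 30),
    LAYOUT="qwerty", NATIVE_LANGUAGE="en", KEYBOARD_TYPE="laptop", course=0,
)
print(subgroup.groupby("FINGERS")["wpm"].agg(["mean", "count"]))
```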



## Rollover Ratio Analysis
- The rollover ratio (`ror`, originally `ROR`) is the proportion of keypresses in which a new key is pressed before the previous one is released.

- Compare typing speeds between participants with `ror` ≤ 0.2 (20%) and those with `ror` > 0.8 (80%), keeping `AGE`, `KEYBOARD_TYPE`, `FINGERS`, and other variables constant.

In [42]:
df.head(3)

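| | AGE | course | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | wpm | ror |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.87108 | 72.8871 | 0.3675 |
| 2 | 13 | 0 | AU | qwerty | en | 7-8 | laptop | 6.685633 | 24.1809 | 0.0667 |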


In [44]:
df.query("ror > 0.8")

| | AGE | course | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | wpm | ror |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.871080 | 72.8871 | 0.3675 |
| 2 | 13 | 0 | AU | qwerty | en | 7-8 | laptop | 6.685633 | 24.1809 | 0.0667 |
| 3 | 21 | 0 | IN | qwerty | en | 3-4 | full | 2.130493 | 24.7112 | 0.0413 |
| 4 | 21 | 0 | PH | qwerty | tl | 7-8 | laptop | 1.893287 | 45.3364 | 0.2678 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 160463 | 26 | 0 | GB | qwerty | en | 9-10 | laptop | 1.254480 | 88.3960 | 0.8214 |
| 163673 | 26 | 1 | US | qwerty | en | 9-10 | laptop | 0.000000 | 94.9534 | 0.8251 |
| 165811 | 15 | 1 | US | qwerty | en | 9-10 | full | 0.000000 | 98.4824 | 0.8048 |
| 166059 | 23 | 0 | US | qwerty | en | 9-10 | full | 1.155676 | 111.2842 | 0.8099 |


In [49]:
below20 = df.query("ror <= 0.2")
below20

| | AGE | course | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | wpm | ror |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 13 | 0 | AU | qwerty | en | 7-8 | laptop | 6.685633 | 24.1809 | 0.0667 |
| 3 | 21 | 0 | IN | qwerty | en | 3-4 | full | 2.130493 | 24.7112 | 0.0413 |
| 6 | 20 | 1 | AF | qwerty | en | 7-8 | laptop | 3.127715 | 9.9978 | 0.0049 |
| 7 | 26 | 0 | PK | qwerty | ur | 5-6 | laptop | 0.297177 | 62.7715 | 0.0811 |
| 8 | 22 | 0 | PH | qwerty | en | 3-4 | full | 2.194357 | 28.9376 | 0.1003 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 168582 | 18 | 1 | IN | qwerty | ml | 5-6 | laptop | 0.465839 | 28.0818 | 0.0704 |
| 168583 | 29 | 0 | ID | qwerty | id | 9-10 | laptop | 0.085034 | 50.4555 | 0.1909 |
| 168584 | 26 | 0 | BD | qwerty | bn | 1-2 | full | 0.807754 | 15.8977 | 0.0372 |
| 168589 | 20 | 0 | US | qwerty | en | 9-10 | laptop | 8.731466 | 24.9125 | 0.1842 |
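With both extremes selected, one way to hold other variables constant is to average `wpm` within each `FINGERS` × `KEYBOARD_TYPE` stratum and compare only strata that contain both groups. A sketch under that assumption (the choice of strata is illustrative; on the full data you would also control for `AGE` and the rest). The rows are copied from the two listings above, plus row 0, whose `ror` falls between the thresholds.

```python
import pandas as pd

# Rows from the outputs above; row 0 has an intermediate ror and joins neither group
sample = pd.DataFrame(
    {
        "FINGERS": ["9-10", "9-10", "9-10", "9-10", "9-10", "5-6", "1-2"],
        "KEYBOARD_TYPE": ["laptop", "full", "full", "laptop", "laptop", "laptop", "full"],
        "wpm": [88.3960, 98.4824, 111.2842, 50.4555, 24.9125, 62.7715, 61.9483],
        "ror": [0.8214, 0.8048, 0.8099, 0.1909, 0.1842, 0.0811, 0.2288],
    },
    index=[160463, 165811, 166059, 168583, 168589, 7, 0],
)

strata = ["FINGERS", "KEYBOARD_TYPE"]
low = sample[sample["ror"] <= 0.2].groupby(strata)["wpm"].mean()
high = sample[sample["ror"] > 0.8].groupby(strata)["wpm"].mean()

# Outer-join the two sets of means, then keep strata that contain both groups
comparison = pd.concat({"low": low, "high": high}, axis=1).dropna()
comparison["diff"] = comparison["high"] - comparison["low"]
print(comparison)
```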


## Influence of Typing Course
- Compare typing speeds between participants who have taken a typing course (`course` = 1, originally `HAS_TAKEN_TYPING_COURSE`) and those who have not (`course` = 0), holding other variables such as `KEYBOARD_TYPE`, `AGE` range, and `FINGERS` constant.

In [20]:
df.head(3)

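| | AGE | course | COUNTRY | LAYOUT | NATIVE_LANGUAGE | FINGERS | KEYBOARD_TYPE | ERROR_RATE | wpm | ror |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 0 | US | qwerty | en | 1-2 | full | 0.511945 | 61.9483 | 0.2288 |
| 1 | 27 | 0 | MY | qwerty | en | 7-8 | laptop | 0.87108 | 72.8871 | 0.3675 |
| 2 | 13 | 0 | AU | qwerty | en | 7-8 | laptop | 6.685633 | 24.1809 | 0.0667 |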


In [39]:
df.query("course==1")[["AGE","KEYBOARD_TYPE","FINGERS","wpm"]]

| | AGE | KEYBOARD_TYPE | FINGERS | wpm |
|---|---|---|---|---|
| 6 | 20 | laptop | 7-8 | 9.9978 |
| 14 | 11 | full | 9-10 | 42.5532 |
| 20 | 18 | laptop | 3-4 | 53.0141 |
| 23 | 25 | laptop | 3-4 | 14.6155 |
| 24 | 14 | full | 9-10 | 57.4870 |
| ... | ... | ... | ... | ... |
| 168557 | 55 | laptop | 9-10 | 44.0643 |
| 168569 | 32 | laptop | 9-10 | 36.0377 |
| 168580 | 38 | full | 9-10 | 63.0322 |
| 168582 | 18 | laptop | 5-6 | 28.0818 |
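The listing above shows only the `course == 1` rows, but a fair comparison needs participants without a course from the same strata. A sketch using `pd.pivot_table` with age bands; the band edges are an arbitrary illustrative choice, and the rows are copied from the outputs above (five with `course == 1`, four with `course == 0`).

```python
import pandas as pd

# Rows from the outputs above: 6, 20, 23, 168569, 168591 took a course; 1, 4, 168589, 168592 did not
sample = pd.DataFrame(
    {
        "AGE": [20, 18, 25, 32, 38, 27, 21, 20, 28],
        "course": [1, 1, 1, 1, 1, 0, 0, 0, 0],
        "KEYBOARD_TYPE": ["laptop"] * 9,
        "FINGERS": ["7-8", "3-4", "3-4", "9-10", "9-10", "7-8", "7-8", "9-10", "9-10"],
        "wpm": [9.9978, 53.0141, 14.6155, 36.0377, 75.6713, 72.8871, 45.3364, 24.9125, 91.7083],
    },
    index=[6, 20, 23, 168569, 168591, 1, 4, 168589, 168592],
)

# Illustrative age bands; converted to str so grouping uses plain labels
sample["age_band"] = pd.cut(
    sample["AGE"], bins=[0, 17, 24, 34, 120], labels=["<18", "18-24", "25-34", "35+"]
).astype(str)

table = pd.pivot_table(
    sample, values="wpm", index=["age_band", "KEYBOARD_TYPE", "FINGERS"],
    columns="course", aggfunc="mean",
).dropna()  # keep strata that contain both course == 0 and course == 1
table["diff"] = table[1] - table[0]
print(table)
```

With one or two rows per stratum these differences say nothing about the real effect of a course; on the full data it would be worth adding a count per cell before reading the `diff` column.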
