<a href="https://colab.research.google.com/github/BhavikBuchke/Cisco-Data-science-program/blob/main/Data%20cleaning/typing%20speeds%20project/Typing%20Speed.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Typing Speeds

How can you improve your typing speed?

The file `typing-speeds.csv` contains typing speed data from more than 168,000 people, each of whom typed 15 sentences. The data was collected through an online typing test hosted on a free typing-speed assessment website.

In [1]:
# FOR GOOGLE COLAB ONLY (skip or comment out this cell elsewhere).
# Run the code below. A dialog will appear to upload files.
# Upload 'typing-speeds.csv'.

from google.colab import files
uploaded = files.upload()

Saving typing-speeds.csv to typing-speeds (3).csv


In [2]:
import pandas as pd

df = pd.read_csv('typing-speeds.csv')
df

Unnamed: 0,PARTICIPANT_ID,AGE,HAS_TAKEN_TYPING_COURSE,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,AVG_WPM_15,ROR
0,3,30,0,US,qwerty,en,1-2,full,0.511945,61.9483,0.2288
1,5,27,0,MY,qwerty,en,7-8,laptop,0.871080,72.8871,0.3675
2,7,13,0,AU,qwerty,en,7-8,laptop,6.685633,24.1809,0.0667
3,23,21,0,IN,qwerty,en,3-4,full,2.130493,24.7112,0.0413
4,24,21,0,PH,qwerty,tl,7-8,laptop,1.893287,45.3364,0.2678
...,...,...,...,...,...,...,...,...,...,...,...
168589,517932,20,0,US,qwerty,en,9-10,laptop,8.731466,24.9125,0.1842
168590,517936,25,0,PL,qwerty,pl,9-10,laptop,0.000000,66.2946,0.0639
168591,517943,38,1,US,qwerty,en,9-10,laptop,0.147929,75.6713,0.2021
168592,517944,28,0,GB,qwerty,en,9-10,laptop,0.278552,91.7083,0.5133




| **Variable**             | **Description**                                                                 |
|--------------------------|---------------------------------------------------------------------------------|
| `PARTICIPANT_ID`         | Unique ID of the participant                                                   |
| `AGE`                    | Age of the participant                                                         |
| `HAS_TAKEN_TYPING_COURSE`| Whether the participant has taken a typing course (1 = Yes, 0 = No)            |
| `COUNTRY`                | Country of the participant                                                     |
| `LAYOUT`                 | Keyboard layout used (QWERTY, AZERTY, or QWERTZ)                               |
| `NATIVE_LANGUAGE`        | Native language of the participant                                             |
| `FINGERS`                | Number of fingers used for typing (options: 1-2, 3-4, 5-6, 7-8, 9-10)          |
| `KEYBOARD_TYPE`          | Type of keyboard used (Full/desktop, laptop, small physical, or touch)         |
| `ERROR_RATE`             | Uncorrected error rate (as a percentage)                                       |
| `AVG_WPM_15`             | Words per minute averaged over 15 typed sentences                              |
| `ROR`                    | Rollover ratio                                                                 |


### Project Ideas
- Remove unnecessary columns, such as `PARTICIPANT_ID`, to streamline the dataset.

- Rename columns (e.g., `AVG_WPM_15` to `wpm`, `ROR` to `ror`, `HAS_TAKEN_TYPING_COURSE` to `course`) for brevity and clarity during analysis.

#### Finger Count Analysis
- Compare typing speeds across groups using different numbers of fingers, excluding the "10+" category for simplicity.

- Control for confounding variables by first filtering to similar `AGE`, `LAYOUT`, `NATIVE_LANGUAGE`, `KEYBOARD_TYPE`, and `HAS_TAKEN_TYPING_COURSE` values.

- Exclude participants with high error rates (`ERROR_RATE` > 3%) to focus on reliable data.

- Drop columns after filtering if they now only have a single value.
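
The last step in the list above, dropping columns that hold a single value after filtering, can be sketched as follows. The `filtered` frame is a small hypothetical stand-in for an already-filtered subset, not the real data.

```python
import pandas as pd

# Hypothetical rows standing in for an already-filtered subset of the data.
filtered = pd.DataFrame({
    "AGE": [20, 20, 20],
    "LAYOUT": ["qwerty", "qwerty", "qwerty"],
    "FINGERS": ["9-10", "7-8", "3-4"],
    "wpm": [74.49, 61.40, 48.55],
})

# A column with exactly one distinct value carries no information after filtering.
constant_cols = [
    col for col in filtered.columns
    if filtered[col].nunique(dropna=False) == 1
]
trimmed = filtered.drop(columns=constant_cols)

print(constant_cols)          # ['AGE', 'LAYOUT']
print(list(trimmed.columns))  # ['FINGERS', 'wpm']
```

Passing `dropna=False` counts missing values as a distinct value, so a column that mixes one value with NaN is kept rather than dropped.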

#### Rollover Ratio Analysis
- The Rollover Ratio (`ROR`) represents the proportion of keypresses where a new key is pressed before releasing the previous one.

- Compare typing speeds between participants with `ROR` ≤ 20% and those with `ROR` > 80%, keeping `AGE`, `KEYBOARD_TYPE`, `FINGERS`, and other variables constant.
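
To make the rollover definition concrete, here is a minimal sketch that computes the ratio from key press and release timestamps. The event list and the `rollover_ratio` helper are hypothetical. Dividing by the number of keypresses that have a predecessor is an assumption, since the dataset does not document how `ROR` was calculated.

```python
def rollover_ratio(events):
    """Share of keypresses that start before the previous key is released.

    events: list of (press_ms, release_ms) tuples sorted by press time.
    Assumption: the denominator is the number of keypresses that have a
    predecessor, i.e. len(events) - 1.
    """
    if len(events) < 2:
        return 0.0
    overlaps = sum(
        1
        for prev, curr in zip(events, events[1:])
        if curr[0] < prev[1]  # next key pressed before the previous one is released
    )
    return overlaps / (len(events) - 1)


# Hypothetical timestamps in milliseconds for five keypresses.
events = [(0, 120), (100, 210), (250, 330), (320, 400), (450, 520)]
print(rollover_ratio(events))  # 0.5
```

In this example two of the four transitions overlap, giving a ratio of 0.5, which corresponds to 50 on the percentage scale used for `ror` later in the notebook.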

#### Influence of Typing Course
- Compare typing speeds between participants who have taken a typing course (`HAS_TAKEN_TYPING_COURSE` = 1) and those who have not (`HAS_TAKEN_TYPING_COURSE` = 0), holding other variables such as `KEYBOARD_TYPE`, `AGE` range, and `FINGERS` constant.


In [3]:
# Remove unnecessary column.
df.drop(columns=['PARTICIPANT_ID'], inplace=True)
df.head()

Unnamed: 0,AGE,HAS_TAKEN_TYPING_COURSE,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,AVG_WPM_15,ROR
0,30,0,US,qwerty,en,1-2,full,0.511945,61.9483,0.2288
1,27,0,MY,qwerty,en,7-8,laptop,0.87108,72.8871,0.3675
2,13,0,AU,qwerty,en,7-8,laptop,6.685633,24.1809,0.0667
3,21,0,IN,qwerty,en,3-4,full,2.130493,24.7112,0.0413
4,21,0,PH,qwerty,tl,7-8,laptop,1.893287,45.3364,0.2678


In [4]:
# Rename columns for brevity.
df.rename(columns={'AVG_WPM_15': 'wpm', 'ROR': 'ror', 'HAS_TAKEN_TYPING_COURSE': 'course'}, inplace=True)
df.head()

Unnamed: 0,AGE,course,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,wpm,ror
0,30,0,US,qwerty,en,1-2,full,0.511945,61.9483,0.2288
1,27,0,MY,qwerty,en,7-8,laptop,0.87108,72.8871,0.3675
2,13,0,AU,qwerty,en,7-8,laptop,6.685633,24.1809,0.0667
3,21,0,IN,qwerty,en,3-4,full,2.130493,24.7112,0.0413
4,21,0,PH,qwerty,tl,7-8,laptop,1.893287,45.3364,0.2678


In [5]:
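# Express the rollover ratio as a percentage (0-100).
df['ror'] = df['ror'] * 100
# Scale ERROR_RATE by a factor of 10 and round to 2 decimals.
# Note: the variable table lists ERROR_RATE as already being a percentage,
# so every later threshold on ERROR_RATE applies to these scaled values.
df['ERROR_RATE'] = round(df['ERROR_RATE'] * 10, 2)
# Round words per minute to 2 decimals.
df['wpm'] = round(df['wpm'], 2)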
df

Unnamed: 0,AGE,course,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,wpm,ror
0,30,0,US,qwerty,en,1-2,full,5.12,61.95,22.88
1,27,0,MY,qwerty,en,7-8,laptop,8.71,72.89,36.75
2,13,0,AU,qwerty,en,7-8,laptop,66.86,24.18,6.67
3,21,0,IN,qwerty,en,3-4,full,21.30,24.71,4.13
4,21,0,PH,qwerty,tl,7-8,laptop,18.93,45.34,26.78
...,...,...,...,...,...,...,...,...,...,...
168589,20,0,US,qwerty,en,9-10,laptop,87.31,24.91,18.42
168590,25,0,PL,qwerty,pl,9-10,laptop,0.00,66.29,6.39
168591,38,1,US,qwerty,en,9-10,laptop,1.48,75.67,20.21
168592,28,0,GB,qwerty,en,9-10,laptop,2.79,91.71,51.33


In [6]:
# Group participants by number of fingers used, excluding the "10+" category.
# Note: first() only shows the first row of each group; it does not aggregate speeds.
less_than_10 = df[df['FINGERS'] != '10+']
compare_speed = less_than_10.groupby('FINGERS')
compare_speed.first()

FINGERS,AGE,course,COUNTRY,LAYOUT,NATIVE_LANGUAGE,KEYBOARD_TYPE,ERROR_RATE,wpm,ror
1-2,30,0,US,qwerty,en,full,5.12,61.95,22.88
3-4,21,0,IN,qwerty,en,full,21.3,24.71,4.13
5-6,26,0,PK,qwerty,ur,laptop,2.97,62.77,8.11
7-8,27,0,MY,qwerty,en,laptop,8.71,72.89,36.75
9-10,11,1,PH,qwerty,tl,full,0.0,42.55,17.41
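
Because `first()` returns a single example row per group, an actual speed comparison needs an aggregate such as the mean or median of `wpm`. The sketch below shows one way to do that on a small hypothetical frame; the same call should work on the grouped data above, assuming the column names `FINGERS` and `wpm`.

```python
import pandas as pd

# Hypothetical sample; the real notebook would use the filtered DataFrame instead.
sample = pd.DataFrame({
    "FINGERS": ["1-2", "1-2", "9-10", "9-10", "9-10"],
    "wpm": [30.0, 40.0, 60.0, 70.0, 80.0],
})

# Group size plus mean and median speed for each finger-count group.
speed_by_fingers = sample.groupby("FINGERS")["wpm"].agg(["count", "mean", "median"])
print(speed_by_fingers)
```

Reporting the group size alongside the averages helps flag groups that are too small to compare reliably.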


In [7]:
# Filter to matching AGE, LAYOUT, NATIVE_LANGUAGE, KEYBOARD_TYPE, and course values.
similar = df[(df['AGE'] == 20) & (df['LAYOUT'] == 'qwerty') & (df['NATIVE_LANGUAGE'] == 'en')
                           & (df['KEYBOARD_TYPE'] == 'full') & (df['course'] == 1)]
similar.head()

Unnamed: 0,AGE,course,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,wpm,ror
75,20,1,US,qwerty,en,9-10,full,1.62,74.49,53.87
106,20,1,PH,qwerty,en,9-10,full,5.78,31.67,22.8
243,20,1,IN,qwerty,en,9-10,full,8.55,60.53,47.82
333,20,1,IN,qwerty,en,5-6,full,16.06,25.08,3.58
567,20,1,US,qwerty,en,3-4,full,60.3,41.02,3.99
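
Matching on a single exact age keeps only one cohort. If "similar" ages are meant more loosely, a narrow age band is one alternative. The sketch below uses hypothetical rows, and the 18 to 22 range is an arbitrary assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical participants; only AGE and wpm are needed for the illustration.
people = pd.DataFrame({
    "AGE": [17, 18, 20, 22, 23, 35],
    "wpm": [40.0, 52.0, 61.0, 58.0, 70.0, 66.0],
})

# Series.between is inclusive on both ends by default.
age_band = people[people["AGE"].between(18, 22)]
print(age_band["AGE"].tolist())  # [18, 20, 22]
```

A wider band keeps more rows per group at the cost of a looser control on age.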


In [8]:
# Keep only participants whose (scaled) ERROR_RATE is below 3.
# Equivalent: reliable = similar.query('ERROR_RATE < 3')
reliable = similar[similar['ERROR_RATE'] < 3]
reliable

Unnamed: 0,AGE,course,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,wpm,ror
75,20,1,US,qwerty,en,9-10,full,1.62,74.49,53.87
1220,20,1,US,qwerty,en,9-10,full,2.85,57.56,23.42
1532,20,1,US,qwerty,en,9-10,full,1.61,50.47,40.15
3158,20,1,US,qwerty,en,7-8,full,0.00,61.40,53.98
4134,20,1,US,qwerty,en,9-10,full,0.00,81.23,44.14
...,...,...,...,...,...,...,...,...,...,...
164334,20,1,US,qwerty,en,9-10,full,0.00,52.26,31.80
165576,20,1,US,qwerty,en,9-10,full,1.68,85.48,39.70
165661,20,1,US,qwerty,en,3-4,full,0.59,48.55,8.72
166025,20,1,PH,qwerty,en,9-10,full,1.50,69.59,56.45


In [9]:
# Hold other variables roughly constant before comparing speeds by rollover ratio.
ror = df[(df['AGE'] < 30) & (df['course'] == 1) & (df['COUNTRY'] == 'US') & (df['LAYOUT'] == 'qwerty')
             & (df['NATIVE_LANGUAGE'] == 'en') & (df['KEYBOARD_TYPE'] == 'full')]

In [10]:
# Participants with ROR ≤ 20%.
ror_less_than_20 = ror.query('ror <= 20')
ror_less_than_20.head()

Unnamed: 0,AGE,course,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,wpm,ror
168,24,1,US,qwerty,en,5-6,full,3.39,31.0,7.6
189,24,1,US,qwerty,en,5-6,full,9.74,41.53,5.85
193,24,1,US,qwerty,en,5-6,full,14.99,38.29,7.13
477,19,1,US,qwerty,en,5-6,full,8.66,26.49,6.71
491,26,1,US,qwerty,en,9-10,full,18.5,42.53,5.93


In [11]:
# Participants with ROR > 80%.
ror_more_than_80 = ror.query('ror > 80')
ror_more_than_80.head()

Unnamed: 0,AGE,course,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,wpm,ror
2104,27,1,US,qwerty,en,9-10,full,12.9,121.18,85.94
10850,28,1,US,qwerty,en,9-10,full,6.94,101.81,83.5
29814,27,1,US,qwerty,en,9-10,full,1.29,102.27,81.29
30894,23,1,US,qwerty,en,9-10,full,5.14,91.19,80.38
43263,24,1,US,qwerty,en,7-8,full,10.89,86.67,91.7
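
The two rollover groups are only listed above, not summarized. One possible way to compare them is to bin `ror` into bands and average `wpm` per band. The sketch below uses hypothetical values; the band labels are illustrative, and the cut points of 20 and 80 follow the project idea.

```python
import pandas as pd

# Hypothetical typists; ror is on the 0-100 percentage scale.
typists = pd.DataFrame({
    "ror": [5.0, 12.0, 20.0, 45.0, 81.0, 90.0],
    "wpm": [30.0, 40.0, 50.0, 65.0, 100.0, 110.0],
})

# pd.cut bins are right-inclusive by default:
# (-inf, 20] is "low", (20, 80] is "mid", (80, inf] is "high".
bands = pd.cut(
    typists["ror"],
    bins=[float("-inf"), 20, 80, float("inf")],
    labels=["low", "mid", "high"],
)
mean_wpm = typists.assign(band=bands).groupby("band", observed=True)["wpm"].mean()

print(mean_wpm.loc["low"])   # 40.0
print(mean_wpm.loc["high"])  # 105.0
```

Keeping the middle band in the output makes it easy to see whether speed rises steadily with rollover or only at the extremes.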


In [12]:
# Participants who have taken a typing course, with the other variables held constant.
with_course = df[(df['course'] == 1) & (df['KEYBOARD_TYPE'] == 'full') & (df['AGE'] == 25) & (df['FINGERS'] == '3-4')
                    & (df['NATIVE_LANGUAGE'] == 'en') & (df['LAYOUT'] == 'qwerty') & (df['COUNTRY'] == 'US')]
with_course.head()

Unnamed: 0,AGE,course,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,wpm,ror
1029,25,1,US,qwerty,en,3-4,full,8.09,54.25,30.19
3606,25,1,US,qwerty,en,3-4,full,9.74,45.78,3.59
3630,25,1,US,qwerty,en,3-4,full,7.59,44.22,4.15
19501,25,1,US,qwerty,en,3-4,full,15.95,33.1,13.84
23585,25,1,US,qwerty,en,3-4,full,3.07,75.54,40.35


In [13]:
# Participants who have not taken a typing course, using the same controls.
without_course = df[(df['course'] == 0) & (df['KEYBOARD_TYPE'] == 'full') & (df['AGE'] == 25) & (df['FINGERS'] == '3-4')
                    & (df['NATIVE_LANGUAGE'] == 'en') & (df['LAYOUT'] == 'qwerty') & (df['COUNTRY'] == 'US')]
without_course.head()

Unnamed: 0,AGE,course,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,wpm,ror
318,25,0,US,qwerty,en,3-4,full,6.91,34.46,6.32
907,25,0,US,qwerty,en,3-4,full,10.42,36.35,15.72
1885,25,0,US,qwerty,en,3-4,full,4.9,35.91,26.57
3174,25,0,US,qwerty,en,3-4,full,7.59,55.39,50.64
3238,25,0,US,qwerty,en,3-4,full,5.66,60.57,39.25
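
The two course filters repeat the same control conditions. A possible refactor, sketched here on hypothetical data, builds the shared mask once and then compares mean `wpm` by `course`. The `controls` dictionary and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows; the last one uses a different keyboard and should be filtered out.
sample = pd.DataFrame({
    "course": [1, 1, 0, 0, 1],
    "KEYBOARD_TYPE": ["full", "full", "full", "full", "laptop"],
    "FINGERS": ["3-4", "3-4", "3-4", "3-4", "3-4"],
    "wpm": [50.0, 60.0, 40.0, 44.0, 90.0],
})

# Conditions shared by both groups, written once.
controls = {"KEYBOARD_TYPE": "full", "FINGERS": "3-4"}
mask = pd.Series(True, index=sample.index)
for column, value in controls.items():
    mask &= sample[column] == value

matched = sample[mask]
mean_by_course = matched.groupby("course")["wpm"].mean()
difference = mean_by_course.loc[1] - mean_by_course.loc[0]

print(difference)  # 13.0
```

Writing the controls once guarantees that both groups are filtered identically, so any difference in the means is not caused by a mismatch between two hand-written masks.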
