/
index.Rmd
543 lines (351 loc) · 21.7 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
---
title: "PASSWORD ETIQUTTE"
summary: "Lessons and observations from week 3 of 2020's #TidyTuesday challenge"
subtitle: "The dos and don'ts of creating passwords and why you need a password manager"
disable_comments: false # Optional, enable Table of Contents for specific post
authors:
- admin
date: '2020-05-29'
featured: true
image:
caption: 'Image credit: [**Photo by Markus Winkler on Unsplash**](https://unsplash.com/photos/OjSG0E_qcbo)'
focal_point: ""
placement: 3
preview_only: true
output: md_document
lastmod: "2020-05-29T17:51:00+03:00"
projects: [Tidy Tuesday]
categories: [R, Data Visualization]
tags: [TidyTuesday, R, Data Visualization]
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
#### INTRODUCTION
Exposing ourselves to cyber attacks comes as easy as having access to the internet. Almost every website you visit will require you to create an account to be able to access significant information or services from them - So where exactly does the problem with password strength begin? Memory methinks...
The whole purpose of memory is to preserve information. Events, names, faces, mathematical formulas etcetera, all seem recognizable every time we recollect because the mind memorizes them. I don't know how exactly remembering works, but you will agree with me on this; the memory is sometimes very unreliable when it comes to cramming passwords. It is for this very reason that, in the quest for memorizing our passwords, we find ourselves using weak passwords.
Different folk memorize passwords differently; some of us share passwords across platforms (very counter intuitive if you actually think about it re cyber security), some use personal information such as names, birthdays, pet names `r emo::ji("haha")` and what have you. My point is, we cannot fully trust ourselves to match brute force algorithms trying to hack their way into our Facebook(s), Instagram(s) and worse for me, online banking and credit card accounts. So here are a few Ps and Qs for setting up passwords.
>A small challenge for you [here](https://howsecureismypassword.net/), check how long it would take a computer to crack your bank account password.
I used [#TidyTuesday's passwords](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-01-14/readme.md) data with some more compromised passwords from kaggle for illustration. To keep it concise, all the code used in this post can be found [here](https://github.com/CarlvinJerry/sources/blob/master/content/post/password-etiquette/index.Rmd). Shout out to my good friend and avid data analyst [Martin](https://www.linkedin.com/in/martin-wanjiru-98414111a/) who helped in building a password strength meter used in ranking passwords used here, you can read more on the tool [here](https://ndirangumartin.netlify.app/post/password-meter-in-r/). That being said, let's get visual.
<br>
#### SKIMMING THROUGH THE DATA
The graph bellow shows a scattered distribution of a small fraction of our compromised passwords. You want to ensure that if your passwords get parsed through this chart, they are on the bottom, far left of the chart i.e, the passwords are unique and exhibit stronger strengths. That way, you know that even a computer will take quite some time before cracking it. The further and lower it is along the x-axis,the smaller it should be to indicate uniqueness and the stronger it is hence the green color.
```{r chatterPlot, warning=FALSE, include=FALSE, results=FALSE}
#load packages
library(readr)
library(dplyr)
library(tidyr)
#library(tidyverse)
library(devtools)
library(remotes)
library(ggplot2)
library(ggrepel)
library(ggplot2)
library(scales)
#The Data...
#Read Kaggle data for compromised passwords
#dataset <- read.csv("compromised.csv", stringsAsFactors = FALSE)
dataset <- read.csv("J:/Personalprojects/Blogsite/sources/content/post/password-etiquette/compromised.csv", stringsAsFactors = FALSE)
#Read #TidyTuesday data
passwords <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-14/passwords.csv')
#Subset, bind password cols and check data
passwordDf = data.frame(Password = c(passwords$password,dataset$X0), stringsAsFactors = FALSE)
nrow(passwordDf) #Check rows to confirm new dataset
#remove nulls
passDf <- passwordDf %>% drop_na(Password)
nrow(passDf) #Check rows to confirm new dataset
#Save new dataSet
#write.csv(passDf,"J:/Personalprojects/Blogsite/sources/content/post/password-etiquette/compromisedPasswords.csv")
#Analysis
#Unique=========================================
uniques <- length(unique(passDf))
uniques
#Duplicates=============================
# duplicates <- length(passDf[duplicated(passDf)])
# duplicates
#Rankings======================================================
#Top 10 Passwords
#Counts
head(sort(table( passDf$Password ), decreasing = T),10)
#Percentages
head(sort(prop.table(table(passDf$Password)), decreasing = T),10)
#Tables
#Base words ====================
# library(tm)
# docs <- tm_map(docs, removeWords, stopwords("english"))
#
#
# 2. Number of characters
Passwordlength <- nchar(passDf$Password)
freq <- as.data.frame(table(Passwordlength))
sum(freq$Freq)
###PASSWORD STRENGTH ALGO
library(stringi)
library(stringr)
passMark <- function(passwords){
#Additions
#Number of characters----
numChars = nchar(passwords)
#Upper case letters ----
upperCase = stringi::stri_count(passwords, regex = "[A-Z]")
#Lower case letters ----
lowerCase = stringi::stri_count(passwords, regex = "[a-z]")
#Numbers----
nums = stringi::stri_count(passwords, regex = "[0-9]")
#symbols
symbols = c("~", "!", "@", "#", "\\$", "%", "\\^", "&", "\\*", "\\(" ,"\\)", "-", "\\+", "\\_", "=", "`" ,
"\\{" ,"\\}" ,"\\[" ,"\\]",":", ";" , "<" , ">", "\\?" ,"," ,"\\.", "\\'", "@", "#", noquote("\""))
num_symbols = stringr::str_count(passwords, paste(symbols, collapse = "|"))
#In the Middle numbers
midnums = stringi::stri_count(gsub('^.|.$', '', passwords), regex = "[0-9]")
#middle symbols
mid_symbols = stringr::str_count(gsub('^.|.$', '', passwords), paste(symbols, collapse = "|"))
requirements = c(upperCase, lowerCase, nums, num_symbols)
requirements_score = 0
for (i in requirements) {
if(i > 0) requirements_score = requirements_score + 1
}
requirements_score = ifelse(numChars < 8, 0, ifelse((requirements_score) < 3, 0, (requirements_score + 1)))
character_count_score = (numChars * 4)
uppercase_score = ifelse(upperCase == 0, 0, ((numChars - upperCase)*2))
lowercase_score = ifelse(lowerCase == 0, 0, ((numChars - lowerCase)*2))
numbers_score = ifelse(upperCase > 0 | lowerCase > 0 | num_symbols > 0, (nums * 4), 0)
symbols_score = (num_symbols * 6)
mid_nums_symbol_score = ((midnums + mid_symbols) * 2)
requirements_score = requirements_score * 2
summed = 0 + character_count_score + uppercase_score + lowercase_score + numbers_score + symbols_score + mid_nums_symbol_score + requirements_score
#Deductions
#letters only
letters_only = ifelse(numChars == (upperCase + lowerCase), numChars, 0)
#numbers only
numbers_only = ifelse(numChars == (nums), numChars, 0)
split_function = function(password){
password = str_extract_all(password, paste(c("[a-z]", "[A-Z]", "[0-9]", symbols), collapse = "|"))
password = password[[1]]
return(password)
}
#consecutive upper
split_pass = split_function(passwords)
consecutive_upper = ifelse(split_pass %in% LETTERS, 1, 0)
r = rle(consecutive_upper)
consecutive_upper = (r$lengths[r$values == 1]) - 1
consecutive_upper = sum(consecutive_upper)
#consecutive lower
split_pass = split_function(passwords)
consecutive_lower = ifelse(split_pass %in% letters, 1, 0)
r = rle(consecutive_lower)
consecutive_lower = (r$lengths[r$values == 1]) - 1
consecutive_lower = sum(consecutive_lower)
#Consecutive numbers
split_pass = split_function(passwords)
consecutive_numbers = ifelse(split_pass %in% (0:9), 1, 0)
r = rle(consecutive_numbers)
consecutive_numbers = (r$lengths[r$values == 1]) - 1
consecutive_numbers = sum(consecutive_numbers)
#Sequence checker function
sequence_checker = function(split_pass, var_type) {
if(var_type == "number"){
sequence_check = as.integer(ifelse(split_pass %in% (0:9), split_pass, 0))
} else if (var_type == "symbols"){
sequence_check = ifelse(split_pass %in% symbols[1:13], match(split_pass, symbols[1:13]), 0)
} else {
sequence_check = ifelse(tolower(split_pass) %in% letters, match(tolower(split_pass), letters), 0)
}
sequence_check = abs(diff(sequence_check))
sequence_check = ifelse(sequence_check == 1, 1, 0)
r = rle(sequence_check)
sequence_check = (r$lengths[r$values == 1]) - 1
sequence_check = sum(sequence_check)
return(sequence_check)
}
#letter sequence
sequence_letters = sequence_checker(split_pass, "letters")
#num sequence
sequence_num = sequence_checker(split_pass, "number")
#symbols
sequence_symbols = sequence_checker(split_pass, "symbols")
letters_only_score = letters_only
numbers_only_score = numbers_only
consecutive_upper_score = consecutive_upper * 2
consecutive_lower_score = consecutive_lower * 2
consecutive_numbers_score = consecutive_numbers * 2
sequence_letters_score = sequence_letters * 3
sequence_num_score = sequence_num * 3
sequence_symbols_score = sequence_symbols * 3
deductions = letters_only_score +
numbers_only_score +
consecutive_upper_score +
consecutive_lower_score +
consecutive_numbers_score +
sequence_letters_score +
sequence_num_score +
sequence_symbols_score
total_score = ifelse((summed - deductions) < 0 , 0, ifelse((summed - deductions) > 100, 100, (summed - deductions)))
# return(c(paste("Additions\nNumber of characters", character_count_score, sep = ": ") ,
# paste("\nUpper case characters", uppercase_score, sep = ":"),
# paste("\nLower case characters", lowercase_score, sep = ": "),
# paste("\nNumbers", numbers_score, sep = ": "),
# paste("\nSymbols", symbols_score, sep = ": "),
# paste("\nMiddle Numbers and symbols", mid_nums_symbol_score, sep = ": "),
# paste("\nRequirements", requirements_score, sep = ": "),
# paste("\n\nDeductions", sep = ""),
# paste("\nLetters only", letters_only_score, sep = ": "),
# paste("\nNumbers only", numbers_only_score, sep = ": "),
# paste("\nConsecutive upper", consecutive_upper_score, sep = ": "),
# paste("\nConsecutive lower", consecutive_lower_score, sep = ": "),
# paste("\nConsecutive numbers", consecutive_numbers_score, sep = ": "),
# paste("\nSequence letters", sequence_letters_score, sep = ": "),
# paste("\nSequence numbers", sequence_num_score, sep = ": "),
# paste("\nSequence symbols", sequence_symbols_score, sep = ": "),
return(paste("", total_score))
}
#Subset data randomly to get 1500 passwords for strenght measure
df<-sample_n(passDf, 1000)
strenghtdf <- df %>% select(Password) %>% mutate(Strength = passMark(Password))
#ChatterPlot
```
```{r ChatterPlot, echo=FALSE}
word_counts <- passDf %>% count(Password, sort = T) %>% data.frame(stringsAsFactors = FALSE)
chatterplotDf <- word_counts %>% rowwise() %>% mutate(Strength = as.numeric(passMark(Password)))
#write.csv(chatterplotDf,"J:/Personalprojects/Blogsite/sources/content/post/password-etiquette/chattrplotdf.csv")
set.seed(42)
p = head(chatterplotDf,200) %>% ggplot(aes( Strength,n, label = Password)) +
# ggrepel geom, make arrows transparent, color by rank, size by n
geom_text_repel(segment.alpha = 0,aes(colour = Strength, size=n)) +
#Scale the axes/log transform
coord_trans(x="log2", y="log2") +
# ggrepel geom, make arrows transparent, color by rank, size by n
#geom_text(aes(colour=Strength, size=n)) +
# set color gradient, customize legend
scale_color_gradient(low="red2", high="green4",
guide = guide_colourbar(direction = "vertical",
title.position = "top",
frame.colour = "black",
ticks.colour = "white",
label.theme = element_text(colour = "white"),
title.theme = element_text(colour = "white"),
barwidth = .2,
barheight = 20,
)) +
# set word size range & turn off legend
scale_size_continuous(range = c(1, 10),
guide = FALSE) +
ggtitle(paste0("Top 200 passwords from ",
nrow(chatterplotDf), # dynamically include row count
" compromised passwords, by frequency"),
subtitle = "Password frequency (size) ~ password strength (color)") +
labs(y = "Password Frequency (log scale)", x = "Password Srength (log scale)", caption = "Data source: TidyTuesday & Kaggle || Chart:Carlvin~BRD" )
#theme
p2 <- p + theme(panel.background = element_rect(fill = "gray4", color = NA),
panel.grid = element_blank(),
plot.background = element_rect(fill = "gray4", color = NA),
panel.grid.major.x = (element_line(color = "grey40",
size = 0.25,
linetype = "dotted")),
panel.grid.major.y = element_line(color = "grey40",
size = 0.1,
linetype = "dotted"),
# Change axis line
axis.line = element_line(colour = "white"),
axis.text = element_text(colour = "white"),
axis.title = element_text(size = 10,
colour = "white"),
plot.title = element_text(colour = "white",
face = "bold",
size = 11),
plot.subtitle = element_text(face = "italic",
size = 10,
colour = "white"),
plot.caption = element_text(color = "white", face = "italic"),
legend.background = element_blank())
p2
#Save plot
#ggsave(file="ptest.svg", plot=p2, width=10, height=8)
```
On the contrary, we have quite a lot of passwords clustered in between 0-35 on the strength meter. If you look at those passwords, it should be quite obvious that they're weak. This would explain why we have them here as our data. This chart shows most of the aspects of a password that I'll be discussing below.
<br>
#### CREATING SECURE PASSWORDS
#### 1.Don't share passwords
I mean, think about your bank accounts if your excuse is "I don't have any vital information on me anyway". Every time you share passwords across platforms, you make life easier for an attacker. Chances are this shared password is really strong and so you might think, why not? But here's the catch, a wise hacker understands that you might have done this, and all he has to do is gain access to that account you think has no information about you and look for an identity. With this little precious piece of information, it's easier for them to try out the accounts you know are actually important. Bank accounts and online shopping cards could tell you this if they were human.
#### 2.Regularly change your passwords
First of all, if you're too lazy to change a default password that some sites offer newbies, you deserve to be hacked. Start by creating a password of your own. Every now and then, you want to review and update your passwords because hackers are improving on a daily too.
#### 3.Personal information is a plea to be hacked
I remember using my own name on MySpace for a password, good days those ones. Times have changed and so should you. Hackers who guess passwords start basic. Your name will be first, then popular pet names because they saw you pose with your cats on Facebook. To put it in perspective, here's how cute you look to hackers when you use your personal information as passwords. Same goes for using dictionary words for passwords.
</center>
<figure>
![Weak password!](https://github.com/CarlvinJerry/sources/blob/master/static/MyImages/beware.jpg?raw=true)
</figure>
</center>
#### 4. Longer passwords are safer passwords
#### A look Variable Distributions
Here we have another plot showing distribution of our passwords based on length.
```{r passwords length, echo=FALSE, message=FALSE, warning=FALSE}
#plot1 number of characters distribution
data2 <- data.frame(x = freq$Passwordlength, y = freq$Freq)
plot1_CharactersDist <- ggplot(data2, aes(x = data2$x, y = data2$y)) +
geom_segment(aes(x = data2$x, xend = data2$x, y = 0, yend = data2$y),
color = ifelse( data2$x %in% c(6,8), 'orange1','grey69'),
size = ifelse( data2$x %in% c(6,8), 1.3,0.5)) +
geom_point( size = ifelse( data2$x %in% c(6,8), 3,2),
color = ifelse( data2$x %in% c(6,8), 'orange1','grey69')) +
theme_light() +
theme(
plot.background = element_rect(fill = "grey4"),
panel.background = element_rect(fill = 'grey4'),
panel.grid.major.x = element_blank(),
#panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line( size = .012, linetype = "dashed", colour = "grey"), # element_blank(),#
panel.grid.minor.y = element_blank(),# element_line( size = .1, linetype = "dashed"),
#panel.border = element_blank(),
axis.ticks.x = element_blank(),
axis.ticks.y = element_blank(),
plot.title = element_text( hjust = 0.5, vjust = 0, colour = 'grey'),
axis.text.x = element_text(colour = 'grey'),
axis.text.y = element_text(colour = 'grey', angle = 90, hjust = 0.5),
axis.title.x = element_text(colour = 'grey'),
axis.title.y = element_text(colour = 'grey'),
plot.caption = element_text(colour = "white", face = "italic")
) + labs(x = 'Password Length',y ='Frequency', caption = "Data source: TidyTuesday || Chart:Carlvin~BRD" )
plot1_CharactersDist +
annotate("text", x = data2$x[which(data2$x == 6)], y = data2$y[which(data2$x == 6)]*1,
label = paste("6 characters(Minimum recommended)\n (val=",data2$y[which(data2$x == 6)] %>% round(2),")",sep = "" ),
color = "firebrick4",size = 3.3, angle = 0, fontface = "italic", hjust = -0.05) +
annotate("text", x = data2$x[which(data2$x == 7)], y = data2$y[which(data2$x == 7)]*1,
label = paste("7 Characters\n (val=",data2$y[which(data2$x == 7)] %>% round(2),")",sep = "" ),
color = "orchid4", size = 3.3, angle = 0, fontface = "italic", hjust = -0.05) +
annotate("text", x = data2$x[which(data2$x == 8)], y = data2$y[which(data2$x == 8)]*1,
label = paste("8 Characters(Recommended by most websites)\n (val=",data2$y[which(data2$x == 8)] %>% round(2),")",sep = "" ),
color = "palegreen4", size = 3.3, angle = 0, fontface = "italic", hjust = -0.05) +
annotate("text", x = data2$x[which(data2$x == 0:5)], y = data2$y[which(data2$x == 0:5)]*1,
label = paste("\n ",data2$y[which(data2$x == 0:5)] %>% round(2),"",sep = "" ),
color = "firebrick3", size = 3.3, angle = 90, fontface = "italic", hjust = -0.05, vjust = 0.05) +
annotate("text", x = data2$x[which(data2$x %in% c( 9:41))], y = data2$y[which(data2$x %in% c( 9:41))]*1,
label = paste("\n ",data2$y[which(data2$x %in% c( 9:41))] %>% round(2),"",sep = "" ),
color = "palegreen4", size = 3.3, angle = 90, fontface = "plain", hjust = -0.05, vjust = 0.1) +
# annotate("rect", xmin = data2$x[which(data2$x == 6)], xmax = data2$x[which(data2$x == 41)], ymin = 0, ymax = 0,
# fill = 'green', alpha = .5) +
# annotate("rect", xmin = data2$x[which(data2$x == 0)], xmax = data2$x[which(data2$x == 6)], ymin = 0, ymax = 0,
# fill = 'red', alpha = 1) +
# annotate("segment", x = 7, xend = 10, y = 5000,
# yend = 5000, colour = "green", size = 0.5, alpha = 0.25) +
# annotate("segment", x = 3, xend = 7, y = 5000,
# yend = 5000, colour = "red", size = 0.5, alpha = 0.25)
ggtitle("Distribution Of Number Of Characters in Passwords")
#dev.off()
```
<br>
#### 5. 2FA is your friend, use it
Two-factor authentication offers an extra protection to your accounts. In an event someone cracks your password, they will require a security code to gain access to the account. This code is usually sent to a mobile number registered to the said account, one cannot access the account without keying in the right code. Have it enabled if it's available.
</center>
<figure>
![Weak password!](https://github.com/CarlvinJerry/sources/blob/master/static/MyImages/comic.png?raw=true)
<figcaption>Source: [comics](https://xkcd.com/936/)</figcaption>
</figure>
</center>
<br>
> Much as it's possible to do all these on your own, a password manager is your best shot. With a password manager, you only have to worry about one ```master password```. A password manager stores all your sensitive passwords for you, runs frequent checks on them and generates new stronger passwords for you as well, that way your memory won't fail you. Most password managers automatically recognize a website whenever visited and will automatically fill in your credentials. I personally use the free version of [Lastpass](https://www.lastpass.com/), there are many more which are entirely free or on subscription basis. You can read more about them [here](https://en.wikipedia.org/wiki/Password_manager). Stay safe `r emo::ji("smile")`.
<br>
<center>
![](https://fontmeme.com/permalink/190129/8b378e9ce35b7a28dd150c4f1d656807.png)
<center>
<br>