This is a repository for a tool to export and visualize the keystroke data generated by GiacomoLaw's Keylogger tool.
This tool ingests the keystroke.log file of the Keylogger tool and produces two things:
- Frequency of single commands typed (e.g. [return], [left-cmd], [del]).
- Frequency of single non-commands (keystrokes & words) typed (e.g. t, 13, not).
- Frequency of bi-grams (2-keystroke combinations, e.g. hi+there, [left-cmd]+[tab], ma+[return]).
- Frequency of tri-grams (3-keystroke combinations, e.g. [left-cmd]+[left-shift]+v, [left-cmd]+[tab]+[tab], i+love+you).
^These four JSON files can be used for data analysis.
- Frequency of bi-grams (like the bi-gram file above, but) in the "graph" data format used by the D3 Sankey.
^This JSON is used to create the visualization depicted above.
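The frequency counts listed above can be sketched roughly as follows. This is a minimal illustration, not the code in log-parser.js: the function names `tokenize` and `countNgrams` are hypothetical, and the sketch assumes that commands appear in the log as bracketed tokens (e.g. [return]) and that everything else is a whitespace-separated run of characters.

```javascript
// Hypothetical helpers; the log format handled here is an assumption.

// Split raw log text into tokens. Bracketed commands such as "[return]"
// become { value: "return", type: "command" }; any other run of
// non-space characters becomes a "character" token.
function tokenize(logText) {
  const tokens = [];
  const pattern = /\[([^\]\s]+)\]|([^\s\[\]]+)/g;
  let match;
  while ((match = pattern.exec(logText)) !== null) {
    if (match[1] !== undefined) {
      tokens.push({ value: match[1], type: 'command' });
    } else {
      tokens.push({ value: match[2], type: 'character' });
    }
  }
  return tokens;
}

// Count every run of n consecutive tokens and return the entries
// sorted from most to least frequent.
function countNgrams(tokens, n) {
  const counts = new Map();
  for (let i = 0; i + n <= tokens.length; i++) {
    const values = tokens.slice(i, i + n).map((t) => t.value);
    const key = JSON.stringify(values);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([key, frequency]) => ({ value: JSON.parse(key), frequency }))
    .sort((a, b) => b.frequency - a.frequency);
}

const tokens = tokenize('hi there[left-cmd][tab][left-cmd][tab]');
const bigrams = countNgrams(tokens, 2);
console.log(JSON.stringify(bigrams));
```

Calling `countNgrams(tokens, 3)` on the same tokens would give the tri-gram counts in the same shape.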
An interactive visualization of bi-grams made with Evan Galloway's D3 Sankey Diagram. You can see my version of it here.
If you have data from GiacomoLaw's Keylogger Tool in a keystroke.log file, you can use this tool by following these steps:
- Clone or download this repository:
  git clone https://github.com/calebfergie/keylogger-parsing.git
- Install node dependencies:
  cd keylogger-parsing && npm install
- Add/copy your keystroke.log file into the data folder (public/data).
- Start the node server:
  node bin/www
You should see the following in your terminal:
updated words JSON
updated bigrams JSON
updated commands JSON
updated trigrams JSON
updated bigrams-sankey JSON
finished running log parser
...and the data folder should now have files (commands.json, words.json, bigrams.json, trigrams.json) updated with your data.
If you navigate to localhost:5000 in your browser, the Sankey diagram should appear. It is slightly interactive; try dragging the nodes up and down.
The JSON files mentioned above are formatted as in the examples below:
- Frequency of single commands typed (e.g. [return], [left-cmd], [del]):
[{"value":"left-cmd","type":"command","frequency":62706},
{"value":"del","type":"command","frequency":33336},
{"value":"left-shift","type":"command","frequency":27040}]
- Frequency of single non-commands (keystrokes & words) typed (e.g. t, 13, not):
[{"value":"if","type":"character","frequency":97},
{"value":"can","type":"character","frequency":96},
{"value":"do","type":"character","frequency":95}]
- Frequency of bi-grams (2-keystroke combinations, e.g. hi+there, [left-cmd]+[tab], ma+[return]):
[{"value":["left-cmd","v"],"frequency":2496},
{"value":["left-cmd","c"],"frequency":2388},
{"value":["left-option","left-shift"],"frequency":2206}]
- Frequency of tri-grams (3-keystroke combinations, e.g. [left-cmd]+[left-shift]+v, [left-cmd]+[tab]+[tab], i+love+you):
[{"value":["return","return","return"],"frequency":718},
{"value":["left","left-option","left-shift"],"frequency":713},
{"value":["s","left-cmd","left-cmd"],"frequency":712}]
- Frequency of bigrams in the "graph" data format used by the D3 Sankey:
{
"nodes":[
{"name":"left-cmd","type":"source"},
{"name":"down","type":"target"}
...],
"links":[
{"source":14,"target":3,"value":527},
{"source":14,"target":41,"value":526}
...]
}
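To show how the bi-gram list relates to the "graph" format, here is a rough sketch of one way to do the conversion. The function name `toSankeyGraph` is hypothetical and this is not necessarily how log-parser.js does it; it only assumes what the examples above show, namely that each node has a name and a type ("source" or "target") and that each link refers to nodes by their position in the nodes array.

```javascript
// Hypothetical sketch: convert bi-gram entries into a { nodes, links } graph.
function toSankeyGraph(bigrams) {
  const nodes = [];
  const indexByKey = new Map(); // "type:name" -> position in the nodes array

  // Return the index of a node, adding it to the nodes array on first use.
  function nodeIndex(name, type) {
    const key = type + ':' + name;
    if (!indexByKey.has(key)) {
      indexByKey.set(key, nodes.length);
      nodes.push({ name, type });
    }
    return indexByKey.get(key);
  }

  // The first keystroke of each bi-gram is the source, the second the target.
  const links = bigrams.map(({ value: [first, second], frequency }) => ({
    source: nodeIndex(first, 'source'),
    target: nodeIndex(second, 'target'),
    value: frequency,
  }));

  return { nodes, links };
}

const graph = toSankeyGraph([
  { value: ['left-cmd', 'v'], frequency: 2496 },
  { value: ['left-cmd', 'c'], frequency: 2388 },
]);
console.log(JSON.stringify(graph, null, 1));
```

Keeping the lookup in a Map keyed by both type and name means the same key can appear once as a source and once as a target without the two being confused.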
This tool is written in Node.js. The code that processes keystroke.log is stored in log-parser.js in the repository.
The bigrams.json and trigrams.json files don't include all bi-grams and tri-grams. They are limited to results that appear with a certain frequency (or more). You can change this threshold by changing the value of freqFilter in log-parser.js, set to 250 in the example below:
var freqFilter = 250; //minimum number of occurrences to be included in the output
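As an illustration of what this threshold does, the sketch below applies it to a few entries. The helper name `filterByFrequency` and the sample entries are made up for the example; only the freqFilter variable and the "this frequency or more" rule come from the description above.

```javascript
var freqFilter = 250; // minimum number of occurrences to be included in the output

// Hypothetical helper: keep only entries seen at least minFrequency times.
function filterByFrequency(entries, minFrequency) {
  return entries.filter((entry) => entry.frequency >= minFrequency);
}

// Made-up sample entries, in the same shape as bigrams.json.
const sample = [
  { value: ['left-cmd', 'v'], frequency: 2496 },
  { value: ['left-cmd', 'tab'], frequency: 250 },
  { value: ['hi', 'there'], frequency: 249 },
];

const kept = filterByFrequency(sample, freqFilter);
console.log(kept.map((entry) => entry.value.join('+')));
```

An entry that occurs exactly 250 times is kept, while one that occurs 249 times is dropped.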
The app.js file runs the log-parser.js file and then serves the D3 visualization through an Express server.
The code for the D3 tool is adapted from Evan Galloway's D3 Sankey Diagram, stored in the file galloway-sankey.js.
The Keylogger records both the press and release of some commands (e.g. [shift], [cmd], [ctrl]). For example, the keystroke combo Command+Tab would actually appear as ["left-cmd", "tab", "left-cmd"]. Here's a video demonstrating what I mean.
This 'double-dipping' effect makes it harder to analyze this information, as there is a superfluous keystroke injected between other real ones.
I put in a feature request for this on GitHub, so we'll see if any updates occur. Otherwise, the log-parser.js file will need to be updated to handle this.
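One possible way to handle this inside the parser is sketched below. It is not part of log-parser.js. It assumes that the affected keys always alternate between press and release, so every second occurrence of such a key can be treated as the release and dropped. The function name `dropReleaseEvents` is hypothetical, and the key list only contains names that appear elsewhere in this README; the real list may be longer.

```javascript
// Assumed set of keys that the Keylogger logs on both press and release.
const PRESS_RELEASE_KEYS = new Set(['left-cmd', 'left-shift', 'left-option']);

// Hypothetical helper: drop the token logged when a held key is released.
function dropReleaseEvents(tokens) {
  const held = new Set(); // keys currently considered pressed
  const result = [];
  for (const token of tokens) {
    if (!PRESS_RELEASE_KEYS.has(token)) {
      result.push(token); // ordinary key: always keep
    } else if (held.has(token)) {
      held.delete(token); // second occurrence: treat as release and skip
    } else {
      held.add(token); // first occurrence: treat as press and keep
      result.push(token);
    }
  }
  return result;
}

const cleaned = dropReleaseEvents(['left-cmd', 'tab', 'left-cmd']);
console.log(cleaned);
```

This is a heuristic: if the log starts or stops while a key is held down, the press/release pairing goes out of step and real presses could be dropped.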
Words that are also array methods (e.g. push, pop, shift) are not processed correctly for the D3 data viz by log-parser.js. For my personal data set, I added the following alterations to handle it for source and target nodes:
if (source.match(/^(push|find|keys|some|map|shift|every|pop|unshift)$/)) {
  source = source + "_";
}
and...
if (target.match(/^(push|find|keys|some|map|shift|every|pop|unshift)$/)) {
  target = target + "_";
}
If you receive an error that reads "could not find X of type Y in the nodes array - this will create an error in the sankey diagram", add the word X to the list of words above.
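Instead of extending the word list by hand, the same workaround could be generalized as in the sketch below. The helper name `escapeNodeName` is hypothetical. It assumes the problem words are exactly those that are also property names on arrays, which the description above suggests but does not confirm.

```javascript
// Hypothetical helper: append "_" to any word that is also the name of
// an array property or method (push, pop, shift, map, length, ...).
function escapeNodeName(name) {
  return name in Array.prototype ? name + '_' : name;
}

const examples = ['push', 'shift', 'hello', 'left-cmd'].map(escapeNodeName);
console.log(examples);
```

It would be applied to both the source and the target before they are added to the nodes array, in place of the two regular-expression checks.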
This tool was made in order to perform analysis on my own keystroke data. Use at your own risk!
It was done in an effort to understand my conscious and subconscious decisions, as part of NYU ITP's Rest of You class.
- Feb. 4: Installed this keylogger on my mac.
- Feb. 23: Created first log-parser.js file.
- March 23: Added sankey data visualization & cleaned up tool
I was mostly interested in which keystrokes I typed in combination: keyboard shortcuts (e.g. ctrl+c, ctrl+v, ctrl+tab) and repeated key presses (e.g. tab+tab+tab, delete+delete+delete).
You can read more about it here.