Skip to content

Produces the map of words and frequency of their use in the unlimited size text file

License

Notifications You must be signed in to change notification settings

blin-1/large-text-analyzer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Large Text Analyzer

This ES6 module runs a user-defined function on the async stream of words and returns a map of aggregated results, in this example a word frequency count

IMPORTANT - This uses ES6 Module Loader - You will need a very recent version of node and below v13 run it with --node-experimental-features

Example of usage


import lta from 'large-text-analyzer'

// OPTIONALLY override the word delimiters - this RegEx is the default one:

lta.delimiters = /\s|[^a-zA-Z]|[0-9]/

// OPTIONALLY override the processWord(word) function // to be executed for each word. // NOTE "this.map" - this is the result map, // that gets returned at the end of the stream. // It can be used to store all kinds of results // this here is the default - counts unique // word occurences

lta.processWord = function (word){ word = word.toLowerCase() let count = this.map.get(word) count = count? ++count : 1 this.map.set(word,count) }

// Define async block: async function processFile (fileName) { let vocabulary = await lta.processWord(fileName); console.log(Vocabulary consists of ${vocabulary.size} words) console.log(vocabulary); }

// And execute it. processFile('./test/data/testdata.txt')


That will produce the map of words and frequency of their use in the input text file
SAMPLE OUTPUT:
Vocabulary consists of 2 words
Map {
'a' => 2,
'after' => 1
etc... }

About

Produces the map of words and frequency of their use in the unlimited size text file

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages