1. Big long string
2. Tokenize all the words
⁃ Store not just in a big row but as [Token: [Index]]
3. For each adjacent pair of tokens, look at all later occurrence indices to see if those tokens are grouped there too.
⁃ If you find a second occurrence of pairing, create a new "grouped" token for all of these pair-occurrences
⁃ Repeat this step until no more groups are created (i.e. adding a third, fourth, fifth word to the token)
4. Sort by token word count
5. Output json
charliewilliams/ChartingCallbacks
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published