forked from mushketyk/vocabulary-builder
afcarl/vocabulary-builder
vocabulary-builder is a simple script that collects the words that occur most frequently on a set of crawled web pages. It works as follows: start from a set of web pages, strip the HTML tags, count the occurrences of each word on a page, collect all links from the page, then repeat the same process for every page linked from the first page.

This script can be helpful for collecting a set of words that can later be used as inputs for neural networks.

Used libraries:

* stemming 1.0 - for word stemming
* BeautifulSoup - for HTML parsing
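The process described above (strip tags, count words, collect links, repeat) can be sketched roughly as follows. This is an illustration, not the script's actual code: it swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without extra dependencies, and it leaves out stemming. The names `count_words`, `fetch`, and `max_pages` are hypothetical, and following links breadth-first up to a page limit is an assumption about how far the crawl goes.

```python
import re
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class _PageParser(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def count_words(start_urls, fetch, max_pages=50):
    """Crawl breadth-first from start_urls and count words across pages.

    `fetch` is a hypothetical callable: it takes a URL and returns the
    page's HTML as a string, or None if the page is unavailable.
    """
    counts = Counter()
    seen = set(start_urls)
    queue = deque(dict.fromkeys(start_urls))  # de-duplicated, order kept
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited += 1
        parser = _PageParser()
        parser.feed(html)
        # Lowercase the visible text and keep runs of ASCII letters only.
        text = " ".join(parser.text_parts).lower()
        counts.update(re.findall(r"[a-z]+", text))
        # Queue every link not seen before, resolved against this page.
        for href in parser.links:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return counts


# Usage with an in-memory "web" instead of real HTTP requests.
pages = {
    "http://example.test/a": '<p>Cats chase cats.</p><a href="/b">next</a>',
    "http://example.test/b": "<p>Dogs chase cats.</p><script>var x;</script>",
}
counts = count_words(["http://example.test/a"], pages.get)
print(counts.most_common(2))  # [('cats', 3), ('chase', 2)]
```

Passing `fetch` as a parameter keeps the counting logic separate from networking, so the sketch can be exercised with a dictionary of pages as shown. A real run would supply a function that downloads each URL.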
About
Simple Python tool for selecting the most used words from specific websites.
Languages
- Python 100.0%