Toolkit for compiling a text corpus from various sources
corpus_builder
.gitignore
LICENSE
README.md
requirements.txt
scrapy.cfg
setup.py


banglakit/corpus-builder

A sufficiently large body of text is essential for NLP tasks. This tool is designed solely for building large collections of text documents from the web.

A practical understanding of Python and Scrapy is required to use the tool.

Example Usage

scrapy crawl bangladesh_pratidin -a start_date='2016-06-01' -a end_date='2016-06-05' -o test3.csv
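
In the command above, `-a` is Scrapy's standard way of passing arguments to a spider (they arrive as keyword arguments to the spider's constructor), and `-o` writes the scraped items to the given file. How the `bangladesh_pratidin` spider uses `start_date` and `end_date` internally is not documented here. The sketch below shows one plausible approach: expanding the range into one archive URL per day. The inclusive end date, the helper names `daterange` and `archive_urls`, and the URL pattern are all assumptions made for illustration, not the tool's actual code.

```python
from datetime import datetime, timedelta

# Hypothetical archive URL pattern, for illustration only.
ARCHIVE_URL = "https://example.com/archive/{:%Y/%m/%d}"


def daterange(start_date, end_date):
    """Yield each date from start_date to end_date (YYYY-MM-DD strings).

    Assumption: the end date is treated as inclusive.
    """
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def archive_urls(start_date, end_date):
    """Build one (hypothetical) archive URL per day in the range."""
    return [ARCHIVE_URL.format(day) for day in daterange(start_date, end_date)]


if __name__ == "__main__":
    for url in archive_urls("2016-06-01", "2016-06-05"):
        print(url)
```

A spider written this way would typically accept `start_date` and `end_date` in its `__init__` and request each generated URL in turn.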