briko-org/BrikoCorpus

BrikoCorpus

Briko Corpus comprises content extracted from news websites such as SINA.COM, LSSDJT.COM, and TMTPOST.COM. The raw content is filtered and sorted to improve its quality for machine-learning purposes.

Briko Corpus can be downloaded here

The filtering script ApplyFilter.py is provided for your reference.

You can also use GetSample.py to extract random samples from the corpus for testing purposes.

GetSample.py Usage:

python GetSample.py -num_samples 100 -corpus_path FinalContent.txt -sample_path SampleCorpus.txt
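
For readers who want to see what this kind of sampling can look like, here is a minimal sketch. It is not the code of GetSample.py, whose internals are not shown here. It assumes the corpus is a UTF-8 text file with one document per line, and it uses reservoir sampling so the whole file never has to be loaded into memory. The function name `sample_lines` and the `seed` parameter are hypothetical and exist only for this illustration.

```python
import random


def sample_lines(corpus_path, sample_path, num_samples, seed=None):
    """Write up to num_samples randomly chosen lines of corpus_path to sample_path.

    Assumption: one document per line, UTF-8 encoded. This is an illustrative
    sketch, not the logic of GetSample.py.
    Returns the number of lines actually written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(corpus_path, encoding="utf-8") as corpus:
        for index, line in enumerate(corpus):
            line = line.rstrip("\n")
            if index < num_samples:
                # Fill the reservoir with the first num_samples lines.
                reservoir.append(line)
            else:
                # Keep the new line with probability num_samples / (index + 1),
                # overwriting a uniformly chosen reservoir slot.
                slot = rng.randint(0, index)
                if slot < num_samples:
                    reservoir[slot] = line
    with open(sample_path, "w", encoding="utf-8") as out:
        for line in reservoir:
            out.write(line + "\n")
    return len(reservoir)
```

Under those assumptions, calling `sample_lines("FinalContent.txt", "SampleCorpus.txt", 100)` would mirror the command above. If the corpus has fewer lines than requested, every line is written.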
