A Python script that extracts and cleans text from a SogouCS database

Jaswers/SogouCS-Extractor


SogouCS-Extractor

Introduction

The project uses the SogouCS corpus as a source of documents for two purposes: as training data and as a source of data to be annotated.

SogouCS is available from the SogouCS database download page.

The SogouCS extractor tool generates plain text from a SogouCS database.

Description

extractor.py is a Python script that extracts and cleans text from a SogouCS database.
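To make "extracts and cleans" concrete, here is a minimal sketch of what such a step might look like. It is not the code in extractor.py. It assumes, without confirmation from this README, that each record is wrapped in `<doc>...</doc>` with the article body inside a `<content>...</content>` element, and that "cleaning" means collapsing whitespace and dropping empty bodies. The names `extract_texts`, `CONTENT_RE`, and `WHITESPACE_RE` are hypothetical.

```python
import re
from typing import List

# Assumed record layout (not confirmed by this README): the article body
# sits inside a <content>...</content> element within each <doc> record.
CONTENT_RE = re.compile(r"<content>(.*?)</content>", re.DOTALL)
WHITESPACE_RE = re.compile(r"\s+")


def extract_texts(raw: str) -> List[str]:
    """Return the cleaned, non-empty <content> bodies found in raw."""
    texts = []
    for body in CONTENT_RE.findall(raw):
        # Collapse runs of whitespace (including newlines) to single spaces.
        cleaned = WHITESPACE_RE.sub(" ", body).strip()
        if cleaned:
            texts.append(cleaned)
    return texts


sample = (
    "<doc>\n<url>http://example.com/1</url>\n"
    "<content>  first   article\ntext </content>\n</doc>\n"
    "<doc>\n<url>http://example.com/2</url>\n<content></content>\n</doc>\n"
)
print(extract_texts(sample))
```

The sketch works on an already-decoded string; reading the files from disk and choosing their text encoding is left out because the README does not specify either.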

Usage:

 extractor.py [options]

Options:

 -i          : input file directory
 -o          : output file directory
 --help      : display this help and exit
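The option list above could be modelled with `argparse` roughly as follows. This is a hypothetical reconstruction for illustration: the real extractor.py may parse its arguments differently, and the destination names `input_dir` and `output_dir`, the choice to make both options required, and the example paths are assumptions. `argparse` supplies `--help` automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented interface; the real
    # extractor.py may define its options differently.
    parser = argparse.ArgumentParser(
        prog="extractor.py",
        description="Extract and clean text from a SogouCS database.",
    )
    parser.add_argument("-i", dest="input_dir", required=True,
                        help="input file directory")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="output file directory")
    return parser


# Equivalent to running: extractor.py -i corpus/ -o text/
args = build_parser().parse_args(["-i", "corpus/", "-o", "text/"])
print(args.input_dir, args.output_dir)
```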
