SimpleChinese

!!! This project is DEPRECATED. See the 2nd edition, SimpleChinese2: https://github.com/chenmingxiang110/SimpleChinese2

Chinese text processing, representation, and visualization.

This package integrates many basic Chinese NLP functions, making Python-based Chinese word processing and information extraction simple and convenient.

Installation

To install SimpleChinese, run this command in your terminal:

$ pip install simplechinese

This is the preferred method to install SimpleChinese, as it will always install the most recent stable release.

If you don't have pip installed, the Python installation guide can walk you through the process.

From sources

The sources for SimpleChinese can be downloaded from the GitHub repo.

You can either clone the public repository:

$ git clone https://github.com/chenmingxiang110/simplechinese

Or download the tarball:

$ curl -OJL https://github.com/chenmingxiang110/simplechinese/tarball/master

Once you have a copy of the source, you can install it with:

$ python setup.py install

Features

  1. Read the data from a CSV file.
import pandas as pd
df = pd.read_csv("test.csv")

  2. Clean the data.
import simplechinese as sc  # alias used by the examples in this README
sc.clean(df)

The clean function does the following:

fillna(): Fill the N/A values in a pandas.DataFrame with empty strings.

toLower(): Convert letters to lowercase.

remove_punctuations(): Remove all punctuation from a string or a pandas.DataFrame.

remove_space(): Remove all spaces from a string or a pandas.DataFrame.
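
To make the four steps listed above concrete, here is a minimal standard-library sketch that applies them to single values. It is not the package's implementation: the helper name clean_text is hypothetical, and treating "punctuation" as every character whose Unicode category starts with "P" is an assumption (symbols such as "$" or "+" would be kept).

```python
import math
import unicodedata

def clean_text(value):
    """Hypothetical stand-in for the four cleaning steps on one cell."""
    # Step 1 (fillna): treat None or a float NaN as an empty string.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    # Step 2 (toLower): lowercase letters; Chinese characters are unaffected.
    text = str(value).lower()
    # Step 3 (remove_punctuations): drop characters in Unicode "P*" categories,
    # which covers both ASCII and full-width punctuation.
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # Step 4 (remove_space): drop all whitespace, including the full-width space.
    return "".join(ch for ch in text if not ch.isspace())

rows = ["Hello, 世界!", None, "你好 ， Python　3。"]
cleaned = [clean_text(r) for r in rows]
print(cleaned)
```

On a DataFrame, the same per-cell logic could be applied column by column; sc.clean(df) presumably handles that for you.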

  3. Extract words from the data.
sc.extract_words(sc.clean(df))
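
Chinese text has no spaces between words, so word extraction needs a segmentation step. This README does not say how extract_words segments text, so the sketch below uses a different, deliberately simple technique (greedy forward maximum matching against a small vocabulary) only to illustrate the idea. The function name, the max_len parameter, and the toy vocabulary are all assumptions.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation against a known vocabulary."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until the vocabulary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words

# Toy vocabulary, made up for this example.
vocab = {"自然", "语言", "自然语言", "处理", "很", "有趣"}
print(forward_max_match("自然语言处理很有趣", vocab))
```

Production segmenters rely on much larger dictionaries and statistical models; this version only shows why cleaned text has to be split before words can be counted or vectorized.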

  4. Vectorization
sc.pca(sc.tfidf(sc.clean(df).iloc[:,0]))
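
The call above chains TF-IDF weighting with a PCA projection. As a rough illustration of the first half only, the sketch below computes plain TF-IDF weights with the standard library. It assumes the documents are already segmented into tokens and uses the unsmoothed form idf = log(N / df); the weighting inside sc.tfidf may differ, and the PCA step is not shown.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns (vocabulary, weight matrix)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: how many documents contain each token.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for tok in vocab:
            # Term frequency is normalized by document length.
            tf = counts[tok] / len(doc) if doc else 0.0
            idf = math.log(n_docs / doc_freq[tok])
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix

# Toy pre-segmented documents, made up for this example.
docs = [["我", "喜欢", "苹果"], ["我", "喜欢", "香蕉"], ["我", "讨厌", "香蕉"]]
vocab, matrix = tfidf(docs)
for row in matrix:
    print([round(w, 3) for w in row])
```

A token that appears in every document (here "我") gets weight zero, while tokens unique to one document get the largest weights. A projection such as PCA would then reduce each row to a few coordinates for plotting.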

  5. Word cloud
sc.wordcloud(sc.clean(df).iloc[:,0], font_path="yahei.ttc")

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
