Skip to content

cqx931/chineseVisualEmbeddings

master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
API
 
 
 
 
 
 
 
 
 
 

The Chinese Visual Embeddings

The Chinese Visual Embeddings are trained for the purpose of visual representations for the Chinese written signs. It is part of a larger project Graphein by Qianxun Chen

Training Chinese Visual Embeddings

STEP 0 Setup the environment

  • Download 'NotoSansCJKsc-Regular' and 'NotoSansCJKtc-Regular' from Google Noto Fonts and save them to folder 'fonts'
  • Create a new virtual environment with python 3 virtualenv --python=python3.6 venv
  • Activate the environment activate source venv/bin/activate
  • Install all the packages pip3 install -r requirements.txt

STEP 1 Prepare the data

python preprocess.py -img

STEP 2 Train embeddings

  • An earlier version of the embeddings was trained with python CNN.py
  • The latest version of the embeddings can be trained with python CNN_cuda_multiLabel.py

STEP3 Generate supporting files

  • Generate tsv files to preview embeddings in Embedding Projector python embeddings/VC/generateTSV.py
  • Generate word2vec format embeddings python embeddings/VC/tsvs2txt.py

API

Acknowledgement

dict.csv is created based on the dictionary.txt file from the project, makemehanzi.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages