YouroQNet: Quantum Text Classification with Context Memory

Official implementation of YouroQNet, a toyish quantum text classifier implemented with pyVQNet and pyQPanda

This repo contains code for the final problem of the OriginQ's 2nd CCF "Pilot Cup" contest (Professional Group - Quantum Machine Learning Track).

Oh yes yes child, we've run a hard time struggling.

The final total score is 79.2, and ranking unknown; but why the fuck you owe me a point of 0.8?? 🐱

The code repo for the qualifying stage is here: 2nd "Pilot Cup" qualifying round (第二届“司南杯”初赛)

Quickstart

⚪ install

  • conda create -n q python==3.8 (pyvqnet requires Python 3.8)
  • conda activate q
  • pip install -r requirements.txt

⚪ for contest problem (👈 Follow this to reproduce our contest results!!)

  • python answer.py to preprocess & train (⚠ VERY VERY SLOW!!)
  • python check.py to evaluate

⚪ for quick peek of YouroQNet components

  • python vis_tokenizer.py for adaptive k-gram tokenizer interactive demo
  • python vis_youroqnet.py for YouroQNet interactive demo
    • run_quantum_toy.cmd (👈 run the toy version out of box before all)
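
If you just want a feel for what a frequency-driven k-gram tokenizer does before launching the demo, here is a tiny generic sketch. It is NOT the algorithm in vis_tokenizer.py: the function names `build_vocab` / `tokenize`, the `max_k` and `min_freq` parameters, and the greedy longest-match rule are all assumptions made up for illustration.

```python
from collections import Counter

def build_vocab(texts, max_k=3, min_freq=2):
    """Count all character k-grams (k = 1..max_k) and keep the frequent ones."""
    counts = Counter()
    for text in texts:
        for k in range(1, max_k + 1):
            for i in range(len(text) - k + 1):
                counts[text[i:i + k]] += 1
    # Single characters are always kept so that any seen text stays tokenizable
    return {g for g, c in counts.items() if len(g) == 1 or c >= min_freq}

def tokenize(text, vocab, max_k=3):
    """Greedy longest match: at each position take the longest k-gram found in vocab."""
    tokens, i = [], 0
    while i < len(text):
        for k in range(min(max_k, len(text) - i), 0, -1):
            gram = text[i:i + k]
            # Fall back to a single character when no longer gram matches
            if k == 1 or gram in vocab:
                tokens.append(gram)
                i += k
                break
    return tokens

texts = ['我很开心', '开心的一天', '不开心']
vocab = build_vocab(texts)
print(tokenize('我很开心', vocab))
```

Here the bigram 开心 occurs in all three toy sentences, so it survives the `min_freq` cut and is emitted as one token, while the rarer grams fall back to single characters.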

⚪ for full development

  • download the full dataset simplifyweibo_4_moods, unzip simplifyweibo_4_moods.csv to data folder
  • pip install -r requirements_dev.txt for extra dependencies
  • pushd repo & init_repos.cmd & popd for extra git repos
    • fasttext==0.9.2 requires numpy<1.24 (things might have changed)
  • start_shell.cmd to enter the development command env
    • start_shell.cmd py to get an ipy console for quickly referring to pyvqnet's fucking undocumented-documentation with help()
  • mk_preprocess.cmd for making clean datasets, stats, plots & vocabs etc... (~7 minutes)
  • python vis_project.py to see 3d data projection (you will understand what the fuck this dataset is 👿)
  • run_baseline.cmd to run classic models
  • run_quantum.cmd to run quantum models

⚠ Training may sometimes fail due to a bad random parameter initialization; when the trainset loss does not tend to decay, or the model quickly overfits, just kill it & retry 😅
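
The kill & retry routine can also be automated. Below is a minimal sketch, assuming a training function that takes a seed and returns the per-epoch trainset losses; `toy_train`, `is_stuck`, `train_with_retry` and their thresholds are hypothetical names and values, not part of this repo. It only covers the "loss does not decay" case, not overfitting.

```python
def is_stuck(history, warmup=5, min_drop=0.05):
    """Heuristic: after `warmup` epochs the loss has dropped by less than `min_drop` (relative)."""
    if len(history) <= warmup:
        return False
    return history[warmup] > history[0] * (1.0 - min_drop)

def train_with_retry(train_fn, max_tries=10, base_seed=0):
    """Call train_fn(seed) with fresh seeds until the loss curve is not stuck."""
    for attempt in range(max_tries):
        seed = base_seed + attempt
        history = train_fn(seed)
        if not is_stuck(history):
            return seed, history
    raise RuntimeError(f'no good initialization found in {max_tries} tries')

def toy_train(seed, epochs=20):
    """Hypothetical stand-in for a real run: even seeds never learn, odd seeds decay."""
    decay = 0.0 if seed % 2 == 0 else 0.1
    loss, history = 1.0, []
    for _ in range(epochs):
        loss *= 1.0 - decay
        history.append(loss)
    return history

seed, history = train_with_retry(toy_train)
print(seed, round(history[-1], 4))
```

In a real run you would check the loss during training and abort early instead of waiting for the full run to finish; the sketch checks afterwards only to stay short.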

⚪ core idea & contributions

ℹ See our PPT YouroQNet.pdf for more conceptual understanding 🎉

Dataset

A subset of simplifyweibo_4_moods: 1600 samples for train, 400 samples for test. Class label names: 0 - joy, 1 - angry, 2 - hate, 3 - sad; however, the labels do not correspond very well semantically to the actual samples in the dataset :(

⚠ File naming rule: train.csv is the train set, test.csv is the valid set, and the generated valid.csv might be the real test set for this contest. We use the csv filenames to refer to each split in the code
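
To make the label ids and the filename convention concrete, here is a small sketch that reads a split and counts samples per class. The column names `label` and `text` and the helpers `load_split` / `class_histogram` are assumptions for illustration; check the real csv header before reusing it.

```python
import csv
import io
from collections import Counter

# Label ids and names as listed above
LABEL_NAMES = {0: 'joy', 1: 'angry', 2: 'hate', 3: 'sad'}
# Filename -> role, following the naming rule above
SPLIT_ROLES = {
    'train.csv': 'train set',
    'test.csv': 'valid set',
    'valid.csv': 'possibly the real test set (generated)',
}

def load_split(fp, label_col='label', text_col='text'):
    """Read (label, text) pairs from an open csv file with a header row.
    The default column names are assumptions, not taken from the repo."""
    reader = csv.DictReader(fp)
    return [(int(row[label_col]), row[text_col]) for row in reader]

def class_histogram(samples):
    """Count samples per class name, including classes with zero samples."""
    counts = Counter(label for label, _ in samples)
    return {LABEL_NAMES[k]: counts.get(k, 0) for k in sorted(LABEL_NAMES)}

# In-memory stand-in for open('data/train.csv', encoding='utf-8')
demo = io.StringIO('label,text\n0,好开心\n3,好难过\n0,哈哈\n')
samples = load_split(demo)
print(class_histogram(samples))
```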

Todo List

  • data exploration
    • guess the target test set (valid.txt)
    • vocab & freq stats
    • pca & cluster
    • data relabel (?)
  • data filtering
    • punctuation sanitize
    • stop words removal
    • too short / long sentence
  • feature extraction
    • tf-idf (syntactic)
    • fasttext embedding (semantic)
    • adaptive tokenizer
  • baseline models
    • sklearn
    • vqnet-classical
  • quantum models
    • quantum embedding
    • model route on different length
    • multi to binary clf
    • contrastive learning
    • learn the difference
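
As a rough illustration of the data filtering items (punctuation sanitize, stop words removal, too short / long sentence), here is a stdlib-only sketch. It is not the code in mk_*.py: the stop word list, the length limits, and the names `sanitize` / `remove_stop_chars` / `clean` are all hypothetical, and it works on single characters because no word segmenter is assumed.

```python
import re

# Hypothetical tiny stop list; a real one would be much larger
STOP_CHARS = {'的', '了', '是'}
# Keep CJK ideographs, ASCII letters and digits; any other run becomes one space
NON_TEXT = re.compile(r'[^\u4e00-\u9fffA-Za-z0-9]+')

def sanitize(text):
    """Replace punctuation, emoji and other symbols with spaces."""
    return NON_TEXT.sub(' ', text).strip()

def remove_stop_chars(chars):
    """Drop characters that appear in the stop list."""
    return [ch for ch in chars if ch not in STOP_CHARS]

def clean(text, min_len=2, max_len=64):
    """Return the cleaned sentence, or None if it ends up too short or too long."""
    chars = remove_stop_chars([ch for ch in sanitize(text) if ch != ' '])
    cleaned = ''.join(chars)
    return cleaned if min_len <= len(cleaned) <= max_len else None

print(clean('今天真的好开心!!!'))
print(clean('!!'))
```

Dropping the spaces also glues ASCII words together, which is acceptable for a character-level toy but would need a proper tokenizer for mixed-language text.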

Project layout

# materials
ref/                # thesis for dev
  Question-ML.png   # problem sheet
  YouroQNet.pdf     # solution PPT (YouroQNet)
  init_thesis.cmd   # thesis downloader
repo/               # git repos for research
  init_repos.cmd    # git repo cloner
  update_repos.cmd
data/               # dataset
  simplifyweibo_4_moods.csv   # raw dataset (manually download)
  train|test.csv    # contest dataset
  *_cleaned.csv
  *_tokenized.txt
  cc.zh.300.bin     # FastText pretrained word embedding (auto downloaded)
log/                # outputs
  <analyzer>/       # aka. vocab
    <feature>/      # sklearn models
    <model>/        # vqnet/torch models
tmp/                # generated intermediate results for debug

# contest related
answer.py           # run script for preprocessing & training
check.py            # run script for evaluation

# preprocessors
mk_*.py
mk_preprocess.cmd   # run script for mk_*.py

# models
run_baseline_*.py   # classical experiments
run_baseline.cmd    # run script for run_baseline_*.py
run_quantum.py      # quantum experiments
run_quantum.cmd     # run script for run_quantum.py
run_quantum_toy.cmd # toy QNN for debug and verify

# misc
vis_*.py            # interactive demos or debug scaffolds
utils.py            # common utils
start_shell.cmd     # develop env entry

# doc & lic
README.md
TECH.md             # technical & theoretical stuff
requirements_*.txt
LICENSE

ℹ For the contest, only these files are submitted: answer.py, mk_vocab.py, run_quantum.py, utils.py, README.md; it should be enough to run all quantum parts 😀

References

=> find thesis of related work in ref/init_thesis.cmd
=> find implementations of related work in repo/init_repos.cmd

Citation

If you find this work useful, please give a star ⭐ and cite~ 😃

@misc{kahsolt2023,
  author = {Kahsolt},
  title  = {YouroQNet: Quantum Text Classification with Context Memory},
  howpublished = {\url{https://github.com/Kahsolt/YouroQNet}},
  month  = {May},
  year   = {2023}
}

by Armit 2023/05/03