Merged
260 commits
aebc06b
Create seg.rst
BAOOOOOM Aug 13, 2021
1d87c63
Create parse.rst
BAOOOOOM Aug 13, 2021
2d3ff98
Create pretrain.rst
BAOOOOOM Aug 13, 2021
b9753bf
Create tokenize.rst
BAOOOOOM Aug 13, 2021
f4cc298
Create vectorization.rst
BAOOOOOM Aug 13, 2021
5eff405
Merge pull request #32 from BAOOOOOM/docs_dev
tswsxk Aug 13, 2021
0352b3d
Merge branch 'docs_dev' of https://github.com/bigdata-ustc/EduNLP int…
tswsxk Aug 13, 2021
1adf452
Merge pull request #1 from BAOOOOOM/docs_dev
BAOOOOOM Aug 13, 2021
8b1bc9c
[docs] update i2v example
tswsxk Aug 13, 2021
4ebefeb
Create vectorization.rst
BAOOOOOM Aug 13, 2021
fa1e823
Create vectorization.rst
BAOOOOOM Aug 13, 2021
ea9047a
Create pretrain.rst
BAOOOOOM Aug 13, 2021
9a3f52b
Merge pull request #33 from BAOOOOOM/master
tswsxk Aug 13, 2021
ad4b3ea
Create pretrain.rst
BAOOOOOM Aug 14, 2021
7d93d9a
Create pretrain.rst
BAOOOOOM Aug 14, 2021
4f6f708
Create pretrain.rst
BAOOOOOM Aug 14, 2021
ce78dd6
Create pretrain.rst
BAOOOOOM Aug 14, 2021
a044718
Create parse.rst
BAOOOOOM Aug 14, 2021
5dd8363
Create parse.rst
BAOOOOOM Aug 14, 2021
3d7481f
Create parse.rst
BAOOOOOM Aug 14, 2021
aa96071
Create formula.ipynb
BAOOOOOM Aug 14, 2021
84cb565
Create tree.ipynb
BAOOOOOM Aug 14, 2021
52d7635
Create d2v_bow_tfidf.ipynb
BAOOOOOM Aug 14, 2021
532be3f
Create d2v_general.ipynb
BAOOOOOM Aug 14, 2021
a8069fd
Create d2v_stem_tf.ipynb
BAOOOOOM Aug 14, 2021
8ed186c
Create w2v_stem_text.ipynb
BAOOOOOM Aug 14, 2021
57b0f6a
Create w2v_stem_tf.ipynb
BAOOOOOM Aug 14, 2021
8f414e2
Create prepare_dataset.ipynb
BAOOOOOM Aug 14, 2021
c6d67ed
Create d2v_bow_tfidf.ipynb
BAOOOOOM Aug 14, 2021
0313aaf
Create d2v_general.ipynb
BAOOOOOM Aug 14, 2021
d04887a
Create d2v_stem_tf.ipynb
BAOOOOOM Aug 14, 2021
5b2378e
Create w2v_stem_text.ipynb
BAOOOOOM Aug 14, 2021
93dddcc
Create w2v_stem_tf.ipynb
BAOOOOOM Aug 14, 2021
45572df
Create d2v.ipynb
BAOOOOOM Aug 14, 2021
2a6ee28
Create d2v_d1.ipynb
BAOOOOOM Aug 14, 2021
3968d8e
Create d2v_d2.ipynb
BAOOOOOM Aug 14, 2021
c26cc11
Create sif.ipynb
BAOOOOOM Aug 14, 2021
5428156
Create pretrain.rst
BAOOOOOM Aug 14, 2021
2ae34b4
Create parse.rst
BAOOOOOM Aug 14, 2021
44ff66d
Create vectorization.rst
BAOOOOOM Aug 14, 2021
8d1c2a8
Create index.rst
BAOOOOOM Aug 15, 2021
fcf1b60
Create AUTHORS.md
BAOOOOOM Aug 16, 2021
b85dabf
Merge pull request #35 from BAOOOOOM/master
tswsxk Aug 16, 2021
859c9cf
Create conf.py
BAOOOOOM Aug 17, 2021
542ab3f
Create conf.py
BAOOOOOM Aug 18, 2021
c7aa181
Merge pull request #40 from BAOOOOOM/master
tswsxk Aug 18, 2021
daa3788
Create seg.rst
BAOOOOOM Aug 18, 2021
8cd0439
Create index.rst
BAOOOOOM Aug 18, 2021
b24ba15
Create seg.rst
BAOOOOOM Aug 18, 2021
16014ef
Add files via upload
BAOOOOOM Aug 18, 2021
f43c346
Create index.rst
BAOOOOOM Aug 18, 2021
987e3b4
Create index.rst
BAOOOOOM Aug 18, 2021
a35404a
Add files via upload
BAOOOOOM Aug 18, 2021
431b91f
Create index.rst
BAOOOOOM Aug 18, 2021
f83f7ba
Create index.rst
BAOOOOOM Aug 18, 2021
91eb697
Create index.rst
BAOOOOOM Aug 18, 2021
600e7d7
Create index.rst
BAOOOOOM Aug 18, 2021
f5d16e6
Create conf.py
BAOOOOOM Aug 18, 2021
2191828
Add files via upload
BAOOOOOM Aug 18, 2021
1b374f8
Create conf.py
BAOOOOOM Aug 18, 2021
a4bad95
Add files via upload
BAOOOOOM Aug 18, 2021
28fd082
Create conf.py
BAOOOOOM Aug 18, 2021
d4cd37c
Create conf.py
BAOOOOOM Aug 19, 2021
f559be5
Add files via upload
BAOOOOOM Aug 19, 2021
211a61e
Delete examples/seg directory
BAOOOOOM Aug 19, 2021
d59799d
Add files via upload
BAOOOOOM Aug 19, 2021
d62851d
Add files via upload
BAOOOOOM Aug 19, 2021
f8b5171
Create index.rst
BAOOOOOM Aug 19, 2021
206c45c
Create tokenize.rst
BAOOOOOM Aug 19, 2021
e331a5a
Create index.rst
BAOOOOOM Aug 19, 2021
9df6eaf
Add files via upload
BAOOOOOM Aug 19, 2021
6f65436
Create index.rst
BAOOOOOM Aug 19, 2021
144b873
Create index.rst
BAOOOOOM Aug 19, 2021
4ff7f3f
Create index.rst
BAOOOOOM Aug 19, 2021
7398073
Add files via upload
BAOOOOOM Aug 19, 2021
a180021
Create TextTokenizer.rst
BAOOOOOM Aug 19, 2021
8885fb4
Create TextTokenizer.rst
BAOOOOOM Aug 19, 2021
1d27eee
Create TextTokenizer.rst
BAOOOOOM Aug 19, 2021
49eb426
Create TextTokenizer.rst
BAOOOOOM Aug 19, 2021
94a3ff6
Create TextTokenizer.rst
BAOOOOOM Aug 19, 2021
4ce0296
Create GensimSegTokenizer.rst
BAOOOOOM Aug 19, 2021
c11976f
Create GensimWordTokenizer.rst
BAOOOOOM Aug 19, 2021
390a6c6
Create GensimWordTokenizer.rst
BAOOOOOM Aug 19, 2021
dc10bb7
Create GensimSegTokenizer.rst
BAOOOOOM Aug 19, 2021
fa23d18
Create GensimSegTokenizer.rst
BAOOOOOM Aug 19, 2021
729cd39
Delete data.png
BAOOOOOM Aug 19, 2021
f133cd0
Delete formula.png
BAOOOOOM Aug 19, 2021
3b78217
Delete sif_addition.png
BAOOOOOM Aug 19, 2021
3269ca5
Delete tokenizer.png
BAOOOOOM Aug 19, 2021
acf077c
Create tokenize.rst
BAOOOOOM Aug 19, 2021
4f68ad7
Create tokenize.rst
BAOOOOOM Aug 19, 2021
45a9ba3
Create tokenize.rst
BAOOOOOM Aug 19, 2021
ab7a25f
Create tokenize.rst
BAOOOOOM Aug 19, 2021
37a8398
Create tokenize.rst
BAOOOOOM Aug 19, 2021
4470be7
Create formula.ipynb
BAOOOOOM Aug 19, 2021
b8ec4b5
Create tokenize.rst
BAOOOOOM Aug 19, 2021
60c4ccd
Add files via upload
BAOOOOOM Aug 19, 2021
f52895c
Create pretrain.rst
BAOOOOOM Aug 19, 2021
8c0b7b9
Create pretrain.rst
BAOOOOOM Aug 19, 2021
e0cae1e
Create tokenize.rst
BAOOOOOM Aug 19, 2021
617757f
Create seg.rst
BAOOOOOM Aug 19, 2021
4797654
Create conf.py
BAOOOOOM Aug 19, 2021
aaad4d7
Create parse.rst
BAOOOOOM Aug 19, 2021
94fa10b
Create parse.rst
BAOOOOOM Aug 19, 2021
421e1a9
Create parse.rst
BAOOOOOM Aug 19, 2021
05be6b2
Create pretrain.rst
BAOOOOOM Aug 19, 2021
4d2dfaa
Merge pull request #41 from BAOOOOOM/master
tswsxk Aug 20, 2021
24c4365
Create parse.rst
BAOOOOOM Aug 20, 2021
8b1777b
Create pretrain.rst
BAOOOOOM Aug 20, 2021
4b3c178
Merge pull request #42 from BAOOOOOM/master
tswsxk Aug 20, 2021
0b2dabd
Add files via upload
BAOOOOOM Aug 20, 2021
2e278f4
Create 文本语法结构解析.rst
BAOOOOOM Aug 20, 2021
4f323ee
Create 公式语法结构解析.rst
BAOOOOOM Aug 20, 2021
2e936fe
Create parse.rst
BAOOOOOM Aug 20, 2021
648e67d
Create parse.rst
BAOOOOOM Aug 20, 2021
cdd3082
Create 文本语法结构解析.rst
BAOOOOOM Aug 20, 2021
ccb4677
Create 文本语法结构解析.rst
BAOOOOOM Aug 20, 2021
e6fe68d
Create 公式语法结构解析.rst
BAOOOOOM Aug 20, 2021
1dc6b23
Create 公式语法结构解析.rst
BAOOOOOM Aug 20, 2021
c8d92f7
Create 公式语法结构解析.rst
BAOOOOOM Aug 20, 2021
7c83e20
Create 文本语法结构解析.rst
BAOOOOOM Aug 20, 2021
cf1ef1e
Create 公式语法结构解析.rst
BAOOOOOM Aug 20, 2021
2ef99c3
Create 文本语法结构解析.rst
BAOOOOOM Aug 20, 2021
4edafdf
Create 公式语法结构解析.rst
BAOOOOOM Aug 20, 2021
f3ce5e5
Add files via upload
BAOOOOOM Aug 20, 2021
c103035
Create 语义成分分解.rst
BAOOOOOM Aug 20, 2021
60b6820
Create 结构成分分解.rst
BAOOOOOM Aug 20, 2021
6b86fbc
Delete tokenizer.ipynb
BAOOOOOM Aug 20, 2021
85b9abd
Add files via upload
BAOOOOOM Aug 20, 2021
571d7be
Delete examples/Tokenizer directory
BAOOOOOM Aug 20, 2021
d5cc0f1
Add files via upload
BAOOOOOM Aug 20, 2021
e70e865
Delete examples/formula directory
BAOOOOOM Aug 20, 2021
d2f3eee
Add files via upload
BAOOOOOM Aug 20, 2021
db55d06
Add files via upload
BAOOOOOM Aug 20, 2021
c0bc61e
Add files via upload
BAOOOOOM Aug 20, 2021
2a8cb43
Add files via upload
BAOOOOOM Aug 20, 2021
4c8246b
Merge pull request #3 from test2021413/docs_dev
BAOOOOOM Aug 20, 2021
b53ba42
Create 结构成分分解.rst
BAOOOOOM Aug 20, 2021
f3f3bf2
Create 语义成分分解.rst
BAOOOOOM Aug 20, 2021
f45b415
Create 结构成分分解.rst
BAOOOOOM Aug 20, 2021
35bb4ed
Create pretrain.rst
BAOOOOOM Aug 20, 2021
ea5de51
Create seg.rst
BAOOOOOM Aug 20, 2021
7caf3a0
Create parse.rst
BAOOOOOM Aug 20, 2021
0b0c4fa
Create seg.rst
BAOOOOOM Aug 20, 2021
cf828b9
Create 公式语法结构解析.rst
BAOOOOOM Aug 20, 2021
f4e0819
Create 文本语法结构解析.rst
BAOOOOOM Aug 20, 2021
026d670
Create parse.rst
BAOOOOOM Aug 20, 2021
262a11c
Create seg.rst
BAOOOOOM Aug 20, 2021
48f45ae
Create 结构成分分解.rst
BAOOOOOM Aug 20, 2021
cc6981a
Create 语义成分分解.rst
BAOOOOOM Aug 20, 2021
6ad9a96
Create 结构成分分解.rst
BAOOOOOM Aug 20, 2021
a7c5837
Create 公式语法结构解析.rst
BAOOOOOM Aug 20, 2021
a5e8b24
Create 文本语法结构解析.rst
BAOOOOOM Aug 20, 2021
32d1316
Create 结构成分分解.rst
BAOOOOOM Aug 20, 2021
0fa9720
Create 语义成分分解.rst
BAOOOOOM Aug 20, 2021
393d2f3
Create 语义成分分解.rst
BAOOOOOM Aug 20, 2021
b17ac0b
Create i2v.ipynb
BAOOOOOM Aug 20, 2021
4e50287
Delete examples/i2v directory
BAOOOOOM Aug 20, 2021
f14520e
Delete examples/t2v directory
BAOOOOOM Aug 20, 2021
1be5415
Add files via upload
BAOOOOOM Aug 20, 2021
a5dae70
Add files via upload
BAOOOOOM Aug 20, 2021
faf5251
Create conf.py
BAOOOOOM Aug 20, 2021
590db46
Add files via upload
BAOOOOOM Aug 20, 2021
26a38d1
Create conf.py
BAOOOOOM Aug 20, 2021
b669ff8
Delete sif_addition.png
BAOOOOOM Aug 21, 2021
dbba074
Add files via upload
BAOOOOOM Aug 21, 2021
6cf3dea
Add files via upload
BAOOOOOM Aug 21, 2021
def858b
Delete seg.png
BAOOOOOM Aug 21, 2021
8ccc06e
Add files via upload
BAOOOOOM Aug 21, 2021
82d58cf
Create parse.ipynb
BAOOOOOM Aug 21, 2021
de36978
Delete data.png
BAOOOOOM Aug 21, 2021
7db6bbf
Add files via upload
BAOOOOOM Aug 21, 2021
06c73a3
Add files via upload
BAOOOOOM Aug 21, 2021
bde48bc
Delete data.png
BAOOOOOM Aug 21, 2021
7ea98d0
Add files via upload
BAOOOOOM Aug 21, 2021
eafbd9d
Create conf.py
BAOOOOOM Aug 21, 2021
8225171
Add files via upload
BAOOOOOM Aug 21, 2021
146f6b2
Delete data.png
BAOOOOOM Aug 21, 2021
418e48e
Add files via upload
BAOOOOOM Aug 21, 2021
cfc70cb
Create conf.py
BAOOOOOM Aug 21, 2021
d5d0530
Create conf.py
BAOOOOOM Aug 21, 2021
d43dc1d
Add files via upload
BAOOOOOM Aug 21, 2021
62451e5
Create conf.py
BAOOOOOM Aug 21, 2021
92afc44
Add files via upload
BAOOOOOM Aug 21, 2021
289ed10
Merge pull request #44 from BAOOOOOM/master
BAOOOOOM Aug 21, 2021
a73846d
Create conf.py
BAOOOOOM Aug 21, 2021
f159165
Add files via upload
BAOOOOOM Aug 21, 2021
d890696
Delete sif.png
BAOOOOOM Aug 21, 2021
eff6864
Add files via upload
BAOOOOOM Aug 21, 2021
276d431
Create conf.py
BAOOOOOM Aug 21, 2021
39c8ce6
Create conf.py
BAOOOOOM Aug 21, 2021
7127957
Create index.rst
BAOOOOOM Aug 21, 2021
edc8206
Create index.rst
BAOOOOOM Aug 21, 2021
b119b30
Add files via upload
BAOOOOOM Aug 21, 2021
949f9a2
Create parse.rst
BAOOOOOM Aug 21, 2021
f03b7e8
Create parse.rst
BAOOOOOM Aug 21, 2021
9121855
Create seg.rst
BAOOOOOM Aug 21, 2021
6c4e681
Create sif.rst
BAOOOOOM Aug 21, 2021
11e8627
Create sif.rst
BAOOOOOM Aug 21, 2021
03c2e6c
Create sif.rst
BAOOOOOM Aug 21, 2021
6ef1df4
Create sif.rst
BAOOOOOM Aug 21, 2021
275f0a0
Create sif.rst
BAOOOOOM Aug 21, 2021
6d6e029
Create sif.rst
BAOOOOOM Aug 21, 2021
1d4eb0d
Create sif.rst
BAOOOOOM Aug 21, 2021
ba1febb
Create sif.rst
BAOOOOOM Aug 21, 2021
ffb6e29
Create sif.rst
BAOOOOOM Aug 21, 2021
57c9811
Create sif.rst
BAOOOOOM Aug 21, 2021
d40a561
Create sif.rst
BAOOOOOM Aug 21, 2021
3c44eb4
Create sif.rst
BAOOOOOM Aug 21, 2021
9ee3a34
Create sif.rst
BAOOOOOM Aug 21, 2021
8456024
Create sif.rst
BAOOOOOM Aug 21, 2021
20aa133
Create sif.rst
BAOOOOOM Aug 21, 2021
64898b6
Create sif.rst
BAOOOOOM Aug 21, 2021
2668cf2
Create sif.rst
BAOOOOOM Aug 21, 2021
98988b9
Add files via upload
BAOOOOOM Aug 21, 2021
90e120b
Create 分词.rst
BAOOOOOM Aug 21, 2021
6533fd4
Create 分句.rst
BAOOOOOM Aug 21, 2021
ec57a31
Create 令牌化.rst
BAOOOOOM Aug 21, 2021
f964054
Create 令牌化.rst
BAOOOOOM Aug 21, 2021
824fe92
Create 令牌化.rst
BAOOOOOM Aug 21, 2021
fad5a72
Create 令牌化.rst
BAOOOOOM Aug 21, 2021
415b264
Create tokenize.rst
BAOOOOOM Aug 21, 2021
8011eb4
Create tokenize.rst
BAOOOOOM Aug 21, 2021
7f7d05f
Create tokenize.rst
BAOOOOOM Aug 21, 2021
7677c94
Add files via upload
BAOOOOOM Aug 21, 2021
b4fc768
Create 不使用预训练模型.txt
BAOOOOOM Aug 21, 2021
5feda19
Create 不使用预训练模型.txt
BAOOOOOM Aug 21, 2021
a32a7cd
Create 使用预训练模型.txt
BAOOOOOM Aug 21, 2021
e819706
Delete docs/source/tutorial/zh/vectorization directory
BAOOOOOM Aug 21, 2021
96e2799
Add files via upload
BAOOOOOM Aug 21, 2021
b96e1e2
Create vectorization.rst
BAOOOOOM Aug 21, 2021
a9e444e
Add files via upload
BAOOOOOM Aug 21, 2021
68bddc0
Create start.rst
BAOOOOOM Aug 21, 2021
2f9cc0c
Create loading.rst
BAOOOOOM Aug 21, 2021
b6101ab
Create pub.rst
BAOOOOOM Aug 21, 2021
c392a82
Create pretrain.rst
BAOOOOOM Aug 21, 2021
1a363f6
Create vectorization.rst
BAOOOOOM Aug 21, 2021
b42145f
Create vectorization.rst
BAOOOOOM Aug 21, 2021
1146873
Delete docs/source/tutorial/zh/vectorization directory
BAOOOOOM Aug 21, 2021
22bcdbb
Add files via upload
BAOOOOOM Aug 21, 2021
a2af455
Create 不使用预训练模型.rst
BAOOOOOM Aug 21, 2021
958a0e2
Create 使用预训练模型.rst
BAOOOOOM Aug 21, 2021
d325d16
Merge pull request #47 from BAOOOOOM/master
BAOOOOOM Aug 22, 2021
c77672b
Create vectorization.rst
BAOOOOOM Aug 22, 2021
0dc1564
Create seg.rst
BAOOOOOM Aug 22, 2021
9288fcb
Create seg.rst
BAOOOOOM Aug 22, 2021
8be67d6
Create seg.rst
BAOOOOOM Aug 22, 2021
d293cd2
Create seg.rst
BAOOOOOM Aug 22, 2021
d3a40e3
Create start.rst
BAOOOOOM Aug 22, 2021
af1a924
Merge pull request #48 from BAOOOOOM/master
BAOOOOOM Aug 22, 2021
3 changes: 2 additions & 1 deletion AUTHORS.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,5 +12,6 @@

[Longhu Qin](https://github.com/KenelmQLH)

[Meikai Bao](https://github.com/BAOOOOOM)

The stared contributors are the corresponding authors.
The stared contributors are the corresponding authors.
8 changes: 8 additions & 0 deletions README.md
Expand Up @@ -41,6 +41,14 @@ pip install EduNLP
pip install EduNLP[full]
```

### Usage

```python
from EduNLP import get_pretrained_i2v
i2v = get_pretrained_i2v("d2v_all_256", "./model")
item_vector, token_vector = i2v(["the content of item 1", "the content of item 2"])
```

### Tutorial

For more details, please refer to the full documentation ([latest](https://edunlp.readthedocs.io/en/latest) | [stable](https://edunlp.readthedocs.io/en/stable)).
Binary file added asset/_static/d2v.png
Binary file added asset/_static/d2v_bow_tfidf.png
Binary file added asset/_static/d2v_general.png
Binary file added asset/_static/d2v_stem_tf.png
Binary file added asset/_static/data.png
Binary file added asset/_static/formula.png
Binary file added asset/_static/i2v.png
Binary file added asset/_static/parse.png
Binary file added asset/_static/prepare_dataset.jpg
Binary file added asset/_static/seg.png
Binary file added asset/_static/sif.png
Binary file added asset/_static/sif_addition.png
Binary file added asset/_static/tokenizer.png
Binary file added asset/_static/w2v_stem_text.png
Binary file added asset/_static/w2v_stem_tf.png
3 changes: 2 additions & 1 deletion docs/requirements.txt
Expand Up @@ -2,4 +2,5 @@ sphinx
sphinx_rtd_theme
sphinx_toggleprompt
sphinx-gallery>=0.6
nbsphinx
nbsphinx
m2r2
28 changes: 24 additions & 4 deletions docs/source/conf.py
Expand Up @@ -46,14 +46,34 @@ def copy_tree(src, tar):
'sphinx.ext.mathjax',
'sphinx_toggleprompt',
'nbsphinx',
'sphinx_gallery.load_style'
'sphinx_gallery.load_style',
'm2r2',
'IPython.sphinxext.ipython_console_highlighting',
'IPython.sphinxext.ipython_directive'
]

# extension variables setting
# nbsphinx

nbsphinx_thumbnails = {
'build/blitz/sif/sif': '_static/item_figure.png',
'build/blitz/sif/sif': '_static/sif.png',
'build/blitz/sif/sif_addition': '_static/sif_addition.png',
'build/blitz/utils/data': '_static/data.png',
'build/blitz/formula/formula': '_static/formula.png',
'build/blitz/seg/seg': '_static/seg.png',
'build/blitz/parse/parse': '_static/parse.png',
'build/blitz/tokenizer/tokenizer': '_static/tokenizer.png',
'build/blitz/vectorization/i2v': '_static/i2v.png',
'build/blitz/pretrain/prepare_dataset': '_static/prepare_dataset.jpg',
'build/blitz/pretrain/gensim/d2v_bow_tfidf': '_static/d2v_bow_tfidf.png',
'build/blitz/pretrain/gensim/d2v_general': '_static/d2v_general.png',
'build/blitz/pretrain/gensim/d2v_stem_tf': '_static/d2v_stem_tf.png',
'build/blitz/pretrain/gensim/w2v_stem_text': '_static/w2v_stem_text.png',
'build/blitz/pretrain/gensim/w2v_stem_tf': '_static/w2v_stem_tf.png',
'build/blitz/pretrain/seg_token/d2v': '_static/d2v.png',
'build/blitz/pretrain/seg_token/d2v_d1': '_static/d2v_d1.png',
'build/blitz/pretrain/seg_token/d2v_d2': '_static/d2v_d2.png',
}

# Add any paths that contain templates here, relative to this directory.
Expand All @@ -62,7 +82,7 @@ def copy_tree(src, tar):
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = ['.rst', '.md', '.ipynb']
source_suffix = ['.rst', '.md']
# source_suffix = '.rst'

# The language for content autogenerated by Sphinx. Refer to documentation
Expand All @@ -75,7 +95,7 @@ def copy_tree(src, tar):
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build']
exclude_patterns = ['_build','**.ipynb_checkpoints']

# -- Options for HTML output -------------------------------------------------

10 changes: 10 additions & 0 deletions docs/source/index.rst
Expand Up @@ -84,6 +84,16 @@ But you can also install from source:

Getting Started
------------------

One basic usage of EduNLP is to convert an item into a vector, i.e.,

.. code-block:: python

from EduNLP import get_pretrained_i2v
i2v = get_pretrained_i2v("d2v_all_256", "./model")
item_vector, token_vector = i2v(["the content of item 1", "the content of item 2"])


For absolute beginners, start with the :doc:`Tutorial to EduNLP <tutorial/en/index>` :doc:`(中文版) <tutorial/zh/index>`.
It covers the basic concepts of EduNLP and
a step-by-step guide to training, loading, and using the language models.
133 changes: 129 additions & 4 deletions docs/source/tutorial/zh/index.rst
Expand Up @@ -11,13 +11,138 @@
tokenize
vectorization


Examples
--------

Standard Item Format
^^^^^^^^^^^^^^^^^^^^

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: sif_gallery
:glob:

Code for beginners to learn how to use SIF4Sci <../../build/blitz/sif/sif>
Code for beginners to learn how to use sif_addition <../../build/blitz/sif/sif_addition>


Component Segmentation
^^^^^^^^^^^^^^^^^^^^^^

Semantic Component Segmentation
###############################

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: dict2str4sif_gallery
:glob:

Code for beginners to learn how to use dict2str4sif <../../build/blitz/utils/data.ipynb>


Structural Component Segmentation
#################################

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: seg_gallery
:glob:

Code for beginners to learn how to use seg <../../build/blitz/seg/seg.ipynb>


Syntax Parsing
^^^^^^^^^^^^^^

Text Syntax Structure Parsing
#############################

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: parse_gallery
:glob:

Code for beginners to learn how to use parse <../../build/blitz/parse/parse.ipynb>


Formula Syntax Structure Parsing
################################

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: formula_gallery
:glob:

Code for beginners to learn how to use Formula <../../build/blitz/formula/formula.ipynb>


Tokenization
^^^^^^^^^^^^

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: tokenizer_gallery
:glob:

Code for beginners to learn how to use Tokenizer <../../build/blitz/tokenizer/tokenizer.ipynb>


Vectorization
^^^^^^^^^^^^^

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: vectorization_gallery
:glob:

Code for beginners to learn how to use i2v <../../build/blitz/vectorization/i2v.ipynb>


Pre-training
^^^^^^^^^^^^

Obtaining the Dataset
#####################

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: rst1-gallery
:glob:

prepare_dataset <../../build/blitz/pretrain/prepare_dataset.ipynb>


gensim d2v Examples
####################

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: rst2-gallery
:glob:

d2v_general <../../build/blitz/pretrain/gensim/d2v_general.ipynb>
d2v_bow_tfidf <../../build/blitz/pretrain/gensim/d2v_bow_tfidf.ipynb>
d2v_stem_tf <../../build/blitz/pretrain/gensim/d2v_stem_tf.ipynb>


gensim w2v Examples
####################

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: rst3-gallery
:glob:

w2v_stem_text <../../build/blitz/pretrain/gensim/w2v_stem_text.ipynb>
w2v_stem_tf <../../build/blitz/pretrain/gensim/w2v_stem_tf.ipynb>


seg_token Examples
####################

.. nbgallery::
:caption: This is a thumbnail gallery:
:name: gallery
:name: rst4-gallery
:glob:
:reversed:

../../build/blitz/sif/sif
d2v.ipynb <../../build/blitz/pretrain/seg_token/d2v.ipynb>
28 changes: 27 additions & 1 deletion docs/source/tutorial/zh/parse.rst
Expand Up @@ -6,5 +6,31 @@
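* Text syntax structure parsing
* Formula syntax structure parsing

Formula syntax structure parsing
Its purposes are:


1. Replace the brackets in multiple-choice questions and the underscores in fill-in-the-blank questions with special markers, and wrap characters and formulas in $$, so that the item can be split accurately by type at the $ symbols;

2. Check whether the current item is legal and report the error type.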
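
To illustrate why wrapping formulas in ``$`` makes an item easy to split by type, here is a minimal sketch. The function name ``split_by_dollar`` and the ``(kind, content)`` tuple layout are assumptions made for illustration; they are not part of EduNLP, whose own segmentation is more elaborate.

```python
from typing import List, Tuple


def split_by_dollar(item: str) -> List[Tuple[str, str]]:
    """Split an item into ("text", ...) and ("formula", ...) segments at $ signs.

    Hypothetical helper for illustration only, not an EduNLP function.
    """
    parts = item.split("$")
    # A well-formed item has an even number of $ signs, hence an odd number of parts.
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced $ delimiters")
    # Even positions lie outside $...$ (text); odd positions lie inside (formula).
    return [
        ("formula" if index % 2 else "text", part)
        for index, part in enumerate(parts)
        if part  # drop empty segments, e.g. before a leading $
    ]
```

An odd number of ``$`` signs is rejected up front, which is one simple way an "illegal item" could be detected before any further processing.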

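Processing Details
--------------------

1. Match English letters and digits outside formulas. Only letters and digits between two Chinese characters are corrected; all other matches are treated as formulas that were not entered in valid LaTeX syntax.

2. Match "( )" brackets (both English and Chinese forms), i.e. brackets that are empty or contain only spaces, and replace them with $\\SIFChoice$.

3. Match underscores: replace runs of consecutive underscores, or underscores interspersed with spaces, with $\\SIFBlank$.

4. Match LaTeX formulas: mainly check that each LaTeX formula is complete and parseable, and issue a warning when Chinese characters appear inside LaTeX.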
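
The bracket and underscore replacements can be sketched with two regular expressions. This is a simplified illustration, not EduNLP's implementation: the name ``mark_placeholders`` and both patterns are assumptions, and the sketch does not try to skip underscores that appear inside existing LaTeX formulas (for example subscripts).

```python
import re

# Assumed patterns for illustration; EduNLP's own expressions may differ.
# Empty or whitespace-only brackets, in ASCII or full-width form.
CHOICE_PATTERN = re.compile(r"[(（]\s*[)）]")
# One underscore, optionally followed by more underscores separated by spaces.
BLANK_PATTERN = re.compile(r"_(?:\s*_)*")


def mark_placeholders(text: str) -> str:
    """Replace empty brackets and underscore runs with SIF placeholder formulas."""
    # Function replacements avoid backslash-escape processing in re.sub.
    text = CHOICE_PATTERN.sub(lambda match: "$\\SIFChoice$", text)
    text = BLANK_PATTERN.sub(lambda match: "$\\SIFBlank$", text)
    return text
```

Brackets that contain content, such as ``f(x)``, are left untouched because the pattern only allows whitespace between the opening and closing bracket.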

Learning Roadmap
--------------------

.. toctree::
:maxdepth: 1
:titlesonly:

Text Syntax Structure Parsing <parse/文本语法结构解析>
Formula Syntax Structure Parsing <parse/公式语法结构解析>

61 changes: 61 additions & 0 deletions docs/source/tutorial/zh/parse/公式语法结构解析.rst
@@ -0,0 +1,61 @@
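Formula Syntax Structure Parsing
--------------------------------

This functionality is implemented mainly by the EduNLP.Formula module. It checks whether an input formula is legal and converts legal formulas into abstract syntax trees (ASTs). In practice this module usually serves as an intermediate processing step: calling the corresponding model selects this module's parameters automatically, so it generally needs no special attention.

Main Contents
+++++++++++++++

1. Formula: checks a single input formula. If the input is a str, it is processed with the ast method; otherwise an error is raised. It also provides the variable_standardization parameter: when this parameter is True, variable standardization is applied, i.e. the same variable receives the same variable number.

2. FormulaGroup: use this interface to pass in a set of formulas. The result is an AST forest, in which each tree has the same structure as in Formula.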


Examples:

::

>>> text = '支持公式如$\\frac{y}{x}$,$\\SIFBlank$,$\\FigureID{1}$,不支持公式如$\\frac{ \\dddot y}{x}$'
>>> text_parser = Parser(text)
>>> text_parser.description_list()
>>> text_parser.fomula_illegal_flag
1

::

>>> f = Formula("x")
>>> f
<Formula: x>
>>> f.ast
[{'val': {'id': 0, 'type': 'mathord', 'text': 'x', 'role': None}, 'structure': {'bro': [None, None], 'child': None, 'father': None, 'forest': None}}]
>>> f.elements
[{'id': 0, 'type': 'mathord', 'text': 'x', 'role': None}]
>>> f.variable_standardization(inplace=True)
<Formula: x>
>>> f.elements
[{'id': 0, 'type': 'mathord', 'text': 'x', 'role': None, 'var': 0}]

::

>>> fg = FormulaGroup(["x + y", "y + x", "z + x"])
>>> fg
<FormulaGroup: <Formula: x + y>;<Formula: y + x>;<Formula: z + x>>
>>> fg = FormulaGroup(["x + y", Formula("y + x"), "z + x"])
>>> fg
<FormulaGroup: <Formula: x + y>;<Formula: y + x>;<Formula: z + x>>
>>> fg = FormulaGroup(["x", Formula("y"), "x"])
>>> fg.elements
[{'id': 0, 'type': 'mathord', 'text': 'x', 'role': None}, {'id': 1, 'type': 'mathord', 'text': 'y', 'role': None},\
{'id': 2, 'type': 'mathord', 'text': 'x', 'role': None}]
>>> fg = FormulaGroup(["x", Formula("y"), "x"], variable_standardization=True)
>>> fg.elements
[{'id': 0, 'type': 'mathord', 'text': 'x', 'role': None, 'var': 0}, {'id': 1, 'type': 'mathord', 'text': 'y', 'role': None, 'var': 1}, {'id': 2, 'type': 'mathord', 'text': 'x', 'role': None, 'var': 0}]
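
The idea behind variable standardization, "the same variable receives the same variable number", can be sketched on plain element dictionaries shaped like the outputs above. The function ``standardize_variables`` is hypothetical, and treating only ``mathord`` elements as variables is an assumption drawn from the examples, not a statement about how EduNLP implements it.

```python
from typing import Dict, List


def standardize_variables(elements: List[dict]) -> List[dict]:
    """Return copies of the elements with a shared 'var' number per variable text.

    Hypothetical sketch: only elements of type 'mathord' are treated as variables.
    """
    var_ids: Dict[str, int] = {}
    result = []
    for element in elements:
        element = dict(element)  # copy so the caller's data is not modified
        if element["type"] == "mathord":
            # First occurrence gets the next free number; repeats reuse it.
            element["var"] = var_ids.setdefault(element["text"], len(var_ids))
        result.append(element)
    return result
```

Applied to three elements with the texts ``x``, ``y`` and ``x``, the sketch assigns the numbers 0, 1 and 0, which matches the ``var`` fields in the FormulaGroup output shown above.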

Detailed Demonstrations
+++++++++++++++++++++++

.. toctree::
:titlesonly:

Tree processing results <../../../build/blitz/formula/tree.ipynb>
Formula parsing examples <../../../build/blitz/formula/formula.ipynb>
39 changes: 39 additions & 0 deletions docs/source/tutorial/zh/parse/文本语法结构解析.rst
@@ -0,0 +1,39 @@
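Text Syntax Structure Parsing
-----------------------------

This part is implemented mainly by the EduNLP.SIF.Parse module. Its main function is to extract letters, digits, and similar content from the text and convert them into a standard format.

Main Workflow
+++++++++++++++

1. Classify the input text by character type, in the following order:

* is_chinese: matches Chinese characters [\u4e00-\u9fa5]

* is_alphabet: matches English letters outside formulas. Only letters between two Chinese characters are corrected (wrapped in $$); all other matches are treated as formulas that were not entered in valid LaTeX syntax

* is_number: matches digits outside formulas. Only digits between two Chinese characters are corrected (wrapped in $$); all other matches are treated as formulas that were not entered in valid LaTeX syntax

2. Match LaTeX formulas

* If Chinese characters appear inside LaTeX, a warning is printed, and only once

* The _is_formula_legal function checks that the LaTeX formula is complete and parseable, and reports an error for illegal formulas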
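
One simple ingredient of a completeness check is verifying that braces are balanced. The sketch below shows only that single idea; the function ``braces_balanced`` is hypothetical and is not what ``_is_formula_legal`` does internally, since the source does not describe its implementation.

```python
def braces_balanced(formula: str) -> bool:
    """Return True if every unescaped '{' in a LaTeX string has a matching '}'.

    Hypothetical helper for illustration; a real legality check would also
    need to confirm that the formula can actually be parsed.
    """
    depth = 0
    escaped = False
    for char in formula:
        if escaped:
            # The character after a backslash (e.g. in \{ or \frac) is not a brace to count.
            escaped = False
            continue
        if char == "\\":
            escaped = True
        elif char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth < 0:  # a closing brace appeared before its opening brace
                return False
    return depth == 0
```

A check like this catches truncated input such as ``\frac{y}{x`` cheaply, before handing the formula to a full parser.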

Examples:

::

>>> text = '生产某种零件的A工厂25名工人的日加工零件数_ _'
>>> text_parser = Parser(text)
>>> text_parser.description_list()
>>> text_parser.text
'生产某种零件的$A$工厂$25$名工人的日加工零件数$\\SIFBlank$'
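
The correction applied to letters and digits between two Chinese characters can be approximated with a single regular expression. This is a hedged sketch: ``wrap_inline_symbols`` and its pattern are assumptions for illustration, and the sketch does not track whether a match already sits inside a longer LaTeX formula.

```python
import re

# Character range taken from the is_chinese description above.
CHINESE = r"[\u4e00-\u9fa5]"
# A run of ASCII letters, or a run of digits, with a Chinese character on each side.
BETWEEN_CHINESE = re.compile(rf"(?<={CHINESE})([A-Za-z]+|[0-9]+)(?={CHINESE})")


def wrap_inline_symbols(text: str) -> str:
    """Wrap letter or digit runs that sit between two Chinese characters in $...$."""
    return BETWEEN_CHINESE.sub(lambda match: f"${match.group(1)}$", text)
```

Lookbehind and lookahead assertions are used so that the neighbouring Chinese characters are required but not consumed, which lets adjacent matches such as ``A`` and ``25`` in the example text both be wrapped in one pass.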

Detailed Demonstrations
+++++++++++++++++++++++

.. toctree::
:titlesonly:

Text syntax structure parsing examples <../../../build/blitz/parse/parse.ipynb>