Merged
Binary file modified docs/source/_static/流程图.jpg
25 changes: 12 additions & 13 deletions docs/source/tutorial/zh/pretrain.rst
@@ -16,19 +16,6 @@
from EduNLP.I2V import get_pretrained_i2v
from EduNLP.Vector import get_pretrained_t2v


Load the model
--------------

Pass the trained model to the I2V module to load it.

Examples:

::

>>> model_path = "../test_model/test_gensim_luna_stem_tf_d2v_256.bin"
>>> i2v = D2V("text","d2v",filepath=model_path, pretrained_t2v = False)

Train the model
---------------

@@ -55,6 +42,18 @@ Examples:
train_vector(sif_items, "../../../data/w2v/gensim_luna_stem_tf_", 10, method="d2v")


Load the model
--------------

Pass the trained model to the I2V module to load it.

Examples:

::

>>> model_path = "../test_model/test_gensim_luna_stem_tf_d2v_256.bin"
>>> i2v = D2V("text","d2v",filepath=model_path, pretrained_t2v = False)

Overview of public models
-------------------------

10 changes: 5 additions & 5 deletions docs/source/tutorial/zh/tokenize.rst
@@ -140,6 +140,11 @@ PureTextTokenizer
'0', '\\right', '\\}', ',', '\\quad', 'B', '=', '\\{', '-', '4', ',', '1', ',', '3', ',', '5', '\\}', ',',
'\\quad', 'A', '\\cap', 'B', '=']

GensimWordTokenizer
+++++++++++++++++++++++

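By default, this tokenizer protects parts of the incoming item such as figures and question blanks by converting them into special symbols, and then tokenizes the text, formulas, tags, and separators. In terms of method, it always analyzes text linearly, while formulas are parsed with an abstract syntax tree (AST). The ``general`` parameter lets the user choose between the two formula strategies: when ``general`` is true, the incoming item is treated as not being in the standard format, and formulas are also analyzed linearly; when ``general`` is false, formulas are parsed with the AST method.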
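
The "protect, then tokenize linearly" behaviour described above can be illustrated with a small standalone sketch. This is not EduNLP's implementation: the helper names ``protect`` and ``linear_tokenize``, the placeholder symbols, and the simplified marker patterns are all assumptions made for illustration, and the AST path used when ``general`` is false is not shown.

```python
import re

# Hypothetical placeholder symbols, chosen for this sketch only.
FIGURE_SYMBOL = "[FIGURE]"
BLANK_SYMBOL = "[BLANK]"

# Simplified patterns for figure and blank markers (assumed syntax).
FIGURE_PATTERN = re.compile(r"\$\\FigureID\{[^}]*\}\$")
BLANK_PATTERN = re.compile(r"\$\\SIFBlank\$")

# Linear token pattern: a protected symbol, a LaTeX command,
# an alphanumeric run, or any other single non-space, non-"$" character.
TOKEN_PATTERN = re.compile(r"\[[A-Z]+\]|\\[A-Za-z]+|[A-Za-z0-9]+|[^\s$]")


def protect(item):
    """Replace figure and blank markers with special symbols."""
    item = FIGURE_PATTERN.sub(" " + FIGURE_SYMBOL + " ", item)
    item = BLANK_PATTERN.sub(" " + BLANK_SYMBOL + " ", item)
    return item


def linear_tokenize(item):
    """Tokenize text and formulas left to right, keeping protected symbols whole."""
    return TOKEN_PATTERN.findall(protect(item))


if __name__ == "__main__":
    sample = r"As shown in $\FigureID{1}$, if $x+y=2$ then $z$ is $\SIFBlank$"
    print(linear_tokenize(sample))
```

In this sketch the formula ``$x+y=2$`` is split symbol by symbol, which corresponds to the linear treatment of formulas; an AST-based parser would instead produce tokens from the formula's tree structure.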

GensimSegTokenizer
++++++++++++++++++++

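@@ -150,11 +155,6 @@ GensimSegTokenizer
* Provides a segmentation-depth option, i.e. the item can be split at sep tags or at tag tags
* By default, inserts a start tag at the head of each item component (e.g. text, formula)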

GensimWordTokenizer
+++++++++++++++++++++++

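By default, this tokenizer protects parts of the incoming item such as figures and question blanks by converting them into special symbols, and then tokenizes the text, formulas, tags, and separators. In terms of method, it always analyzes text linearly, while formulas are parsed with an abstract syntax tree (AST). The ``general`` parameter lets the user choose between the two formula strategies: when ``general`` is true, the incoming item is treated as not being in the standard format, and formulas are also analyzed linearly; when ``general`` is false, formulas are parsed with the AST method.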

Examples
----------
