Pull Update #1

dqwang122 · 2019-05-28T05:56:08Z

Description: Fork Update

Main reason: Update

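Checklist (check that each item below is complete)

Please feel free to remove inapplicable items for your PR.

The PR title starts with [$CATEGORY] (e.g. [bugfix] for bug fixes, [new] for new features, [test] for test changes, [rm] for removing old code)
Changes are complete (i.e. I finished coding on this PR); open the PR only once the changes are done
All changes have test coverage: the modified parts pass their tests. For changes under fastnlp/fastnlp/, test code must be provided in fastnlp/test/.
Code is well-documented: write the comments properly, since the API docs are extracted from them
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change. If the change affects examples or tutorials, contact a core developer.

Changes: (describe each change item by item)

Added a new model: a CNN for sentence classification, from Yoon Kim's Convolutional Neural Networks for Sentence Classification
Fixed outdated and non-conforming comments in dataset.py; Convert a PyTorch model into a model usable by fastNLP fastnlp/fastNLP#286
Added test code for var-LSTM

Mention: (ask someone to review your PR)

@someone who has modified this file
@a core developer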

- refine & fix Transformer Encoder
- refine & speed up biaffine parser

* move used readers from reproduction to io/dataset_loader.py (API shall not call anything from reproduction/)

* Rename: chinese_word_segment ---> Chinese_word_segmentation
* Rename: pos_tag_model ---> POS_tagging
* Add 4 tests for Batch
* Delete the unused chinese_word_segment/run.py

* Replace assert with raise error in dataset.py
* Add try-except to trainer to catch EarlyStopError
* Optimize trainer code
* Add tests for callbacks

2. FieldArray uses AutoPadder by default; AutoPadder behaves the same as the earlier behavior with no padder
3. To solve the 2D padding problem, EngChar2dPadder is introduced for padding characters
4. Add a padding tutorial.
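The 2D padding problem mentioned here arises when each sentence is a list of words and each word is a list of character ids, so both the word count and the word length vary. The sketch below illustrates the idea in plain Python. The function name `pad_chars_2d` and its signature are hypothetical and are not taken from EngChar2dPadder.

```python
from typing import List


def pad_chars_2d(batch: List[List[List[int]]], pad_val: int = 0) -> List[List[List[int]]]:
    """Pad a batch of sentences (lists of words, each a list of char ids).

    Hypothetical illustration only; not the fastNLP EngChar2dPadder code.
    """
    # Longest sentence (in words) and longest word (in characters) in the batch.
    max_words = max((len(sent) for sent in batch), default=0)
    max_chars = max((len(word) for sent in batch for word in sent), default=0)

    padded = []
    for sent in batch:
        # Pad every word to max_chars characters.
        rows = [word + [pad_val] * (max_chars - len(word)) for word in sent]
        # Pad the sentence with all-padding words up to max_words.
        rows += [[pad_val] * max_chars for _ in range(max_words - len(sent))]
        padded.append(rows)
    return padded


batch = [[[1, 2, 3], [4]], [[5, 6]]]
padded = pad_chars_2d(batch)
```

Every sentence in `padded` has the same number of words and every word the same number of characters, so the result can be converted to a single rectangular array.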

… dev

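* Refactor the dtype detection code in both FieldArray initialization and append, for better code reuse
* Responsibility for type checking falls entirely on FieldArray; DataSet cooperates with it
Tests:
* Tidy up dtype-related test code
* Add tests for all tutorials
Others:
* Complete a full Conll dataset loader
* Upgrade the POS tag model training script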

* Add test: FieldArray initialization

* Add two kinds of Callback
* Improve Trainer's error catching

into dev

* rename callback methods. Use fastai's notation.
* add a new callback method - on_valid_begin

* load pre-trained BERT weights from local binary
* add tests

* Upgrade the parser API and model
* update docs: add new pages for tutorials
* upgrade CWS api download source
* add a new method for dataset field access
* add introduction for bert
* add more unit tests for api/processor
* remove unused test data. Add new test data.

fastNLP V0.3.1

[new] Add ENAS (Efficient Neural Architecture Search)

Add Star-Transformer

If you call masked_fill with ex_mask (0 for pad), it fills the non-padding positions (those whose value in ex_mask is 1) with 0, which leads to bad performance.

@wlhgtc

Fix the bug described in #138. Thanks to @wlhgtc for the bug report and PR.

fix mask bug in star-transformer

RT, another bug

Another bug in Star Transformer

fix for changing torch API

@Transfer

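* 1. CRF adds support for bmeso-type tags 2. Add comments in vocabulary
* Add an error check to BucketSampler
* 1. Fix a bug in ClipGradientCallback; remove the print in LRSchedulerCallback (pbar should be passed in for printing later); 2. Add comments to MLP
* update MLP module
* Add comments to metric; fix a bug in the trainer save process
* Update README.md fix tutorial link
* Add ENAS (Efficient Neural Architecture Search)
* add ignore_type in DataSet.add_field
* AutoPadder will not pad when dtype is None
* add ignore_type in DataSet.apply
* Fix a potential padder bug in fieldarray
* Fix a typo in crf, and places that could cause numerical instability
* Fix a possible bug in CRF
* change two default init arguments of Trainer into None
* Changes to Callbacks:
* Add several given read-only properties to callback
* Set these properties through the manager
* Code optimization to reduce the burden on @Transfer
* Move enas-related code into the automl directory
* Fix a bug in fast_param_mapping
* Trainer now creates the save directory automatically
* Printing a Vocabulary now shows its contents
* Add an iteration method to vocabulary
* Fix the bug of CRF being negative
* add SQuAD metric
* add sigmoid activate function in MLP
- add star transformer model
- add ConllLoader, for all kinds of conll-format files
- add JsonLoader, for json-format files
- add SSTLoader, for SST-2 & SST-5
- change Callback interface
- fix batch multi-process when killed
- add README to list models and their performance
- fix test
- fix callback & tests
- update README
* Fix some bugs; adjust callback
* Prepare to release version 0.4.0
* update readme
* support parallel loss
* Prevent multi-GPU setups from causing the loss to be computed incorrectly
* update advance_tutorial jupyter notebook
* 1. Add new loading functions load_with_vocab() and load_without_vocab to embedding_loader; the main changes from the previous functions are (1) embed_dim no longer needs to be passed in and (2) word2vec vs. glove is detected automatically. 2. Add from_dataset() and index_dataset() to vocabulary, so indexing a dataset no longer takes multiple lines. 3. Add a cache_result() decorator in utils to cache a function's return value. 4. Add an update_every attribute to callback
* 1. DataSet.apply() reports the failing index on error 2. Vocabulary.from_dataset() and index_dataset() report the vocab order on error 3. embedloader skips a line when it encounters malformed data while reading embeddings.
* update attention
* doc tools
* fix some doc errors
* Switch to Chinese comments; add a viterbi decoding method
* Sample version
- add pad sequence for lstm
- add csv, conll, json filereader
- update dataloader
- remove useless dataloader
- fix trainer loss print
- fix tests
- fix test_tutorial
* Add comments
* Test docs
* Local work-in-progress save
* Local work-in-progress save
* Change the order of the docs
- add document
* Local work-in-progress save
* update pooling
* update bert
* update documents in MLP
* update documents in snli
* combine self attention module to attention.py
* update documents on losses.py
* Update the DataSet docs
* update documents on metrics
* 1. Removed the print output in LSTM; 2. Changed use_cuda to device in Trainer and Tester; 3. Expanded the Trainer docs
* Add comments for Trainer
* Improved the docs for trainer, callback, etc.; renamed some code so that it is hidden from the docs
* update char level encoder
* update documents on embedding.py
- update doc
* Add more comments and modify some code
- update doc
- add get_embeddings
* Changed doc configuration options
* Change embedding to be initialized via init_embed
* 1. Add multi-GPU support to Trainer and Tester;
- add test
- fix jsonloader
* Removed the comments tutorial
* Added get_field_names to dataset
* Fix bug
- add Const
- fix bugs
* Revise some comments
- add model runner for easier test models
- add model tests
* Changed the configuration and structure of docs
* Revised a large part of the core docs. TODO: 1. Complete the docs for trainer and tester 2. Look into comment examples and tests
* Review of the comments in core is mostly complete
* Revised the comments in io
* Switch all imports to relative paths
* Switch all imports to relative paths
* small change
* 1. Remove the api/automl installation from the install file 2. A seq_len bug existed in metric 3. A naming error in sampler has been fixed
* Fix bug: compatibility with the CPU version of PyTorch. TODO: similar bugs may exist elsewhere
* Revise the references in the docs
* Replace tqdm.autonotebook with tqdm.auto
- fix batch & vocab
* Uploaded doc files *.rst
* Uploaded doc files and several TODOs
* Discussed and consolidated several modules
* Tests for core and some small changes
* Removed some redundant docs
* update init files
* update const files
* update const files
* Add tests for cnn
* fix a little bug
- update attention
- fix tests
* Improve tests
* Finish the quick-start tutorial
* Updated the docs for renaming sequence_modeling to sequence_labeling
* Re-run apidoc to resolve leftover issues from the rename
* Change doc formatting
* Unify the seq_len_to_mask implementations from different places; they now live in core.utils.seq_len_to_mask
* Added a one-line hint
* Show dataset_loader in the docs
* Note that Dataset.read_csv will be replaced by CSVLoader
* Finish the docs covering Callback and Trainer
* Partially updated index
* Remove redundant print
* Remove the metric used for word segmentation, since it may cause errors
* Change the Chinese names in the docs
* Finished the detailed introduction doc
* ipynb files for the tutorial
* Revised some introduction docs
* Revised the home-page introductions of models and modules
* Added the titlesonly setting
* Changed the titles shown in the module docs
* Revised the opening introductions of core and io
* Revised the opening introductions of modules and models
* Used .. todo:: to hide TODO comments that might be extracted into the docs
* Revised some comments
* delete an old metric in test
* Update the test files for tutorials
* Move features not yet released into the legacy folder
* Removed tests that cannot run
* Update the test file for callback
* Removed outdated tutorials and test files
* Changes to the cache_results parameters
* Update the test files for io; removed some outdated tests
* Fix bug
* Fix the tests that failed in test_utils.py
* Fix a compatibility issue with padsequence in pytorch1.1; modify Trainer's pbar
* 1. Fix a bug in metric; 2. Add metric tests
* add model summary
* Add aliases
* Remove the nested layer in encoder
* Changed the import order in core and what __all__ exposes
* Changed the import order in models and what __all__ exposes
* Renamed files
* Changed __all__ and imports in the modules module
* fix var runn
* Add a clear method to vocab
* Some minor tweaks for PEP8 compliance
* Updated the cache_results example
* 1. Warn about indices potentially being None in callback; 2. DataSet supports indexing with a List
* Fixed a typo
* Updated README.md
* update documents on bert
* update documents on encoder/bert
* Add a fitlog callback to record experiments with fitlog
* typo
- update dataset_loader
* Added a link to the fitlog docs.
* Added docs for DataSet Loader
- add star-transformer reproduction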
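The list above mentions consolidating seq_len_to_mask into core.utils.seq_len_to_mask. As a rough illustration of what such a helper computes, here is a plain-Python sketch that turns a list of sequence lengths into a 0/1 mask. The signature and the list-based return type are assumptions for illustration and do not reproduce the library's implementation.

```python
from typing import List, Optional


def seq_len_to_mask(seq_len: List[int], max_len: Optional[int] = None) -> List[List[int]]:
    """Build a batch x max_len mask: 1 for real tokens, 0 for padding.

    Illustrative sketch only; the signature is an assumption.
    """
    if max_len is None:
        # Default to the longest sequence in the batch.
        max_len = max(seq_len, default=0)
    return [
        [1 if pos < length else 0 for pos in range(max_len)]
        for length in seq_len
    ]


mask = seq_len_to_mask([3, 1, 2])
```

Each row of `mask` marks the valid positions of one sequence, which is the form that padding-aware losses and attention layers typically consume.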

[merge] dpcnn-related changes, yelploader

fix a bug in metrics.py

yunfan and others added 30 commits January 14, 2019 19:13

- fix trainer with validate_every > 0

2e9e6c6

- refine & fix Transformer Encoder
- refine & speed up biaffine parser

remove the gpu_id info when saving

a6dbbe9

code optimization

c4ba75d

* move used readers from reproduction to io/dataset_loader.py (API shall not call anything from reproduction/)

Updates:

1fdaf23

* Rename: chinese_word_segment ---> Chinese_word_segmentation
* Rename: pos_tag_model ---> POS_tagging
* Add 4 tests for Batch
* Delete the unused chinese_word_segment/run.py

Add comments to train; add comments to attention; add transformer-based word segmentation

6a0a1ed

conflict solved

1f50b01

* Add callbacks: EarlyStopCallback

d80d944

* Replace assert with raise error in dataset.py
* Add try-except to trainer to catch EarlyStopError
* Optimize trainer code
* Add tests for callbacks

Update the Padder test cases
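The pattern described in this commit, a callback that raises EarlyStopError and a trainer that catches it with try-except, can be sketched as follows. This is a minimal illustration under assumptions: the `patience` parameter, the `on_valid_end` hook name, and the `train` function are hypothetical and are not the fastNLP code.

```python
class EarlyStopError(Exception):
    """Signals that training should stop early (stand-in for illustration)."""


class EarlyStopCallback:
    """Raise EarlyStopError after `patience` validations without improvement."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best = float("-inf")
        self.wait = 0

    def on_valid_end(self, score: float) -> None:  # hypothetical hook name
        if score > self.best:
            self.best = score
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                raise EarlyStopError("no improvement for %d validations" % self.wait)


def train(scores, callback) -> int:
    """Toy training loop: one validation score per epoch."""
    epochs_run = 0
    try:
        for score in scores:
            epochs_run += 1
            callback.on_valid_end(score)
    except EarlyStopError:
        # The trainer swallows the signal and finishes cleanly.
        pass
    return epochs_run


epochs_run = train([0.5, 0.6, 0.55, 0.58, 0.9], EarlyStopCallback(patience=2))
```

Using an exception as the stop signal keeps the training loop free of per-callback flag checks; the loop only needs one try-except around it.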

3e33a23

Merge branch 'dev' of github.com:choosewhatulike/fastNLP-private into…

73dd35d

… dev

* Add support for list of np.array to FieldArray

b93ca9b

* Add test: FieldArray initialization

Add FieldArray support for list of np.array

864c223

Enhance batch into a multi-process batch

2e3ef52

Reduce the overhead of repeatedly creating processes in batch

d9ac334

* Refactor the POS API to accept words as input

ab953b4

* Add two kinds of Callback
* Improve Trainer's error catching

- fix parser train

eb55856

update reproduction

de856fb

- revert batch

a7f3701

Add LR finder: use the first epoch to find the best lr, then start training from the second epoch

62ea4f7

Update POS API

b14dd58

- batch with multiprocessing

03f49c8

Turn tensorboardX into a callback; remove tensorboardX-related code from trainer

f3cb812

Update trainer to work with syf's multi-process batch

47ec69e

Merge branch 'dev' of https://github.com/choosewhatulike/fastNLP-private

e93c6f0

into dev

add batch device

a37de43

Merge branch 'yyff' into dev

c02980e

remove device in batch

9474ab4

add testing tutorial

d4b4ffa

skip training while n_epoch in trainer is not greater than 0

e0d6a25

FengZiYjun and others added 27 commits January 25, 2019 21:43

update callbacks:

887fc92

* rename callback methods. Use fastai's notation.
* add a new callback method - on_valid_begin

add BERT model

bfaf09d

* load pre-trained BERT weights from local binary
* add tests

Tidy up all dataset loaders and set up unit tests

9865411

Ready for V0.3.1

0c5630b

* Upgrade the parser API and model
* update docs: add new pages for tutorials
* upgrade CWS api download source
* add a new method for dataset field access
* add introduction for bert
* add more unit tests for api/processor
* remove unused test data. Add new test data.

add codecov fix

d1b5ada

update API introduction

b66d7b8

Merge pull request #132 from FengZiYjun/v0.3.1

13faa2b

fastNLP V0.3.1

Add ENAS (Efficient Neural Architecture Search)

efeac2c

Merge pull request #134 from chenkaiyu1997/master

767e797

[new] Add ENAS (Efficient Neural Architecture Search)

- update transformer docs

5241e30

- add star-transformer

7c7f28f

Merge pull request #135 from choosewhatulike/pr

88d4de7

Add Star-Transformer

fix the "masked_fill" bug

8d61cd6

If you call masked_fill with ex_mask (0 for pad), it fills the non-padding positions (those whose value in ex_mask is 1) with 0, which leads to bad performance.
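To make the bug concrete: masked_fill replaces the positions where the mask is true, so passing a mask that is 1 on real tokens wipes out the real tokens and keeps the padding. The sketch below reproduces that behavior with a plain-Python stand-in for the semantics of `torch.Tensor.masked_fill`. The sample values are made up, and inverting the mask is shown as one possible remedy, not necessarily the change made in this commit.

```python
def masked_fill(values, mask, fill_value):
    """Plain-Python stand-in: replace positions where mask is truthy."""
    return [fill_value if m else v for v, m in zip(values, mask)]


scores = [0.7, 0.2, 0.9, 0.4]   # last position is padding
ex_mask = [1, 1, 1, 0]          # 1 = real token, 0 = pad

# Buggy usage: the mask is true on real tokens, so those get zeroed.
buggy = masked_fill(scores, ex_mask, 0.0)

# One possible remedy: invert the mask so only padding is filled.
pad_mask = [not m for m in ex_mask]
fixed = masked_fill(scores, pad_mask, 0.0)
```

In `buggy` only the padding value survives, while `fixed` keeps the three real scores and zeroes the padded position.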

fix mask bug in star-transformer

b7008cb

Fix the bug described in #138. Thanks to @wlhgtc for the bug report and PR.

Merge pull request #139 from fastnlp/choosewhatulike-patch-1

56410c9

fix mask bug in star-transformer

Fix bug in MSA2 (mixed k and v)

28d3f50

RT, another bug

Merge pull request #141 from wlhgtc/master

90d112c

Another bug in Star Transformer

Update README.md

667b312

Update README.md

c344f7d

Update README.md

0f8bed7

Update README.md

cc900a0

Update README.md

b8214f5

fix for changing torch API

ae3356b

Merge pull request #145 from fastnlp/choosewhatulike-patch-1

863a99f

fix for changing torch API

Put the documentation link at the top

927d386

Updated the latest documentation

8dec821

dqwang122 merged commit d31c6d0 into dqwang122:master May 28, 2019

dqwang122 pushed a commit that referenced this pull request Jul 8, 2019

Merge pull request #1 from choosewhatulike/master

86ba01d

[merge] dpcnn-related changes, yelploader

dqwang122 pushed a commit that referenced this pull request Sep 17, 2019

Merge pull request #1 from fastnlp/master

cc2735c

fix a bug in metrics.py
