add modifty

CLUEbenchmark · Feb 22, 2020 · 228c0b8 · 228c0b8
1 parent cff526f
commit 228c0b8

Showing 4 changed files with 271 additions and 4 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,139 @@
+# Contributing guidelines
+
+[Chinese version](https://github.com/CLUEbenchmark/CLUE/tree/master/CONTRIBUTING_ZH.md)
+
+## Pull Request Rules
+
+Before sending a pull request, make sure you have gone through this checklist.
+
+- Read the contributing guidelines.
+- Check that your changes are consistent with the guidelines.
+- Check that your changes follow the [Coding Style].
+- Run the unit tests.
+
+## How to become a contributor and submit your own code
+
+We welcome any contribution, whatever it is, even a typo fix. Please raise questions via an issue or email us privately. We care about documentation and code equally, so go ahead as long as you follow our rules.
+
+### Where should I start?
+
+----
+
+If this is your first time working with CLUE, we suggest starting with open issues or our minimal tasks.
+
+If you are already familiar with this project and comfortable with NLP-related problems, please tell us what you want to do via an issue or email, and follow the workflow below.
+
+WELCOME!
+
+### **GitHub Workflow**
+
+---
+We use "master" as our main branch, so it is not wise to develop new features directly on it. We encourage you to create your own branch and, once your work is complete, open a PR for your contribution.
+
+Here's the workflow:
+
+1. Fork the repository to your GitHub account.
+2. Clone your fork to your local machine.
+3. Create a new branch and make your changes on it.
+4. Push your code to YOUR fork.
+5. Create a PR.
+
+If your change is large, please make sure there is a corresponding issue in our main repository. When you open the PR, describe it using the template below.
+
+
+```
+Describe what this PR does / why we need it
+Does this pull request fix one issue?
+Describe how you did it
+Describe how to verify it
+Special notes for reviews 
+[copied from https://github.com/alibaba/Sentinel/blob/master/.github/PULL_REQUEST_TEMPLATE.md]
+```
+After you create your PR, we will assign one or two reviewers to it.
+
+
+### Create Issue/PR
+
+---
+We use GitHub Issues and Pull Requests to manage and track problems.
+
+If you find a bug or typo, or have new ideas for this project, please create an issue.
+
+If you want to contribute code, please follow the workflow above. If you plan a large change or want to restructure the project, PLEASE create an issue or email us (chineseGLUE@163.com) first.
+
+
+## Security Problems
+
+If you find a serious security bug, please contact us privately via chineseGLUE@163.com. PLEASE DO NOT disclose any security problem through ANY public channel, including issues. Thank you very much.
+
+### Contribution guidelines and standards
+
+Before sending your pull request for [review](https://github.com/CLUEbenchmark/CLUE/pulls), make sure your changes are consistent with the guidelines and follow the TensorFlow coding style.
+
+#### General guidelines and philosophy for contribution
+
+- Include unit tests when you contribute new features, as they help to a) prove that your code works correctly, and b) guard against future breaking changes to lower the maintenance cost.
+- Bug fixes also generally require unit tests, because the presence of bugs usually indicates insufficient test coverage.
+- Keep API compatibility in mind when you change code. Reviewers of your pull request will comment on any API compatibility issues.
+- When you contribute a new feature to CLUE, the maintenance burden is (by default) transferred to the CLUE team. This means that the benefit of the contribution must be compared against the cost of maintaining the feature.
+- As every PR requires several CPU/GPU hours of CI testing, we discourage submitting PRs that fix a single typo, a single warning, etc. We recommend fixing the same kind of issue at least at the file level (e.g. fix all typos in a file, fix all compiler warnings in a file).
+
+### Code review
+
+All the code will need to be reviewed.
+
+#### License
+
+Include a license at the top of new files.
+
+Python license header:
+
+```
+# Copyright 2020 The CLUE Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# =============================================================================
+```
+
+#### Python coding style
+
+We encourage PEP 8. Use `pylint` to check your Python changes.
+
+To install `pylint` and check a file with it:
+
+```
+pip install pylint
+pylint myfile.py
+```
+
+Note that `pylint` should be run from the top-level directory.
+
+#### Running unit tests
+
+We encourage you to include test cases with your PR; this speeds up the review process.
+
+# Community
+
+## Contact
+
+### Email
+
+Please contact us via [chineseGLUE@163.com](mailto:chineseGLUE@163.com).
+
+### Gitter
+
+Gitter room: https://github.com/CLUEbenchmark
+
+
+
+This guide draws on [Sentinel](https://github.com/alibaba/Sentinel/wiki/%E5%BC%80%E6%BA%90%E8%B4%A1%E7%8C%AE%E6%8C%87%E5%8D%97) and [TensorFlow](https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md). Thanks for their wisdom.
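
The five-step workflow added to CONTRIBUTING.md above (fork, clone, branch, push, PR) can be sketched in shell. This is only an illustration under stated assumptions: a local bare repository stands in for your GitHub fork so the commands run offline, and the branch name `fix-typo`, the file `notes.txt`, and the identity values are hypothetical.

```shell
# Assumption: a local bare repository plays the role of "your fork" on GitHub.
work=$(mktemp -d)
git init --quiet --bare "$work/fork.git"                      # step 1 stand-in: the fork

git clone --quiet "$work/fork.git" "$work/CLUE" 2>/dev/null   # step 2: clone the fork
cd "$work/CLUE"
git checkout --quiet -b fix-typo                              # step 3: work on a new branch

echo "typo fixed" > notes.txt                                 # hypothetical change
git add notes.txt
# Identity is passed inline so the sketch does not depend on global git config.
git -c user.name="Your Name" -c user.email="you@example.com" \
    commit --quiet -m "Fix typo in notes"

git push --quiet -u origin fix-typo                           # step 4: push to YOUR fork
# Step 5, creating the PR, is done in the GitHub web UI.
```

With a real fork, only the clone URL changes; the branch, commit, and push steps stay the same.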
diff --git a/CONTRIBUTING_ZH.md b/CONTRIBUTING_ZH.md
@@ -0,0 +1,122 @@
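+# Open Source Contribution Guide
+
+# CLUE Open Source Contribution Guide
+
+Thank you for your interest in CLUE. This document is a basic guide to contributing to CLUE. If you find errors or missing content in it, please contact us.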
+
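+# Contribution Process
+
+## Code of Conduct
+
+- Read the contribution guide (this one)
+- Check that your changes are consistent with our guidelines
+- Make sure your contribution follows our coding style
+- Run the unit tests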
+
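+# How to Contribute
+
+We welcome contributions at any time, whether a simple typo fix, a bug fix, or a new feature. Feel free to raise issues or open PRs. We also value documentation and integration with other open source projects, and welcome contributions in those areas.
+
+## Where to start?
+
+If this is your first contribution, you can get involved quickly by starting with an issue or one of our small tasks.
+
+If you are already familiar with NLP tasks, you are welcome to join us in building the NLP community.
+
+You can reply directly on the relevant issue to say you want to take it, or propose the work you would like to do. Follow the GitHub workflow below to resolve the issue and submit a PR that follows our conventions; once it passes review, it will be merged into the master branch.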
+
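+## GitHub Workflow
+
+We use the `master` branch as our main branch, so it is best not to develop directly on it. For each version range (e.g. 0.1.x), a release branch (e.g. `release-0.1`) is created as the stable release branch. Each new release is merged into the corresponding release branch and tagged accordingly.
+
+Below is the workflow commonly used by open source contributors:
+
+1. Fork the repository to your own GitHub account
+2. Clone your fork to your local machine
+3. Create a new branch and do your development on it (**in general, make sure each change is verified by test cases or a demo**)
+4. Keep your branch in sync with the remote master branch (via `fetch` and `rebase`)
+5. Commit your changes locally (**keep the commit log concise and well-formed**); **the commit email must match your GitHub email**
+6. Push your commits to your fork
+7. Create a pull request (PR)
+
+When submitting a PR, please refer to the template below. For larger changes, make sure the PR has a corresponding issue.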
+
+```
+Describe what this PR does / why we need it
+Does this pull request fix one issue?
+Describe how you did it
+Describe how to verify it
+Special notes for reviews 
+[copied from https://github.com/alibaba/Sentinel/blob/master/.github/PULL_REQUEST_TEMPLATE.md]
+```
+
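+After you submit a PR, continuous integration runs automatically; make sure all CI checks pass. Once everything is ready, we will assign one or more reviewers to the PR, and they will review the submitted code.
+
+When merging a PR, please squash redundant commits into one. The final commit message should be concise and well-formed.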
+
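+## Creating an Issue / PR
+
+We use GitHub Issues and Pull Requests to manage and track problems.
+
+If you find an error in the documentation or a bug in the code, or you want to develop a new feature or make a suggestion, you can create an issue. Please follow the guidance in the issue template to fill in the details so that we can better understand your issue.
+
+If you want to contribute code, follow the [GitHub Workflow] above and submit a PR. For changes to the current development version, the target branch is `master`. If your PR contains very large changes, such as refactoring a module or adding a new component, **be sure to open an issue first, discuss it in detail, and reach agreement before making the change**, and write detailed documentation describing its design, the problem it solves, and its purpose. Try to keep each PR from becoming too large; if a large change really is needed, split it by functionality into several separate PRs.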
+
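+## Reporting Security Issues
+
+In particular, if you find any security vulnerability (or potential security issue) in CLUE or its ecosystem projects, please contact us privately at chineseGLUE@163.com as soon as possible. Until the code is fixed, **please do not disclose the security issue publicly; we also discourage reporting security issues in public issues**.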
+
+## Contribution guidelines and standards
+
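+Before submitting your PR, please make sure your changes comply with our guidelines.
+
+#### General guidelines and philosophy for contribution
+
+- If you contribute a new feature, please include unit tests where possible to show that your code works and to reduce future maintenance costs.
+- Bug fixes also need unit tests.
+- Keep the API compatible.
+- When you contribute a new feature to CLUE, the maintenance responsibility is (by default) transferred to the CLUE team. This means the benefit of the contribution must be weighed against the cost of maintaining the feature.
+
+- As every PR requires several CPU/GPU hours of CI testing, we discourage submitting PRs that fix a single typo, a single warning, etc. We recommend fixing the same kind of issue at least at the file level (e.g. fix all typos in a file, fix all compiler warnings in a file). Small mistakes can be reported via an issue instead. Thank you for your cooperation.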
+
+### Code review
+
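+All code must be reviewed by a committer. Here are some principles we recommend:
+
+- Readability: code follows our development conventions, and important code has detailed comments and documentation
+- Elegance: code is concise, highly reusable, and well designed
+- Testing: important code has thorough test cases (unit tests, integration tests), measured by test coverage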
+
+#### Python coding style
+
+We encourage PEP 8. Use `pylint` to check your Python changes.
+
+To install `pylint` and check a file with it:
+
+```
+pip install pylint
+pylint myfile.py
+```
+
+Note that `pylint` should be run from the top-level directory.
+
+#### Running unit tests
+
+We encourage you to include test cases with your PR; this speeds up the review process.
+
+# Community
+
+## Contact Us
+
+### Mailing List
+
+If you have any questions or suggestions, please contact us at [chineseGLUE@163.com](mailto:chineseGLUE@163.com).
+
+### Gitter
+
+Our Gitter room: https://github.com/CLUEbenchmark
+
+
+
+This contributor template is adapted from [Sentinel](https://github.com/alibaba/Sentinel/wiki/%E5%BC%80%E6%BA%90%E8%B4%A1%E7%8C%AE%E6%8C%87%E5%8D%97) and [TensorFlow](https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md). Thanks for their wisdom.
+
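
Step 4 of the workflow in CONTRIBUTING_ZH.md asks contributors to keep their branch in sync with the remote master via `fetch` and `rebase`. A minimal sketch of that step follows, assuming a local bare repository stands in for the upstream project so it runs offline; the directory names, file names, branch name `feature`, and identity values are all hypothetical.

```shell
# Assumption: "$work/upstream.git" plays the role of the upstream repository.
work=$(mktemp -d)
# Helper that supplies a commit identity without touching global git config.
g() { git -c user.name="Your Name" -c user.email="you@example.com" "$@"; }

# Seed upstream with an initial commit on master.
git init --quiet --bare "$work/upstream.git"
git clone --quiet "$work/upstream.git" "$work/seed" 2>/dev/null
cd "$work/seed"
echo base > base.txt && git add base.txt && g commit --quiet -m "Initial commit"
git push --quiet origin HEAD:refs/heads/master

# Contributor clone: develop on a separate branch, not on master.
git clone --quiet --branch master "$work/upstream.git" "$work/dev"
cd "$work/dev"
git checkout --quiet -b feature
echo feature > feature.txt && git add feature.txt && g commit --quiet -m "Add feature"

# Meanwhile, upstream master moves on.
cd "$work/seed"
echo more > more.txt && git add more.txt && g commit --quiet -m "Upstream change"
git push --quiet origin HEAD:refs/heads/master

# Step 4: fetch the new upstream commits and replay the feature branch on top.
cd "$work/dev"
git fetch --quiet origin
g rebase --quiet origin/master
git log --format=%s   # the feature commit is now listed above the upstream commits
```

Rebasing before pushing keeps the PR history linear, which also makes the squash requested at merge time simpler.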
diff --git a/README.md b/README.md
@@ -68,9 +68,13 @@ DRCD, CMRC2018: Traditional and Simplified Chinese extractive reading comprehension (F1, EM); CHID: idiom multi
        bash run_classifier_xxx.sh
        For example, running bash run_classifier_iflytek.sh starts training on the iflytek task  
     4. TPU usage (optional)  
+        cd CLUE/baselines/models/bert/tpu  
+        sh run_classifier_tnews.sh runs the tnews task (remember to replace the gs path and TPU IP inside it). Data and models are downloaded and uploaded automatically.
+        
         cd CLUE/baselines/models/roberta/tpu  
         sh run_classifier_tiny.sh runs all classification tasks (remember to replace the paths, model location, and TPU IP inside it)  
 
+        
 ### Runtime environment
 tensorflow 1.12 /cuda 9.0 /cudnn7.0
 ### Toolkit

diff --git a/baselines/models/bert/tpu/run_classifier_tnews.sh b/baselines/models/bert/tpu/run_classifier_tnews.sh
@@ -2,10 +2,12 @@ CURRENT_DIR=$(cd -P -- "$(dirname -- "$0")" && pwd -P)
 CURRENT_TIME=$(date "+%Y%m%d-%H%M%S")
 TASK_NAME="tnews"
 
+GS="gs" # change it to your own gs:// bucket path
+TPU_IP="1.1.1.1" # change it to your TPU IP
 # please create folder 
-export PREV_TRAINED_MODEL_DIR=gs://clue_storage/prev_trained_models/nlp/bert-base/chinese_L-12_H-768_A-12
-export DATA_DIR=gs://clue_storage/nlp/chineseGLUEdatasets.v0.0.1/${TASK_NAME}
-export OUTPUT_DIR=gs://clue_storage/fine_tuning_models/nlp/bert-base/chinese_L-12_H-768_A-12/tpu/$TASK_NAME/$CURRENT_TIME
+export PREV_TRAINED_MODEL_DIR=$GS/prev_trained_models/nlp/bert-base/chinese_L-12_H-768_A-12
+export DATA_DIR=$GS/nlp/chineseGLUEdatasets.v0.0.1/${TASK_NAME}
+export OUTPUT_DIR=$GS/fine_tuning_models/nlp/bert-base/chinese_L-12_H-768_A-12/tpu/$TASK_NAME/$CURRENT_TIME
 
 
 MODEL_NAME="chinese_L-12_H-768_A-12"
@@ -80,4 +82,4 @@ python $CURRENT_DIR/../run_classifier.py \
   --learning_rate=2e-5 \
   --num_train_epochs=3.0 \
   --output_dir=$OUTPUT_DIR \
-  --num_tpu_cores=8 --use_tpu=True --tpu_name=grpc://192.168.0.2:8470
+  --num_tpu_cores=8 --use_tpu=True --tpu_name=grpc://$TPU_IP:8470
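
The change to `run_classifier_tnews.sh` replaces hard-coded values with the `GS` and `TPU_IP` variables, which each user must edit. The sketch below shows how those two values expand into the data path and the `--tpu_name` address, with a guard against leaving the placeholders in place. The guard function `check_config`, the bucket `gs://my-bucket`, and the IP `10.0.0.2` are hypothetical and not part of the commit.

```shell
# Hypothetical values; replace with your own bucket and TPU address.
GS="gs://my-bucket"     # assumption: a bucket you own
TPU_IP="10.0.0.2"       # assumption: the IP of your TPU node
TASK_NAME="tnews"

# Hypothetical guard: reject values that are still the script's placeholders.
check_config() {
  case "$1" in
    gs://?*) ;;   # looks like gs://<bucket>
    *) echo "GS must look like gs://<bucket>, got: $1" >&2; return 1 ;;
  esac
  if [ "$2" = "1.1.1.1" ]; then
    echo "TPU_IP still has the placeholder value" >&2
    return 1
  fi
}

if check_config "$GS" "$TPU_IP"; then
  # Same composition as in the script: a path under $GS and a gRPC address on port 8470.
  DATA_DIR="$GS/nlp/chineseGLUEdatasets.v0.0.1/${TASK_NAME}"
  TPU_NAME="grpc://$TPU_IP:8470"
  echo "$DATA_DIR"
  echo "$TPU_NAME"
fi
```

Running the guard before the `python` call would make a forgotten placeholder fail early with a clear message rather than partway through training.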