Commit

Merge branch 'develop'
amaiya committed May 14, 2020
2 parents 7fffa14 + 02c3455 commit 0130a22
Showing 9 changed files with 522 additions and 22 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,17 @@ Most recent releases are shown at the top. Each release shows:
- **Fixed**: Bug fixes that don't change documented behaviour


## 0.15.1 (2020-05-14)

### New:
- N/A

### Changed:
- Changed `Transformer.preprocess*` methods to accept sentence pairs for sentence pair classification

### Fixed:
- N/A

## 0.15.0 (2020-05-13)

### New:
4 changes: 3 additions & 1 deletion README.md
@@ -10,7 +10,8 @@
- **2020-05-13:**
- ***ktrain*** **v0.15.x is released** and includes support for:
- **image regression**: See the [example notebook on age prediction from photos](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/vision/utk_faces_age_prediction-resnet50.ipynb).
- **`tf.data.Datasets`**: See the [example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/vision/mnist-tf_workflow.ipynb) on using `tf.data.Datasets` in *ktrain* for custom models and data formats.
- **tf.data.Datasets**: See the [example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/vision/mnist-tf_workflow.ipynb) on using `tf.data.Datasets` in *ktrain* for custom models and data formats.
- **sentence pair classification**: See this [example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/text/MRPC-BERT.ipynb) on using BERT for paraphrase detection.<sub><sup>(Sentence pair classification included in v0.15.1, but not v0.15.0.)</sup></sub>
- **2020-04-15:**
- ***ktrain*** **v0.14.x is released** and now includes support for **open-domain question-answering**. See the [example QA notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/question_answering_with_bert.ipynb)
- **2020-04-09:**
@@ -36,6 +37,7 @@ ts.summarize(some_long_document)
- **Text Regression**: [BERT](https://arxiv.org/abs/1810.04805), [DistilBERT](https://arxiv.org/abs/1910.01108), Embedding-based linear text regression, [fastText](https://arxiv.org/abs/1607.01759), and other models <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/text_regression_example.ipynb)]</sup></sub>
- **Sequence Labeling (NER)**: Bidirectional LSTM with optional [CRF layer](https://arxiv.org/abs/1603.01360) and various embedding schemes such as pretrained [BERT](https://huggingface.co/transformers/pretrained_models.html) and [fasttext](https://fasttext.cc/docs/en/crawl-vectors.html) word embeddings and character embeddings <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/CoNLL2002_Dutch-BiLSTM.ipynb)]</sup></sub>
- **Ready-to-Use NER models for English, Chinese, and Russian** with no training required <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/shallownlp-examples.ipynb)]</sup></sub>
- **Sentence Pair Classification** for tasks like paraphrase detection <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/text/MRPC-BERT.ipynb)]</sup></sub>
- **Unsupervised Topic Modeling** with [LDA](http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf) <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/20newsgroups-topic_modeling.ipynb)]</sup></sub>
- **Document Similarity with One-Class Learning**: given some documents of interest, find and score new documents that are semantically similar to them using [One-Class Text Classification](https://en.wikipedia.org/wiki/One-class_classification) <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/20newsgroups-document_similarity_scorer.ipynb)]</sup></sub>
- **Document Recommendation Engine**: given text from a sample document, recommend documents that are thematically-related to it from a larger corpus <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/20newsgroups-recommendation_engine.ipynb)]</sup></sub>
8 changes: 8 additions & 0 deletions examples/README.md
@@ -5,6 +5,7 @@ This directory contains various example notebooks using *ktrain*. The directory
- [text classification](#textclass): examples using various text classification models and datasets
- [text regression](#textregression): example for predicting continuous value purely from text
- [text sequence labeling](#seqlab): sequence tagging models
- [sentence pair classification](#sentpair): sentence pair classification for tasks such as paraphrase or sarcasm detection
- [topic modeling](#lda): unsupervised learning from unlabeled text data
- [document similarity with one-class learning](#docsim): given a sample of interesting documents, find and score new documents that are semantically similar to them using One-Class text classification
- [document recommender system](#docrec): given text from a sample document, recommend documents that are semantically similar to it from a larger corpus
@@ -93,6 +94,13 @@ The objective of the CoNLL2003 task is to classify sequences of words as belongi
- [CoNLL2002_Dutch-BiLSTM.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text): A Bidirectional LSTM model that uses pretrained BERT embeddings along with pretrained fasttext word embeddings - both for Dutch.


### <a name="sentpair"></a> Sentence Pair Classification

#### [Microsoft Research Paraphrase Corpus (MRPC)](https://www.microsoft.com/en-us/download/details.aspx?id=52398): Paraphrase Detection

- [MRPC-BERT.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text): Using BERT for sentence pair classification on the MRPC dataset
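
For readers unfamiliar with the task, the sketch below illustrates how a sentence pair is typically packed into a single BERT-style input: `[CLS] A [SEP] B [SEP]`, with segment ids marking which sentence each token belongs to. It is a plain-Python illustration only, not *ktrain*'s implementation. The `encode_pair` helper, the whitespace tokenizer (a stand-in for WordPiece), and the sample sentences are all hypothetical.

```python
from typing import List, Tuple


def encode_pair(sent_a: str, sent_b: str, maxlen: int = 16) -> Tuple[List[str], List[int]]:
    """Pack a sentence pair as [CLS] A [SEP] B [SEP] with segment ids.

    Hypothetical helper for illustration; a real pipeline would use a
    subword tokenizer and map tokens to vocabulary ids.
    """
    if maxlen < 3:
        raise ValueError("maxlen must leave room for [CLS] and two [SEP] tokens")

    # Whitespace splitting stands in for a real WordPiece tokenizer.
    tokens_a = sent_a.lower().split()
    tokens_b = sent_b.lower().split()

    # Three positions are reserved for the special tokens. When the pair is
    # too long, trim one token at a time from whichever sentence is longer.
    while len(tokens_a) + len(tokens_b) > maxlen - 3:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        longer.pop()

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A, and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    # A paraphrase-style pair, represented as a tuple of two strings.
    pair = ("He bought a car", "He purchased a vehicle")
    tokens, segment_ids = encode_pair(*pair)
    for token, segment in zip(tokens, segment_ids):
        print(f"{segment}  {token}")
```

A classifier for paraphrase detection then predicts one label per packed pair rather than one per sentence, which is why each training example is a tuple of two sentences instead of a single string.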


### <a name="lda"></a> Topic Modeling

#### [20 News Groups](http://qwone.com/~jason/20Newsgroups/): unsupervised learning on 20newsgroups corpus