Merge branch 'develop'

amaiya · Jul 29, 2020 · 041c0c0
2 parents d2499a5 + e33d971
commit 041c0c0
Showing 30 changed files with 47,385 additions and 53 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,26 @@ Most recent releases are shown at the top. Each release shows:
 - **Changed**: Additional parameters, changes to inputs or outputs, etc
 - **Fixed**: Bug fixes that don't change documented behaviour
 
+## 0.19.0 (2020-07-29)
+
+### New:
+- support for `tabular` data including explainable AI for tabular predictions
+- `learner.validate` and `learner.evaluate` now support regression models
+- added `restore_weights_only` flag to `lr_find`.  When True, only the model weights will be restored after
+  simulating training, not the optimizer weights. In at least a few observed cases, this "warm up" seems to improve performance
+  when actual training begins. Further investigation is needed, so it is False by default.
+
+### Changed
+- N/A
+
+### Fixed:
+- added `save_path` argument to `Learner.validate` and `Learner.evaluate`.  If `print_report=False`, classification
+  report will be saved as CSV to `save_path`.
+- Use `torch.no_grad` with `ZeroShotClassifier.predict` to [prevent OOM](https://github.com/amaiya/ktrain/issues/215)
+- Added `max_length` parameter to `ZeroShotClassifier.predict` to [prevent errors on long documents](https://github.com/amaiya/ktrain/issues/215)
+- Added type check to `TransformersPreprocessor.preprocess_train`
+
+
 ## 0.18.5 (2020-07-20)
 
 ### New:

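Note on the `restore_weights_only` entry in the changelog hunk above: the distinction between restoring model weights and restoring optimizer state can be illustrated with a toy, pure-Python learning-rate range test. This is only a sketch under stated assumptions, not ktrain's implementation. The `MomentumSGD` class, the dict-based model, the quadratic loss, and this standalone `lr_find` function are all hypothetical and exist purely for illustration; only the flag name and its default of False come from the changelog.

```python
import copy


class MomentumSGD:
    """Hypothetical toy optimizer: SGD with momentum and its own state."""

    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.velocity = 0.0  # optimizer state, kept separately from the model weights

    def step(self, weight, grad):
        # Classic momentum update: velocity accumulates past gradients.
        self.velocity = self.momentum * self.velocity - self.lr * grad
        return weight + self.velocity


def lr_find(model, opt, start_lr=1e-4, end_lr=1.0, steps=20,
            restore_weights_only=False):
    """Toy LR range test on loss(w) = (w - 3)^2; returns (lr, loss) pairs."""
    saved_weights = copy.deepcopy(model)  # snapshot of the model weights
    saved_lr, saved_velocity = opt.lr, opt.velocity  # snapshot of the optimizer

    # Grow the learning rate geometrically from start_lr to end_lr.
    factor = (end_lr / start_lr) ** (1 / (steps - 1))
    opt.lr = start_lr
    history = []
    for _ in range(steps):
        w = model["w"]
        history.append((opt.lr, (w - 3.0) ** 2))
        model["w"] = opt.step(w, 2.0 * (w - 3.0))  # gradient of (w - 3)^2
        opt.lr *= factor

    # The model weights and the configured learning rate are always restored.
    model.update(saved_weights)
    opt.lr = saved_lr
    # Assumption: "optimizer weights" corresponds to the momentum state here.
    if not restore_weights_only:
        opt.velocity = saved_velocity
    return history


model = {"w": 0.0}
opt = MomentumSGD(lr=0.01)
history = lr_find(model, opt, restore_weights_only=True)
```

In this sketch, calling `lr_find` with `restore_weights_only=True` leaves `model["w"]` at its starting value while `opt.velocity` keeps whatever it accumulated during the simulated run, which is the "warm up" effect the changelog entry describes. With the default of False, both are reset.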
diff --git a/README.md b/README.md
@@ -7,6 +7,10 @@
 
 
 ### News and Announcements
+- **2020-07-29:**  
+  - ***ktrain*** **v0.19.x is released** and now includes support for "traditional" **tabular data** and **explainable AI for tabular predictions**.  See the [tutorial notebook on tabular models](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/tutorials/tutorial-08-tabular_classification_and_regression.ipynb) for both:
+    - a classification example (using [Kaggle's Titanic passenger survival prediction dataset](https://www.kaggle.com/c/titanic))
+    - a regression example (using [UCI's Adults census dataset](http://archive.ics.uci.edu/ml/datasets/Census+Income) for age prediction)
 - **2020-07-07:**  
   - ***ktrain*** **v0.18.x is released** and now includes support for TensorFlow 2.2.0. Due to various TensorFlow 2.2.0 bugs, TF 2.2.0 is only installed if Python 3.8 is being used. 
     Otherwise,  TensorFlow 2.1.0 is always installed (i.e., on Python 3.6 and 3.7 systems).
@@ -49,7 +53,7 @@ zsl.predict(doc, topic_strings=topic_strings, include_labels=True)
 
 *ktrain* is a lightweight wrapper for the deep learning library [TensorFlow Keras](https://www.tensorflow.org/guide/keras/overview) (and other libraries) to help build, train, and deploy neural networks and other machine learning models.  Inspired by ML framework extensions like *fastai* and *ludwig*, it is designed to make deep learning and AI more accessible and easier to apply for both newcomers and experienced practitioners. With only a few lines of code, ktrain allows you to easily and quickly:
 
-- employ fast, accurate, and easy-to-use pre-canned models for  `text`, `vision`, and `graph` data:
+- employ fast, accurate, and easy-to-use pre-canned models for  `text`, `vision`, `graph`, and `tabular` data:
   - `text` data:
      - **Text Classification**: [BERT](https://arxiv.org/abs/1810.04805), [DistilBERT](https://arxiv.org/abs/1910.01108), [NBSVM](https://www.aclweb.org/anthology/P12-2018), [fastText](https://arxiv.org/abs/1607.01759), and other models <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/IMDb-BERT.ipynb)]</sup></sub>
      - **Text Regression**: [BERT](https://arxiv.org/abs/1810.04805), [DistilBERT](https://arxiv.org/abs/1910.01108), Embedding-based linear text regression, [fastText](https://arxiv.org/abs/1607.01759), and other models <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/text_regression_example.ipynb)]</sup></sub>
@@ -69,6 +73,10 @@ zsl.predict(doc, topic_strings=topic_strings, include_labels=True)
   - `graph` data:
     - **node classification** with graph neural networks ([GraphSAGE](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf)) <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/graphs/pubmed_node_classification-GraphSAGE.ipynb)]</sup></sub>
     - **link prediction** with graph neural networks ([GraphSAGE](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf)) <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/graphs/cora_link_prediction-GraphSAGE.ipynb)]</sup></sub>
+  - `tabular` data:
+    - **tabular classification** using the Titanic dataset <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/tutorials/tutorial-08-tabular_classification_and_regression.ipynb)]</sup></sub>
+    - **tabular regression** using Census data <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/tutorials/tutorial-08-tabular_classification_and_regression.ipynb)]</sup></sub>
+
 - estimate an optimal learning rate for your model given your data using a Learning Rate Finder
 - utilize learning rate schedules such as the [triangular policy](https://arxiv.org/abs/1506.01186), the [1cycle policy](https://arxiv.org/abs/1803.09820), and [SGDR](https://arxiv.org/abs/1608.03983) to effectively minimize loss and improve generalization
 - build text classifiers for any language (e.g., [Chinese Sentiment Analysis with BERT](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/ChineseHotelReviews-BERT.ipynb), [Arabic Sentiment Analysis with NBSVM](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/ArabicHotelReviews-nbsvm.ipynb))
@@ -87,6 +95,7 @@ Please see the following tutorial notebooks for a guide on how to use *ktrain* o
 * Tutorial 5: [Learning from Unlabeled Text Data](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-05-learning_from_unlabeled_text_data.ipynb)
 * Tutorial 6: [Text Sequence Tagging](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-06-sequence-tagging.ipynb) for Named Entity Recognition
 * Tutorial 7: [Graph Node Classification](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-07-graph-node_classification.ipynb) with Graph Neural Networks
+* Tutorial 8: [Tabular Classification and Regression](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/tutorials/tutorial-08-tabular_classification_and_regression.ipynb) 
 * Tutorial A1: [Additional tricks](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-A1-additional-tricks.ipynb), which covers topics such as previewing data augmentation schemes, inspecting intermediate output of Keras models for debugging, setting global weight decay, and use of built-in and custom callbacks.
 * Tutorial A2: [Explaining Predictions and Misclassifications](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-A2-explaining-predictions.ipynb)
 * Tutorial A3: [Text Classification with Hugging Face Transformers](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-A3-hugging_face_transformers.ipynb)

diff --git a/examples/README.md b/examples/README.md
@@ -1,6 +1,6 @@
 # Example Notebooks
 
-This directory contains various example notebooks using *ktrain*.  The directory currently has three folders:
+This directory contains various example notebooks using *ktrain*.  The directory currently has four folders:
 - `text`:  
   - [text classification](#textclass): examples using various text classification models and datasets
   - [text regression](#textregression): example for predicting continuous value purely from text
@@ -20,6 +20,9 @@ This directory contains various example notebooks using *ktrain*.  The directory
 - `graphs`: 
   - [node classification](#-graph-node-classification-datasets): node classification in graphs or networks
   - [link prediction](#-graph-link-prediction-datasets): link prediction in graphs or networks
+- `tabular`: 
+  - [classification](#-tabular-classification-datasets): classification for tabular data
+  - [regression](#-tabular-regression-datasets): regression for tabular data
 
 
 ## Text Data
@@ -211,4 +214,37 @@ In the Cora citation graph, each node represents a paper. Links represent citati
 - [cora_link_prediction-GraphSAGE.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/graphs): GraphSAGE model to predict missing links in the citation network.
 
 
+## Tabular Data
 
+### <a name="#tabularclass"></a> Tabular Classification Datasets
+
+#### [Titanic Survival Prediction](https://www.kaggle.com/c/titanic):  Tabular Classification
+
+This is the well-studied Titanic dataset from Kaggle.  The goal is to predict which passengers survived the Titanic disaster based on their attributes.
+
+- [tabular_classification_and_regression_example.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/tabular): MLP for tabular classification
+
+
+#### [Income Prediction from Census Data](http://archive.ics.uci.edu/ml/datasets/Adult):  Tabular Classification
+
+This is the same dataset used in the [AutoGluon classification example](https://autogluon.mxnet.io/tutorials/tabular_prediction/tabular-quickstart.html).
+The goal is to predict which individuals make over $50K per year.
+
+
+- [IncomePrediction-MLP.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/tabular): MLP for tabular classification
+
+
+### <a name="#tabularreg"></a> Tabular Regression Datasets
+
+
+#### [Adults Census Dataset](http://archive.ics.uci.edu/ml/datasets/Adult):  Tabular Regression
+
+The original task for this Census dataset is to predict which individuals make over $50K per year.  Here, we recast the task as a regression problem
+and predict the Age attribute for each individual.  This is the same dataset used in the [AutoGluon regression example](https://autogluon.mxnet.io/tutorials/tabular_prediction/tabular-quickstart.html).
+
+- [tabular_classification_and_regression_example.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/tabular): MLP for tabular regression
+
+
+#### [House Price Prediction](https://www.kaggle.com/c/house-prices-advanced-regression-techniques):  Tabular Regression
+
+- [HousePricePrediction-MLP.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/tabular): MLP for tabular regression