fixed conflict

amaiya · Jan 29, 2022 · 3de019a
2 parents a871779 + ef43d36

Showing 20 changed files with 502 additions and 119 deletions.
diff --git a/.gitignore b/.gitignore
@@ -97,6 +97,9 @@ venv.bak/
 # Rope project settings
 .ropeproject
 
+# VSCode project settings
+.vscode
+
 # mkdocs documentation
 /site
 

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,22 @@ Most recent releases are shown at the top. Each release shows:
 - **Changed**: Additional parameters, changes to inputs or outputs, etc
 - **Fixed**: Bug fixes that don't change documented behaviour
 
+## 0.29.0 (2022-01-28)
+
+### new:
+- New vision models: added MobileNetV3-Small and EfficientNet.  Thanks to @ilos-vigil.
+
+### changed:
+- `core.Learner.plot` now supports plotting of any value that exists in the training `History` object (e.g., `mae` if previously specified as metric). Thanks to @ilos-vigil.
+- added `raw_confidence` parameter to `QA.ask` method to return raw confidence scores. Thanks to @ilos-vigil.
+
+### fixed:
+- pin to `transformers==4.10.3` due to Issue #398
+- pin to `syntok==1.3.3` due to bug with `syntok==1.4.1` causing paragraph tokenization in `qa` module to break
+- properly suppress TF/CUDA warnings by default
+- ensure documents fed to the `keras_bert` tokenizer are strings (numeric values are converted with `str`) to avoid [this issue](https://stackoverflow.com/questions/67360987/bert-model-bug-encountered-during-training/67375675#67375675)
+
+
 ## 0.28.3 (2021-11-05)
 
 ### new:

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,37 @@
+# Contributing to ktrain
+
+We are happy to accept your contributions to make `ktrain` better! To avoid unnecessary work, please stick to the following process:
+
+1. Check if there is already [an issue](https://github.com/amaiya/ktrain/issues) for your concern.
+2. If there is not, open a new one to start a discussion. We hate to close finished PRs.
+3. If it is decided that your concern requires a code change, we would be happy to accept a pull request.
+
+
+## Developing locally
+
+We suggest cloning the repository and then checking out tutorials and examples for information on how to call various methods.
+Most relevant classes and methods should be documented. If not, you might consider helping to improve the docstrings.
+
+### Setup
+
+See the [installation instructions](https://github.com/amaiya/ktrain#installation) for setting things up. Using a virtual environment (e.g., [venv](https://docs.python.org/3/library/venv.html) or [Poetry](https://python-poetry.org/)) is strongly recommended.
+
+### Tests
+
+To run all tests, execute:
+```bash
+cd ktrain/tests
+python3 -m unittest
+```
+
+To run a specific test module (e.g., `test_dataloading.py`) from the same `ktrain/tests` directory:
+```bash
+python3 test_dataloading.py
+```
+
+## PR Guidelines
+
+- Keep each PR focused. Even if it is more convenient, please avoid combining several unrelated fixes in one PR.
+- Check out the `develop` branch before making any changes, and make sure you choose `develop` as the base branch for your PR.
+- Try to maintain backwards compatibility.  If this is not possible, please discuss with maintainer(s).
+- Use four spaces for indentation. 
diff --git a/README.md b/README.md
@@ -10,6 +10,8 @@
 
 
 ### News and Announcements
+- **2022-01-28**
+  - **ktrain v0.29.x** is released and includes miscellaneous enhancements contributed by [Sandy Khosasi](https://github.com/ilos-vigil) such as [support for MobileNetV3 and EfficientNet](https://colab.research.google.com/drive/1EJHpMVG6fBCg33UPla_Ly_6LQdswU2Ur?usp=sharing), [plotting improvements](https://colab.research.google.com/drive/1_WaRQ0J4g0VTn6HWS3kszdFZbBBWoa7R?usp=sharing), and [raw confidence scores in QA](https://colab.research.google.com/drive/1ParprLN9hFX6cxJ1w7bv91PYx4o0J1zm?usp=sharing).
 - **2021-10-13**
   - **ktrain v0.28.x** is released and now includes the `AnswerExtractor`, which allows you to extract any information of interest from documents by simply phrasing it in the form of a question. A short example is shown here, but see the [example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/qa_information_extraction.ipynb) for more information.
 ```python
@@ -101,7 +103,7 @@ Please see the following tutorial notebooks for a guide on how to use **ktrain**
 * Tutorial A4: [Using Custom Data Formats and Models: Text Regression with Extra Regressors](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-A4-customdata-text_regression_with_extra_regressors.ipynb)
 
 
-Some blog tutorials about **ktrain** are shown below:
+Some blog tutorials and other guides about **ktrain** are shown below:
 
 > [**ktrain: A Lightweight Wrapper for Keras to Help Train Neural Networks**](https://towardsdatascience.com/ktrain-a-lightweight-wrapper-for-keras-to-help-train-neural-networks-82851ba889c) 
 
@@ -114,6 +116,8 @@ Some blog tutorials about **ktrain** are shown below:
 
 > [**Finetuning BERT using ktrain for Disaster Tweets Classification**](https://medium.com/analytics-vidhya/finetuning-bert-using-ktrain-for-disaster-tweets-classification-18f64a50910b) by Hamiz Ahmed
 
+> [**Indonesian NLP Examples with ktrain**](https://github.com/ilos-vigil/ktrain-assessment-study) by Sandy Khosasi
+
 
 
 
@@ -323,9 +327,12 @@ Using **ktrain** on **Google Colab**?  See these Colab examples:
 
 3. Install *ktrain*: `pip install ktrain`
 
+
 The above should be all you need on Linux systems and cloud computing environments like Google Colab and AWS EC2.  If you are using **ktrain** on a **Windows computer**, you can follow these 
 [more detailed instructions](https://github.com/amaiya/ktrain/blob/master/FAQ.md#how-do-i-install-ktrain-on-a-windows-machine) that include some extra steps.
 
+**ktrain** should currently support any version of TensorFlow at or above v2.3 (i.e., `pip install "tensorflow>=2.3"`).
+
 **Some important things to note about installation:**
 
 Some optional, extra libraries used for some operations can be installed as needed:
@@ -349,7 +356,11 @@ pip install datasets
 ```
 Notice that **ktrain** is using forked versions of the `eli5` and `stellargraph` libraries above in order to support TensorFlow2.
 
-
+<!--
+pip install pdoc3==0.9.2
+pdoc3 --html -o docs ktrain
+diff -qr docs/ktrain/ /path/to/repo/ktrain/docs
+-->
 
 ### How to Cite
 

diff --git a/docs/core.html b/docs/core.html
@@ -686,14 +686,18 @@ <h1 class="title">Module <code>ktrain.core</code></h1>
         ```
         plots training history
         Args:
-          plot_type (str):  one of {&#39;loss&#39;, &#39;lr&#39;, &#39;momentum&#39;}
+          plot_type (str):  A valid value in tf.keras History.  Either a built-in value  {&#39;loss&#39;, &#39;lr&#39;, &#39;momentum&#39;} or
+                            other values previously specified by user.  For instance, if &#39;mae&#39; and/or &#39;mse&#39; is previously specified as metrics
+                            when creating model, then these values can also be specified.
           return_fig(bool):  If True, return matplotlib.figure.Figure
         Return:
           matplotlib.figure.Figure if return_fig else None
         ```
         &#34;&#34;&#34;
         if self.history is None:
             raise Exception(&#39;No training history - did you train the model yet?&#39;)
+        if not isinstance(plot_type, str):
+            raise ValueError(&#39;plot_type must be str/string&#39;)
 
         fig = None
         if plot_type == &#39;loss&#39;:
@@ -722,7 +726,22 @@ <h1 class="title">Module <code>ktrain.core</code></h1>
             plt.ylabel(&#39;momentum&#39;)
             plt.xlabel(&#39;iterations&#39;)
         else:
-            raise ValueError(&#39;invalid type: choose loss, lr, or momentum&#39;)
+            if plot_type not in self.history.history:
+                raise ValueError(f&#39;no {plot_type} in history: are you sure {plot_type} exists in history?&#39;)
+            plt.plot(self.history.history[plot_type])
+
+            val_key = f&#39;val_{plot_type}&#39;
+            if val_key in self.history.history:
+                plt.plot(self.history.history[val_key])
+                legend_items = [&#39;train&#39;, &#39;validation&#39;]
+            else:
+                warnings.warn(f&#39;Validation value for {plot_type} wasn\&#39;t found in history&#39;)
+                legend_items = [&#39;train&#39;]
+
+            plt.title(f&#39;History of {plot_type}&#39;)
+            plt.ylabel(plot_type)
+            plt.xlabel(&#39;epoch&#39;)
+            plt.legend(legend_items, loc=&#39;upper left&#39;)
         fig = plt.gcf()
         plt.show()
         if return_fig: return fig
@@ -1580,8 +1599,12 @@ <h1 class="title">Module <code>ktrain.core</code></h1>
             preproc.datagen.preprocessing_function = pre_resnet50
         elif preproc_name == &#39;mobilenet&#39;:
             preproc.datagen.preprocessing_function = pre_mobilenet
+        elif preproc_name == &#39;mobilenetv3&#39;:
+            preproc.datagen.preprocessing_function = pre_mobilenetv3small
         elif preproc_name == &#39;inception&#39;:
             preproc.datagen.preprocessing_function = pre_inception
+        elif preproc_name == &#39;efficientnet&#39;:
+            preproc.datagen.preprocessing_function = pre_efficientnet
         else:
             raise Exception(&#39;Uknown preprocessing_function name: %s&#39; % (preproc_name))
 
@@ -1835,8 +1858,12 @@ <h2 class="section-title" id="header-functions">Functions</h2>
             preproc.datagen.preprocessing_function = pre_resnet50
         elif preproc_name == &#39;mobilenet&#39;:
             preproc.datagen.preprocessing_function = pre_mobilenet
+        elif preproc_name == &#39;mobilenetv3&#39;:
+            preproc.datagen.preprocessing_function = pre_mobilenetv3small
         elif preproc_name == &#39;inception&#39;:
             preproc.datagen.preprocessing_function = pre_inception
+        elif preproc_name == &#39;efficientnet&#39;:
+            preproc.datagen.preprocessing_function = pre_efficientnet
         else:
             raise Exception(&#39;Uknown preprocessing_function name: %s&#39; % (preproc_name))
 
@@ -3492,14 +3519,18 @@ <h3>Inherited members</h3>
         ```
         plots training history
         Args:
-          plot_type (str):  one of {&#39;loss&#39;, &#39;lr&#39;, &#39;momentum&#39;}
+          plot_type (str):  A valid value in tf.keras History.  Either a built-in value  {&#39;loss&#39;, &#39;lr&#39;, &#39;momentum&#39;} or
+                            other values previously specified by user.  For instance, if &#39;mae&#39; and/or &#39;mse&#39; is previously specified as metrics
+                            when creating model, then these values can also be specified.
           return_fig(bool):  If True, return matplotlib.figure.Figure
         Return:
           matplotlib.figure.Figure if return_fig else None
         ```
         &#34;&#34;&#34;
         if self.history is None:
             raise Exception(&#39;No training history - did you train the model yet?&#39;)
+        if not isinstance(plot_type, str):
+            raise ValueError(&#39;plot_type must be str/string&#39;)
 
         fig = None
         if plot_type == &#39;loss&#39;:
@@ -3528,7 +3559,22 @@ <h3>Inherited members</h3>
             plt.ylabel(&#39;momentum&#39;)
             plt.xlabel(&#39;iterations&#39;)
         else:
-            raise ValueError(&#39;invalid type: choose loss, lr, or momentum&#39;)
+            if plot_type not in self.history.history:
+                raise ValueError(f&#39;no {plot_type} in history: are you sure {plot_type} exists in history?&#39;)
+            plt.plot(self.history.history[plot_type])
+
+            val_key = f&#39;val_{plot_type}&#39;
+            if val_key in self.history.history:
+                plt.plot(self.history.history[val_key])
+                legend_items = [&#39;train&#39;, &#39;validation&#39;]
+            else:
+                warnings.warn(f&#39;Validation value for {plot_type} wasn\&#39;t found in history&#39;)
+                legend_items = [&#39;train&#39;]
+
+            plt.title(f&#39;History of {plot_type}&#39;)
+            plt.ylabel(plot_type)
+            plt.xlabel(&#39;epoch&#39;)
+            plt.legend(legend_items, loc=&#39;upper left&#39;)
         fig = plt.gcf()
         plt.show()
         if return_fig: return fig
@@ -4712,7 +4758,9 @@ <h3>Methods</h3>
 <dd>
 <div class="desc"><pre><code>plots training history
 Args:
-  plot_type (str):  one of {'loss', 'lr', 'momentum'}
+  plot_type (str):  A valid value in tf.keras History.  Either a built-in value  {'loss', 'lr', 'momentum'} or
+                    other values previously specified by user.  For instance, if 'mae' and/or 'mse' is previously specified as metrics
+                    when creating model, then these values can also be specified.
   return_fig(bool):  If True, return matplotlib.figure.Figure
 Return:
   matplotlib.figure.Figure if return_fig else None
@@ -4726,14 +4774,18 @@ <h3>Methods</h3>
     ```
     plots training history
     Args:
-      plot_type (str):  one of {&#39;loss&#39;, &#39;lr&#39;, &#39;momentum&#39;}
+      plot_type (str):  A valid value in tf.keras History.  Either a built-in value  {&#39;loss&#39;, &#39;lr&#39;, &#39;momentum&#39;} or
+                        other values previously specified by user.  For instance, if &#39;mae&#39; and/or &#39;mse&#39; is previously specified as metrics
+                        when creating model, then these values can also be specified.
       return_fig(bool):  If True, return matplotlib.figure.Figure
     Return:
       matplotlib.figure.Figure if return_fig else None
     ```
     &#34;&#34;&#34;
     if self.history is None:
         raise Exception(&#39;No training history - did you train the model yet?&#39;)
+    if not isinstance(plot_type, str):
+        raise ValueError(&#39;plot_type must be str/string&#39;)
 
     fig = None
     if plot_type == &#39;loss&#39;:
@@ -4762,7 +4814,22 @@ <h3>Methods</h3>
         plt.ylabel(&#39;momentum&#39;)
         plt.xlabel(&#39;iterations&#39;)
     else:
-        raise ValueError(&#39;invalid type: choose loss, lr, or momentum&#39;)
+        if plot_type not in self.history.history:
+            raise ValueError(f&#39;no {plot_type} in history: are you sure {plot_type} exists in history?&#39;)
+        plt.plot(self.history.history[plot_type])
+
+        val_key = f&#39;val_{plot_type}&#39;
+        if val_key in self.history.history:
+            plt.plot(self.history.history[val_key])
+            legend_items = [&#39;train&#39;, &#39;validation&#39;]
+        else:
+            warnings.warn(f&#39;Validation value for {plot_type} wasn\&#39;t found in history&#39;)
+            legend_items = [&#39;train&#39;]
+
+        plt.title(f&#39;History of {plot_type}&#39;)
+        plt.ylabel(plot_type)
+        plt.xlabel(&#39;epoch&#39;)
+        plt.legend(legend_items, loc=&#39;upper left&#39;)
     fig = plt.gcf()
     plt.show()
     if return_fig: return fig
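The new `else` branch of `Learner.plot` accepts any key in the Keras `History.history` dict and adds the matching `val_` series when one exists. A minimal, matplotlib-free sketch of that selection logic follows; `select_history_series` is a hypothetical name, and `history` is assumed to be a plain dict shaped like `History.history` (metric name mapped to a list of per-epoch values).

```python
import warnings
from typing import Dict, List, Sequence, Tuple


def select_history_series(
    history: Dict[str, Sequence[float]], plot_type: str
) -> Tuple[List[Sequence[float]], List[str]]:
    """Pick the series and legend labels to plot for an arbitrary History key.

    Hypothetical helper mirroring the commit's logic; it does no plotting.
    """
    if not isinstance(plot_type, str):
        raise ValueError("plot_type must be a string")
    if plot_type not in history:
        raise ValueError(f"no {plot_type!r} in history: was it specified as a metric?")

    series = [history[plot_type]]
    legend = ["train"]

    # Keras records validation metrics under the same name prefixed with "val_".
    val_key = f"val_{plot_type}"
    if val_key in history:
        series.append(history[val_key])
        legend.append("validation")
    else:
        warnings.warn(f"Validation value for {plot_type} wasn't found in history")
    return series, legend
```

Each returned series could then be drawn with `plt.plot` and labelled via `plt.legend(legend, loc='upper left')`, as the commit does. Keeping selection separate from plotting makes the error and warning paths testable without a display.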

diff --git a/docs/imports.html b/docs/imports.html
@@ -39,6 +39,9 @@ <h1 class="title">Module <code>ktrain.imports</code></h1>
 os.environ[&#39;NUMEXPR_MAX_THREADS&#39;] = &#39;8&#39; # suppress warning from NumExpr on machines with many CPUs
 
 # TensorFlow
+SUPPRESS_DEP_WARNINGS = strtobool(os.environ.get(&#39;SUPPRESS_DEP_WARNINGS&#39;, &#39;1&#39;))
+if SUPPRESS_DEP_WARNINGS: # 2021-11-12:  copied this here to properly suppress TF/CUDA warnings in Kaggle notebooks, etc. 
+    os.environ[&#34;TF_CPP_MIN_LOG_LEVEL&#34;] = &#34;3&#34;
 DISABLE_V2_BEHAVIOR = strtobool(os.environ.get(&#39;DISABLE_V2_BEHAVIOR&#39;, &#39;0&#39;))
 if DISABLE_V2_BEHAVIOR:
     # TF2-transition
@@ -142,9 +145,23 @@ <h1 class="title">Module <code>ktrain.imports</code></h1>
 ResNet50 = keras.applications.ResNet50
 MobileNet = keras.applications.mobilenet.MobileNet
 InceptionV3 = keras.applications.inception_v3.InceptionV3
+EfficientNetB1 = keras.applications.efficientnet.EfficientNetB1
+EfficientNetB7 = keras.applications.efficientnet.EfficientNetB7
 pre_resnet50 = keras.applications.resnet50.preprocess_input
 pre_mobilenet = keras.applications.mobilenet.preprocess_input
 pre_inception = keras.applications.inception_v3.preprocess_input
+pre_efficientnet = keras.applications.efficientnet.preprocess_input
+
+# for TF backwards compatibility (e.g., support for TF 2.3.x):
+try:
+    MobileNetV3Small = keras.applications.MobileNetV3Small
+    pre_mobilenetv3small = keras.applications.mobilenet_v3.preprocess_input
+    HAS_MOBILENETV3 = True
+except:
+    HAS_MOBILENETV3 = False
+
+
+
 
 
 #----------------------------------------------------------
@@ -277,7 +294,6 @@ <h1 class="title">Module <code>ktrain.imports</code></h1>
 
 
 # Suppress Warnings
-SUPPRESS_DEP_WARNINGS = strtobool(os.environ.get(&#39;SUPPRESS_DEP_WARNINGS&#39;, &#39;1&#39;))
 def set_global_logging_level(level=logging.ERROR, prefices=[&#34;&#34;]):
     &#34;&#34;&#34;
     Override logging levels of different modules based on their name as a prefix.
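This hunk moves the `TF_CPP_MIN_LOG_LEVEL` assignment ahead of the TensorFlow import (TensorFlow's C++ logging generally reads it when the library loads) and guards `MobileNetV3Small` behind a `try` with a bare `except`. Both ideas are sketched below under stated assumptions: `env_flag` is a hypothetical stand-in for `distutils.util.strtobool` that accepts the same spellings (the `distutils` package is deprecated and was removed in Python 3.12), and `optional_attr` is a hypothetical helper that catches only `AttributeError`, so unrelated errors are not silently turned into `HAS_MOBILENETV3 = False`.

```python
import os
from typing import Any, Optional

# Same spellings that distutils.util.strtobool accepts (case-insensitive).
_TRUE = {"y", "yes", "t", "true", "on", "1"}
_FALSE = {"n", "no", "f", "false", "off", "0"}


def env_flag(name: str, default: str = "1") -> bool:
    """Read a boolean environment variable (hypothetical strtobool replacement)."""
    value = os.environ.get(name, default).strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"invalid truth value for {name}: {value!r}")


def optional_attr(module: Any, dotted: str) -> Optional[Any]:
    """Return module.<dotted> if every attribute exists, else None (hypothetical)."""
    obj = module
    for part in dotted.split("."):
        try:
            obj = getattr(obj, part)
        except AttributeError:
            # Only a missing attribute means "feature unavailable".
            return None
    return obj


# Assumed to run before `import tensorflow` so the C++ log level takes effect.
if env_flag("SUPPRESS_DEP_WARNINGS"):
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
```

For example, assuming `keras` is `tensorflow.keras`, `MobileNetV3Small = optional_attr(keras.applications, "MobileNetV3Small")` followed by `HAS_MOBILENETV3 = MobileNetV3Small is not None` would reproduce the commit's flag.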

diff --git a/docs/index.html b/docs/index.html
@@ -479,8 +479,12 @@ <h2 class="section-title" id="header-functions">Functions</h2>
             preproc.datagen.preprocessing_function = pre_resnet50
         elif preproc_name == &#39;mobilenet&#39;:
             preproc.datagen.preprocessing_function = pre_mobilenet
+        elif preproc_name == &#39;mobilenetv3&#39;:
+            preproc.datagen.preprocessing_function = pre_mobilenetv3small
         elif preproc_name == &#39;inception&#39;:
             preproc.datagen.preprocessing_function = pre_inception
+        elif preproc_name == &#39;efficientnet&#39;:
+            preproc.datagen.preprocessing_function = pre_efficientnet
         else:
             raise Exception(&#39;Uknown preprocessing_function name: %s&#39; % (preproc_name))
 

diff --git a/docs/text/preprocessor.html b/docs/text/preprocessor.html
@@ -191,6 +191,8 @@ <h1 class="title">Module <code>ktrain.text.preprocessor</code></h1>
     indices = []
     for i in mb:
         for doc in pb:
+            # https://stackoverflow.com/questions/67360987/bert-model-bug-encountered-during-training/67375675#67375675
+            doc = str(doc) if isinstance(doc, (float, int)) else doc
             ids, segments = tokenizer.encode(doc, max_len=max_length)
             indices.append(ids)
         if verbose: mb.write(&#39;done.&#39;)
@@ -1582,6 +1584,8 @@ <h2 class="section-title" id="header-functions">Functions</h2>
     indices = []
     for i in mb:
         for doc in pb:
+            # https://stackoverflow.com/questions/67360987/bert-model-bug-encountered-during-training/67375675#67375675
+            doc = str(doc) if isinstance(doc, (float, int)) else doc
             ids, segments = tokenizer.encode(doc, max_len=max_length)
             indices.append(ids)
         if verbose: mb.write(&#39;done.&#39;)
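The preprocessor change converts `int` and `float` documents to strings before `tokenizer.encode`. One common way such values appear is a pandas text column in which empty cells are read as `NaN`, which is a float. A hypothetical batch helper (`normalize_docs`, not part of ktrain) applying the same rule is sketched below, with an assumed `nan_as_empty` option, since the commit's rule turns `NaN` into the literal string `'nan'`.

```python
import math
from typing import Any, Iterable, List


def normalize_docs(docs: Iterable[Any], nan_as_empty: bool = False) -> List[Any]:
    """Coerce numeric documents to str before tokenizing (hypothetical helper).

    Mirrors `str(doc) if isinstance(doc, (float, int)) else doc`; bools are
    ints in Python, so True becomes 'True'. With nan_as_empty=True (an assumed
    option), float NaN becomes '' instead of 'nan'.
    """
    out: List[Any] = []
    for doc in docs:
        if nan_as_empty and isinstance(doc, float) and math.isnan(doc):
            out.append("")
        elif isinstance(doc, (float, int)):
            out.append(str(doc))
        else:
            out.append(doc)  # strings and other types pass through unchanged
    return out
```

Whether `'nan'` or an empty string is the better input for the tokenizer depends on the dataset; the default keeps the commit's behavior.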