sentiment analysis

amaiya · Apr 21, 2023 · 05a6109
1 parent ca40637

Showing 9 changed files with 211 additions and 4 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,10 +6,10 @@ Most recent releases are shown at the top. Each release shows:
 - **Changed**: Additional parameters, changes to inputs or outputs, etc
 - **Fixed**: Bug fixes that don't change documented behaviour
 
-## 0.35.2 (TBD)
+## 0.36.dev (TBD)
 
 ### new:
-- N/A
+- easy-to-use wrapper for sentiment analysis
 
 ### changed
 - N/A

diff --git a/README.md b/README.md
@@ -12,6 +12,16 @@
 
 
 ### News and Announcements
+- **2023-04-21**
+  - **ktrain 0.36.x** is released and supports **Sentiment Analysis**. See the [example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/text/sentiment_analysis_example.ipynb) for more information. 
+```python
+# Example: Sentiment Analysis
+from ktrain.text.sentiment import SentimentAnalyzer
+classifier = SentimentAnalyzer()
+result = classifier.predict('I got a promotion today.')
+# OUTPUT:
+# {'POSITIVE': 0.9021117091178894}
+```
 - **2023-04-01**
   - **ktrain 0.35.x** is released and supports **Generative AI** using an instruction-fine-tuned version of GPT-J that can run on your own machine.  See the [example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/text/generative_ai_example.ipynb) for more information. Supply prompts in the form of instructions for what you want the model to do:
 ```python
@@ -378,6 +388,7 @@ can be used out-of-the-box **without** having TensorFlow installed, as summarize
 | [Speech Transcription](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/text/speech_transcription_example.ipynb) (pretrained)     |  ❌  | ✅  |❌   |
 | [Image Captioning](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/vision/image_captioning_example.ipynb) (pretrained)     |  ❌  | ✅  |❌   |
 | [Object Detection](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/vision/object_detection_example.ipynb) (pretrained)     |  ❌  | ✅  |❌   |
+| [Sentiment Analysis](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/text/sentiment_analysis_example.ipynb) (pretrained)     |  ❌  | ✅  |❌   |
 | [Topic Modeling](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-05-learning_from_unlabeled_text_data.ipynb) (sklearn)  |  ❌  | ❌  | ✅  |
 | [Keyphrase Extraction](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/text/keyword_extraction_example.ipynb) (textblob/nltk/sklearn)   |  ❌  | ❌  | ✅  |
 

diff --git a/examples/README.md b/examples/README.md
@@ -19,6 +19,7 @@ This directory contains various example notebooks using *ktrain*.  The directory
   - [Universal Information Extraction](#extraction): an example of using a Question-Answering model for information extraction
   - [Keyphrase Extraction](#kwextraction): an example of keyphrase extraction in **ktrain**
   - [Indonesian Text Examples](#indonesian):  examples such as zero-Shot text classification and question-answering on Indonesian text by [Sandy Khosasi](https://github.com/ilos-vigil)
+  - [Sentiment Analysis Examples](#sentiment): simple-to-use sentiment analysis
   - [Generative AI Examples](#generativeai):  provide instructions to a language model running on your own machine to solve various tasks
 - `vision`:  
   - [image classification](#imageclass):  image classification examples using various models and datasets
@@ -155,6 +156,7 @@ The objective of the CoNLL2003 task is to classify sequences of words as belongi
 ### <a name="extraction"></a>Universal Information Extraction: [qa_information_extraction.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text)
 ### <a name="kwextraction"></a>Keyphrase Extraction: [keyword_extraction_example.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text)
 ### <a name="indonesian"></a> [Indonesian NLP examples by Sandy Khosasi](https://github.com/ilos-vigil/ktrain-assessment-study) including Indonesian question-answering, emotion recognition, and document similarity
+### <a name="sentiment"></a> Sentiment Analysis:  [sentiment_analysis_example.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text/)
 ### <a name="generativeai"></a> Generative AI Using GPT:  [generative_ai_example.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text/generative_ai_example.ipynb)
 
 

diff --git a/examples/text/sentiment_analysis_example.ipynb b/examples/text/sentiment_analysis_example.ipynb
@@ -0,0 +1,102 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%reload_ext autoreload\n",
+    "%autoreload 2\n",
+    "%matplotlib inline\n",
+    "import os\n",
+    "os.environ[\"CUDA_DEVICE_ORDER\"]=\"PCI_BUS_ID\";\n",
+    "os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"; "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ktrain.text.sentiment import SentimentAnalyzer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "classifier = SentimentAnalyzer()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "texts = [\"The lower pollen count has provided some relief from my allergies.\", \n",
+    "         \"It looks like there will be cost overruns.\",\n",
+    "         \"I will be at a doctor's appointment at 3:30pm.\",\n",
+    "         \"Tesla stock is falling.\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[{'POSITIVE': 0.8364812731742859},\n",
+       " {'NEGATIVE': 0.7623286247253418},\n",
+       " {'NEUTRAL': 0.9303346276283264},\n",
+       " {'NEGATIVE': 0.7317317724227905}]"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "classifier.predict(texts)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'POSITIVE': 0.9378765821456909,\n",
+       " 'NEUTRAL': 0.06050467491149902,\n",
+       " 'NEGATIVE': 0.0016188238514587283}"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "classifier.predict(\"I got a promotion at work today.\", return_all_scores=True)    "
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "python3",
+   "language": "python",
+   "name": "python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
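With `return_all_scores=True`, the last notebook cell returns a single dictionary holding all three labels. A minimal sketch of reducing such a dictionary to its top label is shown here; `top_label` is a hypothetical helper that is not part of ktrain, and the scores are copied from the notebook output in this commit.

```python
# Hypothetical helper (not part of ktrain): pick the highest-scoring label
# from the dict returned by predict(..., return_all_scores=True).
def top_label(scores: dict) -> tuple:
    """Return (label, score) for the largest score in a {'LABEL': score} dict."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Scores copied from the notebook output in this commit.
scores = {
    "POSITIVE": 0.9378765821456909,
    "NEUTRAL": 0.06050467491149902,
    "NEGATIVE": 0.0016188238514587283,
}
label, score = top_label(scores)
print(label, round(score, 3))  # POSITIVE 0.938
```

The same helper also works on the single-label dictionaries returned by default, since those contain exactly one key.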
diff --git a/ktrain/text/sentiment/__init__.py b/ktrain/text/sentiment/__init__.py
@@ -0,0 +1 @@
+from .core import SentimentAnalyzer
diff --git a/ktrain/text/sentiment/core.py b/ktrain/text/sentiment/core.py
@@ -0,0 +1,81 @@
+from typing import Union
+from transformers import pipeline
+
+from ... import utils as U
+from ...torch_base import TorchBase
+
+
+class SentimentAnalyzer(TorchBase):
+    """
+    interface to Sentiment Analyzer
+    """
+
+    def __init__(self, device=None, **kwargs):
+        """
+        ```
+        SentimentAnalyzer constructor
+
+        Args:
+          device(str): device to use (e.g., 'cuda', 'cpu')
+        ```
+        """
+
+        super().__init__(
+            device=device, quantize=False, min_transformers_version="4.12.3"
+        )
+        self.pipeline = pipeline(
+            "text-classification",
+            model="cardiffnlp/twitter-roberta-base-sentiment",
+            device=self.device_to_id(),
+            **kwargs
+        )
+        self.mapping = {
+            "LABEL_0": "NEGATIVE",
+            "LABEL_1": "NEUTRAL",
+            "LABEL_2": "POSITIVE",
+        }
+
+    def predict(
+        self,
+        texts: Union[str, list],
+        return_all_scores=False,
+        batch_size=U.DEFAULT_BS,
+        **kwargs
+    ):
+        """
+        ```
+        Performs sentiment analysis
+
+        This method accepts a single text or a list of texts and predicts the sentiment of each as 'NEGATIVE', 'NEUTRAL', or 'POSITIVE'.
+        Args:
+            texts: str|list
+            return_all_scores(bool): If True, return all labels/scores
+            batch_size: size of batches sent to model
+        Returns:
+            A dictionary of labels and scores if texts is a str; otherwise, a list of such dictionaries
+
+        ```
+        """
+        str_input = isinstance(texts, str)
+        if str_input:
+            texts = [texts]
+        chunks = U.batchify(texts, batch_size)
+        results = []
+        for chunk in chunks:
+            preds = self.pipeline(
+                chunk, top_k=len(self.mapping) if return_all_scores else 1, **kwargs
+            )
+            results.extend(preds)
+        results = [self._flatten_prediction(pred) for pred in results]
+        return results[0] if str_input else results
+
+    def _flatten_prediction(self, prediction: list):
+        """
+        ```
+        flatten prediction to the form {'label':score}
+        ```
+        """
+        return_dict = {}
+        for d in prediction:
+            return_dict[self.mapping[d["label"]]] = d["score"]
+        return return_dict
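The control flow of `predict` can be exercised without downloading the model. The sketch below rests on stated assumptions: `fake_pipeline` and `batchify` are hypothetical stand-ins for the Hugging Face pipeline and for `U.batchify` (whose real implementation is not shown in this diff), and the scores are invented for illustration. It walks through the batching, the `LABEL_n` to name mapping, and the str-versus-list return shape.

```python
# Sketch only: fake_pipeline and batchify are hypothetical stand-ins for the
# transformers pipeline and ktrain's U.batchify; the scores are invented.
MAPPING = {"LABEL_0": "NEGATIVE", "LABEL_1": "NEUTRAL", "LABEL_2": "POSITIVE"}


def batchify(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]


def fake_pipeline(chunk, top_k=1):
    """Return, per text, a score-sorted list of {'label', 'score'} dicts."""
    ranked = [
        {"label": "LABEL_2", "score": 0.7},
        {"label": "LABEL_1", "score": 0.2},
        {"label": "LABEL_0", "score": 0.1},
    ]
    return [ranked[:top_k] for _ in chunk]


def flatten(prediction):
    """Convert [{'label': 'LABEL_2', 'score': s}, ...] to {'POSITIVE': s, ...}."""
    return {MAPPING[d["label"]]: d["score"] for d in prediction}


def predict(texts, return_all_scores=False, batch_size=2):
    str_input = isinstance(texts, str)
    if str_input:
        texts = [texts]
    results = []
    for chunk in batchify(texts, batch_size):
        top_k = len(MAPPING) if return_all_scores else 1
        results.extend(fake_pipeline(chunk, top_k=top_k))
    results = [flatten(pred) for pred in results]
    # A single string yields one dict; a list yields a list of dicts.
    return results[0] if str_input else results


print(predict("one text"))
print(predict(["a", "b", "c"], return_all_scores=True)[0])
```

Wrapping a lone string in a list and unwrapping it at the end keeps a single code path for both input types, which is the same approach the committed method takes.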
diff --git a/ktrain/version.py b/ktrain/version.py
@@ -1,2 +1,2 @@
 __all__ = ["__version__"]
-__version__ = "0.35.2"
+__version__ = "0.36.dev"
diff --git a/ktrain/vision/object_detection/core.py b/ktrain/vision/object_detection/core.py
@@ -12,7 +12,7 @@ class ObjectDetector(TorchBase):
     def __init__(self, device=None, classification=False, threshold=0.9):
         """
         ```
-        ImageCaptioner constructor
+        Object detection constructor
 
         Args:
           device(str): device to use (e.g., 'cuda', 'cpu')

diff --git a/tests/resources/extra_tests/testrun_ptmodels.py b/tests/resources/extra_tests/testrun_ptmodels.py
@@ -301,3 +301,13 @@
 result = ic.caption(ifiles)
 print(time.time() - start)
 print(result)
+
+
+# sentiment-analysis
+from ktrain.text.sentiment import SentimentAnalyzer
+
+classifier = SentimentAnalyzer()
+start = time.time()
+result = classifier.predict("I got a promotion today.")
+print(time.time() - start)
+print(result)