<a href="https://colab.research.google.com/github/Maksym-Tymchenko/johnsnow/blob/main/CLASS_FOR_SENTIMENT_DETECTION_USING_SNOW_LABS_PIPELINES.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Class for Sentiment Analysis of News Articles**

## 1. Colab Setup

In [1]:
# Install PySpark and Spark NLP
! pip install -q pyspark==3.1.2 spark-nlp

# Install Spark NLP Display lib
! pip install --upgrade -q spark-nlp-display


In [2]:
import pandas as pd
import pyspark.sql.functions as F
import sparknlp
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
from sparknlp.annotator import *
from sparknlp.base import *
from sparknlp.pretrained import PretrainedPipeline
from tabulate import tabulate

spark = sparknlp.start()

print("Spark NLP version: ", sparknlp.version())
print("Apache Spark version: ", spark.version)

Spark NLP version:  3.4.0
Apache Spark version:  3.1.2


## Define a News Article

In [44]:
article = [ # two strings - headline & article body
"""Google sued in US over 'deceptive' location tracking""", # headline
"""Google is being sued in the US over accusations it deceived people about how to control location tracking.

The legal action refers to a widely reported 2018 revelation turning off one location-tracking setting in its apps was insufficient to fully disable the feature.

It accuses Google of using so-called dark patterns, marketing techniques that deliberately confuse.

Google said the claims were inaccurate and outdated.

'Unfair practices'
The legal action was filed in the District of Columbia. Similar ones were also filed in Texas, Indiana and Washington state.

It refers to an Associated Press revelation turning off Location History when using Google Maps or Search was insufficient - as a separate setting, Web and App Activity, continued to log location and other personal data.

The study, with researchers at Princeton University, found up to two billion Android and Apple devices could be affected.

"Google has relied on, and continues to rely on, deceptive and unfair practices that make it difficult for users to decline location tracking or to evaluate the data collection and processing to which they are purportedly consenting," the legal action alleges.

'Robust controls'
Google told BBC News the case was based "on inaccurate claims and outdated assertions about our settings".

A representative added: "We have always built privacy features into our products and provided robust controls for location data.

"We will vigorously defend ourselves and set the record straight."

Visual misdirection
The legal action claims Google's policies contained other "misleading, ambiguous and incomplete descriptions... but guarantee that consumers will not understand when their location is collected and retained by Google or for what purposes".

It refers to dark patterns, design choices that alter users' decision-making for the designer's benefit - such as, complicated navigation menus, visual misdirection, confusing wording and repeated nudging towards a particular outcome.

Data regulators are increasingly focusing on these practices.

Google faces a raft of other legal actions in the US, including:

In May 2020, Arizona filed a legal action over the same issue
In December 2020, multiple US states sued over the price and process of advertising auctions
In October 2020, the US Justice Department alleged Google had a monopoly over search and search advertising"""]



## Define the Sentiment Identification Class

In [53]:
class SentimentIdentification:

    def __init__(self, MODEL_NAME):
        """Creates a sentiment identifier backed by the specified model.

        Args:
          MODEL_NAME: Name of the Spark NLP pretrained pipeline.
        """

        # Download (on first use) and load the pretrained English pipeline
        self.MODEL_NAME = MODEL_NAME
        self.pipeline_model = PretrainedPipeline(self.MODEL_NAME, lang="en")


    def predict(self, text):
        """Predicts the sentiment of the input string and prints the result.

        Args:
          text: String to classify.
        """
        self.text = text

        # Annotate the input text; the result is a dict of lists keyed by output column
        annotations = self.pipeline_model.annotate(self.text)
        print(f"{annotations['sentiment']} {annotations['document']}")
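
The `predict` method above prints the annotation rather than returning it, so the label cannot be reused downstream. As a minimal sketch, the helper below pulls the first label out of a dictionary shaped like the one printed in this notebook (a `sentiment` key holding a list of strings). The name `extract_label` and the `sample` dictionary are hypothetical, introduced only for illustration; the sample stands in for the result of `pipeline_model.annotate(text)`, so the block runs without Spark.

```python
from typing import Dict, List, Optional


def extract_label(annotations: Dict[str, List[str]], key: str = "sentiment") -> Optional[str]:
    """Return the first label stored under `key`, or None if there is none."""
    labels = annotations.get(key) or []
    return labels[0] if labels else None


# Hypothetical stand-in for the dict returned by pipeline_model.annotate(text);
# the shape is assumed from the output printed in this notebook.
sample = {
    "sentiment": ["neg"],
    "document": ["Google sued in US over 'deceptive' location tracking"],
}

label = extract_label(sample)
print(label)
```

Returning `None` for a missing or empty key keeps the caller in charge of how to handle text the pipeline could not label.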

## Classify the Article Using the analyze_sentimentdl_glove_imdb Pipeline

In [54]:
identifier = SentimentIdentification(MODEL_NAME="analyze_sentimentdl_glove_imdb")

# Predict by headline
headline = article[0]
identifier.predict(headline)

# Predict by body
body = article[1]
identifier.predict(body)

analyze_sentimentdl_glove_imdb download started this may take some time.
Approx size to download 155.3 MB
[OK!]
['neg'] ["Google sued in US over 'deceptive' location tracking"]
['neg'] ['Google is being sued in the US over accusations it deceived people about how to control location tracking.\n\nThe legal action refers to a widely reported 2018 revelation turning off one location-tracking setting in its apps was insufficient to fully disable the feature.\n\nIt accuses Google of using so-called dark patterns, marketing techniques that deliberately confuse.\n\nGoogle said the claims were inaccurate and outdated.\n\n\'Unfair practices\'\nThe legal action was filed in the District of Columbia. Similar ones were also filed in Texas, Indiana and Washington state.\n\nIt refers to an Associated Press revelation turning off Location History when using Google Maps or Search was insufficient - as a separate setting, Web and App Activity, continued to log location and other personal data.\n\nThe s