In [42]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob

In [41]:
import nltk

# Download NLTK data packages (preprocess_text below only needs stopwords and wordnet)
nltk.download('punkt')  # Punkt tokenizer models (used by word_tokenize)
nltk.download('stopwords')  # Stopword lists
nltk.download('wordnet')  # WordNet database used by WordNetLemmatizer
nltk.download('averaged_perceptron_tagger')  # POS tagger (lets a lemmatizer use POS tags)
nltk.download('vader_lexicon')  # VADER sentiment lexicon (sentiment below uses TextBlob)
nltk.download('omw-1.4')  # Open Multilingual Wordnet, used alongside WordNet


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\SAMI\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\SAMI\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\SAMI\AppData\Roaming\nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\SAMI\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\SAMI\AppData\Roaming\nltk_data...
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\SAMI\AppData\Roaming\nltk_data...


True

## Text Preprocessing Function

The `preprocess_text` function cleans and normalizes raw text before sentiment analysis. It performs four steps:

1. **Tokenization**: 
   - The input text is lowercased and split on whitespace into tokens. Because `str.split()` is used instead of an NLTK tokenizer, punctuation stays attached to adjacent words.
   
2. **Stopword Removal**:
   - Common words that carry little meaning on their own (stopwords), such as "the" and "is", are removed using NLTK's English stopword list. This reduces noise in the data.

3. **Lemmatization**:
   - Each remaining token is lemmatized with `WordNetLemmatizer`, which maps an inflected form to its dictionary base form (for example, "articles" becomes "article"). Because no part-of-speech tag is passed, every token is treated as a noun, so verb forms such as "reflects" may be left unchanged.

4. **Reconstruction**:
   - The processed tokens are joined back into a single string, which represents the cleaned and preprocessed text.
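To make the order of the four steps concrete, here is a dependency-free sketch. It is not the notebook's implementation: `TOY_STOPWORDS` and `TOY_LEMMAS` are tiny hypothetical stand-ins for NLTK's English stopword list and `WordNetLemmatizer`.

```python
# Hypothetical stand-ins for the NLTK resources, kept tiny for illustration
TOY_STOPWORDS = {"the", "is", "on", "of", "and", "a"}
TOY_LEMMAS = {"articles": "article", "details": "detail"}

def toy_preprocess(text):
    # 1. Tokenization: lowercase, then split on whitespace
    tokens = text.lower().split()
    # 2. Stopword removal
    kept = [t for t in tokens if t not in TOY_STOPWORDS]
    # 3. Lemmatization via dictionary lookup (unknown words pass through unchanged)
    lemmas = [TOY_LEMMAS.get(t, t) for t in kept]
    # 4. Reconstruction: join the tokens back into one string
    return " ".join(lemmas)

print(toy_preprocess("The articles and the details"))  # article detail
```

A real lemmatizer consults a lexical database instead of a fixed table, but the pipeline order is the same.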


In [43]:
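# Build the stopword set and lemmatizer once; a set gives fast membership checks
STOP_WORDS = set(stopwords.words('english'))
LEMMATIZER = WordNetLemmatizer()

def preprocess_text(text):
    # Lowercase and split on whitespace (punctuation stays attached to words)
    tokens = text.lower().split()
    # Drop English stopwords
    filtered_tokens = [token for token in tokens if token not in STOP_WORDS]
    # Lemmatize each remaining token (WordNet treats tokens as nouns by default)
    lemmatized_tokens = [LEMMATIZER.lemmatize(token) for token in filtered_tokens]
    # Rejoin the tokens into a single string
    return ' '.join(lemmatized_tokens)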

## Sentiment Analysis Functions

In this section, we define two functions that use the TextBlob library to score the sentiment of a text. They return the subjectivity and polarity scores, which together describe the text's emotional tone.

### 1. Function to Get Subjectivity
The `getSubjectivity` function returns the subjectivity of the input text, that is, how opinion-based it is. Scores range from 0.0 (very objective) to 1.0 (very subjective); a higher score indicates more subjective text.

### 2. Function to Get Polarity
The `getPolarity` function computes the polarity of the input text, which represents the sentiment's orientation. Polarity scores range from -1 (very negative) to +1 (very positive). A score of 0 indicates a neutral sentiment.
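TextBlob's default sentiment analyzer is lexicon-based: roughly, it looks up scored words (mostly adjectives) and averages their polarity and subjectivity, with extra rules for negation and intensifiers such as "very". The sketch below shows only the core averaging idea; `TOY_LEXICON` and its scores are invented for illustration, and the real analyzer will produce different numbers.

```python
# Hypothetical lexicon: word -> (polarity, subjectivity); values are invented
TOY_LEXICON = {
    "great": (0.75, 0.75),
    "bad": (-0.5, 0.5),
    "clear": (0.25, 0.5),
}

def toy_sentiment(text):
    """Average the lexicon scores of the words found in TOY_LEXICON."""
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not hits:
        # No scored words: treat the text as neutral and fully objective
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

print(toy_sentiment("great but bad"))  # (0.125, 0.625)
```

Because each word's scores stay within [-1, 1] and [0, 1], their averages stay within the same ranges described above.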



In [44]:
# Return the TextBlob subjectivity score (0.0 to 1.0)
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

# Return the TextBlob polarity score (-1.0 to 1.0)
def getPolarity(text):
    return TextBlob(text).sentiment.polarity

## Sentiment Classification Function

The `getSentiment` function maps a polarity score to one of three labels: Negative (score below 0), Neutral (score exactly 0), or Positive (score above 0).


In [54]:
# Map a polarity score to a sentiment label
def getSentiment(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'
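Because `getSentiment` treats only an exact `0` as neutral, any nonzero polarity, even 0.01, is labeled Positive or Negative. If a wider neutral zone is wanted, one option is a tolerance band. The function name `get_sentiment_with_band` and the default `band=0.05` below are hypothetical choices for illustration, not values from TextBlob or this notebook.

```python
def get_sentiment_with_band(score, band=0.05):
    """Label a polarity score, treating scores within [-band, band] as neutral.

    `band` is a hypothetical tolerance; band=0 uses the same thresholds
    as getSentiment.
    """
    if score < -band:
        return 'Negative'
    elif score > band:
        return 'Positive'
    else:
        return 'Neutral'

for s in (-0.3, -0.01, 0.0, 0.01, 0.3):
    print(s, get_sentiment_with_band(s))
```

The right band width depends on the data; a value tuned on labeled examples is preferable to a fixed guess.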

## Opinion Level Classification Function

The `getOpinionLevel` function maps a subjectivity score to one of four opinion levels: Factual (exactly 0), Mostly Factual (above 0 and below 0.5), Opinionated (from 0.5 up to, but not including, 1), and Highly Opinionated (any other value). This shows how subjective or objective the text is.


In [57]:
def getOpinionLevel(score):
    if score == 0:
        return 'Factual'
    elif 0 < score < 0.5:
        return 'Mostly Factual'
    elif 0.5 <= score < 1:
        return 'Opinionated'
    else:
        return 'Highly Opinionated'
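Since TextBlob subjectivity lies in [0.0, 1.0], the final `else` branch of `getOpinionLevel` is reached only for a score of exactly 1.0 (or for out-of-range input such as a negative number). The self-contained check below restates the same thresholds so it runs on its own, then prints the label for a few boundary values.

```python
def getOpinionLevel(score):
    # Same thresholds as the notebook's function, restated so this block runs alone
    if score == 0:
        return 'Factual'
    elif 0 < score < 0.5:
        return 'Mostly Factual'
    elif 0.5 <= score < 1:
        return 'Opinionated'
    else:
        return 'Highly Opinionated'

# Boundary values: 0.5 falls into "Opinionated", only 1.0 reaches the top level
for s in (0.0, 0.25, 0.5, 0.99, 1.0):
    print(f"{s:<5} -> {getOpinionLevel(s)}")
```

In practice, "Highly Opinionated" is therefore a single point rather than a range; widening it (for example, to 0.9 and above) would be a design choice left to the reader.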


In [45]:
comment = '''
    One of the best articles ever written on the topic. 
    It clearly reflects the differences without any unnecessary 
    details and is really to the point. Great job, Shweta!
'''

In [46]:
preprocessed_comment = preprocess_text(comment)

In [48]:
preprocessed_comment

'one best article ever written topic. clearly reflects difference without unnecessary detail really point. great job, shweta!'
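The output shows that whitespace splitting keeps punctuation attached to words (`topic.`, `job,`, `shweta!`). Besides looking untidy, this means a stopword followed by punctuation, such as `it.`, would not match the stopword list. One standard-library fix, sketched below, is to extract words with a regular expression before filtering; the pattern `[a-z']+` is an assumption that suits plain English text. NLTK's `word_tokenize` (imported above) is another option, though it returns punctuation marks as separate tokens that would still need filtering.

```python
import re

def regex_tokenize(text):
    # Lowercase, then keep runs of letters and apostrophes; punctuation is dropped
    return re.findall(r"[a-z']+", text.lower())

print("Great job, Shweta!".lower().split())  # ['great', 'job,', 'shweta!']
print(regex_tokenize("Great job, Shweta!"))  # ['great', 'job', 'shweta']
```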

In [55]:
getSentiment(getPolarity(preprocessed_comment))

'Positive'

In [58]:
getOpinionLevel(getSubjectivity(preprocessed_comment))

'Opinionated'