In [1]:
import pandas as pd

In [None]:
""" Customer Remarks:

Companies collect customer feedback, which can be positive or negative.
These remarks are used to understand customer sentiment and to identify advocates (loyal customers who promote the company).

Natural Language Processing (NLP):

NLP is a field of AI that helps computers understand human language.
In this context, NLP is used to analyze customer remarks and determine their sentiment (positive, negative, or neutral).


Steps in Python:

Loading Data:

Customer remarks are loaded from a CSV file into a pandas DataFrame for analysis.

Tokenizing Text:

Tokenization is the process of breaking down text into individual words or tokens.
Using nltk (Natural Language Toolkit), we tokenize each customer remark so that every word can be analyzed; nltk.word_tokenize also keeps punctuation marks such as '.' as separate tokens.

Vectorizing Text:

Convert the text data into numerical data using CountVectorizer from scikit-learn.
This step creates a sparse matrix of token counts, with one row per remark and one column per distinct token, so the frequency of each word can be analyzed.

Sentiment Analysis:

Use the vaderSentiment package to analyze the sentiment of the remarks.
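The SentimentIntensityAnalyzer returns four scores for each remark: the neg, neu, and pos proportions, plus a compound score between -1 and 1 that summarizes whether the remark is positive, negative, or neutral overall.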


Example:
Customer Remark: "This is the best bank on the planet."

Sentiment: Positive

Customer Remark: "Lots of changes to their savings account product. It is terrible."

Sentiment: Negative


Identifying Advocates:
By analyzing the sentiment scores of customer remarks, you can identify which customers are advocates (those with consistently positive remarks).
This information can be used to engage with these advocates and leverage their positive feedback to promote the company.

Practical Application:
In your role, you can use these techniques to analyze feedback from stakeholders or members of your organization.
This can help you identify key supporters and understand areas for improvement.
"""

In [2]:
df = pd.read_csv("data/ci-data.csv")

In [3]:
df.remarks[0:5]

0    In hac habitasse platea dictumst. Etiam faucib...
1    Praesent blandit. Nam nulla. Integer pede just...
2    Praesent id massa id nisl venenatis lacinia. A...
3    In hac habitasse platea dictumst. Morbi vestib...
4    Pellentesque at nulla. Suspendisse potenti. Cr...
Name: remarks, dtype: object

In [4]:
remarks = ['This is the best bank on the planet.',
           'Lots of changes to their savings account product. It is terrible.',
           'The new app takes some getting used to but it is good once you learn it']

In [5]:
from sklearn.feature_extraction.text import CountVectorizer
import nltk
# Tokenizer models used by nltk.word_tokenize; newer NLTK releases may need 'punkt_tab' instead.
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/chrisdallavilla/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [7]:
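# Build a vectorizer that lowercases each remark and tokenizes it with NLTK (min_df=1 keeps every token).
remarks_token_counts = CountVectorizer(min_df=1, tokenizer=nltk.word_tokenize)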

In [8]:
# Learn the vocabulary and return a sparse count matrix: one row per remark, one column per token.
remarks_as_sparse_vector = remarks_token_counts.fit_transform(remarks)

In [9]:
remarks_token_counts.vocabulary_

{'this': 25,
 'is': 9,
 'the': 23,
 'best': 4,
 'bank': 3,
 'on': 15,
 'planet': 17,
 '.': 0,
 'lots': 12,
 'of': 14,
 'changes': 6,
 'to': 26,
 'their': 24,
 'savings': 19,
 'account': 1,
 'product': 18,
 'it': 10,
 'terrible': 22,
 'new': 13,
 'app': 2,
 'takes': 21,
 'some': 20,
 'getting': 7,
 'used': 27,
 'but': 5,
 'good': 8,
 'once': 16,
 'you': 28,
 'learn': 11}
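
To make the mapping above concrete, here is a rough standard-library sketch of what the vectorizer is doing: each token gets a column index in sorted order, and each remark becomes a row of counts. The simple_tokenize regex is only an assumed stand-in for nltk.word_tokenize (it happens to give the same tokens for these three remarks), and the dense list-of-lists replaces the sparse matrix that CountVectorizer actually returns.

```python
import re
from collections import Counter

remarks = ['This is the best bank on the planet.',
           'Lots of changes to their savings account product. It is terrible.',
           'The new app takes some getting used to but it is good once you learn it']


def simple_tokenize(text):
    """Lowercase, then split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


tokenized = [simple_tokenize(remark) for remark in remarks]

# Column index for each distinct token, assigned in sorted order.
all_tokens = sorted({token for tokens in tokenized for token in tokens})
vocabulary = {token: index for index, token in enumerate(all_tokens)}


def count_row(tokens):
    """Return one row of the count matrix for a tokenized remark."""
    row = [0] * len(vocabulary)
    for token, count in Counter(tokens).items():
        row[vocabulary[token]] = count
    return row


count_matrix = [count_row(tokens) for tokens in tokenized]

print(vocabulary["this"], vocabulary["."])
print(count_matrix[0][vocabulary["the"]])
```

The word "the" occurs twice in the first remark, so its column in the first row holds 2, while most other entries are 0. That sparsity is why scikit-learn stores the result as a sparse matrix.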

In [10]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [11]:
analyser = SentimentIntensityAnalyzer()

In [12]:
def sentiment_analyser_scores(sentence):
    """Print the sentence followed by its VADER scores (neg, neu, pos, compound)."""
    score = analyser.polarity_scores(sentence)
    print(f"{sentence}{score}")

In [20]:
sentiment_analyser_scores("Best!!!!")

Best!!!!{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
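
The scores dictionary can be reduced to a single positive / negative / neutral label by thresholding the compound value. The sketch below uses the cutoffs of +0.05 and -0.05 that the VADER documentation suggests as typical; the label_from_compound helper and its threshold parameter are hypothetical names introduced here, and the cutoff may need adjusting for a particular dataset.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    threshold is an assumed cutoff; 0.05 follows the commonly suggested default.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# The compound score printed above is 0.0, which falls in the neutral band.
print(label_from_compound(0.0))
```

In practice the function would be applied to analyser.polarity_scores(sentence)["compound"] for each remark.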
