### Sentiment Analysis 

Overall Research Questions:
1) How do customers define 'good' service, and how does the new script shift those definitions?
2) What aspects of service (clarity, empathy, agent personality) drive variance in sentiment?
3) Does the new script systematically change perceptions or emotional tone, particularly for high-value segments like VOLT?

In addition, specific to sentiment analysis:

- Compare:
    - Sentiment distribution between treatment vs. control
    - Strength of sentiment for VOLT vs. general sample
- What topics co-occur with negative or positive sentiment?
- Any shifts in emotional tone driven by the new script?
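
Once each comment has a sentiment score and magnitude (computed in Step 1 below), the comparisons listed above could be summarized per group along the following lines. This is a minimal sketch rather than the analysis itself: the `GROUP` key, the helper name `summarize_sentiment`, and the toy records are all hypothetical stand-ins for whatever column marks treatment vs. control (or VOLT vs. general) in the real data.

```python
import math
from collections import defaultdict
from statistics import mean, median


def _is_missing(value):
    """True for None or NaN, which is how unscored comments show up."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize_sentiment(records, group_key,
                        score_key="SENTIMENT_SCORE",
                        magnitude_key="SENTIMENT_MAGNITUDE"):
    """Summarize score and magnitude per group, skipping unscored comments."""
    grouped = defaultdict(list)
    for rec in records:
        score, magnitude = rec.get(score_key), rec.get(magnitude_key)
        if _is_missing(score) or _is_missing(magnitude):
            continue
        grouped[rec[group_key]].append((score, magnitude))

    summary = {}
    for group, pairs in grouped.items():
        scores = [s for s, _ in pairs]
        magnitudes = [m for _, m in pairs]
        summary[group] = {
            "n": len(pairs),
            "mean_score": mean(scores),
            "median_score": median(scores),
            "share_negative": sum(s < 0 for s in scores) / len(scores),
            "mean_magnitude": mean(magnitudes),
        }
    return summary


# Toy records with made-up values; "GROUP" is a hypothetical column name.
toy_records = [
    {"GROUP": "treatment", "SENTIMENT_SCORE": 0.8, "SENTIMENT_MAGNITUDE": 0.8},
    {"GROUP": "treatment", "SENTIMENT_SCORE": -0.4, "SENTIMENT_MAGNITUDE": 1.2},
    {"GROUP": "control", "SENTIMENT_SCORE": 0.5, "SENTIMENT_MAGNITUDE": 0.5},
    {"GROUP": "control", "SENTIMENT_SCORE": float("nan"),
     "SENTIMENT_MAGNITUDE": float("nan")},
]
summary = summarize_sentiment(toy_records, group_key="GROUP")
```

With the notebook's DataFrame, `df.to_dict('records')` would produce records in this shape; missing scores arrive there as NaN, which the helper also skips.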

Our comment data has already been cleaned (though some typos remain), so the LTR_COMMENT_CLEAN column should be ready for sentiment analysis.

An appropriate model is Google Cloud's Natural Language API. It is fairly robust to typos and to the nuances of hastily typed, somewhat messy customer feedback, likely because similar text makes up a significant chunk of its training data.

"Google Cloud's Natural Language API is designed to handle natural, conversational text and can automatically detect language and process informal language patterns."

Note that sentiment analysis requires far fewer text cleaning steps than topic modelling, as models such as this one adapt to punctuation, capitalization, and the natural, organic communication patterns in our data.

See [API documentation](https://cloud.google.com/natural-language/docs/analyzing-sentiment)

In [7]:
# Load our data and packages by running the shared setup script
exec(open('../scripts/setup.py').read())

Main dataset loaded: (582, 16)


#### Step 1: Straight run of sentiment using Google Cloud API



In [None]:
# from google.cloud import language_v1
# import pandas as pd
# import numpy as np

# # Instantiate the client. The library will find your credentials automatically.
# client = language_v1.LanguageServiceClient()

# def analyze_sentiment(text):
#     """
#     Analyze sentiment for a given text using Google Cloud Natural Language API
#     Returns tuple of (score, magnitude) or (None, None) for null/empty text
#     """
#     if pd.isna(text) or text is None or str(text).strip() == '':
#         return None, None
    
#     try:
#         # Create the document object
#         document = language_v1.Document(
#             content=str(text),
#             type_=language_v1.Document.Type.PLAIN_TEXT,
#         )
        
#         # Call the API to analyze sentiment
#         sentiment = client.analyze_sentiment(document=document).document_sentiment
        
#         return sentiment.score, sentiment.magnitude
    
#     except Exception as e:
#         print(f"Error analyzing text: {str(text)[:50]}... Error: {e}")
#         return None, None

# # Apply sentiment analysis to the dataframe
# print("Analyzing sentiment for comments...")
# sentiment_results = df['LTR_COMMENT_CLEAN'].apply(analyze_sentiment)

# # Split the results into separate columns
# df['sentiment_score'] = sentiment_results.apply(lambda x: x[0])
# df['sentiment_magnitude'] = sentiment_results.apply(lambda x: x[1])

# print("Sentiment analysis complete. Added columns: sentiment_score, sentiment_magnitude")
# print(f"Non-null sentiment scores: {df['sentiment_score'].notna().sum()}")
# print("Sample results:")
# print(df[['LTR_COMMENT_CLEAN', 'sentiment_score', 'sentiment_magnitude']].head())

# UNCOMMENT ONLY WHEN YOU WANT TO RUN THE API CALL AGAIN

Analyzing sentiment for comments...
Sentiment analysis complete. Added columns: sentiment_score, sentiment_magnitude
Non-null sentiment scores: 428
Sample results:
                                    LTR_COMMENT_CLEAN  sentiment_score  sentiment_magnitude
45                                       Good package              0.8                  0.8
46                         Very good customer service              0.9                  0.9
47  So far so good. Charlie was very efficient and...              0.7                  3.0
48                                Great communication              0.9                  0.9
49  Because Chris was amazing when she contacted m...              0.5                  0.5

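Rather than commenting the cell in and out, the API call could be guarded by a small on-disk cache so that it only runs when no saved result exists. The following is a sketch under assumptions: `load_or_compute`, its `force` flag, and the cache path in the usage comment are hypothetical names, not part of this project or of the Google client library.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute, force=False):
    """Return the cached result if present, otherwise call compute() and cache it.

    compute is any zero-argument callable, e.g. a wrapper around the API loop.
    Pass force=True to ignore the cache and call the API again.
    """
    cache_path = Path(cache_path)
    if cache_path.exists() and not force:
        with cache_path.open("rb") as fh:
            return pickle.load(fh)

    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


# Hypothetical usage with the function defined in the cell above:
# results = load_or_compute(
#     "../data/interim/sentiment_raw.pkl",  # assumed path
#     lambda: [analyze_sentiment(t) for t in df["LTR_COMMENT_CLEAN"]],
# )
```

The design choice here is simply to make re-running the notebook safe by default; an explicit `force=True` replaces the manual uncommenting step.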
In [None]:
# Print every cleaned comment alongside its sentiment score and magnitude
print("\nFull sentiment analysis results:")
for index, row in df.iterrows():
    print(f"Comment: {row['LTR_COMMENT_CLEAN']}")
    print(f"Sentiment Score: {row['sentiment_score']}, Magnitude: {row['sentiment_magnitude']}")
    print("-" * 80)



Full sentiment analysis results:
Comment: Good package
Sentiment Score: 0.800000011920929, Magnitude: 0.800000011920929
--------------------------------------------------------------------------------
Comment: Very good customer service
Sentiment Score: 0.8999999761581421, Magnitude: 0.8999999761581421
--------------------------------------------------------------------------------
Comment: So far so good. Charlie was very efficient and helpful. Let's hope it continues so. I am confident it will
Sentiment Score: 0.699999988079071, Magnitude: 3.0
--------------------------------------------------------------------------------
Comment: Great communication
Sentiment Score: 0.8999999761581421, Magnitude: 0.8999999761581421
--------------------------------------------------------------------------------
Comment: Because Chris was amazing when she contacted me after I had put my details online.
Sentiment Score: 0.5, Magnitude: 0.5
------------------------------------------------------------

Sentiment score: ranges between -1.0 (negative) and 1.0 (positive) and corresponds to the overall emotional leaning of the text.

Magnitude: indicates the overall strength of emotion (both positive and negative) within the given text, between 0.0 and +inf. Unlike score, magnitude is not normalized for documentSentiment; each expression of emotion within the text (both positive and negative) contributes to the text's magnitude, so longer text blocks may have greater magnitudes.
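
Because magnitude grows with text length, one optional way to compare short and long comments is to scale it by word count. This is only a sketch of one possible normalization, not something the API provides: the helper name `magnitude_per_word` and the whitespace-based word count are assumptions.

```python
def magnitude_per_word(text, magnitude):
    """Return magnitude divided by the whitespace-delimited word count.

    Returns None when the text or magnitude is missing or the text is empty.
    """
    if text is None or magnitude is None:
        return None
    n_words = len(str(text).split())
    if n_words == 0:
        return None
    return magnitude / n_words


# Two comments from the results above, with magnitudes rounded as in the sample table.
short_comment = magnitude_per_word("Good package", 0.8)
long_comment = magnitude_per_word(
    "So far so good. Charlie was very efficient and helpful. "
    "Let's hope it continues so. I am confident it will",
    3.0,
)
```

On this scale the long comment no longer dominates purely because it contains more sentences; whether that is desirable depends on whether length itself is treated as a signal of engagement.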

Together, these two measures allow for:
- A binary positive vs. negative encoding
- A continuous sentiment measure
- A more nuanced reading of longer, more informative comments, which likely carry more emotion (i.e. customers who took the time to write a lengthier response)
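
As a rough illustration of how score and magnitude might be combined into a categorical label, consider the sketch below. The cutoffs (`score_cutoff=0.25`, `mixed_magnitude=2.0`) and the function name are assumptions chosen for illustration and would need tuning against the actual comments; the only idea taken from the API's guidance is that a score near zero with a high magnitude suggests mixed rather than neutral sentiment.

```python
import math


def label_sentiment(score, magnitude, score_cutoff=0.25, mixed_magnitude=2.0):
    """Map (score, magnitude) to 'positive', 'negative', 'mixed', or 'neutral'.

    Returns None for comments without a score (None or NaN).
    The default thresholds are illustrative assumptions, not API values.
    """
    if score is None or magnitude is None:
        return None
    if math.isnan(score) or math.isnan(magnitude):
        return None
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Score close to zero: strong emotion in both directions reads as mixed.
    return "mixed" if magnitude >= mixed_magnitude else "neutral"


# (score, magnitude) pairs; the first two come from the sample results above,
# the last two are made-up values to exercise the other branches.
labels = [
    label_sentiment(0.8, 0.8),
    label_sentiment(0.7, 3.0),
    label_sentiment(0.0, 4.0),
    label_sentiment(0.1, 0.0),
]
```

A binary encoding would follow by keeping only the positive and negative labels, while the continuous score remains available for regression-style analyses.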

In [None]:
# Rename the sentiment columns to upper case, matching LTR_COMMENT_CLEAN
df.rename(columns={
    'sentiment_score': 'SENTIMENT_SCORE',
    'sentiment_magnitude': 'SENTIMENT_MAGNITUDE'
}, inplace=True)

In [None]:
# Save the data, now including sentiment scores, as a pickle file
df.to_pickle('../data/processed/sentiment_analysis_results.pkl')