<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# LinkedIn - Get sentiment, emotion, irony and offensiveness from comments

**Tags:** #linkedin #nlp #transformers #ai #post #comments #naas_drivers #content #snippet #dataframe

**Author:** [Nikolaj Groeneweg](https://www.linkedin.com/in/njgroene/)

## About this template

### What it does

This notebook gets all the comments on a LinkedIn post and runs four classifiers on them: sentiment, emotion, irony and offensiveness.
It classifies each comment and returns the following information:

- is the comment positive, negative or neutral?
- is the comment ironic?
- is the comment offensive?
- does the comment express joy, optimism, anger or sadness?


### References

This template is based on the following work:

F. Barbieri, J. Camacho-Collados, L. Neves and L.E. Anke (2020), *TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification*, CoRR abs/2010.12421. Full paper: https://arxiv.org/abs/2010.12421. Official GitHub repository: https://github.com/cardiffnlp/tweeteval

All credit goes to the above authors; any mistakes are the responsibility of the author of this template.

### Disclaimers

The machine learning models used in this template were trained on datasets of tweets.<br>Details can be found here: https://github.com/cardiffnlp/tweeteval/blob/main/README.md.

These models may be expected to work well on shorter comments, but their output will become less reliable as the length of the text increases. 

The "emotion" classification behaves rather unpredictably on very short comments. For many neutral comments it defaults to "joy", because the model has no neutral category. It is most useful for identifying and filtering negative emotions (sadness) and for spotting interesting comments (optimism).
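
A minimal pandas sketch of that filtering step, assuming the `EMOTION` and `EMOTION_SCORE` columns produced later in this notebook. The `sample` rows are made up for illustration and stand in for the real `df`:

```python
import pandas as pd

# Made-up rows standing in for the enriched `df` (only the columns used here)
sample = pd.DataFrame({
    "TEXT": ["Great news!", "This makes me sad.", "Things will get better.", "Ok"],
    "EMOTION": ["joy", "sadness", "optimism", "joy"],
    "EMOTION_SCORE": [0.91, 0.82, 0.67, 0.48],
})

# Negative emotion: comments that may deserve a closer look, most confident first
sad = sample[sample["EMOTION"] == "sadness"].sort_values("EMOTION_SCORE", ascending=False)

# Optimism: often the more constructive or interesting replies
optimistic = sample[sample["EMOTION"] == "optimism"]

print(sad["TEXT"].tolist())
print(optimistic["TEXT"].tolist())
```

On the real data, replace `sample` with `df`. Rows labelled "joy" are left out on purpose, since that label is the least informative one for short comments.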

In general, caution is recommended when integrating these classifications in any automated decision pipeline.

## Input

### Import libraries

```python
from naas_drivers import linkedin
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import numpy as np
from scipy.special import softmax
```

### Set up LinkedIn

### Get your cookies
<a href='https://www.notion.so/LinkedIn-driver-Get-your-cookies-d20a8e7e508e42af8a5b52e33f3dba75'>How to get your cookies?</a>

```python
LI_AT = 'ENTER_YOUR_COOKIE_HERE'  # EXAMPLE AQFAzQN_PLPR4wAAAXc-FCKmgiMit5FLdY1af3-2
JSESSIONID = 'ENTER_YOUR_JSESSIONID_HERE'  # EXAMPLE ajax:8379907400220387585
```
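
Hard-coding session cookies in a notebook makes them easy to leak when the notebook is shared. A minimal alternative, assuming you export the cookies as environment variables named `LI_AT` and `JSESSIONID` (names chosen for this sketch; `get_linkedin_cookies` is a hypothetical helper):

```python
import os

def get_linkedin_cookies():
    """Read the LinkedIn session cookies from environment variables.

    The variable names LI_AT and JSESSIONID are this sketch's own convention.
    """
    li_at = os.environ.get("LI_AT")
    jsessionid = os.environ.get("JSESSIONID")
    if not li_at or not jsessionid:
        raise RuntimeError("Set the LI_AT and JSESSIONID environment variables first")
    return li_at, jsessionid
```

You would then replace the two assignments above with `LI_AT, JSESSIONID = get_linkedin_cookies()`.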

### Enter post URL

```python
POST_URL = "https://www.linkedin.com/posts/owenminde_with-30-year-fixed-mortgage-rates-moving-activity-6942758528172396545-RiZW"
```

## Model

Get the post comments and return them in a dataframe.<br>Columns are added for the classifier output.

**Available columns after classification:**
- PROFILE_URN : LinkedIn unique profile id
- PROFILE_ID : LinkedIn public profile id
- FIRSTNAME
- LASTNAME
- TEXT
- SENTIMENT
- SENTIMENT_SCORE
- IRONY
- IRONY_SCORE
- OFFENSIVE
- OFFENSIVE_SCORE
- EMOTION
- EMOTION_SCORE
- OCCUPATION
- ACTIVITY_COMMENTS
- ACTIVITY_LIKES
- DISTANCE
- POST_URL

```python
# fetch all comments on the post as a dataframe
df = linkedin.connect(LI_AT, JSESSIONID).post.get_comments(POST_URL)

# add empty columns for the classifier output, starting at position 11
output_columns = [
    "SENTIMENT", "SENTIMENT_SCORE",
    "IRONY", "IRONY_SCORE",
    "OFFENSIVE", "OFFENSIVE_SCORE",
    "EMOTION", "EMOTION_SCORE",
]
for offset, column in enumerate(output_columns):
    df.insert(loc=11 + offset, column=column, value=None)
```

```python
def preprocess(text):
    """Preprocess text to be classified.
    Replaces user tags and URLs with neutral tokens.
    """
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

def classify(text, task, tokenizers, models, task_labels):
    """Classify text with the tokenizer and model of the given task.
    :return: dictionary with the winning label and its score
    """
    text = preprocess(text)
    tokenizer = tokenizers[task]
    model = models[task]
    labels = task_labels[task]
    # truncate long comments to RoBERTa-base's 512-token input limit
    encoded_input = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
    output = model(**encoded_input)
    # convert the logits to probabilities and keep the most likely label
    scores = softmax(output.logits[0].detach().numpy())
    idx = int(np.argmax(scores))
    label = str(labels[idx])
    score = np.round(float(scores[idx]), 4)
    return {"label": label, "score": score}
```
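
The last steps of `classify` (softmax over the logits, then keep the most likely label) can be checked without downloading a model. The sketch below uses made-up logits and a hypothetical `pick_label` helper; `np.argmax` picks the same index as the first element of a descending `np.argsort`, barring exact ties:

```python
import numpy as np
from scipy.special import softmax

def pick_label(logits, labels):
    """Hypothetical helper: softmax the raw logits, return the top label and its score."""
    scores = softmax(np.asarray(logits, dtype=float))
    idx = int(np.argmax(scores))  # same index as np.argsort(scores)[::-1][0]
    return {"label": labels[idx], "score": round(float(scores[idx]), 4)}

# Made-up logits for the sentiment task, in the label order used by this notebook
print(pick_label([-1.2, 0.3, 2.1], ["negative", "neutral", "positive"]))
```

The order of `labels` matters: index `i` of the logits belongs to the `i`-th label, so a misordered list silently produces wrong classifications.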

```python
# selected subset of the available TweetEval tasks
tasks = ["sentiment", "emotion", "irony", "offensive"]
# label names are slightly modified to improve readability;
# their order must match the output order of each model
labels = {
    "sentiment": ['negative', 'neutral', 'positive'],
    "emotion": ['anger', 'joy', 'optimism', 'sadness'],
    "irony": ['not-ironic', 'ironic'],
    "offensive": ['not-offensive', 'offensive'],
}
# models and tokenizers are loaded from Hugging Face
models = {}
tokenizers = {}

# run each classification and enrich the dataframe
for task in tasks:
    MODEL = f"cardiffnlp/twitter-roberta-base-{task}"
    # on the first run, the tokenizer and model are downloaded from Hugging Face
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL)

    # save them to a local folder with the same name, so later runs load from disk
    tokenizer.save_pretrained(MODEL)
    model.save_pretrained(MODEL)

    models[task] = model
    tokenizers[task] = tokenizer

    # execution time is not a concern, so .apply() classifies the comments one by one
    result = df["TEXT"].apply(classify, args=(task, tokenizers, models, labels))

    # store the winning label and its score in separate columns
    df[task.upper()] = [d['label'] for d in result]
    df[task.upper() + "_SCORE"] = [d['score'] for d in result]
```
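
`classify` only keeps the winning label, but each model produces a probability for every label. If you want the full distribution in the dataframe, a sketch like the following could add one column per label. It assumes a variant of `classify` that returns the raw logits (the notebook's version does not); `scores_to_columns` and the sample logits are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy.special import softmax

def scores_to_columns(df, task, logits, labels):
    """Add one probability column per label, e.g. SENTIMENT_NEGATIVE.

    `logits` holds one raw score vector per row of `df`, in the same order.
    """
    probs = softmax(np.asarray(logits, dtype=float), axis=1)  # normalise each row
    for i, label in enumerate(labels):
        col = f"{task.upper()}_{label.upper().replace('-', '_')}"
        df[col] = np.round(probs[:, i], 4)
    return df

# Made-up logits for two comments; a real run would take them from model(...).logits
sample = pd.DataFrame({"TEXT": ["Great post!", "I disagree."]})
logits = [[-1.5, 0.2, 2.0], [1.8, 0.4, -1.0]]
scores_to_columns(sample, "sentiment", logits, ["negative", "neutral", "positive"])
print(sample)
```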

## Output

### Display result

```python
# show only the comment text and the classification output
for _, row in df.iterrows():
    print(
        f"{row['TEXT']}\n"
        f"\t{row['SENTIMENT']}({row['SENTIMENT_SCORE']})\n"
        f"\t{row['IRONY']}({row['IRONY_SCORE']})\n"
        f"\t{row['OFFENSIVE']}({row['OFFENSIVE_SCORE']})\n"
        f"\t{row['EMOTION']}({row['EMOTION_SCORE']})\n"
        "\t\n\n"
    )
```

it's still time for Europeans to catch the boat 🛥
	neutral(0.731)
	not-ironic(0.8926)
	not-offensive(0.8844)
	joy(0.4769)
	


😀 I like that analogy. I got a 5%, 25 year fixed rate circa 2008 and thought it was the best financial decision I had ever made (remembering the scars of 15% in 1990.)  I took me about 10 years to abandon ship.   Still, if I went back in time with the knowledge I had at the then I would still make the same mistake again. If I were a first time buyer starting out now, I would jump at the chance of getting a 30 year fixed rate mortgage at 3% and borrow as much as I could possibly afford.
	neutral(0.4571)
	not-ironic(0.671)
	not-offensive(0.8953)
	sadness(0.5962)
	


Unfortunately it doesn’t usually work that way.  Prices will react since cash flow is what matters, not last traded price, when applying for a loan.  If the next buyer can only pay 20% less, you now can’t sell.  But life happens.  The world isn’t made up of stable, life time Employment IBM jobs with 2.