# Cognitive Modeling of News Consumption in Social Media

In [None]:
!pip install -U speedyibl

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting speedyibl
  Downloading speedyibl-0.2.3-py3-none-any.whl (13 kB)
Installing collected packages: speedyibl
Successfully installed speedyibl-0.2.3


In [None]:
from speedyibl import Agent
import random 
import numpy as np
import matplotlib.pyplot as plt
from collections import deque

In [None]:
import re
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
import nltk

In [None]:
wpt = nltk.WordPunctTokenizer()

In [None]:
nltk.download('stopwords')
stop_words = nltk.corpus.stopwords.words('english')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
vectorizer.get_feature_names_out()


print(X.shape)

(4, 9)


In [None]:
vectorizer.get_feature_names_out()

array(['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third',
       'this'], dtype=object)

The objective here is to build out a model for the process by which beliefs are updated when consuming news through social media feeds. We use data from our initial behavioral experiment that studied the effect of frequency of corrections on misinformation belief.

The idea is that the Agent has a set of beliefs for each category of news. Each belief may be closer to or farther from the truth. When they read the news, they assess the truthfulness of the claim in a way that aligns with their beliefs; we call this assessment their response. Depending on their response, their beliefs move closer to or farther from the truth. For example, a user starts with an initial belief of 0.5 in the news category of politics. They are faced with a claim that is not the truth (`label = 1`), but they respond that the claim is the truth (`correct_assessment = 0`). Following this, their belief in the `news_category` of politics is updated as follows: 
`self.belief['politics'] = (self.belief['politics'] + correct_assessment*(1 - label))/2`
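
As a quick check of the rule above, the following minimal sketch applies it to the example user. The function name `update_belief` is hypothetical and not part of the notebook's code; it only restates the formula.

```python
def update_belief(belief, correct_assessment, label):
    """Average the current belief with correct_assessment * (1 - label)."""
    return (belief + correct_assessment * (1 - label)) / 2


# Example from the text: belief 0.5, false claim (label = 1), wrong response.
belief = {"politics": 0.5}
belief["politics"] = update_belief(belief["politics"], correct_assessment=0, label=1)
print(belief["politics"])  # 0.25
```

Under this rule the added term is zero whenever `label = 1`, so the belief is halved on every false claim regardless of the response; that may or may not be the intended behavior.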

Occasionally, users get feedback on their assessment in our experiment interface. When they do, they try to reconcile their belief with the feedback. Whether they do so depends on how much they trust the fact-checking organization (modeled here as a binary variable: 0 for no trust, 1 for trust). If they don't trust the organization, they will not reconsider their beliefs. If they do trust the organization, they will revise their beliefs in that news category. Since belief revision is not an instantaneous process, feedback that contradicts their belief causes them to revise their beliefs in that category only a little:

`self.belief['politics'] = (self.belief['politics'] + 1 - label)/2`

**Consider using a Q-learning approach instead of a direct average.**

When they don't trust the fact-checking organization, their beliefs do not change when they receive feedback.
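
The trust-gated revision can be sketched as follows. The helper `revise_on_feedback` and its `learning_rate` parameter are assumptions made for illustration. With `learning_rate = 0.5` the update reduces to the direct average shown above, while smaller values give a more gradual, Q-learning-style revision.

```python
def revise_on_feedback(belief, label, trust, learning_rate=0.5):
    """Move belief toward the feedback target (1 - label) only when trust == 1."""
    if trust == 0:
        return belief  # no trust: feedback is ignored
    target = 1 - label
    return belief + learning_rate * (target - belief)


print(revise_on_feedback(0.5, label=1, trust=1))  # 0.25, same as (0.5 + 1 - 1) / 2
print(revise_on_feedback(0.5, label=1, trust=0))  # 0.5, belief unchanged
print(revise_on_feedback(0.5, label=0, trust=1, learning_rate=0.25))  # 0.625
```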

### Rewards

When there is no trust:

- What kind of reward would I get if I don't trust the organization and I get positive or negative rewards?
- Experiment with zero rewards, positive rewards, and negative rewards.

When there is trust:

- Positive reward when the assessment is correct
- Negative reward otherwise
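
The no-trust variants listed above could be compared with a small reward factory. This is only a sketch of one reading of that experiment: `make_reward_fn` and its parameters are hypothetical, and the magnitudes are placeholder values.

```python
def make_reward_fn(no_trust_correct, no_trust_incorrect,
                   trust_correct=9, trust_incorrect=-9):
    """Build a reward function whose no-trust payoffs are configurable."""
    def reward(trust, correct_assessment):
        if trust == 0:
            return no_trust_correct if correct_assessment else no_trust_incorrect
        return trust_correct if correct_assessment else trust_incorrect
    return reward


# Three candidate schemes for a participant who does not trust the fact-checker.
schemes = {
    "zero": make_reward_fn(0, 0),
    "positive": make_reward_fn(3, 3),
    "negative": make_reward_fn(-3, -3),
}
for name, fn in schemes.items():
    # Columns: no trust/correct, no trust/incorrect, trust/correct, trust/incorrect
    print(name, fn(0, True), fn(0, False), fn(1, True), fn(1, False))
```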

## Actions or Assessments

Try using actions or assessments

In [None]:
all_claims = ["Majority of lawmakers in 116th Congress are millionaires",
"Launch Directional Robot Intelligent Circuitry (LDRIC) golfing robot hits a hole-in-one at TPC Scottsdale golf club.",
"Once you’ve had the novel Coronavirus, you are immune; masks & social distancing do not affect coronavirus transmission; This lockdown was a mistake.",
"Latest on spread. There is growing evidence that droplets and airborne particles can remain suspended in the air &be breathed in by others, and travel distances beyond 6 feet - In general, indoor environments without good ventilation increase risk.",
"GM has pledged to stop making gasoline-powered passenger cars, vans and sport utility vehicles by 2035.",
"To prevent and increase defenses against the #CoronaVirus during the #quarantine, colloidal silver is an #antibiotic with the natural ability to kill fungi, viruses, bacteria and parasites",
"A man with 'Adolf Hitler' in his name was elected to a councilor seat in Namibia.",
"Chronic Traumatic Encephelpathy (CTE) maybe seen in large numbers in players that participated in football for many years. But the exact risk and exact number is not known.",
"The American Medical Association rescinded a previous statement and now says hydroxychloroquine is okay for COVID-19, after Biden was elected.",
"I coughed on two petri dishes, one while wearing a mask and one not. Shocking! The one on the left with no growth is the one I used a mask for! The one on the right where I didn’t wear a mask is full of bacteria. So yeah, maybe cloth masks don’t give you much protection, but they protect the people around you.",
"I blew e-cigarette vapor through a variety of masks, including a surgical-style mask and a cloth mask like those commonly worn to control the spread of COVID-19.  90% of aerosols and viruses from the vapor go right through cloth masks. ",
"Amazon paid 0 federal income tax 2017-2018 & 1% in 2019",
"Hydroxychloroquine for COVID19 is passé. We now know it doesn’t work. ",
"Facts:All major COVID treatments: Anticoagulants, dexamethasone, plasma are generic/cheap",
"The results from a recent German study regarding children and the impacts of mask wearing, involving almost 26,000 children saw negative affects in 24 different physical, behavioral, and psychological complaints in about 68% of these children.",
"Twitter is banned in Iran.",
"Nobody needs to get sick. This virus has a cure - It is called hydroxychloroquine, zinc and Zithromax, I have treated over 350 patients and not had one death.",
"The virus that causes the illness, SARS-CoV-2, gains its foothold by infecting certain nasal cells, studies suggest. ",
"Positivity rate of tests conducted in the US shows that herd immunity has been reached. In fact, it was probably reached in May.",
"What does a mask do? Blocks respiratory droplets coming from your mouth and throat. I sneezed, sang, talked & coughed toward an agar culture plate with or without a mask. Bacteria colonies show where droplets landed. A mask blocks virtually all of them.",
"Pine-Sol cleaner has been approved to kill coronavirus on hard surfaces. ",
"Many countries either adopted or declined early treatment [of COVID-19] with [hydroxychloroquine], forming a large country-randomized controlled trial with 2.0 billion people in the treatment group and 663 million in the control group; The [hydroxychloroquine] treatment group has a 79.1% lower death rate.",
"Research illustrates a clear correlation between vitamin D deficiencies and (higher) COVID-19 mortality rates",
"Gloves shouldn’t be worn in public because if you are wearing the same set of gloves all over town without  changing them you are only spreading germs everywhere you go.",
"Huge—CDC now finally says coronavirus can commonly spread through respiratory droplets or small particles, such as in aerosols, when a person breathes. Airborne viruses, including #COVID19, are among the most contagious",
"The DHS agencies responsible for election security have stated the following: The November 3rd election was the most secure in American history.",
"The cure for the C19 virus or the way to eliminated it was achieved. Information comes from Israel. There this virus did not cause any death. The recipe is simple 1. Lemon 2. Bicarbonate Mix and drink as hot tea every afternoon. These two components alkalize the immune and system, since when night Falls the system becomes acidic and defenses lower.",
"As of October 2020, Convalescent plasma falls flat in first randomized trial",
"Virus protection by mask type: N95 Respirator - 95%, 3 ply / surgical mask - 95%, Sponge mask - 0%, Cotton / cloth mask - 0%. There are people out there selling and wearing fake masks that wont prevent us from catching the coronavirus. The only masks that work are the N95's and surgical ones.",
"Research performed by economists has shown no consistent, positive impact on jobs, income or tax revenues arising from stadiums or sports franchises",
"As of October 2020, meta-analysis on mortality on RCTs which shows: 1.  No efficacy of hydroxychloroquine, 2. No efficacy of remdesivir, lopinavir-ritonavir, or convalescent plasma",
"WHO finally gave the green light for clinical trials of African herbal medicines for potential treatment of Covid-19 during the corona virus pandemic.",
"The coronavirus spreads mainly in the air, through tiny respiratory particles that can apparently linger and be inhaled, CDC now says, accepting findings of aerosol scientists",
"Tesla Hones In on Austin, Texas, for Location of Cybertruck Manufacturing Site.",
"Good News: Coronavirus can be destroyed by consuming Chlorine Dioxide",
"Wearing a mask remains one of the most effective ways to prevent the spread of #COVID19. For our loved ones, those around us, and everyone else, let's continue to #WearAMask",
"A scientific study shows that cannabis is more effective at preventing and treating COVID-19 than hydroxychloroquine.",
"Important supporting evidence that steroids significantly reduce mortality in patients with severe COVID. This is true of hydrocortisone as well as dexamethasone. Safe, cheap, well known drugs and available everywhere. ",
#"Wearing the mask literally activates your own virus. You’re getting sick from your own reactivated coronavirus expressions",
"A detainer request from ICE is not a valid warrant, we county shriffs need not heed to it.",
"Kids are less likely to get COVID-19. You know, they’re less likely to spread. So I think there’s some pretty good data out there that will help alleviate some of the concerns about reopening schools.",
"Amazon, owned by the richest man in the world, is soliciting PUBLIC DONATIONS to pay sick leave for contract drivers.",
"Chrysotile or white asbestos in fake snow was used throughout the 20th century on film sets, in schools and at home in Christmas decorations.  ",
"New York consistently ranks as one of the worst voting turnout states in the nation.",
"The Medical Establishment says there's no way to treat coronavirus? I just did some PubMed digging and found studies on natural COVID killers. Licorice root, Turmeric, Chlorine Dioxide, Betulinic acid and more. Why do scientists ignore science?",
"Monoclonal antibodies, which are potent, lab-made antibodies, will be given to about 2,000 people to see if they are effective against #COVID19. It forms part of the UK Recovery Trial, that found that a cheap steroid called dexamethasone could save lives.",
"Kanye West was Actually Selling Confederate Flag T-Shirts.",
"Hand sanitizer is anti-bacterial. The coronavirus is a virus. A bacteria and a virus is not the same. Wash your hands. Sanitizer will do nothing for the coronavirus. Sincerely, A scientist",
"One person is 300 to 900 times more likely to die after getting the COVID-19 vaccine than the flu vaccine. Adverse events are widely underreported.",
"The Tik Tok-Oracle deal is expected to give Oracle a partnership with TikTok Global, with all data going through their US servers.",
"Masking children triggers mouth breathing, which has been shown to cause narrowing of the face, narrowing of the mouth, high vaults in the palate, dental malocclusions — and the list goes on and on. This is according to the Journal of General Dentistry. They say you shouldn’t wear one.",
"In crowded indoor spaces, Covid can spread widely, but this is much less common than with measles. ",
"Why do you think the coronavirus is not happening in Africa like its happening here? Not a 5G region. There may be a few bases there, but not as prevalent as other countries.",
"A reminder: 1. The coronavirus vaccine can’t give you COVID-19. 2. The mRNA in the vaccine doesn’t become a permanent part of your body. 3. Pfizer, Moderna and J&J all have highly effective and SAFE vaccines to prevent severe COVID-19 infection."]

In [None]:
news_categories = ["politics", "sports", "business", "celebrity", "science-tech", "covid-19"]
all_claim_categories = ["politics", "sports", "covid-19", "covid-19", "business", "covid-19", "politics", "sports", "covid-19", "covid-19", "covid-19", "business", "covid-19", "covid-19", "covid-19", "politics", "covid-19", "covid-19", "covid-19", "covid-19", "covid-19", "covid-19", "covid-19", "covid-19", "covid-19", "politics", "covid-19", "covid-19", "covid-19", "sports", "covid-19", "covid-19", "covid-19", "business", "covid-19", "covid-19", "covid-19", "covid-19", "covid-19", "politics", "covid-19", "business", "science-tech", "politics", "covid-19", "covid-19", "celebrity", "covid-19", "covid-19", "business", "covid-19", "covid-19", "covid-19", "covid-19"]
all_claim_label = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0]
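# NOTE: all_claim_categories and all_claim_label each hold 54 entries, but
# all_claims holds 53 because one claim is commented out; entries from
# index 38 onward are therefore shifted by one relative to the claims.
# Strip punctuation from every claim
all_news_claims = [re.sub(r'[^\w\s]', '', claim) for claim in all_claims]
prepost_count = 15
train_count = 24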


In [None]:
len(all_claims)

53

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import Pipeline

documents = all_news_claims
# Raw documents to tf-idf matrix (plain counts would work too)
vectorizer = TfidfVectorizer(stop_words='english',
                             use_idf=True,
                             smooth_idf=True)
# SVD for dimensionality reduction
svd_model = TruncatedSVD(n_components=50, algorithm='randomized', n_iter=10)
# Pipe tf-idf and SVD, apply on our input documents
svd_transformer = Pipeline([('tfidf', vectorizer), ('svd', svd_model)])
svd_matrix = svd_transformer.fit_transform(documents)
# Pairwise cosine similarity between all claims in the reduced space
simmatrix_svd = cosine_similarity(svd_matrix, svd_matrix)

def getDocSimLSA(a, b):
    """Return the LSA cosine similarity between the claims at indices a and b."""
    return simmatrix_svd[a, b]

simmethods = {'LSA': getDocSimLSA}
#train_agent(userid,seeds,simmethods['LSA'],train_ratio=train_ratio,survey=key=='Survey')
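
One way to use a similarity lookup such as `getDocSimLSA` is to rank the other claims by their similarity to a given claim. The sketch below is self-contained and uses a small hand-written matrix in place of `simmatrix_svd`; the helper name `most_similar` and the toy values are illustrative only.

```python
def most_similar(sim_fn, index, n_docs, k=2):
    """Return the k (doc_index, similarity) pairs closest to document `index`."""
    scores = [(j, sim_fn(index, j)) for j in range(n_docs) if j != index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy symmetric similarity matrix standing in for simmatrix_svd (made-up values).
toy_matrix = [
    [1.0, 0.1, 0.7],
    [0.1, 1.0, 0.3],
    [0.7, 0.3, 1.0],
]


def toy_sim(a, b):
    return toy_matrix[a][b]


print(most_similar(toy_sim, index=0, n_docs=3))  # [(2, 0.7), (1, 0.1)]
```

With the notebook's objects, the equivalent call would be `most_similar(simmethods['LSA'], 0, len(documents))`.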

In [None]:
claimstest = all_news_claims[:2]
import nltk
nltk.download('punkt')
from nltk.corpus import stopwords
nltk.download('stopwords')
from nltk.tokenize import word_tokenize

# Build the English stop-word set once so membership tests are fast
english_stopwords = set(stopwords.words('english'))

def clean_text(claim):
  """Tokenize a claim and drop English stop words."""
  text_token = word_tokenize(claim)
  tokens_without_sw = [word for word in text_token if word not in english_stopwords]
  return tokens_without_sw

clean_text(claimstest[1])

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


['Launch',
 'Directional',
 'Robot',
 'Intelligent',
 'Circuitry',
 'LDRIC',
 'golfing',
 'robot',
 'hits',
 'holeinone',
 'TPC',
 'Scottsdale',
 'golf',
 'club']

In [None]:
#Get User Assessments for about 5 users
A320QA9HJFUOZO = {'assessments': [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1], 'feedback':[1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0], 'trust': 1}
A11S8IAAVDXCUS = {'assessments': [0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1], 'feedback': [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1], 'trust': 0}
A3KC26Z78FBOJT = {'assessments': [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1], 'feedback': [1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0], 'trust': 1}

In [None]:
len(A320QA9HJFUOZO['assessments'])

53

In [None]:
a = [1,2,3,4]
a.extend([5,6])
a

[1, 2, 3, 4, 5, 6]

In [None]:
# Based on frequency of usage, select the 10 most important words in each category.
# In the training phase, the first 12 claims would be a collection of keywords from the prephase.
def evaluate_beliefs(user_assessment_array):
  """
  Collect the words a user associates with true and false news in each category.

  Takes the user's assessment array and returns a dictionary keyed by news
  category. Each value holds the tokens of the pretest claims the user assessed
  as 'True' (assessment 1) and as 'False' (assessment 0).
  The overall per-category belief score is not computed yet.
  """
  pretest_assessments = user_assessment_array[:15]
  pretest_labels = all_claim_label[:15]
  pretest_categories = all_claim_categories[:15]
  pretest_claims = all_news_claims[:15]
  # label 1 marks a false claim, so an assessment is correct when it differs
  # from the label (computed for the belief score, not used below yet)
  correct_assessment = [int(a != l) for a, l in zip(pretest_assessments, pretest_labels)]
  belief_keywords_dict = {category: {'True': [], 'False': []} for category in news_categories}
  for assessment, category, claim in zip(pretest_assessments, pretest_categories, pretest_claims):
    response = 'False' if assessment == 0 else 'True'
    belief_keywords_dict[category][response].extend(clean_text(claim))
  return belief_keywords_dict
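
The comment preceding `evaluate_beliefs` mentions selecting the 10 most frequent words per category, a step the function does not perform. A minimal sketch of that step, assuming the nested dictionary layout returned above; `top_keywords` is a hypothetical helper and the sample dictionary is made up.

```python
from collections import Counter


def top_keywords(belief_keywords_dict, n=10):
    """Keep only the n most frequent words for each category and response."""
    return {
        category: {
            response: [word for word, _ in Counter(words).most_common(n)]
            for response, words in by_response.items()
        }
        for category, by_response in belief_keywords_dict.items()
    }


sample = {"sports": {"True": ["golf", "robot", "golf"], "False": []}}
print(top_keywords(sample, n=1))  # {'sports': {'True': ['golf'], 'False': []}}
```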

In [None]:
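low_reward = 3
high_reward = 9
low_penalty = -3
high_penalty = -9

def get_reward(trust, choice, truth):
  """
  Return the reward for one assessment.

  A correct assessment (choice == truth) earns a reward and an incorrect one a
  penalty. Both are larger in magnitude when the participant trusts the
  fact-checking organization (trust == 1) than when they do not (trust == 0).
  """
  correct_assessment = choice == truth
  if trust == 0:
    return low_reward if correct_assessment else low_penalty
  return high_reward if correct_assessment else high_penalty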

In [None]:
get_reward(1, 0, 1)

-9

In [None]:
import time  # to measure elapsed time per trial

# Look participants up by ID rather than evaluating variable names
participants = {
  "A320QA9HJFUOZO": A320QA9HJFUOZO,
  "A11S8IAAVDXCUS": A11S8IAAVDXCUS,
  "A3KC26Z78FBOJT": A3KC26Z78FBOJT,
}
agent_name = "A320QA9HJFUOZO"
agent = Agent(default_utility=4.4)
options = [0, 1]  # 0 = assess the claim as false, 1 = assess it as true
runs = 1000  # number of runs (participants)
trial_assessments = participants[agent_name]['assessments'][37:]
agent_trust = participants[agent_name]['trust']
trial_labels = all_claim_label[37:]
# number of trials (episodes), capped at the number of labels in the slice
trials = min(23, len(trial_labels))
average_p = []  # per-run record of whether each choice matched the truth
average_time = []  # per-run cumulative time
for r in range(runs):
  pmax = []
  ttime = [0]
  agent.reset()  # clear the memory for a new run
  # TODO: prepopulate with pretest beliefs
  # TODO: populate with training choices
  # TODO: evaluate posttest decisions
  for i in range(trials):
    start = time.time()
    choice = agent.choose(options)  # choose one option from the list of two
    truth = 1 - trial_labels[i]  # label 1 marks a false claim
    reward = get_reward(agent_trust, choice, truth)
    agent.respond(reward)  # store the instance with its outcome
    end = time.time()
    ttime.append(ttime[-1] + end - start)
    pmax.append(choice == truth)
  average_p.append(pmax)  # save performance of each run
  average_time.append(ttime)  # save time of each run
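
Once the loop has filled `average_p`, one plausible next step is to average over runs to obtain a per-trial proportion. The sketch below does this in plain Python on made-up data; `per_trial_mean` is a hypothetical helper, not part of the notebook.

```python
def per_trial_mean(runs_by_trial):
    """Average a list of equal-length per-run lists position by position."""
    n_runs = len(runs_by_trial)
    return [sum(column) / n_runs for column in zip(*runs_by_trial)]


# Made-up outcomes for 4 runs of 3 trials each (True = the tracked choice was made).
toy_average_p = [
    [True, False, True],
    [True, True, True],
    [False, False, True],
    [True, False, True],
]
print(per_trial_mean(toy_average_p))  # [0.75, 0.25, 1.0]
```

The same call on the notebook's `average_p` would give one value per trial, which could then be plotted with `matplotlib`.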