# **How to Use Riveter** 💪

This Jupyter notebook will demonstrate how to use the Riveter package to measure social dynamics between personas mentioned in a collection of texts.

The package identifies and extracts the subjects, verbs, and direct objects in texts; it performs coreference resolution on the personas mentioned in the texts (e.g., clustering "Elizabeth Bennet" and "she" together as one persona); and it measures social dynamics between the personas by referencing a given lexicon. The package currently includes Maarten Sap et al.'s lexicon for power and agency and Rashkin et al.'s lexicon for perspective, effect, value, and mental state.

## **Set up the notebook**

### **For Google Colab** (skip this if you run the notebook locally on your computer)

First, we need to connect to your Google Drive, so that we have a place to store the Riveter code.

When you run the following cell, a pop-up should appear asking for permission to connect to your Google Drive. Select the Google account that you want to use and accept the conditions.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Now specify a place in your Google Drive where you'd like to save the output of this notebook. If you use a custom folder, you'll need to first create that folder in your Google Drive.

In [None]:
colab_directory_path = '/content/gdrive/MyDrive/riveter-test'

In [None]:
%mkdir /content/gdrive/MyDrive/riveter-test

Let's go inside that folder.

In [None]:
%cd /content/gdrive/MyDrive/riveter-test

And now let's download the Riveter code and data from its GitHub repository. Because we're inside the Google Drive folder, the code will be saved to that location.

In [None]:
! git clone https://github.com/maartensap/riveter-nlp.git

You can check that the download was successful by printing the contents of your Google Drive folder.

In [None]:
%ls

Now let's move inside the Riveter folder, so that we'll be able to import the functions we want to use.

In [None]:
%cd /content/gdrive/MyDrive/riveter-test/riveter-nlp/riveter

### **For a local Jupyter notebook**

Let's download the Riveter code and data from its GitHub repository.

In [5]:
! git clone https://github.com/maartensap/riveter-nlp.git

Cloning into 'riveter-nlp'...


You can check that the download was successful by printing the contents of the folder you're in.

In [6]:
%ls

 Volume in drive C is OS
 Volume Serial Number is 5621-178A

 Directory of C:\Users\bente\Documents\Master\Analysing_Data\A2\riveter-nlp\riveter

08-03-2024  15:00    <DIR>          .
08-03-2024  14:12    <DIR>          ..
08-03-2024  15:00    <DIR>          .ipynb_checkpoints
08-03-2024  11:56               241 __init__.py
08-03-2024  14:10    <DIR>          __pycache__
08-03-2024  14:41         2.793.258 AD_Assignment_2.ipynb
08-03-2024  11:30           650.673 BTS_short_100.csv
08-03-2024  11:56    <DIR>          data
08-03-2024  11:56           649.389 demo.ipynb
08-03-2024  11:56           110.277 demo-Little-Red-Cap.ipynb
08-03-2024  11:56           584.776 demo-NYT-Obits.ipynb
08-03-2024  11:56            19.472 demo-reorganized.ipynb
08-03-2024  11:56            29.118 riveter.py
08-03-2024  14:41            35.117 Riveter_demo.ipynb
08-03-2024  15:00    <DIR>          riveter-nlp
08-03-2024  14:58            22.452 test
08-03-2024  11:56            10.099 test_suite.py
       

Now let's move inside the Riveter folder, so that we'll be able to import the functions we want to use.

In [8]:
%cd riveter-nlp/riveter

C:\Users\bente\Documents\Master\Analysing_Data\A2\riveter-nlp\riveter\riveter-nlp\riveter


## **Install needed packages**

Finally, we'll install some spaCy models and Python packages to support Riveter.

In [9]:
!pip install -U spacy-experimental
# This will download ~500 MB of data
!pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.0/en_coreference_web_trf-3.4.0a0-py3-none-any.whl#egg=en_coreference_web_trf
!python -m spacy download en_core_web_sm

Collecting en-coreference-web-trf==3.4.0a0
  Downloading https://github.com/explosion/spacy-experimental/releases/download/v0.6.0/en_coreference_web_trf-3.4.0a0-py3-none-any.whl (490.3 MB)

  You can safely remove it manually.
  You can safely remove it manually.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nl-core-news-sm 3.7.0 requires spacy<3.8.0,>=3.7.0, but you have spacy 3.4.4 which is incompatible.


Collecting en-core-web-sm==3.4.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.1/en_core_web_sm-3.4.1-py3-none-any.whl (12.8 MB)

In [10]:
# This will download ~500 MB of data
!pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.0/en_coreference_web_trf-3.4.0a0-py3-none-any.whl#egg=en_coreference_web_trf

Collecting en-coreference-web-trf==3.4.0a0
  Using cached https://github.com/explosion/spacy-experimental/releases/download/v0.6.0/en_coreference_web_trf-3.4.0a0-py3-none-any.whl (490.3 MB)


In [11]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.4.1
  Using cached https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.1/en_core_web_sm-3.4.1-py3-none-any.whl (12.8 MB)
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [None]:
!pip install seaborn

<br><br><br><br>

## **Import libraries**

In [4]:
from collections import defaultdict
import os
import pandas as pd
import random
from riveter import Riveter

import seaborn as sns
import matplotlib.pyplot as plt

<br><br><br><br>

## **Small demo with Sap et al.'s Power and Agency Lexicon**

Here are four example stories that we will use to measure power and agency between personas:
> 1. I was just thinking about walking down the street, when a car hit a tree. I had to call my doctor to pick me up. I felt so bad I also called Katie, who came in her car. She saved me.

> 2. My doctor fixed my shoe. I thanked him. Then Susan arrived. Now she is calling the doctor too.

> 3. Mary went to the store. She thanked the doctor. He called her. He replied that it was no problem.

> 4. Jack broke the vase.

In [5]:
example_stories = ["I was just thinking about walking down the street, when a car hit a tree. I had to call my doctor to pick me up. I felt so bad I also called Katie, who came in her car. She saved me.",
                   "My doctor fixed my shoe. I thanked him. Then Susan arrived. Now she is calling the doctor too.",
                   "Mary went to the store. She thanked the doctor. He called her. He replied that it was no problem.",
                   "Jack broke the vase."]
text_ids = [0, 1, 2, 3]

<br><br>

### Load lexicon ("power") and train model

In [6]:
riveter = Riveter()
riveter.load_sap_lexicon('power')
riveter.train(example_stories,
              text_ids)

100%|██████████| 4/4 [00:01<00:00,  3.75it/s]

2024-03-08 15:08:16 Complete!





### Get total scores for all documents

You can use the `.get_score_totals()` function to get cumulative power scores (or the scores for whichever dynamic you loaded) for each persona mentioned in the texts.

In [7]:
riveter.get_score_totals(frequency_threshold=1)

{'i': -0.3,
 'she': 0.3333333333333333,
 'car': 1.0,
 'tree': -1.0,
 'doctor': 0.5,
 'mary': 0.0,
 'jack': 1.0,
 'vase': -1.0}

We can see all the contributing verbs for each persona and whether they contributed positively or negatively.

In [8]:
riveter.get_persona_polarity_verb_count_dict()

{'she': defaultdict(<function riveter.default_dict_int()>,
             {'positive': defaultdict(int, {'save_nsubj': 1})}),
 'car': defaultdict(<function riveter.default_dict_int()>,
             {'positive': defaultdict(int, {'hit_nsubj': 1})}),
 'i': defaultdict(<function riveter.default_dict_int()>,
             {'positive': defaultdict(int, {'call_dobj': 1}),
              'negative': defaultdict(int,
                          {'pick_dobj': 1,
                           'save_dobj': 1,
                           'thank_nsubj': 1,
                           'fix_dobj': 1})}),
 'tree': defaultdict(<function riveter.default_dict_int()>,
             {'negative': defaultdict(int, {'hit_dobj': 1})}),
 'doctor': defaultdict(<function riveter.default_dict_int()>,
             {'positive': defaultdict(int,
                          {'fix_nsubj': 1, 'thank_dobj': 2, 'call_dobj': 1}),
              'negative': defaultdict(int, {'call_nsubj': 1})}),
 'mary': defaultdict(<function riveter.defa
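The nested output above can be hard to scan. Below is a minimal sketch for flattening it into one row per persona, polarity, and verb. It uses plain dicts copied from part of the output above (instead of the `defaultdict`s Riveter returns) and assumes every key follows the `verb_role` pattern seen there, such as `save_nsubj`. `flatten_polarity_counts` is a hypothetical helper, not part of Riveter.

```python
# A few entries copied from the get_persona_polarity_verb_count_dict() output above,
# written as plain dicts instead of defaultdicts.
polarity_counts = {
    'doctor': {'positive': {'fix_nsubj': 1, 'thank_dobj': 2, 'call_dobj': 1},
               'negative': {'call_nsubj': 1}},
    'tree': {'negative': {'hit_dobj': 1}},
}


def flatten_polarity_counts(counts):
    """Hypothetical helper: turn {persona: {polarity: {'verb_role': n}}} into sorted row tuples."""
    rows = []
    for persona, by_polarity in counts.items():
        for polarity, verb_counts in by_polarity.items():
            for verb_role, n in verb_counts.items():
                # Split on the last underscore so the role suffix is separated from the verb
                verb, role = verb_role.rsplit('_', 1)
                rows.append((persona, polarity, verb, role, n))
    return sorted(rows)


rows = flatten_polarity_counts(polarity_counts)
for row in rows:
    print(row)
```

Because the function only iterates over `.items()`, it should work the same on the `defaultdict`s in the real output. With pandas (imported above), `pd.DataFrame(rows, columns=['persona', 'polarity', 'verb', 'role', 'count'])` turns the rows into a table you can sort or filter.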

We can also visualize these verb counts for a single persona.

In the following heatmap, each cell shows how many times a matching verb contributed positively or negatively to the persona's final score.

(See the examples later in the notebook for personas with more verbs.)

In [6]:
riveter.plot_verbs_for_persona('i', figsize=(2, 4))

We can save the Riveter object for later, and reload it when we want to use it again.

In [9]:
riveter.save('test')

Riveter successfully saved to "test"
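The notebook doesn't show the reload step here. As a hedged sketch of how a save/reload pair commonly works, the snippet below pickles an object's attributes to a file and restores them into a fresh instance. `TinyModel` and its `save`/`load` methods are hypothetical stand-ins, not Riveter's code; check `riveter.py` for how Riveter itself reloads a saved file.

```python
import os
import pickle
import tempfile


class TinyModel:
    """Hypothetical stand-in for a trained object; NOT Riveter's implementation."""

    def __init__(self):
        self.scores = {}

    def save(self, path):
        # Store all instance attributes so they can be restored later
        with open(path, 'wb') as f:
            pickle.dump(self.__dict__, f)

    def load(self, path):
        # Copy the pickled attributes onto this (fresh) instance
        with open(path, 'rb') as f:
            self.__dict__.update(pickle.load(f))
        return self


model = TinyModel()
model.scores = {'doctor': 0.5, 'i': -0.3}

path = os.path.join(tempfile.mkdtemp(), 'tiny_model.pkl')
model.save(path)

restored = TinyModel().load(path)
print(restored.scores)
```

Since unpickling can run arbitrary code, only reload files you created yourself.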


You can also see which mentions the coreference resolution model has clustered together for each persona, along with how many times each mention occurs.

In [10]:
riveter.get_persona_cluster('doctor')

{'my doctor fixed': 1,
 'him.': 1,
 'the doctor too': 1,
 'the doctor.': 1,
 'he called': 1,
 'he replied': 1}

In [11]:
riveter.get_persona_cluster('susan')

{'susan arrived': 1, 'she is': 1}

In [12]:
riveter.get_persona_cluster('i')

{'i was': 1,
 'i had': 1,
 'my doctor': 2,
 'me up': 1,
 'i felt': 1,
 'i also': 1,
 'me.': 1,
 'my shoe': 1,
 'i thanked': 1}

### Plot scores for all documents

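You can use the `.plot_scores()` function to display a bar plot of the top *n* or bottom *n* personas across all texts: pass a positive number for the highest-scoring personas and a negative number for the lowest-scoring ones, as in the cells below. By default, the function displays the 10 highest-scoring personas.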
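Assuming a positive number means "top n" and a negative number means "bottom n", the selection step behind such a plot can be sketched in plain Python on a dictionary shaped like the `get_score_totals()` output above. `select_personas` is a hypothetical helper; the tie-breaking shown (original dictionary order, from Python's stable sort) is not necessarily what Riveter does.

```python
# Scores copied from the get_score_totals(frequency_threshold=1) output above
score_totals = {'i': -0.3, 'she': 0.3333333333333333, 'car': 1.0, 'tree': -1.0,
                'doctor': 0.5, 'mary': 0.0, 'jack': 1.0, 'vase': -1.0}


def select_personas(scores, number_of_scores=10):
    """Hypothetical helper: top n personas if n >= 0, bottom |n| personas if n < 0."""
    if number_of_scores >= 0:
        # Highest scores first; ties keep their original order
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return ranked[:number_of_scores]
    # Lowest scores first
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:abs(number_of_scores)]


print(select_personas(score_totals, 2))   # top 2
print(select_personas(score_totals, -2))  # bottom 2
```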

In [None]:
riveter.plot_scores(frequency_threshold=0)

In [None]:
riveter.plot_scores(2)

In [None]:
riveter.plot_scores(-2)

### Get scores, subjects, and direct objects for each document

You can use the `.get_scores_for_doc()` function to get power scores (or the scores for whichever dynamic you loaded) for each persona mentioned in a single document, identified by the document ID you passed to `.train()`.

In [13]:
example_stories[0]

'I was just thinking about walking down the street, when a car hit a tree. I had to call my doctor to pick me up. I felt so bad I also called Katie, who came in her car. She saved me.'

In [14]:
riveter.get_scores_for_doc(0)

{'i': -0.2, 'she': 1.0, 'car': 1.0, 'tree': -1.0}

You can use `.count_nsubj_for_doc()` to get all (subject, verb) pairs in a document, regardless of whether the verb appears in the chosen lexicon.

In [15]:
riveter.count_nsubj_for_doc(0)

{('i', 'have'): 1, ('i', 'feel'): 1, ('she', 'save'): 1, ('car', 'hit'): 1}

You can use `.count_dobj_for_doc()` to get all (direct object, verb) pairs in a document, regardless of whether the verb appears in the chosen lexicon.

In [16]:
riveter.count_dobj_for_doc(0)

{('i', 'call'): 1, ('i', 'pick'): 1, ('i', 'save'): 1, ('tree', 'hit'): 1}
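Because both functions return dictionaries keyed by `(persona, verb)` tuples, you can combine them to see how often each persona appears as a subject versus a direct object in a document. A minimal sketch using the two outputs above, copied in as literals; `role_counts` is a hypothetical helper, not part of Riveter.

```python
from collections import Counter

# Outputs of count_nsubj_for_doc(0) and count_dobj_for_doc(0), copied from above
nsubj_counts = {('i', 'have'): 1, ('i', 'feel'): 1, ('she', 'save'): 1, ('car', 'hit'): 1}
dobj_counts = {('i', 'call'): 1, ('i', 'pick'): 1, ('i', 'save'): 1, ('tree', 'hit'): 1}


def role_counts(nsubj, dobj):
    """Hypothetical helper: return {persona: {'nsubj': n, 'dobj': m}} from (persona, verb) counts."""
    subj_totals, obj_totals = Counter(), Counter()
    for (persona, _verb), n in nsubj.items():
        subj_totals[persona] += n
    for (persona, _verb), n in dobj.items():
        obj_totals[persona] += n
    personas = set(subj_totals) | set(obj_totals)
    # Counter returns 0 for missing keys, so every persona gets both roles
    return {p: {'nsubj': subj_totals[p], 'dobj': obj_totals[p]} for p in sorted(personas)}


for persona, counts in role_counts(nsubj_counts, dobj_counts).items():
    print(persona, counts)
```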

### Plot scores for each document

You can use the `.plot_scores_for_doc()` function to display a bar plot with the top *n* or bottom *n* personas in a specified document. The function will display the top 10 highest-scoring personas by default.

In [None]:
riveter.plot_scores_for_doc(0)

### Use regular expressions in place of entity extraction and coreference resolution

Instead of discovering all the entities automatically, you can define a regular expression to capture patterns for each entity. You will need to determine these entities and patterns yourself.
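Each value in `persona_patterns_dict` is an ordinary Python regular expression, and the `^...$` anchors make each alternative match a whole mention rather than a substring. The sketch below checks which mentions a pattern would capture, using Python's `re` module directly. The mention list is made up for illustration, and we assume Riveter compares patterns against lowercased mention text, as the lowercase persona names in the outputs above suggest.

```python
import re

# Patterns in the style of persona_patterns_dict
persona_patterns = {
    'first person singular': r'^i$|^me$',
    'feminine': r'^she$|^her$|^herself$',
}

# Hypothetical lowercased mentions
mentions = ['i', 'me', 'my doctor', 'him', 'she', 'her', 'herself', 'hers', 'mine']


def match_personas(mentions, patterns):
    """Map each persona label to the mentions its regex matches."""
    return {
        label: [m for m in mentions if re.search(pattern, m)]
        for label, pattern in patterns.items()
    }


print(match_personas(mentions, persona_patterns))
```

Without the anchors, a pattern like `r'i|me'` would also match `him` and `mine`, because `re.search` looks for the pattern anywhere in the string.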

In [1]:
riveter.train(example_stories,
              text_ids,
              persona_patterns_dict={'first person singular': r'^i$|^me$'})

In [18]:
riveter.get_score_totals()

{'first person singular': -0.5714285714285714}

### Use agency lexicon instead of power

In [19]:
riveter = Riveter()
riveter.load_sap_lexicon('agency')
riveter.train(example_stories,
              text_ids)

100%|██████████| 4/4 [00:00<00:00,  4.43it/s]

2024-03-08 15:08:58 Complete!





In [20]:
riveter.get_score_totals()

{'i': 0.1,
 'she': 0.3333333333333333,
 'car': 1.0,
 'tree': 0.0,
 'doctor': 0.16666666666666666,
 'susan': 0.0,
 'mary': 0.3333333333333333,
 'jack': 1.0,
 'vase': 0.0}

<br><br><br><br>

## **Small demo w/ Rashkin frames**

This example is similar to the first one, but here we'll use Rashkin et al.'s connotation frames (first the effect frame, then the value frame) instead of the power and agency frames we used above.

In [None]:
example_stories = ["I was just thinking about walking down the street, when my shoelace snapped. I had to call my doctor to pick me up. I felt so bad I also called my friend Katie, who came in her car. She was a lifesaver. My friend Jack is nice.",
                   "My doctor fixed my shoe. I thanked him. Then Susan arrived. Now she is calling the doctor too."]
text_ids = [0, 1]

In [None]:
riveter = Riveter()
riveter.load_rashkin_lexicon('effect')
riveter.train(example_stories,
              text_ids)

In [None]:
riveter.get_score_totals()

In [None]:
riveter.get_scores_for_doc(0)

In [None]:
riveter.get_scores_for_doc(1)

In [None]:
riveter = Riveter()
riveter.load_rashkin_lexicon('value')
riveter.train(example_stories,
              text_ids)

In [None]:
riveter.get_score_totals()

<br><br><br><br>

## **Small demo w/ custom frames**

This example is similar to the first example, but here, we'll use a custom lexicon. You can follow this pattern to use your own lexicon.

(In this case, our "custom lexicon" is actually just a copy of the Rashkin lexicon.)
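`load_custom_lexicon()` takes a path to a `.tsv` file plus the names of the columns holding the verb, the agent score, and the theme score. As a sketch of that file shape only, the snippet below writes and reads back a tiny TSV with the same column names used in the next cell. The verbs and numeric values are placeholders, not taken from the Rashkin lexicon, and we have not checked which value ranges Riveter expects.

```python
import csv
import os
import tempfile

# Placeholder rows: a verb plus agent-side and theme-side scores (values are made up)
rows = [
    {'verb': 'help', 'effect(a)': '0.5', 'effect(t)': '0.8'},
    {'verb': 'hurt', 'effect(a)': '0.1', 'effect(t)': '-0.9'},
]

path = os.path.join(tempfile.mkdtemp(), 'my_lexicon.tsv')
with open(path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['verb', 'effect(a)', 'effect(t)'], delimiter='\t')
    writer.writeheader()
    writer.writerows(rows)

# Read it back the way a loader might: one {verb: (agent, theme)} entry per row
with open(path, newline='', encoding='utf-8') as f:
    lexicon = {
        row['verb']: (float(row['effect(a)']), float(row['effect(t)']))
        for row in csv.DictReader(f, delimiter='\t')
    }
print(lexicon)
```

To use such a file, you would pass its path as `lexicon_path` together with `verb_column='verb'`, `agent_column='effect(a)'`, and `theme_column='effect(t)'`.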

In [None]:
example_stories = ["I was just thinking about walking down the street, when my shoelace snapped. I had to call my doctor to pick me up. I felt so bad I also called my friend Katie, who came in her car. She was a lifesaver. My friend Jack is nice.",
                   "My doctor fixed my shoe. I thanked him. Then Susan arrived. Now she is calling the doctor too."]
text_ids = [0, 1]

In [None]:
riveter = Riveter()
riveter.load_custom_lexicon(lexicon_path='data/example-custom-lexicon/full_frame_info.tsv',
                            verb_column='verb',
                            agent_column='effect(a)',
                            theme_column='effect(t)')
riveter.train(example_stories,
              text_ids)

In [None]:
riveter.get_score_totals()

In [None]:
riveter.get_scores_for_doc(0)

In [None]:
riveter.get_scores_for_doc(1)

In [None]:
riveter = Riveter()
riveter.load_rashkin_lexicon('value')
riveter.train(example_stories,
              text_ids)

In [None]:
riveter.get_score_totals()

<br><br><br><br>

## **Bigger demo w/ Sap frames for _Pride and Prejudice_**

Here we show a larger, more realistic example using Jane Austen's novel.

We'll try both discovering entities automatically and using regular expressions to capture pronoun groups.

### Load data

In [None]:
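# Read the novel and group consecutive non-blank lines into paragraphs
with open('data/pride_and_prejudice.txt', 'r', encoding='utf-8') as f:
    lines = [_line.strip() for _line in f]

texts = ['']
for _line in lines:
    if _line:
        texts[-1] += ' ' + _line
    else:
        texts[-1] = texts[-1].strip()
        texts.append('')
texts[-1] = texts[-1].strip()

# Consecutive blank lines leave empty strings behind; drop them
texts = [t for t in texts if t]
text_ids = list(range(len(texts)))

len(texts), len(text_ids)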
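The loop above groups consecutive non-blank lines into paragraphs. A more compact alternative (a sketch, not what the notebook uses) splits the whole text on blank lines with a regular expression; here it runs on a short inline string instead of the novel. `split_paragraphs` is a hypothetical helper.

```python
import re


def split_paragraphs(raw_text):
    """Split text on blank lines and join each paragraph's lines with single spaces."""
    # A blank line is a newline, optional whitespace (including more newlines), then a newline
    blocks = re.split(r'\n\s*\n', raw_text)
    # ' '.join(block.split()) also collapses repeated spaces inside a paragraph
    return [' '.join(block.split()) for block in blocks if block.strip()]


sample = "First paragraph, line one.\nline two.\n\n\nSecond paragraph.\n   \nThird."
print(split_paragraphs(sample))
```

On the real file you would call `split_paragraphs(f.read())` inside the same `with open(...)` block. Unlike the loop, this version also normalizes runs of spaces within a line.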

### Explore using discovered entities (using entity recognition and coreference resolution)

This cell will take a while to run (a few minutes on an M2 MacBook, ~30 minutes on Colab).

In [None]:
riveter = Riveter()
riveter.load_sap_lexicon('power')
riveter.train(texts,
              text_ids,
              num_bootstraps=10)

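To avoid retraining every time we run the notebook, we can save the Riveter model for later. This cell and the plotting cells below use `colab_directory_path`, which is only defined in the Colab setup section; if you're running the notebook locally, set it to an existing folder first (e.g. `colab_directory_path = '.'`).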

In [None]:
riveter.save(colab_directory_path + '/riveter.pride_and_prejudice.pkl')

In [None]:
persona_score_dict = riveter.get_score_totals(frequency_threshold=20)
len(persona_score_dict)

In [None]:
for _persona, _score in sorted(persona_score_dict.items(), key=lambda x: x[1], reverse=True):
    print(round(_score, 2), '\t', _persona)

In [None]:
persona_polarity_verb_count_dict = riveter.get_persona_polarity_verb_count_dict()
len(persona_polarity_verb_count_dict)

In [None]:
persona_polarity_verb_count_dict['mr. gardiner']

In [None]:
persona_polarity_verb_count_dict['elizabeth']

In [None]:
persona_polarity_verb_count_dict['darcy']

In [None]:
riveter.plot_scores(title='Personas by Score', target_personas=['elizabeth', 'darcy'], figsize=(3, 3))

In [None]:
riveter.plot_scores(title='',
                    frequency_threshold=10,
                    number_of_scores=10,
                    figsize=(6,3.5),
                    output_path=colab_directory_path + '/barplot.pride_and_prejudice.most_power.pdf')

In [None]:
riveter.plot_scores(title='',
                    frequency_threshold=10,
                    number_of_scores=-10,
                    figsize=(6,3),
                    output_path=colab_directory_path + '/barplot.pride_and_prejudice.least_power.pdf')

In [None]:
riveter.plot_verbs_for_persona('elizabeth', figsize=(2,6), output_path=colab_directory_path + '/heatmap.pride_and_prejudice.elizabeth.pdf')

In [None]:
riveter.plot_verbs_for_persona('lizzy', figsize=(2,4), output_path=colab_directory_path + '/heatmap.pride_and_prejudice.lizzy.pdf')

In [None]:
riveter.plot_verbs_for_persona('mr. darcy', figsize=(2,6), output_path=colab_directory_path + '/heatmap.pride_and_prejudice.mrdarcy.pdf')

In [None]:
riveter.plot_verbs_for_persona('sir william', figsize=(2,6), output_path=colab_directory_path + '/heatmap.pride_and_prejudice.sirwilliam.pdf')

In [None]:
riveter.plot_verbs_for_persona('charlotte', figsize=(2,6), output_path=colab_directory_path + '/heatmap.pride_and_prejudice.misslucas.pdf')

In [None]:
matched_ids, matched_texts = riveter.get_documents_for_persona('charlotte')
for t in matched_texts[:10]:
    print(t)
    print()

### Explore using regular expressions to capture pronouns

In [None]:
riveter = Riveter()
riveter.load_sap_lexicon('power')
riveter.train(texts,
              text_ids,
              num_bootstraps=20,
              persona_patterns_dict={'masculine': r'^he$|^him$|^himself$',
                                     'feminine': r'^she$|^her$|^herself$',
                                     'third plural': r'^they$|^them$|^themselves$'})

In [None]:
persona_score_dict = riveter.get_score_totals()
len(persona_score_dict)

In [None]:
for _persona, _score in sorted(persona_score_dict.items(), key=lambda x: x[1], reverse=True):
    print(round(_score, 3), '\t', _persona)

In [None]:
persona_polarity_verb_count_dict = riveter.get_persona_polarity_verb_count_dict()
len(persona_polarity_verb_count_dict)

In [None]:
riveter.plot_scores(number_of_scores=10,
                    title='',
                    target_personas=['masculine', 'feminine', 'third plural'],
                    figsize=(3,3),
                    output_path=colab_directory_path + '/barplot.prideandprejudice.pronouns.pdf')

In [None]:
riveter.plot_verbs_for_persona('masculine', figsize=(3,7), output_path=colab_directory_path + '/heatmap.prideandprejudice.masculine.pdf')

In [None]:
riveter.plot_verbs_for_persona('feminine', figsize=(3,7), output_path=colab_directory_path + '/heatmap.prideandprejudice.feminine.pdf')

In [None]:
riveter.plot_verbs_for_persona('third plural', figsize=(3,7), output_path=colab_directory_path + '/heatmap.prideandprejudice.thirdplural.pdf')

In [None]:
matched_ids, matched_texts = riveter.get_documents_for_persona('feminine')
for t in matched_texts[:10]:
    print(t)
    print()

In [None]:
matched_ids, matched_texts = riveter.get_documents_for_persona('masculine')
for t in matched_texts[:10]:
    print(t)
    print()

In [None]:
matched_ids, matched_texts = riveter.get_documents_for_verb('hear')
for t in matched_texts[:10]:
    print(t)
    print()