# Examining orientation of justice utterances in US Supreme Court oral arguments

This notebook illustrates how the Expected Context Framework can be used to derive a property of utterances and terms, orientation, detailed in [this paper](https://www.cs.cornell.edu/~cristian/Orientation_files/orientation-forwards-backwards.pdf) and [this dissertation](https://tisjune.github.io/research/dissertation). Orientation quantifies the extent to which a term or utterance aims to advance the conversation forwards, as opposed to addressing what was said before. We originally used it to analyze counselor utterances in crisis counseling conversations.
Here, we demonstrate how orientation can be computed on a public dataset, transcripts of oral arguments from the US Supreme Court; in this notebook, we focus on characterizing utterances from justices. See [this dissertation](https://tisjune.github.io/research/dissertation) for further discussion of the analyses below.

We can draw a loose parallel between the oral argument and counseling settings, in that both involve _asymmetric_ conversations, taking place between interlocutors playing different roles (justice vs. lawyer, counselor vs. individual seeking help). Beyond this parallel, there are some notable differences that may make the output less interpretable in this setting: justices and counselors have very different goals, the language used here is less structured than in the counseling setting, and utterances here vary much more in length. We encourage future work to tinker with the approach presented here, and to consider other ways to examine interactional dynamics.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import pandas as pd
import numpy as np
import math
import os

## 1. Loading and preprocessing the dataset

In [3]:
from convokit import Corpus
from convokit import download

In [4]:
# OPTION 1: DOWNLOAD CORPUS 
# UNCOMMENT THESE LINES TO DOWNLOAD CORPUS
# DATA_DIR = '<YOUR DIRECTORY>'
# SCOTUS_CORPUS_PATH = download('supreme-corpus', data_dir=DATA_DIR)

# OPTION 2: READ PREVIOUSLY-DOWNLOADED CORPUS FROM DISK
# UNCOMMENT THIS LINE AND REPLACE WITH THE DIRECTORY WHERE THE SUPREME-CORPUS IS LOCATED
# SCOTUS_CORPUS_PATH = '<YOUR DIRECTORY>'

In [7]:
scotus_corpus = Corpus(SCOTUS_CORPUS_PATH)

In [8]:
scotus_corpus.print_summary_stats()

Number of Speakers: 8979
Number of Utterances: 1700789
Number of Conversations: 7817


We represent justice and lawyer utterances as dependency-parse arcs, which are included with the data release as preprocessed features and which we load here. We also load tokenized versions of these utterances to facilitate taking word counts.

In [9]:
scotus_corpus.load_info('utterance',['arcs','tokens'])

We will restrict the training data we use to sufficiently long utterances; to facilitate this, we compute the number of words in each utterance:

In [10]:
from convokit.text_processing import TextProcessor
wordcounter = TextProcessor(input_field='tokens', output_field='wordcount',
                            proc_fn=lambda x: len(x.split()))
scotus_corpus = wordcounter.transform(scotus_corpus)
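The `proc_fn` above simply splits the `tokens` field on whitespace. Since that field holds tokenized text with one sentence per line (compare the tokenized example sentences later in this notebook, such as `do n't` and ` , `), the count spans all sentences and also includes punctuation and clitic tokens. A quick check of that counting rule on a made-up tokens string (the `word_count` helper is hypothetical, written only to mirror the lambda):

```python
def word_count(tokens_field):
    # Same rule as the notebook's proc_fn: split on any whitespace, newlines included
    return len(tokens_field.split())

# Hypothetical tokenized utterance with two sentences (one per line)
example = "Well , do you agree ?\nI do n't ."
print(word_count(example))  # 10: punctuation and "n't" count as words
```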

In applying the framework, we need to associate utterances with their replies, so we store the ID of the reply to an utterance in its metadata:

In [11]:
for ut in scotus_corpus.iter_utterances(selector=lambda x: x.reply_to is not None):
    scotus_corpus.get_utterance(ut.reply_to).meta['next_id'] = ut.id
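The loop above records, on each utterance, the ID of a reply to it. The sketch below reproduces the same linking on a few hypothetical plain-dict utterances (not ConvoKit objects), which makes one detail visible: if an utterance has several replies, each assignment overwrites the previous one, so `next_id` ends up holding whichever reply is visited last.

```python
# Hypothetical toy utterances: id -> ID of the utterance it replies to (None = no parent)
reply_to = {
    "u1": None,
    "u2": "u1",
    "u3": "u2",
    "u4": "u2",  # a second reply to u2
}

# Same logic as the notebook loop: store each reply's ID on its parent
next_id = {}
for utt_id, parent in reply_to.items():
    if parent is not None:
        next_id[parent] = utt_id  # later replies overwrite earlier ones

print(next_id)  # {'u1': 'u2', 'u2': 'u4'}
```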

### Constructing the training data

As training data, we consider a subset of justice and lawyer utterances (labeled in the utterance metadata as `J` and `A` respectively). We ultimately wish to characterize justice utterances, using the previous and subsequent lawyer utterances as conversational context.

We make the following filtering decisions:
* We only consider lawyer utterances between 10 and 75 words long;
* We only consider justice utterances between 10 and 50 words long that occur between two valid lawyer utterances, i.e., both the utterance they reply to and their reply must satisfy the criterion above.

In [12]:
min_wc = 10
max_wc = 50
min_wc_context = 10
max_wc_context = 75

In [13]:
for ut in scotus_corpus.iter_utterances():
    ut.meta['is_valid_context'] = (ut.meta['speaker_type'] == 'A')\
        and (ut.meta['arcs'] != '')\
        and (ut.meta['wordcount'] >= min_wc_context)\
        and (ut.meta['wordcount'] <= max_wc_context)        
for ut in scotus_corpus.iter_utterances():
    if ('next_id' not in ut.meta) or (ut.reply_to is None): 
        ut.meta['is_valid_utt'] = False
    else:
        ut.meta['is_valid_utt'] = (ut.meta['speaker_type'] == 'J')\
            and (ut.meta['arcs'] != '')\
            and (ut.meta['wordcount'] >= min_wc)\
            and (ut.meta['wordcount'] <= max_wc)\
            and scotus_corpus.get_utterance(ut.meta['next_id']).meta['is_valid_context']\
            and scotus_corpus.get_utterance(ut.reply_to).meta['is_valid_context']

This results in roughly 92,000 valid justice utterances and 372,000 valid lawyer (context) utterances:

In [14]:
sum(ut.meta['is_valid_utt'] for ut in scotus_corpus.iter_utterances())

91924

In [15]:
sum(ut.meta['is_valid_context'] for ut in scotus_corpus.iter_utterances())

372268

## 2. Applying the Expected Context Framework

In [16]:
from convokit.expected_context_framework import ColNormedTfidfTransformer, DualContextWrapper

To apply the Expected Context Framework, we start by converting the input utterance text to an input vector representation. Here, we represent utterances in a term-document matrix that's normalized by columns (empirically, we found that this ensures that the representations derived by the framework aren't skewed by the relative frequency of utterances). We use the `ColNormedTfidfTransformer` transformer to do this.

We derive different tf-idf representations (with different vocabularies and other parameters) for justice and lawyer utterances, reflecting their different roles (and hence differences in their language use).
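To make "normalized by columns" concrete, here is a minimal NumPy sketch of the idea, assuming binary term weights, scikit-learn's default smoothed idf and row L2 normalization, and then rescaling every term column to unit L2 norm. This is an illustration under those assumptions, not `ColNormedTfidfTransformer`'s actual implementation (which also applies vocabulary limits such as `min_df` and `max_features`); the function name and toy feature strings are hypothetical.

```python
import numpy as np

def col_normed_tfidf(docs):
    """Hypothetical sketch: binary tf-idf, then L2-normalize each term column."""
    # Vocabulary of whitespace-separated features (e.g., dependency-arc strings)
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    col = {tok: j for j, tok in enumerate(vocab)}

    # Binary document-term matrix (binary=True): 1 if the feature occurs in the document
    X = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc.split():
            X[i, col[tok]] = 1.0

    # Smoothed idf weighting, mirroring scikit-learn's default formula
    n_docs = X.shape[0]
    df = X.sum(axis=0)
    X = X * (np.log((1 + n_docs) / (1 + df)) + 1)

    # Row L2 normalization (scikit-learn's TfidfVectorizer default); guard empty rows
    row_norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(row_norms == 0, 1.0, row_norms)

    # Column normalization: rescale every term column to unit L2 norm
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    return X, vocab

docs = ["do>* agree_do", "is_the difference_*", "do>* is_the"]
X, vocab = col_normed_tfidf(docs)
print(vocab)
print(np.round(np.linalg.norm(X, axis=0), 6))  # every column has norm 1
```

After the final step, rows (utterances) no longer have unit norm; it is the term columns that are put on a common scale.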

In [17]:
j_tfidf_obj = ColNormedTfidfTransformer(input_field='arcs', output_field='j_tfidf', binary=True, 
                                   min_df=250, max_df=1., max_features=2000)
_ = j_tfidf_obj.fit(scotus_corpus, selector=lambda x: x.meta['is_valid_utt'])
_ = j_tfidf_obj.transform(scotus_corpus, selector=lambda x: x.meta['is_valid_utt'])

In [18]:
a_tfidf_obj = ColNormedTfidfTransformer(input_field='arcs', output_field='a_tfidf', binary=True, 
                                   min_df=250, max_df=1., max_features=2000)
_ = a_tfidf_obj.fit(scotus_corpus, selector=lambda x: x.meta['is_valid_context'])
_ = a_tfidf_obj.transform(scotus_corpus, selector=lambda x: x.meta['is_valid_context'])

To compute orientation, we compare characterizations of utterances with respect to their forwards context (replies) with characterizations with respect to their backwards context (predecessors). As such, we initialize two Expected Context Models, one that relates utterances to replies and one that relates utterances to predecessors. We ensure that the forwards and backwards characterizations are comparable by initializing the second model with the first.

To manage both interlocked Expected Context Models, we use the `DualContextWrapper` transformer, which keeps track of two `ExpectedContextModelTransformer`s: one that relates utterances to predecessors (`reply_to`) and outputs utterance-level attributes with the prefix `bk`, and another that relates utterances to replies (`next_id`) and outputs utterance-level attributes with the prefix `fw`.
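The orientation statistic itself is computed inside ConvoKit. For rough intuition only, the sketch below assumes that a term's *range* in a given direction is the average cosine distance between its observed context vectors and its expected context, and that orientation is the backwards range minus the forwards range. Under that assumption, a term whose replies cluster more tightly than its predecessors gets a positive (forwards) score. The centroid of the observed contexts is a simplified stand-in for the expected context, which the framework derives differently. All functions and vectors here are hypothetical.

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def context_range(contexts):
    """Average cosine distance from each observed context to their centroid.

    The centroid is a simplified stand-in for the framework's expected context.
    """
    contexts = np.asarray(contexts, dtype=float)
    expected = contexts.mean(axis=0)
    return float(np.mean([cosine_distance(c, expected) for c in contexts]))

def orientation(bk_contexts, fw_contexts):
    # Assumed convention: positive when replies are more tightly clustered than predecessors
    return context_range(bk_contexts) - context_range(fw_contexts)

# Hypothetical term: varied predecessors, tightly clustered replies
bk = [[1.0, 0.0], [0.0, 1.0]]
fw = [[1.0, 0.0], [1.0, 0.05]]
print(orientation(bk, fw))  # positive, i.e., forwards-oriented under these assumptions
```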

In [19]:
dual_context_model = DualContextWrapper(context_fields=['reply_to','next_id'], output_prefixes=['bk','fw'],
                                        vect_field='j_tfidf', context_vect_field='a_tfidf',
                                        n_svd_dims=15, random_state=1000)

In [20]:
dual_context_model.fit(scotus_corpus, selector=lambda x: x.meta['is_valid_utt'],
         context_selector=lambda x: x.meta['is_valid_context'])

### Term-level orientation

We start by examining the term-level orientation statistics we've computed. For convenience, the `DualContextWrapper` outputs all term-level statistics as a dataframe:

In [21]:
term_df = dual_context_model.get_term_df()

We see that most terms (about 70%) have positive orientation. By contrast, in the counseling setting most terms have negative orientation. This difference might reflect the differing goals of counselors and justices: counselors are trained to focus heavily on empathetically addressing what an individual has said, whereas justices are tasked with scrutinizing the arguments made by lawyers.

In [22]:
np.sign(term_df.orn).value_counts(normalize=True)

 1.0    0.700806
-1.0    0.299194
Name: orn, dtype: float64

Among high-orientation terms, we see those reflecting justices pressing the lawyers to address a point in a particular way (e.g., [is there any] difference). The low-orientation terms are somewhat harder to interpret; we note that the idea of being "backwards-oriented" may be ill-defined in this particular setting.

In [23]:
print('\nhigh orientation')
display(term_df.sort_values('orn')[['orn']].tail(20))
print('low orientation')
display(term_df.sort_values('orn')[['orn']].head(20))


high orientation


| term | orn |
|---|---|
| `raised_*` | 0.083416 |
| `talking_are` | 0.085496 |
| `do>*` | 0.086808 |
| `said_if` | 0.087252 |
| `agree_do` | 0.090253 |
| `is_the` | 0.090826 |
| `of_appeals` | 0.093167 |
| `brief_*` | 0.094093 |
| `suppose>*` | 0.094863 |
| `is_where` | 0.099781 |


low orientation


| term | orn |
|---|---|
| `in_order` | -0.068865 |
| `specific_*` | -0.059897 |
| `and>it` | -0.058361 |
| `to_it` | -0.056576 |
| `laughter_*` | -0.055687 |
| `which>*` | -0.053849 |
| `is_for` | -0.052996 |
| `is_which` | -0.052908 |
| `commission_*` | -0.052511 |
| `habeas_*` | -0.052096 |


### Sentence-level orientation

We derive the same statistic at the level of _sentences_ comprising justice utterances (we could do this for utterances, but found sentences to be more interpretable). To start, we create a new corpus consisting of these sentences:
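Each parent utterance stores one sentence per line in its `arcs` and `tokens` fields, and each new sentence ID is the parent ID plus a zero-padded three-digit index (so `20030__1_029_000` is the first sentence of `20030__1_029`). The plain-Python sketch below mirrors that splitting step on made-up field contents; the helper name and inputs are hypothetical. Note that `zip` silently stops at the shorter list if the two fields ever disagree on the number of sentences.

```python
def split_into_sentences(utt_id, arcs, tokens):
    """Hypothetical helper: one record per sentence of a parent utterance."""
    records = []
    for i, (sent_arcs, sent_toks) in enumerate(zip(arcs.split("\n"), tokens.split("\n"))):
        records.append({
            "id": f"{utt_id}_{i:03d}",  # parent ID + zero-padded sentence index
            "text": sent_toks,          # tokenized sentence text
            "arcs": sent_arcs,          # features for this sentence only
            "utt_id": utt_id,           # pointer back to the parent utterance
        })
    return records

recs = split_into_sentences("12345__0_001", "arc1 arc2\narc3", "Well , yes .\nNo ?")
print([r["id"] for r in recs])  # ['12345__0_001_000', '12345__0_001_001']
```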

In [22]:
from convokit import Utterance

In [23]:
sentence_utts = []
for ut in scotus_corpus.iter_utterances(selector=lambda x: x.meta['is_valid_utt']):
    sents = ut.meta['arcs'].split('\n')
    tok_sents = ut.meta['tokens'].split('\n')
    for i, (sent, tok_sent) in enumerate(zip(sents, tok_sents)):
        utt_id = ut.id + '_' + '%03d' % i
        speaker = ut.speaker
        text = tok_sent
        meta = {'arcs': sent, 'utt_id': ut.id, 'speaker': ut.speaker.id}
        sentence_utts.append(Utterance(
                    id=utt_id, speaker=speaker, text=text,
                    reply_to=ut.reply_to, conversation_id=ut.conversation_id,
                    meta=meta
                ))

In [24]:
sentence_corpus = Corpus(utterances=sentence_utts)

In [25]:
sentence_corpus.print_summary_stats()

Number of Speakers: 35
Number of Utterances: 140274
Number of Conversations: 7272


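We then apply both fitted transformers to the sentence corpus: the justice tf-idf model computes each sentence's input representation, and the dual model annotates each sentence that has at least one tf-idf feature with its orientation: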

In [26]:
_ = j_tfidf_obj.transform(sentence_corpus)
_ = dual_context_model.transform(sentence_corpus, selector=lambda x: x.meta['j_tfidf__n_feats'] >= 1)


In [27]:
ut_eg_id = '20030__1_029_000'
eg_ut = sentence_corpus.get_utterance(ut_eg_id)
print(eg_ut.speaker.meta['name'], ':',eg_ut.text)

David H. Souter : Well , do you agree with me that if you do n't go to legislative intent , we have , on your reading , what may be a grammatical reading , but a very foolish statute ?


In [28]:
eg_ut.meta['orn']

0.02958196774118771

For convenience, to inspect sentences with low or high orientation, we will load these statistics into a Pandas dataframe:

In [29]:
sent_df = sentence_corpus.get_attribute_table('utterance',['orn','j_tfidf__n_feats'])
text_df = pd.DataFrame([{'id': ut.id, 'text': ut.text, 'speaker': ut.speaker.meta['name']}
    for ut in sentence_corpus.iter_utterances()
]).set_index('id')
sent_df = sent_df.join(text_df)

As with terms, most sentences (about 87%) have positive orientation.

In [30]:
np.sign(sent_df.orn).value_counts(normalize=True)

 1.0    0.87056
-1.0    0.12944
Name: orn, dtype: float64

In [31]:
low_subset = sent_df[(sent_df.j_tfidf__n_feats >= 30)
                    & (sent_df.orn < sent_df.orn.quantile(.1))].sample(10,random_state=9)
high_subset = sent_df[(sent_df.j_tfidf__n_feats >= 30)
                    & (sent_df.orn > sent_df.orn.quantile(.9))].sample(10,random_state=9)
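In the cell above, the 10% and 90% cutoffs are computed over all annotated sentences, and the `j_tfidf__n_feats >= 30` filter is applied on top of them. The samples are therefore drawn from long sentences that fall in the extreme deciles of the *overall* orientation distribution, not of the long sentences alone. A toy pandas sketch of the same selection, on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sent_df: 100 sentences with steadily increasing
# orientation, alternating between short (10) and long (40) feature counts
toy = pd.DataFrame({
    "orn": np.linspace(-0.01, 0.04, 100),
    "n_feats": np.tile([10, 40], 50),
})

# Cutoffs come from the full distribution, before the length filter
lo_cut = toy.orn.quantile(0.1)
hi_cut = toy.orn.quantile(0.9)
long_enough = toy.n_feats >= 30

low = toy[long_enough & (toy.orn < lo_cut)]
high = toy[long_enough & (toy.orn > hi_cut)]
print(len(low), len(high))  # 5 5
```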

Below, we print out a random sample of long sentences (at least 30 tf-idf features) with high and low orientation, i.e., in the top and bottom 10% (next two cells). The interpretation problems we noted at the term level remain here, and we encourage future work to consider variants or alternatives of the orientation statistic on this dataset.

In [32]:
for sent_id, row in high_subset.sort_values('orn', ascending=False).iterrows():
    print(sent_id, row.speaker, 'orientation:', row.orn)
    print('>', row.text)
    print()

18643__2_007_000 Byron R. White orientation: 0.033653619864415174
> Do you agree that the court below , that the Supreme Judicial Court did not hold that the state courts were required to give review of a prison disciplinary decision ?

15090__1_108_000 Byron R. White orientation: 0.03345959508409968
> Do you still would be open to the Ohio Courts to say notwithstanding the judgment turning him over to the adult Courts has been reversed by the Supreme Court of the United States .

17696__2_125_000 Byron R. White orientation: 0.03340985781504202
> Do you agree with the statement in the Court of Appeals , we hold a comprehensive major Federal action is contemplated in the Northern Great Plains and therefore , what , do you think that is enough that   some of the major Federal action is contemplated ?

20872__0_022_001 Antonin Scalia orientation: 0.032146254486083214
> In other words , did the intermediate court have any fact - finding to do which would not have been reconsidered by the C

In [33]:
for sent_id, row in low_subset.sort_values('orn').iterrows():
    print(sent_id, row.speaker, 'orientation:', row.orn)
    print('>', row.text)
    print()

16516__5_057_000 Warren E. Burger orientation: -0.0062158327698261795
> Are you free when I put that limitation on it , are you free to offer a hypothesis as to why the Government of Cuba as not made any formal claim of act of state , but has simply depended upon a litigation position asserted by you ?

13496__1_095_000 Earl Warren orientation: -0.005610162708263755
> But what line of cases do you rely on to substantiate that point that you can sue only the enforcing agency in order to establish the unconstitutionality of the act when they have done nothing to go beyond a fair interpretation of the act ?

22164__0_020_000 Anthony M. Kennedy orientation: -0.005592280465212318
> Well then , I do n't know what effect you 're giving to the fact , as the earlier questions have indicated , that there is a structural conflict .

15045__0_082_000 Earl Warren orientation: -0.005540441685332631
> Well , why do you have to go from one border of your state way over to the middle of the state in th

## 3. Pipeline usage

We can also apply the framework via a pipeline that handles the following:
* processes text (via a pipeline supplied by the user; see cell below)
* transforms text to input representation (via `ColNormedTfidfTransformer`)
* derives framework output (via a pair of `ExpectedContextModelTransformer`s, as in `DualContextWrapper`)

In [24]:
from convokit.expected_context_framework import DualContextPipeline

In [25]:
# see `demo_text_pipelines.py` in this demo's directory for details
# in short, this pipeline will compute the dependency-parse arcs we use as input features,
# but will skip over utterances for which these attributes already exist
from demo_text_pipelines import scotus_arc_pipeline

We initialize the pipeline with the following arguments:
* `text_field` specifies which utterance metadata field to use as text input
* `text_pipe` specifies the pipeline used to compute the contents of `text_field`
* `tfidf_params` specifies the parameters to be passed into the underlying `ColNormedTfidfTransformer` object

All other arguments are inherited from `DualContextWrapper`.

In [26]:
dual_pipe = DualContextPipeline(context_fields=['reply_to','next_id'],
                                output_prefixes=['bk','fw'], share_tfidf_models=False,
                                text_field='arcs', text_pipe=scotus_arc_pipeline(),
                                tfidf_params={'binary': True, 'min_df': 250, 'max_features': 2000},
                                n_svd_dims=15, random_state=1000)

In [27]:
dual_pipe.fit(scotus_corpus,
             selector=lambda x: x.meta['is_valid_utt'],
         context_selector=lambda x: x.meta['is_valid_context'])

This produces the same term-level output as calling the constituent steps separately:

In [30]:
term_df_new = dual_pipe.get_term_df()

In [31]:
print('\nhigh orientation')
display(term_df_new.sort_values('orn')[['orn']].tail(20))
print('low orientation')
display(term_df_new.sort_values('orn')[['orn']].head(20))


high orientation


| term | orn |
|---|---|
| `raised_*` | 0.083416 |
| `talking_are` | 0.085496 |
| `do>*` | 0.086808 |
| `said_if` | 0.087252 |
| `agree_do` | 0.090253 |
| `is_the` | 0.090826 |
| `of_appeals` | 0.093167 |
| `brief_*` | 0.094093 |
| `suppose>*` | 0.094863 |
| `is_where` | 0.099781 |


low orientation


| term | orn |
|---|---|
| `in_order` | -0.068865 |
| `specific_*` | -0.059897 |
| `and>it` | -0.058361 |
| `to_it` | -0.056576 |
| `laughter_*` | -0.055687 |
| `which>*` | -0.053849 |
| `is_for` | -0.052996 |
| `is_which` | -0.052908 |
| `commission_*` | -0.052511 |
| `habeas_*` | -0.052096 |


Note that the pipeline enables us to transform ad-hoc string input:

In [32]:
eg_ut_new = dual_pipe.transform_utterance('What is the difference between these statutes?')

In [33]:
print('orientation:', eg_ut_new.meta['orn'])

orientation: 0.03682089802385846
