In [1]:
from typing import List, Dict

import regex as re
from convokit import Corpus, Speaker, Utterance
import numpy as np
from pickle import load

In [2]:
supreme = Corpus(filename="../supreme_full_processed_lem")

In [3]:
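def get_utterances(token: str) -> List[Dict[str, str]]:
    """Return a dict for every utterance whose `lem-tokens` metadata contains `token`."""
    # Note: this filters the whole corpus again on every call.
    subcorpus = supreme.filter_utterances_by(
        lambda utt: token in utt.meta["lem-tokens"]
    )
    return [dictify_utt(utt) for utt in subcorpus.iter_utterances()]


def dictify_utt(utt: Utterance) -> Dict[str, str]:
    """Reduce an utterance to its text plus the speaker's gender signal and id."""
    speaker = utt.get_speaker()  # look the speaker up once and reuse it
    return {
        "text": utt.text,
        "gender": speaker.meta["gender_signal"],
        "speaker-id": speaker.id,
    }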
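
Each call to `get_utterances` filters the full corpus, so inspecting several indices for the same token repeats the same scan. A minimal sketch of one possible alternative, assuming the records fit in memory, is to build a token-to-records index once and look tokens up in it. The `build_token_index` helper and the stand-in `records` list are hypothetical; plain dicts are used here in place of ConvoKit utterances so the cell runs on its own.

```python
from collections import defaultdict
from typing import Dict, List

# Stand-in records (hypothetical). In the notebook, "tokens" would come from
# utt.meta["lem-tokens"] and the other fields from dictify_utt(utt).
records = [
    {"tokens": ["the", "state", "argue"], "text": "the state argues", "gender": "F"},
    {"tokens": ["this", "way"], "text": "this way", "gender": "M"},
    {"tokens": ["state", "law"], "text": "state law", "gender": "M"},
]


def build_token_index(recs: List[dict]) -> Dict[str, List[dict]]:
    """Map each token to the records that contain it, in input order."""
    index: Dict[str, List[dict]] = defaultdict(list)
    for rec in recs:
        # set() so a record is listed once even if the token repeats in it
        for token in set(rec["tokens"]):
            index[token].append(rec)
    return index


token_index = build_token_index(records)
state_hits = token_index["state"]  # a single dict lookup, no rescan
```

With such an index, repeated lookups like `token_index["state"][0]` and `token_index["state"][1]` cost one pass over the data in total rather than one pass per call.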

In [7]:
import pprint

In [8]:
pprint.pprint(get_utterances("state")[34])

{'gender': 'F',
 'speaker-id': 'lisa_s_blatt',
 'text': 'so you took this case on the assumption -- and we cite it on page '
         '<number> of our reply brief -- all the places where the respondent '
         'concedes, and the montana supreme court expressly said, that this '
         "money has to be used to carry out the remedy. and that's the way "
         'this case comes up. if you want to know the reason for sunburst, '
         "it's because of the reason is personal. so, if you own a property "
         "and love it so much and you don't have any damages, the whole point "
         'of the restoration remedy to avoid the windfall is you have to spend '
         "the money. so i'm quite confident that i'm accurately stating "
         'montana law and that respondents never argued to the contrary. and '
         'in our reply brief, again, we cite all the concessions, including, i '
         'think, the opinion below in three places says the money has to go to '
         "

In [9]:
pprint.pprint(get_utterances("state")[8])

{'gender': 'M',
 'speaker-id': 'derek_l_shaffer',
 'text': "i'm -- i'm not urging that, and i don't think the court needs to "
         "reexamine that because there isn't the same sort of congruence and "
         'proportionality problem. part of what was at issue i think in city '
         'of boerne, and rightfully concerned the court, is congress was '
         'redefining the substantive law. it was intruding upon the '
         'substantive conduct of states and basically changing the substantive '
         "rules of what would constitute a constitutional violation. that's "
         'not what you have here, respectfully, justice alito. this is '
         'congress looking at something that is a cardinal sin. it is states '
         "infringing federal copyrights, protected federal property. and it's "
         "-- it's enacting a remedy that is precisely tailored to that. states "
         'have to pay what any private infringer would pay. states have to pay '
         'what th

In [10]:
pprint.pprint(get_utterances("state")[209])

{'gender': 'M',
 'speaker-id': 'eric_f_citron',
 'text': 'i -- i think the simplest understanding is the following. the code '
         'revision commission is in two critical respects like the legislature '
         'or exercising a legislative or law-making function. first, it '
         'discharges its duties entirely for the behest -- at the behest of '
         'and for the benefit of the legislature, and the georgia supreme '
         'court has told us that this is an exercise of the legislative '
         'authority for purposes of georgia constitutional law. so trying to '
         'draw some line between the code revision commission and the '
         'legislature would be, i think, inauthentic. on top of that, code '
         'revision commissions are exercising a legislative function. they '
         'assemble the text of the statutes. if you were to adopt a rule that '
         'the code revision commission does not speak for the state, in states '
         'like new york,

In [11]:
pprint.pprint(get_utterances("state")[543])

{'gender': 'M',
 'speaker-id': 'jonathan_c_bond',
 'text': "i don't think it's because the state has used the label. the "
         'eleventh circuit pointed out that in the context of that particular '
         'scheme, it was quite implausible that the -- the -- the state '
         'legislature, i think it was alabama, intended that the highest level '
         'of its scheme not to become an acca predicate and not to involve '
         'conduct that would give rise to this kind of inference, even though '
         "lower level offenses did. it's a context-specific thing. and we're "
         'not suggesting that this case turns on exactly where you draw the '
         'line, and where the -- where the eleventh circuit drew it or where '
         'the fourth circuit drew it with respect to that <number> gram '
         'threshold.'}


In [13]:
pprint.pprint(get_utterances("way")[76])

{'gender': 'F',
 'speaker-id': 'erica_l_ross',
 'text': 'so, justice alito, i certainly take the point that arbitrators may '
         'not be bound in the same way that lower courts would be. i think '
         'where you have a situation -- although, actually, in the second '
         "circuit, there is a case overturning an arbitrator's decision for "
         'failing to follow second circuit law and following other courts. i '
         'take the point that this court might not agree with that decision. '
         'but i think when you have a case like this one where you have '
         '<number> arbitrations on one side of the ledger and zero on the '
         "other, you don't actually need to decide these sort of more "
         'difficult edge cases about what would happen if it were closer or if '
         'you really had a question as to what law the arbitrators were '
         "applying. it's quite clear in these, again, reported, well-reasoned, "
         'quite predictable

In [14]:
pprint.pprint(get_utterances("way")[77])

{'gender': 'F',
 'speaker-id': 'erica_l_ross',
 'text': 'so i think, justice alito, the way i would phrase it, if i might, is '
         'that the -- the arbitration decisions are really confirmation of the '
         "industry's understanding because these are expert arbitrators. and, "
         'again, going back to where i started this morning, the industry had '
         'and has had, i believe since the 1950s, two sort of standard form '
         'contracts that govern. and so it is consistent with that dichotomy '
         'between --'}


In [15]:
pprint.pprint(get_utterances("way")[89])

{'gender': 'M',
 'speaker-id': 'elbert_lin',
 'text': 'if i may, i would quarrel with your use of the word "evasion," '
         "because i think what's important to remember is it's a comprehensive "
         "scheme. congress didn't design a -- it didn't just put the point "
         'source program out into the world on a hope and a prayer that there '
         'would be some other regulatory program that would cover the other '
         "scenarios, including the one that you're talking about, justice "
         'breyer. there -- there is a nonpoint source program. there are laws, '
         'including in hawaii, that would explicitly prohibit the scenario '
         "that you're talking about. hawaii code 354d -- three -- "
         "354d-<number>, it says that you can't alter the way your -- your -- "
         'your discharge system is structured without permission from the '
         'director of --'}


In [16]:
pprint.pprint(get_utterances("abuse")[34])

{'gender': 'M',
 'speaker-id': 'jeffrey_l_fisher',
 'text': 'justice kagan, what we think is that if you want to give a '
         'definition for sexual abuse of a minor, at least in the context of '
         'the allegation of abuse being due to age, that <number> would be the '
         'appropriate cutoff. and we do say that in our reply brief in '
         "response to the government's point that, at least its arguments, "
         'that you need to go further than we argued in our blue brief.'}


In [20]:
pprint.pprint(get_utterances("abuse")[98])

{'gender': 'M',
 'speaker-id': 'matthew_s_hellman',
 'text': "i suppose you could, but there's a difference as to why i think the "
         'statute ought to tolerate prosecution in that scenario, which is '
         "where there's been a formal notice of audit and someone has been "
         'given questions by the government and needs to respond in a '
         'reasonable manner to them. you can understand why congress wanted to '
         'make that a crime distinct from, maybe on top of, other crimes that '
         "a person has committed. but if we're talking about the maintenance "
         'of records prior to the initiation of that proceeding, then there '
         'are many other crimes that do cover recordkeeping and, of course, '
         'your obligation to pay taxes. but those are generally, with the '
         'exception of tax evasion, generally not felonies and they generally '
         'have a lower sentence than the one here. so i do take your point '
         'tha

In [21]:
pprint.pprint(get_utterances("abuse")[65])

{'gender': 'F',
 'speaker-id': 'alexandra_a_e_shapiro',
 'text': 'mr. chief justice, and may it please the court: in case after case '
         '-- mcnally, skilling, and mcdonald, to name just a few -- this court '
         'has construed federal criminal statutes narrowly to avoid serious '
         'separation of powers and vagueness problems. this case presents '
         'those same constitutional concerns, but to a far greater degree, '
         'because no statute defines the elements of the crime. the court '
         'should limit this crime to its core, as it did in skilling, and that '
         "core is the insider's abuse of confidential corporate information "
         'for personal profit. unless and until congress enacts a definition, '
         'the crime should be limited to trading by the insider or its '
         'functional equivalent -- equivalent where the insider tips another '
         'person in exchange for a financial benefit.'}


In [23]:
pprint.pprint(get_utterances("abuse")[13])

{'gender': 'M',
 'speaker-id': 'k_winn_allen',
 'text': 'i agree with ms. ratner on this. i -- i -- i think there is some '
         'daylight between plain error and abuse of discretion, probably not '
         'much. i do think many sentences that are deemed substantively '
         "unreasonable will like satisfy plain error review. but i don't think "
         'the --'}
