In [None]:
!pip install eyecite

In [2]:
import eyecite
text_from_opinion = """Copyright and patents, the Constitution says,
    are to “promote the Progress of Science and useful Arts,
    by securing for limited Times to Authors and Inventors the
    exclusive Right to their respective Writings and Discoveries.”
    Art. I, §8, cl. 8. Copyright statutes and case law have made
    clear that copyright has practical objectives. It grants an
    author an exclusive right to produce his work (sometimes for
    a hundred years or more), not as a special reward, but in order
    to encourage the production of works that others might reproduce
    more cheaply. At the same time, copyright has negative features.
    Protection can raise prices to consumers. It can impose special
    costs, such as the cost of contacting owners to obtain reproduction
    permission. And the exclusive rights it awards can sometimes stand
    in the way of others exercising their own creative powers. See
    generally Twentieth Century Music Corp. v. Aiken, 422 U. S. 151,
    156 (1975); Mazer v. Stein, 347 U. S. 201, 219 (1954)."""

In [3]:
citations = eyecite.get_citations(text_from_opinion)
len(citations)

3

In [13]:
citations

[NonopinionCitation('§8,', metadata=CitationBase.Metadata(parenthetical=None)),
 FullCaseCitation('422 U. S. 151', groups={'volume': '422', 'reporter': 'U. S.', 'page': '151'}, metadata=FullCaseCitation.Metadata(parenthetical=None, pin_cite=None, year=None, court='scotus', plaintiff='Corp.', defendant='Aiken', extra=None)),
 FullCaseCitation('347 U. S. 201', groups={'volume': '347', 'reporter': 'U. S.', 'page': '201'}, metadata=FullCaseCitation.Metadata(parenthetical=None, pin_cite='219', year='1954', court='scotus', plaintiff='Mazer', defendant='Stein', extra=None))]

In [18]:
facts_section = """Google envisioned an Android platform that was free and
    open, such that software developers could use the tools
    found there free of charge. Its idea was that more and more
    developers using its Android platform would develop ever
    more Android-based applications, all of which would make
    Google’s Android-based smartphones more attractive to ultimate consumers.
    Consumers would then buy and use ever
    more of those phones. Oracle America, Inc. v. Google Inc.,
    872 F. Supp. 2d 974, 978 (ND Cal. 2012); App. 111, 464.
    That vision required attracting a sizeable number of skilled
    programmers.
    At that time, many software developers understood and
    wrote programs using the Java programming language, a
    language invented by Sun Microsystems (Oracle’s predecessor). 872 F. Supp. 2d, at 975, 977. About six million programmers had spent considerable time learning, and then
    using, the Java language. App. 228. Many of those programmers used Sun’s own popular Java SE platform to develop new programs primarily for use in desktop and laptop
    computers. Id., at 151–152, 200. That platform allowed
    developers using the Java language to write programs that
    were able to run on any desktop or laptop computer, regardless of the underlying hardware (i.e., the programs were in
    large part “interoperable”). 872 F. Supp. 2d, at 977. Indeed, one of Sun’s slogans was “‘write once, run anywhere.’”
    886 F. 3d, at 1186."""

In [19]:
clean_facts_section = eyecite.clean_text(facts_section, ["all_whitespace"])

In [20]:
len(facts_section) - len(clean_facts_section)

76
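
All 76 removed characters come from indentation. The triple-quoted string spans 20 lines, and each of its 19 line breaks is followed by four spaces. Collapsing each five-character run (newline plus four spaces) into a single space removes 4 characters per run, and 19 × 4 = 76. The sketch below reproduces that arithmetic with the standard library. It assumes `"all_whitespace"` amounts to replacing every `\s+` run with one space. The helper names are hypothetical and not part of eyecite:

```python
import re


def collapse_whitespace(text: str) -> str:
    # Replace every run of whitespace (spaces, tabs, newlines) with one space.
    return re.sub(r"\s+", " ", text)


def chars_removed(text: str) -> int:
    # A run of n whitespace characters shrinks to 1, removing n - 1 characters.
    return sum(len(run) - 1 for run in re.findall(r"\s+", text))
```
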

In [21]:
facts_section_citations = eyecite.get_citations(clean_facts_section)
len(facts_section_citations)

5
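
The five citations are not all of one kind. The cells below show a `ShortCaseCitation` and an `IdCitation` among them. A quick way to see the mix is to tally the objects by class name. This is a minimal standard-library sketch. `count_by_type` is a hypothetical helper, and in this notebook it would be called as `count_by_type(facts_section_citations)`:

```python
from collections import Counter
from typing import Iterable


def count_by_type(citations: Iterable[object]) -> Counter:
    """Tally citation objects by their class name."""
    # type(c).__name__ yields names such as "FullCaseCitation" or "IdCitation".
    return Counter(type(c).__name__ for c in citations)
```
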

In [22]:
print(facts_section_citations[1])

ShortCaseCitation('872 F. Supp. 2d, at 975', groups={'volume': '872', 'reporter': 'F. Supp. 2d', 'page': '975'}, metadata=ShortCaseCitation.Metadata(parenthetical=None, pin_cite='975, 977', year=None, court=None, antecedent_guess='.'))
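
Pin cites come back as raw strings. Here the value is `'975, 977'`, and the `Id.` citation in the same passage reads `at 151–152, 200`. The `url_from_citation` function further down uses only the first number. If every cited page is needed, the string has to be expanded.

The sketch below makes two assumptions:

- Each comma-separated part is either a single page or a range joined by a hyphen or en dash. A range may be abbreviated, as in `151-52`.
- Anything else, such as a footnote reference like `975 n. 3`, is skipped.

`pin_pages` is a hypothetical helper, not an eyecite function:

```python
import re
from typing import List

# A single page, or a range joined by a hyphen or an en dash (U+2013).
PAGE_OR_RANGE = re.compile(r"\s*(\d+)\s*(?:[-\u2013]\s*(\d+))?\s*")


def pin_pages(pin_cite: str) -> List[int]:
    """Expand a pin-cite string such as '975, 977' into page numbers."""
    pages: List[int] = []
    for part in pin_cite.split(","):
        match = PAGE_OR_RANGE.fullmatch(part)
        if match is None:
            continue  # e.g. "975 n. 3" is not a plain page or range
        first_text, last_text = match.group(1), match.group(2)
        first = int(first_text)
        if last_text is None:
            pages.append(first)
            continue
        last = int(last_text)
        if last < first:
            # Abbreviated range such as "151-52": borrow the leading digits.
            prefix = first_text[: len(first_text) - len(last_text)]
            last = int(prefix + last_text)
        pages.extend(range(first, last + 1))
    return pages
```
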


In [23]:
facts_section_citations[1].token

CitationToken(data='872 F. Supp. 2d, at 975', start=757, end=780, groups={'volume': '872', 'reporter': 'F. Supp. 2d', 'page': '975'}, exact_editions=(Edition(reporter=Reporter(short_name='F. Supp.', name='Federal Supplement', cite_type='federal', source='reporters', is_scotus=False), short_name='F. Supp. 2d', start=datetime.datetime(1988, 1, 1, 0, 0), end=datetime.datetime(2014, 8, 21, 0, 0)),), variation_editions=(), short=True)
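
The `start` and `end` values (757 and 780) index into `clean_facts_section`, which is the text actually passed to `get_citations`. They do not index into the original `facts_section`. Because cleaning removed 76 characters, the same slice of `facts_section` would land somewhere else. If links have to go into the original text, cleaned offsets must be mapped back.

eyecite's `annotate_citations` can, as far as I understand, do this itself when the original is supplied through its `source_text` argument. The sketch below is a simplified, standard-library-only illustration for the whitespace-collapsing case. The helper names are hypothetical:

```python
import re
from typing import List, Tuple


def collapse_with_offsets(text: str) -> Tuple[str, List[int]]:
    """Collapse whitespace runs to one space and record, for every character
    of the result, its index in the original text."""
    chars: List[str] = []
    offsets: List[int] = []
    pos = 0
    for run in re.finditer(r"\s+", text):
        # Copy the non-whitespace stretch before this run unchanged.
        for i in range(pos, run.start()):
            chars.append(text[i])
            offsets.append(i)
        # The whole run becomes one space, mapped to the run's first character.
        chars.append(" ")
        offsets.append(run.start())
        pos = run.end()
    # Copy whatever follows the last whitespace run.
    for i in range(pos, len(text)):
        chars.append(text[i])
        offsets.append(i)
    return "".join(chars), offsets


def to_original_span(span: Tuple[int, int], offsets: List[int]) -> Tuple[int, int]:
    """Translate a non-empty (start, end) span of the cleaned text back to the original."""
    start, end = span
    # end is exclusive: map the last included character, then step past it.
    return offsets[start], offsets[end - 1] + 1
```
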

In [25]:
facts_section_citations[2]

IdCitation('Id.,', metadata=IdCitation.Metadata(parenthetical=None, pin_cite=None))
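
`Id.` on its own does not name a case. It points back to the immediately preceding authority. eyecite provides `resolve_citations` for linking short, supra, and `Id.` citations to their full antecedents. The sketch below is only a simplified, hypothetical illustration of the idea. It uses stand-in records (`SimpleCite`) rather than eyecite objects.

This passage shows one caveat. `Id., at 151–152, 200` follows `App. 228`, a record citation that is not among the five detected citations. A resolver working from the detected list alone would therefore tie `Id.` to the preceding `872 F. Supp. 2d` cite, whereas a human reader would tie it to the appendix.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SimpleCite:
    """Hypothetical stand-in for a detected citation."""
    kind: str  # "full", "short", or "id"
    volume: str = ""
    reporter: str = ""


def resolve_antecedents(cites: List[SimpleCite]) -> List[Optional[int]]:
    """Return, for each citation, the index of the full citation it refers to
    (None when no antecedent was detected)."""
    full_by_key: Dict[Tuple[str, str], int] = {}
    resolved: List[Optional[int]] = []
    for i, cite in enumerate(cites):
        if cite.kind == "full":
            # A full citation is its own antecedent; remember it by volume and reporter.
            full_by_key[(cite.volume, cite.reporter)] = i
            target: Optional[int] = i
        elif cite.kind == "short":
            # Short cites repeat the volume and reporter of an earlier full cite.
            target = full_by_key.get((cite.volume, cite.reporter))
        else:
            # "Id." inherits whatever the immediately preceding citation resolved to.
            target = resolved[-1] if resolved else None
        resolved.append(target)
    return resolved
```

The stand-in sequence can be modeled on the five citations of this passage: a full and a short cite to 872 F. Supp. 2d, the `Id.`, another short 872 F. Supp. 2d cite, and a short cite to 886 F. 3d. For that sequence the sketch returns `[0, 0, 0, 0, None]`. The last entry is `None` because no full 886 F. 3d citation appears in the excerpt.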

In [26]:
discussion_text = eyecite.clean_text(text_from_opinion, ["all_whitespace"])
discussion_citations = eyecite.get_citations(discussion_text)

In [27]:
import re
from urllib.parse import urlunparse, ParseResult
from eyecite.models import CaseCitation

In [29]:
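from typing import Optional

def url_from_citation(cite: CaseCitation) -> Optional[str]:
    """Make a URL for linking to an opinion on case.law.

    Returns None when the reporter has no known case.law path.
    """
    # Keys are normalized reporter names as returned by corrected_reporter()
    # (e.g. "U.S."), not the "U. S." spelling that appears in the opinion.
    reporter_abbreviations = {
        "U.S.": "us",
        "F. Supp.": "f-supp",
        "F. Supp. 2d": "f-supp-2d",
    }
    reporter = reporter_abbreviations.get(cite.corrected_reporter())
    if reporter is None:
        return None

    # Volume, page, and pin cite live in cite.groups and cite.metadata,
    # not as top-level attributes of the citation object.
    # For short citations, groups["page"] is the pinpoint page rather than
    # the first page of the opinion, so those links may not resolve.
    volume = cite.groups["volume"]
    page = cite.groups["page"]
    pin_cite = cite.metadata.pin_cite

    # Assumes that the first number in the pin_cite field is
    # the correct HTML fragment identifier for the URL.
    match = re.search(r"\d+", pin_cite) if pin_cite else None
    fragment = f"p{match.group()}" if match else ""

    url_parts = ParseResult(
        scheme="https",
        netloc="cite.case.law",
        path=f"/{reporter}/{volume}/{page}/",
        params="",
        query="",
        fragment=fragment)

    return urlunparse(url_parts)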
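
The hand-written `reporter_abbreviations` mapping in `url_from_citation` has to grow with every new reporter. Its entries (`"U.S."` to `us`, `"F. Supp."` to `f-supp`) suggest a mechanical rule: drop the periods, hyphenate what remains, and lowercase. If case.law slugs follow that pattern, they could be derived instead of listed. That pattern is an assumption inferred from the mapping, not a documented case.law rule.

`reporter_slug` below is hypothetical. It expects a normalized reporter name (such as the output of `corrected_reporter()`), because the opinion's `"U. S."` spelling would turn into `u-s`:

```python
import re


def reporter_slug(reporter: str) -> str:
    """Guess a case.law path segment from a normalized reporter abbreviation.

    Assumed rule (inferred, not documented): drop periods, turn remaining
    whitespace runs into hyphens, and lowercase the result.
    """
    without_periods = reporter.replace(".", "")
    return re.sub(r"\s+", "-", without_periods.strip()).lower()
```
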

In [33]:
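from typing import List, Tuple

# Each annotation is ((start, end), text to insert before, text to insert after).
# typing.List/Tuple keep these hints valid on Python versions before 3.9.
Annotation = Tuple[Tuple[int, int], str, str]

def make_annotations(citations: List[CaseCitation]) -> List[Annotation]:
    """Wrap each case citation in a link to its opinion on case.law."""
    result = []
    for cite in citations:
        # Id., supra, and non-case citations such as "§8," are not CaseCitation instances.
        if isinstance(cite, CaseCitation):
            caselaw_url = url_from_citation(cite)
            if caselaw_url is None:
                continue  # reporter has no known case.law path
            result.append((cite.span(), f'<a href="{caselaw_url}">', "</a>"))
    return result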
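
`make_annotations` returns `((start, end), before, after)` tuples. That is the shape eyecite's `annotate_citations` takes as its annotations argument, presumably as in `eyecite.annotate_citations(discussion_text, make_annotations(discussion_citations))`.

To show what applying such tuples amounts to, here is a simplified standard-library sketch. `apply_annotations` is hypothetical and handles none of the extra cases the library function covers, such as mapping cleaned offsets back to a source text:

```python
from typing import List, Tuple

Annotation = Tuple[Tuple[int, int], str, str]


def apply_annotations(text: str, annotations: List[Annotation]) -> str:
    """Wrap each (start, end) span of text in its before/after strings."""
    pieces: List[str] = []
    cursor = 0
    # Walk the spans left to right, copying untouched text between them.
    for (start, end), before, after in sorted(annotations, key=lambda a: a[0]):
        if start < cursor:
            raise ValueError("overlapping annotation spans")
        pieces.append(text[cursor:start])
        pieces.append(before + text[start:end] + after)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Sorting by span first means the annotations may arrive in any order. Building the output from pieces avoids the offset drift that inserting tags directly into the string would cause.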

In [31]:
url_from_citation(citations[2])