<a href="https://colab.research.google.com/github/eastmountaincode/portfolio/blob/main/caselawAPIFYforPortfolio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Pulling data from the Caselaw Access Project API

From https://case.law/about/:

"*The **Caselaw Access Project** (“CAP”) expands public access to U.S. law. Our goal is to make all published U.S. court decisions freely available to the public online, in a consistent format, digitized from the collection of the Harvard Law School Library.*"

This task was connected to a University of Cincinnati Digital Scholarship Center project investigating the Anthropocene:
https://www.cambridge.org/core/journals/pmla/article/abs/anthropocene-and-empire-discourse-networks-of-the-human-record/367FA0820A5AB8F59755010E37DC7B48

It was my job to pull all caselaw documents containing any of the words "bomb", "atom", "nuclear", "pollution", "climate", "environment", "earth", or "environmental". This code writes each case's ID, URL, decision date, title, and text directly to a CSV file.
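
Case text is full of commas, quotation marks, and line breaks, so it is worth checking that the CSV layer preserves them. The following is a minimal sketch using only the standard library `csv` module and an in-memory buffer; the sample row is invented for illustration and is not CAP data.

```python
import csv
import io

FIELDNAMES = ['ID', 'URL', 'date', 'title', 'text']

# Invented sample row (not real CAP data) whose text contains
# commas, double quotes, and a line break.
sample_row = {
    'ID': '1',
    'URL': 'https://example.com/cases/1/',
    'date': '1950-01-01',
    'title': 'Doe v. Roe',
    'text': 'The court held, in part, that "pollution" was at issue.\nAffirmed.',
}

# Write the header and the row to an in-memory buffer instead of a file.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerow(sample_row)

# Read the buffer back and check that the row survived unchanged.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0] == sample_row)
```

In this sketch the writer quotes the text field on its own, so the punctuation comes back intact when the row is read again.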

## The code

In [None]:
import csv

import pandas as pd
import requests

API_KEY = '<API KEY GOES HERE>'
HEADERS = {'Authorization': 'Token {key}'.format(key=API_KEY)}
CSV_PATH = 'str8_2_hell.csv'
FIELDNAMES = ['ID', 'URL', 'date', 'title', 'text']

termlist = ["bomb", "atom", "nuclear", "pollution", "climate", "environment", "earth", "environmental"]

saved_ids = []         #IDs in the order they were written
seen_ids = set()       #the same IDs, for fast "already saved?" checks
alt_case_counter = 0   #cases with no opinions, where head_matter is saved instead

#open the CSV once; newline='' lets the csv module control line endings
with open(CSV_PATH, mode='w', newline='', encoding='utf-8') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=FIELDNAMES)
    writer.writeheader()

    #for each term...
    for termi in termlist:

        #get the first page, which will serve as a foundation for getting the others
        res = requests.get(
            'https://api.case.law/v1/cases/',
            params={'full_case': 'true', 'search': termi},
            headers=HEADERS
        )
        res.raise_for_status()
        res = res.json()

        #get total doc count, which is used to report progress
        docCount = res['count']
        counter = 0

        #big ol while loop: one pass per page of results
        while True:
            for case in res['results']:
                #get the id and skip cases already saved under an earlier term
                docID = str(case['id'])
                if docID in seen_ids:
                    continue

                #make sure there actually is an opinion we can pull
                casebody = case['casebody']['data']
                if casebody['opinions']:
                    text = casebody['opinions'][0]['text']
                else:
                    #alt case: fall back to the head matter
                    alt_case_counter += 1
                    text = casebody['head_matter']

                #write the row straight to the CSV
                writer.writerow({
                    'ID': docID,
                    'URL': case['url'],
                    'date': case['decision_date'],
                    'title': case['name_abbreviation'],
                    #strip commas from the full text before writing
                    'text': text.replace(',', ''),
                })
                seen_ids.add(docID)
                saved_ids.append(docID)

            counter += len(res['results'])
            if docCount:
                percent_done = round(((counter / docCount) * 100), 2)
                print("{percent_done} percent done".format(percent_done = percent_done))

            #get the next url; stop when there are no more pages
            nextURL = res['next']
            if not nextURL:
                break
            res = requests.get(nextURL, headers=HEADERS)
            res.raise_for_status()
            res = res.json()
        print(termi)

#keep the saved IDs in a dataframe for inspection
id_df = pd.DataFrame({'ID': saved_ids})

print("Completed!")
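
The loop above walks through the API's pages by following each response's `next` link until none is left. That pattern can be isolated and tried out without touching the network. The following is a minimal sketch, not the notebook's own method: `iter_results` and `fetch_json` are hypothetical names, and the fake pages imitate only the `results` and `next` keys that the code above reads.

```python
def iter_results(first_url, fetch_json):
    """Yield every item in 'results' across pages, following 'next' links.

    fetch_json is a hypothetical callable: it takes a URL and returns
    the decoded JSON dict for that page.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page['results']
        url = page['next']  # None on the last page, which ends the loop


# Invented stand-in for the API: two pages keyed by made-up URLs.
FAKE_PAGES = {
    'page1': {'results': [{'id': 1}, {'id': 2}], 'next': 'page2'},
    'page2': {'results': [{'id': 3}], 'next': None},
}

ids = [case['id'] for case in iter_results('page1', FAKE_PAGES.__getitem__)]
print(ids)  # [1, 2, 3]
```

Against the real API, `fetch_json` could be a small wrapper around `requests.get(url, headers=...).json()`. Because the generator checks the URL rather than the page contents, a result set that fits on a single page is handled the same way as a multi-page one.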

In [None]:
print("We have", alt_case_counter, "alt cases.")
print(id_df)

We have 0 alt cases.
           ID
0     5311147
1    10910713
2       82210
3     3819892
4     3958351
..        ...
395  11319933
396    175196
397   1306459
398   4131365
399   2657162

[400 rows x 1 columns]
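
Since the same case can match several search terms, a quick follow-up check is to confirm that no ID appears more than once in the CSV. The following is a minimal sketch using only the standard library; the CSV content is invented for illustration, with one deliberate duplicate, and stands in for the real output file.

```python
import csv
import io
from collections import Counter

# Invented CSV content standing in for the output file; ID 2 appears twice.
sample_csv = io.StringIO(
    "ID,URL,date,title,text\n"
    "1,https://example.com/1/,1950-01-01,A v. B,first text\n"
    "2,https://example.com/2/,1960-02-02,C v. D,second text\n"
    "2,https://example.com/2/,1960-02-02,C v. D,second text\n"
)

# Count how often each ID occurs, then keep only the repeated ones.
id_counts = Counter(row['ID'] for row in csv.DictReader(sample_csv))
duplicates = {case_id: n for case_id, n in id_counts.items() if n > 1}

print(len(id_counts), duplicates)  # 2 {'2': 2}
```

To run the same check on the real file, `sample_csv` could be replaced with `open('str8_2_hell.csv', newline='')`. Full case texts can be very long, so the reader's field size limit may need raising with `csv.field_size_limit` first.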
