# Final Assignment (Prompt 2) LIGN 167
## Contributors:
- Andrew Lona

## Project Prompt (2)
Option 2: Implement an interesting application on top of a Large Language Model, such as GPT-3. These models can write stories, play games, write programs, generate images (Stable Diffusion), and ... do lots of other things, which we're only discovering now. What happens when you hook these things together?

A good application will go beyond the basic things that you can do with Large Language Models. For example, just making an API call to GPT-3 is not sufficient. Here are some ways to get a good grade (you don't have to do all of them at once, and others may be possible):

- You have found a creative way to chain together Large Language Models with each other or other parts of the software stack.
- Your application addresses a real / interesting user need.
- You have found a non-obvious capability of Large Language Models.
- Your application provides a good user experience.

## Goal for this project
The goal is to create a law simplification pipeline/application for CA Legislature bills.
## Literature Review, detailing the need/importance of legalese simplification, along with background research into the topic which should potentially speed up progress.
    - *Poor writing, not specialized concepts, drives processing difficulty in legal language* (Martinez, et al., 2022)
        - Details how legalese comprehension difficulty is not just noticeable, but directly tied to poor writing practices. Provides framework for testing and goals for scaling readability.
    - *Plain English Summarization of Contracts* (Manor and Li, 2019)
        - An earlier attempt at legalese simplification, albeit with a simpler model and more statistically rigid readability scoring criteria. Also says "this is a very challenging task" (p. 2)

### Python Code Integration Process. Both in code and as a brief writeup of process + results.
    - Regarding Pipeline/process stitching, I am hoping to combine web scraping, reading difficulty classification, and comprehension level translation into a seamless process powered by GPT-3.
    - If there is more time available, a GUI for this application will be made (perhaps using GPT-3 to code the UI?), as well as a fine-tuned GPT-3 model to improve efficiency.
        - So far I haven't found a big need for fine-tuning GPT-3, and others on Piazza report worse performance with their fine-tuned models, so I decided to focus my efforts on testing and further literature research.
    - As such, the pipeline will follow this process in the code below...

    1. The user is asked to input a quote from an AB or SB bill they would like simplified and what level they would like it simplified to.
    2. The quote is then used to web-scrape the entire text of the bill in question and the location of the quote within it. The text will then either be fed through GPT-3, an alternative model, a series of statistical formulas (simple model), or a combination to determine the 4 comprehensio
    3. The text is then scored on
        - word choice, word-frequency, and passive voice (embedded text and non-standard capitalization are unfortunately not finished but will continue to be worked on).
    4. These 3 scores are then compared to reference scores which provide levels of difficulty based off the Martinez et al. paper.
    5. GPT-3 is then fed this info along with the text, and is asked to simplify the text to the level given.
    6. GPT-3 is then asked to do one or all of three things during this process, each of which is tweakable as a parameter.
        - If the difficulty of the text is higher than the difficulty of the example corpus, then GPT-3 will simplify before applying any theming. This is tweaked with corpus resizing of the sample.
        - If the simplified text is still judged too difficult and can be simplified further, GPT-3 will be asked to repeat the above process until the bill is understood. This is tweaked with pass size.
        - If the user provides GPT-3 with a theming prompt, then the simplified text will have a theme applied (either based on Social Media, Films, or Academic writing). This can be omitted but is left in as part of the UI for ease of use.
    7. After the process is finished, the simplified version of the bill is outputted to the user along with any additional information scraped to provide context.
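
Steps 4 through 6 amount to a score, compare, and re-simplify loop. The sketch below illustrates only that control flow and is not the project's actual code. `simplify_until_target`, `toy_score`, and `toy_simplify` are hypothetical names, the toy functions stand in for the real scorer and the GPT-3 call, and the sketch assumes every score is "lower is better", which is a simplification.

```python
from typing import Callable, Dict


def simplify_until_target(
    text: str,
    score_fn: Callable[[str], Dict[str, float]],
    simplify_fn: Callable[[str], str],
    reference: Dict[str, float],
    max_passes: int = 3,
) -> str:
    """Re-simplify text until every score is at or below its reference value,
    or until max_passes simplification passes have been used."""
    for _ in range(max_passes):
        scores = score_fn(text)
        # Stop as soon as the text is no harder than the reference level.
        if all(scores[name] <= limit for name, limit in reference.items()):
            break
        text = simplify_fn(text)
    return text


# Toy stand-ins (assumptions): a real run would call the scorer and GPT-3.
def toy_score(text: str) -> Dict[str, float]:
    return {"unique_words": len(set(text.lower().split()))}


def toy_simplify(text: str) -> str:
    return " ".join(text.split()[:-1])  # drops the last word, purely for illustration


result = simplify_until_target(
    "the the quick brown fox", toy_score, toy_simplify,
    reference={"unique_words": 2}, max_passes=5,
)
print(result)
```

Capping the loop with `max_passes` plays the role of the "pass size" parameter mentioned above: it bounds the number of GPT-3 calls even when the target level is never reached.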

## Brief analysis of results
    - 3 samples will be generated using this process and then presented to a paralegal for verification of accuracy (nothing too crazy, I don't want to take up their time)
    - Extra: if given enough time, these samples will be incorporated into a short quiz/survey (with random variations of the original and simplified texts at different steps) and sent out. Results of the survey will be compared to the results from the Martinez, et al. paper.

## What would success be at the end of this pipeline?
    - On a very simple level, success would be a simple interface whose realistic outputs actually improve reader comprehension of legalese.
    - What this looks like is a usable translator that the user only has to interact with on a basic level.
    - The simplified law snippets should demonstrably increase reader comprehension (i.e., questions about the simplified text tend to get more correct answers).
    - Because the purpose of this project is to deliver an easier to interpret message to users, that includes an easy to use GUI.
___

In [1]:
# Dividing Code Block. This is left here as a buffer to help with exporting
# Andy -10_08_2022

In [2]:
# Import Statements

import torch # Importing pytorch (unknown if needed, yet.)

import os #Importing os and openai to interact with it (API)
import openai
# Load your API key from an environment variable or secret management service
# I have received a few token limit messages from OpenAI recently, so a bit of a warning!
openai.api_key = os.getenv("OPENAI_API_KEY") # read the key from the environment instead of hard-coding it in the notebook


from bs4 import BeautifulSoup # Importing BS
import requests # and requests to do web scraping of CA Legislature

# used to shuffle tokens
import random

import PySimpleGUI as sg #GUI library


import threading #needed as GUI will freeze up if everything is running on the main thread

# used for testing user interaction-based functions, unused in final code
import time, sys
# time.sleep(2.5)

# used for replacing multiple characters in a string when webscraping
import re

import nltk # used for text difficulty scoring
# nltk.download('stopwords') # disabled for my own machine as I already have it, but can be enabled for release,
# I have no idea how to integrate this into pseudo-compilation yet unfortunately.

from nltk.tokenize import RegexpTokenizer, word_tokenize # gotta have word tokenization
tokenizer = RegexpTokenizer(r'\w+')
from nltk import FreqDist # used to calculate word frequencies
from nltk.corpus import stopwords # used for calculating word frequencies (without stopwords)
stop_words = set(stopwords.words('english'))

Mac OS Version is 12.6 and patch enabled so applying the patch
Applyting Mac OS 12.3+ Alpha Channel fix.  Your default Alpha Channel is now 0.99


### 1. User asked to input quote and what they would like it simplified to.
- I managed to get a GUI working (my first time, wow)!
- This will be called as a function at the bottom...

### 2. Using the quote to webscrape the bill
- Simple function used to create structured url to search for the exact bill
- If the exact bill is not found, return an error asking user to try correcting their input
- If the exact bill is found, scrape the entire bill and prep for use in Step 3
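
As a rough illustration of the "structured url" idea, the sketch below builds one search URL with `urllib.parse.urlencode`, which percent-encodes the quotes and spaces in the phrase. It is an alternative to plain string concatenation, not the function used in this notebook. `build_search_url` is a hypothetical helper, the `lawCode` spelling is taken from the comment in the scraping code, and whether the site treats an encoded phrase the same as a raw one is an assumption that would need checking.

```python
from urllib.parse import urlencode

BASE_SEARCH_URL = "https://leginfo.legislature.ca.gov/faces/billSearchClient.xhtml"


def build_search_url(phrase: str, session_year: str = "20212022") -> str:
    """Build a phrase-search URL for one two-year legislative session."""
    params = {
        "session_year": session_year,
        "keyword": f'"{phrase}"',  # the site asks for phrases to be wrapped in quotes
        "house": "Both",
        "author": "All",
        "lawCode": "All",
    }
    # urlencode escapes the quotes as %22 and the spaces as +
    return BASE_SEARCH_URL + "?" + urlencode(params)


url = build_search_url("pupil health")
print(url)
```

Looping this helper over the list of session years gives one candidate URL per session, which matches the "try every session until a bill is found" approach.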

In [3]:
def CA_Legislature_Webscrape(text, debug):
    # Some explanation below
    # What I've learned of the url and search structure from CA Legislature
    # The base url is this chunk:
    base_search_url =  "https://leginfo.legislature.ca.gov/faces/billSearchClient.xhtml?"

    # The session chunk is:
    #year_start_year_end = str(20212022) #all sessions from 1999 to 2022 [sessions are two year pairs ended with 2021-2022]
    #session_chunk = "session_year="+ year_start_year_end
    years = ["session_year=19992000", "session_year=20012002", "session_year=20032004", "session_year=20052006", "session_year=20072008", "session_year=20092010", "session_year=20112012", "session_year=20132014", "session_year=20152016", "session_year=20172018", "session_year=20192020", "session_year=20212022"]
    # Note: session years searches do not work for all years when using phrase search

    # The keyword/search term chunk is: &keyword=test
    # Clearly states to surround the phrase in quotes
    keyword = ("&keyword="+ '"' + text + '"')

    # The house search term chunk is: &house=Both
    house = ("&house=" + "Both") # I have yet to figure out what this modifies

    # The author search term chunk is: &author=All
    author = ("&author=" + "All") # will pull names of authors and store into a list later on

    # and The law code search term chunk is: &lawCode=All
    law = ("&lawcode=" + "All")
    law_chunk =  law

    # --------------------- SCRAPING LOOP ---------------------

    # Creating url search string to try checking all years
    full_text_found = False
    bill_dict = {}

    print("Searching For Exact Bill Within CA Legislature Database...")

    for year_pair in years: # for each pair of years
        if "https://" in text:
            response = requests.get(url = text) # waiting for response from Legislature website
            soup = BeautifulSoup(response.content, 'html.parser') # creating soup from response
            full_text = soup.find_all(id="bill_all") # looking for a bill within the text
            try:
                full_text[0] # checking for content within the full_text
                print("Exact Bill Found using URL!") # let user know the exact text was found
                full_text_found = True # lets us stop looking for the text
                break
            except:
                print("Error: Did you try using a url?\n Make sure it directly leads to your bill.")
                break
        if not full_text_found: # if we haven't found the full text yet, scrape the current site url
            full_search_url = base_search_url + year_pair + keyword + house + author + law_chunk # creating url
            response = requests.get(url = full_search_url) # waiting for response from Legislature website
            soup = BeautifulSoup(response.content, 'html.parser') # creating soup from response
            full_text = soup.find_all(id="bill_all") # looking for a bill within the text

            try:
                full_text[0] # checking for content within the full_text
                print("Exact Bill Found!") # let user know the exact text was found
                full_text_found = True # lets us stop looking for the text
                break
            except:
                continue

    was_searched_used = False # a check for if the search function was used; set before the try so the except branch can always return it
    try: #Let's try getting the title from the text (will work with search and full_text soups)
        bill_title = str(soup.title)
        # newName = re.sub('[\\\\/:*?"<>|]', '', name)
        bill_title = bill_title.replace('</title>','') # Strip out both ends
        bill_title = bill_title.replace('<title>','')

        if bill_title == 'Bill Search': # if the soup ends up being from the search page
            new_url = str(soup.body.find(id="content_main").find(id="centercolumn").find(id="j_idt186:commdataTable").tbody.tr.td) # locating the first result in the search function
            new_url = new_url.replace('<td><div class="commdataRow"><a href="',"") # removing both ends
            head, sep, tail = new_url.partition('">\n<!')
            new_url = 'https://leginfo.legislature.ca.gov' + head # then concatenating into the new url

            print("Exact bill not found, locating closest recent bill...") # warn the user of the issue

            response = requests.get(url = new_url) # waiting for response from Legislature website using new url
            soup = BeautifulSoup(response.content, 'html.parser') # creating soup from response of new url

            bill_title = str(soup.title) # repeating the title saving process
            bill_title = bill_title.replace('</title>','')
            bill_title = bill_title.replace('<title>','')
            full_text_found = True # allow the function (and entire program by extension) continue processing
            was_searched_used = True

        bill_dict['title'] = bill_title #Append to the dictionary

    except: # if our last resort fails, then we need to end the entire process as a whole
        print('\nA critical error has occurred. Your bill, or any similar bill, could not be found.')
        return "No Bill", full_text_found, was_searched_used, "No Summary", "No Bill"



    # --------------------- CLEANING PROCESS ---------------------
    # AB_SB = "" # needed to check if bill is AB or SB

    try:
        #Date Published for SB
        date_published = str(soup.body.find(id="content_main").find(id="centercolumn").find(id="bill_nav_bill_text").find(style="font-weight:bold; size:12;"))
        date_published = date_published.replace('<span style="font-weight:bold; size:12;">','')
        date_published = date_published.replace('\n              </span>', '')
        bill_dict['date_published']= date_published
        AB_SB = "SB" # now we know this bill is an SB
    except:
        #Date Published for AB
        date_published = str(soup.body.find(id="content_main").find(id="centercolumn").find(id="billtext").find(style="font-weight:bold; size:12;"))
        date_published = date_published.replace('<span style="font-weight:bold; size:12;">','')
        date_published = date_published.replace('\n              </span>', '')
        bill_dict['date_published']= date_published
        AB_SB = "AB" # now we know this bill is an AB

    # about section of the bill
    # this part will be split between the SB or AB bills
    if AB_SB == "SB":
        soup_bill = soup.body.find(id="content_main").find(id="centercolumn").find(id="bill_all")

        bill_num = str(soup_bill.find(id="about").find(id="bill_num_title_chap"))
        bill_num = bill_num.replace('<div id="bill_num_title_chap"><span style="text-align:center;font-weight:bold;font-size:1.1em;font-family:\'Times New Roman\'; "><b>\n','')
        bill_num = bill_num.replace('                ', '')
        bill_num = bill_num.replace('\n              \n       ', '')
        bill_num = bill_num.replace('</b></span></div>', '')
        bill_dict['bill_info'] = bill_num

        # Subject
        subject = str(soup_bill.find(id="about").find(id="subject"))
        subject = subject.replace('<span id="subject">', '')
        subject = subject.replace('</span>','')
        subject = subject.replace('\n                         ', '')
        bill_dict['subject'] = subject

        digest_text = str(soup_bill.find(id="about").find(id="digesttext"))
        digest_text = digest_text.replace('</div></span>', '')
        digest_text = digest_text.replace('<span id="digesttext"><div style="margin:0 0 1em 0;">','')
        digest_text = digest_text.replace('\t', '')
        digest_text = digest_text.replace('\n', '')
        bill_dict['digest_text'] = digest_text


    if AB_SB == "AB":
        soup_bill = soup.body.find(id="content_main").find(id="centercolumn").find(id="billtext")

        bill_heading = str(soup_bill.find(id="about").find(id="title"))
        bill_heading = bill_heading.replace('<div id="title"><span style="font-weight:normal;font-size:0.9em;"> ', '')
        bill_heading = bill_heading.replace(' </span></div>', '')
        bill_dict['bill_info'] = bill_heading

        # Subject
        subject = str(soup_bill.find(id="about").find(id="subject"))
        subject = subject.replace('<span id="subject">', '')
        subject = subject.replace('</span>','')
        subject = subject.replace('\n                         ', '')
        bill_dict['subject'] = subject

        # digest text
        digest_text = str(soup_bill.find(id="about").find(id="digesttext"))
        digest_text = digest_text.replace('</div></span>', '')
        digest_text = digest_text.replace('<span id="digesttext"><div style="margin:0 0 1em 0;">','')
        digest_text = digest_text.replace('\t', '')
        digest_text = digest_text.replace('\n', '')
        bill_dict['digest_text'] = digest_text

    # now to store the main content of the bill with section numbers
    main_content = str(soup_bill.find(id="bill"))

    # Removing the html formatting, each subheading needs to be addressed
    main_content = re.sub(r'\</h2>.+:inline;">',' ', main_content)
    main_content = re.sub(r'\<font color=".+><i>','', main_content)
    main_content = re.sub(r'\</div><div><div id=.+</h6>','', main_content)
    main_content = re.sub(r'\</div></div></div><div id=.+display:inline;">',' ', main_content)
    main_content = re.sub(r'\</div></p></div><div><p><div style=.+:inline;">',' ', main_content)
    main_content = re.sub(r'\</div></div><div><p><div style=.+:inline;">',' ', main_content)
    main_content = re.sub(r'\</div><div><div><div id=.+:left;">',' ', main_content)

    # Next, to remove additional string remnants of the html format (this is mostly end/beginning tags and heading numbers)
    main_content = main_content.replace('\t','')
    main_content = main_content.replace('<div id="bill"><div style=" text-transform: uppercase"><h2 style=" text-align: left;">', '')
    main_content = main_content.replace('</div><div style="margin:0 0 1em 0;">', '\n')
    main_content = main_content.replace('</div><div style="margin:0 0 1em 0;margin-left: 1em;">', '\n')
    main_content = main_content.replace('</div><div style="margin:0 0 1em 0;margin-left: 2.5em;">', '')
    main_content = main_content.replace('</div></p></div></div></div></div></div>', '')
    main_content = main_content.replace('</div></div></div></div>','')
    main_content = main_content.replace('</div></p></div>', '')
    main_content = main_content.replace('</i></font>','')
    main_content = main_content.replace('</h1>', '')
    main_content = main_content.replace('</h2>', '')
    main_content = main_content.replace('</h3>', '')
    main_content = main_content.replace('</h4>', '')
    main_content = main_content.replace('</h5>', '')
    main_content = main_content.replace('</h6>', '')

    bill_dict['main_bill'] = main_content # appends the main section of the bill to the dictionary

    use_summarize = False # setting up boolean checks to not go over token size
    use_bill = False

    # from testing on GPT-3 prompts, I've learned I really need to be conservative with my token sizes
    # The limit is ~4000
    # Full bills can exceed ~9000, while their main content can exceed ~6000, summaries can also exceed ~3000
    # When we incorporate the prompt (minimum 1000 tokens) + bill info (2000) tokens, that leaves
    # 1000 tokens for a response. As such the true limit is 3000 tokens.

    if len(word_tokenize(bill_dict['main_bill'])) + len(word_tokenize(bill_dict['digest_text'])) + 2020 < 3000: # digest + main bill both fit alongside the prompt
        return bill_dict, full_text_found, was_searched_used, use_summarize, use_bill

    elif len(word_tokenize(bill_dict['main_bill'])) + 2020 < 3000: # digest + bill is too large, so check whether the main bill fits on its own
        use_bill = True
        return bill_dict, full_text_found, was_searched_used, use_summarize, use_bill

    elif len(word_tokenize(bill_dict['digest_text'])) + 2000 < 3000: # now doing the same for the digest text, hopefully this works out better
        use_summarize = True
        use_bill = False
        return bill_dict, full_text_found, was_searched_used, use_summarize, use_bill
    else:
        print("Warning, your bill is too large to run in GPT-3, I will try my best to give a summary but it may look weird...")
        # if the bill is simply far too large in all sections, we will trim the summary...
        mini_summary = word_tokenize(bill_dict['digest_text'])
        mini_summary = mini_summary[0:2000]
        mini_summary = ' '.join(mini_summary)
        mini_summary = mini_summary.replace(' ,', ',')
        mini_summary = mini_summary.replace(' .', '.')
        bill_dict['digest_text'] = mini_summary
        use_summarize = True
        use_bill = False

    if debug: print("word token check success")

    return bill_dict, full_text_found, was_searched_used, use_summarize, use_bill
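
The token-budget reasoning in the comments above can be read as a small decision table. The sketch below expresses that fallback order on its own; it is not the code the notebook runs. `pick_input` and its labels are hypothetical, and it uses a single `overhead` value where the function above mixes 2000 and 2020. NLTK word counts are also only a proxy for GPT-3's own token counts, so the real margin may differ.

```python
def pick_input(main_len: int, digest_len: int,
               overhead: int = 2000, limit: int = 3000) -> str:
    """Choose which part of a bill to send, given word counts and a budget.

    overhead approximates the prompt plus bill info; limit is the usable
    budget once room for a response has been set aside.
    """
    if main_len + digest_len + overhead < limit:
        return "digest+bill"      # everything fits
    if main_len + overhead < limit:
        return "bill"             # main bill alone fits
    if digest_len + overhead < limit:
        return "digest"           # only the summary fits
    return "truncated digest"     # nothing fits, so the summary must be trimmed


for main_len, digest_len in [(400, 300), (800, 500), (5000, 700), (5000, 3000)]:
    print(main_len, digest_len, pick_input(main_len, digest_len))
```

Returning a single label instead of two booleans makes it harder to end up in an inconsistent state, although the two-flag form above is what the rest of the notebook expects.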

In [4]:
def scorer(bill_dict, use_summarize, use_bill, repeat, passed_text, debug):
    # scoring process
    # Prepping text to be scored
    score_dict = {} # dictionary to store unique scores

    if use_summarize: # if the bill was deemed too long for GPT-3, just use the summary
        text_to_score = bill_dict["digest_text"]
    elif use_bill: # else use the entire bill
        text_to_score = bill_dict['main_bill']
    elif repeat: # if this is a repeat pass, use the text that was passed through instead
        text_to_score = passed_text
    else: # lastly if we are using the entire bill it will use all content
        text_to_score = bill_dict["digest_text"] + " " + bill_dict['main_bill']
    text_to_score = text_to_score.replace('\n','') # stripping some visual code so word embeddings can be accurate
    text_to_score = text_to_score.replace('\xa0','')
    text_to_score = re.sub(r' {2,}',' ', text_to_score) #Collapsing runs of repeated spaces into a single space

    if debug: print("text if check working")
    # We are looking for: Capitalization, Word Frequency, Word Choice, Center-embedding, and Active/passive voice
    # Measure 1, number of unique fully capitalized words/acronyms

    # creating a list of any fully capitalized words
    capitalize_words_list = re.findall(r'\b[A-Z]+\b', text_to_score) # finding all capitalized characters within corpus
    capitalize_words_list = [*set(capitalize_words_list)] # removing duplicates
    capitalize_words_list = [x for x in capitalize_words_list if len(x) != 1] # removing single letters
    capitalize_words_list = [x for x in capitalize_words_list if x != 'SECTION'] # remove section
    score_dict["number of fully capitalized words for full text, lower is better"] = len(capitalize_words_list)


    if debug: print("Measure 1 working")
    if debug: print(score_dict)
    # Measure 2, Word Frequency and Measure 3, Word Choice
    # I had to go with Unique word counts vs. exact synonym matching due to time and skill

    # first removing stopwords
    # stop_words = set(stopwords.words('english')) # save english stopwords for usage
    word_tokens = tokenizer.tokenize(text_to_score) # tokenize the bill text
    # text_to_score_filtered = [w for w in word_tokens if not w.lower() in stop_words]
    text_to_score_filtered = []

    for w in word_tokens: # for each word in word tokens, if it is not a stop word (case-insensitive), then add to filtered list
        if w.lower() not in stop_words:
            text_to_score_filtered.append(w)

    text_to_score_filtered = ' '.join(text_to_score_filtered) # joining list back into a space-separated string

    text = text_to_score_filtered
    words = text.split()
    frequency_distribution = FreqDist(words)
    #print(frequency_distribution)
    score_dict['text word freq, higher is better'] = (frequency_distribution.N())
    score_dict['text unique words, lower is better'] = (frequency_distribution.B())

    if debug: print("Measure 2 working")
    if debug: print(score_dict)
    # Measures 4 and 5 are simply too complex for me to accomplish yet. But I will return to these
    # after finals. This is a project I'm passionate about and want to continue.

    # Measure 4. Center-embedding
    # now this one is cool, we are essentially using GPT-3 to find the number of center embedded phrases!
    # I spent so long trying to get this prompt to work but it's honestly harder than the actual main GPT-3 Conversion so for now I have to omit it.

    # embedding_prompt = 'Prompt:\n-In linguistics, center embedding is the process of embedding a phrase in the middle of another phrase of the same type. This often leads to difficulty with parsing which would be difficult to explain on grammatical grounds alone. \n-Center-embedded structures have long been observed to pose processing difficulties on a reader (Gibson, 1998; Miller & Chomsky, 1963). The tendency for lawyers to “embed” legal jargon “in convoluted syntax” has been speculated not only to be prevalent in legal texts but as a potential badge of honor for those who wish to “talk like a lawyer” and be accepted by their profession (Tiersma, 2008).\n- You will find the number of center-embedded meanings within pieces of law, we have provided some non-law examples for you.\n\nText 1: "A man that a woman loves"\nCenter-embedded: 0 <STOP>\n\nText 2: "A man that a woman that a child knows loves"\nCenter-embedded: 1 <STOP>\n\nText 3: "A man that a woman that a child that a bird saw knows loves"\nCenter-embedded: 2 <STOP>\n\nText 4: "A man that a woman that a child that a bird that I heard saw knows loves"\nCenter-embedded: 3 <STOP>\n\nText 5: "My brother opened the window. The maid had closed it. "\nCenter-embedded: 1 <STOP>\n\nText 6: '
    # embedding_prompt = embedding_prompt + '"' + text_to_score + '"'
    # # now creating a response prompt for GPT-3 using our embedding prompt
    # response = openai.Completion.create(
    # model="text-davinci-003",
    # prompt =  embedding_prompt,
    # temperature = 0.0,
    # top_p = 1,
    # max_tokens = 8,
    # frequency_penalty = 0,
    # presence_penalty = 0)
    # # Token size stays at 8, an issue is if we are converting whole bills...
    #
    # center_embeddings = (str(response.choices[0].text)) # extracting string response
    # center_embeddings = center_embeddings.lstrip() # removing empty space generated by GPT-3
    # center_embeddings = center_embeddings.lstrip('\nCenter-embedded: ',) # removing initial new lines generated by GPT-3
    # print(center_embeddings)
    # center_embeddings = int(center_embeddings) # converting into int
    # score_dict['center embeddings, lower is better'] = center_embeddings

    # Measure 5. Active/passive voice

    # # We need sentence count to steer GPT-3 in the right direction; a big issue is that if we make it work out both the sentence count and the active/passive counts, it's inaccurate
    # # so, we must provide the sentence count first
    # sentence_count = len(sent_tokenize(text_to_score))
    # # print("Sentence Count = ", sentence_count)
    # sentence_count = "Sentences = " + str(sentence_count)

    #Generating prompt
    # active_passive_prompt = """- Active and passive voice examples
    #
    # - Take a look at these examples of both the active and passive voices in action, mark which type of voice you believe it to fall under:
    #
    # >Is Ajani visiting us today?
    # Sentences = 1, Passive = 0, Active = 1<STOP>
    #
    # >Will we be visited by Ajani today?
    # Sentences = 1, Passive = 1, Active = 0<STOP>
    #
    # - As you see, questions can be written in either voice. Other kinds of sentences, like exclamatory and imperative sentences, are often best written in the active voice:
    #
    # >Please remove your shoes before entering my house.
    # Sentences = 1, Passive = 0, Active = 1<STOP>
    #
    # >Shoes should be removed before entering my house.
    # Sentences = 1, Passive = 1, Active = 0<STOP>
    #
    # >Lock the door!
    # Sentences = 1, Passive = 0, Active = 1<STOP>
    #
    # >Let the door be locked!
    # Sentences = 1, Passive = 1, Active = 0<STOP>
    #
    # - See how with the first pair, the passive voice makes the request feel more like a suggestion? In the second pair, the passive voice makes the message sound stilted and formal rather than an urgent exclamation.
    #
    # -Now take a look at these two examples:
    #
    # >I poured the solution into the beaker and heated it to 100℉.
    # Sentences = 1, Passive = 0, Active = 1<STOP>
    #
    # >The solution was poured into the beaker and heated to 100℉.
    # Sentences = 1, Passive = 1, Active = 0<STOP>
    #
    # >Will we be visited by Ajani today? Ajani will visit us today. There is a sound coming from the door right? Let the door be locked! Ajani says "hello?". The sound from the door must be Ajani! Ajani, Shoes should be removed before entering my house.
    # Passive, Active, Passive, Passive, Active, Active, Passive.
    # Sentences = 7, Passive = 4, Active = 3<STOP>
    #
    # >This bill would require the State Department of Public Health to coordinate specified school district, county office of education, and charter school COVID-19 testing programs that are currently federally funded or organized under the California COVID-19 Testing Task Force, as provided. The bill would authorize the department to provide supportive services, including technical assistance, vendor support, guidance, monitoring, and testing education, related to testing programs for teachers, staff, and pupils to help schools reopen and keep schools operating safely for in-person learning. The bill would also encourage the department to expand its contagious, infectious, or communicable disease testing guidance and other public health mitigation efforts to include prekindergarten and childcare centers, as provided.
    # Sentences = 3, Passive = 1, Active = 2<STOP>
    #
    # >Establishment of a method by which an individual may demonstrate an authorized health-related use of a controlled substance when a positive alert is noted by an electronic drug detection device, a passive alert dog, or other technology.
    # Sentences = 1, Passive = 0, Active = 1<STOP>
    #
    # >Existing law establishes a Citizens Redistricting Commission to adjust the boundary lines of the supervisorial districts of the board of supervisors for the County of Los Angeles. Existing law requires the commission to consist of 14 members who meet specified qualifications. Existing law requires the county elections official, in each year ending in the number 0, to select 60 of the most qualified applicants for membership on the commission and categorize them into subpools for each of the 5 existing supervisorial districts, with random drawings to select one member for each supervisorial district and 3 additional members. Existing law requires those 8 members to select the remaining 6 members from among the list of most qualified applicants, as provided.
    # Sentences = 4, Passive = 2, Active = 3<STOP>
    #
    # >This bill would require each local educational agency, defined to mean a school district, county office of education, or charter school, after consulting with its local health department, as defined, to create a COVID-19 testing plan, or adopt the State Department of Public Health’s framework, as defined, that is consistent with guidance from the department, as provided. The bill would require each local educational agency to publish the testing plan on its internet website. The bill would authorize each local educational agency to designate one staff member to report information on its COVID-19 testing program to the department, as provided. The bill would require that all COVID-19 testing data be in a format that facilitates a simple process by which parents and local educational agencies may report data to the department or a local health department, as provided. By imposing new obligations on local educational agencies, and to the extent new duties are imposed on local health departments, the bill would impose a state-mandated local program. The bill would require the department to determine which COVID-19 tests are appropriate for the testing program.
    # Sentences = 6, Passive = 4, Active = 2<STOP>
    #
    # >Each local educational agency, after consulting with its local health department regarding any local guidance or best practices from the Safe Schools for All Hub, shall create a COVID-19 testing plan, or adopt the framework, that is consistent with guidance from the State Department of Public Health. Each local educational agency shall publish the testing plan on its internet website.
    # Sentences = 2, Passive = 0, Active = 2<STOP>
    #
    # >(A) Been appointed to, elected to, or have been a candidate for office at the local, state, or federal level representing the County of Los Angeles, including as a member of the board.
    # (B) Served as an employee of, or paid consultant for, an elected representative at the local, state, or federal level representing the County of Los Angeles.
    # (C) Served as an employee of, or paid consultant for, a candidate for office at the local, state, or federal level representing the County of Los Angeles.
    # (D) Served as an officer, employee, or paid consultant of a political party or as an appointed member of a political party central committee.
    # (E) Been a registered state or local lobbyist.
    # Sentences = 5, Passive = 1, Active = 4<STOP>
    #
    # >21533. (a) A commission member shall apply this chapter in a manner that is impartial and that reinforces public confidence in the integrity of the redistricting process.
    # (b) The term of office of each member of the commission expires upon the appointment of the first member of the succeeding commission.
    # (c) Nine members of the commission shall constitute a quorum. Nine or more affirmative votes shall be required for any official action.
    # Sentences = 4, Passive = 1, Active = 3<STOP>
    #
    # >(5) “Direct-to-consumer genetic testing company” means an entity that does any of the following:
    # (A) Sells, markets, interprets, or otherwise offers consumer-initiated genetic testing products or services directly to consumers.
    # (B) Analyzes genetic data obtained from a consumer, except to the extent that the analysis is performed by a person licensed in the healing arts for diagnosis or treatment of a medical condition.
    # (C) Collects, uses, maintains, or discloses genetic data collected or derived from a direct-to-consumer genetic testing product or service, or is directly provided by a consumer.
    # Sentences = 4, Passive = 0, Active = 4<STOP>
    #
    # >(C) “Genetic data” does not include data or a biological sample to the extent that data or a biological sample is collected, used, maintained, and disclosed exclusively for scientific research conducted by an investigator with an institution that holds an assurance with the United States Department of Health and Human Services pursuant to Part 46 (commencing with Section 46.101) of Title 45 of the Code of Federal Regulations, in compliance with all applicable federal and state laws and regulations for the protection of human subjects in research, including, but not limited to, the Common Rule pursuant to Part 46 (commencing with Section 46.101) of Title 45 of the Code of Federal Regulations, United States Food and Drug Administration regulations pursuant to Parts 50 and 56 of Title 21 of the Code of Federal Regulations, the federal Family Educational Rights and Privacy Act (20 U.S.C. Sec. 1232g), and the Protection of Human Subjects in Medical Experimentation Act, Chapter 1.3 (commencing with Section 24170) of Division 20 of the Health and Safety Code.
    # Sentences = 1, Passive = 1, Active = 0<STOP>
    #
    # >(8) “Genetic testing” means any laboratory test of a biological sample from a consumer for the purpose of determining information concerning genetic material contained within the biological sample, or any information extrapolated, derived, or inferred therefrom.
    # (9) “Person” means an individual, partnership, corporation, association, business, business trust, or legal representative of an organization.
    # Sentences = 2, Passive = 0, Active = 2<STOP>
    #
    # >(A) A provision prohibiting the service provider from retaining, using, or disclosing the biological sample, extracted genetic material, genetic data, or any information regarding the identity of the consumer, including whether that consumer has solicited or received genetic testing, as applicable, for a commercial purpose other than providing the services specified in the contract with the business.
    # (B) A provision prohibiting the service provider from associating or combining the biological sample, extracted genetic material, genetic data, or any information regarding the identity of the consumer, including whether that consumer has solicited or received genetic testing, as applicable, with information the service provider has received from or on behalf of another person or persons, or has collected from its own interaction with consumers or as required by law.
    # Sentences = 2, Passive = 0, Active = 2<STOP>
    #
    # >"""
    # bypassing active_passive for now

    # active_passive_prompt = str(active_passive_prompt)
    # active_passive_prompt = active_passive_prompt + text_to_score + sentence_count
    #
    # # now creating a response prompt for GPT-3 using our a/p prompt
    # response = openai.Completion.create(
    # model="text-davinci-003",
    # prompt =  active_passive_prompt,
    # temperature = 0.0,
    # top_p = 1,
    # max_tokens = 20,
    # frequency_penalty = 0,
    # presence_penalty = 0)
    # # temperature is 0 because we want output that is as close to deterministic as possible
    # # max_tokens stays low because only two short counts are expected back; token limits remain a concern when scoring whole bills
    #
    # active_passive = (str(response.choices[0].text)) # extracting string response
    # active_passive = active_passive.lstrip(', ',)
    # active_passive = active_passive.replace('<STOP>', '') # removing end token
    # active_passive = active_passive.replace('Passive = ', '')
    # active_passive = active_passive.replace('Active = ', '')
    # active_passive = active_passive.split(', ') # splitting two scores into list
    #
    # # the prompt format lists Passive first and Active second, so index 0 is the passive count
    # if len(active_passive) > 0:
    #     score_dict['passive_sent, lower is better'] = int(active_passive[0])
    # if len(active_passive) > 1:
    #     score_dict['active_sent, higher is better'] = int(active_passive[1])

    return score_dict
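
The commented-out active/passive step strips the completion with a chain of `replace` calls and relies on the order of the two numbers. Below is a minimal sketch of a more defensive alternative, assuming the completion looks like `, Passive = 2, Active = 3<STOP>` as in the few-shot examples. `parse_active_passive` and `score_dict_example` are hypothetical names used only for illustration and are not part of the original notebook.

```python
import re

# Hypothetical helper (not in the original notebook): read the two counts by label
# instead of by position, so a reordered or padded completion is still parsed correctly.
COUNT_PATTERN = re.compile(r"(Passive|Active)\s*=\s*(\d+)")

def parse_active_passive(completion_text):
    """Return {'passive': int, 'active': int}, or None if either count is missing."""
    counts = {label.lower(): int(value)
              for label, value in COUNT_PATTERN.findall(completion_text)}
    if "passive" in counts and "active" in counts:
        return counts
    return None

# Example with a completion shaped like the few-shot examples
parsed = parse_active_passive(", Passive = 2, Active = 3<STOP>")
score_dict_example = {}
if parsed is not None:
    score_dict_example['active_sent, higher is better'] = parsed["active"]
    score_dict_example['passive_sent, lower is better'] = parsed["passive"]
print(score_dict_example)
```

Returning `None` for an incomplete completion lets the caller skip the active/passive scores rather than raise an `IndexError` or `ValueError` partway through scoring.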

In [5]:
def simplify_loop(scored_dict, bill_dict, chosen_difficulty, use_summarize, use_bill, debug):
    # Takes the readability scores, the bill dictionary, and the target difficulty, plus flags choosing which text to simplify.
    # chosen_difficulty is expected to be one of: "Social Media", "Film", "Academic"

    # as before, only the selected text is summarized/simplified
    if use_summarize: # the bill was deemed too long for GPT-3, so use only the digest summary
        simplify_this_text = bill_dict["digest_text"]
    elif use_bill: # otherwise use the entire bill text on its own
        simplify_this_text = bill_dict["main_bill"]
    else: # neither flag set: use the digest followed by the entire bill
        simplify_this_text = bill_dict["digest_text"] + " " + bill_dict["main_bill"]
    # token count of the selected text, used to reduce the size of the example corpus
    corpus_size = len(word_tokenize(simplify_this_text))
    if debug: print("text tokenization size success")

    # based on the difficulty chosen, create reference tokens based on the size of the text
    if chosen_difficulty == "Academic":
        # Taken from The New Logics of Homeless Seclusion: Homeless Encampments in America’s West Coast Cities by Chris Herring
        chosen_reference_corpus = """Social scientists have long studied various forms of homeless habitation on the streets (N. Anderson 1923; Duneier 2000; Hopper 2003; Snow and Anderson 1993), in shelters (Cloke et al. 2010; Desjarlais 1997; Dordick 1997; Lyon-Callo 2008; Sutherland and Locke 1936), and squats (Bailey 1973; Katz and Mayer 1985; Pruijt 2003). Yet we know very little about homeless camps (exceptions include Bourgois and Schonberg 2009; Gowan 2010; Wasserman and Clair 2010), and little at all on the recently emerging large-scale formations. The few studies that do exist on large-scale encampments survey a diverse and limited terrain. On the one hand, there are those that detail the development of encampments by homeless people and their allies as forms of protest against housing and homeless policies, such as the tent city protests in Chicago and San Jose (Wright 1997), the radical politics associated with the Tompkins Square encampment (Smith 1996), and the occupations that mobilized groups of homeless people across a number of U.S. cities in the 1980s and 1990s (Cress and Snow 2000; Wagner and Gilman 2012). On the other hand, there are those who have examined the development of large encampments in terms of homeless people making do with the derelict and under-utilized zones of the city left to them. Examples include the homeless shantytown in Tucson, Arizona, at the center of Snow and Mulcahy’s article on the spatial constraints of homeless survival (2001), and the various stories on the “Tunnel People” who inhabited the abandoned Amtrak yards in the bowels of New York City (Toth 1995; Voeten 2010). Describing encampments as politicized sites of protest, on the one hand, and zones of neglected poverty, on the other, the existing studies point to the discontinuity in both the form and functions of these new islands of marginality and the limits of localized case studies. 
Lacking a broader comparative framework and larger number of cases, these earlier studies are unable to explain the variations in encampments, and why they have re-emerged most intensely at this historical juncture. This study overcomes these limitations through empirical innovation and theoretical extension. First, by examining 12 encampments in eight municipalities on the west coast within a single analytic framework, this study provides the first comparative examination of variegated forms of homeless encampment in the United States. Second, by deciphering the seclusionary strategies of local state agencies and homeless people in large-scale encampments, the study revises and extends existing theories of urban seclusion, exclusion, and regulation of advanced marginality in the modern metropolis. Homeless camps have long been a part of America’s urban landscape. Their ebb and flow followed the booms and busts of business cycles (Roy 1935) and the seasonal rhythms of farm work (N. Anderson 1923) until the early 1970s. After that, the street homeless and their camps became a permanent fixture in most cities of the United States as the country experienced a period of economic decline, the de-institutionalization of its mental health institutions, and welfare state retrenchment (Jencks 1995). Homeless camps during this period tended to remain smaller and more dispersed than those of the pre-war era, as local law-enforcement agencies would sweep into action when they perceived an area was dominated by the homeless (Snow and Mulcahy 2001). The camps also took the form of short-lived political events in staking “tent-cities” on the steps of city halls, the lawn of the White House, and on contentious parcels of public land to press political demands (Wagner and Gilman 2012). Yet, during the rapid economic expansion of the 1990s and early 2000s, dozens of U.S. cities experienced the rise of durable homeless encampments on a scale unseen since the Great Depression. 
Several persisted for years, often comprised of 50 or more individuals. In 18 reported cases across the United States, upward of 100 lived in the camps (NCH 2010; NLCHP 2014a). This new trend of homeless encampment, marked by increased size and durability, during a period of economic growth, rather than decline, suggests that a new logic of urban relegation is at work and an alternative sociological explanation. 
This article builds on Wacquant’s (2010) conception of social seclusion and Snow and Anderson’s (1993) theory of homeless agency to analyze the various logics of homeless seclusion shaping encampments. Through a dual conception of administrative spatial practices of the local state and adaptive spatial practices of homeless people, I delineate the principles that define four types of homeless seclusion, which encompass, differentiate, and explain the various forms of encampment. 
I conclude by considering the theoretical implications of these peculiar institutions, which I argue function both as new socio-spatial contraptions of homeless containment for the state as well as preferable safe ground to the dominant institution of homeless seclusion in the United States, namely, the shelter. To understand why and how certain cities come to develop encampments of this scale and to identify what functions they serve, I carried out interviews with city officials, nonprofit actors, and residents of the camps between 2009 and 2011 along with observations from repeated site visits. This time-lapse allowed me to trace the ongoing development of homeless containment and adaptation within each of the encampments. As I was interested only in camps that had maintained a degree of permanence and scale, in distinction to the more common smaller and temporary camps, I completed a thorough review of local media reports through the LexisNexis database to identify currently existing camps in the United States comprised of 50 or more campers that had existed for more than a year. After identifying and reviewing 32 cases that fit this criterion as of August 2009, the west coast region was selected because it contained both the highest concentration of encampments and greatest variety of settlement types. The particular encampments within the region were selected to insure that every type of legal status and management model within the broader census was interrogated in more than a single case. In 2010, my initial empirical findings of the camps were published as a policy report for the National Coalition for the Homeless, which presents the basic attributes of the sample (see Table 1). Of those interviewed, 14 were city officials, 23 were affiliated with nonprofit service providers or advocates connected to the encampments, and 32 were camp residents. 
The study also draws on 3 months of embedded ethnography in which I lived in the archipelago of homeless encampments in Fresno, California. Although I only touch on the ethnographic data within this broadly comparative article, living in the nonprofit sponsored Village of Hope, surrounding illegal encampments, and local shelter offered an important perspective for understanding the key differences of homeless seclusion. The experience of living in the encampments under similar material conditions as those of the homeless—in a tent or hut, eating donated food, showering at the service center, and spending only money earned from recycling—gave me a proximate and visceral understanding of the encampments and their moral life-worlds that remained invisible in the interviews. The exclusionary spatial policies and practices of local governments, which undergird the formation of large-scale homeless encampments, have been thoroughly studied by sociologists and geographers of the city, who have examined the “hardening of public space” (Dear 2001; Davis 1990; Soja 2000), new modes of surveillance (Coleman 2004; Flusty 2001), “antisocial behavior laws” (Duneier 2000; Johnsen and Fitzpatrick 2010; Mitchell 1997; Vitale 2008), and novel techniques of banishment (Beckett and Herbert 2011). Although the recent intensification of criminalizing homelessness is widespread (NLCHP 2014b), a growing number of commentators argue that the prevailing framework risks obscuring the increasingly varied and complex geographies of urban poverty and its corresponding social control in ignoring the regulation of the homeless beyond the boundaries of redeveloping downtowns (see DeVerteuil et al. 2009; Walby and Lippert 2012; Yarwood 2007). 
As Stuart (2013) notes in his recent article on policing Los Angeles’ skid row, recent studies tend to focus on the process by which the homeless are excluded from prime spaces (Snow and Anderson 1993)—spaces that are primarily used and valued by mainstream society—and fail to account for the related seclusionary policies and practices, which sustain, sanction, and control the daily lives of individuals within the marginal spaces into which homeless are being expelled. Rather than document more examples of the same, this paper examines the practices and outcomes of homeless seclusion in the marginal spaces of encampments and attempts to explain their variegated and contradictory functions for the local state and those experiencing homelessness in the U.S. metropolis. To do this, I draw on Wacquant’s conception of social seclusion, which he defines as the process through which “particular social categories and activities are corralled, hemmed in, and isolated in a reserved and restricted quadrant of physical and social space” (2010: 166). In making the argument against scholars who confusingly collapse the conceptions of the “ghetto” and “ethnic cluster” into a single category of social space, Wacquant draws out a two-dimensional analytic grid depicting degrees of high and low social hierarchy and selective and forced isolation, to distinguish numerous modalities of seclusion. I follow a similar method of analysis built on the premises of Wacquant’s framework to disentangle differences within the one-dimensional conception of the “homeless camp.” First, Wacquant focuses on the ways populations, institutions, and activities are secluded, isolated, or confined, that complements the more prevalent studies, which examine the pervasive tactics of exclusion (Beckett and Herbert 2011; Merry 2001; O’Malley 1992). 
Second, Wacquant’s dual conception of seclusion as both a product of imposed constraints and elective choice, eschews the all too frequent trend in the literature of recognizing only the repressive components of confinement, while ignoring its productive aspects (Wacquant 2008, 2011) critical to understanding the co-constitutive roles of homeless people’s preference to camp amidst varied administrative constraints. Figure 1 presents an analysis of divergent forms of homeless seclusion, which serves as the guiding map of the paper. There are two settings, legal and illegal, and within each, forms of seclusion are distributed along two basic dimensions. The vertical axis of institutionalization and informality gauges the degree to which camps are managed and supported by institutions of the state and/or nonprofit service agencies. Encampments that are formally recognized through zoning ordinances and serviced by contracted nonprofits would be located near the top of the axis, whereas those under threat of eviction and without basic services such as water and sanitation would be at the bottom. The horizontal axis describes the extent to which campers are able to independently exercise power over their encampment outside of state impositions of direct management or repression. These conceptual axes in turn form four quadrants, each of which depicts what I will go on to elaborate as distinct forms of homeless seclusion: contestation, toleration, accommodation, and co-optation. Although these forms of homeless seclusion can be minimally parsed out along these two dimensions, the purpose of this typology is not simply descriptive, but also analytic. It offers a lens through which one can explain the distinctive logics and practices of each type. 
To do this, I follow Snow and Anderson (1993), who examine the survival strategies of homeless people within four distinctive though overlapping and interacting constraints: organizational, political, moral, and spatial constraints. This article considers the adaptive strategies of homeless people and their allies within each of these constraints. Snow and Anderson’s concept of adaptive strategies adds a critical component of agency or resistance in distinguishing encampments absent in Wacquant’s heavy focus on the administrative strategies of the state. Thus, this analysis combines the local state’s administrative strategies that constrain the adaptive strategies of homeless people. In the summer of 2008, Seattle’s Mayor Greg Nickels issued police orders to crack down on rough sleepers. Targeting primarily camping groups, police moved with little warning, often confiscating and destroying residents’ belongings. With inadequate shelters and two legal tent cities already filled to capacity, homeless people joined together and formed a protest camp in South Seattle named Nickelsville. The encampment formed after a month of planning, weekly organizing meetings, two rallies, a die-in, and a car wash with a local homeless advocacy group. Like Nickelsville, all of the camps in the Northwest first organized through activist repertoires to protect against displacement and dispersion by local law enforcement. After forming an initial encampment, the authorities evicted the campers en masse, but rather than dispersing, they relocated collectively on new territory. It is this resilience against attempts of dispersal, the explicit political program of the camps, and their emergence through militant struggle with city authorities that distinguishes the process of contestation to other forms of seclusion. 
Unlike the other three forms of homeless seclusion, wherein local governments tolerate and often actively support secluded zones for homeless people, seclusion through contestation is a reaction to an administrative strategy of dispersion. In these cases, local governments utilize police “sweeps” to deconcentrate and make invisible homeless populations, through a number of city ordinances against street drinking, panhandling, camping, rough sleeping, park use, and broad antisocial behavior (Beckett and Herbert 2011; NLCHP 2014b). Yet the encampments re-emerge. They are merely geographically and/or temporally displaced, reconsolidating to defend against future attacks. However, it would be wrong to interpret the police sweeps as simply the neutral enforcement of legislation. Interviews showed instead that the reasons for dispersing camps were foremost political, depending on material and symbolic rationales given varying urban conditions. The most prevalent reasons for clearing camps that city officials gave were proximate material concerns: the fears of heightened crime in the area of the camps, reductions of adjacent property values, retailers’ anxieties that homelessness was driving customers away, and resident complaints of scavengers sorting through trash. These same arguments were also the prime cause of concern expressed in the city-council hearings on anti-homeless ordinances and legalization of encampments. However, in Fresno, Seattle, and Sacramento, the camps were so thoroughly marginalized on fallow and abandoned land that evidence of proximate effects was difficult to pinpoint, despite the official claims. For instance, Nickelsville’s most frequent encampment site, located on the ironically named street Marginal Way, was hidden from sight by a forested border off of an industrial service road. Sacramento’s Safe Ground encampment was tucked deep in the woods along the American River, invisible even from the traveled trails. 
In Fresno, a buffer of rail yards and abandoned warehouses guarded its tent city district, and Portland and Ontario’s camps were both situated between airports and landfills. In short, the availability of space to occupy with ample invisibility is a necessary, though not sufficient, condition for durable encampments. When I pressed city officials on the evictions from the sites in Fresno, Sacramento, and Seattle, where material threats to property values and profitability were not apparent, they then justified the dismantling of camps on symbolic grounds, citing public perceptions of insecurity and preservation of their city’s or administration’s reputation. Even though most residents had never set eyes on these areas firsthand, the visual spectacle captured through media had the effect of mobilizing city administrators to fight perceptions of a crisis of homelessness. The homeless policy manager of Fresno concisely explains this politics of visibility: You have to understand Fresno’s homeless problem is much bigger than the camps South of Ventura, but when people see these large shantytowns growing on TV, even if our numbers (of homeless) are declining, they assume the city is tolerating illegalities and we get pressure to clean up, even though that area is completely abandoned. The media’s gaze simultaneously stokes the insecurity of local residents and reveals the social problems unaddressed by city administration, leading officials to take action. However, all they do is disperse the campers to less visible circumstances. Despite most journalists’ intentions of ameliorating the plight of campers by raising awareness of their plight, officials in both Sacramento and Reno similarly cited the media uproar that drew international attention to their cities as the triggering factor to evict the camps. The use of the term “illegalities,” as opposed to poverty, is also telling. 
It casts criminality rather than economic circumstances as the primary social problem of homelessness. These instances suggest that from the view of urban managers, it is not the mere existence of homelessness, but rather its public visibility, which turns the unhoused into symbols of incivility and objects of policy action. This supports Snow and Mulcahy’s (2001) finding that the dichotomous conception of space as maintaining both a “use” and “exchange” value (Logan and Molotch 1987) neglects the symbolic dimension, which attributes a political value. However, the cases of Sacramento and Seattle demonstrate that even marginal spaces have political value, something Snow and Mulcahy relate only to prime and transitional spaces. Therefore, the dismantling of camps is not merely aimed at protecting proximate property values and local business, as highlighted by scholars studying the regulation of homelessness in prime spaces (Beckett and Herbert 2011; Duneier 2000; Mitchell 1997; Vitale 2008). They are also part-and-parcel of a broader penal-welfare strategy designed to project governmental competency in poverty management by reinforcing an image of law and order while concealing the failures of the welfare state (Wacquant 2009). Unlike the recent experiments in legalized encampments, the tactic of setting up tent cities as protest and civil disobedience by homeless people and their allies in the United States has existed for decades. The erection of tent cities to protest homelessness first spread across the United States in the 1980s (Wagner and Gilman 2012: 56). The community group ACORN staged tent cities in 15 cities, “Reaganvilles” were set up outside of Boston’s City Hall and the White House, and protest camps persisted into the 1990s and 2000s as political spectacles in symbolic prime spaces to draw attention to homelessness (Snow and Mulcahy 2001; Wagner and Cohen 1991; Wright 1997). 
Although there were some camps that had been tolerated and became politicized only when threatened with eviction, as was the case in the radicalization of Tompkins Square Park (Smith 1996), most protest encampments were political events by design demanding affordable housing, the decriminalization of homelessness, and humane shelters. With the exception of “Justiceville” in Los Angeles, which lasted from 1985 to 1993 before it was transformed into transitional housing, the vast majority of these earlier cases lasted only a matter of days and weeks, and only in a small a handful of cases, months. The contested camps in this study follow in this tradition of political protest, but have persisted far longer, and all began from the start with the goal of permanently safeguarding a space for their existence. Both Seattle’s Nickelsville and Sacramento’s Safe Ground continue to politicize their encampment in the face of inadequate shelters and housing, whereas Camp Quixote, Dignity Village, and Tent Cities 2 and 3 all initially formed through protests before settling into relatively de-politicized forms of seclusion."""
        chosen_difficulty = "Academic, write in the writing level of an Academic article" # refining into prompt
    elif chosen_difficulty == "Social Media":
        # Social media taken from Rick Astley Riding a Bicycle Reddit Post https://www.reddit.com/r/pics/comments/haucpf/ive_found_a_few_funny_memories_during_lockdown/
        chosen_reference_corpus = """Yes, folks, this is really Rick Astley. No, he will not be giving you up, letting you down, or... look, you get it. If you ask Rick Astley to give you a copy of the movie “UP” you create a paradox where he either has to give you up or let you down Edit: I think we need u/ReallyRickAstley to weigh in on this. What should we name this paradox Edit: A Rickle in Time. I would call it a rickle in time. Rickle Pick. Funniest seen i ever shit. The Roll Up Rick, perhaps?. The Give in or Give up paradox. it’s not even 4:20 yet. Bro it’s 4:28. You’re not making sense. Time Zones.Double cheeked up, on a Thursday afternoon. Come on, we need you to weigh in u/ReallyRickAstley. Yeah, but let's be real, we all know he's never gonna give you "UP". This timeline is already so screwed up; what could go wrong? Bruh ... bravo take my poor gold award Wasn’t this a popular post on r/showerthoughts? Link? This is some next level Rick-Rolling. I mean, he's on a bike. If he’s riding away, he better not be deserting you. It better be Rick Astley, I've been Rick Rolled a few too many times and this really feels like it. They see me rolling, they hating. That’s awesome. Last time they played a prank so I want proof! I’m not falling for any bait this time. I am skeptical given this mod's shady past ties to OP. Everyone recognizes that URL, friend. You aren't wrong, though: I do have ties to the original poster. Oops, HERE'S the video I was thinking of. I can't believe i fell for this, been a long time since I have been rolled this badly. I should have seen that coming. So did I! Totally. No way I’m clicking any links on this post. Click Mine. I like to rickroll people with rick astleys other songs, especially his recent ones from the last couple of years. Kinda wish that would take off. "Together Forever" is legit. This is a picture of Rick Rollin. They got all of us. YOU KNOW THE RULES and SO. DO. I. Ask him for a copy of “up”. 
I'm sure he'd be happy to give you Up! Didn't someone who started that with him get told to STFU? I'm not sure if I want to tempt that fate. I believe that guy was instructed to ‘go fuck himself’. Hes literally Rick rolling on that bicycle. Holy shit, it's the real u/RamsesThePigeon! Yes, folks, this is ReallyRickAstley. I have to look at his profile to verify lol. But is he going to run around and hurt me? Was it just to kill time or was that your mode of transport to somewhere?! Saw you in Santa Clara California USA at Great America Amusement Park as a teenager(m51). Amazing show you’re definitely a true Showmen!! Hope to see again one day! Still listen to your music daily. DR. Goddamn. Never thought I'd live to see the day. Love from Pakistan, my dude. You don't know how many people I've ricked and rolled without them even knowing what a rickroll is. Thanks to you. Keep smiling! I came to say this. deserting you? You could have just said “You know the rules and so do I”. Wow. I live in Las Vegas. Where was this taken? Thank you for posting this and for everyone else making it the top post, cause I had NO idea who this was. Then again, I wasn't a very big Astley fan. Not that there's anything wrong with it :) Edit: oops, not top post, pinned by OP Edit: double oops, not pinned by OP, but it is there, and it is at the top. I got that much right. Just a suggestion, pin this post at the top of this subreddit. For history has been made. Why Do I Always Miss The Coolest Stuff?? Aw Maaan! Thanks for stopping by! You really never give up on us, huh? I love you, you handsome denim angel! Also your gig with the Foo Fighters was legendary. Thank you! Thanks all for the love, comments, DMs etc! And finally, u/theMalleableDuck I salute you! Rick x. Missed by 11 minutes! Damn. Well, I don't need a reply. Just wanna say, you inadvertently helped me with spelling, as stupid as that sounds. I had to learn that damned URL because of how often I fell for it in the early days . 
If you can learn to spell dQw4w9WgXcQ, you can learn to spell anything. Here's a fun fact: A few years ago when rickrolling was at its peak, many people were remembering the url. It was so popular that in a spelling contest, the contestants were actually asked to spell the rickroll link. It wasn't counted as a point but it still it was pretty funny. thanks for not giving it away with your reply. 250k! Wtf Edit: 300k!! Hey Rick! My dad used to work with your brother in law at an aerospace place in Wigan. He's only ever heard good things about you, which means I've only ever heard good things about you. I just wanted to say hello. So, hello! Yay! Wait until I tell my dad about this. He may have worked with Rick Astley's brother in law, but Rick Astley replied to my comment on reddit. I found the most wholesome comment on the thread. It's my dad's claim to fame. If I had a quid for every time I've heard 'I work/ed with Rick Astley's brother in law' over the years, I'd be as wealthy as Rick Astley. For a second I thought this was r/OldSchoolCool and was thinking ‘Wow, he looks just like Rick Astley’. In fairness, a lot of people think that Rick Astley looks a bit like Rick Astley. But before Rick Astley, a lot of people didn’t think that Rick Astley could sound like Rick Astley. I was initially like, what kind of idiot do you think I am that I won't notice it's rick astley! What kind of loser would post this... then I saw the user. Wait, you're actually Rick Astley tho...? Yes! I think I might cry!!! It’s actually you. I met you at a backstage event when I was 12. Seriously a big fan. I’ve seen you in concert five times. Whoa whoa, hold up I just have to confirm that I just saw history in the making. Did you just get Rick Rolled Mr. Astley? All evidence points to yes, you have indeed witnessed Rick Astley get successfully Rickrolled by a random redditor. It is a truly momentous moment! Edit after 24 hours. Who are you calling a random Redditor? 
That's u/theMalleableDuck the legendary Redditor that successfully Rick-rolled Rick-Astley! I mean, do you even Reddit bro? smh. Hallowed be his name. Out of curiosity, I went ahead and calculated how much money was spent in Reddit coins for the comment by hand. By June 18th: The 1,707 gifts spent on the comment cost 634,830 coins. That's a maximum of $2,526.62 spent (not including sales tax). This gives the recipient 641 weeks or 12.33 years of Reddit Premium. By June 20th: The 2,478 gifts spent on the comment cost 875,065 coins. That's a maximum of $3,482.76 spent (not including sales tax). This also gives the recipient 142,750 coins to spend and 849 weeks or 16.32 years of Reddit Premium. That amount of gifts bought pays for about 16.01 weeks of Reddit server uptime. Once his Reddit Premium ends, he can spend the coins to get more Reddit Premium: With the coins, he could buy 79 Platinum gifts for 142,200 coins, which gives him 79 more months of Reddit Premium and 700 coins each. He'll end up with a total of 55,850 coins. He can then use that amount to buy 31 more months of Reddit Premium and have 21,750 coins left over. He can then use that amount to buy 12 more months of Reddit Premium and have 8,550 coins left over. He can then use that amount to buy 4 more months of Reddit Premium and have 4,150 coins left over. He can then use that amount to buy 2 more months of Reddit Premium and have 1,950 coins left over. He can then use that amount to buy 1 more month of Reddit Premium and have 850 coins left over. He can then use that amount to buy 1 more week of Reddit Premium and have 450 coins left over. This gives him 10 years 9 months and 1 week more of Reddit Premium. Combined with the 16.32 years from before, that leaves him with a minimum of 27 years of Reddit Premium. The math (name of gift: cost of gift * amount bought). A coin is worth: $0.00398 each if we assume that everyone paid $1.99 for 500 coins. 
There's discounts (up to 59%) for buying more, but let's not complicate things. A month of gold back in the day was 231.26 minutes of reddit server uptime. Gold was $4.99 back then. Since $3,482.76 was spent for all the awards, that equals to: 161407.430381 minutes of server time or 2690.12383968 hours of server time or 112.08849332 days of server time or about 16.01 weeks of server time. Does anyone know what's the world record? That's a lot for just one comment. So say we all. Viciously deleting all porn posts and likes...... I was here. Hello future buzzfeed readers. Implying BuzzFeed exists in the future. Commenting to show my kids in 20 years that I was a part of history. Commenting so I can show this guy's kids in ten years. This will long be spoken of in Reddit lore. Can’t believe this just happened. Rick Astley got rock rolled! Thanks for being a good sport man. I guess that works too. This was a risky link to click too. relevant xkcd. I'm so glad Rick Astley is such a great sport about Rickrolling. It has to be the longest lived, most beloved meme of all time. Does that even make it a meme anymore? Anyway, I think it helps that it's a legitimately great song. I listen to it all the way through relatively often when I get Rickrolled. Very few of us will have the immortality that Rick Astley has due to the Rickroll phenomenon. Like 2,000 years from now, end of civilization comes, aliens invade the planet and the first thing they're going to do is fucking Rickroll us. Or we'll Rickroll them in our glorious response to their invasion ultimatum. Oh my god. Thank you, this made my day Edit: I am concerned that it is all downhill from here. This is definitely the highlight of my life. Edit 2: putting this here cause I don’t wanna ruin the comment, Glad I was able to make some people laugh. We could all use a laugh in 2020! Edit 3: just because I’m getting hundreds of DM‘s asking. As of now, I have premium until 2037 and just under 85k coins. 
You crazy son of a bitch. You did it! Rick rolled Rick. Now I can die. Rick rolled the Rick roller. Oh, how the Ricks have rolled. Oh how the roles have ricked. The Circle of Memes is complete. r/madlads. So this is how reddit dies? With thunderous awards. This is a momentous day. I've witnessed the ultimate meme history. My internet life is now complete. This is like the time that Obama put out the “Thanks, Obama” video. Wait, that's illegal. Rick-Rolled Rick; this is like a once in a lifetime opportunity and we witnessed it in real time. We are the lucky ones, seeing history unfold before our very eyes . . . . I cannot believe this actually just happened. Legitimately at a loss for words. And you pulled it off flawlessly with the alternate URL and everything. I check these things, especially in a thread with a lot of rickley things going on and you got me too. Well played indeed! (you bastard). This year alone we've seen a massive pandemic, protests, and now Rick getting rick-rolled. 2020 is cray, yo. Is 2020 about to get better finally?! After seeing this, we can finally turn off the internet, there.is.nothing.else.out.there.Do you think he actually left the video running after he realized what it was, lightly bopping his head like, "damn this still just fucking slaps". I know I would. LOfuckingL..this is one of the funniest things I've seen around here in quite a while Bravo!!! I will tell my grandchildren and their friends of the day I witnessed Rick Astley getting Rickrolled. We will sit in front of the fire, reminiscing on the old days of the internet, and they’ll always ask, “Old Dydarian, tell us again about kind Sir Astley getting Rickrolled! We so love that story!” I will sit high up and my chair, puff out my chest, and begin to tell the story. “Well kids, during the pandemic of ‘20, I had to work from home. Which meant that I could masturbate to internet porn right up until the minute I had to work. 
But this one particular day, I decided to not jerk off, as I had my fill the night before. Instead, I went on Reddit and was looking at a picture of kind Sir Astley, when the immortal /u/theMalleableDuck came and did the unthinkable.” My story will be the most sought out in our camp. With people coming from miles around to hear about dear, old, kind Sir Astley, being Rickrolled for the first time. i’m just happy to be a part of it. Legend. I'm so proud of this moment, here on the internet. Brings a tear to my eye. You are so kind and likable, thank you for that. I’m posting this from the emergency room and suffice it to say this is not my best day. I really appreciate the smile you gave me. Edit - update as promised further down in the replies to me on here - I got discharged from the emergency room and am home for the night. They did a CT scan, EKG, ultrasound, bloodwork, and urinalysis all within a few hours. I was given strict orders to stay home from work tomorrow, call my doctor, and have them find me the general surgeon who can get me in the fastest to get it biopsied, whatever state or area that may be in (given the covid situation, this could be difficult). So. Yeah. I have a thread on r/AskDocs/ with the technical stuff in it, I'm not gonna do that here, but I know some people just need that sense of closure, so. It's gonna be over there shortly. Thank you all for the well wishes. Please keep them coming in whatever way works for you (thoughts, prayers, hopes, dreams, positive vibes - whatever) that I can get this sorted quickly and easily and with a good outcome. Crazy. Didnt' see this coming when I woke up this morning, but by the end if it all, even Rick Astley himself had gilded me and then DM'd me to wish me well. Nicest guy in show business, humans and gentlepeople. TIL. Update- surgeon saw me this morning at 830am. He's doing the surgery Monday. Thank you all for the well wishes! Everything okay? I hope you Feel better. 
Oh wow thank you so much for asking. I was in getting a cat scan... I found a lump on my abdomen earlier this week and it’s been growing and caused bruising.. my doctor sent me straight to the ER when I showed her. :/ I don’t know what’s wrong yet but the ER admitted me and gave me a cat scan and took blood and urine samples and said I also need an ultrasound and I don’t know what else. They’re not sure what it is either. I’m alone here and just hoping I can go home today. I really appreciate you asking!amd can you believe Rick Astley gave me platinum?? THANK YOU /u/reallyrickastley/ ! I have a 45 record single of Never Gonna Give You Up at home— never in a million years did preteen me think we would “meet”! Honestly I’m so amazed, this really does make me feel a bit better. Thank you both again. Edit- oh someone gave me gold too? You’re all so kind. Thank you so much. That is so cool about Rick astley giving a freakin platinum!! Also I imagine trying to explain it to IRL people is not all that simple. Oh my god so sorry about the lump. That sounds scary a hell. Fingers crossed its something super benign. Funny story about the time my doctor sent me straight over to the ER. It happened in 2009. I went in for a med check up and becuase my stomach had been bothering me. Dr sent me to the ER because she thought I might have appendicitis. Super urgent and all that. Scared and planning on surgery. A cat scan and lab work later. It turned out I needed to poop. The dr said something like I had an incredible amount of stool backed up. Of course my mom then told everyone she could that I was full of shit lol. Classic mom response. I really hope it won't be anything serious. Please get better. Get well soon! Pulling for you stranger. Wow yep that’s me! Thank you so much! Good luck!!! Thanks! Had a cat scan, ekg, ultrasound, bloodwork, and pee test so far. They haven’t let me eat though. Maybe soon. Haven’t had anything since 9am and it’s 530pm here now... 
ugh i hope you get a bite soon! Hope you are doing better and get to eat soon! Thoughts, wishes that you feel better soon, and know you're not solo, but very much thought of. Thank you! Hopefully you're alright and whatever it is is something easily taken care of. Thank you, I hope so too. Dr said I need a biopsy ASAP at any dr in any state, whoever can do it first. Hope everything works out for you. Thanks for the well wishes, they mean a lot! I was honestly expecting a story of you getting beat by your dad with jumper cables. I’m sorry for your situation though and hope it’s nothing serious. Best of luck friend. Thanks so much. I missed the jumper cable thread that you’re referring to but I’m guessing it was a story someone told here on Reddit. Wait are you me? I'm having the exact issues except no lump so far. What's happening to you sounds eerily similar to what happened to me a month ago, just sped up. I hope your results are different than what mine were. Thank you for the well wishes. I'm sorry that you had a bad outcome in your situation, or at least a less than ideal one. I hope that things turn around quickly and that ultimately you're completely fine. Lymphoma? That's a concern of mine, at least. Neuroendocrine cancer for me. I went to the ER with gut pain that turned out to be kidney stones and they found the mass incidentally during a CT scan as well. No external growth or bruising though. Ended up getting surgery on it and awaiting the results of a PET scan to make sure they got it all, so fingers crossed. Hopefully yours turns out to be far less troublesome. Good luck! Have you ever seen the movie Alien? I laughed out loud at this in the ER. Thanks for taking the chance at getting a good reaction from me! I know some people wouldn’t take the joke well but I found it hilarious :D. """
        chosen_difficulty = "Social Media, write using hashtags and more in the style of a reddit.com post" # refining into prompt
    elif chosen_difficulty == "Film":
        # Goodfellas script taken from http://www.script-o-rama.com/movie_scripts/g/goodfellas-script-transcript.html
        chosen_reference_corpus = """What the fuck is that? Jimmy? - What's up? - Did I hit something? What the fuck is that? Maybe you got a flat. What the fuck? Pull over. He's still alive. You piece of shit! Die, you motherfucker! Look at me! As far back as I can remember, I always wanted to be a gangster. To me... ...being a gangster was better than being president of the United States. Even before I went to the cabstand for an after-school job... ...I knew I wanted to be a part of them. It was there that I knew I belonged. To me, it meant being somebody... ...in a neighborhood full of nobodies. They weren't like anybody else. They did whatever they wanted. They parked in front of hydrants and never got a ticket. When they played cards all night... ...nobody ever called the cops. Tony Stacks. How are you? Tuddy Cicero. Could this be the Canarsie kid? Tuddy. Tuddy ran the cabstand and the Bella Vista Pizzeria... ...and other places for his brother Paul, who was the boss of the neighborhood. Paulie might have moved slow... ...but it was only because Paulie didn't have to move for anybody. It's your fault. Yeah, it's your fault. At first my parents loved that I found a job across the street from the house. My father, who was Irish, was sent to work at the age of   . He liked that I got myself a job. He always used to say that American kids were spoiled lazy. Henry! Watch how you cross! Bring back milk! My mother was happy after she found out the Ciceros... ...came from the same part of Sicily as she did. To my mother... ...that was the answer to her prayers. I was the luckiest kid in the world. I could go anywhere, do anything. I knew everybody, and everybody knew me. Wiseguys would pull up and Tuddy would let me park their Cadillacs. Here I am, this little kid, I can't even see over the steering wheel... ...and I'm parking Cadillacs. But, it wasn't too long... ...before my parents changed their minds about my job at the cabstand. 
For them, it was supposed to be a part-time job. But for me... ...it was definitely full time. That's all I wanted to do. People like my father could never understand, but I was part of something. I belonged. I was treated like a grown-up. Tell him . Every day I was learning to score. A dollar here, a dollar there. I was living in a fantasy. Have a good day at school? My father was always pissed off. Pissed that he made such lousy money, that my brother... ...was in a wheelchair. He was pissed that seven of us lived in such a tiny house. Tell me about this. A letter from school. It says you haven't been there in months. Months! You're a bum! Want to grow up to be a bum?! After a while, he was mostly pissed because I hung around the cabstand. He knew what went on there. Every once in a while I'd have to take a beating. But by then, I didn't care. The way I saw it... ...everybody takes a beating sometime. I can't make any more deliveries. You'll fuck everything up. My dad says he'll kill me. Look. Come with me. Is that him there? How about him? - That's him. - Get him. Excuse me. - Scumbag. - Come here, you piece of shit. Know this kid? Know where he lives? You deliver mail to his house? From now on, any letter from school to his house comes directly here. Understand? Another letter from school goes to that kid's house... ...in the oven you'll go, head first. That was it. No more letters from truant officers. No letters from school. In fact, no more letters from anybody. After a few weeks, my mother went to the post office to complain. How could I go to school after that... ...and pledge allegiance and sit through good government bullshit? Paulie hated phones. He wouldn't have one in his house. Mickey called. Call him back. He got all his calls second hand. Then you'd have to call the people back. Got a nickel? Get him on the phone. There were guys, that's all they did all day, was take care of Paulie's calls. For a guy who moved all day long... 
...Paulie didn't talk to   people. With union problems... ...or a beef in the numbers... ...only the top guys spoke with Paulie about the problem. Everything was one-on-one. Paulie hated conferences. He didn't want anyone hearing what he said... ...or anyone listening to what he was being told. Hundreds of guys depended on him, and he got a piece of everything they made. It was a tribute, like in the old country, except they were doing it in America. All they got from Paulie was protection from the guys trying to rip them off. That's what it's all about. That's what the FBI could never understand. What Paulie and the organization does... ...is protect people who can't go to the cops. That's it. They're like the police department for wiseguys. People looked at me differently, and they knew I was with somebody. I didn't have to wait in line at the bakery on Sunday morning for fresh bread. The owner knew who I was with and no matter how many people were waiting... ...I was taken care of first. Our neighbors stopped parking in our driveway, even though we had no car. At   ... ...I was making more money than most of the grown-ups in the neighborhood. I had it all. One day... One day, some neighborhood kids carried my mother's groceries all the way home. Know why? It was out of respect. What do you think? Aren't my shoes great? You look like a gangster. They shot me. Help! Henry, shut the door. That was the first time I had ever seen anyone shot. Can't have that in here. Jesus Christ! I can't have that in this joint. I remember feeling bad about the guy, but also... ...feeling maybe Tuddy was right. I knew Paulie didn't want anyone dying in the building. You're a real jerk. You wasted   fucking aprons on this guy. What's wrong with you? I got to toughen this kid up. It was a glorious time. And wiseguys were all over the place. It was before Apalachin and before Crazy Joe... ...decided to take on a boss and start a war. It was when I met the world. 
And it was when I first met Jimmy Conway. He couldn't have been more than or at the time, but he was already a legend. He'd just walk in and everybody who worked the room went wild. He'd give a doorman $ for opening the door. He'd give hundreds to the dealers and guys who ran the games. The bartender got $ for keeping ice cubes cold. The Irishman is here to take you Guineas' money. - Want a drink? - Give me a   and  . Meet the kid Henry. Keep them coming. Jimmy was one of the most feared guys in the city. He was first locked up at and doing hits for mob bosses at   . Hits never bothered Jimmy. It was business. But what Jimmy really loved to do, what he really loved to do was steal. He actually enjoyed it. Jimmy was the kind of guy who rooted for the bad guys in movies. Give me your wallet. You might know who we are, but we know who you are. He was one of the biggest highjackers... ...of booze, cigarettes, razor blades, shrimp and lobsters. Shrimp and lobsters were best. They went fast. Almost all of them were gimmies. They just gave it up, no problem. They called him Jimmy the Gent. Help the lady. Drivers loved him. They used to tip him off... ...about the really good loads, of course, everybody got a piece. Thanks, I'll be back for the rest later. Henry, come here. Meet Tommy. Youse gonna be working together, okay? Good. Jimmy, you get anything good? And when the cops assigned a whole army to stop Jimmy, what did he do? He made them partners. I'd complain, but who'd listen? - What do you need? - Two Luckys. What are you doing? - It's all right. - Who says? Your mother? How many you need? - Where'd you get the cigarettes? - Get him out of here. - It's okay. - It's not okay! - You don't understand. - You don't understand. Store's closed. - Henry got pinched. - Where? By the factory. Henry Hill. The People of the State of New York vs. Henry Hill. Docket #   . Yes, sir. That's me. Just stand there. Now stay there. Proceed. Congratulations. A graduation present. - Why? 
I got pinched. - Everyone does. You did it right. - You told them nothing. - I thought you'd be mad. I'm not mad, I'm proud of you. You took your first pinch like a man, and learned the two greatest things in life. Look at me. Never rat on your friends... ...and always keep your mouth shut. Here he is! You broke your cherry! Congratulations! By the time I grew up, there was billion a year in cargo... ...moving through Idlewild Airport, and we tried to steal every bit of it. See, we grew up near the airport. It belonged to Paulie. We had friends and relatives who worked all over it. They would tip us off about what was coming in and moving out. If any truckers or airlines gave us trouble... ...Paulie's union people scared them with a strike. It was beautiful. It was a bigger moneymaker than numbers and Jimmy was in charge. Whenever we needed money, we'd rob the airport. To us, it was better than Citibank. You got a phone?! Come on! Two niggers just stole my truck. Can you fucking believe that shit?! There was Jimmy and Tommy... ...and me. And there was Anthony Stabile. Frankie Carbone. And then there was Mo Black's brother, Fat Andy. And his guys, Frankie the Wop... ...and Freddy No Nose. And then there was Pete the Killer, who was Sally Balls' brother. Then you had Nickey Eyes... ...and Mikey Franzese. Jimmy Two-Times, nicknamed because he said everything twice. I'll get the papers, get the papers. What is this, coats? I need suits, Henry, not coats. Thursday. This is the summer. What'll I do with fur coats? So I'll take them away. No, I want them. We'll hang them in the freezer with the meat. For us to live any other way was nuts. To us those goody-good people who worked shitty jobs for bum paychecks... ...and took the subway to work and worried about bills, were dead. They were suckers. They had no balls. If we wanted something, we just took it. If anyone complained twice, they got hit so bad they never complained again. It was all just routine. 
You didn't even think about it. Frankie, what the fuck does ... ...have to do with ?  ain't even close to . What's that got to do with anything? Piece of cake. Don't worry about the alarms. I just got to get a key. - No problems? - I'll take care of it. - Tell him what you were telling me. - Too good to be true. Big score coming from Air France. Bags of money coming on. Americans change their money over into French money, send it back here. - Calm down. - It's beautiful. It's totally, totally untraceable. Our only problem is getting a key, but I got a plan. - Me and Frenchy and this citizen. - Yeah, he's a piece of work. If I'm right, there could be half a mil coming in, all cash. The best time is probably over a weekend. So maybe Saturday. There's a Jewish holiday Monday. They won't find out until Tuesday. Beautiful. What about the security? Security? You're looking at it. It's a joke. I'm the midnight to eight man. Just come in like you're picking up lost baggage. - It's beautiful. - There won't be a problem. - Good. - We're on. What's really funny was that fucking bank job in Secaucus. I'm in the weeds lying down. He said, "What are you doing?" I said, "Resting. Here?" "In the weeds? I'm resting!" They pull me in, start asking questions. You know, this and that. "What are you going to tell us?" I said, "My usual. Nothing." "Why tell you?" The fuck. He says, "No, you'll tell me something today." I said, "Okay, go fuck your mother." You saw the paper. My head was out like this. I'm coming around and who do I see in front of me? This prick again. He says, "What do you want to tell me now?" I said, "What are you doing here? I said to go fuck your mother." I thought he'd shit. The fuckers. I wish I was big just once. Funny. You're really funny. What do you mean? It's funny, you know. It's a good story. You're a funny guy. You mean the way I talk? What? It's just, you know. You're just funny. You know, the way you tell the story. Funny how? What's funny about it? 
You got it all wrong. He's a big boy. He knows what he said. Funny, how? Just, you know. You're funny. Let me understand this. Maybe I'm a little fucked up. But I'm funny how? Funny like a clown? I amuse you? I'm here to fucking amuse you? What do you mean, funny? How am I funny? You know, how you tell a story. I don't know. You said it. You said I'm funny. How am I funny? What the fuck is so funny about me? Tell me what's funny. Get the fuck out of here, Tommy. Motherfucker! I almost had him. You stuttering prick, you. Frankie, was he shaking? I wonder about you sometimes, Henry. You may fold under questioning. What's with you? I thought I was getting pinched already. He's on my neck like a vulture. What do you want? This guy didn't want to give you the check. Could you take care of this? No problem. Tell him to put it on my tab. I want to talk to you about that. It ain't just this one. It's seven big ones you owe me. That ain't peanuts. I don't mean to be out of order... It's good you don't like to be out of order, Sonny. Embarrassing me in front of my friends, like I'm a deadbeat. You're a mutt. You know the money we spend in this fucking... - Don't be like that. - Like what? Do you believe him? You think this is funny? What the fuck are you looking at? You fucking moron! You don't want to bring the check? Do you believe this prick? You're supposed to be doing this stuff, too. You're a funny guy. That's it, Henry! You want to laugh? This prick asked me to christen his kid. I charged him seven thousand. You really are a funny guy. I'm worried. I'm hearing bad things. He treats me like I'm a fucking fag. I got to go on the lam to get away from this guy. This ain't right. I can't go here or there. I talk to them a million times. They don't listen. If you tell him, he'll stop. I'll wind up being declared M.I.A. They'll find me in a car in the weeds. You know this Tommy all your life. This cocksucker's an arch criminal. 
When I leave my house in the morning I look over both shoulders. This is no way to live. I'm no fence jumper. - Tell me what to do, I'll do it. - What could I do? If there was something I could do, I would. I'd like to help you out. Tell him what we talked about. Maybe you could come in with me, take a piece of this joint. It'd be good. What do you mean? The restaurant? It's a classy place. You've been in here a million times. Tommy taking over this joint is like putting a silk hat on a pig. I don't mean disrespect, but that's the way it is. I'm begging you. What can I say? What am I going to do? What does he want from me? I don't know anything about the restaurant business. All I know is how to sit down and order a meal. Not for you. Just a place to hang. The chef is great. The shows are good. There's a lot of whores coming in. What do you want from me? Tommy's a bad seed. What am I supposed to do? Shoot him? That wouldn't' be a bad idea. Sorry I said that. I didn't mean it. It's just that he's scaring me. I just need help. Help me, please. Know anything about the restaurant business? He knows everything. He's there day and night. Another fucking few minutes, he could be a stool. You want me to be your partner? Is that what you're telling me? What do you think I'm telling you? Paulie, please. It's not even fair. You don't understand. You run the joint. Maybe I'll try to help you. God bless you, Paulie. You've always been fair with me. Now he's got Paulie as a partner. Any problems, he goes to Paulie. Trouble with a bill, to Paulie. Trouble with cops, deliveries, Tommy... ...he calls Paulie. But now he has to pay Paulie... ...every week no matter what. "Business bad? Fuck you, pay me. Had a fire? Fuck you, pay me." "The place got hit by lightning? Fuck you, pay me." Also, Paulie could do anything. Like run up bills on the joint's credit. And why not? Nobody will pay for it anyway. Take deliveries at the front door and sell it out the back at a discount. 
Take a $ case of booze and sell it for $   . It doesn't matter. It's all profit. Then finally, when there's nothing left... ...when you can't borrow another buck from the bank... ...you bust the joint out. You light a match. Do you need help reaching anything? You look like you're decorating a Christmas tree. She's from the Five Towns. Who?"""
        chosen_difficulty = "Film, write using a blockbuster style dialog for a trailer" # refining into prompt
    if debug: print("reference corpus success")

    # reduce the size of the reference to match the target
    reference_tokens = word_tokenize(chosen_reference_corpus)
    #if debug: print(reference_tokens)
    random.shuffle(reference_tokens) # shuffling tokens
    if debug: print("reference corpus shuffle success")
    # Keep only 75% of corpus_size tokens. In testing, shrinking the reference (for example, halving it)
    # of course delivers a lower score, but it also ensures GPT-3 always has a goal to work toward.
    # However, the smaller the reference, the shorter the summary becomes. Use caution when setting
    # both the reference size and the number of passes: each theme/difficulty needs its own balance.
    reference_tokens = reference_tokens[0:int(corpus_size * .75)]

    if debug: print("reference_token resizing success")

    reference_tokens = ' '.join(reference_tokens)
    reference_tokens = reference_tokens.replace(' ,', '')
    reference_tokens = reference_tokens.replace(' .', '.')
    scored_reference_dict = scorer(bill_dict, False, False, True, reference_tokens, debug) # rerunning the scorer but for our reference (only once)

    # decided to nix these entirely. I'm going for a new approach after the presentations.
    # del scored_dict['active_sent, higher is better']
    # del scored_dict['passive_sent, lower is better']
    # del scored_reference_dict['active_sent, higher is better']
    # del scored_reference_dict['passive_sent, lower is better']

    # calculating dict-wide scores. (Still tuning the "weights"; this would have been cool to build into a network instead of doing it all in GPT-3, but time is precious)
    # text_score is initial difficulty score
    text_score = (scored_dict['number of fully capitalized words for full text, lower is better'] * 10) + (1-(scored_dict['text word freq, higher is better'])) + (scored_dict['text unique words, lower is better'])
    reference_score = (scored_reference_dict['number of fully capitalized words for full text, lower is better'] * 10) + (1-(scored_reference_dict['text word freq, higher is better'])) + (scored_reference_dict['text unique words, lower is better'])


    max_passes = 1 # only one pass for now: sending too many requests to OpenAI can result in an error!
    # Multiple passes do work: the more passes you allow, the more simplified the text becomes (but this can be a detriment)
    loss = text_score - reference_score # initial loss: how far the text's score is from the reference's score
    # A single pass still seems to simplify fine, but I'll have to come up with another solution for multi-pass runs

    # print(chosen_difficulty)

    for i in range(0, max_passes):
        # creating prompt to send to GPT-3
        prepped_score_dict = str(scored_dict).replace("{", "").replace("}", "")

        prepped_scored_reference_dict = str(scored_reference_dict).replace("{", "").replace("}", "")

        simplification_send_prompt = ("""You are a text meaning extractor machine. You have only two goals.
        Your first goal is to simplify the text by getting the loss as close to 0 as possible using the text's values to match the given reference values:
        1. number of fully capitalized words for full text, lower is better
        Explanation: This is the number of unique fully capitalized words in the text. For example, if you find the acronym FAFSA, changing every FAFSA to Free Application for Federal Student Aid will reduce this parameter by 1.

        2. text word freq, higher is better
        Explanation: these are the number of non-unique words in the text. If you increase the amount of common words while reducing the number of unique words,  you will get a better score. For example "To exercise one's purchasing power requires the usage of a plastic object with a magnetic strip." could be corrected with simpler vocabulary to "To use your credit you must have a credit card".

        3. text unique words, lower is better
        Explanation: these are the number of unique words in the text. If you decrease the amount of unique words while increasing the amount of common words, you will get a better score. Refer to the example above.

        4. loss
        Explanation: The loss is found by subtracting the reference text's total score from your current text's total score. Each total score is calculated as follows:
        score = (number of fully capitalized words for full text, lower is better * 10) + (1-(text word freq, higher is better)) + (text unique words, lower is better)
        loss = current text score - reference text score

        Your second goal is to then take your newly simplified text and summarize it in context of the theme you are given

        5. Theme
        Explanation: You will receive one of three Themes that your summary must be moulded to. Feel free to draw on anything you know about the theme.

        6. Length and Conciseness
        Explanation: You are only providing one written short summary of the original text, in context of your given theme. Your response size can be derived from the loss.

        7. Do not repeat the old text, we only want your summarized text

        _________________________________________________
        Simplify the text to match the reference parameters.

        The bill, until July 1, 2025, would prohibit a court from awarding attorneys’ fees that exceed specified amounts, which vary based on whether the matter is contested or uncontested, in any action to recover COVID-19 rental debt, as defined, brought as a limited or unlimited civil case under normal circumstances, determined as provided.

        Text values = 'number of fully capitalized words for full text, lower is better': 1, 'text word freq, higher is better': 36, 'text unique words, lower is better': 36

        Reference values = 'number of fully capitalized words for full text, lower is better': 1, 'text word freq, higher is better': 23, 'text unique words, lower is better': 20

        The Loss is = 3

        Your Summarized Text: The bill would prohibit a court from awarding attorneys' fees that exceed specified amounts in any action to recover COVID-19 rental debt.

        _________________________________________________
        Simplify the text to match the reference parameters.

        {}

        Text parameters = {}

        Reference parameters = {}

        The Loss is = {}

        Theme = {}

        Your Summarized Text: """.format(simplify_this_text, prepped_score_dict, prepped_scored_reference_dict, loss, chosen_difficulty))

        # Sending prompt
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=simplification_send_prompt,
            temperature=0.7,
            top_p=1,
            max_tokens=256,
            frequency_penalty=0,
            presence_penalty=0,
        )

        simplify_this_text = str(response.choices[0].text) # extracting string response
        simplify_this_text = simplify_this_text.lstrip('\n') # stripping leading newlines

        scored_dict = scorer(bill_dict, False, False, True, simplify_this_text, debug) # scoring
        # calculating full score
        text_score = (scored_dict['number of fully capitalized words for full text, lower is better'] * 10) + (1-(scored_dict['text word freq, higher is better'])) + (scored_dict['text unique words, lower is better'])

        # eh, why not include the Loss? Idk if GPT-3 really needed it but I think there's a difference?
        # Will need a lot more testing
        loss = text_score - reference_score

    return simplify_this_text
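
The composite score formula appears three times in `simplify_loop` (for the text, for the reference, and again after each pass). Below is a minimal sketch of one way to factor it out, assuming the scorer keeps returning the three keys used above. `composite_score` and `compute_loss` are hypothetical helper names, not part of the notebook.

```python
# Keys as they appear in the scorer's output dictionary in this notebook.
CAPS_KEY = 'number of fully capitalized words for full text, lower is better'
FREQ_KEY = 'text word freq, higher is better'
UNIQUE_KEY = 'text unique words, lower is better'


def composite_score(scored, caps_weight=10):
    """Combine the three scorer metrics into one number (hypothetical helper)."""
    return (scored[CAPS_KEY] * caps_weight) + (1 - scored[FREQ_KEY]) + scored[UNIQUE_KEY]


def compute_loss(text_scored, reference_scored):
    """Loss = score of the current text minus score of the reference."""
    return composite_score(text_scored) - composite_score(reference_scored)


# Values taken from the worked example inside the prompt.
text_values = {CAPS_KEY: 1, FREQ_KEY: 36, UNIQUE_KEY: 36}
reference_values = {CAPS_KEY: 1, FREQ_KEY: 23, UNIQUE_KEY: 20}
print(compute_loss(text_values, reference_values))  # 3
```

With the prompt's example values, the text scores 11 and the reference scores 8, which gives the loss of 3 shown to GPT-3.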

In [6]:
#Example usage
#user types: Implementation of the provisions of this section are contingent upon an appropriation in the annual Budget Act or another statute for this purpose.
#user clicks on Film, beginning conversion process in the background

#Returned output is:
#> From SB-1479 2021-2022 Chapter 850, Section 1 32096 (g),
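
The comment on `max_passes` in `simplify_loop` notes that too many requests to OpenAI can end in an error. One common way to soften that is to retry with exponential backoff. This is a sketch only: `call_with_backoff` is a hypothetical helper, it assumes every exception is worth retrying, and the request is a stand-in function rather than a real API call.

```python
import time


def call_with_backoff(request_fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call request_fn(); on failure wait base_delay * 2**attempt seconds, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except Exception:  # assumption: any exception is treated as retryable
            if attempt == max_retries:
                raise  # out of retries, surface the original error
            sleep(base_delay * (2 ** attempt))


# Demonstration with a stand-in request that fails twice before succeeding.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated request error")
    return "simplified text"


delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky_request, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

In the notebook, `request_fn` could be a `lambda` wrapping the existing `openai.Completion.create(...)` call, so the loop itself would not need to change.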

In [7]:
# Global variables!
total = 100     # Max size of progress bar
message = ''    # Blank message string, used to send throughout functions and threads. Not needed yet
progress = 0    # current progress up to a maximum of "total"
difficulty = '' # the current difficulty of text selected by user


def long_operation_thread(text):
    global message, progress, difficulty

    debug = False # enable this if you'd like to see some sanity checks + other printed internals

    # Webscraping func
    legis_scrape, continue_process, searched, summary_only, bill_only = CA_Legislature_Webscrape(text, debug)

    # legis_scrape comes as dict if bill found
    # else it will come as an empty string
    # continue process is a boolean check from the process (left as false if there is no bill found)

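    if not continue_process: # if the process failed
        progress = 100
        # a thread's return value is discarded, so print the message to show it in the GUI output box
        print("\nWebscraping Failed. \nPlease ensure your quote is exact (no typos). \nOr try a new quote or search")
        return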

    progress += 33.3 # Step completed

    if debug: print(legis_scrape)

    # All three cases call the scorer with the same arguments; only the label shown to the user differs
    if bill_only:
        scoring_label = "The main bill"
    elif summary_only: # the bill is too long, so score the summary
        scoring_label = "The summary of the bill"
    else: # neither flag is set, so score the entire bill
        scoring_label = "The entire bill"

    print('\nScoring: "', scoring_label, '"') # Displaying what is being scored
    outputted_scored_dict = scorer(legis_scrape, summary_only, bill_only, False, "", debug)

    progress += 33.3 # Step 2 completed
    if debug: print(outputted_scored_dict)

    # Now we try to have GPT-3 simplify the text
    print("\nConverting/Simplifying the text...")
    new_simplified_text = simplify_loop(outputted_scored_dict, legis_scrape, difficulty, summary_only, bill_only, debug)
    progress += 33.3
    print("\nFinished converting text to", difficulty, ".")


    print("\n\nYou scraped and converted: ", legis_scrape["subject"], "\nLast Updated: ", legis_scrape['date_published'], "\nAdditional Info: ", legis_scrape['bill_info'], "\n\nWhich means:\n", new_simplified_text, '\n')
    #date published, bill info, subject
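
`long_operation_thread` shares its state with the GUI through globals, and anything a thread target returns is thrown away. As an alternative, here is a minimal sketch of passing the outcome back through a `queue.Queue`. The `worker` function and its messages are stand-ins for illustration, not the notebook's actual scraping and simplification steps.

```python
import queue
import threading


def worker(text, results):
    """Stand-in for long_operation_thread: reports its outcome through a queue."""
    if not text.strip():
        results.put(("error", "Webscraping Failed."))
        return
    results.put(("ok", "converted: " + text))


results = queue.Queue()
thread = threading.Thread(target=worker, args=("SB-1479", results), daemon=True)
thread.start()
thread.join()  # the GUI loop would poll instead of blocking here

status, payload = results.get_nowait()
print(status, payload)
```

Inside the event loop, the `if thread:` block could call `results.get_nowait()` and catch `queue.Empty` while the worker is still running, which avoids blocking the window.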

In [8]:
def release_the_gui():

    global message, progress, difficulty #we're doing multithreading now! Basically just ensuring the

    sg.theme("DarkAmber") #I do like dark theme

    #Making/setting the layout of the window. Just an input field and some buttons to run all functions in proper order at the user's whim (without allowing multiple clicks)
    layout = [
        [sg.Text("Input a quote of a CA Legislature bill,\nI'll try finding it and simplify the whole thing!", font=("Helvetica", 20), text_color="white")],
        [sg.Text("Note: You can use quotes, search terms, or url!", font=("Helvetica", 12), text_color="white", background_color=None)],
        [sg.Text("Input text here: "), sg.InputText(key = '-STRIN-')],
        [sg.Button("Academic"), sg.Button("Social Media"), sg.Button("Film"), sg.Button("Exit", button_color=("white", "red"))],
        [sg.Text('Work progress'), sg.ProgressBar(total, size=(20, 20), orientation='h', key='-PROG-')],
        [sg.Output(size=(80, 20))]
    ]

    window = sg.Window("Law Conversion App", layout) # creating new window to read (display) in while loop
    thread = None # nothing is running in the separate thread right now

    # --------------------- EVENT LOOP ---------------------

    while True: # while the sg event/window is in use
        event, values = window.read(timeout=100)
        if event in (None, "Exit"): # Exit this loop if the exit button is clicked
            break

        elif event == 'Academic' and not thread:
            difficulty = "Academic"
            window["Academic"].update(button_color=("grey",))
            print('Beginning conversion process to Academic...\n')
            thread = threading.Thread(target=long_operation_thread, args=(values['-STRIN-'],), daemon=True)
            thread.start()
            continue

        elif event == 'Social Media' and not thread:
            difficulty = "Social Media"
            window["Social Media"].update(button_color=("grey",))
            print('Beginning conversion process to Social Media...\n')
            thread = threading.Thread(target=long_operation_thread, args=(values['-STRIN-'],), daemon=True)
            thread.start()
            continue

        elif event == 'Film' and not thread:
            difficulty = "Film"
            window["Film"].update(button_color=("grey",))
            print('Beginning conversion process to Film...\n')
            thread = threading.Thread(target=long_operation_thread, args=(values['-STRIN-'],), daemon=True)
            thread.start()
            continue

        if thread:                                          # If thread is running
            window['-PROG-'].update_bar(progress, total)    # update the progress bar with the current progress amount
            thread.join(timeout=0)                          # not necessarily needed as we are not running more than 1 additional thread and using the results within global,
                                                            # but a good practice to join all in case I want to maybe
                                                            # have many threads running different functions at once (maybe allow more than one snippet?)

            if not thread.is_alive():                       # if the thread is finished/dead
                window[difficulty].update(button_color=("#fdcb52",)) #reset the button colors back to default
                thread, message, progress, difficulty = None, '', 0, ''     # reset variables for next run
                window['-PROG-'].update_bar(0,0) # clear the progress bar
                print('\n\nReady to Convert Another!\n\n') # Send complete message
    window.close() # Once the loop is done, close the window, ending this function

In [9]:
if __name__ == '__main__':
    release_the_gui()
    exit()

