# Public Comment Period Segmentation via GPT-3.5-Turbo-Instruct

## Dataset Setup

In [1]:
from cdp_data import CDPInstances, datasets
from cdp_data.utils import connect_to_infrastructure
import pandas as pd

# Connect to infra
connect_to_infrastructure(CDPInstances.Seattle)

# Get dataset
seattle_df = datasets.get_session_dataset(
    CDPInstances.Seattle,
    start_datetime="2023-01-01",
    end_datetime="2023-02-15",
    store_transcript=True,
)

Fetching each model attached to event_ref: 100%|██████████| 8/8 [00:00<00:00, 90.49it/s]
Fetching transcripts: 100%|██████████| 8/8 [00:00<00:00, 48.40it/s]


## Prompt Setup

In [2]:
from enum import Enum

from cdp_backend.pipeline.transcript_model import Transcript
from dotenv import load_dotenv
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback
from langchain.output_parsers import PydanticOutputParser
from langchain import PromptTemplate
from pydantic import BaseModel, Field

###############################################################################

load_dotenv()
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0, max_tokens=-1)

###############################################################################

class PublicCommentPeriod(BaseModel):
    first_sentence_text: str = Field(
        description="the text of the sentence which introduces the public comment period",
    )
    last_sentence_text: str = Field(
        description="the text of the sentence which concludes the public comment period",
    )

PUBLIC_COMMENT_PERIOD_SEG_PARSER = PydanticOutputParser(pydantic_object=PublicCommentPeriod)

###############################################################################

PUBLIC_COMMENT_PERIOD_SEG_PROMPT = PromptTemplate.from_file(
    "prompts/v0-public-comment-period-seg.jinja",
    input_variables=["transcript"],
    partial_variables={
        # "format_instructions": PUBLIC_COMMENT_PERIOD_SEG_PARSER.get_format_instructions(),
    },
    template_format="jinja2",
)

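def _process_transcript(df: pd.DataFrame, index: int) -> PublicCommentPeriod:
    """Segment the public comment period for the session at row `index` of `df`.

    Prints the matched section (or the raw model output when no match is found)
    and returns the parsed PublicCommentPeriod.
    """
    # Get the session details for this row
    session_details = df.loc[index]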
    
    # Load transcript
    with open(session_details.transcript_path) as open_f:
        transcript = Transcript.from_json(open_f.read())

    # Convert a fixed window of the transcript (sentences 20 through 169) to a string
    transcript_str = "\n\n".join([s.text for s in transcript.sentences[20:170]])
    
    # Fill the prompt
    input_ = PUBLIC_COMMENT_PERIOD_SEG_PROMPT.format_prompt(transcript=transcript_str)

    # Generate and log token usage
    with get_openai_callback() as api_usage:
        output = llm(input_.to_string())
    
        # Parse the model output and print the result
        try:
            pc_period = PUBLIC_COMMENT_PERIOD_SEG_PARSER.parse(output)

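            # Ensure single sentence: keep the first sentence of the opening text
            # and the last non-empty sentence of the closing text. Dropping empty
            # pieces avoids an IndexError (or selecting the wrong sentence) when
            # the model omits the trailing period.
            first_parts = [p.strip() for p in pc_period.first_sentence_text.split(".") if p.strip()]
            last_parts = [p.strip() for p in pc_period.last_sentence_text.split(".") if p.strip()]
            first_sentence_text = f"{first_parts[0]}." if first_parts else ""
            last_sentence_text = f"{last_parts[-1]}." if last_parts else ""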
            print(first_sentence_text)
            print(last_sentence_text)

            # Find full content
            start_sentence_index = -1
            end_sentence_index = -1
            for i, sentence in enumerate(transcript.sentences):
                if (
                    start_sentence_index == -1
                    and sentence.text == first_sentence_text
                ):
                    start_sentence_index = i
                if (
                    start_sentence_index != -1
                    and sentence.text == last_sentence_text
                ):
                    end_sentence_index = i + 1
                    break

            # Only print if both start and end were found
            if start_sentence_index != -1 and end_sentence_index != -1:
                section_text = "\n".join([
                    s.text for s in transcript.sentences[
                        start_sentence_index:end_sentence_index
                    ]
                ])
                print(f"SECTION FULL CONTENT: {section_text}")
            else:
                print("SECTION FULL CONTENT: Could not find matching content")
                print(output)
                    
            print()
            print()
            print("-" * 80)
            print()

            return pc_period
    
        except Exception:
            # Print the raw model output to aid debugging, then re-raise
            print("!!!! ERROR OCCURRED !!!!")
            print()
            print(output)
            print()
            print("-" * 80)
            print()
            raise
    
        finally:
            # Print api usage
            print(api_usage)

## Outputs

In [3]:
seg = _process_transcript(seattle_df, 0)

Let's go on to public comment.
We have just a few slides left, and we're going to transition now to some policy considerations and recommendations.
SECTION FULL CONTENT: Let's go on to public comment.
Madam Clerk could you let me know if we have anybody in the room or remotely signed up for public comment?
We do not.
Okay great and just confirming the electronic there's nobody signed up electronically either?
Correct.
Okay thank you so much.
There are no registered remote public commenters.
Thank you both so much.
Okay wonderful well if you do have any public comment you're welcome to go ahead and send those to us at council at seattle.gov and the full council will receive those in real time.
With that the public comment period has been opened and seeing no one signed up for public comment the public comment period is now closed.
Madam Clerk let's go ahead and move on to items on our agenda.
We have first with us the reordered agenda would include the sugary sweetened beverage tax upda

In [4]:
seg = _process_transcript(seattle_df, 5)

So the public comment period is now open, and I will call the first person.
This is absolutely a no.
SECTION FULL CONTENT: So the public comment period is now open, and I will call the first person.
As I said, we don't have anybody signed up in advance.
So I will call the first person.
The first person I see is Sri Ravi.
Yes.
Can you hear me?
Yes, Sri.
Please go ahead.
You have one minute.
Thank you.
I just wanted to say that I'm requesting all the people in the room to please raise your hand if you don't know on the caste ordinance because it disproportionately impacts Indian children in the U.S. and leads to harassment, assault, and discrimination on a community that's a victim of centuries of horrendous colonial atrocities.
It is recorded that the British actually introduced caste as a part of the divide and rule policy, and when they asked for the caste, they said it was the British who introduced it.
And this is all documented by the British themselves.
Excuse me.
And it's also no

In [5]:
seg = _process_transcript(seattle_df, 7)

At this time, we're going to move into public comment.
The fires, the gun violence.
SECTION FULL CONTENT: At this time, we're going to move into public comment.
I'll moderate the public comment in the following manner.
Each speaker has two minutes to speak.
I'll alternate between virtual and in-person public commenters.
I'll call on each speaker by name and in order in which they registered on the Council's website in a sign-in form.
If you have not yet registered to speak but would like to, you can sign up before the end of the public comment session.
Once I call a speaker's name, if you are using the virtual option, you'll hear a prompt.
And once you've heard that prompt, please press star 6 to unmute yourself.
Please begin speaking by stating your name and the item on the agenda which you are addressing.
Speakers will hear a chime when 10 seconds are left of the allotted time.
And once the speaker hears the chime, we ask that you begin to wrap up your public comments.
If speakers do