# Machine Learning Project 2024

## Background

The aim of this project is to explore the capabilities of pre-existing speech recognition/diarization models. While you may not be aware of them, these models are widespread in real life and play a crucial role in various applications. For example:

- **Content Creation and Streaming Platforms**:
  - Used extensively in live streaming, YouTube[[1]](https://support.google.com/youtube/answer/6373554?hl=en&utm_source=chatgpt.com), Netflix, and other platforms to provide subtitles/captions, enhancing accessibility and viewer experience

- **Virtual Assistants**:
  - The likes of Siri/Alexa all use Automatic Speech Recognition (ASR)[[2]](https://medium.com/@zhouxiaogan0/how-do-voice-assistants-like-alexa-siri-actually-work-1885dce1f683)

- **Customer Service**:
  - While it can be frustrating, most customer service phone-lines use Automatic Speech Recognition, while you navigate their menu or provide your account identifiers [[3]](https://www.geeksforgeeks.org/what-is-speech-recognition/).

- **Legal and Business Environments**:
  - Facilitates the transcription of meetings and using Large Language Models can produce the minutes.[[4]](https://otter.ai/)

These models demonstrate how speech diarization has become an integral part of modern technology.

To begin with, we use a segment of the 2024 American Presidential debates.

In [33]:
# Packages used

import whisper
import os 
import csv
import pandas as pd
import matplotlib.pyplot as plt
from pydub import AudioSegment
import openai
import requests
import assemblyai as aai
from pyannote.audio import Pipeline
import Authtoken


## Part 1 - Speaker Diarisation

The first part of this project is to use pre-existing models to process an audio file and identify who was speaking and when.

There are many different models that perform speaker diarisation. Below are a few to look into;

- NVIDIA NeMo 2[5](https://github.com/NVIDIA/NeMo):
This is open-sourced software that is used to develop conversational AI models. It performs ASR, Text to Speech (TTS) and NAtural Language Processing (NLP) . According to the documentation it "requires a minimum of 1 NVIDIA GPU with at least 48GB memory for fine tuning, and 2 NVIDIA GPUs with at least 48GB memory each for pretraining", so it probably not suitable for this project.
- PyAnnote[[6]](https://docs.pyannote.ai/introduction):
This is also an open-sourced Spearker Diarisation model. It offers pre trained models. It should perform well for standard tasks but can struggle with larger datasets.
- Deepgram[[7]](https://developers.deepgram.com/docs/introduction):
Deepgram is a commercial product and as such has high accuracy. It's trained on massive datasets ensuring it works across many scenarios and languages. The subscription fees in this case mean it is not suitable for our purposes here.  
- WhisperX[[8]](https://github.com/m-bain/whisperX/blob/main/README.md):
WhisperX is an open source, speech recognition model created by OpenAI.


For this project we will initially look at WhisperX as its free of charge and can handle overlapping speakers/backround noise well. 



Whisper requires the command line tool ffmpeg to be installed on the system. This needs to be done via a package manager like chocolatey [[9]](https://chocolatey.org/) for windows. 

```
choco install ffmpeg
```

This installs software packages at a system level. Options for other OS are listed in the README file. 




In [None]:
# loading audio file to Whisper

file_path = "data/raw/TrumpHarrisDebate.mp3"

#select whisper model

model = whisper.load_model("base")
result = model.transcribe(file_path, verbose = False)

#show outputted text
print(result["text"])

Detected language: English


100%|██████████| 22381/22381 [00:22<00:00, 1009.71frames/s]

 Thank you. Welcome to you both. It's wonderful to have you. It's an honor to have you both here tonight. We have inflation like very few people have ever seen before. Probably the worst in our nation's history. This has been a disaster for people, for the middle class, but for every class. Donald Trump left us the worst unemployment since the Great Depression. And what we have done is clean up Donald Trump's mess. She's a Marxist. Everybody knows she's a Marxist. Her father is a Marxist professor in economics. And he taught her well. But her vice presidential pick says abortion in the ninth month is absolutely fine. He also says execution after birth. It's execution no longer abortion because the baby is born is okay. And that's not okay with me. One does not have to abandon their face or deeply held back. Or deeply held beliefs to agree. The government and Donald Trump certainly should not be telling a woman what to do with her body. Pregnant women who want to carry a pregnancy to te




In [4]:
# show full contents of result
result

{'text': " Thank you. Welcome to you both. It's wonderful to have you. It's an honor to have you both here tonight. We have inflation like very few people have ever seen before. Probably the worst in our nation's history. This has been a disaster for people, for the middle class, but for every class. Donald Trump left us the worst unemployment since the Great Depression. And what we have done is clean up Donald Trump's mess. She's a Marxist. Everybody knows she's a Marxist. Her father is a Marxist professor in economics. And he taught her well. But her vice presidential pick says abortion in the ninth month is absolutely fine. He also says execution after birth. It's execution no longer abortion because the baby is born is okay. And that's not okay with me. One does not have to abandon their face or deeply held back. Or deeply held beliefs to agree. The government and Donald Trump certainly should not be telling a woman what to do with her body. Pregnant women who want to carry a pregn

Whisper has broken down the audiofile into segments, that begin and end as each person speaks. It gives an ID number for each segment, start and end times and also the text from that segment. It also identifies what language is being used. It does not however differentiate between who is speaking when.

In [6]:
#Show language
result['language']

'en'

In [7]:
# put the data for each segment into a dataframe for visualisation

#create dataframe
all_segments_df=[]

for seg in result['segments']:
  #not interested in tokens, temperature, seek, 
  for key in ['tokens', 'seek', 'temperature']:
        seg.pop(key, None)
  seg_df = pd.DataFrame.from_dict({0: seg}, orient='index')

  #append each segment to the dataframe
  all_segments_df.append(seg_df)

pd.concat(all_segments_df, axis=0)

Unnamed: 0,id,start,end,text,avg_logprob,compression_ratio,no_speech_prob
0,0,0.0,2.0,Thank you.,-0.198477,1.661355,0.277675
0,1,2.0,4.0,Welcome to you both.,-0.198477,1.661355,0.277675
0,2,4.0,5.0,It's wonderful to have you.,-0.198477,1.661355,0.277675
0,3,5.0,7.0,It's an honor to have you both here tonight.,-0.198477,1.661355,0.277675
0,4,7.0,11.0,We have inflation like very few people have e...,-0.198477,1.661355,0.277675
...,...,...,...,...,...,...,...
0,61,189.0,191.0,But we're not going back.,-0.071835,1.712500,0.055387
0,62,191.0,194.0,They've had three and a half years to create ...,-0.071835,1.712500,0.055387
0,63,194.0,196.0,And all the things we talked about.,-0.071835,1.712500,0.055387
0,64,196.0,198.0,Why hasn't she done it?,-0.071835,1.712500,0.055387


In the above output Whisper gives ``avg_logprob``,	``compression_ratio`` and	``no_speech_prob``.

- avg_logprob is the confidence level in the transcription. Higher values (closer to 0) mean more confident.
- compression_ratio is the ratio of lenght of the original audio to the transcribed text. High compression ratios can indicate, there are pauses, backround noice etc.
- no_speech_prob is the probability that the segment contains no speech.


In [8]:
# Save the transcribed text file

transcription = result["text"]

output_folder = "data/processed"
output_file = os.path.join(output_folder, "whisper_transcription.txt")
with open(output_file, "w") as file:
  file.write(transcription)

The above code has transcribed the full interview to a txt file. However it does not identify the different speakers or separate their words. To achieve this we need to use another model called pyannote.

In [9]:
from pyannote.audio import Pipeline
#https://github.com/pyannote/pyannote-audio/blob/develop/README.md

In [47]:
#Access token to download pyannote models
from Authtoken import token

The below pyannote models take some time to download (12+minutes on my machine). To avoid this happening every time, I have directed it to be stored in the cache. Next time it runs it should pull it from there. 

In [None]:
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization", use_auth_token=token, cache_dir="./model_cache")
diarisation = pipeline(file_path)

In [None]:
# show each time someone spoke and output it to a csv

output_file = os.path.join(output_folder, "speaking_times.csv")
with open(output_file, "w", newline="") as csvfile: 
  writer = csv.writer(csvfile)
  #headings
  writer.writerow(['Speaker', 'Start Time (s)', 'End Time (s)'])
  for turn, _, speaker in diarisation.itertracks(yield_label=True):
      writer.writerow([speaker, f"{turn.start:.1f}", f"{turn.end:.1f}"])
      print(f"{speaker}: {turn.start:.1f}s to {turn.end:.1f}s")

### Data presentation

In [None]:
#import the csv file as a Dataframe to visualise

speaker_csv = "data/processed/speaking_times.csv"

speaker_df = pd.read_csv(speaker_csv)
speaker_df

In [None]:
speaker_df["Duration"] = (speaker_df['End Time (s)']- speaker_df['Start Time (s)'])
speaker_df

In [None]:
#create separate datafrmae for each speaker
#speaker_00
speaker_0_df = speaker_df[speaker_df["Speaker"] == "SPEAKER_00"]

#speaker_01
speaker_1_df = speaker_df[speaker_df["Speaker"] == "SPEAKER_01"]

#speaker_02
speaker_2_df = speaker_df[speaker_df["Speaker"] == "SPEAKER_02"]

speaker_0_df

In [None]:
#Plot them on bar chart to visualise the difference.

total_durations = {
  "SPEAKER_00": speaker_0_df["Duration"].sum(),  
  "SPEAKER_01": speaker_1_df["Duration"].sum(),
  "SPEAKER_02": speaker_2_df["Duration"].sum(),
} 

labels = list(total_durations.keys())
values = list(total_durations.values())

#create plot
plt.bar(labels, values)

plt.xlabel("Speakers")
plt.ylabel("Total Duration (seconds)")
plt.title("Total Speaking Duration per Speaker")

#Show plot
plt.show()

### Transcription

As stated above Pyannote is useful for speaker diarisation but does not allow for transcribing the audio file, while whisper, transcribes but does not do speaker diarization. So we need to use a combination of the two. 

In order to transcribe the audio while retaining the speaker segmentation we need to split the audio file into individual segements. We can do this with the library pydub[[9]](https://audiosegment.readthedocs.io/en/latest/audiosegment.html). 

In [9]:
#load audio file
audio = AudioSegment.from_mp3(file_path)

#output folder
segment_folder = "data\processed\Segments"

#Loop through segments
speaker_segments = []
for turn, _, speaker in diarisation.itertracks(yield_label=True):
    #change start and end times to milliseconds
    segment = audio[turn.start *1000:turn.end *1000]

    #pass the output folder
    segment_file = os.path.join(segment_folder, f"{speaker}_{int(turn.start)}_{int(turn.end)}.mp3")    
    segment.export(segment_file, format="mp3")

    speaker_segments.append({"speaker": speaker, "start": turn.start, "end": turn.end, "file": segment_file})



NameError: name 'diarisation' is not defined

The above code has split the audio into segments based on the speaker changing. The output files are in order of speaker so when we transcribe it below, they will need to be resorted in order of start time.

In [15]:
# Transcribe Segments

segment_transcriptions = []

#for loop to transcribe each segment
for segment in os.listdir(segment_folder):
  if segment.endswith(".mp3"):
        # Full path to the audio file
        segment_path = os.path.join(segment_folder, segment)
        
        # Transcribe the audio segment
        print(f"Transcribing: {segment_path}")
        segment_result = model.transcribe(segment_path)

        # Extract speaker, start, and end info from the filename
        file_parts = os.path.splitext(segment)[0].split("_")
        speaker = f"SPEAKER{file_parts[1]}"
        start = float(file_parts[2])
        end = float(file_parts[3])

        # Append the transcription result
        segment_transcriptions.append({
            "speaker": speaker,
            "start": start,
            "end": end,
            "text": segment_result["text"]
        })

#sort by start time to retain order of conversation
segment_transcriptions = sorted(segment_transcriptions, key=lambda x: x['start'])

# Print the results
for transcription in segment_transcriptions:
  print(f"{transcription['speaker']} ({transcription['start']}-{transcription['end']}")



Transcribing: data\processed\Segments\SPEAKER_00_106_111.mp3




Transcribing: data\processed\Segments\SPEAKER_00_115_128.mp3
Transcribing: data\processed\Segments\SPEAKER_00_1_2.mp3
Transcribing: data\processed\Segments\SPEAKER_00_6_10.mp3
Transcribing: data\processed\Segments\SPEAKER_00_97_102.mp3
Transcribing: data\processed\Segments\SPEAKER_01_100_100.mp3
Transcribing: data\processed\Segments\SPEAKER_01_108_115.mp3
Transcribing: data\processed\Segments\SPEAKER_01_10_20.mp3
Transcribing: data\processed\Segments\SPEAKER_01_159_174.mp3
Transcribing: data\processed\Segments\SPEAKER_01_190_203.mp3
Transcribing: data\processed\Segments\SPEAKER_01_30_52.mp3
Transcribing: data\processed\Segments\SPEAKER_01_82_97.mp3
Transcribing: data\processed\Segments\SPEAKER_02_100_101.mp3
Transcribing: data\processed\Segments\SPEAKER_02_103_105.mp3
Transcribing: data\processed\Segments\SPEAKER_02_128_159.mp3
Transcribing: data\processed\Segments\SPEAKER_02_174_187.mp3
Transcribing: data\processed\Segments\SPEAKER_02_189_190.mp3
Transcribing: data\processed\Segments\

In [13]:
#Create transcribed file
full_transcription_file = "data/processed/full_transcription.txt"

In [16]:
#Combine transcribed files
full_transcription_file = "data/processed/full_transcription.txt"
with open(full_transcription_file, "w") as file:
  for t in segment_transcriptions:
    file.write(f"Speaker {t['speaker']} ({t['start']:.1f}s - {t['end']:.1f}s): {t['text']}\n")

print("Created full transcription text file!")

Created full transcription text file!


The above cells have taken a 3 mins 43 second long audio file and broken it down to 20 individual audio segments retaining the time they start and finish speaking. They have then been transcribed individually and pasted back together to from one text file.

### Large Language Models

There are a number of LLMs out there (the best known being ChatGPT),that are trained on vast amounts of data to understand and generate human-like language. It can generate new text based on context allowing it to answer questions and create its own content. 

If we pass the transcription that we have been able to produce to an LLM we can ask it questions about it. 

Hugging Face have a transformer libary of pre-trained sentinment analysis models. Some are designed for social media, other for multilingual. In our case the default model of ``distilbert-base-uncased-finetuned-sst-2-english`` is sufficient.

In [17]:
#hugging face have a transformer with a sentiment analysis

from transformers import pipeline
sentiment_analyzer = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [18]:
#open transcript as read only form
with open(full_transcription_file, "r") as file:
  transcript= file.read()

The max tokens to be uploaded to teh sentiment analyzer is 512 so we need to split the transcript into chunks.

In [19]:
# Split the transcript by lines (each line corresponds to a speaker's contribution)
lines = transcript.split("\n")

# Analyze each speaker's sentiment
for line in lines:
    # Skip empty lines
    if line.strip():
        print(f"Line: {line}")
        result = sentiment_analyzer(line)
        print(f"Sentiment: {result}")

Line: Speaker SPEAKER00 (1.0s - 2.0s):  Kamala Harris.
Sentiment: [{'label': 'NEGATIVE', 'score': 0.8671300411224365}]
Line: Speaker SPEAKER02 (1.0s - 5.0s):  Kamala Harris. It's up good to be here. She's a fun. Thank you.
Sentiment: [{'label': 'POSITIVE', 'score': 0.9998607635498047}]
Line: Speaker SPEAKER00 (6.0s - 10.0s):  Welcome to you both. It's wonderful to have you. It's an honor to have you both here tonight.
Sentiment: [{'label': 'POSITIVE', 'score': 0.9998480081558228}]
Line: Speaker SPEAKER01 (10.0s - 20.0s):  We have inflation like very few people have ever seen before, probably the worst in our nation's history. This has been a disaster for people, for the middle class, but for every class.
Sentiment: [{'label': 'NEGATIVE', 'score': 0.9996551275253296}]
Line: Speaker SPEAKER02 (20.0s - 29.0s):  Donald Trump left us the worst unemployment since the Great Depression. And what we have done is clean up Donald Trump's mess.
Sentiment: [{'label': 'NEGATIVE', 'score': 0.99888879

The Hugging Face transformer library[[10]](https://huggingface.co/blog/sentiment-analysis-python)uses a pre-trained model to tag data according to its sentiment, which can be either Positive, Negative or Neutral. 

It splits into smaller components called tokens. The tokens are then processed through a stack of transformer layers, which capture relationships between words and their contextual meaning. The output from the transformer then is fed into a classification head that maps it to a number of output classes (Positive, Negative or Neutral). The model also provides a confidence score for the predicted class. 

However based on quick eye ball test, we might be able to find a better model. For example it classifies the following line as Positive;

"In Springfield, they're eating the dogs, the people that came in, they're eating the cats, they're eating the pets of the people that live there. And this is what's happening in our country."

In [23]:
# Load a QA pipeline with a Hugging Face model
qa_pipeline = pipeline("question-answering", model="deepset/roberta-base-squad2")


Downloading config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


Downloading model.safetensors:   0%|          | 0.00/496M [00:00<?, ?B/s]

Downloading tokenizer_config.json:   0%|          | 0.00/79.0 [00:00<?, ?B/s]

Downloading vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/772 [00:00<?, ?B/s]

In [25]:
question = "How many people are speaking in this transcript?"

response = qa_pipeline(context=transcript, question=question)
print(response['answer'])

Kamala Harris


We can see from the above answer that while it is taking some information from the transcript it is not answering our question correctly. Perhaps it requires further training on our dataset before being implemented. 

There are models such as ChatGPT and Assembly AI which do all of this. They have automated speech recognition, speaker diarisation and LLM's that can analyse the text. They are both

In [26]:
import openai
from Authtoken import chatgpt_token

# Set your API key
openai.api_key = chatgpt_token
url = "https://api.openai.com/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {chatgpt_token}",
    "Content-Type": "application/json"
}

data = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant analyzing transcripts."},
        {"role": "user", "content": "Here is the transcription:\n\nSpeaker 1: Hello. Speaker 2: Hi there."},
        {"role": "user", "content": "Can you summarize this transcript?"}
    ]
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    print("Response:", response.json()['choices'][0]['message']['content'])
else:
    print("Error:", response.status_code, response.text)

Error: 429 {
    "error": {
        "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
        "type": "insufficient_quota",
        "param": null,
        "code": "insufficient_quota"
    }
}



In [27]:
def query_chatgpt(context, query):
  response = openai.ChatCompletion.create(
    model="gpt-4",
    messages= [
      {"role": "system", "content": "You will analyze transcripts of audiofiles and answer questions about them"},
      {"role": "user", "content": f"Here is the transcription:\n\n{context}"},
      {"role": "user", "content": query},
      ]
  )
  return responses['choices'][0]['message']['content']

user_query = "can you read this document?"




In [28]:
query_chatgpt(transcript, user_query)

APIRemovedInV1: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742


In [None]:
import importlib
import Authtoken

importlib.reload(Authtoken)

from Authtoken import assembly_ai_token


667ca01903b0410f8f5feaf9b36ec5bf
123456


In [49]:
# https://www.assemblyai.com/docs/getting-started

from Authtoken import assembly_ai_token

# # AI assembly transcription

aai.settings.api_key = assembly_ai_token

transcriber = aai.Transcriber()
aai_transcript = transcriber.transcribe(file_path)

if aai_transcript.status == aai.TranscriptStatus.error:
    print(aai_transcript.error)
else:
    print(aai_transcript.text)


Kamala Harris. Let's have a good debate. Have fun. Thank you. Welcome to you both. It's wonderful to have you. It's an honor to have you both here. Tonight we have inflation like very few people have ever seen before. Probably the worst in our nation's history. This has been a disaster for people, for the middle class, but for every class. Donald Trump left us the worst unemployment since the Great Depression. And what we have done is clean up Donald Trump's mess. She's a Marxist. Everybody knows she's a Marxist. Her father is a Marxist professor in economics and he taught her well. But her vice presidential pick says abortion in the ninth month is absolutely fine. He also says execution after birth, it's execution. No longer abortion because the baby is born is okay. And that's not okay with me. One does not have to abandon their faith or deeply held beliefs to agree. The government and Donald Trump certainly should not be telling a woman what to do with her body. Pregnant women who w

In [51]:
# Now Transcribe with different speakers identified

config = aai.TranscriptionConfig(speaker_labels=True)

transcriber = aai.Transcriber()
speaker_transcript = transcriber.transcribe(
  file_path,
  config=config
)

for utterance in speaker_transcript.utterances:
  print(f"Speaker {utterance.speaker}: {utterance.text}")

    # Save speaker labels and transcriptions to a text file
  with open("data/processed/aai_transcription.txt", "w", encoding="utf-8") as file:
      for utterance in speaker_transcript.utterances:
          start_time = utterance.start/1000 # convert from milliseconds to seconds
          end_time = utterance.end/1000
          line = (
             f"[{start_time:.2f} - {end_time:.2f}]"
             f"Speaker {utterance.speaker}: {utterance.text}\n"
          )
          print(line)  # Print to console
          file.write(line)  # Write to file

Speaker A: Kamala Harris. Let's have a good debate.
[1.84 - 4.02]Speaker A: Kamala Harris. Let's have a good debate.

[4.15 - 4.70]Speaker B: Have fun.

[4.77 - 5.54]Speaker A: Thank you.

[6.56 - 9.67]Speaker C: Welcome to you both. It's wonderful to have you. It's an honor to have you both here.

[9.70 - 20.66]Speaker B: Tonight we have inflation like very few people have ever seen before. Probably the worst in our nation's history. This has been a disaster for people, for the middle class, but for every class.

[20.83 - 29.75]Speaker A: Donald Trump left us the worst unemployment since the Great Depression. And what we have done is clean up Donald Trump's mess.

[29.94 - 52.72]Speaker B: She's a Marxist. Everybody knows she's a Marxist. Her father is a Marxist professor in economics and he taught her well. But her vice presidential pick says abortion in the ninth month is absolutely fine. He also says execution after birth, it's execution. No longer abortion because the baby is born

In [52]:
#Interrogate using AI ASsmebly

prompt1 = "Can you tell me how many speakers there are and who they are?"

result = speaker_transcript.lemur.task(
  prompt1, final_model=aai.LemurModel.claude3_5_sonnet
)

print(result.response)

Based on the transcript, there appear to be at least 3 main speakers:

1. Kamala Harris (referred to as "Vice President Harris")
2. Donald Trump (referred to as "Donald Trump" and "Mr. President")
3. An unnamed moderator/interviewer (asking questions and directing the conversation)

The transcript seems to be from a debate or interview setting where Harris and Trump are the primary participants, with an moderator facilitating the discussion. However, the text is somewhat disjointed and jumps between topics, so it's possible there are other speakers not clearly identified. The overall structure makes it difficult to definitively state if there are additional speakers beyond these three.


In [53]:
#Interrogate using AI ASsmebly

prompt2 = "Only based on the text and no pre-existing knowledge, can you tell if any speaker appears either conservative or liberal?"

result = speaker_transcript.lemur.task(
  prompt2, final_model=aai.LemurModel.claude3_5_sonnet
)

print(result.response)

Based solely on the text provided, without any pre-existing knowledge, there appear to be at least two speakers with contrasting views that could be characterized as conservative and liberal:

1. One speaker seems more conservative, making statements like:
- Calling the other speaker a "Marxist"
- Criticizing abortion policies
- Mentioning concerns about crime and immigration (the Springfield reference)
- Defending Donald Trump and criticizing the current administration

2. Another speaker seems more liberal, making statements like:
- Defending abortion rights and women's bodily autonomy
- Criticizing Donald Trump's presidency
- Emphasizing their election victory with "81 million people"
- Focusing on moving forward rather than looking to the past

The text presents these contrasting viewpoints in a debate-like format, suggesting opposing political ideologies are being represented, though the specific terms "conservative" and "liberal" are not used explicitly.
