# Scrape Commercial Appeal

The purpose of this notebook was to identify Commercial Appeal articles that quote candidates
in the 2023 mayoral race, pull out those statements, and categorize them.

It uses Selenium to scrape the site and the ChatGPT API to parse the articles and identify statements.

In [1]:
import os
from dataclasses import dataclass
from time import sleep

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import (StaleElementReferenceException,
                                        TimeoutException)
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from tqdm import tqdm

In [2]:
# My chromium driver isn't in PATH, for some reason

os.environ['PATH'] += ':/opt/homebrew/bin'

In [18]:
# Start a Selenium driver and navigate to the Commercial Appeal homepage
options = Options()
options.headless = False
options.add_argument("--window-size=1200,800")

driver = webdriver.Chrome(options=options)
driver.get('https://www.commercialappeal.com/')
wait = WebDriverWait(driver, 4)


Here I logged into a free account to get around the soft paywall, then navigated back to the main page.

In [19]:
articles = []


@dataclass
class Article:
    title: str
    link: str
    description: str

In [4]:
candidates = [
    'Floyd Bonner', 'Joe Brown', 'Karen Camper', 'Frank Colvett',
    'J.W. Gibson', 'Willie Herenton', 'Michelle McKissack', 'Van Turner',
    'Paul Young', 'James Harvey', 'Reggie Hall'
]

In [21]:
for candidate in tqdm(candidates):
    # Open up the search box
    driver.find_element(By.XPATH,
                        '/html/body/header/nav/div[2]/div[1]/a').click()
    # Search for articles containing the candidate's name and "statement"
    webdriver.ActionChains(driver).send_keys(f'"{candidate}" statement',
                                             Keys.ENTER).perform()
    # Go through up to 10 pages
    for i in range(10):

        # Get all the articles
        for article in driver.find_elements(By.XPATH,
                                            '/html/body/main/div[1]/a'):
            # skip subscriber-only articles
            if 'gnt_lbl_pm' in article.get_attribute('class'):
                continue
            articles.append(
                Article(title=article.text,
                        link=article.get_attribute('href'),
                        description=article.get_attribute('data-c-desc')))

        # Go to the next page
        for next_button in driver.find_elements(
                By.XPATH, '/html/body/main/div[1]/div[4]/a[3]'):
            next_button.click()
            break
        else:
            # If there are no more pages, exit the loop
            break

100%|██████████| 11/11 [00:31<00:00,  2.88s/it]
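
The inner `for ... else` in the cell above is what ends paging: `find_elements` returns an empty list when there is no "next" link, so the loop body never runs and the `else` branch breaks out. A minimal sketch of that control flow on a toy data structure (the `pages` mapping and `collect_results` are hypothetical stand-ins for the browser, not part of the notebook):

```python
def collect_results(pages, max_pages=10):
    """Walk linked result pages until there is no next page or a cap is hit.

    `pages` maps a page id to (items, next_ids): a toy model of one results
    page and of the list that find_elements returns for its "next" link.
    """
    collected = []
    current = 'p1'
    for _ in range(max_pages):
        items, next_ids = pages[current]
        collected.extend(items)
        for next_id in next_ids:  # the list holds zero or one element
            current = next_id
            break
        else:
            break  # empty list: no next link, so stop paging
    return collected


toy_pages = {'p1': (['a', 'b'], ['p2']), 'p2': (['c'], [])}
print(collect_results(toy_pages))
```

The same function with `max_pages=1` stops after the first page, which mirrors the 10-page cap in the scraping loop.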


In [22]:
len(articles)

109

In [23]:
df = pd.DataFrame(articles).drop_duplicates()

df[df.link.str.contains('opinion')]

Unnamed: 0,title,link,description
90,"The brutal death of Tyre Nichols: horror, grie...",https://www.commercialappeal.com/story/opinion...,"It seems that after each horrific episode, the..."
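
The same article can come back from more than one candidate's search, which is why the list is de-duplicated before use. `drop_duplicates` compares every column; the sketch below is an alternative that keys on the link alone and runs while collecting. `ArticleRef`, `dedupe_by_link`, and the example URLs are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ArticleRef:  # hypothetical stand-in for the Article class above
    title: str
    link: str
    description: str


def dedupe_by_link(items):
    """Keep the first article seen for each link, preserving order."""
    seen = set()
    unique = []
    for item in items:
        if item.link in seen:
            continue
        seen.add(item.link)
        unique.append(item)
    return unique


# Made-up rows; the URLs are placeholders, not real articles
sample = [
    ArticleRef('Story A', 'https://example.com/a', 'first hit'),
    ArticleRef('Story B', 'https://example.com/b', 'second hit'),
    ArticleRef('Story A', 'https://example.com/a', 'first hit'),
]
unique_sample = dedupe_by_link(sample)
print(len(sample), '->', len(unique_sample))
```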


In [169]:
# Write the links out to disk
links_df = pd.DataFrame(articles).drop_duplicates()
links_df['date'] = pd.to_datetime(
    links_df.link.str.extract(r'(\d{4}/\d{2}/\d{2})', expand=False))
links_df.to_csv('links.csv', index=False)
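
The publication date is taken from the `YYYY/MM/DD` segment of each story URL. A standard-library sketch of the same extraction on a single link; `date_from_link` is a hypothetical helper, and the example URL is a placeholder shaped like the story links, not a real article.

```python
import re
from datetime import date

DATE_IN_URL = re.compile(r'(\d{4})/(\d{2})/(\d{2})')


def date_from_link(link):
    """Return the first YYYY/MM/DD found in a URL as a date, or None."""
    match = DATE_IN_URL.search(link)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


# Placeholder URL; the slug and trailing id are invented
example_link = ('https://www.commercialappeal.com/story/news/'
                '2023/03/03/example-slug/0000000000/')
print(date_from_link(example_link))
print(date_from_link('https://www.commercialappeal.com/'))
```

Links without a date segment yield `None` here, just as `str.extract` leaves `NaN` for rows that do not match.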

In [130]:
# Restart point
links_df = pd.read_csv('links.csv')
print(len(links_df))
links_df.head()

96


Unnamed: 0,title,link,description
0,City of Memphis made party to residency lawsui...,https://www.commercialappeal.com/story/news/lo...,An attorney for the city of Memphis said they ...
1,City of Memphis won't be made party to residen...,https://www.commercialappeal.com/story/news/lo...,Chancellor JoeDae Jenkins held a hearing on th...
2,Questions remain following release of footage ...,https://www.commercialappeal.com/story/news/20...,"Although footage of the incident was released,..."
3,Surveillance footage from jail shows officers ...,https://www.commercialappeal.com/story/news/20...,The video shows correctional officers punching...
4,Memphis police officer Geoffrey Redd dies two ...,https://www.commercialappeal.com/story/news/20...,"Memphis police officer Geoffrey Redd, who was ..."


## Load article bodies

In [27]:
article_text = []

In [28]:
articles_df = links_df.copy()
for idx, link in enumerate(tqdm(articles_df.link.tolist())):
    if len(article_text) > idx:
        continue
    # if '/opinion/' in link or '10214856002/' in link:
    #     article_text.append('Subscriber-only')
    #     continue
    driver.get(link)
    try:
        wait.until(
            EC.visibility_of_element_located(
                (By.XPATH, '/html/body/div[2]/main/article/div[5]')))
    except TimeoutException:  # page didn't load: record a placeholder and move on
        article_text.append('Failed to load')
        continue
    article_text.append(
        driver.find_element(By.XPATH,
                            '/html/body/div[2]/main/article/div[5]').text)
articles_df['text'] = article_text

100%|██████████| 96/96 [01:18<00:00,  1.23it/s]
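
The loop above relies on `article_text[i]` lining up with the i-th link, both for the final column assignment and for resuming after an interruption (the `len(article_text) > idx` check). A minimal sketch of that invariant, with a hypothetical `fetch` callable standing in for the Selenium calls:

```python
def load_texts(links, fetch, texts=None, placeholder='Failed to load'):
    """Fill `texts` so that texts[i] always corresponds to links[i].

    `fetch` is a hypothetical callable that returns the article body for a
    link, or raises an exception when the page does not load.
    """
    texts = [] if texts is None else texts
    for idx, link in enumerate(links):
        if len(texts) > idx:
            continue  # already loaded on an earlier run
        try:
            texts.append(fetch(link))
        except Exception:  # the notebook would catch TimeoutException here
            texts.append(placeholder)  # exactly one entry per link
    return texts


def fake_fetch(link):  # stand-in for driver.get + find_element
    if 'broken' in link:
        raise TimeoutError(link)
    return 'body of ' + link


partial = ['body of a']  # pretend the first link was loaded earlier
result = load_texts(['a', 'broken', 'c'], fake_fetch, partial)
print(result)
```

Appending exactly once per link, whether the load succeeds or fails, keeps the list the same length as the DataFrame.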


In [33]:
articles_df.to_csv('articles.csv', index=False)

## Use GPT to parse out statements

In [26]:
intro = f"""A table listing the statements in a newspaper article verbatim and in full by any of the following candidates: {', '.join(candidates)}. The table only has lines for candidates that made statements, but may have multiple lines for candidates that made multiple statements:

"""

ending = """

Candidate\tStatement"""

In [27]:
print(intro)

A table listing the statements in a newspaper article verbatim and in full by any of the following candidates: Floyd Bonner, Joe Brown, Karen Camper, Frank Colvett, J.W. Gibson, Willie Herenton, Michelle McKissack, Van Turner, Paul Young, James Harvey, Reggie Hall. The table only has lines for candidates that made statements, but may have multiple lines for candidates that made multiple statements:




In [53]:
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")
# openai.Model.list()

In [49]:
article_input = '\n'.join(article_text[0].split('\n')[:-1])
response = openai.Completion.create(model="gpt-3.5-turbo",
                                    prompt=intro + article_input + ending,
                                    max_tokens=400,
                                    temperature=0)

In [53]:
from io import StringIO


Unnamed: 0,candidate,statement
0,Floyd Bonner,"Robert Spence, an attorney for Bonner, said th..."
1,Van Turner,The city of Memphis has been made a party to a...
2,J.W. Gibson,"Another twist in the lawsuit, originally two s..."
3,Michelle McKissack,"Another mayoral candidate, Michelle McKissack,..."
4,JoeDae Jenkins,"Chancellor JoeDae Jenkins on Monday, in additi..."


In [88]:
system_msg = (
    f"""You will be provided with a newspaper article. Your task is to
read the article and make a table of the full statements made by any of the following
candidates: {', '.join(candidates)}.
Do not include lines for candidates whose statements are not included in the article.
If the document does not contain any statements, then simply respond with "No Statements".
Otherwise, answer with one line per statement starting
with the candidate's name, a pipe character, and the statement. For example:
""".replace('\n', ' ') + '\nMichelle McKissack | You should all vote for me')
print(system_msg)

You will be provided with a newspaper article. Your task is to read the article and make a table of the full statements made by any of the following candidates: Floyd Bonner, Joe Brown, Karen Camper, Frank Colvett, J.W. Gibson, Willie Herenton, Michelle McKissack, Van Turner, Paul Young, James Harvey, Reggie Hall. Do not include lines for candidates whose statements are not included in the article. If the document does not contain any statements, then simply respond with "No Statements". Otherwise, answer with one line per statement starting with the candidate's name, a pipe character, and the statement. For example: 
Michelle McKissack | You should all vote for me


In [57]:
from tenacity import (retry, stop_after_attempt, wait_fixed)


@retry(wait=wait_fixed(10), stop=stop_after_attempt(6), reraise=True)
def chat_completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)
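
Despite the function's name, `wait_fixed(10)` waits the same 10 seconds between attempts rather than backing off. A standard-library sketch of roughly what that decorator configuration does (up to six attempts, fixed wait, last exception re-raised); `call_with_fixed_retry` and `flaky` are hypothetical names, and the `sleep` parameter exists only so the example runs without real delays.

```python
import time


def call_with_fixed_retry(func, attempts=6, wait_seconds=10, sleep=time.sleep):
    """Call func(); on failure wait a fixed interval and try again.

    Re-raises the last exception once `attempts` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            sleep(wait_seconds)


calls = []
waits = []


def flaky():
    """Fail twice, then succeed, to exercise the retry path."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError('temporary failure')
    return 'ok'


# Record the requested waits instead of actually sleeping
outcome = call_with_fixed_retry(flaky, sleep=waits.append)
print(outcome, len(calls), waits)
```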

In [None]:
statements = []
processed = set()

In [126]:
for _, row in tqdm(list(articles_df.iterrows())):
    if row.title in processed:
        continue

    article_input = '\n'.join(
        line for line in row.text.splitlines()
        if not line.startswith('Related') and 'commercialappeal' not in line)
    if len(article_input
           ) > 20000:  # seem to be getting an error on the longest ones
        continue
    messages = [
        dict(role="system", content=system_msg),
        dict(role="user", content=article_input),
    ]
    completion = chat_completion_with_backoff(model="gpt-3.5-turbo",
                                              messages=messages)

    for line in completion.choices[0].message.content.splitlines():
        if 'No Statements' in line:
            continue
        try:
            candidate, statement = line.split(' | ')
            statements.append((row.title, candidate, statement.strip('"')))
        except ValueError:  # line is not a single "name | statement" pair
            continue
    processed.add(row.title)


100%|██████████| 96/96 [19:52<00:00, 12.42s/it]
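
The parsing step above drops any reply line that does not split into exactly two parts on `' | '`, including statements that themselves contain a pipe. A slightly more tolerant sketch splits on the first pipe only; `parse_statement_lines` is a hypothetical helper and the reply text is invented, using cartoon names rather than real candidates.

```python
def parse_statement_lines(reply):
    """Turn 'Name | statement' lines into (candidate, statement) pairs."""
    pairs = []
    for line in reply.splitlines():
        if 'No Statements' in line:
            continue
        name, sep, statement = line.partition('|')  # split on the first pipe
        if not sep:
            continue  # no pipe: not a table row
        name = name.strip()
        statement = statement.strip().strip('"')
        if name and statement:
            pairs.append((name, statement))
    return pairs


# Invented reply text for illustration
reply = ('Elmer Fudd | "Be very quiet."\n'
         'Here is the table you asked for\n'
         'Bugs Bunny|Carrots | lettuce are both fine')
print(parse_statement_lines(reply))
```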


In [163]:
statements_df = pd.DataFrame(statements,
                             columns=['title', 'candidate', 'statement'])
statements_df = statements_df[statements_df.statement != 'No statements']
statements_df = statements_df[statements_df.statement != 'N/A']
statements_df = statements_df[statements_df.statement != '(No statement)']
statements_df.to_csv('statements.csv', index=False)


### Combine table

In [5]:
statements_df = pd.read_csv('statements.csv')
links_df = pd.read_csv('links.csv')
combined_df = statements_df.join(links_df.set_index('title'), on='title')
combined_df = combined_df[combined_df.statement.notna()]

# Filter for actual candidates
keep = []
last_names = [candidate.split()[-1].casefold() for candidate in candidates]
for _, row in combined_df.iterrows():
    keep.append(row.candidate.split()[-1].casefold() in last_names)
combined_df = combined_df[keep]

print(len(combined_df))
combined_df.head()

68


Unnamed: 0,title,candidate,statement,link,description,date
0,City of Memphis made party to residency lawsui...,Floyd Bonner,The five-year residency requirement for mayora...,https://www.commercialappeal.com/story/news/lo...,An attorney for the city of Memphis said they ...,2023-05-01
1,City of Memphis made party to residency lawsui...,Van Turner,The five-year residency requirement for mayora...,https://www.commercialappeal.com/story/news/lo...,An attorney for the city of Memphis said they ...,2023-05-01
2,City of Memphis made party to residency lawsui...,J.W. Gibson,The Meyers opinion should be enforced;,https://www.commercialappeal.com/story/news/lo...,An attorney for the city of Memphis said they ...,2023-05-01
3,City of Memphis made party to residency lawsui...,Michelle McKissack,Campaign is considering whether to file a moti...,https://www.commercialappeal.com/story/news/lo...,An attorney for the city of Memphis said they ...,2023-05-01
6,City of Memphis won't be made party to residen...,Van Turner,"Not to be flippant, who cares?"" said attorney ...",https://www.commercialappeal.com/story/news/lo...,Chancellor JoeDae Jenkins held a hearing on th...,2023-03-31
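
The candidate filter compares only the last word of the speaker's name, which tolerates titles such as "Shelby County Sheriff Floyd Bonner" but has two failure modes: unrelated people who share a surname pass, and names with a suffix such as "Jr." fail. A small sketch of that behavior; `matches_candidate` and its `exclude` parameter are hypothetical, and only a subset of the candidate list is used.

```python
candidate_names = ['Floyd Bonner', 'Van Turner', 'Reggie Hall']  # subset for illustration
last_names = {name.split()[-1].casefold() for name in candidate_names}


def matches_candidate(speaker, exclude=()):
    """True if the speaker's last word matches a candidate's last name."""
    parts = speaker.split()
    if not parts or speaker in exclude:
        return False
    return parts[-1].casefold() in last_names


print(matches_candidate('Shelby County Sheriff Floyd Bonner'))  # title is ignored
print(matches_candidate('Kiki Hall'))                           # shared surname passes
print(matches_candidate('Kiki Hall', exclude={'Kiki Hall'}))    # manual exclusion
print(matches_candidate('Sheriff Floyd Bonner Jr.'))            # suffix defeats the match
```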


In [175]:
with pd.ExcelWriter('comm_appeal_statements.xlsx') as writer:
    combined_df[['date', 'candidate', 'statement', 'title',
                 'link']].to_excel(writer,
                                   sheet_name='statements',
                                   index=False)
    links_df.to_excel(writer, sheet_name='links', index=False)


In [67]:
preface = """ Between triple quotes is an article from newspaper Commercial Appeal
'''
%s
'''
Please write out the speakers and statements in this article in an array, along with a category. 
The available categories are "crime and safety", "public education", "housing", and "none of the above". 
Only include quotes from the following speakers: Floyd Bonner, Joe Brown, Karen Camper, Frank Colvett, J.W. Gibson, Willie Herenton, Michelle McKissack, Van Turner, Paul Young, James Harvey, Reggie Hall.
Here is example output:
[("Elmer Fudd", "I'm going to get that rabbit a home.", "housing"),
("Bugs Bunny", "What's up, doc?", "none of the above")]
"""

In [None]:
import time

# Another attempt with a different prompt

completions = {}
articles_df = pd.read_csv('articles.csv')

In [111]:
prev_start = time.time() - 30

for _, row in tqdm(list(articles_df.iterrows())):
    # Make sure we stay under 3 requests a minute
    time.sleep(max(21 - (time.time() - prev_start), 0))
    if row.title in completions:
        continue
    elif 'Geoffrey Redd' in row.title:
        continue  # something weird with this article?
    if not isinstance(row.text, str):
        continue  # text is missing (NaN after the CSV round trip)
    # Exclude paragraphs about author or other articles
    article_input = '\n'.join(
        line for line in row.text.splitlines()
        if not line.startswith('Related') and 'commercialappeal' not in line)
    wordcount = article_input.count(
        ' ')  # use spaces as a rough estimate of words
    # Use a model with a larger limit for longer articles, otherwise the most recent model
    model = 'gpt-3.5-turbo-16k' if wordcount > 3000 else 'gpt-3.5-turbo'
    prev_start = time.time()
    messages = [
        # dict(role="system", content=system_msg),
        dict(role="user", content=preface % article_input),
    ]
    completions[row.title] = openai.ChatCompletion.create(model=model,
                                                          messages=messages)


100%|██████████| 96/96 [05:55<00:00,  3.70s/it]
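
The throttle in the loop above spaces request starts at least 21 seconds apart, which works out to just under three requests per minute. The arithmetic in isolation, as a sketch (`seconds_to_wait` is a hypothetical helper; the timestamps are arbitrary numbers in seconds):

```python
def seconds_to_wait(now, prev_start, min_interval=21):
    """Time left before the next request may start (never negative)."""
    return max(min_interval - (now - prev_start), 0)


# 5 seconds after the previous request started: wait the remaining 16
print(seconds_to_wait(now=100, prev_start=95))
# 30 seconds after: no wait needed
print(seconds_to_wait(now=100, prev_start=70))
```

Initializing `prev_start` to `time.time() - 30`, as the cell does, makes the first wait zero.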


In [118]:
import ast

# Parse the ones that work well enough (note: almost all issues caused by quotes - next time have it use single quotes or another format)
parsed = {}
for title, completion in completions.items():
    try:
        # literal_eval only accepts Python literals, so model output is never executed
        parsed[title] = ast.literal_eval(
            completion.choices[0].message.content)
    except (SyntaxError, ValueError):
        print(('"%s":' % title) + completion.choices[0].message.content + ',')

"Tear gas use during protests could help spread coronavirus, medical professional says
Desiree Stennett ":[("Dr. Steve Threlkeld", "When you're in the midst of a pandemic respiratory viral situation, it's not just the fact that it could be damaging to people with respiratory problems," Threlkeld said. "Tear gas will cause people to have increased secretions and more saliva and coughing and generally be a much more effective source of aerosolized droplets that could be infectious to other people.", "public education"),
("Capt. Anthony Buckner", "The concern was traffic stopping and traffic from the west not being able to access Memphis or our medical district should there be a medical emergency of some kind," Buckner said. "We understand that we are in the middle of a pandemic. It is reasonable, I believe, to assume that it's possible we could have healthcare workers that live in Arkansas that commute over to Memphis to service the hospitals, which ultimately services our community.", "
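
As the note in the cell above says, nearly all of the failures come from unescaped quotation marks inside the statements. One option for a future run, sketched here under the assumption that the prompt is changed to ask for a JSON array of `[speaker, statement, category]` rows, is to parse with `json.loads` and set aside anything that does not fit. `parse_quotes` is a hypothetical helper, and the model may still return invalid JSON, so the manual-repair path remains necessary.

```python
import json


def parse_quotes(reply):
    """Parse a JSON array of [speaker, statement, category] rows.

    Returns a list of 3-tuples, or None when the reply is not valid JSON
    in that shape, so the caller can set it aside for manual repair.
    """
    try:
        rows = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(rows, list):
        return None
    quotes = []
    for row in rows:
        if not (isinstance(row, list) and len(row) == 3
                and all(isinstance(cell, str) for cell in row)):
            return None
        quotes.append(tuple(row))
    return quotes


# Invented replies: one with properly escaped quotes, one in the old tuple format
good = '[["Elmer Fudd", "He said \\"wait\\", then left.", "none of the above"]]'
bad = '[("Elmer Fudd", "unescaped "quote" here", "housing")]'
print(parse_quotes(good))
print(parse_quotes(bad))
```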

In [143]:
# Add the ones I fixed manually
parsed.update({
    "Tear gas use during protests could help spread coronavirus, medical professional says\nDesiree Stennett ":
    [("Dr. Steve Threlkeld",
      "When you're in the midst of a pandemic respiratory viral situation, it's not just the fact that it could be damaging to people with respiratory problems,\" Threlkeld said. \"Tear gas will cause people to have increased secretions and more saliva and coughing and generally be a much more effective source of aerosolized droplets that could be infectious to other people.",
      "public education"),
     ("Capt. Anthony Buckner",
      "The concern was traffic stopping and traffic from the west not being able to access Memphis or our medical district should there be a medical emergency of some kind,\" Buckner said. \"We understand that we are in the middle of a pandemic. It is reasonable, I believe, to assume that it's possible we could have healthcare workers that live in Arkansas that commute over to Memphis to service the hospitals, which ultimately services our community.",
      "crime and safety"),
     ("Sheriff Floyd Bonner Jr.",
      'It\'s a difficult job," said Sheriff Floyd Bonner Jr., relaying again the events that happened before the tear gas was fired. "That\'s unfortunate what happened. But what I hope moving forward what we see is that when we have peaceful protests, we\'re not going to stop people from protesting. That\'s not what we\'re here for. We\'re only here to maintain the peace.',
      "crime and safety"),
     ("Joan Carr",
      "On average, people begin to develop symptoms 5-7 days after infection,\" Carr said in an email. \"But there is often a delay between the onset of symptoms and when people (seek) treatment or testing. For that reason, the Health Department expects to see the impact of an event in the COVID-19 testing results approximately 9 days after the event.",
      "public education")],
    "Here's what local officials have said about calls to 'defund police'\nMicaela A Watts ":
    [("Tami Sawyer",
      "My goal is for Memphis to disinvest from brutal policing tactics and the school to prison pipeline...",
      "none of the above"),
     ("Floyd Bonner",
      "To take an almost $18 million cut, there’s no way the Sheriff’s Office can continue to function the way we’re functioning...",
      "crime and safety"),
     ("Steve Cohen",
      "I don't think most people think we should have a society without rules...",
      "crime and safety"),
     ("Jim Strickland", "I'm opposed defunding our police department...",
      "crime and safety")],
    "The 901: How last night's election upsets could change the Memphis City Council\nRyan Poe ":
    [("Sam Hardiman",
      "For the first time in the city's history, it has five black women on the council, breaking the previous record set earlier this year of four.",
      "none of the above"),
     ("Sam Hardiman",
      "With the runoffs settled, we now know what the council's lineup will be at the start of next year:",
      "none of the above"),
     ("Judge Jon McCalla",
      "“The Decree was meant only to prohibit the City’s surveillance, capture, cataloging, maintenance and dissemination of political intelligence unrelated to any legitimate law enforcement activities ...”",
      "crime and safety"),
     ("Judge Jon McCalla",
      "“However, modification of (the Decree) would erode the barrier put in place by the Decree; it acts as a bulwark, ensuring that the City’s surveillance practices do not cross the line from being a powerful weapon in the fight against crime to becoming an intrusive tool that improperly interferes with its residents’ First Amendment-protected activities.",
      "crime and safety"),
     ("Michelle McKissack",
      "“Stuntarious Vol. IV,”, which pools the label's many talents and styles in what is probably the most \"Memphis\" album you've heard.",
      "none of the above")],
    "Memphis attorney, political adviser James S. Gilliland dies at 86\nDesiree Stennett ":
    [("Jim Strickland",
      "Jim Gilliland was first my employer, but quickly became a mentor and an inspiration for 30 years,\" the mayor said. \"He was a great example of a lawyer who took leave from his private practice to serve this city and our country. During my first term as mayor, he provided me with valuable advice numerous times when we were faced with tough situations. I will miss his friendship and his counsel.",
      "none of the above"),
     ("Willie Herenton",
      "\"Jim was a visionary,\" said Herenton, who said he spoke to Gilliland about one month ago and said the two had planned to meet for lunch but never got the chance. \"He saw the need for us to work together.",
      "none of the above"),
     ("Al Gore",
      "Jim Gilliland was an extremely close lifelong friend and I’m deeply saddened by his death,\" former Vice President Al Gore said in a statement to The Commercial Appeal. \"In addition to being a highly valued confidant, Jim had an astute legal mind and was an ideal fit as chief legal counsel in the Department of Agriculture. Along with his wife Lucia, Jim remained a close adviser to me in the White House and during both of my campaigns for President. I will never forget Jim’s kindness, thoughtfulness, and intelligence. Above all, he was a great man and a true friend. I will miss him greatly.”",
      "none of the above"),
     ("Gale Jones Carson",
      "He believed in equality,\" Carson said. \"He believed in Memphis moving forward. He believed in us being a melting pot. ... For Memphis, Shelby County and the state as well, he went beyond looking for people who looked like him.",
      "none of the above"),
     ("Lucia Gilliland",
      "He was devoted to his family, his friends, the outdoors and the well-being of Memphis,\" she said.",
      "none of the above")],
    "The 901: ESPN's 'College GameDay' gives Memphis a 'turn to really shine'\nRyan Poe ":
    [("Anwar Ghazali",
      "It was a hard impact on us and it is still hard on us. You know, we battling this everyday. But we thankful that the victory came out and the judge sought justice for Dorian,”",
      "crime and safety"),
     ("Effie Peete",
      "It was a hard impact on us and it is still hard on us. You know, we battling this everyday. But we thankful that the victory came out and the judge sought justice for Dorian,”",
      "crime and safety"),
     ("Sylvia Harris",
      "It hurts. I will never be able to see my grandson again. None of us will. This is a pill I just cannot swallow but I am glad justice was served and my grandson can rest in peace,”",
      "crime and safety")],
    "In final stretch of superintendent search, MSCS board says it will see the process through":
    [("Michelle McKissack",
      "Seeing the process through 'gives the public the opportunity to see that we are looking for the best candidate for the job,' MSCS board member Michelle McKissack told The CA. 'I haven't wavered from that since we began our search.'",
      "public education"),
     ("Michelle McKissack",
      "McKissack was among seven of the district's nine board members who responded to Commercial Appeal inquiries about commitments to completing the search. Each of the seven, including board chair Althea Greene, support finishing it.",
      "public education"),
     ("Max McGee",
      "We are going to devote full time to making sure you have...a slate of exceptional talent, and one who will be the perfect fit for the long haul. I hear your frustration about past issues and churn and leadership,' Max McGee, who is leading the Memphis search for HYA, recently told a group tasked with advising board members on the search.",
      "public education"),
     ("Max McGee",
      "HYA will whittle the applicants to a slate of four to eight 'semi-finalists' to present to the board, he said. The board will move from there to interview a selection of finalists, though the body has not yet said how many finalists there will be.",
      "public education"),
     ("Gisela Guerrero",
      "'We don't necessarily...do processes like this very often. Where that search firm can offer expertise, maybe even a sample of what they've done before that we could work off of,' Gisela Guerrero, an organizer with the Memphis Interfaith Coalition for Action and Hope (MICAH), said during the January meeting. 'It's hard to, at least for me, to grab out of the air what I want this...process to look like.'",
      "public education"),
     ("Sarah Carpenter",
      'Sarah Carpenter of Memphis Lift helmed another committee to inform her input as a committee member. Often over dinners at the Memphis Lift offices, the parent-advocacy committee convened to discuss what they wanted to see, delivering earlier in February a position statement that also included process requests: Public interviews with finalists and a commitment to understanding through interviews and even travel how finalists worked with the communities they are from.',
      "public education"),
     ("Venita Doggett",
      "Venita Doggett with the Memphis Education Fund has encouraged the board to hold finalist interviews publicly, an option the board 'may' allow, according to policy. Most board members who responded to The Commercial Appeal readily supported making the interviews with finalists public.",
      "public education"),
     ("Doggett",
      'I recognize that this search is one of the most important decisions that this board will undertake as it will set the trajectory for the district for the next several years. That is a weighty responsibility. It is imperative that we get this right and we ensure that the community, parents, students and teachers have access and voice in this selection," Doggett said during public comment at a recent board meeting, calling for adherence to the policy.\n\n\"...We can\'t afford to lose any more time, any more years — we\'re 10 years past the merger and demerger — to unfocused, chaotic and unsure leadership," Doggett continued.',
      "public education")],
    "Mulroy: Ex-MPD officer Preston Hemphill won't face charges in Tyre Nichols case":
    [("Steve Mulroy",
      "No charges will be filed against former Memphis police officer Preston Hemphill for his role in pulling Tyre Nichols over on the evening of Jan. 7, Shelby County District Attorney General Steve Mulroy said Tuesday.",
      "crime and safety"),
     ("Steve Mulroy",
      "It was after reviewing \"hours and hours of body-worn camera footage,\" along with a number of witness interviews, that the decision was made to not pursue prosecution against Hemphill, Mulroy said.",
      "crime and safety"),
     ("Steve Mulroy",
      "Mulroy further confirmed that Hemphill was not at that second scene, and said he did not believe criminal charges were appropriate for Hemphill's actions that night.",
      "crime and safety"),
     ("Ben Crump",
      "\"In light of this, we are supportive of no charges for this individual. It is our deepest hope and expectation that justice will be served fully, and that all who had a role to play in this senseless tragedy will be held accountable.\"",
      "none of the above"),
     ("Steve Mulroy",
      "Mulroy said prosecutors expect Hemphill to testify in the case, and emphasized that Hemphill's cooperation was not part of a plea deal.",
      "crime and safety"),
     ("Paul Hagerman",
      "\"He wasn't necessarily fleeing to the open car,\" Hagerman said.\"But, if you take the officers' perspective, if you watch the body cam very intensely, very slowly, you'll see Hemphill's perspective at the time the taser is actually deployed. We think Tyre was just trying to get away from the violence and threats you heard on the radio. But...we have to put ourselves in a position of what Hemphill would have known, and what Hemphill saw at the time the taser was deployed.\"",
      "crime and safety"),
     ("Van Turner",
      "\"Why Hemphill, the only white officer, was not charged was asked by everyone, including attorney Crump, because we just didn't know,\" Turner said.\"That's why you have to state the obvious and wait until the investigation plays out. We all have a duty to follow the evidence, and respond to the evidence, in a fair and neutral way. I think that's why...upon first appearance it did not look good. But, as we've seen things play out, I think we know that's the right call.\"",
      "crime and safety")],
    "With 116 youths locked up, Shelby County Commissioners address DOJ report, adult transfers\nKatherine Burgess and Sarah Macaraeg ":
    [("Sandra Simkins",
      "A proposal by Shelby County Mayor Lee Harris “could be a step in the right direction” towards solving one issue in Juvenile Court [...] And as for concerns that remain surrounding the transfer of youth to adult court, Simkins stressed that commissioners make a goal of bringing Shelby County “in line with other cities.”",
      "crime and safety"),
     ("Mark Billingsley",
      "“Right down the road they’re doing a better job than us”[...] “I don’t understand the disparity” [...] Commissioners [...] emphasized that the person would need to be independent from Juvenile Court and that checks and balances should be in place between the mayor's office and the commission.",
      "crime and safety"),
     ("Jessica Indingaro",
      "“I would have preferred clarity as opposed to a rap sheet\" [...] Currently, there are 93 youths detained at juvenile detention and 23 who have been transferred to adult court and are housed at Jail East. [...] but that they are “on the front end” of that conversation.",
      "crime and safety"),
     ("Tami Sawyer",
      "\"I would have preferred clarity as opposed to a rap sheet,\" [...] described by Simkins as an \"aberration\" nationally.",
      "crime and safety"),
     ("Reginald Milton",
      "\"We need to make sure we make solid actions that cannot be removed when we’re gone to ensure we never have to revisit these mistakes ever again,\" [...] stressed that the commission and mayor’s administration must put in place measures that will last after they leave office.",
      "none of the above"),
     ("Van Turner",
      "\"We want to make sure we’re not having these young detainees in a facility that’s crumbling,” [...] Commissioners also voiced support for Harris’ plan to begin work on a new $25 million juvenile detention and education facility.",
      "public education"),
     ("Lee Harris",
      "\"A proposal by Shelby County Mayor Lee Harris “could be a step in the right direction” towards solving one issue in Juvenile Court [...] Harris proposed that defense attorneys for delinquent youth whose cases pose a conflict of interest with the Public Defender's Office could be chosen by a person hired by the Shelby County mayor\".[...] Commissioners will have to vote to approve the change, which Harris said would cost approximately $50,000 to implement.",
      "crime and safety")],
    "The 901: The Kellogg's strike is over, Hy-Vee grocery chain is coming to Memphis\nDann Miller ":
    [('Anthony Shelton',
      'Our striking members at Kellogg’s ready-to-eat cereal production facilities courageously stood their ground and sacrificed so much in order to achieve a fair contract. This agreement makes gains and does not include any concessions.',
      'none of the above'),
     ('Fred Ashwill',
      'I know we need to find a solution to all of our issues, but this one needs to be at the front of the table. These are the people who built our roads, fought our wars… they deserve better than this.',
      'none of the above'),
     ('Frank Colvett',
      'He said ownership of the building would allow the Orpheum to secure financing for further renovations.',
      'housing'),
     ('Mark Giannotto',
      'The stages of Ja Morant fandom that accompanied his return to the court ran the gamut, with a range of emotions that, by the time Morant finished speaking to reporters for the first time since his knee injury, were as exhausting as a long holiday road trip.',
      'none of the above'),
     ('Damichael Cole',
      'In my new job, you can expect me to chronicle how the Grizzlies intersect with the culture of this city. I pride myself on being detailed and working with that grind synonymous with Memphis. Everything in a Grizzlies game will happen for a reason, and I want to dissect those small turning points with large consequences. But basketball is often bigger than 48 minutes. Sneakers, fashion, music, community work and food are all big parts of NBA culture. Each of those will be incorporated into our coverage.',
      'none of the above')]
})


In [148]:
parsed_df = pd.concat([
    pd.DataFrame(quotes, columns=['candidate', 'statement', 'category'
                                  ]).assign(title=title)
    for title, quotes in parsed.items()
],
                      ignore_index=True).join(links_df.set_index('title'),
                                              on='title')

# Filter for actual candidates
keep = []
last_names = [candidate.split()[-1].casefold() for candidate in candidates]
for _, row in parsed_df.iterrows():
    keep.append(row.candidate.split()[-1].casefold() in last_names)
parsed_df = parsed_df[keep]

parsed_df = parsed_df[parsed_df.statement.str.len() > 20]

parsed_df = parsed_df[
    parsed_df.candidate.str.split().str[-1].str.casefold().isin(last_names)
    & ~parsed_df.candidate.
    isin(  # exclude names that share partially with actual candidates
        ['Kiki Hall', 'Kenneth A. Turner', 'Zamyra Hall'])]

print(len(parsed_df))
parsed_df

138


Unnamed: 0,candidate,statement,category,title,link,description,date
2,JW Gibson,Friday filed a motion to intervene in the suit...,none of the above,City of Memphis made party to residency lawsui...,https://www.commercialappeal.com/story/news/lo...,An attorney for the city of Memphis said they ...,2023-05-01
3,Michelle McKissack,tweeted Monday that her campaign is considerin...,none of the above,City of Memphis made party to residency lawsui...,https://www.commercialappeal.com/story/news/lo...,An attorney for the city of Memphis said they ...,2023-05-01
11,Shelby County Sheriff Floyd Bonner,"According to the Medical Examiner, Mr. Gershun...",crime and safety,Questions remain following release of footage ...,https://www.commercialappeal.com/story/news/20...,"Although footage of the incident was released,...",2023-03-03
12,Shelby County Sheriff Floyd Bonner,"According to the Medical Examiner, Mr. Gershun...",crime and safety,Surveillance footage from jail shows officers ...,https://www.commercialappeal.com/story/news/20...,The video shows correctional officers punching...,2023-03-02
13,Shelby County Sheriff Floyd Bonner,It is unfortunate that parts of the video are ...,crime and safety,Surveillance footage from jail shows officers ...,https://www.commercialappeal.com/story/news/20...,The video shows correctional officers punching...,2023-03-02
...,...,...,...,...,...,...,...
480,Willie Herenton,"""Jim was a visionary,"" said Herenton, who said...",none of the above,"Memphis attorney, political adviser James S. G...",https://www.commercialappeal.com/story/money/b...,"Memphis attorney James ""Jim"" Gilliland died Mo...",2020-02-24
487,Michelle McKissack,Seeing the process through 'gives the public t...,public education,"In final stretch of superintendent search, MSC...",https://www.commercialappeal.com/story/news/ed...,"Input from surveys of students, staff, parents...",2023-03-05
488,Michelle McKissack,McKissack was among seven of the district's ni...,public education,"In final stretch of superintendent search, MSC...",https://www.commercialappeal.com/story/news/ed...,"Input from surveys of students, staff, parents...",2023-03-05
501,Van Turner,"""Why Hemphill, the only white officer, was not...",crime and safety,Mulroy: Ex-MPD officer Preston Hemphill won't ...,https://www.commercialappeal.com/story/news/lo...,No charges will be filed against former Memphi...,2023-05-02


In [149]:
with pd.ExcelWriter('comm_appeal_statements.xlsx') as writer:
    parsed_df.to_excel(writer, index=False, sheet_name='final')

After this point, the files were manually curated and delivered to the client.