# Scraping Complaints from the Better Business Bureau

The goal of this notebook is to collect complaints about apartment complexes, expecially the top evictors, 
in Memphis from the Better Business Bureau.

The BBB has an API but it's very limited:

1. You need to contact them via email to get a key
2. The API lets you search organizations, but does not let you pull complaints.

As the latter is most of the work here, I'll be skipping the API.

Scraping will be done with Selenium.

In [1]:
import os
from dataclasses import dataclass
from time import sleep

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import (StaleElementReferenceException,
                                        TimeoutException)
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from tqdm import tqdm, trange
import re

In [2]:
# My chromium driver isn't in PATH, for some reason

os.environ['PATH'] += ':/opt/homebrew/bin'

In [3]:
# Start a Selenium driver
options = Options()
options.headless = False
options.add_argument("--window-size=1200,800")

driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 4)

  options.headless = False


## Get a list of apartment complexes in Memphis

In [4]:
@dataclass
class Result:
    name: str
    categories: str
    address: str
    phone: str
    rating: str
    url: str

In [5]:
from uszipcode import SearchEngine

sr = SearchEngine()

In [6]:
results = []

for sz in sr.by_city_and_state('Memphis', 'TN'):
    print(sz.zipcode)
    driver.get(
        "https://www.bbb.org/search?filter_state=TN&find_country=USA"
        f"&find_entity=60042-000&find_id=60042-000&find_loc={sz.zipcode}"
        "&find_text=Apartments&find_type=Category&page=1")
    # BBB shows up to 15 pages of apartments
    for page in trange(15):
        # for some reason, it likes to throw 502 errors
        for backoff in (0.1, 0.5, 1, 2, 5, 10):  # so retry with backoff
            if driver.title == '502 Bad Gateway':
                sleep(2)
                driver.refresh()
            else:
                break
        else:
            print('failed')
            break

        wait.until(
            EC.visibility_of_element_located(
                (By.CLASS_NAME, 'result-item-ab')))

        # If there's a dialog that pops up, select "All businesses" and continue
        for button in driver.find_elements(
                By.XPATH, '/html/body/div[1]/dialog/form/div/button'):
            try:
                driver.find_element(
                    By.XPATH,
                    '/html/body/div[1]/dialog/form/fieldset/div[3]/div/input'
                ).click()
                button.click()
            except:
                continue

        # Iterate through the items and pull out the relevant information
        for result in driver.find_elements(By.CLASS_NAME, 'result-item-ab'):
            title = result.find_element(By.CLASS_NAME, 'text-blue-medium')
            sections = [
                el.text
                for el in result.find_elements(By.CLASS_NAME, 'bds-body')
            ]
            try:
                categories, rating, phone, address = sections
            except ValueError:  # not enough sections, probably no phone
                categories, rating, address = sections
                phone = None
            results.append(
                Result(
                    name=title.text,
                    categories=categories,
                    address=address,
                    phone=phone,
                    rating=rating,
                    url=title.get_property('href'),
                ))
        try:
            next_button = driver.find_element(By.LINK_TEXT, 'Next')
            driver.get(next_button.get_property('href'))
        except:
            print('last page')
            break

38103


 93%|█████████▎| 14/15 [00:59<00:04,  4.26s/it]


last page
38104


 93%|█████████▎| 14/15 [01:12<00:05,  5.15s/it]


last page
38105


 93%|█████████▎| 14/15 [01:01<00:04,  4.41s/it]


last page
38106


 93%|█████████▎| 14/15 [01:06<00:04,  4.77s/it]

last page
38107



 93%|█████████▎| 14/15 [00:53<00:03,  3.82s/it]

last page





In [13]:
# pd.DataFrame(results).drop_duplicates().to_csv('bbb-results.csv', index=False)

In [5]:
listings = pd.read_csv('bbb-results.csv')
listings

Unnamed: 0,name,categories,address,phone,rating,url
0,Indigo Riverview,Apartments,"99 N Main St,\nMemphis, TN 38103",,BBB Rating: D+,https://www.bbb.org/us/tn/memphis/profile/apar...
1,99 Front Street,"Property Management, Apartments, Real Estate C...","99 S Front St,\nMemphis, TN 38103",(901) 767-6500,BBB Rating: A+,https://www.bbb.org/us/tn/memphis/profile/prop...
2,The Renaissance Apartments,Apartments,"99 N Main St,\nMemphis, TN 38103",(901) 527-8057,BBB Rating: B-,https://www.bbb.org/us/tn/memphis/profile/apar...
3,2nd Street Flats,Apartments,"275 South Second Street,\nMemphis, TN 38103",(901) 774-8000,BBB Rating: A+,https://www.bbb.org/us/tn/memphis/profile/apar...
4,"Fogelman Properties, Inc.","Property Management, Apartments, Real Estate C...","495 Tennessee Street,\nMemphis, TN 38103",(833) 706-9514,BBB Rating: A+,https://www.bbb.org/us/tn/memphis/profile/prop...
...,...,...,...,...,...,...
271,Tree Haven Glenn Apartments,Apartments,"6075 Poplar Ave STE 630,\nMemphis, TN 38119-4702",(256) 881-6201,BBB Rating: A+,https://www.bbb.org/us/al/huntsville/profile/a...
272,Courts at Waterford Place,Apartments,"5545 Murray Ave 3rd FL,\nMemphis, TN 38119",(901) 435-9300,BBB Rating: A-,https://www.bbb.org/us/tn/chattanooga/profile/...
273,Richard & Milton Grant LLC,Apartments,"7542 Legacy Drive,\nMemphis, TN 38119",(901) 755-8480,BBB Rating: A+,https://www.bbb.org/us/tn/memphis/profile/apar...
274,Ridgeway Holdings LLC,Apartments,"6033 Bangalore Ct,\nMemphis, TN 38119-7200",(901) 767-1830,BBB Rating: F,https://www.bbb.org/us/tn/memphis/profile/apar...


## Read reviews of said apartment complexes

In [74]:
@dataclass
class Complaint:
    complaint_type: str
    complaint_id: str
    status: str
    date: str
    description: str
    text: str
    url: str

In [78]:
complaints = []

for url in tqdm(listings['url'].unique()):
    driver.get(url + '/complaints')
    while (1):
        # Find each complaint group
        for item in driver.find_elements(
                By.XPATH,
                '/html/body/div[1]/div/div/main/div[2]/div/div/div[1]/ul/li'):
            # Each complaint starts with some info and the initial complaint
            complaint_type = re.search('Complaint Type:\n([^\n]+)\n',
                                       item.text).groups()[0]
            status = re.search('Status:\n([^\n]+)\n', item.text).groups()[0]
            complaint_id = item.find_element(
                By.LINK_TEXT,
                'Initial Complaint').get_attribute('href').split('#')[-1]

            complaints.append(
                Complaint(complaint_type=complaint_type,
                          complaint_id=complaint_id,
                          status=status,
                          date=re.search('Initial Complaint\n([^\n]+)\n',
                                         item.text).groups()[0],
                          description='Initial Complaint',
                          text=re.search('More info\n([^\n]+)',
                                         item.text).groups()[0],
                          url=url))

            # And then there are responses from the business and customer
            groups = re.split(
                '(Business|Customer) response\n(\d{2}\/\d{2}\/\d{4})\n',
                item.text)[1:]
            assert len(groups) % 3 == 0
            for responder, date, text in zip(groups[::3], groups[1::3],
                                             groups[2::3]):
                complaints.append(
                    Complaint(complaint_type=complaint_type,
                              complaint_id=complaint_id,
                              status=status,
                              date=date,
                              description=f'{responder} response',
                              text=text,
                              url=url))

        # Next page
        try:
            next_button = driver.find_element(By.LINK_TEXT, 'Next')
            driver.get(next_button.get_property('href'))
        except:
            break  # last page

100%|██████████| 276/276 [07:08<00:00,  1.55s/it]


In [83]:
df = pd.DataFrame(complaints).drop_duplicates()
df.complaint_id.nunique()

529

In [86]:
df.to_csv('bbb-complaints.csv', index=False)

In [2]:
import pandas as pd

pd.read_csv('bbb-complaints.csv')

Unnamed: 0,complaint_type,complaint_id,status,date,description,text,url
0,Problems with Product/Service,1504025924,Unanswered,03/21/2023,Initial Complaint,I moved out of this building (now Indigo River...,https://www.bbb.org/us/tn/memphis/profile/apar...
1,Problems with Product/Service,1590525145,Answered,04/04/2023,Initial Complaint,This business removes autopay from resident po...,https://www.bbb.org/us/tn/memphis/profile/prop...
2,Problems with Product/Service,1590525145,Answered,04/05/2023,Business response,The resident payment portal is set up and cont...,https://www.bbb.org/us/tn/memphis/profile/prop...
3,Problems with Product/Service,1590525145,Answered,04/05/2023,Customer response,\nComplaint: 19895253\n\nI am rejecting this ...,https://www.bbb.org/us/tn/memphis/profile/prop...
4,Problems with Product/Service,1590525145,Answered,04/17/2023,Business response,The resident keeps referencing the amount of h...,https://www.bbb.org/us/tn/memphis/profile/prop...
...,...,...,...,...,...,...,...
1184,Advertising/Sales Issues,1424905505,Resolved,11/16/2020,Initial Complaint,"After vacating the apartment, the final balanc...",https://www.bbb.org/us/tn/chattanooga/profile/...
1185,Advertising/Sales Issues,1424905505,Resolved,12/02/2020,Business response,"Business Response /* (1000, 5, 2020/11/16) */ ...",https://www.bbb.org/us/tn/chattanooga/profile/...
1186,Problems with Product/Service,1304387016,Unanswered,08/03/2022,Initial Complaint,I recently applied for the town home apartment...,https://www.bbb.org/us/tn/memphis/profile/apar...
1187,Advertising/Sales Issues,1304387015,Unanswered,04/09/2021,Initial Complaint,"These apartments refuse to fix anything, I hav...",https://www.bbb.org/us/tn/memphis/profile/apar...
