# Investigating Gentrification "Code Words" in Rental Listings on Craigslist.com and Apartments.com
Audrey Jang and Sara Tohamy | UP229 Urban Data Science | MURP '22

# Overall Findings and Narrative
Here you can find all of our findings and narratives aggregated into one notebook. 

## Findings for Craigslist

**Notebook 1: Web scraping and word counts**

In this notebook, we analyzed the language used in Craigslist rental listings for four neighborhoods: East Hollywood, Koreatown, Highland Park, and Boyle Heights. Across these neighborhoods, "park" was one of the most common words in our raw word count (before filtering for code words), and "central" also appeared with high frequency. Together, these two words suggest that proximity is an attractive feature in these neighborhoods: "park" points to proximity to outdoor amenities, while "central" points more generally to proximity to amenities, services, and other areas of interest.

In our analysis of gentrification code words in listing descriptions, "central" appeared among the top three words for all four neighborhoods and was the most frequent word in East Hollywood, Koreatown, and Highland Park. In each of these three neighborhoods, words like "private" and "gated" closely followed, suggesting that safety and security were of top importance to current and prospective residents. Boyle Heights was slightly different: its most frequent code word was "historic." Given that we identified Boyle Heights as a neighborhood in the early stages of gentrification, with comparatively little gentrifying activity, "historic" seems to emphasize the charm of the neighborhood's existing history rather than amenities or a central location. As in the other neighborhoods, however, it was followed by "private" and "central." Koreatown was the only neighborhood with "gated" and "amenities" in its top three, showing an emphasis on security as well as on services offered by the property.
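
The code-word tally above amounts to a word count filtered against a fixed vocabulary. Below is a minimal sketch, not the notebooks' actual code: `CODE_WORDS` is a hypothetical subset built from code words reported in this document rather than the project's full hand-selected list, and each description is assumed to be a plain string.

```python
import re
from collections import Counter

# Hypothetical subset of code words (taken from words reported in this document);
# the project's full hand-selected list is not reproduced here.
CODE_WORDS = {"central", "private", "gated", "historic", "amenities",
              "unique", "lounge", "downtown"}


def tokenize(text):
    """Lowercase a description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_code_words(descriptions, code_words=CODE_WORDS):
    """Count code-word occurrences across all descriptions for one neighborhood."""
    counts = Counter()
    for desc in descriptions:
        counts.update(tok for tok in tokenize(desc) if tok in code_words)
    return counts


def top_code_words(descriptions, n=3):
    """Return the n most frequent code words as (word, count) pairs."""
    return count_code_words(descriptions).most_common(n)
```

Plain token matching like this also counts "central" inside phrases such as "Central AC" or "Central Park," so counts produced this way may overstate some code words unless such phrases are filtered out first.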

**Notebook 2: Geospatial Investigation of Listings**

Mapping the rental listings in each neighborhood revealed that many listings were located far from the neighborhood under which they were listed. East Hollywood had the highest share of listings located outside its neighborhood boundaries (87 percent), and Koreatown had the lowest (47 percent). Reasons for listings being located outside the labeled neighborhood range from human error, to listing properties under "trendy" neighborhoods to increase the chances of a successful listing, to a lack of knowledge of where neighborhood boundaries lie. The second of these reasons could itself serve as an indicator of gentrification. If it holds, neighborhoods with a higher percentage of listings outside their boundaries could theoretically be more gentrified than those with a lower percentage. Again, this rests on the assumption that properties are being falsely listed under neighborhoods because of their "trendiness," which could reflect ongoing or completed gentrification. Proving this, however, is outside the scope of this project.

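
The boundary check behind these percentages can be sketched with a ray-casting point-in-polygon test. This is a minimal, standard-library sketch under stated assumptions: each neighborhood boundary is treated as a single polygon of (longitude, latitude) vertices, and each listing is a dict with hypothetical `lon`/`lat` keys. In practice a GIS library such as Shapely or GeoPandas would more likely handle this step.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon.

    polygon is a list of (lon, lat) vertices; the ring need not be closed.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross the ray.
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the east of the point; an odd count means inside.
            if lon < x_cross:
                inside = not inside
    return inside


def share_outside(listings, polygon):
    """Fraction of listings whose (hypothetical) lon/lat fall outside the boundary."""
    if not listings:
        return 0.0
    outside = sum(not point_in_polygon(item["lon"], item["lat"], polygon)
                  for item in listings)
    return outside / len(listings)
```
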
We also found that each neighborhood has a number of listings that mention none of our gentrification code words. Looking only at listings within each neighborhood's boundary, Koreatown had the lowest share of postings without gentrification code words (2 of 8, or 25%), while Highland Park had the highest (7 of 7, or 100%). A high share of postings without code words could also support the theory that properties are falsely listed under a neighborhood to benefit from its trendiness, especially if the property itself is not "trendy."

In the next notebook, we'll see how our findings from the last two notebooks interact with rental prices and whether rental prices vary with code word frequencies and property location.

**Notebook 3: Rent Price Assessment** 

Overall, we found that rent prices vary along several dimensions. First, median rents rose when we included listings that were labeled under a neighborhood but physically located outside its boundaries. We hypothesize that landlords and property managers may be taking advantage of the higher prices in gentrified areas, using a false location to justify higher rents. Second, and unsurprisingly, listings that mention at least one of our gentrification code words have higher median rents than those that do not. Finally, 1- and 2-bedroom listings were overwhelmingly more likely than 3-bedroom listings to include gentrification code words in their descriptions. This may reflect the younger, unmarried demographic that is more likely to be drawn to gentrified or gentrifying areas. It is also consistent with our finding that 1-bedroom rents are more sensitive to the use of gentrification code words and to listing location.
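
Each of the three comparisons above reduces to a median or a share over a subset of listings. A minimal sketch, assuming each listing is a dict with hypothetical keys `rent`, `inside` (whether it lies within the labeled neighborhood's boundary), `n_code_words`, and `bedrooms`; these names are illustrative, not the notebooks' columns.

```python
from statistics import median


def median_rent(listings, predicate):
    """Median rent of the listings for which predicate(listing) is true, or None."""
    rents = [item["rent"] for item in listings if predicate(item)]
    return median(rents) if rents else None


def rent_comparisons(listings):
    """Median rents for the groups compared in the rent price assessment."""
    return {
        # Listings physically inside the boundary vs. everything labeled with the name.
        "inside_boundary_only": median_rent(listings, lambda x: x["inside"]),
        "all_labeled": median_rent(listings, lambda x: True),
        # Listings with at least one code word vs. listings with none.
        "with_code_words": median_rent(listings, lambda x: x["n_code_words"] > 0),
        "without_code_words": median_rent(listings, lambda x: x["n_code_words"] == 0),
    }


def code_word_share_by_bedrooms(listings):
    """Fraction of listings with at least one code word, keyed by bedroom count."""
    totals, with_words = {}, {}
    for item in listings:
        beds = item["bedrooms"]
        totals[beds] = totals.get(beds, 0) + 1
        # A bool adds as 0 or 1.
        with_words[beds] = with_words.get(beds, 0) + (item["n_code_words"] > 0)
    return {beds: with_words[beds] / totals[beds] for beds in sorted(totals)}
```
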


## Findings for Apartments.com

**Notebook 1: Web scraping and word counts**

In this notebook, I scraped 500 listings from www.apartments.com for four neighborhoods: East Hollywood, Boyle Heights, Koreatown, and Highland Park. Each listing includes a description as well as three types of amenities (Unique Amenities, Apartment Features, and Community Amenities). I calculated the word frequencies for each category and plotted them as bar graphs.
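
Per-category word frequencies like these can be sketched as a stopword-filtered count over each text field. This is a minimal sketch, not the notebook's code: the stopword list is a small illustrative assumption, and each scraped listing is assumed to be a dict mapping the field names below to lists of strings.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption); a fuller list would normally be used.
STOPWORDS = {"a", "an", "and", "the", "in", "of", "to", "with", "for", "on", "at", "is"}

# Text fields of a scraped listing; each is assumed to hold a list of strings.
FIELDS = ("Description", "Unique Amenities", "Apartment Features", "Community Amenities")


def word_frequencies(texts):
    """Count lowercase words across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


def top_words_by_field(listings, top_n=5):
    """Return the top_n (word, count) pairs for each field across all listings."""
    return {
        field: word_frequencies(
            text for item in listings for text in item.get(field, [])
        ).most_common(top_n)
        for field in FIELDS
    }
```

Each field's `(word, count)` list can then be fed directly to a bar chart.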

In the next notebook, I introduce the hand-selected words that we determined were code for neighborhood change/gentrification. We counted how often these words appear in the descriptions of each neighborhood's listings. I then calculated the proportion of certain words and mapped the listings, color-coded by whether they included certain groups of words.
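
One way to compute proportions by word group and derive a per-listing color category is sketched below. The grouping in `WORD_GROUPS` is hypothetical (built from code words named in this document, not the project's actual groups), and descriptions are assumed to be plain strings; the resulting label could serve as the column a mapping tool colors by.

```python
import re

# Hypothetical grouping of code words for color-coding; not the project's actual groups.
WORD_GROUPS = {
    "security": {"private", "gated"},
    "location": {"central", "downtown"},
    "character": {"historic", "unique"},
}


def groups_in(description):
    """Names of the word groups with at least one word present in a description."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return {name for name, group in WORD_GROUPS.items() if words & group}


def map_category(description):
    """Color category for one listing: a single group name, 'multiple', or 'none'."""
    found = groups_in(description)
    if not found:
        return "none"
    if len(found) > 1:
        return "multiple"
    return next(iter(found))


def group_proportions(descriptions):
    """Share of listings that mention at least one word from each group."""
    if not descriptions:
        return {}
    n = len(descriptions)
    return {
        name: sum(name in groups_in(d) for d in descriptions) / n
        for name in WORD_GROUPS
    }
```
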

**Notebook 2: Geospatial Investigation of Listings**

I also scraped each listing's latitude/longitude and its "Pricing & Floor Plans" section, which lists the available unit types and their rents. In the next notebook, I dive deeper into the relationship between neighborhood rents and gentrification word frequencies. To do this, I explode each listing into multiple rows, one per unit type, with the goal of producing a scatterplot of median rent, rent per square foot, and rent per bedroom against the frequency of gentrification keywords.
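
The "explode" step can be sketched in plain Python. In pandas, `DataFrame.explode` performs the row-splitting on a list-valued column; the version below just makes the logic explicit. The keys (`name`, `code_word_count`, `floor_plans`, `bedrooms`, `sqft`, `rent`) are hypothetical stand-ins for the scraped fields, and treating studios as one bedroom for the per-bedroom ratio is an assumption.

```python
def explode_listing(listing):
    """Split one listing into one row per floor plan, adding rent ratios.

    All keys used here are hypothetical stand-ins for the scraped fields.
    """
    rows = []
    for plan in listing["floor_plans"]:
        rent, sqft, beds = plan["rent"], plan["sqft"], plan["bedrooms"]
        rows.append({
            # Listing-level fields are copied onto every floor-plan row.
            "name": listing["name"],
            "code_word_count": listing["code_word_count"],
            "bedrooms": beds,
            "sqft": sqft,
            "rent": rent,
            # Undefined (None) when square footage is missing or zero.
            "rent_per_sqft": rent / sqft if sqft else None,
            # Studios (0 bedrooms) are treated as 1 bedroom so the ratio is
            # defined; this is an assumption, not the project's rule.
            "rent_per_br": rent / max(beds, 1),
        })
    return rows
```

Each row then pairs a rent measure with its listing's code-word count, which is the shape a scatterplot of rent against keyword frequency needs.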

**Notebook 3: Rent Price Assessment**

The `unit_type` column was less standardized than I expected. With more time, I would go back to the HTML of apartments.com to see whether there is a way to scrape a more standardized unit type.
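
Short of re-scraping, one workaround is to normalize the free-text labels after the fact. The sketch below maps common variants such as "Studio", "1 Bed", "2BR", or "3 Bedrooms" to a bedroom count; the label formats it handles are assumptions about what the scraped column contains, and anything unrecognized is returned as `None` for manual review.

```python
import re


def normalize_unit_type(raw):
    """Map a free-text unit label to a bedroom count.

    Returns 0 for studios, an int for 'N bd/br/bed/bedroom(s)' patterns,
    or None if the label is not recognized.
    """
    text = raw.strip().lower()
    if "studio" in text:
        return 0
    # A number followed (optionally after spaces) by a bedroom abbreviation.
    match = re.search(r"(\d+)\s*(?:bd|br|bed|beds|bedroom|bedrooms)\b", text)
    if match:
        return int(match.group(1))
    return None
```
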

## Cross Site Findings and Discussion

#### Top words from raw word count

<table>
  <thead>
    <tr>
      <th>Neighborhood</th>
      <th>Apartments.com</th>
      <th>Craigslist.com</th>
    </tr>
  </thead>
  <tbody>
   <tr>
      <td>East Hollywood</td>
      <td>room,
          unit,
          community,
          rent,
          living</td>
      <td>district,
          office,
          apartments,
          hollywood,
          space</td>
    </tr>
    <tr>
      <td>Boyle Heights</td>
      <td>unit, 
          district,
          kitchen,
          arts,
          downtown</td>
        <td>District,
          Space,
          park,
          Building,
          squarefeet</td>
    </tr>
    <tr>
      <td>Koreatown</td>
      <td>Koreatown,
          unique,
          features,
          custom,
          room</td>
      <td>Downtown,
          Park,
          District,
          Koreatown,
          central</td>
    </tr>
    <tr>
      <td>Highland Park</td>
      <td>Park,
          new,
          Highland,
          unit,
          parking</td>
      <td>Park,
          District,
          Space,
          City,
          unit</td>
    </tr>
  </tbody>
</table>

#### Most interesting words from raw word count

<table>
  <thead>
    <tr>
      <th>Neighborhood</th>
      <th>Apartments.com</th>
      <th>Craigslist.com</th>
    </tr>
  </thead>
  <tbody>
   <tr>
      <td>East Hollywood</td>
      <td>community, 
       free, 
       private</td>
      <td>Office,
          Space,
          central</td>
    </tr>
    <tr>
      <td>Boyle Heights</td>
      <td>arts district, 
          downtown,
          lofts</td>
        <td>fashion,
          parking,
          historic</td>
    </tr>
    <tr>
      <td>Koreatown</td>
      <td>unique,
          custom,
          flooring</td>
      <td>park,
          new,
          parking</td>
    </tr>
    <tr>
      <td>Highland Park</td>
      <td>new,
          parking,
          Pasadena</td>
      <td>space,
          car,
          beach</td>
    </tr>
  </tbody>
</table>

#### Top gentrification code words

<table>
  <thead>
    <tr>
      <th>Neighborhood</th>
      <th>Apartments.com</th>
      <th>Craigslist.com</th>
    </tr>
  </thead>
  <tbody>
   <tr>
      <td>East Hollywood</td>
      <td>private, 
       unique, 
       lounge</td>
      <td>central,
          private,
          historic</td>
    </tr>
    <tr>
      <td>Boyle Heights</td>
      <td>downtown, 
          private,
          unique</td>
        <td>historic,
            private, 
            central</td>
    </tr>
    <tr>
      <td>Koreatown</td>
      <td>unique,
          lounge,
          private</td>
      <td>central,
          gated,
          amenities</td>
    </tr>
    <tr>
      <td>Highland Park</td>
      <td>unique,
          central,
          downtown</td>
      <td>central,
          private,
          historic
</td>
    </tr>
  </tbody>
</table>

"Unique" was among the top three gentrification code words in all four neighborhoods for Apartments.com listings, while "central" was in the top three for all four neighborhoods on Craigslist. Code words common to both sites include "private" and "central."
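
These cross-site patterns can be checked mechanically with set operations on the top-three lists. The data below is copied from the "Top gentrification code words" table; only the helper function names are illustrative.

```python
# Top three gentrification code words per neighborhood, from the table above.
TOP_CODE_WORDS = {
    "Apartments.com": {
        "East Hollywood": ["private", "unique", "lounge"],
        "Boyle Heights": ["downtown", "private", "unique"],
        "Koreatown": ["unique", "lounge", "private"],
        "Highland Park": ["unique", "central", "downtown"],
    },
    "Craigslist.com": {
        "East Hollywood": ["central", "private", "historic"],
        "Boyle Heights": ["historic", "private", "central"],
        "Koreatown": ["central", "gated", "amenities"],
        "Highland Park": ["central", "private", "historic"],
    },
}


def in_every_neighborhood(site):
    """Code words in the top three for every neighborhood on one site."""
    lists = TOP_CODE_WORDS[site].values()
    return set.intersection(*(set(words) for words in lists))


def shared_across_sites():
    """Code words appearing in at least one top-three list on every site."""
    per_site = [set().union(*neighborhoods.values())
                for neighborhoods in TOP_CODE_WORDS.values()]
    return set.intersection(*per_site)
```

Run on the table's data, these helpers return only "unique" for Apartments.com, only "central" for Craigslist, and "private" and "central" as the words shared across sites, matching the paragraph above.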

These results suggest that similar language is used in the listing descriptions on both sites.

A comparative rent price analysis across the two sites was difficult because Apartments.com provides a range of rents for each property, whereas Craigslist listings give a single flat price.
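
One way to put both sites on a common footing is to collapse each Apartments.com range to a single value, for example its midpoint, while leaving Craigslist's flat prices unchanged. This is a sketch under two assumptions: prices appear as text like "$1,850 - $2,400", and the midpoint is an appropriate summary (the low or high end would be equally defensible choices).

```python
import re


def parse_rent(text):
    """Parse '$1,850' or '$1,850 - $2,400' into (low, high); None if no price is found."""
    amounts = [int(a.replace(",", "")) for a in re.findall(r"\$\s*(\d[\d,]*)", text)]
    if not amounts:
        return None
    return min(amounts), max(amounts)


def comparable_rent(text):
    """Midpoint of a rent range (or the single price itself), or None."""
    parsed = parse_rent(text)
    if parsed is None:
        return None
    low, high = parsed
    return (low + high) / 2
```
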

Overall, our results suggest that the language used on these sites is an important indicator of value. With more time, we would like to resolve the coding issues that stumped us this time around and tie our word count and rent price analyses together more meaningfully.