# Data scraping
---

This script collects rental listing data from 'www.apartments.com' and stores it in a DataFrame. The following data are collected from the website. Data cleaning is performed later to remove unwanted features from the DataFrame.
- Apartment complex name
- Address
- Rent
- Number of bedrooms, number of bathrooms, and square footage (sqft)
- Tenant rating
- Amenities such as laundry, fitness center, business center, etc.
- Pet policy
- Nearby schools
- Additional information such as the year built

## Import useful libraries

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import time

## Parsing the main page with BeautifulSoup

The website provides a main page for each region, including each of the 5 boroughs of New York City. These main pages can be accessed through URLs of the form 'www.apartments.com/name-of-region'. This project considers the 5 boroughs.

As a first step, the main page for each borough is parsed to find the total number of 'listing pages' to scrape. The number of listing pages is stored in a dictionary called 'boroughs_max_page'.

To avoid overloading the server, a 1-second delay is introduced after each request.
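
The request-then-pause pattern described above can be wrapped in a small helper. The following is a minimal sketch rather than the notebook's own code: `polite_get` and its parameters are hypothetical names, and `session` is assumed to be any object with a requests-style `get` method, such as `requests.Session()`.

```python
import time


def polite_get(session, url, headers=None, delay=1.0, timeout=10):
    """Fetch `url`, check the HTTP status, then sleep `delay` seconds.

    `session` is assumed to expose a requests-style `get` method
    (for example `requests.Session()`); the name is illustrative only.
    """
    response = session.get(url, headers=headers, timeout=timeout)
    # raise an exception on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    # pause so consecutive calls are spaced at least `delay` seconds apart
    time.sleep(delay)
    return response
```

With the notebook's variables, a call might look like `polite_get(requests.Session(), 'https://www.apartments.com/%s/' % br, headers=headers)`. Checking the status before parsing helps catch a blocked request, which would otherwise produce a page with no listings.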

In [2]:
# boroughs of NY
boroughs = ['manhattan-ny','queens-ny','brooklyn-ny','staten-island-ny','bronx-ny']
boroughs_max_page = {'manhattan-ny':0,'queens-ny':0,'brooklyn-ny':0,'staten-island-ny':0,'bronx-ny':0}
boroughs_listing_pages = {'manhattan-ny':{},'queens-ny':{},'brooklyn-ny':{},'staten-island-ny':{},'bronx-ny':{}}

# send user agent to avoid bot check
headers = {'User-Agent': 'User'}

# loop over boroughs of NY
for br in boroughs:
    
    # request page
    page_main = requests.get('https://www.apartments.com/%s/'%br, headers=headers)
    
    # pause to regulate request frequency
    time.sleep(1)

    # create soup of the main page
    soup_main = BeautifulSoup(page_main.content,'html.parser')
    
    # find the total number of apartment "listing pages" to scrape

    # find tags that link to "listing pages"
    for tag in soup_main.find_all('a'):
        # select tags containing page numbers
        if 'data-page' in tag.attrs:
            if 'class' not in tag.attrs:
                # keep the largest page number seen so far for this br
                boroughs_max_page[br] = max(int(tag['data-page']), boroughs_max_page[br])

# for testing            
print(boroughs_max_page)

{'manhattan-ny': 28, 'queens-ny': 28, 'brooklyn-ny': 28, 'staten-island-ny': 28, 'bronx-ny': 28}
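
The page-count logic can be checked in isolation on a small piece of markup. The sketch below is an illustration under stated assumptions: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies, the pagination HTML is hypothetical (not copied from the site), and `PageNumberParser` and `max_listing_page` are made-up names. It keeps a running maximum because the last pagination link is not guaranteed to carry the highest page number.

```python
from html.parser import HTMLParser


class PageNumberParser(HTMLParser):
    """Collect `data-page` values from <a> tags that have no `class` attribute."""

    def __init__(self):
        super().__init__()
        self.page_numbers = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "data-page" in attributes and "class" not in attributes:
            self.page_numbers.append(int(attributes["data-page"]))


def max_listing_page(html):
    """Return the largest page number found, or 0 if there is none."""
    parser = PageNumberParser()
    parser.feed(html)
    return max(parser.page_numbers, default=0)


# hypothetical pagination markup, for illustration only
sample_html = """
<nav>
  <a class="active" data-page="1">1</a>
  <a data-page="2">2</a>
  <a data-page="28">28</a>
  <a data-page="3">3</a>
</nav>
"""
print(max_listing_page(sample_html))  # prints 28
```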


## Collect URLs of each rental listing

Knowing the total number of listing pages for each borough, each listing page is parsed to collect the URLs of apartment listings. The URL and title of each apartment complex are stored in the dictionary "boroughs_listing_pages".

As in the previous step, a delay (0.5 seconds here) is introduced after each request to avoid overloading the server. Because of the delay, this code can take a few minutes to run. Hence, a debug message is printed before looping over the pages of each borough.

In [3]:
# loop over boroughs
for br in boroughs:
    
    print("brough:",br)

    # loop over page number
    for i in range(boroughs_max_page[br]):
        
        # set page number to avoid confusion
        page_number = i+1
        
        #print("Loop over page %s of %s" %(page_number,br))
        
        # request list page
        page_listing = requests.get('https://www.apartments.com/%s/%s' %(br,page_number), headers=headers)
        
        #print("page_listing:",page_listing)
        
        # pause to regulate request frequency
        time.sleep(0.5)
       
        # parse listing page with BeautifulSoup
        soup_listing = BeautifulSoup(page_listing.content,'html.parser')
       
        # for testing
        #print(soup_listing.prettify())

        for tag in soup_listing.findAll("a",{"class":"placardTitle"}):
            
            # add title and the link to each property
            if not tag['title'] in boroughs_listing_pages[br]:
                boroughs_listing_pages[br][tag['title']] = tag['href']
            
# Collected links
print("collected links to each apartment pages")

brough: manhattan-ny
brough: queens-ny
brough: brooklyn-ny
brough: staten-island-ny
brough: bronx-ny
collected links to each apartment pages


For debugging purposes, the total number of apartment rental listings collected is printed here. In the current state, each borough has roughly 660 to 690 apartment listings.

In [8]:
# for debugging
for br in boroughs:
    print(br, len(boroughs_listing_pages[br]))

manhattan-ny 671
queens-ny 688
brooklyn-ny 683
staten-island-ny 690
bronx-ny 663


In [6]:
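import json

# write the collected links to disk so they can be reused without re-scraping
# (json.dump avoids rebinding the name `json`, which would shadow the module)
with open("data/boroughs_listing_pages.json", "w") as f:
    json.dump(boroughs_listing_pages, f)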

## Scrape apartment rental listings

In the previous step, we collected the title and URL of every apartment rental listing posted on the website. Now, we loop over each URL and collect useful features such as rent, number of bedrooms, etc.

Parsing is done with BeautifulSoup, and the features collected from each apartment rental listing are stored in Python lists. Later, the lists are converted into a DataFrame.


Care is taken so that a feature missing from an apartment listing does not cause the script to crash.
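
One way to make the address extraction tolerant of missing fields is to start from empty defaults and only overwrite the keys that are actually present. The following is a minimal sketch, not the notebook's implementation: `parse_listing_fields` is a hypothetical helper, and the sample text is made up, modelled on the key names the scraping loop looks for.

```python
ADDRESS_KEYS = ("listingAddress", "listingCity", "listingState", "listingZip")


def parse_listing_fields(script_text, keys=ADDRESS_KEYS):
    """Return {key: value} for `keys`, using "" for any key that is absent."""
    fields = dict.fromkeys(keys, "")
    for item in script_text.split(","):
        # split on the first colon only, so values containing ':' stay intact
        key, sep, value = item.strip().partition(":")
        if sep and key in fields:
            # drop surrounding whitespace and quote characters
            fields[key] = value.strip().strip("'\"")
    return fields


# hypothetical script text; the real page content may differ
sample = "listingAddress: '260 W 27th St', listingCity: 'New York', listingState: 'NY'"
print(parse_listing_fields(sample))
```

Because every key has a default, a listing without a zip code yields an empty string instead of raising an error or reusing the value from the previous listing. Splitting on commas is a simplification: a value that itself contains a comma would be cut short.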

In [174]:
# if soup_rent.findAll("span",{"itemprop":"addressLocality"}):
#     city = soup_rent.findAll("span",{"itemprop":"addressLocality"})[0].text
# else:
#     city = ""

# if soup_rent.findAll("script",{"type":"application/ld+json"}): 
#     street_info = soup_rent.findAll("script",{"type":"application/ld+json"})[0].text
# else:
#     street_info = ""

In [202]:
# streetAddress
# addressLocality
# addressRegion
# postalCode
# 

# for s in si:
#     if 'addressLocality' in s:
#         print(s)
#         break

"addressLocality":"New York"


In [7]:
# br = 'manhattan-ny' #: 28, 'queens-ny': 28, 'brooklyn-ny': 28, 'staten-island-ny': 28, 'bronx-ny': 28}

# list_rating = []
# list_amenity = []
# list_leaseLength = []

# i = 0

# for title, url in boroughs_listing_pages[br].items():
    
#     if i < 5:

#         print("boroughs: %s, processing apartment: %s" %(br,title))

#         #=======================================
#         # request and parse each complex
#         #=======================================

#         # get url from the list
#         rental_url = url

#         # request list page
#         page_rent = requests.get('%s' %rental_url, headers=headers)

#         # pause to regulate request frequency
#         time.sleep(0.2)

#         # parse listing page with BeautifulSoup
#         soup_rent = BeautifulSoup(page_rent.content,'html.parser')

#         print("rating:")
#         #print(soup_rent.findAll("div",{"class":"rating"}))

#         if 'title' in soup_rent.findAll("div",{"class":"rating"})[0].attrs:
#             rating = soup_rent.findAll("div",{"class":"rating"})[0]['title']
#         else:
#             rating = ""

#         # list of amenities
#         amenity_temp = []
#         if soup_rent.findAll("section",{"class":"printPropertySection"}):
#             for tag_amenity in soup_rent.findAll("section",{"class":"printPropertySection"})[0].findAll("li"):
#                 amenity_temp.append(tag_amenity.text)
            
#         # add amenity to row
#         amenity = amenity_temp
        
#         list_rating.append(rating)
#         list_amenity.append(amenity)
        
#         for tag_unit in soup_rent.findAll("tr",{"class":"rentalGridRow"}):
            
#             if tag_unit.findAll("td",{"class":"leaseLength"}):
#                 list_leaseLength.append(tag_unit.findAll("td",{"class":"leaseLength"})[0].text.strip())
#             else:
#                 list_leaseLength.append("")
#     i +=1   

In [10]:
# rental_url = "https://www.apartments.com/studio-1-bath-house-260-w-27th-st-new-york-ny/flt29nr/"

# page_rent = requests.get('%s' %rental_url, headers=headers)

# # pause to regulate request frequency
# #time.sleep(0.3)

# # parse listing page with BeautifulSoup
# soup_rent = BeautifulSoup(page_rent.content,'html.parser')

# # for testing
# # print(soup_rent.prettify())

# #=======================================
# # find complex's property
# #=======================================

# # get rental title
# #rental_title = title



# if soup_rent.findAll("script", {"type":"text/javascript"}): 
#     property_address1 = soup_rent.findAll("script", {"type":"text/javascript"}) # ' '.join(soup_rent.findAll("div", {"class":"propertyAddress"})[0].text.strip().split())
# else:
#     property_address1 = ""

# if soup_rent.findAll("script", {"type":"application/ld+json"}): 
#     property_address = soup_rent.findAll("script", {"type":"application/ld+json"}) # ' '.join(soup_rent.findAll("div", {"class":"propertyAddress"})[0].text.strip().split())
# else:
#     property_address = ""

In [49]:
# if soup_rent.findAll("script", {"type":"text/javascript"}): 
#     infos = soup_rent.findAll("script", {"type":"text/javascript"}) # ' '.join(soup_rent.findAll("div", {"class":"propertyAddress"})[0].text.strip().split())
# else:
#     infos = ""

# infos = [x.strip() for x in infos[2].text.strip().split(',')]

# for i in range(len(infos)):
#     info = infos[i].split(":")
#     if info[0] == "listingCity":
#         print(info[1])

 'New York'


In [52]:
# infos = [x.strip() for x in property_address1[2].text.strip().split(',')]

# #infos = [x.strip() for x in infos[2].text.strip().split(',')]

# for i in range(len(infos)):
#     info = infos[i].split(":")

#     if info[0] == 'listingAddress':
#         street_address = info[1]

#     if info[0] == "listingCity":
#         city = info[1]

#     if info[0] == "listingState":
#         state = info[1]

#     if info[0] == "listingZip":
#         postal_code = info[1]

# print(street_address,city,state,postal_code)

 '260 W 27th St'  'New York'  'NY'  '10001'


In [58]:
rental_url = "https://www.apartments.com/halletts-point-astoria-ny/1j2c5h6/"

page_rent = requests.get('%s' %rental_url, headers=headers)

# pause to regulate request frequency
#time.sleep(0.3)

# parse listing page with BeautifulSoup
soup_rent = BeautifulSoup(page_rent.content,'html.parser')

# for testing
# print(soup_rent.prettify())

#=======================================
# find complex's property
#=======================================

# get rental title
#rental_title = title



# if soup_rent.findAll("div", {"class":"specList propertyFeatures js-spec"}): 
#     property_address1 = soup_rent.findAll("script", {"type":"text/javascript"}) # ' '.join(soup_rent.findAll("div", {"class":"propertyAddress"})[0].text.strip().split())
# else:
#     property_address1 = ""

if soup_rent.findAll("div", {"class":"specList propertyFeatures js-spec"}):
    property_address = soup_rent.findAll("div",{"class":"specList propertyFeatures js-spec"})[0].text # ' '.join(soup_rent.findAll("div", {"class":"propertyAddress"})[0].text.strip().split())
else:
    property_address = ""

In [59]:
property_address

'\nProperty Information\n\n•Built in 2019\n•405 Units/22 Stories\n\n'
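
The raw `property_info` string can later be split into separate features. The sketch below assumes the wording shown in the output above ("Built in <year>", "<n> Units/<m> Stories"); other phrasings on the site are not handled, and `parse_property_info` is a hypothetical name.

```python
import re


def parse_property_info(text):
    """Extract built year, unit count, and story count; missing values become None."""

    def first_int(pattern):
        match = re.search(pattern, text)
        return int(match.group(1).replace(",", "")) if match else None

    return {
        "built_year": first_int(r"Built in (\d{4})"),
        "units": first_int(r"(\d[\d,]*) Units?"),
        "stories": first_int(r"(\d+) Stor(?:y|ies)"),
    }


sample = '\nProperty Information\n\n•Built in 2019\n•405 Units/22 Stories\n\n'
print(parse_property_info(sample))
# prints {'built_year': 2019, 'units': 405, 'stories': 22}
```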

In [None]:
    </div>
            <div class="specList propertyFeatures js-spec">
            <span class="amenityCircle"><i class="propertyIcon"></i></span><h3>Property Information</h3>
            <ul>
                    <li><span class="bullet">&bull;</span>Built in 2019</li>
                    <li><span class="bullet">&bull;</span>405 Units/22 Stories</li>
            </ul>
        </div>

In [None]:
# loop over boroughs again to scrape rental listings
for br in boroughs:

    # loop over rental listing
    for title, url in boroughs_listing_pages[br].items():
        
        # get url from the list
        rental_url = url

        # request list page
        page_rent = requests.get('%s' %rental_url, headers=headers)

        # pause to regulate request frequency
        time.sleep(0.2)

        # parse listing page with BeautifulSoup
        soup_rent = BeautifulSoup(page_rent.content,'html.parser')
        
        if soup_rent.findAll("div", {"class":"specList propertyFeatures js-spec"}):
            property_info = soup_rent.findAll("div",{"class":"specList propertyFeatures js-spec"})[0].text
        else:
            property_info = ""


In [53]:
# define lists to hold contents before creating DataFrame
list_rental_title = []
list_street_address = []
list_city = []
list_state = []
list_postal_code = []
#list_neighbor = []
#list_location = []
list_rating = []
list_amenity = []
list_pet_policy = []
list_property_info = []
list_school = []
list_bedrooms = []
list_bathrooms = []
list_rent = []
#list_deposit = []
#list_unit = []
list_sqft = []
#list_name = []
#list_leaseLength = []
list_borough = []

# loop over boroughs again to scrape rental listings
for br in boroughs:

    # loop over rental listing
    for title, url in boroughs_listing_pages[br].items():

        #print("boroughs: %s, processing apartment: %s" %(br,title))

        #=======================================
        # request and parse each complex
        #=======================================

        # get url from the list
        rental_url = url

        # request list page
        page_rent = requests.get('%s' %rental_url, headers=headers)

        # pause to regulate request frequency
        time.sleep(0.3)

        # parse listing page with BeautifulSoup
        soup_rent = BeautifulSoup(page_rent.content,'html.parser')

        # for testing
        # print(soup_rent.prettify())

        #=======================================
        # find complex's property
        #=======================================

        # get rental title
        rental_title = title

#         if soup_rent.findAll("div", {"class":"propertyAddress"}): 
#             property_address = ' '.join(soup_rent.findAll("div", {"class":"propertyAddress"})[0].text.strip().split())
#         else:
#             property_address = ""

#         #print("property_address:",property_address) 
#         #'43-22 Queens St, Long Island City, NY 11101 – Hunters Point'

#         address_list = property_address.split(',')
#         if len(address_list) <3:
#             street_address = ""
#             city = property_address.split(',')[0]
#             state_postal_neighbor = property_address.split(',')[1]
#         else:
#             street_address = property_address.split(',')[0]
#             city = property_address.split(',')[1]
#             state_postal_neighbor = property_address.split(',')[2]
#         state = state_postal_neighbor.split()[0]
#         postal_code = state_postal_neighbor.split()[1]
#         neighbor = state_postal_neighbor.split('–')[1]
        
        # address fields are embedded in the page's inline scripts
        scripts = soup_rent.findAll("script", {"type":"text/javascript"})

        # reset fields so values never carry over from the previous listing
        street_address = city = state = postal_code = location = ""

        # the third script holds comma-separated "key: value" pairs; skip it if absent
        infos = [x.strip() for x in scripts[2].text.strip().split(',')] if len(scripts) > 2 else []

        for item in infos:
            info = item.split(":")

            # skip fragments that do not have a "key: value" shape
            if len(info) < 2:
                continue

            if info[0] == 'listingAddress':
                street_address = info[1]

            if info[0] == "listingCity":
                city = info[1]

            if info[0] == "listingState":
                state = info[1]

            if info[0] == "listingZip":
                postal_code = info[1]

            if info[0] == "location":
                location = info[1]

                
        # street address
#         if soup_rent.findAll("span",{"itemprop":"streetAddress"}):
#             street_address = soup_rent.findAll("span",{"itemprop":"streetAddress"})[0].text
#         else:
#             street_address = ""

        # city
#         if soup_rent.findAll("span",{"itemprop":"addressLocality"}):
#             city = soup_rent.findAll("span",{"itemprop":"addressLocality"})[0].text
#         else:
#             city = ""

        # state
#         if soup_rent.findAll("span",{"itemprop":"addressRegion"}):
#             state = soup_rent.findAll("span",{"itemprop":"addressRegion"})[0].text
#         else:
#             state = ""

        # postal code (zip code)
#         if soup_rent.findAll("span",{"itemprop":"postalCode"}):
#             postal_code = soup_rent.findAll("span",{"itemprop":"postalCode"})[0].text
#         else:
#             postal_code = ""

        # apartment rating         <div class="rating" title="5 star property">
        try:
            #if 'title' in soup_rent.findAll("div",{"class":"rating"})[0].attrs:
            rating = soup_rent.findAll("div",{"class":"rating"})[0]['title']
        except (IndexError, KeyError):
            # no rating element on the page, or no title attribute on it
            rating = ""

        # list of amenities
        amenity_temp = []
        if soup_rent.findAll("section",{"class":"printPropertySection"}):
            for tag_amenity in soup_rent.findAll("section",{"class":"printPropertySection"})[0].findAll("li"):
                amenity_temp.append(tag_amenity.text)

        # add amenity to row
        amenity = amenity_temp

        # pet policy <div class="petPolicyDetails">
        pet_policy_temp = []
        for tags_pet in soup_rent.findAll("div",{"class":"petPolicyDetails"}):
            pet_policy_temp.append(tags_pet.text)

        # add pet policy to row
        pet_policy = pet_policy_temp

        # additional information such as built date, complex size     <div class="specList propertyFeatures js-spec">
        if soup_rent.findAll("div", {"class":"specList propertyFeatures js-spec"}):
            property_info = soup_rent.findAll("div",{"class":"specList propertyFeatures js-spec"})[0].text
        else:
            property_info = ""

        # nearby school information  <div class="schoolCard">
        school_temp = []
        for tag_school in soup_rent.findAll("div",{"class":"schoolCard"}):
            # get name, number of students, rating
            school_name = tag_school.findAll("p",{"class":"schoolType"})[0].text
            school_number_student = tag_school.findAll("p",{"class":"numberOfStudents"})[0].text
            school_rating = tag_school.findAll("i")[0]['class'][0]

            # add it to list of schools as a dictionary
            school_temp.append({"school_name":school_name, 
                                "school_number_student":school_number_student,
                                "school_rating":school_rating})

        # add school to row
        school = school_temp

        # loop over available units in this apartment
        for tag_unit in soup_rent.findAll("tr",{"class":"rentalGridRow"}):

            #=======================================
            # find unit property
            #=======================================

            # append complex's properties
            list_rental_title.append(rental_title)
            list_street_address.append(street_address)
            list_city.append(city)
            list_state.append(state)
            list_postal_code.append(postal_code)
            #list_location.append(location)
            #list_neighbor.append(neighbor)
            list_rating.append(rating)
            list_amenity.append(amenity)
            list_pet_policy.append(pet_policy)
            list_property_info.append(property_info)
            list_school.append(school)
            list_borough.append(br)

            # number of bedrooms in this apartment
            if tag_unit.findAll("td",{"class":"beds"}):
                list_bedrooms.append(tag_unit.findAll("td",{"class":"beds"})[0].findAll("span",{"class":"shortText"})[0].text.strip())
            else:
                list_bedrooms.append("")

            # number of bathroom in this apartment
            if tag_unit.findAll("td",{"class":"baths"}):
                list_bathrooms.append(tag_unit.findAll("td",{"class":"baths"})[0].findAll("span",{"class":"shortText"})[0].text.strip())
            else:
                list_bathrooms.append("")

            # rent per month
            if tag_unit.findAll("td",{"class":"rent"}):
                list_rent.append(tag_unit.findAll("td",{"class":"rent"})[0].text.strip())
            else:
                list_rent.append("")

            # deposit
#             if tag_unit.findAll("td",{"class":"deposit"}):
#                 list_deposit.append(tag_unit.findAll("td",{"class":"deposit"})[0].text.strip())
#             else:
#                 list_deposit.append("")

            # unit
#             if tag_unit.findAll("td",{"class":"unit"}):
#                 list_unit.append(tag_unit.findAll("td",{"class":"unit"})[0].text.strip())
#             else:
#                 list_unit.append("")

            # sqft
            if tag_unit.findAll("td",{"class":"sqft"}):
                list_sqft.append(tag_unit.findAll("td",{"class":"sqft"})[0].text.strip())
            else:
                list_sqft.append("")

            # name
#             if tag_unit.findAll("td",{"class":"name  "}):
#                 list_name.append(tag_unit.findAll("td",{"class":"name  "})[0].text.strip())
#             else:
#                 list_name.append("")

            # lease length
#             if tag_unit.findAll("td",{"class":"leaseLength"}):
#                 list_leaseLength.append(tag_unit.findAll("td",{"class":"leaseLength"})[0].text.strip())
#             else:
#                 list_leaseLength.append("")

## Converting list to DataFrame

The lists collected in the previous step are converted into a DataFrame. The DataFrame is then saved as a CSV file for post-processing.
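
Building a DataFrame from a dictionary of lists requires every list to have the same length, so a quick check before the conversion can catch a missed `append` early. The helper below is a hypothetical sketch, not part of the original notebook.

```python
def check_equal_lengths(columns):
    """Return the shared length of the lists in `columns`.

    Raises ValueError, naming each column's length, if they differ.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# toy data for illustration
print(check_equal_lengths({"rent": ["$2,300", ""], "sqft": ["1,400 Sq Ft", ""]}))  # prints 2
```

It could be called on the same dictionary that is passed to `pd.DataFrame`, for example `check_equal_lengths({'rent': list_rent, 'sqft': list_sqft})`.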

In [54]:
df = pd.DataFrame({'rental_title':list_rental_title,
                   'borough':list_borough,
                   'street_address':list_street_address,
                   'city':list_city,
                   'state':list_state,
                   'postal_code':list_postal_code,
                   #'neighbor':list_neighbor,
                   #'location':list_location,
                   'rating':list_rating,
                   'amenity':list_amenity,
                   'pet_policy':list_pet_policy,
                   'property_info':list_property_info,
                   'school':list_school,
                   'bedrooms':list_bedrooms,
                   'bathrooms':list_bathrooms,
                   'rent':list_rent,
                   #'deposit':list_deposit,
                   #'unit':list_unit,
                   'sqft':list_sqft,
                   #'name':list_name,
                   #'leaseLength':list_leaseLength
                  })

Before storing the DataFrame as a CSV file, check the output and make sure it looks okay.

In [55]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7797 entries, 0 to 7796
Data columns (total 15 columns):
rental_title      7797 non-null object
borough           7797 non-null object
street_address    7797 non-null object
city              7797 non-null object
state             7797 non-null object
postal_code       7797 non-null object
rating            7797 non-null object
amenity           7797 non-null object
pet_policy        7797 non-null object
property_info     7797 non-null object
school            7797 non-null object
bedrooms          7797 non-null object
bathrooms         7797 non-null object
rent              7797 non-null object
sqft              7797 non-null object
dtypes: object(15)
memory usage: 913.8+ KB


In [56]:
df.shape

(7797, 15)

In [57]:
df.to_csv('data/ny_rental_data_v2.csv')

In [None]:
# total number of amenities
# dishwasher, 'Washer/Dryer', 'Wifi' or 'high-speed internet'
# 

In [71]:
df.sqft.value_counts()

                       1738
1,000 Sq Ft             256
700 Sq Ft               255
800 Sq Ft               217
900 Sq Ft               199
650 Sq Ft               120
—                       113
750 Sq Ft               111
850 Sq Ft               111
600 Sq Ft               108
1,200 Sq Ft             106
500 Sq Ft               105
1,100 Sq Ft             100
1 Sq Ft                  99
950 Sq Ft                86
550 Sq Ft                67
702 Sq Ft                32
414 Sq Ft                30
400 Sq Ft                29
450 Sq Ft                28
716 Sq Ft                26
528 Sq Ft                26
636 Sq Ft                25
708 SF                   25
675 Sq Ft                24
684 Sq Ft                24
691 Sq Ft                22
718 Sq Ft                21
680 Sq Ft                20
1,300 Sq Ft              20
                       ... 
315 SF                    1
883 SF                    1
430 Sq Ft                 1
1,118 Sq Ft               1
1,125 Sq Ft         
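
The counts above show several textual forms for the same quantity ("1,000 Sq Ft", "708 SF", a dash, or an empty string). A possible cleaning step, sketched here with the hypothetical helper `parse_sqft`, converts them to integers and maps anything without a number to `None`. Only the formats visible in this output are assumed.

```python
import re

# a number (with optional thousands separators) followed by "Sq Ft" or "SF"
_SQFT_PATTERN = re.compile(r"(\d[\d,]*)\s*(?:sq\s*ft|sf)", re.IGNORECASE)


def parse_sqft(text):
    """Return the square footage as an int, or None if no value is present."""
    match = _SQFT_PATTERN.search(text)
    return int(match.group(1).replace(",", "")) if match else None


for raw in ["1,000 Sq Ft", "708 SF", "—", ""]:
    print(repr(raw), "->", parse_sqft(raw))
```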

In [76]:
df.rating.value_counts(normalize = True)

5 star property    0.475192
                   0.294501
4 star property    0.141944
3 star property    0.054092
2 star property    0.019565
1 star property    0.013171
No Rating Yet      0.001535
Name: rating, dtype: float64
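
The rating strings can be reduced to a number in the same spirit. This is a sketch under the assumption that ratings always follow the "<n> star property" wording seen above; `parse_rating` is a hypothetical name, and both the empty string and "No Rating Yet" are treated as missing.

```python
import re


def parse_rating(text):
    """Return the star count as an int, or None when there is no rating."""
    match = re.match(r"\s*(\d) star", text)
    return int(match.group(1)) if match else None


for raw in ["5 star property", "No Rating Yet", ""]:
    print(repr(raw), "->", parse_rating(raw))
```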

In [None]:
# possibly use the full address to get the region??
# note: we have the borough (county), which is only a rough location

In [77]:
df.neighbor.value_counts()

 Hunters Point            602
 Downtown Brooklyn        301
 Williamsburg             289
 Central Queens           236
 South Shore Queens       212
 Stapleton                209
 Hell's Kitchen           199
 St George                195
 Bowery                   175
 Roosevelt Island         166
 Bushwick                 165
 Downtown Astoria         157
 Hudson Yards             156
 Greenpoint               154
 Constable Hook           144
 West Chelsea             144
 Southeast Queens         137
 Fort Greene              128
 Washington Heights       120
 East Harlem              101
 Upper West Side           99
 Morningside Heights       96
 Queens                    93
 Bay Ridge                 92
 Central Harlem            89
 Bedford-Stuyvesant        85
 East Village              83
 Carteret                  83
 Ditmars Steinway          75
 Times Square              67
                         ... 
 Fairmont                   1
 Brookville                 1
 Rosedale 

In [78]:
df.city.value_counts()

 New York               1510
 Brooklyn               1086
 Staten Island           942
 Long Island City        696
New York                 680
Brooklyn                 659
Queens                   512
 The Bronx               279
 Far Rockaway            225
 Queens                  203
 Bronx                   200
 Bayonne                 138
 Astoria                  88
 Carteret                 83
The Bronx                 68
 Perth Amboy              59
 Douglaston               57
 Elizabeth                43
Bronx                     40
 Flushing                 39
 Corona                   30
Staten Island             30
 Arverne                  20
 Inwood                   19
 Jamaica                  18
 Forest Hills             14
 Kew Gardens Hills        10
Yonkers                    9
 Ozone Park                8
 Rego Park                 8
 Elmhurst                  6
Bayonne                    6
 Bayside                   6
 Hollis                    6
 Yonkers      
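
Several cities appear twice in the counts above, once with a leading space and once without, so they are counted as different values. A minimal sketch of merging them by normalizing whitespace follows; `merge_city_counts` is a hypothetical helper, and the input reuses four of the counts printed above.

```python
from collections import Counter


def merge_city_counts(raw_counts):
    """Combine counts whose city names differ only in surrounding or repeated whitespace."""
    merged = Counter()
    for city, count in raw_counts.items():
        merged[" ".join(city.split())] += count
    return merged


raw = {" New York": 1510, "New York": 680, " Brooklyn": 1086, "Brooklyn": 659}
print(merge_city_counts(raw))
# prints Counter({'New York': 2190, 'Brooklyn': 1745})
```

On the DataFrame itself, `df['city'].str.strip()` would apply the same trimming to the column before counting.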

In [72]:
df.sample(5)

Unnamed: 0,rental_title,borough,street_address,city,postal_code,neighbor,rating,amenity,pet_policy,property_info,school,bedrooms,bathrooms,rent,sqft
4259,"Avalon Fort Greene, Brooklyn, NY",brooklyn-ny,343 Gold St,Brooklyn,11201,Fort Greene,5 star property,"[Package Service, Maintenance on site, Doorman...",[\n\nDogs and Cats Allowed:\r\nPet Policy for ...,\nProperty Information\n\n•Built in 2011\n•631...,"[{'school_name': 'Public Elementary School', '...",1 BR,1 BA,"$3,737 - 3,786",763 Sq Ft
3604,"160-45 95th St Unit 2, Queens, NY",queens-ny,,Queens,11414,Old Howard Beach,,"[Waterfront, Smoke Free, Dishwasher, Microwave...",[\n\n\r\n No Pets A...,,[],3 BRs,1 BA,"$2,300","1,400 Sq Ft"
1656,"Avalon West Chelsea, New York, NY",manhattan-ny,282 11th Ave,New York,10001,West Chelsea,5 star property,"[Package Service, Wi-Fi at Pool and Clubhouse,...",[\n\nDogs and Cats Allowed:\r\nWe are a pet fr...,\nProperty Information\n\n•Built in 2014\n•305...,"[{'school_name': 'Public Elementary School', '...",1 BR,1 BA,Call for Rent,684 Sq Ft
935,"Henry Hall, New York, NY",manhattan-ny,509 W 38th St,New York,10018,Hudson Yards,5 star property,"[Package Service, Laundry Facilities, On Site ...",[\n\nDogs and Cats Allowed:\r\nWe welcome up t...,\nProperty Information\n\n•Built in 2017\n•225...,"[{'school_name': 'Public Elementary School', '...",1 BR,1 BA,"$4,242",615 Sq Ft
6383,"Todt Hill Houses, Staten Island, NY",staten-island-ny,815 Manor Rd,Staten Island,10314,Emerson Hill,1 star property,[],[],\nProperty Information\n\n•Built in 1950\n•502...,"[{'school_name': 'Public Elementary School', '...",4 Beds,,,
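
The `rent` column in this sample mixes single prices, ranges, and placeholder text. A possible cleaning step, sketched below with the hypothetical helper `parse_rent`, returns a `(low, high)` pair and falls back to `(None, None)` when no number is present. Only the formats visible in the sample rows are assumed.

```python
import re


def parse_rent(text):
    """Return (low, high) monthly rent in dollars, or (None, None) if no number is present."""
    amounts = [int(a.replace(",", "")) for a in re.findall(r"\d[\d,]*", text)]
    if not amounts:
        return (None, None)
    return (min(amounts), max(amounts))


for raw in ["$3,737 - 3,786", "$2,300", "Call for Rent", ""]:
    print(repr(raw), "->", parse_rent(raw))
```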


In [83]:
df.to_csv('data/ny_rental_data.csv')