Hi!

I am an NTIA student and this is my Python and Data Analytics project.

In this project I demonstrate how to combine data from real estate websites
with data available in the public domain, such as GIS data from the City of Newton, MA.

This project can be cloned from:
https://github.com/Tasha-Tech/NestFinder.git

This project has several modules:

**adress**   # Property information from Newton City GIS data 

**schools**  # School information from Newton City GIS data

**on_sale**  # Information about houses for sale, from www.movoto.com

**finder**   # Combines information from the modules above and populates an SQL database

Let us start with the **schools** module.

It has a **load** function, which loads school information from a geojson file.

Note: the geojson file can be downloaded by calling the **web_load** function.

The **load** function returns a geopandas.geodataframe.GeoDataFrame:

In [31]:
import schools
s = schools.load('downloads')
print(s.head(4))

                   NAME         ADDRESS        TYPE  \
0  Lincoln-Eliot School    191 Pearl St  Elementary   
1     Day Middle School     21 Minot Pl      Middle   
2      Fessenden School  246 Waltham St     Private   
3       Franklin School    125 Derby St  Elementary   

                     geometry  
0  POINT (-71.19402 42.35969)  
1  POINT (-71.21315 42.35813)  
2  POINT (-71.21974 42.35790)  
3  POINT (-71.22865 42.35834)  
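
To make the shape of the input more concrete, here is a minimal sketch of what reading point features from a GeoJSON file involves, using only the standard library. It is not the code of the **schools** module (which returns a GeoDataFrame); the `SAMPLE` text and the `parse_points` helper are hypothetical, and the two records simply reuse values from the output above.

```python
import json

# Hypothetical sample shaped like a GeoJSON FeatureCollection.
# The two records reuse values from the output shown above.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"NAME": "Lincoln-Eliot School", "ADDRESS": "191 Pearl St", "TYPE": "Elementary"},
     "geometry": {"type": "Point", "coordinates": [-71.19402, 42.35969]}},
    {"type": "Feature",
     "properties": {"NAME": "Day Middle School", "ADDRESS": "21 Minot Pl", "TYPE": "Middle"},
     "geometry": {"type": "Point", "coordinates": [-71.21315, 42.35813]}}
  ]
}
"""


def parse_points(text):
    """Return one dict per Point feature, with lon/lat taken from the geometry."""
    rows = []
    for feature in json.loads(text)["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            continue  # this sketch only handles point geometries
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is longitude, latitude
        rows.append({**feature["properties"], "lon": lon, "lat": lat})
    return rows


school_rows = parse_points(SAMPLE)
for row in school_rows:
    print(row["NAME"], row["TYPE"], row["lon"], row["lat"])
```

Note that GeoJSON stores coordinates as longitude first, then latitude, which is why the POINT values above start with -71.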


The **on_sale** module has a very similar API,

but its loading function is called **from_disk** and it uses data from the www.movoto.com website.

The **from_disk** function returns a geopandas.geodataframe.GeoDataFrame:

In [32]:
import on_sale
houses = on_sale.from_disk('downloads')
print(houses[['address', 'listPrice', 'yearBuilt']].head(4))

                address  listPrice  yearBuilt
0       31 Roosevelt Rd    1189000     1935.0
1       284 Melrose Ave    1899900     1910.0
2      401 Dedham St #B    1579000     1980.0
3  21 Beaconwood Rd #21    1450000     2022.0


Please note that a geopandas.geodataframe.GeoDataFrame has location information embedded in it, in its geometry column:

In [33]:
houses.geometry.head(4)
houses[['address', 'geometry']].head(4)

                address                    geometry
0       31 Roosevelt Rd  POINT (-71.19421 42.31371)
1       284 Melrose Ave  POINT (-71.24809 42.34695)
2      401 Dedham St #B  POINT (-71.19830 42.30792)
3  21 Beaconwood Rd #21  POINT (-71.20973 42.33034)


The **finder** module uses this information to calculate the **minimal** distance between a property
and the schools of each **TYPE**:

In [34]:
s['TYPE'].unique()

array(['Elementary', 'Middle', 'Private', 'High', 'Special Ed'],
      dtype=object)
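
The minimal-distance idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the **finder** module's code: it assumes great-circle (haversine) distance in miles, and `haversine_miles`, `nearest_by_type` and `schools_sample` are hypothetical names. The sample holds only the four schools printed earlier, so the numbers it gives will not match the real dataset.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, used by the haversine formula


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_by_type(house, school_list):
    """Map each school type to the distance from the house to the closest school of that type."""
    nearest = {}
    for school_type, lon, lat in school_list:
        distance = haversine_miles(house[0], house[1], lon, lat)
        if school_type not in nearest or distance < nearest[school_type]:
            nearest[school_type] = distance
    return nearest


# (TYPE, longitude, latitude) for the four schools shown earlier
schools_sample = [
    ("Elementary", -71.19402, 42.35969),  # Lincoln-Eliot School
    ("Middle", -71.21315, 42.35813),      # Day Middle School
    ("Private", -71.21974, 42.35790),     # Fessenden School
    ("Elementary", -71.22865, 42.35834),  # Franklin School
]
house = (-71.19421, 42.31371)  # 31 Roosevelt Rd

nearest = nearest_by_type(house, schools_sample)
for school_type, miles in nearest.items():
    print(f"{school_type}: {miles:.2f} miles")
```

Keeping one running minimum per type means the school list is scanned only once per house.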

Once the distance information is calculated, the **finder** module creates an in-memory SQL database
populated with property information and distances to all school types.
This makes it possible to run reasonably complex queries on this dataset.
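
As a rough sketch of how such an in-memory database could be built, the example below uses the standard-library `sqlite3` module. That the project uses SQLite is an assumption, and the `submit` helper here is a hypothetical stand-in, not the **finder** module's function. The table and column names follow the queries in this document, and the three rows are taken from its query results.

```python
import sqlite3

# (address, listPrice, Elementary_school distance in miles)
rows = [
    ("31 Roosevelt Rd", 1189000, 0.5751231650958885),
    ("284 Melrose Ave", 1899900, 0.4375193192225121),
    ("2 Rowe St", 1050000, 0.2349404681438777),
]

# ":memory:" creates a database that lives only as long as the connection
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE on_sale (address TEXT, listPrice INTEGER, Elementary_school REAL)"
)
conn.executemany("INSERT INTO on_sale VALUES (?, ?, ?)", rows)
conn.commit()


def submit(query, params=()):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(query, params).fetchall()


# The ? placeholder keeps the threshold out of the SQL string itself
close_to_school = submit(
    "SELECT address, listPrice FROM on_sale WHERE Elementary_school < ?", (0.3,)
)
print(close_to_school)
```

In SQLite, a column name that contains a space, such as Special Ed_school, has to be wrapped in double quotes inside a query.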

Column names can be obtained by:

In [35]:
import finder
columns = finder.columns()
print(columns)

['address', 'listPrice', 'yearBuilt', 'lotSize', 'sqftTotal', 'bath', 'bed', 'propertyType', 'zipCode', 'Elementary_school', 'Middle_school', 'Private_school', 'High_school', 'Special Ed_school']


Below are a few examples of SQL queries:

In [36]:
finder.submit("SELECT address, listPrice, Elementary_school FROM on_sale LIMIT 5")

[('31 Roosevelt Rd', 1189000, 0.5751231650958885),
 ('284 Melrose Ave', 1899900, 0.4375193192225121),
 ('401 Dedham St #B', 1579000, 0.5497037486841614),
 ('21 Beaconwood Rd #21', 1450000, 0.5700811402079833),
 ('61 Walker St #1', 1299000, 0.7971169504253137)]

Select all houses whose distance from an Elementary school is less than 0.3 miles. (Yes, the distance is converted to miles :) )

In [37]:
finder.submit("SELECT address, listPrice, Elementary_school FROM on_sale WHERE Elementary_school < 0.3")

[('2 Rowe St', 1050000, 0.2349404681438777),
 ('208 Cherry', 859000, 0.14849737276881836)]

Find the 3 most expensive houses on sale:

In [38]:
finder.submit("SELECT address, bed, listPrice FROM on_sale ORDER BY listPrice DESC LIMIT 3")

[('301 Waverley Ave', 9.0, 9500000),
 ('45 Claremont St', 6.0, 6000000),
 ('163 Country Club Rd #163', 7.0, 4498000)]

9 bedrooms for 9.5M dollars is kind of impressive!