NYC Property Sales

by: Aviv Farag, Joseph Logan, Abdulaziz Alquzi

Abstract:

We are data science consultants contracted by property management investors in New York City. Their company, backed by investors, wants to buy residential real estate in NYC at the lowest possible price, renovate it, and resell it within a year. The renovation analysis is outside the scope of this project, but they want a baseline model that predicts the price of residential real estate in order to:

1. Identify potentially undervalued listed properties to buy
2. Predict the market price when it is time to sell, so they can sell quickly while maximizing return on investment

Because they want to renovate and sell the properties quickly, they are only interested in properties with fewer than 10 residential units and a price of at least $10,000 and less than $5 million.
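The selection criteria above can be sketched as a pandas filter. This is a minimal illustration, not the notebook's actual code: the column names "RESIDENTIAL UNITS" and "SALE PRICE" and the helper filter_target_properties are assumptions, and the sample rows are made up.

```python
import pandas as pd

# Assumed column names; adjust them to match the actual dataset.
UNITS_COL = "RESIDENTIAL UNITS"
PRICE_COL = "SALE PRICE"


def filter_target_properties(df: pd.DataFrame) -> pd.DataFrame:
    """Keep sales with fewer than 10 residential units and a price
    of at least 10,000 and less than 5,000,000."""
    # Coerce to numbers so that non-numeric placeholders become NaN.
    units = pd.to_numeric(df[UNITS_COL], errors="coerce")
    price = pd.to_numeric(df[PRICE_COL], errors="coerce")
    # Comparisons against NaN are False, so unparseable rows are dropped.
    mask = (units < 10) & (price >= 10_000) & (price < 5_000_000)
    return df[mask]


# Made-up rows: only the first one satisfies every criterion.
sample = pd.DataFrame({
    UNITS_COL: [1, 12, 3, 2],
    PRICE_COL: ["450000", "900000", "7500000", " -  "],
})
filtered = filter_target_properties(sample)
print(filtered)
```

Coercing with errors="coerce" is a design choice for this sketch: rows whose price cannot be parsed are excluded rather than raising an error.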

Python Packages:

pandas
import pandas as pd
numpy
import numpy as np
matplotlib.pyplot
import matplotlib.pyplot as plt
joblib
import joblib
seaborn
import seaborn as sns
scipy.stats.randint
from scipy.stats import randint

sklearn:

sklearn.metrics:
1. mean_squared_error
2. mean_absolute_error
3. r2_score
4. confusion_matrix
sklearn.ensemble:
1. RandomForestRegressor
2. BaggingRegressor
sklearn.model_selection:
1. train_test_split
2. GridSearchCV
3. RandomizedSearchCV
4. cross_validate
5. KFold
sklearn.preprocessing:
1. StandardScaler
2. OneHotEncoder
3. RobustScaler
sklearn.linear_model: LinearRegression
sklearn.tree: DecisionTreeRegressor, ExtraTreeRegressor
sklearn.pipeline: Pipeline
sklearn.compose: ColumnTransformer
sklearn.decomposition: PCA
sklearn.dummy: DummyRegressor

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, BaggingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from scipy.stats import randint
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.dummy import DummyRegressor
from sklearn.tree import ExtraTreeRegressor
from sklearn.model_selection import cross_validate, KFold
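As a rough sketch of how several of these classes fit together, the example below combines ColumnTransformer, Pipeline, and RandomForestRegressor. The feature names (gross_sqft, year_built, borough) and the synthetic data are assumptions made for illustration; they are not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical feature names and synthetic values, for illustration only.
numeric_features = ["gross_sqft", "year_built"]
categorical_features = ["borough"]

rng = np.random.default_rng(42)
X = pd.DataFrame({
    "gross_sqft": rng.integers(500, 5000, size=60),
    "year_built": rng.integers(1900, 2020, size=60),
    "borough": rng.choice(["Bronx", "Brooklyn", "Queens"], size=60),
})
y = 200 * X["gross_sqft"] + rng.normal(0, 10_000, size=60)

# Scale numeric columns and one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", RobustScaler(), numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

# Chain preprocessing and the regressor into a single estimator.
pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestRegressor(n_estimators=50, random_state=42)),
])
pipe.fit(X, y)
predictions = pipe.predict(X.head(3))
print(predictions)
```

Keeping preprocessing inside the pipeline means the scaler and encoder are fitted only on the data passed to fit, which matters once the pipeline is used inside cross-validation or a parameter search.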

Functions

random_SCV(pipe = [], grid_param = [], n_iter = 10, cv = 5, scoring = 'neg_mean_squared_error', rnd_state = 42, file_name = "", training = [])
Runs RandomizedSearchCV on the estimator "pipe" using the parameters in grid_param and the remaining arguments. training is a list containing x_training and y_training. The results are saved in the param_tuning folder, in the file named file_name.
grid_SCV(pipe = [], grid_param = [], cv = 5, scoring = 'neg_mean_squared_error', file_name = "", training = [])
Same as random_SCV, but runs GridSearchCV on the estimator "pipe".
wr_pkl_file(file_name = "",content = "", read = False)
Reads or writes a pkl file that stores machine learning pipelines together with their corresponding results.
print_results(labels = [], est = [], plt_num = 50, log = False, testing = [])
Predicts sale prices and prints the results (R-squared, MAE, and RMSE) for each estimator in est.
validation(models = [], estimators = [], training = [], cv = 5, train_score = False)
Performs cross-validation for the given models using their estimators and the training set.
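The workflow these helpers describe can be sketched with the underlying library calls. The example below is not the notebook's implementation: it uses synthetic data, a hypothetical parameter space, a hypothetical report helper, and assumes joblib is used for the pkl file. It runs a randomized search, saves and reloads the result, prints R-squared, MAE, and RMSE against a DummyRegressor baseline, and cross-validates the best parameters.

```python
import os
import tempfile

import joblib
import numpy as np
from scipy.stats import randint
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import (KFold, RandomizedSearchCV,
                                     cross_validate, train_test_split)

# Synthetic regression data standing in for the prepared sales features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Randomized search over a small, hypothetical parameter space.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions={"n_estimators": randint(20, 60),
                         "max_depth": randint(2, 8)},
    n_iter=5, cv=5, scoring="neg_mean_squared_error", random_state=42,
)
search.fit(X_train, y_train)

# Save the fitted search to a pkl file and read it back (assumes joblib).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf_search.pkl")
    joblib.dump(search, path)
    restored = joblib.load(path)


def report(label, estimator):
    """Hypothetical helper: print R-squared, MAE, and RMSE on the test set."""
    pred = estimator.predict(X_test)
    scores = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    }
    print(label, scores)
    return scores


# Compare the tuned model with a mean-predicting baseline.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_scores = report("dummy", baseline)
forest_scores = report("random forest", restored)

# Cross-validate a fresh model built from the best parameters found.
cv_results = cross_validate(
    RandomForestRegressor(**search.best_params_, random_state=42),
    X_train, y_train,
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
    scoring="neg_mean_squared_error", return_train_score=False,
)
print(cv_results["test_score"])
```

With scoring set to 'neg_mean_squared_error', scores are negated MSE values, so values closer to zero indicate a better fit.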

Setup and running the code:

Clone the repo by running the following command in a terminal:
git clone https://github.com/avivfaraj/DSCI631-project.git

After cloning the repo, open Final Project.ipynb and run the cells one at a time, in the order presented. Alternatively, run the whole notebook in a single step from the menu: Cell -> Run All.

The first two sections, packages and functions, are required by the rest of the code. Make sure to run both before running anything else.

Acknowledgements

The dataset was found on Kaggle.
The data originates from the NYC Department of Finance Rolling Sales.
