# Predicting Redevelopment Potential for Boston Parcels

**Authors:** Milo Margolis, Dhruv Rokkam

**Problem Statement:** This project develops a binary classification model that predicts the redevelopment potential of Boston properties from parcel data. Properties are labeled as high potential based on indicators such as low building-to-land ratios and underutilized floor area ratio (FAR). They are then classified using logistic regression, KNN, and decision trees, with appropriate training, validation, and hyperparameter tuning to demonstrate how overfitting can be prevented.
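
To make the labeling idea concrete, here is a minimal sketch of how a high-potential flag might be derived from the two indicators named above. The column names (`building_value`, `land_value`, `built_far`, `allowed_far`), the 0.5 thresholds, the reading of the building-to-land ratio as a ratio of assessed values, and the choice to require both conditions are all illustrative assumptions; they are not taken from the dataset or from the final labeling rule.

```python
import numpy as np
import pandas as pd


def label_redevelopment_potential(parcels, max_value_ratio=0.5, max_far_utilization=0.5):
    """Return a 0/1 Series; 1 marks parcels flagged as high potential.

    Column names and default thresholds are hypothetical placeholders.
    """
    # Treat zero denominators as missing so the ratios become NaN rather than inf.
    land_value = parcels["land_value"].replace(0, np.nan)
    allowed_far = parcels["allowed_far"].replace(0, np.nan)

    # Indicator 1: building value relative to land value (assumed definition).
    building_to_land = parcels["building_value"] / land_value
    # Indicator 2: share of the permitted FAR that is actually built.
    far_utilization = parcels["built_far"] / allowed_far

    # Comparisons against NaN evaluate to False, so parcels with
    # missing inputs receive label 0 in this sketch.
    high_potential = (building_to_land < max_value_ratio) & (
        far_utilization < max_far_utilization
    )
    return high_potential.astype(int)


# Toy parcels, invented purely for illustration.
toy_parcels = pd.DataFrame(
    {
        "building_value": [50_000, 900_000, 120_000, 80_000],
        "land_value": [400_000, 300_000, 0, 500_000],
        "built_far": [0.4, 2.8, 0.5, 1.9],
        "allowed_far": [2.0, 3.0, 2.0, 2.0],
    }
)
toy_parcels["high_potential"] = label_redevelopment_potential(toy_parcels)
print(toy_parcels)
```

In this toy example only the first parcel is flagged: it has both a low building-to-land ratio and low FAR utilization. Because a label built this way is a deterministic function of its input columns, those columns would likely need to be kept out of the feature set to avoid leaking the label into the classifiers.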


In [None]:
# import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score, 
                            f1_score, confusion_matrix, roc_curve, auc)
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# set the random seed 
np.random.seed(42)

# set the plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")


## Section 2: Data Cleaning


In [None]:
# load the csv 
data_path = Path("../data/raw/boston_properties.csv")
df = pd.read_csv(data_path)

# show the dataset's shape, column dtypes, and non-null counts
print("Dataset Shape:")
print(df.shape)
print("\n" + "="*50)
print("Dataset Info:")
print("="*50)
df.info()
