## Imports

In [56]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

In [44]:
listings_scaled = pd.read_csv('../data/scaled_encoded_data.csv')

## Baseline

In [45]:
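# Mean of the training target. y_train and y_test come from the
# train_test_split cell in the Gradient Boosting section, which was run first.
target_mean = np.mean(y_train)

# Baseline prediction: the same constant for every test listing
baseline_predictions = np.full_like(y_test, target_mean)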

# Evaluate the baseline
baseline_mse = mean_squared_error(y_test, baseline_predictions)
baseline_mae = mean_absolute_error(y_test, baseline_predictions)

print("Baseline MSE:", baseline_mse)
print("Baseline MAE:", baseline_mae)


Baseline MSE: 0.37874735662089654
Baseline MAE: 0.24130426677897318


**The cell above computes a baseline for predicting Airbnb daily prices in Broward County. Every listing in the test set receives the same prediction: the mean of the target in the training set. (The median would be a more outlier-robust alternative, but it is not what the code uses.) Because `price` comes from the scaled dataset, the resulting MSE (0.379) and MAE (0.241) are in scaled units rather than dollars. They serve as the benchmark for the models below: a model that cannot beat these numbers is not learning any meaningful pattern from the features and should be revised.**
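The same kind of constant baseline can also be produced with scikit-learn's `DummyRegressor`, which makes it easy to compare the mean and median strategies side by side. The sketch below runs on synthetic data; the `*_demo` names and the random target are assumptions for illustration, not the notebook's listings.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Hypothetical stand-in for the scaled price target (not the real data)
rng = np.random.default_rng(0)
y_train_demo = rng.normal(size=200)
y_test_demo = rng.normal(size=50)

# DummyRegressor ignores the features, so a placeholder column is enough
X_train_demo = np.zeros((len(y_train_demo), 1))
X_test_demo = np.zeros((len(y_test_demo), 1))

baseline_scores = {}
for strategy in ("mean", "median"):
    dummy = DummyRegressor(strategy=strategy).fit(X_train_demo, y_train_demo)
    pred = dummy.predict(X_test_demo)  # one constant, repeated for every row
    baseline_scores[strategy] = (
        mean_squared_error(y_test_demo, pred),
        mean_absolute_error(y_test_demo, pred),
    )
    print(strategy, baseline_scores[strategy])
```

Whichever strategy is chosen, the constant is learned from the training target only, so the test set stays untouched until evaluation.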

## Gradient Boosting

In [30]:
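# Select the features and target variable
# NOTE: listings_scaled.columns already contains 'accommodates' and 'price',
# so this list duplicates 'accommodates' and includes the target itself
features = ['accommodates'] + listings_scaled.columns.tolist()
target = 'price'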


X = listings_scaled[features]
y = listings_scaled[target]


In [31]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [35]:
# Handle missing values
imputer = SimpleImputer()
X_train_imputed = imputer.fit_transform(X_train)
X_test_imputed = imputer.transform(X_test)

In [36]:
# Train the model
model = GradientBoostingRegressor()
model.fit(X_train_imputed, y_train)

In [38]:
# Predict on the test set
y_pred = model.predict(X_test_imputed)
y_pred

array([-0.26621971,  0.35730649, -0.06532729, ...,  0.14662266,
        0.10948189, -0.20997648])

In [39]:
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Mean Absolute Error:", mae)

Mean Squared Error: 0.008801991350739722
Mean Absolute Error: 0.0066841123553667055


**On the test set the model reaches an MSE of 0.0088 and an MAE of 0.0067, against a baseline MSE of 0.379 and MAE of 0.241, all in scaled price units. Taken at face value, that is a reduction of roughly 98% in MSE and 97% in MAE. These figures overstate real predictive performance, however: `features` is built from every column of `listings_scaled`, so it contains `price` itself, and the model can read the target directly from its inputs. The results show that the workflow runs end to end, but the model should be retrained with `price` removed from the features before its accuracy is judged.**
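One way to build the feature list so that the target never ends up among the inputs is sketched below. `select_features` is a hypothetical helper, and the miniature DataFrame (including the `bedrooms` column) is made up for illustration; only `accommodates` and `price` are names taken from the notebook.

```python
import pandas as pd

def select_features(df: pd.DataFrame, target: str) -> list:
    """Return each column name once, excluding the target column."""
    return [col for col in dict.fromkeys(df.columns) if col != target]

# Hypothetical miniature of listings_scaled
demo = pd.DataFrame({
    "accommodates": [0.1, -0.3, 1.2],
    "bedrooms": [0.0, -0.5, 0.9],
    "price": [0.2, -0.4, 1.1],
})

features_demo = select_features(demo, target="price")
X_demo = demo[features_demo]   # inputs without the target
y_demo = demo["price"]         # target kept separate
print(features_demo)           # ['accommodates', 'bedrooms']
```

Applied to the real data, this would correspond to something like `features = select_features(listings_scaled, target)`, after which the split, imputation, and training cells can be rerun unchanged.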

## LassoCV

In [58]:
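# NOTE: as in the Gradient Boosting section, this list duplicates
# 'accommodates' and includes the target column 'price'
features = ['accommodates'] + listings_scaled.columns.tolist()
target = 'price'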


X = listings_scaled[features]
y = listings_scaled[target]


In [59]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [60]:
# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [61]:
# Impute missing values
imputer = SimpleImputer()
X_train_imputed = imputer.fit_transform(X_train_scaled)
X_test_imputed = imputer.transform(X_test_scaled)


In [62]:
# Train the LassoCV model
model = LassoCV(cv=5)
model.fit(X_train_imputed, y_train)
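The scale, impute, and fit steps above could also be chained into a single scikit-learn pipeline, so that the transformations fitted on the training data are automatically reused at prediction time. This is a minimal sketch on synthetic data, not the notebook's actual workflow; the `*_demo` arrays, the injected missing values, and the 160/40 split are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data: the target depends on the first two columns
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 4))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# Knock out about 5% of the feature values to mimic missing data
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan

lasso_pipeline = make_pipeline(
    StandardScaler(),   # ignores NaNs when fitting and passes them through
    SimpleImputer(),    # default strategy: replace NaNs with the column mean
    LassoCV(cv=5),      # picks the regularization strength by 5-fold CV
)

# Fit on the first 160 rows, evaluate on the remaining 40
lasso_pipeline.fit(X_demo[:160], y_demo[:160])
r2_demo = lasso_pipeline.score(X_demo[160:], y_demo[160:])
print("Held-out R^2:", r2_demo)
```

Keeping the three steps in one object removes the need to track separate `*_scaled` and `*_imputed` arrays and makes it harder to apply a transformation to the test set that was not fitted on the training set.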

In [64]:
# Make predictions on the test set
y_pred = model.predict(X_test_imputed)

In [65]:
# Evaluate metrics
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Mean Absolute Error:", mae)

Mean Squared Error: 3.787473566206398e-07
Mean Absolute Error: 0.00024130426677888926


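**LassoCV reports a test MSE of 3.79e-07 and an MAE of 2.41e-04, several orders of magnitude below the baseline (0.379 and 0.241). As in the Gradient Boosting section, the feature matrix includes the `price` column, so these numbers mostly reflect the model recovering the target from its own copy among the inputs rather than predicting it from listing attributes. They should not be read as evidence of accuracy until the model is refit with `price` excluded from the features.**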
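A quick arithmetic check on the recorded outputs points to leakage of the target into the features. The snippet below only divides numbers copied from this notebook; the interpretation that follows it is an inference, not something the notebook confirms.

```python
# Values copied from the outputs recorded in this notebook
baseline_mse = 0.37874735662089654
baseline_mae = 0.24130426677897318
lasso_mse = 3.787473566206398e-07
lasso_mae = 0.00024130426677888926

# How much smaller are the LassoCV errors than the baseline errors?
mse_ratio = lasso_mse / baseline_mse
mae_ratio = lasso_mae / baseline_mae
print(mse_ratio, mae_ratio)
```

The ratios come out at almost exactly 1e-06 and 1e-03. That pattern is what one would expect if every residual were the baseline residual multiplied by 0.001, that is, if the model predicted `price` from the `price` column with a coefficient shrunk to about 0.999 of its unregularized value. The factor matches LassoCV's default `eps=1e-3`, which sets the smallest regularization strength on its path to one thousandth of the largest, so the likely explanation is that cross-validation selected the weakest penalty available and the model reproduced the leaked target almost perfectly.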



