**NAMES: Mizeromahire Celse**

**ID: 25599**


# Python Practical Quiz
---

### Question:

- Build an ML model to predict housing prices
- The model to use is **Linear Regression**

### Answer

The first step, as always, is to import the libraries used to train this model.

In [4]:
# Step 1: Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

**Task 1**: With the help of AI, I created a small dataset of house sizes and prices to use in training this model.

In [5]:
# Step 2: Create a simple, reasonable dataset with house sizes and prices
data = {
    'Size(sq ft)': [1200, 1500, 1800, 2100, 2400, 1350, 1650, 1950, 2250, 2700, 1100, 1700, 2000, 2300, 2600],
    'Price($)': [220000, 265000, 312000, 345000, 410000, 245000, 285000, 330000, 375000, 450000, 210000, 295000, 335000, 390000, 425000]
}

# Convert data into DataFrame
df = pd.DataFrame(data)

# Display dataset
print("House Price Dataset:")
print(df)

num_rows = len(df)
print("\n")
print(f"Number of rows: {num_rows}")

House Price Dataset:
    Size(sq ft)  Price($)
0          1200    220000
1          1500    265000
2          1800    312000
3          2100    345000
4          2400    410000
5          1350    245000
6          1650    285000
7          1950    330000
8          2250    375000
9          2700    450000
10         1100    210000
11         1700    295000
12         2000    335000
13         2300    390000
14         2600    425000


Number of rows: 15


**Task 2**: The next task is to define the feature and target variables:
- The feature variable is the house size (`Size(sq ft)`)
- The target variable is the price (`Price($)`)

In [10]:
# Step 3: Define feature and target variables
X = df[['Size(sq ft)']]
y = df['Price($)']
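
The double brackets in the feature selection matter: scikit-learn estimators expect a 2-D feature matrix and a 1-D target. A minimal check, using only the first three rows of the dataset for brevity (the names `df_demo`, `X_demo`, and `y_demo` are illustrative and not part of the notebook), shows the difference:

```python
import pandas as pd

# First three rows of the dataset above, kept short for illustration
df_demo = pd.DataFrame({
    'Size(sq ft)': [1200, 1500, 1800],
    'Price($)': [220000, 265000, 312000],
})

X_demo = df_demo[['Size(sq ft)']]  # double brackets -> DataFrame (2-D)
y_demo = df_demo['Price($)']       # single brackets -> Series (1-D)

print(type(X_demo).__name__, X_demo.shape)
print(type(y_demo).__name__, y_demo.shape)
```

Selecting the feature with single brackets would give a 1-D Series, which `LinearRegression.fit` rejects as a feature matrix.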


**Task 3:** The next task is to split the dataset into training and testing sets using an 80/20 ratio. With 15 rows, this leaves 12 rows for training and 3 for testing.

In [11]:
# Step 4: Split dataset into training and testing sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
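
As a quick sanity check on the split, the sketch below rebuilds the same 15-row dataset and confirms how many rows land in each set. The variable names (`X_chk`, `y_chk`, and so on) are illustrative only, chosen to avoid clashing with the notebook's own variables.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same 15 sizes and prices as the dataset created in Step 2
sizes = [1200, 1500, 1800, 2100, 2400, 1350, 1650, 1950, 2250, 2700,
         1100, 1700, 2000, 2300, 2600]
prices = [220000, 265000, 312000, 345000, 410000, 245000, 285000, 330000,
          375000, 450000, 210000, 295000, 335000, 390000, 425000]

X_chk = pd.DataFrame({'Size(sq ft)': sizes})
y_chk = pd.Series(prices, name='Price($)')

# test_size=0.2 on 15 rows rounds up to 3 test rows
X_tr, X_te, y_tr, y_te = train_test_split(
    X_chk, y_chk, test_size=0.2, random_state=42
)

print(f"Training rows: {len(X_tr)}, test rows: {len(X_te)}")
```

Fixing `random_state` makes the shuffle reproducible, so the same rows are held out on every run.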

**Task 4**: After splitting the dataset, we train the model using linear regression.

In [12]:
# Step 5: Train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

Here we display the coefficient and the intercept.
- **The coefficient** represents how much the predicted house price increases for each additional square foot
- **The intercept** represents the "base price" of a house with 0 square feet (a theoretical value)

In [13]:
# Display model parameters
print(f"Model Coefficient (slope): ${model.coef_[0]:.2f} per square foot")
print(f"Model Intercept: ${model.intercept_:.2f}")
print(f"Formula: Price = ${model.intercept_:.2f} + ${model.coef_[0]:.2f} * Size")
print("\n")

Model Coefficient (slope): $147.82 per square foot
Model Intercept: $43930.27
Formula: Price = $43930.27 + $147.82 * Size
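
To see the formula in action, it can be applied by hand. The sketch below copies the rounded parameters from the printed output, so results may differ from `model.predict` by a few dollars; `predict_price` is a hypothetical helper, not part of the notebook.

```python
# Parameters copied from the printed output above (rounded to 2 decimals)
intercept = 43930.27
slope = 147.82


def predict_price(size_sqft):
    """Apply Price = intercept + slope * Size by hand."""
    return intercept + slope * size_sqft


# A 2,000 sq ft house: 43930.27 + 147.82 * 2000
print(f"${predict_price(2000):,.2f}")
```

For comparison, the dataset lists a 2,000 sq ft house at $335,000, so the hand-computed estimate lands within a few thousand dollars of it.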




**Task 5**: Here we use the trained model to predict prices for the test set, then compare the predictions with the actual prices to see how far off the model was.

In [14]:
# Step 6: Make predictions on the test set
y_pred = model.predict(X_test)

# Display predictions vs actual values with rounded values
comparison = pd.DataFrame({
    'Size(sq ft)': X_test['Size(sq ft)'],
    'Actual Price($)': y_test,
    'Predicted Price($)': np.round(y_pred, 2),  # Round to 2 decimal places
    'Difference($)': y_test - np.round(y_pred, 2)  # Actual minus predicted
})

print("Predictions vs Actual Values:")
print(comparison)
print("\n")

Predictions vs Actual Values:
    Size(sq ft)  Actual Price($)  Predicted Price($)  Difference($)
9          2700           450000           443041.54        6958.46
11         1700           295000           295222.55        -222.55
0          1200           220000           221313.06       -1313.06




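**Task 6**: Mean Absolute Error (MAE) and Mean Squared Error (MSE) are common error metrics for evaluating how well the model performs. MAE is expressed in dollars, while MSE is expressed in squared dollars, so the `$` printed before the MSE is only a formatting carry-over.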

In [15]:
# Step 7: Evaluate model performance
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)


print("Model Evaluation Metrics:")
print(f"Mean Absolute Error: ${mae:.2f}")
print(f"Mean Squared Error: ${mse:.2f}")

print("\n")


Model Evaluation Metrics:
Mean Absolute Error: $2831.36
Mean Squared Error: $16731256.62
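
Because the MSE is in squared dollars, its square root (RMSE) is often easier to read next to the MAE. The sketch below recomputes the metrics by hand from the three rows of the comparison table; since it uses the rounded predictions printed there, the MSE can differ slightly from the value reported by `mean_squared_error`. The `*_check` names are illustrative.

```python
import numpy as np

# Actual and predicted prices copied from the comparison table above
actual = np.array([450000, 295000, 220000])
predicted = np.array([443041.54, 295222.55, 221313.06])

errors = actual - predicted

mae_check = np.mean(np.abs(errors))   # average absolute error, in dollars
mse_check = np.mean(errors ** 2)      # average squared error, in dollars squared
rmse_check = np.sqrt(mse_check)       # back in dollars

print(f"MAE:  ${mae_check:.2f}")
print(f"RMSE: ${rmse_check:.2f}")
```

The RMSE comes out at roughly $4,090, higher than the MAE because squaring gives extra weight to the largest miss (the 2,700 sq ft house).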




## Task 7: Possible Improvement
---
To improve this model, we could collect and incorporate additional feature variables such as *location, number of bedrooms/bathrooms, age of the house, and neighborhood characteristics*. Because house prices depend on many factors beyond size, these variables would likely yield more accurate predictions.


### Deploying

In [17]:
import joblib

# Save the trained model to disk so it can be reloaded later
joblib.dump(model, 'house_price_model.joblib')



['house_price_model.joblib']
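
A saved model is only useful once it is loaded again. The sketch below shows one way this could look, assuming the file is read back with `joblib.load`. To stay self-contained it refits a small model on four rows of the dataset and writes to a temporary directory; `demo_model`, `loaded_model`, and `new_house` are illustrative names.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# Refit on a few rows from the dataset so this sketch runs on its own
train = pd.DataFrame({
    'Size(sq ft)': [1500, 1800, 2100, 2400],
    'Price($)': [265000, 312000, 345000, 410000],
})
demo_model = LinearRegression().fit(train[['Size(sq ft)']], train['Price($)'])

# Save to a temporary directory, then load the file back
path = os.path.join(tempfile.mkdtemp(), 'house_price_model.joblib')
joblib.dump(demo_model, path)
loaded_model = joblib.load(path)

# Use the same column name as in training to predict for a new house
new_house = pd.DataFrame({'Size(sq ft)': [2000]})
print(loaded_model.predict(new_house))
```

Files written by `joblib.dump` are pickle-based, so they should only be loaded from trusted sources, ideally with the same scikit-learn version that produced them.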