
# **Coding Exercise - ML Basics**

# **Part 1 - Predict house prices based on square footage and location:**

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
# Load sample data (100+ rows, generated via ChatGPT prompts)
data = 'https://raw.githubusercontent.com/KennethLengo/MISTEST2026/refs/heads/main/housing_with_location.csv'
df = pd.read_csv(data)

# Features and target
X = df[['square_footage', 'location']]
y = df['price']

# Preprocessing: One-hot encode the location column
preprocessor = ColumnTransformer(
    transformers=[
        ('location', OneHotEncoder(sparse_output=False), ['location'])
    ],
    remainder='passthrough'
)

# Create pipeline with preprocessing and model
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', LinearRegression())
])

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model.fit(X_train, y_train)

# Predict prices for a new 2000 sq ft house in each location (Downtown, Suburb, Rural)
new_houses = pd.DataFrame({
    'square_footage': [2000, 2000, 2000],
    'location': ['Downtown', 'Suburb', 'Rural'],
})
predicted_prices = model.predict(new_houses)
for location, price in zip(new_houses['location'], predicted_prices):
    print(f"Predicted price for a 2000 sq ft house in {location}: ${price:,.2f}")

# Display model coefficients
# One-hot columns come first, followed by the passthrough square_footage column
encoder = model.named_steps['preprocessor'].named_transformers_['location']
feature_names = encoder.get_feature_names_out(['location']).tolist() + ['square_footage']
coefficients = model.named_steps['regressor'].coef_
print("\nModel Coefficients:")
for feature, coef in zip(feature_names, coefficients):
    print(f"{feature}: {coef:.2f}")

# Explanations and interpretations of the coefficients
print('''\nThis model predicts that a downtown location has the strongest positive effect on the price, with a model coefficient of 1709.61. This means that a downtown house is predicted to cost
about $1,709.61 more than the average of the three location effects (about $1,748 more than a comparable suburb house), suggesting higher demand and prices downtown.''')
print('''This model predicts that the rural location has the strongest negative effect on the price, with a model coefficient of -1670.88. This means that a rural house is predicted to cost
about $1,670.88 less than the average of the three location effects (about $3,380 less than a comparable downtown house), suggesting lower demand and prices in rural areas.''')
print('This model predicts that the suburb location has the most minimal effect on price, with a model coefficient of -38.72. This means that a suburb house would be close in price to the average house prices.')
print('\nThe square foot coefficient implies that each additional square foot is going to cost $230.28, regardless of the location.')
print('\nThe improvement made includes displaying the predicted prices for each type of home, allowing for effective comparison across locations.')




Predicted price for a 2000 sq ft house in Downtown: $465,920.63
Predicted price for a 2000 sq ft house in Suburb: $464,172.31
Predicted price for a 2000 sq ft house in Rural: $462,540.15

Model Coefficients:
location_Downtown: 1709.61
location_Rural: -1670.88
location_Suburb: -38.72
square_footage: 230.28

This model predicts that a downtown location has the strongest positive effect on the price, with a model coefficient of 1709.61. This means that a downtown house is predicted to cost
about $1,709.61 more than the average of the three location effects (about $1,748 more than a comparable suburb house), suggesting higher demand and prices downtown.
This model predicts that the rural location has the strongest negative effect on the price, with a model coefficient of -1670.88. This means that a rural house is predicted to cost
about $1,670.88 less than the average of the three location effects (about $3,380 less than a comparable downtown house), suggesting lower demand and prices in rural areas.
This model predicts that the suburb location has the most minimal effect on price, with a model coefficient of -38.72. This means that a suburb house would be close 