# Housing Price Prediction: King County Dataset 🏡

This project was completed as part of a Data Analysis course on Coursera. The goal is to predict house sale prices from features such as square footage, number of bedrooms, and number of floors.

The dataset contains house sale prices for King County, Washington, which includes Seattle. It covers homes sold between May 2014 and May 2015.

## Tools Used
- Python
- Pandas, NumPy
- Seaborn, Matplotlib
- Scikit-learn

---


### Question 1
Display the data type of each column.

In [None]:
df.dtypes

### Question 2
Drop the `id` and `Unnamed: 0` columns, then display summary statistics for the remaining columns.

In [None]:
df.drop(["id", "Unnamed: 0"], axis=1, inplace=True)
df.describe()

### Question 3
Count the number of houses for each unique `floors` value and convert the result to a DataFrame.

In [None]:
df['floors'].value_counts().to_frame()

### Question 4
Draw a boxplot comparing `price` for houses with and without a waterfront view.

In [None]:
import seaborn as sns
sns.boxplot(x='waterfront', y='price', data=df)

### Question 5
Draw a regression plot of `price` against `sqft_above`.

In [None]:
sns.regplot(x='sqft_above', y='price', data=df)

### Question 6
Fit a linear regression that predicts `price` from `sqft_living` and report R² on the data used for fitting.

In [None]:
from sklearn.linear_model import LinearRegression
X = df[['sqft_living']]
y = df['price']
model = LinearRegression()
model.fit(X, y)
model.score(X, y)
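
For a regressor, `score` returns the coefficient of determination, R². As a minimal sketch on synthetic data (the numbers and the names `sqft_demo`, `price_demo` are made up for illustration and are not taken from the King County dataset), the same value can be reproduced by hand from the residual and total sums of squares:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for sqft_living and price (assumed values, not real data).
rng = np.random.default_rng(0)
sqft_demo = rng.uniform(500, 4000, size=200).reshape(-1, 1)
price_demo = 280.0 * sqft_demo.ravel() + rng.normal(0, 50_000, size=200)

demo_model = LinearRegression().fit(sqft_demo, price_demo)

# R^2 = 1 - SS_res / SS_tot, computed by hand.
pred = demo_model.predict(sqft_demo)
ss_res = np.sum((price_demo - pred) ** 2)
ss_tot = np.sum((price_demo - price_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The same quantity as reported by scikit-learn.
r2_sklearn = demo_model.score(sqft_demo, price_demo)
print(r2_manual, r2_sklearn)
```

A value near 1 means the model explains most of the variance in the target; a value near 0 means it does little better than always predicting the mean.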

### Question 7
Fit a linear regression on multiple features, replacing missing values with the column mean, and report R² on the data used for fitting.

In [None]:
features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view", "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]
from sklearn.impute import SimpleImputer
X = SimpleImputer(strategy='mean').fit_transform(df[features])
y = df['price']
model = LinearRegression()
model.fit(X, y)
model.score(X, y)

### Question 8
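Build a pipeline that imputes missing values, scales the features, adds polynomial features, and fits a linear regression. Report R² on the data used for fitting.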

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scale', StandardScaler()),
    ('polynomial', PolynomialFeatures(include_bias=False)),
    ('model', LinearRegression())
])
X = df[features]
y = df['price']
pipeline.fit(X, y)
pipeline.score(X, y)
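
The score above is measured on the same rows the pipeline was fitted on, so it can be optimistic. One possible check, sketched here on synthetic data with hypothetical names (`demo_pipeline`, `X_demo`, `y_demo`), is to pass the same kind of pipeline to `cross_val_score`, which refits it on each fold and scores it on the held-out rows:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a quadratic term and a few missing values (assumed, not real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = 2.0 * X_demo[:, 0] + X_demo[:, 1] ** 2 + rng.normal(scale=0.1, size=300)
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan

demo_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scale', StandardScaler()),
    ('polynomial', PolynomialFeatures(include_bias=False)),
    ('model', LinearRegression()),
])

# Five-fold cross-validation: one R^2 value per held-out fold.
scores = cross_val_score(demo_pipeline, X_demo, y_demo, cv=5)
print(scores.mean())
```

Because every preprocessing step lives inside the pipeline, the imputer and scaler are refitted on each training fold and never see the held-out rows.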

### Question 9
Fit a Ridge regression (`alpha=0.1`) on a training split and report R² on the held-out test set (15% of the data).

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
X = df[features]
y = df['price']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=1)
# Fit the imputer on the training split only, then reuse it for the test split.
imputer = SimpleImputer(strategy='mean')
x_train = imputer.fit_transform(x_train)
x_test = imputer.transform(x_test)
ridge = Ridge(alpha=0.1)
ridge.fit(x_train, y_train)
ridge.score(x_test, y_test)
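
An alternative to imputing the two splits by hand, sketched here on synthetic data with hypothetical names (`ridge_pipe`, `X_demo`, `y_demo`), is to put the imputer and the Ridge model in a single pipeline. Calling `fit` on the training split then learns the column means from that split only, and `score` applies those same means to the test split:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic linear data with a few missing values (assumed, not real data).
rng = np.random.default_rng(1)
X_demo = rng.normal(loc=5.0, size=(400, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 1.0]) + rng.normal(scale=0.2, size=400)
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan

xd_train, xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.15, random_state=1
)

ridge_pipe = make_pipeline(SimpleImputer(strategy='mean'), Ridge(alpha=0.1))
ridge_pipe.fit(xd_train, yd_train)

# The imputer's learned statistics are the training-split column means.
train_means = np.nanmean(xd_train, axis=0)
learned_means = ridge_pipe.named_steps['simpleimputer'].statistics_

test_r2 = ridge_pipe.score(xd_test, yd_test)
print(test_r2)
```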

### Question 10
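Apply a second-order polynomial transform to the training and test data from Question 9, fit a Ridge regression (`alpha=0.1`), and report R² on the test set.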

In [None]:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
x_train_poly = poly.fit_transform(x_train)
x_test_poly = poly.transform(x_test)
ridge_poly = Ridge(alpha=0.1)
ridge_poly.fit(x_train_poly, y_train)
ridge_poly.score(x_test_poly, y_test)
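
To get a feel for how much a degree-2 transform widens the design matrix, the short sketch below counts the output columns for 11 input features, the length of the `features` list. The array `dummy` is a placeholder introduced only for illustration; its values are irrelevant because `fit` only inspects the number of columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features = 11  # length of the `features` list used in this project
dummy = np.zeros((1, n_features))  # placeholder input, values do not matter

with_bias = PolynomialFeatures(degree=2).fit(dummy)
without_bias = PolynomialFeatures(degree=2, include_bias=False).fit(dummy)

# 1 bias + 11 linear + 11 squared + 55 pairwise interaction terms
print(with_bias.n_output_features_)     # 78
# Same expansion without the constant column
print(without_bias.n_output_features_)  # 77
```

With this many correlated columns, the Ridge penalty (`alpha`) helps keep the coefficients from growing unchecked.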