# Housing Agency Data Science Project

This project analyzes housing data to identify key factors affecting house prices and build a predictive model.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


## Data Collection
For this project, we use a small synthetic housing dataset of 10 rows with four columns: `Area`, `Bedrooms`, `Age`, and `Price`.

In [None]:
# Create a simulated dataset (because no data file is available)
data = {
    'Area': [750, 800, 850, 900, 950, 1000, 1100, 1200, 1500, 2000],
    'Bedrooms': [1, 2, 2, 3, 3, 3, 4, 4, 5, 6],
    'Age': [10, 15, 20, 5, 8, 12, 7, 3, 2, 1],
    'Price': [150000, 160000, 170000, 200000, 220000, 240000, 280000, 320000, 400000, 600000]
}
df = pd.DataFrame(data)
df.head()


## Data Wrangling
Check for missing values and data types.

In [None]:
df.info()
df.isnull().sum()
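
The two checks above can also be combined into a single table, which is easier to scan as the number of columns grows. The following is a minimal sketch that rebuilds the same synthetic data so it runs on its own; the `summary` table and the duplicate-row check are illustrative additions, not part of the original notebook.

```python
import pandas as pd

# Same synthetic data as in the Data Collection cell.
df = pd.DataFrame({
    'Area': [750, 800, 850, 900, 950, 1000, 1100, 1200, 1500, 2000],
    'Bedrooms': [1, 2, 2, 3, 3, 3, 4, 4, 5, 6],
    'Age': [10, 15, 20, 5, 8, 12, 7, 3, 2, 1],
    'Price': [150000, 160000, 170000, 200000, 220000,
              240000, 280000, 320000, 400000, 600000],
})

# One row per column: data type, number of missing values, number of distinct values.
summary = pd.DataFrame({
    'dtype': df.dtypes.astype(str),
    'missing': df.isnull().sum(),
    'unique': df.nunique(),
})
print(summary)

# Count rows that are exact duplicates of an earlier row.
n_duplicates = int(df.duplicated().sum())
print("Duplicate rows:", n_duplicates)
```

On this hand-written dataset there is nothing to clean, so the sketch mainly shows where such checks would go once real data replaces the sample.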


## Exploratory Data Analysis (EDA)
We will visualize relationships between features and price.

In [None]:
sns.pairplot(df)
plt.show()


In [None]:
plt.figure(figsize=(6,4))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.show()
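
The heatmap shows every pairwise correlation, but for a price model the `Price` column is usually the one of interest. As a rough sketch (the variable names `corr_with_price` and `ranked` are illustrative), the correlations with `Price` can be pulled out and ranked by absolute strength:

```python
import pandas as pd

# Same synthetic data as in the Data Collection cell.
df = pd.DataFrame({
    'Area': [750, 800, 850, 900, 950, 1000, 1100, 1200, 1500, 2000],
    'Bedrooms': [1, 2, 2, 3, 3, 3, 4, 4, 5, 6],
    'Age': [10, 15, 20, 5, 8, 12, 7, 3, 2, 1],
    'Price': [150000, 160000, 170000, 200000, 220000,
              240000, 280000, 320000, 400000, 600000],
})

# Pearson correlation of each feature with Price (Price itself is dropped).
corr_with_price = df.corr()['Price'].drop('Price')

# Order the features by the absolute value of their correlation, strongest first,
# while keeping the sign so the direction of the relationship stays visible.
order = corr_with_price.abs().sort_values(ascending=False).index
ranked = corr_with_price.reindex(order)
print(ranked)
```

A correlation only measures a linear, pairwise association. `Area` and `Bedrooms` also move together in this sample, so their individual effects are hard to separate in a regression.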


## Model Development
We fit a linear regression model that predicts `Price` from `Area`, `Bedrooms`, and `Age`.

In [None]:
X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']

# Hold out 20% of the rows (2 of 10) for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
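
To see how the fitted model relates each feature to the price, the learned coefficients can be inspected directly. The sketch below repeats the same split and fit so it is self-contained; the `coefficients` series and the manual prediction are illustrative additions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same synthetic data as in the Data Collection cell.
df = pd.DataFrame({
    'Area': [750, 800, 850, 900, 950, 1000, 1100, 1200, 1500, 2000],
    'Bedrooms': [1, 2, 2, 3, 3, 3, 4, 4, 5, 6],
    'Age': [10, 15, 20, 5, 8, 12, 7, 3, 2, 1],
    'Price': [150000, 160000, 170000, 200000, 220000,
              240000, 280000, 320000, 400000, 600000],
})
X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Each coefficient is the predicted change in Price for a one-unit increase
# in that feature while the other two features are held fixed.
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients)
print("Intercept:", model.intercept_)

# A prediction is the intercept plus the coefficient-weighted sum of the features.
manual_pred = X_test.to_numpy() @ model.coef_ + model.intercept_
print(np.allclose(manual_pred, model.predict(X_test)))
```

With only 8 training rows and strongly related features, the individual coefficients can be unstable, so they are better read as a demonstration of the mechanics than as reliable effect sizes.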


## Evaluation

In [None]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R² Score:", r2)
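
With a 20% split of 10 rows, both metrics above are computed on just 2 houses, so they can change a lot with a different `random_state`. One possible alternative on a dataset this small is leave-one-out cross-validation, sketched here under the assumption that refitting the model 10 times is acceptable. The choice of `LeaveOneOut` and the extra metrics (RMSE, MAE) are additions, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Same synthetic data as in the Data Collection cell.
df = pd.DataFrame({
    'Area': [750, 800, 850, 900, 950, 1000, 1100, 1200, 1500, 2000],
    'Bedrooms': [1, 2, 2, 3, 3, 3, 4, 4, 5, 6],
    'Age': [10, 15, 20, 5, 8, 12, 7, 3, 2, 1],
    'Price': [150000, 160000, 170000, 200000, 220000,
              240000, 280000, 320000, 400000, 600000],
})
X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']

# Each row is predicted by a model trained on the other 9 rows.
loo_pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())

# Pool the 10 out-of-sample predictions and score them together.
rmse = np.sqrt(mean_squared_error(y, loo_pred))  # same units as Price
mae = mean_absolute_error(y, loo_pred)
r2 = r2_score(y, loo_pred)

print("Leave-one-out RMSE:", rmse)
print("Leave-one-out MAE:", mae)
print("Leave-one-out R²:", r2)
```

RMSE is in the same units as `Price`, which usually makes it easier to interpret than the raw mean squared error.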


## Conclusion
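The model gives a first estimate of how area, number of bedrooms, and age relate to house prices. Because the dataset has only 10 synthetic rows and the test set only 2, the results illustrate the workflow and should not be read as reliable estimates. You can improve this project by using a larger, real dataset and by comparing the linear model against more advanced models.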
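
As one way to try a more advanced model, the sketch below compares the linear model with a random forest using leave-one-out cross-validation. The choice of `RandomForestRegressor`, its settings, and the use of mean absolute error are assumptions made for illustration; on 10 rows the ranking of the two models is not meaningful, and the code is only meant to show the comparison pattern.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Same synthetic data as in the Data Collection cell.
df = pd.DataFrame({
    'Area': [750, 800, 850, 900, 950, 1000, 1100, 1200, 1500, 2000],
    'Bedrooms': [1, 2, 2, 3, 3, 3, 4, 4, 5, 6],
    'Age': [10, 15, 20, 5, 8, 12, 7, 3, 2, 1],
    'Price': [150000, 160000, 170000, 200000, 220000,
              240000, 280000, 320000, 400000, 600000],
})
X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']

# Candidate models to compare (illustrative choices).
candidates = {
    'LinearRegression': LinearRegression(),
    'RandomForestRegressor': RandomForestRegressor(n_estimators=100, random_state=42),
}

# Score every candidate on the same leave-one-out predictions.
results = {}
for name, estimator in candidates.items():
    pred = cross_val_predict(estimator, X, y, cv=LeaveOneOut())
    results[name] = mean_absolute_error(y, pred)

# Print the candidates from lowest to highest error.
for name, mae in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: leave-one-out MAE = {mae:,.0f}")
```

The same loop extends to other scikit-learn regressors by adding entries to `candidates`.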