### **House Price Prediction with Linear Regression**

---

### **Objective**  
This project builds a Linear Regression model that predicts house prices from the features available in the dataset. The workflow covers data preprocessing, model training, evaluation, and prediction.

---

### **Data Source**  
The dataset for this project is sourced from [GitHub](https://raw.githubusercontent.com/ywchiu/riii/master/data/house-prices.csv).

---

### **Import Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

---

### **Import Data**

In [None]:
# Load the dataset
url = 'https://raw.githubusercontent.com/ywchiu/riii/master/data/house-prices.csv'
data = pd.read_csv(url)
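
A minimal sketch of a loader that normalizes column names and fails early when the target column is absent. The helper name `load_house_data` and its `required` parameter are hypothetical, and the inline CSV is made-up sample data rather than rows from the real file.

```python
import io
import pandas as pd

def load_house_data(source, required=("price",)):
    """Read a CSV from a URL, path, or file-like object and normalize column names.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    df = pd.read_csv(source)
    # Strip surrounding whitespace and lowercase every column name
    df.columns = df.columns.str.strip().str.lower()
    # Fail early if an expected column (such as the target) is missing
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError(f"Missing expected column(s): {missing}")
    return df

# Made-up sample rows, used only to exercise the helper offline
sample_csv = io.StringIO(" Price ,SqFt,Bedrooms\n100000,1500,3\n120000,1800,4\n")
sample = load_house_data(sample_csv)
print(sample.columns.tolist())
```

Passing the dataset URL in place of `sample_csv` would apply the same checks to the real file.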

---

### **Describe Data**

In [None]:
# Clean column names to remove extra spaces and convert to lowercase
data.columns = data.columns.str.strip().str.lower()

# Display first few rows
print("First few rows of the dataset:")
print(data.head())

# Dataset information
print("\nDataset Info:")
data.info()

# Summary statistics
print("\nSummary statistics for numerical features:")
print(data.describe())
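
Before preprocessing, it can help to know which columns contain missing values and which are categorical. The sketch below uses a made-up toy frame (`toy`) in place of `data`; the column names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up toy frame standing in for `data`
toy = pd.DataFrame({
    "price": [100000, 120000, 150000, 130000],
    "sqft": [1500, np.nan, 2100, 1900],
    "brick": ["No", "Yes", "No", None],
})

# Count missing values per column
missing_counts = toy.isna().sum()
print(missing_counts)

# Separate numeric columns from non-numeric (categorical) ones
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()
print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```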

---

### **Data Visualization**

In [None]:
# Visualize the distribution of the target variable (assumed to be 'price')
sns.histplot(data['price'], kde=True, bins=30)
plt.title('Distribution of House Prices')
plt.show()

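# Scatter plot of living area against the target variable.
# Assumes the living-area column is named 'sqft' after lowercasing; adjust if your columns differ.
sns.scatterplot(x='sqft', y='price', data=data)
plt.title('Living Area vs House Price')
plt.show()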
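
As a numeric complement to the plots, the correlation of each numeric feature with the target can be ranked directly. This is a sketch on made-up toy data; the column names are assumptions, and Pearson correlation only captures linear association.

```python
import pandas as pd

# Made-up toy frame standing in for `data`
toy = pd.DataFrame({
    "price": [100000, 120000, 150000, 130000, 160000],
    "sqft": [1500, 1700, 2100, 1900, 2200],
    "offers": [5, 4, 2, 3, 1],
    "brick": ["No", "Yes", "No", "No", "Yes"],
})

# Pearson correlation of every numeric column with the target, strongest positive first
corr_with_price = (
    toy.corr(numeric_only=True)["price"]
    .drop("price")
    .sort_values(ascending=False)
)
print(corr_with_price)
```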

---

### **Data Preprocessing**

In [None]:
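# Fill missing values in numeric columns with the column mean
data = data.fillna(data.mean(numeric_only=True))

# One-hot encode categorical variables (drop_first avoids redundant dummy columns)
data = pd.get_dummies(data, drop_first=True).astype(float)

# Feature scaling: standardize the feature columns only,
# leaving the target 'price' in its original units
feature_cols = data.columns.drop('price')
scaler = StandardScaler()
data[feature_cols] = scaler.fit_transform(data[feature_cols])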
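
Fitting the imputer and scaler on the full dataset lets information from the test rows influence the preprocessing. One common alternative, sketched here under assumptions, is to wrap the steps in a scikit-learn `Pipeline` so that they are fitted on the training rows only. The toy data and column names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up toy data standing in for the real dataset
rng = np.random.default_rng(0)
n = 40
toy = pd.DataFrame({
    "sqft": rng.integers(1200, 2600, size=n).astype(float),
    "bedrooms": rng.integers(2, 6, size=n),
    "brick": rng.choice(["No", "Yes"], size=n),
})
toy["price"] = (
    50 * toy["sqft"] + 10000 * (toy["brick"] == "Yes") + rng.normal(0, 2000, size=n)
)
toy.loc[0, "sqft"] = np.nan  # inject one missing value

X = toy.drop(columns="price")
y = toy["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

numeric_cols = ["sqft", "bedrooms"]
categorical_cols = ["brick"]

# Numeric columns: mean imputation, then standardization
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipeline = Pipeline([("preprocess", preprocess), ("model", LinearRegression())])
pipeline.fit(X_train, y_train)  # imputation and scaling statistics come from training rows only
print(f"Test R-squared: {pipeline.score(X_test, y_test):.3f}")
```

Calling `pipeline.predict` on new rows reuses the statistics learned from the training data, so raw inputs never need to be scaled by hand.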

---

### **Define Target Variable (y) and Feature Variables (X)**

In [None]:
y = data['price']
X = data.drop(['price'], axis=1)

---

### **Train Test Split**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

---

### **Modeling**

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)
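
A fitted linear regression predicts the price as the intercept plus a weighted sum of the features, with the weights stored in `coef_`. The sketch below uses made-up, noise-free data generated from a known rule so the recovered coefficients can be checked. All names and numbers are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data from a known linear rule: price = 20000 + 50*sqft + 5000*bedrooms
rng = np.random.default_rng(1)
X_toy = pd.DataFrame({
    "sqft": rng.uniform(1000, 3000, size=50),
    "bedrooms": rng.integers(1, 6, size=50),
})
y_toy = 20000 + 50 * X_toy["sqft"] + 5000 * X_toy["bedrooms"]

toy_model = LinearRegression().fit(X_toy, y_toy)

# Pair each coefficient with its feature name for readability
coefficients = pd.Series(toy_model.coef_, index=X_toy.columns)
print(coefficients)
print(f"Intercept: {toy_model.intercept_:.2f}")
```

When features have been standardized, each coefficient describes the change in predicted price per one standard deviation of that feature rather than per raw unit.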

---

### **Model Evaluation**

In [None]:
# Predict on the test set
y_pred = model.predict(X_test)

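# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # same units as the target
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:,.2f}")
print(f"Root Mean Squared Error: {rmse:,.2f}")
print(f"R-squared: {r2:.4f}")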
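
To make the metrics concrete, the sketch below computes MSE, RMSE, and R-squared by hand with NumPy and compares them against scikit-learn. The four target values and predictions are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_hat = np.array([110.0, 140.0, 210.0, 240.0])

residuals = y_true - y_hat

# MSE: mean of squared residuals; RMSE: its square root, in the target's units
mse_manual = np.mean(residuals ** 2)
rmse_manual = np.sqrt(mse_manual)

# R-squared: 1 minus the ratio of residual to total sum of squares
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MSE:  {mse_manual}")        # 100.0
print(f"RMSE: {rmse_manual}")       # 10.0
print(f"R2:   {r2_manual:.3f}")     # 0.968

# The manual values agree with scikit-learn's implementations
assert np.isclose(mse_manual, mean_squared_error(y_true, y_hat))
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
```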

---

### **Prediction**

In [None]:
# Example prediction: reuse one held-out row so the input has exactly
# the columns (and preprocessing) the model was trained on
new_data = X_test.iloc[[0]]
predicted_price = model.predict(new_data)
print(f"Predicted House Price: ${predicted_price[0]:,.2f}")
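
A genuinely new house has to be encoded the same way as the training data before the model can score it. The sketch below shows one way to align a raw row to the training columns with `reindex`. The toy training frame, column names, and values are made-up assumptions. If the features were scaled during training, the same fitted scaler would also need to be applied with `transform`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training frame with one categorical column
train = pd.DataFrame({
    "sqft": [1500, 1800, 2100, 1700, 2400, 2000],
    "bedrooms": [3, 3, 4, 2, 4, 3],
    "brick": ["No", "Yes", "No", "No", "Yes", "Yes"],
    "price": [100000, 125000, 140000, 105000, 165000, 140000],
})
X_train_toy = pd.get_dummies(train.drop(columns="price"), drop_first=True, dtype=float)
toy_model = LinearRegression().fit(X_train_toy, train["price"])

# A new house described with raw (unencoded) values
new_house = pd.DataFrame({"sqft": [1900], "bedrooms": [3], "brick": ["Yes"]})

# Encode it, then align to the training columns:
# dummy columns absent from the new row are filled with 0, extra ones are dropped
new_encoded = pd.get_dummies(new_house, dtype=float).reindex(
    columns=X_train_toy.columns, fill_value=0
)
predicted = toy_model.predict(new_encoded)
print(f"Predicted House Price: ${predicted[0]:,.2f}")
```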

---

### **Explanation**

1. Objective: Predict house prices using Linear Regression.

2. Data Source: The dataset is publicly available on GitHub.

3. Preprocessing: Missing values are handled, categorical variables are encoded, and features are scaled.

4. Visualization: Key relationships and distributions are explored for better understanding.

5. Model Training: Linear Regression is used to model the relationship between features and house prices.

6. Evaluation: The model's performance is evaluated using metrics such as Mean Squared Error, RMSE, and R-squared.

7. Prediction: The model predicts prices for new data, showcasing its utility for real-world scenarios.