
# **House Price Prediction Project**

**Introduction**

Welcome to my House Price Prediction Project! In this project, I aim to build a model that estimates the selling price of a house from its features. House prices are influenced by many factors, such as location, size, amenities, and economic conditions. My goal is to use machine learning algorithms to analyze these factors and make informed predictions about house prices.

**The project will involve several key steps:**

**Data Collection:** I will use the Ames Housing dataset, which records the features of individual houses together with their sale prices.

**Exploratory Data Analysis (EDA):** This step will involve exploring the dataset to understand its characteristics, identify patterns, and detect anomalies or outliers.

**Feature Engineering:** I will process and engineer the features to better suit the machine learning models.

**Model Selection and Training:** I will experiment with various machine learning models, such as linear regression, decision trees, and random forests, to find the most effective approach for the prediction task.

**Evaluation:** The models will be evaluated on their prediction error and their ability to generalize to unseen houses, using metrics like Mean Squared Error (MSE) and R-squared.
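For reference, the two metrics named in the evaluation step can be computed by hand. The sketch below is a minimal pure-Python illustration, not the project's evaluation code: the function names `compute_mse` and `compute_r2` and the sample prices are hypothetical. In practice, scikit-learn's `mean_squared_error` and `r2_score` serve the same purpose.

```python
def compute_mse(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def compute_r2(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the variance in y_true explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical sale prices (in dollars) and hypothetical model predictions
actual = [200000, 150000, 250000, 300000]
predicted = [210000, 140000, 240000, 310000]

mse = compute_mse(actual, predicted)
r2 = compute_r2(actual, predicted)
print(f"MSE: {mse:.0f}, RMSE: {mse ** 0.5:.0f}, R^2: {r2:.3f}")
```

MSE is expressed in squared dollars, so its square root (RMSE) is often easier to interpret: here every prediction is off by $10,000, and the RMSE is exactly that.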

By the end of this project, my objective is to compare various machine learning models and identify the most effective one for predicting house prices. This exploration will allow me to understand the strengths and weaknesses of each model in the context of real estate data. I'm excited to embark on this journey to uncover the most insightful and accurate approach to house price prediction!

# Connect to Google Drive
In this section, we'll connect the Colab notebook to Google Drive to access the Ames Housing dataset.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
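Mounting only works inside Colab, so it can help to resolve the data location in one place before loading anything. The sketch below is one possible approach, assuming a hypothetical helper named `resolve_data_dir` and a hypothetical local fallback folder (`.`); neither is part of the original notebook.

```python
from pathlib import Path


def resolve_data_dir(candidates):
    """Return the first candidate path that exists as a directory."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    raise FileNotFoundError(f"None of these folders exist: {list(candidates)}")


# '/content/drive/MyDrive' is where Drive contents appear after mounting in Colab;
# '.' (the current folder) is a hypothetical fallback for running outside Colab.
data_root = resolve_data_dir(['/content/drive/MyDrive', '.'])
print("Using data root:", data_root)
```

Failing early with a clear `FileNotFoundError` is usually easier to debug than a missing-file error raised later from inside `pd.read_csv`.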


# Load the Ames Housing Dataset
Now that we've connected to Google Drive, let's load the Ames Housing dataset into our notebook for analysis.
We have four separate files: 'AmesHousing.csv', 'target.csv', 'train.csv', and 'test.csv'. Let's load each of these files to understand their contents and structure.


In [4]:
import pandas as pd

# All four CSV files live in the same Google Drive folder
data_dir = '/content/drive/MyDrive/Datasets/Ames Iowa Housing Dataset'

ames_housing_path = f'{data_dir}/AmesHousing.csv'
target_path = f'{data_dir}/target.csv'
train_path = f'{data_dir}/train.csv'
test_path = f'{data_dir}/test.csv'

# Loading the datasets
ames_housing_data = pd.read_csv(ames_housing_path)
target_data = pd.read_csv(target_path)
train_data = pd.read_csv(train_path)
test_data = pd.read_csv(test_path)

# Displaying the first few rows of each dataset
display(ames_housing_data.head(), target_data.head(), train_data.head(), test_data.head())

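|   | Order | PID | MS SubClass | MS Zoning | Lot Frontage | Lot Area | Street | Alley | Lot Shape | Land Contour | ... | Pool Area | Pool QC | Fence | Misc Feature | Misc Val | Mo Sold | Yr Sold | Sale Type | Sale Condition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 526301100 | 20 | RL | 141.0 | 31770 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2010 | WD | Normal | 215000 |
| 1 | 2 | 526350040 | 20 | RH | 80.0 | 11622 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | MnPrv | NaN | 0 | 6 | 2010 | WD | Normal | 105000 |
| 2 | 3 | 526351010 | 20 | RL | 81.0 | 14267 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | Gar2 | 12500 | 6 | 2010 | WD | Normal | 172000 |
| 3 | 4 | 526353030 | 20 | RL | 93.0 | 11160 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 4 | 2010 | WD | Normal | 244000 |
| 4 | 5 | 527105010 | 60 | RL | 74.0 | 13830 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | MnPrv | NaN | 0 | 3 | 2010 | WD | Normal | 189900 |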


|   | Order | SalePrice |
|---|---|---|
| 0 | 2127 | 123600 |
| 1 | 193 | 209500 |
| 2 | 2407 | 202665 |
| 3 | 46 | 224000 |
| 4 | 2478 | 187000 |


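|   | Order | PID | MS SubClass | MS Zoning | Lot Frontage | Lot Area | Street | Alley | Lot Shape | Land Contour | ... | Pool Area | Pool QC | Fence | Misc Feature | Misc Val | Mo Sold | Yr Sold | Sale Type | Sale Condition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 534 | 531363010 | 20 | RL | 80.0 | 9605 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 4 | 2009 | WD | Normal | 159000 |
| 1 | 803 | 906203120 | 20 | RL | 90.0 | 14684 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 6 | 2009 | WD | Normal | 271900 |
| 2 | 956 | 916176030 | 20 | RL | NaN | 14375 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 1 | 2009 | COD | Abnorml | 137500 |
| 3 | 460 | 528180130 | 120 | RL | 48.0 | 6472 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 4 | 2009 | WD | Normal | 248500 |
| 4 | 487 | 528290030 | 80 | RL | 61.0 | 9734 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2009 | WD | Normal | 167000 |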


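|   | Order | PID | MS SubClass | MS Zoning | Lot Frontage | Lot Area | Street | Alley | Lot Shape | Land Contour | ... | Screen Porch | Pool Area | Pool QC | Fence | Misc Feature | Misc Val | Mo Sold | Yr Sold | Sale Type | Sale Condition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2127 | 907135180 | 20 | RL | 60.0 | 8070 | Pave | NaN | Reg | Lvl | ... | 0 | 0 | NaN | NaN | NaN | 0 | 8 | 2007 | WD | Normal |
| 1 | 193 | 903206120 | 75 | RL | NaN | 7793 | Pave | NaN | IR1 | Bnk | ... | 0 | 0 | NaN | NaN | NaN | 0 | 5 | 2010 | WD | Normal |
| 2 | 2407 | 528181040 | 120 | RL | 40.0 | 6792 | Pave | NaN | IR1 | Lvl | ... | 0 | 0 | NaN | NaN | NaN | 0 | 3 | 2006 | New | Partial |
| 3 | 46 | 528175010 | 120 | RL | 44.0 | 6371 | Pave | NaN | IR1 | Lvl | ... | 0 | 0 | NaN | NaN | NaN | 0 | 6 | 2010 | New | Partial |
| 4 | 2478 | 531379030 | 60 | RL | 70.0 | 8304 | Pave | NaN | IR1 | Lvl | ... | 0 | 0 | NaN | GdPrv | NaN | 0 | 7 | 2006 | WD | Normal |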
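The previews suggest how the files relate: `test.csv` has no `SalePrice` column, and the `Order` values in `target.csv` (2127, 193, 2407, 46, 2478) match those in the first rows of `test.csv`. Assuming `Order` is a unique key in both files, the true prices could be attached to the test rows as sketched below. The two small frames are rebuilt from the rows displayed above (with only a couple of columns) so the example runs without Drive access; their names are hypothetical.

```python
import pandas as pd

# First rows of test.csv and target.csv as displayed above (a few columns only)
test_preview = pd.DataFrame({
    'Order': [2127, 193, 2407, 46, 2478],
    'Lot Area': [8070, 7793, 6792, 6371, 8304],
})
target_preview = pd.DataFrame({
    'Order': [2127, 193, 2407, 46, 2478],
    'SalePrice': [123600, 209500, 202665, 224000, 187000],
})

# validate='one_to_one' raises an error if Order is duplicated in either frame,
# which guards the assumption that Order is a unique key.
test_with_price = test_preview.merge(
    target_preview, on='Order', how='left', validate='one_to_one'
)
print(test_with_price)
```

Joining on `Order` rather than relying on row position keeps the prices aligned even if one of the files is later sorted or filtered.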




# Examining Shapes and Sizes of the Datasets
Let's check the shapes and sizes of 'AmesHousing.csv', 'target.csv', 'train.csv', and 'test.csv' to understand the dimensions and memory usage of each dataset.


In [5]:
# Displaying the shape and size of each dataset
print("Ames Housing Dataset Shape: ", ames_housing_data.shape)
print("Ames Housing Dataset Memory Usage: ", ames_housing_data.memory_usage().sum(), "bytes")
print("---")

print("Target Dataset Shape: ", target_data.shape)
print("Target Dataset Memory Usage: ", target_data.memory_usage().sum(), "bytes")
print("---")

print("Train Dataset Shape: ", train_data.shape)
print("Train Dataset Memory Usage: ", train_data.memory_usage().sum(), "bytes")
print("---")

print("Test Dataset Shape: ", test_data.shape)
print("Test Dataset Memory Usage: ", test_data.memory_usage().sum(), "bytes")

Ames Housing Dataset Shape:  (2930, 82)
Ames Housing Dataset Memory Usage:  1922208 bytes
---
Target Dataset Shape:  (733, 2)
Target Dataset Memory Usage:  11856 bytes
---
Train Dataset Shape:  (2197, 82)
Train Dataset Memory Usage:  1441360 bytes
---
Test Dataset Shape:  (733, 81)
Test Dataset Memory Usage:  475112 bytes
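These numbers are internally consistent, and checking them by hand shows what `memory_usage()` measures. The train and test rows add up to the full dataset (2197 + 733 = 2930), and the test set has one column fewer because it lacks `SalePrice`. Each reported memory figure also equals rows × columns × 8 bytes plus 128 bytes, which suggests 8 bytes per cell plus a fixed index overhead in this environment. For text columns those 8 bytes are only a pointer, so the strings themselves are not counted; `memory_usage(deep=True)` includes them. The sketch below re-derives the figures from the printed shapes; the helper name `shallow_memory` is hypothetical.

```python
# Shapes and memory figures copied from the output above
reported = {
    'ames_housing': ((2930, 82), 1922208),
    'target': ((733, 2), 11856),
    'train': ((2197, 82), 1441360),
    'test': ((733, 81), 475112),
}

BYTES_PER_CELL = 8   # one 64-bit number or one object pointer per cell
INDEX_BYTES = 128    # index overhead implied by the reported figures


def shallow_memory(shape):
    """Estimate memory_usage().sum() from a (rows, columns) shape."""
    rows, cols = shape
    return rows * cols * BYTES_PER_CELL + INDEX_BYTES


for name, (shape, memory) in reported.items():
    print(name, shallow_memory(shape), shallow_memory(shape) == memory)

# Train and test rows should add up to the full dataset
train_rows = reported['train'][0][0]
test_rows = reported['test'][0][0]
full_rows = reported['ames_housing'][0][0]
print("Rows add up:", train_rows + test_rows == full_rows)
```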


# Exploratory Data Analysis (EDA)
In this section, we'll explore the Ames Housing dataset to understand its characteristics, discover patterns, and identify any anomalies or outliers. This will help us gain insights into the factors affecting house prices.
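One common way to flag outliers is the interquartile range (IQR) rule, which marks values lying more than 1.5 × IQR below the first quartile or above the third. The sketch below illustrates that rule only; it is an assumed technique, not necessarily the one used in the analysis itself. The prices are the 15 `SalePrice` values visible in the dataset previews plus one hypothetical extreme value, and the helper name `iqr_outliers` is also hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the first/third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# SalePrice values shown in the previews, plus one hypothetical extreme sale
prices = [
    215000, 105000, 172000, 244000, 189900,   # AmesHousing.csv preview
    123600, 209500, 202665, 224000, 187000,   # target.csv preview
    159000, 271900, 137500, 248500, 167000,   # train.csv preview
    755000,                                   # hypothetical extreme value
]

print(iqr_outliers(prices))
```

On this small sample only the hypothetical 755000 sale falls outside the fences. With pandas, the same rule can be applied to the full `SalePrice` column using `Series.quantile`.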