# QCTO - Workplace Module

### Chennai Housing Sales Price Prediction
#### Done By: Nelisiwe Bezana

© ExploreAI 2024

---

<a id="cont"></a>
## Table of Contents

<a href=#BC> Background Context</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Data Collection and Description</a>

<a href=#three>3. Loading Data </a>

<a href=#four>4. Data Cleaning and Filtering</a>

<a href=#five>5. Exploratory Data Analysis (EDA)</a>

<a href=#six>6. Modeling </a>

<a href=#seven>7. Evaluation and Validation</a>

<a href=#eight>8. Final Model</a>

<a href=#nine>9. Conclusion and Future Work</a>

<a href=#ten>10. References</a>

---
 <a id="BC"></a>
## **Background Context**
<a href=#cont>Back to Table of Contents</a>

### Project Overview:
This project focuses on analyzing real estate data from the city of Chennai. The primary goal is to explore and understand the various factors influencing property prices and to develop predictive models that can accurately estimate property values based on these factors.

### Problem Domain:
In rapidly urbanizing cities like Chennai, understanding the dynamics of the real estate market is crucial for both buyers and sellers. Property prices are influenced by a multitude of factors, including location, size, number of rooms, proximity to amenities, and the overall quality of the property. Accurately predicting property prices can assist potential buyers in making informed decisions, while also aiding sellers in setting competitive prices.

### Goals and Objectives:

- To clean and preprocess the real estate dataset for accurate analysis.
- To explore and analyze the key factors affecting property prices in Chennai.
- To build and validate predictive models that can estimate property values based on these factors.
- To provide insights that could help stakeholders in the real estate market make data-driven decisions.

### Significance of the Project:
The outcomes of this project have practical implications for the real estate market in Chennai. Accurate property price predictions can lead to better market efficiency, reduced uncertainty in transactions, and more informed decision-making for all parties involved. Additionally, this project can serve as a case study for similar analyses in other urban centers, contributing to the broader field of real estate analytics.

---
<a id="one"></a>
## **Importing Packages**
<a href=#cont>Back to Table of Contents</a>

The analysis uses `pandas` for loading and manipulating the dataset.

In [1]:
import pandas as pd

---
<a id="two"></a>
## **Data Collection and Description**
<a href=#cont>Back to Table of Contents</a>

**Details:** The dataset used for this project was obtained from Kaggle, a popular platform for data science and machine learning resources. The data consists of real estate transactions in Chennai, India, and is intended for use in predicting housing sale prices.

### Overview of the Dataset:

- **Source:** The dataset was downloaded from Kaggle as a single CSV file (`housing_price_data.csv`).
- **Size:** The dataset contains 7,109 records and 22 fields, covering property attributes, quality scores, fees, and the sale price.
- **Scope:** The dataset covers properties located in different areas of Chennai, allowing for a diverse analysis of housing prices across the city.

### Types of Data:
- **Numerical:** Attributes such as internal square footage (`INT_SQFT`), number of bedrooms (`N_BEDROOM`), number of bathrooms (`N_BATHROOM`), and sales price (`SALES_PRICE`).
- **Categorical:** Attributes such as area (`AREA`), building type (`BUILDTYPE`), and sale condition (`SALE_COND`).
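
The split between numerical and categorical attributes can be checked programmatically rather than by eye. The sketch below assumes a small sample frame built from the first rows printed in the Loading Data section (not the full dataset) and uses `select_dtypes` to separate the two groups; the variable names are illustrative.

```python
import pandas as pd

# Small sample copied from the first rows printed in the Loading Data section.
# It stands in for the full dataset so the example runs on its own.
sample = pd.DataFrame({
    "AREA": ["Karapakkam", "Anna Nagar", "Adyar", "Velachery", "Karapakkam"],
    "INT_SQFT": [1004, 1986, 909, 1855, 1226],
    "N_BEDROOM": [1.0, 2.0, 1.0, 3.0, 1.0],
    "N_BATHROOM": [1.0, 1.0, 1.0, 2.0, 1.0],
    "SALE_COND": ["AbNormal", "AbNormal", "AbNormal", "Family", "AbNormal"],
})

# Numerical columns: integer and float dtypes.
numerical_cols = sample.select_dtypes(include="number").columns.tolist()

# Categorical columns: everything that is not numeric (plain strings here).
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
```

On the full dataset the same two calls would run against `data` directly. Columns stored as text but holding dates, such as `DATE_SALE`, would land in the non-numeric group until they are parsed.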

---
<a id="three"></a>
## **Loading Data**
<a href=#cont>Back to Table of Contents</a>

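The dataset is read from `housing_price_data.csv` into a pandas DataFrame, and the first five rows are printed as a quick check.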

In [3]:
data = pd.read_csv('housing_price_data.csv')
print(data.head())

   PRT_ID        AREA  INT_SQFT   DATE_SALE  DIST_MAINROAD  N_BEDROOM  \
0  P03210  Karapakkam      1004  04-05-2011            131        1.0   
1  P09411  Anna Nagar      1986  19-12-2006             26        2.0   
2  P01812       Adyar       909  04-02-2012             70        1.0   
3  P05346   Velachery      1855  13-03-2010             14        3.0   
4  P06210  Karapakkam      1226  05-10-2009             84        1.0   

   N_BATHROOM  N_ROOM SALE_COND PARK_FACIL  ... UTILITY_AVAIL  STREET MZZONE  \
0         1.0       3  AbNormal        Yes  ...        AllPub   Paved      A   
1         1.0       5  AbNormal         No  ...        AllPub  Gravel     RH   
2         1.0       3  AbNormal        Yes  ...           ELO  Gravel     RL   
3         2.0       5    Family         No  ...       NoSewr    Paved      I   
4         1.0       3  AbNormal        Yes  ...        AllPub  Gravel      C   

  QS_ROOMS QS_BATHROOM  QS_BEDROOM  QS_OVERALL  REG_FEE  COMMIS  SALES_PRICE  
0
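
The `DATE_SALE` values in the output above are day-first strings (for example `19-12-2006`), so `read_csv` loads them as text. A minimal sketch of parsing them follows, assuming the `DD-MM-YYYY` pattern holds for every row; the `SALE_YEAR` column name is hypothetical and only illustrates a derived feature.

```python
import pandas as pd

# Sample values copied from the DATE_SALE column in the output above.
sample = pd.DataFrame({
    "PRT_ID": ["P03210", "P09411", "P01812", "P05346", "P06210"],
    "DATE_SALE": ["04-05-2011", "19-12-2006", "04-02-2012", "13-03-2010", "05-10-2009"],
})

# An explicit format avoids ambiguity between day-first and month-first dates.
sample["DATE_SALE"] = pd.to_datetime(sample["DATE_SALE"], format="%d-%m-%Y")

# Hypothetical derived feature: the calendar year of the sale.
sample["SALE_YEAR"] = sample["DATE_SALE"].dt.year

print(sample.dtypes)
print(sample[["PRT_ID", "DATE_SALE", "SALE_YEAR"]])
```

With an explicit `format`, pandas raises an error on any value that does not match, which is a useful check before relying on the parsed dates.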

---
<a id="four"></a>
## **Data Cleaning and Filtering**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Prepare the data for analysis by cleaning and filtering.
* **Details:** Include steps for handling missing values, removing outliers, correcting errors, and possibly reducing the data (filtering based on certain criteria or features).
---

In [4]:
# Checking for missing values in the dataset
print("Missing values in each column:")
print(data.isnull().sum())

Missing values in each column:
PRT_ID            0
AREA              0
INT_SQFT          0
DATE_SALE         0
DIST_MAINROAD     0
N_BEDROOM         1
N_BATHROOM        5
N_ROOM            0
SALE_COND         0
PARK_FACIL        0
DATE_BUILD        0
BUILDTYPE         0
UTILITY_AVAIL     0
STREET            0
MZZONE            0
QS_ROOMS          0
QS_BATHROOM       0
QS_BEDROOM        0
QS_OVERALL       48
REG_FEE           0
COMMIS            0
SALES_PRICE       0
dtype: int64


In [5]:
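# Handling missing values
# Fill missing values in numerical columns with the column median.
# Assigning the result back avoids chained in-place modification,
# which newer pandas versions warn about and may silently ignore.
data['N_BEDROOM'] = data['N_BEDROOM'].fillna(data['N_BEDROOM'].median())
data['N_BATHROOM'] = data['N_BATHROOM'].fillna(data['N_BATHROOM'].median())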
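
The missing-value report also lists 48 gaps in `QS_OVERALL`, which the cell above does not address. One possible approach, sketched here on a toy frame with made-up values (only the column names come from the dataset), is to fill every numerical column that still has gaps in a single pass. Whether the median is the right choice for `QS_OVERALL` is an assumption, not something the notebook establishes.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; only the column names come from the dataset.
toy = pd.DataFrame({
    "N_BEDROOM": [1.0, 2.0, np.nan, 3.0],
    "N_BATHROOM": [1.0, np.nan, 1.0, 2.0],
    "QS_OVERALL": [4.0, np.nan, 3.0, 2.0],
    "AREA": ["Adyar", "Velachery", "Adyar", "Anna Nagar"],
})

# Find the numerical columns that still contain missing values.
numeric = toy.select_dtypes(include="number")
cols_with_na = numeric.columns[numeric.isnull().any()].tolist()

# Fill each one with its own median, assigning the result back to the frame.
for col in cols_with_na:
    toy[col] = toy[col].fillna(toy[col].median())

# Confirm that no missing values remain.
print(toy.isnull().sum())
```

Dropping the affected rows is a reasonable alternative when they are a small share of the data.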

---
<a id="five"></a>
## **Exploratory Data Analysis (EDA)**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Explore and visualize the data to uncover patterns, trends, and relationships.
* **Details:** Use statistics and visualizations to explore the data. This may include histograms, box plots, scatter plots, and correlation matrices. Discuss any significant findings.
---
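As a starting point for the exploration described above, the sketch below computes summary statistics, a correlation matrix, and a per-area average. It assumes a five-row sample copied from the Loading Data output, so the numbers only demonstrate the calls and say nothing about the full dataset.

```python
import pandas as pd

# Five rows copied from the Loading Data output; far too few for real conclusions.
sample = pd.DataFrame({
    "AREA": ["Karapakkam", "Anna Nagar", "Adyar", "Velachery", "Karapakkam"],
    "INT_SQFT": [1004, 1986, 909, 1855, 1226],
    "DIST_MAINROAD": [131, 26, 70, 14, 84],
    "N_BEDROOM": [1.0, 2.0, 1.0, 3.0, 1.0],
    "N_ROOM": [3, 5, 3, 5, 3],
})

# Summary statistics (count, mean, std, quartiles) for the numerical columns.
summary = sample.describe()

# Pairwise Pearson correlations between the numerical columns.
corr = sample.corr(numeric_only=True)

# Average internal square footage per area, largest first.
area_mean = sample.groupby("AREA")["INT_SQFT"].mean().sort_values(ascending=False)

print(summary)
print(corr.round(2))
print(area_mean)
```

The same calls apply to the full `data` frame. Histograms and box plots would normally follow, for example through the `plot` accessor on a DataFrame.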



---
<a id="six"></a>
## **Modeling**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Develop and train predictive or statistical models.
* **Details:** Describe the choice of models, feature selection and engineering processes, and show how the models are trained. Include code for setting up the models and explanations of the model parameters.
---
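The notebook does not yet state which models will be used. As one possible baseline, the sketch below fits an ordinary least squares regression with a simple train/test split. It runs on synthetic data: the rows are randomly generated and the price formula is made up, so only the column names mirror the dataset. NumPy's `lstsq` is used here to keep the example dependency-light; it is an assumption, not the author's chosen method.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; the coefficients below are made up.
rng = np.random.default_rng(42)
n = 200
synthetic = pd.DataFrame({
    "INT_SQFT": rng.integers(500, 2500, size=n),
    "N_ROOM": rng.integers(2, 7, size=n),
})
noise = rng.normal(0, 50_000, size=n)
synthetic["SALES_PRICE"] = (
    5_000 * synthetic["INT_SQFT"] + 200_000 * synthetic["N_ROOM"] + noise
)

# Shuffle once, then hold out the last 20% of rows as a test set.
shuffled = synthetic.sample(frac=1.0, random_state=0).reset_index(drop=True)
split = int(len(shuffled) * 0.8)
train, test = shuffled.iloc[:split], shuffled.iloc[split:]

features = ["INT_SQFT", "N_ROOM"]


def design_matrix(frame):
    """Return the feature matrix with a leading column of ones for the intercept."""
    X = frame[features].to_numpy(dtype=float)
    return np.column_stack([np.ones(len(X)), X])


# Ordinary least squares fit on the training rows.
coef, *_ = np.linalg.lstsq(
    design_matrix(train), train["SALES_PRICE"].to_numpy(), rcond=None
)

# Predict prices for the held-out rows.
test_pred = design_matrix(test) @ coef
print("Intercept and coefficients:", coef)
```

Categorical fields such as `AREA` would need encoding (for example with `pd.get_dummies`) before they could enter a model of this kind.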



---
<a id="seven"></a>
## **Evaluation and Validation**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Evaluate and validate the effectiveness and accuracy of the models.
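* **Details:** Present the metrics used to evaluate the models. Because the sales price is a continuous target, regression metrics such as MAE, RMSE, and R² apply, rather than classification metrics like accuracy, precision, recall, or F1-score. Discuss validation techniques employed, such as cross-validation or a train/test split.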
---
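Since the target here is a continuous price, error-based regression metrics are the natural fit. The sketch below computes MAE, RMSE, and R² from their standard definitions; the function name and the example numbers are made up for illustration.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mae = np.mean(np.abs(errors))          # mean absolute error
    rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error

    ss_res = np.sum(errors ** 2)                     # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination

    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Made-up values, chosen so the results are easy to verify by hand.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MAE and RMSE are expressed in the same units as the sales price, which makes them easier to explain to stakeholders than R² alone.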


---
<a id="eight"></a>
## **Final Model**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Present the final model and its performance.
* **Details:** Highlight the best-performing model and discuss its configuration, performance, and why it was chosen over others.
---



---
<a id="nine"></a>
## **Conclusion and Future Work**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Summarize the findings and discuss future directions.
* **Details:** Conclude with a summary of the results, insights gained, limitations of the study, and suggestions for future projects or improvements in methodology or data collection.
---



---
<a id="ten"></a>
## **References**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Provide citations and sources of external content.
* **Details:** List all the references and sources consulted during the project, including data sources, research papers, and documentation for tools and libraries used.
---


