
## Documentation for the Code (Model Preparation)
### Description:

This code is a machine learning project for predicting house prices in Bangalore, India. It cleans and preprocesses the data, engineers features, and then trains a Linear Regression model that predicts house prices from features such as location, total square feet area, number of bathrooms, and number of bedrooms (BHK).

### Dependencies:

- pandas: A popular library for data manipulation and analysis.
- matplotlib: A library for data visualization.
- numpy: A library for numerical computations.
- scikit-learn: A library for machine learning tasks.

### Steps Performed:

1. Load the dataset using pandas and display a sample of the data.

2. Drop unnecessary columns from the dataset, including "availability," "area_type," "society," and "balcony."

3. Handle missing values by dropping rows that contain NaN in any column, since such rows make up only a small fraction of the dataset.

4. Extract the number of bedrooms (BHK) from the "size" column and create a new column "bhk" for it.

5. Fix the "total_sqft" column, in which some entries are ranges of values or otherwise invalid, and convert it to a numeric type.

6. The categorical "location" column must be converted to numeric features, but it has a very large number of distinct values. To keep the encoding manageable, group similar location names together and combine locations with few occurrences into a single category called "other_location."

7. Remove outliers from the dataset based on the "total_sqft" and "bhk" columns to eliminate any unrealistic data points.

8. Define a function remove_per_sqft to remove outliers based on the price per square foot for each location.

9. Apply the remove_per_sqft function to the dataset to further clean the data. The function groups the dataset by location and keeps only the points whose price per square foot lies within one standard deviation of that location's mean.

10. Plot price against total square feet for a given location. The scatter plot shows that some houses have a lower price despite having more bedrooms.

11. Define a function remove_per_bhk to remove data points with a price per square foot lower than the mean price per square foot for lower BHK values in the same location.

12. Apply the remove_per_bhk function to the dataset to clean the data further.

13. Prepare the data for machine learning by one-hot encoding the "location" column and dropping the "other_location" column to avoid the dummy variable trap (perfect multicollinearity among the one-hot columns).

14. Split the dataset into training and testing sets.

15. Use GridSearchCV to find the best model among Linear Regression, Lasso Regression, and Decision Tree Regression, based on cross-validated performance metrics.

16. Train the Linear Regression model, which performed best in the grid search, on the training data.

17. Define a function predict_price that uses the trained model to predict house prices based on location, total square feet, number of bathrooms, and number of bedrooms (BHK).

18. Save the trained model and the one-hot encoded column information for future use.
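Steps 2 to 4, 6, 7 and 13 above can be sketched with pandas as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the function name `prepare_features`, the `price` column name, the cutoff of 10 occurrences for a "rare" location, and the 300 sq. ft. per bedroom outlier threshold are all assumptions, and `total_sqft` is assumed to have already been converted to numbers (step 5).

```python
import pandas as pd


def prepare_features(raw, min_location_count=10, min_sqft_per_bhk=300):
    """Hypothetical cleaning pipeline; thresholds are assumed, not taken from the project."""
    # Step 2: drop columns the model does not use.
    df = raw.drop(columns=['availability', 'area_type', 'society', 'balcony'])
    # Step 3: drop rows with a missing value in any column.
    df = df.dropna().copy()
    # Step 4: "2 BHK" / "4 Bedroom" -> 2 / 4.
    df['bhk'] = df['size'].apply(lambda s: int(s.split(' ')[0]))
    # Step 6: normalize names, then collapse rare locations into "other_location".
    df['location'] = df['location'].str.strip()
    counts = df['location'].value_counts()
    rare = counts[counts <= min_location_count].index
    df.loc[df['location'].isin(rare), 'location'] = 'other_location'
    # Step 7: drop rows with implausibly little area per bedroom (assumed threshold).
    df = df[df['total_sqft'] / df['bhk'] >= min_sqft_per_bhk]
    # Step 13: one-hot encode location and drop "other_location" as the baseline.
    dummies = pd.get_dummies(df['location'], dtype=int)
    dummies = dummies.drop(columns='other_location', errors='ignore')
    X = pd.concat([df[['total_sqft', 'bath', 'bhk']], dummies], axis=1)
    return X, df['price']
```

Keeping the numeric columns first and the location columns after them gives a fixed column order that the saved "columns.json" can record.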

### Function Definitions:

`fix_total_sqft(x)`: A helper function to fix the "total_sqft" column, handling cases where it contains a range of values or invalid entries.
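A minimal sketch of what `fix_total_sqft` could look like. The exact handling is an assumption: a range such as "2100 - 2850" is replaced by its midpoint, and any value that cannot be parsed as a number (for example, an entry with a unit suffix) becomes `None`.

```python
def fix_total_sqft(x):
    """Convert a raw total_sqft value to a float (assumed behavior).

    "1200"        -> 1200.0
    "2100 - 2850" -> 2475.0 (midpoint of the range, an assumption)
    unparseable   -> None
    """
    parts = str(x).split('-')
    if len(parts) == 2:
        # A range: average the two ends; float() tolerates surrounding spaces.
        try:
            return (float(parts[0]) + float(parts[1])) / 2
        except ValueError:
            return None
    try:
        return float(x)
    except ValueError:
        return None
```

Applied with `df['total_sqft'].apply(fix_total_sqft)`, the `None` results can then be removed with `dropna()`.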

`remove_per_sqft(df)`: A function to remove outliers based on the price per square foot for each location.
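A minimal sketch of `remove_per_sqft`, assuming a `price_per_sqft` column has already been computed (for example, price divided by total_sqft). It keeps, per location, the rows within one standard deviation of that location's mean; the strict lower bound and inclusive upper bound are an arbitrary choice here.

```python
import pandas as pd


def remove_per_sqft(df):
    """Keep rows whose price_per_sqft is within one std of their location's mean."""
    kept = []
    for _, group in df.groupby('location'):
        mean = group['price_per_sqft'].mean()
        std = group['price_per_sqft'].std()  # sample std; NaN for a single-row group
        in_band = (group['price_per_sqft'] > mean - std) & (group['price_per_sqft'] <= mean + std)
        kept.append(group[in_band])
    return pd.concat(kept, ignore_index=True)
```

Note that a location with only one row has an undefined standard deviation, so the comparisons are false and that row is dropped.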

`plot_scatterplot(df, location)`: A function to plot scatter plots of house prices versus total square feet for a given location and BHK combination.

`remove_per_bhk(df)`: A function to remove data points with a price per square foot lower than the mean price per square foot for lower BHK values in the same location.
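One way `remove_per_bhk` could be written, as a sketch under assumptions: "lower BHK" is interpreted as the next-lower BHK (`bhk - 1`) in the same location, and the rule is applied only when that lower-BHK group has more than `min_count` rows (a hypothetical parameter, default 5) so its mean is reasonably stable.

```python
import pandas as pd


def remove_per_bhk(df, min_count=5):
    """Drop rows priced (per sq. ft.) below the mean of the next-lower BHK in the same location."""
    to_drop = []
    for _, loc_df in df.groupby('location'):
        # First pass: per-BHK statistics for this location.
        stats = {
            bhk: {'mean': g['price_per_sqft'].mean(), 'count': len(g)}
            for bhk, g in loc_df.groupby('bhk')
        }
        # Second pass: compare each BHK group with the (bhk - 1) statistics.
        for bhk, g in loc_df.groupby('bhk'):
            lower = stats.get(bhk - 1)
            if lower and lower['count'] > min_count:
                to_drop.extend(g[g['price_per_sqft'] < lower['mean']].index)
    return df.drop(index=to_drop)
```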

`find_best_model_using_gridsearchcv(X, y)`: A function to find the best model among Linear Regression, Lasso Regression, and Decision Tree Regression, using GridSearchCV for hyperparameter tuning.

`predict_price(location, total_sqft, bath, bhk)`: A function to predict house prices based on location, total square feet, number of bathrooms, and number of bedrooms (BHK).
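A self-contained sketch of `predict_price`. To avoid relying on notebook globals, the fitted model and the list of one-hot column names are passed as hypothetical keyword-only parameters, and `build_features` is a hypothetical helper. The column names `total_sqft`, `bath` and `bhk` are assumed, and `columns` must be a plain list (e.g. `list(X.columns)`).

```python
import numpy as np


def build_features(columns, location, total_sqft, bath, bhk):
    """Hypothetical helper: map the four inputs onto the one-hot column layout."""
    x = np.zeros(len(columns))
    x[columns.index('total_sqft')] = total_sqft
    x[columns.index('bath')] = bath
    x[columns.index('bhk')] = bhk
    # Locations without a column (e.g. the dropped "other_location") stay all-zero.
    if location in columns:
        x[columns.index(location)] = 1.0
    return x


def predict_price(location, total_sqft, bath, bhk, *, model, columns):
    """Predict with any fitted model exposing a scikit-learn style predict()."""
    x = build_features(columns, location, total_sqft, bath, bhk)
    return float(model.predict(x.reshape(1, -1))[0])
```

Because "other_location" was dropped during encoding, an unknown location maps to the all-zero baseline rather than raising an error.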

### Saved Files:
`"price_prediction_model.pickle"`: A serialized version of the trained Linear Regression model.

`"columns.json"`: A JSON file containing the names of the columns after one-hot encoding.
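A sketch of how the two files could be written. The function name `save_artifacts`, the `out_dir` parameter and the JSON key `data_columns` are hypothetical; the text above only states what each file contains. Whatever key is used here must match what util.py reads back.

```python
import json
import os
import pickle


def save_artifacts(model, columns, out_dir='.'):
    """Write the pickled model and the one-hot column names (hypothetical layout)."""
    with open(os.path.join(out_dir, 'price_prediction_model.pickle'), 'wb') as f:
        pickle.dump(model, f)
    with open(os.path.join(out_dir, 'columns.json'), 'w') as f:
        # list() also accepts a pandas Index such as X.columns.
        json.dump({'data_columns': list(columns)}, f)
```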




## Documentation for util.py and server.py

### util.py:

#### Description:
This file contains utility functions that are used to load the essential data and the trained machine learning model for predicting house prices. The functions in this file serve as helper functions for the Flask server (server.py) to perform the actual prediction and provide necessary information for the frontend.

#### Functions:

1. `get_estimated_price(location, sqft, bhk, bath)`: This function takes four parameters: location (string), sqft (float), bhk (integer), and bath (integer). It uses the trained model (`__model`) to predict the house price from the given location, total square feet, number of bedrooms (bhk), and number of bathrooms (bath).

2. `load_all_data()`: This function loads the data needed for prediction and stores it in global variables. It loads the one-hot encoded column names (`__columns`) and the list of unique location names (`__locations`) from the "columns.json" file. It also loads the trained Linear Regression model (`__model`) from the "price_prediction_model.pickle" file.

3. `get_location_names()`: This function returns the list of unique location names (`__locations`), which the frontend can use to display the available locations.

4. `get_columns()`: This function returns the list of column names (`__columns`) after one-hot encoding, which can be used to ensure consistent data handling between the frontend and backend.
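A minimal sketch of util.py consistent with the functions above. Several details are assumptions: the JSON key `data_columns`, the first three columns being `total_sqft`, `bath` and `bhk` with the location columns after them, the hypothetical `artifacts_dir` parameter, and rounding the estimate to two decimals. Module-level names with a double underscore are not name-mangled, so `global` works as usual.

```python
import json
import os
import pickle

import numpy as np

__columns = None
__locations = None
__model = None


def load_all_data(artifacts_dir='.'):
    """Load column names, location names and the pickled model into module globals."""
    global __columns, __locations, __model
    with open(os.path.join(artifacts_dir, 'columns.json')) as f:
        __columns = json.load(f)['data_columns']
        __locations = __columns[3:]  # assumes total_sqft, bath, bhk come first
    with open(os.path.join(artifacts_dir, 'price_prediction_model.pickle'), 'rb') as f:
        __model = pickle.load(f)


def get_location_names():
    return __locations


def get_columns():
    return __columns


def get_estimated_price(location, sqft, bhk, bath):
    """Build the one-hot feature vector and return the model's rounded estimate."""
    x = np.zeros(len(__columns))
    x[__columns.index('total_sqft')] = sqft
    x[__columns.index('bath')] = bath
    x[__columns.index('bhk')] = bhk
    if location in __columns:
        x[__columns.index(location)] = 1.0
    return round(float(__model.predict(x.reshape(1, -1))[0]), 2)
```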

### server.py:

#### Description:
This file sets up a Flask web server to serve the house price prediction application. It provides three routes: `/` for the home page, and `/getlocations` and `/predict` for handling AJAX requests from the frontend.

#### Routes and Functions:

1. `/` (GET method):
   - Function: `index()`
   - Description: This function serves the home page by rendering the "index.html" template.

2. `/getlocations` (GET method):
   - Function: `getlocations()`
   - Description: This function returns the list of unique location names using the `get_location_names()` function from util.py. It responds with a JSON object containing the list of locations.

3. `/predict` (POST method):
   - Function: `predict()`
   - Description: This function handles the prediction request from the frontend. It receives the JSON data containing total square feet, location, number of bedrooms (bhk), and number of bathrooms (bath). It then uses the `get_estimated_price()` function from util.py to predict the house price and responds with a JSON object containing the estimated price.
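A sketch of the three routes in Flask. So that it runs on its own, `util` is replaced by a stand-in namespace with made-up return values; in the project it would be `import util`. The JSON field names (`total_sqft`, `location`, `bhk`, `bath`) and response keys (`locations`, `estimated_price`) are assumptions, and `/` expects a `templates/index.html` file.

```python
from types import SimpleNamespace

from flask import Flask, jsonify, render_template, request

# Stand-in for util.py with hypothetical values; replace with `import util`.
util = SimpleNamespace(
    load_all_data=lambda: None,
    get_location_names=lambda: ['indiranagar', 'whitefield'],
    get_estimated_price=lambda location, sqft, bhk, bath: 123.45,
)

app = Flask(__name__)


@app.route('/')
def index():
    return render_template('index.html')


@app.route('/getlocations', methods=['GET'])
def getlocations():
    return jsonify({'locations': util.get_location_names()})


@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    price = util.get_estimated_price(
        data['location'],
        float(data['total_sqft']),
        int(data['bhk']),
        int(data['bath']),
    )
    return jsonify({'estimated_price': price})
```

The `if __name__ == '__main__':` block is left out so the sketch can be exercised with Flask's test client without starting a server.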

#### Main Execution:
The `if __name__ == '__main__':` block in server.py starts the Flask server when the script is executed directly. It first loads the essential data using `util.load_all_data()` to set up the global variables `__columns`, `__locations`, and `__model`. Then it starts the Flask server using `app.run()`, which listens for incoming requests.
