Bi-directional Machine Learning framework for Real-Estate Investment Advisory
- Rahul Katinni 8875-551-0040
- Shubhranshu Pattnaik 3217-0118-70
- Create a new Conda environment with the latest Python version:
conda create -n project_env python -y
- Activate the environment:
conda activate project_env
- Alternatively, create a virtual environment with venv:
python -m venv project_env
Ensure that your system's Python is updated to the latest version before running this command, since the virtual environment uses the interpreter that creates it.
- Activate the environment:
- On Linux/macOS:
source project_env/bin/activate
- On Windows:
project_env\Scripts\activate
- Verify the Python version in the environment:
python --version
This should display Python 3.11 or the latest version installed.
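If you prefer to check the interpreter from inside a notebook cell, the following minimal sketch uses only the standard library. The 3.11 threshold comes from the step above and is treated as an assumed minimum, not a documented project requirement; the helper name python_is_recent_enough is hypothetical.

```python
import sys

# Assumed minimum version, taken from the setup note above (not enforced by the project)
MIN_VERSION = (3, 11)

def python_is_recent_enough(min_version=MIN_VERSION):
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info[:2] >= min_version

print(sys.executable)  # the path shows whether the project_env interpreter is active
print(sys.version)
if python_is_recent_enough():
    print("Python version OK")
else:
    print(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ recommended")
```

Printing sys.executable is often the fastest way to spot a notebook kernel that is still pointing at the system Python instead of project_env.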
- Ensure you have activated the environment:
- For Conda:
conda activate project_env
- For Virtual Environment:
source project_env/bin/activate  # On Linux/macOS
project_env\Scripts\activate     # On Windows
- Install dependencies from the requirements.txt file:
pip install -r requirements.txt
- Verify the installed libraries:
pip list
This will display a list of all installed packages and their versions.
- Troubleshooting:
- If any dependencies fail to install, ensure your Python version matches the project's compatibility requirements.
- For Conda users, you can try resolving package conflicts using:
conda install --file requirements.txt
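As an extra check after installation, the sketch below reads package names from requirements lines and reports any that have no installed distribution in the active environment. It uses only the standard library (importlib.metadata). The helper names requirement_names and missing_packages are hypothetical, the parser only handles simple lines such as "pandas" or "pandas==2.0.1" (not URLs or editable installs), and it assumes the names in requirements.txt match the installed distribution names.

```python
import re
from importlib.metadata import PackageNotFoundError, version

def requirement_names(lines):
    """Extract bare package names from pip requirements lines (simple specs only)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop inline and full-line comments
        if not line or line.startswith("-"):   # skip blanks and pip options such as -r / -e
            continue
        # The name ends at the first version specifier, extras bracket, marker, or space
        names.append(re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0])
    return names

def missing_packages(names):
    """Return the names that have no installed distribution in the active environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

From the project root, open requirements.txt and pass the file object to requirement_names, then pass the result to missing_packages; an empty list means every listed package is installed.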
The data required for this project is scraped from the internet using the provided Jupyter Notebook web_scrapper.ipynb. Due to server restrictions on apartments.com, special configurations are required to ensure successful data scraping.
- Add Your Browser's User-Agent
- Locate the headers_list variable in web_scrapper.ipynb.
- Replace the existing headers_list values with user-agents from your browser's request headers.
- To find your browser's user-agent:
- Open your browser, navigate to https://www.apartments.com/, and press F12 (or right-click and choose Inspect) to open the Developer Tools.
- Navigate to the "Network" tab.
- Reload the webpage and select any request.
- Copy the "User-Agent" from the "Request Headers" section.
- Example user-agent format:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
- Add your user-agent as a dictionary in headers_list:
headers_list = [{"User-Agent": "your_user_agent_here"}]
- Important: Comment out the existing user-agent entries to avoid conflicts.
- Use Multiple User-Agents
- For optimal results, add user-agents from different browsers on your system (e.g., Chrome, Firefox, Edge).
- Avoid using randomly generated user-agents, as they are likely to be flagged by the server, preventing successful scraping.
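The notebook's request code is not shown here, so the following is only a hypothetical sketch of how a headers_list containing several real browser user-agents could be rotated, choosing one header set per request. It builds a request with the standard library's urllib.request.Request but does not send it; the placeholder strings must be replaced with user-agents copied from your own browsers, and build_request is an illustrative name, not a function from the project.

```python
import random
import urllib.request

# Placeholders: paste real user-agents copied from each browser's request headers
headers_list = [
    {"User-Agent": "user_agent_from_chrome"},
    {"User-Agent": "user_agent_from_firefox"},
    {"User-Agent": "user_agent_from_edge"},
]

def build_request(url, headers_list, rng=random):
    """Build (but do not send) a request that uses a randomly chosen header set."""
    headers = dict(rng.choice(headers_list))  # copy so the shared list is never mutated
    return urllib.request.Request(url, headers=headers)

req = build_request("https://www.apartments.com/", headers_list)
# urllib stores header names capitalized, hence "User-agent"
print(req.get_header("User-agent"))
```

Picking from a small pool of genuine browser strings keeps each request looking like a real browser, which matches the advice above to avoid randomly generated user-agents.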
- Run the Notebook
- Open the web_scrapper.ipynb file in Jupyter Notebook or Jupyter Lab.
- Execute the cells sequentially to start the scraping process.
- The scraped data will be saved in the data/raw directory.
The scraper is implemented in a Jupyter Notebook (.ipynb) instead of a Python script (.py) due to the way Jupyter Notebooks execute code. When run in an .ipynb file, the scraper can successfully bypass server restrictions on apartments.com. However, if the same code is executed as a .py file, the server blocks the scraper, making it unable to fetch data.
- If the scraper is blocked:
- Verify that headers_list contains valid user-agents from your browser(s).
- Avoid using proxies or VPNs, as they may also be flagged.
- Ensure that your network connection is stable.
To clean the data and convert it into a format suitable for analysis:
- Run the json_to_csv_converter.py script, located in the src directory:
- From the project root, execute the script to convert the property data from JSON format (the scraper's output) into a CSV file:
python src/json_to_csv_converter.py
- Output:
- The cleaned data will be saved as a CSV file in the data/processed directory.
- The CSV format makes the data easier to manipulate and analyze with libraries like pandas.
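The internals of json_to_csv_converter.py are not described here, so the sketch below only illustrates the general JSON-to-CSV step under stated assumptions: that the scraper's output is a JSON array of property objects, possibly with nested objects. Nested fields are flattened into dotted column names, and records missing a field get an empty cell. The function names and the sample records are hypothetical; only the standard library (json, csv) is used.

```python
import csv
import io
import json

def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

def json_records_to_csv(json_text, out_file):
    """Write a JSON array of property objects to out_file as CSV; return the header."""
    rows = [flatten(r) for r in json.loads(json_text)]
    # Union of keys across all rows, in first-seen order, so sparse records still fit
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return fieldnames

# Hypothetical sample records, for illustration only
sample = '[{"name": "Unit A", "rent": 1500, "address": {"city": "Boston"}}, {"name": "Unit B", "beds": 2}]'
buf = io.StringIO()
print(json_records_to_csv(sample, buf))  # ['name', 'rent', 'address.city', 'beds']
print(buf.getvalue())
```

Collecting the union of keys before writing matters for scraped listings, where optional fields often appear on only some properties; writing to a real file would use open(path, "w", newline="") in place of io.StringIO.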
The analysis is performed using the visualize.ipynb Jupyter Notebook. This notebook generates various plots and includes observations in markdown cells for a detailed understanding of the data.
- Open the Notebook:
- Navigate to the project directory.
- Open the visualize.ipynb file using Jupyter Notebook or Jupyter Lab:
jupyter notebook visualize.ipynb
- OR: Open the notebook in any editor that supports Jupyter notebooks, such as VS Code:
- Install the Python and Jupyter extensions in VS Code if not already installed.
- Open the visualize.ipynb file directly in VS Code.
- Execute the Notebook:
- Run the cells in sequence to generate plots and visualizations.
- Each plot corresponds to a specific aspect of the data and is accompanied by observations in markdown cells.
- Some cells may have long run times because of the volume of the data.
- Output:
- The generated plots will be displayed within the notebook.
- Review Observations:
- The markdown cells in the notebook contain detailed insights and observations based on the visualized data.
The machine learning model is built and tested using the model_building.ipynb Jupyter Notebook, which imports all the core model code from model_runner.py. This notebook trains the model and lets you input sample data to check its performance.
- Open the Notebook:
- Navigate to the project directory.
- Open the model_building.ipynb file using Jupyter Notebook, Jupyter Lab, or any compatible editor like VS Code:
jupyter notebook model_building.ipynb
- OR: Open the notebook directly in VS Code or your preferred editor that supports Jupyter notebooks.
- Execute the Notebook:
- Run the cells sequentially to:
- Load and preprocess the data.
- Define and train the machine learning model.
- Evaluate the model's performance.
- Testing the Model with Sample Input:
- At the end of the notebook, there are cells that allow you to input sample data to test the model.
- To Get Rent Predictions:
- Locate the cell containing the user_input dictionary.
- Modify the user_input dictionary with your desired property features. Example:
user_input = {}
- Run the cell to see the predicted rent based on the input features.
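The example above leaves the property features unspecified, so the feature names in this sketch (bedrooms, bathrooms, sqft) are purely hypothetical stand-ins for whatever columns the model was actually trained on. It shows one common way a user_input dictionary can be turned into a numeric row whose order matches the training columns, rejecting unknown keys so that typos are not silently ignored.

```python
# Hypothetical feature names for illustration only; use the columns the model was trained on
FEATURE_COLUMNS = ["bedrooms", "bathrooms", "sqft"]

def to_feature_row(user_input, columns=FEATURE_COLUMNS, default=0.0):
    """Order user_input values to match the training columns, filling gaps with default."""
    unknown = set(user_input) - set(columns)
    if unknown:
        # Fail loudly on misspelled or unsupported features
        raise KeyError(f"Unknown feature(s): {sorted(unknown)}")
    return [float(user_input.get(col, default)) for col in columns]

row = to_feature_row({"bedrooms": 2, "sqft": 850})
print(row)  # [2.0, 0.0, 850.0]
```

Filling missing features with 0.0 keeps the sketch short; a real pipeline would more likely impute a training-set statistic such as the median. If the trained model follows the scikit-learn estimator interface (an assumption), a prediction would then be model.predict([row]).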
- To Get Property Data Based on Desired Rent:
- Locate the cell that prompts for user input using the input() function.
- Run the cell, and an input box will appear in your code editor or notebook interface.
- Enter the desired rent when prompted.
- The model will output property data that matches or closely aligns with the input rent value.
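How the notebook finds properties that "closely align" with the entered rent is not described here. One simple approach, shown below as an assumption rather than the project's method, is a nearest-neighbor lookup on rent alone: sort the listings by absolute distance from the target rent and keep the top k. The listing records and the closest_by_rent helper are hypothetical.

```python
def closest_by_rent(properties, target_rent, k=3, rent_key="rent"):
    """Return up to k properties whose rent is closest to target_rent."""
    priced = [p for p in properties if p.get(rent_key) is not None]  # skip listings without a rent
    return sorted(priced, key=lambda p: abs(p[rent_key] - target_rent))[:k]

# Hypothetical listings for illustration only
listings = [
    {"name": "Unit A", "rent": 1450},
    {"name": "Unit B", "rent": 2100},
    {"name": "Unit C", "rent": 1600},
    {"name": "Unit D", "rent": None},
]

for p in closest_by_rent(listings, 1500, k=2):
    print(p["name"], p["rent"])
# Unit A 1450
# Unit C 1600
```

Python's sort is stable, so listings with equal distance from the target keep their original order.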
- Notes:
- Using the input() Function:
- Because the notebook uses the input() function, the input prompt appears inline below the running cell in Jupyter Notebook or Jupyter Lab; in VS Code, it appears as an input box at the top of the editor window. The cell keeps running until you submit a value.
- Make sure to click on the input prompt and enter your data when it appears.
- Dependencies:
- Ensure all prior cells have been executed successfully before running the input cells.
- If you encounter any errors, verify that the data has been loaded and the model has been trained.
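Because input() always returns a string, a typo such as "$2,5OO" would otherwise surface as an error deep inside the lookup. The sketch below is a hypothetical helper, not code from the notebook, that strips "$" and thousands separators, rejects non-numeric, non-finite, or non-positive values, and re-prompts until a valid rent is entered. The read parameter defaults to the built-in input() and exists only so the function can be exercised without a live prompt.

```python
import math

def parse_rent(text):
    """Convert user text such as "$2,500" or " 1800.50 " to a positive float."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    value = float(cleaned)  # raises ValueError on non-numeric input
    if not math.isfinite(value) or value <= 0:
        raise ValueError("rent must be a positive number")
    return value

def ask_for_rent(prompt="Enter desired rent: ", read=input):
    """Keep prompting until the user enters a valid rent, then return it as a float."""
    while True:
        try:
            return parse_rent(read(prompt))
        except ValueError as err:
            print(f"Invalid rent ({err}); please try again.")
```

In a notebook cell, calling ask_for_rent() shows the usual input prompt and only returns once a usable number has been typed.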
- Output:
- The model's predictions and any retrieved property data will be displayed directly in the notebook.
- You can modify the inputs and rerun the cells to test different scenarios.