DSCI 510 Final Project

Name of the Project

Bi-directional Machine Learning framework for Real-Estate Investment Advisory

Team Members (Name and Student IDs)

Rahul Katinni 8875-551-0040
Shubhranshu Pattnaik 3217-0118-70

Instructions to create a conda environment

For Conda:

Create a new Conda environment with the latest Python version:
```
conda create -n project_env python -y
```
Activate the environment:
```
conda activate project_env
```

For Virtual Environment:

Create a virtual environment with the latest Python version:
```
python -m venv project_env
```
The virtual environment uses whichever Python interpreter runs this command, so make sure your system's Python is up to date before running it.

Activate the environment:

On Linux/macOS:
```
source project_env/bin/activate
```
On Windows:
```
project_env\Scripts\activate
```

Verify the Python version in the environment:
```
python --version
```
This should display Python 3.11 or whichever newer version is installed.
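
The same check can also be done from inside Python, which is convenient at the top of a notebook. The sketch below is illustrative only: `MINIMUM_VERSION` and `meets_minimum` are hypothetical names, and treating 3.11 as a minimum is an assumption drawn from the note above, not a documented project requirement.

```python
import sys

# Assumption: 3.11 is treated as the minimum version, based on the note above.
MINIMUM_VERSION = (3, 11)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the given interpreter version is at least `minimum`."""
    # Compare only (major, minor); patch releases are ignored.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "older than expected"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```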

Instructions on how to install the required libraries

Ensure you have activated the environment:

For Conda:
```
conda activate project_env
```

For Virtual Environment:

On Linux/macOS:
```
source project_env/bin/activate
```
On Windows:
```
project_env\Scripts\activate
```

Install dependencies from the requirements.txt file:
```
pip install -r requirements.txt
```
Verify the installed libraries:
```
pip list
```
This will display a list of all installed packages and their versions.
Troubleshooting:
- If any dependencies fail to install, ensure your Python version matches the project's compatibility requirements.
- For Conda users, you can try resolving package conflicts using:
```
conda install --file requirements.txt
```
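
As a complement to scanning the `pip list` output by eye, the following sketch reports which packages named in a requirements file are not installed in the active environment. It is a minimal illustration, not part of the project: `parse_requirement_names` and `missing_packages` are hypothetical helpers, and the parsing handles only simple `name==version` style lines.

```python
import re
from importlib import metadata


def parse_requirement_names(lines):
    """Extract bare package names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name ends at the first version specifier, extra, or marker.
        names.append(re.split(r"[=<>!~\[;\s]", line, maxsplit=1)[0])
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

To use it, open `requirements.txt`, pass the file object to `parse_requirement_names`, and pass the result to `missing_packages`. An empty list means every listed package was found.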

Instructions on how to download the data

The data required for this project is scraped from the internet using the provided Jupyter Notebook web_scrapper.ipynb. Due to server restrictions on apartments.com, special configurations are required to ensure successful data scraping.

Instructions for Running the Web Scraper

Add Your Browser's User-Agent
- Locate the headers_list variable in web_scrapper.ipynb.
- Replace the existing headers_list values with user-agents from your browser's request headers.
  - To find your browser's user-agent:
    1. Navigate to https://www.apartments.com/ in your browser, then press F12 (or right-click and choose Inspect) to open the Developer Tools.
    2. Navigate to the "Network" tab.
    3. Reload the webpage and select any request.
    4. Copy the "User-Agent" from the "Request Headers" section.
  - Example user-agent format:
```
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
```
- Add your user-agent as a dictionary in the headers_list:
```
headers_list = [
    {"User-Agent": "your_user_agent_here"}
]
```
- Important: Comment out the existing user-agent entries to avoid conflicts.
Use Multiple User-Agents
- For optimal results, add user-agents from different browsers on your system (e.g., Chrome, Firefox, Edge).
- Avoid using randomly generated user-agents, as they are likely to be flagged by the server, preventing successful scraping.
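
One way to make use of several user-agents is to pick one entry from `headers_list` for each request. The sketch below only illustrates that idea and is not taken from web_scrapper.ipynb: `pick_headers` is a hypothetical helper, and the placeholder strings must be replaced with user-agents copied from your own browsers.

```python
import random

# Hypothetical entries: replace with user-agent strings copied from your own browsers.
headers_list = [
    {"User-Agent": "your_chrome_user_agent_here"},
    {"User-Agent": "your_firefox_user_agent_here"},
]


def pick_headers(headers, rng=random):
    """Return a copy of one header dictionary chosen at random from the list."""
    if not headers:
        raise ValueError("headers_list is empty; add at least one user-agent")
    return dict(rng.choice(headers))


print(pick_headers(headers_list))
```

If the notebook sends requests with the `requests` library (an assumption), the returned dictionary could be passed as the `headers` argument of `requests.get`.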
Run the Notebook
- Open the web_scrapper.ipynb file in Jupyter Notebook or Jupyter Lab.
- Execute the cells sequentially to start the scraping process.
- The scraped data will be saved in the data/raw directory.

Why the Scraper is Run in an `.ipynb` File

The scraper is implemented in a Jupyter Notebook (.ipynb) instead of a Python script (.py) because the two behave differently against apartments.com: when run in an .ipynb file, the scraper fetches data successfully, but when the same code is executed as a .py file, the server blocks the requests.

Troubleshooting Tips

If the scraper is blocked:
- Verify that the headers_list contains valid user-agents from your browser(s).
- Avoid using proxies or VPNs, as they may also be flagged.
- Ensure that your network connection is stable.

Instructions on how to clean the data

To clean the data and convert it into a format suitable for analysis:

Run the json_to_csv_converter.py script:
- Navigate to the project root directory (the one that contains the src directory).
- Execute the script to convert the property data from JSON format (output from the scraper) into a CSV file:
```
python src/json_to_csv_converter.py
```
Output:
- The cleaned data will be saved as a CSV file in the data/processed directory.
- This CSV format is optimized for easier manipulation and analysis using libraries like pandas.
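
For readers unfamiliar with this kind of conversion, the sketch below shows the general idea using only the standard library. It is not the code in json_to_csv_converter.py: `records_to_csv` is a hypothetical helper, and the field names in the sample are invented placeholders rather than the scraper's actual schema.

```python
import csv
import io
import json


def records_to_csv(records, out_file):
    """Write a list of flat dictionaries to `out_file` as CSV."""
    # Collect every key, preserving first-seen order, so that records
    # with missing fields still fit under a single header row.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)


# Hypothetical sample standing in for the scraper's JSON output.
sample_json = '[{"address": "1 Example St", "rent": 2100}, {"address": "2 Sample Ave", "beds": 2}]'
buffer = io.StringIO()
records_to_csv(json.loads(sample_json), buffer)
print(buffer.getvalue())
```

Records that lack a field get an empty cell in that column, which pandas reads back as a missing value.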

Instructions on how to create visualizations and run analysis

The analysis is performed using the visualize.ipynb Jupyter Notebook. This notebook generates various plots and includes observations in markdown cells for a detailed understanding of the data.

Steps to Run the Analysis Code

Open the Notebook:
- Navigate to the project directory.
- Open the visualize.ipynb file using Jupyter Notebook or Jupyter Lab:
```
jupyter notebook visualize.ipynb
```
- OR: Open the notebook in any editor that supports Jupyter notebooks, such as VS Code:
  - Install the Python and Jupyter extensions in VS Code if not already installed.
  - Open the visualize.ipynb file directly in VS Code.
Execute the Notebook:
- Run the cells in sequence to generate plots and visualizations.
- Each plot corresponds to a specific aspect of the data and is accompanied by observations in markdown cells.
- Some cells may have long run times because of the volume of the data.
Output:
- The generated plots will be displayed within the notebook.
Review Observations:
- The markdown cells in the notebook contain detailed insights and observations based on the visualized data.

Instructions on how to run the model builder and query the model

The machine learning model is built and tested using the model_building.ipynb Jupyter Notebook, which imports the core model code from model_runner.py. This notebook trains the model and lets you input sample data to check its performance.

Steps to Run the Model Builder

Open the Notebook:
- Navigate to the project directory.
- Open the model_building.ipynb file using Jupyter Notebook, Jupyter Lab, or any compatible editor like VS Code.
```
jupyter notebook model_building.ipynb
```
- OR: Open the notebook directly in VS Code or another editor that supports Jupyter notebooks.
Execute the Notebook:
- Run the cells sequentially to:
  - Load and preprocess the data.
  - Define and train the machine learning model.
  - Evaluate the model's performance.
Testing the Model with Sample Input:
- At the end of the notebook, there are cells that allow you to input sample data to test the model.
- To Get Rent Predictions:
  - Locate the cell containing the user_input dictionary.
  - Modify the user_input dictionary with your desired property features. Example:
```
user_input = {}
```
  - Run the cell to see the predicted rent based on the input features.
- To Get Property Data Based on Desired Rent:
  - Locate the cell that prompts for user input using the input() function.
  - Run the cell, and an input prompt will appear in your notebook interface or code editor.
  - Enter the desired rent when prompted.
  - The model will output property data that matches or closely aligns with the input rent value.
Notes:
- Using the input() Function:
  - Since the notebook uses the input() function, the prompt appears inline below the running cell in Jupyter Notebook or Jupyter Lab, or as an input box at the top of the window in VS Code.
  - Make sure to click on the input prompt and enter your data when it appears.
- Dependencies:
  - Ensure all prior cells have been executed successfully before running the input cells.
  - If you encounter any errors, verify that the data has been loaded and the model has been trained.
Output:
- The model's predictions and any retrieved property data will be displayed directly in the notebook.
- You can modify the inputs and rerun the cells to test different scenarios.
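
The rent-to-property direction can be pictured as a nearest-match lookup over known listings. The sketch below illustrates only that idea and makes no claim about how model_runner.py implements it: `closest_by_rent` is a hypothetical helper, and the listings are invented sample records.

```python
def closest_by_rent(properties, target_rent, k=3):
    """Return up to `k` properties whose rent is nearest to `target_rent`."""
    # Sort by absolute distance from the target and keep the first k.
    return sorted(properties, key=lambda p: abs(p["rent"] - target_rent))[:k]


# Hypothetical records; the real data comes from the processed CSV.
listings = [
    {"id": "A", "rent": 1800},
    {"id": "B", "rent": 2500},
    {"id": "C", "rent": 2150},
    {"id": "D", "rent": 3200},
]

matches = closest_by_rent(listings, target_rent=2200, k=2)
print([m["id"] for m in matches])
```

In a notebook, the value typed at the input() prompt would first need to be converted to a number, for example with `float`, before being passed as `target_rent`.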

ShubhranshuPattnaik/Bidirectional-ML-Framework
