DSCI 510 Final Project

Name of the Project

Bi-directional Machine Learning framework for Real-Estate Investment Advisory

Team Members (Name and Student IDs)

  1. Rahul Katinni 8875-551-0040
  2. Shubhranshu Pattnaik 3217-0118-70

Instructions to create a conda or virtual environment

For Conda:

  1. Create a new Conda environment with the latest Python version:
    conda create -n project_env python -y
  2. Activate the environment:
    conda activate project_env

For Virtual Environment:

  1. Create a virtual environment with the latest Python version:

    python -m venv project_env

    Ensure that your system's Python is updated to the latest version before running this command.

  2. Activate the environment:

    • On Linux/macOS:
      source project_env/bin/activate
    • On Windows:
      project_env\Scripts\activate
  3. Verify the Python version in the environment:

    python --version

This should display Python 3.11 or whichever version is installed in the environment.


Instructions on how to install the required libraries

  1. Ensure you have activated the environment:

    • For Conda:
      conda activate project_env
    • For Virtual Environment:
      source project_env/bin/activate  # On Linux/macOS
      project_env\Scripts\activate     # On Windows
  2. Install dependencies from the requirements.txt file (a sample file is sketched after this list):

    pip install -r requirements.txt
  3. Verify the installed libraries:

    pip list

    This will display a list of all installed packages and their versions.

  4. Troubleshooting:

    • If any dependencies fail to install, ensure your Python version matches the project's compatibility requirements.
    • For Conda users, you can try resolving package conflicts using:
      conda install --file requirements.txt
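
For reference, a requirements.txt for a project like this typically lists the scraping, analysis, and modeling libraries the README relies on. The package list below is only an assumption based on the tools mentioned in this document; the file shipped with the repository is authoritative:

    # Hypothetical requirements.txt -- defer to the repo's actual file
    requests
    beautifulsoup4
    pandas
    numpy
    matplotlib
    scikit-learn
    notebook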

Instructions on how to download the data

The data required for this project is scraped from apartments.com using the provided Jupyter Notebook web_scrapper.ipynb. Because the site restricts automated requests, some extra configuration is required for the scraper to run successfully.

Instructions for Running the Web Scraper

  1. Add Your Browser's User-Agent

    • Locate the headers_list variable in web_scrapper.ipynb.
    • Replace the existing headers_list values with user-agents from your browser's request headers.
      • To find your browser's user-agent:
        1. In your browser, navigate to https://www.apartments.com/ and press F12 (or right-click and choose Inspect) to open the Developer Tools.
        2. Navigate to the "Network" tab.
        3. Reload the webpage and select any request.
        4. Copy the "User-Agent" from the "Request Headers" section.
      • Example user-agent format:
        Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
        
    • Add your user-agent as a dictionary in the headers_list:
      headers_list = [
          {"User-Agent": "your_user_agent_here"}
      ]
    • Important: Comment out the existing user-agent entries to avoid conflicts.
  2. Use Multiple User-Agents

    • For optimal results, add user-agents from different browsers on your system (e.g., Chrome, Firefox, Edge).
    • Avoid using randomly generated user-agents, as they are likely to be flagged by the server, preventing successful scraping.
  3. Run the Notebook

    • Open the web_scrapper.ipynb file in Jupyter Notebook or Jupyter Lab.
    • Execute the cells sequentially to start the scraping process (a rough sketch of the request loop follows this list).
    • The scraped data will be saved in the data/raw directory.
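
For orientation, the core of the scraping loop looks roughly like the sketch below: a minimal illustration assuming the notebook uses the requests library and rotates through headers_list. The URL pattern, variable names, and output path are hypothetical; defer to the actual cells in web_scrapper.ipynb.

    import json
    import random
    import time
    from pathlib import Path

    import requests

    # Replace with user-agents copied from your own browsers (see step 1 above).
    headers_list = [
        {"User-Agent": "your_user_agent_here"},
    ]

    # Hypothetical listing-page URL pattern; the real notebook defines its own.
    base_url = "https://www.apartments.com/los-angeles-ca/{page}/"

    pages = []
    for page in range(1, 4):  # small range for illustration
        headers = random.choice(headers_list)  # rotate user-agents per request
        response = requests.get(base_url.format(page=page), headers=headers, timeout=30)
        if response.status_code != 200:
            print(f"Blocked or failed on page {page}: HTTP {response.status_code}")
            break
        pages.append(response.text)  # the real notebook parses listings here
        time.sleep(random.uniform(2, 5))  # polite delay between requests

    # The notebook writes parsed listings as JSON under data/raw.
    Path("data/raw").mkdir(parents=True, exist_ok=True)
    with open("data/raw/sample_output.json", "w") as f:
        json.dump({"pages_fetched": len(pages)}, f)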

Why the Scraper is Run in an .ipynb File

The scraper is implemented in a Jupyter Notebook (.ipynb) instead of a Python script (.py) due to the way Jupyter Notebooks execute code. When run in an .ipynb file, the scraper can successfully bypass server restrictions on apartments.com. However, if the same code is executed as a .py file, the server blocks the scraper, making it unable to fetch data.


Troubleshooting Tips

  • If the scraper is blocked:
    • Verify that the headers_list contains valid user-agents from your browser(s).
    • Avoid using proxies or VPNs, as they may also be flagged.
  • Ensure that your network connection is stable.

Instructions on how to clean the data

To clean the data and convert it into a format suitable for analysis:

  1. Run the json_to_csv_converter.py script:

    • Navigate to the project root directory.
    • Execute the script to convert the property data from JSON format (output from the scraper) into a CSV file (a sketch of the conversion logic follows this list):
      python src/json_to_csv_converter.py
  2. Output:

    • The cleaned data will be saved as a CSV file in the data/processed directory.
    • This CSV format is optimized for easier manipulation and analysis using libraries like pandas.
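
As a rough illustration of what the converter does, the sketch below flattens scraped JSON records into a CSV with pandas. The file names and record structure are assumptions; json_to_csv_converter.py in src is the authoritative implementation.

    import json
    from pathlib import Path

    import pandas as pd

    # Hypothetical file names; the real script defines its own paths under data/.
    raw_path = Path("data/raw/listings.json")
    out_path = Path("data/processed/listings.csv")

    with raw_path.open() as f:
        records = json.load(f)  # assumed: a list of property dicts

    # json_normalize flattens nested fields (e.g., address blocks) into columns.
    df = pd.json_normalize(records)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_path, index=False)
    print(f"Wrote {len(df)} rows to {out_path}")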

Instructions on how to create visualizations and run analysis

The analysis is performed using the visualize.ipynb Jupyter Notebook. This notebook generates various plots and includes observations in markdown cells for a detailed understanding of the data.

Steps to Run the Analysis Code

  1. Open the Notebook:

    • Navigate to the project directory.
    • Open the visualize.ipynb file using Jupyter Notebook or Jupyter Lab:
      jupyter notebook visualize.ipynb
    • OR: Open the notebook in any editor that supports Jupyter notebooks, such as VS Code:
      • Install the Python and Jupyter extensions in VS Code if not already installed.
      • Open the visualize.ipynb file directly in VS Code.
  2. Execute the Notebook:

    • Run the cells in sequence to generate plots and visualizations (a minimal plotting example follows this list).
    • Each plot corresponds to a specific aspect of the data and is accompanied by observations in markdown cells.
    • Some cells may have long run times because of the volume of the data.
  3. Output:

    • The generated plots will be displayed within the notebook.
  4. Review Observations:

    • The markdown cells in the notebook contain detailed insights and observations based on the visualized data.
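
For a sense of what a plotting cell looks like, here is a minimal example using pandas and matplotlib. The CSV path and the rent column name are assumptions; match them to the processed data:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical path and column name; adjust to the processed CSV.
    df = pd.read_csv("data/processed/listings.csv")

    df["rent"].plot.hist(bins=50)
    plt.xlabel("Monthly rent (USD)")
    plt.ylabel("Number of listings")
    plt.title("Distribution of scraped rents")
    plt.show()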

Instructions on how to run the model builder and query the model

The machine learning model is built and tested using the model_building.ipynb Jupyter Notebook, which imports the core model code from model_runner.py. The notebook trains the model and lets you supply sample data to check its performance.

Steps to Run the Model Builder

  1. Open the Notebook:

    • Navigate to the project directory.
    • Open the model_building.ipynb file using Jupyter Notebook or Jupyter Lab:
      jupyter notebook model_building.ipynb
    • OR: Open the notebook directly in VS Code or any other editor that supports Jupyter notebooks.
  2. Execute the Notebook:

    • Run the cells sequentially to:
      • Load and preprocess the data.
      • Define and train the machine learning model.
      • Evaluate the model's performance.
  3. Testing the Model with Sample Input:

    • At the end of the notebook, there are cells that allow you to input sample data to test the model (a combined sketch of both directions follows this list).

    • To Get Rent Predictions:

      • Locate the cell containing the user_input dictionary.
      • Modify the user_input dictionary with your desired property features. The keys below are illustrative only; use the feature names the notebook expects:
        user_input = {"bedrooms": 2, "bathrooms": 1, "sqft": 850}
      • Run the cell to see the predicted rent based on the input features.
    • To Get Property Data Based on Desired Rent:

      • Locate the cell that prompts for user input using the input() function.
      • Run the cell, and an input prompt will appear in your code editor or notebook interface.
      • Enter the desired rent when prompted.
      • The model will output property data that matches or closely aligns with the input rent value.
  4. Notes:

    • Using the input() Function:
      • Since the notebook uses the input() function, the prompt will appear inline in the notebook or in the terminal where the notebook server is running.
      • Click on the input prompt and enter your data when it appears.
    • Dependencies:
      • Ensure all prior cells have been executed successfully before running the input cells.
      • If you encounter any errors, verify that the data has been loaded and the model has been trained.
  5. Output:

    • The model's predictions and any retrieved property data will be displayed directly in the notebook.
    • You can modify the inputs and rerun the cells to test different scenarios.
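
To make the two directions concrete, here is a minimal sketch assuming a scikit-learn regressor trained on the processed CSV. Every name here (the feature columns, the file path, and the nearest-rent lookup) is an assumption for illustration; model_runner.py and the notebook define the real interface.

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    # Hypothetical training setup; the real pipeline lives in model_runner.py.
    df = pd.read_csv("data/processed/listings.csv")
    features = ["bedrooms", "bathrooms", "sqft"]  # assumed feature columns
    model = RandomForestRegressor(random_state=42)
    model.fit(df[features], df["rent"])

    # Direction 1: property features -> predicted rent.
    user_input = {"bedrooms": 2, "bathrooms": 1, "sqft": 850}
    predicted_rent = model.predict(pd.DataFrame([user_input]))[0]
    print(f"Predicted rent: ${predicted_rent:,.0f}")

    # Direction 2: desired rent -> closest matching listings.
    desired_rent = float(input("Enter desired rent: "))
    closest = df.iloc[(df["rent"] - desired_rent).abs().argsort()[:5]]
    print(closest[features + ["rent"]])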
