This project aims to build a machine learning pipeline to predict short-term rental prices in New York City. The pipeline is built using MLflow and the model is trained on a dataset of rental listings.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Please ensure you have the following software installed on your system:
- Python 3.7+
- Conda
The specific Python packages required are listed in the environment.yml file.
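The prerequisites above can be checked before creating the environment. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it only verifies that the running interpreter is at least Python 3.7 and that a `conda` executable is on the PATH.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 7)):
    """Return a dict saying whether each prerequisite appears to be met.

    Hypothetical helper for illustration; it is not part of this project.
    """
    return {
        # Compare the running interpreter's (major, minor) with the minimum.
        "python": sys.version_info[:2] >= min_python,
        # shutil.which returns None when no `conda` executable is on PATH.
        "conda": shutil.which("conda") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'not satisfied'}")
```

Run it with the interpreter you intend to use for the project, since the version check applies to whichever Python executes the script.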
- Clone the repository:
  git clone https://github.com/AndrewAungKo/build-ml-pipeline-for-short-term-rental-prices.git
- Navigate to the project directory:
  cd build-ml-pipeline-for-short-term-rental-prices
- Create a new Conda environment:
  conda env create -f environment.yml
- Activate the new environment:
  conda activate nyc_airbnb_dev
- Follow the instructions in installation.txt to complete the setup.
To run the ML pipeline, execute the following command:
python main.py
To run the ML pipeline with MLflow, first ensure that the MLflow server is running. You can start the server with the following command:
mlflow server
Then, you can run the pipeline with the following command:
mlflow run . -P steps=test_regression_model
Replace test_regression_model with the name of the step you want to run.
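If you run individual steps often, the command above can be assembled from Python. This is a minimal sketch under stated assumptions: `build_step_command` and `run_step` are hypothetical helpers, not part of the repository, and they simply wrap the same `mlflow run . -P steps=<step>` invocation shown above.

```python
import subprocess


def build_step_command(step, project_uri="."):
    """Build the argument list for `mlflow run <uri> -P steps=<step>`.

    Hypothetical helper for illustration; it is not part of this project.
    """
    if not step or any(ch.isspace() for ch in step):
        raise ValueError("step must be a non-empty name without whitespace")
    return ["mlflow", "run", project_uri, "-P", f"steps={step}"]


def run_step(step, project_uri="."):
    """Run one step; raises CalledProcessError if mlflow exits non-zero."""
    subprocess.run(build_step_command(step, project_uri), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_step_command("test_regression_model")))
```

Calling `run_step("test_regression_model")` from the project directory would launch the same step as the shell command, provided `mlflow` is installed in the active environment.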
Alternatively, you can run a released version directly with mlflow, without any other prerequisites. The following command trains the model on a new data sample (sample2.csv):
mlflow run https://github.com/AndrewAungKo/build-ml-pipeline-for-short-term-rental-prices.git \
-v 1.0.1 \
-P hydra_options="etl.sample='sample2.csv'"
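The nested quotes in `hydra_options` are easy to get wrong when the command is built programmatically. The sketch below assembles the same release command as an argument list, so no shell quoting is involved. `build_release_command` is a hypothetical helper, not part of the repository; the URL, version, and sample name are the ones used in the command above.

```python
import subprocess

REPO_URL = "https://github.com/AndrewAungKo/build-ml-pipeline-for-short-term-rental-prices.git"


def build_release_command(repo_url, version, sample):
    """Build `mlflow run <repo_url> -v <version> -P hydra_options=...`.

    Hypothetical helper for illustration; it is not part of this project.
    """
    # The single quotes around the sample name are part of the Hydra override.
    hydra_options = f"etl.sample='{sample}'"
    return [
        "mlflow", "run", repo_url,
        "-v", version,
        "-P", f"hydra_options={hydra_options}",
    ]


def run_release(repo_url, version, sample):
    """Run the release; raises CalledProcessError if mlflow exits non-zero."""
    subprocess.run(build_release_command(repo_url, version, sample), check=True)


if __name__ == "__main__":
    # Print the argument list instead of running it, to avoid side effects.
    print(build_release_command(REPO_URL, "1.0.1", "sample2.csv"))
```

Passing the arguments as a list means each element reaches `mlflow` unchanged, which is why the inner single quotes need no escaping here.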
You can view the project on Wandb here.
- Code owners: see CODEOWNERS
- License: see the LICENSE.txt file for details
The code owners for this repository are @udacity/active-public-content. They will be automatically requested for review when you open a pull request.