TimMolleman/funda-airflow

Funda Housing Price Prediction Project (API)

A practice project for setting up a data infrastructure that gathers data, transforms it, trains a model, and makes the model available via an API. The primary tooling is Apache Airflow, which orchestrates several AWS services (mainly Lambda functions), and AWS Lambda combined with Python and FastAPI for the API endpoints. The FastAPI service is hosted via AWS API Gateway and Lambda.

For the project, basic housing information is scraped from the Dutch real-estate website Funda and saved to an Amazon S3 bucket. The data is then transformed and a model is trained, all using AWS Lambda functions (see repository). The trained model is also saved to S3 and exposed for predictions via the API (see repository). Apache Airflow schedules the Lambda functions and performs a number of additional transformations (see repository).
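The ordering described above (scrape, transform, train) can be illustrated with a minimal Python sketch. It is not the project's actual Airflow code: the Lambda function names, the `run_date` payload field, and the `run_pipeline` helper are all hypothetical, and the invoker is passed in as a plain callable so the example runs without AWS credentials.

```python
import json
from typing import Callable, Dict, List

# Hypothetical Lambda function names (assumption): the real names are defined
# in the project's Lambda and Terraform repositories, not in this README.
PIPELINE_STEPS: List[str] = [
    "funda-scrape",     # scrape listings and write the raw data to S3
    "funda-transform",  # transform the raw data
    "funda-train",      # train the model and save it to S3
]

# An invoker takes a function name and a JSON payload and returns a response
# dict shaped like the one boto3's Lambda client returns from invoke().
Invoker = Callable[[str, bytes], Dict]


def run_pipeline(invoke: Invoker, run_date: str) -> List[str]:
    """Invoke each step in order and stop at the first failure."""
    completed: List[str] = []
    # Hypothetical payload (assumption): the real Lambdas may expect other fields.
    payload = json.dumps({"run_date": run_date}).encode("utf-8")
    for name in PIPELINE_STEPS:
        response = invoke(name, payload)
        # A synchronous Lambda invocation reports StatusCode 200 on success and
        # includes a FunctionError key when the function itself raised an error.
        if response.get("StatusCode") != 200 or "FunctionError" in response:
            raise RuntimeError(f"Step {name!r} failed: {response}")
        completed.append(name)
    return completed


def fake_invoke(function_name: str, payload: bytes) -> Dict:
    """Stand-in invoker for local runs; always reports success."""
    return {"StatusCode": 200}


if __name__ == "__main__":
    print(run_pipeline(fake_invoke, "2021-01-01"))
```

Against real AWS, the invoker could wrap `boto3.client("lambda").invoke(FunctionName=..., InvocationType="RequestResponse", Payload=...)`. In an Airflow DAG, each step would more typically be its own task, so that retries and dependencies are handled by the scheduler rather than by a loop like this one.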

To manage the AWS infrastructure reliably and ensure reusability, Terraform is used (see repository).

Description

This repository contains the code for starting and running the Apache Airflow instance locally.

Getting Started

Dependencies

The Python version recommended for running this project is 3.8.

Executing program

To run the Airflow instance, run the start.sh shell script. To stop the instance, run the stop.sh shell script; to restart it, run the restart.sh shell script.

Authors

Tim Molleman
