Logo

Flights Metrics

Near real-time metrics for flights over a region.

Visit the Dashboard »

Flights Metrics is a data pipeline that provides hourly analytics on flights over a city or region (London is the default configuration). The pipeline is built in Python and deployed on several AWS services: Lambda, S3, and Athena (Glue was used in an earlier version; see the note under Architecture).

Preview

(Dashboard preview)

Architecture

The project is deployed almost entirely on AWS; the only exception is the Kafka broker, which is hosted on the free tier of Upstash. Metabase is deployed on Amazon Lightsail as a Docker container. Flights are fetched every 15 minutes by a producer Lambda function and consumed every hour by another Lambda function; Apache Kafka serves as both the streaming layer and a buffer in this setup.
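
For illustration, the producer side might look roughly like the sketch below. This is a minimal sketch, not the repository's actual code: fetch_flights() is a hypothetical helper, the topic name and bounding box are placeholders, and the SASL settings are an assumption about how the Upstash broker is reached.

import json
from kafka import KafkaProducer

def fetch_flights(bbox):
    # Hypothetical helper: query the unofficial Flight Radar API and
    # return a list of flight records inside the bounding box.
    ...

def handler(event, context):
    # Lambda entry point, triggered every 15 minutes (e.g. by EventBridge).
    producer = KafkaProducer(
        bootstrap_servers="BOOTSTRAP_SERVER",  # value from kafka.json
        security_protocol="SASL_SSL",          # assumed Upstash-style auth
        sasl_mechanism="SCRAM-SHA-256",
        sasl_plain_username="USERNAME",
        sasl_plain_password="PASSWORD",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for flight in fetch_flights(bbox=(51.3, -0.5, 51.7, 0.3)):  # rough London box
        producer.send("flights", value=flight)
    producer.flush()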

The consumer creates partitions in Athena (see the note below), and the Metabase dashboard runs SQL queries against Athena to retrieve the metrics. Data is kept in S3 for 3 days and then automatically deleted through an S3 lifecycle policy to reduce costs.
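
The 3-day retention rule can be expressed as an S3 lifecycle configuration; here is a sketch using boto3, with the bucket name and prefix as placeholders:

import boto3

s3 = boto3.client("s3")

# Expire raw flight objects 3 days after creation
# (bucket name and prefix are placeholders).
s3.put_bucket_lifecycle_configuration(
    Bucket="flights-metrics-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-raw-flights",
                "Filter": {"Prefix": "flights/"},
                "Status": "Enabled",
                "Expiration": {"Days": 3},
            }
        ]
    },
)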

(Architecture diagram)

Note: The old architecture used Glue to automate adding new partitions from S3 to Athena. That approach turned out to be expensive, so it was replaced by a new method in stream/S3Consumer.py that registers new S3 prefixes as partitions directly with an Athena query, reducing costs by over 60%!
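
A sketch of how such a query can be issued from Python with boto3; the table, database, and S3 locations below are placeholders rather than the exact values used in stream/S3Consumer.py:

import boto3

athena = boto3.client("athena")

# Register a new hourly S3 prefix as an Athena partition.
# All names and locations are placeholders.
query = (
    "ALTER TABLE flights ADD IF NOT EXISTS "
    "PARTITION (dt='2023-01-01', hour='13') "
    "LOCATION 's3://flights-metrics-bucket/flights/dt=2023-01-01/hour=13/'"
)

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "flights_db"},
    ResultConfiguration={"OutputLocation": "s3://flights-metrics-bucket/athena-results/"},
)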

Setting up

To configure the project with your preferences, edit these three configuration files in the config folder (a loading sketch follows the list):

  • aws.json: Rename aws.example.json to aws.json and fill in the right parameters: AWS access key, secret key, and session token (not required).
  • env.json: Rename env.example.json to env.json and set the bounding box coordinates for the region of your choice (London is the default configuration).
  • kafka.json: Similarly, rename kafka.example.json to kafka.json and include all the parameters Kafka needs to run: bootstrap server, username, password, etc. Feel free to experiment with other options.
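
For reference, a minimal sketch of how these files might be read from Python; the key names in the comments are assumptions, since the example files in the repo define the actual schema:

import json
from pathlib import Path

CONFIG_DIR = Path("config")

def load_config(name):
    # Read one JSON configuration file from the config folder.
    with open(CONFIG_DIR / name) as f:
        return json.load(f)

aws_cfg = load_config("aws.json")      # access key, secret key, session token
env_cfg = load_config("env.json")      # bounding box for the target region
kafka_cfg = load_config("kafka.json")  # bootstrap server, username, password, ...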

Deploying to AWS Lambda

Currently, there are two bash scripts that generate the zip files to upload to AWS Lambda: vendor.zip for the dependencies and package.zip for both the producer and consumer functions.

# To package the libraries into one zip, run:
bash package-deps.sh
# To package the app into one zip, run (the script name here is assumed
# from the zip it produces; check the repo for the exact name):
bash package.sh
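
Once the archives are built, they can be uploaded by hand in the console or pushed with a few lines of boto3, as sketched below; the function and layer names are placeholders, not the project's actual resource names:

import boto3

lambda_client = boto3.client("lambda")

# Publish the dependencies as a Lambda layer ("flights-deps" is a placeholder).
with open("vendor.zip", "rb") as f:
    lambda_client.publish_layer_version(
        LayerName="flights-deps",
        Content={"ZipFile": f.read()},
    )

# Update an existing function with the application code
# ("flights-producer" is a placeholder name).
with open("package.zip", "rb") as f:
    lambda_client.update_function_code(
        FunctionName="flights-producer",
        ZipFile=f.read(),
    )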

Improvements for the future

  • Implement a CI/CD pipeline with GitHub Actions and AWS SAM to automate the deployment of new versions.
  • Add an ETL function to convert JSON files to Parquet for more efficient analytics (a rough sketch of this conversion follows the list).
  • Cache some parts of the dashboard hourly to reduce the number of requests and costs.
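
As an idea of what that ETL step could look like, here is a minimal sketch assuming newline-delimited JSON records and local placeholder paths (the real function would read from and write back to S3):

import pandas as pd

# Convert newline-delimited JSON flight records to Parquet
# (requires pyarrow or fastparquet; paths are placeholders).
df = pd.read_json("flights-2023-01-01-13.json", lines=True)
df.to_parquet("flights-2023-01-01-13.parquet", index=False)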
