A repository for group work
Read Me
Kanban Board · Log an Issue · Quick Guide
This is a Data Engineering group project built with Apache Airflow. It aims to create a data pipeline that delivers a business's data smoothly and efficiently. The business model is a music streaming service.
The datasets we are using come from http://millionsongdataset.com/. We will be coordinating data from several sets, covering information about songs that ranges from listener data to lyric content.
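As a rough illustration of what coordinating several sets can look like, the sketch below joins listener play counts with lyric word counts on a shared track ID, using only the Python standard library. The column names (`track_id`, `user_id`, `play_count`, `word_count`) and the sample rows are hypothetical and are not taken from the project's actual CSV files.

```python
import csv
import io

# Hypothetical sample data; the real project files and column names may differ.
LISTENS_CSV = """track_id,user_id,play_count
TR001,u1,3
TR001,u2,5
TR002,u1,1
"""

LYRICS_CSV = """track_id,word_count
TR001,120
TR003,95
"""


def total_plays(listens_file):
    """Sum play counts per track from a listener CSV."""
    totals = {}
    for row in csv.DictReader(listens_file):
        track_id = row["track_id"]
        totals[track_id] = totals.get(track_id, 0) + int(row["play_count"])
    return totals


def join_with_lyrics(totals, lyrics_file):
    """Inner-join per-track play totals with lyric rows on track_id."""
    joined = []
    for row in csv.DictReader(lyrics_file):
        track_id = row["track_id"]
        if track_id in totals:  # keep only tracks present in both sets
            joined.append({
                "track_id": track_id,
                "total_plays": totals[track_id],
                "word_count": int(row["word_count"]),
            })
    return joined


if __name__ == "__main__":
    totals = total_plays(io.StringIO(LISTENS_CSV))
    for record in join_with_lyrics(totals, io.StringIO(LYRICS_CSV)):
        print(record)
```

An inner join is used here so that only tracks present in both sets survive; whether the pipeline should instead keep unmatched tracks is a design choice this sketch does not settle.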
A Quick Guide covering general installation, command-line usage, and more is available in the Wiki section.
Getting Started
- Docker Desktop
- Code Editor
- Clone the repo
git clone https://github.com/JamisonUK/GroupAe.git
- Install Docker
choco install docker-desktop --pre
- Install Airflow
pip install "apache-airflow[celery]==2.1.4" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.6.txt"
- Initialise Airflow
docker-compose up airflow-init
- Launch Docker
docker-compose up
- Navigate to http://localhost:8080
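Once the containers are running, a small script can confirm that the webserver is reachable before opening the browser. The sketch below is a minimal example that assumes the Airflow 2.x `/health` endpoint is exposed on the port used above and returns a JSON object with one `status` field per component. The function names `is_healthy` and `check_airflow` are illustrative and are not part of this project.

```python
import json
import urllib.request

# Assumed URL: the webserver port from the step above plus Airflow's /health path.
HEALTH_URL = "http://localhost:8080/health"


def is_healthy(payload):
    """Return True if every component in a /health JSON payload reports 'healthy'."""
    components = json.loads(payload)
    if not isinstance(components, dict) or not components:
        return False
    return all(
        isinstance(component, dict) and component.get("status") == "healthy"
        for component in components.values()
    )


def check_airflow(url=HEALTH_URL, timeout=5):
    """Fetch the health endpoint; return False if it is unreachable or unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers malformed JSON.
        return False


if __name__ == "__main__":
    print("Airflow is up" if check_airflow() else "Airflow is not reachable yet")
```

The services can take a little while to start after `docker-compose up`, so a negative result shortly after launch does not necessarily indicate a problem.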
Please refer to the Apache Airflow documentation.
- Identifying the requirements (functional and non-functional)
- Prioritising the requirements (if applicable)
- Task allocation
- Identifying the scope of your project
- Identifying the stakeholders
- Risk management
Sprint 1
- GitHub project for coursework set up.
- Product backlog created.
- Initial tasks are defined as user stories.
- Kanban/project board being used.
- Sprint boards are being used.
- Initial Docker files for the project set up and working.
- Correct branches for GitFlow workflow created – includes master, develop, and release branches.
- The first release was created on GitHub.
- Code of Conduct defined
Sprint 2
- Kanban Board being used
- Issues updated
- Dataset implemented as CSV
- Apache Airflow set up on every team member's computer
- Zube.io updated
- Fix issue with Docker
Sprint 3
- DAGs completed and running
- Dataset is now live via Git LFS on the develop branch
- API selected: Spotify
- Docker Compose bug fixed
- Develop branch being used
- Kanban Board Updated
Sprint 4
- DAGs set up
- Spotify API implemented
- Postgres database functional
- Airflow working
- Implementation done
- Front end: Elasticsearch
This project is licensed under the terms of the MIT license. Check LICENSE.cmd for more information.
Project Link: https://github.com/JamisonUK/GroupA