JamisonUK/GroupA

Data Engineering Project

A repository for group work
Read Me

Kanban Board · Log an Issue · Quick Guide

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

This is a Data Engineering group project built with Apache Airflow. It aims to create a data pipeline that gives a business smooth and efficient data delivery. The business model is a music streaming service.
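As a rough illustration of what such a pipeline can look like in Airflow, here is a minimal sketch, not the project's actual DAG. The DAG id `song_pipeline_sketch`, the task names, and the sample rows are all hypothetical. The Airflow import is guarded so the plain Python functions can still be run on a machine without Airflow installed.

```python
from datetime import datetime


def extract_songs():
    """Return hard-coded rows standing in for a real dataset read (hypothetical data)."""
    return [
        {"title": " Song A ", "artist": "Artist 1", "plays": "10"},
        {"title": "Song B", "artist": "Artist 2", "plays": "3"},
    ]


def transform_songs(rows):
    """Trim whitespace from text fields and cast play counts to integers."""
    return [
        {
            "title": row["title"].strip(),
            "artist": row["artist"].strip(),
            "plays": int(row["plays"]),
        }
        for row in rows
    ]


def transform_task(ti):
    """Airflow callable: pull the extract task's return value from XCom and clean it."""
    return transform_songs(ti.xcom_pull(task_ids="extract"))


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    # Airflow is not installed; the functions above remain usable on their own.
    DAG = None

if DAG is not None:
    # Argument names follow Airflow 2.1.x, the version pinned in this README.
    with DAG(
        dag_id="song_pipeline_sketch",  # hypothetical name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_songs)
        transform = PythonOperator(task_id="transform", python_callable=transform_task)
        extract >> transform  # run transform only after extract succeeds
```

Placed in the Airflow `dags` folder, a file like this would show up as a two-task DAG in the web UI. Keeping the logic in plain functions makes each step testable without a running scheduler.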

The datasets we are using come from the Million Song Dataset (http://millionsongdataset.com/). We will combine data from several of its sets, which cover information about songs ranging from listener data to lyric content.
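Combining several sets usually comes down to joining them on a shared key. The sketch below shows one way to do that with the standard library. It assumes two CSV extracts that share a `track_id` column; the column names and sample values are hypothetical and do not reflect the real dataset layout.

```python
import csv
import io

# Hypothetical sample extracts; real files would be read from disk instead.
LISTENS_CSV = """track_id,play_count
T1,12
T2,5
"""

LYRICS_CSV = """track_id,word_count
T1,240
T3,180
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_track(listens, lyrics):
    """Inner join: keep only tracks that appear in both sets."""
    lyrics_by_id = {row["track_id"]: row for row in lyrics}
    joined = []
    for row in listens:
        match = lyrics_by_id.get(row["track_id"])
        if match is not None:
            joined.append({**row, **match})
    return joined


combined = join_on_track(read_rows(LISTENS_CSV), read_rows(LYRICS_CSV))
```

An inner join drops tracks that are missing from either set. Whether that is the right behaviour depends on how complete each set is.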

A Quick Guide is available in the Wiki section, covering general installation and common commands.

(back to top)

Built With

Tools and technologies used to deliver this project:

  • Airflow
  • Python
  • MySQL

Badges made with:

  • ShieldsIO

(back to top)

Getting Started

Follow the steps below to set up a local copy of the project.

Prerequisites

  • Docker Desktop
  • Code Editor

Installation

  1. Clone the repo
    git clone https://github.com/JamisonUK/GroupA.git
  2. Install Docker Desktop (the command below uses Chocolatey on Windows)
    choco install docker-desktop --pre
  3. Install Airflow
    pip install "apache-airflow[celery]==2.1.4" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.6.txt"
  4. Initialise Airflow (runs the database migrations and creates the first user account)
    docker-compose up airflow-init
  5. Start the Airflow services
    docker-compose up
  6. Navigate to http://localhost:8080
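Once the containers are running, the Airflow webserver exposes a `/health` endpoint that reports the status of the metadata database and the scheduler. The sketch below polls it from Python. It assumes the default port 8080 from the steps above; the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def is_healthy(payload):
    """Return True only if both the metadatabase and the scheduler report 'healthy'."""
    return all(
        payload.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def check_airflow(base_url="http://localhost:8080", timeout=5):
    """Fetch <base_url>/health and evaluate the JSON response."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return is_healthy(json.load(response))
```

Calling `check_airflow()` raises an error from `urllib` if the webserver is not reachable yet, which typically means the containers are still starting.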

(back to top)

Usage

Please refer to the Apache Airflow documentation.

(back to top)

Roadmap

  • Identifying the requirements (functional and non-functional)
  • Prioritising the requirements (if applicable)
  • Task allocation
  • Identifying the scope of the project
  • Identifying the stakeholders
  • Risk management

(back to top)

Sprints

  • Sprint 1

    • GitHub project for coursework setup.
    • Product backlog created.
    • Initial tasks are defined as user stories.
    • Kanban/project board being used.
    • Sprint boards are being used.
    • Necessary starting docker files for the project set up and working.
    • Correct branches for GitFlow workflow created – includes master, develop, and release branches.
    • The first release was created on GitHub.
    • Code of Conduct defined
  • Sprint 2

    • Kanban Board being used
    • Issues updated
    • Dataset Implemented as csv
    • Apache Airflow set up on every team member's computer
    • Zube.IO Updated
    • Fix issue with Docker
  • Sprint 3

    • DAGs completed and running
    • Dataset is now live via Git LFS on the develop branch
    • API selected: Spotify
    • Docker Compose bug fixed
    • Develop branch being used
    • Kanban Board Updated
  • Sprint 4

    • DAGs set up
    • Spotify API implemented
    • Postgres database functional
    • Airflow working
    • Implementation done
    • Front end: Elasticsearch

(back to top)

License

This project is licensed under the terms of the MIT license. See LICENSE.cmd for more information.

(back to top)

Contact

Project Link: https://github.com/JamisonUK/GroupA

(back to top)

Acknowledgments

State resources or references

(back to top)
