Google Summer of Code '21
Making FOSSology architecture microservice friendly
Table of Contents
- Microservice Architecture
- Improvements over old cluster install
- Separate agents
- Docker Images
- Too many agents ... Too many services
- Migrating configuration from local files to shared key-value pair storage
- Available agents in microservice
- List of Kubernetes manifests
- List of Dockerfiles
- Pull Requests
- Known issues and drawbacks
- Contact Information
FOSSology is an open-source license compliance software system and toolkit. As a toolkit, it can run license, copyright, and export control scans from the command line; as a system, it provides a database and web UI for a compliance workflow.
FOSSology is designed in a modular fashion, but it does not follow a microservices architecture. If an agent's logic changes, the whole source code has to be rebuilt and reinstalled, whereas in a microservices architecture only that agent would need to be rebuilt and redeployed. That is what this project accomplishes: with the use of cloud technologies like Docker and Kubernetes, FOSSology can be installed in the cloud in a microservice-friendly way.
Improvements over old cluster install
- Modules are installed in containers instead of VMs, which saves space and time and allows the use of modern cloud technologies like Docker and Kubernetes and CI/CD tools like Jenkins, Travis CI, and GitHub Actions.
- Agents can be installed separately, and only the required agents need to be installed. The system identifies the available agents and modifies the scheduler configuration and UI accordingly.
- Easy installation using Kubernetes with a simple `kubectl` command. The cluster will be up and running in a couple of minutes, whereas the old install required creating a VM for each module and establishing SSH communication between all machines, which could take a couple of hours.
- Easy scaling in or out using Kubernetes: any agent can be scaled up dynamically when needed with a few simple commands.
- Key-value pair storage (`etcd`) instead of conf files in every container. This gives one shared place for the configuration data used by the agents and the scheduler, so a change needs to be applied only once, whereas in the old cluster install the same change had to be made to the conf files inside every VM, a hectic process that took time.
Separate agents
All existing FOSSology installation methods require installing all agents: you can't update, delete, or deploy a single agent; all agents have to be treated as a single module. With more than 25 agents, updating a single agent is a hectic process. With the new architecture, every agent is treated as a separate module, and only the required agents need to be installed, saving time and resources. With Kubernetes, agents can be scaled in or out, and when an agent is added to or deleted from the cluster there is no need to restart the scheduler: all configurations are reloaded automatically. The FOSSology scheduler was designed to test an agent's host machine before spawning the agent and running the job; with the new architecture, the scheduler identifies the agents installed on each host and spawns each agent from its own host.
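As an illustration, a per-agent Kubernetes Deployment could look like the sketch below. All names here (the `nomos-agent` deployment, the `fossology/nomos` image, the labels) are hypothetical and not the project's actual manifests:

```yaml
# Hypothetical manifest: deploys one FOSSology agent ("nomos")
# independently of all other agents.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nomos-agent
  labels:
    app: fossology
    agent: nomos
spec:
  replicas: 1          # this agent can be scaled alone, without touching the others
  selector:
    matchLabels:
      agent: nomos
  template:
    metadata:
      labels:
        app: fossology
        agent: nomos
    spec:
      containers:
      - name: nomos
        image: fossology/nomos:latest   # hypothetical per-agent image
```

With a manifest like this, `kubectl apply -f nomos-deployment.yaml` brings the agent up and `kubectl scale deployment nomos-agent --replicas=3` scales it, leaving every other agent untouched.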
Docker Images
To separate the agents, FOSSology's Debian packages are used. Using the `fo-debuild` command-line tool, FOSSology is packed into Debian packages. After `fo-debuild` finishes building the packages, a Docker multi-stage build copies only the packages into the image, bringing the image size down from 1.5 GB to only 15 MB. This first Docker image, `fossology/packages`, is used as a base image and contains all the Debian packages: agents, scheduler, web, common, etc.
```dockerfile
FROM debian:buster-slim as builder
LABEL maintainer="Fossology <email@example.com>"
...
RUN ./utils/fo-debuild --no-sign --no-tar

FROM scratch
WORKDIR /fossology_packages/fossology
COPY --from=builder /fossology_packages/fossology/packages /fossology_packages/fossology/packages
```
To create the other images (scheduler, agent, web, etc.), `fossology/packages` is used as the base image for a multi-stage build, and the appropriate Debian package is copied into the container along with the `fossology-common` package, which contains all the shared libraries.
```dockerfile
FROM fossology/packages:latest as builder
LABEL maintainer="Fossology <firstname.lastname@example.org>"
...
COPY --from=builder /fossology_packages/fossology/packages/fossology-common_*_amd64.deb .
COPY --from=builder /fossology_packages/fossology/packages/fossology-ununpack_*_amd64.deb .
```
Too many agents ... Too many services
To avoid creating a Kubernetes Service for each agent (which would mean more than 25 services if all agents are installed), a headless service is used: a single service handles all agents and gives each agent a unique DNS name.
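A headless service is an ordinary Kubernetes Service with `clusterIP: None`: instead of a single virtual IP with load balancing, DNS resolves directly to the backing pods. A sketch matching the description above (the selector label and port are illustrative; only the `agents-headless` name appears in the project's manifest list):

```yaml
# Headless service covering all agent pods (labels/ports are illustrative).
apiVersion: v1
kind: Service
metadata:
  name: agents-headless
spec:
  clusterIP: None        # headless: no virtual IP, DNS resolves to pod IPs
  selector:
    app: fossology       # matches every agent pod
  ports:
  - port: 5555           # illustrative agent port
```

Note that for each pod to get its own stable DNS name under the service (e.g. `<hostname>.agents-headless.<namespace>.svc.cluster.local`), the pods additionally need `hostname` and `subdomain` set, or must be managed by a StatefulSet.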
Migrating configuration from local files to shared key-value pair storage
In the microservice architecture, each agent has its conf file inside its own container, and the FOSSology hosts would have to be hardcoded in the scheduler container before deployment. To solve this, all conf files are moved into a key-value pair database; the selected system is etcd. Each new agent interfaces with etcd through its RESTful API: on startup, the agent opens its conf file, issues PUT requests to add its configuration to etcd, and adds its own host details.
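For a concrete picture of such a request: etcd's v3 HTTP gateway expects the key and value base64-encoded in a JSON body posted to its `kv/put` endpoint (the older v2 API took plain PUTs to `/v2/keys/<path>` instead). The sketch below builds such a body in Python; the `/fossology/hosts/...` key layout is an assumption for illustration, not the project's actual schema:

```python
import base64
import json

def etcd_put_payload(key: str, value: str) -> str:
    """Build the JSON body for etcd's v3 HTTP gateway (POST /v3/kv/put),
    which expects the key and value base64-encoded."""
    return json.dumps({
        "key": base64.b64encode(key.encode()).decode(),
        "value": base64.b64encode(value.encode()).decode(),
    })

# Example: an agent registering its host details on startup.
# The key layout below is hypothetical, not FOSSology's actual schema.
body = etcd_put_payload("/fossology/hosts/nomos-0", "nomos-0.agents-headless:5555")
print(body)
```

The body would then be sent to the etcd endpoint with any HTTP client (e.g. `urllib.request` or `curl`); the scheduler reads the same keys back instead of parsing local conf files.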
Available agents in microservice
List of Kubernetes manifests
| Manifest | Resources |
| --- | --- |
| Agents Service | 1. agents-headless |
List of Dockerfiles
| Image | Dockerfile |
| --- | --- |
| Packages base image | Dockerfile.pkg |
Pull Requests
- feat(core): Microservices Architecture #2086
- docs(microservices): Intro & reports weeks 1 - 4 #3
- docs(microservice): added weekly reports 5 - 9 #23
- docs(microservices): added week 10 and setup #28
Known issues and drawbacks
- Although containers save a decent amount of space compared to VMs, separating the agents uses more space, since each agent has all libraries and dependencies installed in its own container. In the old build system, the ununpack and adj2nest agents are in the same package, which forces both containers to use the same image; this issue is solved in the new build system developed in GSoC '21.
- The scheduler doesn't tolerate possible errors in the configuration retrieved from etcd, which can lead to errors when data is missing.
- If a new agent is added while etcd is not running, its configuration won't be added, which in turn leaves the scheduler unaware of this agent.
- The fossy user has read/write permissions on the `/root/.kube` folder inside the scheduler container, so that kubectl commands can be used from inside the container to communicate with the agents.
| Task | Status | Description |
| --- | --- | --- |
| Dockerfile template | ✔️ | Dockerfiles for all modules and 6 agents |
| Separating agents | ✔️ | Separate container for each agent; scheduler core code modified to work with separate agents |
| Kubernetes manifests | ✔️ | Kubernetes deployments, services, and PVCs are provided |
| Kubernetes ConfigMaps and Secrets | ✔️ | Kubernetes ConfigMaps for env variables and Secrets for the database username and password |
| etcd setup | ✔️ | etcd Kubernetes deployment, service, and PVC; scheduler core code modified to get data from etcd instead of conf files |
| Docker and Kubernetes tests | ❌ | Will be provided upon confirmation from the community on the initial version of the project |
Google Summer of Code is the best experience I have had in my college years so far. I worked on large-scale, industry-grade projects with talented people who devote their time to the Open Source community.
Special thanks to my mentors Gaurav Mishra, Michael C. Jaeger, Anupam Ghosh, Klaus Gmeinwieser, and Vasudev Maduri. Thank you for your support, not only during GSoC but even before; you helped me find my way into the Open Source community and achieve my goal of making a contribution with real impact.
I also want to give special thanks to my fellow student developers. You are talented and hardworking; I learned from all of you, and I'm glad I had the opportunity to be part of this community.