
# Annotation Web Service

## Introduction

Annotation Web Service (AWS) is a syntax highlighting web service based on a deep learning (DL) model. The goal was to build an API that uses the DL model to provide syntax highlighting for Java, Kotlin and Python3. Furthermore, the incoming requests should be used to train the DL model and to further improve its accuracy.

This README focuses on the technical aspects: running and configuring the services. For a more in-depth description of the functionalities, technologies, and our development process, please consult our Wiki. The original motivation and requirements for this project can be found in the project instructions provided by the lecturers of the course.

## Microservices

The Annotation Web Service consists of the following microservices:

| Microservice | Description | Technology |
| --- | --- | --- |
| Annotation Service | Handles the annotation of code, i.e. lexing and highlighting. | Java with Spring Boot |
| Prediction Service | Handles the prediction of syntax highlighting. | Python with Flask |
| Training Service | Handles the regularly conducted training and exchange of the underlying prediction models. | Python with Flask |
| Web API | Acts as the primary entry point for all services. | JS/TS with Nest.js |

Each service's SonarCloud status (coverage, code smells, bugs, vulnerabilities) is tracked via badges on the repository page.

Every microservice runs in a Docker container. Extensive documentation of each microservice is provided in the Wiki.

## Utils and Proof-of-Concepts

In addition to the microservices listed above, we have implemented a number of utilities/helpers and a proof-of-concept demo frontend that uses the API provided by the microservices. These tools are intended for internal use only and thus do not adhere to the same code quality standards as the microservices. Nevertheless, they demonstrate how the API can be used in various environments.

| Tool | Description | Technology |
| --- | --- | --- |
| Demo Frontend | A single-page web app that demonstrates how the API could be used by a potential customer. | JS/TS with Vue |
| Code Fetcher | A command line tool to download source code from GitHub and send it to the API. | Python |
| Load Tester | A simple script to send many concurrent requests to the API and analyze the performance under heavy load. | JavaScript with K6 |

## Configuration

The microservices rely on a number of environment variables for their configuration. The environment variables are defined in a .env file in the project root; this file is referenced by docker-compose.yml to pass the configuration to the services. The following table gives an overview of the environment variables and example values:

| Variable Name | Description | Example Value |
| --- | --- | --- |
| MONGO_USERNAME | The username used for the MongoDB. | hack3rz |
| MONGO_PASSWORD | The password used for the MongoDB. | palm_tree_poppin_out_the_powder_blue_sky |
| MONGO_DATABASE_NAME | The database name for the MongoDB. | aws |
| MONGO_DATABASE_TEST_NAME | The test database name for the MongoDB. | aws_test |
| MONGO_PORT | The port on which the MongoDB runs. | 27017 |
| MONGO_HOST | The host for the MongoDB. | mongodb |
| MONGO_AUTH_DATABASE | The MongoDB database used for authentication (holds default users). | admin |
| DB_CONNECTION_STRING | The connection string for the MongoDB. | mongodb://hack3rz:palm_tree_poppin_out_the_powder_blue_sky@mongodb:27017/aws?authSource=admin |
| MODEL_NAME | The prefix used when storing a model locally on disk. | best |
| MIN_TRAINING_BATCH_SIZE | The minimum number of annotations required before a training run is started. | 100 |
| DEMO_FRONTEND_PORT | The port on which the demo frontend runs. | 80 |
| WEB_API_PORT | The port on which the web API runs. | 8081 |
| SWAGGER_UI_PORT | The port on which the Swagger UI runs. | 8082 |
| ANNOTATION_SERVICE_PORT | The port on which the annotation service runs. | 8083 |
| PREDICTION_SERVICE_PORT | The port on which the prediction service runs. | 8084 |
| TRAINING_SERVICE_PORT | The port on which the training service runs. | 8085 |
| NGINX_PORT | The port on which the NGINX load balancer runs. | 4000 |
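
For reference, a minimal .env assembled from the example values above might look like this (the actual file in the repository is authoritative):

```
# Example .env -- values taken from the configuration table above
MONGO_USERNAME=hack3rz
MONGO_PASSWORD=palm_tree_poppin_out_the_powder_blue_sky
MONGO_DATABASE_NAME=aws
MONGO_DATABASE_TEST_NAME=aws_test
MONGO_PORT=27017
MONGO_HOST=mongodb
MONGO_AUTH_DATABASE=admin
DB_CONNECTION_STRING=mongodb://hack3rz:palm_tree_poppin_out_the_powder_blue_sky@mongodb:27017/aws?authSource=admin
MODEL_NAME=best
MIN_TRAINING_BATCH_SIZE=100
DEMO_FRONTEND_PORT=80
WEB_API_PORT=8081
SWAGGER_UI_PORT=8082
ANNOTATION_SERVICE_PORT=8083
PREDICTION_SERVICE_PORT=8084
TRAINING_SERVICE_PORT=8085
NGINX_PORT=4000
```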

## Docker Containers

The following table lists all Docker containers that are part of the docker-compose setup and the images they use. All images prefixed with richner were developed as part of this project and are publicly available on Docker Hub.

| Name | Description | Image |
| --- | --- | --- |
| Demo Frontend | The single-page demo frontend. | richner/demo-frontend:latest |
| Web API | The web API that wraps all the other microservices. | richner/web-api:latest |
| Swagger UI | The Swagger UI that holds the documentation for all services. | swaggerapi/swagger-ui |
| Annotation | The annotation service. | richner/annotation-service:latest |
| Prediction | The prediction service. | richner/prediction-service:latest |
| Training | The training service. | richner/training-service:latest |
| Nginx | The NGINX load balancer and reverse proxy. | nginx:latest |
| MongoDB | The MongoDB instance used by the annotation, prediction, and training services. | mongo:5.0.6 |
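
To illustrate how these pieces fit together, a hypothetical excerpt from docker-compose.yml could look like this (service names, images, and mounts are taken from the tables and the MongoDB section below; the real file may differ in detail):

```yaml
# Illustrative excerpt only -- the repository's docker-compose.yml is authoritative.
services:
  annotation:
    image: richner/annotation-service:latest
    env_file: .env   # passes the configuration from the .env file
    # no host port mapping here: NGINX load-balances across the scaled replicas

  mongodb:
    image: mongo:5.0.6
    env_file: .env
    volumes:
      - ./data:/data/db   # persists the database between container resets
      - ./mongo-init.sh:/docker-entrypoint-initdb.d/mongo-init.sh
```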

## Run It Locally

Make sure that you use Docker Compose V2 and activate it in your Docker setup: within Docker Desktop, go to "Settings" and toggle the "Use Docker Compose V2" option under the "General" tab. More information can be found in the Docker documentation. You can verify the setting by running:

```bash
$ docker-compose -v # should output v2.X.X
```

Use the following command to run all services using docker-compose:

```bash
$ docker-compose up --build --scale prediction=2 --scale annotation=2
```

Builds sometimes fail on machines with other processor architectures (e.g. on M1 MacBooks), or because old versions of the Docker containers are still stored locally. Use the following command for a clean new build:

```bash
$ docker-compose up -d --force-recreate --renew-anon-volumes --build --scale prediction=2 --scale annotation=2
```
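
To start over completely, the standard docker-compose teardown can be used (note that the bind-mounted data folder described in the MongoDB section below is not removed by this command):

```bash
$ docker-compose down --volumes --remove-orphans
```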

## MongoDB

The MongoDB is launched as a separate Docker container. The credentials are stored within the environment of the other containers so that they can access it. A data folder in the project root is mounted as a volume for the database; it persists the data even when the containers are reset. If you want to reset the database, you can simply delete the contents of this folder. The file mongo-init.sh is used to initialize the database with a new user, using the credentials provided by the environment file.
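
As an illustration, an init script of this kind typically creates the application user roughly as follows (a sketch only; the actual mongo-init.sh in the repository may differ):

```bash
#!/bin/bash
# Sketch of a MongoDB init script: creates the application user with the
# credentials from the .env file. Scripts mounted into
# /docker-entrypoint-initdb.d/ run automatically on first startup of the
# official mongo image.
mongo <<EOF
use $MONGO_DATABASE_NAME
db.createUser({
  user: "$MONGO_USERNAME",
  pwd: "$MONGO_PASSWORD",
  roles: [{ role: "readWrite", db: "$MONGO_DATABASE_NAME" }]
})
EOF
```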

### Testing the connection

Make sure the MongoDB container is running. Open a shell inside the container and use the following command to access the database:

```bash
$ mongo --username "$MONGO_USERNAME" --password "$MONGO_PASSWORD"
```

Alternatively, you can use a GUI like MongoDB Compass to access the database.
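
Once connected, a quick sanity check might look like this (the database name comes from the configuration table above; the collection names depend on the services' implementation):

```
> use aws
> show collections
> db.stats()
```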

## NGINX

To demonstrate the scaling and redundancy possibilities within the API, NGINX acts as a load balancer and reverse proxy for the annotation and prediction microservices. Consequently, the Web API interacts with NGINX, which in turn forwards the requests to the respective microservice instances. This allows us to scale both the annotation and prediction services; the load is distributed using a round-robin method. The configuration for NGINX can be found in the nginx.conf.template file.
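
As an illustration, a round-robin setup for the two scaled services could look roughly like this (service names and ports are taken from the tables above; the location paths are hypothetical, and nginx.conf.template remains the authoritative configuration):

```nginx
# Illustrative sketch only -- see nginx.conf.template for the real configuration.
upstream annotation {
    # The docker-compose service name resolves to the scaled replicas;
    # NGINX distributes requests across the servers round-robin by default.
    server annotation:8083;
}

upstream prediction {
    server prediction:8084;
}

server {
    listen 4000;

    location /annotation/ {
        proxy_pass http://annotation/;
    }

    location /prediction/ {
        proxy_pass http://prediction/;
    }
}
```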

## Swagger REST API Documentation

All the endpoints of each microservice are documented using Swagger. Each microservice contains an openapi.json file that documents the endpoints using the OpenAPI specification. When the docker-compose setup is running, an additional Swagger UI container is available at localhost:8082.
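
For orientation, a heavily trimmed excerpt of such a file could look like this (the request shape is inferred from the test request in the Azure section below; the supported language values are an assumption based on the introduction, and the repository's openapi.json files are authoritative):

```json
{
  "openapi": "3.0.0",
  "info": { "title": "Annotation Web Service", "version": "1.0.0" },
  "paths": {
    "/api/v1/highlight": {
      "post": {
        "summary": "Return syntax highlighting for a code snippet",
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "type": "object",
                "properties": {
                  "code": { "type": "string" },
                  "language": { "type": "string", "enum": ["java", "kotlin", "python3"] }
                },
                "required": ["code", "language"]
              }
            }
          }
        },
        "responses": {
          "200": { "description": "The highlighted code" }
        }
      }
    }
  }
}
```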

## Azure Deployment

Currently, it is not possible to automate the deployment with GitHub Actions because our student subscription via UZH does not have the privilege to create a service account, which would be required for automated deployments. However, the Docker containers can be deployed to Azure manually. Please make sure your Azure account is an owner of the Azure resource group hack3rz and that you have the Azure CLI installed on your machine. Then use the following commands to deploy the containers:

1. Log in to Azure with your credentials and set up the context for Azure Container Instances (ACI):

   ```bash
   $ az login
   $ az account set --subscription 02b30768-05c8-4ad0-acc8-dda03818d4d6
   $ az acr login --name hack3rzacr
   $ docker login azure
   ```

2. Run the following shell script to deploy or redeploy the containers:

   ```bash
   $ sh deploy-azure.sh
   ```

3. After a successful deployment, you can check the status of the deployed containers in the Azure Portal. The public domain name is hack3rz-aws.switzerlandnorth.azurecontainer.io, and the demo is accessible via http://hack3rz-aws.switzerlandnorth.azurecontainer.io. A test request can be made with the following command:

   ```bash
   $ curl -X 'POST' \
     'http://hack3rz-aws.switzerlandnorth.azurecontainer.io:8081/api/v1/highlight' \
     -H 'accept: */*' \
     -H 'Content-Type: application/json' \
     -d '{
     "code": "public static void main(String args[]){ System.out.println(\"testing\") }",
     "language": "java"
   }'
   ```

The container configuration for the deployment on Azure can be found in the file docker-compose-azure.yml.

### Deployment Caveats

The current deployment configuration found in docker-compose-azure.yml is a preliminary version. The following restrictions apply:

- CosmosDB is used instead of MongoDB
- Swagger UI is not deployed
- NGINX is not deployed
- The Training Service cronjob does not work
- There is no model update
- Only a single instance of the prediction service is deployed
- Only a single instance of the annotation service is deployed

Consequently, the deployment only acts as a proof of concept and does not yet fully reflect the local Docker setup.

## Demo

A demo is accessible via http://hack3rz-aws.switzerlandnorth.azurecontainer.io.

Attention: The restrictions/caveats mentioned above apply. Use docker-compose to test our service with its full functionality.

## Authors

This project has been built by team Hack3rz as part of the Advanced Software Engineering course at the University of Zurich in the spring semester 2022.

It is based on the following libraries: