
Milestone 3


Problem Statement:

Implementing different service mesh technologies, such as Istio and Linkerd, and integrating their functionalities into our Weather Application project.

Changes in the previously discussed problem statement:

Initially, we planned to compare different service meshes, but we faced many problems while installing and implementing them on the JetStream cloud. We realized there was a version compatibility mismatch between the Kubernetes version we were using and the service meshes' minimum requirements, and we had to resolve that issue before working toward the original goal. Hence, due to limited time, we narrowed our goal to resolving these problems and implementing different service mesh functionalities in our Weather Application.

Problem statement development:

Based on the analysis of the Weather Application's performance after milestone 2, we noticed that our architecture was not able to handle the load. Our investigation of the root cause of this issue, and our overall research, led us to the conclusion that we had to deploy all containers in separate pods. Our research also showed that integrating a service mesh with our application could help improve performance. Based on this research, our main goal for milestone 3 was to analyze and compare multiple service mesh technologies by integrating them with our application. Meanwhile, we realized that the Kubernetes cluster we were using was not compatible with popular service mesh technologies like Istio and Linkerd.

Throughout this process, we came across some initial issues, which we have logged as GitHub issues; see the Team Members Contributions section below.

Methodology:

The architecture of our Weather Application project before refinement looked stable on paper, but during load testing we were surprised to find that it could not handle more than 10 user requests at the same time. Hence, the main challenge for us was to scale the application and refine the architecture so that it could sustain more load than before.

Our investigation of the performance issue led us to the root cause: we were creating all containers within the same pod. That caused a lot of problems with load balancing and hurt the system's fault tolerance. Another root cause we found was that we had implemented very complex communication logic between the different microservices: we were mostly using ZooKeeper for service discovery and then communicating via TCP connections. Synchronous communication over TCP caused unnecessary delay, as we had to wait for all microservices to respond serially.

To solve both of these issues, our combined research helped us narrow in on two separate solutions. We decided to separate all containers into different pods and to implement a service mesh that handles the communication role via sidecars. This clears the communication bottleneck and enables load balancing at the pod level.

Implementation:

Root cause analysis and tentative solution

In the initial architectural implementation of the Weather Application, we used ZooKeeper for service registry and discovery, and an API gateway for routing service requests to the different microservices. In this approach, the developer has to manually handle service-to-service connectivity in addition to the business goals. In a smaller application this might seem simple, but as the application grows it becomes a more tedious and cumbersome task. It also means communication failures are harder to diagnose, because the logic that governs interservice communication is hidden within each service, and it causes a lot of overhead at the API gateway layer. The root cause is that the same components have to handle business logic as well as communication logic; to get rid of this bottleneck, separating these two tasks is important.

A service mesh is capable of performing exactly this separation. It takes the logic governing service-to-service communication out of the individual services and abstracts it into a layer of infrastructure. A service mesh achieves this by deploying an array of network proxies alongside the containers, where each proxy acts as a gateway for the interactions that occur between containers and servers.

Additionally, a service mesh supports useful functionalities such as automatic load balancing, traffic management and monitoring, pluggable policy layers, service discovery, telemetry, and secure connectivity between services, all without any changes to the source code.

Kubernetes cluster setup

After investigating the root cause and identifying solutions, we explored the available service mesh technologies. Out of the multiple options, we decided to implement Istio and Linkerd, the two most popular service meshes, based on the functionalities they provide and their documentation.

During the setup of Istio and Linkerd, we faced major compatibility issues with Kubernetes, as we were using Kubernetes v1.11 with the Fedora OS image provided by the JetStream cloud infrastructure. After many unsuccessful attempts to upgrade the Kubernetes version on the existing cluster, we concluded that we would need to set up a new cluster with a manual installation of Kubernetes using Ansible playbooks. For this, we used the following Ansible scripts to configure a cluster across multiple individual cloud instances and to install the Kubernetes server and client on it:

  1. hosts (defines the master and worker node IP addresses, which are used by master.yml and worker.yml to create the cluster)
  2. Kube-dependencies.yml (installs kubeadm, kubelet, and kubectl on the cluster)
  3. master.yml (sets up the master node)
  4. worker.yml (sets up the worker nodes)

Note: All the above-mentioned files are present in the Ansible directory.
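For illustration, here is a minimal sketch of what the hosts inventory might look like, assuming Ansible's default INI-style inventory format; the group names, user, and IP placeholders are assumptions, so refer to the actual file in the Ansible directory:

[masters]
master ansible_host=<master-node IP> ansible_user=<user>

[workers]
worker1 ansible_host=<worker-1 IP> ansible_user=<user>
worker2 ansible_host=<worker-2 IP> ansible_user=<user>

With such an inventory, master.yml and worker.yml can target the master and worker groups respectively to initialize the cluster and join the worker nodes.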

Separate containers in different pods

Services should not all be in one pod. The services are segregated into different pods, with a few exceptions for tightly coupled services. The tightly coupled groups are:

  1. User Management and Postgres
  2. Data Retrieval, Data Post Processing, and Model Executor.

The following steps were taken to segregate the services (a minimal manifest sketch follows the list):

  1. Create one deployment.yml file to create a pod, and one service.yml file to expose the service.
  2. There are deployment and service yml files for all the microservices.
  3. Expose the API Gateway and UI with NodePorts.
  4. The rest of the microservices are local to the cluster and thus use a ClusterIP address.
  5. Change the addresses used for REST calls, RPC calls, and service discovery with Zookeeper from 'localhost' to the corresponding service name in the Kubernetes cluster.

Note: All the deployment files are in the deployment directory.
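For illustration, here is a minimal sketch of one such deployment/service pair, written for a hypothetical weather-ui microservice; the names, image, and port below are placeholders, and the actual manifests are in the deployment directory:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: weather-ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: weather-ui
  template:
    metadata:
      labels:
        app: weather-ui
    spec:
      containers:
      - name: weather-ui
        image: <registry>/weather-ui:latest   # placeholder image
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: weather-ui        # in-cluster DNS name other services use instead of 'localhost'
spec:
  type: NodePort          # UI and API Gateway only; internal services use ClusterIP
  selector:
    app: weather-ui
  ports:
  - port: 8080
    targetPort: 8080

The Service name doubles as the in-cluster DNS name, which is why step 5 above replaces 'localhost' with the Kubernetes service name.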

Service mesh setup

After resolving the Kubernetes version compatibility issue, we installed the Istio and Linkerd service meshes on the cluster using the steps below.

Istio installation and configuration steps

  1. Run istio-init.yml (installs all the Custom Resource Definitions (CRDs) required for the Istio installation).
  2. Run istio-setup.yml (configures Istio and starts all Istio-related services, deployments, and pods on the cluster).
  3. Run kiali-secret.yml (sets up the Kiali dashboard credentials).

Note: All the above-mentioned files are present in the Istio-setup directory.

By following the above steps, your Istio configuration will be ready to use.

You can access the Kiali dashboard at the link below:

http://<master-node IP>:31000

Istio proxies can be injected into all pods by running the command below:

kubectl label namespace <namespace name in which application will be deployed> istio-injection=enabled --overwrite

Once you deploy the application, you can see that istio-proxy containers are injected into all pods, and all deployed services become visible in the Kiali dashboard. The Kiali dashboard provides multiple features, such as monitoring the traffic between the microservices and applying different traffic-management rules.
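To confirm the injection, you can list the pods in the labeled namespace (the namespace name is a placeholder):

kubectl get pods -n <namespace name>

Each pod should report one extra ready container: the istio-proxy sidecar running alongside the application container.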

Linkerd installation and configuration steps:

  1. Run curl -sL https://run.linkerd.io/install | sh to download the Linkerd command-line interface, which is used to install the control plane on the Kubernetes cluster.
  2. Add Linkerd to the PATH environment variable using the command below: export PATH=$PATH:$HOME/.linkerd2/bin
  3. Run linkerd check --pre to check that your cluster is configured correctly and is ready to install the control plane.
  4. Run kubectl apply -f linkerd.yml to configure Linkerd and start all Linkerd-related services, deployments, and pods on the cluster.

Note: The linkerd.yml file is present in the linkerd-setup directory.

By following the above steps, your Linkerd configuration will be ready to use.

You can access the Linkerd dashboard at the link below: http://<master-node IP>:31237

Linkerd proxies can be injected into all pods by running the command below:

linkerd inject <application deployment yml file> | kubectl apply -f -

Note: The application services must be in a ready state before running the above command.

Once you execute the above command, it deploys the application and injects linkerd-proxy containers into all pods. All deployed services become visible in the Linkerd dashboard, where you can monitor the traffic flowing through all the microservices. You can also access the Grafana dashboard to see traffic-related information.
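Traffic statistics are also available from the command line, assuming the stat command shipped with the Linkerd 2.x CLI used here (the namespace name is a placeholder):

linkerd stat deployments -n <namespace name>

This prints per-deployment success rates, request rates, and latencies, mirroring what the dashboard shows.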

Implementing different service mesh functionalities

Telemetry:

After adding the Istio proxies, we can use the Kiali dashboard for telemetry. Istio generates detailed telemetry for all the communications within the service mesh, providing useful functionality such as observability of service behavior and troubleshooting. It can help developers maintain and optimize applications without extra burden. Istio mainly generates three types of telemetry: metrics, distributed traces, and access logs.

  • Metrics: Istio generates a set of service metrics based on latency, traffic, errors, and saturation, and provides detailed data for the mesh control plane. These metrics give detailed information about the overall volume of traffic, error rates, and response times of services. Istio collects metrics at three levels: proxy-level, service-level, and control-plane.
  • Distributed traces: These provide a way to monitor and understand behavior by following a request's flow through the service mesh, which helps to identify service dependencies and sources of latency in the mesh. Distributed tracing is done through the Envoy proxies, which automatically generate trace spans.
  • Access logs: These provide a way to monitor and understand behavior from the perspective of workload instances. Istio generates access logs for service traffic, giving complete insight into the mesh, and exposes source and destination metadata to access-log systems, which helps in auditing network transactions.

Load balancer:

Istio provides an automatic load-balancing feature through its VirtualService and DestinationRule resources (custom resources installed on the Kubernetes cluster). The load balancer is mainly used for redirecting incoming requests to different replicas of a service, managing the load on every node according to a set of rules.

We added a load balancer behind the API gateway layer at the model-executor microservice. To test this functionality, we created two different deployments of the model-executor service (with 2 replicas and 1 replica). A new VirtualService resource maps the model-executor host service to a subset of the destination service; in our implementation, the host and the destination service are the same. A new DestinationRule sets a consistent-hashing load-balancing traffic policy that implements stickiness based on the incoming request's IP address, and the VirtualService uses the subset created by this DestinationRule. With this setting, incoming requests are spread across all available replicas of the model-executor service based on source IP address and pinned to a particular replica, so all requests coming from the same user are transferred to the same replica.

Note: For the above-mentioned load balancer setting, you can refer to the model_executor_lb.yml file in the deployment directory.
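A minimal sketch of what such a configuration can look like; the subset name and labels are assumptions, and the authoritative version is model_executor_lb.yml:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: model-executor
spec:
  host: model-executor
  trafficPolicy:
    loadBalancer:
      consistentHash:
        useSourceIp: true        # sticky routing keyed on the client IP
  subsets:
  - name: all-replicas           # hypothetical subset name
    labels:
      app: model-executor
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: model-executor
spec:
  hosts:
  - model-executor               # host and destination are the same service
  http:
  - route:
    - destination:
        host: model-executor
        subset: all-replicas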

Canary deployment:

Istio provides a canary deployment feature through its VirtualService and DestinationRule resources. Canary deployment is one of the most useful features and can be used when releasing a new version of a service as a beta release.

To test the canary deployment feature, we added two different versions of the user-management service. The second version (v2) of the user-management service supports the user-details feature, while the first version (v1) does not have this functionality implemented. A VirtualService resource routes 70% of the traffic to the first version's subset, while the remaining 30% of the traffic is routed to the second version's subset. A DestinationRule resource creates both of these subsets and maps them to the different deployments of the user-management service. With this setting, only 30% of the total traffic is transferred to the new version.

Note: For the above-mentioned canary deployment setting, you can refer to the user_management_canary.yml file in the deployment directory.
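A minimal sketch of such a 70/30 split; the version labels are assumptions, and the authoritative version is user_management_canary.yml:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: user-management
spec:
  host: user-management
  subsets:
  - name: v1
    labels:
      version: v1                # hypothetical pod label on the v1 deployment
  - name: v2
    labels:
      version: v2                # hypothetical pod label on the v2 deployment
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: user-management
spec:
  hosts:
  - user-management
  http:
  - route:
    - destination:
        host: user-management
        subset: v1
      weight: 70                 # 70% of traffic stays on the stable version
    - destination:
        host: user-management
        subset: v2
      weight: 30                 # 30% of traffic reaches the canary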

Conclusion:

In milestone 3, we separated each service into a different pod. This change helped us make the system more fault-tolerant, and by deploying each service in its own pod, we are now able to add a load-balancer layer to balance the load across multiple replicas of each service.

Previously, each service had to handle multiple responsibilities: business logic as well as service-to-service communication logic. We integrated a service mesh with our Weather Service application, which has taken the logic governing service-to-service communication out of the individual services and abstracted it into a layer of infrastructure. The base services now focus on implementing business logic only, which indirectly improved the performance of the entire system.

Thanks to the service mesh integration, we were able to add a few more functionalities to our application, such as traffic monitoring and management, load balancing, and canary deployment. We also analyzed other functionalities provided by the service mesh, like gateways, dark releases, fault injection, and circuit breaking, but due to limited time we could not implement those features.

As future scope, we plan to focus on the remaining features provided by the service mesh. The next step will be to compare multiple service mesh options based on performance and the different functionalities they offer.

The application is accessible at Weather App.

All source code related to milestone 3 is available in the dockerize-project-test branch.

Team Members Contributions:

We created the following issues and tasks while working on milestone 3. All members of the team contributed equally throughout the project. In milestone 3, each member of the team worked on the following issues:

  • Bivas Maiti:
  1. Configure and implement Istio to the running Kubernetes fedora instance
  2. Analyze Load Issue for API Gateway
  3. Segregate all services to different pods in the cluster
  4. Write a shell file to deploy all services in one command
  5. Document the workflow and steps to deploy the services in a VM in the wiki of the repository
  6. Project 3- part 2 wiki
  • Darshan Shinde:
  1. Create Ubuntu VMs in Jetstream and create Kubernetes cluster
  2. Setup Load Balancing with Canary in Kubernetes Cluster
  3. Setup Telemetry and Kiali Dashboard in Istio
  4. Istio-Kubernetes compatibility issue
  5. Project 3- part 2 wiki
  6. User-detail functionality is not working on the dashboard
  7. Kiali dashboard is not showing edges between service nodes
  8. Inject Istio sidecar proxy in all pods
  • Virendra Wali:
  1. Create Ubuntu VMs in Jetstream and create Kubernetes cluster
  2. Install Istio in Kubernetes Cluster
  3. Setup LinkerD in Kubernetes instance
  4. linkerd custom configuration
  5. Creating virtual services and destination rules
  6. Project 3- part 2 wiki
  7. Linkerd sidecar injection
  8. Linkerd Kubernetes compatibility issue