
Self-scalable Moving Object Databases on the Cloud: MobilityDB and Azure

Author: Dimitrios Tsesmelis
Email: tsesmelis.jim007@gmail.com
Phone: (+30) 6949550129

This work was carried out during the fourth semester of my MSc in Big Data Management and Analytics and constitutes the product of my master's thesis. This repository is a copy of the original one, which can be found here.
A detailed report of the thesis is available, as well as the presentation slides of the master's thesis defense.

Abstract

PostgreSQL is one of the most promising and quickly evolving relational database management systems. Since it is a fully extensible system, plenty of projects build functionality on top of PostgreSQL. MobilityDB is one such tool: it enables users to efficiently store, manage and query moving object data, such as the data produced by fleets of vehicles.

In the era of big data technologies and cloud computing, it is vital for cutting-edge database management systems to provide cloud-native solutions that allow data processing at scale. Citus is such a tool: it transforms any PostgreSQL server into a distributed database without sacrificing any native PostgreSQL functionality.

Deploying a MobilityDB cluster with Citus on the cloud is far from a simple task. It requires deep knowledge of the provided infrastructure and of the network configuration between the machines, as well as time and effort to learn and manage the peculiarities of each cloud provider. In addition, maintaining a scalable system requires continuous monitoring of several factors and metrics that reflect the performance of the system. Such a task implies repetitive human effort that is sometimes prone to errors.

In this work, we target the aforementioned challenges by introducing automation into the roll-out process of a MobilityDB cluster on Microsoft Azure, and by partially automating the management and maintenance of the deployed solution. Moreover, we provide a tool, called the autoscaler, that automatically monitors the database cluster, analyzes the collected performance metrics and makes decisions that adapt the size of the cluster to the measured workload.

To assess the performance of our solution, we perform several experiments that combine two different benchmarks, namely the BerlinMOD and Scalar benchmarks. The first provides a synthetic dataset that simulates the behavior of vehicles moving across Berlin, while the second is used to simulate a number of concurrent user requests that query the system under test.

Automated Cluster Initialization

One major part of this work is to enable users to automatically deploy MobilityDB on Azure. Our software can be used to roll out a MobilityDB cluster, together with the Citus extension, with minimal human interaction. The image below depicts the cluster generation process.

[Image: MobilityDB cluster initialization diagram]

Kubernetes Architecture

After deploying the cluster, a Kubernetes deployment must be launched to initialize the corresponding Pods. The following image illustrates the final Kubernetes deployment.

[Image: Kubernetes architecture]

Execution Guidelines

The purpose of this section is to enable the user to reuse the existing work.

Required Components

This work combines different tools and technologies to create a self-managed database on the cloud. The following list includes the required components, along with links that help users install and configure them.

  • A local computer running Linux OS (tested with Ubuntu 20.04).
  • A Microsoft Azure account with an active subscription attached to it. The user must have full access to the Azure resources (Owner).
  • A Service Principal, created and configured for your Azure account. More details on how to create a Service Principal can be found here; a minimal sketch is also given below.
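
For reference, a Service Principal can typically be created with the Azure CLI, as in the minimal sketch below. The command is not taken from this repository; the name, role and subscription ID are placeholders that you should adapt to your own account.

```bash
# Sketch: create a Service Principal with Owner rights on your subscription
# (placeholder name and subscription ID).
az login
az ad sp create-for-rbac \
  --name "mobilitydb-deployer" \
  --role Owner \
  --scopes "/subscriptions/<your-subscription-id>"
# The output contains the application (client) ID, the tenant ID and the
# client secret that the deployment script asks for later.
```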

Cluster Initialization

To deploy a MobilityDB cluster on Azure, follow the steps below:

  1. Clone the GitHub repository.
  2. Configure the bash script located at MobilityDB-Azure/automaticClusterDeployment/KubernetesCluster/deployK8SCluster.sh by modifying the values of the parameters placed at the top of the file (an illustrative sketch of these parameters follows this list), in the following way:
    • The AzureUsername parameter is used to log in to your Azure account.
    • The default ResourceGroupName, Location and VirtualNetwork values can be used.
    • Subscription defines the name of the active Azure subscription.
    • VMsNumber determines the number of Worker nodes and VMsSize the size of each machine.
    • SSHPublicKeyPath and SSHPrivateKeyPath specify the location of the SSH public and private keys used to access the created VMs. By default, the files will be stored in the ~/.ssh/ directory.
    • Gitrepo specifies the GitHub repository from which the installation scripts and the remaining source files will be fetched. The default value should be used.
    • Service_app_url determines the URL of the Service Principal and Service_tenant the tenant's ID. When executing the script, the user is asked for the client secret to authenticate the application with Azure.
  3. Execute the script by running bash MobilityDB-Azure/automaticClusterDeployment/KubernetesCluster/deployK8SCluster.sh. After around 15 minutes, the cluster will be deployed on Azure.
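
For illustration, the parameter block at the top of deployK8SCluster.sh could be filled in roughly as follows. This is only a sketch: the exact assignment syntax may differ in the script, and all values are placeholders to be replaced with your own setup.

```bash
# Illustrative values only; the actual parameters are defined at the top of
# deployK8SCluster.sh and must match your own Azure account and resources.
AzureUsername="user@example.com"        # Azure account used to log in
ResourceGroupName="MobilityDB-RG"       # the default value can be kept
Location="westeurope"                   # the default value can be kept
VirtualNetwork="MobilityDB-VNet"        # the default value can be kept
Subscription="My Azure Subscription"    # name of the active subscription
VMsNumber=2                             # number of Worker nodes
VMsSize="Standard_D2s_v3"               # size of each machine (placeholder)
SSHPublicKeyPath="$HOME/.ssh/id_rsa.pub"
SSHPrivateKeyPath="$HOME/.ssh/id_rsa"
Gitrepo="https://github.com/MobilityDB/MobilityDB-Azure"  # default repository
Service_app_url="<service-principal-app-id>"
Service_tenant="<tenant-id>"
```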

When the cluster is ready, you can access any machine using the ~/.ssh/id_rsa key. The next step is to establish an SSH connection to the Coordinator VM. To connect to the machine, run ssh -i ~/.ssh/id_rsa azureuser@{vm_ip_address}, where vm_ip_address is the public IP address of the Coordinator VM, which can be found in the Azure portal. Once connected to the VM, you can confirm that the K8S cluster has been successfully initialized by executing sudo kubectl get nodes.
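
Put together, the connection and verification steps look like the snippet below, where {vm_ip_address} is a placeholder for the Coordinator's public IP:

```bash
# Connect to the Coordinator VM with the generated key
ssh -i ~/.ssh/id_rsa azureuser@{vm_ip_address}

# On the Coordinator VM: verify that all Kubernetes nodes are listed and Ready
sudo kubectl get nodes
```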

Deploying A PostgreSQL Cluster

So far, we have created a Kubernetes cluster on Azure. The purpose of this section is to deploy a PostgreSQL cluster with the Citus and MobilityDB extensions installed. First, we need to modify the provided configuration files:

  1. Edit the content of the MobilityDB-Azure/KubernetesDeployment/postgres-secrets-params.yaml file by changing the values of the username and password. These credentials will be the default keys for accessing the PostgreSQL server. The values must be provided as base64-encoded strings; to obtain such an encoding, you can use the following shell command: echo -n "postgres" | base64.
  2. Replace the content of the folder MobilityDB-Azure/KubernetesDeployment/secrets by creating your own SSL certificate that Citus will use to encrypt the database connections. The existing files can be used for guidance (see the sketch after this list).
  3. Edit the content of MobilityDB-Azure/KubernetesDeployment/postgres-deployment-workers.yaml by setting the replicas to be equal to the number of available worker machines that you want to create.
  4. Run bash MobilityDB-Azure/KubernetesDeployment/scripts/startK8s.sh to create the Kubernetes deployment. After a few minutes, the Pods will be created and ready to serve the database.
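
As a sketch of steps 1 and 2 above, the base64-encoded credentials and a self-signed SSL certificate could be produced as follows. The file names and certificate subject are illustrative assumptions, not taken from the repository.

```bash
# Step 1: base64-encode the credentials to paste into postgres-secrets-params.yaml
echo -n "postgresadmin" | base64   # -> cG9zdGdyZXNhZG1pbg==
echo -n "admin1234"     | base64   # -> YWRtaW4xMjM0

# Step 2: generate a self-signed certificate and key for the secrets folder
# (illustrative file names and subject; use the existing files as guidance)
openssl req -new -x509 -days 365 -nodes \
  -subj "/CN=citus" \
  -out MobilityDB-Azure/KubernetesDeployment/secrets/server.crt \
  -keyout MobilityDB-Azure/KubernetesDeployment/secrets/server.key
```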

Now you are ready to connect to your distributed PostgreSQL cluster. After connecting to the Coordinator VM, execute the following shell command to ensure that the Pods are running, as shown in the screenshot below: sudo kubectl get pods -o wide. Normally, you should see one Pod hosting the citus-master and a number of citus-worker Pods equal to the replica count that you defined before.

[Image: screenshot of the running citus-master and citus-worker Pods]

You can connect to the Citus Coordinator by using the public IP of the master VM as the host name/address, 30001 as the port, postgres as the database, and the username and password that you defined before (the default values are postgresadmin and admin1234, respectively). Try executing some Citus or MobilityDB queries; for instance, run select master_get_active_worker_nodes() to view the available Citus worker nodes.
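
Assuming the psql client is installed on your local machine, the connection could look like the following sketch, where {vm_ip_address} is a placeholder and the credentials are the defaults mentioned above:

```bash
# Connect to the Citus Coordinator on port 30001 and list the worker nodes
# (placeholder IP; default credentials postgresadmin / admin1234)
PGPASSWORD=admin1234 psql \
  -h {vm_ip_address} -p 30001 -U postgresadmin -d postgres \
  -c "SELECT master_get_active_worker_nodes();"
```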

Self-managed PostgreSQL Cluster

Assuming we have successfully completed all the previous steps, we are now ready to turn the PostgreSQL database into a self-managed cluster. The monitoring and auto-scaling mechanism is managed by a script implemented as a daemon process and written in Python 3. To execute the script, follow the steps below:

  1. Replace the parameters on the top of the MobilityDB-Azure/autoscaling/scripts/addNewVms.sh file with the same parameters that you provided in MobilityDB-Azure/automaticClusterDeployment/KubernetesCluster/deployK8SCluster.sh.
  2. Execute sudo -s command on the Coordinator VM to get root access rights.
  3. Create a virtual environment by running python3 -m venv venv and activate it with source venv/bin/activate.
  4. Install the required packages by running pip install setuptools-rust, then export CRYPTOGRAPHY_DONT_BUILD_RUST=1, and finally pip install -r MobilityDB-Azure/autoscaling/requirements.txt.
  5. Export the following environment variables, by adjusting their values as follows:
    • export AZURE_SUBSCRIPTION_ID=..., export AZURE_TENANT_ID=..., export AZURE_CLIENT_ID=... and export AZURE_CLIENT_SECRET=... by specifying the corresponding values from the Azure Service Principal.
    • export RESOURCE_GROUP=... with the Azure resource group name.
    • export POSTGREDB=postgres, export POSTGREUSER=... and export POSTGREPASSWORD=... with the corresponding server credentials.
    • export POSTGREPORT=30001.
    • export SCRIPTPATH=/home/azureuser/MobilityDB-Azure/autoscaling/scripts, assuming you have cloned the source code into /home/azureuser path.
  6. Copy the content of the ~/.ssh/id_rsa.pub file from your local machine to the same path on the Coordinator VM.
  7. Finally, execute the following command to launch the auto-scaler: python3 MobilityDB-Azure/autoscaling/AutoscalingDaemon.py --action start --minvm 2 --maxvm 8 --storage /home/azureuser/autolog --lower_th 70 --upper_th 30 --metric sessions (a consolidated snippet follows this list). You can get more information about the available parameters by running python3 MobilityDB-Azure/autoscaling/AutoscalingDaemon.py --help. Note: the auto-scaler is a daemon process; hence, the script can be executed in the background. Information about the state of the auto-scaler can be found in the /var/log/autoscaling.log and /var/log/autoscaling_performance.log log files.
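
Putting steps 5 to 7 together, a session on the Coordinator VM might look as follows. All values are placeholders to be replaced with your own Service Principal and database credentials.

```bash
# Run as root on the Coordinator VM, inside the activated virtual environment.
# All values below are placeholders.
export AZURE_SUBSCRIPTION_ID="<subscription-id>"
export AZURE_TENANT_ID="<tenant-id>"
export AZURE_CLIENT_ID="<service-principal-client-id>"
export AZURE_CLIENT_SECRET="<service-principal-client-secret>"
export RESOURCE_GROUP="<resource-group-name>"
export POSTGREDB=postgres
export POSTGREUSER="<postgres-username>"
export POSTGREPASSWORD="<postgres-password>"
export POSTGREPORT=30001
export SCRIPTPATH=/home/azureuser/MobilityDB-Azure/autoscaling/scripts

# Launch the auto-scaler daemon (parameters as in step 7)
python3 MobilityDB-Azure/autoscaling/AutoscalingDaemon.py \
  --action start --minvm 2 --maxvm 8 \
  --storage /home/azureuser/autolog \
  --lower_th 70 --upper_th 30 --metric sessions
```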
