
feat: meilisearch deployment #146

Closed
wperron wants to merge 1 commit from feat/meilisearch


@wperron commented Sep 23, 2020

Add a complete deployment of Meilisearch on EC2 to offer a better search experience for deno.land/x.

Overview

[diagram]

Compute

After an experiment with Fargate, I figured EC2 would be a much simpler option to manage while also being cheaper per compute-hour. I opted for a single t3a.small instance to start with; the t3a class is cheaper than t3, and the small sizing should be enough to begin with.

I used a custom AMI that builds off of vanilla Ubuntu 20.04 and simply adds Meilisearch as a systemd service. I used Packer to build the image; seeing as we already use Terraform, it made sense to stick with the HashiCorp stack. The recipe is included in the PR. To build the image, simply run:

packer build -var aws_access_key=AKIAEXAMPLE -var aws_secret_key=example123456abc -var region=us-east-1 ./terraform/packer/meilisearch.pkr.hcl
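For context, the recipe follows the standard Packer HCL layout. The sketch below is a simplified illustration of that shape; the variable names match the build command above, but the AMI name, filters, and install commands are illustrative rather than copied from the PR:

```hcl
variable "aws_access_key" { type = string }
variable "aws_secret_key" { type = string }
variable "region"         { type = string }

source "amazon-ebs" "meilisearch" {
  access_key    = var.aws_access_key
  secret_key    = var.aws_secret_key
  region        = var.region
  instance_type = "t3a.small"
  ssh_username  = "ubuntu"
  ami_name      = "meilisearch-ubuntu-20.04" # a timestamp suffix keeps rebuilds unique in practice

  source_ami_filter {
    filters = {
      name                = "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"
      virtualization-type = "hvm"
    }
    owners      = ["099720109477"] # Canonical
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.meilisearch"]

  provisioner "shell" {
    inline = [
      # Illustrative install; the real recipe may pin a specific release binary instead
      "curl -L https://install.meilisearch.com | sh",
      "sudo mv ./meilisearch /usr/local/bin/meilisearch",
    ]
  }

  # The actual template also drops in a systemd unit and enables it so
  # Meilisearch starts on boot; that part is elided here.
}
```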

Storage

Again, I just went with the simple solution here: the EBS root device on the EC2 instance is sized at 20 GB and set not to be destroyed when the instance terminates. If we ever need an upgrade more complex than simply resizing the instance, we can reuse the same EBS volume as the root device on a new instance and keep all the data.
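In Terraform terms, the compute and storage pieces boil down to something like the following sketch; the resource and data source names are mine, only the sizing and the delete_on_termination behaviour come from the description above:

```hcl
resource "aws_instance" "meilisearch" {
  ami           = data.aws_ami.meilisearch.id # AMI built by Packer, see Prerequisites
  instance_type = "t3a.small"
  key_name      = var.ssh_key # EC2 key pair name, see Prerequisites

  # subnet, security groups, etc. omitted for brevity

  root_block_device {
    volume_size           = 20    # GB
    delete_on_termination = false # keep the data when the instance is replaced
  }
}
```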

Networking

The EC2 instance is configured with a public IP and sits in a public subnet so it can be accessed over SSH. I've included an internal Application Load Balancer in front of the instance, and a VPC Link so that API Gateway acts as a proxy to the load balancer. Only the API Gateway is publicly accessible over HTTP.

Note: private integrations through VPC Links in API Gateway only allow integrations of type "HTTP_PROXY". This means we can either allowlist the endpoints we want public by creating routes that exactly match the Meilisearch API routes (which is what I've done here), or create a separate API Gateway on a different subdomain with a single /{proxy+} route, which would effectively expose the entire Meilisearch API.

Also, as-is, the communication between the API Gateway and the ALB is over HTTP, not HTTPS. Switching to HTTPS would require associating a valid DNS record that matches the certificate on the ALB, and I have no idea whether that's even possible to do via CloudFlare.
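A rough sketch of that wiring with the aws_apigatewayv2_* resources; the resource names, the referenced ALB/subnet/security-group resources, and the example route are assumptions on my part:

```hcl
resource "aws_apigatewayv2_vpc_link" "meilisearch" {
  name               = "meilisearch"
  security_group_ids = [aws_security_group.vpc_link.id]
  subnet_ids         = aws_subnet.public[*].id
}

resource "aws_apigatewayv2_integration" "search" {
  api_id             = aws_apigatewayv2_api.meilisearch.id
  integration_type   = "HTTP_PROXY" # the only type allowed for private integrations
  integration_method = "ANY"
  connection_type    = "VPC_LINK"
  connection_id      = aws_apigatewayv2_vpc_link.meilisearch.id
  integration_uri    = aws_lb_listener.meilisearch.arn # the internal ALB's HTTP listener
}

# Only the routes declared explicitly are reachable from the internet,
# e.g. the search endpoint of a single index:
resource "aws_apigatewayv2_route" "search" {
  api_id    = aws_apigatewayv2_api.meilisearch.id
  route_key = "GET /indexes/{uid}/search"
  target    = "integrations/${aws_apigatewayv2_integration.search.id}"
}
```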

Security

A couple of things here:

  • the VPC Link on the API Gateway has a security group that allows HTTP and HTTPS traffic from 0.0.0.0/0
  • the ALB is marked as internal so it doesn't resolve outside of the VPC
  • the EC2 instance has a security group that allows (see the sketch below):
    1. SSH traffic from 0.0.0.0/0
    2. HTTP traffic on port 7700 only, from the SG associated with the VPC Link
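A hedged sketch of those two security groups; resource names are illustrative and egress rules are omitted for brevity:

```hcl
# SG attached to the VPC Link: public HTTP/HTTPS in
resource "aws_security_group" "vpc_link" {
  name   = "meilisearch-vpc-link"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# SG attached to the EC2 instance: SSH from anywhere, Meilisearch port
# only from the VPC Link's security group
resource "aws_security_group" "meilisearch_instance" {
  name   = "meilisearch-instance"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port       = 7700
    to_port         = 7700
    protocol        = "tcp"
    security_groups = [aws_security_group.vpc_link.id]
  }
}
```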

Upgrading the Instance

Changing the instance type is super easy: just change the type in Terraform. This change doesn't require dropping and recreating the instance, so it's pretty safe to do. I tested it by upgrading from t3a.small to t3a.medium and the service was unavailable for around 1 minute. It's not as fancy as a rolling update or canary release, but it should be good enough considering the current load.

Prerequisites

To be able to connect via SSH, the plan will now require an ssh_key variable, which is the name of the EC2 key pair to associate with the instance. This should ideally be created beforehand and included in a GH secret.

If the AMI is not already built before running Terraform, the plan step will simply fail because the data source won't be able to find the image.
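Concretely, those two pieces look roughly like this; the AMI name filter is an assumption about how the Packer build names the image:

```hcl
variable "ssh_key" {
  description = "Name of an existing EC2 key pair to attach to the instance"
  type        = string
}

# Fails at plan time if no matching AMI has been built yet
data "aws_ami" "meilisearch" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["meilisearch-*"]
  }
}
```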

Any indexes or resources managed by Meili will have to be created by hand after the deployment; this PR simply creates the underlying infrastructure. While testing locally, I found that connecting to the instance via SSH and using curl against localhost is the easiest way to do that.

Fixes #69

@wperron commented Sep 23, 2020

Quick overview of the pricing:

  • EC2: $13.98/mo for a single t3a.small instance (2 vCPU, 2 GB)
  • EBS: $1.00/mo for 20 GB provisioned (assuming the 30 GB of free tier is already taken elsewhere, otherwise possibly $0)
  • ALB: $16.74/mo flat rate + LCU-hours depending on new HTTP connections per hour

Add a complete deployment of Meilisearch on EC2 to offer a better search
experience for deno.land/x.

resources include:
* EC2 instance (w/ persistent root ebs device)
* Application Load Balancer
* Private API Gateway integration (http proxy)
* Appropriate security groups
@wperron commented Oct 15, 2020

I should also mention that, given the team operating this deployment is pretty small, the pragmatic alternative would probably be to use AWS's managed Elasticsearch service. With a small cluster of t3 instances the pricing would be similar, and it would require less maintenance and be easier to upgrade. Not to mention, it also comes with a Kibana instance alongside, so we can play around in the index directly and build dashboards, saved filters, etc.

That's not to say I'm not confident about the proposed infra, quite the opposite; I just want to highlight some options.

@lucacasonato left a comment

Looks good. Let's land this so we can start integrating. A few questions:

  • How do we access this from the lambdas to add data?
  • Is updating meilisearch as simple as described below, or are there extra steps involved?
    1. Run packer to build new AMI
    2. Redeploy terraform, which will recreate ec2 instance with same block device?

@wperron commented Oct 24, 2020

Hey @lucacasonato, thanks for taking the time to review this.

How do we access this from the lambdas to add data?

With the current state of this PR, you could either run a Lambda inside the same VPC and target the ALB on port 80 (I'd have to do some tests; there may be small tweaks needed to the security groups, but nothing major), or you could go through API Gateway, which is what's publicly available anyway, though that might incur higher latency because the call would have to go out to the internet to resolve the domain name.
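To make the first option concrete, here's a minimal sketch of what "a Lambda inside the same VPC" means in Terraform; the function, its role, its packaging, and all the names are placeholders, not something this PR ships:

```hcl
resource "aws_lambda_function" "search_writer" {
  function_name = "meilisearch-writer"
  role          = aws_iam_role.search_writer.arn
  handler       = "index.handler"
  runtime       = "nodejs12.x"
  filename      = "writer.zip"

  # Attaching the function to VPC subnets lets it reach the internal ALB
  vpc_config {
    subnet_ids         = aws_subnet.public[*].id
    security_group_ids = [aws_security_group.search_writer.id]
  }

  environment {
    variables = {
      # The internal ALB's DNS name only resolves/routes inside the VPC
      MEILI_HOST = "http://${aws_lb.meilisearch.dns_name}"
    }
  }
}
```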

It also raises the question of how we're going to set up the indexing mechanism. A Lambda consuming an SQS queue would be pretty simple, but we could also use Spot Instances to benefit from the lower compute cost, or capitalize on the fact that we already have an EC2 instance running and spawn a second process on the same host to consume the queue. I think we should have another discussion on this topic; maybe open another issue about it?

I should also reiterate that no matter what we choose to do, we'll likely have to set up auth keys for Meili to restrict write access and only allow read access from the internet. I haven't included that in this PR; I think the AWS Config Manager could help there, otherwise maybe go for something like Ansible.

Is updating meilisearch as simple as described below, or are there extra steps involved?

I'll double check later today just to be sure. I remember clearly testing an instance size upgrade, but not a Meili version upgrade. It's probably possible to simply connect to the instance via ssh and run the upgrade manually (or using Ansible) but I think having immutable AMIs and letting Terraform do the heavy lifting is probably a better way to do it.

This leads into another question: just how permanent does the block device data need to be? In my opinion, if we set up good caching at the API level, maybe even stick a Redis cluster in front of the instance to cache reads, and couple that with a comprehensive indexing solution, we might get away with treating the EC2 instance data as ephemeral. That would be the ideal case I think, but it would likely require a lot of instrumentation around the instance, which would also drive up the infra costs. Again, we can probably address these in a subsequent discussion/issue.

@wperron commented Nov 1, 2020

Double-checked the version upgrade: changing the AMI does create a new instance, with a new block device. The old block device is still there, which means the data can still be accessed, but it's not a simple "create new AMI, run terraform" update.

I'll try something with an additional block device that can be managed independently of the instance; that'll probably solve the issue.
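Something along these lines, where the volume exists as its own resource and simply gets re-attached to whichever instance is current; the names and device path are assumptions:

```hcl
resource "aws_ebs_volume" "meilisearch_data" {
  availability_zone = aws_instance.meilisearch.availability_zone
  size              = 20

  lifecycle {
    prevent_destroy = true # never let a plan silently drop the data
  }
}

resource "aws_volume_attachment" "meilisearch_data" {
  device_name = "/dev/sdf"
  volume_id   = aws_ebs_volume.meilisearch_data.id
  instance_id = aws_instance.meilisearch.id
}
```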

@wperron commented Nov 4, 2020

Quick update: dealing with an additional EBS device is trickier than it seemed. When the device is first created, it has no partition or filesystem. However, when changing the EC2 instance (for example, to upgrade the Meili version) and re-attaching the existing EBS device to the new instance, the partition and filesystem remain intact.

Bottom line: the EBS device has a lifecycle of its own, separate from the EC2 instance. I don't know what the best way forward is. One option I see is to have a local provisioner on the EBS device launch a script that creates a temporary EC2 instance and runs a remote script to partition the device and create the filesystem, so that it's already "warm" when attached to any instance.

Another option would be to move the EBS and Meili setup to a set of AWS Systems Manager scripts and automation tasks. That would likely make it easier to work with the additional EBS device, at the cost of not using independent AMIs for new Meili versions. It might be a good compromise though, since we could also automate version upgrades through Systems Manager so that no manual action has to be taken.

@lucacasonato wdyt?

@wperron commented Jan 21, 2021

Looking back at this, there are a lot of cool things in this PR and I'm sure we could get it to work well enough, but I'm not really satisfied with it. It's a lot of work just to get a search engine up and running, not to mention operating it down the line. I think we should go with a managed solution like Algolia, something we discussed on Discord.

@wperron wperron closed this Jan 21, 2021
@wperron wperron deleted the feat/meilisearch branch March 9, 2021 12:41