diff --git a/deployment/byoc.mdx b/deployment/byoc.mdx
deleted file mode 100644
index 3024338..0000000
--- a/deployment/byoc.mdx
+++ /dev/null
@@ -1,5 +0,0 @@
----
-title: "Bring Your Own Cloud"
-description: "Configure Cursor for your documentation workflow"
-icon: "cloud"
----
\ No newline at end of file
diff --git a/deployment/self-hosting.mdx b/deployment/self-hosting.mdx
deleted file mode 100644
index 45cc9e3..0000000
--- a/deployment/self-hosting.mdx
+++ /dev/null
@@ -1,5 +0,0 @@
----
-title: "Self-Hosting"
-description: "Configure Cursor for your documentation workflow"
-icon: "server"
----
diff --git a/docs.json b/docs.json
index 4d00c79..bd273de 100644
--- a/docs.json
+++ b/docs.json
@@ -49,18 +49,14 @@
           ]
         },
         {
-          "group": "Deploy E2B",
+          "group": "Infrastructure",
           "pages": [
-            "deployment/byoc",
-            "deployment/self-hosting"
+            "infrastructure/architecture",
+            "infrastructure/self-hosting",
+            "infrastructure/byoc"
           ]
         }
       ]
-    },
-    {
-      "anchor": "SDK Reference",
-      "icon": "square-terminal",
-      "href": "https://external-link.com/blog"
     }
   ]
 },
diff --git a/images/byoc-architecture-diagram.png b/images/byoc-architecture-diagram.png
new file mode 100644
index 0000000..eb8e3e0
Binary files /dev/null and b/images/byoc-architecture-diagram.png differ
diff --git a/infrastructure/architecture.mdx b/infrastructure/architecture.mdx
new file mode 100644
index 0000000..0817c92
--- /dev/null
+++ b/infrastructure/architecture.mdx
@@ -0,0 +1,65 @@
+---
+title: "Architecture"
+description: "E2B infrastructure architecture overview"
+icon: "sitemap"
+---
+
+## Sandbox architecture
+
+E2B is built around the orchestration of microVMs using Firecracker and KVM virtualization.
+Its multi-tenant architecture allows you to run multiple sandboxes on a single machine while ensuring strong isolation between them.
+
+At the core is the orchestrator, which receives requests from the E2B control plane and manages the sandbox lifecycle.
+It's responsible for low-level operations such as memory mapping, snapshotting, and system configuration, and it uses Firecracker to run microVMs.
+
+E2B can run hundreds of nodes, with each node running an orchestrator that manages hundreds of sandboxes.
+The API serves as the main point of entry for customers, handling all permissions and logic to build sandbox requests.
+It is also responsible for fast and reliable scheduling of sandbox requests to orchestrators.
+
+When someone wants to access a port exposed in a sandbox, Edge (client-proxy) routes the traffic from the load balancer to the correct node.
+At the node level, the orchestrator proxy completes the routing directly to the sandbox network interface.
+
+## Template architecture
+
+We use Ubuntu-based images for sandbox templates.
+Currently, you can use a Docker image as the build source, or the template build V2, which supports faster, code-declarative build configuration.
+
+We extract the file system from the received source, install and configure the required packages, and then create a snapshot of the file system.
+This snapshot is later used to create the microVM that runs the sandbox. We can create both file-system and memory snapshots for even faster sandbox creation.
+
+## Components
+
+### Services
+- **API** - Handles consistency and logic for the whole E2B platform. Used for sandbox lifecycle and template management.
+- **Orchestrator** - Manages the sandbox microVM lifecycle, system configuration, snapshotting, and more.
+- **Template Manager** - Currently part of the orchestrator, but can be deployed separately. Responsible for building sandbox templates.
+- **Envd** - Small service running in each sandbox that handles communication with the E2B control plane and command execution.
+- **Edge (client-proxy)** - Routes traffic to sandboxes, exposes an API for cluster management, and serves as the gRPC proxy used by the E2B control plane to communicate with orchestrators.
+- **Docker Reverse Proxy** - Allows us to receive template source images with our own authentication and authorization.
+- **OpenTelemetry** - Collects logs, metrics, and traces from deployed services. Used for observability and monitoring.
+- **ClickHouse** - Used for storing sandbox lifecycle metrics.
+- **Loki** - Used for storing sandbox logs. Logs are stored only in the cluster and are not sent to Grafana or any other third-party service.
+
+### Cloud Services
+- **Redis** - Used for metadata and synchronization between components.
+- **Container Registry** - Storage for customers' sandbox template source files.
+- **Object Storage** - Storage for sandbox snapshots/templates. Needs to support byte-range read requests.
+- **PostgreSQL Database (currently only Supabase is supported)** - Used as the Postgres database and an OAuth/user management tool.
+- **Machines with KVM virtualization support** - Google Cloud Platform VMs with native/nested virtualization support.
+- **Grafana (optional for monitoring)** - Used for monitoring logs/traces/metrics coming from OpenTelemetry and ClickHouse.
+
+## Security
+
+### Virtualization isolation
+
+We use Firecracker and Linux KVM to provide strong isolation between sandboxes.
+This allows us to run multiple sandboxes on a single machine while ensuring that they are isolated from each other.
+Firecracker is a lightweight virtualization technology that provides a minimalistic virtual machine monitor (VMM) for running microVMs.
+It is designed to be secure and efficient, making it a great choice for running sandboxes.
+
+### Why virtualization over containerization?
+
+Docker is a popular containerization technology, but it does not provide the same level of isolation as Firecracker.
+Docker containers share the host kernel and resources, which can lead to security vulnerabilities and performance issues.
+
+Firecracker, on the other hand, provides a lightweight virtual machine that runs its own kernel with its own resources, ensuring strong isolation between sandboxes.
+This makes Firecracker a better choice for running sandboxes, especially in a multi-tenant environment
+where security and performance are critical.
diff --git a/infrastructure/byoc.mdx b/infrastructure/byoc.mdx
new file mode 100644
index 0000000..b00cba8
--- /dev/null
+++ b/infrastructure/byoc.mdx
@@ -0,0 +1,72 @@
+---
+title: "BYOC (Bring Your Own Cloud)"
+sidebarTitle: "Bring Your Own Cloud"
+description: "Deploy E2B sandboxes to your own cloud VPC."
+icon: "cloud"
+---
+
+BYOC is currently only available for AWS.
+We are working on adding support for Google Cloud and Azure.
+
+<Note>
+  BYOC is offered to enterprise customers only.
+  If you're interested in the BYOC offering, please book a call with our team [here](https://e2b.dev/contact) or contact us at [enterprise@e2b.dev](mailto:enterprise@e2b.dev).
+</Note>
+
+## Architecture
+
+Sandbox templates, snapshots, and runtime logs are stored within the customer's BYOC VPC.
+Anonymized system metrics such as cluster memory and CPU usage are sent to the E2B Cloud for observability and cluster management purposes.
+
+All potentially sensitive traffic, such as sandbox template build source files,
+sandbox traffic, and logs, is transmitted directly from the client to the customer's BYOC VPC without ever touching the E2B Cloud infrastructure.
+
+### Glossary
+- **BYOC VPC**: The customer's Virtual Private Cloud where the E2B sandboxes are deployed. For example, your AWS account.
+- **E2B Cloud**: The managed service that provides the E2B platform, observability, and cluster management.
+- **OAuth Provider**: Customer-managed service that provides users and the E2B Cloud with access to the cluster.
+
+<Frame>
+  <img src="/images/byoc-architecture-diagram.png" alt="Graphics explaining key BYOC architecture parts" />
+</Frame>
+
+### BYOC Cluster Components
+- **Orchestrator**: Represents a node responsible for managing sandboxes and their lifecycle. Optionally, it can also run the template builder component.
+- **Edge Controller**: Routes traffic to sandboxes, exposes an API for cluster management, and serves as the gRPC proxy used by the E2B control plane to communicate with orchestrators.
+- **Monitoring**: Collector that receives sandbox and build logs and system metrics from orchestrators and edge controllers. Only anonymized metrics are sent to the E2B Cloud for observability purposes.
+- **Storage**: Persistent storage for sandbox templates, snapshots, and runtime logs. Container image repository for template images.
+
+## Onboarding
+
+Customers can initiate the onboarding process by reaching out to us.
+Customers need a dedicated AWS account and need to know the region they will use.
+After that, we receive the IAM role needed for managing account resources.
+AWS account quota limits may need to be increased.
+
+Terraform configuration and machine images are used to provision the BYOC cluster.
+Once provisioning is done and the cluster is running, we will create a new team under your E2B account that can be used with the SDK/CLI the same way as on E2B Cloud (see the sketch at the end of this page).
+
+## FAQ
+
+### How is the cluster monitored?
+
+The cluster forwards anonymized metrics such as machine CPU/memory usage to the E2B control plane for advanced observability and alerting.
+The whole observability stack is anonymized and does not contain any sensitive information.
+
+### Can the cluster scale automatically?
+
+A cluster can be scaled horizontally by adding more orchestrators and edge controllers.
+The autoscaler, currently in V1, is not yet capable of automatically scaling the orchestrator nodes needed for sandbox spawning.
+This feature is coming in future versions.
+
+### Are sandboxes accessible only from a customer's private network?
+
+Yes. The load balancer that handles all requests coming to sandboxes can be configured as internal, and VPC peering
+with an additional customer VPC can be configured so sandbox traffic stays in the private network.
+
+### How is secure control-plane communication ensured?
+
+Data sent between the E2B Cloud and your BYOC VPC is encrypted using TLS.
+
+VPC peering can be established to allow direct communication between the E2B Cloud and your BYOC VPC.
+When using VPC peering, the load balancer can be configured as private, without a public IP address.
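+
+Once onboarded, using the cluster is expected to look the same as E2B Cloud. A minimal sketch using the CLI; `<your-byoc-domain>` is a placeholder for the domain your BYOC cluster is reachable under:
+
+```sh
+# Point the e2b CLI at the BYOC cluster instead of E2B Cloud
+# <your-byoc-domain> is a placeholder for your cluster's domain
+E2B_DOMAIN=<your-byoc-domain> e2b sandbox list
+```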
\ No newline at end of file
diff --git a/infrastructure/self-hosting.mdx b/infrastructure/self-hosting.mdx
new file mode 100644
index 0000000..a95cce6
--- /dev/null
+++ b/infrastructure/self-hosting.mdx
@@ -0,0 +1,186 @@
+---
+title: "Self-Hosting"
+description: "Deploy E2B to your own cloud infrastructure"
+icon: "server"
+---
+
+Self-hosting E2B allows you to deploy and manage the whole E2B open-source stack on your own infrastructure.
+This gives you full control over your sandboxes, data, and security policies.
+
+We currently officially support self-hosting on Google Cloud Platform (GCP); Amazon Web Services (AWS) and on-premise support are coming soon.
+
+<Note>
+  If you are looking for a managed solution, consider our [Bring Your Own Cloud](/infrastructure/byoc) offering, which
+  gives you the same security and control, with the E2B team managing the infrastructure for you.
+</Note>
+
+## Google Cloud Platform
+
+### Prerequisites
+
+**Tools**
+- [Packer](https://developer.hashicorp.com/packer/tutorials/docker-get-started/get-started-install-cli#installing-packer)
+- [Golang](https://go.dev/doc/install)
+- [Docker](https://docs.docker.com/engine/install/)
+- [Terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli) (v1.5.x)
+  - [This version is still using the Mozilla Public License](https://github.com/hashicorp/terraform/commit/b145fbcaadf0fa7d0e7040eac641d9aef2a26433)
+  - The last version of Terraform that supports the Mozilla Public License is **v1.5.7**
+  - You can install it with [tfenv](https://github.com/tfutils/tfenv) for easier version management
+- [Google Cloud CLI](https://cloud.google.com/sdk/docs/install)
+  - Used for managing GCP resources deployed by Terraform
+  - Authenticate with `gcloud auth login && gcloud auth application-default login`
+
+**Accounts**
+- Cloudflare account with a domain
+- Google Cloud Platform account and project
+- Supabase account with a PostgreSQL database
+- **(Optional)** Grafana account for monitoring and logging
+- **(Optional)** PostHog account for analytics
+
+### Steps
+
+1. Go to `console.cloud.google.com` and create a new GCP project
+   > Make sure your quota allows you to have at least 2500 GB for `Persistent Disk SSD (GB)` and at least 24 for `CPUs`.
+2. Create `.env.prod`, `.env.staging`, or `.env.dev` from [`.env.template`](https://github.com/e2b-dev/infra/blob/main/.env.template). You can pick any of them. Make sure to fill in the values. All are required if not specified otherwise.
+   > Get the Postgres database connection string from your database, e.g. [from Supabase](https://supabase.com/docs/guides/database/connecting-to-postgres#direct-connection): Create a new project in Supabase and go to your project in Supabase -> Settings -> Database -> Connection Strings -> Postgres -> Direct.
+
+   > Your Postgres database needs to have IPv4 access enabled. You can do that in the Connect screen.
+3. Run `make switch-env ENV={prod,staging,dev}` to start using your env
+4. Run `make login-gcloud` to log in to the `gcloud` CLI so Terraform and Packer can communicate with the GCP API.
+5. Run `make init`
+   > If this errors, run it a second time. It's due to a race condition on Terraform enabling API access for the various GCP services; this can take several seconds.
+
+   > A full list of services that will be enabled for API access: [Secret Manager API](https://console.cloud.google.com/apis/library/secretmanager.googleapis.com), [Certificate Manager API](https://console.cloud.google.com/apis/library/certificatemanager.googleapis.com), [Compute Engine API](https://console.cloud.google.com/apis/library/compute.googleapis.com), [Artifact Registry API](https://console.cloud.google.com/apis/library/artifactregistry.googleapis.com), [OS Config API](https://console.cloud.google.com/apis/library/osconfig.googleapis.com), [Stackdriver Monitoring API](https://console.cloud.google.com/apis/library/monitoring.googleapis.com), [Stackdriver Logging API](https://console.cloud.google.com/apis/library/logging.googleapis.com)
+
+6. Run `make build-and-upload`
+7. Run `make copy-public-builds`
+8. Run `make migrate`
+9. Secrets are created and stored in GCP Secrets Manager. Once created, that is the source of truth; you will need to update values there to make changes. Create a secret value for the secrets listed in the following steps.
+10. Update `e2b-cloudflare-api-token` in GCP Secrets Manager with a value taken from Cloudflare.
+    > Get the Cloudflare API token: go to the [Cloudflare dashboard](https://dash.cloudflare.com/) -> Manage Account -> Account API Tokens -> Create Token -> Edit Zone DNS -> in "Zone Resources" select your domain and generate the token
+11. Run `make plan-without-jobs` and then `make apply`
+12. Fill out the following secrets in GCP Secrets Manager:
+    - `e2b-supabase-jwt-secrets` (optional / required to self-host the [E2B dashboard](https://github.com/e2b-dev/dashboard))
+      > Get the Supabase JWT secret: go to the [Supabase dashboard](https://supabase.com/dashboard) -> Select your Project -> Project Settings -> Data API -> JWT Settings
+    - `e2b-postgres-connection-string`
+      > This is the same value as for the `POSTGRES_CONNECTION_STRING` env variable.
+13. Run `make plan` and then `make apply`
+    > Note: This will work after the TLS certificates are issued. It can take some time; you can check the status in the Google Cloud Console.
+14. Set up data in the cluster by following one of the two options:
+    - Run `make prep-cluster` in `packages/shared` to create an initial user, etc. (You need to be logged in via the [`e2b` CLI](https://www.npmjs.com/package/@e2b/cli).) It will create a user with the same information (access token, API key, etc.) as you have in E2B.
+    - You can also create a user in the database; it will automatically also create a team, an API key, and an access token. You will need to build template(s) for your cluster. Use the [`e2b` CLI](https://www.npmjs.com/package/@e2b/cli?activetab=versions) and run `E2B_DOMAIN=<your-domain> e2b template build`.
+
+### Interacting with the cluster
+
+#### SDK
+When using the SDK, pass the domain when creating a new `Sandbox` in the JS/TS SDK
+```javascript
+import { Sandbox } from "@e2b/sdk";
+
+const sandbox = new Sandbox({ domain: "<your-domain>" });
+```
+
+or in the Python SDK
+
+```python
+from e2b import Sandbox
+
+sandbox = Sandbox(domain="<your-domain>")
+```
+
+#### CLI
+When using the CLI, you can pass the domain as well
+```sh
+E2B_DOMAIN=<your-domain> e2b
+```
+
+### Monitoring and logging jobs
+
+To access the Nomad web UI, go to `https://nomad.<your-domain>`. Sign in, and when prompted for an API token, use the one stored in GCP Secrets Manager.
+From here, you can see Nomad jobs and tasks for both client and server, including logging.
+
+To update jobs running in the cluster, look inside `packages/nomad` for the config files. This can be useful for setting up your logging and monitoring agents.
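+
+If you prefer the terminal to the web UI, the standard Nomad CLI can be pointed at the same endpoint. A minimal sketch, assuming `<your-domain>` is your configured domain and the token is the one stored in GCP Secrets Manager:
+
+```sh
+# Point the Nomad CLI at the self-hosted cluster
+export NOMAD_ADDR=https://nomad.<your-domain>
+export NOMAD_TOKEN=<token-from-gcp-secrets-manager>
+
+nomad job status             # list all jobs running in the cluster
+nomad alloc logs <alloc-id>  # tail logs of a specific allocation
+```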
+
+### Deployment Troubleshooting
+
+If any problems arise, open a [GitHub issue on the repo](https://github.com/e2b-dev/infra/issues) and we'll look into it.
+
+### Google Cloud Troubleshooting
+
+**Quotas not available**
+
+If you can't find the quota in `All Quotas` in GCP's Console, create and delete a dummy VM before proceeding to step 2 of the self-deploy guide. This will create additional quotas and policies in GCP:
+```
+gcloud compute instances create dummy-init --project=YOUR-PROJECT-ID --zone=YOUR-ZONE --machine-type=e2-medium --boot-disk-type=pd-ssd --no-address
+```
+Wait a minute and destroy the VM:
+```
+gcloud compute instances delete dummy-init --zone=YOUR-ZONE --quiet
+```
+Now you should see the right quota options in `All Quotas` and be able to request the correct size.
+
+## Linux Machine
+
+All E2B services are AMD64 compatible and ready to be deployed on Ubuntu 22.04 machines.
+Tooling for on-premise clustering and load balancing is **not yet officially supported**.
+
+### Service images
+
+To run the E2B core, you need to build and deploy the **API**, **Edge (client-proxy)**, and **Orchestrator** services.
+This will work on any Linux machine with Docker installed. The Orchestrator is built with Docker but deployed as a static binary, because it needs precise control over the Firecracker microVMs on the host system.
+
+Building and provisioning the services can follow what we do for the Google Cloud Platform builds and Nomad job setup.
+Details about the architecture can be found in our [architecture](/infrastructure/architecture) section.
+
+### Client machine setup
+
+#### Configuration
+
+The Orchestrator (client) machine requires a precise setup to spawn and control Firecracker-based sandboxes.
+This includes the correct OS version (Ubuntu 22.04) with KVM. It's possible to run KVM with nested virtualization, but there are some performance drawbacks.
+
+Most of the configuration can be taken from our client [machine setup script](https://github.com/e2b-dev/infra/blob/main/packages/cluster/scripts/start-client.sh).
+It adjusts the maximum number of inodes, socket connections, NBD, and huge-page allocations needed for the microVM process to work properly.
+
+#### Static binaries
+
+A few files and folders need to be present on the machine.
+For sandbox spawning to work correctly, you need the Firecracker, Linux kernel, and Envd binaries.
+We distribute pre-built ones in a public Google Cloud bucket.
+
+```bash
+# Access publicly available pre-built binaries
+gsutil cp -r gs://e2b-prod-public-builds .
+```
+
+An example static file and folder setup follows. Replace the Linux and Firecracker versions with the ones you want to use.
+Ensure you use the same Linux and Firecracker versions for both sandbox building and spawning.
+
+```bash
+sudo mkdir -p /orchestrator/sandbox
+sudo mkdir -p /orchestrator/template
+sudo mkdir -p /orchestrator/build
+
+sudo mkdir /fc-envd
+sudo mkdir /fc-envs
+sudo mkdir /fc-vm
+
+# Replace with the source where your envd binary is hosted.
+# Currently, envd needs to be taken from your own source, as we are not providing it.
+sudo curl -fsSL -o /fc-envd/envd ${source_url}
+sudo chmod +x /fc-envd/envd
+
+SOURCE_URL="https://storage.googleapis.com/e2b-prod-public-builds"
+KERNEL_VERSION="vmlinux-6.1.102"
+FIRECRACKER_VERSION="v1.12.1_d990331"
+
+# Download the kernel
+# Note: KERNEL_VERSION already contains the "vmlinux-" prefix, so this creates /fc-kernels/vmlinux-6.1.102
+sudo mkdir -p /fc-kernels/${KERNEL_VERSION}
+sudo curl -fsSL -o /fc-kernels/${KERNEL_VERSION}/vmlinux.bin ${SOURCE_URL}/kernels/${KERNEL_VERSION}/vmlinux.bin
+
+# Download Firecracker
+sudo mkdir -p /fc-versions/${FIRECRACKER_VERSION}
+sudo curl -fsSL -o /fc-versions/${FIRECRACKER_VERSION}/firecracker ${SOURCE_URL}/firecrackers/${FIRECRACKER_VERSION}/firecracker
+sudo chmod +x /fc-versions/${FIRECRACKER_VERSION}/firecracker
+```
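+
+As a quick sanity check (a suggested verification, not part of the official setup), you can confirm that KVM is exposed on the machine and that the downloaded Firecracker binary runs:
+
+```bash
+# KVM must be available on the host for the orchestrator to spawn microVMs
+ls -l /dev/kvm
+
+# The Firecracker binary should be executable and report its version
+/fc-versions/${FIRECRACKER_VERSION}/firecracker --version
+```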