# Instance-Optimized Serverless Programming
## 1. Introduction
Serverless computing is attractive thanks to the following characteristics:
* Pay-as-you-go, including scale-to-zero when unused.
* Autoscaling of resources to handle varying loads.
* Little to no management of servers.

The key benefit of serverless computing is **separating resource allocation from resource usage**:
* The cloud provider has a pool of constantly running nodes dedicated to a specific task (function invocation, object storage, etc.).
* It only charges the user when these nodes are actually used. Because the nodes are already running, nothing has to be started on the fly, which enables low latency usage.
* On the flip side, the cloud provider charges a premium for each use, meaning that users only save money when their usage of the resources is relatively low.

Developing serverless systems requires two components: **FaaS + BaaS**.
* FaaS for general-purpose code written by the developer (Lambda, Azure Functions, etc.).
    * These are ephemeral, isolated instances running developer code.
* BaaS for specialized tasks that are not suited to FaaS (e.g., durable storage).
    * Backend services are implemented in a serverful way, but the cloud provider offers a serverless illusion by separating allocation from usage.

We observe a few problems with the current serverless programming model:
1. For most serverless services, pay-as-you-go is only attractive at low scale; otherwise, the premium charged by cloud providers makes serverless cost-ineffective.
2. Due to the limitations of FaaS, any complex system has to be a backend offered by a cloud provider.
    * Only cloud providers can provide separation of allocation and usage.
    * For example, there is no academic research on serverless database systems. All existing systems come from cloud companies (Aurora with AWS, Socrates with Microsoft, PolarDB with Alibaba and TaurusDB with Huawei). Researchers and open source programmers simply cannot compete.

Previous academic research has attempted to develop better serverless models, mainly by optimizing inter-function communication and data placement. However, these models all run into the allocation/usage problem (e.g., Cloudburst, FaasM). Unless the cloud providers themselves adopt these models, overallocate the necessary resources, and bill by usage, it is impossible to use these models in a serverless manner. Replacing Lambda, DynamoDB, S3, or EFS is a non-starter; we must work with them.

In this paper, we introduce instance-optimized serverless programs (IOSPs). Like serverless programs, IOSPs are pay-as-you-go, scale-to-zero, and autoscaling. Yet, at medium to high scale, they perform like serverful programs and are less expensive than serverless programs. We achieve this by augmenting the standard FaaS+BaaS model with dynamic **accelerators** that only run when it is cost-effective to do so. These accelerators are implemented on top of cloud container services (ECS, ACS) to seamlessly speed up or reduce the cost of FaaS and BaaS tasks (e.g., compute, storage, communication).

To showcase the effectiveness of our technique, we built the SBTree, a high-performance serverless B+Tree that (**hopefully**) performs as well as or better than DynamoDB. We thus show that IOSPs can help build a wider range of serverless systems at a lower price.

## 2. Motivation
Consider the following scenario, which simulates a transaction:
* A client sends a message to a cloud actor with 8GB of memory and access to shared storage (S3 and EFS).
* The actor does some work for 5ms.
* It persists a WAL entry.
* It sends a response back to the client.

The actor can be implemented in several ways, which we describe shortly, but ideally it should:
* Pause when inactive. Only billed when active.
* Start instantly when used, and preserve in-memory state across recent invocations.
* Respond to messages with low latency (no indirections).
* Be pay-as-you-go, without a cost premium at high usage.
    * In particular, should not have the allocation/usage problem that only cloud providers can solve.
* Have at most one running instance. This property is quite hard to guarantee in the cloud due to network partitions. Instead:
    * We use file locking for best effort deduplication.
    * We use the WAL to sequentialize message processing.
* No such actor exists in previous work.

Likewise, the WAL should be:
* Safe:
    * Suppose `flush()` is called after a series of calls to `enqueue(entry)->lsn`.
    * The entries are then safely stored at the returned LSNs, even in the presence of single-AZ failures:
        * Failures that do not simultaneously take down multiple availability zones.
        * We follow the AZ+1 failure model of Aurora.
* Fast. Flush latencies should stay below 1ms, since the WAL is the bottleneck of OLTP applications and is hard to parallelize.
* Pay only for the compute required to keep in-memory log buffers before flushing them, and for the storage of the log.
    * Again, no allocation/usage problem for the compute or the storage.
* No such log implementation exists.
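
To make the safety requirement concrete, here is a minimal sketch of the `enqueue(entry)->lsn` / `flush()` contract, assuming a single SQLite file as the durable store (the EFS-only style of WAL discussed in Section 3.4, not the replicated one). The class name `SqliteWal`, its table layout, and the `read` helper are hypothetical. Entries are buffered in memory and assigned consecutive LSNs; `flush()` persists the whole buffer in one transaction.

```python
import sqlite3
from typing import List, Optional, Tuple


class SqliteWal:
    """Hypothetical sketch of the enqueue/flush contract over one SQLite file."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        with self.conn:  # commit the schema change
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS wal (lsn INTEGER PRIMARY KEY, entry BLOB NOT NULL)"
            )
        (last,) = self.conn.execute("SELECT COALESCE(MAX(lsn), 0) FROM wal").fetchone()
        self.next_lsn = last + 1  # resume numbering after the last durable entry
        self.buffer: List[Tuple[int, bytes]] = []  # in-memory log buffer

    def enqueue(self, entry: bytes) -> int:
        """Buffer an entry and return the LSN it will be stored at."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.buffer.append((lsn, entry))
        return lsn

    def flush(self) -> None:
        """Persist every buffered entry in a single transaction."""
        if not self.buffer:
            return
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany("INSERT INTO wal (lsn, entry) VALUES (?, ?)", self.buffer)
        self.buffer.clear()

    def read(self, lsn: int) -> Optional[bytes]:
        """Return the durable entry at `lsn`, or None if it was never flushed."""
        row = self.conn.execute("SELECT entry FROM wal WHERE lsn = ?", (lsn,)).fetchone()
        return row[0] if row else None
```

Assigning LSNs at `enqueue` time lets callers learn an entry's position immediately, while the entry only becomes durable once `flush()` returns. Because the batch is written in one transaction, a failed flush leaves either all or none of the buffered entries in the file.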

We consider three implementations of an Actor + WAL:
* Option 1: Actor is a Lambda function, uses S3 for indirect messaging, and uses EFS to persist the log.
* Option 2: Actor is a dedicated container, uses HTTP for direct messaging, and uses EFS to persist the log.
* Option 3: Actor is a dedicated container, uses HTTP for direct messaging, and uses replication for even faster log persistence.
    * This is how a high performance database system would be implemented.

![Latency](figures/motivation/latency.png), ![Cost](figures/motivation/cost.png), ![Startup](figures/motivation/startup.png)

We show three measures in the figures above:
* How long it takes to perform the full task.
    * Option 3 is fastest, followed by Option 2, then Option 1.
    * Option 1 takes ~265ms (200ms for messaging, 5ms for work, 60ms for the WAL).
    * Option 2 takes ~12ms (~2ms for messaging, 5ms for work, ~5ms for the WAL).
    * Option 3 takes ~8ms (~2ms for messaging, 5ms for work, <1ms for the WAL), making it 1.5x faster than Option 2.
* How long it takes to start after a pause (this is a virtual actor, so it can be paused).
    * Only Option 1 has a reasonable cold start (0.5s).
    * Options 2 and 3 take 30-60s, which is intolerable. This is why **separation of allocation and usage** is crucial to serverless computing: actually pausing containers and VMs creates intolerable latencies.
* How much it costs based on usage.
    * Option 1 is cheapest if the actor is active up to 1/8th of the time (e.g., the actor receives a few requests per minute).
    * Option 2 is cheapest for medium to high usage.
    * Option 3 is more expensive than Option 2, but cheaper than Option 1 when usage is high.
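
The crossover points can be checked with a short calculation. It reuses the rates from the hourly compute-cost cell at the end of this document; reading them as a Lambda price per GB-second and container prices per vCPU-hour and per GB-hour, and reading the container sizes as 4 vCPU/8 GB plus six 1 vCPU/2 GB replicas, is our interpretation of that formula.

```python
# Rates as used in this document's cost computation (interpretation assumed above).
LAMBDA_GB_SECOND = 0.0000166667
VCPU_HOUR = 0.012144
GB_HOUR = 0.0013335
MEMORY_GB = 8

# Option 1: Lambda, billed only while active; this is the cost of a fully active hour.
lambda_full_hour = LAMBDA_GB_SECOND * MEMORY_GB * 3600

# Option 2: one container, billed for the whole hour regardless of activity.
option2_hour = 4 * VCPU_HOUR + 8 * GB_HOUR

# Option 3: Option 2 plus six smaller replicas.
option3_hour = option2_hour + 6 * (VCPU_HOUR + 2 * GB_HOUR)


def break_even(fixed_hourly: float) -> float:
    """Fraction of active time above which a fixed hourly cost beats Lambda."""
    return fixed_hourly / lambda_full_hour


print(f"Option 2 beats Option 1 above {break_even(option2_hour):.1%} activity")
print(f"Option 3 beats Option 1 above {break_even(option3_hour):.1%} activity")
```

This prints a crossover of 12.3% for Option 2 (the 1/8th rule of thumb) and 30.9% for Option 3, which is the high-usage regime in which Option 3 sits between Options 1 and 2.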

In this work, we present the Serverless Database Actor (SDA), which:
* Works nearly like the *ideal* actor and WAL described above.
* Serves as the building block of the SBTree serverless B+Tree.
* Works like Option 1 at low activity. This allows instant scale-ups and cheaper compute at the cost of higher, but reasonable, end-to-end latencies.
    * Unlike all previous research on better serverless computing models, it does not have the allocation/usage problem that only cloud providers can solve.
* Works like Option 2 when it becomes cheaper than Option 1.
* Works like Option 3 when it becomes cheaper than Option 1, to allow submillisecond WAL commits.
    * Option 3 is never cheaper than Option 2, so Option 2 should not be the reference point for cost savings.

## 3. Instance-Optimized Serverless Computing
### 3.1 General Technique
As the motivation section shows, we need to mix two general types of implementation:
1. Standard FaaS+BaaS implementation. This implementation provides scale-to-zero with instant startups and fine-grained pay-as-you-go billing, which leads to lower costs under low usage, even if it may come with high communication and storage latencies.
    * In the previous example, an actor that uses Lambda for compute and S3 for messaging falls under this category.
    * A WAL that only uses EFS without replication also falls under this category.
2. High-performance serverful implementation. This implementation has much slower scale-ups and more coarse-grained costs, but is much more performant and costs much less under medium to high usage.
    * An actor that uses direct HTTP messaging falls under this category.
    * So does a WAL that uses multi-AZ quorum-based replication.

The following figure shows the technique we used to achieve this mix of implementations.
<br/>
<img src="figures/implementation/arch.png" alt="General Architecture" width="800"/>
<br/>

We first describe these four components generically, then showcase concrete examples in the next subsections:
1. The serverless endpoints. These are pay-as-you-go FaaS and BaaS services offered by the cloud provider. S3, for example, is used for indirect messaging, and Lambda is used for stateless compute. These services are also used to implement the rescaler's logic: SQS is used for metrics collection, DynamoDB stores the rescaler's state, and the rescaler itself is a Lambda function.
2. The dedicated instances. These are dedicated containers running on AWS Fargate. Since these containers can obtain IP addresses, we can communicate with them through direct HTTP requests. In addition, AWS Fargate spreads containers evenly across availability zones, so we can use them to provide high durability and availability. These instances can also use the serverless endpoints to implement their tasks; for example, an object cache would spill over to S3 when evicting objects. Each instance communicates with the rescaler to push metrics and to scale itself down when usage is low.
3. The rescaler. The rescaler is primarily responsible for detecting usage and scaling the high-performance implementation up or down. It collects usage metrics pushed by the clients and the dedicated instances. For example, based on the number of messages per second, the rescaler can decide that turning on a dedicated container will reduce costs and increase performance.
4. The client with a unified API. Since serverless programs are written to be agnostic of the underlying implementation, the client provides a unified API over multiple implementations. For example, to message an actor, the program just calls `send(actor_id, msg)`. The client may then translate the call into indirect S3 messages or direct HTTP requests. The client reads the rescaler's state from DynamoDB to know when to use the high-performance implementation.
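
A minimal sketch of the unified client, assuming the rescaler publishes, per actor, the addresses of any running dedicated instances. A plain dataclass stands in for the DynamoDB table, and the `direct` and `indirect` transports are hypothetical callables (for example, an HTTP request to a container, or an S3 message plus a Lambda wake-up); none of these names come from an existing API.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical transport signature: (target, message) -> response.
Transport = Callable[[str, bytes], bytes]


@dataclass
class RescalerState:
    """Stand-in for the rescaler's state table (DynamoDB in the text)."""

    instances: Dict[str, List[str]] = field(default_factory=dict)  # actor_id -> addresses


class UnifiedClient:
    def __init__(self, state: RescalerState, direct: Transport, indirect: Transport) -> None:
        self.state = state
        self.direct = direct  # high-performance path (dedicated instance)
        self.indirect = indirect  # FaaS+BaaS path (always available)

    def send(self, actor_id: str, msg: bytes) -> bytes:
        """Same call for the program, whichever implementation is active."""
        addresses = self.state.instances.get(actor_id, [])
        if addresses:
            return self.direct(random.choice(addresses), msg)
        return self.indirect(actor_id, msg)
```

A real client would cache this state and refresh it periodically rather than consult the table on every call, as Section 3.2 describes for function invocation.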


We will now describe a few concrete applications of the architecture above.

### 3.2 Function Invocation
Function invocation works as follows:
1. The user writes a function `process(request) -> response`.
2. The user deploys the function and specifies the compute resources required to execute the function.
3. The user calls `invoke(function_name, request)->response` to call the function, have it be executed with dedicated resources, and receive a response.
4. The function should be billed by its activity.

We can provide this functionality in two ways:
1. FaaS with Lambda. Lambda provides:
    * Instant scale-up. Even cold starts have a tolerable ~500ms latency, and they only occur if the function has been inactive for a while.
    * Graceful handling of bursts, because AWS overallocates resources for executing Lambdas.
    * Fine-grained pay-as-you-go. Lambda is billed for every ms of execution but, like all serverless services, becomes expensive under high usage.
2. Dedicated containers with a load balancer that ensures isolation of resources for each call to `invoke`. This provides:
    * Very cheap compute compared to Lambda, especially when using spot instances.
    * Lower latency warm starts. Once an instance is running, direct HTTP requests are faster than Lambda warm starts.
    * Ungraceful handling of bursts because it takes 30-60 seconds just to start new instances.

Given the tradeoffs above, consider a workload with spikes around a stable trend. Only Lambda can handle the spikes. However, if we can measure the stable trend, we can use dedicated containers to handle that portion of the workload at a lower price and lower latency. This is exactly what we try to achieve. Following the architecture diagram above, our instance-optimized implementation of function invocation works as follows:

1. Serverless endpoint: The only serverless endpoint needed here is Lambda (not counting those always used by the rescaler).
2. Serverful instances: Every Fargate instance runs a worker container along with a load balancer sidecar. This is necessary to provide the same guarantees as Lambda: an already active worker should neither process a second request, nor even dedicate resources to rejecting it (e.g., OS thread context switches). The load balancer contains that logic. Since load balancing is not the focus of this paper, we implement a simple decentralized form of client-side load balancing to avoid allocating a specialized load balancer: the client simply picks a random instance to send the request to and, if that instance's worker is busy, falls back on Lambda. This strategy can be very wasteful at high enough scale, so in future work, we intend to deploy a more centralized load balancer.
3. The rescaler computes an exponential moving average (EMA) of the `activity` of the function, meaning the fraction of time it is active in any minute. This activity is meant to capture the stable trend independently of spikes. The rescaler then deploys enough instances to handle this activity without increasing costs. For example, with an activity of 50%, the function is only active half of the time and needs one dedicated container. If the activity is 250%, the function has more than two concurrent executions on average and needs three dedicated containers. Since dedicated instances are much cheaper than Lambda, this does not increase costs.
4. The client provides an `invoke(function_name, request) -> response` API call. We also provide an automated way of deploying functions written in Rust, but that is orthogonal to this paper. The client, by default, routes function invocations to AWS Lambda. However, it periodically reads DynamoDB to find the set of running instances and, when these instances are available, sends the requests to the instances' load balancer.
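
The rescaler and client logic above can be sketched as follows. `ActivityEma` tracks per-minute activity (the summed busy time of all executions divided by one minute, so values above 100% reflect concurrency), `target_containers` turns it into a container count by rounding up, and `route_invocation` implements the random-pick-with-Lambda-fallback load balancing. The smoothing factor `alpha=0.2`, the `min_activity` cutoff (borrowed from the 12.5% actor threshold in Section 3.3), and the `try_instance`/`call_lambda` transports are assumptions for illustration.

```python
import math
import random
from typing import Callable, Optional, Sequence


class ActivityEma:
    """EMA of per-minute activity: summed busy time of all executions / 60 s.

    Values above 1.0 mean concurrency (2.5 = 2.5 executions on average).
    """

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha  # smoothing factor; 0.2 is an assumed value
        self.value: Optional[float] = None

    def update(self, busy_ms_in_minute: float) -> float:
        sample = busy_ms_in_minute / 60_000.0
        if self.value is None:
            self.value = sample  # the first minute seeds the average
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value


def target_containers(activity: float, min_activity: float = 0.125) -> int:
    """Containers for the stable trend; below the (hypothetical) cutoff, Lambda only."""
    if activity < min_activity:
        return 0
    return math.ceil(activity)


def route_invocation(
    instances: Sequence[str],
    try_instance: Callable[[str], Optional[bytes]],
    call_lambda: Callable[[], bytes],
) -> bytes:
    """Client-side load balancing: a random instance first, Lambda as fallback.

    try_instance is a hypothetical transport returning None when the chosen
    instance's worker is busy.
    """
    if instances:
        response = try_instance(random.choice(instances))
        if response is not None:
            return response
    return call_lambda()
```

With these definitions, an activity of 0.5 yields one container and 2.5 yields three, matching the examples above.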


### 3.3 Actor Messaging
A virtual actor works as follows:
1. The user writes a message handler `handle(request)->response`.
2. The user deploys the actor with a unique name and specifies the compute resources required to execute it.
3. The user calls `send(actor_name, actor_id, request)->response` to send a request to the actor instance with the given unique id, have it handle the message with dedicated resources, and receive a response.
4. The actor should have the characteristics described in the motivation (pausing when inactive, instant restarts, at most one instance, low latency messaging).

We can provide this functionality in two ways:
1. FaaS+BaaS with Lambda and S3. This provides:
    * Instant scale-ups.
    * Fine-grained billing.
    * Bad messaging latencies due to the ~200ms required to read and write the request and the response using S3.
    * Bad shared storage latencies. We already saw that writing WAL entries to EFS is ~10x slower from Lambda.
2. Dedicated containers with direct messaging. This provides:
    * Low latency messaging (direct HTTP requests) and shared storage access (5ms writes to EFS).
    * Billing that is coarse-grained, but lower under high usage.
    * Large startup times of 30-60s, which make pausing and restarting them very slow.

Given the tradeoffs above, our goal is as follows. Lambda and S3 should be used at low usage, or when restarting a paused actor. Then, if we detect medium to high usage, we activate dedicated containers. During the 30-60 seconds required to start a container, we keep using the FaaS+BaaS approach. Once the container starts, it uses file locking as well as the WAL (described in the next section) to prevent the Lambda from processing further messages. Following the architecture diagram above, our instance-optimized implementation of messaging works as follows:

1. Serverless endpoints: We use Lambda for compute and S3 for indirect messaging. To mitigate duplicate execution, we use three mechanisms. Note that only the last mechanism is perfect, and it only applies to actors that access the WAL.
    * First, we limit the concurrency of the Lambda to one.
    * Second, before processing a batch of messages, the Lambda acquires a file lock. This is not perfect, as a network partition may lead to a loss of the lock before it is manually released.
    * Finally, the WAL prevents any commit if another instance with a higher incarnation number is running. Every new instance gets a monotonically increasing incarnation number.
2. Serverful instances: An actor is represented by a single Fargate instance that works as an HTTP server. This server uses the same deduplication mechanisms. Like the Lambda, it also polls S3 for indirect messages to minimize message loss, especially when the instance has just started and some messages are still being written to S3.
3. Rescaler: As with function invocation, the rescaler monitors the activity of the actor using an EMA. Since a Lambda becomes more expensive than a serverful instance if it is active more than 1/8th of the time, we use an activity of 12.5% as the threshold to switch between implementations.
4. The client provides the `send` function call. It monitors DynamoDB to see if the rescaler has activated an HTTP server. If an HTTP server is running, the client will use direct messaging. Otherwise it writes the request to S3, polls S3 for the response, and wakes up the Lambda using an asynchronous Lambda invocation.
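
The incarnation-based fencing can be expressed as two SQLite transactions, assuming one counter row stored next to the log in the same database file. Each new Lambda or container claims the next incarnation when it starts; every commit re-reads the counter under SQLite's write lock (`BEGIN IMMEDIATE`) and aborts if a newer incarnation has been claimed. Table and function names are hypothetical, and the guarantee holds only if the underlying filesystem honors SQLite's locking.

```python
import sqlite3
from typing import Iterable


class FencedError(Exception):
    """Raised when a newer incarnation of the actor has taken over the WAL."""


def open_wal(path: str) -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions; we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS incarnation "
        "(id INTEGER PRIMARY KEY CHECK (id = 0), value INTEGER NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO incarnation (id, value) VALUES (0, 0)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wal (lsn INTEGER PRIMARY KEY AUTOINCREMENT, "
        "incarnation INTEGER NOT NULL, entry BLOB NOT NULL)"
    )
    return conn


def claim_incarnation(conn: sqlite3.Connection) -> int:
    """Called once when a Lambda or container starts: bump and return the counter."""
    conn.execute("BEGIN IMMEDIATE")  # take SQLite's write lock before reading
    try:
        conn.execute("UPDATE incarnation SET value = value + 1 WHERE id = 0")
        (value,) = conn.execute("SELECT value FROM incarnation WHERE id = 0").fetchone()
        conn.execute("COMMIT")
        return value
    except BaseException:
        conn.execute("ROLLBACK")
        raise


def commit_entries(conn: sqlite3.Connection, my_incarnation: int, entries: Iterable[bytes]) -> None:
    """Append entries only if no newer incarnation has been claimed."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        (current,) = conn.execute("SELECT value FROM incarnation WHERE id = 0").fetchone()
        if current != my_incarnation:
            raise FencedError(f"incarnation {my_incarnation} superseded by {current}")
        conn.executemany(
            "INSERT INTO wal (incarnation, entry) VALUES (?, ?)",
            [(my_incarnation, entry) for entry in entries],
        )
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
```

Fencing is checked at commit time: a stale Lambda can still run a handler, but any WAL write it attempts after a container has claimed a newer incarnation is rejected, which is why this mechanism only protects actors that access the WAL.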

### 3.4 Write Ahead Logging
A WAL is crucial to database implementations. It aims to provide low latency, durable, sequential recording of database operations. There are two ways to implement a WAL:
1. BaaS: use EFS to store and update the WAL.
    * Gain the durability and availability of EFS.
    * Slow when used from a Lambda (~60ms), moderately fast when used from a serverful instance (~5ms).
    * Fine-grained billing. With EFS, you only pay for storage.
    * Trivially correct.
2. Quorum-based replication to instances in multiple AZs.
    * High availability and durability (AZ+1 failure handling).
    * Very fast (≤1ms), usually submillisecond.
    * Coarse-grained billing. Must keep 6 running instances in 3 AZs. Thus, this option is never cheaper than the BaaS version.
    * More complex to implement correctly.

Unlike our previous examples, the highest-performance WAL implementation is never the cheapest one. As such, we only enable it if the actor that owns the WAL spends more than 10% of its time waiting for WAL commits.

Here is how our instance-optimized WAL works:

1. Serverless endpoints: On EFS, the WAL is just a SQLite file. Thanks to its transactional guarantees, we use SQLite both to store log entries and to implement the correct deduplication logic mentioned in the previous section.
2. Serverful instances: The serverful instances are 6 instances running in three availability zones (two in each). Once a safe quorum of them has received a log entry, the entry is considered durable even if it is not yet in the WAL file. Since we assume the safe quorum will never fail simultaneously, the entries are guaranteed to eventually make it to shared storage.
3. Rescaler: As explained above, the rescaler monitors the fraction of time spent waiting for the WAL. Once it exceeds 10%, the rescaler scales up the 6 instances.
4. Client: The bulk of the logic is inside the client. When the replicated instances are inactive, the client synchronously writes the WAL entries to the SQLite file on EFS. When the replicated instances are active, the client asynchronously writes the entries to EFS and asynchronously sends them to the replicas (sending to all 6 and waiting for 4 acknowledgments). Whichever of the two operations completes first establishes the durability of the entries. Even if the client crashes, some instance in the quorum will eventually complete the write. This makes log flushes very fast, but makes recovery fairly complex, since committed entries may now be in two locations and need to be consolidated into one. Recovery should occur infrequently, so we do not consider this a problem.
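
The client's flush path with replication enabled races two writes. The sketch below uses `asyncio`, assuming the EFS write and each replica send are exposed as coroutine factories (hypothetical; the real transports are not specified here). It resolves as soon as either the EFS write succeeds or 4 of the 6 replicas acknowledge, and reports an error only when both paths have become impossible.

```python
import asyncio
from typing import Awaitable, Callable, List, Set


class NotDurableError(Exception):
    """Neither EFS nor a replica quorum acknowledged the entry."""


# Strong references so in-flight writes are not garbage-collected mid-execution.
_background: Set["asyncio.Future[None]"] = set()


async def durable_write(
    write_efs: Callable[[], Awaitable[None]],
    replica_sends: List[Callable[[], Awaitable[None]]],
    quorum: int = 4,
) -> str:
    """Start the EFS write and every replica send concurrently; return "efs" or
    "quorum" for whichever path makes the entry durable first. The slower path
    keeps running in the background."""
    done: asyncio.Future = asyncio.get_running_loop().create_future()
    acks = 0
    replica_failures = 0
    efs_failed = False

    def give_up_if_impossible() -> None:
        quorum_impossible = len(replica_sends) - replica_failures < quorum
        if efs_failed and quorum_impossible and not done.done():
            done.set_exception(NotDurableError("entry could not be made durable"))

    def on_efs(task: asyncio.Future) -> None:
        nonlocal efs_failed
        _background.discard(task)
        if task.cancelled():
            return
        if task.exception() is None:
            if not done.done():
                done.set_result("efs")
        else:
            efs_failed = True
            give_up_if_impossible()

    def on_replica(task: asyncio.Future) -> None:
        nonlocal acks, replica_failures
        _background.discard(task)
        if task.cancelled():
            return
        if task.exception() is None:
            acks += 1
            if acks >= quorum and not done.done():
                done.set_result("quorum")
        else:
            replica_failures += 1
            give_up_if_impossible()

    for start, callback in [(write_efs, on_efs)] + [(s, on_replica) for s in replica_sends]:
        task = asyncio.ensure_future(start())
        _background.add(task)
        task.add_done_callback(callback)
    return await done
```

Returning early while the slower path keeps running is exactly what complicates recovery: an acknowledged entry may exist only on the replicas until the EFS write catches up.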


### 3.5 Putting it together: Serverless Database Actors
A serverless database actor is intended to be the building block of serverless databases. It combines the actor and WAL implementations above to provide the following features:
* Near instant scale-up, pausing when inactive and fast restart in the event of a crash.
* In steady state, messaging is as fast as direct messaging through HTTP.
* In steady state, WAL is as fast as dedicated replication-based persistence.
* High durability WAL.
* Only one instance of an SDA (the one with the highest incarnation number) can commit to the log. Even messages that don't access the log are likely to be processed exactly once, thanks to file locking.
* Fine-grained billing under low usage. Cheap billing in steady state.
* Moderately fast access to shared storage.

The SDA exemplifies our goal: without creating an allocation/usage problem, we can create high performance serverless systems. 

### 3.6 Future Capabilities
* Dynamic object caches on top of S3.
* Message queues better than SQS.
* Any more complex systems (e.g., databases, knowledge graphs) should build upon the SDA, which we plan to further improve.

## 4. SBTree: The Serverless BTree
We use the SBTree to showcase a complex system that can be built with our technique.
### Architecture
### Autoscaling
### Caveats

## 5. Evaluation
### The Serverless Key-Value Store Benchmark
We develop a benchmark (the SKVSB) that measures the cost and performance of serverless key-value stores under different usage patterns, loads, data volumes, and read/update ratios.
1. Two usage patterns:
    1. Infrequent: Used for a few hours in the day, inactive otherwise.
    2. Frequent: Used constantly.
2. Three loads:
    1. Low: ~1 req/second.
    2. High: Constantly send requests across 10+ different clients.
    3. Cyclical: Over a period of 10 minutes, move from low to high load, then back.
3. Three volumes:
    1. Small: 1GB, fully cacheable in a single instance.
    2. Medium: 16GB, requires a few instances.
    3. Large: 64GB+, requires many instances.
4. Three read/update ratios:
    1. Read-only.
    2. 90/10 read-mostly.
    3. 50/50 balanced.
5. Setup:
    * Systems: SBTree (ours), DynamoDB Serverless, MongoDB Serverless.
    * Mechanism: Launch functions that each perform 10 operations.
    * Measurement: Measure both per-operation latency and end-to-end latency. Also measure (approximate) cost.
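
Enumerating the benchmark matrix makes its size explicit before any run is scheduled. The sketch below takes the dimensions listed above (two usage patterns, three loads, three volumes, three read/update ratios) for each of the three systems; the string labels and the encoding of ratios as read fractions are our own shorthand.

```python
from itertools import product

# SKVSB dimensions as listed above; labels are shorthand.
SYSTEMS = ["SBTree", "DynamoDB Serverless", "MongoDB Serverless"]
USAGE_PATTERNS = ["infrequent", "frequent"]
LOADS = ["low", "high", "cyclical"]
VOLUMES_GB = [1, 16, 64]
READ_RATIOS = [1.0, 0.9, 0.5]  # fraction of reads: read-only, 90/10, 50/50

configs = [
    {"system": s, "usage": u, "load": load, "volume_gb": v, "read_ratio": r}
    for s, u, load, v, r in product(SYSTEMS, USAGE_PATTERNS, LOADS, VOLUMES_GB, READ_RATIOS)
]
print(len(configs))  # 3 * 2 * 3 * 3 * 3 = 162
```

That is 162 configurations, each of which runs the 10-operation functions described under Setup.
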
### Sanity Check
This section is just to test that the implementations are working.

#### Function Invocation
Consider the following:
* There is a 2GB *echo* function. The function returns its input after sleeping for 100ms to simulate productive work.
* We invoke it at a sustained rate to keep it always active (~9-10 invocations/second).

Here are the results:
![Latency](figures/sanity/function_latency.png) ![Cost](figures/sanity/function_cost.png)

We observe the following:
* Initially, Lambda is used to execute functions. This allows functions to instantly scale up and respond, but comes with higher latencies and costs.
* After a while, the system detects that cost savings are possible, so it turns on function containers.
* As the computations are moved to the accelerators, we observe better latencies (near ideal), and costs (3-4x lower).
* Function acceleration is working, at least in simple scenarios.


#### Actor Messaging
Consider the following setup.
* There is a 2GB *echo* actor. The actor returns its input after sleeping for 100ms.
* We invoke it at a sustained high rate.

Here are the results:
![Latency](figures/sanity/message_latency.png) ![Cost](figures/sanity/message_cost.png)

We observe the following:
* Initially, Lambda+S3 are used to run the actor and perform messaging. This allows actors to instantly scale up and respond, but comes with much higher latencies and costs.
* After a while, the system detects that it can reduce costs by switching to addressable containers with direct messaging.
* As the actor is moved, we observe near-ideal round-trip latencies and costs.

#### Actor WAL
Consider the following setup:
* A 2GB actor is committing 1-byte WAL entries at a sustained high rate.
* We measure commit latency (ignore messaging latency).

Here are the results:
![Latency](figures/sanity/wal_latency.png) ![Cost](figures/sanity/wal_cost.png)

We observe the following:
* Initially, Lambda+EFS are used. This allows instant scale-up, but costs more and has commit latencies >50ms.
* Then Containers+EFS are used. They seamlessly replace Lambda and cost less, but still have commit latencies >5ms.
* Finally, replication is used, which speeds up execution and allows commit latencies <1ms, at the added cost of the replicas. NOTE: the overall cost with replication should be around twice as high (the same price as Lambda). This figure has an error.

### Microbenchmarks
#### Function Invocation
#### Messaging
#### WAL
### SBTree Benchmarks
#### YCSB
#### Autoscaling
#### Recovery


## 6. Related Work
### Serverless Computing
### Cloud Databases

## 7. Conclusion

```python
import pandas as pd
import numpy as np
import altair as alt
from altair import datum
import warnings
warnings.filterwarnings('ignore')
```

```python
# Function invocation sanity check: compare measured latency/cost with an ideal
# of 100 ms (the echo function's sleep) at the lowest observed cost.
results_file = "obelisk/microbench/results/microbench/invoke.csv"
columns = ["since", "duration", "cost_per_ms"]
results = pd.read_csv(results_file, names=columns)
results["Type"] = "Actual"

ideal = results.copy()
ideal["duration"] = 100
ideal["Type"] = "Ideal"
ideal["cost_per_ms"] = results["cost_per_ms"].min()
results = pd.concat([results, ideal], ignore_index=True)

duration_chart = alt.Chart(results).mark_point().encode(
    x=alt.X('since', title='Time since start (s)'),
    y=alt.Y('duration', title='End-to-end latency (ms)'),
    color='Type',
).properties(
    title='Function Invocation Performance'
)

cost_chart = alt.Chart(results).mark_point().encode(
    x=alt.X('since', title='Time since start (s)'),
    y=alt.Y('cost_per_ms', title='Cost per ms of activity ($)'),
    color='Type',
).properties(
    title='Function Invocation Cost'
)

duration_chart
```

```python
cost_chart
```

```python
# Actor messaging sanity check: compare measured round-trip latency/cost with an
# ideal of 100 ms (the echo actor's sleep) at the lowest observed cost.
results_file = "obelisk/microbench/results/microbench/message.csv"
columns = ["since", "duration", "cost_per_ms"]
results = pd.read_csv(results_file, names=columns)
results["Type"] = "Actual"

ideal = results.copy()
ideal["duration"] = 100
ideal["Type"] = "Ideal"
ideal["cost_per_ms"] = results["cost_per_ms"].min()
results = pd.concat([results, ideal], ignore_index=True)

duration_chart = alt.Chart(results).mark_point().encode(
    x=alt.X('since', title='Time since start (s)'),
    y=alt.Y('duration', title='End-to-end latency (ms)'),
    color='Type',
).properties(
    title='Actor Messaging Performance'
)

cost_chart = alt.Chart(results).mark_point().encode(
    x=alt.X('since', title='Time since start (s)'),
    y=alt.Y('cost_per_ms', title='Cost per ms of activity ($)'),
    color='Type',
).properties(
    title='Actor Messaging Cost'
)

duration_chart
```

```python
cost_chart
```

```python
# WAL sanity check: commit latency and cost over time.
results_file = "obelisk/microbench/results/microbench/persist.csv"
columns = ["since", "duration", "cost_per_ms"]
results = pd.read_csv(results_file, names=columns)
results["Type"] = "Actual"

# Ideal series (1 ms commits at the lowest observed cost), currently disabled.
# ideal = results.copy()
# ideal["duration"] = 1
# ideal["Type"] = "Ideal"
# ideal["cost_per_ms"] = results["cost_per_ms"].min()
# results = pd.concat([results, ideal], ignore_index=True)

duration_chart = alt.Chart(results).mark_point().encode(
    x=alt.X('since', title='Time since start (s)'),
    y=alt.Y('duration', title='Commit latency (ms)'),
    color='Type',
).properties(
    title='WAL Commit Performance'
)

cost_chart = alt.Chart(results).mark_point().encode(
    x=alt.X('since', title='Time since start (s)'),
    y=alt.Y('cost_per_ms', title='Cost per ms of activity ($)'),
    color='Type',
).properties(
    title='WAL Commit Cost'
)

duration_chart
```

```python
cost_chart
```

```python
# Motivation data
durations = pd.DataFrame({
    'option': ['Option 1', 'Option 1', 'Option 1', 'Option 2', 'Option 2', 'Option 2', 'Option 3', 'Option 3', 'Option 3'],
    'component': ['Work', 'Messaging', 'WAL', 'Work', 'Messaging', 'WAL', 'Work', 'Messaging', 'WAL'],
    'running_time': [5, 200, 60, 5, 2, 5, 5, 2, 1],
})

alt.Chart(durations).mark_bar().encode(
    x=alt.X('option:N', title='Implementation Option'),
    y=alt.Y('sum(running_time):Q', title='End-to-end (ms)'),
    color='component'
).properties(
    title='End-to-end Latencies'
)
```

```python
# Motivation data
starts = pd.DataFrame({
    'option': ['Option 1', 'Option 2', 'Option 3'],
    'starting_time': [0.5, 30, 30],
})

alt.Chart(starts).mark_bar().encode(
    x=alt.X('option:N', title='Implementation Option'),
    y=alt.Y('starting_time', title='Startup Time After a Pause (s)'),
).properties(
    title='Starting Times'
)
```

```python
# Motivation: hourly compute cost vs. percent of time the actor is active.
options, costs, activities = [], [], []
for i in range(0, 101):
    activity = i / 100.0
    # Option 1 (Lambda): 8 GB billed per GB-second, only while active.
    activities.append(i)
    options.append("Option 1")
    costs.append(activity * 0.0000166667 * 8 * 3600)
    # Option 2 (ECS without replication): fixed hourly cost; scaled to zero at 0% activity.
    activities.append(i)
    options.append("Option 2")
    costs.append(0 if i == 0 else 4*0.012144 + 8*0.0013335)
    # Option 3 (ECS with replication): the container plus six replicas.
    activities.append(i)
    options.append("Option 3")
    costs.append(0 if i == 0 else 4*0.012144 + 8*0.0013335 + 6*(0.012144 + 2*0.0013335))

cost_df = pd.DataFrame({
    'option': options,
    'cost': costs,
    'activity': activities,
})

alt.Chart(cost_df).mark_line().encode(
    x=alt.X('activity:Q', title='Percent Active Time'),
    y=alt.Y('cost:Q', title='Hourly Compute Cost ($)'),
    color='option',
).properties(
    title='Compute Costs'
)
```