Massively parallel on-demand serverless computing using AWS Lambda.
Python 3.9 is preferred, which can be easily installed via conda
:
You can install these and npm with the below:
conda create -n <envname> -c conda-forge nodejs python=3.9
conda activate <envname>
# Install the `lambdamap` python package
pip3 install -e .
# (optional) install ipywidgets to visualise the LambdaMap progress bars in Jupyter notebooks
pip3 install ipywidgets
# Install the AWS CDK Toolkit and CLI
npm i -g aws-cdk
Deploy the LambdaMap stack, you can configure the stack using the following
make
variables:
STACK_NAME
(default:STACK_NAME=LambdaMapStack
)- name of the lambdamap cloudformation stack
FUNCTION_NAME
(default:FUNCTION_NAME=LambdaMapFunction
)- name of the AWS Lambda function that will execute your python function
MEMORY
(default:MEMORY=256
)- memory, in MB, to assign to the Lambda function
TIMEOUT
(default:TIMEOUT=900
)- timeout, in seconds, to assign to the Lambda function
EXTRA_CMDS
(default:EXTRA_CMDS=''
)- e.g:
EXTRA_CMDS="'pip install pandas'"
- additional commands to execute in a single
RUN
instruction of theDockerfile
, such aspip install
for installing additional python packages in the Lambda container. - note that use of the double and single quotes (
"'...'"
)
- e.g:
CDK_TAGS
(default:CDK_TAGS='--tags Project=lambdamap'
)- e.g:
CDK_TAGS='--tags Project=lambdamap --tags Department=Dev'
- custom resource tags
- e.g:
# Deploy the LambdaMap stack
make bootstrap
make deploy EXTRA_CMDS="'pip install pandas'" CDK_TAGS='--tags Project=lambdamap'
import pandas as pd
from lambdamap import LambdaExecutor
# Define your custom function
def my_power(x, **kwargs):
import numpy as np
import pandas as pd
exponent = kwargs.get("exponent", 2)
df = pd.DataFrame()
df["formula"] = [f"{x}**{exponent}"]
df["result"] = np.power(x, exponent)
return df
# Instantiate the Lambda executor
executor = LambdaExecutor(
max_workers=100,
lambda_arn="LambdaMapFunction")
# Generate the function payloads
payloads = [{"args": (i,), "kwargs": {"exponent": 2}} for i in range(100)]
# Distribute the function calls over the lambdas
results = executor.map(my_power, payloads)
# Concatenate the list of results into a single dataframe
df_results = pd.concat(results)
df_results