This repository is a proof of concept demonstrating how a simple workflow can be built using Calculators on AWS Marketplace. The workflow contains the following steps:
- parsing molecular structures from an S3 bucket and sending them to an SQS queue
- calculating CNS-MPO scores with Calculators on AWS Marketplace
- storing the calculated results in a DynamoDB table
- filtering molecules based on their CNS-MPO scores (as a test query)
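For readers unfamiliar with the score: as far as we know, CNS-MPO follows the scheme published by Wager et al. (2010). Six physicochemical properties are each mapped to a desirability value between 0 and 1, and the six values are summed, so the score ranges from 0 to 6. The sketch below shows only that transform, using the published thresholds as we understand them. The property values themselves would come from Calculators, whose exact implementation may differ. The names CnsMpoInputs, decreasing, tpsaDesirability and cnsMpoScore are hypothetical.

```typescript
// Sketch of the CNS-MPO desirability scheme (Wager et al., 2010).
// Assumption: Calculators may compute the underlying properties differently;
// this only illustrates how six property values become one 0..6 score.
interface CnsMpoInputs {
  clogP: number; // calculated logP
  clogD: number; // calculated logD at pH 7.4
  mw: number;    // molecular weight (Da)
  tpsa: number;  // topological polar surface area (Å^2)
  hbd: number;   // number of hydrogen-bond donors
  pKa: number;   // pKa of the most basic centre
}

// Monotonically decreasing desirability: 1 at or below lo, 0 at or above hi,
// linear in between.
function decreasing(x: number, lo: number, hi: number): number {
  if (x <= lo) return 1;
  if (x >= hi) return 0;
  return (hi - x) / (hi - lo);
}

// Hump-shaped desirability for TPSA: 0 outside (20, 120), 1 on [40, 90],
// linear ramps between 20-40 and 90-120.
function tpsaDesirability(t: number): number {
  if (t <= 20 || t >= 120) return 0;
  if (t < 40) return (t - 20) / 20;
  if (t <= 90) return 1;
  return (120 - t) / 30;
}

// Sum of the six desirability values.
function cnsMpoScore(p: CnsMpoInputs): number {
  return (
    decreasing(p.clogP, 3, 5) +
    decreasing(p.clogD, 2, 4) +
    decreasing(p.mw, 360, 500) +
    tpsaDesirability(p.tpsa) +
    decreasing(p.hbd, 0.5, 3.5) +
    decreasing(p.pKa, 8, 10)
  );
}
```

Because every term is capped at 1, a threshold such as 5.75 selects only molecules that are close to the maximum score of 6.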
If you have any questions or suggestions, please feel free to contact us at calculators-support@chemaxon.com.
- Create an AWS account if you do not have one yet.
- Subscribe to the Calculators on AWS Marketplace service and store your API key (see the step-by-step guide).
- Install the AWS CLI. Running aws configure is the fastest way to set up your AWS CLI installation (see the AWS CLI quickstart guide).
- Install NodeJS.
The project uses only a few basic AWS resources (S3, SQS, Lambda and DynamoDB). To learn more about them, see the related AWS documentation pages.
All resources and their configurations can be deployed with the following commands:
npm install
npm run cdk deploy -- --parameters cxnApiKey=<API_KEY> --parameters bucketName=<BUCKET_NAME>
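Since the deployment creates a new S3 bucket from the bucketName parameter, an invalid name only surfaces as a failed deployment. The hypothetical helper below checks the core S3 general purpose bucket naming rules locally before you run the deploy command. It does not cover every rule (S3 also reserves some prefixes and suffixes), and it cannot check global uniqueness, which only S3 itself can confirm.

```typescript
// Hypothetical pre-deployment check for the bucketName parameter.
// Covers the core S3 naming rules only: length, allowed characters,
// first/last character, no adjacent dots, and no IPv4-address form.
function isValidBucketName(name: string): boolean {
  // Bucket names must be 3 to 63 characters long.
  if (name.length < 3 || name.length > 63) return false;
  // Lowercase letters, digits, dots and hyphens; start and end with a letter or digit.
  if (!/^[a-z0-9][a-z0-9.-]*[a-z0-9]$/.test(name)) return false;
  // Two adjacent periods are not allowed.
  if (name.includes("..")) return false;
  // Names formatted like an IPv4 address (e.g. 192.168.5.4) are not allowed.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(name)) return false;
  return true;
}
```

For example, isValidBucketName("cxn-molecules-demo") returns true, while "My_Bucket" is rejected for its uppercase letter and underscore.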
- Subscribing to the service is free of charge. The cost is based on the number of calculation units consumed. Please check the Pricing page for further details.
- The price of the CNS-MPO calculation is 7 units per structure, so the test run below on 100 structures (700 units) costs 7 USD.
- Using the AWS resources also incurs a small cost (see the AWS Pricing documentation).
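The cost figures above can be turned into a quick estimate before a larger run. This is a minimal sketch: the 0.01 USD per unit rate is inferred from the example (700 units for 7 USD) and is not an official price, so check the Pricing page. AWS resource costs are not included. The names estimateCost, UNITS_PER_STRUCTURE and USD_PER_UNIT are hypothetical.

```typescript
// Rough Calculators cost estimate for a run (AWS resource costs excluded).
const UNITS_PER_STRUCTURE = 7; // CNS-MPO price quoted in this README
const USD_PER_UNIT = 0.01;     // assumption: inferred from 700 units = 7 USD

function estimateCost(structures: number): { units: number; usd: number } {
  const units = structures * UNITS_PER_STRUCTURE;
  // Round to whole cents to hide floating-point noise.
  const usd = Math.round(units * USD_PER_UNIT * 100) / 100;
  return { units, usd };
}
```

For the 100-structure test run, estimateCost(100) gives 700 units and 7 USD, matching the figure above.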
We recommend deleting the created resources once the test run has finished:
npm run cdk destroy
!! To avoid unexpected costs, please do not upload more than 100 structures for testing. !!
The following commands download molecular structures in SMILES format and upload the first 100 structures to the created S3 bucket (on Linux, use zcat instead of gzcat):
wget https://mcule.s3.amazonaws.com/database/mcule_ultimate_express1_220828.smi.gz
gzcat mcule_ultimate_express1_220828.smi.gz | head -n 100 >molecules.smiles
aws s3 cp molecules.smiles s3://<BUCKET_NAME>/
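If you prepare the input file some other way, a small guard can enforce the 100-structure limit before the upload. This is a hypothetical helper, not part of the project; it assumes one structure per non-empty line, as in the .smi file.

```typescript
// Hypothetical pre-upload guard for the recommended 100-structure test limit.
const MAX_TEST_STRUCTURES = 100;

// Count non-empty lines, assuming one structure per line.
function countStructures(fileContent: string): number {
  return fileContent.split(/\r?\n/).filter((line) => line.trim() !== "").length;
}

// Return the structure count, or throw if it exceeds the limit.
function checkTestLimit(fileContent: string, limit: number = MAX_TEST_STRUCTURES): number {
  const n = countStructures(fileContent);
  if (n > limit) {
    throw new Error(`File contains ${n} structures; the test limit is ${limit}.`);
  }
  return n;
}
```

Pass it the contents of molecules.smiles and run aws s3 cp only if it does not throw.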
A Lambda function, triggered by the S3 upload event, parses the records of the uploaded file and sends them to the SQS queue.
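The parsing step can be pictured roughly as follows. This is a sketch under assumptions, not the project's actual Lambda code: each non-empty line is assumed to hold a SMILES string, optionally followed by whitespace and an identifier, and the records are grouped into batches of at most 10, the maximum number of entries SQS accepts in one SendMessageBatch call. The names parseSmiles, toSqsBatches and MoleculeRecord are hypothetical.

```typescript
// Assumed record shape: a SMILES string plus an optional molecule identifier.
interface MoleculeRecord {
  smiles: string;
  id?: string;
}

// Parse one structure per non-empty line: "SMILES [whitespace ID]".
function parseSmiles(fileContent: string): MoleculeRecord[] {
  const records: MoleculeRecord[] = [];
  for (const rawLine of fileContent.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines
    const parts = line.split(/\s+/, 2); // at most [smiles, id]
    records.push(parts.length > 1 ? { smiles: parts[0], id: parts[1] } : { smiles: parts[0] });
  }
  return records;
}

// SQS SendMessageBatch accepts at most 10 entries per call.
const SQS_MAX_BATCH = 10;

interface BatchEntry {
  Id: string;          // must be unique within a single batch
  MessageBody: string; // the record serialized as JSON
}

// Split the records into SendMessageBatch-sized groups of entries.
function toSqsBatches(records: MoleculeRecord[]): BatchEntry[][] {
  const batches: BatchEntry[][] = [];
  for (let i = 0; i < records.length; i += SQS_MAX_BATCH) {
    batches.push(
      records.slice(i, i + SQS_MAX_BATCH).map((r, j) => ({
        Id: String(i + j),
        MessageBody: JSON.stringify(r),
      })),
    );
  }
  return batches;
}
```

With the AWS SDK for JavaScript v3, each inner array could be passed as the Entries of a SendMessageBatchCommand together with the queue URL. Batching keeps the number of SQS API calls for a 100-structure file at 10 instead of 100.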
Another Lambda function automatically processes the records from the SQS queue: it calculates the CNS-MPO score of each molecule with Calculators and stores the result in the DynamoDB table.
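The item written to DynamoDB might look like the sketch below. Only the attribute names mol and cns_mpo_score are taken from this README (they appear in the scan example); the key schema and any other attributes of the CxnResults table are unknown here, and the call to Calculators itself is not shown. The names ScoredMolecule and toDynamoItem are hypothetical.

```typescript
// Low-level DynamoDB attribute values used here: strings (S) and numbers (N).
type AttributeValue = { S: string } | { N: string };

// Hypothetical result of one CNS-MPO calculation.
interface ScoredMolecule {
  mol: string;         // e.g. the SMILES string
  cnsMpoScore: number; // expected range 0..6
}

// Build a DynamoDB item using the attribute names from the scan example.
function toDynamoItem(m: ScoredMolecule): Record<string, AttributeValue> {
  if (!Number.isFinite(m.cnsMpoScore)) {
    throw new Error(`Invalid CNS-MPO score for ${m.mol}`);
  }
  return {
    mol: { S: m.mol },
    // The low-level API transmits numbers as strings.
    cns_mpo_score: { N: String(m.cnsMpoScore) },
  };
}
```

Storing the score as a number (N) rather than a string is what makes the numeric comparison in the scan filter below work.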
Finally, the molecules can be filtered by their CNS-MPO score, for example to list those scoring above 5.75:
aws dynamodb scan \
--table-name CxnResults \
--filter-expression "cns_mpo_score > :s" \
--expression-attribute-values '{ ":s": {"N": "5.75"} }' \
--projection-expression "mol,cns_mpo_score"
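Note that a scan reads the whole table and applies the filter expression afterwards, which is fine for a 100-item test table. The hypothetical helper below reproduces the same filter on the client side for items in DynamoDB's low-level JSON format, such as the Items array printed by aws dynamodb scan, and sorts the hits by score. It is a sketch for post-processing the output, not part of the project.

```typescript
// Items in DynamoDB's low-level JSON format (as printed by `aws dynamodb scan`).
type Item = Record<string, { S?: string; N?: string }>;

interface Hit {
  mol: string;
  score: number;
}

// Keep items whose cns_mpo_score is strictly greater than the threshold,
// mirroring the "cns_mpo_score > :s" filter expression.
function filterByScore(items: Item[], threshold: number): Hit[] {
  const hits: Hit[] = [];
  for (const item of items) {
    const mol = item.mol?.S;
    const n = item.cns_mpo_score?.N;
    if (mol === undefined || n === undefined) continue; // skip incomplete items
    const score = Number(n);
    if (score > threshold) hits.push({ mol, score });
  }
  // Best-scoring molecules first.
  return hits.sort((a, b) => b.score - a.score);
}
```

For example, after saving the scan output (without the filter) to a file, filterByScore(JSON.parse(text).Items, 5.75) returns the same molecules as the command above, ordered by score.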