GitHub - awslabs/project-lakechain

Project Lakechain

Cloud-native, AI-powered, document processing pipelines on AWS.

🔖 Features

🤖 Composable - A composable API to express document processing pipelines using middlewares.
☁️ Scalable - Scales out of the box. Process millions of documents and scale to zero automatically when done.
⚡ Cost Efficient - Uses cost-optimized architectures to reduce costs and enable a pay-as-you-go model.
🚀 Ready to use - 60+ built-in middlewares for common document processing tasks, ready to be deployed.
🦎 GPU and CPU Support - Use the right compute type to balance performance and cost.
📦 Bring Your Own - Create your own transform middlewares to process documents and extend Lakechain.
📙 Ready Made Examples - Jump-start your journey with 50+ examples we've built for you.
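The composable, middleware-based model described above can be illustrated with a small, self-contained TypeScript sketch. It does not use Lakechain's actual API: the `Doc` type, the `Middleware` signature, the `pipe` helper and both middlewares are hypothetical, chosen only to show how independent steps (including a custom one, as in "Bring Your Own") chain into one pipeline. In Lakechain itself, middlewares are CDK constructs deployed on AWS rather than in-process functions.

```typescript
// Hypothetical document shape used only for this sketch (not Lakechain's own type).
interface Doc {
  readonly url: string;
  readonly mimeType: string;
  readonly text?: string;
}

// A middleware takes one document and emits zero or more documents.
type Middleware = (doc: Doc) => Doc[];

// Chains middlewares left to right: every document emitted by one stage
// is passed to the next stage.
function pipe(...stages: Middleware[]): Middleware {
  return (doc: Doc) =>
    stages.reduce<Doc[]>(
      (docs, stage) => docs.reduce<Doc[]>((acc, d) => acc.concat(stage(d)), []),
      [doc],
    );
}

// Hypothetical filter step: drops anything that is not plain text.
const onlyText: Middleware = (doc) =>
  doc.mimeType === "text/plain" ? [doc] : [];

// Hypothetical "bring your own" transform: upper-cases the text of a document.
const upperCase: Middleware = (doc) =>
  doc.text !== undefined ? [{ ...doc, text: doc.text.toUpperCase() }] : [doc];

// Compose the two stages into a single pipeline.
const pipeline = pipe(onlyText, upperCase);

// Usage: a text document is transformed, an audio document is filtered out.
const textResult = pipeline({ url: "s3://bucket/a.txt", mimeType: "text/plain", text: "hello" });
const audioResult = pipeline({ url: "s3://bucket/b.mp3", mimeType: "audio/mpeg" });
console.log(textResult, audioResult);
```

Because every stage shares the same signature, a custom transform slots in wherever a built-in one would, and the order of the `pipe` arguments is the order of the pipeline.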

🚀 Getting Started

👉 Head to our documentation, which contains everything you need to understand the project and quickly start building!

What's Lakechain ❓

Project Lakechain is an experimental framework based on the AWS Cloud Development Kit (CDK) that makes it easy to express and deploy scalable document processing pipelines on AWS using infrastructure-as-code. It emphasizes modularity and provides 40+ ready-to-use components for prototyping complex document pipelines that scale out of the box to millions of documents.

This project has been designed to help AWS customers build and scale different types of document processing pipelines, spanning a wide array of use cases including metadata extraction, document conversion, NLP analysis, text summarization, translation, audio transcription, computer vision, Retrieval Augmented Generation (RAG) pipelines, and much more!

Show me the code ❗

👇 Below is an example of a pipeline that deploys the AWS infrastructure to automatically transcribe audio files uploaded to S3, in just a few lines of code. The pipeline scales to millions of documents.
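The data flow of such a pipeline (an upload to S3 triggers transcription, and the transcript is stored) can be modeled in plain TypeScript. This is a hypothetical, in-memory sketch rather than Lakechain's API, and it deploys nothing: `ObjectCreatedEvent`, `PipelineDocument`, `fromUpload`, `transcribe`, `store` and `run` are illustrative names, the transcription step is a stub, and a plain object stands in for the destination bucket.

```typescript
// Hypothetical S3 "object created" notification, reduced to the fields used here.
interface ObjectCreatedEvent {
  readonly bucket: string;
  readonly key: string;
  readonly contentType: string;
}

// Hypothetical document passed between pipeline steps.
interface PipelineDocument {
  readonly uri: string;
  readonly mimeType: string;
  readonly body: string;
}

// A step receives one document and returns the documents it emits (possibly none).
type Step = (doc: PipelineDocument) => PipelineDocument[];

// Trigger: turns an upload notification into the first document of the pipeline.
function fromUpload(event: ObjectCreatedEvent): PipelineDocument {
  return { uri: `s3://${event.bucket}/${event.key}`, mimeType: event.contentType, body: "" };
}

// Placeholder for a speech-to-text step: a real deployment would call a transcription
// service here. This stub ignores non-audio inputs and emits an empty WebVTT file.
const transcribe: Step = (doc) =>
  doc.mimeType.indexOf("audio/") === 0
    ? [{ uri: `${doc.uri}.vtt`, mimeType: "text/vtt", body: "WEBVTT\n" }]
    : [];

// Storage: records every document it receives, standing in for a destination bucket.
const destination: { [uri: string]: PipelineDocument } = {};
const store: Step = (doc) => {
  destination[doc.uri] = doc;
  return [doc];
};

// Runs a document through the steps in order; each step sees every output of the previous one.
function run(doc: PipelineDocument, steps: Step[]): PipelineDocument[] {
  let current: PipelineDocument[] = [doc];
  for (const step of steps) {
    let next: PipelineDocument[] = [];
    for (const d of current) {
      next = next.concat(step(d));
    }
    current = next;
  }
  return current;
}

// Usage: one audio upload and one image upload go through the same pipeline.
run(fromUpload({ bucket: "input", key: "talk.mp3", contentType: "audio/mpeg" }), [transcribe, store]);
run(fromUpload({ bucket: "input", key: "photo.png", contentType: "image/png" }), [transcribe, store]);
console.log(Object.keys(destination));
```

In this sketch only the `.vtt` transcript of the audio upload reaches storage, because the transcription step emits nothing for the image, so the storage step never sees it.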

LICENSE

See LICENSE.
