quickstart-datalake-cognizant-talend

Data Lake on the AWS Cloud with Talend Big Data Platform, AWS Services, and Cognizant Best Practices

This Quick Start builds a data lake environment on the Amazon Web Services (AWS) Cloud by deploying Talend Big Data Platform components and AWS services such as Amazon EMR, Amazon Redshift, Amazon Simple Storage Service (Amazon S3), and Amazon Relational Database Service (Amazon RDS).

The Quick Start also provides an optional sample dataset and Talend jobs developed by Cognizant Technology Solutions to illustrate big data practices for integrating Apache Spark, Apache Hadoop, Amazon EMR, Amazon Redshift, and Amazon S3 technologies into the data lake implementation.

This Quick Start is for users who are evaluating big data in the cloud or who want to accelerate a big data initiative by adopting best practices for big data integration.

The Quick Start offers two deployment options:

  • Deploying the data lake environment into a new virtual private cloud (VPC) that's configured for security, scalability, and high availability
  • Deploying the data lake environment into an existing VPC in your AWS account

You can also use the AWS CloudFormation templates as a starting point for your own implementation.
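
As a rough illustration of how the two deployment options could be scripted, the following Python sketch builds the arguments for a CloudFormation `create_stack` call with `boto3`. It is not taken from this repository: the template URLs, the `VPCID` parameter name, and the `CAPABILITY_IAM` setting are all hypothetical placeholders, so check the deployment guide and the templates themselves for the real values.

```python
from typing import Dict, Optional

# Hypothetical placeholders: the real template locations are given in the
# deployment guide, not here.
NEW_VPC_TEMPLATE_URL = "https://example.com/templates/datalake-new-vpc.template"
EXISTING_VPC_TEMPLATE_URL = "https://example.com/templates/datalake-existing-vpc.template"


def build_stack_request(
    stack_name: str,
    params: Dict[str, str],
    existing_vpc_id: Optional[str] = None,
) -> dict:
    """Build keyword arguments for CloudFormation's create_stack call.

    With no VPC ID, the "new VPC" template is selected; otherwise the
    "existing VPC" template is selected and the VPC ID is passed along.
    """
    merged = dict(params)
    if existing_vpc_id is None:
        template_url = NEW_VPC_TEMPLATE_URL
    else:
        template_url = EXISTING_VPC_TEMPLATE_URL
        merged["VPCID"] = existing_vpc_id  # hypothetical parameter name

    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(merged.items())
        ],
        # Assumption: the templates create IAM resources and need this capability.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def launch(request: dict) -> str:
    """Submit the request to CloudFormation and return the new stack ID."""
    import boto3  # imported lazily so the builder works without AWS access

    client = boto3.client("cloudformation")
    return client.create_stack(**request)["StackId"]
```

Keeping the request builder separate from `launch` lets you inspect or adjust the parameters before anything is created in your AWS account.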

Figure: Quick Start architecture for data lake on AWS

For architectural details, step-by-step instructions, and customization options, see the deployment guide.

To post feedback, submit feature ideas, or report bugs, use the Issues section of this GitHub repo. If you'd like to submit code for this Quick Start, please review the AWS Quick Start Contributor's Kit.

About

AWS Quick Start Team
