
aws-samples/aws-saas-factory-multi-tenant-data-pipeline

Building a SaaS Multi-Tenant Data Ingestion, Storage and Querying Engine Using AWS Services

The project will provision an API Gateway, custom authorizer, Kinesis Data Stream, Kinesis Data Analytics, Kinesis Firehose, and S3 bucket for data lake.

Steps to deploy the solution:

Note: These steps have been tested in an AWS Cloud9 environment. You can also run the deployment from any other environment, as long as you have all the dependencies installed.

Flink Application: This deployment includes an Apache Flink application written in Java. Before we can deploy the stack, we need to compile the Flink application and package it as a jar file.

Step 1 - Prerequisites

  1. Please install Maven to build the jar file. You can download Maven from https://maven.apache.org/download.cgi. Only the binaries are required, so look for the link to apache-maven-{version}-bin.zip or apache-maven-{version}-bin.tar.gz.

  2. Once you have downloaded the archive, extract it and add the bin folder to your PATH. A sketch of these commands is shown below.
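For example, on Linux or macOS (the version number and download URL below are only illustrations; substitute the release and link from the Maven download page):

    # Download and extract the Maven binaries (example version)
    wget https://dlcdn.apache.org/maven/maven-3/3.9.6/binaries/apache-maven-3.9.6-bin.tar.gz
    tar -xzf apache-maven-3.9.6-bin.tar.gz

    # Add the bin folder to your PATH for the current shell session
    export PATH="$PWD/apache-maven-3.9.6/bin:$PATH"

    # Verify the installation
    mvn -version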

Step 2 - Build the project

  1. Change directory into the project's path:

    cd <project_path>

    Run the command: mvn package

    This starts the build process. If everything completes successfully, you will see a "BUILD SUCCESS" message at the end of the build output.

  2. This should create the file <project_path>/target/aws-kinesis-analytics-java-apps-1.0.jar. A quick way to confirm is sketched below.
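A quick sanity check that the build produced the jar (the path comes from the step above):

    ls -lh <project_path>/target/aws-kinesis-analytics-java-apps-1.0.jar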

Step 3 - Install dependencies

  1. cd <project_path>/src/main/cdk/ingestion

    Run the command npm install to install the project dependencies.
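If this is the first time you are deploying a CDK stack in this account and region, you may also need to bootstrap the environment first (a standard CDK prerequisite, not specific to this project):

    # One-time setup per account/region
    cdk bootstrap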

Step 4 - Deploy the CDK stack

  1. Execute the following command to provision the stack:

    cdk deploy

  2. Please take note of the outputs from the cdk stack. We will need the output values in later steps.
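If you prefer to keep the outputs in a file rather than copying them from the terminal, the CDK CLI can write them out for you (standard CDK behavior, not specific to this project):

    # Writes stack outputs (UserPoolId, AppClientId, etc.) to outputs.json
    cdk deploy --outputs-file outputs.json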

Step 5 - Start the Kinesis Data Analytics application

  1. Now that we have all the resources provisioned, we need to start the Kinesis Data Analytics application.

    cd <project_path>/scripts

    Execute the commands:

    chmod +x ./start-kda.sh

    ./start-kda.sh -r <region, e.g. us-west-2> -a <application name; this is the value of the cdk output 'IngestionStack.KinesisAnalyticsApplicationName'>

Note: the script does not produce any output. You can verify that the application started by going to the Kinesis Data Analytics page in the console; the Status field should show "Running". You can also check from the CLI, as sketched below.
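A sketch of checking the application status with the AWS CLI (this assumes the application uses the Kinesis Analytics v2 API, which Flink applications do):

    aws kinesisanalyticsv2 describe-application \
        --application-name <application name> \
        --region <region> \
        --query 'ApplicationDetail.ApplicationStatus'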

Step 6 - Create tenant user

  1. Let's create a user.

    cd <project_path>/scripts

    Execute the commands:

    chmod +x ./create-user.sh

    ./create-user.sh -c <user pool id; this is the value of the cdk output 'IngestionStack.UserPoolId'> -u <email address> -p <password> -r <region, e.g. us-west-2> -t <tenant id; you can make up a tenant id or name>

Step 7 - Generate JWT (JSON Web Token)

  1. Now that we have a user, we need the JWT (JSON Web Token) for the user from Amazon Cognito.

    cd <project_path>/scripts

    Execute the commands:

    chmod +x ./get-jwt.sh

    ./get-jwt.sh -c <app client id; this is the value of the cdk output 'IngestionStack.AppClientId'> -u <email address> -p <password> -r <region, e.g. us-west-2>

    You will want the IdToken section of the JWT to submit requests.
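If you want to see roughly what the script does, a comparable AWS CLI call is sketched below (this assumes the Cognito app client has the USER_PASSWORD_AUTH flow enabled; the script itself may differ):

    aws cognito-idp initiate-auth \
        --auth-flow USER_PASSWORD_AUTH \
        --client-id <app client id> \
        --auth-parameters USERNAME=<email address>,PASSWORD=<password> \
        --region <region> \
        --query 'AuthenticationResult.IdToken' \
        --output text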

Step 8 - Test the solution

  1. You can send a test message with Postman.
    For the request endpoint, enter the endpoint of the API Gateway we just provisioned. You can get this from the cdk output 'IngestionStack.ApigatewayUrl'.

    Click on the "Authorization" tab, choose "Bearer Token" for the type, and copy the IdToken value from the JWT into the token field.
    Click on the Body tab and make sure there is a Data node in the JSON message, for example:

    {
      "Data": {
        "event": "user_clicked_product_search_button",
        "region": "US",
        "device": "TV"
      }
    }
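The same request can be sent from the command line (assuming the ingestion endpoint accepts POST; replace the placeholders with your API Gateway URL and IdToken):

    curl -X POST '<IngestionStack.ApigatewayUrl>' \
        -H 'Authorization: Bearer <IdToken>' \
        -H 'Content-Type: application/json' \
        -d '{"Data": {"event": "user_clicked_product_search_button", "region": "US", "device": "TV"}}'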

Step 9 - Run AWS Glue crawler

Now that we have sent a test message, we need to give the Glue crawler an opportunity to run.

The crawler is scheduled to run every 5 minutes, and it must run at least once to create the Glue database. You can check the status of the crawler on the AWS Glue page in the console, or from the CLI as sketched below.
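A sketch of polling the crawler state from the CLI (the crawler name is a placeholder here; use the name shown on the AWS Glue page in the console):

    aws glue get-crawler --name <crawler name> --region <region> --query 'Crawler.State'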

Once the crawler finishes running, you can click on Databases in the Glue page and you should see a database called multi-tenant-db.

Click on the database to see the name of the table created by the crawler; you will need the table name to run the Athena query. You can also list the tables from the CLI, as sketched below. As part of this project, we deployed a saved query for Athena to show how you can query the data based on the tenantId.
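Listing the tables from the CLI (the database name comes from the previous step):

    aws glue get-tables --database-name multi-tenant-db --region <region> --query 'TableList[].Name'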

Step 10 - Querying the data using Amazon Athena

You can find the saved query by going to the Amazon Athena page in the console and clicking on the "Saved queries" tab.

Pre-requisite: You will need to set up Athena with an output S3 bucket before running the query.

SELECT * FROM "AwsDataCatalog"."multi-tenant-db"."TABLENAME" where tenant='TENANTID'

Replace the ALL CAPS values in the saved query: TABLENAME with the name of the table created by the crawler, and TENANTID with the tenant id you used when you created your user.
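The same query can also be run from the CLI (the output location is the S3 bucket you configured in the pre-requisite above):

    aws athena start-query-execution \
        --query-string "SELECT * FROM \"AwsDataCatalog\".\"multi-tenant-db\".\"TABLENAME\" where tenant='TENANTID'" \
        --result-configuration OutputLocation=s3://<athena output bucket>/ \
        --region <region>

    # Fetch the results using the QueryExecutionId returned above
    aws athena get-query-results --query-execution-id <QueryExecutionId> --region <region>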

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
