After completing the labs of this workshop, you now have all the assets and configurations ready to complete this challenge.
The challenge is to create a search index from a collection of Stack Overflow posts. You are only interested in posts that match certain criteria, and posts that are not deemed meaningful should stay out of the index. To accomplish this flow you will use the following:
- Astra DB terminal to create the store of posts and add new posts
- Astra Streaming terminal to create the functions that will process new posts
- CDC for Astra to connect DB and Streaming together
- Elasticsearch and Kibana for the search index
The basic outline you should follow to complete the challenge:
Database schema to enable CDC
CREATE TABLE IF NOT EXISTS crud_data.posts (
  id int,
  postTypeId int,
  title text,
  score int,
  viewCount int,
  answerCount int,
  PRIMARY KEY ((id))
);
Update resources/conversion-function.yaml with your CDC topic name
./bin/pulsar-admin topics create persistent://<NAME>-camp-const/astracdc/conversion-output-topic
./bin/pulsar-admin topics create persistent://<NAME>-camp-const/astracdc/conversion-function-logs
# Did you remember to set the function topic name??
./bin/pulsar-admin functions create --function-config-file ../resources/conversion-function.yaml
./bin/pulsar-admin topics create persistent://<NAME>-camp-const/astracdc/decisions-output-topic
./bin/pulsar-admin topics create persistent://<NAME>-camp-const/astracdc/decisions-function-logs
./bin/pulsar-admin functions create --function-config-file ../resources/decisions-function.yaml
Use resources/elasticsearch-sink.yaml in the Astra Streaming UI to create a new sink.
Add a few posts
SOURCE '/workspace/advanced-cdc-for-astra/resources/insert-posts.sql'
- Log in to the Elastic instance: https://camp-constellation.kb.us-central1.gcp.cloud.es.io:9243/app/home#/
- In the Elasticsearch deployment, choose "Management" > "Stack Management" from the left menu.
- Then choose "Kibana" > "Data Views" to get a prompt that you already have data in Elasticsearch. Choose "Create data view".
- Find your index on the right, type that name into the text box on the left, and save.
- Navigate to the "Discover" option under Analytics in the left menu.
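If no documents show up in Discover, one place to look is whether the topics and functions from the outline above actually exist. The following is a minimal sketch, not part of the workshop resources. It assumes your `<NAME>` value is exported as `NAME`, that it is run from the Pulsar directory containing `bin/pulsar-admin`, and that the functions are named `conversion-function` and `decisions-function` (a guess based on the YAML file names; use whatever names your YAML files declare).

```shell
# Sketch: list the topics and check the functions created in the outline.
# Assumptions: NAME holds your <NAME> value; the function names below are
# guesses taken from the YAML file names and may differ in your files.
NAME="${NAME:-yourname}"
TENANT="${NAME}-camp-const"
NAMESPACE="astracdc"
PULSAR_ADMIN="${PULSAR_ADMIN:-./bin/pulsar-admin}"

CONVERSION_OUTPUT="persistent://${TENANT}/${NAMESPACE}/conversion-output-topic"
DECISIONS_OUTPUT="persistent://${TENANT}/${NAMESPACE}/decisions-output-topic"

if [ -x "$PULSAR_ADMIN" ]; then
  # Both output topics and both log topics should appear in this list.
  "$PULSAR_ADMIN" topics list "${TENANT}/${NAMESPACE}"
  # Both functions should appear here once they have been created.
  "$PULSAR_ADMIN" functions list --tenant "$TENANT" --namespace "$NAMESPACE"
  for fn in conversion-function decisions-function; do
    "$PULSAR_ADMIN" functions status \
      --tenant "$TENANT" --namespace "$NAMESPACE" --name "$fn"
  done
else
  # Dry run: show what would be checked when the CLI is not available.
  echo "pulsar-admin not found at $PULSAR_ADMIN"
  echo "Expected topics: $CONVERSION_OUTPUT $DECISIONS_OUTPUT"
fi
```

A function that is listed but shows no running instances in its status output usually points back to its YAML file, for example an input topic that was never filled in.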
Armed with your new CDC skills, go to your customers, ask for their architectures, hear their pains, and heal all their hurts. You have reached ninja status.
- Each function and sink provides quite a bit of logging. You can either view the output in the Astra UI or use the pulsar-admin CLI to query the logs topic.
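As a rough sketch of the command-line route, the snippet below inspects one of the log topics created earlier. It uses `pulsar-admin topics stats` to see whether messages are arriving, then reads entries with the companion `pulsar-client` tool, which is a swap-in for reading the messages themselves. The subscription name `log-reader` is hypothetical, and `NAME` is assumed to hold your `<NAME>` value.

```shell
# Sketch: inspect the conversion function's log topic.
# Assumptions: NAME holds your <NAME> value; "log-reader" is a made-up
# subscription name; run from the directory that contains bin/.
NAME="${NAME:-yourname}"
LOGS_TOPIC="persistent://${NAME}-camp-const/astracdc/conversion-function-logs"
SUBSCRIPTION="log-reader"

if [ -x ./bin/pulsar-admin ] && [ -x ./bin/pulsar-client ]; then
  # Topic statistics show whether any log messages have been published.
  ./bin/pulsar-admin topics stats "$LOGS_TOPIC"
  # Read up to 10 log entries from the start of the topic.
  # This waits until 10 messages have been received.
  ./bin/pulsar-client consume -s "$SUBSCRIPTION" -p Earliest -n 10 "$LOGS_TOPIC"
else
  # Dry run when the Pulsar CLI is not available in this directory.
  echo "Pulsar CLI not found; would read from $LOGS_TOPIC"
fi
```

Swapping `conversion-function-logs` for `decisions-function-logs` gives the same view for the second function.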