A collection of the things used during the Astra and StarTree demonstration
This is a WIP at the moment. It doesn't work... yet.
- Astra CLI
- cURL
- awk
Follow this guide to create an Astra account, a token, and config the cli.
# Should output your astra org uuid
astra org id
Follow this guide to create a Pinot token from your StarTree account.
PINOT_TOKEN="<TOKEN_HERE>"
Make sure the cloud providers and cloud regions match between DB and Steaming. It is a requirement of Astra CDC.
ORG_ID=$(astra org id)
CLOUD_PROVIDER_DB="GCP"
CLOUD_REGION_DB="us-central1"
CLOUD_PROVIDER_STREAMING="gcp"
CLOUD_REGION_STREAMING="uscentral1"
DB_NAME="suggestions-demo"
KEYSPACE_NAME="page_view_data"
TABLE_NAME="raw_clicks"
TENANT="webstore-page-views"
astra db create \
--keyspace ${KEYSPACE_NAME} \
--cloud ${CLOUD_PROVIDER_DB} \
--region ${CLOUD_REGION_DB} \
--wait \
${DB_NAME}
DB_ID=$(astra db get ${DB_NAME} --key id)
Do not continue until the DB has a status of "ACTIVE"
astra db get ${DB_NAME}
astra db cqlsh ${DB_NAME} \
--execute "CREATE TABLE IF NOT EXISTS ${KEYSPACE_NAME}.${TABLE_NAME}(click_epoch bigint PRIMARY KEY, UTC_offset int, request_url text, user_agent text, visitor_id text, coords text);"
astra streaming create \
--cloud ${CLOUD_PROVIDER_STREAMING} \
--region ${CLOUD_REGION_STREAMING} \
${TENANT}
WEB_SERVICE_URL=$(astra streaming get ${TENANT} -o json | jq -r '.data.cellValues[] | select(.Attribute == "WebServiceUrl") | .Value')
CLUSTER_NAME=$(astra streaming get ${TENANT} -o json | jq -r '.data.cellValues[] | select(.Attribute == "Cluster Name") | .Value')
PULSAR_TOKEN=$(astra streaming pulsar-token ${TENANT})
curl -S --fail -X POST ${WEB_SERVICE_URL}/admin/v3/astra/tenants/${TENANT}/cdc \
--header "Authorization: ${PULSAR_TOKEN}" \
--header "X-DataStax-Pulsar-Cluster: ${CLUSTER_NAME}" \
--data '{
"databaseId": "'${DB_ID}'",
"databaseName": "'${DB_NAME}'",
"keyspace": "'${KEYSPACE_NAME}'",
"orgId": "'${ORG_ID}'",
"tableName": "'${TABLE_NAME}'",
"topicPartitions": 3
}'
# Add a row of data to create the topic schema
astra db cqlsh ${DB_NAME} --execute "INSERT INTO ${KEYSPACE_NAME}.${TABLE_NAME}(click_epoch, UTC_offset, request_url, user_agent, visitor_id, coords) values(1665769440000, -4, 'https://request.com/something/page.html', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36 OPR/38.0.2220.41','1234asd','{\"coords\":{\"latitude\":37.40730296266554,\"heading\":null,\"accuracy\":40,\"altitudeAccuracy\":null,\"altitude\":null,\"longitude\":-122.08019055687454,\"speed\":null},\"timestamp\":1339712284200}');"
# Retrieve the auto-created topic
PULSAR_TOPIC=$(astra streaming pulsar-shell --execute-command "admin topics list ${TENANT}/astracdc" ${TENANT} | grep data-.*-0 | awk '{gsub("-partition-0","");print}')
curl -S --fail -X POST "https://pinot.demo1.chinmayorg.startree.cloud/schemas?override=true" \
-H "accept: application/json" \
-H "Authorization: Basic ${PINOT_TOKEN}" \
-H "Content-Type: application/json" \
-d "@pinot/full-click-schema.json"
awk '{gsub("PULSAR_TOKEN","'${PULSAR_TOKEN}'");gsub("PULSAR_TOPIC","'${PULSAR_TOPIC}'");print}' pinot/full-click-table.json > formatted-table.json
curl -S --fail -X POST "https://pinot.demo1.chinmayorg.startree.cloud/tables" \
-H "accept: application/json" \
-H "Authorization: Basic ${PINOT_TOKEN}" \
-H "Content-Type: application/json" \
-d "@formatted-table.json"
Assuming you did not change the values of KEYSPACE_NAME and TABLE_NAME, you can see the whole example in action by running the following command.
astra db cqlsh ${DB_NAME} --file ./astra/test-data.cql
Remember - upserts are enabled on the Pinot table using the
visitor_id
column. So when you select rows it only show the latest value for each distinct id. To see all the rows in the table use:select * from fullClick limit 10 option(skipUpsert=true)
.