Commit
* Move Cloud tutorials to Guide
Showing 7 changed files with 709 additions and 4 deletions.
@@ -0,0 +1,128 @@ | ||
(objects-basics)= | ||
|
||
# Objects: Analyzing Marketing Data | ||
|
||
Marketers often need to handle multi-structured data from different platforms. | ||
CrateDB's dynamic `OBJECT` data type allows us to store and analyze this complex, | ||
nested data efficiently. In this tutorial, we'll explore how to leverage this | ||
feature in marketing data analysis, along with the use of generated columns to | ||
parse and manage URLs. | ||
|
||
Consider marketing data that captures details of various campaigns. | ||
|
||
:::{code} json | ||
{ | ||
"campaign_id": "c123", | ||
"source": "Google Ads", | ||
"metrics": { | ||
"clicks": 500, | ||
"impressions": 10000, | ||
"conversion_rate": 0.05 | ||
}, | ||
"landing_page_url": "https://example.com/products?utm_source=google" | ||
} | ||
::: | ||
|
||
To begin, let's create the schema for this dataset. | ||
|
||
## Creating the Table | ||
|
||
CrateDB uses SQL as its query language. To | ||
store the marketing data, create a table with columns tailored to the | ||
dataset using the `CREATE TABLE` command: | ||
|
||
:::{code} sql | ||
CREATE TABLE marketing_data ( | ||
campaign_id TEXT PRIMARY KEY, | ||
source TEXT, | ||
metrics OBJECT(DYNAMIC) AS ( | ||
clicks INTEGER, | ||
impressions INTEGER, | ||
conversion_rate DOUBLE PRECISION | ||
), | ||
landing_page_url TEXT, | ||
url_parts GENERATED ALWAYS AS parse_url(landing_page_url) | ||
); | ||
::: | ||
|
||
Let's highlight two features in this table definition: | ||
|
||
:metrics: An `OBJECT` column featuring a dynamic structure for | ||
performing flexible queries on its nested attributes like | ||
clicks, impressions, and conversion rate. | ||
:url_parts: A generated column that | ||
parses the URL in the `landing_page_url` column into its components. This makes it | ||
convenient to query specific parts of the URL later on. | ||
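
To make the idea of splitting a URL into components concrete, here is a minimal Python sketch using the standard library's `urllib.parse`. It is only an analogy for what a URL-parsing function produces, not CrateDB's implementation. The helper name `split_url` and all dictionary keys except `parameters` (which the aggregation query later in this tutorial relies on) are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def split_url(url: str) -> dict:
    """Split a URL into named components (illustrative helper, not CrateDB code)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,
        "path": parts.path,
        "query": parts.query,
        # parse_qs maps each query parameter to a list of its values.
        "parameters": parse_qs(parts.query),
    }


url_parts = split_url("https://example.com/products?utm_source=google")
print(url_parts["hostname"])
print(url_parts["parameters"]["utm_source"])
```

Storing the parsed result next to the raw URL, as the generated column does, means the parsing work happens once at write time rather than in every query.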
|
||
The table is designed to accommodate both fixed and dynamic attributes, | ||
providing a robust and flexible structure for storing your marketing data. | ||
|
||
|
||
## Inserting Data | ||
|
||
Now, insert the data using the `COPY FROM` SQL statement. | ||
|
||
:::{code} sql | ||
COPY marketing_data | ||
FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/data_marketing.json.gz' | ||
WITH (format = 'json', compression='gzip'); | ||
::: | ||
|
||
## Analyzing Data | ||
|
||
Start with a basic `SELECT` statement on the `metrics` column, and limit the | ||
output to 10 records to quickly explore a sample of the data. | ||
|
||
:::{code} sql | ||
SELECT metrics | ||
FROM marketing_data | ||
LIMIT 10; | ||
::: | ||
|
||
You can see that the `metrics` column is returned as a JSON object. | ||
To return only a single property of this object, adjust the | ||
query by selecting the property with bracket notation. | ||
|
||
:::{code} sql | ||
SELECT metrics['clicks'] | ||
FROM marketing_data | ||
LIMIT 10; | ||
::: | ||
|
||
It's helpful to select individual properties from a nested object, but what if | ||
you also want to filter results based on these properties? For instance, to find | ||
`campaign_id` and `source` where `conversion_rate` exceeds `0.09`, employ | ||
the same bracket notation for filtering as well. | ||
|
||
:::{code} sql | ||
SELECT campaign_id, source | ||
FROM marketing_data | ||
WHERE metrics['conversion_rate'] > 0.09 | ||
LIMIT 50; | ||
::: | ||
|
||
This allows you to narrow down the query results while still leveraging CrateDB's | ||
ability to query nested objects effectively. | ||
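
As a rough mental model of filtering on a nested attribute, the following Python sketch applies the same condition to plain dictionaries. It is an analogy only and assumes nothing about how CrateDB evaluates the query. The first record mirrors the sample document at the top of this tutorial; the other two records and the function name `high_converters` are hypothetical.

```python
# Sample records shaped like the tutorial's JSON document.
# Only "c123" comes from the tutorial; the others are invented.
campaigns = [
    {"campaign_id": "c123", "source": "Google Ads",
     "metrics": {"clicks": 500, "impressions": 10000, "conversion_rate": 0.05}},
    {"campaign_id": "c124", "source": "Newsletter",
     "metrics": {"clicks": 120, "impressions": 900, "conversion_rate": 0.12}},
    {"campaign_id": "c125", "source": "Partner Blog",
     "metrics": {"clicks": 40, "impressions": 700}},  # no conversion_rate recorded
]


def high_converters(records, threshold=0.09):
    """Return (campaign_id, source) pairs whose nested conversion_rate exceeds threshold."""
    result = []
    for record in records:
        rate = record["metrics"].get("conversion_rate")
        # Skip records where the attribute is missing, much like a comparison
        # against NULL that does not evaluate to true in SQL.
        if rate is not None and rate > threshold:
            result.append((record["campaign_id"], record["source"]))
    return result


matches = high_converters(campaigns)
print(matches)
```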
|
||
Finally, let's explore data aggregation based on UTM source parameters. The | ||
`url_parts` generated column, which is populated using the `parse_url()` | ||
function, automatically splits the URL into its constituent parts upon data | ||
insertion. | ||
|
||
To analyze the UTM source, you can directly query these parsed parameters. The | ||
goal is to count the occurrences of each UTM source and sort them in descending | ||
order. This lets you easily gauge marketing effectiveness for different sources, | ||
all while taking advantage of CrateDB's powerful generated columns feature. | ||
|
||
:::{code} sql | ||
SELECT | ||
url_parts['parameters']['utm_source'] AS utm_source, | ||
COUNT(*) | ||
FROM marketing_data | ||
GROUP BY 1 | ||
ORDER BY 2 DESC; | ||
::: | ||
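
The count-and-sort logic of the query above can be sketched in Python with `collections.Counter`. This is an illustrative analogy, not what CrateDB executes. Only the first URL appears in this tutorial; the remaining URLs and the helper name `utm_source` are assumptions.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Hypothetical landing page URLs; only the first appears in the tutorial.
landing_pages = [
    "https://example.com/products?utm_source=google",
    "https://example.com/pricing?utm_source=newsletter",
    "https://example.com/products?utm_source=google",
    "https://example.com/about",  # no utm_source parameter
]


def utm_source(url):
    """Return the first utm_source value of a URL, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get("utm_source")
    return values[0] if values else None


# Count occurrences per source; most_common() sorts by count, descending.
counts = Counter(utm_source(url) for url in landing_pages)
ranking = counts.most_common()
print(ranking)
```

Note that URLs without a `utm_source` parameter end up in their own group here, so it is worth deciding whether such rows should be counted or filtered out.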
|
||
In this tutorial, we explored the versatility and power of CrateDB's dynamic | ||
`OBJECT` data type for handling complex, nested marketing data. |
@@ -0,0 +1,111 @@ | ||
(search-basics)= | ||
|
||
# Full-Text: Exploring the Netflix Catalog | ||
|
||
In this tutorial, we will explore how to manage a dataset of Netflix titles, | ||
making use of CrateDB Cloud's full-text search capabilities. | ||
Each entry in our imaginary dataset will have the following attributes: | ||
|
||
:show_id: A unique identifier for each show or movie. | ||
:type: Specifies whether the title is a movie, TV show, or another format. | ||
:title: The title of the movie or show. | ||
:director: The name of the director. | ||
:cast: An array listing the cast members. | ||
:country: The country where the title was produced. | ||
:date_added: A timestamp indicating when the title was added to the catalog. | ||
:release_year: The year the title was released. | ||
:rating: The content rating (e.g., PG, R, etc.). | ||
:duration: The duration of the title in minutes or seasons. | ||
:listed_in: An array containing genres that the title falls under. | ||
:description: A textual description of the title, indexed using full-text search. | ||
|
||
To begin, let's create the schema for this dataset. | ||
|
||
|
||
## Creating the Table | ||
|
||
CrateDB uses SQL as its query language. To | ||
store the data, create a table with columns tailored to the | ||
dataset using the `CREATE TABLE` command. | ||
|
||
Importantly, you will also take advantage | ||
of CrateDB's full-text search capabilities by setting up a full-text index on | ||
the description column. This will enable you to perform complex textual queries | ||
later on. | ||
|
||
:::{code} sql | ||
CREATE TABLE "netflix_catalog" ( | ||
"show_id" TEXT PRIMARY KEY, | ||
"type" TEXT, | ||
"title" TEXT, | ||
"director" TEXT, | ||
"cast" ARRAY(TEXT), | ||
"country" TEXT, | ||
"date_added" TIMESTAMP, | ||
"release_year" TEXT, | ||
"rating" TEXT, | ||
"duration" TEXT, | ||
"listed_in" ARRAY(TEXT), | ||
"description" TEXT INDEX USING FULLTEXT | ||
); | ||
::: | ||
|
||
Run the above SQL command in CrateDB to set up your table. With the table ready, | ||
you’re now set to insert the dataset. | ||
|
||
## Inserting Data | ||
|
||
Now, insert data into the table you just created, by using the `COPY FROM` | ||
SQL statement. | ||
|
||
:::{code} sql | ||
COPY netflix_catalog | ||
FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/data_netflix.json.gz' | ||
WITH (format = 'json', compression='gzip'); | ||
::: | ||
|
||
Run the above SQL command in CrateDB to import the dataset. After this command | ||
finishes, you are ready to start querying the dataset. | ||
|
||
## Using Full-text Search | ||
|
||
Start with a basic `SELECT` statement on all columns, and limit the output to | ||
10 records to quickly explore a sample of the data. | ||
|
||
:::{code} sql | ||
SELECT * | ||
FROM netflix_catalog | ||
LIMIT 10; | ||
::: | ||
|
||
CrateDB Cloud’s full-text search finds entries based on text matching. The | ||
following query uses the `MATCH` predicate on the `description` column to find | ||
all movies or TV shows whose description contains the word "love". | ||
The results are sorted by relevance using the synthetic `_score` column. | ||
|
||
:::{code} sql | ||
SELECT title, description | ||
FROM netflix_catalog | ||
WHERE MATCH(description, 'love') | ||
ORDER BY _score DESC | ||
LIMIT 10; | ||
::: | ||
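
To build intuition for how term matching and relevance ordering differ from a substring search, here is a deliberately simplified Python sketch. It tokenizes text into words and ranks records by how often the term occurs. Real full-text engines use analyzers and ranking functions that are considerably more sophisticated, so treat this as a toy model rather than a description of CrateDB's scoring. The catalog entries and function names are invented for illustration.

```python
import re

# Hypothetical titles and descriptions, invented for illustration.
catalog = [
    ("Title A", "A story about love, loss, and more love."),
    ("Title B", "Two gloves go missing in a detective comedy."),
    ("Title C", "Love finds a way in a small town."),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(records, term):
    """Return (title, score) pairs for records containing the term as a whole token.

    The score is a plain occurrence count, a stand-in for a real relevance score.
    """
    term = term.lower()
    scored = []
    for title, description in records:
        score = tokenize(description).count(term)
        if score > 0:
            scored.append((title, score))
    # Highest score first, analogous to ORDER BY _score DESC.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


results = search(catalog, "love")
print(results)
```

In this toy model, "gloves" does not match "love" because matching happens on whole tokens, whereas a substring comparison would have returned it.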
|
||
While full-text search is incredibly powerful, you can still perform more | ||
traditional types of queries. For example, to find all titles directed by | ||
"Kirsten Johnson", and sort them by release year, you can use: | ||
|
||
:::{code} sql | ||
SELECT title, release_year | ||
FROM netflix_catalog | ||
WHERE director = 'Kirsten Johnson' | ||
ORDER BY release_year DESC; | ||
::: | ||
|
||
This query uses the conventional `WHERE` clause to find movies directed by | ||
Kirsten Johnson, and the `ORDER BY` clause to sort them by their release year | ||
in descending order. | ||
|
||
Through these examples, you can see that CrateDB Cloud offers you a wide array | ||
of querying possibilities, from basic SQL queries to advanced full-text | ||
searches, making it a versatile choice for managing and querying your datasets. |