Move Cloud tutorials to Guide (#70)

* Move Cloud tutorials to Guide
crate · May 2, 2024 · 8686293 · 8686293
1 parent c2d5383
commit 8686293
Showing 7 changed files with 709 additions and 4 deletions.
diff --git a/docs/domain/document/index.md b/docs/domain/document/index.md
@@ -9,8 +9,15 @@ Storing documents in CrateDB provides the same development convenience like the
 document-oriented storage layer of Lotus Notes / Domino, CouchDB, MongoDB, and
 PostgreSQL's `JSON(B)` types.
 
-- [](inv:cloud#object)
+- [](#objects-basics)
 - [Unleashing the Power of Nested Data: Ingesting and Querying JSON Documents with SQL]
 
 
 [Unleashing the Power of Nested Data: Ingesting and Querying JSON Documents with SQL]: https://youtu.be/S_RHmdz2IQM?feature=shared
+
+```{toctree}
+:maxdepth: 1
+:hidden:
+
+objects-hands-on
+```
diff --git a/docs/domain/document/objects-hands-on.md b/docs/domain/document/objects-hands-on.md
@@ -0,0 +1,128 @@
+(objects-basics)=
+
+# Objects: Analyzing Marketing Data
+
+Marketers often need to handle multi-structured data from different platforms.
+CrateDB's dynamic `OBJECT` data type allows us to store and analyze this complex,
+nested data efficiently. In this tutorial, we'll explore how to leverage this
+feature in marketing data analysis, along with the use of generated columns to
+parse and manage URLs.
+
+Consider marketing data that captures details of various campaigns.
+
+:::{code} json
+{
+    "campaign_id": "c123",
+    "source": "Google Ads",
+    "metrics": {
+        "clicks": 500,
+        "impressions": 10000,
+        "conversion_rate": 0.05
+    },
+    "landing_page_url": "https://example.com/products?utm_source=google"
+}
+:::
+
+To begin, let's create the schema for this dataset.
+
+## Creating the Table
+
+CrateDB uses SQL, the most popular query language for database management. To
+store the marketing data, create a table with columns tailored to the
+dataset using the `CREATE TABLE` command:
+
+:::{code} sql
+CREATE TABLE marketing_data (
+    campaign_id TEXT PRIMARY KEY,
+    source TEXT,
+    metrics OBJECT(DYNAMIC) AS (
+        clicks INTEGER,
+        impressions INTEGER,
+        conversion_rate DOUBLE PRECISION
+    ),
+    landing_page_url TEXT,
+    url_parts GENERATED ALWAYS AS parse_url(landing_page_url)
+);
+:::
+
+Let's highlight two features in this table definition:
+
+:metrics: An `OBJECT` column featuring a dynamic structure for
+  performing flexible queries on its nested attributes like
+  clicks, impressions, and conversion rate.
+:url_parts: A generated column that
+  parses the URL in the `landing_page_url` column into its components. This makes
+  it convenient to query specific parts of the URL later on.
+
+The table is designed to accommodate both fixed and dynamic attributes,
+providing a robust and flexible structure for storing your marketing data.
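+
+As a minimal sketch of what the dynamic policy permits, the following
+`INSERT` stores an attribute that the schema does not declare. The row values
+and the `bounce_rate` attribute are hypothetical and not part of the tutorial
+dataset; with `OBJECT(DYNAMIC)`, CrateDB should add the new sub-column on the
+fly instead of rejecting the row.
+
+:::{code} sql
+-- Hypothetical row: `bounce_rate` is not declared in the table schema.
+INSERT INTO marketing_data (campaign_id, source, metrics, landing_page_url)
+VALUES (
+    'example-001',
+    'Example Source',
+    {clicks = 10, impressions = 200, conversion_rate = 0.05, bounce_rate = 0.4},
+    'https://example.com/products?utm_source=example'
+);
+:::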
+
+
+## Inserting Data
+
+Now, insert the data using the `COPY FROM` SQL statement.
+
+:::{code} sql
+COPY marketing_data
+FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/data_marketing.json.gz'
+WITH (format = 'json', compression='gzip');
+:::
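+
+To confirm that the import worked, a quick check along these lines can help.
+It assumes the default behavior that newly written rows become visible to
+queries only after a table refresh, so the refresh is triggered explicitly.
+
+:::{code} sql
+-- Make the imported rows visible to subsequent queries.
+REFRESH TABLE marketing_data;
+
+-- Count the imported records.
+SELECT COUNT(*)
+FROM marketing_data;
+:::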
+
+## Analyzing Data
+
+Start with a basic `SELECT` statement on the `metrics` column, and limit the
+output to 10 records, in order to quickly explore a few sample rows.
+
+:::{code} sql
+SELECT metrics
+FROM marketing_data
+LIMIT 10;
+:::
+
+You can see that the `metrics` column is returned as a JSON object.
+If you just want to return a single property of this object, you can adjust the
+query slightly by adding the property to the selection using bracket notation.
+
+:::{code} sql
+SELECT metrics['clicks']
+FROM marketing_data
+LIMIT 10;
+:::
+
+It's helpful to select individual properties from a nested object, but what if
+you also want to filter results based on these properties? For instance, to find
+`campaign_id` and `source` where `conversion_rate` exceeds `0.09`, employ
+the same bracket notation for filtering as well.
+
+:::{code} sql
+SELECT campaign_id, source
+FROM marketing_data
+WHERE metrics['conversion_rate'] > 0.09
+LIMIT 50;
+:::
+
+This allows you to narrow down the query results while still leveraging CrateDB's
+ability to query nested objects effectively.
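+
+Selection and filtering on nested properties can also be combined with
+sorting. The following sketch reuses the threshold from above; the column
+choice and the limit are illustrative.
+
+:::{code} sql
+-- Show the best-converting campaigns first.
+SELECT campaign_id, metrics['clicks'], metrics['conversion_rate']
+FROM marketing_data
+WHERE metrics['conversion_rate'] > 0.09
+ORDER BY metrics['conversion_rate'] DESC
+LIMIT 10;
+:::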
+
+Finally, let's explore data aggregation based on UTM source parameters. The
+`url_parts` generated column, which is populated using the `parse_url()`
+function, automatically splits the URL into its constituent parts upon data
+insertion.
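+
+To see what the generated column holds, a query like this sketch displays the
+original URL next to its parsed form. Only the `parameters` key is used in this
+tutorial; which other keys appear depends on what `parse_url()` returns.
+
+:::{code} sql
+-- Compare the raw URL with the object produced by parse_url().
+SELECT landing_page_url, url_parts
+FROM marketing_data
+LIMIT 5;
+:::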
+
+To analyze the UTM source, you can directly query these parsed parameters. The
+goal is to count the occurrences of each UTM source and sort them in descending
+order. This lets you easily gauge marketing effectiveness for different sources,
+all while taking advantage of CrateDB's powerful generated columns feature.
+
+:::{code} sql
+SELECT
+    url_parts['parameters']['utm_source'] AS utm_source,
+    COUNT(*)
+FROM marketing_data
+GROUP BY 1
+ORDER BY 2 DESC;
+:::
+
+In this tutorial, we explored the versatility and power of CrateDB's dynamic
+`OBJECT` data type for handling complex, nested marketing data.
diff --git a/docs/domain/search/index.md b/docs/domain/search/index.md
@@ -6,7 +6,7 @@ Learn how to set up your database for full-text search, how to create the
 relevant indices, and how to query your text data efficiently. A must-read
 for anyone looking to make sense of large volumes of unstructured text data.
 
-- [](inv:cloud#full-text)
+- [](#search-basics)
 
 
 :::{note}
@@ -15,3 +15,10 @@ data sets. One of its standout features are its full-text search capabilities,
 built on top of the powerful Lucene library. This makes it a great fit for
 organizing, searching, and analyzing extensive datasets.
 :::
+
+```{toctree}
+:maxdepth: 1
+:hidden:
+
+search-hands-on
+```
diff --git a/docs/domain/search/search-hands-on.md b/docs/domain/search/search-hands-on.md
@@ -0,0 +1,111 @@
+(search-basics)=
+
+# Full-Text: Exploring the Netflix Catalog
+
+In this tutorial, we will explore how to manage a dataset of Netflix titles,
+making use of CrateDB Cloud's full-text search capabilities.
+Each entry in our imaginary dataset will have the following attributes:
+
+:show_id: A unique identifier for each show or movie.
+:type: Specifies whether the title is a movie, TV show, or another format.
+:title: The title of the movie or show.
+:director: The name of the director.
+:cast: An array listing the cast members.
+:country: The country where the title was produced.
+:date_added: A timestamp indicating when the title was added to the catalog.
+:release_year: The year the title was released.
+:rating: The content rating (e.g., PG, R, etc.).
+:duration: The duration of the title in minutes or seasons.
+:listed_in: An array containing genres that the title falls under.
+:description: A textual description of the title, indexed using full-text search.
+
+To begin, let's create the schema for this dataset.
+
+
+## Creating the Table
+
+CrateDB uses SQL, the most popular query language for database management. To
+store the data, create a table with columns tailored to the
+dataset using the `CREATE TABLE` command.
+
+Importantly, you will also take advantage
+of CrateDB's full-text search capabilities by setting up a full-text index on
+the description column. This will enable you to perform complex textual queries
+later on.
+
+:::{code} sql
+CREATE TABLE "netflix_catalog" (
+   "show_id" TEXT PRIMARY KEY,
+   "type" TEXT,
+   "title" TEXT,
+   "director" TEXT,
+   "cast" ARRAY(TEXT),
+   "country" TEXT,
+   "date_added" TIMESTAMP,
+   "release_year" TEXT,
+   "rating" TEXT,
+   "duration" TEXT,
+   "listed_in"  ARRAY(TEXT),
+   "description" TEXT INDEX using fulltext
+);
+:::
+
+Run the above SQL command in CrateDB to set up your table. With the table ready, 
+you’re now set to insert the dataset.
+
+## Inserting Data
+
+Now, insert data into the table you just created, by using the `COPY FROM`
+SQL statement.
+
+:::{code} sql
+COPY netflix_catalog
+FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/data_netflix.json.gz'
+WITH (format = 'json', compression='gzip');
+:::
+
+Run the above SQL command in CrateDB to import the dataset. After this command
+finishes, you are ready to start querying the dataset.
+
+## Using Full-text Search
+
+Start with a basic `SELECT` statement on all columns, and limit the output to
+10 records, in order to quickly explore a few sample rows.
+
+:::{code} sql
+SELECT *
+FROM netflix_catalog
+LIMIT 10;
+:::
+
+CrateDB Cloud’s full-text search can be leveraged to find specific entries based
+on text matching. In this query, you are using the `MATCH` predicate on the
+`description` field to find all movies or TV shows whose description contains
+the word "love". The results can be sorted by relevance using the synthetic
+`_score` column.
+
+:::{code} sql
+SELECT title, description
+FROM netflix_catalog
+WHERE MATCH(description, 'love')
+ORDER BY _score DESC
+LIMIT 10;
+:::
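+
+The `MATCH` predicate also accepts a match type and options. The following is
+a sketch rather than part of the original tutorial: it assumes the
+`best_fields` match type with the `operator` option, and the search terms are
+chosen purely for illustration. With `operator = 'and'`, a description has to
+contain both terms to match.
+
+:::{code} sql
+-- Require both terms to match, and show the relevance score.
+SELECT title, _score
+FROM netflix_catalog
+WHERE MATCH(description, 'love family') USING best_fields WITH (operator = 'and')
+ORDER BY _score DESC
+LIMIT 10;
+:::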
+
+While full-text search is incredibly powerful, you can still perform more
+traditional types of queries. For example, to find all titles directed by
+"Kirsten Johnson", and sort them by release year, you can use:
+
+:::{code} sql
+SELECT title, release_year
+FROM netflix_catalog
+WHERE director = 'Kirsten Johnson'
+ORDER BY release_year DESC;
+:::
+
+This query uses the conventional `WHERE` clause to find movies directed by
+Kirsten Johnson, and the `ORDER BY` clause to sort them by their release year
+in descending order.
+
+Through these examples, you can see that CrateDB Cloud offers you a wide array
+of querying possibilities, from basic SQL queries to advanced full-text
+searches, making it a versatile choice for managing and querying your datasets.
diff --git a/docs/domain/timeseries/index.md b/docs/domain/timeseries/index.md
@@ -6,15 +6,17 @@ Learn how to optimally use CrateDB for time series use-cases.
 - [](#timeseries-basics)
 - [](#timeseries-normalize)
 - [Financial data collection and processing using pandas]
-- [](inv:cloud#time-series)
-- [](inv:cloud#time-series-advanced)
+- [](#timeseries-analysis)
+- [](#timeseries-objects)
 - [Time-series data: From raw data to fast analysis in only three steps]
 
 :::{toctree}
 :hidden:
 
 generate/index
 normalize-intervals
+timeseries-querying
+timeseries-and-metadata
 :::
 
 [Financial data collection and processing using pandas]: https://community.cratedb.com/t/automating-financial-data-collection-and-storage-in-cratedb-with-python-and-pandas-2-0-0/916