# Advanced SQL

## Executive summary

This notebook groups the functionality related to advanced SQL in Starburst.

## Plan

Business question to answer
- [ ] Test functionalities in Starburst
- [ ] Group related functionalities

## Task 1: General Starburst information

Data is organized hierarchically: Catalogs --> Schemas --> Tables

#### For Starburst: 

- Catalog --> delta
- Schema --> one of the data products
- Structure:
```sql 
select * from delta.INFORMATION_SCHEMA.COLUMNS limit 10
```
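
As a minimal sketch of walking the Catalog --> Schema --> Table hierarchy from the top, the statements below first list the available catalogs and then the tables of one schema through `information_schema.tables`. The schema name `mfc__pna__odp` is reused from the examples in this notebook as a placeholder; replace it with your own data product.

```sql
-- run each statement separately
show catalogs; --top level of the hierarchy

select table_schema, table_name
from delta.information_schema.tables
where 1=1
    and table_schema = 'mfc__pna__odp' --placeholder schema
limit 10;
```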

#### For LiveDB

- Catalog --> not defined
- Schema --> glovo_live

#### Explore schemas:

```sql
show schemas from delta like '%XXX%' --for Starburst
```

```sql
show schemas --for LiveDB
```

#### Explore tables inside schemas:

```sql
show tables from delta.mfc__pna__odp --for Starburst
```

```sql
show tables from glovo_live --for LiveDB
```

#### To search for a specific column:

```sql
select column_name from delta.INFORMATION_SCHEMA.COLUMNS where table_name='XXX' limit 10 --for Starburst specifying the table
```

```sql
select column_name, table_name
from information_schema.columns
where 1=1
    and table_schema = 'glovo_live'
    and column_name like '%XXX%'
limit 10 --for LiveDB
```
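
The same pattern search should also work in Starburst when the table is not known in advance. This is a sketch, assuming the `delta` catalog exposes the standard `information_schema.columns` view used above; `XXX` remains a placeholder.

```sql
select table_schema, table_name, column_name
from delta.information_schema.columns
where 1=1
    and column_name like '%XXX%'
limit 10 --for Starburst, across every schema in the catalog
```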


## Task 2: Arrays

**Array conditions**

`all_match(array_agg(feedback_selected_option),x -> x='ORDER_STATUS')` --> checks whether all elements of the array equal ORDER_STATUS

`any_match(array_agg(feedback_selected_option),x -> x='ORDER_STATUS')` --> checks whether any element of the array equals ORDER_STATUS

`arrays_overlap(a1,a2)` --> returns true if the two arrays share at least one non-null element, false otherwise. Returns null if they share no non-null element but either array contains a null
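
A small self-contained sketch of the three conditions, using array literals instead of a real table so it can be run anywhere in Starburst. The literal values are illustrative only.

```sql
select
    all_match(array['ORDER_STATUS', 'ORDER_STATUS'], x -> x = 'ORDER_STATUS') as all_order_status, -- true
    any_match(array['ORDER_STATUS', 'CHANGE_ITEMS'], x -> x = 'ORDER_STATUS') as any_order_status, -- true
    arrays_overlap(array[1, 2], array[2, 3]) as overlap,                                           -- true
    arrays_overlap(array[1, 2], array[3, null]) as overlap_with_null                               -- null
```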

**Array definition**

`array_agg(feedback_selected_option)` --> aggregates the values of each group into one array, e.g. order id --> [FFF1,FFF2,FF3,...]

`array_distinct(array_agg(feedback_selected_option))` --> returns a new array containing the distinct elements of the passed array

`array_sort(a1)` --> sorts the elements of the array in ascending order

`array_remove(a, 'XXX')` --> returns the original array without the elements that match the parameter
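
The functions above can be combined in one grouped query. The sketch below uses an inline `values` table named `feedback` (hypothetical, made up for illustration) in place of a real table. Note that `array_agg` does not guarantee element order unless an `order by` is given inside the call, which is why the distinct list is sorted explicitly.

```sql
with feedback (order_id, feedback_selected_option) as (
    values
        (1, 'ORDER_STATUS'),
        (1, 'CHANGE_ITEMS'),
        (1, 'ORDER_STATUS'),
        (2, 'CONTACT_SUPPORT')
)
select
    order_id,
    array_agg(feedback_selected_option) as all_options, --one array per order
    array_sort(array_distinct(array_agg(feedback_selected_option))) as distinct_sorted_options, --order 1: [CHANGE_ITEMS, ORDER_STATUS]
    array_remove(array_agg(feedback_selected_option), 'ORDER_STATUS') as without_order_status --order 1: [CHANGE_ITEMS]
from feedback
group by order_id
order by order_id
```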

**Array elements**

`array_join(a, '_', 'null_element')` --> concatenates all the elements of the array into a single string, separated by the delimiter (second parameter). The optional third parameter is the string used to substitute null values in the array
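
A short sketch of `array_join` with and without the null replacement, on a literal array chosen for illustration. Without the third parameter, null elements are expected to be skipped.

```sql
select
    array_join(array['a', null, 'c'], '_', 'null_element') as with_replacement, -- a_null_element_c
    array_join(array['a', null, 'c'], '_') as without_replacement              -- a_c
```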

**Array interaction**

`array_except(a1, a2)` --> outputs the elements that are in the first array but not in the second

`array_intersect(a1,a2)` --> outputs the elements that are in both arrays

`array_union(a1, a2)` --> returns the union of two arrays with no duplicates
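
The three set operations can be compared side by side on literal arrays. In this sketch each result is wrapped in `array_sort` so the element order shown in the comments does not depend on the engine's internal ordering.

```sql
select
    array_sort(array_except(array[1, 2, 3], array[2])) as only_in_first,      -- [1, 3]
    array_sort(array_intersect(array[1, 2, 3], array[2, 3, 4])) as in_both,   -- [2, 3]
    array_sort(array_union(array[1, 2], array[2, 3])) as in_either            -- [1, 2, 3]
```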


**Array metadata**

`array_position(a, 'xxx')` --> returns the position of the first occurrence of a given value, or 0 if the value is not found. Positions start at 1.
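
A minimal illustration of the 1-based positions, using a literal array with a repeated value:

```sql
select
    array_position(array['a', 'b', 'a'], 'a') as first_a, -- 1 (first occurrence only)
    array_position(array['a', 'b', 'a'], 'b') as first_b, -- 2
    array_position(array['a', 'b', 'a'], 'z') as missing  -- 0 (not found)
```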

**Array statistics**

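`array_histogram(a)` --> outputs a frequency histogram of the array elements in map format (element --> count). We can later unnest it to work with each element's frequencies. The example below uses the aggregate `histogram()`, which builds the same kind of map directly from a column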

```sql
with a as (select 
    histogram(feedback_selected_option) as feedback_selected_options
from delta.contact_contact_intent_odp.fct_contact_intent
where 1=1
    and p_created_date >= date('2024-01-01')
    and order_id in (100385003913,100384688076)
limit 10)
select feedback_selected_options
from a

-- OUTPUT
-- { CONTACT_SUPPORT = 3, CHANGE_ITEMS = 1, ORDER_STATUS = 2 }
```
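
To work with each element's frequency as a regular row, the map can be unnested into a key column and a value column. The sketch below replaces the real table with a small inline `values` list so it is self-contained; the aliases `option_name` and `frequency` are illustrative names, not part of any existing table.

```sql
with a as (
    select histogram(feedback_selected_option) as feedback_selected_options
    from (values 'ORDER_STATUS', 'ORDER_STATUS', 'CHANGE_ITEMS') as t (feedback_selected_option)
)
select
    option_name, --map key
    frequency    --map value
from a
cross join unnest(feedback_selected_options) as u (option_name, frequency)
order by frequency desc
-- ORDER_STATUS | 2
-- CHANGE_ITEMS | 1
```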

`array_max(a)` --> returns the largest element of the array; `array_min(a)` returns the smallest
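
A quick check of both functions on a literal array:

```sql
select
    array_max(array[3, 1, 2]) as largest, -- 3
    array_min(array[3, 1, 2]) as smallest -- 1
```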