# Data Description and Exploration


## Data Description


Wide World Importers (WWI) is a wholesale importer and distributor operating in the San Francisco Bay Area.

WWI's customers are primarily companies that resell goods to individuals. WWI sells to retail customers across the United States, including specialty stores, supermarkets, computer stores, and some individuals. WWI also sells to other wholesalers through a network of agents who promote products on behalf of WWI.

WWI purchases goods from suppliers. They store the goods in their WWI warehouse and reorder from suppliers as needed to fulfill customer orders. They also purchase large volumes of packaging materials and sell them in smaller quantities for customer convenience.

The WWI database contains several schemas. Our analysis uses three of them: Sales, Application, and Warehouse.

### Sales Schema


Data on product sales to customers.

<img src="../assets/er_sales.png" alt="">

We will need the following tables and fields in this schema:

**sales.orders**

Field | Description
-|-
order_id | Order ID.
customer_id | ID of the customer who placed the order.
order_date | Date the order was created.
expected_delivery_date | Expected delivery date of the order.
picking_completed_when | Time when order picking was completed.

**sales.order_lines**

Field | Description
-|-
order_line_id | Order line ID.
order_id | Order ID to which this line belongs.
stock_item_id | ID of the stock item (from the warehouse.stock_items table) specified in the order line.
package_type_id | ID of the package type (from the warehouse.package_types table) used for the item.
quantity | Quantity of the item to be supplied.
unit_price | Price per unit of the item.
tax_rate | Tax rate applied to the item.
picked_quantity | Quantity of the item that was picked from the warehouse.
picking_completed_when | Time when picking for this order line was completed.

**sales.customer_categories**

Field | Description
-|-
customer_category_id | Customer category ID.
customer_category_name | Full name of the category to which customers can be assigned.

**sales.customers**

Field | Description
-|-
customer_id | Customer ID.
customer_name | Full name of the customer (usually the trade name).
customer_category_id | ID of the customer's category.
delivery_method_id | ID of the standard delivery method for goods shipped to this customer.
delivery_city_id | ID of the delivery city for this address.

**sales.invoices**

Field | Description
-|-
invoice_id | Invoice ID.
customer_id | ID of the customer to whom the invoice is issued.
order_id | ID of the order associated with this invoice.
delivery_method_id | ID of the delivery method for the goods listed in the invoice.
invoice_date | Date when the invoice was issued.
confirmed_delivery_time | Confirmed delivery time.

**sales.invoice_lines**

Field | Description
-|-
invoice_line_id | Invoice line ID.
invoice_id | ID of the invoice to which this line belongs.
stock_item_id | ID of the stock item (from the warehouse.stock_items table) specified in the invoice line.
package_type_id | ID of the package type (from the warehouse.package_types table) used for the item.
quantity | Quantity of the item specified in the invoice line.
unit_price | Price per unit of the item.
tax_rate | Tax rate applied to the item.
tax_amount | Tax amount calculated for the invoice line.
line_profit | Profit earned from this invoice line, based on the current cost price.
extended_price | Total cost of the invoice line ($\text{quantity} * \text{unit\_price} + \text{tax\_amount}$).
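
As a quick sanity check of the extended_price formula, the sketch below recomputes it in Python for one line. It assumes that tax_rate is stored as a percentage (for example 15.0) and that tax_amount equals quantity × unit_price × tax_rate / 100 rounded to cents. The schema description does not say how tax_amount is derived, so treat that part as an assumption; the function name `extended_price` is illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

def extended_price(quantity, unit_price, tax_rate_percent):
    """Return (tax_amount, extended_price) for one invoice line.

    Assumption: tax_amount = quantity * unit_price * tax_rate / 100,
    rounded to cents. The schema description only states
    extended_price = quantity * unit_price + tax_amount.
    """
    net = Decimal(quantity) * Decimal(str(unit_price))
    tax = (net * Decimal(str(tax_rate_percent)) / Decimal(100)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return tax, net + tax

# Quantity, unit price and tax rate taken from a sample order line.
tax, total = extended_price(10, "230.00", "15.0")
print(tax, total)
```

Comparing such recomputed values with the stored tax_amount and extended_price columns is one way to confirm the assumed tax rule before building measures on top of it.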

**sales.customer_transactions**

Field | Description
-|-
customer_transaction_id | Transaction ID.
customer_id | ID of the customer associated with this transaction.
transaction_type_id | ID of the transaction type (e.g., invoice, payment, credit note).
invoice_id | ID of the invoice associated with this transaction (if applicable).
payment_method_id | ID of the payment method (e.g., cash, bank transfer).
transaction_date | Transaction date.
amount_excluding_tax | Transaction amount excluding tax.
tax_amount | Tax amount calculated for the transaction.
transaction_amount | Total transaction amount (including tax).
outstanding_balance | Amount still unpaid for this transaction. Indicates the outstanding debt for the transaction.
finalization_date | Date when the transaction was finalized (if finalized).


### Application Schema

Reference data and system settings.

<img src="../assets/er_application.png" alt="">

We will need the following tables and fields in this schema:

**application.countries**

Field | Description
-|-
country_id | Country ID
country_name | Country name

**application.state_provinces**

Field | Description
-|-
state_province_id | State or province ID
state_province_name | Official name of the state or province
country_id | Country for this state or province

**application.cities**

Field | Description
-|-
city_id | City ID
city_name | Official name of the city
state_province_id | State or province for this city

**application.delivery_methods**

Field | Description
-|-
delivery_method_id | Delivery method ID
delivery_method_name | Delivery method name

**application.payment_methods**

Field | Description
-|-
payment_method_id | Payment method ID
payment_method_name | Payment method name

**application.transaction_types**

Field | Description
-|-
transaction_type_id | Transaction type ID in the database
transaction_type_name | Full name of the transaction type

### Warehouse Schema

Data on inventory and warehouse operations.

<img src="../assets/er_warehouse.png" alt="">

We will need the following tables and fields in this schema:

**warehouse.stock_items**

Field | Description
-|-
stock_item_id | Stock item ID
stock_item_name | Full name of the stock item
color_id | Color ID of the item
size | Item size

**warehouse.stock_item_stock_groups**

Field | Description
-|-
stock_item_stock_group_id | Record ID in the table (this is a junction table)
stock_item_id | Stock item ID
stock_group_id | Stock group ID

**warehouse.stock_groups**

Field | Description
-|-
stock_group_id | Stock group ID
stock_group_name | Stock group name

**warehouse.package_types**

Field | Description
-|-
package_type_id | Package type ID
package_type_name | Full name of the package type

**warehouse.colors**

Field | Description
-|-
color_id | Color ID
color_name | Color name

## Creating Functions

Let's switch to the source schema.

In [None]:
con('src')

Connected to src


Let's create a function that summarizes a single column: row counts, basic statistics, and the most frequent values.

In [None]:
%%sql
CREATE OR REPLACE FUNCTION get_column_summary(
    table_name TEXT,
    column_name TEXT,
    only_summary BOOLEAN DEFAULT FALSE
)
RETURNS TABLE(
    "Summary Type" TEXT,
    "Summary Count" TEXT,
    "-" TEXT,
    "Stats Type" TEXT,
    "Stats Value" TEXT,
    "--" TEXT,
    "Top Values" TEXT
) AS $$
DECLARE
    sql_query TEXT;
    schema_name TEXT;
    table_only_name TEXT;    
    column_type TEXT;
    is_numeric BOOLEAN;    
BEGIN
    IF strpos(table_name, '.') > 0 THEN
        schema_name := split_part(table_name, '.', 1);
        table_only_name := split_part(table_name, '.', 2);
    ELSE
        schema_name := 'public';
        table_only_name := table_name;
    END IF;
    -- Get the column's data type
    EXECUTE format('SELECT data_type FROM information_schema.columns 
                   WHERE table_schema = %L AND table_name = %L AND column_name = %L', 
                   schema_name, table_only_name, column_name)
    INTO column_type;    
    -- Check whether the type is numeric
    is_numeric := column_type IN ('smallint', 'integer', 'bigint', 'decimal', 'numeric', 'real', 'double precision');    

    sql_query := 
        'WITH 
        column_summary AS (
            SELECT 
                *
                , row_number() OVER () AS dummy_id
            FROM (        
                SELECT 
                    ''Total Count'' AS summary_1, COUNT(*) AS summary_2
                FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                UNION ALL
                SELECT
                    ''Unique Count'', COUNT(DISTINCT ' || quote_ident(column_name) || ')
                FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                UNION ALL
                SELECT
                    ''Missing'', COUNT(*) FILTER (WHERE ' || quote_ident(column_name) || ' IS NULL)
                FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                UNION ALL
                SELECT
                    ''Duplicated'', COUNT(' || quote_ident(column_name) || ') - COUNT(DISTINCT ' || quote_ident(column_name) || ')
                FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                ';
    IF is_numeric THEN
        sql_query := sql_query || '                
                UNION ALL
                SELECT
                    ''Zero'', COUNT(' || quote_ident(column_name) || ') FILTER (WHERE ' || quote_ident(column_name) || ' = 0)
                FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                UNION ALL
                SELECT
                    ''Negative'', COUNT(' || quote_ident(column_name) || ') FILTER (WHERE ' || quote_ident(column_name) || ' < 0)
                FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                ';
    ELSE
        sql_query := sql_query || '
                UNION ALL
                SELECT
                    ''Zero'', NULL 
                UNION ALL
                SELECT
                    ''Negative'', NULL
                ';
    END IF;      
    sql_query := sql_query || '              
            ) AS t
        )';

IF NOT only_summary THEN
    IF is_numeric THEN
        sql_query := sql_query || '
            , column_stats AS (
                SELECT 
                    *
                    , row_number() OVER () AS dummy_id
                FROM (           
                    SELECT 
                        ''Max'' AS stats_1, MAX(' || quote_ident(column_name) || ') AS stats_2
                    FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '    
                    UNION ALL
                    SELECT
                        ''75%'', PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY ' || quote_ident(column_name) || ')
                    FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                    UNION ALL    
                    SELECT
                        ''Mean'', ROUND(AVG(' || quote_ident(column_name) || '), 2)
                    FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                    UNION ALL
                    SELECT
                        ''Median'', PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY ' || quote_ident(column_name) || ')
                    FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                    UNION ALL
                    SELECT
                        ''25%'', PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY ' || quote_ident(column_name) || ')
                    FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                    UNION ALL
                    SELECT
                        ''Min'', MIN(' || quote_ident(column_name) || ')
                    FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '   
                ) AS t
            )';
    ELSE
        sql_query := sql_query || '
            , column_stats AS (
                SELECT 
                    *
                    , row_number() OVER () AS dummy_id
                FROM (           
                    SELECT 
                        ''Max'' AS stats_1, MAX(' || quote_ident(column_name) || ') AS stats_2
                    FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '    
                    UNION ALL
                    SELECT
                        ''75%'', NULL AS stats_2
                    UNION ALL    
                    SELECT
                        ''Mean'', NULL AS stats_2
                    UNION ALL
                    SELECT
                        ''Median'', NULL AS stats_2
                    UNION ALL
                    SELECT
                        ''25%'', NULL AS stats_2
                    UNION ALL
                    SELECT
                        ''Min'', MIN(' || quote_ident(column_name) || ')
                    FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '   
                ) AS t
            )';
    END IF;
    
    sql_query := sql_query || '
            , top_values AS (
                SELECT 
                    *
                    , row_number() OVER () AS dummy_id
                FROM (   
                    SELECT 
                        ' || quote_ident(column_name) || '::TEXT || '' ('' || COUNT(*)::TEXT || '')'' AS top_count
                    FROM ' || quote_ident(schema_name) || '.' || quote_ident(table_only_name) || '
                    WHERE ' || quote_ident(column_name) || ' IS NOT NULL
                    GROUP BY ' || quote_ident(column_name) || '
                    ORDER BY COUNT(*) DESC, ' || quote_ident(column_name) || '::TEXT
                    LIMIT 6
                ) AS t
            )';
END IF;
    IF only_summary THEN
        sql_query := sql_query || '
                SELECT
                    summary_1::TEXT AS "Type",
                    summary_2::TEXT AS "Count",
                    '' '' AS " ",
                    '' '' AS " ",
                    '' '' AS " ",
                    '' '' AS " ",
                    '' '' AS " "
                FROM
                    column_summary;'; 
    ELSE
        sql_query := sql_query || '
                SELECT
                    summary_1::TEXT AS "Type",
                    summary_2::TEXT AS "Count",
                    '' '' AS " ",
                    stats_1::TEXT AS "Type",
                    stats_2::TEXT AS "Value",
                    '' '' AS " ",
                    top_count::TEXT AS "Top Values"
                FROM
                    column_summary
                    LEFT JOIN column_stats USING(dummy_id)
                    LEFT JOIN top_values USING(dummy_id);';
    END IF;
    RETURN QUERY EXECUTE sql_query;
END;
$$ LANGUAGE plpgsql;
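
To make the meaning of the summary rows concrete, here is a minimal Python sketch of the same counts for an in-memory list. It is a simplification, not the function itself: `None` stands in for SQL NULL, and the sketch infers "numeric" from the values, whereas the SQL function reads the column type from information_schema. The name `column_summary` is hypothetical.

```python
def column_summary(values):
    """Mirror the summary part of get_column_summary for a list of values.

    None stands in for SQL NULL. Like COUNT(DISTINCT col), the unique count
    ignores NULLs, and 'Duplicated' is non-null rows minus distinct values.
    """
    non_null = [v for v in values if v is not None]
    numeric = all(isinstance(v, (int, float)) for v in non_null)
    return {
        "Total Count": len(values),
        "Unique Count": len(set(non_null)),
        "Missing": len(values) - len(non_null),
        "Duplicated": len(non_null) - len(set(non_null)),
        # Zero/Negative are only reported for numeric columns, else NULL.
        "Zero": sum(v == 0 for v in non_null) if numeric else None,
        "Negative": sum(v < 0 for v in non_null) if numeric else None,
    }

summary = column_summary([3, 3, 0, -1, None, 7])
print(summary)
```

Note that "Duplicated" counts repeated non-null values, so a foreign-key column with few distinct values reports a number close to the row count.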

Let's create a function that classifies the relationship between two tables on a pair of key columns (1:1, 1:N, N:1, or N:M) and counts the keys that appear on only one side.

In [None]:
%%sql
CREATE OR REPLACE FUNCTION analyze_relationship(
    left_table_name TEXT,
    right_table_name TEXT,
    left_key_name TEXT,
    right_key_name TEXT
)
RETURNS TABLE (
    relationship_type TEXT,
    left_only_keys BIGINT,
    right_only_keys BIGINT,
    left_size BIGINT,
    right_size BIGINT,
    common_keys BIGINT
) AS $$
DECLARE
    left_count BIGINT;
    right_count BIGINT;
    left_distinct_count BIGINT;
    right_distinct_count BIGINT;
    common_count BIGINT;
    left_only_count BIGINT;
    right_only_count BIGINT;
    max_right_per_left_val BIGINT;
    max_left_per_right_val BIGINT;
    rel_type TEXT;
BEGIN
    -- Basic counts
    EXECUTE format('SELECT COUNT(DISTINCT %I), COUNT(*) FROM %s', 
                  left_key_name, left_table_name)
    INTO left_distinct_count, left_count;
    
    EXECUTE format('SELECT COUNT(DISTINCT %I), COUNT(*) FROM %s', 
                  right_key_name, right_table_name)
    INTO right_distinct_count, right_count;
    
    -- Common keys 
    EXECUTE format('
        SELECT COUNT(DISTINCT l.%I) 
        FROM %s l
        WHERE EXISTS (SELECT 1 FROM %s r WHERE r.%I = l.%I)',
        left_key_name, left_table_name, right_table_name, right_key_name, left_key_name)
    INTO common_count;
    
    -- Keys only in left/right
    left_only_count := left_distinct_count - common_count;
    right_only_count := right_distinct_count - common_count;
    
    -- Max right per left key
    EXECUTE format('
        SELECT COALESCE(MAX(cnt), 0) FROM (
            SELECT COUNT(r.%I) as cnt
            FROM %s r
            WHERE r.%I IN (SELECT %I FROM %s)
            GROUP BY r.%I
        ) t',
        right_key_name, right_table_name, right_key_name, left_key_name, left_table_name, right_key_name)
    INTO max_right_per_left_val;
    
    -- Max left per right key 
    EXECUTE format('
        SELECT COALESCE(MAX(cnt), 0) FROM (
            SELECT COUNT(l.%I) as cnt
            FROM %s l
            WHERE l.%I IN (SELECT %I FROM %s)
            GROUP BY l.%I
        ) t',
        left_key_name, left_table_name, left_key_name, right_key_name, right_table_name, left_key_name)
    INTO max_left_per_right_val;
    
    -- Determine relationship type. The no-shared-keys case is checked first,
    -- because both maxima are 0 when the tables have no keys in common.
    IF common_count = 0 THEN
        rel_type := 'no_relation';
    ELSIF max_right_per_left_val <= 1 AND max_left_per_right_val <= 1 THEN
        rel_type := '1:1';
    ELSIF max_left_per_right_val > 1 AND max_right_per_left_val <= 1 THEN
        rel_type := 'N:1';
    ELSIF max_left_per_right_val <= 1 AND max_right_per_left_val > 1 THEN
        rel_type := '1:N';
    ELSE
        rel_type := 'N:M';
    END IF;
    
    RETURN QUERY SELECT 
        rel_type,
        left_only_count,
        right_only_count,
        left_count,
        right_count,
        common_count;
END;
$$ LANGUAGE plpgsql;
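
The classification rule can be sketched in Python on two plain lists of key values. This is an illustration under stated assumptions: `None` stands in for SQL NULL and is ignored, the name `classify_relationship` is hypothetical, and the sketch returns 'no_relation' whenever the two columns share no keys.

```python
from collections import Counter

def classify_relationship(left_keys, right_keys):
    """Classify the key relationship between two tables from their key columns.

    None stands in for SQL NULL and is ignored, as it would be in a join.
    """
    left = Counter(k for k in left_keys if k is not None)
    right = Counter(k for k in right_keys if k is not None)
    common = left.keys() & right.keys()
    if not common:
        return "no_relation"
    max_left = max(left[k] for k in common)    # left rows per shared key
    max_right = max(right[k] for k in common)  # right rows per shared key
    if max_left <= 1 and max_right <= 1:
        return "1:1"
    if max_left > 1 and max_right <= 1:
        return "N:1"
    if max_left <= 1 and max_right > 1:
        return "1:N"
    return "N:M"

# Several orders per customer on the left, one row per customer on the right.
print(classify_relationship([1, 1, 2], [1, 2, 3]))
```

Read the result left to right: N:1 means many left rows can point to one right row, which is the usual shape of a fact table joined to a reference table.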

## Data Exploration


Before developing the dashboard, let's explore the necessary tables and fields, as well as the relationships between them.

### Variable Exploration


#### Sales Schema

##### Table sales.orders

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    sales.orders
LIMIT 5

Unnamed: 0,order_id,customer_id,salesperson_person_id,picked_by_person_id,contact_person_id,backorder_order_id,order_date,expected_delivery_date,customer_purchase_order_number,is_undersupply_backordered,comments,delivery_instructions,internal_comments,picking_completed_when,last_edited_by,last_edited_when
0,2,803,8,,3003,46.0,2013-01-01,2013-01-02,15342,True,,,,2013-01-01 12:00:00,7,2013-01-01 12:00:00
1,3,105,7,,1209,47.0,2013-01-01,2013-01-02,12211,True,,,,2013-01-01 12:00:00,7,2013-01-01 12:00:00
2,4,57,16,3.0,1113,,2013-01-01,2013-01-02,17129,True,,,,2013-01-01 11:00:00,3,2013-01-01 11:00:00
3,5,905,3,,3105,48.0,2013-01-01,2013-01-02,10369,True,,,,2013-01-01 12:00:00,7,2013-01-01 12:00:00
4,6,976,13,3.0,3176,,2013-01-01,2013-01-02,13383,True,,,,2013-01-01 11:00:00,3,2013-01-01 11:00:00


Let's examine, one by one, each column we will use in the dashboard.

**order_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.orders', 'order_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,0,,,,,
1,Unique Count,73595,,,,,
2,Zero,0,,,,,
3,Total Count,73595,,,,,
4,Duplicated,0,,,,,
5,Negative,0,,,,,


**customer_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.orders', 'customer_id', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,0,,Max,1061.0,,90 (150)
1,Unique Count,663,,75%,877.0,,831 (147)
2,Zero,0,,Mean,528.79,,968 (146)
3,Duplicated,72932,,Median,518.0,,405 (145)
4,Negative,0,,25%,160.0,,804 (145)
5,Total Count,73595,,Min,1.0,,143 (144)


**order_date**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.orders', 'order_date', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Negative,,,25%,,,2016-01-06 (133)
1,Duplicated,72526.0,,Median,,,2015-10-19 (127)
2,Total Count,73595.0,,Mean,,,2015-07-06 (126)
3,Zero,,,75%,,,2015-02-03 (125)
4,Unique Count,1069.0,,Min,2013-01-01,,2016-04-28 (123)
5,Missing,0.0,,Max,2016-05-31,,2015-02-23 (122)


**expected_delivery_date**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.orders', 'expected_delivery_date', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Negative,,,25%,,,2015-03-02 (173)
1,Unique Count,891.0,,Median,,,2015-04-06 (167)
2,Total Count,73595.0,,Mean,,,2015-03-30 (164)
3,Zero,,,75%,,,2015-09-14 (164)
4,Duplicated,72704.0,,Min,2013-01-02,,2014-04-28 (163)
5,Missing,0.0,,Max,2016-06-01,,2016-04-11 (163)


**picking_completed_when**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.orders', 'picking_completed_when', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Negative,,,25%,,,2016-02-26 11:00:00 (107)
1,Duplicated,68386.0,,Median,,,2016-04-18 11:00:00 (107)
2,Total Count,73595.0,,Mean,,,2016-01-07 11:00:00 (106)
3,Zero,,,75%,,,2016-05-04 11:00:00 (106)
4,Unique Count,2124.0,,Min,2013-01-01 11:00:00,,2016-02-24 11:00:00 (104)
5,Missing,3085.0,,Max,2016-05-31 12:00:00,,2015-01-21 11:00:00 (103)


Let's look at rows with missing values in picking_completed_when.

In [None]:
%%sql
SELECT
    *
FROM
    sales.orders
WHERE
    picking_completed_when IS NULL
LIMIT 5

Unnamed: 0,order_id,customer_id,salesperson_person_id,picked_by_person_id,contact_person_id,backorder_order_id,order_date,expected_delivery_date,customer_purchase_order_number,is_undersupply_backordered,comments,delivery_instructions,internal_comments,picking_completed_when,last_edited_by,last_edited_when
0,694,430,13,,2059,,2013-01-12,2013-01-14,18641,True,,,,,5,2013-01-12 12:00:00
1,858,197,13,,1393,,2013-01-15,2013-01-16,17999,True,,,,,9,2013-01-15 12:00:00
2,863,538,2,,2275,,2013-01-15,2013-01-16,13574,True,,,,,9,2013-01-15 12:00:00
3,865,926,2,,3126,,2013-01-15,2013-01-16,18066,True,,,,,9,2013-01-15 12:00:00
4,1065,70,20,,1139,,2013-01-19,2013-01-21,12157,True,,,,,19,2013-01-19 12:00:00


**Key Observations:**  

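- The picking_completed_when column has 3,085 missing values. In the sampled rows, picked_by_person_id is missing as well, but that column is also empty for some picked orders (for example, order_id 2 in the first sample), so it is not a reliable indicator on its own. Most likely these orders were never picked, possibly because the items were out of stock.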
- No critical anomalies were found.
- The sales.orders table contains order data from 2013-01-01 to 2016-05-31.
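
The link between the two columns can be checked with a small cross-tabulation. The sketch below uses only the ten rows shown in the two sample outputs for sales.orders, so it illustrates the check rather than proving anything about the full table; `None` stands in for NULL.

```python
from collections import Counter

# (order_id, picked_by_person_id, picking_completed_when), copied from the
# two sample outputs for sales.orders. None stands in for NULL.
sample_rows = [
    (2, None, "2013-01-01 12:00:00"),
    (3, None, "2013-01-01 12:00:00"),
    (4, 3, "2013-01-01 11:00:00"),
    (5, None, "2013-01-01 12:00:00"),
    (6, 3, "2013-01-01 11:00:00"),
    (694, None, None),
    (858, None, None),
    (863, None, None),
    (865, None, None),
    (1065, None, None),
]

# Count rows by (picker missing?, completion time missing?).
crosstab = Counter(
    (picked_by is None, completed is None)
    for _, picked_by, completed in sample_rows
)
for (no_picker, no_completion), n in sorted(crosstab.items()):
    print(f"picker missing={no_picker}, completion missing={no_completion}: {n}")
```

On the full table, the same grouping on the two IS NULL conditions would show whether a missing picker always accompanies a missing completion time.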

##### Table sales.order_lines

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    sales.order_lines
LIMIT 5

Unnamed: 0,order_line_id,order_id,stock_item_id,description,package_type_id,quantity,unit_price,tax_rate,picked_quantity,picking_completed_when,last_edited_by,last_edited_when
0,1,45,164,32 mm Double sided bubble wrap 50m,7,50,112.0,15.0,50,2013-01-02 11:00:00,4,2013-01-02 11:00:00
1,2,1,67,Ride on toy sedan car (Black) 1/12 scale,7,10,230.0,15.0,10,2013-01-01 11:00:00,3,2013-01-01 11:00:00
2,3,2,50,Developer joke mug - old C developers never di...,7,9,13.0,15.0,9,2013-01-01 11:00:00,3,2013-01-01 11:00:00
3,4,46,89,"""The Gu"" red shirt XML tag t-shirt (Black) 3XS",7,72,18.0,15.0,72,2013-01-02 11:00:00,4,2013-01-02 11:00:00
4,5,46,171,32 mm Anti static bubble wrap (Blue) 10m,7,90,32.0,15.0,90,2013-01-02 11:00:00,4,2013-01-02 11:00:00


Let's examine, one by one, each column we will use in the dashboard.

**order_line_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.order_lines', 'order_line_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,0,,,,,
1,Unique Count,231412,,,,,
2,Duplicated,0,,,,,
3,Total Count,231412,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**order_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.order_lines', 'order_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,231412,,,,,
1,Missing,0,,,,,
2,Negative,0,,,,,
3,Zero,0,,,,,
4,Duplicated,157817,,,,,
5,Unique Count,73595,,,,,


**stock_item_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.order_lines', 'stock_item_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,231412,,,,,
1,Missing,0,,,,,
2,Negative,0,,,,,
3,Zero,0,,,,,
4,Duplicated,231185,,,,,
5,Unique Count,227,,,,,


**description**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.order_lines', 'description', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Negative,,,,,,
1,Zero,,,,,,
2,Total Count,231412.0,,,,,
3,Missing,0.0,,,,,
4,Duplicated,231185.0,,,,,
5,Unique Count,227.0,,,,,


**package_type_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.order_lines', 'package_type_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,231412,,,,,
1,Missing,0,,,,,
2,Negative,0,,,,,
3,Duplicated,231408,,,,,
4,Unique Count,4,,,,,
5,Zero,0,,,,,


**quantity**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.order_lines', 'quantity', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,231412,,Max,360.0,,10 (15799)
1,Unique Count,61,,75%,60.0,,5 (12876)
2,Zero,0,,Mean,40.24,,1 (12716)
3,Missing,0,,Median,10.0,,8 (12701)
4,Duplicated,231351,,25%,5.0,,7 (12681)
5,Negative,0,,Min,1.0,,2 (12654)


**unit_price**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.order_lines', 'unit_price', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,231412,,Max,1899.0,,13.00 (44577)
1,Unique Count,62,,75%,32.0,,18.00 (36536)
2,Zero,0,,Mean,45.21,,32.00 (35575)
3,Missing,0,,Median,18.0,,25.00 (13553)
4,Duplicated,231350,,25%,13.0,,30.00 (7307)
5,Negative,0,,Min,0.66,,4.10 (7290)


**tax_rate**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.order_lines', 'tax_rate', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,231412,,Max,15.0,,15.000 (230376)
1,Duplicated,231410,,75%,15.0,,10.000 (1036)
2,Zero,0,,Mean,14.98,,
3,Missing,0,,Median,15.0,,
4,Unique Count,2,,25%,15.0,,
5,Negative,0,,Min,10.0,,


**picked_quantity**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.order_lines', 'picked_quantity', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,231412,,Max,360.0,,10 (15799)
1,Unique Count,62,,75%,60.0,,5 (12876)
2,Zero,3147,,Mean,38.68,,1 (12716)
3,Missing,0,,Median,9.0,,8 (12701)
4,Duplicated,231350,,25%,5.0,,7 (12681)
5,Negative,0,,Min,0.0,,2 (12654)


**picking_completed_when**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.order_lines', 'picking_completed_when', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Negative,,,25%,,,2016-05-04 11:00:00 (395)
1,Zero,,,Median,,,2015-01-21 11:00:00 (390)
2,Total Count,231412.0,,Mean,,,2015-11-24 11:00:00 (388)
3,Duplicated,227196.0,,75%,,,2015-06-26 11:00:00 (386)
4,Unique Count,1069.0,,Min,2013-01-01 11:00:00,,2016-03-23 11:00:00 (386)
5,Missing,3147.0,,Max,2016-05-31 11:00:00,,2015-10-19 11:00:00 (385)


**Key Observations:**  

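- The picking_completed_when column has 3,147 missing values, the same as the number of rows where picked_quantity is zero, which suggests that these lines were never picked.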
- No critical anomalies were found.
- The date range matches the orders table.
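
The summaries above are enough for a rough estimate of how much of the ordered volume was actually picked. The sketch below only does arithmetic on figures reported in the outputs for sales.order_lines; because the means are rounded to two decimals, the fill rate is approximate.

```python
# Figures copied from the column summaries for sales.order_lines.
total_lines = 231_412      # Total Count
zero_picked_lines = 3_147  # Zero count for picked_quantity
mean_quantity = 40.24      # Mean of quantity
mean_picked = 38.68        # Mean of picked_quantity

# Share of order lines where nothing was picked.
share_unpicked = zero_picked_lines / total_lines

# Both means are taken over the same rows, so their ratio equals
# SUM(picked_quantity) / SUM(quantity) up to rounding of the means.
fill_rate = mean_picked / mean_quantity

print(f"lines with nothing picked: {share_unpicked:.2%}")
print(f"approximate unit fill rate: {fill_rate:.1%}")
```

The gap between the two ratios is expected: the first counts lines, the second counts units, so large unpicked lines weigh more in the fill rate.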

##### Table sales.customer_categories

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    sales.customer_categories
LIMIT 5

Unnamed: 0,customer_category_id,customer_category_name,last_edited_by
0,1,Agent,1
1,2,Wholesaler,1
2,3,Novelty Shop,1
3,4,Supermarket,1
4,5,Computer Store,1


Let's examine, one by one, each column we will use in the dashboard.

**customer_category_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customer_categories', 'customer_category_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,8,,,,,
1,Unique Count,8,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**customer_category_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customer_categories', 'customer_category_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,8.0,,,,,
1,Unique Count,8.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**Key Observations:**  

- There are no missing values in the columns we need.
- No critical anomalies were found.

##### Table sales.customers

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    sales.customers
LIMIT 5

Unnamed: 0,customer_id,customer_name,bill_to_customer_id,customer_category_id,buying_group_id,primary_contact_person_id,alternate_contact_person_id,delivery_method_id,delivery_city_id,postal_city_id,...,delivery_run,run_position,website_url,delivery_address_line_1,delivery_address_line_2,delivery_postal_code,postal_address_line_1,postal_address_line_2,postal_postal_code,last_edited_by
0,1,Tailspin Toys (Head Office),1,3,1,1001,1002,3,19586,19586,...,,,http://www.tailspintoys.com,Shop 38,1877 Mittal Road,90410,PO Box 8975,Ribeiroville,90410,1
1,2,"Tailspin Toys (Sylvanite, MT)",1,3,1,1003,1004,3,33475,33475,...,,,http://www.tailspintoys.com/Sylvanite,Shop 245,705 Dita Lane,90216,PO Box 259,Jogiville,90216,1
2,3,"Tailspin Toys (Peeples Valley, AZ)",1,3,1,1005,1006,3,26483,26483,...,,,http://www.tailspintoys.com/PeeplesValley,Unit 217,1970 Khandke Road,90205,PO Box 3648,Lucescuville,90205,1
3,4,"Tailspin Toys (Medicine Lodge, KS)",1,3,1,1007,1008,3,21692,21692,...,,,http://www.tailspintoys.com/MedicineLodge,Suite 164,967 Riutta Boulevard,90152,PO Box 5065,Maciasville,90152,1
4,5,"Tailspin Toys (Gasport, NY)",1,3,1,1009,1010,3,12748,12748,...,,,http://www.tailspintoys.com/Gasport,Unit 176,1674 Skujins Boulevard,90261,PO Box 6294,Kellnerovaville,90261,1


Let's examine, one by one, each column we will use in the dashboard.

**customer_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customers', 'customer_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,663,,,,,
1,Unique Count,663,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**customer_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customers', 'customer_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,663.0,,,,,
1,Unique Count,663.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**customer_category_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customers', 'customer_category_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,663,,,,,
1,Unique Count,5,,,,,
2,Missing,0,,,,,
3,Duplicated,658,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**buying_group_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customers', 'buying_group_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,663,,,,,
1,Unique Count,2,,,,,
2,Missing,261,,,,,
3,Duplicated,400,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


Let's look at rows with missing values in buying_group_id.

In [None]:
%%sql
SELECT
    *
FROM
    sales.customers
WHERE
    buying_group_id IS NULL
LIMIT 5

Unnamed: 0,customer_id,customer_name,bill_to_customer_id,customer_category_id,buying_group_id,primary_contact_person_id,alternate_contact_person_id,delivery_method_id,delivery_city_id,postal_city_id,...,delivery_run,run_position,website_url,delivery_address_line_1,delivery_address_line_2,delivery_postal_code,postal_address_line_1,postal_address_line_2,postal_postal_code,last_edited_by
0,801,Eric Torres,801,7,,3001,,3,31321,31321,...,,,http://www.microsoft.com/EricTorres/,Unit 26,1772 Allu Street,90218,PO Box 4858,Sandhuville,90218,1
1,802,Cosmina Vlad,802,7,,3002,,3,5192,5192,...,,,http://www.microsoft.com/CosminaVlad/,Suite 9,908 Nadar Lane,90602,PO Box 1954,Gonzalesville,90602,15
2,803,Bala Dixit,803,3,,3003,,3,33799,33799,...,,,http://www.microsoft.com/BalaDixit/,Unit 7,844 Magnusson Lane,90676,PO Box 8565,Blahoville,90676,1
3,804,Aleksandrs Riekstins,804,5,,3004,,3,18069,18069,...,,,http://www.microsoft.com/AleksandrsRiekstins/,Shop 20,498 Bagheri Lane,90797,PO Box 6490,Linnaville,90797,1
4,805,Ratan Poddar,805,3,,3005,,3,10194,10194,...,,,http://www.microsoft.com/RatanPoddar/,Shop 16,1071 Goransson Crescent,90457,PO Box 6237,Shakibaville,90457,1


**delivery_method_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customers', 'delivery_method_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,663,,,,,
1,Unique Count,1,,,,,
2,Missing,0,,,,,
3,Duplicated,662,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**delivery_city_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customers', 'delivery_city_id', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,663,,Max,38184.0,,16702 (2)
1,Unique Count,655,,75%,28468.0,,242 (2)
2,Missing,0,,Mean,19033.07,,26010 (2)
3,Duplicated,8,,Median,19232.0,,29320 (2)
4,Zero,0,,25%,9369.5,,31685 (2)
5,Negative,0,,Min,15.0,,33832 (2)


**Key Observations:**  

- buying_group_id is missing for 261 of the 663 customers. These are presumably customers that do not belong to any buying group.
- delivery_method_id has a single value for all customers.
- No critical anomalies were found.
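
The summary counts above appear to follow a simple identity: Total Count = Unique Count + Duplicated + Missing. The sketch below checks that identity against the numbers reported for sales.customers. The identity is inferred from the outputs, not from the definition of `get_column_summary`, so treat it as an assumption.

```python
# Summary counts copied from the get_column_summary outputs for sales.customers.
summaries = {
    "customer_category_id": {"total": 663, "unique": 5, "missing": 0, "duplicated": 658},
    "buying_group_id": {"total": 663, "unique": 2, "missing": 261, "duplicated": 400},
    "delivery_method_id": {"total": 663, "unique": 1, "missing": 0, "duplicated": 662},
    "delivery_city_id": {"total": 663, "unique": 655, "missing": 0, "duplicated": 8},
}


def is_consistent(summary):
    """Assumed identity: total = unique + duplicated + missing."""
    return summary["unique"] + summary["duplicated"] + summary["missing"] == summary["total"]


results = {column: is_consistent(summary) for column, summary in summaries.items()}
for column, ok in results.items():
    print(column, "OK" if ok else "MISMATCH")

# Share of customers without a buying group.
buying_group = summaries["buying_group_id"]
missing_share = buying_group["missing"] / buying_group["total"]
print(f"buying_group_id missing share: {missing_share:.1%}")
```

A mismatch in such a check would point to a transcription error or to a different counting convention in the summary function.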

##### Table sales.invoices

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    sales.invoices
LIMIT 5

Unnamed: 0,invoice_id,customer_id,bill_to_customer_id,order_id,delivery_method_id,contact_person_id,accounts_person_id,salesperson_person_id,packed_by_person_id,invoice_date,...,internal_comments,total_dry_items,total_chiller_items,delivery_run,run_position,returned_delivery_data,confirmed_delivery_time,confirmed_received_by,last_edited_by,last_edited_when
0,1,832,832,1,3,3032,3032,2,14,2013-01-01,...,,1,0,,,"{""Events"": [{ ""Event"":""Ready for collection"",""...",2013-01-02 07:05:00,Aakriti Byrraju,15,2013-01-02 07:00:00
1,2,803,803,2,3,3003,3003,8,14,2013-01-01,...,,2,0,,,"{""Events"": [{ ""Event"":""Ready for collection"",""...",2013-01-02 07:10:00,Bala Dixit,15,2013-01-02 07:00:00
2,3,105,1,3,3,1209,1001,7,14,2013-01-01,...,,1,0,,,"{""Events"": [{ ""Event"":""Ready for collection"",""...",2013-01-02 07:15:00,Sung-Hwan Hwang,15,2013-01-02 07:00:00
3,4,57,1,4,3,1113,1001,16,14,2013-01-01,...,,3,0,,,"{""Events"": [{ ""Event"":""Ready for collection"",""...",2013-01-02 07:20:00,Aile Mae,15,2013-01-02 07:00:00
4,5,905,905,5,3,3105,3105,3,14,2013-01-01,...,,3,0,,,"{""Events"": [{ ""Event"":""Ready for collection"",""...",2013-01-02 07:25:00,Sara Huiting,15,2013-01-02 07:00:00


Let's examine each column we will use for creating the dashboard individually.

**invoice_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoices', 'invoice_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,0,,,,,
1,Unique Count,70510,,,,,
2,Zero,0,,,,,
3,Negative,0,,,,,
4,Duplicated,0,,,,,
5,Total Count,70510,,,,,


**customer_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoices', 'customer_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,70510,,,,,
1,Missing,0,,,,,
2,Negative,0,,,,,
3,Unique Count,663,,,,,
4,Duplicated,69847,,,,,
5,Zero,0,,,,,


**order_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoices', 'order_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,70510,,,,,
1,Missing,0,,,,,
2,Negative,0,,,,,
3,Zero,0,,,,,
4,Duplicated,0,,,,,
5,Unique Count,70510,,,,,


**delivery_method_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoices', 'delivery_method_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,70510,,,,,
1,Missing,0,,,,,
2,Negative,0,,,,,
3,Duplicated,70509,,,,,
4,Unique Count,1,,,,,
5,Zero,0,,,,,


**invoice_date**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoices', 'invoice_date', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Negative,,,25%,,,2016-01-06 (117)
1,Duplicated,69441.0,,Median,,,2016-04-18 (117)
2,Total Count,70510.0,,Mean,,,2015-07-06 (116)
3,Zero,,,75%,,,2016-02-24 (116)
4,Unique Count,1069.0,,Min,2013-01-01,,2016-02-26 (116)
5,Missing,0.0,,Max,2016-05-31,,2016-05-04 (116)


**total_dry_items**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoices', 'total_dry_items', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,70510,,Max,5.0,,3 (17179)
1,Unique Count,6,,75%,4.0,,2 (17024)
2,Zero,16,,Mean,3.22,,4 (16883)
3,Missing,0,,Median,3.0,,5 (13676)
4,Duplicated,70504,,25%,2.0,,1 (5732)
5,Negative,0,,Min,0.0,,0 (16)


**total_chiller_items**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoices', 'total_chiller_items', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,70510,,Max,3.0,,0 (69519)
1,Unique Count,4,,75%,0.0,,1 (948)
2,Zero,69519,,Mean,0.01,,2 (41)
3,Duplicated,70506,,Median,0.0,,3 (2)
4,Negative,0,,25%,0.0,,
5,Missing,0,,Min,0.0,,


**Key Observations:**  

- There are no missing values in the columns we need.
- No critical anomalies were found.
- invoice_date ranges from 2013-01-01 to 2016-05-31, which matches the orders table.

##### Table sales.invoice_lines

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    sales.invoice_lines
LIMIT 5

Unnamed: 0,invoice_line_id,invoice_id,stock_item_id,description,package_type_id,quantity,unit_price,tax_rate,tax_amount,line_profit,extended_price,last_edited_by,last_edited_when
0,1,1,67,Ride on toy sedan car (Black) 1/12 scale,7,10,230.0,15.0,345.0,850.0,2645.0,7,2013-01-01 12:00:00
1,2,2,50,Developer joke mug - old C developers never di...,7,9,13.0,15.0,17.55,76.5,134.55,7,2013-01-01 12:00:00
2,3,2,10,USB food flash drive - chocolate bar,7,9,32.0,15.0,43.2,180.0,331.2,7,2013-01-01 12:00:00
3,4,3,114,Superhero action jacket (Blue) XXL,7,3,30.0,15.0,13.5,24.0,103.5,7,2013-01-01 12:00:00
4,5,4,206,Permanent marker black 5mm nib (Black) 5mm,7,96,2.7,15.0,38.88,96.0,298.08,7,2013-01-01 12:00:00


Let's examine each column we will use for creating the dashboard individually.

**invoice_line_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'invoice_line_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,0,,,,,
1,Unique Count,228265,,,,,
2,Duplicated,0,,,,,
3,Total Count,228265,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**invoice_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'invoice_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,228265,,,,,
1,Missing,0,,,,,
2,Negative,0,,,,,
3,Zero,0,,,,,
4,Duplicated,157755,,,,,
5,Unique Count,70510,,,,,


**stock_item_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'stock_item_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,228265,,,,,
1,Missing,0,,,,,
2,Negative,0,,,,,
3,Zero,0,,,,,
4,Duplicated,228038,,,,,
5,Unique Count,227,,,,,


**description**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'description', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Negative,,,,,,
1,Zero,,,,,,
2,Total Count,228265.0,,,,,
3,Missing,0.0,,,,,
4,Unique Count,227.0,,,,,
5,Duplicated,228038.0,,,,,


**package_type_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'package_type_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,228265,,,,,
1,Missing,0,,,,,
2,Negative,0,,,,,
3,Duplicated,228261,,,,,
4,Unique Count,4,,,,,
5,Zero,0,,,,,


**quantity**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'quantity', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,228265,,Max,360.0,,10 (15799)
1,Unique Count,61,,75%,60.0,,5 (12876)
2,Zero,0,,Mean,39.21,,1 (12716)
3,Missing,0,,Median,10.0,,8 (12701)
4,Duplicated,228204,,25%,5.0,,7 (12681)
5,Negative,0,,Min,1.0,,2 (12654)


**unit_price**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'unit_price', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,228265,,Max,1899.0,,13.00 (44577)
1,Duplicated,228203,,75%,32.0,,32.00 (35177)
2,Zero,0,,Mean,45.59,,18.00 (34326)
3,Missing,0,,Median,18.0,,25.00 (13553)
4,Unique Count,62,,25%,13.0,,30.00 (7307)
5,Negative,0,,Min,0.66,,4.10 (7290)


**tax_rate**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'tax_rate', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,228265,,Max,15.0,,15.000 (227229)
1,Duplicated,228263,,75%,15.0,,10.000 (1036)
2,Zero,0,,Mean,14.98,,
3,Missing,0,,Median,15.0,,
4,Unique Count,2,,25%,15.0,,
5,Negative,0,,Min,10.0,,


**tax_amount**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'tax_amount', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,228265,,Max,2848.5,,13.65 (4613)
1,Duplicated,227772,,75%,129.6,,1.95 (4580)
2,Negative,0,,Mean,112.95,,5.85 (4514)
3,Missing,0,,Median,34.5,,3.90 (4505)
4,Unique Count,493,,25%,14.4,,17.55 (4469)
5,Zero,0,,Min,0.38,,15.60 (4451)


**line_profit**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'line_profit', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,228265,,Max,9200.0,,85.00 (5056)
1,Unique Count,570,,75%,390.0,,68.00 (4662)
2,Negative,4626,,Mean,375.57,,59.50 (4613)
3,Duplicated,227695,,Median,120.0,,25.50 (4610)
4,Zero,0,,25%,51.0,,17.00 (4598)
5,Missing,0,,Min,-645.0,,8.50 (4580)


**extended_price**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.invoice_lines', 'extended_price', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,228265,,Max,21838.5,,104.65 (4613)
1,Duplicated,227768,,75%,993.6,,14.95 (4580)
2,Zero,0,,Mean,867.6,,44.85 (4514)
3,Unique Count,497,,Median,264.5,,29.90 (4505)
4,Negative,0,,25%,110.4,,134.55 (4469)
5,Missing,0,,Min,2.88,,119.60 (4451)


**Key Observations:**  

- There are no missing values in the columns we need.
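- No critical anomalies were found.
- line_profit is negative in 4,626 of 228,265 rows (minimum -645.0), which suggests that some lines were sold below cost. This is worth keeping in mind when aggregating profit.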
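
The monetary columns in the sample rows look internally consistent: tax_amount seems to equal quantity × unit_price × tax_rate / 100, and extended_price seems to equal the net amount plus tax. The sketch below checks this on the five sample rows shown above. The formula is an assumption inferred from those rows and has not been verified against the full table.

```python
from decimal import Decimal

# (quantity, unit_price, tax_rate, tax_amount, extended_price), copied from
# the five sample rows of sales.invoice_lines shown above.
sample_rows = [
    ("10", "230.0", "15.0", "345.0", "2645.0"),
    ("9", "13.0", "15.0", "17.55", "134.55"),
    ("9", "32.0", "15.0", "43.2", "331.2"),
    ("3", "30.0", "15.0", "13.5", "103.5"),
    ("96", "2.7", "15.0", "38.88", "298.08"),
]

CENT = Decimal("0.01")


def line_is_consistent(quantity, unit_price, tax_rate, tax_amount, extended_price):
    """Return True if tax_amount and extended_price follow from the other fields."""
    net = Decimal(quantity) * Decimal(unit_price)
    expected_tax = (net * Decimal(tax_rate) / 100).quantize(CENT)
    expected_total = (net + expected_tax).quantize(CENT)
    return expected_tax == Decimal(tax_amount) and expected_total == Decimal(extended_price)


checks = [line_is_consistent(*row) for row in sample_rows]
print(checks)
```

`Decimal` is used instead of floats so that values such as 2.7 do not introduce binary rounding noise into the comparison.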

##### Table sales.customer_transactions

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    sales.customer_transactions
LIMIT 5

Unnamed: 0,customer_transaction_id,customer_id,transaction_type_id,invoice_id,payment_method_id,transaction_date,amount_excluding_tax,tax_amount,transaction_amount,outstanding_balance,finalization_date,is_finalized,last_edited_by,last_edited_when
0,5,803,1,2,,2013-01-01,405.0,60.75,465.75,0.0,2013-01-02,True,10,2013-01-02 11:30:00
1,7,1,1,3,,2013-01-01,90.0,13.5,103.5,0.0,2013-01-02,True,10,2013-01-02 11:30:00
2,11,1,1,4,,2013-01-01,445.2,66.78,511.98,0.0,2013-01-02,True,10,2013-01-02 11:30:00
3,15,905,1,5,,2013-01-01,704.0,105.6,809.6,0.0,2013-01-02,True,10,2013-01-02 11:30:00
4,19,976,1,6,,2013-01-01,430.0,64.5,494.5,0.0,2013-01-02,True,10,2013-01-02 11:30:00


Let's examine each column we will use for creating the dashboard individually.

**customer_transaction_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customer_transactions', 'customer_transaction_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,0,,,,,
1,Total Count,97147,,,,,
2,Duplicated,0,,,,,
3,Zero,0,,,,,
4,Unique Count,97147,,,,,
5,Negative,0,,,,,


**customer_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customer_transactions', 'customer_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,0,,,,,
1,Total Count,97147,,,,,
2,Negative,0,,,,,
3,Unique Count,263,,,,,
4,Duplicated,96884,,,,,
5,Zero,0,,,,,


**transaction_type_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customer_transactions', 'transaction_type_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,0,,,,,
1,Total Count,97147,,,,,
2,Negative,0,,,,,
3,Zero,0,,,,,
4,Unique Count,2,,,,,
5,Duplicated,97145,,,,,


**invoice_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customer_transactions', 'invoice_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,26637,,,,,
1,Total Count,97147,,,,,
2,Negative,0,,,,,
3,Zero,0,,,,,
4,Duplicated,0,,,,,
5,Unique Count,70510,,,,,


**payment_method_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customer_transactions', 'payment_method_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,70510,,,,,
1,Total Count,97147,,,,,
2,Negative,0,,,,,
3,Duplicated,26636,,,,,
4,Unique Count,1,,,,,
5,Zero,0,,,,,


**transaction_date**

In [None]:
%%sql
SELECT * FROM get_column_summary('sales.customer_transactions', 'transaction_date', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Negative,,,25%,,,2016-01-07 (164)
1,Duplicated,95900.0,,Median,,,2015-11-24 (159)
2,Total Count,97147.0,,Mean,,,2015-07-07 (157)
3,Zero,,,75%,,,2016-03-22 (155)
4,Unique Count,1247.0,,Min,2013-01-01,,2016-01-06 (154)
5,Missing,0.0,,Max,2016-05-31,,2015-07-23 (151)


**Key Observations:**

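- invoice_id is missing in 26,637 of 97,147 rows. This is expected, as not every transaction is tied to an invoice.
- payment_method_id is missing in 70,510 rows and has only one unique value. This is also expected, as not every transaction is a payment. The number of rows without a payment method equals the number of rows with an invoice_id (70,510), so invoice rows and payment rows appear to be complementary.
- customer_id has only 263 unique values, compared with 663 in sales.invoices. The sample rows suggest that transactions are recorded against the bill-to customer: invoice 3 belongs to customer 105, but its transaction is recorded for customer 1, which is that invoice's bill_to_customer_id.
- The date range (2013-01-01 to 2016-05-31) matches the orders table.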
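
One way to confirm that the missing values in invoice_id and payment_method_id are structural is to break them down by transaction_type_id. The sketch below shows the shape of such a query on an in-memory SQLite table with toy rows. The rows, and the use of type 3 for the second transaction type, are illustrative assumptions and not data from the WWI database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_transactions (
        customer_transaction_id INTEGER PRIMARY KEY,
        transaction_type_id INTEGER NOT NULL,
        invoice_id INTEGER,
        payment_method_id INTEGER
    )
""")

# Toy rows: invoice-type rows carry an invoice_id, payment-type rows carry a payment method.
conn.executemany(
    "INSERT INTO customer_transactions VALUES (?, ?, ?, ?)",
    [
        (1, 1, 101, None),
        (2, 1, 102, None),
        (3, 1, 103, None),
        (4, 3, None, 4),
        (5, 3, None, 4),
    ],
)

# COUNT(column) skips NULLs, so COUNT(*) - COUNT(column) is the number of missing values.
null_breakdown = conn.execute("""
    SELECT
        transaction_type_id,
        COUNT(*) AS n_rows,
        COUNT(*) - COUNT(invoice_id) AS missing_invoice_id,
        COUNT(*) - COUNT(payment_method_id) AS missing_payment_method_id
    FROM customer_transactions
    GROUP BY transaction_type_id
    ORDER BY transaction_type_id
""").fetchall()
print(null_breakdown)
```

If the same pattern holds in the real table, each transaction type would show missing values in exactly one of the two columns.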

#### Application Schema

##### Table application.countries

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    application.countries
LIMIT 5

Unnamed: 0,country_id,country_name,formal_name,iso_alpha_3_code,iso_numeric_code,country_type,latest_recorded_population,continent,region,subregion,last_edited_by
0,1,Afghanistan,Islamic State of Afghanistan,AFG,4,UN Member State,28400000,Asia,Asia,Southern Asia,1
1,3,Albania,Republic of Albania,ALB,8,UN Member State,3785031,Europe,Europe,Southern Europe,20
2,4,Algeria,People's Democratic Republic of Algeria,DZA,12,UN Member State,34178188,Africa,Africa,Northern Africa,1
3,6,Andorra,Principality of Andorra,AND,20,UN Member State,87243,Europe,Europe,Southern Europe,15
4,7,Angola,People's Republic of Angola,AGO,24,UN Member State,12799293,Africa,Africa,Middle Africa,1


Let's examine each column we will use for creating the dashboard individually.

**country_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.countries', 'country_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,190,,,,,
1,Unique Count,190,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**country_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.countries', 'country_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,190.0,,,,,
1,Unique Count,190.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**iso_alpha_3_code**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.countries', 'iso_alpha_3_code', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,190.0,,,,,
1,Unique Count,190.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**Key Observations:**  

- No missing values in the columns we need.
- No critical anomalies were found.

##### Table application.state_provinces

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    application.state_provinces
LIMIT 5

Unnamed: 0,state_province_id,state_province_code,state_province_name,country_id,sales_territory,latest_recorded_population,last_edited_by
0,1,AL,Alabama,230,Southeast,5437278,15
1,2,AK,Alaska,230,Far West,735132,1
2,3,AZ,Arizona,230,Southwest,6891688,8
3,4,AR,Arkansas,230,Southeast,3077747,8
4,5,CA,California,230,Far West,41460453,15


Let's examine each column we will use for creating the dashboard individually.

**state_province_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.state_provinces', 'state_province_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,53,,,,,
1,Unique Count,53,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**state_province_code**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.state_provinces', 'state_province_code', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,53.0,,,,,
1,Unique Count,53.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**state_province_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.state_provinces', 'state_province_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,53.0,,,,,
1,Unique Count,53.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**country_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.state_provinces', 'country_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,53,,,,,
1,Unique Count,1,,,,,
2,Missing,0,,,,,
3,Duplicated,52,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**sales_territory**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.state_provinces', 'sales_territory', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,53.0,,,,,
1,Unique Count,9.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,44.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**latest_recorded_population**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.state_provinces', 'latest_recorded_population', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,53,,,,,
1,Unique Count,53,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**Key Observations:**

- No missing values in the columns we need.
- No critical anomalies were found.
- All states in the application.state_provinces table are from the USA.

##### Table application.cities

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    application.cities
LIMIT 5

Unnamed: 0,city_id,city_name,state_province_id,latest_recorded_population,last_edited_by
0,1,Aaronsburg,39,613,1
1,3,Abanda,1,192,1
2,4,Abbeville,42,5237,1
3,5,Abbeville,11,2908,1
4,6,Abbeville,1,2688,1


Let's examine each column we will use for creating the dashboard individually.

**city_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.cities', 'city_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,0,,,,,
1,Total Count,37940,,,,,
2,Duplicated,0,,,,,
3,Negative,0,,,,,
4,Unique Count,37940,,,,,
5,Zero,0,,,,,


**city_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.cities', 'city_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Negative,,,,,,
1,Zero,,,,,,
2,Missing,0.0,,,,,
3,Total Count,37940.0,,,,,
4,Unique Count,23279.0,,,,,
5,Duplicated,14661.0,,,,,


**state_province_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.cities', 'state_province_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,0,,,,,
1,Total Count,37940,,,,,
2,Negative,0,,,,,
3,Zero,0,,,,,
4,Unique Count,53,,,,,
5,Duplicated,37887,,,,,


**latest_recorded_population**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.cities', 'latest_recorded_population', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Missing,11048,,,,,
1,Total Count,37940,,,,,
2,Negative,0,,,,,
3,Zero,14,,,,,
4,Duplicated,17568,,,,,
5,Unique Count,9324,,,,,


**Key Observations:**  

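- latest_recorded_population is missing for 11,048 of 37,940 cities and is zero for 14.
- city_name is not unique (23,279 unique names for 37,940 cities), so cities should be joined on city_id rather than on name.
- No critical anomalies were found.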

##### Table application.delivery_methods

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    application.delivery_methods
LIMIT 5

Unnamed: 0,delivery_method_id,delivery_method_name,last_edited_by
0,1,Post,1
1,2,Courier,1
2,3,Delivery Van,1
3,4,Customer Collect,1
4,5,Chilled Van,16


Let's examine each column we will use for creating the dashboard individually.

**delivery_method_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.delivery_methods', 'delivery_method_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,10,,,,,
1,Unique Count,10,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**delivery_method_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.delivery_methods', 'delivery_method_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,10.0,,,,,
1,Unique Count,10.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**Key Observations:**  

- No missing values in the columns we need.
- No critical anomalies were found.

##### Table application.payment_methods

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    application.payment_methods
LIMIT 5

Unnamed: 0,payment_method_id,payment_method_name,last_edited_by
0,1,Cash,1
1,2,Check,1
2,3,Credit-Card,9
3,4,EFT,1


Let's examine each column we will use for creating the dashboard individually.

**payment_method_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.payment_methods', 'payment_method_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,4,,,,,
1,Unique Count,4,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**payment_method_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.payment_methods', 'payment_method_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,4.0,,,,,
1,Unique Count,4.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**Key Observations:**  

- No missing values in the columns we need.
- No critical anomalies were found.

##### Table application.transaction_types

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    application.transaction_types
LIMIT 5

Unnamed: 0,transaction_type_id,transaction_type_name,last_edited_by
0,1,Customer Invoice,1
1,2,Customer Credit Note,1
2,3,Customer Payment Received,1
3,4,Customer Refund,1
4,5,Supplier Invoice,1


Let's examine each column we will use for creating the dashboard individually.

**transaction_type_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.transaction_types', 'transaction_type_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,13,,,,,
1,Unique Count,13,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**transaction_type_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('application.transaction_types', 'transaction_type_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,13.0,,,,,
1,Unique Count,13.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**Key Observations:**  

- No missing values in the columns we need.
- No critical anomalies were found.

#### Warehouse Schema

##### Table warehouse.stock_items

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    warehouse.stock_items
LIMIT 5

Unnamed: 0,stock_item_id,stock_item_name,supplier_id,color_id,unit_package_id,outer_package_id,brand,size,lead_time_days,quantity_per_outer,...,unit_price,recommended_retail_price,typical_weight_per_unit,marketing_comments,internal_comments,photo,custom_fields,tags,search_details,last_edited_by
0,1,USB missile launcher (Green),12,,7,7,,,14,1,...,25.0,37.38,0.3,Complete with 12 projectiles,,,"{ ""CountryOfManufacture"": ""China"", ""Tags"": [""U...","[""USB Powered""]",USB missile launcher (Green) Complete with 12 ...,1
1,2,USB rocket launcher (Gray),12,12.0,7,7,,,14,1,...,25.0,37.38,0.3,Complete with 12 projectiles,,,"{ ""CountryOfManufacture"": ""China"", ""Tags"": [""U...","[""USB Powered""]",USB rocket launcher (Gray) Complete with 12 pr...,1
2,3,Office cube periscope (Black),12,3.0,7,6,,,14,10,...,18.5,27.66,0.25,Need to see over your cubicle wall? This is ju...,,,"{ ""CountryOfManufacture"": ""China"", ""Tags"": [] }",[],Office cube periscope (Black) Need to see over...,1
3,4,USB food flash drive - sushi roll,12,,7,7,,,14,1,...,32.0,47.84,0.05,,,,"{ ""CountryOfManufacture"": ""Japan"", ""Tags"": [""3...","[""32GB"",""USB Powered""]",USB food flash drive - sushi roll,1
4,5,USB food flash drive - hamburger,12,,7,7,,,14,1,...,32.0,47.84,0.05,,,,"{ ""CountryOfManufacture"": ""Japan"", ""Tags"": [""1...","[""16GB"",""USB Powered""]",USB food flash drive - hamburger,1


Let's examine each column we will use for creating the dashboard individually.

**stock_item_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_items', 'stock_item_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,227,,,,,
1,Unique Count,227,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**stock_item_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_items', 'stock_item_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,227.0,,,,,
1,Unique Count,227.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**color_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_items', 'color_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,227,,,,,
1,Unique Count,7,,,,,
2,Missing,99,,,,,
3,Duplicated,121,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**unit_package_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_items', 'unit_package_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,227,,,,,
1,Unique Count,4,,,,,
2,Missing,0,,,,,
3,Duplicated,223,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**outer_package_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_items', 'outer_package_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,227,,,,,
1,Unique Count,3,,,,,
2,Missing,0,,,,,
3,Duplicated,224,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**brand**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_items', 'brand', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,227.0,,Max,Northwind,,Northwind (18)
1,Unique Count,1.0,,75%,,,
2,Missing,209.0,,Mean,,,
3,Duplicated,17.0,,Median,,,
4,Zero,,,25%,,,
5,Negative,,,Min,Northwind,,


**size**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_items', 'size', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,227.0,,Max,XXS,,XL (12)
1,Unique Count,43.0,,75%,,,L (11)
2,Missing,64.0,,Mean,,,M (11)
3,Duplicated,120.0,,Median,,,S (11)
4,Zero,,,25%,,,1/12 scale (9)
5,Negative,,,Min,1.5m,,1/50 scale (9)


**tax_rate**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_items', 'tax_rate', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,227,,Max,15.0,,15.000 (219)
1,Unique Count,2,,75%,15.0,,10.000 (8)
2,Missing,0,,Mean,14.82,,
3,Duplicated,225,,Median,15.0,,
4,Zero,0,,25%,15.0,,
5,Negative,0,,Min,10.0,,


**unit_price**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_items', 'unit_price', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,227,,Max,1899.0,,13.00 (42)
1,Unique Count,57,,75%,32.0,,18.00 (35)
2,Missing,0,,Mean,44.16,,32.00 (34)
3,Duplicated,170,,Median,18.0,,25.00 (13)
4,Zero,0,,25%,13.0,,30.00 (7)
5,Negative,0,,Min,0.66,,4.10 (7)


**typical_weight_per_unit**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_items', 'typical_weight_per_unit', False);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,227,,Max,21.0,,0.150 (42)
1,Unique Count,23,,75%,0.7,,0.400 (28)
2,Missing,0,,Mean,1.83,,0.350 (25)
3,Duplicated,204,,Median,0.35,,0.300 (21)
4,Zero,0,,25%,0.15,,0.250 (18)
5,Negative,0,,Min,0.05,,0.500 (13)


**Key Observations:**

- Not all items have a brand, color, or size: brand is missing for 209 of 227 items, color_id for 99, and size for 64.
- No critical anomalies were found.
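
Because color_id is missing for many stock items, an inner join to warehouse.colors would silently drop those items. The sketch below illustrates the difference on an in-memory SQLite database with two items taken from the sample rows above. The `'No color'` label is a hypothetical placeholder, and the schema prefix is omitted because SQLite does not use schemas this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE colors (color_id INTEGER PRIMARY KEY, color_name TEXT);
    CREATE TABLE stock_items (
        stock_item_id INTEGER PRIMARY KEY,
        stock_item_name TEXT,
        color_id INTEGER
    );
    INSERT INTO colors VALUES (3, 'Black');
    INSERT INTO stock_items VALUES
        (1, 'USB missile launcher (Green)', NULL),
        (3, 'Office cube periscope (Black)', 3);
""")

# Inner join: the item without a color_id disappears from the result.
inner_rows = conn.execute("""
    SELECT si.stock_item_id, c.color_name
    FROM stock_items AS si
    JOIN colors AS c ON c.color_id = si.color_id
    ORDER BY si.stock_item_id
""").fetchall()

# Left join: every item is kept, and a missing color gets a placeholder label.
left_rows = conn.execute("""
    SELECT si.stock_item_id, COALESCE(c.color_name, 'No color') AS color_name
    FROM stock_items AS si
    LEFT JOIN colors AS c ON c.color_id = si.color_id
    ORDER BY si.stock_item_id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The same consideration applies to any optional attribute that is resolved through a lookup table.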

##### Table warehouse.stock_item_stock_groups

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    warehouse.stock_item_stock_groups
LIMIT 5

Unnamed: 0,stock_item_stock_group_id,stock_item_id,stock_group_id,last_edited_by,last_edited_when
0,1,1,6,1,2013-01-01
1,2,1,1,1,2013-01-01
2,3,1,7,1,2013-01-01
3,4,2,6,1,2013-01-01
4,5,2,1,1,2013-01-01


Let's examine each column we will use for creating the dashboard individually.

**stock_item_stock_group_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_item_stock_groups', 'stock_item_stock_group_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,442,,,,,
1,Unique Count,442,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**stock_item_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_item_stock_groups', 'stock_item_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,442,,,,,
1,Unique Count,227,,,,,
2,Missing,0,,,,,
3,Duplicated,215,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**stock_group_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_item_stock_groups', 'stock_group_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,442,,,,,
1,Unique Count,9,,,,,
2,Missing,0,,,,,
3,Duplicated,433,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**Key Observations:**  

- No missing values in the columns we need.
- No critical anomalies were found.

##### Table warehouse.stock_groups

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    warehouse.stock_groups
LIMIT 5

Unnamed: 0,stock_group_id,stock_group_name,last_edited_by
0,1,Novelty Items,1
1,2,Clothing,1
2,3,Mugs,1
3,4,T-Shirts,1
4,5,Airline Novelties,1


Let's examine each column we will use for creating the dashboard individually.

**stock_group_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_groups', 'stock_group_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,10,,,,,
1,Unique Count,10,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**stock_group_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.stock_groups', 'stock_group_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,10.0,,,,,
1,Unique Count,10.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**Key Observations:**  

- No missing values in the columns we need.
- No critical anomalies were found.

##### Table warehouse.package_types

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    warehouse.package_types
LIMIT 5

Unnamed: 0,package_type_id,package_type_name,last_edited_by
0,1,Bag,1
1,2,Block,1
2,3,Bottle,1
3,4,Box,1
4,5,Can,1


Let's examine each column we will use for creating the dashboard individually.

**package_type_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.package_types', 'package_type_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,14,,,,,
1,Unique Count,14,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**package_type_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.package_types', 'package_type_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,14.0,,,,,
1,Unique Count,14.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**Key Observations:**  

- No missing values in the columns we need.
- No critical anomalies were found.

##### Table warehouse.colors

Let's look at the rows.

In [None]:
%%sql
SELECT
    *
FROM
    warehouse.colors
LIMIT 5

Unnamed: 0,color_id,color_name,last_edited_by
0,1,Azure,1
1,2,Beige,1
2,3,Black,1
3,4,Blue,1
4,5,Charcoal,1


Let's examine each column we will use for creating the dashboard individually.

**color_id**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.colors', 'color_id', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,36,,,,,
1,Unique Count,36,,,,,
2,Missing,0,,,,,
3,Duplicated,0,,,,,
4,Zero,0,,,,,
5,Negative,0,,,,,


**color_name**

In [None]:
%%sql
SELECT * FROM get_column_summary('warehouse.colors', 'color_name', True);

Unnamed: 0,Summary Type,Summary Count,-,Stats Type,Stats Value,--,Top Values
0,Total Count,36.0,,,,,
1,Unique Count,36.0,,,,,
2,Missing,0.0,,,,,
3,Duplicated,0.0,,,,,
4,Zero,,,,,,
5,Negative,,,,,,


**Key Observations:**  

- No missing values in the columns we need.
- No critical anomalies were found.

### Exploring Relationships Between Tables

Before joining tables, let's examine how they relate to each other and check for key mismatches.
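
Each relationship summary below reports the relationship type, the number of key values found on only one side, the row counts, and the number of shared key values. The following is a minimal sketch of how such a summary could be derived from two key columns. `key_overlap` is a hypothetical helper, not the notebook's `analyze_relationship`; it assumes that NULL keys are ignored and that a side counts as "1" when its non-NULL keys are unique.

```python
def key_overlap(left_keys, right_keys):
    """Hypothetical helper: compare the key columns of two tables."""
    # Ignore missing keys when comparing (assumption for this sketch).
    left_present = [k for k in left_keys if k is not None]
    right_present = [k for k in right_keys if k is not None]
    left_set, right_set = set(left_present), set(right_present)
    # A side is "1" if every non-missing key appears once, otherwise "N".
    left_side = "1" if len(left_set) == len(left_present) else "N"
    right_side = "1" if len(right_set) == len(right_present) else "N"
    return {
        "relationship_type": f"{left_side}:{right_side}",
        "left_only_keys": len(left_set - right_set),
        "right_only_keys": len(right_set - left_set),
        "left_size": len(left_keys),
        "right_size": len(right_keys),
        "common_keys": len(left_set & right_set),
    }


# Toy data: four orders, six order lines, order 4 has no lines.
result = key_overlap([1, 2, 3, 4], [1, 1, 2, 3, 3, 3])
print(result)
```

A non-zero `left_only_keys` or `right_only_keys` value is not an error by itself; it only means that an inner join on that key will drop rows from that side.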

#### Sales Schema

**sales.orders and sales.order_lines**

In [None]:
%%sql
Select * from analyze_relationship('sales.orders', 'sales.order_lines', 'order_id', 'order_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,0,0,73595,231412,73595


**sales.orders and sales.customers**

In [None]:
%%sql
Select * from analyze_relationship('sales.orders', 'sales.customers', 'customer_id', 'customer_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,N:1,0,0,73595,663,663


**sales.orders and sales.invoices**

In [None]:
%%sql
Select * from analyze_relationship('sales.orders', 'sales.invoices', 'order_id', 'order_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:1,3085,0,73595,70510,70510


**sales.customers and sales.customer_categories**

In [None]:
%%sql
Select * from analyze_relationship('sales.customers', 'sales.customer_categories', 'customer_category_id', 'customer_category_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,N:1,0,3,663,8,5


**sales.invoices and sales.invoice_lines**

In [None]:
%%sql
Select * from analyze_relationship('sales.invoices', 'sales.invoice_lines', 'invoice_id', 'invoice_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,0,0,70510,228265,70510


**sales.customers and sales.customer_transactions**

In [None]:
%%sql
Select * from analyze_relationship('sales.customers', 'sales.customer_transactions', 'customer_id', 'customer_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,400,0,663,97147,263


**sales.customers and sales.invoices**

In [None]:
%%sql
Select * from analyze_relationship('sales.customers', 'sales.invoices', 'customer_id', 'customer_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,0,0,663,70510,663


**sales.invoices and sales.customer_transactions**

In [None]:
%%sql
Select * from analyze_relationship('sales.invoices', 'sales.customer_transactions', 'invoice_id', 'invoice_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:1,0,0,70510,97147,70510


**Key Observations:**

- The sales.orders table contains order_id values that are not present in sales.invoices. This is normal, as not all orders have invoices.
- The sales.customer_categories table contains customer_category_id values that are not present in the sales.customers table. This is also normal, as not all categories have customers assigned.
- The sales.customers table contains customer_id values that are not present in the sales.customer_transactions table. This is also normal, as not all customers have transactions.
- No critical anomalies were found.
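
The counts above show how many keys are unmatched, not which ones. To list them, an anti-join can be used. The sketch below runs one with Python's built-in `sqlite3` module on toy in-memory tables; the rows are invented for illustration, and the unqualified names `orders` and `invoices` stand in for sales.orders and sales.invoices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy data: four orders, of which order 3 has no invoice.
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1), (2), (3), (4);
    INSERT INTO invoices VALUES (10, 1), (11, 2), (12, 4);
""")

# Anti-join: keep only orders for which the LEFT JOIN found no invoice row.
uninvoiced = [row[0] for row in conn.execute("""
    SELECT o.order_id
    FROM orders AS o
    LEFT JOIN invoices AS i ON i.order_id = o.order_id
    WHERE i.order_id IS NULL
    ORDER BY o.order_id
""")]
conn.close()
print(uninvoiced)
```

The same pattern applies to any pair above with a non-zero `left_only_keys` value, with the table and key names swapped accordingly.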

#### Application Schema

**application.countries and application.state_provinces**

In [None]:
%%sql
Select * from analyze_relationship('application.countries', 'application.state_provinces', 'country_id', 'country_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,189,0,190,53,1


**application.state_provinces and application.cities**

In [None]:
%%sql
Select * from analyze_relationship('application.state_provinces', 'application.cities', 'state_province_id', 'state_province_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,0,0,53,37940,53


**application.delivery_methods and sales.customers**

In [None]:
%%sql
Select * from analyze_relationship('application.delivery_methods', 'sales.customers', 'delivery_method_id', 'delivery_method_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,9,0,10,663,1


**application.payment_methods and sales.customer_transactions**

In [None]:
%%sql
Select * from analyze_relationship('application.payment_methods', 'sales.customer_transactions', 'payment_method_id', 'payment_method_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,3,0,4,97147,1


**application.transaction_types and sales.customer_transactions**

In [None]:
%%sql
Select * from analyze_relationship('application.transaction_types', 'sales.customer_transactions', 'transaction_type_id', 'transaction_type_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,11,0,13,97147,2


**Key Observations:**

- The application.countries table contains country_id values that are not present in application.state_provinces. This is normal, as not all countries are represented in state_provinces.
- The application.delivery_methods table contains delivery_method_id values that are not present in the sales.customers table. This is also normal, as not all delivery methods may have been used.
- The application.payment_methods table contains payment_method_id values that are not present in the sales.customer_transactions table. This is also normal, as not all payment methods may have been used.
- The application.transaction_types table contains transaction_type_id values that are not present in the sales.customer_transactions table. This is also normal, as not all transaction types may have been used.
- No critical anomalies were found.

#### Warehouse Schema

**warehouse.stock_items and warehouse.stock_item_stock_groups**

In [None]:
%%sql
Select * from analyze_relationship('warehouse.stock_items', 'warehouse.stock_item_stock_groups', 'stock_item_id', 'stock_item_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,0,0,227,442,227


**warehouse.stock_groups and warehouse.stock_item_stock_groups**

In [None]:
%%sql
Select * from analyze_relationship('warehouse.stock_groups', 'warehouse.stock_item_stock_groups', 'stock_group_id', 'stock_group_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,1:N,1,0,10,442,9


**warehouse.stock_items and warehouse.colors**

In [None]:
%%sql
Select * from analyze_relationship('warehouse.stock_items', 'warehouse.colors', 'color_id', 'color_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,N:1,0,29,227,36,7


**warehouse.stock_items and warehouse.package_types (unit_package_id)**

In [None]:
%%sql
Select * from analyze_relationship('warehouse.stock_items', 'warehouse.package_types', 'unit_package_id', 'package_type_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,N:1,0,10,227,14,4


**warehouse.stock_items and warehouse.package_types (outer_package_id)**

In [None]:
%%sql
Select * from analyze_relationship('warehouse.stock_items', 'warehouse.package_types', 'outer_package_id', 'package_type_id')

Unnamed: 0,relationship_type,left_only_keys,right_only_keys,left_size,right_size,common_keys
0,N:1,0,11,227,14,3


**Key Observations:**

- The warehouse.stock_groups table contains stock_group_id values that are not present in warehouse.stock_item_stock_groups. This is normal, as not all stock groups may be represented.
- The warehouse.colors table contains color_id values that are not present in the warehouse.stock_items table. This is also normal, as not all colors may be used.
- The warehouse.package_types table contains package_type_id values that are not referenced by unit_package_id or outer_package_id in the warehouse.stock_items table. This is also normal, as not all package types may be used.
- No critical anomalies were found.
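
One practical consequence for later joins: warehouse.stock_items relates 1:N to warehouse.stock_item_stock_groups (227 items, 442 link rows), so joining through the link table repeats an item once for every group it belongs to. The sketch below illustrates the effect with Python's built-in `sqlite3` module; the rows and the `unit_price` values are toy data invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy data: item 1 belongs to two groups, item 2 to one group.
conn.executescript("""
    CREATE TABLE stock_items (stock_item_id INTEGER PRIMARY KEY, unit_price REAL);
    CREATE TABLE stock_item_stock_groups (stock_item_id INTEGER, stock_group_id INTEGER);
    INSERT INTO stock_items VALUES (1, 10.0), (2, 20.0);
    INSERT INTO stock_item_stock_groups VALUES (1, 1), (1, 2), (2, 1);
""")

# Total computed directly on the item table.
plain_total = conn.execute(
    "SELECT SUM(unit_price) FROM stock_items"
).fetchone()[0]

# Same total after joining the link table: item 1 is counted twice.
joined_total = conn.execute("""
    SELECT SUM(si.unit_price)
    FROM stock_items AS si
    JOIN stock_item_stock_groups AS g ON g.stock_item_id = si.stock_item_id
""").fetchone()[0]
conn.close()
print(plain_total, joined_total)
```

Totals broken down by stock group are therefore not additive across groups, which is worth keeping in mind when building dashboard measures on top of this join.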