# Pharmacy OTC Sales Data (2022)

## 1. Introduction

The purpose of this notebook is to implement an initial load and an incremental load for the BRONZE stage.

## 2. Environment setup

In [0]:
%sql
USE CATALOG training_catalog;

In [0]:
%sql
USE SCHEMA pharmacy_otc_sales_data_2022_db;

In [0]:
%sql
SELECT current_catalog() AS current_catalog, current_schema() AS current_schema;

## 3. Batch data ingestion with CTAS

This method ingests data as an initial load, using CTAS (CREATE TABLE AS SELECT).

Steps:

1. Create table
2. Ingest data

### 3.1. Exploring data

Exploring files

In [0]:
%sql
LIST "/Volumes/training_catalog/pharmacy_otc_sales_data_2022_db/training_files"

Exploring file content

In [0]:
%sql
SELECT *
FROM read_files(
    "/Volumes/training_catalog/pharmacy_otc_sales_data_2022_db/training_files",
    FORMAT => "CSV"
)
LIMIT 5;

### 3.2. Creating a managed Delta table

Some considerations:

* All columns are defined as **STRING**, because the input may contain malformed values, and **STRING** accepts any value without a cast.
* The **_rescued_data** column is not used, because it is not working properly.
* Data integrity can be validated later by a data profiling process.

In [0]:
%sql
DROP TABLE IF EXISTS pharmacy_sales_bronze;

-- Create table
CREATE TABLE IF NOT EXISTS pharmacy_sales_bronze
AS
SELECT *
FROM read_files(
    "/Volumes/training_catalog/pharmacy_otc_sales_data_2022_db/training_files",
    FORMAT => "CSV",
    HEADER => TRUE,
    SCHEMA => "date STRING, product STRING, sales_person STRING, boxes_shipped STRING, amount STRING, country STRING"
);

-- Preview data
SELECT * FROM pharmacy_sales_bronze LIMIT 5;

In [0]:
%sql
DESCRIBE TABLE EXTENDED pharmacy_sales_bronze;
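
The data profiling mentioned in the considerations above could start with a query like the following. It is a minimal sketch, not part of the original load: it assumes that boxes_shipped is expected to hold whole numbers and that country should never be empty, and the output column names are only illustrative.

```sql
-- Profile the bronze table without changing it.
SELECT
  count(*) AS total_rows,
  -- Values that are present but cannot be cast to an integer (assumed expected type).
  count_if(boxes_shipped IS NOT NULL AND try_cast(boxes_shipped AS INT) IS NULL) AS bad_boxes_shipped,
  -- Missing or blank country values.
  count_if(country IS NULL OR trim(country) = '') AS missing_country
FROM pharmacy_sales_bronze;
```

Because every column is stored as **STRING**, the check uses try_cast, which returns NULL for values that cannot be converted instead of failing the query.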

## 4. Incremental ingestion with COPY INTO

This method ingests data incrementally: the "COPY INTO" instruction only ingests files that it has not already loaded into the target table. If there are no new files, rerunning it does not insert any data.

### 4.1. Exploring data

In [0]:
%sql
LIST "/Volumes/training_catalog/pharmacy_otc_sales_data_2022_db/training_files"

In [0]:
%sql
SELECT *
FROM read_files(
    "/Volumes/training_catalog/pharmacy_otc_sales_data_2022_db/training_files/pharmacy_otc_sales_data_2.csv",
    FORMAT => "CSV"
)
LIMIT 5;
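
Since "COPY INTO" tracks what it has loaded per file, it can help to see how many rows each file in the folder contributes before ingesting. The following is a minimal sketch, assuming the _metadata file metadata column is available through read_files on the runtime in use. The aliases source_file and row_count are illustrative.

```sql
-- Count data rows per source file; HEADER => TRUE keeps header rows out of the count.
SELECT
  _metadata.file_name AS source_file,
  count(*)            AS row_count
FROM read_files(
    "/Volumes/training_catalog/pharmacy_otc_sales_data_2022_db/training_files",
    FORMAT => "CSV",
    HEADER => TRUE
)
GROUP BY _metadata.file_name
ORDER BY source_file;
```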

### 4.2. Ingesting data

Some considerations:

* The "header = false" option is used because the column list in "COPY INTO" (matching columns by position) is only supported for headerless CSV files.
* The "skipRows = 1" option is used to skip the original header row of each file.

In [0]:
%sql
COPY INTO pharmacy_sales_bronze (date, product, sales_person, boxes_shipped, amount, country)
FROM "/Volumes/training_catalog/pharmacy_otc_sales_data_2022_db/training_files/"
FILEFORMAT = csv
FORMAT_OPTIONS (
  "header" = "false",
  "inferSchema" = "false",
  "skipRows" = "1"
);

Rerunning the instruction (no data is inserted)

In [0]:
%sql
COPY INTO pharmacy_sales_bronze (date, product, sales_person, boxes_shipped, amount, country)
FROM "/Volumes/training_catalog/pharmacy_otc_sales_data_2022_db/training_files/"
FILEFORMAT = csv
FORMAT_OPTIONS (
  "header" = "false",
  "inferSchema" = "false",
  "skipRows" = "1"
);
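
To confirm that the rerun inserted nothing, one option is to compare the row count before and after it and to look at the latest entries in the Delta table history. This is a minimal sketch, and the LIMIT of 5 is an arbitrary choice.

```sql
-- The row count should be unchanged after the rerun.
SELECT count(*) AS row_count FROM pharmacy_sales_bronze;

-- Most recent operations on the table, newest first.
DESCRIBE HISTORY pharmacy_sales_bronze LIMIT 5;
```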