# Set Up Data in Snowflake for Tasty Bytes Analytics

## Initial Database Configuration

### Environment Setup
This section configures the necessary Snowflake environment and creates the main database for our Voice of Customer (VOC) analytics platform.

### Security and Resource Configuration
- **Role**: Using `demoadmin` for administrative privileges
- **Warehouse**: Utilizing `compute_wh` for processing power

### Configuration Details

**Role Configuration**
- `demoadmin`: Administrative role with necessary privileges for database creation and management
- Ensures proper security controls and access management

**Warehouse Selection**
- `compute_wh`: Default compute warehouse
- Handles computational resources for database operations

**Database Creation**
- Database Name: `tb_voc`
- Purpose: Centralized storage for Voice of Customer analytics
- `CREATE OR REPLACE`: Drops any existing `tb_voc` database, including everything stored in it, and recreates it empty, which gives a clean installation

### Best Practices
- Always verify role permissions before execution
- Ensure warehouse has adequate resources
- Document any custom configurations
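
One way to act on the first two checks is to inspect the session and the role's grants before running the setup. The sketch below uses the role and warehouse names from this section and assumes your user has already been granted `demoadmin`; it is not part of the original setup script.

```sql
-- Confirm which role and warehouse the session is currently using
SELECT CURRENT_ROLE() AS active_role, CURRENT_WAREHOUSE() AS active_warehouse;

-- List the privileges granted to demoadmin
-- (the CREATE DATABASE privilege on the account is needed for the setup below)
SHOW GRANTS TO ROLE demoadmin;

-- Check the size and state of compute_wh
SHOW WAREHOUSES LIKE 'compute_wh';
```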

In [None]:
USE ROLE demoadmin;
USE WAREHOUSE compute_wh;

-- create tb_voc database
CREATE OR REPLACE DATABASE tb_voc;


## Database Schema Setup

### Overview
This section establishes the foundational database structure for our Voice of Customer (VOC) analysis system. We create four distinct schemas to organize our data pipeline:
- Raw POS (Point of Sale) data
- Raw Customer Support data
- Harmonized data layer
- Analytics layer

### Schema Architecture
The database follows a layered architecture:
1. **Raw Data Layers** - Store original, unmodified data
2. **Harmonized Layer** - Contains cleaned and standardized data
3. **Analytics Layer** - Hosts transformed data ready for analysis

### Schema Descriptions

**Raw POS Schema (tb_voc.raw_pos)**
- Stores point-of-sale transaction data
- Contains original, unmodified retail data
- Source for customer purchase behavior analysis

**Raw Support Schema (tb_voc.raw_support)**
- Houses customer support interaction data
- Includes customer feedback and support tickets
- Maintains original support data structure

**Harmonized Schema (tb_voc.harmonized)**
- Stores cleaned and standardized data
- Combines data from multiple sources
- Ensures consistent data formats and structures

**Analytics Schema (tb_voc.analytics)**
- Contains derived metrics and aggregated data
- Optimized for reporting and analysis
- Supports business intelligence applications

### Usage Notes
- Always use appropriate schema when creating new tables
- Follow naming conventions for each schema
- Maintain data lineage documentation
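
Once the schemas have been created, a quick listing can confirm that all four exist. This is a minimal check, not part of the original script; note that `SHOW SCHEMAS` also lists the `PUBLIC` and `INFORMATION_SCHEMA` schemas that Snowflake creates in every database.

```sql
-- List every schema in the database
SHOW SCHEMAS IN DATABASE tb_voc;

-- Or check only for the four schemas described in this section
SELECT schema_name
FROM tb_voc.information_schema.schemata
WHERE schema_name IN ('RAW_POS', 'RAW_SUPPORT', 'HARMONIZED', 'ANALYTICS')
ORDER BY schema_name;
```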

In [None]:
-- create raw_pos schema
CREATE OR REPLACE SCHEMA tb_voc.raw_pos;

-- create raw_support schema
CREATE OR REPLACE SCHEMA tb_voc.raw_support;

-- create harmonized schema
CREATE OR REPLACE SCHEMA tb_voc.harmonized;

-- create analytics schema
CREATE OR REPLACE SCHEMA tb_voc.analytics;


## Data Science Warehouse Configuration

### Overview
This section sets up a dedicated warehouse for data science operations. The warehouse is configured with specific parameters to optimize performance and cost efficiency.

### Warehouse Specifications
| Parameter | Value | Purpose |
|-----------|--------|---------|
| Size | Large | Higher computational power |
| Type | Standard | General-purpose processing |
| Auto-suspend | 60 seconds | Cost optimization |
| Auto-resume | Enabled | Seamless operation |
| Initial State | Suspended | Resource efficiency |

### Configuration Details

**Resource Management**
- **Warehouse Name**: `tasty_ds_wh`
- **Purpose**: Dedicated to data science workloads
- **Size**: Large warehouse for handling complex computations

**Performance Settings**
- Auto-suspends after 60 seconds of inactivity
- Automatically resumes when queries are submitted
- Starts in suspended state to prevent unnecessary costs

**Cost Optimization Features**
- Automatic suspension saves credits during idle periods
- Auto-resume ensures no manual intervention needed
- Initial suspended state prevents immediate resource allocation

### Usage Guidelines
- Monitor warehouse utilization for optimal sizing
- Adjust auto-suspend timing based on usage patterns
- Consider scaling during peak processing times
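
The guidelines above could be followed with statements like the ones below, run after the warehouse exists. This is a sketch: the one-hour window and the 300-second timeout are illustrative values, not recommendations from the original setup, and the load-history query assumes the active role is allowed to monitor the warehouse.

```sql
-- Inspect the current size, state, and auto-suspend setting
SHOW WAREHOUSES LIKE 'tasty_ds_wh';

-- Average running and queued load over the last hour
SELECT start_time, avg_running, avg_queued_load
FROM TABLE(tb_voc.INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY(
    DATE_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP()),
    WAREHOUSE_NAME   => 'TASTY_DS_WH'))
ORDER BY start_time;

-- Example adjustment: lengthen the idle timeout to 5 minutes
ALTER WAREHOUSE tasty_ds_wh SET AUTO_SUSPEND = 300;
```

Sustained queued load suggests the warehouse is undersized for the workload, while long stretches of near-zero running load suggest a smaller size or shorter timeout may be enough.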

In [None]:
-- create tasty_ds_wh warehouse
CREATE OR REPLACE WAREHOUSE tasty_ds_wh
    WAREHOUSE_SIZE = 'large'
    WAREHOUSE_TYPE = 'standard'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE
    COMMENT = 'data science warehouse for tasty bytes';


## Data Loading Infrastructure Setup

### Overview
This section configures the necessary components for data ingestion from S3 into Snowflake. We set up:
- The session warehouse, switched to `tasty_ds_wh`
- CSV file format specification
- External stage connection to S3

### Infrastructure Components
| Component | Purpose |
|-----------|---------|
| Warehouse | Processing resources |
| File Format | Data parsing rules |
| External Stage | S3 connection point |

### Configuration Details

**File Format Configuration**
- Name: `tb_voc.public.csv_ff`
- Type: CSV
- Purpose: Standardizes data parsing rules

**External Stage Setup**
- Name: `tb_voc.public.s3load`
- Location: S3 bucket for Tasty Bytes VOC data
- File format: Uses `tb_voc.public.csv_ff` as the stage's default file format

### Usage Notes
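- `tasty_ds_wh` has auto-resume enabled, so it starts on its own when the first statement that needs compute runs
- The stage definition contains no credentials or storage integration, so it depends on the S3 bucket being readable without authentication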
- Monitor data loading performance
- Use appropriate file format for data type
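
Once the file format and stage exist, they can be inspected before any data is loaded. The sketch below is not part of the original script; it uses the `raw_pos/menu/` prefix that the load step reads from, and selecting only the first two columns is an arbitrary choice for a quick look.

```sql
-- Show the options recorded for the file format
DESCRIBE FILE FORMAT tb_voc.public.csv_ff;

-- List the staged files under one of the source prefixes
LIST @tb_voc.public.s3load/raw_pos/menu/;

-- Peek at the first two columns of a few staged rows without loading them
SELECT $1, $2
FROM @tb_voc.public.s3load/raw_pos/menu/
    (FILE_FORMAT => 'tb_voc.public.csv_ff')
LIMIT 5;
```

If `LIST` returns no files or an access error, the stage URL or bucket access is the first thing to check.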

In [None]:

USE WAREHOUSE tasty_ds_wh;

/*--
• file format and stage creation
--*/

CREATE OR REPLACE FILE FORMAT tb_voc.public.csv_ff 
type = 'csv';

CREATE OR REPLACE STAGE tb_voc.public.s3load
COMMENT = 'Quickstarts S3 Stage Connection'
url = 's3://sfquickstarts/tastybytes-voc/'
file_format = tb_voc.public.csv_ff;


## Raw Data Tables Configuration

### Overview
This section establishes the foundational tables in our raw data zone. These tables store original, unmodified data from various sources including:
- Menu information
- Truck details
- Order transactions
- Customer reviews

### Menu Table Details
- **Purpose**: Stores complete menu catalog information
- **Key Fields**:
  - Menu and item identifiers
  - Categorization fields
  - Pricing information
  - Health metrics (stored as VARIANT for flexibility)

### Truck Table Details
- **Purpose**: Maintains food truck fleet information
- **Key Fields**:
  - Location and regional data
  - Vehicle specifications
  - Franchise information
  - Operational details

### Order Header Table Details
- **Purpose**: Records transaction details
- **Key Fields**:
  - Order identification and timing
  - Customer and location references
  - Financial information
  - Operational metrics

### Review Table Details
- **Purpose**: Stores customer feedback data
- **Key Fields**:
  - Review identification
  - Content and language
  - Source tracking
  - Order reference

### Data Relationships
- Orders link to specific trucks and customers
- Reviews connect to specific orders
- Trucks associate with menu types
- Menu items belong to specific truck brands

### Usage Guidelines
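- Snowflake does not enforce primary or foreign key constraints on standard tables, and these tables declare none, so validate key relationships with queries
- Monitor storage usage for large text fields
- Snowflake divides tables into micro-partitions automatically; consider a clustering key only for very large tables that are filtered on the same columns repeatedly
- Standard Snowflake tables have no conventional indexes; rely on partition pruning, and on clustering or search optimization where point lookups are slow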
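
Because the raw tables declare no constraints, the relationships listed above can be checked with anti-join queries once the data is loaded. The queries below are a minimal sketch using only the columns defined in this section; the result column aliases are illustrative.

```sql
-- Reviews whose order_id has no matching row in order_header
SELECT COUNT(*) AS reviews_without_order
FROM tb_voc.raw_support.truck_reviews r
LEFT JOIN tb_voc.raw_pos.order_header oh
    ON oh.order_id = r.order_id
WHERE oh.order_id IS NULL;

-- Orders whose truck_id has no matching row in truck
SELECT COUNT(*) AS orders_without_truck
FROM tb_voc.raw_pos.order_header oh
LEFT JOIN tb_voc.raw_pos.truck t
    ON t.truck_id = oh.truck_id
WHERE t.truck_id IS NULL;
```

A non-zero count means some rows will be silently dropped by any inner join across those tables.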



In [None]:

/*--
raw zone table build 
--*/

-- menu table build
CREATE OR REPLACE TABLE tb_voc.raw_pos.menu
(
    menu_id NUMBER(19,0),
    menu_type_id NUMBER(38,0),
    menu_type VARCHAR(16777216),
    truck_brand_name VARCHAR(16777216),
    menu_item_id NUMBER(38,0),
    menu_item_name VARCHAR(16777216),
    item_category VARCHAR(16777216),
    item_subcategory VARCHAR(16777216),
    cost_of_goods_usd NUMBER(38,4),
    sale_price_usd NUMBER(38,4),
    menu_item_health_metrics_obj VARIANT
);

-- truck table build 
CREATE OR REPLACE TABLE tb_voc.raw_pos.truck
(
    truck_id NUMBER(38,0),
    menu_type_id NUMBER(38,0),
    primary_city VARCHAR(16777216),
    region VARCHAR(16777216),
    iso_region VARCHAR(16777216),
    country VARCHAR(16777216),
    iso_country_code VARCHAR(16777216),
    franchise_flag NUMBER(38,0),
    year NUMBER(38,0),
    make VARCHAR(16777216),
    model VARCHAR(16777216),
    ev_flag NUMBER(38,0),
    franchise_id NUMBER(38,0),
    truck_opening_date DATE
);

-- order_header table build
CREATE OR REPLACE TABLE tb_voc.raw_pos.order_header
(
    order_id NUMBER(38,0),
    truck_id NUMBER(38,0),
    location_id FLOAT,
    customer_id NUMBER(38,0),
    discount_id VARCHAR(16777216),
    shift_id NUMBER(38,0),
    shift_start_time TIME(9),
    shift_end_time TIME(9),
    order_channel VARCHAR(16777216),
    order_ts TIMESTAMP_NTZ(9),
    served_ts VARCHAR(16777216),
    order_currency VARCHAR(3),
    order_amount NUMBER(38,4),
    order_tax_amount VARCHAR(16777216),
    order_discount_amount VARCHAR(16777216),
    order_total NUMBER(38,4)
);

-- truck_reviews table build
CREATE OR REPLACE TABLE tb_voc.raw_support.truck_reviews
(
    order_id NUMBER(38,0),
    language VARCHAR(16777216),
    source VARCHAR(16777216),
    review VARCHAR(16777216),
    review_id NUMBER(18,0)
);


## Data Views Configuration

### Overview
This section creates harmonized and analytics views for customer review data. The views integrate data from multiple raw tables to provide:
- Unified review information
- Customer insights
- Operational metrics
- Geographic analysis

### View Architecture
Two layers of views are created:
1. Harmonized layer for data integration
2. Analytics layer for business consumption

### Harmonized View Details
- **Purpose**: Combines review data with operational context
- **Key Components**:
  - Review information (ID, content, language)
  - Order details
  - Location data
  - Customer identification
  - Brand information

**Data Integration**
- Links reviews to orders
- Associates orders with trucks
- Connects trucks to menu types
- Provides geographical context

### Analytics View Details
- **Purpose**: Exposes harmonized data for analysis
- **Usage**: Business intelligence and reporting
- **Benefits**: 
  - Simplified access to integrated data
  - Consistent data structure
  - Single source of truth

### Implementation Notes
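- Views are evaluated at query time, so they always reflect the current contents of the raw tables
- The join to `menu` on `menu_type_id` matches one row per menu item of that type; `SELECT DISTINCT` collapses the resulting repeated rows
- `TO_DATE(oh.order_ts)` reduces the order timestamp to a calendar date
- The joins are inner joins, so a review is excluded if its order, truck, or menu type has no match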

### Best Practices
- Monitor view performance
- Update views when source schemas change
- Document business rules
- Maintain clear naming conventions
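
A simple way to check that the harmonized view yields one row per review is to look for repeated `review_id` values after the view has been created and the data loaded. This is a sketch, not part of the original script; any rows it returns would point to reviews that match more than one distinct combination of the selected columns, for example a menu type with more than one brand name.

```sql
-- Reviews that appear more than once in the harmonized view
SELECT review_id, COUNT(*) AS row_count
FROM tb_voc.harmonized.truck_reviews_v
GROUP BY review_id
HAVING COUNT(*) > 1
ORDER BY row_count DESC;

-- Compare the number of raw reviews with the number surfaced by the view
SELECT
    (SELECT COUNT(*) FROM tb_voc.raw_support.truck_reviews) AS raw_reviews,
    (SELECT COUNT(DISTINCT review_id)
       FROM tb_voc.harmonized.truck_reviews_v)            AS reviews_in_view;
```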

In [None]:

/*--
• harmonized view creation
--*/

-- truck_reviews_v view
CREATE OR REPLACE VIEW tb_voc.harmonized.truck_reviews_v
    AS
SELECT DISTINCT
    r.review_id,
    r.order_id,
    oh.truck_id,
    r.language,
    r.source,
    r.review,
    t.primary_city,
    oh.customer_id,
    TO_DATE(oh.order_ts) AS date,
    m.truck_brand_name
FROM tb_voc.raw_support.truck_reviews r
JOIN tb_voc.raw_pos.order_header oh
    ON oh.order_id = r.order_id
JOIN tb_voc.raw_pos.truck t
    ON t.truck_id = oh.truck_id
JOIN tb_voc.raw_pos.menu m
    ON m.menu_type_id = t.menu_type_id;

/*--
• analytics view creation
--*/

-- truck_reviews_v view
CREATE OR REPLACE VIEW tb_voc.analytics.truck_reviews_v
    AS
SELECT * FROM tb_voc.harmonized.truck_reviews_v;


## Data Loading Process

### Overview
This section handles the initial data load from S3 into our raw tables. The process includes:
- Loading POS (Point of Sale) data
- Loading customer review data
- Scaling the warehouse down once the load has finished

### Loading Sequence
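The tables are loaded reference data first. Snowflake does not require this order, because the raw tables declare no constraints and each `COPY INTO` statement runs independently: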
1. Menu reference data
2. Truck information
3. Transaction records
4. Customer reviews

### Data Source Details
- **Source**: S3 staged files
- **Target**: Raw zone tables
- **Method**: Bulk COPY operation

**Loading Paths**
| Table | Source Path |
|-------|-------------|
| Menu | raw_pos/menu/ |
| Truck | raw_pos/truck/ |
| Orders | raw_pos/order_header/ |
| Reviews | raw_support/truck_reviews/ |

### Resource Optimization
- The load itself runs on `tasty_ds_wh` at its Large size
- After the last `COPY INTO`, the warehouse is resized to Medium
- The smaller size lowers the warehouse's hourly credit rate for the work that follows

### Best Practices
- Monitor load performance
- Verify data integrity after loading
- Check for load errors
- Scale warehouse as needed
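
The checks above can be sketched as follows. The first statement is a dry run intended for use before the real load: with `VALIDATION_MODE`, `COPY INTO` reports problems instead of loading rows. The second is a row count to run after the load. Neither statement is part of the original script.

```sql
-- Dry run: report parse errors in the staged menu files without loading any rows
COPY INTO tb_voc.raw_pos.menu
FROM @tb_voc.public.s3load/raw_pos/menu/
VALIDATION_MODE = RETURN_ERRORS;

-- After the load: row counts for each raw table
SELECT 'menu' AS table_name, COUNT(*) AS row_count FROM tb_voc.raw_pos.menu
UNION ALL
SELECT 'truck', COUNT(*) FROM tb_voc.raw_pos.truck
UNION ALL
SELECT 'order_header', COUNT(*) FROM tb_voc.raw_pos.order_header
UNION ALL
SELECT 'truck_reviews', COUNT(*) FROM tb_voc.raw_support.truck_reviews;
```

A table with a row count of zero after the load usually means the source path or file format did not match the staged files.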

In [None]:
/*--
raw zone table load 
--*/


-- menu table load
COPY INTO tb_voc.raw_pos.menu
FROM @tb_voc.public.s3load/raw_pos/menu/;

-- truck table load
COPY INTO tb_voc.raw_pos.truck
FROM @tb_voc.public.s3load/raw_pos/truck/;

-- order_header table load
COPY INTO tb_voc.raw_pos.order_header
FROM @tb_voc.public.s3load/raw_pos/order_header/;

-- truck_reviews table load
COPY INTO tb_voc.raw_support.truck_reviews
FROM @tb_voc.public.s3load/raw_support/truck_reviews/;


-- scale wh to medium
ALTER WAREHOUSE tasty_ds_wh SET WAREHOUSE_SIZE = 'Medium';

-- setup completion note
SELECT 'setup is now complete' AS note;