# Yelp

## 1. Introduction

The purpose of this notebook is to implement an initial load for the BRONZE stage using JSON data.

## 2. Environment setup

In [0]:
%sql
USE CATALOG training_catalog;

In [0]:
%sql
USE SCHEMA yelp_db;

In [0]:
%sql
SELECT current_catalog() AS current_catalog, current_schema() AS current_schema;

## 3. Batch data ingestion with CTAS

This method ingests the data as an initial load using CTAS (`CREATE TABLE ... AS SELECT`).

Steps:

1. Create table
2. Ingest data

### 3.1. Exploring data

Exploring files

In [0]:
%sql
LIST "/Volumes/training_catalog/yelp_db/training_files"

Exploring file content

In [0]:
spark.sql("SELECT * FROM text.`/Volumes/training_catalog/yelp_db/training_files` LIMIT 5").display()

In [0]:
%sql
SELECT *
FROM read_files(
    "/Volumes/training_catalog/yelp_db/training_files",
    FORMAT => "JSON"
)
LIMIT 5;
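
When `read_files` infers a schema from JSON, nested objects such as `attributes` and `hours` typically surface as struct columns. The sketch below uses plain Python, without Spark, to show the shape one might expect in the raw files. The sample lines are hypothetical: only the field names `attributes`, `hours`, `AcceptsInsurance` and `Monday` come from this notebook, while `name` and all values are invented for illustration.

```python
import json

# Hypothetical JSON Lines sample (one JSON object per line).
# Field values and the `name` key are invented; they are NOT real Yelp data.
sample_lines = [
    '{"name": "Example Clinic", "attributes": {"AcceptsInsurance": "True"}, "hours": {"Monday": "9:0-17:0"}}',
    '{"name": "Example Cafe", "attributes": null, "hours": null}',
]

records = [json.loads(line) for line in sample_lines]


def describe(record):
    """Map each top-level key to the Python type name of its value."""
    return {key: type(value).__name__ for key, value in record.items()}


# Nested objects show up as dicts, missing ones as None.
shapes = [describe(record) for record in records]
for shape in shapes:
    print(shape)
```

Records where a nested object is `null` are worth noticing early, because they lead to NULL values once the nested fields are selected individually.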

### 3.2. Creating a managed Delta table

#### 3.2.1. Store data in a table

In [0]:
%sql
DROP TABLE IF EXISTS yelp_bronze;

-- Create the table from the JSON files (CTAS)
CREATE TABLE IF NOT EXISTS yelp_bronze
AS
SELECT *
FROM read_files(
    "/Volumes/training_catalog/yelp_db/training_files",
    FORMAT => "JSON"
);

-- Preview data
SELECT * FROM yelp_bronze LIMIT 5;
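
For comparison, a roughly equivalent initial load can be sketched with the PySpark DataFrame API. This is a minimal sketch, not the method used in this notebook: the helper name `load_bronze` is hypothetical, and it assumes that Delta is the default table format (as on Databricks). The inferred schema may differ slightly from what `read_files` produces (for example, `read_files` may add a `_rescued_data` column).

```python
def load_bronze(spark, source_path, table_name):
    """Read JSON files and overwrite a managed table with their content.

    `load_bronze` is a hypothetical helper for illustration. `spark` is
    expected to be a SparkSession, such as the one predefined in a notebook.
    """
    # Infer the schema from the JSON files under source_path.
    df = spark.read.json(source_path)
    # Replace any existing table content, similar in effect to DROP + CTAS.
    df.write.mode("overwrite").saveAsTable(table_name)


# Possible usage inside a notebook cell, where `spark` already exists:
# load_bronze(
#     spark,
#     "/Volumes/training_catalog/yelp_db/training_files",
#     "yelp_bronze",
# )
```

Passing `spark` in as a parameter keeps the function easy to exercise outside a notebook. If the schema of an existing table changes between runs, the overwrite may need additional writer options.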

#### 3.2.2. Store selected JSON values in a new table

In [0]:
%sql
CREATE OR REPLACE TABLE yelp_json_bronze
AS
SELECT * EXCEPT(attributes, hours),
  attributes.AcceptsInsurance AS accepts_insurance,
  hours.Sunday AS sunday_hour,
  hours.Monday AS monday_hour,
  hours.Tuesday AS tuesday_hour,
  hours.Wednesday AS wednesday_hour,
  hours.Thursday AS thursday_hour,
  hours.Friday AS friday_hour,
  hours.Saturday AS saturday_hour
FROM yelp_bronze;

SELECT * FROM yelp_json_bronze LIMIT 5;
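
The flattening done by `SELECT * EXCEPT(attributes, hours)` plus the aliased nested fields can be illustrated on a single record in plain Python. This is only a sketch of the logic, not a replacement for the SQL: the function name `flatten_business` and the sample record are hypothetical, while the output column names match the aliases in the query above.

```python
WEEKDAYS = [
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
]


def flatten_business(record):
    """Flatten one record the way the SQL above flattens each row."""
    # Keep every top-level field except the two nested ones
    # (the counterpart of SELECT * EXCEPT(attributes, hours)).
    flat = {k: v for k, v in record.items() if k not in ("attributes", "hours")}

    # Treat a missing or null nested object as empty, so lookups yield None,
    # much like SQL yields NULL for a field of a NULL struct.
    attributes = record.get("attributes") or {}
    hours = record.get("hours") or {}

    flat["accepts_insurance"] = attributes.get("AcceptsInsurance")
    for day in WEEKDAYS:
        flat[f"{day.lower()}_hour"] = hours.get(day)
    return flat


# Hypothetical sample record; the values are invented for illustration.
sample = {
    "name": "Example Clinic",
    "attributes": {"AcceptsInsurance": "True"},
    "hours": {"Monday": "9:0-17:0"},
}
flat_sample = flatten_business(sample)
print(flat_sample)
```

Days that are absent from `hours` come out as `None`, which corresponds to NULL values in columns such as `sunday_hour`.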