# Snowflake

1. Installing libraries and initializing variables from the .env file
2. Creating the two databases and opening the connections
3. Running DDL scripts on Snowflake to create the Staging and DW tables
4. Creating the data streams in the Staging database
5. Creating the tasks in the Staging database
6. Inserting test data and querying the tables to verify the data flow
7. Suspending and dropping the tasks

<div class="alert alert-danger">
     
**Note**

- This notebook runs on Python 3.10.9. With later versions, the Snowflake connection fails with an error (OpenSSL 3.0.9).
     
</div>


In [1]:
%pip install snowflake-connector-python
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [19]:
import snowflake.connector
from dotenv import load_dotenv
import os

load_dotenv()

True

## Initializing the variables used in the code with values from the .env file

In [20]:
user = os.getenv('SNOWFLAKE_USER')
password = os.getenv('SNOWFLAKE_PASSWORD')
account = os.getenv('SNOWFLAKE_ACCOUNT')
warehouse = os.getenv('SNOWFLAKE_WAREHOUSE')
schema = os.getenv('SNOWFLAKE_SCHEMA')

database2 = os.getenv('SNOWFLAKE_DATABASE2')
database3 = os.getenv('SNOWFLAKE_DATABASE3')
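`os.getenv` returns `None` when a variable is missing, and the failure then only appears later, at connection time. A minimal sketch of an early check follows. The `REQUIRED_VARS` tuple and the `load_settings` helper are illustrative names, not part of the original notebook.

```python
import os

# Variable names read by the notebook from the .env file.
REQUIRED_VARS = (
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_DATABASE2",
    "SNOWFLAKE_DATABASE3",
)


def load_settings(environ=os.environ):
    """Return the required settings as a dict, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` right after `load_dotenv()` would report every missing name in one message.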

In [21]:
conn = snowflake.connector.connect(
    user=user,
    password=password,
    account=account
)

cur = conn.cursor()

## Creating the two databases and opening the connections

Database 1: Staging, which stores the raw, unprocessed data.

Database 2: DW, which stores the processed data, ready to be consumed.


In [22]:
cur.execute(f"CREATE DATABASE IF NOT EXISTS {database2}")

cur.execute(f"CREATE DATABASE IF NOT EXISTS {database3}")

conn = snowflake.connector.connect(
    user=user,
    password=password,
    account=account,
    warehouse=warehouse,
    database=database2,
    schema=schema
)

conn2 = snowflake.connector.connect(
    user=user,
    password=password,
    account=account,
    warehouse=warehouse,
    database=database3,
    schema=schema
)

cur = conn.cursor()
cur2 = conn2.cursor()

## Running DDL scripts on Snowflake


### Creating tables in the Staging database


In [23]:
query = """
CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY,
  customer_name VARCHAR(50) NOT NULL,
  email VARCHAR(50) UNIQUE,
  address VARCHAR(100)
);

CREATE TABLE products (
  product_id INTEGER PRIMARY KEY,
  product_name VARCHAR(50) NOT NULL,
  description VARCHAR(500),
  price DECIMAL(10,2) NOT NULL
);

CREATE TABLE sales (
  sale_id INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL,
  product_id INTEGER NOT NULL,
  sale_date DATE NOT NULL,
  quantity INTEGER NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
  FOREIGN KEY (product_id) REFERENCES products(product_id)
);
"""

cur.execute(query, num_statements=3)

<snowflake.connector.cursor.SnowflakeCursor at 0x7fd8e10cb9d0>
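The `num_statements` argument has to match the number of statements in the string, so a hard-coded value can drift when the script is edited. A small sketch that derives the count from the text follows. `count_statements` is a hypothetical helper, and it is deliberately naive: it assumes no semicolon appears inside a string literal or comment.

```python
def count_statements(sql: str) -> int:
    """Count non-empty statements separated by semicolons.

    Naive on purpose: a semicolon inside a string literal or a comment
    would be counted as a statement separator.
    """
    return sum(1 for part in sql.split(";") if part.strip())


# Possible usage with the cell above (not executed here):
# cur.execute(query, num_statements=count_statements(query))
```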

### Creating tables in the DW database


In [24]:
query = """
CREATE TABLE dim_customers (
  customer_sk INTEGER AUTOINCREMENT PRIMARY KEY,
  customer_id INTEGER NOT NULL UNIQUE,
  customer_name VARCHAR(50) NOT NULL,
  email VARCHAR(50) UNIQUE,
  address VARCHAR(100)
);

CREATE TABLE dim_products (
  product_sk INTEGER AUTOINCREMENT PRIMARY KEY,
  product_id INTEGER NOT NULL UNIQUE,
  product_name VARCHAR(50) NOT NULL,
  description VARCHAR(500),
  price DECIMAL(10,2) NOT NULL
);

CREATE TABLE dim_dates (
  date_sk INTEGER AUTOINCREMENT PRIMARY KEY,
  date DATE NOT NULL UNIQUE,
  day INTEGER NOT NULL,
  month INTEGER NOT NULL,
  year INTEGER NOT NULL,
  quarter INTEGER NOT NULL
);

CREATE TABLE fact_sales (
  sale_sk INTEGER AUTOINCREMENT PRIMARY KEY,	
  sale_id INTEGER,
  customer_sk INTEGER NOT NULL,
  product_sk INTEGER NOT NULL,
  date_sk INTEGER NOT NULL,
  quantity INTEGER NOT NULL,
  FOREIGN KEY (customer_sk) REFERENCES dim_customers(customer_sk),
  FOREIGN KEY (product_sk) REFERENCES dim_products(product_sk),
  FOREIGN KEY (date_sk) REFERENCES dim_dates(date_sk)
);
"""

cur2.execute(query, num_statements=4)

<snowflake.connector.cursor.SnowflakeCursor at 0x7fd8e0fbbf70>

### Populating the date dimension table in the DW database


In [25]:
cur2.execute("""
INSERT  INTO DIM_DATES ( DATE,DAY,MONTH, YEAR, QUARTER)
WITH date_range AS (
  SELECT
    DATEADD(day, ROW_NUMBER() OVER (ORDER BY seq4()) - 1, '2020-01-01') AS date
  FROM
    TABLE(GENERATOR(rowcount => 2192)) -- 2192 days from '2020-01-01' through '2025-12-31', inclusive
),
date_components AS (
  SELECT
    date,
    EXTRACT(day FROM date) AS day,
    EXTRACT(month FROM date) AS month,
    EXTRACT(year FROM date) AS year,
    EXTRACT(quarter FROM date) AS quarter
  FROM
    date_range
)
SELECT  date,day,month, year, quarter FROM  date_components
""")

<snowflake.connector.cursor.SnowflakeCursor at 0x7fd8e0fbbf70>
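The number of rows the generator needs depends on whether the end date is included, which is easy to get off by one. The arithmetic can be checked with the standard library, as in the sketch below. The helper names are illustrative. `expected_date_sk` also assumes that `AUTOINCREMENT` starts at 1 and that rows receive their keys in date order, which Snowflake does not guarantee in general.

```python
from datetime import date

START = date(2020, 1, 1)
END = date(2025, 12, 31)


def inclusive_days(start: date, end: date) -> int:
    """Number of calendar days from start through end, counting both ends."""
    return (end - start).days + 1


def expected_date_sk(day: date, start: date = START) -> int:
    """Position of `day` in the generated sequence, counting from 1 (assumption)."""
    return (day - start).days + 1


print(inclusive_days(START, END))            # 2192
print(expected_date_sk(date(2023, 10, 18)))  # 1387
```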

## Creating the data streams in the Staging database

In [26]:
cur.execute("""create stream customers_stream on table customers""")
cur.execute("""create stream products_stream on table products""")
cur.execute("""create stream sales_stream on table sales""")

<snowflake.connector.cursor.SnowflakeCursor at 0x7fd8e10cb9d0>

## Creating the tasks in the Staging database to load stream data into the DW tables


### Creating the task for the dim_customers table

In [27]:
cur.execute(f"""
create or replace task dim_customers_task
warehouse = WAREHOUSEDEV
SCHEDULE = '1 MINUTE'
AS
insert into {database3}.{schema}.dim_customers (customer_id, customer_name, email, address)
select
    cs.customer_id,
    cs.customer_name,
    cs.email,
    cs.address
from
    customers_stream cs
where cs.METADATA$ACTION = 'INSERT'
""")

<snowflake.connector.cursor.SnowflakeCursor at 0x7fd8e10cb9d0>
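The task definition hard-codes the warehouse name `WAREHOUSEDEV`, although a `warehouse` variable is already loaded from the environment. One possible way to build the task header from parameters is sketched below. `task_header` and its identifier check are hypothetical helpers, and the regular expression is a conservative assumption about unquoted identifiers rather than Snowflake's full rules.

```python
import re

# Conservative pattern for unquoted identifiers (an assumption, not the full Snowflake grammar).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def task_header(name, warehouse, *, schedule=None, after=None):
    """Build the part of CREATE TASK that precedes the SQL body.

    Exactly one of `schedule` (root task) or `after` (child task) must be given.
    """
    if (schedule is None) == (after is None):
        raise ValueError("pass exactly one of schedule or after")
    for identifier in (name, warehouse, after):
        if identifier is not None and not _IDENTIFIER.match(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    if schedule is not None and "'" in schedule:
        raise ValueError("schedule must not contain single quotes")
    trigger = f"schedule = '{schedule}'" if schedule is not None else f"after {after}"
    return f"create or replace task {name}\nwarehouse = {warehouse}\n{trigger}\nas\n"
```

The returned header would be concatenated with the `insert into ... select ...` body before being passed to `cur.execute`.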

### Creating the task for the dim_products table

In [28]:
cur.execute(f"""
create or replace task dim_products_task
warehouse = WAREHOUSEDEV
after dim_customers_task
AS
insert into {database3}.{schema}.dim_products (product_id, product_name, description, price)
select
    ps.product_id,
    ps.product_name,
    ps.description,
    ps.price
from
    products_stream ps
where 
    ps.METADATA$ACTION = 'INSERT'
""")

<snowflake.connector.cursor.SnowflakeCursor at 0x7fd8e10cb9d0>

### Creating the task for the fact_sales table (fed by sales_stream)

In [29]:
cur.execute(f"""
create or replace task fact_sales_task
warehouse = WAREHOUSEDEV
after dim_products_task
AS
insert into {database3}.{schema}.FACT_SALES (sale_id, customer_sk, product_sk, date_sk, quantity)
select
    ss.sale_id,
    (select max(customer_sk) from {database3}.{schema}.DIM_CUSTOMERS where customer_id = ss.customer_id) as customer_sk,
    (select max(product_sk) from {database3}.{schema}.DIM_PRODUCTS where product_id = ss.product_id) as product_sk,
    (select max(date_sk) from {database3}.{schema}.DIM_DATES where date = ss.sale_date) as date_sk,
    ss.quantity
from
    sales_stream ss
where
    ss.METADATA$ACTION = 'INSERT'
""")

<snowflake.connector.cursor.SnowflakeCursor at 0x7fd8e10cb9d0>
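The three scalar subqueries translate natural keys from Staging into surrogate keys from the DW dimensions. If a dimension row has not been loaded yet, `max(...)` over no rows yields NULL, and the `NOT NULL` columns of `fact_sales` would reject the insert, which is why this task runs after the dimension tasks. The lookup can be illustrated in plain Python. The function and dictionary names below are hypothetical, and the dictionaries stand in for the dimension tables.

```python
from datetime import date


def resolve_sale(sale, customer_sk_by_id, product_sk_by_id, date_sk_by_date):
    """Map a staging sale to fact_sales columns; an unmatched key becomes None (SQL NULL)."""
    return {
        "sale_id": sale["sale_id"],
        "customer_sk": customer_sk_by_id.get(sale["customer_id"]),
        "product_sk": product_sk_by_id.get(sale["product_id"]),
        "date_sk": date_sk_by_date.get(sale["sale_date"]),
        "quantity": sale["quantity"],
    }


sale = {"sale_id": 1, "customer_id": 1, "product_id": 1,
        "sale_date": date(2023, 10, 18), "quantity": 1}

# Dimensions already loaded: every key resolves.
loaded = resolve_sale(sale, {1: 1, 2: 2}, {1: 1, 2: 2, 3: 3}, {date(2023, 10, 18): 1387})

# Customer dimension still empty: customer_sk is None, so the insert would fail.
too_early = resolve_sale(sale, {}, {1: 1}, {date(2023, 10, 18): 1387})
```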

### Starting the tasks by resuming them from the 'suspended' state

In [30]:
cur.execute("""
alter task fact_sales_task resume;
alter task dim_products_task resume;
alter task dim_customers_task resume;
""", num_statements=3)

<snowflake.connector.cursor.SnowflakeCursor at 0x7fd8e10cb9d0>
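The cell above resumes the child tasks before the root task, so the schedule only starts once the whole chain is enabled. A sketch of deriving that order from the `AFTER` relationships follows. `resume_order` is an illustrative helper that assumes each task has at most one predecessor and that there are no cycles.

```python
def resume_order(predecessor):
    """Return task names ordered deepest child first, root task last.

    `predecessor` maps each task to the task named in its AFTER clause,
    or to None for the scheduled root task.
    """
    def depth(task):
        steps = 0
        while predecessor[task] is not None:
            task = predecessor[task]
            steps += 1
        return steps

    return sorted(predecessor, key=depth, reverse=True)


chain = {
    "dim_customers_task": None,                # root, SCHEDULE = '1 MINUTE'
    "dim_products_task": "dim_customers_task",
    "fact_sales_task": "dim_products_task",
}
print(resume_order(chain))
# ['fact_sales_task', 'dim_products_task', 'dim_customers_task']
```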

## Testing data insertion into the Staging tables, so that the streams capture the new rows and the tasks load them into the DW tables


### Inserting data into the products table and checking the rows in products_stream

In [19]:
cur.execute("""
insert into products (product_id, product_name, description, price)
values
(1, 'Widget A', 'A great widget', 19.99),
(2, 'Widget B', 'A better widget', 29.99),
(3, 'Widget C', 'The best widget', 39.99)
""")

cur.execute("""select * from products_stream""")

tables = cur.fetchall()

for table in tables:
    print(table)

(1, 'Widget A', 'A great widget', Decimal('19.99'), 'INSERT', False, '63354edf1278566259b28e95588841a3d3685d76')
(2, 'Widget B', 'A better widget', Decimal('29.99'), 'INSERT', False, '39f10ad34d27abcf861587dbca870abe67f64c8a')
(3, 'Widget C', 'The best widget', Decimal('39.99'), 'INSERT', False, '637efc6d158cf6a6faf64c8ed0fec6f59f09ab0e')
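Each stream row carries the table columns followed by three metadata columns: `METADATA$ACTION`, `METADATA$ISUPDATE`, and `METADATA$ROW_ID`. A small sketch that separates the two parts follows. `split_stream_row` is a hypothetical helper and assumes a `select *` on the stream, where the metadata columns come last, as in the output above.

```python
from decimal import Decimal

STREAM_METADATA = ("METADATA$ACTION", "METADATA$ISUPDATE", "METADATA$ROW_ID")


def split_stream_row(row):
    """Split a stream row into (table columns, metadata dict)."""
    data, meta = row[:-3], row[-3:]
    return data, dict(zip(STREAM_METADATA, meta))


row = (1, "Widget A", "A great widget", Decimal("19.99"),
       "INSERT", False, "63354edf1278566259b28e95588841a3d3685d76")
data, meta = split_stream_row(row)
print(data)                     # (1, 'Widget A', 'A great widget', Decimal('19.99'))
print(meta["METADATA$ACTION"])  # INSERT
```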


### Inserting data into the customers table and checking the rows in customers_stream

In [20]:
cur.execute("""
insert into customers (customer_id, customer_name, email, address)
values
(1, 'Jhon Smith', 'jhon@smith', '123 Main St'),
(2, 'Jane Doe', 'jane@doe', '456 Oak St')
""")

cur.execute("""select * from customers_stream""")

tables = cur.fetchall()

for table in tables:
    print(table)

(1, 'Jhon Smith', 'jhon@smith', '123 Main St', 'INSERT', False, '80a550183056fb71679063bab90b91503daa9425')
(2, 'Jane Doe', 'jane@doe', '456 Oak St', 'INSERT', False, '42c43b32c00a213b987431852a576af50d6a62c2')


### Inserting data into the sales table and checking the rows in sales_stream

In [21]:
cur.execute("""
insert into sales (sale_id, customer_id, product_id, sale_date, quantity)
values
(1, 1, 1, '2023-10-18', 1)
""")

cur.execute("""select * from sales_stream""")

tables = cur.fetchall()

for table in tables:
    print(table)

(1, 1, 1, datetime.date(2023, 10, 18), 1, 'INSERT', False, '97b3fc131e219ab52247c5fd3174943217296cb8')
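Because the root task runs on a one-minute schedule, the DW tables are not populated immediately after these inserts. A generic polling sketch that could be used to wait for the rows follows. `wait_until` and its default timings are assumptions for illustration, and the `sleep` and `clock` parameters exist only to make the helper easy to test.

```python
import time


def wait_until(check, timeout=180.0, interval=10.0, sleep=time.sleep, clock=time.monotonic):
    """Call `check()` repeatedly until it returns a truthy value or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Possible usage with the DW cursor (not executed here):
# wait_until(lambda: cur2.execute("select count(*) from fact_sales").fetchone()[0] >= 1)
```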


## Querying the data in the DW tables

### Querying the dim_customers table

In [17]:
cur2.execute("""select * from dim_customers""")

tables = cur2.fetchall()

for table in tables:
    print(table)

(1, 1, 'Jhon Smith', 'jhon@smith', '123 Main St')
(2, 2, 'Jane Doe', 'jane@doe', '456 Oak St')


### Querying the dim_products table

In [16]:
cur2.execute("""select * from dim_products""")

tables = cur2.fetchall()

for table in tables:
    print(table)

(1, 1, 'Widget A', 'A great widget', Decimal('19.99'))
(2, 2, 'Widget B', 'A better widget', Decimal('29.99'))
(3, 3, 'Widget C', 'The best widget', Decimal('39.99'))


### Querying the fact_sales table

In [12]:
cur2.execute("""select * from fact_sales""")

tables = cur2.fetchall()

for table in tables:
    print(table)

(1, 1, 1, 1, 1387, 1)


## Suspending the tasks and dropping them from the Staging database

In [14]:

cur.execute("""
alter task dim_customers_task suspend;
alter task dim_products_task suspend;
alter task fact_sales_task suspend;
             
drop task dim_customers_task; 
drop task dim_products_task; 
drop task fact_sales_task; 
""", num_statements=6)

<snowflake.connector.cursor.SnowflakeCursor at 0x7fd8e0fbbc70>
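The notebook never closes its cursors and connections. Both object types in the Snowflake connector expose a `close()` method, so a final cleanup step could look like the sketch below. `close_all` is a hypothetical helper that works with any object that has `close()`.

```python
def close_all(*resources):
    """Call close() on every resource, continuing even if one of them raises.

    The first error encountered, if any, is re-raised after all resources
    have been attempted.
    """
    errors = []
    for resource in resources:
        try:
            resource.close()
        except Exception as exc:  # keep going so later resources still get closed
            errors.append(exc)
    if errors:
        raise errors[0]


# Possible usage at the end of the notebook (not executed here):
# close_all(cur, cur2, conn, conn2)
```

Note that the first connection opened without a database is rebound when `conn` is reassigned, so it would need its own `close()` call before that point.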