## CoderHouse - Data Engineering - Kevin Schiebelbein - Deliverable 2

> Before running the code, install the dependencies with `pip3 install -r requeriments.txt` from the notebook's root folder.

Import the required libraries

In [None]:
from utils import getFakeData
from dotenv import dotenv_values
import redshift_connector
import json
import pandas as pd
#from sqlalchemy import create_engine

pd.set_option('display.max_columns', None)

Define the configuration used to connect to the Data Warehouse

In [None]:
config = dotenv_values(".env")
driver = config["DRIVER"]
host = config["HOST"]
db = config["DB"]
user = config["USER"]
password = config["PASSDW"]
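
Since `dotenv_values` returns a plain dictionary, a missing entry only shows up as a `KeyError` at the first lookup. A minimal sketch of an up-front check, assuming the five keys read above; `require_keys` is a hypothetical helper (not part of `python-dotenv`) and the sample dictionary is a placeholder standing in for the real `.env` contents:

```python
REQUIRED_KEYS = ("DRIVER", "HOST", "DB", "USER", "PASSDW")

def require_keys(config, required=REQUIRED_KEYS):
    """Return the names of required keys that are missing or empty."""
    return [key for key in required if not config.get(key)]

# Illustrative stand-in for dotenv_values(".env"); the values are placeholders.
sample_config = {"DRIVER": "redshift", "HOST": "example-host", "DB": "dev", "USER": "demo"}

missing = require_keys(sample_config)
if missing:
    print(f"Missing configuration keys: {', '.join(missing)}")
```

Checking everything at once reports all missing keys in a single message instead of failing on the first one.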

Extract the data from the public API and organize it for insertion

In [None]:
try:
  api = "https://fakestoreapi.com/products"
  result = getFakeData(api)
  products = json.loads(result.text)
  print(products)
  #values = [tuple((p["id"], p["title"], p["price"], p["category"], p["image"], p["rating"]["rate"])) for p in products]
  df = pd.DataFrame(products)
except Exception as e:
  print(e)
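
The insert step expects one tuple per product, in the column order of the target table. A minimal sketch of that reshaping, following the fields named in the commented-out line above; `to_row` is a hypothetical helper and the sample record is made up to mimic the assumed payload shape, not real API output:

```python
def to_row(product):
    """Map one product dictionary to an (id, title, price, category, image, rate) tuple."""
    return (
        product["id"],
        product["title"],
        product["price"],
        product["category"],
        product["image"],
        product["rating"]["rate"],  # nested field: rating is itself a dictionary
    )

# Made-up record for illustration only.
sample_products = [
    {
        "id": 1,
        "title": "Sample backpack",
        "price": 19.99,
        "category": "bags",
        "image": "https://example.com/1.png",
        "rating": {"rate": 4.1, "count": 12},
    },
]

rows = [to_row(p) for p in sample_products]
print(rows)
```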


ETL - Split `rating` into separate `rate` and `count` columns

In [None]:
df = pd.concat([df.drop(['rating'], axis=1), df['rating'].apply(pd.Series)], axis=1)
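
`apply(pd.Series)` builds one Series per row, which can become slow on larger frames. A sketch of an alternative, assuming the records still hold `rating` as a nested dictionary: `pd.json_normalize` flattens the nesting in one pass, and the rename strips the `rating.` prefix it adds. The sample records are made up.

```python
import pandas as pd

# Made-up records with the same nested "rating" shape.
sample_products = [
    {"id": 1, "price": 19.99, "rating": {"rate": 4.1, "count": 12}},
    {"id": 2, "price": 75.0, "rating": {"rate": 3.2, "count": 40}},
]

# Nested keys become "rating.rate" and "rating.count"; rename them to short names.
flat = pd.json_normalize(sample_products).rename(
    columns={"rating.rate": "rate", "rating.count": "count"}
)
print(flat)
```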

ETL - Check whether each product is on sale (price below 50) and add the result to the DataFrame

In [None]:
df['status'] = ['On Sale' if x < 50 else '' for x in df['price']]
df.head()
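
The same flag can also be computed without a Python-level loop. A small sketch using `numpy.where` on a made-up frame; `SALE_THRESHOLD` is a name introduced here for the cutoff of 50 used above:

```python
import numpy as np
import pandas as pd

SALE_THRESHOLD = 50  # same cutoff as the cell above

sample = pd.DataFrame({"price": [19.99, 50.0, 75.0]})

# Element-wise choice: "On Sale" where the condition holds, empty string otherwise.
sample["status"] = np.where(sample["price"] < SALE_THRESHOLD, "On Sale", "")
print(sample)
```

Note that a price of exactly 50 is not flagged, because the comparison is strict.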

ETL - Number of products per category

In [None]:
categories = df.groupby(["category"]).count()
categories
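
`count()` on a grouped frame returns the number of non-null values for every remaining column, so the same figure tends to repeat across columns. If a single count per category is enough, `size()` is one option; a sketch on made-up data:

```python
import pandas as pd

sample = pd.DataFrame(
    {"category": ["bags", "bags", "audio"], "price": [19.99, 25.0, 75.0]}
)

# One row count per category, returned as a Series named "products".
per_category = sample.groupby("category").size().rename("products")
print(per_category)
```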

ETL - Top 10 products by rating

In [None]:
top_10_products = df.sort_values("rate", ascending=False).head(10)
top_10_products
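
Several products can share the same `rate`, in which case their relative order in the ranking is arbitrary. A sketch of one possible tie-break, assuming the number of reviews (`count`) should decide between equal rates; this rule is an illustration, not part of the original ranking:

```python
import pandas as pd

sample = pd.DataFrame(
    {"id": [1, 2, 3], "rate": [4.5, 4.5, 3.0], "count": [10, 200, 50]}
)

# Sort by rate first, then by count, both descending, and keep the top two.
top = sample.sort_values(["rate", "count"], ascending=[False, False]).head(2)
print(top)
```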

Insert the data into the destination

In [None]:
# Rows to insert, in the same column order as the INSERT statement
columns = ["id", "title", "price", "category", "image", "rate"]
values = list(df[columns].itertuples(index=False, name=None))

# Connect to the Redshift cluster with the database credentials loaded from .env.
# Both context managers close their resource on exit, so no explicit close() is needed.
with redshift_connector.connect(host=host, database=db, user=user, password=password) as conn:
  conn.autocommit = True
  with conn.cursor() as cursor:
    tabla = """
      CREATE TABLE IF NOT EXISTS public.products (
      id INTEGER,
      title VARCHAR(128),
      price FLOAT8,
      category VARCHAR(256),
      image TEXT,
      rate FLOAT8
      ) DISTKEY(id) SORTKEY(rate);
    """
    cursor.execute(tabla)
    try:
      cursor.executemany("insert into public.products (id, title, price, category, image, rate) values (%s, %s, %s, %s, %s, %s)", values)
    except Exception as e:
      print(f"Error saving the data: {e}")
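
Running the insert cell twice appends the same rows again, because `CREATE TABLE IF NOT EXISTS` leaves existing data in place. The sketch below shows one possible reload strategy (delete, then insert, inside one transaction). It uses the standard-library `sqlite3` module purely as a local stand-in so it can run without a cluster; note that `sqlite3` uses `?` placeholders where the cell above uses `%s`. The `load` helper, the sample rows, and the delete-then-insert choice are all assumptions for illustration.

```python
import sqlite3

# Made-up rows in (id, title, price, category, image, rate) order.
rows = [
    (1, "Sample backpack", 19.99, "bags", "https://example.com/1.png", 4.1),
    (2, "Sample headphones", 75.0, "audio", "https://example.com/2.png", 3.2),
]

def load(conn, rows):
    """Create the table if needed, then replace its contents with rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS products ("
            "id INTEGER, title TEXT, price REAL, category TEXT, image TEXT, rate REAL)"
        )
        conn.execute("DELETE FROM products")
        conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, rows)
load(conn, rows)  # a second run replaces the rows instead of duplicating them
total = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(total)
conn.close()
```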