## CoderHouse - Data Engineering - Kevin Schiebelbein - Deliverable 2

> Before running the code, install the dependencies with `pip3 install -r requeriments.txt` from the notebook's root folder.

Import the required libraries.

In [14]:
from utils import getFakeData
from dotenv import dotenv_values
import redshift_connector
import json
import pandas as pd
from datetime import date, timedelta
#from sqlalchemy import create_engine

pd.set_option('display.max_columns', None)

Define the configuration for querying the data warehouse.

In [15]:
config = dotenv_values(".env")
driver = config["DRIVER"]
host = config["HOST"]
db = config["DB"]
user = config["USER"]
password = config["PASSDW"]
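
If a key is missing from `.env`, the lookups above fail with a bare `KeyError`, one key at a time. A minimal sketch of an up-front check, assuming the configuration is a plain mapping like the one `dotenv_values` returns; `missing_keys` and `REQUIRED_KEYS` are hypothetical names, not part of the notebook:

```python
# Hypothetical helper: report every required key that is absent or empty.
REQUIRED_KEYS = ("DRIVER", "HOST", "DB", "USER", "PASSDW")

def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are missing or have an empty value."""
    return [key for key in required if not config.get(key)]

# In-memory mapping standing in for the parsed .env file (made-up values).
sample_config = {"DRIVER": "redshift", "HOST": "example-host", "DB": "dev", "USER": ""}
print(missing_keys(sample_config))  # ['USER', 'PASSDW']
```

Calling `missing_keys(config)` right after `dotenv_values(".env")` would list all the problems at once instead of stopping at the first one.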

Extract the data from the public API and arrange it for insertion.

In [16]:
try:
  api = "https://fakestoreapi.com/products"
  result = getFakeData(api)
  products = json.loads(result.text)
  print(products)
  #values = [tuple((p["id"], p["title"], p["price"], p["category"], p["image"], p["rating"]["rate"])) for p in products]
  df = pd.DataFrame(products)
except Exception as e:
  print(e)


[{'id': 1, 'title': 'Fjallraven - Foldsack No. 1 Backpack, Fits 15 Laptops', 'price': 109.95, 'description': 'Your perfect pack for everyday use and walks in the forest. Stash your laptop (up to 15 inches) in the padded sleeve, your everyday', 'category': "men's clothing", 'image': 'https://fakestoreapi.com/img/81fPKd-2AYL._AC_SL1500_.jpg', 'rating': {'rate': 3.9, 'count': 120}}, {'id': 2, 'title': 'Mens Casual Premium Slim Fit T-Shirts ', 'price': 22.3, 'description': 'Slim-fitting style, contrast raglan long sleeve, three-button henley placket, light weight & soft fabric for breathable and comfortable wearing. And Solid stitched shirts with round neck made for durability and a great fit for casual fashion wear and diehard baseball fans. The Henley style round neckline includes a three-button placket.', 'category': "men's clothing", 'image': 'https://fakestoreapi.com/img/71-3HjGNDUL._AC_SY879._SX._UX._SY._UY_.jpg', 'rating': {'rate': 4.1, 'count': 259}}, {'id': 3, 'title': 'Mens Cotto

ETL - Split `rating` into separate `rate` and `count` columns

In [17]:
df = pd.concat([df.drop(['rating'], axis=1), df['rating'].apply(pd.Series)], axis=1)
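
The same split can also be done on the raw records before the DataFrame is built. A minimal sketch, assuming each product is a dict with a nested `rating` dict as in the API output printed earlier; `flatten_rating` is a hypothetical helper:

```python
def flatten_rating(product):
    """Return a copy of the product with the rating fields promoted to top level."""
    flat = {key: value for key, value in product.items() if key != "rating"}
    rating = product.get("rating") or {}  # tolerate a missing or null rating
    flat["rate"] = rating.get("rate")
    flat["count"] = rating.get("count")
    return flat

sample = {"id": 1, "price": 109.95, "rating": {"rate": 3.9, "count": 120}}
print(flatten_rating(sample))
# {'id': 1, 'price': 109.95, 'rate': 3.9, 'count': 120}
```

Building the frame with `pd.DataFrame([flatten_rating(p) for p in products])` should then yield the `rate` and `count` columns directly, with `count` kept as an integer column as long as no value is missing.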

ETL - Check whether each product is on sale and add the result to the DataFrame

In [18]:
df['status'] = ['On Sale' if x < 50 else '' for x in df['price']]
df.head()

Unnamed: 0,id,title,price,description,category,image,rate,count,status
0,1,"Fjallraven - Foldsack No. 1 Backpack, Fits 15 ...",109.95,Your perfect pack for everyday use and walks i...,men's clothing,https://fakestoreapi.com/img/81fPKd-2AYL._AC_S...,3.9,120.0,
1,2,Mens Casual Premium Slim Fit T-Shirts,22.3,"Slim-fitting style, contrast raglan long sleev...",men's clothing,https://fakestoreapi.com/img/71-3HjGNDUL._AC_S...,4.1,259.0,On Sale
2,3,Mens Cotton Jacket,55.99,great outerwear jackets for Spring/Autumn/Wint...,men's clothing,https://fakestoreapi.com/img/71li-ujtlUL._AC_U...,4.7,500.0,
3,4,Mens Casual Slim Fit,15.99,The color could be slightly different between ...,men's clothing,https://fakestoreapi.com/img/71YXzeOuslL._AC_U...,2.1,430.0,On Sale
4,5,John Hardy Women's Legends Naga Gold & Silver ...,695.0,"From our Legends Collection, the Naga was insp...",jewelery,https://fakestoreapi.com/img/71pWzhdJNwL._AC_U...,4.6,400.0,
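
The rule above flags any price strictly below 50 as on sale. A minimal sketch that isolates the rule in a function so the boundary case is explicit; `sale_status` and `SALE_THRESHOLD` are hypothetical names introduced for illustration:

```python
SALE_THRESHOLD = 50  # same cutoff as the cell above

def sale_status(price, threshold=SALE_THRESHOLD):
    """Return 'On Sale' for prices strictly below the threshold, else ''."""
    return "On Sale" if price < threshold else ""

# A price exactly at the threshold is not on sale.
print([sale_status(p) for p in (109.95, 22.3, 50)])  # ['', 'On Sale', '']
```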


ETL - Add the product date

In [19]:
df["fecha"] = date.today()
df

Unnamed: 0,id,title,price,description,category,image,rate,count,status,fecha
0,1,"Fjallraven - Foldsack No. 1 Backpack, Fits 15 ...",109.95,Your perfect pack for everyday use and walks i...,men's clothing,https://fakestoreapi.com/img/81fPKd-2AYL._AC_S...,3.9,120.0,,2023-06-23
1,2,Mens Casual Premium Slim Fit T-Shirts,22.3,"Slim-fitting style, contrast raglan long sleev...",men's clothing,https://fakestoreapi.com/img/71-3HjGNDUL._AC_S...,4.1,259.0,On Sale,2023-06-23
2,3,Mens Cotton Jacket,55.99,great outerwear jackets for Spring/Autumn/Wint...,men's clothing,https://fakestoreapi.com/img/71li-ujtlUL._AC_U...,4.7,500.0,,2023-06-23
3,4,Mens Casual Slim Fit,15.99,The color could be slightly different between ...,men's clothing,https://fakestoreapi.com/img/71YXzeOuslL._AC_U...,2.1,430.0,On Sale,2023-06-23
4,5,John Hardy Women's Legends Naga Gold & Silver ...,695.0,"From our Legends Collection, the Naga was insp...",jewelery,https://fakestoreapi.com/img/71pWzhdJNwL._AC_U...,4.6,400.0,,2023-06-23
5,6,Solid Gold Petite Micropave,168.0,Satisfaction Guaranteed. Return or exchange an...,jewelery,https://fakestoreapi.com/img/61sbMiUnoGL._AC_U...,3.9,70.0,,2023-06-23
6,7,White Gold Plated Princess,9.99,Classic Created Wedding Engagement Solitaire D...,jewelery,https://fakestoreapi.com/img/71YAIFU48IL._AC_U...,3.0,400.0,On Sale,2023-06-23
7,8,Pierced Owl Rose Gold Plated Stainless Steel D...,10.99,Rose Gold Plated Double Flared Tunnel Plug Ear...,jewelery,https://fakestoreapi.com/img/51UDEzMJVpL._AC_U...,1.9,100.0,On Sale,2023-06-23
8,9,WD 2TB Elements Portable External Hard Drive -...,64.0,USB 3.0 and USB 2.0 Compatibility Fast data tr...,electronics,https://fakestoreapi.com/img/61IBBVJvSDL._AC_S...,3.3,203.0,,2023-06-23
9,10,SanDisk SSD PLUS 1TB Internal SSD - SATA III 6...,109.0,"Easy upgrade for faster boot up, shutdown, app...",electronics,https://fakestoreapi.com/img/61U7T1koQqL._AC_S...,2.9,470.0,,2023-06-23


Insertion test - Add records with the same ID but a different date so that new rows are created in the DW, preserving the history

In [20]:
manana = date.today() + timedelta(days=1) 
df.loc[len(df.index)] = [20, "Producto nuevo 1 - Lente camara", 500, "Alguna description 1", "IT", "https://johnstillk8.scusd.edu/sites/main/files/main-images/camera_lense_0.jpeg", 3.5, 9, "On Sale", manana]
df.loc[len(df.index)] = [21, "Producto nuevo 1 - Notebook", 3000, "Alguna description 2", "IT", "https://www.oberlo.com/media/1603969791-image-1.jpg?fit=max&fm=webp&w=1824", 6, 25, "On Sale", manana]
df

Unnamed: 0,id,title,price,description,category,image,rate,count,status,fecha
0,1,"Fjallraven - Foldsack No. 1 Backpack, Fits 15 ...",109.95,Your perfect pack for everyday use and walks i...,men's clothing,https://fakestoreapi.com/img/81fPKd-2AYL._AC_S...,3.9,120.0,,2023-06-23
1,2,Mens Casual Premium Slim Fit T-Shirts,22.3,"Slim-fitting style, contrast raglan long sleev...",men's clothing,https://fakestoreapi.com/img/71-3HjGNDUL._AC_S...,4.1,259.0,On Sale,2023-06-23
2,3,Mens Cotton Jacket,55.99,great outerwear jackets for Spring/Autumn/Wint...,men's clothing,https://fakestoreapi.com/img/71li-ujtlUL._AC_U...,4.7,500.0,,2023-06-23
3,4,Mens Casual Slim Fit,15.99,The color could be slightly different between ...,men's clothing,https://fakestoreapi.com/img/71YXzeOuslL._AC_U...,2.1,430.0,On Sale,2023-06-23
4,5,John Hardy Women's Legends Naga Gold & Silver ...,695.0,"From our Legends Collection, the Naga was insp...",jewelery,https://fakestoreapi.com/img/71pWzhdJNwL._AC_U...,4.6,400.0,,2023-06-23
5,6,Solid Gold Petite Micropave,168.0,Satisfaction Guaranteed. Return or exchange an...,jewelery,https://fakestoreapi.com/img/61sbMiUnoGL._AC_U...,3.9,70.0,,2023-06-23
6,7,White Gold Plated Princess,9.99,Classic Created Wedding Engagement Solitaire D...,jewelery,https://fakestoreapi.com/img/71YAIFU48IL._AC_U...,3.0,400.0,On Sale,2023-06-23
7,8,Pierced Owl Rose Gold Plated Stainless Steel D...,10.99,Rose Gold Plated Double Flared Tunnel Plug Ear...,jewelery,https://fakestoreapi.com/img/51UDEzMJVpL._AC_U...,1.9,100.0,On Sale,2023-06-23
8,9,WD 2TB Elements Portable External Hard Drive -...,64.0,USB 3.0 and USB 2.0 Compatibility Fast data tr...,electronics,https://fakestoreapi.com/img/61IBBVJvSDL._AC_S...,3.3,203.0,,2023-06-23
9,10,SanDisk SSD PLUS 1TB Internal SSD - SATA III 6...,109.0,"Easy upgrade for faster boot up, shutdown, app...",electronics,https://fakestoreapi.com/img/61U7T1koQqL._AC_S...,2.9,470.0,,2023-06-23


ETL - Number of products per category

In [21]:
categories = df.groupby(["category"]).count()
categories

Unnamed: 0_level_0,id,title,price,description,image,rate,count,status,fecha
category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
IT,2,2,2,2,2,2,2,2,2
electronics,6,6,6,6,6,6,6,6,6
jewelery,4,4,4,4,4,4,4,4,4
men's clothing,4,4,4,4,4,4,4,4,4
women's clothing,6,6,6,6,6,6,6,6,6
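
`groupby(...).count()` repeats the same number in every column, as the table above shows. When only the number of products per category is needed, one count per category is enough. A minimal sketch of that idea in plain Python, assuming a list of product dicts with a `category` key (the sample records are made up for illustration):

```python
from collections import Counter

sample_products = [
    {"id": 1, "category": "electronics"},
    {"id": 2, "category": "jewelery"},
    {"id": 3, "category": "electronics"},
]

# One count per category instead of one count per column.
products_per_category = Counter(p["category"] for p in sample_products)
print(products_per_category["electronics"])  # 2
```

On the DataFrame itself, `df.groupby("category").size()` or `df["category"].value_counts()` give the equivalent single-column result.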


ETL - Top 10 products by rating

In [22]:
top_10_products = df.sort_values("rate", ascending=False).head(10)
top_10_products

Unnamed: 0,id,title,price,description,category,image,rate,count,status,fecha
21,21,Producto nuevo 1 - Notebook,3000.0,Alguna description 2,IT,https://www.oberlo.com/media/1603969791-image-...,6.0,25.0,On Sale,2023-06-24
10,11,Silicon Power 256GB SSD 3D NAND A55 SLC Cache ...,109.0,3D NAND flash are applied to deliver high tran...,electronics,https://fakestoreapi.com/img/71kWymZ+c+L._AC_S...,4.8,319.0,,2023-06-23
11,12,WD 4TB Gaming Drive Works with Playstation 4 P...,114.0,"Expand your PS4 gaming experience, Play anywhe...",electronics,https://fakestoreapi.com/img/61mtL65D4cL._AC_S...,4.8,400.0,,2023-06-23
2,3,Mens Cotton Jacket,55.99,great outerwear jackets for Spring/Autumn/Wint...,men's clothing,https://fakestoreapi.com/img/71li-ujtlUL._AC_U...,4.7,500.0,,2023-06-23
17,18,MBJ Women's Solid Short Sleeve Boat Neck V,9.85,"95% RAYON 5% SPANDEX, Made in USA or Imported,...",women's clothing,https://fakestoreapi.com/img/71z3kpMAYsL._AC_U...,4.7,130.0,On Sale,2023-06-23
4,5,John Hardy Women's Legends Naga Gold & Silver ...,695.0,"From our Legends Collection, the Naga was insp...",jewelery,https://fakestoreapi.com/img/71pWzhdJNwL._AC_U...,4.6,400.0,,2023-06-23
18,19,Opna Women's Short Sleeve Moisture,7.95,"100% Polyester, Machine wash, 100% cationic po...",women's clothing,https://fakestoreapi.com/img/51eg55uWmdL._AC_U...,4.5,146.0,On Sale,2023-06-23
1,2,Mens Casual Premium Slim Fit T-Shirts,22.3,"Slim-fitting style, contrast raglan long sleev...",men's clothing,https://fakestoreapi.com/img/71-3HjGNDUL._AC_S...,4.1,259.0,On Sale,2023-06-23
0,1,"Fjallraven - Foldsack No. 1 Backpack, Fits 15 ...",109.95,Your perfect pack for everyday use and walks i...,men's clothing,https://fakestoreapi.com/img/81fPKd-2AYL._AC_S...,3.9,120.0,,2023-06-23
5,6,Solid Gold Petite Micropave,168.0,Satisfaction Guaranteed. Return or exchange an...,jewelery,https://fakestoreapi.com/img/61sbMiUnoGL._AC_U...,3.9,70.0,,2023-06-23
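
Several products in the table above share the same `rate`, so their relative order depends on the sort alone. A minimal sketch of one possible tie-break, ranking by `rate` and then by the number of ratings (`count`); `top_products` is a hypothetical helper and the sample uses a few `id`, `rate`, and `count` values from the data:

```python
import heapq

sample_products = [
    {"id": 11, "rate": 4.8, "count": 319},
    {"id": 12, "rate": 4.8, "count": 400},
    {"id": 3, "rate": 4.7, "count": 500},
    {"id": 8, "rate": 1.9, "count": 100},
]

def top_products(products, n=10):
    """Return the n best products by rate, using the rating count to break ties."""
    return heapq.nlargest(n, products, key=lambda p: (p["rate"], p["count"]))

print([p["id"] for p in top_products(sample_products, n=3)])  # [12, 11, 3]
```

The pandas counterpart would be `df.sort_values(["rate", "count"], ascending=False).head(10)`.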


Insert the data into the destination.

In [None]:
# Connect to the Redshift cluster with the database credentials loaded from .env
insert_cols = ["id", "title", "price", "category", "image", "rate"]
# Build the rows to insert from the DataFrame as plain tuples (`values` was never defined before this cell)
values = list(df[insert_cols].itertuples(index=False, name=None))
with redshift_connector.connect(host=host, database=db, user=user, password=password) as conn:
  conn.autocommit = True
  with conn.cursor() as cursor:
    tabla = """
      CREATE TABLE IF NOT EXISTS public.products (
      id INTEGER,
      title VARCHAR(128),
      price FLOAT8,
      category VARCHAR(256),
      image TEXT,
      rate FLOAT8
      ) DISTKEY(id) SORTKEY(rate);
    """
    cursor.execute(tabla)
    try:
      cursor.executemany("insert into products (id, title, price, category, image, rate) values (%s, %s, %s, %s, %s, %s)", values)
    except Exception as e:
      print(f"Error saving the data: {e}")
# The two `with` blocks close the cursor and the connection on exit
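
To keep history without duplicating rows when the notebook is run twice on the same day, the insert can skip rows whose `(id, fecha)` pair is already stored. A minimal sketch of that pattern, using the standard-library `sqlite3` module as a local stand-in for Redshift; the `fecha` column in the table and the `insert_new_rows` helper are assumptions for illustration, since the table created above does not store the date:

```python
import sqlite3

def insert_new_rows(conn, rows):
    """Insert (id, title, fecha) rows, skipping (id, fecha) pairs already stored."""
    inserted = 0
    for product_id, title, fecha in rows:
        exists = conn.execute(
            "SELECT 1 FROM products WHERE id = ? AND fecha = ?",
            (product_id, fecha),
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO products (id, title, fecha) VALUES (?, ?, ?)",
                (product_id, title, fecha),
            )
            inserted += 1
    conn.commit()
    return inserted

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, title TEXT, fecha TEXT)")

day_one = [(20, "Product A", "2023-06-23")]
day_two = [(20, "Product A", "2023-06-23"), (20, "Product A", "2023-06-24")]

print(insert_new_rows(conn, day_one))  # 1
print(insert_new_rows(conn, day_two))  # 1 (only the row with the new date is added)
```

On Redshift, a set-based variant (for example, loading into a staging table and inserting only the rows absent from the target) would likely be preferable to row-by-row checks.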