# Creating the Target

## Exercise 1:
Create the file `competencia_01.csv` from `competencia_01_crudo.csv`, adding a variable called `clase_ternaria` that takes the categories **CONTINUA**, **BAJA+1**, and **BAJA+2**.

### A Little Help

To practice **SQL**, a very useful and necessary language, we will use an **OLAP** database called **DuckDB**.

The documentation is available [here](https://duckdb.org/docs/archive/0.8.1/sql/introduction).
First we install it; this only needs to be run once.

In [1]:
%%bash 
pip install duckdb
pip install jupysql
pip install duckdb-engine


Next we configure the runtime environment. If everything is already installed, running this cell is all you need to start using **duckdb**.

In [9]:
import duckdb
import pandas as pd

%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

%sql duckdb:///:default:

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


And now we can use **SQL** inside a notebook!

In [10]:
%%sql
select 'hola mundo' 

Unnamed: 0,'hola mundo'
0,hola mundo


To load the `.csv` file into a table:

In [11]:
%%sql
create or replace table competencia_01 as
select 
    *
from read_csv_auto('../../datasets/competencia_01_crudo.csv')

Unnamed: 0,Count
0,491063


Let's run a few basic queries to check that everything is working.

In [12]:
%sql select * from competencia_01 limit 5

Unnamed: 0,numero_de_cliente,foto_mes,active_quarter,cliente_vip,internet,cliente_edad,cliente_antiguedad,mrentabilidad,mrentabilidad_annual,mcomisiones,...,Visa_madelantodolares,Visa_fultimo_cierre,Visa_mpagado,Visa_mpagospesos,Visa_mpagosdolares,Visa_fechaalta,Visa_mconsumototal,Visa_cconsumos,Visa_cadelantosefectivo,Visa_mpagominimo
0,29183981,202103,1,0,0,50,197,14468.81,125765.29,2389.82,...,0.0,7,0.0,-114954.0,0.0,5938,101050.66,68,0,71811.06
1,29184630,202103,1,0,0,59,322,11901.57,74158.93,18750.68,...,0.0,1,0.0,-40330.15,17.59,4089,26834.09,7,0,3894.36
2,29185433,202103,1,0,0,68,268,847.15,21672.47,481.62,...,0.0,21,4692.0,-1173.0,0.0,7829,1651.36,3,0,1560.09
3,29185587,202103,1,0,0,79,322,4976.94,47735.98,1839.31,...,0.0,1,0.0,-15988.67,0.0,7580,30025.29,11,0,1700.85
4,29185646,202103,1,0,0,60,257,2860.45,37800.71,4035.4,...,0.0,21,380616.14,-97383.25,0.0,7827,359610.7,31,0,15600.9


In [13]:
%%sql
select 
    foto_mes
    , count(*) as cantidad -- counts the rows in each foto_mes
                           -- and stores the result in a column named cantidad
from competencia_01
group by foto_mes

Unnamed: 0,foto_mes,cantidad
0,202103,163324
1,202104,163637
2,202105,164102


Perfect. Now create a new table with the additional variable requested. Some features that may be useful: [where](https://duckdb.org/docs/sql/query_syntax/where), [left join](https://duckdb.org/docs/sql/query_syntax/from), [case statement](https://duckdb.org/docs/sql/expressions/case)
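As a starting point, here is a minimal sketch of how `left join` and `case` could combine to produce the three classes. It rests on several assumptions that the exercise text does not spell out: a client who is missing from the following month is labeled **BAJA+1**, one who is present the following month but missing two months ahead is **BAJA+2**, one who is present in both is **CONTINUA**, and rows whose future months fall outside the data get no label (`NULL`). The table name `clientes`, the toy rows, and the helper `label_clients` are hypothetical. The sketch runs on Python's built-in `sqlite3` rather than DuckDB so that it works without installing anything; the SQL is meant to be plain enough to carry over to a `%%sql` cell, but that has not been checked here. It also assumes every `foto_mes` falls in the same calendar year, so that `foto_mes + 1` really is the next month.

```python
import sqlite3

# Hypothetical toy rows: (numero_de_cliente, foto_mes). Not taken from the real file.
TOY_ROWS = [
    (1, 202103), (1, 202104), (1, 202105),  # present in all three months
    (2, 202103), (2, 202104),               # last seen in 202104
    (3, 202103),                            # last seen in 202103
]

# Assumed class definitions (see the paragraph above); NULL means "not yet knowable".
LABEL_QUERY = """
SELECT
    c.numero_de_cliente,
    c.foto_mes,
    CASE
        -- the next month is covered by the data, but the client is not in it
        WHEN c.foto_mes + 1 <= m.max_mes
             AND n1.numero_de_cliente IS NULL THEN 'BAJA+1'
        -- present next month, missing two months ahead
        WHEN c.foto_mes + 2 <= m.max_mes
             AND n1.numero_de_cliente IS NOT NULL
             AND n2.numero_de_cliente IS NULL THEN 'BAJA+2'
        -- present in both following months
        WHEN c.foto_mes + 2 <= m.max_mes
             AND n1.numero_de_cliente IS NOT NULL
             AND n2.numero_de_cliente IS NOT NULL THEN 'CONTINUA'
        ELSE NULL
    END AS clase_ternaria
FROM clientes AS c
CROSS JOIN (SELECT MAX(foto_mes) AS max_mes FROM clientes) AS m
LEFT JOIN clientes AS n1
       ON n1.numero_de_cliente = c.numero_de_cliente
      AND n1.foto_mes = c.foto_mes + 1   -- valid only within one calendar year
LEFT JOIN clientes AS n2
       ON n2.numero_de_cliente = c.numero_de_cliente
      AND n2.foto_mes = c.foto_mes + 2
ORDER BY c.numero_de_cliente, c.foto_mes
"""


def label_clients(rows):
    """Load (client, month) pairs into an in-memory table and label each row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE clientes (numero_de_cliente INTEGER, foto_mes INTEGER)"
        )
        con.executemany("INSERT INTO clientes VALUES (?, ?)", rows)
        return con.execute(LABEL_QUERY).fetchall()
    finally:
        con.close()


labeled = label_clients(TOY_ROWS)
for row in labeled:
    print(row)
```

With only three months of data, this labeling leaves every row of the last month and the surviving clients of the second-to-last month without a class, which is worth keeping in mind when counting classes later.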



In [15]:
%%sql
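-- Placeholder: the constant 1 only marks where the new column goes.
-- Replace it with the logic that yields CONTINUA, BAJA+1 or BAJA+2.
CREATE OR REPLACE TABLE competencia_01 AS
SELECT
    C.*,
    1 AS clase_ternaria
FROM competencia_01 AS C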

Unnamed: 0,Count
0,491063


In [19]:
%%sql
SELECT
    *
FROM competencia_01
WHERE RANDOM() < 0.01;

Unnamed: 0,numero_de_cliente,foto_mes,active_quarter,cliente_vip,internet,cliente_edad,cliente_antiguedad,mrentabilidad,mrentabilidad_annual,mcomisiones,...,Visa_fultimo_cierre,Visa_mpagado,Visa_mpagospesos,Visa_mpagosdolares,Visa_fechaalta,Visa_mconsumototal,Visa_cconsumos,Visa_cadelantosefectivo,Visa_mpagominimo,clase_ternaria
0,29212948,202103,1,0,0,65,181,-1207.52,12966.81,4382.16,...,1.0,0.0,-76159.48,0.00,4689.0,30207.51,10.0,0.0,20222.52,1
1,29242359,202103,1,0,0,47,299,3088.64,100452.18,967.45,...,7.0,0.0,-50072.03,0.00,5231.0,8152.04,4.0,0.0,2005.83,1
2,29253366,202103,1,0,0,51,322,7294.91,48738.63,1260.67,...,1.0,0.0,-8785.77,0.00,2486.0,11660.12,7.0,0.0,13266.63,1
3,29283990,202103,1,0,0,44,89,-9093.72,-16830.21,2241.08,...,1.0,0.0,-60063.27,2.35,1630.0,29290.22,7.0,0.0,19952.73,1
4,29351531,202103,1,0,0,44,68,-274.59,7186.84,-239.11,...,1.0,0.0,-10454.79,0.00,1254.0,5209.30,3.0,0.0,609.96,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4954,182630612,202105,1,0,0,26,7,1029.74,4423.82,396.72,...,5.0,0.0,0.00,0.00,48.0,0.00,0.0,0.0,0.00,1
4955,183829446,202105,1,0,0,34,5,1161.69,8665.06,0.00,...,5.0,0.0,-8809.23,0.00,134.0,0.00,0.0,0.0,7237.41,1
4956,186138377,202105,1,0,0,31,2,-13843.39,-12213.78,921.98,...,12.0,0.0,0.00,0.00,49.0,7115.07,5.0,0.0,0.00,1
4957,186462981,202105,1,0,0,60,2,1023.55,1023.55,1157.93,...,5.0,0.0,0.00,0.00,50.0,5043.90,2.0,0.0,1137.81,1


## Exercise 1.1

* How many records are there in each class?
* What is the proportion of each target class?

In [None]:
%%sql
-- enter your queries here
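One possible shape for these two queries is a single `group by` that returns both the count and the share of each class. The sketch below is illustrative only: the table name `etiquetas`, the helper `class_summary`, and the toy labels are hypothetical, and it runs on Python's built-in `sqlite3` instead of DuckDB so that it needs no installation. The share is computed against the total row count, so unlabeled (`NULL`) rows, if any, would form their own group and count toward the denominator.

```python
import sqlite3

# Count per class plus its share of all rows.
# Multiplying by 1.0 forces floating-point division.
SUMMARY_QUERY = """
SELECT
    clase_ternaria,
    COUNT(*) AS cantidad,
    COUNT(*) * 1.0 / (SELECT COUNT(*) FROM etiquetas) AS proporcion
FROM etiquetas
GROUP BY clase_ternaria
ORDER BY cantidad DESC, clase_ternaria
"""


def class_summary(labels):
    """Return (class, count, proportion) tuples for a list of class labels."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE etiquetas (clase_ternaria TEXT)")
        con.executemany(
            "INSERT INTO etiquetas VALUES (?)", [(label,) for label in labels]
        )
        return con.execute(SUMMARY_QUERY).fetchall()
    finally:
        con.close()


# Hypothetical toy labels, chosen only to make the arithmetic easy to follow.
toy_labels = ["CONTINUA"] * 8 + ["BAJA+1"] + ["BAJA+2"]
summary = class_summary(toy_labels)
for clase, cantidad, proporcion in summary:
    print(clase, cantidad, round(proporcion, 3))
```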

To save the table to a **.csv** file, simply run the following statement:

In [None]:
%%sql
COPY competencia_01 TO '/home/aleb/dmeyf23/datasets/competencia_01.csv' (FORMAT CSV, HEADER)

## Exercise 2 - Advanced
Now use the file `ejercicio_target.csv` and compute the **clase_ternaria** of every client in every period in a single pass.
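One way to approach this, sketched here under stated assumptions, is to use the `LEAD` window function to look at each client's next two recorded periods and compare them with the periods that should follow. The column layout of `ejercicio_target.csv` is not shown in this notebook, so the sketch assumes it has `numero_de_cliente` and `foto_mes` (as `YYYYMM`) like the first file. The class definitions are also assumed: missing next month means **BAJA+1**, missing two months ahead means **BAJA+2**, otherwise **CONTINUA**, and `NULL` when the future months fall outside the data. The table `clientes`, the helper `label_all_periods`, and the toy rows are hypothetical. As a swap-in, the code runs on Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer) rather than DuckDB. To cope with year boundaries, `foto_mes` is first turned into a running month index.

```python
import sqlite3

# mes_idx turns YYYYMM into a running month count, so 202112 + 1 month = 202201.
# Subtracting the remainder first keeps the division exact.
WINDOW_QUERY = """
WITH idx AS (
    SELECT
        numero_de_cliente,
        foto_mes,
        (foto_mes - foto_mes % 100) / 100 * 12 + foto_mes % 100 AS mes_idx
    FROM clientes
),
seq AS (
    SELECT
        numero_de_cliente,
        foto_mes,
        mes_idx,
        -- the client's next and second-next recorded periods (NULL if none)
        LEAD(mes_idx, 1) OVER (PARTITION BY numero_de_cliente ORDER BY mes_idx) AS next1,
        LEAD(mes_idx, 2) OVER (PARTITION BY numero_de_cliente ORDER BY mes_idx) AS next2,
        MAX(mes_idx) OVER () AS max_idx
    FROM idx
)
SELECT
    numero_de_cliente,
    foto_mes,
    CASE
        WHEN mes_idx + 1 <= max_idx
             AND (next1 IS NULL OR next1 <> mes_idx + 1) THEN 'BAJA+1'
        WHEN mes_idx + 2 <= max_idx
             AND (next2 IS NULL OR next2 <> mes_idx + 2) THEN 'BAJA+2'
        WHEN mes_idx + 2 <= max_idx THEN 'CONTINUA'
        ELSE NULL  -- the future months are not in the data yet
    END AS clase_ternaria
FROM seq
ORDER BY numero_de_cliente, foto_mes
"""


def label_all_periods(rows):
    """Label every (client, period) row in one query using window functions."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE clientes (numero_de_cliente INTEGER, foto_mes INTEGER)"
        )
        con.executemany("INSERT INTO clientes VALUES (?, ?)", rows)
        return con.execute(WINDOW_QUERY).fetchall()
    finally:
        con.close()


# Hypothetical toy rows that cross a year boundary.
toy_rows = [
    (1, 202111), (1, 202112), (1, 202201), (1, 202202),  # never leaves
    (2, 202111), (2, 202112),                            # last seen in 202112
    (3, 202111),                                         # last seen in 202111
]
all_labels = label_all_periods(toy_rows)
for row in all_labels:
    print(row)
```

Compared with chaining one `left join` per future month, the window version scans the table once per client partition and does not depend on how many periods the file contains.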