![Transactions](img/lake.PNG)

In [9]:
%%sql
create table estados as
select _c0 `Nome`
     , _c1 `Acronimo`
  from csv.`workspace://churn/data/estados.csv.gz`

# Original Data

Mobile operators keep historical records of which customers ended up churning and which kept using the service. We can use this history to build an ML model of one mobile operator's churn through a process called training. After training, we can pass the model the profile of an arbitrary customer (the same profile fields used for training) and have it predict whether that customer will churn. We expect the model to make mistakes; after all, predicting the future is tricky business. I'll also show how to deal with prediction errors.
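
One way to make prediction errors concrete is to split them into false positives (customers flagged as churners who actually stay) and false negatives (churners the model misses). The sketch below is only an illustration: the two label lists are made up, and the `confusion_counts` helper is a hypothetical name, not part of this notebook. It reuses the `True.` / `False.` strings found in the `Churn?` column.

```python
from collections import Counter

# Made-up labels for six customers (not taken from the dataset).
actual    = ['True.', 'False.', 'False.', 'True.',  'False.', 'True.']
predicted = ['True.', 'False.', 'True.',  'False.', 'False.', 'True.']


def confusion_counts(actual, predicted, positive='True.'):
    """Count true/false positives and negatives for a binary churn label."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Model said "churn": correct only if the customer really churned.
            counts['tp' if a == positive else 'fp'] += 1
        else:
            # Model said "stays": a miss if the customer actually churned.
            counts['fn' if a == positive else 'tn'] += 1
    return counts


counts = confusion_counts(actual, predicted)
print(dict(counts))
```

The two error types usually carry different costs: a false negative is a lost customer, while a false positive is a retention offer spent on someone who would have stayed anyway.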

```sh
wget http://dataminingconsultant.com/DKD2e_data_sets.zip
unzip -o DKD2e_data_sets.zip
```

In [1]:
import pandas as pd
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999
spark.read.format('csv').options(header='true', inferSchema='true').load('data/churn.txt').show()

+-----+--------------+---------+--------+----------+----------+-------------+--------+---------+----------+--------+---------+----------+----------+-----------+------------+---------+----------+-----------+--------------+------+
|State|Account Length|Area Code|   Phone|Int'l Plan|VMail Plan|VMail Message|Day Mins|Day Calls|Day Charge|Eve Mins|Eve Calls|Eve Charge|Night Mins|Night Calls|Night Charge|Intl Mins|Intl Calls|Intl Charge|CustServ Calls|Churn?|
+-----+--------------+---------+--------+----------+----------+-------------+--------+---------+----------+--------+---------+----------+----------+-----------+------------+---------+----------+-----------+--------------+------+
|   KS|           128|      415|382-4657|        no|       yes|           25|   265.1|      110|     45.07|   197.4|       99|     16.78|     244.7|         91|       11.01|     10.0|         3|        2.7|             1|False.|
|   OH|           107|      415|371-7191|        no|       yes|           26|   161.

# Data sources

+ [ORACLE1](https://console.aws.amazon.com/rds): Oracle 12c
+ [FIN_BR](https://console.aws.amazon.com/rds): MySQL 8

![Transactions](img/transactions_erd.PNG)

In [10]:
%%sql -v
SELECT * 
  FROM ORACLE1.OT.TRANSACTION_TYPES

INFO:Execution Time: 105.829305


|   | id  | description  |
|---|-----|--------------|
| 0 | 2.0 | Educação     |
| 1 | 3.0 | Transporte   |
| 2 | 5.0 | Lazer        |
| 3 | 6.0 | Supermercado |
| 4 | 1.0 | Serviços     |
| 5 | 4.0 | Restaurante  |
| 6 | 7.0 | Outros       |


[![query](img/import.PNG)](https://console.aws.amazon.com/states/home?region=us-east-1#/executions/details/arn:aws:states:us-east-1:229343956935:execution:DoraImportMachine:20200415100645351937.ORACLE1.OT.TRANSACTION_TYPES)

In [11]:
%%sql -v
SELECT t.USER_ID 
     , t.TRANSACTION_DATE 
     , t.TRANSACTION_TYPE 
     , t.VALUE 
     , tt.DESCRIPTION 
  FROM ORACLE1.OT.TRANSACTIONS t
  JOIN ORACLE1.OT.TRANSACTION_TYPES tt ON t.CATEGORY_ID = tt.ID 
 ORDER BY t.USER_ID
 LIMIT 10

INFO:ORACLE1.OT.TRANSACTION_TYPES is updated: 2020-04-15 (7 days)
INFO:Execution Time: 41.719065


|   | USER_ID | TRANSACTION_DATE    | TRANSACTION_TYPE | VALUE  | DESCRIPTION |
|---|---------|---------------------|------------------|--------|-------------|
| 0 | 1.0     | 2020-03-04 19:20:22 | DEBITO           | 253.08 | Outros      |
| 1 | 1.0     | 2020-03-29 21:51:10 | CREDITO          | 413.16 | Educação    |
| 2 | 1.0     | 2020-03-10 05:51:52 | DEBITO           | 570.53 | Outros      |
| 3 | 1.0     | 2020-03-30 02:05:05 | DEBITO           | 721.35 | Serviços    |
| 4 | 1.0     | 2020-03-25 06:48:34 | CREDITO          | 287.68 | Outros      |
| 5 | 22.0    | 2020-03-20 10:52:58 | CREDITO          | 62.76  | Outros      |
| 6 | 23.0    | 2020-03-01 13:34:02 | CREDITO          | 280.26 | Serviços    |
| 7 | 23.0    | 2020-03-23 01:28:24 | CREDITO          | 773.86 | Transporte  |
| 8 | 23.0    | 2020-03-14 12:38:09 | CREDITO          | 258.43 | Educação    |
| 9 | 23.0    | 2020-03-12 16:22:23 | CREDITO          | 38.03  | Transporte  |


In [12]:
%%sql
CREATE OR REPLACE VIEW CATEGORIES AS
SELECT t.USER_ID `USER_ID`
     , ts.total `T_SERVICOS`
     , td.total `T_EDUCACAO`
     , tr.total `T_RESTAURANTE`
     , tt.total `T_TRANSPORTE`
     , tl.total `T_LAZER`
     , tm.total `T_SUPERMERCADO`
     , to.total `T_OUTROS`
  FROM (
SELECT distinct(USER_ID) `USER_ID`
  FROM ORACLE1.OT.TRANSACTIONS) t
  LEFT OUTER JOIN (
SELECT USER_ID
     , SUM(VALUE) `TOTAL`
  FROM ORACLE1.OT.TRANSACTIONS
 WHERE CATEGORY_ID = 1
 GROUP BY USER_ID) ts ON ts.USER_ID = t.USER_ID
  LEFT OUTER JOIN (
SELECT USER_ID
     , SUM(VALUE) `TOTAL`
  FROM ORACLE1.OT.TRANSACTIONS
 WHERE CATEGORY_ID = 2
 GROUP BY USER_ID) td ON td.USER_ID = t.USER_ID
  LEFT OUTER JOIN (
SELECT USER_ID
     , SUM(VALUE) `TOTAL`
  FROM ORACLE1.OT.TRANSACTIONS
 WHERE CATEGORY_ID = 4 -- 4 = Restaurante in TRANSACTION_TYPES
 GROUP BY USER_ID) tr ON tr.USER_ID = t.USER_ID
  LEFT OUTER JOIN (
SELECT USER_ID
     , SUM(VALUE) `TOTAL`
  FROM ORACLE1.OT.TRANSACTIONS
 WHERE CATEGORY_ID = 3 -- 3 = Transporte in TRANSACTION_TYPES
 GROUP BY USER_ID) tt ON tt.USER_ID = t.USER_ID
  LEFT OUTER JOIN (
SELECT USER_ID
     , SUM(VALUE) `TOTAL`
  FROM ORACLE1.OT.TRANSACTIONS
 WHERE CATEGORY_ID = 5
 GROUP BY USER_ID) tl ON tl.USER_ID = t.USER_ID
  LEFT OUTER JOIN (
SELECT USER_ID
     , SUM(VALUE) `TOTAL`
  FROM ORACLE1.OT.TRANSACTIONS
 WHERE CATEGORY_ID = 6
 GROUP BY USER_ID) tm ON tm.USER_ID = t.USER_ID
  LEFT OUTER JOIN (
SELECT USER_ID
     , SUM(VALUE) `TOTAL`
  FROM ORACLE1.OT.TRANSACTIONS
 WHERE CATEGORY_ID = 7
 GROUP BY USER_ID) to ON to.USER_ID = t.USER_ID
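
The view above builds one left join per category. A common alternative is conditional aggregation, which scans the table once and yields the same per-user totals, with NULL where a user has no rows in a category. The sketch below is not the notebook's query: it runs against an in-memory SQLite table with made-up rows, and the lower-case table and column names are stand-ins for `ORACLE1.OT.TRANSACTIONS`. The category ids follow the `TRANSACTION_TYPES` listing (1 = Serviços, 3 = Transporte, 4 = Restaurante).

```python
import sqlite3

# In-memory stand-in for the transactions table (hypothetical toy data).
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE transactions (user_id INTEGER, category_id INTEGER, value REAL)'
)
conn.executemany(
    'INSERT INTO transactions VALUES (?, ?, ?)',
    [
        (1, 1, 100.0),  # user 1, Serviços
        (1, 1, 50.0),   # user 1, Serviços
        (1, 4, 20.0),   # user 1, Restaurante
        (2, 3, 30.0),   # user 2, Transporte
    ],
)

# One pass over the table: each SUM only sees the rows of its own category.
# A CASE without ELSE yields NULL, so users without rows in a category get NULL.
query = """
SELECT user_id
     , SUM(CASE WHEN category_id = 1 THEN value END) AS t_servicos
     , SUM(CASE WHEN category_id = 3 THEN value END) AS t_transporte
     , SUM(CASE WHEN category_id = 4 THEN value END) AS t_restaurante
  FROM transactions
 GROUP BY user_id
 ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Keeping each id next to its column name in a single `SELECT` also makes it easier to check the id-to-category mapping by eye.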

In [None]:
%%sql
SELECT * 
  FROM CATEGORIES 
 LIMIT 10

![Transactions](img/users_erd.PNG)

In [None]:
%%sql
select * from fin_br.fin.ADDRESS limit 2

In [None]:
%%sql
select * from csv.`workspace://churn/data/estados.csv.gz`

In [None]:
%%sql
SELECT *
 FROM fin_br.fin.ADDRESS a
  JOIN csv.`workspace://churn/data/estados.csv.gz` e ON upper(e._c0) = upper(a.state)
LIMIT 10
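
Because this is an inner join on `upper(...)`, it tolerates differences in letter case but silently drops any address whose state name is spelled differently from the lookup file. A minimal sketch of that behavior in plain Python follows; the state rows and address values are invented for illustration and are not read from `estados.csv.gz` or `fin_br.fin.ADDRESS`.

```python
# Hypothetical lookup rows: (_c0 = state name, _c1 = acronym).
estados = [('São Paulo', 'SP'), ('Rio de Janeiro', 'RJ')]

# Hypothetical values of ADDRESS.state, with mixed case and one missing accent.
address_states = ['são paulo', 'RIO DE JANEIRO', 'Sao Paulo']

# Same idea as ON upper(e._c0) = upper(a.state): compare upper-cased keys.
lookup = {name.upper(): acronym for name, acronym in estados}

matched = {s: lookup[s.upper()] for s in address_states if s.upper() in lookup}
unmatched = [s for s in address_states if s.upper() not in lookup]

print(matched)
print(unmatched)
```

Listing the unmatched values before building the final table is a cheap way to see how many rows the join would discard.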

In [None]:
%%sql
CREATE TABLE CHURN AS
SELECT a.postcode `CEP`
     , e._c1 `ESTADO`
     , u.gender `GENERO`
     , substring(u.cell,2,2) `DDD`
     , substring(replace(u.cell,'-',''),6) `CELULAR`
     , datediff(now(), u.dob)/365 `IDADE`
     , datediff(now(), l.registered)/365 `IDADE_CONTA`
     , cast(nvl(c.total, 0) as float) `CREDITO`
     , cast(nvl(d.total, 0) as float) `DEBITO`
     , cast(nvl(cat.T_SERVICOS, 0) AS float) `T_SERVICOS`
     , cast(nvl(cat.T_EDUCACAO, 0) AS float) `T_EDUCACAO`
     , cast(nvl(cat.T_RESTAURANTE, 0) AS float) `T_RESTAURANTE`
     , cast(nvl(cat.T_TRANSPORTE, 0) AS float) `T_TRANSPORTE`
     , cast(nvl(cat.T_LAZER, 0) AS float) `T_LAZER`
     , cast(nvl(cat.T_SUPERMERCADO, 0) AS float) `T_SUPERMERCADO`
     , cast(nvl(cat.T_OUTROS, 0) AS float) `T_OUTROS`
     , if(l.inactivate_date is null, 'False.', 'True.') `CHURN`
  FROM fin_br.fin.LOGIN l 
  JOIN fin_br.fin.USERS u ON l.UUID = u.UUID 
  JOIN fin_br.fin.ADDRESS a ON a.UUID = u.UUID 
  JOIN csv.`workspace://churn/data/estados.csv.gz` e ON upper(e._c0) = upper(a.state)
  LEFT OUTER JOIN (
SELECT t.USER_ID
     , SUM(t.VALUE) `TOTAL`
  FROM ORACLE1.OT.TRANSACTIONS t
 WHERE t.TRANSACTION_TYPE = 'CREDITO'
 GROUP BY t.USER_ID) c ON c.USER_ID = l.UUID
  LEFT OUTER JOIN (
SELECT t.USER_ID
     , SUM(t.VALUE) `TOTAL`
  FROM ORACLE1.OT.TRANSACTIONS t
 WHERE t.TRANSACTION_TYPE = 'DEBITO'
 GROUP BY t.USER_ID) d ON d.USER_ID = l.UUID
  LEFT OUTER JOIN CATEGORIES cat ON cat.USER_ID = l.UUID
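
To see what the derived columns in this query compute, the string and date expressions can be mirrored in plain Python. This is a sketch under one loud assumption: that `u.cell` looks like `(11) 91234-5678`; the real format in `fin_br.fin.USERS` is not shown in this notebook. The function names are hypothetical helpers.

```python
from datetime import date


def ddd(cell):
    # substring(cell, 2, 2): SQL positions are 1-based, so take characters 2 and 3.
    return cell[1:3]


def celular(cell):
    # substring(replace(cell, '-', ''), 6): drop hyphens, keep from the 6th character on.
    return cell.replace('-', '')[5:]


def years_between(later, earlier):
    # datediff(later, earlier) / 365: whole days divided by a fixed 365-day year.
    return (later - earlier).days / 365


cell = '(11) 91234-5678'  # assumed format, see the note above
area_code = ddd(cell)
number = celular(cell)
age = years_between(date(2020, 4, 15), date(1990, 4, 15))

print(area_code, number, age)
```

Dividing by a fixed 365 ignores leap days, so a customer who is exactly 30 calendar years old comes out slightly above 30. For a model feature that is usually acceptable, but it is worth knowing when comparing against exact ages.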

In [13]:
%%sql
show tables

|   | database    | tableName  | isTemporary |
|---|-------------|------------|-------------|
| 0 | dora_didone | categories | False       |
| 1 | dora_didone | churn      | False       |
| 2 | dora_didone | estados    | False       |
| 3 | dora_didone | t_servicos | False       |


In [None]:
%%sql
select * from CHURN limit 10