## Estructura de los Datos

### Cash_Request (CR)

* **CR.id**:
  Unique ID of Cash Request.

* **CR.user_id**:
  Unique ID of the user who requested the cash advance.

* **CR.deleted_account_id**:
  If a user delete his account, we are replacing the user_id by this id. 
  
  It corresponds to a unique ID in the deleted account table with some keys information saved for fraud-fighting purposes (while respecting GDPR regulation).

* **CR.amount**:
  Amount of the Cash Request.

* **CR.created_at**:
  Timestamp of the CR creation.

* **CR.updated_at**:
  Timestamp of the latest CR's details update **(= update of at least one column in this table)**.

* **CR.reimbursement_date**:
  Planned reimbursement date. The user card will be charged at this date.

* **CR.cash_request_received_date**:
  Date of the receipt of the CR. Based on user's bank history.

* **CR.money_back_date**:
  Date where the CR was considered as money back. 
  
  It's either the paid_by_card date or the date were we considered that's the direc debit have low odds to be rejected (based on business rules) 

* **CR.send_at**:
  Timestamp of the funds's transfer.

* **CR.transfer_type**:
  Possible values are:

  | Campo        | Registros | Descripcion                                                  |
  | ------------ | --------: | ------------------------------------------------------------ |
  | **instant**: |     13882 | El usuario eligió recibir el adelanto instantáneamente. (user choose not received the advance instantly) |
  | **regular**: |     10086 | El usuario eligió no pagar inmediatamente y esperar la transferencia. (user choose to not pay and wait for the transfer) |

---

* **CR.reco_creation**:
  Timestamp of the recovery creation.
* **CR.reco_last_update**:
  Timestamp of the last recovery case update. Can be used to determine the incident closure date.
* **CR.moderated_at**:
  Timestamp of the manual review. Only filled if the CR needed a manual review.

* **CR.recovery_status**:
  Possible values are:

  | Campo                     | Registros | Descripcion                                                  |
  | ------------------------- | --------: | ------------------------------------------------------------ |
  | **NULL (Nice)**:          |     20639 | El CR nunca tuvo un incidente de pago. (Null if the cash request never had a payment incident.) |
  | **completed**:            |      2467 | El incidente de pago fue resuelto (el CR fue reembolsado). (the payment incident was resolved **(=the cash request was reimbursed)**) |
  | **pending**: (+-=direct_debit_rejected): |       845 | El incidente de pago aún está abierto. (the payment incident still open) |
  | **pending_direct_debit**: |        16 | El incidente de pago sigue abierto, pero se ha lanzado un débito directo SEPA. (the payment incident still open but a SEPA direct debit is launched) |
  | **cancelled**:            |         1 | ?? No figura a la documentacio                               |


* **CR.status**:     (23970 registros)
  Status of the CR. Possible values are: 

  | Campo                                                     | Registros | Descripcion                                                  |
  | --------------------------------------------------------- | --------: | ------------------------------------------------------------ |
  | **money_back**:                                           |     16395 | El CR fue reembolsado exitosamente.(The CR was successfully reimbursed.) |
  | **active**:                                               |        59 | Los fondos fueron recibidos en la cuenta del cliente. (Funds were received on the customer account.) |
  | **direct_debit_sent**:                                    |        34 | Se envió un débito directo SEPA, pero aún no se confirma el resultado. (We sent/scheduled a SEPA direct debit to charge the customer account. The result of this debit is not yet confirmed) |
  |                                                           |           |                                                              |
  | **rejected**:                                             |      6568 | El CR necesitó una revisión manual y fue rechazado. (The CR needed a manual review and was rejected) |
  | **direct_debit_rejected**: (+-=CR.recovery_status)        |       831 | El intento de débito directo SEPA falló. (Our last attempt of SEPA direct debit to charge the customer was rejected ) |
  | **transaction_declined**:                                 |        48 | No se pudo enviar el dinero al cliente. (We failed to send the funds to the customer) |
  | **canceled**:                                             |        33 | El usuario no confirmó el CR en la app, fue cancelado automáticamente. (The user didn't confirm the cash request in-app, we automatically canceled it) |
  | **En los datos proporcionados, NO aparecen los valores:** |           |                                                              |
  | **approved**:                                             |         0 | CR is a 'regular' one (= without fees) and was approved either automatically or manually. Funds will be sent aprox. 7 days after the creation. |
  | **money_sent**:                                           |         0 | We transferred the fund to the customer account. Will change to active once we detect that the user received the funds (using user's bank history). |
  | **pending**:                                              |         0 | The CR is pending a manual review from an analyst.           |
  | **waiting_user_confirmation**:                            |         0 | The user needs to confirm in-app that he want the CR (for legal reasons) |
  | **waiting_reimbursement**:                                |         0 | We were not able to estimate a date of reimbursement, the user needs to choose one in the app. |
     

---
---
#### Columnas que figuran en la documentacion per que NO TENEMOS EN LA BASE DE DATOS:

* **(CR.reason)**:
  Filled only if the CR was manually reviewed and rejected. **That's the rejection's reason displayed in-app**. 

* **(CR.cash_request_debited_date)**:
  Filled only if a SEPA direct debit was sent. It's the date were the latest direct debit was seen on the user account.


### Fees (FE)

* **FE.id**:
  Unique ID of the fee object.

* **FE.cash_request_id**:
  Unique ID of the CR linked to this fee.

* **FE.total_amount**:
  Amount of the fee (including VAT). **TODO: Es un importe o un porcentaje ??**

* **FE.reason**:
  Description of the fee.

* **FE.created_at**:
  Timestamp of the fee's creation.

* **FE.updated_at**:
  Timestamp of the latest fee's details update.

* **FE.paid_at**:
  Timestamp of the fee's payment.

* **FE.from_date**:
  Apply only to postpone fees. Initial date of reimbursement for the CR.

* **FE.to_date**:
  Apply only to postpone fees. New date of reimbursement for the CR.


* **FE.type**:
  Type of fee. Possible values are:

  | Campo                | Registros | Descripcion                                                  |
  | -------------------- | --------: | ------------------------------------------------------------ |
  | **instant_payment**: |     11095 | Fees por adelanto instantáneo. (fees for instant cash request (send directly after user's request, **through SEPA Instant Payment**) ) |
  | **postpone**:        |      7766 | Fees por la solicitud de posponer un reembolso. (fees created when a user want to postpone the reimbursment of a CR) |
  | **incident**:        |      2196 | Fees por fallos de reembolsos. (fees for failed reimbursement. Created after a failed direct debit) |
  | **split_payment**:   |         0 | Fees por pago fraccionado (en caso de un incidente).<br/>(futures fees for split payment (in case of an incident, we'll soon offer the possibility to our users to reimburse in multiples installements)) |


* **FE.status**:
  (= does the fees was successfully charged). Possibles values are:

  | Campo          | Registros | Descripcion                                                  |
  | -------------- | --------: | ------------------------------------------------------------ |
  | **accepted**:  |     14841 | El fee fue cobrado exitosamente. (fees were successfully charged) |
  | **cancelled**: |      4934 | El fee fue **creado pero cancelado** por algún motivo. <br>Se utiliza para solucionar problemas con las `fees`, pero se refiere principalmente a `fees` `postpone` que fallaron. Estamos cobrando las `fees` en el momento de la solicitud del `postpone`. <br/ >Si falla, no se acepta el `postpone` y la fecha de reembolso sigue siendo la misma. |
  |                |           | (Fee was created and cancelled for some reasons. It's used to fix issues with fees but it mainly concern postpone fees who failed. We are charging the fees at the moment of the postpone request. If it failed, the postpone is not accepted and the reimbursement date still the same.) |
  | **rejected**:  |      1194 | El último intento de cobrar el fee falló. (the last attempt to charge the fee failed.) |
  | **confirmed**: |        88 | El usuario completó una acción que creó un fee. (the user made an action who created a fee. <br/>It will normally get charged at the moment of the CR's reimbursement. In some rare cases, postpones are confirmed without being charges due to a commercial offer.) |


* **FE.category**:
  Describe the reason of the incident fee. 2 possibles values:

  | Campo                       | Registros | Descripcion                                                  |
  | --------------------------- | --------: | ------------------------------------------------------------ |
  | **rejected_direct_debit**:  |     18861 | Fees creados cuando el banco del usuario rechaza el primer débito directo. (fees created when user's bank rejects the first direct debit) |
  | **month_delay_on_payment**: |      1599 | Fees creados cada mes hasta que el incidente se cierre. (fees created every month until the incident is closed) |
  | **NULL**:                   |       597 | ?? No figura a la documentacio                               |


* **FE.charge_moment**:
  When the fee will be charge. 2 possibles values:

  | Campo       | Registros | Descripcion                                                  |
  | ----------- | --------: | ------------------------------------------------------------ |
  | **after**:  |     16720 | El fee se cobra cuando el CR es reembolsado. (the fee should be charged at the moment of the CR's reimbursement) |
  | **before**: |      4337 | El fee se cobra en el momento de su creación. (the fee should be charged at the moment of its creation) |

## EDA

In [17]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import itertools # Importando itertools para generar combinaciones de columnas
# Importando la función seasonal_decompose para la descomposición de series temporales
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.patches as mpatches
import payments_manager as pm

cr_cp = pm.df('cr_cp')
fe_cp = pm.df('fe_cp')
df_jo = pm.df('df_jo')

#df_jo = pm.sort("df_jo", ["id_cr"]).reset_index()
df_jo = pm.sort("df_jo", ['created_at','created_at_fe']).reset_index()
df_jo = df_jo.drop(columns=['index'])

#df_jo = df_jo.drop(columns=['Mes_created_at'])
df_jo_cp = df_jo.copy()
df_jo_cp['cr_received_date'] = df_jo_cp.cash_request_received_date
#df_jo.info()
pd.options.display.max_columns = None
#display(df_jo)

In [35]:
#df = df_jo 
# df_filtrado = df[df['moderated_at'].isnull()].copy()
# df_filtrado['created_at'] = pd.to_datetime(df_filtrado['created_at'])
# df_anciana = df_filtrado.loc[df_filtrado.groupby('user_id')['created_at'].idxmin()]
# resultados = df_anciana[['user_id', 'created_at']]

# # Convertimos 'created_at' a tipo datetime si no lo está
# df['created_at'] = pd.to_datetime(df['created_at'])

# # Primero, ordenamos el DataFrame por 'user_id' y 'created_at'
# df = df.sort_values(by=['user_id', 'created_at'])

# # Ahora, obtenemos la primera fila de cada 'user_id' (la más antigua) y verificamos que 'moderated_at' no sea nulo
# primer_registro_no_nulo = df.groupby('user_id').first()

# # Filtramos para obtener solo aquellos 'user_id' donde 'moderated_at' no es nulo
# user_ids_validos = primer_registro_no_nulo[primer_registro_no_nulo['moderated_at'].notnull()]

# # Esto te da un DataFrame con los 'user_id' donde la primera fecha (más antigua) tiene 'moderated_at' no nulo
# resultados = user_ids_validos[['user_id']]

# # Si solo necesitas los 'user_id' en una lista:
# user_ids_validos_lista = resultados['user_id'].tolist()

# print (resultados) #user_ids_que_cumplen = resultados['user_id'].tolist()

In [11]:
df_jo = pm.df('df_jo')
#df_jo['created_at'] = df_jo['created_at'].dt.to_period('d')
#df_jo['created_at'] = pd.to_datetime(df_jo['created_at']).dt.date
#df_jo['created_at'] = df_jo['created_at'].dt.to_period('m').dt.to_timestamp()
#pm.format_to_dates(df_jo, time_format='d') # 'min','s'
#df_jo.info()
#display(df_jo)
cohort_analysis_2 = (
    df_jo.query('stat_fe == "accepted" | stat_cr == "money_back"')
        .groupby(['user_id', 'Mes_created_at'], as_index=False)
        .agg(
            total_paid_fees=('fee', lambda x: x[df_jo.loc[x.index, 'stat_fe'] == 'accepted'].sum()),            
            # Contar los valores únicos de 'stat_cr' donde su valor sea 'money_back'
            total_paid_cr=('amount', lambda x: x[df_jo.loc[x.index, 'stat_cr'] == 'money_back'].unique().sum()),
            # Contar los valores únicos de 'id_cr' donde 'stat_cr' es igual a 'money_back'
            Num_Solicitudes=('id_cr', lambda x: x[df_jo.loc[x.index, 'stat_cr'] == 'money_back'].nunique())
        )
)

In [None]:
df = df_jo 
df['created_at'] = pd.to_datetime(df['created_at'])
df = df.sort_values(by=['user_id', 'created_at'])

# obtenemos la primera fila de cada 'user_id' (la más antigua)
primer_registro = df.groupby('user_id').first()
primer_registro['no_moderat'] = primer_registro['moderated_at'].isnull()
no_moderats = primer_registro.reset_index()
no_moderats = primer_registro[primer_registro['no_moderat'] == True]
resultat = no_moderats.reset_index()[['id_cr','id_fe','user_id','created_at','moderated_at','','no_moderat','transfer_type']]
#resultat = no_moderats[no_moderats.transfer_type == 'instant']
display(resultat.sort_values(['created_at']))
#len(df_jo[df_jo['active']==0])

# Unimos esta información con el DataFrame original para ver el estado de 'moderated_at' para todos los registros
#df_completo = df.merge(primer_registro[['moderated_at_no_nulo']], on='user_id', how='left')

# Ahora, 'df_completo' tiene la información de todos los usuarios, junto con una columna 'moderated_at_no_nulo' que te indica si el primer registro de cada usuario tiene 'moderated_at' no nulo

#print(df_completo[['user_id', 'created_at', 'moderated_at', 'moderated_at_no_nulo']])


Unnamed: 0,id_cr,id_fe,user_id,created_at,moderated_at,no_moderat,transfer_type
4917,3772,0,99001956,2020-06-17 21:46:47.696448,NaT,True,regular
4916,3808,0,99001953,2020-06-18 02:32:07.423951,NaT,True,regular
4918,3937,0,99002028,2020-06-18 13:25:27.260563,NaT,True,regular
4919,3980,0,99002037,2020-06-18 16:02:48.029538,NaT,True,regular
4920,4480,0,99002135,2020-06-21 09:47:28.030725,NaT,True,regular
...,...,...,...,...,...,...,...
4459,26998,21149,95838,2020-11-01 21:33:51.080334,NaT,True,instant
4673,27000,21153,98469,2020-11-01 22:00:17.658472,NaT,True,instant
4915,27005,0,103719,2020-11-01 23:06:44.582445,NaT,True,instant
4828,27009,21183,100781,2020-11-01 23:15:28.102894,NaT,True,instant


In [86]:
df = pm.df('df_jo')
#df = df_jo.copy()
pm.format_to_dates(df, time_format='s') # 'min','s'
fields = ['id_cr','user_id','created_at','moderated_at','transfer_type','stat_cr' ,'amount','fee',
          'n_fees','n_backs','needs_m_check_recov', 'n_recovery', 'n_incidents', #'created_at_dow',
          'stat_fe','id_fe','created_at_fe','updated_at_fe','reason','money_back_date', 'reimbursement_date',
          'to_reimbur','from_date','to_date', 'charge_moment','type','recovery_status' # 'paid_at', 'to_end',, #,'user_id', 'cr_received_date','recovery_status'
          #'to_receive_ini','to_receive_bank' #,'to_reimbur_cash', 'updated_at', 'to_send','send_at','moderated_at'
]

user_id = 90 #78 # 99030367 # 526
display(cohort_analysis_2[cohort_analysis_2.user_id == user_id])
display(df[df.user_id == user_id ][fields].sort_values(['created_at','created_at_fe']).reset_index(drop=True))
#display(df_jo[df_jo['n_recovery'] != 0][fields].head(5))

# Casos:
# 2002      Ok
# 12934     vip, todo instant
# 526       Ok vip
# 12274     Ok
# 16391     Ok
# 13851     Ok
# 102105    Ok
# 23328     Ok
# 99000262  Ok
# 19655     Ok
# 21465     Ok
# 14631     Ok, 10 peticiones todo rejected

# 90 Este se esta gestionando mal: todos instant, con demoras y sin gestion por 
# 1946 Parece un ejemplo de buena gestion, al final tiene un instant y se le ha dado margen el las demoras.
# 1987 Parece un ejemplo de buen usuario, se pasa a instant para siempre.
# Casos que esta examinando Alba: 19655, 21465, 14631  
# 54879, 12441, 430, 63894, 18730, 10116, 99000262

Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes
35,90,2019-12,0.0,100.0,1
36,90,2020-02,0.0,100.0,1
37,90,2020-03,0.0,100.0,1
38,90,2020-05,0.0,100.0,1
39,90,2020-06,0.0,100.0,2
40,90,2020-07,0.0,100.0,1
41,90,2020-08,0.0,100.0,1
42,90,2020-09,0.0,100.0,1
43,90,2020-10,0.0,100.0,1


Unnamed: 0,id_cr,user_id,created_at,moderated_at,transfer_type,stat_cr,amount,fee,n_fees,n_backs,needs_m_check_recov,n_recovery,n_incidents,stat_fe,id_fe,created_at_fe,updated_at_fe,reason,money_back_date,reimbursement_date,to_reimbur,from_date,to_date,charge_moment,type,recovery_status
0,314,90,2019-12-26 19:20:47,2019-12-27 14:06:27,regular,money_back,100.0,0.0,0,1,0,0,0,cr_regular,0,NaT,NaT,,2020-01-08 23:00:00,2020-01-08 23:00:00,13 days 03:39:12.193814,NaT,NaT,,nice,nice
1,667,90,2020-02-11 19:04:59,2020-02-12 11:17:31,regular,money_back,100.0,0.0,0,2,0,0,0,cr_regular,0,NaT,NaT,,2020-03-10 21:30:35,2020-03-01 23:00:00,19 days 03:55:00.284449,NaT,NaT,,nice,nice
2,893,90,2020-03-16 14:56:43,2020-03-16 15:11:42,regular,money_back,100.0,0.0,0,3,0,0,0,cr_regular,0,NaT,NaT,,2020-04-15 20:51:00,2020-04-06 22:00:00,21 days 07:03:16.526399,NaT,NaT,,nice,nice
3,2155,90,2020-05-25 12:48:13,2020-05-25 13:20:08,regular,money_back,100.0,0.0,0,4,0,0,0,cr_regular,0,NaT,NaT,,2020-06-11 22:37:58,2020-06-05 22:00:00,11 days 09:11:46.980106,NaT,NaT,,nice,nice
4,3343,90,2020-06-15 10:39:34,2020-06-15 11:52:28,regular,money_back,100.0,0.0,0,5,0,0,0,cr_regular,0,NaT,NaT,,2020-06-27 03:25:34,2020-06-25 10:39:34,9 days 23:59:59.991291,NaT,NaT,,nice,nice
5,5608,90,2020-06-27 14:08:39,2020-06-27 17:43:48,regular,money_back,100.0,0.0,0,6,0,0,0,cr_regular,0,NaT,NaT,,2020-07-09 22:00:00,2020-07-08 23:51:00,11 days 09:42:20.862178,NaT,NaT,,nice,nice
6,9206,90,2020-07-21 08:16:11,2020-07-21 13:49:36,regular,money_back,100.0,0.0,0,7,0,0,0,cr_regular,0,NaT,NaT,,2020-07-30 22:00:00,2020-07-30 22:00:00,9 days 13:43:48.486970,NaT,NaT,,nice,nice
7,11384,90,2020-08-07 14:50:30,2020-08-07 15:58:06,regular,money_back,100.0,0.0,0,8,0,0,0,cr_regular,0,NaT,NaT,,2020-08-30 22:00:00,2020-08-28 22:00:00,21 days 07:09:29.901500,NaT,NaT,,nice,nice
8,14367,90,2020-09-03 12:54:09,NaT,regular,money_back,100.0,0.0,0,9,0,0,0,cr_regular,0,NaT,NaT,,2020-10-02 11:26:13,2020-10-01 22:00:00,28 days 09:05:50.814326,NaT,NaT,,nice,nice
9,19294,90,2020-10-05 15:34:36,2020-10-05 15:56:11,regular,money_back,100.0,0.0,0,10,0,0,0,cr_regular,0,NaT,NaT,,2020-11-02 19:37:40,2020-11-02 22:00:00,28 days 06:25:23.609622,NaT,NaT,,nice,nice


In [5]:
display(cohort_analysis_2[cohort_analysis_2.user_id == 526])

display(cohort_analysis_2[cohort_analysis_2.user_id == 16391])
display(cohort_analysis_2[cohort_analysis_2.user_id == 2002])
display(cohort_analysis_2[cohort_analysis_2.user_id == 13851])
display(cohort_analysis_2[cohort_analysis_2.user_id == 1987]) #.tail(60))

Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes
198,526,2019-12,0.0,90.0,1
199,526,2020-02,0.0,50.0,1
200,526,2020-03,0.0,150.0,2
201,526,2020-04,0.0,80.0,1
202,526,2020-06,0.0,80.0,1
203,526,2020-07,0.0,90.0,1
204,526,2020-08,5.0,90.0,1
205,526,2020-09,5.0,100.0,1
206,526,2020-10,15.0,190.0,2


Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes
5421,16391,2020-06,5.0,100.0,1
5422,16391,2020-08,10.0,100.0,1
5423,16391,2020-10,5.0,100.0,1


Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes
747,2002,2020-01,0.0,100.0,1
748,2002,2020-02,0.0,100.0,1
749,2002,2020-03,0.0,100.0,1
750,2002,2020-04,0.0,100.0,1
751,2002,2020-05,0.0,100.0,1
752,2002,2020-06,0.0,100.0,1
753,2002,2020-07,15.0,100.0,1
754,2002,2020-08,0.0,100.0,1
755,2002,2020-09,0.0,100.0,1
756,2002,2020-10,5.0,100.0,1


Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes
4640,13851,2020-08,5.0,150.0,2
4641,13851,2020-10,5.0,100.0,1


Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes
733,1987,2019-12,0.0,100.0,1
734,1987,2020-01,0.0,100.0,1
735,1987,2020-02,0.0,100.0,1
736,1987,2020-03,0.0,100.0,1
737,1987,2020-04,0.0,100.0,1
738,1987,2020-05,0.0,100.0,1
739,1987,2020-06,10.0,100.0,1
740,1987,2020-08,15.0,100.0,1
741,1987,2020-10,10.0,100.0,2


In [6]:
# Restablecer el índice para un DataFrame limpio (opcional, ya garantizado por as_index=False)
cohort_analysis_2.reset_index(drop=True, inplace=True)

# Calcular el índice como porcentaje entre 'total_paid_fees' y 'total_paid_cr'
cohort_analysis_2['cash_index'] = (
    cohort_analysis_2['total_paid_fees'] / cohort_analysis_2['total_paid_cr'] ) * 100

# Reemplazar valores 'inf' con 0 para manejar divisiones por cero
cohort_analysis_2['cash_index'] = cohort_analysis_2['cash_index'].replace(np.inf, 0)

# Calcular la fecha del último pedido por usuario a partir del DataFrame original
df_jo['created_at'] = pd.to_datetime(df_jo['created_at'])  # Asegurarse de que el formato sea datetime
last_order_per_user = (
    df_jo.groupby('user_id')['created_at']
    .max()  # Obtener la fecha más reciente de pedido para cada usuario
    .dt.to_period('M')  # Convertir a periodo mensual
    .reset_index()  # Restablecer el índice para facilitar el merge
)

# Incorporar la fecha del último pedido en el DataFrame de análisis de cohortes
cohort_analysis_2 = pd.merge(
    cohort_analysis_2,
    last_order_per_user.rename(columns={'created_at': 'last_order'}),
    on='user_id',
    how='left'
)

display(cohort_analysis_2.head(5))

Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes,cash_index,last_order
0,47,2020-04,0.0,100.0,1,0.0,2020-10
1,47,2020-05,5.0,10.0,1,50.0,2020-10
2,47,2020-08,10.0,10.0,1,100.0,2020-10
3,47,2020-09,5.0,5.0,1,100.0,2020-10
4,47,2020-10,10.0,6.0,2,166.666667,2020-10


### Estudio de registros

In [2]:
counts = cr_cp.status.value_counts()
display(counts)

counts = cr_cp.transfer_type.value_counts()
display(counts)

counts = cr_cp.recovery_status.value_counts()
display(counts)

counts = cr_cp.money_back_date.value_counts()
display(counts)

status
money_back               16395
rejected                  6568
direct_debit_rejected      831
active                      59
transaction_declined        48
direct_debit_sent           34
canceled                    33
Name: count, dtype: int64

transfer_type
instant    13882
regular    10086
Name: count, dtype: int64

recovery_status
nice                    20639
completed                2467
pending                   845
pending_direct_debit       16
cancelled                   1
Name: count, dtype: int64

money_back_date
2020-08-04 22:00:00.000000    364
2020-08-05 22:00:00.000000    295
2020-07-29 22:00:00.000000    244
2020-07-07 22:00:00.000000    180
2020-09-01 22:00:00.000000    134
                             ... 
2020-10-14 19:53:17.175800      1
2020-11-06 20:27:59.808825      1
2020-11-12 17:56:26.237355      1
2020-09-09 19:50:00.196742      1
2020-09-04 19:34:10.516739      1
Name: count, Length: 12265, dtype: int64

In [3]:
counts = fe_cp.status.value_counts()
display(counts)

counts = fe_cp.type.value_counts()
display(counts)

counts = fe_cp.category.value_counts()
display(counts)

counts = fe_cp.charge_moment.value_counts()
display(counts)


NameError: name 'fe_cp' is not defined

In [9]:
pd.options.display.max_columns = None
good_cr = ['approved', 'money_sent', 'pending', 'direct_debit_sent', 'active', 'money_back']
good_fe = ['confirmed', 'accepted', 'cr_regular']
df_jo['needs_m_check2'] = (~((df_jo['stat_cr'].isin(good_cr)) & (df_jo['stat_fe'].isin(good_fe)))).astype(int)
fields.append('needs_m_check2')
display(df_jo[fields].head(5))
#display(df_jo.head(5))

Unnamed: 0,id_cr,user_id,created_at,transfer_type,type,stat_cr,amount,fee,n_fees,n_backs,needs_m_check_recov,n_recovery,n_incidents,stat_fe,id_fe,created_at_fe,updated_at_fe,reason,money_back_date,reimbursement_date,to_reimbur,from_date,to_date,charge_moment,recovery_status,needs_m_check2
8778,3,47,2019-11-19 13:57:53.511561,regular,nice,canceled,1.0,0.0,0,0,1,0,1,cr_regular,0,NaT,NaT,,NaT,2019-12-05 23:00:00.000000,16 days 09:02:06.488439,NaT,NaT,,nice,1
8013,4,99001309,2019-12-09 14:47:35.190714,regular,nice,money_back,100.0,0.0,0,1,0,0,0,cr_regular,0,NaT,NaT,,2019-12-16 23:00:00,2019-12-16 23:00:00.000000,7 days 08:12:24.809286,NaT,NaT,,nice,0
0,5,804,2019-12-10 19:05:21.596873,regular,nice,rejected,100.0,0.0,0,0,1,0,1,cr_regular,0,NaT,NaT,,NaT,2020-01-09 19:05:21.596363,29 days 23:59:59.999490,NaT,NaT,,nice,1
11859,6,812,2019-12-10 19:05:48.921042,regular,nice,direct_debit_rejected,100.0,0.0,0,0,1,1,1,cr_regular,0,NaT,NaT,,NaT,2020-02-05 23:00:00.000000,57 days 03:54:11.078958,NaT,NaT,,pending,1
2,7,191,2019-12-10 19:13:35.825460,regular,nice,rejected,100.0,0.0,0,0,1,0,1,cr_regular,0,NaT,NaT,,NaT,2020-01-09 19:13:35.825041,29 days 23:59:59.999581,NaT,NaT,,nice,1


### Formato de fechas

In [10]:
# df = df_jo.copy()
# timeFormat ='d' #s
# df['created_at'] = df['created_at'].dt.to_period(timeFormat) #'Min')
# df['created_at_fe'] = df['created_at_fe'].dt.to_period(timeFormat) #'Min')
# df['updated_at'] = df['updated_at'].dt.to_period(timeFormat) #'Min')
# df['updated_at_fe'] = df['updated_at_fe'].dt.to_period(timeFormat) #'Min')

# df['to_receive_ini'] = pd.to_timedelta(df['to_receive_ini']).round(timeFormat)
# #df['to_receive_ini'] = df['to_receive_ini'].timedelta(seconds=math.ceil(df['to_receive_ini'].total_seconds()))

# df['to_receive_bank'] = pd.to_timedelta(df['to_receive_bank']).round(timeFormat)
# df['to_reimbur'] = pd.to_timedelta(df['to_reimbur']).round(timeFormat)
# df['to_reimbur_cash'] = pd.to_timedelta(df['to_reimbur_cash']).round(timeFormat)
# df['to_end'] = pd.to_timedelta(df['to_end']).round(timeFormat)
# df['to_send'] = pd.to_timedelta(df['to_send']).round(timeFormat)

# df['money_back_date'] = df['money_back_date'].dt.to_period(timeFormat)
# df['send_at'] = df['send_at'].dt.to_period(timeFormat)
# df['paid_at'] = df['paid_at'].dt.to_period(timeFormat)
# df['moderated_at'] = df['moderated_at'].dt.to_period(timeFormat)
# df['from_date'] = df['from_date'].dt.to_period(timeFormat)
# df['to_date'] = df['to_date'].dt.to_period(timeFormat)

# df['reco_creation'] = df['reco_creation'].dt.to_period(timeFormat)
# df['reco_last_update'] = df['reco_last_update'].dt.to_period(timeFormat)
# display(df.head(5))

### cohort_analysis_2

In [11]:
cohort_analysis_2 = (
    df_jo.query('stat_fe == "accepted" | stat_cr == "money_back"')
        .groupby(['user_id', 'Mes_created_at'], as_index=False)
        .agg(
            total_paid_fees=('fee', lambda x: x[df_jo.loc[x.index, 'stat_fe'] == 'accepted'].sum()),            
            # Contar los valores únicos de 'stat_cr' donde su valor sea 'money_back'
            total_paid_cr=('amount', lambda x: x[df_jo.loc[x.index, 'stat_cr'] == 'money_back'].unique().sum()),
            # Contar los valores únicos de 'id_cr' donde 'stat_cr' es igual a 'money_back'
            Num_Solicitudes=('id_cr', lambda x: x[df_jo.loc[x.index, 'stat_cr'] == 'money_back'].nunique())
        )
)


In [12]:
# Restablecer el índice para un DataFrame limpio (opcional, ya garantizado por as_index=False)
cohort_analysis_2.reset_index(drop=True, inplace=True)

# Calcular el índice como porcentaje entre 'total_paid_fees' y 'total_paid_cr'
cohort_analysis_2['cash_index'] = (
    cohort_analysis_2['total_paid_fees'] / cohort_analysis_2['total_paid_cr'] ) * 100

# Reemplazar valores 'inf' con 0 para manejar divisiones por cero
cohort_analysis_2['cash_index'] = cohort_analysis_2['cash_index'].replace(np.inf, 0)

# Calcular la fecha del último pedido por usuario a partir del DataFrame original
df_jo['created_at'] = pd.to_datetime(df_jo['created_at'])  # Asegurarse de que el formato sea datetime
last_order_per_user = (
    df_jo.groupby('user_id')['created_at']
    .max()  # Obtener la fecha más reciente de pedido para cada usuario
    .dt.to_period('M')  # Convertir a periodo mensual
    .reset_index()  # Restablecer el índice para facilitar el merge
)

# Incorporar la fecha del último pedido en el DataFrame de análisis de cohortes
cohort_analysis_2 = pd.merge(
    cohort_analysis_2,
    last_order_per_user.rename(columns={'created_at': 'last_order'}),
    on='user_id',
    how='left'
)

display(cohort_analysis_2.head(5))

Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes,cash_index,last_order
0,47,2020-04,0.0,100.0,1,0.0,2020-10
1,47,2020-05,5.0,10.0,1,50.0,2020-10
2,47,2020-08,10.0,10.0,1,100.0,2020-10
3,47,2020-09,5.0,5.0,1,100.0,2020-10
4,47,2020-10,10.0,6.0,2,166.666667,2020-10


### Estudio de casos de clientes concretos (user_id)

* 102105
* 16391 
* 2002
* 13851
* 1987 
* ...
* 16391 # 2002, 1987, 13851, 16391, 102105

* user_ids = [13851] [2002] , 1987, 1946, 90, 526, 12934] #, 12274 54879 12441, 13851, 16391, 430,  63894,18730,10116,21465, 99000262]
* vips 12934 526
* 90 Este se esta gestionando mal: todos instant, con demoras y sin gestion por 
* 1946 Parece un ejemplo de buena gestion, al final tiene un instant y se le ha dado margen el las demoras.
* 1987 Parece un ejemplo de buen usuario, se pasa a instant para siempre.

Casos que esta examinando Alba:
* 19655
* 21465
* 14631


In [13]:
display(cohort_analysis_2[cohort_analysis_2.user_id == 102105])

display(cohort_analysis_2[cohort_analysis_2.user_id == 16391])
display(cohort_analysis_2[cohort_analysis_2.user_id == 2002])
display(cohort_analysis_2[cohort_analysis_2.user_id == 13851])
display(cohort_analysis_2[cohort_analysis_2.user_id == 1987]) #.tail(60))

Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes,cash_index,last_order
15482,102105,2020-10,0.0,100.0,1,0.0,2020-10


Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes,cash_index,last_order
5421,16391,2020-06,5.0,100.0,1,5.0,2020-10
5422,16391,2020-08,10.0,100.0,1,10.0,2020-10
5423,16391,2020-10,5.0,100.0,1,5.0,2020-10


Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes,cash_index,last_order
747,2002,2020-01,0.0,100.0,1,0.0,2020-10
748,2002,2020-02,0.0,100.0,1,0.0,2020-10
749,2002,2020-03,0.0,100.0,1,0.0,2020-10
750,2002,2020-04,0.0,100.0,1,0.0,2020-10
751,2002,2020-05,0.0,100.0,1,0.0,2020-10
752,2002,2020-06,0.0,100.0,1,0.0,2020-10
753,2002,2020-07,15.0,100.0,1,15.0,2020-10
754,2002,2020-08,0.0,100.0,1,0.0,2020-10
755,2002,2020-09,0.0,100.0,1,0.0,2020-10
756,2002,2020-10,5.0,100.0,1,5.0,2020-10


Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes,cash_index,last_order
4640,13851,2020-08,5.0,150.0,2,3.333333,2020-10
4641,13851,2020-10,5.0,100.0,1,5.0,2020-10


Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes,cash_index,last_order
733,1987,2019-12,0.0,100.0,1,0.0,2020-10
734,1987,2020-01,0.0,100.0,1,0.0,2020-10
735,1987,2020-02,0.0,100.0,1,0.0,2020-10
736,1987,2020-03,0.0,100.0,1,0.0,2020-10
737,1987,2020-04,0.0,100.0,1,0.0,2020-10
738,1987,2020-05,0.0,100.0,1,0.0,2020-10
739,1987,2020-06,10.0,100.0,1,10.0,2020-10
740,1987,2020-08,15.0,100.0,1,15.0,2020-10
741,1987,2020-10,10.0,100.0,2,10.0,2020-10


In [14]:
user_id = 2002# 16391 # 2002, 1987, 13851, 16391, 102105
display(cohort_analysis_2[cohort_analysis_2.user_id == user_id])

#print("Casos segun Cash Request ID")
pd.options.display.max_columns = None
for id in ([-8177]): # 16391 20108, 20104, 20112,
    df_t = df_jo[df_jo['id_cr'] == id].sort_values(['created_at','created_at_fe']).reset_index()
    print(f"Cash Request ID: {id}")
    display(df_t[fields])

user_ids = [user_id] 
pd.options.display.max_columns = None
#print("Casos segun Cash User ID")
for id in (user_ids):
    df_t = df_jo[(df_jo['user_id'] == id)]#.reset_index()
    df_t = df_t[df_t['stat_cr'] == 'money_back']
    df_t = df_t[df_t['stat_fe'] == 'accepted']
    
    df_t = df_t.sort_values(['created_at','created_at_fe']).reset_index(drop=True)
    #df_t.set_index('id_cr', inplace=True)
    print(f"Only money_back - user_id {id}")
    display(df_t[fields])
    df_t = df_jo[(df_jo['user_id'] == id) ].sort_values(['created_at','created_at_fe']).reset_index(drop=True)

    print(f"user_id {id}")
    display(df_t[fields])

Unnamed: 0,user_id,Mes_created_at,total_paid_fees,total_paid_cr,Num_Solicitudes,cash_index,last_order
747,2002,2020-01,0.0,100.0,1,0.0,2020-10
748,2002,2020-02,0.0,100.0,1,0.0,2020-10
749,2002,2020-03,0.0,100.0,1,0.0,2020-10
750,2002,2020-04,0.0,100.0,1,0.0,2020-10
751,2002,2020-05,0.0,100.0,1,0.0,2020-10
752,2002,2020-06,0.0,100.0,1,0.0,2020-10
753,2002,2020-07,15.0,100.0,1,15.0,2020-10
754,2002,2020-08,0.0,100.0,1,0.0,2020-10
755,2002,2020-09,0.0,100.0,1,0.0,2020-10
756,2002,2020-10,5.0,100.0,1,5.0,2020-10


Cash Request ID: -8177


Unnamed: 0,id_cr,user_id,created_at,transfer_type,type,stat_cr,amount,fee,n_fees,n_backs,needs_m_check_recov,n_recovery,n_incidents,stat_fe,id_fe,created_at_fe,updated_at_fe,reason,money_back_date,reimbursement_date,to_reimbur,from_date,to_date,charge_moment,recovery_status,needs_m_check2


Only money_back - user_id 2002


Unnamed: 0,id_cr,user_id,created_at,transfer_type,type,stat_cr,amount,fee,n_fees,n_backs,needs_m_check_recov,n_recovery,n_incidents,stat_fe,id_fe,created_at_fe,updated_at_fe,reason,money_back_date,reimbursement_date,to_reimbur,from_date,to_date,charge_moment,recovery_status,needs_m_check2
0,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,1,8,0,0,1,accepted,1749,2020-07-22 01:33:56.400884,2020-10-13 14:25:00.882490,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-04 22:00:00.000,2020-08-15 01:33:48.128,before,nice,0
1,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,2,9,0,0,1,accepted,1839,2020-07-23 11:31:44.836318,2020-10-13 14:25:02.005106,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-15 01:33:48.128,2020-08-30 01:33:48.128,before,nice,0
2,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,3,10,0,0,1,accepted,1977,2020-07-25 19:20:00.560197,2020-10-13 14:25:04.227093,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-30 01:33:48.128,2020-09-14 01:33:48.128,before,nice,0
3,24248,2002,2020-10-25 07:41:00.266568,instant,instant_payment,money_back,100.0,5.0,4,13,0,0,1,accepted,17289,2020-10-25 07:41:49.198215,2020-10-25 07:41:49.198245,Instant Payment Cash Request 24248,2020-11-09 19:11:48.425712,2020-11-09 22:00:00,15 days 14:18:59.733432,NaT,NaT,after,nice,0


user_id 2002


Unnamed: 0,id_cr,user_id,created_at,transfer_type,type,stat_cr,amount,fee,n_fees,n_backs,needs_m_check_recov,n_recovery,n_incidents,stat_fe,id_fe,created_at_fe,updated_at_fe,reason,money_back_date,reimbursement_date,to_reimbur,from_date,to_date,charge_moment,recovery_status,needs_m_check2
0,423,2002,2020-01-10 10:55:20.757139,regular,nice,money_back,100.0,0.0,0,1,0,0,0,cr_regular,0,NaT,NaT,,2020-02-06 23:00:00.000000,2020-02-06 23:00:00,27 days 12:04:39.242861,NaT,NaT,,nice,0
1,697,2002,2020-02-18 16:36:51.249037,regular,nice,money_back,100.0,0.0,0,2,0,0,0,cr_regular,0,NaT,NaT,,2020-02-27 23:00:00.000000,2020-02-27 23:00:00,9 days 06:23:08.750963,NaT,NaT,,nice,0
2,835,2002,2020-03-10 07:47:39.337041,regular,nice,money_back,100.0,0.0,0,3,0,0,0,cr_regular,0,NaT,NaT,,2020-04-14 20:25:59.132327,2020-04-05 22:00:00,26 days 14:12:20.662959,NaT,NaT,,nice,0
3,1172,2002,2020-04-14 21:03:09.519326,regular,nice,money_back,100.0,0.0,0,4,0,0,0,cr_regular,0,NaT,NaT,,2020-05-14 21:05:09.488707,2020-05-05 22:00:00,21 days 00:56:50.480674,NaT,NaT,,nice,0
4,1800,2002,2020-05-15 04:09:51.091889,regular,nice,money_back,100.0,0.0,0,5,0,0,0,cr_regular,0,NaT,NaT,,2020-06-10 22:17:53.131912,2020-06-06 22:00:00,22 days 17:50:08.908111,NaT,NaT,,nice,0
5,2949,2002,2020-06-10 23:34:13.556501,regular,nice,money_back,100.0,0.0,0,6,0,0,0,cr_regular,0,NaT,NaT,,2020-07-07 22:00:00.000000,2020-07-06 22:00:00,25 days 22:25:46.443499,NaT,NaT,,nice,0
6,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,0,7,1,0,1,cancelled,1674,2020-07-20 20:43:33.629841,2020-10-13 14:25:15.265408,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-04 22:00:00.000,2020-08-19 22:00:00.000,after,nice,1
7,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,1,8,0,0,1,accepted,1749,2020-07-22 01:33:56.400884,2020-10-13 14:25:00.882490,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-04 22:00:00.000,2020-08-15 01:33:48.128,before,nice,0
8,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,2,9,0,0,1,accepted,1839,2020-07-23 11:31:44.836318,2020-10-13 14:25:02.005106,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-15 01:33:48.128,2020-08-30 01:33:48.128,before,nice,0
9,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,3,10,0,0,1,accepted,1977,2020-07-25 19:20:00.560197,2020-10-13 14:25:04.227093,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-30 01:33:48.128,2020-09-14 01:33:48.128,before,nice,0


### Estudio 2 de clientes

In [15]:
# fields = ['id_cr','created_at','transfer_type','type','stat_cr' ,'amount','fee','n_fees','n_backs', # 'good_user',
#           'stat_fe','id_fe','created_at_fe','updated_at_fe','reason','money_back_date', 'reimbursement_date',
#           'to_reimbur','from_date','to_date', 'charge_moment' # 'paid_at', 'to_end',, #,'user_id', 'cr_received_date','recovery_status'
#           #'to_receive_ini','to_receive_bank' #,'to_reimbur_cash', 'updated_at', 'to_send','send_at','moderated_at'
# ]

df_jo = df_jo.reindex(columns=fields)
df_jo.reset_index(drop=True, inplace=True)

pd.options.display.max_columns = None
pd.options.display.max_rows = None
#display(df.head(5))
display(df_jo[df_jo['user_id'] == 2002]) #.reset_index()

Unnamed: 0,id_cr,user_id,created_at,transfer_type,type,stat_cr,amount,fee,n_fees,n_backs,needs_m_check_recov,n_recovery,n_incidents,stat_fe,id_fe,created_at_fe,updated_at_fe,reason,money_back_date,reimbursement_date,to_reimbur,from_date,to_date,charge_moment,recovery_status,needs_m_check2
356,423,2002,2020-01-10 10:55:20.757139,regular,nice,money_back,100.0,0.0,0,1,0,0,0,cr_regular,0,NaT,NaT,,2020-02-06 23:00:00.000000,2020-02-06 23:00:00,27 days 12:04:39.242861,NaT,NaT,,nice,0
627,697,2002,2020-02-18 16:36:51.249037,regular,nice,money_back,100.0,0.0,0,2,0,0,0,cr_regular,0,NaT,NaT,,2020-02-27 23:00:00.000000,2020-02-27 23:00:00,9 days 06:23:08.750963,NaT,NaT,,nice,0
762,835,2002,2020-03-10 07:47:39.337041,regular,nice,money_back,100.0,0.0,0,3,0,0,0,cr_regular,0,NaT,NaT,,2020-04-14 20:25:59.132327,2020-04-05 22:00:00,26 days 14:12:20.662959,NaT,NaT,,nice,0
1098,1172,2002,2020-04-14 21:03:09.519326,regular,nice,money_back,100.0,0.0,0,4,0,0,0,cr_regular,0,NaT,NaT,,2020-05-14 21:05:09.488707,2020-05-05 22:00:00,21 days 00:56:50.480674,NaT,NaT,,nice,0
1741,1800,2002,2020-05-15 04:09:51.091889,regular,nice,money_back,100.0,0.0,0,5,0,0,0,cr_regular,0,NaT,NaT,,2020-06-10 22:17:53.131912,2020-06-06 22:00:00,22 days 17:50:08.908111,NaT,NaT,,nice,0
3171,2949,2002,2020-06-10 23:34:13.556501,regular,nice,money_back,100.0,0.0,0,6,0,0,0,cr_regular,0,NaT,NaT,,2020-07-07 22:00:00.000000,2020-07-06 22:00:00,25 days 22:25:46.443499,NaT,NaT,,nice,0
8381,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,0,7,1,0,1,cancelled,1674,2020-07-20 20:43:33.629841,2020-10-13 14:25:15.265408,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-04 22:00:00.000,2020-08-19 22:00:00.000,after,nice,1
8382,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,1,8,0,0,1,accepted,1749,2020-07-22 01:33:56.400884,2020-10-13 14:25:00.882490,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-04 22:00:00.000,2020-08-15 01:33:48.128,before,nice,0
8383,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,2,9,0,0,1,accepted,1839,2020-07-23 11:31:44.836318,2020-10-13 14:25:02.005106,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-15 01:33:48.128,2020-08-30 01:33:48.128,before,nice,0
8384,8177,2002,2020-07-15 13:17:11.174285,regular,postpone,money_back,100.0,5.0,3,10,0,0,1,accepted,1977,2020-07-25 19:20:00.560197,2020-10-13 14:25:04.227093,Postpone Cash Request 8177,2020-08-03 09:01:41.363548,2020-08-05 01:33:48,20 days 12:16:36.825715,2020-08-30 01:33:48.128,2020-09-14 01:33:48.128,before,nice,0


### Estudio de top FEES

In [16]:
pd.options.display.max_rows = None
tops = df_jo[df_jo['stat_cr' ]== 'money_back'].groupby('user_id').agg(fees=('fee','sum'))
tops = df_jo[df_jo['stat_fe'] == 'accepted'  ].groupby('user_id').agg(fees=('fee','sum'))
#display(df_jo[tops])
top_users = tops.sort_values(by='fees', ascending=False).iloc[:10].reset_index()
display(top_users)
#top_users = tops.sort_values(by='fees', ascending=True).iloc[:10]
#display(top_users)

#display(df_jo[top_users])

Unnamed: 0,user_id,fees
0,17144,75.0
1,12934,55.0
2,4982,35.0
3,99021532,35.0
4,4636,35.0
5,5189,35.0
6,4317,35.0
7,13404,35.0
8,9199,35.0
9,17603,35.0
