# Income Analysis per User

In this section, we calculate the income each user generates for the company. Fees are summed per cash request and joined to the request's user_id and creation date, so each point in the interactive Plotly scatter plot represents one cash request. Hovering over a point shows details such as the user_id and the date of the request.

In [30]:
import pandas as pd
import plotly.express as px

# Load the data
fees_data = pd.read_csv('../cleaned_dataset/modified_fees_data.csv')
cash_requests_data = pd.read_csv('../cleaned_dataset/modified_cash_requests_data.csv')

# Parse fee timestamps, sum fees per cash request, then attach each request's user_id and created_at
fees_data['created_at'] = pd.to_datetime(fees_data['created_at'])
user_income = fees_data.groupby('cash_request_id')['total_amount'].sum().reset_index()
user_income = user_income.merge(cash_requests_data[['id', 'user_id', 'created_at']], left_on='cash_request_id', right_on='id', how='left')

# Create an interactive plot using Plotly
fig = px.scatter(user_income, x='total_amount', y='user_id', size='total_amount', color='total_amount',
                 hover_data=['user_id', 'created_at'],
                 labels={'total_amount': 'Income Generated', 'user_id': 'User ID'},
                 title='Income Generated by User')
fig.update_layout(transition_duration=500)
fig.show()
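
Because the cell above sums fees per cash request, a user with several requests appears as several points. A minimal sketch of rolling those amounts up to one total per user is shown here. It uses small made-up frames that only mimic the columns used above (`cash_request_id`, `total_amount`, `id`, `user_id`); the values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Toy stand-ins for fees_data and cash_requests_data (invented values).
fees = pd.DataFrame({
    "cash_request_id": [1, 1, 2, 3],
    "total_amount": [5.0, 5.0, 5.0, 10.0],
})
requests = pd.DataFrame({
    "id": [1, 2, 3],
    "user_id": [100, 100, 200],
})

# Step 1: sum fees per cash request (same idea as the cell above).
per_request = fees.groupby("cash_request_id", as_index=False)["total_amount"].sum()

# Step 2: attach the user who owns each request.
per_request = per_request.merge(
    requests, left_on="cash_request_id", right_on="id", how="left"
)

# Step 3: sum per user and rank users by income.
per_user = (
    per_request.groupby("user_id", as_index=False)["total_amount"]
    .sum()
    .sort_values("total_amount", ascending=False)
    .reset_index(drop=True)
)
print(per_user)
```

With the real data, `per_user` could be passed to `px.bar` or `px.scatter` in place of the per-request frame if one point per user is preferred.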

# Analysis of Last 7 Months for Incident Rates and Income

In this analysis, we compare monthly incident counts with monthly fee income over the last 7 months that appear in the fee data. We also examine the proportions of fee statuses and cash-request transfer types within that window, as context for judging whether incident rates and income generation have improved.
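
Incident counts and income are measured in different units, so a normalized rate can be easier to compare across months. The sketch below assumes one possible definition, incident fees divided by the number of cash requests created in the same month; this definition and the toy values are assumptions for illustration, not something the dataset prescribes.

```python
import pandas as pd

# Toy data; column names follow the notebook, values are invented.
fees = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2020-09-10", "2020-09-12", "2020-10-01", "2020-10-20"]
    ),
    "type": ["incident", "other", "incident", "incident"],
})
requests = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2020-09-01", "2020-09-02", "2020-09-03", "2020-09-04",
         "2020-10-01", "2020-10-02"]
    ),
})

# Label every row with its calendar month as a 'YYYY-MM' string.
fees["month"] = fees["created_at"].dt.strftime("%Y-%m")
requests["month"] = requests["created_at"].dt.strftime("%Y-%m")

# Count incident fees and cash requests per month.
incidents = fees[fees["type"] == "incident"].groupby("month").size()
n_requests = requests.groupby("month").size()

# Assumed definition: incident fees per cash request created that month.
# Months with requests but no incidents become 0 instead of NaN.
incident_rate = (incidents / n_requests).fillna(0)
print(incident_rate)
```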


In [38]:
import warnings

# Ignore the specific warning about timezone information being dropped during period conversion
warnings.filterwarnings("ignore", message="Converting to PeriodArray/Index representation will drop timezone information.")

# Ensure the 'created_at' columns are in datetime format
fees_data['created_at'] = pd.to_datetime(fees_data['created_at'])
cash_requests_data['created_at'] = pd.to_datetime(cash_requests_data['created_at'])

# Filter the data to the last 7 months present in the fee data
recent_months = fees_data['created_at'].dt.to_period('M').sort_values().unique()[-7:]
filtered_fees = fees_data[fees_data['created_at'].dt.to_period('M').isin(recent_months)]
filtered_cash_requests = cash_requests_data[cash_requests_data['created_at'].dt.to_period('M').isin(recent_months)]

# Compute monthly incident counts and monthly income
monthly_incidents = filtered_fees[filtered_fees['type'] == 'incident'].groupby(filtered_fees['created_at'].dt.to_period('M')).size()
monthly_income = filtered_fees.groupby(filtered_fees['created_at'].dt.to_period('M'))['total_amount'].sum()

# Ensure both series cover the same set of monthly periods
all_periods = pd.period_range(start=recent_months.min(), end=recent_months.max(), freq='M')
monthly_incidents = monthly_incidents.reindex(all_periods, fill_value=0)
monthly_income = monthly_income.reindex(all_periods, fill_value=0)

# Plotly line chart comparing incidents and income (named columns give a readable legend)
monthly_summary = pd.DataFrame({'Month': all_periods.astype(str),
                                'Incidents': monthly_incidents.values,
                                'Income': monthly_income.values})
fig = px.line(monthly_summary, x='Month', y=['Incidents', 'Income'],
              labels={'value': 'Quantity', 'variable': 'Type'},
              title='Comparison of Incident Rates and Income Over the Last 7 Months')
fig.update_traces(mode='lines+markers')
fig.show()

# Proportions of fee statuses and transfer types
status_proportion = filtered_fees['status'].value_counts(normalize=True)
transfer_type_proportion = filtered_cash_requests['transfer_type'].value_counts(normalize=True)

# Plotly bar charts for the proportions
fig = px.bar(x=status_proportion.index, y=status_proportion.values, labels={'x': 'Status', 'y': 'Proportion'},
             title='Proportion of Status Over the Last 7 Months')
fig.show()

fig = px.bar(x=transfer_type_proportion.index, y=transfer_type_proportion.values, labels={'x': 'Transfer Type', 'y': 'Proportion'},
             title='Proportion of Transfer Type Over the Last 7 Months')
fig.show()
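
The proportions above are computed over the whole 7-month window, so on their own they cannot show whether the mix changed from month to month. One way to look at a trend, sketched here with toy data, is to compute the status share per month with `pd.crosstab` and row-wise normalization. The status labels and dates are invented for illustration.

```python
import pandas as pd

# Toy data; 'created_at' and 'status' mirror the notebook's column names.
fees = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2020-09-05", "2020-09-20", "2020-10-03",
         "2020-10-15", "2020-10-28", "2020-10-30"]
    ),
    "status": ["accepted", "rejected", "accepted",
               "accepted", "accepted", "rejected"],
})

# Month label for each fee as a 'YYYY-MM' string.
month = fees["created_at"].dt.strftime("%Y-%m").rename("month")

# Rows are months, columns are statuses; normalize="index" makes each row sum to 1.
monthly_share = pd.crosstab(month, fees["status"], normalize="index")
print(monthly_share)
```

With the real data, the same table could be built from `filtered_fees` and plotted as a stacked bar chart, and an analogous table could be built for `transfer_type` from `filtered_cash_requests`.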