title: Consumer Credit Purpose Overview
author: Helder Silva 
date: 2020-09-03
region: EU  
tags: credit, bank products, survey
summary:  -  The ranking of consumer credit purposes changes considerably depending on whether we look at drafts or offers, whether the offers were disbursed, and which provider made them (the list of purposes depends on the country of application). -  "Car"-related purposes are among the most picked across the board. For disbursed offers, "Overdraft" and "Home Improvement" also tend to be among the most chosen. -  79.2% of the users with more than one credit offer use a single purpose for all their offers. - Moving forward, we should investigate whether these purposes correlate with credit scores or write-off rates.

<div class="alert alert-block alert-success">
    <H1>Consumer Credit Purpose Overview</H1>

</div>

# Summary

This is an exploratory analysis of the purposes our users give when applying for consumer credit, either directly with N26 or with one of our external providers (Auxmoney and Younited). After checking when each purpose was first used, we decided to only consider drafts created between 2018-01-01 and 2020-08-31.

Here are the main take-aways:
- The ranking of consumer credit purposes changes considerably depending on whether we look at drafts or offers, whether the offers were disbursed, and which provider made them (the list of purposes depends on the country of application).
- "Car"-related purposes are among the most picked across the board. For disbursed offers, "Overdraft" and "Home Improvement" also tend to be among the most chosen.
- 79.2% of the users with more than one credit offer use a single purpose for all their offers.
- Moving forward, we should investigate whether these purposes correlate with credit scores or write-off rates.

In [1]:
import pandas as pd
import altair as alt
from utils.datalib_database import df_from_sql

# N26 colors
# Primary
teal = "#48AC98"
rhubarb = "#CB7C7A"
petrol = "#266678"
wheat = "#CDA35F"

# Secondary
pink = "#E5C3C7"  # Goes with Teal
green = "#CAD7CA"  # Goes with Rhubarb
blue = "#C8D7E5"  # Goes with Wheat
beige = "#F5D5B9"  # Goes with Petrol

In [2]:
draft_reason_query = """ 
select 
to_char(created, 'YYYY-MM') as month,
purpose_id,
case when purpose_id = 0 then 'OTHER'
when purpose_id = 1 then 'CAR'
when purpose_id = 2 then 'HOME_IMPROVEMENT'
when purpose_id = 3 then 'VACATION'
when purpose_id = 4 then 'ELECTRONICS'
when purpose_id = 5 then 'BUSINESS'
when purpose_id = 6 then 'EDUCATION'
when purpose_id = 7 then 'CREDITS'
when purpose_id = 8 then 'OVERDRAFT'
when purpose_id = 9 then 'RENOVATION'
when purpose_id = 10 then 'MOVING'
when purpose_id = 11 then 'FURNISHINGS'
when purpose_id = 12 then 'APPLIANCES'
when purpose_id = 13 then 'USED_CAR'
when purpose_id = 14 then 'BIRTH'
when purpose_id = 15 then 'WEDDING'
end as purpose_name, 
case when created >= '2018-01-01' then true else false end as post_2018,
count(*) as all_drafts,
count(distinct user_created) as distinct_users,
sum(loan_asked_cents) as draft_volume,
all_drafts::numeric/ distinct_users::numeric as drafts_per_user
from cc_credit_draft
where created <= '2020-08-31'
group by 1, 2, 3, 4
"""

In [3]:
df_reason = df_from_sql("redshiftreader", draft_reason_query)



In [4]:
df_reason.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 637 entries, 0 to 636
Data columns (total 8 columns):
month              637 non-null object
purpose_id         637 non-null int64
purpose_name       637 non-null object
post_2018          637 non-null bool
all_drafts         637 non-null int64
distinct_users     637 non-null int64
draft_volume       637 non-null int64
drafts_per_user    637 non-null float64
dtypes: bool(1), float64(1), int64(4), object(2)
memory usage: 35.5+ KB


In [5]:
df_reason.head()

Unnamed: 0,month,purpose_id,purpose_name,post_2018,all_drafts,distinct_users,draft_volume,drafts_per_user
0,2017-01,4,ELECTRONICS,False,16,13,2899900,1.230769
1,2017-02,7,CREDITS,False,1723,1251,987092700,1.377298
2,2017-02,5,BUSINESS,False,1500,1107,878710600,1.355014
3,2017-07,4,ELECTRONICS,False,429,341,86439000,1.258065
4,2017-10,3,VACATION,False,244,177,61135000,1.378531
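
The `purpose_id` to `purpose_name` mapping visible in the output above is spelled out as a CASE expression in every query of this notebook. A minimal sketch of one way to keep it in a single place, assuming the ids and names listed in the draft query; `PURPOSE_NAMES` and `purpose_case_sql` are hypothetical names introduced here for illustration, not part of the existing codebase.

```python
# Mapping copied from the CASE expression in the draft query.
PURPOSE_NAMES = {
    0: "OTHER",
    1: "CAR",
    2: "HOME_IMPROVEMENT",
    3: "VACATION",
    4: "ELECTRONICS",
    5: "BUSINESS",
    6: "EDUCATION",
    7: "CREDITS",
    8: "OVERDRAFT",
    9: "RENOVATION",
    10: "MOVING",
    11: "FURNISHINGS",
    12: "APPLIANCES",
    13: "USED_CAR",
    14: "BIRTH",
    15: "WEDDING",
}


def purpose_case_sql(column="purpose_id"):
    """Build the SQL CASE expression from PURPOSE_NAMES (hypothetical helper)."""
    branches = "\n".join(
        f"when {column} = {pid} then '{name}'" for pid, name in PURPOSE_NAMES.items()
    )
    return f"case {branches}\nend"


# Look up the names for a few ids in plain Python.
names = [PURPOSE_NAMES[pid] for pid in (1, 12, 13)]
print(names)
```

The generated expression could be interpolated into each query string, or the mapping could be applied after loading with `Series.map(PURPOSE_NAMES)`, so that a new purpose only needs to be added once.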


In [6]:
df_reason_grouped = df_reason.groupby("purpose_name").sum().reset_index()
df_reason_grouped.head()

Unnamed: 0,purpose_name,purpose_id,post_2018,all_drafts,distinct_users,draft_volume,drafts_per_user
0,APPLIANCES,408,32.0,175593,108648,45471493800,57.573325
1,BIRTH,462,32.0,15276,11973,4500129000,41.710174
2,BUSINESS,225,32.0,37059,28453,19688085900,58.503861
3,CAR,45,32.0,197125,127499,100846144000,75.614488
4,CREDITS,315,32.0,60589,46842,38980362600,58.416618


# Purpose per Draft

Looking at the number of drafts per credit purpose across the whole period, the purposes "CAR", "APPLIANCES" and "USED_CAR" are the most common, together accounting for 43% of all drafts.
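
The share of the top purposes can be derived directly from the grouped counts. The sketch below assumes a frame with an `all_drafts` column, as in `df_reason_grouped`; `top_n_share` is a hypothetical helper and the numbers are toy values for illustration, not the real draft counts.

```python
import pandas as pd


def top_n_share(df, n=3, value_col="all_drafts"):
    """Return the percentage of `value_col` covered by the n largest rows."""
    total = df[value_col].sum()
    top = df[value_col].nlargest(n).sum()
    return round(100 * top / total, 1)


# Toy numbers for illustration only, not the real draft counts.
toy = pd.DataFrame(
    {
        "purpose_name": ["CAR", "APPLIANCES", "USED_CAR", "VACATION", "OTHER"],
        "all_drafts": [40, 30, 20, 6, 4],
    }
)
print(top_n_share(toy))  # 90.0
```

Calling `top_n_share(df_reason_grouped)` on the real data would be one way to reproduce the figure quoted above.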

In [36]:
all_reason_bar = (
    alt.Chart(df_reason_grouped)
    .mark_rect(color=wheat, size=40)
    .encode(
        alt.X("purpose_name:O", title="CC Purpose", sort="-y"),
        alt.Y("all_drafts:Q", title="Count of Drafts"),
    )
    .properties(width=800, height=400)
)

all_reason_text = all_reason_bar.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it sits above the top of the bar
).encode(text="all_drafts:Q")

all_reason_bar + all_reason_text

## Top 3 purposes over time
("CAR", "APPLIANCES" and "USED_CAR")

Comparing these top 3 purposes over time, we can see that the purpose "CAR" has an unusually high number of drafts in February 2017. We can also see that 7 of the 16 purposes were only introduced at the end of 2017. Therefore, the results following this section only consider data between Jan 2018 and Aug 2020.

In [8]:
# all drafts per month
df_reason_all_month = df_reason.groupby("month").sum().reset_index()

# car drafts per month
df_reason_car_month = df_reason[df_reason["purpose_name"] == "CAR"]
df_reason_car_month = df_reason_car_month.groupby("month").sum().reset_index()

# used_car drafts per month
df_reason_used_car_month = df_reason[df_reason["purpose_name"] == "USED_CAR"]
df_reason_used_car_month = df_reason_used_car_month.groupby("month").sum().reset_index()

# appliances drafts per month
df_reason_appliances_month = df_reason[df_reason["purpose_name"] == "APPLIANCES"]
df_reason_appliances_month = (
    df_reason_appliances_month.groupby("month").sum().reset_index()
)

In [9]:
all_reason_month = (
    alt.Chart(df_reason_all_month)
    .mark_rect(color=petrol, size=5)
    .encode(
        alt.X("month:O", title="month"),
        alt.Y("all_drafts:Q", title="Count of Drafts", sort="x"),
        tooltip="all_drafts:Q",
    )
    .properties(width=400, height=400)
)

all_reason_car_month = (
    alt.Chart(df_reason_car_month)
    .mark_rect(color=beige, size=5)
    .encode(
        alt.X("month:O", title="month"),
        alt.Y("all_drafts:Q", title="Count of Drafts", sort="x"),
        tooltip="all_drafts:Q",
    )
    .properties(width=400, height=400)
)

all_reason_used_car_month = (
    alt.Chart(df_reason_used_car_month)
    .mark_rect(color=blue, size=5)
    .encode(
        alt.X("month:O", title="month"),
        alt.Y("all_drafts:Q", title="Count of Drafts", sort="x"),
        tooltip="all_drafts:Q",
    )
    .properties(width=400, height=400)
)

all_reason_appliances_month = (
    alt.Chart(df_reason_appliances_month)
    .mark_rect(color=pink, size=5)
    .encode(
        alt.X("month:O", title="month"),
        alt.Y("all_drafts:Q", title="Count of Drafts", sort="x"),
        tooltip="all_drafts:Q",
    )
    .properties(width=400, height=400)
)


all_reason_month.properties(title="All Drafts") | all_reason_car_month.properties(
    title="Drafts with Reason 'Car'"
)

In [10]:
all_reason_used_car_month.properties(
    title="Drafts with Reason 'Used Car'"
) | all_reason_appliances_month.properties(title="Drafts with Reason 'Appliances'")

In [11]:
purpose_created_query = """
select 
case when purpose_id = 0 then 'OTHER'
when purpose_id = 1 then 'CAR'
when purpose_id = 2 then 'HOME_IMPROVEMENT'
when purpose_id = 3 then 'VACATION'
when purpose_id = 4 then 'ELECTRONICS'
when purpose_id = 5 then 'BUSINESS'
when purpose_id = 6 then 'EDUCATION'
when purpose_id = 7 then 'CREDITS'
when purpose_id = 8 then 'OVERDRAFT'
when purpose_id = 9 then 'RENOVATION'
when purpose_id = 10 then 'MOVING'
when purpose_id = 11 then 'FURNISHINGS'
when purpose_id = 12 then 'APPLIANCES'
when purpose_id = 13 then 'USED_CAR'
when purpose_id = 14 then 'BIRTH'
when purpose_id = 15 then 'WEDDING'
end as purpose_name, 
min(created)::date as min_created_tstamp
from cc_credit_draft
where created <= '2020-08-31'
group by 1
order by 2
"""

In [12]:
df_purpose_created = df_from_sql("redshiftreader", purpose_created_query)



In [13]:
df_purpose_created

Unnamed: 0,purpose_name,min_created_tstamp
0,HOME_IMPROVEMENT,2016-12-16
1,CAR,2016-12-16
2,OVERDRAFT,2016-12-16
3,VACATION,2016-12-17
4,OTHER,2016-12-17
5,ELECTRONICS,2016-12-18
6,BUSINESS,2016-12-22
7,EDUCATION,2016-12-22
8,CREDITS,2016-12-23
9,APPLIANCES,2017-11-30
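
The cutoff decision rests on which purposes first appear late. A minimal sketch of that check, using only the ten rows visible in the output above (the remaining purposes are not shown there and are left out); `introduced_after` is a hypothetical helper.

```python
import pandas as pd

# First-seen dates for the rows visible in the output above.
first_seen = pd.DataFrame(
    {
        "purpose_name": [
            "HOME_IMPROVEMENT", "CAR", "OVERDRAFT", "VACATION", "OTHER",
            "ELECTRONICS", "BUSINESS", "EDUCATION", "CREDITS", "APPLIANCES",
        ],
        "min_created_tstamp": pd.to_datetime(
            [
                "2016-12-16", "2016-12-16", "2016-12-16", "2016-12-17", "2016-12-17",
                "2016-12-18", "2016-12-22", "2016-12-22", "2016-12-23", "2017-11-30",
            ]
        ),
    }
)


def introduced_after(df, cutoff):
    """Purposes whose first draft was created on or after `cutoff`."""
    late = df[df["min_created_tstamp"] >= pd.Timestamp(cutoff)]
    return sorted(late["purpose_name"])


print(introduced_after(first_seen, "2017-01-01"))  # ['APPLIANCES']
```

Run against the full `df_purpose_created`, the same call would list every purpose that has no data for the launch period.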


In [14]:
# Keep only drafts created on or after 2018-01-01
df_reason_2020 = df_reason[df_reason["post_2018"]]
df_reason_2020_grouped = df_reason_2020.groupby("purpose_name").sum().reset_index()
df_reason_2020_grouped

Unnamed: 0,purpose_name,purpose_id,post_2018,all_drafts,distinct_users,draft_volume,drafts_per_user
0,APPLIANCES,384,32.0,174346,107992,45168097900,52.674089
1,BIRTH,448,32.0,15196,11904,4467429000,40.550754
2,BUSINESS,160,32.0,28700,22274,15355418900,41.343268
3,CAR,32,32.0,155363,104145,86926610700,48.005858
4,CREDITS,224,32.0,52018,40411,34205608800,41.335354
5,EDUCATION,192,32.0,26082,19722,7206009800,42.478567
6,ELECTRONICS,128,32.0,63616,48192,13685292700,42.132063
7,FURNISHINGS,352,32.0,31016,26318,10646448000,37.624262
8,HOME_IMPROVEMENT,64,32.0,61937,45247,20949855600,43.986544
9,MOVING,320,32.0,40845,30595,14637745600,42.442975


## Purposes per Draft
(Jan 2018 - Aug 2020)

With this filter applied, the purpose "APPLIANCES" takes first place, the purposes "HOME_IMPROVEMENT" and "FURNISHINGS" move up a couple of positions, and the order of the remaining purposes stays unchanged.

In [37]:
reason_2020_bar = (
    alt.Chart(df_reason_2020_grouped)
    .mark_rect(color=petrol, size=40)
    .encode(
        alt.X("purpose_name:O", title="CC Purpose", sort="-y"),
        alt.Y("all_drafts:Q", title="Count of Drafts"),
    )
    .properties(width=800, height=400)
)

reason_2020_text = reason_2020_bar.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it sits above the top of the bar
).encode(text="all_drafts:Q")

reason_2020_bar + reason_2020_text

# Purpose per [disbursed] offer

For all offers and for disbursed offers, the pattern of purposes is very different from the one we see for drafts. Namely, purposes like "APPLIANCES", which were among the most common for drafts, are now some of the least common in disbursed offers.

Also, when looking into disbursed offers, the purposes "Overdraft" and "Home Improvement" tend to be amongst the most chosen ones.
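
One way to make the change in ordering explicit is to rank each purpose in both populations and compare the ranks. This is a sketch under assumptions: `rank_shift` is a hypothetical helper, and the counts are toy values chosen only to illustrate the pattern described above.

```python
import pandas as pd


def rank_shift(drafts, disbursed):
    """Compare the rank of each purpose (1 = most common) across two count Series."""
    ranks = pd.DataFrame(
        {
            "draft_rank": drafts.rank(ascending=False, method="min"),
            "disbursed_rank": disbursed.rank(ascending=False, method="min"),
        }
    )
    # Positive shift: the purpose ranks higher among disbursed offers than among drafts.
    ranks["shift"] = ranks["draft_rank"] - ranks["disbursed_rank"]
    return ranks.sort_values("disbursed_rank")


# Toy counts for illustration only, not the real figures.
drafts = pd.Series(
    {"APPLIANCES": 500, "CAR": 400, "OVERDRAFT": 100, "HOME_IMPROVEMENT": 200}
)
disbursed = pd.Series(
    {"APPLIANCES": 5, "CAR": 40, "OVERDRAFT": 30, "HOME_IMPROVEMENT": 35}
)
result = rank_shift(drafts, disbursed)
print(result)
```

With the real data, the two inputs would be the `all_drafts` and `offer_count` columns indexed by `purpose_name`.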

In [16]:
offer_reason_query = """
select 
date_trunc('month', d.created) as month,
provider,
case when o.status = 'APPROVED' and d.status in ('DISBURSED', 'IN_REPAYMENT', 'COMPLETED', 'IN_ARREARS') then true else false end as is_disbursed,
case when purpose_id = 0 then 'OTHER'
when purpose_id = 1 then 'CAR'
when purpose_id = 2 then 'HOME_IMPROVEMENT'
when purpose_id = 3 then 'VACATION'
when purpose_id = 4 then 'ELECTRONICS'
when purpose_id = 5 then 'BUSINESS'
when purpose_id = 6 then 'EDUCATION'
when purpose_id = 7 then 'CREDITS'
when purpose_id = 8 then 'OVERDRAFT'
when purpose_id = 9 then 'RENOVATION'
when purpose_id = 10 then 'MOVING'
when purpose_id = 11 then 'FURNISHINGS'
when purpose_id = 12 then 'APPLIANCES'
when purpose_id = 13 then 'USED_CAR'
when purpose_id = 14 then 'BIRTH'
when purpose_id = 15 then 'WEDDING'
end as purpose_name,
count (*) as offer_count
from cc_credit_draft d 
inner join cc_credit_offer o on d.id = o.credit_draft_id 
where d.created between '2018-01-01' and '2020-08-31'
and o.is_success = 1
group by 1, 2, 3, 4
"""

In [17]:
df_offer_reason = df_from_sql("redshiftreader", offer_reason_query)



In [18]:
df_offer_reason_grouped = df_offer_reason.groupby("purpose_name").sum().reset_index()

In [19]:
# Keep only disbursed offers
df_disbursed_reason = df_offer_reason[df_offer_reason["is_disbursed"]]
df_disbursed_reason_grouped = (
    df_disbursed_reason.groupby("purpose_name").sum().reset_index()
)

In [20]:
offer_reason_bar = (
    alt.Chart(df_offer_reason_grouped)
    .mark_rect(color=petrol, size=20)
    .encode(
        alt.X("purpose_name:O", title="CC Purpose", sort="-y"),
        alt.Y("offer_count:Q", title="Count of Offers"),
        tooltip="offer_count:Q",
    )
    .properties(width=430, height=400)
)

disbursed_reason_bar = (
    alt.Chart(df_disbursed_reason_grouped)
    .mark_rect(color=teal, size=20)
    .encode(
        alt.X("purpose_name:O", title="CC Purpose", sort="-y"),
        alt.Y("offer_count:Q", title="Count of Offers"),
        tooltip="offer_count:Q",
    )
    .properties(width=430, height=400)
)

offer_reason_bar.properties(title="All Offers") | disbursed_reason_bar.properties(
    title="Disbursed Offers"
)

## All Offer Purposes per Provider

In [21]:
# N26
df_n26_reason = df_offer_reason[df_offer_reason["provider"] == "N26"]
df_n26_reason = df_n26_reason.groupby("purpose_name").sum().reset_index()

# Auxmoney
df_auxmoney_reason = df_offer_reason[df_offer_reason["provider"] == "AUX_MONEY"]
df_auxmoney_reason = df_auxmoney_reason.groupby("purpose_name").sum().reset_index()

# Younited
df_younited_reason = df_offer_reason[df_offer_reason["provider"] == "YOUNITED"]
df_younited_reason = df_younited_reason.groupby("purpose_name").sum().reset_index()

In [22]:
df_provider_reason = (
    df_offer_reason.groupby(["provider", "purpose_name"]).sum().reset_index()
)

The differences between providers seem to stem from the fact that the list of purposes depends on the country of application, so not all purposes are used by all providers. This is especially visible for the purposes that are exclusive to the provider "YOUNITED", whose offer conversion rate (from all offers to disbursed offers) is only 0.08%.
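
The conversion rate used here is disbursed offers divided by all offers, per provider. A minimal pandas sketch of that calculation, assuming a frame with `provider`, `is_disbursed` and `offer_count` columns like `df_offer_reason`; `conversion_rate` is a hypothetical helper and the providers and counts are toy values.

```python
import pandas as pd


def conversion_rate(df):
    """Percentage of offers that were disbursed, per provider."""
    all_offers = df.groupby("provider")["offer_count"].sum()
    disbursed = (
        df[df["is_disbursed"]]
        .groupby("provider")["offer_count"]
        .sum()
        # Providers without any disbursed offer get 0 instead of a missing value.
        .reindex(all_offers.index, fill_value=0)
    )
    return (100 * disbursed / all_offers).round(2)


# Toy data for illustration only.
toy = pd.DataFrame(
    {
        "provider": ["A", "A", "B", "B", "C"],
        "is_disbursed": [True, False, True, False, False],
        "offer_count": [25, 75, 1, 999, 50],
    }
)
rates = conversion_rate(toy)
print(rates)
```

Unlike a SQL `sum(case when ... end)` without an `else` branch, which yields NULL for a provider with no disbursed offers, this sketch reports 0 for such a provider.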

In [23]:
selection = alt.selection_single(empty="all", fields=["purpose_name", "provider"])

brush = alt.selection(type="interval")

color_legend = alt.condition(
    selection,
    alt.Color("offer_count:O", legend=None),
    alt.value("lightGray"),
)

legend = (
    alt.Chart(df_provider_reason)
    .mark_square(size=1400)
    .encode(
        y=alt.Y("provider:N", axis=alt.Axis(orient="left")),
        x="purpose_name:N",
        color=color_legend,
        tooltip="offer_count:Q",
    )
    .properties(width=800, height=150)
    .add_selection(selection)
)

graph = (
    alt.vconcat(legend.properties(title="Offers per Provider and Purpose"))
    .configure_axis(
        labelFontSize=12,
        titleFontSize=14,
        labelAngle=30,
        labelColor="#666666",
        titleColor="#266678",
        labelFontWeight="bold",
        grid=False,
    )
    .resolve_scale(color="independent")
)

graph

In [24]:
provider_conversion_query = """ 
with totals as (
select 
provider,
case when o.status = 'APPROVED' and d.status in ('DISBURSED', 'IN_REPAYMENT', 'COMPLETED', 'IN_ARREARS') then true else false end as is_disbursed,
count (*) as offer_count
from cc_credit_draft d 
inner join cc_credit_offer o on d.id = o.credit_draft_id 
where d.created between '2018-01-01' and '2020-08-31'
and o.is_success = 1
group by 1, 2
)
select 
provider,
sum(case when is_disbursed then offer_count end) as disbursed_offers, 
sum(offer_count) as all_offers,
round((disbursed_offers::numeric/ all_offers::numeric)*100, 2) as conversion_rate
from totals 
group by 1
"""

In [25]:
df_provider_conversion = df_from_sql("redshiftreader", provider_conversion_query)



In [39]:
alt_bar = (
    alt.Chart(df_provider_conversion)
    .mark_rect(color=teal, size=100)
    .encode(
        alt.X("provider:O", title="Provider", sort="-y"),
        alt.Y("conversion_rate:Q", title="Conversion Rate (%)"),
    )
    .properties(width=400, height=400)
)

all_text = alt_bar.mark_text(
    align="center",
    baseline="middle",
    dy=-10,  # Nudges text up so it sits above the top of the bar
).encode(text="conversion_rate:Q")

alt_bar + all_text

In [27]:
n26_reason_bar = (
    alt.Chart(df_n26_reason)
    .mark_rect(color=blue, size=10)
    .encode(
        alt.X("purpose_name:O", title="CC Purpose", sort="-y"),
        alt.Y("offer_count:Q", title="Count of Offers"),
        tooltip="offer_count:Q",
    )
    .properties(width=250, height=400)
)

auxmoney_reason_bar = (
    alt.Chart(df_auxmoney_reason)
    .mark_rect(color=green, size=10)
    .encode(
        alt.X("purpose_name:O", title="CC Purpose", sort="-y"),
        alt.Y("offer_count:Q", title="Count of Offers"),
        tooltip="offer_count:Q",
    )
    .properties(width=250, height=400)
)

younited_reason_bar = (
    alt.Chart(df_younited_reason)
    .mark_rect(color=beige, size=10)
    .encode(
        alt.X("purpose_name:O", title="CC Purpose", sort="-y"),
        alt.Y("offer_count:Q", title="Count of Offers"),
        tooltip="offer_count:Q",
    )
    .properties(width=250, height=400)
)
n26_reason_bar.properties(title="All N26 Offers") | auxmoney_reason_bar.properties(
    title="All Auxmoney Offers"
) | younited_reason_bar.properties(title="All Younited Offers")

## Disbursed Purposes per Provider

In [28]:
# N26
df_n26_disbursed_reason = df_disbursed_reason[df_disbursed_reason["provider"] == "N26"]
df_n26_disbursed_reason = (
    df_n26_disbursed_reason.groupby("purpose_name").sum().reset_index()
)

# Auxmoney
df_auxmoney_disbursed_reason = df_disbursed_reason[
    df_disbursed_reason["provider"] == "AUX_MONEY"
]
df_auxmoney_disbursed_reason = (
    df_auxmoney_disbursed_reason.groupby("purpose_name").sum().reset_index()
)

# Younited
df_younited_disbursed_reason = df_disbursed_reason[
    df_disbursed_reason["provider"] == "YOUNITED"
]
df_younited_disbursed_reason = (
    df_younited_disbursed_reason.groupby("purpose_name").sum().reset_index()
)

In [29]:
n26_reason_bar = (
    alt.Chart(df_n26_disbursed_reason)
    .mark_rect(color=petrol, size=10)
    .encode(
        alt.X("purpose_name:O", title="CC Purpose", sort="-y"),
        alt.Y("offer_count:Q", title="Count of Offers"),
        tooltip="offer_count:Q",
    )
    .properties(width=250, height=400)
)

auxmoney_reason_bar = (
    alt.Chart(df_auxmoney_disbursed_reason)
    .mark_rect(color=teal, size=10)
    .encode(
        alt.X("purpose_name:O", title="CC Purpose", sort="-y"),
        alt.Y("offer_count:Q", title="Count of Offers"),
        tooltip="offer_count:Q",
    )
    .properties(width=250, height=400)
)

younited_reason_bar = (
    alt.Chart(df_younited_disbursed_reason)
    .mark_rect(color=wheat, size=10)
    .encode(
        alt.X("purpose_name:O", title="CC Purpose", sort="-y"),
        alt.Y("offer_count:Q", title="Count of Offers"),
        tooltip="offer_count:Q",
    )
    .properties(width=250, height=400)
)
n26_reason_bar.properties(
    title="Disbursed N26 Offers"
) | auxmoney_reason_bar.properties(
    title="Disbursed Auxmoney Offers"
) | younited_reason_bar.properties(
    title="Disbursed Younited Offers"
)

# Number of Offer Purposes per User

Finally, when looking at how many different purposes our users give across all their offers (restricted to users with more than one offer), we can see that most users (79.2%) use only one purpose for all their offers.

In [30]:
offer_per_user_query = """
with totals as (
select 
user_created,
provider,
--date_trunc('month', d.created) as month,
case when purpose_id = 0 then 'OTHER'
when purpose_id = 1 then 'CAR'
when purpose_id = 2 then 'HOME_IMPROVEMENT'
when purpose_id = 3 then 'VACATION'
when purpose_id = 4 then 'ELECTRONICS'
when purpose_id = 5 then 'BUSINESS'
when purpose_id = 6 then 'EDUCATION'
when purpose_id = 7 then 'CREDITS'
when purpose_id = 8 then 'OVERDRAFT'
when purpose_id = 9 then 'RENOVATION'
when purpose_id = 10 then 'MOVING'
when purpose_id = 11 then 'FURNISHINGS'
when purpose_id = 12 then 'APPLIANCES'
when purpose_id = 13 then 'USED_CAR'
when purpose_id = 14 then 'BIRTH'
when purpose_id = 15 then 'WEDDING'
end as purpose_name,
count (*) as offer_count
from cc_credit_draft d 
inner join cc_credit_offer o on d.id = o.credit_draft_id 
where d.created between '2018-01-01' and '2020-08-31'
and o.is_success = 1
group by 1, 2, 3
)
select 
'TOTAL' as provider,
user_created,
count(distinct purpose_name)
from totals 
where offer_count > 1
group by 1,2
union all 
select 
'N26' as provider,
user_created,
count(distinct purpose_name)
from totals 
where provider = 'N26'
and offer_count > 1
group by 1,2
union all 
select 
'AUX_MONEY' as provider,
user_created,
count(distinct purpose_name)
from totals 
where provider = 'AUX_MONEY'
and offer_count > 1
group by 1,2
union all 
select 
'YOUNITED' as provider,
user_created,
count(distinct purpose_name)
from totals 
where provider = 'YOUNITED'
and offer_count > 1
group by 1,2
"""

In [31]:
df_offer_per_user = df_from_sql("redshiftreader", offer_per_user_query)



In [32]:
# Total
df_total_offer_per_user = df_offer_per_user[df_offer_per_user["provider"] == "TOTAL"]
df_total_offer_per_user = df_total_offer_per_user.groupby("count").count().reset_index()
df_total_offer_per_user[["percentage"]] = df_total_offer_per_user[
    ["user_created"]
].apply(lambda x: round((x / x.sum()) * 100, 1), axis=0)
df_total_offer_per_user[["number_of_purposes"]] = df_total_offer_per_user[["count"]]
df_total_offer_per_user = df_total_offer_per_user[
    ["number_of_purposes", "user_created", "percentage"]
]

# N26
df_n26_offer_per_user = df_offer_per_user[df_offer_per_user["provider"] == "N26"]
df_n26_offer_per_user = df_n26_offer_per_user.groupby("count").count().reset_index()
df_n26_offer_per_user[["percentage"]] = df_n26_offer_per_user[["user_created"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)

# Auxmoney
df_auxmoney_offer_per_user = df_offer_per_user[
    df_offer_per_user["provider"] == "AUX_MONEY"
]
df_auxmoney_offer_per_user = (
    df_auxmoney_offer_per_user.groupby("count").count().reset_index()
)
df_auxmoney_offer_per_user[["percentage"]] = df_auxmoney_offer_per_user[
    ["user_created"]
].apply(lambda x: round((x / x.sum()) * 100, 1), axis=0)

# Younited
df_younited_offer_per_user = df_offer_per_user[
    df_offer_per_user["provider"] == "YOUNITED"
]
df_younited_offer_per_user = (
    df_younited_offer_per_user.groupby("count").count().reset_index()
)
df_younited_offer_per_user[["percentage"]] = df_younited_offer_per_user[
    ["user_created"]
].apply(lambda x: round((x / x.sum()) * 100, 1), axis=0)

In [33]:
df_total_offer_per_user

Unnamed: 0,number_of_purposes,user_created,percentage
0,1,57288,79.2
1,2,11347,15.7
2,3,2745,3.8
3,4,710,1.0
4,5,200,0.3
5,6,67,0.1
6,7,10,0.0
7,8,3,0.0
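
The percentages in the table follow directly from the user counts. The sketch below recomputes them from the `user_created` column shown above; the variable names are chosen here for illustration.

```python
import pandas as pd

# User counts per number of distinct purposes, copied from the output above.
users_per_purpose_count = pd.Series(
    [57288, 11347, 2745, 710, 200, 67, 10, 3],
    index=pd.Index(range(1, 9), name="number_of_purposes"),
    name="user_created",
)

# Share of users for each number of distinct purposes, in percent.
percentage = (100 * users_per_purpose_count / users_per_purpose_count.sum()).round(1)

# Share of users who used more than one purpose.
multi_purpose_share = round(100 - percentage.loc[1], 1)

print(percentage.loc[1], multi_purpose_share)
```

The first value matches the 79.2% quoted in the text, which leaves roughly one in five of these users with two or more distinct purposes.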


In [35]:
total_per_user = (
    alt.Chart(df_total_offer_per_user)
    .mark_rect(color=teal, size=10)
    .encode(
        alt.X("number_of_purposes:O", title="Count of reasons", sort="-y"),
        alt.Y("percentage:Q", title="Percentage of Users"),
        tooltip="percentage:Q",
    )
    .properties(width=170, height=400)
)

n26_per_user = (
    alt.Chart(df_n26_offer_per_user)
    .mark_rect(color=blue, size=10)
    .encode(
        alt.X("count:O", title="Count of reasons", sort="-y"),
        alt.Y("percentage:Q", title="Percentage of Users"),
        tooltip="percentage:Q",
    )
    .properties(width=170, height=400)
)

auxmoney_per_user = (
    alt.Chart(df_auxmoney_offer_per_user)
    .mark_rect(color=green, size=10)
    .encode(
        alt.X("count:O", title="Count of reasons", sort="-y"),
        alt.Y("percentage:Q", title="Percentage of Users"),
        tooltip="percentage:Q",
    )
    .properties(width=170, height=400)
)

younited_per_user = (
    alt.Chart(df_younited_offer_per_user)
    .mark_rect(color=beige, size=10)
    .encode(
        alt.X("count:O", title="Count of reasons", sort="-y"),
        alt.Y("percentage:Q", title="Percentage of Users"),
        tooltip="percentage:Q",
    )
    .properties(width=170, height=400)
)

total_per_user.properties(title="All users") | n26_per_user.properties(
    title="N26 users"
) | auxmoney_per_user.properties(title="Auxmoney users") | younited_per_user.properties(
    title="Younited users"
)

# Future Recommendations

Since this was an exploratory analysis, the main recommendation is to continue exploring the topic, specifically to check whether there is any correlation between these purposes and credit scores or write-off rates. More specifically, we can look into:
- Average SCHUFA score per purpose (N26 and Auxmoney only)
- Percentage of write-offs per purpose
- Credit volume per purpose
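
As a starting point for the second item, the write-off percentage per purpose could be computed along these lines. This is only a sketch: the `is_written_off` column and the `write_off_rate_per_purpose` helper are hypothetical, and the loans are toy values, since no write-off data is part of this analysis.

```python
import pandas as pd


def write_off_rate_per_purpose(loans):
    """Percentage of loans written off, per purpose, highest first."""
    return (
        loans.groupby("purpose_name")["is_written_off"]
        .mean()
        .mul(100)
        .round(1)
        .sort_values(ascending=False)
    )


# Toy loans for illustration only; `is_written_off` is a hypothetical column.
toy_loans = pd.DataFrame(
    {
        "purpose_name": ["CAR", "CAR", "CAR", "CAR", "OVERDRAFT", "OVERDRAFT"],
        "is_written_off": [True, False, False, False, True, False],
    }
)
rates = write_off_rate_per_purpose(toy_loans)
print(rates)
```

The same grouping pattern would apply to an average score or a summed credit volume by swapping the aggregated column and the aggregation function.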