title: Backfilling Overdraft Decisions With Credit Score Provider
author: Helder Silva 
date: 2021-04-30 
region: EU
tags: overdraft, schufa, crif, experian, bank products, einsteinium, californium, lisbon, credit scores
summary: The aim of this deep dive is to create a model that shows which credit score providers led to which overdraft decision in Plutonium. You can find the output of this model in dbt.bp_credit_score_providers.

In [1]:
!pip install duckdb



<div class="alert alert-block alert-success">
    <H1>Backfilling Overdraft Decisions With Credit Score Provider</H1>
</div>

Before N26 users can get an arranged overdraft, they go through a set of checks that reduce the risk of this product. These checks are currently performed by the Einsteinium service, and the decision output is passed to our overdraft service, Plutonium. A large component of these checks is the credit score, which can come either from external providers (Schufa for Germany, CRIF for Austria, and Experian from the period when we had UK customers) or from our internal scoring service, Lisbon.

Unfortunately, as of the date of this deep dive (April 30th 2021), we have no way to identify exactly which Einsteinium decision is passed to Plutonium. This makes it difficult to pinpoint which provider led to a decision when more than one provider is evaluated for a user on a given day.

Therefore, the aim of this deep dive is to create a model that shows which credit score provider led to each overdraft decision in Plutonium (or our best estimate when there is more than one score on a given decision day).

In order to do that, we'll go through the following questions:
 - [How many Overdraft decisions were made through Einsteinium?](#section1)
 - [Do we have duplicate decisions for these users?](#section2)
 - [Do we have users that only have a bad/ unknown score at the time of the overdraft decision?](#section3)
 - [Do users with bad/ unknown scores still have an overdraft?](#section4)
 - [Do we have users that didn't go through the Plutonium granted process?](#section5)
 
The output of this deep dive was used to create the [dbt.bp_credit_score_providers model](https://dwh-documentation.tech26.de/#!/model/model.zurich.bp_credit_score_providers).

In [2]:
import pandas as pd
from utils.datalib_database import df_from_sql

import utils.altair_functions as af

import duckdb

con = duckdb.connect(database=":memory:", read_only=False)



In [3]:
query = """
-- Gets first OD Generated for each user
with pu_first_generated_od as (
select 
  user_id,
  min(created::date)  as od_first_created
from pu_overdraft_history
where status = 'GENERATED'
group by 1
),
-- Gets all users that had a 'GENERATED' overdraft and current status/ limit
all_generated_od as (
select 
  user_id, 
  user_created,
  pu.user_id is not null as has_granted_od,
  case when pu.user_id is null then generated_at -- If users didn't go through the normal overdraft granting process, use generated_at instead
    when generated_at < od_first_created then generated_at -- Users from before Plutonium have ODs generated before od_first_created, use generated_at instead
    else od_first_created end as od_first_created,
  status, 
  amount_cents::numeric/100 as amount_eur
from pu_overdraft
left join pu_first_generated_od pu using (user_id)
where status not in ('REQUESTED', 'NOT_GRANTED')
),
-- Gets Californium scores at the time of the Overdraft Decision (needed for cases where we don't have an einsteinium decision)
ca_scores as (
select 
  od.user_id, 
  provider,
  cc.rating,
  rev_timestamp,
  ascii(rating) as int_rating,
  case when int_rating <= 77 then 'GOOD' -- 77 corresponds to M
    when int_rating >= 78 then 'BAD'
    end as ca_score_type
from all_generated_od od
inner join private.ca_credit_score_aud cc
  on od.user_id = cc.user_id
  and od_first_created::date between cc.rev_timestamp::date and cc.end_timestamp::date
group by 1,2,3, 4, 5
),
-- Gets Einsteinium scores at the time of the Overdraft Decision
es_scores as (
select 
  user_id, 
  eligibility_calculated_at,
  cs.credit_score_provider, 
  cs.credit_score_score,
  -- Threshold for N26 good scores is 12, for Schufa is 14 (Rating M)
  case when credit_score_provider is null then null
    when credit_score_provider = 'N26' and credit_score_score::int <= 12 then 'GOOD'
    when credit_score_provider != 'N26' and credit_score_score::int <= 14 then 'GOOD'
    else 'BAD'
    end as es_score_type
from private.es_requested_credit_score cs
inner join private.es_credit_risk_requests rr 
  on rr.request_id  = cs.credit_risk_request_id 
  and rr.credit_score_search_purpose = 'OVERDRAFT'
inner join private.es_requested_eligibility re 
  on re.credit_risk_request_id = rr.request_id 
  and eligibility_result
)
--Joins all scores, if users have Einsteinium Score take that, else take Californium
select 
  od.user_id, 
  od.user_created, 
  od_first_created,
  has_granted_od,
  status, 
  amount_eur,
  to_char(od_first_created, 'YYYY-MM') as month,
  coalesce(eligibility_calculated_at, rev_timestamp) as decision_date,
  case when es.user_id is not null then 'Einsteinium' 
    when ca.user_id is not null then 'Californium' 
    else 'Unknown'
    end as score_service,
  coalesce(es.credit_score_provider, ca.provider) as provider, 
  coalesce(es.credit_score_score, ca.rating) as score, 
  coalesce(es_score_type, ca_score_type) as score_type,
  case when score_type = 'GOOD' then 1 else 2 end as score_type_order,
  case when score is not null then 1 else 2 end as null_score_order,
  --First take non-null scores, then Good Scores, if we still have dups, then take last score in day
  row_number() over (partition by od.user_id order by null_score_order, score_type_order, decision_date desc) as rn
from all_generated_od od
left join es_scores es
  on od.user_id = es.user_id
  and od.od_first_created::date = eligibility_calculated_at::date
left join ca_scores ca 
  on od.user_id = ca.user_id
  and es.user_id is null
group by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
"""

In [4]:
input_df = df_from_sql("redshiftreader", query)
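
The query above buckets every score as GOOD or BAD on two different scales: letter ratings from Californium (up to M is good) and numeric scores from Einsteinium (up to 12 for N26, up to 14 for external providers). The sketch below mirrors those thresholds in plain Python so they can be checked without warehouse access. The function names are illustrative and not part of any internal library, and treating an empty rating as unknown is an assumption.

```python
from typing import Optional


def ca_score_type(rating: Optional[str]) -> Optional[str]:
    """Classify a Californium letter rating, mirroring ascii(rating) <= 77 ('M')."""
    if not rating:
        # Assumption: a missing or empty rating is treated as unknown
        return None
    # SQL ascii() only looks at the first character of the string
    return "GOOD" if ord(rating[0]) <= ord("M") else "BAD"


def es_score_type(provider: Optional[str], score: Optional[str]) -> Optional[str]:
    """Classify an Einsteinium score: N26 <= 12 is good, other providers <= 14."""
    if provider is None:
        return None
    threshold = 12 if provider == "N26" else 14
    # A missing score falls through to BAD, like the ELSE branch in the SQL
    return "GOOD" if score is not None and int(score) <= threshold else "BAD"


print(ca_score_type("M"), ca_score_type("N"))
print(es_score_type("N26", "13"), es_score_type("SCHUFA", "14"))
```

Keeping the two scales in separate functions makes it harder to apply the N26 threshold to an external provider by mistake.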



<a id='section1'></a>
# How many Overdraft decisions were made through Einsteinium?

In [5]:
es_scores_query = """ 
select
left(month, 4) as year,
case when score_service = 'Einsteinium' then 'Has Einsteinium Score' else 'No Einsteinium Score' end as es_status,
count(distinct user_id) as users
from input_df 
where has_granted_od
group by 1, 2
order by 1, 2
"""

In [6]:
# First we register the table name to existing dataframe
con.register("input_df", input_df)

# Then we execute the query and store the output in a different df (we could store it in the same one, ofc)
es_scores_df = con.execute(es_scores_query).fetchdf()

In [7]:
es_scores_df[["perc_users"]] = es_scores_df[["users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
es_scores_df

Unnamed: 0,year,es_status,users,perc_users
0,2016,No Einsteinium Score,2,0.0
1,2017,No Einsteinium Score,370,0.2
2,2018,No Einsteinium Score,604,0.3
3,2019,No Einsteinium Score,139061,75.5
4,2020,Has Einsteinium Score,26745,14.5
5,2020,No Einsteinium Score,10597,5.8
6,2021,Has Einsteinium Score,6773,3.7
7,2021,No Einsteinium Score,2,0.0


Only 18% of all users that have a generated overdraft in Plutonium also have an Einsteinium score. This is expected, since we only fully adopted Einsteinium in March 2020. For users with no Einsteinium score, we will use Californium scores instead, because before Einsteinium the only check was based on the external credit scores provided by Californium.
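
The 18% figure can be reproduced from the table above by summing the users with an Einsteinium score and dividing by all users. A short sketch, using only the counts copied from that output (users are counted per year and status group, as in the query):

```python
# (year, es_status, users) rows copied from the es_scores_df output above
rows = [
    (2016, "No Einsteinium Score", 2),
    (2017, "No Einsteinium Score", 370),
    (2018, "No Einsteinium Score", 604),
    (2019, "No Einsteinium Score", 139061),
    (2020, "Has Einsteinium Score", 26745),
    (2020, "No Einsteinium Score", 10597),
    (2021, "Has Einsteinium Score", 6773),
    (2021, "No Einsteinium Score", 2),
]

total_users = sum(users for _, _, users in rows)
es_users = sum(
    users for _, status, users in rows if status == "Has Einsteinium Score"
)

# Share of users with an Einsteinium score, in percent
es_share = round(es_users / total_users * 100, 1)
print(es_users, total_users, es_share)
```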

In [8]:
all_scores_query = """
select
left(month, 4) as year,
score_service,
count(distinct user_id) as users
from input_df 
where has_granted_od
group by 1, 2
order by 2, 1
"""

In [9]:
all_scores_df = con.execute(all_scores_query).fetchdf()

In [10]:
all_scores_df[["perc_users"]] = all_scores_df[["users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
all_scores_df

Unnamed: 0,year,score_service,users,perc_users
0,2016,Californium,2,0.0
1,2017,Californium,306,0.2
2,2018,Californium,601,0.3
3,2019,Californium,138962,75.5
4,2020,Californium,10597,5.8
5,2021,Californium,2,0.0
6,2020,Einsteinium,26745,14.5
7,2021,Einsteinium,6773,3.7
8,2017,Unknown,64,0.0
9,2018,Unknown,3,0.0


<a id='section2'></a>
# Do we have duplicate decisions for these users?

In [11]:
dup_scores_query = """
with dups as (
select
user_id, 
case when provider is null then 1 end as null_scores, 
count(distinct score) as n_scores,
count(distinct provider) as n_providers
from input_df 
where has_granted_od
group by 1,2
)
select 
case when null_scores then 'No score'
when n_scores <= 1 then 'No duplicate scores'
else 'Has duplicate scores' end as score_type, 
case when null_scores then 'No score'
when n_providers <= 1 then 'No duplicate providers'
else 'Has duplicate providers' end as provider_type, 
count(*) as users
from dups
group by 1, 2
order by 1 desc, 2 desc
"""

In [12]:
dup_scores_df = con.execute(dup_scores_query).fetchdf()

In [13]:
dup_scores_df[["perc_users"]] = dup_scores_df[["users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
dup_scores_df

Unnamed: 0,score_type,provider_type,users,perc_users
0,No score,No score,168,0.1
1,No duplicate scores,No duplicate providers,172852,93.9
2,No duplicate scores,Has duplicate providers,16,0.0
3,Has duplicate scores,No duplicate providers,11040,6.0
4,Has duplicate scores,Has duplicate providers,80,0.0


We found duplicate scores or providers for about 6% of the users. Since some of these scores may be unfavorable (i.e., these users wouldn't get an overdraft based on those scores), let's try to remove those first.
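
The duplicate check boils down to counting distinct scores and distinct providers per user. The sketch below shows the idea on made-up rows; the user IDs and scores are hypothetical, and it simplifies the SQL slightly by producing exactly one label pair per user.

```python
# Hypothetical (user_id, provider, score) rows, one per score found on the decision day
rows = [
    ("u1", "SCHUFA", "C"),
    ("u2", "SCHUFA", "C"),
    ("u2", "SCHUFA", "F"),
    ("u3", "SCHUFA", "B"),
    ("u3", "CRIF", "B"),
    ("u4", None, None),
]

scores = {}
providers = {}
for user_id, provider, score in rows:
    user_scores = scores.setdefault(user_id, set())
    user_providers = providers.setdefault(user_id, set())
    # Like count(distinct ...) in SQL, missing values are not counted
    if score is not None:
        user_scores.add(score)
    if provider is not None:
        user_providers.add(provider)

labels = {}
for user_id in scores:
    if not providers[user_id]:
        labels[user_id] = ("No score", "No score")
        continue
    score_label = (
        "Has duplicate scores" if len(scores[user_id]) > 1 else "No duplicate scores"
    )
    provider_label = (
        "Has duplicate providers"
        if len(providers[user_id]) > 1
        else "No duplicate providers"
    )
    labels[user_id] = (score_label, provider_label)

for user_id, label in labels.items():
    print(user_id, label)
```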

In [14]:
good_scores_query = """
with dups as (
select
user_id, 
case when provider is null then 1 end as null_scores, 
count(distinct case when score_type = 'GOOD' then score end) as n_scores,
count(distinct provider) as n_providers
from input_df 
where has_granted_od
group by 1,2
)
select 
case when null_scores then 'No score'
when n_scores <= 1 then 'No duplicate scores'
else 'Has duplicate scores' end as score_type, 
case when null_scores then 'No score'
when n_providers <= 1 then 'No duplicate providers'
else 'Has duplicate providers' end as provider_type, 
count(*) as users
from dups
group by 1, 2
order by 1 desc, 2 desc
"""

In [15]:
good_scores_df = con.execute(good_scores_query).fetchdf()
good_scores_df[["perc_users"]] = good_scores_df[["users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
good_scores_df

Unnamed: 0,score_type,provider_type,users,perc_users
0,No score,No score,168,0.1
1,No duplicate scores,No duplicate providers,175242,95.2
2,No duplicate scores,Has duplicate providers,33,0.0
3,Has duplicate scores,No duplicate providers,8650,4.7
4,Has duplicate scores,Has duplicate providers,63,0.0


Removing these negative scores only decreased the percentage of users with duplicate scores from 6.0% to 4.7%.

**To complete the deduplication, we'll apply the following rules:**
- For Einsteinium decisions, we assume that if a user has several evaluations on the day of a granted overdraft, the last evaluation of that day is the one sent back to Plutonium.
- For Californium, since we can have scores from more than one provider, we also take the last score.
- We also add a no_duplicates flag that returns false for users who had more than one positive score (or more than one provider) on the day of the overdraft decision.
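
The ordering behind these rules (the `rn` column computed in the main query) can be sketched in plain Python: prefer non-null scores, then GOOD scores, then the latest decision of the day. The candidate rows below are hypothetical and only illustrate the tie-breaking.

```python
from datetime import datetime

# Hypothetical candidate rows for one user on the overdraft decision day
candidates = [
    {"provider": "SCHUFA", "score": "P", "score_type": "BAD",
     "decision_date": datetime(2020, 5, 4, 9, 0)},
    {"provider": "N26", "score": "7", "score_type": "GOOD",
     "decision_date": datetime(2020, 5, 4, 10, 0)},
    {"provider": "N26", "score": "9", "score_type": "GOOD",
     "decision_date": datetime(2020, 5, 4, 15, 30)},
    {"provider": None, "score": None, "score_type": None,
     "decision_date": datetime(2020, 5, 4, 18, 0)},
]


def rank_key(row):
    null_score_order = 1 if row["score"] is not None else 2
    score_type_order = 1 if row["score_type"] == "GOOD" else 2
    # Negate the timestamp so later decisions sort first (decision_date desc)
    return (null_score_order, score_type_order, -row["decision_date"].timestamp())


ranked = sorted(candidates, key=rank_key)
chosen = ranked[0]  # equivalent to filtering on rn = 1

# no_duplicates is false when there is more than one GOOD score or provider
good_scores = {r["score"] for r in candidates if r["score_type"] == "GOOD"}
providers = {r["provider"] for r in candidates if r["provider"] is not None}
no_duplicates = len(good_scores) <= 1 and len(providers) <= 1

print(chosen["provider"], chosen["score"], no_duplicates)
```

Here the later of the two GOOD N26 scores wins, and the user is still flagged as a duplicate case because the choice between the candidates is an estimate.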

In [16]:
final_query = """
with dups as (
select
user_id, 
count(distinct case when score_type = 'GOOD' then score end) as n_scores,
count(distinct provider) as n_providers
from input_df 
where provider is not null
and has_granted_od
group by 1
having n_providers > 1 or n_scores > 1
)
select 
dups.user_id is null as no_duplicates,
coalesce(provider, 'Unknown') as provider,
count(*) as n_users
from input_df
left join dups using (user_id)
where rn=1
and has_granted_od
group by 1, 2
order by 1, 2
"""

In [17]:
final_scores_df = con.execute(final_query).fetchdf()

In [18]:
final_scores_df[["perc_users"]] = final_scores_df[["n_users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
final_scores_df

Unnamed: 0,no_duplicates,provider,n_users,perc_users
0,False,CRIF,530,0.3
1,False,EXPERIAN,73,0.0
2,False,N26,172,0.1
3,False,SCHUFA,7971,4.3
4,True,CRIF,7213,3.9
5,True,EXPERIAN,93519,50.8
6,True,N26,12159,6.6
7,True,SCHUFA,62351,33.9
8,True,Unknown,166,0.1


In [19]:
all_decisions_query = """
select 
month as granted_od_month,
coalesce(provider, 'Unknown') as provider,
count(*) as n_users
from input_df
where rn=1
and has_granted_od
group by 1, 2
"""

In [20]:
all_decisions_df = con.execute(all_decisions_query).fetchdf()

Below you can find all granted overdrafts per provider over time.

In [21]:
af.line_multi(
    all_decisions_df, "provider:N", "granted_od_month:O", "n_users:Q", 800, 400, "x"
).properties(title="All Granted OD per Provider")

<a id='section3'></a>
# Do we have users that only have a bad/ unknown score at the time of the overdraft decision?

Approximately 28.6K of the users identified in the step above have only bad or unknown scores at the time of their overdraft decision (corresponding to 16% of all scores). The vast majority of these (69%) came through Experian, a provider for which we no longer have users since leaving the UK market.

In [22]:
final_bad_query = """
with repeated_scores as (
select user_id, 
count(distinct score) as n_scores,
count(distinct provider) as n_providers
from input_df
where has_granted_od
group by 1
)
select
n_scores =1 and n_providers = 1 as certainty_flag,
coalesce(score_type, 'Unknown') as score_type,
coalesce(provider, 'Unknown') as provider,
count(*) as all_rows, 
count(distinct user_id) as n_users
from input_df
inner join repeated_scores using (user_id)
where rn=1
and has_granted_od
and coalesce(score_type, 'Unknown') in ('BAD', 'Unknown')
and n_scores =1 and n_providers = 1
group by 1, 2, 3
order by 1, 2, 3
"""

In [23]:
final_bad_df = con.execute(final_bad_query).fetchdf()

In [24]:
final_bad_df[["perc_users"]] = final_bad_df[["n_users"]].apply(
    lambda x: round((x / x.sum()) * 100, 1), axis=0
)
final_bad_df

Unnamed: 0,certainty_flag,score_type,provider,all_rows,n_users,perc_users
0,True,BAD,CRIF,854,854,3.0
1,True,BAD,EXPERIAN,19782,19782,69.1
2,True,BAD,SCHUFA,8009,8009,28.0


### When did users with bad/ unknown scores get evaluated?

Most of the decisions where we only have negative scores for users happened between May and August 2017, around the time we migrated from ddb to Plutonium.

In [25]:
bad_scores_query = """
select 
month as granted_od_month,
coalesce(provider, 'Unknown') as provider,
coalesce(score_type, 'Unknown') as score_type,
count(*) as n_users
from input_df
where coalesce(score_type, 'Unknown') in ('BAD', 'Unknown')
and rn=1
and has_granted_od
group by 1, 2, 3
"""

In [26]:
bad_scores_df = con.execute(bad_scores_query).fetchdf()

In [27]:
bad_scores_chart = af.line_multi(
    bad_scores_df[bad_scores_df["score_type"] == "BAD"],
    "provider:N",
    "granted_od_month:O",
    "n_users:Q",
    600,
    400,
    "x",
)
unknown_scores_chart = af.line_multi(
    bad_scores_df[bad_scores_df["score_type"] == "Unknown"],
    "provider:N",
    "granted_od_month:O",
    "n_users:Q",
    150,
    400,
    "x",
)

bad_scores_chart.properties(
    title="Granted OD with Bad Scores"
) | unknown_scores_chart.properties(title="Granted OD with Unknown Scores")

<a id='section4'></a>
# Do users with bad/ unknown scores still have an overdraft?

Of the 28.6K users above, only 792 appear to have an enabled overdraft at the time of this deep dive.

In [28]:
currently_bad_query = """
select 
status, 
coalesce(provider, 'Unknown') as provider,
coalesce(score_type, 'Unknown') as score_type,
count(*) as n_users
from input_df
where rn = 1
and has_granted_od
and coalesce(score_type, 'Unknown') in ('BAD', 'Unknown')
group by 1, 2, 3
order by 1, 2, 3
"""

In [29]:
currently_bad_df = con.execute(currently_bad_query).fetchdf()

In [30]:
currently_bad_df

Unnamed: 0,status,provider,score_type,n_users
0,BLOCKED,CRIF,BAD,4
1,BLOCKED,SCHUFA,BAD,285
2,BLOCK_REQUESTED,SCHUFA,BAD,1
3,DISABLED,CRIF,BAD,818
4,DISABLED,EXPERIAN,BAD,19782
5,DISABLED,SCHUFA,BAD,6853
6,DISABLED,Unknown,Unknown,166
7,ENABLED,CRIF,BAD,30
8,ENABLED,SCHUFA,BAD,763
9,EXPIRED,SCHUFA,BAD,107


In [31]:
score_balance_query = """
with current_lisbon_score as (
with all_users as (
select
cmd.id::text as user_id, 
lsa.*, 
row_number() over (partition by user_created order by calculated_at desc) AS aud
from etl_reporting.ls_score_aud lsa
inner join cmd_users cmd using(user_created)
where purpose = 'OVERDRAFT'
and calculated_at::date <= current_date::date
and score_status = 'VALID'
)
select *
from all_users
where aud = 1
),
outstanding_balance as (
select
zu.user_id::text as user_id,
end_time,
outstanding_balance_eur,
(max_amount_cents / 100::float) as max_amount_eur,
row_number() over (partition by user_id order by end_time desc) as aud
from dbt.bp_overdraft_users bp
inner join dbt.zrh_users zu
using (user_created)
where timeframe = 'day'
and end_time::date >= (current_date - interval '1 day')::date
and has_overdraft_enabled 
)
select
ls.user_id,
ls.rating_class,
ls.calculated_at,
ob.end_time::date as balance_observation_date,
outstanding_balance_eur,
max_amount_eur,
product_id
from current_lisbon_score ls
inner join outstanding_balance ob using(user_id)
inner join dbt.zrh_users using(user_id)
"""

In [32]:
score_balance_df = df_from_sql("redshiftreader", score_balance_query)





In [33]:
current_bad_scores_query = """
select 
case when outstanding_balance_eur is not null then 'Using OD' else 'Not Using OD' end as od_usage, 
case when rating_class <= 12 then 'GOOD' 
when rating_class >= 13 then 'BAD' 
else 'Unknown' end as current_lisbon_score_type,
count(*) as n_users,
count(case when outstanding_balance_eur > max_amount_eur then 1 end) as users_in_arrears,
round(coalesce(avg(outstanding_balance_eur), 0), 2) as avg_negative_balance, 
round(coalesce(sum(outstanding_balance_eur), 0), 2) as sum_negative_balance 
from input_df
left join score_balance_df sb
using (user_id)
where rn=1
and has_granted_od
and status = 'ENABLED'
and coalesce(score_type, 'Unknown') in ('BAD', 'Unknown')
group by 1, 2
order by 1, 2
"""

In [34]:
# First we register the table name to existing dataframe
con.register("score_balance_df", score_balance_df)

# Then we execute the query and store the output in a different df (we could store it in the same one, ofc)
current_bad_scores_df = con.execute(current_bad_scores_query).fetchdf()

Of those 792 users, only 66 appear to have a bad or unknown Lisbon score at the time of this deep dive, and only 40 of these were using their overdraft at that point.

In [35]:
current_bad_scores_df

Unnamed: 0,od_usage,current_lisbon_score_type,n_users,users_in_arrears,avg_negative_balance,sum_negative_balance
0,Not Using OD,BAD,8,1,0.0,0.0
1,Not Using OD,GOOD,394,29,0.0,0.0
2,Not Using OD,Unknown,19,1,0.0,0.0
3,Using OD,BAD,40,4,3850.58,154023.16
4,Using OD,GOOD,332,24,2843.81,944143.77


### Do these users with bad scores have a Flex account?

We also checked whether having a Flex account would explain the users with a bad or unknown score above, but only 9 of those 66 users had this product at the time.

In [36]:
flex_bad_scores_query = """
select 
case when outstanding_balance_eur is not null then 'Using OD' else 'Not Using OD' end as od_usage, 
case when rating_class <= 12 then 'GOOD' 
when rating_class >= 13 then 'BAD' 
else 'Unknown' end as current_lisbon_score_type,
count(*) as n_users,
count(case when outstanding_balance_eur > max_amount_eur then 1 end) as users_in_arrears,
round(coalesce(avg(outstanding_balance_eur), 0), 2) as avg_negative_balance, 
round(coalesce(sum(outstanding_balance_eur), 0), 2) as sum_negative_balance 
from input_df
left join score_balance_df sb
using (user_id)
where rn=1
and has_granted_od
and status = 'ENABLED'
and coalesce(score_type, 'Unknown') in ('BAD', 'Unknown')
and product_id = 'FLEX_ACCOUNT_MONTHLY'
group by 1, 2
order by 1, 2
"""

In [37]:
flex_bad_scores_df = con.execute(flex_bad_scores_query).fetchdf()
flex_bad_scores_df

Unnamed: 0,od_usage,current_lisbon_score_type,n_users,users_in_arrears,avg_negative_balance,sum_negative_balance
0,Not Using OD,GOOD,27,0,0.0,0.0
1,Using OD,BAD,9,4,3932.33,35391.0
2,Using OD,GOOD,48,2,2940.47,141142.78


<a id='section5'></a>
# Do we have users that didn't go through the Plutonium granted process?

In [38]:
non_pu_scores_query = """
select 
month as granted_od_month,
coalesce(provider, 'Unknown') as provider,
coalesce(score_type, 'Unknown') as score_type,
count(*) as n_users
from input_df
where coalesce(score_type, 'Unknown') in ('BAD', 'Unknown')
and rn=1
and not has_granted_od
group by 1, 2, 3
"""

In [39]:
non_pu_scores_df = con.execute(non_pu_scores_query).fetchdf()

We have about 153K users that didn't go through the granted overdraft step in the Plutonium process as of the date of this deep dive. Most of these users came from ddb, the service we had before Plutonium. Since we have a substantial number of unknown scores for these users (48K), we're adding a has_granted_od flag that allows us to filter out the users we're not 100% sure about.
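
The `has_granted_od` flag and the overdraft date used for score matching both come from the `all_generated_od` CTE in the main query. A minimal sketch of that fallback logic in Python follows; the function name and the example dates are illustrative only.

```python
from datetime import date
from typing import Optional, Tuple


def resolve_od_date(
    generated_at: date, od_first_created: Optional[date]
) -> Tuple[bool, date]:
    """Return (has_granted_od, effective overdraft date) for one user."""
    # No GENERATED record in Plutonium history: the user skipped the normal flow
    has_granted_od = od_first_created is not None
    if not has_granted_od:
        return False, generated_at
    # Overdrafts from before Plutonium were generated before the first history record
    if generated_at < od_first_created:
        return True, generated_at
    return True, od_first_created


print(resolve_od_date(date(2017, 6, 1), None))
print(resolve_od_date(date(2017, 6, 1), date(2018, 1, 1)))
print(resolve_od_date(date(2020, 3, 2), date(2020, 3, 1)))
```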

In [40]:
bad_scores_chart = af.line_multi(
    non_pu_scores_df[non_pu_scores_df["score_type"] == "BAD"],
    "provider:N",
    "granted_od_month:O",
    "n_users:Q",
    600,
    400,
    "x",
)
unknown_scores_chart = af.line_multi(
    non_pu_scores_df[non_pu_scores_df["score_type"] == "Unknown"],
    "provider:N",
    "granted_od_month:O",
    "n_users:Q",
    150,
    400,
    "x",
)

bad_scores_chart.properties(
    title="OD without Plutonium Grant: Bad Scores"
) | unknown_scores_chart.properties(title="OD without Plutonium Grant: Unknown Scores")