# Looking at the data annotations for the recurrent matcher to understand what is 

## TL;DR

The model has two things that could be improved:

1. It has low recall.
    - At a high level, part of the recall problem seems to be that recurrent incomes have not been identified before the matcher runs. If no recurrent income is set up, there is nothing to match against.
    - There may also be issues with varying amounts: a recurrent income has been set up, but payments vary in amount and are not picked up. I am not sure this is the reason they are missed.
2. A high number of non-matches are due to 'delayed'.
    - Assuming every 'delayed' annotation is a match that would have been made had the matcher run at another time, delayed transactions make up 46% of all positive matches, which may indicate that the scheduling of the matcher could be improved.


Other comments

- The annotated data may need cleaning up: below there is an example of a Cleo user with 2 contradicting annotations for the same transaction, made within a minute of each other.
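
Contradictions of this kind could be flagged automatically. The sketch below is one possible check, not the pipeline's actual cleaning logic. It assumes two annotations contradict each other when they share `user_id` and `recurring_income_id`, were created at most a minute apart, and disagree on `original_matched_transaction_correct`. The function name, the one-minute window, and the toy frame (shortened ids, timestamps cut to the second) are illustrative assumptions.

```python
import pandas as pd


def find_contradicting_annotations(df, window="1min"):
    """Return rows whose label differs from the previous annotation of the
    same (user_id, recurring_income_id) made at most `window` earlier."""
    keys = ["user_id", "recurring_income_id"]
    df = df.sort_values(keys + ["created_at"])
    grouped = df.groupby(keys, sort=False)
    # Previous annotation's timestamp and label within each group
    prev_time = grouped["created_at"].shift(1)
    prev_label = grouped["original_matched_transaction_correct"].shift(1)
    close_in_time = (df["created_at"] - prev_time) <= pd.Timedelta(window)
    label_changed = prev_label.notna() & (
        df["original_matched_transaction_correct"] != prev_label
    )
    return df[close_in_time & label_changed]


# Toy data: shortened ids, assumed for illustration only
annotations = pd.DataFrame(
    {
        "user_id": [4398703, 4398703, 2363598],
        "recurring_income_id": ["8d702b01", "8d702b01", "91e3c3c9"],
        "original_matched_transaction_correct": [False, True, False],
        "created_at": pd.to_datetime(
            ["2024-05-31 09:50:33", "2024-05-31 09:51:19", "2024-05-31 09:56:29"]
        ),
    }
)

flagged = find_contradicting_annotations(annotations)
print(flagged)
```

Whether the earlier or the later annotation should win is a separate decision that this check does not make.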

In [35]:
import pandas as pd
from datetime import datetime
import boto3
from botocore.exceptions import ClientError
from io import StringIO
from cleodata.utils.secrets import get_secret
import json

import s3fs
from fastparquet import ParquetFile
from cleodata.sources.sync.sync import SyncDataSource
boto3.setup_default_session(profile_name='DataScientist-878877078763')
redshift_source = SyncDataSource("data_exploration", use_redshift=True, redshift_cluster="cleo-production-redshift", redshift_db="cleo")

2024-05-31 11:23:54 [debug    ] fetching credentials
2024-05-31 11:23:55 [info     ] Credentials acquired
2024-05-31 11:23:55 [info     ] Built connection pool


In [36]:
def read_from_s3(path):
    """Read parquet files and combine them into a single dataframe"""
    fs = s3fs.S3FileSystem()
    all_paths_from_s3 = fs.glob(path=f"{path}*.parquet")

    if len(all_paths_from_s3) > 0:
        # Handles one or many files. The former `elif len(...) == 1` branch
        # could never run because this condition already covers it.
        fp_obj = ParquetFile(
            all_paths_from_s3, open_with=fs.open
        )  # use s3fs as the filesystem
        data = fp_obj.to_pandas()
        return data
    else:
        print("Nothing found")
        print(f"paths from s3: {all_paths_from_s3}")
        return None
    
def read_csv_s3(bucket, key):
    try:
        s3 = boto3.client('s3')
        obj = s3.get_object(Bucket=bucket, Key=key)
        df = pd.read_csv(obj['Body'])
        return df
    except ClientError as ex:
        if ex.response['Error']['Code'] == 'NoSuchKey':
            print("Key doesn't match. Please check the key value entered.")


def list_s3_flies(base_path):
    fs = s3fs.core.S3FileSystem()
    all_paths_from_s3 = fs.glob(path=f"{base_path}*.parquet")
    return all_paths_from_s3


In [37]:
sql_counts_months = """ SELECT
    EXTRACT(YEAR FROM created_at) AS year,
    EXTRACT(MONTH FROM created_at) AS month,
    COUNT(*) AS row_count
FROM
    recurring_income_annotations
GROUP BY
    EXTRACT(YEAR FROM created_at),
    EXTRACT(MONTH FROM created_at)
ORDER BY
    year, month
    """

sql_counts_year = """ SELECT
    EXTRACT(YEAR FROM created_at) AS year,
    COUNT(*) AS row_count
FROM
    recurring_income_annotations
GROUP BY
    EXTRACT(YEAR FROM created_at)
ORDER BY
    year
    """

sql_data = """ SELECT *
FROM
    recurring_income_annotations
WHERE created_at > '2024-01-01'
ORDER BY
    created_at
    """

### How much annotated data is there?

In [38]:
df_data_months = redshift_source.fetch_data(sql_counts_months)
df_data_years = redshift_source.fetch_data(sql_counts_year)

In [39]:
df_data_months.sort_values(by=['year','month'], ascending=[False,True], inplace=True)
df_data_years.sort_values(by=['year'], ascending=[False], inplace=True)

df_data_months

Unnamed: 0,year,month,row_count
32,2024,1,9980
33,2024,2,8107
34,2024,3,8250
35,2024,4,7758
36,2024,5,8251
20,2023,1,15585
21,2023,2,11864
22,2023,3,10949
23,2023,4,10380
24,2023,5,10439


In [40]:
df_data_years

Unnamed: 0,year,row_count
3,2024,42346
2,2023,133148
1,2022,195544
0,2021,67624


In [41]:
df_data = redshift_source.fetch_data(sql_data)
df_data['recurring_income_snapshot_dict'] = df_data['recurring_income_snapshot'].apply(json.loads)
for x  in ['amount','frequency','last_received_at','next_payment_expected']:
    df_data['recurring_income_'+x] = df_data['recurring_income_snapshot_dict'].apply(lambda z: z[x])

df_data

Unnamed: 0,id,user_id,recurring_income_id,original_matched_transaction_id,original_matched_transaction_correct,new_matched_transaction_id,no_matched_transaction_reason,recurring_income_snapshot,created_at,updated_at,deleted_at,originating_response_id,recurring_income_snapshot_dict,recurring_income_amount,recurring_income_frequency,recurring_income_last_received_at,recurring_income_next_payment_expected
0,9e06a6e3-ff07-4aab-85df-7b0fbc21f28f,8294574,4d0b789d-7c13-4ebf-a023-a1f32ae5ac13,,True,,cancelled,"{""amount"": ""1197.3"", ""frequency"": ""fortnightly...",2024-01-01 00:05:19.365063,2024-01-01 00:05:32.516116,,3261480183,"{'amount': '1197.3', 'frequency': 'fortnightly...",1197.3,fortnightly,2023-12-14,2023-12-28
1,eefaeb60-4b70-4781-923c-e1d55900c821,4268487,d34e8e57-84af-4197-9580-ad67f836981f,,True,,delayed,"{""amount"": ""881.0"", ""frequency"": ""monthly"", ""l...",2024-01-01 00:07:29.894694,2024-01-01 00:07:36.144785,,3261491291,"{'amount': '881.0', 'frequency': 'monthly', 'l...",881.0,monthly,2023-11-29,2023-12-27
2,be4accdc-5077-4fd2-b24e-3cbeef445b1a,5558806,c11b01eb-a226-481f-b721-8e421ce658f4,,False,,tx_not_found,"{""amount"": ""491.67"", ""frequency"": ""weekly"", ""l...",2024-01-01 00:11:29.924851,2024-01-01 00:11:29.960064,,3261517316,"{'amount': '491.67', 'frequency': 'weekly', 'l...",491.67,weekly,2023-12-21,2023-12-28
3,1d4881cd-6fc5-43b7-a1bd-69ba1179f8ed,6176988,096b8146-de0f-441a-9b48-ee8c8a8602f8,,False,,no_tx_selected,"{""amount"": ""1172.39"", ""frequency"": ""fortnightl...",2024-01-01 00:24:24.740282,2024-01-01 00:24:32.828322,,3261586866,"{'amount': '1172.39', 'frequency': 'fortnightl...",1172.39,fortnightly,2023-12-20,2023-12-29
4,512cde1f-9452-4f29-b065-ec42890c65b5,6601830,0c76c83b-a347-4cae-a9f0-0b896c551bd7,,True,,,"{""amount"": ""650.3"", ""frequency"": ""weekly"", ""la...",2024-01-01 00:30:04.677027,2024-01-01 00:30:04.677027,,3261627735,"{'amount': '650.3', 'frequency': 'weekly', 'la...",650.3,weekly,2023-12-07,2023-12-21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42341,f5f47bf0-5e8f-4d06-a2df-770227c01725,9507564,1636e407-0021-440c-a338-7f4c76b19922,,True,,,"{""amount"": ""750.0"", ""frequency"": ""weekly"", ""la...",2024-05-31 09:48:44.735221,2024-05-31 09:48:44.735221,,3726926122,"{'amount': '750.0', 'frequency': 'weekly', 'la...",750.0,weekly,2024-05-06,2024-05-13
42342,c152f893-311b-433b-92d4-8018f440ae6c,4398703,8d702b01-464d-4d67-acec-0e1bc0d77c70,,False,,no_tx_selected,"{""amount"": ""675.08"", ""frequency"": ""weekly"", ""l...",2024-05-31 09:50:33.760253,2024-05-31 09:50:48.412595,,3726926516,"{'amount': '675.08', 'frequency': 'weekly', 'l...",675.08,weekly,2024-05-08,2024-05-29
42343,ca3abc14-5888-4c35-a48d-bd9928ac30c2,4398703,8d702b01-464d-4d67-acec-0e1bc0d77c70,,True,,delayed,"{""amount"": ""675.08"", ""frequency"": ""weekly"", ""l...",2024-05-31 09:51:19.985023,2024-05-31 09:51:34.079255,,3726927150,"{'amount': '675.08', 'frequency': 'weekly', 'l...",675.08,weekly,2024-05-08,2024-05-29
42344,23a4a0fd-238d-4ed7-b64c-42b2afb2f5f4,2363598,91e3c3c9-9774-4c16-94e1-2529d04844b9,,False,,no_tx_selected,"{""amount"": ""1501.31"", ""frequency"": ""monthly"", ...",2024-05-31 09:56:29.423276,2024-05-31 09:56:47.426939,,3726929518,"{'amount': '1501.31', 'frequency': 'monthly', ...",1501.31,monthly,2024-04-30,2024-05-29
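
The snapshot expansion above indexes each key with `z[x]`, which raises a `KeyError` if a snapshot lacks one of the fields. As a hedged alternative, `pd.json_normalize` can expand the parsed dictionaries in one pass and leave missing keys as `NaN`. In this sketch the frame `annotations` and its second, incomplete snapshot are made up for illustration, and the conversion of `amount` to a number is an extra step that the original cell does not perform.

```python
import json

import pandas as pd

# Toy annotations; the second snapshot deliberately lacks two keys (hypothetical)
annotations = pd.DataFrame(
    {
        "id": ["a", "b"],
        "recurring_income_snapshot": [
            '{"amount": "1197.3", "frequency": "fortnightly", '
            '"last_received_at": "2023-12-14", "next_payment_expected": "2023-12-28"}',
            '{"amount": "881.0", "frequency": "monthly"}',
        ],
    }
)

fields = ["amount", "frequency", "last_received_at", "next_payment_expected"]

# Parse each JSON string, then expand the dicts into columns
parsed = annotations["recurring_income_snapshot"].apply(json.loads).tolist()
snapshots = pd.json_normalize(parsed).reindex(columns=fields)
snapshots = snapshots.add_prefix("recurring_income_")
snapshots.index = annotations.index

expanded = annotations.join(snapshots)
# Amounts arrive as strings in the snapshot, so convert them for arithmetic
expanded["recurring_income_amount"] = pd.to_numeric(expanded["recurring_income_amount"])
print(expanded)
```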


In [42]:
n_rows = df_data.shape[0]

Let's find cases where a transaction was correctly matched to a recurring income (that is, the match is marked correct and the matched transaction id is non-null).

In [43]:
df_true_matches = df_data[(df_data['original_matched_transaction_correct']==True) & (~df_data['original_matched_transaction_id'].isnull())]
df_true_matches

Unnamed: 0,id,user_id,recurring_income_id,original_matched_transaction_id,original_matched_transaction_correct,new_matched_transaction_id,no_matched_transaction_reason,recurring_income_snapshot,created_at,updated_at,deleted_at,originating_response_id,recurring_income_snapshot_dict,recurring_income_amount,recurring_income_frequency,recurring_income_last_received_at,recurring_income_next_payment_expected
53,0650a783-61ff-4f96-9989-c9c7a7c01d86,1671679,5b93f1be-7ea9-451f-9c32-8c21e7583832,8475190438,True,,,"{""amount"": ""1711.0"", ""frequency"": ""semi_monthl...",2024-01-01 03:35:30.796961,2024-01-01 03:35:30.796961,,3259394370,"{'amount': '1711.0', 'frequency': 'semi_monthl...",1711.0,semi_monthly,2023-12-29,2024-01-12
86,89ab5304-365a-46e4-8bbe-7076972247e1,4017418,431e9413-dd23-4631-9712-b577eabe408e,8470650146,True,,,"{""amount"": ""1530.57"", ""frequency"": ""fortnightl...",2024-01-01 07:01:22.732706,2024-01-01 07:01:22.732706,,3260375210,"{'amount': '1530.57', 'frequency': 'fortnightl...",1530.57,fortnightly,2023-12-29,2024-01-12
95,dacf1ab5-cab1-4d33-82c0-19c1a744ee7b,1591356,dd7657eb-c138-4308-a76c-117ec9cc921b,8466521912,True,,,"{""amount"": ""3681.53"", ""frequency"": ""monthly"", ...",2024-01-01 07:55:00.085033,2024-01-01 07:55:00.085033,,3261762411,"{'amount': '3681.53', 'frequency': 'monthly', ...",3681.53,monthly,2023-12-29,2024-01-12
113,acf8904a-a8b2-4b42-9aec-454d38f2a9b8,7419796,51f9def4-c9cc-42fa-b1ea-52d21ec5561f,8480678050,True,,,"{""amount"": ""400.0"", ""frequency"": ""weekly"", ""la...",2024-01-01 10:32:04.369144,2024-01-01 10:32:04.369144,,3260030428,"{'amount': '400.0', 'frequency': 'weekly', 'la...",400.0,weekly,2023-12-31,2024-01-08
158,067b32de-681c-4dae-ab48-608cbe7880f7,2004267,b5891c75-0550-4b21-8baf-976fbb4eae00,8469956200,True,,,"{""amount"": ""553.83"", ""frequency"": ""monthly"", ""...",2024-01-01 14:33:10.990390,2024-01-01 14:33:10.990390,,3259389969,"{'amount': '553.83', 'frequency': 'monthly', '...",553.83,monthly,2023-12-29,2024-01-26
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42195,3ad5c61a-a965-499a-8481-95097e30aaad,2526523,84c24efd-5168-4cf1-9ba2-dc28ea3d1509,9961688618,True,,,"{""amount"": ""428.83"", ""frequency"": ""monthly"", ""...",2024-05-30 20:49:06.301793,2024-05-30 20:49:06.301793,,3724333120,"{'amount': '428.83', 'frequency': 'monthly', '...",428.83,monthly,2024-05-29,2024-06-27
42226,0fce521d-4668-4034-baaa-f63fecc3cc58,8774046,a2121f6d-4879-428a-8706-610a99564d47,9959605048,True,,,"{""amount"": ""1096.29"", ""frequency"": ""fortnightl...",2024-05-30 23:29:41.275032,2024-05-30 23:29:41.275032,,3725867613,"{'amount': '1096.29', 'frequency': 'fortnightl...",1096.29,fortnightly,2024-05-30,2024-06-13
42230,23ebe9d3-3083-47d7-952d-ad8134ffa414,7863486,ce19dc34-72e2-45db-a379-8eafa66d249c,9960181200,True,,,"{""amount"": ""602.47"", ""frequency"": ""weekly"", ""l...",2024-05-30 23:54:05.360260,2024-05-30 23:54:05.360260,,3725361799,"{'amount': '602.47', 'frequency': 'weekly', 'l...",602.47,weekly,2024-05-29,2024-06-12
42295,0cb40527-7e12-4ddc-a478-f119ffc7e9dd,5614747,f844946a-29bc-48a7-9ad4-704afd6ed3c1,9962905685,True,,,"{""amount"": ""3257.91"", ""frequency"": ""fortnightl...",2024-05-31 05:12:05.690817,2024-05-31 05:12:05.690817,,3725087599,"{'amount': '3257.91', 'frequency': 'fortnightl...",3257.91,fortnightly,2024-05-30,2024-06-13


In [44]:
unique_users_true_matches = df_true_matches['user_id'].nunique()
print(f"Users with true matches {unique_users_true_matches}")

Users with true matches 986
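
Building on the true-match filter above, the delayed share quoted in the TL;DR could be computed along the following lines. This is a sketch under an assumed definition, not necessarily the one used for the 46% figure: "positive matches" is taken to mean confirmed matches (correct and with a non-null `original_matched_transaction_id`) plus annotations whose `no_matched_transaction_reason` is `delayed`. The function name and the toy frame are illustrative.

```python
import pandas as pd


def delayed_share(df):
    """Share of 'delayed' annotations among confirmed matches plus delayed ones.

    Assumed definition: delayed / (confirmed matches + delayed).
    """
    confirmed = df["original_matched_transaction_correct"].eq(True) & df[
        "original_matched_transaction_id"
    ].notna()
    delayed = df["no_matched_transaction_reason"].eq("delayed")
    total = int(confirmed.sum()) + int(delayed.sum())
    return int(delayed.sum()) / total if total else float("nan")


# Toy data: one confirmed match, one delayed, one tx_not_found
toy = pd.DataFrame(
    {
        "original_matched_transaction_id": [8475190438, None, None],
        "original_matched_transaction_correct": [True, True, False],
        "no_matched_transaction_reason": [None, "delayed", "tx_not_found"],
    }
)

share = delayed_share(toy)
print(f"Delayed share: {share:.0%}")
```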


### In the recurring_incomes data (not the annotated data), how many recurring_ids per person?

In [45]:
sql_recurr_ids_per_user = """ select user_id, count (distinct  id) as num_recurr_ids
    from recurring_incomes 
    where deleted_at is NULL
group by user_id
order by num_recurr_ids desc """

df_num_recurr_ids = redshift_source.fetch_data(sql_recurr_ids_per_user)
# Set global option to prevent scientific notation
pd.options.display.float_format = '{:.0f}'.format
df_distr_num_recurr_ids = df_num_recurr_ids['num_recurr_ids'].describe([0.5, 0.75, 0.9, 0.95, 0.99, 0.999]).to_frame()
df_distr_num_recurr_ids.reset_index(drop=False, inplace=True)
df_distr_num_recurr_ids

Unnamed: 0,index,num_recurr_ids
0,count,1497780
1,mean,3
2,std,10
3,min,1
4,50%,1
5,75%,2
6,90%,4
7,95%,9
8,99%,37
9,99.9%,106


In [46]:
print("75% of users have 2 or fewer recurring incomes. 90% of users have 4 or fewer.")

75% of users have 2 or fewer recurring incomes. 90% of users have 4 or fewer.


### In the annotated data, how many recurring_income_ids do users have?

In [47]:
distinct_matches_per_user = df_true_matches.groupby('user_id')['recurring_income_id'].nunique().reset_index()
distinct_matches_per_user.sort_values(by='recurring_income_id', ascending=False, inplace=True)
distinct_matches_per_user

Unnamed: 0,user_id,recurring_income_id
49,1215599,2
901,8983177,2
800,8522482,2
708,7863486,2
572,6389135,2
...,...,...
335,3838122,1
336,3844515,1
337,3852572,1
338,3869241,1


In [48]:
distinct_matches_per_user.drop('user_id', axis=1).describe([0.9,0.95,0.99, 0.999])

Unnamed: 0,recurring_income_id
count,986
mean,1
std,0
min,1
50%,1
90%,1
95%,1
99%,1
99.9%,2
max,2


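Very few users have 2 recurring_income_ids correctly matched: at least 5 of the 986 (about 0.5%, from the rows shown above), and the 99th percentile still rounds to 1. No user has more than 2.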
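
The share of users per number of matched recurring_income_ids can also be computed directly instead of being read off the percentiles, which are rounded by the display format set earlier. A minimal sketch on toy data (the `matches` frame and its values are made up for illustration):

```python
import pandas as pd

# Toy matches: user 1 has two distinct recurring incomes, the others have one
matches = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 3, 4],
        "recurring_income_id": ["a", "b", "c", "d", "e"],
    }
)

ids_per_user = matches.groupby("user_id")["recurring_income_id"].nunique()
# Fraction of users for each distinct-id count
share_by_count = ids_per_user.value_counts(normalize=True).sort_index()
print(share_by_count)
```

Applied to `df_true_matches`, the entry for 2 would give the exact fraction of users with two matched recurring incomes.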

### Let's look at one user who has 2 recurring_income_ids in the annotated data

The annotated data for this user has 3 transactions all matched correctly

In [49]:
one_user_id = 1851268
df_data_one_user = df_data[df_data['user_id']==one_user_id].copy()
df_data_one_user['original_matched_transaction_id'] = df_data_one_user['original_matched_transaction_id'].astype(str)
df_data_one_user.reset_index(drop=True, inplace=True)
df_data_one_user[['id', 'user_id', 'recurring_income_id',
       'original_matched_transaction_id',
       'original_matched_transaction_correct', 
       'created_at', 'updated_at', 'recurring_income_amount',
       'recurring_income_frequency', 'recurring_income_last_received_at',
       'recurring_income_next_payment_expected']]


Unnamed: 0,id,user_id,recurring_income_id,original_matched_transaction_id,original_matched_transaction_correct,created_at,updated_at,recurring_income_amount,recurring_income_frequency,recurring_income_last_received_at,recurring_income_next_payment_expected
0,cee566e8-1bc8-4a30-a047-3b4bfea377f7,1851268,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,8505052806.0,True,2024-01-06 06:29:03.799505,2024-01-06 06:29:03.799505,764.99,fortnightly,2024-01-02,2024-01-30
1,b02d2a44-83ee-4baa-bd16-db15f779924d,1851268,e900544c-2453-49d2-9e93-9a2ec5f9c314,8992065986.0,True,2024-02-23 16:26:06.784394,2024-02-23 16:26:06.784394,689.06,monthly,2024-02-21,2024-02-27
2,794e3c8e-c247-4b50-94af-a8832a83dfba,1851268,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9648247488.0,True,2024-05-02 11:55:14.247678,2024-05-02 11:55:14.247678,764.99,fortnightly,2024-04-30,2024-05-14


We can look at these recurring transactions:

In [50]:
df_recurring_one_recurr_id = redshift_source.fetch_data(""" select *
from recurring_income_transactions
where recurring_income_id = '4c08ca1c-0e87-427d-9f82-3870cadc3bc9'""")
df_recurring_one_recurr_id

Unnamed: 0,id,recurring_income_id,transaction_id,paid_at,expected_paid_at,frequency,created_at,updated_at
0,7a2251e0-2480-4996-b50b-d2be5a56a396,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,3416076362,2021-10-06,2021-10-06,1,2021-10-07 05:53:05.262747,2021-10-07 05:53:05.262747
1,4aa7854d-d206-4e64-b391-99a7c3a48002,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,8505052806,2024-01-02,2023-10-31,1,2024-01-03 10:55:11.520008,2024-01-03 10:55:11.520008
2,fa5f5444-0097-4faf-a187-6f38cec751ae,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9648247488,2024-04-30,2024-05-16,1,2024-05-01 09:21:01.200948,2024-05-01 09:21:01.200948
3,8529d2b2-78b5-43c8-b530-19d7511c420b,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,3498960353,2021-11-03,2021-11-03,1,2021-11-04 09:30:02.029370,2021-11-04 09:30:02.029370
4,ba61320b-2cf4-4c6b-8038-ea3d6ebabd9a,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9536587941,2024-04-19,2024-05-16,1,2024-04-20 06:07:19.205076,2024-04-20 06:07:19.205076
5,0918d897-09ee-45fc-802b-0e2e0827cf7b,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,3687389357,2021-12-29,2021-12-29,1,2021-12-30 04:37:12.789282,2021-12-30 04:37:12.789282
6,70385141-04eb-4797-b1c9-e521c9b26eb4,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9277491072,2024-03-22,2024-04-19,1,2024-03-23 08:48:50.380147,2024-03-23 08:48:50.380147
7,69296ad8-46df-4043-aa16-1d98adf38805,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,3247422514,2021-08-11,2019-09-25,1,2021-08-12 10:44:56.135392,2021-08-12 10:44:56.135392
8,ca4b50b0-0924-4c7f-b355-c65d1a0bb1d1,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,3789317861,2022-01-26,2022-01-26,1,2022-01-26 13:33:13.691457,2022-01-26 13:33:13.691457
9,53546a89-3c10-45bd-b801-b624980922f2,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,3373022348,2021-09-22,2021-09-22,1,2021-09-23 11:43:56.559068,2021-09-23 11:43:56.559068


### The annotated transactions appear in recurring_income_transactions

In [51]:
# Cast back from str via float to int so the ids match the type of transaction_id
df_data_one_user['original_matched_transaction_id'] = df_data_one_user['original_matched_transaction_id'].astype(float)
df_data_one_user['original_matched_transaction_id'] = df_data_one_user['original_matched_transaction_id'].astype(int)
#recurring income transactions
df_recurring_one_recurr_id[df_recurring_one_recurr_id['transaction_id'].isin(df_data_one_user['original_matched_transaction_id'])]

Unnamed: 0,id,recurring_income_id,transaction_id,paid_at,expected_paid_at,frequency,created_at,updated_at
1,4aa7854d-d206-4e64-b391-99a7c3a48002,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,8505052806,2024-01-02,2023-10-31,1,2024-01-03 10:55:11.520008,2024-01-03 10:55:11.520008
2,fa5f5444-0097-4faf-a187-6f38cec751ae,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9648247488,2024-04-30,2024-05-16,1,2024-05-01 09:21:01.200948,2024-05-01 09:21:01.200948
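
The str, float, int round trip above works here because every id for this user is present. As a hedged alternative, pandas' nullable `Int64` dtype keeps missing ids as `<NA>` instead of failing on the final cast. The helper name `to_transaction_id` is an assumption made for this sketch.

```python
import pandas as pd


def to_transaction_id(series):
    """Convert ids stored as floats or strings (e.g. '8505052806.0') to
    nullable integers; unparseable or missing values become <NA>."""
    return pd.to_numeric(series, errors="coerce").astype("Int64")


# Ids as they look after astype(str) on a float column; 'nan' marks a missing id
raw = pd.Series(["8505052806.0", "9648247488.0", "nan"])
ids = to_transaction_id(raw)
print(ids)
```

The resulting series can be passed to `isin` against an integer `transaction_id` column; rows with `<NA>` simply do not match.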


### We can also see all the transactions associated to one of the recurring_income_ids

In [52]:
df_recurrent_income_ids  = redshift_source.fetch_data(f""" select *
from recurring_income_transactions
where recurring_income_id = '4c08ca1c-0e87-427d-9f82-3870cadc3bc9'
and expected_paid_at > '2023-01-01'""")
df_recurrent_income_ids.sort_values(by='created_at', ascending=True, inplace=True)
df_recurrent_income_ids.reset_index(drop=True, inplace=True)
df_recurrent_income_ids

Unnamed: 0,id,recurring_income_id,transaction_id,paid_at,expected_paid_at,frequency,created_at,updated_at
0,4aa7854d-d206-4e64-b391-99a7c3a48002,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,8505052806,2024-01-02,2023-10-31,1,2024-01-03 10:55:11.520008,2024-01-03 10:55:11.520008
1,ca3a0969-69db-4760-b7ca-5f3b645e427d,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9147398103,2024-03-08,2024-01-30,1,2024-03-09 09:15:08.369795,2024-03-09 09:15:08.369795
2,b47dd4c5-e82f-4f0c-9957-1d23d8484d6c,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9247684253,2024-03-19,2024-04-05,1,2024-03-20 07:11:30.572785,2024-03-20 07:11:30.572785
3,70385141-04eb-4797-b1c9-e521c9b26eb4,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9277491072,2024-03-22,2024-04-19,1,2024-03-23 08:48:50.380147,2024-03-23 08:48:50.380147
4,3c1550d6-5f2b-49bb-94b3-1eb3f8ae4e74,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9316067954,2024-03-25,2024-04-23,1,2024-03-27 19:44:53.197641,2024-03-27 19:44:53.197641
5,68d1eee3-3768-4bcd-9367-e6c867c68da6,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9378215575,2024-04-02,2024-04-23,1,2024-04-03 11:41:19.687279,2024-04-03 11:41:19.687279
6,826f7711-1755-414c-98be-f26cc658b471,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9508075207,2024-04-16,2024-04-30,1,2024-04-17 09:11:38.549337,2024-04-17 09:11:38.549337
7,780cd944-bd47-4e95-9d2e-87dedc365e74,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9516204498,2024-04-17,2024-05-14,1,2024-04-18 05:28:02.741521,2024-04-18 05:28:02.741521
8,43c7f7a2-18c3-4f9e-bc39-f6958ec6de09,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9526304076,2024-04-18,2024-05-15,1,2024-04-19 06:46:46.709236,2024-04-19 06:46:46.709236
9,ba61320b-2cf4-4c6b-8038-ea3d6ebabd9a,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9536587941,2024-04-19,2024-05-16,1,2024-04-20 06:07:19.205076,2024-04-20 06:07:19.205076


### We can look for these transactions in the transaction tables

In [53]:
# df_trans  = redshift_source.fetch_data(""" select * from transactions where id in  (8505052806, 9648247488)""")
# df_trans.columns


Only keep transactions from recurring_incomes that belong to the user_id of interest

In [54]:
#get transactions data
df_trans_one_user  = redshift_source.fetch_data(f""" select * from transactions where user_id = {one_user_id} and corrected_made_on > '2023-06-30' """)

df_trans_one_user

Unnamed: 0,id,account_id,category,currency_code,amount,description,made_on,duplicated,mode,created_at,...,marked_as_duplicate,transaction_category_id,bill_id,last_enriched_at,user_id,external_transaction_id,login_provider_additional_attributes,extra,recurring_income_id,is_excluded
0,8274179929,15673919,10000000,USD,-4.99,Credit Genie creditgeni *********** Fee,2023-12-08,,,2023-12-11 03:51:45.186109,...,False,18,,2023-12-11 03:51:46.614120,1851268,g6pe4mLANpc8Ora7KORJHgz59DKkZmsKbpA0v,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
1,8274179930,15673919,22009000,USD,-7.96,SDIEGO MARINERS MM/GAS237SAN DIEGO,2023-12-08,,,2023-12-11 03:51:45.186109,...,False,15,,2023-12-11 03:51:46.628466,1851268,9Md01Qp9Bduaw3JXjwpbHx8rPa9wDBtxrMyOX,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
2,8274179931,15673919,22009000,USD,-12.67,SDIEGO MARINERS MM/GAS237SAN DIEGO,2023-12-08,,,2023-12-11 03:51:45.186109,...,False,15,,2023-12-11 03:51:46.629395,1851268,7zDAL3egPDcrbkN7VbwOCbBK6pE5jLI5z9YdO,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
3,8274179932,15673919,22009000,USD,-6.08,SDIEGO MARINERS MM/GAS237SAN DIEGO,2023-12-07,,,2023-12-11 03:51:45.186109,...,False,15,,2023-12-11 03:51:46.630219,1851268,ML5Ym9zA75SBgXo4xg7VI5noxg7XvZupkNPJ0,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
4,8274179933,15673919,19047000,USD,0.20,Target,2023-12-07,,,2023-12-11 03:51:45.186109,...,False,11,,2023-12-11 03:51:46.634616,1851268,aZPpkrzbePSq8Rw3o8LdH5Kjn0zgPbu1aD4Q9,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2157,9960655686,15674939,22009000,USD,-7.54,SDIEGO MARINERS MM/GAS,2024-05-29,,,2024-05-30 06:07:02.576506,...,False,15,,2024-05-30 06:09:01.321999,1851268,QgRdoaRRwpixV5m4DV0Xh1P9b3n4j4h9ADbA4,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,False
2158,9960655687,15674939,22009000,USD,-14.06,SDIEGO MARINERS MM/GAS,2024-05-29,,,2024-05-30 06:07:02.576506,...,False,15,,2024-05-30 06:09:01.323418,1851268,EjRZoqRR1pHgRBxPqRkzF1M79jDnLnhdOk5OQ,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,False
2159,9960655688,15674939,18030000,USD,-147.50,USAA INSURANCE PAYMENT,2024-05-29,,,2024-05-30 06:07:02.576506,...,False,3,,2024-05-30 06:09:01.325084,1851268,5dL7zoLL5XIRv1rVqvAMhMaBNVPZjZtNr9dro,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,False
2160,9960655689,15674939,21006000,USD,-113.17,WITHU Debit,2024-05-29,,,2024-05-30 06:07:02.576506,...,False,18,,2024-05-30 06:09:01.345157,1851268,ojpg6wppBNH3xgK1ex0PH7gO5VoBRBcrRZkRe,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,False


### Verify that the recurring transactions are found in the transactions table with the recurring id

In [55]:

recurrent_transactions_one_user = pd.merge(df_recurrent_income_ids, df_trans_one_user, left_on='transaction_id', right_on='id', how='inner')

In [56]:
recurrent_transactions_one_user

Unnamed: 0,id_x,recurring_income_id_x,transaction_id,paid_at,expected_paid_at,frequency,created_at_x,updated_at_x,id_y,account_id,...,marked_as_duplicate,transaction_category_id,bill_id,last_enriched_at,user_id,external_transaction_id,login_provider_additional_attributes,extra,recurring_income_id_y,is_excluded
0,4aa7854d-d206-4e64-b391-99a7c3a48002,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,8505052806,2024-01-02,2023-10-31,1,2024-01-03 10:55:11.520008,2024-01-03 10:55:11.520008,8505052806,15674938,...,False,16,,2024-01-03 10:55:11.289161,1851268,y1pNkKpp3bH5LjJOVL8BuzLjpn0DrOIoOgXd3,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
1,ca3a0969-69db-4760-b7ca-5f3b645e427d,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9147398103,2024-03-08,2024-01-30,1,2024-03-09 09:15:08.369795,2024-03-09 09:15:08.369795,9147398103,15674938,...,False,2,,2024-03-09 09:15:08.114767,1851268,gBaDMraagOHE7pLYJ703F5V9gxd3vkUENRV6E4,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""refund""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
2,b47dd4c5-e82f-4f0c-9957-1d23d8484d6c,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9247684253,2024-03-19,2024-04-05,1,2024-03-20 07:11:30.572785,2024-03-20 07:11:30.572785,9247684253,15674938,...,False,16,,2024-03-20 07:11:29.936058,1851268,RgRwoxRR6Xib9Aya09O0CKqKY3ROXZS94PbPn,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
3,70385141-04eb-4797-b1c9-e521c9b26eb4,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9277491072,2024-03-22,2024-04-19,1,2024-03-23 08:48:50.380147,2024-03-23 08:48:50.380147,9277491072,15674939,...,False,16,,2024-03-23 08:48:50.164062,1851268,BAjZpDjjQXUmeaqnAeR6fYLVNjwLzkh7zjpbn,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
4,3c1550d6-5f2b-49bb-94b3-1eb3f8ae4e74,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9316067954,2024-03-25,2024-04-23,1,2024-03-27 19:44:53.197641,2024-03-27 19:44:53.197641,9316067954,15674939,...,False,16,,2024-03-27 19:44:52.750142,1851268,KgNEoANNOYiqN5Qk1N0dH36aMd5PEXUmB8Lqm,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
5,68d1eee3-3768-4bcd-9367-e6c867c68da6,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9378215575,2024-04-02,2024-04-23,1,2024-04-03 11:41:19.687279,2024-04-03 11:41:19.687279,9378215575,15674938,...,False,16,,2024-04-03 11:41:19.272006,1851268,kkpBXJppvOhQMgq5JMBZcwAxgvY7oBcbDmmgz,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
6,826f7711-1755-414c-98be-f26cc658b471,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9508075207,2024-04-16,2024-04-30,1,2024-04-17 09:11:38.549337,2024-04-17 09:11:38.549337,9508075207,15674938,...,False,2,,2024-04-17 09:11:38.240273,1851268,zPdNAkddvbUNDq8X7Dk6ho8e7Lk8DEIn5ygeA,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""refund""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
7,780cd944-bd47-4e95-9d2e-87dedc365e74,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9516204498,2024-04-17,2024-05-14,1,2024-04-18 05:28:02.741521,2024-04-18 05:28:02.741521,9516204498,15674938,...,False,16,,2024-04-18 05:28:02.466167,1851268,NmR5ojRRrpsqEX40bELYU8ZkQo3EAyU3P43V0,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
8,43c7f7a2-18c3-4f9e-bc39-f6958ec6de09,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9526304076,2024-04-18,2024-05-15,1,2024-04-19 06:46:46.709236,2024-04-19 06:46:46.709236,9526304076,15674939,...,False,16,,2024-04-19 06:46:46.190454,1851268,wLpNy3ppobFxw8aEPwL8HXXbxdxdrJIMbyvee,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
9,ba61320b-2cf4-4c6b-8038-ea3d6ebabd9a,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9536587941,2024-04-19,2024-05-16,1,2024-04-20 06:07:19.205076,2024-04-20 06:07:19.205076,9536587941,15674938,...,False,16,,2024-04-20 06:07:18.963758,1851268,rap01DppxbU6kgQXokdrcPz7NzwYwzIKaZrvj,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
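
An inner merge silently drops recurring transactions that have no counterpart in the transactions table, so it cannot show what is missing. One way to make the verification explicit is a left merge with `indicator=True`, plus `validate` to guard against duplicated keys. The sketch below uses toy frames; the third `transaction_id` (1) is a made-up id that is deliberately absent from the toy `transactions` frame.

```python
import pandas as pd

recurring = pd.DataFrame(
    {
        "transaction_id": [8505052806, 9147398103, 1],  # 1 is hypothetical
        "paid_at": ["2024-01-02", "2024-03-08", "2024-03-09"],
    }
)
transactions = pd.DataFrame(
    {
        "id": [8505052806, 9147398103],
        "amount": [26.99, 72.86],
    }
)

checked = recurring.merge(
    transactions,
    left_on="transaction_id",
    right_on="id",
    how="left",
    indicator=True,  # adds a _merge column: 'both' or 'left_only'
    validate="one_to_one",  # raises if either key column has duplicates
)
missing = checked[checked["_merge"] == "left_only"]
print(missing[["transaction_id", "paid_at"]])
```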


In [57]:
recurrent_transactions_one_user[['user_id','transaction_id','amount','description','corrected_made_on','recurring_income_id_x']]

Unnamed: 0,user_id,transaction_id,amount,description,corrected_made_on,recurring_income_id_x
0,1851268,8505052806,26.99,USAA FUNDS TRANSFER CR,2024-01-02,4c08ca1c-0e87-427d-9f82-3870cadc3bc9
1,1851268,9147398103,72.86,USAA FUNDS TRANSFER CR,2024-03-08,4c08ca1c-0e87-427d-9f82-3870cadc3bc9
2,1851268,9247684253,583.54,USAA FUNDS TRANSFER CR,2024-03-19,4c08ca1c-0e87-427d-9f82-3870cadc3bc9
3,1851268,9277491072,100.0,USAA FUNDS TRANSFER CR,2024-03-22,4c08ca1c-0e87-427d-9f82-3870cadc3bc9
4,1851268,9316067954,400.67,USAA FUNDS TRANSFER CR,2024-03-25,4c08ca1c-0e87-427d-9f82-3870cadc3bc9
5,1851268,9378215575,502.1,USAA FUNDS TRANSFER CR,2024-04-02,4c08ca1c-0e87-427d-9f82-3870cadc3bc9
6,1851268,9508075207,300.0,USAA FUNDS TRANSFER CR,2024-04-16,4c08ca1c-0e87-427d-9f82-3870cadc3bc9
7,1851268,9516204498,500.0,USAA FUNDS TRANSFER CR,2024-04-17,4c08ca1c-0e87-427d-9f82-3870cadc3bc9
8,1851268,9526304076,400.0,USAA FUNDS TRANSFER CR,2024-04-18,4c08ca1c-0e87-427d-9f82-3870cadc3bc9
9,1851268,9536587941,726.49,USAA FUNDS TRANSFER CR,2024-04-19,4c08ca1c-0e87-427d-9f82-3870cadc3bc9
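
The TL;DR raises varying amounts as a possible reason for missed matches. One way to quantify the variation for a single recurring income is to compare each linked transaction amount with the snapshot amount. The sketch below uses the ten amounts listed above and the snapshot amount of 764.99 from this user's annotation rows. The 10% tolerance is a hypothetical threshold chosen for illustration, not a rule the matcher is known to apply.

```python
import pandas as pd

snapshot_amount = 764.99  # recurring_income_amount from the annotation snapshot
amounts = pd.Series(
    [26.99, 72.86, 583.54, 100.0, 400.67, 502.1, 300.0, 500.0, 400.0, 726.49],
    name="amount",
)
tolerance = 0.10  # hypothetical threshold, for illustration only

# Absolute deviation from the snapshot amount, as a fraction of that amount
relative_deviation = (amounts - snapshot_amount).abs() / snapshot_amount
within_tolerance = relative_deviation <= tolerance

summary = pd.DataFrame(
    {
        "amount": amounts,
        "relative_deviation": relative_deviation.round(3),
        "within_tolerance": within_tolerance,
    }
)
print(summary)
```

On these ten rows only the last amount (726.49) falls within 10% of the snapshot amount.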


In [62]:
# df_trans[['id','user_id','made_on','corrected_made_on','updated_at','recurring_income_id']]

I don't understand... the information in the snapshot is the date of the transaction...

In [63]:
df_trans_one_user  = redshift_source.fetch_data(f""" select * from transactions where user_id = {one_user_id} and corrected_made_on >'2024-01-01'""")
df_trans_one_user.columns

Index(['id', 'account_id', 'category', 'currency_code', 'amount',
       'description', 'made_on', 'duplicated', 'mode', 'created_at',
       'updated_at', 'status', 'corrected_made_on', 'categorized_by_user',
       'uuid', 'marked_as_duplicate', 'transaction_category_id', 'bill_id',
       'last_enriched_at', 'user_id', 'external_transaction_id',
       'login_provider_additional_attributes', 'extra', 'recurring_income_id',
       'is_excluded'],
      dtype='object')

In [64]:
df_trans_one_user[df_trans_one_user['recurring_income_id']=='4c08ca1c-0e87-427d-9f82-3870cadc3bc9']

Unnamed: 0,id,account_id,category,currency_code,amount,description,made_on,duplicated,mode,created_at,...,marked_as_duplicate,transaction_category_id,bill_id,last_enriched_at,user_id,external_transaction_id,login_provider_additional_attributes,extra,recurring_income_id,is_excluded
17,8505052806,15674938,21005000,USD,26.99,USAA FUNDS TRANSFER CR,2024-01-02,,,2024-01-03 10:52:56.209095,...,False,16,,2024-01-03 10:55:11.289161,1851268,y1pNkKpp3bH5LjJOVL8BuzLjpn0DrOIoOgXd3,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
321,9147398103,15674938,21005000,USD,72.86,USAA FUNDS TRANSFER CR,2024-03-08,,,2024-03-09 09:13:16.744171,...,False,2,,2024-03-09 09:15:08.114767,1851268,gBaDMraagOHE7pLYJ703F5V9gxd3vkUENRV6E4,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""refund""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
366,9247684253,15674938,21005000,USD,583.54,USAA FUNDS TRANSFER CR,2024-03-19,,,2024-03-20 07:09:20.961325,...,False,16,,2024-03-20 07:11:29.936058,1851268,RgRwoxRR6Xib9Aya09O0CKqKY3ROXZS94PbPn,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
386,9277491072,15674939,21005000,USD,100.0,USAA FUNDS TRANSFER CR,2024-03-22,,,2024-03-23 08:47:11.519328,...,False,16,,2024-03-23 08:48:50.164062,1851268,BAjZpDjjQXUmeaqnAeR6fYLVNjwLzkh7zjpbn,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
409,9316067954,15674939,21005000,USD,400.67,USAA FUNDS TRANSFER CR,2024-03-25,,,2024-03-27 19:43:15.870825,...,False,16,,2024-03-27 19:44:52.750142,1851268,KgNEoANNOYiqN5Qk1N0dH36aMd5PEXUmB8Lqm,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
428,9378215575,15674938,21005000,USD,502.1,USAA FUNDS TRANSFER CR,2024-04-02,,,2024-04-03 11:40:33.385291,...,False,16,,2024-04-03 11:41:19.272006,1851268,kkpBXJppvOhQMgq5JMBZcwAxgvY7oBcbDmmgz,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
496,9508075207,15674938,21005000,USD,300.0,USAA FUNDS TRANSFER CR,2024-04-16,,,2024-04-17 09:09:29.688377,...,False,2,,2024-04-17 09:11:38.240273,1851268,zPdNAkddvbUNDq8X7Dk6ho8e7Lk8DEIn5ygeA,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""refund""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
509,9516204498,15674938,21005000,USD,500.0,USAA FUNDS TRANSFER CR,2024-04-17,,,2024-04-18 05:25:02.428877,...,False,16,,2024-04-18 05:28:02.466167,1851268,NmR5ojRRrpsqEX40bELYU8ZkQo3EAyU3P43V0,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
525,9526304076,15674939,21005000,USD,400.0,USAA FUNDS TRANSFER CR,2024-04-18,,,2024-04-19 06:45:26.588920,...,False,16,,2024-04-19 06:46:46.190454,1851268,wLpNy3ppobFxw8aEPwL8HXXbxdxdrJIMbyvee,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
526,9526304077,15674938,21005000,USD,300.0,USAA FUNDS TRANSFER CR,2024-04-18,,,2024-04-19 06:45:26.588920,...,False,16,,2024-04-19 06:46:46.194059,1851268,MgRQE0RRrpij4QwmV48QTrrzqQqQyXtkY5pkE,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,


In [65]:
recurrent_transactions_one_user

Unnamed: 0,id_x,recurring_income_id_x,transaction_id,paid_at,expected_paid_at,frequency,created_at_x,updated_at_x,id_y,account_id,...,marked_as_duplicate,transaction_category_id,bill_id,last_enriched_at,user_id,external_transaction_id,login_provider_additional_attributes,extra,recurring_income_id_y,is_excluded
0,4aa7854d-d206-4e64-b391-99a7c3a48002,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,8505052806,2024-01-02,2023-10-31,1,2024-01-03 10:55:11.520008,2024-01-03 10:55:11.520008,8505052806,15674938,...,False,16,,2024-01-03 10:55:11.289161,1851268,y1pNkKpp3bH5LjJOVL8BuzLjpn0DrOIoOgXd3,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
1,ca3a0969-69db-4760-b7ca-5f3b645e427d,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9147398103,2024-03-08,2024-01-30,1,2024-03-09 09:15:08.369795,2024-03-09 09:15:08.369795,9147398103,15674938,...,False,2,,2024-03-09 09:15:08.114767,1851268,gBaDMraagOHE7pLYJ703F5V9gxd3vkUENRV6E4,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""refund""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
2,b47dd4c5-e82f-4f0c-9957-1d23d8484d6c,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9247684253,2024-03-19,2024-04-05,1,2024-03-20 07:11:30.572785,2024-03-20 07:11:30.572785,9247684253,15674938,...,False,16,,2024-03-20 07:11:29.936058,1851268,RgRwoxRR6Xib9Aya09O0CKqKY3ROXZS94PbPn,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
3,70385141-04eb-4797-b1c9-e521c9b26eb4,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9277491072,2024-03-22,2024-04-19,1,2024-03-23 08:48:50.380147,2024-03-23 08:48:50.380147,9277491072,15674939,...,False,16,,2024-03-23 08:48:50.164062,1851268,BAjZpDjjQXUmeaqnAeR6fYLVNjwLzkh7zjpbn,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
4,3c1550d6-5f2b-49bb-94b3-1eb3f8ae4e74,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9316067954,2024-03-25,2024-04-23,1,2024-03-27 19:44:53.197641,2024-03-27 19:44:53.197641,9316067954,15674939,...,False,16,,2024-03-27 19:44:52.750142,1851268,KgNEoANNOYiqN5Qk1N0dH36aMd5PEXUmB8Lqm,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
5,68d1eee3-3768-4bcd-9367-e6c867c68da6,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9378215575,2024-04-02,2024-04-23,1,2024-04-03 11:41:19.687279,2024-04-03 11:41:19.687279,9378215575,15674938,...,False,16,,2024-04-03 11:41:19.272006,1851268,kkpBXJppvOhQMgq5JMBZcwAxgvY7oBcbDmmgz,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
6,826f7711-1755-414c-98be-f26cc658b471,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9508075207,2024-04-16,2024-04-30,1,2024-04-17 09:11:38.549337,2024-04-17 09:11:38.549337,9508075207,15674938,...,False,2,,2024-04-17 09:11:38.240273,1851268,zPdNAkddvbUNDq8X7Dk6ho8e7Lk8DEIn5ygeA,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""refund""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
7,780cd944-bd47-4e95-9d2e-87dedc365e74,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9516204498,2024-04-17,2024-05-14,1,2024-04-18 05:28:02.741521,2024-04-18 05:28:02.741521,9516204498,15674938,...,False,16,,2024-04-18 05:28:02.466167,1851268,NmR5ojRRrpsqEX40bELYU8ZkQo3EAyU3P43V0,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
8,43c7f7a2-18c3-4f9e-bc39-f6958ec6de09,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9526304076,2024-04-18,2024-05-15,1,2024-04-19 06:46:46.709236,2024-04-19 06:46:46.709236,9526304076,15674939,...,False,16,,2024-04-19 06:46:46.190454,1851268,wLpNy3ppobFxw8aEPwL8HXXbxdxdrJIMbyvee,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,
9,ba61320b-2cf4-4c6b-8038-ea3d6ebabd9a,4c08ca1c-0e87-427d-9f82-3870cadc3bc9,9536587941,2024-04-19,2024-05-16,1,2024-04-20 06:07:19.205076,2024-04-20 06:07:19.205076,9536587941,15674938,...,False,16,,2024-04-20 06:07:18.963758,1851268,rap01DppxbU6kgQXokdrcPz7NzwYwzIKaZrvj,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",4c08ca1c-0e87-427d-9f82-3870cadc3bc9,


# Count the different types of errors

Use the data model described [here](https://www.notion.so/meetcleo/Recurring-Income-Annotations-0746262cb28f4a5f8d2da7cf7cf3f63b)

In [66]:
df_data.head()

Unnamed: 0,id,user_id,recurring_income_id,original_matched_transaction_id,original_matched_transaction_correct,new_matched_transaction_id,no_matched_transaction_reason,recurring_income_snapshot,created_at,updated_at,deleted_at,originating_response_id,recurring_income_snapshot_dict,recurring_income_amount,recurring_income_frequency,recurring_income_last_received_at,recurring_income_next_payment_expected
0,9e06a6e3-ff07-4aab-85df-7b0fbc21f28f,8294574,4d0b789d-7c13-4ebf-a023-a1f32ae5ac13,,True,,cancelled,"{""amount"": ""1197.3"", ""frequency"": ""fortnightly...",2024-01-01 00:05:19.365063,2024-01-01 00:05:32.516116,,3261480183,"{'amount': '1197.3', 'frequency': 'fortnightly...",1197.3,fortnightly,2023-12-14,2023-12-28
1,eefaeb60-4b70-4781-923c-e1d55900c821,4268487,d34e8e57-84af-4197-9580-ad67f836981f,,True,,delayed,"{""amount"": ""881.0"", ""frequency"": ""monthly"", ""l...",2024-01-01 00:07:29.894694,2024-01-01 00:07:36.144785,,3261491291,"{'amount': '881.0', 'frequency': 'monthly', 'l...",881.0,monthly,2023-11-29,2023-12-27
2,be4accdc-5077-4fd2-b24e-3cbeef445b1a,5558806,c11b01eb-a226-481f-b721-8e421ce658f4,,False,,tx_not_found,"{""amount"": ""491.67"", ""frequency"": ""weekly"", ""l...",2024-01-01 00:11:29.924851,2024-01-01 00:11:29.960064,,3261517316,"{'amount': '491.67', 'frequency': 'weekly', 'l...",491.67,weekly,2023-12-21,2023-12-28
3,1d4881cd-6fc5-43b7-a1bd-69ba1179f8ed,6176988,096b8146-de0f-441a-9b48-ee8c8a8602f8,,False,,no_tx_selected,"{""amount"": ""1172.39"", ""frequency"": ""fortnightl...",2024-01-01 00:24:24.740282,2024-01-01 00:24:32.828322,,3261586866,"{'amount': '1172.39', 'frequency': 'fortnightl...",1172.39,fortnightly,2023-12-20,2023-12-29
4,512cde1f-9452-4f29-b065-ec42890c65b5,6601830,0c76c83b-a347-4cae-a9f0-0b896c551bd7,,True,,,"{""amount"": ""650.3"", ""frequency"": ""weekly"", ""la...",2024-01-01 00:30:04.677027,2024-01-01 00:30:04.677027,,3261627735,"{'amount': '650.3', 'frequency': 'weekly', 'la...",650.3,weekly,2023-12-07,2023-12-21


### True positives 

original_matched_transaction_correct = True, and original_matched_transaction_id != None

In [67]:

# The matcher proposed a transaction and the user confirmed it was correct.
true_positives_data = df_data[(df_data['original_matched_transaction_correct'] == True) & df_data['original_matched_transaction_id'].notnull()].copy()
true_positives_data.reset_index(drop=True, inplace=True)
true_positives = true_positives_data.shape[0]

### False positives

original_matched_transaction_correct = False, and original_matched_transaction_id != None


- matched to the wrong transaction, and the user can select a new transaction
- matched to the wrong transaction, but the true transaction is delayed, cancelled or replaced
- matched to the wrong transaction, but the user can't find the right transaction among the options shown

If the reason is delayed, cancelled or None, I don't think we missed anything, so these should be excluded and perhaps counted separately.
If the reason is tx_not_found or no_tx_selected, the user looked for the transaction but couldn't find it (I am not sure about this interpretation), so we count these as misses.
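A minimal sketch of the bucketing described above, in case we want to count these groups separately. The bucket names and the `bucket_reason` helper are hypothetical, and the grouping simply mirrors my (unconfirmed) interpretation of the reasons; the rows are toy data, not taken from `df_data`.

```python
import pandas as pd

# Assumed grouping of `no_matched_transaction_reason` values (not a confirmed rule).
NOT_A_MISS = {"delayed", "cancelled"}
MISS = {"tx_not_found", "no_tx_selected"}

def bucket_reason(reason):
    """Map a reason value to 'not_a_miss', 'miss' or 'other'."""
    if pd.isna(reason) or reason in NOT_A_MISS:
        return "not_a_miss"
    if reason in MISS:
        return "miss"
    return "other"

# Toy rows only; in the notebook this column would come from df_data.
toy = pd.DataFrame({
    "no_matched_transaction_reason": [
        "delayed", "cancelled", None, "tx_not_found", "no_tx_selected",
    ]
})
toy["reason_bucket"] = toy["no_matched_transaction_reason"].apply(bucket_reason)
bucket_counts = toy["reason_bucket"].value_counts().to_dict()
print(bucket_counts)
```

Any reason value outside the two sets falls into 'other', so unexpected values show up in the counts instead of being silently merged.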

In [68]:

# The matcher proposed a transaction and the user said it was wrong.
false_positives_data = df_data[(df_data['original_matched_transaction_correct'] == False) & df_data['original_matched_transaction_id'].notnull()].copy()
false_positives_data.reset_index(drop=True, inplace=True)
false_positives = false_positives_data.shape[0]

### False negatives

original_matched_transaction_correct = False, and original_matched_transaction_id = None 


- The matcher falsely says there is no match.
- The user can indicate which transaction it should have matched, or
- the user can say they did not see the right transaction among the options.




In [69]:
# The matcher proposed no transaction and the user said that was wrong.
false_negatives_data = df_data[(df_data['original_matched_transaction_correct'] == False) & df_data['original_matched_transaction_id'].isnull()]
false_negatives = false_negatives_data.shape[0]
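The cell definitions used in this section can also be expressed as one labelling function, which makes it easier to check that every annotation lands in exactly one cell. This is only a sketch: `confusion_cell` is a hypothetical helper and the rows are toy data. Note that it labels true negatives per annotation, which is not how the true-negative count is obtained in the next section.

```python
import pandas as pd

def confusion_cell(correct, matched_id):
    """Return 'TP', 'FP', 'FN' or 'TN' for one annotation row."""
    has_match = not pd.isna(matched_id)
    if has_match:
        return "TP" if correct else "FP"
    return "TN" if correct else "FN"

# Toy annotations: one row per cell.
toy = pd.DataFrame({
    "original_matched_transaction_correct": [True, False, False, True],
    "original_matched_transaction_id": [101, 102, None, None],
})
toy["cell"] = [
    confusion_cell(correct, matched_id)
    for correct, matched_id in zip(
        toy["original_matched_transaction_correct"],
        toy["original_matched_transaction_id"],
    )
]
cell_counts = toy["cell"].value_counts().to_dict()
print(cell_counts)
```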

### True negatives

original_matched_transaction_correct = True, and original_matched_transaction_id = None

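For these annotations, find all of the user's transactions made between 'recurring_income_last_received_at' and the annotation's 'created_at'. Each of these transactions is counted as one true negative.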

In [70]:
sql_trans_with_no_recurr_id_and_correct = """  
SELECT
    r.*, t.*
FROM
    recurring_income_annotations r
JOIN
    transactions t
ON
    r.user_id = t.user_id
WHERE
    r.original_matched_transaction_correct = True
    AND
    r.original_matched_transaction_id is NULL
    AND
    r.created_at>'2024-01-01'
    AND
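    -- NOTE: the snapshot keys parsed elsewhere in this notebook are "last_received_at" etc.
    -- (no "recurring_income_" prefix, and a space after the colon), so this delimiter
    -- may never match; verify that the lower bound is actually applied.
    t.corrected_made_on BETWEEN
        to_date(split_part(split_part(r.recurring_income_snapshot, '"recurring_income_last_received_at":"', 2), '"', 1), 'YYYY-MM-DD')
        AND r.created_at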

    LIMIT 1000
"""

sql_trans_with_no_recurr_id_and_correct_count = """  
SELECT
    COUNT(*)
FROM
    recurring_income_annotations r
JOIN
    transactions t
ON
    r.user_id = t.user_id
WHERE
    r.original_matched_transaction_correct = True
    AND
    r.original_matched_transaction_id is NULL
    AND
    r.created_at>'2024-01-01'
    AND
    t.corrected_made_on BETWEEN
        to_date(split_part(split_part(r.recurring_income_snapshot, '"recurring_income_last_received_at":"', 2), '"', 1), 'YYYY-MM-DD')
        AND r.created_at
"""

In [71]:
df_trans_no_recurr_ids_correct_sample = redshift_source.fetch_data(sql_trans_with_no_recurr_id_and_correct)
true_negatives = redshift_source.fetch_data(sql_trans_with_no_recurr_id_and_correct_count)['count'][0]
true_negatives

44149607

## Precision and Recall for each class

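No special weights are applied here. Note that the transactions are sampled when the data is collected. Also, true negatives are counted in transactions (one per joined row in the query above), whereas the other three counts are in annotations, so the negative-class metrics are not directly comparable with the positive-class ones.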


### Precision and recall of the positive class (recurrent transaction matching) and the negative class (not a recurrent transaction)

In [72]:
precision_1 = true_positives/(true_positives+false_positives)
recall_1 = true_positives/(true_positives + false_negatives)

precision_0 = true_negatives/(true_negatives+false_negatives)
recall_0 = true_negatives/(true_negatives + false_positives)

print(f"Positive class: precision: {precision_1} recall: {recall_1}")
print(f"Negative class: precision:  {precision_0} recall: {recall_0}")

print(f"true_positives {true_positives} false_positives {false_positives} ")
print(f"true_negatives {true_negatives} false_negatives {false_negatives} ")

Positive class: precision: 0.968503937007874 recall: 0.04913121629718394
Negative class: precision:  0.999461101170994 recall: 0.999999093990491
true_positives 1230 false_positives 40 
true_negatives 44149607 false_negatives 23805 
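Since the same four formulas are recomputed again further down, a small helper keeps them in one place. This is a sketch; `precision_recall` is a hypothetical name, and the counts are the ones printed above.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall of one class from raw counts (NaN if undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall

# Positive class, using the counts reported in the output above.
precision_pos, recall_pos = precision_recall(tp=1230, fp=40, fn=23805)

# Negative class: the roles swap, so true negatives act as "tp",
# false negatives as "fp" and false positives as "fn".
precision_neg, recall_neg = precision_recall(tp=44149607, fp=23805, fn=40)

print(f"Positive class: precision: {precision_pos} recall: {recall_pos}")
print(f"Negative class: precision: {precision_neg} recall: {recall_neg}")
```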


### False negatives 2

Here we count as false negatives only those annotations where new_matched_transaction_id is present, i.e. the user pointed to the transaction that should have been matched.

In [73]:
false_negatives_data2 = df_data[(df_data['original_matched_transaction_correct']==False)  & (df_data['original_matched_transaction_id'].isnull() )  & (~df_data['new_matched_transaction_id'].isnull() ) ]


false_negatives2 = false_negatives_data2.shape[0]

In [74]:
precision_1 = true_positives/(true_positives+false_positives)
recall_1 = true_positives/(true_positives + false_negatives2)

precision_0 = true_negatives/(true_negatives+false_negatives2)
recall_0 = true_negatives/(true_negatives + false_positives)

print(f"Positive class: precision: {precision_1} recall: {recall_1}")
print(f"Negative class: precision:  {precision_0} recall: {recall_0}")

print(f"true_positives {true_positives} false_positives {false_positives} ")
print(f"true_negatives {true_negatives} false_negatives2 {false_negatives2} ")

Positive class: precision: 0.968503937007874 recall: 0.1320734457210351
Negative class: precision:  0.9998169514754961 recall: 0.999999093990491
true_positives 1230 false_positives 40 
true_negatives 44149607 false_negatives2 8083 


### Separate case: the recurrent matcher may be asking for a match before it should

In [75]:
delayed_data = df_data[df_data['no_matched_transaction_reason'] == 'delayed']

delayed_data_count = delayed_data.shape[0]
delayed_data_count

11742

Rate of true matches with the wrong timing

rate of delayed tagging = marked as delayed / (true positives + false negatives)

In [76]:
print(f"Percentage of delayed matches   {delayed_data_count/(true_positives + false_negatives)}")

Percentage of delayed matches   0.46902336728579985


# Summary

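- There is a recall problem for the recurrent transactions: recall of the positive class is about 0.05 (0.13 when only annotations with a new_matched_transaction_id count as false negatives), meaning we have a lot of false negatives.
- About 47% of (true positives + false negatives) are marked as delayed, which probably means the match would have happened had the timing been better.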

# Debug some users or transactions

In [77]:
#cleo is expecting a transaction that has not come in
debug_user_ids = ['9474972','9573961', '5463590']

In [78]:
debug_user_id = debug_user_ids[2]

This user was asked "where the money had gone" on **May 28**.

As can be seen, the payment was expected on the 26th and hadn't come in yet by the 28th.

ACTIONS: The user's annotations contain 2 rows for the same recurring income, created within a minute of each other --> **this is something we need to get rid of**. The row to keep is probably the second one, which specifies that the payment is delayed.

In [79]:
sql_annotations_one_user = f""" select *
from recurring_income_annotations
where  user_id = {debug_user_id}
"""

df_annot_one_user = redshift_source.fetch_data(sql_annotations_one_user)
df_annot_one_user['recurring_income_snapshot_dict'] = df_annot_one_user['recurring_income_snapshot'].apply(json.loads)
for x  in ['amount','frequency','last_received_at','next_payment_expected']:
    df_annot_one_user['recurring_income_'+x] = df_annot_one_user['recurring_income_snapshot_dict'].apply(lambda z: z[x])

df_annot_one_user

Unnamed: 0,id,user_id,recurring_income_id,original_matched_transaction_id,original_matched_transaction_correct,new_matched_transaction_id,no_matched_transaction_reason,recurring_income_snapshot,created_at,updated_at,deleted_at,originating_response_id,recurring_income_snapshot_dict,recurring_income_amount,recurring_income_frequency,recurring_income_last_received_at,recurring_income_next_payment_expected
0,3fec7bcf-0142-4295-94f3-e992c0e1e2d0,5463590,1cdae4c1-1e41-4a36-90ce-20931c037c33,,False,,,"{""amount"": ""6232.54"", ""frequency"": ""monthly"", ...",2024-05-28 12:54:47.251183,2024-05-28 12:54:47.251183,,3717393426,"{'amount': '6232.54', 'frequency': 'monthly', ...",6232.54,monthly,2024-04-26,2024-05-26
1,97783eb9-5472-48fe-b5fc-dd687671ba7c,5463590,1cdae4c1-1e41-4a36-90ce-20931c037c33,,True,,delayed,"{""amount"": ""6232.54"", ""frequency"": ""monthly"", ...",2024-05-28 12:55:12.503001,2024-05-28 12:55:16.261497,,3717393575,"{'amount': '6232.54', 'frequency': 'monthly', ...",6232.54,monthly,2024-04-26,2024-05-26
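One possible way to clean up contradictory rows like the two above is to keep only the later annotation when the same user and recurring income are annotated again shortly afterwards. This is a sketch under assumptions: `drop_superseded_annotations` is a hypothetical helper, the 5-minute window is an arbitrary choice, and the ids in the toy frame are shortened.

```python
import pandas as pd

def drop_superseded_annotations(df, window="5min"):
    """Drop a row when the same (user_id, recurring_income_id) is annotated
    again within `window`; the later row is kept."""
    df = df.sort_values("created_at")
    keys = ["user_id", "recurring_income_id"]
    # Timestamp of the next annotation in the same group (NaT for the last one).
    next_created = df.groupby(keys)["created_at"].shift(-1)
    superseded = (next_created - df["created_at"]) <= pd.Timedelta(window)
    return df[~superseded]

# Toy data modelled on the two rows above, plus an unrelated row.
toy = pd.DataFrame({
    "id": ["3fec7bcf", "97783eb9", "other-row"],
    "user_id": [5463590, 5463590, 1],
    "recurring_income_id": ["1cdae4c1", "1cdae4c1", "x"],
    "created_at": pd.to_datetime([
        "2024-05-28 12:54:47", "2024-05-28 12:55:12", "2024-05-28 12:55:00",
    ]),
})
kept = drop_superseded_annotations(toy)
print(kept["id"].tolist())
```

Keeping the later row assumes the user's second answer is the considered one, which matches the example here but has not been checked more widely.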


Let's look at all the transactions for this user

In [80]:

sql_recurr_trx_debug = f""" with recurr_one_user as (select *
    from recurring_incomes
where user_id = {debug_user_id})

select *
from recurring_income_transactions rit
join recurr_one_user
on recurr_one_user.id = rit.recurring_income_id """

df_data_recurr_tx = redshift_source.fetch_data(sql_recurr_trx_debug)
df_data_recurr_tx

Unnamed: 0,id,recurring_income_id,transaction_id,paid_at,expected_paid_at,frequency,created_at,updated_at,id.1,user_id,...,created_at.1,updated_at.1,source,schedule_classification,manual,payer,deleted_at,external_candidate_id,cancelled_at,cancelled_reason
0,55b8dc8f-dba8-48e5-8df9-17b46f6d24c0,1cdae4c1-1e41-4a36-90ce-20931c037c33,9944690909,2024-05-28,2024-05-28,0,2024-05-29 03:23:59.043315,2024-05-29 03:23:59.043315,1cdae4c1-1e41-4a36-90ce-20931c037c33,5463590,...,2022-06-07 09:17:48.956332,2024-05-29 03:23:59.049522,new_income_flow,,,,,,,
1,15b45b01-e09c-4362-9f5f-096db01b7e8b,1cdae4c1-1e41-4a36-90ce-20931c037c33,9861895887,2024-04-26,2022-08-26,0,2024-05-21 11:43:07.665687,2024-05-21 11:43:07.665687,1cdae4c1-1e41-4a36-90ce-20931c037c33,5463590,...,2022-06-07 09:17:48.956332,2024-05-29 03:23:59.049522,new_income_flow,,,,,,,
2,82376863-e1e7-45d6-94a8-2d3c7e3902fb,1cdae4c1-1e41-4a36-90ce-20931c037c33,4621710623,2022-07-28,2022-07-28,0,2022-07-28 05:27:32.377402,2022-07-28 05:27:32.377402,1cdae4c1-1e41-4a36-90ce-20931c037c33,5463590,...,2022-06-07 09:17:48.956332,2024-05-29 03:23:59.049522,new_income_flow,,,,,,,
3,8b81e6e7-f16b-423f-9abb-b9153f02d227,1cdae4c1-1e41-4a36-90ce-20931c037c33,4464173803,2022-06-28,2022-06-28,0,2022-06-28 04:21:54.804682,2022-06-28 04:21:54.804682,1cdae4c1-1e41-4a36-90ce-20931c037c33,5463590,...,2022-06-07 09:17:48.956332,2024-05-29 03:23:59.049522,new_income_flow,,,,,,,


For this user, it appears the timing was off. The previous payment was received on the 26th (Friday 26 April), so the next one was expected on 26 May, which fell on a weekend (a Sunday); the payment actually arrived on 28 May.

ACTIONS
* It seems that if the previous payday is a Friday, we have to allow a range of 3 days (Friday, Saturday or Sunday) for the expected pay date or, if there is more than one previous transaction, use the most common day of the month.
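The weekend idea in the action above could be sketched as a window around the expected date instead of a single day. Everything here is an assumption for illustration: `expected_pay_window` is a hypothetical helper, and the `slack_days` allowance for posting delays is not something the matcher is known to use.

```python
from datetime import date, timedelta

def expected_pay_window(expected, slack_days=2):
    """Return (earliest, latest) acceptable arrival dates for a payment.

    If the expected date falls on a weekend, the window starts on the
    preceding Friday and runs to the following Monday; `slack_days` is
    then added to the end of the window.
    """
    weekday = expected.weekday()  # Monday=0 ... Sunday=6
    if weekday == 5:  # Saturday
        earliest = expected - timedelta(days=1)
        latest = expected + timedelta(days=2)
    elif weekday == 6:  # Sunday
        earliest = expected - timedelta(days=2)
        latest = expected + timedelta(days=1)
    else:
        earliest = latest = expected
    return earliest, latest + timedelta(days=slack_days)

# The debug user: expected on Sunday 2024-05-26, paid on 2024-05-28.
earliest, latest = expected_pay_window(date(2024, 5, 26))
paid_in_window = earliest <= date(2024, 5, 28) <= latest
print(earliest, latest, paid_in_window)
```

With a window like this, the user would not have been asked about the payment until the window had closed.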

# Deep dive into recall 

In [81]:
false_negatives_data2

Unnamed: 0,id,user_id,recurring_income_id,original_matched_transaction_id,original_matched_transaction_correct,new_matched_transaction_id,no_matched_transaction_reason,recurring_income_snapshot,created_at,updated_at,deleted_at,originating_response_id,recurring_income_snapshot_dict,recurring_income_amount,recurring_income_frequency,recurring_income_last_received_at,recurring_income_next_payment_expected
12,488888e1-b7ae-45d0-ab28-153361a680f0,6649936,ca056fc3-1484-4b5a-92d2-ac8e9cf1a456,,False,8486566427,,"{""amount"": ""962.98"", ""frequency"": ""fortnightly...",2024-01-01 01:06:52.383231,2024-01-01 01:07:05.372498,,3261781977,"{'amount': '962.98', 'frequency': 'fortnightly...",962.98,fortnightly,2023-12-13,2023-12-27
15,a50b736a-6d6f-4009-954d-ce5ef82b3930,7785305,b2874dd8-0a22-46af-807a-6feac01d2db9,,False,8389495128,,"{""amount"": ""275.5"", ""frequency"": ""weekly"", ""la...",2024-01-01 01:21:26.804790,2024-01-01 01:21:34.698709,,3261802644,"{'amount': '275.5', 'frequency': 'weekly', 'la...",275.5,weekly,2023-12-15,2023-12-22
20,413c1306-5357-42d4-a72a-dd43b26977de,7076098,aa3eea3e-e39e-4f80-b59b-06c41da92da2,,False,8444667461,,"{""amount"": ""685.67"", ""frequency"": ""weekly"", ""l...",2024-01-01 01:42:49.718000,2024-01-01 01:42:59.231836,,3261843804,"{'amount': '685.67', 'frequency': 'weekly', 'l...",685.67,weekly,2023-12-22,2023-12-29
21,1528ec90-0491-41ed-bcde-4a61a90621c5,8416212,5949b1b3-711d-4e7c-8ff7-4aad473744e0,,False,8468593965,,"{""amount"": ""4540.31"", ""frequency"": ""monthly"", ...",2024-01-01 01:45:42.246468,2024-01-01 01:45:53.054392,,3261850443,"{'amount': '4540.31', 'frequency': 'monthly', ...",4540.31,monthly,2023-12-01,2023-12-29
22,d90729c3-07d8-4a83-9f62-05e7ac4eec08,8592519,3478f4c8-1b4a-439e-8fda-ab1e2d8b21ad,,False,8468459174,,"{""amount"": ""362.74"", ""frequency"": ""monthly"", ""...",2024-01-01 01:49:28.224574,2024-01-01 01:49:44.432317,,3261851762,"{'amount': '362.74', 'frequency': 'monthly', '...",362.74,monthly,2023-12-28,2023-12-28
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42324,bb1deaf6-8fad-4b31-9809-3316f6a7cb4c,9285711,b796c6eb-a019-465a-a60a-0cf57ac9a2b9,,False,9894623122,,"{""amount"": ""946.58"", ""frequency"": ""weekly"", ""l...",2024-05-31 07:24:35.994606,2024-05-31 07:24:43.301335,,3726846065,"{'amount': '946.58', 'frequency': 'weekly', 'l...",946.58,weekly,2024-05-15,2024-05-22
42325,fe3149cc-63b2-4011-8c80-c95d25a76fa0,8458723,f7005993-b33d-4356-8b7e-5ac220a47626,,False,9818834601,,"{""amount"": ""1875.0"", ""frequency"": ""fortnightly...",2024-05-31 07:45:11.623153,2024-05-31 07:45:31.394570,,3726851781,"{'amount': '1875.0', 'frequency': 'fortnightly...",1875.0,fortnightly,2024-05-15,2024-05-29
42330,6b26bf9c-a911-4f5d-8873-60516743f66b,7753696,7310144c-2d80-4042-82f5-420e7c74ca62,,False,9810927649,,"{""amount"": ""962.33"", ""frequency"": ""weekly"", ""l...",2024-05-31 08:42:14.396038,2024-05-31 08:42:22.351929,,3726887852,"{'amount': '962.33', 'frequency': 'weekly', 'l...",962.33,weekly,2024-04-30,2024-05-07
42333,5fe62524-8c1c-45f7-aef2-c4628025b208,9629455,5bbdae6c-03c4-47a0-933a-f28fd0337663,,False,9885887281,,"{""amount"": ""568.47"", ""frequency"": ""weekly"", ""l...",2024-05-31 09:01:04.390238,2024-05-31 09:01:18.509276,,3726907285,"{'amount': '568.47', 'frequency': 'weekly', 'l...",568.47,weekly,2024-05-08,2024-05-29


# Missed matches: number of missed transactions per recurring_income_id

In [82]:
df_missed_recurr_ids_counts =  false_negatives_data2.groupby('recurring_income_id')['id'].count().to_frame().sort_values(by='id', ascending=False)
df_missed_recurr_ids_counts.reset_index(drop=False, inplace=True)
df_missed_recurr_ids_counts

Unnamed: 0,recurring_income_id,id
0,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,11
1,4e2e5072-8976-4c0e-b21a-0c36ab63f414,8
2,fe60cf39-53a2-40c7-b2ba-8cf8e7f232de,7
3,ce65fa76-b61a-41fd-94b0-92f8f9d6e8ec,7
4,6ae64e79-5567-4c91-a894-5bc9cc638a9e,6
...,...,...
7308,58d5b862-5743-48c8-86b3-2256a2c35f85,1
7309,58d2e31f-7802-402d-90a1-05b28de841fd,1
7310,58cce058-461c-4780-b2f3-16446465a9a0,1
7311,58bdd4a6-4d71-4f38-86d3-fe1b17717620,1


In [83]:
false_negatives_data2[false_negatives_data2['recurring_income_id'] =='22e0da47-ab9a-4bea-bbe8-6abcdf270a7a']

Unnamed: 0,id,user_id,recurring_income_id,original_matched_transaction_id,original_matched_transaction_correct,new_matched_transaction_id,no_matched_transaction_reason,recurring_income_snapshot,created_at,updated_at,deleted_at,originating_response_id,recurring_income_snapshot_dict,recurring_income_amount,recurring_income_frequency,recurring_income_last_received_at,recurring_income_next_payment_expected
4239,9b86a426-74fe-40ad-a866-1657a22ab4a0,4697263,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,,False,8590126606,,"{""amount"": ""250.0"", ""frequency"": ""weekly"", ""la...",2024-01-13 20:15:25.104675,2024-01-13 20:15:30.967511,,3302501258,"{'amount': '250.0', 'frequency': 'weekly', 'la...",250.0,weekly,2024-01-04,2024-01-11
6687,49e08781-b19a-4c60-b0e5-a6824f9b15bf,4697263,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,,False,8656369025,,"{""amount"": ""250.0"", ""frequency"": ""weekly"", ""la...",2024-01-22 12:02:09.278476,2024-01-22 12:02:14.591111,,3329804649,"{'amount': '250.0', 'frequency': 'weekly', 'la...",250.0,weekly,2024-01-11,2024-01-18
8460,9e41274f-57b5-4b39-a91d-9b9b8680b7c4,4697263,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,,False,8715474644,,"{""amount"": ""250.0"", ""frequency"": ""weekly"", ""la...",2024-01-28 01:03:30.112215,2024-01-28 01:03:47.013701,,3347980640,"{'amount': '250.0', 'frequency': 'weekly', 'la...",250.0,weekly,2024-01-18,2024-01-25
11989,b8ac1933-39b6-4f97-8666-e04303024562,4697263,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,,False,8792807021,,"{""amount"": ""250.0"", ""frequency"": ""weekly"", ""la...",2024-02-07 13:30:56.619166,2024-02-07 13:31:04.145147,,3384421216,"{'amount': '250.0', 'frequency': 'weekly', 'la...",250.0,weekly,2024-01-25,2024-02-01
13946,089add58-a77a-4189-a495-dff777b370c9,4697263,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,,False,8861621130,,"{""amount"": ""250.0"", ""frequency"": ""weekly"", ""la...",2024-02-14 15:58:47.927596,2024-02-14 15:58:59.972975,,3405283301,"{'amount': '250.0', 'frequency': 'weekly', 'la...",250.0,weekly,2024-02-01,2024-02-08
15953,31922a7e-22d2-4dd1-bc90-3d4dd019d39b,4697263,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,,False,8924632601,,"{""amount"": ""250.0"", ""frequency"": ""weekly"", ""la...",2024-02-21 23:26:34.382026,2024-02-21 23:26:39.965157,,3429975127,"{'amount': '250.0', 'frequency': 'weekly', 'la...",250.0,weekly,2024-02-08,2024-02-15
17737,9a70eb55-0a87-403e-982c-7639a293acbb,4697263,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,,False,8996923006,,"{""amount"": ""250.0"", ""frequency"": ""weekly"", ""la...",2024-02-28 18:14:11.188985,2024-02-28 18:14:15.925120,,3448961434,"{'amount': '250.0', 'frequency': 'weekly', 'la...",250.0,weekly,2024-02-15,2024-02-22
21232,b40650e6-5818-4a15-af00-9bf67d05d972,4697263,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,,False,9055684235,,"{""amount"": ""250.0"", ""frequency"": ""weekly"", ""la...",2024-03-13 19:53:02.097030,2024-03-13 19:53:08.955113,,3498051862,"{'amount': '250.0', 'frequency': 'weekly', 'la...",250.0,weekly,2024-02-22,2024-02-29
30943,e0d6a661-9af7-4b05-8365-015c9832c8f7,4697263,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,,False,9521471211,,"{""amount"": ""250.0"", ""frequency"": ""weekly"", ""la...",2024-04-19 10:34:09.545480,2024-04-19 10:34:30.779681,,3598979131,"{'amount': '250.0', 'frequency': 'weekly', 'la...",250.0,weekly,2024-04-11,2024-04-17
36520,c860e10a-2c00-41c1-b20d-dc890e5cb12c,4697263,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,,False,9663691396,,"{""amount"": ""250.0"", ""frequency"": ""weekly"", ""la...",2024-05-09 16:00:52.989924,2024-05-09 16:01:05.388947,,3665197497,"{'amount': '250.0', 'frequency': 'weekly', 'la...",250.0,weekly,2024-04-25,2024-05-02


In [84]:
debug_user_id = '4697263'
# How many recurrent incomes does this user have? Only one active
sql_recurr_ids_per_user =f"""select *
from recurring_incomes
where user_id = {debug_user_id} """
df_recurrent_income_ids_one_user = redshift_source.fetch_data(sql_recurr_ids_per_user)
df_recurrent_income_ids_one_user

Unnamed: 0,id,user_id,description,next_payment_expected,last_received_at,amount,frequency,currency_code,merchant_id,created_at,updated_at,source,schedule_classification,manual,payer,deleted_at,external_candidate_id,cancelled_at,cancelled_reason
0,13cfd274-d7ce-44f8-8ccb-cfd1560a7acb,4697263,Cash App - Cash Out,2024-01-12,2023-12-21,414.84,2,USD,1115910,2023-07-28 00:36:02.764167,2024-01-09 22:23:54.798525,new_income_flow,,,,,,2024-01-09 22:23:13.202506,
1,22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,4697263,Ug2 Llc,2024-05-30,2024-05-23,250.0,2,USD,11510235,2024-01-10 15:48:25.161991,2024-05-23 22:54:20.358601,new_income_flow,,,,,,NaT,


## What was the problem with the recurrent transaction set up on 2024-01-10?

This recurring income was created on 2024-01-10, but its record was only updated on 2024-05-23. Was the set-up on 2024-01-10 not done correctly?

The transactions on 2024-05-23 and 2024-05-30 seem to have been missed, or maybe the data hasn't come through yet?
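One way to check for missed deposits is to list transactions whose description matches the recurring income's description but which have no recurring_income_id set. The sketch below assumes a case-insensitive comparison is appropriate (the recurring income is stored as "Ug2 Llc" while the transactions read "UG2 LLC"); `unlinked_candidates` is a hypothetical helper and the rows are toy data with shortened ids.

```python
import pandas as pd

def unlinked_candidates(transactions, income_description):
    """Transactions whose description matches the recurring income
    (ignoring case) but that carry no recurring_income_id."""
    same_payer = (
        transactions["description"].str.casefold()
        == income_description.casefold()
    )
    unlinked = transactions["recurring_income_id"].isna()
    return transactions[same_payer & unlinked]

# Toy transactions: one linked deposit, one unlinked deposit, one unrelated row.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "description": ["UG2 LLC", "UG2 LLC", "Refund From CASH APP*CASH OUT"],
    "recurring_income_id": ["22e0da47", None, None],
})
missed = unlinked_candidates(toy, "Ug2 Llc")
print(missed["id"].tolist())
```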


In [85]:
#get all transactions from this user

In [86]:
df_all_trans_one_user = redshift_source.fetch_data(f"""select * from transactions where corrected_made_on > '2023-01-01' and user_id = {debug_user_id} and amount >0 """)
df_all_trans_one_user

Unnamed: 0,id,account_id,category,currency_code,amount,description,made_on,duplicated,mode,created_at,...,marked_as_duplicate,transaction_category_id,bill_id,last_enriched_at,user_id,external_transaction_id,login_provider_additional_attributes,extra,recurring_income_id,is_excluded
0,6911012559,13526952,21005000,USD,31.15,Refund From CASH APP*CASH OUT,2023-07-14,,,2023-07-14 22:33:58.192972,...,False,16,,2023-07-14 22:34:02.777517,4697263,wr0Pdn8qeEtBe7Ev9QgbfDpwa489ZNFROvOQD,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
1,6911012563,13526952,21010001,USD,150.00,Refund From METAPAY Lindsay Griffi,2023-07-11,,,2023-07-14 22:33:58.192972,...,False,16,,2023-07-14 22:34:02.782869,4697263,PkALbQn3waFgyVr0583LTqznp5N9xXFOgAg9p,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
2,6911012566,13526952,21005000,USD,6.75,Refund From CASH APP*CASH OUT,2023-07-09,,,2023-07-14 22:33:58.192972,...,False,16,,2023-07-14 22:34:02.785089,4697263,3rJyqPB0ANtwZDKJqoj1Tk78YbNwz3cPVAVkY,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
3,6911012569,13526952,21005000,USD,23.69,Refund From CASH APP*CASH OUT,2023-07-05,,,2023-07-14 22:33:58.192972,...,False,16,,2023-07-14 22:34:02.787258,4697263,AOQ5Nx1KgRsDV3odEqRKipa6BOxMNYIwYnYya,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
4,6911012572,13526952,21005000,USD,4.75,Refund From CASH APP*CASH OUT,2023-06-29,,,2023-07-14 22:33:58.192972,...,False,16,,2023-07-14 22:34:02.789304,4697263,0v4qDL1QmktDx7V4pjrQizwBR15neKC9yvyz7,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
169,9890765047,13526952,21009000,USD,200.00,Refund From Cleo Salary Advance,2024-05-23,,,2024-05-23 22:52:58.755677,...,False,2,,2024-05-23 22:54:20.141639,4697263,DdwgYZBOV4HX361Ley8LHn4gmN73zYCXOkkye,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""refund""}",22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,False
170,9912325897,13526952,21005000,USD,245.43,Refund From CASH APP*COREY LETENDR,2024-05-25,,,2024-05-25 18:33:55.688737,...,False,11,,2024-05-25 18:36:31.917253,4697263,KqBRmV0Qp4U3Vokyzrj1FLzdEq97o5FmaaLnr,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,False
171,9906446960,13526952,21010001,USD,50.00,Refund From METAPAY Mike Clemente,2024-05-24,,,2024-05-25 06:03:14.305271,...,False,11,,2024-05-25 06:05:55.969608,4697263,KqBRmV0Qp4U3VokyzrjMT3M3J4DA8VUN0D9wr,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,False
172,9952252356,13526952,21005000,USD,33.71,Refund From CASH APP*COREY LETENDR,2024-05-29,,,2024-05-29 17:02:24.458484,...,False,11,,2024-05-29 17:02:57.596885,4697263,dMEZ9rPyJjcE69vojV8eTEVQEMdN4BFobo0jX,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,False


In [88]:
df_all_trans_one_user[df_all_trans_one_user['description'] =='UG2 LLC'].sort_values(by='corrected_made_on', ascending=False)

Unnamed: 0,id,account_id,category,currency_code,amount,description,made_on,duplicated,mode,created_at,...,marked_as_duplicate,transaction_category_id,bill_id,last_enriched_at,user_id,external_transaction_id,login_provider_additional_attributes,extra,recurring_income_id,is_excluded
173,9958634125,13805513,,USD,250.0,UG2 LLC,2024-05-30,False,,2024-05-30 02:10:30.638534,...,False,16,,2024-05-30 02:10:30.686246,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,False
166,9882099268,13805513,,USD,250.0,UG2 LLC,2024-05-23,False,,2024-05-23 03:28:34.369524,...,False,16,,2024-05-23 03:28:34.423269,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,False
164,9806249847,13805513,,USD,250.0,UG2 LLC,2024-05-16,False,,2024-05-16 02:02:13.510556,...,False,16,,2024-05-16 02:02:13.616862,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,False
160,9731619960,13805513,,USD,250.0,UG2 LLC,2024-05-09,False,,2024-05-09 01:36:56.849098,...,False,16,,2024-05-09 01:36:56.897532,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,
158,9663691396,13805513,,USD,250.0,UG2 LLC,2024-05-02,False,,2024-05-02 19:29:45.153155,...,False,16,,2024-05-02 19:29:45.239072,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,
154,9582918183,13805513,,USD,250.0,UG2 LLC,2024-04-25,False,,2024-04-25 01:05:02.493088,...,False,16,,2024-04-25 01:05:02.548724,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,
147,9521471211,13805513,,USD,46.01,UG2 LLC,2024-04-18,False,,2024-04-18 19:27:26.056980,...,False,16,,2024-04-18 19:27:26.112418,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,
140,9448191181,13805513,,USD,550.0,UG2 LLC,2024-04-11,False,,2024-04-11 00:32:32.712687,...,False,16,,2024-04-11 00:32:32.821028,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,
138,9383347137,13805513,,USD,550.0,UG2 LLC,2024-04-04,False,,2024-04-04 00:49:17.883230,...,False,16,,2024-04-04 00:49:18.003028,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,
134,9318699394,13805513,,USD,250.0,UG2 LLC,2024-03-28,False,,2024-03-28 01:26:09.913018,...,False,16,,2024-03-28 01:26:10.014837,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,


There are many recurrent payments with the same description (UG2 LLC), and many of them are for the same amount (250), but several of them were missed.
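To make the regularity of these payments explicit, the sketch below summarises the gap between payments and the most common amount for the UG2 LLC rows shown above. The helper names (`summarise_cadence`, `most_common_amount`, `most_common_share`) are hypothetical and only illustrate the check; this is not how the matcher itself decides.

```python
import pandas as pd

# Dates and amounts copied from the UG2 LLC rows shown above.
ug2 = pd.DataFrame(
    {
        "description": ["UG2 LLC"] * 10,
        "made_on": [
            "2024-03-28", "2024-04-04", "2024-04-11", "2024-04-18", "2024-04-25",
            "2024-05-02", "2024-05-09", "2024-05-16", "2024-05-23", "2024-05-30",
        ],
        "amount": [250.0, 550.0, 550.0, 46.01, 250.0, 250.0, 250.0, 250.0, 250.0, 250.0],
    }
)


def most_common_amount(amounts: pd.Series) -> float:
    """Most frequent amount in the group."""
    return float(amounts.mode().iloc[0])


def most_common_share(amounts: pd.Series) -> float:
    """Fraction of payments equal to the most frequent amount."""
    return float((amounts == amounts.mode().iloc[0]).mean())


def summarise_cadence(df: pd.DataFrame) -> pd.DataFrame:
    """Per description: payment count, median gap in days, and modal amount."""
    df = df.assign(made_on=pd.to_datetime(df["made_on"])).sort_values("made_on")
    # Days since the previous payment with the same description.
    df["gap_days"] = df.groupby("description")["made_on"].diff().dt.days
    return df.groupby("description").agg(
        n_payments=("amount", "count"),
        median_gap_days=("gap_days", "median"),
        modal_amount=("amount", most_common_amount),
        modal_share=("amount", most_common_share),
    )


summary = summarise_cadence(ug2)
print(summary)
```

The same function could be applied to the full `df_all_trans_one_user` frame to look for other descriptions with a regular cadence.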

In [90]:
df_all_trans_one_user.columns

Index(['id', 'account_id', 'category', 'currency_code', 'amount',
       'description', 'made_on', 'duplicated', 'mode', 'created_at',
       'updated_at', 'status', 'corrected_made_on', 'categorized_by_user',
       'uuid', 'marked_as_duplicate', 'transaction_category_id', 'bill_id',
       'last_enriched_at', 'user_id', 'external_transaction_id',
       'login_provider_additional_attributes', 'extra', 'recurring_income_id',
       'is_excluded'],
      dtype='object')

In [91]:
df_all_trans_one_user

Unnamed: 0,id,account_id,category,currency_code,amount,description,made_on,duplicated,mode,created_at,...,marked_as_duplicate,transaction_category_id,bill_id,last_enriched_at,user_id,external_transaction_id,login_provider_additional_attributes,extra,recurring_income_id,is_excluded
0,6911012559,13526952,21005000,USD,31.15,Refund From CASH APP*CASH OUT,2023-07-14,,,2023-07-14 22:33:58.192972,...,False,16,,2023-07-14 22:34:02.777517,4697263,wr0Pdn8qeEtBe7Ev9QgbfDpwa489ZNFROvOQD,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
1,6911012563,13526952,21010001,USD,150.00,Refund From METAPAY Lindsay Griffi,2023-07-11,,,2023-07-14 22:33:58.192972,...,False,16,,2023-07-14 22:34:02.782869,4697263,PkALbQn3waFgyVr0583LTqznp5N9xXFOgAg9p,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
2,6911012566,13526952,21005000,USD,6.75,Refund From CASH APP*CASH OUT,2023-07-09,,,2023-07-14 22:33:58.192972,...,False,16,,2023-07-14 22:34:02.785089,4697263,3rJyqPB0ANtwZDKJqoj1Tk78YbNwz3cPVAVkY,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
3,6911012569,13526952,21005000,USD,23.69,Refund From CASH APP*CASH OUT,2023-07-05,,,2023-07-14 22:33:58.192972,...,False,16,,2023-07-14 22:34:02.787258,4697263,AOQ5Nx1KgRsDV3odEqRKipa6BOxMNYIwYnYya,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
4,6911012572,13526952,21005000,USD,4.75,Refund From CASH APP*CASH OUT,2023-06-29,,,2023-07-14 22:33:58.192972,...,False,16,,2023-07-14 22:34:02.789304,4697263,0v4qDL1QmktDx7V4pjrQizwBR15neKC9yvyz7,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
169,9890765047,13526952,21009000,USD,200.00,Refund From Cleo Salary Advance,2024-05-23,,,2024-05-23 22:52:58.755677,...,False,2,,2024-05-23 22:54:20.141639,4697263,DdwgYZBOV4HX361Ley8LHn4gmN73zYCXOkkye,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""refund""}",22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,False
170,9912325897,13526952,21005000,USD,245.43,Refund From CASH APP*COREY LETENDR,2024-05-25,,,2024-05-25 18:33:55.688737,...,False,11,,2024-05-25 18:36:31.917253,4697263,KqBRmV0Qp4U3Vokyzrj1FLzdEq97o5FmaaLnr,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,False
171,9906446960,13526952,21010001,USD,50.00,Refund From METAPAY Mike Clemente,2024-05-24,,,2024-05-25 06:03:14.305271,...,False,11,,2024-05-25 06:05:55.969608,4697263,KqBRmV0Qp4U3VokyzrjMT3M3J4DA8VUN0D9wr,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,False
172,9952252356,13526952,21005000,USD,33.71,Refund From CASH APP*COREY LETENDR,2024-05-29,,,2024-05-29 17:02:24.458484,...,False,11,,2024-05-29 17:02:57.596885,4697263,dMEZ9rPyJjcE69vojV8eTEVQEMdN4BFobo0jX,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,False


In [94]:
# data for model
# df_all_trans_one_user[df_all_trans_one_user['description'] =='UG2 LLC'].sort_values(by='corrected_made_on', ascending=False)[['id','amount','corrected_made_on','description','user_id','description']]

In [None]:
df_all_trans_one_user[df_all_trans_one_user['description'] =='UG2 LLC'].sort_values(by='corrected_made_on', ascending=False)

Unnamed: 0,id,account_id,category,currency_code,amount,description,made_on,duplicated,mode,created_at,...,marked_as_duplicate,transaction_category_id,bill_id,last_enriched_at,user_id,external_transaction_id,login_provider_additional_attributes,extra,recurring_income_id,is_excluded
173,9958634125,13805513,,USD,250.0,UG2 LLC,2024-05-30,False,,2024-05-30 02:10:30.638534,...,False,16,,2024-05-30 02:10:30.686246,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,False
166,9882099268,13805513,,USD,250.0,UG2 LLC,2024-05-23,False,,2024-05-23 03:28:34.369524,...,False,16,,2024-05-23 03:28:34.423269,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,False
164,9806249847,13805513,,USD,250.0,UG2 LLC,2024-05-16,False,,2024-05-16 02:02:13.510556,...,False,16,,2024-05-16 02:02:13.616862,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,False
160,9731619960,13805513,,USD,250.0,UG2 LLC,2024-05-09,False,,2024-05-09 01:36:56.849098,...,False,16,,2024-05-09 01:36:56.897532,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,
158,9663691396,13805513,,USD,250.0,UG2 LLC,2024-05-02,False,,2024-05-02 19:29:45.153155,...,False,16,,2024-05-02 19:29:45.239072,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,
154,9582918183,13805513,,USD,250.0,UG2 LLC,2024-04-25,False,,2024-04-25 01:05:02.493088,...,False,16,,2024-04-25 01:05:02.548724,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,
147,9521471211,13805513,,USD,46.01,UG2 LLC,2024-04-18,False,,2024-04-18 19:27:26.056980,...,False,16,,2024-04-18 19:27:26.112418,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,
140,9448191181,13805513,,USD,550.0,UG2 LLC,2024-04-11,False,,2024-04-11 00:32:32.712687,...,False,16,,2024-04-11 00:32:32.821028,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,
138,9383347137,13805513,,USD,550.0,UG2 LLC,2024-04-04,False,,2024-04-04 00:49:17.883230,...,False,16,,2024-04-04 00:49:18.003028,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,
134,9318699394,13805513,,USD,250.0,UG2 LLC,2024-03-28,False,,2024-03-28 01:26:09.913018,...,False,16,,2024-03-28 01:26:10.014837,4697263,,"{""internal_source"": ""direct_deposit"", ""interna...",{},22e0da47-ab9a-4bea-bbe8-6abcdf270a7a,


I couldn't get the endpoint working with the payload below.

{
  "payload": {
    "transactions": [
      {
        "transaction_id": 9958634125,
        "amount": 250.0,
        "corrected_made_on": "2024-05-30",
        "description": "UG2 LLC",
        "user_id": 4697263,
        "merchant_name": "Ug2 Llc",
        "transactions_company": "Ug2 Llc",
        "account_id": 13805513
      }
    ],
    "recurring_transaction_ids": [
      "22e0da47-ab9a-4bea-bbe8-6abcdf270a7a"
    ],
    "recurring_transaction_type": "recurring_income"
  }
}

https://github.com/meetcleo/data-science-services/blob/990cf1f1dda8e74815134fd02d0ffa79d97b2018/recurring_transaction_matcher/serve/rec_tx_matcher/tests/test_api.py#L201
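To avoid hand-typing mistakes when retrying the endpoint, the request body can be built from transaction rows. The sketch below reproduces the payload shape shown above; `build_matcher_payload` is a hypothetical helper, and deriving `merchant_name` and `transactions_company` by title-casing the description is an assumption based only on this one example.

```python
import json


def build_matcher_payload(rows, recurring_ids, recurring_type="recurring_income"):
    """Build a request body shaped like the payload above from transaction rows."""
    transactions = []
    for row in rows:
        # Assumption: both name fields are the title-cased description,
        # as in the single example above ("UG2 LLC" -> "Ug2 Llc").
        name = row["description"].title()
        transactions.append(
            {
                "transaction_id": int(row["id"]),
                "amount": float(row["amount"]),
                # Keep only the YYYY-MM-DD part, whether the value is a str or a Timestamp.
                "corrected_made_on": str(row["corrected_made_on"])[:10],
                "description": row["description"],
                "user_id": int(row["user_id"]),
                "merchant_name": name,
                "transactions_company": name,
                "account_id": int(row["account_id"]),
            }
        )
    return {
        "payload": {
            "transactions": transactions,
            "recurring_transaction_ids": list(recurring_ids),
            "recurring_transaction_type": recurring_type,
        }
    }


rows = [
    {
        "id": 9958634125,
        "amount": 250.0,
        "corrected_made_on": "2024-05-30",
        "description": "UG2 LLC",
        "user_id": 4697263,
        "account_id": 13805513,
    }
]
payload = build_matcher_payload(rows, ["22e0da47-ab9a-4bea-bbe8-6abcdf270a7a"])
print(json.dumps(payload, indent=2))
```

In the notebook, `rows` could come from the filtered dataframe via `to_dict("records")`.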

# Summary

This user hadn't set up a recurrent income. Because it is not yet a recurrent income, it can't be matched.

In [None]:
# Let's look at another user

false_negatives_data2[false_negatives_data2['recurring_income_id'] =='4e2e5072-8976-4c0e-b21a-0c36ab63f414']

Unnamed: 0,id,user_id,recurring_income_id,original_matched_transaction_id,original_matched_transaction_correct,new_matched_transaction_id,no_matched_transaction_reason,recurring_income_snapshot,created_at,updated_at,deleted_at,originating_response_id,recurring_income_snapshot_dict,recurring_income_amount,recurring_income_frequency,recurring_income_last_received_at,recurring_income_next_payment_expected
352,96db1f9b-1ce3-4d66-ac31-200f6ef244f3,7397730,4e2e5072-8976-4c0e-b21a-0c36ab63f414,,False,8463108054,,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-01-02 00:56:50.058522,2024-01-02 00:56:57.630106,,3269794412,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2023-12-22,2023-12-29
2389,21d7e98c-ac86-4f2f-b613-4d3321287a63,7397730,4e2e5072-8976-4c0e-b21a-0c36ab63f414,,False,8549854788,,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-01-07 21:57:18.774748,2024-01-07 21:57:25.235504,,3285741379,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2023-12-29,2024-01-05
6003,6f707c50-be49-4643-beb8-2a26fed08301,7397730,4e2e5072-8976-4c0e-b21a-0c36ab63f414,,False,8600998809,,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-01-19 20:43:02.067330,2024-01-19 20:43:55.174595,,3321630602,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-01-07,2024-01-15
27095,fc78216b-7d40-44cf-9fa6-c1486ae9ea65,7397730,4e2e5072-8976-4c0e-b21a-0c36ab63f414,,False,9360274163,,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-04-03 22:45:17.659121,2024-04-03 22:45:36.599500,,3567104213,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-02-26,2024-03-04
28922,9d82264a-be59-4b45-9bd4-3a6b3f84c5ee,7397730,4e2e5072-8976-4c0e-b21a-0c36ab63f414,,False,9425988283,,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-04-11 08:56:19.893749,2024-04-11 08:56:33.394273,,3583355280,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-04-01,2024-04-08
31095,fb93521a-73d1-4d82-ac2a-7d3345c1fbcc,7397730,4e2e5072-8976-4c0e-b21a-0c36ab63f414,,False,9501526016,,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-04-19 20:12:41.194602,2024-04-19 20:12:56.294170,,3600214534,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-04-08,2024-04-15
32442,83f61146-65f7-4ba4-9549-eb9496417f2a,7397730,4e2e5072-8976-4c0e-b21a-0c36ab63f414,,False,9558995843,,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-04-25 14:32:00.227013,2024-04-25 14:32:13.036688,,3618578753,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-04-16,2024-04-23
36308,b44ba1f9-8744-41ce-8f4f-4bb7a6811a40,7397730,4e2e5072-8976-4c0e-b21a-0c36ab63f414,,False,9629517715,,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-05-08 18:23:20.275670,2024-05-08 18:23:34.003605,,3662285470,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-04-22,2024-04-29


In [None]:
debug_user_id = '7397730'
# How many recurrent incomes does this user have?
sql_recurr_ids_per_user =f"""select *
from recurring_incomes
where user_id = {debug_user_id} """
df_recurrent_income_ids_one_user = redshift_source.fetch_data(sql_recurr_ids_per_user)
df_recurrent_income_ids_one_user

Unnamed: 0,id,user_id,description,next_payment_expected,last_received_at,amount,frequency,currency_code,merchant_id,created_at,updated_at,source,schedule_classification,manual,payer,deleted_at,external_candidate_id,cancelled_at,cancelled_reason
0,07fcfad3-536b-4b1c-ae7d-05dc743b330d,7397730,Cleo Credit Builder Card,2023-11-13,2023-11-06,600.0,2,USD,15087285.0,2023-10-13 18:40:07.779011,2023-11-13 19:57:30.582250,transaction_recategorisation,,,,,,,
1,2c6695db-89cd-4ceb-9c9b-409fee09c5eb,7397730,Deposit Ach 1582025011...,2023-10-10,2023-10-06,500.0,2,USD,,2023-06-27 01:29:30.452423,2023-10-10 12:36:49.269799,budget_upsell_flow,,,,,,,
2,d7bac3ee-c04c-4348-9a1b-c74cb1a9fdcb,7397730,Ga4204 Canyon Mc,2024-04-22,2024-04-15,400.0,2,USD,,2024-04-22 01:15:57.721666,2024-04-22 15:38:23.241513,recurring_incomes_page,"{""best"": ""Weekly"", ""uuid"": ""ace23308-7065-42bb...",,,,ace23308-7065-42bb-b94c-b93ea0b7e443,,
3,4e2e5072-8976-4c0e-b21a-0c36ab63f414,7397730,Ga4204 Canyon Mc,2024-05-06,2024-04-29,562.52,2,USD,,2023-10-10 12:37:33.622219,2024-05-08 18:23:33.999098,recurring_incomes_page,"{""best"": ""Biweekly"", ""uuid"": ""fcca59a7-7b18-4f...",,,,fcca59a7-7b18-4fe0-be6f-ad0f215c5824,,


Recurrent matcher was set up on 2024-05-08

In [None]:
df_all_trans_one_user = redshift_source.fetch_data(f"""select * from transactions where corrected_made_on > '2023-01-01' and user_id = {debug_user_id} and amount >0 """)
df_all_trans_one_user

Unnamed: 0,id,account_id,category,currency_code,amount,description,made_on,duplicated,mode,created_at,...,marked_as_duplicate,transaction_category_id,bill_id,last_enriched_at,user_id,external_transaction_id,login_provider_additional_attributes,extra,recurring_income_id,is_excluded
0,6756979806,13283656,21009000,USD,468.70,Deposit-ACH-1582025011 GA4204 CANYON MC (DIRDEP),2023-06-26,,,2023-06-27 01:22:16.286368,...,False,16,,2023-06-27 01:22:26.734936,7397730,MdDZVA4ekdCkEZ6jp74Qtgg7Yzv47YS5nDByJ,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},2c6695db-89cd-4ceb-9c9b-409fee09c5eb,
1,6756979812,13283656,21007000,USD,10.00,Deposit @ IL Chicago KLOVER APP BOOST USKLOVER...,2023-06-21,,,2023-06-27 01:22:16.286368,...,False,11,,2023-06-27 01:22:27.511262,7397730,80Od31Z9E0FmzJpkgM0eI44y3XpZy3fJepmrM,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
2,6756979818,13283657,21007000,USD,313.00,Deposit @ ATMTX004359 TX AMARILLO 2410 N. GRAN...,2023-06-20,,,2023-06-27 01:22:16.286368,...,False,11,,2023-06-27 01:22:27.519503,7397730,1B946zj8bBtaJbMLAZk1CKKLE75ALESjd0oQe,"{""datetime"": null, ""location"": {""lat"": null, ""...",{},,
3,6756979819,13283656,21005000,USD,43.00,Deposit @ ATMTX004359 TX AMARILLO 2410 N. GRAN...,2023-06-20,,,2023-06-27 01:22:16.286368,...,False,2,,2023-06-27 01:22:27.524966,7397730,pQODwdMZ1QfAaXqkY5nVFXXr6nj5r6fZMN7re,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""between_account_transfer""}",,
4,6756979841,13283657,21005000,USD,400.00,Deposit @ ATMTX004359 TX AMARILLO 2410 N. GRAN...,2023-06-16,,,2023-06-27 01:22:16.286368,...,False,2,,2023-06-27 01:22:27.541403,7397730,qdZ16wM5PdC1oaY9O5kKUEEbLv0gbLhByqxoa,"{""datetime"": null, ""location"": {""lat"": null, ""...","{""excluded_reason"": ""refund""}",,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
355,9724554305,13700341,0,USD,5810.42,Cleo card balance settled,2024-05-08,False,,2024-05-08 10:23:00.900170,...,False,19,,2024-05-08 10:23:01.133110,7397730,i2c-36057893,"{""type"": ""21"", ""internal_category"": ""cleo_card...",{},,
356,9780302000,13700341,,USD,309.49,GA4204 CANYON MC,2024-05-13,False,,2024-05-13 19:32:45.409708,...,False,16,,2024-05-13 19:32:45.484539,7397730,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,
357,9790252206,13700341,6011,USD,2.50,2323 ROSS-OSAGE,2024-05-14,False,,2024-05-14 16:44:39.939596,...,False,11,,2024-05-14 16:44:40.003349,7397730,i2c-36608784,"{""type"": ""01"", ""internal_category"": ""cleo_card...",{},,
358,9856132216,13700341,,USD,448.27,GA4204 CANYON MC,2024-05-20,False,,2024-05-20 21:44:58.044603,...,False,16,,2024-05-20 21:44:58.109817,7397730,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,False


In [None]:
# Annotated false negatives for this recurring income, joined to the user's transactions
missed_for_income = false_negatives_data2[false_negatives_data2['recurring_income_id'] == '4e2e5072-8976-4c0e-b21a-0c36ab63f414']
df_missed_transactions = pd.merge(
    df_all_trans_one_user,
    missed_for_income,
    left_on='id',
    right_on='new_matched_transaction_id',
    how='right',
)
df_missed_transactions

Unnamed: 0,id_x,account_id,category,currency_code,amount,description,made_on,duplicated,mode,created_at_x,...,recurring_income_snapshot,created_at_y,updated_at_y,deleted_at,originating_response_id,recurring_income_snapshot_dict,recurring_income_amount,recurring_income_frequency,recurring_income_last_received_at,recurring_income_next_payment_expected
0,8463108054,13700341,,USD,251.73,GA4204 CANYON MC,2023-12-29,False,,2023-12-29 20:34:17.420173,...,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-01-02 00:56:50.058522,2024-01-02 00:56:57.630106,,3269794412,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2023-12-22,2023-12-29
1,8549854788,13700341,0.0,USD,2382.73,Cleo card balance settled,2024-01-07,False,,2024-01-07 10:34:14.848555,...,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-01-07 21:57:18.774748,2024-01-07 21:57:25.235504,,3285741379,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2023-12-29,2024-01-05
2,8600998809,13700341,,USD,622.99,GA4204 CANYON MC,2024-01-12,False,,2024-01-12 21:57:28.692170,...,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-01-19 20:43:02.067330,2024-01-19 20:43:55.174595,,3321630602,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-01-07,2024-01-15
3,9360274163,13700341,,USD,119.62,GA4204 CANYON MC,2024-04-01,False,,2024-04-01 19:37:59.478773,...,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-04-03 22:45:17.659121,2024-04-03 22:45:36.599500,,3567104213,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-02-26,2024-03-04
4,9425988283,13700341,,USD,251.58,GA4204 CANYON MC,2024-04-08,False,,2024-04-08 20:17:26.380056,...,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-04-11 08:56:19.893749,2024-04-11 08:56:33.394273,,3583355280,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-04-01,2024-04-08
5,9501526016,13700341,,USD,3762.54,TPG PRODUCTS,2024-04-16,False,,2024-04-16 19:31:32.236653,...,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-04-19 20:12:41.194602,2024-04-19 20:12:56.294170,,3600214534,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-04-08,2024-04-15
6,9558995843,13700341,,USD,353.69,GA4204 CANYON MC,2024-04-22,False,,2024-04-22 19:51:19.691299,...,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-04-25 14:32:00.227013,2024-04-25 14:32:13.036688,,3618578753,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-04-16,2024-04-23
7,9629517715,13700341,,USD,132.17,GA4204 CANYON MC,2024-04-29,False,,2024-04-29 19:50:09.023997,...,"{""amount"": ""562.52"", ""frequency"": ""weekly"", ""l...",2024-05-08 18:23:20.275670,2024-05-08 18:23:34.003605,,3662285470,"{'amount': '562.52', 'frequency': 'weekly', 'l...",562.52,weekly,2024-04-22,2024-04-29


### All missed transactions were before the recurrent transaction was set up

Most of them also have the same description and a regular frequency, but the amounts vary a lot.
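Both observations can be checked directly on the merged rows. The sketch below re-enters the ids, descriptions, amounts and dates from the table above and takes 2024-05-08 as the setup date from the earlier note; the variable names are illustrative only.

```python
import pandas as pd

SETUP_DATE = pd.Timestamp("2024-05-08")  # setup date as noted earlier in this notebook

# Values copied from the merged table above.
missed = pd.DataFrame(
    {
        "id": [8463108054, 8549854788, 8600998809, 9360274163,
               9425988283, 9501526016, 9558995843, 9629517715],
        "description": ["GA4204 CANYON MC", "Cleo card balance settled", "GA4204 CANYON MC",
                        "GA4204 CANYON MC", "GA4204 CANYON MC", "TPG PRODUCTS",
                        "GA4204 CANYON MC", "GA4204 CANYON MC"],
        "amount": [251.73, 2382.73, 622.99, 119.62, 251.58, 3762.54, 353.69, 132.17],
        "made_on": ["2023-12-29", "2024-01-07", "2024-01-12", "2024-04-01",
                    "2024-04-08", "2024-04-16", "2024-04-22", "2024-04-29"],
    }
)
missed["made_on"] = pd.to_datetime(missed["made_on"])

# Was every missed transaction made before the setup date?
missed["before_setup"] = missed["made_on"] < SETUP_DATE
all_before_setup = bool(missed["before_setup"].all())
print("all before setup:", all_before_setup)

# How much do amounts vary within each description?
amount_stats = missed.groupby("description")["amount"].agg(["count", "min", "max"])
print(amount_stats)
```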

In [None]:
df_all_trans_one_user['corrected_made_on'] = pd.to_datetime(df_all_trans_one_user['corrected_made_on'])
df_all_trans_one_user[df_all_trans_one_user['corrected_made_on']>'2024-05-08']


Unnamed: 0,id,account_id,category,currency_code,amount,description,made_on,duplicated,mode,created_at,...,marked_as_duplicate,transaction_category_id,bill_id,last_enriched_at,user_id,external_transaction_id,login_provider_additional_attributes,extra,recurring_income_id,is_excluded
356,9780302000,13700341,,USD,309.49,GA4204 CANYON MC,2024-05-13,False,,2024-05-13 19:32:45.409708,...,False,16,,2024-05-13 19:32:45.484539,7397730,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,
357,9790252206,13700341,6011.0,USD,2.5,2323 ROSS-OSAGE,2024-05-14,False,,2024-05-14 16:44:39.939596,...,False,11,,2024-05-14 16:44:40.003349,7397730,i2c-36608784,"{""type"": ""01"", ""internal_category"": ""cleo_card...",{},,
358,9856132216,13700341,,USD,448.27,GA4204 CANYON MC,2024-05-20,False,,2024-05-20 21:44:58.044603,...,False,16,,2024-05-20 21:44:58.109817,7397730,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,False
359,9905527419,13700341,,USD,428.32,GA4204 CANYON MC,2024-05-25,False,,2024-05-25 03:37:33.686723,...,False,16,,2024-05-25 03:37:33.789426,7397730,,"{""internal_source"": ""direct_deposit"", ""interna...",{},,False


## The recurrent income had been set up, but it still isn't picking up these transactions. Could it be due to varying amounts?
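To get a feel for how far these payments are from the configured amount, the sketch below computes the relative deviation of each GA4204 CANYON MC payment after 2024-05-08 from the recurring income amount of 562.52. The 10% tolerance is a made-up threshold for illustration; the tolerance the matcher actually uses, if any, is not known here.

```python
EXPECTED_AMOUNT = 562.52  # amount on recurring income 4e2e5072-8976-4c0e-b21a-0c36ab63f414
TOLERANCE = 0.10          # hypothetical threshold, not taken from the matcher

# (made_on, amount) for the GA4204 CANYON MC rows after 2024-05-08 shown above.
post_setup = [
    ("2024-05-13", 309.49),
    ("2024-05-20", 448.27),
    ("2024-05-25", 428.32),
]


def relative_deviation(amount: float, expected: float) -> float:
    """Signed deviation of a payment from the expected amount, as a fraction."""
    return (amount - expected) / expected


deviations = [relative_deviation(amount, EXPECTED_AMOUNT) for _, amount in post_setup]

for (made_on, amount), dev in zip(post_setup, deviations):
    within = abs(dev) <= TOLERANCE
    print(f"{made_on}  {amount:>8.2f}  {dev:+.1%}  within tolerance: {within}")
```

All three payments are at least 20% below the configured amount, which is consistent with the varying-amounts hypothesis but does not confirm it.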

In [None]:
# df_all_trans_one_user['has_ga4204'] = df_all_trans_one_user['description'].apply(lambda x: 'ga4204' in x.lower())
# df_all_trans_one_user[df_all_trans_one_user['has_ga4204']]

# Reasons for low recall

1. If a recurrent income hasn't been set up, we don't pick the transaction up.
2. Amounts vary from the amount on the recurrent income that was set up (?)

# TO DO

1. Understand the recurrent transaction matcher model

2. If a set of these recurrent transactions were sent to the income classifier, would they be identified as income because of their recurrent nature? If so, could we then automatically add a new recurrent_id for that user?

# Model resources


- Test it out with prod endpoint
    - https://api-docs.meetcleo.com/api.html?api=https://api-docs.meetcleo.com/files/prod-user-income-classifier.json#/serving/invocations_invocations_post


- Metrics
    - https://grafana.mikeverse.cleoites.tech/d/hfzMxtJ4k/espresso-services?orgId=1&var-Environment=ml-serving&var-Service=recurring-transaction-matcher&var-Variant=All&from=now-6h&to=now
    - Latency: p95 500 ms
    - RPS: peaks at 15 RPS


- Repo
    - https://github.com/meetcleo/data-science-services/tree/master/recurring_transaction_matcher/lib/recurring_transaction_matcher/recurring_transaction_matcher

    