# Data Exploration
- This notebook performs exploratory data analysis on the dataset.
- To expand on the analysis, attach this notebook to a cluster with runtime version **14.0.x-cpu-ml-scala2.12**,
edit [the options of pandas-profiling](https://pandas-profiling.ydata.ai/docs/master/rtd/pages/advanced_usage.html) (the library is now published as ydata-profiling), and rerun it.
- Explore completed trials in the [MLflow experiment](#mlflow/experiments/1999048449271427).

In [0]:
import mlflow
import os
import uuid
import shutil
import pandas as pd
import databricks.automl_runtime

# Download input data from mlflow into a pandas DataFrame
# Create temporary directory to download data
temp_dir = os.path.join(os.environ["SPARK_LOCAL_DIRS"], "tmp", str(uuid.uuid4())[:8])
os.makedirs(temp_dir)

# Download the artifact and read it
training_data_path = mlflow.artifacts.download_artifacts(run_id="8f3f1026e0f24fb89ead40ab214d9d42", artifact_path="data", dst_path=temp_dir)
df = pd.read_parquet(os.path.join(training_data_path, "training_data"))

# Delete the temporary data
shutil.rmtree(temp_dir)

target_col = "status"

# Drop columns created by AutoML before pandas-profiling
df = df.drop(['_automl_split_col_0000', '_automl_sample_weight_0000'], axis=1)

## Semantic Type Detection Alerts

For details about the definition of the semantic types and how to override the detection, see
[Databricks documentation on semantic type detection](https://docs.microsoft.com/azure/databricks/applications/machine-learning/automl#semantic-type-detection).

- Semantic type `categorical` detected for columns `churn`, `duration`. Training notebooks will encode features based on categorical transformations.
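The alert does not spell out which transformation AutoML applies. One common encoding for categorical columns is one-hot encoding, where each distinct value becomes its own 0/1 indicator column. A minimal sketch with pandas on a hypothetical toy frame rather than the notebook's `df`; the generated training notebooks may encode these columns differently:

```python
import pandas as pd

# Hypothetical toy rows; in this notebook the real data is in `df`.
toy = pd.DataFrame({"duration": [12, 24, 24, 60], "churn": [1, 0, 1, 1]})

# One-hot encode the integer-coded columns as categories instead of magnitudes.
# Each distinct value becomes a 0/1 indicator column named "<column>_<value>".
encoded = pd.get_dummies(toy, columns=["duration", "churn"], dtype=int)
print(encoded.columns.tolist())
```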

## Truncate rows
Only the first 10,000 rows are profiled, to avoid out-of-memory issues.
Comment out the next cell and rerun the notebook to profile the full dataset.

In [0]:
df = df.iloc[:10000, :]
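The profile reports `date1` as monotonically increasing within the profiled rows, which suggests the data is sorted by date; if so, the first 10,000 rows cover only the earliest records. A uniform random sample is one alternative. A minimal sketch, where `sample_rows`, the toy frame, and the seed are illustrative assumptions rather than part of the AutoML-generated notebook:

```python
import pandas as pd

MAX_ROWS = 10_000  # same row budget as the cell above


def sample_rows(frame: pd.DataFrame, max_rows: int = MAX_ROWS, seed: int = 0) -> pd.DataFrame:
    """Return at most `max_rows` rows drawn uniformly at random, in original order."""
    if len(frame) <= max_rows:
        return frame
    # Sample without replacement, then restore the original row order.
    return frame.sample(n=max_rows, random_state=seed).sort_index()


# Hypothetical sorted toy frame standing in for `df`.
toy = pd.DataFrame({"t": range(25_000)})
sampled = sample_rows(toy)
print(len(sampled), sampled["t"].min(), sampled["t"].max())
```

In the notebook, `df = sample_rows(df)` would take the place of the `iloc` slice.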

## Profiling Results

In [0]:
from ydata_profiling import ProfileReport
df_profile = ProfileReport(df,
                           correlations={
                               "auto": {"calculate": True},
                               "pearson": {"calculate": True},
                               "spearman": {"calculate": True},
                               "kendall": {"calculate": True},
                               "phi_k": {"calculate": True},
                               "cramers": {"calculate": True},
                           }, title="Profiling Report", progress_bar=False, infer_dtypes=False)
profile_html = df_profile.to_html()

displayHTML(profile_html)



Dataset statistics
Statistic,Value
Number of variables,10
Number of observations,10000
Missing cells,0
Missing cells (%),0.0%
Duplicate rows,180
Duplicate rows (%),1.8%
Total size in memory,507.9 KiB
Average record size in memory,52.0 B

Variable types
Type,Count
Text,2
Numeric,8

Alerts
Alert,Type
Dataset has 180 (1.8%) duplicate rows,Duplicates
amount is highly overall correlated with frequency and 6 other fields,High correlation
duration is highly overall correlated with date1 and 3 other fields,High correlation
payments is highly overall correlated with frequency and 6 other fields,High correlation
churn is highly overall correlated with date1 and 4 other fields,High correlation
frequency is highly overall correlated with amount and 1 other field,High correlation
date1 is highly overall correlated with date and 5 other fields,High correlation
date is highly overall correlated with date1 and 4 other fields,High correlation
status is highly overall correlated with date1 and 5 other fields,High correlation
churn has 1706 (17.1%) zeros,Zeros
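The first alert counts 180 duplicate rows (1.8%). A sketch of finding and dropping exact duplicates with pandas, on a hypothetical toy frame; `duplicated()` marks every repeat after the first occurrence, and the report's own counting convention may differ:

```python
import pandas as pd

# Hypothetical toy frame in which the first row appears twice.
toy = pd.DataFrame({
    "client_id": [23, 23, 85],
    "status": ["A", "A", "A"],
})

n_repeats = int(toy.duplicated().sum())          # repeats after the first occurrence
all_copies = toy[toy.duplicated(keep=False)]     # every copy of each repeated row
deduped = toy.drop_duplicates()                  # keeps the first occurrence
print(n_repeats, len(all_copies), len(deduped))
```

Whether to drop duplicates depends on whether repeated rows are genuine separate records.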

Reproduction
Field,Value
Analysis started,2023-09-18 03:01:32.395898
Analysis finished,2023-09-18 03:01:46.005315
Duration,13.61 seconds
Software version,ydata-profiling v4.2.0

Variable: frequency (Text)
Statistic,Value
Distinct,3
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Memory size,78.2 KiB

Length
Statistic,Value
Max length,18.0
Median length,16.0
Mean length,15.6186
Min length,14.0

Characters and Unicode
Statistic,Value
Total characters,156186
Distinct characters,18
Distinct categories,2
Distinct scripts,2
Distinct blocks,1

Statistic,Value
Unique,0
Unique (%),0.0%

Sample
Row,Value
1st row,POPLATEK MESICNE
2nd row,POPLATEK MESICNE
3rd row,POPLATEK MESICNE
4th row,POPLATEK MESICNE
5th row,POPLATEK MESICNE

Most occurring words
Value,Count,Frequency (%)
poplatek,10000,49.6%
mesicne,7801,38.7%
tydne,2053,10.2%
po,146,0.7%
obratu,146,0.7%

Most occurring characters
Value,Count,Frequency (%)
E,27655,17.7%
P,20146,12.9%
T,12199,7.8%
O,10292,6.6%
A,10146,6.5%
(space),10146,6.5%
L,10000,6.4%
K,10000,6.4%
N,9854,6.3%
C,7801,5.0%

Most occurring categories
Value,Count,Frequency (%)
Uppercase Letter,146040,93.5%
Space Separator,10146,6.5%

Most frequent characters: Uppercase Letter
Value,Count,Frequency (%)
E,27655,18.9%
P,20146,13.8%
T,12199,8.4%
O,10292,7.0%
A,10146,6.9%
L,10000,6.8%
K,10000,6.8%
N,9854,6.7%
C,7801,5.3%
S,7801,5.3%

Most frequent characters: Space Separator
Value,Count,Frequency (%)
(space),10146,100.0%

Most occurring scripts
Value,Count,Frequency (%)
Latin,146040,93.5%
Common,10146,6.5%

Most frequent characters: Latin
Value,Count,Frequency (%)
E,27655,18.9%
P,20146,13.8%
T,12199,8.4%
O,10292,7.0%
A,10146,6.9%
L,10000,6.8%
K,10000,6.8%
N,9854,6.7%
C,7801,5.3%
S,7801,5.3%

Most frequent characters: Common
Value,Count,Frequency (%)
(space),10146,100.0%

Most occurring blocks
Value,Count,Frequency (%)
ASCII,156186,100.0%

Most frequent characters: ASCII
Value,Count,Frequency (%)
E,27655,17.7%
P,20146,12.9%
T,12199,7.8%
O,10292,6.6%
A,10146,6.5%
(space),10146,6.5%
L,10000,6.4%
K,10000,6.4%
N,9854,6.3%
C,7801,5.0%
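The character statistics for `frequency` can be reproduced from its value counts. The three distinct values below are inferred from the word counts and the 14/18 minimum/maximum lengths, not read directly from the report. A stdlib sketch that tallies characters and groups them by Unicode general category (`Lu` is Uppercase Letter, `Zs` is Space Separator):

```python
import unicodedata
from collections import Counter

# Assumed value counts for `frequency`, inferred from the word counts above
# (poplatek 10000, mesicne 7801, tydne 2053, po/obratu 146 each).
value_counts = {
    "POPLATEK MESICNE": 7801,
    "POPLATEK TYDNE": 2053,
    "POPLATEK PO OBRATU": 146,
}

# Tally every character, weighted by how many rows hold each value.
chars = Counter()
for value, count in value_counts.items():
    for ch, n in Counter(value).items():
        chars[ch] += n * count

# Group the character counts by Unicode general category.
categories = Counter()
for ch, n in chars.items():
    categories[unicodedata.category(ch)] += n

total = sum(chars.values())
mean_length = total / sum(value_counts.values())
print(total, mean_length, chars["E"], chars[" "], dict(categories))
```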

Variable: date1 (Numeric)
Statistic,Value
Distinct,93
Distinct (%),0.9%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,930478.8

Statistic,Value
Minimum,930113
Maximum,930908
Zeros,0
Zeros (%),0.0%
Negative,0
Negative (%),0.0%
Memory size,39.2 KiB

Quantile statistics
Statistic,Value
Minimum,930113
5-th percentile,930204
Q1,930227
median,930509
Q3,930628
95-th percentile,930829
Maximum,930908
Range,795
Interquartile range (IQR),401

Descriptive statistics
Statistic,Value
Standard deviation,231.51797
Coefficient of variation (CV),0.00024881596
Kurtosis,-1.1665619
Mean,930478.8
Median Absolute Deviation (MAD),197
Skewness,0.27008366
Sum,9.304788 × 10^9
Variance,53600.571
Monotonicity,Increasing

Common values
Value,Count,Frequency (%)
930226,696,7.0%
930520,466,4.7%
930810,448,4.5%
930829,421,4.2%
930324,401,4.0%
930905,364,3.6%
930529,360,3.6%
930207,352,3.5%
930227,351,3.5%
930509,350,3.5%

Minimum values
Value,Count,Frequency (%)
930113,28,0.3%
930114,27,0.3%
930117,33,0.3%
930119,77,0.8%
930124,42,0.4%
930125,77,0.8%
930126,29,0.3%
930130,26,0.3%
930204,325,3.2%
930207,352,3.5%

Maximum values
Value,Count,Frequency (%)
930908,42,0.4%
930907,39,0.4%
930906,14,0.1%
930905,364,3.6%
930903,35,0.4%
930829,421,4.2%
930828,32,0.3%
930824,34,0.3%
930821,10,0.1%
930818,31,0.3%

Variable: date (Numeric)
Statistic,Value
Distinct,97
Distinct (%),1.0%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,941592.08

Statistic,Value
Minimum,930705
Maximum,950606
Zeros,0
Zeros (%),0.0%
Negative,0
Negative (%),0.0%
Memory size,39.2 KiB

Quantile statistics
Statistic,Value
Minimum,930705
5-th percentile,930915
Q1,940131
median,940728
Q3,941221
95-th percentile,950411
Maximum,950606
Range,19901
Interquartile range (IQR),1090

Descriptive statistics
Statistic,Value
Standard deviation,5678.8977
Coefficient of variation (CV),0.0060311655
Kurtosis,-0.21286707
Mean,941592.08
Median Absolute Deviation (MAD),597
Skewness,-0.010217393
Sum,9.4159208 × 10^9
Variance,32249879
Monotonicity,Not monotonic

Common values
Value,Count,Frequency (%)
940105,696,7.0%
950304,454,4.5%
941219,396,4.0%
950205,376,3.8%
941221,370,3.7%
950327,364,3.6%
940728,360,3.6%
940601,352,3.5%
940110,351,3.5%
940206,350,3.5%

Minimum values
Value,Count,Frequency (%)
930705,24,0.2%
930711,37,0.4%
930728,30,0.3%
930803,26,0.3%
930906,330,3.3%
930913,29,0.3%
930915,27,0.3%
930924,23,0.2%
931013,34,0.3%
931104,26,0.3%

Maximum values
Value,Count,Frequency (%)
950606,14,0.1%
950530,27,0.3%
950508,339,3.4%
950413,21,0.2%
950411,102,1.0%
950327,364,3.6%
950321,34,0.3%
950316,52,0.5%
950315,10,0.1%
950304,454,4.5%
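The integer ranges of `date1` (930113 to 930908) and `date` (930705 to 950606) look like two-digit-year YYMMDD dates from 1993 to 1995. That reading is an assumption; the notebook does not document the encoding. If it holds, parsing the columns as dates makes gaps and calendar features easy to compute. A sketch using two (date1, date) pairs from the report's sample rows:

```python
import pandas as pd

# Two (date1, date) pairs copied from the report's sample rows.
raw = pd.DataFrame({"date1": [930113, 930908], "date": [931122, 940121]})

# Assumption: values are YYMMDD; %y maps 93 to 1993.
parsed = raw.apply(lambda col: pd.to_datetime(col.astype(str), format="%y%m%d"))

# Days elapsed from date1 to date in each row.
parsed["gap_days"] = (parsed["date"] - parsed["date1"]).dt.days
print(parsed)
```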

Variable: amount (Numeric)
Statistic,Value
Distinct,105
Distinct (%),1.1%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,131324.99

Statistic,Value
Minimum,7656
Maximum,482940
Zeros,0
Zeros (%),0.0%
Negative,0
Negative (%),0.0%
Memory size,39.2 KiB

Quantile statistics
Statistic,Value
Minimum,7656
5-th percentile,17508
Q1,38148
median,80952
Q3,170256
95-th percentile,347952
Maximum,482940
Range,475284
Interquartile range (IQR),132108

Descriptive statistics
Statistic,Value
Standard deviation,117853.88
Coefficient of variation (CV),0.89742161
Kurtosis,1.3398336
Mean,131324.99
Median Absolute Deviation (MAD),49704
Skewness,1.3987854
Sum,1.3132499 × 10^9
Variance,1.3889538 × 10^10
Monotonicity,Not monotonic

Common values
Value,Count,Frequency (%)
80952,696,7.0%
168984,420,4.2%
482940,396,4.0%
316140,376,3.8%
31728,364,3.6%
131292,360,3.6%
208320,352,3.5%
24516,351,3.5%
78936,350,3.5%
323472,349,3.5%

Minimum values
Value,Count,Frequency (%)
7656,31,0.3%
11736,348,3.5%
14628,33,0.3%
15420,34,0.3%
17508,84,0.8%
20832,27,0.3%
21072,23,0.2%
21924,33,0.3%
23052,77,0.8%
23628,35,0.4%

Maximum values
Value,Count,Frequency (%)
482940,396,4.0%
464520,35,0.4%
398640,53,0.5%
347952,24,0.2%
331560,10,0.1%
323472,349,3.5%
316140,376,3.8%
300660,15,0.1%
300204,34,0.3%
299088,27,0.3%

Variable: duration (Numeric)
Statistic,Value
Distinct,5
Distinct (%),0.1%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,29.9412

Statistic,Value
Minimum,12
Maximum,60
Zeros,0
Zeros (%),0.0%
Negative,0
Negative (%),0.0%
Memory size,39.2 KiB

Quantile statistics
Statistic,Value
Minimum,12
5-th percentile,12
Q1,12
median,24
Q3,48
95-th percentile,60
Maximum,60
Range,48
Interquartile range (IQR),36

Descriptive statistics
Statistic,Value
Standard deviation,16.70618
Coefficient of variation (CV),0.55796628
Kurtosis,-0.91364096
Mean,29.9412
Median Absolute Deviation (MAD),12
Skewness,0.61066862
Sum,299412
Variance,279.09645
Monotonicity,Not monotonic

Common values
Value,Count,Frequency (%)
24,3061,30.6%
12,2987,29.9%
60,1467,14.7%
36,1433,14.3%
48,1052,10.5%

Minimum values
Value,Count,Frequency (%)
12,2987,29.9%
24,3061,30.6%
36,1433,14.3%
48,1052,10.5%
60,1467,14.7%

Maximum values
Value,Count,Frequency (%)
60,1467,14.7%
48,1052,10.5%
36,1433,14.3%
24,3061,30.6%
12,2987,29.9%
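Because `duration` takes only five values, its summary statistics can be recomputed from the value-count table alone, which makes the formulas concrete: the coefficient of variation is the standard deviation divided by the mean, and the IQR is Q3 minus Q1. In this sketch the report's variance of 279.09645 is reproduced with the sample (n - 1) denominator rather than the population one, which suggests the report uses that convention; the quartiles use `method="inclusive"` in `statistics.quantiles`, which interpolates linearly between order statistics:

```python
import math
import statistics

# Value counts for `duration`, copied from the table above.
counts = {12: 2987, 24: 3061, 36: 1433, 48: 1052, 60: 1467}
values = [v for v, c in counts.items() for _ in range(c)]

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean                         # coefficient of variation
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                           # interquartile range

print(sum(values), mean, round(variance, 5), round(std, 5), round(cv, 8))
print(q1, median, q3, iqr)
```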

Variable: payments (Numeric)
Statistic,Value
Distinct,104
Distinct (%),1.0%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,4039.258

Statistic,Value
Minimum,319
Maximum,8339
Zeros,0
Zeros (%),0.0%
Negative,0
Negative (%),0.0%
Memory size,78.2 KiB

Quantile statistics
Statistic,Value
Minimum,319
5-th percentile,1177
Q1,3004
median,3373
Q3,5275
95-th percentile,8023
Maximum,8339
Range,8020
Interquartile range (IQR),2271

Descriptive statistics
Statistic,Value
Standard deviation,2016.0087
Coefficient of variation (CV),0.49910373
Kurtosis,-0.67337442
Mean,4039.258
Median Absolute Deviation (MAD),1206
Skewness,0.40509925
Sum,40392580
Variance,4064291.2
Monotonicity,Not monotonic

Common values
Value,Count,Frequency (%)
3373,696,7.0%
7041,420,4.2%
8049,396,4.0%
5269,376,3.8%
1322,364,3.6%
3647,360,3.6%
4340,352,3.5%
2043,351,3.5%
6578,350,3.5%
6739,349,3.5%

Minimum values
Value,Count,Frequency (%)
319,31,0.3%
434,27,0.3%
489,348,3.5%
609,33,0.3%
878,23,0.2%
1056,14,0.1%
1081,19,0.2%
1177,34,0.3%
1219,33,0.3%
1285,34,0.3%

Maximum values
Value,Count,Frequency (%)
8339,34,0.3%
8308,27,0.3%
8136,18,0.2%
8049,396,4.0%
8033,24,0.2%
8023,25,0.2%
7742,35,0.4%
7685,32,0.3%
7636,18,0.2%
7318,26,0.3%

Variable: client_id (Numeric)
Statistic,Value
Distinct,2746
Distinct (%),27.5%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,3387.2735

Statistic,Value
Minimum,2
Maximum,13998
Zeros,0
Zeros (%),0.0%
Negative,0
Negative (%),0.0%
Memory size,39.2 KiB

Quantile statistics
Statistic,Value
Minimum,2.0
5-th percentile,328.0
Q1,1448.75
median,2908.0
Q3,4188.0
95-th percentile,10998.0
Maximum,13998.0
Range,13996.0
Interquartile range (IQR),2739.25

Descriptive statistics
Statistic,Value
Standard deviation,2865.9701
Coefficient of variation (CV),0.84609942
Kurtosis,3.1758015
Mean,3387.2735
Median Absolute Deviation (MAD),1399
Skewness,1.7549028
Sum,33872735
Variance,8213784.9
Monotonicity,Not monotonic

Common values
Value,Count,Frequency (%)
735,17,0.2%
2084,17,0.2%
924,16,0.2%
4334,16,0.2%
3673,16,0.2%
3268,16,0.2%
3198,16,0.2%
23,16,0.2%
776,16,0.2%
904,16,0.2%

Minimum values
Value,Count,Frequency (%)
2,14,0.1%
3,9,0.1%
6,1,< 0.1%
7,1,< 0.1%
9,1,< 0.1%
11,1,< 0.1%
13,1,< 0.1%
21,2,< 0.1%
22,7,0.1%
23,16,0.2%

Maximum values
Value,Count,Frequency (%)
13998,1,< 0.1%
13956,13,0.1%
13955,10,0.1%
13924,2,< 0.1%
13915,1,< 0.1%
13912,1,< 0.1%
13886,2,< 0.1%
13845,1,< 0.1%
13803,12,0.1%
13751,3,< 0.1%

Variable: birth_number (Numeric)
Statistic,Value
Distinct,2647
Distinct (%),26.5%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,533209.26

Statistic,Value
Minimum,120302
Maximum,836013
Zeros,0
Zeros (%),0.0%
Negative,0
Negative (%),0.0%
Memory size,39.2 KiB

Quantile statistics
Statistic,Value
Minimum,120302.0
5-th percentile,230625.0
Q1,401125.0
median,540314.5
Q3,675706.0
95-th percentile,790330.0
Maximum,836013.0
Range,715711.0
Interquartile range (IQR),274581.0

Descriptive statistics
Statistic,Value
Standard deviation,171723.84
Coefficient of variation (CV),0.32205712
Kurtosis,-0.95208588
Mean,533209.26
Median Absolute Deviation (MAD),135397.5
Skewness,-0.18347186
Sum,5.3320926 × 10^9
Variance,2.9489077 × 10^10
Monotonicity,Not monotonic

Common values
Value,Count,Frequency (%)
535305,29,0.3%
476105,26,0.3%
450204,24,0.2%
406009,21,0.2%
775810,20,0.2%
520726,19,0.2%
731126,18,0.2%
355313,17,0.2%
685629,17,0.2%
666129,17,0.2%

Minimum values
Value,Count,Frequency (%)
120302,1,< 0.1%
130713,1,< 0.1%
130912,9,0.1%
140519,2,< 0.1%
146003,1,< 0.1%
155322,11,0.1%
160120,2,< 0.1%
166027,1,< 0.1%
170506,1,< 0.1%
170802,12,0.1%

Maximum values
Value,Count,Frequency (%)
836013,1,< 0.1%
826211,8,0.1%
825907,1,< 0.1%
825820,12,0.1%
825606,8,0.1%
825529,2,< 0.1%
825309,1,< 0.1%
825227,13,0.1%
825210,2,< 0.1%
821020,3,< 0.1%
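Several `birth_number` values have a middle pair above 12 (for example 545412 and the maximum 836013), so the column cannot be a plain YYMMDD date. One convention that fits these values is adding 50 to the month for women. That convention is an assumption not documented in this notebook; under it, a hypothetical decoder could look like this:

```python
from datetime import date
from typing import Tuple


def decode_birth_number(value: int) -> Tuple[date, str]:
    """Hypothetical decoder: YYMMDD, with 50 added to MM for women (assumed)."""
    yy, mm, dd = value // 10_000, (value // 100) % 100, value % 100
    sex = "F" if mm > 50 else "M"
    if mm > 50:
        mm -= 50
    # Assumption: every birth year falls in the 1900s.
    return date(1900 + yy, mm, dd), sex


for raw in (545412, 730529, 836013):
    print(raw, *decode_birth_number(raw))
```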

Variable: churn (Numeric)
Statistic,Value
Distinct,2
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,0.8294

Statistic,Value
Minimum,0
Maximum,1
Zeros,1706
Zeros (%),17.1%
Negative,0
Negative (%),0.0%
Memory size,39.2 KiB

Quantile statistics
Statistic,Value
Minimum,0
5-th percentile,0
Q1,1
median,1
Q3,1
95-th percentile,1
Maximum,1
Range,1
Interquartile range (IQR),0

Descriptive statistics
Statistic,Value
Standard deviation,0.37617787
Coefficient of variation (CV),0.45355422
Kurtosis,1.0684897
Mean,0.8294
Median Absolute Deviation (MAD),0
Skewness,-1.7516495
Sum,8294
Variance,0.14150979
Monotonicity,Not monotonic

Common values
Value,Count,Frequency (%)
1,8294,82.9%
0,1706,17.1%

Minimum values
Value,Count,Frequency (%)
0,1706,17.1%
1,8294,82.9%

Maximum values
Value,Count,Frequency (%)
1,8294,82.9%
0,1706,17.1%

Variable: status (Text)
Statistic,Value
Distinct,4
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Memory size,78.2 KiB

Length
Statistic,Value
Max length,1
Median length,1
Mean length,1
Min length,1

Characters and Unicode
Statistic,Value
Total characters,10000
Distinct characters,4
Distinct categories,1
Distinct scripts,1
Distinct blocks,1

Statistic,Value
Unique,0
Unique (%),0.0%

Sample
Row,Value
1st row,A
2nd row,A
3rd row,A
4th row,A
5th row,A

Most occurring words
Value,Count,Frequency (%)
a,8014,80.1%
b,854,8.5%
d,852,8.5%
c,280,2.8%

Most occurring characters
Value,Count,Frequency (%)
A,8014,80.1%
B,854,8.5%
D,852,8.5%
C,280,2.8%

Most occurring categories
Value,Count,Frequency (%)
Uppercase Letter,10000,100.0%

Most frequent characters: Uppercase Letter
Value,Count,Frequency (%)
A,8014,80.1%
B,854,8.5%
D,852,8.5%
C,280,2.8%

Most occurring scripts
Value,Count,Frequency (%)
Latin,10000,100.0%

Most frequent characters: Latin
Value,Count,Frequency (%)
A,8014,80.1%
B,854,8.5%
D,852,8.5%
C,280,2.8%

Most occurring blocks
Value,Count,Frequency (%)
ASCII,10000,100.0%

Most frequent characters: ASCII
Value,Count,Frequency (%)
A,8014,80.1%
B,854,8.5%
D,852,8.5%
C,280,2.8%

Correlations
Variable,date1,date,amount,duration,payments,client_id,birth_number,churn
date1,1.0,0.481,0.065,0.045,0.082,0.004,0.005,-0.301
date,0.481,1.0,-0.021,0.016,-0.007,-0.002,0.005,-0.29
amount,0.065,-0.021,1.0,0.792,0.799,-0.006,-0.002,-0.463
duration,0.045,0.016,0.792,1.0,0.324,0.002,-0.002,-0.334
payments,0.082,-0.007,0.799,0.324,1.0,-0.008,-0.005,-0.476
client_id,0.004,-0.002,-0.006,0.002,-0.008,1.0,0.096,0.007
birth_number,0.005,0.005,-0.002,-0.002,-0.005,0.096,1.0,0.005
churn,-0.301,-0.29,-0.463,-0.334,-0.476,0.007,0.005,1.0

Variable,date1,date,amount,duration,payments,client_id,birth_number,churn
date1,1.0,0.327,0.167,0.03,0.207,-0.0,0.007,-0.307
date,0.327,1.0,-0.023,-0.039,-0.009,0.002,0.008,-0.253
amount,0.167,-0.023,1.0,0.83,0.718,-0.001,-0.002,-0.553
duration,0.03,-0.039,0.83,1.0,0.308,0.001,-0.002,-0.366
payments,0.207,-0.009,0.718,0.308,1.0,-0.003,-0.003,-0.516
client_id,-0.0,0.002,-0.001,0.001,-0.003,1.0,0.133,0.005
birth_number,0.007,0.008,-0.002,-0.002,-0.003,0.133,1.0,0.004
churn,-0.307,-0.253,-0.553,-0.366,-0.516,0.005,0.004,1.0

Variable,date1,date,amount,duration,payments,client_id,birth_number,churn
date1,1.0,0.481,0.065,0.045,0.082,0.004,0.005,-0.301
date,0.481,1.0,-0.021,0.016,-0.007,-0.002,0.005,-0.29
amount,0.065,-0.021,1.0,0.792,0.799,-0.006,-0.002,-0.463
duration,0.045,0.016,0.792,1.0,0.324,0.002,-0.002,-0.334
payments,0.082,-0.007,0.799,0.324,1.0,-0.008,-0.005,-0.476
client_id,0.004,-0.002,-0.006,0.002,-0.008,1.0,0.096,0.007
birth_number,0.005,0.005,-0.002,-0.002,-0.005,0.096,1.0,0.005
churn,-0.301,-0.29,-0.463,-0.334,-0.476,0.007,0.005,1.0

Variable,date1,date,amount,duration,payments,client_id,birth_number,churn
date1,1.0,0.331,0.06,0.038,0.057,0.002,0.004,-0.249
date,0.331,1.0,-0.003,0.016,-0.0,-0.001,0.003,-0.24
amount,0.06,-0.003,1.0,0.653,0.65,-0.004,-0.001,-0.383
duration,0.038,0.016,0.653,1.0,0.258,0.001,-0.002,-0.302
payments,0.057,-0.0,0.65,0.258,1.0,-0.005,-0.003,-0.394
client_id,0.002,-0.001,-0.004,0.001,-0.005,1.0,0.064,0.006
birth_number,0.004,0.003,-0.001,-0.002,-0.003,0.064,1.0,0.004
churn,-0.249,-0.24,-0.383,-0.302,-0.394,0.006,0.004,1.0
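The signed matrices above cover only the numeric columns. To recompute such coefficients outside the profiler, pandas' `DataFrame.corr` accepts `method="pearson"` or `method="spearman"` (and `method="kendall"`, which requires SciPy). Since pandas 2.0, `numeric_only` defaults to `False`, so a frame with text columns such as `frequency` or `status` raises an error unless `numeric_only=True` is passed. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows mixing numeric and text columns, like the dataset.
toy = pd.DataFrame({
    "amount": [79608, 21072, 80952, 31728],
    "duration": [24, 24, 24, 12],
    "payments": [3317.0, 878.0, 3373.0, 2644.0],
    "status": ["A", "A", "B", "A"],
})

# numeric_only=True skips the text column instead of failing on it.
results = {
    method: toy.corr(method=method, numeric_only=True)
    for method in ("pearson", "spearman")
}
for method, matrix in results.items():
    print(method)
    print(matrix.round(3))
```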

Variable,frequency,date1,date,amount,duration,payments,client_id,birth_number,churn,status
frequency,1.0,0.479,0.182,0.597,0.407,0.588,0.014,0.0,0.106,0.151
date1,0.479,1.0,0.705,0.858,0.771,0.901,0.0,0.0,0.679,0.635
date,0.182,0.705,1.0,0.666,0.348,0.609,0.0,0.0,0.507,0.517
amount,0.597,0.858,0.666,1.0,0.951,0.919,0.0,0.0,0.886,0.819
duration,0.407,0.771,0.348,0.951,1.0,0.835,0.0,0.0,0.391,0.544
payments,0.588,0.901,0.609,0.919,0.835,1.0,0.0,0.0,0.835,0.708
client_id,0.014,0.0,0.0,0.0,0.0,0.0,1.0,0.299,0.0,0.0
birth_number,0.0,0.0,0.0,0.0,0.0,0.0,0.299,1.0,0.014,0.0
churn,0.106,0.679,0.507,0.886,0.391,0.835,0.0,0.014,1.0,1.0
status,0.151,0.635,0.517,0.819,0.544,0.708,0.0,0.0,1.0,1.0
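Two sets of counts in the profile line up: `churn` is 1 in 8,294 rows, which equals the A + C `status` count (8,014 + 280), and `churn` is 0 in 1,706 rows, which equals B + D (854 + 852). The matrix above also reports an association of 1.0 between `churn` and `status`. Together these suggest, but do not prove, that `churn` is a deterministic function of the target column `status`, which would leak the label into the features. A quick check, sketched with a hypothetical helper `is_function_of` on toy rows that follow the suspected mapping:

```python
import pandas as pd


def is_function_of(frame: pd.DataFrame, derived: str, source: str) -> bool:
    """True if every value of `source` maps to exactly one value of `derived`."""
    return bool((frame.groupby(source)[derived].nunique() == 1).all())


# Hypothetical toy rows following the suspected mapping (A, C -> 1; B, D -> 0).
toy = pd.DataFrame({"status": ["A", "B", "C", "D", "A"], "churn": [1, 0, 1, 0, 1]})
print(pd.crosstab(toy["churn"], toy["status"]))
print(is_function_of(toy, derived="churn", source="status"))
```

On the real data, `is_function_of(df, derived="churn", source="status")` returning `True` would be a strong reason to exclude `churn` from the training features.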

First rows
Row,frequency,date1,date,amount,duration,payments,client_id,birth_number,churn,status
0,POPLATEK MESICNE,930113,931122,79608,24,3317.0,13886,545412,1,A
1,POPLATEK MESICNE,930113,931122,79608,24,3317.0,6967,531217,1,A
2,POPLATEK MESICNE,930113,931122,79608,24,3317.0,4557,326024,1,A
3,POPLATEK MESICNE,930113,931122,79608,24,3317.0,4258,665511,1,A
4,POPLATEK MESICNE,930113,931122,79608,24,3317.0,4251,600426,1,A
5,POPLATEK MESICNE,930113,931122,79608,24,3317.0,4030,765921,1,A
6,POPLATEK MESICNE,930113,931122,79608,24,3317.0,3949,740826,1,A
7,POPLATEK MESICNE,930113,931122,79608,24,3317.0,3920,255813,1,A
8,POPLATEK MESICNE,930113,931122,79608,24,3317.0,3698,710425,1,A
9,POPLATEK MESICNE,930113,931122,79608,24,3317.0,3690,600802,1,A

Last rows
Row,frequency,date1,date,amount,duration,payments,client_id,birth_number,churn,status
9990,POPLATEK MESICNE,930908,940121,21072,24,878.0,3521,640720,1,A
9991,POPLATEK MESICNE,930908,940121,21072,24,878.0,3435,455527,1,A
9992,POPLATEK MESICNE,930908,940121,21072,24,878.0,2979,616222,1,A
9993,POPLATEK MESICNE,930908,940121,21072,24,878.0,2584,446215,1,A
9994,POPLATEK MESICNE,930908,940121,21072,24,878.0,2353,295917,1,A
9995,POPLATEK MESICNE,930908,940121,21072,24,878.0,1459,760908,1,A
9996,POPLATEK MESICNE,930908,940121,21072,24,878.0,1322,455922,1,A
9997,POPLATEK MESICNE,930908,940121,21072,24,878.0,1055,190517,1,A
9998,POPLATEK MESICNE,930908,940121,21072,24,878.0,773,351030,1,A
9999,POPLATEK MESICNE,930908,940121,21072,24,878.0,703,590426,1,A

Duplicate rows
Row,frequency,date1,date,amount,duration,payments,client_id,birth_number,churn,status,# duplicates
0,POPLATEK MESICNE,930226,940105,80952,24,3373.0,23,730529,1,A,2
1,POPLATEK MESICNE,930226,940105,80952,24,3373.0,85,610827,1,A,2
2,POPLATEK MESICNE,930226,940105,80952,24,3373.0,112,350125,1,A,2
3,POPLATEK MESICNE,930226,940105,80952,24,3373.0,351,730501,1,A,2
4,POPLATEK MESICNE,930226,940105,80952,24,3373.0,356,796207,1,A,2
5,POPLATEK MESICNE,930226,940105,80952,24,3373.0,400,331227,1,A,2
6,POPLATEK MESICNE,930226,940105,80952,24,3373.0,416,326007,1,A,2
7,POPLATEK MESICNE,930226,940105,80952,24,3373.0,424,471008,1,A,2
8,POPLATEK MESICNE,930226,940105,80952,24,3373.0,442,716216,1,A,2
9,POPLATEK MESICNE,930226,940105,80952,24,3373.0,466,476119,1,A,2
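Across the sample and duplicate-row tables, `amount` equals `duration` × `payments` in every row shown (for example 24 × 3317.0 = 79608, 24 × 878.0 = 21072, and 24 × 3373.0 = 80952). If that relation holds on the full dataset, `amount` carries no information beyond the other two columns, which would account for the strong correlations reported between them. A minimal check on three rows copied from those tables; on the real data the same expression would run on `df`:

```python
import pandas as pd

# Three (amount, duration, payments) rows copied from the report's tables.
rows = pd.DataFrame({
    "amount": [79608, 21072, 80952],
    "duration": [24, 24, 24],
    "payments": [3317.0, 878.0, 3373.0],
})

# Count rows where amount differs from duration * payments.
mismatches = int((rows["amount"] != rows["duration"] * rows["payments"]).sum())
print(mismatches)
```

If `payments` is rounded in the full data, comparing with a tolerance (for example `numpy.isclose`) may be safer than exact equality.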
