# MIxS Triad Classification
#### Goal: Classify a GOLD biosample's `env_broad_scale`, `env_local_scale`, and `env_medium`.
The MIxS triad consists of:
* env_broad_scale (biome)
* env_local_scale (geographic feature)
* env_medium (material)

For example, consider a water sample collected from a pond in a rainforest. The triad values would be:
* env_broad_scale: rainforest
* env_local_scale: pond
* env_medium: water
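
As a small illustration, the triad for the pond example can be held in a typed record. This is only a sketch: the `MixsTriad` class name is hypothetical and is not used elsewhere in this notebook.

```python
from typing import NamedTuple


class MixsTriad(NamedTuple):
    """Hypothetical container for one biosample's MIxS triad."""
    env_broad_scale: str  # biome
    env_local_scale: str  # geographic feature
    env_medium: str       # material


# the pond-in-a-rainforest example from the text above
pond_sample = MixsTriad(
    env_broad_scale="rainforest",
    env_local_scale="pond",
    env_medium="water",
)
print(pond_sample)
```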

# TOC
* [load biosample data before NER](#load-biosample-data-before-NER)
* [load runNER output](#load-runNER-output)
* [load one-hot-encoded runNER output](#load-one-hot-encoded-runNER-output)
* [create target sets](#create-target-sets)
* [experiment 1](#experiment-1)  
  Predict env_broad_scale using the one-hot-encoded NER results as features.  
  10 iteration AUC: **0.621**   
  100 iteration AUC: **0.714**   
* [experiment 2](#experiment-2)  
  Predict env_broad_scale using one-hot-encoded NER results and GOLD paths as features.  
  10 iteration AUC: **0.621**   
  100 iteration AUC: **0.714**   
* [experiment 3](#experiment-3)  
  Predict env_broad_scale using just the GOLD paths as features.  
  10 iteration AUC: **0.600**   
  100 iteration AUC: **0.713**  
* [experiment 4](#experiment-4)  
  Note: this is the slowest experiment.  
  Predict env_broad_scale using just the text columns as features: biosample_name, description, habitat, and sample_collection_site  
  10 iteration AUC: **0.710**   
  100 iteration AUC: **0.718** 
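
The AUC values listed above can be compared side by side. The sketch below copies the `bestTest` values from the training logs further down in this notebook and computes, for each experiment, the gain from running 100 iterations instead of 10. The dictionary layout and variable names are illustrative only.

```python
# Best eval-set AUC per experiment, copied from the bestTest lines of the training logs.
best_auc = {
    "experiment 1": {10: 0.6211100304, 100: 0.7146289515},
    "experiment 2": {10: 0.6211100304, 100: 0.7146289515},
    "experiment 3": {10: 0.6002596345, 100: 0.7129017198},
    "experiment 4": {10: 0.7109577871, 100: 0.7182774832},
}

# Gain from running 100 iterations instead of 10.
gain = {name: round(auc[100] - auc[10], 4) for name, auc in best_auc.items()}

# Rank experiments by their 100-iteration AUC, best first.
ranking = sorted(best_auc, key=lambda name: best_auc[name][100], reverse=True)

for name in ranking:
    print(f"{name}: AUC@100={best_auc[name][100]:.3f}, gain over 10 iterations={gain[name]:+.4f}")
```

Experiments 1 and 2 report identical values in the logs, so their relative order in the ranking carries no meaning.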

In [2]:
import pandas as pd
from catboost import CatBoostClassifier, Pool  # only these two names are used below
from catboost.utils import get_confusion_matrix, get_roc_curve
from sklearn.model_selection import train_test_split

# load biosample data before NER <a id="load-biosample-data-before-NER"/>

In [3]:
biosampleDf = pd.read_csv('../../downloads/nmdc-gold-path-ner/nmdc-biosample-table-for-ner-20201016.tsv', sep='\t')
biosampleDf.rename(columns={"GOLD_ID": "gold_id"}, inplace=True) # lower case gold id
biosampleDf.drop_duplicates(inplace=True)
len(biosampleDf)

32236

Drop rows where any of env_broad_scale, env_local_scale, or env_medium is null.

In [4]:
biosampleDf = biosampleDf[biosampleDf["ENV_BROAD_SCALE"].notnull()]
biosampleDf = biosampleDf[biosampleDf["ENV_LOCAL_SCALE"].notnull()]
biosampleDf = biosampleDf[biosampleDf["ENV_MEDIUM"].notnull()]
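
The three filters above can also be expressed as a single `dropna` call. The following is a minimal sketch on a small made-up frame (the `toy` rows and the `triad_cols` name are assumptions for illustration); it should keep the same rows as chaining the three `notnull()` filters.

```python
import pandas as pd

# made-up rows: only Gb1 and Gb4 have all three triad values
toy = pd.DataFrame({
    "gold_id": ["Gb1", "Gb2", "Gb3", "Gb4"],
    "ENV_BROAD_SCALE": ["ENVO_01000253", None, "ENVO_01000252", "ENVO_01000252"],
    "ENV_LOCAL_SCALE": ["ENVO_00000022", "ENVO_00000021", None, "ENVO_00000021"],
    "ENV_MEDIUM": ["ENVO_01000599", "ENVO_00000546", "ENVO_00000546", "ENVO_00000546"],
})

triad_cols = ["ENV_BROAD_SCALE", "ENV_LOCAL_SCALE", "ENV_MEDIUM"]

# drop a row if any of the triad columns is null
complete = toy.dropna(subset=triad_cols)
print(complete["gold_id"].tolist())
```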

In [4]:
len(biosampleDf)

26846

In [5]:
biosampleDf.head() # peek at data

Unnamed: 0,gold_id,BIOSAMPLE_NAME,DESCRIPTION,HABITAT,IDENTIFIER,SAMPLE_COLLECTION_SITE,ECOSYSTEM,ECOSYSTEM_CATEGORY,ECOSYSTEM_TYPE,ECOSYSTEM_SUBTYPE,SPECIFIC_ECOSYSTEM,BROAD_SCALE_LABEL,LOCAL_SCALE_LABEL,MEDIUM_LABEL,ENV_BROAD_SCALE,ENV_LOCAL_SCALE,ENV_MEDIUM
0,Gb0173867,Freshwater microbial communities from Amazon R...,Freshwater microbial communities from Amazon R...,Freshwater,RCJ6,river water,Environmental,Aquatic,Freshwater,River,Unclassified,freshwater river biome,river,river water,ENVO_01000253,ENVO_00000022,ENVO_01000599
1,Gb0173872,Freshwater microbial communities from Amazon R...,Freshwater microbial communities from Amazon R...,Freshwater,RCJ3,river water,Environmental,Aquatic,Freshwater,River,Unclassified,freshwater river biome,river,river water,ENVO_01000253,ENVO_00000022,ENVO_01000599
2,Gb0173903,Lake sediment microbial communtites from St. P...,Lake sediment microbial communtites from St. P...,Lake sediment,PH082_579,Lake sediment,Environmental,Aquatic,Freshwater,Lake,Sediment,freshwater lake biome,freshwater lake,lake sediment,ENVO_01000252,ENVO_00000021,ENVO_00000546
3,Gb0173935,Lake sediment microbial communtites from St. P...,Lake sediment microbial communtites from St. P...,Lake sediment,PH-EC31_na,Lake sediment,Environmental,Aquatic,Freshwater,Lake,Sediment,freshwater lake biome,freshwater lake,lake sediment,ENVO_01000252,ENVO_00000021,ENVO_00000546
4,Gb0173942,Freshwater microbial communities from thermoka...,Freshwater microbial communities from thermoka...,Freshwater,,Thermokarst lake,Environmental,Aquatic,Freshwater,Lake,Unclassified,freshwater lake biome,thermokarst lake,lake water,ENVO_01000252,ENVO_03000082,ENVO_04000007


# load runNER output <a id="load-runNER-output"/>
`runNER` was used to perform NER on the biosample_name, description, habitat, and sample_collection_site fields  
cf. biosample-analysis issue [#47](https://github.com/INCATools/biosample-analysis/issues/47)

In [6]:
nerDf = pd.read_csv('../../downloads/nmdc-gold-path-ner/runner/runNER_Output.tsv', sep='\t')

In [7]:
nerDf.head() # peek at data

Unnamed: 0,DOCUMENT ID,TYPE,START POSITION,END POSITION,MATCHED TERM,PREFERRED FORM,ENTITY ID,ZONE,SENTENCE ID,ORIGIN,UMLS CUI,SENTENCE
0,Gb0173867,biolink:OntologyClass,0,10,Freshwater,fresh water,ENVO:00002011_SYNONYM,,S1,envo.json,CUI-less,Freshwater microbial communities from Amazon R...
1,Gb0173867,biolink:OntologyClass,45,50,River,river,ENVO:00000022,,S1,envo.json,CUI-less,Freshwater microbial communities from Amazon R...
2,Gb0173867,biolink:OntologyClass,67,77,Freshwater,fresh water,ENVO:00002011_SYNONYM,,S2,envo.json,CUI-less,Freshwater microbial communities from Amazon R...
3,Gb0173867,biolink:OntologyClass,112,117,River,river,ENVO:00000022,,S2,envo.json,CUI-less,Freshwater microbial communities from Amazon R...
4,Gb0173867,biolink:OntologyClass,127,137,Freshwater,fresh water,ENVO:00002011_SYNONYM,,S3,envo.json,CUI-less,Freshwater.


# load one-hot-encoded runNER output <a id="load-one-hot-encoded-runNER-output"/>

In [8]:
onehotDf = pd.read_csv('../../target/nmdc-biosample-one-hot.tsv', sep='\t')
onehotDf.drop_duplicates(inplace=True)
len(onehotDf)

25015

In [9]:
onehotDf.head()

Unnamed: 0,gold_id,fresh water,river,river water,water,liquid water,saline evaporation pond,lake bed,container of an intermittent saline lake,bayou,...,cave entrance,house,flume,irrigation canal,canalized stream,drainage canal,irrigation ditch,canal,chernozem,Earth
0,Gb0173867,1,1,1,1,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Gb0173872,1,1,1,1,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Gb0173903,0,0,0,0,0,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
3,Gb0173935,0,0,0,0,0,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
4,Gb0173942,1,0,0,0,0,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
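
The one-hot file is loaded here ready-made; how it was produced is not shown in this notebook. As a sketch of how a table of this shape could be derived from NER output, the example below pivots a few made-up rows (using the `DOCUMENT ID` and `PREFERRED FORM` column names from the runNER output) with `pd.crosstab`. The actual pipeline may differ.

```python
import pandas as pd

# made-up NER rows in the same long format as runNER_Output.tsv
ner = pd.DataFrame({
    "DOCUMENT ID": ["Gb0173867", "Gb0173867", "Gb0173867", "Gb0173903"],
    "PREFERRED FORM": ["fresh water", "river", "fresh water", "lake bed"],
})

# count term occurrences per document, then cap the counts at 1 to get 0/1 indicators
onehot = pd.crosstab(ner["DOCUMENT ID"], ner["PREFERRED FORM"]).clip(upper=1)

# tidy up the axis names so the result has a plain gold_id column
onehot.columns.name = None
onehot.index.name = "gold_id"
onehot = onehot.reset_index()
print(onehot)
```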


**There are fewer rows in the one-hot output (25015) than in the biosample data (26846), so subset the biosample data to the ids present in the one-hot output.**  
**Note:** the subset ends up with fewer rows than the one-hot-encoded data because rows with null triad values were dropped from the biosample data above.

In [11]:
onehotIds = list(onehotDf["gold_id"])
len(onehotIds)

25015

In [12]:
# take an explicit copy so that the in-place drop_duplicates does not act on a slice of biosampleDf
subsetDf = biosampleDf[biosampleDf["gold_id"].isin(onehotIds)].copy()
subsetDf.drop_duplicates(inplace=True)
len(subsetDf)


20220

**The subset (20220 rows) is now smaller than the one-hot data (25015 rows).**  
Subset the one-hot data to the ids in the subset so that both have the same number of rows.

In [16]:
subsetIds = list(subsetDf["gold_id"])
onehotDf = onehotDf[onehotDf["gold_id"].isin(subsetIds)]
onehotDf.drop_duplicates(inplace=True)
assert len(onehotDf) == len(subsetDf)
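
The assertion above checks only that the two frames have the same number of rows. Because `train_test_split` later splits features and targets by position, it may also be worth confirming that both frames list the `gold_id` values in the same order; whether the two input files already do so is not shown here. A minimal sketch on made-up frames (`features`, `labels`, and their contents are hypothetical):

```python
import pandas as pd

# made-up frames whose rows are in a different gold_id order
features = pd.DataFrame({"gold_id": ["Gb2", "Gb1", "Gb3"], "river": [0, 1, 0]})
labels = pd.DataFrame({"gold_id": ["Gb1", "Gb2", "Gb3"],
                       "ENV_BROAD_SCALE": ["a", "b", "b"]})

# a left merge keeps the key order of the left frame, so the features follow the labels;
# validate raises if a gold_id occurs more than once on either side
features_aligned = labels[["gold_id"]].merge(
    features, on="gold_id", how="left", validate="one_to_one"
)

# row i of the features now describes the same biosample as row i of the labels
assert features_aligned["gold_id"].tolist() == labels["gold_id"].tolist()
```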

# create target sets <a id="create-target-sets" /> 
A target set is created for each of the MIxS triad terms.

In [18]:
# the sklearn convention is to use y for targets/labels
y_broad = subsetDf[["ENV_BROAD_SCALE"]]
y_local = subsetDf[["ENV_LOCAL_SCALE"]]
y_medium = subsetDf[["ENV_MEDIUM"]]
# y_broad["ENV_BROAD_SCALE"].unique() # peek at data

# experiment 1 <a id="experiment-1"/>
Create feature set using just the one-hot-encoded data.

In [19]:
X = onehotDf.copy()
X.pop("gold_id") # drop the gold_id
assert len(X) == len(subsetDf) # verify that the lengths are the same

## classify env_broad_scale

In [38]:
cat_features = list(X.columns)

In [23]:
# if we try to stratify (i.e., pass the param stratify=y_broad["ENV_BROAD_SCALE"].values) it results in an error:
# The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
X_train, X_test, y_train, y_test = train_test_split(X, y_broad, test_size=0.2, random_state=42)
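
The stratification error quoted in the comment arises because at least one class has a single member. One possible workaround, sketched here on made-up labels, is to set aside classes with fewer than two members before splitting. Whether dropping rare classes (rather than grouping them) is acceptable depends on the analysis; the class names and `min_members` are illustrative assumptions.

```python
import pandas as pd

# made-up target labels: ENVO_C occurs only once
y = pd.Series(["ENVO_A"] * 4 + ["ENVO_B"] * 3 + ["ENVO_C"], name="ENV_BROAD_SCALE")

min_members = 2  # smallest class size that stratified splitting accepts
counts = y.value_counts()
rare = counts[counts < min_members].index

# keep only rows whose class is large enough to be stratified
keep = ~y.isin(rare)
y_stratifiable = y[keep]

print(f"dropped classes: {list(rare)}; rows kept: {len(y_stratifiable)} of {len(y)}")
```

The same boolean mask would have to be applied to the feature frame so that features and targets stay aligned.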

#### run classifier for 10 iterations

In [20]:
# see https://www.kaggle.com/mitribunskiy/tutorial-catboost-overview for param examples
# supported metrics https://catboost.ai/en/references/custom-metric__supported-metrics
model_10 = CatBoostClassifier(iterations=10, eval_metric='AUC', random_seed=1, use_best_model=True, verbose=5)

In [39]:
# set logging_level='Silent' to turn off output; set plot=True to see figure
model_10.fit(X_train, y_train, eval_set=(X_test, y_test), cat_features=cat_features)

Learning rate set to 0.5
0:	test: 0.6211100	best: 0.6211100 (0)	total: 389ms	remaining: 3.5s
5:	test: 0.5658558	best: 0.6211100 (0)	total: 2.24s	remaining: 1.5s
9:	test: 0.5969272	best: 0.6211100 (0)	total: 3.76s	remaining: 0us

bestTest = 0.6211100304
bestIteration = 0

Shrink model to first 1 iterations.
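
With `use_best_model=True`, CatBoost keeps only the trees up to the iteration with the best eval-set metric, which is why the log ends by shrinking the model to a single iteration. The conceptual sketch below (not CatBoost's internal code) reproduces that selection from the three AUC values printed in the log; iterations that were not printed are omitted.

```python
# AUC on the eval set at the iterations printed in the log above (verbose=5)
logged_auc = {0: 0.6211100, 5: 0.5658558, 9: 0.5969272}

# the best iteration is the one with the highest AUC
best_iteration = max(logged_auc, key=logged_auc.get)

# iterations are zero-based, so keeping up to iteration k means keeping k + 1 trees
trees_kept = best_iteration + 1

print(f"bestIteration = {best_iteration}, model shrunk to first {trees_kept} iteration(s)")
```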


<catboost.core.CatBoostClassifier at 0x144838df0>

#### run classifier for 100 iterations

In [26]:
model_100 = CatBoostClassifier(iterations=100, eval_metric='AUC', random_seed=1, use_best_model=True, verbose=20)

In [40]:
# set logging_level='Silent' to turn off output; set plot=True to see figure
model_100.fit(X_train, y_train, eval_set=(X_test, y_test), cat_features=cat_features)

Learning rate set to 0.275109
0:	test: 0.6211100	best: 0.6211100 (0)	total: 403ms	remaining: 39.9s
20:	test: 0.6976440	best: 0.6976440 (20)	total: 8.26s	remaining: 31.1s
40:	test: 0.7087091	best: 0.7087091 (40)	total: 17s	remaining: 24.4s
60:	test: 0.7123104	best: 0.7123104 (60)	total: 25.6s	remaining: 16.4s
80:	test: 0.7137574	best: 0.7137574 (80)	total: 34.3s	remaining: 8.05s
99:	test: 0.7146239	best: 0.7146290 (97)	total: 42.7s	remaining: 0us

bestTest = 0.7146289515
bestIteration = 97

Shrink model to first 98 iterations.


<catboost.core.CatBoostClassifier at 0x144844d00>

# experiment 2 <a id="experiment-2"/>
Create feature set by adding gold path info to one-hot-encoded data

In [41]:
# list of biosample columns to use (note: need to include gold_id for merging)
gold_paths = ["gold_id", "ECOSYSTEM", "ECOSYSTEM_CATEGORY", "ECOSYSTEM_TYPE", "ECOSYSTEM_SUBTYPE", "SPECIFIC_ECOSYSTEM"]

# the sklearn convention is to use X for the feature set
X = pd.merge(onehotDf, subsetDf[gold_paths], how="inner", on="gold_id")
X.pop("gold_id") # drop the gold_id
assert len(X) == len(subsetDf) # verify that the lengths are the same

## classify env_broad_scale

In [42]:
# after the merge, X already contains the GOLD path columns, so listing them again would only add duplicates
cat_features = list(X.columns)

In [43]:
# if we try to stratify (i.e., pass the param stratify=y_broad["ENV_BROAD_SCALE"].values) it results in an error:
# The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
X_train, X_test, y_train, y_test = train_test_split(X, y_broad, test_size=0.2, random_state=42)

#### run classifier for 10 iterations

In [44]:
# see https://www.kaggle.com/mitribunskiy/tutorial-catboost-overview for param examples
# supported metrics https://catboost.ai/en/references/custom-metric__supported-metrics
model_10 = CatBoostClassifier(iterations=10, eval_metric='AUC', random_seed=1, use_best_model=True, verbose=5)

In [45]:
# set logging_level='Silent' to turn off output; set plot=True to see figure
model_10.fit(X_train, y_train, cat_features=cat_features, eval_set=(X_test, y_test))

Learning rate set to 0.5
0:	test: 0.6211100	best: 0.6211100 (0)	total: 406ms	remaining: 3.66s
5:	test: 0.5658558	best: 0.6211100 (0)	total: 2.25s	remaining: 1.5s
9:	test: 0.5969272	best: 0.6211100 (0)	total: 3.73s	remaining: 0us

bestTest = 0.6211100304
bestIteration = 0

Shrink model to first 1 iterations.


<catboost.core.CatBoostClassifier at 0x144a755e0>

#### run classifier for 100 iterations

In [46]:
model_100 = CatBoostClassifier(iterations=100, eval_metric='AUC', random_seed=1, use_best_model=True, verbose=20)

In [47]:
# set logging_level='Silent' to turn off output; set plot=True to see figure
model_100.fit(X_train, y_train, cat_features=cat_features, eval_set=(X_test, y_test))

Learning rate set to 0.275109
0:	test: 0.6211100	best: 0.6211100 (0)	total: 473ms	remaining: 46.8s
20:	test: 0.6976440	best: 0.6976440 (20)	total: 8.77s	remaining: 33s
40:	test: 0.7087091	best: 0.7087091 (40)	total: 17.5s	remaining: 25.2s
60:	test: 0.7123104	best: 0.7123104 (60)	total: 26.2s	remaining: 16.7s
80:	test: 0.7137574	best: 0.7137574 (80)	total: 34.9s	remaining: 8.18s
99:	test: 0.7146239	best: 0.7146290 (97)	total: 43.2s	remaining: 0us

bestTest = 0.7146289515
bestIteration = 97

Shrink model to first 98 iterations.


<catboost.core.CatBoostClassifier at 0x188988130>

# experiment 3 <a id="experiment-3" />
Create feature set using only GOLD paths

In [63]:
# list of GOLD path columns to use as features (gold_id is not needed here because no merge is performed)
gold_paths = ["ECOSYSTEM", "ECOSYSTEM_CATEGORY", "ECOSYSTEM_TYPE", "ECOSYSTEM_SUBTYPE", "SPECIFIC_ECOSYSTEM"]

# the sklearn convention is to use X for the feature set
X = subsetDf[gold_paths]
assert len(X) == len(subsetDf) # verify that the lengths are the same

## classify env_broad_scale

In [64]:
cat_features = list(X.columns)

In [65]:
# if we try to stratify (i.e., pass the param stratify=y_broad["ENV_BROAD_SCALE"].values) it results in an error:
# The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
X_train, X_test, y_train, y_test = train_test_split(X, y_broad, test_size=0.2, random_state=42)

#### run classifier for 10 iterations

In [66]:
# see https://www.kaggle.com/mitribunskiy/tutorial-catboost-overview for param examples
# supported metrics https://catboost.ai/en/references/custom-metric__supported-metrics
model_10 = CatBoostClassifier(iterations=10, eval_metric='AUC', random_seed=1, use_best_model=True, verbose=5)

In [67]:
# set logging_level='Silent' to turn off output; set plot=True to see figure
model_10.fit(X_train, y_train, eval_set=(X_test, y_test), cat_features=cat_features)

Learning rate set to 0.5
0:	test: 0.6002596	best: 0.6002596 (0)	total: 420ms	remaining: 3.78s
5:	test: 0.4803720	best: 0.6002596 (0)	total: 2.04s	remaining: 1.36s
9:	test: 0.4666306	best: 0.6002596 (0)	total: 3.37s	remaining: 0us

bestTest = 0.6002596345
bestIteration = 0

Shrink model to first 1 iterations.


<catboost.core.CatBoostClassifier at 0x188036df0>

#### run classifier for 100 iterations

In [68]:
model_100 = CatBoostClassifier(iterations=100, eval_metric='AUC', random_seed=1, use_best_model=True, verbose=20)

In [69]:
# set logging_level='Silent' to turn off output; set plot=True to see figure
model_100.fit(X_train, y_train, eval_set=(X_test, y_test), cat_features=cat_features)

Learning rate set to 0.275109
0:	test: 0.6002596	best: 0.6002596 (0)	total: 468ms	remaining: 46.4s
20:	test: 0.7106030	best: 0.7106030 (20)	total: 8.4s	remaining: 31.6s
40:	test: 0.7125787	best: 0.7125787 (40)	total: 16.5s	remaining: 23.7s
60:	test: 0.7128292	best: 0.7128394 (55)	total: 24.5s	remaining: 15.7s
80:	test: 0.7128616	best: 0.7128616 (80)	total: 32.8s	remaining: 7.7s
99:	test: 0.7128956	best: 0.7129017 (97)	total: 40.3s	remaining: 0us

bestTest = 0.7129017198
bestIteration = 97

Shrink model to first 98 iterations.


<catboost.core.CatBoostClassifier at 0x188967a90>

# experiment 4 <a id="experiment-4"/>
Create feature set using the textual columns of the biosample data: biosample_name, description, habitat, and sample_collection_site

In [73]:
text_cols = ["BIOSAMPLE_NAME", "DESCRIPTION", "HABITAT", "SAMPLE_COLLECTION_SITE"]

In [74]:
# the sklearn convention is to use X for the feature set
X = subsetDf[text_cols]
assert len(X) == len(subsetDf) # verify that the lengths are the same 

## classify env_broad_scale

In [76]:
# if we try to stratify (i.e., pass the param stratify=y_broad["ENV_BROAD_SCALE"].values) it results in an error:
# The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
X_train, X_test, y_train, y_test = train_test_split(X, y_broad, test_size=0.2, random_state=42)

Use Pools as a convenience for combining features.  
See https://towardsdatascience.com/unconventional-sentiment-analysis-bert-vs-catboost-90645f2437a9

In [78]:
train_pool = Pool(
    data=X_train,
    label=y_train,
    text_features=text_cols
)

test_pool = Pool(
    data=X_test, 
    label=y_test,
    text_features=text_cols
)

#### run classifier for 10 iterations

In [79]:
# see https://www.kaggle.com/mitribunskiy/tutorial-catboost-overview for param examples
# supported metrics https://catboost.ai/en/references/custom-metric__supported-metrics
model_10 = CatBoostClassifier(iterations=10, eval_metric='AUC', random_seed=1, use_best_model=True, verbose=5)

In [81]:
# set logging_level='Silent' to turn off output; set plot=True to see figure
model_10.fit(train_pool, eval_set=test_pool)

Learning rate set to 0.5
0:	test: 0.5635498	best: 0.5635498 (0)	total: 4.69s	remaining: 42.3s
5:	test: 0.7025089	best: 0.7025089 (5)	total: 30.8s	remaining: 20.5s
9:	test: 0.7109578	best: 0.7109578 (9)	total: 52s	remaining: 0us

bestTest = 0.7109577871
bestIteration = 9



<catboost.core.CatBoostClassifier at 0x1448dadc0>

#### run classifier for 100 iterations

In [83]:
model_100 = CatBoostClassifier(iterations=100, eval_metric='AUC', random_seed=1, use_best_model=True, verbose=20)

In [85]:
# set logging_level='Silent' to turn off output; set plot=True to see figure
model_100.fit(train_pool, eval_set=test_pool)

Learning rate set to 0.275109
0:	test: 0.5635498	best: 0.5635498 (0)	total: 4.89s	remaining: 8m 4s
20:	test: 0.7150257	best: 0.7150257 (20)	total: 1m 55s	remaining: 7m 13s
40:	test: 0.7162352	best: 0.7162352 (40)	total: 3m 56s	remaining: 5m 40s
60:	test: 0.7171351	best: 0.7171351 (60)	total: 6m	remaining: 3m 50s
80:	test: 0.7180597	best: 0.7180597 (80)	total: 8m 4s	remaining: 1m 53s
99:	test: 0.7182775	best: 0.7182775 (99)	total: 10m 21s	remaining: 0us

bestTest = 0.7182774832
bestIteration = 99



<catboost.core.CatBoostClassifier at 0x188051940>

# tuning ...

In [None]:
train_pool = Pool(
    data=X_train,
    label=y_train,
    text_features=text_cols
)

test_pool = Pool(
    data=X_test, 
    label=y_test,
    text_features=text_cols
)

In [148]:
tokenizer = \
    {
        'tokenizer_id': 'Sense', # name used to refer to this tokenizer elsewhere in the text-processing options
        'separator_type': 'BySense',
        'lowercasing': 'True',
        #'lemmatizing': 'True', # not implemented
        'token_types':['Word', 'Number', 'SentenceBreak'],
        'sub_tokens_policy':'SeveralTokens',
        #'sub_tokens_policy':'SingleToken'
    }  

dictionary = \
    {
        'dictionary_id': 'Word',
        'max_dictionary_size': '50000',
        'dictionary_type': 'FrequencyBased',
        #'dictionary_type': 'Bpe',
        'num_bpe_units': 3
    }

# feature_calcer = 'BoW:top_tokens_count=10000' # 10 iterations -> 0.692
feature_calcer = 'NaiveBayes' # 10 iterations -> 0.715
# feature_calcer = 'BM25' # 10 iterations -> 0.716
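
To give an intuition for what a bag-of-words feature calcer does with a text column, here is a conceptual sketch in plain Python. It is not CatBoost's implementation: the whitespace tokenizer, the sample texts, and the value of `top_tokens_count` are assumptions chosen for illustration. Each text becomes a 0/1 vector recording which of the most frequent tokens it contains.

```python
from collections import Counter

# made-up text values, similar in style to the HABITAT column
texts = ["Lake sediment", "river water", "Lake water"]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


# count how often each token occurs across all texts
token_counts = Counter(tok for text in texts for tok in tokenize(text))

# keep only the most frequent tokens as the vocabulary
top_tokens_count = 2  # illustrative value
vocab = [tok for tok, _ in token_counts.most_common(top_tokens_count)]

# one 0/1 indicator per vocabulary token for every text
bow = [[int(tok in tokenize(text)) for tok in vocab] for text in texts]

for text, row in zip(texts, bow):
    print(text, dict(zip(vocab, row)))
```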

In [149]:
model_10 = CatBoostClassifier(
    iterations=10, 
    eval_metric='AUC', 
    random_seed=1, 
    use_best_model=True, 
    verbose=5,
    #tokenizers=[tokenizer], 
    #dictionaries=[dictionary],
    feature_calcers=[feature_calcer]
)

In [150]:
model_10.fit(train_pool, eval_set=test_pool)

Learning rate set to 0.5
0:	test: 0.5662492	best: 0.5662492 (0)	total: 895ms	remaining: 8.06s
5:	test: 0.7109790	best: 0.7109790 (5)	total: 5.81s	remaining: 3.87s
9:	test: 0.7150713	best: 0.7150713 (9)	total: 10.2s	remaining: 0us

bestTest = 0.7150712732
bestIteration = 9



<catboost.core.CatBoostClassifier at 0x1874b17f0>

In [151]:
model_100 = CatBoostClassifier(
    iterations=100,
    eval_metric='AUC', 
    random_seed=1, 
    use_best_model=True, 
    verbose=5,
    #tokenizers=[tokenizer], 
    #dictionaries=[dictionary],
    feature_calcers=[feature_calcer]
)

In [152]:
model_100.fit(train_pool, eval_set=test_pool)

Learning rate set to 0.275109
0:	test: 0.5662492	best: 0.5662492 (0)	total: 880ms	remaining: 1m 27s
5:	test: 0.7072197	best: 0.7072197 (5)	total: 5.42s	remaining: 1m 24s
10:	test: 0.7154631	best: 0.7154631 (10)	total: 10.6s	remaining: 1m 25s
15:	test: 0.7163519	best: 0.7163519 (15)	total: 15.1s	remaining: 1m 19s
20:	test: 0.7166305	best: 0.7166305 (20)	total: 20s	remaining: 1m 15s
25:	test: 0.7168103	best: 0.7170166 (22)	total: 24.9s	remaining: 1m 10s
30:	test: 0.7170744	best: 0.7171693 (29)	total: 30.1s	remaining: 1m 6s
35:	test: 0.7172732	best: 0.7172732 (35)	total: 36.1s	remaining: 1m 4s
40:	test: 0.7172733	best: 0.7173024 (36)	total: 40.7s	remaining: 58.6s
45:	test: 0.7174521	best: 0.7174521 (45)	total: 45.4s	remaining: 53.3s
50:	test: 0.7177022	best: 0.7177022 (50)	total: 49.9s	remaining: 47.9s
55:	test: 0.7177504	best: 0.7177504 (55)	total: 54.6s	remaining: 42.9s
60:	test: 0.7180693	best: 0.7180948 (59)	total: 59.4s	remaining: 38s
65:	test: 0.7183358	best: 0.7183358 (65)	total: 1

<catboost.core.CatBoostClassifier at 0x18803f370>