# Language-based generative models

LSTM- and GPT-based vanilla chemical language models have been implemented, and their basic usage is explained here. The development history and the statistics used for parameter selection are reported separately in `lstm_generative_model_flow.md`.

Currently, the LSTM-based model is stable and should be used for language modeling. The GPT-based model works, but its behavior under particular hyperparameters (`nblocks`, `nheads`, etc.) is still hard to understand.

In [2]:
import sys
sys.path.append('../')
from src.paths import ensure_dirs, PRETRAIN_DATA, PRETRAIN_RESULTS, FINETUNE_FILTER
ensure_dirs()
print(f'Fine-tuning data status: {FINETUNE_FILTER}')

Fine-tuning data status: filtered


## 0. Function to run command line application on a jupyter notebook

In [2]:
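def importstr(module_str, from_=None):
	"""
	Import a module (and optionally one attribute of it) given as a string.
	module_str: module to be loaded, as a string ('pkg.mod:attr' is also accepted)
	>>> importstr('os') -> <module 'os'>
	"""
	if (from_ is None) and ':' in module_str:
		# split on the last ':' only, so exactly two parts are unpacked
		module_str, from_ = module_str.rsplit(':', 1)
	# __import__('a.b') returns the top-level package 'a', so walk down to the submodule
	module = __import__(module_str)
	for sub_str in module_str.split('.')[1:]:
		module = getattr(module, sub_str)
	
	if from_:
		try:
			return getattr(module, from_)
		except AttributeError:
			raise ImportError(f'{module_str}.{from_}')
	return module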


def run(app, *argv):
	argv=list(argv)
	app_cls=importstr(*app.rsplit('.',1))
	app_cls(argv).main()
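
The helper above turns a dotted string into a class before calling it. As a rough sketch of the same idea, the standard library's `importlib.import_module` can do the module lookup directly. The function name `resolve` is hypothetical and is not part of this repository.

```python
import importlib


def resolve(dotted_path):
    """Resolve 'package.module.Attr' to the object it names."""
    module_path, _, attr_name = dotted_path.rpartition('.')
    if not module_path:
        # No dot at all: treat the whole string as a module name.
        return importlib.import_module(dotted_path)
    module = importlib.import_module(module_path)
    try:
        return getattr(module, attr_name)
    except AttributeError as err:
        raise ImportError(f'{module_path}.{attr_name}') from err


ordered_dict_cls = resolve('collections.OrderedDict')
print(ordered_dict_cls)
```

Unlike `__import__`, `importlib.import_module('a.b')` returns the submodule `a.b` itself, so no manual `getattr` walk is needed.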

## 1. Tokenize SMILES
To determine which tokens are used ('[C@H]', 'c', '8', etc.), a data set is tokenized to calculate the frequency of tokens. This process can be skipped if you already know the tokens to be used for your generative models.

### 1. Calculate frequency of tokens
To calculate the frequency of tokens, `apps.count_token_frequency.py` is used. An example data set is also provided in the data folder.
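
The counting step can be illustrated with a small sketch. The tokenizer used inside the app is not shown in this notebook, so the regular expression below is an assumption: it treats bracket atoms, `Br`, `Cl`, and two-digit ring closures as single tokens and everything else as one character per token.

```python
import re
from collections import Counter

# Bracket atoms first, then two-letter halogens, then %nn ring closures,
# and finally any single remaining character.
SMILES_TOKEN = re.compile(r'\[[^\]]+\]|Br|Cl|%\d{2}|.')


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def count_token_frequency(smiles_list):
    """Count how often each token occurs over all SMILES strings."""
    counts = Counter()
    for smi in smiles_list:
        counts.update(tokenize(smi))
    return counts


example_smiles = ['C[C@H](N)c1ccccc1Cl', 'O=C(O)c1ccccc1Br']
example_freq = count_token_frequency(example_smiles)
for token, n in example_freq.most_common():
    print(f'{token}\t{n}')
```

Ordering the alternatives this way matters: `Cl` must be tried before the single-character fallback, otherwise it would be split into `C` and `l`.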



To run the script as a command-line program, use the following command. To run it in a notebook, run the code in the next cell.
```
$ python generative_models/apps/count_token.py --data data/top2000_curated_cpds_chembl31.tsv --smi-colname washed_openeye_smiles --outdir tokenize_results --heavy-atom-ratio-thres 0.95
```

In [None]:
dataset_list = ['pubchem_filtered_ac', 
                'pubchem_unfiltered_ac', 
                'pubchem_inac', 
                'chembl_filtered', 
                'chembl_unfiltered', 
                'zinc']

In [None]:
import sys 
sys.path.append('../src/model/')

for dataset in dataset_list:
	run("generative_models.apps.CountTokens.CountTokenFreqApp",
		f"--data={PRETRAIN_DATA}/{dataset}_rdsmi3.tsv",
		"--smi-colname=rdkit_smiles",
		f"--outdir={PRETRAIN_RESULTS}/{dataset}_results/tokenize_results",
		"--heavy-atom-ratio-thres=0.95")

Start logging
Loaded molecules: 125149


cannnot handeld mols in rdkit: 0
passed mols: 125149
maximum heavy atom number is 128
minimum heavy atom number is 5
the NHA percentile 0.95
the NHA threshold 38.0
selected molecules: 119665
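
The log above reports a heavy-atom (NHA) percentile of 0.95 and a resulting threshold of 38. How the app computes this is not shown here; the sketch below assumes a linearly interpolated percentile and keeps molecules at or below the threshold. The function names and the toy counts are made up for illustration.

```python
import math


def percentile(values, q):
    """Quantile q (0 <= q <= 1) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def select_by_heavy_atoms(nha_by_mol, q=0.95):
    """Return the NHA threshold and the molecules at or below it (assumed rule)."""
    threshold = percentile(nha_by_mol.values(), q)
    kept = {mol: n for mol, n in nha_by_mol.items() if n <= threshold}
    return threshold, kept


# Toy data: 100 hypothetical molecules with 1..100 heavy atoms.
toy_counts = {f'mol{i}': i for i in range(1, 101)}
threshold, kept = select_by_heavy_atoms(toy_counts)
print(threshold, len(kept))
```

Filtering by a percentile rather than a fixed size removes the few very large molecules that would otherwise dictate the maximum sequence length.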


Then, you can find the **tokenize_results** folder in the output directory specified by the input arguments. By looking at the token frequencies in `token_frequency.tsv`, you can determine which tokens to use for the subsequent analysis, as follows.

In [5]:
import pandas as pd 
freq = pd.read_csv(f'{PRETRAIN_RESULTS}/{dataset}_results/tokenize_results/token_frequency.tsv', sep='\t', index_col=0)
print(freq)


        frequency
c         1342612
C          898109
(          510468
)          510468
1          352208
O          325534
2          260868
=          239737
N          201959
n          135041
3          118084
S           39204
F           34531
-           28161
4           27192
Cl          23110
s           19913
/           18058
o           15958
[C@H]       15689
[C@@H]      15345
[nH]        11740
[N+]         7766
[O-]         7745
#            7281
Br           6537
\            4300
5            2970
I             767
[C@@]         516
[C@]          501
P             342
[n+]          306
6             198
7              18
[c-]           15
[S@@]           9
B               6
[S+]            6
[P+]            3
[s+]            3
[As]            3


There is a gap in frequency between `[n+]` and `6`, as shown below:

|Token|Frequency|
|----|----|
|[S+]|183|
|[n+]|178|
|6|44|

Thus, tokens with a frequency greater than 100 are employed here.


In [6]:
ok_tokens = freq[freq['frequency'] > 100].index
ok_tokens

Index(['c', 'C', '(', ')', '1', 'O', '2', '=', 'N', 'n', '3', 'S', 'F', '-',
       '4', 'Cl', 's', '/', 'o', '[C@H]', '[C@@H]', '[nH]', '[N+]', '[O-]',
       '#', 'Br', '\', '5', 'I', '[C@@]', '[C@]', 'P', '[n+]', '6'],
      dtype='object')
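
As a quick cross-check of the cutoff, the sketch below takes the tail of the `token_frequency.tsv` table printed earlier in this notebook and looks for the largest relative drop between neighboring tokens. The helper names are hypothetical; the strict `>` comparison mirrors the pandas filter in the cell above.

```python
# Tail of the frequency table printed earlier (token -> count).
tail = {'I': 767, '[C@@]': 516, '[C@]': 501, 'P': 342, '[n+]': 306,
        '6': 198, '7': 18, '[c-]': 15, '[S@@]': 9, 'B': 6}


def select_tokens(freq, threshold):
    """Keep tokens whose count is strictly greater than the threshold."""
    return [tok for tok, n in freq.items() if n > threshold]


def largest_drop(freq):
    """Return the adjacent pair (descending order) with the largest count ratio."""
    ranked = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)
    pairs = zip(ranked, ranked[1:])
    return max(pairs, key=lambda p: p[0][1] / p[1][1])


same_selection = select_tokens(tail, 100) == select_tokens(tail, 50)
(upper, _), (lower, _) = largest_drop(tail)
print(same_selection, upper, lower)
```

For these particular counts, any threshold between 18 and 198 selects the same tokens, so the exact value is not critical for this data set.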

### 2. Select eligible molecules and vocabulary set (dictionary) for generation
Having decided which tokens are eligible for the generative models, the training data set `ok_mols.tsv` is further filtered based on the tokens selected above.
`app.select_mols_vocab.py` is used for this step.
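
The filtering idea can be sketched as follows: a molecule is kept only if every one of its tokens is in the allowed set. This is a minimal illustration under assumptions, not the app's implementation; the regular expression and the small token set are chosen for the example only.

```python
import re

# Assumed tokenization: bracket atoms, Br, Cl, %nn, then single characters.
SMILES_TOKEN = re.compile(r'\[[^\]]+\]|Br|Cl|%\d{2}|.')


def is_eligible(smiles, ok_tokens):
    """True if every token of the SMILES string is in the allowed set."""
    return all(tok in ok_tokens for tok in SMILES_TOKEN.findall(smiles))


allowed = {'c', 'C', '(', ')', '1', 'O', '=', 'N', 'Cl', '[C@H]'}
candidates = ['C[C@H](N)c1ccccc1Cl', 'C[As](C)C', 'CC(=O)O']
eligible = [smi for smi in candidates if is_eligible(smi, allowed)]
print(eligible)
```

Here the arsenic compound is dropped because `[As]` is not in the allowed set, which is the same reason rare tokens are removed before training.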

In [None]:
import sys 
sys.path.append('../src/model/')

for dataset in dataset_list:
	appname  = 'generative_models.apps.SelectMolsVocab.SelectMolsVocabApp'
	run(appname,
		f"--data={PRETRAIN_RESULTS}/{dataset}_results/tokenize_results/ok_mols.tsv",
		"--smi-colname=rdkit_smiles",
		f"--tokens={PRETRAIN_RESULTS}/{dataset}_results/tokenize_results/token_frequency.tsv",
		f"--outdir={PRETRAIN_RESULTS}/{dataset}_results/dataset4lstm",
		"--token-frequency-thres=50")

Start logging
Loaded molecules: 119665
curating tokens: frequency threshold: 50
ok tokens: 34
Index(['c', 'C', '(', ')', '1', 'O', '2', '=', 'N', 'n', '3', 'S', 'F', '-',
       '4', 'Cl', 's', '/', 'o', '[C@H]', '[C@@H]', '[nH]', '[N+]', '[O-]',
       '#', 'Br', '\', '5', 'I', '[C@@]', '[C@]', 'P', '[n+]', '6'],
      dtype='object')
eligible smiles: 119623


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  okmols['ntokens_srt_end'] = okmols['rdkit_smiles'].apply(lambda x: count_moltokens(x, include_begin_and_end=True))


You can find the `dataset4lstm` folder, where the passed-molecules TSV file (`passed_mols.tsv`) and the vocabulary dictionary pickle file (`vocab.pickle`) are stored. These two files are used for training the LSTM model.
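
To show what a vocabulary dictionary is used for, here is a small sketch that maps tokens to integer ids, adds begin and end markers, pads to a fixed length, and round-trips the dictionary through `pickle`. The special-token names and the id layout are assumptions; the actual contents of `vocab.pickle` are not shown in this notebook.

```python
import pickle

# Hypothetical special tokens; the real names in vocab.pickle may differ.
PAD, BEGIN, END = '<pad>', '<bos>', '<eos>'


def build_vocab(tokens):
    """Assign consecutive integer ids, reserving 0..2 for special tokens."""
    vocab = {PAD: 0, BEGIN: 1, END: 2}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Wrap tokens in begin/end markers and pad with PAD up to max_len."""
    ids = [vocab[BEGIN]] + [vocab[t] for t in tokens] + [vocab[END]]
    if len(ids) > max_len:
        raise ValueError('sequence longer than max_len')
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(['c', 'C', '(', ')', '1', 'O', '='])
restored = pickle.loads(pickle.dumps(vocab))  # same round trip as a .pickle file
encoded = encode(['C', 'C', '(', '=', 'O', ')', 'O'], restored, max_len=12)
print(encoded)
```

Padding positions carry no information, which is presumably why the trainer offers an option to exclude them from the loss.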

## 2. Build LSTM model
An LSTM model is trained on the specified data set. Various options are available. You can choose appropriate values by trial, or read the report on hyperparameter benchmark testing.
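
One of the options passed to the trainer is an early-stopping patience. The trainer's exact criterion is not shown in this notebook; the sketch below assumes the common rule of stopping once the validation loss has failed to improve for `patience` consecutive epochs. The class name and the loss values are made up.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float('inf')
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=4)
val_losses = [0.70, 0.60, 0.55, 0.56, 0.57, 0.55, 0.58, 0.59]  # made-up values
stopped_epoch = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_epoch = epoch
        break
print(stopped_epoch)
```

Note that a loss equal to the best value does not count as an improvement in this sketch, so a plateau also triggers the stop.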

In [None]:
import sys 
sys.path.append('../src/model/')

for dataset in dataset_list:
	dataset_dir = f'{PRETRAIN_RESULTS}/{dataset}_results/dataset4lstm'

	appname  	= 'generative_models.apps.BuildModel.TrainerApp'
	nworkers 	= 20
	data 	   	= f'{dataset_dir}/passed_mols.tsv'
	vocab 		= f'{dataset_dir}/vocab.pickle'
	outfd 		= f'{PRETRAIN_RESULTS}/{dataset}_results/vanilalstm'
	smicol     	= 'rdkit_smiles'
	valratio 	= 0.1
	espatience 	= 4
	epochs 		= 30
	nsamples 	= 1000
	batch_size  = 128
	save_snapshot = '10,20,30'
	# model parameters 
	model 		= 'lstm'
	lr 			= 0.0005
	embed		= 256
	nlayers 	= 4
	hidden  	= 512
	dropout 	= 0.2
	layernorm 	= True


	general_params = [
				f'--num-workers={nworkers}',
				'--override-folder=1',
				f'--epochs={epochs}', 
				f'--validation-ratio={valratio}',
				'--exclude-pad-loss=1',
				f'--early-stopping-patience={espatience}',
				f'--data={data}',
				f'--smi-colname={smicol}',
				f'--vocab={vocab}',
				f'--outdir={outfd}',
				f'--sampling-epoch={nsamples}',
				f'--batch-size={batch_size}',
				f'--save-snapshot-models={save_snapshot}'
				]

	model_params = [
		f'--model={model}',
		f'--lr={lr}',
		f'--embed-dim={embed}',
		f'--dropout-ratio={dropout}',
		f'--nlayers={nlayers}',
		f'--hidden-dim={hidden}',
		'--layernorm' if layernorm else '--no-layernorm'
	]
	params = general_params + model_params
	run(appname, *params)


lstm
Start logging
CUDA is on: 1 devices
Model initialization is done
Starting: GenerativeModelTrainer, Namespace(num_workers=20, outdir='/home/abe/Paper/Pretraining-Assesment-for-LSTM-Molecular-Generation/results/pretrain/pubchem_inac_results/vanilalstm', case='', override_folder=1, tensorboard_prefix='tensor_board', data='/home/abe/Paper/Pretraining-Assesment-for-LSTM-Molecular-Generation/results/pretrain/pubchem_inac_results/dataset4lstm/passed_mols.tsv', smi_colname='rdkit_smiles', vocab='/home/abe/Paper/Pretraining-Assesment-for-LSTM-Molecular-Generation/results/pretrain/pubchem_inac_results/dataset4lstm/vocab.pickle', random_seed=42, debug=None, sampling_epoch=1000, model='lstm', write_xlsx=None, save_snapshot_models='10,20,30', batch_size=128, epochs=30, validation_ratio=0.1, lr=0.0005, exclude_pad_loss=1, early_stopping_patience=4, load_model=None, model_structure=None, model_state=None, standardize_smiles=None, embed_dim=256, dropout_ratio=0.2, nlayers=4, hidden_dim=512, layer
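
The `Namespace(...)` line above shows how the `--key=value` strings end up as attributes such as `hidden_dim` and `save_snapshot_models`. The real parser of the trainer app is not shown; the sketch below is a hypothetical parser with a few of the same flag names, using `argparse.BooleanOptionalAction` (Python 3.9+) to get a `--layernorm` / `--no-layernorm` pair.

```python
import argparse

parser = argparse.ArgumentParser(description='hypothetical trainer options')
parser.add_argument('--epochs', type=int, default=30)
parser.add_argument('--lr', type=float, default=0.0005)
parser.add_argument('--hidden-dim', type=int, default=512)
parser.add_argument('--save-snapshot-models', default='')
# Registers both --layernorm and --no-layernorm (Python 3.9+).
parser.add_argument('--layernorm', action=argparse.BooleanOptionalAction, default=False)

args = parser.parse_args(['--epochs=30', '--lr=0.0005', '--hidden-dim=512',
                          '--save-snapshot-models=10,20,30', '--no-layernorm'])
# Turn the comma-separated epoch list into integers.
snapshots = [int(e) for e in args.save_snapshot_models.split(',') if e]
print(args.hidden_dim, args.layernorm, snapshots)
```

Dashes in flag names become underscores in the attribute names, which matches the pattern visible in the logged `Namespace`.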

[19:03:55] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 18
[19:03:55] SMILES Parse Error: extra open parentheses for input: 'Cc1ccc(NC1Cn2cc(NC(=O)Nc3ccc(OC(F)(F)F)cc3F)nc3nc2s1'
[19:03:55] SMILES Parse Error: unclosed ring for input: 'CC1(C)C(C)C(C)N2CCCCN(C(=O)NCc3ccccc3)C1'
[19:03:55] Explicit valence for atom # 9 C, 5, is greater than permitted
[19:03:55] SMILES Parse Error: unclosed ring for input: 'COC(=O)C1/C(=N/NC2=O)c(=O)oc1N'
[19:03:55] Can't kekulize mol.  Unkekulized atoms: 1 2 20 21 22 23 24 25 26
[19:03:55] SMILES Parse Error: unclosed ring for input: 'CCCCC(=O)Nc1cccc(-n2nc(=S)n(C3CC(=O)N(C)C)(C(=O)N(C)C)C2)c1'
[19:03:55] Can't kekulize mol.  Unkekulized atoms: 19 20 21 22 23
[19:03:55] SMILES Parse Error: unclosed ring for input: 'Cn1c(S(=O)(=O)Nc2ccc(F)cc2)nc2c1ccccc12'
[19:03:55] SMILES Parse Error: unclosed ring for input: 'COc1ccc(C(=O)Nc2ccc([N+](=O)[O-])c3nccn22)cc1'
[19:03:55] SMILES Parse Error: unclosed ring for input: 'COc1cc(-c2cc(C(=O)NCc3ccccc3)sn2c2CNC(
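
The parser messages above come from SMILES strings sampled during training. Two of the error types, unbalanced parentheses and unclosed rings, are purely syntactic and can be illustrated without a chemistry toolkit. The sketch below is a rough pre-check written for this explanation only; it is not how RDKit parses SMILES, and it cannot detect kekulization or valence errors.

```python
import re

# Bracket atoms are matched first so that digits inside them are skipped.
RING_BOND = re.compile(r'\[[^\]]*\]|%\d{2}|\d')


def quick_check(smiles):
    """Return a list of syntactic problems found in a SMILES string."""
    problems = []
    depth = 0
    for ch in smiles:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                problems.append('extra close parentheses')
                break
    if depth > 0:
        problems.append('extra open parentheses')
    open_rings = set()
    for tok in RING_BOND.findall(smiles):
        if tok.startswith('['):
            continue  # digits inside bracket atoms are not ring closures
        # A ring label opens on first use and closes on the second.
        open_rings ^= {tok.lstrip('%')}
    if open_rings:
        problems.append('unclosed ring')
    return problems


print(quick_check('COc1ccccc1-c1noc(C2CCC3)n1'))
```

A falling share of such failures among the sampled strings is one informal sign that the model is learning SMILES syntax.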

Epoch 2 -- Batch 1/ 842, training loss 0.6875993013381958
Epoch 2 -- Batch 2/ 842, training loss 0.7156004309654236
Epoch 2 -- Batch 3/ 842, training loss 0.6980999112129211
Epoch 2 -- Batch 4/ 842, training loss 0.7139374017715454
Epoch 2 -- Batch 5/ 842, training loss 0.6910369992256165
Epoch 2 -- Batch 6/ 842, training loss 0.7170813083648682
Epoch 2 -- Batch 7/ 842, training loss 0.7075843214988708
Epoch 2 -- Batch 8/ 842, training loss 0.7003893852233887
Epoch 2 -- Batch 9/ 842, training loss 0.7086516618728638
Epoch 2 -- Batch 10/ 842, training loss 0.7113326787948608
Epoch 2 -- Batch 11/ 842, training loss 0.6944671869277954
Epoch 2 -- Batch 12/ 842, training loss 0.727820634841919
Epoch 2 -- Batch 13/ 842, training loss 0.6805467009544373
Epoch 2 -- Batch 14/ 842, training loss 0.7049128413200378
Epoch 2 -- Batch 15/ 842, training loss 0.6942419409751892
Epoch 2 -- Batch 16/ 842, training loss 0.7016972303390503
Epoch 2 -- Batch 17/ 842, training loss 0.6909859776496887
Epoch 2
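
The per-batch lines above are easier to compare across epochs once they are averaged. As a small sketch, the log text can be parsed with a regular expression and reduced to a mean training loss per epoch. The sample text reuses a few lines from the log above; note that a line cut off in the middle of a number cannot be told apart from a complete one, so truncated trailing lines should be dropped first.

```python
import re
from collections import defaultdict

LOSS_LINE = re.compile(
    r'Epoch\s+(\d+)\s+--\s+Batch\s+(\d+)/\s*(\d+),\s+training loss\s+(\d+\.\d+)')

log_text = """\
Epoch 2 -- Batch 1/ 842, training loss 0.6875993013381958
Epoch 2 -- Batch 2/ 842, training loss 0.7156004309654236
Epoch 3 -- Batch 1/ 842, training loss 0.5862991213798523
Epoch 3 -- Batch 2/ 842, training loss 0.6026047468185425
Epoch 3
"""


def mean_loss_per_epoch(text):
    """Group the logged batch losses by epoch and average them."""
    losses = defaultdict(list)
    for match in LOSS_LINE.finditer(text):
        epoch, _batch, _total, loss = match.groups()
        losses[int(epoch)].append(float(loss))
    return {epoch: sum(vals) / len(vals) for epoch, vals in losses.items()}


means = mean_loss_per_epoch(log_text)
for epoch, value in sorted(means.items()):
    print(epoch, round(value, 4))
```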

[19:04:05] SMILES Parse Error: syntax error while parsing: Cc1ccc(NC(=O)COc2c(C)[nH]c(=O)c3cc(-c4ccc(Cl)cc4)nc(4)c23)cc1
[19:04:05] SMILES Parse Error: Failed parsing SMILES 'Cc1ccc(NC(=O)COc2c(C)[nH]c(=O)c3cc(-c4ccc(Cl)cc4)nc(4)c23)cc1' for input: 'Cc1ccc(NC(=O)COc2c(C)[nH]c(=O)c3cc(-c4ccc(Cl)cc4)nc(4)c23)cc1'
[19:04:05] Can't kekulize mol.  Unkekulized atoms: 1 2 4 5 6 19 20
[19:04:05] Can't kekulize mol.  Unkekulized atoms: 4 5 6 7 8 9 10
[19:04:05] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 6 10 12 13 14 15 16
[19:04:05] SMILES Parse Error: unclosed ring for input: 'CC(C)=CCn1c(=S)sc2nc3c(cc22)CCCC3'
[19:04:05] SMILES Parse Error: unclosed ring for input: 'Cc1cccc2c(c1)ccc(C(=O)NC(=O)C2CC2)c1C'
[19:04:05] SMILES Parse Error: extra close parentheses while parsing: Cc1ccc2[nH]c(=O)c(SCC(=O)N3CCc4ccccc43)n2)cc1
[19:04:05] SMILES Parse Error: Failed parsing SMILES 'Cc1ccc2[nH]c(=O)c(SCC(=O)N3CCc4ccccc43)n2)cc1' for input: 'Cc1ccc2[nH]c(=O)c(SCC(=O)N3CCc4ccccc43)n2)cc1'
[19:04:05] 

Epoch 3 -- Batch 1/ 842, training loss 0.5862991213798523
Epoch 3 -- Batch 2/ 842, training loss 0.6026047468185425
Epoch 3 -- Batch 3/ 842, training loss 0.6008363366127014
Epoch 3 -- Batch 4/ 842, training loss 0.6324578523635864
Epoch 3 -- Batch 5/ 842, training loss 0.5863271355628967
Epoch 3 -- Batch 6/ 842, training loss 0.6186871528625488
Epoch 3 -- Batch 7/ 842, training loss 0.6066606640815735
Epoch 3 -- Batch 8/ 842, training loss 0.6082261800765991
Epoch 3 -- Batch 9/ 842, training loss 0.6186850070953369
Epoch 3 -- Batch 10/ 842, training loss 0.6124225854873657
Epoch 3 -- Batch 11/ 842, training loss 0.6059475541114807
Epoch 3 -- Batch 12/ 842, training loss 0.5809412598609924
Epoch 3 -- Batch 13/ 842, training loss 0.5831202864646912
Epoch 3 -- Batch 14/ 842, training loss 0.6151729822158813
Epoch 3 -- Batch 15/ 842, training loss 0.5801334977149963
Epoch 3 -- Batch 16/ 842, training loss 0.6074866056442261
Epoch 3 -- Batch 17/ 842, training loss 0.5938974618911743
Epoch 

[19:04:15] Can't kekulize mol.  Unkekulized atoms: 2 3 16
[19:04:15] Can't kekulize mol.  Unkekulized atoms: 10 11 13 14 15 18
[19:04:15] Explicit valence for atom # 23 O, 3, is greater than permitted
[19:04:15] Can't kekulize mol.  Unkekulized atoms: 17 19 20 21 22 23 24
[19:04:15] Explicit valence for atom # 1 C, 5, is greater than permitted
[19:04:15] Can't kekulize mol.  Unkekulized atoms: 4 5 6 7 8
[19:04:15] SMILES Parse Error: unclosed ring for input: 'CN(C)c1cccc(CNC(=O)C[C@@H]2C[C@@H]3c4ccc(C5=CC5CC(=O)N4CC5)ccc3[C@H]23)c1'
[19:04:15] SMILES Parse Error: unclosed ring for input: 'C[C@]12N=C(C(=O)N(CC)C3c2ccccc2Cl)N1'
[19:04:15] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 5
[19:04:15] Can't kekulize mol.  Unkekulized atoms: 6 7 8
[19:04:15] Can't kekulize mol.  Unkekulized atoms: 2 5 7 8 20 21 22
[19:04:15] Can't kekulize mol.  Unkekulized atoms: 22 23 24 25 27
[19:04:15] SMILES Parse Error: unclosed ring for input: 'COc1cc(S(=O)(=O)N2CCC2c3ccccc3)ccc2N(C(=O)CSC)C(C)C'
[19:

Epoch 4 -- Batch 1/ 842, training loss 0.5487864017486572
Epoch 4 -- Batch 2/ 842, training loss 0.5496375560760498
Epoch 4 -- Batch 3/ 842, training loss 0.5429785251617432
Epoch 4 -- Batch 4/ 842, training loss 0.5710179805755615
Epoch 4 -- Batch 5/ 842, training loss 0.5716375112533569
Epoch 4 -- Batch 6/ 842, training loss 0.5560423135757446
Epoch 4 -- Batch 7/ 842, training loss 0.5667555928230286
Epoch 4 -- Batch 8/ 842, training loss 0.5533558130264282
Epoch 4 -- Batch 9/ 842, training loss 0.5472891330718994
Epoch 4 -- Batch 10/ 842, training loss 0.5442296266555786
Epoch 4 -- Batch 11/ 842, training loss 0.5551198124885559
Epoch 4 -- Batch 12/ 842, training loss 0.5374045372009277
Epoch 4 -- Batch 13/ 842, training loss 0.5521165132522583
Epoch 4 -- Batch 14/ 842, training loss 0.547013521194458
Epoch 4 -- Batch 15/ 842, training loss 0.5495086908340454
Epoch 4 -- Batch 16/ 842, training loss 0.5729098320007324
Epoch 4 -- Batch 17/ 842, training loss 0.5376161932945251
Epoch 4

[19:04:26] Can't kekulize mol.  Unkekulized atoms: 3 4 5 13 14
[19:04:26] SMILES Parse Error: unclosed ring for input: 'CSc1ncc(CNc2cccn(CC)c2C)n2c1ccnc12'
[19:04:26] Can't kekulize mol.  Unkekulized atoms: 11 12 20
[19:04:26] SMILES Parse Error: unclosed ring for input: 'CCS(=O)(=O)N1CCCC1C(=O)NC1CCC2(C)CCN(C(=O)/C=C/c2cccs2)C1'
[19:04:26] SMILES Parse Error: unclosed ring for input: 'CC[C@@H]1CN([C@@H](C)CO)S(=O)(=O)c2ccc(C#Cc3ccc(F)cc3)cc2O[C@@H]1CN(C)CC1C'
[19:04:26] Can't kekulize mol.  Unkekulized atoms: 6 7 8 14 15
[19:04:26] Explicit valence for atom # 15 Cl, 2, is greater than permitted
[19:04:26] SMILES Parse Error: unclosed ring for input: 'COc1ccc2c3c(n(C2O)[C@@H](CO)c2c1c(=O)c1ccccc1n4C)OCO3'
[19:04:26] SMILES Parse Error: unclosed ring for input: 'O=C(Nc1cccc([N+](=O)[O-])c1)N[C@H]1O[C@@H]2c3ccc(c4ccccc5O)[C@@H](C)[C@H]1N(C(=O)Cc1cccnc1)C[C@@H](C)[C@H](OC)CN(C)C2=O'
[19:04:26] SMILES Parse Error: extra close parentheses while parsing: CN(C)CCCN(C(=O)c1ccccc1)C1CC1)c1cccc(

Epoch 5 -- Batch 1/ 842, training loss 0.530566394329071
Epoch 5 -- Batch 2/ 842, training loss 0.5108798146247864
Epoch 5 -- Batch 3/ 842, training loss 0.5117610692977905
Epoch 5 -- Batch 4/ 842, training loss 0.50933438539505
Epoch 5 -- Batch 5/ 842, training loss 0.5013951063156128
Epoch 5 -- Batch 6/ 842, training loss 0.5407463312149048
Epoch 5 -- Batch 7/ 842, training loss 0.5334701538085938
Epoch 5 -- Batch 8/ 842, training loss 0.5047807693481445
Epoch 5 -- Batch 9/ 842, training loss 0.5363180041313171
Epoch 5 -- Batch 10/ 842, training loss 0.5098373889923096
Epoch 5 -- Batch 11/ 842, training loss 0.5127512812614441
Epoch 5 -- Batch 12/ 842, training loss 0.5087909698486328
Epoch 5 -- Batch 13/ 842, training loss 0.49858328700065613
Epoch 5 -- Batch 14/ 842, training loss 0.5260335803031921
Epoch 5 -- Batch 15/ 842, training loss 0.5238381028175354
Epoch 5 -- Batch 16/ 842, training loss 0.5559899210929871
Epoch 5 -- Batch 17/ 842, training loss 0.5156364440917969
Epoch 5 

[19:04:36] SMILES Parse Error: syntax error while parsing: CC((O)(c1ccccc1)c1ccccc1)C(=O)C(F)(F)F
[19:04:36] SMILES Parse Error: Failed parsing SMILES 'CC((O)(c1ccccc1)c1ccccc1)C(=O)C(F)(F)F' for input: 'CC((O)(c1ccccc1)c1ccccc1)C(=O)C(F)(F)F'
[19:04:36] Can't kekulize mol.  Unkekulized atoms: 7 8 9 10 17
[19:04:36] SMILES Parse Error: unclosed ring for input: 'COc1ccc(OC)c(/C=N/NC(=O)CNC(=O)C23CC4CC(CC(C4)C2)C4)c1'
[19:04:36] SMILES Parse Error: extra open parentheses for input: 'COc1ccc(/C=c2\sc3n(c2=O)NC2c2ccccc2-c2ccccc21'
[19:04:36] SMILES Parse Error: ring closure 2 duplicates bond between atom 20 and atom 21 for input: 'NC(=O)c1ccc(COc2ccc3c(C2CCO4)noc2c2)cc1'
[19:04:36] Can't kekulize mol.  Unkekulized atoms: 10 11 12 14 18
[19:04:36] SMILES Parse Error: unclosed ring for input: 'COc1ccccc1-c1noc(C2CCC3)n1'
[19:04:36] Can't kekulize mol.  Unkekulized atoms: 7 9 10 11 12 22 23
[19:04:36] SMILES Parse Error: extra open parentheses for input: 'COc1ccccc1-n1nnnc1SCCN(C'
[19:04:36] 

Epoch 6 -- Batch 1/ 842, training loss 0.492296427488327
Epoch 6 -- Batch 2/ 842, training loss 0.47159305214881897
Epoch 6 -- Batch 3/ 842, training loss 0.47161224484443665
Epoch 6 -- Batch 4/ 842, training loss 0.5132691264152527
Epoch 6 -- Batch 5/ 842, training loss 0.49560967087745667
Epoch 6 -- Batch 6/ 842, training loss 0.4806448221206665
Epoch 6 -- Batch 7/ 842, training loss 0.4796527028083801
Epoch 6 -- Batch 8/ 842, training loss 0.4840667247772217
Epoch 6 -- Batch 9/ 842, training loss 0.47847262024879456
Epoch 6 -- Batch 10/ 842, training loss 0.48538416624069214
Epoch 6 -- Batch 11/ 842, training loss 0.4878973066806793
Epoch 6 -- Batch 12/ 842, training loss 0.49798741936683655
Epoch 6 -- Batch 13/ 842, training loss 0.4823172986507416
Epoch 6 -- Batch 14/ 842, training loss 0.49365442991256714
Epoch 6 -- Batch 15/ 842, training loss 0.4934558868408203
Epoch 6 -- Batch 16/ 842, training loss 0.4941138029098511
Epoch 6 -- Batch 17/ 842, training loss 0.4852067828178406


[19:04:46] Can't kekulize mol.  Unkekulized atoms: 2 14 15 16 17 18 19
[19:04:46] Can't kekulize mol.  Unkekulized atoms: 5 7 26 27 29 32
[19:04:46] SMILES Parse Error: unclosed ring for input: 'CN1C(=O)C2(c3ccccc3N2C(=O)Nc2ccccc21)C(C#N)=C(N)O2'
[19:04:46] Can't kekulize mol.  Unkekulized atoms: 2 3 4 15 16 18 19
[19:04:46] SMILES Parse Error: unclosed ring for input: 'CC(C)(C)N1CCC2(CC1)CC2OC(=O)CNC2c1ccc(OCc2ccc(C(F)(F)F)cc2)cc1'
[19:04:46] Can't kekulize mol.  Unkekulized atoms: 9 10 17 18 19
[19:04:46] SMILES Parse Error: unclosed ring for input: 'N#Cc1c(NC(=O)CCCSc2ccc([N+](=O)[O-])cc2ss2)sc2c1CCCC2'
[19:04:46] SMILES Parse Error: unclosed ring for input: 'CC(C)NC(=O)C(C(C)C)N1CCN(c2nnc(Cc3ccccc3)n3CCCCC2)CC1'
[19:04:46] SMILES Parse Error: unclosed ring for input: 'CCOc1ccccc1N1CCn2c1nc1c2c(=O)n(CCc3ccccc3)c(=O)n2C'
[19:04:46] SMILES Parse Error: unclosed ring for input: 'Cc1ccc(Cn2ncc3c([nH]c4cccc(C(F)(F)F)c4)c(=O)c3c2c2=O)c(=O)[nH]1'
[19:04:46] SMILES Parse Error: unclosed rin

Epoch 7 -- Batch 1/ 842, training loss 0.45569559931755066
Epoch 7 -- Batch 2/ 842, training loss 0.45760148763656616
Epoch 7 -- Batch 3/ 842, training loss 0.4716483950614929
Epoch 7 -- Batch 4/ 842, training loss 0.4496441185474396
Epoch 7 -- Batch 5/ 842, training loss 0.4595576524734497
Epoch 7 -- Batch 6/ 842, training loss 0.4729387164115906
Epoch 7 -- Batch 7/ 842, training loss 0.4662638306617737
Epoch 7 -- Batch 8/ 842, training loss 0.47524675726890564
Epoch 7 -- Batch 9/ 842, training loss 0.46441495418548584
Epoch 7 -- Batch 10/ 842, training loss 0.4423176348209381
Epoch 7 -- Batch 11/ 842, training loss 0.4655456840991974
Epoch 7 -- Batch 12/ 842, training loss 0.45928412675857544
Epoch 7 -- Batch 13/ 842, training loss 0.46625304222106934
Epoch 7 -- Batch 14/ 842, training loss 0.44134417176246643
Epoch 7 -- Batch 15/ 842, training loss 0.4607570767402649
Epoch 7 -- Batch 16/ 842, training loss 0.4700063169002533
Epoch 7 -- Batch 17/ 842, training loss 0.4612807631492615

[19:04:57] SMILES Parse Error: extra open parentheses for input: 'c1cc2c(c3c1CCN(Cc1ccco1)C(=O)CO2'
[19:04:57] Can't kekulize mol.  Unkekulized atoms: 13 14 15 17 18
[19:04:57] Can't kekulize mol.  Unkekulized atoms: 8 9 10 11 12 16 18
[19:04:57] Can't kekulize mol.  Unkekulized atoms: 5 6 7 16 17 18 19 20 21
[19:04:57] Can't kekulize mol.  Unkekulized atoms: 16 17 19 20 21
[19:04:57] Can't kekulize mol.  Unkekulized atoms: 9 10 11 12 13 21 23
[19:04:57] Explicit valence for atom # 12 Cl, 2, is greater than permitted
[19:04:57] SMILES Parse Error: unclosed ring for input: 'COc1cccc(OCCOc2ccc(/C=C3/C(=N)N4N=C5C=CSC4=NC3=O)cc2)c1'
[19:04:57] SMILES Parse Error: unclosed ring for input: 'Cc1cc(C)n(CCc2nnc(CCC(=O)NCC3CC4)n(C)c2=O)n1'
[19:04:57] SMILES Parse Error: extra open parentheses for input: 'COc1ccc(-c2nc3n(n2)C(c2ccc(OC)cc2)C(C(=O)Nc2ccc(Cl)c(C(=O)O)c2)c2ccccc12'
[19:04:57] SMILES Parse Error: unclosed ring for input: 'O=C(Nc1ccc2nc3c(c(=O)o1)CCCC3)C(F)(F)F'
[19:04:57] SMILES Parse

Epoch 8 -- Batch 1/ 842, training loss 0.42428427934646606
Epoch 8 -- Batch 2/ 842, training loss 0.45552051067352295
Epoch 8 -- Batch 3/ 842, training loss 0.45405200123786926
Epoch 8 -- Batch 4/ 842, training loss 0.44858989119529724
Epoch 8 -- Batch 5/ 842, training loss 0.4605257213115692
Epoch 8 -- Batch 6/ 842, training loss 0.44884026050567627
Epoch 8 -- Batch 7/ 842, training loss 0.44002315402030945
Epoch 8 -- Batch 8/ 842, training loss 0.43489113450050354
Epoch 8 -- Batch 9/ 842, training loss 0.44847923517227173
Epoch 8 -- Batch 10/ 842, training loss 0.4423667788505554
Epoch 8 -- Batch 11/ 842, training loss 0.4344923794269562
Epoch 8 -- Batch 12/ 842, training loss 0.4598734676837921
Epoch 8 -- Batch 13/ 842, training loss 0.45212602615356445
Epoch 8 -- Batch 14/ 842, training loss 0.44335272908210754
Epoch 8 -- Batch 15/ 842, training loss 0.46586400270462036
Epoch 8 -- Batch 16/ 842, training loss 0.4415931701660156
Epoch 8 -- Batch 17/ 842, training loss 0.444210112094

[19:05:07] Can't kekulize mol.  Unkekulized atoms: 6 7 8 9 10 12 13
[19:05:07] Can't kekulize mol.  Unkekulized atoms: 5 6 23
[19:05:07] SMILES Parse Error: unclosed ring for input: 'COc1ccccc1-c1ccc2c(c1)-c1c(cnn2CC(=O)NCc1ccc2c(c1)OCCO2)C2'
[19:05:07] Can't kekulize mol.  Unkekulized atoms: 9 10 11 12 13 14 15 16 17 18 19 20 23
[19:05:07] Explicit valence for atom # 15 S, 7, is greater than permitted
[19:05:07] non-ring atom 9 marked aromatic
[19:05:07] Can't kekulize mol.  Unkekulized atoms: 9 10 11 12 13 19 20
[19:05:07] Can't kekulize mol.  Unkekulized atoms: 1 2 3 5 22 23
[19:05:07] Can't kekulize mol.  Unkekulized atoms: 1 2 10
[19:05:07] Can't kekulize mol.  Unkekulized atoms: 12 16 17 18 19 20 21
[19:05:07] Can't kekulize mol.  Unkekulized atoms: 12 13 14 16 21 22
[19:05:07] Can't kekulize mol.  Unkekulized atoms: 1 6 7 8 13 14 15
[19:05:07] Can't kekulize mol.  Unkekulized atoms: 12 13 15 17 19
[19:05:07] Can't kekulize mol.  Unkekulized atoms: 1 2 3
[19:05:07] Can't kekulize

Epoch 9 -- Batch 1/ 842, training loss 0.4141328036785126
Epoch 9 -- Batch 2/ 842, training loss 0.405374675989151
Epoch 9 -- Batch 3/ 842, training loss 0.44037681818008423
Epoch 9 -- Batch 4/ 842, training loss 0.4240405559539795
Epoch 9 -- Batch 5/ 842, training loss 0.43104496598243713
Epoch 9 -- Batch 6/ 842, training loss 0.4211340546607971
Epoch 9 -- Batch 7/ 842, training loss 0.4258345067501068
Epoch 9 -- Batch 8/ 842, training loss 0.41820770502090454
Epoch 9 -- Batch 9/ 842, training loss 0.41410553455352783
Epoch 9 -- Batch 10/ 842, training loss 0.4191223978996277
Epoch 9 -- Batch 11/ 842, training loss 0.4063754379749298
Epoch 9 -- Batch 12/ 842, training loss 0.4177328050136566
Epoch 9 -- Batch 13/ 842, training loss 0.42494046688079834
Epoch 9 -- Batch 14/ 842, training loss 0.4191119074821472
Epoch 9 -- Batch 15/ 842, training loss 0.42055267095565796
Epoch 9 -- Batch 16/ 842, training loss 0.41784876585006714
Epoch 9 -- Batch 17/ 842, training loss 0.42059874534606934

[19:05:17] SMILES Parse Error: extra close parentheses while parsing: CNC(=O)C1(C)CCN(C(=O)c2cc3c(c(C)nn4-c3ccc(F)cc3)c2)CCCC3)C1
[19:05:17] SMILES Parse Error: Failed parsing SMILES 'CNC(=O)C1(C)CCN(C(=O)c2cc3c(c(C)nn4-c3ccc(F)cc3)c2)CCCC3)C1' for input: 'CNC(=O)C1(C)CCN(C(=O)c2cc3c(c(C)nn4-c3ccc(F)cc3)c2)CCCC3)C1'
[19:05:17] SMILES Parse Error: unclosed ring for input: 'COc1cc2c(C=NNc3nnnn33)csc2cc1OC(C)=O'
[19:05:17] Can't kekulize mol.  Unkekulized atoms: 5 6 8 12 13
[19:05:17] SMILES Parse Error: unclosed ring for input: 'Cc1cccc(NC(=O)C(NC(=O)CCc2cc(C(=O)OC)c3c(C)cc(C)cc34)n2)c1'
[19:05:17] SMILES Parse Error: unclosed ring for input: 'CC(C(=O)O)C1C2CC3CC(C3)CC2CC41CC[C@@]2(C)C(=O)CCC2'
[19:05:17] Explicit valence for atom # 11 Br, 2, is greater than permitted
[19:05:17] SMILES Parse Error: unclosed ring for input: 'Cc1c(S(=O)(=O)NCc2ccc(C#N)cc2)cccc12'
[19:05:17] Can't kekulize mol.  Unkekulized atoms: 17
[19:05:17] Can't kekulize mol.  Unkekulized atoms: 13 14 15 21 22
[19:05:1

Epoch 10 -- Batch 1/ 842, training loss 0.3968014419078827
Epoch 10 -- Batch 2/ 842, training loss 0.4055347144603729
Epoch 10 -- Batch 3/ 842, training loss 0.3834630250930786
Epoch 10 -- Batch 4/ 842, training loss 0.40947389602661133
Epoch 10 -- Batch 5/ 842, training loss 0.39625853300094604
Epoch 10 -- Batch 6/ 842, training loss 0.4041198790073395
Epoch 10 -- Batch 7/ 842, training loss 0.4093148112297058
Epoch 10 -- Batch 8/ 842, training loss 0.4101628065109253
Epoch 10 -- Batch 9/ 842, training loss 0.4296288788318634
Epoch 10 -- Batch 10/ 842, training loss 0.4103587865829468
Epoch 10 -- Batch 11/ 842, training loss 0.39165323972702026
Epoch 10 -- Batch 12/ 842, training loss 0.4026673436164856
Epoch 10 -- Batch 13/ 842, training loss 0.3870570957660675
Epoch 10 -- Batch 14/ 842, training loss 0.3966355621814728
Epoch 10 -- Batch 15/ 842, training loss 0.39804041385650635
Epoch 10 -- Batch 16/ 842, training loss 0.3929382860660553
Epoch 10 -- Batch 17/ 842, training loss 0.39

[19:05:27] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 6 7 20 21 22 23 24
[19:05:27] Can't kekulize mol.  Unkekulized atoms: 9 10 11
[19:05:27] Can't kekulize mol.  Unkekulized atoms: 13 14 15
[19:05:27] SMILES Parse Error: unclosed ring for input: 'COc1ccc(C2SCC3(O)CCN2Cc2ccc(Cl)cc2)cc1'
[19:05:27] Can't kekulize mol.  Unkekulized atoms: 11 12 22 23 33 34 35
[19:05:27] SMILES Parse Error: unclosed ring for input: 'COc1ccc(CN2CCc3nc4ccccc4c(C(=O)N3CCN(C)CC4)c23)CC1'
[19:05:27] Can't kekulize mol.  Unkekulized atoms: 3 4 5 9 13 14
[19:05:27] Can't kekulize mol.  Unkekulized atoms: 4 5 6 7 8 16 17 18 19 20 23 24 25
[19:05:27] SMILES Parse Error: unclosed ring for input: 'C[C@]12CCC3C(CC[C@H]4C[C@@H](O)CC[C@]44C)C1C[C@@H](O5)[C@H]2O'
[19:05:27] SMILES Parse Error: unclosed ring for input: 'N#CC1=C(N)Oc2c1c(=O)oc3ccccc3c21'
[19:05:27] Can't kekulize mol.  Unkekulized atoms: 3 4 5 15 16 17 18 19 20 21 22
[19:05:27] Can't kekulize mol.  Unkekulized atoms: 4 5 17 18 22 23 24 25 26 27 28
[

Epoch 11 -- Batch 1/ 842, training loss 0.386670857667923
Epoch 11 -- Batch 2/ 842, training loss 0.3914146423339844
Epoch 11 -- Batch 3/ 842, training loss 0.3789967894554138
Epoch 11 -- Batch 4/ 842, training loss 0.37827327847480774
Epoch 11 -- Batch 5/ 842, training loss 0.39302995800971985
Epoch 11 -- Batch 6/ 842, training loss 0.37669143080711365
Epoch 11 -- Batch 7/ 842, training loss 0.40123122930526733
Epoch 11 -- Batch 8/ 842, training loss 0.3964408040046692
Epoch 11 -- Batch 9/ 842, training loss 0.39984405040740967
Epoch 11 -- Batch 10/ 842, training loss 0.3935472071170807
Epoch 11 -- Batch 11/ 842, training loss 0.3677078187465668
Epoch 11 -- Batch 12/ 842, training loss 0.3954195976257324
Epoch 11 -- Batch 13/ 842, training loss 0.3994562327861786
Epoch 11 -- Batch 14/ 842, training loss 0.39097753167152405
Epoch 11 -- Batch 15/ 842, training loss 0.4138634502887726
Epoch 11 -- Batch 16/ 842, training loss 0.39848795533180237
Epoch 11 -- Batch 17/ 842, training loss 0.

[19:05:38] SMILES Parse Error: unclosed ring for input: 'COc1ccccc1CN1C[C@@H](C)[C@@H](OC)CN(C)C(=O)c2ccc(NC(=O)C3CCO)cc2OC[C@@H]1C'
[19:05:38] SMILES Parse Error: syntax error while parsing: /C(=C\COc1ccccc1)Nc1c(-c2ccc(F)cc2)nc2ccccn12
[19:05:38] SMILES Parse Error: Failed parsing SMILES '/C(=C\COc1ccccc1)Nc1c(-c2ccc(F)cc2)nc2ccccn12' for input: '/C(=C\COc1ccccc1)Nc1c(-c2ccc(F)cc2)nc2ccccn12'
[19:05:38] SMILES Parse Error: unclosed ring for input: 'CC1(C)CC2(C)CC(=O)C2=C1C(c1ccco1)C(C(=O)OC(C)C)C(C#N)=C(N)O2'
[19:05:38] Can't kekulize mol.  Unkekulized atoms: 8 9 10 11 13 14 15 16 19 20 22
[19:05:38] SMILES Parse Error: unclosed ring for input: 'Cc1oc2nc3c4ccc(Br)cc4nc(c3ccnc4ccccc34)c2cc1C'
[19:05:38] SMILES Parse Error: unclosed ring for input: 'CCOc1ccc(-c2csc(N3C(=O)NC(=O)/C(=C/c3cccc(F)c3)C2=O)c2ccccc2)cc1'
[19:05:38] Can't kekulize mol.  Unkekulized atoms: 8 9 10 11 12
[19:05:38] SMILES Parse Error: unclosed ring for input: 'Cc1c(C(=O)NC(c2ccccc2)c2ncnn2C)on1c(=O)cc(C)c1ccccc12

Epoch 12 -- Batch 1/ 842, training loss 0.3802540898323059
Epoch 12 -- Batch 2/ 842, training loss 0.3868546783924103
Epoch 12 -- Batch 3/ 842, training loss 0.3748510777950287
Epoch 12 -- Batch 4/ 842, training loss 0.3893398940563202
Epoch 12 -- Batch 5/ 842, training loss 0.37522462010383606
Epoch 12 -- Batch 6/ 842, training loss 0.3762870132923126
Epoch 12 -- Batch 7/ 842, training loss 0.4037068784236908
Epoch 12 -- Batch 8/ 842, training loss 0.3855622410774231
Epoch 12 -- Batch 9/ 842, training loss 0.38504043221473694
Epoch 12 -- Batch 10/ 842, training loss 0.38066625595092773
Epoch 12 -- Batch 11/ 842, training loss 0.3765183389186859
Epoch 12 -- Batch 12/ 842, training loss 0.378429114818573
Epoch 12 -- Batch 13/ 842, training loss 0.37309572100639343
Epoch 12 -- Batch 14/ 842, training loss 0.3816050887107849
Epoch 12 -- Batch 15/ 842, training loss 0.3844953179359436
Epoch 12 -- Batch 16/ 842, training loss 0.3727725148200989
Epoch 12 -- Batch 17/ 842, training loss 0.384

[19:05:48] Can't kekulize mol.  Unkekulized atoms: 4 5 6
[19:05:48] Can't kekulize mol.  Unkekulized atoms: 1 2 5 29
[19:05:48] SMILES Parse Error: unclosed ring for input: 'COc1cccc(NC(=O)N2C[C@H](O)COC[C@@H]3O[C@@H](CC(=O)N4CCN(C)CC[C@@H]4C[C@@H]3C4)cc2)c1'
[19:05:48] SMILES Parse Error: unclosed ring for input: 'Cc1nc2c(C(=O)NCc3ccco3)cnn2c(=O)c1C1c3ccccc3Oc3ccccc231'
[19:05:48] Can't kekulize mol.  Unkekulized atoms: 5 6 8 11 13
[19:05:48] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 5 15 29 30 31
[19:05:48] Can't kekulize mol.  Unkekulized atoms: 9 11 12 13 20 23 24
[19:05:48] SMILES Parse Error: syntax error while parsing: /C=C/c1ncc2c(n12)CCC(C)CC2
[19:05:48] SMILES Parse Error: Failed parsing SMILES '/C=C/c1ncc2c(n12)CCC(C)CC2' for input: '/C=C/c1ncc2c(n12)CCC(C)CC2'
[19:05:48] Can't kekulize mol.  Unkekulized atoms: 7 8 9 10 21 22 23
[19:05:48] SMILES Parse Error: extra open parentheses for input: 'CC(C(=O)N1CCc2c(cnc(S(=O)(=O)CC(C)(C)C)n2)(Cc2cccc(F)c2)C1'
[19:05:48] SMILE

Epoch 13 -- Batch 1/ 842, training loss 0.3668293356895447
Epoch 13 -- Batch 2/ 842, training loss 0.3572969436645508
Epoch 13 -- Batch 3/ 842, training loss 0.3705778121948242
Epoch 13 -- Batch 4/ 842, training loss 0.3794900178909302
Epoch 13 -- Batch 5/ 842, training loss 0.35983702540397644
Epoch 13 -- Batch 6/ 842, training loss 0.36621636152267456
Epoch 13 -- Batch 7/ 842, training loss 0.3768928647041321
Epoch 13 -- Batch 8/ 842, training loss 0.3647754192352295
Epoch 13 -- Batch 9/ 842, training loss 0.36129236221313477
Epoch 13 -- Batch 10/ 842, training loss 0.36579766869544983
Epoch 13 -- Batch 11/ 842, training loss 0.3711300790309906
Epoch 13 -- Batch 12/ 842, training loss 0.3492865264415741
Epoch 13 -- Batch 13/ 842, training loss 0.3785533308982849
Epoch 13 -- Batch 14/ 842, training loss 0.3649480640888214
Epoch 13 -- Batch 15/ 842, training loss 0.35713592171669006
Epoch 13 -- Batch 16/ 842, training loss 0.3861989974975586
Epoch 13 -- Batch 17/ 842, training loss 0.3

[19:05:58] SMILES Parse Error: unclosed ring for input: 'Cc1cccn2c(=O)c3c4c(sc3n1)CCCC3'
[19:05:58] Can't kekulize mol.  Unkekulized atoms: 11 12 13
[19:05:58] SMILES Parse Error: extra close parentheses while parsing: Cc1sc2c3c(ccc2[n+]1CCS(=O)(=O)NCC1CCCCC1)OCO3)c1ccccc1
[19:05:58] SMILES Parse Error: Failed parsing SMILES 'Cc1sc2c3c(ccc2[n+]1CCS(=O)(=O)NCC1CCCCC1)OCO3)c1ccccc1' for input: 'Cc1sc2c3c(ccc2[n+]1CCS(=O)(=O)NCC1CCCCC1)OCO3)c1ccccc1'
[19:05:58] SMILES Parse Error: unclosed ring for input: 'Cn1ccnc1Sc1ccc(/C(O)=C2[N+](=O)[O-])cc2ccnc12'
[19:05:58] SMILES Parse Error: unclosed ring for input: 'COc1cccc(NC(=O)C23CCN(C(=O)[C@H](CCC(N)=O)CC(C)C)CCC22)c1'
[19:05:58] Can't kekulize mol.  Unkekulized atoms: 9 10 11 13 14 15 16 17 22
[19:05:58] SMILES Parse Error: unclosed ring for input: 'O=C1C2C3C=CC(C2)C2C(=O)N1NCc1ccc(F)cc1'
[19:05:58] SMILES Parse Error: extra open parentheses for input: 'COc1ccc(-n2c(SCC(=O)Nc3ccccc3-c3c(ccc3ccccc34)nc3cccnc32)ccc1=O'
[19:05:58] SMILES Parse

Epoch 14 -- Batch 1/ 842, training loss 0.3551945090293884
Epoch 14 -- Batch 2/ 842, training loss 0.3650028109550476
Epoch 14 -- Batch 3/ 842, training loss 0.36570820212364197
Epoch 14 -- Batch 4/ 842, training loss 0.3622930943965912
Epoch 14 -- Batch 5/ 842, training loss 0.3525550067424774
Epoch 14 -- Batch 6/ 842, training loss 0.36388352513313293
Epoch 14 -- Batch 7/ 842, training loss 0.3654446005821228
Epoch 14 -- Batch 8/ 842, training loss 0.34828805923461914
Epoch 14 -- Batch 9/ 842, training loss 0.3518986105918884
Epoch 14 -- Batch 10/ 842, training loss 0.35674723982810974
Epoch 14 -- Batch 11/ 842, training loss 0.36640968918800354
Epoch 14 -- Batch 12/ 842, training loss 0.37129321694374084
Epoch 14 -- Batch 13/ 842, training loss 0.3595750629901886
Epoch 14 -- Batch 14/ 842, training loss 0.356548547744751
Epoch 14 -- Batch 15/ 842, training loss 0.3583643436431885
Epoch 14 -- Batch 16/ 842, training loss 0.36356380581855774
Epoch 14 -- Batch 17/ 842, training loss 0.

[19:06:09] Can't kekulize mol.  Unkekulized atoms: 4 5 6 7 8 10 11
[19:06:09] Can't kekulize mol.  Unkekulized atoms: 10 11 12 13 14
[19:06:09] Explicit valence for atom # 1 O, 3, is greater than permitted
[19:06:09] Can't kekulize mol.  Unkekulized atoms: 5 22 23 24 25 26 27
[19:06:09] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 5
[19:06:09] Can't kekulize mol.  Unkekulized atoms: 2 3 4 8 9
[19:06:09] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 27
[19:06:09] Explicit valence for atom # 6 F, 2, is greater than permitted
[19:06:09] SMILES Parse Error: unclosed ring for input: 'Cc1nnc2ccc3cc(-c4cccnc4)ccn12'
[19:06:09] Can't kekulize mol.  Unkekulized atoms: 10 12 13 14 16 17 18
[19:06:09] Can't kekulize mol.  Unkekulized atoms: 22 23 25
[19:06:09] Can't kekulize mol.  Unkekulized atoms: 6 7 20
[19:06:09] Can't kekulize mol.  Unkekulized atoms: 1 3 4 5 9 24 26 29 30
[19:06:09] SMILES Parse Error: unclosed ring for input: 'CN(C)c1ccc(C2C3=C(CCCC3=O)Nc3c2c(=O)n2Cc2ccccc2)(CCF)cc1'
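
The interleaved output above is hard to read at a glance, so a small helper that condenses it may be useful. The following is a minimal sketch, not part of the repository: `summarize_log`, `LOSS_RE`, `RDKIT_RE`, and `ERROR_KINDS` are hypothetical names, and the patterns are assumptions based only on the line formats visible in this output. It averages the complete loss lines per epoch and tallies the RDKit messages by kind. Loss lines cut off by the notebook's output truncation are skipped with a heuristic (fewer than six decimal digits), and `Failed parsing SMILES` lines are ignored because in this log they repeat the error reported on the preceding line.

```python
import re
from collections import Counter, defaultdict

# Assumed line formats, taken from the output shown in this notebook.
LOSS_RE = re.compile(
    r"^Epoch\s+(\d+)\s+--\s+Batch\s+\d+/\s*\d+,\s+training loss\s+(\d+\.\d{6,})\s*$"
)
RDKIT_RE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]\s+(.+)$")

# (substring to look for, label used in the tally); first match wins.
ERROR_KINDS = [
    ("Can't kekulize mol", "kekulization failure"),
    ("Explicit valence", "valence violation"),
    ("unclosed ring", "unclosed ring"),
    ("extra open parentheses", "extra open parentheses"),
    ("extra close parentheses", "extra close parentheses"),
    ("duplicates bond", "duplicate ring-closure bond"),
    ("duplicated ring closure", "ring closure to same atom"),
    ("syntax error", "syntax error"),
]


def summarize_log(text):
    """Return (mean training loss per epoch, Counter of RDKit message kinds)."""
    losses = defaultdict(list)
    errors = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        match = LOSS_RE.match(line)
        if match:
            losses[int(match.group(1))].append(float(match.group(2)))
            continue
        match = RDKIT_RE.match(line)
        if not match:
            continue  # blank line or a fragment cut off by output truncation
        message = match.group(1)
        if "Failed parsing SMILES" in message:
            continue  # repeats the error reported on the previous line
        for needle, label in ERROR_KINDS:
            if needle in message:
                errors[label] += 1
                break
    mean_loss = {epoch: sum(vals) / len(vals) for epoch, vals in losses.items()}
    return mean_loss, errors


if __name__ == "__main__":
    sample = "\n".join([
        "Epoch 14 -- Batch 1/ 842, training loss 0.3551945090293884",
        "Epoch 14 -- Batch 2/ 842, training loss 0.3650028109550476",
        "Epoch 14 -- Batch 17/ 842, training loss 0.",
        "[19:06:09] Can't kekulize mol.  Unkekulized atoms: 4 5 6 7 8 10 11",
        "[19:06:09] Explicit valence for atom # 1 O, 3, is greater than permitted",
    ])
    mean_loss, errors = summarize_log(sample)
    for epoch in sorted(mean_loss):
        print(f"epoch {epoch}: mean training loss {mean_loss[epoch]:.4f}")
    for label, count in errors.most_common():
        print(f"{label}: {count}")
```

Because the notebook truncates each output chunk, any counts obtained this way cover only the visible part of the log and should not be read as validity statistics for the sampled SMILES.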


Epoch 15 -- Batch 1/ 842, training loss 0.35131216049194336
Epoch 15 -- Batch 2/ 842, training loss 0.3488949239253998
Epoch 15 -- Batch 3/ 842, training loss 0.3464715778827667
Epoch 15 -- Batch 4/ 842, training loss 0.34852030873298645
Epoch 15 -- Batch 5/ 842, training loss 0.35005152225494385
Epoch 15 -- Batch 6/ 842, training loss 0.3545200526714325
Epoch 15 -- Batch 7/ 842, training loss 0.34256240725517273
Epoch 15 -- Batch 8/ 842, training loss 0.34702298045158386
Epoch 15 -- Batch 9/ 842, training loss 0.36478620767593384
Epoch 15 -- Batch 10/ 842, training loss 0.3552722632884979
Epoch 15 -- Batch 11/ 842, training loss 0.34794601798057556
Epoch 15 -- Batch 12/ 842, training loss 0.34574273228645325
Epoch 15 -- Batch 13/ 842, training loss 0.35690850019454956
Epoch 15 -- Batch 14/ 842, training loss 0.3492535352706909
Epoch 15 -- Batch 15/ 842, training loss 0.35627296566963196
Epoch 15 -- Batch 16/ 842, training loss 0.3438059091567993
Epoch 15 -- Batch 17/ 842, training los

[19:06:19] Can't kekulize mol.  Unkekulized atoms: 15 16 17 19 21
[19:06:19] SMILES Parse Error: unclosed ring for input: 'CCc1nnc(CNC(=O)CCc2nnc3n3CCCCC3)n(C)n1'
[19:06:19] Can't kekulize mol.  Unkekulized atoms: 2 3 4
[19:06:19] SMILES Parse Error: extra open parentheses for input: 'CN1CCC(N(CC(=O)N(Cc2ccc3c(c2)OCO3)Cc2cccs2)CC1'
[19:06:19] SMILES Parse Error: ring closure 2 duplicates bond between atom 9 and atom 10 for input: 'CCCc1ccc(OCC2c2cc/c(=O)4c4ccc(C)cc5nc3n2C)cc1'
[19:06:19] Can't kekulize mol.  Unkekulized atoms: 7 8 15 16 17
[19:06:19] SMILES Parse Error: unclosed ring for input: 'COc1ccc2c(c1)NC1c2cc(OC)c(OC)cc2C1=O'
[19:06:19] Can't kekulize mol.  Unkekulized atoms: 19 20 21 22 23
[19:06:19] SMILES Parse Error: unclosed ring for input: 'O=C(C1CCCC1)N1C[C@H]2N[C@H](C1)C3c1ccc(-c2ccccc2)cc1'
[19:06:19] SMILES Parse Error: unclosed ring for input: 'CN1[C@H]2CC[C@@H]1CN(Cc1nc(Cc3cccc(F)c3)ns2)CCC12'
[19:06:19] Can't kekulize mol.  Unkekulized atoms: 3 4 5 6 7 8 9
[19:06:19

Epoch 16 -- Batch 1/ 842, training loss 0.34887075424194336
Epoch 16 -- Batch 2/ 842, training loss 0.3276153802871704
Epoch 16 -- Batch 3/ 842, training loss 0.3511233329772949
Epoch 16 -- Batch 4/ 842, training loss 0.33494797348976135
Epoch 16 -- Batch 5/ 842, training loss 0.34347158670425415
Epoch 16 -- Batch 6/ 842, training loss 0.34494441747665405
Epoch 16 -- Batch 7/ 842, training loss 0.34301888942718506
Epoch 16 -- Batch 8/ 842, training loss 0.3276536762714386
Epoch 16 -- Batch 9/ 842, training loss 0.35543060302734375
Epoch 16 -- Batch 10/ 842, training loss 0.3411652743816376
Epoch 16 -- Batch 11/ 842, training loss 0.35644248127937317
Epoch 16 -- Batch 12/ 842, training loss 0.34729889035224915
Epoch 16 -- Batch 13/ 842, training loss 0.34343427419662476
Epoch 16 -- Batch 14/ 842, training loss 0.34637388586997986
Epoch 16 -- Batch 15/ 842, training loss 0.3553224802017212
Epoch 16 -- Batch 16/ 842, training loss 0.3447275459766388
Epoch 16 -- Batch 17/ 842, training los

[19:06:29] Can't kekulize mol.  Unkekulized atoms: 3 4 8 9 21 22 23
[19:06:29] Can't kekulize mol.  Unkekulized atoms: 4 5 6
[19:06:29] Can't kekulize mol.  Unkekulized atoms: 6 13 15 26 27 28 31
[19:06:29] SMILES Parse Error: unclosed ring for input: 'CC1(C)Cc2c(sc(NC(=O)c3ccccc3Cl)c2=O)C2(C)C'
[19:06:29] SMILES Parse Error: extra open parentheses for input: 'COc1cccc(C(=O)Oc2ccc(/C=C3/SC(=S)N(CC(=O)O)C3=O)cc2=O'
[19:06:29] Can't kekulize mol.  Unkekulized atoms: 13 14 15
[19:06:29] Can't kekulize mol.  Unkekulized atoms: 5 6 8 9 10
[19:06:29] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 5 6 19
[19:06:29] SMILES Parse Error: unclosed ring for input: 'O=C1CC(c2ccccc2)Nc2nc3cc4ccccc-n3c22'
[19:06:29] Can't kekulize mol.  Unkekulized atoms: 11 13 14 19 20 21 22
[19:06:29] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 6 7 8 22 23
[19:06:29] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 6 16 29
[19:06:29] SMILES Parse Error: unclosed ring for input: 'Cc1csc2nc(CNS(=O)(=O)c3ccc4c4c(c

Epoch 17 -- Batch 1/ 842, training loss 0.34768131375312805
Epoch 17 -- Batch 2/ 842, training loss 0.35069727897644043
Epoch 17 -- Batch 3/ 842, training loss 0.3368029296398163
Epoch 17 -- Batch 4/ 842, training loss 0.35167649388313293
Epoch 17 -- Batch 5/ 842, training loss 0.342470645904541
Epoch 17 -- Batch 6/ 842, training loss 0.3417949378490448
Epoch 17 -- Batch 7/ 842, training loss 0.3506258726119995
Epoch 17 -- Batch 8/ 842, training loss 0.3395083546638489
Epoch 17 -- Batch 9/ 842, training loss 0.3436502516269684
Epoch 17 -- Batch 10/ 842, training loss 0.3340004086494446
Epoch 17 -- Batch 11/ 842, training loss 0.3476896584033966
Epoch 17 -- Batch 12/ 842, training loss 0.33771029114723206
Epoch 17 -- Batch 13/ 842, training loss 0.34038203954696655
Epoch 17 -- Batch 14/ 842, training loss 0.34130024909973145
Epoch 17 -- Batch 15/ 842, training loss 0.3444267511367798
Epoch 17 -- Batch 16/ 842, training loss 0.3422892093658447
Epoch 17 -- Batch 17/ 842, training loss 0.3

[19:06:39] Can't kekulize mol.  Unkekulized atoms: 8
[19:06:39] Can't kekulize mol.  Unkekulized atoms: 9
[19:06:39] SMILES Parse Error: unclosed ring for input: 'C[C@@H]1CC[C@@]2(OC1)OC1CC3C4CC=C5C[C@@H](O)CC[C@]5(C)C4CC[C@]1(C)C2'
[19:06:39] Can't kekulize mol.  Unkekulized atoms: 13 14 29
[19:06:39] SMILES Parse Error: unclosed ring for input: 'COc1ccc(CNC(=O)CSc2nnc(Cn3c(=O)sc4ccccc33)n2C)cc1'
[19:06:39] SMILES Parse Error: extra open parentheses for input: 'c1ccc2c(c1)CCOC2c1nccc([nH]c2ccccc12'
[19:06:39] SMILES Parse Error: unclosed ring for input: 'CC1=NCC(C(=O)N2CCN(C(=O)c3cc4cc(OC)c(OC)cc4sc3c3)CCC2)C1'
[19:06:39] SMILES Parse Error: syntax error while parsing: CCOc1ccc(-n2c(C)cc(/)=C(\C#N)C2c2cc3c(cc2Br)OCO3)cc1C
[19:06:39] SMILES Parse Error: Failed parsing SMILES 'CCOc1ccc(-n2c(C)cc(/)=C(\C#N)C2c2cc3c(cc2Br)OCO3)cc1C' for input: 'CCOc1ccc(-n2c(C)cc(/)=C(\C#N)C2c2cc3c(cc2Br)OCO3)cc1C'
[19:06:40] SMILES Parse Error: extra open parentheses for input: 'Brc1cccc(Oc2ccc3c4c(oc3c2

Epoch 18 -- Batch 1/ 842, training loss 0.3246118128299713
Epoch 18 -- Batch 2/ 842, training loss 0.3254129886627197
Epoch 18 -- Batch 3/ 842, training loss 0.33914270997047424
Epoch 18 -- Batch 4/ 842, training loss 0.33431270718574524
Epoch 18 -- Batch 5/ 842, training loss 0.3470187187194824
Epoch 18 -- Batch 6/ 842, training loss 0.3404253423213959
Epoch 18 -- Batch 7/ 842, training loss 0.3414013981819153
Epoch 18 -- Batch 8/ 842, training loss 0.34155553579330444
Epoch 18 -- Batch 9/ 842, training loss 0.33105385303497314
Epoch 18 -- Batch 10/ 842, training loss 0.34139955043792725
Epoch 18 -- Batch 11/ 842, training loss 0.33808761835098267
Epoch 18 -- Batch 12/ 842, training loss 0.3519282639026642
Epoch 18 -- Batch 13/ 842, training loss 0.34202778339385986
Epoch 18 -- Batch 14/ 842, training loss 0.3400056064128876
Epoch 18 -- Batch 15/ 842, training loss 0.3301893472671509
Epoch 18 -- Batch 16/ 842, training loss 0.3318544030189514
Epoch 18 -- Batch 17/ 842, training loss 0

[19:06:50] SMILES Parse Error: syntax error while parsing: O=C(NCc1n[nH]c(=S)n1-c1cccc()c1)c1ccccc1
[19:06:50] SMILES Parse Error: Failed parsing SMILES 'O=C(NCc1n[nH]c(=S)n1-c1cccc()c1)c1ccccc1' for input: 'O=C(NCc1n[nH]c(=S)n1-c1cccc()c1)c1ccccc1'
[19:06:50] SMILES Parse Error: syntax error while parsing: CCCCNC(=O)/(=C/c1ccco1)NC(=O)c1ccc(C)cc1
[19:06:50] SMILES Parse Error: Failed parsing SMILES 'CCCCNC(=O)/(=C/c1ccco1)NC(=O)c1ccc(C)cc1' for input: 'CCCCNC(=O)/(=C/c1ccco1)NC(=O)c1ccc(C)cc1'
[19:06:50] SMILES Parse Error: syntax error while parsing: Cc1ccc(-n2c(=O)[nH]cc(C(=O)Nc3c(C)cc(Br)cc3C)c2=)cc1
[19:06:50] SMILES Parse Error: Failed parsing SMILES 'Cc1ccc(-n2c(=O)[nH]cc(C(=O)Nc3c(C)cc(Br)cc3C)c2=)cc1' for input: 'Cc1ccc(-n2c(=O)[nH]cc(C(=O)Nc3c(C)cc(Br)cc3C)c2=)cc1'
[19:06:50] SMILES Parse Error: ring closure 2 duplicates bond between atom 4 and atom 12 for input: 'C[C@@H]1C[C@@H](C2(c3ccc(F)cc3)C2)n(C)n1'
[19:06:50] SMILES Parse Error: extra close parentheses while parsing: C

Epoch 19 -- Batch 1/ 842, training loss 0.3267415761947632
Epoch 19 -- Batch 2/ 842, training loss 0.3274000883102417
Epoch 19 -- Batch 3/ 842, training loss 0.3315780460834503
Epoch 19 -- Batch 4/ 842, training loss 0.3226020336151123
Epoch 19 -- Batch 5/ 842, training loss 0.34827181696891785
Epoch 19 -- Batch 6/ 842, training loss 0.32821935415267944
Epoch 19 -- Batch 7/ 842, training loss 0.33390310406684875
Epoch 19 -- Batch 8/ 842, training loss 0.3423732817173004
Epoch 19 -- Batch 9/ 842, training loss 0.32637739181518555
Epoch 19 -- Batch 10/ 842, training loss 0.34051331877708435
Epoch 19 -- Batch 11/ 842, training loss 0.33092352747917175
Epoch 19 -- Batch 12/ 842, training loss 0.3239806592464447
Epoch 19 -- Batch 13/ 842, training loss 0.33943215012550354
Epoch 19 -- Batch 14/ 842, training loss 0.3362945020198822
Epoch 19 -- Batch 15/ 842, training loss 0.32654133439064026
Epoch 19 -- Batch 16/ 842, training loss 0.33544081449508667
Epoch 19 -- Batch 17/ 842, training loss

[19:07:00] Can't kekulize mol.  Unkekulized atoms: 5 6 8
[19:07:00] Can't kekulize mol.  Unkekulized atoms: 2 3 5 7 8 9 10 11 12
[19:07:00] Can't kekulize mol.  Unkekulized atoms: 12 13 14 15 16 17 25 33 34
[19:07:00] SMILES Parse Error: unclosed ring for input: 'C[C@]12CC=CCC1CC3CCC'
[19:07:00] Can't kekulize mol.  Unkekulized atoms: 3 5 10 11 20
[19:07:00] SMILES Parse Error: unclosed ring for input: 'CNC(=O)[C@@H]1C[C@@H](NCC2=CC[C@H]3C[C@H]2C3(c2ccc(C)cc2)C3)C1'
[19:07:00] Can't kekulize mol.  Unkekulized atoms: 10 11 12 13 26
[19:07:00] Can't kekulize mol.  Unkekulized atoms: 5 15 24
[19:07:00] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 5 6 7 8 9 10 11
[19:07:00] SMILES Parse Error: ring closure 3 duplicates bond between atom 16 and atom 17 for input: 'Cc1sc2nc(CCN)nc(N3C[C@H]4C[C@H]3C3[C@H]3CC[C@H]3N4)c2c1C'
[19:07:00] SMILES Parse Error: unclosed ring for input: 'CC(=O)c1c(C)oc2ccc(N(C(=O)Nc3ccccc3Cl)C3=O)c1Cl'
[19:07:00] SMILES Parse Error: extra open parentheses for input

Epoch 20 -- Batch 1/ 842, training loss 0.3206017017364502
Epoch 20 -- Batch 2/ 842, training loss 0.3367495834827423
Epoch 20 -- Batch 3/ 842, training loss 0.31786760687828064
Epoch 20 -- Batch 4/ 842, training loss 0.3202224671840668
Epoch 20 -- Batch 5/ 842, training loss 0.32466819882392883
Epoch 20 -- Batch 6/ 842, training loss 0.3153238296508789
Epoch 20 -- Batch 7/ 842, training loss 0.3328734040260315
Epoch 20 -- Batch 8/ 842, training loss 0.340962678194046
Epoch 20 -- Batch 9/ 842, training loss 0.32902634143829346
Epoch 20 -- Batch 10/ 842, training loss 0.31604287028312683
Epoch 20 -- Batch 11/ 842, training loss 0.3225213587284088
Epoch 20 -- Batch 12/ 842, training loss 0.32428473234176636
Epoch 20 -- Batch 13/ 842, training loss 0.329302579164505
Epoch 20 -- Batch 14/ 842, training loss 0.3308044373989105
Epoch 20 -- Batch 15/ 842, training loss 0.3404587507247925
Epoch 20 -- Batch 16/ 842, training loss 0.3265244960784912
Epoch 20 -- Batch 17/ 842, training loss 0.319

[19:07:11] SMILES Parse Error: extra close parentheses while parsing: CNC(=O)C12CC3(C))CC(C)(CC(C)(C3)C1)C2
[19:07:11] SMILES Parse Error: Failed parsing SMILES 'CNC(=O)C12CC3(C))CC(C)(CC(C)(C3)C1)C2' for input: 'CNC(=O)C12CC3(C))CC(C)(CC(C)(C3)C1)C2'
[19:07:11] SMILES Parse Error: unclosed ring for input: 'CC(C)[C@@H]1C2C=CC(OC(=O)C2CC2)C(=O)[C@@H]1CCO'
[19:07:11] Can't kekulize mol.  Unkekulized atoms: 1 2 6 7 8 28 29
[19:07:11] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 5
[19:07:11] SMILES Parse Error: unclosed ring for input: 'c1ccc(-n2ncc(-c3ccc4cn[nH]c3c3CC2C(=O)OC3CCCCC3)cc2)cc1'
[19:07:11] Can't kekulize mol.  Unkekulized atoms: 22 23 24 25 29 30 31
[19:07:11] Explicit valence for atom # 10 N, 5, is greater than permitted
[19:07:11] Explicit valence for atom # 7 N, 4, is greater than permitted
[19:07:11] Can't kekulize mol.  Unkekulized atoms: 3 4 5 6 8 9 10 14 24 25 26 27 28
[19:07:11] SMILES Parse Error: unclosed ring for input: 'CNc1ccc([N+](=O)[O-])cc1-n1c2ccccc3cccc2c

Epoch 21 -- Batch 1/ 842, training loss 0.3246091604232788
Epoch 21 -- Batch 2/ 842, training loss 0.32173535227775574
Epoch 21 -- Batch 3/ 842, training loss 0.3143523633480072
Epoch 21 -- Batch 4/ 842, training loss 0.3237268030643463
Epoch 21 -- Batch 5/ 842, training loss 0.3294360041618347
Epoch 21 -- Batch 6/ 842, training loss 0.33539897203445435
Epoch 21 -- Batch 7/ 842, training loss 0.3225676417350769
Epoch 21 -- Batch 8/ 842, training loss 0.3319466710090637
Epoch 21 -- Batch 9/ 842, training loss 0.3291208744049072
Epoch 21 -- Batch 10/ 842, training loss 0.33804795145988464
Epoch 21 -- Batch 11/ 842, training loss 0.3358694911003113
Epoch 21 -- Batch 12/ 842, training loss 0.32413244247436523
Epoch 21 -- Batch 13/ 842, training loss 0.319232314825058
Epoch 21 -- Batch 14/ 842, training loss 0.3236576318740845
Epoch 21 -- Batch 15/ 842, training loss 0.31051895022392273
Epoch 21 -- Batch 16/ 842, training loss 0.31000804901123047
Epoch 21 -- Batch 17/ 842, training loss 0.3

[19:07:21] Can't kekulize mol.  Unkekulized atoms: 1 2 4
[19:07:21] SMILES Parse Error: unclosed ring for input: 'CC1(C)CC2([N+](=O)[O-])NN=C(c2ccccc2O)S1'
[19:07:21] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 5 6 16 17 18
[19:07:21] Can't kekulize mol.  Unkekulized atoms: 17 18 19 20 21 23 24 25 26
[19:07:21] Can't kekulize mol.  Unkekulized atoms: 1 2 21
[19:07:21] SMILES Parse Error: unclosed ring for input: 'CCN1C(=O)[C@H]2[C@@H](CN3CCc3cnn(C)c3C2)N1Cc1cccc(Cl)c1'
[19:07:21] SMILES Parse Error: unclosed ring for input: 'CN1CCC(Cc2noc(C3CN(CCC(C)CC(C)C4)n3)CC2)cc1'
[19:07:21] SMILES Parse Error: unclosed ring for input: 'CC(=O)N1C=C(C)C2C=[N+](=O)/C1=C\c1ccc(OCc2ccccc2F)cc1'
[19:07:21] SMILES Parse Error: unclosed ring for input: 'CC1CCC2C3CCC4=NC(=NCc5ccccc5)N3N(CC)C(=O)C2C1'
[19:07:21] Can't kekulize mol.  Unkekulized atoms: 14 15 25
[19:07:21] Can't kekulize mol.  Unkekulized atoms: 9 10 12 13 14 15 16 18 19
[19:07:21] SMILES Parse Error: unclosed ring for input: 'Cc1ccc(C)c

Epoch 22 -- Batch 1/ 842, training loss 0.30925559997558594
Epoch 22 -- Batch 2/ 842, training loss 0.31402188539505005
Epoch 22 -- Batch 3/ 842, training loss 0.31802913546562195
Epoch 22 -- Batch 4/ 842, training loss 0.32954928278923035
Epoch 22 -- Batch 5/ 842, training loss 0.3185601532459259
Epoch 22 -- Batch 6/ 842, training loss 0.31877049803733826
Epoch 22 -- Batch 7/ 842, training loss 0.3206152021884918
Epoch 22 -- Batch 8/ 842, training loss 0.3252555727958679
Epoch 22 -- Batch 9/ 842, training loss 0.3230220079421997
Epoch 22 -- Batch 10/ 842, training loss 0.3275728225708008
Epoch 22 -- Batch 11/ 842, training loss 0.323304146528244
Epoch 22 -- Batch 12/ 842, training loss 0.31753620505332947
Epoch 22 -- Batch 13/ 842, training loss 0.32849815487861633
Epoch 22 -- Batch 14/ 842, training loss 0.32148101925849915
Epoch 22 -- Batch 15/ 842, training loss 0.3152467608451843
Epoch 22 -- Batch 16/ 842, training loss 0.3187935948371887
Epoch 22 -- Batch 17/ 842, training loss 0

[19:07:31] SMILES Parse Error: unclosed ring for input: 'CS(=O)(=O)c1ccc(CN2CCCC3(CC3)OCc2ccccc2)cc1'
[19:07:31] Can't kekulize mol.  Unkekulized atoms: 5 6 7 8 9
[19:07:31] Can't kekulize mol.  Unkekulized atoms: 14 15 16 19 21
[19:07:31] SMILES Parse Error: unclosed ring for input: 'Cc1cccc(C2=C(Sc3nc(-c4ccccc4)no3)N3CCC[C@@]32C2=O)c1'
[19:07:31] Can't kekulize mol.  Unkekulized atoms: 3 4 6
[19:07:31] Can't kekulize mol.  Unkekulized atoms: 1 2 11 13 14 15 16 17 18
[19:07:31] Can't kekulize mol.  Unkekulized atoms: 10 12 13 14 15 17 18
[19:07:31] Can't kekulize mol.  Unkekulized atoms: 11 12 13 14 15 16 17
[19:07:31] SMILES Parse Error: unclosed ring for input: 'CCOc1ccc(C2c3ccnc3c(SCC(=O)Nc4ccc(C)cc4)ncnc32)cc1'
[19:07:31] SMILES Parse Error: unclosed ring for input: 'O=C1CCc2cc(C(=O)c3ccc4c(Br)cccc3c3)ccc2N1'
[19:07:31] SMILES Parse Error: unclosed ring for input: 'CN(C)S(=O)(=O)n1cc(/C=C(\NC(=O)c2ccccc2F)C(=O)NCCO)C[C@@H]2C=Cc3ccccc31'
[19:07:31] Can't kekulize mol.  Unkekulized 

Epoch 23 -- Batch 1/ 842, training loss 0.31698521971702576
Epoch 23 -- Batch 2/ 842, training loss 0.31302446126937866
Epoch 23 -- Batch 3/ 842, training loss 0.30325308442115784
Epoch 23 -- Batch 4/ 842, training loss 0.3065296709537506
Epoch 23 -- Batch 5/ 842, training loss 0.31090816855430603
Epoch 23 -- Batch 6/ 842, training loss 0.30656513571739197
Epoch 23 -- Batch 7/ 842, training loss 0.324036568403244
Epoch 23 -- Batch 8/ 842, training loss 0.31825658679008484
Epoch 23 -- Batch 9/ 842, training loss 0.3187321424484253
Epoch 23 -- Batch 10/ 842, training loss 0.3160231113433838
Epoch 23 -- Batch 11/ 842, training loss 0.32248708605766296
Epoch 23 -- Batch 12/ 842, training loss 0.3184175491333008
Epoch 23 -- Batch 13/ 842, training loss 0.3104223608970642
Epoch 23 -- Batch 14/ 842, training loss 0.3281659185886383
Epoch 23 -- Batch 15/ 842, training loss 0.3225456774234772
Epoch 23 -- Batch 16/ 842, training loss 0.31720343232154846
Epoch 23 -- Batch 17/ 842, training loss 0

[19:07:41] Can't kekulize mol.  Unkekulized atoms: 11 22 23 24 25
[19:07:41] SMILES Parse Error: unclosed ring for input: 'Cc1noc(C)c1CN(C)C(Cc1ccccc1)c1nccc2n1Cc1ccccc1'
[19:07:41] SMILES Parse Error: unclosed ring for input: 'COc1ccc(-c2ccc3c(c2)[C@@H]2C[C@H](N(C)C(=O)CN(C)C)[C@H](CO)O2)cc1'
[19:07:41] Explicit valence for atom # 1 C, 5, is greater than permitted
[19:07:41] SMILES Parse Error: extra close parentheses while parsing: Cc1cc(-c2cc3ccccc3o2)c2ccco2)cc1
[19:07:41] SMILES Parse Error: Failed parsing SMILES 'Cc1cc(-c2cc3ccccc3o2)c2ccco2)cc1' for input: 'Cc1cc(-c2cc3ccccc3o2)c2ccco2)cc1'
[19:07:41] SMILES Parse Error: duplicated ring closure 4 bonds atom 21 to itself for input: 'Cc1cc(C)n2nc(C(=O)N3CCC4(CCc4ccccc44)C3)nc2n1'
[19:07:41] Can't kekulize mol.  Unkekulized atoms: 8 9 22 23 24 25 30
[19:07:41] SMILES Parse Error: unclosed ring for input: 'C/C(Cc1ccc([C@@H]2C(=O)NCCCN2CCCC2=O)cc1)NC(=O)c1ccco1'
[19:07:41] Can't kekulize mol.  Unkekulized atoms: 6 7 8 33 34
[19:07:41

Epoch 24 -- Batch 1/ 842, training loss 0.31341686844825745
Epoch 24 -- Batch 2/ 842, training loss 0.30258673429489136
Epoch 24 -- Batch 3/ 842, training loss 0.3117695152759552
Epoch 24 -- Batch 4/ 842, training loss 0.3212900757789612
Epoch 24 -- Batch 5/ 842, training loss 0.30760207772254944
Epoch 24 -- Batch 6/ 842, training loss 0.30804431438446045
Epoch 24 -- Batch 7/ 842, training loss 0.3204834461212158
Epoch 24 -- Batch 8/ 842, training loss 0.3108193874359131
Epoch 24 -- Batch 9/ 842, training loss 0.3129231333732605
Epoch 24 -- Batch 10/ 842, training loss 0.3188689649105072
Epoch 24 -- Batch 11/ 842, training loss 0.31039971113204956
Epoch 24 -- Batch 12/ 842, training loss 0.31858697533607483
Epoch 24 -- Batch 13/ 842, training loss 0.31347814202308655
Epoch 24 -- Batch 14/ 842, training loss 0.3278818726539612
Epoch 24 -- Batch 15/ 842, training loss 0.3084886074066162
Epoch 24 -- Batch 16/ 842, training loss 0.3264237940311432
Epoch 24 -- Batch 17/ 842, training loss 0

[19:07:52] SMILES Parse Error: syntax error while parsing: Cc1ccc(NC(=O)Cc2csc(-c3cccc()c3)n2)cc1NS(C)(=O)=O
[19:07:52] SMILES Parse Error: Failed parsing SMILES 'Cc1ccc(NC(=O)Cc2csc(-c3cccc()c3)n2)cc1NS(C)(=O)=O' for input: 'Cc1ccc(NC(=O)Cc2csc(-c3cccc()c3)n2)cc1NS(C)(=O)=O'
[19:07:52] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 6 7 18 20 21
[19:07:52] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 5 6 18 19 23
[19:07:52] SMILES Parse Error: unclosed ring for input: 'Cc1ccc(C2CCC3(CCC(=O)N3CCSC3=O)CC2)cc1'
[19:07:52] Can't kekulize mol.  Unkekulized atoms: 1 2 5 6 28
[19:07:52] SMILES Parse Error: ring closure 2 duplicates bond between atom 5 and atom 10 for input: 'CCOC(=O)C12C(=O)OCC12C(=O)N(C(C)C)c1ccccc12'
[19:07:52] Can't kekulize mol.  Unkekulized atoms: 27 28 29 30 31
[19:07:52] Can't kekulize mol.  Unkekulized atoms: 16 17 18 19 20
[19:07:52] SMILES Parse Error: unclosed ring for input: 'COCC(=O)Nc1ccc2c(c1)[C@@H]1C[C@@H](CC(=O)O)O[C@@H]2C[C@@H](O)[C@@H]1O2'
[19:07:52] SM

Epoch 25 -- Batch 1/ 842, training loss 0.29613691568374634
Epoch 25 -- Batch 2/ 842, training loss 0.3096993565559387
Epoch 25 -- Batch 3/ 842, training loss 0.30765050649642944
Epoch 25 -- Batch 4/ 842, training loss 0.3119145929813385
Epoch 25 -- Batch 5/ 842, training loss 0.324607253074646
Epoch 25 -- Batch 6/ 842, training loss 0.31439635157585144
Epoch 25 -- Batch 7/ 842, training loss 0.30113211274147034
Epoch 25 -- Batch 8/ 842, training loss 0.29917970299720764
Epoch 25 -- Batch 9/ 842, training loss 0.3189898133277893
Epoch 25 -- Batch 10/ 842, training loss 0.31025129556655884
Epoch 25 -- Batch 11/ 842, training loss 0.31116724014282227
Epoch 25 -- Batch 12/ 842, training loss 0.302013099193573
Epoch 25 -- Batch 13/ 842, training loss 0.3145207166671753
Epoch 25 -- Batch 14/ 842, training loss 0.3141750395298004
Epoch 25 -- Batch 15/ 842, training loss 0.31307703256607056
Epoch 25 -- Batch 16/ 842, training loss 0.30915096402168274
Epoch 25 -- Batch 17/ 842, training loss 0

[19:08:02] Explicit valence for atom # 4 O, 3, is greater than permitted
[19:08:02] Can't kekulize mol.  Unkekulized atoms: 4 5 6 20 21 22 23
[19:08:02] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 6 9 10
[19:08:02] Can't kekulize mol.  Unkekulized atoms: 2 3 10
[19:08:02] Can't kekulize mol.  Unkekulized atoms: 2 3 5 7 8 9 10 14 15 16 17 18 20
[19:08:02] Can't kekulize mol.  Unkekulized atoms: 3 4 15
[19:08:02] SMILES Parse Error: extra open parentheses for input: 'N#Cc1c(NC(=O)CN(CCCN2CCOCC2)sc2ccccc2n1'
[19:08:02] Can't kekulize mol.  Unkekulized atoms: 9 10 21
[19:08:02] SMILES Parse Error: unclosed ring for input: 'Cc1cccc(OCC2CO2)c1-c1ccc(-c2ccccc2OC3CCCCC2)cc1'
[19:08:02] Can't kekulize mol.  Unkekulized atoms: 1 3
[19:08:02] SMILES Parse Error: unclosed ring for input: 'Cc1ccc(F)c(NC(=O)C2C3C=CC4(O3)C2C(=O)N(C2CCC(C)CC2)C4C4C(=O)NC2=O)c1'
[19:08:02] SMILES Parse Error: extra close parentheses while parsing: N=C(N)N=N/C=C/C=C/c1ccccc1)c1ccccc1
[19:08:02] SMILES Parse Error: F

Epoch 26 -- Batch 1/ 842, training loss 0.30417197942733765
Epoch 26 -- Batch 2/ 842, training loss 0.3142158091068268
Epoch 26 -- Batch 3/ 842, training loss 0.31422415375709534
Epoch 26 -- Batch 4/ 842, training loss 0.30467814207077026
Epoch 26 -- Batch 5/ 842, training loss 0.31406477093696594
Epoch 26 -- Batch 6/ 842, training loss 0.3197287321090698
Epoch 26 -- Batch 7/ 842, training loss 0.29786986112594604
Epoch 26 -- Batch 8/ 842, training loss 0.3080611228942871
Epoch 26 -- Batch 9/ 842, training loss 0.30594146251678467
Epoch 26 -- Batch 10/ 842, training loss 0.3132709264755249
Epoch 26 -- Batch 11/ 842, training loss 0.3219780921936035
Epoch 26 -- Batch 12/ 842, training loss 0.3123544156551361
Epoch 26 -- Batch 13/ 842, training loss 0.30510562658309937
Epoch 26 -- Batch 14/ 842, training loss 0.3211993873119354
Epoch 26 -- Batch 15/ 842, training loss 0.30149808526039124
Epoch 26 -- Batch 16/ 842, training loss 0.31006136536598206
Epoch 26 -- Batch 17/ 842, training loss

[19:08:12] Can't kekulize mol.  Unkekulized atoms: 11 12 13 14 15 16 17 18 21
[19:08:12] Can't kekulize mol.  Unkekulized atoms: 24 25 26
[19:08:12] Can't kekulize mol.  Unkekulized atoms: 2 3 4 6 9
[19:08:12] SMILES Parse Error: extra close parentheses while parsing: CC1(C)Oc2ccc3cccc(=O)[n+]24)cc1Cl
[19:08:12] SMILES Parse Error: Failed parsing SMILES 'CC1(C)Oc2ccc3cccc(=O)[n+]24)cc1Cl' for input: 'CC1(C)Oc2ccc3cccc(=O)[n+]24)cc1Cl'
[19:08:12] Can't kekulize mol.  Unkekulized atoms: 1 2 3
[19:08:12] Explicit valence for atom # 9 C, 5, is greater than permitted
[19:08:12] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 17
[19:08:12] Can't kekulize mol.  Unkekulized atoms: 7 8 9 10 27
[19:08:12] SMILES Parse Error: unclosed ring for input: 'CC1(C)Cc2c(c(N3CCOCC3)nc3sc4c(-c5cccc5c5ccccc55)cnc4c23)CO1'
[19:08:12] SMILES Parse Error: unclosed ring for input: 'CCN(CC)S(=O)(=O)c1ccc(OC)c(N2CCN(C(=O)C34CC5CC(CC(C5)C2)C4)CC2)c1'
[19:08:12] Can't kekulize mol.  Unkekulized atoms: 12 13 17 18 2

Epoch 27 -- Batch 1/ 842, training loss 0.3072579801082611
Epoch 27 -- Batch 2/ 842, training loss 0.2980761229991913
Epoch 27 -- Batch 3/ 842, training loss 0.3063768446445465
Epoch 27 -- Batch 4/ 842, training loss 0.30846646428108215
Epoch 27 -- Batch 5/ 842, training loss 0.31508150696754456
Epoch 27 -- Batch 6/ 842, training loss 0.31293633580207825
Epoch 27 -- Batch 7/ 842, training loss 0.305373877286911
Epoch 27 -- Batch 8/ 842, training loss 0.3042360544204712
Epoch 27 -- Batch 9/ 842, training loss 0.3126601576805115
Epoch 27 -- Batch 10/ 842, training loss 0.3080982267856598
Epoch 27 -- Batch 11/ 842, training loss 0.3055137097835541
Epoch 27 -- Batch 12/ 842, training loss 0.3016073405742645
Epoch 27 -- Batch 13/ 842, training loss 0.3054662346839905
Epoch 27 -- Batch 14/ 842, training loss 0.3084569275379181
Epoch 27 -- Batch 15/ 842, training loss 0.3080443739891052
Epoch 27 -- Batch 16/ 842, training loss 0.3099311292171478
Epoch 27 -- Batch 17/ 842, training loss 0.3039

[19:08:23] SMILES Parse Error: extra close parentheses while parsing: Cc1ccc(F)cc1NC(=O)CSc1nnc(-c2cc(Cl)ccc2Cl)n1N)c1ccccc1
[19:08:23] SMILES Parse Error: Failed parsing SMILES 'Cc1ccc(F)cc1NC(=O)CSc1nnc(-c2cc(Cl)ccc2Cl)n1N)c1ccccc1' for input: 'Cc1ccc(F)cc1NC(=O)CSc1nnc(-c2cc(Cl)ccc2Cl)n1N)c1ccccc1'
[19:08:23] SMILES Parse Error: extra close parentheses while parsing: c1ccc2c3c([nH]c2c1)c1nc-n1CCc1ccccc1)C2
[19:08:23] SMILES Parse Error: Failed parsing SMILES 'c1ccc2c3c([nH]c2c1)c1nc-n1CCc1ccccc1)C2' for input: 'c1ccc2c3c([nH]c2c1)c1nc-n1CCc1ccccc1)C2'
[19:08:23] Can't kekulize mol.  Unkekulized atoms: 3 4 5
[19:08:23] Can't kekulize mol.  Unkekulized atoms: 14 15 25
[19:08:23] Can't kekulize mol.  Unkekulized atoms: 5 7 25
[19:08:23] Can't kekulize mol.  Unkekulized atoms: 2 3 5 6 19
[19:08:23] Can't kekulize mol.  Unkekulized atoms: 3 4 5 6 8 27 28 30 31
[19:08:23] Can't kekulize mol.  Unkekulized atoms: 5 6 7 8 9 26 27
[19:08:23] SMILES Parse Error: unclosed ring for input: 'C1CCC

Epoch 28 -- Batch 1/ 842, training loss 0.2962436378002167
Epoch 28 -- Batch 2/ 842, training loss 0.3059232532978058
Epoch 28 -- Batch 3/ 842, training loss 0.3032916784286499
Epoch 28 -- Batch 4/ 842, training loss 0.29936355352401733
Epoch 28 -- Batch 5/ 842, training loss 0.30717000365257263
Epoch 28 -- Batch 6/ 842, training loss 0.30118489265441895
Epoch 28 -- Batch 7/ 842, training loss 0.31719139218330383
Epoch 28 -- Batch 8/ 842, training loss 0.3036789298057556
Epoch 28 -- Batch 9/ 842, training loss 0.31465965509414673
Epoch 28 -- Batch 10/ 842, training loss 0.30576133728027344
Epoch 28 -- Batch 11/ 842, training loss 0.3071347177028656
Epoch 28 -- Batch 12/ 842, training loss 0.3067638576030731
Epoch 28 -- Batch 13/ 842, training loss 0.3026958107948303
Epoch 28 -- Batch 14/ 842, training loss 0.3067464232444763
Epoch 28 -- Batch 15/ 842, training loss 0.30458277463912964
Epoch 28 -- Batch 16/ 842, training loss 0.3087851405143738
Epoch 28 -- Batch 17/ 842, training loss 0

[19:08:33] Can't kekulize mol.  Unkekulized atoms: 9 10 11 18 20
[19:08:33] SMILES Parse Error: unclosed ring for input: 'Cn1c(=O)c2c3c(ncn2CC(=O)OCC(=O)Nc2ccc(Br)cc2)n(C)c1=O'
[19:08:33] SMILES Parse Error: unclosed ring for input: 'CCOC(=O)C1(O)C2CCCCC2C1(O)C(=O)N(C)C2CCCCC1'
[19:08:33] Explicit valence for atom # 12 O, 3, is greater than permitted
[19:08:33] Can't kekulize mol.  Unkekulized atoms: 7 8 9 10 12 27
[19:08:33] SMILES Parse Error: unclosed ring for input: 'c1ccc(COC2C(C3COC4(CCCCC4)O3)OC3OC4(CCCCC4)OC3[C@H]2C(=O)N(CCC(F)(F)F)C3=O)cc1'
[19:08:33] Explicit valence for atom # 12 O, 3, is greater than permitted
[19:08:33] SMILES Parse Error: syntax error while parsing: Cc1cccc(-n2nc3c(c2NC(=O)Cc2ccc(O)c(OC)c2)CS(=O)==)C3)c1C
[19:08:33] SMILES Parse Error: Failed parsing SMILES 'Cc1cccc(-n2nc3c(c2NC(=O)Cc2ccc(O)c(OC)c2)CS(=O)==)C3)c1C' for input: 'Cc1cccc(-n2nc3c(c2NC(=O)Cc2ccc(O)c(OC)c2)CS(=O)==)C3)c1C'
[19:08:33] Explicit valence for atom # 9 C, 5, is greater than permitted

Epoch 29 -- Batch 1/ 842, training loss 0.295406311750412
Epoch 29 -- Batch 2/ 842, training loss 0.3022112548351288
Epoch 29 -- Batch 3/ 842, training loss 0.29400551319122314
Epoch 29 -- Batch 4/ 842, training loss 0.30828857421875
Epoch 29 -- Batch 5/ 842, training loss 0.3039529323577881
Epoch 29 -- Batch 6/ 842, training loss 0.29846230149269104
Epoch 29 -- Batch 7/ 842, training loss 0.30116215348243713
Epoch 29 -- Batch 8/ 842, training loss 0.3059957027435303
Epoch 29 -- Batch 9/ 842, training loss 0.30876341462135315
Epoch 29 -- Batch 10/ 842, training loss 0.3003529906272888
Epoch 29 -- Batch 11/ 842, training loss 0.30735528469085693
Epoch 29 -- Batch 12/ 842, training loss 0.3174561560153961
Epoch 29 -- Batch 13/ 842, training loss 0.30606985092163086
Epoch 29 -- Batch 14/ 842, training loss 0.3028392791748047
Epoch 29 -- Batch 15/ 842, training loss 0.29791346192359924
Epoch 29 -- Batch 16/ 842, training loss 0.29539093375205994
Epoch 29 -- Batch 17/ 842, training loss 0.3

[19:08:43] Explicit valence for atom # 11 C, 5, is greater than permitted
[19:08:43] Can't kekulize mol.  Unkekulized atoms: 6 7 8
[19:08:43] SMILES Parse Error: unclosed ring for input: 'O=C(O)C1C2CC3C1C(=O)Oc1ccccc1C1c3sc(=O)[nH]c2SC31'
[19:08:43] Can't kekulize mol.  Unkekulized atoms: 16 17 18 19 20
[19:08:43] SMILES Parse Error: unclosed ring for input: 'CCN(CC)CC(=O)N1CCc2c(sc3ccccc23)C12CC1'
[19:08:43] SMILES Parse Error: unclosed ring for input: 'COc1ccc2c(c1)sc1nc(C(C)=O)c(=O)n2C'
[19:08:43] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 5 7 8 9 10 11 12 13 14
[19:08:43] SMILES Parse Error: ring closure 2 duplicates bond between atom 14 and atom 15 for input: 'Cc1ccc(CN(C(=O)c2csnn2)C2C2CCCCC2)c(C)c1'
[19:08:43] SMILES Parse Error: unclosed ring for input: 'CCOC(=O)c1c(C)oc2ccc(N(C(C)=O)S(=O)(=O)c3ccc4c5c(ccc3c3)CCC4)cc12'
[19:08:43] SMILES Parse Error: unclosed ring for input: 'CN1C(C(=O)O)C2C3C=CC(C4)C2C1=O'
[19:08:43] SMILES Parse Error: ring closure 4 duplicates bond betw

Epoch 30 -- Batch 1/ 842, training loss 0.2970809042453766
Epoch 30 -- Batch 2/ 842, training loss 0.3127555847167969
Epoch 30 -- Batch 3/ 842, training loss 0.29363590478897095
Epoch 30 -- Batch 4/ 842, training loss 0.301332950592041
Epoch 30 -- Batch 5/ 842, training loss 0.3112838566303253
Epoch 30 -- Batch 6/ 842, training loss 0.2928890287876129
Epoch 30 -- Batch 7/ 842, training loss 0.3063929080963135
Epoch 30 -- Batch 8/ 842, training loss 0.3075919449329376
Epoch 30 -- Batch 9/ 842, training loss 0.2963351905345917
Epoch 30 -- Batch 10/ 842, training loss 0.29250746965408325
Epoch 30 -- Batch 11/ 842, training loss 0.29490116238594055
Epoch 30 -- Batch 12/ 842, training loss 0.3009416162967682
Epoch 30 -- Batch 13/ 842, training loss 0.30367615818977356
Epoch 30 -- Batch 14/ 842, training loss 0.3042241632938385
Epoch 30 -- Batch 15/ 842, training loss 0.3034258186817169
Epoch 30 -- Batch 16/ 842, training loss 0.3045646548271179
Epoch 30 -- Batch 17/ 842, training loss 0.298

[19:08:54] Can't kekulize mol.  Unkekulized atoms: 4 5 6 7 12 13 21
[19:08:54] Can't kekulize mol.  Unkekulized atoms: 2 3 4 5 22
[19:08:54] SMILES Parse Error: extra close parentheses while parsing: Cc1ncn(-c2ccc(Nc3cc(-c4nocc4C)ccn3)cc2)n1)c1
[19:08:54] SMILES Parse Error: Failed parsing SMILES 'Cc1ncn(-c2ccc(Nc3cc(-c4nocc4C)ccn3)cc2)n1)c1' for input: 'Cc1ncn(-c2ccc(Nc3cc(-c4nocc4C)ccn3)cc2)n1)c1'
[19:08:54] SMILES Parse Error: unclosed ring for input: 'O=C1NC(=O)c2c1c1ccccc1c2ccccc12'
[19:08:54] Can't kekulize mol.  Unkekulized atoms: 2 3 9 11 26
[19:08:54] Can't kekulize mol.  Unkekulized atoms: 11 12 14 15 17 18 19
[19:08:54] SMILES Parse Error: unclosed ring for input: 'CS(=O)(=O)N1CCc2nc(C3COc3ccccc3CNCc3ccco3)CC2C1'
[19:08:54] Can't kekulize mol.  Unkekulized atoms: 4 5 6 7 8 15 16 17 18
[19:08:54] SMILES Parse Error: unclosed ring for input: 'COc1ccc(C(=O)Nc2c3c(nn2-c2nc(C(C)(C)C)no2)CS(=O)(=O)C2)cc1OC'
[19:08:54] SMILES Parse Error: extra close parentheses while parsing: Cc1n
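
The training output above interleaves per-batch losses with RDKit messages, which makes the trend hard to read. As a rough sketch, assuming the log text has been captured into a string (the notebook does not do this itself), the per-epoch mean loss could be extracted as follows. `LOSS_LINE` and `mean_loss_per_epoch` are hypothetical names, and the pattern is inferred only from the log lines shown in this notebook.

```python
import re
from collections import defaultdict

# Matches lines such as "Epoch 30 -- Batch 16/ 842, training loss 0.3045646548271179".
# The loss must contain a decimal point or an exponent, so a value that the
# notebook output truncated to a bare "0" is skipped rather than counted.
LOSS_LINE = re.compile(
    r'^Epoch\s+(\d+)\s+--\s+Batch\s+(\d+)/\s*(\d+),\s+training loss\s+'
    r'(\d+\.\d+(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+)\s*$'
)


def mean_loss_per_epoch(log_text):
    """Return {epoch: mean training loss} over the batch lines found in log_text."""
    losses = defaultdict(list)
    for line in log_text.splitlines():
        match = LOSS_LINE.match(line.strip())
        if match is None:
            continue  # RDKit messages, blank lines, separators, etc.
        epoch = int(match.group(1))
        losses[epoch].append(float(match.group(4)))
    return {epoch: sum(vals) / len(vals) for epoch, vals in losses.items()}
```

A value cut off mid-number (for example `0.3`) cannot be told apart from a genuine one, so the last batch line of a truncated output should be treated with caution.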

## 3. Utilize the trained LSTM model
Now that training is complete, the trained LSTM model is used to generate new SMILES as follows.

In [None]:
import sys
import glob
sys.path.append('../src/model/')

for dataset in dataset_list:
	model_dir = f'{PRETRAIN_RESULTS}/{dataset}_results/vanilalstm'
	sample_num_list = [10000, 41743] # 41743: number of SMILES before randomization

	for sample_num in sample_num_list:
		outfd   = f'{PRETRAIN_RESULTS}/{dataset}_results/sampling_{sample_num}'
		appname = 'generative_models.apps.Sampling.SamplingApp'
		vocab   = f'{model_dir}/model/vocabulary.pickle'
		m_str 	= glob.glob(f'{model_dir}/model/best_model_structure_epoch*.pickle')[0]
		m_state = glob.glob(f'{model_dir}/model/best_model_epoch*.pth')[0]
		n 		= sample_num
		rseed 	= 42
		params = [	'--model=lstm',
					f'--model-structure={m_str}',
					f'--model-state={m_state}',
					f'--vocab={vocab}',
					f'--n={n}',
					'--allow-override',
					f'--outdir={outfd}',
					f'--random-seed={rseed}',
					'--write-excel-file'
		]

		run(appname, *params)


[22:34:08] Can't kekulize mol.  Unkekulized atoms: 14 15 16 17 18 19 20 21 22
[22:34:08] Can't kekulize mol.  Unkekulized atoms: 1 2 3 5 6 9 15 16 17 18 19 20 25
[22:34:08] Can't kekulize mol.  Unkekulized atoms: 17 18 19 20 21
[22:34:08] Can't kekulize mol.  Unkekulized atoms: 4 5 6 14 15
[22:34:09] SMILES Parse Error: unclosed ring for input: 'COc1ccc(C)cc1N1C(=O)[C@H](c2ccccc2)C(c2n(c3ccc(Cl)cc2)c2ccccc2Cl)=NN1c1ccccc1'
[22:34:09] Can't kekulize mol.  Unkekulized atoms: 5 6 7 9 20
[22:34:09] SMILES Parse Error: unclosed ring for input: 'c1cc2occ(CCSc3ncnc4sccc35)n2c2c1'
[22:34:09] SMILES Parse Error: syntax error while parsing: CC[C@@H](CCc1ccccc1)NC(=O)NC[C@]1(c2cccc()C2)CC(C)C)c1
[22:34:09] SMILES Parse Error: Failed parsing SMILES 'CC[C@@H](CCc1ccccc1)NC(=O)NC[C@]1(c2cccc()C2)CC(C)C)c1' for input: 'CC[C@@H](CCc1ccccc1)NC(=O)NC[C@]1(c2cccc()C2)CC(C)C)c1'
[22:34:09] SMILES Parse Error: unclosed ring for input: 'COc1cccc([C@@H]2CCCN2c2nc3ccc4c(c3)OCCO4)cc1'
[22:34:09] Can't kekulize
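
The RDKit messages printed during sampling fall into a handful of categories: kekulization failures, unclosed rings, valence violations, and syntax errors. As a minimal sketch, assuming the messages have been collected into a list of strings (the notebook does not do this), they could be tallied as below. `classify_rdkit_message` and `tally_rdkit_messages` are hypothetical helpers, and the category labels are illustrative rather than RDKit terminology.

```python
import re
from collections import Counter

# (substring, label) pairs checked in order; the labels are illustrative only.
_CATEGORIES = [
    ("Can't kekulize mol", "kekulization"),
    ("unclosed ring", "unclosed_ring"),
    ("Explicit valence", "valence"),
    ("duplicates bond", "duplicate_ring_bond"),
    ("extra close parentheses", "parentheses"),
    ("extra open parentheses", "parentheses"),
    ("syntax error", "syntax"),
]

# Leading "[hh:mm:ss] " timestamp as seen in the output above.
_TIMESTAMP = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]\s*")


def classify_rdkit_message(line):
    """Return a category label for one message, or None if it should be ignored."""
    text = _TIMESTAMP.sub("", line.strip())
    # "Failed parsing SMILES" repeats the preceding parse error, so it is
    # skipped to avoid counting the same molecule twice.
    if not text or text.startswith("SMILES Parse Error: Failed parsing"):
        return None
    for needle, label in _CATEGORIES:
        if needle in text:
            return label
    return "other"


def tally_rdkit_messages(lines):
    """Count messages per category over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        label = classify_rdkit_message(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Comparing such tallies between the pretrained and fine-tuned models gives a quick view of which kinds of invalid SMILES dominate, although it is no substitute for a proper validity calculation.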

## 4. Finetune the LSTM model
Fine-tuning the pretrained model on a small number of compounds is routine: load the trained model and continue training it with the `finetune.py` code. The CLI tool is `apps.Finetune.py`.
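
The cells in this notebook locate the pretrained checkpoint with `glob.glob(...)[0]`, which raises a bare `IndexError` when nothing matches and returns an arbitrary file when several do. A more defensive variant might look like the sketch below. `pick_checkpoint` is a hypothetical helper that is not part of the repository, and preferring the highest epoch number is an assumption about which file should win when several match.

```python
import glob
import os
import re


def pick_checkpoint(pattern):
    """Return the file matching `pattern` with the highest epoch number in its name."""
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f'no checkpoint matches {pattern!r}')

    def epoch_of(path):
        # Only the file name is inspected, so directory names cannot interfere.
        found = re.search(r'epoch(\d+)', os.path.basename(path))
        return int(found.group(1)) if found else -1

    return max(matches, key=epoch_of)
```

With this helper, a line such as `m_state = pick_checkpoint(f'{model_dir}/model/best_model_epoch*.pth')` could replace the `glob.glob(...)[0]` call and would fail with a readable message if pretraining has not been run yet.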

To switch between **filtered** and **unfiltered** fine-tuning data, modify the variable `FINETUNE_FILTER` in `src/paths.py`.


In [3]:
from src.paths import FINETUNE_FILTER
print(f'Fine-tuning data status: {FINETUNE_FILTER}')

Fine-tuning data status: filtered


In [None]:
import sys
import glob
sys.path.append('../src/model/')
from src.paths import ensure_dirs, FINETUNE_DATA, FINETUNE_RESULTS, FINETUNE_FILTER
ensure_dirs()

f_data_list = ['CHEMBL4005', 'CHEMBL1908389', 'CHEMBL284', 'CHEMBL214', 'CHEMBL253']

for dataset in dataset_list:
	for f_data in f_data_list:
		model_dir = f'{PRETRAIN_RESULTS}/{dataset}_results/vanilalstm'
		outfd     = f'{FINETUNE_RESULTS}/{dataset}_results/{f_data}_finetune'
		appname   = 'generative_models.apps.Finetune.FinetunerApp'
		vocab     = f'{model_dir}/model/vocabulary.pickle'
		m_str 	  = glob.glob(f'{model_dir}/model/best_model_structure_epoch*.pickle')[0]
		m_state   = glob.glob(f'{model_dir}/model/best_model_epoch*.pth')[0]
		save_epoch_models = False # unused: not passed to the app in params below

		data      = f'{FINETUNE_DATA}/{FINETUNE_FILTER}-{f_data}_train_rdsmi3.tsv'
		smicol    = 'rd3_smiles'
		epochs	  = 100
		rseed 	  = 42
		batchsize = 16
		lr 		  = 1e-4 # small learning rate
		sampling_epoch = 10000

		params = [	
					f'--data={data}',
					f'--smi-colname={smicol}',
					f'--sampling-epoch={sampling_epoch}',
					'--model=lstm',
					f'--model-structure={m_str}',
					f'--model-state={m_state}',
					f'--vocab={vocab}',
					f'--epochs={epochs}',
					'--override-folder=1',
					f'--outdir={outfd}',
					f'--random-seed={rseed}',
					f'--batch-size={batchsize}',
					f'--lr={lr}',
					'--exclude-pad-loss=1',
					'--use-cpus' # force working on CPUs
		]

		run(appname, *params)


Start logging
Starting: Finetuner, Namespace(num_workers=1, outdir='/home/abe/Paper/Pretraining-Assesment-for-LSTM-Molecular-Generation/results/finetune/filtered/pubchem_filtered_ac_results/CHEMBL4005_finetune', case='', override_folder=1, tensorboard_prefix='tensor_board', data='/home/abe/Paper/Pretraining-Assesment-for-LSTM-Molecular-Generation/data/finetune/filtered/filtered_CHEMBL4005_train_rdsmi3.tsv', smi_colname='rd3_smiles', vocab='/home/abe/Paper/Pretraining-Assesment-for-LSTM-Molecular-Generation/results/pretrain/pubchem_filtered_ac_results/vanilalstm/model/vocabulary.pickle', random_seed=42, debug=None, sampling_epoch=10000, model='lstm', write_xlsx=None, save_snapshot_models='', batch_size=16, epochs=100, validation_ratio=0.1, lr=0.0001, exclude_pad_loss=1, early_stopping_patience=0, load_model=None, model_structure='/home/abe/Paper/Pretraining-Assesment-for-LSTM-Molecular-Generation/results/pretrain/pubchem_filtered_ac_results/vanilalstm/model/best_model_structure_epoch28.

[01:02:55] SMILES Parse Error: unclosed ring for input: 'CCOc1ccc(NC(=O)C2=C(CCNC(=O)C2CC2)N(C)C)cc1'
[01:02:55] Can't kekulize mol.  Unkekulized atoms: 4 5 6 7 8 9 23
[01:02:55] SMILES Parse Error: unclosed ring for input: 'COc1ccc(-c2nc(C#N)c(N3CCC3(CC3)OCCO4)c2C)cc1OC'
[01:02:55] SMILES Parse Error: unclosed ring for input: 'N#Cc1cc2c(nc1SCc1cnn3n1Cc1ccccc1)CCCC2'
[01:02:55] SMILES Parse Error: unclosed ring for input: 'CN(C)C(=O)N1CCN(Cc2nc3sc(-c4ccccc4)c(=O)[nH]2)CC1'
[01:02:55] Can't kekulize mol.  Unkekulized atoms: 9 10 11 12 13 14 17
[01:02:55] SMILES Parse Error: ring closure 3 duplicates bond between atom 14 and atom 15 for input: 'Clc1ccc(C23CCN(Cc3nnnn3C3CCCCC3)CC2)cc1'
[01:02:55] SMILES Parse Error: unclosed ring for input: 'C[C@H]1CN([C@@H](C)CO)C(=O)Cc2cc(NC(=O)Nc3ccccc3)ccc2O[C@H](C)CN1CCCC1'
[01:02:55] Can't kekulize mol.  Unkekulized atoms: 14 15 16 17 18 20 21 23 24
[01:02:55] Explicit valence for atom # 10 N, 4, is greater than permitted
[01:02:55] Can't kekulize m

sampling smiles at most 1000.


[01:02:57] SMILES Parse Error: unclosed ring for input: 'COc1ccccc1C1(CCn2c(=O)oc2ccccc2)CCO1'
[01:02:57] SMILES Parse Error: extra open parentheses for input: 'Cc1ccc(N(C2=NC3CCCCC3=NC2(C(=O)NCc2ccco2)C2)cc1'
[01:02:57] Can't kekulize mol.  Unkekulized atoms: 6 7 17 18 19 20 21 22 23
[01:02:57] SMILES Parse Error: unclosed ring for input: 'COc1ccccc1N1CCN(CCC(Oc2ccc(C)cc2)CC2CCO)CC1'
[01:02:57] Can't kekulize mol.  Unkekulized atoms: 6 7 8
[01:02:57] Can't kekulize mol.  Unkekulized atoms: 15 16 17
[01:02:57] Can't kekulize mol.  Unkekulized atoms: 1 2 3 4 5
[01:02:57] Can't kekulize mol.  Unkekulized atoms: 12 13 24
[01:02:57] Can't kekulize mol.  Unkekulized atoms: 3 4 5 6 7
[01:02:57] Can't kekulize mol.  Unkekulized atoms: 4 5 6 11 12 13 14 15 16
[01:02:57] SMILES Parse Error: unclosed ring for input: 'O=C(CN1CCCC1)N1c2ccccc2C1(c1ccccc1)CCCC1'
[01:02:57] SMILES Parse Error: unclosed ring for input: 'CC(C)N1CC2C3CC1C(C)(CO)N4C(=O)c1cc2ccccc2o1'
[01:02:57] Can't kekulize mol.  Unkek

Epoch 1/100, training: 9 batches, size 16*1
Epoch 1 -- Batch 1/ 9, training loss 1.9817802906036377
Epoch 1 -- Batch 2/ 9, training loss 1.890822172164917
Epoch 1 -- Batch 3/ 9, training loss 1.9888304471969604
Epoch 1 -- Batch 4/ 9, training loss 1.689042091369629
Epoch 1 -- Batch 5/ 9, training loss 1.3676780462265015
Epoch 1 -- Batch 6/ 9, training loss 1.5268007516860962
Epoch 1 -- Batch 7/ 9, training loss 1.4545291662216187
Epoch 1 -- Batch 8/ 9, training loss 1.271896243095398
Epoch 1 -- Batch 9/ 9, training loss 1.192453384399414
----------------------------------------------------------------------


KeyboardInterrupt: 