How to train LinkerInvent's prior model using my own data #63

Closed
Bruce410526 opened this issue Apr 25, 2024 · 3 comments

Comments

@Bruce410526

Thank you very much for open-sourcing the project. I noticed that pre-trained prior models are provided in the 'prior/' directory. I am a beginner and would like to train a new prior on my own data, but I couldn't find any instructions on how to do this. I would greatly appreciate any guidance on this matter.

@halx
Contributor

halx commented Apr 25, 2024

Hi,

many thanks for your interest in REINVENT and welcome to the community!

We have four different generator "styles": Reinvent, Libinvent, Linkinvent and Mol2Mol. Which one are you interested in? Generally, I would not necessarily recommend producing new priors for production use, as it takes some knowledge and skill to make a high-quality one. But it can certainly be an interesting learning exercise to build a new model and get it to perform well.

Many thanks,
Hannes.

@Bruce410526
Author

First of all, thank you for your reply. I'm particularly interested in the Linkinvent generator. My dataset is a set of molecules in SMILES format, and I want to train a Linkinvent prior from this data. I'm not sure where to find the scripts for data preprocessing and retraining. I would greatly appreciate it if you could point me to them.

@halx
Contributor

halx commented Apr 25, 2024

Hi again,

the original publication describes how the input SMILES were split, but ultimately it is the same method as applied for Libinvent. The repository of the original code may have some clues and data, but the data splitting code is in this repository.

Cheers,
Hannes.

halx closed this as completed May 6, 2024