Potential New Feature: allowing users to input customized initial pipelines #1321

Open
t-harden opened this issue Sep 13, 2023 · 1 comment

Comments

@t-harden

After thoroughly reviewing the well-structured and logically clear code of TPOT, we have identified that the construction of pipelines in the initial population is random. It seems like a natural idea to allow users to provide some well-defined initial pipelines to TPOT. This approach has the potential to enhance the algorithm's performance and reduce the evolutionary time.

Additionally, we have observed some related issues (#1107) and pull requests (#502). Regrettably, these updates were not implemented optimally. We are highly motivated to work on developing this new feature and have made some initial progress. However, it's worth noting that TPOT's framework is complex, and we are not yet fully familiar with it. For instance, we haven't yet deciphered the function of self.tree_structure in /tpot/base.py.

Considering your expertise with TPOT's inner workings, we would greatly appreciate your assistance in resolving issues that may arise during our development efforts. We believe that collaborating in this manner will not only help us comprehend TPOT's development principles but also contribute to the overall advancement of the project.

Thank you very much for considering our request and for your continued contributions to TPOT. We look forward to working together to further enhance this remarkable tool. 🤝

@perib
Contributor

perib commented Sep 13, 2023

Thank you for your interest in contributing! We would love to have your help.

We have shifted development to our next version, TPOT2. We rewrote the codebase from scratch with the goal of making it clearer, more modular, and easier to extend and maintain. This also resolves many bugs and issues in TPOT1 (such as runs stalling indefinitely) and begins to implement some new features (such as graph-based pipelines).

We have an alpha version of the codebase here: https://github.com/EpistasisLab/tpot2/tree/main. The dev branch contains the latest version, while main is the current "stable version."
It is still in development, and things may change over time. We would love feedback on the organization, API, etc., especially from people looking to contribute.

Regarding your question about the initial population:
In TPOT2, the initial population is pulled from a generator that yields individuals. The code that is currently used is found here. Effectively, it loops through the possible root nodes and adds a number of random insertions to create the random population. This would be the function to modify to change how the initial population is generated.
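
The generator idea described above can be sketched roughly as follows. This is not TPOT2's actual code: the function names (`random_individuals`, `seeded_individuals`), the `max_insertions` parameter, and the list-of-names representation of an individual are all hypothetical stand-ins, chosen only to show how user-provided individuals could be yielded before the random ones.

```python
import itertools
import random


def random_individuals(root_methods, leaf_methods, max_insertions=3, rng=None):
    """Endlessly yield random individuals, cycling over the possible root nodes.

    An individual here is a plain list of method names (root first). This is a
    hypothetical stand-in for TPOT2's real graph individual.
    """
    rng = rng or random.Random()
    for root in itertools.cycle(root_methods):
        individual = [root]
        # Add a random number of random insertions below the root.
        for _ in range(rng.randint(0, max_insertions)):
            individual.append(rng.choice(leaf_methods))
        yield individual


def seeded_individuals(user_individuals, fallback):
    """Yield the user-provided individuals first, then the random ones."""
    yield from user_individuals
    yield from fallback


generator = seeded_individuals(
    [["LogisticRegression", "StandardScaler"]],  # user-supplied starting point
    random_individuals(
        ["LogisticRegression", "RandomForestClassifier"],
        ["StandardScaler", "PCA"],
        rng=random.Random(0),
    ),
)

# Draw an initial population of five individuals from the generator.
population = list(itertools.islice(generator, 5))
```

Wrapping the existing generator, rather than rewriting it, would keep the random fallback for any population slots the user does not fill.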

The actual individual class is defined here. It's essentially a NetworkX graph of NodeLabel() objects that hold the method class (as types) and the hyperparameters (as dictionaries). The root node is the parent; child nodes feed their output to their parent nodes.
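
As a rough illustration of that structure, here is a minimal sketch using `networkx` directly. The `NodeLabel` dataclass below is a simplified stand-in with assumed field names, not TPOT2's real class; `Scaler` and `Classifier` are placeholder method classes; and the parent-to-child edge direction is an assumption.

```python
from dataclasses import dataclass, field

import networkx as nx


@dataclass(eq=False)  # hashed by identity, so two nodes may share a method
class NodeLabel:
    """Simplified stand-in: a method class plus its hyperparameters."""
    method_class: type
    hyperparameters: dict = field(default_factory=dict)


class Scaler:  # placeholder for a preprocessing method
    pass


class Classifier:  # placeholder for a final estimator
    pass


graph = nx.DiGraph()
root = NodeLabel(Classifier, {"C": 1.0})
child = NodeLabel(Scaler, {"with_mean": True})

# Assumption: edges point from parent to child; at fit time the child's
# output would be fed upward to its parent.
graph.add_edge(root, child)

# Under that assumption, the root is the only node with no incoming edge.
roots = [node for node, degree in graph.in_degree() if degree == 0]
```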

So if we want the user to provide the initial pipelines, we would just need a function that converts sklearn pipelines to the graphindividual format.
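
A minimal sketch of such a conversion, assuming a strictly linear sklearn `Pipeline` whose steps are all estimator objects. The function name `pipeline_to_graph` is hypothetical, and the output is a plain `networkx` graph with node attributes rather than TPOT2's actual individual class. Nested structures such as `FeatureUnion` or `ColumnTransformer`, and `"passthrough"` steps, are not handled.

```python
import networkx as nx
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def pipeline_to_graph(pipeline):
    """Convert a linear sklearn Pipeline into a chain-shaped DiGraph.

    Nodes are keyed by step name and carry the method class and the
    hyperparameters as attributes. The final step becomes the root, and
    edges point from parent to child (an assumed convention).
    """
    graph = nx.DiGraph()
    parent = None
    # Walk from the final estimator back to the first transformer.
    for name, estimator in reversed(pipeline.steps):
        graph.add_node(
            name,
            method_class=type(estimator),
            hyperparameters=estimator.get_params(deep=False),
        )
        if parent is not None:
            graph.add_edge(parent, name)
        parent = name
    return graph


pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(C=0.5))])
graph = pipeline_to_graph(pipe)
```

A real implementation would also need to check that each step's method class and hyperparameters fall inside the configured search space, so that later mutations remain valid.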

Feel free to open an issue on the TPOT2 page.

I'm also happy to set up a meeting to discuss any questions or concerns.
