Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Refactor the library - Merge same as PR #10 -> Extensibility + Customability #9

Closed
wants to merge 10 commits into from

Conversation

IchiruTake
Copy link

@IchiruTake IchiruTake commented Jul 26, 2022

@pstjohn Please review for me when I am processed and afterward. Do you mind if the function convert to the class object for better the state of the molecule is maintained.

Related: #2

Cleanup

Create the class::MolProcessor

Update utils.py
Preparation

Add credits

Cleanup

Create the class::MolProcessor

Update utils.py
@IchiruTake
Copy link
Author

@pstjohn I have completed the first step preparing module utils.py as a replacement for fragment.py. Can you take a look at it ? If everything going well, should I push all calling methods from different files such as prediction.py and model.py using the module utils.py ?

Cleanup `drawing.py`

Update `model.py` (P1)
- Refactor + Cleanup
- Correct documentation
- Move RDKit logging into `__init__.py` to disable by default.
@IchiruTake
Copy link
Author

@pstjohn Can you review the code for me?
What I did is to:

  • Introduce utils.py as a replacement of fragment.py to improve the pipeline speed. The fragment.py is left there for legacy purpose.
  • Centralize the global variable state into arg::MODEL_CONFIG in __init__.py
  • Cleanup, correct documentation,

@IchiruTake
Copy link
Author

@pstjohn I have corrected the failed test. Can you review it again ?

Copy link
Collaborator

@pstjohn pstjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I'm following, is the computational speed improvement here mainly de-duplicating inside the fragment code, rather than after all the fragments have been generated? That might be more simply implemented as a line or two here:
https://github.com/NREL/alfabet/blob/master/alfabet/fragment.py#L11-L12

A couple style notes:
It's standard to use snake_case for variable and function names, and TitleCase for class definitions. Internal class functions usually just start with a preceding underscore (and don't have a trailing one) i.e., _my_internal_function.

@IchiruTake
Copy link
Author

IchiruTake commented Jul 27, 2022

Just so I'm following, is the computational speed improvement here mainly de-duplicating inside the fragment code, rather than after all the fragments have been generated? That might be more simply implemented as a line or two here: https://github.com/NREL/alfabet/blob/master/alfabet/fragment.py#L11-L12

A couple style notes: It's standard to use snake_case for variable and function names, and TitleCase for class definitions. Internal class functions usually just start with a preceding underscore (and don't have a trailing one) i.e., _my_internal_function.

I will correct the function as your style. Yes you are right, but it does not stop there, the optimization of de-duplicating is important but what I want is to centralize all molecules processing with a centralized class to avoid any memory leak, possible parallelization and reduce duplicated operation. For example, the conversion of molecules from SMILES and AddHs function which return new molecule. Even if they are written in C++, the time and temporary memory is non-trivial with less yime fluctuation. The web-based is the best example to get this advantage, deduplicate fragments benefits large dataset processing

Moreover, I tried to reduce the namespace of ALFABET by taking only necessary functions which is definitely faster with narrower import. I would fix it in within this week as in here, it is the midnight.

Furthermore, we want to gain more control on the molecule processing, which you can see in the method MolProcessor::GetReaction() and reduce future refactoring.

All in all, thank you for your review.

@pstjohn
Copy link
Collaborator

pstjohn commented Jul 27, 2022

Are you able to run the test suite locally? Looks like some of those tests are still failing

@IchiruTake
Copy link
Author

IchiruTake commented Jul 28, 2022

Are you able to run the test suite locally? Looks like some of those tests are still failing
Let me check that if the installation behaves well

@IchiruTake
Copy link
Author

IchiruTake commented Jul 28, 2022

Are you able to run the test suite locally? Looks like some of those tests are still failing

The installation worked fine and call test_predict() does not raise assertion. I think it would be better to cast the current state of utils.py to fragment.py for better maintenance. I will try to make code more consistently to your coding style. If there are anything else, you can response to me immediately through Linkedin for faster response.

Speed the model start-up by setting arg::compile=False on empty configuration model.

Linkedin: https://www.linkedin.com/in/minh-pham-hoang-0b0626172/

image

- Resolving conflict
- Correct style on `utils.py`
- Update document
- Move from `utils.py` to `fragment.py` back
- Remove redundant whitespace
@IchiruTake
Copy link
Author

IchiruTake commented Jul 28, 2022

This PR attempted to refactor the source code by:

  • Pushing redundant function into object and passed one molecule at a time to store the state. For example, the non-controlled iterative conversion of SMILES to Mol and Add Hydrogen to mol, even they are written in C++
  • Perform de-duplication within one molecule instead of a dataset.
  • Making the import workspace narrower to boost the lookup method.
  • Update the documentation

@IchiruTake IchiruTake changed the title Optimize the return of data Refactor the library Jul 28, 2022
@IchiruTake
Copy link
Author

IchiruTake commented Jul 29, 2022

The tests worked well. Do you have any comments? @pstjohn

@IchiruTake IchiruTake changed the title Refactor the library [Draft] Refactor the library - Merge same as PR #10 -> Extensibility + Customability Aug 30, 2022
@IchiruTake IchiruTake closed this Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants