[Draft] Refactor the library - Merge same as PR #10 -> Extensibility + Customability #9

IchiruTake · 2022-07-26T02:12:45Z

@pstjohn Please review this while I am working on it and again once it is finished. Would you mind if I convert the functions into a class so that the state of the molecule is maintained?

Related: #2

Cleanup Create the class::MolProcessor Update utils.py

Preparation Add credits Cleanup Create the class::MolProcessor Update utils.py

…to speedup-01

IchiruTake · 2022-07-26T07:48:50Z

@pstjohn I have completed the first step: preparing the module utils.py as a replacement for fragment.py. Can you take a look at it? If everything looks good, should I update the callers in other files, such as prediction.py and model.py, to use utils.py?

Cleanup `drawing.py` Update `model.py` (P1)

- Refactor + Cleanup - Correct documentation - Move RDKit logging into `__init__.py` to disable by default.

IchiruTake · 2022-07-27T09:09:46Z

@pstjohn Can you review the code for me?
What I did:

- Introduce utils.py as a replacement for fragment.py to improve the pipeline speed. fragment.py is left in place for legacy purposes.
- Centralize the global variable state into arg::MODEL_CONFIG in __init__.py.
- Clean up the code and correct the documentation.
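
To illustrate the idea of centralizing global state in one place, here is a minimal sketch. It is not the code from this PR: the keys in `MODEL_CONFIG` and the helpers `get_config` and `update_config` are hypothetical names chosen for the example, since the comment above does not list what the configuration holds.

```python
from types import MappingProxyType
from typing import Any, Mapping

# Hypothetical keys and values; the PR does not say what MODEL_CONFIG contains.
MODEL_CONFIG: dict = {
    "model_name": "example-model",
    "batch_size": 128,
    "verbose": False,
}


def get_config() -> Mapping[str, Any]:
    """Return a read-only view so callers cannot mutate shared state by accident."""
    return MappingProxyType(MODEL_CONFIG)


def update_config(**overrides: Any) -> None:
    """Change known settings in one place; reject keys that were never declared."""
    unknown = set(overrides) - set(MODEL_CONFIG)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    MODEL_CONFIG.update(overrides)
```

Other modules would then read settings through `get_config()` instead of keeping their own module-level globals, so a change made through `update_config` is visible everywhere.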

IchiruTake · 2022-07-27T16:11:51Z

@pstjohn I have fixed the failing test. Can you review it again?

pstjohn

Just so I'm following, is the computational speed improvement here mainly de-duplicating inside the fragment code, rather than after all the fragments have been generated? That might be more simply implemented as a line or two here:
https://github.com/NREL/alfabet/blob/master/alfabet/fragment.py#L11-L12

A couple style notes:
It's standard to use snake_case for variable and function names, and TitleCase for class definitions. Internal class functions usually just start with a preceding underscore (and don't have a trailing one) i.e., _my_internal_function.
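
The naming conventions described above can be sketched as follows. The class body is purely illustrative: `normalized_smiles` and `_strip_whitespace` are hypothetical names, not methods from this PR.

```python
class MolProcessor:
    """Class names use TitleCase."""

    def __init__(self, smiles: str) -> None:
        # Attributes and variables use snake_case.
        self.input_smiles = smiles

    def normalized_smiles(self) -> str:
        """Public method: snake_case, no leading or trailing underscore."""
        return self._strip_whitespace(self.input_smiles)

    @staticmethod
    def _strip_whitespace(text: str) -> str:
        """Internal helper: a single leading underscore and no trailing one."""
        return text.strip()
```

Under this convention a method such as `GetReaction()` would be spelled `get_reaction()`.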

IchiruTake · 2022-07-27T17:11:15Z

> Just so I'm following, is the computational speed improvement here mainly de-duplicating inside the fragment code, rather than after all the fragments have been generated? That might be more simply implemented as a line or two here: https://github.com/NREL/alfabet/blob/master/alfabet/fragment.py#L11-L12
>
> A couple style notes: It's standard to use snake_case for variable and function names, and TitleCase for class definitions. Internal class functions usually just start with a preceding underscore (and don't have a trailing one) i.e., _my_internal_function.

I will correct the functions to match your style. Yes, you are right, but it does not stop there. De-duplication is an important optimization, but what I want is to centralize all molecule processing in a single class to avoid memory leaks, allow possible parallelization, and reduce duplicated operations. For example, the conversion of molecules from SMILES and the AddHs function each return a new molecule. Even though they are written in C++, their time and temporary memory costs are non-trivial, with less time fluctuation. A web-based service is the best example of where this pays off, and de-duplicating fragments benefits the processing of large datasets.

Moreover, I tried to reduce the namespace of ALFABET by importing only the necessary functions, which is definitely faster with a narrower import. I will fix it within this week, since it is midnight here.

Furthermore, we want to gain more control over molecule processing, which you can see in the method MolProcessor::GetReaction(), and to reduce future refactoring.

All in all, thank you for your review.
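
The argument for keeping molecule state in one object can be sketched as below. This is an assumption-laden illustration, not the `MolProcessor` from the PR: the class name `CachedMolProcessor`, its `load` method, and the `parse_calls` counter are hypothetical, and the parser is injected so the sketch runs without RDKit.

```python
from typing import Callable, Generic, Optional, TypeVar

M = TypeVar("M")


class CachedMolProcessor(Generic[M]):
    """Holds one parsed molecule at a time so repeated requests reuse it."""

    def __init__(self, parse: Callable[[str], M]) -> None:
        self._parse = parse              # injected SMILES -> molecule function
        self._smiles: Optional[str] = None
        self._mol: Optional[M] = None
        self.parse_calls = 0             # counts how often the parser actually ran

    def load(self, smiles: str) -> M:
        """Parse only when the SMILES differs from the one already held."""
        if smiles != self._smiles:
            self._mol = self._parse(smiles)
            self._smiles = smiles
            self.parse_calls += 1
        return self._mol
```

With RDKit installed, the injected parser could be something like `lambda s: Chem.AddHs(Chem.MolFromSmiles(s))`, so the SMILES conversion and hydrogen addition happen once per molecule rather than in every function that needs the molecule.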

pstjohn · 2022-07-27T19:00:15Z

Are you able to run the test suite locally? Looks like some of those tests are still failing

IchiruTake · 2022-07-28T01:03:11Z

> Are you able to run the test suite locally? Looks like some of those tests are still failing

Let me check whether the installation behaves well.

IchiruTake · 2022-07-28T03:50:59Z

> Are you able to run the test suite locally? Looks like some of those tests are still failing

The installation worked fine, and calling test_predict() does not raise an assertion error. I think it would be better to move the current state of utils.py into fragment.py for easier maintenance. I will try to make the code more consistent with your coding style. If there is anything else, you can reach me directly through LinkedIn for a faster response.

Speed the model start-up by setting arg::compile=False on empty configuration model.

Linkedin: https://www.linkedin.com/in/minh-pham-hoang-0b0626172/

- Resolving conflict - Correct style on `utils.py` - Update document - Move from `utils.py` to `fragment.py` back - Remove redundant whitespace

IchiruTake · 2022-07-28T04:34:24Z

This PR attempts to refactor the source code by:

- Moving redundant functions into an object that is passed one molecule at a time and stores its state. This targets, for example, the uncontrolled repeated conversion of SMILES to Mol and the addition of hydrogens to the Mol, which are costly even though they are written in C++.
- Performing de-duplication within one molecule instead of across a dataset.
- Narrowing the imported namespace to speed up name lookup.
- Updating the documentation.
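
De-duplicating within a single molecule, rather than after the whole dataset has been fragmented, can be sketched as a small generator. The record layout `(bond_index, fragment1, fragment2)` and the function name `unique_fragments` are assumptions for illustration, and treating a fragment pair as order-insensitive is also an assumption rather than the library's documented behavior.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (bond_index, fragment1_smiles, fragment2_smiles)
Fragment = Tuple[int, str, str]


def unique_fragments(fragments: Iterable[Fragment]) -> Iterator[Fragment]:
    """Yield the first record for each distinct fragment pair of one molecule."""
    seen = set()
    for bond_index, frag1, frag2 in fragments:
        # Assumption: (A, B) and (B, A) describe the same bond cleavage.
        key = frozenset((frag1, frag2))
        if key in seen:
            continue
        seen.add(key)
        yield bond_index, frag1, frag2
```

Because the filter runs as the fragments are produced, duplicates from symmetric bonds never reach the later prediction step for that molecule.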

IchiruTake · 2022-07-29T13:19:36Z

The tests pass now. Do you have any comments? @pstjohn

Preparation

5263b43

IchiruTake force-pushed the speedup-01 branch from a887df3 to 5554c87 Compare July 26, 2022 03:26

Add credits

91a4e8a

Cleanup Create the class::MolProcessor Update utils.py

IchiruTake force-pushed the speedup-01 branch from fb975f4 to 91a4e8a Compare July 26, 2022 03:36

Configure utils.py to fragment.py

837a8c7

Preparation Add credits Cleanup Create the class::MolProcessor Update utils.py

IchiruTake force-pushed the speedup-01 branch from bfab4ea to 837a8c7 Compare July 26, 2022 04:39

Merge branch 'speedup-01' of https://github.com/IchiruTake/alfabet in…

f7f644b

…to speedup-01

IchiruTake mentioned this pull request Jul 26, 2022

[NEED IMPROVEMENT] alfabet.model.predict is too slow #2

Closed

IchiruTake added 2 commits July 27, 2022 11:00

Update model.py (P2)

536a75e

Cleanup `drawing.py` Update `model.py` (P1)

Centralize all states found across project in __init__.py

848b6c7

- Refactor + Cleanup - Correct documentation - Move RDKit logging into `__init__.py` to disable by default.

IchiruTake force-pushed the speedup-01 branch from 449ed46 to 848b6c7 Compare July 27, 2022 09:04

Correct failed test due to the lack of mol argument

e0835ba

pstjohn reviewed Jul 27, 2022

Resolve conflict

833cd6a

- Resolving conflict - Correct style on `utils.py` - Update document - Move from `utils.py` to `fragment.py` back - Remove redundant whitespace

IchiruTake force-pushed the speedup-01 branch from e6dce02 to 833cd6a Compare July 28, 2022 04:28

IchiruTake requested a review from pstjohn July 28, 2022 04:29

Update fragment.py

eb4c3a5

IchiruTake changed the title ~~Optimize the return of data~~ Refactor the library Jul 28, 2022

Remove unnecessary checking

040a8bf

IchiruTake changed the title ~~Refactor the library~~ [Draft] Refactor the library - Merge same as PR #10 -> Extensibility + Customability Aug 30, 2022

IchiruTake closed this Aug 30, 2022
