CASTER layer implementation #73

andriy-nikolov · 2022-02-02T15:56:07Z

Summary

An implementation of the CASTER model layers based on https://github.com/kexinhuang12345/CASTER

Note:

- covers only the supervised training stage
- input dimensionality is assumed to be correct and consistent with the algorithm's assumptions
- TODO: include the unsupervised stage (would require a pipeline hook)
- TODO: include the BIOSNAP dataset from the paper
- TODO: load data using the input processing from the paper

- Unit tests provided for these changes
- Documentation and docstrings added for these changes in Sphinx style

Changes

- Implementation of the CASTER model with its custom loss function
- An example script for invoking the model (performance is not meaningful because the model implementation is incomplete)
- A unit test method to check the output dimensionality

cthoyt · 2022-02-02T15:57:23Z

chemicalx/models/caster.py

+    def __init__(
+        self,
+        *,
+        drug_channels: int = 1722,


This should be set based on the dataset and not have a default.
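A minimal sketch of this suggestion, assuming the dataset can report the width of its drug feature vectors. `DrugFeatureDataset`, `CasterSketch`, and the dataset's `drug_channels` property are hypothetical stand-ins, not the chemicalx API; the point is only that `drug_channels` becomes a required keyword-only argument read from the data instead of a hard-coded 1722.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DrugFeatureDataset:
    """Hypothetical dataset mapping drug IDs to fixed-length feature vectors."""

    features: Dict[str, List[float]]

    @property
    def drug_channels(self) -> int:
        # Read the feature width from the data instead of hard-coding it.
        widths = {len(vector) for vector in self.features.values()}
        if len(widths) != 1:
            raise ValueError(f"expected one feature width, got {sorted(widths)}")
        return widths.pop()


class CasterSketch:
    """Hypothetical model stub whose drug_channels has no default."""

    def __init__(self, *, drug_channels: int) -> None:
        # Keyword-only and required: omitting it raises TypeError rather than
        # silently using a constant tied to one particular dataset.
        if drug_channels <= 0:
            raise ValueError("drug_channels must be positive")
        self.drug_channels = drug_channels


dataset = DrugFeatureDataset(
    features={"drug_a": [0.0, 1.0, 1.0], "drug_b": [1.0, 0.0, 1.0]}
)
model = CasterSketch(drug_channels=dataset.drug_channels)
print(model.drug_channels)  # 3
```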

cthoyt · 2022-02-02T15:58:03Z

chemicalx/models/caster.py

+        )
+
+        # predictor: eight layer NN
+        predictor_layers_dict = OrderedDict()


This is very hard to think about. Can you just use a list?
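One way to follow this suggestion: list the layer widths once, pair consecutive widths, and append layers to a plain list. To keep the sketch dependency-free, layers are tuples standing in for PyTorch modules, and the widths (64 inputs, seven hidden layers of 128) are hypothetical, not the values used in the PR.

```python
from typing import List


def build_predictor(in_channels: int, hidden: List[int], out_channels: int = 1) -> List[tuple]:
    """Return layers in order as a plain list (no OrderedDict keys to name)."""
    widths = [in_channels, *hidden, out_channels]
    layers: List[tuple] = []
    for index, (d_in, d_out) in enumerate(zip(widths, widths[1:])):
        # Stand-in for nn.Linear(d_in, d_out) in the real model.
        layers.append(("linear", d_in, d_out))
        # Stand-in for an activation such as nn.ReLU(); none after the output layer.
        if index < len(widths) - 2:
            layers.append(("relu",))
    return layers


# Eight linear layers: seven hidden widths plus the output layer.
layers = build_predictor(64, [128] * 7)
print(sum(layer[0] == "linear" for layer in layers))  # 8
```

In a PyTorch module the same loop would append `nn.Linear(d_in, d_out)` and `nn.ReLU()` instances and finish with `nn.Sequential(*layers)`, which numbers its submodules automatically, so no string keys are needed.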

cthoyt · 2022-02-02T15:59:07Z

chemicalx/models/caster.py

+        """
+        Run a forward pass of the CASTER model
+
+        :param drug_pair_features (torch.FloatTensor): functional representation of each drug pair (see unpack method)


Better to make a named tuple class to document this kind of thing.
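A minimal sketch of that suggestion with `typing.NamedTuple`. `CasterInputs` and `unpack_pair` are hypothetical names, lists stand in for tensors, and the pair representation is assumed to be the element-wise union (OR) of the two drugs' binary substructure vectors; that assumption is labeled in the code as well.

```python
from typing import List, NamedTuple


class CasterInputs(NamedTuple):
    """Inputs to a CASTER-style forward pass (hypothetical container).

    drug_pair_features: binary functional representation of each drug pair,
        one row per pair and one column per substructure.
    """

    drug_pair_features: List[List[int]]


def unpack_pair(left: List[List[int]], right: List[List[int]]) -> CasterInputs:
    """Combine per-drug substructure indicators into pair features.

    Assumption: the pair representation is the element-wise union (OR) of
    the two drugs' binary vectors; for 0/1 entries that equals the max.
    """
    pairs = [
        [max(a, b) for a, b in zip(row_left, row_right)]
        for row_left, row_right in zip(left, right)
    ]
    return CasterInputs(drug_pair_features=pairs)


inputs = unpack_pair([[1, 0, 0]], [[0, 0, 1]])
print(inputs.drug_pair_features)  # [[1, 0, 1]]
```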

cthoyt · 2022-02-02T18:24:51Z

@andriy-nikolov note: I did some reorganization of the code in this PR. Make sure you pull before you begin working on it again.

codecov-commenter · 2022-02-02T18:31:02Z

Codecov Report

Merging #73 (a9aaae8) into main (5449f96) will increase coverage by 0.83%.
The diff coverage is 97.93%.

@@            Coverage Diff             @@
##             main      #73      +/-   ##
==========================================
+ Coverage   93.87%   94.70%   +0.83%     
==========================================
  Files          29       30       +1     
  Lines         832     1058     +226     
==========================================
+ Hits          781     1002     +221     
- Misses         51       56       +5
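The headline coverage figures are hits divided by lines. Assuming Codecov's default of two-decimal precision with rounding down (the `coverage.round` setting in `codecov.yml`), the report's numbers can be reproduced exactly with integer arithmetic; round-half-up would instead give 94.71% for the PR branch.

```python
def coverage_bp(hits: int, lines: int) -> int:
    """Coverage in basis points (hundredths of a percent), rounded down."""
    return hits * 10_000 // lines


def fmt(bp: int) -> str:
    """Format basis points as a two-decimal percentage string."""
    return f"{bp // 100}.{bp % 100:02d}%"


before = coverage_bp(781, 832)   # main: 781 hits of 832 lines
after = coverage_bp(1002, 1058)  # #73: 1002 hits of 1058 lines
print(fmt(before), fmt(after), fmt(after - before))  # 93.87% 94.70% 0.83%
```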

Impacted Files	Coverage Δ
chemicalx/pipeline.py	`87.67% <66.66%> (-0.91%)`	⬇️
chemicalx/models/deepddi.py	`95.00% <94.73%> (-5.00%)`	⬇️
chemicalx/models/deepdrug.py	`96.77% <96.66%> (-3.23%)`	⬇️
chemicalx/models/gcnbmp.py	`97.61% <97.59%> (-2.39%)`	⬇️
chemicalx/loss.py	`100.00% <100.00%> (ø)`
chemicalx/models/caster.py	`100.00% <100.00%> (ø)`
tests/unit/test_models.py	`100.00% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5449f96...a9aaae8.

* WIP: model forward pass works, not tested
* added dropout and batch norm
* WIP: made DeepDrug example, not tested
* moved to using layers only, not GCN torchdrug model
* docstring
* added dropout and made context feats optional
* added DeepDrug unit test
* deepdrug self attribute fix
* docstring update
* unpack method update (when no context feats used)
* isort
* fixed test setting (context_channels)
* fixed testing without context
* black
* RST fix
* RST fix
* more pythonic loop + swap i to _
* removed context feat support in DeepDrug
* removed context handling from testing DeepDrug
* fixed examples DeepDrug, no context handling, decreased epochs 100->20
* removed unused import
* used a wrapper for calling the same layers on pairs of batches
* used a wrapper for calling the same layers on pairs of batches
* docstring fix
* Abstract process applied to left and right sides
* Apply black
* Cleanup

Co-authored-by: Charles Tapley Hoyt <cthoyt@gmail.com>

* linting
* GCNBMP Scatter Reduction fix
* Using Rel Conv Layers instead of RGCN Model (avoid unnecessary sum readouts)
* Added docstrings and fixed highway update implementation
* Make number of relationships configurable
* little help of black for linting
* Cleaning up useless imports
* Sharing attention between right and left side
* Adding reference to GCNBMP docstring
* Type hinting everything
* Fixing docstring in example
* Removing type hints in docstrings as they were added to signatures; chunked iteration of the BMP backbone for better readability
* Adding more-itertools as a dependency
* Using pairwise for encoder construction
* Adding missing docstrings
* Fixing linting and precommit hook
* Fixing the citation back to what is in main
* Tests, formatting, example
* Tests, formatting, example
* GCNBMP
* Cleanup

Co-authored-by: kcvc236 <kcvc236@seskscpg057.prim.scp>
Co-authored-by: Rozemberczki <kmdb028@astrazeneca.net>
Co-authored-by: kcvc236 <kcvc236@seskscpg059.prim.scp>
Co-authored-by: Charles Tapley Hoyt <cthoyt@gmail.com>

* update: Add deepddi model
* update: Add deepddi examples
* update: Add deepddi test case
* Style: deepddi model
* Style: deepddi model
* Style: deepddi_examples.py
* Update deepddi.py
* Update deepddi.py

Co-authored-by: Charles Tapley Hoyt <cthoyt@gmail.com>

cthoyt · 2022-02-03T16:46:33Z

I have several concerns with this PR; it would have been nice to do a review first. Most importantly: why does it change the standard interface of the forward() function? I don't see where any of the other values it returns are used.

benedekrozemberczki · 2022-02-03T16:59:13Z

The paper discusses two training techniques: supervised and unsupervised. In the unsupervised setting, you could use any type of drug pair dataset. This forward pass allows for both setups; in our experiments we only consider the supervised one.
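A minimal sketch of one way to keep the extra return values without obscuring the standard interface: return a named tuple whose first field is the prediction, so supervised callers read `.prediction` while an unsupervised stage can use the auxiliary fields. `CasterOutput`, `toy_forward`, and the toy computations are hypothetical and do not reproduce the CASTER model.

```python
from typing import List, NamedTuple


class CasterOutput(NamedTuple):
    """Hypothetical forward-pass result: prediction first, extras after."""

    prediction: List[float]      # consumed by the supervised stage
    reconstruction: List[float]  # only needed by an unsupervised stage


def toy_forward(features: List[float]) -> CasterOutput:
    # Stand-in computations; a real model would run its encoder/decoder here.
    score = sum(features) / len(features)
    return CasterOutput(prediction=[score], reconstruction=list(features))


out = toy_forward([1.0, 0.0, 1.0, 0.0])
print(out.prediction)  # [0.5]
```

Because a named tuple is still a tuple, positional unpacking such as `prediction, *_ = toy_forward(...)` keeps working. An alternative design is to keep forward() returning only the prediction and expose the auxiliary terms through a separate method.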

CASTER layer implementation

f8031a9

- only supervised training stage
- input dimensionality assumed to be correct

andriy-nikolov requested a review from benedekrozemberczki February 2, 2022 15:56

cthoyt reviewed Feb 2, 2022

chemicalx/models/caster.py (outdated, resolved)

cthoyt reviewed Feb 2, 2022

cthoyt added 2 commits February 2, 2022 19:22

Apply black and reorganize

94cb4b4

Move loss into its own module

4ae0c7b

cthoyt added 2 commits February 2, 2022 19:26

Update caster.py

53a1952

Reduce diff on citation

0e46f9f

kajocina and others added 6 commits February 3, 2022 15:54

CASTER review fixes

b9ee29f

flake8 fixes

8205759

CASTER: typing fix

a9aaae8

benedekrozemberczki approved these changes Feb 3, 2022

benedekrozemberczki merged commit 6463147 into AstraZeneca:main Feb 3, 2022

benedekrozemberczki linked an issue Feb 3, 2022 that may be closed by this pull request

Add the CASTER model #17

Closed
