refactor: clean up GraphDataset and Trainer class #255

Merged: 36 commits, Nov 28, 2022

Commits
ee6ce9a
no caps in dataset and trainer modules
DaniBodor Nov 25, 2022
b6c06f0
make features parameters plural
DaniBodor Nov 25, 2022
c78e2c0
rename filter parameter
DaniBodor Nov 25, 2022
d7e0098
reorder dataset init
DaniBodor Nov 25, 2022
cafe5c6
move load_one_graph method
DaniBodor Nov 25, 2022
7e24284
create _check_task method
DaniBodor Nov 25, 2022
be83602
cleaned _check_features method
DaniBodor Nov 25, 2022
d6a98d6
rename GraphDataset class
DaniBodor Nov 25, 2022
fc9ce6d
plural node- and edge_feature throughout
DaniBodor Nov 25, 2022
f34adeb
fix _check_features callout
DaniBodor Nov 25, 2022
917c12f
clean up load_one_graph
DaniBodor Nov 25, 2022
e0915d6
rename _divide_dataset
DaniBodor Nov 25, 2022
03459a3
rename _precluster
DaniBodor Nov 25, 2022
33ea92d
Trainer type hinting
DaniBodor Nov 25, 2022
b3229b3
rename neuralnet parameter
DaniBodor Nov 25, 2022
c88457e
fix super() in custom network creation in readme
DaniBodor Nov 25, 2022
4ef59a3
rename neuralnet parameter
DaniBodor Nov 25, 2022
02078cf
reorganize Trainer methods in temporal order
DaniBodor Nov 25, 2022
54c4602
move _load_pretrained_model method
DaniBodor Nov 25, 2022
66f639e
metrics_output_dir
DaniBodor Nov 25, 2022
cad9b3b
make datasets and neuralnet attributes of Trainer
DaniBodor Nov 25, 2022
70f9c8f
Trainer method to check dataset equivalence
DaniBodor Nov 25, 2022
3ce7756
fix minor bugs
DaniBodor Nov 25, 2022
2aef7fd
add unit tests for catching non-equivalent dataset
DaniBodor Nov 25, 2022
e2e3f6a
removed default root assignment in test_trainer
DaniBodor Nov 25, 2022
9669e2b
fix linting and update error message
DaniBodor Nov 25, 2022
dfed479
check that a target exists
DaniBodor Nov 25, 2022
0fbd9e4
clarified usage of custom target in docstring
DaniBodor Nov 25, 2022
deece17
fix linting
DaniBodor Nov 25, 2022
ef5937e
clarified class_weights docstring
DaniBodor Nov 25, 2022
5af95f5
make state keys reflect attribute names
DaniBodor Nov 25, 2022
fa73398
improve docstrings, type hinting, empty lines
DaniBodor Nov 25, 2022
9b6e817
revert changelog
DaniBodor Nov 25, 2022
37c10c6
minor updates
DaniBodor Nov 25, 2022
751b457
Merge branch 'main' into 236_clean_dataset_dbodor
DaniBodor Nov 25, 2022
29bf3ac
remove obsolete warning, fix a typo
DaniBodor Nov 25, 2022
4 changes: 2 additions & 2 deletions CHANGELOG.md
```diff
@@ -66,7 +66,7 @@ Released on June 28, 2022
 * QueryDataset class #53
 * Unit tests for NeuralNet class #86
 * Error message if you pick the wrong metrics #110
-* Unit tests for HDF5DataSet class parameters #82
+* Unit tests for GraphDataset class parameters #82
 * Installation from PyPI in the readme #122
```
### Changed

```diff
@@ -76,7 +76,7 @@ Released on June 28, 2022
 * `multiprocessing.Queue` has been replaced with `multiprocessing.pool.map` in PreProcessor #56
 * `test_preprocess.py` does not fail anymore on Mac M1 #74
 * It's now possible to pass your own train/test split to NeuralNet class #81
-* HDF5DataSet class now is used in the UX #83
+* GraphDataset class now is used in the UX #83
 * IndexError running `NeuralNet.train()` has been fixed #89
 * pip installation has been fixed
 * Repository has been renamed deeprank-core, and the package deeprankcore #101
```
34 changes: 17 additions & 17 deletions README.md
````diff
@@ -204,39 +204,39 @@ Data can be split in sets implementing custom splits according to the specific a
 Assuming that the training, validation and testing ids have been chosen (keys of the hdf5 file), then the corresponding graphs can be saved in hdf5 files containing only references (external links) to the original one. For example:
 
 ```python
-from deeprankcore.DataSet import save_hdf5_keys
+from deeprankcore.dataset import save_hdf5_keys
 
 save_hdf5_keys("<original_hdf5_path.hdf5>", train_ids, "<train_hdf5_path.hdf5>")
 save_hdf5_keys("<original_hdf5_path.hdf5>", valid_ids, "<val_hdf5_path.hdf5>")
 save_hdf5_keys("<original_hdf5_path.hdf5>", test_ids, "<test_hdf5_path.hdf5>")
 ```
 
-Now the HDF5DataSet objects can be defined:
+Now the GraphDataset objects can be defined:
 
 ```python
-from deeprankcore.DataSet import HDF5DataSet
+from deeprankcore.dataset import GraphDataset
 
 node_features = ["bsa", "res_depth", "hse", "info_content", "pssm"]
 edge_features = ["distance"]
 
-# Creating HDF5DataSet objects
-dataset_train = HDF5DataSet(
+# Creating GraphDataset objects
+dataset_train = GraphDataset(
     hdf5_path = "<train_hdf5_path.hdf5>",
-    node_feature = node_features,
-    edge_feature = edge_features,
+    node_features = node_features,
+    edge_features = edge_features,
     target = "binary"
 )
-dataset_val = HDF5DataSet(
+dataset_val = GraphDataset(
     hdf5_path = "<val_hdf5_path.hdf5>",
-    node_feature = node_features,
-    edge_feature = edge_features,
+    node_features = node_features,
+    edge_features = edge_features,
     target = "binary"
 
 )
-dataset_test = HDF5DataSet(
+dataset_test = GraphDataset(
     hdf5_path = "<test_hdf5_path.hdf5>",
-    node_feature = node_features,
-    edge_feature = edge_features,
+    node_features = node_features,
+    edge_features = edge_features,
     target = "binary"
 )
 ```
````
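For context on where `train_ids`, `valid_ids` and `test_ids` come from: any custom split over the hdf5 entry keys works. A minimal stdlib sketch of one such split (the `split_ids` helper is illustrative, not part of deeprankcore):

```python
import random

def split_ids(ids, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle the hdf5 entry keys and split them into train/validation/test lists."""
    rng = random.Random(seed)
    ids = list(ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train_ids, valid_ids, test_ids = split_ids([f"entry_{i:03d}" for i in range(100)])
print(len(train_ids), len(valid_ids), len(test_ids))  # 80 10 10
```

The seeded `random.Random` instance keeps the split reproducible across runs, which matters when the three hdf5 files are regenerated later.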
````diff
@@ -246,18 +246,18 @@ dataset_test = HDF5DataSet(
 Let's define a Trainer instance, using for example of the already existing GNNs, GINet:
 
 ```python
-from deeprankcore.Trainer import Trainer
+from deeprankcore.trainer import Trainer
 from deeprankcore.ginet import GINet
 from deeprankcore.utils.metrics import OutputExporter, ScatterPlotExporter
 
 metrics_output_directory = "./metrics"
 metrics_exporters = [OutputExporter(metrics_output_directory)]
 
 trainer = Trainer(
+    GINet,
     dataset_train,
     dataset_val,
     dataset_test,
-    GINet,
     batch_size = 64,
     metrics_exporters = metrics_exporters
 )
````
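This hunk reorders the positional arguments of Trainer, moving the network relative to the datasets. Positional reorders like this are exactly what keyword arguments absorb. A toy sketch with stand-in functions (hypothetical simplifications, not deeprankcore's actual Trainer signature):

```python
# Stand-ins mimicking the Trainer call before and after the reorder
# (hypothetical, simplified; not the real API).
def trainer_before(dataset_train, dataset_val, dataset_test, neuralnet, batch_size=64):
    return {"net": neuralnet, "train": dataset_train, "batch_size": batch_size}

def trainer_after(neuralnet, dataset_train, dataset_val, dataset_test, batch_size=64):
    return {"net": neuralnet, "train": dataset_train, "batch_size": batch_size}

# Keyword calls give identical results regardless of parameter order,
# so callers written this way survive the refactor unchanged.
kwargs = dict(neuralnet="GINet", dataset_train="train.hdf5",
              dataset_val="val.hdf5", dataset_test="test.hdf5")
assert trainer_before(**kwargs) == trainer_after(**kwargs)
```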
````diff
@@ -297,7 +297,7 @@ def normalized_cut_2d(edge_index, pos):
 
 class CustomNet(torch.nn.Module):
     def __init__(self):
-        super(Net, self).__init__()
+        super().__init__()
         self.conv1 = SplineConv(d.num_features, 32, dim=2, kernel_size=5)
         self.conv2 = SplineConv(32, 64, dim=2, kernel_size=5)
         self.fc1 = torch.nn.Linear(64, 128)
````
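The `super()` fix above is worth spelling out: the class is named `CustomNet`, so the stale `super(Net, self)` call raises a `NameError` when the constructor runs. Python 3's zero-argument `super()` avoids the problem entirely. A toy sketch (these classes are illustrative, not the README's SplineConv network):

```python
class Base:
    def __init__(self):
        self.initialized = True

class CustomNet(Base):
    def __init__(self):
        # Zero-argument form: Python resolves the class and instance itself,
        # so renaming CustomNet cannot leave a stale reference behind.
        super().__init__()

class Renamed(Base):  # imagine this class used to be called "Net"
    def __init__(self):
        super(Net, self).__init__()  # "Net" no longer exists -> NameError

print(CustomNet().initialized)
try:
    Renamed()
except NameError:
    print("stale class name in super() call")
```

Note that the error only surfaces at instantiation time, since `Net` is looked up when `__init__` executes, not when the class is defined.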
````diff
@@ -320,10 +320,10 @@ class CustomNet(torch.nn.Module):
         return F.log_softmax(self.fc2(x), dim=1)
 
 trainer = Trainer(
+    CustomNet,
     dataset_train,
     dataset_val,
     dataset_test,
-    CustomNet,
     batch_size = 64,
     metrics_exporters = metrics_exporters
 )
````