Atomic/bond targets prediction (#280)
* Multitask constraint for atomic/bond properties prediction
* Uncertainty functions for atomic/bond properties prediction
* Use `--constraints_path` to do constraints
* Bugfix arguments in get_data
* Bugfix applying constraints in different loss_function
* Update README.md
* Delete the comments
* Bugfix repairing MPNN model
* Modify get_header function
* Bugfix adding is_atom_bond_targets in UncertaintyEvaluator
* Bugfix repairing original_scaling referenced before assignment
* Remove a redundant function in test_integration.py
* Remove unnecessary import and raise
* Fix typo `seperate`
* Fix UserWarning in torch.nn.Softmax

  UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
* Fix FutureWarning in `_check_reg_targets` in `sklearn.metrics._regression.py`

  FutureWarning: Arrays of bytes/strings is being converted to decimal numbers if dtype='numeric'. This behavior is deprecated in 0.24 and will be removed in 1.1 (renaming of 0.26). Please convert your data to numeric values explicitly instead.

  This warning comes from the loss-function calls in chemprop's metrics.py, or more specifically from `_check_reg_targets` in `sklearn.metrics._regression.py`. The targets need to be sanitized to floats as they are read in, before they reach this step in chemprop.
* Update README.md
* Check whether `--adding_h` is used
* Increase csv field limit

  For an atomic/bond properties dataset read from a .csv file, `_csv.Error: field larger than field limit (131072)` can occur. To solve this, the field limit is set to `sys.maxsize`.
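The csv field-limit fix above can be sketched with the standard library alone. The back-off loop is an extra guard (an assumption here, not stated in the commit) for platforms where `csv.field_size_limit` rejects values larger than a C long:

```python
import csv
import sys

# Long per-atom/per-bond target columns can exceed csv's default field
# limit (131072 bytes), raising:
#   _csv.Error: field larger than field limit (131072)
# Raising the limit toward sys.maxsize avoids this; the loop backs off
# on platforms where field_size_limit() overflows a C long.
limit = sys.maxsize
while True:
    try:
        csv.field_size_limit(limit)
        break
    except OverflowError:
        limit = limit // 10
```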
* Add atomic and bond targets data for testing
* Bugfix constraints() in MoleculeDataset
* Allow atom-level targets with scaffold_balanced
* Add testing for atomic and bond targets
* Update README.md

  Co-authored-by: Charles McGill <44245643+cjmcgill@users.noreply.github.com>
* Let `--adding_h` be an optional choice
* Minor changes in args.py
* Avoid multiple copies of a task
* Update README.md

  Co-authored-by: Charles McGill <44245643+cjmcgill@users.noreply.github.com>
* Add get_mixed_task_names() in data/utils.py
* Update test file
* Add `is_atom_bond_targets` as a dataset property
* Test get_header function with PICKLE file
* Change the protocol of test .pkl file from 5 to 4

  `ValueError: unsupported pickle protocol: 5` would occur for pandas users with a lower Python version.
* Fix typo `atoml`

  Co-authored-by: Charles McGill <44245643+cjmcgill@users.noreply.github.com>
* Upload a new test .pkl file generated by an older pandas version

  `AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals'>` can appear when using an older pandas version to open a .pkl file saved by a newer pandas version.
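The pickle-protocol change above can be illustrated with the standard library; the dict below is a hypothetical stand-in for the test data, not the actual .pkl contents:

```python
import pickle

data = {"smiles": "CCO", "atom_targets": [0.1, 0.2, 0.3]}

# Protocol 5 exists only on Python >= 3.8; unpickling a protocol-5 file
# on an older interpreter fails with
#   ValueError: unsupported pickle protocol: 5
# Writing with protocol 4 keeps the file readable on Python 3.4+.
blob = pickle.dumps(data, protocol=4)

assert blob[1] == 4               # PROTO opcode records protocol 4
assert pickle.loads(blob) == data  # round-trips unchanged
```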
* Fix number_of_atoms() and number_of_bonds() functions
* Raise NotImplementedError for unsupported extensions
* Add get_constraints() in data/utils.py
* Replace some statements with `is_atom_bond_targets`
* Remove separate device setting in MoleculeModel
* Add a relative output size multiplier
* Remove unnecessary attributes in MultiReadout
* Remove the behavior of using a pickle file as input or output
* Remove the test of get_header() with .pkl file
* Add AtomBondScaler in data/scaler.py
* Use json.loads() to load data
* Let the bond hidden be order invariant
* Remove an unnecessary attribute in UncertaintyCalibrator
* Bugfix read is_atom_bond_targets from args
* Bugfix
* Shared FFN weights
* Make FFN weight sharing optional
* Fix consistency when num_layers=1
* Update chemprop/models/ffn.py

  Co-authored-by: Shih-Cheng Li <scli@mit.edu>
* Update chemprop/models/ffn.py

  Co-authored-by: Shih-Cheng Li <scli@mit.edu>
* Add description to readme for the atom bond shared FFN argument
* Let the number of layers in the weight FFN be controllable
* Add option of adding bond types to the output of bond targets
* Bugfix set bond_types_batch as None
* Fix typo and remove eval()
* Use bond descriptors as descriptors
* Bugfix missing brackets
* Bugfix for multitarget mve_weighting calibration (#291)
* Bugfix for multitarget mve_weighting calibration
* Remove troubleshooting prints
* Bond order invariant
* Update the test score for mve_weighting
* Support mve_weighting calibration at the atomic/bond level
* Support transfer learning for atomic/bond targets
* Backward compatibility for parameter names
* Bond order invariant in bond constraints model
* Fix typo in load_frzn_model (#294)
* Bugfix for mve_weighting at the atom level
* Bugfix mve_weighting with ence
* Transfer missing variables to new shapes
* Define num_tasks in every evaluator
* Update test results
* Transpose masks
* Convert None to np.nan
* Replace 1 with True in mask
* Replace missing values of None with null
* Update README.md
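The "order invariant" items above refer to making a bond's prediction independent of which direction the bond is stored in. A minimal sketch, assuming the two directed-bond hiddens are simply symmetrized (the function name and the sum are illustrative, not taken from the chemprop source):

```python
import numpy as np

def symmetric_bond_hidden(h_ij, h_ji):
    """Combine the two directed-bond hiddens so the result does not
    depend on which direction of the bond is listed first."""
    return h_ij + h_ji

h_ij = np.array([1.0, 2.0, 3.0])
h_ji = np.array([0.5, 0.5, 0.5])

# swapping the direction arguments yields an identical representation
assert np.allclose(symmetric_bond_hidden(h_ij, h_ji),
                   symmetric_bond_hidden(h_ji, h_ij))
```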
* Bugfix wrong break statement in get_mixed_task_names
* Support atom-mapped SMILES for atomic property modeling
* Refactor building FFN
* Replace AttrProxy with ModuleList
* Apply suggestions from code review

  Co-authored-by: david graff <60193893+davidegraff@users.noreply.github.com>
* Remove import of AttrProxy
* Save results as a list instead of np.ndarray
* Apply suggestions from code review

  Co-authored-by: david graff <60193893+davidegraff@users.noreply.github.com>
* Import torch.nn.functional module in ffn.py
* Raise an error when using an unrecognized ffn_type
* Add return type of build_ffn()
* Fix typo and remove old legacy code
* Replace narrow with tensor indexing
* Make notation in the code closer to the paper
* Bugfix wrong constraints_batch in prediction
* Use camel case

  Co-authored-by: david graff <60193893+davidegraff@users.noreply.github.com>
* Replace ValueError with RuntimeError

  Co-authored-by: david graff <60193893+davidegraff@users.noreply.github.com>
* Change the notation of the formula

  Co-authored-by: david graff <60193893+davidegraff@users.noreply.github.com>
* Use broadcasting instead of indexing and flattening
* Add `FFN` for unconstrained predictions
* Use dropout probability as input instead of nn.Dropout
* Apply suggestions from code review

  Co-authored-by: david graff <60193893+davidegraff@users.noreply.github.com>
* Make `ffn_base` an optional argument
* Load activation function with `get_activation_function()`
* Pass a dict to load FFN params
* Apply suggestions from code review

  Co-authored-by: david graff <60193893+davidegraff@users.noreply.github.com>
* Define `FFNAtten` as a subclass of `FFN`
* Avoid defining functions in `forward()`
* Rename `dropout_layer` to `dropout`
* Check smiles_columns by str
* Use torch.torch to define PyTorch tensors
* Rename the functions
* Bugfix for smiles_columns as None
* Replace torch.float32 with torch.float

  Co-authored-by: david graff <60193893+davidegraff@users.noreply.github.com>
* Apply CodeQL suggestion
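The "use broadcasting instead of indexing and flattening" item can be illustrated with NumPy; the array names and shapes below are illustrative, not taken from the chemprop source:

```python
import numpy as np

# per-atom outputs: 4 atoms x 3 tasks
outputs = np.arange(12, dtype=float).reshape(4, 3)
# one scaling factor per task
scale = np.array([0.5, 1.0, 2.0])

# Instead of looping over tasks, indexing out each column and
# flattening, broadcasting applies every per-task scale in one step:
scaled = outputs * scale

assert scaled.shape == (4, 3)
assert np.allclose(scaled[:, 2], outputs[:, 2] * 2.0)
```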
* Delete comments
* Replace `or` with explicit `if else`
* Simplify the encodings concatenation
* Black-format model.py
* Apply CodeQL suggestion
* Support atom-mapped SMILES
* Bugfix for considering atom mapping when using get_mixed_task_names()
* Bugfix sum of a list of lists

  `n_atoms` is a list of lists. Using `sum(n_atoms)` would raise a `TypeError: unsupported operand type(s) for +: 'int' and 'list'` error. To avoid this, `np.array(n_atoms).sum()` is used instead.
* Remove redundant code
* Fix NumPy 1.24 compatibility issue
* Update README.md
* Fix typo and reorder two attributes
* Revise the parameter descriptions for `raw_constraints` and `constraints`
* Refactor multitask_utils.py

---------

Co-authored-by: Charles McGill <44245643+cjmcgill@users.noreply.github.com>
Co-authored-by: Chas <charlesjmcgill@gmail.com>
Co-authored-by: david graff <60193893+davidegraff@users.noreply.github.com>
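The list-of-lists fix in the `n_atoms` bullet behaves as follows (the values are illustrative):

```python
import numpy as np

# The built-in sum() starts from the int 0, so summing a list of lists
# fails with:
#   TypeError: unsupported operand type(s) for +: 'int' and 'list'
n_atoms = [[3, 2], [4, 1]]

# Converting to an ndarray first sums every element instead:
total = np.array(n_atoms).sum()
assert total == 10
```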