Skip to content

Improve compile_values and related functions#765

Merged
bennybp merged 6 commits into
MolSSI:mainfrom
janash:dataset_ui
Oct 18, 2023
Merged

Improve compile_values and related functions#765
bennybp merged 6 commits into
MolSSI:mainfrom
janash:dataset_ui

Conversation

@janash

@janash janash commented Oct 4, 2023

Copy link
Copy Markdown
Member

This PR makes some edits to the compile_values method and adds a new dataset method called get_properties_df.

The following changes are made to compile_values:

  • compile_values now always returns a multi-level index where the specification is the top leve index.
  • compile_values now has an option to unpack results from the callable, allowing more than one dataframe column to be built at a time.
  • entry_name may be a string or list of strings, but now also has a default value in case the user doesn't provide it.

Example use:

import qcportal as ptl

client = ptl.PortalClient("https://qcademo.molssi.org")
dataset = client.get_dataset_by_id(1)

# Case 1 - Single value returned from callable.
df1 = dataset.compile_values(lambda x: x.properties["scf_total_energy"],"scf_total_energy" )

# Case 2 - Tuple of two values returned from callable with unpack True
df2 = dataset.compile_values(lambda x: (x.properties["scf_total_energy"], x.properties["scf_iterations"]), unpack=True)

The function get_properties_df is added to allow easier compilation of record properties into a dataframe. The function takes a list of properties and returns a multi-level index dataframe similar to the type returned from compile_values. Under the hood, the function uses compile_values accessing the properties using .get in case property doesn't exist for a particular specification. Additionally, before the df is returned to the user any columns containing all nan are dropped.

Example use (continuing from example above)

properties = dataset.get_properties_df(["scf_total_energy", "scf dipole", "mp2 dipole"])

@codecov

codecov Bot commented Oct 4, 2023

Copy link
Copy Markdown

Codecov Report

Merging #765 (0325110) into main (23c4103) will decrease coverage by 0.08%.
Report is 22 commits behind head on main.
The diff coverage is 5.00%.

❗ Current head 0325110 differs from pull request most recent head fee87f3. Consider uploading reports for the commit fee87f3 to get more accurate results

Additional details and impacted files

@bennybp bennybp changed the title edits to dataset interface Improve compile_values and related functions Oct 5, 2023
@janash janash requested a review from bennybp October 16, 2023 15:35

@bennybp bennybp left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some cosmetic changes that make it consistent with everything else. Otherwise good job! :)

Comment thread qcportal/qcportal/dataset_models.py Outdated
self,
value_call: Callable,
value_name: str,
value_names: Union[List[str], str] = "value",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the type be Union[Iterable[str], str] would be more generic :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, Sequence, not Iterable.

Comment thread qcportal/qcportal/dataset_models.py Outdated
# Make specification top level index.
return return_val.swaplevel(axis=1)

def get_properties_df(self, properties_list: List[str]) -> pd.DataFrame:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also should be Iterable[str]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sequence, not Iterable, since order matters

Comment thread qcportal/qcportal/dataset_models.py Outdated

Parameters
-----------
value_call : Callable

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use the types in the docstrings, since sphinx can now get the types from the function arguments itself.

@bennybp

bennybp commented Oct 18, 2023

Copy link
Copy Markdown
Contributor

Looks good! Thanks!

@bennybp bennybp merged commit fc3cd30 into MolSSI:main Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants