Output results as a dataframe + return short names, hctsa names and values as standard. #31

Open
wants to merge 8 commits into
base: main

jmoo2880
Collaborator

@jmoo2880 jmoo2880 commented Jun 4, 2024

Building on the changes originally proposed by @anniegbryant in PR #21, this PR changes the catch22 output to a DataFrame and makes what have previously been called "short" feature names the default in place of the long (HCTSA) feature names, e.g., mode_5 instead of DN_HistogramMode_5.

Breaking Changes:

Since these modifications will introduce breaking changes for the existing user base, this PR will constitute a new major version release (catch22 v1.0.0), with docs + README updated to reflect the new output format. Users will need to be made aware of the new output via clear documentation and a migration guide in the changelogs to avoid confusion.

Major changes

  • Removal of short_names as an optional parameter in the catch22_all() function. Three columns will now be returned as standard: feature, hctsa_name and value. That is, catch22_all() now accepts only two arguments:
catch22_all(data, catch24=False)
  • The catch22 features_short of versions <= v0.4.5 are now called features (or feature in the output DataFrame).
  • The catch22 feature of versions <= v0.4.5 is now features_hctsa (or hctsa_name in the output DataFrame).
  • catch22 results are now returned as a pandas DataFrame instead of a dict for improved readability:
df = catch22_all(data, catch24=False)

# print the first feature name
print(df.feature[0])

# print the first feature value
print(df.value[0])

# print the first feature HCTSA (long) name
print(df.hctsa_name[0])
  • Added pandas and numpy dependencies.
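
As a rough illustration of the renaming described above, the sketch below rearranges an old-style result into rows that use the new column names. It assumes the old output was a dict with names, short_names and values keys (as mentioned elsewhere in this thread); the helper to_new_layout is hypothetical and not part of catch22, and the numeric values are placeholders rather than real feature outputs.

```python
def to_new_layout(old_result):
    """Rearrange an old-style result dict into rows that use the new column names.

    Hypothetical helper for illustration; it is not part of catch22.
    """
    return [
        # The old short name becomes `feature`; the old long name becomes `hctsa_name`.
        {"feature": short, "hctsa_name": long, "value": value}
        for long, short, value in zip(
            old_result["names"], old_result["short_names"], old_result["values"]
        )
    ]


# Placeholder values for illustration only; these are not real catch22 outputs.
old_result = {
    "names": ["DN_HistogramMode_5", "DN_HistogramMode_10"],
    "short_names": ["mode_5", "mode_10"],
    "values": [0.1, 0.2],
}

rows = to_new_layout(old_result)
print(rows[0]["feature"])     # mode_5
print(rows[0]["hctsa_name"])  # DN_HistogramMode_5
```

Passing rows to pandas.DataFrame(rows) should give a frame with the feature, hctsa_name and value columns, which may be useful when comparing saved results from older versions against the new output.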

Minor changes

  • Added a security policy, SECURITY.md.
  • Added a code of conduct, CODE_OF_CONDUCT.md.
  • Added a dark-mode logo to the README.
  • Added Python unit testing and Python version support badges to the README.
  • Updated the usage guide in the README to notify users of the DataFrame output.
  • Added Python 3.12 to the unit test runners.
  • Updated the unit tests to support the new DataFrame output.

@jmoo2880 jmoo2880 requested a review from benfulcher June 4, 2024 09:48
@jmoo2880
Collaborator Author

jmoo2880 commented Jun 4, 2024

Also, the changelogs will be more extensive and clearer about the breaking changes for users and the new naming conventions, with the old short_names and names essentially swapping places.

@benfulcher
Contributor

@anniegbryant can you do a quick test?

@KieranOwens

I tried the new catch22_all function with my workflow. The change in the dictionary/dataframe key from 'values' (old version) to 'value' (new version) breaks my code. Otherwise, it works fine.
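
One possible workaround for the 'values' to 'value' rename, sketched under assumptions: the hypothetical helper feature_values below (not part of catch22) looks up whichever key is present, so the same calling code could read both output styles. It relies only on key membership and indexing, which plain dicts support and which pandas DataFrames also support for column names; the example uses plain dicts with placeholder numbers so it runs without pandas.

```python
def feature_values(result):
    """Return the feature values as a list from old- or new-style output.

    Hypothetical compatibility helper; it is not part of catch22.
    """
    # Old output used the key 'values'; new output uses the column 'value'.
    key = "values" if "values" in result else "value"
    return list(result[key])


# Placeholder data for illustration only; these are not real catch22 outputs.
old_style = {"names": ["DN_HistogramMode_5"], "values": [0.1]}
new_style = {"feature": ["mode_5"], "hctsa_name": ["DN_HistogramMode_5"], "value": [0.1]}

print(feature_values(old_style))  # [0.1]
print(feature_values(new_style))  # [0.1]
```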

3 participants