Skip to content

Conversation

@wigging
Copy link
Member

@wigging wigging commented May 15, 2025

Allow the user to disable writing search results to a CSV file. Collect search results using a SearchHistory class. Resolves issue #309 .

@codecov
Copy link

codecov bot commented May 15, 2025

Codecov Report

Attention: Patch coverage is 85.87786% with 37 lines in your changes missing coverage. Please review.

Project coverage is 49.92%. Comparing base (c9c39f8) to head (ff2d967).
Report is 20 commits behind head on develop.

Files with missing lines Patch % Lines
src/deephyper/evaluator/callback.py 87.37% 4 Missing and 9 partials ⚠️
src/deephyper/hpo/_search.py 88.46% 6 Missing and 6 partials ⚠️
src/deephyper/hpo/_cbo.py 21.42% 10 Missing and 1 partial ⚠️
src/deephyper/skopt/optimizer/optimizer.py 83.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #325      +/-   ##
===========================================
+ Coverage    43.69%   49.92%   +6.23%     
===========================================
  Files          124      108      -16     
  Lines         8377     7609     -768     
  Branches      1375     1219     -156     
===========================================
+ Hits          3660     3799     +139     
+ Misses        4387     3468     -919     
- Partials       330      342      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines +151 to +174
if len(resultsList) > 0:
started_dumping = self._csv_cursor > 0
file_mode = "a" if started_dumping else "w"

if not (started_dumping):
for result in resultsList:
# Waiting to start receiving non-failed jobs before dumping results
is_single_obj_and_has_success = (
"objective" in result and type(result["objective"]) is not str
)
is_multi_obj_and_has_success = (
"objective_0" in result and type(result["objective_0"]) is not str
)
if is_single_obj_and_has_success or is_multi_obj_and_has_success or flush:
self._csv_columns = result.keys()
break

if self._csv_columns is not None:
with open(os.path.join(path), file_mode) as fp:
writer = csv.DictWriter(fp, self._csv_columns, extrasaction="ignore")
if not (started_dumping):
writer.writeheader()
writer.writerows(resultsList)
self._csv_cursor += len(resultsList)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be avoided by converting the results list to a dataframe then use pandas to write the dataframe to csv file? This would eliminate importing the csv module.

Copy link
Member

@Deathn0t Deathn0t May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is to avoid redundant writing. It just appends new lines. The pandas logic would re-write everything again an again. Making the complexity squared instead of linear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The self.jobs = [] is a list of dictionaries, is that right? Each dictionary represents the results of a completed job, is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The self.jobs is actually a list of HPOJob objects.

@wigging
Copy link
Member Author

wigging commented May 16, 2025

At the end of the Search.search method is this:

        self.dump_jobs_done_to_csv(flush=True)

        self.history.compute_pareto_efficiency()

        df_results = self.history.to_dataframe()

        return df_results

It looks like the job results are written to CSV then the pareto results are calculated. Does this update the CSV with the pareto results?

@Deathn0t
Copy link
Member

The PR is looking better but we still need to add a few unit tests.

@Deathn0t Deathn0t mentioned this pull request May 16, 2025
@Deathn0t
Copy link
Member

The Redis tests needs to be checked and updated if necessary.

@wigging
Copy link
Member Author

wigging commented May 19, 2025

I merged the latest develop branch into this branch and apparently it fixed the failing Redis tests.

@wigging
Copy link
Member Author

wigging commented May 19, 2025

For the Search class, is it necessary to name the input argument as checkpoint_history_to_csv? This seems very wordy, why not something shorter like csv_output?

@wigging
Copy link
Member Author

wigging commented May 20, 2025

The search method is currently defined as:

search(self, max_evals: int = -1, timeout: int = None, max_evals_strict: bool = False)

A better definition would be the following:

search(self, max_evals=-1, timeout=0, max_evals_strict=False) -> pd.DataFrame

When you have default values there is no need to explicitly give the type, it is inferred from the default value. I don't know why timeout would be an int or None; can it have a default value of 0? Providing a return type for this method helps type checkers like pyright otherwise it will think the return type is optional.

@Deathn0t
Copy link
Member

Hi @wigging ,

  • adding the test for Pareto efficient is great.
  • adding the returned type to search(…): seems like a good idea, I will do it.
  • changing the type def/default value of timeout: let’s do it in an other PR, because it may have an impact.
  • I am voting for keeping the verbose name of checkpointing_history_to_csv because it is sort of explicit and with auto-completion in idea easy.

@Deathn0t Deathn0t marked this pull request as ready for review May 26, 2025 07:50
@Deathn0t Deathn0t changed the title CSV output parameter and search history feat (search): adding option to disable dumping the results to civ May 26, 2025
@Deathn0t Deathn0t merged commit 0cc0256 into develop May 26, 2025
9 checks passed
@Deathn0t Deathn0t deleted the 309-results-output-v2 branch May 26, 2025 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants