Added version control diff #1345

Merged 24 commits on Nov 29, 2021

Conversation

@AbhinavTuli (Contributor) commented Nov 17, 2021

🚀 🚀 Pull Request

Checklist:

  • My code follows the style guidelines of this project and the Contributing document
  • I have commented my code, particularly in hard-to-understand areas
  • I have kept the coverage-rate up
  • I have performed a self-review of my own code and resolved any problems
  • I have checked to ensure there aren't any other open Pull Requests for the same change
  • I have described and made corresponding changes to the relevant documentation
  • New and existing unit tests pass locally with my changes

Changes

Need to add tests.

codecov bot commented Nov 17, 2021

Codecov Report

Merging #1345 (81379a6) into main (a32bc32) will increase coverage by 0.40%.
The diff coverage is 99.31%.


@@            Coverage Diff             @@
##             main    #1345      +/-   ##
==========================================
+ Coverage   91.48%   91.89%   +0.40%     
==========================================
  Files         150      152       +2     
  Lines       10607    11032     +425     
==========================================
+ Hits         9704    10138     +434     
+ Misses        903      894       -9     
Flag Coverage Δ
unittests 91.89% <99.31%> (+0.40%) ⬆️


Impacted Files Coverage Δ
hub/core/dataset/dataset.py 93.63% <96.66%> (+0.16%) ⬆️
hub/util/diff.py 98.42% <98.42%> (ø)
hub/constants.py 100.00% <100.00%> (ø)
hub/core/chunk_engine.py 95.93% <100.00%> (+0.30%) ⬆️
hub/core/tensor.py 79.14% <100.00%> (+0.45%) ⬆️
hub/core/version_control/commit_diff.py 100.00% <100.00%> (ø)
hub/core/version_control/test_version_control.py 99.79% <100.00%> (+0.14%) ⬆️
hub/util/keys.py 100.00% <100.00%> (ø)
hub/util/version_control.py 95.67% <100.00%> (-1.55%) ⬇️
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@farizrahman4u (Contributor) left a comment

Fix the tests to use assertions; LGTM otherwise.
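A minimal sketch of what asserting on printed output could look like, using only the standard library. `print_log` and `capture` are hypothetical stand-ins and not code from this PR; in the project's own tests the pytest `capsys` fixture would play the role of `capture`.

```python
import io
from contextlib import redirect_stdout


def print_log(branch):
    """Hypothetical stand-in for a method that prints a version log."""
    print("---------------\nHub Version Log\n---------------\n")
    print(f"Current Branch: {branch}")


def capture(func, *args):
    """Run func(*args) and return everything it wrote to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


output = capture(print_log, "main")

# Assert on the captured text instead of only calling the function.
assert "Hub Version Log" in output
assert output.endswith("Current Branch: main\n")
```

Asserting on the captured text means the test fails when the printed output changes, which a test that only calls the function would not catch.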

@@ -426,19 +433,63 @@ def checkout(self, address: str, create: bool = False) -> str:
     def log(self):
         """Displays the details of all the past commits."""
         commit_node = self.version_state["commit_node"]
-        logger.info("---------------\nHub Version Log\n---------------\n")
-        logger.info(f"Current Branch: {self.version_state['branch']}")
+        print("---------------\nHub Version Log\n---------------\n")


Are we comprehensively changing these emits or cherry picking as we go along?

Contributor Author

We only have two functions that deliberately print something: diff and log.
I realized that print was the better choice while implementing diff, and changed log too for consistency.


The cluster setup (atm) captures stdout emits as logs. I strongly suggest removing any free-form debugging logs and sticking strictly to the format f"<log-level> <log-msg> - <key-0>: <val-0> - [...] <key-n>: <val-n>". For debugging (check with @absinha18), stderr may be ignored and could then be used for free-form debugging emits. We want logs that we can systematically grep.
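One way the suggested format could be produced is sketched below. `format_log` is a hypothetical helper, not part of Hub, and the level, message, and field names are made up for illustration.

```python
def format_log(level, message, **fields):
    """Render '<log-level> <log-msg> - <key-0>: <val-0> - ... - <key-n>: <val-n>'."""
    parts = [f"{level} {message}"]
    # Keyword arguments keep their call order, so keys appear as passed.
    parts.extend(f"{key}: {value}" for key, value in fields.items())
    return " - ".join(parts)


line = format_log("INFO", "checked out branch", branch="main", create=False)
print(line)  # INFO checked out branch - branch: main - create: False
```

Because every line starts with the level and uses " - " between key/value pairs, a simple `grep "^INFO"` or `grep "branch: main"` is enough to filter the captured stdout.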

assert local_ds.commit_id != third_commit_id


def test_diff_linear(local_ds, capsys):


Going forward, I'd like to see a more robust approach to test parameters. We can discuss which Python packages support parametric tests that remove the need for hardcoded values.

Contributor Author

We do have parametrization over different kinds of datasets through our fixtures. This test doesn't need to use S3/GCS datasets, though, so just using the "local_ds" fixture works.


I was actually referring to lines 371->, that is, the test body itself and not the source of the data: things like local_ds.xyz.extend([1, 2, 3]). How about something like:

def test_diff_linear(test_spec, local_ds, capsys), where test_spec is a dictionary of variables and functions that generate values for the test body (which should ideally be entirely free of hardcoded input). For now, you can initialize test_spec with fixed values; at a later date we can fuzz the code with a randomly generated test_spec.
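A rough sketch of the test_spec idea, under stated assumptions: `make_test_spec` and `run_diff_body` are hypothetical names, and a plain dict of lists stands in for `local_ds` so the snippet runs without Hub or pytest. It only illustrates moving inputs out of the test body, not the actual diff logic.

```python
import random


def make_test_spec(seed=None):
    """Build a hypothetical test_spec: a dict of values and value generators."""
    rng = random.Random(seed)
    return {
        "tensor_name": "xyz",
        "initial": [1, 2, 3],  # fixed for now, as the comment suggests
        "make_update": lambda n: [rng.randint(0, 100) for _ in range(n)],
    }


def run_diff_body(test_spec, store):
    """Test body with no hardcoded inputs; `store` stands in for local_ds."""
    name = test_spec["tensor_name"]
    store.setdefault(name, []).extend(test_spec["initial"])
    committed = len(store[name])  # pretend a commit happens here
    update = test_spec["make_update"](4)
    store[name].extend(update)
    # Samples appended after the "commit" are what a diff should report.
    return store[name][committed:], update


added, expected = run_diff_body(make_test_spec(seed=0), {})
assert added == expected
```

With pytest, several specs could be supplied through `@pytest.mark.parametrize("test_spec", [...])`, and a randomly seeded `make_test_spec` is one possible route to the fuzzing mentioned above.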

@AbhinavTuli AbhinavTuli merged commit 1fb3250 into main Nov 29, 2021
@AbhinavTuli AbhinavTuli deleted the vc/diff branch November 29, 2021 15:43

4 participants