Added version control diff #1345

AbhinavTuli · 2021-11-17T05:58:42Z

🚀 🚀 Pull Request

Checklist:

My code follows the style guidelines of this project and the Contributing document
I have commented my code, particularly in hard-to-understand areas
I have kept the coverage-rate up
I have performed a self-review of my own code and resolved any problems
I have checked to ensure there aren't any other open Pull Requests for the same change
I have described and made corresponding changes to the relevant documentation
New and existing unit tests pass locally with my changes

Changes

Need to add tests.

codecov · 2021-11-17T05:58:54Z

Codecov Report

Merging #1345 (81379a6) into main (a32bc32) will increase coverage by 0.40%.
The diff coverage is 99.31%.

@@            Coverage Diff             @@
##             main    #1345      +/-   ##
==========================================
+ Coverage   91.48%   91.89%   +0.40%     
==========================================
  Files         150      152       +2     
  Lines       10607    11032     +425     
==========================================
+ Hits         9704    10138     +434     
+ Misses        903      894       -9

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `91.89% <99.31%> (+0.40%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| hub/core/dataset/dataset.py | `93.63% <96.66%> (+0.16%)` | ⬆️ |
| hub/util/diff.py | `98.42% <98.42%> (ø)` | |
| hub/constants.py | `100.00% <100.00%> (ø)` | |
| hub/core/chunk_engine.py | `95.93% <100.00%> (+0.30%)` | ⬆️ |
| hub/core/tensor.py | `79.14% <100.00%> (+0.45%)` | ⬆️ |
| hub/core/version_control/commit_diff.py | `100.00% <100.00%> (ø)` | |
| hub/core/version_control/test_version_control.py | `99.79% <100.00%> (+0.14%)` | ⬆️ |
| hub/util/keys.py | `100.00% <100.00%> (ø)` | |
| hub/util/version_control.py | `95.67% <100.00%> (-1.55%)` | ⬇️ |
... and 4 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a32bc32...81379a6.

hub/core/version_control/commit_diff.py

hub/core/version_control/test_version_control.py

farizrahman4u

Fix tests to use assertions, lgtm otherwise.

alphazero · 2021-11-29T01:14:38Z

hub/core/dataset/dataset.py

@@ -426,19 +433,63 @@ def checkout(self, address: str, create: bool = False) -> str:
    def log(self):
        """Displays the details of all the past commits."""
        commit_node = self.version_state["commit_node"]
-        logger.info("---------------\nHub Version Log\n---------------\n")
-        logger.info(f"Current Branch: {self.version_state['branch']}")
+        print("---------------\nHub Version Log\n---------------\n")


Are we comprehensively changing these emits or cherry picking as we go along?

AbhinavTuli · 2021-11-29

We only have 2 functions that deliberately print something, diff and log.
I realized that print was a better choice while implementing diff, and changed log too for consistency.

alphazero · 2021-11-30

The cluster setup (atm) will capture stdout emits as logs. Strongly suggest removing any free-form debugging logs and sticking strictly to f"<log-level> <log-msg> - <key-0>: <val-0> - [...] <key-n>: <val-n>" formatting of logs. For debug (check with @absinha18), it is possible that stderr is ignored and can be used for free-form debugging emits. We want logs that we can systematically grep.
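
A minimal sketch of the log format suggested in this thread, for illustration only. The helper name `format_log` and the example keys (`branch`, `commit`) are hypothetical and not part of Hub; the sketch only assumes the `<log-level> <log-msg> - <key>: <val>` layout quoted in the comment.

```python
def format_log(level: str, msg: str, **fields: object) -> str:
    """Build '<log-level> <log-msg> - <key-0>: <val-0> - ... - <key-n>: <val-n>'.

    Hypothetical helper: keyword arguments become the key/value pairs,
    in the order they were passed.
    """
    parts = [f"{level.upper()} {msg}"]
    parts.extend(f"{key}: {value}" for key, value in fields.items())
    return " - ".join(parts)


# Example emit; the values here are placeholders.
print(format_log("info", "checkout complete", branch="main", commit="a32bc32"))
```

Because every line has the same delimiter and key layout, a filter such as `grep "branch: main"` would match emits from any call site.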

alphazero · 2021-11-29T01:17:31Z

hub/core/version_control/test_version_control.py

+    assert local_ds.commit_id != third_commit_id
+
+
+def test_diff_linear(local_ds, capsys):


Going forward, I'd like to see a more robust approach to test parameters. We can discuss which packages in the Python world allow for parametric tests that obviate the need for hardcoded values.

AbhinavTuli · 2021-11-29

We do have parametrization with different kinds of datasets through our fixtures. This test doesn't need to use s3/gcs datasets though, so just using the "local_ds" fixture works.

alphazero · 2021-11-30

I was actually referring to lines 371 onward, so the test body itself and not the source of the data. Things like local_ds.xyz.extend([1, 2, 3]). How about something like:

def test_diff_linear(test_spec, local_ds, capsys), where test_spec is a dictionary of vars and funcs that generate values for the test body (which should ideally be entirely free of any hardcoded input). For now, you can init the test_spec with values, but at a later date we can fuzz the code with randomly generated test_spec.
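
One possible shape for the `test_spec` idea, as a sketch under stated assumptions rather than the project's test code. `FakeTensor`, `make_spec`, and `check_extend` are hypothetical names; `FakeTensor` is a stand-in so the example runs without Hub, and the spec key `xyz_values` is invented for illustration.

```python
import random
from typing import Callable, Dict, List

TestSpec = Dict[str, Callable[[], List[int]]]


class FakeTensor:
    """Hypothetical stand-in for a dataset tensor; not the Hub API."""

    def __init__(self) -> None:
        self.samples: List[int] = []

    def extend(self, values: List[int]) -> None:
        self.samples.extend(values)


def make_spec(seed: int) -> TestSpec:
    """Return a test_spec whose value generators are driven by a seeded RNG."""
    rng = random.Random(seed)
    return {
        # Generates between 1 and 5 integer samples per call.
        "xyz_values": lambda: [rng.randint(0, 255) for _ in range(rng.randint(1, 5))],
    }


def check_extend(test_spec: TestSpec, tensor: FakeTensor) -> None:
    """Test body with no hardcoded inputs: every value comes from test_spec."""
    values = test_spec["xyz_values"]()
    before = len(tensor.samples)
    tensor.extend(values)
    assert tensor.samples[before:] == values
    assert len(tensor.samples) == before + len(values)


# Fixed seeds keep runs reproducible; widening the range fuzzes the body.
for seed in range(10):
    check_extend(make_spec(seed), FakeTensor())
```

With pytest, the seeds could be supplied through `pytest.mark.parametrize` instead of the loop, and a library such as Hypothesis is one option for generated inputs; neither choice is confirmed by this thread.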

AbhinavTuli added 2 commits November 15, 2021 00:56

added commit diff

992abd1

diff works

7987564

AbhinavTuli added 16 commits November 17, 2021 16:39

bug fixes

1d9b205

bug fixes and formatting

0cd38c1

better formatting

fe99b77

simplify code

6d229cb

updated docstrings

85129ed

move diff utils

9b969fb

add checks for commits

7316d2b

fix auto commit

e632c9c

more vc fixes

8f1eb3d

add diff tests

1261ba6

Merge remote-tracking branch 'origin' into vc/diff

9a69eee

fix readonly

35d78ab

lint fixes

c337b91

Merge remote-tracking branch 'origin' into vc/diff

23f3a3f

compressed the changes displayed

047b7a1

cleaned up diff output

1f55080

AbhinavTuli requested a review from farizrahman4u November 25, 2021 15:29

fix linting

1daf235

aliubimov reviewed Nov 25, 2021

hub/core/version_control/commit_diff.py (resolved)

hub/core/version_control/commit_diff.py (resolved)

hub/core/version_control/test_version_control.py (outdated, resolved)

farizrahman4u approved these changes Nov 28, 2021

AbhinavTuli added 3 commits November 28, 2021 13:38

improved tests

40e3797

mypy fix

255adf8

clean up CommitDiff class

81c4727

AbhinavTuli requested a review from aliubimov November 28, 2021 08:46

AbhinavTuli added 2 commits November 28, 2021 14:40

mypy fix

f86d832

cleanup

81379a6

alphazero reviewed Nov 29, 2021

alphazero approved these changes Nov 29, 2021

AbhinavTuli merged commit 1fb3250 into main Nov 29, 2021

AbhinavTuli deleted the vc/diff branch November 29, 2021 15:43
