New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unit tests #68
Unit tests #68
Conversation
5099d4b
to
7ce0786
Compare
So I see you added some images, directly to a repo... My approach was to put them to a public Google Drive (because it has faster download speed)(I hope), and download them on-the-go in GitHub Actions (since their vms have very high internet speeds) - so we don't force everyone who just wants to git clone this repo to download 50 images+videos Alternatively, we could put them in separate repo, if GDrive didn't suite us... That's why I put @Akianonymus script to download stuff from google drive in
Seeing as many edge cases as I seen, I would prefer to keep actual "test images" as close as what you get directly from Google Takeout...
Here they are:
|
Mmhh.. I see your point. I haven't thought much about videos yet 🤔 I directly added the photos to generate a strong link between test data and code. If a new, unhandled edge case case is found, the code that fixes it should be added together with the new test data in the same PR. We wouldn't want to add it to the test set before the fix is merged because that would break the CI for the current implementation. We also don't want to add it after the fix, because we want the CI to tell us that the PR indeed fixes the new test case. We could of course reference different files on Google Drive like TL;DR: Git > GDrive IMO I quickly looked into LFS, but that seems too costly. I like your suggestion of a separate git repo 👍 We could add that as Git submodule and have the benefit of tight coupling without the downside that every user needs to download large amounts of data. Contributing fellow developers probably know how to handle git submodules and we could also offer convenience scripts to avoid some of the common pitfalls. If you agree, do you mind creating a repo?
Can you give an example of where the result depended on the actual image data? |
cedba84
to
d09d424
Compare
d09d424
to
8d4fd9d
Compare
I went ahead with the described approach. If you agree we should transfer https://github.com/sebastianludwig/GooglePhotosTakeoutHelperTestData to you @TheLastGimbus or simply re-create that repo and update the reference. |
Whoa... that's some big changes...
I can't specifically pin point a one, but I'm too afraid to try to mess with them 😬 I can't promise any timeline to merge this - I will need to learn some of these myself |
@@ -0,0 +1,3 @@ | |||
[submodule "tests/integration/fixtures"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, you look like a guy who knows submodules (I dont 😆 ) - could you show me how to add GDrive scripts as a submodule? And how to use them after that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this to be a decent introduction https://www.atlassian.com/git/tutorials/git-submodule
Git submodules sound awesome when you learn about them the first time, but be cautioned: they have a few rough edges and most of the time there's a better choice. For example having the GDrive scripts directly in this repo doesn't hurt, I think. Or a completely different repo which is only linked in the README
if you want to decouple them from this repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, last time I tried, they very very problematic (because of course I didn't read any tutorials 💯)
Huh - could we actually remove a test repo from submodules too, and just tell people to git clone
it? We could just add a tag there, so it would be nicely versioned when CI/tests change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could. But I don't think we'd gain anything. The test code by itself is very small and wont' impact the repo size noticeably.
On the other hand having tests and implementation split over two repos and thus two PRs makes reviewing a lot harder. Often PRs change both (implementation changes, tests added) and not being able to discuss both in the same place is going to be cumbersome and annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No no, test repo meaning repo with test images
@@ -753,4 +751,4 @@ def _walk_with_tqdm(res, bar: _tqdm): | |||
|
|||
|
|||
if __name__ == '__main__': | |||
main() | |||
main(sys.argv[1:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be able to call main
from the tests and pass "command line args" as parameters.
['0th', piexif.ImageIFD.DateTime], | ||
['Exif', piexif.ExifIFD.DateTimeOriginal], | ||
['Exif', piexif.ExifIFD.DateTimeDigitized], | ||
['GPS', piexif.GPSIFD.GPSVersionID], | ||
['GPS', piexif.GPSIFD.GPSLatitude], | ||
['GPS', piexif.GPSIFD.GPSLatitudeRef], | ||
['GPS', piexif.GPSIFD.GPSLongitude], | ||
['GPS', piexif.GPSIFD.GPSLongitudeRef], | ||
['GPS', piexif.GPSIFD.GPSAltitude], | ||
['GPS', piexif.GPSIFD.GPSAltitudeRef] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you added a lot of piexif
stuff - I'm worried about this, because I wanted to start transitioning to other library - because this one is pretty dead - and now I have more stuff to make me procrastinate this decision, and more stuff that can break :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you already have one in mind? If you haven't decided yet, I don't think that should block the merge of this PR. On the contrary, it makes sense to have a testing rig in place when swapping out integral parts to be sure everything keeps working. Also the unit tests aren't coupled to the implementation: we can replace the EXIF lib in the implementation and keep the tests on piexif
for a while and see how it goes. Or the other way round.
Also I contain all the piexif
stuff in this one method - I can also move the import down here if you'd like that to be more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exif looks nicer and actually maintained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the following plan:
- Get the test setup in place
- Add a couple of more tests to increase the confidence
- Switch the tests over to
exif
- in a separate PR so the change is isolated and it can easily be judged if the switch broke anything - After gaining some experience with
exif
switch over theimplementation
or try another lib in the tests
?
I hear you. What would you need to be comfortable with the cleaning approach? More theoretical reasoning why I'm convinced this won't cause issues, more practical proofs like the steps described in contirbuting.md, anything else..? I'm happy to elaborate
No worries, feel free to ask about anything. |
I just don't think it's necessary. We just need few pictures to test if dates are set right, if TL;DR I just really don't think it's necessary |
Ah, yes, the deduplication. Completely haven't thought about that. Yes, the image contents do make a difference there. Mmhh..I still think it makes sense to test and handle the different aspects of the tool differently to 1) give privacy conscious people a way of comfortably share their test data and 2) not waste space in the test data repo unnecessarily. So, suggestion: use cleaned images for EXIF data tests, use full images for deduplication tests. I mean, with the process outlined in contributing.md we will find out if the cleaning causes any problems (the second point of step 7 would fail). I don't think there's anything to lose, only to win.
You just met the first one ;-) |
Okay, I though a bit about this, and here is my vision/manifest for this:This script is generally one big dump of code. It's not modular, or easy to divide into pices - that's why, when testing it, I think we should just run it like we run it from cmd - that way it won't require modifing the The Actions workflow will look like this:
For the properties itself... we can make a
etc... Then, just loop through it with python test and we are good - that way we can easilty add new edge-case photos and their expected values What do you think, @sebastianludwig? As for test data itself, yeah, I created test data repo, and let's just keep everything there - just don't upload any 2gb videos there |
That's definitely a valid approach. Before I form an opinion, can you share your thoughts on why you think this is the way forward and the benefits you see compared to the approach proposed in this PR? |
What you did so far:
TL;DR this will be pretty simillar, just less pythonic and with one abstraction layer more (that is, |
Not completely. In case of failures we wouldn't get stacktraces but only console logs. Also debugging wouldn't be possible, because the helper would be running in a different process. I'm not convinced stylish reasons are worth these tangible darwbacks.
Abstractions all come with a cost of complexity, understandability and maintainability. The JSON files for example need to be maintained separately, and separate parsing logic is necessary. Also they would only check the aspects explicitly encoded into the JSON files instead of all possible aspects we could check from reference images, effectively reducing the code/case coverage. If you see this explicitness as benefit, I think it'd still be better to write the asserts directly into the unit tests instead of JSON files. If we'd have to write the expected values by hand anyway, why introduce additional indirection and probably sacrifice error explicitness (for multiple images failing assertions we'd only have the error message all originating in the same assert statement). The CI integration and the fancy green checkmarks are icing on the cake and everything is possible wich each setup. So the UI shouldn't inform your decision on the technical foundation. Leaving the "this is how it could work as well" level, I'm still curious what you think why this different approach is better. Like what problems of the proposed solution does it solve, which downsides does it improve on? |
But we got those super sweat logs from Here is my backstory - I never wrote any tests before, in any language. I want to learn it myself from the begining, so I went with ones that are purest and simplest - just
I really have no good reason. I'm just experimenting, and I don't know if any of those ideas are good. Maybe jsons will be a pain, but I want to try it - this repo is a nice sandbox where I can play with shovel and rakes |
Yeah, no worries. Glad I was of any help. Good luck with this project 🚀 |
The goal of this PR is to establish an integration test baseline to build upon in the future. The foundation to confidently move forward with future development is to avoid regressions. Since the whole tool can be thought of as a function with input parameters, deterministic behaviour and a result, it makes sense to assert the correctness on this level first before caring more about the internals. As long as this works, the rest doesn't matter too much.
The rough plan is:
input
), containing examples for all cases (including weird edge cases).output
).reference
).Opening this PR as early draft to discuss the approach.
The next steps required to start reaping the benefits of this are
Future steps would be to integrate this into some kind of CI, but even without it's already very helpful as a quick test run on a local checkout is easy.
@TheLastGimbus Regarding the second point you mentioned that you already collected test data from other users. Since sending them over to you and having them permanently stored in a public repo is still something different I also added a script to remove the actual image data from the files while preserving the EXIF metadata. It would be great if you could clean and provide what you have collected, that would probably already be a very solid test data set. The files I added so far are only to demonstrate the approach.