-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Swap sqlite cursor with dictionary and set data structures #24
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty close, just a few minor changes. Mostly blocking on the exception handling.
src/bitrot.py
Outdated
) | ||
|
||
return renamed | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use bare except:
, specify which exceptions you expect. That would also let me understand why you need to wrap this block in a try:except:
in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception was for hashes_dict_db[new_sha1], as new_sha1 could be a new one sha1 indeed. ;)
I'm sure this block can be reworked in something simpler/cleaner.
tests/test-bitrot.bats
Outdated
|
||
#WARNING! | ||
#Be careful don't use ((, cause (( $status == pp )) && echo Really WRONG! | ||
#the issue is that (( 0 == letters )) is always true ... :( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment. "Don't use ((", so what should I use instead? You're using "((" in lines 29, 45, and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the default header for my bats tests.
If you have the chance of having a typo with (( $status == 0 )) and swap the 0 with a O or whatever string it will be always true.
status=0; (( $status == pp )) && echo Really WRONG!
It's just another bash "bug" that I need to report some day (they will think on it as a feature...).
PD: (( )) in bash script is for testing integers, so I reckon strings decay to 0 in bash (( )).
src/bitrot.py
Outdated
@@ -180,12 +179,13 @@ def run(self): | |||
errors = [] | |||
current_size = 0 | |||
missing_paths = self.select_all_paths(cur) | |||
paths, total_size = list_existing_paths( | |||
hashes_dict_db = self.select_all_hashes(cur) | |||
paths_fs, total_size = list_existing_paths( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep the paths
name as it was before? I don't see the value of changing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the suffixes to differentiate between fs and db info. It was clearer for me to read hashes_db than hashes (and have to remember everytime that they came from the DB), the same reason for paths (do they come from the DB or the FS, ummm? ).
src/bitrot.py
Outdated
b'.', expected=missing_paths, ignored={bitrot_db, bitrot_sha512}, | ||
follow_links=self.follow_links, | ||
) | ||
|
||
for p in paths: | ||
for p in paths_fs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to wrap this in sorted()
for reproducible results if it doesn't affect the execution speed too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool.
I'll try it, reproducible results are really important. ;)
The first thing I tried was using just bisect.bisect_left with the former sorted path, but O(1) beats O(log n) really hard, so I opted for using set rather than list. ;)
src/bitrot.py
Outdated
chash = '' | ||
while row: | ||
rhash, rpath = row | ||
if chash != rhash: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's simpler if you rather do:
result.setdefault(rhash, set()).add(rpath)
Then you don't need to order the SELECT which should be even faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice syntantic sugar I didn't know about setdefault. :)
I'll try it.
src/bitrot.py
Outdated
try: # if doesn't exist the fs path of the database | ||
found = [path for path in hashes_dict_db[new_sha1] if path not in paths_fs] | ||
renamed = found.pop() | ||
if renamed : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Unnecessary space before the colon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
And what about the found.pop() could I use something different?. I don't know why but I found it odd.
src/bitrot.py
Outdated
@@ -180,12 +179,13 @@ def run(self): | |||
errors = [] | |||
current_size = 0 | |||
missing_paths = self.select_all_paths(cur) | |||
paths, total_size = list_existing_paths( | |||
hashes_dict_db = self.select_all_hashes(cur) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the name hashes
would be simpler and contains the same information.
src/bitrot.py
Outdated
|
||
# update the path in the database | ||
try: # if doesn't exist the fs path of the database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm. :S
What if I reword it as:
"If the fs path doesn't exist in the database yet" (/native grammar mode off)
Or just
"If the path isn't in the dabatase"
*Remove logging of test
Thanks! ✨ 🍰 ✨ |
-paths_fs (set) contains all the paths(files) in the actual filesystem
-hashes_dict_db (dictionary) substitute the sqlite query with dict[hash] = set(db paths)
See #23 for details.