@ehanson8
Contributor

What does this PR do?

Adds functions to build a file list from a local file system and from a web directory. These functions are the first components of a workflow to ingest new content into DSpace. I haven't figured out how I want to test the local file system function yet, but there is a test for the web directory function.

Includes new or updated dependencies?

YES

@ehanson8
Contributor Author

Added a test for the local file list function

dsaps/models.py Outdated
for root, dirs, files in os.walk(directory, topdown=True):
    for file in files:
        if file.endswith(file_extension):
            full_file_path = os.path.join(root, file).replace('\\', '/')

What's the reason for using the replace here? I'm assuming it has something to do with Windows, but I'd like to better understand the requirement this is solving.

Contributor Author

I think this is legacy code from my last job that I copied without thinking. It doesn't seem necessary, so I'll remove it.
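For context on what the replace call does: on Windows, os.path.join produces backslash-separated paths, and the replace rewrites them with forward slashes. The sketch below only illustrates that behaviour; the path is a made-up example, and pathlib's as_posix() is shown as one possible alternative, not something the PR uses.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows-style path, as os.path.join would build it on Windows
windows_style = "C:\\files\\report.pdf"

# What the reviewed line does: swap every backslash for a forward slash
replaced = windows_style.replace("\\", "/")

# A pathlib alternative that gives the same result for this path
via_pathlib = PureWindowsPath(windows_style).as_posix()

print(replaced)
print(via_pathlib)
```

On Linux or macOS, os.path.join already uses forward slashes, so the replace would only matter if the code runs on Windows and something downstream needs forward slashes.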

In this case, I would just get rid of this function entirely. The standard library already provides the ability to recursively generate a list of files matching a specific extension:

import glob

files = glob.glob('/path/to/search/**/*.pdf', recursive=True)

If there are potentially going to be lots (thousands) of files, I'd suggest using glob.iglob instead, since it returns an iterator rather than building the whole list at once. Similar functionality is also available in the pathlib module.
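A minimal sketch of the two alternatives mentioned above, assuming the extension is passed with its leading dot (for example '.pdf'). The helper names iter_files_glob and iter_files_pathlib are hypothetical and are not part of dsaps.

```python
import glob
from pathlib import Path


def iter_files_glob(directory, file_extension):
    """Lazily yield path strings under directory that end in file_extension."""
    pattern = str(Path(directory) / "**" / f"*{file_extension}")
    # iglob returns an iterator, so a large tree is not collected into one list
    yield from glob.iglob(pattern, recursive=True)


def iter_files_pathlib(directory, file_extension):
    """Lazily yield Path objects; rglob searches subdirectories recursively."""
    yield from Path(directory).rglob(f"*{file_extension}")
```

The two are not exact equivalents: the glob version yields strings and, by default, skips names that start with a dot, while the pathlib version yields Path objects.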

Contributor Author

Makes sense, I will save this for when I create a click command for the full workflow. Thanks!

@ehanson8 force-pushed the file-list-func branch 2 times, most recently from 907d9e5 to 327d496 on January 6, 2020 18:26
assert '1234' in child_list


def test_build_file_list_local():

If you remove the function that this is testing, this test would obviously no longer be needed. For future reference, though, I'd suggest using one of pytest's temp dir fixtures. I've found a few isolated cases where they don't quite fit, but they will generally be a better option than trying to manage temp files and temp dirs on your own.
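As a rough illustration of the fixture approach, here is a sketch using pytest's built-in tmp_path fixture. The build_file_list_local function below is a hypothetical stand-in written for this example; it is not the implementation from this PR, and the file names are made up.

```python
import glob
import os


def build_file_list_local(directory, file_extension):
    # Hypothetical stand-in for the function under test
    return glob.glob(f"{directory}/**/*{file_extension}", recursive=True)


def test_build_file_list_local(tmp_path):
    # tmp_path is a pathlib.Path pointing at a fresh directory that pytest
    # creates for this test, so there is nothing to set up or delete by hand
    nested = tmp_path / "nested"
    nested.mkdir()
    (tmp_path / "one.pdf").write_text("stub")
    (nested / "two.pdf").write_text("stub")
    (tmp_path / "skip.txt").write_text("stub")

    file_list = build_file_list_local(tmp_path, ".pdf")

    assert {os.path.basename(f) for f in file_list} == {"one.pdf", "two.pdf"}
```

pytest injects tmp_path automatically when a test function lists it as a parameter, so no import of pytest is needed in the test module for this to work.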

Contributor Author

Great, though I hope I don't have to consider too many local file operations in the future.

@ehanson8 merged commit 23cab51 into refactor on Jan 6, 2020
@ehanson8 deleted the file-list-func branch on January 6, 2020 19:19
3 participants