Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new clean, convert, export commands #30

Merged
merged 25 commits into from
Jul 6, 2020
Merged

Conversation

jayqi
Copy link
Member

@jayqi jayqi commented Jun 25, 2020

Added new commands to the CLI. The summary of the commands are as follows:

  • clean: Removes files that don't match what nbautoexport would create based on current notebooks and current config. For example, the use case is removing stuff like Untitled.py to clean up after stuff from renamed or deleted notebooks.
  • convert: This runs conversion with the configuration provided by CLI arguments (no .nbautoexport file needed).
  • export: This runs conversion on a folder with a .nbautoexport configuration file or with override CLI options.
  • install: The old main command of creating a .nbautoexport configuration file.

Additional changes:

  • Adding an clean configuration option to automatically run the cleaning as part of post_save. Defaults to False.
  • Add rst as an export format
  • Add __main__.py to support for calling with python -m. This requires setting typer>=0.3.0.

Additional refactoring:

  • Split source code into multiple modules
  • Refactored export functionality into new function export.export_notebook to better support code reuse for convert and export commands.
  • Refactored all instances of os.path to use pathlib
  • Refactored .nbautoexport config to use pydantic model, with strict validation on fields

How stuff works:

  • How clean, convert, and export know what files are notebooks:
    • New function utils.find_notebook will attempt to load every file in given directory with nbformat.
  • How clean knows what files to expect:
    • New function clean.notebook_exports_generator holds logic for predicting exported files.
    • New function clean.get_extension grabs nbconvert exporters to figure out appropriate file extension

Fixed some bugs:

  • The asciidoc, latex, markdown, and rst formats create separate directories for image files that weren't getting moved into the subfolders.
  • pdf format resulted in a unicode decode error because the postprocessor was trying to read the pdf file back in as unicode text in order to remove the cell numbers
  • nbconvert grabs CLI arguments from sys.argv, which leads to really weird errors if nbconvert gets initiated through something like pytest or nbautoexport. Previously this was worked around in the tests by monkeypatching sys.argv. Instead, we now use a context manager to temporarily clear sys.argv whenever nbconvert is initialized, which prevents errors no matter if it was nbautoexport, pytest, or jupyter that starts the outer process.

Looking for feedback:

  • Are the names of the commands clear?
  • Right now we only support organize_by of notebooks and extensions. Wondering if it would make sense to support None, i.e., keep converted output in parent directory and not in subfolders. Opened Proposal: Add support for having no subfolders #32

TODO:

  • Update unit tests to work with refactor
  • Add new unit tests for new commands

Known issues, thinking will save these for the future:

  • Determining appropriate file extension to get expected export files is currently clunky for script. nbconvert actually determines the language from loading the .ipynb file and then dispatches to the specific language exporter. We may want to do the same thing. ended up implementing
  • I don't actually know how to get the exporters for languages besides Python (expecting R and Julia?). The nbconvert library only directly supports python. We'll have to figure out how it knows about others. Currently this will return .txt. Ran this against an R notebook I had, and it looks up the .r from the notebook metadata under file_extension.
  • The expected export files functionality does not support the subdirectories that asciidoc, markdown, and rst create for images, so clean will remove them. This may require figuring out naming conventions, and/or refactoring the expected_exports method to return globs instead of Paths. ended up implementing
  • Functionality that searches for notebooks (clean, export, convert) currently just glob for *.ipynb files. This may miss stuff, and we should potentially instead try checking any file with nbformat instead. ended up implementing

Note: I tried to get the root of the CLI to point to install but it doesn't work, because other subcommands will conflict with arguments.

Resolves #20, #21

pytest.ini Outdated Show resolved Hide resolved
@jayqi jayqi changed the title DRAFT: Add new clean, convert, export commands Add new clean, convert, export commands Jun 28, 2020
@jayqi jayqi requested review from r-b-g-b and pjbull June 28, 2020 00:26
@jayqi
Copy link
Member Author

jayqi commented Jun 28, 2020

@pjbull @r-b-g-b this PR is ready.

Apologies that it is quite hefty, but the new functionality I wanted to add really required some significant refactoring. Also a lot of it is tests! This PR is at 99% test coverage.

@jayqi jayqi marked this pull request as ready for review June 28, 2020 00:28
nbautoexport/clean.py Outdated Show resolved Hide resolved
nbautoexport/nbautoexport.py Show resolved Hide resolved
nbautoexport/nbautoexport.py Outdated Show resolved Hide resolved
nbautoexport/nbautoexport.py Outdated Show resolved Hide resolved
Copy link
Member

@pjbull pjbull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our discussion, combine the export and convert commands.

@jayqi jayqi force-pushed the additional-commands branch 11 times, most recently from a751747 to f847b05 Compare July 2, 2020 06:48
@jayqi jayqi force-pushed the additional-commands branch 2 times, most recently from 269f44b to 093c1d9 Compare July 2, 2020 07:10
@codecov
Copy link

codecov bot commented Jul 2, 2020

Codecov Report

Merging #30 into master will increase coverage by 3.58%.
The diff coverage is 98.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #30      +/-   ##
==========================================
+ Coverage   94.73%   98.31%   +3.58%     
==========================================
  Files           2        8       +6     
  Lines         152      357     +205     
==========================================
+ Hits          144      351     +207     
+ Misses          8        6       -2     
Impacted Files Coverage Δ
nbautoexport/export.py 95.52% <95.52%> (ø)
nbautoexport/jupyter_config.py 96.36% <96.36%> (ø)
nbautoexport/__init__.py 100.00% <100.00%> (ø)
nbautoexport/__main__.py 100.00% <100.00%> (ø)
nbautoexport/clean.py 100.00% <100.00%> (ø)
nbautoexport/nbautoexport.py 98.91% <100.00%> (+4.35%) ⬆️
nbautoexport/sentinel.py 100.00% <100.00%> (ø)
nbautoexport/utils.py 100.00% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31a9d6f...a44bae0. Read the comment docs.

@jayqi
Copy link
Member Author

jayqi commented Jul 2, 2020

@pjbull @ejm714 @r-b-g-b

Okay, I think this should be ready to go.

  • Merged the export and convert commands. This is just named export.
  • Using miniconda in the CI now, so that we can install pandoc. Works great across all three OSes.
  • I had to remove pdf from the unit tests because unfortunately it requires xelatex, and I couldn't figure out a way to easily install it in the CI. Will open a new issue about figuring this out. Opened Install LaTeX in CI to enable testing PDF export format #35. Ultimately, testing pdf had helped me catch a bug last week, so I would like to have it tested.

@pjbull pjbull merged commit d55f414 into master Jul 6, 2020
@pjbull pjbull deleted the additional-commands branch July 6, 2020 23:17
This was referenced Jul 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean command
2 participants