Skip to content

Fix Doc2Text::Odt::Document.parse_and_save when using tmp files as input #25

Closed
alecslupu wants to merge 5 commits into
bostko:masterfrom
alecslupu:fix/temporary-path
Closed

Fix Doc2Text::Odt::Document.parse_and_save when using tmp files as input #25
alecslupu wants to merge 5 commits into
bostko:masterfrom
alecslupu:fix/temporary-path

Conversation

@alecslupu
Copy link
Copy Markdown
Contributor

In decidim we have the following snippet:

      def transform_to_md_file(doc_file)
        md_file = Tempfile.new
        Doc2Text::Odt::Document.parse_and_save doc_file, md_file
        md_file
      end

inside the parse_and_save, the input is /tmp/doc-to-markdown-odt20260509-18225-lmbxu2, and output_filename is /tmp/20260509-18225-dtzzo3.

Running the spec, on my end, i get an error like:

  1) Decidim::Proposals::Admin::ImportParticipatoryText call when the form is valid with odt document behaves like import participatory_text succeeds broadcasts ok and creates the proposals
     Failure/Error: Doc2Text::Odt::Document.parse_and_save doc_file.path, md_file.path
     
     Errno::ENOENT:
       No such file or directory @ rb_sysopen - /home/myuser/Sites/decidim/develop/decidim-proposals/tmp/.doc-to-markdown-odt20260509-18225-lmbxu2_unpacked_odt/mimetype
     Shared Example Group: "import participatory_text succeeds" called from ./spec/commands/decidim/proposals/admin/import_participatory_text_spec.rb:93
     # ./lib/decidim/proposals/odt_to_markdown.rb:41:in 'Decidim::Proposals::OdtToMarkdown#transform_to_md_file'
     # ./lib/decidim/proposals/odt_to_markdown.rb:23:in 'Decidim::Proposals::OdtToMarkdown#to_md'
     # ./lib/decidim/proposals/doc_to_markdown.rb:42:in 'Decidim::Proposals::DocToMarkdown#to_md'
     # ./app/commands/decidim/proposals/admin/import_participatory_text.rb:48:in 'Decidim::Proposals::Admin::ImportParticipatoryText#parse_participatory_text_doc'
     # ./app/commands/decidim/proposals/admin/import_participatory_text.rb:27:in 'block in Decidim::Proposals::Admin::ImportParticipatoryText#call'
     # /home/alecslupu/Sites/decidim/develop/decidim-core/lib/decidim/command.rb:31:in 'Decidim::Command#transaction'
     # ./app/commands/decidim/proposals/admin/import_participatory_text.rb:25:in 'Decidim::Proposals::Admin::ImportParticipatoryText#call'
     # ./spec/commands/decidim/proposals/admin/import_participatory_text_spec.rb:43:in 'block (5 levels) in <module:Admin>'
     # ./spec/commands/decidim/proposals/admin/import_participatory_text_spec.rb:43:in 'block (4 levels) in <module:Admin>'

From the stack trace, I could identify that the spec (running with --backtrace) is actually failing in the unpack method of Doc2Text::XmlBasedDocument::DocumentFile.

The line zip_file.extract entry, zipped_file_extract_path expects a third named argument destination_directory:, which has a default of .. This means that when you pass a file having the full path (ex a temp file), the destination_directory becomes cwd + original path, leading to strange path.

I have managed to fix on my local, using the attached snippet.

Copilot AI review requested due to automatic review settings May 9, 2026 20:35
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix ODT unpacking/parsing when the input document is provided via a temporary file path (e.g., Tempfile), addressing failures caused by how rubyzip resolves extraction destinations for absolute paths.

Changes:

  • Adjust ODT (and other XML-based) unpacking to extract into a temporary directory rather than alongside the input file.
  • Modify rubyzip extraction invocation to account for destination directory resolution.
  • Add a spec case that exercises unpacking from a temporary file.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
lib/doc2text/xml_based_document_file.rb Changes extraction target directory strategy and alters how zip entries are extracted.
spec/unpack_odt_spec.rb Adds a spec for unpacking an ODT from a Tempfile.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/doc2text/xml_based_document_file.rb
Comment thread lib/doc2text/xml_based_document_file.rb Outdated
Comment thread lib/doc2text/xml_based_document_file.rb
Comment thread spec/unpack_odt_spec.rb
Comment thread spec/unpack_odt_spec.rb
Comment thread spec/unpack_odt_spec.rb
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

spec/unpack_odt_spec.rb:26

  • The fixture being copied is a ZIP/binary file. Using File.read/tempfile.write without binary mode can corrupt the bytes on Windows due to newline translation. Prefer File.binread and tempfile.binmode (or open the Tempfile in binary) when copying the .odt bytes.
      tempfile = Tempfile.new('text_styles.odt')
      tempfile.write File.read(File.join 'spec', 'fixtures', 'text_styles.odt')
      tempfile.rewind
      tempfile.close

spec/unpack_odt_spec.rb:27

  • Cleanup here isn't protected if an expectation fails (e.g., tempfile.unlink / @odt.clean won't run), which can leave temp files/dirs behind and make runs flaky. Wrap the example in begin/ensure and ensure both the tempfile and @odt.clean are executed in the ensure block (e.g., via tempfile.close!).
      tempfile = Tempfile.new('text_styles.odt')
      tempfile.write File.read(File.join 'spec', 'fixtures', 'text_styles.odt')
      tempfile.rewind
      tempfile.close

Comment on lines 14 to +17
zip_file.each do |entry|
zipped_file_extract_path = File.join extract_path, entry.name
FileUtils.mkdir_p File.dirname(zipped_file_extract_path)
zip_file.extract entry, zipped_file_extract_path
zip_file.extract entry, entry.name, destination_directory: extract_path
Comment thread spec/unpack_odt_spec.rb
Comment on lines +21 to +24
context "when the odt is a temporary file" do
it "runs from a temp file" do
tempfile = Tempfile.new('text_styles.odt')
tempfile.write File.read(File.join 'spec', 'fixtures', 'text_styles.odt')
@bostko
Copy link
Copy Markdown
Owner

bostko commented May 13, 2026

See #26

@bostko bostko closed this May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants