OutOfMemoryError downloading file from URL with trailing slash #2839

memo33 · 2023-09-11T18:54:06Z

Coursier runs out of memory when downloading a large file (750 MB) from a URL that ends with a slash (such as example.org/foo/).

The problem is in Downloader#doTouchCheckFile, which assumes that the downloaded file is a directory listing and reads its entire content into memory as a String.

coursier/modules/cache/jvm/src/main/scala/coursier/cache/internal/Downloader.scala

Lines 160 to 167 in ef2b102

    
    if (updateLinks && file.getName == ".directory") {
      val linkFile = FileCache.auxiliaryFile(file, "links")
      val succeeded =
        try {
          val content =
            WebPage.listElements(url, new String(Files.readAllBytes(file.toPath), UTF_8))
              .mkString("\n")

This could potentially be fixed by reading a file input stream instead and passing it to WebPage#listElements. On the JVM, HTML parsing is done using Jsoup, which does have support for input streams, and I assume, on binary data, it would throw a parse error early on. What would be needed for the JS backend, though?
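As a parser-independent alternative to the streaming idea, memory use could also be bounded by refusing to load files that are too large to plausibly be a directory listing. The following is only a minimal sketch of that guard, not coursier's code: the object name `BoundedListingRead`, the method `readListing`, and the 1 MiB cap are all hypothetical, and it uses only the JDK, so it says nothing about how `WebPage.listElements` or the JS backend would need to change.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

object BoundedListingRead {
  // Hypothetical cap for a directory listing; not a value taken from coursier.
  val DefaultMaxBytes: Long = 1024L * 1024L

  /** Returns the file content decoded as UTF-8 if the file is at most
    * `maxBytes` long, or None if it is larger. The size is checked before
    * any content is read, so an oversized file is never loaded into memory.
    */
  def readListing(path: Path, maxBytes: Long = DefaultMaxBytes): Option[String] =
    if (Files.size(path) > maxBytes) None
    else Some(new String(Files.readAllBytes(path), UTF_8))
}
```

With a guard like this, a caller could skip link extraction when `readListing` returns `None` rather than attempting to parse a 750 MB download as HTML.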
