
Make large files great again #191

Closed
camjac251 opened this issue May 28, 2020 · 10 comments
@camjac251

I was in the process of stress testing this app before making the switch but I seem to have run into some problems with large files.

I don't expect to upload files this large, but I tried uploading a 23GB mkv file through my admin ShareX config. This was directly to my publicip:9999, no nginx in between. Monitoring RAM and CPU, the CPU spiked a few times throughout the upload, but RAM held steady at around 230 MB total usage and wouldn't go up, which is great. I watched the file grow in size in /uploads and started timing how long it would take from the upload finishing to it giving me the URL. It sat with the upload done but still waiting to return a URL for about a minute and 30 seconds. After that, the upload started over from the beginning, although in my uploads folder the old file remained and a new file was being written for the new transfer.

I cancelled it at that point, since it would probably just keep looping. I tested beforehand with a one-gigabyte file and it worked all right (there was some delay between the upload finishing and the URL being generated). In my config I have maxSize: '150000MB' and noJsMaxSize: '100GB'; the other main options are pretty much default.

Is there a way I could give you helpful debug logs for these transfers? I'm worried I might be hitting a bottleneck in Node's architecture.

@BobbyWibowo
Owner

BobbyWibowo commented May 28, 2020

The intensive processes I can think of that the Node process would run on an upload after it has fully uploaded are file hashing and virus scanning. Thumbnails are created in the background, so no waiting there.
You didn't mention virus scanning, so it's most likely the file hashing then (FYI, it's there for duplicate avoidance).

Try to comment out:

const hash = await new Promise((resolve, reject) => {
  const result = crypto.createHash('md5')
  fs.createReadStream(info.path)
    .on('error', error => reject(error))
    .on('end', () => resolve(result.digest('hex')))
    .on('data', data => result.update(data, 'utf8'))
})

// Check if the file exists by checking its hash and size
const dbFile = await db.table('files')
  .where(function () {
    if (user === undefined)
      this.whereNull('userid')
    else
      this.where('userid', user.id)
  })
  .where({
    hash,
    size: info.data.size
  })
  // Select expirydate to display expiration date of existing files as well
  .select('name', 'expirydate')
  .first()

if (dbFile) {
  // Continue even when encountering errors
  await utils.unlinkFile(info.data.filename).catch(logger.error)
  // logger.log(`Unlinked ${info.data.filename} since a duplicate named ${dbFile.name} exists`)

  // If on /nojs route, append original file name reported by client
  if (req.path === '/nojs')
    dbFile.original = info.data.originalname

  exists.push(dbFile)
  return
}

Then add a new line below them: const hash = null.

If that was indeed the bottleneck, I'll add a config option to completely disable the feature I guess. Can't really make that any quicker.
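
Roughly, such a toggle could look like the following; config.uploads.hash is a hypothetical flag name used here for illustration only, not an existing option:

let hash = null
// Hypothetical flag; the actual option name, if one gets added, may differ
if (config.uploads.hash !== false) {
  hash = await new Promise((resolve, reject) => {
    const result = crypto.createHash('md5')
    fs.createReadStream(info.path)
      .on('error', error => reject(error))
      .on('end', () => resolve(result.digest('hex')))
      .on('data', data => result.update(data, 'utf8'))
  })
}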

@BobbyWibowo
Owner

After that, the upload started over from the beginning, although in my uploads folder the old file remained and a new file was being written for the new transfer.

Can't think of why this would happen, though.
If the client attempts to re-upload, it has to be something from the client itself. But I don't recall the homepage uploader having a feature to do that.

@camjac251
Author

Could the internal webserver have a timeout set?

I commented it out like this

self.storeFilesToDb = async (req, res, user, infoMap) => {
  const files = []
  const exists = []
  const albumids = []

  await Promise.all(infoMap.map(async info => {
    // Create hash of the file
/*    const hash = await new Promise((resolve, reject) => {
      const result = crypto.createHash('md5')
      fs.createReadStream(info.path)
        .on('error', error => reject(error))
        .on('end', () => resolve(result.digest('hex')))
        .on('data', data => result.update(data, 'utf8'))
    })

    // Check if the file exists by checking its hash and size
    const dbFile = await db.table('files')
      .where(function () {
        if (user === undefined)
          this.whereNull('userid')
        else
          this.where('userid', user.id)
      })
      .where({
        hash,
        size: info.data.size
      })
      // Select expirydate to display expiration date of existing files as well
      .select('name', 'expirydate')
      .first()

    if (dbFile) {
      // Continue even when encountering errors
      await utils.unlinkFile(info.data.filename).catch(logger.error)
      // logger.log(`Unlinked ${info.data.filename} since a duplicate named ${dbFile.name} exists`)

      // If on /nojs route, append original file name reported by client
      if (req.path === '/nojs')
        dbFile.original = info.data.originalname

      exists.push(dbFile)
      return
    }*/

    const timestamp = Math.floor(Date.now() / 1000)
    const data = {

Running the transfer again now

@BobbyWibowo
Owner

You'd have to add the const hash = null line, as hash is referenced further down when inserting into the DB.

Other than the timeout between Nginx and the lolisafe service, I'm not aware of anywhere else one is set.
Since Express sits on top of the Node.js HTTP API, its default timeout presumably applies as well: https://nodejs.org/docs/latest-v12.x/api/http.html#http_server_timeout. Not sure if it's that kind of timeout, but I don't recall it being changed anywhere, so it should still be the default 2 minutes.
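
For reference, that timeout can be raised or disabled on the http.Server that Express's listen() returns; a minimal sketch, with the port purely illustrative:

const express = require('express')

const app = express()
// app.listen() returns the underlying Node.js http.Server
const server = app.listen(9999)

// The inactivity timeout defaults to 2 minutes on Node v12;
// 0 disables it entirely, otherwise pass a value in milliseconds
server.setTimeout(0)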

@camjac251
Author

camjac251 commented May 28, 2020

That was it. It gave me the link instantly after the upload finished. Nothing I've tried has been this fast before.

Could there be a faster checksum method available for duplicate detection in Node?
There are a few BLAKE3 implementations here: https://www.npmjs.com/search?q=blake3
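
For illustration, a streaming BLAKE3 hash could slot in where the MD5 block sits. This is a minimal sketch, assuming the npm blake3 package exposes createHash() with update()/digest() mirroring Node's crypto module; the packages in that search differ in their exact APIs:

const blake3 = require('blake3')
const fs = require('fs')

// Drop-in for the MD5 block inside storeFilesToDb; `info.path` comes
// from the surrounding upload handler, as in the snippet above
const hash = await new Promise((resolve, reject) => {
  const hasher = blake3.createHash()
  fs.createReadStream(info.path)
    .on('error', error => reject(error))
    .on('data', data => hasher.update(data))
    .on('end', () => resolve(hasher.digest('hex')))
})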

@uhthomas

Just a suggestion: have you considered using a multi-writer for uploads? Rather than treating uploading and hashing as two separate steps, they can be done at the same time. This is how kipp handles file uploads.

As @camjac251 suggested, it might also be a good idea to use BLAKE3. MD5 should never, ever be used anymore.
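
To sketch the idea in Node terms (not how lolisafe currently handles it): pipe the incoming stream to the destination file and feed the same chunks to a hasher, so the digest is ready the moment the write finishes. The helper name and arguments here are illustrative:

const crypto = require('crypto')
const fs = require('fs')

// Illustrative helper, not part of lolisafe: `source` is the incoming
// file stream from the multipart parser, `destPath` the final path
// under the uploads directory.
function writeAndHash (source, destPath) {
  return new Promise((resolve, reject) => {
    const hasher = crypto.createHash('md5') // or a BLAKE3 hasher, per the suggestion above
    const out = fs.createWriteStream(destPath)

    source.on('data', chunk => hasher.update(chunk))
    source.on('error', reject)
    out.on('error', reject)
    // 'finish' fires once everything has been flushed to disk,
    // at which point the digest is already complete
    out.on('finish', () => resolve(hasher.digest('hex')))

    source.pipe(out)
  })
}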

@BobbyWibowo
Owner

Just a suggestion: have you considered using a multi-writer for uploads? Rather than treating uploading and hashing as two separate steps, they can be done at the same time. This is how kipp handles file uploads.

Frankly, I hadn't given that any thought before this issue. I also thought of something like that while this was going on, but I was sure Multer (the lib we currently use to parse multipart data) didn't have a stream-based API to hook into.
I just gave it another look, and indeed the stream-based API is only in RC versions at the moment. I can probably give that a try as-is, but eh, dunno.
There are also some solutions that involve writing our own Multer storage engine, such as this one. Probably a better choice for now.
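
For reference, a custom storage engine is just an object implementing _handleFile and _removeFile. A rough skeleton that hashes while writing; the class name, file naming scheme, and MD5 choice here are illustrative, not what lolisafe ended up shipping:

const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

class HashedDiskStorage {
  constructor (opts) {
    this.destination = opts.destination
  }

  // Multer calls _handleFile with the parsed file; file.stream is the
  // incoming data, and the extra properties passed to cb() end up on req.file
  _handleFile (req, file, cb) {
    const filename = `${Date.now()}${path.extname(file.originalname)}`
    const finalPath = path.join(this.destination, filename)

    const hasher = crypto.createHash('md5') // or a BLAKE3 hasher
    const outStream = fs.createWriteStream(finalPath)

    file.stream.on('data', chunk => hasher.update(chunk))
    file.stream.pipe(outStream)

    outStream.on('error', cb)
    outStream.on('finish', () => {
      cb(null, {
        filename,
        path: finalPath,
        size: outStream.bytesWritten,
        hash: hasher.digest('hex')
      })
    })
  }

  _removeFile (req, file, cb) {
    fs.unlink(file.path, cb)
  }
}

module.exports = opts => new HashedDiskStorage(opts)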

There are a few BLAKE3 implementations here: https://www.npmjs.com/search?q=blake3

As @camjac251 suggested, it might also be a good idea to use BLAKE3. MD5 should never, ever be used anymore.

Aight, that sounds good to me as well.

@BobbyWibowo BobbyWibowo added the enhancement New feature or request label May 28, 2020
@BobbyWibowo BobbyWibowo changed the title [BUG] Large files [BUG] Make large files great again May 28, 2020
@BobbyWibowo BobbyWibowo added this to In progress in Mainline May 28, 2020
@BobbyWibowo BobbyWibowo self-assigned this May 28, 2020
@BobbyWibowo BobbyWibowo changed the title [BUG] Make large files great again Make large files great again May 28, 2020
@BobbyWibowo
Owner

@camjac251 try 62a9775

@camjac251
Author

Oh wow. I'll try it out right now. Thank you so much

@camjac251
Author

This is incredible, thank you for adding this. It was almost instant from when the 23GB file finished uploading to when it gave me the link. So much faster now.

Mainline automation moved this from In progress to Done May 29, 2020