
Make large files great again #191

Closed
camjac251 opened this issue May 28, 2020 · 10 comments
@camjac251

I was in the process of stress testing this app before making the switch but I seem to have run into some problems with large files.

I don't expect to upload files this large, but I tried uploading a 23GB mkv file through my admin ShareX config. This was directly to my publicip:9999, no nginx in between. Monitoring RAM and CPU, the CPU spiked a few times throughout the upload, but RAM held steady at around 230 MB total usage and wouldn't go up, which is great. I watched the file grow in size in /uploads and started timing how long it would take from the upload finishing to it giving me the URL. It sat with the upload done but still waiting to return a URL for about a minute and 30 seconds. After that, the upload started over from the beginning, although in my uploads folder the old file remained and a new file was being written for the new transfer.

I cancelled it at that point, since it would probably just keep looping. I tested beforehand with a one-gigabyte file and it worked all right (there was some delay between the upload finishing and the URL being generated). In my config I have maxSize: '150000MB' and noJsMaxSize: '100GB'; the other main options are pretty much default.

Is there a way I could give you helpful debug logs for these transfers? I'm worried I might be hitting a bottleneck in Node's architecture.

@BobbyWibowo
Owner

BobbyWibowo commented May 28, 2020

The intensive processes I can think of that the Node process would run on an upload after it has fully uploaded are file hashing and virus scanning. Thumbnails are created in the background, so no waiting there.
You didn't mention virus scanning, so it's most likely the file hashing then (FYI, it's there for duplicate avoidance).

Try to comment out:

const hash = await new Promise((resolve, reject) => {
  const result = crypto.createHash('md5')
  fs.createReadStream(info.path)
    .on('error', error => reject(error))
    .on('end', () => resolve(result.digest('hex')))
    .on('data', data => result.update(data, 'utf8'))
})

// Check if the file exists by checking its hash and size
const dbFile = await db.table('files')
  .where(function () {
    if (user === undefined)
      this.whereNull('userid')
    else
      this.where('userid', user.id)
  })
  .where({
    hash,
    size: info.data.size
  })
  // Select expirydate to display expiration date of existing files as well
  .select('name', 'expirydate')
  .first()

if (dbFile) {
  // Continue even when encountering errors
  await utils.unlinkFile(info.data.filename).catch(logger.error)
  // logger.log(`Unlinked ${info.data.filename} since a duplicate named ${dbFile.name} exists`)

  // If on /nojs route, append original file name reported by client
  if (req.path === '/nojs')
    dbFile.original = info.data.originalname

  exists.push(dbFile)
  return
}

Then add a new line below them: const hash = null.

If that was indeed the bottleneck, I'll add a config option to completely disable the feature I guess. Can't really make that any quicker.
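
Roughly, such a toggle could look like the following; config.uploads.hash is a hypothetical flag name used here for illustration only, not an existing option:

let hash = null
// Hypothetical flag; the actual option name, if one gets added, may differ
if (config.uploads.hash !== false) {
  hash = await new Promise((resolve, reject) => {
    const result = crypto.createHash('md5')
    fs.createReadStream(info.path)
      .on('error', error => reject(error))
      .on('end', () => resolve(result.digest('hex')))
      .on('data', data => result.update(data, 'utf8'))
  })
}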

@BobbyWibowo
Owner

After that, the upload started over from the beginning, although in my uploads folder the old file remained and a new file was being written for the new transfer.

Can't think of why this would happen, though.
If the client attempts to re-upload, it has to be something from the client itself. But I don't recall the homepage uploader having a feature to do that.

@camjac251
Author

Could the internal webserver have a timeout set?

I commented it out like this

self.storeFilesToDb = async (req, res, user, infoMap) => {
  const files = []
  const exists = []
  const albumids = []

  await Promise.all(infoMap.map(async info => {
    // Create hash of the file
/*    const hash = await new Promise((resolve, reject) => {
      const result = crypto.createHash('md5')
      fs.createReadStream(info.path)
        .on('error', error => reject(error))
        .on('end', () => resolve(result.digest('hex')))
        .on('data', data => result.update(data, 'utf8'))
    })

    // Check if the file exists by checking its hash and size
    const dbFile = await db.table('files')
      .where(function () {
        if (user === undefined)
          this.whereNull('userid')
        else
          this.where('userid', user.id)
      })
      .where({
        hash,
        size: info.data.size
      })
      // Select expirydate to display expiration date of existing files as well
      .select('name', 'expirydate')
      .first()

    if (dbFile) {
      // Continue even when encountering errors
      await utils.unlinkFile(info.data.filename).catch(logger.error)
      // logger.log(`Unlinked ${info.data.filename} since a duplicate named ${dbFile.name} exists`)

      // If on /nojs route, append original file name reported by client
      if (req.path === '/nojs')
        dbFile.original = info.data.originalname

      exists.push(dbFile)
      return
    }*/

    const timestamp = Math.floor(Date.now() / 1000)
    const data = {

Running the transfer again now

@BobbyWibowo
Owner

You'd have to add the const hash = null line, as hash is referenced further down when inserting into the DB.

Other than the timeout between Nginx and the lolisafe service, I'm not aware of anywhere else one is set.
Since Express sits on top of the Node.js HTTP API, its default timeout presumably applies as well: https://nodejs.org/docs/latest-v12.x/api/http.html#http_server_timeout. Not sure if it's that kind of timeout, but I don't recall it being changed anywhere, so it should still be the default 2 minutes.
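
For reference, that timeout can be raised or disabled on the http.Server that Express's listen() returns; a minimal sketch, with the port purely illustrative:

const express = require('express')

const app = express()
// app.listen() returns the underlying Node.js http.Server
const server = app.listen(9999)

// The inactivity timeout defaults to 2 minutes on Node v12;
// 0 disables it entirely, otherwise pass a value in milliseconds
server.setTimeout(0)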

@camjac251
Author

camjac251 commented May 28, 2020

That was it. It gave me the link instantly after the upload finished. Nothing I've tried has been this fast before.

Could there be a faster checksum method available for duplicate detection in Node?
There are a few BLAKE3 implementations here: https://www.npmjs.com/search?q=blake3
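
For illustration, a streaming BLAKE3 hash could slot in where the MD5 block sits. This is a minimal sketch, assuming the npm blake3 package exposes createHash() with update()/digest() mirroring Node's crypto module; the packages in that search differ in their exact APIs:

const blake3 = require('blake3')
const fs = require('fs')

// Drop-in for the MD5 block inside storeFilesToDb; `info.path` comes
// from the surrounding upload handler, as in the snippet above
const hash = await new Promise((resolve, reject) => {
  const hasher = blake3.createHash()
  fs.createReadStream(info.path)
    .on('error', error => reject(error))
    .on('data', data => hasher.update(data))
    .on('end', () => resolve(hasher.digest('hex')))
})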

@uhthomas

Just a suggestion: have you considered using a multi-writer for uploads? Rather than treating uploading and hashing as two separate steps, they can be done at the same time. This is how kipp handles file uploads.

As @camjac251 suggested, it might also be a good idea to use BLAKE3. MD5 should never, ever be used anymore.
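
To sketch the idea in Node terms (not how lolisafe currently handles it): pipe the incoming stream to the destination file and feed the same chunks to a hasher, so the digest is ready the moment the write finishes. The helper name and arguments here are illustrative:

const crypto = require('crypto')
const fs = require('fs')

// Illustrative helper, not part of lolisafe: `source` is the incoming
// file stream from the multipart parser, `destPath` the final path
// under the uploads directory.
function writeAndHash (source, destPath) {
  return new Promise((resolve, reject) => {
    const hasher = crypto.createHash('md5') // or a BLAKE3 hasher, per the suggestion above
    const out = fs.createWriteStream(destPath)

    source.on('data', chunk => hasher.update(chunk))
    source.on('error', reject)
    out.on('error', reject)
    // 'finish' fires once everything has been flushed to disk,
    // at which point the digest is already complete
    out.on('finish', () => resolve(hasher.digest('hex')))

    source.pipe(out)
  })
}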

@BobbyWibowo
Owner

Just a suggestion: have you considered using a multi-writer for uploads? Rather than treating uploading and hashing as two separate steps, they can be done at the same time. This is how kipp handles file uploads.

Frankly, I hadn't given that any thought before this issue. I also thought of something like that while this was going on, but I was sure Multer (the lib we currently use to parse multipart data) didn't have a stream-based API to hook into.
I just gave it another look, and indeed the stream-based API is only in RC versions at the moment. I can probably give that a try as-is, but eh, dunno.
There are also some solutions that involve writing our own Multer storage engine, such as this one. Probably a better choice for now.
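
For reference, a custom storage engine is just an object implementing _handleFile and _removeFile. A rough skeleton that hashes while writing; the class name, file naming scheme, and MD5 choice here are illustrative, not what lolisafe ended up shipping:

const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

class HashedDiskStorage {
  constructor (opts) {
    this.destination = opts.destination
  }

  // Multer calls _handleFile with the parsed file; file.stream is the
  // incoming data, and the extra properties passed to cb() end up on req.file
  _handleFile (req, file, cb) {
    const filename = `${Date.now()}${path.extname(file.originalname)}`
    const finalPath = path.join(this.destination, filename)

    const hasher = crypto.createHash('md5') // or a BLAKE3 hasher
    const outStream = fs.createWriteStream(finalPath)

    file.stream.on('data', chunk => hasher.update(chunk))
    file.stream.pipe(outStream)

    outStream.on('error', cb)
    outStream.on('finish', () => {
      cb(null, {
        filename,
        path: finalPath,
        size: outStream.bytesWritten,
        hash: hasher.digest('hex')
      })
    })
  }

  _removeFile (req, file, cb) {
    fs.unlink(file.path, cb)
  }
}

module.exports = opts => new HashedDiskStorage(opts)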

There are a few BLAKE3 implementations here: https://www.npmjs.com/search?q=blake3

As @camjac251 suggested, it might also be a good idea to use BLAKE3. MD5 should never, ever be used anymore.

Aight, that sounds good to me as well.

@BobbyWibowo BobbyWibowo added the enhancement New feature or request label May 28, 2020
@BobbyWibowo BobbyWibowo changed the title [BUG] Large files [BUG] Make large files great again May 28, 2020
@BobbyWibowo BobbyWibowo added this to In progress in Mainline May 28, 2020
@BobbyWibowo BobbyWibowo self-assigned this May 28, 2020
@BobbyWibowo BobbyWibowo changed the title [BUG] Make large files great again Make large files great again May 28, 2020
@BobbyWibowo
Owner

@camjac251 try 62a9775

@camjac251
Author

Oh wow. I'll try it out right now. Thank you so much

@camjac251
Author

This is incredible, thank you for adding this. It was almost instant from when the 23GB file finished uploading to when it gave me the link. So much faster now.

Mainline automation moved this from In progress to Done May 29, 2020