
Error when uploading files larger than 1.1 GB #366

Closed
normancarcamo opened this issue Feb 18, 2020 · 2 comments

Comments

@normancarcamo

Hello there, I'm trying to use this package to handle potentially large CSV files, but I'm getting an error.
I'll show the code so you can see where I might be going wrong.

The file that I'm trying to upload is this:
[screenshot: directory listing showing the CSV file being uploaded]

The middleware used for uploading the files is express-fileupload.
Example app.ts

// ... skipped for brevity

app.use(fileUpload());

// ... skipped for brevity
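(For reference, express-fileupload can also be configured to spool uploads to a temp file on disk instead of holding the whole upload in a memory buffer; the option names below come from its README, and the directory and size limit are illustrative, not my actual config.)

```javascript
const express = require("express");
const fileUpload = require("express-fileupload");

const app = express();

app.use(fileUpload({
  useTempFiles: true,   // write the upload to disk instead of buffering it in RAM
  tempFileDir: "/tmp/", // where the temp files go (illustrative path)
}));
```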

This service method contains the logic that handles the file uploaded via express-fileupload:

  async processFile(files: IUpload.IFileArray | undefined, config: any) {
    console.log('FILES:', files);
    // 1. VALIDATE INPUT:
    if (!files || Object.keys(files).length === 0) {
      throw new Error('No files were uploaded.');
    }
  
    if (!this.util.is.object(files.dataset)) {
      throw new Error('Invalid dataset.');
    }
   
    if (this.util.is.empty(config)) {
      throw new Error('Invalid configuration.');
    }
  
    if (!this.util.is.string(config.provider) || 
      this.util.is.empty(config.provider)) {
      throw new Error('Invalid provider.');
    }

    if (!this.util.is.string(config.columns) || 
      this.util.is.empty(config.columns)) {
      throw new Error('Invalid columns.');
    }

    let nullObject: boolean = false;

    if ('nullObject' in config) {
      if (!this.util.is.boolean(config.nullObject)) {
        throw new Error('nullObject must be of boolean type');
      } else {
        nullObject = config.nullObject;
      }
    }

    let delimiter: string[] = [','];

    if ('delimiter' in config) {
      if (!this.util.is.array(config.delimiter) 
      || config.delimiter.length === 0) {
        throw new Error('delimiter must be of array type');
      } else {
        delimiter = config.delimiter;
      }
    }

    let quote: string = '"';

    if ('quote' in config) {
      if (!this.util.is.string(config.quote) || config.quote === '') {
        throw new Error('quote must be of string type');
      } else {
        quote = config.quote;
      }
    }

    let trim: boolean = true;

    if ('trim' in config) {
      if (!this.util.is.boolean(config.trim)) {
        throw new Error('trim must be of boolean type');
      } else {
        trim = config.trim;
      }
    }

    let ignoreEmpty: boolean = false;

    if ('ignoreEmpty' in config) {
      if (!this.util.is.boolean(config.ignoreEmpty)) {
        throw new Error('ignoreEmpty must be of boolean type');
      } else {
        ignoreEmpty = config.ignoreEmpty;
      }
    }

    let noheader: boolean = false;

    if ('noheader' in config) {
      if (!this.util.is.boolean(config.noheader)) {
        throw new Error('noheader must be of boolean type');
      } else {
        noheader = config.noheader;
      }
    }
  
    // 2. PARSE DATA:
    let data: any[] = [];
    try {
      const dataset = (files.dataset as IUpload.IUploadedFile);
      data = await this.util.csv({
        nullObject: nullObject,
        delimiter: delimiter,
        quote: quote,
        trim: trim,
        ignoreEmpty: ignoreEmpty,
        noheader: noheader,
        includeColumns: new RegExp(config.columns.replace(/,/gm, '|')),
        headers: config.columns.split(','),
      }).fromString(dataset.data.toString('utf8'));
    } catch (err) {
      console.log('ERROR ->', err);
      throw new Error('Cannot parse the csv file content. ' + err.message);
    }

    if (data.length === 0) {
      throw new Error('Cannot save the provider data being empty.');
    }
  
    // 3. INSERT DATA:
    return await this.repository.createProducts(config.provider, data);
  }

And this is how I upload the file (curl for testing purposes) and the error I got:

[screenshot: curl upload command and the resulting error output]

@normancarcamo
Author

Hi, I'm going to close this because it isn't a bug in the library. I was parsing big files the wrong way: I was buffering the whole file in memory and parsing it as a string. The solution is to use streams, and it now works smoothly with files of more than 7 GB.

@KhalilMohammad

KhalilMohammad commented Feb 11, 2021

Hi,

Can you share your stream code?

I have CSV files of over 60 GB. Even a smaller 10 GB file took 8 hours to complete with the code below.

I am using it like this, and it's very slow.

const csv = require("csvtojson/v2");

const writeStream = require("fs").createWriteStream("persons2.json");
csv({
  delimiter: "\t",
})
  .fromFile("persons.csv")
  .pipe(writeStream);
