es.merge broke my data #93

Closed
ghost opened this Issue Jan 14, 2016 · 2 comments


ghost commented Jan 14, 2016

I have 12 files to process, so I want to create 12 ReadStreams and merge them into one.

In each file, each line is a JSON string.

My code is like this:

  fileNames = fileNames.filter(Boolean);
  es.merge(fileNames.map(function(item, index) {
               return fs.createReadStream(item);
             }))
    .pipe(es.split())
    .pipe(es.parse())

After running it, I got some JSON.parse errors like this:

SyntaxError: Unexpected token

and I found that the JSON that failed to parse looks like this:

{"ts":1452700832306,"segments":{"sex":1,"age":5,"os":1,"deviceprice":5},"appkey":"560129b88bda20a5270f88da","token":"jXL29pROAnEGng","deviceid":"D4FB58CD-39{"ts":1452700810077,"appkey":"560129b88bda20a5270f88da","token":"jXL29pROAnEGng","deviceid":"B8DA0F64C613E896B412788B9BDB06D4"}

I grepped for 1452700832306 and 1452700810077 and found that they come from two different files:

in file1, test1.log:

{"ts":1452700832306,"segments":{"sex":1,"age":5,"os":1,"deviceprice":5},"appkey":"560129b88bda20a5270f88da","token":"jXL29pROAnEGng","deviceid":"D4FB58CD-39CF-4445-AD26-C85194C0D032"}

in file2, test2.log:

{"ts":1452700810077,"appkey":"560129b88bda20a5270f88da","token":"jXL29pROAnEGng","deviceid":"B8DA0F64C613E896B412788B9BDB06D4"}

So it looks like some of my data has been broken.

Node version: v4.2.3
OS: CentOS Linux release 7.0.1406 (Core)

Owner

dominictarr commented Jan 18, 2016

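merge doesn't know where the lines in your files are. When fs reads from the disk, it may break them up anywhere, and merge forwards those chunks as they arrive, so a partial line from one file can be followed by a chunk from another. You need the splits to be at the line boundaries so that es.parse can handle them.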

this should work:

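  // Split and parse each file on its own stream, then merge the parsed objects.
  es.merge(fileNames.map(function(item) {
    return fs.createReadStream(item)
      .pipe(es.split())   // re-chunk this file on line boundaries
      .pipe(es.parse());  // emit one parsed object per line
  }))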
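
To see why the order matters, here is a rough sketch that simulates the two orderings without touching the disk or event-stream. Everything in it is made up for illustration: `interleave` and `splitLines` are hypothetical helpers, the chunk arrays stand in for whatever fs happens to read, and the round-robin order is only an assumption about how chunks from different files might arrive.

```javascript
// Hypothetical helper: forward chunks round-robin, standing in for a merged
// stream that passes chunks along in whatever order they arrive.
function interleave(sources) {
  const out = [];
  const longest = Math.max(...sources.map((s) => s.length));
  for (let i = 0; i < longest; i++) {
    for (const chunks of sources) {
      if (i < chunks.length) out.push(chunks[i]);
    }
  }
  return out;
}

// Hypothetical helper: join chunks and cut them at newlines, dropping empties.
function splitLines(chunks) {
  return chunks.join('').split('\n').filter(Boolean);
}

// Made-up chunks: the line from file1 happens to be cut in the middle.
const file1 = ['{"ts":1,"devic', 'eid":"A"}\n'];
const file2 = ['{"ts":2,"deviceid":"B"}\n'];

// Merge first, split afterwards: the partial line from file1 is glued to file2.
const mergedThenSplit = splitLines(interleave([file1, file2]));

// Split each source first, merge afterwards: only whole lines get interleaved.
const splitThenMerged = interleave([splitLines(file1), splitLines(file2)]);

console.log(mergedThenSplit);
console.log(splitThenMerged);
```

In the first result, the first entry has the same shape as the corrupted record in the report: the start of one line followed by a complete line from the other file. In the second, every entry is a whole line that JSON.parse accepts.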

ghost commented Jan 19, 2016

Thank you @dominictarr, you are right! I need to learn how streams work.

ghost closed this Jan 19, 2016
