pavelloz/streams-vs-buffers

Compare streams and buffers when manipulating big CSV files and saving the output to a different file.

Experiment

Perform the following operations on a given CSV dataset:

  1. Filter OUT people without an email (field #3)
  2. Filter OUT people without a phone number (field #5)
  3. Remove any spaces, x, (, ), and - characters from the phone number (field #5)
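
For illustration, a hypothetical row (the column layout here is an assumption, not the actual Faker schema) would be transformed like this:

before: John Doe,42,john@example.com,New York,(555) 123-4567x89
after:  John Doe,42,john@example.com,New York,555123456789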

Why?

Because we live in an era where FaaS (function as a service) platforms like AWS Lambda are getting more and more powerful, but at the same time have their own limitations, such as memory. Sometimes FaaS is ideal for an operation, and sometimes that operation involves big files (e.g. log parsing, archiving). Knowing how to operate on those files efficiently might make or break the possibility of using the best tool for the job.

Input

  • Faker seed: 1337
  • Rows: 1000000
  • Data size: 775264070 bytes (739MB)

Buffers

  1. Load the whole file into memory
  2. Perform the operations
  3. Write the result to the output file
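
A minimal sketch of the buffer approach, assuming hypothetical file names (data.csv, output.csv) and naive comma splitting (real CSV data with quoted fields would need a proper parser):

const fs = require('fs');

// Read the whole file into memory at once -- this is what drives the peak memory usage.
const input = fs.readFileSync('data.csv', 'utf8');

const output = input
  .split('\n')
  .filter((line) => {
    const fields = line.split(',');
    return fields[2] && fields[4]; // keep rows that have an email (#3) and a phone (#5)
  })
  .map((line) => {
    const fields = line.split(',');
    fields[4] = fields[4].replace(/[\sx()-]/g, ''); // strip spaces, x, (, ), -
    return fields.join(',');
  })
  .join('\n');

fs.writeFileSync('output.csv', output);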

Output size: 496410002 bytes (473MB)

  • Execution time: 7600 ms
  • Memory used: 2086 MB

Streams

  1. Stream the file into highland
  2. Perform the operations
  3. Pipe the result to the output file
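
A minimal sketch of the stream approach with highland (same assumptions about file names and comma splitting). Each line flows through the pipeline individually and is written out immediately, so memory stays flat regardless of file size:

const fs = require('fs');
const _ = require('highland');

_(fs.createReadStream('data.csv'))
  .split() // split the byte stream into lines
  .filter((line) => {
    const fields = line.split(',');
    return fields[2] && fields[4]; // keep rows that have an email (#3) and a phone (#5)
  })
  .map((line) => {
    const fields = line.split(',');
    fields[4] = fields[4].replace(/[\sx()-]/g, ''); // strip spaces, x, (, ), -
    return fields.join(',') + '\n';
  })
  .pipe(fs.createWriteStream('output.csv'));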

Output size: 495772785 bytes (473MB)

  • Execution time: 5800 ms
  • Memory used: 12 MB

Conclusion

It looks like streams are more efficient when dealing with big files, especially in terms of memory: the streaming version finished faster (5800 ms vs 7600 ms) while using a small fraction of the memory (12 MB vs 2086 MB).

Notes

I didn't manage to find any difference between the contents of the output files, even though their sizes differ.

Reproduce

If you want to reproduce the experiment:

npm ci                  # install dependencies

npm run generate-data   # settings: 1M rows, seed 1337

npm run experiment      # run experiments
