Remove parsing perf bottleneck in WordEmbeddingsTransform #1599
Conversation
/cc @danmosemsft
This seems like just the kind of thing the Utf8Parser in System.Memory was meant to handle.
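For reference, a minimal sketch of the allocation-free numeric parsing `Utf8Parser` enables: it reads a value directly from UTF-8 bytes instead of materializing an intermediate `string`. The buffer contents here are illustrative, not taken from the PR.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

class Utf8ParserDemo
{
    static void Main()
    {
        // UTF-8 bytes as they would come straight from a file buffer.
        ReadOnlySpan<byte> utf8 = Encoding.UTF8.GetBytes("3.14 ");

        // Parse a float without allocating a substring first.
        if (Utf8Parser.TryParse(utf8, out float value, out int bytesConsumed))
        {
            Console.WriteLine($"{value} ({bytesConsumed} bytes)");
        }
    }
}
```

Because `TryParse` reports how many bytes it consumed, a tokenizer can walk a whole line of delimited numbers without ever splitting it into strings.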
Seems all feedback was addressed? Just needs a rebase?
And two approvals ;)
@shauheen do you have further feedback? |
Branch still needs to be updated with master.
@shauheen @eerhardt I have addressed all issues, PTAL one more time. @tannergooding thanks for pointing out the Utf8Parser.
This PR improves the performance of reading large text files and affects two of our most time-consuming benchmarks.
Info:
Before:
After:
That is two minutes less to read the huge file for both benchmarks, which results in a 3x boost for WikiDetox_WordEmbeddings_SDCAMC and a 40% improvement for WikiDetox_WordEmbeddings_OVAAveragedPerceptron.
Reading the file was a bottleneck:
I have applied all possible optimizations and parallelized this operation.
I am going to post a detailed description on Monday.
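Pending that detailed write-up, one common shape for parallelizing a read-and-parse pass is a producer/consumer split: sequential I/O on one thread, CPU-bound parsing fanned out across workers. The sketch below is only an illustration under that assumption, not the PR's actual implementation; the file contents are made up.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class ParallelParseSketch
{
    static void Main()
    {
        // Stand-in for the large input file.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "1.0", "2.5", "3.5" });

        var lines = new BlockingCollection<string>(boundedCapacity: 1024);

        // Producer: keep the sequential disk read on a single thread.
        var reader = Task.Run(() =>
        {
            foreach (var line in File.ReadLines(path))
                lines.Add(line);
            lines.CompleteAdding();
        });

        // Consumers: parsing is CPU-bound, so it can run in parallel.
        double sum = 0;
        object gate = new object();
        Parallel.ForEach(lines.GetConsumingEnumerable(), line =>
        {
            double v = double.Parse(line);
            lock (gate) sum += v;   // cheap aggregation; real code would batch
        });

        reader.Wait();
        Console.WriteLine(sum);
    }
}
```

The bounded collection keeps the reader from racing too far ahead of the parsers, so memory stays flat even on very large files.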