Documentation for file iterators #69

@lmullen

I have a question about using file iterators. As mentioned in #65, building a DTM requires iterating over the texts in a corpus twice: once to build the vocabulary, and again to build the DTM itself. I assume that with a file iterator this means each file is read from disk twice. If so, for a corpus that fits into memory it is probably better to load the texts into a character vector oneself and then iterate over that vector twice. For a corpus that does not fit into memory, reading each file twice would be the only way to construct the DTM. Is that correct, or is there a better way?
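
To make the two options concrete, here is a minimal sketch in plain Python. It is not the library's API: `iter_files`, `build_vocabulary`, `build_dtm`, `dtm_from_disk`, `dtm_from_memory`, and the whitespace tokenizer are all hypothetical names invented for illustration, and the DTM is represented as a list of sparse rows rather than a real sparse matrix.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (an assumption): lowercase, split on whitespace.
    return text.lower().split()


def iter_files(paths):
    # Lazily yield one document at a time.
    # Every new call to iter_files() reads each file from disk again.
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield fh.read()


def build_vocabulary(docs):
    # Pass 1: assign each distinct term a column index in order of first appearance.
    vocab = {}
    for doc in docs:
        for term in tokenize(doc):
            vocab.setdefault(term, len(vocab))
    return vocab


def build_dtm(docs, vocab):
    # Pass 2: one sparse row per document, mapping column index -> term count.
    rows = []
    for doc in docs:
        counts = Counter(vocab[t] for t in tokenize(doc) if t in vocab)
        rows.append(dict(counts))
    return rows


def dtm_from_disk(paths):
    # Corpus too large for memory: two passes, each one re-reading the files.
    vocab = build_vocabulary(iter_files(paths))
    return vocab, build_dtm(iter_files(paths), vocab)


def dtm_from_memory(paths):
    # Corpus fits in memory: read each file once, then iterate the list twice.
    docs = list(iter_files(paths))
    vocab = build_vocabulary(docs)
    return vocab, build_dtm(docs, vocab)
```

Both functions should produce the same vocabulary and the same rows. The only difference is the trade-off asked about above: `dtm_from_disk` pays for a second read of every file, while `dtm_from_memory` holds the whole corpus in memory at once.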
