Use disk.frame to read fwf with transformations #88
Comments
You can read the chunks as you have done and then use
Do you have a link to the 10 GB files?
Home: Link. Link to 1 of the 20 files: Link. Data Dictionary (on the main page): PDF File. The R function I developed to handle this data: Function. Thank you for your interest. This database is very important to Brazilian society, as we use it for academic studies and in the fight against corruption...
There isn't enough information here for me to fully understand what's going on, but I think you want to read a large file chunk by chunk using readr. You can do that, and to convert the results to a disk.frame you can use `add_chunk`:

```r
library(disk.frame)

df = disk.frame("some_where")

readr::read_lines_chunked(path_to_file, callback = function(chunk, id) {
  add_chunk(df, chunk, id) # this will add each chunk to the disk.frame
})
```

Please re-open if this doesn't answer your question.
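Since the file in question is fixed-width rather than plain lines, each chunk can also be parsed with `readr::read_fwf()` before being added to the disk.frame. A minimal sketch of that idea, assuming readr 2.x (for the `I()` literal-data wrapper); the column widths, names, and file paths below are hypothetical placeholders, not the real layout:

```r
library(readr)
library(disk.frame)

# Hypothetical column layout -- the real widths come from the data dictionary
col_spec = fwf_widths(c(2, 14, 60), c("tipo", "cnpj", "nome"))

df = disk.frame("some_where")

read_lines_chunked("path/to/big_file.txt",
                   callback = SideEffectChunkCallback$new(function(lines, pos) {
                     # Parse this chunk's fixed-width lines into a data.frame
                     chunk = read_fwf(I(paste(lines, collapse = "\n")),
                                      col_positions = col_spec)
                     add_chunk(df, chunk) # append the parsed chunk to the disk.frame
                   }),
                   chunk_size = 100000)
```

Using `SideEffectChunkCallback` makes it explicit that each chunk is consumed for its side effect (writing to the disk.frame) rather than accumulated in memory.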
Hello, I read about your package at UseR! 2019 and found it fantastic.
I would like to know whether the `tidyr` package works with `disk.frame`. If not, do you want to implement it with `disk.frame`?
I have a case where your package would help a lot. I reported the case to the `vroom` package repository (link). Here is the example I reported:
My example is a peculiar case.
The Federal Revenue Service of Brazil publishes data as a single 10 GB file in fixed-width (fwf) format, with several data.frames agglutinated together.
So we have to read part of the file (with `read_lines_chunked()`), process the chunk with a function executed via a callback (`SideEffectChunkCallback`), and then write the result to a CSV file or a DBMS.
We repeat this until the whole file has been read (or files, as there may be more than one).
I'll try to sketch an example:
Created on 2019-07-08 by the reprex package (v0.3.0)
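The workflow described above (read a chunk of lines, transform it, append the result to a CSV) might be sketched as follows. The column widths, the transformation step, and the file names are hypothetical placeholders; the real layout would come from the data dictionary:

```r
library(readr)

# Hypothetical layout -- replace with the widths from the data dictionary PDF
col_spec = fwf_widths(c(2, 14, 60), c("tipo", "cnpj", "nome"))

process_chunk = function(lines, pos) {
  # Parse the fixed-width lines of this chunk into a data.frame
  chunk = read_fwf(I(paste(lines, collapse = "\n")), col_positions = col_spec)
  # ... per-record transformation would go here, e.g. splitting the
  # agglutinated record types into separate tables ...
  write_csv(chunk, "saida.csv", append = pos != 1) # append after the first chunk
}

read_lines_chunked("arquivo_receita.txt",  # hypothetical input file
                   callback = SideEffectChunkCallback$new(process_chunk),
                   chunk_size = 100000)
```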
I was successful in developing the code to handle the data thanks to the `read_lines_chunked()` function, which reads the file in parts, combined with a callback (`SideEffectChunkCallback`) to process each part and write the result to a CSV file or a DBMS.
A function with the same functionality in the `vroom` package would be very important.
Originally posted by @georgevbsantiago in #76 (comment)