
Read Json in chunks #235

Closed
parasml opened this issue May 14, 2020 · 3 comments
Labels
bug Something isn't working minor release Will be addressed in the next minor release question Further information is requested

@parasml

parasml commented May 14, 2020

Hi @igorborgest,

I am reading my JSON file in chunks because it is too big, using the code below:

import awswrangler as wr

df = wr.s3.read_json(path1, chunksize=2, lines=True)

The returned `df` is a generator.

I am struggling to save the chunks above (df) in Parquet format. Please share your thoughts on how to achieve this, or whether the chunks can be combined into a single DataFrame.

Thanks,
Prashan
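For the "combine into a single DataFrame" part of the question, a minimal sketch follows. It assumes that each item yielded by the chunked reader is a pandas DataFrame (which is what the maintainer's later example prints). Because it cannot reach S3, it uses pandas' own chunked JSON Lines reader over an in-memory string as a stand-in; `JSON_LINES` and `read_all_chunks` are hypothetical names for illustration. Note that concatenating every chunk gives up the memory savings of chunking, so this only makes sense when the combined result fits in memory.

```python
import io

import pandas as pd

# Hypothetical JSON Lines payload standing in for the large S3 object.
JSON_LINES = "\n".join(
    '{"id": %d, "value": "v%d"}' % (i, i) for i in range(5)
)


def read_all_chunks(text: str, chunksize: int) -> pd.DataFrame:
    """Read JSON Lines in chunks and concatenate them into one DataFrame."""
    reader = pd.read_json(io.StringIO(text), lines=True, chunksize=chunksize)
    # Each iteration yields a DataFrame with at most `chunksize` rows.
    chunks = [chunk for chunk in reader]
    # ignore_index=True renumbers rows 0..n-1 across all chunks.
    return pd.concat(chunks, ignore_index=True)


df = read_all_chunks(JSON_LINES, chunksize=2)
print(df.shape)
```

The same `pd.concat` call should apply to the generator returned by `wr.s3.read_json(..., chunksize=...)`, under the assumption above that it yields DataFrames.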

@parasml parasml added the question Further information is requested label May 14, 2020
@parasml
Author

parasml commented May 15, 2020

Hi @igorborgest

Thanks for all the help in the past.

I just tried the code below, but I am still not able to read the JSON lines in a for loop. Please advise.

code:

import awswrangler as wr

df = wr.s3.read_json(path1, chunksize=5, lines=True)
for row in df:
    # Each item yielded by the generator is a pandas DataFrame chunk,
    # not a JSON string, so json.loads() does not apply to it.
    print("row = ", row)
    print("--------------")

Regards,
Prashan
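Since each chunk is already a parsed DataFrame, individual JSON records can be recovered without `json.loads`. A minimal sketch, again assuming chunks are DataFrames and using pandas' chunked reader over a hypothetical in-memory string in place of S3: `DataFrame.to_dict(orient="records")` turns each row into a plain dict.

```python
import io

import pandas as pd

# Hypothetical three-line JSON Lines payload.
JSON_LINES = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n{"id": 3, "name": "c"}'

records = []
for chunk in pd.read_json(io.StringIO(JSON_LINES), lines=True, chunksize=2):
    # `chunk` is a DataFrame with up to 2 rows; convert each row to a dict.
    for record in chunk.to_dict(orient="records"):
        records.append(record)
        print("row =", record)
```

Iterating over the rows this way is usually slower than operating on the whole chunk with vectorized pandas methods, so it is best reserved for per-record side effects.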

@igorborgest
Contributor

Hi @parasml,

Actually, I found a bug in JSON reading with chunksize, which I've just fixed.

Would you mind testing our development branch?

pip install git+https://github.com/awslabs/aws-data-wrangler.git@dev

Example:

import awswrangler as wr

for df in wr.s3.read_json(paths, lines=True, chunksize=1):
    print(df)

I would like to make sure this is fixed in our next version, 1.2.0. Thanks!
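The original question, saving the chunks as Parquet, can be handled by writing each chunk as it arrives so the full file is never held in memory. The sketch below is a hedged illustration, not the project's documented recipe: `write_chunks` is a hypothetical helper, and it assumes awswrangler 1.x, where `wr.s3.to_parquet` accepts `dataset=True` and `mode="append"` to add one more Parquet file under the same S3 prefix per call. The writer is injectable so the loop can be exercised without AWS credentials.

```python
from typing import Callable, Iterable, Optional

import pandas as pd


def write_chunks(
    chunks: Iterable[pd.DataFrame],
    path: str,
    writer: Optional[Callable[..., object]] = None,
) -> int:
    """Write each DataFrame chunk to `path` as it arrives; return rows written."""
    if writer is None:
        # Assumption: awswrangler 1.x, where s3.to_parquet supports
        # dataset=True with mode="append" to add files under one prefix.
        import awswrangler as wr

        writer = wr.s3.to_parquet
    total = 0
    for chunk in chunks:
        # Only the current chunk is in memory at any point.
        writer(df=chunk, path=path, dataset=True, mode="append")
        total += len(chunk)
    return total
```

Usage against S3 would look like `write_chunks(wr.s3.read_json(paths, lines=True, chunksize=1000), "s3://my-bucket/my-prefix/")`, where the bucket and prefix are placeholders. A larger `chunksize` than the `1` in the example above produces fewer, bigger Parquet files, which is generally friendlier to downstream readers.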

@igorborgest igorborgest self-assigned this May 15, 2020
@igorborgest igorborgest added bug Something isn't working minor release Will be addressed in the next minor release WIP Work in progress labels May 15, 2020
@igorborgest igorborgest added this to the 1.2.0 milestone May 15, 2020
@igorborgest
Contributor

Released in version 1.2.0.

@igorborgest igorborgest added bug Something isn't working and removed bug Something isn't working WIP Work in progress labels May 20, 2020