Expected speed for large files/what is "slow"? #401

Open
cyrusae opened this issue Aug 20, 2022 · 0 comments


cyrusae commented Aug 20, 2022

I am trying to figure out if the time elapsed using fromJSON that I'm looking at is typical or a red flag; I apologize if that's not appropriate to raise an issue about.

I'm using files from these API endpoints (i.e., I get JSON in, I do not have control over the form of the JSON before it is imported, it is theoretically the same every time), namely the default cards and all cards bulk files. I use download.file() to retrieve them and then load the saved files with fromJSON.

The ~281 MB default cards file takes between 65 and 95 seconds to read in with fromJSON. The ~1.5 GB all cards file, which is similarly structured because both are sets of Card objects, takes between 25 and 45 minutes. (I am using R 4.2.1 in RStudio on Windows 11 with an SSD and 64 GB of RAM, if that matters; peak memory usage tops out around ~12 GB.)

Is that in the realm of reasonable expectation for files of this size/complexity, including the nonlinear increase in processing time, or should I be treating it as a red flag? I have no basis of comparison for JSON as opposed to CSV and don't mean to be disrespectful if the answer is in fact that this is as fast as it gets, I just didn't want to assume that was normal.
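To put a number on the "nonlinear increase", the figures above can be turned into ratios. This is only a rough sketch: it treats 1.5 GB as 1500 MB (an assumption, since the exact file size isn't stated) and uses the reported timing ranges as bounds.

```r
# Reported sizes and timings from this issue; 1.5 GB taken as 1500 MB (assumption).
size_mb <- c(default = 281, all = 1500)
secs_lo <- c(default = 65, all = 25 * 60)   # fastest observed runs, in seconds
secs_hi <- c(default = 95, all = 45 * 60)   # slowest observed runs, in seconds

# How much bigger is the "all cards" file, and how much longer does it take?
size_ratio    <- size_mb[["all"]] / size_mb[["default"]]
time_ratio_lo <- secs_lo[["all"]] / secs_hi[["default"]]  # best case for "all" vs worst for "default"
time_ratio_hi <- secs_hi[["all"]] / secs_lo[["default"]]  # worst case for "all" vs best for "default"

print(round(c(size = size_ratio, time_min = time_ratio_lo, time_max = time_ratio_hi), 1))
# size ~5.3, time_min ~15.8, time_max ~41.5
```

So a file roughly 5x larger takes somewhere between about 16x and 42x as long, which is the gap I'm asking about.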

To reproduce/see on your own machine:

  1. Request https://api.scryfall.com/bulk-data directly to get the download links for the "default" and "all cards" data
  2. download.file() the JSON for "default" and "all" cards
  3. Try fromJSON() on the resulting files (I am using default settings)
  4. (Optional) The numbers I'm citing are from benchmarking with tictoc.
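
The steps above could be sketched roughly as follows. This is not the exact script I ran. The helper names `fetch_bulk_file` and `time_from_json` are hypothetical, and the sketch assumes the bulk-data response has a `data` table with `type` and `download_uri` columns and that the relevant `type` values are `"default_cards"` and `"all_cards"`. It uses base R's `system.time()` instead of tictoc to avoid an extra dependency.

```r
library(jsonlite)

# Hypothetical helper: look up a bulk file's download link and save it locally.
# Assumes the response has data$type and data$download_uri (needs network access).
fetch_bulk_file <- function(type, destfile) {
  index <- fromJSON("https://api.scryfall.com/bulk-data")
  uri <- index$data$download_uri[index$data$type == type]
  download.file(uri, destfile, mode = "wb")
  destfile
}

# Hypothetical helper: time one fromJSON() call on a local file (default settings
# unless extra arguments are passed through ...).
time_from_json <- function(path, ...) {
  elapsed <- system.time(result <- fromJSON(path, ...))[["elapsed"]]
  list(seconds = elapsed, result = result)
}

# Only run the download and timing when used interactively.
if (interactive()) {
  path <- fetch_bulk_file("default_cards", "default_cards.json")
  timing <- time_from_json(path)
  print(timing$seconds)
}
```

Swapping `"default_cards"` for `"all_cards"` (and changing the destination file name) would repeat the measurement on the larger file.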