Benchmark & enable chunk size for pandas data fetching #332

Open
suranah opened this issue Oct 25, 2021 · 1 comment
Labels
🧮 algorithms Core algorithms, optimization & ML model related 🛠️ backend

Comments

Contributor

suranah commented Oct 25, 2021

Enable a chunk size for pandas data fetching so that fetches perform better on larger datasets. We need to benchmark different chunk sizes to find the ideal one, using 10M rows as input for the default time window.
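
For illustration, a minimal sketch of what chunked fetching could look like with pandas. This is not the project's connector code: the helper name `fetch_in_chunks`, the in-memory SQLite table `events`, and the default `chunk_size` are assumptions made up for the example. It relies only on the `chunksize` parameter of `pd.read_sql`, which makes the call return an iterator of DataFrames instead of a single frame.

```python
import sqlite3

import pandas as pd


def fetch_in_chunks(query, conn, chunk_size=50_000):
    """Fetch query results chunk by chunk and merge them into one DataFrame.

    Hypothetical helper for illustration; the default chunk_size is an assumption.
    """
    # With chunksize set, pd.read_sql yields DataFrames of at most chunk_size rows.
    chunks = list(pd.read_sql(query, conn, chunksize=chunk_size))
    if not chunks:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(chunks, ignore_index=True)


# Toy in-memory table standing in for a real data source (assumption).
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(25), "value": [i * 2 for i in range(25)]}).to_sql(
    "events", conn, index=False
)

df = fetch_in_chunks("SELECT * FROM events", conn, chunk_size=10)
print(len(df))
```

Note that collecting every chunk before concatenating still holds the full result in memory at the end; the chunk size mainly limits how many rows are pulled from the database cursor at once.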

@suranah suranah added 🛠️ backend 🧮 algorithms Core algorithms, optimization & ML model related labels Oct 25, 2021
varunp2k added a commit that referenced this issue Oct 25, 2021
- enabled chunk_size during data fetching
- added file connectors_utils.py, utils for connectors
- data is fetched in chunks and then merged into a single dataframe
varunp2k added a commit that referenced this issue Oct 26, 2021
- enabled chunk_size during data fetching
- added file connectors_utils.py, utils for connectors
- data is fetched in chunks and then merged into a single dataframe
kartikay-bagla added a commit that referenced this issue Oct 27, 2021
fix(backend): #332 added chunk_size param for data fetching
@bhargavsk1077
Contributor

Tested different chunk sizes with the 'spotify_12m' dataset.

Observed fetch times:

  • CHUNKSIZE=10000 : 1 min 57 seconds
  • CHUNKSIZE=20000 : 1 min 56 seconds
  • CHUNKSIZE=50000 : 1 min 50 seconds
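
A comparison like the one above could be reproduced with a small timing loop. The sketch below is an assumption about how such a benchmark might be set up, not the script used for these numbers: `time_fetch`, the in-memory SQLite table `tracks`, and the tiny chunk sizes are hypothetical stand-ins for the real connector and the 'spotify_12m' dataset.

```python
import sqlite3
import time

import pandas as pd


def time_fetch(query, conn, chunk_size):
    """Return (seconds elapsed, rows fetched) for one chunked fetch."""
    start = time.perf_counter()
    frames = list(pd.read_sql(query, conn, chunksize=chunk_size))
    df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return time.perf_counter() - start, len(df)


# Small toy table; a real benchmark would point at the actual data source.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(1000)}).to_sql("tracks", conn, index=False)

timings = {}
for chunk_size in (100, 200, 500):
    elapsed, n_rows = time_fetch("SELECT * FROM tracks", conn, chunk_size)
    timings[chunk_size] = elapsed
    print(f"CHUNKSIZE={chunk_size}: {elapsed:.4f} s for {n_rows} rows")
```

Repeating each measurement several times and comparing medians would make differences of a few seconds easier to tell apart from run-to-run noise.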

3 participants