It's not obvious to someone using the API when to call load_all_data.
I see two reasons to keep sampling on when working with large files:
- I'm exploring a dataset within a notebook, so I might create temporary measures or intermediate queries, or even try joining stores, and I want all those results quickly.
- I haven't created a cube yet, and since store data is reloaded after each call to join and create_cube, I'd rather load all the data afterwards.
In https://github.com/atoti/notebooks/blob/master/retail/pricing-simulations-around-product-classes/main.ipynb I ended up writing:
```python
# We can now load all the data so that visualizations operate on the entire dataset.
# NB: as a best practice, to optimize speed while exploring your data, we recommend keeping the default sampling mode enabled.
# Once the model is ready, as is the case in this notebook, you can call session.load_all_data() after creating the cube.
session.load_all_data()
```
Can you confirm the best practice? Could you also document it somewhere?
Atoti can handle very large volumes of data while still answering queries quickly. However, loading a large amount of data during the modeling phase of the application is rarely a good idea, because creating stores, cubes, hierarchies, and measures are all operations that take longer as the amount of loaded data grows.
Sampling gives you immediate feedback each time you run a cell, so as a rule of thumb, call session.load_all_data() as late as possible in your project, even as the last line of your notebook if you can.
Think of it as first building your model on a sample of the data, then replaying everything on the whole dataset, except that instead of re-running each cell, you just call session.load_all_data().
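To make the rule of thumb concrete, here is a toy cost model in plain Python. It is a hypothetical sketch, not the atoti API and not a benchmark: it assumes, as described in the issue, that every join or create_cube call reloads all the store data currently loaded, and that loading all the data into a finished model is a single pass over the full dataset. The function name `rows_processed` and the row counts are made up for illustration.

```python
def rows_processed(total_rows: int, sample_rows: int, steps: int, load_all_first: bool) -> int:
    """Toy cost model (hypothetical; not measured and not the atoti API).

    Assumes each modeling call that reloads store data (e.g. a join or a
    cube creation) processes every row currently loaded, and that loading
    all the data into an already-built model is one pass over it.
    """
    if load_all_first:
        # Full data from the start: the initial full load plus a full reload per step.
        return total_rows + steps * total_rows
    # Sampling kept on: each step only reloads the sample, then one full load at the end.
    return steps * sample_rows + total_rows


# Illustrative numbers: 1M rows in the files, a 10k-row sample, 5 modeling calls.
eager = rows_processed(1_000_000, 10_000, steps=5, load_all_first=True)
deferred = rows_processed(1_000_000, 10_000, steps=5, load_all_first=False)
print(f"load_all_data first: {eager:,} rows processed")
print(f"load_all_data last:  {deferred:,} rows processed")
```

With these made-up numbers, loading everything first processes 6,000,000 rows versus 1,050,000 when the full load comes last, and the gap widens with every extra modeling call or re-run cell.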