Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Benchmarking memory #516

Closed
1 of 2 tasks
mynameisvinn opened this issue Jan 30, 2021 · 25 comments
Closed
1 of 2 tasks

[FEATURE] Benchmarking memory #516

mynameisvinn opened this issue Jan 30, 2021 · 25 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@mynameisvinn
Copy link
Contributor

mynameisvinn commented Jan 30, 2021

馃毃馃毃 Feature Request

  • Related to an existing Issue
  • A new implementation (Improvement, Extension)

We should benchmark memory usage when fetching from a Hub dataset.

If your feature will improve HUB

In the near term, well-scoped memory benchmarks will assess new features. In the long term, it can be used to compare performance with other libraries such as Zarr and Tile.

Description of the possible solution

We could start with a client-side benchmark reading from a local volume, perhaps with memory-profiler.

@mynameisvinn mynameisvinn added this to Committed in Development Roadmap Jan 30, 2021
@mynameisvinn mynameisvinn added good first issue Good for newcomers help wanted Extra attention is needed labels Jan 30, 2021
@KrishnaChaitanya1
Copy link
Contributor

Hi @mynameisvinn . I am interested in doing this. However I am new to benchmarking. Can you tell me how to do this? Jakub gave some pointers as to where to start. I am going through those

@haiyangdeperci
Copy link
Contributor

@KrishnaChaitanya1 you may start from going through the documentation of the memory_profiler and applying it to Hub functions.

@KrishnaChaitanya1
Copy link
Contributor

@haiyangdeperci . Sure will check that.

@haiyangdeperci
Copy link
Contributor

Great! You might also check #520 out.

@KrishnaChaitanya1
Copy link
Contributor

@haiyangdeperci @mynameisvinn. Is this what you were expecting?
shot3

@mynameisvinn
Copy link
Contributor Author

@KrishnaChaitanya1 so far so good. Don't forget to track parameters (eg caching arguments) for reproducibility.

@KrishnaChaitanya1
Copy link
Contributor

Hi @mynameisvinn . Can you elaborate a bit?

@haiyangdeperci
Copy link
Contributor

@KrishnaChaitanya1 This is a good start. Whenever you set up an environment for benchmarking there are different conditions that can affect the measured performance. Here (I believe) @mynameisvinn wanted you to parametrize the function that you are profiling. For instance, the cache or storage_cache arguments could be False or True, and you should be mindful of both.

@mynameisvinn
Copy link
Contributor Author

@haiyangdeperci is on top of things per usual

@KrishnaChaitanya1
Copy link
Contributor

@haiyangdeperci Thanks for the info. So in a sense I should play a bit with the parameters right?

@haiyangdeperci
Copy link
Contributor

@KrishnaChaitanya1 yeah, the idea is that you design the setup in a way that the same function can be run with different parameters

@KrishnaChaitanya1
Copy link
Contributor

@haiyangdeperci Got It. I will try doing it.

@haiyangdeperci
Copy link
Contributor

Awesome, looking forward to your benchmarks

@haiyangdeperci
Copy link
Contributor

haiyangdeperci commented Feb 8, 2021

@KrishnaChaitanya1 Hi, could you give us an update on this task? Let us know if you need help.

@KrishnaChaitanya1
Copy link
Contributor

Hi @haiyangdeperci . I was busy the last week. Didnt find time to work on this. I will make some progress this week.

@haiyangdeperci
Copy link
Contributor

No worries, I was just checking what the status is 馃憤 Please take your time. If you happen to be free tomorrow, please join our benchmarking group call.

@KrishnaChaitanya1
Copy link
Contributor

Hi @haiyangdeperci . Here are some screenshots. The first one is using Hub and the second one is using tensorflow. Loaded Cifar100 on both. Check and let me know if this helps
Sho1
Shot2

@haiyangdeperci
Copy link
Contributor

Hi, thanks for helping us out. I would suggest defining parameters as variables instead of just duplicating lines. Also, in this case you should control for cache and storage_cache separately.

@KrishnaChaitanya1
Copy link
Contributor

@haiyangdeperci . Sure I will try changing them to variables. Control for cache and storage_cache in the sense, I should use them separately right and not together

@haiyangdeperci
Copy link
Contributor

@KrishnaChaitanya1 correct! Sorry if I was not precise enough

@KrishnaChaitanya1
Copy link
Contributor

Hi @haiyangdeperci . I modified the code to variables. Tested only for cache. Also added basic system information for flavor. Let me know if I should make any improvements.
shot1

@KrishnaChaitanya1
Copy link
Contributor

Hi @haiyangdeperci . Can you check the above comment and let me know your thoughts? I changed it to variables as you mentioned.

@haiyangdeperci
Copy link
Contributor

@KrishnaChaitanya1 can you send in the entire code in the form of a PR? It would be easier for us to review that. I assume you are running this with cache variable equal to False and then True. That's all right. We need to gather these results in some readable format so you may include that as well. If you need more support, come to the benchmark call tomorrow, the team and other community members will help you out.

@KrishnaChaitanya1
Copy link
Contributor

@haiyangdeperci . Created PR. #638. I have a doubt, how do you remove those merge commits?

@mynameisvinn
Copy link
Contributor Author

Closing this feature request due to inactivity and lack of interest. Will revive it if more users request it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
No open projects
Development

No branches or pull requests

3 participants