Directory Structures, DataFrame Page, & Others #3142

Open
u3Izx9ql7vW4 opened this issue Apr 28, 2024 · 0 comments
Labels
type / enhancement Issue type: new feature or request

Comments

u3Izx9ql7vW4 commented Apr 28, 2024

🚀 Feature

Hi, I've been using Aim pretty extensively almost every day for the last couple of weeks. Overall the experience has been really great -- orders of magnitude better than MLFlow, which is what I was using before. The UI is particularly well-suited for ablation studies.

That said, I've noticed a few pain points and have turned them into feature requests.

Runs/Metrics Page: Organization
Some kind of directory structure that allows developers to organize runs. For example, I might want to organize my runs like so:

ModelA/
|   ModelA-VariationB/
|   |   run 2l3krjf9...
|   |   run sd0f9j3...
AblationTest2024/
|   FeatureX/
|   |   run 95zkhe...
|   |   run priwnc3...
DatasetTests/
|   DatasetX/
|   |   DatasetX-v2/
|   |   |  run bby47z...
|   |   |  run we5b6n..
Adhoc/
...

To take the directory analogy further, it would be great to be able to copy/move/paste runs.

It doesn't have to be modelled as a directory data structure behind the scenes. It could be something like AWS S3 "directories": a flat container of files where the illusion of directories is created with file names like dirA/subdirB/file, and the directory structure only appears in the UI. I think there are packages that do this for you, maybe s3fs, but this isn't my area of expertise.
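To make the flat-namespace idea concrete, here is a minimal sketch in plain Python. It is not Aim's API: the `list_dir` helper, the path strings, and the run hashes are all hypothetical, and it only assumes that each run could carry a slash-delimited path string.

```python
from typing import Dict, List, Tuple


def list_dir(run_paths: Dict[str, str], prefix: str = "") -> Tuple[List[str], List[str]]:
    """Return (subdirectories, run hashes) found directly under `prefix`.

    `run_paths` maps a run hash to a slash-delimited path string, the way
    an S3 key encodes "directories". No real tree is stored anywhere.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    subdirs, runs = set(), []
    for run_hash, path in run_paths.items():
        # Build the full flat key, e.g. "ModelA/ModelA-VariationB/run-a1".
        key = f"{path}/{run_hash}" if path else run_hash
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition("/")
        if sep:
            subdirs.add(head)   # something deeper lives under this name
        else:
            runs.append(head)   # a run sitting directly in this "directory"
    return sorted(subdirs), sorted(runs)


# Hypothetical runs and paths, for illustration only.
run_paths = {
    "run-a1": "ModelA/ModelA-VariationB",
    "run-a2": "ModelA/ModelA-VariationB",
    "run-b1": "AblationTest2024/FeatureX",
    "run-c1": "Adhoc",
}

top_level_dirs, top_level_runs = list_dir(run_paths)
variation_dirs, variation_runs = list_dir(run_paths, "ModelA/ModelA-VariationB")
```

Under this scheme, moving a run is just rewriting its path string (for example `run_paths["run-c1"] = "ModelA"`), and copying would mean registering the same run under a second path, so no data has to be relocated.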

Metrics
It would be great if the plot could show the run name or some kind of identifier when I hover over a line. Right now, when I display metrics on a graph, it shows a bunch of lines, and I have to look down at the bottom to see the color coding to know which run each line corresponds to.
[Screenshot 2024-04-28 at 14 14 44]

It would also be nice to be able to order the plots on this page. I think they're alphabetical at the moment. Usually I'm displaying 3+ metrics simultaneously, and they end up ordered from least important to most important, with the most important being dead last. This is a bit of a drag.

Scatter Plot

  • Show step-level data on the scatter plot page, like on the Metrics page.
  • Have a color gradient associated with data points, e.g. run-abc: step 1 = light blue, step 300 = dark blue; run-def: step 1 = light orange, step 300 = dark orange, etc.
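As a rough illustration of the gradient idea, the sketch below linearly interpolates between a light and a dark shade per run. Everything here is an assumption for illustration: `step_color` is a hypothetical helper, the RGB values are the standard CSS "lightblue" and "darkblue" colors, and steps are assumed to be numbered from 1.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def step_color(step: int, max_step: int, light: RGB, dark: RGB) -> str:
    """Map a 1-based step to a hex color between `light` and `dark`."""
    # Fraction of the way through the run, clamped to [0, 1].
    t = 0.0 if max_step <= 1 else (step - 1) / (max_step - 1)
    t = min(max(t, 0.0), 1.0)
    # Interpolate each RGB channel independently.
    r, g, b = (round(lo + (hi - lo) * t) for lo, hi in zip(light, dark))
    return f"#{r:02x}{g:02x}{b:02x}"


# CSS "lightblue" and "darkblue"; one such pair would be picked per run.
LIGHT_BLUE, DARK_BLUE = (173, 216, 230), (0, 0, 139)

first = step_color(1, 300, LIGHT_BLUE, DARK_BLUE)    # lightest shade
last = step_color(300, 300, LIGHT_BLUE, DARK_BLUE)   # darkest shade
```

Each run would keep its own hue (blue, orange, and so on), with only the shade varying by step, so runs stay distinguishable while training progress is still readable.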

DataFrames Page
Have a page dedicated to seeing differences between the DataFrames of runs. There's already a really powerful "Show Table Diffs" feature, and this would be a killer feature to have when comparing DataFrames. It would also be nice to apply filters and display simple aggregates for the filtered data, like count, mean / median, and standard deviation (e.g., formatted like count=50, mean=32 +/- 2.5), on the bottom right like in Excel. Finally, the option to download the data as a CSV.
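The filter-then-summarize behaviour could look roughly like the sketch below. It uses only the Python standard library rather than any Aim or pandas API, and the `summarize` and `to_csv` helpers, the column names, and the sample rows are all made up for illustration.

```python
import csv
import io
import statistics
from typing import Callable, Dict, List

Row = Dict[str, float]


def summarize(rows: List[Row], column: str, keep: Callable[[Row], bool]) -> str:
    """Filter rows with `keep`, then format count, mean, and sample std-dev."""
    values = [row[column] for row in rows if keep(row)]
    if not values:
        return "count=0"
    mean = statistics.mean(values)
    # The sample standard deviation needs at least two values.
    spread = statistics.stdev(values) if len(values) > 1 else 0.0
    return f"count={len(values)}, mean={mean:.2f} +/- {spread:.2f}"


def to_csv(rows: List[Row]) -> str:
    """Serialize rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical prediction-vs-target table logged by a run.
rows = [
    {"target": 10.0, "prediction": 11.0},
    {"target": 20.0, "prediction": 14.0},
    {"target": 30.0, "prediction": 38.0},
]

# Keep only rows where the model was off by more than 5.
large_errors = summarize(
    rows, "prediction", lambda r: abs(r["prediction"] - r["target"]) > 5
)
csv_text = to_csv(rows)
```

The summary string follows the count / mean +/- standard deviation format suggested above, and `csv_text` is what a "download as CSV" button would hand back.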

Getting started guide
I don't recall the getting started guide mentioning that I needed to specify a repo parameter in the aim.Run function. I got stuck on this and almost gave up because I couldn't get the simple example to show up as a run in the UI.


Motivation

Organization
While the product is marketed for large quantities of runs, I'm finding it a little difficult to keep track of everything. Right now I have about 150 unarchived runs, and it's very quickly becoming a sea. Most of my runs are batched into themes, such as testing out a new variation of an existing model, seeing the effect of adding/removing a feature, etc. But these "themes" vary widely, hence the request for user-defined directories.

The copy/paste request is related to benchmarking. If I want to create a new folder, I usually want to copy over a benchmark from a previous run, rather than re-running the benchmark every time I want to do a new set of experiments.

The move request relates to reorganization. If I find that I have too many folders and they all fall under a unified theme, then it would be really useful to nest them.

Scatter Plot
The scatter plot doesn't show step-level data like the Metrics page does. For example, if I have a run with 100 logged data points as metrics, the Metrics page will show the 100 data points corresponding to each step, but the scatter page only shows one data point (presumably the last one).

DataFrames Page
Manual examination of DataFrames is a pretty sizeable component of machine learning / statistical analysis. It's one thing to see aggregate statistics like RMSE, F1 score, loss, etc.; it's another to see where the model made its mistakes and how large those mistakes were. Often the latter exercise is much more informative.

One concrete use case is viewing prediction versus target, then applying filters to see where the model got things really wrong, and being able to compare across models, features, etc. So as far as artifacts go, this is a pretty important one.

u3Izx9ql7vW4 added the "type / enhancement" label (Issue type: new feature or request) Apr 28, 2024
u3Izx9ql7vW4 changed the title from "What I want for Christmas (batch of feature requests)" to "Directory Structures, DataFrame Page, & Others" Apr 29, 2024