Creating unique identifier in the template.yaml #29

arnaudstiegler · 2021-05-26T13:22:25Z

For now, it looks like we can sort of uniquely identify each template using a combination of template name and dataset name, but I'm expecting potential collisions when a lot of people start contributing. Besides, naming each template might not be useful (like if we end up with names like template1 template2 etc...), and it would help contributors if they don't have to add a name/check conflicts on the naming part before merging their template.yaml.

I was thinking that we could add an ID to each entry by getting the hash of timestamp + dataset + string of prompt python function or jinja template? That should be more than enough to prevent collisions

The text was updated successfully, but these errors were encountered:

shanyas10 · 2021-05-26T14:22:43Z

As an extension, I think we can also keep a user from saving an empty template (currently we can save a template right after entering the name). However, @arnaudstiegler if we remove the template name, do you think it can make it difficult for one to trace back to the template they created? As an alternative, I think we can have the author's name saved in the template so that it's easier to filter but I'm sure there are other better ways.

arnaudstiegler · 2021-05-26T15:41:22Z

For sure keeping the template name is better to retrieve some templates.
Adding the author name is also useful, and as you mentioned probably a better way to filter out templates.

That being said, I believe the template name should not be used as a database key here:

so that we don't have to force providing a name (no need for uninformative names, contributors don't have to add them)
so that we don't have to deal with collisions in names (i.e 2 templates with same name will have different uuid anyway)

VictorSanh · 2021-05-26T18:53:10Z

Actually, I want us to separate templates for each dataset. You would have as many templates.yaml as you have datasets (i.e. under dataset_name/templates.yaml).

There are two reasons for that (as discussed in #23 ):
1/ once we scale to 50 people working in //, it'll be easier and more parallelizable to have one dataset assigned to one person
2/ easier to set up some sort of review via PRs (one pr = one prompted dataset) and track down who's done what and avoid as much as possible merge conflicts
We did that for the dataset sprint and it worked quite well to maximize the output of each participant.

The author's name would be naturally tracked on Github via the commit history.

On the template name, I think we should autocomplete (template5) it by default if the user doesn't provide it

If someone wants to take these action items, please do!

stephenbach · 2021-05-26T19:22:53Z

I'll get to it tonight if no one claims it before then!

arnaudstiegler · 2021-05-26T19:54:08Z

Yes, if one person is assigned to one dataset, there's a lot less risks of conflict and we don't really have to deal with that at all.
@stephenbach Happy to work on that if you want

stephenbach · 2021-05-26T19:57:19Z

Sure, help is always appreciated! I'll tag you too. I was thinking we'd just modify TemplateCollection to read and write from the directory. We could add some code to handle modifications to avoid rewriting files that haven't been modified. Maybe read everything once and only write/reread files that are modified? Not sure if that's doable with the @st.cache annotation.

arnaudstiegler · 2021-05-26T20:11:41Z

yes, that makes sense.
I'm not super familiar with the repo yet, but would be nice to could create a helper class, init one object per dataset that will link to the corresponding template.yaml. So that we would read/write with the object corresponding to the dataset
TemplateCollection would be reading all files / init those helper classes

stephenbach · 2021-05-26T20:29:30Z

Yup, that could work nicely! The only reason I can see that we need to read them all is because we allow users to filter based on how many templates there are for a dataset. So another way you could do it is read that info once and only then touch the modified files after that. Not sure which is better.

arnaudstiegler · 2021-05-27T13:25:43Z

Have a draft for that, will open a PR by end of day to get some feedback.

It does one full-read, and selected writes on modified files. However, it does require to keep in mem all templates from all datasets (we could circumvent that, but I don't think it's worth the effort).
As indicated by Victor, I changed the templates folder so that it contains one folder per dataset, and if applicable subfolders for different config. Each folder/subfolder has its own template.yaml

stephenbach · 2021-05-28T19:17:42Z

Closing for now since this was essentially resolved by #34, even though we didn't actually create unique IDs.

VictorSanh · 2021-05-28T21:29:06Z

After talking with @arnaudstiegler, I came to the realization that there is another advantage to using uids.
Right now it's a pain in the *ss to change the name of a template through the app after it's created (you basically need to delete it and create a new one, or directly edit the YAML file). IDs would bypass this issue since you could have a text box with the template name, independently of the ID.
This is very minor, but feel free to open a PR introducing this change!
Sorry to have derailed your initial suggestion

arnaudstiegler · 2021-05-28T22:34:09Z

Sure, I can open a PR that will:

assign a uuid to each template that is created
use the uuid as key to retrieve templates
display the template name on the UI, and allow updates

stephenbach self-assigned this May 26, 2021

stephenbach added the enhancement New feature or request label May 26, 2021

stephenbach assigned arnaudstiegler May 26, 2021

arnaudstiegler mentioned this issue May 28, 2021

Refactoring TemplateCollection to allow for one template.yaml per dataset #34

Merged

stephenbach closed this as completed May 28, 2021

arnaudstiegler mentioned this issue May 31, 2021

Add uuid to each template | allow for name updates #43

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Creating unique identifier in the template.yaml #29

Creating unique identifier in the template.yaml #29

arnaudstiegler commented May 26, 2021

shanyas10 commented May 26, 2021

arnaudstiegler commented May 26, 2021

VictorSanh commented May 26, 2021 •

edited

stephenbach commented May 26, 2021

arnaudstiegler commented May 26, 2021

stephenbach commented May 26, 2021

arnaudstiegler commented May 26, 2021 •

edited

stephenbach commented May 26, 2021

arnaudstiegler commented May 27, 2021

stephenbach commented May 28, 2021

VictorSanh commented May 28, 2021

arnaudstiegler commented May 28, 2021

Creating unique identifier in the template.yaml #29

Creating unique identifier in the template.yaml #29

Comments

arnaudstiegler commented May 26, 2021

shanyas10 commented May 26, 2021

arnaudstiegler commented May 26, 2021

VictorSanh commented May 26, 2021 • edited

stephenbach commented May 26, 2021

arnaudstiegler commented May 26, 2021

stephenbach commented May 26, 2021

arnaudstiegler commented May 26, 2021 • edited

stephenbach commented May 26, 2021

arnaudstiegler commented May 27, 2021

stephenbach commented May 28, 2021

VictorSanh commented May 28, 2021

arnaudstiegler commented May 28, 2021

VictorSanh commented May 26, 2021 •

edited

arnaudstiegler commented May 26, 2021 •

edited