Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating unique identifier in the template.yaml #29

Closed
arnaudstiegler opened this issue May 26, 2021 · 12 comments
Closed

Creating unique identifier in the template.yaml #29

arnaudstiegler opened this issue May 26, 2021 · 12 comments
Assignees
Labels
enhancement New feature or request

Comments

@arnaudstiegler
Copy link
Contributor

For now, it looks like we can sort of uniquely identify each template using a combination of template name and dataset name, but I'm expecting potential collisions when a lot of people start contributing. Besides, naming each template might not be useful (like if we end up with names like template1 template2 etc...), and it would help contributors if they don't have to add a name/check conflicts on the naming part before merging their template.yaml.

I was thinking that we could add an ID to each entry by getting the hash of timestamp + dataset + string of prompt python function or jinja template? That should be more than enough to prevent collisions

@shanyas10
Copy link
Contributor

As an extension, I think we can also keep a user from saving an empty template (currently we can save a template right after entering the name). However, @arnaudstiegler if we remove the template name, do you think it can make it difficult for one to trace back to the template they created? As an alternative, I think we can have the author's name saved in the template so that it's easier to filter but I'm sure there are other better ways.

@arnaudstiegler
Copy link
Contributor Author

For sure keeping the template name is better to retrieve some templates.
Adding the author name is also useful, and as you mentioned probably a better way to filter out templates.

That being said, I believe the template name should not be used as a database key here:

  • so that we don't have to force providing a name (no need for uninformative names, contributors don't have to add them)
  • so that we don't have to deal with collisions in names (i.e 2 templates with same name will have different uuid anyway)

@VictorSanh
Copy link
Member

VictorSanh commented May 26, 2021

Actually, I want us to separate templates for each dataset. You would have as many templates.yaml as you have datasets (i.e. under dataset_name/templates.yaml).

There are two reasons for that (as discussed in #23 ):
1/ once we scale to 50 people working in //, it'll be easier and more parallelizable to have one dataset assigned to one person
2/ easier to set up some sort of review via PRs (one pr = one prompted dataset) and track down who's done what and avoid as much as possible merge conflicts
We did that for the dataset sprint and it worked quite well to maximize the output of each participant.

The author's name would be naturally tracked on Github via the commit history.

On the template name, I think we should autocomplete (template5) it by default if the user doesn't provide it

If someone wants to take these action items, please do!

@stephenbach
Copy link
Member

I'll get to it tonight if no one claims it before then!

@stephenbach stephenbach self-assigned this May 26, 2021
@stephenbach stephenbach added the enhancement New feature or request label May 26, 2021
@arnaudstiegler
Copy link
Contributor Author

Yes, if one person is assigned to one dataset, there's a lot less risks of conflict and we don't really have to deal with that at all.
@stephenbach Happy to work on that if you want

@stephenbach
Copy link
Member

Sure, help is always appreciated! I'll tag you too. I was thinking we'd just modify TemplateCollection to read and write from the directory. We could add some code to handle modifications to avoid rewriting files that haven't been modified. Maybe read everything once and only write/reread files that are modified? Not sure if that's doable with the @st.cache annotation.

@arnaudstiegler
Copy link
Contributor Author

arnaudstiegler commented May 26, 2021

yes, that makes sense.
I'm not super familiar with the repo yet, but would be nice to could create a helper class, init one object per dataset that will link to the corresponding template.yaml. So that we would read/write with the object corresponding to the dataset
TemplateCollection would be reading all files / init those helper classes

@stephenbach
Copy link
Member

Yup, that could work nicely! The only reason I can see that we need to read them all is because we allow users to filter based on how many templates there are for a dataset. So another way you could do it is read that info once and only then touch the modified files after that. Not sure which is better.

@arnaudstiegler
Copy link
Contributor Author

Have a draft for that, will open a PR by end of day to get some feedback.

It does one full-read, and selected writes on modified files. However, it does require to keep in mem all templates from all datasets (we could circumvent that, but I don't think it's worth the effort).
As indicated by Victor, I changed the templates folder so that it contains one folder per dataset, and if applicable subfolders for different config. Each folder/subfolder has its own template.yaml

@stephenbach
Copy link
Member

Closing for now since this was essentially resolved by #34, even though we didn't actually create unique IDs.

@VictorSanh
Copy link
Member

After talking with @arnaudstiegler, I came to the realization that there is another advantage to using uids.
Right now it's a pain in the *ss to change the name of a template through the app after it's created (you basically need to delete it and create a new one, or directly edit the YAML file). IDs would bypass this issue since you could have a text box with the template name, independently of the ID.
This is very minor, but feel free to open a PR introducing this change!
Sorry to have derailed your initial suggestion

@arnaudstiegler
Copy link
Contributor Author

Sure, I can open a PR that will:

  • assign a uuid to each template that is created
  • use the uuid as key to retrieve templates
  • display the template name on the UI, and allow updates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants