[Epic] Data Augmentation Issue Overview and Discussions #55

Closed
6 tasks
tianjianjiang opened this issue Jun 3, 2021 · 2 comments

@tianjianjiang
Contributor

tianjianjiang commented Jun 3, 2021

@tianjianjiang (Mike) hopes that this ticket can serve as an Epic issue covering potential data augmentation tasks.
Please note that Mike is merely the first member to work on data augmentation and is by no means an authority on the topic. Feel free to participate in any way you can.

Issues (PRs)

Request for Proposal

Although this section is titled "Request for Proposal" (https://en.wikipedia.org/wiki/Request_for_proposal), it isn't anywhere near a formal RFP yet. Please don't hesitate to edit or comment.

Purpose

Diversify our prompts and generalize our models.
(For example, see this comment: #52 (comment).)

Scopes

  1. Prompt template
  2. (Possibly in the future) perturbation or more complex approaches for training/fine-tuning, e.g., papers of prompt/prefix tuning.

Implementation Plans

Prompt template

Expected input/output in Jinja

Presumably the label names (world politics, sports, business, and science and technology) must remain intact when a prompt is reworded; only the surrounding phrasing should change.

  • Input
{{text}} 
Is this a piece of news regarding world politics, sports, business, or science and technology? ||| 
{{ ["World politics", "Sports", "Business", "Science and technology"][label] }}
  • Output
{{text}} 
Is this news about world politics, sports, business, or science and technology? ||| 
{{ ["World politics", "Sports", "Business", "Science and technology"][label] }}
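As a rough sketch of how that constraint could be checked, the Python below compares an original template against a reworded one. It assumes that `|||` separates the input from the target, as the templates above suggest, and that a valid augmentation keeps the target unchanged and keeps every label name in the input. The names `PROTECTED_TERMS`, `split_prompt`, `missing_terms`, and `is_valid_augmentation` are hypothetical and not part of any existing pipeline.

```python
# Hypothetical check that a reworded prompt template keeps the label names.
# Assumption: "|||" separates the model input from the expected target.

PROTECTED_TERMS = ["world politics", "sports", "business", "science and technology"]

ORIGINAL = (
    "{{text}} \n"
    "Is this a piece of news regarding world politics, sports, business, "
    "or science and technology? ||| \n"
    '{{ ["World politics", "Sports", "Business", "Science and technology"][label] }}'
)

AUGMENTED = (
    "{{text}} \n"
    "Is this news about world politics, sports, business, "
    "or science and technology? ||| \n"
    '{{ ["World politics", "Sports", "Business", "Science and technology"][label] }}'
)


def split_prompt(template):
    """Split a template into (input, target) at the first '|||' separator."""
    source, _, target = template.partition("|||")
    return source.strip(), target.strip()


def missing_terms(template, terms=PROTECTED_TERMS):
    """Return the protected terms that no longer appear in the input part."""
    source, _ = split_prompt(template)
    lowered = source.lower()
    return [term for term in terms if term not in lowered]


def is_valid_augmentation(original, augmented):
    """True if the target is unchanged and no protected term was dropped."""
    same_target = split_prompt(original)[1] == split_prompt(augmented)[1]
    return same_target and not missing_terms(augmented)


if __name__ == "__main__":
    print(is_valid_augmentation(ORIGINAL, AUGMENTED))
    print(missing_terms("{{text}} Is this about global affairs or sports? ||| x"))
```

A plain substring match is the simplest possible check. A real pipeline would more likely protect the terms before paraphrasing rather than validate afterwards, so this is only a sanity test.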

TBD

@VictorSanh
Member

VictorSanh commented Jun 3, 2021

this plan looks good to me! thanks @tianjianjiang

Since we are in agreement that protecting text with jinja is a good solution (#49 (comment)) let's go with that.

we can discuss directly on issues/pull requests. I'll take a closer look at your PR later today.

(note that we'll start the prompt sourcing sprint independently of the augmentation pipeline since the two are almost orthogonal and can be worked on in parallel)

@stephenbach
Member

Closing due to inactivity. Feel free to reopen if you want to revisit this!


3 participants