Classification weak learners should allow more than one input feature #14

agravier · 2022-07-20T15:12:18Z

Hi team,

I decided to give Refinery a try on a classification problem with more than one input feature, where the idea is to classify the combination of features into a few categories.

To give an example of a similar problem, imagine an oxymoron classification task with 2 input features, word_a and word_b, and a binary class output: is_oxymoron or not_oxymoron.

The problem I have is that the two features, or their embeddings, are useless in isolation; it's their interaction that counts. But in every weak learner option, I apparently must choose either feature a or feature b; I can't use both. Am I misunderstanding something? This could be something I don't understand in the UI.
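To make the "useless in isolation" point concrete, here is a minimal sketch with hypothetical toy data (the word pairs and the helper `label_counts_by` are made up for illustration and have nothing to do with refinery's API). Grouping by either single feature gives an even split of labels, while grouping by the pair separates them completely:

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (word_a, word_b, label). Not taken from refinery.
RECORDS = [
    ("hot", "ice", "is_oxymoron"),
    ("hot", "fire", "not_oxymoron"),
    ("cold", "fire", "is_oxymoron"),
    ("cold", "ice", "not_oxymoron"),
]


def label_counts_by(key_fn, records):
    """Group records by key_fn(word_a, word_b) and count labels per group."""
    counts = defaultdict(Counter)
    for word_a, word_b, label in records:
        counts[key_fn(word_a, word_b)][label] += 1
    return {key: dict(counter) for key, counter in counts.items()}


# Each single feature sees both labels equally often, so it carries no signal.
by_word_a = label_counts_by(lambda a, b: a, RECORDS)
by_word_b = label_counts_by(lambda a, b: b, RECORDS)

# The pair of features determines the label uniquely in this toy data.
by_pair = label_counts_by(lambda a, b: (a, b), RECORDS)

print(by_word_a)
print(by_word_b)
print(by_pair)
```

In this toy setup, a learner restricted to word_a or word_b alone cannot do better than chance, which is why a way to feed both features (or a derived combination) into one learner matters.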

Also, I would expect to be able to transform the input data with my own functions and use that as input as well; although not ideal, this could be used to work around the limitation of one input feature per learner.

Otherwise, it looks good and the UI is rather well-organised.

JWittmeyer · 2022-07-21T07:22:54Z

Hi @agravier,

Thank you for reaching out and for your feedback. You are right, neither option (1. multi-attribute embeddings, 2. "calculated" columns) is part of our current UI. Calculated columns are on our roadmap for 2022.

jhoetter · 2022-07-21T07:24:51Z

Hi! That point is 100% valid, and we thought about it too. We're thinking about the following, and I'd be curious what you think about it:

Currently, you have one programming interface, in the heuristics section.
In the near future (Q4), you'll be able to use a similar programming interface to write computed attributes, e.g.

def word_a_cat_word_b(record):
    return str(record["word_a"]) + str(record["word_b"])

Also, we're continuing our work on our embedder library. Here, again, we want to provide a programmatic interface that works similarly to the active learning templates, but with which you can compute your very own customized (and finetuned) embeddings, e.g.

from embedders.classification.contextual import TransformerSentenceEmbedder
def classification_word_a_cat_word_b_distilbert(record):
    embedder = TransformerSentenceEmbedder("distilbert-base-cased")
    return embedder.fit_transform(record["word_a_cat_word_b"], record["is_oxymoron"])

Of course, I'm not 100% sure about the exact interface here, but that is the general idea.

And thanks for trying out refinery, means a lot! :)

agravier · 2022-07-21T13:25:25Z

Thanks for getting back to me @JWittmeyer and @jhoetter. Sounds good, as long as the UX is there to make all this clear. A couple of other things you may want to consider, from my trial: tabular data export (not that JSON is horrible, but the data lends itself to a tabular format) and "partially annotated input reconciliation", for when one of the columns of the imported data already contains some labels. Obviously this raises further questions to put to the user about what to do with this data, such as which annotator to assign it to.

agravier · 2022-07-21T13:26:37Z

I'll revisit in a few months, all the best, cheers!

jhoetter · 2022-07-21T14:27:55Z

Thanks for the input @agravier. We already have a format to upload existing data (https://docs.kern.ai/docs/project-creation-and-data-upload#uploading-existing-labeled-data), but I agree that this requires UX improvement. We'll work on this, and I'd be happy to have your feedback again when that's implemented :)

jhoetter · 2022-09-13T11:28:32Z

This will first be solved by implementing #40. You'll be able to modify any attribute, so in this case you could e.g. create a concatenation of word_a and word_b (similar to this):

def word_a_cat_word_b(record):
    return str(record["word_a"]) + str(record["word_b"])

Afterward, you can apply encoding to this attribute.
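As a rough illustration of that two-step flow (compute an attribute, then encode it), here is a self-contained sketch. Everything in it is an assumption for illustration: the `SEPARATOR` (the snippet above concatenates without one), the sample records, and the toy token-count encoder `encode`, which only stands in for whatever embedding refinery would actually apply.

```python
SEPARATOR = " "  # assumption: keeps the word boundary visible to the encoder


def word_a_cat_word_b(record):
    """Computed attribute: join both input features into one string."""
    return f'{record["word_a"]}{SEPARATOR}{record["word_b"]}'


def build_vocabulary(texts):
    """Map each distinct whitespace token to a stable index (sorted order)."""
    tokens = sorted({token for text in texts for token in text.split()})
    return {token: index for index, token in enumerate(tokens)}


def encode(text, vocabulary):
    """Toy stand-in for an embedding: a token-count vector over the vocabulary."""
    vector = [0] * len(vocabulary)
    for token in text.split():
        if token in vocabulary:
            vector[vocabulary[token]] += 1
    return vector


# Hypothetical records, shaped like the dicts used in the snippets above.
records = [
    {"word_a": "deafening", "word_b": "silence"},
    {"word_a": "loud", "word_b": "noise"},
]

# Step 1: add the computed attribute to every record.
for record in records:
    record["word_a_cat_word_b"] = word_a_cat_word_b(record)

# Step 2: encode the new attribute instead of a single raw feature.
vocabulary = build_vocabulary(r["word_a_cat_word_b"] for r in records)
vectors = [encode(r["word_a_cat_word_b"], vocabulary) for r in records]

for record, vector in zip(records, vectors):
    print(record["word_a_cat_word_b"], vector)
```

A count vector like this does not itself model the interaction between the two words; the point is only that a single learner now receives both features through one attribute, which a contextual embedding could then exploit.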

We'll ultimately provide an extensive interface to program embeddings, but that is a bit further down the road :)

jhoetter · 2022-10-05T09:52:44Z

@agravier This is solved with the release of version 1.3.0. You can now do attribute modifications, which allow you to then create exactly the embeddings you like. Let us know what you think :)

agravier · 2022-10-22T04:25:20Z

Thanks for the heads-up @jhoetter, I'll give it a try at the next opportunity. Cheers

jhoetter self-assigned this Jul 22, 2022

jhoetter added the enhancement New feature or request label Jul 22, 2022

jhoetter added this to Backlog in Roadmap Kern AI refinery Jul 25, 2022

jhoetter mentioned this issue Jul 25, 2022

Attribute calculation #40

Closed

jhoetter moved this from Backlog to 2022 in Roadmap Kern AI refinery Aug 3, 2022

jhoetter moved this from 2022 to Next cycle in Roadmap Kern AI refinery Aug 16, 2022

jhoetter moved this from Next cycle to Current cycle in Roadmap Kern AI refinery Sep 13, 2022

jhoetter added this to the v1.2.1 milestone Sep 13, 2022

jhoetter closed this as completed Oct 5, 2022

JWittmeyer moved this from Current cycle to Done in Roadmap Kern AI refinery Oct 5, 2022
