Sentence embedding field #158

ivansmokovic · 2020-04-02T21:26:52Z

Adds a Sentence embedding field

…tenceEmbeddingField # Conflicts: # takepod/datasets/iterator.py # takepod/examples/ner_example.py # takepod/storage/field.py # takepod/storage/vocab.py # test/storage/test_field.py # test/storage/test_iterator.py # test/storage/test_vocab.py

FilipBolt · 2020-12-18T00:08:18Z

This PR looks a bit outdated at this point. However, I'd like to have this functionality easily integrated into the NumericalizerABC interface. @ivansmokovic Would you be comfortable with discarding this? I don't see SentenceEmbedding in this form long term, as I believe we need a general free-form input-output contract that can transform a single token to a single number, multiple tokens to a single number, and multiple tokens to a single vector (the SentenceEmbedding case).
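To make the three cases in the comment above concrete, here is a minimal sketch of what such a free-form contract could look like. Everything in it is an assumption for illustration: `FreeFormNumericalizer`, `VocabIndex`, `TokenCount` and `MeanEmbedding` are hypothetical names, not the actual `NumericalizerABC` API in takepod, and averaging word vectors is just one simple stand-in for a sentence embedding, not necessarily what this PR implements.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Union

# A numericalized value is either a single number or a single vector.
Numericalized = Union[int, float, List[float]]


class FreeFormNumericalizer(ABC):
    """Hypothetical contract: a sequence of tokens in, a number or vector out."""

    @abstractmethod
    def numericalize(self, tokens: Sequence[str]) -> Numericalized:
        ...


class VocabIndex(FreeFormNumericalizer):
    """Single token -> single number (its index in a fixed vocabulary)."""

    def __init__(self, itos: Sequence[str]) -> None:
        self.stoi: Dict[str, int] = {tok: i for i, tok in enumerate(itos)}

    def numericalize(self, tokens: Sequence[str]) -> int:
        (token,) = tokens  # raises ValueError unless exactly one token is given
        return self.stoi[token]


class TokenCount(FreeFormNumericalizer):
    """Multiple tokens -> single number (here simply the token count)."""

    def numericalize(self, tokens: Sequence[str]) -> int:
        return len(tokens)


class MeanEmbedding(FreeFormNumericalizer):
    """Multiple tokens -> single vector (mean of the known word vectors)."""

    def __init__(self, vectors: Dict[str, List[float]], dim: int) -> None:
        self.vectors = vectors
        self.dim = dim

    def numericalize(self, tokens: Sequence[str]) -> List[float]:
        known = [self.vectors[t] for t in tokens if t in self.vectors]
        if not known:
            # No known tokens: fall back to a zero vector of the right size.
            return [0.0] * self.dim
        # Average each dimension across the known token vectors.
        return [sum(column) / len(known) for column in zip(*known)]
```

With a single `numericalize` method, a field would not need to know which of the three cases it is dealing with; it would only need to store whatever the numericalizer returns.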

ivansmokovic added 27 commits February 19, 2020 22:52

Added per-field custom datatype support

739d3f2

WIP: TfIdfVectorizer update pending

7c6740c

Added option to define custom missing data symbol

5532206

Optimized handling of missing value rows in Iterator

280cc43

Made TfIdf vectorizer not support fields with missing data

2f505b2

Merge branches 'master' and 'missing-data-token' of github.com:FilipB…

8596aec

…olt/takepod into missing-data-token # Conflicts: # takepod/datasets/iterator.py # takepod/storage/field.py # test/storage/test_iterator.py

Merge remote-tracking branch 'origin/master' into missing-data-token

6080172

Added missing data support to subclasses of Field

cb833dc

Merge branches 'master' and 'missing-data-token' of github.com:FilipB…

0c20b24

…olt/takepod into missing-data-token

Fixed a test

00dc2fc

Added custom padding token to field for use with custom_numericalize

c4a1184

flake8

a51b3b4

WIP, testing

5bff1b0

flake8

feeac4e

Added documentation

c119625

Added per-field custom datatype support

c5fa93e

WIP: TfIdfVectorizer update pending

3a7efc3

Added option to define custom missing data symbol

318a36b

Made TfIdf vectorizer not support fields with missing data

41b1aa7

Fixed a test

dcaba7c

Added custom padding token to field for use with custom_numericalize

09cbfb6

flake8

39f322f

WIP, testing

56fb576

flake8

f914f8f

Added documentation

37c0ed5

rebased to master

19141e0

ivansmokovic requested review from FilipBolt, mttk and sskudar April 2, 2020 21:26

ivansmokovic self-assigned this Apr 2, 2020

FilipBolt approved these changes Apr 3, 2020

takepod/storage/field.py (outdated)

test/storage/test_field.py

ivansmokovic requested a review from FilipBolt April 17, 2020 13:00

ivansmokovic added 3 commits April 17, 2020 15:02

Added language, vocab, is_target, allow_missing_data

c67546f

Merge branch 'master' of github.com:FilipBolt/takepod into SentenceEm…

66592d6

…beddingField # Conflicts: # test/storage/test_field.py

merged master

17980ba

FilipBolt approved these changes Apr 29, 2020
