
Feature/embeddings #88

Merged
merged 25 commits into from
Nov 26, 2023
Conversation

@JackHopkins JackHopkins commented Nov 25, 2023

Adds support for returning embeddings for retrieval use-cases, using the following syntax:

@monkey.patch
def score_sentiment(input: str) -> Embedding[np.ndarray]:
    """
    Scores the input between 0-10
    """

With alignment statements defining contrastive fine-tuning cases, using the same syntax as existing align statements:

@monkey.align
def align_score_sentiment():
    assert score_sentiment("This food is disgusting") != score_sentiment("This food tastes great!")

We are storing positive and negative contrastive examples, using the new Bloom filter implementation.
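A Bloom filter lets the store cheaply check whether a contrastive example has probably been seen before, avoiding duplicate writes. A minimal sketch of the idea (hypothetical; the PR's actual `FileSystemBloomFilterPersistence` implementation differs):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: probabilistic set membership with no false negatives."""

    def __init__(self, size: int = 1024, num_hashes: int = 3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item: str):
        # Derive num_hashes independent bit positions by salting a SHA-256 hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))
```

Before persisting an aligned example, the store would check `example not in bloom` and skip the write on a hit.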

Because we can't use the patched embeddings to improve the quality of the model (AFAIK), we don't save them.

The new terminology for standard patches/align statements is Symbolic, and for embedding patches/align statements it is Embeddable.

There is now an EmbeddingModelManager to handle the creation of prompts and to generate the embeddings.
LanguageModeler is now called LanguageModelManager and it is responsible for repairing and managing the operations of the backend LLM.

High-level LLM providers can inherit from Embedding_API and/or LLM_API to expose embedding and symbolic sampling support respectively.
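This provider pattern can be sketched with abstract base classes; `Embedding_API` and `LLM_API` are the names from this PR, while the method signatures and the stub subclass below are illustrative assumptions, not the PR's actual interfaces:

```python
from abc import ABC, abstractmethod

class Embedding_API(ABC):
    """Providers supporting embedding generation implement this."""
    @abstractmethod
    def embed(self, texts):
        ...

class LLM_API(ABC):
    """Providers supporting symbolic (text) sampling implement this."""
    @abstractmethod
    def generate(self, prompt):
        ...

class StubProvider(Embedding_API, LLM_API):
    """Hypothetical provider exposing both capabilities, as OpenAI does in this PR."""
    def embed(self, texts):
        return [[0.0] for _ in texts]  # placeholder vectors

    def generate(self, prompt):
        return "stub completion"
```

A provider that only offers sampling would inherit from `LLM_API` alone, and callers can feature-test with `isinstance(provider, Embedding_API)`.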

The next step is to move the low-level OpenAI finetuning logic into something like a SymbolicFinetuningAPI, which handles training distilled models.

Similarly, we could create an EmbeddingFinetuningAPI to handle contrastive finetuning using the examples we are now collecting from Embedding functions.

Currently, only OpenAI is supported; it inherits from both Embedding_API and LLM_API.

Jack Hopkins and others added 19 commits November 24, 2023 14:08
Description: This commit refactors the monkey_patch module and persistence layer. The following files were modified:
Description: This commit updates the language modeler and embedding API in the monkey_patch package. The following files have been modified:
Commit Description: This commit fixes the load method in the `IBloomFilterPersistence` interface and the `FileSystemBloomFilterPersistence` class.
Description: This commit updates various files related to language models and embeddings. The following changes were made:
…ions with non-patched embeddable functions

@JackHopkins JackHopkins changed the title [WIP] Feature/embeddings Feature/embeddings Nov 25, 2023
@JackHopkins JackHopkins changed the title Feature/embeddings [WIP] Feature/embeddings Nov 25, 2023
@JackHopkins JackHopkins changed the title [WIP] Feature/embeddings Feature/embeddings Nov 26, 2023
@JackHopkins JackHopkins merged commit b0f14f0 into master Nov 26, 2023
@JackHopkins JackHopkins deleted the feature/embeddings branch November 26, 2023 11:40