
Feature/embeddings #88

Merged
merged 25 commits into from
Nov 26, 2023
Conversation

@JackHopkins JackHopkins commented Nov 25, 2023

Adds support for returning embeddings for retrieval use-cases, using the following syntax:

@monkey.patch
def score_sentiment(input: str) -> Embedding[np.ndarray]:
    """
    Scores the input between 0-10
    """

With alignment statements defining contrastive fine-tuning cases, using the same syntax as existing align statements:

@monkey.align
def align_score_sentiment():
    assert score_sentiment("This food is disgusting") != score_sentiment("This food tastes great!")

We are storing positive and negative contrastive examples, using the new Bloom filter implementation.
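A Bloom filter lets the store cheaply check whether a contrastive example has probably been seen before, avoiding duplicate writes. A minimal sketch of the idea (hypothetical; the PR's actual `FileSystemBloomFilterPersistence` implementation differs):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: probabilistic set membership with no false negatives."""

    def __init__(self, size: int = 1024, num_hashes: int = 3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item: str):
        # Derive num_hashes independent bit positions by salting a SHA-256 hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))
```

Before persisting an aligned example, the store would check `example not in bloom` and skip the write on a hit.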

Because we can't use the patched embeddings to improve the quality of the model (AFAIK), we don't save them.

The new terminology for standard patches/align statements is Symbolic, and for embedding patches/align statements it is Embeddable.

There is now an EmbeddingModelManager to handle the creation of prompts and to generate the embeddings.
LanguageModeler is now called LanguageModelManager and it is responsible for repairing and managing the operations of the backend LLM.

High-level LLM providers can inherit from Embedding_API and/or LLM_API to expose embedding and symbolic sampling support respectively.
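This provider pattern can be sketched with abstract base classes; `Embedding_API` and `LLM_API` are the names from this PR, while the method signatures and the stub subclass below are illustrative assumptions, not the PR's actual interfaces:

```python
from abc import ABC, abstractmethod

class Embedding_API(ABC):
    """Providers supporting embedding generation implement this."""
    @abstractmethod
    def embed(self, texts):
        ...

class LLM_API(ABC):
    """Providers supporting symbolic (text) sampling implement this."""
    @abstractmethod
    def generate(self, prompt):
        ...

class StubProvider(Embedding_API, LLM_API):
    """Hypothetical provider exposing both capabilities, as OpenAI does in this PR."""
    def embed(self, texts):
        return [[0.0] for _ in texts]  # placeholder vectors

    def generate(self, prompt):
        return "stub completion"
```

A provider that only offers sampling would inherit from `LLM_API` alone, and callers can feature-test with `isinstance(provider, Embedding_API)`.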

The next step is to move the low-level OpenAI finetuning logic into something like a SymbolicFinetuningAPI, which handles training distilled models.

Similarly, we could create an EmbeddingFinetuningAPI to handle contrastive finetuning using the examples we are now collecting from Embedding functions.

Currently, only OpenAI is supported; it inherits from both Embedding_API and LLM_API.

Jack Hopkins and others added 19 commits November 24, 2023 14:08
Description: This commit refactors the monkey_patch module and persistence layer. The following files were modified:
Description: This commit updates the language modeler and embedding API in the monkey_patch package. The following files have been modified:
Commit Description: This commit fixes the load method in the `IBloomFilterPersistence` interface and the `FileSystemBloomFilterPersistence` class.
Description: This commit updates various files related to language models and embeddings. The following changes were made:
…ions with non-patched embeddable functions

@JackHopkins JackHopkins changed the title [WIP] Feature/embeddings Feature/embeddings Nov 25, 2023
@JackHopkins JackHopkins changed the title Feature/embeddings [WIP] Feature/embeddings Nov 25, 2023
@JackHopkins JackHopkins changed the title [WIP] Feature/embeddings Feature/embeddings Nov 26, 2023
@JackHopkins JackHopkins merged commit b0f14f0 into master Nov 26, 2023
@JackHopkins JackHopkins deleted the feature/embeddings branch November 26, 2023 11:40