---
title: Comparing Marvin
---
[Marvin](https://github.com/PrefectHQ/marvin) lets developers run extraction or classification tasks in Python (TypeScript is not supported), as shown below:

```python
import marvin
import pydantic

class Location(pydantic.BaseModel):
    city: str
    state: str

marvin.extract("I moved from NY to CHI", target=Location)
```
You can also provide instructions:
```python
marvin.extract(
    "I paid $10 for 3 tacos and got a dollar and 25 cents back.",
    target=float,
    instructions="Only extract money",
)
# [10.0, 1.25]
```
or use enums to classify:

```python
from enum import Enum

import marvin

class RequestType(Enum):
    SUPPORT = "support request"
    ACCOUNT = "account issue"
    INQUIRY = "general inquiry"

request = marvin.classify("Reset my password", RequestType)
assert request == RequestType.ACCOUNT
```
For enum classification, you can attach extra instructions to each label, but then you lose fully typed outputs and can no longer reuse the enum in your own code. You're back to working with raw strings.
```python
# Classifying a task based on project specifications
project_specs = {
    "Frontend": "Tasks involving UI design, CSS, and JavaScript.",
    "Backend": "Tasks related to server, database, and application logic.",
    "DevOps": "Tasks involving deployment, CI/CD, and server maintenance.",
}

task_description = "Set up the server for the new application."

task_category = marvin.classify(
    task_description,
    labels=list(project_specs.keys()),
    instructions="Match the task to the project category based on the provided specifications.",
)
assert task_category == "Backend"
```
Marvin has some inherent limitations. For example:
1. How do you use a different model?
2. What is the full prompt? Where does it live? What if you want to change it because it doesn't work well for your use case? How many tokens is it?
3. How do you test this function?
4. How do you visualize results over time in production?
### Using BAML
Here is the BAML equivalent of this classification task, based on the prompt Marvin uses under the hood. Note how BAML makes the prompt transparent to you. You can easily make it more complex or simpler depending on the model.
```baml
enum RequestType {
  SUPPORT @alias("support request")
  ACCOUNT @alias("account issue") @description("A detailed description")
  INQUIRY @alias("general inquiry")
}

function ClassifyRequest(input: string) -> RequestType {
  client GPT4 // choose even open-source models
  prompt #"
    You are an expert classifier that always maintains as much semantic meaning
    as possible when labeling text. Classify the provided data,
    text, or information as one of the provided labels:

    TEXT:
    ---
    {{ input }}
    ---

    {{ ctx.output_format }}

    The best label for the text is:
  "#
}
```
And you can call this function in your code:

```python
from baml_client import baml as b
...
request_type = await b.ClassifyRequest("Reset my password")
# fully typed output
assert request_type == RequestType.ACCOUNT
```
### The bottom line
Marvin was a big source of inspiration for us: their approach is simple and elegant for quick Python prototypes.
**BAML's advantages over Marvin:**
- **Prompt transparency** - See and control exactly what's sent to the LLM
- **Multi-language support** - Call the same functions from Python, TypeScript, Java, and Go, not just Python
- **Model flexibility** - Use any provider (OpenAI, Claude, Gemini, open-source)
- **Real testing** - Test in VSCode without API calls or burning tokens
- **Production features** - Built-in retries, fallbacks, streaming, error handling
- **Better type system** - Enums with descriptions, aliases, complex nested types
- **Cost optimization** - See token usage and optimize prompts
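
As a sketch of the testing workflow (assuming the `ClassifyRequest` function above; exact syntax may vary across BAML versions), tests are declared next to the function and run from the VSCode playground:

```baml
// Runs against ClassifyRequest from the playground, no app code needed
test ResetPassword {
  functions [ClassifyRequest]
  args {
    input "Reset my password"
  }
}
```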
**What this means for your applications:**
- **Faster development** - Test and iterate on prompts instantly
- **Better reliability** - Handle edge cases and model failures automatically
- **Multi-language teams** - Same logic works in Python, TypeScript, and more
- **Production readiness** - Built-in observability and error handling
- **Model independence** - Never get locked into one provider
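
On model independence: switching providers is a change to the client definition, not to your application code. A minimal sketch (the client names and model strings here are illustrative, and assume API keys in the environment):

```baml
client<llm> GPT4 {
  provider openai
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
  }
}

// Point ClassifyRequest at this client instead to switch providers
client<llm> Claude {
  provider anthropic
  options {
    model "claude-3-5-sonnet-20241022"
    api_key env.ANTHROPIC_API_KEY
  }
}
```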
**Marvin is great for:** Quick Python prototypes, simple one-off tasks
**BAML is great for:** Production applications, multi-language teams, complex workflows
We recommend checking out Marvin if you're just starting with prompt engineering or need a quick Python solution. But if you're building production applications that need reliability, observability, and multi-language support, [try BAML](https://docs.boundaryml.com).
### Limitations of BAML
BAML does have some limitations we are continuously working on. Here are a few of them:
1. It is a new language. However, it is fully open source, and getting started takes less than 10 minutes. We are on-call 24/7 to help with any issues (and even provide prompt-engineering tips).
2. Developing requires VSCode. You _could_ use vim, and we have workarounds, but we don't recommend it.