In [1]:
from dotenv import load_dotenv
from pydantic import Field

load_dotenv()
import marvin
from marvin.settings import temporary_settings
from typing import Annotated, List, Callable
from annotated_types import Predicate

# Final Report
Peter Yong Zhong
17-730 Prompt Engineering 
Do not redistribute 

## Abstract

## Introduction

Large Language Models (LMs) have made a significant impact on software engineering (SE) and are often touted as a revolution that will forever change how software engineers develop, test, and deploy their code. Broadly, their impact on SE has been concentrated in two areas: first, the generation of executable code[1][2], and second, the integration of LM calls into the logic of other software systems. For the latter, developers either manipulate strings manually based on a prompt template and call the foundation model APIs directly, or rely on one of the many popular frameworks to meet their LM needs. The popularity and diversity of these LM programming abstractions have also invited active research. The Language Model System Interface Model (LMSI)[3], for instance, was recently proposed to stratify these abstractions into seven layers of increasing abstraction, and its authors identify five families of LM abstractions that have emerged across popular frameworks.

Unfortunately, many of these frameworks place their emphasis on the machine learning aspects of LMs, focusing on incorporating novel prompting techniques[4], pipeline designs[5], and efficient inference[6]. In a setting where an LM is *progressively* integrated into an existing software system, we argue that framework designers should instead take a bottom-up approach, in which programming concerns and patterns are the starting point and LMs augment these patterns with powerful computational semantics for generalization and analysis.

The layer at which developers interact with LMs is almost always that of a programming language. Object Oriented Programming[7], Async Await programming[8], and Object Relational Mappers[9] are among a wealth of examples where language features have empowered developers to express their business logic more conveniently and abstractly, while the compiler "desugars" these constructs into a lower-level implementation that is not always easy to work with directly. In this project, we also explore how such language features could be designed in a post-LM environment, in particular to ascertain the challenges they introduce and the opportunities they unlock.

In this project, we augment a popular LM framework, Marvin[10], one of the few frameworks that echoes our philosophy by adopting this bottom-up approach, focused on empowering developers who "care more about *using* AI than *building* AI". It introduces a few helpful constructs for data transformation, entity extraction, classification, and AI functions. These constructs serve as the starting point for our designs.

Our contributions are focused on three areas. First, we propose and implement a set of language features empowered by LMs, and release an open source implementation of these features under the Apache license. These features focus on Higher Order Types and Functions, Contracts, and Semantic Pattern Matching. Second, we introduce the novel notion of *Natural Language Types*, which reconcile software's need for structure with the LM's tendency toward fuzziness. Lastly, we detail a novel type-driven and unit-test-driven code generation methodology that, while currently unimplemented, opens up avenues for future work.
 

## Natural Language Types

Generally speaking, a type in a programming language performs two important, interrelated tasks. First, it lays out the schema of some data model: the type specifies what information an object of that type should contain and what type, or constraint, each piece of that information should satisfy. These constraints are checked either statically or dynamically, so that when the program encounters an object of a given type, it knows how to access its fields appropriately. Second, especially in an object-oriented setting, the way programmers reason about types closely mirrors how we reason about objects in natural language, outside of programming. This intuitive understanding of types and inheritance is helpful because an LLM trained on human language tends to be able to reason about them in a similar way.

The idea behind *Natural Language Types* is simple and consists of two components. First, in a natural language environment, traditional types such as `int` and `str` may be woefully insufficient for describing more complex, fuzzy constraints. Furthermore, in a traditional type system, interdependencies between different fields are difficult to capture. The first component of Natural Language Types is therefore the ability to define both field-level and inter-field constraints in natural language. This component is crucial in later sections, as it forms one of the backbones of the Natural Language contract system.
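The two kinds of constraint just described can be made concrete with a small, library-free sketch. It is not Marvin's implementation: `NLConstraint`, `judge`, and `check_constraints` are hypothetical names, and `judge` stands in for an LM call using a hard-coded lookup so the example runs offline. Field-level constraints ride along in `Annotated` metadata, while inter-field constraints are judged against a rendering of the whole object.

```python
from dataclasses import dataclass, fields
from typing import Annotated, get_type_hints


class NLConstraint:
    """Marker placed in `Annotated` metadata to attach a natural-language field constraint."""

    def __init__(self, text: str):
        self.text = text


def judge(constraint: str, subject: str) -> bool:
    """Hypothetical stand-in for an LM deciding whether `subject` satisfies `constraint`.

    The stub hard-codes a few facts so the sketch runs without a model.
    """
    if constraint == "Plane must contain more than 1 engine":
        return subject in {"Boeing 747", "Airbus A380"}
    if constraint == "The certificate must be suitable for the plane_model":
        return "ATP" in subject
    return True  # the stub treats unknown constraints as satisfied


@dataclass
class Pilot:
    name: str
    plane_model: Annotated[str, NLConstraint("Plane must contain more than 1 engine")]
    certificate: str

    # Inter-field constraints mention several fields, so they are judged on the whole object.
    # This attribute is unannotated, so the dataclass does not treat it as a field.
    INTER_FIELD = ("The certificate must be suitable for the plane_model",)


def check_constraints(obj) -> list[str]:
    """Return the natural-language constraints that `obj` violates, field-level ones first."""
    failures = []
    hints = get_type_hints(type(obj), include_extras=True)
    for f in fields(obj):
        for meta in getattr(hints[f.name], "__metadata__", ()):
            if isinstance(meta, NLConstraint) and not judge(meta.text, str(getattr(obj, f.name))):
                failures.append(meta.text)
    rendering = ", ".join(f"{f.name}={getattr(obj, f.name)!r}" for f in fields(obj))
    for text in getattr(type(obj), "INTER_FIELD", ()):
        if not judge(text, rendering):
            failures.append(text)
    return failures
```

With this stub, `check_constraints(Pilot("Noah Tabuex", "Cessna 172", "PPL"))` reports both constraints, while an `Airbus A380` flown on an `ATP with Type Rating` passes.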
 

We can explore this idea through the example below, starting with a simple, traditional type: `Pilot`.

In [3]:
class Pilot(marvin.NaturalLangType):
    id: int
    name: str
    plane_model: str
    certificate: str
    airport: str

Pilot(id=1, name="Noah Tabuex", plane_model="Cessna 172", certificate="PPL", airport="KPIT")

Pilot(other_information=None, id=1, name='Noah Tabuex', plane_model='Cessna 172', certificate='PPL', airport='KPIT')

With Natural Language Types, we can use natural language to specify constraints. In this case, we rely on the LM's internal knowledge to validate the value: the field-level constraint is attached through `Annotated` and an `annotated_types.Predicate` wrapping `marvin.val_contract`. We use `temporary_settings` to enable contract checking. We chose this design because we argue that contracts should mainly be a development-time artifact, disabled in production to minimize performance degradation.

In [2]:
class Pilot(marvin.NaturalLangType):
    id: int
    name: str
    plane_model: Annotated[str, Predicate(marvin.val_contract("Plane must contain more than 1 engine"))]
    certificate: str
    airport: str

In [7]:
with temporary_settings(ai__text__disable_contract=False):
    Pilot(id=1, name="Noah Tabuex", plane_model="Cessna 172", certificate="PPL", airport="KPIT")

ValidationError: 1 validation error for Pilot
plane_model
  Predicate val_contract.<locals>.wrapper failed [type=predicate_failed, input_value='Cessna 172', input_type=str]

In [3]:
with temporary_settings(ai__text__disable_contract=False):
    p = Pilot(id=1, name="Noah Tabuex", plane_model="Boeing 747", certificate="PPL", airport="KPIT")
p

Pilot(other_information=None, id=1, name='Noah Tabuex', plane_model='Boeing 747', certificate='PPL', airport='KPIT')
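The decision to keep contracts off by default and enable them only inside a scoped block can be sketched with the standard library's `contextvars`. This is an illustrative analogue of the `temporary_settings(ai__text__disable_contract=False)` pattern above, not Marvin's code; `contracts_enabled`, `expensive_check`, and `checked` are hypothetical names, and `expensive_check` stands in for an LM-backed predicate.

```python
import contextvars
from contextlib import contextmanager

# Contract checking is off by default, mirroring a production configuration.
_contracts_enabled = contextvars.ContextVar("contracts_enabled", default=False)

calls = []  # records every time the (normally expensive) check actually runs


@contextmanager
def contracts_enabled(enabled: bool = True):
    """Temporarily enable (or disable) contract checking for the duration of a `with` block."""
    token = _contracts_enabled.set(enabled)
    try:
        yield
    finally:
        _contracts_enabled.reset(token)


def expensive_check(plane_model: str) -> bool:
    """Hypothetical stand-in for an LM-backed predicate such as 'more than 1 engine'."""
    calls.append(plane_model)
    return plane_model != "Cessna 172"


def checked(plane_model: str) -> str:
    """Validate only when contracts are enabled; otherwise pass the value through unchecked."""
    if _contracts_enabled.get() and not expensive_check(plane_model):
        raise ValueError(f"contract failed for {plane_model!r}")
    return plane_model
```

Outside the `with contracts_enabled():` block, `checked("Cessna 172")` returns immediately without invoking the check, which is exactly the performance argument for disabling contracts in production.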

As mentioned above, natural language constraints can also be applied at the level of the whole object. This example also demonstrates natural language inheritance. In conventional programming, inheritance usually refines a type by introducing additional fields and restricting behaviors. However, a refinement can equally take the form of additional restrictions or constraints on the existing fields.


In [2]:
class Pilot(marvin.NaturalLangType):
    id: int
    name: str
    plane_model: str
    certificate: str
    airport: str

In [4]:
from typing import List


class AdvancedPilot(Pilot):
    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        existing = super().natural_lang_constraints()
        new_constraints = [
            "The certificate should allow pilot to fly for compensation and is suitable for the plane_model"
        ]
        return existing + new_constraints

In [4]:
with temporary_settings(ai__text__disable_contract=False):
    ap = AdvancedPilot(id=1, name="Noah Tabuex", plane_model="Boeing 747", certificate="PPL", airport="KPIT")
    # A Private Pilot's license is probably not sufficient for a Boeing 747
ap

ValidationError: 1 validation error for AdvancedPilot
  Value error, Natural language constraints not met:The certificate should allow pilot to fly for compensation and is suitable for the plane_model
 [type=value_error, input_value={'id': 1, 'name': 'Noah T...PPL', 'airport': 'KPIT'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/value_error

In [6]:
with temporary_settings(ai__text__disable_contract=False):
    ap = AdvancedPilot(id=1, name="Noah Tabuex", plane_model="Airbus A380", certificate="ATP with Type Rating", airport="KPIT")
    # ATP refers to the Airline Transport Pilot certificate, which allows carrying airline passengers
ap

AdvancedPilot(other_information=None, id=1, name='Noah Tabuex', plane_model='Airbus A380', certificate='ATP with Type Rating', airport='KPIT')
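How constraints accumulate along an inheritance chain can be sketched without any LM: each subclass extends its parent's `natural_lang_constraints()` through `super()`, and construction checks the full list. `NLBase` and `judge` are hypothetical names for illustration, not part of Marvin, and `judge` is a stub standing in for the LM call that only understands the compensation rule from the example above.

```python
from typing import List


def judge(constraint: str, obj: "NLBase") -> bool:
    """Hypothetical stand-in for an LM deciding whether `obj` satisfies `constraint`."""
    if "fly for compensation" in constraint:
        return obj.certificate.startswith("ATP")
    return True


class NLBase:
    """Illustrative base class: stores keyword data and checks inherited constraints on construction."""

    def __init__(self, **data):
        self.__dict__.update(data)
        # natural_lang_constraints() already includes every ancestor's rules via super().
        failed = [c for c in type(self).natural_lang_constraints() if not judge(c, self)]
        if failed:
            raise ValueError("Natural language constraints not met: " + "; ".join(failed))

    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        return []


class Pilot(NLBase):
    """No extra constraints: behaves like a traditional record type."""


class AdvancedPilot(Pilot):
    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        existing = super().natural_lang_constraints()
        return existing + [
            "The certificate should allow pilot to fly for compensation and is suitable for the plane_model"
        ]
```

Because the refinement lives entirely in the constraint list, `AdvancedPilot` adds no fields yet still accepts strictly fewer objects than `Pilot`.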

Astute readers might notice a field, `other_information`, that is printed but never declared. This field implements the second component of Natural Language Types. When an LM generates content, excessively constraining its output, for instance to a small, fixed set of fields, can discard useful information and hamper performance in later steps. Instead, the LM should be given the opportunity to store other relevant information about an object in **natural language** for future use.

Continuing with the pilot example, we now use `marvin.extract`, one of the constructs already provided by Marvin, to illustrate this:


In [4]:
pilot = marvin.extract("Noah Singer, employee number 321, is a Boeing 747 Pilot holding an Airline Transport Pilot with 1000 hours of operations. He mainly flies from KPIT.", Pilot)[0]

In [5]:
pilot

Pilot(other_information='1000 hours of operations', id=321, name='Noah Singer', plane_model='Boeing 747', certificate='Airline Transport Pilot', airport='KPIT')

Here, the LM has dynamically captured the 1000 hours of operations as information that may be useful later. Now imagine a scenario where we want to use this object in another natural language computation, for instance through Marvin's AI functions.

In [7]:
@marvin.ai_fn
def is_experienced_pilot(pilot: Pilot) -> bool:
    "Returns whether the pilot has significant experience or is straight out of pilot school"

is_experienced_pilot(pilot)

True

The key takeaway is that the information lost when moving between perspectives (natural language vs. programming) need not be bounded by the complexity of the type; information that does not fit a declared field can be retained in natural language.
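One way to see why the extra field matters is to look at how an object might be rendered into a downstream prompt. The sketch below is an assumption about that rendering, not Marvin's actual serialization: `render_for_prompt` is a hypothetical helper that includes `other_information` alongside the declared fields, so that a later LM call, such as the AI function above, still sees the 1000 hours of operations.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Pilot:
    id: int
    name: str
    plane_model: str
    certificate: str
    airport: str
    # Free-form facts that did not fit any declared field.
    other_information: Optional[str] = None


def render_for_prompt(obj) -> str:
    """Hypothetical helper: render a dataclass as 'field: value' lines, skipping empty fields."""
    return "\n".join(f"{key}: {value}" for key, value in asdict(obj).items() if value is not None)


pilot = Pilot(321, "Noah Singer", "Boeing 747", "Airline Transport Pilot", "KPIT",
              other_information="1000 hours of operations")
prompt = "Does this pilot have significant experience?\n" + render_for_prompt(pilot)
```

Without `other_information`, the prompt would contain only the five declared fields, and the downstream function would have no evidence about flight hours.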

For programmers familiar with C++, and in particular with `dynamic_cast` across an inheritance hierarchy, Natural Language Types support an analogous operation. For instance, we might define an `ExperiencedPilot` class and attempt to cast to it with `marvin.try_cast`, which returns `None` when the cast fails:

In [7]:
from annotated_types import Gt


class ExperiencedPilot(Pilot):
    hours_flown: Annotated[int, Gt(500)] = Field(description="Hours flown by the pilot")

    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        existing = super().natural_lang_constraints()
        new_constraints = [
            "The Pilot must be experienced and must not have had disciplinary infractions"
        ]
        return existing + new_constraints

print(marvin.try_cast(pilot, ExperiencedPilot))
pilot_suspended = marvin.extract("Noah Singer, employee number 344, is a Boeing 747 Pilot holding an Airline Transport Pilot with 1000 hours of operations. He mainly flies from KPIT. Noah was recently convicted of a DUI and has been placed under suspension.", Pilot)[0]
print(marvin.try_cast(pilot_suspended, ExperiencedPilot))


ExperiencedPilot(other_information=None, id=1, name='Noah Tabuex', plane_model='Airbus A380', certificate='ATP with Type Rating', airport='KPIT', hours_flown=1000)
None
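The cast above behaves like C++'s `dynamic_cast` on pointers: success yields an object of the subtype, and failure yields `None` rather than an exception. A library-free sketch of that behavior follows. `try_cast`, `infer_missing_fields`, and `satisfies` here are hypothetical stand-ins: in the original, an LM would both infer `hours_flown` from the natural language information and judge the natural language constraint, whereas this sketch uses a regex and a keyword check.

```python
import re
from dataclasses import asdict, dataclass
from typing import Optional, Type


@dataclass
class Pilot:
    id: int
    name: str
    other_information: Optional[str] = None

    def satisfies(self) -> bool:
        """The base type carries no extra constraints."""
        return True


@dataclass
class ExperiencedPilot(Pilot):
    hours_flown: int = 0

    def satisfies(self) -> bool:
        """Stand-in for Gt(500) plus the natural-language 'no disciplinary infractions' rule."""
        info = (self.other_information or "").lower()
        return self.hours_flown > 500 and "suspension" not in info


def infer_missing_fields(info: Optional[str]) -> dict:
    """Hypothetical stand-in for an LM filling subtype-only fields from free text."""
    match = re.search(r"(\d+) hours", info or "")
    return {"hours_flown": int(match.group(1))} if match else {}


def try_cast(obj: Pilot, target: Type[Pilot]) -> Optional[Pilot]:
    """Convert `obj` to `target`, returning None (like dynamic_cast) if construction or constraints fail."""
    try:
        candidate = target(**asdict(obj), **infer_missing_fields(obj.other_information))
    except TypeError:
        return None
    return candidate if candidate.satisfies() else None
```

The suspended pilot has enough hours, so the failure comes purely from the natural language constraint, mirroring the second `None` above.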


## LM Programming Constructs

### Higher Order Functions

An LM can be thought of as an inference engine that maps some natural language input, together with some instructions, to a natural language output. In this sense, it is no different from performing computation in the natural language space. This correspondence is the basis for many schema-driven features in DSPy and Marvin, where first-order function signatures, written in natural language or translated from Python, serve as templates for the LM to perform fuzzy computations. Marvin's AI functions demonstrate this concretely:

In [2]:
@marvin.fn
def generate_recipe(ingredients: list[str]) -> list[str]:
    """From a list of `ingredients`, generates a
    complete instruction set to cook a recipe.
    """

generate_recipe(["lemon", "chicken", "olives", "couscous"])

['Preheat your oven to a suitable temperature for baking chicken, such as 375 degrees Fahrenheit.',
 'Season the chicken with salt, pepper, and a bit of the zest from your lemon for a citrusy flavor.',
 'Place the seasoned chicken in a baking dish.',
 'Slice the lemon and place the slices on top of the chicken to infuse it with lemony aroma while it cooks.',
 'Bake the chicken in the preheated oven until it is fully cooked and the juices run clear, about 35-45 minutes, depending on the size of the chicken pieces.',
 "While the chicken is baking, prepare the couscous according to the package instructions, typically by boiling water, adding the couscous, and letting it sit covered until it's fluffy and all the water is absorbed.",
 'Once the couscous is ready, stir in some chopped olives for an added burst of flavor and mix well.',
 'Serve the baked lemon chicken hot, paired with the olive-infused couscous on the side.']

The correspondence between LM calls and functions is powerful from a language design perspective. However, none of the existing literature and tools we evaluated connects LMs with higher-order functions, which, in a functional language or in a language such as Python where functional programming patterns are prevalent, could be used to express more powerful business logic.

#### Higher Order Inputs

A function is said to take a higher-order input if one of its arguments is itself a function. This pattern is common in functional programming and beyond: a function argument makes the behavior of the outer function parametric in that argument. A concrete example arises when integrating external APIs, where different endpoints may require different preprocessing or postprocessing of data. Passing functions specific to each endpoint abstracts and simplifies the integration process.
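As a plain-Python illustration of the API-integration scenario (no LM involved), the outer function below is parametric in its `preprocess` and `postprocess` arguments. The endpoint helpers and their transformations are made up for the example, and the network call is replaced by an echo.

```python
from typing import Callable, Dict

Payload = Dict[str, str]


def call_endpoint(
    payload: Payload,
    preprocess: Callable[[Payload], Payload],
    postprocess: Callable[[Payload], str],
) -> str:
    """Generic integration routine whose endpoint-specific behavior is passed in as functions."""
    request = preprocess(payload)
    # A real client would send `request` over the network; this sketch simply echoes it back.
    response = dict(request)
    return postprocess(response)


# Hypothetical endpoint-specific helpers.
def weather_pre(payload: Payload) -> Payload:
    return {"q": payload["city"].lower()}


def weather_post(response: Payload) -> str:
    return f"weather for {response['q']}"


def traffic_pre(payload: Payload) -> Payload:
    return {"city_code": payload["city"][:3].upper()}


def traffic_post(response: Payload) -> str:
    return f"traffic near {response['city_code']}"
```

For example, `call_endpoint({"city": "Pittsburgh"}, traffic_pre, traffic_post)` returns `"traffic near PIT"`, while the same call with the weather helpers returns `"weather for pittsburgh"`; `call_endpoint` itself never changes.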

The concept should therefore not be foreign to LM users, as it is closely connected to function calling, tool use, and ReAct[17]. A query that requires the LM to use a tool to gather external information, or otherwise interact with an external environment, can be thought of as a function whose arguments include the signatures of those tools. We have implemented this as an extension of the `marvin.fn` interface.
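A function-typed argument carries enough information to describe a tool to a model: its name, parameter types, and docstring. The sketch below derives such a description with `inspect`. The dictionary layout loosely resembles the JSON-schema style many function-calling APIs accept, but the exact keys are an assumption rather than any particular provider's format; `describe_tool` and `distance_km` are hypothetical names.

```python
import inspect
from typing import Callable, get_type_hints

# A deliberately small mapping from Python annotations to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Callable) -> dict:
    """Build a tool description from a function's name, parameter annotations, and docstring."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    properties = {
        # Unannotated or unmapped parameters fall back to "string".
        name: {"type": _JSON_TYPES.get(hints.get(name), "string")}
        for name in params
    }
    # Parameters without a default value must be supplied by the model.
    required = [name for name, p in params.items() if p.default is inspect.Parameter.empty]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def distance_km(origin: str, destination: str) -> float:
    """Return the driving distance in kilometres between two cities."""
    return 0.0  # placeholder body; only the signature matters for the description


tool = describe_tool(distance_km)
```

When the model responds with a tool call, the framework would invoke the original Python callable with the returned arguments and feed the result back; that loop is omitted here.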

The following example may more concretely demonstrate this feature: 

Suppose we have an external API, mocked below, that returns the weather conditions in a given city.

In [10]:

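def weather_at_city(city: str) -> str:
    """Mock weather API: returns a canned description for a few cities."""
    if city == "San Francisco":
        return "Sunny and bright"
    if city == "Los Angeles":
        return "Cold and Cloudy"
    if city == "Washington D.C.":
        return "Cloudy but comfortable"
    # Fall back to a string so the declared `-> str` return type always holds.
    return "Unknown"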

A developer building a travel planning application may want to recommend attractions to users based on the weather. In this case, they might write a function that, given an attraction and access to this API, asks the model for a rating out of 10.

In [3]:
from typing import Callable

@marvin.fn(max_tool_usage_times=1)
def pleasantness(attraction: str, weather_func: Callable[[str], str]) -> str:
    """
    Args:
        attraction: the name of the attraction in some place
        weather_func: a function that gets the weather at the particular **city** where the attraction is located.
    Returns:
        How pleasant the attraction will likely be given the weather between 0 and 10
    """
    pass


In [8]:
# The mocked weather is good in SF and poor in LA
print(pleasantness("The Golden Gate Bridge", weather_at_city))
print(pleasantness("Hollywood Sign", weather_at_city))

8
3


A developer might, for example, use such a function as a sort key to rank a list of attractions.

In [12]:
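attractions = ["The Golden Gate Bridge", "Hollywood Sign", "Lincoln Memorial"]
# pleasantness takes two arguments, so bind the weather API before using it as a sort key
print(sorted(attractions, key=lambda a: pleasantness(a, weather_at_city), reverse=True))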

['The Golden Gate Bridge', 'Lincoln Memorial', 'Hollywood Sign']
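Because `pleasantness` is declared to return `str`, `sorted` compares the ratings as strings, and lexicographic order ranks `'10'` below `'9'` (and even below `'3'`). With the single-digit ratings in the run above this happens to work. The sketch below uses a stub with invented ratings in place of the LM-backed function, `stub_pleasantness` being a hypothetical name, to show the pitfall and the fix of converting the key to `int`.

```python
def stub_pleasantness(attraction: str) -> str:
    """Hypothetical stand-in for the LM-backed rating function; returns ratings as strings."""
    ratings = {"The Golden Gate Bridge": "10", "Lincoln Memorial": "9", "Hollywood Sign": "3"}
    return ratings[attraction]


attractions = ["The Golden Gate Bridge", "Hollywood Sign", "Lincoln Memorial"]

# String keys: "9" > "3" > "10" lexicographically, so the 10/10 attraction sorts last.
by_string = sorted(attractions, key=stub_pleasantness, reverse=True)

# Integer keys give the intended numeric ranking.
by_number = sorted(attractions, key=lambda a: int(stub_pleasantness(a)), reverse=True)
```

`by_string` puts `"The Golden Gate Bridge"` last, while `by_number` puts it first.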


#### Higher Order Function Outputs

Just as functions can take functions as input, they can also return functions. In the trivial case this takes the form of currying; more generally, it is a process of input specialization, where the returned function is specialized by some input.

It is best to illustrate this pattern through an example:


In [2]:
@marvin.fn
def rating_for_customer(customer_profile: str) -> Callable[[str], int]:
    """
    Args:
        customer_profile: the preferences of the customer
    Returns:
        a function that specializes on the customer_profile to give a rating of a product between 1 and 10.
    """
    pass



In this case, we produce another callable function that is powered by an LM. However, rather than using a fixed prompt, the prompt is dynamically generated from the customer profile. This brings a few potential benefits. First, depending on the purpose of the generated function, certain aspects of the customer profile may become irrelevant, so the generated function and its prompt can be shorter, which matters if the function is applied broadly. Second, the induced function is simply a generic rating function and can be used wherever a rating function is expected, without any concern for the customer profile. One possible use case is an ensemble recommender that uses it as one of many rating functions.

In [3]:
rating_func = rating_for_customer(
    "Asian lady who cares about quality but cost is of greater concern"
)
rt = rating_func("A wonderful blender that is only $19, on sale from $100")  
rt

'8'
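The prompt-specialization idea can be sketched with an ordinary closure: the outer call condenses the profile into a specialized prompt once, and the returned function reuses it for every product. Everything here is hypothetical: `fake_llm` is a deterministic stub standing in for a model (it rates cheaper products higher when the prompt mentions cost), and the condensing step simply keeps profile clauses that mention cost or quality, where the real system would ask the LM to do this.

```python
import re
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Deterministic stub for an LM call: when the prompt stresses cost, cheaper products rate higher."""
    price = re.search(r"\$(\d+)", prompt)
    if "cost" in prompt and price:
        return str(max(1, 10 - int(price.group(1)) // 10))
    return "5"


def rating_for_customer(customer_profile: str) -> Callable[[str], int]:
    """Specialize a prompt on `customer_profile` once and return a generic product-rating function."""
    # Specialization step (done once): keep only the profile clauses that matter for rating.
    clauses = [c.strip() for c in re.split(r",| but ", customer_profile)]
    relevant = [c for c in clauses if "cost" in c or "quality" in c]
    specialized_prompt = "Rate from 1 to 10 for a customer who " + "; ".join(relevant)

    def rate(product: str) -> int:
        # Every call reuses the same, shorter specialized prompt.
        return int(fake_llm(f"{specialized_prompt}\nProduct: {product}"))

    rate.prompt = specialized_prompt  # exposed only so the specialization can be inspected
    return rate
```

For a profile such as `"lives in Pittsburgh, cares about quality but cost is of greater concern"`, the returned function's prompt drops the irrelevant location clause, and the result can be passed anywhere a `Callable[[str], int]` rating function is expected.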

## LM Programming Language Constructs


- Extending the Marvin AI function interface to a more general, higher-order setting
    - We connect higher-order functional outputs, i.e. functions that output other functions, with the concept of *prompt specialization*
    - We also allow higher-order functional inputs, i.e. functions that take other functions as input, and implement them as tool usage

## Limitations, Evaluations and Discussions

## Related Work

This work is inspired by the many frameworks that aim to simplify or otherwise augment the process of interacting with LMs from a programming language. In particular, the concept of natural language signatures derives from earlier work on DSP, the precursor to the modern DSPy project[5]. Further, much effort has been spent on constraining LM outputs to follow a particular schema or set of constraints. Instructor[11], Outlines[12], and LangChain[13] have all implemented features where the output from the LLM is parsed into a Pydantic model and the constraints associated with the model are checked dynamically. However, unlike the *Natural Language Types* we propose here, their validation step is limited to the conventional validation strategies provided by Pydantic: they offer neither a systematic way to subject individual fields and the entire object to fuzzy natural language constraints, nor a way to map these constraints into the prompts themselves. Furthermore, information not captured by the model is discarded, even though it could be helpful in later processing steps.

The notion of a natural language contract system builds on decades of research on software contracts. Design by Contract was popularized by the Eiffel programming language[14], where the programmer can annotate a routine with `require` and `ensure` clauses, which are then optionally checked at run time. The syntax of our Semantic Contracts is more directly inspired by the Racket[15] contract system, in particular its `define/contract` form for individual functions. The notion of dependency among the input arguments and the output result is likewise inspired by its `->i` contract combinator. However, Racket contracts are higher-order by design, while our design is only first-order. One possible future direction is to ascertain whether and how Semantic Contracts could be applied to a system like Racket.
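For readers less familiar with design by contract, a first-order contract with a dependent postcondition (the result is checked against the arguments, in the spirit of Racket's `->i`) can be sketched as a Python decorator. This is a plain-Python illustration using ordinary predicates, not the Semantic Contracts implementation; in the LM setting, the predicates would be natural language conditions judged by a model. `contract`, `ContractViolation`, and `maximum` are hypothetical names.

```python
import functools
import inspect
from typing import Callable


class ContractViolation(Exception):
    """Raised when a pre- or postcondition fails; the message names the party to blame."""


def contract(pre: Callable[..., bool], post: Callable[..., bool]):
    """First-order contract: `pre` receives the arguments, `post` also receives `result`."""

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            if not pre(**bound.arguments):
                raise ContractViolation(f"{func.__name__}: precondition failed (blame the caller)")
            result = func(*args, **kwargs)
            # Dependent postcondition: it may inspect the inputs as well as the result.
            if not post(result=result, **bound.arguments):
                raise ContractViolation(f"{func.__name__}: postcondition failed (blame the function)")
            return result

        return wrapper

    return decorator


@contract(
    pre=lambda xs: len(xs) > 0,
    post=lambda result, xs: result in xs and all(result >= x for x in xs),
)
def maximum(xs: list[int]) -> int:
    return max(xs)
```

`maximum([3, 7, 5])` returns `7`, while `maximum([])` raises `ContractViolation` blaming the caller. The decorator only wraps first-order values, which is the same limitation noted above relative to Racket's higher-order contracts.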

Lastly, as this work mainly centers on introducing novel programming language constructs, we observe significant parallels between our work and earlier work on programming patterns. A prime example is Object Relational Mappers[9], where database tables and queries are mapped to the more familiar concepts of programming objects and functions. This pattern is very similar to our attempt at mapping LM constructs to programming language constructs. Further, Natural Language Types draw directly on the design philosophy of Object Oriented Design[16]; we argue that the patterns of inheritance, polymorphism, and casting remain relevant in their LM-powered counterparts.

## Future Work 

## Conclusion 
 

[1]: “Introducing Code Llama, a state-of-the-art large language model for coding,” AI at Meta, https://ai.meta.com/blog/code-llama-large-language-model-coding/ (accessed Apr. 27, 2024).
[2]: S. Zhou et al., “DocPrompting: Generating code by retrieving the docs,” arXiv.org, https://doi.org/10.48550/arXiv.2207.05987 (accessed Apr. 27, 2024).
[3]: Two Sig article
[4]: LangChain's prompting repo
[5]: DSPy
[6]: Constrained generation
[7]: Object Oriented Programming
[8]: Async Await programming
[9]: Object Relational Mappers
[10]: Marvin
[11]: Instructor
[12]: Outlines
[13]: LangChain
[14]: Eiffel
[15]: Racket define/contract
[16]: https://dl.acm.org/doi/10.1145/323648.323751
[17]: ReAct