In [1]:
from dotenv import load_dotenv
from pydantic import Field

load_dotenv()
import marvin
from marvin.settings import temporary_settings
from typing import Annotated, List, Callable
from annotated_types import Predicate

# Final Report
Peter Yong Zhong
17-730 Prompt Engineering 
Do not redistribute 
Warning: Some examples in this report contain explicit language. It is included strictly for demonstration purposes, to illustrate certain topics of importance.

## Abstract

This paper explores the integration of Large Language Models (LLMs) into software engineering, focusing on developing language features suited for a post-LLM environment. We propose new programming constructs, including Higher Order Types and Functions, Contracts, and Semantic Pattern Matching, alongside the innovative concept of 'Natural Language Types' to bridge structured software needs with the fuzziness typical of LLMs. Our approach emphasizes a bottom-up design philosophy, enhancing traditional programming patterns with AI capabilities to address both the abstractness and computational demands of integrating LLMs. This research contributes to the understanding and evolution of programming abstractions in LLM-enhanced environments, aiming to make LLMs more accessible and functional for developers.

## Introduction


Large Language Models (LLMs) have significantly impacted software engineering, often heralded as a revolutionary force in how software engineers develop, test, and deploy code. The influence of LLMs primarily extends to two areas: firstly, the development and generation of executable code[1][2], and secondly, integrating LLM calls into other software systems' logic. In the latter case, developers either manually manipulate strings based on prompt templates and directly use foundation model APIs, or they utilize popular frameworks to meet their LLM requirements. The diversity in LLM programming abstractions has spurred active research. For example, the recently proposed Language Model System Interface Model (LMSI)[3] stratifies these abstractions into seven layers of increasing abstractness. Researchers have identified five families of LLM abstractions emerging within popular frameworks.

However, many frameworks emphasize the machine learning aspects of LLMs, focusing on novel prompting techniques[4], pipeline designs[5], and efficient inference[6]. We argue that in scenarios where LLMs are *progressively* integrated into existing systems, a bottom-up approach should be adopted. This approach should prioritize programming concerns and patterns, using LLMs to enhance these patterns with powerful generalization and computational analysis capabilities.

Developers typically interact with LLMs at the programming language level. As with paradigms like Object-Oriented Programming[7], Async-Await programming[8], and Object-Relational Mappers[9], language features enable developers to express business logic more abstractly. The compiler then 'desugars' these constructs into a lower-level implementation that can be complex. In this project, we explore designing such language features in a post-LLM environment to understand the challenges and opportunities they present.

We have chosen to augment Marvin[10], a popular LLM framework that aligns with our philosophy of empowering developers focused on using AI rather than building it. Marvin introduces several coding constructs for data transformation, entity extraction, classification, and AI functions, which serve as a foundation for our designs.

Our contributions are threefold. First, we propose and implement a set of language features enhanced by LLMs, providing an open-source implementation under the Apache license. These features include Higher Order Types and Functions, Contracts, and Semantic Pattern Matching. Second, we introduce a novel concept of 'Natural Language Types', blending structured software needs with the fuzziness typical of LLMs. Lastly, we outline a novel type-driven and unit-tests-driven code generation methodology that, while not yet implemented, lays the groundwork for future developments.
 

## Natural Language Types

In programming, a type serves two crucial, interrelated functions. First, it defines the schema of a data model by specifying the required information an object of that type should hold and setting constraints on the nature or "type" of each piece of data. These constraints are checked either statically or dynamically, ensuring that when a program interacts with an object of a given type, it can appropriately access and manipulate its fields. Second, in object-oriented environments, the conceptualization of types mirrors natural language reasoning about physical objects. This intuitive alignment between programming types and natural language facilitates understanding, which is particularly beneficial when working with LLMs trained on human language data.

The concept of Natural Language Types is straightforward yet innovative, primarily consisting of two elements. Traditional data types like int and string often fall short in describing complex, fuzzy constraints prevalent in natural language environments. Moreover, traditional typing systems struggle to address the interdependencies between different fields. Thus, the first component of Natural Language Types involves defining constraints at both the field level and between fields using natural language. This feature is vital as it underpins the Natural Language contract system discussed later in this work.
 

We can explore this idea through an example. Let's start with a simple, traditional type: Pilot.

In [3]:
class Pilot(marvin.NaturalLangType):
    id: int
    name: str
    plane_model: str
    certificate: str
    airport: str

Pilot(id = 1, name="Noah Tabuex", plane_model="Cessna 172", certificate="PPL", airport = "KPIT")

Pilot(other_information=None, id=1, name='Noah Tabuex', plane_model='Cessna 172', certificate='PPL', airport='KPIT')

With Natural Language Types, constraints can be specified in natural language. In this case, we rely on the internal knowledge of the LM to validate the value. We use `temporary_settings` to enable the contract capabilities. We chose this design because we argue that contracts should mainly be a development-time artifact that is disabled in production to minimize performance degradation.

In [2]:
class Pilot(marvin.NaturalLangType):
    id: int
    name: str
    plane_model: Annotated[str, Predicate(marvin.val_contract("Plane must contain more than 1 engine"))]
    certificate: str
    airport: str

In [7]:
with temporary_settings(ai__text__disable_contract=False):
    Pilot(id = 1, name="Noah Tabuex", plane_model="Cessna 172", certificate="PPL", airport = "KPIT")

ValidationError: 1 validation error for Pilot
plane_model
  Predicate val_contract.<locals>.wrapper failed [type=predicate_failed, input_value='Cessna 172', input_type=str]

In [3]:
with temporary_settings(ai__text__disable_contract=False):
    p = Pilot(id = 1, name="Noah Tabuex", plane_model="Boeing 747", certificate="PPL", airport = "KPIT")
p

Pilot(other_information=None, id=1, name='Noah Tabuex', plane_model='Boeing 747', certificate='PPL', airport='KPIT')
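
The development-time toggle can be pictured with a small, library-independent sketch. Everything in it is hypothetical and is not Marvin's API: `contracts_enabled` stands in for the settings context manager, `val_contract_sketch` for the predicate factory, and `fake_judge` replaces the LM call with a hard-coded lookup.

```python
from contextlib import contextmanager
from typing import Callable

# Hypothetical global switch; contracts are off by default, as in production.
_CONTRACTS_ENABLED = False


@contextmanager
def contracts_enabled():
    """Temporarily enable contract checking inside a `with` block."""
    global _CONTRACTS_ENABLED
    previous = _CONTRACTS_ENABLED
    _CONTRACTS_ENABLED = True
    try:
        yield
    finally:
        _CONTRACTS_ENABLED = previous


def fake_judge(constraint: str, value: str) -> bool:
    """Stand-in for an LM call: a tiny lookup of multi-engine planes."""
    multi_engine = {"Boeing 747", "Airbus A380"}
    return value in multi_engine


def val_contract_sketch(
    constraint: str, judge: Callable[[str, str], bool] = fake_judge
) -> Callable[[str], bool]:
    """Build a predicate that consults the judge only when contracts are enabled."""

    def predicate(value: str) -> bool:
        if not _CONTRACTS_ENABLED:
            return True  # disabled: skip the (expensive) check entirely
        return judge(constraint, value)

    return predicate


check = val_contract_sketch("Plane must contain more than 1 engine")
disabled_result = check("Cessna 172")      # contracts off: no check performed
with contracts_enabled():
    enabled_small = check("Cessna 172")    # judged and rejected
    enabled_large = check("Boeing 747")    # judged and accepted
```

Because the predicate short-circuits when the switch is off, the cost of the judge call is only paid inside the `with` block.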

The natural language constraints described above can also be applied globally, across the whole object. This example also demonstrates natural language inheritance. In programming, inheritance usually refines a type by introducing additional fields and restricting behaviors. However, the refinement could equally consist of additional restrictions or constraints that we place on the type.


In [2]:
class Pilot(marvin.NaturalLangType):
    id: int
    name: str
    plane_model: str
    certificate: str
    airport: str

In [4]:
from typing import List


class AdvancedPilot(Pilot):
    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        existing = super().natural_lang_constraints()
        new_constraints = [
            "The certificate should allow pilot to fly for compensation and is suitable for the plane_model"
        ]
        return existing + new_constraints

In [4]:
with temporary_settings(ai__text__disable_contract=False):
    ap = AdvancedPilot(id = 1, name="Noah Tabuex", plane_model="Boeing 747", certificate="PPL", airport = "KPIT")
    # A Private Pilot's license is probably not sufficient for a Boeing 747
ap

ValidationError: 1 validation error for AdvancedPilot
  Value error, Natural language constraints not met:The certificate should allow pilot to fly for compensation and is suitable for the plane_model
 [type=value_error, input_value={'id': 1, 'name': 'Noah T...PPL', 'airport': 'KPIT'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/value_error

In [6]:
with temporary_settings(ai__text__disable_contract=False):
    ap = AdvancedPilot(id = 1, name="Noah Tabuex", plane_model="Airbus A380", certificate="ATP with Type Rating", airport = "KPIT")
    # ATP refer to airline transport pilot which can carry passengers 
ap

AdvancedPilot(other_information=None, id=1, name='Noah Tabuex', plane_model='Airbus A380', certificate='ATP with Type Rating', airport='KPIT')
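
To make the accumulation of constraints through `super()` concrete, here is a minimal sketch that does not depend on Marvin. The class names, the airport constraint, and `constraints_prompt` are assumptions made for illustration; how the real implementation renders constraints into its prompt is not shown in this report.

```python
from typing import List


class NaturalLangBase:
    """Hypothetical base class: holds no constraints of its own."""

    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        return []


class PilotSketch(NaturalLangBase):
    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        # Illustrative constraint, not taken from the report.
        return super().natural_lang_constraints() + [
            "The airport must be a valid ICAO code"
        ]


class AdvancedPilotSketch(PilotSketch):
    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        # Each subclass appends to whatever its parents already require.
        return super().natural_lang_constraints() + [
            "The certificate should allow the pilot to fly for compensation"
        ]


def constraints_prompt(cls) -> str:
    """Render all inherited constraints as a numbered list for a prompt."""
    constraints = cls.natural_lang_constraints()
    return "\n".join(f"{i}. {c}" for i, c in enumerate(constraints, 1))


prompt = constraints_prompt(AdvancedPilotSketch)
```

A subclass therefore can only add constraints, never drop those of its parents, which matches the idea of inheritance as refinement.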

Astute readers might notice that a field `other_information` is printed but is not defined in the class. This field powers the second aspect of Natural Language Types. When an LM is generating content, excessively constraining its output, for instance through a restricted and often limited set of fields, might hamper future performance. Instead, the LM should be given an opportunity to store other relevant information about an object in **natural language**, which could be used in the future.

Let's continue with the Piloting example, and this time we use one of the constructs already provided by Marvin to illustrate this:


In [4]:
pilot = marvin.extract("Noah Singer, employee number 321, is a Boeing 747 Pilot holding an Airline Transport Pilot with 1000 hours of operations. He mainly flies from KPIT.", Pilot)[0]

In [5]:
pilot

Pilot(other_information='1000 hours of operations', id=321, name='Noah Singer', plane_model='Boeing 747', certificate='Airline Transport Pilot', airport='KPIT')

Here, the LM has dynamically captured the 1000 hours of operations as information that may be useful in the future. Let's imagine a scenario where we want to use this object in some other natural language computation, say through Marvin's AI function.

In [7]:
@marvin.ai_fn
def is_experience_pilot(pilot: Pilot) -> bool:
    "Returns whether the pilot has significant experience or straight out of pilot school"
    
is_experience_pilot(pilot)

True

The key takeaway is that the information lost when transitioning between perspectives (natural language vs. programming) need not be determined by how much the type can express: whatever the type's fields cannot hold can be retained in natural language.
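
The idea of keeping leftover details in a free-text field can be sketched without an LM. In the sketch below, a plain dictionary stands in for what an extraction step might produce, and `PilotRecord` and `build_record` are hypothetical names. In the actual system the LM writes `other_information` itself; the mechanical folding here is only an approximation.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class PilotRecord:
    id: int
    name: str
    other_information: Optional[str] = None


def build_record(extracted: Dict[str, object]) -> PilotRecord:
    """Keep schema fields as they are; fold everything else into free text."""
    known = {f.name for f in fields(PilotRecord)} - {"other_information"}
    structured = {k: v for k, v in extracted.items() if k in known}
    leftovers = [f"{k}: {v}" for k, v in extracted.items() if k not in known]
    # An empty leftover list yields None rather than an empty string.
    return PilotRecord(**structured, other_information="; ".join(leftovers) or None)


record = build_record(
    {"id": 321, "name": "Noah Singer", "experience": "1000 hours of operations"}
)
```

A later natural language computation can then read `record.other_information` alongside the structured fields instead of losing the detail.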

Programmers familiar with C++ will recognize a parallel with `dynamic_cast` over an inheritance hierarchy; Natural Language Types can express this concept as well. For instance, we might have an ExperiencedPilot class:

In [7]:
from annotated_types import Gt


class ExperiencedPilot(Pilot):
    
    hours_flown : Annotated[int, Gt(500)] = Field(description="Hours flown by the pilot")
    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        existing = super().natural_lang_constraints()
        new_constraints = [
            "The Pilot must be experienced and has not had disciplinary infractions"
        ]
        return existing + new_constraints

print(marvin.try_cast(pilot,ExperiencedPilot))
pilot_unexperienced = marvin.extract("Noah Singer, employee number 344, is a Boeing 747 Pilot holding an Airline Transport Pilot with 1000 hours of operations. He mainly flies from KPIT. Noah was recently convicted of a DUI and is placed under suspension. ", Pilot)[0]
print(marvin.try_cast(pilot_unexperienced,ExperiencedPilot))


ExperiencedPilot(other_information=None, id=1, name='Noah Tabuex', plane_model='Airbus A380', certificate='ATP with Type Rating', airport='KPIT', hours_flown=1000)
None
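
The cast-or-`None` behaviour can be approximated in plain Python. This is a simplified sketch under stated assumptions: `BasePilot`, `SeasonedPilot`, and `try_cast_sketch` are hypothetical names, the only constraint checked is the structural "more than 500 hours" rule, and `hours_flown` already exists on the source object, whereas in the report it is derived by the LM from free text.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class BasePilot:
    name: str
    hours_flown: int


@dataclass
class SeasonedPilot(BasePilot):
    def __post_init__(self) -> None:
        # Structural stand-in for the Gt(500) constraint in the text.
        if self.hours_flown <= 500:
            raise ValueError("hours_flown must be greater than 500")


def try_cast_sketch(obj: object, target: Type[T]) -> Optional[T]:
    """Return `obj` rebuilt as `target`, or None if its constraints reject it."""
    try:
        return target(**asdict(obj))
    except (TypeError, ValueError):
        return None


veteran = try_cast_sketch(BasePilot("Noah Singer", 1000), SeasonedPilot)
novice = try_cast_sketch(BasePilot("Peter Zhong", 6), SeasonedPilot)
```

Returning `None` instead of raising mirrors the pointer form of `dynamic_cast`, so callers can branch on the result.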


## LM Programming Constructs

### Higher Order Functions

An LM can be thought of as an inference engine that maps some natural language input, together with some instructions, to a natural language output. In this sense, it is no different from performing computation in the natural language space. This correspondence is the basis for many schema-driven features in DSPy and Marvin, where first-order function signatures, in natural language or translated from Python, can be used as a template for the LM to perform fuzzy computations. Marvin's AI functions provide a concrete example:

In [2]:
@marvin.fn
def generate_recipe(ingredients: list[str]) -> list[str]:
    """From a list of `ingredients`, generates a
    complete instruction set to cook a recipe.
    """

generate_recipe(["lemon", "chicken", "olives", "coucous"])

['Preheat your oven to a suitable temperature for baking chicken, such as 375 degrees Fahrenheit.',
 'Season the chicken with salt, pepper, and a bit of the zest from your lemon for a citrusy flavor.',
 'Place the seasoned chicken in a baking dish.',
 'Slice the lemon and place the slices on top of the chicken to infuse it with lemony aroma while it cooks.',
 'Bake the chicken in the preheated oven until it is fully cooked and the juices run clear, about 35-45 minutes, depending on the size of the chicken pieces.',
 "While the chicken is baking, prepare the couscous according to the package instructions, typically by boiling water, adding the couscous, and letting it sit covered until it's fluffy and all the water is absorbed.",
 'Once the couscous is ready, stir in some chopped olives for an added burst of flavor and mix well.',
 'Serve the baked lemon chicken hot, paired with the olive-infused couscous on the side.']

The connection with functions is very powerful from a language design perspective. However, none of the existing literature and tools we evaluated has made the connection between LMs and higher-order functions, which, in a functional language or a language like Python where functional programming patterns are prevalent, could be used to express more powerful business logic.

#### Higher Order Inputs

A function is considered to have a higher-order input if one of its arguments is another function. This programming pattern is prevalent in functional programming and extends beyond it. Traditionally, having a function as an input allows the main function's behavior to be parameterized based on the input function. For example, when integrating external APIs, different endpoints may necessitate different pre-processing or post-processing of data. Passing functions tailored to each API endpoint can abstract and streamline the integration process.

This concept should therefore be familiar to LM users, as it closely relates to the idea of function calling and tool usage, such as in ReAct[17]. An inquiry to the LM that requires it to use a tool to gather external information or interact with external environments can be viewed as having function arguments that represent the tool's signatures. We have implemented this concept through an enhancement of the marvin.fn interface.

The following example may more concretely demonstrate this feature: 

Let's say we have some external API that returns the weather conditions in a given city:

In [10]:

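def weather_at_city(city: str) -> str:
    # Stub standing in for an external weather API.
    if city == "San Francisco":
        return "Sunny and bright"
    if city == "Los Angeles":
        return "Cold and Cloudy"
    if city == "Washington D.C.":
        return "Cloudy but comfortable"
    # Fall back to a string so the declared return type holds for unknown cities.
    return "Unknown"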

A developer building a travel planning application could be interested in recommending attractions to users based on the weather conditions. In this case, they may wish to write a function where, given an attraction and access to this API, the model gives a rating out of 10.

In [3]:
from typing import Callable

@marvin.fn(max_tool_usage_times=1)
def pleasantness(attraction: str, weather_func: Callable[[str], str]) -> str:
    """
    Args:
        attraction: the name of the attraction in some place
        weather_func: a function that get the weather at a particular **city** that the attraction is located.
    Returns:
        How pleasant the attraction will likely be given the weather between 0 and 10
    """
    pass


In [8]:
# The weather in SF is really good right now; LA, not so much
print(pleasantness("The Golden Gate Bridge", weather_at_city))
print(pleasantness("Hollywood Sign", weather_at_city)) 

8
3


A possible way that a developer might integrate such a function is through sorting a list of attractions. 

In [12]:
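# sorted() calls key with one argument, so bind weather_at_city and compare ratings as ints
print(sorted(["The Golden Gate Bridge","Hollywood Sign", "Lincoln Memorial"], key=lambda attraction: int(pleasantness(attraction, weather_at_city)), reverse=True))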

['The Golden Gate Bridge', 'Lincoln Memorial', 'Hollywood Sign']
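
One way to picture how a function argument can become a tool is to derive a tool description from the callable's signature and docstring. The sketch below uses only the standard library; `describe_tool`, `call_tool`, and the dictionary layout are assumptions for illustration and do not reflect the schema that Marvin or any model provider actually uses.

```python
import inspect
from typing import Callable, Dict, get_type_hints


def _type_name(tp: object) -> str:
    """Readable name for a type annotation."""
    return getattr(tp, "__name__", str(tp))


def describe_tool(func: Callable) -> Dict[str, object]:
    """Summarise a callable so it can be listed in a prompt as a tool."""
    hints = get_type_hints(func)
    parameters = {
        name: _type_name(hints.get(name, object))
        for name in inspect.signature(func).parameters
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": parameters,
        "returns": _type_name(hints.get("return", object)),
    }


def call_tool(tools: Dict[str, Callable], name: str, **arguments):
    """Dispatch a tool call requested by the model to the real function."""
    return tools[name](**arguments)


def weather_lookup(city: str) -> str:
    """Return a short weather description for `city`."""
    return "Sunny and bright" if city == "San Francisco" else "Unknown"


spec = describe_tool(weather_lookup)
reply = call_tool({"weather_lookup": weather_lookup}, "weather_lookup", city="San Francisco")
```

A cap such as `max_tool_usage_times` would then bound how many times `call_tool` may run within one invocation.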


#### Higher Order Function Outputs

Similar to functional inputs, functions themselves can output another function. In the trivial case, this may take the form of currying, but in the general case, it is a process of input specialization, where the output function is specialized by some input.

It is best to illustrate this pattern through an example:


In [2]:
@marvin.fn
def rating_for_customer(customer_profile: str) -> Callable[[str], int]:
    """
    Args:
        customer_profile: the preferences of the customer
    Returns:
        a function that specializes on the customer_profile to give a rating of a product between 1 to 10.
    """
    pass



In this case, we are producing another callable function that is powered by the LM. However, rather than fixing a prompt, the prompt is dynamically generated based on the customer profile. This brings a few potential benefits. First, depending on the purpose of the generated function, certain aspects of the customer profile may become irrelevant, so the generated function, along with its prompt, could be shorter if it were applied in a broader setting. Second, the induced function is now simply a generic ratings function and can be applied wherever a ratings function is desired, without worrying about the customer profile. In fact, a possible use case is an ensemble recommender that uses this as one of many rating input functions.

In [3]:
rating_func = rating_for_customer(
    "asian lady who cares about quality but cost is of greater concern"
)
rt = rating_func("A wonderful blender that is only $19, on sale from $100")  
rt

'8'
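
The specialization step can be approximated with an ordinary closure. This is only a sketch: `fake_llm` returns canned ratings in place of a model call, `rating_for_customer_sketch` is a hypothetical name, and the real feature generates the specialized prompt with the LM rather than by string formatting.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Stand-in for an LM call: returns a canned rating as text."""
    return "8" if "sale" in prompt else "5"


def rating_for_customer_sketch(
    customer_profile: str, llm: Callable[[str], str] = fake_llm
) -> Callable[[str], int]:
    """Return a rating function specialised to one customer profile."""
    # The profile is baked into the header once; callers of `rate` never see it.
    header = f"You rate products from 1 to 10 for this customer: {customer_profile}"

    def rate(product: str) -> int:
        prompt = f"{header}\nProduct: {product}\nRating:"
        # Coerce the model's text to the declared return type.
        return int(llm(prompt))

    return rate


rate = rating_for_customer_sketch("cares about quality, but cost matters more")
blender_rating = rate("A wonderful blender that is only $19, on sale from $100")
chair_rating = rate("A plain wooden chair")
```

Coercing the result with `int` keeps the returned callable consistent with a `Callable[[str], int]` annotation even when the model replies with a string.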

### Natural Language Contracts

Design by Contract (DbC)[14] is a programming methodology that defines precise interface specifications through preconditions, postconditions, and invariants, which clarify the expected behavior of software components. DbC offers significant benefits during development by reducing bugs and enhancing code reliability, as it enforces a formal agreement on what software components must accomplish before and after execution. However, implementing DbC presents challenges, particularly because business logic is often articulated in natural language, which can be ambiguous and difficult to translate into programmatically enforceable contracts. This discrepancy can lead to complexities in defining exhaustive and accurate contracts that fully encapsulate the intended behavior of the software.

However, with the advent of LMs, the gap between natural language business logic and programmatically enforceable contracts can be narrowed significantly, since LMs essentially provide a way of making *documentation* executable.

The ideal use case for this flavour of contract is not a production environment, but rather development and integration/testing time. The contracts make sure that the flow of values in and out of different components of the program adheres to some natural language descriptions. Whilst such checks would likely be intractable in production, they could serve as important tools to discover discrepancies between the design specification and the actual implementation during testing (such discrepancies are more commonly known in developer lingo as "bugs").

The example below demonstrates how we have augmented Marvin with a @func_contract decorator that complements Pydantic to provide first-order contracts enforced through natural language constraints.


In [4]:
@marvin.func_contract
def reply_comment(
    processed_comment: Annotated[
        str,
        Predicate(
            marvin.val_contract("must not contain words inappropriate for children")
        ),
    ],
) -> None:
    print("The comment passed validation and is sent to the server")


In [5]:
with temporary_settings(ai__text__disable_contract=False):
    print("Try First Reply with Illegal Arguments")
    try:
        reply_comment("fuck this shit")
    except Exception as e:
        print("The first call is flagged as a contract violation")
        print(e)
    try:
        reply_comment("The sky is beautiful today")
    except Exception as e:
        print("The second call is flagged as a contract violation")
        print(e)

Try First Reply with Illegal Arguments
The first call is flagged as a contract violation
1 validation error for reply_comment
0
  Predicate val_contract.<locals>.wrapper failed [type=predicate_failed, input_value='fuck this shit', input_type=str]
The comment passed validation and is sent to the server


Specifying interdependence of input variables should also be allowed:

In [7]:
@marvin.func_contract(
    pre=lambda comment, reply: marvin.val_contract(
        "the comment and reply must be somewhat related"
    )(comment=comment, reply=reply)
)
def process_comment(comment: str, reply: str) -> str:
    return f"comment: {comment}\nreply: {reply}"


In [8]:
with temporary_settings(ai__text__disable_contract=False):
    try:
        process_comment("This apple is great!", "IKEA stock is down a lot")
    except Exception as e:
        print(e)
    print(process_comment("This apple is great!", "I agree, but the apple is very sweet and so could be unhealthy"))

Pre condition not met
comment: This apple is great!
reply: I agree, but the apple is very sweet and so could be unhealthy
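
A decorator with this shape can be sketched using only the standard library. The names `func_contract_sketch`, `ContractViolation`, and `related` are hypothetical, the `post` hook is an addition for illustration, and the word-overlap test merely stands in for the LM's judgement of relatedness.

```python
import functools
import inspect
from typing import Callable, Optional, Set


class ContractViolation(Exception):
    """Raised when a pre- or postcondition does not hold."""


def func_contract_sketch(
    pre: Optional[Callable[..., bool]] = None,
    post: Optional[Callable[[object], bool]] = None,
):
    """Check `pre` on the named arguments and `post` on the result."""

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind positional and keyword arguments to parameter names so the
            # precondition can refer to several arguments at once.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            if pre is not None and not pre(**bound.arguments):
                raise ContractViolation("Pre condition not met")
            result = func(*args, **kwargs)
            if post is not None and not post(result):
                raise ContractViolation("Post condition not met")
            return result

        return wrapper

    return decorator


def _words(text: str) -> Set[str]:
    """Lower-cased words of four or more letters, punctuation stripped."""
    cleaned = (w.strip("!,.").lower() for w in text.split())
    return {w for w in cleaned if len(w) >= 4}


def related(comment: str, reply: str) -> bool:
    """Stand-in for an LM judgement: do the two texts share a longer word?"""
    return bool(_words(comment) & _words(reply))


@func_contract_sketch(pre=related)
def process_comment_sketch(comment: str, reply: str) -> str:
    return f"comment: {comment}\nreply: {reply}"
```

Binding arguments by name is what lets a single precondition express a dependency between `comment` and `reply`, rather than validating each argument in isolation.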


Our last example demonstrates the ability of contracts to serve as both a validator and documentation.

In [ ]:
@marvin.func_contract(
    pre=lambda user, transaction: marvin.val_contract(
        "The user needs to be authenticated to operate in the same market as the transaction"
    )(user=user, transaction=transaction),
)
def process_payment(
    user: Annotated[
        User, Predicate(marvin.val_contract("User should be eligible for purchases"))
    ],
    transaction: Annotated[
        Transaction,
        Predicate(
            marvin.val_contract(
                "The transaction must not involve illicit drugs or other items banned in PA"
            )
        ),
    ],
) -> None:
    # code to process the transaction
    pass


### Semantic Pattern Matching

The final programming pattern we introduce is based on Pattern Matching in programming languages, a feature that allows developers to check a value against a pattern and, if it matches, to deconstruct the value according to the structure of the pattern. This capability is typically used to simplify code that involves multiple conditions or branches, such as in switch statements or complex conditional expressions. However, traditional pattern matching is primarily structural, limited to matching and decomposing data based on predefined patterns that closely align with the data's physical structure. This structural approach restricts its applicability in scenarios where the data's context or semantics might provide a more intuitive understanding. Significantly, traditional pattern matching does not accommodate natural language-based decomposition, which could allow for a more flexible and semantic interpretation of data objects, leveraging the nuances of human language to enhance the match and decomposition processes.

Recognizing this limitation, we introduce a notion of *Semantic Pattern Matching* that operates at the level of natural language, incorporating fuzziness that was previously unattainable with traditional techniques. Currently, we introduce this language construct as a standalone `match` function. As future work, we wish to augment the Python `match` statement to admit our semantic pattern matching paradigm.

Revisiting the pilot examples, we demonstrate how *Natural Language Types* could interoperate with *Semantic Pattern Matching*. The constraints and typing information are expressed in the prompt to the LM, which both matches the source data (of any serializable type) against the given clauses and casts it to the type of the matching clause, should an appropriate one exist.

In [11]:
class Pilot(marvin.NaturalLangType):
    id: int
    name: str
    plane_model: str
    certificate: str
    airport: str


class AdvancedPilot(Pilot):
    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        existing = super().natural_lang_constraints()
        new_constraints = [
            "The pilot must hold the appropriate certificate for the plane_model, "
            + 'which should also be a plane that is considered "big" with paid passengers'
        ]
        return existing + new_constraints
class StudentPilot(Pilot):
    @classmethod
    def natural_lang_constraints(cls) -> List[str]:
        existing = super().natural_lang_constraints()
        new_constraints = [
            "The pilot should not have too much experience"
        ]
        return existing + new_constraints

print(marvin.match(
    "Noah Singer, employee number 321, is a Boeing 747 Pilot "
    "holding an Airline Transport Pilot with 1000 hours of operations. "
    "He mainly flies from KPIT. ",
    (AdvancedPilot, lambda pilot: f"Advanced Pilot name {pilot.name} flying mainly {pilot.plane_model}"),
    fall_through=lambda : print("No Advanced Pilot found")
))
print(marvin.match(
    "Peter Zhong, employee number 453 is a training pilot flying out of KPJC with 6 hours of experience mainly in Piper Warrior",
    (AdvancedPilot, lambda pilot: f"Advanced Pilot name {pilot.name} flying mainly {pilot.plane_model}"),
    (StudentPilot, lambda pilot: f"Student Pilot name {pilot.name} flying mainly {pilot.plane_model}"),
    fall_through=lambda : print("No Advanced Pilot found")
))

'Advanced Pilot name Noah Singer flying mainly Boeing 747'


'Student Pilot name Peter Zhong flying mainly Piper Warrior'

The decomposition and matching is not limited to the type level; it can also bind arbitrary capture groups in a semantically aware fashion:

In [12]:
marvin.match(
    "Alexa up the sound by 10 points will you? ",
    ("Play Music by {artist}", lambda artist: artist),
    ("Volume increase by {volume_up} units", lambda volume_up: print("System: Increasing Volume by 10 pts")),
    ("Lights on", lambda: True),
    ("Lights off", lambda: True),
    (AdvancedPilot, lambda pilot: print(pilot))
)

System: Increasing Volume by 10 pts


In [14]:
marvin.match(
    "Alexa, I am feeling the room is a bit dark",
    ("Play Music by {artist}", lambda artist: artist),
    ("Volume increase by {volume_up} units", lambda volume_up: print("System: Increasing Volume by 10 pts")),
    ("Lights on", lambda: print("Turning on the lights")),
    ("Lights off", lambda: True),
    (AdvancedPilot, lambda pilot: print(pilot))
)

Turning on the lights


The pattern is versatile enough to capture other types that the user may wish to match against:

In [15]:
marvin.match(
    "The recipe requires 1. Eggs 2. Tomatoes 3. Pineapples 4. Salt 5. Pepper",
    (list, lambda ls: print(ls))
)

['Eggs', 'Tomatoes', 'Pineapples', 'Salt', 'Pepper']
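
The dispatch logic behind template clauses with capture groups can be sketched independently of the LM. In this sketch, `match_sketch`, `capture_names`, and `keyword_matcher` are hypothetical names; the keyword table is a crude stand-in for the semantic judgement the LM performs, and type clauses such as `AdvancedPilot` or `list` are left out.

```python
import string
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Clause = Tuple[str, Callable]
Matcher = Callable[[str, str, List[str]], Optional[Dict[str, str]]]


def capture_names(template: str) -> List[str]:
    """Names of the `{capture}` groups in a natural-language template."""
    return [name for _, name, _, _ in string.Formatter().parse(template) if name]


def match_sketch(
    text: str,
    clauses: Sequence[Clause],
    matcher: Matcher,
    fall_through: Optional[Callable[[], object]] = None,
):
    """Run the handler of the first clause the matcher accepts."""
    for template, handler in clauses:
        captured = matcher(text, template, capture_names(template))
        if captured is not None:
            return handler(**captured)  # captures become keyword arguments
    return fall_through() if fall_through else None


def keyword_matcher(text: str, template: str, names: List[str]) -> Optional[Dict[str, str]]:
    """Stand-in for the LM: matches on a hand-written keyword table."""
    lowered = text.lower()
    if template.startswith("Volume increase") and "sound" in lowered:
        digits = "".join(ch for ch in lowered if ch.isdigit())
        return {names[0]: digits}
    if template == "Lights on" and "dark" in lowered:
        return {}
    return None


clauses = [
    ("Play Music by {artist}", lambda artist: f"play {artist}"),
    ("Volume increase by {volume_up} units", lambda volume_up: f"volume +{volume_up}"),
    ("Lights on", lambda: "lights on"),
]
result_volume = match_sketch("Alexa up the sound by 10 points will you?", clauses, keyword_matcher)
result_lights = match_sketch("Alexa, I am feeling the room is a bit dark", clauses, keyword_matcher)
result_none = match_sketch("Hello there", clauses, keyword_matcher, fall_through=lambda: "no match")
```

Clauses are tried in order, so an earlier clause wins when several could apply, much as in structural pattern matching.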


There are other features of `match` that cannot be explained here in their entirety. We refer interested readers to the implementation of the match function.

## Limitations and Threats to Validity

At the design level, the concepts introduced have been implemented within Python and Marvin, subjecting us to the constraints of both systems. Introducing a programming language feature without the ability to modify the underlying language requires us to manipulate existing language features to accommodate new needs, often leading to excessive code "scaffolding" instead of a more natural expression of the feature.

At the evaluation level, a potential criticism we anticipate is the absence of an evaluation section. Unlike a prompting technique whose effectiveness can be assessed using an existing dataset with controlled variables, language design in programming is fundamentally a human-oriented endeavor. We recognize that the absence of a user study may compromise the validity of this project. Ideally, we would like to determine whether these features truly make LMs more accessible to developers. While many language features could technically be implemented more straightforwardly by "desugaring" them, developers often prefer them for their convenience. Without a user study, the actual impact of these language features remains uncertain.

## Future Work

As mentioned in the limitations section, for future work, we aim to conduct a user-oriented study on the practicality of these constructs, which would enhance our understanding of how software engineers perceive these features.

Additionally, we are exploring the possibility of leveraging these insights to develop our own Domain Specific Language (DSL) for LM-centric computations. This DSL would allow us to experiment with LM-centric language features more freely and with greater control, moving beyond the limitations of Python and Marvin. However, for such a language to be truly effective, it must integrate or communicate with traditional languages like Python or Java, utilizing the robust existing ecosystem. Another potential avenue could involve enhancing a language like Python by introducing novel syntax that goes beyond its current structural pattern matching capabilities to include semantic pattern matching.

Moreover, the prompting strategies in the current project are fixed. Whilst the project is able to leverage tool usage to ensure schema following, it suffers from the same brittleness as manual prompt engineering. However, if we are able to separate the logic from the prompting and embed the language features in a library like DSPy[5], then we could take advantage of its optimization pipeline to improve the accuracy of the process.

Lastly, we plan to investigate the possibilities of language-aware code generation. The literature on in-context code generation, where the LM generates code to fill in specific gaps annotated by programmers, is sparse. Current methods do not capitalize on language-specific attributes such as the surrounding typing context and constraints, information that IDEs currently use to provide IntelliSense recommendations. We see potential for a collaborative design between IDE features and language to harness in-context analysis. Moreover, we aim to explore whether existing unit test frameworks could support code generation. While many code generation techniques currently benefit from unit tests and property-based testing, none have been fully integrated with existing unit testing frameworks. Such integration could significantly enhance the convenience and efficacy of accessing advanced code generation strategies.

## Related Work

This work is influenced by numerous frameworks that have aimed to simplify or enhance the interaction with LMs through programming languages. Specifically, the idea of natural language signatures is an extension of previous developments in DSP, the forerunner of today’s DSPy project[5]. Additionally, significant efforts have been made to constrain LM outputs to adhere to specific schemas or sets of constraints. Tools such as Instructor[11], Outlines[12], and LangChain[13] have implemented mechanisms where LLM outputs are converted into Pydantic models, with associated constraints dynamically validated. However, unlike the Natural Language Types we propose, their validation process relies on standard Pydantic validation methods and does not systematically handle fuzzy natural language constraints or map these constraints directly to the prompts. Moreover, data not represented by the model is discarded, potentially omitting useful information in subsequent processing steps.

The concept of a natural language contract system builds on decades of research into software contracts. The Design by Contract approach, popularized by the Eiffel programming language[14], allows programmers to annotate routines with require and ensure clauses, which are then optionally validated dynamically. The syntax we have adopted for Semantic Contracts is especially influenced by the Racket[15] contract system, notably its define/contract structure for individual functions and its ->i contract combinator for dependency among arguments and results. While Racket contracts are inherently higher-order, our design is currently first-order. Exploring the application of Semantic Contracts to a system like Racket could be an intriguing future direction.
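For readers unfamiliar with Design by Contract, the following is a minimal plain-Python sketch of a first-order contract with a require clause and an ensure clause. It is not the Semantic Contracts implementation from this report: the predicates here are ordinary Python functions, whereas Semantic Contracts would express such conditions in natural language and delegate their evaluation to an LM. The names `contract`, `ContractViolation`, and `truncate` are hypothetical and chosen for this example only.

```python
from functools import wraps
from typing import Any, Callable


class ContractViolation(Exception):
    """Raised when a require or ensure clause does not hold."""


def contract(
    require: Callable[..., bool] = lambda *args, **kwargs: True,
    ensure: Callable[..., bool] = lambda result, *args, **kwargs: True,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Attach a first-order precondition and postcondition to a function.

    `ensure` receives the result followed by the original arguments, so
    the postcondition may depend on the inputs (loosely in the spirit of
    Racket's ->i combinator).
    """

    def decorate(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Check the precondition before running the body.
            if not require(*args, **kwargs):
                raise ContractViolation(f"require failed for {func.__name__}")
            result = func(*args, **kwargs)
            # Check the postcondition against both result and arguments.
            if not ensure(result, *args, **kwargs):
                raise ContractViolation(f"ensure failed for {func.__name__}")
            return result

        return wrapper

    return decorate


@contract(
    require=lambda text, limit: limit > 0,
    ensure=lambda result, text, limit: len(result) <= limit,
)
def truncate(text: str, limit: int) -> str:
    """Return at most `limit` characters of `text`."""
    return text[:limit]
```

Calling `truncate("hello world", 5)` satisfies both clauses, while `truncate("hello", 0)` raises `ContractViolation` because the require clause rejects a non-positive limit. Because the clauses only inspect plain values rather than wrapping function arguments, this sketch is first-order in the same sense as the design described above.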

Lastly, as our work primarily focuses on introducing novel programming language constructs, we see significant parallels with earlier work on programming patterns, particularly Object Relational Mappers[9], which link database tables and queries to the more familiar concepts of programming objects and functions. This pattern closely mirrors our approach of aligning LM constructs with programming language constructs. Additionally, the Natural Language Types draw from the design philosophy of Object-Oriented Design[7], arguing that the patterns of inheritance, polymorphism, and casting remain pertinent in their LM-enhanced counterparts.

## Conclusion 

In conclusion, this research advances the integration of Large Language Models (LLMs) into software engineering by developing new programming constructs designed to improve developer interactions with LLMs. We have introduced Higher Order Types and Functions, Contracts, and Semantic Pattern Matching, along with the concept of 'Natural Language Types'. These constructs are designed to bridge the gap between structured programming needs and the fuzziness typical of LLMs, offering developers a more intuitive and effective way to use AI in software development. Future efforts will focus on further refining these features and assessing their practical impact through user studies. Our goal is to make LLMs more accessible and functional in software engineering, enhancing the overall utility and adoption of AI technologies in the development process.

[1]: “Introducing code llama, a state-of-the-art large language model for coding,” AI at Meta, https://ai.meta.com/blog/code-llama-large-language-model-coding/ (accessed Apr. 27, 2024). 
[2]: S. Zhou et al., “Docprompting: Generating code by retrieving the docs,” arXiv.org, https://doi.org/10.48550/arXiv.2207.05987 (accessed Apr. 27, 2024). 
[3]: P. Y. Zhong et al., “A guide to large language model abstractions,” Two Sigma, https://www.twosigma.com/articles/a-guide-to-large-language-model-abstractions/ (accessed Apr. 28, 2024). 
[4]: Langsmith, https://smith.langchain.com/hub (accessed Apr. 28, 2024). 
[5]: O. Khattab et al., “DSPy: Compiling declarative language model calls into self-improving pipelines,” arXiv.org, https://doi.org/10.48550/arXiv.2310.03714 (accessed Apr. 28, 2024). 
[6]: L. Beurer-Kellner, M. Fischer, and M. Vechev, “Prompting is programming: A query language for large language models,” arXiv.org, https://doi.org/10.48550/arXiv.2212.06094 (accessed Apr. 28, 2024). 
[7]: K. Nygaard, “Basic concepts in object oriented programming,” in Proceedings of the 1986 SIGPLAN Workshop on Object-Oriented Programming, Yorktown Heights, New York, USA, 1986, pp. 128–132.
[8]: J. Skeet and E. Lippert, C# in Depth. Shelter Island, NY: Manning Publications Co., 2019. 
[9]: E. J. O’Neil, “Object/relational mapping 2008: Hibernate and the entity data model (EDM),” in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, 2008, pp. 1351–1356.
[10]: PrefectHQ, “PrefectHQ/marvin: ✨ Build AI interfaces that spark joy,” GitHub, https://github.com/PrefectHQ/marvin (accessed Apr. 28, 2024).
[11]: jxnl, “jxnl/instructor: Structured outputs for LLMs,” GitHub, https://github.com/jxnl/instructor (accessed Apr. 28, 2024).
[12]: outlines-dev, “outlines-dev/outlines: Structured text generation,” GitHub, https://github.com/outlines-dev/outlines (accessed Apr. 28, 2024).
[13]: langchain-ai, “langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications,” GitHub, https://github.com/langchain-ai/langchain (accessed Apr. 28, 2024).
[14]: R. Switzer, Eiffel: an introduction. USA: Prentice-Hall, Inc., 1993. 
[15]: M. Flatt, R. B. Findler, and PLT, “Contracts,” The Racket Guide, https://docs.racket-lang.org/guide/contracts.html (accessed Apr. 28, 2024).
[16]: S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” arXiv.org, revised Mar. 10, 2023, https://doi.org/10.48550/arXiv.2210.03629 (accessed Apr. 28, 2024).