# GenParse Tiny Example

This notebook demonstrates a simple use case of the GenParse library for constrained text generation.
It uses a small grammar that constrains the output to the phrase "Sequential Monte Carlo is "
followed by either "good" or "bad".

The notebook showcases how to set up inference, run it, and process the results to obtain
probabilities for each generated text.
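
As a rough illustration of that last step, the sketch below turns a set of weighted samples into a probability for each distinct text. It is not GenParse's own post-processing: the `posterior_from_particles` helper and the `(text, log_weight)` pairs are hypothetical, and the weights are made-up values rather than output from this notebook.

```python
import math
from collections import defaultdict


def posterior_from_particles(particles):
    """Map (text, log_weight) pairs to normalized probabilities per text."""
    # Subtract the largest log weight before exponentiating, for numerical stability.
    max_log_weight = max(log_weight for _, log_weight in particles)
    totals = defaultdict(float)
    for text, log_weight in particles:
        # Particles that produced the same text pool their weight.
        totals[text] += math.exp(log_weight - max_log_weight)
    normalizer = sum(totals.values())
    return {text: weight / normalizer for text, weight in totals.items()}


# Hypothetical particles with equal log weights (assumed values, for illustration only).
particles = [
    ("Sequential Monte Carlo is good", -1.0),
    ("Sequential Monte Carlo is good", -1.0),
    ("Sequential Monte Carlo is bad", -1.0),
    ("Sequential Monte Carlo is good", -1.0),
    ("Sequential Monte Carlo is bad", -1.0),
]
posterior = posterior_from_particles(particles)
print(posterior)
```

With equal weights, each text's probability is simply its share of the particles; unequal weights shift the estimate toward the heavier particles.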

First, reset the environment to clear all variables and imports. Note that Jupyter keeps the saved output of the last run when you reopen the notebook, so a cell keeps showing its old results, even after you fix the code, until you run it again.

In [1]:
# Import get_ipython to access the running IPython instance,
# and os to set environment variables
from IPython import get_ipython
import os

# Disable tokenizers parallelism to prevent conflicts with
# Jupyter's asynchronous event loop.
os.environ['TOKENIZERS_PARALLELISM'] = 'false'

# Get the current IPython instance
ipython = get_ipython()

# If an IPython instance is running, reset the kernel state to clear all variables and imports
if ipython:
    ipython.run_line_magic('reset', '-sf')

# Import the modules needed for display and asynchronous operations (the reset above cleared the namespace)
from IPython.display import clear_output
import nest_asyncio

# Apply nest_asyncio to allow nested event loops, which is useful in Jupyter notebooks
nest_asyncio.apply()

# Clear the output of the current cell to keep the notebook clean
clear_output(wait=True)

In [2]:
from genparse import InferenceSetup

## Define Grammar

In [3]:
# Define a simple grammar using Lark syntax
grammar = """
start: "Sequential Monte Carlo is " ( "good" | "bad" )
"""
print('Grammar defined.')

Grammar defined.
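
To make the constraint concrete, here is a toy sketch of what the grammar permits at each step of generation. It is a stand-in for illustration only: it enumerates the two accepted strings by hand instead of using a parser, and the names `LANGUAGE`, `allowed_next_chars`, and `is_complete` are hypothetical and not part of GenParse.

```python
# The two strings accepted by the grammar defined above, written out by hand.
PREFIX = "Sequential Monte Carlo is "
LANGUAGE = [PREFIX + "good", PREFIX + "bad"]


def allowed_next_chars(prefix):
    """Return the characters that keep `prefix` extendable to a string in LANGUAGE."""
    return sorted(
        {s[len(prefix)] for s in LANGUAGE if s.startswith(prefix) and len(s) > len(prefix)}
    )


def is_complete(text):
    """Return True if `text` is one of the strings the grammar accepts."""
    return text in LANGUAGE


print(allowed_next_chars(""))             # ['S']
print(allowed_next_chars(PREFIX))         # ['b', 'g']
print(allowed_next_chars(PREFIX + "go"))  # ['o']
print(is_complete(PREFIX + "good"))       # True
```

Enumeration only works here because the language is finite and tiny; for larger grammars a parser has to decide which continuations are valid.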


## Initialize InferenceSetup

In [4]:
# Initialize InferenceSetup with GPT-2 model and character-level proposal
inference_setup = InferenceSetup('gpt2', grammar, proposal_name='character')
print('InferenceSetup created successfully.')

2024-11-26 18:26:30,795 - genparse.util : Using CPU for LM next token probability computations
2024-11-26 18:26:30,795 - genparse.util : Initializing Python Earley parser
2024-11-26 18:26:30,930 - genparse.util : Initializing character proposal with 2 subprocesses


InferenceSetup created successfully.


## Run Inference

In [5]:
# Run inference with a single space as the initial prompt, 5 particles,
# and set verbosity to 1 to print progress to the console
inference_result = inference_setup(' ', n_particles=5, verbosity=1)
print('Inference completed.')

├ Particle   0 `S` : 1.10:	[S]
├ Particle   1 `Sequ` : 1.10:	[Sequ]
├ Particle   2 `Sequ` : 1.10:	[Sequ]
├ Particle   3 `Sequ` : 1.10:	[Sequ]
├ Particle   4 `Se` : 1.10:	[Se]
│ Step   1 average weight: 1.0986
└╼
├ Particle   0 `e` : 2.20:	[S|e]
├ Particle   1 `e` : 2.48:	[Sequ|e]
├ Particle   2 `e` : 2.48:	[Sequ|e]
├ Particle   3 `e` : 2.48:	[Sequ|e]
├ Particle   4 `qu` : 2.20:	[Se|qu]
│ Step   2 average weight: 2.3795
└╼
├ Particle   0 `q` : 3.30:	[S|e|q]
├ Particle   1 `nt` : 3.18:	[Sequ|e|nt]
├ Particle   2 `nt` : 3.18:	[Sequ|e|nt]
├ Particle   3 `n` : 3.18:	[Sequ|