# Extracting Eligibility Rules

We've done a few similar exercises. In this workbook we want to look at the Expanded Offer instead.

The goal is to obtain a list of all the eligibility rules for every action in the scheme, ideally disregarding other requirements that don't qualify as eligibility rules (e.g. the requirement to have an inspection during the grant period).

In [None]:
test_file = 'src/python/output/expanded_sections/txt/AGF1 Maintain very low density infield agroforestry on less sensitive land.txt'



In [24]:
with open('prompts/eligibility-ExOffer-few-shot.txt', 'r') as f:
    system_prompt = f.read()

print(system_prompt)

You are an agent in the Policy department. Your task is to extract information from the given document and present it in a structured format.

The output format will be JSON, which is a collection of nested key-value pairs and unnamed lists.

The information we need to extract is:
- the action code. This is typically a few uppercase characters and a number and will be the first thing in the document.
- the description of the action. This will be the remainder of the first line, after the action code
- the payment amount in £ per unit. Name this value accordingly. It is an annual payment so 'per year' doesn't need to be included.
- the eligibility rules. There will be multiple of these so they should be in a list.

Each eligibility rule needs a name, which you can derive from the description by replacing spaces with dashes and removing special characters.
This name should be recorded as the value of a key named 'id'.

If an eligibility rule mentions a numeric value then the value shou
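
The prompt asks the model to derive each rule's `id` by replacing spaces with dashes and removing special characters. The same convention can be sketched in plain Python, for example to sanity-check the ids that come back. `rule_id_from_description` is a hypothetical helper, not something the workbook defines, and its handling of case and punctuation is an assumption.

```python
import re

def rule_id_from_description(description: str) -> str:
    """Hypothetical helper: derive a rule id from its description."""
    # Assumption: "special characters" means anything that is not a
    # letter, digit, whitespace or dash.
    cleaned = re.sub(r"[^A-Za-z0-9\s-]", "", description)
    # Collapse runs of whitespace and dashes into a single dash.
    return re.sub(r"[\s-]+", "-", cleaned.strip())

print(rule_id_from_description("Protected land (SSSIs)"))
```

Case is left untouched here, which matches ids such as `land-parcel-linked-to-SBI` only if the description itself is written that way.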

# Test it with an unseen input file

In [36]:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.schema.messages import SystemMessage
import yaml
import os
import json
import pandas as pd
from openai import OpenAI


# Set up your OpenAI API key
with open('openai.yaml', 'r') as f:
    config = yaml.safe_load(f)

os.environ['OPENAI_API_KEY'] = config['openai_key']

llm = ChatOpenAI()
client = OpenAI()

In [5]:
# read in a file
with open('/Users/joe/Documents/git/defra/ffc-rps-scratchpad/src/python/output/expanded_sections/txt/CIPM1 Assess integrated pest management and produce a plan.txt', 'r') as f:
    unseen_file = f.read()

unseen_file

'CIPM1: Assess integrated pest management and produce a plan\nDuration\xa0\n3 years\nHow much you’ll be paid\xa0\xa0\n£1,129 for the assessment and plan per year\nAction’s aim\xa0\nThis action’s aim is that you:\xa0\nunderstand the benefits, costs, impacts and risks of your current approach to crop pest, weed and disease management for your land\xa0\neffectively plan how to adopt a range of integrated pest management (IPM) methods appropriate to your farm\xa0\xa0\nWhere you can do this action\xa0\nYou can do this action on all agricultural land located below the moorland line. The IPM assessment and plan should cover all of the relevant areas of your farm.\xa0\nThis is an ‘agreement level’ SFI action. This means you apply to include it in your agreement\xa0instead of entering specific areas of land.\xa0\xa0\nYou can only apply for this action:\xa0\nif at least one land parcel is linked to your Single Business Identifier (SBI), so it’s shown on your digital maps\xa0\nin one SFI agreemen

In [25]:
def extract_structured_rules(action_text: str, system_prompt: str):
    """Given the text of an Action, return the model's response as a string,
    which should be a JSON document of the eligibility rules following the
    schema described in the system prompt."""
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                # renamed from `input` to avoid shadowing the builtin
                "content": action_text,
            }
        ],
        model="gpt-4",
    )

    return chat_completion.choices[0].message.content

# The action text is the first argument and the system prompt the second
result = extract_structured_rules(unseen_file, system_prompt)
print(result)

{
    "code": "CIPM1",
    "description": "Assess integrated pest management and produce a plan",
    "payment": {
      "amountPerYear": 1129
    },
    "eligibilityRules": [
      { "id": "land-parcel-linked-to-SBI", "config": { "is_linked_to_SBI": "true" }},
      { "id": "only-in-one-SFI-agreement", "config": { "is_in_one_SFI_agreement": "true" }},
      { "id": "protected-land-SSSIs", "id": "SSSIs", "config": { "is_protected": "true", "SSSI_consents_required": "false" } },
      { "id": "protected-land-Historic-and-archaeological-features", "id": "Historic-and-archaeological", "config": { "is_protected": "true", "HEFER_required": "false" } }
    ]
}


In [26]:
json.loads(result)

{'code': 'CIPM1',
 'description': 'Assess integrated pest management and produce a plan',
 'payment': {'amountPerYear': 1129},
 'eligibilityRules': [{'id': 'land-parcel-linked-to-SBI',
   'config': {'is_linked_to_SBI': 'true'}},
  {'id': 'only-in-one-SFI-agreement',
   'config': {'is_in_one_SFI_agreement': 'true'}},
  {'id': 'SSSIs',
   'config': {'is_protected': 'true', 'SSSI_consents_required': 'false'}},
  {'id': 'Historic-and-archaeological',
   'config': {'is_protected': 'true', 'HEFER_required': 'false'}}]}
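
Two of the rules in the raw response contain the `id` key twice, and `json.loads` silently keeps only the last value, which is why `protected-land-SSSIs` has become `SSSIs` in the parsed result. A minimal sketch of how such duplicates could be surfaced instead, using the standard `object_pairs_hook` parameter; `reject_duplicate_keys` is a hypothetical name, and the sample string is a shortened stand-in for the model output.

```python
import json

def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        seen[key] = value
    return seen

# Shortened stand-in for one rule from the response above
raw = '{"id": "protected-land-SSSIs", "id": "SSSIs", "config": {}}'

# Default behaviour: the last "id" wins and no error is raised
print(json.loads(raw))

# With the hook, the duplicate is reported
try:
    json.loads(raw, object_pairs_hook=reject_duplicate_keys)
except ValueError as err:
    print(err)
```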

In [37]:
import os
from pathlib import Path
import json

def process_txt_to_json(directory, system_prompt):
    # Define the txt and json subdirectories
    txt_dir = Path(directory) / 'txt'
    json_dir = Path(directory) / 'json'
    print(f"Input directory: {txt_dir.resolve()}")
    print(f"Output directory: {json_dir.resolve()}")

    # Create the json subdirectory if it doesn't exist
    json_dir.mkdir(exist_ok=True)

    # Iterate through the txt files in the txt subdirectory.
    # NOTE: the pattern below matches a single file; use '*.txt' to process every action.
    for txt_file in txt_dir.glob('AHW2 Supplementary winter bird food.txt'):
        # Read the contents of the txt file
        with txt_file.open('r', encoding='utf-8') as file:
            content = file.read()

        # Define the new json file path with the same name but .json extension
        json_file = json_dir / txt_file.with_suffix('.json').name

        # send the content to the LLM using the supplied prompt, which should contain few shot examples
        response = extract_structured_rules(content, system_prompt)

        with json_file.open('w', encoding='utf-8') as file:
            file.write(response)
        print(f"Wrote out {json_file.name}")


# Usage example
directory = "output/expanded_sections"
process_txt_to_json(directory, system_prompt)


Input directory: /Users/joe/Documents/git/defra/ffc-rps-scratchpad/src/python/output/expanded_sections/txt
Output directory: /Users/joe/Documents/git/defra/ffc-rps-scratchpad/src/python/output/expanded_sections/json


RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
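
The run stopped on the first API call with a 429 `insufficient_quota` error, and retrying is unlikely to help until the quota is restored. For larger batches it may still be useful to record which files were written and which failed rather than letting one exception end the loop. The sketch below assumes the extraction step is passed in as a callable; `process_files` and `fake_extract` are hypothetical names, and `fake_extract` only simulates a failure so the example runs without an API key.

```python
import tempfile
from pathlib import Path
from typing import Callable

def process_files(txt_files, json_dir: Path, extract: Callable[[str], str]):
    """Run `extract` on each file, collecting successes and failures."""
    written, failed = [], []
    for txt_file in txt_files:
        try:
            response = extract(txt_file.read_text(encoding="utf-8"))
        except Exception as err:
            # Assumption: in the real workbook this could be narrowed
            # to openai.RateLimitError.
            failed.append((txt_file.name, str(err)))
            continue
        out = json_dir / txt_file.with_suffix(".json").name
        out.write_text(response, encoding="utf-8")
        written.append(out.name)
    return written, failed

def fake_extract(text: str) -> str:
    """Stand-in for the LLM call; fails on demand."""
    if "fail" in text:
        raise RuntimeError("simulated API error")
    return '{"code": "DEMO1"}'

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ok.txt").write_text("DEMO1: some action", encoding="utf-8")
    (root / "bad.txt").write_text("fail", encoding="utf-8")
    written, failed = process_files(sorted(root.glob("*.txt")), root, fake_extract)

print(written)
print(failed)
```

In the workbook, `extract` could be a small wrapper around `extract_structured_rules` with the system prompt already bound.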