# Quick checklist creator

Summarizer
Goal: Create a "checklist" (similar to a requirements matrix) from an RFP

There are a few ways to address this:

Full RFP
- Input
  - FULL RFP
  - Prompt asks for the checklist directly
- Output
  - Checklist
  - Approximate token count (assuming roughly 3.5 characters per token)

Page-wise summary
- Input
  - Per-page summaries
  - Prompt asks to consolidate into checklist
- Output
  - As above

RAG-driven
- Input
  - Vector DB
  - Query for relevant information about sections
  - Top X pages provided to prompt
  - Prompt using that information
- Output
  - As above
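
The token estimate listed under each output is a plain character-count heuristic. A minimal sketch of how it could be used to compare prompt sizes across the approaches, assuming the 3.5 characters per token ratio stated above (the page texts and the name `estimate_tokens` are hypothetical, for illustration only):

```python
def estimate_tokens(text, chars_per_token=3.5):
    """Rough token estimate: character count divided by an assumed ratio."""
    return round(len(text) / chars_per_token)


# Hypothetical page texts standing in for a split RFP.
pages = ["a" * 3500, "b" * 1750, "c" * 700]

# Full-RFP approach: every page goes into the prompt.
full_prompt_tokens = estimate_tokens("\n".join(pages))

# RAG-style approach: only the top-k pages go into the prompt (k=2 here).
top_k_tokens = estimate_tokens("\n".join(pages[:2]))

print(full_prompt_tokens, top_k_tokens)
```

The estimate ignores the model's real tokenizer, so it is only useful for relative comparisons between prompts.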


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append('../rfpgo/')
from credentials import *
from process.prompts import *
from checklist.prompts import *
from summarize.summarizer import Summarizer
from utils import *
import os
import pandas as pd
from pathlib import Path
os.environ["ANTHROPIC_API_KEY"] = ANTHROPIC_KEY

In [3]:
from langchain_community.llms import Ollama
from langchain_anthropic import ChatAnthropic

gemma = Ollama(model="gemma2")
anth_haiku = ChatAnthropic(model='claude-3-haiku-20240307')
anth_opus = ChatAnthropic(model='claude-3-opus-20240229')


## Checklistizer


In [15]:
class Checklist(object):

    # full RFP checklist
    full_rfp_prompt = full_rfp

    # checklist_from_page_summaries
    checklist_from_page_summaries = checklist_from_page_summaries

    # formatting prompt
    format_sections = format_sections

    def _count_tokens(self, text):
        # standard - ~3.5 characters / token
        return round(len(text)/3.5)

    def __init__(self, llm, fn, llm_drafter=None):
        self.llm = llm
        self.llm_name = llm.dict()['model']
        if llm_drafter is not None:
            self.llm_drafter = llm_drafter
        else:
            self.llm_drafter = llm
        self.summarizer = Summarizer(llm, fn)
        split_doc = self.summarizer.split_doc
        self.full_rfp_text = '\n'.join(split_doc)

    def _qa_turn(self):
        # ask questions
        self.c_questions = checklist_questions.format(outline=self.checklist_revisions[-1])
        self.c_questions_response = call_llm(self.c_questions, self.llm).split('</questions>')[0]

        # answer questions
        self.c_answers = checklist_answers.format(document=self.full_rfp_text,
            questions=self.c_questions_response)
        return call_llm(self.c_answers, self.llm).split('</response>')[0]

    def checklist(self, turns=2):

        # track revisions
        self.checklist_revisions = []
        self.questions_revisions = []

        self.c_full = self.full_rfp_prompt.format(document=self.full_rfp_text)
        self.c_full_tokens = self._count_tokens(self.c_full)
        self.c_full_response = call_llm(self.c_full, self.llm_drafter).split('</outline>')[0]

        # first version
        self.checklist_revisions.append(self.c_full_response)

        for _ in range(turns):
            c_answers_response = self._qa_turn()
            # step 4 - update the latest revision with the answers
            self.c_update = checklist_update.format(
                outline=self.checklist_revisions[-1],
                questions=c_answers_response)
            c_update_response = call_llm(self.c_update, self.llm_drafter).split('</outline_update>')[0]
            self.checklist_revisions.append(c_update_response)
            self.questions_revisions.append(c_answers_response)

        # interleave for any number of turns: draft, answers, update, answers, update, ...
        self.narrative = [self.checklist_revisions[0]]
        for answers, revision in zip(self.questions_revisions,
                                     self.checklist_revisions[1:]):
            self.narrative.extend([answers, revision])

        # TODO: decide if this is even a worthwhile strategy, full doc seems to work much better
        # if not hasattr(self.summarizer, 'page_summaries'):
        #     print('Running summarizer...')
        #     self.summarizer.summarize()

        # self.c_page = self.checklist_from_page_summaries.format(
        #     document=self.summarizer.joined_p)
        # self.c_page_tokens = self._count_tokens(self.c_page)
        # self.c_page_response = call_llm(self.c_page, self.llm)

    def display(self):
        print('\n----\n'.join(self.narrative))

    def output(self):
        return pd.Series(self.narrative)
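
The draft, question, answer, update loop in `checklist()` can be sketched independently of any model. The version below is a minimal sketch with stand-in callables (`draft_fn`, `qa_fn`, `update_fn`, and the `fake_*` functions are hypothetical names, not part of `rfpgo`). It assumes each update starts from the latest revision, and it builds the narrative for any number of turns.

```python
def extract_before(text, closing_tag):
    """Keep only the text before a closing tag, like the split(...)[0] calls."""
    return text.split(closing_tag)[0]


def refine(draft_fn, qa_fn, update_fn, turns=2):
    """Return the interleaved narrative: draft, answers, update, answers, update, ..."""
    revisions = [extract_before(draft_fn(), "</outline>")]
    answers = []
    for _ in range(turns):
        answer = extract_before(qa_fn(revisions[-1]), "</response>")
        answers.append(answer)
        updated = update_fn(revisions[-1], answer)
        revisions.append(extract_before(updated, "</outline_update>"))
    narrative = [revisions[0]]
    for answer, revision in zip(answers, revisions[1:]):
        narrative.extend([answer, revision])
    return narrative


# Stand-in "LLM" calls so the sketch runs without any API access.
def fake_draft():
    return "outline v0</outline>ignored trailing text"


def fake_qa(outline):
    return f"answers about [{outline}]</response>"


def fake_update(outline, answers):
    return f"{outline}+</outline_update>"


narrative = refine(fake_draft, fake_qa, fake_update, turns=3)
print(len(narrative))
```

With `turns=n` the narrative holds `2 * n + 1` entries, so downstream code does not need to know how many turns were run.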



In [8]:
# example rfp
fn = '../data/labels/drafter_09262024/RFP_Study to evaluate methods to calculate area median income.pdf'

In [None]:
# run checklist with two setups
# test - gemma
test_opt = Checklist(gemma, fn, llm_drafter=gemma)
test_opt.checklist()
test_opt.display()
test_opt.output()

In [22]:
# haiku only
opt1 = Checklist(anth_haiku, fn)
opt1.checklist()

In [18]:
# opus + haiku
opt2 = Checklist(anth_haiku, fn, llm_drafter=anth_opus)
opt2.checklist()

In [24]:
# stack and output
output_df = pd.concat([opt1.output(), opt2.output()], axis=1)
output_df.columns = ['haiku_only', 'haiku_opus']
output_df.to_csv(
    '../data/output/checklist_11042024.csv', index=False)

In [122]:
checklist_dict = {}

for model in [anth_haiku]:  # other candidates: anth_opus, oai_3, oai_4
    model_name = model.dict()['model']
    if model_name not in checklist_dict:
        checklist_dict[model_name] = Checklist(model, fn)
        checklist_dict[model_name].checklist()
    else:
        print('Skipping', model_name)

In [123]:
d = checklist_dict[model.dict()['model']].__dict__

In [124]:
for r in d['questions_revisions']:
    print(r)
    print('--')

Here are the answers to the questions based on the RFP document:

1. The deadline for submitting the proposal is August 20, 2024 at 12:00pm Pacific Time.

2. The total budget allocated for this project is $225,000. Proposals in excess of this amount will be considered non-responsive and will not be evaluated.

3. The key criteria or factors that will be used to evaluate the technical proposal (Section III) are:

Technical Proposal - 60%
- Project Approach/Methodology
- Work Plan  
- Project Schedule
- Risks
- Deliverables


--
Questions and Answers:

1. What are the key deliverables expected for the project?
The key deliverables for the project are:
- A draft report by March 17, 2025
- A final written report by April 28, 2025
- An editable PowerPoint summarizing the report by May 9, 2025
- A video recorded voice-over presentation of the PowerPoint slides by May 16, 2025
- Up to four live presentations of the report to various groups as requested by COMMERCE

2. What specific qualificat

In [125]:
#print out the narrative here
print(d['checklist_revisions'][0])
print('--')
print(d['questions_revisions'][0])
print('--')
print(d['checklist_revisions'][1])
print('--')
print(d['questions_revisions'][1])
print('--')
print(d['checklist_revisions'][2])


Based on the RFP document, here is a detailed outline of the sections required in a response:

I. Letter of Submittal (Mandatory)
   A. Name, address, and contact information of the Proposer
   B. Principal officers of the Proposer
   C. Legal status and organization of the Proposer
   D. Federal and state identification numbers
   E. Location of operations
   F. Identification of any state employees or former state employees involved

II. Certifications and Assurances (Mandatory)
   A. Signed and dated form

III. Technical Proposal (Scored)
   A. Project Approach/Methodology
   B. Work Plan
      1. Objective A: Identify alternative methods to calculate AMI
         a. Stakeholder Engagement
         b. Literature Review
      2. Objective B: Calculate AMI by alternative methods
         a. Data Collection
         b. Application of Methods
         c. Creating Comparison Tables
      3. Objective C: Compare and evaluate current and alternative AMI methods
         a. Impact Analysis


In [47]:
for r in [d['c_full_response'],
          d['c_questions_response'],
          d['c_answers_response'],
          d['c_update_response']]:
    print(r)
    print('___')

Based on the detailed Request for Proposal (RFP) document provided, here is an outline of the key sections required in a response:

1. Letter of Submittal (Mandatory)
   - Provide basic information about the Proposer (name, address, legal status, etc.)
   - Identify any state employees or former state employees involved with the Proposer

2. Certifications and Assurances (Mandatory) 
   - Signed form agreeing to the terms and conditions of the RFP

3. Technical Proposal (Scored)
   - Project Approach/Methodology
   - Work Plan addressing the 5 objectives and corresponding study questions
   - Project Schedule
   - Risks
   - Deliverables

4. Management Proposal (Scored)
   - Project Management
     - Project Team Structure and Internal Controls
     - Staff Qualifications and Experience
   - Experience of the Proposer
   - Related Information (e.g. past contracts, terminated contracts)
   - References
   - OMWBE and WDVA Certification (optional)

5. Cost Proposal (Scored)
   - Identifi

In [59]:
results = []

for model in checklist_dict:
    results.append(
        [model, 
        checklist_dict[model].c_full_response, 
        checklist_dict[model].c_page_response])

df = pd.DataFrame(results, 
    columns=['model', 'full_response', 'page_response'])

df.to_csv('../data/output/checklist_10212024/checklists.csv')

In [55]:
# the token count depends only on the prompt text, so it is the same for every model
print(checklist_dict['gemma2'].c_full_tokens, checklist_dict['gemma2'].c_page_tokens)

39071 4671
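
For scale, the two counts printed above can be compared directly. A small arithmetic sketch, reusing the 3.5 characters per token assumption from `_count_tokens`:

```python
full_tokens, page_tokens = 39071, 4671

# Share of the full-RFP prompt that the page-summary prompt needs.
ratio = page_tokens / full_tokens

# Approximate size of the full prompt in characters, inverting the heuristic.
approx_chars_full = full_tokens * 3.5

print(f"{ratio:.1%}", approx_chars_full)
```

The page-summary prompt is roughly 12% of the size of the full-RFP prompt under this estimate.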


In [7]:
for model in checklist_dict:
    print(model)
    print(checklist_dict[model].c_full_response)
    print('----')

claude-3-haiku-20240307
Based on the review of the Request for Proposals (RFP) document, the following outline summarizes the key sections required in a response:

I. Letter of Submittal (Mandatory)
   A. Proposer information (name, address, principal officers, legal status, etc.)
   B. Federal and state business registration details
   C. Disclosure of any state employees or former state employees involved

II. Certifications and Assurances (Mandatory)
   A. Signed certification form agreeing to RFP terms and conditions
   B. Indication of any proposed contract edits

III. Technical Proposal (Scored)
   A. Project Approach/Methodology
   B. Work Plan addressing the 5 specified objectives and corresponding study questions
   C. Project Schedule
   D. Risks and mitigation strategies
   E. Deliverables

IV. Management Proposal (Scored)
   A. Project Team Structure and Internal Controls
   B. Staff Qualifications and Experience
   C. Proposer's Relevant Experience
   D. References
   E. O

In [79]:
for model in checklist_dict:
    print(model)
    print(checklist_dict[model].c_page_response)
    print('----')

gemma2
This contract document between the Washington Department of Commerce (COMMERCE) and a Contractor outlines the terms and conditions governing their work together. 

**Key Sections:**

* **Pages 31-32:** Introductions - Table of Contents, Contract Face Sheet outlining parties, amount, dates, purpose, and incorporated documents.
* **Pages 33-34:** Financial Terms - Compensation, reimbursement, billing procedures, payment timelines, financial reporting requirements via Access Equity.
* **Pages 35-36:** Insurance -  Mandatory coverage types (general liability, cyber, automobile, professional, fidelity) with specific minimum limits; Fidelity insurance details for both primary contractor and subcontractors.
* **Page 37:** General Terms & Conditions - Key definitions ("Authorized Representative," "Personal Information"), data access provisions, advance payments, amendments, and the principle that the written contract is all-encompassing.
* **Pages 38-39:** Legal & Ethical Requirements -

In [8]:
# # display results
# print(gemma_summary.summary_short)
# print('--')
# print(gemma_summary.summary)


In [9]:
# step in between - need to process the checklist text into sections and descriptions
checklist_source = checklist_dict['claude-3-haiku-20240307'].c_full_response
section_dict = {}
cur_section = None
for l in checklist_source.split('\n'):
    # identify whether line is a section
    # if so, add to dict
    # if not, add to current section
    if len(l)>0:
        if l[0].isnumeric():
            cur_section = ' '.join(l.split()[1:])
            section_dict[cur_section] = []
        else:
            if cur_section is not None:
                section_dict[cur_section].append(l.strip())
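
The check `l[0].isnumeric()` only recognizes headings numbered with digits, while some of the outlines printed earlier use Roman numerals (`I.`, `II.`, ...). A minimal sketch of a parser that accepts both, assuming top-level headings are unindented and sub-items are indented (`parse_outline` and the sample text are hypothetical):

```python
import re

# Top-level headings start with digits or Roman numerals followed by a period,
# e.g. "1. Letter of Submittal" or "IV. Management Proposal".
HEADING = re.compile(r"^(?:\d+|[IVXLCDM]+)\.\s+(.*)$")


def parse_outline(text):
    """Map each top-level heading to the list of lines beneath it."""
    sections = {}
    current = None
    for raw in text.split("\n"):
        if not raw.strip():
            continue
        match = HEADING.match(raw)  # indented sub-items never match
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(raw.strip())
    return sections


sample = (
    "Intro line\n\n"
    "I. Letter of Submittal (Mandatory)\n"
    "   A. Proposer information\n"
    "   B. Registration details\n\n"
    "II. Certifications and Assurances (Mandatory)\n"
    "   A. Signed form\n"
)
parsed = parse_outline(sample)
print(list(parsed))
```

An unindented sub-item such as `C. ...` would be mistaken for a Roman numeral heading, so this only holds while the model keeps the indentation.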

In [10]:
# the above does not seem to work well with RAG
# might make sense to take the individual lines of the outline
# this is hacky - will need to reconfigure output of the checklist
checklist_source = checklist_dict['claude-3-haiku-20240307'].c_full_response
sections = []
# skip the first line
for l in checklist_source.split('\n')[1:]:
    if len(l)>0:
        sections.append(l.strip())

In [11]:
sections

['I. Letter of Submittal (Mandatory)',
 'A. Proposer information (name, address, principal officers, legal status, etc.)',
 'B. Federal and state business registration details',
 'C. Disclosure of any state employees or former state employees involved',
 'II. Certifications and Assurances (Mandatory)',
 'A. Signed certification form agreeing to RFP terms and conditions',
 'B. Indication of any proposed contract edits',
 'III. Technical Proposal (Scored)',
 'A. Project Approach/Methodology',
 'B. Work Plan addressing the 5 specified objectives and corresponding study questions',
 'C. Project Schedule',
 'D. Risks and mitigation strategies',
 'E. Deliverables',
 'IV. Management Proposal (Scored)',
 'A. Project Team Structure and Internal Controls',
 'B. Staff Qualifications and Experience',
 "C. Proposer's Relevant Experience",
 'D. References',
 'E. Optional: OMWBE/WDVA Certification',
 'V. Cost Proposal (Scored)',
 'A. Identification of all costs, including expenses',
 'B. Breakdown of

In [12]:
# quick test - if we ask the LLM to ask questions, do we get something sensible?
prompt = """Below is the outline of a proposal. \
What questions would you ask to get more information for drafting the proposal? \
Respond with no more than three questions, one per line. 
Generate nothing else.

{document}"""
f_prompt = prompt.format(document=checklist_dict['claude-3-haiku-20240307'].c_full_response)

resp = call_llm(f_prompt, anth_haiku)


In [13]:
print(resp)

1. Can you provide more details on the 5 specified objectives and corresponding study questions mentioned in the Technical Proposal section?

2. What are the key evaluation criteria that will be used to assess the proposals?

3. Are there any specific formatting or page limit requirements for the different sections of the proposal?


In [14]:
questions = [q for q in resp.split('\n') if len(q)>0]
questions

['1. Can you provide more details on the 5 specified objectives and corresponding study questions mentioned in the Technical Proposal section?',
 '2. What are the key evaluation criteria that will be used to assess the proposals?',
 '3. Are there any specific formatting or page limit requirements for the different sections of the proposal?']
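
Each parsed question keeps its leading `1. ` style numbering, and these strings are later used as retrieval queries. Whether the prefix affects retrieval is untested here; if it turns out to matter, a sketch like the following could strip it (`clean_questions` is a hypothetical helper):

```python
import re


def clean_questions(response):
    """Split an LLM response into questions and drop leading list numbering."""
    questions = []
    for line in response.split("\n"):
        line = line.strip()
        if not line:
            continue
        # Remove prefixes such as "1. " or "2) " at the start of the line.
        questions.append(re.sub(r"^\d+[.)]\s*", "", line))
    return questions


example = "1. What is the deadline?\n\n2) What is the budget?"
print(clean_questions(example))
```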

# RAG workflow

In [15]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
# this will need to be downloaded from the HF hub
emb_model_name = "sentence-transformers/all-MiniLM-L6-v2"
emb = HuggingFaceEmbeddings(model_name=emb_model_name)


In [16]:
class RAG(object):
    def __init__(self, llm, fn, questions, sections=None):
        self.llm = llm
        self.llm_name = llm.dict()['model']
        self.summarizer = Summarizer(llm, fn)
        self.split_doc = self.summarizer.split_doc
        # querying by section does not seem to retrieve relevant info,
        # so sections are optional and default to an empty list
        # self.section_dict = section_dict
        self.sections = sections if sections is not None else []
        self.questions = questions
        self._create_db()
    
    def _create_db(self):
        self.document_db = FAISS.from_texts(self.split_doc, emb)

    def _query_db(self, prompt, k=3):
        relevant_docs = self.document_db.similarity_search(prompt, k=k)
        return relevant_docs

    def _store_req_resp(self, req, resp):
        d = {
            'documents': req,
            'response': resp
            }
        return d
    

    def retrieve_sections(self, k=3):
        # get sections
        self.sections_resp = {}
        for s in self.sections:
            query_prompt = rag_sections_detail.format(
                document=s
            )
            section_docs = self._query_db(query_prompt, k=k)
            joined_docs = '\n'.join([d.page_content for d in section_docs])
            # section_prompt = get_sections.format(
            #     document=joined_docs
            #     )
            # section_response = call_llm(section_prompt, self.llm)
            # store for later
            #self.sections = self._store_req_resp(joined_docs, section_response)
            self.sections_resp[s] = self._store_req_resp(joined_docs, '')

    def retrieve_answers(self, k=3):
        # get answers to q prompts
        self.q_resp = {}
        for q in self.questions:
            q_docs = self._query_db(q, k=k)
            joined_docs = '\n'.join([d.page_content for d in q_docs])
            self.q_resp[q] = self._store_req_resp(joined_docs, '')
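
The retrieval step in `_query_db` boils down to ranking pages by similarity to a query and keeping the top `k`. The sketch below illustrates that idea with a toy bag-of-words cosine similarity from the standard library. This is a stand-in, not what FAISS or the MiniLM embeddings actually compute, and the page texts and function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts, used here in place of a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, pages, k=3):
    """Return the k pages most similar to the query, best first."""
    q = bag_of_words(query)
    ranked = sorted(pages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]


# Hypothetical page texts for illustration.
pages = [
    "The proposal deadline is August 20.",
    "Insurance coverage requirements for contractors.",
    "Evaluation criteria and scoring weights.",
]
best = top_k("What is the deadline for the proposal?", pages, k=1)
print(best)
```

Word overlap misses paraphrases that an embedding model would catch, which is one reason the notebook uses sentence embeddings instead.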

    


In [18]:
r = RAG(gemma, fn, questions=questions)
r.retrieve_answers(k=2)

In [19]:
for q in r.q_resp:
    print(q)
    for d in r.q_resp[q]['documents'].split('\n'):
        print(d[:100])
    print('----')

1. Can you provide more details on the 5 specified objectives and corresponding study questions mentioned in the Technical Proposal section?
 
Page | 14 of 48 
 proprietorship. Proposers wishing to submit any proposed contract edits must indicate so on this fo
(see Section 2.14).  
 
 
 TECHNICAL PROPOSAL (SCORED) 
The Technical Proposal must contain a comprehensive description of services including the following 
elements: 
 
A. Project Approach/Methodology : Include a complete description of the Proposer’s 
proposed approach and methodology for the project. This section should convey Proposer’s 
full understanding of the proposed project.  See Exhibit E for the entire list of questions that 
should be answered in the resulting report.  
 
B. Work Plan: Include all project requirements and the proposed tasks, services, activities, etc. 
necessary to accomplish the scope of work and five objectives of the project defined in this 
RFP:  
  
Objective A. Identify alternative methods to c

In [None]:
for s in r.sections_resp:
    print(s)
    if 'objectives' in s:
        for d in r.sections_resp[s]['documents'].split('\n'):
            print(d[:100])
        print('----')

I. Introduction
A. Purpose and Background
B. Project Title
C. Funding Source
II. General Information for Proposers
A. RFP Coordinator Contact Information
B. Estimated Schedule of Procurement Activities
C. Question and Answer Period
D. Pre-Proposal Conference
E. Proposal Submission Requirements
III. Proposal Contents
A. Mandatory Certifications and Assurances
B. Technical Proposal
1. Project Approach/Methodology
2. Work Plan
3. Project Schedule
4. Risks and Mitigation Strategies
5. Deliverables
C. Management Proposal
1. Project Team Structure
2. Internal Controls
3. Project Manager Qualifications
D. Cost Proposal
1. Detailed Budget
E. Diverse Business Inclusion Plan
F. Small or Veteran-Owned Business Certification
IV. Evaluation and Award
A. Evaluation Criteria and Weighting
B. Possibility of Virtual Presentations or Interviews
C. Notification of Apparent Successful Contractor
D. Debriefing Process for Unsuccessful Proposers
E. Protest Procedures
V. Contract Requirements
A. Contract Ter