# Quick drafter 

Summarizer
Goal: Summarize a long-form RFP

    Page-wise summaries
        Input
            RFP, broken into pages
            Summary prompt
        Output
            Short summary, per page
    Consolidator
        Input
            Per page summary
            Consolidation prompt
        Output
            Consolidated summary

Drafter
Goal: Draft an initial response to the RFP
    Question answerer
        Input
            RFP summary
            Short section summary
            Section questions
            Vendor information
            Q-A prompt
        Output
            Answers per question
    Section drafter
        Input
            RFP summary
            Short section summary
            Questions + answers
            Drafting prompt
        Output
            Section draft
    Consolidator
        Input
            RFP summary
            List of section drafts
            Draft Consolidation prompt
        Output
            Consolidated draft
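
The Drafter outline above can be read as three chained LLM calls. A minimal sketch of that chain, assuming a hypothetical `llm` callable that maps a prompt string to a response string. The function names and prompt wording are illustrative only and are not the templates in `process.prompts`.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def answer_questions(llm: LLM, rfp_summary: str, section_summary: str,
                     questions: List[str], vendor_info: str) -> Dict[str, str]:
    """Question answerer: one LLM call per section question."""
    answers = {}
    for q in questions:
        prompt = (f"RFP summary: {rfp_summary}\n"
                  f"Section summary: {section_summary}\n"
                  f"Vendor information: {vendor_info}\n"
                  f"Question: {q}\nAnswer: ")
        answers[q] = llm(prompt)
    return answers


def draft_section(llm: LLM, rfp_summary: str, section_summary: str,
                  answers: Dict[str, str]) -> str:
    """Section drafter: turn the question/answer pairs into a section draft."""
    qa = "\n".join(f"Q: {q}\nA: {a}" for q, a in answers.items())
    prompt = (f"RFP summary: {rfp_summary}\n"
              f"Section summary: {section_summary}\n"
              f"{qa}\nDraft: ")
    return llm(prompt)


def consolidate_drafts(llm: LLM, rfp_summary: str,
                       section_drafts: List[str]) -> str:
    """Consolidator: merge the section drafts into one response."""
    joined = "\n\n".join(section_drafts)
    prompt = (f"RFP summary: {rfp_summary}\n"
              f"Section drafts:\n{joined}\nConsolidated draft: ")
    return llm(prompt)
```

Keeping each stage as a plain function over a callable makes it easy to swap models or to test the chain with a stub before spending API calls.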


In [203]:
%load_ext autoreload
%autoreload 1



In [204]:
import sys
sys.path.append('../rfpgo/')
from credentials import *
from process.prompts import *
from utils import *
import os
import pandas as pd
from pathlib import Path
import PyPDF2
os.environ["OPENAI_API_KEY"] = OPENAI_KEY

In [108]:
from langchain_community.llms import Ollama
from langchain_openai import OpenAI, ChatOpenAI
#from langchain_anthropic import ChatAnthropic
#gemma = Ollama(model="gemma:7b")
# trying gemma 2
gemma = Ollama(model="gemma2")
oai_3 = ChatOpenAI(model='gpt-3.5-turbo')
oai_4 = ChatOpenAI(model='gpt-4-turbo')

## Summarizer
Unlike the quick summarizer, this one summarizes the RFP page by page and then consolidates the page summaries into a single summary.
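
The page-then-consolidate idea can be sketched independently of any model. The example below assumes a hypothetical `llm` callable (prompt string in, text out) and placeholder prompt wording; the real prompts come from `process.prompts` and are not shown in this notebook.

```python
from typing import Callable, List


def summarize_pages(pages: List[str], llm: Callable[[str], str]) -> List[str]:
    """Map step: produce one short summary per page."""
    return [llm(f"Summarize this page:\n{page}") for page in pages]


def consolidate(page_summaries: List[str], llm: Callable[[str], str]) -> str:
    """Reduce step: label each page summary, join them, and summarize the result."""
    joined = "".join(
        f"Page {i}: {s}\n\n" for i, s in enumerate(page_summaries, start=1)
    )
    return llm(f"Consolidate these page summaries:\n{joined}")
```

Labelling each summary with its page number lets the consolidated summary cite where a requirement appears.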

In [60]:
class Summarizer(object):
    page_prompt = page_summary
    consolidate_prompt_long = consolidate_summary_long
    consolidate_prompt_short = consolidate_summary_short

    def __init__(self, llm, fn):
        self.llm = llm
        self.llm_name = llm.dict()['model']
        self.split_doc = self._splitter(fn)

    def _splitter(self, doc):
        # Open the PDF at path `doc`; the context manager closes it when done
        with open(doc, 'rb') as pdf_file:
            # Create a PDF reader object
            pdf_reader = PyPDF2.PdfReader(pdf_file)

            # Loop through each page and extract the text
            collect = []
            for p in pdf_reader.pages:
                text = p.extract_text()
                collect.append(text)
        return collect
    
    def summarize(self):
        # pagewise summaries
        self.page_summaries = []
        for s in self.split_doc:
            p = self.page_prompt.format(document=s)
            self.page_summaries.append(call_llm(p, self.llm))

        # consolidate
        joined_p = ''
        for i, p in enumerate(self.page_summaries):
            joined_p += f'Page {i+1}: {p}\n\n'
        c = self.consolidate_prompt_long.format(document=joined_p)
        self.summary = call_llm(c, self.llm)

        # short summary
        c = self.consolidate_prompt_short.format(document=self.summary)
        self.summary_short = call_llm(c, self.llm)

        


In [61]:
# example rfp
fn = '../data/labels/drafter_09262024/RFP_Study to evaluate methods to calculate area median income.pdf'

In [169]:
summary_dict = {}

for model in [gemma, oai_3, oai_4]:
    model_name = model.dict()['model']
    if model_name not in summary_dict:
        summary_dict[model_name] = Summarizer(model, fn)
        summary_dict[model_name].summarize()

    # output to csv
    result = pd.DataFrame.from_records(zip(
        summary_dict[model_name].split_doc, 
        summary_dict[model_name].page_summaries),
        columns=['document', 'summary'])
    result.loc[0, 'long_summary'] = summary_dict[model_name].summary
    result.loc[0, 'short_summary'] = summary_dict[model_name].summary_short
    result.to_csv(f'../data/output/drafter_09262024/summaries_{model_name}.csv', index=False)


**Key Highlights:**

* **Financial Terms:**
    * Detailed breakdown of compensation limits, expense reimbursement procedures, billing requirements, and payment timelines on Page 33.
    * Specific invoicing requirements, prohibited duplicate billing practices, and responsibility for disallowed costs outlined on Page 34.
    * Mandatory use of the Access Equity platform for subcontractor data collection and reporting.
* **Insurance Requirements:**
    * Specific types and minimum coverage amounts required for liability, cyber liability, automobile liability, professional errors and omissions, and fidelity insurance on Pages 35 & 36.
* **Legal and Ethical Obligations:**
    * Compliance with the Americans with Disabilities Act (ADA), restrictions on contract assignment, attorney fees provisions, confidentiality obligations, safeguarding personal information, and addressing potential conflicts of interest on Page 38.
    * Rules regarding conflicts of interest, copyright ownership, and 

In [63]:
# display results for the gemma run
gemma_summary = summary_dict[gemma.dict()['model']]
print(gemma_summary.summary_short)
print('--')
print(gemma_summary.summary)


##  Washington Department of Commerce Contract Template Summary (For Bid Response)

**Focus:** This template outlines the legal framework for working with the Washington Department of Commerce (COMMERCE). **Your response must address ALL its provisions.**

**Key Areas to Address:**

* **Basic Information:** Clearly state your company details, contract amount, funding source, project dates, and purpose.
* **Special Terms & Conditions:**  Detail your compensation structure, expense policies, billing procedures, and payment timeline. Demonstrate understanding of COMMERCE's right to terminate or withhold payments.
* **Financial Management:**  
    * Outline your billing procedures and ensure compliance with cost limitations.
    * Clearly state how you will report subcontractor information through the Access Equity platform.
    * Provide evidence of all required insurance coverage (general liability, cyber liability, automobile liability, professional liability, and fidelity). 
* **Legal 

# From summaries to sections
The next step is to derive a set of proposal sections from these page-level summaries. This likely requires retrieval-augmented generation (RAG), so let's see what we can put together.
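
Retrieval here means ranking pages by similarity to a query and passing the top hits to the LLM. The toy sketch below shows only that ranking step. It uses bag-of-words counts in place of sentence-transformer embeddings and an exhaustive scan in place of a FAISS index; both substitutions are assumptions made to keep the example dependency-free.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(query: str, pages: List[str], k: int = 3) -> List[str]:
    """Return the k pages most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(pages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]
```

With a real embedding model the ranking logic stays the same; only `embed` and the search structure change.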

In [68]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
# this will need to be downloaded from the HF hub
emb_model_name = "sentence-transformers/all-MiniLM-L6-v2"
emb = HuggingFaceEmbeddings(model_name=emb_model_name)

In [88]:
class RAG(object):
    def __init__(self, Summarizer):
        self.Summarizer = Summarizer
        self._create_db()
    
    def _create_db(self):
        self.document_db = FAISS.from_texts(self.Summarizer.split_doc, emb)

    def _query_db(self, prompt):
        relevant_docs = self.document_db.similarity_search(prompt, k=3)
        return relevant_docs

    def _store_req_resp(self, req, resp):
        d = {
            'documents': req,
            'response': resp
            }
        return d
    
    def retrieve_sections(self):
        # get sections
        section_docs = self._query_db(rag_sections)
        joined_docs = '\n'.join([d.page_content for d in section_docs])
        section_prompt = get_sections.format(
            document=joined_docs
            )
        section_response = call_llm(section_prompt, self.Summarizer.llm)
        # store for later
        self.sections = self._store_req_resp(joined_docs, section_response)

        

    
        


In [89]:
r = RAG(gemma_summary)
r.retrieve_sections()

In [67]:

document_db = FAISS.from_texts(doc_splits, emb)

## Drafter

In [142]:
# compile from Howard's work
#compiled_dict = pd.read_clipboard().dropna(how='all', axis=0).ffill(axis=0).groupby('Objective')['Key Tasks'].unique()
#compiled_dict = compiled_dict.apply(list).to_dict()

compiled_dict = {'Objective A: Identify Alternative Methods to Calculate AMI': ['Stakeholder Engagement: Engage with groups such as Washington State Housing Finance Commission, Washington Low Income Housing Alliance, Affordable Housing Advisory Board, Public Housing Authorities, and others.',
  'Hold at least three public input sessions.',
  'Conduct one-on-one meetings with at least 15 key stakeholders.',
  'Geographic Focus: Stakeholder engagement must consider feedback across legislative districts, metropolitan, and non-metropolitan areas in Washington State.',
  'Literature Review: Review current AMI methodologies and innovative alternatives, including U.S. Census Bureau data and regional alternatives.'],
 'Objective B: Calculate AMI Using Alternative Methods': ['Data Collection: Collect relevant income, demographic, and housing market data from legislative districts, metropolitan, and non-metropolitan counties in Washington.',
  'Apply Alternative Methods: Calculate AMI using alternative methods and compare results to the current AMI method.',
  'Comparison Tables: Create comparison tables showing the outcomes of each AMI method for different household sizes and geographic areas (e.g., legislative districts, metro/non-metro areas).'],
 'Objective C: Compare and Evaluate Current and Alternative AMI Methods': ['Impact Analysis: Evaluate the impact of alternative AMI methods on housing programs, including income limits and rent ceilings across legislative districts, metropolitan, and non-metropolitan areas.',
  'Feasibility Analysis: Assess the benefits, costs, and challenges of implementing each alternative AMI method.',
  'Include the feasibility of using census data to calculate AMI by legislative district.'],
 'Objective D: Collect Stakeholder Feedback on Alternative AMI Methods': ['Stakeholder Feedback: Collect feedback from groups such as the Washington Center for Real Estate Research, Habitat for Humanity of Washington State, Northwest Community Land Trust Coalition, veterans advocacy organizations, and Office of Rural and Farmworker Housing.',
  'Geographic Focus: Ensure engagement from stakeholders representing legislative districts, metropolitan, and non-metropolitan areas.',
  'Summarize stakeholder responses regarding alternative AMI methods, focusing on geographic differences, equity, and feasibility.'],
 'Objective E: Provide Recommendations Based on Feedback and Analysis': ['Recommendations: Develop recommendations for alternative AMI methods based on stakeholder feedback and analysis.',
  'Geographic Focus: Provide recommendations that address geographic differences, including those between legislative districts, metropolitan, and non-metropolitan areas.',
  'Provide steps for implementation, including cost considerations and funding sources.']}

In [205]:
# checklist from Howard's work
# checklist_dict = pd.read_clipboard().dropna(how='all', axis=0).ffill(axis=0).groupby(
#   ['Objective', 'Section'])['Questions'].unique()
# checklist_dict = checklist_dict.apply(list).to_dict()


checklist_dict = {
 ('Objective A study questions',
  'Literature review'): ['1. Provide a detailed description of how AMI is calculated for housing assistance programs administered in Washington state, and conduct a comprehensive assessment of the strengths and weaknesses of these AMI methodologies. Assess reliability, accuracy of inflation adjustments, methodological limitations, biases, and key assumptions impacting accuracy and',
  '2. Examine the history of existing AMI calculation methodologies. Describe significant changes, reasons for updates, the intended impacts, and the outcomes resulting from these changes.',
  '3. Identify existing programs using AMI to set income eligibility standards, explaining how they establish income limits and rent amounts for affordable housing units. Provide examples demonstrating AMI’s impact on eligibility criteria and affordability thresholds.',
  '4. Provide a detailed analysis of the potential consequences of inaccurately calculated AMIs, whether they are overestimated or underestimated. This analysis should cover how such inaccuracies could impact housing affordability, eligibility criteria for assistance programs, and economic disparities.',
  '5. What are the current geographic breakdowns used for calculating AMI in Washington state, and how do they vary for different programs? What alternative geographic breakdowns could be considered, including by legislative district, congressional district, or to a specific level of US Census geographic entity, and how might these alternatives affect housing policies and program effectiveness?',
  '6. Compile examples from various jurisdictions, both within Washington and across the nation, where alternative methodologies have been employed to calculate AMI. Include a summary of implementation challenges faced, strategies used to address these challenges, and lessons learned that can inform the evaluation and potential adoption of alternative AMI calculation methods in Washington state.'],
 ('Objective A study questions',
  'Stakeholder engagement'): ['1. Conduct a comprehensive analysis of each stakeholder group to understand their interests, concerns, needs, and potential impacts of alternative AMI calculation methods on their operations and constituents.', '2. Compile stakeholders’ perspectives on the effectiveness of the current AMI methodology and their perception of this methodology’s strengths, weaknesses, and areas for improvement.', '3. Compile alternative methods suggested by stakeholder groups for further evaluation and improvement of the current AMI methodology.', "4. Develop a summary report consolidating stakeholders' overall feedback and suggestions regarding the AMI calculation methods, along with any additional comments or insights provided.", '5. Gather and assess the key criteria and their relative importance identified by stakeholders for prioritizing and evaluating alternative methods for calculating AMI in Washington state.'],
 ('Objective B study questions',
  'Calculate AMI using alternative methods'): ['1. Define and explain the methodology and formula for each alternative AMI calculation method.', '2. Provide the source and date of each data set included in each alternative AMI calculation method.', '3. Provide tables showing the results of each alternative AMI calculation method for currently used geographic areas for different household sizes.', '4. Provide tables showing the results of each alternative AMI calculation method using legislative districts for different household sizes.', '5. Indicate how many people would be affected by each alternative AMI calculation method in terms of additional people shifted in or out of each income eligible bracket.'],
 ('Objective C study questions',
  'Compare and evaluate current and alternative AMI methods'): ['1. How would the implementation of alternative AMI methods impact housing programs and the number of eligible program recipients? In what ways would income limits, housing subsidies, and rent ceilings be affected?',
  '2. Describe how different geographic areas in Washington state would experience changes under alternative AMI methods, and how many households would be impacted by these changes.',
  '3. Are there any equity concerns associated with the current AMI method or the proposed alternatives? Where would the burden be placed, and what potential unintended consequences could arise from different methods?',
  '4. How does the feasibility of implementing alternative AMI methods compare to the current methodology? What are the potential costs, benefits, and challenges associated with each alternative?',
  '5. How do the strengths and weaknesses of each AMI calculation method compare when considering their impact on housing programs and their feasibility for implementation?',
  '6. Conduct a cost-benefit analysis for each alternative AMI calculation method. Evaluate anticipated costs versus expected benefits and determine the overall cost-benefit ratio.'],
 ('Objective D study questions',
  'Follow-up stakeholder engagement'): ['1. Examine the main challenges identified by stakeholders in implementing the new AMI calculation methods. Provide recommendations and suggestions from stakeholders for overcoming these implementation challenges.',
  '2. Report on equity and fairness concerns raised by stakeholders, including potential impacts on different demographic groups or geographic areas.',
  "3. Analyze stakeholders' perceptions of the effectiveness of the proposed alternative AMI methods compared to the current method and summarize stakeholders' views on the most significant potential effects of these new methods on housing affordability and eligibility for assistance programs.",
  "4. Report on stakeholders' understanding of the new methods for calculating AMI, highlighting areas that require further clarification.",
  '5. How do stakeholders view the use of legislative district data for calculating AMI compared to other methods? What advantages or disadvantages do stakeholders see in using legislative district data?'],
 ('Objective E study questions',
  'Provide comprehensive recommendations based on feedback and analysis'): ['1. Evaluate the broader policy implications of adopting each alternative AMI calculation method. Analyze how these methods might influence future housing policies and programs in Washington State, including potential challenges when interacting with income requirements',
  '2. Provide final recommendations for the most suitable alternative methods to calculate AMI in Washington state. Provide a rationale for selecting these methods, aligning them with stakeholder feedback and the overall analysis.',
  '3. Recommend approaches for implementing each alternative AMI calculation method. Outline specific steps and resource requirements for successful implementation.'],
}

In [206]:
# excess criteria for drafting any section
criteria = """1. The technical proposal must contain sufficient detail to convey to members of the evaluation team the Proposer’s knowledge of the subjects and skills necessary to successfully complete the project. Include any required involvement of COMMERCE staff. The Proposer may also present any creative approaches that might be appropriate and may provide any pertinent supporting documentation. Identify any work to be completed by subcontractors but do not select subcontractors until all relevant requirements have been reviewed, including the Code of Federal Regulations if applicable.
2. Project schedule must ensure that all required deliverables are provided. Include a project schedule with deliverables outlining a plan for addressing the question content and reports.
3. The Proposer must identify potential risks that are considered significant to the success of the project in sufficient detail to convey to members of the evaluation team the Proposer’s ability to correctly assess and manage risk. Include how the Proposer will effectively monitor and manage these risks, including timely reporting of risks to COMMERCE.
4. Fully describe deliverables to be submitted under the proposed contract. Deliverables must support the purpose of this RFP."""

In [None]:
s = Summary(
    document_fp=f'{DATA_FP}/0_synth_rfp.txt',
    label_dict=f'{LABEL_FP}/howard_09122024/0_summary.json')
s.run(llm=gemma)
s.save(f'{DATA_FP}/output/howard_09122024/0_summary_output_{gemma.dict()["model"]}.json')

project name
agency
solicitation number
contact person
email
submission deadline
contract term
source link
summary


In [144]:
# vendor info
vendor_info = Path('../data/labels/drafter_09262024/vendor_community_attributes.txt').read_text()

In [225]:
class Drafter(object):
    def __init__(self, llm, req_matrix, vendor_info, checklist_dict, criteria):
        self.llm = llm
        self.req_matrix = req_matrix
        self.vendor_info = vendor_info
        self.checklist_dict = checklist_dict
        self.criteria = criteria
        self.checklist_section_dict, self.checklist_section_contents = self._process_checklist(checklist_dict)

    def _store_req_resp(self, req, resp):
        d = {
            'documents': req,
            'response': resp
            }
        return d

    def _process_checklist(self, checklist_dict):
        checklist_section_dict = {}
        for k, kk in checklist_dict:
            # need to process, this is named something different
            trans_k = ' '.join(k.split()[:2])
            if trans_k not in checklist_section_dict:
                checklist_section_dict[trans_k] = []
            checklist_section_dict[trans_k].append(kk)
        checklist_section_contents = {}
        for (k, kk), contents in checklist_dict.items():
            checklist_section_contents[kk] = contents
        return checklist_section_dict, checklist_section_contents

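    def _create_checklist_req_prompt(self, section):
        # translate section: 'Objective A: ...' -> 'Objective A'
        trans_section = ' '.join(section.split()[:2])[:-1]

        # one block per subsection: its title, then its questions;
        # blocks are separated by a blank line so they do not run together
        blocks = []
        for subsection in self.checklist_section_dict[trans_section]:
            questions = '\n'.join(self.checklist_section_contents[subsection])
            blocks.append(f"{subsection}\n{questions}")
        return '\n\n'.join(blocks)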

    def draft(self):
        self.sections = {}
        for section in self.req_matrix:
            prompt = {
                'section': section,
                'vendor_info': self.vendor_info,
                'section_reqs': '\n'.join(self.req_matrix[section]),
                'checklist_reqs': self._create_checklist_req_prompt(section),
                'criteria': self.criteria
                }
            prompt = vendor_requirements.format(**prompt)
            response = call_llm(prompt, self.llm)
            self.sections[section] = self._store_req_resp(prompt, response)
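
`_process_checklist` and `_create_checklist_req_prompt` both reduce a heading to its first two words, so that `'Objective A: Identify ...'` in the requirements matrix lines up with `'Objective A study questions'` in the checklist. A regex can make the same match without depending on where the colon falls. `objective_key` below is a hypothetical helper, not part of the class above.

```python
import re
from typing import Optional


def objective_key(heading: str) -> Optional[str]:
    """Return 'Objective X' if the heading starts with it, else None."""
    m = re.match(r"\s*(Objective\s+[A-Z])\b", heading)
    # collapse any run of whitespace between the two words to a single space
    return " ".join(m.group(1).split()) if m else None
```

Returning `None` for headings that do not match makes a mismatch between the two dictionaries visible instead of raising a `KeyError` deep inside `draft()`.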

In [226]:
# for each llm, draft
draft_dict = {}

for model in [gemma]:#, oai_3, oai_4]:
    model_name = model.dict()['model']
    if model_name not in draft_dict:    
        drafter = Drafter(model, compiled_dict, vendor_info, checklist_dict, criteria)
        drafter.draft()
        draft_dict[model_name] = drafter.sections
    # output as CSV 
    results = []
    for s in draft_dict[model_name]:
        results.append([s,
        draft_dict[model_name][s]['documents'], 
        draft_dict[model_name][s]['response']])

    results_df = pd.DataFrame(results, columns=['section', 'documents', 'response'])

    results_df.to_csv(f'../data/output/drafter_09262024/drafter_{model_name}_v2.csv', index=False)

Objective A
Objective A
Objective B
Objective C
Objective D
Objective E


In [170]:
# one-off example
prompt = """You are drafting the Objective B: Calculate AMI Using Alternative Methods section of a proposal. Your company details are as follows: 
About Community Attributes Inc (https://communityattributes.com)
* Founded: In 2005 by Chris Mefford, CAI is a Seattle-based consulting firm that focuses on community and economic development. The firm uses demographic, economic, and strategic planning to provide impactful solutions that help communities grow and thrive.
* Specialties: Data storytelling, economic analysis, strategic planning, GIS mapping, and stakeholder engagement. CAI helps clients visualize complex data and make informed decisions for urban planning and organizational development.
Key Management Team
* Chris Mefford (Founder & CEO): Chris has an extensive background in economic development and urban planning. Before founding CAI, he worked in transportation planning and economic analysis. Chris holds an MBA from the University of Washington, an MS in Urban and Regional Planning from the University of Iowa, and a BA in Mathematics and Economics from the University of Northern Iowa. He is a certified planner (AICP) and frequently presents on topics related to regional economic trends and community development.
* Michaela Jellicoe (Senior Economist): With an MS in Agricultural Economics from Purdue University and a BA in Economics and Political Science from Western Washington University, Michaela specializes in economic impact studies and data analysis. She translates complex data into clear and actionable insights for clients across various sectors.
* Bryan Lobel (Senior Planner): An expert in urban and economic planning, Bryan focuses on strategies for economic resilience and sustainability in rural communities. He has contributed to numerous statewide impact studies and is known for his work in economic recovery planning.
* Elliot Weiss (Project Manager): Elliot brings expertise in urban planning and real estate development, holding a Master’s in Urban and Regional Planning and a Graduate Certificate in Real Estate Development from the University of Michigan. His work focuses on urban design, community engagement, and affordable housing projects.
Past and Ongoing Projects
* Nisqually Earthquake Recovery (2001): CAI supported post-earthquake recovery efforts by providing economic and social impact assessments for the City of Seattle.
* Washington State Agricultural Fairs (Ongoing): CAI is conducting ongoing economic impact studies to evaluate the contributions of regional fairs to the state’s economy.
* City of Bremerton (Ongoing): Urban planning projects in Bremerton have focused on improving traffic management and public infrastructure near Naval Base Kitsap.
* Okanogan County (Ongoing): Economic resilience planning for rural communities, with an emphasis on addressing climate change-related challenges.


Unique Business Offering
* CAI Live Platform: CAI stands out in the marketplace with its proprietary CAI Live platform, which integrates economic and planning expertise into a dynamic tool for real-time data visualization. This platform helps clients, such as municipalities and state governments, communicate their development strategies more effectively.
* Data Storytelling Focus: Known for turning complex economic data into visual narratives, CAI excels in using data to tell compelling stories that inform decision-making processes. The firm is widely recognized for its ability to create actionable insights from detailed demographic and economic analyses.
Capabilities
* Expertise spans GIS mapping, economic development, financial modeling, urban planning, and community engagement. The firm supports organizations in both urban and rural development planning, particularly in areas affected by economic or environmental challenges.
Delivery Approach
* Collaborative and Technology-Driven: CAI integrates stakeholder feedback with cutting-edge technology to create tailored solutions. The firm's collaborative approach ensures clients are involved throughout the strategic planning process, and the CAI Live platform enhances transparency and data accessibility.

The section you are drafting requires the following:
Data Collection: Collect relevant income, demographic, and housing market data from legislative districts, metropolitan, and non-metropolitan counties in Washington.
Apply Alternative Methods: Calculate AMI using alternative methods and compare results to the current AMI method.
Comparison Tables: Create comparison tables showing the outcomes of each AMI method for different household sizes and geographic areas (e.g., legislative districts, metro/non-metro areas).

When drafting, you should make sure to address these questions and topic areas: 

Calculate AMI using alternate methods
1. Define and explain the methodology and formula for each alternative AMI calculation method.
2. Provide the source and date of each data set included in each alternative AMI calculation method.
3. Provide tables showing the results of each alternative AMI calculation method for currently used geographic areas for different household sizes.
4. Provide tables showing the results of each alternative AMI calculation method using legislative districts for different household sizes.
5. Indicate how many people would be affected by each alternative AMI calculation method in terms of additional people shifted in or out of each income eligible bracket.


Draft the section using the company details as best you can. Answer truthfully, to the best of your knowledge. If there is information you are missing, specify as a set of questions.

Draft: """

result = call_llm(prompt, oai_4)

In [172]:
print(result)

### Objective B: Calculate AMI Using Alternative Methods

#### Introduction
Community Attributes Inc. (CAI), leveraging its expertise in data storytelling, economic analysis, and strategic planning, proposes to undertake a comprehensive assessment of Area Median Income (AMI) calculations using alternative methodologies. This assessment aims to provide our clients with a deeper understanding of the economic landscape across various regions in Washington, from legislative districts to metropolitan and non-metropolitan counties.

#### Data Collection
CAI will collect up-to-date data on income, demographics, and housing markets. Data sources will include the U.S. Census Bureau, HUD, and local government databases, ensuring accuracy and relevancy. The data will be segmented by legislative districts as well as metro and non-metro areas to maintain precise and localized insights.

#### Methodologies for Alternative AMI Calculation
1. **Geometric Mean Method**: Unlike the traditional arithmeti

In [None]:
# just asking it for the methodology
