# Quick page-wise summarizer

Summarizer
Goal: Summarize a long-form RFP

* Page-wise summaries
    * Input
        * RFP, broken into pages
        * Summary prompt
    * Output
        * Short summary, per page
* Consolidator
    * Input
        * Per-page summaries
        * Consolidation prompt
    * Output
        * Consolidated summary
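
The two stages outlined above can be sketched without any model at all. This is a minimal illustration, not the notebook's implementation: `summarize_pages`, `consolidate`, and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in that simply keeps the first five words of its input.

```python
from typing import Callable, List


def summarize_pages(pages: List[str], summarize: Callable[[str], str]) -> List[str]:
    """Stage 1: produce one short summary per page."""
    return [summarize(page) for page in pages]


def consolidate(page_summaries: List[str], summarize: Callable[[str], str]) -> str:
    """Stage 2: label each page summary, join them, and summarize once more."""
    joined = "".join(
        f"Page {i}: {s}\n\n" for i, s in enumerate(page_summaries, start=1)
    )
    return summarize(joined)


def fake_llm(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep the first five words."""
    return " ".join(text.split()[:5])


pages = ["Scope of work for the study ...", "Insurance and payment terms ..."]
page_summaries = summarize_pages(pages, fake_llm)
final_summary = consolidate(page_summaries, fake_llm)
print(page_summaries)
print(final_summary)
```

Passing the model in as a plain callable keeps both stages testable before any API key is involved.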

In [1]:
%load_ext autoreload
# mode 2 reloads all modules; mode 1 only reloads modules registered with %aimport
%autoreload 2

In [2]:
import sys
sys.path.append('../rfpgo/')
from credentials import *
from process.prompts import *
from utils import *
import os
import pandas as pd
from pathlib import Path
import PyPDF2
os.environ["OPENAI_API_KEY"] = OPENAI_KEY
os.environ["ANTHROPIC_API_KEY"] = ANTHROPIC_KEY

In [3]:
from langchain_community.llms import Ollama
from langchain_openai import OpenAI, ChatOpenAI
from langchain_anthropic import ChatAnthropic
# trying gemma 2
gemma = Ollama(model="gemma2")
oai_3 = ChatOpenAI(model='gpt-3.5-turbo')
oai_4 = ChatOpenAI(model='gpt-4-turbo')
oai_4o = ChatOpenAI(model='gpt-4o')
oai_4omini = ChatOpenAI(model='gpt-4o-mini')
anth_haiku = ChatAnthropic(model='claude-3-haiku-20240307')
anth_opus = ChatAnthropic(model='claude-3-opus-20240229')


## Summarizer
This summarizer differs from the quick one: it summarizes the document page by page, consolidates the page summaries into a single long summary, and then condenses that long summary into a short one.

In [None]:
class Summarizer(object):
    page_prompt = page_summary
    consolidate_prompt_long = consolidate_summary_long
    consolidate_prompt_short = consolidate_summary_short

    def __init__(self, llm, fn):
        self.llm = llm
        self.llm_name = llm.dict()['model']
        self.split_doc = self._splitter(fn)

    def _splitter(self, fn):
        # Read the PDF at path `fn` and return the extracted text of each page, in order
        collect = []
        with open(fn, 'rb') as pdf_file:
            # Create a PDF reader object
            pdf_reader = PyPDF2.PdfReader(pdf_file)

            # Loop through each page and extract the text
            for p in pdf_reader.pages:
                collect.append(p.extract_text())
        return collect
    
    def summarize(self):
        # pagewise summaries
        self.page_summaries = []
        for s in self.split_doc:
            p = self.page_prompt.format(document=s)
            self.page_summaries.append(call_llm(p, self.llm))

        # consolidate
        joined_p = ''
        for i, p in enumerate(self.page_summaries):
            joined_p += f'Page {i+1}: {p}\n\n'
        c = self.consolidate_prompt_long.format(document=joined_p)
        self.summary = call_llm(c, self.llm)

        # short summary
        c = self.consolidate_prompt_short.format(document=self.summary)
        self.summary_short = call_llm(c, self.llm)
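
The class depends on two things imported from modules that are not shown here: the prompt templates (from `process.prompts`) and `call_llm` (from `utils`). The sketch below shows one plausible shape for them, as an assumption rather than the actual code. The template text is invented; the only property the class relies on is a `{document}` placeholder. `call_llm` is assumed to wrap LangChain's `invoke`, which returns a message object with `.content` for chat models and a plain string for text-completion models. `EchoLLM`, `EchoChat`, and `Message` are hypothetical stubs so the block runs offline.

```python
# Hypothetical template: the real one lives in process.prompts.
page_summary = "Summarize this RFP page in two or three sentences:\n\n{document}"


def call_llm(prompt, llm):
    """Send a prompt to a model and return plain text (assumed behaviour)."""
    response = llm.invoke(prompt)
    # Chat models return a message with .content; plain LLMs return a str.
    return getattr(response, "content", response)


class Message:
    """Stub for a chat message object."""
    def __init__(self, content):
        self.content = content


class EchoLLM:
    """Stub text-completion model: returns a string."""
    def invoke(self, prompt):
        return f"echo: {prompt}"


class EchoChat:
    """Stub chat model: returns a message object."""
    def invoke(self, prompt):
        return Message(f"echo: {prompt}")


prompt = page_summary.format(document="Page text goes here.")
text_from_llm = call_llm(prompt, EchoLLM())
text_from_chat = call_llm(prompt, EchoChat())
print(text_from_llm == text_from_chat)
```

Normalizing the return type in one helper lets `Summarizer` treat Ollama, OpenAI, and Anthropic models identically.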

        


In [None]:
# example rfp
fn = '../data/labels/drafter_09262024/RFP_Study to evaluate methods to calculate area median income.pdf'

In [None]:
summary_dict = {}

for model in [gemma, oai_3, oai_4]:
    model_name = model.dict()['model']
    if model_name not in summary_dict:
        summary_dict[model_name] = Summarizer(model, fn)
        summary_dict[model_name].summarize()

    # output to csv
    result = pd.DataFrame.from_records(zip(
        summary_dict[model_name].split_doc, 
        summary_dict[model_name].page_summaries),
        columns=['document', 'summary'])
    result.loc[0, 'long_summary'] = summary_dict[model_name].summary
    result.loc[0, 'short_summary'] = summary_dict[model_name].summary_short
    result.to_csv(f'../data/output/drafter_09262024/summaries_{model_name}.csv', index=False)
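
The output path above embeds the model name directly. That works for the three models in the loop, but Ollama tags can contain a colon (for example `gemma2:9b`), which is not a valid file-name character on Windows. A small sanitizer, sketched here under the hypothetical name `safe_filename`, is one way to guard against that.

```python
import re


def safe_filename(model_name: str) -> str:
    """Replace runs of characters outside [A-Za-z0-9._-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", model_name)


for name in ["gpt-4o-mini", "gemma2:9b"]:
    print(f"summaries_{safe_filename(name)}.csv")
```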


**Key Highlights:**

* **Financial Terms:**
    * Detailed breakdown of compensation limits, expense reimbursement procedures, billing requirements, and payment timelines on Page 33.
    * Specific invoicing requirements, prohibited duplicate billing practices, and responsibility for disallowed costs outlined on Page 34.
    * Mandatory use of the Access Equity platform for subcontractor data collection and reporting.
* **Insurance Requirements:**
    * Specific types and minimum coverage amounts required for liability, cyber liability, automobile liability, professional errors and omissions, and fidelity insurance on Pages 35 & 36.
* **Legal and Ethical Obligations:**
    * Compliance with the Americans with Disabilities Act (ADA), restrictions on contract assignment, attorney fees provisions, confidentiality obligations, safeguarding personal information, and addressing potential conflicts of interest on Page 38.
    * Rules regarding conflicts of interest, copyright ownership, and 

In [None]:
# display results
gemma_summary = summary_dict['gemma2']  # entries are keyed by model name in the loop above
print(gemma_summary.summary_short)
print('--')
print(gemma_summary.summary)


##  Washington Department of Commerce Contract Template Summary (For Bid Response)

**Focus:** This template outlines the legal framework for working with the Washington Department of Commerce (COMMERCE). **Your response must address ALL its provisions.**

**Key Areas to Address:**

* **Basic Information:** Clearly state your company details, contract amount, funding source, project dates, and purpose.
* **Special Terms & Conditions:**  Detail your compensation structure, expense policies, billing procedures, and payment timeline. Demonstrate understanding of COMMERCE's right to terminate or withhold payments.
* **Financial Management:**  
    * Outline your billing procedures and ensure compliance with cost limitations.
    * Clearly state how you will report subcontractor information through the Access Equity platform.
    * Provide evidence of all required insurance coverage (general liability, cyber liability, automobile liability, professional liability, and fidelity). 
* **Legal 