In [1]:
from ccai9012 import llm_utils

In [5]:
import pandas as pd
import os

## Initialize and test the LLM

In [13]:
# Initialize LLM
api_key = llm_utils.get_deepseek_api_key()
llm = llm_utils.initialize_llm()

Enter your DEEPSEEK_API_KEY:  ········


In [21]:
# Test connection
test = ["Is 9.9 or 9.11 bigger?"]
llm_utils.ask_llm(test)


📌 Prompt:
['Is 9.9 or 9.11 bigger?']

Let's compare the two numbers: **9.9** and **9.11**.

---

**Step 1: Compare the whole number part**  
Both numbers have the same whole number part: **9**.

---

**Step 2: Compare the decimal part**  
- 9.9 means **9 + 0.9**  
- 9.11 means **9 + 0.11**

Now compare 0.9 and 0.11:  
0.9 = 0.90, which is greater than 0.11.

---

**Step 3: Conclusion**  
Since 0.90 > 0.11, we have:

\[
9.9 > 9.11
\]

---

\[
\boxed{9.9}
\]
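
The model's answer can be checked directly. The following is a minimal sketch, not part of the original notebook, that uses Python's standard `decimal` module to compare the two values exactly:

```python
from decimal import Decimal

a = Decimal("9.9")
b = Decimal("9.11")

# Pad 9.9 to two decimal places, mirroring Step 2 of the model's answer
print(a.quantize(Decimal("0.01")), ">", b, "is", a > b)

# The larger of the two values
larger = max(a, b)
print(larger)
```

This prints `9.90 > 9.11 is True` followed by `9.9`, which agrees with the model's boxed result.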



## Parse and embed the document

In [5]:
retriever = llm_utils.build_pdf_retriever(
    pdf_path="data/1526439.pdf",
    embedding_model_name="BAAI/bge-base-en-v1.5"
)

  embedding_model = HuggingFaceEmbeddings(model_name=embedding_model_name)
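
To make the retrieval step less of a black box, here is a toy sketch of what a retriever generally does: split the text into overlapping chunks, embed each chunk, and return the chunks most similar to the query. This is not the implementation of `llm_utils.build_pdf_retriever`. The function names, the chunk sizes, and the bag-of-words "embedding" are all assumptions made for illustration; the notebook itself uses a `HuggingFaceEmbeddings` model, and the sample sentences below are made up.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: word counts. A real retriever uses a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Made-up sample chunks standing in for text extracted from a PDF
chunks = [
    "The kick-off meeting covered project goals and a data monitoring plan.",
    "Energy audits identified efficiency measures for tribal buildings.",
    "Quarterly progress reports were written for the grant.",
]
top = retrieve("Which energy efficiency measures did the audits identify?", chunks, k=1)
print(top[0])
```

The chunks returned by this step are what a QA chain passes to the LLM as context, so the chunk size and the quality of the embedding model bound how much of the document the model can actually see.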


## Summarize the document

In [13]:
summary = llm_utils.run_qa_chain(
    query="Please summarize the location, main objectives, actions, and stakeholders described in this energy plan document.",
    retriever=retriever,
    llm=llm,
    return_sources=True,
    save_path="output/summary.txt"
)


--- Final Answer ---
Based on the provided text, here is a summary of the described energy plan.

**Location:**
The project is located in **Aniak, Alaska**.

**Main Objectives:**
The overall goal is to create an **Energy Action Plan** for the Aniak Traditional Council. The key objectives are:
*   Improve energy efficiency in tribal buildings.
*   Summarize energy audit recommendations.
*   Provide actionable steps to guide future retrofit projects.
*   Help the Tribe achieve its overall energy goals.

**Actions:**
The project followed a specific procedure, which included these key actions:
1.  **Holding a Kick-off Meeting:** Discussing project goals with the Tribal Council and developing a data monitoring plan to track energy use and occupant comfort.
2.  **Conducting Energy Audits:** Energy Audits of Alaska performed assessments of the tribal buildings to identify energy efficiency measures.
3.  **Developing the Energy Action Plan:** Creating this document, which includes audit summa

## Ask a specific question

In [45]:
answer = llm_utils.run_qa_chain(
    query="Give detailed description of the responsibility of Cold Climate Housing Research Center (CCHRC)",
    retriever=retriever,
    llm=llm,
    return_sources=True,
    save_path="output/question.txt"
)


--- Final Answer ---
The **Cold Climate Housing Research Center (CCHRC)** plays a key role in supporting energy efficiency and renewable energy projects in cold climates, particularly in communities like Aniak, Alaska. Based on the provided context, here are the detailed responsibilities of CCHRC in the **Aniak Energy Action Plan 2019**:

### **Primary Responsibilities:**
1. **Project Coordination & Planning**  
   - Collaborates with **Energy Audits of Alaska** to develop an **Energy Action Plan** for the Aniak Tribe.  
   - Helps the Tribe meet grant requirements (e.g., writing quarterly progress reports, final reports, and outreach materials).  

2. **Energy Audits & Assessments**  
   - Works with **Energy Audits of Alaska** to conduct **on-site energy assessments** of buildings in Aniak.  
   - Collects **baseline data** (energy use, building conditions, occupant comfort) before audits.  

3. **Data Management & Reporting**  
   - Stores **baseline data** on CCHRC's server and 

## Extract information from multiple documents for comparison

In [17]:
from langchain.prompts import PromptTemplate

structured_prompt = PromptTemplate.from_template(
"""
Given the following document text, extract key information and output a markdown table with columns:

| Location | Main Objectives | Key Actions | Stakeholders | Timeline |
|----------|-----------------|-------------|--------------|----------|

Use exact information from the text; if any info is missing, write "N/A".

Context:
{context}

Question:
{question}
"""
)
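
The template has two placeholders, `{context}` and `{question}`. The sketch below shows how such placeholders are typically filled, using plain `str.format` so it runs without LangChain. It assumes, without confirmation from `llm_utils`, that the QA chain joins the retrieved chunks into `{context}` and passes the query as `{question}`; the chunk strings are placeholders, not real document text.

```python
# A shortened template with the same two placeholders as structured_prompt
template = (
    'Use exact information from the text; if any info is missing, write "N/A".\n\n'
    "Context:\n{context}\n\n"
    "Question:\n{question}\n"
)

# Hypothetical retrieved chunks (placeholders, not real document text)
retrieved_chunks = ["<chunk 1 text>", "<chunk 2 text>"]

# Assumption: the chain joins the chunks and substitutes both placeholders
filled = template.format(
    context="\n\n".join(retrieved_chunks),
    question="Extract key information from this document.",
)
print(filled)
```

Printing the filled prompt before sending it is a quick way to confirm that the model receives both the table instructions and the document text.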

In [27]:
folder_path = "data"
save_csv_path = "output/multiple_comparison.csv"
query_text = "Extract key information from this document."
results = []

for fname in sorted(os.listdir(folder_path)):
    if not fname.lower().endswith(".pdf"):
        continue
    pdf_path = os.path.join(folder_path, fname)
    print(f"Processing {pdf_path} ...")

    # Build retriever for this pdf
    retriever = llm_utils.build_pdf_retriever(pdf_path)

    # Run QA chain for structured extraction
    extracted_text = llm_utils.run_qa_chain(
        query=query_text,
        retriever=retriever,
        llm=llm,
        prompt_template=structured_prompt,
        return_sources=False
    )

    results.append({
        "pdf_path": fname,
        "extracted_text": extracted_text,
    })

df = pd.DataFrame(results)
os.makedirs(os.path.dirname(save_csv_path), exist_ok=True)
df.to_csv(save_csv_path, index=False, encoding="utf-8-sig")
print(f"\n Saved results to {save_csv_path}")

Processing data/1526439.pdf ...

--- Final Answer ---
Based on the provided text, here is the extracted key information in the requested markdown table format.

| Location | Main Objectives | Key Actions | Stakeholders | Timeline |
|----------|-----------------|-------------|--------------|----------|
| Aniak | Improve building condition; Decrease energy use; Determine if future retrofit projects are successful. | Check fuel and electrical use monthly; Survey buildings for layout, insulation, condition, and safety; Collect and analyze baseline data; Maintain a maintenance notebook; Monitor data related to seasonal energy use and building operation. | Contractors; Traditional Council; Project staff; Tribal finance director; Tribal building maintenance manager; Occupants; Cold Climate Housing Research Center (CCHRC) | Monthly (energy bill analysis); Period just prior to the energy audit (for baseline data); Mid-2019 (time of publication) |
Processing data/1526994.pdf ...

--- Final Answe

## Combine the results extracted from the individual documents

In [23]:
merged_df_list = []
df = pd.read_csv("output/multiple_comparison.csv")

for i, md in enumerate(df["extracted_text"]):
    try:
        parsed = llm_utils.parse_markdown_table(md)
        parsed["source_doc"] = df.loc[i, "pdf_path"] if "pdf_path" in df.columns else f"doc_{i+1}"
        merged_df_list.append(parsed)
    except Exception as e:
        print(f"Failed to parse doc {i+1}: {e}")

final_df = pd.concat(merged_df_list, ignore_index=True)
final_df.to_csv("output/merged_energy_policy.csv", index=False)
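
The cell above relies on `llm_utils.parse_markdown_table`, whose implementation is not shown. The following standard-library sketch illustrates one way such a parser could work. The name `parse_markdown_table_sketch` and its behavior are assumptions for illustration only: it returns a list of row dictionaries rather than a DataFrame and does not handle escaped pipes inside cells. The sample row reuses values from the Atmautluak result.

```python
def parse_markdown_table_sketch(text):
    """Return the first markdown table in `text` as a list of row dicts."""
    # Keep only lines that look like table rows; this skips the LLM's preamble
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        raise ValueError("no markdown table found")

    def cells(line):
        # Drop the outer pipes, then split on the inner ones
        return [c.strip() for c in line.strip("|").split("|")]

    header = cells(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator row
        values = cells(line)
        if len(values) != len(header):
            raise ValueError(f"expected {len(header)} cells, got {len(values)}")
        rows.append(dict(zip(header, values)))
    return rows


sample = """Here is the extracted table.

| Location | Timeline |
|----------|----------|
| Atmautluak | May 2018 |
"""
rows = parse_markdown_table_sketch(sample)
print(rows)
```

Raising `ValueError` on malformed input fits the `try`/`except` in the loop above: a document whose answer contains no valid table is reported and skipped instead of stopping the whole merge.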

In [25]:
final_df

|   | Location | Main Objectives | Key Actions | Stakeholders | Timeline | source_doc |
|---|----------|-----------------|-------------|--------------|----------|------------|
| 0 | Aniak | Improve building condition; Decrease energy us... | Check fuel and electrical use monthly; Survey ... | Contractors; Traditional Council; Project staf... | Monthly (energy bill analysis); Period just pr... | 1526439.pdf |
| 1 | Kwigillingok | Improve building condition; Decrease energy us... | Survey buildings; Analyze energy bills; Mainta... | I.R.A. Council; Project staff; Tribal finance ... |  | 1526994.pdf |
| 2 | Atmautluak | Create a data monitoring plan for each buildin... | 1. Meet to discuss project goals. ; 2. Collect... | Tribe, CCHRC (Cold Climate Housing Research Ce... | May 2018 | 1527001.pdf |
| 3 |  | - Improve building condition; - Decrease energ... | - Survey buildings; - Analyze energy bills; - ... | - I.R.A. Council; - Contractors; - Service pro... |  | 1527003.pdf |


## Can the LLM further analyze the combined summary?

In [27]:
summary_prompt = PromptTemplate.from_template(
    """
    Below are extracted policy tables from multiple documents.

    Please compare and summarize:
    - What are the common energy policy goals?
    - Which regions have the most comprehensive plans?
    - Are there unique or innovative actions mentioned?
    - Summarize the main differences in stakeholders and timelines.

    Keep the answer concise and structured in paragraphs with bullet points.

    Tables:
    {context}
    """
)

from langchain.chains import LLMChain

all_tables = "\n\n".join(df["extracted_text"].dropna().tolist())

llm_chain = LLMChain(prompt=summary_prompt, llm=llm)
summary = llm_chain.invoke({"context": all_tables})

print(summary["text"])
with open("output/energy_policy_summary.txt", "w", encoding="utf-8") as f:
    f.write(summary["text"])

Based on the provided policy tables for the Alaskan communities of Aniak, Kwigillingok, and Atmautluak, here is a comparative summary.

### Common Energy Policy Goals
The communities share nearly identical primary objectives, focusing on practical, immediate improvements to building efficiency and energy management.
*   **Improve Building Condition:** All plans prioritize assessing and upgrading the physical state of buildings, including insulation, siding, and safety.
*   **Decrease Energy Use:** A universal goal is to reduce consumption of fuel and electricity to lower costs and increase sustainability.
*   **Data-Driven Planning:** Each project emphasizes the critical role of collecting and analyzing baseline data (e.g., from energy bills and building surveys) to inform future retrofit projects and measure success.

### Regions with the Most Comprehensive Plans
While all plans are focused, Atmautluak's project outline is the most structured and comprehensive.
*   **Atmautluak** has 