### The JSON research outline follows this structure:

{ 
   "title": "1. Introduction",
   "subsections": [
       {
           "title": "1.1 The Global Food Challenge and the Role of Irrigation",
           "point1": {
               "text": "...",
               "query1": "...",
               "query2": "...",
               ...
           },
           "point2": {
               "text": "...", 
               "query1": "...",
               "query2": "...",
               ...
           },
           ...
       },
       ...
   ]
}

#### The mapping from JSON to Excel is as follows:

1. The `text` value of each point, stored in the `point_text` column.
2. One or more `query1`, `query2`, ... entries per point, each stored in the `query` column.
3. For each `query` entry, the following dependent columns (left empty when the table is first built):
   * `doi`
   * `title` 
   * `full_text`
   * `bibtex`
   * `pdf_location`
   * `journal`
   * `citation_count`
   * `relevance_score`

To reconstruct the Excel table from the JSON, for each `point_text`, its value should be repeated for each associated `query`. This means that if there are 7 queries for a particular `point_text`, that `point_text` value should be repeated 7 times in the Excel table, with each repetition accompanied by the corresponding `query` and its dependent fields.
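The repetition rule can be illustrated with a minimal sketch. The sample values (`Point A`, `q1`, and so on) and the helper name `flatten_subsection` are hypothetical; only the `pointN` / `text` / `queryN` layout is taken from the outline structure above.

```python
# Hypothetical sample data; only the "pointN"/"text"/"queryN" layout
# comes from the outline structure described above.
subsection = {
    "title": "1.1 The Global Food Challenge and the Role of Irrigation",
    "point1": {"text": "Point A", "query1": "q1", "query2": "q2"},
    "point2": {"text": "Point B", "query1": "q3"},
}


def flatten_subsection(subsection):
    """Return one row per query, repeating the point text on every row."""
    rows = []
    for key, point in subsection.items():
        if not key.startswith("point"):
            continue  # Skip non-point keys such as "title"
        for query_key, query in point.items():
            if query_key.startswith("query"):
                rows.append({"point_text": point["text"], "query": query})
    return rows


rows = flatten_subsection(subsection)
for row in rows:
    print(row)
```

Here `Point A` has two queries, so it appears on two rows, while `Point B` appears once.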

In [3]:
import os
import json
import pandas as pd

# Set the path to the documents folder
documents_folder = "documents"

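# Create an Excel writer object. These two lines must stay active:
# with them commented out, `writer` and `output_file` are undefined,
# which triggers the NameError shown in this cell's output.
output_file = os.path.join(documents_folder, "master.xlsx")
writer = pd.ExcelWriter(output_file, engine="openpyxl")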

# Iterate over the JSON files in the documents folder
for filename in os.listdir(documents_folder):
    if filename.startswith("outline_") and filename.endswith(".json"):
        file_path = os.path.join(documents_folder, filename)
        
        # Extract the sheet name from the filename
        sheet_name = filename.split("_")[1].split(".")[0]
        
        # Load the JSON data from the file
        with open(file_path, "r") as file:
            json_data = json.load(file)
        
        # Create an empty list to store the data for the current JSON file
        data_list = []
        
        # Extract the data from the JSON structure
        for section in json_data:
            document_title = section["title"]
            for subsection in section.get("subsections", []):
                section_title = subsection["title"]
                for point_key, point_data in subsection.items():
                    if point_key.startswith("point"):
                        for query_key, query_data in point_data.items():
                            if query_key.startswith("query"):
                                data_list.append({
                                    "document_title": document_title,
                                    "section_title": section_title,
                                    "point_text": point_data["text"],
                                    "query": query_data,
                                    "doi": "",
                                    "title": "",
                                    "full_text": "",
                                    "bibtex": "",
                                    "pdf_location": "",
                                    "journal": "",
                                    "citation_count": "",
                                    "relevance_score": ""
                                })
        
        # Create a DataFrame from the data list
        df = pd.DataFrame(data_list)
        
        # Reorder the columns
        column_order = [
            "document_title",
            "section_title",
            "point_text",
            "query",
            "doi",
            "title",
            "full_text",
            "bibtex",
            "pdf_location",
            "journal",
            "citation_count",
            "relevance_score"
        ]
        df = df[column_order]
        
        # Save the DataFrame to a separate sheet in the Excel file
        df.to_excel(writer, sheet_name=sheet_name, index=False)

# Save the Excel file
writer.close()
print(f"Data from JSON files loaded and saved to {output_file}.")

NameError: name 'writer' is not defined

In [4]:
import openpyxl
import asyncio
import json
from llm_api_handler import LLM_APIHandler
import nest_asyncio
nest_asyncio.apply()

from prompts import get_prompt, remove_illegal_characters, outline, review_intention

class ExcelQueryProcessor:
    def __init__(self, file_path, api_key_path):
        # Keep the path on the instance so the workbook can be saved back
        # without relying on a global `file_path` variable.
        self.file_path = file_path
        self.workbook = openpyxl.load_workbook(file_path)
        self.api_handler = LLM_APIHandler(api_key_path)

    async def process_queries(self, sheet_name, prompt_template, input_columns, output_columns):
        sheet = self.workbook[sheet_name]
        header_row = next(sheet.iter_rows(values_only=True))
        # Map each header name to its 0-based position in a row tuple
        column_names = {cell: index for index, cell in enumerate(header_row)}

        query_data = []
        for row in sheet.iter_rows(values_only=True, min_row=2):
            try:
                # Rows without full text cannot be analysed, so skip them
                if not row[column_names["full_text"]]:
                    continue
                query_values = {col: row[column_names[col]] for col in input_columns}
                query_data.append({"unique_identifier": row[column_names["unique_identifier"]], "query": query_values})
            except Exception as e:
                print(f"Error occurred while processing row: {str(e)}")
                break

        if not query_data:
            print("No valid queries found. Skipping API calls.")
            return

        try:
            results = await self.process_queries_async(query_data, prompt_template, output_columns)
            self.update_sheet(sheet, results, output_columns, column_names)
            self.workbook.save(self.file_path)
        except Exception as e:
            print(f"Error occurred during processing: {str(e)}")

    async def process_queries_async(self, query_data, prompt_template, output_columns):
        tasks = [asyncio.ensure_future(self.process_single_query(query, prompt_template, output_columns)) for query in query_data]
        return await asyncio.gather(*tasks)

    async def process_single_query(self, query, prompt_template, output_columns):
        prompt = get_prompt(prompt_template, outline=outline, review_intention=review_intention, **query["query"])
        response = await self.api_handler.generate_gemini_content(prompt)
        return {"unique_identifier": query["unique_identifier"], "response": response}

    def update_sheet(self, sheet, results, output_columns, column_names):
        # column_names holds 0-based positions (from enumerate over the header),
        # while sheet.cell() expects 1-based column numbers.
        for col in output_columns:
            if col not in column_names:
                new_column = sheet.max_column + 1
                sheet.cell(row=1, column=new_column, value=col)
                column_names[col] = new_column - 1  # Store as 0-based, like the other entries

        for result in results:
            if result is None:
                continue  # Skip writing if the result is None due to an error
            unique_identifier = result["unique_identifier"]
            response = result["response"]
            try:
                row = [r for r in sheet.iter_rows() if r[column_names["unique_identifier"]].value == unique_identifier][0]
                for col, value in response.items():
                    if col in output_columns:
                        cleaned_value = remove_illegal_characters(str(value))
                        # Convert the 0-based position to a 1-based column number
                        sheet.cell(row=row[0].row, column=column_names[col] + 1, value=cleaned_value)
            except IndexError:
                print(f"Row not found for unique_identifier: {unique_identifier}")
            except Exception as e:
                print(f"Error occurred while writing to sheet: {str(e)}")

# Usage example
file_path = "documents/master.xlsx"
api_key_path = r"C:\Users\bnsoh2\OneDrive - University of Nebraska-Lincoln\Documents\keys\api_keys.json"
sheet_name = "1"
prompt_template = "paper_analysis"
input_columns = ["point_text", "section_title", "document_title", "full_text", "unique_identifier"]
output_columns = ["relevance_score", "explanation", "extract_1", "extract_2", "Alternate_section"]

processor = ExcelQueryProcessor(file_path, api_key_path)
asyncio.run(processor.process_queries(sheet_name, prompt_template, input_columns, output_columns))

Elapsed time: 1711119818.8536696
Elapsed time: 1711119818.8536696
Elapsed time: 1711119818.8536696
Elapsed time: 1711119818.8637197
Elapsed time: 1711119818.8637197
{
 "explanation": "The provided research paper does not contain the phrase you specified, \"the challenge of feeding a growing population with finite resources.\" So I am unable to provide any relevant extracts or a relevance score. However, the provided research paper does briefly mention other aspects of irrigation management, such as the need for water-efficient practices and the importance of real-time data for dynamic irrigation systems.",
 "extract_1": "",
 "extract_2": "",
 "relevance_score": "0.0",
 "Alternate_section": ""
}
Elapsed time: 0.0
{
    "explanation": "The provided document does not contain information relevant to the specific point in question, which is: \n\n* The challenge of feeding a growing population with finite resources\n\nHence, the relevance score is assigned as zero. ",
    "extract_1": null,


ERROR:root:Error in Gemini API call: The `response.parts` quick accessor only works for a single candidate, but none were returned. Check the `response.prompt_feedback` to see if the prompt was blocked.


Elapsed time: 1.7141025066375732
Error occurred during processing: The `response.parts` quick accessor only works for a single candidate, but none were returned. Check the `response.prompt_feedback` to see if the prompt was blocked.
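The message above suggests that a single failed call propagated out of `asyncio.gather`, so `update_sheet` and the workbook save were skipped for the whole batch. A minimal sketch of one way to contain such failures follows. It assumes nothing about `LLM_APIHandler`: `fake_api_call` is a hypothetical stand-in for the real API call, and `run_batch` is an illustrative name. With `return_exceptions=True`, `asyncio.gather` returns exceptions as results instead of raising the first one.

```python
import asyncio


async def fake_api_call(identifier):
    # Hypothetical stand-in for the real API call; identifier 2 simulates
    # a request for which no candidates were returned.
    await asyncio.sleep(0)
    if identifier == 2:
        raise ValueError("no candidates returned")
    return {"unique_identifier": identifier, "response": {"relevance_score": "0.0"}}


async def run_batch(identifiers):
    # return_exceptions=True makes gather hand back exceptions as values,
    # so one failure does not discard the other results.
    outcomes = await asyncio.gather(
        *(fake_api_call(i) for i in identifiers), return_exceptions=True
    )
    # Replace failures with None so downstream code can skip them
    return [None if isinstance(outcome, Exception) else outcome for outcome in outcomes]


results = asyncio.run(run_batch([1, 2, 3]))
print(results)
```

Since `update_sheet` already skips results that are `None`, the successful rows could still be written under this approach. Inside a notebook with a running event loop, `await run_batch(...)` (or `nest_asyncio`, as in the cell above) would be needed in place of a bare `asyncio.run`.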


{
    "explanation": "Unfortunately, the provided context does not contain any relevant information on the necessity of scalable water-efficient practices for increasing food demand. Thus, I cannot fulfill that part of the request.",
    "extract_1": "",
    "extract_2": "",
    "relevance_score": 0.0,
    "Alternate_section": ""
}
Elapsed time: 0.0
{
    "explanation": "The provided research paper discusses various approaches to irrigation scheduling, including evapotranspiration and soil water balance, soil moisture, plant water status, and simulation models. It focuses primarily on the automation of irrigation scheduling and data processing in real-time systems. However, it does not provide information on the automation of the scheduling process for each part of the irrigation management pipeline or the seamless integration of each section in the context of irrigation scheduling and management. Therefore, I cannot answer the question based on the provided research paper.",
    "extr

ERROR:root:Error in Gemini API call: No valid JSON found in the response


I apologize, but I am unable to properly analyze the relevance of the given research paper to the specific point of automating the integration of each section of the irrigation management pipeline and the seamless integration of each section in the context of irrigation scheduling and management, as the full context of the review and the research paper you provided are not available to me. Therefore, I cannot provide an analysis, extract the most relevant quotes, assign a relevance score, or provide alternate section headings.
Elapsed time: 0.0
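The "No valid JSON found" errors arise when the model answers in plain prose or wraps its JSON in other text. How `llm_api_handler` parses responses is not shown here, so the following is only a sketch of one tolerant approach: `extract_first_json_object` is a hypothetical helper that scans for the first decodable JSON object using the standard library's `json.JSONDecoder.raw_decode` and returns `None` when there is none.

```python
import json


def extract_first_json_object(text):
    """Return the first JSON object embedded in text, or None if none decodes."""
    decoder = json.JSONDecoder()
    position = text.find("{")
    while position != -1:
        try:
            # raw_decode parses one JSON value starting at `position`
            # and ignores any trailing text after it.
            parsed, _ = decoder.raw_decode(text, position)
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass  # Not valid JSON from this brace; try the next one
        position = text.find("{", position + 1)
    return None


# Hypothetical sample responses
plain = "I apologize, but I am unable to properly analyze the relevance."
wrapped = 'Here is the analysis: {"relevance_score": 0.0, "extract_1": null}'

print(extract_first_json_object(plain))
print(extract_first_json_object(wrapped))
```

A caller could treat a `None` return as "record the raw text and move on" rather than as a fatal error. A response whose JSON is cut off mid-string would also yield `None`, because it never decodes to a complete object.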
{
    "explanation": "The provided paper looked into the use of sensor-based automated irrigation in commercial floriculture production to study the advantages it offers to producers and their acceptance of the new technology. The research was conducted in collaboration with Davis Floral Company (DFC), a commercial floriculture producer. Soil moisture sensor-based automated irrigation was compared to traditional timer-based grower-controlled irr

ERROR:root:Error in Gemini API call: No valid JSON found in the response


**explanation**

unfortunately, the provided research paper discusses water use and tolerance in cereal crops in Sub-Saharan Africa, but it does not discuss automated irrigation management systems. Therefore, I cannot provide an analysis of the paper's relevance to the point in question or assign a relevance score. 

**extract_1**

N/A

**extract_2**

N/A

**relevance_score**

N/A

**Alternate_section**

N/A
Elapsed time: 0.0020034313201904297
{
    "explanation": "I apologize, but the text you have provided does not contain information pertaining to the necessity of scalable water-efficient practices for increasing food demand and therefore, I am unable to assess its relevance to the specific point in the context of the review outline.",
    "extract_1": null,
    "extract_2": null,
    "relevance_score": 0.0,
    "Alternate_section": null
}
Elapsed time: 0.0010905265808105469
{
 "explanation": "The provided paper does not contain the definition of irrigation scheduling and management