### Objective

In this notebook, we use the LangChain framework to build a dual-chatbot system for digesting research papers. One chatbot plays the role of "journalist", while the other plays the role of "author". By watching the conversation between the two chatbots, the user can better understand the main message of the paper. Users can also ask their own questions to steer the conversation.
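The alternating question-and-answer pattern described above can be sketched without any LLM calls. In the minimal illustration below, `journalist_step`, `author_step`, and `run_dialogue` are hypothetical stand-ins that return canned strings; they are not the `JournalistBot` and `AuthorBot` classes imported later in this notebook.

```python
from typing import List, Tuple


# Hypothetical stand-ins: the real bots call an LLM instead of formatting strings.
def journalist_step(last_answer: str) -> str:
    """Produce a follow-up question from the previous answer."""
    return f"Can you elaborate on: {last_answer[:40]}?"


def author_step(question: str) -> str:
    """Produce an answer to the journalist's question."""
    return f"Answer to '{question}'"


def run_dialogue(n_turns: int) -> List[Tuple[str, str]]:
    """Alternate journalist questions and author answers for n_turns rounds."""
    transcript = []
    last_answer = "Start the conversation"
    for _ in range(n_turns):
        question = journalist_step(last_answer)
        answer = author_step(question)
        transcript.append((question, answer))
        # The author's answer becomes the journalist's next input
        last_answer = answer
    return transcript


for question, answer in run_dialogue(2):
    print("Journalist:", question)
    print("Author:", answer)
```

Each round feeds the author's answer back to the journalist, which is the same hand-off the button callback in Section 3.3 performs on every click.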

### 1. Import necessary libraries

In [1]:
from embedding_engine import Embedder
from chatbot import JournalistBot, AuthorBot

import ipywidgets as widgets
from IPython.display import display, FileLink, IFrame, HTML, clear_output
from pdfdocument.document import PDFDocument
from fpdf import FPDF
import fitz
import re

### 2. Create paper embeddings

In [2]:
paper = 'Learning the solution operator of parametric partial differential equations with physics-informed DeepOnets'

In [3]:
embedding = Embedder()
embedding.load_n_process_document("../Papers/"+paper+".pdf")
vectorstore = embedding.create_vectorstore(store_path=paper)
paper_summary = embedding.create_summary(arxiv_id='2103.10974')
# paper_summary = embedding.create_summary(llm_engine='Azure')
print(paper_summary)

Embeddings found! Loaded the computed ones
Published: 2021-03-19
Title: Learning the solution operator of parametric partial differential equations with physics-informed DeepOnets
Authors: Sifan Wang, Hanwen Wang, Paris Perdikaris
Summary: Deep operator networks (DeepONets) are receiving increased attention thanks
to their demonstrated capability to approximate nonlinear operators between
infinite-dimensional Banach spaces. However, despite their remarkable early
promise, they typically require large training data-sets consisting of paired
input-output observations which may be expensive to obtain, while their
predictions may not be consistent with the underlying physical principles that
generated the observed data. In this work, we propose a novel model class
coined as physics-informed DeepONets, which introduces an effective
regularization mechanism for biasing the outputs of DeepOnet models towards
ensuring physical consistency. This is accomplished by leveraging automatic
different

### 3. Develop UI for two chatbots

#### 3.1 Define two chatbots

In [4]:
# Create two chatbots
journalist = JournalistBot('OpenAI')
author = AuthorBot('OpenAI', vectorstore)

# Provide instruction
journalist.instruct(topic='physics-informed machine learning', abstract=paper_summary)
author.instruct('physics-informed machine learning')

#### 3.2 Define PDF highlight

In [5]:
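def highlight_PDF(file_path, phrases, output_path):
    """Highlight every occurrence of each phrase in the PDF and save a copy."""
    doc = fitz.open(file_path)

    for page in doc:
        for phrase in phrases:
            # search_for returns the rectangles where the phrase appears on this page
            text_instances = page.search_for(phrase)

            for inst in text_instances:
                page.add_highlight_annot(inst)

    # garbage=4 removes unused and duplicate objects before writing
    doc.save(output_path, garbage=4)
    doc.close()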
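The phrases passed to this function are whole retrieved chunks, and a long chunk may not match the page text exactly (for example, because of line breaks introduced during extraction). One possible workaround, offered here only as a sketch and not as part of the original notebook, is to split each chunk into shorter sentence-like phrases before searching. The helper name `split_into_phrases` and the `min_chars` threshold are assumptions made for illustration.

```python
import re
from typing import List


def split_into_phrases(chunk: str, min_chars: int = 20) -> List[str]:
    """Split a retrieved chunk into sentence-like phrases for searching."""
    # Collapse newlines and repeated spaces left over from PDF extraction
    normalized = re.sub(r"\s+", " ", chunk).strip()
    # Split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", normalized)
    # Drop fragments too short to be a distinctive search target
    return [s for s in sentences if len(s) >= min_chars]


sample_chunk = "Deep operator networks are useful.\nThey need data. Ok."
print(split_into_phrases(sample_chunk))
```

The resulting list could be passed as `phrases` in place of the raw chunks. The trade-off is that short phrases are more likely to match but may also highlight unrelated occurrences of the same sentence.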

#### 3.3 Define buttons and their click callback (UI)

In [6]:
# Buttons 
bot_ask = widgets.Button(description="Bot ask")
user_ask = widgets.Button(description="User ask")
download = widgets.Button(description="Download paper summary",
                         layout=widgets.Layout(width='auto'))

In [7]:
# Button click callback (bot ask button)
def create_bot_ask_callback(title):
    def bot_ask_clicked(b):

        if chat_log.value == '':
            # Starting conversation 
            bot_question = journalist.step("Start the conversation")
            line_breaker = ""

        else:
            # Ongoing conversation
            bot_question = journalist.step(chat_log.value.split("<br><br>")[-1])
            line_breaker = "<br><br>"

        chat_log.value += line_breaker + "<b style='color:blue'>👨‍🏫 Journalist Bot:</b> " + bot_question      

        # Author bot answers
        response, source = author.step(bot_question)  

        # Highlight relevant text in PDF
        phrases = [src.page_content for src in source]
        paper_path = "../Papers/"+title+".pdf"
        highlight_PDF(paper_path, phrases, 'highlighted.pdf')

        # Convert 0-based page indices to sorted, de-duplicated 1-based page numbers
        page_numbers = sorted({src.metadata['page'] + 1 for src in source})
        unique_page_numbers = [str(page) for page in page_numbers]
        chat_log.value += "<br><b style='color:green'>👩‍🎓 Author Bot:</b> " + response + "<br>"
        chat_log.value += "(For details, please check the highlighted text on page(s): " + ', '.join(unique_page_numbers) + ")"
        
    return bot_ask_clicked
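On an ongoing turn, the callback above passes `chat_log.value.split("<br><br>")[-1]` to the journalist bot. The small illustration below uses a made-up chat log with placeholder text, assembled the same way the callbacks assemble it, to show what that expression returns: the entire last exchange, including the HTML tags and the page reference line.

```python
# Made-up chat log with placeholder text, built the same way the callbacks build it
first_exchange = (
    "<b style='color:blue'>Journalist Bot:</b> First question"
    "<br><b style='color:green'>Author Bot:</b> First answer<br>"
    "(For details, please check the highlighted text on page(s): 1, 2)"
)
second_exchange = (
    "<b style='color:purple'>You:</b> Second question"
    "<br><b style='color:green'>Author Bot:</b> Second answer<br>"
    "(For details, please check the highlighted text on page(s): 3)"
)
chat_log_value = first_exchange + "<br><br>" + second_exchange

# Same expression as in bot_ask_clicked
last_exchange = chat_log_value.split("<br><br>")[-1]
print(last_exchange)
```

Because exchanges are separated by `<br><br>` while lines within an exchange use a single `<br>`, the split isolates the most recent question, answer, and page note as one string.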

In [8]:
# Button click callback (user ask button)
def create_user_ask_callback(title):
    def user_ask_clicked(b):
        
        chat_log.value += "<br><br><b style='color:purple'>🙋‍♂️You:</b> " + user_input.value

        # Author bot answers
        response, source = author.step(user_input.value)
        
        # Highlight relevant text in PDF
        phrases = [src.page_content for src in source]
        paper_path = "../Papers/"+title+".pdf"
        highlight_PDF(paper_path, phrases, 'highlighted.pdf')
        
        # Convert 0-based page indices to sorted, de-duplicated 1-based page numbers
        page_numbers = sorted({src.metadata['page'] + 1 for src in source})
        unique_page_numbers = [str(page) for page in page_numbers]
        chat_log.value += "<br><b style='color:green'>👩‍🎓 Author Bot:</b> " + response + "<br>"
        chat_log.value += "(For details, please check the highlighted text on page(s): " + ', '.join(unique_page_numbers) + ")"

        # Inform journalist bot about the asked questions 
        journalist.memory.chat_memory.add_user_message(user_input.value)

        # Clear user input
        user_input.value = ""
        
    return user_ask_clicked

Generate PDF to download

In [9]:
# Download button callback
def create_download_callback(title):
    def download_clicked(b):
        pdf = PDFDocument('paper_summary.pdf')
        pdf.init_report()

        # Remove HTML tags
        chat_history = re.sub('<.*?>', '', chat_log.value)  
        
        # Remove emojis
        chat_history = chat_history.replace('👨‍🏫', '')
        chat_history = chat_history.replace('👩‍🎓', '')
        chat_history = chat_history.replace('🙋‍♂️', '')
        
        # Add line breaks
        chat_history = chat_history.replace('Journalist Bot:', '\n\n\nJournalist: ')
        chat_history = chat_history.replace('Author Bot:', '\n\nAuthor: ')
        chat_history = chat_history.replace('You:', '\n\n\nYou: ')

        pdf.h2("Paper Summary: " + title)
        pdf.p(chat_history)
        pdf.generate()

        # Report that the PDF was written to the working directory
        print('PDF generated successfully in the local folder!')
        
    return download_clicked
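To see what the text clean-up in this callback produces, the sketch below applies the same tag-stripping and relabeling steps to a made-up chat log with placeholder text. The function name `chat_log_to_text` is hypothetical, and emoji removal is omitted for brevity.

```python
import re


def chat_log_to_text(chat_html: str) -> str:
    """Strip HTML tags and relabel speakers, mirroring download_clicked."""
    # Non-greedy match removes each tag individually
    text = re.sub(r"<.*?>", "", chat_html)
    # Insert line breaks before each speaker label
    text = text.replace("Journalist Bot:", "\n\n\nJournalist: ")
    text = text.replace("Author Bot:", "\n\nAuthor: ")
    text = text.replace("You:", "\n\n\nYou: ")
    return text


sample_log = (
    "<b style='color:blue'>Journalist Bot:</b> First question"
    "<br><b style='color:green'>Author Bot:</b> First answer<br>"
    "(For details, please check the highlighted text on page(s): 1)"
)
print(chat_log_to_text(sample_log))
```

Since the `<br>` tags are removed rather than converted, the page note follows the answer on the same line in the plain-text output.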

#### 3.4 User interface

In [10]:
paper = 'Learning the solution operator of parametric partial differential equations with physics-informed DeepOnets'

In [11]:
# Chat log text area
chat_log = widgets.HTML(
    value='',
    placeholder='',
    description='',
)

# User input field
user_input = widgets.Text(
    value='',
    placeholder='Question',
    description='',
    disabled=False,
    layout=widgets.Layout(width="60%")
)


# Attach callbacks
bot_ask.on_click(create_bot_ask_callback(paper))
user_ask.on_click(create_user_ask_callback(paper))
download.on_click(create_download_callback(paper))

# Use HBox and VBox for arranging the widgets
first_row = widgets.HBox([bot_ask])
second_row = widgets.HBox([user_ask, user_input])
third_row = widgets.HBox([download])

# Display the UI
display(chat_log, widgets.VBox([first_row, second_row, third_row]))

PDF generated successfully in the local folder!


In [12]:
display(IFrame('highlighted.pdf', width=1000, height=600))