# Week 2 Homework

**This week's assignment:**

    (1) Use your private domain data (for example, a set of research papers, a set of financial report documents, ...)
    (2) Write a Python program that answers questions about your data
    • Receive questions from the console and generate the responses in a streaming way
    • Use the OpenAI SDK (or the Anthropic SDK)
    (3) Refer to the examples in
    https://github.com/hzeng-otterai/chatbot-example/tree/main/backend_api

Homework folder:
https://drive.google.com/drive/u/0/folders/1Xod8qKWVxqa5aW12g7pULK8fGJzpcjCW

**Summary of the Week 2 Homework**

This Jupyter notebook extracts text from PDF files and uses an AI assistant to answer finance-related questions about Bayer.

The extracted text is placed in the assistant's system prompt as context, so the answers are based on the reports themselves. The notebook works through the following steps:

    Extract the text from each PDF and save it as a .txt file next to the source file.
    Define a system prompt template for the AI assistant, Bobby, to answer finance-related questions about Bayer.
    Read the extracted text files (excluding the 2023 annual report and the Q1 2024 quarterly statement) and combine their content into a single context.
    Define the asynchronous functions chat_func() and continous_chat() to handle user interactions with the AI assistant.
    Run continous_chat() to read user input and print streamed AI responses in a loop until the user types "exit".
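The streaming step can be tried without an API key. The following is a minimal sketch, assuming a stand-in async generator in place of the real API response; `fake_token_stream` and `print_and_collect` are hypothetical names used only for illustration. It shows the same pattern the notebook relies on: print each piece as it arrives and collect the pieces into the full reply.

```python
import asyncio


async def fake_token_stream(pieces):
    """Hypothetical stand-in for a streamed API response: yields text pieces one by one."""
    for piece in pieces:
        await asyncio.sleep(0)  # hand control back to the event loop, as a network read would
        yield piece


async def print_and_collect(stream):
    """Print each piece as soon as it arrives and return the complete reply."""
    buffer = ""
    async for piece in stream:
        if piece:  # skip None or empty pieces
            print(piece, end="", flush=True)
            buffer += piece
    print()
    return buffer


reply = asyncio.run(
    print_and_collect(fake_token_stream(["Hello", ", ", None, "Bayer", "!"]))
)
```

In a plain Python script `asyncio.run()` works as shown. Inside Jupyter an event loop is already running, which is why the notebook applies `nest_asyncio` before calling it.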



In [1]:
import os

from dotenv import load_dotenv

# Path to the .env file that stores OPENAI_API_KEY
dotenv_path = '../.env'

# Load the variables from that file into the process environment
load_dotenv(dotenv_path)

# Read the API key from the environment
openai_api_key = os.getenv('OPENAI_API_KEY')


In [2]:
import sys
import openai

print("Python version:", sys.version)
print("OpenAI version:", openai.__version__)

Python version: 3.9.7 (default, Sep 16 2021, 08:50:36) 
[Clang 10.0.0 ]
OpenAI version: 1.59.7


In [2]:
import glob
import os
import PyPDF2

# Extract the text from every PDF under files/bayer_finance and save it
# as a .txt file with the same name in the same folder
def extract_text_from_pdf():
    pdf_files = glob.glob("../files/bayer_finance/*.pdf")
    for file_path in pdf_files:
        with open(file_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            text = ""
            for page in reader.pages:
                # Fall back to an empty string if a page yields no text
                text += (page.extract_text() or "") + "\n"
        # Swap only the file extension, leaving the rest of the path untouched
        txt_file_path = os.path.splitext(file_path)[0] + '.txt'
        with open(txt_file_path, 'w') as txt_file:
            txt_file.write(text)

extract_text_from_pdf()
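
Because the notebook places all of the extracted text into a single system prompt, the combined text has to fit within the model's context window. The following is a rough sketch of a budget check, assuming about 4 characters per token (a rule of thumb, not a tokenizer count). `estimate_tokens`, `fit_to_budget`, and the budget value are hypothetical and not part of the original notebook.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real counts depend on the model's tokenizer."""
    return len(text) // chars_per_token


def fit_to_budget(documents, max_tokens):
    """Concatenate documents in order, stopping before the estimated budget is exceeded."""
    kept, used = [], 0
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost > max_tokens:
            break
        kept.append(doc)
        used += cost
    return "\n".join(kept), used


# Three dummy "documents" of 400, 800, and 4000 characters
docs = ["a" * 400, "b" * 800, "c" * 4000]
context, used = fit_to_budget(docs, max_tokens=500)
print(f"estimated tokens kept: {used}, characters kept: {len(context)}")
```

With these dummy inputs the first two documents fit and the third is dropped. A check like this gives an early warning before the API rejects an oversized prompt, but an exact count would need the tokenizer for the chosen model.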


In [None]:
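import asyncio
import glob
from datetime import datetime

import nest_asyncio
from openai import AsyncOpenAI

# Jupyter already runs an event loop; nest_asyncio lets asyncio.run() work inside it
nest_asyncio.apply()

# The client reads OPENAI_API_KEY from the environment loaded in the first cell
client = AsyncOpenAI()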

# Construct the system prompt
system_prompt_template = """You are Bobby, a virtual assistant created by Junjie.
Today is {today}. You provide responses to finance questions about Bayer.
Please answer each question truthfully to the best of your abilities based on the provided information.
If you are unsure of an answer, please state that you are unsure.
...

<context>
{context}
</context>
"""


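# Collect the extracted text files, skipping the 2023 annual report
# and the Q1 2024 quarterly statement
context_files = sorted(glob.glob("../files/bayer_finance/*.txt"))
excluded = ('bayer-annual-report-2023.txt', 'bayer-quarterly-statement-q1-2024.txt')
context_files = [file for file in context_files if not file.endswith(excluded)]
# print(context_files)
context_content = ""

# Concatenate the remaining files into one context string
for file_path in context_files:
    with open(file_path, 'r') as file:
        context_content += file.read() + "\n"

# Save the combined context so it can be inspected later
with open("final_context.txt", "w") as context_file:
    context_file.write(context_content)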

system_prompt = system_prompt_template.format(
    context=context_content, 
    today=datetime.today().strftime('%Y-%m-%d')
)

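async def chat_func(history):
    # Send the system prompt plus the conversation so far and request a streamed reply
    result = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system_prompt}] + history,
        max_tokens=256,
        temperature=0.5,
        stream=True,
    )

    buffer = ""
    async for r in result:
        # Skip chunks that carry no choices
        if not r.choices:
            continue
        next_token = r.choices[0].delta.content
        if next_token:
            # Print each piece as it arrives and keep a copy for the history
            print(next_token, flush=True, end="")
            buffer += next_token

    print("\n", flush=True)

    return buffer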

async def continous_chat():
    history = []

    # Loop to receive user input continuously; typing "exit" ends the session
    while True:
        user_input = input("> ")
        if user_input == "exit":
            break

        print(f"User: {user_input}")
        history.append({"role": "user", "content": user_input})

        # Note that every call to the chat function
        # passes the entire history to the API
        bot_response = await chat_func(history)

        history.append({"role": "assistant", "content": bot_response})

asyncio.run(continous_chat())
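
Since the whole history is sent on every call, a long session keeps adding to the prompt on top of the already large context. One possible mitigation, sketched here under the assumption that older turns can be dropped, is to keep only the most recent messages. `trim_history` and `max_messages` are hypothetical and are not used in the original notebook.

```python
def trim_history(history, max_messages=6):
    """Keep only the most recent messages, starting from a user turn."""
    if max_messages <= 0:
        return []
    recent = history[-max_messages:]
    # Drop leading assistant messages so the window starts with a user question
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


# Build a dummy conversation of five question/answer pairs
history = []
for i in range(5):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

window = trim_history(history, max_messages=5)
print([m["content"] for m in window])
```

To try it, `chat_func(trim_history(history))` could replace `chat_func(history)` in the loop. The trade-off is that the assistant no longer sees turns outside the window. The transcript below comes from the original cell, which sends the full history.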



User: hello
Hello! How can I assist you today with your finance questions about Bayer?

User: who are you
I am Bobby, a virtual assistant created by Junjie. I'm here to help you with finance-related questions about Bayer. How can I assist you today?

User: How about the finance condictions of Bayer
As of the latest available information from Q3 2024, Bayer's financial conditions can be summarized as follows:

1. **Sales**: Bayer reported sales of €9.968 billion in Q3 2024, which is a decline of 3.6% compared to €10.342 billion in Q3 2023. The decline was attributed to challenges in the Crop Science division, particularly due to lower sales of glyphosate-based herbicides and reduced corn acreage in Latin America.

2. **EBITDA**: The EBITDA before special items decreased by 25.8% to €1.251 billion in Q3 2024, down from €1.685 billion in Q3 2023. This decline was influenced by negative currency effects and a short-term incentive (STI) effect.

3. **Net Income**: Bayer reported a net incom