Arseno Feri Alzahabi | arse

# Generative AI for Financial Chatbots Development

Congratulations on making it this far in this self-paced Generative AI lesson series! Before you attempt this challenge, you should complete the workbook to have a baseline understanding of the materials presented in the challenge:

- [Generative AI Series 1: Generative AI for Finance](https://docs.sectors.app/recipes/generative-ai-python/01-background)
- [Generative AI Series 2: Tool Use and Function Calling for Finance LLMs](https://docs.sectors.app/recipes/generative-ai-python/02-tool-use)
- [Generative AI Series 3: Structured Output](https://docs.sectors.app/recipes/generative-ai-python/03-structured-output)
- [Generative AI Series 4: Conversational Tool Use AI](https://docs.sectors.app/recipes/generative-ai-python/04-conversational)

---

## Generative AI Workshop

The materials are specifically designed for the following workshop by Supertype, and it might be beneficial to join the workshop (\$9, +\$4 for certification grading, post-workshop support and API credits) for a live-instructor, hands-on experience if you're new to the topics covered.

- [Generative AI for financial chatbots workshop](https://supertype.ai/financial-chatbots/)

## Make a Copy for submission
Please use File > Save a Copy in Drive to duplicate this assignment template.

When you have completed the challenge, submit it to the GitHub discussion thread for grading! Good luck!

---

## Part 1: Text Extraction AI

For the Challenge in this chapter, we are going to build an AI agent that can (1) extract information from unstructured
text, (2) run validation checks on the extracted data based on schema constraints and business logic rules, and (3) generate a structured response ready
for downstream tools to process.

This has many practical applications. You can imagine an assistant chatbot that extracts information from loose text such as news,
press releases, or even users' conversational queries, and then generates structured responses to be fed into a downstream tool. One might
also imagine a chatbot that allows users to upload a document, extracts information from it, and then performs some actions based on the extracted data.
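Before step (2) can run, something has to turn the model's free-text reply into a Python object. In this notebook `PydanticOutputParser` does that. The stdlib-only sketch below is a simplified illustration of that parsing stage, not LangChain's implementation. The helper name `extract_json_object` is hypothetical, and it assumes the reply contains a single JSON object, optionally wrapped in a fenced `json` code block.

```python
import json
import re

# A fenced code block (optionally tagged json) that wraps a JSON object
FENCED_JSON = re.compile(r"```(?:json)?\s*(\{.*?\})\s*```", re.DOTALL)


def extract_json_object(reply: str) -> dict:
    """Return the JSON object found in a raw LLM reply.

    Hypothetical helper: prefers the contents of a fenced block and otherwise
    falls back to the span between the first '{' and the last '}'.
    Raises ValueError when nothing parseable is found.
    """
    fenced = FENCED_JSON.search(reply)
    candidate = fenced.group(1) if fenced else reply
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the reply")
    try:
        return json.loads(candidate[start : end + 1])
    except json.JSONDecodeError as err:
        raise ValueError(f"Reply contains malformed JSON: {err}") from err


reply = 'Sure! Here is the data:\n```json\n{"symbol": "BBCA", "name": "Bank Central Asia"}\n```'
print(extract_json_object(reply))  # {'symbol': 'BBCA', 'name': 'Bank Central Asia'}
```

Raising `ValueError` on failure mirrors the idea that a parsing failure should surface as an exception the caller can catch, rather than as a half-filled object.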

### 5 Instructions
There are 5 instructions in total. Each successful implementation earns you 1 point. Successfully running the test cell (`python -m pytest`) with the expected output earns you another 1 point.

The total score for Part 1 is 6 points.

In [1]:
!pip install langchain-core
!pip install langchain-openai
!pip install langgraph
!pip install langchain-groq
!pip install langchain
!pip install python-dotenv
!pip install pytest



In [2]:
%%file test_parser.py

from typing import Optional
import pytest
import os
import unittest
from dotenv import load_dotenv
from pydantic import BaseModel, Field, field_validator, model_validator

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate

from langchain_core.exceptions import OutputParserException
from langchain_groq import ChatGroq

load_dotenv()
GROQ_API_KEY = os.getenv("GROK_API_KEY")

# 1. bring in your llm
llm = ChatGroq(
    temperature=0,
    model_name="llama3-groq-70b-8192-tool-use-preview",
    groq_api_key=GROQ_API_KEY,
)

class Stock(BaseModel):
    """Information about a company's stock"""

    symbol: str = Field(description="The stock symbol")
    name: str = Field(description="The name of the company for which the stock symbol represents")
    sector: Optional[str] = Field(default=None, description="The sector of the company")
    industry: Optional[str] = Field(default=None, description="The industry of the company")
    market_cap: Optional[int] = Field(default=None, description="The market capitalization of the company")
    # 2. implement the other fields
    # ...

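    @model_validator(mode="before")
    @classmethod
    def validate_symbol_4_letters(cls, values: dict) -> dict:
        # 3. implement LLM validation logic
        # Runs on the raw dict parsed from the LLM output, before field validation;
        # .get() avoids a KeyError when the model omits the symbol entirely
        symbol = values.get("symbol") or ""
        if len(symbol) != 4:
            raise ValueError("Symbol must be 4 letters long")
        return values

    @field_validator("market_cap", mode="before")
    @classmethod
    def validate_market_cap(cls, value: Optional[int]) -> Optional[int]:
        # market_cap is optional, so only reject a value that is present and negative
        if value is not None and value < 0:
            raise ValueError("Market cap must not be negative")
        return value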

parser = PydanticOutputParser(pydantic_object=Stock)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

runnable = prompt | llm | parser


class TestParser(unittest.TestCase):
    def test_output_parser_symbol_valid(self):
        text = """
        Bank Central Asia (BBCA) is a bank in Indonesia and is part of the finance sector.
            It is in the banking industry and has a market capitalization of $8.5 billion.
        """
        # 4. implement when symbol and market cap (and other fields) are all valid
        out = runnable.invoke(text)
        assert len(out.symbol) == 4
        assert out.market_cap > 0
        assert len(out.name) > 0
        
        


    def test_output_parser_symbol_invalid(self):
        text = """
        Bank Central Asia (BCA) is a bank in Indonesia and is part of the finance sector.
            It is in the banking industry and has a market capitalization of $8.5 billion.
        """

        # assert exception is raised when the symbol is not 4 letters long
        with pytest.raises(OutputParserException):
            out = runnable.invoke(text)

    def test_output_parser_mcap_invalid(self):
        text = """
        Bank Central Asia (BBCA) is a bank in Indonesia and is part of the finance sector.
            It is in the banking industry and has a market capitalization of $-8.5 billion.
        """

        # 5. assert exception is raised when the extraction task fails by detecting a negative market cap
        with pytest.raises(OutputParserException):
            out = runnable.invoke(text)


Overwriting test_parser.py
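The two validators in `test_parser.py` encode business rules: the symbol must be 4 letters and the market cap must not be negative. Because `market_cap` is `Optional`, a `mode="before"` validator can also receive `None`, and evaluating `None < 0` raises a `TypeError` instead of the `ValueError` the validator is meant to raise. The stdlib sketch below restates the same two rules with a dataclass so they can be checked without calling the LLM. `StockRecord`, `StockValidationError`, and `is_valid` are hypothetical names, and this is not how Pydantic runs validators internally.

```python
from dataclasses import dataclass
from typing import Optional


class StockValidationError(ValueError):
    """Raised when extracted stock data breaks one of the business rules."""


@dataclass
class StockRecord:
    symbol: str
    name: str
    sector: Optional[str] = None
    industry: Optional[str] = None
    market_cap: Optional[int] = None

    def __post_init__(self) -> None:
        # Rule 1: the symbol must be exactly four characters (BBCA passes, BCA fails)
        if len(self.symbol) != 4:
            raise StockValidationError(f"Symbol must be 4 letters long, got {self.symbol!r}")
        # Rule 2: a missing market cap is fine; a negative one is rejected
        if self.market_cap is not None and self.market_cap < 0:
            raise StockValidationError("Market cap must not be negative")


def is_valid(**fields) -> bool:
    """Return True if the fields pass every rule, False otherwise."""
    try:
        StockRecord(**fields)
    except StockValidationError:
        return False
    return True


print(is_valid(symbol="BBCA", name="Bank Central Asia", market_cap=8_500_000_000))   # True
print(is_valid(symbol="BCA", name="Bank Central Asia"))                              # False
print(is_valid(symbol="BBCA", name="Bank Central Asia", market_cap=-8_500_000_000))  # False
```

Subclassing `ValueError` keeps the sketch close to the Pydantic validators, which also signal rule violations with `ValueError`.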


In [3]:
import os

# 6. run this with 3 passes
!python -m pytest test_parser.py

platform win32 -- Python 3.12.7, pytest-8.3.3, pluggy-1.5.0
rootdir: c:\Users\Arseno Feri Alzahabi\OneDrive\Project\Sector Training\LLM\Challange
plugins: anyio-4.6.2.post1
collected 3 items

test_parser.py ...                                                       [100%]



- Do not alter any of the `text` prompts. Doing so defeats the purpose of the quiz / challenge.
- Each correct implementation gets you 1 point. Successfully executing the cell above (`python -m pytest test_parser.py`) with the expected output gets you another 1 point. You get a total of 5+1 points from this section.

## Part 2: A LangGraph ReAct Agent with retriever tools
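The `get_top_companies_ranked` tool in the next cell splices the model's arguments straight into an f-string URL, so a misspelled dimension is only caught, if at all, by the API. A small pre-flight check can catch that earlier. The sketch below rebuilds the same query with `urllib.parse.urlencode`. The query parameter names and the `min_mcap_billion=5000` filter are copied from that cell; the empty `sub_sector=` parameter is omitted. The helper name `build_top_companies_url` is hypothetical, and treating dimensions case-insensitively (lowercased, like the other values in the tool's allowed list) is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.sectors.app/v1/companies/top/"

# Allowed ranking dimensions, as listed in the tool's docstring (lowercased)
VALID_DIMENSIONS = frozenset(
    {"dividend_yield", "earnings", "market_cap", "pb", "pe", "ps", "revenue", "total_dividend"}
)


def build_top_companies_url(dimension: str, n: int, year: int = 2023, min_mcap_billion: int = 5000) -> str:
    """Validate the arguments and return the top-companies request URL."""
    key = dimension.strip().lower()
    if key not in VALID_DIMENSIONS:
        raise ValueError(f"Unknown dimension {dimension!r}; expected one of {sorted(VALID_DIMENSIONS)}")
    if n < 1:
        raise ValueError("n must be a positive integer")
    # urlencode escapes each value and joins the pairs in insertion order
    query = urlencode(
        {"classifications": key, "n_stock": n, "year": year, "min_mcap_billion": min_mcap_billion}
    )
    return f"{BASE_URL}?{query}"


print(build_top_companies_url("PE", 7))
# https://api.sectors.app/v1/companies/top/?classifications=pe&n_stock=7&year=2023&min_mcap_billion=5000
```

Raising inside the tool gives the agent an explicit error message to react to, instead of a request that may fail for an unclear reason.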

In [4]:
import os
import json
import requests
from typing import List

from langchain_core.tools import tool
from langchain_core.messages import HumanMessage
from langchain_groq import ChatGroq
from langgraph.prebuilt import create_react_agent

from dotenv import load_dotenv

load_dotenv()


SECTORS_API_KEY = os.getenv('SECTOR_API_KEY')
GROQ_API_KEY = os.getenv('GROK_API_KEY')


def retrieve_from_endpoint(url: str) -> str:
    """Call a Sectors API endpoint and return the JSON response as a string for the LLM."""
    headers = {"Authorization": SECTORS_API_KEY}

    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()
    except requests.exceptions.HTTPError as err:
        raise SystemExit(err)
    return json.dumps(data)


@tool
def get_company_overview(stock: str) -> str:
    """
    Get company overview

    @param stock: The stock symbol of the company
    @return: The company overview
    """

    url = f"https://api.sectors.app/v1/company/report/{stock}/?sections=overview"
    return retrieve_from_endpoint(url)

@tool
def get_top_companies_ranked(dimension: str, n: str, year: str = "2023") -> str:
    # 7. implement this tool correctly, using the tool implementation above as reference
    """
    Get top companies ranked by a certain dimension

    @param dimension: The dimension to rank the companies by. Required; must be one of
        ['dividend_yield', 'earnings', 'market_cap', 'pb', 'pe', 'ps', 'revenue', 'total_dividend'],
        where pe is the P/E (price-to-earnings) ratio
    @param n: The number of companies to return, as a numeric string. Required
    @param year: The year of the data. Defaults to 2023 if the user does not specify one
    @return: The top companies ranked by the dimension
    """

    url = f"https://api.sectors.app/v1/companies/top/?classifications={dimension}&n_stock={n}&year={year}&sub_sector=&min_mcap_billion=5000"
    return retrieve_from_endpoint(url)


llm = ChatGroq(
    temperature=0,
    model_name="llama3-groq-70b-8192-tool-use-preview",
    groq_api_key=GROQ_API_KEY,
)

tools = [
    get_company_overview,
    get_top_companies_ranked,
]

# 8: ask that floating numbers are returned in 2 decimal points so the result is prettier
# return full company name, symbol, and the value (in the case of companies by p/e values, return the p/e
# but in 2 decimal points)

system_message = "Please provide the full company name, its symbol, and any associated values (such as P/E ratios) with floating-point numbers rounded to two decimal places for a cleaner presentation. If presenting companies by P/E values, ensure these values are rounded to two decimal places. Then sort the list from the biggest value to the smallest value."

# 9: implement the below correctly, with llm, tools, and system_message as state modifier
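# state_modifier prepends system_message to the conversation; newer langgraph releases name this argument `prompt`
app = create_react_agent(llm, tools, state_modifier=system_message)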

def query_app(text: str) -> list:
    """Send one user message to the agent and return the full list of messages in its final state."""
    out = app.invoke({"messages": [HumanMessage(text)]})
    return out["messages"]

out_agent = query_app(
    "Get me the top 7 companies based on P/E values, along with their full company name, company code, and PE values"
)

print(out_agent[-1].content)


Here are the top 7 companies based on P/E values, along with their full company name, company code, and PE values:
1. ABM Investama Tbk - ABMM.JK - PE: 2.09
2. Adaro Energy Indonesia Tbk - ADRO.JK - PE: 2.89
3. Indo Tambangraya Megah Tbk - ITMG.JK - PE: 3.74
4. United Tractors Tbk - UNTR.JK - PE: 3.99
5. Baramulti Suksessarana Tbk - BSSR.JK - PE: 4.02
6. Indika Energy Tbk - INDY.JK - PE: 4.03
7. Golden Energy Mines Tbk - GEMS.JK - PE: 4.25
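The system message asks the model to round to two decimals and to sort from biggest to smallest, but an LLM does not always follow formatting instructions: the answer above lists P/E values from lowest to highest. One option is to do the sorting and rounding in plain Python before presenting results. The sketch below assumes the name, symbol, and value have already been pulled out of the tool's JSON response into a hypothetical `RankedCompany` tuple; the exact field names of the Sectors response are not shown in this notebook.

```python
from typing import Iterable, NamedTuple


class RankedCompany(NamedTuple):
    """Hypothetical container for one row of a ranking."""
    name: str
    symbol: str
    value: float


def format_ranking(rows: Iterable[RankedCompany], label: str = "PE", descending: bool = True) -> str:
    """Sort rows by value and render them as a numbered list with 2-decimal values."""
    ordered = sorted(rows, key=lambda row: row.value, reverse=descending)
    return "\n".join(
        f"{rank}. {row.name} - {row.symbol} - {label}: {row.value:.2f}"
        for rank, row in enumerate(ordered, start=1)
    )


rows = [
    RankedCompany("Adaro Energy Indonesia Tbk", "ADRO.JK", 2.89),
    RankedCompany("ABM Investama Tbk", "ABMM.JK", 2.09),
    RankedCompany("Indo Tambangraya Megah Tbk", "ITMG.JK", 3.74),
]
print(format_ranking(rows))                    # biggest P/E first, as the system message asks
print(format_ranking(rows, descending=False))  # smallest first, the order shown in the answer above
```

Making `descending` an explicit argument turns the sort order into a decision in code rather than something left to the model's interpretation of the prompt.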


In [5]:
# 10: follow up now with a second question, to get the overview of whichever symbol
# is 4th on the list above in `out_agent`

out_agent2 = query_app(f"{out_agent[-1].content}. get overview 4th company")

print(out_agent2[-1].content)

The company overview for United Tractors Tbk (UNTR.JK) is as follows:

- **Address:** Jl. Raya Bekasi km. 22, Cakung, Jakarta 13910
- **Daily Close Change:** -0.0128
- **Email:** ir@unitedtractors.com
- **Employee Number:** 38,196
- **Industry:** Machinery
- **Last Close Price:** 27,000
- **Latest Close Date:** 2024-11-05
- **Listing Board:** Main
- **Listing Date:** 1989-09-19
- **Market Cap:** 98,058,868,097,024
- **Market Cap Rank:** 19
- **Phone:** 4605959; 4605979
- **Sector:** Industrials
- **Sub-Industry:** Construction Machinery & Heavy Vehicles
- **Sub-Sector:** Industrial Goods
- **Website:** www.unitedtractors.com
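The follow-up above works by pasting the whole first answer into a new query and letting the model work out which company is 4th. When the answer follows the numbered `name - symbol - value` layout shown earlier, the symbol can also be picked out deterministically and sent as a direct question. The sketch below does this with a regular expression. The helper `nth_symbol` is hypothetical; it assumes symbols are four capital letters with an optional `.JK` suffix, as in the output above, and stripping that suffix before asking for the overview is an assumption about what the endpoint expects.

```python
import re
from typing import Optional

# Matches ranked lines such as "4. United Tractors Tbk - UNTR.JK - PE: 3.99"
RANK_LINE = re.compile(r"^\s*(\d+)\.\s+(.+?)\s+-\s+([A-Z]{4}(?:\.JK)?)\s+-", re.MULTILINE)


def nth_symbol(answer: str, n: int) -> Optional[str]:
    """Return the symbol on the line ranked n, or None if there is no such line."""
    for match in RANK_LINE.finditer(answer):
        if int(match.group(1)) == n:
            return match.group(3)
    return None


answer = """Here are the top 7 companies based on P/E values:
1. ABM Investama Tbk - ABMM.JK - PE: 2.09
2. Adaro Energy Indonesia Tbk - ADRO.JK - PE: 2.89
3. Indo Tambangraya Megah Tbk - ITMG.JK - PE: 3.74
4. United Tractors Tbk - UNTR.JK - PE: 3.99"""

symbol = nth_symbol(answer, 4)
print(symbol)  # UNTR.JK
follow_up = f"Get the company overview for {symbol.removesuffix('.JK')}"
print(follow_up)  # Get the company overview for UNTR
```

The resulting `follow_up` string could then be passed to `query_app`. Returning `None` for a missing rank lets the caller fall back to the paste-the-answer approach when the model's layout differs.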


## Conclusion

Congratulations on making your way through the challenges. I hope you found the session educational and fun, and that I have, in my own way, inspired you to dive deeper into the exciting world of building financial chat agents using information retriever tools!

Please submit a link to your Google Colab notebook at [the correct GitHub repository discussion thread](https://github.com/onlyphantom/llm-python/discussions/39) for grading. If you score higher than 8 points (out of a possible 10) you will obtain a certification jointly issued by Supertype and Sectors. Run all cells and show all output.

If you need help, please reach out to us on Discord (exclusively for Practicum members).

Thank you again for your participation, and I hope you walked away with lots of new ideas on what to build next!