In [1]:
import nest_asyncio
nest_asyncio.apply()

import dotenv
dotenv.load_dotenv()

from src.harlus_contrast_tool import (
    ContrastTool,
    ClaimQueryEngineToolLoader,
    VerdictQueryEngineToolLoader,
    SentenceRetrieverToolLoader,
)

In [None]:
from src.harlus_contrast_tool.claim_getter import CLAIM_PARSER, PROMPT_GET_CLAIMS_TEXT

old_doc = r"C:\Users\info\Desktop\harlus\ml\tools\contrast_tool\data\report.pdf"

thesis_qengine = await ClaimQueryEngineToolLoader().load(old_doc)
thesis_sentence_retriever = await SentenceRetrieverToolLoader().load(old_doc)

In [1]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import MarkdownNodeParser
from llama_index.llms.openai import OpenAI
from llama_index.node_parser.docling import DoclingNodeParser
from llama_index.readers.docling import DoclingReader

import dotenv
dotenv.load_dotenv()

old_doc = r"C:\Users\info\Desktop\harlus\ml\tools\contrast_tool\data\Investment_Case_AMAT.pdf"

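# DoclingReader exports Markdown by default, so the Markdown node parser fits here.
# DoclingNodeParser expects the reader's JSON export type instead.
reader = DoclingReader()
# parser = DoclingNodeParser()
parser = MarkdownNodeParser()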

index = VectorStoreIndex.from_documents(
    documents=reader.load_data(old_doc),
    transformations=[parser],
    show_progress=True,
)

llm = OpenAI(
    model="gpt-4o-mini",
    temperature=0.0,
    max_tokens=1000,
)

qengine = index.as_query_engine(
    llm=llm,
    multiple_calls=True,
)

Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 999.83it/s]
Generating embeddings: 100%|██████████| 7/7 [00:01<00:00,  3.51it/s]


In [4]:
from src.harlus_contrast_tool.claim_getter import CLAIM_PARSER, PROMPT_GET_CLAIMS_TEXT

# prompt = PROMPT_GET_CLAIMS_TEXT
prompt = """\
What are all distinct investment theses? Each thesis should represent a clear, concise, and reasoned hypothesis about why an investment is expected to generate positive returns. A good thesis typically includes:
- A strategic rationale or competitive advantage.
- A market or financial performance expectation.
- A causal relationship between an observable factor and expected investment return.
Do not include vague or generic statements. Only extract well-formed, specific hypotheses.
"""

claims = qengine.query(prompt)
# parsed_claims = CLAIM_PARSER.parse(claims.response).claims

print(claims)
display([(n.text, n.metadata) for n in claims.source_nodes])

1. AMAT’s dominant position in semiconductor manufacturing equipment offers a competitive edge that should benefit from the ever-increasing complexity and capital intensity of chip production. As chip designs require more advanced processes, the demand for AMAT’s advanced deposition, CMP, etching, and ion implantation tools is expected to drive revenue growth and generate positive returns.

2. Streamlined operations at AMAT are expected to fuel margin expansion. The marked reduction in operating expenses relative to sales—from about 28% to 18%—demonstrates the company’s ability to leverage heightened sales with improved cost efficiency. This operational improvement is likely to boost profitability and support the company’s overall return on investment.

3. The evolving spending environment in the semiconductor industry, particularly in segments like memory chips, underpins a sustainable growth narrative. With key industry players indicating that current capital expenditure levels are s

[("## -Valuation:\n\n- o G1: 6% &amp; G2: 2%\n- o kEq: 10%\n- o Implied growth rate: 1,58%\n- o Fair value: 62,17 USD - Implying an upside of 20%\n\nUnfortunately from a fixed-income point-of-view nothing too spectacular there.\n\nOpinion\n\nThe main punch line is simple. In terms of importance, one could compare the idea of chips in electronics to brains and senses in humans. The fact that our electronics have  become  more  intelligent  in  the  last  years,  has  everything  to  do  with  the sophistication of the chips behind them.\n\nObviously,  that  is  the  basis  of  any  investment  in  Applied  Materials.  Each  year, electronics are becoming faster, more connected and less power-consuming. As a result, more computing, sensing, storage power is required from the chips driving them. More computing power goes hand in hand with more complexity. And finally, more complexity implies more steps in the capital-intensive process of chip making. And that's a trend from which Applied 
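The query above returns the theses as one numbered block of text, and the `CLAIM_PARSER` step is commented out. As a rough stand-in, the sketch below splits such an answer into one string per thesis. `split_numbered_claims` is a hypothetical helper, not part of `harlus_contrast_tool` or LlamaIndex, and it assumes the model keeps the "1. ...", "2. ..." layout seen in the output.

```python
import re


def split_numbered_claims(response_text: str) -> list[str]:
    """Split a numbered-list answer ("1. ...", "2. ...") into one string per item.

    Hypothetical helper: it assumes each item starts on its own line with
    "<digits>. ". A line inside an item that happens to start the same way
    would be split off as a separate item.
    """
    # Split at every line start that looks like an item marker.
    parts = re.split(r"(?m)^\s*\d+\.\s+", response_text)
    # parts[0] is whatever precedes the first marker (empty or a preamble), so skip it.
    # Collapse line breaks and repeated spaces inside each item.
    return [" ".join(part.split()) for part in parts[1:] if part.strip()]


sample = "1. First thesis.\n\n2. Second thesis\nspans two lines.\n"
claims_list = split_numbered_claims(sample)
for claim in claims_list:
    print(claim)
```

In the notebook this would be called as `split_numbered_claims(claims.response)`. Text with no numbered items yields an empty list, which makes a changed answer format easy to detect.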

In [None]:
new_doc = r"C:\Users\info\Desktop\harlus\ml\tools\contrast_tool\data\transcript.pdf"

update_qengine = await VerdictQueryEngineToolLoader().load(new_doc)
update_sentence_retriever = await SentenceRetrieverToolLoader().load(new_doc)

tool = ContrastTool()

comments = tool.run(
    old_doc,
    thesis_qengine,
    thesis_sentence_retriever,
    new_doc,
    update_qengine,
    update_sentence_retriever,
)

for comment in comments:
    print(comment)

In [None]:
comments[0]