#### User Input (URL)


In [1]:
Url_1 = input("Enter the URL of the first file: ")
Url_2 = input("Enter the URL of the second file: ")

Url_list = [Url_1, Url_2]

In [2]:
from langchain_core.runnables import RunnablePassthrough, RunnableLambda, RunnableParallel
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_classic.text_splitter import RecursiveCharacterTextSplitter
from langchain_classic.storage import InMemoryStore
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from langchain_classic.retrievers import ParentDocumentRetriever
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import PromptTemplate
from dotenv import load_dotenv
import os

load_dotenv()
Token_API_KEY = os.getenv("HF_TOKEN")
api_key = os.getenv("HUGGINGFACE_API_KEY")

str_parser = StrOutputParser()


USER_AGENT environment variable not set, consider setting it to identify your requests.


#### Document Loader

In [3]:
web_doc = WebBaseLoader(Url_list)
documents = web_doc.load()

In [4]:
print(f"Number of documents loaded: {len(documents)}")
print("\n\n\n")
print(documents[1].page_content)
print("\n\n\n")
print(documents[1].metadata)

Number of documents loaded: 2

Why defence spending shouldn’t be labelled sustainable | Finance Watch

Press Events
Other languages
French Blog
German Blog
Italian Blog

About
About us
Our vision
Members
Team
Governance & funding
Join our team
Contact

Policy portal
Sustainable Finance
Retail & Inclusion
Stability & Supervision
Public Finance
Digital Finance

The Blog

Get involved
Get involved
Support our campaigns
Become a member
Understand finance
Make a donation

Search
Menu
Donate
Search

About
About us
Our vision
Members
Team
Governance & funding
Join our team
Contact

Policy portal
All policy portal
Sustain
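
The raw `page_content` printed above is dominated by blank lines and navigation labels, which later end up inside the chunks. A minimal sketch of a cleanup step that could run before splitting, assuming it is acceptable to collapse whitespace; `collapse_blank_lines` is a hypothetical helper and not part of LangChain:

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Strip each line and squeeze runs of blank lines down to one."""
    # Trim leading and trailing whitespace on every line.
    lines = [line.strip() for line in text.splitlines()]
    cleaned = "\n".join(lines)
    # Replace three or more consecutive newlines with exactly two.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip()


# Small sample imitating the loader output shown above.
sample = "\n\n\nPress Events \n\n\n\nOther languages\n\n\n\n\n\nFrench Blog\n"
print(collapse_blank_lines(sample))
```

One way to use it would be to overwrite each `doc.page_content` in `documents` with the cleaned text before the splitter runs. It does not remove the menu labels themselves, only the whitespace around them.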

#### Text Splitter

In [5]:
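# Split both pages into small overlapping chunks (sizes are in characters).
# create_documents receives plain strings, so the source metadata is not carried over.
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = splitter.create_documents([doc.page_content for doc in documents])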
print(f"Number of chunks created: {len(chunks)}")

Number of chunks created: 337
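
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified fixed-width splitter. It is only a stand-in: `RecursiveCharacterTextSplitter` splits on separators and merges the pieces up to the size limit, so its boundaries will differ. `sliding_window_chunks` is a hypothetical name used for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks_demo = sliding_window_chunks(text, chunk_size=10, chunk_overlap=4)
print(chunks_demo)
```

Each window repeats the last four characters of the previous one, which is the role the overlap plays in the real splitter: text near a boundary appears in both neighbouring chunks.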


#### Embeddings Model

In [6]:
embeddings = HuggingFaceEmbeddings(
	model_name="jinaai/jina-embeddings-v3",
	model_kwargs={'trust_remote_code': True}
)


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
A new version of the following files was downloaded from https://huggingface.co/jinaai/xlm-roberta-flash-implementation:
- configuration_xlm_roberta.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
`torch_dtype` is deprecated! Use `dtype` instead!
A new version of the following files was downloaded from https://huggingface.co/jinaai/xlm-roberta-flash-implementation:
- rotary.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/jinaai/xlm-roberta-flash

#### Vector Database

In [28]:
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, collection_name="article_bot_collection")

In [32]:
stored_ids = vectorstore.get()['ids']
print(f"Number of stored documents: {len(stored_ids)}")
print(f"Sample stored document ID: {stored_ids[0]}")

Number of stored documents: 337
Sample stored document ID: ae66f1d2-d98e-40da-abea-364d8ef9f20a


In [33]:
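# Parent chunks (up to 500 characters) are kept in the in-memory store;
# child chunks (up to 200 characters) are embedded into the vector store.
store = InMemoryStore()
retriver = ParentDocumentRetriever(
    vectorstore=vectorstore,
    byte_store=store,
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=200),
    parent_splitter=RecursiveCharacterTextSplitter(chunk_size=500),
    search_kwargs={"k": 3}  # number of child chunks fetched per query
)
# Split the loaded pages into parents and children and index them.
retriver.add_documents(documents)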

In [34]:
print(retriver)

vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001D93E977B60> byte_store=<langchain_core.stores.InMemoryStore object at 0x000001D96987C7D0> docstore=<langchain_classic.storage.encoder_backed.EncoderBackedStore object at 0x000001D96987FBC0> search_kwargs={'k': 3} child_splitter=<langchain_text_splitters.character.RecursiveCharacterTextSplitter object at 0x000001D96987FE60> parent_splitter=<langchain_text_splitters.character.RecursiveCharacterTextSplitter object at 0x000001D96987E900>
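
The idea behind a parent-document retriever can be sketched in plain Python: small child chunks are searched, but the larger parent chunk each one came from is returned. This is a toy illustration, not LangChain's implementation; it uses fixed-width splitting and a word-overlap score in place of embeddings, and every name in it is hypothetical.

```python
def split_fixed(text, size):
    """Cut text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_index(documents, parent_size, child_size):
    parents = {}      # parent_id -> parent text (plays the role of the docstore)
    child_index = []  # (child text, parent_id) pairs (plays the role of the vector store)
    for doc in documents:
        for parent in split_fixed(doc, parent_size):
            parent_id = len(parents)
            parents[parent_id] = parent
            for child in split_fixed(parent, child_size):
                child_index.append((child, parent_id))
    return parents, child_index


def word_overlap(query, text):
    """Toy relevance score: number of words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_parents(query, parents, child_index, k=3):
    # Rank child chunks, then map the top k back to unique parents.
    ranked = sorted(child_index, key=lambda item: word_overlap(query, item[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[pid] for pid in parent_ids]


doc = "alpha bravo gamma delta omega sigma kappa theta "
parents, child_index = build_index([doc], parent_size=24, child_size=12)
print(retrieve_parents("kappa", parents, child_index, k=1))
```

The query matches only the child chunk `"kappa theta "`, yet the result is the whole parent `"omega sigma kappa theta "`. When several top-ranked children share a parent, that parent is returned once, so a query can yield fewer than `k` documents.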


In [36]:
prompt  = PromptTemplate(
    template = """
            You are an intelligent and expert Content Analyst. Your task is to answer the user's question based strictly on the provided context from a specific article or document.

            Follow these instructions carefully:
            1.  **Focus on Context:** Use ONLY the provided context below to generate your answer. Do not use outside knowledge unless it's to define a simple term found in the text.
            2.  **Be Comprehensive:** If the context contains enough information, explain the answer in detail. Capture the nuances of the article.
            3.  **Admit Ignorance:** If the answer is not present in the provided context, clearly state: "I cannot find the answer to this question in the provided article." Do not make up an answer.
            4.  **Formatting:** Use Markdown to make the answer readable. Use bullet points for lists and bold text for key terms.
            5.  **Directness:** Get straight to the point. Avoid fluff.

            ---
            **Context:**
            {context}
            ---

            **User Question:**
            {question}

            **Your Answer:**
            """,
    input_variables=["context", "question"]
)

In [37]:
llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1",
    task="text-generation",
    huggingfacehub_api_token=api_key,
)

model = ChatHuggingFace(llm=llm)

In [13]:
question = input("Enter your question about the provided content: ")


In [38]:
retiver_context = retriver.invoke(question)

In [39]:
print(retiver_context)

[Document(metadata={'source': 'https://www.finance-watch.org/blog/why-defence-spending-shouldnt-be-labelled-sustainable/', 'title': 'Why defence spending shouldn’t be labelled sustainable | Finance Watch', 'description': 'Welcome to the new consensus: where military spending could become the next frontier of so-called ‘sustainable’ finance.', 'language': 'en-US'}, page_content='French Blog\n\n\nGerman Blog\n\n\nItalian Blog\n\n\n\n\n\n\n\n\n\nSubscribe\n\n\nDonate\n\n\n\n\n\n\n\n\nThe Blog\nFinancial reform for EU citizen \n\nCategories\n\nSustainable Finance\n\n\n\n\n\n\nDigital Finance\nMiscellaneous\nPublic Finance\nRetail & Inclusion\nStability & Supervision\nSustainable Finance\n\n\n\n\n\nWhy defence spending shouldn’t be labelled sustainable\n\n29 August 2025\n\nReading level: Regular Reading Time: 4 min.'), Document(metadata={'source': 'https://www.finance-watch.org/blog/why-defence-spending-shouldnt-be-labelled-sustainable/', 'title': 'Why defence spending shouldn’t be labelled

In [40]:
def formatting(retiver_context):
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in retiver_context)

In [41]:
parallel_chain = RunnableParallel(
    {'context': retriver | RunnableLambda(formatting),
     'question': RunnablePassthrough()}
)

In [42]:
parallel_chain.invoke(question)

{'context': 'French Blog\n\n\nGerman Blog\n\n\nItalian Blog\n\n\n\n\n\n\n\n\n\nSubscribe\n\n\nDonate\n\n\n\n\n\n\n\n\nThe Blog\nFinancial reform for EU citizen \n\nCategories\n\nSustainable Finance\n\n\n\n\n\n\nDigital Finance\nMiscellaneous\nPublic Finance\nRetail & Inclusion\nStability & Supervision\nSustainable Finance\n\n\n\n\n\nWhy defence spending shouldn’t be labelled sustainable\n\n29 August 2025\n\nReading level: Regular Reading Time: 4 min.\n\nWhy defence spending shouldn’t be labelled sustainable\n\n29 August 2025\n\nReading level: Regular Reading Time: 4 min.\n\n\n\n\nEuropean leaders are looking to ramp up defence spending, and some are pushing to classify military investments as ‘sustainable’ to make them more attractive. This trend – part of a broader ‘new consensus’ growing the EU’s defence capabilities – threatens to derail Europe’s climate goals and undermine the credibility of sustainable finance altogether.',
 'question': '(Why defence spending shouldn’t be labelled
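
The dictionary above comes from feeding the same question to every branch of the parallel step. A minimal plain-Python sketch of that fan-out, assuming stand-in functions for the retriever and formatter; the real `RunnableParallel` also runs its branches concurrently, which this sketch does not attempt:

```python
def run_parallel(steps, value):
    """Feed the same input to every step and collect the results by key."""
    return {name: step(value) for name, step in steps.items()}


def fake_retriever(question):
    # Hypothetical stand-in for the retriever: returns a list of passages.
    return [f"passage about {question}"]


def join_passages(passages):
    # Same role as the formatting function: merge passages into one string.
    return "\n\n".join(passages)


steps = {
    "context": lambda q: join_passages(fake_retriever(q)),  # retrieve, then format
    "question": lambda q: q,                                 # pass the input through
}
fanout = run_parallel(steps, "defence spending")
print(fanout)
```

The keys of the result match the `{context}` and `{question}` placeholders in the prompt template, which is why the output of this step can be piped straight into the prompt.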

In [43]:
main_chain = (
    parallel_chain
    | prompt
    | model
    | str_parser
)
result = main_chain.invoke(question)

In [44]:
print("Final Answer:", result)

Final Answer: <think>
Hmm, the user wants me to summarize a blog post about why defense spending shouldn't be labeled as sustainable. I need to base my response strictly on the provided context.

Looking at the context, I see it's from a blog titled "Why defence spending shouldn't be labelled sustainable" dated August 29, 2025. The key information provided is that European leaders want to increase defense spending, and some are pushing to classify military investments as "sustainable" to make them more attractive to investors. 

The blog argues this trend is problematic for two main reasons: it threatens Europe's climate goals and undermines the credibility of sustainable finance altogether. This is described as part of a broader "new consensus" to grow the EU's defense capabilities.

I notice the context is quite limited - it's essentially just the introduction to the blog post. There aren't any detailed arguments or evidence provided beyond these core points. Since I must stick stric
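
The printed answer begins with a `<think>` block containing the model's reasoning rather than the answer itself. A minimal sketch of a post-processing step that drops such a block, assuming the reasoning is always wrapped in `<think>` tags; `strip_reasoning` is a hypothetical helper:

```python
import re

# Match a <think> block up to its closing tag, or to the end of the text
# if the closing tag is missing (for example when the output is cut off).
THINK_BLOCK = re.compile(r"<think>.*?(?:</think>|\Z)", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> sections and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nThe user wants a summary.\n</think>\n\n**Answer:** It threatens climate goals."
print(strip_reasoning(raw))
```

One option would be to append it to the chain as `main_chain | RunnableLambda(strip_reasoning)`. Treating an unclosed block as reasoning and discarding it is a design choice; it returns an empty string when the model's output is truncated before the answer starts.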