# Download Data

In [None]:
import requests
import zipfile
import os

url = "https://www.dropbox.com/s/vs6ocyvpzzncvwh/new_articles.zip?dl=1"  

zip_path = "new_articles.zip"
extract_folder = "new_articles"

print("Downloading file...")
response = requests.get(url)
with open(zip_path, "wb") as file:
    file.write(response.content)
print("Download complete!")

print("Extracting files...")
os.makedirs(extract_folder, exist_ok=True)
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(extract_folder)
print("Extraction complete!")

os.remove(zip_path)
print("Cleanup complete!")


In [1]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OllamaEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import Ollama
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader, DirectoryLoader

In [2]:
loader = DirectoryLoader(path='new_articles', glob='./*.txt', loader_cls=TextLoader)

documents = loader.load()

In [3]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

In [4]:
len(texts)

233

In [5]:
print(texts[3])

page_content='Customers can customize the tools and apps or build their own using Pando’s APIs. This, along with the platform’s emphasis on no-code capabilities, differentiates Pando from incumbents like SAP, Oracle, Blue Yonder and E2Open, Jayakrishnan asserts.

“Pando comes pre-integrated with leading enterprise resource planning (ERPs) systems and has ready APIs and a professional services team to integrate with any new ERPs and enterprise systems,” he added. “Pando’s no-code capabilities enable business users to customize the apps while maintaining platform integrity — reducing the need for IT resources for each customization.”' metadata={'source': 'new_articles\\05-03-ai-powered-supply-chain-startup-pando-lands-30m-investment.txt'}


In [6]:
persist_directory = 'vector_store'

embedding = OllamaEmbeddings(model='deepseek-r1:14b')

vector_db = Chroma.from_documents(
    documents=texts,
    embedding=embedding,
    persist_directory=persist_directory
)

  embedding = OllamaEmbeddings(model='deepseek-r1:14b')


In [7]:
vector_db.persist()
vector_db = None

  vector_db.persist()


In [8]:
vector_db = Chroma(persist_directory=persist_directory,
                   embedding_function=embedding)

  vector_db = Chroma(persist_directory=persist_directory,


In [9]:
retriever = vector_db.as_retriever()

In [10]:
docs = retriever.get_relevant_documents("How much money did Pando raise?")

  docs = retriever.get_relevant_documents("How much money did Pando raise?")


In [11]:
len(docs)

4

In [12]:
retriever = vector_db.as_retriever(search_kwargs={"k":2})

In [13]:
retriever.search_type

'similarity'

In [14]:
retriever.search_kwargs

{'k': 2}

In [16]:
qa_chain = RetrievalQA.from_chain_type(
    llm=Ollama(model='deepseek-r1:14b'),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)

In [17]:
def process_llm_response(llm_response):
    print(llm_response['result'])
    print('\n\nSources:')
    for source in llm_response['source_documents']:
        print(source.metadata['source'])

In [18]:
query = 'What is the news about Pando?'
llm_response = qa_chain.invoke(query)
process_llm_response(llm_response)

<think>
Okay, so I need to figure out what the latest news about Pando is. Let me look through the context provided.

First paragraph talks about Christine writing about Ryan Breslow's new company, Love, which is a wellness marketplace. It mentions products like supplements and essential oils. But that doesn't mention anything about Pando.

Next paragraph is about Revolut launching in Brazil. Again, no mention of Pando here.

The third paragraph is about the UK government planning to publish a review on AI with a deadline for responses. The focus is on foundation models and generative AI, mentioning principles like safety, security, transparency, etc. But this doesn't mention Pando either.

Wait, maybe I'm missing something. Is there any other part that talks about Pando? No, looking again, the context only covers three topics: Love (Ryan Breslow's company), Revolut in Brazil, and a UK government AI review. There's nothing here about Pando.

So, since none of the provided context is ab

In [19]:
query = 'Who led the round in pando?'
llm_response = qa_chain.invoke(query)
process_llm_response(llm_response)

<think>
Okay, so I need to figure out who led the round in Pando. Let me see what information I have from the context provided.

First, there are three separate pieces of context given:

1. The first part talks about Senator Elizabeth Warren writing a letter to FDIC chairman Martin Gruenberg regarding Tellus's claims. It mentions Lee being married to Connie Chan, who is a general partner at a16z (Andreessen Horowitz). However, it doesn't mention anything about funding rounds or Pando.

2. The second part discusses Christine writing about the launch of Ryan Breslow’s new company, Love, which is a wellness marketplace. Again, this doesn't relate to Pando or any funding rounds.

3. The third part talks about Revolut launching in Brazil and mentions some financial details, but again, no connection to Pando.

Since none of these contexts mention anything about Pando or any specific individuals leading a round for the company, I can conclude that there's no information available in the provi

In [None]:
query = 'What did databricks acquire?'
llm_response = qa_chain.invoke(query)
process_llm_response(llm_response)

In [20]:
query = 'What is generative ai?'
llm_response = qa_chain.invoke(query)
process_llm_response(llm_response)

<think>
Okay, so I'm trying to figure out what generative AI is. The context provided mentions "generative artificial intelligence (AI)" as part of foundation models that include large language models. It also talks about how these models have the potential to transform what people and businesses do. 

From this, I gather that generative AI is a type of AI that can create new content or outputs, like text, images, or other data. The mention of ChatGPT using LLM technology suggests it's an example of generative AI because it generates responses based on prompts.

Looking at the answer provided earlier, it correctly identifies generative AI as a subset focused on creating content through algorithms trained on vast data. It also highlights that models like ChatGPT are part of this category and can produce text, images, etc., with applications in various fields such as writing assistance or art creation.

I think the key points to include are:
1. Definition: Subset of AI focused on generat

In [None]:
query = 'Who is CMA?'
llm_response = qa_chain.invoke(query)
process_llm_response(llm_response)

In [21]:
from langchain.chat_models import ChatOllama

In [22]:
vector_db2 = Chroma(persist_directory=persist_directory,
                    embedding_function=embedding)

retriever = vector_db2.as_retriever(search_kwargs={"k":2})

In [23]:
turbo_llm = ChatOllama(
    temperature=0,
    model_name='deepseek-r1:14b'
)

  turbo_llm = ChatOllama(


In [24]:
qa_chain = RetrievalQA.from_chain_type(llm=turbo_llm,
                                       chain_type="stuff",
                                       retriever=retriever,
                                       return_source_documents=True)

In [25]:
query = 'What is generative ai?'
llm_response = qa_chain.invoke(query)
process_llm_response(llm_response)

Great, thank you for providing me with this context! Based on what you've given me, I can answer your questions as follows:

1. Can ChatGPT write essays?

Yes, ChatGPT can generate text in the form of essays based on a prompt provided to it. The AI model is trained on a large dataset of text and can produce coherent and well-structured text on a wide range of topics.

2. Can ChatGPT commit libel?

It's important to note that ChatGPT is a machine learning model, and it doesn't have the ability to commit libel or any other legal act. The output generated by ChatGPT is based on patterns in the training data, and it doesn't have personal beliefs or intentions. While the output may be misleading or untrue, it's not capable of committing libel or any other illegal act.

3. Does ChatGPT have an app?

Currently, there is no official ChatGPT app available for iPhone or Android devices. However, users can access the chatbot through their browser or by using a third-party app that utilizes ChatGP

In [26]:
print(qa_chain.combine_documents_chain.llm_chain.prompt.messages[0].prompt.template)

Use the following pieces of context to answer the user's question. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}


In [27]:
print(qa_chain.combine_documents_chain.llm_chain.prompt.messages[1].prompt.template)

{question}
