In [2]:
!ollama list

NAME         	ID          	SIZE  	MODIFIED   
llama2:latest	78e26419b446	3.8 GB	2 days ago	


**INGESTING PDF**

In [6]:
!pip install -q unstructured langchain langchain-community langchain-text-splitters chromadb
!pip install -q "unstructured[all-docs]"

In [6]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader

In [7]:
local_path = "the-geography-of-climate-tech.pdf"  # replace with the path to the PDF to load

# Load a local PDF file
if local_path:
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
else:
    print("Upload a PDF file")



In [8]:
# Preview the extracted text (in the default "single" mode the whole PDF is loaded as one Document)
data[0].page_content

'The geography of climate tech\n\nThere can be no effective response to climate change without technology. We need technologies to help generate energy, produce food, manufacture goods, construct and operate buildings, and move people and materials—all while emitting few or no greenhouse gases or even removing greenhouse gases from the atmosphere. This imperative—and the commercial opportunity it represents—has contributed to a recent surge in investment in technologies for tackling climate change— at least US$80 billion since 2021, according to Deloitte analysis based on Pitchbook and Deloitte GreenSpace Navigator data.\n\nEnterprises, entrepreneurs, investors, and policymakers with an interest in climate tech would do well to familiarize themselves with the shifting geographic patterns of climate tech entrepreneurship and investment. Enterprises seeking to source a particular decarbonization technology, for instance, may wish to consider geographies that have fostered entrepreneurshi

**VECTOR EMBEDDINGS**

In [9]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [10]:
#Split and chunk
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
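RecursiveCharacterTextSplitter tries coarse separators first (by default paragraph breaks, then line breaks, then spaces, then single characters) and only falls back to a finer separator for pieces that are still longer than `chunk_size`. The surviving pieces are then merged back into chunks of at most `chunk_size` characters, with up to `chunk_overlap` characters repeated between neighbouring chunks. The following is a simplified pure-Python sketch of that idea, not LangChain's implementation: `split_text` and `merge_pieces` are hypothetical helpers, and the library's handling of whitespace, kept separators and custom length functions differs.

```python
def merge_pieces(pieces, sep, chunk_size, chunk_overlap):
    """Greedily join pieces with `sep` into chunks of at most chunk_size characters,
    carrying up to chunk_overlap characters of trailing pieces into the next chunk."""
    chunks, window, length = [], [], 0  # invariant: length == len(sep.join(window))
    for piece in pieces:
        added = len(piece) + (len(sep) if window else 0)
        if window and length + added > chunk_size:
            chunks.append(sep.join(window))
            # Drop pieces from the front until what is left fits the overlap budget
            # and still leaves room for the incoming piece.
            while window and (length > chunk_overlap
                              or length + len(sep) + len(piece) > chunk_size):
                removed = window.pop(0)
                length -= len(removed) + (len(sep) if window else 0)
            added = len(piece) + (len(sep) if window else 0)
        window.append(piece)
        length += added
    if window:
        chunks.append(sep.join(window))
    return chunks


def split_text(text, chunk_size, chunk_overlap,
               separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitting (hypothetical helper)."""
    # Use the coarsest separator that occurs in the text; "" means single characters.
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            finer = separators[i + 1:]
            break
    pieces = list(text) if sep == "" else [p for p in text.split(sep) if p]

    chunks, buffer = [], []
    for piece in pieces:
        if len(piece) <= chunk_size:
            buffer.append(piece)  # small enough: keep it for merging
            continue
        # Flush the small pieces collected so far, then recurse on the long piece.
        if buffer:
            chunks.extend(merge_pieces(buffer, sep, chunk_size, chunk_overlap))
            buffer = []
        if finer:
            chunks.extend(split_text(piece, chunk_size, chunk_overlap, finer))
        else:
            chunks.append(piece)
    if buffer:
        chunks.extend(merge_pieces(buffer, sep, chunk_size, chunk_overlap))
    return chunks


print(split_text("one two three four five six seven eight", chunk_size=15, chunk_overlap=5))
# → ['one two three', 'three four five', 'five six seven', 'seven eight']
```

With `chunk_size=7500` and `chunk_overlap=100` as in the cell above, the same logic yields few, large chunks whose edges share roughly one sentence of text.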

In [11]:
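# Embed the chunks and add them to a Chroma vector database
# (the embedding model must be pulled first: ollama pull nomic-embed-text)
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag"
)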

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:53<00:00,  8.86s/it]
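`Chroma.from_documents` embeds every chunk with the given embedding model (the 6/6 progress bar above) and stores the vectors, so that a question can later be embedded the same way and matched to its nearest chunks. The toy store below illustrates that store-then-rank idea in plain Python. It is a sketch under loud assumptions: `embed` is a hypothetical bag-of-words stand-in for `nomic-embed-text`, ranking uses cosine similarity, and `ToyVectorStore` is hypothetical even though its method names echo LangChain's vector store interface. Chroma uses its own index and distance settings.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory store: keeps (text, vector) rows and ranks them by similarity."""

    def __init__(self):
        self.rows = []

    def add_texts(self, texts):
        for text in texts:
            self.rows.append((text, embed(text)))

    def similarity_search(self, query, k=4):
        query_vec = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(query_vec, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "solar panels and wind turbines",
    "venture capital funding rounds",
    "carbon capture and storage",
])
print(store.similarity_search("wind and solar energy", k=1))
# → ['solar panels and wind turbines']
```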


**RETRIEVAL**

In [12]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [13]:
# LLM from Ollama (pull it first: ollama pull knoopx/hermes-2-pro-mistral:7b-q8_0)
local_model = "knoopx/hermes-2-pro-mistral:7b-q8_0"
llm = ChatOllama(model=local_model)

In [14]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
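MultiQueryRetriever fills `QUERY_PROMPT` with the user's question, sends it to the LLM, and treats each line of the reply as a separate search query. A minimal sketch of the filling and parsing steps, assuming the reply is newline-separated as the prompt requests; `QUERY_TEMPLATE` is an abbreviated copy of the template above, `parse_queries` is a hypothetical helper, and the reply is made up. The library's own output parser may treat blank lines or numbered lines differently.

```python
# Abbreviated version of QUERY_PROMPT's template (same {question} placeholder).
QUERY_TEMPLATE = (
    "Generate five different versions of the given user question, "
    "separated by newlines.\nOriginal question: {question}"
)


def parse_queries(llm_output: str) -> list[str]:
    """Turn a newline-separated LLM reply into a list of non-empty queries."""
    return [line.strip() for line in llm_output.split("\n") if line.strip()]


prompt_text = QUERY_TEMPLATE.format(question="What is this document about?")
# A made-up reply standing in for the model's output.
reply = "What topics does the report cover?\n\nWhich countries lead in climate tech?\n"
print(parse_queries(reply))
# → ['What topics does the report cover?', 'Which countries lead in climate tech?']
```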

In [15]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
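After generating the alternative questions, the retriever runs each one against `vector_db` and merges the hits into one list without duplicates, so a chunk found by several phrasings is passed on only once. The sketch below shows that unique-union step with a dictionary standing in for the vector store; `multi_query_retrieve` and `toy_index` are hypothetical, and the "documents" are plain strings so they can be tracked in a set.

```python
from typing import Callable, Iterable


def multi_query_retrieve(queries: Iterable[str],
                         retrieve: Callable[[str], list]) -> list:
    """Run every query and return the unique union of hits, in first-seen order."""
    seen: set = set()
    results: list = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:  # skip chunks already found by an earlier query
                seen.add(doc)
                results.append(doc)
    return results


# Hypothetical index: each query maps to the chunks a similarity search might return.
toy_index = {
    "Which countries lead in climate tech?": ["chunk 2", "chunk 5"],
    "Where is climate tech investment concentrated?": ["chunk 5", "chunk 1"],
}
hits = multi_query_retrieve(toy_index, lambda q: toy_index.get(q, []))
print(hits)  # → ['chunk 2', 'chunk 5', 'chunk 1']
```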

In [16]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
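In this chain the leading dictionary runs as a parallel step: the question goes to the retriever to fetch context and, via `RunnablePassthrough`, unchanged into the `question` slot. The result fills the RAG prompt, the chat model answers, and `StrOutputParser` returns the reply as a string. The plain-Python sketch below mirrors that data flow with stand-in callables (`make_chain`, `fake_retrieve` and `echo_llm` are hypothetical, and the context sentence is invented). One difference is deliberate: it joins the retrieved texts with blank lines, whereas the chain above passes the retriever's list of Document objects directly into `{context}`, so the prompt receives that list's string form.

```python
from typing import Callable

RAG_TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def make_chain(retrieve: Callable[[str], list[str]],
               llm: Callable[[str], str]) -> Callable[[str], str]:
    """Compose retrieval, prompt filling, generation and string output."""
    def run(question: str) -> str:
        # 1. Fan out: the same input feeds the retriever and the passthrough slot.
        inputs = {"context": retrieve(question), "question": question}
        # 2. Fill the prompt (context chunks joined with blank lines).
        prompt_text = RAG_TEMPLATE.format(
            context="\n\n".join(inputs["context"]), question=inputs["question"])
        # 3. Generate and 4. return the model's text.
        return llm(prompt_text)
    return run


def fake_retrieve(question: str) -> list[str]:
    return ["Climate tech activity is concentrated in eight countries."]


def echo_llm(prompt_text: str) -> str:
    return prompt_text  # stand-in model that echoes its prompt


chain_sketch = make_chain(fake_retrieve, echo_llm)
print(chain_sketch("What is this document about?"))
```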

In [17]:
chain.invoke("What is this document about?")

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.52s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 10.45it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.80it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.30it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 10.48it/s]


'This document is a report discussing the geographic distribution of climate tech companies, their funding, and the trends in entrepreneurship in this sector. It highlights that climate tech activity is mainly concentrated in eight countries: Australia, Canada, China, France, Germany, India, the United Kingdom, and the United States. The United States leads with the highest number of climate tech companies and funding amounts. However, the report also notes an increasing geographic diversification in climate tech entrepreneurship as countries outside the dominant eight show rapid growth in founding activity and increasing share of investment.'

In [18]:
chain.invoke("What are the seven technologies that half of all climate tech companies are working on?")

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.87s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.17it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.48it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.17it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  6.10it/s]


'The seven technologies that half of all climate tech companies are working on are:\n\n1. Renewable energy (including solar, wind, and hydro)\n2. Energy storage and management\n3. Smart grids and distributed energy systems\n4. Carbon capture, utilization, and storage (CCUS)\n5. Mobility (including electric vehicles and alternative fuels)\n6. Agriculture and food (such as precision farming and sustainable food production)\n7. Building and construction (focused on energy-efficient materials and designs)\n\nSource: Deloitte Insights, "The geography of climate tech," accessed October 4, 2023.'

In [19]:
chain.invoke("What is GreenSpace Tech by Deloitte?")

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.76s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.77it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  6.34it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.59it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.78it/s]


'GreenSpace Tech by Deloitte is not a specific entity or product but rather a research and analysis initiative by Deloitte focused on the global climate tech market. This initiative aims to provide insights into the trends, investments, and developments in the climate technology sector. The information provided in the text you shared seems to be a part of their research and analysis.'