In [85]:
import os
import getpass
import openai
import time

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')

OpenAI API Key:········
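
The cell above loads `.env` and then always overwrites `OPENAI_API_KEY` with the typed value. A minimal sketch of one alternative, which prompts only when the variable is still unset, is below. The helper name `ensure_api_key` is hypothetical and not part of any library.

```python
import getpass
import os


def ensure_api_key(env_var="OPENAI_API_KEY", prompt="OpenAI API Key:"):
    """Return the key stored in env_var, prompting only if it is missing or empty.

    Hypothetical helper for illustration; it is not part of dotenv or openai.
    """
    if not os.environ.get(env_var):
        # Nothing was loaded (for example from .env), so ask interactively.
        os.environ[env_var] = getpass.getpass(prompt)
    return os.environ[env_var]
```

Calling `ensure_api_key()` after `load_dotenv(find_dotenv())` would then leave a key from `.env` untouched.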


In [15]:
# from openpyxl import load_workbook

In [73]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

In [74]:
file = 'C:/Users/markc/Hydrogen/H2_Scale_Ruth.pdf'
loader = PyPDFLoader(file_path=file)
pages = loader.load_and_split()

In [19]:
pages[59]

Document(page_content='28 \nThis report is available at no cost from the National Renewable Energy Laboratory at www.nrel.gov/publications.  4 Hydrogen  Production  \nHydrogen can be produced from different technology pathways  and diverse resources. Currently, \nmost  hydrogen in the U nited States  is produced via natural gas  SMR , totaling about  10 MMT /yr. \nSMR is the most economic technology available in most cases, primarily owing  to an abundance \nof low -cost natural gas. However,  alternative hydrogen production technologies with differing \ncosts and levels of maturity are available or under development. For example, interest in \nelectrolysis —the splitting of water into hydrogen and oxygen using electricity —has grown in \nrecent years . Electrolysis offers distinct advantages over  SMR, and , although current electrolytic \nproduction costs are high, further R&D —in addition to increased availability  of low -cost \nrenewable electricity —is expected to reduce costs . 

In [20]:
# Build a FAISS index over the PDF pages (text plus metadata) to serve as a searchable knowledge base
# FAISS = Facebook AI Similarity Search - library for efficient similarity search of dense vectors
# - Contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM.

# k specifies the number of most-similar documents to return

from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())
docs = faiss_index.similarity_search("How large is hydrogen demand today?", k=10)
for doc in docs:
    print(str(doc.metadata["page"]) + ":", doc.page_content[:196])



49: 27 
This report is available at no cost from the National Renewable Energy Laboratory at www.nrel.gov/publications.  Regional demand is  estimated according to the same distribution used for LDVs.
30: 8 
This report is available at no cost from the National Renewable Energy Laboratory at www.nrel.gov/publications.  (IRENA 2019) . Bloomberg New Energy Finance  estimate s a theoretical global max
37: 15 
This report is available at no cost from the National Renewable Energy Laboratory at www.nrel.gov/publications.  3 Potential  Hydrogen Demands  
Hydrogen can provi de value across a multitude 
132: 110 
This report is available at no cost from the National Renewable Energy Laboratory at www.nrel.gov/publications.  7 Conclusions  
This initial analysis indicates that the H2@Scale concept coul
41: 19 
This report is available at no cost from the National Renewable Energy Laboratory at www.nrel.gov/publications.  indirectly through the increased need for ammonia fertilizer in biomass pr
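
To make the role of `k` concrete, here is a rough sketch of a top-k search done by brute force. It is a stand-in for illustration only, not how FAISS is implemented. The function name `top_k_by_l2` and the toy 2-D vectors are made up, and Euclidean distance is an assumption about the metric.

```python
import math


def top_k_by_l2(query, vectors, k):
    """Return the indices of the k vectors closest to query (Euclidean distance)."""
    # Pair each distance with its index so sorting keeps track of positions.
    distances = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in distances[:k]]


# Toy 2-D "embeddings"; the embeddings used in this notebook have 1536 dimensions.
toy_vectors = [(0.0, 0.0), (1.0, 1.0), (0.9, 1.1), (5.0, 5.0)]
nearest = top_k_by_l2((1.0, 1.0), toy_vectors, k=2)
print(nearest)
```

A larger `k` returns more candidates, at the cost of including chunks that are less similar to the query.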

In [21]:
from langchain.indexes import VectorstoreIndexCreator

In [22]:
# Vector store created from "loader" docs. Can now start querying data.
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch).from_loaders([loader])

In [29]:
query = "Please summarize this report in fewer than 1000 words and use markdown to break out the top 5 key takeaways."

In [30]:
response = index.query(query)

In [31]:
display(Markdown(response))

 This report provides an analysis of the potential for hydrogen demand and supply options, as well as the economic, energy, and emissions impacts. It was written by Cullen, Scott McWhorter, Jonathan Zimmerman, and Rod Borup, with input from Matt Pellow, Ron Kent, Andrew Martinez, Ugi Otgonbaatar, Joe Powell, Bryan Chapman, Jeongwoo Han, Ian Sutherland, Trevor Brown, Scott Swartz, Brad Williams, Mark O’Malley, and Mary Biddy. 

**Key Takeaways:**
1. Hydrogen has potential to be a sustainable pathway for the global energy transition.
2. The report provides an analysis of the potential for hydrogen demand and supply options, as well as the economic, energy, and emissions impacts.
3. Estimates are based on manufacturer feedback from B. James et al. 2013.
4. The report is organized into 8 sections, including methods, demand and supply options, results, and conclusions.
5. The report is available at no cost from the National Renewable Energy Laboratory at www.nrel.gov/publications.

In [32]:
loader = PyPDFLoader(file_path=file)

In [33]:
docs = loader.load()

In [34]:
docs[0]

Document(page_content='NREL is a national laboratory of the U.S. Department of Energy  \nOffice of Energy Efficiency & Renewable Energy  \nOperated by the Alliance for Sustainable Energy, LLC  \nThis report is available at no cost from the National Renewable Energy  \nLaboratory (NREL) at www.nrel.gov/publications.   \n Contract No. DE -AC36 -08GO28308  \nTechnical Report  \nNREL/TP -6A20 -77610  \nOctober 2020    \nThe Technical and Economic Potential \nof the H2@Scale Concept within the \nUnited States  \nMark F. Ruth,1 Paige Jadun,1 Nicholas Gilroy,1 \nElizabeth  Connelly ,1 Richard Boardman,2 A.J. Simon,3 \nAmgad Elgowainy ,4 and Jarett Zuboy5 \n1 National Renewable Energy Laboratory  \n2 Idaho National Laboratory  \n3 Lawrence Livermore National Laboratory  \n4 Argonne National Laboratory  \n5 Independent Contractor  ', metadata={'source': 'C:/Users/markc/Hydrogen/H2_Scale_Ruth.pdf', 'page': 0})

In [45]:
# Create an OpenAI embeddings model to embed the report text.

from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

##### A quick look at what embeddings are and how they work

In [38]:
embed = embeddings.embed_query("Hi my name is Iron Man")

In [39]:
print(len(embed))

1536


In [44]:
print(embed[:5])

[-0.03388969227671623, -0.03518833592534065, -0.022326737642288208, -0.030867839232087135, 0.0024131108075380325]
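
Vectors like the one above are only useful when compared with each other. As a rough sketch, cosine similarity is one common way to compare two embeddings. The toy vectors below are invented, and whether a given vector store uses cosine or another metric is an assumption to check.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors for illustration; real embeddings would come from embed_query.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Texts with similar meaning should produce vectors that score closer to 1 than unrelated texts do.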


###### Create embeddings for all content in the document and store them in a vector store.

In [46]:
# Creates an in-memory vector DB that can be searched for documents similar to a given query.
# The embedding happens under the hood; the user sees nothing so far.
db = DocArrayInMemorySearch.from_documents(docs, embeddings)

In [47]:
query = "How is hydrogen primarily used today?"

In [48]:
docs = db.similarity_search(query)

In [49]:
len(docs)

4

In [54]:
docs[0]

Document(page_content='7 \nThis report is available at no cost from the National Renewable Energy Laboratory at www.nrel.gov/publications.  underway in Europe, with some in the United States as well (SoCalGas 2018; EIA 2015c; \nEuropean Power to Gas 2018) . \n \nFigure 6. U.S. transportation energy sources  (based on energy content), 201 9 (EIA 2020d)  \nHydrogen has many existing and potential new applications specific to the industrial sector . In \nsome of these applications, such as oil refining, hydrogen is an essential feedstock. In others, such as iron refining, its use can improve process efficiency (Green 2018) . Oil refineries \nconsume —for processes including heavy oil hydrocracking and hydrotreating and \ndesulfurization of sour crude —about two  thirds of the approximately 10 million metric tons \n(MMT) of  on-purpose  feedstock hydrogen produced and consumed in the United States each \nyear (see Section 3.1) . The remainder is used for ammonia, methanol and other alcohol

In [55]:
# Wrap the vector store in a retriever so relevant documents can be fetched and passed back to the user
retriever = db.as_retriever()

In [56]:
# Use a chat language model because we want to return text; temperature=0.0 minimizes randomness
llm = ChatOpenAI(temperature=0.0)

In [57]:
# Join the text of all retrieved documents into a single context string
qdocs = "".join(doc.page_content for doc in docs)


In [58]:
response = llm.call_as_llm(f"{qdocs} Question: Please summarize this report in fewer than 1000 words \
and use markdown to break out the top 5 key takeaways.") 


In [59]:
display(Markdown(response))

# Summary of "Hydrogen Demand and Potential for the H2@Scale Initiative"

This report explores the potential for hydrogen as an energy carrier and the role of the H2@Scale initiative in supporting its development. The report covers various applications of hydrogen, including transportation, industrial feedstocks, and energy storage. 

## Key Takeaways

1. Hydrogen has many existing and potential new applications specific to the industrial sector, including oil refining, iron refining, and ammonia production. 

2. The transportation sector is evolving to include advanced, high-efficiency powertrains, including fuel cell electric vehicles (FCEVs) and fuel cell buses and trucks. 

3. Hydrogen can be used in the production of synthetic fuels or in blends with conventional fuels, and many "power to gas" demonstration projects are underway in Europe and the United States. 

4. The H2@Scale system can support the U.S. electricity sector by enhancing the affordability of renewable power and supporting critical baseload nuclear generation. 

5. The report estimates the serviceable consumption potential for each demand opportunity, competitive hydrogen price points, and demand locations for various potential hydrogen demands, including oil refining, metals refining, ammonia, biofuels, synthetic HC, natural gas supplementation, seasonal energy storage for the electrical grid, and FCEVs.

###### All of the above can be simplified into the LangChain chain below

In [98]:
# Retrieval QA chain
# "stuff" is the simplest method: it stuffs all retrieved documents into the context and makes one call to the language model
# TODO: review other methods such as map_reduce, refine, and map_rerank

# The "stuff" method seems to work well for summarization and responds faster

qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
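
The difference between chain types can be sketched without LangChain. The code below is a simplified illustration, not LangChain's implementation: `fake_llm` is a hypothetical stand-in for a model call, and the prompt layout is invented. It only shows how many model calls each strategy makes.

```python
def fake_llm(prompt):
    """Hypothetical stand-in for a model call; counts how often it is invoked."""
    fake_llm.calls += 1
    return f"<answer based on {len(prompt)} chars>"


fake_llm.calls = 0


def stuff(docs, question, llm):
    # One prompt containing every document, so exactly one model call.
    context = "\n\n".join(docs)
    return llm(f"{context}\n\nQuestion: {question}")


def map_reduce(docs, question, llm):
    # One call per document (map), then one call to combine the partial answers (reduce).
    partials = [llm(f"{doc}\n\nQuestion: {question}") for doc in docs]
    combined = "\n\n".join(partials)
    return llm(f"{combined}\n\nQuestion: {question}")


chunks = ["chunk one", "chunk two", "chunk three"]

fake_llm.calls = 0
stuff(chunks, "Summarize.", fake_llm)
stuff_calls = fake_llm.calls

fake_llm.calls = 0
map_reduce(chunks, "Summarize.", fake_llm)
map_reduce_calls = fake_llm.calls
```

With three chunks, the "stuff" sketch makes one call and the map-reduce sketch makes four, which is one plausible reason the "stuff" chain responds faster as long as everything fits in the context window.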

In [99]:
query = "Please summarize this report in fewer than 1000 words \
and use markdown to break out the top 5 key takeaways."

In [100]:
start_time = time.perf_counter()  # monotonic clock, better suited to measuring elapsed time
response = qa_stuff.run(query)
run_time = time.perf_counter() - start_time
print(run_time)



> Entering new RetrievalQA chain...

> Finished chain.
22.224737644195557


In [101]:
display(Markdown(response))

# Summary of Report on Hydrogen Demand and Potential Markets

This report explores the potential demand for hydrogen across various industries and applications as an energy carrier for intermediate or end use. The report considers consumption of captive hydrogen (produced intentionally by the consuming industry) and merchant hydrogen (produced intentionally for sale), but does not consider captive byproduct hydrogen production. The report estimates the serviceable consumption potential for each demand opportunity, competitive hydrogen price points, and demand locations for the following potential hydrogen demands: oil refining, metals refining, ammonia, biofuels, synthetic HC, natural gas supplementation, seasonal energy storage for the electrical grid, light-duty FCEVs, and medium- and heavy-duty FCEVs.

## Key Takeaways

1. Hydrogen has many existing and potential new applications specific to the industrial sector. In some of these applications, such as oil refining, hydrogen is an essential feedstock. In others, such as iron refining, its use can improve process efficiency.
2. The H2@Scale system can support the U.S. electricity sector by enhancing the affordability of renewable power and supporting critical baseload nuclear generation. The electric power sector accounts for about 40% of U.S. primary energy consumption, with the resulting electricity consumed by the industrial, residential, commercial, and transportation sectors.
3. The potential hydrogen demand varies for each application, as does the threshold price for hydrogen in each market. The viability of alternatives to hydrogen in various markets affects the estimated threshold price. For existing hydrogen markets, such as ammonia and oil refining, the prices in the demand curves estimated here are not necessarily the prices being paid for hydrogen today; rather, they are the hydrogen prices necessary to compete against the next-best option.
4. The success of initiatives to increase the use of zero-emission vehicles (ZEVs), including fuel cell electric vehicles (FCEVs), ultimately relies on consumer choices. The ZEV products must compare favorably with conventional vehicles in terms of cost, performance, fueling convenience, and driving range. Four FCEVs were available as of early 2020 in select locations. They have combined city/highway fuel economies of 60–66 miles per kilogram of hydrogen, driving ranges of 312–380 miles, and the ability to fill up at hydrogen dispensers in less than 5 minutes.
5. Hydrogen can also be used in the production of synthetic fuels or in blends with conventional fuels. In some U.S. regions, financial incentives exist for such blends. For example, California’s Low Carbon Fuel Standard requires fuel producers and importers to achieve a 20% reduction in the carbon intensity of transportation fuels by 2030. This can be achieved through sale of fuels including bio-based and fossil natural gas, electricity, hydrogen, ethanol, and biodiesel. Another California policy (SB 1383) promotes use of renewable fuels by requiring a 40%–50% reduction in emissions of short-lived climate pollutants including black carbon, fluorinated gases, and methane.

##### The five steps above can be collapsed into a single call, but the response loses detail.

In [75]:
response = index.query(query, llm=llm)

In [76]:
display(Markdown(response))

I'm sorry, but I cannot summarize the report as there is no clear indication of what the report is about. The context provided only includes acknowledgments and references, but no information on the actual content of the report.

In [77]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

In [None]:
# Key areas to explore/modify
# (1) vectorstore_cls types
# (2) chain_type