<a href="https://colab.research.google.com/github/JayP127/cfp/blob/main/csvPinecone4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction
This Google Colab notebook evaluates how available and accessible carbon footprint information is for a range of products. The goal is to support sustainability and transparency by giving consumers and companies the data they need to make decisions that reduce environmental impact.

The notebook is written in Python and uses the LangChain library to connect a large language model to external data sources and tools.

Our approach uses an Agent that has access to three tools.

**Pinecone Vectorstore:** This tool stores and manages vector embeddings of data that represent the carbon footprint of different products. The embeddings are derived from several datasets, including GWP100 Global warming potential average of CML v4.8 2006, EDIP 2003, IPCC2021, and ReCiPe 2016 v1.3. Pinecone's similarity search returns the stored records whose embeddings are closest to the embedding of the query.

**Google Search Tool:** This tool, accessed through SerpAPI, lets the Agent search the wider web, broadening its coverage beyond the stored datasets.

**Calculator:** A calculator tool, backed by Wolfram Alpha, performs computations and unit conversions when estimates are needed.

The Agent is designed to give the best available answer to questions about product carbon footprints and to cite its sources wherever possible.

The rest of the notebook walks through the process: loading the data into the vector store, combining the three tools into an Agent, and using the Agent to answer carbon footprint questions.
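
To make the idea of similarity search over embeddings concrete, here is a minimal sketch in plain Python. It is not how Pinecone is implemented; the three-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are hypothetical and serve only to illustrate ranking stored vectors by their closeness to a query vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vec, stored, k=1):
    """Return the k stored (label, vector) pairs closest to query_vec."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy "embeddings"; real embeddings have far more dimensions.
toy_store = [
    ("jeans", [0.9, 0.1, 0.0]),
    ("office chair", [0.1, 0.9, 0.1]),
    ("cereal", [0.0, 0.2, 0.9]),
]
best = most_similar([0.8, 0.2, 0.1], toy_store, k=1)
print(best[0][0])
```

A real vector store applies the same ranking idea at scale, typically with approximate nearest-neighbour indexes rather than a full sort.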


In [None]:
!pip -q install openai python-dotenv langchain google-search-results pinecone-client tiktoken pinecone-text wolframalpha

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
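
Because every later cell depends on API keys loaded from the `.env` file, it can help to check for them up front. The sketch below is an optional addition, not part of the original notebook. `PINECONE_API_KEY` and `PINECONE_ENV` are read directly further down; the other three names are the environment variables that the OpenAI, SerpAPI, and Wolfram Alpha wrappers are assumed to read by default. The helper `missing_keys` is hypothetical.

```python
import os

# Assumed variable names, except PINECONE_API_KEY and PINECONE_ENV,
# which the notebook reads explicitly.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENV",
    "SERPAPI_API_KEY",
    "WOLFRAM_ALPHA_APPID",
]

def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

missing = missing_keys(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```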


In [None]:
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.agents import ZeroShotAgent, Tool, AgentExecutor
from langchain.agents import AgentType
from langchain.tools import BaseTool
from langchain import SerpAPIWrapper
from langchain import LLMChain

In [None]:
embeddings = OpenAIEmbeddings()

In [None]:
import pinecone
# initialize pinecone
pinecone.init(
    api_key=os.environ['PINECONE_API_KEY'],
    environment=os.environ['PINECONE_ENV'],
)

index_name = "cfp"



# Uploading to the vector store (only needed for initialization)

In [None]:
# Assumes Google Drive is already mounted at /content/drive
loader = CSVLoader('/content/drive/MyDrive/CFP_CSVs/binder3.csv')

In [None]:
documents = loader.load()

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

In [None]:
products = text_splitter.split_documents(documents)

In [None]:
print(products[0])
print(products[1])
print(products[2])

page_content='\ufeff: 0\nGeography: USA\nproduct_name: Frosted Flakes, 23 oz, produced in Lancaster, PA (one carton)---Kellogg Company\nreference_product_unit: unit\nCarbon Footprint of Product (kg CO2-Eq): 2.67\nSource of information: Meinrenken, Christoph J., Daniel Chen, Ricardo A. Esparza, Venkat Iyer, Sally P. Paridis, Aruna Prasad, and Erika Whillas. "The Carbon Catalogue, carbon footprints of 866 commercial products from 8 industry sectors and 5 continents."Â\xa0Scientific DataÂ\xa09, no. 1 (2022): 87.' metadata={'source': '/content/drive/MyDrive/CFP_CSVs/binder3.csv', 'row': 0}
page_content='\ufeff: 1\nGeography: USA\nproduct_name: Office Chair---KNOLL INC\nreference_product_unit: unit\nCarbon Footprint of Product (kg CO2-Eq): 3.51\nSource of information: Meinrenken, Christoph J., Daniel Chen, Ricardo A. Esparza, Venkat Iyer, Sally P. Paridis, Aruna Prasad, and Erika Whillas. "The Carbon Catalogue, carbon footprints of 866 commercial products from 8 industry sectors and 5 conti
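
Each document's `page_content` is a block of `column: value` lines, one per CSV column. If you want the numeric footprint rather than the raw text, a small parser along the following lines may help. This is a sketch: `parse_page_content` is a hypothetical helper, and it assumes that no value contains a line break. It also strips the leading `\ufeff` byte-order mark visible in the output above.

```python
def parse_page_content(page_content):
    """Split CSVLoader-style 'column: value' lines into a dict."""
    fields = {}
    for line in page_content.lstrip("\ufeff").splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines without a 'key: value' separator
            fields[key.strip()] = value.strip()
    return fields

sample = (
    "\ufeff: 0\nGeography: USA\n"
    "product_name: Office Chair---KNOLL INC\n"
    "reference_product_unit: unit\n"
    "Carbon Footprint of Product (kg CO2-Eq): 3.51"
)
record = parse_page_content(sample)
footprint = float(record["Carbon Footprint of Product (kg CO2-Eq)"])
print(record["product_name"], footprint)
```

The unnamed first CSV column (the row index) ends up under the empty-string key.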

Initial upload to the vector store

In [None]:
docsearch = Pinecone.from_documents(products, embeddings, index_name=index_name)
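
The upload above expects the `cfp` index to exist already. A minimal sketch of a guard that creates it when missing is shown below. `ensure_index` is a hypothetical helper. It assumes the client object exposes `list_indexes()` and `create_index()`, as the `pinecone` module does in the client version this notebook uses, and that the embeddings have 1536 dimensions (the output size of OpenAI's `text-embedding-ada-002`). Adjust the dimension if you use a different embedding model.

```python
def ensure_index(client, name, dimension=1536, metric="cosine"):
    """Create the index `name` if `client` does not list it yet.

    Returns True if an index was created, False if it already existed.
    """
    if name in client.list_indexes():
        return False
    client.create_index(name, dimension=dimension, metric=metric)
    return True

# In this notebook the call would look like:
#   ensure_index(pinecone, index_name)
```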



Example of an initial basic query

In [None]:
query = "what is the carbon footprint of jeans"
# Use a separate name so the search results do not overwrite the split documents in `products`
results = docsearch.similarity_search(query)

In [None]:
print(results[0].page_content)

﻿: 10
Geography: USA
product_name: 501Â® Original Jeans â€“ Rinse Run---Levi Strauss & Co.
reference_product_unit: unit
Carbon Footprint of Product (kg CO2-Eq): 15.05
Source of information: Meinrenken, Christoph J., Daniel Chen, Ricardo A. Esparza, Venkat Iyer, Sally P. Paridis, Aruna Prasad, and Erika Whillas. "The Carbon Catalogue, carbon footprints of 866 commercial products from 8 industry sectors and 5 continents."Â Scientific DataÂ 9, no. 1 (2022): 87.
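
The `Â®` and `â€“` sequences in this output suggest that the CSV contains UTF-8 text that was at some point decoded as Windows-1252. The sketch below shows one way to undo that before uploading. It assumes this specific mis-decoding and is not part of the original pipeline. `repair_mojibake` is a hypothetical helper that leaves the text unchanged whenever the assumption does not hold.

```python
def repair_mojibake(text):
    """Undo UTF-8 text mis-decoded as Windows-1252 and drop a leading BOM."""
    text = text.lstrip("\ufeff")
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not fit the assumed pattern; return it as it was.
        return text

# "\u00c2\u00ae" is the garbled form of the registered-trademark sign.
print(repair_mojibake("501\u00c2\u00ae Original Jeans"))
```

The repair works on the whole string at once, so a string that mixes correctly decoded and garbled characters is returned unchanged.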


# Once the index has been created in the vector store, you can connect to it

In [None]:
llm = OpenAI(model_name="text-davinci-003", n=2, best_of=2)

In [None]:
docsearch = Pinecone.from_existing_index(index_name, embeddings)

climate_change = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
)
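
The `"stuff"` chain type places all retrieved documents into a single prompt together with the question. The sketch below illustrates that idea in plain Python. The prompt wording and the helper `build_stuff_prompt` are hypothetical and do not reproduce LangChain's actual template.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all retrieved documents into a single prompt string."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

docs = [
    "product_name: Office Chair\nCarbon Footprint of Product (kg CO2-Eq): 3.51",
    "product_name: Frosted Flakes, 23 oz\nCarbon Footprint of Product (kg CO2-Eq): 2.67",
]
prompt_text = build_stuff_prompt(
    "What is the carbon footprint of an office chair?", docs
)
print(prompt_text)
```

Because everything goes into one prompt, this approach only works while the retrieved documents fit in the model's context window.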

# Create the Agent

In [None]:
from langchain.utilities.wolfram_alpha import WolframAlphaAPIWrapper

search = SerpAPIWrapper()
llm = OpenAI(temperature=0)

In [None]:
wolfram = WolframAlphaAPIWrapper()

In [None]:
wolfram.run("convert 0.855 pounds/kWh to kg/kWh")

'Assumption: convert 0.855 lb/(h kW) (pounds per kilowatt hour) to kilograms per kilowatt hour \nAnswer: 0.3878 kg/(kW h) (kilograms per kilowatt hour)'
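
The Wolfram Alpha result can be checked by hand: one pound is defined as exactly 0.45359237 kg, and the kWh in the denominator is unchanged. The function name below is a hypothetical one chosen for this check.

```python
LB_TO_KG = 0.45359237  # exact definition of the international pound

def lb_per_kwh_to_kg_per_kwh(value_lb_per_kwh):
    """Convert an emission factor from lb/kWh to kg/kWh."""
    return value_lb_per_kwh * LB_TO_KG

print(round(lb_per_kwh_to_kg_per_kwh(0.855), 4))  # 0.3878
```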

In [None]:
tools = [
    Tool.from_function(
        func=climate_change.run,
        name="cfp_pinecone",
        description="useful for answering questions about the carbon footprint of products and its Source of information. Input should be a fully formed question.",
    ),
    Tool.from_function(
        func=search.run,
        name="Search",
        description="useful for when you need to answer questions about current events",
    ),
    Tool.from_function(
        func=wolfram.run,
        name="calculator",
        description="useful for when you need to perform a calculation",
    ),
]

In [None]:
prefix = """Answer the following questions as best you can and include in the answer the Source of information. You have access to the following tools:"cfp_pinecone", "Search", "calculator"."""
suffix = """Begin! include the Source of information in your answer.

Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
    tools, prefix=prefix, suffix=suffix, input_variables=["input", "agent_scratchpad"]
)

In [None]:
print(prompt.template)

Answer the following questions as best you can and include in the answer the Source of information. You have access to the following tools:"cfp_pinecone", "Search", "calculator".

cfp_pinecone: useful for answering questions about the carbon footprint of products and its Source of information. Input should be a fully formed question.
Search: useful for when you need to answer questions about current events
calculator: useful for when you need to perform a calculation

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [cfp_pinecone, Search, calculator]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! include the Source of information in your answer.

Question: {input}
{ag
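
The prompt format above is what lets the agent loop work: after each LLM call, the text is scanned for either a `Final Answer:` or an `Action:` / `Action Input:` pair, and the named tool is then called with that input. The following is a simplified sketch of that parsing step, not LangChain's own parser. `parse_agent_step` and `ACTION_PATTERN` are hypothetical names.

```python
import re

ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL
)

def parse_agent_step(llm_output):
    """Return ("final", answer) or ("action", tool_name, tool_input)."""
    if "Final Answer:" in llm_output:
        return ("final", llm_output.split("Final Answer:")[-1].strip())
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {llm_output!r}")
    tool_name = match.group(1).strip()
    tool_input = match.group(2).strip().strip('"')
    return ("action", tool_name, tool_input)

step = parse_agent_step(
    "Thought: I need to find out the carbon footprint of cattle in Brazil\n"
    "Action: Search\n"
    'Action Input: "carbon footprint of cattle in Brazil"'
)
print(step)
```

The tool name extracted here is matched against the names given in the `tools` list, which is why those names must appear in the prompt exactly as registered.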

In [None]:
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)

In [None]:
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)

In [None]:
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True
)

In [None]:
agent_executor.run("what is the average carbon footprint of cattle in Brazil")



> Entering new chain...
Thought: I need to find out the carbon footprint of cattle in Brazil
Action: Search
Action Input: "carbon footprint of cattle in Brazil"
Observation: But cattle ranching has a climate problem: Livestock supply chains are responsible for 14.5% of global greenhouse gas emissions and account for as much as one third of total emissions in Latin America, totaling hundreds of millions of tons of carbon dioxide each year.
Thought: I need to find out the average carbon footprint of cattle in Brazil
Action: Search
Action Input: "average carbon footprint of cattle in Brazil"
Observation: Carbon Footprints of Beef Life cycle GHG emissions of Brazilian beef not including LUC, i.e., comprising methane, nitrous oxide, and fossil CO2, have been estimated at around 28 kg CO2e per kg CW at the farm-gate as a national average.
Thought: I now know the final answer
Final Answer: The average 

'The average carbon footprint of cattle in Brazil is 28 kg CO2e per kg CW at the farm-gate. Source: Carbon Footprints of Beef Life cycle GHG emissions of Brazilian beef not including LUC, i.e., comprising methane, nitrous oxide, and fossil CO2.'