<a href="https://colab.research.google.com/github/IYashCanCode/Canoo-LLM-RAG/blob/main/Canoo_LLM_based_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [5]:
# Install the dependencies first: pip install langchain sentence-transformers faiss-cpu

from langchain.document_loaders import CSVLoader
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import HuggingFaceHub
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import LLMChain, RetrievalQA, ConversationalRetrievalChain, RetrievalQAWithSourcesChain
import os

In [6]:
from getpass import getpass
os.environ['HUGGINGFACEHUB_API_TOKEN'] = getpass('Hugging Face access token: ')   # prompt for the token instead of hard-coding a secret in the notebook

In [7]:
csv_data = CSVLoader('Canoo EV.csv', source_column='Website Content')   # Load the CSV file; source_column sets each document's `source` metadata from the 'Website Content' column

In [8]:
csv_content = csv_data.load()

In [9]:
chunker = RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=30)
csv_chunking = chunker.split_documents(csv_content)
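
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplified stand-in, not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraphs and newlines). The helper name `sliding_chunks` and the dummy text are hypothetical.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=30):
    """Split text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 250          # 2,500 characters of dummy text
pieces = sliding_chunks(sample)      # same sizes as the notebook: 1000 / 30
print(len(pieces), [len(p) for p in pieces])
```

With these settings the last 30 characters of one chunk reappear at the start of the next, which gives the retriever a little shared context across chunk boundaries.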

In [10]:
embeddings = HuggingFaceEmbeddings(model_name = 'sentence-transformers/all-MiniLM-L6-v2')
vectorstore = FAISS.from_documents(csv_chunking,embeddings)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [11]:
vectorstore.save_local("vector_embeddings", index_name="base_and_adjacent")
vectorstore = FAISS.load_local("vector_embeddings", embeddings, index_name="base_and_adjacent")
retriever = vectorstore.as_retriever(search_type = 'mmr',search_kwargs={'k':5})
memory = VectorStoreRetrieverMemory(retriever=retriever)
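
`search_type='mmr'` asks the retriever for maximal marginal relevance, which trades off similarity to the query against redundancy among the results already chosen. The following is a minimal pure-Python sketch of that selection rule on toy 2-D vectors. It is illustrative only and not the FAISS/LangChain implementation; the function names, the toy vectors, and the `lambda_mult=0.5` weighting are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Greedily pick k candidate indices by maximal marginal relevance.

    score = lambda_mult * sim(query, doc)
            - (1 - lambda_mult) * max sim(doc, already selected docs)
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.5]
docs = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]]   # docs 0 and 1 are near-duplicates

print(mmr_select(query, docs, k=2))                   # diversity-aware choice
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # pure similarity ranking
```

In this toy case the pure similarity ranking returns the two near-duplicate vectors, while the MMR rule keeps the most relevant one and then prefers the less redundant third vector.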

In [12]:
model = HuggingFaceHub(repo_id='HuggingFaceH4/zephyr-7b-alpha', model_kwargs={'temperature': 0.5, 'max_new_tokens': 1500, 'max_length': 100})

In [13]:
text = """ You are an AI assistant for Canoo. You have to answer the queries asked by the user.
You have to answer the queries in detail.

Instructions to the assistant:
a) Explain in the form of pointers, in detail, and include only those pointers which match the query asked by the user.
b) If you answer in points, explain each point in detail again, considering the important points from the previous detailed explanation.
c) Only answer the user's query. Do not explain any other points not asked in the query and do not add follow-up questions of your own.
d) Keep your responses in the form of pointers.

User : {query}
Assistant :
"""

prompt = PromptTemplate(input_variables=["query"],template = text)

In [14]:
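# Note: `prompt` defined above is not passed to this chain, so from_llm falls back to its default question-answering prompt
chain = ConversationalRetrievalChain.from_llm(llm=model, retriever=retriever)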

In [15]:
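def LLM(query):
  """Run one question through the retrieval chain and return only the model's answer."""
  chat_history = []   # reset on every call, so each question is answered without prior turns
  result = chain.invoke({"question": query,
                  "chat_history": chat_history})
  # keep only the text after the last 'Helpful Answer:' marker
  return result['answer'].split('Helpful Answer:')[-1]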
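
The `split('Helpful Answer:')[-1]` step presumably exists because the generated text can include the prompt before the completion. A small sketch on a made-up string shows what this post-processing keeps. The `raw` text is fabricated sample input, not real model output, and `extract_answer` is a hypothetical helper that additionally strips surrounding whitespace.

```python
def extract_answer(generated, marker="Helpful Answer:"):
    """Return the text after the last occurrence of marker.

    If the marker is absent, str.split returns the whole string as a single
    element, so the input comes back unchanged (apart from the strip).
    """
    return generated.split(marker)[-1].strip()


# Fabricated example of generated text that contains the prompt as well
raw = (
    "Use the following pieces of context to answer the question at the end.\n"
    "Question: What does Canoo build?\n"
    "Helpful Answer: Electric vehicles."
)

print(extract_answer(raw))
print(extract_answer("No marker here"))
```

Taking the last element rather than the second keeps the function safe when the marker appears more than once or not at all.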

In [16]:
list_of_questions = ["""Identify the industry in which Canoo operates, along with its size, growth rate, trends, and key players.""",
                     """Analyze Canoo's main competitors, including their market share, products or services offered, pricing strategies, and marketing efforts.""",
                     """Identify key trends in the market, including changes in consumer behavior, technological advancements, and shifts in the competitive landscape.""",
                     """Gather information on Canoo's financial performance, including its revenue, profit margins, return on investment, and expense structure."""]

In [17]:
for query in list_of_questions:
  response = LLM(query)
  print("User : ",query,"\nAssistant : ",response)
  print("\n\n")

User :  Identify the industry in which Canoo operates, along with its size, growth rate, trends, and key players. 
Assistant :   Canoo operates in the electric vehicle (EV) industry, which is experiencing significant growth and trends. According to a report by MarketsandMarkets, the global EV market size is expected to grow from $119.4 billion in 2020 to $310.4 billion by 2025, at a CAGR of 22.3% during the forecast period. Key players in the industry include Tesla, General Motors, Volkswagen, Ford, and Canoo. Canoo's revenue is ranked 14th among its top 10 competitors, with a revenue of $<1M in Q3 2023. However, the company's product offerings, including true steer-by-wire, leaf spring suspension systems, advanced drivetrain systems, battery and battery management systems, electrical systems architecture, lifestyle vehicles, delivery, multi-purpose delivery vehicles, and pickup, position it as a significant player in the industry. Canoo's Digital Ecosystem software provides key tools 