# Adapting LLM Q&A to Our PDF Data Using LangChain and Open Source Models
![src/genai.png](src/genai.png)

## Initialize Hugging Face authentication using an access token


In [1]:
import os
# Uncomment and set your token; LangChain reads the upper-case variable name
# os.environ['HUGGINGFACEHUB_API_TOKEN'] = 'mention_your_key_here'
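Hard-coding a token in a notebook makes it easy to leak. A minimal sketch of reading it from the environment instead; the helper name `get_hf_token` is hypothetical and not part of the original notebook:

```python
import os


def get_hf_token(env=os.environ, var_name="HUGGINGFACEHUB_API_TOKEN"):
    """Return the Hugging Face token from the environment, or None if unset."""
    token = env.get(var_name, "").strip()
    return token or None


if get_hf_token() is None:
    # The public models used in this notebook can usually be downloaded without a token.
    print("No Hugging Face token found; continuing without authentication.")
```

Exporting the variable in the shell before starting Jupyter keeps the key out of the notebook file.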


## Listing the Profile PDF Files

In the code cell below, we list the LinkedIn profile PDFs stored locally in the `docs/` directory using the `ls` command, which displays the contents of a directory. We also store the folder name in `text_folder` for the later cells.


In [2]:
!ls -lh docs/
text_folder = 'docs'


total 128K
-rw-r--r-- 1 jupyter jupyter 74K Jan 27 16:54 'Profile Linkedin Laurent.pdf'
-rw-r--r-- 1 jupyter jupyter 50K Jan  8 09:22 'Profile Linkedin Tony.pdf'


## Demo: Creating and Querying Vector Index

### Step 1: Create the Query
We start by creating a query that includes the questions "Who is Tony?" and "Who is Laurent?"


In [3]:
query = """Who is Tony?
Who is Laurent?"""

query


'Who is Tony?\nWho is Laurent?'

### Step 2: Create Vector Store

Next, we create a vector store from the documents in the specified text folder.

The following code cell imports the modules used in this step, four from the `langchain` library plus `torch`:

- `UnstructuredPDFLoader`: Loads PDF files and extracts their text as `Document` objects.
- `CharacterTextSplitter`: Splits text on a separator character and merges the pieces into chunks of bounded size, with an optional overlap between chunks.
- `HuggingFaceEmbeddings`: Wraps a Hugging Face sentence-transformers model that turns text into embedding vectors.
- `FAISS`: Stores the document vectors in a FAISS index for efficient similarity search.
- `torch`: PyTorch, used here to detect whether a GPU is available and to set the model's data type.

In [4]:
from langchain.document_loaders import UnstructuredPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
import torch


#### Get a list of all files in the specified directory


In [5]:
all_files = [f for f in os.listdir(text_folder) if os.path.isfile(os.path.join(text_folder, f))]
all_files


['Profile Linkedin Laurent.pdf', 'Profile Linkedin Tony.pdf']
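The comprehension above returns every regular file in `docs/`, whatever its extension. If the folder might also hold non-PDF files, filtering on the suffix avoids handing them to the PDF loader. A minimal sketch using `pathlib`; the helper name `list_pdf_files` is an illustration, not part of the original notebook:

```python
from pathlib import Path


def list_pdf_files(folder):
    """Return the sorted names of the PDF files directly inside `folder`."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        # Keep regular files only, and match the extension case-insensitively.
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

With this helper, `all_files = list_pdf_files(text_folder)` could replace the comprehension without changing the following cells.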

#### Load documents from PDF files

In [6]:
documents = []
for file in all_files:
    loader = UnstructuredPDFLoader(os.path.join(text_folder, file))
    documents.extend(loader.load())
    
documents


[Document(page_content="Contact\n\n+33698600201 (Mobile) laurent.grangeau@gmail.com\n\nwww.linkedin.com/in/ laurentgrangeau (LinkedIn) www.laurentgrangeau.fr (Personal)\n\nTop Skills\n\nPublic Cloud\n\nIndependence\n\nDevOps\n\nLanguages\n\nFrancais (Native or Bilingual)\n\nAnglais (Native or Bilingual)\n\nEspagnol (Elementary)\n\nCertifications\n\nMCP 70-536 - TS: Microsoft .NET Framework - Application Development Foundation\n\nCertification Architecte SI\n\nMCTS 70-562 - TS: Microsoft .NET Framework 3.5, ASP.NET Application Development\n\nM101P, MongoDB for Developers\n\nMCTS 70-516 - TS: Accessing Data with Microsoft .NET Framework 4\n\nHonors-Awards\n\nFinaliste Hackathon DSI\n\nMicrosoft MVP Azure - 2018\n\nLaurent Grangeau\n\nSolutions Architect at Google Paris et périphérie\n\nSummary\n\nJ'accompagne nos clients dans leur transformation numérique. Je\n\nles aide à gagner en productivité, réduire leur time-to-market, ainsi\n\nque leur coût d'infrastructure en développant leur str

#### Split documents into text chunks

In [7]:
text_splitter = CharacterTextSplitter(separator='\n', chunk_size=1000, chunk_overlap=50)
text_chunks = text_splitter.split_documents(documents)
text_chunks


[Document(page_content="Contact\n+33698600201 (Mobile) laurent.grangeau@gmail.com\nwww.linkedin.com/in/ laurentgrangeau (LinkedIn) www.laurentgrangeau.fr (Personal)\nTop Skills\nPublic Cloud\nIndependence\nDevOps\nLanguages\nFrancais (Native or Bilingual)\nAnglais (Native or Bilingual)\nEspagnol (Elementary)\nCertifications\nMCP 70-536 - TS: Microsoft .NET Framework - Application Development Foundation\nCertification Architecte SI\nMCTS 70-562 - TS: Microsoft .NET Framework 3.5, ASP.NET Application Development\nM101P, MongoDB for Developers\nMCTS 70-516 - TS: Accessing Data with Microsoft .NET Framework 4\nHonors-Awards\nFinaliste Hackathon DSI\nMicrosoft MVP Azure - 2018\nLaurent Grangeau\nSolutions Architect at Google Paris et périphérie\nSummary\nJ'accompagne nos clients dans leur transformation numérique. Je\nles aide à gagner en productivité, réduire leur time-to-market, ainsi\nque leur coût d'infrastructure en développant leur stratégie cloud\nExperience", metadata={'source': 'do
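To make the roles of `separator`, `chunk_size`, and `chunk_overlap` concrete, here is a simplified pure-Python sketch of greedy character-based chunking. It is an illustration of the idea only, not LangChain's actual implementation, and the function name `split_with_overlap` is hypothetical:

```python
def split_with_overlap(text, separator="\n", chunk_size=1000, chunk_overlap=50):
    """Greedily merge separator-delimited pieces into chunks of at most
    `chunk_size` characters, repeating up to `chunk_overlap` trailing
    characters of each chunk at the start of the next one."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(separator.join(current))
            # Carry trailing pieces forward while they fit in the overlap budget.
            carried = []
            for prev in reversed(current):
                if len(separator.join([prev] + carried)) > chunk_overlap:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "\n".join(f"line {i:03d}" for i in range(8))
for chunk in split_with_overlap(sample, chunk_size=30, chunk_overlap=10):
    print(repr(chunk))
```

In this sketch a single piece longer than `chunk_size` is kept whole rather than cut, so a chunk can exceed the limit when a line of the PDF text is very long.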

#### Set the Hugging Face model name and model arguments

In [8]:
model_name = 'sentence-transformers/all-MiniLM-L6-v2'
model_name


'sentence-transformers/all-MiniLM-L6-v2'

##### Check GPU availability

In [9]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device


device(type='cuda')

In [10]:
model_kwargs = {'device': device.type}
model_kwargs


{'device': 'cuda'}

#### Initialize Hugging Face embeddings

In [11]:
embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
embeddings



HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
), model_name='sentence-transformers/all-MiniLM-L6-v2', cache_folder=None, model_kwargs={'device': 'cuda'}, encode_kwargs={}, multi_process=False)

#### Create a vector store using FAISS from the text chunks and embeddings


In [12]:
vectorstore=FAISS.from_documents(text_chunks, embeddings)
vectorstore


<langchain_community.vectorstores.faiss.FAISS at 0x7f3a82eff760>

#### Query the vector store

In [14]:
vectorstore.similarity_search(query)

[Document(page_content="Contact\n+33698600201 (Mobile) laurent.grangeau@gmail.com\nwww.linkedin.com/in/ laurentgrangeau (LinkedIn) www.laurentgrangeau.fr (Personal)\nTop Skills\nPublic Cloud\nIndependence\nDevOps\nLanguages\nFrancais (Native or Bilingual)\nAnglais (Native or Bilingual)\nEspagnol (Elementary)\nCertifications\nMCP 70-536 - TS: Microsoft .NET Framework - Application Development Foundation\nCertification Architecte SI\nMCTS 70-562 - TS: Microsoft .NET Framework 3.5, ASP.NET Application Development\nM101P, MongoDB for Developers\nMCTS 70-516 - TS: Accessing Data with Microsoft .NET Framework 4\nHonors-Awards\nFinaliste Hackathon DSI\nMicrosoft MVP Azure - 2018\nLaurent Grangeau\nSolutions Architect at Google Paris et périphérie\nSummary\nJ'accompagne nos clients dans leur transformation numérique. Je\nles aide à gagner en productivité, réduire leur time-to-market, ainsi\nque leur coût d'infrastructure en développant leur stratégie cloud\nExperience", metadata={'source': 'do
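Conceptually, `similarity_search` embeds the query and returns the chunks whose vectors lie closest to it. The sketch below shows that idea as a brute-force search over toy 2-D vectors, assuming Euclidean (L2) distance and four results per query; the function name `top_k_by_l2` and the toy vectors are illustrations, not part of the notebook:

```python
import math


def top_k_by_l2(query_vec, doc_vecs, k=4):
    """Return (index, distance) pairs for the k vectors closest to query_vec."""
    scored = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort()  # smallest distance first
    return [(idx, dist) for dist, idx in scored[:k]]


# Toy stand-ins for chunk embeddings (real ones have 384 dimensions here).
doc_vecs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (0.5, 0.0)]
for idx, dist in top_k_by_l2((0.0, 0.0), doc_vecs):
    print(idx, round(dist, 3))
```

To inspect the distances on the real index, `vectorstore.similarity_search_with_score(query)` returns each matching `Document` together with its score.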

### Step 3: Create the LLM model 
#### Conversational model

Next, we load the LLM and place the vector store created above at the heart of the question-answering chain.

The following code cell imports classes from the `transformers` library and from the `langchain` library:

- `AutoTokenizer` and `AutoModelForCausalLM`: Imported from `transformers` for tokenization and causal language modeling.
- `pipeline`: A `transformers` function that gives simplified access to a variety of NLP tasks.
- `HuggingFacePipeline`: A `langchain` wrapper that exposes a `transformers` pipeline as a LangChain LLM.
- `RetrievalQA`: Imported from `langchain.chains`; it retrieves the relevant chunks and passes them to the LLM to answer a question.

In [15]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import pipeline
from langchain import HuggingFacePipeline
from langchain.chains import RetrievalQA


##### Setting Language Model Name

In the code cell below, the variable `llm_model_name` is assigned the value `'TinyLlama/TinyLlama-1.1B-Chat-v1.0'`. This is the Hugging Face Hub identifier of the TinyLlama 1.1B chat model, which the next cells download and load.


In [16]:
llm_model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'


##### Initializing Tokenizer from Pretrained Model

The following code initializes a tokenizer using the `AutoTokenizer.from_pretrained` method, specifying the pretrained language model named by the variable `llm_model_name`. The resulting `tokenizer` object is then displayed.


In [17]:
tokenizer = AutoTokenizer.from_pretrained(llm_model_name)
tokenizer


LlamaTokenizerFast(name_or_path='TinyLlama/TinyLlama-1.1B-Chat-v1.0', vocab_size=32000, model_max_length=2048, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '</s>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

##### Initializing Causal Language Model from Pretrained Model

In the following code, a causal language model is initialized using the `AutoModelForCausalLM.from_pretrained` method. The pretrained language model specified by the variable `llm_model_name` is loaded onto the device automatically (`device_map='auto'`), and the model's tensor data type is set to `torch.float16`.


In [18]:
model = AutoModelForCausalLM.from_pretrained(llm_model_name,
                                             device_map='auto',
                                             torch_dtype=torch.float16
                                             )
model


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 2048)
    (layers): ModuleList(
      (0-21): 22 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=2048, out_features=5632, bias=False)
          (up_proj): Linear(in_features=2048, out_features=5632, bias=False)
          (down_proj): Linear(in_features=5632, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(i

##### Creating Text Generation Pipeline

The code cell below creates a text generation pipeline using the `pipeline` function from the `transformers` library. It receives the model and tokenizer loaded above, the tensor data type (`torch.float16`, matching the type used when the model was loaded), the device allocation (`device_map="auto"`), the maximum number of generated tokens (`max_new_tokens=1024`), and the sampling parameters (`do_sample=True`, `top_k=10`), so each generated token is sampled from the 10 most likely candidates.


In [19]:
pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                torch_dtype=torch.float16,  # same dtype as the loaded model
                device_map="auto",
                max_new_tokens=1024,        # upper bound on the answer length
                do_sample=True,             # sample instead of greedy decoding
                top_k=10,                   # sample among the 10 most likely tokens
                num_return_sequences=1,
                eos_token_id=tokenizer.eos_token_id
                )
pipe


<transformers.pipelines.text_generation.TextGenerationPipeline at 0x7f3a69c28c40>
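The `do_sample=True` and `top_k=10` settings mean the next token is drawn at random from the ten highest-scoring candidates. A simplified pure-Python sketch of that sampling step, for illustration only (it is not the `transformers` implementation, and `sample_top_k` is a hypothetical name):

```python
import math
import random


def sample_top_k(logits, k=10, rng=random):
    """Sample one token index from the k highest-scoring logits."""
    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits (shifted by the max for numerical stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [0.1, 5.0, 4.0, -2.0, 3.0]
print(sample_top_k(toy_logits, k=2))  # always index 1 or 2, never the others
```

Because the draw is random, the same question can produce different answers from one run to the next.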

##### Creating Hugging Face Pipeline with LangChain


In the code cell below, the text generation pipeline (`pipe`) is wrapped in LangChain's `HuggingFacePipeline` so that it can be used as the LLM of a chain. `model_kwargs` sets the temperature to 0; note, however, that `pipe` was built with `do_sample=True` and `top_k=10`, so answers may still vary from run to run.


In [20]:
llm = HuggingFacePipeline(pipeline=pipe, model_kwargs={'temperature':0})
llm


HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7f3a69c28c40>, model_kwargs={'temperature': 0})

##### Creating Retrieval Question-Answering Chain

The following code cell creates a Retrieval Question-Answering chain (`chain`) with `RetrievalQA.from_chain_type`. The chain is initialized with the following components:

- `llm`: The Hugging Face pipeline for text generation, configured earlier.
- `chain_type`: Set to `"stuff"`, which inserts ("stuffs") all retrieved chunks into a single prompt.
- `return_source_documents`: Set to `True` to include the source documents in the returned output.
- `retriever`: The vector store converted to a retriever using `vectorstore.as_retriever()`.


In [21]:
chain = RetrievalQA.from_chain_type(
    llm=llm,  # 'TinyLlama/TinyLlama-1.1B-Chat-v1.0' [query processing]: natural language processing
    chain_type="stuff",
    return_source_documents=True,
    retriever=vectorstore.as_retriever(),  # 'sentence-transformers/all-MiniLM-L6-v2' [data]: matching against the vector store
)
chain


RetrievalQA(combine_documents_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template="Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7f3a69c28c40>, model_kwargs={'temperature': 0})), document_variable_name='context'), return_source_documents=True, retriever=VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7f3a82eff760>))
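The prompt template printed in the chain output above shows what the "stuff" strategy does: the retrieved chunks are joined into one `{context}` block, followed by the question. The sketch below rebuilds such a prompt by hand; the template text is copied from the output above, while the `"\n\n"` separator between chunks and the helper name `build_stuff_prompt` are assumptions made for illustration:

```python
# Default prompt template, copied from the chain output above.
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\nQuestion: {question}\nHelpful Answer:"
)


def build_stuff_prompt(chunks, question, separator="\n\n"):
    """Join all retrieved chunks into one context block and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


print(build_stuff_prompt(["first retrieved chunk", "second retrieved chunk"],
                         "Who is Tony?"))
```

Since every retrieved chunk lands in the same prompt, the chunks plus the question must fit within the model's context window (`model_max_length=2048` tokens for this tokenizer).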

##### Invoking Retrieval Question-Answering Chain

In the code snippet below, the Retrieval Question-Answering chain (`chain`) is invoked with a specific query. The result is stored in the variable `result`, and the generated answer is accessed using `result['result']`.


In [23]:
result = chain.invoke(query)
result


{'query': 'Who is Tony?\nWho is Laurent?',
 'result': " Laurent Grangeau is a Cloud Solution Architect at Sogeti Labs and a Cloud Architect at Finaxys. He is a member of CapGemini's Expert Connect program and is a member of the Cloud Services Expert Connect community. He has been working in the cloud for over 10 years and is passionate about new technologies. He holds certifications from Hashicorp and VMware and has a background in infrastructure and on-prem applications. Tony is also an advocate for the use of Serverless frameworks on AWS, and has spoken at various Meetup events on the topic.",
 'source_documents': [Document(page_content="Contact\n+33698600201 (Mobile) laurent.grangeau@gmail.com\nwww.linkedin.com/in/ laurentgrangeau (LinkedIn) www.laurentgrangeau.fr (Personal)\nTop Skills\nPublic Cloud\nIndependence\nDevOps\nLanguages\nFrancais (Native or Bilingual)\nAnglais (Native or Bilingual)\nEspagnol (Elementary)\nCertifications\nMCP 70-536 - TS: Microsoft .NET Framework - Appli

The result schema obtained from the Retrieval Question-Answering Chain consists of three main components: `query`, `result`, and `source_documents`. Here's a breakdown of each:

1. `query`:
    - Type: String
    - Description: Represents the original query provided to the Retrieval Question-Answering Chain.
2. `result`:
    - Type: String
    - Description: Contains the generated answer to the query. In this run, it describes professional background (roles, certifications, and experience) and mentions both Laurent and Tony.
3. `source_documents`:
    - Type: List of Documents
    - Description: Contains a list of documents (`Document` objects) that contributed to the generation of the answer. Each `Document` includes:
        1. `page_content`: The content of the document.
        2. `metadata`: Additional metadata associated with the document, in this case, the 'source' field specifying the document source file.

In [24]:
print(result['result'])


 Laurent Grangeau is a Cloud Solution Architect at Sogeti Labs and a Cloud Architect at Finaxys. He is a member of CapGemini's Expert Connect program and is a member of the Cloud Services Expert Connect community. He has been working in the cloud for over 10 years and is passionate about new technologies. He holds certifications from Hashicorp and VMware and has a background in infrastructure and on-prem applications. Tony is also an advocate for the use of Serverless frameworks on AWS, and has spoken at various Meetup events on the topic.


##### Accessing Source Documents in the Result

The following code snippet iterates over the source documents present in the result obtained from the Retrieval Question-Answering chain. The metadata information, specifically the `source` field, is extracted and printed for each source document.


In [25]:
for source in result['source_documents']:
    print(source.metadata['source'])
    

docs/Profile Linkedin Laurent.pdf
docs/Profile Linkedin Tony.pdf
docs/Profile Linkedin Laurent.pdf
docs/Profile Linkedin Laurent.pdf
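The list above repeats a file whenever several retrieved chunks come from it. When only the distinct files matter, the paths can be de-duplicated while keeping their first-seen order. A minimal sketch; `unique_sources` is a hypothetical helper name:

```python
def unique_sources(source_paths):
    """Remove duplicate paths while preserving the order of first appearance."""
    return list(dict.fromkeys(source_paths))


paths = [
    "docs/Profile Linkedin Laurent.pdf",
    "docs/Profile Linkedin Tony.pdf",
    "docs/Profile Linkedin Laurent.pdf",
    "docs/Profile Linkedin Laurent.pdf",
]
print(unique_sources(paths))
```

In the notebook, the input list could be built with `[doc.metadata['source'] for doc in result['source_documents']]`.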


## Chatbot mode

In [None]:
# Print the exit instructions once, before the loop starts
print("[exit|quit] to leave")
print()

while True:
    # Take user input
    user_input = input(">> ")

    # Check if the user wants to quit
    if user_input.strip().lower() in ["quit", "exit"]:
        print("Exiting the conversation.")
        break

    # Run the retrieval QA chain on the user's question
    result = chain.invoke(user_input)

    print(f"\t<<{result['result']}")
    print()

    # List the files the retrieved chunks came from
    print("<< sources:")
    for source in result['source_documents']:
        print(f"\t<< {source.metadata['source']}")
    print()


[exit|quit] to leave

## Flask

#### Save the chain

In [51]:
import pickle

# Save the RetrievalQA instance to a file
with open('saved_chain.pkl', 'wb') as file:
    pickle.dump(chain, file)


#### Launch the API

In [52]:
# !python app.py


#### POST request

In [53]:
!curl -X POST -H "Content-Type: application/json" -d '{"query": "Tell me more about Laurent"}' http://127.0.0.1:5001/api/search


{
  "result": " Laurent Grangeau is a Solutions Architect at Sogeti Paris and Peripheries. He has extensive experience in Cloud Solution Architect and has worked on various projects such as Migration from SI of Kepler-Cheuvreux to AWS, Development of custom applications for the Financial sector, Migration of legacy infrastructure and Cloud migration. He is certified in Hashicorp Vault Associate for Hashicorp Terraform and has participated in the design and delivery of IT projects for his clients. Laurent is passionate about cloud technologies, especially AWS and GCP. He is a regular speaker at industry events and has been featured in the media."
}


In [54]:
!curl -X POST -H "Content-Type: application/json" -d "{\"query\": \"What is the last name of Tony?\"}" http://10.128.15.193:5001/api/search

{
  "result": " Tony Jarriault"
}


In [36]:
!curl -X POST -H "Content-Type: application/json" -d "{\"query\": \"What is the Tony phone number at Capgemini ?\"}" http://10.128.15.193:5001/api/search

{
  "result": " The Tony phone number at Capgemini is 0134490643 (Work)"
}
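The same POST request can be sent from Python with the standard library alone. The sketch below assumes the endpoint shown in the `curl` calls above (`/api/search`, a JSON body with a `query` field, a JSON response with a `result` field); the function names and the default timeout are illustrative choices:

```python
import json
import urllib.request


def build_search_request(query, base_url="http://127.0.0.1:5001"):
    """Build the POST request for the /api/search endpoint."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, base_url="http://127.0.0.1:5001", timeout=120):
    """Send the query to the running API and return the 'result' field."""
    request = build_search_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["result"]
```

With the API running, `search("Tell me more about Laurent")` would return the answer string. Building the JSON body with `json.dumps` also avoids the manual quote escaping needed in the `curl` commands.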



## Streamlit


In [55]:
pip install streamlit openai langchain

Note: you may need to restart the kernel to use updated packages.


#### Start Streamlit (frontend)

In [None]:
# A `!cd` would only affect its own subshell, so the script path is given relative to the notebook directory
!streamlit run streamlit/streamlit_app.py --server.port 8502


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

  You can now view your Streamlit app in your browser.

  Network URL: http://10.128.15.193:8502
  External URL: http://35.225.189.176:8502

`from langchain_community.llms import OpenAI`.

To install langchain-community run `pip install -U langchain-community`.
 Tony has 10 years of immersion in the cloud and has experience in infrastructure and onprem applications. He has also been involved in the cloud since 2012, which includes AWS and on-prem applications. Tony is passionate about cloud technology and shares his passion by sharing his knowledge on public events. Tony is also an experienced Sogeti Cloud Solutions Architect. He has experience with AWS, as well as DevSecops and micro-services in on-prem applications.
