In [1]:
import os
from langchain_chroma import Chroma
from langchain_unstructured import UnstructuredLoader
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores.utils import filter_complex_metadata
from langchain.prompts import ChatPromptTemplate
from langchain_ollama import OllamaLLM
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [3]:
FOLDER_PATH = '/Users/keithatienza/Desktop/Academics/Emergent Consulting [HPE]/HPE LLM v2/HPE Files/'
CHROMA_PATH = '/Users/keithatienza/Desktop/Academics/Emergent Consulting [HPE]/HPE LLM v2/DB'

In [5]:
embed_model = "mxbai-embed-large"
llm_model = "llama3"

In [7]:
prompt_template = """
Answer the question based only on the following context:

{context}

---

Answer the question based on the context above: {question}
"""

In [9]:
!ollama pull mxbai-embed-large

pulling manifest
pulling 819c2adf5ce6... 100% ▕████████████████▏ 669 MB                         
pulling c71d239df917... 100% ▕████████████████▏  11 KB                         
pulling b837481ff855... 100% ▕████████████████▏   16 B                         
pulling 38badd946f91... 100% ▕████████████████▏  408 B                         
verifying sha256 digest 
writing manifest 
success


In [11]:
!ollama pull llama3

pulling manifest
pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB                         
pulling 4fa551d4f938... 100% ▕████████████████▏  12 KB                         
pulling 8ab4849b038c... 100% ▕████████████████▏  254 B                         
pulling 577073ffcc6c... 100% ▕████████████████▏  110 B                         
pulling 3f8eb4da87fa... 100% ▕████████████████▏  485 B                         
verifying sha256 digest 
writing manifest 
success


In [13]:
def load_documents():
    docs = []
    for file in os.listdir(FOLDER_PATH):
        if file.endswith('.pdf'):
            # FOLDER_PATH already ends with a slash, so join instead of concatenating.
            pdf_path = os.path.join(FOLDER_PATH, file)
            loader = UnstructuredLoader(pdf_path)
            docs.extend(loader.load())
    docs = filter_complex_metadata(docs)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
    chunks = text_splitter.split_documents(docs)
    # Drop chunks tagged as headers. This is tailored to these PDFs, whose headers
    # repeat the product name, and may not generalize to other documents.
    chunks = [c for c in chunks if c.metadata.get('category') != 'Header']
    return chunks
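
To make the `chunk_size=800, chunk_overlap=200` setting concrete, here is a simplified sliding-window chunker. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraphs and sentences); `sliding_chunks` is a hypothetical helper that only shows how neighbouring chunks share their overlapping characters.

```python
def sliding_chunks(text, chunk_size=800, chunk_overlap=200):
    """Cut text into fixed-width windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

# Synthetic 2000-character sample text, for illustration only.
sample = "".join(chr(97 + i % 26) for i in range(2000))
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])
print(pieces[0][-200:] == pieces[1][:200])
```

The last 200 characters of each window reappear at the start of the next one, so a sentence that straddles a boundary is still seen whole in at least one chunk.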

In [15]:
def populate_database(docs):
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=OllamaEmbeddings(model=embed_model))
    db.add_documents(docs)
    return db
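
As written, `populate_database` adds every chunk each time it runs, so re-running the cell against the same persist directory would likely store the same chunks again. One possible guard, sketched below, is to derive a deterministic ID from each chunk's filename and text and to drop exact repeats before adding. `Chunk`, `chunk_id`, and `dedupe_with_ids` are hypothetical names introduced for this sketch; `Chunk` only mimics the two attributes of a LangChain document that the notebook uses.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """Hypothetical stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def chunk_id(chunk):
    """Hash the filename and the text so the same chunk always maps to the same ID."""
    key = f"{chunk.metadata.get('filename', '')}\n{chunk.page_content}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def dedupe_with_ids(chunks):
    """Return (unique_chunks, ids), keeping the first copy of each repeated chunk."""
    seen = {}
    for chunk in chunks:
        seen.setdefault(chunk_id(chunk), chunk)
    return list(seen.values()), list(seen.keys())

# Made-up chunks, for illustration only.
chunks = [
    Chunk("Supports up to 8 DIMMs.", {"filename": "server-a.pdf"}),
    Chunk("Supports up to 8 DIMMs.", {"filename": "server-a.pdf"}),  # exact repeat
    Chunk("Supports up to 8 DIMMs.", {"filename": "server-b.pdf"}),  # same text, other file
]
unique_chunks, ids = dedupe_with_ids(chunks)
print(len(unique_chunks), len(set(ids)))
```

With real documents, the two lists could then be passed along as `db.add_documents(unique_chunks, ids=ids)`, assuming the vector store accepts an `ids` argument and overwrites entries whose ID already exists.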

In [17]:
def query_rag(query_text, db):
    # Retrieve the 20 chunks closest to the query; the scores are not used below.
    results = db.similarity_search_with_score(query_text, k=20)
    # Label each chunk with its product name. The [:-7] slice drops the last seven
    # characters of the filename, which trims the file suffix from the label.
    context_text = "\n\n---\n\n".join(
        "Product:" + doc.metadata.get("filename")[:-7] + "\n\n" + doc.page_content
        for doc, _score in results
    )
    prompt = ChatPromptTemplate.from_template(prompt_template)
    prompt = prompt.format(context=context_text, question=query_text)

    # Use the model configured in llm_model rather than a hard-coded name.
    model = OllamaLLM(model=llm_model)
    response_text = model.invoke(prompt)
    return response_text
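
The fixed `[:-7]` slice in `query_rag` only yields a clean product label when every filename ends in a seven-character suffix, and it raises an error when a chunk has no `filename` metadata. A more defensive variant might look like the sketch below. `product_label` is a hypothetical helper, and the assumption that the files end in `enw.pdf` is inferred from the labels in the outputs, not confirmed by the notebook.

```python
import os

def product_label(filename, fallback="Unknown product"):
    """Turn a datasheet filename into a product label without fixed-width slicing."""
    if not filename:
        return fallback  # chunk carried no filename metadata
    stem, ext = os.path.splitext(filename)
    if ext.lower() != ".pdf":
        stem = filename  # leave names with other extensions untouched
    if stem.endswith("enw"):
        stem = stem[:-3]  # assumed language suffix on these files
    return stem

# Example filename assumed from the labels seen in the notebook's outputs.
print(product_label("HPE ProLiant ML350 Gen11-a50004308enw.pdf"))
print(product_label(None))
```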

In [19]:
docs = load_documents()

INFO: pikepdf C++ to Python logger bridge initialized


In [21]:
db = populate_database(docs)

INFO: Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
INFO: HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"


In [21]:
query = "Is the RTX 4000 GPU supported in any of HPE’s server models?"
print(query_rag(query,db))

INFO: HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"


According to the provided context, the NVIDIA RTX 4000 Ada Graphics Accelerator for HPE is supported in the following HPE ProLiant servers:

* Product:HPE ProLiant ML350 Gen11-a50004308
* Product:HPE ProLiant DL380 Gen11-a50004307 (with some limitations)

So, yes, the RTX 4000 GPU is supported in at least two HPE server models.


In [26]:
query = "How many GPUs can fit in a Proliant DL380a Gen11 server?"
print(query_rag(query,db))

INFO: HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"


According to the first product description, the HPE ProLiant DL380a Gen11 server is "accelerator optimized" and supports 4 double-wide or 8 single-wide accelerators in a standard 2U 2P form factor.

Additionally, from the Accelerator Intel Data Center GPU Max 1100 48GB Accelerator for HPE notes, it's mentioned that this accelerator is supported in the front GPU cages of DL380a Gen11 4 Double Wide CTO Server (P54903-B21) only. This suggests that the server has at least 2 GPU cages.

Given these details, we can conclude that a Proliant DL380a Gen11 server can fit up to 8 single-wide GPUs or 4 double-wide GPUs.


In [30]:
query = "The customer has limited space in its datacenter and also no rack currently. Which server model would best fit its needs?"
print(query_rag(query,db))

INFO: HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"


Based on the context, I would recommend the HPE ProLiant ML350 Gen11-a50004308enw as the best option for the customer.

According to the product information, this server has a Form Factor of 4U Tower, which means it can be used as a standalone tower server. Additionally, it has an Optional Tower-to-Rack conversion kit (P47394-B21) that can convert the unit to a 5U Rack-mount server if needed.

Since the customer has limited space in their datacenter and no rack currently, using the server as a tower would be the best option. This way, they can still have a powerful server without taking up too much space. If they decide to upgrade or expand their infrastructure in the future, the tower-to-rack conversion kit provides an easy upgrade path.

The other servers mentioned (HPE ProLiant DL325 Gen11-a50004297enw and HPE ProLiant DL360 Gen11-a50004306enw) are all rack-mount servers, which might not be suitable for a customer with limited space.


In [23]:
query = "Which CPU available for the DL380 Gen11 has the highest amount of cores, and how many does it have?"
print(query_rag(query,db))

INFO: HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"


Based on the provided context, I can answer your question.

According to the product information, the HPE ProLiant DL380 Gen11 supports 5th Generation Intel Xeon Processors. Specifically, the "HPE ProLiant DL380 Gen11 5418Y 2.0GHz 24-core 1P 64GB-R MR408i-o NC 8SFF 800W PS Server" has a CPU with 24 cores.

Therefore, the answer is: The CPU available for the DL380 Gen11 that has the highest amount of cores is the 5418Y with 24 cores.


In [25]:
query = "Which CPU available for the DL380 Gen11 has the highest amount of cores, and what is the maximum core that you can install?"
print(query_rag(query,db))

INFO: HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"


Based on the context provided, it appears that the 5th Generation Intel Xeon Processors are supported by the HPE ProLiant DL380 Gen11 server. According to the specifications for these processors, they support up to 28 cores.

However, there is also a mention of "4th and 5th Generation Intel Xeon Scalable Processors" in the context, which suggests that the processor family includes multiple generations with varying core counts. The exact model number of the CPU is not provided in the context, but it appears that the highest amount of cores available for the DL380 Gen11 server is up to 28 cores.

It's important to note that the actual CPU model and specifications may vary depending on the specific configuration or upgrade options chosen.


In [27]:
query = "How many memory channels does the DL360 Gen 11 have?"
print(query_rag(query,db))

INFO: HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO: HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"


Based on the provided context, there is no specific information about the number of memory channels for the HPE ProLiant DL360 Gen11. However, we can infer that it supports up to 5600 MT/s HPE DDR5 Smart Memory up to 4.0 TB per socket, which suggests a high-density memory architecture.
