In [1]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings.ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [2]:
# Load the PDF file
loader = PyPDFLoader("data/polaris.pdf")
# Split it into chunks of at most 2000 characters, with 100 characters of overlap
pages = loader.load_and_split(RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100))
# Embed the chunks and store them in a local Chroma vector database
db = Chroma.from_documents(pages, OllamaEmbeddings(model="mxbai-embed-large", show_progress=True))

OllamaEmbeddings: 100%|██████████| 136/136 [03:58<00:00,  1.75s/it]
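
To make the `chunk_size` and `chunk_overlap` arguments above more concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification, not the splitter's implementation: `RecursiveCharacterTextSplitter` first tries to split on separators such as paragraph and line breaks and treats `chunk_size` as an upper bound. The function name `chunk_text` and the sample string are made up for illustration.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters.

    Simplified illustration only: the real splitter prefers natural
    boundaries (paragraphs, lines, spaces) over fixed character offsets.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk as long as it is shorter than the overlap.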


In [3]:
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 6})  # Turn the vector store into a retriever that returns the 6 most similar chunks
local_model = "llama3"  # Alternatives: Phi 3, Mistral, Gemma
llm = ChatOllama(model=local_model, temperature=0)  # Higher temperature: more random, creative output; temperature 0: near-deterministic, repetitive output

# Prompt template in the Llama 3 chat format (system turn followed by user turn)
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks. 
    Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. 
    <|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question} 
    Context: {context} 
    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context"],
)

# Join the text of the retrieved chunks into one context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
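
The chain above can be read as three plain steps: fan the question out into a `context` branch and a `question` branch, fill the prompt template, then call the model. The sketch below mimics that data flow without LangChain or Ollama. `fake_retriever`, `fake_llm`, `build_prompt`, `run_chain`, and the shortened `TEMPLATE` are hypothetical stand-ins for illustration, not part of the notebook or the library.

```python
TEMPLATE = "Question: {question}\nContext: {context}\nAnswer:"


def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for `retriever`: always returns two canned chunks.
    return ["Chunk one about the system.", "Chunk two about job submission."]


def format_chunks(chunks: list[str]) -> str:
    # Same idea as format_docs: join chunk texts with blank lines.
    return "\n\n".join(chunks)


def fake_llm(prompt_text: str) -> str:
    # Hypothetical stand-in for the chat model: only reports the prompt size.
    return f"(model received {len(prompt_text)} characters)"


def build_prompt(question: str) -> str:
    # Step 1: the dict sends the same input down two branches.
    inputs = {
        "context": format_chunks(fake_retriever(question)),
        "question": question,  # the pass-through branch leaves the input unchanged
    }
    # Step 2: fill the template with both values.
    return TEMPLATE.format(**inputs)


def run_chain(question: str) -> str:
    # Step 3: hand the filled prompt to the model and return its text.
    return fake_llm(build_prompt(question))


print(build_prompt("How many nodes?"))
print(run_chain("How many nodes?"))
```

Seen this way, the model's answer can only be as good as the text the retriever places in `context`.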


In [4]:
question = "How many nodes does Polaris have?"
print(question)
print(rag_chain.invoke(question))

How many nodes does Polaris have?


OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  9.19it/s]


Based on the provided context, it appears that Polaris has multiple nodes with varying numbers of nodes. The context shows a series of lines with node counts ranging from 273 to 297. Therefore, I can conclude that Polaris has at least 273 and up to 297 nodes. However, without more specific information, I don't know the exact total number of nodes in Polaris.
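
The answer above is built only from the chunks the retriever returned, so it helps to look at those chunks directly. Below is a small, hedged helper for that. It assumes the retriever exposes `invoke(question)` and returns documents with `page_content` and `metadata` attributes, and that the loader stored a `"page"` key in the metadata. `preview_retrieved`, `StubDoc`, and `StubRetriever` are hypothetical names; the stubs exist only so the block runs without Ollama.

```python
from dataclasses import dataclass, field


def preview_retrieved(retriever, question: str, width: int = 80) -> list[str]:
    """Return one short preview line per chunk retrieved for the question."""
    docs = retriever.invoke(question)
    lines = []
    for rank, doc in enumerate(docs, start=1):
        page = doc.metadata.get("page", "?")  # assumed metadata key
        snippet = " ".join(doc.page_content.split())[:width]  # collapse whitespace
        lines.append(f"{rank}. page {page}: {snippet}")
    return lines


# Hypothetical stubs so the sketch runs on its own.
@dataclass
class StubDoc:
    page_content: str
    metadata: dict = field(default_factory=dict)


class StubRetriever:
    def __init__(self, docs):
        self.docs = docs

    def invoke(self, question):
        return self.docs


stub = StubRetriever([
    StubDoc("Nsight   Systems\nprofiling guide", {"page": 12}),
    StubDoc("No page metadata here"),
])
previews = preview_retrieved(stub, "How many nodes does Polaris have?")
for line in previews:
    print(line)
```

In the notebook, calling `preview_retrieved(retriever, question)` would show which chunks were placed in the prompt for a given question.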


In [5]:
question = "What is Gromacs?"
print(question)
print(rag_chain.invoke(question))

What is Gromacs?


OllamaEmbeddings: 100%|██████████| 1/1 [00:01<00:00,  1.28s/it]


Based on the provided context, I found that Gromacs is not mentioned explicitly. However, it's possible that GROMACS (not Gromacs) might be related to NVIDIA Nsight tools.

GROMACS is a popular molecular dynamics simulation package used in computational chemistry and biochemistry. It's possible that the context refers to using GROMACS with NVIDIA Nsight tools for some specific purpose or application.

If you could provide more information or clarify what you mean by "Gromacs", I'd be happy to try and help further!
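
When an answer says a term is not mentioned, a plain keyword scan over the chunks can help tell a retrieval miss apart from content that is truly absent from the source. The following is a minimal sketch, assuming each chunk has a `page_content` attribute. `find_chunks` and the sample data are hypothetical.

```python
from types import SimpleNamespace


def find_chunks(docs, term: str) -> list[int]:
    """Return the indices of chunks whose text contains term, ignoring case."""
    needle = term.lower()
    return [i for i, doc in enumerate(docs) if needle in doc.page_content.lower()]


# Hypothetical sample chunks standing in for `pages`.
sample_docs = [
    SimpleNamespace(page_content="NVIDIA Nsight tools overview"),
    SimpleNamespace(page_content="WHAT IS GROMACS? GROMACS is a versatile package"),
    SimpleNamespace(page_content="Lustre file striping basics"),
]
hits = find_chunks(sample_docs, "Gromacs")
print(hits)
```

In the notebook, `find_chunks(pages, "Gromacs")` would list candidate chunk indices to inspect by hand. If matching chunks exist but never reach the prompt, the issue lies in retrieval (embedding model, chunk size, or `k`) rather than in the source document.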


In [6]:
question = "What type of workflow managers I can use on Polaris?"
print(question)
print(rag_chain.invoke(question))

What type of workflow managers I can use on Polaris?


OllamaEmbeddings: 100%|██████████| 1/1 [00:01<00:00,  1.58s/it]


Based on the provided context, I don't know what type of workflow managers you can use on Polaris. The context appears to be related to NVIDIA Nsight tools and does not mention workflow managers specifically. If you could provide more information or clarify your question, I'd be happy to try and assist you further!


In [7]:
question = "What is the default programming environment on Polaris?"
print(question)
print(rag_chain.invoke(question))

What is the default programming environment on Polaris?


OllamaEmbeddings: 100%|██████████| 1/1 [00:01<00:00,  1.11s/it]


I don't know the answer to that question. The provided context does not mention a default programming environment on Polaris. It appears to be a collection of documentation and instructions for using the Polaris system, including information on logging in, compiling applications, submitting jobs, and using Lustre file striping. There is no specific mention of a default programming environment.


In [8]:
question = "How can I change the programming environment on Polaris?"
print(question)
print(rag_chain.invoke(question))

How can I change the programming environment on Polaris?


OllamaEmbeddings: 100%|██████████| 1/1 [00:01<00:00,  1.60s/it]


Based on the provided context, I don't know how to change the programming environment on Polaris. The context appears to be related to NVIDIA Nsight tools and oneAPI Math Kernel Library (oneMKL) interfaces, but it does not mention anything about changing the programming environment on Polaris. If you could provide more information or clarify what you mean by "programming environment", I may be able to help further.


In [9]:
question = "How can I run a job on Polaris? Write an example job submission script?"
print(question)
print(rag_chain.invoke(question))

How can I run a job on Polaris? Write an example job submission script?


OllamaEmbeddings: 100%|██████████| 1/1 [00:01<00:00,  1.44s/it]


To run a job on Polaris, you can use the `qsub` command with various options to specify the job submission script, queue, and other parameters.

Here is an example job submission script:
```bash
#!/bin/bash

# Load the Singularity module
module load singularity

# Set the environment variable for the container image name
IMAGE_NAME=pytorch:22.06-py3

# Run the command inside the container
singularity run --fakeroot=true $IMAGE_NAME python my_script.py
```
This script loads the Singularity module, sets the `IMAGE_NAME` variable to the desired container image, and then runs a Python script (`my_script.py`) inside the container using the `singularity run` command with the `--fakeroot=true` option.

You can modify this script to suit your specific needs and submit it to Polaris using the `qsub` command. For example:
```bash
qsub -I -A <project_name> -q debug -l select=1 -l walltime=60:00 -l singularity_fakeroot=true -l filesystems=home:eagle:grand my_job_script.sh
```
This command submits 

In [10]:
question = "How can I login to Polaris?"
print(question)
print(rag_chain.invoke(question))

How can I login to Polaris?


OllamaEmbeddings: 100%|██████████| 1/1 [00:01<00:00,  1.03s/it]


Based on the provided context, it appears that Polaris is not a login system or a specific application that requires a login process. The context seems to be related to NVIDIA Nsight tools and oneAPI Math Kernel Library (oneMKL) interfaces.

To access Polaris, I'm afraid I don't have any information that would allow me to provide instructions on how to log in. It's possible that Polaris is not a publicly accessible system or service, or it may be a proprietary tool specific to your organization or industry.

If you could provide more context or clarify what Polaris is and why you need to access it, I'd be happy to try and assist you further.


In [21]:
pages[38]

Document(page_content='6.7 Applications and Libraries6.7.1 ApplicationsGromacs on PolarisWHAT IS GROMACS?GROMACS is a versatile package to perform molecular dynamics, i.e.simulate the Newtonian equations of motion for systems withhundreds to millions of particles. It is primarily designed forbiochemical molecules like proteins, lipids, and nucleic acids thathave a lot of complicated bonded interactions, but since GROMACS isextremely fast at calculating the nonbonded interactions (that usuallydominate simulations) many groups are also using it for research onnon-biological systems, e.g. polymers.USING GROMACS AT ALCFALCF offers assistance with building binaries and compilinginstructions for GROMACS. For questions, contact us atsupport@alcf.anl.gov.BUILDING GROMACSDownload latest source code: http://manual.gromacs.org/documentation/2022.1/download.htmltar -xzf gromacs-2022.1.tar.gzmodule swap PrgEnv-nvhpc PrgEnv-gnumodule load cudatoolkit-standalone/11.2.2module load gcc/10.3.0module loa