In [203]:
!pip show langchain

Name: langchain
Version: 0.1.0
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: /home/yinqiu/anaconda3/envs/tutorial/lib/python3.9/site-packages
Requires: aiohttp, async-timeout, dataclasses-json, jsonpatch, langchain-community, langchain-core, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: trulens-eval


# LangChain multi-doc retriever with ChromaDB

***New Points***
- Multiple Files
- ChromaDB
- Source info
- gpt-3.5-turbo API

## Setting up LangChain


In [204]:
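import os
from getpass import getpass

# Do not hard-code the API key in the notebook; prompt for it if it is not already set.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")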

In [205]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader, PyPDFLoader
from langchain.document_loaders import DirectoryLoader
import time
from langchain.agents.agent_toolkits import create_retriever_tool
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent
from langchain.chat_models import ChatOpenAI

## Load and process multiple documents

In [208]:
# Load and process the text files for SDMA 
loader_SDMA = TextLoader('./ref_SDMA.txt')
documents_SDMA = loader_SDMA.load()
# Split the text into overlapping chunks
text_splitter_SDMA = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts_SDMA = text_splitter_SDMA.split_documents(documents_SDMA)

In [209]:
# Load and process the text files for RSMA
loader_RSMA = TextLoader('./ref_RSMA.txt')
documents_RSMA = loader_RSMA.load()
# Split the text into overlapping chunks
text_splitter_RSMA = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts_RSMA = text_splitter_RSMA.split_documents(documents_RSMA)
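
To make `chunk_size=1000` and `chunk_overlap=200` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks); `sliding_chunks` is a hypothetical helper that only illustrates how the two parameters interact.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Synthetic 2500-character text (made up for illustration).
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the tail of one chunk is repeated at the head of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.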

In [210]:
# Check how many chunks each document was split into
print(len(texts_SDMA), len(texts_RSMA))

31 45


## Create the DB

In [211]:
# Embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
persist_directory_SDMA = str(time.time())

# OpenAI embeddings for now; the plan is to swap in local embeddings later
embedding_SDMA = OpenAIEmbeddings()

vectordb_SDMA = Chroma.from_documents(documents=texts_SDMA,
                                 embedding=embedding_SDMA,
                                 persist_directory=persist_directory_SDMA)

# Persist the DB to disk
vectordb_SDMA.persist()

# Now we can load the persisted database from disk, and use it as normal.
vectordb_SDMA = Chroma(persist_directory=persist_directory_SDMA,
                  embedding_function=embedding_SDMA)

In [212]:
# Embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
persist_directory_RSMA = str(time.time())

# OpenAI embeddings for now; the plan is to swap in local embeddings later
embedding_RSMA = OpenAIEmbeddings()

vectordb_RSMA = Chroma.from_documents(documents=texts_RSMA,
                                 embedding=embedding_RSMA,
                                 persist_directory=persist_directory_RSMA)

# Persist the DB to disk
vectordb_RSMA.persist()

# Now we can load the persisted database from disk, and use it as normal.
vectordb_RSMA = Chroma(persist_directory=persist_directory_RSMA,
                  embedding_function=embedding_RSMA)

## Make a retriever

In [213]:
# Create the retriever
retriever_SDMA = vectordb_SDMA.as_retriever(search_kwargs={"k": 3})

# Print the retriever for checking
print(retriever_SDMA.search_type, retriever_SDMA.search_kwargs)

similarity {'k': 3}


In [214]:
# Create the retriever
retriever_RSMA = vectordb_RSMA.as_retriever(search_kwargs={"k": 3})

# Print the retriever for checking
print(retriever_RSMA.search_type, retriever_RSMA.search_kwargs)

similarity {'k': 3}
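
The `{"k": 3}` setting means each query returns the three chunks whose embeddings score best against the query embedding. The sketch below shows that idea with cosine similarity on toy vectors. Cosine similarity is an assumption chosen for illustration (the metric Chroma actually applies depends on how the collection is configured), and `cosine` and `top_k` are hypothetical helpers, not LangChain or Chroma APIs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D "embeddings" (made-up numbers, not real OpenAI embeddings).
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
hits = top_k([1.0, 0.2], doc_vecs, k=3)
print(hits)
```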


## The conversational agent


In [215]:
tool_SDMA = create_retriever_tool(
    retriever_SDMA,
    "search_by_similarity",
    "Searches and returns the most relevant content that can answer the question",
)
tools_SDMA = [tool_SDMA]
llm_SDMA = ChatOpenAI(temperature=0)
agent_executor_SDMA = create_conversational_retrieval_agent(llm_SDMA, tools_SDMA, verbose=True)

In [216]:
tool_RSMA = create_retriever_tool(
    retriever_RSMA,
    "search_by_similarity",
    "Searches and returns the most relevant content that can answer the question",
)
tools_RSMA = [tool_RSMA]
llm_RSMA = ChatOpenAI(temperature=0)
agent_executor_RSMA = create_conversational_retrieval_agent(llm_RSMA, tools_RSMA, verbose=True)

## The semantic router (test)

In [190]:
from langchain.utils.math import cosine_similarity
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain.chat_models import ChatOpenAI

In [191]:
RSMA_template = """You are a satellite expert who is good at modeling satellite communications. However, you can only answer the question about satellite models that use RSMA as the access protocol."""

SDMA_template = """You are a satellite expert who is good at modeling satellite communications. However, you can only answer the question about satellite models that use SDMA as the access protocol."""

embeddings = OpenAIEmbeddings()
prompt_templates = [RSMA_template, SDMA_template]
prompt_embeddings = embeddings.embed_documents(prompt_templates)


def prompt_router(input):
    query_embedding = embeddings.embed_query(input["query"])
    similarity = cosine_similarity([query_embedding], prompt_embeddings)[0]
    most_similar = prompt_templates[similarity.argmax()]  # route to the most similar base
    if most_similar == RSMA_template:
        print("ok1")
        agent_executor_final = agent_executor_RSMA  # select the RSMA template as the RAG dataset
    elif most_similar == SDMA_template:
        print("ok2")
        agent_executor_final = agent_executor_SDMA  # select the SDMA template as the RAG dataset
    return most_similar  # TODO: return the selected agent executor instead, so the matching database is queried


# This chain does not have any memory
chain = (
    {"query": RunnablePassthrough()}
    | RunnableLambda(prompt_router)
    | ChatOpenAI()
    | StrOutputParser()
)

In [192]:
print(chain.invoke("You are an expert in satellite communications. Please help me to formulate a SDMA satellite communication model. I will show you the scenario later."))

ok2
Sure, I'd be happy to help with any questions you have specifically about satellite models that use SDMA (Space Division Multiple Access) as the access protocol. Just let me know what you'd like to know!
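
In `prompt_router`, `agent_executor_final` is assigned but never returned, so the chain only forwards the chosen template to `ChatOpenAI`. A minimal sketch of one way to route to the retrieval agent itself follows. The toy 2-D vectors stand in for the embedded prompt templates, the lambdas stand in for `agent_executor_RSMA` and `agent_executor_SDMA`, and the names `ROUTES` and `route` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Each route pairs a reference embedding with the handler to call.
# The vectors are made-up stand-ins for embedded prompt templates.
ROUTES = {
    "RSMA": ([1.0, 0.1], lambda q: f"[RSMA agent] {q}"),
    "SDMA": ([0.1, 1.0], lambda q: f"[SDMA agent] {q}"),
}


def route(query_vec):
    """Return (name, handler) of the route whose embedding is closest to the query."""
    name = max(ROUTES, key=lambda n: cosine(query_vec, ROUTES[n][0]))
    return name, ROUTES[name][1]


name, handler = route([0.2, 0.9])  # pretend this is the embedded user query
print(name, "->", handler("formulate an SDMA model"))
```

Returning the handler rather than the template means the caller can invoke the selected agent directly, which is what the comments in `prompt_router` describe as the intent.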


## Start the satellite communication modeling

In [217]:
result = agent_executor_SDMA({"input": "You are an expert in satellite communications. Please help me to formulate a satellite communication model. I will show you the scenario later."})

> Entering new AgentExecutor chain...
Of course! I'd be happy to help you formulate a satellite communication model. Please provide me with the scenario or details of the satellite communication system you have in mind so that I can assist you better.

> Finished chain.


In [218]:
result = agent_executor_SDMA({"input": "My application covers a wide range and is heterogeneous. Show me the stakeholders and their relationship in the satellite network briefly. Search based on your local knowledge."})

> Entering new AgentExecutor chain...

Invoking: `search_by_similarity` with `{'query': 'stakeholders and their relationship in satellite network'}`

[Document(page_content='We consider the heterogeneous mega-constellation network composed of one $N_{\\rm M}$-antenna GEO satellite and $N$  $N_{\\rm T}$-antenna LEO satellites, which is shown in Fig.~\\ref{fig_PSNC}. The GEO satellite serves $M$ single-antenna GEO ground users (GGUs) and each LEO satellite serves $K$ single-antenna LEO ground users (LGUs). By accommodating multiple LEO satellites under the coverage of the GEO satellite, the wireless coverage of the GEO satellite and the information rate of users, especially for the users at the edge areas and dead points, may be greatly enhanced. In order to fully utilize the limited frequserncy band resources, the LEO satellites are allowed to utilize the same licensed spectrum of the GEO satellite without adverse impact on the GGUs. For clarity,

In [219]:
result = agent_executor_SDMA({"input": "Since heterogeneous satellite networks are considered, to ensure scalability, please use the SDMA protocol at the LEO satellite. Show the transmit signals generated by the GEO satellite. Answer based on your local knowledge."})

> Entering new AgentExecutor chain...
In a heterogeneous satellite network scenario with the use of Space Division Multiple Access (SDMA) protocol at the LEO satellite, the transmit signals generated by the GEO satellite can be represented as follows:

At the $t$-th time slot, the transmit signals generated by the GEO satellite can be expressed as:

\[ \textbf{x}_{0} =  \sum\limits_{m=1}^{M}\textbf{w}_{0,m}s_{0,m} \]

Where:
- $\textbf{x}_{0}$: Transmit signal generated by the GEO satellite.
- $M$: Number of single-antenna GEO Ground Users (GGUs) served by the GEO satellite.
- $\textbf{w}_{0,m}$: Beamforming vector associated with the $m$-th GGU.
- $s_{0,m}$: Signal transmitted to the $m$-th GGU.

This formulation represents the transmission of signals from the GEO satellite to the designated GGUs in the network, ensuring effective communication within the heterogeneous satellite network environment.

> Finished chain.
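
As a worked instance of the superposition $\textbf{x}_{0} = \sum_{m=1}^{M}\textbf{w}_{0,m}s_{0,m}$ quoted in the answer, the sketch below adds up the beamformed symbols antenna by antenna. The numeric values are made up for illustration, and `transmit_signal` is a hypothetical helper.

```python
def transmit_signal(beamformers, symbols):
    """Compute x_0 = sum_m w_{0,m} * s_{0,m} for one time slot."""
    if len(beamformers) != len(symbols):
        raise ValueError("need one symbol per beamforming vector")
    n_antennas = len(beamformers[0])
    x0 = [0j] * n_antennas
    for w, s in zip(beamformers, symbols):
        for i in range(n_antennas):
            x0[i] += w[i] * s  # contribution of this user's symbol at antenna i
    return x0


# Made-up values: N_M = 2 antennas, M = 2 ground users.
w = [[1 + 0j, 0 + 1j], [0.5 + 0j, 0.5 + 0j]]
s = [1 + 0j, -1 + 0j]
x0 = transmit_signal(w, s)
print(x0)
```

The result has one complex entry per GEO antenna, regardless of how many users are served.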


In [220]:
result = agent_executor_SDMA({"input": "The satellites are moving with time, which kind of channel I should use?"})

> Entering new AgentExecutor chain...
In a scenario where satellites are moving with time, the most suitable type of channel to use for communication is a time-varying channel. Time-varying channels are characterized by changes in channel conditions over time due to the movement of transmitters, receivers, or the environment.

Using a time-varying channel model is essential for satellite communication systems where the positions of the satellites are constantly changing. This type of channel model accounts for the dynamic nature of the communication environment and helps in adapting to the varying channel conditions to maintain reliable and efficient communication between the moving satellites and ground users.

> Finished chain.


In [221]:
result = agent_executor_SDMA({"input": "Good, show me the corresponding channel model. Use your local knowledge."})

> Entering new AgentExecutor chain...
In a satellite communication system where satellites are moving with time, the corresponding channel model that can be used is a time-varying channel model. The time-varying channel model accounts for the changes in channel conditions over time due to the movement of the satellites.

The channel model for time-varying channels can be represented as follows:

\[ \textbf{H}(t) = \sum_{i=1}^{L} \alpha_i(t) \textbf{a}_r(\theta_i) \textbf{a}_t(\phi_i)^H \]

Where:
- $\textbf{H}(t)$: Channel matrix at time $t$.
- $L$: Number of paths in the channel.
- $\alpha_i(t)$: Complex gain of the $i$-th path at time $t$.
- $\textbf{a}_r(\theta_i)$: Receive antenna array response vector.
- $\textbf{a}_t(\phi_i)$: Transmit antenna array response vector.
- $\theta_i$ and $\phi_i$: Angles of arrival and departure for the $i$-th path.

This channel model captures the dynamic nature of the communication channel between the moving satellites and gro
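
The quoted model $\textbf{H}(t) = \sum_{i=1}^{L} \alpha_i(t) \textbf{a}_r(\theta_i) \textbf{a}_t(\phi_i)^H$ can be evaluated numerically once the array responses are chosen. The sketch below assumes half-wavelength uniform linear arrays at both ends, which the agent's answer does not specify; `ula_response` and `channel_matrix` are hypothetical helpers. Time variation enters only through the path gains $\alpha_i(t)$, so the function is called once per time instant with that instant's gains.

```python
import cmath
import math


def ula_response(n_antennas, angle_rad):
    """Array response of a half-wavelength-spaced uniform linear array (an assumption)."""
    return [
        cmath.exp(1j * math.pi * n * math.sin(angle_rad)) / math.sqrt(n_antennas)
        for n in range(n_antennas)
    ]


def channel_matrix(gains, aoas, aods, n_rx, n_tx):
    """H = sum_i alpha_i * a_r(theta_i) * a_t(phi_i)^H for one time instant."""
    H = [[0j] * n_tx for _ in range(n_rx)]
    for alpha, theta, phi in zip(gains, aoas, aods):
        a_r = ula_response(n_rx, theta)
        a_t = ula_response(n_tx, phi)
        for r in range(n_rx):
            for c in range(n_tx):
                # Outer product a_r * a_t^H, scaled by the path gain
                H[r][c] += alpha * a_r[r] * a_t[c].conjugate()
    return H


# One broadside path (theta = phi = 0) with unit gain:
# every entry then equals 1 / sqrt(n_rx * n_tx).
H = channel_matrix(gains=[1 + 0j], aoas=[0.0], aods=[0.0], n_rx=2, n_tx=4)
print(len(H), len(H[0]))
```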

In [222]:
result = agent_executor_SDMA({"input": "Good. I would like to maximize the system EE in a heterogeneous network. Show me the formulated problem with equations. Formulate the probelm by searching your local knowledge."})

> Entering new AgentExecutor chain...

Invoking: `search_by_similarity` with `{'query': 'maximize system energy efficiency in a heterogeneous network formulation'}`

[Document(page_content='\\subsection{EE maximization Problem Formulation}\n\nFor the considered heterogeneous mega-constellation network, the system EE is denoted by\n\\begin{equation}\n{\\rm EE}\\left(\\{\\textbf{w}_{n,k}\\}\\right)=\\sum\\limits_{n=1}^{N}\\frac{\\sum\\limits_{k=1}^{K}R_{n,k}\\left(\\{\\textbf{w}_{n,k}\\}\\right)}{\\mu \\left( \\sum\\limits_{k=1}^{K}\\left\\|\\textbf{w}_{n,k}\\right\\|^{2}\\right) + P_{\\rm C}},\n\\end{equation}\nwhere $\\mu \\in [1,\\infty)$ and $P_{\\rm C}$ denote the power amplifier efficiency factor and the constant power consumption by circuit modules, respectively.', metadata={'source': './ref_SDMA.txt'}), Document(page_content='\\subsection{EE  problem formulation}\nWe aim to maximize the system EE  of the considered system under the constra

In [46]:
result = agent_executor_SDMA({"input": "Great. What if the satelites enviroment keeps fiexed. What should I do to revise such a model? Answer based on your local knowledge."})

> Entering new AgentExecutor chain...
If the satellite environment remains fixed and does not change, you can revise the model by considering the fixed environmental parameters in the optimization problem. Here are some steps you can take to revise the model:

1. **Fixed Satellite Positions**: If the satellite positions are fixed and do not change, you can incorporate these fixed positions into the model. This can help in optimizing the beamforming vectors based on the specific satellite locations.

2. **Fixed Channel Characteristics**: Since the satellite environment is fixed, you can consider the fixed channel characteristics in the optimization problem. This includes factors such as path loss, shadowing, and fading effects that are constant in the environment.

3. **Fixed Interference Levels**: If the interference levels from other sources remain constant in the satellite environment, you can include these fixed interference levels in the model. This will help

In [132]:
result = agent_executor_SDMA({"input": "Great. Show me the equations of the new channel model. Using your local knowledge."})

> Entering new AgentExecutor chain...
In a fixed satellite environment where the channel characteristics remain constant over time, the channel model equation can be simplified to a static representation. The new channel model equation for a fixed satellite environment can be expressed as follows:

\[ \textbf{h} = \sqrt{G_{s}G_{k}\left(\frac{c}{4\pi f_{c}d_{s}}\right)^{2}}\textbf{g} \]

In this equation:
- \( G_{s} \) and \( G_{k} \) represent the satellite antenna gain and the user antenna gain, respectively.
- \( c \) is the speed of light, \( f_{c} \) is the carrier frequency, and \( d_{s} \) is the distance between the satellite and the user.
- \( \textbf{g} \) represents the channel gain or path loss in a fixed satellite environment.

This simplified static channel model equation captures the fixed characteristics of the communication link between the satellites and the ground users in a scenario where the satellite environment does not change dynamically.
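
To get a feel for the magnitude of the scaling factor in $\textbf{h} = \sqrt{G_{s}G_{k}\left(\frac{c}{4\pi f_{c}d_{s}}\right)^{2}}\,\textbf{g}$, the sketch below evaluates it numerically. It assumes the antenna gains are in linear scale (not dB) and treats $\textbf{g}$ as a given vector; the carrier frequency and distance are made-up values, and `channel_scale` and `static_channel` are hypothetical helpers.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def channel_scale(g_s, g_k, f_c, d_s):
    """Amplitude factor sqrt(G_s * G_k * (c / (4*pi*f_c*d_s))**2), gains in linear scale."""
    return math.sqrt(g_s * g_k * (C / (4 * math.pi * f_c * d_s)) ** 2)


def static_channel(g_s, g_k, f_c, d_s, g):
    """Scale the vector g by the amplitude factor to obtain h."""
    a = channel_scale(g_s, g_k, f_c, d_s)
    return [a * g_i for g_i in g]


# Made-up link: 20 GHz carrier, 1000 km satellite-to-user distance, unit antenna gains.
f_c, d_s = 20e9, 1_000e3
h = static_channel(1.0, 1.0, f_c, d_s, g=[1 + 0j, 0 + 1j])
print(channel_scale(1.0, 1.0, f_c, d_s))
```

The amplitude factor falls off as $1/d_{s}$ and grows with the square root of each antenna gain, so doubling the distance halves every entry of $\textbf{h}$.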

## The paper experiments (for the case)

In [None]:
result = agent_executor_SDMA({"input": "Great. Show me the equations of the new channel model. Using your local knowledge."})
