In [3]:
import os
import subprocess
import threading

# Cache Hugging Face downloads (model weights) in this directory; child processes inherit it.
os.environ['HF_HOME'] = '/srv/starter_content/cache'

In [4]:
model = "Trendyol/Trendyol-LLM-7b-chat-v0.1"

## NDP LLM Service Documentation

This Python code launches the components of a chat service built on FastChat. Each function starts one FastChat server module as a separate process with `subprocess.run`, which executes the command directly (no shell is involved) and blocks until that process exits.
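A minimal sketch of a shared launcher, assuming you want each child to run under the same interpreter as the notebook kernel (`sys.executable`) rather than whatever `python3` resolves to on `PATH`, so it sees the same installed packages. `module_command` and `run_module` are hypothetical helper names, not part of FastChat. Because `run_module` wraps `subprocess.run`, calling it on a server module blocks for as long as that server runs, so the example only builds the controller command.

```python
import subprocess
import sys


def module_command(module: str, *args: str) -> list[str]:
    """Build an argument list that runs `module` with the current interpreter."""
    # sys.executable is the Python running this notebook, so the child
    # process uses the same environment (e.g. the installed fastchat package).
    return [sys.executable, "-m", module, *args]


def run_module(module: str, *args: str) -> subprocess.CompletedProcess:
    """Run a module to completion; this call blocks until the child exits."""
    return subprocess.run(module_command(module, *args))


# Example: the controller command used in this notebook (not executed here).
controller_cmd = module_command("fastchat.serve.controller", "--host", "127.0.0.1")
```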

### `run_controller()`

Starts the FastChat controller, which keeps track of the registered model workers (through their periodic heartbeats) and, for each request, tells the API server which worker address to send it to.
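The controller startup log in this notebook reports `dispatch_method='shortest_queue'`, and the request logs print each candidate worker with its `queue_lens`. The sketch below is a simplified illustration of that idea, not FastChat's own code: it assumes each worker reports a queue length and a relative speed (the registration log shows `'speed': 1, 'queue_length': 0`) and picks the worker with the smallest queue length per unit of speed. `Worker`, `pick_worker`, and the second worker port `21003` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    address: str
    model_names: list[str]
    speed: float = 1.0
    queue_length: int = 0


def pick_worker(workers: list[Worker], model_name: str) -> Optional[str]:
    """Return the address of the least-loaded worker serving model_name."""
    candidates = [w for w in workers if model_name in w.model_names]
    if not candidates:
        return None  # no registered worker serves this model
    # Normalize queue length by speed so faster workers absorb more load.
    best = min(candidates, key=lambda w: w.queue_length / w.speed)
    return best.address


workers = [
    Worker("http://localhost:21002", ["gpt-3.5-turbo"], queue_length=3),
    Worker("http://localhost:21003", ["gpt-3.5-turbo"], queue_length=1),
]
print(pick_worker(workers, "gpt-3.5-turbo"))  # http://localhost:21003
```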

```python
def run_controller():
    subprocess.run(["python3", "-m", "fastchat.serve.controller", "--host", "127.0.0.1"])
```

In [5]:
def run_controller():
    subprocess.run(["python3", "-m", "fastchat.serve.controller", "--host", "127.0.0.1"])

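### `run_model_worker()`

Starts a model worker, the process that loads the model and generates responses. It runs the `fastchat.serve.model_worker` module on the local host. `--model-names` takes a comma-separated list of names under which the worker registers the model; each name becomes a model ID that API clients can request, so OpenAI-style clients asking for `gpt-3.5-turbo` or `text-embedding-ada-002` are served by the same Trendyol model. `--model-path` accepts a local directory or a Hugging Face Hub repo ID; here it is the repo ID stored in `model`, so the weights are downloaded into the `HF_HOME` cache set at the top of the notebook.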
```python 
def run_model_worker():
    subprocess.run(["python3", "-m", "fastchat.serve.model_worker", "--host", "127.0.0.1", "--model-names", "gpt-3.5-turbo,Trendyol-LLM-7b-chat-v0.1,text-embedding-ada-002", "--model-path", model])

```
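A small sketch that builds the worker command from a Python list of aliases, so the comma-joined `--model-names` value cannot drift out of sync with the names used elsewhere in the notebook. `model_worker_command` is a hypothetical helper; the alias list simply mirrors the one in the executed cell.

```python
def model_worker_command(model_path: str, aliases: list[str], host: str = "127.0.0.1") -> list[str]:
    """Build the fastchat model_worker argument list for one model and its aliases."""
    if not aliases:
        raise ValueError("at least one model name is required")
    return [
        "python3", "-m", "fastchat.serve.model_worker",
        "--host", host,
        # --model-names takes a single comma-separated value.
        "--model-names", ",".join(aliases),
        "--model-path", model_path,
    ]


cmd = model_worker_command(
    "Trendyol/Trendyol-LLM-7b-chat-v0.1",
    ["gpt-3.5-turbo", "Trendyol-LLM-7b-chat-v0.1", "text-embedding-ada-002"],
)
```

Passing `cmd` to `subprocess.run` reproduces the call in `run_model_worker()`.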
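### `run_api_server()`

Launches FastChat's OpenAI-compatible API server on the local host (its startup log shows it listening on port 8000). It exposes OpenAI-style endpoints such as `/v1/models`, `/v1/chat/completions`, and `/v1/embeddings`; for each request it asks the controller which worker to use and forwards the request to that worker, so external clients and OpenAI-based libraries can talk to the local model.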
```python
def run_api_server():
    subprocess.run(["python3", "-m", "fastchat.serve.openai_api_server", "--host", "127.0.0.1"])
```    
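Because the server follows the OpenAI REST format, the aliases registered by the worker can be listed with `GET /v1/models`. A minimal standard-library sketch, assuming the server is reachable at `http://localhost:8000/v1`; `list_model_ids` and `model_ids_from_payload` are hypothetical helpers, and the parsing relies only on the `data[].id` fields visible in the `/v1/models` response captured later in this notebook.

```python
import json
import urllib.request


def model_ids_from_payload(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url: str = "http://localhost:8000/v1", timeout: float = 5.0) -> list[str]:
    """Fetch GET {base_url}/models and return the advertised model IDs."""
    request = urllib.request.Request(
        f"{base_url}/models", headers={"accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return model_ids_from_payload(json.load(response))
```

With the controller, worker, and API server running, `list_model_ids()` should return the three names passed to `--model-names`.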


In [9]:
def run_model_worker():
    # Serve `model` under three names so OpenAI-style clients can request any of them.
    subprocess.run(["python3", "-m", "fastchat.serve.model_worker", "--host", "127.0.0.1", "--model-names", "gpt-3.5-turbo,Trendyol-LLM-7b-chat-v0.1,text-embedding-ada-002", "--model-path", model])


def run_api_server():
    # OpenAI-compatible REST API (listens on port 8000 by default).
    subprocess.run(["python3", "-m", "fastchat.serve.openai_api_server", "--host", "127.0.0.1"])


def run_ui_server():
    # Gradio web chat UI; defined here but not started in this notebook.
    subprocess.run(["python3", "-m", "fastchat.serve.gradio_web_server", "--host", "127.0.0.1"])


## Starting the `run_controller` Function in a Separate Thread

`subprocess.run` blocks until the controller process exits, so calling `run_controller()` directly would tie up the notebook for as long as the controller runs. Running it in a separate thread with Python's `threading` module lets the cell return immediately while the controller keeps serving in the background.

### Code Snippet:

```python
import threading

controller_thread = threading.Thread(target=run_controller)
controller_thread.start()
```
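`controller_thread.start()` returns as soon as the thread is running, not when the controller is ready. Before starting the model worker, you may want to block until the controller's port accepts TCP connections; the startup log in this notebook shows it on `127.0.0.1:21001`. A minimal standard-library sketch; `wait_for_port` is a hypothetical helper, and a successful connection only shows that something is listening on the port, not that the service is fully initialized.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, call `wait_for_port("127.0.0.1", 21001)` right after `controller_thread.start()` and start the model worker only once it returns `True`.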

In [6]:
controller_thread = threading.Thread(target=run_controller)
controller_thread.start()

2024-03-05 00:25:31 | INFO | controller | args: Namespace(host='127.0.0.1', port=21001, dispatch_method='shortest_queue', ssl=False)
2024-03-05 00:25:31 | ERROR | stderr | INFO:     Started server process [4337]
2024-03-05 00:25:31 | ERROR | stderr | INFO:     Waiting for application startup.
2024-03-05 00:25:31 | ERROR | stderr | INFO:     Application startup complete.
2024-03-05 00:25:31 | ERROR | stderr | INFO:     Uvicorn running on http://127.0.0.1:21001 (Press CTRL+C to quit)


## Starting the `run_model_worker` Function in a Separate Thread

The model worker is also a long-running process, so `run_model_worker` is started in its own thread in the same way. The worker can then load the model and serve requests while the notebook stays usable.

### Code Snippet:

```python
import threading

model_worker_thread = threading.Thread(target=run_model_worker)
model_worker_thread.start()
```
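Loading a 7B model takes time, and the worker is only usable once the model is loaded and the worker has registered with the controller. One way to wait for that is to poll the controller's `/list_models` endpoint, which appears as `POST /list_models` in this notebook's request logs. This sketch assumes the controller at `http://localhost:21001` answers with a JSON object whose `models` key lists the registered names; treat that response shape as an assumption to verify against your FastChat version. `registered_models` and `wait_until` are hypothetical helpers.

```python
import json
import time
import urllib.request
from typing import Callable


def registered_models(controller_url: str = "http://localhost:21001", timeout: float = 5.0) -> list[str]:
    """Ask the controller which model names currently have a registered worker."""
    request = urllib.request.Request(f"{controller_url}/list_models", data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumed response shape: {"models": [...]}.
        return json.load(response).get("models", [])


def wait_until(predicate: Callable[[], bool], timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Call predicate until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if predicate():
                return True
        except OSError:
            pass  # URLError is an OSError: controller not reachable yet, retry
        time.sleep(interval)
    return False
```

Usage: `wait_until(lambda: "Trendyol-LLM-7b-chat-v0.1" in registered_models())` returns `True` once the worker has registered, or `False` after ten minutes.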

In [7]:
# !pip install langchain openai chromadb

In [10]:
model_worker_thread = threading.Thread(target=run_model_worker)
model_worker_thread.start()

2024-03-05 00:26:25 | INFO | model_worker | args: Namespace(host='127.0.0.1', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', model_path='Trendyol/Trendyol-LLM-7b-chat-v0.1', revision='main', device='cuda', gpus=None, num_gpus=1, max_gpu_memory=None, dtype=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002'], conv_template=None, embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2024-03-05 00:26:25 | INFO | model_worker | Loading the model ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002'] on worke


## Running the `run_api_server` Function in a Separate Thread

The API server is started the same way: `run_api_server` runs in a separate thread, so its blocking `subprocess.run` call does not hold up the notebook while the server handles requests alongside the controller and the model worker.
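Threads work here because each one simply waits on its child process, but the notebook never gets a handle to that process, so there is no direct way to stop a server from Python. A sketch of an alternative, assuming you prefer `subprocess.Popen`, which starts the child and returns immediately without needing a thread. `start_server`, `stop_server`, and `API_SERVER_ARGS` are hypothetical names; the arguments use the same module and flags as `run_api_server`.

```python
import subprocess
import sys


def start_server(args: list[str]) -> subprocess.Popen:
    """Start a long-running child process without blocking the caller."""
    return subprocess.Popen(args)


def stop_server(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Ask the process to exit (SIGTERM on POSIX); kill it if it does not stop in time."""
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


# Same module and flags as run_api_server(), but we keep a handle to the child.
API_SERVER_ARGS = [sys.executable, "-m", "fastchat.serve.openai_api_server", "--host", "127.0.0.1"]
```

Usage: `api_proc = start_server(API_SERVER_ARGS)` starts the server, and `stop_server(api_proc)` shuts it down when you are done.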

### Code Snippet:

```python
import threading

api_server_thread = threading.Thread(target=run_api_server)
api_server_thread.start()
```

In [11]:
api_server_thread = threading.Thread(target=run_api_server)
api_server_thread.start()

2024-03-05 00:26:56 | INFO | openai_api_server | args: Namespace(host='127.0.0.1', port=8000, controller_address='http://localhost:21001', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_keys=None, ssl=False)
2024-03-05 00:26:56 | ERROR | stderr | INFO:     Started server process [4675]
2024-03-05 00:26:56 | ERROR | stderr | INFO:     Waiting for application startup.
2024-03-05 00:26:56 | ERROR | stderr | INFO:     Application startup complete.
2024-03-05 00:26:56 | ERROR | stderr | INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
2024-03-05 00:27:29 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002']. Semaphore: None. call_ct: 0. worker_id: de683719. 
2024-03-05 00:27:29 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-05 00:27:29 | INFO | stdout | INFO:     127.0.0.1:35308 - "POST /receive_heart_beat HTTP/1.1" 200 OK


In [13]:
!curl  -X 'GET' \
  'http://localhost:8000/v1/models' \
  -H 'accept: application/json'

{"object":"list","data":[{"id":"Trendyol-LLM-7b-chat-v0.1","object":"model","created":1709598624,"owned_by":"fastchat","root":"Trendyol-LLM-7b-chat-v0.1","parent":null,"permission":[{"id":"modelperm-b7PyASyzU9WuBLJjsaSogH","object":"model_permission","created":1709598624,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":true,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]},{"id":"gpt-3.5-turbo","object":"model","created":1709598624,"owned_by":"fastchat","root":"gpt-3.5-turbo","parent":null,"permission":[{"id":"modelperm-bo3dwMczyrpu4s4LUyYaNa","object":"model_permission","created":1709598624,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":true,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]},{"id":"text-embedding-ada-002","object":"model","created":1709598624,"owned_by":"fastchat","root":"text-emb

2024-03-05 00:30:24 | INFO | controller | Register a new worker: http://localhost:21002
2024-03-05 00:30:24 | INFO | stdout | INFO:     127.0.0.1:40450 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-03-05 00:30:24 | INFO | controller | Register done: http://localhost:21002, {'model_names': ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002'], 'speed': 1, 'queue_length': 0}
2024-03-05 00:30:24 | INFO | stdout | INFO:     127.0.0.1:57632 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2024-03-05 00:30:24 | INFO | stdout | INFO:     127.0.0.1:57640 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:30:24 | INFO | stdout | INFO:     127.0.0.1:58558 - "GET /v1/models HTTP/1.1" 200 OK
2024-03-05 00:30:29 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002']. Semaphore: None. call_ct: 0. worker_id: de683719. 
2024-03-05 00:30:29 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-05 0

In [19]:
!curl http://localhost:8000/v1/chat/completions  -H "Content-Type: application/json"  -d '{ "model": "Trendyol-LLM-7b-chat-v0.1", "messages": [{"role": "user", "content": "Who is Fumio Kishida?"}]}'

2024-03-05 00:36:11 | INFO | stdout | INFO:     127.0.0.1:43408 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:36:11 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [0.0], ret: http://localhost:21002
2024-03-05 00:36:11 | INFO | stdout | INFO:     127.0.0.1:43416 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:36:11 | INFO | stdout | INFO:     127.0.0.1:49722 - "POST /model_details HTTP/1.1" 200 OK
2024-03-05 00:36:11 | INFO | stdout | INFO:     127.0.0.1:49724 - "POST /count_token HTTP/1.1" 200 OK
2024-03-05 00:36:29 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002']. Semaphore: Semaphore(value=4, locked=False). call_ct: 3. worker_id: de683719. 
2024-03-05 00:36:29 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-05 00:36:29 | INFO | stdout | INFO:     127.0.0.1:37816 - "POST /receive_heart_beat HTTP/1.1" 200 OK


{"id":"chatcmpl-3vXgoUfUaiVj5VCpqXCcQC","object":"chat.completion","created":1709598998,"model":"Trendyol-LLM-7b-chat-v0.1","choices":[{"index":0,"message":{"role":"assistant","content":"Fumio Kishida is the current Prime Minister of Japan. He was elected to this position in November 2018 and is serving his second term. Kishida is a politician and former member of the Japanese Liberal Democratic Party (LDP). He has previously held various positions in government, including Minister of Finance, Minister of the Environment, and Minister of Economics, Trade and Industry.\nKishida is known for promoting liberal economic policies, including higher taxes and higher government spending. He also advocates for greater social equity within Japan, particularly for low-income families. Kishida is often seen as a champion of Japan's renewal process, which aims to reconstruct the country's economy and society after the pandemic and the economic resiliency challenges faced in recent years.\nBesides b

2024-03-05 00:36:38 | INFO | stdout | INFO:     127.0.0.1:49736 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-05 00:36:38 | INFO | stdout | INFO:     127.0.0.1:51494 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-03-05 00:37:14 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002']. Semaphore: Semaphore(value=5, locked=False). call_ct: 3. worker_id: de683719. 
2024-03-05 00:37:14 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-05 00:37:14 | INFO | stdout | INFO:     127.0.0.1:50418 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2024-03-05 00:37:56 | INFO | stdout | INFO:     127.0.0.1:44870 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:37:56 | INFO | stdout | INFO:     127.0.0.1:46872 - "POST /v1/embeddings?model_name=Trendyol-LLM-7b-chat-v0.1 HTTP/1.1" 400 Bad Request
2024-03-05 00:37:59 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 

In [20]:
import requests

url = "http://localhost:8000/v1/chat/completions"

headers = {
    "Content-Type": "application/json",
}

data = {
    "model": "Trendyol-LLM-7b-chat-v0.1",
    "messages": [{"role": "user", "content": "Who is Fumio Kishida?"}],
}

response = requests.post(url, json=data, headers=headers)

print(response.text)


2024-03-05 00:43:56 | INFO | stdout | INFO:     127.0.0.1:59382 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:43:56 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [0.0], ret: http://localhost:21002
2024-03-05 00:43:56 | INFO | stdout | INFO:     127.0.0.1:59396 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:43:56 | INFO | stdout | INFO:     127.0.0.1:53230 - "POST /model_details HTTP/1.1" 200 OK
2024-03-05 00:43:56 | INFO | stdout | INFO:     127.0.0.1:53246 - "POST /count_token HTTP/1.1" 200 OK
2024-03-05 00:43:59 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002']. Semaphore: Semaphore(value=4, locked=False). call_ct: 8. worker_id: de683719. 
2024-03-05 00:43:59 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-05 00:43:59 | INFO | stdout | INFO:     127.0.0.1:39090 - "POST /receive_heart_beat HTTP/1.1" 200 OK


{"id":"chatcmpl-xFbxx6rVgGWWGXH5LKu4Rn","object":"chat.completion","created":1709599449,"model":"Trendyol-LLM-7b-chat-v0.1","choices":[{"index":0,"message":{"role":"assistant","content":"Fumio Kishida is a Japanese politician and former Prime Minister of Japan. He served as Prime Minister for two terms, from 2011 to 2016. Kishida, born in Nagoya, Japonya, was elected to the House of Representatives in 1999 and served there as a member until he became Prime Minister. As Prime Minister, Kishida was known for his focus on national unity, promoting social stability, and maintaining a strong economy. He was also known for his efforts to improve Japan's nuclear energy program. Kishida is a grandson of the late Japanese Prime Minister Yoshihide Suga who served in the late 1950s and early 1960s. Kishida, who speaks fluent English, has held various positions in leadership and public service, such as Minister of Internal Affairs (2005-2008) and Chairman of the Democratic Party (2006-2008). He is

2024-03-05 00:44:09 | INFO | stdout | INFO:     127.0.0.1:53250 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-05 00:44:09 | INFO | stdout | INFO:     127.0.0.1:43186 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-03-05 00:44:44 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002']. Semaphore: Semaphore(value=5, locked=False). call_ct: 8. worker_id: de683719. 
2024-03-05 00:44:44 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-05 00:44:44 | INFO | stdout | INFO:     127.0.0.1:56742 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2024-03-05 00:45:30 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002']. Semaphore: Semaphore(value=5, locked=False). call_ct: 8. worker_id: de683719. 
2024-03-05 00:45:30 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-05 00:45:30 | INFO | stdout | INFO:     127.0.0.1:448

In [31]:
import requests

url = "http://localhost:8000/api/v1/chat/completions"

headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
}

data = {
    "model": "Trendyol-LLM-7b-chat-v0.1",
    "messages": "What did the president say about Ketanji Brown Jackson?",
    "temperature": 0.7,
    "top_p": 1,
    "top_k": -1,
    "n": 1,
    "max_tokens": 650,
    # No "stop" sequence: the placeholder "string" would halt generation whenever the model emits that word.
    "stream": False,
    "user": "string",
    "repetition_penalty": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0,
}

response = requests.post(url, json=data, headers=headers)

print(response.text)


2024-03-05 00:52:09 | INFO | stdout | INFO:     127.0.0.1:40834 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:52:09 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [0.0], ret: http://localhost:21002
2024-03-05 00:52:09 | INFO | stdout | INFO:     127.0.0.1:40848 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:52:09 | INFO | stdout | INFO:     127.0.0.1:57856 - "POST /model_details HTTP/1.1" 200 OK
2024-03-05 00:52:09 | INFO | stdout | INFO:     127.0.0.1:57862 - "POST /count_token HTTP/1.1" 200 OK


{"id":"chatcmpl-4SVmw7mPZM8PQCgkyWSF2a","object":"chat.completion","created":1709599930,"model":"Trendyol-LLM-7b-chat-v0.1","choices":[{"index":0,"message":{"role":"assistant","content":"\nBurada, Ketanji Brown Jackson hakkında ABD Başkanı Joe Biden'ın ne söylediğini öğrenin."},"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"total_tokens":40,"completion_tokens":26}}
(English translation of the reply: "Here, find out what U.S. President Joe Biden said about Ketanji Brown Jackson.")


2024-03-05 00:52:10 | INFO | stdout | INFO:     127.0.0.1:57868 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-05 00:52:10 | INFO | stdout | INFO:     127.0.0.1:54680 - "POST /api/v1/chat/completions HTTP/1.1" 200 OK
2024-03-05 00:52:15 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002']. Semaphore: Semaphore(value=5, locked=False). call_ct: 19. worker_id: de683719. 
2024-03-05 00:52:15 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-05 00:52:15 | INFO | stdout | INFO:     127.0.0.1:59794 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2024-03-05 00:52:48 | INFO | stdout | INFO:     127.0.0.1:57768 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:52:48 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [0.0], ret: http://localhost:21002
2024-03-05 00:52:48 | INFO | stdout | INFO:     127.0.0.1:57774 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:52:48 | INFO |

In [32]:
os.environ['OPENAI_API_BASE'] = 'http://localhost:8000/v1'
os.environ['OPENAI_API_KEY'] = 'EMPTY'

In [47]:
!wget https://raw.githubusercontent.com/hwchase17/langchain/v0.0.200/docs/modules/state_of_the_union.txt

--2024-03-04 19:52:15--  https://raw.githubusercontent.com/hwchase17/langchain/v0.0.200/docs/modules/state_of_the_union.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39027 (38K) [text/plain]
Saving to: ‘state_of_the_union.txt.1’


2024-03-04 19:52:15 (20.0 MB/s) - ‘state_of_the_union.txt.1’ saved [39027/39027]


In [33]:
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator

embedding = OpenAIEmbeddings(model="text-embedding-ada-002")
loader = TextLoader("state_of_the_union.txt")
index = VectorstoreIndexCreator(embedding=embedding).from_loaders([loader])
llm = ChatOpenAI(model="gpt-3.5-turbo")

questions = [
    "Who is the speaker",
    "What did the president say about Ketanji Brown Jackson",
    "What are the threats to America",
    "Who are mentioned in the speech",
    "Who is the vice president",
    "How many projects were announced",
]

for query in questions:
    print("Query:", query)
    print("Answer:", index.query(query, llm=llm))

  warn_deprecated(
2024-03-05 00:54:00 | INFO | stdout | INFO:     127.0.0.1:58156 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:00 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [0.0], ret: http://localhost:21002
2024-03-05 00:54:00 | INFO | stdout | INFO:     127.0.0.1:58162 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:54:00 | INFO | stdout | INFO:     127.0.0.1:51452 - "POST /worker_get_embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:00 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [1.0], ret: http://localhost:21002
2024-03-05 00:54:00 | INFO | stdout | INFO:     127.0.0.1:58174 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:54:01 | INFO | stdout | INFO:     127.0.0.1:51468 - "POST /worker_get_embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:01 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [2.0], ret: http://localhost:21002
2024-03-05 00:54:01 | INFO | stdout | INFO:     127.0.0.1:58186 -

Query: Who is the speaker


2024-03-05 00:54:03 | INFO | stdout | INFO:     127.0.0.1:51554 - "POST /worker_get_embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:03 | INFO | stdout | INFO:     127.0.0.1:39058 - "POST /v1/embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:03 | INFO | stdout | INFO:     127.0.0.1:58278 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:03 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [12.0], ret: http://localhost:21002
2024-03-05 00:54:03 | INFO | stdout | INFO:     127.0.0.1:58292 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:54:03 | INFO | stdout | INFO:     127.0.0.1:51564 - "POST /worker_get_conv_template HTTP/1.1" 200 OK
2024-03-05 00:54:03 | INFO | stdout | INFO:     127.0.0.1:51576 - "POST /model_details HTTP/1.1" 200 OK
2024-03-05 00:54:03 | INFO | stdout | INFO:     127.0.0.1:51584 - "POST /count_token HTTP/1.1" 200 OK
2024-03-05 00:54:05 | INFO | stdout | INFO:     127.0.0.1:51596 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-05 00:54:05 |

Answer: The speaker is Nancy Pelosi, the Speaker of the House of Representatives in the United States.
Query: What did the president say about Ketanji Brown Jackson


2024-03-05 00:54:10 | INFO | stdout | INFO:     127.0.0.1:51634 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-05 00:54:10 | INFO | stdout | INFO:     127.0.0.1:39066 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-03-05 00:54:10 | INFO | stdout | INFO:     127.0.0.1:37156 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:10 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [15.0], ret: http://localhost:21002
2024-03-05 00:54:10 | INFO | stdout | INFO:     127.0.0.1:37162 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:54:10 | INFO | stdout | INFO:     127.0.0.1:38506 - "POST /worker_get_embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:10 | INFO | stdout | INFO:     127.0.0.1:39058 - "POST /v1/embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:10 | INFO | stdout | INFO:     127.0.0.1:37166 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:10 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [16.0], ret: http://localhost:21002
2024-

Answer: The President, Ketanji Brown Jackson's appointment to the Supreme Court as the first ever African American woman to appear on the Supreme Court in American history. He called it a moment of history and a milestone for the country. The appointment was also presented as a sign of the country's progress towards equality and adherence to the founding principles of the United States.
Query: What are the threats to America


2024-03-05 00:54:30 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'Trendyol-LLM-7b-chat-v0.1', 'text-embedding-ada-002']. Semaphore: Semaphore(value=4, locked=False). call_ct: 37. worker_id: de683719. 
2024-03-05 00:54:30 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-05 00:54:30 | INFO | stdout | INFO:     127.0.0.1:51276 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2024-03-05 00:54:40 | INFO | stdout | INFO:     127.0.0.1:38534 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-05 00:54:40 | INFO | stdout | INFO:     127.0.0.1:39066 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-03-05 00:54:40 | INFO | stdout | INFO:     127.0.0.1:55060 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:40 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [0.0], ret: http://localhost:21002
2024-03-05 00:54:40 | INFO | stdout | INFO:     127.0.0.1:55072 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:54:40 | INFO | std

Answer: The threats to America are significant and diverse. They include:
1. Global terrorism: Terrorist groups such as al-Qaeda and ISIS pose a threat to the United States and our allies. They have committed acts of terrorism around the world, including attacks on civilian targets, businesses, and civilian airports.
2. Cyber attacks: The growing threat of cyber attacks includes hacking, phishing, and other attacks on critical infrastructure, financial systems, and private companies. These attacks pose risks to national security, economic stability, and public confidence in our systems.
3. Coronavirus pandemic: COVID-19 has caused severe health and economic challenges. It has caused significant deaths and disturbances, disrupted education and economic activity, and resulted in a global economic crisis.
4. Political divisions: The increasing political divisions in the United States pose a threat to our stability and unity. It has led to a rise in partisanship, polarization, and even vio

2024-03-05 00:54:49 | INFO | stdout | INFO:     127.0.0.1:41516 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-05 00:54:49 | INFO | stdout | INFO:     127.0.0.1:39066 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-03-05 00:54:49 | INFO | stdout | INFO:     127.0.0.1:56602 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:49 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [2.0], ret: http://localhost:21002
2024-03-05 00:54:49 | INFO | stdout | INFO:     127.0.0.1:56612 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:54:49 | INFO | stdout | INFO:     127.0.0.1:46660 - "POST /worker_get_embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:49 | INFO | stdout | INFO:     127.0.0.1:32774 - "POST /v1/embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:49 | INFO | stdout | INFO:     127.0.0.1:56616 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:49 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [3.0], ret: http://localhost:21002
2024-03

Answer: In the speech, there are several individuals mentioned who are important to President Zelenskyy and the Ukrainian people. They are:
1. President Zelenskyy
2. First Lady Olena Zelenska
3. Deputy Prime Minister Mykhailo Fedorov
4. Chairman of the National Güvenlik & Defense Council Oleksii Reznikov
5. Minister of Defense Oleksii Reznikov
6. President of the National Bank of Ukraine Vadym Boychenko
7. President of the National Energy Company Nuclear Energy Company
8. President of the National Rail Transportation Company Ukrtransnational
9. President of the National Telecom Company Ukrtelecom
10. President of the National Atomic Energy Corporation Energoatom
Query: Who is the vice president


2024-03-05 00:54:52 | INFO | stdout | INFO:     127.0.0.1:46686 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-05 00:54:52 | INFO | stdout | INFO:     127.0.0.1:39066 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-03-05 00:54:52 | INFO | stdout | INFO:     127.0.0.1:56642 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:52 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [4.0], ret: http://localhost:21002
2024-03-05 00:54:52 | INFO | stdout | INFO:     127.0.0.1:56650 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:54:52 | INFO | stdout | INFO:     127.0.0.1:46692 - "POST /worker_get_embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:52 | INFO | stdout | INFO:     127.0.0.1:32774 - "POST /v1/embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:52 | INFO | stdout | INFO:     127.0.0.1:56660 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:52 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [5.0], ret: http://localhost:21002
2024-03

Answer: The current president of the United States is Joe Biden. The current president of the United States is Joe Biden. The current president of the United States is Joe Biden.
Query: How many projects were announced
Answer: Over 400 projects have been announced in the past year. These projects are expected to create or retain over 750,000 jobs and save taxpayers on trillions of dollars.


2024-03-05 00:54:54 | INFO | stdout | INFO:     127.0.0.1:46714 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-05 00:54:54 | INFO | stdout | INFO:     127.0.0.1:39066 - "POST /v1/chat/completions HTTP/1.1" 200 OK


In [34]:
print("Answer:", index.query("Where is San Diego?", llm=llm))

2024-03-05 00:54:54 | INFO | stdout | INFO:     127.0.0.1:56674 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:54 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [6.0], ret: http://localhost:21002
2024-03-05 00:54:54 | INFO | stdout | INFO:     127.0.0.1:56684 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:54:54 | INFO | stdout | INFO:     127.0.0.1:46724 - "POST /worker_get_embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:54 | INFO | stdout | INFO:     127.0.0.1:32774 - "POST /v1/embeddings HTTP/1.1" 200 OK
2024-03-05 00:54:54 | INFO | stdout | INFO:     127.0.0.1:56692 - "POST /list_models HTTP/1.1" 200 OK
2024-03-05 00:54:54 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [7.0], ret: http://localhost:21002
2024-03-05 00:54:54 | INFO | stdout | INFO:     127.0.0.1:56700 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-05 00:54:54 | INFO | stdout | INFO:     127.0.0.1:46730 - "POST /model_details HTTP/1.1" 200 OK
2024-03-05

Answer: San Diego, California, United States. It is located in the western part of the state, on the Pacific Coast.


2024-03-05 00:54:56 | INFO | stdout | INFO:     127.0.0.1:46748 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-05 00:54:56 | INFO | stdout | INFO:     127.0.0.1:39066 - "POST /v1/chat/completions HTTP/1.1" 200 OK


In [51]:
print("Answer:", index.query("I want to go to San Diego from St. George Utah give me a plan", llm=llm))

2024-03-04 20:06:17 | INFO | stdout | INFO:     127.0.0.1:41920 - "POST /list_models HTTP/1.1" 200 OK
2024-03-04 20:06:17 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [0.0], ret: http://localhost:21002
2024-03-04 20:06:17 | INFO | stdout | INFO:     127.0.0.1:41930 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-04 20:06:17 | INFO | stdout | INFO:     127.0.0.1:43510 - "POST /worker_get_embeddings HTTP/1.1" 200 OK
2024-03-04 20:06:17 | INFO | stdout | INFO:     127.0.0.1:48094 - "POST /v1/embeddings HTTP/1.1" 200 OK
2024-03-04 20:06:17 | INFO | stdout | INFO:     127.0.0.1:41936 - "POST /list_models HTTP/1.1" 200 OK
2024-03-04 20:06:17 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [1.0], ret: http://localhost:21002
2024-03-04 20:06:17 | INFO | stdout | INFO:     127.0.0.1:41942 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-04 20:06:17 | INFO | stdout | INFO:     127.0.0.1:43514 - "POST /model_details HTTP/1.1" 200 OK
2024-03-04

Answer: Here's a plan to get to San Diego from St. George Utah. If you leave at 21:00, you can reach San Diego by 00:00 AM. It's a 4-hour drive, which means that you'd arrive at 0.0:00 AM.
```


2024-03-04 20:06:21 | INFO | stdout | INFO:     127.0.0.1:43528 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-04 20:06:21 | INFO | stdout | INFO:     127.0.0.1:48104 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-03-04 20:06:49 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'text-davinci-003', 'text-embedding-ada-002']. Semaphore: Semaphore(value=5, locked=False). call_ct: 39. worker_id: 69061595. 
2024-03-04 20:06:49 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-04 20:06:49 | INFO | stdout | INFO:     127.0.0.1:52762 - "POST /receive_heart_beat HTTP/1.1" 200 OK


In [54]:
print("Answer:", index.query("Who is narendra modi?", llm=llm))

2024-03-04 20:08:50 | INFO | stdout | INFO:     127.0.0.1:41298 - "POST /list_models HTTP/1.1" 200 OK
2024-03-04 20:08:50 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [0.0], ret: http://localhost:21002
2024-03-04 20:08:50 | INFO | stdout | INFO:     127.0.0.1:41312 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-04 20:08:50 | INFO | stdout | INFO:     127.0.0.1:52280 - "POST /worker_get_embeddings HTTP/1.1" 200 OK
2024-03-04 20:08:50 | INFO | stdout | INFO:     127.0.0.1:40064 - "POST /v1/embeddings HTTP/1.1" 200 OK
2024-03-04 20:08:50 | INFO | stdout | INFO:     127.0.0.1:41324 - "POST /list_models HTTP/1.1" 200 OK
2024-03-04 20:08:50 | INFO | controller | names: ['http://localhost:21002'], queue_lens: [1.0], ret: http://localhost:21002
2024-03-04 20:08:50 | INFO | stdout | INFO:     127.0.0.1:41336 - "POST /get_worker_address HTTP/1.1" 200 OK
2024-03-04 20:08:50 | INFO | stdout | INFO:     127.0.0.1:52290 - "POST /model_details HTTP/1.1" 200 OK
2024-03-04

Answer: Narendra Modi is an Indian politician and the current Prime Minister of India. He was born in Gujarat, India in 1947 and started his career in business. In 2015, he became the leader of the Bhartiya Janata Party (BJP), which is a right-wing political party in India. In 2018, he was elected as the Prime Minister and he has served in this pozisyon since then. Modi has been known for his pro-business policies and economic reforms, as well as his efforts to improve public services and reduce corruption in India. He is considered a popular leader in India and has a significant political influence.


2024-03-04 20:08:58 | INFO | stdout | INFO:     127.0.0.1:52300 - "POST /worker_generate HTTP/1.1" 200 OK
2024-03-04 20:08:58 | INFO | stdout | INFO:     127.0.0.1:40066 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-03-04 20:09:04 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'text-davinci-003', 'text-embedding-ada-002']. Semaphore: Semaphore(value=5, locked=False). call_ct: 45. worker_id: 69061595. 
2024-03-04 20:09:04 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-04 20:09:04 | INFO | stdout | INFO:     127.0.0.1:40102 - "POST /receive_heart_beat HTTP/1.1" 200 OK
2024-03-04 20:09:49 | INFO | model_worker | Send heart beat. Models: ['gpt-3.5-turbo', 'text-davinci-003', 'text-embedding-ada-002']. Semaphore: Semaphore(value=5, locked=False). call_ct: 45. worker_id: 69061595. 
2024-03-04 20:09:49 | INFO | controller | Receive heart beat. http://localhost:21002
2024-03-04 20:09:49 | INFO | stdout | INFO:     127.0.0.1:36498 - "POST /rece