# Test Jan Server

In [6]:
import requests
import json
import time
import GPUtil

def send_request(payload):
    """Sends a POST request to the server and prints the response, including detailed GPU utilization metrics."""
    url = "http://localhost:1337/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json"
    }
    start_time = time.time()

    # Get initial GPU utilization
    gpus_before = GPUtil.getGPUs()
    gpu_load_before = [gpu.load for gpu in gpus_before]

    try:
        response = requests.post(url, headers=headers, json=payload)
        end_time = time.time()
        total_time = end_time - start_time

        # Get GPU utilization after the request
        gpus_after = GPUtil.getGPUs()
        gpu_load_after = [gpu.load for gpu in gpus_after]

        if response.status_code == 200:
            result = response.json()
            answer = result['choices'][0]['message']['content']
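            # total_tokens counts prompt and completion tokens together,
            # so the speed below is overall throughput, not pure generation speed
            token_count = result['usage']['total_tokens']
            avg_token_speed = token_count / total_time if total_time > 0 else 0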

            # Calculate percentage change in GPU load for each GPU
            gpu_load_change = [(after - before) * 100 for before, after in zip(gpu_load_before, gpu_load_after)]

            print("Answer:", answer)
            print(f"Token Number: {token_count}")
            print(f"Avg Token Speed: {avg_token_speed:.2f} tokens/s")
            print(f"Cost time: {total_time:.2f} s")
            for i, (before, after, change) in enumerate(zip(gpu_load_before, gpu_load_after, gpu_load_change)):
                print(f"GPU {i} Load Before: {before * 100:.2f}%")
                print(f"GPU {i} Load After: {after * 100:.2f}%")
                print(f"GPU {i} Load Change: {change:.2f}%")
        else:
            print(f"Request Error, Code: {response.status_code}, Info: {response.text}")

    except requests.exceptions.RequestException as e:
        print(f"Request Error, Info: {e}")
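
`send_request` divides `total_tokens` by wall-clock time, so prompt tokens count toward the reported speed. A minimal sketch of a generation-only figure follows. It assumes the server returns an OpenAI-style `usage` object with `prompt_tokens` and `completion_tokens` fields, which this notebook does not confirm. The helper name `generation_speed` and the example numbers are hypothetical.

```python
def generation_speed(usage, elapsed_seconds):
    """Return completion tokens per second, or 0.0 when it cannot be computed."""
    completion = usage.get("completion_tokens")
    if completion is None:
        # Assumed fallback: derive completion tokens from the other two fields.
        completion = usage.get("total_tokens", 0) - usage.get("prompt_tokens", 0)
    if elapsed_seconds <= 0:
        return 0.0
    return completion / elapsed_seconds


# Made-up usage numbers, for illustration only.
example_usage = {"prompt_tokens": 20, "completion_tokens": 280, "total_tokens": 300}
print(f"{generation_speed(example_usage, 7.0):.2f} tokens/s")
```

Inside `send_request`, this could be called as `generation_speed(result['usage'], total_time)`. Note that the elapsed time still includes prompt processing and network overhead.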




# Simple Question

In [9]:
# Define the simple query payload
simple_payload = {
    "model": "gemma-2-2b-it",  # Change model here to "llama3.1-8b-instruct" if needed
    "messages": [{"role": "user", "content": "How many states are there in the United States? List them"}],
    "temperature": 0.5
}

# Example of how to use the function
print("Sending a simple query in Gemma:")
send_request(simple_payload)

Sending a simple query in Gemma:
Answer: There are **50** states in the United States of America. 

Here they are in alphabetical order:

1. Alabama
2. Alaska
3. Arizona
4. Arkansas
5. California
6. Colorado
7. Connecticut
8. Delaware
9. Florida
10. Georgia
11. Hawaii
12. Idaho
13. Illinois
14. Indiana
15. Iowa
16. Kansas
17. Kentucky
18. Louisiana
19. Maine
20. Maryland
21. Massachusetts
22. Michigan
23. Minnesota
24. Mississippi
25. Missouri
26. Montana
27. Nebraska
28. Nevada
29. New Hampshire
30. New Jersey
31. New Mexico
32. New York
33. North Carolina
34. North Dakota
35. Ohio
36. Oklahoma
37. Oregon
38. Pennsylvania
39. Rhode Island
40. South Carolina
41. South Dakota
42. Tennessee
43. Texas
44. Utah
45. Vermont
46. Virginia
47. Washington
48. West Virginia
49. Wisconsin
50. Wyoming 
<end_of_turn>
Token Number: 303
Avg Token Speed: 47.45 tokens/s
Cost time: 6.39 s
GPU 0 Load Before: 9.00%
GPU 0 Load After: 83.00%
GPU 0 Load Change: 74.00%
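
The before and after readings above are two snapshots, and the second is taken once the response has already returned, so they may miss the load during generation. One possible alternative, sketched here under assumptions, polls a reader function in a background thread while the request runs. The helper `sample_during` is hypothetical, and the stand-in task and readings let the sketch run without a GPU or a server.

```python
import threading
import time


def sample_during(task, read_load, interval=0.1):
    """Run task() while polling read_load() in a background thread.

    Returns (task_result, list_of_samples).
    """
    samples = []
    stop = threading.Event()

    def poll():
        # Keep sampling until the main thread signals that the task finished.
        while not stop.is_set():
            samples.append(read_load())
            stop.wait(interval)

    worker = threading.Thread(target=poll, daemon=True)
    worker.start()
    try:
        result = task()
    finally:
        stop.set()
        worker.join()
    return result, samples


# Stand-in reader and task (fabricated values, illustration only).
fake_readings = iter([0.09, 0.83, 0.80, 0.75] + [0.5] * 1000)
result, samples = sample_during(
    task=lambda: time.sleep(0.05) or "done",
    read_load=lambda: next(fake_readings),
    interval=0.01,
)
print(result, f"peak load: {max(samples) * 100:.0f}%")
```

With a real GPU, `read_load` could be `lambda: GPUtil.getGPUs()[0].load` and `task` could wrap the `requests.post` call. The polling interval is a guess and would need tuning.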


In [8]:
simple_payload = {
    "model": "llama3.1-8b-instruct", 
    "messages": [{"role": "user", "content": "How many states are there in the United States? List them"}],
    "temperature": 0.5
}
# Example of how to use the function
print("Sending a simple query in LLaMa 3.1:")
send_request(simple_payload)

Sending a simple query in LLaMa 3.1:
Answer: There are currently 50 states in the United States of America. Here is the list of all 50 states, in alphabetical order:

1. Alabama (AL)
2. Alaska (AK)
3. Arizona (AZ)
4. Arkansas (AR)
5. California (CA)
6. Colorado (CO)
7. Connecticut (CT)
8. Delaware (DE)
9. Florida (FL)
10. Georgia (GA)
11. Hawaii (HI)
12. Idaho (ID)
13. Illinois (IL)
14. Indiana (IN)
15. Iowa (IA)
16. Kansas (KS)
17. Kentucky (KY)
18. Louisiana (LA)
19. Maine (ME)
20. Maryland (MD)
21. Massachusetts (MA)
22. Michigan (MI)
23. Minnesota (MN)
24. Mississippi (MS)
25. Missouri (MO)
26. Montana (MT)
27. Nebraska (NE)
28. Nevada (NV)
29. New Hampshire (NH)
30. New Jersey (NJ)
31. New Mexico (NM)
32. New York (NY)
33. North Carolina (NC)
34. North Dakota (ND)
35. Ohio (OH)
36. Oklahoma (OK)
37. Oregon (OR)
38. Pennsylvania (PA)
39. Rhode Island (RI)
40. South Carolina (SC)
41. South Dakota (SD)
42. Tennessee (TN)
43. Texas (TX)
44. Utah (UT)
45. Vermont (VT)
46. Virginia (VA)

# Complex Question
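
The payloads in this section replay an earlier exchange before asking a follow-up question. In OpenAI-style chat APIs, earlier model replies are normally sent with the `assistant` role, while `system` is reserved for instructions. A small check like the one below can flag a history that mixes these up. The helper `check_roles` is hypothetical, and strict user/assistant alternation is an assumption; some servers accept other arrangements.

```python
def check_roles(messages):
    """Return a list of problems found in an OpenAI-style message list."""
    problems = []
    # An optional leading system message is allowed and skipped.
    if messages and messages[0]["role"] == "system":
        turns = messages[1:]
    else:
        turns = messages
    for index, message in enumerate(turns):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            problems.append(
                f"turn {index}: expected {expected!r}, got {message['role']!r}"
            )
    if not turns or turns[-1]["role"] != "user":
        problems.append("conversation should end with a user message")
    return problems


history = [
    {"role": "system", "content": "You are an assistant capable of discussing complex topics."},
    {"role": "user", "content": "Explain quantum entanglement."},
    {"role": "assistant", "content": "Quantum entanglement is a physical phenomenon..."},
    {"role": "user", "content": "How can it be used in computing?"},
]
print(check_roles(history))  # prints []
```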

In [10]:
multi_turn_payload = {
    "model": "gemma-2-2b-it",
    "messages": [
        {"role": "system", "content": "You are an assistant capable of discussing complex topics."},
        {"role": "user", "content": "Explain quantum entanglement."},
        {"role": "assistant", "content": "Quantum entanglement is a physical phenomenon..."},  # earlier model reply
        {"role": "user", "content": "How can it be used in computing?"}
    ],
    "temperature": 0.7
}

# Example of how to use the function
print("\nSending a complex, multi-turn conversation in Gemma:")
send_request(multi_turn_payload)


Sending a complex, multi-turn conversation in Gemma:
Answer: Quantum entanglement is a mind-bending phenomenon in quantum mechanics where two or more particles are intrinsically linked, even when separated by vast distances. This link, often described as "spooky action at a distance" by Einstein, means that measuring the state of one entangled particle instantly affects the state of the other, regardless of the distance between them. 

Here's a breakdown of how it's being harnessed in computing:

**The Power of Entanglement in Quantum Computing:**

1. **Superposition and Entanglement in Quantum Computers:**  
   - **Superposition:** In the quantum realm, particles can exist in multiple states simultaneously. This is unlike classical bits, which are either 0 or 1. 
   - **Entanglement:**  Entangled particles share a special connection.  When you measure one entangled particle, you instantly know the state of the other, even if they're light-years apart. 
   -  This allows quantum compu

In [13]:
multi_turn_payload = {
    "model": "llama3.1-8b-instruct",
    "messages": [
        {"role": "system", "content": "You are an assistant capable of discussing complex topics."},
        {"role": "user", "content": "Explain quantum entanglement."},
        {"role": "assistant", "content": "Quantum entanglement is a physical phenomenon..."},  # earlier model reply
        {"role": "user", "content": "How can it be used in computing?"}
    ],
    "temperature": 0.7
}

# Example of how to use the function
print("\nSending a complex, multi-turn conversation in LLaMa:")
send_request(multi_turn_payload)


Sending a complex, multi-turn conversation in LLaMa:
Answer: Quantum entanglement is a fundamental concept in quantum mechanics that describes the interconnectedness of two or more particles in a way that their properties, such as spin or momentum, become correlated. This means that the state of one particle can be instantaneously affected by the state of the other, regardless of the distance between them.

In computing, entanglement can be used in several ways:

1. **Quantum parallelism**: Entangled particles can be used to perform many calculations simultaneously, allowing for faster computation and solving complex problems that are intractable classically. This is the idea behind quantum parallelism, which can be used for tasks such as factoring large numbers, searching an unsorted database, and simulating quantum systems.
2. **Quantum teleportation**: Entanglement can be used to transfer information from one particle to another without physical movement, allowing for secure commun

# Multiple Parameters
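
This section tests a handful of hand-picked sampling parameter sets. If a full grid is wanted instead, the combinations could be generated rather than typed out. A minimal sketch using `itertools.product` follows. The helper `parameter_grid` and the example values are hypothetical and are not the sets used in this notebook.

```python
from itertools import product


def parameter_grid(options):
    """Return one dict per combination of the given option lists."""
    keys = list(options)
    value_lists = [options[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


# Illustrative values only.
grid = parameter_grid({"temperature": [0.3, 0.8], "top_p": [0.9, 1.0]})
for params in grid:
    print(params)
```

Each resulting dict can be merged into a base payload with `{**base_payload, **params}`. The grid grows multiplicatively with each added parameter, so it is worth keeping the option lists short.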

In [14]:
# Define a base payload
base_payload = {
    "model": "llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Compose a poem about autumn."}],
}

# Parameters to test
parameter_variations = [
    {"temperature": 0.5, "top_p": 0.9, "frequency_penalty": 0, "presence_penalty": 0},
    {"temperature": 1.0, "top_p": 0.9, "frequency_penalty": 0.5, "presence_penalty": 0.5},
    {"temperature": 0.8, "top_p": 0.8, "frequency_penalty": 0.1, "presence_penalty": 0.1},
    {"temperature": 0.3, "top_p": 1.0, "frequency_penalty": 0, "presence_penalty": 0}
]


# Execute tests with different parameter sets
for params in parameter_variations:
    # Update base payload with new parameters
    payload = {**base_payload, **params}
    print(f"\nTesting with parameters: {params}")
    send_request(payload)



Testing with parameters: {'temperature': 0.5, 'top_p': 0.9, 'frequency_penalty': 0, 'presence_penalty': 0}
Answer: As summer's warmth begins to fade,
A fiery glow across the shade,
Autumn's palette, vibrant and bright,
Unfolds its splendor, a wondrous sight.

The trees, like sentinels of old,
Don coats of gold, their leaves to hold,
Crimson, amber, and sienna hues,
A final dance, before winter's muse.

The air is crisp, the winds do blow,
Carrying scents of woodsmoke and decay,
As nature's final harvest comes to pass,
And earthy aromas fill the grass.

The sun, a burning ember, sets,
 Casting long shadows, where the trees are set,
Their branches etched, against the sky,
A fleeting beauty, as the seasons sigh.

The forest floor, a carpet deep,
A crunchy softness, where leaves do creep,
Golden, rustling, and softly fall,
A soothing melody, that echoes through it all.

Autumn's beauty, a bittersweet refrain,
A time of transition, a season's pain,
Yet, in its melancholy, we find peace,
A 