# Competition

The score for this homework counts as extra credit toward the Stats portion of CEM II.

Anyone who completes the competition will be ranked. The higher your ranking, the more extra credit you will receive. The minimum extra credit is 2 points, and the maximum is 4 points.

**The winner will also receive a physical grand prize from your dearest boss, Hamtaro!**

Hamtaro wants to tweak his e-commerce website for his one-time Super Father's Day Grand Sale Extravaganza, which runs from 04/12/24 until 10/12/24. He has 6 possible tweaks.

*   Tweak 1: Display price in THB or Hamtaro coins
*   Tweak 2: Buy button at the bottom or at the top of the page
*   Tweak 3: Checkout process requires sign in or not
*   Tweak 4: Checkout process has orange button or grey button
*   Tweak 5: Checkout process has music or not
*   Tweak 6: Checkout process has final confirmation or not

Figure out which combination of tweaks should be used via a multi-armed bandit.

In other words, choose the combination of six binary settings that maximizes the conversion rate.

Note: the information for Tweak 1-6 is just flavor text. You should not use this information to help solve the problem.

In [None]:
import requests
import json
import random
from tqdm import tqdm

url = 'http://35.247.177.10/'

# Instruction

There is a 6-arm bandit (several arms can be toggled at once). Each pull returns a single binary reward, zero or one.
In other words, the reward depends on which arms are toggled, and there are $2^6$ combinations in total.
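
To make the $2^6$ count concrete, the sketch below enumerates every subset of the six arms. The helper name `all_arm_combinations` is illustrative and not part of the assignment API; whether the server accepts an empty `arms` list (all tweaks off) is not stated here, so treat that entry as an assumption.

```python
from itertools import combinations

def all_arm_combinations(n_arms=6):
    """Return every subset of arms 1..n_arms as a sorted list of ints."""
    arms = range(1, n_arms + 1)
    return [list(c) for k in range(n_arms + 1) for c in combinations(arms, k)]

combos = all_arm_combinations()
print(len(combos))   # 64, counting the empty list (all tweaks off)
print(combos[:4])    # [[], [1], [2], [3]]
```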

You will receive an account that can pull **1000** times.
Your goal is to maximize the cumulative reward of the **last 500 pulls**. In order to be eligible for the competition you must pull at least 501 times.

Note: the arm probability for each person is randomly assigned and will be different for each student.
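
Because each account has only 1000 pulls, it can help to rehearse a strategy offline before spending real pulls. Below is a minimal sketch of a local stand-in for the bandit. The class `FakeBandit`, its hidden probabilities, and the seed are all assumptions made for illustration; they say nothing about how the real server assigns rewards.

```python
import random

class FakeBandit:
    """Local stand-in: one hidden Bernoulli probability per arm combination."""

    def __init__(self, n_arms=6, limit=1000, seed=0):
        self.rng = random.Random(seed)
        self.limit = limit
        self.history = []
        # One made-up conversion probability for each of the 2**n_arms subsets.
        self.probs = {mask: self.rng.uniform(0.05, 0.6) for mask in range(2 ** n_arms)}

    def _mask(self, arms):
        # Encode a list such as [1, 4] as a bitmask.
        return sum(1 << (a - 1) for a in set(arms))

    def pull(self, arms, times=1):
        """Return up to `times` binary rewards, never exceeding the pull limit."""
        times = min(times, self.limit - len(self.history))
        p = self.probs[self._mask(arms)]
        rewards = [int(self.rng.random() < p) for _ in range(times)]
        self.history.extend(rewards)
        return rewards

bandit = FakeBandit()
print(bandit.pull([1, 4], times=10))
print(sum(bandit.history[-500:]))  # score over the most recent (at most 500) pulls
```

Any exploration policy can then be run against `FakeBandit.pull` in place of the HTTP call.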

# Pull arm

Send a **POST** request to http://35.247.177.10/update_state. \
You have to pass 2 attributes in the request JSON:


*   times: number of pulls (positive integer). If `times` > 1, the same arm configuration, defined in `arms`, is used for every pull. This parameter exists to reduce the number of requests.
*   arms: list of the arms you want to turn on (maximum length = 6, each arm number in [1,6])
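
As a sketch of the two attributes above, the hypothetical helper `build_payload` checks the inputs locally before a request is spent. Sorting and de-duplicating `arms` assumes that the order of arms does not matter to the server, which the instructions do not state explicitly.

```python
def build_payload(times, arms):
    """Validate inputs and return the JSON body for update_state."""
    if not isinstance(times, int) or times < 1:
        raise ValueError("times must be a positive integer")
    arms = sorted(set(arms))
    if len(arms) > 6 or any(a not in range(1, 7) for a in arms):
        raise ValueError("arms must be distinct integers between 1 and 6")
    return {"times": times, "arms": arms}

print(build_payload(10, [5, 1, 2, 3, 4]))
# {'times': 10, 'arms': [1, 2, 3, 4, 5]}
```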


You can get the token from mycourseville. Go to the CEM2 course page and look at your portfolio. **Your authorization token is the concatenation of two scores, `<token1><token2>`**. Each token is a five-digit number.

**!! Don't forget to pass your token as a string in the request headers !!**
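
The concatenation can be sketched as follows, including the zero-padding mentioned in the example cell below. The function name `make_token` and the sample values are made up for illustration.

```python
def make_token(token1, token2):
    """Join two scores into a ten-character token, zero-padding each to five digits."""
    return str(token1).zfill(5) + str(token2).zfill(5)

print(make_token(123, 45678))  # 0012345678
```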

In [None]:
import requests
import json

# In this example call, you are showing 10 customers
#    with tweaks 1, 2, 3, 4, and 5 set to yes
#    and tweak 6 set to no.

your_token = '' # enter a ten-character string: the concatenation
                # of the two five-digit tokens given to you.
                # if a token has fewer than five digits, pad it with leading zeros.

r = requests.post(f'{url}/update_state',
                  json={"times":10, "arms":[1,2,3,4,5]},
                  headers= {"Authorization": your_token})

In [None]:
r.content

b'{"limit_reach":false,"request_reward":[0,1,0,1,0,0,1,0,1,0],"status":200}\n'

In [None]:
json.loads(r.content)

{'limit_reach': False,
 'request_reward': [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
 'status': 200}

**Response** \
You will receive a JSON object in the response content. \
\
In the JSON:

*   limit_reach: True if your account has reached 1000 pulls, False otherwise.
*   request_reward: list of rewards for your request. The $i^{th}$ element is the reward of the $i^{th}$ pull. The length of this list equals `times`.
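
A small sketch of reading these fields, using the response body from the example call above. The helper name `summarize_response` is hypothetical.

```python
import json

# Response body copied from the example call above.
raw = b'{"limit_reach":false,"request_reward":[0,1,0,1,0,0,1,0,1,0],"status":200}\n'

def summarize_response(content):
    """Return (wins, pulls, conversion rate) for one update_state response."""
    data = json.loads(content)
    rewards = data["request_reward"]
    wins = sum(rewards)
    rate = wins / len(rewards) if rewards else 0.0
    return wins, len(rewards), rate

print(summarize_response(raw))  # (4, 10, 0.4)
```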


# Get the state of your account

In case you did not record your past rewards, you can use our API to retrieve all of the rewards you have received.

Send a **GET** request to http://35.247.177.10/get_state.

**!! Don't forget to pass your token as a string in the request headers !!**

In [None]:
r = requests.get(f'{url}/get_state',
                  headers= {"Authorization": your_token})

In [None]:
json.loads(r.content)

{'state': {'count': 10,
  'cumulative_reward': 4,
  'reward_list': [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]},
 'status': 200}

**Response** \
You will receive a JSON object in the response content. \
\
In the JSON, your account state is inside the `state` attribute:

*   count: number of pulls you have already used.
*   cumulative_reward: sum of the rewards received from the first request up to the latest one.
*   reward_list: full history of received rewards. The $i^{th}$ element is the reward of the $i^{th}$ pull. This is an extended version of the `request_reward` you receive from the update API.
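
The three fields should agree with one another, which gives a cheap sanity check on a stored history. The sketch below uses the state from the example response above. The helper name `check_state` is hypothetical, and the invariant itself is an assumption that the examples in this notebook happen to satisfy.

```python
# State copied from the example get_state response above.
state = {
    "count": 10,
    "cumulative_reward": 4,
    "reward_list": [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
}

def check_state(state):
    """Return True when count and cumulative_reward agree with reward_list."""
    rewards = state["reward_list"]
    return state["count"] == len(rewards) and state["cumulative_reward"] == sum(rewards)

print(check_state(state))  # True
```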

# Beware of the limit
After you reach the allowed number of pulls, the API will not return any additional rewards.

In [None]:
r = requests.post(f'{url}/update_state',
                  json={"times":10, "arms":[1,2,3,4,5]},
                  headers= {"Authorization": your_token})

In [None]:
json.loads(r.content)

{'limit_reach': False,
 'request_reward': [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
 'status': 200}

However, you can still get your **whole** reward history from the `get_state` API.

Your token can pull the bandit's arm for exactly `1000` times.
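
One way to avoid sending a request that asks for more pulls than remain is to cap `times` on the client side. This is only a sketch: `clamp_times` is a hypothetical helper, and `used` would come from the `count` field of `get_state`.

```python
PULL_LIMIT = 1000  # pulls allowed per account, as stated in the instructions

def clamp_times(requested, used, limit=PULL_LIMIT):
    """Return how many of the requested pulls still fit within the limit."""
    remaining = max(limit - used, 0)
    return min(requested, remaining)

print(clamp_times(10, 995))   # 5
print(clamp_times(10, 1000))  # 0
```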

In [None]:
r = requests.get(f'{url}/get_state',
                  headers= {"Authorization": your_token})

In [None]:
json.loads(r.content)

{'state': {'count': 20,
  'cumulative_reward': 5,
  'reward_list': [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]},
 'status': 200}

# Generate Class

In [None]:
class Tweak:
  def __init__(self, number):
    self.tweak = number
    self.on = dict()
    self.off = dict()
    self.on['n'],self.on['win'] = 0,0
    self.off['n'],self.off['win']= 0,0

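  def update(self, n, win, on=True):
    # Add n pulls and win successes to the chosen side (on by default).
    side = self.on if on else self.off
    side['n'] = side['n'] + n
    side['win'] = side['win'] + win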

  def calculate_prob(self):
    prob_on = self.on['win'] / self.on['n'] if self.on['n'] > 0 else 0
    prob_off = self.off['win'] / self.off['n'] if self.off['n'] > 0 else 0
    return (prob_on, prob_off)

In [None]:
samples = [] # stores the randomly drawn arm combinations

In [None]:
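def generate_sample():
  s = []
  c = dict()  # combinations recorded so far
  for i in tqdm(range(30)):
    # Toggle each of the six arms on with probability 1/2.
    combo = []
    for j in range(1,7):
      choose = random.choice([0,1])
      if choose:
        combo.append(j)
    t = tuple(combo)
    if t in c:
      # Redraw until the combination is not among the recorded ones.
      # The redrawn combination is not added to c, so a later draw can
      # still repeat it; recheck() is used afterwards to catch that.
      while t in c:
        combo = []
        for j in range(1,7):
          choose = random.choice([0,1])
          if choose:
            combo.append(j)
        t = tuple(combo)
    else:
      c[t] = 1
    s.append(combo)
  return s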

def recheck(l):
  # Return True if the list contains a duplicated combination.
  c = dict()
  s = False
  for sample in l:
    key = tuple(sample)
    if key in c:
      print(f'Duplicate in list: {l}, duplicated value: {key}')
      s = True
      break
    else:
      c[key] = 1
  return s

In [None]:
count = 0
samples = generate_sample()
while recheck(samples):
  count += 1
  # print(f"Round {count}: found a duplicate! -> {samples}") # inspect the sampled values

  if count > 10: # guard against an infinite loop during testing
      print("Emergency stop: loop limit exceeded")
      break
  samples = generate_sample()


100%|██████████| 30/30 [00:00<00:00, 119951.50it/s]


Duplicate in list: [[3, 4, 5, 6], [2, 6], [2], [1, 5], [1, 5, 6], [1, 2, 5, 6], [3, 5, 6], [2, 3, 5], [2, 4, 5, 6], [1, 4], [1, 3, 4, 6], [2, 3, 5, 6], [2, 4, 5], [1, 6], [2, 3, 4, 5, 6], [2, 5, 6], [2, 4, 5, 6], [1, 3, 4], [1, 2, 4, 5, 6], [6], [1, 3], [3, 6], [1], [2, 5], [1, 4, 5, 6], [1, 2], [1, 4, 6], [1, 4, 5], [1, 2, 3, 6], [1, 2, 4, 5]], duplicated value: (2, 4, 5, 6)


100%|██████████| 30/30 [00:00<00:00, 96199.63it/s]


Duplicate in list: [[1, 5], [1, 4, 5, 6], [4, 5], [1, 2, 3, 4], [1, 2, 3, 5, 6], [2, 3, 4], [1, 3, 4], [1, 3, 4, 6], [1, 3, 6], [3, 4], [3], [2, 4], [3, 6], [3, 4, 5], [1, 2, 5], [2, 3, 5], [1, 4], [], [1, 2, 4, 6], [1, 4, 6], [2, 5, 6], [1, 2, 4], [2, 6], [1, 2], [2, 3, 5, 6], [2], [2, 4, 6], [1, 2, 3, 4, 5], [1, 2, 4], []], duplicated value: (1, 2, 4)


100%|██████████| 30/30 [00:00<00:00, 75437.12it/s]


Duplicate in list: [[1, 2, 3, 5, 6], [4, 5, 6], [1, 2, 3, 4], [1, 3, 5], [1, 2, 3, 6], [3, 4], [1, 2, 4, 5, 6], [1, 3, 4, 5], [1, 2, 4, 5], [1, 4, 6], [1, 2, 6], [2, 3, 5, 6], [1, 2, 5], [2, 5, 6], [5], [2, 3], [2, 3, 4, 5, 6], [2, 4, 6], [1, 4, 5], [1, 3, 4, 6], [2, 3, 4, 5], [2, 3, 4, 5], [1], [2, 3, 5], [], [1, 3, 6], [2, 3, 6], [6], [1, 5], [1, 3, 4]], duplicated value: (2, 3, 4, 5)


100%|██████████| 30/30 [00:00<00:00, 110183.12it/s]


Duplicate in list: [[3, 4, 5], [2, 3, 5, 6], [1, 4, 5, 6], [1, 2, 5, 6], [1, 2, 3, 4, 5, 6], [2, 5, 6], [1, 5], [1, 3, 4], [2], [1, 3], [1, 2, 4, 5, 6], [3, 4, 6], [1, 3, 6], [2, 4, 5, 6], [], [1, 3, 4, 5], [1, 2, 3, 5, 6], [4, 6], [1, 2, 5], [1, 2, 4], [1], [1, 3, 4, 6], [3], [4, 5], [1, 6], [6], [2, 3, 4], [1, 6], [5, 6], [1, 2, 4, 5]], duplicated value: (1, 6)


100%|██████████| 30/30 [00:00<00:00, 50840.05it/s]


Duplicate in list: [[5, 6], [2, 4, 6], [1, 2, 3, 4, 5], [1, 3, 4, 5, 6], [2, 4, 5], [1], [2, 4], [2, 4], [4, 5], [1, 2, 6], [1, 2, 3, 6], [2, 5], [3, 4, 5], [2, 3, 5, 6], [1, 3], [2, 6], [2, 3, 5, 6], [1, 2, 4], [1, 2, 3, 5, 6], [4], [1, 2, 4, 5, 6], [2, 3, 4, 5, 6], [3, 5], [1, 2, 3, 5, 6], [1, 2, 4, 6], [2, 4, 5, 6], [1, 4, 5], [1, 3, 6], [1, 2, 5, 6], [1, 2, 4, 5]], duplicated value: (2, 4)


100%|██████████| 30/30 [00:00<00:00, 86243.40it/s]


Duplicate in list: [[1, 4, 5, 6], [1, 2], [3, 4], [1, 4], [4, 5, 6], [3, 4, 5], [2, 6], [1, 2, 3, 4, 6], [2], [2, 5], [1], [2, 4, 5, 6], [1, 3, 4], [1, 6], [1, 2, 4], [4], [1, 2, 3], [1, 2, 6], [2, 3, 6], [2, 3, 4, 6], [1, 2, 4, 5, 6], [5, 6], [2, 3, 4, 5], [1, 3, 4, 5, 6], [3, 4, 5, 6], [1, 4, 6], [4, 6], [1, 3], [2, 3, 4, 6], [2, 3, 5]], duplicated value: (2, 3, 4, 6)


100%|██████████| 30/30 [00:00<00:00, 85539.85it/s]


Duplicate in list: [[2, 3, 4, 6], [1, 4], [5], [4, 6], [2, 3, 5], [], [6], [1], [1, 2, 6], [1, 5], [1, 2, 3, 4, 5], [1, 2, 3], [1, 2, 4, 5], [1, 2, 4, 5, 6], [2, 3, 4], [1, 3, 6], [3, 4], [3, 5, 6], [1, 3, 4, 5], [1, 3, 5, 6], [1, 5, 6], [2, 4], [2, 4, 6], [1, 4, 6], [1, 2, 3, 4], [3, 4, 5, 6], [1, 2, 4, 5, 6], [1, 6], [4, 5], [1, 3, 4]], duplicated value: (1, 2, 4, 5, 6)


100%|██████████| 30/30 [00:00<00:00, 100102.72it/s]


Duplicate in list: [[1, 2, 3, 5, 6], [3, 4, 6], [2, 3, 4, 5], [1, 2, 4], [1, 2, 3, 6], [1], [1, 2, 5, 6], [3, 4], [1, 5], [], [4], [1, 4, 6], [4, 6], [1, 2], [3], [3, 4, 5, 6], [5], [1, 2, 3, 4], [1, 2, 3, 5], [1, 4], [2, 3, 6], [1, 6], [2, 3, 5, 6], [1, 2, 6], [1, 2, 4, 5], [1, 3, 4], [2, 3, 5, 6], [2], [1, 3, 4, 5, 6], [2, 3, 6]], duplicated value: (2, 3, 5, 6)


100%|██████████| 30/30 [00:00<00:00, 98380.86it/s]


In [None]:
for i in range(30):
  print(samples[i])

[4, 5, 6]
[1]
[2, 3, 5, 6]
[1, 2, 3, 4, 5, 6]
[1, 2, 4, 5]
[1, 2, 3, 5]
[2, 3, 4]
[1, 6]
[2, 3]
[1, 3, 5]
[1, 2, 3, 5, 6]
[1, 3, 4, 5]
[1, 4, 6]
[2, 5]
[2]
[1, 2, 3, 4, 5]
[1, 5, 6]
[3, 4]
[1, 3, 4, 6]
[2, 5, 6]
[6]
[4, 6]
[3, 4, 5, 6]
[1, 4, 5, 6]
[1, 2, 5]
[2, 3, 5]
[1, 2, 3, 6]
[1, 3]
[1, 4]
[1, 3, 5, 6]


In [None]:
if recheck(samples):
  print('Duplicates found')
else:
  print('No duplicates')

No duplicates
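
An alternative that needs no retry loop is to draw the 30 combinations without replacement from the full set of 64. This is a sketch of a different sampling scheme, not the one used in the cells above; `sample_distinct_combinations` is a hypothetical name, and the population includes the empty list (all tweaks off).

```python
import random
from itertools import combinations

def sample_distinct_combinations(k=30, n_arms=6, seed=None):
    """Draw k distinct arm subsets uniformly from all 2**n_arms possibilities."""
    arms = range(1, n_arms + 1)
    population = [list(c) for r in range(n_arms + 1) for c in combinations(arms, r)]
    return random.Random(seed).sample(population, k)

picked = sample_distinct_combinations(seed=42)
print(len(picked), len({tuple(p) for p in picked}))  # 30 30
```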


In [None]:
check = dict()
for i in range(30):
  for j in samples[i]:
    if j in check:
      check[j] += 1
    else:
      check[j] = 1
print(check)

{4: 13, 5: 17, 6: 15, 1: 18, 2: 14, 3: 16}


In [None]:
import csv
filename = "samples.csv"
with open(file=filename, mode='w', newline='') as file:
  writer = csv.writer(file)
  writer.writerows(samples)

In [None]:
token = '2303311990'

In [None]:
first_phase = []

In [None]:
print(len(samples))

30


In [None]:
for i in tqdm(range(len(samples))):
  r = requests.post(f'{url}/update_state',
    json={"times":10, "arms":samples[i]},
    headers= {"Authorization": token})
  if r.status_code == 200:
    first_phase.append(r.json())
  else:
    print(f"Error {r.status_code}: {r.text}")

100%|██████████| 30/30 [00:14<00:00,  2.14it/s]


In [None]:
print(first_phase)

[{'limit_reach': False, 'request_reward': [0, 0, 0, 1, 0, 0, 0, 0, 1, 0], 'status': 200}, {'limit_reach': False, 'request_reward': [0, 0, 1, 0, 1, 0, 1, 0, 0, 1], 'status': 200}, {'limit_reach': False, 'request_reward': [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], 'status': 200}, {'limit_reach': False, 'request_reward': [0, 0, 1, 0, 1, 0, 0, 0, 0, 0], 'status': 200}, {'limit_reach': False, 'request_reward': [0, 0, 0, 1, 1, 0, 0, 1, 1, 1], 'status': 200}, {'limit_reach': False, 'request_reward': [1, 0, 0, 0, 0, 0, 1, 0, 0, 0], 'status': 200}, {'limit_reach': False, 'request_reward': [0, 0, 0, 0, 1, 0, 0, 0, 0, 1], 'status': 200}, {'limit_reach': False, 'request_reward': [0, 0, 1, 0, 0, 1, 0, 0, 0, 0], 'status': 200}, {'limit_reach': False, 'request_reward': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0], 'status': 200}, {'limit_reach': False, 'request_reward': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'status': 200}, {'limit_reach': False, 'request_reward': [0, 0, 0, 1, 0, 0, 0, 0, 1, 0], 'status': 200}, {'limit_reach': Fals

In [None]:
filename = "first_phase.csv"
with open(file=filename, mode='w', newline='') as file:
  writer = csv.writer(file)
  for i in first_phase:
    writer.writerow(i['request_reward'])

In [None]:
t1,t2,t3,t4,t5,t6 = Tweak(1),Tweak(2),Tweak(3),Tweak(4),Tweak(5),Tweak(6)

In [None]:
maximize = dict()

In [None]:
for index,result in enumerate(first_phase):
  res = tuple(samples[index])
  count = sum(result['request_reward'])
  p = count/10
  maximize[res] = p
  # Credit this batch of 10 pulls to the on or off side of every tweak.
  for t in (t1,t2,t3,t4,t5,t6):
    side = t.on if t.tweak in samples[index] else t.off
    side['n']+=10
    side['win']+=count

In [None]:
print(maximize)

{(4, 5, 6): 0.2, (1,): 0.4, (2, 3, 5, 6): 0.1, (1, 2, 3, 4, 5, 6): 0.2, (1, 2, 4, 5): 0.5, (1, 2, 3, 5): 0.2, (2, 3, 4): 0.2, (1, 6): 0.2, (2, 3): 0.3, (1, 3, 5): 0.0, (1, 2, 3, 5, 6): 0.2, (1, 3, 4, 5): 0.2, (1, 4, 6): 0.2, (2, 5): 0.1, (2,): 0.4, (1, 2, 3, 4, 5): 0.2, (1, 5, 6): 0.1, (3, 4): 0.5, (1, 3, 4, 6): 0.3, (2, 5, 6): 0.3, (6,): 0.4, (4, 6): 0.5, (3, 4, 5, 6): 0.4, (1, 4, 5, 6): 0.2, (1, 2, 5): 0.3, (2, 3, 5): 0.3, (1, 2, 3, 6): 0.2, (1, 3): 0.2, (1, 4): 0.5, (1, 3, 5, 6): 0.3}


In [None]:
for key in maximize:
  if maximize[key] == max(maximize.values()):
    print(key)

(1, 2, 4, 5)
(3, 4)
(4, 6)
(1, 4)


In [None]:
print(f'tweak1: {t1.calculate_prob()}')
print(f'tweak2: {t2.calculate_prob()}')
print(f'tweak3: {t3.calculate_prob()}')
print(f'tweak4: {t4.calculate_prob()}')
print(f'tweak5: {t5.calculate_prob()}')
print(f'tweak6: {t6.calculate_prob()}')

tweak1: (0.24444444444444444, 0.30833333333333335)
tweak2: (0.25, 0.2875)
tweak3: (0.2375, 0.30714285714285716)
tweak4: (0.3153846153846154, 0.23529411764705882)
tweak5: (0.2235294117647059, 0.33076923076923076)
tweak6: (0.25333333333333335, 0.2866666666666667)
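
Each on/off rate above rests on only 120 to 180 pulls, so a gap of a few points may be noise. The sketch below estimates the standard error of the tweak 4 difference with a normal approximation. The counts 41/130 and 40/170 are back-calculated from the printed rates and from tweak 4 being on in 13 of the 30 batches; `diff_with_stderr` is a hypothetical helper.

```python
import math

def diff_with_stderr(win_a, n_a, win_b, n_b):
    """Return (difference of rates, standard error of that difference)."""
    p_a, p_b = win_a / n_a, win_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return p_a - p_b, se

# Tweak 4: on = 41 wins in 130 pulls, off = 40 wins in 170 pulls.
diff, se = diff_with_stderr(41, 130, 40, 170)
print(f"diff={diff:.3f}, se={se:.3f}, z={diff / se:.2f}")
```

The difference comes out at roughly 1.5 standard errors, which is suggestive rather than conclusive.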


In [None]:
# create 5 candidates
candidates = [
    [4],
    [2,4],
    [4,6],
    [1,4],
    [3,4]
]

In [None]:
round_per_candidate = 40
phase3_stats = {tuple(c): {'wins': 0, 'count': 0} for c in candidates}
phase3 = []

In [None]:
for arm in tqdm(candidates):
  r = requests.post(f'{url}/update_state',
    json={"times":round_per_candidate, "arms":arm},
    headers= {"Authorization": token})
  if r.status_code == 200:
    data = r.json()
    phase3.append({
        "arm": tuple(arm),           # tip: store the arm with its results so they are easy to match up later
        "rewards": data['request_reward'],
        "total_wins": sum(data['request_reward'])
    })
  else:
    print(f"Error {r.status_code}: {r.text}")

100%|██████████| 5/5 [00:02<00:00,  2.10it/s]


In [None]:
print(phase3)

[{'arm': (4,), 'rewards': [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0], 'total_wins': 20}, {'arm': (2, 4), 'rewards': [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0], 'total_wins': 18}, {'arm': (4, 6), 'rewards': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1], 'total_wins': 6}, {'arm': (1, 4), 'rewards': [0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1], 'total_wins': 15}, {'arm': (3, 4), 'rewards': [0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0], 'total_wins': 17}]


In [74]:
filename = "second_phase.csv"
with open(file=filename, mode='w', newline='') as file:
  writer = csv.writer(file)
  for i in phase3:
    writer.writerow(i['rewards'])

In [76]:
lastcandidates = {
    (4,):   {'a': 20 + 1, 'b': 20 + 1},
    (2, 4): {'a': 18 + 1, 'b': 22 + 1}
}
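
With Beta parameters like these, one can estimate how likely it is that one arm is truly better than the other. The sketch below does this by Monte Carlo, using the same parameters as `lastcandidates`. The function name, the seed, and the number of draws are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def prob_first_beats_second(a1, b1, a2, b2, draws=100_000):
    """Monte Carlo estimate of P(theta1 > theta2) for two Beta posteriors."""
    theta1 = rng.beta(a1, b1, size=draws)
    theta2 = rng.beta(a2, b2, size=draws)
    return float(np.mean(theta1 > theta2))

# Same parameters as `lastcandidates`: (4,) -> Beta(21, 21), (2, 4) -> Beta(19, 23).
p = prob_first_beats_second(21, 21, 19, 23)
print(f"P(arm (4,) is better) ~ {p:.2f}")
```

A value well below 1 here means both arms are still worth exploring.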

In [77]:
# Gemini suggested Thompson sampling over UCB: our prior data gives the two arms similar probabilities, so Thompson sampling gives [2,4] more chances, whereas UCB would pick [4,] first because its initial estimate is higher
import numpy as np
from tqdm import tqdm
import requests


arms_list = list(lastcandidates.keys())


for _ in tqdm(range(500)):

    candidate = {}
    for arm in arms_list:
        theta = np.random.beta(lastcandidates[arm]['a'], lastcandidates[arm]['b'])
        candidate[arm] = theta


    best_arm = max(candidate, key=candidate.get)


    try:
        r = requests.post(f'{url}/update_state',
                          json={"times": 1, "arms": list(best_arm)},
                          headers={"Authorization": token})

        if r.status_code == 200:
            res = r.json()
            reward = res['request_reward'][0]

            if reward == 1:
                lastcandidates[best_arm]['a'] += 1
            else:
                lastcandidates[best_arm]['b'] += 1
        else:
            print(f"Error: {r.text}")
            break

    except Exception as e:
        print(e)
        break


100%|██████████| 500/500 [03:58<00:00,  2.09it/s]
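
The same Beta-Bernoulli Thompson sampling loop can be rehearsed offline against simulated arms. In the sketch below, `thompson_simulation` and the `true_probs` values are made up for the rehearsal; the real conversion rates are unknown.

```python
import numpy as np

def thompson_simulation(true_probs, priors, pulls=500, seed=0):
    """Run Beta-Bernoulli Thompson sampling against simulated arms.

    true_probs: hypothetical conversion rate per arm (unknown in the real task).
    priors: (a, b) Beta parameters per arm, e.g. earlier wins + 1 and losses + 1.
    """
    rng = np.random.default_rng(seed)
    params = {arm: list(ab) for arm, ab in priors.items()}
    counts = {arm: 0 for arm in priors}
    total_reward = 0
    for _ in range(pulls):
        # Draw one plausible rate per arm and play the arm with the largest draw.
        draws = {arm: rng.beta(a, b) for arm, (a, b) in params.items()}
        arm = max(draws, key=draws.get)
        reward = int(rng.random() < true_probs[arm])
        params[arm][0 if reward else 1] += 1  # success -> a, failure -> b
        counts[arm] += 1
        total_reward += reward
    return counts, total_reward

counts, total = thompson_simulation(
    true_probs={(4,): 0.48, (2, 4): 0.43},   # made-up rates for the rehearsal
    priors={(4,): (21, 21), (2, 4): (19, 23)},
)
print(counts, total)
```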


In [78]:
for arm in arms_list:
    # Compute the observed win rate (subtract the +1 prior counts when reporting)
    real_wins = lastcandidates[arm]['a'] - 1
    real_losses = lastcandidates[arm]['b'] - 1
    total = real_wins + real_losses
    print(f"Arm {arm}: Win Rate = {real_wins/total:.3f} ({real_wins}/{total})")

Arm (4,): Win Rate = 0.482 (196/407)
Arm (2, 4): Win Rate = 0.428 (74/173)


In [79]:
r = requests.get(f'{url}/get_state',
                  headers= {"Authorization": token})

In [92]:
response = r.json()

In [97]:
print(f"{sum(response['state']['reward_list'][-500:])}/500")

232/500
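
The scoring window is the last 500 entries of `reward_list`. On a synthetic 1000-pull history, a negative slice selects exactly 500 entries, while a slice starting at index 501 selects 499. The helper `last_window_score` and the synthetic history are illustrative only.

```python
def last_window_score(reward_list, window=500):
    """Sum the rewards of the most recent `window` pulls."""
    return sum(reward_list[-window:])

history = [0] * 500 + [1] * 500   # synthetic 1000-pull history
print(len(history[-500:]), last_window_score(history))  # 500 500
print(len(history[501:]))                               # 499
```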
