## Multi-armed bandit problem

Source: これからの機械学習 (Morikita Publishing)

---

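### Problem setting
Consider a slot machine with K arms.  
Let R be the payout for a hit, and let p_k be the probability that pulling arm k (k = 1, 2, ..., K) produces a hit.  
Assume that the state of the slot machine does not change during play.  
  
If the probabilities p_k are known, the player's optimal strategy is to keep pulling the arm k that maximizes R * p_k.  
But when the p_k are unknown, how should the player choose arms to maximize the total payout?  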
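
As a small illustration of the known-probability case, the sketch below computes the expected net gain per play for each arm and picks the best one. The payout of 500, the coin price of 100, and the hit probabilities are the defaults of the `slotmachine` class in this notebook; the names `expected_net` and `best_arm` are introduced here only for the example.

```python
# Values taken from the notebook's slotmachine defaults.
reward_price = 500                 # R: payout for a hit
coin_price = 100                   # cost of one play
probs = (0.2, 0.3, 0.4, 0.5)       # p_k for each arm (0-based index here)

# Expected net gain of one play of arm k: R * p_k - coin price.
expected_net = [reward_price * p - coin_price for p in probs]

# With known probabilities, the optimal strategy is to always pull this arm.
best_arm = max(range(len(probs)), key=lambda k: expected_net[k])

print("expected net gain per arm:", expected_net)
print("best arm:", best_arm)
```

Since the coin price is the same for every arm, maximizing the net gain picks the same arm as maximizing R * p_k.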

In [1]:
# Slot machine program
import numpy as np


class slotmachine(object):
    def __init__(self, 
                 reward_price=500, 
                 coin_price=100, 
                 distribution=(0.2, 0.3, 0.4, 0.5)):

        # Define the reward R and the hit probability p_k of each arm
        self.distributions = {}
        for i in range(len(distribution)):
            self.distributions[i] = (reward_price, distribution[i])
        self.reward_price = reward_price
        self.coin_price = coin_price
    
    def action_space(self):
        return len(self.distributions)
    
    def reward_space(self):
        return (self.reward_price, 0)

    def play(self, action):
        (R, p) = self.distributions[action]
        hit = np.random.choice([0, 1], 1, p=[1-p, p])[0]
        reward = R * hit - self.coin_price
        return reward, int(hit)

In [62]:
# Pull arms at random 10 times
m = slotmachine()
money = 1000
action_space_size = m.action_space()
print("Number of choices =",  action_space_size)

for i in range(10):
    action = np.random.choice(range(action_space_size))
    reward, hit = m.play(action)
    money += reward
    hit_message = "hit!" if hit==1 else "miss"
    print("action = {0}, {1}, money = {2}".format(action, hit_message, money))

Number of choices = 4
action = 1, miss, money = 900
action = 3, miss, money = 800
action = 2, miss, money = 700
action = 1, hit!, money = 1100
action = 1, miss, money = 1000
action = 0, hit!, money = 1400
action = 0, hit!, money = 1800
action = 1, hit!, money = 2200
action = 1, miss, money = 2100
action = 3, hit!, money = 2500


---

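### Greedy algorithm
Algorithm
>If some arm has been chosen fewer than n times, choose that arm  
>Otherwise, compute the average reward so far for every arm  
>　　　　v_i = (sum of rewards obtained from arm i so far) / (number of times arm i has been played so far)  
>Choose the arm with the largest v_i  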

In [6]:
class Agent_greedy(object):
    def __init__(self, action_space_size, min_choose=1):
        self.action_space_size = action_space_size
        self.play_count = 0
        self.min_choose = min_choose
        
        # expected_reward[action_number] = [selected_count, sum of rewards, average of rewards]
        self.expected_reward = {}
        for i in range(action_space_size):
            self.expected_reward[i] = [0, 0., 0.]
            
    def act(self):
        keys_of_less_selected = [key for key in self.expected_reward if self.expected_reward[key][0] < self.min_choose]
        if len(keys_of_less_selected)!=0:
            action = int(np.random.choice(keys_of_less_selected))
            return action
        else:
            max_val = max(self.expected_reward[x][2] for x in self.expected_reward)
            keys_of_max_expected = [key for key in self.expected_reward if self.expected_reward[key][2] == max_val]
            action = int(np.random.choice(keys_of_max_expected))
            return action
        
    def update(self, action, reward):
        self.play_count += 1
        self.expected_reward[action][0] += 1
        self.expected_reward[action][1] += reward
        self.expected_reward[action][2] = self.expected_reward[action][1] / self.expected_reward[action][0]
    
    def sum_reward(self):
        sum_reward = 0
        for value in self.expected_reward.values():
            sum_reward += value[1]
        return sum_reward
    
    def action_count(self):
        action_count = []
        for value in self.expected_reward.values():
            action_count.append(value[0])
        return action_count
        

In [52]:
# Example run
m = slotmachine()
money = 0
action_space_size = m.action_space()

agent = Agent_greedy(action_space_size, min_choose=3)

for i in range(1000):
    action = agent.act()
    reward, hit = m.play(action)
    agent.update(action, reward)
    money += reward
    #hit_message = "hit!" if hit==1 else "miss"
    #print("action = {0}, {1}, money = {2}".format(action, hit_message, money))
    if (i+1)%100==0:
        print("try:{0}, money={1}".format(i+1, money))

try:100, money=13500
try:200, money=32500
try:300, money=46000
try:400, money=64000
try:500, money=79000
try:600, money=95500
try:700, money=108000
try:800, money=124500
try:900, money=141500
try:1000, money=157500


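With this algorithm,  
・if the expected reward R * p_i' of a suboptimal arm i' is mistakenly judged to be larger than the expected reward R * p_i of the optimal arm i,  
　the mistake is eventually corrected as the number of trials grows, because arm i' keeps being pulled and its average approaches its true value.  
  
However,  
・if the expected reward R * p_i of the optimal arm i is mistakenly judged to be smaller than the expected reward R * p_i' of some other arm i',  
　the mistake is not guaranteed to be corrected no matter how many trials are added, because arm i may never be pulled again.  
This is the weakness of the greedy algorithm.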
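
The failure mode above can be observed with a rough simulation. The sketch below is a standalone pure-greedy player on Bernoulli arms, separate from the `Agent_greedy` class in this notebook. It assumes a payout of 1 per hit, one initial pull per arm, and 500 plays per run, and `run_greedy` is a hypothetical helper name. It counts the runs in which the most-pulled arm is not the best arm.

```python
import numpy as np

def run_greedy(probs, n_steps, rng):
    """Pure greedy on Bernoulli arms: try each arm once, then always exploit."""
    k = len(probs)
    counts = np.zeros(k, dtype=int)   # number of pulls per arm
    sums = np.zeros(k)                # sum of rewards per arm
    for _ in range(n_steps):
        untried = np.flatnonzero(counts == 0)
        if untried.size > 0:
            arm = int(untried[0])
        else:
            means = sums / counts
            best = np.flatnonzero(means == means.max())
            arm = int(rng.choice(best))          # break ties at random
        reward = float(rng.random() < probs[arm])
        counts[arm] += 1
        sums[arm] += reward
    return counts

rng = np.random.default_rng(0)
probs = (0.2, 0.3, 0.4, 0.5)          # the best arm is index 3
n_runs = 200
stuck = sum(int(np.argmax(run_greedy(probs, 500, rng)) != 3)
            for _ in range(n_runs))
print("runs that ended up favoring a suboptimal arm:", stuck, "of", n_runs)
```

If the best arm misses on its first pull while another arm hits, its average stays at 0 and it is never selected again, which is why a sizable share of runs settle on a suboptimal arm.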
  
The ε-greedy algorithm below is a known method that reduces this risk  
while keeping the cost of exploration low.  

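### ε-greedy algorithm
Algorithm
>If some arm has never been chosen, choose one of those arms  
>With probability ε, choose one arm uniformly at random from all arms  
>With probability 1-ε, choose the arm with the largest average reward so far, v_i  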

In [7]:
class Agent_e_greedy(Agent_greedy):
    def __init__(self, action_space_size, epsilon=0.1, min_choose=1):
        super(Agent_e_greedy, self).__init__(action_space_size, min_choose)
        self.epsilon = epsilon

    def act_greedy(self):
        keys_of_less_selected = [key for key in self.expected_reward if self.expected_reward[key][0] < self.min_choose]
        if len(keys_of_less_selected)!=0:
            action = int(np.random.choice(keys_of_less_selected))
            return action
        else:
            max_val = max(self.expected_reward[x][2] for x in self.expected_reward)
            keys_of_max_expected = [key for key in self.expected_reward if self.expected_reward[key][2] == max_val]
            action = int(np.random.choice(keys_of_max_expected))
            return action

    def act(self):
        if np.random.choice([1, 0], p=[self.epsilon, 1-self.epsilon]):
            action = int(np.random.choice(range(self.action_space_size)))
        else:
            action = self.act_greedy()
        return action
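
To get a feel for the price of constant exploration, the short calculation below estimates the long-run expected reward per play of ε-greedy. It is an idealized sketch: it assumes a payout of 1 per hit and that the greedy step has already identified the truly best arm, which the algorithm itself does not guarantee. The variable names are chosen only for this example.

```python
probs = (0.2, 0.3, 0.4, 0.5)   # hit probabilities of the four arms
epsilon = 0.1                  # exploration rate

best = max(probs)                       # expected reward of the best arm
uniform = sum(probs) / len(probs)       # expected reward of a uniformly random pull

# With probability 1 - epsilon exploit the best arm, otherwise pull at random.
expected = (1 - epsilon) * best + epsilon * uniform
gap = best - expected                   # expected loss per play caused by exploring

print("expected reward per play:", round(expected, 3))
print("loss per play versus always pulling the best arm:", round(gap, 3))
```

Under these assumptions the loss per play does not shrink over time, since ε stays fixed however much has already been learned.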
    

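A well-known principle for resolving the trade-off between exploration and exploitation is  
"optimism in the face of uncertainty".  
  
A well-known solution to the multi-armed bandit problem based on this principle is  
the Upper Confidence Bound (UCB) algorithm.  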

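### UCB1 algorithm
Algorithm  
>R: the difference between the maximum and minimum payout  
>If some arm has never been chosen, choose one of those arms  
>Compute the average reward v_i of each arm i as an estimate of its expected reward  
>　　　　v_i = (sum of rewards obtained from arm i so far) / (number of times arm i has been chosen so far)  
>Compute the half-width of the confidence interval of the reward of each arm i  
>　　　　U_i = R * sqrt( 2 ln(total number of plays so far) / (number of times arm i has been played so far) )  
>Choose the arm i with the largest v_i + U_i  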
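
As a worked example of the selection rule above, the sketch below evaluates v_i + U_i for one made-up state. The play counts and average rewards are hypothetical numbers chosen for illustration, and R is taken to be 1.

```python
import numpy as np

reward_width = 1.0                        # R: max payout minus min payout
counts = np.array([10, 20, 30, 40])       # hypothetical number of plays per arm
means = np.array([0.2, 0.3, 0.4, 0.5])    # hypothetical average rewards v_i
total_plays = counts.sum()                # total number of plays so far

# U_i = R * sqrt(2 ln(total plays) / plays of arm i)
half_width = reward_width * np.sqrt(2.0 * np.log(total_plays) / counts)
scores = means + half_width               # v_i + U_i
chosen = int(np.argmax(scores))

for i in range(len(counts)):
    print("arm", i, "v =", means[i], "U =", round(float(half_width[i]), 3),
          "v + U =", round(float(scores[i]), 3))
print("chosen arm:", chosen)
```

With these numbers the least-played arm gets the largest bonus and the highest score even though its average is the lowest, which is the optimism principle in action.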

In [8]:
from math import sqrt, log

class Agent_ucb1(object):
    def __init__(self, action_space_size, reward_space=(500,0), min_choose=1):
        self.action_space_size = action_space_size
        self.play_count = 0
        self.min_choose = min_choose
        self.reward_width = abs(reward_space[0] - reward_space[1])
        
        # expected_reward[action_number] = [selected_count, sum of rewards,
        #                                   average of rewards, 1/2 * confidence interval of reward,
        #                                   average + 1/2*CI]
        self.expected_reward = {}
        for i in range(action_space_size):
            self.expected_reward[i] = [0, 0., 0., 0., 0.]

    def act(self):
        keys_of_less_selected = [key for key in self.expected_reward if self.expected_reward[key][0] < self.min_choose]
        if len(keys_of_less_selected)!=0:
            action = int(np.random.choice(keys_of_less_selected))
            return action
        else:
            max_val = max(self.expected_reward[x][4] for x in self.expected_reward)
            keys_of_max_expected = [key for key in self.expected_reward if self.expected_reward[key][4] == max_val]
            action = int(np.random.choice(keys_of_max_expected))
            return action
        
    def update(self, action, reward):
        self.play_count += 1
        stats = self.expected_reward[action]
        stats[0] += 1
        stats[1] += reward
        stats[2] = stats[1] / stats[0]
        # The total play count changed, so refresh the confidence half-width and
        # the UCB score (average + half-width) of every arm played at least once.
        for s in self.expected_reward.values():
            if s[0] > 0:
                s[3] = self.reward_width * sqrt(2 * log(self.play_count) / s[0])
                s[4] = s[2] + s[3]
        
    def sum_reward(self):
        sum_reward = 0
        for value in self.expected_reward.values():
            sum_reward += value[1]
        return sum_reward
    
    def action_count(self):
        action_count = []
        for value in self.expected_reward.values():
            action_count.append(value[0])
        return action_count

---

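### Performance comparison
Following the setup in the reference, we simulate the case K=4.  
All four arms have the same payout (set to 1), and their hit probabilities are 0.2, 0.3, 0.4, and 0.5.  
Each epoch consists of 10,000 plays, and this is repeated for 10,000 epochs.  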
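
The experiment described above can be sketched compactly as follows. This is a standalone approximation, not the notebook's own classes. It assumes that statistics are reset at the start of every epoch, uses far smaller trial and epoch counts than the text so that it runs quickly, and breaks ties by lowest index. The function names `select_eps_greedy`, `select_ucb1`, and `average_reward` are hypothetical.

```python
import numpy as np

def select_eps_greedy(counts, sums, rng, epsilon=0.1):
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])                   # try every arm once first
    if rng.random() < epsilon:
        return int(rng.integers(len(counts)))    # explore uniformly at random
    return int(np.argmax(sums / counts))         # exploit the best average

def select_ucb1(counts, sums, rng, reward_width=1.0):
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])
    bonus = reward_width * np.sqrt(2.0 * np.log(counts.sum()) / counts)
    return int(np.argmax(sums / counts + bonus))

def average_reward(select, probs, trial_num, epoch_num, rng):
    """Average reward per play over all epochs, with fresh statistics each epoch."""
    total = 0.0
    for _epoch in range(epoch_num):
        counts = np.zeros(len(probs), dtype=int)
        sums = np.zeros(len(probs))
        for _trial in range(trial_num):
            arm = select(counts, sums, rng)
            reward = float(rng.random() < probs[arm])   # payout 1 on a hit
            counts[arm] += 1
            sums[arm] += reward
        total += sums.sum()
    return total / (trial_num * epoch_num)

rng = np.random.default_rng(0)
probs = (0.2, 0.3, 0.4, 0.5)
avg_eps = average_reward(select_eps_greedy, probs, 500, 20, rng)
avg_ucb = average_reward(select_ucb1, probs, 500, 20, rng)
print("e-greedy average reward per play:", round(avg_eps, 3))
print("UCB1 average reward per play:", round(avg_ucb, 3))
```

Always pulling the best arm would yield 0.5 per play on average, so the distance from 0.5 gives a rough measure of how much each strategy loses while learning.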

In [13]:
import numpy as np
import matplotlib.pyplot as plt

m = slotmachine(reward_price=1,
                coin_price=0, 
                distribution=(0.2, 0.3, 0.4, 0.5))
action_space_size = m.action_space()
reward_space_size = m.reward_space()

agent_greed = Agent_greedy(action_space_size, min_choose=100)
agent_e_greed = Agent_e_greedy(action_space_size, epsilon=0.1, min_choose=1)
agent_ucb1 = Agent_ucb1(action_space_size, reward_space=reward_space_size, min_choose=1)

# 100 for debugging
trial_num = 100
epoch_num = 100


def exam_block(agent, trial_num=trial_num):
    # record[trial] = [trial, [each action's selected_count list], sum of reward]
    record = []
    for trial in range(trial_num):
        action = agent.act()
        reward, hit = m.play(action)
        agent.update(action, reward)
        record.append([trial, agent.action_count(), agent.sum_reward()])

    # for debugging
    print(record)
        
    return record
    
def epoch_block(agent, epoch_num=epoch_num, trial_num=trial_num):
    # total_record[trial] = [trial, [each action's total selected_count], total reward]
    record = []
    total_record = []
    
    for epoch in range(epoch_num):
        record = exam_block(agent, trial_num=trial_num)
        if epoch==0:
            total_record = record
        else:
            for trial in range(len(total_record)):
                for action in range(len(total_record[trial][1])):
                    total_record[trial][1][action] += record[trial][1][action]
                # Accumulate the cumulative reward recorded at this trial
                total_record[trial][2] += record[trial][2]
                
        # for debugging
        print("epoch ", epoch)
        print("\n", record, "\n\n")
        
    return total_record
    

total_record_greed = epoch_block(agent_greed, epoch_num, trial_num)
total_record_e_greed = epoch_block(agent_e_greed, epoch_num, trial_num)
total_record_ucb1 = epoch_block(agent_ucb1, epoch_num, trial_num)

# for debugging
print("greed", agent_greed.action_count(), agent_greed.sum_reward())
print(len(total_record_greed))
print(total_record_greed[0])
print("e-greed", agent_e_greed.action_count(), agent_e_greed.sum_reward())
print(len(total_record_e_greed))
print(total_record_e_greed[0])
print("ucb1", agent_ucb1.action_count(), agent_ucb1.sum_reward())
print(len(total_record_ucb1))
print(total_record_ucb1[0])



text="""

# drawing graph
def draw_learning_graph(total_record, row, col, num, title, color, x, xmin, xmax, ymin, ymax):
    # One independent list per arm (repeating a list with *= would alias a single inner list)
    action_record = [[] for _ in range(action_space_size)]
    # for debugging
    print("action_record", len(action_record))
    print(action_record)
    print("total_record", len(total_record), len(total_record[1][1]))
    
    for action in range(action_space_size):
        for trial in range(trial_num):
            record = [total_record[trial][1][action] / (trial+1)]
            action_record[action].append(record)
            # for debugging
            print("trial, total_record[trial][1][action] ", trial, total_record[trial][1][action] )
            print("record", len(record), record)
            
            
            
    # for debugging
    print("action_record", len(action_record), len(action_record[0]), action_record)

    plt.subplot(row, col, num)
            
    y = [np.array([1 for i in range(trial_num)])]

    for action_num in range(action_space_size+1):
        if action_num < action_space_size:
            y.append(np.array(action_record[action_num]))
        else:
            y.append(np.array([0 for i in range(trial_num)]))

        print(x.size)
        print(y[action_num].size)
        print(y[action_num+1].size)
        print("---")
        plt.fill_between(x, y[action_num], y[action_num+1], facecolor = color[action_num], alpha=0.5)
    plt.title(title)
    plt.xlim(xmin, xmax)
    plt.ylim(ymin, ymax)
        

x = np.array(range(0, trial_num))
color = ["tomato", "gold", "palegreen", "deepskyblue", "violet"]


#text=
draw_learning_graph(total_record_greed,
           row=4, col=1, num=1, title="Learning graph of greedy algorithm",color=color,
           x=x, xmin=0, xmax=trial_num, ymin=0, ymax=1)
draw_learning_graph(total_record_e_greed,
           row=4, col=1, num=2, title="Learning graph of e-greedy algorithm",color=color,
           x=x, xmin=0, xmax=trial_num, ymin=0, ymax=1)
draw_learning_graph(total_record_ucb1,
           row=4, col=1, num=3, title="Learning graph of UCB1 algorithm",color=color,
           x=x, xmin=0, xmax=trial_num, ymin=0, ymax=1)
"""

#plt.tight_layout()

(Debug output truncated. Each printed record has the form [trial, [selection count of each arm], cumulative reward]; the first records of the first epoch are shown below.)

[[0, [1, 0, 0, 0], 0.0], [1, [1, 0, 0, 1], 0.0], [2, [1, 1, 0, 1], 1.0], ...

In [21]:
np.array(range(0, 10, 1))

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [246]:
aa = np.array([1, 2, 3, 4])
bb = np.array([2])
aa/bb

array([ 0.5,  1. ,  1.5,  2. ])

In [268]:
a = [[1]*10]
a.append([1,2,3])
print(a)

[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 2, 3]]


In [255]:
aaa=[]
aaa.append([]) 
print(aaa*4)

[[], [], [], []]


In [244]:
rt = []
r1 = [1, [1, 2, 3, 4], 0.5]
r2 = [2, [1, 2, 3, 4], 0.5]

rt = r1

rt[0] += r2[0]
print(len(rt[1]))
for i in range(len(rt[1])):
    rt[1][i] += r2[1][i]
rt[2] += r2[2]

print(rt)

ry1 = [rt[0], rt[1][0], rt[2]]
print(ry1)

4
[3, [2, 4, 6, 8], 1.0]
[3, 2, 1.0]


In [114]:
# TODO: delete later
a = {1:(0,10), 2:(1,20), 3:(0,30), 4:(2,40), 5:(4,10), 6:(3,50), 7:(4,22)}
a.values()
a.keys()

zero_selected = min(list(a.values())[0])
print(zero_selected)

keys_of_zero_selected = [key for key in a if a[key][0] == 0]
print(keys_of_zero_selected)

#for key in a.keys():
#    print(a[key][0])

0
[1, 3]


In [117]:
# TODO: delete later
max_key = max(a, key=(lambda x:a[x][1]))
max_val = max(a[x][1] for x in a)
#key = [key for key in a if a[key][0]==max_val]

print(max_key, max_val)

6 50


In [111]:
# TODO: delete later
max_val = max(a.values())
key = [key for key in a if a[key][0] < 2]

print(max_val, key)

(4, 22) [1, 2, 3]


In [221]:
print(a)
for i in a.values():
    print(i[1])

{1: (0, 10), 2: (1, 20), 3: (0, 30), 4: (2, 40), 5: (4, 10), 6: (3, 50), 7: (4, 22)}
10
20
30
40
10
50
22
