# Thompson Sampling on Different Brands

A shoe company wants to find out which brands are most popular before launching a
new store. Today it carries many brands that do not sell very well. We were asked to analyze 10 different brands through an online market study in which we record the number of clicks each brand receives.

### Import libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import random

### Importing the dataset Brands

List all CSV files in the current folder.

In [2]:
import glob

# Collect every CSV file in the current working directory, in a stable order
csv_files = sorted(glob.glob('*.csv'))
csv_files

['MOCK_DATA (1).csv',
 'MOCK_DATA (10).csv',
 'MOCK_DATA (2).csv',
 'MOCK_DATA (3).csv',
 'MOCK_DATA (4).csv',
 'MOCK_DATA (5).csv',
 'MOCK_DATA (6).csv',
 'MOCK_DATA (7).csv',
 'MOCK_DATA (8).csv',
 'MOCK_DATA (9).csv']

In [3]:
# Read each file and stack the results into a single DataFrame
mock_data = [pd.read_csv(file) for file in csv_files]

dataset = pd.concat(mock_data, ignore_index=True)
dataset

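| | Brand_1 | Brand_2 | Brand_3 | Brand_4 | Brand_5 | Brand_6 | Brand_7 | Brand_8 | Brand_9 | Brand_10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 4 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 9996 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 9997 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 9998 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |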


In [4]:
dataset.rename(columns = {'Brand_1':'Echo', 
                            'Brand_2':'Comfy Shoes',
                            'Brand_3':'New Balance',
                            'Brand_4':'Asics',
                            'Brand_5':'Skechers',
                            'Brand_6':'Nike',
                            'Brand_7':'Old Balance',
                            'Brand_8':'Nikou',
                            'Brand_9':'Fake Asics',
                            'Brand_10':'Craft'
                         }, inplace = True)

In [5]:
dataset.sum()

Echo           4946
Comfy Shoes    5046
New Balance    4949
Asics          5006
Skechers       4943
Nike           5012
Old Balance    4918
Nikou             0
Fake Asics     4917
Craft          4997
dtype: int64

### Implementing Thompson Sampling

Determine which brand Thompson Sampling selects most often over 10000 rounds, using one row of the dataset per round.

In [6]:
rows = 10000
brands = 10

In [7]:
# Return the index of the brand that Thompson Sampling selects most often

def GetBrandSelected(rows, brands):

    data = dataset.values  # click matrix: one row per round, one column per brand

    numbers_of_rewards_1 = [0] * brands  # clicks observed per brand
    numbers_of_rewards_0 = [0] * brands  # non-clicks observed per brand
    number_of_selections = [0] * brands  # times each brand was chosen

    for row in range(rows):
        brand_selected = 0
        max_random_value = 0
        # Draw one sample per brand from its Beta posterior and keep the largest
        for i in range(brands):
            random_value = random.betavariate(numbers_of_rewards_1[i] + 1, numbers_of_rewards_0[i] + 1)
            if random_value > max_random_value:
                brand_selected = i
                max_random_value = random_value
        number_of_selections[brand_selected] += 1
        # Update the chosen brand's counts with the reward observed in this row
        if data[row, brand_selected] == 1:
            numbers_of_rewards_1[brand_selected] += 1
        else:
            numbers_of_rewards_0[brand_selected] += 1

    return number_of_selections.index(max(number_of_selections))
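
The selection rule above can also be tried on synthetic data, which makes it easier to see how the Beta draws steer the choice. The following is a minimal sketch, not the notebook's own code: the function name `thompson_sampling`, the click rates `[0.2, 0.5, 0.8]`, and the seed are all assumptions made for illustration.

```python
import random

def thompson_sampling(click_probabilities, rounds, seed=0):
    """Simulate Thompson Sampling on arms with known (synthetic) click rates."""
    rng = random.Random(seed)  # local generator so the sketch is repeatable
    n_arms = len(click_probabilities)
    successes = [0] * n_arms   # clicks observed per arm
    failures = [0] * n_arms    # non-clicks observed per arm
    selections = [0] * n_arms  # times each arm was chosen

    for _ in range(rounds):
        # Sample a plausible click rate for each arm from its Beta posterior
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        chosen = samples.index(max(samples))
        selections[chosen] += 1
        # Simulate a click for the chosen arm only
        if rng.random() < click_probabilities[chosen]:
            successes[chosen] += 1
        else:
            failures[chosen] += 1
    return selections

# Hypothetical click rates, deliberately far apart
selections = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
print(selections)
```

With click rates this far apart, most selections should end up on the last arm. The closer the rates are to each other, the longer the algorithm keeps exploring the other arms.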

In [8]:
def Print(brands_selected):
    # Plot how often each brand index appears in the list of selections
    plt.hist(brands_selected)
    plt.title('Histogram of brand selections')
    plt.xlabel('Brands')
    plt.ylabel('Number of times each brand was selected')
    plt.show()

<p> Loop over GetBrandSelected to check whether the result is the same on every run. </p>

In [9]:
for i in range(0, brands):
    print(GetBrandSelected(rows, brands))

3
2
1
2
4
3
4
4
5
2


<p> The results vary from run to run. This is probably because the columns in the dataset are too similar. More data cannot be conjured up to get more precise results. What can be done, however, is to run the procedure several times and tally the results. An array can record how many times each brand index comes out on top. </p>
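
To see how close the columns are, the totals printed by `dataset.sum()` can be turned into click rates. This is a minimal sketch that assumes the dataset has the 10000 rows set in `rows`. The `totals` dictionary is copied by hand from that output, and Nikou is left out of the spread because it has no clicks.

```python
# Click totals copied from the dataset.sum() output
totals = {
    'Echo': 4946, 'Comfy Shoes': 5046, 'New Balance': 4949, 'Asics': 5006,
    'Skechers': 4943, 'Nike': 5012, 'Old Balance': 4918, 'Nikou': 0,
    'Fake Asics': 4917, 'Craft': 4997,
}
n_rows = 10000  # assumption: one round per row, as in `rows`

# Click rate per brand
rates = {brand: clicks / n_rows for brand, clicks in totals.items()}

# Spread between the highest and lowest rate among brands with any clicks
active = {brand: rate for brand, rate in rates.items() if rate > 0}
spread = max(active.values()) - min(active.values())

for brand, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{brand:<12}{rate:.4f}")
print(f"Spread among brands with clicks: {spread:.4f}")
```

Apart from Nikou, the rates differ by only about 1.3 percentage points, so a single run has little evidence for separating the brands.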

In [10]:
counter = [0] * 10  # Empty tally for the ten brands
for i in range(0, 100):  # Run the procedure 100 times
    counter[GetBrandSelected(rows, brands)] += 1  # Add one win to the winning index
print(counter)

[7, 11, 8, 9, 7, 37, 7, 0, 3, 11]


<p> The algorithm is run a hundred times, and it becomes clearer that index 5 dominates and that this brand will probably sell best. The results vary somewhat each time the model is run, but category 5 (Nike) has won every time the test has been run. </p>
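
The run-to-run variation comes from `random.betavariate`, which draws from Python's global random generator. If repeatable runs are wanted, one option is to seed that generator first. The sketch below assumes this is the goal. The helper `draw_samples` is hypothetical and only stands in for one run of the procedure.

```python
import random

def draw_samples(n):
    # Stand-in for one run: n draws from a Beta(1, 1) prior
    return [random.betavariate(1, 1) for _ in range(n)]

random.seed(42)           # fix the global generator's state
first = draw_samples(5)

random.seed(42)           # same seed, same sequence of draws
second = draw_samples(5)

random.seed(43)           # different seed, different sequence
third = draw_samples(5)

print(first == second)
print(first == third)
```

Seeding makes a single run repeatable. It does not make the ranking more reliable, so tallying over many runs is still useful.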

In [11]:
# Label each tally with its brand name (pass the columns directly, not wrapped in a list)
results_dataframe = pd.DataFrame({'Result': counter},
                                 index=dataset.columns)
results_dataframe = results_dataframe.sort_values(by=['Result'], ascending=False)
print(results_dataframe)

             Result
Nike             37
Comfy Shoes      11
Craft            11
Asics             9
New Balance       8
Echo              7
Skechers          7
Old Balance       7
Fake Asics        3
Nikou             0
