<a href="https://colab.research.google.com/github/VegaSera/DS-Unit-2-Applied-Modeling/blob/master/module1-define-ml-problems/Wesley_Mountford_LS_DS12_231_assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science

*Unit 2, Sprint 3, Module 1*

---


# Define ML problems

You will use your portfolio project dataset for all assignments this sprint.

## Assignment

Complete these tasks for your project, and document your decisions.

- [X] Choose your target. Which column in your tabular dataset will you predict?
- [X] Is your problem regression or classification?
- [X] How is your target distributed?
    - Classification: How many classes? Are the classes imbalanced?
    - Regression: Is the target right-skewed? If so, you may want to log transform the target.
- [X] Choose your evaluation metric(s).
    - Classification: Is your majority class frequency >= 50% and < 70% ? If so, you can just use accuracy if you want. Outside that range, accuracy could be misleading. What evaluation metric will you choose, in addition to or instead of accuracy?
    - Regression: Will you use mean absolute error, root mean squared error, R^2, or other regression metrics?
- [X] Choose which observations you will use to train, validate, and test your model.
    - Are some observations outliers? Will you exclude them?
    - Will you do a random split or a time-based split?
- [X] Begin to clean and explore your data.
- [X] Begin to choose which features, if any, to exclude. Would some features "leak" future information?

If you haven't found a dataset yet, do that today. [Review requirements for your portfolio project](https://lambdaschool.github.io/ds/unit2) and choose your dataset.

Some students worry, ***what if my model isn't “good”?*** Then, [produce a detailed tribute to your wrongness. That is science!](https://twitter.com/nathanwpyle/status/1176860147223867393)

### Choosing target:

Our target is whether or not a run's combination of cards and relics wins the run, as recorded in the `victory` column.

### Regression or Classification:

Our problem is a binary classification problem.

### Evaluation metric

We will judge the model against the win/loss ratio in the data.

To start, we will restrict the dataset to a single character, the Ironclad. This reduces the total number of relics and cards that we use as our columns. We will use a random split to create the training, validation, and test sets.
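
A random split along these lines could be sketched as follows. The 60/20/20 proportions, the seed, and the `random_split` helper are assumptions for illustration, not choices stated above; it uses only the standard library so it runs without the notebook's data.

```python
import random

def random_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and cut them into train/validation/test lists.

    The fractions and the seed are illustrative assumptions.
    """
    rows = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

# Hypothetical stand-in for the row indices of the Ironclad runs.
train, val, test = random_split(range(100))
print(len(train), len(val), len(test))
```

With so few wins in the data, a stratified split (keeping the win rate similar in each part) may be worth considering instead of a plain shuffle.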

At first glance our win rate is about 9%, so a baseline that predicts every run fails is about 91% accurate.
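
The baseline arithmetic can be checked with a small sketch. The labels below are hypothetical (9 wins in 100 runs, chosen only to mirror the roughly 9% win rate), and recall is shown as one possible companion to accuracy, not as the metric this project has settled on.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of actual wins that were predicted as wins."""
    preds_for_wins = [p for t, p in zip(y_true, y_pred) if t]
    return sum(preds_for_wins) / len(preds_for_wins) if preds_for_wins else 0.0

# Hypothetical labels: 9 wins in 100 runs.
y_true = [True] * 9 + [False] * 91
y_pred = [False] * len(y_true)  # majority-class baseline: every run fails

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
print(baseline_accuracy, baseline_recall)
```

The baseline scores high on accuracy while finding none of the wins, which is why accuracy alone could be misleading for a target this imbalanced.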


In [0]:
import pandas as pd
import numpy as np
import json
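try:
    from pandas import json_normalize  # pandas 1.0 and later
except ImportError:
    from pandas.io.json import json_normalize  # older pandas versions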
import requests
import os, sys

raw_data = 'https://raw.githubusercontent.com/VegaSera/SlayTheSpireAnalysisAndModelling/master/data/jsons/2018-10-25-02-34%231352.json'
request_data = requests.get(raw_data)
json_data = request_data.json()

In [2]:
type(json_data)
filename = 'sts.json'
with open(filename, 'w') as f:
  json.dump(json_data, f)
  print("File " + filename + " has been created.")

File sts.json has been created.


In [3]:
with open('sts.json') as f:
    data = json.load(f)
df = json_normalize(data)  # json_normalize already returns a DataFrame
df.columns = [col.replace('event.', '') for col in df.columns]
df
# There are only a few columns we're interested in with this data.
# First and foremost, we're only interested in Ironclad class data for now. We also want to make sure it's an unseeded and not endless run.
# Secondly, we only care about the master_deck and relics columns, as well as the ascension level. (Higher ascensions are harder, but their players also tend to be more experienced.)
# We can expand this to events, potions, and Neow bonuses later.
# The column we're going to predict is the victory column.

# Full breakdown of the column analysis below.

Unnamed: 0,gold_per_floor,floor_reached,playtime,items_purged,score,play_id,local_time,is_ascension_mode,campfire_choices,neow_cost,seed_source_timestamp,circlet_count,master_deck,relics,potions_floor_usage,damage_taken,seed_played,potions_obtained,is_trial,path_per_floor,character_chosen,items_purchased,campfire_rested,item_purchase_floors,current_hp_per_floor,gold,neow_bonus,is_prod,is_daily,chose_seed,campfire_upgraded,win_rate,timestamp,path_taken,build_version,purchased_purges,victory,max_hp_per_floor,card_choices,player_experience,relics_obtained,event_choices,is_beta,boss_relics,items_purged_floors,is_endless,potions_floor_spawned,ascension_level,special_seed,killed_by
0,"[99, 99]",0,5,[],0,2eebda8a-6486-4fed-b32e-306c66ce5b52,20181024222406,True,[],,715104038123476,0,"[Strike_R, Strike_R, Strike_R, Strike_R, Strik...",[Burning Blood],[],[],-4518276804806975519,[],False,[],IRONCLAD,[],0,[],"[68, 68]",99,,False,False,False,0,0.0,1540434246,[],2018-10-23,0,False,"[75, 75]",[],1360121,[],[],True,[],[],False,[],20,,
1,"[109, 122, 51, 69, 79, 79, 95, 95, 118, 118, 1...",28,1514,[],260,e2f90237-597e-4815-b21e-fdacbeff5325,20181024192405,True,"[{'data': 'Inflame', 'floor': 6.0, 'key': 'SMI...",NONE,0,0,"[Defend_R, Defend_R, Defend_R+1, Defend_R+1, B...","[Burning Blood, War Paint, Dream Catcher, Rega...","[24, 27]","[{'damage': 3.0, 'enemies': 'Cultist', 'floor'...",-4152570914957429597,"[{'floor': 1.0, 'key': 'Regen Potion'}, {'floo...",False,"[M, M, $, M, M, R, M, ?, T, R, ?, M, R, M, R, ...",IRONCLAD,"[Inflame, Shovel]",1,"[3, 25]","[80, 80, 80, 80, 64, 64, 62, 62, 62, 62, 59, 5...",55,RANDOM_COMMON_RELIC,False,False,False,4,0.0,1540434245,"[M, M, ?, ?, M, R, M, ?, T, R, ?, M, R, M, R, ...",2018-10-18,0,False,"[80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 8...","[{'not_picked': ['Twin Strike', 'Seeing Red'],...",549798,"[{'floor': 9.0, 'key': 'Dream Catcher'}, {'flo...","[{'damage_healed': 0.0, 'max_hp_gain': 0.0, 'm...",False,"[{'not_picked': ['Philosopher's Stone', 'Empty...",[],False,"[1, 2, 4, 7, 18, 21, 22]",2,0.0,Cultist and Chosen
2,"[116, 134, 284, 304, 304, 337, 5, 5, 5, 5, 20,...",51,2972,"[Defend_B, Writhe]",1689,1fca3222-ef9e-4a2f-a174-b7f19aef0373,20181025102403,True,"[{'data': 'Apotheosis', 'floor': 8.0, 'key': '...",NONE,2450301981597,0,"[Strike_B, Defend_B, Defend_B, Defend_B, Zap, ...","[Astrolabe, Shuriken, Dodecahedron, Bag of Pre...","[50, 50, 50, 50]","[{'damage': 1.0, 'enemies': 'Jaw Worm', 'floor...",1772559142742755415,"[{'floor': 2.0, 'key': 'Dexterity Potion'}, {'...",False,"[M, M, ?, M, ?, E, $, R, T, ?, M, E, M, E, R, ...",DEFECT,"[Apotheosis, Loop, Happy Flower, Sweeping Beam...",3,"[7, 7, 38, 38, 38]","[63, 56, 56, 53, 71, 69, 69, 69, 69, 62, 56, 6...",317,BOSS_RELIC,False,False,False,2,0.0,1540434243,"[M, M, ?, M, ?, E, $, R, T, ?, M, E, M, E, R, ...",2018-10-18,2,True,"[71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 7...","[{'not_picked': ['Leap', 'Coolheaded'], 'picke...",450227,"[{'floor': 6.0, 'key': 'Shuriken'}, {'floor': ...","[{'damage_healed': 0.0, 'max_hp_gain': 0.0, 'm...",False,"[{'not_picked': ['Snecko Eye', 'Ectoplasm'], '...","[7, 38]",False,"[2, 12, 13, 18, 21, 25, 28, 30, 35, 40, 44, 46...",15,0.0,
3,"[109, 284, 284, 301, 301, 328, 341, 4, 31, 31,...",16,1029,[],98,f09c6c42-fc39-4e2a-b522-10ac1964c0f2,20181024212406,False,"[{'floor': 10, 'key': 'REST'}, {'floor': 15, '...",NONE,540939955171934,0,"[Defend_G, Defend_G, Defend_G, Defend_G, Defen...","[Ring of the Snake, Orichalcum, Potion Belt, V...","[6, 16]","[{'damage': 0, 'enemies': '2 Louse', 'floor': ...",3263333215736852912,"[{'floor': 4, 'key': 'FearPotion'}, {'floor': ...",False,"[M, ?, ?, M, ?, E, M, $, T, R, $, M, T, ?, R]",THE_SILENT,"[Vajra, Blade Dance, Enlightenment, Backstab]",2,"[8, 8, 8, 8]","[70, 70, 70, 68, 73, 56, 55, 55, 55, 75, 75, 7...",122,RANDOM_COMMON_RELIC,False,False,False,0,0.0,1540434246,"[M, ?, ?, M, ?, E, ?, $, T, R, $, M, ?, ?, R, ...",2018-10-18,0,False,"[70, 70, 70, 70, 75, 75, 75, 75, 75, 75, 75, 7...","[{'not_picked': ['Bane', 'PiercingWail'], 'pic...",50942,"[{'floor': 6, 'key': 'Potion Belt'}, {'floor':...","[{'damage_healed': 0, 'gold_gain': 175, 'playe...",False,[],[],False,"[4, 6]",0,,Slime Boss
4,"[112, 131, 151, 169, 185, 185, 185, 185, 211, ...",16,835,[],156,888a667e-514a-4ab8-9630-4fb8f756e0b2,20181024222406,True,"[{'data': 'Armaments', 'floor': 6, 'key': 'SMI...",NONE,492896237467054,0,"[Strike_R+1, Strike_R, Strike_R, Strike_R, Def...","[Burning Blood, Regal Pillow, Art of War]",[5],"[{'damage': 12, 'enemies': 'Jaw Worm', 'floor'...",-1027952450643440236,"[{'floor': 3, 'key': 'PowerPotion'}, {'floor':...",False,"[M, M, M, M, M, R, $, R, T, ?, M, M, R, E, R]",IRONCLAD,[],1,[],"[66, 70, 75, 79, 47, 47, 47, 47, 47, 31, 25, 1...",271,TRANSFORM_CARD,False,False,False,3,0.0,1540434246,"[M, M, M, M, M, R, $, R, T, ?, M, M, R, E, R, ...",2018-10-18,0,False,"[80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 8...","[{'not_picked': ['Clash', 'Perfected Strike'],...",742076,"[{'floor': 9, 'key': 'Regal Pillow'}, {'floor'...","[{'damage_healed': 0, 'gold_gain': 0, 'player_...",False,[],[],False,"[3, 4]",10,,Hexaghost
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1347,"[109, 109, 119, 44, 61, 61, 387, 387, 387, 406...",16,732,"[Defend_G, Strike_G]",215,39f669dc-5296-4c69-835d-8d1086ac82e1,20181024223410,True,"[{'data': 'Neutralize', 'floor': 6.0, 'key': '...",NONE,424605293847628,0,"[Defend_G, Defend_G, Defend_G, Defend_G, Strik...","[Ring of the Snake, Old Coin, Pear, Whetstone,...","[16, 16]","[{'damage': 2.0, 'enemies': 'Jaw Worm', 'floor...",-2398076132163144891,"[{'floor': 5.0, 'key': 'LiquidBronze'}, {'floo...",False,"[M, ?, M, $, M, R, E, R, T, M, R, E, M, $, R]",THE_SILENT,"[Potion Belt, Noxious Fumes, Dodge and Roll, D...",1,"[14, 14, 14, 14]","[57, 50, 46, 46, 46, 46, 4, 23, 33, 31, 31, 29...",5,THREE_CARDS,False,False,False,3,0.0,1540434850,"[M, ?, M, $, M, R, E, R, T, M, R, E, M, ?, R, ...",2018-10-18,2,False,"[66, 66, 66, 66, 66, 66, 66, 66, 76, 76, 76, 7...","[{'not_picked': ['Reflex', 'Concentrate'], 'pi...",655254,"[{'floor': 7.0, 'key': 'Old Coin'}, {'floor': ...","[{'cards_removed': ['Strike_G'], 'damage_heale...",False,[],"[4, 14]",False,"[5, 10, 13]",19,0.0,Slime Boss
1348,"[359, 359, 377, 35, 35, 35, 64, 76, 76, 76, 91...",42,1969,[Pain],597,84572ebe-c222-4826-b94f-ebf8a081ee33,20181024193410,True,"[{'data': 'Poisoned Stab', 'floor': 6, 'key': ...",CURSE,1275947156115651,0,"[Defend_G, Defend_G, Defend_G, Defend_G+1, Def...","[Ring of the Snake, Du-Vu Doll, FaceOfCleric, ...","[14, 16, 27, 29, 29, 40, 42]","[{'damage': 0, 'enemies': 'Small Slimes', 'flo...",5982099541127059923,"[{'floor': 2, 'key': 'EssenceOfSteel'}, {'floo...",False,"[M, ?, M, $, ?, R, E, M, T, ?, M, M, ?, E, R, ...",THE_SILENT,"[Du-Vu Doll, Poisoned Stab, Snake Skull, Noxio...",3,"[4, 4, 20, 20, 20, 36, 36]","[70, 70, 67, 67, 67, 67, 44, 45, 45, 31, 27, 2...",45,TWO_FIFTY_GOLD,False,False,False,4,0.0,1540434850,"[M, ?, M, $, ?, R, E, M, T, ?, ?, M, ?, E, R, ...",2018-10-18,1,False,"[70, 70, 70, 70, 70, 70, 71, 72, 72, 72, 73, 7...","[{'not_picked': ['Bane', 'Outmaneuver'], 'pick...",449612,"[{'floor': 7, 'key': 'Thread and Needle'}, {'f...","[{'damage_healed': 0, 'gold_gain': 0, 'player_...",False,"[{'not_picked': ['Coffee Dripper', 'Sozu'], 'p...",[24],False,"[8, 12, 14, 16, 18, 21, 33, 39, 40]",3,,Giant Head
1349,"[99, 99, 109, 126, 141, 141, 216, 216, 270, 29...",27,1680,[Strike_G],240,46da02ec-502b-4a57-9636-e94adef7e1a7,20181025103356,False,"[{'data': 'Noxious Fumes', 'floor': 6, 'key': ...",NONE,58700530103036,0,"[Defend_G, Defend_G, Defend_G+1, Defend_G, Str...","[Ring of the Snake, Anchor, War Paint, Lee's W...","[10, 12, 18, 22]","[{'damage': 0, 'enemies': '2 Louse', 'floor': ...",-3538684622114372836,"[{'floor': 3, 'key': 'Energy Potion'}, {'floor...",False,"[M, ?, M, M, M, R, ?, R, T, E, $, M, M, ?, R, ...",THE_SILENT,"[Lee's Waffle, Backflip]",0,"[11, 21]","[70, 63, 61, 59, 56, 56, 45, 45, 45, 20, 77, 7...",121,RANDOM_COLORLESS,False,False,False,4,0.0,1540434836,"[M, ?, M, M, ?, R, ?, R, T, E, $, M, M, ?, R, ...",2018-10-18,1,False,"[70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 77, 7...","[{'not_picked': ['Quick Slash', 'Accuracy'], '...",94298,"[{'floor': 9, 'key': 'Anchor'}, {'floor': 10, ...","[{'cards_removed': ['Defend_G'], 'damage_heale...",False,"[{'not_picked': ['Velvet Choker', 'Busted Crow...",[11],False,"[1, 3, 4, 12, 13, 18]",0,,Sentry and Sphere
1350,"[318, 337, 352, 362, 377, 377, 377, 388, 388, ...",50,3767,[Decay],749,e4516c63-90f8-4a74-a47b-aa1ca21923fd,20181024223408,True,"[{'data': 'True Grit', 'floor': 6, 'key': 'SMI...",NO_GOLD,0,0,"[Bash+1, Shrug It Off+1, Immolate+1, Metallici...","[Burning Blood, Old Coin, Champion Belt, Membe...","[12, 21, 25, 25, 41, 50, 50]","[{'damage': 5.0, 'enemies': 'Small Slimes', 'f...",-3804802173818118160,"[{'floor': 3, 'key': 'Strength Potion'}, {'flo...",False,"[M, M, M, M, M, R, ?, M, T, M, $, E, R, $, R, ...",IRONCLAD,"[Membership Card, Feel No Pain+1, Fiend Fire, ...",4,"[11, 14, 14, 14, 14, 28, 38, 38, 38, 38, 38, 38]","[73, 77, 76, 68, 52, 52, 52, 48, 48, 54, 54, 5...",83,ONE_RARE_RELIC,False,False,False,6,0.0,1540434848,"[M, M, M, M, M, R, ?, M, T, M, $, E, R, $, R, ...",2018-10-18,1,False,"[80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 8...","[{'not_picked': ['Warcry', 'Cleave'], 'picked'...",292855,"[{'floor': 9, 'key': 'Champion Belt'}, {'floor...","[{'damage_healed': 0, 'gold_gain': 0, 'player_...",False,"[{'not_picked': ['Eternal Feather', 'Lizard Ta...",[11],False,"[3, 10, 12, 20, 24, 46, 48]",8,0.0,Awakened One


*  gold_per_floor
  *  Amount of gold on each floor.
  *  May leak, simply because a longer list means the run reached higher floors.
*  floor_reached
  *  Guaranteed leakage: any run that ends below the final floor is a loss.
*  Playtime
  *  While there are a couple 5 minute speedruns out there, the vast majority of the data is of typical timescales. Longer runs can translate to a higher probability of victory.
*  items_purged
  *  These are items that the player has willingly chosen to get rid of, like cards.
*  score
  *  Higher scores tend to result in higher chances to win simply due to killing more things.
*  play_id
  *  Some sort of identifying factor. Not useful to us.
*  local_time
  *  The local time at the time of the run. Probably isn't useful to us, but maybe runs have a higher concentration of wins at certain times of the day.
*  is_ascension_mode / ascension_level
  *  Ascension mode is a mode that gets unlocked once you beat the game with all three characters. There are 20 levels of this mode, each unlocking when you beat the last ascension level on that character. 
  *  is_ascension_mode is a boolean indicating whether or not it's ascension at all, while ascension_level is an int from 0-20. 0 indicates that is_ascension_mode is false.
  *  Each level increases the difficulty of the run, but the highest levels also tend to be populated by skilled and experienced players.
*  campfire_choices, campfire_rested, campfire_upgraded
  *  Players may either rest or smith at a campfire, or take other actions if they have certain relics. 
  *  These columns are not useful to us at this time, but may be in later models.
*  neow_cost / neow_bonus
  *  At the start of each run, the player may choose a risk/reward option. neow_cost indicates the risk, the penalty the player took, and neow_bonus indicates the reward, the bonus that the player took with the penalty.
*  seed_source_timestamp
  *  It is currently uncertain what this indicates, however since it is just a timestamp, we can likely safely ignore it for our purposes.
*  circlet_count
  *  Circlets are relics that the player can obtain after they have exhausted the pool of relics available to them. As far as I'm aware, this almost never happens.
*  master_deck
  *  This is the deck that the player completed the run with, or was defeated with. It contains every card available to them.
  *  This is one of our critical columns.
* relics
  *  These are the relics the player ended the game with. This is another of our critical columns.
* potions_floor_usage, potions_floor_spawned
  *  Judging by the sample rows, these list the floors on which the player used a potion and the floors on which a potion spawned. They do not say which potions were involved.
* damage_taken
  *  More damage taken tends to equate to longer runs, but not always. Skilled players can beat the game with very little damage taken.
* seed_played, chose_seed, special_seed
  *  Seeds are rolled on every new game. However, players can input the seed manually if they choose.
  *  We will not be considering seeded runs, since a player can brute-force a known seed over multiple attempts.
* potions_obtained
  *  Details when certain potions were picked up. A higher number of potions obtained will likely go along with wins, due to run length.
* is_trial
  * Currently unknown.
* path_per_floor, path_taken
  *  Shows the path the player took through the tower. 
  * M - Monster
  * $ - Shop
  * E - Elite
  * ? - Event Room
  * R - Rest Site (campfire)
  * T - Treasure
  * B - Boss
* character_chosen
  * Will be one of IRONCLAD, THE_SILENT, DEFECT, or WATCHER
* items_purchased, item_purchase_floors
  * Details which items were purchased, and when. Likely too complex to implement.
* current_hp_per_floor, max_hp_per_floor
  * Like gold_per_floor, these will likely leak just through the number of entries.

* Others I did not have the patience to get to yet:
  * gold, is_prod, is_daily, win_rate, timestamp, build_version, purchased_purges, victory, card_choices, player_experience, relics_obtained, event_choices, is_beta, boss_relics, items_purged_floors, is_endless, killed_by
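
As a rough illustration of acting on the notes above, the sketch below drops the columns flagged as leaking or as closely tracking run length. Treating all six as exclusions is an assumption, and `LEAKY_COLUMNS` and `drop_leaky` are hypothetical names. The sample record is an abbreviated copy of row 4 from the table above.

```python
# Columns the notes above flag as leaking or as tracking run length.
# Excluding every one of them is an assumption made for illustration.
LEAKY_COLUMNS = {
    "floor_reached", "gold_per_floor", "current_hp_per_floor",
    "max_hp_per_floor", "playtime", "score",
}

def drop_leaky(record, leaky=LEAKY_COLUMNS):
    """Return a copy of one run record without the leaky keys."""
    return {key: value for key, value in record.items() if key not in leaky}

# Abbreviated copy of row 4 from the table above.
run = {
    "floor_reached": 16,
    "playtime": 835,
    "score": 156,
    "ascension_level": 10,
    "relics": ["Burning Blood", "Regal Pillow", "Art of War"],
    "victory": False,
}
clean = drop_leaky(run)
print(sorted(clean))
```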

In [6]:
df2 = df[df['character_chosen'] == 'IRONCLAD']  # We only want this one class to start, to make things easier.
df2 = df2[(df2['chose_seed'] == False) & (df2['is_endless'] == False)]  # Choosing a seed makes the tower deterministic, and it is impossible to win an endless run.
df2['victory'].mean()  # This is our initial baseline measurement: a 9.39% win rate, i.e. a 90.61% failure rate.

0.09392265193370165

In [7]:
df2.columns

Index(['gold_per_floor', 'floor_reached', 'playtime', 'items_purged', 'score',
       'play_id', 'local_time', 'is_ascension_mode', 'campfire_choices',
       'neow_cost', 'seed_source_timestamp', 'circlet_count', 'master_deck',
       'relics', 'potions_floor_usage', 'damage_taken', 'seed_played',
       'potions_obtained', 'is_trial', 'path_per_floor', 'character_chosen',
       'items_purchased', 'campfire_rested', 'item_purchase_floors',
       'current_hp_per_floor', 'gold', 'neow_bonus', 'is_prod', 'is_daily',
       'chose_seed', 'campfire_upgraded', 'win_rate', 'timestamp',
       'path_taken', 'build_version', 'purchased_purges', 'victory',
       'max_hp_per_floor', 'card_choices', 'player_experience',
       'relics_obtained', 'event_choices', 'is_beta', 'boss_relics',
       'items_purged_floors', 'is_endless', 'potions_floor_spawned',
       'ascension_level', 'special_seed', 'killed_by'],
      dtype='object')

In [9]:
df3 = df2[['master_deck', 'relics', 'ascension_level', 'victory']].copy()
df3
#From here we'll encode each of the cards and relics as its own column.
#We will need to account for relics like Prismatic Shard and the possibility of events that can give cards from other classes.

Unnamed: 0,master_deck,relics,ascension_level,victory
0,"[Strike_R, Strike_R, Strike_R, Strike_R, Strik...",[Burning Blood],20,False
1,"[Defend_R, Defend_R, Defend_R+1, Defend_R+1, B...","[Burning Blood, War Paint, Dream Catcher, Rega...",2,False
4,"[Strike_R+1, Strike_R, Strike_R, Strike_R, Def...","[Burning Blood, Regal Pillow, Art of War]",10,False
5,"[Strike_R, Strike_R, Strike_R, Strike_R, Defen...","[Burning Blood, Oddly Smooth Stone, Smiling Ma...",20,False
6,"[Strike_R, Strike_R, Strike_R, Strike_R, Defen...","[Burning Blood, Oddly Smooth Stone, Dodecahedr...",0,False
...,...,...,...,...
1341,"[Strike_R+1, Strike_R, Strike_R+1, Strike_R, S...","[Burning Blood, Bronze Scales, Lantern, Darkst...",0,False
1344,"[Strike_R, Strike_R, Defend_R, Defend_R, Bash,...","[Burning Blood, StoneCalendar, WingedGreaves, ...",0,False
1345,"[Strike_R+1, Strike_R, Strike_R, Strike_R, Def...","[Burning Blood, Meat on the Bone, Golden Idol,...",1,False
1346,"[Strike_R, Strike_R, Strike_R, Strike_R, Strik...","[Burning Blood, NeowsBlessing, Question Card, ...",1,False
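
The plan to encode each card and relic as its own column could be sketched as below. This is a minimal standard-library illustration, not the notebook's eventual implementation: `encode_run` and `encode_all` are hypothetical helpers, the `card_`/`relic_` prefixes are made up, and counting copies of a card while ignoring its upgrade suffix (`+1`) is an assumption.

```python
from collections import Counter

def encode_run(cards, relics):
    """Turn one run's card and relic lists into a flat feature dict."""
    features = Counter()
    for card in cards:
        base = card.split("+")[0]       # assumption: ignore upgrade level
        features["card_" + base] += 1   # count copies of each card
    for relic in relics:
        features["relic_" + relic] = 1  # relics are present or absent
    return dict(features)

def encode_all(runs):
    """Encode many runs into a shared column list and a row-per-run matrix."""
    rows = [encode_run(cards, relics) for cards, relics in runs]
    columns = sorted({key for row in rows for key in row})
    matrix = [[row.get(col, 0) for col in columns] for row in rows]
    return columns, matrix

# Hypothetical, shortened decks and relic lists.
runs = [
    (["Strike_R+1", "Strike_R", "Bash"], ["Burning Blood", "Regal Pillow"]),
    (["Strike_R", "Defend_R"], ["Burning Blood"]),
]
columns, matrix = encode_all(runs)
print(columns)
print(matrix)
```

Because columns are built from whatever names appear in the data, a card from another class (for example via Prismatic Shard) would simply get its own column rather than needing special handling.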
