In [1]:
from script import *

# ZOOTR Analytics
Current version - Season 4 standard settings  

Written by:
[@cleartonic](https://twitter.com/cleartonic)  
GitHub: [Link](https://github.com/cleartonic/zootr_analytic)

In [2]:
num_seeds()

Number of seeds (sample size): 25000 


## Table of Contents
- Way of the Hero Regions
- All Way of the Hero Regions, split by song
- Barren Regions
- Required WOTH items
- Required WOTH locations
- Required Skulltula locations
- WOTH Hints
- Barren Hints
- Sphere distribution
- Playthrough steps distribution
- Distribution of Child entrances
- Distribution of Adult entrances
- Number of stones & medallions required per playthrough

- Exploratory Area

## Foreword

This is a collection of statistics true to the spoiler logs. This is not supposed to serve as some epic data-oriented approach to changing how people look at seeds - it's simply an organized way of looking at freely available data.

The goal is to set up this dataset so that, for future tournaments/events, we can run the exact same analysis and compare differences.

While organizing this data I had two concepts at play:
- #1: Demonstrate data from spoiler logs cleanly
- #2: Can we answer the question of why season 4 seeds seem worse?

Of these two, #1 is achieved and #2 is not - it is a difficult question. First, in order to say something is better/worse than something else, you'd have to have the baseline of the other thing (in this case, Season 3 or before). Not only that, the comparison of "what" is incredibly subjective. 
    
I had a few theories or anti-theories on what would constitute "good" or "bad" seeds:
- Forced area revisits: The idea of having to specifically get an item in a first location, go to a second location,
    then revisit the first location with the item retrieved in the second location. Although this certainly can
    lead to bad seeds, there are often ways to avoid it with logic breaking or even doing things in a non-intended
    order (compared to the playthrough log). Unfortunately, this is difficult to calculate, because the playthrough logs will often
    jump heavily between areas. Therefore there'd have to be a very detailed and complex analytic engine to "unwind" each seed
    from the playthrough logs.
- Sphere count: Spheres are interesting, but they simply reflect the code making any possible playthrough. Further
    (as many experienced players are aware), very tedious logic chains can cause huge sphere counts. For example,
    some Adult temples' key locations will be multiple spheres by themselves. Things like Mido's skip also
    trivialize WOTH implications for Forest Temple, etc. So sphere count is somewhat interesting but
    not at all the best indicator of "good" vs. "bad". I'd say low sphere counts are generally correlated with
    slightly faster seeds, but higher sphere counts are inconclusive.
- New settings: Candidly, there's nothing in this data set that can even come close to confirming/denying the theory of whether
    or not the new settings have had a significant impact. The big ones include starting location, open bridge, free Zelda. My
    hunch is that the settings are not correlated to seed quality, and there's been unfortunately poor sampling in recent seeds as a whole. But I really don't know.
        
For those reasons and a few other data organization/science reasons, I decided not to pursue this deeply. In my opinion, if you want to truly get into the data science methods of how and why seeds are "good" or "bad", you'd have to first codify the time cost of every location. In other words, creating an extremely dynamic map of how long generally required checks are between each other, including things like warping, save & quitting (with random locations), Farore's Wind... entirely too much work for any reasonable human being. 
    
Whenever looking at data points, remember that the randomizer can create so many possibilities. So looking at the likelihood of 1 random occurrence at 1 location is not very important. But by looking at the macro level of what's happening, one can use intuition to investigate deeper. Further, on some charts (such as Entrance starting locations), the ranking between choices is not very important, but general groups of high/low will be moderately reliable for outcomes. 

Finally, I've always thought the idea of plandomizer to be a bit lame, but the idea of generating a ton of seeds and scanning for certain qualities like sphere counts, certain item relationships, etc. for 'challenge' seeds could be fun. Let me know if you agree!

This is an open source project (refer to GitHub link at the top), though all the logs and the tables extracted from them are saved in a local database on my computer. 

## Analysis
### Way of the Hero Regions
#### "On average, what percent of regions are WOTH per seed?"
- These are NOT for hints- this is what is fully WOTH for the seed per the log.

There are three tables here: All WOTH, WOTH for song vs. non-song, and then filtered for songs

In [3]:
b2()

region,overall
Kakariko Village,88.26
Sacred Forest Meadow,72.31
the Graveyard,65.48
Death Mountain Crater,61.54
Ganon's Castle,59.04
Gerudo Training Grounds,58.76
Lon Lon Ranch,57.0
Spirit Temple,53.18
Hyrule Castle,51.32
Bottom of the Well,51.26


### All Way of the Hero Regions, split by song
#### "For WOTH regions, what percentage of the rewards are items vs. songs?"
- Song and non-song are not mutually exclusive, which is why the song and non-song percentages for most regions add up to more than the overall pct.
- In other words, a region can be WOTH for a song, a non-song item, or both in the same seed.

In [4]:
b2_1()

region,song,non-song,overall
Kakariko Village,58.2,70.97,88.26
Sacred Forest Meadow,69.82,7.88,72.31
the Graveyard,36.73,45.99,65.48
Death Mountain Crater,47.01,27.82,61.54
Ganon's Castle,0.0,59.04,59.04
Gerudo Training Grounds,0.0,58.76,58.76
Lon Lon Ranch,47.62,17.66,57.0
Spirit Temple,0.0,53.18,53.18
Hyrule Castle,47.92,6.7,51.32
Bottom of the Well,0.0,51.26,51.26



Filtered below for song areas only:



region,song,non-song,overall
Kakariko Village,58.2,70.97,88.26
Sacred Forest Meadow,69.82,7.88,72.31
the Graveyard,36.73,45.99,65.48
Death Mountain Crater,47.01,27.82,61.54
Lon Lon Ranch,47.62,17.66,57.0
Hyrule Castle,47.92,6.7,51.32
Desert Colossus,43.33,12.28,50.38
Hyrule Field,25.9,34.05,49.89
Ice Cavern,32.28,21.36,46.62
Temple of Time,34.87,0.0,34.87
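
Because song and non-song overlap, the share of seeds where a region is WOTH for both at once can be backed out of the three columns. This is a minimal sketch using inclusion-exclusion; `both_pct` is a hypothetical helper, not part of `script`, and the numbers are copied from the Kakariko Village row.

```python
def both_pct(song: float, non_song: float, overall: float) -> float:
    """Percent of seeds where a region is WOTH for a song AND a non-song item.

    Inclusion-exclusion: overall = song + non_song - both.
    """
    return round(song + non_song - overall, 2)


# Kakariko Village: song 58.2, non-song 70.97, overall 88.26
print(both_pct(58.2, 70.97, 88.26))

# Ganon's Castle has no song, so there is no overlap
print(both_pct(0.0, 59.04, 59.04))
```

So for Kakariko Village, roughly 41% of seeds have both a song and a non-song WOTH item there.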


### Barren Regions
#### "On average, what percent of areas are barren per seed?"

- This is NOT the gossip hint system, this is given per the spoiler. The hints are picked from barren regions from this list.
- Notice how song areas are never barren. Same goes for barren hints. It appears the randomization engine avoids dealing with this song problem entirely by avoiding hints at those regions altogether.

In [5]:
b1()

location,count,pct
outside Ganon's Castle,21236,84.944
Haunted Wasteland,20691,82.764
Hyrule Castle,20486,81.944
Zora's Fountain,14255,57.02
Gerudo's Fortress,13785,55.14
Zora's Domain,12920,51.68
Gerudo Valley,12904,51.616
Water Temple,12411,49.644
Jabu Jabu's Belly,11837,47.348
Zora's River,8649,34.596


### Required WOTH items
#### "What percentage of seeds require X item to complete?"
There are a few major caveats here:
- This is showing strictly logically required items.
- Magic and Bow are shown as the non-Ganondorf required WOTH. Not entirely sure why or how this works in the log.
- Progressives were split out, meaning differing requirements for seeds are split by the max required progressive.

In [6]:
b3()

hint_item,count,pct
Bolero of Fire,1354,5.416
Bomb Bag,13935,55.74
Boomerang,20051,80.204
Bottle,914,3.656
Bow,7128,28.512
Dins Fire,21507,86.028
Eponas Song,10532,42.128
Fire Arrows,2368,9.472
Goron Tunic,590,2.36
Hover Boots,22790,91.16


### Required WOTH locations
#### "What is the percentage chance of a particular check being WOTH?" 
- Given there are so many possible outcomes with ZOOTR, this chart is mostly cosmetic, but yields some insight. 
- Regions may be better to analyze, except for some logically restricted checks. There may be some value in looking at this chart per region, as shown in the Exploratory area at the end of this workbook.

In [7]:
b4()

location,count,pct
Song from Impa,11979,47.916
Song from Saria,11925,47.7
Song from Windmill,11923,47.692
Song from Malon,11905,47.62
Sheik in Crater,11752,47.008
Sheik at Colossus,10833,43.332
Sheik in Forest,10176,40.704
Song from Composers Grave,9183,36.732
Sheik at Temple,8718,34.872
Sheik in Ice Cavern,8071,32.284



Subset for Skulltula House rewards:


location,count,pct
Kak 10 Gold Skulltula Reward,2297,9.188
Kak 20 Gold Skulltula Reward,2315,9.26
Kak 30 Gold Skulltula Reward,2218,8.872
Kak 40 Gold Skulltula Reward,2070,8.28
Kak 50 Gold Skulltula Reward,1847,7.388


### Required Skulltula locations
#### "What is the percentage chance for each Gold Skulltula location being required?"
- These are "required" in that the playthrough log chose them. Of course there's usually flexibility, but it may be worthwhile to assess the most frequently chosen ones.
- Nicely, all 100 are represented here (which somewhat surprised me).

In [8]:
b5()

location,count,pct
LLR GS Tree,11412,45.648
LLR GS Rain Shed,11412,45.648
Kak GS Tree,11263,45.052
Kak GS Guards House,11192,44.768
ZR GS Tree,11162,44.648
GC GS Center Platform,11039,44.156
Kak GS Skulltula House,11034,44.136
Kak GS House Under Construction,10836,43.344
Market GS Guard House,10604,42.416
OGC GS,10566,42.264


### WOTH Hints
#### "What is the likelihood of a given hint pointing to a particular region?"
- The regions most likely to be given as a WOTH hint are at the top of this table. This is what is hinted, which is once again different from the entire WOTH status. 
- This is overall a less important chart; it is just the 2 random WOTH regions you get as hints. More important is the first WOTH table by region, which shows what has fully been decided for the seed. This table is more likely to suffer from sample size inadequacy, but still seems mostly correlated with WOTH by region.

In [9]:
b8()

location,count,pct
Kakariko Village,8571,8.57
the Graveyard,6018,6.02
Sacred Forest Meadow,4897,4.9
Death Mountain Crater,4317,4.32
Gerudo Training Grounds,4268,4.27
Ganon's Castle,4220,4.22
the Lost Woods,3864,3.86
Spirit Temple,3738,3.74
Lon Lon Ranch,3645,3.64
Bottom of the Well,3569,3.57


### Barren Hints
#### "What is the likelihood of a given hint pointing to a particular region?"
- Same notes as above.

In [10]:
b7()

location,count,pct
Water Temple,4847,9.69
Fire Temple,3353,6.71
Zora's Fountain,2723,5.45
Gerudo's Fortress,2549,5.1
Jabu Jabu's Belly,2504,5.01
Zora's Domain,2462,4.92
Gerudo Valley,2387,4.77
Zora's River,2363,4.73
Death Mountain Trail,2274,4.55
Lake Hylia,2119,4.24


### Sphere distribution
#### "What is the distribution of spheres per the playthrough log?"
- Generally, spheres are not the best indicator of seed length or quality, for a number of reasons including 1) the fact that the randomizer code simply makes any possible outcome and determines it sufficient for the playthrough log, 2) logic breaks can drastically decrease sphere requirements, and 3) various areas drastically increase the sphere count for the sake of key logic, most notably most Adult dungeons.
- Still, low sphere counts can be correlated generally to fast & minimal seeds.

In [11]:
b6()

sphere,count,pct
8,1,0.0
9,37,0.15
10,84,0.34
11,301,1.2
12,593,2.37
13,868,3.47
14,1202,4.81
15,1553,6.21
16,1883,7.53
17,2073,8.29


### Playthrough steps distribution
#### "What is the distribution of number of steps in the playthrough log?"
- Whereas the above table has sphere counts, this table has the literal number of "steps" (entries) in the playthrough log, grouped to the tens. This is perhaps a bit more interesting than spheres, as these are all 'required' things to be done, regardless of how many spheres there are. 
- Gold Skulltulas are included, so of course high skull seeds will boost these heavily.
- There were about 10 or so seeds with the lowest "steps" of 40, one of which is seed "J6BU632Y5P".
- The single highest "steps" seed was 143, seed "QS8N6QRK1U".
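
The "grouped to the tens" labels can be reproduced with integer division. This is only a sketch of one way to do the bucketing; `steps_bucket` is a hypothetical name and not necessarily how `b6_1()` does it.

```python
def steps_bucket(steps: int) -> str:
    """Label a playthrough step count with its tens bucket, e.g. 47 -> '40 - 49'."""
    low = (steps // 10) * 10
    return f"{low} - {low + 9}"


print(steps_bucket(40))   # lowest step count mentioned above
print(steps_bucket(143))  # highest step count mentioned above
```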

In [12]:
b6_1()

grouping,count,pct
40 - 49,369,1.48
50 - 59,2874,11.5
60 - 69,6046,24.18
70 - 79,4490,17.96
80 - 89,3550,14.2
90 - 99,1936,7.74
100 - 109,2210,8.84
110 - 119,1750,7.0
120 - 129,1312,5.25
130 - 139,400,1.6


### Distribution of Child entrances
#### "What is the distribution of possible outcomes for Child entrances?"
- Nothing entirely interesting, appears to be distributed based on number of unique entrances to all accessible areas.
- Some of these are grouped by region, others by individual location. A quirk of the log system.

In [13]:
b9()

new_spawn,count,pct
Hyrule Field,1347,5.39
Market,587,2.35
KF House of Twins,426,1.7
Kokiri Forest,420,1.68
HC Great Fairy Fountain,415,1.66
Market Guard House,412,1.65
Kakariko Village,411,1.64
LLR Talons House,410,1.64
Market Treasure Chest Game,409,1.64
KF Know It All House,408,1.63


### Distribution of Adult entrances
#### "What is the distribution of possible outcomes for Adult entrances?"
- Same as above.

In [14]:
b10()

new_spawn,count,pct
Hyrule Field,1360,5.44
Market,547,2.19
Market Potion Shop,416,1.66
GC Shop,405,1.62
DMT Great Fairy Fountain,404,1.62
Market Mask Shop,398,1.59
KF Sarias House,396,1.58
Market Man in Green House,395,1.58
Market Shooting Gallery,394,1.58
Market Bombchu Shop,393,1.57


### Number of stones & medallions required per playthrough
#### "What is the distribution of required dungeons per seed?"
- Notably this is strictly logically required AD ("All Dungeons") seeds, so things like Epona's or Saria's logic breaks would slightly push AD percentage down.
- To be clear, 5 dungeons means starting with a Medallion, 6 means starting with a Stone, and 8 means regardless of starting Medallion/Stone, all dungeons required.
- This is strictly dungeons required (i.e., blue warp). Check the following table for a more thorough view.

In [15]:
b11()

dungeons_required,count,pct
5,12561,50.24
6,5870,23.48
8,6568,26.27


#### "What is the percentage of All Dungeons per seed, where stone boss hearts are effectively counted as full dungeons?"
- This first chart is a more advanced view of the above that says, "Alright, if a STONE boss heart has a required item (that is not a small key and is present in the playthrough log), then we should effectively count it as a full dungeon". 
- The second chart shows the detail behind which dungeons were used for their blue warp vs. for their heart on a stone

Personally I found this to be one of the most intriguing tables in the whole analysis. Look at 5 dungeons, 1 stone heart - this means that 7.18% of all seeds both give you a starting medallion and require you to get a WOTH item on a stone dungeon's boss! I was fairly skeptical of this metric, but I investigated deeper and looked at many logs, and it appears to be the case. There are a few fringe examples of Wallet or Goron Tunic, but the majority are very much true WOTH items. 

And yes, there are some extremely cruel seeds at the bottom of this chart - a dungeon score of 7 or 8 with fewer than 8 blue warps means that there are multiple stone dungeon hearts that have WOTH items, despite the seed not being All Dungeons. As an example, check out seed "I8WQ0S1RUA" (S4 standard settings). 

In [16]:
b11_2()

dungeon_score,count,pct
5,10690,42.76
6,7031,28.12
7,686,2.74
8,6592,26.37



Breakout by stone hearts



dungeon_score,blue_warp,stone_heart,count,pct
5,5,0,10690,42.76
6,5,1,1795,7.18
6,6,0,5236,20.94
7,5,2,73,0.29
7,6,1,613,2.45
8,5,3,3,0.01
8,6,2,21,0.08
8,8,0,6568,26.27
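
Judging by the rows of the breakout table, the dungeon score looks like blue warps plus stone hearts. The sketch below assumes that rule (it is inferred from the table, not taken from `script`) and checks that the breakout rows roll back up into the dungeon score summary. It also works out the share of 5-blue-warp seeds that need one stone heart.

```python
from collections import defaultdict

# (blue_warp, stone_heart, count) rows copied from the breakout table above.
BREAKOUT = [
    (5, 0, 10690), (5, 1, 1795), (6, 0, 5236), (5, 2, 73),
    (6, 1, 613), (5, 3, 3), (6, 2, 21), (8, 0, 6568),
]


def summarize(rows):
    """Sum counts per dungeon score, assuming score = blue_warp + stone_heart."""
    totals = defaultdict(int)
    for blue_warp, stone_heart, count in rows:
        totals[blue_warp + stone_heart] += count
    return dict(totals)


summary = summarize(BREAKOUT)
print(summary)

# Among seeds with exactly 5 blue warps, how many need one stone heart?
five_warp_total = sum(count for blue_warp, _, count in BREAKOUT if blue_warp == 5)
one_heart_share = round(1795 / five_warp_total * 100, 2)
print(five_warp_total, one_heart_share)
```

The per-score sums match the summary table, and the 7.18% of all seeds works out to a bit over 14% of the seeds that have 5 blue warps.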


# Exploratory Area
The rest of these tables are some conjecture and freeform ideas I played around with. They're less stable, but perhaps others can help and draw insights down here too.

### Distribution of check density
This chart shows each region and how many checks it has. 

In [17]:
b12()

region,checks,keys,effective_checks
Kakariko Village,15,0,15
Spirit Temple,20,5,15
Ganon's Castle,16,2,14
Gerudo Training Grounds,22,9,13
Shadow Temple,18,5,13
Bottom of the Well,14,3,11
Forest Temple,14,5,9
the Lost Woods,8,0,8
the Graveyard,8,0,8
Fire Temple,15,8,7


Now, look at the table for WOTH split out by song/non-song (ctrl+f `b2_1()`). Keep in mind how many effective checks there are for area (which is total checks minus guaranteed small keys for dungeons).
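
As a quick illustration of the effective checks definition (total checks minus guaranteed small keys), here is a small sketch with three rows copied from the table above. The `effective_checks` helper is hypothetical and only mirrors the description, not the actual code behind `b12()`.

```python
# region -> (checks, guaranteed small keys), copied from the table above.
CHECK_DENSITY = {
    "Kakariko Village": (15, 0),
    "Spirit Temple": (20, 5),
    "Gerudo Training Grounds": (22, 9),
}


def effective_checks(checks: int, keys: int) -> int:
    """Total checks minus guaranteed small keys."""
    return checks - keys


for region, (checks, keys) in CHECK_DENSITY.items():
    print(region, effective_checks(checks, keys))
```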

Generally, you'd expect that higher effective checks would yield more favorable results for WOTH. This is fairly common consensus, "Oh, let's go to GTG, it's got a lot of checks, so it should be correlated to WOTH". And that seems alright, but perhaps not exactly correct overall.

So let's look at Gerudo Training Ground. There are 13 effective checks there. Then it has one of the highest WOTH region scores. It has both a high WOTH score and high effective checks, making it overall valuable. 

Let's look at Sacred Forest Meadow. It has 3 checks, two of which are songs. It has a massive score for WOTH because of its two songs. So, despite it having only 3 checks, it is one of the most correlated WOTH locations. So it seems like an exception, in that it has low checks but a high WOTH score. 

So what is the 'criteria', the 'weighting' that causes this? It seems somewhat undefined. Songs might be a good answer for those song regions, but what about others? Look at Shadow Temple - its WOTH score is actually not that good relatively, despite it having one of the highest effective check counts. You can easily attribute that to Shadow Temple being blocked by multiple requirements (a song, magic and Din's Fire). But then, GTG also has a ton of requirements per room. But what do you do as the player when you're given the chance to either go to GTG or Shadow Temple? Can you safely refer to the WOTH chart, or is there something else at play? 

I think it's easy to say "very much depends on the seed" when dealing with these scenarios, which may be true, but it's not all strictly logical. Just because some seed gave the player Din's Fire at the mid point when everything else "feels" dried up, it does not mean that the seed intentionally placed something in Shadow Temple (aka, Din's was "bait"). 

### WOTH locations, filtered for examples
The next few tables are from the "Required WOTH locations" table above, filtered by some example regions (Deku Tree, Water and Shadow Temples):

In [18]:
b14_1()

location,count,pct
Deku Tree Slingshot Chest,1732,6.928
Deku Tree Basement Chest,1717,6.868
Deku Tree Map Chest,1709,6.836
Deku Tree Compass Room Side Chest,1673,6.692
Deku Tree Compass Chest,1643,6.572
Deku Tree Slingshot Room Side Chest,1643,6.572
Deku Tree Queen Gohma Heart,1627,6.508


In [19]:
b14_2()

location,count,pct
Water Temple Longshot Chest,687,2.748
Water Temple River Chest,619,2.476
Water Temple Boss Key Chest,591,2.364
Water Temple Compass Chest,416,1.664
Water Temple Morpha Heart,392,1.568
Water Temple Map Chest,386,1.544
Water Temple Cracked Wall Chest,353,1.412
Water Temple Torches Chest,353,1.412
Water Temple Central Bow Target Chest,349,1.396
Water Temple Dragon Chest,339,1.356


In [20]:
b14_3()

location,count,pct
Shadow Temple Bongo Bongo Heart,954,3.816
Shadow Temple Hover Boots Chest,798,3.192
Shadow Temple Map Chest,764,3.056
Shadow Temple Spike Walls Left Chest,725,2.9
Shadow Temple Falling Spikes Upper Chest,697,2.788
Shadow Temple After Wind Hidden Chest,686,2.744
Shadow Temple After Wind Enemy Chest,678,2.712
Shadow Temple Wind Hint Chest,678,2.712
Shadow Temple Compass Chest,673,2.692
Shadow Temple Falling Spikes Switch Chest,671,2.684


While helping me look over some of this data, atz pointed out some things about the above tables, mostly that key logic does have a fairly significant weighting for some individual checks. 

Look at Deku Tree, and how relatively consistent it is per check, except for a slight drop-off for Gohma, which is likely due to requirement of Slingshot. 

Then look at Water Temple, where there appears to be a clear increase in placement on the checks that are even less likely to be small keys. 

Then look at Shadow Temple - without knowing better, it seems somewhat random and inconsistent. It is very important to keep in mind that sample size is a big factor when looking at a grain this deep (at a specific location level).

### Percentage of WOTH per region compared to itself
Truthfully, I'm not exactly sure how to analyze this properly. This chart below is showing per area, WOTH vs. non-WOTH item placement. So for example, look at Bottom of the Well. The sum of the two numbers, divided by all seeds in the data set is 14, which is the number of checks in the area. So roughly 5% of the checks placed here are WOTH and the rest are junk. 

Of course, the more checks there are in an area, the lower the percent is. But there may be some nugget of truth or information in here, when you compare it to check density or WOTH by region?

In [21]:
b13()

region,woth_item,count,pct
Bottom of the Well,non woth,332401,94.97
Bottom of the Well,woth,17599,5.03
Death Mountain Crater,non woth,105474,84.38
Death Mountain Crater,woth,19526,15.62
Death Mountain Trail,non woth,115501,92.4
Death Mountain Trail,woth,9499,7.6
Deku Tree,non woth,188256,94.13
Deku Tree,woth,11744,5.87
Desert Colossus,non woth,61021,81.36
Desert Colossus,woth,13979,18.64
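
The Bottom of the Well arithmetic described before the table can be checked directly. This is just a sketch with the two counts copied from the rows above; `region_breakdown` is a hypothetical helper, not part of `script`.

```python
NUM_SEEDS = 25000  # sample size reported by num_seeds() at the top


def region_breakdown(non_woth: int, woth: int, num_seeds: int = NUM_SEEDS):
    """Return (checks per seed, percent of placed items that are WOTH)."""
    total = non_woth + woth
    return total / num_seeds, round(woth / total * 100, 2)


# Bottom of the Well: 332401 non-WOTH and 17599 WOTH placements
checks_per_seed, woth_pct = region_breakdown(332401, 17599)
print(checks_per_seed, woth_pct)
```

The first number comes out to the 14 checks in the area, and the second matches the roughly 5% WOTH share in the table.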
