In [1]:
from script import *

# ZOOTR Analytics
Current version - Season 4 standard settings  
10,000 logs

## Table of Contents
- Way of the Hero Regions
- All Way of the Hero Regions, split by song
- Barren Regions
- Required WOTH items
- Required WOTH locations
- Required Skulltula locations
- WOTH Hints
- Barren Hints
- Sphere distribution
- Distribution of Child entrances
- Distribution of Adult entrances
- Number of stones & medallions required per playthrough

- Exploratory Area

## Foreword

This is a collection of statistics true to the spoiler logs. This is not supposed to serve as some epic data-oriented approach to changing how people look at seeds - it's simply an organized way of looking at freely available data.

The goal is to set this dataset up so that, for future tournaments/events, we can run the exact same analysis and compare the differences.

While organizing this data I had two concepts at play:
- #1: Demonstrate data from spoiler logs cleanly
- #2: Can we answer the question of why season 4 seeds seem worse?

Of the two, #1 is achieved and #2 is not - it is a difficult question. First, in order to say something is better or worse than something else, you'd need a baseline for the other thing (in this case, Season 3 or before). Not only that, but deciding what exactly to compare is incredibly subjective.
    
I had a few theories or anti-theories on what would constitute "good" or "bad" seeds:
- Forced area revisits: The idea of having to get an item in a first location, go to a second location,
    then revisit the first location with the item retrieved in the second location. Although this certainly can
    lead to bad seeds, there are often ways to avoid it with logic breaking or even by doing things in a non-intended
    order (compared to the playthrough log). Unfortunately, this is difficult to calculate, because the playthrough logs will often jump heavily between areas. There would have to be a very detailed and complex analytic engine to "unwind" each seed
    from the playthrough logs.
- Sphere count: Spheres are interesting, but they simply reflect the code producing any one possible playthrough. Further
    (as many experienced players are aware), very tedious logic chains can cause huge sphere counts. For example,
    some Adult temples' key locations will take multiple spheres by themselves. On top of that, things like Mido's skip
    trivialize the WOTH implications for Forest Temple, etc. So sphere count is somewhat interesting but
    not at all the best indicator of "good" vs. "bad". I'd say low sphere counts are generally correlated with slightly faster
    seeds, but the data is inconclusive for higher sphere counts.
        
For those reasons and a few other data organization/science reasons, I decided not to pursue this deeply. In my opinion, if you want to truly get into the data science methods of how and why seeds are "good" or "bad", you'd have to first codify the time cost of every location. In other words, creating an extremely dynamic map of how long generally required checks are between each other, including things like warping, save & quitting (with random locations), Farore's Wind... entirely too much work for any reasonable human being. 
    
Whenever looking at data points that seem close, remember that the randomizer can create so many possibilities. So looking at the likelihood of 1 random occurrence at 1 location is not very important. But by looking at the macro level of what's happening, one can use intuition to investigate deeper. Further, on some charts (such as Entrance starting locations), the exact ranking between choices is not very important, but the general groups of high/low will be moderately reliable for outcomes.
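To make "data points that seem close" a little more concrete, here is a minimal sketch of an approximate 95% interval for a share out of 10,000 seeds. The helper name `proportion_interval` and the normal approximation are my own assumptions for illustration, not part of the analysis script. The counts (166, 170 and 532) are taken from the Child entrances table further down.

```python
import math

N_SEEDS = 10_000  # sample size stated at the top of this document


def proportion_interval(count, n=N_SEEDS, z=1.96):
    """Approximate 95% interval, in percent, for the share count / n.

    Uses the normal approximation to the binomial (an assumption made
    for this sketch, not something the analysis script is known to do).
    """
    p = count / n
    se = math.sqrt(p * (1 - p) / n)
    return 100 * (p - z * se), 100 * (p + z * se)


def overlaps(a, b):
    """True if two (low, high) intervals overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


kak_windmill = proportion_interval(166)     # 1.66% in the table
green_house = proportion_interval(170)      # 1.70% in the table
hyrule_field = proportion_interval(532)     # 5.32% in the table

# Neighbouring rows are indistinguishable; the high/low groups are not.
print(overlaps(kak_windmill, green_house))
print(overlaps(kak_windmill, hyrule_field))
```

The two entrances at 1.66% and 1.70% have overlapping intervals, so their relative ranking says very little, while Hyrule Field at 5.32% sits clearly apart from both.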

## Analysis
### Way of the Hero Regions
#### "On average, what percent of regions are WOTH per seed?"
- These are NOT for hints - this is what is fully WOTH for the seed per the log.

There are three tables here: all WOTH, WOTH split by song vs. non-song, and that same split filtered for song areas.

In [2]:
b2()

region,overall
Kakariko Village,88.54
Sacred Forest Meadow,72.7
the Graveyard,65.26
Death Mountain Crater,61.52
Gerudo Training Grounds,59.13
Ganon's Castle,58.89
Lon Lon Ranch,57.48
Spirit Temple,53.6
Bottom of the Well,51.79
Hyrule Castle,51.08


### All Way of the Hero Regions, split by song
#### "For WOTH regions, what percentage of the rewards are items vs. songs?"
- These are not mutually exclusive, which is why, for most regions, song and non-song add up to more than the overall pct.
- This means a WOTH region can be WOTH through a song, a non-song item, or both in the same seed.
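As a worked example of why the columns are not mutually exclusive, the share of seeds where a region is WOTH through both a song and a non-song item can be recovered by inclusion-exclusion. This is a small sketch that assumes `overall` counts seeds where the region is WOTH through a song, a non-song item, or both. The function name `both_pct` is made up for illustration, and the numbers are rows from the split table in this section.

```python
def both_pct(song, non_song, overall):
    """Percent of seeds where the region is WOTH via a song AND a non-song item.

    Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
    Assumes `overall` is the union of the two cases.
    """
    return round(song + non_song - overall, 2)


# (song, non-song, overall) rows copied from the table in this section.
rows = {
    "Kakariko Village": (58.26, 71.01, 88.54),
    "Sacred Forest Meadow": (70.33, 7.75, 72.7),
    "Gerudo Training Grounds": (0.0, 59.13, 59.13),
}

both = {region: both_pct(*values) for region, values in rows.items()}
for region, pct in both.items():
    print(f"{region}: {pct}% of seeds are WOTH through both")
```

Under that assumption, Kakariko Village is WOTH through both a song and a non-song item in roughly 40.73% of seeds, while a dungeon with no song check such as Gerudo Training Grounds comes out at 0.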

In [3]:
b2_1()

region,song,non-song,overall
Kakariko Village,58.26,71.01,88.54
Sacred Forest Meadow,70.33,7.75,72.7
the Graveyard,36.39,45.71,65.26
Death Mountain Crater,46.89,27.73,61.52
Gerudo Training Grounds,0.0,59.13,59.13
Ganon's Castle,0.0,58.89,58.89
Lon Lon Ranch,48.04,18.08,57.48
Spirit Temple,0.0,53.6,53.6
Bottom of the Well,0.0,51.79,51.79
Hyrule Castle,47.53,6.87,51.08



Filtered below for song areas only:



region,song,non-song,overall
Kakariko Village,58.26,71.01,88.54
Sacred Forest Meadow,70.33,7.75,72.7
the Graveyard,36.39,45.71,65.26
Death Mountain Crater,46.89,27.73,61.52
Lon Lon Ranch,48.04,18.08,57.48
Hyrule Castle,47.53,6.87,51.08
Desert Colossus,42.75,12.52,50.04
Hyrule Field,25.91,33.6,49.44
Ice Cavern,32.08,21.59,46.83
Temple of Time,35.05,0.0,35.05


### Barren Regions
#### "On average, what percent of areas are barren per seed?"

- This is NOT the gossip hint system, this is given per the spoiler. The hints are picked from barren regions from this list.
- Read 'count' column as number of seeds that have X region as a barren region.
- Notice how song areas are never barren. The same goes for barren hints. It appears the randomization engine avoids dealing with this song problem entirely by avoiding hints at those regions altogether.

In [4]:
b1()

location,count,pct
outside Ganon's Castle,8442,84.428443
Haunted Wasteland,8255,82.558256
Hyrule Castle,8172,81.728173
Zora's Fountain,5762,57.625763
Gerudo's Fortress,5484,54.845485
Zora's Domain,5234,52.345235
Gerudo Valley,5125,51.255126
Water Temple,4972,49.724972
Jabu Jabu's Belly,4712,47.124712
Zora's River,3493,34.933493


### Required WOTH items
#### "What percentage of seeds require X item to complete?"
There are a few major caveats here:
- This is showing strictly logically required items.
- Magic and Bow are shown as the non-Ganondorf required WOTH. Not entirely sure why or how this works in the log.
- Progressives were split out, meaning differing requirements for seeds are split by the max required progressive.

In [5]:
b3()

hint_item,count,pct
Bolero of Fire,532,5.320532
Bomb Bag,5585,55.855586
Boomerang,8040,80.408041
Bottle,386,3.860386
Bow,2829,28.292829
Dins Fire,8565,85.658566
Eponas Song,4206,42.064206
Fire Arrows,956,9.560956
Goron Tunic,209,2.090209
Hover Boots,9099,90.9991


### Required WOTH locations
#### "What is the percentage chance of a particular check being WOTH?" 
- Given there are so many possible outcomes with ZOOTR, this chart is mostly cosmetic.
- Regions are better to analyze, except for some logically restricted checks.

In [6]:
b4()

location,count,pct
Song from Malon,4804,48.044804
Song from Saria,4761,47.614761
Song from Impa,4753,47.534753
Song from Windmill,4712,47.124712
Sheik in Crater,4689,46.894689
Sheik at Colossus,4275,42.754275
Sheik in Forest,4110,41.10411
Song from Composers Grave,3639,36.393639
Sheik at Temple,3505,35.053505
Sheik in Ice Cavern,3208,32.083208



Subset for Skulltula House rewards:


location,count,pct
Kak 20 Gold Skulltula Reward,941,9.410941
Kak 10 Gold Skulltula Reward,895,8.950895
Kak 30 Gold Skulltula Reward,883,8.830883
Kak 40 Gold Skulltula Reward,791,7.910791
Kak 50 Gold Skulltula Reward,748,7.480748


### Required Skulltula locations
#### "What is the percentage chance for each Gold Skulltula location being required?"
- These are "required" in that the playthrough log chose them. Of course there's usually flexibility, but it may be worthwhile to assess the most frequently chosen ones.
- Nicely, all 100 are represented here (which somewhat surprised me).

In [7]:
b5()

location,count,pct
LLR GS Tree,4532,45.324532
LLR GS Rain Shed,4532,45.324532
Kak GS Tree,4477,44.774477
Kak GS Guards House,4448,44.484448
ZR GS Tree,4436,44.364436
GC GS Center Platform,4384,43.844384
Kak GS Skulltula House,4384,43.844384
Kak GS House Under Construction,4302,43.024302
Market GS Guard House,4214,42.144214
OGC GS,4210,42.10421


### WOTH Hints
#### "What is the likelihood of a given hint pointing to a particular region?"
- The regions most likely to be WOTH are in the first table of this document. This table is what is hinted, which is once again different from the entire WOTH status.
- This is less important; it is just the randomly selected regions you get as hints. More important is the first WOTH table by region, which shows what has fully been decided for the seed. This table is much more likely to suffer from sample size inadequacy.

In [8]:
b8()

location,count,pct
Kakariko Village,3394,8.49
the Graveyard,2378,5.95
Sacred Forest Meadow,1986,4.97
Death Mountain Crater,1735,4.34
Gerudo Training Grounds,1712,4.28
Ganon's Castle,1677,4.19
the Lost Woods,1562,3.91
Spirit Temple,1555,3.89
Lon Lon Ranch,1501,3.75
Kokiri Forest,1430,3.58


### Barren Hints
#### "What is the likelihood of a given hint pointing to a particular region?"
- Same notes as above.

In [9]:
b7()

location,count,pct
Water Temple,1923,9.62
Fire Temple,1402,7.01
Zora's Fountain,1123,5.62
Gerudo's Fortress,1008,5.04
Jabu Jabu's Belly,988,4.94
Zora's Domain,985,4.93
Gerudo Valley,969,4.85
Zora's River,933,4.67
Death Mountain Trail,902,4.51
the Market,866,4.33


### Sphere distribution
#### "What is the distribution of spheres per the playthrough log?"
- Generally, spheres are not the best indicator of seed length or quality, for a number of reasons: 1) the randomizer code simply finds any one possible outcome and deems it sufficient for the playthrough log, 2) logic breaks can drastically decrease sphere requirements, and 3) various areas drastically increase the sphere count for the sake of key logic, most notably most Adult dungeons.
- Still, low sphere counts are generally correlated with fast & minimal seeds.

In [10]:
b6()

sphere,count,pct
8,1,0.01
9,15,0.15
10,21,0.21
11,105,1.05
12,251,2.51
13,358,3.58
14,498,4.98
15,607,6.07
16,743,7.43
17,853,8.53


### Distribution of Child entrances
#### "What is the distribution of possible outcomes for Child entrances?"
- Nothing especially interesting; it appears to be distributed based on the number of unique entrances to all accessible areas.
- Some of these are grouped by region, others by individual location. This is a quirk of the log system.

In [11]:
b9()

new_spawn,count,pct
Hyrule Field,532,5.32
Market,253,2.53
KF Midos House,188,1.88
Market Man in Green House,170,1.7
HC Great Fairy Fountain,169,1.69
Kokiri Forest,168,1.68
LLR Tower,167,1.67
Market Shooting Gallery,167,1.67
Kak Windmill,166,1.66
Kak Impas House,166,1.66


### Distribution of Adult entrances
#### "What is the distribution of possible outcomes for Adult entrances?"
- Same as above.

In [12]:
b10()

new_spawn,count,pct
Hyrule Field,525,5.25
Market,227,2.27
Market Potion Shop,174,1.74
GC Shop,174,1.74
KF Sarias House,174,1.74
OGC Great Fairy Fountain,171,1.71
LW Bridge,169,1.69
Market Bombchu Shop,166,1.66
GV Carpenter Tent,166,1.66
Kokiri Forest,160,1.6


### Number of stones & medallions required per playthrough
#### "What is the percentage of All Dungeons per seed?"
- Notably this is strictly logically required AD seeds, so things like Epona's or Saria's logic breaks would slightly push AD percentage down.
- This is strictly dungeons required (i.e., blue warp). Check the following table for a more thorough view.

In [13]:
b11()

dungeons_required,count,pct
6,7372,73.73
9,2626,26.26


#### "What is the percentage of All Dungeons per seed, where boss hearts are effectively counted as full dungeons?"
- Astute observers will see the 9 dungeons_required number increase by a very small percent. That's right - 3 seeds out of the initial 10,000 sample are not "all dungeons" per se: they have 6 required dungeon blue warps, but have a WOTH item on all three stone dungeons' boss hearts. Brutal!

In [14]:
b11_2()

dungeons_required,count,pct
6,6185,61.86
7,1104,11.04
8,80,0.8
9,2629,26.29


# Exploratory Area
The rest of these tables are some conjecture and freeform ideas I played around with. They're less stable, but perhaps others can help and draw insights down here too.

### Distribution of check density
This chart shows each region and how many checks it has. 

In [15]:
b12()

region,checks,keys,effective_checks
Kakariko Village,15,0,15
Spirit Temple,20,5,15
Ganon's Castle,16,2,14
Gerudo Training Grounds,22,9,13
Shadow Temple,18,5,13
Bottom of the Well,14,3,11
Forest Temple,14,5,9
the Lost Woods,8,0,8
the Graveyard,8,0,8
Fire Temple,15,8,7


Now, look at the table for WOTH split out by song/non-song (ctrl+f `b2_1()`). Keep in mind how many effective checks there are for each area (which is total checks minus guaranteed small keys for dungeons).
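The effective check definition is just a subtraction. As a minimal sketch, using a few rows from the table above (the dictionary layout is my own, not how the analysis script stores it):

```python
# (checks, guaranteed small keys) per region, copied from the table above.
regions = {
    "Kakariko Village": (15, 0),
    "Spirit Temple": (20, 5),
    "Gerudo Training Grounds": (22, 9),
    "Fire Temple": (15, 8),
}

# Effective checks = total checks minus guaranteed small keys.
effective_checks = {
    name: checks - keys for name, (checks, keys) in regions.items()
}

for name, value in effective_checks.items():
    print(f"{name}: {value} effective checks")
```

This reproduces the `effective_checks` column: Gerudo Training Grounds drops from 22 checks to 13, and Fire Temple from 15 to 7.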

Generally, you'd expect that higher effective checks would yield more favorable results for WOTH. This is fairly common consensus, "Oh, let's go to GTG, it's got a lot of checks, so it should be correlated to WOTH". And that seems alright, but perhaps not exactly correct overall.

So let's look at Gerudo Training Grounds. There are 13 effective checks there, and it also has one of the highest WOTH region scores. Having both a high WOTH score and high effective checks makes it valuable overall.

Let's look at Sacred Forest Meadow. It has 3 checks, two of which are songs. It has a massive WOTH score because of its two songs. So, despite having only 3 checks, it is one of the most correlated WOTH locations. It seems like an exception: low checks, but a high WOTH score.
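One naive way to put the two examples side by side is to divide the WOTH percentage by the number of effective checks. This ratio is purely my own illustration, not something the analysis script computes, and it assumes Sacred Forest Meadow's 3 checks are all effective (it has no small keys). The WOTH percentages come from the first WOTH table.

```python
# (WOTH pct, effective checks) from the tables and text above.
# Sacred Forest Meadow: 3 checks, assumed to have no guaranteed small keys.
regions = {
    "Gerudo Training Grounds": (59.13, 13),
    "Sacred Forest Meadow": (72.7, 3),
    "Kakariko Village": (88.54, 15),
}

# Hypothetical metric: WOTH percentage points per effective check.
woth_per_check = {
    name: round(pct / checks, 2) for name, (pct, checks) in regions.items()
}

for name, value in sorted(woth_per_check.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value} WOTH pct per effective check")
```

By this crude measure Sacred Forest Meadow is far "denser" than Gerudo Training Grounds or Kakariko Village, which is another way of seeing that check count alone does not explain the WOTH scores.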

So what is the 'criteria', the 'weighting' that causes this? It seems somewhat undefined. Songs might be a good answer for those song regions, but what about others? Look at Shadow Temple - its WOTH score is actually not that good relatively, despite it having one of the highest effective check counts. You can easily explain that by Shadow Temple being blocked by multiple requirements (a song, magic and Din's Fire). But then, GTG also has a ton of requirements per room. So what do you do as the player when you're given the chance to go to either GTG or Shadow Temple? Can you safely refer to the WOTH chart, or is there something else at play?

I think it's easy to say "very much depends on the seed" when dealing with these scenarios, which may be true, but it's not all strictly logical. Just because some seed gave the player Din's Fire at the mid point, when everything else "feels" dried up, doesn't mean that the seed intentionally placed something in Shadow Temple (aka "bait").

### WOTH locations, filtered for examples
The next few tables are from the "Required WOTH locations" table above, filtered by some example regions (Deku Tree, Water and Shadow Temples):

In [16]:
b14_1()

location,count,pct
Deku Tree Compass Room Side Chest,703,7.030703
Deku Tree Basement Chest,700,7.0007
Deku Tree Map Chest,680,6.80068
Deku Tree Slingshot Chest,674,6.740674
Deku Tree Slingshot Room Side Chest,645,6.450645
Deku Tree Compass Chest,638,6.380638
Deku Tree Queen Gohma Heart,635,6.350635


In [17]:
b14_2()

location,count,pct
Water Temple Longshot Chest,274,2.740274
Water Temple River Chest,262,2.620262
Water Temple Boss Key Chest,251,2.510251
Water Temple Map Chest,161,1.610161
Water Temple Compass Chest,159,1.590159
Water Temple Cracked Wall Chest,155,1.550155
Water Temple Central Bow Target Chest,150,1.50015
Water Temple Torches Chest,149,1.490149
Water Temple Morpha Heart,143,1.430143
Water Temple Central Pillar Chest,136,1.360136


In [18]:
b14_3()

location,count,pct
Shadow Temple Bongo Bongo Heart,405,4.050405
Shadow Temple Hover Boots Chest,323,3.230323
Shadow Temple Spike Walls Left Chest,306,3.060306
Shadow Temple Falling Spikes Switch Chest,283,2.830283
Shadow Temple Compass Chest,282,2.820282
Shadow Temple Map Chest,281,2.810281
Shadow Temple Wind Hint Chest,272,2.720272
Shadow Temple Falling Spikes Upper Chest,271,2.710271
Shadow Temple After Wind Hidden Chest,270,2.70027
Shadow Temple Invisible Spikes Chest,270,2.70027


While helping me look over some of this data, atz pointed out some things about the above tables, mostly that key logic does have a fairly significant weighting for some individual checks. 

Look at Deku Tree, and how relatively consistent it is per check, except for a slight drop-off for Gohma, which is likely due to the Slingshot requirement.

Then look at Water Temple, where there appears to be a clear increase in placement on the checks that are even less likely to be small keys. 

Then look at Shadow Temple - without knowing better, it seems somewhat random and inconsistent. It is very important to keep in mind that sample size is a big factor when looking at a grain this fine (at a specific location level).

### Percentage of WOTH per region compared to itself
Truthfully, I'm not exactly sure how to analyze this properly. The chart below shows, per area, WOTH vs. non-WOTH item placement. For example, look at Bottom of the Well. The sum of the two counts, divided by the number of seeds in the data set, is 14, which is the number of checks in the area. So roughly 5% of the checks placed here are WOTH and the rest are junk.

Of course, the more checks there are in an area, the lower the percent is. But there may be some nugget of truth or information in here, when you compare it to check density or WOTH by region?
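The Bottom of the Well arithmetic can be checked directly from that region's two counts in the table. This is a small sketch that takes the seed count to be the 10,000 logs stated at the top of the document; the variable names are my own.

```python
N_SEEDS = 10_000  # number of logs stated at the top of this document

# Bottom of the Well counts from the WOTH vs. non-WOTH table.
non_woth = 132_909
woth = 7_077

total_placements = non_woth + woth

# Placements per seed should match the number of checks in the region.
checks_per_seed = round(total_placements / N_SEEDS)

# Share of placements in this region that are WOTH, in percent.
woth_pct = round(100 * woth / total_placements, 2)

# Average number of WOTH items found in this region per seed.
woth_per_seed = round(woth / N_SEEDS, 2)

print(checks_per_seed, woth_pct, woth_per_seed)
```

The first two values line up with the 14 checks and the 5.06 `pct` shown for Bottom of the Well, and the last one says the region holds a bit under one WOTH item per seed on average.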

In [19]:
b13()

region,woth_item,count,pct
Bottom of the Well,non woth,132909,94.94
Bottom of the Well,woth,7077,5.06
Death Mountain Crater,non woth,42235,84.48
Death Mountain Crater,woth,7760,15.52
Death Mountain Trail,non woth,46185,92.38
Death Mountain Trail,woth,3810,7.62
Deku Tree,non woth,75317,94.16
Deku Tree,woth,4675,5.84
Desert Colossus,non woth,24434,81.45
Desert Colossus,woth,5563,18.55
