# Noita data exploration

What is Noita? It's a super hard and fun game.
  
https://noitagame.com/
  
It's fantastic, and I could spend hours talking about it, so better just play it :) or go for a coffee with me :)

The game saves tons of data after each run that isn't really used anywhere. Each run has its own files at  
```AppData/LocalLow/Nolla_Games_Noita/save00/stats/sessions```
  
There, after each run, you can find all kinds of info, such as:
* what time you started the new game,
* how long the game lasted,
* whether it was victorious,
* which biomes you visited,
* what killed you,
* the HP & $ you ended up with,
* cause of death,
* how many enemies you killed, etc.

I have collected files from around 1000 games, so it may be worth exploring that data to see what interesting stories I can pull out of it.  
I continue to gather data from various users.

In [254]:
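```python
# noita_path.txt holds the path to the sessions folder
with open('noita_path.txt') as path_file:
    path = path_file.read().strip()  # drop a trailing newline, if any
```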
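If you'd rather not keep a separate path file, the folder can be guessed from the location given above. A minimal sketch, assuming a default Windows install where `AppData/LocalLow/Nolla_Games_Noita/save00/stats/sessions` sits under the user's home directory; `default_sessions_dir` and `list_session_files` are hypothetical helpers, not part of this notebook.

```python
from pathlib import Path

# Assumption: default Windows save location, relative to the user's home.
SESSIONS_SUBDIR = Path('AppData/LocalLow/Nolla_Games_Noita/save00/stats/sessions')


def default_sessions_dir(home=None):
    """Hypothetical helper: best guess at the sessions folder."""
    home = Path.home() if home is None else Path(home)
    return home / SESSIONS_SUBDIR


def list_session_files(folder):
    """Sorted *_kills.xml / *_stats.xml files in `folder` (empty if it doesn't exist)."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.glob('*.xml')
                  if p.name.endswith(('_kills.xml', '_stats.xml')))
```

`path = str(default_sessions_dir())` could then stand in for the contents of `noita_path.txt`, as long as the guess matches your install.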

Loading the data and putting it into dicts keyed by the start date and time. File names are generated from the date and time a particular game started: `YYYYMMDD-HHMMSS_....xml`.
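Since the start time is encoded in the file name, a proper `datetime` (plus the file kind) can be read straight from it. A small sketch; `parse_session_name` is a hypothetical helper and it assumes names look exactly like the `YYYYMMDD-HHMMSS_kills.xml` / `YYYYMMDD-HHMMSS_stats.xml` examples below.

```python
from datetime import datetime


def parse_session_name(name):
    """Split e.g. '20210429-123554_kills.xml' into (start datetime, 'kills')."""
    stem = name[:-4] if name.endswith('.xml') else name
    stamp, sep, kind = stem.partition('_')
    if not sep:
        raise ValueError(f'unexpected session file name: {name!r}')
    return datetime.strptime(stamp, '%Y%m%d-%H%M%S'), kind
```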

In [255]:
```python
import os

import pandas as pd
import xmltodict
from datetime import datetime

# Parsed files keyed by the run's start time, 'YYYYMMDD-HHMMSS'
stats = {}
kills = {}

for name in os.listdir(path):
    if not name.endswith('.xml'):
        continue  # skip anything that isn't a session file
    with open(os.path.join(path, name), encoding='UTF-8') as f:
        xml = xmltodict.parse(f.read())
    key = name[:15]  # e.g. '20210429-123554'
    if name.endswith('kills.xml'):
        kills[key] = xml
    else:
        stats[key] = xml
```

Each run has 2 files associated with it. Example file names & files:  
* '20210429-123554_kills.xml'
* '20210429-123554_stats.xml'
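Because both files of a run share the same timestamp prefix, grouping by that prefix is an easy way to check that every run has both halves before joining them. A sketch with a hypothetical `pair_sessions` helper, assuming the prefix is everything before the first underscore, as in the examples above.

```python
from collections import defaultdict


def pair_sessions(names):
    """Group file names by their 'YYYYMMDD-HHMMSS' prefix.

    Returns (pairs, incomplete): pairs maps prefix -> {'kills': name, 'stats': name},
    incomplete lists prefixes that are missing one of the two files.
    """
    runs = defaultdict(dict)
    for name in names:
        prefix, _, rest = name.partition('_')
        kind = rest[:-4] if rest.endswith('.xml') else rest
        runs[prefix][kind] = name
    incomplete = sorted(p for p, files in runs.items()
                        if set(files) != {'kills', 'stats'})
    return dict(runs), incomplete
```

Passing `os.listdir(path)` to it would show whether any run lost one of its files.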

In [256]:
```python
test_kills = '20230221-165450_kills.xml'
test_stats = '20230221-165450_stats.xml'
```

Let's take a look at an example file, starting with the easier one - **20230221-165450_kills.xml**:

In [257]:
```python
with open(f'{path}/{test_kills}', encoding='UTF-8') as file:
    test_kills_file = xmltodict.parse(file.read())

test_kills_file
```

```
{'Stats': {'@deaths': '1',
  '@kills': '28',
  '@player_kills': '0',
  '@player_projectile_count': '0',
  'death_map': {'E': {'@key': 'NULL | $damage_midas', '@value': '1'}},
  'kill_map': {'E': [{'@key': 'acidshooter_weak', '@value': '1'},
    {'@key': 'firebug', '@value': '1'},
    {'@key': 'fireskull', '@value': '1'},
    {'@key': 'longleg', '@value': '4'},
    {'@key': 'miner', '@value': '1'},
    {'@key': 'miner_weak', '@value': '4'},
    {'@key': 'rat', '@value': '2'},
    {'@key': 'scavenger_grenade', '@value': '1'},
    {'@key': 'scavenger_smg', '@value': '2'},
    {'@key': 'shotgunner', '@value': '3'},
    {'@key': 'slimeshooter', '@value': '1'},
    {'@key': 'slimeshooter_weak', '@value': '3'},
    {'@key': 'zombie_weak', '@value': '4'}]}}}
```

It's a recent run and I know for a fact it was victorious. 
* Deaths should always be 1, as there's no way to respawn and every game ends with your death.
* Kills seems to be just the total number of entities I killed - boring.
* player_kills is interesting. Possibly a sign the authors wanted to implement multiplayer at some point; other than that, maybe killing yourself with your own projectile would make it 1? Maybe worth testing.
* player_projectile_count - I have no idea what that is. The name suggests a count of projectiles shot, but it's 0 here, so that's not it...
* Death map seems to hold info on what killed me and with what kind of damage. Victorious runs will usually say "midas damage" (`$damage_midas` here).
* Kill map is how many of each enemy type I killed. It was a short run where I decided to just run for it, so the kill count is small.
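The kill map can be flattened into plain counts, which also makes it easy to check that the per-enemy numbers add up to the `@kills` total. A sketch working on the dict xmltodict returned above; `as_list`, `kill_counts` and `total_kills_match` are hypothetical helpers. Since xmltodict gives a dict instead of a list when there is only one `<E>` (as in `death_map`), both shapes are handled.

```python
def as_list(node):
    """xmltodict returns None, a dict or a list for 0, 1 or many <E> children."""
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


def kill_counts(kills_file):
    """Map enemy id -> number killed, from a parsed *_kills.xml dict."""
    kill_map = kills_file['Stats'].get('kill_map') or {}
    return {e['@key']: int(e['@value']) for e in as_list(kill_map.get('E'))}


def total_kills_match(kills_file):
    """True if the per-enemy counts add up to the @kills attribute."""
    return sum(kill_counts(kills_file).values()) == int(kills_file['Stats']['@kills'])
```

For the sample run, `total_kills_match(test_kills_file)` should be True: the 13 enemy types add up to 28.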

In [258]:
```python
with open(f'{path}/{test_stats}', encoding='UTF-8') as file:
    test_stats_file = xmltodict.parse(file.read())

test_stats_file
```

```
{'Stats': {'@BUILD_NAME': 'Noita-Build-Apr 23 2021-18:44:24',
  'stats': {'@biomes_visited_with_wands': '10',
   '@damage_taken': '97.9292',
   '@dead': '1',
   '@death_count': '0',
   '@death_pos.x': '6401.68',
   '@death_pos.y': '15163',
   '@enemies_killed': '29',
   '@gold': '165',
   '@gold_all': '1075',
   '@gold_infinite': '0',
   '@healed': '1.5',
   '@heart_containers': '0',
   '@hp': '100',
   '@items': '24',
   '@kicks': '12',
   '@killed_by': ' | midas',
   '@killed_by_extra': '',
   '@places_visited': '10',
   '@playtime': '776.6',
   '@playtime_str': '0:12:56',
   '@projectiles_shot': '1414',
   '@streaks': '0',
   '@teleports': '0',
   '@wands_edited': '6',
   '@world_seed': '82045564'},
  'biome_baseline': {'@biomes_visited_with_wands': '6',
   '@damage_taken': '24.7409',
   '@dead': '0',
   '@death_count': '0',
   '@death_pos.x': '0',
   '@death_pos.y': '0',
   '@enemies_killed': '28',
   '@gold': '362',
   '@gold_all': '1072',
   '@gold_infinite': '0',
   '@healed': '
```

This file (**xxx_stats.xml**) is much more complex. From the top:
* BUILD_NAME - the game version.
* biomes_visited_with_wands - possibly used to decide whether to give the player the wandless trophy.
* damage_taken - self-explanatory; worth noting that the game engine probably multiplies the value by 25, like all other damage.
* dead - the game always ends with death.
* death_count - no idea, honestly.
* death_pos - I can use it on a death map to see where I died the most.
* enemies_killed is 1 more than in the kills file (29 vs 28), possibly because of the final boss, whose kill is tracked here but isn't in the kills file.
* gold - probably the gold I held at the end.
* gold_all - probably the total amount of gold I gathered.
* gold_infinite - a flag for whether I had infinite gold.
* healed, heart_containers and hp - I don't entirely understand these.
* items - I highly doubt I picked up 24 items, unless it also counts heart containers and spell refreshers.
* kicks - yeah, the number of kicks.
* killed_by and killed_by_extra are pretty cool: the game tracks what killed you and whether or not you had been polymorphed.
* places_visited - how many biomes I ran through. They're listed explicitly at the end.
* playtime - in seconds; playtime_str - the same time formatted for the stats screen (776.6 s is 0:12:56).
* ...
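Two of the numbers above are easy to sanity-check by hand. `playtime` is in seconds, so 776.6 s formats to the `0:12:56` shown in `playtime_str`. And if damage really is stored in units 25 times smaller than what the game displays (the guess from the damage_taken note), the 97.9292 `damage_taken` would be roughly 2448 displayed HP. A sketch; the helper names and the factor of 25 are assumptions, not something the files document.

```python
# Assumption: the game displays health/damage as internal units x 25.
DISPLAY_FACTOR = 25


def format_playtime(seconds):
    """Format playtime seconds like playtime_str (H:MM:SS, fraction dropped)."""
    total = int(float(seconds))  # 776.6 -> 776, matching '0:12:56'
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f'{hours}:{minutes:02d}:{secs:02d}'


def displayed_damage(damage_taken):
    """damage_taken converted to on-screen HP, under the x25 assumption."""
    return float(damage_taken) * DISPLAY_FACTOR
```

Unlike parsing `playtime_str` with `'%H:%M:%S'`, working from the seconds keeps runs longer than 24 hours representable.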

There's some weirdness in the stats file that I don't understand, specifically the difference between stats and biome_baseline. I'll try to find some documentation on that; if there is none, I'll experiment. For now I'll go with what I have: when I played, how long each game lasted, etc.

From further investigation, it seems biome_baseline is some kind of save mechanic, probably meaning the game was saved and then loaded up again.
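One way to run the stats vs biome_baseline experiment is to diff the two sections attribute by attribute. A minimal sketch using values visible in the (truncated) output above; `section_diff` is a hypothetical helper, and only attributes that parse as numbers are compared.

```python
def section_diff(stats, baseline):
    """Numeric differences stats - baseline for attributes present in both."""
    diff = {}
    for key in stats.keys() & baseline.keys():
        try:
            diff[key] = float(stats[key]) - float(baseline[key])
        except (TypeError, ValueError):
            continue  # non-numeric attribute, e.g. killed_by
    return diff
```

Called as `section_diff(test_stats_file['Stats']['stats'], test_stats_file['Stats']['biome_baseline'])`, positive values mean the final stats are ahead of the baseline; for the sample run, enemies_killed differs by 1 and gold by -197.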

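It looks like the parser returns inconsistent structures: a single `E` element comes back as a dict, several as a list, and an empty `biomes_visited` as None, so I'll have to handle all three. I'd still love to clarify the difference between 'stats' and 'biome_baseline'.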
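The dictionary trouble comes from xmltodict mapping one `<E>` child to a dict and several to a list (and an empty element to None). xmltodict's `force_list` argument is meant for exactly this; a standard-library alternative is `xml.etree.ElementTree`, where `findall` always returns a list. A sketch of that alternative; the XML snippets are reconstructed from the parsed dicts above, so the element layout and the `$` prefix on biome keys are assumptions.

```python
import xml.etree.ElementTree as ET


def biomes_from_xml(xml_text):
    """Biome ids from <biomes_visited>, without the leading '$' (assumed prefix)."""
    root = ET.fromstring(xml_text)
    # findall always returns a list: empty, one element or many
    return [e.get('key', '').lstrip('$') for e in root.findall('./biomes_visited/E')]
```

Note one difference from the notebook's code: a missing or empty `biomes_visited` gives `[]` here rather than None.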

I feel like the most valuable variable to grab and analyse everything else against is the game's time: when it started and how long it lasted. The problem is that some games might have been saved and resumed later, and this isn't recorded in the files. I'll have to ignore that; most likely there's no workaround.
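Combining the start time from the file name with `playtime` in seconds gives an end time plus simple features (hour of day, weekday) to group runs by. A sketch with a hypothetical `time_features` helper; it inherits the caveat above, since a saved-and-resumed run still looks like one continuous session.

```python
from datetime import datetime, timedelta


def time_features(start_key, playtime_seconds):
    """Start/end datetimes and calendar features for one run.

    start_key is the 'YYYYMMDD-HHMMSS' prefix of the session file names.
    """
    start = datetime.strptime(start_key, '%Y%m%d-%H%M%S')
    duration = timedelta(seconds=float(playtime_seconds))
    return {
        'start': start,
        'end': start + duration,  # assumes the run wasn't paused and resumed
        'duration_min': duration.total_seconds() / 60,
        'hour': start.hour,
        'weekday': start.weekday(),  # Monday = 0
    }
```

For the sample run, `time_features('20230221-165450', '776.6')` ends at 17:07:46.6 on a Tuesday.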

There's also some decent info on biomes and the usual cause of death (other than midas, which is usually the death after finishing the game) to compare against play time and biomes visited.
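The `killed_by` strings look like `'<enemy> | <damage type>'`, with either side possibly empty (e.g. `' | midas'`, `'Heikko haulikkohiisi | projectile'`). A sketch for splitting them and counting the most common causes while leaving midas deaths out; `split_killed_by` and `top_causes` are hypothetical helpers, and skipping midas as "probably a win" is the rough heuristic from the paragraph above, not a rule.

```python
from collections import Counter


def split_killed_by(value):
    """' | midas' -> ('', 'midas'); 'Heikko haulikkohiisi | projectile' -> ('Heikko haulikkohiisi', 'projectile')."""
    entity, _, damage = (value or '').partition('|')
    return entity.strip(), damage.strip()


def top_causes(killed_by_values, n=5, skip=('midas',)):
    """Most common (entity, damage) pairs, ignoring empty values and skipped damage types."""
    counts = Counter()
    for value in killed_by_values:
        entity, damage = split_killed_by(value)
        if not (entity or damage) or damage in skip:
            continue
        counts[(entity, damage)] += 1
    return counts.most_common(n)
```

Once the DataFrame exists, `top_causes(df['killed_by'])` would give the overall ranking.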

In [270]:
```python
import pandas as pd
from datetime import datetime

# DataFrame column -> attribute of the <stats> element it's read from,
# in column order (the start time comes first, biomes_visited last).
STAT_FIELDS = {
    'damage_taken': '@damage_taken',
    'dead': '@dead',
    'death_X': '@death_pos.x',
    'death_Y': '@death_pos.y',
    'enemies_killed': '@enemies_killed',
    'gold_on_death': '@gold',
    'gold_total': '@gold_all',
    'healed': '@healed',
    'hp': '@hp',
    'items': '@items',
    'kicks': '@kicks',
    'killed_by': '@killed_by',
    'killed_by_extra': '@killed_by_extra',
    'places_visited': '@places_visited',
    'playtime': '@playtime_str',
    'projectiles': '@projectiles_shot',
    'winstreak': '@streaks',
    'teleports': '@teleports',
    'wands_edited': '@wands_edited',
    'game_seed': '@world_seed',
}
COLUMNS = ['datetime_the_game_started', *STAT_FIELDS, 'biomes_visited']

DTYPES = {
    'datetime_the_game_started': 'datetime64[ns]',
    'damage_taken': 'float64',
    'dead': 'int',
    'death_X': 'float64',
    'death_Y': 'float64',
    'enemies_killed': 'int64',
    'gold_on_death': 'int64',
    'gold_total': 'int64',
    'healed': 'float64',
    'hp': 'float64',
    'items': 'int64',
    'kicks': 'int64',
    'killed_by': 'str',
    'killed_by_extra': 'str',
    'places_visited': 'int64',
    # 'playtime' stays as datetime.time objects
    'projectiles': 'int64',
    'winstreak': 'int64',
    'teleports': 'int64',
    'wands_edited': 'int64',
    'game_seed': 'int64',
    'biomes_visited': 'object',
}


def _biomes_visited(old: dict, start_time):
    """Biome names without the leading '$', or None if the run has no biome list.

    xmltodict returns a dict for a single <E> element and a list for several,
    so both shapes are normalised to a list first.
    """
    biomes = old['Stats']['biomes_visited']
    if biomes is None:
        return None
    entries = biomes['E']
    if isinstance(entries, dict):
        entries = [entries]
    elif not isinstance(entries, list):
        raise ValueError(f"something weird with biomes for {start_time}")
    return [entry['@key'][1:] for entry in entries if '@key' in entry]


def _cols_from_stats(old: dict, start_time):
    """One DataFrame row (in COLUMNS order) for a single parsed *_stats.xml file."""
    stats = old['Stats']['stats']
    row = [datetime.strptime(start_time, '%Y%m%d-%H%M%S')]
    for column, attribute in STAT_FIELDS.items():
        value = stats[attribute]
        if column == 'playtime':
            # '%H:%M:%S' only parses runs shorter than 24 hours
            value = datetime.strptime(value, '%H:%M:%S').time()
        row.append(value)
    row.append(_biomes_visited(old, start_time))
    return row


def _dict_from_stats(old: dict, start_time):
    """The same row as _cols_from_stats, keyed by column name."""
    return dict(zip(COLUMNS, _cols_from_stats(old, start_time)))


def stats_to_pandas(old):
    return pd.DataFrame(
        data=[_cols_from_stats(run, key) for key, run in old.items()],
        columns=COLUMNS,
    ).astype(DTYPES)
```

In [271]:
```python
df = stats_to_pandas(stats)
```

In [272]:
```python
df.head()  # YASSSSS
```

| | datetime_the_game_started | damage_taken | dead | death_X | death_Y | enemies_killed | gold_on_death | gold_total | healed | hp | ... | killed_by | killed_by_extra | places_visited | playtime | projectiles | winstreak | teleports | wands_edited | game_seed | biomes_visited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-04-29 12:35:54 | 1753.76 | 1 | 6393.15 | 15163.0 | 95 | 4687 | 11407 | 7.0 | 100.0 | ... | \| midas | | 8 | 01:21:59 | 4443 | 0 | 0 | 1 | 1571725091 | [biome_boss_arena, biome_boss_victoryroom, bio... |
| 1 | 2021-07-30 21:23:21 | 3.62108 | 1 | 1904.35 | 838.847 | 18 | 210 | 210 | 0.0 | 100.0 | ... | | | 1 | 00:02:13 | 92 | 0 | 0 | 0 | 152013471 | [biome_coalmine] |
| 2 | 2021-07-30 21:26:59 | 4.25036 | 1 | -354.112 | 898.863 | 40 | 695 | 695 | 0.0 | 100.0 | ... | Heikko haulikkohiisi \| projectile | | 3 | 00:07:58 | 259 | 0 | 0 | 0 | 992607272 | [biome_coalmine, biome_coalmine_alt, biome_hol... |
| 3 | 2021-07-30 21:37:55 | 0.0 | 1 | 228.648 | -78.0028 | 0 | 0 | 0 | 0.0 | 100.0 | ... | | | 0 | 00:00:01 | 1 | 0 | 0 | 0 | 1768656056 | |
| 4 | 2021-07-30 21:38:05 | 4.31733 | 1 | 630.492 | 762.189 | 36 | 1662 | 1662 | 0.0 | 100.0 | ... | \| explosion | | 3 | 00:13:20 | 276 | 0 | 0 | 0 | 1006912916 | [biome_coalmine, biome_coalmine_alt, biome_hol... |


In [273]:
```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 609 entries, 0 to 608
Data columns (total 22 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   datetime_the_game_started  609 non-null    datetime64[ns]
 1   damage_taken               609 non-null    float64       
 2   dead                       609 non-null    int32         
 3   death_X                    609 non-null    float64       
 4   death_Y                    609 non-null    float64       
 5   enemies_killed             609 non-null    int64         
 6   gold_on_death              609 non-null    int64         
 7   gold_total                 609 non-null    int64         
 8   healed                     609 non-null    float64       
 9   hp                         609 non-null    float64       
 10  items                      609 non-null    int64         
 11  kicks                      609 non-null    int64         
 12  killed_b
```

In [276]:
```python
df.drop(columns=['game_seed']).describe()
```

| | damage_taken | dead | death_X | death_Y | enemies_killed | gold_on_death | gold_total | healed | hp | items | kicks | places_visited | projectiles | winstreak | teleports | wands_edited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 | 609.0 |
| mean | 382.313556 | 0.853859 | 5948.089 | 5228.697304 | 189.793103 | 7000265.0 | 7003466.0 | 5.766138 | 100.0 | 22.045977 | 33.615764 | 6.068966 | 7874.458128 | 0.0 | 1.880131 | 6.671593 |
| std | 851.207857 | 0.353538 | 121115.8 | 6204.765696 | 292.855766 | 121952700.0 | 121954400.0 | 60.439723 | 0.0 | 24.184757 | 37.415916 | 5.907887 | 25697.823826 | 0.0 | 4.94846 | 7.746001 |
| min | 0.0 | 0.0 | -287958.0 | -1122.0 | 0.0 | 0.0 | 0.0 | 0.0 | 100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 4.02266 | 1.0 | -189.37 | 605.029 | 16.0 | 280.0 | 430.0 | 0.0 | 100.0 | 4.0 | 6.0 | 2.0 | 99.0 | 0.0 | 0.0 | 0.0 |
| 50% | 8.34233 | 1.0 | 262.511 | 1835.0 | 72.0 | 1033.0 | 1792.0 | 0.0 | 100.0 | 15.0 | 24.0 | 5.0 | 944.0 | 0.0 | 0.0 | 4.0 |
| 75% | 101.273 | 1.0 | 4304.0 | 13111.0 | 282.0 | 3853.0 | 7921.0 | 0.0 | 100.0 | 37.0 | 49.0 | 9.0 | 5971.0 | 0.0 | 2.0 | 12.0 |
| max | 9877.49 | 1.0 | 2974800.0 | 18019.3 | 2846.0 | 2147484000.0 | 2147529000.0 | 1389.85 | 100.0 | 252.0 | 345.0 | 45.0 | 517770.0 | 0.0 | 57.0 | 49.0 |


```
0      True
1      True
2      True
3      True
4      True
       ... 
604    True
605    True
606    True
607    True
608    True
Name: dead, Length: 609, dtype: bool
```
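For the death map idea (`death_pos`), the `describe()` output is a warning: the middle 50% of `death_X` lies between about -189 and 4304, but the extremes reach -287958 and 2974800, so a plain scatter plot would be dominated by a few far-away deaths. A sketch of binning positions into coarse grid cells and counting deaths per cell; the 512-pixel cell size and the `death_grid` name are arbitrary choices for illustration.

```python
import math
from collections import Counter


def death_grid(positions, cell=512):
    """Count deaths per (cell_x, cell_y) grid cell; positions are (x, y) pairs."""
    counts = Counter()
    for x, y in positions:
        counts[(math.floor(x / cell), math.floor(y / cell))] += 1
    return counts
```

With the DataFrame, `death_grid(zip(df['death_X'], df['death_Y'])).most_common(10)` would list the deadliest cells; the far-away outliers simply end up in their own sparse cells.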