### Surviving Mars Maps
Surviving Mars is a sci-fi city builder all about colonizing Mars and, well, surviving. The [Surviving Mars Maps dataset](https://www.kaggle.com/peijenlin/surviving-mars-maps) lists every colony location along with its environmental conditions and the breakthroughs available on each map. I've never played the game, or even heard of it before, but I had some time and wanted to at least take a cursory look at the data. What I found led me to more questions than I had when I started.


In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv('./data/MapData-Evans-GP-Flatten.csv', skipinitialspace=True)
totalRows = len(df)
print(str(totalRows) + " rows loaded.")

50901 rows loaded.


In [3]:
# let's look at the names of all the columns
column_names = df.columns
print(column_names)

Index(['Latitude °', 'Latitude', 'Longitude °', 'Longitude', 'Topography',
       'Difficulty Challenge', 'Altitude', 'Temperature', 'Metals',
       'Rare Metals', 'Concrete', 'Water', 'Dust Devils', 'Dust Storms',
       'Meteors', 'Cold Waves', 'Map Name', 'Named Location',
       'Advanced Drone Drive', 'Alien Imprints', 'Ancient Terraforming Device',
       'Artificial Muscles', 'Autonomous Hubs', 'Cloning',
       'Construction Nanites ', 'Core Metals', 'Core Rare Metals',
       'Core Water', 'Cryo-sleep', 'Designed Forestation', 'Dome Streamlining',
       'Dry Farming', 'Eternal Fusion', 'Extractor AI', 'Factory Automation',
       'Forever Young', 'Frictionless Composites', 'Gem Architecture',
       'Gene Selection', 'Giant Crops', 'Good Vibrations', 'Hive Mind ',
       'Hull Polarization', 'Hypersensitive Photovoltaics',
       'Inspiring Architecture', 'Interplanetary Learning', 'Lake Vaporators',
       'Landscaping Nanites', 'Magnetic Extraction', 'Martian Diet',
      

Let's print the unique values appearing in each column.

In [4]:
# print each column name followed by the distinct values it contains
for col in column_names:
    print(col)
    print(df[col].unique())
    print('')

Latitude °
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70]

Latitude
['S' 'N']

Longitude °
[  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53
  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71
  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107
 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161
 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
 180]

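Printing every unique value gets long quickly (the `Longitude °` list alone runs to 181 values). A more compact first pass is `DataFrame.nunique()`, which reports how many distinct values each column holds. Here is a minimal sketch on a small invented frame (the column names are borrowed from the dataset, the values are made up):

```python
import pandas as pd

# Toy stand-in for the map data; these values are invented for illustration.
toy = pd.DataFrame({
    "Latitude": ["S", "N", "S", "N"],
    "Topography": ["Relatively Flat", "Steep", "Rough", "Mountainous"],
    "Cloning": [True, False, True, True],
})

# Number of distinct values per column, returned as a Series indexed by column name.
distinct_counts = toy.nunique()
print(distinct_counts)
```

On the full `df`, the same call would give one count per column, which makes two-valued columns easy to spot before printing them in full.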

The first question I'd like to answer:

    Does non-flat topography have a detrimental effect on the indicators of a thriving city? In other words, does the Topography type "Relatively Flat" have an advantage over the other three types when it comes to indicators of advanced technology, such as "Advanced Drone Drive", "Artificial Muscles", "Forever Young", and all the other good-sounding breakthroughs?
    
To answer the question, we must first decide what the indicators of a thriving city are. Looking at the column names, it seems that all the boolean columns are breakthroughs, i.e., capabilities you would eventually want your colony to have if you were playing the game. If that's the case, we can simply check whether the relatively flat locations, ostensibly the better parts of the map for building cities, have more True values in these columns than locations with the other topography types.

In [5]:
# get the names of all boolean type columns
bool_cols = df.select_dtypes(include=['bool']).columns
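One caveat with `select_dtypes(include=['bool'])`: if a True/False column has any blank cells, `read_csv` cannot store it as `bool` and falls back to `object`, so that column would silently drop out of `bool_cols`. Whether this dataset has any such blanks isn't checked here; the sketch below only demonstrates the behavior on a tiny invented CSV (the column names `complete` and `gappy` are hypothetical):

```python
import io

import pandas as pd

# A tiny invented CSV: 'complete' holds only True/False, 'gappy' has one blank cell.
csv_text = "complete,gappy\nTrue,True\nFalse,\nTrue,False\n"
frame = pd.read_csv(io.StringIO(csv_text))

# 'complete' is inferred as bool; 'gappy' falls back to object because of the NaN.
print(frame.dtypes)

# Only the fully populated column is picked up as boolean.
bool_like = frame.select_dtypes(include=["bool"]).columns
print(list(bool_like))
```

On the real data, comparing `len(bool_cols)` with the number of breakthrough columns you expect would catch this.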

Now that we have a list of all the boolean columns, let's create a new column to hold, for each row, the count of True values across those columns. Then we can check whether the mean count for "Relatively Flat" is higher than the mean counts for the other three types.

In [6]:
# a new column for the True counts of the bool columns in each row
df['bool True count'] = df.loc[:,bool_cols].sum(axis=1)
df.head(5)

| | Latitude ° | Latitude | Longitude ° | Longitude | Topography | Difficulty Challenge | Altitude | Temperature | Metals | Rare Metals | ... | Superfungus | Superior Cables | Superior Pipes | Sustained Workload | The Positronic Brain | Vector Pump | Vocation-Oriented Society | Wireless Power | Zero-Space Computing | bool True count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | S | 0 | E | Relatively Flat | 140 | -929 | -1 | 2 | 2 | ... | True | False | False | False | False | False | True | False | True | 17 |
| 1 | 0 | S | 1 | E | Relatively Flat | 100 | -1160 | -1 | 2 | 2 | ... | False | False | False | True | False | False | False | False | False | 17 |
| 2 | 0 | S | 1 | W | Relatively Flat | 100 | -1160 | -3 | 2 | 2 | ... | False | False | False | False | False | False | True | False | False | 17 |
| 3 | 0 | S | 2 | E | Relatively Flat | 140 | -1160 | -1 | 2 | 2 | ... | False | True | False | False | False | False | False | False | False | 17 |
| 4 | 0 | S | 2 | W | Relatively Flat | 100 | -929 | -3 | 2 | 2 | ... | False | False | False | False | False | True | False | False | True | 17 |
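Summing boolean columns row-wise counts the Trues because pandas treats True as 1 and False as 0. A minimal sketch on a made-up frame (the column names are borrowed from the dataset, the values are invented):

```python
import pandas as pd

# Invented example rows; not taken from the dataset.
toy = pd.DataFrame({
    "Topography": ["Relatively Flat", "Steep", "Rough"],
    "Cloning": [True, False, True],
    "Giant Crops": [True, True, False],
    "Vector Pump": [False, True, False],
})

# Select only the boolean columns, as in the cell above.
bool_cols = toy.select_dtypes(include=["bool"]).columns

# Row-wise sum: each True contributes 1, each False contributes 0.
toy["bool True count"] = toy.loc[:, bool_cols].sum(axis=1)
print(toy["bool True count"].tolist())
```

The non-boolean `Topography` column is excluded by `select_dtypes`, so it never enters the sum.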


In [7]:
# boolean mask selecting the "Relatively Flat" rows
is_rflat = df['Topography'] == "Relatively Flat"
# the number of "Relatively Flat" entries
nb_rflat = is_rflat.sum()
# the average boolean True count over all "Relatively Flat" entries
df.loc[is_rflat, 'bool True count'].mean()

17.0

Now, let's compare this value, 17, with the mean boolean True counts for the other three types of Topography entries. First up is "Steep":

In [8]:
is_steep = df['Topography'] == "Steep"
nb_steep = is_steep.sum()
df.loc[is_steep, 'bool True count'].mean()

17.0

OK... that's weird; it's 17 too. Next, let's find the mean count for "Rough":

In [9]:
is_rough = df['Topography'] == "Rough"
nb_rough = is_rough.sum()
df.loc[is_rough, 'bool True count'].mean()

17.0

Whuuut? Is the mean for "Mountainous" also 17?  

In [10]:
is_mount = df['Topography'] == "Mountainous"
nb_mount = is_mount.sum()
df.loc[is_mount, 'bool True count'].mean()

17.0
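The four cells above repeat the same computation once per Topography type. A `groupby` produces all four means in a single call. The sketch below runs on a small invented frame so it stands alone; the counts are made up and are not the dataset's values:

```python
import pandas as pd

# Invented counts for illustration only; NOT taken from the dataset.
toy = pd.DataFrame({
    "Topography": ["Relatively Flat", "Relatively Flat", "Steep", "Rough", "Mountainous"],
    "bool True count": [3, 5, 4, 2, 6],
})

# One mean per Topography type, instead of one cell per type.
means = toy.groupby("Topography")["bool True count"].mean()
print(means)
```

On the real frame this would be `df.groupby('Topography')['bool True count'].mean()`; swapping `.mean()` for `.agg(['mean', 'min', 'max'])` would also show whether the count varies at all within each type.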

Are you kidding me? All four Topography types have the same mean number of True values? There are 17 True values in every row of this 50,901-row dataset! These Trues are not all in the same columns, either; you can simply print more rows to verify. I assume, then, that this number is dictated by the game's rules. Maybe I need to learn more about the game to understand why.
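Both observations can also be checked programmatically rather than by printing rows: if every row has the same number of Trues, the per-row count has exactly one unique value; and if the Trues sit in different columns, the boolean columns contain more than one distinct row pattern. A sketch on an invented frame where each row has exactly two Trues (on the real data you would use `df` and `bool_cols` from the cells above):

```python
import pandas as pd

# Invented example: every row has exactly two True values, in varying columns.
toy = pd.DataFrame({
    "Cloning": [True, False, True],
    "Giant Crops": [True, True, False],
    "Vector Pump": [False, True, True],
})
bool_cols = toy.select_dtypes(include=["bool"]).columns
counts = toy[bool_cols].sum(axis=1)

# Check 1: the same number of Trues in every row.
same_count_everywhere = counts.nunique() == 1

# Check 2: how many distinct True/False patterns occur across rows.
distinct_patterns = len(toy[bool_cols].drop_duplicates())

print(same_count_everywhere, counts.iloc[0], distinct_patterns)
```

A `distinct_patterns` value greater than 1 confirms that the Trues are not always in the same columns.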