## A little note about Truthy and Falsey
This will serve you well when thinking about conditions and 
making choices with if / else statements.

In [16]:
# Key point: 1s and 0s are treated as True / False, so we can
# check whether a feature is set in a given column in two ways

# This reads nice
if row1['one_hot_978'] == 1:
    print('row 1: 978 feature exists')
    
    
# This is shorthand. It is less readable, but that is personal preference
if row1['one_hot_978']:  
    print('row 1: 978 feature exists just the same')
    
    
# Similarly, if you want to negate the condition, you 
# can use the `not` keyword

if not row2['one_hot_978']:
    print('row 2: 978 feature does not exist')

# Or you can just check whether it equals zero
if row2['one_hot_978'] == 0:
    print('row 2: 978 feature does not exist')

row 1: 978 feature exists
row 1: 978 feature exists just the same
row 2: 978 feature does not exist
row 2: 978 feature does not exist


In [17]:
# This works because 0 compares equal to False
row2['one_hot_978'] == False

True

In [18]:
# Careful: `not` applies to the whole comparison, i.e. not (value == False)
not row2['one_hot_978'] == False

False
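
A quick sketch of what is going on underneath, using plain Python values rather than the `row` dictionaries (the variable `value` is just a stand-in). `bool()` shows how a value is treated by an `if`, and the later lines show that `not` applies to the whole comparison.

```python
# Truthiness of the values we expect in a one-hot column
print(bool(1))   # True
print(bool(0))   # False

# 1 and 0 also compare equal to True and False
print(1 == True)    # True
print(0 == False)   # True

# Any other non-zero number is truthy too, but it does not equal 1,
# so `if x:` and `if x == 1:` only agree when the column holds 0s and 1s
print(bool(2), 2 == 1)   # True False

# `not` binds more loosely than `==`, so these two lines are the same
value = 0
print(not value == 0)     # False
print(not (value == 0))   # False

# What we usually mean is one of these
print(value == 0)   # True
print(not value)    # True
```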

In [19]:
# A decision tree written as nested if / else statements
def decision_tree(row):
    """
    takes a dictionary called `row`, and asks a series of 
    if questions to determine the mineral name
    
    Args:
        row (dict): a dictionary of 1s and zeros of key features
    
    Returns:
        mineral (str): the mineral classification
    """
    if row['one_hot_978']:
        if row['one_hot_915']:
            if row['one_hot_909']:
                mineral = 'EUD'
            else:
                if row['one_hot_911']:
                    mineral = 'Donnalyte'
                else:
                    if row['one_hot_805']:
                        mineral = 'Kainosite or Xenotime'
                    else:
                        # none of the known minerals matched
                        mineral = 'unknown'
        else:
            mineral = 'unknown'
            
    else:
        mineral = 'unknown'
    
    return mineral
            
    

# Let's create some data
Do confirm these or add your own. Note I've only built the tree to solve for these three minerals.

So you can put in other minerals and the function should return "unknown", which means it will fail gracefully instead of raising an error, I think.
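
One way to make sure every path returns something is to set the fallback answer before asking any questions. The sketch below is a trimmed-down, hypothetical `tiny_tree` (not the full `decision_tree`) that only checks the three columns needed for EUD, just to show the pattern. Using `row.get(key, 0)` treats a missing column as 0, which is an assumption that may or may not suit your data.

```python
def tiny_tree(row):
    """Hypothetical cut-down tree, used only to show the default pattern."""
    # Start from the fallback answer, then overwrite it if a branch matches
    mineral = 'unknown'

    # .get(key, 0) returns 0 when the key is missing instead of raising KeyError
    if row.get('one_hot_978', 0):
        if row.get('one_hot_915', 0):
            if row.get('one_hot_909', 0):
                mineral = 'EUD'

    return mineral


print(tiny_tree({'one_hot_978': 1, 'one_hot_915': 1, 'one_hot_909': 1}))  # EUD
print(tiny_tree({'one_hot_978': 1, 'one_hot_915': 1, 'one_hot_909': 0}))  # unknown
print(tiny_tree({}))                                                      # unknown
```

Because `mineral` is assigned before the first `if`, there is no combination of 1s and 0s that can reach `return` without a value.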

In [29]:
# Create a dictionary containing the one-hot columns we need / want.
# This is a hack, because you'll eventually have them as the columns of 
# the DataFrame, and the syntax for accessing the columns will be the 
# same. Each row of a Pandas DataFrame behaves a lot like a fancy 
# dictionary.

# EUD
row1 = {'one_hot_978': 1, 
        'one_hot_915': 1,
        'one_hot_909': 1,
        'one_hot_911': 0,
        'one_hot_805': 0
        }

# Donnalyte
row2 = {'one_hot_978': 1, 
        'one_hot_915': 1,
        'one_hot_909': 0,
        'one_hot_911': 1,
        'one_hot_805': 0
        }

# Kainosite or Xenotime
row3 = {'one_hot_978': 1, 
        'one_hot_915': 1,
        'one_hot_909': 0,
        'one_hot_911': 0,
        'one_hot_805': 1
        }
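
To see why the dictionary stand-in works, here is a small sketch comparing a plain dictionary with a row pulled out of a DataFrame. The `example_row` values are made up for illustration. `example_df.iloc[0]` returns the first row as a pandas Series, which accepts the same square-bracket lookup as a dictionary.

```python
import pandas as pd

# Made-up values, only for illustration
example_row = {'one_hot_978': 1, 'one_hot_915': 0}

# A one-row DataFrame built from the same dictionary
example_df = pd.DataFrame([example_row])
first_row = example_df.iloc[0]

# The lookup syntax is identical for the dict and the DataFrame row
print(example_row['one_hot_978'])
print(first_row['one_hot_978'])

# ...so the same `if` works on either of them
if first_row['one_hot_978']:
    print('978 feature exists')
```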

In [58]:
# should be EUD, Donnalyte, then Kainosite or Xenotime (enumerate counts from 0)
for i, row in enumerate([row1, row2, row3]):
    print(f'row {i} --> {decision_tree(row)}')


row 0 --> EUD
row 1 --> Donnalyte
row 2 --> Kainosite or Xenotime


Now, that is all fine and good, but we can also make a mini-DataFrame out of these three rows. Eventually we'll just make this directly from the spreadsheet of everything, and make sure the column names in the function match the column names in the file.

In [31]:
import pandas as pd
df = pd.DataFrame([row1, row2, row3])

In [33]:
# This is starting to look more like your spreadsheet. It doesn't have all 
# the columns, but you get the idea. Indeed, it doesn't need all the columns.
df

Unnamed: 0,one_hot_978,one_hot_915,one_hot_909,one_hot_911,one_hot_805
0,1,1,1,0,0
1,1,1,0,1,0
2,1,1,0,0,1


So once you have the `decision_tree()` function working for a single row, you can loop over the entire collection of rows. In Pandas there is a handy method to do this called `apply`.

In [35]:
# It looks magical, but it basically applies our function across all the rows.
df['mineral_predicton'] = df.apply(decision_tree, axis=1) 

Amazing!

Now when we look at our DataFrame, we can see that it has a new column.

In [37]:
df

Unnamed: 0,one_hot_978,one_hot_915,one_hot_909,one_hot_911,one_hot_805,mineral_predicton
0,1,1,1,0,0,EUD
1,1,1,0,1,0,Donnalyte
2,1,1,0,0,1,Kainosite or Xenotime
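
If the `apply` call above feels too magical, the sketch below shows roughly the same thing written as an explicit loop with `iterrows()`. It uses a hypothetical stand-in function `has_978` and a made-up two-row DataFrame so that it runs on its own. With `axis=1`, pandas hands the function one row at a time.

```python
import pandas as pd

def has_978(row):
    """Hypothetical stand-in for decision_tree: one question, two answers."""
    return 'yes' if row['one_hot_978'] else 'no'

# Made-up data, only for illustration
small_df = pd.DataFrame([{'one_hot_978': 1, 'one_hot_915': 1},
                         {'one_hot_978': 0, 'one_hot_915': 1}])

# The apply version: call has_978 once per row
applied = small_df.apply(has_978, axis=1)

# The same thing as an explicit loop over the rows
looped = []
for index, row in small_df.iterrows():
    looped.append(has_978(row))

print(list(applied))   # ['yes', 'no']
print(looped)          # ['yes', 'no']
```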


In [48]:
# we can reorder the columns if we want to, putting the prediction first
new_column_order = list(df.columns[-1:]) + list(df.columns[:-1])
df_predictions = df[new_column_order]

Now we can save this new DataFrame to a CSV file.

In [47]:
df_predictions.to_csv('predictions.csv')

Unnamed: 0,one_hot_978,one_hot_915,one_hot_909,one_hot_911,one_hot_805,mineral_predicton
0,1,1,1,0,0,EUD
1,1,1,0,1,0,Donnalyte
2,1,1,0,0,1,Kainosite or Xenotime


In [None]:
# In practice, we'll want to use the full CSV file which you can access like so.
df = pd.read_csv('filename.csv')
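
One thing to watch for when writing and re-reading CSV files: by default `to_csv` also writes the row index, and `read_csv` then loads it back as an extra column named `Unnamed: 0`. The sketch below shows the round trip on a made-up two-row DataFrame, using an in-memory string instead of a real file so nothing is written to disk. Whether you want `index=False` depends on whether the index carries meaning in your data.

```python
import io
import pandas as pd

# Made-up data, only for illustration
small_df = pd.DataFrame({'one_hot_978': [1, 0],
                         'mineral': ['EUD', 'unknown']})

# Default: the index is written as a first column with an empty header
csv_with_index = small_df.to_csv()
cols_with_index = pd.read_csv(io.StringIO(csv_with_index)).columns.tolist()
print(cols_with_index)      # ['Unnamed: 0', 'one_hot_978', 'mineral']

# index=False leaves the index out, so the columns survive the round trip
csv_without_index = small_df.to_csv(index=False)
cols_without_index = pd.read_csv(io.StringIO(csv_without_index)).columns.tolist()
print(cols_without_index)   # ['one_hot_978', 'mineral']
```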