# Predicting stratigraphy from geochemistry, and predicting Cu and Zn from other geochemical analytes

Research questions:
1. Can geochemistry be used to accurately predict the dominant catchment lithology of a stream sediment sample in the dataset?
2. Can the concentrations of many analytes be used to accurately predict the concentration of a single analyte, such as copper?

Reference: https://towardsdatascience.com/exploring-use-cases-of-machine-learning-in-the-geosciences-b72ea7aafe2

In [4]:
#%pip install openpyxl

In [1]:
import pandas as pd

df = pd.read_excel("../data/rgs2020_data.xlsx")

In [3]:
# Analytes measured by ICP-MS, plus the sample ID, coordinates and stratigraphic unit
cols = ['MASTERID', 'LAT', 'LONG', 'STRAT'] + [col for col in df.columns if 'ICP' in col]
quest = df.loc[df["NAME"] == "QUEST", cols]

In [4]:
# Drop columns that are empty for every QUEST sample
quest = quest.dropna(axis=1, how="all")

In [5]:
quest.shape

(1959, 39)

In [6]:
quest

Unnamed: 0,MASTERID,LAT,LONG,STRAT,Ag_ICP_PPB,Al_ICP_PCT,As_ICP_PPM,Ba_ICP_PPM,Bi_ICP_PPM,Ca_ICP_PCT,...,Sr_ICP_PPM,Te_ICP_PPM,Th_ICP_PPM,Ti_ICP_PCT,Tl_ICP_PPM,U_ICP_PPM,V_ICP_PPM,W_ICP_PPM,Zn_ICP_PPM,La_ICP_PPM
31185,ID093G071002,53.61138,-122.97780,LTQCh,320.0,1.48,8.3,233.5,0.10,0.95,...,64.0,0.12,2.9,0.040,0.30,7.4,78.0,-0.1,81.7,15.5
31186,ID093G071003,53.62691,-122.96569,LTQCh,220.0,1.05,1.8,171.0,0.06,1.52,...,114.0,0.10,1.3,0.014,0.12,5.5,24.0,-0.1,21.5,15.5
31187,ID093G071004,53.65023,-122.98068,LTQCh,360.0,1.22,6.7,162.0,0.08,0.99,...,50.5,0.06,0.9,0.028,0.12,1.5,36.0,-0.1,111.6,11.0
31188,ID093G071005,53.66609,-123.02010,LTQCh,500.0,1.60,7.3,244.0,0.12,0.72,...,47.5,0.06,0.7,0.029,0.14,1.8,60.0,-0.1,86.8,24.0
31189,ID093G071006,53.63407,-122.99388,LTQCh,340.0,1.76,4.5,205.5,0.10,0.92,...,57.0,0.08,1.7,0.065,0.22,2.9,58.0,-0.1,99.4,15.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43980,ID093O075043,55.06867,-123.11723,TrJTk,360.0,0.60,7.4,23.5,0.06,2.34,...,73.5,0.06,0.6,0.017,0.10,2.6,24.0,-0.1,122.4,3.5
43981,ID093O075044,55.06867,-123.11723,TrJTk,380.0,0.63,6.9,26.0,0.06,2.26,...,71.5,0.06,0.5,0.019,0.08,1.7,24.0,-0.1,115.5,3.5
43982,ID093O075045,55.04947,-123.08231,CmOKe,140.0,1.10,5.5,53.0,0.08,1.13,...,35.0,0.08,1.1,0.041,0.08,2.6,32.0,-0.1,77.9,8.0
43983,ID093O075046,55.06389,-123.09069,CmOKe,160.0,0.95,10.6,37.5,0.08,1.63,...,49.5,0.06,0.8,0.035,0.08,5.4,36.0,-0.1,79.9,6.0
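
Every displayed row above has -0.1 in W_ICP_PPM. In geochemical survey tables, a negative value often flags a result below the detection limit, with the magnitude giving the limit. That reading is an assumption here; nothing in this notebook confirms it. Under that assumption, a minimal sketch replaces each negative value with half of its absolute value, a common substitution, before any modelling. The helper name `replace_below_detection`, the `factor` parameter and the toy frame are hypothetical.

```python
import pandas as pd

def replace_below_detection(frame, analyte_cols, factor=0.5):
    """Return a copy where each negative value v becomes factor * abs(v).

    Assumption: negative entries encode "below detection limit" and their
    magnitude equals the limit. The source data does not confirm this.
    """
    out = frame.copy()
    for col in analyte_cols:
        values = out[col]
        # Keep non-negative values; substitute the scaled limit elsewhere.
        out[col] = values.where(values >= 0, values.abs() * factor)
    return out

# Toy values copied from the first rows of the table above.
toy = pd.DataFrame({
    "W_ICP_PPM": [-0.1, 0.3, -0.1],
    "Zn_ICP_PPM": [81.7, 21.5, 111.6],
})
cleaned = replace_below_detection(toy, ["W_ICP_PPM", "Zn_ICP_PPM"])
print(cleaned)
```

On the real table, the same helper could be applied to every column whose name contains `ICP`, mirroring the column selection used earlier in the notebook.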


In [91]:
path = "/home/alberto/geoscience_ml/"

In [92]:
import os
os.listdir(path)

['litho-Table 1.csv',
 'analytes.csv',
 'Critical_Minerals_ID.ipynb',
 'quest_zn_ml_blended.csv',
 'README.md',
 'quest_cu_ml_blended.csv',
 '.git',
 'ml_analytes.ipynb',
 'quest_geo_unit_ml.csv',
 'metrics.ipynb',
 'ml_geo_unit_v3_Quest.ipynb']

In [105]:
# Load the analytes table from the project directory
a = pd.read_csv(os.path.join(path, 'analytes.csv'))

In [107]:
a.shape

(1959, 40)

In [109]:
a["rock_type"].value_counts()

basaltic volcanic rocks                                      569
mudstone, siltstone, shale fine clastic sedimentary rocks    393
volcaniclastic rocks                                         245
paragneiss metamorphic rocks                                 240
calc-alkaline volcanic rocks                                 132
undivided sedimentary rocks                                   71
limestone, marble, calcareous sedimentary rocks               39
basaltic volcaniclastic rocks                                 39
lower amphibolite/kyanite grade metamorphic rocks             35
pegmatitic intrusive rocks                                    25
coarse volcaniclastic and pyroclastic volcanic rocks          23
calcsilicate metamorphic rocks                                18
granite, alkali feldspar granite intrusive rocks              18
chert, siliceous argillite, siliciclastic rocks               17
mudstone/laminite fine clastic sedimentary rocks              15
granodioritic intrusive r

In [111]:
a["STRAT"].value_counts()

TrJTk     1096
KTpg       240
MTrCc      177
CPSm        93
uTrJNc      63
LTQCh       60
PJog        28
Kpe         25
lJCl        23
PJml        20
KTmc        18
OMEa        15
Kgr         13
MJqm        13
SDs         10
EKgd         9
?ETOo        9
ETOo         8
CmOKe        8
ETgd         6
uPrPzS       6
ETgr         4
uKTS         4
CmOs         3
DPBc         2
CTrus        2
EJqm         1
EJsy         1
Pzum         1
TrJum        1
Name: STRAT, dtype: int64
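
The counts above are heavily imbalanced: TrJTk accounts for 1096 of the 1959 samples, while four units appear only once. A classifier can be neither trained nor evaluated on a class with a single sample, so one option is to drop units below a minimum count and hold out a test set stratified by STRAT. The sketch below runs on a small made-up frame. The helper names, the `min_count` threshold and the 25% test fraction are illustrative assumptions, not choices taken from this notebook. It relies on `DataFrameGroupBy.sample`, which requires pandas 1.1 or newer.

```python
import pandas as pd

def drop_rare_classes(frame, label_col, min_count):
    """Keep only rows whose label occurs at least min_count times."""
    counts = frame[label_col].value_counts()
    keep = counts[counts >= min_count].index
    return frame[frame[label_col].isin(keep)]

def stratified_split(frame, label_col, test_frac, seed=0):
    """Sample test_frac of each class for testing; the rest is training."""
    test = frame.groupby(label_col).sample(frac=test_frac, random_state=seed)
    train = frame.drop(test.index)  # assumes a unique index
    return train, test

# Toy data only: 8 + 4 + 1 samples across three units.
toy = pd.DataFrame({
    "STRAT": ["TrJTk"] * 8 + ["KTpg"] * 4 + ["EJsy"],
    "Zn_ICP_PPM": [float(i) for i in range(13)],
})
filtered = drop_rare_classes(toy, "STRAT", min_count=4)
train, test = stratified_split(filtered, "STRAT", test_frac=0.25)
print(train["STRAT"].value_counts())
print(test["STRAT"].value_counts())
```

Sampling within each group keeps the class proportions of the test set close to those of the full table, so accuracy is not measured almost entirely on the dominant unit.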