# Mineral Potential Mapping with Python - A Practical Tool

This notebook develops a simple but practical Python tool to help with geochemical modelling. The core of the process is based on an article by Mahyar Yousefi and Emmanuel Carranza <a name="cite_ref-1"></a>[<sup>[1]</sup>](#cite_note-1). According to the article, weights are usually assigned to the different classes of evidence in an evidential map through expert opinion and some trial and error. To make this process data-driven rather than dependent on expert opinion, the article introduces a method that uses prediction-area (P-A) plots and normalized density to determine the weight of each evidential map. The sections below implement the approach discussed in that work in Python.

The programming language used here is Python. There are several reasons for this choice; the most important are that the language is flexible and easy to read and develop in, so researchers can easily modify the code to suit their needs.

This program has been developed in three parts. The first, [Part 1](#Part1), covers all the preprocessing, from installing and importing the necessary libraries to statistical analysis of the data. The second, [Part 2](#Part2), contains the core of the processing: gridding, mapping, principal component (PC) analysis, Concentration-Area fractal modelling, discretizing, and drawing Prediction-Area diagrams. The [final part](#Finalwords) is dedicated to discussing current issues with the code and suggestions for future development.

## Table of Contents

1. [Part 1](#Part1)
    1. [Libraries](#Libraries)
    2. [The data](#data)
    3. [Input Variables](#var)
    4. [Statistical Analysis](#stats)
        1. [Basic Statistics](#basicstats)
2. [Part 2](#Part2)
    1. [Gridding](#gridding)
4. [References](#refer)

## Part 1 <a name="Part1"></a>

Preprocessing starts here. This stage includes: installing/importing the necessary libraries, creating a pandas DataFrame and importing the data, importing all the necessary variables, and analyzing the data statistically.

### Libraries <a name="Libraries"></a>

This program relies on several well-known libraries: `numpy`, `pandas`, `matplotlib`, `scikit-learn`, and `scipy`. It is designed to have as few dependencies as possible. Please make sure these packages are installed; if they are not, uncomment the lines in the following cell to install them.

In [1]:
#import sys
#!{sys.executable} -m pip install numpy pandas matplotlib scikit-learn scipy
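
Before uncommenting the install line, it can help to check which libraries are actually missing. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for this illustration. Note that it checks import names, so `scikit-learn` appears as `sklearn`.

```python
import importlib.util

# Import names, not PyPI names: scikit-learn is imported as `sklearn`.
REQUIRED_MODULES = ["numpy", "pandas", "matplotlib", "sklearn", "scipy"]


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If the printed list is empty, the install cell can stay commented out.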

If the packages are already installed, we only need to import them.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA
from scipy.interpolate import griddata
from scipy import interpolate, stats
from scipy.stats import describe, probplot
from math import *

### The data <a name="data"></a>


In the following cell, the data, stored in a CSV file, is imported into a pandas DataFrame. A DataFrame makes the data flexible to handle and simplifies later processing. After importing, some minor cleaning is done: rows with empty cells and duplicate rows are dropped from the DataFrame.

By default, the program reads the file from the notebook's working directory, but the user can supply any file path as long as the file is a CSV. To import an Excel file instead, please read the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) and modify the code accordingly.

In [6]:
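# Read the CSV file from the working directory (change the path if needed).
raw_data = pd.read_csv('Data.csv')
# Treat empty strings as missing values, then drop every row that has one.
raw_data = raw_data.replace("", np.nan)
raw_data = raw_data.dropna()
# Remove duplicate rows and keep only the columns from 'X' onwards.
raw_data = raw_data.drop_duplicates().loc[:, 'X':]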
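
For readers who want to point the program at another path or at an Excel workbook, a loader could be sketched as follows. This is only an illustration: the helper name `load_table` and the choice to dispatch on the file extension are assumptions, not part of the original program, and `pd.read_excel` additionally needs an Excel engine such as `openpyxl` to be installed.

```python
from pathlib import Path

import pandas as pd


def load_table(path):
    """Read a CSV or Excel file into a DataFrame, chosen by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xls", ".xlsx"}:
        # read_excel needs an Excel engine such as openpyxl to be installed.
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

With this helper, `raw_data = load_table('Data.csv')` would replace the `pd.read_csv` call, and the cleaning steps would stay unchanged.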

It helps to look at the first few rows of the CSV file now that it is loaded into a DataFrame. The following cell displays them.

In [5]:
raw_data.head()

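|   | X | Y | Zn | Pb | Ag | Cu | Mo | Cr | Ni | Co | Ba |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 431434.79 | 3305001.94 | 87 | 7 | 0.02 | 42 | 2 | 57 | 38 | 14 | 393 |
| 1 | 432522.6 | 3298058.43 | 70 | 6 | 0.02 | 56 | 2 | 27 | 29 | 20 | 404 |
| 2 | 438045.35 | 3291987.05 | 75 | 5 | 0.02 | 69 | 2 | 44 | 23 | 17 | 417 |
| 3 | 436260.76 | 3294412.9 | 70 | 6 | 0.02 | 54 | 1 | 20 | 19 | 12 | 377 |
| 4 | 439294.48 | 3297653.81 | 168 | 14 | 0.02 | 27 | 2 | 31 | 18 | 14 | 277 |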


This data comes from an area in southeastern Iran. The DataFrame contains Cartesian coordinates and the concentrations of 9 elements. All of the following processes use this dataset.

### Input Variables<a name="var"></a>

The program needs several variables from the user in order to run. They are all entered here, in one place, to avoid confusion further down the notebook. The different interpolation methods are discussed in the [gridding](#gridding) section.

In [8]:
input_elements = input("\n For what columns should the geochemical processes be done? "
                       "\n Please use 'space' as a separator. \n")
# Zn Pb Ag Cu Mo Cr Ni Co Ba

input_X = input(" Which column corresponds to the X axis of the Cartesian system?\n")
# X

input_Y = input("\n Which column corresponds to the Y axis of the Cartesian system?\n")
# Y

input_interpol = input("\n Which interpolation method should be used? cubic, linear, or nearest? "
                       "Please write the desired one.\n")
# linear


 For what columns should the geochemical processes be done?
 Please use 'space' as a separator.
Zn Pb Ag Cu Mo Cr Ni Co Ba
 Which column corresponds to the X axis of the Cartesian system?
X

 Which column corresponds to the Y axis of the Cartesian system?
Y

 Which interpolation method should be used? cubic, linear, or nearest? Please write the desired one.
linear
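
Because these answers are typed by hand, a typo in a column name or method name would only surface later as an error. A small check like the one below could catch such mistakes early. It is a sketch, not part of the original program: the function name `validate_inputs` is hypothetical, and the allowed method names are simply the three options offered in the prompt above.

```python
ALLOWED_METHODS = ("linear", "nearest", "cubic")


def validate_inputs(available_columns, elements, x_col, y_col, method):
    """Check the typed answers against the available column names.

    Returns the element names as a list; raises ValueError on a bad answer.
    """
    available = list(available_columns)
    element_list = elements.split()
    if not element_list:
        raise ValueError("No element columns were given.")
    # Every element column and both coordinate columns must exist.
    unknown = [name for name in element_list + [x_col, y_col]
               if name not in available]
    if unknown:
        raise ValueError(f"Columns not found in the data: {unknown}")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"Interpolation method must be one of {ALLOWED_METHODS}")
    return element_list
```

In this notebook it could be called as `validate_inputs(raw_data.columns, input_elements, input_X, input_Y, input_interpol)` right after the input cell.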


### Statistical Analysis<a name="stats"></a>

The data should be examined statistically so the user can get a grasp of its structure. The next parts provide some tools for basic statistics.

#### Basic Statistics <a name="basicstats"></a>

A function has been developed which returns, for a given column, the number of observations, the minimum and maximum, the arithmetic mean, variance, skewness, and kurtosis. These values are returned as strings in a `list` for use in further steps.

In [7]:
def basic_stats(raw_data, element_concentration):
    """Return basic statistics of one column as a list of strings."""

    selected_values = describe(raw_data.loc[:, element_concentration].values)

    s1 = str(selected_values.nobs)                    # number of observations
    s2 = str(selected_values.minmax[0])               # minimum
    s3 = str(selected_values.minmax[1])               # maximum
    s4 = str(round(selected_values.mean, 2))          # arithmetic mean
    s5 = str(round(selected_values.variance, 2))      # variance
    s6 = str(round(selected_values.skewness, 2))      # skewness
    s7 = str(round(selected_values.kurtosis, 2))      # kurtosis

    return [s1,s2,s3,s4,s5,s6,s7]
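
To make these quantities concrete, the values reported by `describe` can be reproduced by hand. The sketch below assumes SciPy's default settings (`ddof=1` for the variance, `bias=True` for skewness and kurtosis, and Fisher's definition of kurtosis); the function name `describe_by_hand` and the toy values are made up for this illustration.

```python
def describe_by_hand(values):
    """Reproduce the fields of scipy.stats.describe with plain Python.

    Assumes SciPy's defaults: sample variance (ddof=1), biased skewness,
    and biased excess (Fisher) kurtosis.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments m2, m3, m4, each divided by n (biased estimators).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "nobs": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "variance": m2 * n / (n - 1),   # rescaled to the ddof=1 sample variance
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,   # 0 for a normal distribution
    }


toy_values = [1, 2, 3, 4, 10]
print(describe_by_hand(toy_values))
```

A positive skewness indicates a long tail toward high values, and a large positive kurtosis indicates heavier tails than a normal distribution.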

We need to apply this function to each of our desired columns, which in this case are all 9 columns containing element concentrations. The next block does so for all of the given elements and stores the results in a DataFrame.

In [9]:
def basic_stats_df(raw_data, element_concentration):
    """Build a table of basic statistics, one row per element.

    `element_concentration` is a space-separated string of column names.
    """
    columns = ['No. of Observations', 'Min', 'Max', 'Mean', 'Variance', 'Skewness', 'Kurtosis']
    index_names = element_concentration.split()

    # One dict per element, mapping each statistic name to its value.
    data = []
    for column in index_names:
        values = basic_stats(raw_data, element_concentration=column)
        data.append(dict(zip(columns, values)))

    # DataFrame.append was removed in pandas 2.0, so build the frame directly.
    df = pd.DataFrame(data, columns=columns, index=pd.Index(index_names))

    return df

Now that everything is set, let's call the above function to get the basic statistics.

In [10]:
basic_stats_df(raw_data,input_elements)

|   | No. of Observations | Min | Max | Mean | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| Zn | 846 | 2.0 | 752.0 | 113.81 | 6459.84 | 2.57 | 11.51 |
| Pb | 846 | 2.0 | 321.0 | 23.82 | 639.82 | 6.11 | 56.93 |
| Ag | 846 | 0.02 | 1.0 | 0.03 | 0.01 | 11.75 | 136.01 |
| Cu | 846 | 2.0 | 1200.0 | 65.07 | 2856.14 | 13.27 | 255.58 |
| Mo | 846 | 1.0 | 12.0 | 1.73 | 1.07 | 4.13 | 29.63 |
| Cr | 846 | 7.0 | 1200.0 | 120.74 | 13063.54 | 3.41 | 18.32 |
| Ni | 846 | 2.0 | 1200.0 | 68.02 | 3993.93 | 7.91 | 123.54 |
| Co | 846 | 2.0 | 247.0 | 28.66 | 249.24 | 4.54 | 51.95 |
| Ba | 846 | 15.0 | 4650.0 | 424.46 | 50244.0 | 8.25 | 148.27 |


## References <a name="refer"></a>

<a name="cite_note-1"></a> 1. [^](#cite_ref-1) Yousefi, Mahyar, and Emmanuel John M. Carranza. “Prediction–Area (P–A) Plot and C–A Fractal Analysis to Classify and Evaluate Evidential Maps for Mineral Prospectivity Modeling.” Computers & Geosciences, vol. 79, June 2015, pp. 69–81.