## Welcome to the CFIGS Python for Geosciences Workshop!
### Who are we?

#### Piero Sampaio (PhD Student - Curtin)
I did my undergraduate degree and MSc at the Federal University of Rio de Janeiro (UFRJ), where I was born and raised. My research is in geochemistry and igneous petrology. More specifically, I use isotopic signatures from ophiolites to try to understand how Earth's mantle composition has evolved since the Neoproterozoic. I use Python on a daily basis to process my data, perform advanced data analysis and generate good-looking visuals.

#### Dr. Taryn Scharf (Postdoctoral Researcher - Curtin)
I am a postdoctoral researcher in the Timescales of Mineral Systems Group, Curtin University (https://research.curtin.edu.au/scieng/research/timescales-of-mineral-systems/). I am involved in developing computational tools that can be integrated with standard geological approaches for mineral analysis. My toolkit includes machine learning (tabular and image data sets), data analytics, and a variety of coding languages. Python is my preferred programming language for scientific data analysis. I completed my PhD in Applied Geoscience in 2024 at Curtin University. Prior to my PhD, I spent 7 years as a geologist in the mining industry of South Africa. I was initially involved with brownfields exploration, geostatistical resource estimation, and the development of custom scripted solutions for clients. I later gained exposure to mine design, mine scheduling, and requirements analysis for software development.

#### Dr. Luc Doucet (Senior Research Fellow - Curtin)
I am co-leader of the Earth Dynamics Research Group (http://geodynamics.curtin.edu.au/). I come from Bourg-en-Bresse, a small town in France famously known for its blue-white-red tricoloured (delicious) chickens. After a PhD in St Etienne, France (2012), I was awarded a three-year fellowship from the Belgium Fund for Scientific Research to apply "non-traditional" stable-isotope systematics to study the formation of both oceanic and continental lithosphere. After a two-year academic career break (being a stay-at-home dad), I moved to Curtin University in March 2018 to join the Earth Dynamics Research Group to decipher the present-day and past connections between Earth's mantle, supercontinent and superocean cycles. In 2023, I was awarded an ARC Future Fellowship to study the link between the deep carbon cycle and critical mineral deposits.

### The notebook environment
We'll be using Python notebooks to introduce you to the wonderful world of coding. Notebooks are a handy way to run short bits of code (in this case Python) and view the results, without having to run a full program or script. This is ideal for practicing short examples or performing small pieces of analytics.
 

## (Very) simple math
Python has a series of built-in mathematical operators.
* Addition: +
* Subtraction: -
* Multiplication: *
* Division: /
* Modulus: %
* Exponentiation: **

In [None]:
1+1 #Addition: +

In [None]:
2-1 #Subtraction: -

In [None]:
2*3 #Multiplication: *

In [None]:
3/3 #Division: /

In [None]:
26%8 #Modulus: %

In [None]:
2**3 #Exponentiation: **

## Variables
We can assign values to variables, which will store the value and make it easier to work with no matter the data type. Data types determine how the data behaves and what we can do to it. Examples of data types are:

- int: integer numbers
- float: numbers with decimal values
- str: text - strings of characters
- bool: True/False values

We can also have composite data types that store collections of data. Python has 4 built-in composite data types:

- list: mutable (can be changed) object that contains other values
- tuple: same, but immutable (cannot be changed)
- dict: key-value pairs
- set: like a list, but unordered and only contains unique values

In [None]:
my_integer = 42 # integer number
my_string = "Python" # string (text)
my_float = 4.56 # float (decimal values or fractional numbers)
my_boolean = True # boolean value (condition that can be True or False)
my_list = ["a",23,399.8,False] # Although a list can hold multiple data types, it usually makes more sense to make a list of the SAME data type. Note the square brackets
my_dictionary = {
    "str":"a",
    "int":23,
    "float":399.8,
    "bool":False
} # a dictionary is made of key: value pairs. Keys are often strings, but any immutable value (e.g. a number or a tuple) can be a key.
#Note the curly brackets
my_tuple = (3,4,5) #tuples are declared with round brackets
my_set = {3,4,5} # sets are declared with curly brackets; set([3,4,5]) creates the same set

Depending on the **data type**, different actions can be performed on the data

In [None]:
my_integer = 42 
print(my_integer + 5)

my_float = 4.568 
print(round(my_float,1))


In [None]:
#Strings are alphanumeric. You can manipulate strings extensively in Python
my_string = 'Hello world'
print(f' Original string: {my_string}')

replacement = my_string.replace(' ', '_')
print(f'\n Replacement: {replacement}')

uppercase = my_string.upper()
print(f'\n Uppercase: {uppercase}')

substring = my_string[0:5]
print(f'\n Substring: {substring}') 

multiply = my_string * 3
print(f'\n Multiplication: {multiply}')

In [None]:
# a list of numbers. Note the square brackets.
my_list = [2,5,3,8,5] 

my_list.append(100) #you can add to a list
print(my_list)

print(my_list[3]) # you can get items out of a list using their positional index

my_list[3] = 999 #you can change the values of items in a list
print(my_list)

list_length = len(my_list) #You can query the length of composite data types like lists
print(f'There are {list_length} items in my_list')

<font color="green"><b>TRY IT YOURSELF:</b></font>
<ol style="color: green;">
<li>Print out the depth of drillhole ANT002.</li>
<li>Append 5 to list_1 and print the updated list.</li>
<li>Try adding 3 and then 6 to my_set like this: my_set.add(3). Print out the updated set. What do you notice?</li>
<li>Multiply list_1 by the 2nd number of list_1. Find the length of the resultant list using len().</li>
</ol>

In [None]:
#1 - Print out the depth of drillhole ANT002.
#HINT: drillhole_depths[key]

drillhole_depths = {
    "ANT001":755.7,
    "ANT002":624.7,
    "ANT003":342.7
} 

In [None]:
# 2 - Append 5 to list_1 and print the updated list
# HINT: list_1.append(...)

list_1 = [2,4,7,4]

In [None]:
# 3 - Try adding 3 and then 6 to my_set like this: my_set.add(3).
#Print out the updated set. What do you notice?
#HINT: my_set.add(....) OR my_set.update((3,6))
my_set = {3,4,5} 

In [None]:
# 4 - Multiply list_1 by the 2nd number of list_1. 
#Find the length of the resultant list using len().
#Type your code here: len(list_1 * list_1[index]) 

list_1 = [2,4,7,4]

## Functions
A function is a piece of code that performs an action. Code is often broken up into a series of functions, each performing a specific task. 

* In Python we declare functions using the `def` keyword at the beginning of the line. 
* All code that forms part of the function is placed on the next **indentation** level. Each indentation is performed by pressing the tab key. 
* We can press shift + tab to go back to the previous indentation
* We can use a function when we call it by name. 

The classic introduction to any programming language is the hello world function. We will also use the print function, which prints (😲) something to the console.

In [None]:
def hello_world(): #def defines the function hello_world. You can name your function anything you like, but it's best to choose a sensible name
  print("hello world!") #the function prints hello world!

hello_world()

In [None]:
def addition(a,b): 
  c = a + b
  return c 

addition_result = addition(3,4) 
print(addition_result)


 <font color="green"><b>TRY IT YOURSELF:</b> Create a function that calculates the average of 3 numbers and prints the result. Use variables so that the 3 inputs can be changed by the user without having to change the code inside the function.</font>  
 

In [None]:
#Type your code here

# Loops
If we want to repeat an action multiple times, sometimes changing only a few factors, we can use a loop to automate that action. Loops come in two flavours:
`for` and `while` loops.
## `For` loop:
A `for` loop repeats an action for every object in a given group (list, dict, array, etc). The loop will be performed for all code that is within the next **indentation** level. Each indentation is performed by pressing the `tab` key. We can press `shift` + `tab` to go back to the previous indentation

In [None]:
# different indentation levels (written as comments, because unexpected indentation is an error in Python)
# a
#   b
#     c
#       d


In [None]:
#For loop example:

file_names = ['ANT001_COLLAR.csv', 'Ant001_stratigraphy.csv', 'ANT002_COLLAR.csv', 'ant002_assay.csv', 'aNT003_COLLAR'] 

unique_drillhole_ids = set()

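for file in file_names:
    file = file.lower() #make the name lowercase so that 'ANT001' and 'Ant001' match
    drillhole_id = file.split("_")[0] #keep the text before the first underscore
    unique_drillhole_ids.add(drillhole_id) #a set ignores values it already contains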

print(f'There are {len(unique_drillhole_ids)} drillholes.')
print(f'Unique drillhole IDs: {unique_drillhole_ids}')

 <font color="green"><b>TRY IT YOURSELF:</b> Two dictionaries are provided. One is total drillhole depth, the other is RC precollar depth. Use a for-loop to loop through all drillholes and calculate the total diamond drill depth by subtracting the precollar depth from the total depth.</font>  

In [None]:
drillhole_total_depths = {
    "ANT001":755.7,
    "ANT002":624.7,
    "ANT003":342.7,
    "ANT004": 150
} 

drillhole_rc_precollar_depth = {
    "ANT001":104.7,
    "ANT002":117,
    "ANT003":104.7,
    "ANT004": 141
    
}
#Hint: you can get a list of dictionary keys as follows: drillhole_ids = list(drillhole_total_depths.keys())
#Use a for-loop to loop through all IDs in drillhole_ids
# Use the IDs as dictionary keys to get the total depth and RC precollar depth per drillhole
# Subtract precollar depth from total depth (inside the for-loop)
# Print the key value and total diamond drill depth calculated (inside the for-loop)



## `While` loop:
A `while` loop repeats an action while a condition is `True`


In [None]:
#While loop example
drill_depth_in_meters = [0, 1.5, 2.2, 3.5, 4.6, 5.8, 6.6, 7.4, 8.6, 9.7, 10.6]

length_drilled = 0
i=0

while length_drilled <7:
    length_drilled = length_drilled + drill_depth_in_meters[i]
    print(f'{length_drilled} meters drilled after {i+1} iterations') 
    i =i + 1

 <font color="green"><b>TRY IT YOURSELF:</b> A counter has been provided.  While the counter is less than 10, increment its value by 2 and print out the counter. </font>  

In [None]:
i=0
#Type your code here

# Conditionals and control
Conditionals in Python, like in many other programming languages, allow you to make decisions in your code. They help your programs to behave differently based on certain conditions. In Python, you use if, elif (else if), and else statements to create conditionals.

1. if Statements:
The if statement is used to execute a block of code if a condition is true.

2. elif Statements:
elif is short for "else if". It allows you to check multiple conditions. If the first if condition is false, it checks the next elif condition, and so on.

3. else Statements:
The else statement is used to execute a block of code if none of the conditions in the if and elif statements are true.

4. Nesting Conditionals:
You can also nest conditionals inside each other to handle more complex situations.
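
Nesting is easiest to see in a small example. The sketch below places one `if`/`else` inside another; the grade and depth values, the 500 m cut-off and the priority labels are made up purely for illustration.

```python
# Hypothetical values, chosen only to illustrate nesting
average_grams_per_tonne = 7.85
depth_in_meters = 320

if average_grams_per_tonne >= 5:
    # This inner check only runs when the outer condition is True
    if depth_in_meters <= 500:
        category = "High priority: good grade at shallow depth"
    else:
        category = "Medium priority: good grade but deep"
else:
    category = "Low priority: grade too low"

print(category)
```

Each extra level of nesting adds one more indentation level, so deeply nested conditionals quickly become hard to read. Combining conditions with `and`/`or` is often a cleaner alternative.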

In [None]:
# if statement:
num = 15
if num % 2 == 0:
    print("Even")

In [None]:
# else statement
num = 15
if num % 2 == 0:
    print("Even")
else:
    print("Odd")

In [None]:
# elif statement:
average_grams_per_tonne = 7.85
if average_grams_per_tonne >= 10:
    print("High grade")
elif average_grams_per_tonne >= 5:
    print("Medium grade")
elif average_grams_per_tonne>=1:
    print("Low grade")
else:
    print('Subeconomic')


## We can mix loops and conditionals to make more complex actions
Have you ever inherited a messy folder of project data? Let's standardise file names in the Antrim_data folder provided.

In [None]:
import os

folder_path = 'Antrim_data'

#get a list of everything in the folder
folder_contents = os.listdir(folder_path)

#loop through every item in the list
for item in folder_contents:

    #if the item is not an excel file (e.g. it's an image), continue looping (ignore the item)
    if 'xlsx' not in  item:
        continue #skip the rest of the iteration and jump to the next loop iteration

    #create new file names
    item_name = item.lower()
    if "majors" in item_name:
        new_item_name = "antrim_assay_major_elements.xlsx"
    elif "traces" in item_name:
        new_item_name = "antrim_assay_trace_elements.xlsx"
    elif "collar" in item_name:
        new_item_name = "antrim_collars.xlsx"
    elif "geology" in item_name:
        new_item_name = "antrim_stratigraphy.xlsx"
    else:
        continue

    old_path = os.path.join(folder_path, item)
    new_path = os.path.join(folder_path, new_item_name)
    if not os.path.exists(new_path):
        os.rename(old_path, new_path)
        print(f"Renamed: {item} to {new_item_name}")
        


 <font color="green"><b>TRY IT YOURSELF:</b> Let's loop through the contents of the Antrim folder and separate Excel and image files into their own folders. Follow the instructions below.</font>

In [None]:
#import os

# create a variable to hold the Antrim_data folder path (see above example)

#create two new folders, one for excel and one for image files
#e.g. excel_data_folder = os.path.join(folder_path, 'drillhole_data')

#To not overwrite a folder, first we test to see if it already exists. We only create the new folder if it DOESN'T already exist
#E.g. if not os.path.exists(excel_data_folder):
        #os.makedirs(excel_data_folder)

#Get a list of everything in the folder. See above for example.

#loop through every item in the list
# for ... in ...:
    
#If the item is a folder, continue looping (ignore the item)
    #E.g. if os.path.isdir(os.path.join(folder_path,item)):
        #continue
    #If it's not a folder, get the file extension
    #E.g. file_extension = os.path.splitext(item)[-1]

    #If it's an excel file, move it into the excel folder
    #if file_extension == '.xlsx':
        #old_file_path = os.path.join(folder_path, item)
        #new_file_path = os.path.join(excel_data_folder, item)
        #os.replace(old_file_path, new_file_path)

    #Repeat to find .jpg files and move them to the image folder


In [None]:
#Example solution
import os

folder_path = 'Antrim_data'

#create two new folders, one for excel and one for image files
excel_data_folder = os.path.join(folder_path, 'drillhole_data')
image_data_folder = os.path.join(folder_path, 'drillhole_images')

#To not overwrite a folder, first we test to see if it already exists
#We only create the new folder if it DOESN'T already exist
if not os.path.exists(excel_data_folder):
        os.makedirs(excel_data_folder)

if not os.path.exists(image_data_folder):
        os.makedirs(image_data_folder)

#get a list of everything in the folder
folder_contents = os.listdir(folder_path)

#loop through every item in the list
for item in folder_contents:

    #if the item is a folder, continue looping (ignore the item)
    if os.path.isdir(os.path.join(folder_path,item)):
        continue
    
    file_extension = os.path.splitext(item)[-1]

    if file_extension == '.xlsx':
        old_file_path = os.path.join(folder_path, item)
        new_file_path = os.path.join(excel_data_folder, item)
        os.replace(old_file_path, new_file_path)
        
    if file_extension == '.jpg':
        old_file_path = os.path.join(folder_path, item)
        new_file_path = os.path.join(image_data_folder, item)
        os.replace(old_file_path, new_file_path)
        


# The true power of Python: Libraries (or modules)
Libraries extend the functionality of Python well beyond what is built in. Some libraries ship with Python (e.g. `math`), but there are also heaps of community-created libraries. Whenever you want to do something in Python, chances are that someone has already created a library to make your life easier. Before using a library, we must import it using the `import` statement. By convention, imports go at the top of the code.

Examples of built-in libraries are `math`, `os`, `time`, `statistics`, etc.

The `math` library, for example, has many useful mathematical functions already implemented. Once we import the library we can access the functions in the library by using the `.` after the name of the library:
`math.cos()` would give you the cosine function. Certain libraries also have attributes which can be accessed in the same way, such as `math.pi`. Note that this is not a function, so it does not need the parentheses after it.

We have already used the `os` library to help us organise the drillhole datafiles.

In [None]:
import math
c = math.cos(math.pi) # we use the dot to access the cosine function
print(c)

We can also import only certain functions of a library. In that case we do not need to use the `math.cos()` anymore, we can call the function directly.

In [None]:
from math import cos, pi # import only the names we need
c = cos(pi) # no "math." prefix required
print(c)

Finally, we can also import a library with an alias that will be used throughout the code. For the `math` library that doesn't really matter but for libraries with longer names (classic example would be `matplotlib.pyplot`) it is useful.
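
As a minimal sketch of importing with an alias, the snippet below imports `math` under the short name `m`. The alias `m` is an arbitrary choice made for this example, not a community convention.

```python
import math as m  # "m" is an arbitrary alias picked for this example

c = m.cos(m.pi)  # same function as math.cos, reached through the alias
print(c)
```

Some aliases are so widely used that they are worth sticking to, such as `np` for `numpy`, `pd` for `pandas` and `plt` for `matplotlib.pyplot`.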

## External libraries

These libraries, created and shared by the global Python community, cater to diverse needs such as data analysis, web development, machine learning, and more. For instance, the Pandas library simplifies data manipulation and analysis through its powerful data structures, the Matplotlib library enables high-quality data visualization, and the numpy library handles numerical calculations. These libraries expand Python's capabilities dramatically, allowing developers to leverage pre-built solutions and focus on problem-solving rather than reinventing the wheel.

In this workshop, we'll be using mostly external libraries and applying them to geology-related problems. The main libraries we'll use are:
- [numpy](https://numpy.org/)
- [scipy](https://scipy.org/)
- [matplotlib](https://matplotlib.org/)
- [pandas](https://pandas.pydata.org/)
- [geopandas](https://geopandas.org/en/stable/)
- [seaborn](https://seaborn.pydata.org/)
- [pyrolite](https://pyrolite.readthedocs.io/en/main/) (developed by Morgan Williams of CSIRO)
- [PyGMT](https://www.pygmt.org/latest/index.html) (maintained by a group of geophysicists in various places)

For example, the `numpy` library greatly extends the functionality of the `math` library. One key difference is that `math` functions operate on single numbers, whereas `numpy` operates directly on arrays (ordered collections of values), which saves us from writing loops all the time. This matters because Python loops are slow compared with numpy's optimised array operations. The difference is negligible for short tasks, but it grows quickly as datasets get larger.
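
To get a feel for the speed difference, here is a rough timing sketch that takes the square root of many numbers, first with a loop and `math.sqrt`, then with a single `np.sqrt` call. The array size is arbitrary and the measured times will vary from machine to machine, so treat the numbers as indicative only.

```python
import math
import time

import numpy as np

n_values = 200_000  # arbitrary size chosen for this demonstration
values = np.arange(n_values)

# Loop version: math.sqrt handles one number per call
start = time.perf_counter()
loop_results = []
for v in values:
    loop_results.append(math.sqrt(v))
loop_seconds = time.perf_counter() - start

# Vectorised version: one call processes the whole array
start = time.perf_counter()
numpy_results = np.sqrt(values)
numpy_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.4f} s")
print(f"numpy: {numpy_seconds:.4f} s")
print(f"same results: {np.allclose(loop_results, numpy_results)}")
```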

## Numpy
This is your go-to for almost all numerical applications in Python: a fast, highly optimised library for working with arrays of numbers.

In [None]:
# with math
import math

number_range = range(0,20,1) # built-in function to define a range of numbers: range(start number, number to stop before, increment)
results = [] # create list to store results
for i in number_range:
  results.append(math.sqrt(i))
print(results)

# math functions work on one real number at a time, which is why we needed a loop to process the whole range

In [None]:
# with numpy
import numpy as np # common alias for numpy

results = np.sqrt(number_range) # we can operate directly on the range object
results # the object is returned as an array data type, which is the type on which numpy operates

In [None]:
# numpy also has its own version of the range function which automatically generates an array
arr = np.arange(0,100,5) # np.arange(start, stop, step)
arr2 = np.linspace(0,100,21) # np.linspace(start, stop, number of points); unlike np.arange, stop is included
print(f'array 1: {arr}')
print(f'array 2: {arr2}')

In [None]:
# We can access items in the array by using indexes (positional number in the list).
# Keep in mind that Python starts counting from 0.
arr = np.arange(0,100,5) # start, stop, step
print(arr[2]) # returns the third item in the array

In [None]:
print(arr2[-1]) # -1 is a shortcut for last item in the array. We can count backwards from -1

In [None]:
# we can also index with slices
# [start:stop] where stop is not inclusive
print(arr[0:5]) # first five items of the array

In [None]:
# Arrays don't need to be 1D; they can have multiple dimensions.
# For example, here we reshape our 20-number array into a 2D matrix that has 2 rows and 10 columns:
arr_2d = arr.reshape(2,10)
print(arr_2d)
# Indexing and slicing work for multidimensional arrays as well.
# The dimensions are separated by commas, so the notation is similar to that of a matrix: [rows, columns]
print(f'slice through array 2d:\n {arr_2d[:,4:6]}') # returns all rows of the fifth and sixth columns (remember we count from 0)

## Pandas

Pandas is a popular open-source data analysis and manipulation library for Python. It provides powerful data structures like DataFrame, which allows users to work with structured data seamlessly. One of Pandas' key strengths lies in its ability to read data from multiple formats, such as CSV, Excel, SQL databases, and more. This flexibility simplifies the process of importing and handling data, regardless of its source, making it a go-to choice for data scientists and analysts.

Reading Data from Multiple Formats

Pandas simplifies the data import process. For instance, to read a CSV file into a DataFrame, you can use the pd.read_csv('filename.csv') function. Similarly, Pandas supports pd.read_excel(), pd.read_sql(), and various other functions tailored to specific data formats. This versatility streamlines the data preprocessing phase, allowing analysts to focus on extracting insights rather than dealing with data intricacies.
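
As a small sketch of `pd.read_csv`, the example below reads a CSV table that is typed in as text, so no file is needed. The column names (`hole`, `easting`, `northing`, `depth`) are assumptions made for this example; with a real file you would pass the file path instead of the `io.StringIO` object.

```python
import io

import pandas as pd

# A tiny collar table written as CSV text (column names are assumed for this example)
csv_text = """hole,easting,northing,depth
ANT001,572022,7978167,755.7
ANT002,564818,7995173,624.7
"""

# io.StringIO lets pandas treat the text as if it were a file on disk
collars_df = pd.read_csv(io.StringIO(csv_text))

print(collars_df)
print(collars_df.dtypes)  # pandas works out a data type for each column
```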

Filtering Capabilities

Pandas excels at data filtering, enabling users to extract specific subsets of data efficiently. Through techniques like boolean indexing and query operations, users can filter data based on conditions. For instance, df[df['column_name'] > threshold] filters rows where the 'column_name' values exceed a defined threshold. This functionality allows users to explore and analyze specific segments of their data easily.
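
The two filtering styles mentioned above can be sketched on a small made-up table. The `hole` and `depth` columns and the threshold of 600 are illustrative assumptions only.

```python
import pandas as pd

# Small illustrative table
df = pd.DataFrame({
    "hole": ["ANT001", "ANT002", "ANT003"],
    "depth": [755.7, 624.7, 342.7],
})
threshold = 600

# Boolean indexing: build a True/False mask, then use it to select rows
mask = df["depth"] > threshold
deep_holes = df[mask]

# The same filter written as a query string; @ refers to a Python variable
deep_holes_query = df.query("depth > @threshold")

print(mask)
print(deep_holes)
```

Printing `mask` shows one `True`/`False` value per row, which makes it clear why only some rows survive the filter.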

Applications in Geology

Geologists often deal with extensive datasets containing information about rock compositions, mineral compositions, structural data, spectral data, stratigraphy, and more. Pandas' ability to handle large datasets and its filtering capabilities are invaluable in such scenarios. Geologists can use Pandas to filter seismic data based on specific time frames, analyze mineral compositions, or explore geological features based on various parameters. Furthermore, Pandas seamlessly integrates with visualization libraries like Matplotlib and Seaborn, enabling geologists to create insightful charts and plots for better data interpretation.

In summary, Pandas in Python offers a robust and versatile framework for data analysis and manipulation. By harnessing Pandas' capabilities, geologists can enhance their data-driven decision-making processes and gain deeper insights into geological phenomena. 

In pandas, datasets are stored as pandas.DataFrame objects if 2D or as pandas.Series objects if 1D (vector). One advantage of pandas over numpy for handling tabular data and data analysis is that we can access columns by their names instead of having to memorize indexes.

In an object-oriented language such as Python, objects can have associated methods and attributes. This was already the case for numpy, but it is also key to the pandas DataFrame and Series objects.

A method is like a function that belongs to an object. Once we have instantiated an object of a certain class (which just means we created a variable of that class), we can call methods on that object. We call a method using object.method() notation. Sometimes methods take arguments, just like functions.

An attribute is a property stored on the object. Instead of calling it by placing parentheses after the name, we just use object.attribute to access its value.
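
A short sketch of the difference between attributes and methods, using a small made-up DataFrame (the column names and values are for illustration only):

```python
import pandas as pd

df = pd.DataFrame({
    "hole": ["ANT001", "ANT002", "ANT003"],
    "depth": [755.7, 624.7, 342.7],
})

# Attributes: no parentheses, they simply report a property of the object
print(df.shape)    # (number of rows, number of columns)
print(df.columns)  # the column labels

# Methods: called with parentheses, and they may take arguments
first_rows = df.head(2)           # the argument sets how many rows to return
mean_depth = df["depth"].mean()   # a method of the Series holding one column

print(first_rows)
print(mean_depth)
```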



In [None]:
import pandas as pd

In [None]:
#Let's read the Antrim major element data, skipping unwanted rows

#But first, a bit of house-keeping to help us view all the table columns and rows
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#Now read in the data file
majors_df = pd.read_excel('Antrim_data/drillhole_data/antrim_assay_major_elements.xlsx', header=1, skiprows=[2, 3, 4, 5])
majors_df

**Basic dataframe commands:** 
Pandas is filled with useful functionality to manipulate tabular data. <br>
Here are a few to know, to complete the following exercises:

In [None]:
#Get a list of the column names:
majors_df.columns # .values turns this into an array object

In [None]:
#Access columns by name: 
majors_df['Interval'].head()

In [None]:
#Select specific rows and/or columns using label location
majors_df.loc[0:3, ['Location', 'Interval']]

In [None]:
#Select specific rows and/or columns using integer location
majors_df.iloc[0:3, 1:3]
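
One detail that often trips people up: a slice in `.loc` includes its end label, whereas a slice in `.iloc` stops before its end position, just like list slicing. The sketch below shows this on a small made-up table with the default 0, 1, 2, ... row labels.

```python
import pandas as pd

df = pd.DataFrame({
    "hole": ["ANT001", "ANT002", "ANT003", "ANT004"],
    "depth": [755.7, 624.7, 342.7, 150.0],
})

by_label = df.loc[0:2, ["hole"]]   # labels 0, 1 AND 2: three rows
by_position = df.iloc[0:2, 0:1]    # positions 0 and 1 only: two rows

print(by_label)
print(by_position)
```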

In [None]:
#Add a column filled with missing values (NaN):
majors_df['New_column'] = np.nan
majors_df.head()

In [None]:
#Use columns in calculations: 
majors_df['New_calculation'] = majors_df['LOI']*100
majors_df.head()

In [None]:
#Remove columns and/or rows:
majors_df.drop(['New_column','New_calculation'], axis=1, inplace=True) #axis=1 means columns; axis=0 means rows
majors_df.head()

In [None]:
#Filter a dataframe for particular values
filtered_df = majors_df[majors_df['Location']=='ANT001']
filtered_df

**Dataframe manipulation:**
Now that we know some essential commands, let's clean up our data.

In [None]:
#Let's rename the sample id  and drillhole id column:
majors_df.rename(columns={majors_df.columns[0]: "sampleid", 'Location':'hole' }, inplace=True)
majors_df.head()

In [None]:
#There are some below-detection values ('<') in the SO3 column, rows 17-19. 
#Let's take a look at them:
majors_df.loc[17:19]

In [None]:
#Let's change all '<' values in the data to 0.005

#We'll use slicing to get the major element columns that we want:
major_elements = majors_df.columns[3:]

#Let's assign a value of 0.005
majors_df[major_elements] = majors_df[major_elements].replace('<', 0.005)

#Display the updated rows
majors_df.loc[17:19]

In [None]:
#Finger-errors are common. Sometimes we end up with text in our numeric fields
#E.g. SO3 column
majors_df.head(3)

In [None]:
#Let's loop through all the columns and turn non-numeric data into NaN
for column in major_elements:
    majors_df[column] = pd.to_numeric(majors_df[column], errors='coerce')
majors_df.head(3)

In [None]:
#Let's remove any duplicate rows in the dataframe
majors_df.drop_duplicates(subset=majors_df.columns[2:],keep='first', inplace= True, ignore_index = True)
majors_df.head()

In [None]:
#Let's update our new assay table to contain From and To columns instead of an Interval column
#We will remove the interval column
majors_df['From'] = majors_df['Interval'].str.split('-').str[0]
majors_df['To'] = majors_df['Interval'].str.split('-').str[1]
majors_df.drop('Interval',axis = 1, inplace = True)
majors_df.head()
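
Note that `.str.split()` returns text, so columns created this way hold strings rather than numbers. If you want to calculate with them, one option is to convert them with `pd.to_numeric`. The sketch below uses made-up intervals and assumes they are written as `from-to`; the `Length` column is added only to show that arithmetic now works.

```python
import pandas as pd

# Made-up intervals, assumed to be written as "from-to"
intervals = pd.DataFrame({"Interval": ["0-1.5", "1.5-3", "3-4.5"]})

# expand=True returns the pieces as separate columns (named 0 and 1)
parts = intervals["Interval"].str.split("-", expand=True)

# Convert the text pieces into numbers
intervals["From"] = pd.to_numeric(parts[0])
intervals["To"] = pd.to_numeric(parts[1])

# Arithmetic is now possible
intervals["Length"] = intervals["To"] - intervals["From"]
print(intervals)
```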

So far we have:
- Read in the data from excel
- Replaced '<' with 0.005
- Removed accidental text in numeric columns
- Removed duplicated rows
- Created new From and To columns
- Removed unwanted columns ('Interval')

We would need to repeat all this code for the table containing trace elements. Instead, let's create a function that will do all this data processing on each assay file we give it. 

In [None]:
def process_assay_data(file_path, header_row, rows_to_skip):
    #Read the Excel file, using header_row for the column names and skipping unwanted rows
    dataframe = pd.read_excel(file_path, header=header_row, skiprows=rows_to_skip)

    #Replace below-detection values ('<') in the assay columns with 0.005
    assay_columns = dataframe.columns[3:]
    dataframe[assay_columns] = dataframe[assay_columns].replace('<', 0.005)
    
    #Turn any remaining non-numeric entries into NaN
    for column in assay_columns:
        dataframe[column] = pd.to_numeric(dataframe[column], errors='coerce')

    #Remove duplicated rows
    dataframe.drop_duplicates(inplace= True, ignore_index = True)

    #Split the Interval column into From and To, then remove it
    dataframe['From'] = dataframe['Interval'].str.split('-').str[0]
    dataframe['To'] = dataframe['Interval'].str.split('-').str[1]
    dataframe.drop('Interval',axis = 1, inplace = True)

    #Rename the sample id and drillhole id columns
    dataframe.rename(columns={dataframe.columns[0]: "sampleid", 'Location':'hole' }, inplace=True)

    return dataframe

#The file names match the standardised .xlsx names created earlier
major_elements = process_assay_data('Antrim_data/drillhole_data/antrim_assay_major_elements.xlsx', 1, [2, 3, 4, 5])
trace_elements = process_assay_data('Antrim_data/drillhole_data/antrim_assay_trace_elements.xlsx', 0, [1,2, 3, 4, 5])   
    

In [None]:
major_elements.head()

In [None]:
trace_elements

There are standards in our trace element data. <br> Let's remove them from the assay table and set them aside for later.

In [None]:
standards = trace_elements[trace_elements['hole']=='STANDARD']
trace_elements = trace_elements[trace_elements['hole'] != 'STANDARD']

In [None]:
trace_elements.tail()

In [None]:
standards.head()

In [None]:
#Let's merge the major and trace element assays into one table
merged_assay_df = pd.merge(major_elements, trace_elements, on=['hole','sampleid', 'From', 'To'])
merged_assay_df.head()
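
By default, `pd.merge` performs an inner join, which keeps only the rows whose key values appear in both tables. The sketch below uses two tiny made-up tables (the sample IDs and values are hypothetical) to show which rows survive, and how an outer join with `indicator=True` can reveal the samples that were dropped.

```python
import pandas as pd

# Two tiny made-up tables sharing the key columns 'hole' and 'sampleid'
majors = pd.DataFrame({
    "hole": ["ANT001", "ANT001", "ANT002"],
    "sampleid": ["S1", "S2", "S3"],
    "MgO": [8.1, 7.4, 9.0],
})
traces = pd.DataFrame({
    "hole": ["ANT001", "ANT002", "ANT002"],
    "sampleid": ["S1", "S3", "S4"],
    "Ni": [120, 95, 110],
})

# Default (inner) merge: only samples present in BOTH tables are kept
merged = pd.merge(majors, traces, on=["hole", "sampleid"])
print(merged)

# Outer merge: keeps every sample; the _merge column shows where each row came from
all_samples = pd.merge(majors, traces, on=["hole", "sampleid"], how="outer", indicator=True)
print(all_samples)
```

Comparing the row counts before and after a merge is a quick way to check whether any samples were lost.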

In [None]:
#Let's save our new assay table to a csv file, which can be imported into most other software.
merged_assay_df.to_csv('Antrim_data/drillhole_data/Antrim_assay_data.csv', index=False) #index=False leaves out the row numbers

In [None]:
#Let's do some very basic comparisons of the values in our drillholes
average_per_drillhole = merged_assay_df.groupby('hole').mean(numeric_only=True).round(2)
average_per_drillhole

## Matplotlib

Matplotlib is a powerful plotting library. We will use it a lot to graph data and show images.

In [None]:
import matplotlib.pyplot as plt

Let's plot MgO vs Ni for drillhole ANT001

In [None]:
#filter out data for ant001:
ant001 = merged_assay_df[merged_assay_df['hole']=='ANT001']

# create a figure; you can adjust the numbers (6, 4) to your desired width and height (in inches)
plt.figure(figsize=(6, 4)) 

# Creating the scatter plot
plt.scatter(ant001['MgO'], # what's on the x-axis
         ant001['Ni'], # what's on the y-axis
         ) 

#Finishing touches
plt.xlabel("MgO (%)") # give label to your x axis
plt.ylabel("Ni (ppm)") # give label to your y axis
plt.show()
plt.close()

  <font color="green"><b>TRY IT YOURSELF:</b> Using the ant001 data we just filtered out, plot Cu vs MgO </font>  

In [None]:
#Write your code here

There are many ways to customise your Matplotlib charts. Here we plot one series per drillhole on a single chart and customise the chart appearance.

In [None]:
#get a list of drillhole IDs to loop through
drillholes = merged_assay_df['hole'].unique()

# create the figure OUTSIDE of the loop. We only need to create it ONCE
plt.figure(figsize=(6, 4))  

#inside the loop, do your plotting
for hole_id in drillholes:
    data = merged_assay_df[merged_assay_df['hole'] == hole_id] #filter out the data you want
    plt.scatter(data['MgO'], # what's on the x-axis
         data['Ni'], # what's on the y-axis
         marker="s", # marker style
         alpha =0.5, #transparency
         s = 50,   #marker size   
         label=hole_id) # label for the legend

#Outside the loop, apply your finishing touches
plt.legend(title="Drillhole ID") # place a legend
plt.title("MgO vs Ni by drillhole")
plt.xlabel("MgO (%)") 
plt.ylabel("Ni (ppm)") 
plt.show()
plt.close()
    

## Understanding errors

Let's run each of these cells and fix the errors that pop up.

In [None]:
if not os.path.isfile(os.path.join(folder_path,item):

In [None]:
my_list = list(range(2,7))
print(f'my list contains {len(my_list)} items: {my_list}')
my_list_item = my_list[6]

In [None]:
grade_value = 10.5
print (grades_value)

In [None]:
for i in range(5):
print(i+5)

In [None]:
9/0

In [None]:
Antrim_drillhole_names = 'ant1'
Antrim_drillhole_names.append('ant2')

In [None]:
x = 9
y = 'eight'

print(x+y)

In [None]:
print(9 * 'eight')

In [None]:
Antrim_collar_coordinates = {
    'ANT001': [572022, 7978167],
    'ANT002': [564818,7995173]
    }

print(Antrim_collar_coordinates['ANT003'])

# FAQ

### Am I supposed to remember all this syntax?
No. You're a geoscientist not a programmer! You should have awareness of **what is possible**. You can always look up syntax online. If you're trying to do something you've never done before, simply google it. There will be a lot of online help to guide you through your question.

### Where do I find help if I'm trying to code on my own?
- Online community forums, e.g. Stack Exchange (highly recommended: if you have a question, it has probably been asked there already)
- Free learning websites, e.g. GeeksforGeeks, which is great if you want a quick, clean example of how to use a particular command.
- Python modules have documentation and tutorials online. These are important resources when you need to understand the specifics of any command you are running.
- Many Python courses are available online. Often, they will be paid courses, but some allow you to 'audit' the course for free.
### Why learn Python if I can ask ChatGPT, Gemini etc.?
Generative AI can be extremely useful when coding. For example:
- It can help you create data processing pipelines from plain-text prompts, especially when you're not sure exactly how to do it.
- It can help with debugging, especially when you don't understand what's going wrong.
- It can provide you with the syntax for everyday commands when you can't remember them.

However, despite appearing intelligent, **it is only a machine**. There are two important reasons to be Python-literate:
- Generative AI often makes mistakes, and if you are not Python-literate these can be hard to spot. **Generative AI is powerful but we strongly recommend developing Python-literacy so you can guard against errors.** Just because the code runs doesn't mean it's doing what you want! However, if your company allows it, generative AI is a valuable tool to save you time and help you learn. Remember not to put **sensitive information** into the model, as inputs may be used to train the model further.
- The more specialised your needs, the less likely ChatGPT will be able to help. If you're Python-literate, you can build on the feedback from sources like ChatGPT and Stack Exchange to meet your own unique needs.
