This repository comprises the work required for Prac classes from Week 6, 7, 8, and 9. Modern day exploration requires a versatile skill-set, including being able to analyse, visualise and interpret data. This section of the course should give you hard-skills in the Python programming language. Plus you will learn about Machine Learning, Plate Tectonic Reconstructions, all while using specific Python packages (numpy, scipy, scikit-learn, matplotlib, cartopy, pygplates, etc), and a few other data and research tools and environments (e.g. Git, GPlates, databases, Docker containers).
The prac classes are compulsory. And the assessment will be a write-up of one of the pracs. More details given in the assessment section.
Download this repo which contains all the small datasets and the codes/notebooks we will be using: https://github.com/intelligent-exploration/mineralexplorationcourse/archive/master.zip
There are a few additional datasets that will be needed for some of the examples and exercises. Download them as instructed in the notes, but mostly they can all be found at https://www.dropbox.com/sh/mhvss5491nij1sb/AABC4VkGNkU4A1Ej5gusk3BFa?dl=0
Pull it from docker hub or build the Dockerfile and run!
or....
We need two different environments. One is particularly needed for pyGplates. We are using the conda package manager. Conda is installed on the computers in the lab. If you wish you use your own computer you can download it from https://repo.continuum.io/miniconda/
Once Conda is installed, to create the environment to work in use:
conda create -n pyGEOL scipy=1.2 scikit-learn=0.20 matplotlib=3.0 pyshp=2.0 numpy=1.16 jupyter=1.0 cartopy=0.17 pandas=0.24 notebook=5.7.4
Then to activate and run the notebooks, invoke:
conda activate pyGEOL
jupyter notebook
python 2 specifically is required for the week 9 notebook only because of the pygplates dependency.
Note: Windows users before creating your conda environment you will need to set
set CONDA_FORCE_32BIT=1
Make the conda environement:
conda create -n py2GEOL python=2.7 scipy=1.2 scikit-learn=0.20 matplotlib=2.0 pyshp=1.2 numpy=1.15 jupyter=1.0 cartopy=0.17 pandas=0.24 notebook=5.7.4
Download and unzip pygplates:
Add pygplates to your Python Path in whatever environement you are using
From within Python via e.g.:
import sys
sys.path.append("C:\pygplates_rev12_python27_win32")
import pygplates
Or from outside of Python
export PYTHONPATH=${PYTHONPATH}:/Users/Downloads/pygplates_rev12_python27_MacOS64/
This will be a hands-on programming intro. You will learn the basics of coding and Python. If you have experience using command line or some programming language (e.g. Matlab, C++) then you should find this straight-forward. If you have never programmed before, great! This will be the beginning of a great relationship with getting data to do what you want!
Mac https://www.youtube.com/watch?v=ftJoIN_OADc https://www.youtube.com/watch?v=SPZamL2Xbsc
Linux https://www.youtube.com/watch?v=3xp-ixFbDuE
Windows https://www.youtube.com/watch?v=dX2-V2BocqQ
https://snakify.org/en/lessons/print_input_numbers/ https://www.geeksforgeeks.org/python-programming-examples/
Python lists, 2d arrays and numpy arrays
-
https://www.machinelearningplus.com/python/numpy-tutorial-part1-array-python-examples/
-
https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html
-
https://www.ict.social/python/basics/multidimensional-lists-in-python
-
https://matplotlib.org/gallery/images_contours_and_fields/image_annotated_heatmap.html
-
General Intro to ML: https://www.seas.upenn.edu/~cis519/fall2017/lectures/01_introduction.pdf https://scikit-learn.org/stable/tutorial/statistical_inference/supervised_learning.html
-
Linear regression: https://www.datacamp.com/community/tutorials/essentials-linear-regression-python https://www.cs.toronto.edu/~frossard/post/linear_regression/ https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931 https://www.youtube.com/watch?v=zPG4NjIkCjc https://www.youtube.com/watch?v=JvS2triCgOY Mean squared error https://www.youtube.com/watch?v=r-txC-dpI-E
-
Gradient descent: https://en.wikipedia.org/wiki/Gradient_descent https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html
-
Logistic regression: https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc
-
https://www.kaggle.com/emilyhorsman/basic-logistic-regression-with-numpy https://medium.com/@martinpella/logistic-regression-from-scratch-in-python-124c5636b8ac https://medium.com/technology-nineleaps/logistic-regression-gradient-descent-optimization-part-1-ed320325a67e https://www.youtube.com/watch?v=yIYKR4sgzI8 https://www.youtube.com/watch?v=yIYKR4sgzI8&t=58s https://www.cc.gatech.edu/~bboots3/CS4641-Fall2016/Lectures/Lecture6.pdf
-
https://www.geeksforgeeks.org/learning-model-building-scikit-learn-python-machine-learning-library/
A Python introduction leading into visualising geo-spatial data.
Then a deeper lesson exploring some new Python libraries and advanced features of Python data manipulation on various types of data.
We will use Python to perform machine learning on a well-constructed dataset
This is an in-depth exploration linking some of the earlier workflows. This notebook will teach you how to to create and manipulate data, use python from outside of Jupyter (i.e. normal python), and to use bespoke Python packages (pygplates and build-your-own). This task can be expanded to your full project assessment if chosen.
-
Pass/Fail. The 3 notebooks from week 8 and 9 contain a specific assessment at the end. Complete the task and show your tutor.
-
10%. Choose one of the datasets and key figures from the workbooks of week 7 and 8 to write up as a full paper-style report. This should include a short abstract, a methods sections, and an interpretation of the results. You can do this all within the Jupyter notebook environment. This is due at the start of the Week 9 lecture, i.e. Thursday May 2, 2019 10am.