- Team: Big Data > Big Lore
- Team members: JustinGOSSES, jazzskier, GeophysicsPanda, and dalide.
- When: September 24th, 2017
- Where: Station Houston
- What: Agile Hackathon (event & sponsors)
- Thanks: to AWS Cloud Services Houston and SigOpt for technical help, and to the other sponsors for feeding us so we didn't have to leave our keyboards
Predict stratigraphic surfaces by training on human-picked stratigraphic surfaces. We used 2,000+ wells with picks from the Mannville Group, including the McMurray Formation, in Alberta, Canada.
Instead of assuming there is a mathematical or pattern basis for stratigraphic surfaces that can be teased out of logs, we focus on creating programmatic features and operations that mimic the comparison-based observations a geologist would make.
There have been studies attempting similar things for decades. Many of them assume a mathematical pattern to stratigraphic surfaces and either don't train specifically on human-picked tops or do so only lightly. We wanted to take as geologic an approach as possible (as opposed to a mathematical or geophysical one). What we managed to get done by the end of the hackathon is a sort of small-scale first pass.
Eventually, we want to reach the point where we've identified a large number of feature types that both have predictive value and can be tied back to geologists' insight. Many observations happen visually, and often not consciously, when a geologist looks at a well log and correlates it. We want to focus on engineering features that mimic these observations and the multitude of scales at which they occur.
In addition to automating the correlation of nearby wells based on picks that already exist, which has value in itself, we think this will help geologists have better, more quantitative discussions about the basis of their correlations and why correlations might differ between geologists. Imagine a regional area where two separate teams take different approaches to picking a top. You could use this to programmatically pick tops in area B the way they are picked in area A, and vice versa. The differences in pick style then become easier to analyze with little additional work.
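One way to turn "observations at a multitude of scales" into features is to compute the same local statistics over several window sizes, so each depth is compared with its surroundings at short, medium, and long ranges. This is a minimal sketch, assuming a single log curve (for example gamma ray) held in a pandas Series indexed by depth at a regular step; the function name, window sizes, and feature names are hypothetical and are not taken from the project notebooks.

```python
import numpy as np
import pandas as pd


def multiscale_features(log: pd.Series, windows=(5, 15, 51)) -> pd.DataFrame:
    """Rolling statistics of one log curve at several window sizes.

    Hypothetical sketch: `log` is one curve sampled at regular depth steps;
    `windows` are window lengths in samples, chosen only for illustration.
    """
    feats = {}
    for w in windows:
        # Centered window: each depth is compared with material above and below it
        roll = log.rolling(window=w, center=True, min_periods=1)
        local_mean = roll.mean()
        feats[f"mean_{w}"] = local_mean
        feats[f"std_{w}"] = roll.std()
        # How far this sample departs from its surroundings at this scale
        feats[f"diff_from_mean_{w}"] = log - local_mean
    return pd.DataFrame(feats, index=log.index)


# Usage: a synthetic curve with one sharp step, loosely like a lithology change
depth = np.arange(100.0, 110.0, 1.0)
gr = pd.Series([120.0] * 5 + [40.0] * 5, index=depth, name="GR")
features = multiscale_features(gr)
print(features.shape)  # (10, 9)
```

Near the step, the short-window `diff_from_mean_5` reacts sharply while the long-window columns change slowly, which is the kind of multi-scale contrast a geologist reads by eye.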
Datasets for Hackathon project
Report for Athabasca Oil Sands Data McMurray/Wabiskaw Oil Sands Deposit http://ags.aer.ca/document/OFR/OFR_1994_14.PDF
@dalide used the Alberta Geological Survey's UWI conversion tool to find the latitude and longitude of each well from its UWI. These coordinates were then used to find each well's nearest neighbors, as demonstrated in this notebook.
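The nearest-neighbor step can be sketched with great-circle (haversine) distances. This is an illustration, not the method in the notebook: it assumes the coordinates are already in decimal degrees and uses a brute-force pairwise distance matrix, which is practical for a couple of thousand wells; for much larger sets, scikit-learn's `BallTree` with the haversine metric is an alternative. Function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(x) for x in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    # Clip guards against values a hair above 1 from floating-point rounding
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_neighbors(lats, lons, k=3):
    """Indices and distances (km) of the k nearest other wells for each well."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    # Full pairwise distance matrix via broadcasting
    dist = haversine_km(lats[:, None], lons[:, None], lats[None, :], lons[None, :])
    np.fill_diagonal(dist, np.inf)  # a well is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


# Usage with four made-up well locations
lats = [56.00, 56.00, 56.00, 56.10]
lons = [-111.00, -111.05, -111.50, -111.00]
idx, dist_km = nearest_neighbors(lats, lons, k=2)
print(idx[:, 0])  # closest neighbor of each well
```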
On February 11th, 2018, @JustinGosses reorganized the repository, moving many of the notebooks out of the top level and into sub-folders because the top level was getting too crowded. This may break the relative paths to some files in any notebook from the Hackathon or 2017. In most cases, the fix is just to add ../ or ../../ to the front of the path.
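If editing every broken path by hand is tedious, a small helper can try the original relative path and then the same path prefixed with ../ and ../../. This is a hypothetical convenience function, not something in the repository; `data/wells.csv` in the usage line is a made-up path.

```python
from pathlib import Path


def resolve_moved_path(relative_path, max_levels_up=2, start=None):
    """Find a file whose relative path broke when notebooks moved into sub-folders.

    Tries relative_path, ../relative_path, ../../relative_path, ... up to
    max_levels_up, starting from `start` (default: current working directory).
    """
    base = Path(start) if start is not None else Path.cwd()
    for levels in range(max_levels_up + 1):
        candidate = base.joinpath(*([".."] * levels), relative_path)
        if candidate.exists():
            return candidate.resolve()
    raise FileNotFoundError(
        f"{relative_path!r} not found within {max_levels_up} parent levels of {base}"
    )


# Usage (hypothetical file): path = resolve_moved_path("data/wells.csv")
```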
Key Jupyter Notebooks for Hackathon project
Final data prep and machine learning for the prediction, as finished by the end of the hackathon: https://github.com/JustinGOSSES/MannvilleGroup_Strat_Hackathon/blob/master/data_prep_wells_xgb.ipynb
Feature engineering work done during the hackathon (but not included in the hackathon prediction): https://github.com/JustinGOSSES/MannvilleGroup_Strat_Hackathon/blob/master/Feature_Brainstorm_Justin_vD-Copy1.ipynb
Key Jupyter Notebooks Post Hackathon
Code development has moved to the modular_redo sub-folder. Things were made more modular so that short bits of work can be done whenever time is available. The notebooks are a bit messy but will be cleaned up in the near future.
The code runs faster, and mean absolute error has dropped from 90 to 15.03 and now to 7+. Key approaches were:
- Leverage knowledge from nearby wells.
- Instead of distinguishing between 2 classes, pick and not pick, distinguish between 3 classes: (a) pick, (b) not pick but within 3 meters of the pick, and (c) not pick and more than 3 meters from the pick.
- More features
- Two steps: the first step is classification. The second step uses the classification results to find the mean prediction point (we may switch to a regression model for the second step in the near future).
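The three-class labeling and the two-step prediction can be sketched as follows. This is a minimal illustration under assumptions, not the code in modular_redo: the class codes are hypothetical, the "pick" class is assumed to be the single log sample closest to the human pick, and the second step is assumed to average the depths of samples the classifier labels as pick.

```python
import numpy as np

# Hypothetical class codes for the three classes
PICK, NEAR_PICK, AWAY = 0, 1, 2


def three_class_labels(depths, pick_depth, near_m=3.0):
    """Label each log sample relative to a human-picked top depth (meters).

    Assumption: the single sample closest to the pick is the PICK class.
    """
    depths = np.asarray(depths, dtype=float)
    dist = np.abs(depths - pick_depth)
    labels = np.full(depths.shape, AWAY, dtype=int)
    labels[dist <= near_m] = NEAR_PICK
    labels[np.argmin(dist)] = PICK
    return labels


def predicted_pick_depth(depths, predicted_labels):
    """Second step: mean depth of the samples a classifier labeled PICK."""
    depths = np.asarray(depths, dtype=float)
    mask = np.asarray(predicted_labels) == PICK
    if not mask.any():
        return float("nan")  # no pick predicted in this well
    return float(depths[mask].mean())


depths = np.arange(400.0, 420.0, 0.5)
labels = three_class_labels(depths, pick_depth=410.2)
print(depths[labels == PICK])            # [410.]
print(int((labels == NEAR_PICK).sum()))  # 11
```

In practice `predicted_labels` would come from the trained classifier rather than from the training labels; the middle class gives the model a way to say "close, but not the pick" instead of forcing near misses into the large not-pick class.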
Distribution of Absolute Error in Test Portion of Dataset for Top McMurray Surface in Meters.
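Mean absolute error is the average of |predicted depth − human-picked depth| over the test wells, and a distribution like the one above comes from bucketing those same per-well errors. A minimal sketch, assuming one predicted and one human-picked depth per well in meters; the function name and bucket edges are hypothetical and are not the bins used in the original figure.

```python
import numpy as np


def absolute_error_summary(predicted, actual, edges=(1.0, 3.0, 5.0, 10.0, 20.0)):
    """Mean absolute error plus counts of absolute errors per bucket (meters).

    With the default edges the buckets are [0, 1), [1, 3), [3, 5), [5, 10),
    [10, 20) and [20, inf); the edges are illustrative only.
    """
    err = np.abs(np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float))
    bucket = np.digitize(err, np.asarray(edges, dtype=float))
    counts = np.bincount(bucket, minlength=len(edges) + 1)
    return float(err.mean()), counts


mae, counts = absolute_error_summary([410.0, 412.0, 420.0, 430.0],
                                     [410.0, 410.0, 415.0, 405.0])
print(mae)     # 8.0
print(counts)  # [1 1 0 1 0 1]
```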
Future Work [also see issues]
- Visualize the probability of a pick along the well instead of just returning the maximum-probability prediction for each well.
- Generate averaged aggregate wells in different local areas for wells at different prediction levels. See if there are trends, or if this helps to identify geologically meaningful features that correlate with many combined machine-learning model features.
- Explore methods to visualize feature weightings on an individual-well basis, using techniques similar to those used in image-based deep learning.
- Cluster wells using unsupervised learning and then see if clusters can be created that correlate with supervised prediction results. (Initial trials with UMAP gave encouraging results.)
Eventual Move of this Repository Contents to a Different Repository
Once things are winnowed down to a final approach, the plan is to move the resulting code to the StratPickSupML repository, where it will be cleaned up into one or more modules and demo notebooks without the clutter of failed (but possibly useful if reworked) approaches.
This repo isn't particularly organized, and there hasn't been a lot of time spent (actually no time spent) on making it easy to jump in and help out. That said, there's no reason you couldn't just jump in and start improving things. The original group works on this at a low level when we have time. A few of the issues are enhancements that would be a good place to start.