Facies are studied from core samples taken every half foot and matched with logging data at the well location. Facies classification is one of the most important tasks that geoscientists perform in development and exploration projects. Sedimentary facies reflect the particular physical, chemical, and biological conditions that a unit experienced during the sedimentation process. Studying these facies directly requires rock samples. In this project, I use the log data matched to those samples to train supervised classifiers that predict discrete facies groups.
- Pandas
- NumPy
- Matplotlib
- Sklearn
- IPython.display
- Seaborn
- Math
- Pydotplus....
The data set comes from a University of Kansas class exercise on the Hugoton and Panoma gas fields. For more on the origin of the data, see Dubois et al. (2007). It was produced by a consortium project that used machine learning techniques to build a reservoir model of the Hugoton and Panoma fields, the largest gas fields in North America.
Geologically, the boundaries between facies are not always sharp and can be transitional. The following table therefore lists the facies, their abbreviated labels, and their approximate adjacent facies.
Thorough data cleaning, transformation, exploration, and feature engineering were performed to prepare the dataset for testing the classifier models.
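As a rough illustration of this kind of preparation, here is a minimal sketch assuming a small hypothetical log table. The column names (`GR`, `PHIND`, `Facies`), the values, and the specific steps (dropping rows with missing values, splitting, standardizing) are assumptions for demonstration, not the project's actual pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy log table; column names and values are illustrative only.
logs = pd.DataFrame({
    "GR": [77.5, 78.3, np.nan, 86.1, 74.6, 73.0, 66.2, 61.9],
    "PHIND": [11.9, 12.6, 13.1, 13.1, 13.3, 13.4, 12.9, 12.0],
    "Facies": [3, 3, 3, 2, 2, 1, 1, 2],
})

# Cleaning: drop rows with a missing log reading.
clean = logs.dropna().reset_index(drop=True)

X = clean[["GR", "PHIND"]].to_numpy()
y = clean["Facies"].to_numpy()

# Hold out part of the data for testing the classifiers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Transformation: fit the scaler on the training rows only, then apply it
# to both splits so no test information leaks into training.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler on the training split alone keeps the held-out rows untouched until evaluation.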
Several classification models (Logistic Regression, Support Vector Machine, KNN, Decision Tree, Random Forest, and Gradient Boosting) were tested and compared to select the best one. Based on the classification report and an accuracy of ~80.7%, Gradient Boosting appears to be the best choice.
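The comparison described above could be sketched as follows. This is a hedged example on synthetic data from `make_classification`, with default or arbitrary hyperparameters; it will not reproduce the ~80.7% figure and is not the project's actual training code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the log features and facies labels (assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale-sensitive models are wrapped in a pipeline with a scaler.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_model = max(scores, key=scores.get)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} {acc:.3f}")
print("Best on this synthetic split:", best_model)
```

A single train/test split is noisy; cross-validation would give a steadier ranking at the cost of more fitting time.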
The diagonal of the confusion matrix holds the instances that were correctly classified (the true positives for each facies); the rest of the matrix holds the instances that were misclassified. The F1 score is 0.80, which is pretty good.
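To make the reading of the matrix concrete, here is a small sketch with hand-made labels. The eight true and predicted labels are invented for illustration (the `SS` label in particular is an assumption), so the numbers are unrelated to the project's results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

labels = ["BS", "PS", "SS"]  # illustrative facies abbreviations
y_true = ["BS", "BS", "BS", "PS", "PS", "PS", "SS", "SS"]
y_pred = ["BS", "BS", "PS", "PS", "PS", "SS", "SS", "SS"]

# Rows are true facies, columns are predicted facies.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Correct predictions sit on the diagonal; everything else is an error.
correct = int(np.trace(cm))
errors = int(cm.sum() - correct)
accuracy = correct / cm.sum()

# Weighted F1 averages the per-class F1 scores by class support.
weighted_f1 = f1_score(y_true, y_pred, labels=labels, average="weighted")
print(f"correct={correct} errors={errors} "
      f"accuracy={accuracy:.2f} weighted F1={weighted_f1:.2f}")
```

In this toy case each error falls in the cell immediately to the right of the diagonal, which is the pattern one would expect when a facies is confused with a neighbouring one.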
The features act in combination to predict the log facies. The classification report shows the F1 score for each individual facies: BS is ~95% and PS is ~68%.
Precision is the proportion of positive predictions that are correct, and recall is the proportion of actual positives that are correctly predicted. The curves show precision and recall ranging from 60% to almost 100% across the facies classes.
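The two definitions can be computed directly from a confusion matrix. The sketch below assumes a hypothetical 3x3 matrix (rows are true facies, columns are predicted facies); the counts and the `SS` label are illustrative, not project output.

```python
import numpy as np

labels = ["BS", "PS", "SS"]  # illustrative facies abbreviations
# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[2, 1, 0],
               [0, 2, 1],
               [0, 0, 2]])

true_positives = np.diag(cm)

# Precision: of everything predicted as a class, how much was right
# (diagonal divided by column totals).
precision = true_positives / cm.sum(axis=0)

# Recall: of everything that truly belongs to a class, how much was found
# (diagonal divided by row totals).
recall = true_positives / cm.sum(axis=1)

for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

A precision-recall curve repeats this calculation at many decision thresholds for one class against the rest.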
Curves that sit in the top-left corner maximize the true positive rate while minimizing the false positive rate. The results show that the curves tend toward the top-left corner, although some false positives remain.
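Assuming the curves discussed here are ROC curves drawn one facies against the rest, a single such curve can be sketched as below. The four labels and scores are made up for illustration; `is_target` marks whether a sample belongs to the facies of interest.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# 1 = sample belongs to the facies of interest, 0 = any other facies.
is_target = np.array([0, 0, 1, 1])
# Hypothetical classifier scores (e.g. predicted probability of the facies).
score = np.array([0.1, 0.4, 0.35, 0.8])

# Each threshold on the score gives one (false positive rate,
# true positive rate) point on the curve.
fpr, tpr, thresholds = roc_curve(is_target, score)
auc = roc_auc_score(is_target, score)

for f, t in zip(fpr, tpr):
    print(f"FPR={f:.2f}  TPR={t:.2f}")
print(f"AUC={auc:.2f}")
```

The closer the curve passes to the top-left corner (FPR near 0, TPR near 1), the closer the area under it gets to 1.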