Well-Logs-Facies-Classification

Facies are studied from core samples taken every half foot and matched with logging data at the well location. Facies classification is one of the most important tasks geoscientists perform in development and exploration projects. Sedimentary facies reflect the particular physical, chemical, and biological conditions that a unit experienced during sedimentation. Studying these facies directly requires rock samples, which are not available everywhere, whereas log data are. In this project, I'll use the log data to train supervised classifiers to predict discrete facies groups.

Libraries & Python Modules

  • Pandas
  • Numpy
  • Matplotlib
  • Sklearn
  • IPython.display
  • Seaborn
  • Math
  • Pydotplus....

Data

The data set used came from a University of Kansas class exercise on the Hugoton and Panoma gas fields. For more on the origin of the data, see Dubois et al. (2007). This was a consortium project to use machine learning techniques to create a reservoir model of the largest gas fields in North America, the Hugoton and Panoma Fields.

text

test

Geologically, the boundaries between facies are not always sharp and can be transitional. The following table therefore lists the facies, their abbreviated labels, and their approximate adjacent facies.

test

Exploratory Data Analysis

Thorough data cleaning, transformation, data exploration, and feature engineering were performed to prepare the dataset for testing the classifier models.
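The cleaning and transformation steps are not spelled out here, so the following is only a minimal sketch of one common approach: drop rows with missing readings, hold out a stratified test set, and standardize the features. The column names (`GR`, `PHIND`, `PE`, `Facies`) and all values are hypothetical stand-ins, not the project's actual data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the well-log table; column names and values are hypothetical.
rng = np.random.default_rng(0)
logs = pd.DataFrame({
    "GR": rng.normal(65, 30, 200),
    "PHIND": rng.normal(13, 7, 200),
    "PE": rng.normal(3.7, 0.9, 200),
    "Facies": rng.integers(1, 4, 200),
})
logs.loc[logs.index[::25], "PE"] = np.nan  # simulate missing log readings

# Cleaning: drop rows with missing measurements.
clean = logs.dropna().reset_index(drop=True)

X = clean.drop(columns="Facies")
y = clean["Facies"]

# Hold out a stratified test set so each facies keeps its share of samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Transformation: fit the scaler on the training data only to avoid leakage.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"{len(logs) - len(clean)} rows dropped, {len(clean)} kept")
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model, which matters when the reported accuracy is used to compare classifiers.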

Target Variable Distribution

text

Feature Relevance & Multivariate Analysis

text

text

Testing Different Classifiers

Several classification models (Logistic Regression, Support Vector Machine, KNN, Decision Tree, Random Forest, and Gradient Boosting) were tested and compared to select the best model. Based on the classification report and an accuracy of ~80.7%, Gradient Boosting appears to be the best choice.
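A comparison like this can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the project's notebook: the data are synthetic (`make_classification`), the hyperparameters are library defaults, and the printed accuracies will not match the ~80.7% reported for the real logs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic multi-class data standing in for the well logs (not the real data).
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=5, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale-sensitive models get a StandardScaler; tree-based models do not need one.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit every model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24s} {acc:.3f}")
```

Using one shared train/test split keeps the comparison fair; cross-validation would give a more stable ranking at a higher compute cost.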

text

Performance of Gradient Boosting Model

The diagonal of the confusion matrix holds the true positives for each class, that is, the instances that were correctly classified; the off-diagonal cells count the instances that were misclassified. The F1 score is 0.80, which is pretty good.
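To make the reading of the matrix concrete, here is a small worked example on hypothetical labels for three classes (not the project's predictions). It assumes a weighted-average F1; the averaging actually used for the 0.80 figure is not stated.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical true and predicted labels for three facies classes.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1 0]
#  [0 2 1]
#  [1 0 3]]

correct = np.trace(cm)             # sum of the diagonal
misclassified = cm.sum() - correct  # everything off the diagonal
print(correct, misclassified)       # 7 3

# Weighted F1: per-class F1 averaged by class support (an assumed choice).
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
print(f"{weighted_f1:.2f}")         # 0.70
```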

text

Interpreting the Matrix & Feature Importance

The features act in combination to predict the log facies. The classification report gives the F1 score for each individual facies: BS is ~95% and PS is ~68%.
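Per-class F1 scores and feature importances can both be read off a fitted scikit-learn model. The sketch below uses synthetic data and placeholder feature names (`log_a` to `log_e`), so the numbers it prints are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split

# Placeholder feature names; the real log curve names are not assumed here.
feature_names = ["log_a", "log_b", "log_c", "log_d", "log_e"]

# Synthetic three-class data standing in for the facies logs.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

model = GradientBoostingClassifier(random_state=1).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, and F1 for each class in one table.
print(classification_report(y_test, y_pred))

# The same per-class F1 values as an array, one entry per class.
per_class_f1 = f1_score(y_test, y_pred, average=None)

# Impurity-based importances, normalized to sum to 1, sorted high to low.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:<8s} {importance:.3f}")
```

Impurity-based importances describe how the fitted trees used each feature, not a causal effect, which fits the point that the features work in combination.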

text

text

Precision - Recall Curve

Precision is the proportion of positive predictions that are correct, and recall is the proportion of actual positives that are correctly predicted. The curves show precision and recall ranging from 60% to almost 100% across the facies classes.
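For a multi-class problem, one common way to draw these curves is one-vs-rest: each class in turn is treated as "positive" and all others as "negative". The sketch below assumes that approach and uses synthetic data; it computes the curve points without plotting them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic three-class data standing in for the facies logs.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_classes=3, random_state=2
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=2
)

model = GradientBoostingClassifier(random_state=2).fit(X_train, y_train)
proba = model.predict_proba(X_test)  # one probability column per class

# One indicator column per class: 1 where the sample belongs to that class.
y_bin = label_binarize(y_test, classes=model.classes_)

pr_curves = {}
for i, cls in enumerate(model.classes_):
    precision, recall, _ = precision_recall_curve(y_bin[:, i], proba[:, i])
    ap = average_precision_score(y_bin[:, i], proba[:, i])
    pr_curves[cls] = (precision, recall, ap)
    print(f"class {cls}: average precision = {ap:.3f}")
```

Each `(recall, precision)` pair can be passed to `matplotlib.pyplot.plot` to reproduce a figure of this kind.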

text

ROC - Receiver Operating Characteristic Curve

Curves that approach the top-left corner maximize the true positive rate while minimizing the false positive rate. The results show the curves tending toward the top-left corner, although some false positives remain.
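As a rough illustration, per-class ROC curves can be computed one-vs-rest from predicted probabilities. The data here are synthetic and the one-vs-rest setup is an assumption about how such curves are typically built for multi-class models; an AUC near 1 corresponds to a curve hugging the top-left corner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic three-class data standing in for the facies logs.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_classes=3, random_state=3
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=3
)

model = GradientBoostingClassifier(random_state=3).fit(X_train, y_train)
proba = model.predict_proba(X_test)

# Treat each class in turn as the positive class (one-vs-rest).
y_bin = label_binarize(y_test, classes=model.classes_)

roc_curves = {}
for i, cls in enumerate(model.classes_):
    # fpr: false positive rate, tpr: true positive rate, per threshold.
    fpr, tpr, _ = roc_curve(y_bin[:, i], proba[:, i])
    roc_curves[cls] = (fpr, tpr, auc(fpr, tpr))
    print(f"class {cls}: AUC = {roc_curves[cls][2]:.3f}")
```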

text

Original Facies vs. Predictions

text