# Reservoir prediction demo notebook

#### Import file with all auxiliary functions

The *source* module, imported here as *src*, contains all functions needed for execution.


In [1]:
import source as src

## Specify parameters
The required parameters are specified here:
- *training_data_path* is the path to the well data, which forms the training set;
- *test_data_path* is the path to the attributes folder, which forms the test set;
- *number_of_configurations* is the number of model configurations to create. Choosing this value is a trade-off between model quality and stability on one side and training time on the other. The estimated training time for N model configurations is around N / 2000 minutes;
- *predictions_folder_path* is the path to the folder where all results will be stored.
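
The rule of thumb for training time can be turned into a quick estimate before starting a long run. The helper below is only an illustrative sketch: `estimate_training_minutes` is a hypothetical name, not part of *src*, and the rate of 2000 configurations per minute is the approximate figure quoted in the list, which will likely vary with hardware and data size.

```python
def estimate_training_minutes(number_of_configurations, configs_per_minute=2000):
    """Rough training-time estimate in minutes, using the N / 2000 rule of thumb."""
    if number_of_configurations < 0:
        raise ValueError("number_of_configurations must be non-negative")
    return number_of_configurations / configs_per_minute


# Estimate for the value used in this notebook.
print(estimate_training_minutes(50000))  # 25.0
```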

Note: The attributes provided in the training data **must** be the same as those in the test folder.
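
One way to check this requirement before training is to compare the column headers of the two inputs. The sketch below assumes that both are CSV files with a header row and that attributes are identified by column name. `read_header`, `attribute_mismatch` and the `ignore` argument (for non-attribute columns such as a target) are hypothetical and not part of *src*, and the column names are made up.

```python
import csv
import io


def read_header(csv_file):
    """Return the column names from the first row of an open CSV file."""
    return next(csv.reader(csv_file))


def attribute_mismatch(train_columns, test_columns, ignore=()):
    """Return (missing_in_test, missing_in_train) as sorted lists of names."""
    train = set(train_columns) - set(ignore)
    test = set(test_columns) - set(ignore)
    return sorted(train - test), sorted(test - train)


# In-memory stand-ins for the training and test files (made-up columns).
train_file = io.StringIO("x,y,attr_a,attr_b,Hef\n1,2,0.1,0.2,5.0\n")
test_file = io.StringIO("x,y,attr_a,attr_c\n1,2,0.1,0.3\n")

# "Hef" is assumed to be the target column, so it is excluded from the check.
missing_in_test, missing_in_train = attribute_mismatch(
    read_header(train_file), read_header(test_file), ignore=("Hef",)
)
print(missing_in_test)   # ['attr_b']
print(missing_in_train)  # ['attr_c']
```

Two empty lists would indicate that the attribute sets match.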

*src.build_config* takes these parameters as input and sets all other parameters that the algorithm uses.

In [5]:
training_data_path = "points_seismic_attributes_18092020_shuffled.csv"
test_data_path = "all_attributes_merged.csv"
number_of_configurations = 50000
predictions_folder_path = "predictions/"

conf = src.build_config(training_data_path = training_data_path, test_data_path = test_data_path,
                        number_of_configurations = number_of_configurations, predictions_folder_path = predictions_folder_path)

## Training phase
The training phase takes the well data as input, creates XGBoost configurations, selects the optimal configurations, and predicts mean Hef, p10 Hef, p50 Hef and p90 Hef for the well points.

A cross plot of real versus predicted Hef values is provided.
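
To illustrate how mean, p10, p50 and p90 values can be derived from an ensemble of models, the sketch below summarises a matrix of predictions with NumPy. It is an illustration under stated assumptions, not the *src* implementation: the array `predictions` (one row per selected configuration, one column per well point) is made up, and p10, p50 and p90 are taken here as the 10th, 50th and 90th percentiles of the ensemble. Some reservoir workflows use the opposite (exceedance) convention for p10 and p90.

```python
import numpy as np

# Hypothetical predictions: 4 model configurations (rows) x 3 well points (columns).
predictions = np.array([
    [10.0, 20.0, 30.0],
    [12.0, 22.0, 34.0],
    [14.0, 24.0, 38.0],
    [16.0, 26.0, 42.0],
])

# Mean across configurations, one value per well point.
mean_hef = predictions.mean(axis=0)

# Percentiles across configurations, computed independently for each well point.
p10_hef, p50_hef, p90_hef = np.percentile(predictions, [10, 50, 90], axis=0)

print(mean_hef)  # [13. 23. 36.]
print(p50_hef)   # [13. 23. 36.]
```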

In [8]:
src.create_configurations(training_data_path = training_data_path, conf = conf)


50000 / 50000 model configurations created, please wait...
50000 model configurations created.


In [48]:
optimal = src.select_optimal_configurations(training_data_path = training_data_path, conf = conf)

#### MSE and R-squared
This phase also outputs the mean squared error and the R-squared coefficient for the mean optimal model.

In [23]:
src.print_training_results(optimal)

Mean squared error on training set is 10.328555346641183. Mean squared error on validation set is 32.27390322836693.
R-squared coefficient on training set is 0.9451583991277018. R-squared coefficient on validation set is 0.8644360084687484.
Total mean squared error is 16.313650223475477. Total R-squared coefficient is 0.9192340062592239.
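
For reference, the two reported metrics have standard definitions: the mean squared error is the average of the squared residuals, and the R-squared coefficient is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes both with NumPy on made-up values. The function names are illustrative and are not necessarily the ones used in *src*.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between real and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up real and predicted Hef values.
real_hef = [10.0, 20.0, 30.0, 40.0]
predicted_hef = [12.0, 18.0, 33.0, 39.0]

print(mean_squared_error(real_hef, predicted_hef))   # 4.5
print(round(r_squared(real_hef, predicted_hef), 3))  # 0.964
```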


#### Feature importance
The following pie plot displays the importance of the most valuable input features.

In [50]:
src.plot_feature_importance(conf)

In [52]:
src.plot_grouped_feature_importance(conf)

## Test phase
In the test phase, the model is used to predict Hef values over the whole attribute maps specified by the *test_data_path* parameter.

In [56]:
src.predict_on_test_data(test_data_path = test_data_path, 
                         training_data_path = training_data_path,
                         conf = conf)

#### Map conversion
The predicted maps are now converted to .irap format. The paths to the .irap files are printed below.

In [9]:
src.convert_prediction_to_map(conf = conf)

percentiles = ['p10', 'p50', 'p90']

# Post-process each predicted map.
for p in percentiles:
    src.map_postprocessing(attribute_path = conf['prediction_map_path_' + p])

# Save each predicted map as an .irap file.
for p in percentiles:
    src.save_irap_file(prediction_map_path = conf['prediction_map_path_' + p],
                       prediction_irap_path = conf['prediction_irap_path_' + p])

# Post-process the saved .irap files.
for p in percentiles:
    src.map_postprocessing(attribute_path = conf['prediction_irap_path_' + p])

# Report where each .irap file was written.
for p in percentiles:
    print(p.upper() + " prediction map is saved to " + conf["prediction_irap_path_" + p] + ".")

P10 prediction map is saved to predictions/Hef_prediction_p10_irap_50k_no_pca_new_val_set_linear.txt.
P50 prediction map is saved to predictions/Hef_prediction_p50_irap_50k_no_pca_new_val_set_linear.txt.
P90 prediction map is saved to predictions/Hef_prediction_p90_irap_50k_no_pca_new_val_set_linear.txt.


#### Map plotting
The Htot, P90, P50 and P10 maps are plotted below.
To zoom in on a map, select the rectangle to enlarge; double-click the image to zoom out. The Pan button (upper right) lets the user move around the map. Hovering over a point shows its predicted Hef value, and hovering over a well (black point) shows information about that well.
Each plot can be downloaded in .png format via the "Download plot as a png" button (upper right).
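
The *limits* argument passed in some of the calls below appears to restrict the plot to a rectangular window. As a rough illustration of that idea, the sketch below selects the map points that fall inside such a window. It assumes the four values are ordered as [x_min, x_max, y_min, y_max], which this notebook does not state explicitly, and `points_within_limits` is a hypothetical helper rather than part of *src*.

```python
import numpy as np


def points_within_limits(x, y, limits):
    """Boolean mask of points inside [x_min, x_max, y_min, y_max] (assumed order)."""
    x_min, x_max, y_min, y_max = limits
    x, y = np.asarray(x), np.asarray(y)
    return (x >= x_min) & (x <= x_max) & (y >= y_min) & (y <= y_max)


# Made-up map points; only the first one lies inside the window.
x = np.array([445000, 430000, 450000])
y = np.array([5075000, 5075000, 5080000])

mask = points_within_limits(x, y, [440440, 452700, 5070300, 5077600])
print(mask)  # [ True False False]
```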

In [58]:
src.partial_irap_plot(map_id = "Htot", conf = conf)

In [46]:
src.partial_irap_plot(map_id = 'p90', conf = conf, limits = [440440, 452700, 5070300, 5077600])

In [34]:
src.partial_irap_plot(map_id = 'p90', conf = conf)

In [38]:
src.partial_irap_plot(map_id = 'p50', conf = conf, limits = [440440, 452700, 5070300, 5077600])

In [44]:
src.partial_irap_plot(map_id = 'p50', conf = conf)

In [40]:
src.partial_irap_plot(map_id = 'p10', conf = conf, limits = [440440, 452700, 5070300, 5077600])

In [42]:
src.partial_irap_plot(map_id = 'p10', conf = conf)

#### NTG and Hef distributions
The histograms below show the predicted NTG (left) and Hef (right) distributions.

In [60]:
src.draw_hef_ntg_distributions(conf = conf)
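
As a rough idea of the data behind such histograms, the sketch below derives NTG values and bin counts with NumPy. It assumes that NTG (net-to-gross) is the ratio of effective thickness Hef to total thickness Htot, which is the usual definition but is not confirmed by this notebook. All values and bin settings are made up.

```python
import numpy as np

# Made-up predicted effective thickness (Hef) and total thickness (Htot) values.
hef = np.array([3.0, 5.0, 7.0, 9.0])
htot = np.array([15.0, 12.5, 10.0, 15.0])

# Assumption: NTG is the ratio of effective to total thickness.
ntg = hef / htot

# Bin counts for each histogram; the bin count and range are arbitrary choices.
ntg_counts, ntg_edges = np.histogram(ntg, bins=4, range=(0.0, 1.0))
hef_counts, hef_edges = np.histogram(hef, bins=5, range=(0.0, 10.0))

print(ntg_counts)  # [1 1 2 0]
print(hef_counts)  # [0 1 1 1 1]
```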