## Getting Testing Data Coordinates from S3

In this notebook we download a DataFrame that holds the testing data coordinates.

All target values in it are blank (`NaN`).

We expect competitors to run their predictive models and fill in the blanks
with their predictions at the given IL/XL/TWT (inline, crossline, two-way time) locations.

There are two blind wells for evaluation.

Once results are submitted, we will calculate R^2 scores against the ground truth data
and post the results on the leaderboard.
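
For reference, R^2 (the coefficient of determination) can be sketched as below. This is only an illustration of the standard definition, 1 - SS_res / SS_tot. The exact scoring procedure (per well, per column, or averaged) is not specified here, and the function name `r2_score` and the sample numbers are assumptions made for this example.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Arbitrary placeholder values, not real well data.
truth = [2.40, 2.45, 2.50, 2.55]

perfect = r2_score(truth, truth)             # predictions equal to the truth
mean_only = r2_score(truth, [2.475] * 4)     # always predicting the mean
```

A perfect prediction scores 1, and a model that always predicts the mean of the ground truth scores approximately 0.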

First, import `read_json` from pandas.

In [1]:
from pandas import read_json

## Loading AWS Credentials

See Tutorial #1 for setting up credentials on a local machine.

## Getting the Blank DataFrame

The blind wells are in the same format as the training wells.

Their inline, crossline, and two-way time values (the coordinates) are provided.

We expect you to run your feature extraction, feature engineering, and predictions at or around
these locations and populate this DataFrame without adding or removing rows or columns, or otherwise changing its shape.

We will then evaluate the results by comparing them to the ground truth data.

See the output of the cell below for what the DataFrame looks like.

**Any result that is not the same shape as this DataFrame will not be considered.**

**Any result that has `NaN` values in the results DataFrame will not be considered.**

In [2]:
well_bucket = 's3://sagemaker-gitc2021/poseidon/wells/'
well_file = 'poseidon_geoml_testing_wells_blank.json.gz'

well_df = read_json(
    path_or_buf=well_bucket + well_file,
    compression='gzip',
)

well_df.set_index(['well_id', 'twt'], inplace=True)

well_df

| well_id | twt | inline | xline | rhob | p_impedance | s_impedance |
|---|---|---|---|---|---|---|
| well_07 | 2700.0 | 3256.5233 | 1389.3571 | NaN | NaN | NaN |
| well_07 | 2700.5 | 3256.5234 | 1389.3559 | NaN | NaN | NaN |
| well_07 | 2701.0 | 3256.5236 | 1389.3547 | NaN | NaN | NaN |
| well_07 | 2701.5 | 3256.5237 | 1389.3535 | NaN | NaN | NaN |
| well_07 | 2702.0 | 3256.5238 | 1389.3523 | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... |
| well_13 | 3383.5 | 2407.1952 | 2675.8771 | NaN | NaN | NaN |
| well_13 | 3384.0 | 2407.1905 | 2675.8925 | NaN | NaN | NaN |
| well_13 | 3384.5 | 2407.1858 | 2675.9078 | NaN | NaN | NaN |
| well_13 | 3385.0 | 2407.1811 | 2675.9232 | NaN | NaN | NaN |


Below are the summary statistics. As you can see, the `rhob`, `p_impedance`, and `s_impedance` columns are blank (their count is 0).

In [3]:
well_df.describe()

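| | inline | xline | rhob | p_impedance | s_impedance |
|---|---|---|---|---|---|
| count | 1773.0 | 1773.0 | 0.0 | 0.0 | 0.0 |
| mean | 3007.407384 | 1766.345919 | NaN | NaN | NaN |
| std | 386.799776 | 584.865566 | NaN | NaN | NaN |
| min | 2407.1763 | 1389.0076 | NaN | NaN | NaN |
| 25% | 2408.2571 | 1389.0613 | NaN | NaN | NaN |
| 50% | 3256.5844 | 1389.2113 | NaN | NaN | NaN |
| 75% | 3257.0021 | 2671.0803 | NaN | NaN | NaN |
| max | 3257.3135 | 2675.9386 | NaN | NaN | NaN |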
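
Because results with a different shape or with remaining `NaN` values will not be considered, it may help to check a populated DataFrame against the blank one before submitting. The following is a minimal sketch, not an official validator: the helper name `check_submission` is hypothetical, and the two-row template and the filled-in values are arbitrary placeholders standing in for `well_df` and your own predictions.

```python
import numpy as np
import pandas as pd


def check_submission(template, result):
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    if result.shape != template.shape:
        problems.append(f"shape {result.shape} != expected {template.shape}")
    elif not (result.index.equals(template.index)
              and result.columns.equals(template.columns)):
        problems.append("index or columns differ from the template")
    n_missing = int(result.isna().sum().sum())
    if n_missing:
        problems.append(f"{n_missing} NaN value(s) remain")
    return problems


# Toy stand-in for the blank DataFrame (same layout as well_df, two rows only).
index = pd.MultiIndex.from_tuples(
    [("well_07", 2700.0), ("well_07", 2700.5)], names=["well_id", "twt"]
)
template = pd.DataFrame(
    {
        "inline": [3256.5233, 3256.5234],
        "xline": [1389.3571, 1389.3559],
        "rhob": np.nan,
        "p_impedance": np.nan,
        "s_impedance": np.nan,
    },
    index=index,
)

# Placeholder "predictions"; real values would come from your model.
filled = template.assign(
    rhob=[2.40, 2.41],
    p_impedance=[7000.0, 7010.0],
    s_impedance=[4000.0, 4005.0],
)

ok_report = check_submission(template, filled)
blank_report = check_submission(template, template)
```

With the real data, the call would be something like `check_submission(well_df, my_result)`, run before the index is reset for export.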


We will accept results uploaded to the S3 locations below.

Please use the following paths and file names as a template. The code cell after the
explanation shows an example.

**Intermediate:**

`bucket =` *`s3://sagemaker-gitc2021/poseidon/wells/submissions/intermediate/`*

`file_name =` *`TeamName_Intermediate_Results_YYYYMMDD.json.gz`*

**Final:**

`bucket =` *`s3://sagemaker-gitc2021/poseidon/wells/submissions/final/`*

`file_name =` *`TeamName_Final_Results_YYYYMMDD.json.gz`*
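
The naming template above can also be assembled programmatically, which avoids typos in the stage name or date. This is a small sketch under the assumption that the two stages and the `YYYYMMDD` date format are exactly as listed. The helper name `submission_path` and the constant `BASE` are hypothetical.

```python
from datetime import date

# Common prefix of the two submission locations listed above.
BASE = 's3://sagemaker-gitc2021/poseidon/wells/submissions/'


def submission_path(team_name, stage, when):
    """Build the full S3 path for a submission file from the template."""
    stage = stage.lower()
    if stage not in ('intermediate', 'final'):
        raise ValueError("stage must be 'intermediate' or 'final'")
    file_name = f'{team_name}_{stage.capitalize()}_Results_{when:%Y%m%d}.json.gz'
    return f'{BASE}{stage}/{file_name}'


path = submission_path('MyTeam', 'intermediate', date(2021, 4, 16))
```

Here `path` is the intermediate location followed by `MyTeam_Intermediate_Results_20210416.json.gz`.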

Submitted files must be in the same `.json.gz` format as the blank file. This can be achieved by using the
following code snippet. This assumes your populated DataFrame variable is named `my_result`.

In [None]:
bucket = 's3://sagemaker-gitc2021/poseidon/wells/submissions/intermediate/'

file_name = 'MyTeam_Intermediate_Results_20210416.json.gz'

# Making sure extension is in the file name.
if not file_name.lower().endswith('.json.gz'):
    file_name += '.json.gz'

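# Move 'well_id' and 'twt' from the index back into regular columns,
# matching the layout of the blank file before set_index was applied.
my_result.reset_index(inplace=True)

# Write gzip-compressed JSON, keeping 4 decimal places.
my_result.to_json(
    path_or_buf=bucket + file_name,
    double_precision=4,
    compression='gzip',
)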
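
Before uploading, it can be worth confirming that the file reads back as expected. The sketch below does a local round trip in a temporary directory instead of S3, using the same `to_json` and `read_json` arguments as in this notebook. The two-row `result` DataFrame holds arbitrary placeholder values and is an assumption made for illustration.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for a populated result after reset_index (placeholder values).
result = pd.DataFrame(
    {
        'well_id': ['well_07', 'well_07'],
        'twt': [2700.0, 2700.5],
        'inline': [3256.5233, 3256.5234],
        'xline': [1389.3571, 1389.3559],
        'rhob': [2.40, 2.41],
        'p_impedance': [7000.0, 7010.0],
        's_impedance': [4000.0, 4005.0],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'MyTeam_Intermediate_Results_20210416.json.gz')

    # Same export settings as the submission snippet, but to a local file.
    result.to_json(path_or_buf=path, double_precision=4, compression='gzip')

    # Read the file back the same way the blank file was loaded.
    restored = pd.read_json(path_or_buf=path, compression='gzip')

same_shape = restored.shape == result.shape
no_missing = not restored.isna().any().any()
```

If either `same_shape` or `no_missing` is `False`, the exported file would likely be rejected under the rules stated earlier.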