<h1>Autobuild IC database - NPD</h1><br>
Auto-populate a blank SQL Server IC database with data from the <a href="https://factpages.npd.no/factpages/Default.aspx?culture=en" target="_blank">NPD FactPages</a>.

<b>Part 1. Downloads the following data types, reformats for IC and exports to .csv.</b>

Exploration well headers<br>
Development well headers<br>
Cores<br>
Core photos (also saves .jpg files to disc)<br>
Thin sections<br>
CO2<br>
Oil samples<br>
Lithostratigraphy<br>
Drill stem tests<br>
Casing and leak-off tests<br>
Drilling mud<br>
Documents<br>
Shapefiles (download from <a href="https://www.npd.no/en/about-us/information-services/available-data/map-services/" target="_blank">NPD Map Services</a> and unzip)<br>

<b>Part 2. Connects to and pushes data to the database.</b><br>
Currently works only for Well Headers, References and Lithostrat.<br>
Creates dynamic IC projects, well queries, and builds text dictionaries from Lithostrat.<br>

In [None]:
# Part 1. Download the following data types from the NPD, reformat for IC and export .csv files.

# Exploration well headers
# Development well headers
# Cores
# Core photos (also saves .jpg files to disc)
# Thin sections
# CO2
# Oil samples
# Lithostratigraphy
# Drill stem tests
# Casing and leak-off tests
# Drilling mud
# Documents
# Shapefiles (download and unzip)
# See https://factpages.npd.no/factpages/Default.aspx?culture=en

# Part 2. Connect to and push data to SQL Server database.
# Also creates dynamic IC projects, well queries, builds text dictionaries from Lithostrat.

# This notebook is missing the section that creates text dictionaries from Lithostrat (SYMBOLSDICTIONARY)
# Writing files to C:\Alan Python\IC_wellheaders

In [1]:
# Uncomment your chosen data source -
    # web: downloads data live using permanent NPD links
    # file: if you have manually downloaded data in Excel format and saved to 'input data' folder 
# Use file for testing purposes

#source = 'web'
source = 'file'

<h2>Part 1. Download NPD data, reformat for IC and output to .csv</h2>
    
<h3>Well Headers</h3>

In [2]:
import pandas as pd
import numpy as np
from pandas import ExcelFile
from pandas import ExcelWriter
import urllib.request
import requests, zipfile, io

#Change Pandas display settings to show all columns
pd.set_option('display.max_columns', None)  
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)
#pd.set_option('display.max_rows', 500)

In [4]:
# Download the latest NPD well headers in Excel format
# Navigate to NPD Factpages > Wellbore > Table View > Exploration/Development > All - Long List> Export Excel.
# Assign to two dataframes, one for Exploration wells and one for Development wells


if source == 'web':
    df_explo_web = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_exploration_all&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.169&CultureCode=en', 
                             sheet_name='wellbore_exploration_all')
    df_dev_web = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_development_all&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.169&CultureCode=en', 
                           sheet_name='wellbore_development_all')

if source == 'file':
    # Navigate to NPD Factpages > Wellbore > Table View > Exploration/Development > All - Long List> Export Excel.
    df_explo_file = pd.read_excel('input data/wellbore_exploration_all.xlsx', 
                             sheet_name='wellbore_exploration_all')
    df_dev_file = pd.read_excel('input data/wellbore_development_all.xlsx', 
                           sheet_name='wellbore_development_all')

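# Use whichever pair of dataframes was loaded above
if source == 'web':
    df_explo, df_dev = df_explo_web, df_dev_web
elif source == 'file':
    df_explo, df_dev = df_explo_file, df_dev_file

# Print the original column titles in each dataframe.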
print("\nExploration well header column titles:")
print(list(df_explo.columns))
print("\nDevelopment well header column titles:")
print(list(df_dev.columns))

# This will take a minute to download and process the two files. Ignore file size warnings.


Exploration well header column titles:
['Wellbore name', 'Well name', 'Drilling operator', 'Drilled in production licence', 'Purpose', 'Status', 'Content', 'Type', 'Subsea', 'Entered date', 'Completed date', 'Field', 'Drill permit', 'Discovery', 'Discovery wellbore', 'Bottom hole temperature [°C]', 'Sitesurvey', 'Seismic location', 'Maximum inclination [°]', 'Kelly bushing elevation [m]', 'Final vertical depth (TVD) [m RKB]', 'Total depth (MD) [m RKB]', 'Water depth [m]', 'Kick off  point [m RKB]', 'Oldest penetrated age', 'Oldest penetrated formation', 'Main area', 'Drilling facility', 'Drilling facility type', 'Drilling facility category', 'Licensing activity awarded in', 'Multilateral', 'Purpose - planned', 'Entry year', 'Completed year', 'Reclassified from/to wellbore', 'Reentry activity', 'Plot symbol', '1st level with HC, formation', '1st level with HC, age', '2nd level with HC, formation', '2nd level with HC, age', '3rd level with HC, formation', '3rd level with HC, age', 'Dril

In [None]:
(num_explo_rows, num_explo_cols) = df_explo.shape
(num_dev_rows, num_dev_cols) = df_dev.shape
print('{} rows and {} columns in Exploration wells.'.format(num_explo_rows, num_explo_cols))
print('{} rows and {} columns in Development wells.'.format(num_dev_rows, num_dev_cols))

In [None]:
print('The first 3 rows of Exploration wells:')
df_explo.head(n=3)

In [None]:
print('The first 3 rows of Development wells:')
df_dev.head(n=3)

<h4>QC - Which column headers are unique to Exploration or Development wells?</h4>

In [None]:
explo_columns = df_explo.columns.tolist()
dev_columns = df_dev.columns.tolist()

# List well headers unique to each dataframe
print('Attributes unique to Exploration wells:\n', sorted(set(explo_columns) - set(dev_columns)))
print('\nAttributes unique to Development wells:\n', sorted(set(dev_columns) - set(explo_columns)))
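
The same comparison can be tried on toy data. This is a minimal sketch, assuming two small hypothetical dataframes (`toy_explo` and `toy_dev`) whose column names are borrowed from the NPD exports and whose values are placeholders. It shows how set difference and intersection separate the unique columns from the shared ones.

```python
import pandas as pd

# Toy frames standing in for df_explo and df_dev (placeholder values)
toy_explo = pd.DataFrame({'Wellbore name': ['A'], 'Field': ['X'], 'Drilling days': [10]})
toy_dev = pd.DataFrame({'Wellbore name': ['B'], 'Field': ['Y'], 'Production facility': ['P']})

# Columns found in only one frame, and columns found in both
only_explo = sorted(set(toy_explo.columns) - set(toy_dev.columns))
only_dev = sorted(set(toy_dev.columns) - set(toy_explo.columns))
shared = sorted(set(toy_explo.columns) & set(toy_dev.columns))

print('Only in exploration:', only_explo)
print('Only in development:', only_dev)
print('Shared:', shared)
```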

<h4>Rename attributes for IC</h4>

In [None]:
# These are IC's default well header attributes (when matching columns in Import Well Header File)
# Try to use as many of these as possible when renaming below.
# Any other columns will need to be added to IC as Well Attributes.

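# Note: the curly braces below create a set, not a dict (there are no key: value pairs).
# A set is used so that set difference can identify the non-default attributes later.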

ic_default_attributes = {'Name', 'Code', 'Alternate 1', 'Alternate 2', 'API number', 'UWI number', 'Comment', 'Geodatum', 
                         'Longitude', 'Latitude', 'Grid system', 'Surface X', 'Surface Y', 'Elevation Reference',
                         'Elevation', 'KBE', 'RTE', 'DFE', 'GLE', 'SPUD date', 'Completion date', 'Status', 
                         'Quadrant', 'Block', 'Sub block', 'Field', 'Location', 'Operator', 'Country',
                         'Basin', 'Province', 'County', 'State', 'Section', 'Township', 'Range', 'Terminal depth',
                         'Water depth', 'Facility', 'Discovery name', 'Seismic line', 'Intent', 'Licence number'}

# Rename columns from/to. 
# Check spelling and capitalisation carefully when renaming to match IC's default attributes.

attributes_to_rename = {'Wellbore name' : 'Name',
                        'Well name' : 'Alternate 1',
                        'Drilling operator' : 'Operator',
                        'Drilled in production licence' : 'Licence number',
                        'Purpose' : 'Intent',
                        'Purpose - planned' : 'Intent - planned',
                        'Status' : 'Well status',
                        'Content' : 'Well content',
                        'Entered date' : 'SPUD date',
                        'Completed date' : 'Completion date',
                        'Discovery' : 'Discovery name',
                        'Seismic location' : 'Seismic line',
                        'Kelly bushing elevation [m]' : 'KBE',
                        'Total depth (MD) [m RKB]' : 'Terminal depth',
                        'Water depth [m]' : 'Water depth',
                        'Kick off  point [m RKB]' : 'Kick off point [m RKB]', #remove extra space
                        'Main area' : 'Location',
                        'Drilling facility' : 'Facility',
                        '1st level with HC, formation' : '1st level with HC formation', #remove commas to be csv friendly
                        '1st level with HC, age' : '1st level with HC age',
                        '2nd level with HC, formation' : '2nd level with HC formation',
                        '2nd level with HC, age' : '2nd level with HC age',
                        '3rd level with HC, formation' : '3rd level with HC formation',
                        '3rd level with HC, age' : '3rd level with HC age',
                        'Geodetic datum' : 'Geodatum',
                        'NS decimal degrees' : 'Latitude',
                        'EW decimal degrees' : 'Longitude',
                        'NS UTM [m]' : 'Surface Y',
                        'EW UTM [m]' : 'Surface X',
                        'Wellbore name, part 1' : 'Quadrant',
                        'Wellbore name, part 2' : 'Block', 
                        'Pressrelease url' : 'Press Release URL',
                        'FactPage url' : 'FactPage URL',
                        'Factmaps' : 'FactMaps URL'}

# Apply renaming to each of the dataframes
df_explo.rename(columns=attributes_to_rename, inplace=True)
df_dev.rename(columns=attributes_to_rename, inplace=True)

# QC only renamed columns
print("Renamed attributes only:")
renamed_columns = list(attributes_to_rename.values())
df_explo[renamed_columns].head(n=3)
#df_dev[renamed_columns].head(n=3)
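
`DataFrame.rename` ignores keys that do not match a column, so a misspelt or absent source column passes silently. The sketch below is one possible pre-check, not part of the original workflow; `missing_rename_keys`, `toy` and `toy_map` are hypothetical names used only for illustration.

```python
import pandas as pd

def missing_rename_keys(df, rename_map):
    """Return the rename keys that are not columns of df, sorted."""
    return sorted(set(rename_map) - set(df.columns))

# Toy frame with two of the three columns named in the mapping
toy = pd.DataFrame({'Wellbore name': ['1/2-1'], 'Main area': ['NORTH SEA']})
toy_map = {'Wellbore name': 'Name',
           'Main area': 'Location',
           'Pressrelease url': 'Press Release URL'}

print('Keys with no matching column:', missing_rename_keys(toy, toy_map))

# rename() skips the unmatched key without raising an error
renamed = toy.rename(columns=toy_map)
print(list(renamed.columns))
```

Running a check like this on both dataframes before renaming would show which attributes exist in only one of them.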

<h4>Delete some attributes we don't need in IC</h4>

In [None]:
# Coordinates are repeated elsewhere so we can delete the component parts from the dataframes.
# And we've renamed Wellbore name parts 1 and 2 to Quadrant and Block, and do not need the other parts.

attributes_to_drop = ['Plot symbol', 'NS degrees', 'NS minutes', 'NS seconds', 'NS code', 'EW degrees', 'EW minutes', 'EW seconds', 'EW code', 
                      'Wellbore name, part 3', 'Wellbore name, part 4', 'Wellbore name, part 5', 'Wellbore name, part 6']

df_explo.drop(attributes_to_drop, axis=1, inplace=True)
df_dev.drop(attributes_to_drop, axis=1, inplace=True)

print('Prove we still have well names and coordinates:')
df_explo[['Name', 'Latitude', 'Longitude', 'Surface Y', 'Surface X']].head(n=3)

<h4>Truncate well list based on column and value(s)</h4>

In [None]:
# Enter the column and values you want to return, e.g. Location: BARENTS SEA, or Quadrant: 6204, 6205.
fltr_column = 'Location'

# List the names you want to *KEEP*!
fltr_value = ['NORTH SEA', 'NORWEGIAN SEA', 'BARENTS SEA']

# Apply the filter to the dataframes
indexNames = df_explo[~df_explo[fltr_column].isin(fltr_value)].index
df_explo.drop(indexNames , inplace=True)

indexNames = df_dev[~df_dev[fltr_column].isin(fltr_value)].index
df_dev.drop(indexNames , inplace=True)

# Get dataframe shape and unpack tuples
(exploRows, exploCols) = df_explo.shape
(devRows, devCols) = df_dev.shape

# Print out the results
print("After filtering on {}: {}, you are left with:\n {} rows for Exploration wells, and {} rows for Development wells."
      .format(fltr_column, fltr_value, exploRows, devRows))
print('The first and last rows are:')

# Print the first and last rows of the Exploration dataframe to check that the filter has worked
df_explo.iloc[[0, -1]]
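
An alternative to dropping rows by index is to keep rows with a boolean mask. This is a minimal sketch on a hypothetical toy dataframe, using the same `isin` test as above; the well names and the 'ELSEWHERE' value are placeholders.

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['W1', 'W2', 'W3'],
    'Location': ['NORTH SEA', 'ELSEWHERE', 'BARENTS SEA'],
})
keep_values = ['NORTH SEA', 'NORWEGIAN SEA', 'BARENTS SEA']

# Keep rows whose Location is in the list.
# .copy() gives an independent frame, which avoids chained-assignment warnings later.
filtered = toy[toy['Location'].isin(keep_values)].copy()
print(filtered['Name'].tolist())
```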

<h4>CREATE FILES - create Reference files for IC containing URLs for Exploration and Development wells</h4>

In [None]:
# Converts three URL columns into three rows. Adds a Title column and sorts by Well and Title.
df_explo_references = df_explo[['Name', 'Press Release URL', 'FactPage URL', 'FactMaps URL']]
df_explo_references = pd.melt(df_explo_references, id_vars='Name', value_vars=['Press Release URL', 'FactPage URL', 'FactMaps URL'], var_name='Title', value_name='URL')
df_explo_references.sort_values(['Name', 'Title'], inplace=True)

# Remove empty rows, specifically where no 'Press Release URL' for Exploration references
df_explo_references['URL'] = df_explo_references['URL'].replace(' ', np.nan)
df_explo_references.dropna(subset=['URL'], inplace=True)

# Name and create file for Exploration wells
explo_ref_filename = 'output data/IC_explo_references.csv'
df_explo_references.to_csv(explo_ref_filename, index=False)
print('Created file:', explo_ref_filename)
df_explo_references.head(n=6)
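
The wide-to-long reshaping above can be seen more easily on a toy example. This sketch assumes a hypothetical two-well dataframe with placeholder example.com URLs, where a single space marks a missing press release.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Name': ['W1', 'W2'],
    'FactPage URL': ['http://example.com/w1', 'http://example.com/w2'],
    'Press Release URL': [' ', 'http://example.com/w2-press'],
})

# One row per (well, URL type) instead of one column per URL type
refs = pd.melt(toy, id_vars='Name',
               value_vars=['FactPage URL', 'Press Release URL'],
               var_name='Title', value_name='URL')

# Treat the single-space placeholder as missing, then drop those rows
refs['URL'] = refs['URL'].replace(' ', np.nan)
refs = refs.dropna(subset=['URL']).sort_values(['Name', 'Title']).reset_index(drop=True)
print(refs)
```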

In [None]:
# As above, but creates 'Reference' file for Development Wells (minus the Press Release URL)
df_dev_references = df_dev[['Name', 'FactPage URL', 'FactMaps URL']]
df_dev_references = pd.melt(df_dev_references, id_vars='Name', 
                            value_vars=['FactPage URL', 'FactMaps URL'], 
                            var_name='Title', value_name='URL')
df_dev_references.sort_values(['Name', 'Title'], inplace=True)

# Name and create file for Development wells
dev_ref_filename = 'output data/IC_dev_references.csv'
df_dev_references.to_csv(dev_ref_filename, index=False)
print('Created file:', dev_ref_filename)
df_dev_references.head(n=4)

In [None]:
# Drop URL attributes
# Now that we've output the URLs to separate files, we no longer need them in the Exploration and Development dataframes.
df_explo.drop(['Press Release URL', 'FactPage URL', 'FactMaps URL'], axis=1, inplace=True)
df_dev.drop(['FactPage URL', 'FactMaps URL'], axis=1, inplace=True)

df_explo.head(n=3)

In [None]:
# Add new column(s) and assign constant value, e.g. Country: NORWAY.
df_explo['Country'] = 'NORWAY' 
df_dev['Country'] = 'NORWAY'

# IC Version 4.3.1 and earlier only. Fixed in 4.3.2.
# First let's rename an extraordinarily long string in column 'Seismic line' to avoid an error in IC.
#df_explo['Seismic line'] = df_explo['Seismic line'].replace('TUN15M01 3D bin datasett: Inline reference: 12688 Croslline reference: between 12383 and 12384', 'TUN15M01 3D bin: Inline 12688 Crossline 12383-12384')

# Remove decimal places introduced to the 'NPDID' columns
df_explo['NPDID discovery'] = df_explo['NPDID discovery'].fillna(0).astype(int)
df_dev['NPDID discovery'] = df_dev['NPDID discovery'].fillna(0).astype(int)

df_explo['NPDID drilling facility'] = df_explo['NPDID drilling facility'].fillna(0).astype(int)
df_dev['NPDID drilling facility'] = df_dev['NPDID drilling facility'].fillna(0).astype(int)

df_explo['NPDID field'] = df_explo['NPDID field'].fillna(0).astype(int)
df_dev['NPDID field'] = df_dev['NPDID field'].fillna(0).astype(int)

# Copy data from one column to another, preserving the original.
df_explo['UWI number'] = df_explo['NPDID wellbore']
df_dev['UWI number'] = df_dev['NPDID wellbore']

# Check the result
df_explo[['Name', 'Country', 'NPDID drilling facility', 'NPDID wellbore']].head(n=3)
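
Filling missing IDs with 0 removes the decimals but also makes a missing ID look like a real value. If that matters, pandas' nullable integer dtype is one alternative. The sketch below compares the two on a toy column; the ID values are placeholders, and whether IC prefers a blank or a 0 is an assumption to check before switching.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'NPDID field': [43506.0, np.nan, 1028599.0]})

# Approach used above: missing IDs become 0
as_zero = toy['NPDID field'].fillna(0).astype(int)

# Alternative: the nullable 'Int64' dtype keeps missing IDs as <NA>
as_nullable = toy['NPDID field'].astype('Int64')

print(as_zero.tolist())
print(as_nullable.tolist())
```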

<h4>Concatenate Well Status & Well Content to match IC's Well Symbols dictionary</h4>

In [None]:
# This cell creates a new column called 'Status', combining 'Well status' and 'Well content'
# These values should match IC's Well Symbols graphic dictionary entries, e.g. "P & A Oil Shows"

# Change 'P&A' to 'P & A'.
df_explo['Well status'] = df_explo['Well status'].replace(to_replace='P&A', value='P & A')
# First letter of each word capitalised
df_explo['Status'] = df_explo['Well status'].str.title() + ' ' + df_explo['Well content'].str.title()

# As above but for Development wells
df_dev['Well status'] = df_dev['Well status'].replace(to_replace='P&A', value='P & A')
df_dev['Status'] = df_dev['Well status'].str.title() + ' ' + df_dev['Well content'].str.title()

# Replace a few other things to help with matching
df_explo = df_explo.replace({'Status' : { ' Not Available' : '', ' Not Applicable' : '', '/' : ' ', 'Oil Gas ' : 'Oil & Gas '}}, regex=True)
df_dev = df_dev.replace({'Status' :     { ' Not Available' : '', ' Not Applicable' : '', '/' : ' ', 'Oil Gas ' : 'Oil & Gas '}}, regex=True)

# Rename Status to status? (links to symbol_id??? e.g. 22)

# Check the results
df_explo[['Name', 'Status']].head(n=10)
#df_dev[['Name', 'Status']].tail(n=10)
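
The steps above can be traced on a few toy rows. This is a minimal sketch with placeholder status and content values (real NPD values may differ), using literal string replacement rather than the regex dictionary used above.

```python
import pandas as pd

toy = pd.DataFrame({
    'Well status': ['P&A', 'SUSPENDED', 'P&A'],
    'Well content': ['OIL SHOWS', 'NOT APPLICABLE', 'OIL/GAS'],
})

# Space out 'P&A', then capitalise the first letter of each word
status = toy['Well status'].replace('P&A', 'P & A').str.title()
content = toy['Well content'].str.title()
toy['Status'] = status + ' ' + content

# Remove the 'not applicable/available' suffixes and replace slashes with spaces
toy['Status'] = (toy['Status']
                 .str.replace(' Not Applicable', '', regex=False)
                 .str.replace(' Not Available', '', regex=False)
                 .str.replace('/', ' ', regex=False))
print(toy['Status'].tolist())
```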

In [None]:
# List all unique entries under Status for all wells.
# In IC, open Database > Graphic Dictionaries > Well Symbols, and ensure you have dictionary entries for each.

lst_explo_status = sorted(set(df_explo['Status'].astype(str)))
lst_dev_status = sorted(set(df_dev['Status'].astype(str)))

lst_all_status = lst_explo_status + lst_dev_status

lst_unique_status = sorted(set(lst_all_status))

print("{} unique status values to include in IC 'Well Symbols' graphic dictionary:".format(len(lst_unique_status)))
print('')
print(', '.join(lst_unique_status))

<h4>Concatenate cells to create 'Grid system' in IC format</h4>

In [None]:
# At time of writing, there are several problems with 'Geodatum' in the NPD datasets, including:
#  - trailing spaces ('ED50 ') in all Explo wells
#  - erroneous '56ED50', '60ED50' and '61ED50' values in Dev wells
#  - missing 'ED50' values in two explo wells
# Luckily, we can just force 'ED50' on all these cells!

df_explo['Geodatum'] = 'ED50'
df_dev['Geodatum'] = 'ED50'

# Concatenate cells to create a new column 'Grid system' in IC format (e.g. "ED50 / UTM zone 31N")

df_explo['Grid system'] = df_explo['Geodatum'] + ' / ' + 'UTM zone ' + df_explo['UTM zone'].map(str) + 'N'
df_dev['Grid system'] = df_dev['Geodatum'] + ' / ' + 'UTM zone ' + df_dev['UTM zone'].map(str) + 'N'

print('Geodatum and Grid systems for IC:')
df_explo[['Name', 'Geodatum', 'Grid system']].head()
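
If pandas happens to read 'UTM zone' as a float column, `map(str)` would give '31.0' rather than '31'. Whether that occurs with the NPD files is an assumption to check with `df_explo['UTM zone'].dtype`. The sketch below shows one way to guard against it on a toy dataframe.

```python
import pandas as pd

toy = pd.DataFrame({'Name': ['W1', 'W2'], 'UTM zone': [31.0, 34.0]})

# Cast to int first so a float zone such as 31.0 does not become '31.0'
zone = toy['UTM zone'].astype(int).astype(str)
toy['Grid system'] = 'ED50 / UTM zone ' + zone + 'N'
print(toy['Grid system'].tolist())
```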

<h4>QC - check and re-order well headers</h4>

In [None]:
# Print out attributes lists, reflecting all the changes above.
# Use these lists to check the current order of your columns in each, and consider how you might like to re-order them.
# Any columns created above (including: Country, Status, Grid system) currently appear at the end of the lists.

print('--- BEFORE RE-ORDERING ---\n')
print(len(df_explo.columns), 'Exploration attributes:\n', list(df_explo.columns), '\n')
print(len(df_dev.columns), 'Development attributes:\n', list(df_dev.columns))

<h4>Re-order all columns (OPTIONAL)</h4>

In [None]:
# # Specifies the order of columns for Exploration and Development wells in the final outputs.
# # It's not compulsory to re-order columns, as IC lists all non-default attributes alphabetically.

# explo_order = ['Name', 'Alternate 1', 'UWI number', 'Quadrant', 'Block', 'Operator', 'Licence number', 'Intent', 
#                 'Intent - planned', 'Well status', 'Well content', 'Status', 'Type', 'Subsea', 'SPUD date', 
#                 'Completion date', 'Field', 'Drill permit', 'Discovery name', 'Discovery wellbore', 
#                 'Bottom hole temperature [°C]', 'Seismic line', 'Maximum inclination [°]', 'KBE', 
#                 'Final vertical depth (TVD) [m RKB]', 'Terminal depth', 'Water depth', 'Kick off point [m RKB]', 
#                 'Oldest penetrated age', 'Oldest penetrated formation', 'Location', 'Country', 'Facility', 
#                 'Drilling facility type', 'Drilling facility category', 'Licensing activity awarded in', 
#                 'Multilateral', 'Entry year', 'Completed year', 'Reclassified from/to wellbore', 'Reentry activity', 
#                 '1st level with HC formation', '1st level with HC age', '2nd level with HC formation', 
#                 '2nd level with HC age', '3rd level with HC formation', '3rd level with HC age', 'Drilling days', 
#                 'Reentry', 'Geodatum', 'Latitude', 'Longitude', 'Surface X', 'Surface Y', 'UTM zone', 'Grid system', 
#                 'DISKOS Well Type', 'DISKOS Wellbore Parent', 
#                 'Publication date', 'Release date', 'NPDID wellbore', 'NPDID discovery', 'NPDID field', 
#                 'NPDID drilling facility', 'NPDID wellbore reclassified from', 'NPDID production licence drilled in', 
#                 'Date main level updated', 'Date all updated', 'Date sync NPD']

# dev_order = ['Name', 'Alternate 1', 'UWI number', 'Quadrant', 'Block', 'Operator', 'Licence number', 'Intent', 
#               'Intent - planned', 'Well status', 'Well content',  'Status', 'Content - planned', 'Type', 'Subsea',
#               'SPUD date', 'Completion date', 'Field', 'Predrilled entry date','Predrilled completion date', 
#               'Drill permit', 'Discovery name', 'Discovery wellbore', 'KBE', 'Final vertical depth (TVD) [m RKB]',
#               'Terminal depth', 'Water depth', 'Kick off point [m RKB]', 'Location', 'Country', 'Facility', 
#               'Drilling facility type', 'Drilling facility category', 'Licensing activity awarded in', 
#               'Production facility', 'Multilateral', 'Entry year', 'Completed year','Reclassified from/to wellbore', 
#               'Geodatum', 'Latitude', 'Longitude', 'Surface Y', 'Surface X', 'UTM zone',  'Grid system', 
#               'DISKOS Well Type', 'DISKOS Wellbore Parent', 'NPDID wellbore', 
#               'NPDID discovery', 'NPDID field', 'Publication date', 'Release date', 'NPDID production licence drilled in', 
#               'NPDID drilling facility', 'NPDID production facility','NPDID wellbore reclassified from', 
#               'Date main level updated', 'Date all updated', 'Date sync NPD']

In [None]:
# # Check if your list of re-ordered attributes is complete.
# missing_explo = set(df_explo.columns).difference(explo_order)
# missing_dev = set(df_dev.columns).difference(dev_order)

# if len(missing_explo) > 0:
#     print('Your re-ordered list of Exploration attributes is incomplete. You must include:\n {}.\n'.format(missing_explo))
# else:
#     print('Your re-ordered list of Exploration attributes is complete.\n')
    
# if len(missing_dev) > 0:
#     print('Your re-ordered list of Development attributes is incomplete. You must include:\n {}.'.format(missing_dev))
# else:
#     print('Your re-ordered list of Development attributes is complete.')

In [None]:
# # Only when your re-ordered lists of Exploration and Development attributes are complete should you run this cell,
# # Otherwise these will not be included in the output file!
# # Applies the re-ordering to the dataframes

# df_explo = df_explo.reindex(columns=explo_order)
# df_dev = df_dev.reindex(columns=dev_order)

<h4>QC column values</h4>

In [None]:
# Print out all unique values for selected attributes (example: Operator and Field)

def lstheaderfields (*args):
    for arg in args:
        print('---' , arg, '---')
        print('')
        words = [x for x in df_explo[arg].unique()]
        print('Exploration wells:')
        print(words)
        print('')
        words = [x for x in df_dev[arg].unique()]
        print('Development wells:')
        print(words)
        print("")
        
# Enter the names of columns you would like to check
lstheaderfields('Operator', 'Field')

<h4>CREATE FILES - create well headers for exploration and development wells</h4>

In [None]:
# Outputs CSV files for Exploration and Development well headers.
# Note: the Development file is written whole here. If it contains more than 3000 wells,
# consider splitting it in two for easier handling in IC.

# Output filenames
file_explo_all = 'output data/IC_wellbore_exploration_all.csv'
file_dev_all = 'output data/IC_wellbore_development_all.csv'

df_explo.to_csv(file_explo_all, encoding='utf-8-sig', index=False)
print('Created file: {}, which includes {} wells from {} to {}.'.format(file_explo_all, len(df_explo), 
                                                                        df_explo['Name'][df_explo.index[0]], df_explo['Name'][df_explo.index[-1]]))

df_dev.to_csv(file_dev_all, encoding='utf-8-sig', index=False)
print('Created file: {}, which includes {} wells from {} to {}.'.format(file_dev_all, len(df_dev), 
                                                                        df_dev['Name'][df_dev.index[0]], df_dev['Name'][df_dev.index[-1]]))
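
The cell above writes each dataframe to a single file. If a large Development file needs splitting, the sketch below shows one way to do it. `split_frame` is a hypothetical helper, and the chunk size of 3 is only for the toy example; the comment above suggests about 3000 wells.

```python
import pandas as pd

def split_frame(df, max_rows):
    """Return a list of consecutive slices of df, each with at most max_rows rows."""
    return [df.iloc[start:start + max_rows] for start in range(0, len(df), max_rows)]

toy = pd.DataFrame({'Name': ['W{}'.format(i) for i in range(7)]})
parts = split_frame(toy, max_rows=3)

for number, part in enumerate(parts, start=1):
    # A real run might call part.to_csv(...) here; this sketch only reports the sizes
    print('Part {}: {} wells, {} to {}'.format(
        number, len(part), part['Name'].iloc[0], part['Name'].iloc[-1]))
```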

<h4>Well Attributes to create in IC</h4>

In [None]:
# The following attributes are not IC defaults and need to be created under Wells > Attributes.
# Alternatively, use the SQL code produced in the next cell to create these rows in SSMS. 

# Find the full list of attributes after all the editing you've done above
all_attributes = set(list(df_explo.columns) + list(df_dev.columns))

# Find and count those attributes you'll need to create in IC
non_default_attributes = list(set(all_attributes).difference(ic_default_attributes))
non_default_attributes.sort()
num_non_default_attributes = len(non_default_attributes)

print('The following {} attributes are not IC defaults and must be added to IC:\n'.format(num_non_default_attributes))
print(list(non_default_attributes))

In [None]:
# If you have database administration privileges, you can use this cell to generate the SQL Query code that will create Well Attributes in IC in the format:
    #INSERT INTO t_WellsUserFields (f_FieldId, f_FieldName, f_IsInputUsed, f_InputID, f_Description, f_Origin, f_SortOrder)
    #VALUES (1, 'Attribute', 'False', 0, 'Description of attribute', 0, 1);
# This assumes you have not yet created any Well Attributes in IC. If you have already, you'll need to tweak the 3 variables below.

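pk_index = 0 #Enter your highest existing f_FieldId (0 if none); it is incremented before each row is printed
original_pk_index = 0 #Enter the same number as above (this one we won't change)
f_sortorder = 0 #Enter your highest existing f_SortOrder (0 if none); it is also incremented before use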

print("INSERT INTO t_WellsUserFields")
print("  (f_FieldId, f_FieldName, f_IsInputUsed, f_InputID, f_Description, f_Origin, f_SortOrder)")
print("VALUES")

for i in non_default_attributes:
    pk_index += 1
    f_sortorder += 1
    if pk_index < (num_non_default_attributes + original_pk_index):
        print("  ({x}, '{y}', 'False', 0, 'Userfield {y}', 0, {z}),".format(x = pk_index, y = i, z = f_sortorder))
    else:
        print("  ({x}, '{y}', 'False', 0, 'Userfield {y}', 0, {z});".format(x = pk_index, y = i, z = f_sortorder))

# Follow these steps:
# 1. Open your IC database in SQL Server Management Studio. IC must be closed/computer restarted to open a LocalDB in SSMS.
# 2. Expand 'Tables', scroll down to 't_WellsUserFields' and right-click 'Edit Top 200 Rows'.
# 3. Press Ctrl+N to create a new query, copy and paste the following SQL code to the blank query and hit F5.
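
The same statement can also be built as a single string, which is easier to save or inspect than printed lines. This is a sketch only: `build_insert_sql` is a hypothetical helper, the table and column names are copied from the cell above, and "Driller's note" is a made-up attribute included to show single-quote escaping.

```python
def build_insert_sql(attributes, last_field_id=0, last_sort_order=0):
    """Build one multi-row INSERT for t_WellsUserFields from attribute names."""
    rows = []
    for offset, name in enumerate(attributes, start=1):
        safe = name.replace("'", "''")  # escape single quotes for SQL string literals
        rows.append("  ({}, '{}', 'False', 0, 'Userfield {}', 0, {})".format(
            last_field_id + offset, safe, safe, last_sort_order + offset))
    header = ("INSERT INTO t_WellsUserFields\n"
              "  (f_FieldId, f_FieldName, f_IsInputUsed, f_InputID, "
              "f_Description, f_Origin, f_SortOrder)\n"
              "VALUES\n")
    return header + ",\n".join(rows) + ";"

# Example: five attributes already exist, so numbering continues from 6
sql = build_insert_sql(['Drill permit', "Driller's note"],
                       last_field_id=5, last_sort_order=5)
print(sql)
```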

<h4>Ensure the correct co-ordinate systems are added to your IC project</h4>

In [None]:
# In IC, open Project > Properties > Coords > Coordinate Systems
# Ensure each of the following co-ordinate systems is installed **before importing well headers**

lstfield = sorted(set(df_explo['Geodatum'].astype(str)))
print('Geodatum:', ', '.join(lstfield))

lstfield = sorted(set(df_explo['Grid system'].astype(str)))
print('Grid systems:', ', '.join(lstfield))

<h4>Import the data to IC</h4>

In [None]:
# Before importing data to IC, ensure you follow the last few steps to:
# - Create the appropriate Well Attributes in your IC Database.
# - Add the correct coordinate systems to your IC Project.

# Import reference files via Import > Well References
# Import well headers via Import > Headers.

# Note that, while the well header data imports very quickly, 
# IC is a bit slow to create the wells if they don't already exist. Patience!

In [None]:
print(df_explo.shape)
print(df_dev.shape)

<h3>Core (Cored Intervals)</h3>

In [None]:
df_core = pd.read_excel('input data/wellbore_core.xlsx', sheet_name='wellbore_core')
df_core.head()

In [None]:
df_core['Core sample depth - uom'].unique()

In [None]:
# Convert top depths given in feet to metres; leave metre values unchanged
is_feet = df_core['Core sample depth - uom'] == '[ft  ]'
df_core['Top depth'] = np.where(is_feet,
                                df_core['Core sample - top depth'] * 0.3048,
                                df_core['Core sample - top depth'])

In [None]:
# Convert base depths given in feet to metres; leave metre values unchanged
is_feet = df_core['Core sample depth - uom'] == '[ft  ]'
df_core['Base depth'] = np.where(is_feet,
                                 df_core['Core sample -  bottom depth'] * 0.3048,
                                 df_core['Core sample -  bottom depth'])

In [None]:
df_core.head()

In [None]:
df_core.dtypes
#Note extra space in 'Core sample -  bottom depth'

In [None]:
#df_core['Thickness'] = (df_core['Base depth'] - df_core['Top depth'])

#filt = (df_core['Wellbore'] == '1/2-1') | (df_core['Wellbore'] == '1/3-1') | (df_core['Wellbore'] == '31/2-17 A') 

# df_core[['Wellbore', 'Core sample number', 'Core sample - top depth', 'Core sample -  bottom depth', 
#           'Total core sample length [m]', 'Top depth', 'Base depth']].loc[filt].round(2)

df_core = df_core[['Wellbore', 'Top depth', 'Base depth', 'Core sample number']].round(2)
df_core

In [None]:
# Rename columns
rename_cols = {'Wellbore' : 'Well',
               'Core sample number': 'Legend'
               }
    
# Apply renaming
df_core.rename(columns=rename_cols, inplace=True)
df_core.head()

In [None]:
# df_core.drop(labels=['Core sample - top depth',
#                       'Core sample -  bottom depth',
#                       'Core sample depth - uom',
#                       'NPDID wellbore', 
#                       'Date updated', 
#                       'Date sync NPD'], axis=1, inplace=True)
# df_core.head()

In [None]:
# Output df_core to file
file = 'output data/wellbore_core.csv'
df_core.to_csv(file, index=False)

<h3>Core Photos</h3>

In [None]:
#df_core_photo = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_core_photo&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.129.189&CultureCode=en', sheet_name='wellbore_core_photo')

df_core_photo = pd.read_excel('input data/wellbore_core_photo.xlsx', sheet_name='wellbore_core_photo')

df_core_photo.head()

In [None]:
# See https://pythex.org/

# Match patterns:
# 10208-10228ft
# 1802-1805m

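# Raw string so that \d and \D are passed to the regex engine unchanged
pat = r'\d{3,5}-\d{3,5}\D{1,2}'
    
#filt = df_core_photo['Core photo title'].str.extract(pat)
# na=False treats missing titles as non-matching, so the result is a clean boolean mask
filt = df_core_photo['Core photo title'].str.contains(pat, na=False)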

# Check rows that match pattern
df_core_photo[filt].head()

In [None]:
# Check rows that do not match pattern and make corrections
df_core_photo[~filt]
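
As an alternative to testing with `str.contains` and splitting later, the depths and unit can be captured in one step with `str.extract`. The sketch below is a stricter variant under stated assumptions: it anchors the pattern to the whole title and accepts only 'm' or 'ft' (any case) as the unit, so titles with extra text would not match. The sample titles are the two patterns listed above plus placeholder non-matching values.

```python
import re
import pandas as pd

titles = pd.Series(['10208-10228ft', '1802-1805m', '2044', 'Core 2', '2482-2483M'])

# Named groups become the column names of the result; non-matching rows are NaN
pattern = r'^(?P<top>\d{3,5})-(?P<base>\d{3,5})(?P<unit>m|ft)$'
parts = titles.str.extract(pattern, flags=re.IGNORECASE)
parts['unit'] = parts['unit'].str.lower()

# Convert to metres: factor is 1.0 for 'm', 0.3048 for 'ft', NaN where nothing matched
factor = parts['unit'].map({'m': 1.0, 'ft': 0.3048})
parts['top_m'] = pd.to_numeric(parts['top']) * factor
parts['base_m'] = pd.to_numeric(parts['base']) * factor

ok = parts['top'].notna()
print(parts.round(2))
```

Rows where `ok` is False are the ones that would need manual correction or dropping.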

In [None]:
# Note 95 rows with erroneous values
# Apply obvious corrections then drop the rest.

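# Titles for well 2/4-X-47 have no unit suffix.
# NOTE: these depths were judged to be in ft, yet 'm' is appended below; confirm the unit.
filt_correction = df_core_photo['Wellbore'] == '2/4-X-47'
df_core_photo.loc[filt_correction, 'Core photo title'] = (df_core_photo['Core photo title'] + 'm')

# Recompute the pattern filter so that corrected titles are kept rather than dropped
filt = df_core_photo['Core photo title'].str.contains(pat, na=False)
df_core_photo.loc[filt_correction]

# There are other obvious corrections to be made, but leave for now.
# Example below, but don't do this with a hard-coded index label, as the index may change as more wells are added.

#df_core_photo.loc[14234, 'Core photo title'] = '2482-2483m'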

In [None]:
# Assign rows that do not match pattern to new dataframe

df_core_photo_deletedrows = df_core_photo[~filt]

df_core_photo_deletedrows

In [None]:
# Keep only rows that do match pattern
# Dumps the rest (e.g. '2044', 'Core 2')

df_core_photo = df_core_photo[filt].copy()  # copy so later edits do not act on a view
print(df_core_photo.shape)
df_core_photo

In [None]:
# Check for null values
df_core_photo.isna().sum()

In [None]:
# Check datatypes
df_core_photo.dtypes

In [None]:
# TO DO: REPLACE AFTER SPLITTING OUT?????????? 

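# Assign the result back: calling replace(..., inplace=True) on a selected column may not update the dataframe
df_core_photo['Core photo title'] = df_core_photo['Core photo title'].replace(
    {'mj': 'm', #one erroneous 'mj' value
     'n': ',m', #one erroneous 'n' value
     'm': ',m', #then replace all 'm'
     'M': ',m',
     'ft': ',ft',
     'FT': ',ft'
    }, regex=True)

df_core_photo['Core photo title'] = df_core_photo['Core photo title'].replace({'-': ','}, regex=True)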

df_core_photo.head()

In [None]:
df_core_photo.tail()

In [None]:
df_core_photo[['Top depth', 'Base depth', 'Unit']] = df_core_photo['Core photo title'].str.split(pat=',', n=2, expand=True)
df_core_photo

In [None]:
df_core_photo['Unit'].unique()

In [None]:
# Count null values in each column
df_core_photo.isna().sum()

#1 extra well with no unit?

In [None]:
df_core_photo

In [None]:
df_core_photo.dtypes

In [None]:
# Convert top depths to numbers, then from feet to metres where the unit is 'ft'
top = pd.to_numeric(df_core_photo['Top depth'])
df_core_photo['Top depth'] = np.where(df_core_photo['Unit'] == 'ft', top * 0.3048, top)

In [None]:
# Convert 'Base depth' to a number; values in feet are converted to metres (1 ft = 0.3048 m)
# Note: the resulting column is float for every row
is_ft = df_core_photo['Unit'] == 'ft'
base_depth = df_core_photo['Base depth'].astype(int)
df_core_photo['Base depth'] = base_depth.mask(is_ft, base_depth * 0.3048)

In [None]:
df_core_photo.round(2).head()

In [None]:
# See https://stackabuse.com/download-files-with-python/
# Uses the requests module
import os
import requests

# Would also be useful to create folders for each wellbore

def save_core_photo():
    
    filepath = 'core_photo_jpgs'
    os.makedirs(filepath, exist_ok=True)  # Make sure the output folder exists
    
    for index, row in df_core_photo_deletedrows.iterrows(): # Using filtered dataframe for speed
        
        url = row['Core photo URL']
        filename = url.split('/')[-1]
        target = os.path.join(filepath, filename)

        print('Beginning file download with requests: ', url)
        r = requests.get(url)
        
        with open(target, 'wb') as f:
            f.write(r.content)
        print('Saved to: {}'.format(target))

        # Retrieve HTTP meta-data
        print(r.status_code)
        print(r.headers['content-type'])
        print(r.encoding)
        
# Call disabled for now; uncomment to download the photos
# save_core_photo()

# Sometimes getting error: 
# SSLError: HTTPSConnectionPool(host='factpages.npd.no', port=443): 
# Max retries exceeded with url: /pbl/core_photo_jpgs/3279_06_2044_2049m.jpg 
# (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))

In [None]:
# Add a new column with the filepath, e.g. '.\3279_06_2044_2049m.jpg'
# Where . represents the current directory

# The backslash has to be escaped in the string literal
df_core_photo['Legend'] = '.\\' + df_core_photo['Core photo URL'].str.split('/').str[-1]
df_core_photo

In [None]:
# Rename columns
rename_cols = {'Wellbore' : 'Well'}

# Apply renaming
df_core_photo.rename(columns=rename_cols, inplace=True)
df_core_photo

In [None]:
df_core_photo = df_core_photo[['Well', 'Top depth', 'Base depth', 'Legend']]
df_core_photo.head()

In [None]:
# Output to file
file = 'output data/wellbore_core_photo.csv'
df_core_photo.to_csv(file, index=False)

# Check output
pd.read_csv(file).head(2)

<h3>Thin Sections</h3>

In [None]:
#df_thin_section = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_thin_section&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en')

df_thin_section = pd.read_excel('input data/wellbore_thin_section.xlsx', sheet_name='wellbore_thin_section')

In [None]:
print(df_thin_section.shape)
df_thin_section.head()

In [None]:
df_thin_section['Unit'].unique()

In [None]:
df_thin_section.isna().sum()

In [None]:
# Convert depths recorded in feet to metres (1 ft = 0.3048 m); other rows are left unchanged
is_ft = df_thin_section['Unit'] == '[ft  ]'
df_thin_section['Depth'] = df_thin_section['Depth'].mask(is_ft, df_thin_section['Depth'] * 0.3048)

df_thin_section.drop(columns='Unit', inplace=True)

In [None]:
df_thin_section.head()

In [None]:
# Build the legend text from the thin section number
df_thin_section['Legend'] = 'Thin section no. ' + df_thin_section['Number'].astype(str)
    
df_thin_section.head()

In [None]:
# Rename columns
rename_cols = {'Wellbore' : 'Well'
               }
    
# Apply renaming
df_thin_section.rename(columns=rename_cols, inplace=True)
df_thin_section

In [None]:
df_thin_section = df_thin_section[['Well', 'Depth', 'Legend']]
df_thin_section = df_thin_section.round(2)
df_thin_section

In [None]:
# Output df_thin_section to file
file = 'output data/wellbore_thin_section.csv'
df_thin_section.to_csv(file, index=False)

# Check output
pd.read_csv(file).head(2)

In [None]:
# Point - comment
# No new IC data types

<h3>CO2</h3>

In [None]:
#df_co2 = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_co2&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en',
#                      skiprows=[0])

df_co2 = pd.read_excel('input data/wellbore_co2.xlsx', sheet_name='wellbore_co2', skiprows=[0])

In [None]:
print(df_co2.shape)
df_co2.head()

In [None]:
df_co2.drop(labels='Unnamed: 0', axis=1, inplace=True)
df_co2

In [None]:
df_co2.isna().sum()

In [None]:
df_co2.drop(labels=['Sample method', 'NPDID wellbore', 'Date sync NPD'], axis=1, inplace=True)

In [None]:
df_co2.columns

In [None]:
# Rename columns
rename_cols = {'Wellbore name' : 'Well',
               'Sample top depth [m]' : 'Top depth',
               'Sample bottom depth [m]' : 'Base depth'
               }
    
# Apply renaming
df_co2.rename(columns=rename_cols, inplace=True)
df_co2

In [None]:
# Output to file
file = 'output data/wellbore_co2.csv'
df_co2.to_csv(file, index=False)

# Check output
pd.read_csv(file).head(2)

In [None]:
# Interval-point
# Create data types: 'Sample sequence number', 'CO2 [vol %]', 'Sample type'

<h3>Oil Samples</h3>

In [None]:
#df_oil_sample = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_oil_sample&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en',
#                      skiprows=[0])

df_oil_sample = pd.read_excel('input data/wellbore_oil_sample.xlsx', sheet_name='wellbore_oil_sample')

In [None]:
print(df_oil_sample.shape)
df_oil_sample.head()

In [None]:
df_oil_sample.isna().sum()

In [None]:
df_oil_sample.drop(labels=['NPDID wellbore', 'Date updated', 'Date sync NPD'], axis=1, inplace=True)

In [None]:
df_oil_sample.columns

In [None]:
# Rename columns
rename_cols = {'Wellbore' : 'Well',
               'Top depth MD [m]' : 'Top depth',
               'Bottom depth MD [m]' : 'Base depth'
               }
    
# Apply renaming
df_oil_sample.rename(columns=rename_cols, inplace=True)
df_oil_sample

In [None]:
df_oil_sample

In [None]:
# Output to file
file = 'output data/wellbore_oil_sample.csv'
df_oil_sample.to_csv(file, index=False)

# Check output
pd.read_csv(file).head(2)

In [None]:
df_oil_sample.columns

# Interval - point
# Create data types: 'Test type', 'Bottle number', 'Fluid type', 'Test time ', 'Received date'

# What to do about rows with only 0/NaN values?

<h3>Lithostratigraphy</h3>

In [None]:
# Lithostrat available in two places. 
# Compare the length, and number of unique wells in both sources.

# (A) NPD FactPages > Wellbore > Table View > With > Lithostratigraphy
    # File: wellbore_formation_top.xlsx
    # Sheet: wellbore_formation_top
    # Link: https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_formation_top&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en

df_a = pd.read_excel('input data/wellbore_formation_top.xlsx', sheet_name='wellbore_formation_top')
print('Source A:', df_a.shape)
print(df_a['Wellbore name'].nunique())

# (B) NPD FactPages > Stratigraphy > Table View > Wellbores
    # File: strat_litho_wellbore.xlsx
    # Sheet: strat_litho_wellbore
    # Link: https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/strat_litho_wellbore&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en

df_b = pd.read_excel('input data/strat_litho_wellbore.xlsx', sheet_name='strat_litho_wellbore')
print('Source B:', df_b.shape)
print(df_b['Wellbore name'].nunique())

# Both contain (almost) the same number of rows.
# Source A is preferable as it has an extra column, 'Lithostrat. unit, parent',
# which will come in handy when assigning parents to each text dictionary entry.

In [None]:
# Use Source A

# df_formation_top = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_formation_top&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en', 
#                          sheet_name='wellbore_formation_top')

df_lithostrat = pd.read_excel('input data/wellbore_formation_top.xlsx', sheet_name='wellbore_formation_top')

# Print column titles
print("Lithostratigraphy column titles:")
print(list(df_lithostrat.columns))

In [None]:
(num_lithostrat_rows, num_lithostrat_cols) = df_lithostrat.shape
print('{} rows and {} columns in Lithostratigraphy.'.format(num_lithostrat_rows, num_lithostrat_cols))

In [None]:
df_lithostrat.head()

In [None]:
df_lithostrat.tail()

In [None]:
#Rename columns for csv
rename_stratcols = {'Wellbore name' : 'Well',
                    'Top depth [m]' : 'Top depth',
                    'Bottom depth [m]' : 'Base depth',
                    'Lithostrat. unit' : 'Legend'
                    }

#Apply renaming to dataframe
df_lithostrat.rename(columns=rename_stratcols, inplace=True)

# Create new dataframe called "df_formation_top"
# Need to keep the other columns in df_lithostrat for later, when writing to the database

df_formation_top = df_lithostrat[['Well', 'Top depth', 'Base depth', 'Legend']]
df_formation_top.head()

In [None]:
# Output to file
file = 'output data/wellbore_formation_top.csv'
df_formation_top.to_csv(file, index=False)

# Check output
pd.read_csv(file).head(2)

<h3>Drill stem tests</h3>

In [None]:
#df_dst = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_dst&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en')

df_dst = pd.read_excel('input data/wellbore_dst.xlsx', sheet_name='wellbore_dst')

In [None]:
print(df_dst.shape)
df_dst.head(20)

In [None]:
df_dst.isna().sum()

In [None]:
df_dst.drop(labels=['NPDID wellbore', 'Date updated', 'Date sync NPD'], axis=1, inplace=True)

In [None]:
df_dst.columns

In [None]:
# Rename columns
rename_cols = {'Wellbore' : 'Well',
               'From depth MD [m]' : 'Top depth',
               'To depth MD [m]' : 'Base depth'
               }
    
#Apply renaming
df_dst.rename(columns=rename_cols, inplace=True)
df_dst.head(20)

In [None]:
# Output to file
file = 'output data/wellbore_dst.csv'
df_dst.to_csv(file, index=False)

# Check output
pd.read_csv(file).head(2)

In [None]:
# Note overlapping depths in DST data
# For example in well 1/9-1

filt = df_dst['Well'] == '1/9-1'
df_dst.loc[filt, ['Well', 'Top depth', 'Base depth']]

In [None]:
# Calculate overlap in Drill Stem Tests
# Zero overlap for first test in each well, no negative overlaps (representing gaps)
# Iterate over DataFrame rows as (index, Series) pairs.

for index, row in df_dst[1:].iterrows():
    
    current_row = df_dst.loc[index]
    last_row = df_dst.iloc[df_dst.index.get_loc(index) - 1]
    
    # Zero overlap for first test in each well
    if current_row['Well'] != last_row['Well']:
        df_dst.loc[index, 'Overlap'] = 0
    
    else:
        # Difference between base of last row and top of current row
        if current_row['Top depth'] < last_row['Base depth']:
            df_dst.loc[index, 'Overlap'] = last_row['Base depth'] - current_row['Top depth']
        else:
            df_dst.loc[index, 'Overlap'] = 0 

df_dst_overlap = df_dst[['Well', 'Test number', 'Top depth', 'Base depth', 'Overlap']].round(1)

# Output to file
file = "output data/calc_dst_overlap.xlsx"
df_dst_overlap.to_excel(file, index=False)
pd.read_excel(file).head(20)
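The overlap calculation above can also be written without a row-by-row loop. The sketch below uses made-up depths and a hypothetical dataframe `dst`. It assumes the tests for each well appear in the same order as in the source file. Unlike the loop, it compares each test with the previous test of the same well via `groupby(...).shift()`, and it also assigns 0 to the very first row.

```python
import pandas as pd

# Made-up example data, for illustration only
dst = pd.DataFrame({
    'Well': ['A', 'A', 'A', 'B', 'B'],
    'Top depth': [1000.0, 1040.0, 1100.0, 2000.0, 2010.0],
    'Base depth': [1050.0, 1080.0, 1120.0, 2020.0, 2030.0],
})

# Base depth of the previous test in the same well (NaN for the first test of each well)
prev_base = dst.groupby('Well')['Base depth'].shift()

# A positive difference is an overlap; gaps (negative) and first tests (NaN) become 0
dst['Overlap'] = (prev_base - dst['Top depth']).clip(lower=0).fillna(0)

print(dst)
```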

In [None]:
df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                  'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])
df

In [None]:
for label, content in df.items():
    print('label:', label)
    print('content:', content, sep='\n')

In [None]:
for label, content in df.iterrows():
    print('label:', label)
    print('content:', content, sep='\n')

<h3>Casing and leak-off tests</h3>

In [None]:
#df_casinglot = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_casing_and_lot&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en')

df_casinglot = pd.read_excel('input data/wellbore_casing_and_lot.xlsx', sheet_name='wellbore_casing_and_lot')

In [None]:
print(df_casinglot.shape)
df_casinglot.head()

In [None]:
df_casinglot.isna().sum()

In [None]:
df_casinglot.drop(labels=['NPDID wellbore', 'Date updated', 'Date sync NPD'], axis=1, inplace=True)

In [None]:
df_casinglot.columns

In [None]:
# Rename columns
rename_cols = {'Wellbore' : 'Well',
               'Casing depth [m]' : 'Depth'
               }
    
# Apply renaming
df_casinglot.rename(columns=rename_cols, inplace=True)
df_casinglot

In [None]:
# Output to file
file = 'output data/wellbore_casing_and_lot.csv'
df_casinglot.to_csv(file, index=False)

# Check output
pd.read_csv(file).head(2)

<h3>Drilling mud</h3>

In [None]:
#df_mud = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_mud&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en')

df_mud = pd.read_excel('input data/wellbore_mud.xlsx', sheet_name='wellbore_mud')

In [None]:
print(df_mud.shape)
df_mud.head()

In [None]:
df_mud.drop(labels='Unnamed: 0', axis=1, inplace=True)
df_mud.head()

In [None]:
df_mud.isna().sum()

In [None]:
df_mud.drop(labels=['NPDID wellbore', 'Date updated', 'Date sync NPD'], axis=1, inplace=True)

In [None]:
df_mud.columns

In [None]:
# Rename columns
rename_cols = {'Wellbore' : 'Well',
               'Depth MD [m]' : 'Depth'
               }
    
# Apply renaming
df_mud.rename(columns=rename_cols, inplace=True)
df_mud

In [None]:
# Output to file
file = 'output data/wellbore_mud.csv'
df_mud.to_csv(file, index=False)

# Check output
pd.read_csv(file).head(2)

<h3>Documents</h3>

In [None]:
#df_document = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_document&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.188&CultureCode=en')

df_document = pd.read_excel('input data/wellbore_document.xlsx', sheet_name='wellbore_document')
df_document.head()

In [None]:
df_document['Title'] = df_document['Wellbore'] + ' ' + df_document['Document type'] + ': ' + df_document['Document name'] + ' (' + df_document['Document format'] + ')'
df_document.head()

In [None]:
df_document = df_document[['Wellbore', 'Title', 'Document URL']]
df_document.head()

In [None]:
# Rename columns in df_document
rename_cols = {'Wellbore' : 'Well',
               'Document URL' : 'URL'
               }
    
#Apply renaming
df_document.rename(columns=rename_cols, inplace=True)
df_document.head()

In [None]:
df_explo_references.head()

In [None]:
# Rename columns in df_explo_references
rename_cols = {'Name' : 'Well',
               }
    
#Apply renaming
df_explo_references.rename(columns=rename_cols, inplace=True)
df_explo_references.head()

In [None]:
# Combine References and Documents dataframes
# (DataFrame.append was removed in pandas 2.0; pd.concat works in all versions)
df_refs_and_docs = pd.concat([df_explo_references, df_document])
df_refs_and_docs.sort_values(['Well', 'Title'], ascending=[True, False], ignore_index=True, inplace=True)
df_refs_and_docs.head(20)

In [None]:
# Output df_refs_and_docs to file
file = 'output data/wellbore_references_and_documents.csv'
df_refs_and_docs.to_csv(file, index=False)

# Check output
pd.read_csv(file).head(5)

<h3>Summary</h3>

In [None]:
# Exploration well headers
# Development well headers
# Cores
# Core photos
# Thin sections
# CO2
# Oil samples
# Lithostratigraphy
# Drill stem tests
# Casing and leak-off tests
# Drilling mud
# Documents

df_explo_references.head()

In [None]:
df_summary = pd.DataFrame({'Data type':
                        ['Exploration well headers',
                         'Development well headers',
                         'Exploration references',
                         'Development references',
                         'Cores',
                         'Core photos',
                         'Thin sections',
                         'CO2',
                         'Oil samples',
                         'Lithostratigraphy',
                         'Drill stem tests',
                         'Casing and leak-off tests',
                         'Drilling mud',
                         'Documents',
                         'Documents & References combined'
                        ], 
                        'No. unique wells':
                        [df_explo['Name'].nunique(),
                         df_dev['Name'].nunique(),
                         df_explo_references['Well'].nunique(),
                         df_dev_references['Name'].nunique(),
                         df_core['Well'].nunique(),
                         df_core_photo['Well'].nunique(),
                         df_thin_section['Well'].nunique(),
                         df_co2['Well'].nunique(),
                         df_oil_sample['Well'].nunique(),
                         df_formation_top['Well'].nunique(),
                         df_dst['Well'].nunique(),
                         df_casinglot['Well'].nunique(),
                         df_mud['Well'].nunique(),
                         df_document['Well'].nunique(),
                         df_refs_and_docs['Well'].nunique()
                        ],
                        'No. records':
                        [df_explo['Name'].shape[0],
                         df_dev['Name'].shape[0],
                         df_explo_references['Well'].shape[0],
                         df_dev_references['Name'].shape[0],
                         df_core['Well'].shape[0],
                         df_core_photo['Well'].shape[0],
                         df_thin_section['Well'].shape[0],
                         df_co2['Well'].shape[0],
                         df_oil_sample['Well'].shape[0],
                         df_formation_top['Well'].shape[0],
                         df_dst['Well'].shape[0],
                         df_casinglot['Well'].shape[0],
                         df_mud['Well'].shape[0],
                         df_document['Well'].shape[0],
                         df_refs_and_docs['Well'].shape[0]]
                       })

print('Data for', (df_explo['Name'].nunique()+df_dev['Name'].nunique()), 'wells in total.')

#Add thousands separators
df_summary['No. unique wells'] = df_summary['No. unique wells'].apply(lambda x : "{:,}".format(x))
df_summary['No. records'] = df_summary['No. records'].apply(lambda x : "{:,}".format(x))
df_summary = df_summary.style.hide(axis='index')  # Styler.hide_index() was removed in pandas 2.0; hide() needs pandas >= 1.4
df_summary

<h3>NPD Shapefiles</h3>

In [None]:
# NPD shapefiles at https://www.npd.no/en/about-us/information-services/available-data/map-services/

npd_shapefiles = {
    'AFEX': 'https://factpages.npd.no/downloads/shape/afxAreaCurrent.zip',
    'AFEX_block': 'https://factpages.npd.no/downloads/shape/afxAreaSplitByBlock.zip',
    'Licence': 'https://factpages.npd.no/downloads/shape/prlAreaCurrent.zip',
    'Licence_block': 'https://factpages.npd.no/downloads/shape/prlAreaSplitByBlock.zip',
    'Licencing APA': 'https://factpages.npd.no/downloads/shape/apaAreaGross.zip',
    'Licencing APA_block': 'https://factpages.npd.no/downloads/shape/apaAreaNet.zip',
    'Wellbore': 'https://factpages.npd.no/downloads/shape/wlbPoint.zip',
    #Ignore Wellbore - Fontfile for presentation TTF
    'BAA': 'https://factpages.npd.no/downloads/shape/baaAreaCurrent.zip',
    'BAA_block': 'https://factpages.npd.no/downloads/shape/baaAreaSplitByBlock.zip',
    'Field': 'https://factpages.npd.no/downloads/shape/fldArea.zip',
    'Discovery': 'https://factpages.npd.no/downloads/shape/dscArea.zip',
    'Facility': 'https://factpages.npd.no/downloads/shape/fclPoint.zip',
    'Survey': 'https://factpages.npd.no/downloads/shape/seaArea.zip',
    'TUF': 'https://factpages.npd.no/downloads/shape/pipLine.zip',
    'Block': 'https://factpages.npd.no/downloads/shape/blkArea.zip',
    'Quadrant': 'https://factpages.npd.no/downloads/shape/qadArea.zip',
    'Sub area': 'https://factpages.npd.no/downloads/shape/subArea.zip'
}

for key, value in npd_shapefiles.items(): 
    print(value)

#print(npd_shapefiles.values)

In [None]:
# See https://stackoverflow.com/questions/9419162/download-returned-zip-file-from-url
# https://factpages.npd.no/downloads/shape/afxAreaCurrent.zip

def save_shapefiles():
    
    for key, value in npd_shapefiles.items(): 
        
        filepath = 'shapefiles\\'
        zip_file_url = value

        print('Beginning file download with requests: ', zip_file_url)
        r = requests.get(zip_file_url)

        z = zipfile.ZipFile(io.BytesIO(r.content))
        z.extractall(filepath)

        print('Files extracted to: {}'.format(filepath))
        
save_shapefiles()

# Import shapefile to IC?
# Populate colours, e.g. Fields and Discoveries with OGW colours?

# Error:
# SSLError: HTTPSConnectionPool(host='factpages.npd.no', port=443): 
# Max retries exceeded with url: /downloads/shape/afxAreaCurrent.zip 
# (Caused by SSLError(SSLError("bad handshake: 
# Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))

<h1>Part 2: Prepare data for import to SQL Server</h1><br>
May split the following out into separate module later.

In [None]:
from datetime import datetime
from xlrd import xldate
import pyodbc
import sqlalchemy
from sqlalchemy import create_engine, MetaData, Table, delete, insert, select, func, sql
from sqlalchemy.types import Integer
import urllib.parse
import pprint

<h3>Well Headers</h3>

In [None]:
# All column titles in dbo.WELLS table

# IS THIS USED???
# ic_dbowells_columns = {"pk_index", "well_id", "units", "created", "creator", "modified", "modifier", "project", 
#                           "rte", "sea_bed", "rig_elevation", "datum", "terminal_depth", "spud_date", "completion_date", 
#                           "quadrant", "sub_block", "kelly", "symbol_id", "client", "utmzone", "code", "name", 
#                           "field", "location", "country", "basin", "name1", "name2", "strat_schemes", "grnd_elev", 
#                           "f_block", "grid_x", "grid_y", "latitude", "longtitude", "geodatum", "facility", 
#                           "discovery_name", "seismic_line", "intent", "f_ipid", "f_licenceNumber", "f_api", 
#                           "f_comment", "f_province", "f_county", "f_state", "f_section", "f_Township", "f_range", "f_uwi"}

# Rename well header columns to match dbo.WELLS (does capitalisation matter?)
rename_for_sql = {'Name' : 'name',
                'Alternate 1' : 'name1',
                'Operator' : 'client',
                'Licence number' : 'f_licenceNumber',
                'Intent' : 'intent',
                'Field' : 'field',
                'SPUD date' : 'spud_date',
                'Completion date' : 'completion_date',
                'Discovery name' : 'discovery_name',
                'Seismic line' : 'seismic_line',
                'Country' : 'country',
                'KBE' : 'kelly',
                'Terminal depth' : 'terminal_depth',
                'Water depth' : 'sea_bed',
                'Location' : 'location',
                'Facility' : 'facility',
                'Geodatum' : 'geodatum',
                'Latitude' : 'latitude',
                'Longitude' : 'longtitude', #spelled incorrectly to match column longtitude in dbo.WELLS!
                'Grid system' : 'utmzone',
                'Surface X' : 'grid_x',
                'Surface Y' : 'grid_y',
                'Quadrant' : 'quadrant',
                'Block' : 'f_block'}
    
# Apply renaming to each of the dataframes
df_explo.rename(columns=rename_for_sql, inplace=True)
df_dev.rename(columns=rename_for_sql, inplace=True)

# QC renamed columns
print("Renamed attributes only:")
renamed_columns = list(rename_for_sql.values())
df_explo[renamed_columns].head(n=3)
# df_dev[renamed_columns].head(n=3)

In [None]:
now = datetime.now()
now

timestampStr = now.strftime("%Y-%m-%d %H:%M:%S.%f")
print('Current Timestamp : ', timestampStr)

In [None]:
# Apply this function to created and modified columns, then push to_sql.

def datetime2ole(date):
    #convert date string to a datetime object
    date = datetime.strptime(date, "%Y-%m-%d %H:%M:%S.%f")
    #Calculate OLE manually from OLE origin date
    OLE_TIME_ZERO = datetime(1899, 12, 30)
    delta = date - OLE_TIME_ZERO
    return float(delta.days) + (float(delta.seconds) / 86400)  # 86,400 seconds in day

now = datetime2ole(timestampStr)
now
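As a cross-check of the conversion above, the sketch below converts in both directions. An OLE automation date is the number of days since 1899-12-30, with the time of day as the fractional part. The helper names `to_ole` and `from_ole` are hypothetical, and unlike `datetime2ole` they take a `datetime` object rather than a string.

```python
from datetime import datetime, timedelta

OLE_TIME_ZERO = datetime(1899, 12, 30)


def to_ole(moment):
    """Return the number of days since the OLE epoch as a float."""
    return (moment - OLE_TIME_ZERO).total_seconds() / 86400  # 86,400 seconds in a day


def from_ole(value):
    """Return the datetime that corresponds to an OLE automation date."""
    return OLE_TIME_ZERO + timedelta(days=value)


print(to_ole(datetime(1899, 12, 31)))      # one day after the epoch
print(to_ole(datetime(1900, 1, 1, 12)))    # two and a half days after the epoch
print(from_ole(to_ole(datetime(2020, 1, 25, 6, 0))))
```

Using `total_seconds()` keeps the microseconds that `delta.seconds` drops, which only matters if sub-second precision is needed in the `created` column.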

In [None]:
# Add new column(s) for dbo.WELLS but not in well header file

df_explo['datum'] = 4
df_dev['datum'] = 4
df_explo['symbol_id'] = 3146 #correct later, refer to Status?
df_dev['symbol_id'] = 3146 #correct later, refer to Status?
df_explo['units'] = 'M'
df_dev['units'] = 'M'
df_explo['created'] = now
df_dev['created'] = now
df_explo['creator'] = 1 #correct later
df_dev['creator'] = 1 #correct later
# df_explo['modified'] = null
# df_dev['modified'] = null
# df_explo['modifier'] = null
# df_dev['modifier'] = null
# df_explo['project'] = null
# df_dev['project'] = null

# Check the result
df_explo[['name', 'datum', 'symbol_id', 'units', 'created', 'creator']].head(n=3)

In [None]:
# Duplicate columns, preserving the original
df_explo['well_id'] = df_explo['NPDID wellbore']
df_dev['well_id'] = df_dev['NPDID wellbore']

df_explo['f_uwi'] = df_explo['NPDID wellbore']
df_dev['f_uwi'] = df_dev['NPDID wellbore']

df_explo['code'] = df_explo['name']
df_dev['code'] = df_dev['name']

# Limit seismic_line to match nvarchar(80) limit
df_explo["seismic_line"] = df_explo["seismic_line"].str[:80]

# Check the result
df_explo[['name', 'well_id', 'f_uwi', 'code', 'seismic_line']].head(n=10)

In [None]:
# Filter and re-order df_explo_dbowells to match dbo.WELLS (minus those not required, listed below)

explo_dbowells_order = ["well_id", "units", "created", "creator", "modified", "modifier", "project", 
                        "sea_bed", "datum", "terminal_depth", "spud_date", "completion_date", "quadrant", "kelly", 
                        "symbol_id", "client", "utmzone", "code", "name", "field", "location", "country", "name1", 
                        "f_block", "grid_x", "grid_y", "latitude", "longtitude", "geodatum", "facility", "discovery_name", 
                        "seismic_line", "intent", "f_licenceNumber", "f_uwi"]

df_explo_dbowells = df_explo.filter(explo_dbowells_order)

df_explo_dbowells = df_explo_dbowells.reindex(columns=explo_dbowells_order)

df_explo_dbowells.head(1)


# NOT REQUIRED
# Columns in dbo.WELLS (correct order) that are not in df_explo_dbowells:
# rte, rig_elevation, sub_block, basin, name2, strat_schemes, grnd_elev, f_ipid, f_api, f_comment,
# f_province, f_county, f_state, f_section, f_Township, f_range

In [None]:
# Configure and connect to SQL Server database

pp = pprint.PrettyPrinter(indent=10)

params = 'DRIVER={ODBC Driver 13 for SQL Server};' \
         'SERVER=5SQFPQ2\\SQLEXPRESS;' \
         'PORT=1433;' \
         'DATABASE=Test6;' \
         'Trusted_Connection=yes;'
            
params = urllib.parse.quote_plus(params)

engine = create_engine('mssql+pyodbc:///?odbc_connect=%s' % params)

metadata = MetaData()
  
# Create Table objects by reflection
# (autoload_with alone implies autoload; autoload=True is deprecated since SQLAlchemy 1.4)
wells = Table('WELLS', metadata, autoload_with=engine)
wellsuserfieldsvalues = Table('t_WellsUserFieldsValues', metadata, autoload_with=engine)
wellsuserfields = Table('t_WellsUserFields', metadata, autoload_with=engine)
datalithostrat = Table('DATA_Lithostrat', metadata, autoload_with=engine)
projects = Table('PROJECTS', metadata, autoload_with=engine)
wellqueries = Table('WELLQUERIES', metadata, autoload_with=engine)
projectwells = Table('PROJECTWELLS', metadata, autoload_with=engine)

connection = engine.connect()
                       
#pp.pprint(repr(wells))
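To make the role of `quote_plus` in the cell above more visible, here is a small sketch that only builds and inspects the URL, without connecting to anything. The server and database names are hypothetical placeholders.

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical server and database names, for illustration only
odbc_params = (
    'DRIVER={ODBC Driver 13 for SQL Server};'
    'SERVER=MYHOST\\SQLEXPRESS;'
    'DATABASE=MyDatabase;'
    'Trusted_Connection=yes;'
)

# quote_plus percent-encodes characters such as '{', ';', '=' and the backslash,
# and turns spaces into '+', so the whole string fits into one URL parameter
encoded = quote_plus(odbc_params)
engine_url = 'mssql+pyodbc:///?odbc_connect=' + encoded

print(engine_url)
print(unquote_plus(encoded) == odbc_params)
```

The resulting `engine_url` is the kind of string passed to `create_engine`.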

In [None]:
# Delete all data from dbo.t_wellsuserfieldsvalues and dbo.WELLS

# Select statement
stmt = select([func.count(wells.columns.name)])

# Execute the select statement and use the scalar() fetch method to save the record count
connection.execute(stmt).scalar()

# Delete all records from PROJECTWELLS table
delete_stmt = delete(projectwells)
result_proxy = connection.execute(delete_stmt)

# Delete all data from t_wellsuserfieldsvalues table first due to dependency
# See https://www.codeproject.com/Questions/677277/I-am-getting-error-while-delete-entry
delete_stmt = delete(wellsuserfieldsvalues)
result_proxy = connection.execute(delete_stmt)

# Delete all records from WELLS table
delete_stmt = delete(wells)
result_proxy = connection.execute(delete_stmt)

# Print affected row count
result_proxy.rowcount

# Print results of the executing statement to verify there are no rows
print(connection.execute(stmt).fetchall())

# TO DO: Tried to turn the above into a function to avoid repetition but the sqlalchemy select statement doesn't accept parameters.
# See doc on https://stackoverflow.com/questions/19314342/python-sqlalchemy-pass-parameters-in-connection-execute

In [None]:
df_explo_dbowells.dtypes

In [None]:
# Create a function 
def sqlselect_rows(tablename):
    s = select(tablename)
    result = connection.execute(s)
    
    #These two lines are equivalent to:
    #result = engine.execute('SELECT * FROM PROJECTS')

    for row in result:
        print(row)
    
    result.close()
    
# Use sqlselect_rows([tablename])

In [None]:
df_explo_dbowells

In [None]:
# Write entire df_explo_dbowells to SQL Server database
# See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

df_explo_dbowells.to_sql('WELLS', engine, if_exists='append', index = False)

print('dbo.WELLS')
sqlselect_rows([wells])

# Still to correct datum, symbol_id, creator. I think these might have to read other tables ()
# Is well_id ok as npdid_wellbore? Will this cause any problems creating new wells in IC?

# ProgrammingError: ('The SQL contains 2014 parameter markers, but 67550 parameters were supplied', 'HY000')
# WELLS is referenced by PROJECTWELLS.well_id and t_WellsUserFieldsValues.well_id
# Original script uses Append rather than replace, refer to Azure Notebooks.
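The numbers in the ProgrammingError noted above are consistent with a multi-row INSERT that exceeded SQL Server's limit of 2,100 parameters per statement: 67550 equals 35 columns times 1930 rows, and 67550 modulo 65536 is 2014. This reading is an inference from the numbers, not something the notebook confirms. If that is the cause, limiting the number of rows per INSERT should avoid it. The helper `rows_per_insert` below is a hypothetical sketch that stays one parameter under the limit to be conservative.

```python
SQL_SERVER_MAX_PARAMS = 2100  # per-statement parameter limit in SQL Server


def rows_per_insert(n_columns, max_params=SQL_SERVER_MAX_PARAMS):
    """Largest number of rows whose parameters stay strictly below max_params."""
    if n_columns < 1:
        raise ValueError('n_columns must be at least 1')
    return max(1, (max_params - 1) // n_columns)


n_columns = 35  # number of columns listed in explo_dbowells_order
print(rows_per_insert(n_columns))

# The parameter count in the error message, wrapped at 16 bits
print(67550 % 65536)
```

The result could be passed as `chunksize` to `DataFrame.to_sql` when `method='multi'` is used.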

In [None]:
# Print entire dbo.WELLS table
# See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html#pandas.read_sql

sql = '''
SELECT *
FROM dbo.WELLS
'''

pd.read_sql_query(sql, engine)

In [None]:
# Look up status ~ symbol_id from dbo.SYMBOLS
# Status symbols correspond to the Well Symbols dictionary, dic_id 32003.
df_wellsymbols = pd.read_sql('SELECT symbol_id, dic_id, description FROM dbo.SYMBOLS WHERE dic_id=32003', engine)
df_wellsymbols[['symbol_id', 'description']]

In [None]:
# Look up user_id in dbo.userdef, which is used in creator & modifier?
df_users = pd.read_sql('SELECT user_id, usrid, name FROM dbo.userdef', engine)
df_users.head()

In [None]:
print(len(non_default_attributes))
print(non_default_attributes)

In [None]:
# Call DataFrame constructor on list of attributes
df_wellsuserfields = pd.DataFrame(non_default_attributes, columns =['f_FieldName'])

df_wellsuserfields['f_FieldID'] = range(1,len(df_wellsuserfields)+1)
df_wellsuserfields['f_IsInputUsed'] = False
df_wellsuserfields['f_InputID'] = 0
df_wellsuserfields['f_Description'] = df_wellsuserfields['f_FieldName']
df_wellsuserfields['f_Origin'] = 0
df_wellsuserfields['f_SortOrder'] = range(1,len(df_wellsuserfields)+1)

df_wellsuserfields

In [None]:
df_wellsuserfields.dtypes

In [None]:
# Delete all data from dbo.t_WellsUserFields

stmt = select([func.count(wellsuserfields.columns.f_FieldName)])

# Execute the select statement and use the scalar() fetch method to print the record count
print(connection.execute(stmt).scalar())

# Delete all records from dbo.t_WellsUserFields
delete_stmt = delete(wellsuserfields)
result_proxy = connection.execute(delete_stmt)

# Print affected row count
print(result_proxy.rowcount)

# Print results of the executing statement to verify there are no rows
print(connection.execute(stmt).fetchall())

In [None]:
# Write entire df_wellsuserfields to SQL Server database
df_wellsuserfields.to_sql('t_WellsUserFields', engine, if_exists='replace', index = False)

print('dbo.t_WellsUserFields')
sqlselect_rows([wellsuserfields])

In [None]:
# dbo.t_WellsUserFieldsValues
# Build dataframe of wellsuserfieldsvalues

df_explo_nondefaultattributes = df_explo.filter(non_default_attributes)
df_explo_nondefaultattributes['name'] = df_explo_dbowells['name']
df_explo_nondefaultattributes['well_id'] = df_explo_dbowells['well_id']

#df_explo_nondefaultattributes = df_explo_nondefaultattributes.reindex(columns=df_explo_nondefaultattributes)

df_explo_nondefaultattributes.head(5)

In [None]:
# Dates in attributes appear as "#2019-10-03 00:00:00.0000000"
# Reformat without time, but maintain datetime64[ns] data type.

df_explo_nondefaultattributes['Date all updated'] = pd.to_datetime(df_explo_nondefaultattributes['Date all updated'].dt.strftime('%Y-%m-%d'))
df_explo_nondefaultattributes['Date main level updated'] = pd.to_datetime(df_explo_nondefaultattributes['Date main level updated'].dt.strftime('%Y-%m-%d'))
df_explo_nondefaultattributes['Publication date'] = pd.to_datetime(df_explo_nondefaultattributes['Publication date'].dt.strftime('%Y-%m-%d'))
df_explo_nondefaultattributes['Release date'] = pd.to_datetime(df_explo_nondefaultattributes['Release date'].dt.strftime('%Y-%m-%d'))

# Date sync NPD appears as "25.01.2020", which is Python dtype('O') for Object (i.e. strings),
# so the .dt accessor is not available until the column has been parsed.
# df_explo_nondefaultattributes['Date sync NPD'].dtypes
# Convert to datetime, stating the day.month.year format explicitly

df_explo_nondefaultattributes['Date sync NPD'] = pd.to_datetime(df_explo_nondefaultattributes['Date sync NPD'], format='%d.%m.%Y')

#df_explo_nondefaultattributes.dtypes
df_explo_nondefaultattributes[['Date all updated',
                              'Date main level updated',
                              'Publication date',
                              'Release date',
                              'Date sync NPD']]

# Note: NaT is a pandas null value, pd.NaT

In [None]:
df_explo_nondefaultattributes = pd.melt(df_explo_nondefaultattributes, id_vars='well_id')

df_explo_nondefaultattributes.columns = ['f_WellId', 'f_FieldName', 'f_StringValue']

#df_explo_nondefaultattributes.sort_values(by='f_WellId')

df_explo_nondefaultattributes.isnull().sum()

In [None]:
df_wellsuserfieldsvalues = df_wellsuserfields.merge(df_explo_nondefaultattributes, on='f_FieldName', how='inner')
df_wellsuserfieldsvalues = df_wellsuserfieldsvalues[['f_WellId', 'f_FieldID', 'f_StringValue']].sort_values(by=['f_WellId','f_FieldID'])
df_wellsuserfieldsvalues

# Why are there wells with f_WellId = NaN with left and right joins? Check original data?
# Works fine with inner join but is this cutting out some wells?
# Show 2 copies of this table, one with well names and one without.
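
One way to investigate the NaN question above is an outer merge with `indicator=True`, which labels each row as coming from the left table, the right table, or both. The sketch below uses small made-up dataframes (`fields`, `values` and their contents are hypothetical) rather than the real NPD data.

```python
import pandas as pd

# Hypothetical stand-ins for df_wellsuserfields and the melted attribute table
fields = pd.DataFrame({"f_FieldName": ["Operator", "Water depth", "Licence"],
                       "f_FieldID": [1, 2, 3]})
values = pd.DataFrame({"f_WellId": [1, 1, 1],
                       "f_FieldName": ["Operator", "Water depth", "name"],
                       "f_StringValue": ["Operator A", "70", "1/2-1"]})

# The outer join keeps every row; the _merge column records where each row came from
check = fields.merge(values, on="f_FieldName", how="outer", indicator=True)

# Rows that an inner join would silently drop
unmatched = check[check["_merge"] != "both"]
print(unmatched[["f_FieldName", "f_WellId", "f_FieldID", "_merge"]])
```

Rows marked `left_only` have no well values (so `f_WellId` is NaN), while rows marked `right_only` are melted columns, such as `name`, that are not user fields.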

In [None]:
df_wellsuserfieldsvalues.isnull().sum()

In [None]:
# Delete all data from dbo.t_WellsUserFieldsValues

stmt = select([func.count(wellsuserfieldsvalues.columns.f_FieldID)])

# Execute the select statement and use the scalar() fetch method to print the record count
print(connection.execute(stmt).scalar())

# Delete all records from dbo.t_WellsUserFieldsValues
delete_stmt = delete(wellsuserfieldsvalues)
result_proxy = connection.execute(delete_stmt)

# Print affected row count
print(result_proxy.rowcount)

# Print results of the executing statement to verify there are no rows
print(connection.execute(stmt).fetchall())

In [None]:
# Write entire df_wellsuserfieldsvalues to SQL Server database

df_wellsuserfieldsvalues.to_sql('t_WellsUserFieldsValues', engine, if_exists='replace', index = False)

print('dbo.t_WellsUserFieldsValues')
sqlselect_rows([wellsuserfieldsvalues])

# WHERE IS WELL ID 1? CHECK THAT 8991 IS THE LAST????

In [None]:
# INSERT STATEMENT

# # Build an insert statement to insert a record into the data table: insert_stmt
# insert_stmt = insert(wells).values(well_id=55, name='9/9-15', created=now)

# # Execute the insert statement via the connection: results
# results = connection.execute(insert_stmt)

# # Print result rowcount
# print(results.rowcount)

# # Build a select statement to validate the insert: select_stmt
# select_stmt = select([wells]).where(wells.columns.name == '9/9-15')

# # Print the result of executing the query.
# print(connection.execute(select_stmt).first())

# #repr(wells)

<h3>Lithostratigraphy</h3>

In [None]:
# Lithostrat available in two places. 
# Compare the length, and number of unique wells in both sources.

# (A) NPD FactPages > Wellbore > Table View > With > Lithostratigraphy
    # File: wellbore_formation_top.xlsx
    # Sheet: wellbore_formation_top
    # Link: https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_formation_top&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en

df_a = pd.read_excel('input data/wellbore_formation_top.xlsx', sheet_name='wellbore_formation_top')
print('Source A:', df_a.shape)
print(df_a['Wellbore name'].nunique())

# (B) NPD FactPages > Stratigraphy > Table View > Wellbores
    # File: strat_litho_wellbore.xlsx
    # Sheet: strat_litho_wellbore
    # Link: https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/strat_litho_wellbore&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en

df_b = pd.read_excel('input data/strat_litho_wellbore.xlsx', sheet_name='strat_litho_wellbore')
print('Source B:', df_b.shape)
print(df_b['Wellbore name'].nunique())

# Both contain (almost) the same number of rows.
# Source A is preferable as it has an extra column, 'Lithostrat. unit, parent'
# which will come in handy assigning parents to each text dictionary entry.

In [None]:
# Use Source A

# df_formation_top = pd.read_excel('https://factpages.npd.no/ReportServer_npdpublic?/FactPages/TableView/wellbore_formation_top&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=EXCEL&Top100=false&IpAddress=108.171.128.189&CultureCode=en', 
#                          sheet_name='wellbore_formation_top')

df_lithostrat = pd.read_excel('input data/wellbore_formation_top.xlsx', sheet_name='wellbore_formation_top')

# Print column titles
print("Lithostratigraphy column titles:")
print(list(df_lithostrat.columns))

In [None]:
(num_lithostrat_rows, num_lithostrat_cols) = df_lithostrat.shape
print('{} rows and {} columns in Lithostratigraphy.'.format(num_lithostrat_rows, num_lithostrat_cols))

In [None]:
df_lithostrat.head()

In [None]:
df_lithostrat.tail()

In [None]:
# Rename columns for csv
rename_stratcols = {'Wellbore name' : 'Well',
                    'Top depth [m]' : 'Top depth',
                    'Bottom depth [m]' : 'Base depth',
                    'Lithostrat. unit' : 'Legend'
                    }

# Apply renaming to dataframe
df_lithostrat.rename(columns=rename_stratcols, inplace=True)

# Create new dataframe called "df_formation_top"
# Keep the other columns in df_lithostrat; they are needed later when writing to the database

df_formation_top = df_lithostrat[['Well', 'Top depth', 'Base depth', 'Legend']].copy()
df_formation_top.head()

In [None]:
# Output to file
file = 'output data/wellbore_formation_top.csv'
df_formation_top.to_csv(file, index=False)

# Check output
pd.read_csv(file).head(2)

<h3>SQL Lithostratigraphy</h3>

In [None]:
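# Rename columns for the database

# Rename only the 4 columns that will be imported to the database.
# df_lithostrat was already renamed in place for the csv export, so map the csv names
# as well as the original NPD names; rename() ignores keys that are not present.
rename_stratcols = {'Top depth [m]' : 'top_depth',
                'Top depth' : 'top_depth',
                'Bottom depth [m]' : 'base_depth',
                'Base depth' : 'base_depth',
                'Lithostrat. unit' : 'legend',
                'Legend' : 'legend',
                'NPDID wellbore' : 'well_id'}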

# Apply renaming to dataframe
df_lithostrat.rename(columns=rename_stratcols, inplace=True)
df_lithostrat.head()

In [None]:
# Drop columns that are not needed in the database.
# Keep well_id: it is selected again further down.
df_lithostrat.drop(labels=['NPDID lithostrat. unit', 'NPDID parent lithostrat. unit', 'Date updated', 'Date sync NPD'],
                   axis=1, inplace=True)
df_lithostrat.head()

In [None]:
# dbo.DATA_Lithostrat stores Groups, Formations and Members.
# Use the Level column to create a data_type column, numbering Group (110), Formation (111) and Member (112).
# Any other level is given data_type 0.

def level_datatypes(row):
    if row == 'GROUP':
        return 110
    elif row == 'FORMATION':
        return 111
    elif row == 'MEMBER':
        return 112
    else:
        return 0

df_lithostrat['data_type'] = df_lithostrat['Level'].apply(level_datatypes)
df_lithostrat.head()
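
The level-to-number lookup above can also be written as a dictionary passed to `Series.map`, which avoids the if/elif chain. This is a sketch on a made-up series; the 'BED' level is a hypothetical value used only to show the fallback to 0, and `LEVEL_TO_DATA_TYPE` is an illustrative name.

```python
import pandas as pd

# Codes taken from the function above; unknown or missing levels fall back to 0
LEVEL_TO_DATA_TYPE = {"GROUP": 110, "FORMATION": 111, "MEMBER": 112}

levels = pd.Series(["GROUP", "FORMATION", "MEMBER", "BED", None])

# map() returns NaN for keys that are not in the dictionary,
# so fill those with 0 before casting back to integers
data_type = levels.map(LEVEL_TO_DATA_TYPE).fillna(0).astype("int64")
print(data_type.tolist())
```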

In [None]:
# Take a copy so that the new columns added below are not assigned to a slice
df_lithostrat = df_lithostrat[['well_id', 'data_type', 'top_depth', 'base_depth', 'legend']].copy()
df_lithostrat.head()

In [None]:
# Insert new columns

df_lithostrat['symbol_id'] = 0
df_lithostrat['f_interpid'] = 0
df_lithostrat['creator'] = 1
df_lithostrat['modifier'] = 1
df_lithostrat['source'] = 'Python script'
df_lithostrat['attr'] = 'test' 
#'{"ZoneColour":-1,"ZoneColourIsIpAuto":true,"EventSymbolId":0,"IsLocked":false,"OriginalZoneIndex":0}'
df_lithostrat['top_boundary'] = 0
df_lithostrat['base_boundary'] = 0
df_lithostrat['created'] = now
df_lithostrat['modified'] = now
df_lithostrat['obsno'] = 0
df_lithostrat['mindepth'] = 0
df_lithostrat['maxdepth'] = 0
df_lithostrat['dipangle'] = 0
df_lithostrat['dipazimuth'] = 0
df_lithostrat['age'] = 0

# Additional columns in dbo.Lithostrat that don't need to be auto-populated
# top_age, base_age, owconf, owqual, owkind, owbaseconf, owbasequal, owbasekind, abr, interpreter, remark, geofeature

In [None]:
# df_lithostrat = df_lithostrat.merge(df_explo_nondefaultattributes, on='f_FieldName', how='inner')
# df_wellsuserfieldsvalues = df_wellsuserfieldsvalues[['f_WellId', 'f_FieldID', 'f_StringValue']].sort_values(by=['f_WellId','f_FieldID'])
# df_wellsuserfieldsvalues

In [None]:
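# Note: reindex fills any column name it cannot find in the dataframe with NaN, so each name in
# strat_order must match a dataframe column exactly (a stray space is enough to make 'created' all NaN).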

strat_order = ['well_id', 'data_type', 'top_depth', 'symbol_id', 'f_interpid', 'creator', 
              'modifier', 'source', 'attr', 'top_boundary', 'base_depth', 'base_boundary', 'legend', 
              'created', 'modified', 'obsno', 'mindepth', 'maxdepth', 'dipangle', 'dipazimuth', 'age']

df_lithostrat = df_lithostrat.reindex(columns=strat_order)
df_lithostrat.head()

# Will 'pk_index' be auto generated again? On this list it came after symbol_id.
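
`DataFrame.reindex(columns=...)` does not raise an error for a label it cannot find; it creates that column filled with NaN. A name that differs by a single trailing space therefore produces an all-NaN column. The sketch below shows the effect on a made-up dataframe (`df_demo` is hypothetical) and one possible guard, stripping whitespace from the column names first.

```python
import pandas as pd

# Note the trailing space in the second column name
df_demo = pd.DataFrame({"well_id": [1, 2],
                        "created ": ["2020-01-25", "2020-01-25"]})

# 'created' is not an existing label, so reindex fills it with NaN
reindexed = df_demo.reindex(columns=["well_id", "created"])
print(reindexed["created"].isna().all())

# Normalise the labels, then reindex again
df_demo.columns = df_demo.columns.str.strip()
fixed = df_demo.reindex(columns=["well_id", "created"])
print(fixed["created"].notna().all())
```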

In [None]:
# Where is "extended geological info" stored (Geologic Feature, Remark)
# Note that source column in DATA_Lithostrat is nvarchar(255)  - change seismic_line from 64 to 255?

#df_lithostrat['created'] = df_lithostrat['created'].apply(sqlalchemy.DateTime)
#df_lithostrat['modified'] = pd.to_datetime

#df_lithostrat[['created', 'modified']] = df_lithostrat[['created', 'modified']].apply(pd.to_datetime)
df_lithostrat.dtypes

In [None]:
#well_id and data_type columns should be int, not float
df_lithostrat.well_id = df_lithostrat.well_id.astype('int64')
df_lithostrat.data_type = df_lithostrat.data_type.astype('int64')

# When trying to insert 'created' and 'modified' columns to database, get error:
# TypeError: cannot astype a datetimelike from [datetime64[ns]] to [float64]
# Why are these floats in dbo.DATA_Lithostrat but in other tables like dbo.WELLS they're nvarchar/str?
# Try importing vs manually adding created and modified fields in IC.
# See what it does with floats!

# Useful doc: Data type mappings between Python and SQL Server
# https://docs.microsoft.com/en-us/sql/advanced-analytics/python/python-libraries-and-data-types?view=sql-server-ver15

# Need to convert datetime64[ns] column for 'created' and 'modified' to float64
# But you can't do that! TypeError: cannot astype a datetimelike from [datetime64[ns]] to [float64]

####df_lithostrat['created'] = df_lithostrat['created'].astype('float64')
####df_lithostrat['modified'] = df_lithostrat['modified'].astype('float64')

#df_lithostrat.dtypes

In [None]:
# Temporarily drop 'created' and 'modified' columns
df_lithostrat.drop(['created', 'modified'], axis=1, inplace=True)
df_lithostrat.head()

In [None]:
# Delete all data from dbo.DATA_Lithostrat
stmt = select([func.count(datalithostrat.columns.well_id)])

# Execute the select statement and use the scalar() fetch method to print the record count
print(connection.execute(stmt).scalar())

# Delete all records from dbo.DATA_Lithostrat
delete_stmt = delete(datalithostrat)
result_proxy = connection.execute(delete_stmt)

# Print affected row count
print(result_proxy.rowcount)

#Print results of the executing statement to verify there are no rows
print(connection.execute(stmt).fetchall())

In [None]:
# Write df_lithostrat to SQL Server database

df_lithostrat.to_sql('DATA_Lithostrat', engine, if_exists='replace', index = False)

print('dbo.DATA_Lithostrat')
sqlselect_rows([datalithostrat])

# df_lithostrat.to_sql(name='DATA_Lithostrat', con=engine, if_exists='append', index=False,
#             dtype={'created': sqlalchemy.DateTime(), 
#                    'modified': sqlalchemy.DateTime()})
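
The choice between `if_exists='replace'` and `if_exists='append'` matters for tables that already have a schema. With `replace`, pandas drops the table and recreates it from the dataframe, so columns and constraints that are not in the dataframe (such as an auto-generated key) are lost. With `append`, rows are inserted into the existing table. The sketch below illustrates this with an in-memory SQLite database standing in for SQL Server; the table `demo_lithostrat`, its columns and the legend values are made up for the example.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_lithostrat ("
    "pk_index INTEGER PRIMARY KEY, well_id INTEGER NOT NULL, legend TEXT)"
)


def table_sql(connection, name):
    """Return the CREATE TABLE statement SQLite currently stores for `name`."""
    row = connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row[0]


df_demo = pd.DataFrame({"well_id": [1, 2], "legend": ["A GP", "B FM"]})

# append: the existing table definition, including pk_index, is kept
df_demo.to_sql("demo_lithostrat", conn, if_exists="append", index=False)
sql_after_append = table_sql(conn, "demo_lithostrat")

# replace: the table is dropped and rebuilt from the dataframe columns only
df_demo.to_sql("demo_lithostrat", conn, if_exists="replace", index=False)
sql_after_replace = table_sql(conn, "demo_lithostrat")

print(sql_after_append)
print(sql_after_replace)
```

Whether SQL Server regenerates `pk_index` after an append depends on how that column is defined in the IC schema, which this sketch does not cover.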

<h3>Create IC Dynamic Projects</h3>

In [None]:
# Create new df using arrays
# Populate dbo.PROJECTS with relevant fields to create 4 new dynamic projects:
# ALL WELLS, NORWAY NORTH SEA, NORWAY NORWEGIAN SEA & NORWAY BARENTS SEA

# Note: six not null columns:
    # pk_index int e.g. 1, 
    # project_id int e.g. 113,
    # WellGroupFieldIsUserDefined bit, e.g. 0
    # WellOrderFieldsIsUserDefined bit, e.g. 0
    # f_dynamic bit e.g. 1 (for 
    # f_WellQueryId int (0 for non-dynamic or e.g. 129 for "Exploration Offshore" project with project_id = 21)
    
# IC dialogue requires you to enter a project "Title".

# IC auto-selects "Type: Static", "display units: metres", "Datum: MSL", "Geodatum: ED50", 
    #"Grid system: ED50 / UTM zone 30N". Display tab "Group wells by: [No Grouping]", "Order wells by: Name".

# Database auto-populates:  "TVD_datum: MSL", 
    #defchronodatatype: 0 (74 for several - probably links to Chronostratigraphy - Age), 
    #deftwtdata: NULL, 
    #defchronointerpid: 0, 
    #WellGroupField: 4,0,5,1, etc, 
    #WellGroupFieldIsUserDefined: 0, 
    #WellOrderField: 0,2, 
    #WellOrderFieldIsUserDefined: 0, 
    #WellPatternTable:0, 
    #WellPatternTableLayerField:0, 
    #WellPatternTablePolygonField:0, 
    #DefaultPRMTemplates:{}, 
    #DefaultSummaryCharts:{}, 
    #DefaultWellstickTemplates:{}
    
# Is it OK just to overwrite the existing default IC project with these?

In [None]:
# Create dataframe for dynamic projects

dyprojects_data = {'project_id' : ['1', '2', '3', '4'],
                 'title' : 
                   ['ALL WELLS', 
                    'NORWAY NORTH SEA', 
                    'NORWAY NORWEGIAN SEA', 
                    'NORWAY BARENTS SEA'],
                   #'client' : ['', '', '', ''],
                   #'jobno' : ['', '', '', ''],
                   #'code' : ['', '', '', ''],
                #'notes' : ['', '', '', ''],
                'Units' : ['M', 'M', 'M', 'M'],
                'Map' : [None, None, None, None],  # None is written as SQL NULL; the string 'NULL' would be stored as text
                'datum' : ['4230', '4230', '4230', '4230'],
                'utmzone' : 
                   ['ED50 / UTM zone 31N', 
                    'ED50 / UTM zone 31N', 
                    'ED50 / UTM zone 32N', 
                    'ED50 / UTM zone 34N'],
                'TVD_datum' : ['MSL', 'MSL', 'MSL', 'MSL'],
                'OWTranslation' : ['2', '2', '2', '2'],
                #'f_fieldname' : ['', '', '', ''],
                'defchronodatatype' : ['0', '0', '0', '0'],
                #'deftstprops' : ['', '', '', ''],
                'deftwtdata' : [None, None, None, None],  # None is written as SQL NULL
                'deffaultsdatatype' : ['0', '0', '0', '0'],
                'defchronointerpid' : ['0', '0', '0', '0'],
                'WellGroupField' : ['0', '0', '0', '0'],
                'WellGroupFieldIsUserDefined' : ['0', '0', '0', '0'],
                'WellOrderField' : ['0', '0', '0', '0'],
                'WellOrderFieldIsUserDefined' : ['0', '0', '0', '0'],
                # None is written as SQL NULL; the string 'NULL' would be stored as text
                'WellPatternTable' : [None, None, None, None],
                'WellPatternTableLayerField' : [None, None, None, None],
                'WellPatternTablePolygonField' : [None, None, None, None],
                #'RPMWellTypeField' : ['', '', '', ''],
                #'DefaultRPMTemplates' : ['', '', '', ''],
                'DefaultSummaryCharts' : ['{}', '{}', '{}', '{}'],
                'DefaultWellstickTemplates' : ['{}', '{}', '{}', '{}'],
                'f_dynamic' : ['1', '1', '1', '1'],
                'f_WellQueryId' : ['1', '2', '3', '4']}

# Create temp index, not sent to db
df_dyprojects = pd.DataFrame(dyprojects_data, index = ['project_1', 'project_2', 'project_3', 'project_4'])
df_dyprojects.head()
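
When building a dataframe for `to_sql`, the string `'NULL'` and Python's `None` are not equivalent: the string is stored as four characters of text, whereas `None` is written as a database NULL. The sketch below shows the difference using an in-memory SQLite database as a stand-in for SQL Server; `demo_projects` and its two rows are made up for illustration.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# Project 1 uses the string 'NULL', project 2 uses None
df_demo = pd.DataFrame({"project_id": [1, 2], "Map": ["NULL", None]})
df_demo.to_sql("demo_projects", conn, index=False)

# Only the row written with None satisfies IS NULL
null_ids = [row[0] for row in conn.execute(
    "SELECT project_id FROM demo_projects WHERE Map IS NULL")]
print(null_ids)
```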

In [None]:
# Use just populated columns
#df_dyprojects_short = df_dyprojects.filter(['project_id', 'title', 'WellGroupFieldIsUserDefined', 'WellOrderFieldsIsUserDefined', 'f_dynamic', 'f_WellQueryId'], axis=1)
#df_dyprojects_short.head()

# To do: had trouble filtering this dataframe (pandas thinks it's a list)

In [None]:
# Delete all data from dbo.PROJECTS
stmt = select([func.count(projects.columns.project_id)])

# Execute the select statement and use the scalar() fetch method to print the record count
print(connection.execute(stmt).scalar())

# Delete all records from dbo.PROJECTS
delete_stmt = delete(projects)
result_proxy = connection.execute(delete_stmt)

# Print affected row count
print(result_proxy.rowcount)

# Print results of the executing statement to verify there are no rows
print(connection.execute(stmt).fetchall())

In [None]:
# Write df_dyprojects to SQL Server database

df_dyprojects.to_sql('PROJECTS', engine, if_exists='replace', index = False)

print('dbo.PROJECTS')
sqlselect_rows([projects])

In [None]:
# Now create well queries to use with dynamic projects

wellqueries_data = {'invertresults' : ['0', '0', '0', '0'], 
                    'category' : ['ProjectFolder(1)', 'ProjectFolder(2)', 'ProjectFolder(3)', 'ProjectFolder(4)'], 
                    'query_id' : ['1', '2', '3', '4'], 
                    'project_id' : ['-1', '-1', '-1', '-1'], 
                    'title' : ['Country = NORWAY', 
                               'Location = North Sea', 
                               'Location = Norwegian Sea', 
                               'Location = Barents Sea'], 
                    'nentries' : ['1', '1', '1', '1'], 
                    'pencolour' : ['0', '0', '0', '0'], 
                    'enttype' : ['4', '4', '4', '4'], 
                    'entdatatype' : ['0', '0', '0', '0'], 
                    'entfunction' : ['=', '=', '=', '='], 
                    'entvalue' : ['NORWAY', 'NORTH SEA', 'NORWEGIAN SEA', 'BARENTS SEA'], 
                    'entinfokey' : ['Country', 'Location', 'Location', 'Location'], 
                    'highlightstyle' : ['1', '1', '1', '1'], 
                    'highlightsymbol' : ['4198', '4198', '4198', '4198']}

# Only pk_index is not null

# Create temp index, not sent to db
df_wellqueries = pd.DataFrame(wellqueries_data, index = ['wellquery_1', 'wellquery_2', 'wellquery_3', 'wellquery_4'])
df_wellqueries.head()

In [None]:
# Delete all data from dbo.WELLQUERIES
stmt = select([func.count(wellqueries.columns.query_id)])

# Execute the select statement and use the scalar() fetch method to print the record count
print(connection.execute(stmt).scalar())

# Delete all records from dbo.WELLQUERIES
delete_stmt = delete(wellqueries)
result_proxy = connection.execute(delete_stmt)

# Print affected row count
print(result_proxy.rowcount)

# Print results of the executing statement to verify there are no rows
print(connection.execute(stmt).fetchall())

# To do: turn the above into a function, as the same code keeps being re-used.
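
The repeated count, delete, count pattern can be wrapped in a single helper. The version below is a sketch rather than the notebook's own code: `clear_table` is a hypothetical name, the demo table is made up, an in-memory SQLite engine stands in for SQL Server, and `select(func.count()).select_from(table)` assumes SQLAlchemy 1.4 or later (the cells above use the older `select([...])` form).

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, delete, func, insert, select)


def clear_table(connection, table):
    """Delete every row from `table`.

    Returns a tuple (rows_before, rows_deleted, rows_after).
    """
    count_stmt = select(func.count()).select_from(table)
    rows_before = connection.execute(count_stmt).scalar()
    rows_deleted = connection.execute(delete(table)).rowcount
    rows_after = connection.execute(count_stmt).scalar()
    return rows_before, rows_deleted, rows_after


# Demo on a throwaway in-memory database
demo_engine = create_engine("sqlite://")
demo_metadata = MetaData()
demo_table = Table("demo_queries", demo_metadata,
                   Column("query_id", Integer, primary_key=True),
                   Column("title", String(50)))
demo_metadata.create_all(demo_engine)

with demo_engine.begin() as demo_connection:
    demo_connection.execute(insert(demo_table),
                            [{"query_id": 1, "title": "first"},
                             {"query_id": 2, "title": "second"}])
    counts = clear_table(demo_connection, demo_table)

print(counts)
```

In the notebook each delete cell would then reduce to a call such as `clear_table(connection, wellqueries)`.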

In [None]:
# Write df_wellqueries to SQL Server database

df_wellqueries.to_sql('WELLQUERIES', engine, if_exists='replace', index = False)

print('dbo.WELLQUERIES')
sqlselect_rows([wellqueries])

<h3> PROJECTWELLS (which wells in which Projects)</h3>

In [None]:
# First let's show all the dataframes used in this notebook
%whos DataFrame

In [None]:
# Which of these are written to database via pandas.DataFrame.to_sql:
    #df_explo_dbowells (still to do the same for dev wells?)
    #df_wellsuserfields
    #df_wellsuserfieldsvalues
    #df_lithostrat
    #df_dyprojects
    #df_wellqueries

In [None]:
# Look at first row of wells and project dataframes
# Merge on project = project_id?

df_explo_dbowells.head(1)

In [None]:
df_dyprojects.head(1)

In [None]:
# Need to populate the project column in dbo.WELLS
# This appears empty just now because none of my wells have projects.
# Create a function that populates a project number based on the well's location

#df_explo_dbowells.location.unique()
#df_explo_dbowells.location.isnull().sum()

def location_projectid(row):
    if row == 'NORTH SEA':
        return 2
    elif row == 'NORWEGIAN SEA':
        return 3
    elif row == 'BARENTS SEA':
        return 4
    else:
        return 0

df_explo_dbowells['project'] = df_explo_dbowells['location'].apply(location_projectid)
df_explo_dbowells.head()

In [None]:
# The columns you're joining on (left_on and right_on) must be the same data type! 

# In the SQL database, project in WELLS is nvarchar(25) while project_id in PROJECTS is int.
# In the pandas dataframes written to the database, these 2 columns are float64 and object respectively.
# To do: review every data type written with to_sql and ensure each is correct wrt. the database.

# Wells without a project are NaN, which forces the column to float64. Converting float64 straight
# to str gives '2.0' and 'nan', which never match project_id values such as '2'.
# Fill the gaps with 0 and go via int so that both keys look like '2'.
df_explo_dbowells['project'] = df_explo_dbowells['project'].fillna(0).astype(int).astype(str)
df_dyprojects['project_id'] = df_dyprojects['project_id'].astype(str)

print("df_explo_dbowells['project']: ", df_explo_dbowells['project'].dtypes)
print("df_dyprojects['project_id']: ", df_dyprojects['project_id'].dtypes)

# Converting the column to int without filling the NaN values first produced the error:
# ValueError: invalid literal for int() with base 10: 'nan'
# Both columns are therefore converted to str (pandas dtype object), the nearest equivalent of nvarchar.
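
The join-key problem described above can be reproduced on a small series. A float column converted directly to strings keeps its decimal point, so '2.0' will not match '2' in a merge. The values below are made up for illustration, and `naive_keys` and `clean_keys` are hypothetical names.

```python
import pandas as pd

# float64 because one well has no project (NaN)
project = pd.Series([2.0, float("nan"), 3.0])

# Direct conversion keeps the decimal point
naive_keys = project.astype(str).tolist()

# Fill missing values, cast to integer, then to string
clean_keys = project.fillna(0).astype("int64").astype(str).tolist()

print(naive_keys[0], clean_keys[0])
```

An alternative, not used here, is the nullable integer type `astype('Int64')`, which keeps missing values without falling back to float.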


In [None]:
#dbo.PROJECTWELLS requires 3 columns (all not null):
    #pk_index 11 PK int not null
    #well_id FK 2 int not null
    #project_id 2 FK int not null
    
# Needs to join dbo.WELLS and dbo.PROJECTS?

df_projectwells = df_explo_dbowells.merge(df_dyprojects, left_on='project', right_on='project_id', how='inner')
# If the two key columns have different data types, the merge raises:
# ValueError: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat

df_projectwells[['name', 'well_id', 'project', 'project_id']]

In [None]:
df_projectwells = df_projectwells[['well_id', 'project_id']]

In [None]:
df_projectwells[df_projectwells.project_id == '3']

In [None]:
# Delete all data from dbo.PROJECTWELLS
stmt = select([func.count(projectwells.columns.project_id)])

# Execute the select statement and use the scalar() fetch method to print the record count
print(connection.execute(stmt).scalar())

# Delete all records from dbo.PROJECTWELLS
delete_stmt = delete(projectwells)
result_proxy = connection.execute(delete_stmt)

# Print affected row count
print(result_proxy.rowcount)

# Print results of the executing statement to verify there are no rows
print(connection.execute(stmt).fetchall())

In [None]:
df_projectwells.to_sql('PROJECTWELLS', engine, if_exists='replace', index = False)

print('dbo.PROJECTWELLS')
sqlselect_rows([projectwells])