# Finding All Unique Wells


This notebook parses all of the wellbore data to find every unique well that is being used. We do this to make sure that no well is counted more than once; otherwise we may skew the overall data.

We do this by going through each of the data files, extracting all the unique well names, and storing them in new files. From these files, the program then determines all the unique wells across the given data.

### Helper Functions

Before we start parsing the data, we need to define some functions that handle the different ways wellbore names are stored in each of the wellbore data files.

In [1]:
# A function that takes in a well name, replaces every "/" and "-" with "_",
# and then drops everything from the first space onward

def formatWellName1(name):
    temp = name.replace("/", "_").replace("-", "_")

    # Keep only the part before the first space, if there is one
    space = temp.find(" ")
    if space != -1:
        temp = temp[0:space]

    return temp
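
As a quick check of what this helper does, the sketch below restates it in compact form so the block runs on its own and applies it to two made-up names. The inputs are hypothetical and only illustrate the normalization.

```python
def formatWellName1(name):
    # Same behaviour as the helper above: swap "/" and "-" for "_",
    # then keep only the text before the first space.
    temp = name.replace("/", "_").replace("-", "_")
    return temp.split(" ", 1)[0]

# Hypothetical inputs, chosen only for illustration
print(formatWellName1("6406/3-2"))     # 6406_3_2
print(formatWellName1("6406/3-2 T2"))  # 6406_3_2
```

Both spellings collapse to the same key, which is what lets the later cells treat them as one well.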

In [2]:
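# A function that takes in a string and keeps only the part before the first
# "__". If "__" is not present, the string is returned unchanged.

def formatWellName2(name):
    temp = name
    lastNum = temp.find("__")

    # find() returns -1 when "__" is missing; slicing with -1 would
    # silently drop the last character, so only cut when a match exists
    if lastNum != -1:
        temp = temp[0:lastNum]

    return temp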
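
One edge case is worth keeping in mind with this kind of cut: `str.find` returns -1 when the separator is absent, and a slice ending at -1 drops the last character. The sketch below shows the effect and one possible guard using `str.partition`. The function name `cut_at_double_underscore` and the sample strings are hypothetical.

```python
def cut_at_double_underscore(name):
    # str.partition returns the whole string as its first element when the
    # separator is missing, so names without "__" pass through unchanged.
    head, _sep, _tail = name.partition("__")
    return head

# Hypothetical file names, for illustration only
print(cut_at_double_underscore("15_9-19__shows.pdf"))  # 15_9-19
print(cut_at_double_underscore("15_9-19"))             # 15_9-19

# What an unguarded find-and-slice does when "__" is absent
sample = "15_9-19"
print(sample[0:sample.find("__")])                     # 15_9-1
```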

In [3]:
import numpy as np
import pandas as pd

from pandas import read_excel

# Path prefix of the folder that holds the wellbore spreadsheets.
# Change this to whatever you need for it to work on your machine.
extension = "Wellbore_Data\\"
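
The backslash prefix above is Windows-specific. If the notebook has to run on other systems too, one option is to build paths with `pathlib`, sketched below. The variable name `data_dir` is an assumption and does not appear in the notebook.

```python
from pathlib import Path

# Hypothetical replacement for the string prefix; Path picks the right
# separator for the operating system it runs on.
data_dir = Path("Wellbore_Data")
file_name = data_dir / "core_photos.xlsx"

print(file_name.as_posix())  # Wellbore_Data/core_photos.xlsx
```

`read_excel` accepts a `Path` object directly, so the rest of a cell would not need to change.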

Now we start combing through the different wellbore datasheets to get the unique wells from the wellbore data. 

The way we do this is by going through the given file, looking at the column containing the wellbore name, and storing it in a separate dataframe containing the unique wellbore names for that given file.

Before inserting the current wellbore name into the collection of unique wellbore names, the program checks whether the name is already present. If it is, the name is ignored; if it is not, the name is inserted. A Python set does this check for us.
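
The deduplication step can be sketched without any spreadsheet at all. The list `raw_names` below is hypothetical data standing in for one wellbore-name column, and the helper is restated in compact form so the block is self-contained.

```python
def formatWellName1(name):
    # Compact restatement of the helper defined earlier
    temp = name.replace("/", "_").replace("-", "_")
    return temp.split(" ", 1)[0]

# Hypothetical column values, including a repeat and a suffixed variant
raw_names = ["1/2-1", "1/2-1", "2/4-3", "1/2-1 R"]

unique = set()
for raw in raw_names:
    unique.add(formatWellName1(raw))  # re-adding an existing name is a no-op

print(sorted(unique))  # ['1_2_1', '2_4_3']
```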

In [4]:
# Dealing with the core images

file_name = extension + 'public_core_images_excel_recording.xlsx'
df = read_excel(file_name)

# This will store the unique names of the current file, which will then be
# turned into a dataframe.
unique = set()

# Because of the way that wellbore names are stored in these files, we need
# to use the function to format the name into a more easily manageable state.
for wellName in df["6406_3_2"]:
    unique.add(formatWellName1(str(wellName)))

# By this point, unique contains all the unique names of the file, so we
# put that data into a dataframe that will then be exported to a CSV file.
# (DataFrame.append was removed in pandas 2.0, so the dataframe is built in
# one step; the names are sorted to keep the output reproducible.)
exportDF = pd.DataFrame({'WellName': sorted(unique)})

exportDF.to_csv("unique_public_core_images_excel_recording.csv")

The remaining cells repeat the same steps; only the input file, the name of the wellbore column, and the output file change.
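
Because the per-file cells differ only in those three values, the shared steps could be wrapped in one function. This is a minimal sketch, not the notebook's own code: the function name `unique_wells`, the `simple_format` helper, and the sample dataframe are all hypothetical.

```python
import pandas as pd

def unique_wells(df, column, formatter):
    """Return a one-column dataframe of the distinct formatted names in df[column]."""
    names = {formatter(str(value)) for value in df[column]}
    return pd.DataFrame({"WellName": sorted(names)})

def simple_format(name):
    # Stand-in for formatWellName1 so the block runs on its own
    return name.replace("/", "_").replace("-", "_").split(" ", 1)[0]

# Hypothetical data in place of read_excel(...)
photos = pd.DataFrame({"Well Name": ["7/11-5", "7/11-5", "25/8-1 A"]})

export_df = unique_wells(photos, "Well Name", simple_format)
print(export_df["WellName"].tolist())  # ['25_8_1', '7_11_5']
```

Each cell would then reduce to a `read_excel` call, one call to the function, and a `to_csv` call.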

In [5]:
# Dealing with the core photos

file_name = extension + 'core_photos.xlsx'
df = read_excel(file_name)

unique = set()

for wellName in df["Well Name"]:
    unique.add(formatWellName1(str(wellName)))

# Build the export dataframe directly from the set of names
exportDF = pd.DataFrame({'WellName': sorted(unique)})

exportDF.to_csv("unique_core_photos.csv")

In [6]:
# Dealing with the core descriptions

file_name = extension + 'Digitized_core_descriptions.xlsx'
df = read_excel(file_name)

unique = set()

for wellName in df["WellName"]:
    unique.add(formatWellName1(str(wellName)))

# Build the export dataframe directly from the set of names
exportDF = pd.DataFrame({'WellName': sorted(unique)})

exportDF.to_csv("unique_core_descriptions.csv")

In [7]:
# Dealing with the lithology reports

file_name = extension + 'GEOLINK_Lithology.xlsx'
df = read_excel(file_name)

unique = set()

for wellName in df["Well Name"]:
    unique.add(formatWellName1(str(wellName)))

# Build the export dataframe directly from the set of names
exportDF = pd.DataFrame({'WellName': sorted(unique)})

exportDF.to_csv("unique_GEOLINK.csv")

In [8]:
# Dealing with the NPD reports

file_name = extension + 'NPD_stratigraphic_picks_north_sea.xlsx' 
df = read_excel(file_name)

unique = set()

for wellName in df["Well identifier"]:
    unique.add(formatWellName1(str(wellName)))

# Build the export dataframe directly from the set of names
exportDF = pd.DataFrame({'WellName': sorted(unique)})

# Write to its own file so the GEOLINK output is not overwritten; this is
# the name the final cell expects for the stratigraphic picks
exportDF.to_csv("unique_stratigraphic_picks.csv")

In [9]:
# Dealing with a different set of completion logs

file_name = extension + 'RealLogs_completion_logs.xlsx' 
df = read_excel(file_name)

unique = set()

for wellName in df["Well Name"]:
    unique.add(formatWellName1(str(wellName)))

# Build the export dataframe directly from the set of names
exportDF = pd.DataFrame({'WellName': sorted(unique)})

exportDF.to_csv("unique_RealLogs_completion.csv")

In [10]:
# Dealing with more lithology reports

file_name = extension + 'RealPore_Por_Perm_Lithology_data_1240_Wells_Norway_public.xlsx'
df = read_excel(file_name)

unique = set()

for wellName in df["Well Name"]:
    unique.add(formatWellName1(str(wellName)))

# Build the export dataframe directly from the set of names
exportDF = pd.DataFrame({'WellName': sorted(unique)})

exportDF.to_csv("unique_RealLogs_Por_Perm.csv")

In [11]:
# Dealing with unsorted Wellbore data

file_name = extension + 'Discovery_v1.xlsx'
df = read_excel(file_name)

unique = set()

for wellName in df["dscName"]:
    unique.add(formatWellName1(str(wellName)))

# Build the export dataframe directly from the set of names
exportDF = pd.DataFrame({'WellName': sorted(unique)})

exportDF.to_csv("unique_discovery.csv")


In [12]:
file_name = extension + 'Norwegian_shows__All_data__CC-BY_v1.xlsx'
df = read_excel(file_name)

unique = set()

# These names come from file names, so the second helper is used here
for wellName in df["Filename"]:
    unique.add(formatWellName2(str(wellName)))

# Build the export dataframe directly from the set of names
exportDF = pd.DataFrame({'WellName': sorted(unique)})

exportDF.to_csv("unique_Norwegian_shows.csv")


The last thing this notebook does is set up the dataframe that the next notebook, Finding the Most Common Wells, works from.

The program will compile a list of all the unique wells throughout all the data and set up a blank dataframe that will contain all the unique wells and columns representing all the different files the data came from.

In [13]:
files_to_read = ['unique_core_descriptions.csv',
                 'unique_core_photos.csv',
                 'unique_discovery.csv',
                 'unique_GEOLINK.csv',
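                 # The cell above writes 'unique_Norwegian_shows.csv'; make
                 # sure the name below matches the file that is on disk
                 'unique_Norwegian_shows_v2.csv',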
                 'unique_public_core_images_excel_recording.csv',
                 'unique_RealLogs_Por_Perm.csv',
                 'unique_RealLogs_completion.csv',
                 'unique_stratigraphic_picks.csv']
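# Folder holding the unique_*.csv files. The cells above write them to the
# working directory, so move them here or change this prefix before running.
temp_extension = 'Unique_excels\\'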

unique = set()

# Here we go through all the unique-name files that were generated
# in the previous cells and collect every unique well from those lists.
for csv_name in files_to_read:
    file = temp_extension + csv_name
    df = pd.read_csv(file)

    for wellName in df["WellName"]:
        unique.add(formatWellName1(str(wellName)))

# At this point, unique contains all the unique well names, so now we
# construct the blank dataframe: one row per well, with the count set to 0.
# (DataFrame.append was removed in pandas 2.0, so it is built in one step.)
exportDF = pd.DataFrame({'WellName': sorted(unique), 'Occurances': 0})

exportDF.to_csv("All_Unique_Wells_blank.csv")
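
The compile step can also be sketched on in-memory data. Everything in the block below is hypothetical: the `per_file` dictionary stands in for the `unique_*.csv` files, and naming one zero-filled column after each source is only an assumption about how the per-file columns mentioned above might look.

```python
import pandas as pd

# Hypothetical per-file results standing in for the unique_*.csv files
per_file = {
    "core_photos": pd.DataFrame({"WellName": ["1_2_1", "2_4_3"]}),
    "discovery": pd.DataFrame({"WellName": ["2_4_3", "7_11_5"]}),
}

# Union of every well name seen in any file
all_wells = set()
for frame in per_file.values():
    all_wells.update(frame["WellName"].astype(str))

# Blank dataframe: one row per well, counters initialised to zero
blank = pd.DataFrame({"WellName": sorted(all_wells), "Occurances": 0})
for source in per_file:
    blank[source] = 0  # assumed layout: one column per source file

print(blank.columns.tolist())  # ['WellName', 'Occurances', 'core_photos', 'discovery']
print(len(blank))              # 3
```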