# A notebook for Checking Available Data
## Identify wells, tops, & curves that can be used in the model

------------------------

## Goals for this notebook
- 1.A. Code to identify what tops are common enough to be included
- 1.B. Code to identify what curves are common enough to be included
- 1.C. Create list of wells that include necessary tops and curves based on two steps listed above
- 1.D. Document what wells were not included in training and why!
- 1.E. Write to a file an initial dataset of wells to include and tops to include.
- 2.A. Split the dataset from the step above into 80/20 train/test subsets and label each row as such with a new column, assigning the split by well rather than by row.
- 2.B. There are many more "not near a pick" rows than "at or near a pick" rows. We'll need to create a column marking which rows in the "away from pick" category to exclude in order to have balanced classes.
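
Goal 2.A is not implemented in this notebook. As a rough sketch of what a well-based split could look like, the block below shuffles the unique well IDs and labels every row of a well with the same value. The function name, the `trainOrTest` column name, the seed, and the toy dataframe are all assumptions made for illustration, not part of the original workflow.

```python
import numpy as np
import pandas as pd

def label_train_test_by_well(df, well_col="SitID", test_fraction=0.2, seed=42):
    """Return a copy of df with a 'trainOrTest' column assigned per well, not per row."""
    rng = np.random.RandomState(seed)
    # Sort first so the shuffle is reproducible regardless of row order.
    well_ids = np.sort(df[well_col].unique())
    shuffled = rng.permutation(well_ids)
    n_test = int(round(len(shuffled) * test_fraction))
    test_wells = shuffled[:n_test]
    labeled = df.copy()
    labeled["trainOrTest"] = np.where(labeled[well_col].isin(test_wells), "test", "train")
    return labeled

# Toy data (made up): 10 wells with 3 depth rows each.
toy = pd.DataFrame({
    "SitID": np.repeat(np.arange(1, 11), 3),
    "DEPT": np.tile([100.0, 100.25, 100.5], 10),
})
labeled = label_train_test_by_well(toy)
wells_per_label = labeled.groupby("trainOrTest")["SitID"].nunique()
labels_per_well = labeled.groupby("SitID")["trainOrTest"].nunique()
print(wells_per_label)
```

Splitting on wells keeps all rows from one well on the same side of the split, which avoids leaking nearly identical neighboring depth rows from a training well into the test set.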

## THINGS YOU MIGHT NEED TO CHANGE IN THIS NOTEBOOK!
1. Links to the various files!
2. Decide the minimum number of picks a top must have before you can work with that top!
3. Decide the minimum number of wells that must contain a curve before that curve becomes a required curve for including a well in the dataset you'll work with.

# Import necessary libraries

In [1]:
import pandas as pd
import numpy as np
import itertools
import matplotlib.pyplot as plt
%matplotlib inline
import welly
from welly import Well
import lasio
import glob
from sklearn import neighbors
import pickle
import math
import dask
import dask.dataframe as dd
from dask.distributed import Client
from dask import delayed
from dask import compute
# import pdvega
# import vega


In [2]:
print(welly.__version__)
print(dask.__version__)
print(pd.__version__)

0.3.5
0.17.5
0.23.0


In [3]:
%%timeit
import os
env = %env

87.8 µs ± 5.75 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [4]:
from IPython.display import display
#### Had to change display options to get this to print in full!
# pd.set_option('display.height', 1000)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.options.display.max_colwidth = 100000

## Import Initial Data

#### YOU'LL WANT TO CHANGE THESE LINKS IF YOU USE DIFFERENT DATA OR CHANGE LOCATION OF NOTEBOOK OR DATA

In [5]:
picks_dic = pd.read_csv('../../../SPE_006_originalData/OilSandsDB/PICKS_DIC.TXT',delimiter='\t')
picks = pd.read_csv('../../../SPE_006_originalData/OilSandsDB/PICKS.TXT',delimiter='\t')
wells = pd.read_csv('../../../SPE_006_originalData/OilSandsDB/WELLS.TXT',delimiter='\t')
gis = pd.read_csv('../../../well_lat_lng.csv')

In [6]:
picks.head()

Unnamed: 0,SitID,HorID,Pick,Quality
0,102496,1000,321.0,1
1,102496,2000,,-1
2,102496,3000,,-1
3,102496,4000,,-1
4,102496,5000,438.0,2


In [7]:
picks_dic

Unnamed: 0,HorID,Descriptopn
0,1000,mannville
1,2000,t61
2,3000,t51
3,4000,t41
4,5000,t31
5,6000,clw_wab
6,7000,t21
7,8000,e20
8,9000,t15
9,9500,e14


In [8]:
wells.head()

Unnamed: 0,SitID,UWI (AGS),UWI
0,102496,674010812000,00/12-08-067-01W4/0
1,102497,674020807000,00/07-08-067-02W4/0
2,102498,674021109000,00/09-11-067-02W4/0
3,102500,674022910000,00/10-29-067-02W4/0
4,102501,674023406000,00/06-34-067-02W4/0


In [9]:
gis.head()

Unnamed: 0,SitID,UWI (AGS),UWI,HorID,Pick,Quality,lat,lng
0,102496,674010812000,00/12-08-067-01W4/0,13000,475,3,54.785907,-110.12932
1,102497,674020807000,00/07-08-067-02W4/0,13000,515,3,54.782284,-110.269446
2,102498,674021109000,00/09-11-067-02W4/0,13000,480,3,54.785892,-110.186851
3,102500,674022910000,00/10-29-067-02W4/0,13000,549,3,54.829624,-110.269422
4,102501,674023406000,00/06-34-067-02W4/0,13000,529,2,54.840471,-110.224832


## Question 1: How many wells are included for each top? 

In [10]:
listOfTops = picks_dic.HorID.unique()

In [11]:
listOfTops

array([ 1000,  2000,  3000,  4000,  5000,  6000,  7000,  8000,  9000,
        9500, 10000, 11000, 12000, 13000, 14000])

### We'll use the fact that absent picks are categorized as -1 in terms of quality to exclude those and then count the rest that remain. Your input files might require a different methodology!

In [63]:
#### produces dataframe with no picks that have a value of zero
noZeroPicks = picks[picks.Pick != 0]
#### produces dataframe that doesn't have any picks with a quality of negative one, meaning not to be trusted or present
noNullPicks = noZeroPicks[noZeroPicks.Quality != -1]
#### produces dataframe of horID and counts of non-zero,non-null picks
pick_counts = noNullPicks.groupby('HorID').SitID.count()

In [55]:
pick_counts

HorID
1000     1903
2000      517
3000      531
4000      597
5000     2188
6000      461
7000     2191
9000     2184
9500     2184
10000    2187
11000    2184
12000    2182
13000    2184
14000    2169
Name: SitID, dtype: int64

In [56]:
#### The total number of wells with any sort of pick is:
wells_with_picks_array = picks.SitID.unique()
print("number of wells with picks of some sort is: ",len(wells_with_picks_array))

number of wells with picks of some sort is:  2193


### A human decision is required to determine the minimum number of picks a top needs before we do anything with that top. For our purposes, we'll limit to those with at least 1900
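
As a small sketch of applying that threshold, the block below filters a Series of counts by the chosen minimum. The counts are copied from the `pick_counts` output above; the variable names `pick_counts_example` and `min_wells_per_top` are made up for this example.

```python
import pandas as pd

# Counts copied from the pick_counts output above (HorID -> number of wells with a usable pick).
pick_counts_example = pd.Series(
    {1000: 1903, 2000: 517, 3000: 531, 4000: 597, 5000: 2188, 6000: 461,
     7000: 2191, 9000: 2184, 9500: 2184, 10000: 2187, 11000: 2184,
     12000: 2182, 13000: 2184, 14000: 2169},
    name="SitID",
)

# The human-chosen threshold from the text; change it for a different dataset.
min_wells_per_top = 1900

# Boolean masks keep only the tops at or above (or below) the threshold.
tops_common_enough = pick_counts_example[pick_counts_example >= min_wells_per_top].index.tolist()
tops_too_rare = pick_counts_example[pick_counts_example < min_wells_per_top].index.tolist()

print("tops to keep:", tops_common_enough)
print("tops to drop:", tops_too_rare)
```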

### We're most interested in wells that have both the Top and Base McMurray picks, so let's see how many wells have both and get that list of wells.

In [57]:
topsMustHave = [13000,14000]

#### Idea for this task:
- Make a list of wells for each top in the topsMustHave list
- Find the wells that exist in all of the lists
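
The same idea can also be sketched with a pandas `groupby` instead of explicit lists and sets. This is an alternative illustration on a made-up toy table, not the method used in the cells that follow; `toy_picks` and `required_tops` are hypothetical names.

```python
import pandas as pd

# Toy picks table with the same columns as PICKS.TXT; the values are made up.
toy_picks = pd.DataFrame({
    "SitID":   [1, 1, 2, 2, 3, 3],
    "HorID":   [13000, 14000, 13000, 14000, 13000, 14000],
    "Pick":    [475.0, 520.0, 480.0, None, 515.0, 560.0],
    "Quality": [3, 2, 3, -1, 1, 2],
})
required_tops = [13000, 14000]

# Keep only trusted picks (Quality != -1) for the tops we care about.
valid = toy_picks[(toy_picks.Quality != -1) & toy_picks.HorID.isin(required_tops)]

# Count how many distinct required tops each well has.
tops_per_well = valid.groupby("SitID").HorID.nunique()

# A well qualifies only if it has every required top.
wells_with_all_tops = tops_per_well[tops_per_well == len(required_tops)].index.tolist()
print(wells_with_all_tops)
```

In this toy table, well 2 is dropped because its 14000 pick has a quality of -1.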

In [58]:
def findWellsThatHaveCertainTop(top):
    #### Takes in top
    #### Returns a list of wells with the given top
    #print(top)
    rows_with_picks = picks[picks.Quality != -1]
    #print(rows_with_picks[0:4])
    rows_with_that_top = list(rows_with_picks.loc[rows_with_picks['HorID'] == top].SitID.unique())
    #print("before return",rows_with_that_top)
    return rows_with_that_top

In [59]:
def findWellsWithAllTopsGive(tops):
    #### Takes in a list of tops
    #### Returns a list of wells that include all of those tops. If only one top occurs, well is not included
    list_of_wells_with_tops = []
    for top in tops:
        list_of_wells_with_tops.append(findWellsThatHaveCertainTop(top))
    result = set(list_of_wells_with_tops[0])
    for s in list_of_wells_with_tops[1:]:
        result.intersection_update(s)
    return list(result)

In [60]:
wells_with_all_given_tops = findWellsWithAllTopsGive(topsMustHave)

In [61]:
len(wells_with_all_given_tops)

2164

In [62]:
wells_with_all_given_tops

[106501,
 106503,
 106507,
 106508,
 163855,
 106512,
 163860,
 106518,
 106519,
 122903,
 106524,
 106526,
 106527,
 106529,
 106532,
 106533,
 106534,
 106537,
 106538,
 106540,
 106544,
 106547,
 106548,
 106552,
 106553,
 106554,
 106555,
 106556,
 106558,
 114753,
 106562,
 114754,
 114755,
 106565,
 114756,
 114758,
 106569,
 114761,
 106573,
 163919,
 114768,
 106577,
 163921,
 163923,
 106580,
 114772,
 114773,
 163925,
 106584,
 114779,
 114783,
 114784,
 106594,
 114789,
 122982,
 106600,
 106602,
 114795,
 114799,
 106608,
 114801,
 106613,
 114805,
 106617,
 114810,
 114811,
 106621,
 106626,
 114819,
 114822,
 106633,
 106635,
 114829,
 123024,
 114833,
 114834,
 163985,
 106644,
 106645,
 114836,
 114838,
 163986,
 106649,
 106650,
 106651,
 106652,
 114842,
 114843,
 114847,
 106656,
 114849,
 114850,
 106659,
 114844,
 106661,
 106662,
 106663,
 114855,
 106665,
 114857,
 106669,
 114861,
 114862,
 114864,
 106673,
 106674,
 114865,
 106676,
 114869,
 106679,
 106680,
 

# Import the logs and see how common each curve name is

In [20]:
def findAllCurvesInGivenWells(path):
    objectOfCurves = {}
    for fn in glob.glob(path):
        las = lasio.read(fn, ignore_data=True)
        mnemonics = [c.mnemonic for c in las.curves]
        fnShort = fn.replace("../../../SPE_006_originalData/OilSandsDB/Logs/","")
        objectOfCurves[fnShort] = mnemonics
    #print(fn + '\n\t' + '\n\t'.join(mnemonics))
    return objectOfCurves
    


In [21]:
las_path = '../../../SPE_006_originalData/OilSandsDB/Logs/*.LAS'

In [22]:
objectOfCurves = findAllCurvesInGivenWells(las_path)

Header section Parameter regexp=~P was not found.


In [23]:
print(objectOfCurves)

{'00-07-27-073-18W4-0.LAS': ['DEPT', 'GR', 'DPHI', 'NPHI', 'ILD'], '00-10-21-078-26W4-0.LAS': ['DEPT', 'DELT', 'GR', 'ILD'], '00-10-16-092-19W4-0.LAS': ['DEPT', 'CALI', 'GR', 'NPHI', 'DPHI', 'ILD'], 'AA-10-13-091-12W4-0.LAS': ['DEPT', 'GR', 'DPHI', 'ILD'], '00-07-35-078-10W4-0.LAS': ['DEPT', 'RHOB', 'GR', 'DPHI', 'NPHI', 'ILD'], 'AB-06-17-094-09W4-0.LAS': ['DEPT', 'GR', 'ILD', 'NPHI'], 'AA-15-36-096-11W4-0.LAS': ['DEPT', 'GR', 'ILD', 'NPHI', 'RHOB', 'DPHI'], '00-07-11-078-02W5-0.LAS': ['DEPT', 'ILD', 'NPHI', 'DPHI', 'CALI', 'GR'], '00-10-26-076-09W4-0.LAS': ['DEPT', 'GR', 'DPHI', 'NPHI', 'ILD'], '00-10-08-067-17W4-0.LAS': ['DEPT', 'ILD', 'DPHI', 'NPHI', 'GR', 'CALI'], '00-15-20-072-21W4-0.LAS': ['DEPT', 'DPHI', 'NPHI', 'GR', 'CALI', 'ILD'], '00-11-04-092-18W4-0.LAS': ['DEPT', 'CALI', 'GR', 'NPHI', 'DPHI', 'ILD'], '00-07-36-075-23W4-0.LAS': ['DEPT', 'ILD', 'DPHI', 'NPHI', 'GR', 'CALI'], '00-06-32-069-04W4-0.LAS': ['DEPT', 'ILD', 'DPHI', 'NPHI', 'GR'], '00-11-09-079-15W4-0.LAS': ['DEPT',

In [24]:
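def countCurveOccurrences(objectOfCurves):
    #### Takes in an object where keys are file names and values are lists of curve names
    #### Returns an object where keys are curve names and values are the number of times each curve occurs
    allCurves = []
    for curveList in objectOfCurves.values():
        allCurves = allCurves + curveList
    curveCounts = {}
    for eachCurve in set(allCurves):
        curveCounts[eachCurve] = allCurves.count(eachCurve)
    return curveCounts

In [25]:
#### The function is named differently from the result so that re-running this cell does not overwrite the function
countsOfCurves = countCurveOccurrences(objectOfCurves)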

In [26]:
countsOfCurves

{'DPHI:1': 1,
 'NPHI': 2008,
 'CALI': 783,
 'DT': 14,
 'GR:2': 1,
 'LLD': 2,
 'DELT': 98,
 'PHIN': 4,
 'DEPTH': 7,
 'DPHI': 1917,
 'IL': 2,
 'LLS': 1,
 'SN': 1,
 'RESD': 6,
 'SFL': 3,
 'RHOB': 132,
 'SNP': 2,
 'GR': 2169,
 'ILD': 2154,
 'COND': 3,
 'DPHI:2': 1,
 'SP': 14,
 'LITH': 1,
 'ILD:2': 1,
 'ILD:1': 1,
 'DEPT': 2164,
 'RT': 1,
 'ILM': 6,
 'PHID': 8,
 'GR:1': 1,
 'SFLU': 6,
 'DENS': 4}

### One thing to note is that there are some curves with slightly different names that might be the exact same thing.
For example, GR:1 might be identical to GR, or it might be different enough that we wouldn't want to treat them the same way. I don't have information on that, so I'll just skip the wells that use GR:1 for gamma ray.

### However, there are seven wells with DEPTH instead of DEPT. I'll include those for now, as that's just a spelling difference, but I will need to remember to rename that column later when I import the logs!
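
One possible way to handle such spelling differences is an alias table applied to the curve names before counting or filtering. The sketch below assumes a hypothetical `MNEMONIC_ALIASES` dictionary and helper function; only the DEPTH to DEPT mapping comes from the text above, and the toy file names are made up.

```python
import pandas as pd

# Hypothetical alias table: only DEPTH -> DEPT is supported by the text above.
MNEMONIC_ALIASES = {"DEPTH": "DEPT"}

def normalize_mnemonics(curves_by_file, aliases=MNEMONIC_ALIASES):
    """Return a new dict with aliased curve names replaced; the input is not modified."""
    return {
        file_name: [aliases.get(mnemonic, mnemonic) for mnemonic in mnemonics]
        for file_name, mnemonics in curves_by_file.items()
    }

# Toy stand-in for the objectOfCurves dictionary.
toy_curves = {
    "well_a.LAS": ["DEPT", "GR", "ILD"],
    "well_b.LAS": ["DEPTH", "GR"],
}
normalized = normalize_mnemonics(toy_curves)
print(normalized)

# The same alias table can rename columns once log data is loaded into a dataframe.
toy_log = pd.DataFrame({"DEPTH": [100.0, 100.25], "GR": [55.0, 60.0]})
renamed_log = toy_log.rename(columns=MNEMONIC_ALIASES)
print(list(renamed_log.columns))
```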

### Let's set the minimum number of wells we want to have the common curves to be 1900. If your dataset is different, you'll likely want to change this number

In [27]:
minNumberCurves = 1900

In [28]:
def getCurvesInMinNumberOfWells(minNumberCurves,countsOfCurves):
    #### Takes in a minimum number of wells that need to have a specific curve and an object where keys are curve names and values are the counts of that curve across all wells.
    #### Returns an array of curve names that are found in at least the given number of wells.
    curvePlusCountArray = countsOfCurves.items()
    onlyPlentifulCurvesArray = []
    for curve in curvePlusCountArray:
        if curve[1] >= minNumberCurves:
            onlyPlentifulCurvesArray.append(curve[0])
    return onlyPlentifulCurvesArray

In [29]:
plentifulCurves = getCurvesInMinNumberOfWells(minNumberCurves,countsOfCurves)

In [30]:
plentifulCurves

['NPHI', 'DPHI', 'GR', 'ILD', 'DEPT']
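
The counting and thresholding steps above can also be sketched more compactly with `collections.Counter`. This is an illustrative alternative on made-up data, and it assumes each curve name appears at most once per file; the names `toy_curves_by_file` and `toy_min_wells` are hypothetical.

```python
from collections import Counter
from itertools import chain

# Toy stand-in for objectOfCurves: file name -> list of curve names.
toy_curves_by_file = {
    "a.LAS": ["DEPT", "GR", "ILD"],
    "b.LAS": ["DEPT", "GR", "NPHI"],
    "c.LAS": ["DEPT", "GR", "ILD", "RHOB"],
}
toy_min_wells = 2  # stand-in for the 1900 used with the real dataset

# Flatten all curve lists and count how often each curve name occurs.
curve_counts = Counter(chain.from_iterable(toy_curves_by_file.values()))

# Keep curves found in at least the minimum number of wells.
common_curves = sorted(c for c, n in curve_counts.items() if n >= toy_min_wells)
print(curve_counts)
print(common_curves)
```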

### Now let's find all the wells that have all of those curves!

In [31]:
def findWellsWithCertainCurves(objectOfCurves,plentifulCurves):
    #### Function takes in an object with keys that are well names and values that are all curves in that well and as the second argument an array of plentiful curves expected to be in every well
    #### Function returns an array of wells that have the specified curves in the second argument.
    wellsWithWantedCurves = []
    for eachWell in objectOfCurves.keys():
        if set(plentifulCurves).issubset(objectOfCurves[eachWell]):
            wellsWithWantedCurves.append(eachWell)
    return wellsWithWantedCurves

In [32]:
wellsWithNeededCurvesList = findWellsWithCertainCurves(objectOfCurves,plentifulCurves)

In [33]:
print("number of wells with all the required curves is",len(wellsWithNeededCurvesList))

number of wells with all the required curves is 1848


### NOTE! when we import the wells for real, we should add in the wells that have DEPTH instead of DEPT and rename the curve to DEPT!
Those wells are....

In [34]:
print(plentifulCurves)

['NPHI', 'DPHI', 'GR', 'ILD', 'DEPT']


In [35]:
def getCurvesListWithDifferentCurveName(originalCurveList,origCurve,newCurve):
    #### Takes in list of curves, curve name to be replaced, and curve name to replace with.
    #### Returns a copy of the given curve list with the original curve name replaced by the new curve name
    plentifulCurves_wDEPTH = originalCurveList.copy()
    plentifulCurves_wDEPTH.remove(origCurve)
    plentifulCurves_wDEPTH.append(newCurve)
    return plentifulCurves_wDEPTH

In [36]:
newCurveList = getCurvesListWithDifferentCurveName(plentifulCurves,'DEPT','DEPTH')
newCurveList

['NPHI', 'DPHI', 'GR', 'ILD', 'DEPTH']

In [37]:
wellsWithNeededCurvesListButDEPTHinsteadDEPT = findWellsWithCertainCurves(objectOfCurves,newCurveList)
print("number of wells with all the required curves but DEPTH instead of DEPT is",len(wellsWithNeededCurvesListButDEPTHinsteadDEPT))

number of wells with all the required curves but DEPTH instead of DEPT is 0


Hmmm, zero? Let's see if we can get those 7 wells that we know have DEPTH instead of DEPT to appear if we reduce the other curve names?

In [38]:
wellsWithNeededCurvesListButDEPTHinsteadDEPT = findWellsWithCertainCurves(objectOfCurves,['GR','DEPTH'])
print("number of wells with GR and DEPTH is",len(wellsWithNeededCurvesListButDEPTHinsteadDEPT))

number of wells with GR and DEPTH is 7


In [39]:
wellsWithNeededCurvesListButDEPTHinsteadDEPT = findWellsWithCertainCurves(objectOfCurves,['GR','DEPT'])
print("number of wells with GR and DEPT is",len(wellsWithNeededCurvesListButDEPTHinsteadDEPT))

number of wells with GR and DEPT is 2162


In [40]:
wellsWithNeededCurvesListButDEPTHinsteadDEPT = findWellsWithCertainCurves(objectOfCurves,['ILD', 'NPHI', 'GR','DEPT'])
print("number of wells with ILD, NPHI, GR, and DEPT is",len(wellsWithNeededCurvesListButDEPTHinsteadDEPT))

number of wells with ILD, NPHI, GR, and DEPT is 2000


In [41]:
wellsWithNeededCurvesListButDEPTHinsteadDEPT = findWellsWithCertainCurves(objectOfCurves,['ILD', 'GR', 'DPHI','DEPT'])
print("number of wells with ILD, GR, DPHI, and DEPT is",len(wellsWithNeededCurvesListButDEPTHinsteadDEPT))

number of wells with ILD, GR, DPHI, and DEPT is 1911


In [42]:
wellsWithNeededCurvesListButDEPTHinsteadDEPT = findWellsWithCertainCurves(objectOfCurves,['ILD', 'GR', 'DEPT'])
print("number of wells with ILD, GR, and DEPT is",len(wellsWithNeededCurvesListButDEPTHinsteadDEPT))

number of wells with ILD, GR, and DEPT is 2153


In [43]:
wellsWithNeededCurvesListButDEPTHinsteadDEPT = findWellsWithCertainCurves(objectOfCurves,['ILD', 'GR', 'DEPTH'])
print("number of wells with all the required curves but DEPTH instead of DEPT is",len(wellsWithNeededCurvesListButDEPTHinsteadDEPT))

number of wells with all the required curves but DEPTH instead of DEPT is 0


In [44]:
wellsWithNeededCurvesListButDEPTHinsteadDEPT = findWellsWithCertainCurves(objectOfCurves,['ILD', 'NPHI', 'GR', 'DPHI', 'DEPT'])
print("number of wells with ILD, NPHI, GR, DPHI, and DEPT is",len(wellsWithNeededCurvesListButDEPTHinsteadDEPT))

number of wells with ILD, NPHI, GR, DPHI, and DEPT is 1848


#### Analysis

If we only use wells that have 'ILD', 'NPHI', 'GR', 'DPHI' plus a depth curve, 1848 wells are available, versus 2153 if we require only GR, ILD, and depth.

In previous runs, only ILD and GR were treated as mandatory. It is probably worth trying both ways: population one would have ~2150 wells, and population two would have only ~1850 wells but would add two porosity logs, NPHI (neutron porosity) and DPHI (density porosity).

Some notes on DPHI and NPHI logs <a href="http://www.pe.tamu.edu/blasingame/data/z_zCourse_Archive/P663_10B/P663_Schechter_Notes/PETE_663_DEN_NEUTR.pdf">here</a>.
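
The comparison above was built up one cell at a time. As a sketch, the same kind of tabulation can be done in one loop over candidate curve sets. The toy dictionary and the names `candidate_curve_sets` and `count_wells_with_curves` are assumptions for illustration; with the real data, `objectOfCurves` would take the place of the toy dictionary.

```python
# Toy stand-in for objectOfCurves: file name -> list of curve names.
toy_curves_by_file = {
    "a.LAS": ["DEPT", "GR", "ILD", "NPHI", "DPHI"],
    "b.LAS": ["DEPT", "GR", "ILD"],
    "c.LAS": ["DEPT", "GR", "ILD", "NPHI"],
    "d.LAS": ["DEPT", "GR"],
}

# Candidate sets of required curves, from least to most demanding.
candidate_curve_sets = [
    ("GR", "ILD", "DEPT"),
    ("GR", "ILD", "NPHI", "DEPT"),
    ("GR", "ILD", "NPHI", "DPHI", "DEPT"),
]

def count_wells_with_curves(curves_by_file, required):
    """Count the files whose curve list contains every required curve."""
    required = set(required)
    return sum(1 for mnemonics in curves_by_file.values() if required.issubset(mnemonics))

wells_per_curve_set = {
    curve_set: count_wells_with_curves(toy_curves_by_file, curve_set)
    for curve_set in candidate_curve_sets
}
for curve_set, n_wells in wells_per_curve_set.items():
    print(n_wells, "wells have", ", ".join(curve_set))
```

Seeing the counts side by side makes the trade-off explicit: each added required curve can only keep or shrink the number of usable wells.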

#### Given that in earlier runs I think I just did ILD & GR, I'll do 'ILD', 'NPHI', 'GR', 'DPHI' this time!

In [45]:
wellsWithNeededCurvesList_real = findWellsWithCertainCurves(objectOfCurves,['ILD', 'NPHI', 'GR', 'DPHI', 'DEPT'])
print("number of wells with all the required curves is",len(wellsWithNeededCurvesList_real))

number of wells with all the required curves is 1848


# Make list of wells that includes both the minimum required curves & minimum required tops

In [46]:
wells_with_all_given_tops

[106501,
 106503,
 106507,
 106508,
 163855,
 106512,
 163860,
 106518,
 106519,
 122903,
 106524,
 106526,
 106527,
 106529,
 106532,
 106533,
 106534,
 106537,
 106538,
 106540,
 106544,
 106547,
 106548,
 106552,
 106553,
 106554,
 106555,
 106556,
 106558,
 114753,
 106562,
 114754,
 114755,
 106565,
 114756,
 114758,
 106569,
 114761,
 106573,
 163919,
 114768,
 106577,
 163921,
 163923,
 106580,
 114772,
 114773,
 163925,
 106584,
 114779,
 114783,
 114784,
 106594,
 114789,
 122982,
 106600,
 106602,
 114795,
 114799,
 106608,
 114801,
 106613,
 114805,
 106617,
 114810,
 114811,
 106621,
 106626,
 114819,
 114822,
 106633,
 106635,
 114829,
 123024,
 114833,
 114834,
 163985,
 106644,
 106645,
 114836,
 114838,
 163986,
 106649,
 106650,
 106651,
 106652,
 114842,
 114843,
 114847,
 106656,
 114849,
 114850,
 106659,
 114844,
 106661,
 106662,
 106663,
 114855,
 106665,
 114857,
 106669,
 114861,
 114862,
 114864,
 106673,
 106674,
 114865,
 106676,
 114869,
 106679,
 106680,
 

In [47]:
wellsWithNeededCurvesList_real

['00-07-27-073-18W4-0.LAS',
 '00-10-16-092-19W4-0.LAS',
 '00-07-35-078-10W4-0.LAS',
 'AA-15-36-096-11W4-0.LAS',
 '00-07-11-078-02W5-0.LAS',
 '00-10-26-076-09W4-0.LAS',
 '00-10-08-067-17W4-0.LAS',
 '00-15-20-072-21W4-0.LAS',
 '00-11-04-092-18W4-0.LAS',
 '00-07-36-075-23W4-0.LAS',
 '00-06-32-069-04W4-0.LAS',
 '00-11-09-079-15W4-0.LAS',
 '00-08-29-077-10W4-0.LAS',
 'AA-15-14-101-13W4-0.LAS',
 '00-10-20-075-18W4-0.LAS',
 '00-10-27-078-07W4-0.LAS',
 '00-10-08-076-06W4-0.LAS',
 '00-10-09-071-26W4-0.LAS',
 '00-10-03-094-21W4-0.LAS',
 '00-13-26-080-22W4-0.LAS',
 '00-14-03-093-19W4-0.LAS',
 '00-06-07-073-06W4-0.LAS',
 '00-11-16-095-18W4-0.LAS',
 'AA-10-23-082-18W4-0.LAS',
 '00-06-32-073-09W4-0.LAS',
 'AA-12-12-099-15W4-0.LAS',
 '00-05-25-081-03W5-0.LAS',
 '00-10-23-081-20W4-0.LAS',
 '00-11-26-078-01W5-0.LAS',
 'AB-10-18-096-10W4-0.LAS',
 '00-15-31-084-14W4-0.LAS',
 '00-11-29-074-01W4-0.LAS',
 'AA-10-18-098-10W4-0.LAS',
 '00-10-30-071-12W4-0.LAS',
 '00-11-33-077-17W4-0.LAS',
 '00-11-18-074-08W4-

### These two lists are different. One holds SitID values and the other holds LAS file names. The function below converts between them, finds the wells in common, and returns those as a new list of wells.
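
The conversion rests on one observation: an LAS file name is the UWI with each "/" replaced by "-", plus the ".LAS" extension. A minimal vectorized sketch of that mapping follows, using three rows copied from the `wells.head()` output above. The column name `las_filename` and the toy lists are made up for this example.

```python
import pandas as pd

# Three rows copied from the wells.head() output above.
toy_wells = pd.DataFrame({
    "SitID": [102496, 102497, 102498],
    "UWI": ["00/12-08-067-01W4/0", "00/07-08-067-02W4/0", "00/09-11-067-02W4/0"],
})

# Build the LAS file name from the UWI: "/" becomes "-", then ".LAS" is appended.
toy_wells["las_filename"] = toy_wells["UWI"].str.replace("/", "-", regex=False) + ".LAS"
sitid_to_las = dict(zip(toy_wells["SitID"], toy_wells["las_filename"]))

# Made-up inputs: SitIDs that have the required tops, file names that have the required curves.
toy_wells_with_tops = [102496, 102498]
toy_las_with_curves = ["00-12-08-067-01W4-0.LAS", "00-07-08-067-02W4-0.LAS"]

# Wells usable for modeling must appear in both lists.
usable = sorted({sitid_to_las[s] for s in toy_wells_with_tops} & set(toy_las_with_curves))
print(usable)
```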

In [48]:
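def findWellsWithGivenTopsCurves(wells_with_all_given_tops,wellsWithNeededCurvesList_real):
    #### Takes in a list of SitIDs that have the required tops and a list of LAS file names that have the required curves
    #### Returns a list of LAS file names for wells that have both. Relies on the global wells dataframe.
    new_wells = wells.set_index('SitID').T.to_dict('list')
    for key in new_wells:
        #### index 1 holds the UWI; the LAS file name built from it is appended at index 2
        new_wells[key].append(new_wells[key][1].replace("/","-")+".LAS")
    new_wells_with_all_given_tops = []
    for well in wells_with_all_given_tops:
        new_wells_with_all_given_tops.append(new_wells[well][2])
    return list(set(new_wells_with_all_given_tops).intersection(wellsWithNeededCurvesList_real))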

In [49]:
WellsWithGivenTopsCurves = findWellsWithGivenTopsCurves(wells_with_all_given_tops,wellsWithNeededCurvesList_real)

## List of wells to be used, as LAS file names

In [50]:
WellsWithGivenTopsCurves

['00-07-35-070-11W4-0.LAS',
 '00-06-10-074-25W4-0.LAS',
 '00-14-09-096-15W4-0.LAS',
 '00-11-34-077-03W4-0.LAS',
 '00-07-32-076-15W4-0.LAS',
 'AA-06-11-089-20W4-0.LAS',
 'AA-07-35-093-12W4-0.LAS',
 'AA-10-11-074-07W4-0.LAS',
 'AA-01-01-096-07W4-0.LAS',
 '00-07-29-078-07W4-0.LAS',
 '00-13-09-068-11W4-0.LAS',
 '00-07-24-075-13W4-0.LAS',
 '00-08-11-069-08W4-0.LAS',
 '00-14-22-087-19W4-0.LAS',
 'AA-07-26-080-08W4-0.LAS',
 '00-10-21-069-09W4-0.LAS',
 '00-07-26-079-26W4-0.LAS',
 '00-07-26-090-22W4-0.LAS',
 '00-02-29-080-13W4-0.LAS',
 '00-15-36-080-19W4-0.LAS',
 '00-07-22-077-05W4-0.LAS',
 'AA-11-13-100-13W4-0.LAS',
 '00-06-36-085-20W4-0.LAS',
 '00-07-01-070-27W4-0.LAS',
 '00-16-33-093-21W4-0.LAS',
 '00-08-09-079-05W4-0.LAS',
 '00-06-28-076-19W4-0.LAS',
 '00-10-29-067-08W4-0.LAS',
 'AA-10-29-086-09W4-0.LAS',
 '00-04-13-077-05W4-0.LAS',
 '00-15-35-093-21W4-0.LAS',
 '00-11-04-081-04W4-0.LAS',
 'AA-15-14-101-13W4-0.LAS',
 '00-06-29-071-22W4-0.LAS',
 '00-15-23-075-04W4-0.LAS',
 'AA-11-06-085-11W4-

In [51]:
print(len(WellsWithGivenTopsCurves))

1822


## Write to file

In [52]:
with open('WellsWithGivenTopsCurves_201809_vA.pkl', 'wb') as f:
    pickle.dump(WellsWithGivenTopsCurves, f)
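
As a quick sanity check, a pickled list can be read back and compared with the original. The sketch below does a round trip in a temporary directory so it does not touch the real output file; the file name `toy_well_list.pkl` is hypothetical, and the two well names are copied from the list above.

```python
import os
import pickle
import tempfile

# Two file names copied from the list above, standing in for the full list.
toy_well_list = ["00-07-35-070-11W4-0.LAS", "00-06-10-074-25W4-0.LAS"]

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "toy_well_list.pkl")  # hypothetical file name
    # Write the list in binary mode, as in the cell above.
    with open(pkl_path, "wb") as f:
        pickle.dump(toy_well_list, f)
    # Read it back in binary mode.
    with open(pkl_path, "rb") as f:
        reloaded = pickle.load(f)

print(reloaded == toy_well_list)
```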

### This hasn't yet checked for other circumstances that may prevent wells from being used, for example:
1. Wells have malformed LAS files
2. Wells don't have nearby neighbors to use for certain feature calculations.
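
For the first item, one possible check is to try reading every file and record which ones fail and why, which would also help with documenting excluded wells. The sketch below is an assumption about how that could be done, not something this notebook does. The helper `split_readable_files` and the `fake_reader` used to demonstrate it are hypothetical.

```python
def split_readable_files(paths, reader):
    """Try reader(path) on each path; return (readable, failures).

    failures maps each failing path to a short error description.
    """
    readable, failures = [], {}
    for path in paths:
        try:
            reader(path)
        except Exception as err:  # broad on purpose: any failure is recorded, not raised
            failures[path] = "{}: {}".format(type(err).__name__, err)
        else:
            readable.append(path)
    return readable, failures

def fake_reader(path):
    """Hypothetical stand-in for a real LAS reader; fails on paths containing 'bad'."""
    if "bad" in path:
        raise ValueError("could not parse header")
    return {"path": path}

readable, failures = split_readable_files(["good_1.LAS", "bad_2.LAS"], fake_reader)
print(readable)
print(failures)
```

With the real data, `reader` could be `lasio.read`, which this notebook already uses to read curve names, and `paths` could be the result of `glob.glob(las_path)`.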