## DUC DATA COMPLETION INFORMATION

### WORKFLOW:
- Import Cleaned Well Header File
- Import Completion File
- Convert dates to datetime
- For each EPAssetsId, find:
    - Earliest Completion Date
    - Last Completion Date
    - Completion Flag: 0 for not found, 1 for any completion
    - Highest completion top
    - Deepest Completion bottom



In [None]:
# Import libraries
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from sklearn.preprocessing import LabelEncoder
%matplotlib inline

In [None]:
# Load files
Wells = pd.read_csv('WellHeader_Clean.csv')
Comps = pd.read_csv('PerfTreatments.csv')

In [None]:
Comps.info()

In [None]:
# Convert to datetime
# infer_datetime_format is deprecated in pandas 2.x; format inference is now the default
Comps['ActivityDate'] = pd.to_datetime(Comps['ActivityDate'])

In [None]:
IDs = set(Comps['EPAssetsId'])
len(IDs)

#### PerfTreatments has 20 uphole remediation and/or error entries with IntervalTop <= 687. We are looking for the production interval, so remove these along with other remedial entries (cement squeeze, etc.).

In [None]:
# PerfTreatments has a few entries at 0 mMD. Count these erroneous and uphole data points (IntervalTop <= 687)
len(Comps[Comps['IntervalTop']<=687])

In [None]:
# Remove rows and check
Comps = Comps[Comps['IntervalTop'] > 687].copy()  # copy so later column assignments do not act on a view
len(Comps[Comps['IntervalTop']<=687])

In [None]:
# Unique activity types, excluding NaN (set order is arbitrary, so do not rely on NaN being first)
C_Activities = list(Comps['ActivityType'].dropna().unique())

In [None]:
C_Activities

In [None]:
sum(Comps['ActivityType'].isnull())

In [None]:
Keep_Activities = ['Acidize','Multi-Stage Fracture','Hydraulic Fracture','Packing Device Capped w/Cement','Sand Fracture','Acid Wash','Open Hole',
 'Multi-Stage Fracture - Port Closed','Chemical Fracture','Hydra Jet Perforation','Other','Perforation',
 'Slotted Liner','Fracture','Chemical Squeeze','Open Hole/Barefoot Completion','Acid Treatment','Acid Squeeze']

In [None]:
Drop_Activities = list(set(C_Activities) - set(Keep_Activities))
Drop_Activities

In [None]:
# Replace Nulls with Other - These are mostly in some BC Wells
Comps['ActivityType'] = Comps['ActivityType'].fillna('Other')

#### Filter the events to drop into a new dataframe. Use EPAPTId, EPAssetsId and IntervalTop to select the corresponding redundant perf or other treatment events that are related to remedial activities. Create a Remedial dataframe.
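
The selection described above can also be sketched without explicit loops. This is a minimal illustration on invented toy data, not the notebook's actual result: the lowercase names `comps`, `drop_activities`, `flagged`, `keys` and `remedial` are hypothetical, and the activity labels and depths are made up. It assumes that "corresponding" events are those sharing the same EPAssetsId and exactly the same IntervalTop as a flagged row.

```python
import pandas as pd

# Toy stand-in for Comps; all values are invented for illustration only
comps = pd.DataFrame({
    'EPAPTId':      [1, 2, 3, 4, 5],
    'EPAssetsId':   [10, 10, 10, 20, 20],
    'IntervalTop':  [1500.0, 1500.0, 2100.0, 1800.0, 1900.0],
    'ActivityType': ['Perforation', 'Cement Squeeze', 'Fracture',
                     'Perforation', 'Bridge Plug - No Cement'],
})
drop_activities = ['Cement Squeeze', 'Bridge Plug - No Cement']

# Step 1: rows whose activity type is on the drop list
flagged = comps[comps['ActivityType'].isin(drop_activities)]

# Step 2: every row that shares a well and an interval top with a flagged row
keys = flagged[['EPAssetsId', 'IntervalTop']].drop_duplicates()
remedial = comps.merge(keys, on=['EPAssetsId', 'IntervalTop'], how='inner')

print(remedial[['EPAPTId', 'ActivityType']])
```

In this toy case the perforation at 1500 m in well 10 is pulled in alongside the squeeze at the same depth, while the fracture at 2100 m is left alone.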

In [None]:
Remedial = Comps[Comps['ActivityType'] == 'Bridge Plug - No Cement']
for item in Drop_Activities:
    if item != 'Bridge Plug - No Cement':

        df = Comps[Comps['ActivityType'] == item]
        Remedial = pd.concat([Remedial,df])
Remedial.index = range(Remedial.shape[0])    
Remedial.info()

In [None]:
Remedial.head()

In [None]:
#Merge in corresponding activities like perforation that align with the remedial event

for sample in range(len(Remedial)):
    ID = Remedial['EPAssetsId'][sample]
    depth = Remedial['IntervalTop'][sample]
    df = Comps[Comps['EPAssetsId']==ID]
    df1 = df[df['IntervalTop']==depth]
    Remedial = pd.concat([Remedial,df1])

Remedial.info()

In [None]:
# Drop Duplicates and reindex
Remedial = Remedial.drop_duplicates()
Remedial.index = range(Remedial.shape[0]) 
Remedial.info()

#### Remove all remedial activities from the Comps dataframe
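
As a hedged alternative to removing rows one ID at a time, the same removal can be expressed as a single boolean mask. The sketch below uses invented toy frames (`comps`, `remedial`, `comps_clean` are hypothetical names) and assumes EPAPTId uniquely identifies a row and contains no nulls.

```python
import pandas as pd

# Toy frames; the IDs and depths are invented for illustration only
comps = pd.DataFrame({
    'EPAPTId':     [1, 2, 3, 4, 5],
    'IntervalTop': [1500.0, 1500.0, 2100.0, 1800.0, 1900.0],
})
remedial = pd.DataFrame({'EPAPTId': [1, 2, 5]})

n_before = len(comps)

# Keep only rows whose EPAPTId does not appear in the remedial set
comps_clean = comps[~comps['EPAPTId'].isin(remedial['EPAPTId'])]

n_removed = n_before - len(comps_clean)
print(n_removed)
```

Recording the row count before and after makes it easy to confirm how many entries were removed.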

In [None]:
for sample in range(len(Remedial)):
    PTID = Remedial['EPAPTId'][sample]
    Comps = Comps[Comps['EPAPTId'] != PTID]
    
Comps.info()

In [None]:
# Save the Remedial Activities file
Remedial.to_csv('Remedial_Activities.csv')

#### Removed another 148 entries.  Now transfer Completion Info to the Well Header.

In [None]:
Comps.head()

In [None]:
Comps = Comps.drop(columns=['WellHeader.Match', 'EPAPTId', 'ObservationNumber', 'ActivityType', 'PerfShots'])

In [None]:
Comps.head()

In [None]:
Comps.info()

#### Review the Wells info

In [None]:
Wells.head()

In [None]:
Wells = Wells.drop(columns=['Unnamed: 0'])

In [None]:
# Convert to datetime
date_cols = ['SpudDate', 'FinalDrillDate','StatusDate']
for col in date_cols:
    Wells[col] = pd.to_datetime(Wells[col])  # infer_datetime_format is deprecated in pandas 2.x

In [None]:
Wells.info()

### Group Comps (PerfTreatments) by EPAssetsId, taking the minimum interval top and date and the maximum interval depth and date, to get the range of completion depth and time
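
The per-well minimum and maximum can also be computed in one pass with named aggregation. This is a sketch on invented toy data; `comps` and `summary` are hypothetical names, while the output column names follow the ones used in this notebook.

```python
import pandas as pd

# Toy stand-in for Comps; values are invented for illustration only
comps = pd.DataFrame({
    'EPAssetsId':   [10, 10, 20],
    'IntervalTop':  [2100.0, 1500.0, 1800.0],
    'ActivityDate': pd.to_datetime(['2019-03-01', '2019-01-15', '2018-07-04']),
})

# One row per well: shallowest/deepest interval top and earliest/latest activity date
summary = comps.groupby('EPAssetsId').agg(
    Comp_Top=('IntervalTop', 'min'),
    Early_Comp=('ActivityDate', 'min'),
    Comp_Base=('IntervalTop', 'max'),
    Late_Comp=('ActivityDate', 'max'),
)
summary['Comp_Flag'] = 1  # every well present here has at least one completion row

print(summary)
```

A single summary frame would then need only one merge into the well header instead of two.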

In [None]:
Tops_Early = Comps.groupby('EPAssetsId')[['IntervalTop', 'ActivityDate']].min()
Tops_Early.columns=['Comp_Top', 'Early_Comp']
Tops_Early.head()

In [None]:
# Add a completion flag denoting that the well was in the PerfTreatments file.
# After the merge, wells without a completion will have null values for the completion flag,
#   completion dates and depths.
# Note: Comp_Base here is the deepest IntervalTop recorded for each well.

Bases_Late = Comps.groupby('EPAssetsId')[['IntervalTop', 'ActivityDate']].max()
Bases_Late.columns=['Comp_Base', 'Late_Comp']
Bases_Late['Comp_Flag'] = 1
Bases_Late.head()

### Merge Wells with the Tops and Bases dataframes on EPAssetsId

In [None]:
Wells = pd.merge(Wells, Tops_Early, left_on = 'EPAssetsId', right_on = Tops_Early.index, how = 'outer', sort = False)

Wells = pd.merge(Wells, Bases_Late, left_on = 'EPAssetsId', right_on = Bases_Late.index, how = 'outer', sort = False)

In [None]:
Wells.info()

#### There are 329 wells without any completion information
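
A count like this can be checked directly from the merged frame, and the nulls can be turned into the 0/1 flag described in the workflow. The sketch below runs on invented toy data; `wells`, `summary` and `merged` are hypothetical names, and it assumes a left join (one row per well header) rather than the outer join used above.

```python
import pandas as pd

# Toy frames; well 30 has no completion rows (values invented for illustration)
wells = pd.DataFrame({'EPAssetsId': [10, 20, 30]})
summary = pd.DataFrame({
    'EPAssetsId': [10, 20],
    'Comp_Top':   [1500.0, 1800.0],
    'Comp_Flag':  [1, 1],
})

# Left join keeps every well header; unmatched wells get NaN in the completion columns
merged = wells.merge(summary, on='EPAssetsId', how='left')

# Count wells without completion information, then convert the flag to 0/1
n_missing = int(merged['Comp_Flag'].isna().sum())
merged['Comp_Flag'] = merged['Comp_Flag'].fillna(0).astype(int)

print(n_missing)
print(merged)
```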

In [None]:
Wells.head()

In [None]:
# Check range of results (minimums); Series.min() skips the nulls left by the merge
Wells['Comp_Top'].min(), Wells['Comp_Base'].min(), Wells['Early_Comp'].min(), Wells['Late_Comp'].min()

In [None]:
# Check range of results (maximums); Series.max() skips the nulls left by the merge
Wells['Comp_Top'].max(), Wells['Comp_Base'].max(), Wells['Early_Comp'].max(), Wells['Late_Comp'].max()

### The latest completion activity on these 10,000+ wells is Feb 18, 2020.

#### Save results to file

In [None]:
Wells.to_csv('WellHeader_with_Completions.csv')

### Look at TVD vs Total Depth by Well Profile, and TVD vs Comp_Top Depth

In [None]:
fig, (ax1) = plt.subplots(1, 1, figsize = (8, 8))

plt.scatter(Wells['TotalDepth'], Wells['TVD'], label = 'All Wells with TVD Entry')
plt.scatter(Wells['TotalDepth'][Wells['WellProfile']=='Vertical'], Wells['TVD'][Wells['WellProfile']=='Vertical'], label = 'Vertical Wells')
plt.scatter(Wells['TotalDepth'][Wells['WellProfile']=='Directional'], Wells['TVD'][Wells['WellProfile']=='Directional'], label = 'Directional Wells')

plt.title('Wells TVD vs Total Depth')
plt.xlabel('Total Depth')
plt.ylabel('TVD')
plt.legend()
plt.grid()
plt.show()

In [None]:
fig, (ax1) = plt.subplots(1, 1, figsize = (8, 8))

plt.scatter(Wells['Comp_Top'][Wells['WellProfile']=='Horizontal'], Wells['TVD'][Wells['WellProfile']=='Horizontal'], label = 'Horizontal Wells')
plt.scatter(Wells['Comp_Top'][Wells['WellProfile']=='Vertical'], Wells['TVD'][Wells['WellProfile']=='Vertical'], label = 'Vertical Wells')
plt.scatter(Wells['Comp_Top'][Wells['WellProfile']=='Directional'], Wells['TVD'][Wells['WellProfile']=='Directional'], label = 'Directional Wells')

plt.title('Wells TVD vs Completion Top')
plt.xlabel('Completion Top')
plt.ylabel('TVD')
plt.legend()
plt.grid()
plt.show()

#### Points to the right of the trend indicate a horizontal well with completions far from the landing point, or a well with a long build section.

#### Points to the left possibly indicate completion events still in the dataset that relate to remedial activities, possibly labelled as 'Other', or, less likely, uphole commingling.
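
To follow up on these outliers, the wells on either side of the trend could be listed rather than read off the plot. This is only a sketch under stated assumptions: it treats the trend as roughly the 1:1 line (Comp_Top about equal to TVD), the `tolerance_m` threshold is a hypothetical value that would need tuning, and the toy data is invented.

```python
import pandas as pd

# Toy stand-in for Wells; values are invented for illustration only
wells = pd.DataFrame({
    'EPAssetsId': [10, 20, 30],
    'TVD':        [2000.0, 2500.0, 1800.0],
    'Comp_Top':   [2050.0, 1200.0, 3400.0],
})

tolerance_m = 100.0  # hypothetical distance from the 1:1 trend, in metres

# Positive offset: completion top is deeper (along hole) than TVD, plotting right of the trend
# Negative offset: completion top is shallower than TVD, plotting left of the trend
offset = wells['Comp_Top'] - wells['TVD']
right_of_trend = wells[offset > tolerance_m]
left_of_trend = wells[offset < -tolerance_m]

print(left_of_trend['EPAssetsId'].tolist())
print(right_of_trend['EPAssetsId'].tolist())
```

The wells listed as left of the trend are the candidates to review for remaining remedial entries.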