# FIT/LOT Data Type

Two kinds of pressure test are used in the industry to assess wellbore integrity: the formation integrity test (FIT) and the leak-off test (LOT). The two are often confused, but understanding the difference is important for appreciating the benefits of frequent dynamic FITs when drilling in trouble zones.
#### FIT -> Formation Integrity Test

A FIT is comparable to testing a pressure vessel to its rated operating pressure, which includes a safety factor and in which no damage to future pressure containment capability is expected. 

#### LOT -> Leak Off Test

A LOT is comparable to testing a pressure vessel until it leaks, ruptures, or becomes permanently deformed. The pressure is raised until the last casing shoe or the formation fractures, as indicated by leak-off.



As a Product Owner
I want to be able to extract the FIT and LOT data from various End of Well Reports (EoWR) pertaining to all wells for a given asset
So that I can create an organised table of curated data and visualise it in a dashboard / UI as appropriate.

In [1]:
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient
import json
import openai
import os

with open('../settings.json') as f:
    data = json.load(f)

# Set form recogniser client
credential = AzureKeyCredential(data["FORM_KEY"])
document_analysis_client = DocumentAnalysisClient(data["FORM_ENDPOINT"], credential)



# This example also requires an OpenAI API key
os.environ['OPENAI_API_KEY'] = data['OPENAI_API_KEY']
openai.api_key = os.environ['OPENAI_API_KEY']

In [2]:
from azure.ai.formrecognizer import FormRecognizerClient
form_recognizer_client = FormRecognizerClient(data["FORM_ENDPOINT"], credential)

## Data Source

End of Well Reports (EoWR) in PDF form.

In [7]:
path = "../data/EoWR/206_12a-3 (SW Clair F1) Geological EOWR_Signed.pdf"

In [8]:
# Analyze the document
with open(path, "rb") as f:
    poller = document_analysis_client.begin_analyze_document("prebuilt-document", f)
    result = poller.result()

In [9]:
OCR_text = result.content

In [10]:
len(OCR_text)

154909

In [11]:
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
len(enc.encode(OCR_text))

53594
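
At roughly 53,600 tokens the full OCR text is probably too large to send to the model in a single request, which is why the cells below work on individual pages. The page indices used there are hard-coded. A minimal sketch of how candidate pages could be shortlisted by keyword instead is shown here; the function name `find_candidate_pages` and the keyword pattern are assumptions for illustration, not part of the original notebook.

```python
import re

# Hypothetical pattern: the full test names in any case, or the bare acronyms
# in upper case only (so ordinary words such as "fit" or "lot" do not match).
TEST_PATTERN = re.compile(
    r"(?i:formation\s+integrity\s+test|leak[\s-]*off\s+test)|\b(?:FIT|LOT)\b"
)

def find_candidate_pages(page_texts):
    """Return the indices of pages whose text mentions a FIT or LOT."""
    return [
        i for i, text in enumerate(page_texts)
        if text and TEST_PATTERN.search(text)
    ]

# Toy example with made-up page texts
pages = [
    "4.1 Geological summary",
    "10.3 LOT & FIT Plots\nFormation Integrity Test",
    "The liner did not fit on the first attempt.",
    None,  # e.g. a page with no extractable text
]
print(find_candidate_pages(pages))
```

With pdfplumber, the input list could be built as `[p.extract_text() for p in pdf.pages]`.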

# Text

In [12]:
import pdfplumber

with pdfplumber.open(path) as pdf:
    page = pdf.pages[52].extract_text()
    page += pdf.pages[53].extract_text()

    # poller = document_analysis_client.begin_analyze_document("prebuilt-document", page)
    # page_result = poller.result()
    

In [13]:
print(page)

206/12a-3
Geological End of Well Report
10.3 LOT & FIT Plots
Figure 15: 20” Shoe LOT at 536m TVD to 1.26sg EMW
Formation Integrity Test
Well Name : 206/12a-3 Rig : BYFORD DOLPHIN Test Conducted By : Date : 04th July 2011
Mike Thorogood
CASING MUD TEST TEST 400
VOLUME PRESS. 390
Size(in) WT.(sg) 0.6 62 380
13 3/8 1.55 0.8 66 370 FIT at cement unit = 360psi
WT.(ppf) YP(lb/100ft2) 1.0 74 360 Mud hydrostatic to drill floor = 32psi on the pump
72 1.1 92 33 45 00 FIT pressure = 328psi
Grade API WL.(cc) 1.2 123 330 FIT mud weight = 1.55sg
L80(30min) 1.3 157 320
Max.Allo. 1.4 200 310
Press.(psi) Gel 0/10 1.5 241 300 FIT Achieved = 1.75sg
Burst BBL 1 1. .6 7 2 37 29 0. .0 0 222 789 000 MWD Check )isp(
Press.(psi) Pumped 1.80 342.0 260 MWD pressure Minimum = 2388psi 1.85 1.85 360.0 250 MWD pressure Maximum = 2732psi erusserp
Test MD 1( ,m 14) RBB etL Time Pressure 222 234 000 S Pe ren ss so ur ed e ap pt ph i= 1 =12 31 4.5 4m iM oD 1 0 s9 ta6 t. i0 cm TVD
9 urned r l ed ps ve r
Test TVD(m) 1.85 

In [14]:
input = page

json_template = json.dumps({ 
    "FIT": "<Formation Integrity Test Value in sg>",
    "LOT": "<Leak Off Test Value in sg>"
})

system = f"""
You are an API that given a text extracted using OCR from an End of Well Report will reply with the JSON {json_template}
If there is a field that you cannot find, set it to null.
If the document has any kind of errors or is corrupted, add a field {{"errors": "<error description>"}}
If there is any additional information or feedback from the information extraction, add a {{"notes": "<additional-information>"}}

The format of your input will be the text of the relevant page.
"""

config = {
    "temperature": 0.2,
    "max_tokens": 512,
    "top_p": 1,
}


In [15]:
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
      {"role": "system", "content": system},
      {"role": "user", "content": input},
    ],
    temperature=config["temperature"],
    max_tokens=config["max_tokens"],
    top_p=config["top_p"],
  )

In [16]:
gpt_dict = json.loads(response.get("choices")[0]["message"]["content"])
gpt_dict

{'FIT': '1.26', 'LOT': '1.694'}

## Open list approach

In [17]:
input = page

json_template = json.dumps({
    "Test Type": "<FIT or LOT>",
    "Casing Shoe": "<Casing shoe size>",
    "TVD (m)": "TVD in meters",
    "Surface pressure (psi)": "<Surface pressure value>",
    "MW (sg)": "<MW value in sg>",
    "EMW (sg)": "<EMW value in sg>"
})

system = f"""
You are an API that given a text extracted using OCR from an End of Well Report will extract Formation Integrity Test (FIT) and Leak Off Test (LOT) results.
Your response will be a JSON with as many entries as needed in the format {json_template}
If there is a field that you cannot find, set it to null.
If the document has any kind of errors or is corrupted, add a field {{"errors": "<error description>"}}
If there is any additional information or feedback from the information extraction, add a {{"notes": "<additional-information>"}}

The format of your input will be the text of the relevant page.
"""

config = {
    "temperature": 0.2,
    "max_tokens": 512,
    "top_p": 1,
}

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
      {"role": "system", "content": system},
      {"role": "user", "content": input},
    ],
    temperature=config["temperature"],
    max_tokens=config["max_tokens"],
    top_p=config["top_p"],
  )

In [18]:
print(response.get("choices")[0]["message"]["content"])

{"Test Type": "FIT", "Casing Shoe": "20\"", "TVD (m)": "536", "Surface pressure (psi)": "360", "MW (sg)": "1.55", "EMW (sg)": "1.26"}
{"Test Type": "LOT", "Casing Shoe": "9-5/8\"", "TVD (m)": "1816", "Surface pressure (psi)": "965", "MW (sg)": "1.694", "EMW (sg)": "1.694"}
{"Test Type": "FIT", "Casing Shoe": "9-5/8\"", "TVD (m)": "1119", "Surface pressure (psi)": "965", "MW (sg)": "1.32", "EMW (sg)": "1.75"}
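
The reply above is not a single JSON document but one object per line, so passing it to `json.loads`, as was done for the single-object reply earlier, would raise an error. A minimal sketch of a more tolerant parser follows. `parse_records` is a hypothetical helper, and it assumes the model keeps each object on its own line, which is not guaranteed.

```python
import json

def parse_records(reply: str) -> list:
    """Parse a model reply that is either one JSON value or one JSON object per line."""
    try:
        parsed = json.loads(reply)
        return parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        pass  # fall through to line-by-line parsing

    records = []
    for line in reply.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Keep unparseable lines visible instead of silently dropping them
            records.append({"errors": f"could not parse: {line}"})
    return records

# Shortened copy of the reply format shown above
reply = r'''{"Test Type": "FIT", "Casing Shoe": "20\"", "TVD (m)": "536", "EMW (sg)": "1.26"}
{"Test Type": "LOT", "Casing Shoe": "9-5/8\"", "TVD (m)": "1816", "EMW (sg)": "1.694"}'''

records = parse_records(reply)
print(len(records), records[0]["Casing Shoe"])
```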


# Tables

In [19]:
import pdfplumber

with pdfplumber.open(path) as pdf:
    table = pdf.pages[52].extract_tables()

In [20]:
table

[[['Formation Integrity Test',
   None,
   None,
   None,
   None,
   None,
   None,
   None,
   None,
   None],
  ['Well Name : 206/12a-3',
   None,
   None,
   'Rig : BYFORD DOLPHIN',
   None,
   'Test Conducted By :\nMike Thorogood',
   'Date : 04th July 2011',
   None,
   None,
   None],
  ['CASING MUD TEST TEST 400\nVOLUME PRESS. 390\nSize(in) WT.(sg) 0.6 62 380\n13 3/8 1.55 0.8 66 370 FIT at cement unit = 360psi\nWT.(ppf) YP(lb/100ft2) 1.0 74 360 Mud hydrostatic to drill floor = 32psi on the pump\n72 1.1 92 33 45 00 FIT pressure = 328psi\nGrade API WL.(cc) 1.2 123 330 FIT mud weight = 1.55sg\nL80(30min) 1.3 157 320\nMax.Allo. 1.4 200 310\nPress.(psi) Gel 0/10 1.5 241 300 FIT Achieved = 1.75sg\nBurst BBL 1 1. .6 7 2 37 29 0. .0 0 222 789 000 MWD Check )isp(\nPress.(psi) Pumped 1.80 342.0 260 MWD pressure Minimum = 2388psi 1.85 1.85 360.0 250 MWD pressure Maximum = 2732psi erusserp\nTest MD 1( ,m 14) RBB etL Time Pressure 222 234 000 S Pe ren ss so ur ed e ap pt ph i= 1 =12 31 4.5 
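
`extract_tables()` returns nested lists in which merged or empty cells appear as `None`, so `str(table)` spends many tokens on structure rather than content. One possible way to hand the model something more compact is sketched below. `tables_to_text` is a hypothetical helper, and whether it improves the extraction has not been tested here.

```python
def tables_to_text(tables, cell_sep=" | "):
    """Flatten pdfplumber-style tables (tables -> rows -> cells) into plain text."""
    lines = []
    for table in tables:
        for row in table:
            # Drop empty cells and collapse line breaks inside a cell
            cells = [" ".join(cell.split()) for cell in row if cell]
            if cells:
                lines.append(cell_sep.join(cells))
        lines.append("")  # blank line between tables
    return "\n".join(lines).strip()

# Small sample in the same shape as the output above
sample = [[
    ["Formation Integrity Test", None, None],
    ["Well Name : 206/12a-3", None, "Rig : BYFORD DOLPHIN"],
    ["Test Conducted By :\nMike Thorogood", "Date : 04th July 2011", None],
]]
print(tables_to_text(sample))
```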

In [21]:
input = str(table)

json_template = json.dumps({
    "Test Type": "<FIT or LOT>",
    "Casing Shoe": "<Casing shoe size>",
    "TVD (m)": "TVD in meters",
    "Surface pressure (psi)": "<Surface pressure value>",
    "MW (sg)": "<MW value in sg>",
    "EMW (sg)": "<EMW value in sg>"
})

system = f"""
You are an API that given a text extracted using OCR from an End of Well Report will extract Formation Integrity Test (FIT) and Leak Off Test (LOT) results.
Your response will be a JSON with as many entries as needed in the format {json_template}
If there is a field that you cannot find, set it to null.
If the document has any kind of errors or is corrupted, add a field {{"errors": "<error description>"}}
If there is any additional information or feedback from the information extraction, add a {{"notes": "<additional-information>"}}

The format of your input will be the text of the relevant page.
"""

config = {
    "temperature": 0.2,
    "max_tokens": 512,
    "top_p": 1,
}

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
      {"role": "system", "content": system},
      {"role": "user", "content": input},
    ],
    temperature=config["temperature"],
    max_tokens=config["max_tokens"],
    top_p=config["top_p"],
  )

In [22]:
print(response.get("choices")[0]["message"]["content"])

{"Test Type": "FIT", "Casing Shoe": "13 3/8", "TVD (m)": "1,119", "Surface pressure (psi)": "328", "MW (sg)": "1.55", "EMW (sg)": "1.756"}
{"Test Type": "LOT", "Casing Shoe": null, "TVD (m)": "1,111", "Surface pressure (psi)": null, "MW (sg)": null, "EMW (sg)": null}


# Form recogniser

In [23]:
import io

# OCR from base form recogniser
def base_form_recogniser(pdf_bytes: io.BytesIO) -> dict:
    document = pdf_bytes.getvalue()

    # Start the document analysis
    poller = document_analysis_client.begin_analyze_document("prebuilt-document", document, polling_interval=5)

    # Get the result
    result = poller.result()
    data = result.to_dict()
    return data

In [24]:
from PyPDF4 import PdfFileWriter, PdfFileReader

inputpdf = PdfFileReader(open(path, "rb"))

output = PdfFileWriter()
output.addPage(inputpdf.pages[52])

output_bytesio = io.BytesIO()

output.write(output_bytesio)

In [25]:
extracted_text = base_form_recogniser(output_bytesio)

In [27]:
extracted_text["content"]

'bp\n206/12a-3 Geological End of Well Report\nCLAR\n10.3 LOT & FIT Plots\nLeak Off Test\nWell Name : 206/12a-3\nRig : BYFORD DOLPHIN\nTest Conducted By : Mike Thorogood\nDate : 25th June 2011\nCASING\nTEST VOLUME\nTEST\n120\nPRESS.\n110 100 90 80 70 Applied pressure (psi) 60 50 40 30 20 10\n0\nSize(in)\nWT.(sg)\n15\n20\n1.15\n20\nWT.(ppf) 133\nYP(Ib/100ft2)\n0.2\n26\n0.3\n35\nGrade X56\nAPI VL.(cc)\n0.4\n49\n[30min)\n0.5\n65\nMax.Allo.\n0.6\n80\nPress.(psi)\nGel 0/10\n0.7\n90\n0.8\n101.0\nBurst\nBBL\n0.9\n106.0\nPress. (psi)\nPumped\n0.90\nTest MD(m)\n536\nBBI\nTime\nPressure\nReturned\nTest TVD(m)\n0.70\n536\nElevation(F()\nSHOE\nFrom RKB\nMSL(Ft)\nShoe MD(m)\n527\nWater\nShoe TVD(m) Depth[m]\n527\nPump\nLOT Result\nLOT Result\nRate(bpm) 0.3 Pressure\nPressururized\n84.0\nDp/Ann/Both? FIT Press.\nBoth\nEMW[sg)\nLiner Size(in)\n1.26\nLOT Result Calculation Formula EMW = MW + (P-IT ? (TVD-shoe * 1.421)) 1.15 0.11 = 1.26 (so)\n0.0 0.2 0.4 0.6 0.8 1.0 Volume Pumped (bbls)\nFigure 15: 20" 

In [28]:
input = extracted_text["content"]

json_template = json.dumps({
    "Test Type": "<FIT or LOT>",
    "Casing Shoe": "<Casing shoe size>",
    "TVD (m)": "TVD in meters",
    "Surface pressure (psi)": "<Surface pressure value>",
    "MW (sg)": "<MW value in sg>",
    "EMW (sg)": "<EMW value in sg>"
})

system = f"""
You are an API that given a text extracted using OCR from an End of Well Report will extract Formation Integrity Test (FIT) and Leak Off Test (LOT) results.
Your response will be a JSON with as many entries as needed in the format {json_template}
If there is a field that you cannot find, set it to null.
If the document has any kind of errors or is corrupted, add a field {{"errors": "<error description>"}}
If there is any additional information or feedback from the information extraction, add a {{"notes": "<additional-information>"}}

The format of your input will be the text of the relevant page.
"""

config = {
    "temperature": 0.2,
    "max_tokens": 512,
    "top_p": 1,
}

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
      {"role": "system", "content": system},
      {"role": "user", "content": input},
    ],
    temperature=config["temperature"],
    max_tokens=config["max_tokens"],
    top_p=config["top_p"],
  )

In [29]:
print(response.get("choices")[0]["message"]["content"])

{"Test Type": "LOT", "Casing Shoe": "20", "TVD (m)": "536", "Surface pressure (psi)": "84.0", "MW (sg)": "1.15", "EMW (sg)": "1.26"}
{"Test Type": "FIT", "Casing Shoe": "13 3/8", "TVD (m)": "1119", "Surface pressure (psi)": "328.0", "MW (sg)": "1.55", "EMW (sg)": "1.756"}


# Let's fine-tune the prompt

In [30]:
input = extracted_text["content"]

json_template = json.dumps({
    "Test Type": "<FIT or LOT>",
    "Casing Shoe": "<Casing shoe size>",
    "TVD (m)": "TVD in meters",
    "Surface pressure (psi)": "<Surface pressure value>",
    "MW (sg)": "<MW value in sg>",
    "EMW (sg)": "<EMW value in sg>"
})

system = f"""
You are an assistant that given a text extracted using OCR from an End of Well Report will extract 'Formation Integrity Test' (FIT) and 'Leak Off Test' (LOT) results.
There can be multiple tests, report all of them.
Write your output as a JSON with an entry with the format {json_template} per each test you find.
If there is a field that you cannot find, set it to null.
If the document has any kind of errors or is corrupted, add a field {{"errors": "<error description>"}}
If there is any additional information or feedback from the information extraction, add a {{"notes": "<additional-information>"}}
"""
config = {
    "temperature": 0.2,
    "max_tokens": 512,
    "top_p": 1,
}

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
      {"role": "system", "content": system},
      {"role": "user", "content": input},
    ],
    temperature=config["temperature"],
    max_tokens=config["max_tokens"],
    top_p=config["top_p"],
  )

In [31]:
print(response.get("choices")[0]["message"]["content"])

{"Test Type": "Leak Off Test", "Casing Shoe": "20", "TVD (m)": "536", "Surface pressure (psi)": "101.0", "MW (sg)": "1.15", "EMW (sg)": "1.26"}
{"Test Type": "Formation Integrity Test", "Casing Shoe": "13 3/8", "TVD (m)": "1119", "Surface pressure (psi)": "328.0", "MW (sg)": "1.55", "EMW (sg)": "1.756"}
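
The two prompts disagree on the LOT surface pressure (84.0 psi in the previous run versus 101.0 psi here), so it is worth checking each record against the calculation printed on the test sheet. The OCR text appears to read, in garbled form, as EMW = MW + P / (TVD × 1.421), with P in psi and TVD in metres. Treating that reading as an assumption, a minimal consistency check might look like this; the helper names and the tolerance are illustrative.

```python
# Constant as read from the OCR'd LOT sheet (assumed units: psi per metre per sg)
PSI_PER_M_PER_SG = 1.421

def emw_sg(mw_sg, surface_pressure_psi, tvd_m):
    """Equivalent mud weight (sg) from mud weight, applied surface pressure and TVD."""
    return mw_sg + surface_pressure_psi / (tvd_m * PSI_PER_M_PER_SG)

def is_consistent(record, tolerance_sg=0.01):
    """True if the reported EMW matches the value recomputed from the other fields."""
    expected = emw_sg(
        float(record["MW (sg)"]),
        float(record["Surface pressure (psi)"]),
        float(record["TVD (m)"]),
    )
    return abs(expected - float(record["EMW (sg)"])) <= tolerance_sg

lot_84 = {"TVD (m)": "536", "Surface pressure (psi)": "84.0", "MW (sg)": "1.15", "EMW (sg)": "1.26"}
lot_101 = dict(lot_84, **{"Surface pressure (psi)": "101.0"})
fit = {"TVD (m)": "1119", "Surface pressure (psi)": "328.0", "MW (sg)": "1.55", "EMW (sg)": "1.756"}

print(is_consistent(lot_84), is_consistent(lot_101), is_consistent(fit))
```

Under this reading the 84.0 psi record reproduces the reported 1.26 sg while the 101.0 psi record does not, which suggests the revised prompt may have picked a row from the volume/pressure table rather than the leak-off pressure.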


# Another Document Example

In [34]:
path = "../data/EoWR/Clair A21 EOWR.pdf"

In [35]:
from PyPDF4 import PdfFileWriter, PdfFileReader

inputpdf = PdfFileReader(open(path, "rb"))

output = PdfFileWriter()
output.addPage(inputpdf.pages[9])

output_bytesio = io.BytesIO()

output.write(output_bytesio)
extracted_text = base_form_recogniser(output_bytesio)

input = extracted_text["content"]

json_template = json.dumps({
    "Test Type": "<FIT or LOT>",
    "Casing Shoe": "<Casing shoe size>",
    "TVD (m)": "TVD in meters",
    "Surface pressure (psi)": "<Surface pressure value>",
    "MW (sg)": "<MW value in sg>",
    "EMW (sg)": "<EMW value in sg>"
})

system = f"""
You are an assistant that given a text extracted using OCR from an End of Well Report will extract 'Formation Integrity Test' (FIT) and 'Leak Off Test' (LOT) results.
There can be multiple tests, report all of them.
Write your output as a JSON with an entry with the format {json_template} per each test you find.
If there is a field that you cannot find, set it to null.
If the document has any kind of errors or is corrupted, add a field {{"errors": "<error description>"}}
If there is any additional information or feedback from the information extraction, add a {{"notes": "<additional-information>"}}
"""
config = {
    "temperature": 0.2,
    "max_tokens": 512,
    "top_p": 1,
}

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
      {"role": "system", "content": system},
      {"role": "user", "content": input},
    ],
    temperature=config["temperature"],
    max_tokens=config["max_tokens"],
    top_p=config["top_p"],
  )

In [36]:
print(response.get("choices")[0]["message"]["content"])

{"Test Type": "FIT", "Casing Shoe": "17.1/2\"", "TVD (m)": "1180", "Surface pressure (psi)": null, "MW (sg)": null, "EMW (sg)": "1.72"}
{"Test Type": "LOT", "Casing Shoe": "8.1/2\"", "TVD (m)": "-1881.5", "Surface pressure (psi)": null, "MW (sg)": null, "EMW (sg)": null}


In [62]:
input

'bp\n4. Geology and geophysics\n4.1 Geological summary\n32" Section\n206/08-A21 was drilled from Slot 20 on the Clair Phase 1 Platform. The 32" section drilled through the Otter Bank Sequence, Ferder Formation, Morrison Sequence and Sinclair Sequence before setting the shoe in the sands of the Westray Group at 390m MD, 390m TVDBRT.\n17.1/2" Section\nThe 17.1/2" section drilled the remaining Westray Group, the Balder Formation, and into the Cretaceous. The shoe was set at 1249m MD, 1180m TVDBRT in the upper part of the Cretaceous at the top of the Maastrichtian K90 sequence. This section comprised predominantly sands and thin mudstones, with the Cretaceous boundary marking a transition to mudstones. The formation tops came within 5mTVD of prognosis, with top Cretaceous 4mTVD shallow.\n12.1/4" Section\nLarge quantities of cavings were recorded when cleaning out the rat-hole which cleaned up relatively quickly. A FIT was performed to 1.72sg EMW after drilling 3m of new formation. Drilling

# Another Document Example

In [37]:
path = "../data/EoWR/BHGE Integrated EOWR_204_20_L12.pdf"

In [40]:
from PyPDF4 import PdfFileWriter, PdfFileReader

inputpdf = PdfFileReader(open(path, "rb"))

output = PdfFileWriter()
output.addPage(inputpdf.pages[28])

output_bytesio = io.BytesIO()

output.write(output_bytesio)
extracted_text = base_form_recogniser(output_bytesio)

input = extracted_text["content"]

json_template = json.dumps({
    "Test Type": "<FIT or LOT>",
    "Casing Shoe": "<Casing shoe size>",
    "TVD (m)": "TVD in meters",
    "Surface pressure (psi)": "<Surface pressure value>",
    "MW (sg)": "<MW value in sg>",
    "EMW (sg)": "<EMW value in sg>"
})

system = f"""
You are an assistant that given a text extracted using OCR from an End of Well Report will extract 'Formation Integrity Test' (FIT) and 'Leak Off Test' (LOT) results.
There can be multiple tests, report all of them.
Write your output as a JSON with an entry with the format {json_template} per each test you find.
If there is a field that you cannot find, set it to null.
If the document has any kind of errors or is corrupted, add a field {{"errors": "<error description>"}}
If there is any additional information or feedback from the information extraction, add a {{"notes": "<additional-information>"}}
"""
config = {
    "temperature": 0.2,
    "max_tokens": 512,
    "top_p": 1,
}

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
      {"role": "system", "content": system},
      {"role": "user", "content": input},
    ],
    temperature=config["temperature"],
    max_tokens=config["max_tokens"],
    top_p=config["top_p"],
  )

In [41]:
print(response.get("choices")[0]["message"]["content"])

{"Test Type": "FIT", "Casing Shoe": null, "TVD (m)": null, "Surface pressure (psi)": null, "MW (sg)": null, "EMW (sg)": "1.45sg"}
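
Across the three documents the model returns the same kind of quantity in different shapes: `"1,119"` versus `"1119"`, `"1.45sg"` with a unit suffix, and `"Leak Off Test"` versus `"LOT"`. Before the records can go into a curated table they would need normalising. The sketch below shows one way to do that. The helper names are hypothetical, and it assumes commas are thousands separators rather than decimal marks.

```python
import re

TEST_TYPES = {
    "formation integrity test": "FIT", "fit": "FIT",
    "leak off test": "LOT", "lot": "LOT",
}
NUMERIC_FIELDS = ("TVD (m)", "Surface pressure (psi)", "MW (sg)", "EMW (sg)")

def to_float(value):
    """Convert strings such as '1,119' or '1.45sg' to float; None if no number is found."""
    if value is None:
        return None
    match = re.search(r"-?\d+(?:\.\d+)?", str(value).replace(",", ""))
    return float(match.group()) if match else None

def normalise_record(record):
    """Return a copy with a canonical test type and numeric fields as float or None."""
    clean = dict(record)
    test_type = str(record.get("Test Type") or "").strip().lower().replace("-", " ")
    clean["Test Type"] = TEST_TYPES.get(test_type)  # None if unrecognised
    for field in NUMERIC_FIELDS:
        clean[field] = to_float(record.get(field))
    return clean

# Illustrative record combining the variations seen in the outputs above
raw = {"Test Type": "Leak Off Test", "Casing Shoe": "13 3/8", "TVD (m)": "1,119",
       "Surface pressure (psi)": None, "MW (sg)": "1.55", "EMW (sg)": "1.45sg"}
print(normalise_record(raw))
```

Negative depths such as `-1881.5` are passed through unchanged, so a later validation step would still need to decide whether they are subsea depths or extraction errors.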
