# 0.0 Notes

### Complete:
- Cross-checking the wellbore names in our study area against the wellbore names on the SoDir Factpages.
- Parsing the web page for each well.
- Scraping the **Wellbore history** section for each well.
- Saving the wellbore history text from all wells into a single .txt file called "*wellbore_history.txt*".
- Fetching the **General information** table and saving it to a .csv file and a .txt file. See **sect. 1.4** for this.
  - This step also uses pandas to print the table directly (**NOTE:** this should be disabled when working with larger data sets).

### Goals:
- Table of gas hydrate depths (in time or meters)
- Map whether there are gas hydrates in the shallow areas --> Only one of the wellbore histories contains the word "*hydrate(s)*"; it is found in *34/4-14 S*.
- Top and bottom of the reservoir
- Depth of the main geological feature
- Plot
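
The keyword search behind the hydrate goal could be sketched as below. This is a minimal sketch, assuming the entry layout written in sect. 1.3 (a "Well: <name>" line, the history text, then a line of 50 dashes). The helper name `wells_mentioning` is hypothetical and the sample text is invented for illustration.

```python
import re

SEPARATOR = "-" * 50  # Assumed to match the separator written in sect. 1.3


def wells_mentioning(history_text, pattern=r"\bhydrates?\b"):
    """Return the names of wells whose history text matches the pattern."""
    regex = re.compile(pattern, flags=re.IGNORECASE)
    hits = []
    for entry in history_text.split(SEPARATOR):
        entry = entry.strip()
        if not entry.startswith("Well: "):
            continue  # Skip empty or malformed chunks
        first_line, _, body = entry.partition("\n")
        if regex.search(body):
            hits.append(first_line[len("Well: "):].strip())
    return hits


# Invented sample text in the same layout as wellbore_history.txt
sample = (
    "Well: A-1\nDrilled to TD without problems.\n" + SEPARATOR + "\n"
    "Well: B-2 S\nGas hydrates were observed in the shallow section.\n" + SEPARATOR + "\n"
)
print(wells_mentioning(sample))  # ['B-2 S']
```

For the real file, the text would be read from "wellbore_history.txt" and passed to the function in place of `sample`.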

# 1.0 Scraping data from a website

In [81]:
# Importing libraries
import csv
import os
import requests
import time
import pandas as pd
from bs4 import BeautifulSoup, NavigableString, Tag


## 1.1 Testing response from website

In [89]:
# Defining website. Factpages SoDir --> Exploration --> All
url = "https://factpages.sodir.no/en/wellbore/PageView/Exploration/All"

# Checking response from website
response = requests.get(url)
print(response,'if [200], fine')

response = response.content
soup = BeautifulSoup(response, 'html.parser')


<Response [200]> if [200], fine


## 1.2 Cross-checking wellbore names in our list against wellbore names on SoDir Factpages

In [83]:
# URL of the webpage to scrape
url = "https://factpages.sodir.no/en/wellbore/PageView/Exploration/All"

# Read wells of interest from a CSV file
wells_of_interest = []
with open('well list.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        wells_of_interest.extend(row)

print("Wellbores in CSV:", len(wells_of_interest))
print("Wellbores found on Factpages:")
        
# Fetch the webpage content
response = requests.get(url)
html_content = response.text

# Parse the HTML
soup = BeautifulSoup(html_content, "html.parser")

# Find all <li> tags
list_items = soup.find_all('li')

# Initialize list to hold the found items
well_list = []

# Iterate through each item
for item in list_items:
    # Extract the text within the <div> tag
    div_text = item.find("div").text if item.find("div") else None
    
    # Check if this text is in your wells_of_interest
    if div_text in wells_of_interest:
        href = item.find("a")["href"] if item.find("a") else None

        well_list.append((div_text, href))

# Determine which wells were not found
not_found_wells = set(wells_of_interest) - set(well[0] for well in well_list)

# Print found items (well_url avoids overwriting the listing-page url defined above)
for name, well_url in well_list:
    print(f"Well: {name}, URL: {well_url}")

# Print wells not found
print("\nWells from CSV not found in Factpages:")
for name in not_found_wells:
    print(name)


Wellbores in CSV: 223
Wellbores found on Factpages:
Well: 30/2-1, URL: https://factpages.sodir.no/en/wellbore/PageView/Exploration/All/72
Well: 30/2-2, URL: https://factpages.sodir.no/en/wellbore/PageView/Exploration/All/457
Well: 30/2-3, URL: https://factpages.sodir.no/en/wellbore/PageView/Exploration/All/1970
Well: 30/2-4 S, URL: https://factpages.sodir.no/en/wellbore/PageView/Exploration/All/5784
Well: 30/2-5 S, URL: https://factpages.sodir.no/en/wellbore/PageView/Exploration/All/9051
Well: 30/3-1, URL: https://factpages.sodir.no/en/wellbore/PageView/Exploration/All/376
Well: 30/3-2, URL: https://factpages.sodir.no/en/wellbore/PageView/Exploration/All/203
Well: 30/3-3, URL: https://factpages.sodir.no/en/wellbore/PageView/Exploration/All/10
Well: 30/3-4, URL: https://factpages.sodir.no/en/wellbore/PageView/Exploration/All/460
Well: 30/3-9, URL: https://factpages.sodir.no/en/wellbore/PageView/Exploration/All/4053
Well: 30/3-10 S, URL: https://factpages.sodir.no/en/wellbore/PageView/Ex
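
The cross-check compares names as exact strings, so stray whitespace in the CSV would make a well appear as not found. A minimal sketch of the same comparison with whitespace normalization follows. `normalize_name` and `cross_check` are hypothetical helper names, and the sample names and `example.invalid` URLs are invented for illustration.

```python
def normalize_name(name):
    """Collapse runs of whitespace and strip the ends, e.g. ' 30/2-4  S ' -> '30/2-4 S'."""
    return " ".join(name.split())


def cross_check(wells_of_interest, scraped_pairs):
    """Split the wells of interest into found (name, url) pairs and missing names."""
    wanted = {normalize_name(n) for n in wells_of_interest if n.strip()}
    found = [
        (normalize_name(n), url)
        for n, url in scraped_pairs
        if normalize_name(n) in wanted
    ]
    missing = sorted(wanted - {n for n, _ in found})
    return found, missing


# Invented sample data: names as they might appear in the CSV and on the page
csv_names = [" 30/2-1", "30/2-4  S", "99/9-9", ""]
page_pairs = [
    ("30/2-1", "https://example.invalid/a"),
    ("30/2-4 S", "https://example.invalid/b"),
    ("31/1-1", "https://example.invalid/c"),
]
found, missing = cross_check(csv_names, page_pairs)
print(found)
print(missing)  # ['99/9-9']
```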

## 1.3 Scraping wellbore history for the wells in "well list.csv" (all the wells in our area) and saving it as "wellbore_history.txt"

In [87]:
def scrape_wellbore_history(url):
    response = requests.get(url)
    html_content = response.text
    soup = BeautifulSoup(html_content, "html.parser")
    elements = soup.select('html > body > main > div > div:nth-of-type(2) > ul > li:nth-of-type(2) > *')
    processed_text = ""
    for element in elements:
        if element.name == 'span' and element.get('style', '') == 'font-weight:700;':
            processed_text += f"\n\nHeader: {element.text}\n"
        else:
            processed_text += element.text + ' '
    return processed_text.strip()

# Read existing content from the file
existing_content = ""
try:
    with open('wellbore_history.txt', 'r', encoding='utf-8') as file:
        existing_content = file.read()
except FileNotFoundError:
    # If the file does not exist, proceed as it will be created later
    pass

# Initialize a list to hold data for printing and writing, if not already fetched
well_data = []

# Fetch data once and store it, if not already in the file
for well_name, well_url in well_list:
    # Match the whole "Well: <name>" line so that e.g. "30/3-1" is not mistaken for "30/3-10 S"
    if f"Well: {well_name}\n" not in existing_content:
        additional_info = scrape_wellbore_history(well_url)
        well_data.append((well_name, additional_info))
        # Pause the execution for a specified number of seconds to avoid overloading the server
        time.sleep(4) # Being polite to the server :)

# Write new data to the file, if any
with open('wellbore_history.txt', 'a', encoding='utf-8') as file:  # Note the 'a' mode for appending
    for well_name, additional_info in well_data:
        text_to_write = f"Well: {well_name}\n{additional_info}\n" + "-" * 50 + "\n"
        if text_to_write.strip() not in existing_content:
            file.write(text_to_write)

# Check that the file exists after executing the script and print a message
if os.path.exists('wellbore_history.txt'):
    print("wellbore_history.txt already exists")

# Note: The content check is done before scraping to avoid unnecessary server requests.
# This implementation assumes that "Well: {well_name}" is a unique identifier for each entry.


wellbore_history.txt already exists
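
One way to make the skip check independent of substring matching is to collect the names already in the file into a set and compare them exactly, which keeps names that share a prefix (such as "30/3-1" and "30/3-10 S") apart. A minimal sketch, assuming every entry starts with a "Well: <name>" line and that no line of history text starts with "Well: ". `names_in_history` is a hypothetical helper and the sample text is invented.

```python
def names_in_history(history_text):
    """Collect well names from lines of the form 'Well: <name>'."""
    prefix = "Well: "
    return {
        line[len(prefix):].strip()
        for line in history_text.splitlines()
        if line.startswith(prefix)
    }


# Invented sample content in the layout of wellbore_history.txt
existing = "Well: 30/3-10 S\nSome history text.\n" + "-" * 50 + "\n"
already_scraped = names_in_history(existing)

print("Well: 30/3-1" in existing)      # True: the substring test matches the wrong well
print("30/3-1" in already_scraped)     # False: the exact test does not
print("30/3-10 S" in already_scraped)  # True
```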


## 1.4 Scraping General Information table from the Factpages

In [90]:
def scrape_table_data(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    row_data = []

    table = soup.find('table', class_='general-info-table')
    if table:
        rows = table.find_all('tr')[1:]  # Skip header row
        
        # List of row indices to be excluded
        excluded_rows_indices = [4]  # Example: Exclude row 4 (zero-indexed)
        
        for i, row in enumerate(rows):
            if i not in excluded_rows_indices:  # Check if the current row index is not in the list of excluded indices
                cells = row.find_all('td')
                if len(cells) == 2:
                    # Remove dropdown divs and buttons from cells[0]; iterate over a copy
                    # because decompose() modifies cells[0].contents
                    for content in list(cells[0].contents):
                        if not isinstance(content, Tag):
                            continue
                        is_dropdown = content.name == 'div' and 'uk-drop' in content.get('class', [])
                        if is_dropdown or content.name == 'button':
                            content.decompose()  # Remove the element and its contents
                    
                    attribute = ''.join(cells[0].stripped_strings)  # Reconstruct attribute text without unwanted parts

                    # Extract value
                    value_div = cells[1].find('div')
                    if value_div:
                        value = value_div.get_text(strip=True)
                    else:
                        value = cells[1].text.strip()

                    row_data.append([attribute, value])

    return row_data

def write_to_csv(well_name, data, file_path):
    file_exists = os.path.isfile(file_path)
    with open(file_path, 'a', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        if not file_exists:
            writer.writerow(['Attribute', 'Value'])  # Adjust column names as necessary
        for row in data:
            writer.writerow(row)

def write_to_txt(well_name, data, file_path):
    with open(file_path, 'a', newline='', encoding='utf-8') as txtfile:
        txtfile.write(f"{well_name}\n")
        for row in data:
            txtfile.write(f"{row[0]}: {row[1]}\n")
        txtfile.write('\n')

def check_well_exists(well_name, file_path):
    if not os.path.isfile(file_path):
        return False
    with open(file_path, 'r', newline='', encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            if well_name in row:
                return True
    return False

file_path_csv = 'general_information.csv'
file_path_txt = 'general_information.txt'

for well_name, well_url in well_list[:2]:
    in_csv = check_well_exists(well_name, file_path_csv)
    in_txt = check_well_exists(well_name, file_path_txt)
    # Scrape at most once per well, and only when at least one file lacks the data
    data = None if (in_csv and in_txt) else scrape_table_data(well_url)

    if in_csv:
        print(f"Data for {well_name} already exists in CSV - skipped")
    else:
        write_to_csv(well_name, data, file_path_csv)
        print(f"Data for {well_name} written to CSV.")

    if in_txt:
        print(f"Data for {well_name} already exists in TXT - skipped")
    else:
        write_to_txt(well_name, data, file_path_txt)
        print(f"Data for {well_name} written to TXT.")

    time.sleep(2) # Being polite to the server again :)

# Read the CSV file using Pandas and print the table
df = pd.read_csv(file_path_csv)
print(df)


Data for 30/2-1 already exists in CSV - skipped
Data for 30/2-1 already exists in TXT - skipped
Data for 30/2-2 already exists in CSV - skipped
Data for 30/2-2 already exists in TXT - skipped
         Attribute             Value
0    Wellbore name            30/2-1
1             Type       EXPLORATION
2          Purpose           WILDCAT
3           Status               P&A
4        Main area         NORTH SEA
..             ...               ...
75      EW degrees  2° 39' 51.65'' E
76      NS UTM [m]        6744240.64
77      EW UTM [m]         481749.07
78        UTM zone                31
79  NPDID wellbore               457

[80 rows x 2 columns]
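
The CSV stores all wells as one long Attribute/Value list, so building a per-well table means splitting it back into one record per well. A minimal sketch, assuming each well's block starts with a "Wellbore name" row (as row 0 of the printed table suggests). `group_by_well` is a hypothetical helper and the sample rows are invented in the same layout.

```python
import csv
import io


def group_by_well(rows, key_attribute="Wellbore name"):
    """Group (attribute, value) rows into one dict per well."""
    wells = {}
    current = None
    for attribute, value in rows:
        if attribute == key_attribute:
            current = wells.setdefault(value, {})  # Start (or reopen) this well's record
        if current is not None:
            current[attribute] = value
    return wells


# Invented sample in the layout of general_information.csv
sample_csv = io.StringIO(
    "Attribute,Value\n"
    "Wellbore name,30/2-1\n"
    "Type,EXPLORATION\n"
    "Wellbore name,30/2-2\n"
    "Type,EXPLORATION\n"
    "Purpose,WILDCAT\n"
)
reader = csv.reader(sample_csv)
next(reader)  # Skip the header row
records = group_by_well(row for row in reader if len(row) == 2)
print(records["30/2-2"]["Purpose"])  # WILDCAT
```

With the real file, `sample_csv` would be replaced by an open handle to "general_information.csv".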


## Test