# **Importing Libraries**

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

# **Web Scraping**

**Objective**

This notebook demonstrates how to scrape head circumference (HC) measurements from a web resource and use them to estimate gestational age. The data comes from the Global Library of Women's Medicine (GLOWM), an educational platform associated with the International Federation of Gynecology and Obstetrics (FIGO).

**Source**

URL: https://www.glowm.com/section-view/heading/Assessment%20of%20Gestational%20Age%20by%20Ultrasound/item/206#



In [2]:
# URL of the webpage to scrape
url = "https://www.glowm.com/section-view/heading/Assessment%20of%20Gestational%20Age%20by%20Ultrasound/item/206#"

# Send a GET request to fetch the webpage content (the timeout avoids hanging indefinitely)
response = requests.get(url, timeout=30)
response.raise_for_status()  # Raise an HTTPError if the server returned a 4xx/5xx status

# Parse the webpage content using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Function to find the <p> tag with the table heading text
def find_table_heading(soup):
    for p in soup.find_all('p', class_='tablehead'):
        if 'TABLE 5.Head Circumference Measurements Relative to Gestational Age' in p.get_text():
            return p
    return None

# Find the table heading
table_heading = find_table_heading(soup)

if table_heading:
    # Find the next <div> with class 'table-wrap' after the <p> tag
    table_div = table_heading.find_next('div', class_='table-wrap')

    if table_div:
        # Find the <table> within the <div>
        table = table_div.find('table')

        if table:
            # Extract headers from <thead> rows
            headers = []
            for header in table.find('thead').find_all('tr'):
                headers.extend([th.get_text(strip=True) for th in header.find_all('td')])
            
            # Extract table rows from <tbody>
            rows = []
            for row in table.find('tbody').find_all('tr'):
                cols = [td.get_text(strip=True) for td in row.find_all('td')]
                rows.append(cols)

            # Create a DataFrame
            df = pd.DataFrame(rows, columns=headers[:len(rows[0])])  # Ensure the number of headers matches the number of columns in rows

            # Save the DataFrame to a CSV file
            output_csv = 'head_circumference_table_5.csv'
            df.to_csv(output_csv, index=False)

            print(f'Table "TABLE 5. Head Circumference Measurements Relative to Gestational Age" has been saved to {output_csv}')
        else:
            print('Table not found within the <div> element.')
    else:
        print('Table <div> not found after the heading.')
else:
    print('Table heading not found.')


Table "TABLE 5. Head Circumference Measurements Relative to Gestational Age" has been saved to head_circumference_table_5.csv


# **Data Description**

The **head circumference (HC)** measurement is a key parameter for estimating gestational age.

The dataset includes:

1. Head Circumference (cm): The measurement of the fetal head circumference in centimeters.
2. Gestational Age (weeks): The corresponding gestational age in weeks.
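
As a quick sanity check on this structure, the sketch below stores a few (HC, age) pairs and verifies that both columns increase together, which is what a table lookup relies on. The ten pairs are the rows this notebook prints when previewing the scraped CSV; the helper name `is_strictly_increasing` is made up for illustration.

```python
# Excerpt of (head circumference in cm, gestational age in weeks) pairs,
# copied from the rows this notebook prints when previewing the scraped table.
pairs = [
    (8.0, 13.4), (8.5, 13.7), (9.0, 14.0), (9.5, 14.3), (10.0, 14.6),
    (22.5, 24.4), (23.0, 24.9), (23.5, 25.4), (24.0, 25.9), (24.5, 26.4),
]

def is_strictly_increasing(values):
    """Return True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

hc_cm = [hc for hc, _ in pairs]
ga_weeks = [ga for _, ga in pairs]

# A larger head circumference should always map to a later gestational age.
print(is_strictly_increasing(hc_cm), is_strictly_increasing(ga_weeks))
```

If either check came back `False`, the table would need cleaning before any lookup could be trusted.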

In [3]:
# Read the existing CSV into a DataFrame
df_original = pd.read_csv('/kaggle/working/head_circumference_table_5.csv')
# Display the first few rows to understand the structure
print("Original DataFrame:")
print(df_original.head())


Original DataFrame:
   Head  Menstrual  Head.1  Menstrual.1
0   8.0       13.4    22.5         24.4
1   8.5       13.7    23.0         24.9
2   9.0       14.0    23.5         25.4
3   9.5       14.3    24.0         25.9
4  10.0       14.6    24.5         26.4


# **Data Processing**

**Usage**

The head circumference measurement can be used to estimate the gestational age of the fetus by referring to the provided table. This process involves:

**Conversion:**

The head circumference values are converted from centimeters to millimeters (multiplied by 10) so that they can be compared directly with measurements given in millimeters.
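
The conversion itself is a multiplication by 10. A minimal sketch in plain Python, using head circumference values that appear in this notebook's table; the helper name `cm_to_mm` and the rounding to one decimal place are illustrative assumptions rather than part of the original cell:

```python
def cm_to_mm(hc_cm):
    """Convert a head circumference from centimeters to millimeters."""
    # 1 cm = 10 mm; rounding guards against floating-point noise in the product
    return round(hc_cm * 10, 1)

for hc_cm in (8.0, 8.5, 9.0, 9.5, 10.0):
    print(f"{hc_cm} cm -> {cm_to_mm(hc_cm)} mm")
```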

In [5]:
import pandas as pd

# Load the existing CSV file
input_csv = '/kaggle/working/head_circumference_table_5.csv'
output_csv = '/kaggle/working/head_circumference.csv'

# Read the existing CSV into a DataFrame
df = pd.read_csv(input_csv)

# Define new headers
new_headers = [
    "Head Circumference (cm)", "Gestational Age (weeks)",
    "Head Circumference (cm)", "Gestational Age (weeks)"
]

# Adjust the DataFrame columns
if df.shape[1] == 4:  # Ensure that the DataFrame has 4 columns
    df.columns = new_headers
    df = pd.DataFrame({
        "Head Circumference (cm)": pd.concat([df.iloc[:, 0], df.iloc[:, 2]]).reset_index(drop=True),
        "Gestational Age (weeks)": pd.concat([df.iloc[:, 1], df.iloc[:, 3]]).reset_index(drop=True)
    })
    
    # Save the reformatted DataFrame to a new CSV file
    df.to_csv(output_csv, index=False)
    print(f'CSV file has been reformatted and saved to {output_csv}')
else:
    print('Unexpected number of columns in the CSV file.')

# Read the new CSV into a DataFrame
df = pd.read_csv(output_csv)

# Display the first few rows to understand the structure
print("Original DataFrame:")
print(df.head())

# Assuming the column for head circumference in cm is named 'Head Circumference (cm)'
if 'Head Circumference (cm)' in df.columns:
    # Create a new column for head circumference in mm
    df['Head Circumference (mm)'] = df['Head Circumference (cm)'] * 10
    
    # Save the updated DataFrame to the same CSV file
    df.to_csv(output_csv, index=False)
    print(f'CSV file has been updated and saved to {output_csv}')
else:
    print('Column "Head Circumference (cm)" not found in the CSV file.')

# Read the updated CSV into a DataFrame to verify the changes
df = pd.read_csv(output_csv)

# Display the first few rows to verify the changes
print("Updated DataFrame:")
print(df.head())


CSV file has been reformatted and saved to /kaggle/working/head_circumference.csv
Original DataFrame:
   Head Circumference (cm)  Gestational Age (weeks)
0                      8.0                     13.4
1                      8.5                     13.7
2                      9.0                     14.0
3                      9.5                     14.3
4                     10.0                     14.6
CSV file has been updated and saved to /kaggle/working/head_circumference.csv
Updated DataFrame:
   Head Circumference (cm)  Gestational Age (weeks)  Head Circumference (mm)
0                      8.0                     13.4                     80.0
1                      8.5                     13.7                     85.0
2                      9.0                     14.0                     90.0
3                      9.5                     14.3                     95.0
4                     10.0                     14.6                    100.0


**Comparison:** 

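The given head circumference (in millimeters) is compared against the tabulated measurements: the estimate is the gestational age of the largest tabulated value that does not exceed the input. Inputs below or above the table's range are reported as `<` the smallest or `>` the largest tabulated age.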
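
One way to read this rule is as a floor lookup: pick the largest tabulated head circumference that does not exceed the input and report its gestational age. The sketch below applies that reading to the first five table rows (in millimeters) with the standard-library `bisect` module. The function name `lookup_ga` and the in-memory lists are assumptions for illustration, not the notebook's own implementation.

```python
from bisect import bisect_right

# First five rows of the table, head circumference already converted to mm
HC_MM = [80.0, 85.0, 90.0, 95.0, 100.0]
GA_WEEKS = [13.4, 13.7, 14.0, 14.3, 14.6]

def lookup_ga(hc_mm):
    """Return the gestational age for the largest tabulated HC <= hc_mm."""
    if hc_mm < HC_MM[0]:
        return f"< {GA_WEEKS[0]}"   # below the range of this excerpt
    if hc_mm > HC_MM[-1]:
        return f"> {GA_WEEKS[-1]}"  # above the range of this excerpt
    # bisect_right gives the insertion point after any equal entry,
    # so subtracting 1 lands on the last row with HC <= hc_mm
    return GA_WEEKS[bisect_right(HC_MM, hc_mm) - 1]

print(lookup_ga(92))   # falls between the 90 mm and 95 mm rows
print(lookup_ga(100))  # exact match on the last row of the excerpt
print(lookup_ga(79))   # below the excerpt's range
```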

In [14]:
def find_gestational_age(csv_file, input_hc_mm):
    # Load the CSV file into a DataFrame
    df = pd.read_csv(csv_file)

    # Drop incomplete rows (for example, padding left over if the two stacked
    # column pairs had different lengths), then sort by head circumference (mm)
    df = df.dropna(subset=['Head Circumference (mm)', 'Gestational Age (weeks)'])
    df = df.sort_values(by='Head Circumference (mm)').reset_index(drop=True)

    hc = df['Head Circumference (mm)']
    ga = df['Gestational Age (weeks)']

    # Inputs outside the table range are reported as open-ended bounds
    if input_hc_mm < hc.iloc[0]:
        return f"< {ga.iloc[0]}"
    if input_hc_mm > hc.iloc[-1]:
        return f"> {ga.iloc[-1]}"

    # Return the gestational age of the largest tabulated head circumference
    # that does not exceed the input (the last table row is included)
    idx = hc.searchsorted(input_hc_mm, side='right') - 1
    return ga.iloc[idx]


In [15]:
# Example usage
input_hc_mm = 359  # Example input head circumference in mm
csv_file = '/kaggle/working/head_circumference.csv'
gestational_age = find_gestational_age(csv_file,input_hc_mm)
print(f'The gestational age for head circumference {input_hc_mm} mm is: {gestational_age} weeks')

The gestational age for head circumference 359 mm is: 40.8 weeks
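
The lookup above returns the age of a tabulated row, so inputs that fall between two rows are effectively rounded down. As a possible refinement that this notebook does not implement, the sketch below interpolates linearly between the two neighboring rows. It uses the first five table rows only, and the function name `interpolate_ga` is hypothetical.

```python
from bisect import bisect_right

# First five rows of the table (head circumference in mm, gestational age in weeks)
HC_MM = [80.0, 85.0, 90.0, 95.0, 100.0]
GA_WEEKS = [13.4, 13.7, 14.0, 14.3, 14.6]

def interpolate_ga(hc_mm):
    """Linearly interpolate gestational age between neighboring table rows."""
    if not HC_MM[0] <= hc_mm <= HC_MM[-1]:
        raise ValueError("head circumference is outside the tabulated range")
    if hc_mm == HC_MM[-1]:
        return GA_WEEKS[-1]  # the last row has no right-hand neighbor
    i = bisect_right(HC_MM, hc_mm) - 1  # row at or just below the input
    fraction = (hc_mm - HC_MM[i]) / (HC_MM[i + 1] - HC_MM[i])
    return GA_WEEKS[i] + fraction * (GA_WEEKS[i + 1] - GA_WEEKS[i])

print(round(interpolate_ga(92), 2))  # 2/5 of the way from 14.0 to 14.3 weeks
```

Whether interpolation is appropriate for this table is a judgment call; the floor lookup stays closer to reading the published values directly.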


# **Conclusion**

Gestational age is a crucial parameter for identifying brain anomalies, in particular when assessing structures such as the cavum septi pellucidi (CSP) and the lateral ventricles (LV). Knowing the gestational age helps determine whether such a structure is absent because of a developmental anomaly or simply because it has not yet developed, which is vital for accurate diagnosis and for planning any intervention.