## **Data Preparation**

This notebook scrapes articles on maternal health care and the reproductive system from relevant websites.

World Health Organization

In [1]:
#libraries
import requests
from bs4 import BeautifulSoup
import re
import json
import os
from urllib.parse import urlparse
#from scripts.loggingsetup import error_logger, success_logger
print("Libraries imported")

Libraries imported


In [29]:
#function to fetch and save structured data from a web page

def get_data(url):
    try:
        # Get the data from the website
        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(url, headers=headers, timeout=30)  # Avoid hanging on an unresponsive server
        response.raise_for_status()  # Raise error if bad response

        # Parse the data using BeautifulSoup
        soup = BeautifulSoup(response.content, 'html.parser')

        # Find all sections with article content
        sections = soup.find_all("div", {"data-testid": "tabbed-article-section"})

        if not sections:
            print("❌ No article content found.")
            return None

        # Combine all the article text
        article_text = ""

        for section in sections:
            for tag in section.find_all(['h2', 'h3', 'p']):
                text = tag.get_text(strip=True)
                if not text:
                    continue
                # Ensure there is space between headings and paragraphs
                if tag.name == 'h2':
                    article_text += f"\n\n## {text}\n\n"
                elif tag.name == 'h3':
                    article_text += f"\n\n### {text}\n\n"
                else:  # paragraph
                    article_text += f"{text}\n\n"

        # Generate a file name based on URL path
        path = urlparse(url).path.strip('/')
        file_name = path.replace('/', '_') or 'index'
        file_path = f"{file_name}.txt"

        # Save to file in the data folder
        data_folder = r"C:\Projects_ML\AI-for-Maternal-HealthCare\data"
        os.makedirs(data_folder, exist_ok=True)

        full_path = os.path.join(data_folder, file_path)
        with open(full_path, 'w', encoding='utf-8') as f:
            f.write(article_text)

        print(f"✅ Data saved to {full_path}")
        return article_text

    except Exception as e:
        print(f"❌ Error processing {url}: {str(e)}")
        return None
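
The file-naming step in `get_data` can be checked on its own. The sketch below uses a hypothetical helper, `url_to_filename` (not part of the notebook), that repeats the same `urlparse` logic. It assumes only the URL path should contribute to the name, so query strings and fragments are dropped.

```python
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Hypothetical helper: build a .txt file name from a URL path."""
    # Keep only the path; urlparse splits off the query string and fragment
    path = urlparse(url).path.strip("/")
    # Flatten nested paths, falling back to 'index' for a bare domain
    file_name = path.replace("/", "_") or "index"
    return f"{file_name}.txt"


print(url_to_filename("https://psychcentral.com/depression/depression-busters-for-new-moms"))
print(url_to_filename("https://smartparenting.ng/women-pregnancy-tips/#section"))
print(url_to_filename("https://example.com/"))
```

Because only the path is used, two URLs that differ solely in their query string map to the same file, and the later call overwrites the earlier one.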


In [30]:
#data extraction
get_data("https://psychcentral.com/depression/depression-busters-for-new-moms")

✅ Data saved to C:\Projects_ML\AI-for-Maternal-HealthCare\data\depression_depression-busters-for-new-moms.txt


"Postpartum depression can affect anyone. Here are some tips for coping with it.\n\nThe birth of a baby can bring a lot of change to someone’s life. Some of these changes are expected, like lots of dirty diapers, new feeding schedules, and sleep deprivation.\n\nBut some changes are less expected — such as those that affect your mental and emotional health, likepostpartum depression (PPD).\n\nPPD is a form ofmajor depressive disorder (MDD)that develops in a parent in the year following the birth of their child. Many factors maycause postpartum depression.\n\n“It’s similar to major depression in that the main symptoms are feeling depressed and disinterested,” explains Kristin Calverley, a licensed psychologist in Texas certified in perinatal mental health and owner ofInner Balance Psychological Services.\n\nOther symptoms you might experience include, but aren’t limited, to:\n\nIf you’re experiencing any of these symptoms for several days or weeks on end, you might be experiencing PPD. W

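The output above contains fused words such as `likepostpartum` and `ofmajor`. This happens because `get_text(strip=True)` strips each text fragment before joining them, which discards the spaces around inline links. One possible fix, sketched here with a hypothetical `normalize_whitespace` helper, is to call `tag.get_text()` without `strip=True` and collapse the whitespace afterwards. The sample string is an assumption standing in for what `get_text()` might return for such a paragraph.

```python
def normalize_whitespace(raw: str) -> str:
    """Hypothetical helper: collapse runs of whitespace into single spaces."""
    # str.split() with no arguments splits on any whitespace and drops empties
    return " ".join(raw.split())


# Illustrative stand-in for tag.get_text() on a paragraph containing a link
raw = "\n  mental and emotional health, like postpartum depression (PPD).\n"
print(normalize_whitespace(raw))
```

This only restores spaces that are present in the page markup; text that has no whitespace in the HTML itself stays fused.
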
In [2]:
# Function 2: this site uses a different page structure,
# so it needs its own extraction function
def get_data2(url):
    try:
        import requests
        from bs4 import BeautifulSoup
        from urllib.parse import urlparse
        import os

        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(url, headers=headers, timeout=30)  # Avoid hanging on an unresponsive server
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')

        # Look for the main content div
        article_container = soup.find("div", class_="entry-content")
        if not article_container:
            print("❌ Could not find article container with class 'entry-content'.")
            return None

        # Extract text content from all paragraphs and headings
        article_text = ""
        for tag in article_container.find_all(['p', 'h2', 'h3']):
            text = tag.get_text(strip=True)
            if text:
                article_text += text + "\n\n"

        # Create a clean filename from the URL
        path = urlparse(url).path.strip('/')
        file_name = path.replace('/', '_') or 'index'
        file_path = f"{file_name}.txt"

        # Save to file
        data_folder = r"C:\Projects_ML\AI-for-Maternal-HealthCare\data" # Change this to your desired folder
        os.makedirs(data_folder, exist_ok=True)
        full_path = os.path.join(data_folder, file_path)

        with open(full_path, 'w', encoding='utf-8') as f:
            f.write(article_text)

        print(f"✅ Article saved to {full_path}")
        return article_text

    except Exception as e:
        print(f"❌ Error processing {url}: {str(e)}")
        return None
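
The output folder above is a hard-coded Windows path, which the inline comment asks readers to change. A minimal sketch of one alternative is to resolve the folder from an environment variable with a relative default. The variable name `MATERNAL_DATA_DIR` and the helper `resolve_data_folder` are assumptions for illustration, not part of the project.

```python
import os
import tempfile
from pathlib import Path


def resolve_data_folder(env=None, default: str = "data") -> Path:
    """Hypothetical helper: pick the output folder and make sure it exists."""
    # Read from the given mapping, or from the real environment by default
    env = os.environ if env is None else env
    # MATERNAL_DATA_DIR is an assumed variable name; fall back to ./data
    folder = Path(env.get("MATERNAL_DATA_DIR", default))
    folder.mkdir(parents=True, exist_ok=True)
    return folder


# Demonstrate with a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    folder = resolve_data_folder({"MATERNAL_DATA_DIR": os.path.join(tmp, "data")})
    print(folder.is_dir())
```

Inside the scraping functions, `resolve_data_folder() / file_path` would then replace the `os.path.join(data_folder, file_path)` call.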


In [3]:
get_data2("https://smartparenting.ng/how-to-stay-healthy-and-active-during-pregnancy/#:~:text=This%20blog%20post%20aims%20to%20provide%20practical%20tips,experience.%20Nigerian%20moms%20face%20unique%20challenges%20during%20pregnancy.")

✅ Article saved to C:\Projects_ML\AI-for-Maternal-HealthCare\data\how-to-stay-healthy-and-active-during-pregnancy.txt


"Introduction\n\nMaintaining health and fitness during pregnancy is crucial for both mothers and babies: how to stay healthy and active during pregnancy.A healthy lifestyle promotes fetal development and helps moms cope with labor and delivery.In Nigeria, some common health concerns include gestational diabetes, hypertension, and inadequate nutrition.Each of these can lead to complications, making awareness essential for pregnant women.\n\nThis blog post aims to provide practical tips for Nigerian mothers.We will cover nutrition, exercise, and mental well-being strategies.These simple yet effective methods can significantly enhance your pregnancy experience.\n\nUnderstanding Health Challenges\n\nNigerian moms face unique challenges during pregnancy.Poor nutrition is common due to economic constraints or limited access to healthy foods.This can lead to weight gain or deficiencies that can affect the baby’s health.\n\nAdditionally, the hot climate often leaves pregnant women dehydrated.D

In [4]:
get_data2("https://smartparenting.ng/women-pregnancy-tips/")

✅ Article saved to C:\Projects_ML\AI-for-Maternal-HealthCare\data\women-pregnancy-tips.txt


"Introduction\n\nWomen pregnancy tips highlight that pregnancy marks a significant chapter in a woman’s life.It brings excitement, hope, and transformation.However, it also introduces a range of physical and emotional challenges.For Nigerian women, cultural expectations often amplify these challenges.Many feel pressured to meet traditional roles while managing pregnancy.Family expectations and societal norms can create additional stress.\n\nNigerian culture places high value on motherhood.Women may feel the weight of expectations to embody strength and resilience.These societal pressures can lead to anxiety during pregnancy.Furthermore, the availability of healthcare varies across regions in Nigeria.Women in rural areas may face more obstacles accessing prenatal care and support.This context emphasizes the need for practical guidance tailored to their experiences.\n\nThe purpose of this blog is to offer valuable and culturally relevant tips.Our aim is to help Nigerian women navigate th

In [5]:
# Function 3: uses the same section selector as get_data,
# but also captures list items (<li>) as bullet points
def get_data3(url):
    try:
        import requests
        from bs4 import BeautifulSoup
        from urllib.parse import urlparse
        import os

        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(url, headers=headers, timeout=30)  # Avoid hanging on an unresponsive server
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')

        # Find all sections with article content
        sections = soup.find_all("div", {"data-testid": "tabbed-article-section"})
        if not sections:
            print("❌ No article content found.")
            return None

        # Extract text content from all paragraphs and headings
        article_text = ""
        for section in sections:
            # Look for all paragraphs, headings and bullet points in this section
            for tag in section.find_all(['h2', 'h3', 'p', 'li']):
                text = tag.get_text(strip=True)
                if not text:
                    continue

                if tag.name == 'h2':
                    article_text += f"\n\n## {text}\n\n"
                elif tag.name == 'h3':
                    article_text += f"\n\n### {text}\n\n"
                elif tag.name == 'li':
                    article_text += f"- {text}\n"
                else:  # paragraph
                    article_text += f"{text}\n\n"

        # Generate file name
        path = urlparse(url).path.strip('/')
        file_name = path.replace('/', '_') or 'index'
        file_path = f"{file_name}.txt"

        data_folder = r"C:\Projects_ML\AI-for-Maternal-HealthCare\data"
        os.makedirs(data_folder, exist_ok=True)
        full_path = os.path.join(data_folder, file_path)

        with open(full_path, 'w', encoding='utf-8') as f:
            f.write(article_text)

        print(f"✅ Data saved to {full_path}")
        return article_text

    except Exception as e:
        print(f"❌ Error processing {url}: {str(e)}")
        return None
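
`get_data`, `get_data2`, and `get_data3` differ mainly in which container they search and which tags they keep. One way to avoid writing a new function per site is a lookup table keyed by hostname. The sketch below covers only the lookup; the table name `SITE_RULES`, the helper `rules_for`, and the dictionary layout are hypothetical, while the selectors and tag lists are copied from the three functions above.

```python
from urllib.parse import urlparse

# Hypothetical lookup table; selectors and tag lists mirror the functions above
SITE_RULES = {
    "psychcentral.com": {
        "container": {"name": "div", "attrs": {"data-testid": "tabbed-article-section"}},
        "tags": ["h2", "h3", "p"],
    },
    "smartparenting.ng": {
        "container": {"name": "div", "attrs": {"class": "entry-content"}},
        "tags": ["p", "h2", "h3"],
    },
    "www.healthline.com": {
        "container": {"name": "div", "attrs": {"data-testid": "tabbed-article-section"}},
        "tags": ["h2", "h3", "p", "li"],
    },
}


def rules_for(url: str) -> dict:
    """Return the scraping rules for a URL's hostname, or raise if unknown."""
    host = urlparse(url).hostname or ""
    try:
        return SITE_RULES[host]
    except KeyError:
        raise ValueError(f"No scraping rules defined for host: {host!r}") from None


rules = rules_for("https://www.healthline.com/health/depression/how-to-deal-with-postpartum-depression")
print(rules["tags"])
```

A single scraping function could then pass `rules["container"]` as keyword arguments to `soup.find_all(...)` and `rules["tags"]` to each section's `find_all(...)`, so supporting a new site means adding one table entry.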

In [6]:
get_data3("https://www.healthline.com/health/depression/how-to-deal-with-postpartum-depression")

✅ Data saved to C:\Projects_ML\AI-for-Maternal-HealthCare\data\health_depression_how-to-deal-with-postpartum-depression.txt


'The period after you have your baby can be filled with countless emotions, including sadness. If your feelings of sadness become severe and start to interfere with your everyday life, you may be experiencing postpartum depression (PPD).\n\nSymptoms of postpartum depression (PPD) usually start within a few weeks of delivery, though they may develop up to 6 months afterward. They may include changes in mood, trouble bonding with your baby, and difficulty thinking or making decisions.\n\nIf you feel like you may be depressed, you aren’t alone. Many new parents experience depression after giving birth.\n\nThe most effective way to diagnose and treat PPD is by visiting your doctor. They can evaluate your symptoms and devise the best treatment plan for you. You may benefit from psychotherapy, antidepressants, or some combination of both.\n\nThere are also things you can do at home to help cope with everyday life. Keep reading for more on how to manage PPD.\n\n\n\n## 1. Exercise when you can