# HTML to Markdown Converter

This script converts an HTML page into Markdown using Python. It uses `requests` to fetch pages from a URL, `beautifulsoup4` to parse the HTML, and `markdownify` to convert the parsed HTML to Markdown.

## Requirements

Ensure you have the following libraries installed:

```bash
pip install beautifulsoup4 markdownify requests
```

In [None]:
# Install required packages
%pip install requests beautifulsoup4 markdownify

In [None]:
# Import Necessary Libraries
import requests
from bs4 import BeautifulSoup
from markdownify import markdownify as md

In [None]:
# Define Helper Functions
# HTML to Markdown Conversion Function
def html_to_markdown(html_content):
    """
    Converts HTML content to Markdown format.

    Parameters:
        html_content (str): The HTML content as a string.

    Returns:
        str: The converted Markdown content.
    """
    # Parse and re-serialize the HTML so malformed markup is normalized before conversion
    soup = BeautifulSoup(html_content, 'html.parser')
    markdown_content = md(str(soup))
    return markdown_content
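
Pages fetched from the web often carry `<script>` and `<style>` blocks whose contents may end up in the Markdown as stray text. The following is a minimal sketch of a pre-cleaning step, assuming those two tags are the ones to drop; `strip_non_content_tags` is a hypothetical helper that is not part of the original script.

```python
from bs4 import BeautifulSoup


def strip_non_content_tags(html_content, tags=("script", "style")):
    """Return the HTML with the given tags and their contents removed.

    Hypothetical helper for illustration; the default tag list is an assumption.
    """
    soup = BeautifulSoup(html_content, "html.parser")
    # find_all accepts a list of tag names; decompose() removes a tag and its children
    for tag in soup.find_all(list(tags)):
        tag.decompose()
    return str(soup)


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Hello</p><script>alert(1)</script></body></html>"
)
cleaned = strip_non_content_tags(sample)
```

The cleaned string can then be passed to `html_to_markdown` in place of the raw HTML.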

In [None]:
# Fetch HTML Content Function
def fetch_html(url):
    """
    Fetches HTML content from a given URL.

    Parameters:
        url (str): The URL of the HTML page.

    Returns:
        str: The HTML content as a string.
    """
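    # A timeout keeps the call from hanging indefinitely on an unresponsive server
    response = requests.get(url, timeout=10)
    # Raise requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()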
    return response.text

In [None]:
# Read HTML Content from Local File Function
def read_html_file(file_path):
    """
    Reads HTML content from a local file.

    Parameters:
        file_path (str): The path to the local HTML file.

    Returns:
        str: The HTML content as a string.
    """
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()
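
`read_html_file` assumes the file is UTF-8 and raises `UnicodeDecodeError` otherwise. Saved web pages are sometimes in a legacy encoding, so one possible variant tries a short list of encodings in order. This is a sketch only: `read_html_file_with_fallback` and its default encoding list are assumptions, not part of the original script.

```python
from pathlib import Path


def read_html_file_with_fallback(file_path, encodings=("utf-8", "cp1252")):
    """Read a local HTML file, trying each encoding in turn.

    Hypothetical variant of read_html_file; the encoding list is an assumption.
    """
    raw = Path(file_path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # This encoding does not fit the bytes; try the next one
            continue
    # Last resort: decode as UTF-8 and substitute undecodable bytes
    return raw.decode("utf-8", errors="replace")
```

Trying UTF-8 first matters because almost any byte sequence decodes "successfully" as a single-byte encoding, which would silently garble valid UTF-8 text.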

In [None]:
# Save Content to File Function
def save_to_file(content, filename):
    """
    Saves content to a file.

    Parameters:
        content (str): The content to save.
        filename (str): The name of the file to save the content.
    """
    with open(filename, 'w', encoding='utf-8') as file:
        file.write(content)

In [None]:
# Main Function
def main(source, source_type):
    """
    Main function to convert an HTML page to Markdown.

    Parameters:
        source (str): The source URL or file path of the HTML content.
        source_type (str): The type of the source - 'url' or 'file'.
    """
    # Fetch or read HTML content based on the source type
    if source_type == 'url':
        html_content = fetch_html(source)
    elif source_type == 'file':
        html_content = read_html_file(source)
    else:
        print("Invalid source type. Please choose 'url' or 'file'.")
        return

    # Convert HTML content to Markdown
    markdown_content = html_to_markdown(html_content)

    # Print the Markdown content
    print("Markdown Content:\n=================================================\n")
    print(markdown_content)

    # Save Markdown content to a file
    output_filename = 'output.md'
    save_to_file(markdown_content, output_filename)

    print(f'\n\n=================================================\nMarkdown content saved to {output_filename}')
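
Because `main` always writes to `output.md`, each run overwrites the previous result. One way around this is to derive the output name from the source. The sketch below assumes the last path segment (or the host name, for a bare URL) makes a reasonable file name; `default_output_name` is a hypothetical helper and not part of the original script.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def default_output_name(source, source_type):
    """Suggest a Markdown file name based on the source.

    Hypothetical helper for illustration; the naming scheme is an assumption.
    """
    if source_type == "url":
        parsed = urlparse(source)
        # Use the last URL path segment, or the host name when the path is empty
        stem = PurePosixPath(parsed.path).stem or (parsed.hostname or "")
    else:
        stem = Path(source).stem
    # Fall back to the original fixed name when nothing usable was found
    return f"{stem or 'output'}.md"


print(default_output_name("sample.html", "file"))
print(default_output_name("https://example.com/docs/page.html", "url"))
```

The returned name could replace the hard-coded `output_filename` inside `main`.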


In [None]:
# Example usage
source = 'sample.html'  # a local file path, or a URL when source_type is 'url'
source_type = 'file'  # 'url' or 'file'

# Run the main function
main(source, source_type)
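
Outside a notebook, the same two values could come from the command line instead of being edited in the cell. The following is a minimal sketch using `argparse` from the standard library; the `parse_args` function and the `--source-type` flag are hypothetical names chosen for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the source and source type from command-line arguments.

    Hypothetical wrapper; the argument names are assumptions.
    """
    parser = argparse.ArgumentParser(description="Convert an HTML page to Markdown.")
    parser.add_argument("source", help="URL or local path of the HTML page")
    parser.add_argument(
        "--source-type",
        choices=["url", "file"],
        default="file",
        help="how to interpret the source (default: file)",
    )
    # argv=None makes argparse read sys.argv[1:]; a list can be passed for testing
    return parser.parse_args(argv)


args = parse_args(["sample.html", "--source-type", "file"])
# In a script, the parsed values would then be handed to main:
# main(args.source, args.source_type)
```

Restricting `--source-type` with `choices` lets `argparse` reject invalid values before `main` runs.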
