# JINA_READER

## Overview
JINA_READER fetches a web page through the Jina Reader API (https://jina.ai/reader/) and returns its main content as Markdown. Jina Reader extracts the readable body of a page, such as an article, company profile, or news story, and outputs clean Markdown that can be passed directly to downstream workflows.

This function is a starting point for extracting specific information, summarizing content, or other text-processing tasks directly in Excel, so business users can analyze or reference web-based information without leaving the spreadsheet. Jina Reader is designed to drop clutter such as advertisements and navigation elements and keep the core readable content, which makes the function a good fit for automating research and reporting and for bringing web data into business processes.

## Usage
To use the `JINA_READER` function in Excel, enter it as a formula in a cell, specifying the URL of the web page you want to fetch. Optionally, you can provide an API key if you have one:

```excel
=JINA_READER(url, [api_key])
```

## Arguments
| Argument | Type   | Required | Description                                  | Example |
|:---|:---|:---|:---|:---|
| url       | string | Yes      | The full URL of the web page to fetch        | "https://www.ycombinator.com/companies/airbnb" |
| api_key   | string | No       | Jina API key, sent as a Bearer token in the `Authorization` header; omit it for unauthenticated requests | "your_api_key" |

## Returns
| Returns | Type   | Description                                                                    | Example |
|:---|:---|:---|:---|
| Content      | string | The main content of the web page in Markdown format, extracted by Jina Reader.  | "# Airbnb..." |
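| Error        | string | Error message, prefixed with `Error:`, if the URL is empty or not a string, the request fails, the API returns a non-200 HTTP status, or no content comes back. | "Error: Invalid URL" |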
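
Because both outcomes come back as plain strings, a caller has to inspect the text to tell content from failure. The following is a minimal sketch, assuming every failure string starts with the `Error:` prefix shown in the example column; `is_reader_error` and `content_or_default` are hypothetical helper names and are not part of the function itself.

```python
def is_reader_error(result):
    """Return True if a JINA_READER result looks like an error message.

    Assumption: failures are reported as strings beginning with "Error:".
    """
    return isinstance(result, str) and result.startswith("Error:")


def content_or_default(result, default=""):
    """Return the Markdown content, or `default` when the result is an error."""
    return default if is_reader_error(result) else result


print(is_reader_error("Error: Invalid URL"))  # True
print(content_or_default("# Airbnb..."))      # # Airbnb...
```

A page whose own text happens to begin with `Error:` would be misclassified by this check, so treat it as a convenience rather than a guarantee.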

## Examples

### Company Analysis for Market Research

**Sample Input:**
| URL                                      | API Key      |
|-------------------------------------------|--------------|
| https://www.ycombinator.com/companies/airbnb | (optional)   |

**Sample Call:**
```excel
=JINA_READER("https://www.ycombinator.com/companies/airbnb")
=JINA_READER("https://www.ycombinator.com/companies/airbnb", "your_api_key")
```

**Sample Output:**
Returns the extracted content of the Airbnb company page, including its business model and company history, as Markdown.
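
Since the output is Markdown, simple text processing can pull out specific parts of it. As one possible sketch, the hypothetical helper `extract_headings` below lists the ATX-style headings (`#` through `######`) in a Markdown string. The sample text is made up for illustration and is not real output from the Airbnb page.

```python
import re


def extract_headings(markdown_text):
    """Return (level, title) tuples for ATX-style Markdown headings."""
    headings = []
    for line in markdown_text.splitlines():
        # One to six '#' characters, at least one space, then the title.
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


# Made-up sample standing in for text returned by the function.
sample = "# Airbnb\n\nIntro text.\n\n## Company History\n\nMore text.\n"
print(extract_headings(sample))
# [(1, 'Airbnb'), (2, 'Company History')]
```

This sketch does not skip lines inside fenced code blocks, so a `#` comment in embedded code would be reported as a heading.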

In [None]:
def jina_reader(url, api_key=None):
    """
    Returns web page content in markdown format using Jina. Useful as a starting point for extraction, summarization, etc.

    Args:
        url (str): The full URL to fetch.
        api_key (str, optional): API key for authentication. Default is None.

    Returns:
        str: The content of the response from the URL, or an error message string if the request fails or input is invalid.
    """
    import requests
    if not isinstance(url, str) or not url.strip():
        return "Error: Invalid URL"
    headers = {
        "X-Retain-Images": "none"
    }
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    base_url = "https://r.jina.ai/"
    # Strip surrounding whitespace so the validated value is the one that is sent.
    full_url = base_url + url.strip()
    try:
        response = requests.get(full_url, headers=headers, timeout=15)
        if response.status_code != 200:
            return f"Error: HTTP {response.status_code} - {response.reason}"
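        # Keep only the text after the first 'Markdown Content:' marker.
        # maxsplit=1 preserves any later occurrence of the marker in the page body.
        try:
            content = response.text.split("Markdown Content:", 1)[1]
        except IndexError:
            # Marker not present: fall back to the full response body.
            content = response.text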
        return content.strip() if content.strip() else "Error: No content returned"
    except requests.exceptions.RequestException as e:
        return f"Error: {str(e)}"
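
The marker handling in the function above can be tried offline, without calling the API. The sketch below uses `str.partition` as a swapped-in alternative to `split`; it keeps everything after the first `Markdown Content:` marker and falls back to the full text when the marker is missing. The helper name `extract_markdown_body` is hypothetical, and the header layout of the sample response (`Title:`, `URL Source:`) is an assumption used only for illustration.

```python
def extract_markdown_body(response_text, marker="Markdown Content:"):
    """Return the text after the first marker, or the whole text if it is absent."""
    _, found, body = response_text.partition(marker)
    return (body if found else response_text).strip()


# Illustrative response text; the header layout is an assumption.
sample = (
    "Title: Example Page\n\n"
    "URL Source: https://example.com\n\n"
    "Markdown Content:\n"
    "# Example Page\n\nBody text.\n"
)
print(extract_markdown_body(sample))
print(extract_markdown_body("plain text without a marker"))
```

Because `partition` splits only on the first occurrence, a page that itself contains the phrase `Markdown Content:` keeps the rest of its body intact.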

In [None]:
import ipytest
ipytest.autoconfig()

import pytest

def test_company_page_content():
    url = "https://www.ycombinator.com/companies/airbnb"
    result = jina_reader(url)
    assert isinstance(result, str)
    assert len(result) > 0
    assert not result.startswith("Error:")
    assert any(x in result for x in ["Airbnb", "accommodation", "travel"])

def test_educational_page_content():
    url = "https://en.wikipedia.org/wiki/Microsoft_Excel"
    result = jina_reader(url)
    assert isinstance(result, str)
    assert len(result) > 0
    assert not result.startswith("Error:")
    assert any(x in result for x in ["Excel", "spreadsheet", "Microsoft"])

def test_error_handling():
    result = jina_reader("")
    assert isinstance(result, str)
    assert result.startswith("Error:")
    result2 = jina_reader("non-existent-website-12345.com")
    assert isinstance(result2, str)
    assert result2.startswith("Error:")

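def test_api_key_optional():
    import os
    # Read the key from the environment instead of hardcoding a secret.
    # JINA_API_KEY is an assumed variable name; adjust it to your setup.
    api_key = os.environ.get("JINA_API_KEY")
    if not api_key:
        pytest.skip("JINA_API_KEY is not set")
    url = "https://www.ycombinator.com/companies/airbnb"
    result = jina_reader(url, api_key=api_key)
    assert isinstance(result, str)
    assert len(result) > 0
    assert not result.startswith("Error:")
    assert any(x in result for x in ["Airbnb", "accommodation", "travel"])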

ipytest.run()

In [None]:
# Interactive Demo
import gradio as gr

def gradio_jina_reader(url, api_key=None):
    return jina_reader(url, api_key=api_key)

demo_cases = [
    ["https://www.ycombinator.com/companies/airbnb", None],
    ["https://en.wikipedia.org/wiki/Microsoft_Excel", None]
]

demo = gr.Interface(
    fn=gradio_jina_reader,
    inputs=[
        gr.Textbox(label="URL", value="https://www.ycombinator.com/companies/airbnb"),
        gr.Textbox(label="API Key (optional)", value="")
    ],
    outputs=gr.Markdown(label="Extracted Content (Markdown)"),
    examples=demo_cases,
    description="Fetch the main content of a web page as Markdown using the Jina Reader API. Optionally provide an API key if required.",
    flagging_mode="never",
)
demo.launch()