<a target="_blank" href="https://colab.research.google.com/github/cohere-ai/notebooks/blob/main/notebooks/guides/Cohere_RAG_Inline_Citation_Markdown_Generator.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
pip install --quiet cohere

# Cohere RAG Inline Citation Markdown Generator

The helper function below takes a chat response and its citations and returns the response with inline citations in two forms: a markdown string and a styled HTML string. Optional parameters control the text color, link color, font family, and line height of the output.

### Parameters:

- `non_cited_response` (str): The response text without citations.
- `citations` (list): List of citation objects. Each citation object should have the attributes `start`, `end`, `text`, and `document_ids`.
- `doc_url_mapping` (dict): A dictionary mapping document IDs to their URLs.
- `text_color` (str, optional): The color for the cited text. Default is "SkyBlue".
- `link_color` (str, optional): The color for the links. Default is "green".
- `font_family` (str, optional): The font family for the response text. Default is "Arial".
- `line_height` (str, optional): The line height for the response text. Default is "1.6".

### Returns:

A tuple containing:

- `markdown_response` (str): The response formatted as markdown.
- `html_response` (str): The response formatted as HTML.

### Markdown Formatting:

- Cited Text: Cited text within the response is styled using `<span>` tags with the specified `text_color`.
- Links: Links to the cited documents are created using `<a>` tags with the specified `link_color`.
- Bullet Points: Bullet points are preserved by converting them to `<li>` tags.
- Line Breaks: Line breaks are preserved by converting `\n` characters to `<br>` tags.
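
Because each citation's `start` and `end` index the original, uncited text, every replacement that changes the text length shifts the positions of all later citations. The sketch below isolates that bookkeeping. It assumes citations do not overlap; `Citation` and `add_markers` are hypothetical names used only for illustration, and square brackets stand in for the HTML that the helper produces.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical stand-in for a citation object (illustration only)."""
    start: int
    end: int
    text: str


def add_markers(response: str, citations: list[Citation]) -> str:
    """Wrap each cited span in [brackets], tracking the growing offset."""
    result = response
    offset = 0  # how much longer `result` is than `response` so far
    # Process spans left to right so earlier insertions shift later ones.
    for c in sorted(citations, key=lambda c: c.start):
        start, end = c.start + offset, c.end + offset
        replacement = f"[{c.text}]"
        result = result[:start] + replacement + result[end:]
        # Grow the offset by the number of characters the splice added.
        offset += len(replacement) - (c.end - c.start)
    return result


text = "Plants use sunlight and water."
cites = [Citation(11, 19, "sunlight"), Citation(24, 29, "water")]
marked = add_markers(text, cites)
print(marked)
```

Without the running `offset`, the second splice would land two characters too early, because the first replacement already lengthened the string.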

In [None]:
import re
def generate_response(non_cited_response, citations, doc_url_mapping, text_color="SkyBlue", link_color="green", font_family="Arial", line_height="1.6"):
    """
    Generate a formatted response with citations in markdown and HTML.

    Parameters:
    non_cited_response (str): The response text without citations.
    citations (list): List of citation objects with start, end, text, and document_ids attributes.
    doc_url_mapping (dict): Mapping of document IDs to URLs.
    text_color (str, optional): Color for the cited text. Default is "SkyBlue".
    link_color (str, optional): Color for the links. Default is "green".
    font_family (str, optional): Font family for the response text. Default is "Arial".
    line_height (str, optional): Line height for the response text. Default is "1.6".

    Returns:
    tuple: A tuple containing the markdown response and the HTML response.
    """
    markdown_response = non_cited_response
    offset = 0
    for citation in citations:
        start = citation.start + offset
        end = citation.end + offset
        text = citation.text
        doc_links = ", ".join([f'<a href="{doc_url_mapping[doc_id.split("_")[1]]}" target="_blank" style="color: {link_color}; text-decoration: none;">{int(doc_id.split("_")[1])}</a>' for doc_id in citation.document_ids])
        replacement = f'<span style="color:{text_color}; font-weight: normal;">{text}</span> [{doc_links}]'
        markdown_response = markdown_response[:start] + replacement + markdown_response[end:]
        offset += len(replacement) - len(text)

    # Preserve bullet points and line breaks. Bullets are converted line by line
    # *before* joining with <br>; replacing "\n" first would leave a single line,
    # so the bullet pattern would match from the first bullet to the end of the text.
    lines = []
    for line in markdown_response.split("\n"):
        bullet = re.match(r'\s*[-*]\s+(.*)', line)
        lines.append(f"<li>{bullet.group(1)}</li>" if bullet else line)
    markdown_response = "<br>".join(lines)

    html_response = f"""
    <style>
        p {{
            font-family: {font_family};
            line-height: {line_height};
        }}
        a {{
            text-decoration: none;
            color: {link_color};
        }}
        span {{
            color: {text_color};
            font-weight: bold;
        }}
    </style>
    <p>{markdown_response}</p>
    """

    return markdown_response, html_response
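
Note that the function above looks up each URL with `doc_id.split("_")[1]`, so `doc_url_mapping` has to be keyed by the suffix of the document ID, as a string. A minimal offline sketch of that convention follows. The IDs and URLs are made-up placeholders, the `web-search_<n>` ID shape is an assumption inferred from the `split("_")[1]` and `int(...)` calls rather than a documented guarantee, and `link_labels` is a hypothetical helper.

```python
# Placeholder documents (not real API output).
documents = [
    {"id": "web-search_0", "url": "https://example.com/a"},
    {"id": "web-search_1", "url": "https://example.com/b"},
]

# Key each URL by the part of the ID after the first underscore.
doc_url_mapping = {doc["id"].split("_")[1]: doc["url"] for doc in documents}


def link_labels(document_ids):
    """Return (label, url) pairs in the way the helper derives them."""
    pairs = []
    for doc_id in document_ids:
        key = doc_id.split("_")[1]
        pairs.append((int(key), doc_url_mapping[key]))
    return pairs


print(link_labels(["web-search_1", "web-search_0"]))
```

If an ID lacks an underscore or has a non-numeric suffix, the lookup raises `IndexError` or `ValueError`, so IDs of a different shape would need their own parsing.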

# Initiate Cohere Stream Chat with Web Connector

Try it out by running the cell below.

In [None]:
import cohere
from IPython.display import HTML, display
from google.colab import userdata
COHERE_API_KEY = userdata.get('COHERE_API_KEY')


# Initialize Cohere client
co = cohere.Client(api_key=COHERE_API_KEY)

query = "Explain the process of photosynthesis"

# Call Cohere Chat with Streamed Response
stream = co.chat_stream(
    model='command-r-plus',
    message=query,
    temperature=0,
    chat_history=[],
    prompt_truncation='AUTO',
    connectors=[{"id":"web-search"}]
)

non_cited_response = ""
citations = []
documents = []

print("Streamed chat response:")
for event in stream:
    # Handle event types
    if event.event_type == "text-generation":
        non_cited_response += event.text
        print(event.text, end='')
    if event.event_type == "citation-generation":
        citations.extend(event.citations)
    if event.event_type == "stream-end":
        documents = event.response.documents

doc_url_mapping = {doc['id'].split("_")[1]: doc['url'] for doc in documents}

# Call helper function
markdown_response, html_response = generate_response(non_cited_response, citations, doc_url_mapping)

# Save markdown file
with open("response.md", "w") as file:
    file.write(markdown_response)

# Streamed chunks were printed with end='', so start the label on a new line
print("\nMarkdown Response:")

# Display the generated HTML response
display(HTML(html_response))

Streamed chat response:
Photosynthesis is the process by which plants, algae, and some types of bacteria use sunlight, water, and carbon dioxide to create oxygen and energy in the form of sugar. 

During photosynthesis, plants take in carbon dioxide and water from the air and soil. Within the plant cell, the water is oxidised, meaning it loses electrons, while the carbon dioxide is reduced, meaning it gains electrons. This transforms the water into oxygen and the carbon dioxide into glucose. The plant then releases the oxygen back into the air and stores energy within the glucose molecules.

The process can be broken down into two stages: light-dependent reactions and light-independent reactions. Light-dependent reactions take place within the thylakoid membrane and require a steady stream of sunlight. The light-independent stage, also known as the Calvin cycle, takes place in the stroma and does not require light.
Markdown Response:
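
To exercise the event-handling loop without an API key or network access, the stream can be replaced by a list of stand-in events. This is a rough sketch, assuming events expose the `event_type`, `text`, and `citations` attributes used in the cell above. The contents of `fake_stream` are invented for illustration and are not real API output, and `collect` is a hypothetical helper.

```python
from types import SimpleNamespace

# Invented events that mimic the attribute names read by the loop above.
fake_stream = [
    SimpleNamespace(event_type="text-generation", text="Plants use "),
    SimpleNamespace(event_type="text-generation", text="sunlight."),
    SimpleNamespace(
        event_type="citation-generation",
        citations=[
            SimpleNamespace(start=11, end=19, text="sunlight",
                            document_ids=["web-search_0"])
        ],
    ),
]


def collect(stream):
    """Accumulate the response text and citations from a stream of events."""
    text, citations = "", []
    for event in stream:
        if event.event_type == "text-generation":
            text += event.text
        elif event.event_type == "citation-generation":
            citations.extend(event.citations)
    return text, citations


non_cited_response, citations = collect(fake_stream)
first = citations[0]
print(non_cited_response[first.start:first.end])
```

Checking that `non_cited_response[start:end]` equals the citation's `text` is a quick way to confirm the offsets line up before passing them to `generate_response`.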
