<a href="https://colab.research.google.com/github/MoritzLaurer/rag-demo/blob/master/rag_llamaindex_ai_law.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Overview of RAG with LlamaIndex and Hugging Face

In [1]:
!pip install --upgrade pip -q
!pip install llama-index~=0.9.32
!pip install langchain~=0.1.0
!pip install transformers~=4.36.0
!pip install huggingface_hub~=0.20.2
!pip install sentence_transformers~=2.2.2
!pip install chromadb~=0.4.22  # vector database
!pip install pypdf  # simple pdf reader
!pip install PyMuPDF~=1.23.7  # faster pdf reader


Collecting llama-index~=0.9.32
  Downloading llama_index-0.9.34-py3-none-any.whl.metadata (8.4 kB)
Collecting beautifulsoup4<5.0.0,>=4.12.2 (from llama-index~=0.9.32)
  Downloading beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting dataclasses-json (from llama-index~=0.9.32)
  Downloading dataclasses_json-0.6.3-py3-none-any.whl.metadata (25 kB)
Collecting deprecated>=1.2.9.3 (from llama-index~=0.9.32)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl.metadata (5.4 kB)
Collecting httpx (from llama-index~=0.9.32)
  Downloading httpx-0.26.0-py3-none-any.whl.metadata (7.6 kB)
Collecting openai>=1.1.0 (from llama-index~=0.9.32)
  Downloading openai-1.9.0-py3-none-any.whl.metadata (18 kB)
Collecting tiktoken>=0.3.3 (from llama-index~=0.9.32)
  Downloading tiktoken-0.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Coll

### Load and read PDFs

In [2]:
## download PDF data
import os
import zipfile
import requests
from io import BytesIO

# URL of the zip file in your GitHub repo (make sure it's the raw file URL)
zip_url = 'https://github.com/MoritzLaurer/rag-demo/blob/master/data/position-papers-pdfs.zip?raw=true'

# Download the zip file
print("Downloading zip file...")
response = requests.get(zip_url)
response.raise_for_status()  # fail early if the download did not succeed
zip_content = BytesIO(response.content)

# Define the extraction path
extract_path = '/content/data'

# Create directory if it doesn't exist
os.makedirs(extract_path, exist_ok=True)

# Extract the zip file
print("Extracting zip file...")
with zipfile.ZipFile(zip_content, 'r') as zip_ref:
    zip_ref.extractall(extract_path)

print("Extraction completed.")

file_paths = [f for f in os.listdir(extract_path) if os.path.isfile(os.path.join(extract_path, f))]
print(f"{len(file_paths)} PDF files downloaded.")


Downloading zip file...
Extracting zip file...
Extraction completed.
440 PDF files downloaded.


In [3]:
# parse the raw PDFs into machine-readable docs

# LlamaIndex PDF document reader (is quite slow)
#from llama_index import SimpleDirectoryReader
#docs = SimpleDirectoryReader("./data").load_data()

In [4]:
# langchain PDF reading with PyMuPDF is faster
from langchain.document_loaders import PyMuPDFLoader
from tqdm.notebook import tqdm

directory = "./data"

docs = []
for pdf_path in tqdm(os.listdir(directory)):
  try:
    docs.append(PyMuPDFLoader(os.path.join(directory, pdf_path)).load())
  except Exception as e:
    print("Exception: ", e)

print("Number of PDFs: ", len(docs))

# PDFs are split by pages. We unnest the list here to have one element per page.
docs = [item for sublist in docs for item in sublist]
print("Number of PDF pages: ", len(docs))

docs[3]

  0%|          | 0/440 [00:00<?, ?it/s]

Exception:  cannot open broken document
Exception:  cannot open broken document
Number of PDFs:  438
Number of PDF pages:  4523


Document(page_content=' \n \n \n \n \n \n \n \n \n \n \n \n \n+32 2893 0235 \nhttps://digitalsme.eu \n123 Rue du Commerce, 1000 Brussels, Belgium \nVAT: BE0899786252 \noffice@digitalsme.eu \nEU Transparency Reg.: 082698126468-52 \ndisrupted. This may be happening naturally, but large industrial structures that have developed \nover decades and may be rather hard to disrupt at this stage. Certain industries may be \nunwilling to innovate as long as business is still going fine, so there may be a need to consciously \nand willingly disrupt our own industries to make them tougher and globally more competitive. \nSome companies decide to do so on their own: They incubate and accelerate new business \nmodels based on AI and technologies within their own structures. The companies that are \nwilling to innovate should be supported in their efforts. The current Covid-19 crisis is \nquestioning the future of certain industries, but could also accelerate AI-driven innovation, \nwhich could be an

In [5]:
# convert langchain docs to llamaindex docs
from llama_index.schema import Document

docs = [Document.from_langchain_format(doc) for doc in docs]
print(len(docs))
docs[3]

4523


Document(id_='dbbd8269-8651-486d-bd28-a083a16e6d11', embedding=None, metadata={'source': './data/F529888-DIGITAL_SME_Position_Paper_AI_White_Paper_FINAL_DRAFT.pdf', 'file_path': './data/F529888-DIGITAL_SME_Position_Paper_AI_White_Paper_FINAL_DRAFT.pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.7', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': '', 'creationDate': '', 'modDate': '', 'trapped': ''}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='2d3243d5cbc4e0f2b5ceeb4565dfce59a05e57a0543bf9aa1a7d8223de38bfbf', text=' \n \n \n \n \n \n \n \n \n \n \n \n \n+32 2893 0235 \nhttps://digitalsme.eu \n123 Rue du Commerce, 1000 Brussels, Belgium \nVAT: BE0899786252 \noffice@digitalsme.eu \nEU Transparency Reg.: 082698126468-52 \ndisrupted. This may be happening naturally, but large industrial structures that have developed \nover decades and may be rather hard to disrupt at this stage. Certain industries may be 

### Preprocess documents

In [7]:
from langchain.text_splitter import SentenceTransformersTokenTextSplitter, RecursiveCharacterTextSplitter
from llama_index.node_parser import LangchainNodeParser
from transformers import AutoTokenizer

# many textsplitters in llamaindex: https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules.html
# or via langchain: https://python.langchain.com/docs/modules/data_connection/document_transformers/
chunk_size = 256
text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
        AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5"),
        chunk_size=chunk_size,
        chunk_overlap=int(chunk_size / 10),
        add_start_index=True,
        strip_whitespace=True,
        separators=["\n\n", "\n", ".", " ", ""],
)

text_splitter = LangchainNodeParser(text_splitter)

nodes = text_splitter.get_nodes_from_documents(docs)

print(len(nodes))
nodes[3].text
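# note: the node count depends on the splitter settings (a different run of this notebook yielded 14876 nodes)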

18266


'actors, with a particular focus on high-risk applications. See: European Commission, “White Paper On Artificial \nIntelligence - A European approach to excellence and trust”, 19 February 2020. \nRef. Ares(2020)3356825 - 26/06/2020'
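
To make the `chunk_size` and `chunk_overlap` settings more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not what `RecursiveCharacterTextSplitter` does internally: that splitter also respects the configured separators and measures length with the Hugging Face tokenizer. Here, whitespace-separated words stand in for tokens, and `chunk_tokens` is a hypothetical helper introduced only for illustration.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens,
    where consecutive windows share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# toy example: words stand in for tokenizer tokens (an assumption)
words = "a b c d e f g h i j".split()
chunks = chunk_tokens(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

With `chunk_size=256` and `chunk_overlap=int(256 / 10)` as in the cell above, each chunk would repeat roughly the last 25 tokens of the previous one, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.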

### Add metadata

In [None]:
## prepare meta data
import pandas as pd
import numpy as np

# load meta data
df_metadata = pd.read_csv(
    "https://raw.githubusercontent.com/MoritzLaurer/rag-demo/master/data/position-papers-metadata.csv",
    sep=";", on_bad_lines='error', encoding="cp1252"
)

df_metadata = df_metadata[[
    'Organisation name', #'Transparency register number',
    'User type', 'Organisation size', 'Country', 'Scope',
    'Feedback date', 'Language', 'Reference',
    #'Publication privacy settings', 'First name', 'Surname',
    #'You can upload a document here:\n\n'
]]

df_metadata = df_metadata.rename(columns={
    'Reference': "document_reference", 'Feedback date': "document_date", 'Language': "language",
    'User type': "stakeholder_type", 'Scope': "stakeholder_scope",
    'Organisation name': "stakeholder_name",
    #'Transparency register number': "transparency_register_number",
    #'First name': "first_name", 'Surname': "surname",
    'Organisation size': "stakeholder_size", 'Country': "stakeholder_country",
    #'Publication privacy settings', 'You can upload a document here:\n\n'
})

# add column with exact pdf names corresponding to pdf reference
# not all respondents provided PDFs
def find_string_with_substring(substring, string_list):
    for string in string_list:
        if substring in string:
            return string
    return np.nan

doc_dir = "./data"
file_names = os.listdir(doc_dir)
pdf_name_col = [find_string_with_substring(ref, file_names) for ref in df_metadata["document_reference"]]

# note that not all respondents provided PDFs
# document_name is NaN if no PDF is available
df_metadata.loc[:, "document_name"] = pdf_name_col

df_metadata

| | stakeholder_name | stakeholder_type | stakeholder_size | stakeholder_country | stakeholder_scope | document_date | language | document_reference | document_name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Governance of AI Research Group | Academic/Research Institution | Micro (< 10 employees) | United States | | 19-06-2020 23:58 | English | F529892 | F529892-Governance_of_AI_Research_Group_EU_Com... |
| 1 | European Technology Policy Committee (EUTPC) o... | Academic/Research Institution | Large (250 or more) | United States | | 19-06-2020 22:38 | English | F529891 | |
| 2 | EIT Health e.V. | Other | Medium (< 250 employees) | Germany | | 19-06-2020 21:54 | English | F529890 | F529890-EIT_Health_Consultative_Group_on_EC_Da... |
| 3 | on behalf of: Chairman of the National Broadca... | Public authority | Medium (< 250 employees) | Poland | National | 19-06-2020 17:58 | Polish | F529889 | F529889-feedback_Consultation_on_the_White_Pap... |
| 4 | | | | | | 19-06-2020 17:17 | English | F529888 | F529888-DIGITAL_SME_Position_Paper_AI_White_Pa... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1211 | | | | | | 19-02-2020 16:09 | English | F518570 | |
| 1212 | CUBE ROBOT X by haleez.com | Business Association | Micro (< 10 employees) | Germany | | 19-02-2020 15:58 | German | F518569 | |
| 1213 | | | | | | 19-02-2020 14:33 | English | F518568 | |
| 1214 | | | | | | 19-02-2020 13:17 | English | F518567 | |


In [None]:
# add meta data to docs based on unique reference
# docs: https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_documents.html#customizing-documents

for node in tqdm(nodes):
    # each respondent's unique reference (e.g. "F529888") is also the prefix of the PDF file name
    # this reference can be used to merge the PDFs with metadata from the .csv
    if node.metadata["source"]:
        # overwrite automatically created metadata
        node.metadata = {"source": node.metadata["source"]}
        # add our custom metadata
        # take the first 7 characters of the file name, independent of the directory prefix
        node_reference = os.path.basename(node.metadata["source"])[:7]
        for col in df_metadata.columns:
            metadata_col_value = df_metadata[df_metadata["document_reference"] == node_reference][col].iloc[0]
            node.metadata[col] = metadata_col_value

print("Example for meta data added to document")
nodes[3].metadata

TextNode(id_='9cfb54a9-59c4-4d1b-b3d6-25d35f39f41b', embedding=None, metadata={'source': './data/F530251-EPHA_Moving_beyond_the-Hype_2019.pdf', 'stakeholder_name': 'European Public Health Alliance (EPHA)', 'stakeholder_type': 'NGO (Non-governmental organisation)', 'stakeholder_size': 'Small (< 50 employees)', 'stakeholder_country': 'Belgium', 'stakeholder_scope': nan, 'document_date': '13-06-2020 20:05', 'language': 'English', 'document_reference': 'F530251', 'document_name': 'F530251-EPHA_Moving_beyond_the-Hype_2019.pdf'}, excluded_embed_metadata_keys=['source', 'document_date', 'document_reference', 'document_name'], excluded_llm_metadata_keys=['source', 'document_date', 'document_reference', 'document_name'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='7f8d5cf5-91d6-43ee-89f5-17a1b1b760d4', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'source': './data/F530251-EPHA_Moving_beyond_the-Hype_2019.pdf', 'file_path': './data/F530251-EPHA_Moving_beyond_the-Hy

In [None]:
# determine how generative llm and embedder will handle metadata
for node in tqdm(nodes):
    # decide which metadata to ignore
    node.excluded_llm_metadata_keys = ["source", "document_date", "document_reference", "document_name"]
    node.excluded_embed_metadata_keys = ["source", "document_date", "document_reference", "document_name"]
    # decide how metadata is formatted, when added to text
    node.metadata_template = "{key}: {value}"
    node.metadata_seperator = ", "
    node.text_template = "Metadata: {metadata_str}. Content: {content}"

  0%|          | 0/14876 [00:00<?, ?it/s]

In [None]:
# this is what each model sees when it receives a node
# note that the added metadata increases the input length, so max token limits might need to be handled differently
from llama_index.schema import MetadataMode

print(
    "The generative LLM sees this: \n",
    nodes[0].get_content(metadata_mode=MetadataMode.LLM),
)
print(
    "\nThe Embedding model sees this: \n",
    nodes[0].get_content(metadata_mode=MetadataMode.EMBED),
)

The generative LLM sees this: 
 Metadata: stakeholder_name: European Public Health Alliance (EPHA), stakeholder_type: NGO (Non-governmental organisation), stakeholder_size: Small (< 50 employees), stakeholder_country: Belgium, stakeholder_scope: nan, language: English. Content: moving beyond the hype november 2019 epha reﬂection paper on big data and artiﬁcial intelligence ref. ares ( 2020 ) 3359993 - 26 / 06 / 2020

The Embedding model sees this: 
 Metadata: stakeholder_name: European Public Health Alliance (EPHA), stakeholder_type: NGO (Non-governmental organisation), stakeholder_size: Small (< 50 employees), stakeholder_country: Belgium, stakeholder_scope: nan, language: English. Content: moving beyond the hype november 2019 epha reﬂection paper on big data and artiﬁcial intelligence ref. ares ( 2020 ) 3359993 - 26 / 06 / 2020
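
The two identical outputs above follow from the templates set on each node. As a rough sketch of how the pieces appear to combine (an approximation written for illustration, not LlamaIndex's actual implementation), the hypothetical `render_node` helper below joins the non-excluded metadata entries with the separator and fills the text template. The toy metadata values are made up.

```python
def render_node(text, metadata, excluded_keys,
                metadata_template="{key}: {value}",
                metadata_separator=", ",
                text_template="Metadata: {metadata_str}. Content: {content}"):
    """Build the string a model receives: visible metadata plus node text."""
    metadata_str = metadata_separator.join(
        metadata_template.format(key=key, value=value)
        for key, value in metadata.items()
        if key not in excluded_keys  # excluded keys are hidden from the model
    )
    return text_template.format(metadata_str=metadata_str, content=text)


# toy node (illustrative values, not taken from the dataset)
metadata = {
    "source": "./data/example.pdf",
    "stakeholder_type": "NGO",
    "language": "English",
}
rendered = render_node("some chunk text", metadata, excluded_keys=["source"])
print(rendered)
```

Because the same keys are excluded for the LLM and for the embedding model in the cell above, both receive the same string. Using different exclusion lists would let the embedder and the generator see different metadata.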


### Embed and create vector store

In [None]:
# docs LlamaIndex & Hugging Face: https://docs.llamaindex.ai/en/stable/examples/llm/huggingface.html
# many different embedding libraries supported by LlamaIndex: https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html
from llama_index.llms import HuggingFaceInferenceAPI, HuggingFaceLLM
from llama_index.embeddings import HuggingFaceInferenceAPIEmbedding, OptimumEmbedding, HuggingFaceEmbedding
from google.colab import userdata
import pandas as pd

# choose vector database
# demo with ChromaDB: https://docs.llamaindex.ai/en/stable/examples/vector_stores/ChromaIndexDemo.html
# available vector stores: https://docs.llamaindex.ai/en/stable/module_guides/storing/vector_stores.html
import chromadb
from llama_index.vector_stores import ChromaVectorStore
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.storage.storage_context import StorageContext


In [None]:
# choose model to create embeddings with
# leaderboard: https://huggingface.co/spaces/mteb/leaderboard
# tradeoff: larger models perform better vs. larger models are slower, require more memory and create larger embeddings to store
embed_model_local = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# compute optimized model version for CPU
#OptimumEmbedding.create_and_save_optimum_model(
#    "BAAI/bge-small-en-v1.5", "./bge_onnx"
#)
#embed_model_local_onnx = OptimumEmbedding(folder_name="./bge_onnx")

# or run embedding model via API
# https://github.com/run-llama/llama_index/blob/9728746f898a22f5ecafaf59a8e319d29d39a91c/llama_index/embeddings/huggingface.py#L205
#embed_model_hf_api = HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-base-en-v1.5")

In [None]:
# choose generator model
# instantiate the LLM here so that ServiceContext below does not default to an OpenAI model
model_generator = HuggingFaceInferenceAPI(
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1", #"HuggingFaceH4/zephyr-7b-beta",
    token=userdata.get('HF_TOKEN'),
    #task="text-generation"
)

In [None]:
# create vector database client and a new collection
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("quickstart")

In [None]:
# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model_local, llm=model_generator)

# https://docs.llamaindex.ai/en/stable/api_reference/indices/vector_store.html
index = VectorStoreIndex(
    nodes,
    storage_context=storage_context, service_context=service_context,
    use_async=False, store_nodes_override=False,
    insert_batch_size=2048, show_progress=True,
    #**kwargs
)


Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/540 [00:00<?, ?it/s]

## RAG Pipeline

### 1. Retrieve

In [None]:
# https://docs.llamaindex.ai/en/stable/understanding/querying/querying.html
from llama_index.retrievers import VectorIndexRetriever
from llama_index.schema import QueryBundle

query = "What are the pros and cons of AI regulation from a business perspective?"
#query = "What does the stakeholder DigitalEurope think about AI regulation?"
query_bundle = QueryBundle(query)

# configure retriever: https://docs.llamaindex.ai/en/latest/api_reference/query/retrievers/vector_store.html#
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)

nodes_retrieved = retriever.retrieve(query_bundle)


In [None]:
# inspect results
df_nodes_retrieved = pd.DataFrame({
    "score": [node.score for node in nodes_retrieved],
    "text": [node.text for node in nodes_retrieved],
    "metadata": [node.metadata for node in nodes_retrieved],
})

print("Documents retrieved based on this query:\n", query)
df_nodes_retrieved

Documents retrieved based on this query:
 What are the pros and cons of AI regulation from a business perspective?


| | score | text | metadata |
|---|---|---|---|
| 0 | 0.656722 | and devise mitigation strategies from the desi... | {'source': './data/F530212-Microsoft_Response_... |
| 1 | 0.6552 | 22 all these points highlight that the concept... | {'source': './data/F530162-DE_comments_on_AI_W... |
| 2 | 0.6545 | risk ” ai as being in need of regulation. succ... | {'source': './data/F530005-AI_White_Paper_Subm... |
| 3 | 0.650637 | the use of ai technology to assist or execute ... | {'source': './data/F530202-EBF_041600_-_EBF_An... |
| 4 | 0.649519 | to loans that they are obliged to do by extens... | {'source': './data/F530466-EC_Consultation_AI_... |
| 5 | 0.643217 | the white paper, which proposes to take the be... | {'source': './data/F529923-BSP_contribution_AI... |
| 6 | 0.642087 | 3 2. a risk - based approach to rules affectin... | {'source': './data/F530156-ITI_response_to_EC_... |
| 7 | 0.641826 | ##ovate responsibly using ai versus second gue... | {'source': './data/F528968-IRSG_DATA_WORKSTREA... |
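
Conceptually, the retriever embeds the query and returns the `similarity_top_k` nodes whose embeddings are closest to it. The sketch below illustrates that idea with plain Python and cosine similarity. Cosine similarity is an illustrative choice: the metric and the score values produced by ChromaDB and LlamaIndex depend on the vector store configuration and may differ. The two-dimensional vectors and the names `cosine_similarity` and `retrieve_top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, node_vecs, k):
    """Return the k (score, node_id) pairs with the highest similarity."""
    scored = [
        (cosine_similarity(query_vec, vec), node_id)
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# toy 2-d "embeddings" (real embeddings have hundreds of dimensions)
node_vecs = {
    "node_a": [1.0, 0.0],
    "node_b": [0.0, 1.0],
    "node_c": [1.0, 1.0],
}
top = retrieve_top_k([1.0, 0.2], node_vecs, k=2)
for score, node_id in top:
    print(node_id, round(score, 3))
```

A real vector store avoids this exhaustive comparison by using an approximate nearest-neighbour index, which matters once the collection holds many thousands of nodes.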


### 2. Rerank

In [None]:
# reranking
# https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/LLMReranker-Gatsby.html
from llama_index.postprocessor import SentenceTransformerRerank

# https://docs.llamaindex.ai/en/stable/api_reference/node_postprocessor.html#llama_index.indices.postprocessor.SentenceTransformerRerank
reranker = SentenceTransformerRerank(
    model="BAAI/bge-reranker-base",
    top_n=3,
)

retrieved_nodes_reranked = reranker.postprocess_nodes(
    nodes_retrieved, query_bundle
)

print([node.text for node in retrieved_nodes_reranked])

['3 2. a risk - based approach to rules affecting ai we appreciate the white paper ’ s approach suggesting that regulation should concentrate on how to minimise the various risks of potential harms that may emerge from high - risk ai applications ( p. 10 ). we agree that risks need to be identified and mitigated and encourage policymakers to take a risk - based rather than overly precautionary approach to rules affecting ai. given that the potential benefits of ai development are enormous, and that ai is a rapidly evolving technology, a legislative approach should be flexible enough to account for the rapidly changing and fast - paced technological advancement in this sector. technological innovations bring innumerable benefits to the european economy and society. should the future european ai approach be too restrictive, there is also a risk of limiting the enablement of such technologies and miss opportunities for europe and its citizens. we are already experiencing the benefits of a

In [None]:
# inspect results
df_nodes_reranked = pd.DataFrame({
    "score": [node.score for node in retrieved_nodes_reranked],
    "text": [node.text for node in retrieved_nodes_reranked],
    "metadata": [node.metadata for node in retrieved_nodes_reranked],
})

print("Documents retrieved based on this query:\n", query)
df_nodes_reranked

Documents retrieved based on this query:
 What are the pros and cons of AI regulation from a business perspective?


| | score | text | metadata |
|---|---|---|---|
| 0 | 0.105848 | 3 2. a risk - based approach to rules affectin... | {'source': './data/F530156-ITI_response_to_EC_... |
| 1 | 0.063031 | risk ” ai as being in need of regulation. succ... | {'source': './data/F530005-AI_White_Paper_Subm... |
| 2 | 0.051848 | to loans that they are obliged to do by extens... | {'source': './data/F530466-EC_Consultation_AI_... |
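
The reranking step follows a simple pattern: score every retrieved candidate against the query with a second, usually more expensive model, then keep only the `top_n` best. The sketch below shows that pattern only. The word-overlap scorer is a deliberately crude stand-in for a cross-encoder such as `BAAI/bge-reranker-base`, and `overlap_score`, `rerank` and the candidate passages are hypothetical.

```python
def overlap_score(query, passage):
    """Stand-in relevance score: share of query words that occur in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(query, passages, score_fn, top_n):
    """Score each (query, passage) pair and keep the top_n best passages."""
    scored = [(score_fn(query, passage), passage) for passage in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# toy candidates, as they might come back from the retriever
candidates = [
    "banks use ai for loans",
    "regulation of ai creates costs for business",
    "ai regulation should be risk based",
]
reranked = rerank("ai regulation business", candidates, overlap_score, top_n=2)
for score, passage in reranked:
    print(round(score, 2), passage)
```

Note that the order after reranking can differ from the retrieval order, as in the tables above, where the node ranked seventh by the retriever ends up first after reranking.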


### 3. Generate

In [None]:
# generation
# https://docs.llamaindex.ai/en/stable/api_reference/prompts.html#llama_index.prompts.base.BasePromptTemplate.format
# using HF LLMs locally: https://docs.llamaindex.ai/en/stable/api_reference/llms/huggingface.html  example https://docs.llamaindex.ai/en/stable/examples/customization/llms/SimpleIndexDemo-Huggingface_stablelm.html
from llama_index.prompts import PromptTemplate


prompt_template = PromptTemplate("""\
Your task is to answer a question based on context.
Your answer should be concise and you should only return an answer grounded in the contexts.

contexts:
{context}

question: {question}

answer: """
)


context = ""
for i, node in enumerate(retrieved_nodes_reranked):

  # add metadata to context for generator
  metadata_string = ""
  for key, value in node.metadata.items():
    if key not in ["source", "document_name"]:
      metadata_string += f"{key}: {value}, "

  context += f"context_{i+1}: {metadata_string} stakeholder position: {node.text.strip()} \n "


prompt = prompt_template.format(**{"context": context, "question": query})


# https://github.com/run-llama/llama_index/blob/fc0afeabfebd34f3eec4771e639cc1d8742e14f5/llama_index/llms/huggingface.py#L398
model_generator = HuggingFaceInferenceAPI(
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1", #"HuggingFaceH4/zephyr-7b-beta",
    token=userdata.get('HF_TOKEN'),
    #task="text-generation"
)

response = model_generator.complete(prompt)

print("Query:\n", query)
print("\nRAG response:\n", response)
print("\nThe response is based on this context:\n", context)


Query:
 What are the pros and cons of AI regulation from a business perspective?

RAG response:
 

Pros of AI regulation from a business perspective:
- Encourages policymakers to take a risk-based approach to rules affecting AI, which can provide business certainty and give consumers confidence that the AI is trustworthy.
- A balanced approach to AI regulation can take into account the risks of AI and its benefits, and be informed by experts and science.
- Regulation can avoid creating costs that stifle competition, particularly when those costs fall most heavily on SMEs.

Cons of AI regulation from a business perspective:
- Overly restrictive regulation can limit the enablement of AI technologies and miss opportunities for businesses and their customers.
- Regulation can raise barriers to the development and application of AI.
- Perceived over-regulation can pose the danger of bureaucratization in a field that needs the fostering of innovation.
- Existing regulation may already cover 