In [1]:
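import os
from dotenv import load_dotenv

# Load variables from a local .env file into the process environment.
load_dotenv()

# Fail early with a clear message: assigning None to os.environ raises TypeError.
api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")
os.environ["OPENAI_API_KEY"] = api_key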

In [8]:
# Data Ingestion

from langchain_community.document_loaders import Docx2txtLoader

# Docx2txtLoader returns the whole .docx file as a single Document.
# Alternative: UnstructuredWordDocumentLoader(path, mode="elements") yields one Document per element.
loader = Docx2txtLoader('../Data/EU AI Act Doc (1) (3).docx')
text_docs = loader.load()

In [15]:
print(text_docs)

[Document(metadata={'source': '../Data/EU AI Act Doc (1) (3).docx'}, page_content='High-level summary of the AI Act\n\n27 Feb, 2024\n\nUpdated on 30 May in accordance with the Corrigendum version of the AI Act.\n\nIn this article we provide you with a high-level summary of the AI Act, selecting the parts which are most likely to be relevant to you regardless of who you are. We provide links to the original document where relevant so that you can always reference the Act text.\n\nTo explore the full text of the AI Act yourself, use our\xa0AI Act Explorer. Alternatively, if you want to know which parts of the text are most relevant to you, use our\xa0Compliance Checker. \n\nFour-point summary\n\nThe AI Act classifies AI according to its risk:\n\nUnacceptable risk is prohibited (e.g. social scoring systems and manipulative AI).\n\nMost of the text addresses high-risk AI systems, which are regulated.\n\nA smaller section handles limited risk AI systems, subject to lighter transparency obli

In [10]:
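# Checking for extracted images
# The loader output above only carries a 'source' key in metadata,
# so this loop prints nothing unless a loader adds an 'images' entry.

for doc in text_docs:
    if 'images' in doc.metadata:
        images = doc.metadata['images']
        print(f"Extracted {len(images)} images from this document.")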

In [18]:
# Text Splitting

from langchain_text_splitters import RecursiveCharacterTextSplitter
# chunk_size and chunk_overlap are measured in characters by default.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(text_docs)
docs[:5]

[Document(metadata={'source': '../Data/EU AI Act Doc (1) (3).docx'}, page_content='High-level summary of the AI Act\n\n27 Feb, 2024\n\nUpdated on 30 May in accordance with the Corrigendum version of the AI Act.\n\nIn this article we provide you with a high-level summary of the AI Act, selecting the parts which are most likely to be relevant to you regardless of who you are. We provide links to the original document where relevant so that you can always reference the Act text.\n\nTo explore the full text of the AI Act yourself, use our\xa0AI Act Explorer. Alternatively, if you want to know which parts of the text are most relevant to you, use our\xa0Compliance Checker. \n\nFour-point summary\n\nThe AI Act classifies AI according to its risk:\n\nUnacceptable risk is prohibited (e.g. social scoring systems and manipulative AI).\n\nMost of the text addresses high-risk AI systems, which are regulated.'),
 Document(metadata={'source': '../Data/EU AI Act Doc (1) (3).docx'}, page_content='Unac
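
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a much simpler splitter. This is only a sketch: `chunk_text` is a hypothetical helper that cuts fixed-size character windows, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line boundaries, so its chunks will not match these exactly.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Toy input (hypothetical) so the overlap is easy to see.
toy_chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(toy_chunks)
```

Each chunk repeats the last four characters of the previous one, which is why the first two split documents above share the "Unacceptable risk" and "Most of the text" sentences.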

In [19]:
# Vector Embeddings and Vectorstore

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

db = Chroma.from_documents(docs, OpenAIEmbeddings())

In [21]:
query = "What is the four point summary?"
retrieved_results = db.similarity_search(query)
print(retrieved_results[0].page_content)

High-level summary of the AI Act

27 Feb, 2024

Updated on 30 May in accordance with the Corrigendum version of the AI Act.

In this article we provide you with a high-level summary of the AI Act, selecting the parts which are most likely to be relevant to you regardless of who you are. We provide links to the original document where relevant so that you can always reference the Act text.

To explore the full text of the AI Act yourself, use our AI Act Explorer. Alternatively, if you want to know which parts of the text are most relevant to you, use our Compliance Checker. 

Four-point summary

The AI Act classifies AI according to its risk:

Unacceptable risk is prohibited (e.g. social scoring systems and manipulative AI).

Most of the text addresses high-risk AI systems, which are regulated.
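
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors are closest to it (four by default). A minimal sketch of that ranking step follows, assuming cosine similarity and made-up three-dimensional vectors; the distance metric Chroma actually uses is configurable, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the names of the k vectors most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy vectors standing in for embedded chunks.
toy_vectors = {
    "summary": [0.9, 0.1, 0.0],
    "gpai": [0.1, 0.9, 0.2],
    "providers": [0.5, 0.5, 0.1],
}
ranked = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(ranked)
```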


In [22]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model = "gpt-4o")
print(llm)

client=<openai.resources.chat.completions.completions.Completions object at 0x00000298575A5C10> async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x00000298575A6B10> root_client=<openai.OpenAI object at 0x00000298575A58D0> root_async_client=<openai.AsyncOpenAI object at 0x00000298575A6650> model_name='gpt-4o' model_kwargs={} openai_api_key=SecretStr('**********') stream_usage=True


In [23]:
# Design ChatPrompt Template
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template("""
            Answer the following question based only on the context. 
            Think step by step before providing a detailed answer. 
            I will tip you $1000 if the user finds the answer helpful.
            Context: {context}
            Question: {input}""")

In [24]:
retriever = db.as_retriever()
retriever

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x0000029856EB73D0>, search_kwargs={})

In [25]:
from langchain_classic.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)

In [26]:
from langchain_classic.chains import create_retrieval_chain

retrieval_chain = create_retrieval_chain(retriever, document_chain)
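
The data flow through the two chains can be sketched without LangChain. Everything below is a hypothetical stand-in (`ToyDocument`, `toy_retriever`, `toy_llm`), not the library's implementation: the retriever fetches documents for the `input` question, their text is joined into `{context}` (assumed here to use a blank-line separator), and the filled prompt is sent to the model.

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:
    page_content: str


PROMPT_TEMPLATE = (
    "Answer the following question based only on the context.\n"
    "Context: {context}\n"
    "Question: {input}"
)


def toy_retriever(question: str) -> list[ToyDocument]:
    # A real retriever would run a vector search; this returns fixed documents.
    return [
        ToyDocument("Unacceptable risk is prohibited."),
        ToyDocument("High-risk AI systems are regulated."),
    ]


def toy_llm(prompt_text: str) -> str:
    # Placeholder for the chat model call.
    return f"(model reply to a {len(prompt_text)}-character prompt)"


def toy_retrieval_chain(inputs: dict) -> dict:
    docs = toy_retriever(inputs["input"])
    # "Stuffing": concatenate every retrieved document into one context string.
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt_text = PROMPT_TEMPLATE.format(context=context, input=inputs["input"])
    return {"input": inputs["input"], "context": docs, "answer": toy_llm(prompt_text)}


toy_response = toy_retrieval_chain({"input": "What is the four point summary?"})
print(sorted(toy_response))
```

The returned dictionary has the same three keys as the real chain's response: `input`, `context`, and `answer`.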

In [28]:
# response = retrieval_chain.invoke({"input":"An attention function can be described as mapping a query"})
response = retrieval_chain.invoke({"input":"What is the four point summary?"})

In [29]:
response

{'input': 'What is the four point summary?',
 'context': [Document(metadata={'source': '../Data/EU AI Act Doc (1) (3).docx'}, page_content='High-level summary of the AI Act\n\n27 Feb, 2024\n\nUpdated on 30 May in accordance with the Corrigendum version of the AI Act.\n\nIn this article we provide you with a high-level summary of the AI Act, selecting the parts which are most likely to be relevant to you regardless of who you are. We provide links to the original document where relevant so that you can always reference the Act text.\n\nTo explore the full text of the AI Act yourself, use our\xa0AI Act Explorer. Alternatively, if you want to know which parts of the text are most relevant to you, use our\xa0Compliance Checker. \n\nFour-point summary\n\nThe AI Act classifies AI according to its risk:\n\nUnacceptable risk is prohibited (e.g. social scoring systems and manipulative AI).\n\nMost of the text addresses high-risk AI systems, which are regulated.'),
  Document(metadata={'source':

In [30]:
print(response['answer'])

To answer your question step by step, let's focus on the four-point summary of the AI Act as provided in the context. Here's a breakdown:

1. **AI Classification by Risk**: 
   - **Unacceptable Risk**: AI systems that carry an unacceptable risk are prohibited. Examples include social scoring systems and manipulative AI.
   - **High-Risk Systems**: The majority of the Act focuses on high-risk AI systems, which are subject to regulation.

2. **GPAI Models and Systemic Risks**:
   - GPAI (General Purpose AI) models are considered systemic risks if they use an extensive amount of computational power (more than \(10^{25}\) floating point operations). Providers must notify the Commission within two weeks if this criterion is met.
   - Providers can argue that despite meeting the criteria, their model doesn't pose a systemic risk. The Commission, with advice from a scientific panel, may determine if a model has significant impact capabilities that qualify it as systemic.

3. **Obligations for
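
To see which chunks the answer was grounded in, the `context` list of the response can be summarized. A minimal sketch follows, using a hypothetical `summarize_context` helper and stand-in documents; with the real chain, `response` would be passed in directly, since its documents expose the same `page_content` and `metadata` attributes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_context(response: dict, preview_chars: int = 40) -> list[str]:
    """One line per retrieved document: rank, source, and a short text preview."""
    lines = []
    for rank, doc in enumerate(response["context"], start=1):
        source = doc.metadata.get("source", "unknown")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"{rank}. {source}: {preview}")
    return lines


# Hypothetical response shaped like the retrieval chain's output.
toy_response = {
    "input": "What is the four point summary?",
    "context": [
        ToyDocument("Four-point summary\nThe AI Act classifies AI", {"source": "example.docx"}),
        ToyDocument("GPAI models and systemic risk"),
    ],
    "answer": "...",
}
summary_lines = summarize_context(toy_response)
for line in summary_lines:
    print(line)
```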

In [34]:
# query = "What is multi head attention?"

query = "Requirements for providers of high-risk AI systems"
retrieved_results = db.similarity_search(query)
print(retrieved_results[1].page_content)

Unacceptable risk is prohibited (e.g. social scoring systems and manipulative AI).

Most of the text addresses high-risk AI systems, which are regulated.

A smaller section handles limited risk AI systems, subject to lighter transparency obligations: developers and deployers must ensure that end-users are aware that they are interacting with AI (chatbots and deepfakes).

Minimal risk is unregulated (including the majority of AI applications currently available on the EU single market, such as AI enabled video games and spam filters – at least in 2021; this is changing with generative AI).

The majority of obligations fall on providers (developers) of high-risk AI systems.

Those that intend to place on the market or put into service high-risk AI systems in the EU, regardless of whether they are based in the EU or a third country.

And also third country providers where the high risk AI system’s output is used in the EU.


In [35]:
retrieved_results

[Document(metadata={'source': '../Data/EU AI Act Doc (1) (3).docx'}, page_content='Providers whose AI system falls under the use cases in\xa0Annex III\xa0but believes it is\xa0not\xa0high-risk must document such an\nassessment before placing it on the market or putting it into service.\n\nRequirements for providers of high-risk AI systems (Art.\xa08–17)\n\nHigh risk AI providers must:\n\nEstablish a\xa0risk management system\xa0throughout the high risk AI system’s lifecycle;\n\nConduct\xa0data governance, ensuring that training, validation and testing datasets are relevant, sufficiently representative and, to the best extent possible, free of errors and complete according to the intended purpose.\n\nDraw up\xa0technical documentation\xa0to demonstrate compliance and provide authorities with the information to assess that compliance.\n\nDesign their high risk AI system for\xa0record-keeping\xa0to enable it to automatically record events relevant for identifying national level risks and 