# Summarization of transcripts with LangChain

In this example, we build a summarizer for long transcripts. The main goal is to break the original transcript into chunks based on context, using an unsupervised approach to identify the different topics throughout the transcript (somewhat similar to topic modelling), and then summarize each of these chunks. In the end, the individual summaries are returned to the user.

## Step 0: Configuring the environment

Most of the libraries needed for this example are built into the GenAI workspace, available in AI Studio. More specific libraries to handle the type of input are installed here. In this case, we support transcripts in the WebVTT format, which is used to store transcripts and requires the webvtt-py library.

In [None]:
!pip install webvtt-py
!pip install pandas
!pip install promptquality==0.64.2
!pip install httpx==0.27.2
!pip install galileo-protect==0.15.1
!pip install galileo-observe==1.13.2

### Configuration of Hugging Face caches

In the next cell, we configure the Hugging Face cache so that all the models downloaded from Hugging Face are persisted locally, even after the workspace is closed. This is a future desired feature for AI Studio and the GenAI addon.

In [None]:
import os
os.environ["HF_HOME"] = "/home/jovyan/local/hugging_face"
os.environ["HF_HUB_CACHE"] = "/home/jovyan/local/hugging_face/hub"

## Step 1: Loading the data from the transcript

First, we need to read the data from the transcript. As our transcript is in the .vtt format, we use a library called webvtt-py to read the content. As the text is a transcript of audio/video, it is organized in small chunks of conversation, each containing a sequential id, the start and end time of the chunk, and the text content (often in the form speaker:content).

From this data, we want to extract the actual content while keeping a reference to the other metadata. For this reason, we load all the data into a pandas DataFrame.
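
As a minimal illustration of the speaker:content convention, the sketch below separates the speaker from the spoken text without depending on webvtt-py. The helper name `split_speaker` and the sample captions are hypothetical and are not part of the transcript files used in this notebook. Note that a caption without a speaker prefix but with a colon in its text would still be misread as having a speaker.

```python
def split_speaker(caption_text):
    """Split a caption such as 'Speaker: some text' into (speaker, content)."""
    # partition() splits only on the first colon, so later colons stay in the content
    speaker, separator, content = caption_text.partition(":")
    if not separator:
        # No colon at all: treat the whole caption as content with an unknown speaker
        return "", caption_text.strip()
    return speaker.strip(), content.strip()


# Hypothetical sample captions, for illustration only
examples = [
    "Speaker A: I have a dream today.",
    "I have a dream today.",
    "Narrator: The time is 10:30 in the morning.",
]
for text in examples:
    print(split_speaker(text))
```

Splitting only on the first colon keeps text such as "10:30" intact in the content field.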

In [None]:
import webvtt
import pandas as pd

data = {
    "id": [],
    "speaker": [],
    "content": [],
    "start": [],
    "end": []
}

for caption in webvtt.read('data/I_have_a_dream.vtt'):
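    # Split only on the first colon, so colons inside the spoken text are preserved
    speaker, separator, content = caption.text.partition(":")
    if not separator:
        # No "speaker:" prefix: keep the whole caption as content
        speaker, content = "", caption.text
    line = [speaker, content]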
    data["id"].append(caption.identifier)
    data["speaker"].append(line[0].strip())
    data["content"].append(line[1].strip())
    data["start"].append(caption.start)
    data["end"].append(caption.end)
    
df = pd.DataFrame(data)

df.head()
    
    

As a second option, we provide code to load the same structure from a plain text document, which contains only the actual content of the speech/conversation, without extra metadata. For the sake of simplicity and code reuse, we keep the same DataFrame structure as the previous version, filling the remaining fields with empty strings.

In [None]:
import pandas as pd

with open("data/I_have_a_dream.txt") as file:
    lines = file.read()

data = {
    "id": [],
    "speaker": [],
    "content": [],
    "start": [],
    "end": []
}

for line in lines.split("\n"):
    if line.strip() != "":
        data["id"].append("")
        data["speaker"].append("")
        data["content"].append(line.strip())
        data["start"].append("")
        data["end"].append("")        
        
df = pd.DataFrame(data)

df.head()

## Step 2: Semantic chunking of the transcript
With the content loaded according to the transcription format (split into audio blocks or into paragraphs), we now want to group these small blocks into relevant topics, so that we can summarize each topic individually. Here, we use a very simple approach: we compute a semantic embedding of each block (using an embedding model from Hugging Face Sentence Transformers) and place the "breaks" between chunks at the positions where consecutive blocks have the highest semantic distance. Note that this method can be parameterized to set the number of topics or the method used to identify the breaks.
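
The break-detection idea can be sketched in plain Python, assuming toy 2-D vectors in place of real sentence embeddings. The function names `cosine_distance` and `find_breaks` and the vectors are hypothetical. The sketch only mirrors the "number" method: it ranks the gaps between consecutive blocks by cosine distance and keeps the largest ones.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def find_breaks(vectors, partition_count):
    """Return sorted indices i such that a new chunk starts after block i."""
    distances = [cosine_distance(vectors[i - 1], vectors[i]) for i in range(1, len(vectors))]
    # n blocks have n - 1 gaps, so that is the maximum number of breaks
    break_count = max(0, min(partition_count - 1, len(distances)))
    # Rank the gaps between consecutive blocks by distance, largest first
    ranked = sorted(range(len(distances)), key=lambda i: distances[i], reverse=True)
    return sorted(ranked[:break_count])


# Hypothetical 2-D "embeddings": blocks 0-1 and blocks 2-3 point in similar directions
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(find_breaks(toy_vectors, partition_count=2))  # prints [1]
```

With two partitions requested, the single break falls after block 1, where the direction of the vectors changes the most.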

In [None]:
import numpy as np
from sentence_transformers import SentenceTransformer
from scipy.spatial.distance import cosine

embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = embedding_model.encode(df.content)


In [None]:
class SemanticSplitter():
    def __init__ (self, content, embedding_model, method="number", partition_count = 10, quantile = 0.9):
        self.content = content
        self.embedding_model = embedding_model
        self.partition_count = partition_count
        self.quantile = quantile
        self.embeddings = embedding_model.encode(content)
        # Use the embeddings computed for this instance, not a global variable
        self.distances = [cosine(self.embeddings[i - 1], self.embeddings[i]) for i in range(1, len(self.embeddings))]
        self.breaks = []
        self.centroids = []
        self.load_breaks(method=method)

    def centroid_distance(self, embedding_id, centroid_id):
        return cosine(self.embeddings[embedding_id], self.centroids[centroid_id])

    def adjust_neighbors(self):
        self.breaks = []

    def load_breaks(self, method = 'number'):
        if method == 'number':
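            # n blocks have n - 1 gaps, so at most len(self.distances) breaks are possible
            break_count = min(self.partition_count - 1, len(self.distances))
            if break_count > 0:
                # Breaks are the gaps with the largest semantic distance between neighbors
                self.breaks = np.sort(np.argpartition(self.distances, -break_count)[-break_count:])
            else:
                self.breaks = []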
        elif method == 'quantiles':
            threshold = np.quantile(self.distances, self.quantile)
            self.breaks = [i for i, v in enumerate(self.distances) if v >= threshold]
        else:
            self.breaks = []

    def get_centroid(self, beginning, end):
        return self.embedding_model.encode('\n'.join(self.content[beginning : end]))
    
    def load_centroids(self):
        if len(self.breaks) == 0:
            self.centroids = [self.get_centroid(0, len(self.content))]
        else:
            self.centroids = []
            beginning = 0
            for break_position in self.breaks:
                self.centroids += [self.get_centroid(beginning, break_position + 1)]
                beginning = break_position + 1
            self.centroids += [self.get_centroid(beginning, len(self.content))]

    def get_chunk(self, beginning, end):
        return '\n'.join(self.content[beginning : end])
    
    def get_chunks(self):
        if len(self.breaks) == 0:
            return [self.get_chunk(0, len(self.content))]
        else:
            chunks = []
            beginning = 0
            for break_position in self.breaks:
                chunks += [self.get_chunk(beginning, break_position + 1)]
                beginning = break_position + 1
            chunks += [self.get_chunk(beginning, len(self.content))]
        return chunks
        
    

In [None]:
chunk_separator = "\n *-* \n"

splitter = SemanticSplitter(df.content, embedding_model, method="number", partition_count=6)
chunks = chunk_separator.join(splitter.get_chunks())

## Step 3: Using an LLM to summarize each chunk
In our example, we summarize each individual chunk separately. This solution can be advantageous in different situations:
 * When the original text is too big, or the loaded model works with a context that is too small. In this scenario, breaking the information into chunks is necessary to allow the model to be applied
 * When the user wants to make sure that all the separate topics of a conversation are covered in the summarized version. An extra step could be added to allow some verification or manual configuration of the chunks, so that the user can customize the output

To achieve this goal, we load an LLM and use a summarization prompt. For the model, we illustrate four different options here:
* Calling a cloud API (e.g. the OpenAI API): This requires an API key from the desired service. In our example, we recommend saving your API keys in a secrets.yaml file that is not shared together with the code, for security reasons. An example with empty keys is provided with our code
* Connecting to a Hugging Face REST API: This also requires an API key
* Loading the model locally from a Hugging Face repo: This requires downloading the model the first time you run your code. This might take several minutes (depending on your internet connection), and the model will be saved in the local HF cache (set to be persisted at the beginning of this notebook)
* Loading the model from a file available as a project asset.

In [None]:
### Alternate code to connect to Hugging Face models
#from langchain_huggingface import HuggingFaceEndpoint

#import yaml
#with open('secrets.yaml') as file:
#    secrets = yaml.safe_load(file)
#huggingfacehub_api_token = secrets["HuggingFace"]

#repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
#llm = HuggingFaceEndpoint(
#   huggingfacehub_api_token=huggingfacehub_api_token,
#   repo_id=repo_id,
#)


In [None]:
#from langchain_huggingface import HuggingFacePipeline
#from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

#model_id = "mistralai/Mistral-7B-v0.1"
#tokenizer = AutoTokenizer.from_pretrained(model_id)
#model = AutoModelForCausalLM.from_pretrained(model_id)
#pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=100, device=0)
#llm = HuggingFacePipeline(pipeline=pipe)

In [None]:
### Alternate code to load local models. 
### This specific example requires the project to have an asset called Llama7b, associated with the cloud S3 URI s3://dsp-demo-bucket/LLMs (public bucket)

from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_community.llms import LlamaCpp

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
            model_path="/home/jovyan/datafabric/Llama7b/ggml-model-f16-Q5_K_M.gguf",
            n_gpu_layers=64,
            n_batch=512,
            n_ctx=4096,
            max_tokens=1024,
            f16_kv=True,  
            callback_manager=callback_manager,
            verbose=False,
            stop=[],
            streaming=False,
            temperature=0.4,
        )

In [None]:
prompt_template = '''
The following text is an excerpt of a transcription:

### 
{context} 
###

Please, produce a single paragraph summarizing the given excerpt.
'''

## Step 4: Create parallel chain to summarize the transcript

In the following cell, we create a chain that receives a single string with multiple chunks (separated by the declared separator), and then:
  * Breaks the input into separate chunks, using the break_chunks function embedded in a RunnableLambda so it can be used in LangChain
  * Runs a parallel chain with the following elements for each chunk:
    * Get an individual element
    * Personalize the prompt template to create an individual prompt for each chunk
    * Use the LLM inference to summarize the chunk
  * Merges the individual summaries into a single one
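
The data flow of these steps can be sketched in plain Python, without LangChain. This is only an illustration under stated assumptions: `fake_summarize` is a hypothetical stand-in for the LLM call, `PROMPT_TEMPLATE` is a shortened placeholder prompt, and the branches run sequentially here rather than in parallel.

```python
CHUNK_SEPARATOR = "\n *-* \n"  # same separator string used in this notebook

# Shortened placeholder prompt, not the notebook's actual template
PROMPT_TEMPLATE = "Summarize the following excerpt:\n{context}"


def fake_summarize(prompt_text):
    """Hypothetical stand-in for the LLM: returns the first 5 words of the excerpt."""
    excerpt = prompt_text.split("\n", 1)[1]
    return " ".join(excerpt.split()[:5])


def summarize_all(joined_chunks):
    # Break step: recover the individual chunks from the single input string
    chunks = joined_chunks.split(CHUNK_SEPARATOR)
    # One "branch" per chunk: fill the prompt, then call the (fake) model
    summaries = {
        f"summary_{i}": fake_summarize(PROMPT_TEMPLATE.format(context=chunk))
        for i, chunk in enumerate(chunks)
    }
    # Merge step: join the individual summaries in order
    return "\n".join(summaries.values())


text = CHUNK_SEPARATOR.join([
    "first topic about dreams and hope for all",
    "second topic about justice",
])
print(summarize_all(text))
```

Keying the summaries as `summary_0`, `summary_1`, ... preserves the original chunk order when the results are merged.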




In [None]:
from operator import itemgetter
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough


def join_summaries(summaries):
    return "\n".join(list(summaries.values()))

def break_chunks(chunks):
    return chunks.split(chunk_separator)

lambda_join = RunnableLambda(join_summaries)
lambda_break = RunnableLambda(break_chunks)

prompt = ChatPromptTemplate.from_template(prompt_template)

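# The number of parallel branches must be known when the chain is built,
# so it is derived from the `chunks` string created in Step 2
chunk_count = len(break_chunks(chunks))
chain = lambda_break | {f"summary_{i}": itemgetter(i) | prompt | llm for i in range(chunk_count)} | lambda_join | StrOutputParser()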



## Step 5: Connect to Galileo
Through the Galileo library called Prompt Quality, we log in using the API key generated in the Galileo console. To get your API key, use this link: https://console.hp.galileocloud.io/api-keys

In [None]:
import promptquality as pq

import yaml
with open('secrets.yaml') as file:
    secrets = yaml.safe_load(file)
os.environ['GALILEO_API_KEY'] = secrets["Galileo"]
galileo_url = "https://console.hp.galileocloud.io/"
pq.login(galileo_url)

## Step 6: Run the chain and connect the metrics to Galileo

In this section, we call the created chain and set up the mechanisms to ingest the quality metrics into Galileo. For this example, we create a custom metric (scorer) that runs locally to measure the quality of the summarization. For this, we use the Hugging Face implementation of ROUGE (from the evaluate library) and wrap it in a CustomScorer from Galileo (next cell).

Below, we illustrate two alternative ways to connect to Galileo:
  * Using a customized run, which calculates the scores and logs into Galileo
  * Using the langchain callback (currently unavailable due to compatibility issues)
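
To make the ROUGE-L score more concrete, here is a simplified sketch of how an F-measure based on the longest common subsequence (LCS) can be computed. It assumes lowercase whitespace tokenization, so its values may differ from those of the `evaluate` implementation, which tokenizes differently. The function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_f1(prediction, reference):
    """Simplified ROUGE-L F1 using lowercase whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)  # share of the prediction found in the reference
    recall = lcs / len(ref_tokens)      # share of the reference covered by the prediction
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("the cat sat", "the cat sat on the mat"))
```

In this toy case the prediction is fully contained in the reference (precision 1.0) but covers only half of it (recall 0.5), so the score is about 0.67. In the scorer used in this notebook, the reference is the input text, so a short summary of a long chunk will tend to have low recall.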

In [None]:
import evaluate
import time
import json
import promptquality as pq

def rouge_executor(row) -> float:
    try:
        print(f"node_input: {row.node_input}")
        print(f"node_output: {row.node_output}")

        # Try to decode node_input as JSON
        try:
            node_input = json.loads(row.node_input)
            reference = node_input.get("content", "").strip()
        except json.JSONDecodeError:
            print(f"Error decoding JSON in node_input: {row.node_input}")
            return 0.0

        # Try to decode node_output as JSON
        try:
            node_output = json.loads(row.node_output)
            prediction = node_output.get("content", "").strip()
        except json.JSONDecodeError:
            print(f"Error decoding JSON in node_output: {row.node_output}")
            return 0.0

        if not reference or not prediction:
            print("'content' fields are empty in node_input or node_output")
            return 0.0

        # Calculates ROUGE-L
        rouge = evaluate.load("rouge")
        rouge_values = rouge.compute(predictions=[prediction], references=[reference])

        return rouge_values.get("rougeL", 0.0)
    except Exception as e:
        print(f"Unexpected error in rouge_executor: {e}")
        return 0.0

def rouge_aggregator(scores, indices) -> dict:
    if len(scores) == 0:
        return {'Average RougeL': 0.0}
    else:
        return {'Average RougeL': sum(scores) / len(scores)}

# Define CustomScorer with corrected functions
rouge_scorer = pq.CustomScorer(name='RougeL', executor=rouge_executor, aggregator=rouge_aggregator)

# Configures the assessment execution
partitioned_run = pq.EvaluateRun(
    project_name="AIStudio_template_code_summarization",
    run_name="Test4 partitioned script",
    scorers=[pq.Scorers.toxicity, pq.Scorers.sexist, rouge_scorer]
)

# Invoke the chain once to get the summaries, measuring execution time in nanoseconds
print("Invoking the chain...")
start_time = time.perf_counter_ns()
response = chain.invoke(chunks)
total_time = time.perf_counter_ns() - start_time

# Debugging to check the content of response
print(f"Response returned by the chain: {response}")

# Add the data to the workflow
partitioned_run.add_workflow(input=chunks, output=response, duration_ns=total_time)
partitioned_run.add_llm_step(input=chunks, output=response, duration_ns=total_time, model='local')

# Finalizes the execution of the assessment
partitioned_run.finish()


In [None]:
### THIS CODE IS NOT WORKING YET, AS GALILEO DOES NOT SUPPORT LISTS AS THE OUTPUT OF CHAIN NODES 

#summarization_callback =  pq.GalileoPromptCallback(
#    project_name = "AIStudio_summarization_template",
#    run_name = "Partitioned transcript",
#    scorers=[pq.Scorers.toxicity, pq.Scorers.sexist, rouge_scorer]
#)

#summaries = chain.invoke(chunks, config={"callbacks": [summarization_callback]})



## Galileo Protect

Galileo Protect serves as a powerful tool for safeguarding AI model outputs by detecting and preventing the release of sensitive information like personal addresses or other PII. By integrating Galileo Protect into your AI pipelines, you can ensure that model responses comply with privacy and security guidelines in real-time.

Galileo Protect functions as an API that provides support for protection verification of your chain/LLM. To log into the Galileo console, it is necessary to integrate it with another service, such as Galileo Evaluate or Galileo Observe.

**Attention**: an integrated API within the Galileo console is required to perform this verification.

In [None]:
import os
import yaml
import galileo_protect as gp
from galileo_protect import ProtectTool, ProtectParser, Ruleset

# Load credentials from the secrets file
with open("secrets.yaml", "r") as file:
    secrets = yaml.safe_load(file)

# Configure environment variables with credentials
os.environ["GALILEO_API_KEY"] = secrets["Galileo"]
os.environ["GALILEO_CONSOLE_URL"] = secrets.get("galileo_url", "https://console.hp.galileocloud.io/")


Galileo Protect works by creating rules that identify conditions such as Personally Identifiable Information (PII) and toxicity. It ensures that the prompt will not receive or respond to sensitive questions. In this example, we create a set of rules (ruleset) and a set of actions that return a pre-programmed response if a rule is triggered. Galileo Protect also offers a variety of other metrics to suit different protection needs. You can learn more about the available metrics here: [Supported Metrics and Operators](https://docs.rungalileo.io/galileo/gen-ai-studio-products/galileo-protect/how-to/supported-metrics-and-operators).

Additionally, it is possible to import rulesets directly from Galileo through stages. Learn more about this feature here: [Invoking Rulesets](https://docs.rungalileo.io/galileo/gen-ai-studio-products/galileo-protect/how-to/invoking-rulesets).
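
The rule-plus-action idea can be illustrated with a toy stand-in that runs locally. This is not how Galileo Protect detects PII: the regular expression, the constant names, and the function `apply_override_rule` are hypothetical, and only mimic the behavior of an "OVERRIDE" action that replaces the response when a rule is triggered.

```python
import re

# Hypothetical, simplified SSN pattern (three digits, two digits, four digits)
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

OVERRIDE_MESSAGE = "Personal Identifiable Information detected. Sorry, I cannot provide the response."


def apply_override_rule(text):
    """Return the canned response if the rule triggers, otherwise the original text."""
    if SSN_PATTERN.search(text):
        return OVERRIDE_MESSAGE
    return text


print(apply_override_rule("John Doe's social security number is 123-45-6789."))
print(apply_override_rule("The meeting starts at nine."))
```

The first call returns the pre-programmed response, while the second text passes through unchanged.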


In [None]:
project = gp.create_project("validate_protect")
print(f"Project created. Project ID: {project.id}")

stage = gp.create_stage(name="validate_chain_stage", project_id=project.id)
print(f"Stage created. Stage ID: {stage.id}")

# Define the Ruleset for PII Protection
ruleset = Ruleset(
    rules=[
        {
            "metric": "pii",
            "operator": "contains",
            "target_value": "ssn",
        },
    ],
    action={
        "type": "OVERRIDE",
        "choices": ["Personal Identifiable Information detected. Sorry, I cannot provide the response."]
    }
)

# Initialize ProtectTool with the configured stage_id and ruleset
protect_tool = ProtectTool(stage_id=stage.id, prioritized_rulesets=[ruleset], timeout=10)

# Use existing chain and combine with ProtectTool
protect_parser = ProtectParser(chain=chain)  # 'chain' has already been defined previously
protected_chain = protect_tool | protect_parser.parser

# Example of using the protected chain with input and output
input_data = {
    "input": "John Doe's social security number is 123-45-6789.",
    "output": "John Doe's social security number is 123-45-6789."
}

# Invoke the protected chain
print("Invoking the chain with PII protection...")
response = protected_chain.invoke(input_data)
print("Protected chain response:")
print(response)

## Galileo Observe

Galileo Observe helps you monitor your generative AI applications in production. With Observe you will understand how your users are using your application and identify where things are going wrong. Keep tabs on your production system, instantly receive alerts when bad things happen, and perform deep root cause analysis through the Observe dashboard.

You can connect Galileo Observe to your LangChain chain to monitor metrics such as cost and guardrail indicators.

In [None]:
from galileo_observe import GalileoObserveCallback
from operator import itemgetter


example_query = """Tell me a story about technology and innovation. 
Explain how artificial intelligence is shaping the future. 
Summarize the impact of renewable energy on society."""

result_break = lambda_break.invoke(example_query)


chain = lambda_break | {
    f"summary_{i}": itemgetter(i) | prompt | llm
    for i in range(len(result_break))
} | lambda_join | StrOutputParser()

monitor_handler = GalileoObserveCallback(project_name="validate_galileo_observe")

print("Invoking the chain with Galileo Observe...")
try:
    output = chain.invoke(
        example_query,
        config={"callbacks": [monitor_handler]}
    )
    print("Observed chain output:")
    print(output)
except Exception as e:
    print(f"Error during chain execution: {e}")


## Model service Galileo Protect + Observe

In [None]:
import os
import yaml
import mlflow
import numpy as np  # used by SemanticSplitter
import pandas as pd
from mlflow.pyfunc import PythonModel, ModelSignature
from mlflow.types.schema import Schema, ColSpec
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_community.llms import LlamaCpp
from langchain_huggingface import HuggingFaceEndpoint
from galileo_observe import GalileoObserveCallback
from galileo_protect import ProtectTool, ProtectParser, Ruleset
import galileo_protect as gp
from sentence_transformers import SentenceTransformer
from scipy.spatial.distance import cosine

class SemanticSplitter:
    def __init__(self, content, embedding_model, method="number", partition_count=10, quantile=0.9):
        print("Starting SemanticSplitter...")
        self.content = [line.strip() for line in content if line.strip()]  
        self.embedding_model = embedding_model
        self.partition_count = partition_count
        self.quantile = quantile

        print("Calculating embeddings for content...")
        self.embeddings = embedding_model.encode(self.content)
        self.distances = [cosine(self.embeddings[i - 1], self.embeddings[i]) for i in range(1, len(self.embeddings))]
        self.breaks = []
        self.load_breaks(method=method)
        print(f"Loaded breaks: {self.breaks}")

    def load_breaks(self, method='number'):
        print(f"Loading breaks using the method: {method}")
        if len(self.distances) == 0:
            print("No distances calculated. No breaks loaded.")
            return

        if method == 'number':
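            # n blocks have n - 1 gaps, so at most len(self.distances) breaks are possible
            break_count = min(self.partition_count - 1, len(self.distances))
            if break_count > 0:
                # Breaks are the gaps with the largest semantic distance between neighbors
                self.breaks = np.sort(np.argpartition(self.distances, -break_count)[-break_count:])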
        elif method == 'quantiles':
            threshold = np.quantile(self.distances, self.quantile)
            self.breaks = [i for i, v in enumerate(self.distances) if v >= threshold]

    def get_chunk(self, beginning, end):
        return '\n'.join(self.content[beginning:end]).strip()

    def get_chunks(self):
        print("Dividing text into chunks...")
        if len(self.breaks) == 0:
            return [self.get_chunk(0, len(self.content))]

        chunks = []
        beginning = 0
        for break_position in self.breaks:
            chunk = self.get_chunk(beginning, break_position + 1)
            if chunk:
                chunks.append(chunk)
            beginning = break_position + 1

        chunk = self.get_chunk(beginning, len(self.content))
        if chunk:
            chunks.append(chunk)

        print(f"Generated chunks: {chunks}")
        return chunks

class TextSummarizationService(PythonModel):
    def load_context(self, context):
        print("Loading model context.")

        
        secrets_path = context.artifacts["secrets"]
        with open(secrets_path, "r") as file:
            secrets = yaml.safe_load(file)
        
        os.environ["GALILEO_API_KEY"] = secrets.get("Galileo", "")
        os.environ["GALILEO_CONSOLE_URL"] = secrets.get("galileo_url", "https://console.hp.galileocloud.io/")
        
        model_source = secrets.get("source", "local_model")  # Use 'HuggingFace' or 'local_model'

        if model_source == "HuggingFace":
            huggingface_key = secrets.get("HuggingFace", "")
            hf_model_repo = secrets.get("hf_model_repo", "mistralai/Mistral-7B-Instruct-v0.2")
            self.llm = HuggingFaceEndpoint(huggingfacehub_api_token=huggingface_key, repo_id=hf_model_repo)
            print("Using the HuggingFace model.")
        elif model_source == "local_model":
            print("[INFO] Initializing local LlamaCpp model.")
            callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
            self.llm = LlamaCpp(
                model_path = context.artifacts["model_folder"],
                n_gpu_layers=30,
                n_batch=512,
                n_ctx=4096,
                max_tokens=1024,
                f16_kv=True,
                callback_manager=callback_manager,
                verbose=False,
                stop=[],
                streaming=False,
                temperature=0.2,
            )
            print("Using the local LlamaCpp model.")
        else:
            raise ValueError("Invalid model source. Use 'HuggingFace' or 'local_model'.")

        
        self.observe_callback = GalileoObserveCallback(project_name="summarization_service_observe")

        
        self.prompt = ChatPromptTemplate.from_template("""
        The following text is an excerpt of a transcription:

        ###
        {context}
        ###

        Please, produce a single paragraph summarizing the given excerpt.
        """)

        project = gp.create_project("summarization_service_protect")
        print(f"Project created. Project ID: {project.id}")

        stage = gp.create_stage(name="summarization_stage", project_id=project.id)
        print(f"Stage created. Stage ID: {stage.id}")

        ruleset = Ruleset(
            rules=[
                {
                    "metric": "pii",
                    "operator": "contains",
                    "target_value": "ssn",
                },
            ],
            action={
                "type": "OVERRIDE",
                "choices": ["Personal Identifiable Information detected. Sorry, I cannot provide the summary."]
            }
        )

        protect_tool = ProtectTool(stage_id=stage.id, prioritized_rulesets=[ruleset], timeout=10)
        self.protect_parser = ProtectParser(chain=self.prompt | self.llm | StrOutputParser())
        self.protected_chain = protect_tool | self.protect_parser.parser

        self.embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

    def predict(self, context, model_input):
        text = model_input["text"].iloc[0]
        print(f"Input text:\n{text[:200]}...")  

        splitter = SemanticSplitter(text.split("\n"), self.embedding_model, method="number", partition_count=6)
        chunks = splitter.get_chunks()

        print(f"Number of chunks generated:{len(chunks)}")

        combined_input = "\n\n".join(chunks)
        print(f"Combined text for batch:\n{combined_input[:500]}...")  # Show first 500 characters

        print("Sending all chunks in batch to the chain...")
        try:
            result = self.protected_chain.invoke(
                {"input": combined_input, "output": ""},
                config={"callbacks": [self.observe_callback]}
            )
            print("Batch result processed successfully.")
        except Exception as e:
            result = f"Error processing batch:{e}"
            print(result)

        print(f"Processing completed. Total chunks processed: {len(chunks)}")

        return pd.DataFrame([{"summary": result}])


if __name__ == "__main__":
    print("Initializing experiment in MLflow.")

    mlflow.set_experiment("Text_Summarization_Service")

    secrets_path = "secrets.yaml"
    model_path = "/home/jovyan/datafabric/Llama_7b/ggml-model-f16-Q5_K_M.gguf"

    if not os.path.exists(secrets_path):
        raise FileNotFoundError(f"secrets.yaml file not found in path: {os.path.abspath(secrets_path)}")

    input_schema = Schema([ColSpec("string", "text")])
    output_schema = Schema([ColSpec("string", "summary")])
    signature = ModelSignature(inputs=input_schema, outputs=output_schema)

    with mlflow.start_run(run_name="Text_Summarization_Service_Run") as run:
        mlflow.pyfunc.log_model(
            artifact_path="text_summarization_service",
            python_model=TextSummarizationService(),
            artifacts={"secrets": secrets_path, "model_folder": model_path},
            signature=signature,
            pip_requirements=[
                "galileo-protect==0.15.1",
                "galileo-observe==1.13.2",
                "pyyaml",
                "pandas",
                "sentence-transformers"
            ]
        )

        model_uri = f"runs:/{run.info.run_id}/text_summarization_service"
        mlflow.register_model(model_uri=model_uri, name="Text_Summarization_Service_Model")

        print(f"Registered model with execution ID: {run.info.run_id}")
        print(f"Model registered successfully. Run ID: {run.info.run_id}")
