# Evaluation of the Multi-Agent

## Startups Documentation

### Create a Synthetic Evaluation Dataset

Synthetically generate a high-quality evaluation set for measuring the quality of your agent. The method ``generate_evals_df`` is part of the ``databricks-agents`` Python package. It is used here for retrieval evaluation.

The input DataFrame has the following columns:

- ``content``: The parsed document content as a string.
- ``doc_uri``: The document URI.

Additional parameters:

- ``num_evals``: The total number of evaluations to generate across all of the documents.
- ``agent_description``: A task description of the agent.
- ``question_guidelines``: A set of guidelines that help guide the synthetic question generation. This is a free-form string that will be used to prompt the generation.

The output includes the following columns:

- ``request_id``: A unique request id.
- ``request``: The synthesized request.
- ``expected_facts``: A list of expected facts in the response. This column has dtype list[string].
- ``expected_retrieved_context``: The context this evaluation has been synthesized from, including the document content and the doc_uri.
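
As a rough illustration of the input contract described above, the sketch below checks that every document record carries the two required fields before it is handed to ``generate_evals_df``. The helper name ``missing_input_columns`` and the sample records are hypothetical, and plain dictionaries stand in for DataFrame rows so the example runs without Spark or pandas.

```python
REQUIRED_COLUMNS = ("content", "doc_uri")


def missing_input_columns(records):
    """Return a sorted list of required columns that are absent or blank in any record."""
    missing = set()
    for record in records:
        for column in REQUIRED_COLUMNS:
            value = record.get(column)
            # Treat absent keys, None, and blank strings as missing.
            if value is None or not str(value).strip():
                missing.add(column)
    return sorted(missing)


# Hypothetical sample: the second record still uses the un-renamed column name.
docs_sample = [
    {"content": "Text of the first chunk.", "doc_uri": "/Volumes/example/a.pdf"},
    {"content_chunked": "Column not yet renamed.", "doc_uri": "/Volumes/example/b.pdf"},
]
print(missing_input_columns(docs_sample))  # ['content']
```

A check like this would catch a table whose chunk column has not yet been renamed to ``content``.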

In [0]:
%pip install mlflow mlflow[databricks] databricks-agents
dbutils.library.restartPython()

In [0]:
import mlflow
from databricks.agents.evals import generate_evals_df
import pandas as pd
import math

####### generate evaluation dataset
docs = spark.read.table("dts_proves_pre.startups_documentacio.documentacio_docs_chunked")
docs = docs.withColumnRenamed("content_chunked", "content")

agent_description = "Agent about promoting entrepreneurship that has information about startups."

question_guidelines = """
# User personas
- An excited entrepreneur who has specific knowledge of one field but knows nothing about building a startup.
- A curious young entrepreneur who wants general information about startups, to compare options rather than dig into specifics.

# Example questions
- Encara no tinc una idea de negoci. Dona’m consells per trobar una idea.
- Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. 
- D’on puc treure el finançament per crear la meva startup?

# Additional Guidelines
- Questions should be succinct and human-like, in Catalan.
"""

num_evals = 50

evals = generate_evals_df(
    docs,
    # The total number of evals to generate. The method attempts to generate evals that have full coverage over the
    # documents provided. If this number is less than the number of documents, some documents will not have any
    # evaluations generated.
    num_evals=num_evals,
    # A task description of the agent.
    agent_description=agent_description,
    # A set of guidelines that help guide the synthetic generation. This is a free-form string that will be used to prompt the generation.
    question_guidelines=question_guidelines
)

display(evals)


Generating evaluations:   0%|          | 0/50 evals generated [Elapsed: 00:00, Remaining: ?]

request_id,request,expected_retrieved_context,expected_facts,source_type,source_id
b4378da7d11e09b83e2ded7eaa9c98c9a85d9f85c2f9be5ee9da3479632d15c0,"List(List(List(Què és el lísing o arrendament financer i per a què s’utilitza principalment?, user)))","List(List(1. Lísing o arrendament financer **Definició** Cessió dels drets d’utilització d’un bé moble o immoble per part d’una societat de lísing (empresa que es coneix com a propietària) a una altra empresa (empresa contractant), durant un període de temps pactat a canvi del pagament d’una quota d’arrendament, amb la particularitat que l’empresa contractant podrà optar per la compra del bé al final del termini pactat. **Característiques:** **■** Instrument financer adequat per obtenir finançament a mig i llarg termini per a l’adquisició de béns materials de les empreses., /Volumes/dts_proves_pre/startups_documentacio/documentacio/Glossari-dinstruments-financers_Annex-1_tcm124-103473.pdf))","List(Financial leasing involves the transfer of usage rights of a movable or immovable asset from a leasing company to a business., The lease term is for an agreed period of time in exchange for rental payments., There is an option to purchase the asset at the end of the lease term., Primarily used as a financial instrument for medium- and long-term financing to acquire physical assets for businesses.)",SYNTHETIC_FROM_DOC,/Volumes/dts_proves_pre/startups_documentacio/documentacio/Glossari-dinstruments-financers_Annex-1_tcm124-103473.pdf
fd82a81f55a9d8a2722c3ce24b850a7538ac91046b135d47c0ff3f77fae0a9d2,"List(List(List(Quins són els avantatges i inconvenients del lísing o arrendament financer per a una empresa?, user)))","List(List(**Avantatges:** **■** S'aconsegueix una amortització accelerada del bé a gust de l'empresa. **■** Permet el finançament del 100% del bé, tot i que també es pot donar a percentatges de finançament inferiors. **■** És útil per a empreses molt solvents i/o usuàries de béns tecnològics. **■** Fiscalment, permet deduir pràcticament la totalitat de les quotes d’arrendament financer meritades, amb certs límits establerts. **■** No és necessari fer un desembors inicial de recursos propis i, per tant, permet a l’empresa gaudir d’una major liquiditat. (tot i que en certes operacions pot no ser així). **■** Al final, mitjançant el pagament d'un valor residual prefixat en el contracte, es pot adquirir la propietat del bé. **Inconvenients:** **■** La durada del contracte d’arrendament financer és irrevocable. **■** En general és un producte menys flexible que un préstec., /Volumes/dts_proves_pre/startups_documentacio/documentacio/Glossari-dinstruments-financers_Annex-1_tcm124-103473.pdf))","List(leasing allows accelerated depreciation of the asset, leasing enables 100% financing of the asset, leasing is suitable for very solvent companies or technology users, significant tax deductions on leasing fees are possible within certain limits, no initial cash outlay required, providing greater liquidity, possibility to acquire ownership of the asset by paying a pre-agreed residual value at the end of the contract, the leasing contract duration is irrevocable, leasing is less flexible than a loan)",SYNTHETIC_FROM_DOC,/Volumes/dts_proves_pre/startups_documentacio/documentacio/Glossari-dinstruments-financers_Annex-1_tcm124-103473.pdf
a34deecc53dde40b2ac619926d0c5a976c0b940bb390b30e3942c1e9efaabe0c,"List(List(List(Què és el rènting i quines característiques té principalment?, user)))","List(List(2. Rènting **Definició** Contracte mercantil bilateral pel qual una de les parts, la societat de rènting, s'obliga a cedir a una altra empresa l'ús d'un bé durant un temps determinat, a canvi del pagament d'una quota periòdica d’arrendament. **Característiques:** **■** S’utilitza per llogar tot tipus de maquinària, equips informàtics, equipament d’oficines i fins i tot immobles. **■** El pagament de la renda inclou el dret a l'ús de l'equip, el manteniment i les reparacions del bé durant el termini de l’operació, així com d’altres despeses relacionades amb la seva utilització. **■** No permet, al seu venciment, l’opció d’adquisició del bé per part de l’empresa., /Volumes/dts_proves_pre/startups_documentacio/documentacio/Glossari-dinstruments-financers_Annex-1_tcm124-103473.pdf))","List(Renting is a bilateral commercial contract., One company allows another company to use an asset for a specified period., The user company pays periodic rental payments., It is used to rent machinery, computer equipment, office equipment, and real estate., The rental payment includes use of the equipment, maintenance, repairs, and other related expenses., There is no option for the user company to purchase the asset at the end of the term.)",SYNTHETIC_FROM_DOC,/Volumes/dts_proves_pre/startups_documentacio/documentacio/Glossari-dinstruments-financers_Annex-1_tcm124-103473.pdf
ae9f323a7dda683bfa94d1857f859b434affccb884f8324e1fd2d20a498885ff,"List(List(List(Quins tipus d’instruments financers es detallen en el glossari?, user)))","List(List(Sumari  1. Lísing o arrendament financer  2. Rènting  3. Facturatge  4. Confirmació  5. Forfetatge  6. Capital de risc  7. Présctec participatiu  8. Préstec  9. Crèdit, /Volumes/dts_proves_pre/startups_documentacio/documentacio/Glossari-dinstruments-financers_Annex-1_tcm124-103473.pdf))","List(The answer should include a list of financial instruments detailed in the glossary., The financial instruments listed should include: Leasing or financial lease, renting, factoring, confirmation, forfaiting, venture capital, participative loan, loan, credit.)",SYNTHETIC_FROM_DOC,/Volumes/dts_proves_pre/startups_documentacio/documentacio/Glossari-dinstruments-financers_Annex-1_tcm124-103473.pdf
5ab59b37f94bade137d34c791b56d7ce2f04e642ca66b48290a7d57ee429ded6,"List(List(List(Com va utilitzar Airbnb el Design Thinking per transformar el seu negoci?, user)))","List(List(Un dels casos d’èxit més coneguts és el de l’empresa d’allotjaments turístics Airbnb. El 2009, poc després de crear la start-up, els seus fundadors es van adonar que el model de negoci no estava funcionant. Analitzant la plataforma, van detectar un patró: la similitud entre tots els anuncis d’allotjaments oferts es trobava en les fotografies de baixa qualitat que publicaven els propietaris. Això els va posar a la pell dels clients i els va fer entendre per què no estaven llogant les habitacions. Un dels seus fundadors, Joe Gebbia, coneixia el Design Thinking i el va voler posar en pràctica. Després de detectar el problema, van viatjar a Nova York per visitar els allotjaments amb mentalitat de client, i fer-ne millors fotografies. La facturació de l’empresa va començar a augmentar poc després. Això va canviar la filosofia de la companyia, que va entendre que el codi no ho era tot i que sortir a conèixer clients reals seria, gairebé sempre, la millor manera de resoldre els seus problemes i trobar solucions intel·ligents., /Volumes/dts_proves_pre/startups_documentacio/documentacio/Design-thinking_accessible.pdf))","List(Airbnb used empathy to identify issues faced by customers., They identified low-quality photos as a significant issue., To address this, Airbnb took action by improving photos, specifically traveling to New York., The result of these actions was an improved customer experience and increased revenue., This initiative led to a shift in Airbnb's company philosophy to engage directly with customers for problem-solving.)",SYNTHETIC_FROM_DOC,/Volumes/dts_proves_pre/startups_documentacio/documentacio/Design-thinking_accessible.pdf
c7fce77d764ae207f4be0b1dfe843221dce0e9a061be1d8a71d3cebabbbb5517,"List(List(List(Quins són els beneficis del Design Thinking per a una startup?, user)))","List(List(Aquesta metodologia té diversos beneficis: - Centra l’atenció en el públic i genera empatia, ja que es basa a identificar problemes, necessitats i desitjos reals per crear solucions. La interacció amb el públic, i la seva satisfacció, són la clau de l’èxit. - Fomenta el treball en equip i permet que cada individu aporti les seves capacitats en la creació d’una idea única. - Permet crear prototips, ja que és imprescindible validar qualsevol idea abans de consolidar-la al mercat. Això contribueix a identificar i solucionar errors i t’ajudarà a garantir que el negoci ofereix una solució real. - Promou un procés lúdic. L’objectiu de la metodologia és gaudir del camí i explotar el teu potencial amb el màxim de llibertat creativa., /Volumes/dts_proves_pre/startups_documentacio/documentacio/Design-thinking_accessible.pdf))","List(Generates empathy by focusing on real audience problems and needs, Fosters teamwork to leverage individual capabilities, Creates prototypes to validate ideas and identify errors, Promotes a playful and creative process that encourages innovation)",SYNTHETIC_FROM_DOC,/Volumes/dts_proves_pre/startups_documentacio/documentacio/Design-thinking_accessible.pdf
5c0368d4accba711443c595851ad47f706b99239713779de814df31bf4e7225e,"List(List(List(Què és el Design Thinking i quin és el seu origen?, user)))","List(List(El Design Thinking s’inspira en la forma de treballar dels dissenyadors i dissenyadores de producte per entendre i donar solució a les necessitats reals del públic a través de la innovació. Tot i que el concepte el va utilitzar per primera vegada Herbert Simon, Premi Nobel d’Economia, l’any 1969, va ser durant els anys 70 que es va desenvolupar de forma teòrica a la Universitat de Standford. Un dels seus professors, David Kelley, va fundar la firma de disseny IDEO, pionera a aplicar la metodologia a l’entorn empresarial. El president executiu d’IDEO, Tim Brown, defineix el Design Thinking com “un enfocament de la innovació centrat en l’ésser humà, que es basa en el conjunt d’eines del dissenyador per integrar les necessitats de les persones, les possibilitats de la tecnologia i els requisits per a l’èxit empresarial”. És en la confluència d’aquests tres factors que s’origina la innovació i, com a conseqüència, l’emprenedoria. Per IDEO, “el Design Thinking és una manera de resoldre problemes mitjançant la creativitat (...). Adoptem una ment de principiant, amb la intenció de romandre oberts i curiosos, de no assumir res i de veure l’ambigüitat com una oportunitat”., /Volumes/dts_proves_pre/startups_documentacio/documentacio/Design-thinking_accessible.pdf))","List(Design Thinking is a human-centered approach to innovation., Design Thinking integrates the needs of people, possibilities of technology, and requirements for business success., Design Thinking originated with Herbert Simon in 1969., Design Thinking was further developed at Stanford University in the 1970s., David Kelley, founder of IDEO, made significant contributions to the development of Design Thinking.)",SYNTHETIC_FROM_DOC,/Volumes/dts_proves_pre/startups_documentacio/documentacio/Design-thinking_accessible.pdf
cd134ddeb6aa8c148670826e26ceb31b56f09ec52d0452c7700d402e9395c8e2,"List(List(List(Quins són els avantatges del finançament via fintech per a les startups?, user)))","List(List(Les _fintech_ ofereixen a les persones emprenedores i pimes diverses modalitats de finançament a través de plataformes digitals, que funcionen de manera àgil i flexible gràcies als beneficis tecnològics. Aquestes plataformes faciliten el registre i la sol·licitud del finançament necessari a les empreses i ofereixen temps d'aprovació i d’entrega de capital més ràpids i eficients. A més a més, han permès democratitzar l’accés als productes financers, fent-los arribar i entendre, de manera senzilla, a un conjunt més ampli i variat de persones usuàries., /Volumes/dts_proves_pre/startups_documentacio/documentacio/Cat-Empren_Fintech_accessible22.pdf))","List(Fintech provides agile and flexible operations., Fintech offers quicker registration and application processes., Fintech ensures faster approval and capital delivery., Fintech democratizes access to financial products, making them more accessible to startups.)",SYNTHETIC_FROM_DOC,/Volumes/dts_proves_pre/startups_documentacio/documentacio/Cat-Empren_Fintech_accessible22.pdf
9fb762d721df7e83a8a8831b755d2c7637153e00b683d688bff84da2e16b4022,"List(List(List(Què és el fintech i quin és el seu objectiu principal?, user)))","List(List(El terme _fintech_ prové de la unió de les paraules en anglès _finance_ i _technology_ i fa referència a aquelles activitats que impliquen l’ús de la innovació i la tecnologia en el desenvolupament de productes i serveis financers. Les empreses _fintech_ desgranen el negoci bancari en diverses àrees, conegudes com a “verticals”, i s’especialitzen en una d’elles per maximitzar la seva competitivitat. Ofereixen productes i serveis molt accessibles, atractius i intuïtius, dirigits a un públic nadiu digital que ha crescut en un entorn tecnològic i s’adapta amb rapidesa a aquests nous serveis. El seu objectiu és redefinir la forma d’entendre els serveis financers., /Volumes/dts_proves_pre/startups_documentacio/documentacio/Cat-Empren_Fintech_accessible22.pdf))","List(Fintech involves the use of innovation and technology., Fintech focuses on developing financial products and services., The primary objective of fintech is to redefine the understanding and delivery of financial services.)",SYNTHETIC_FROM_DOC,/Volumes/dts_proves_pre/startups_documentacio/documentacio/Cat-Empren_Fintech_accessible22.pdf
1b778cd5a3f333cfbcfd94d9dec632b951676cdf2d7260379f9f81eb35d885ce,"List(List(List(Quines són les principals tipologies de finançament a través de les tecnologies financeres i com funcionen?, user)))","List(List(Existeixen dues tipologies principals de finançament a través de les tecnologies financeres: - **Préstecs** **ràpids en línia** . Algunes _fintech_ ofereixen plataformes online des d’on particulars i empreses poden sol·licitar i rebre préstecs de petits imports, que es concedeixen de manera àgil amb un tipus d’interès estipulat. Aquestes empreses actuen amb una fitxa bancària europea, com les entitats financeres tradicionals, de manera que es regeixen per la mateixa regulació. Poden demanar uns mínims de solvència, entre d’altres indicadors. - **Plataformes de** **_crowdfunding_** **o finançament col·lectiu** . Es tracta d’un punt de trobada entre persones emprenedores amb l’objectiu de desenvolupar un projecte i persones inversores que vulguin finançar propostes del seu interès. Mitjançant les noves tecnologies, es facilita la seva interacció i l’intercanvi de recursos. Les plataformes de _crowdfunding_ tenen un període de temps establert per a la recaptació dels fons necessaris i només es fan arribar a la persona emprenedora si s’assoleix la fita marcada., /Volumes/dts_proves_pre/startups_documentacio/documentacio/Cat-Empren_Fintech_accessible22.pdf))","List(The main types of financing through financial technologies are fast online loans and crowdfunding platforms., Fast online loans allow individuals and businesses to quickly obtain small loans., These loans have regulated interest rates and possible solvency requirements., Crowdfunding platforms connect entrepreneurs with investors to finance projects., Funds on crowdfunding platforms are transferred only if the financial goal is met within a specified period.)",SYNTHETIC_FROM_DOC,/Volumes/dts_proves_pre/startups_documentacio/documentacio/Cat-Empren_Fintech_accessible22.pdf


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/.ipykernel/32051/command-8489890432097872-3283812887 in ?()
     44 table_name = "evals_generated_questions_documentacio"
     45 full_table_name = f"{catalog_name}.{schema_name}.{table_name}"
     46 
     47 # Save the DataFrame to Unity Catalog as a managed table
---> 48 evals.write.mode("overwrite").saveAsTable(full_table_name)

/databricks/python/lib/python3.11/site-packages/pandas/core/generic.py

In [0]:
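# Convert the pandas DataFrame returned by generate_evals_df to a Spark DataFrame so it can be saved as a table.
eval_spark = spark.createDataFrame(evals)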
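
The ``AttributeError`` in the traceback above comes from calling ``.write`` on a pandas DataFrame, which has no such accessor; converting with ``spark.createDataFrame`` is what makes ``saveAsTable`` available. The sketch below shows one way to guard that step. The helper ``ensure_spark_df`` is hypothetical, and the module-name check is a heuristic assumption, not an official API.

```python
def ensure_spark_df(df, spark):
    """Return df unchanged unless it is a pandas object, in which case convert it."""
    module = type(df).__module__ or ""
    if module.split(".")[0] == "pandas":
        # pandas DataFrames cannot be written with .write.saveAsTable,
        # so hand them to the Spark session for conversion first.
        return spark.createDataFrame(df)
    return df
```

In the notebook this would be called as ``eval_spark = ensure_spark_df(evals, spark)`` before writing to Unity Catalog.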

In [0]:
# write generated dataset to a table
# Define your Unity Catalog target path
catalog_name = "dts_proves_pre"
schema_name = "startups_documentacio"
table_name = "evals_generated_questions_documentacio"
full_table_name = f"{catalog_name}.{schema_name}.{table_name}"

# Save the DataFrame to Unity Catalog as a managed table
eval_spark.write.mode("overwrite").saveAsTable(full_table_name)

In [0]:
evals = spark.read.table("dts_proves_pre.startups_documentacio.evals_generated_questions_documentacio")


In [0]:
evals_df = evals.toPandas()


### Llama

In [0]:
####### llm: llama

mlflow.set_experiment("/Users/aliciachimeno_ext@gencat.cat/FINAL/Agent VS+ GENIE")
# Evaluate the model using the newly generated evaluation set. After the function call completes, click the UI link to see the results. You can use this as a baseline for your agent.
results = mlflow.evaluate(
  model="endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-3",
  data=evals_df[:25],
  model_type="databricks-agent"
)



Evaluating:   0%|          | 0/25 [Elapsed: 00:00, Remaining: ?]



In [0]:
####### llm: llama

mlflow.set_experiment("/Users/aliciachimeno_ext@gencat.cat/FINAL/Agent VS+ GENIE")
# Evaluate the model using the newly generated evaluation set. After the function call completes, click the UI link to see the results. You can use this as a baseline for your agent.
results = mlflow.evaluate(
  model="endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-3",
  # Note: this slice starts at 26, so row 25 is not evaluated; evals_df[25:] would cover it.
  data=evals_df[26:],
  model_type="databricks-agent"
)



Evaluating:   0%|          | 0/24 [Elapsed: 00:00, Remaining: ?]
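
When an evaluation set is split by hand, as in the two cells above, it is easy to leave a gap between slices. The sketch below uses a hypothetical helper, ``uncovered_rows``, to list row positions that no slice covers. The 50-row total is taken from ``num_evals``, and the bounds mirror ``evals_df[:25]`` and ``evals_df[26:]``.

```python
def uncovered_rows(n_rows, bounds):
    """Return row positions in range(n_rows) not covered by any (start, stop) pair."""
    covered = set()
    for start, stop in bounds:
        # Half-open interval, matching Python slice semantics.
        covered.update(range(max(start, 0), min(stop, n_rows)))
    return [i for i in range(n_rows) if i not in covered]


print(uncovered_rows(50, [(0, 25), (26, 50)]))  # [25]
print(uncovered_rows(50, [(0, 25), (25, 50)]))  # []
```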



### Claude

In [0]:
mlflow.set_experiment("/Users/aliciachimeno_ext@gencat.cat/FINAL/Agent VS+ GENIE")
# Evaluate the model using the newly generated evaluation set. After the function call completes, click the UI link to see the results. You can use this as a baseline for your agent.
results = mlflow.evaluate(
  model="endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude",
  data=evals_df[:5],
  model_type="databricks-agent"
)



Evaluating:   0%|          | 0/5 [Elapsed: 00:00, Remaining: ?]



In [0]:
mlflow.set_experiment("/Users/aliciachimeno_ext@gencat.cat/FINAL/Agent VS+ GENIE")
# Evaluate the model using the newly generated evaluation set. After the function call completes, click the UI link to see the results. You can use this as a baseline for your agent.
results = mlflow.evaluate(
  model="endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude",
  data=evals_df[5:10],
  model_type="databricks-agent"
)


Evaluating:   0%|          | 0/5 [Elapsed: 00:00, Remaining: ?]



In [0]:
mlflow.set_experiment("/Users/aliciachimeno_ext@gencat.cat/FINAL/Agent VS+ GENIE")
# Evaluate the model using the newly generated evaluation set. After the function call completes, click the UI link to see the results. You can use this as a baseline for your agent.
results = mlflow.evaluate(
  model="endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude",
  data=evals_df[10:15],
  model_type="databricks-agent"
)



Evaluating:   0%|          | 0/5 [Elapsed: 00:00, Remaining: ?]



In [0]:
mlflow.set_experiment("/Users/aliciachimeno_ext@gencat.cat/FINAL/Agent VS+ GENIE")
# Evaluate the model using the newly generated evaluation set. After the function call completes, click the UI link to see the results. You can use this as a baseline for your agent.
results = mlflow.evaluate(
  model="endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude",
  data=evals_df[15:20],
  model_type="databricks-agent"
)



In [0]:
mlflow.set_experiment("/Users/aliciachimeno_ext@gencat.cat/FINAL/Agent VS+ GENIE")
# Evaluate the model using the newly generated evaluation set. After the function call completes, click the UI link to see the results. You can use this as a baseline for your agent.
results = mlflow.evaluate(
  model="endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude",
  data=evals_df[20:25],
  model_type="databricks-agent"
)
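
The five near-identical cells above could be collapsed into a loop. The following is a sketch only: ``iter_batches``, ``evaluate_in_batches``, and the ``evaluate_fn`` argument are hypothetical names, and in the notebook ``evaluate_fn`` would wrap the ``mlflow.evaluate`` call shown above.

```python
def iter_batches(rows, batch_size):
    """Yield (start, batch) pairs of consecutive, non-overlapping slices."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield start, rows[start:start + batch_size]


def evaluate_in_batches(rows, batch_size, evaluate_fn):
    """Call evaluate_fn on each batch and collect results keyed by start offset."""
    results = {}
    for start, batch in iter_batches(rows, batch_size):
        results[start] = evaluate_fn(batch)
    return results


# Stand-in evaluator that only counts rows; a real one would call the endpoint.
summary = evaluate_in_batches(list(range(25)), 5, evaluate_fn=len)
print(summary)  # {0: 5, 5: 5, 10: 5, 15: 5, 20: 5}
```

Because the slices are generated from a single range, consecutive batches cannot skip or repeat a row.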



### Create a Manual Evaluation Dataset
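
Every entry in the manual set in this section has the same chat-style shape: a ``request`` holding a ``messages`` list with one ``user`` message. A small builder, sketched here under the hypothetical name ``make_request``, avoids repeating that nesting and strips the stray leading or trailing spaces that appear in some of the questions.

```python
def make_request(question):
    """Wrap a plain question string in the chat-style request structure."""
    return {"request": {"messages": [{"role": "user", "content": question.strip()}]}}


# Two of the questions from this section, one with a leading space.
questions = [
    " Quines opcions jurídiques tinc per crear una Startup?",
    "La meva idea és viable? com avaluar una idea de negoci",
]
eval_records = [make_request(q) for q in questions]
print(eval_records[0]["request"]["messages"][0]["content"])
```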

In [0]:
## startups catalog
import pandas as pd
import mlflow

eval_set_documnetation = [
    {"request": {"messages": [{"role": "user", "content": "Encara no tinc una idea de negoci. Dona’m consells per trobar una idea. "}]}},
    {"request": {"messages": [{"role": "user", "content": "Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. "}]}},
    {"request": {"messages": [{"role": "user", "content": "La meva idea és viable? com avaluar una idea de negoci"}]}},
    {"request": {"messages": [{"role": "user", "content": " Quines opcions jurídiques tinc per crear una Startup?"}]}},
    {"request": {"messages": [{"role": "user", "content": "D’on puc treure el finançament per crear la meva startup? "}]}},
    {"request": {"messages": [{"role": "user", "content": "Sóc estudiant d’enginyeria informàtica i tinc una idea d’aplicació. Com la puc convertir en startup?"}]}},
    {"request":{"messages":[{"role":"user","content":"Soc enginyer/a i tinc una idea per un producte tecnològic innovador. Quins passos concrets hauria de seguir per validar si la meva idea tindrà èxit al mercat abans de fer una gran inversió?"}]}},
    {"request":{"messages":[{"role":"user","content":"Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jurídica em recomaneu si soc només jo per començar i vull que sigui senzill?"}]}},
    {"request":{"messages":[{"role":"user","content":"Tinc poca experiència prèvia en la gestió d'empreses. Quines habilitats personals són crucials per a un emprenedor d'avui dia i com puc desenvolupar-les?"}]}},
    {"request":{"messages":[{"role":"user","content":"La meva idea requereix un capital inicial considerable. Quines són les fonts de finançament alternatives als bancs que podria explorar?"}]}},
    {"request":{"messages":[{"role":"user","content":"Estic preparant el pla economicofinancer per al meu projecte, però no estic segur/a de si ho faig bé. Quins són els errors més habituals que he d'evitar en l'elaboració d'aquest pla?"}]}},
    {"request":{"messages":[{"role":"user","content":"He sentit parlar del Design Thinking i el Lean Startup. Com puc aplicar aquestes metodologies per dissenyar i validar la meva idea de negoci de manera més eficient i amb menys risc?"}]}},
    {"request":{"messages":[{"role":"user","content":"Em fa molta por que el meu negoci fracassi. Com puc estar preparat/da psicològicament i estratègicament per a aquesta possibilitat i aprendre dels errors?"}]}},
    {"request":{"messages":[{"role":"user","content":"La meva idea no és totalment nova, ja existeix competència. Com puc assegurar-me que la meva proposta de valor es diferencia i capta l'atenció dels clients?"}]}},
]

eval_dataset = pd.DataFrame(eval_set_documnetation)
display(eval_dataset)


request
"List(List(List(Encara no tinc una idea de negoci. Dona’m consells per trobar una idea. , user)))"
"List(List(List(Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. , user)))"
"List(List(List(La meva idea és viable? com avaluar una idea de negoci, user)))"
"List(List(List( Quines opcions jurídiques tinc per crear una Startup?, user)))"
"List(List(List(D’on puc treure el finançament per crear la meva startup? , user)))"
"List(List(List(Sóc estudiant d’enginyeria informàtica i tinc una idea d’aplicació. Com la puc convertir en startup?, user)))"
"List(List(List(Soc enginyer/a i tinc una idea per un producte tecnològic innovador. Quins passos concrets hauria de seguir per validar si la meva idea tindrà èxit al mercat abans de fer una gran inversió?, user)))"
"List(List(List(Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jurídica em recomaneu si soc només jo per començar i vull que sigui senzill?, user)))"
"List(List(List(Tinc poca experiència prèvia en la gestió d'empreses. Quines habilitats personals són crucials per a un emprenedor d'avui dia i com puc desenvolupar-les?, user)))"
"List(List(List(La meva idea requereix un capital inicial considerable. Quines són les fonts de finançament alternatives als bancs que podria explorar?, user)))"


### Llama

In [0]:
## llm = llama
with mlflow.start_run(run_name="agent-documentacio-llama-preguntes-personalitzades-1"):
    eval_results = mlflow.evaluate(
        "endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-3",  # Your agent's serving endpoint
        data=eval_set_documnetation[:5],  # Your evaluation dataset
        model_type="databricks-agent",  # For Mosaic AI Agent evaluation
    )

display(eval_results.tables['eval_results'])

Evaluating:   0%|          | 0/5 [Elapsed: 00:00, Remaining: ?]

request_id,request,response,retrieved_context,trace,tool_calls,response/overall_assessment/rating,response/llm_judged/safety/rating,response/llm_judged/safety/rationale,retrieval/llm_judged/chunk_relevance/ratings,retrieval/llm_judged/chunk_relevance/rationales,response/llm_judged/relevance_to_query/rating,response/llm_judged/relevance_to_query/rationale,response/llm_judged/groundedness/rating,response/llm_judged/groundedness/rationale,agent/latency_seconds,retrieval/llm_judged/chunk_relevance/precision,response/overall_assessment/rationale


In [0]:
with mlflow.start_run(run_name="agent-documentacio-llama-preguntes-personalitzades-2"):
    eval_results = mlflow.evaluate(
        "endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-3",  # Your agent's serving endpoint
        data=eval_set_documnetation[5:10],  # Your evaluation dataset
        model_type="databricks-agent",  # For Mosaic AI Agent evaluation
    )

display(eval_results.tables['eval_results'])

Evaluating:   0%|          | 0/5 [Elapsed: 00:00, Remaining: ?]

request_id,request,response,retrieved_context,trace,tool_calls,response/overall_assessment/rating,response/llm_judged/safety/rating,response/llm_judged/safety/rationale,response/llm_judged/relevance_to_query/rating,response/llm_judged/relevance_to_query/rationale,retrieval/llm_judged/chunk_relevance/ratings,retrieval/llm_judged/chunk_relevance/rationales,response/llm_judged/groundedness/rating,response/llm_judged/groundedness/rationale,agent/latency_seconds,retrieval/llm_judged/chunk_relevance/precision,response/overall_assessment/rationale


In [0]:
with mlflow.start_run(run_name="agent-documentacio-llama-preguntes-personalitzades-3"):
    eval_results = mlflow.evaluate(
        "endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-3",  # Your agent's serving endpoint
        data=eval_set_documnetation[10:],  # Your evaluation dataset
        model_type="databricks-agent",  # For Mosaic AI Agent evaluation
    )

display(eval_results.tables['eval_results'])

Evaluating:   0%|          | 0/4 [Elapsed: 00:00, Remaining: ?]

request_id,request,response,retrieved_context,trace,tool_calls,response/overall_assessment/rating,response/overall_assessment/rationale,response/llm_judged/safety/rating,response/llm_judged/safety/rationale,response/llm_judged/relevance_to_query/rating,response/llm_judged/relevance_to_query/rationale,retrieval/llm_judged/chunk_relevance/ratings,retrieval/llm_judged/chunk_relevance/rationales,response/llm_judged/groundedness/rating,response/llm_judged/groundedness/rationale,agent/latency_seconds,retrieval/llm_judged/chunk_relevance/precision
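
The result tables above expose per-row columns such as ``response/overall_assessment/rating`` and ``agent/latency_seconds``. As a sketch of how they might be aggregated, the hypothetical helper ``summarize`` below computes a pass rate and a mean latency. It assumes that a passing row carries the string ``"yes"`` in the rating column, and the sample rows are made up for illustration.

```python
from statistics import mean


def summarize(rows):
    """Return (pass_rate, mean_latency_seconds) over a list of result rows."""
    # Assumption: the overall rating is the string "yes" for a passing row.
    ratings = [r.get("response/overall_assessment/rating") for r in rows]
    latencies = [r["agent/latency_seconds"] for r in rows
                 if r.get("agent/latency_seconds") is not None]
    pass_rate = sum(1 for x in ratings if x == "yes") / len(rows) if rows else 0.0
    mean_latency = mean(latencies) if latencies else None
    return pass_rate, mean_latency


# Made-up rows, for illustration only; these are not results from the runs above.
sample = [
    {"response/overall_assessment/rating": "yes", "agent/latency_seconds": 10.0},
    {"response/overall_assessment/rating": "no", "agent/latency_seconds": 20.0},
]
print(summarize(sample))  # (0.5, 15.0)
```

In the notebook, rows of this shape could be obtained with ``eval_results.tables['eval_results'].to_dict("records")``.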


### Claude

In [0]:
## llm: claude

with mlflow.start_run(run_name="agent-documentacio-claude-preguntes-personalitzades-1"):
    eval_results = mlflow.evaluate(
        "endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude",  # Your agent's serving endpoint
        data=eval_set_documnetation[:5],  # Your evaluation dataset
        model_type="databricks-agent",  # For Mosaic AI Agent evaluation
    )

display(eval_results.tables['eval_results'])

Evaluating:   0%|          | 0/5 [Elapsed: 00:00, Remaining: ?]



request_id,request,trace,tool_calls,model_error_message
7b7fb5c9b27fe3066b9a6111a3eda8da16ccc6f52cfc9bf0dfb005c5b2e75f11,"List(List(List(Encara no tinc una idea de negoci. Dona’m consells per trobar una idea. , user)))","{""info"": {""request_id"": ""tr-40a6305c41ea4dc0bd11363872f01c25"", ""experiment_id"": ""3845882404188464"", ""timestamp_ms"": 1747391116621, ""execution_time_ms"": 0, ""status"": ""ERROR"", ""request_metadata"": {""mlflow.sourceRun"": ""e08fca067d984d29879a949d85599795"", ""mlflow.traceInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""Encara no tinc una idea de negoci. Dona\\u2019m consells per trobar una idea. \""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.traceOutputs"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'Encara no tinc una idea de negoci. Dona\u2019m consells per trobar una idea. '}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoi""}, ""tags"": {""mlflow.user"": ""aliciachimeno_ext@gencat.cat"", ""mlflow.artifactLocation"": ""dbfs:/databricks/mlflow-tracking/3845882404188464/tr-40a6305c41ea4dc0bd11363872f01c25/artifacts""}, ""assessments"": []}, ""data"": {""spans"": [{""name"": ""root_span"", ""context"": {""span_id"": ""decdcc7428084cbd"", ""trace_id"": ""8605ebdced864111a5bb548d730e7c27""}, ""parent_id"": null, ""start_time"": null, ""end_time"": null, ""status_code"": ""OK"", ""status_message"": """", ""attributes"": {""mlflow.traceRequestId"": ""\""8605ebdced864111a5bb548d730e7c27\"""", ""mlflow.spanInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""Encara no tinc una idea de negoci. Dona\\u2019m consells per trobar una idea. \""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.spanOutputs"": ""\""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'Encara no tinc una idea de negoci. Dona\\u2019m consells per trobar una idea. '}], 'databricks_options': {'return_trace': True}}. 
MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\\\\n\\\\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \\\\'Restarting model process. This is likely caused by a request timeout or OOM.\\\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {\\\""error_code\\\"": \\\""BAD_REQUEST\\\"", \\\""message\\\"": \\\""Encountered an unexpected error while parsing the input data. Error \\\\'Restarting model process. This is likely caused by a request timeout or OOM.\\\\'\\\""}\\\\n\\\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\\\n\\\\n- Input payload: {\\\\'messages\\\\': [{\\\\'role\\\\': \\\\'user\\\\', \\\\'content\\\\': \\\\'Encara no tinc una idea de negoci. Dona\\u2019m consells per trobar una idea. \\\\'}], \\\\'databricks_options\\\\': {\\\\'return_trace\\\\': True}}')\""""}, ""events"": []}], ""request"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""Encara no tinc una idea de negoci. Dona\\u2019m consells per trobar una idea. \""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""response"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'Encara no tinc una idea de negoci. Dona\u2019m consells per trobar una idea. '}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\\n\\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \\'Restarting model process. 
This is likely caused by a request timeout or OOM.\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {\""error_code\"": \""BAD_REQUEST\"", \""message\"": \""Encountered an unexpected error while parsing the input data. Error \\'Restarting model process. This is likely caused by a request timeout or OOM.\\'\""}\\n\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\n\\n- Input payload: {\\'messages\\': [{\\'role\\': \\'user\\', \\'content\\': \\'Encara no tinc una idea de negoci. Dona\u2019m consells per trobar una idea. \\'}], \\'databricks_options\\': {\\'return_trace\\': True}}')""}}",List(),"Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'Encara no tinc una idea de negoci. Dona’m consells per trobar una idea. '}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\n\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \'Restarting model process. This is likely caused by a request timeout or OOM.\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {""error_code"": ""BAD_REQUEST"", ""message"": ""Encountered an unexpected error while parsing the input data. Error \'Restarting model process. This is likely caused by a request timeout or OOM.\'""}\n\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\n\n- Input payload: {\'messages\': [{\'role\': \'user\', \'content\': \'Encara no tinc una idea de negoci. Dona’m consells per trobar una idea. \'}], \'databricks_options\': {\'return_trace\': True}}')"
d0fb9c1d0ff82683c8c433abd3e91771981accce4401778b74b4de32815e54e2,"List(List(List(La meva idea és viable? com avaluar una idea de negoci, user)))","{""info"": {""request_id"": ""tr-aa9a31f1ddd742c1a46a8749dcd2be72"", ""experiment_id"": ""3845882404188464"", ""timestamp_ms"": 1747391210376, ""execution_time_ms"": 0, ""status"": ""ERROR"", ""request_metadata"": {""mlflow.sourceRun"": ""e08fca067d984d29879a949d85599795"", ""mlflow.traceInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""La meva idea \\u00e9s viable? com avaluar una idea de negoci\""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.traceOutputs"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'La meva idea \u00e9s viable? com avaluar una idea de negoci'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check t""}, ""tags"": {""mlflow.user"": ""aliciachimeno_ext@gencat.cat"", ""mlflow.artifactLocation"": ""dbfs:/databricks/mlflow-tracking/3845882404188464/tr-aa9a31f1ddd742c1a46a8749dcd2be72/artifacts""}, ""assessments"": []}, ""data"": {""spans"": [{""name"": ""root_span"", ""context"": {""span_id"": ""a5b06c0c133a4ff7"", ""trace_id"": ""6fb226896d7947f9800d4fff3c2be404""}, ""parent_id"": null, ""start_time"": null, ""end_time"": null, ""status_code"": ""OK"", ""status_message"": """", ""attributes"": {""mlflow.traceRequestId"": ""\""6fb226896d7947f9800d4fff3c2be404\"""", ""mlflow.spanInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""La meva idea \\u00e9s viable? com avaluar una idea de negoci\""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.spanOutputs"": ""\""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'La meva idea \\u00e9s viable? com avaluar una idea de negoci'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. 
Please check the deployment URL is set correctly and the input payload is valid.\\\\n\\\\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \\\\'Restarting model process. This is likely caused by a request timeout or OOM.\\\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {\\\""error_code\\\"": \\\""BAD_REQUEST\\\"", \\\""message\\\"": \\\""Encountered an unexpected error while parsing the input data. Error \\\\'Restarting model process. This is likely caused by a request timeout or OOM.\\\\'\\\""}\\\\n\\\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\\\n\\\\n- Input payload: {\\\\'messages\\\\': [{\\\\'role\\\\': \\\\'user\\\\', \\\\'content\\\\': \\\\'La meva idea \\u00e9s viable? com avaluar una idea de negoci\\\\'}], \\\\'databricks_options\\\\': {\\\\'return_trace\\\\': True}}')\""""}, ""events"": []}], ""request"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""La meva idea \\u00e9s viable? com avaluar una idea de negoci\""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""response"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'La meva idea \u00e9s viable? com avaluar una idea de negoci'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\\n\\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \\'Restarting model process. This is likely caused by a request timeout or OOM.\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. 
Response text: {\""error_code\"": \""BAD_REQUEST\"", \""message\"": \""Encountered an unexpected error while parsing the input data. Error \\'Restarting model process. This is likely caused by a request timeout or OOM.\\'\""}\\n\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\n\\n- Input payload: {\\'messages\\': [{\\'role\\': \\'user\\', \\'content\\': \\'La meva idea \u00e9s viable? com avaluar una idea de negoci\\'}], \\'databricks_options\\': {\\'return_trace\\': True}}')""}}",List(),"Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'La meva idea és viable? com avaluar una idea de negoci'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\n\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \'Restarting model process. This is likely caused by a request timeout or OOM.\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {""error_code"": ""BAD_REQUEST"", ""message"": ""Encountered an unexpected error while parsing the input data. Error \'Restarting model process. This is likely caused by a request timeout or OOM.\'""}\n\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\n\n- Input payload: {\'messages\': [{\'role\': \'user\', \'content\': \'La meva idea és viable? com avaluar una idea de negoci\'}], \'databricks_options\': {\'return_trace\': True}}')"
facefe488ec57b335d2bb10a106e2096cdb15d2e82685d6cc06b882672a1ed6a,"List(List(List(D’on puc treure el finançament per crear la meva startup? , user)))","{""info"": {""request_id"": ""tr-4d90db0b4552487586d90c7c1b8417f5"", ""experiment_id"": ""3845882404188464"", ""timestamp_ms"": 1747391210369, ""execution_time_ms"": 0, ""status"": ""ERROR"", ""request_metadata"": {""mlflow.sourceRun"": ""e08fca067d984d29879a949d85599795"", ""mlflow.traceInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""D\\u2019on puc treure el finan\\u00e7ament per crear la meva startup? \""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.traceOutputs"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'D\u2019on puc treure el finan\u00e7ament per crear la meva startup? '}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please che""}, ""tags"": {""mlflow.user"": ""aliciachimeno_ext@gencat.cat"", ""mlflow.artifactLocation"": ""dbfs:/databricks/mlflow-tracking/3845882404188464/tr-4d90db0b4552487586d90c7c1b8417f5/artifacts""}, ""assessments"": []}, ""data"": {""spans"": [{""name"": ""root_span"", ""context"": {""span_id"": ""9d9d73d3fa9a4c78"", ""trace_id"": ""2437f6ce47124fe6bc44f7463e056a3d""}, ""parent_id"": null, ""start_time"": null, ""end_time"": null, ""status_code"": ""OK"", ""status_message"": """", ""attributes"": {""mlflow.traceRequestId"": ""\""2437f6ce47124fe6bc44f7463e056a3d\"""", ""mlflow.spanInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""D\\u2019on puc treure el finan\\u00e7ament per crear la meva startup? \""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.spanOutputs"": ""\""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'D\\u2019on puc treure el finan\\u00e7ament per crear la meva startup? '}], 'databricks_options': {'return_trace': True}}. 
MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\\\\n\\\\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \\\\'Restarting model process. This is likely caused by a request timeout or OOM.\\\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {\\\""error_code\\\"": \\\""BAD_REQUEST\\\"", \\\""message\\\"": \\\""Encountered an unexpected error while parsing the input data. Error \\\\'Restarting model process. This is likely caused by a request timeout or OOM.\\\\'\\\""}\\\\n\\\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\\\n\\\\n- Input payload: {\\\\'messages\\\\': [{\\\\'role\\\\': \\\\'user\\\\', \\\\'content\\\\': \\\\'D\\u2019on puc treure el finan\\u00e7ament per crear la meva startup? \\\\'}], \\\\'databricks_options\\\\': {\\\\'return_trace\\\\': True}}')\""""}, ""events"": []}], ""request"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""D\\u2019on puc treure el finan\\u00e7ament per crear la meva startup? \""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""response"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'D\u2019on puc treure el finan\u00e7ament per crear la meva startup? '}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\\n\\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \\'Restarting model process. This is likely caused by a request timeout or OOM.\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. 
Response text: {\""error_code\"": \""BAD_REQUEST\"", \""message\"": \""Encountered an unexpected error while parsing the input data. Error \\'Restarting model process. This is likely caused by a request timeout or OOM.\\'\""}\\n\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\n\\n- Input payload: {\\'messages\\': [{\\'role\\': \\'user\\', \\'content\\': \\'D\u2019on puc treure el finan\u00e7ament per crear la meva startup? \\'}], \\'databricks_options\\': {\\'return_trace\\': True}}')""}}",List(),"Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'D’on puc treure el finançament per crear la meva startup? '}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\n\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \'Restarting model process. This is likely caused by a request timeout or OOM.\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {""error_code"": ""BAD_REQUEST"", ""message"": ""Encountered an unexpected error while parsing the input data. Error \'Restarting model process. This is likely caused by a request timeout or OOM.\'""}\n\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\n\n- Input payload: {\'messages\': [{\'role\': \'user\', \'content\': \'D’on puc treure el finançament per crear la meva startup? \'}], \'databricks_options\': {\'return_trace\': True}}')"
d43359eb13d400382b59e7dde8b6e9b1de820265267a86d11d92a67cdcd5be54,"List(List(List( Quines opcions jurídiques tinc per crear una Startup?, user)))","{""info"": {""request_id"": ""tr-108fe71a347e415bbcc3f0a990b22315"", ""experiment_id"": ""3845882404188464"", ""timestamp_ms"": 1747391332813, ""execution_time_ms"": 0, ""status"": ""ERROR"", ""request_metadata"": {""mlflow.sourceRun"": ""e08fca067d984d29879a949d85599795"", ""mlflow.traceInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \"" Quines opcions jur\\u00eddiques tinc per crear una Startup?\""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.traceOutputs"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': ' Quines opcions jur\u00eddiques tinc per crear una Startup?'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check t""}, ""tags"": {""mlflow.user"": ""aliciachimeno_ext@gencat.cat"", ""mlflow.artifactLocation"": ""dbfs:/databricks/mlflow-tracking/3845882404188464/tr-108fe71a347e415bbcc3f0a990b22315/artifacts""}, ""assessments"": []}, ""data"": {""spans"": [{""name"": ""root_span"", ""context"": {""span_id"": ""510765925bba4431"", ""trace_id"": ""7e5859fd3bb243aa95e7e5ea94fe5a93""}, ""parent_id"": null, ""start_time"": null, ""end_time"": null, ""status_code"": ""OK"", ""status_message"": """", ""attributes"": {""mlflow.traceRequestId"": ""\""7e5859fd3bb243aa95e7e5ea94fe5a93\"""", ""mlflow.spanInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \"" Quines opcions jur\\u00eddiques tinc per crear una Startup?\""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.spanOutputs"": ""\""Fail to invoke the model with {'messages': [{'role': 'user', 'content': ' Quines opcions jur\\u00eddiques tinc per crear una Startup?'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. 
Please check the deployment URL is set correctly and the input payload is valid.\\\\n\\\\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \\\\'Restarting model process. This is likely caused by a request timeout or OOM.\\\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {\\\""error_code\\\"": \\\""BAD_REQUEST\\\"", \\\""message\\\"": \\\""Encountered an unexpected error while parsing the input data. Error \\\\'Restarting model process. This is likely caused by a request timeout or OOM.\\\\'\\\""}\\\\n\\\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\\\n\\\\n- Input payload: {\\\\'messages\\\\': [{\\\\'role\\\\': \\\\'user\\\\', \\\\'content\\\\': \\\\' Quines opcions jur\\u00eddiques tinc per crear una Startup?\\\\'}], \\\\'databricks_options\\\\': {\\\\'return_trace\\\\': True}}')\""""}, ""events"": []}], ""request"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \"" Quines opcions jur\\u00eddiques tinc per crear una Startup?\""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""response"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': ' Quines opcions jur\u00eddiques tinc per crear una Startup?'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\\n\\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \\'Restarting model process. This is likely caused by a request timeout or OOM.\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. 
Response text: {\""error_code\"": \""BAD_REQUEST\"", \""message\"": \""Encountered an unexpected error while parsing the input data. Error \\'Restarting model process. This is likely caused by a request timeout or OOM.\\'\""}\\n\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\n\\n- Input payload: {\\'messages\\': [{\\'role\\': \\'user\\', \\'content\\': \\' Quines opcions jur\u00eddiques tinc per crear una Startup?\\'}], \\'databricks_options\\': {\\'return_trace\\': True}}')""}}",List(),"Fail to invoke the model with {'messages': [{'role': 'user', 'content': ' Quines opcions jurídiques tinc per crear una Startup?'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\n\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \'Restarting model process. This is likely caused by a request timeout or OOM.\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {""error_code"": ""BAD_REQUEST"", ""message"": ""Encountered an unexpected error while parsing the input data. Error \'Restarting model process. This is likely caused by a request timeout or OOM.\'""}\n\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\n\n- Input payload: {\'messages\': [{\'role\': \'user\', \'content\': \' Quines opcions jurídiques tinc per crear una Startup?\'}], \'databricks_options\': {\'return_trace\': True}}')"
c16560630777750babec33ef58e50457c3e1541eea7eb9bf801ad52d5d3f1225,"List(List(List(Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. , user)))","{""info"": {""request_id"": ""tr-1bb3971fad054821b8df4184a982c92e"", ""experiment_id"": ""3845882404188464"", ""timestamp_ms"": 1747391335338, ""execution_time_ms"": 0, ""status"": ""ERROR"", ""request_metadata"": {""mlflow.sourceRun"": ""e08fca067d984d29879a949d85599795"", ""mlflow.traceInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. \""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.traceOutputs"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': \""Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. \""}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment ""}, ""tags"": {""mlflow.user"": ""aliciachimeno_ext@gencat.cat"", ""mlflow.artifactLocation"": ""dbfs:/databricks/mlflow-tracking/3845882404188464/tr-1bb3971fad054821b8df4184a982c92e/artifacts""}, ""assessments"": []}, ""data"": {""spans"": [{""name"": ""root_span"", ""context"": {""span_id"": ""19b9ac1ec26144a1"", ""trace_id"": ""5e34c34067d24f3aa1048a4d377064ee""}, ""parent_id"": null, ""start_time"": null, ""end_time"": null, ""status_code"": ""OK"", ""status_message"": """", ""attributes"": {""mlflow.traceRequestId"": ""\""5e34c34067d24f3aa1048a4d377064ee\"""", ""mlflow.spanInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. \""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.spanOutputs"": ""\""Fail to invoke the model with {'messages': [{'role': 'user', 'content': \\\""Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. \\\""}], 'databricks_options': {'return_trace': True}}. 
MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\\\\n\\\\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \\\\'Restarting model process. This is likely caused by a request timeout or OOM.\\\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {\\\""error_code\\\"": \\\""BAD_REQUEST\\\"", \\\""message\\\"": \\\""Encountered an unexpected error while parsing the input data. Error \\\\'Restarting model process. This is likely caused by a request timeout or OOM.\\\\'\\\""}\\\\n\\\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\\\n\\\\n- Input payload: {\\\\'messages\\\\': [{\\\\'role\\\\': \\\\'user\\\\', \\\\'content\\\\': \\\""Vull saber si tinc actituds d\\\\'emprenedor i explorar el meu perfil emprenedor. \\\""}], \\\\'databricks_options\\\\': {\\\\'return_trace\\\\': True}}')\""""}, ""events"": []}], ""request"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. \""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""response"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': \""Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. \""}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\\n\\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \\'Restarting model process. 
This is likely caused by a request timeout or OOM.\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {\""error_code\"": \""BAD_REQUEST\"", \""message\"": \""Encountered an unexpected error while parsing the input data. Error \\'Restarting model process. This is likely caused by a request timeout or OOM.\\'\""}\\n\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\n\\n- Input payload: {\\'messages\\': [{\\'role\\': \\'user\\', \\'content\\': \""Vull saber si tinc actituds d\\'emprenedor i explorar el meu perfil emprenedor. \""}], \\'databricks_options\\': {\\'return_trace\\': True}}')""}}",List(),"Fail to invoke the model with {'messages': [{'role': 'user', 'content': ""Vull saber si tinc actituds d'emprenedor i explorar el meu perfil emprenedor. ""}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\n\n- Error: 400 Client Error: Encountered an unexpected error while parsing the input data. Error \'Restarting model process. This is likely caused by a request timeout or OOM.\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {""error_code"": ""BAD_REQUEST"", ""message"": ""Encountered an unexpected error while parsing the input data. Error \'Restarting model process. This is likely caused by a request timeout or OOM.\'""}\n\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\n\n- Input payload: {\'messages\': [{\'role\': \'user\', \'content\': ""Vull saber si tinc actituds d\'emprenedor i explorar el meu perfil emprenedor. ""}], \'databricks_options\': {\'return_trace\': True}}')"

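Every row in the output above failed with the same endpoint error, but that is hard to see inside the nested trace. The sketch below is one way to pull the root cause out of the `model_error_message` column and count rows per cause. It assumes the results table is a pandas DataFrame and that the cause follows the `Error '...' for url` pattern visible in the output; `summarize_errors` is a hypothetical helper, not part of `mlflow` or `databricks-agents`.

```python
import re
import pandas as pd

# Matches the cause reported by the serving endpoint, e.g.
#   Error \'Restarting model process. ...\' for url: https://...
# The quotes may be backslash-escaped in the raw message, hence the optional backslash.
_CAUSE = re.compile(r"Error \\?'(.*?)\\?' for url", re.DOTALL)


def summarize_errors(results: pd.DataFrame) -> pd.Series:
    """Count rows per root cause found in `model_error_message` (hypothetical helper)."""

    def cause(msg):
        # Rows that succeeded have an empty or missing error message.
        if not isinstance(msg, str) or not msg:
            return "no error"
        match = _CAUSE.search(msg)
        return match.group(1) if match else "unparsed error"

    return results["model_error_message"].map(cause).value_counts()
```

Assuming `eval_results.tables['eval_results']` is a pandas DataFrame, calling `summarize_errors(eval_results.tables['eval_results'])` after an evaluation run would show at a glance whether the failures share one cause (here, the timeout/OOM restart of the model process) before inspecting individual traces.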

In [0]:
## llm: claude

with mlflow.start_run(run_name="agent-documentacio-claude-preguntes-personalitzades-2"):
    eval_results = mlflow.evaluate(
        "endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude",  # Your serving endpoint goes here
        data=eval_set_documnetation[5:10],  # Your evaluation dataset (rows 5 to 9)
        model_type="databricks-agent",  # Enables Mosaic AI Agent evaluation
    )

display(eval_results.tables['eval_results'])


request_id,request,trace,tool_calls,model_error_message,response,retrieved_context,response/overall_assessment/rating,response/llm_judged/safety/rating,response/llm_judged/safety/rationale,response/llm_judged/relevance_to_query/rating,response/llm_judged/relevance_to_query/rationale,retrieval/llm_judged/chunk_relevance/ratings,retrieval/llm_judged/chunk_relevance/rationales,response/llm_judged/groundedness/rating,response/llm_judged/groundedness/rationale,agent/latency_seconds,retrieval/llm_judged/chunk_relevance/precision,response/overall_assessment/rationale
46b0277fdf1c04400986863bd0fafdcfb287b541ffc7dbb8525e0243ffb9374f,"List(List(List(Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jurídica em recomaneu si soc només jo per començar i vull que sigui senzill?, user)))","{""info"": {""request_id"": ""tr-220f0a84302248b2a607b5fc0d1361ad"", ""experiment_id"": ""3845882404188464"", ""timestamp_ms"": 1747394046664, ""execution_time_ms"": 0, ""status"": ""ERROR"", ""request_metadata"": {""mlflow.sourceRun"": ""7fe8f058079d44dbb6cdb8611e46336b"", ""mlflow.traceInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jur\\u00eddica em recomaneu si soc nom\\u00e9s jo per comen\\u00e7ar i vull que sigui senzill?\""}], \""databricks_options\"": {\""retur"", ""mlflow.traceOutputs"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jur\u00eddica em recomaneu si soc nom\u00e9s jo per comen\u00e7ar i vull que sigui senzill?'}], 'databricks_op""}, ""tags"": {""mlflow.user"": ""aliciachimeno_ext@gencat.cat"", ""mlflow.artifactLocation"": ""dbfs:/databricks/mlflow-tracking/3845882404188464/tr-220f0a84302248b2a607b5fc0d1361ad/artifacts""}, ""assessments"": []}, ""data"": {""spans"": [{""name"": ""root_span"", ""context"": {""span_id"": ""9ce44a2ef35d4d05"", ""trace_id"": ""e41e3876701d4c1e8cc1c1d3065200e6""}, ""parent_id"": null, ""start_time"": null, ""end_time"": null, ""status_code"": ""OK"", ""status_message"": """", ""attributes"": {""mlflow.traceRequestId"": ""\""e41e3876701d4c1e8cc1c1d3065200e6\"""", ""mlflow.spanInputs"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""Vull obrir un negoci local, com una botiga o un servei de proximitat. 
Quina forma jur\\u00eddica em recomaneu si soc nom\\u00e9s jo per comen\\u00e7ar i vull que sigui senzill?\""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""mlflow.spanOutputs"": ""\""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jur\\u00eddica em recomaneu si soc nom\\u00e9s jo per comen\\u00e7ar i vull que sigui senzill?'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\\\\n\\\\n- Error: 400 Client Error: Encountered an unexpected error while evaluating the model. Verify that the input is compatible with the model for inference. Error \\\\'1 validation error for ChatAgentMessage\\\\n Value error, Either \\\\'content\\\\' or \\\\'tool_calls\\\\' must be provided. [type=value_error, input_value={\\\\'role\\\\': \\\\'assistant\\\\', \\\\'co...449e-8b32-36268b37d4ce\\\\'}, input_type=dict]\\\\n For further information visit https://errors.pydantic.dev/2.11/v/value_error\\\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {\\\""error_code\\\"": \\\""BAD_REQUEST\\\"", \\\""message\\\"": \\\""Encountered an unexpected error while evaluating the model. Verify that the input is compatible with the model for inference. Error \\\\'1 validation error for ChatAgentMessage\\\\\\\\n Value error, Either \\\\'content\\\\' or \\\\'tool_calls\\\\' must be provided. 
[type=value_error, input_value={\\\\'role\\\\': \\\\'assistant\\\\', \\\\'co...449e-8b32-36268b37d4ce\\\\'}, input_type=dict]\\\\\\\\n For further information visit https://errors.pydantic.dev/2.11/v/value_error\\\\'\\\""}\\\\n\\\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\\\n\\\\n- Input payload: {\\\\'messages\\\\': [{\\\\'role\\\\': \\\\'user\\\\', \\\\'content\\\\': \\\\'Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jur\\u00eddica em recomaneu si soc nom\\u00e9s jo per comen\\u00e7ar i vull que sigui senzill?\\\\'}], \\\\'databricks_options\\\\': {\\\\'return_trace\\\\': True}}')\""""}, ""events"": []}], ""request"": ""{\""messages\"": [{\""role\"": \""user\"", \""content\"": \""Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jur\\u00eddica em recomaneu si soc nom\\u00e9s jo per comen\\u00e7ar i vull que sigui senzill?\""}], \""databricks_options\"": {\""return_trace\"": true}}"", ""response"": ""Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jur\u00eddica em recomaneu si soc nom\u00e9s jo per comen\u00e7ar i vull que sigui senzill?'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\\n\\n- Error: 400 Client Error: Encountered an unexpected error while evaluating the model. Verify that the input is compatible with the model for inference. Error \\'1 validation error for ChatAgentMessage\\n Value error, Either \\'content\\' or \\'tool_calls\\' must be provided. 
[type=value_error, input_value={\\'role\\': \\'assistant\\', \\'co...449e-8b32-36268b37d4ce\\'}, input_type=dict]\\n For further information visit https://errors.pydantic.dev/2.11/v/value_error\\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {\""error_code\"": \""BAD_REQUEST\"", \""message\"": \""Encountered an unexpected error while evaluating the model. Verify that the input is compatible with the model for inference. Error \\'1 validation error for ChatAgentMessage\\\\n Value error, Either \\'content\\' or \\'tool_calls\\' must be provided. [type=value_error, input_value={\\'role\\': \\'assistant\\', \\'co...449e-8b32-36268b37d4ce\\'}, input_type=dict]\\\\n For further information visit https://errors.pydantic.dev/2.11/v/value_error\\'\""}\\n\\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\\n\\n- Input payload: {\\'messages\\': [{\\'role\\': \\'user\\', \\'content\\': \\'Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jur\u00eddica em recomaneu si soc nom\u00e9s jo per comen\u00e7ar i vull que sigui senzill?\\'}], \\'databricks_options\\': {\\'return_trace\\': True}}')""}}",List(),"Fail to invoke the model with {'messages': [{'role': 'user', 'content': 'Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jurídica em recomaneu si soc només jo per començar i vull que sigui senzill?'}], 'databricks_options': {'return_trace': True}}. MlflowException('Failed to call the deployment endpoint. Please check the deployment URL is set correctly and the input payload is valid.\n\n- Error: 400 Client Error: Encountered an unexpected error while evaluating the model. Verify that the input is compatible with the model for inference. Error \'1 validation error for ChatAgentMessage\n Value error, Either \'content\' or \'tool_calls\' must be provided. 
[type=value_error, input_value={\'role\': \'assistant\', \'co...449e-8b32-36268b37d4ce\'}, input_type=dict]\n For further information visit https://errors.pydantic.dev/2.11/v/value_error\' for url: https://adb-2869758279805397.17.azuredatabricks.net/serving-endpoints/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude/invocations. Response text: {""error_code"": ""BAD_REQUEST"", ""message"": ""Encountered an unexpected error while evaluating the model. Verify that the input is compatible with the model for inference. Error \'1 validation error for ChatAgentMessage\\n Value error, Either \'content\' or \'tool_calls\' must be provided. [type=value_error, input_value={\'role\': \'assistant\', \'co...449e-8b32-36268b37d4ce\'}, input_type=dict]\\n For further information visit https://errors.pydantic.dev/2.11/v/value_error\'""}\n\n- Deployment URI: agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude\n\n- Input payload: {\'messages\': [{\'role\': \'user\', \'content\': \'Vull obrir un negoci local, com una botiga o un servei de proximitat. Quina forma jurídica em recomaneu si soc només jo per començar i vull que sigui senzill?\'}], \'databricks_options\': {\'return_trace\': True}}')",,,,,,,,,,,,,,
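
The 400 response above traces back to the `ChatAgentMessage` validation: an assistant message with neither `content` nor `tool_calls` appeared somewhere in the agent's message flow. Below is a minimal sketch of a check that applies the same rule to plain dictionaries, so the offending message can be located before calling the endpoint. The helper name `find_invalid_messages` and the sample messages are hypothetical and not part of MLflow. Treating an empty string as missing content is an assumption about how strict the real validator is.

```python
from typing import Any


def find_invalid_messages(messages: list[dict[str, Any]]) -> list[int]:
    """Return the indices of messages that have neither content nor tool_calls.

    Assumption: empty strings and empty lists count as "not provided".
    """
    invalid = []
    for index, message in enumerate(messages):
        has_content = bool(message.get("content"))
        has_tool_calls = bool(message.get("tool_calls"))
        if not (has_content or has_tool_calls):
            invalid.append(index)
    return invalid


# Hypothetical conversation: the second message reproduces the failing shape.
messages = [
    {"role": "user", "content": "Vull obrir un negoci local."},
    {"role": "assistant", "content": None, "id": "hypothetical-id"},
    {"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]},
]

print(find_invalid_messages(messages))
```

Running a check like this on the messages each sub-agent returns may help narrow down which node of the multi-agent graph emits the empty assistant message.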


In [0]:
## llm: claude

with mlflow.start_run(run_name="agent-documentacio-claude-preguntes-personalitzades-3"):
    eval_results = mlflow.evaluate(
        "endpoints:/agents_dts_proves_pre-startups_list-multi-agent-chatbot-claude",  # Serving endpoint of the agent to evaluate
        data=eval_set_documnetation[10:],  # Evaluation dataset (rows from index 10 onward)
        model_type="databricks-agent",  # Enables Mosaic AI Agent Evaluation
    )

display(eval_results.tables['eval_results'])
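
Rows whose endpoint call failed carry an error string in `response` (in the earlier run it starts with `Fail to invoke the model`), so judge ratings on those rows say nothing about answer quality. Below is a minimal sketch for separating such rows before reading the metrics, assuming each row is a plain dictionary with a `response` key. The function `split_failed_rows`, the prefix constant, and the sample rows are illustrative and not part of MLflow.

```python
FAILURE_PREFIX = "Fail to invoke the model"  # prefix seen in the failed response above


def split_failed_rows(rows):
    """Split evaluation rows into (ok, failed) based on the response prefix."""
    ok, failed = [], []
    for row in rows:
        # A missing or None response is treated as an empty string.
        response = str(row.get("response") or "")
        if response.startswith(FAILURE_PREFIX):
            failed.append(row)
        else:
            ok.append(row)
    return ok, failed


# Hypothetical rows standing in for the evaluation results table.
rows = [
    {"request_id": "a", "response": "Fail to invoke the model with {...}"},
    {"request_id": "b", "response": "hypothetical answer"},
    {"request_id": "c", "response": None},
]

ok, failed = split_failed_rows(rows)
print([row["request_id"] for row in failed])
```

If the results table is a pandas DataFrame, the rows could be obtained with `eval_results.tables['eval_results'].to_dict('records')`.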


request_id,request,response,retrieved_context,trace,tool_calls,response/overall_assessment/rating,response/llm_judged/safety/rating,response/llm_judged/safety/rationale,retrieval/llm_judged/chunk_relevance/ratings,retrieval/llm_judged/chunk_relevance/rationales,response/llm_judged/relevance_to_query/rating,response/llm_judged/relevance_to_query/rationale,response/llm_judged/groundedness/rating,response/llm_judged/groundedness/rationale,agent/latency_seconds,retrieval/llm_judged/chunk_relevance/precision
