In [17]:
import pandas as pd
from tqdm.auto import tqdm
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import SystemMessage
from langchain.chat_models import AzureChatOpenAI
import os

pd.set_option("display.max_colwidth", None)

In [36]:
EVALUATION_PROMPT = """###Task Description:
An instruction (might include an Input inside it), a response to evaluate, the retrieved information, a reference answer that gets a score of 10, and a score rubric representing the evaluation criteria are given.
1. Write a detailed feedback that assesses the quality of the response strictly based on the given score rubric, not evaluating in general.
2. After writing feedback, write a score that is an integer between 1 and 10. You should refer to the score rubric.
3. The output format should look as follows: "Feedback: {{write a feedback for criteria}} [RESULT] {{an integer number between 1 and 10}}"
4. Please do not generate any other opening, closing, and explanations. Be sure to include [RESULT] in your output.

###The instruction to evaluate:
{instruction}

###Response to evaluate:
{response}

###Retrieved Information:
{retrieved_information}

###Reference Answer (Score 10):
{reference_answer}

###Score Rubrics:
[Is the response correct, accurate, and factual based on the reference answer?]
Score 1: The response is completely incorrect, inaccurate, and/or not factual.
Score 2-3: The response has many inaccuracies and lacks factual correctness.
Score 4-5: The response is somewhat correct, accurate, and/or factual.
Score 6-7: The response is mostly correct, accurate, and factual.
Score 8-9: The response is very accurate, correct, and factual with minor errors.
Score 10: The response is completely correct, accurate, and factual with no errors.

###Feedback:"""


evaluation_prompt_template = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content="You are a fair evaluator language model."),
        HumanMessagePromptTemplate.from_template(EVALUATION_PROMPT),
    ]
)
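
The prompt above asks the judge to end its reply with `[RESULT]` followed by an integer between 1 and 10. A minimal sketch of how such a reply could be split into feedback and score is given here. `parse_evaluation` is a hypothetical helper (not part of LangChain), and returning `None` for a missing or out-of-range score is an assumption about how malformed replies should be handled.

```python
import re
from typing import Optional, Tuple

# Matches the "[RESULT] <int>" marker requested by EVALUATION_PROMPT.
_RESULT_RE = re.compile(r"\[RESULT\]\s*(\d+)")


def parse_evaluation(text: str) -> Tuple[str, Optional[int]]:
    """Split a judge reply into (feedback, score).

    The score is None when the marker is missing or the number
    falls outside the 1-10 range used by the rubric.
    """
    matches = list(_RESULT_RE.finditer(text))
    if not matches:
        return text.strip(), None
    last = matches[-1]  # use the final marker if the reply repeats it
    feedback = text[: last.start()].strip()
    if feedback.startswith("Feedback:"):
        feedback = feedback[len("Feedback:"):].strip()
    score = int(last.group(1))
    if not 1 <= score <= 10:
        return feedback, None
    return feedback, score


feedback, score = parse_evaluation("Feedback: Mostly right. [RESULT] 7")
print(score, feedback)
```

Keeping the unparsed reply alongside the score may help when auditing rows where the judge ignored the requested format.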

In [37]:
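# Placeholders: replace with your Azure OpenAI key and endpoint before running.
os.environ['OPENAI_API_KEY'] = 'OPENAI_API_KEY'
os.environ['OPENAI_API_TYPE'] = 'azure'
os.environ['OPENAI_API_VERSION'] = '2023-03-15-preview'
os.environ['OPENAI_API_BASE'] = 'OPENAI_API_BASE'

# Judge model; the deployment and model names are placeholders as well.
eval_chat_model = AzureChatOpenAI(
    deployment_name="deployment_name",
    model_name="gpt-model_name-32k",
)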

evaluator_name = "GPT4"



In [49]:
results = pd.read_excel("llm-QA-results.xlsx")

In [52]:
results.head(5)

Unnamed: 0,llm-model,question,contexts,answer,ground_truth
0,Llama-2-13b-chat-hf,"What are the fundamental differences between deductive and inductive reasoning in the context of Data Science, as outlined in the guide?","['(hypothesis-based) and inductive (pattern-based) reasoning. /T_his is \na fundamental change from traditional analytic approaches. Inductive \nreasoning and exploratory data analysis provide a means to form or \nre/f_ine hypotheses and discover new analytic paths. In fact, to do the \ndiscovery of signi/f_icant insights that are the hallmark of Data Science, \nyou must have the tradecraft and the interplay between inductive \nand deductive reasoning. By actively combining the ability to reason \ndeductively and inductively, Data Science creates an environment \nwhere models of reality no longer need to be static and empirically \nbased. Instead, they are constantly tested, updated and improved until \nbetter models are found. /T_hese concepts are summarized in the /f_igure, \n/T_he Types of Reason and /T_heir Role in Data Science Tradecraft .\nTHE TYPES OF REASON…\nDEDUCTIVE REASONING:\n › Commonly associated \nwith “formal logic.”\n › Involves reasoning from known \npremises, or premises presumed', '(hypothesis-based) and inductive (pattern-based) reasoning. /T_his is \na fundamental change from traditional analytic approaches. Inductive \nreasoning and exploratory data analysis provide a means to form or \nre/f_ine hypotheses and discover new analytic paths. In fact, to do the \ndiscovery of signi/f_icant insights that are the hallmark of Data Science, \nyou must have the tradecraft and the interplay between inductive \nand deductive reasoning. By actively combining the ability to reason \ndeductively and inductively, Data Science creates an environment \nwhere models of reality no longer need to be static and empirically \nbased. Instead, they are constantly tested, updated and improved until \nbetter models are found. 
/T_hese concepts are summarized in the /f_igure, \n/T_he Types of Reason and /T_heir Role in Data Science Tradecraft .\nTHE TYPES OF REASON…\nDEDUCTIVE REASONING:\n › Commonly associated \nwith “formal logic.”\n › Involves reasoning from known \npremises, or premises presumed', '(hypothesis-based) and inductive (pattern-based) reasoning. /T_his is \na fundamental change from traditional analytic approaches. Inductive \nreasoning and exploratory data analysis provide a means to form or \nre/f_ine hypotheses and discover new analytic paths. In fact, to do the \ndiscovery of signi/f_icant insights that are the hallmark of Data Science, \nyou must have the tradecraft and the interplay between inductive \nand deductive reasoning. By actively combining the ability to reason \ndeductively and inductively, Data Science creates an environment \nwhere models of reality no longer need to be static and empirically \nbased. Instead, they are constantly tested, updated and improved until \nbetter models are found. 
/T_hese concepts are summarized in the /f_igure, \n/T_he Types of Reason and /T_heir Role in Data Science Tradecraft .\nTHE TYPES OF REASON…\nDEDUCTIVE REASONING:\n › Commonly associated \nwith “formal logic.”\n › Involves reasoning from known \npremises, or premises presumed', 'premises, or premises presumed \nto be true, to a certain conclusion.\n › The conclusions reached are \ncertain, inevitable, inescapable.INDUCTIVE REASONING\n › Commonly known as “informal \nlogic,” or “everyday argument.”\n › Involves drawing uncertain \ninferences, based on \nprobabilistic reasoning.\n › The conclusions reached \nare probable, reasonable, \nplausible, believable.\n…AND THEIR ROLE IN DATA SCIENCE TRADECRAFT.\nDEDUCTIVE REASONING:\n › Formulate hypotheses about \nrelationships and underlying models.\n › Carry out experiments with the data \nto test hypotheses and models.INDUCTIVE REASONING\n › Exploratory data analysis to \ndiscover or reﬁne hypotheses.\n › Discover new relationships, insights \nand analytic paths from the data.\nThe Types of Reason and Their Role in Data Science Tradecraft\nTHE FIELD GUIDE to DATA SCIENCE\n Source: Booz Allen Hamilton \n19 Start Here for the Basics 19 Start Here for the Basics\n/T_he diﬀerences between Data Science and traditional analytic', 'premises, or premises presumed \nto be true, to a certain conclusion.\n › The conclusions reached are \ncertain, inevitable, inescapable.INDUCTIVE REASONING\n › Commonly known as “informal \nlogic,” or “everyday argument.”\n › Involves drawing uncertain \ninferences, based on \nprobabilistic reasoning.\n › The conclusions reached \nare probable, reasonable, \nplausible, believable.\n…AND THEIR ROLE IN DATA SCIENCE TRADECRAFT.\nDEDUCTIVE REASONING:\n › Formulate hypotheses about \nrelationships and underlying models.\n › Carry out experiments with the data \nto test hypotheses and models.INDUCTIVE REASONING\n › Exploratory data analysis to \ndiscover or reﬁne hypotheses.\n › Discover new relationships, 
insights \nand analytic paths from the data.\nThe Types of Reason and Their Role in Data Science Tradecraft\nTHE FIELD GUIDE to DATA SCIENCE\n Source: Booz Allen Hamilton \n19 Start Here for the Basics 19 Start Here for the Basics\n/T_he diﬀerences between Data Science and traditional analytic']","The fundamental differences between deductive and inductive reasoning in the context of Data Science are:\n * Deductive reasoning involves reasoning from known premises to a certain conclusion, while inductive reasoning involves drawing uncertain inferences based on probabilistic reasoning.\n * Data Science combines both deductive and inductive reasoning to create an environment where models of reality are constantly tested, updated, and improved until better models are found.\n * Data Science emphasizes the use of inductive reasoning to discover new relationships and insights from the data, while deductive reasoning is used to formulate hypotheses and carry out experiments to test those hypotheses.","deductive reasoning begins with hypotheses and tests them against data to reach a certain conclusion, reflecting a structured, logical process. Inductive reasoning, conversely, starts with observations and data to generate hypotheses, favoring discovery and insights. This interplay enhances model adaptability and reflects a significant shift from traditional analytic methods."
1,Llama-2-13b-chat-hf,How does the guide describe the transformation of data into actionable insights through the creation of data products?,"['TheSHORT \n VERSION\n\x03›Data Science is the art of turning data into actions. \nIt’s all about the tradecraft. Tradecraft is the process, tools and \ntechnologies for humans and computers to work together to \ntransform data into insights.\n\x03›Data Science tradecraft creates data products . \nData products provide actionable information without exposing \ndecision makers to the underlying data or analytics (e.g., buy/sell \nstrategies for /f_inancial instruments, a set of actions to improve \nproduct yield, or steps to improve product marketing).\n\x03›Data Science supports and encourages shifting between \ndeductive (hypothesis-based) and inductive (pattern-\nbased) reasoning. \n/T_his is a fundamental change from traditional analysis approaches. \nInductive reasoning and exploratory data analysis provide a means \nto form or re/f_ine hypotheses and discover new analytic paths. \nModels of reality no longer need to be static. /T_hey are constantly', 'TheSHORT \n VERSION\n\x03›Data Science is the art of turning data into actions. \nIt’s all about the tradecraft. Tradecraft is the process, tools and \ntechnologies for humans and computers to work together to \ntransform data into insights.\n\x03›Data Science tradecraft creates data products . \nData products provide actionable information without exposing \ndecision makers to the underlying data or analytics (e.g., buy/sell \nstrategies for /f_inancial instruments, a set of actions to improve \nproduct yield, or steps to improve product marketing).\n\x03›Data Science supports and encourages shifting between \ndeductive (hypothesis-based) and inductive (pattern-\nbased) reasoning. \n/T_his is a fundamental change from traditional analysis approaches. 
\nInductive reasoning and exploratory data analysis provide a means \nto form or re/f_ine hypotheses and discover new analytic paths. \nModels of reality no longer need to be static. /T_hey are constantly', 'TheSHORT \n VERSION\n\x03›Data Science is the art of turning data into actions. \nIt’s all about the tradecraft. Tradecraft is the process, tools and \ntechnologies for humans and computers to work together to \ntransform data into insights.\n\x03›Data Science tradecraft creates data products . \nData products provide actionable information without exposing \ndecision makers to the underlying data or analytics (e.g., buy/sell \nstrategies for /f_inancial instruments, a set of actions to improve \nproduct yield, or steps to improve product marketing).\n\x03›Data Science supports and encourages shifting between \ndeductive (hypothesis-based) and inductive (pattern-\nbased) reasoning. \n/T_his is a fundamental change from traditional analysis approaches. \nInductive reasoning and exploratory data analysis provide a means \nto form or re/f_ine hypotheses and discover new analytic paths. \nModels of reality no longer need to be static. /T_hey are constantly', 'Organizations often make decisions based on inexact data. Data \nstovepipes mean that organizations may have blind spots. /T_hey are \nnot able to see the whole picture and fail to look at their data and \nchallenges holistically. /T_he end result is that valuable information is \nwithheld from decision makers. Research has shown almost 33% of \ndecisions are made without good data or information. [10] \nWhen Data Scientists are able to explore and analyze all the data, new \nopportunities arise for analysis and data-driven decision making. /T_he \ninsights gained from these new opportunities will signi/f_icantly change \nthe course of action and decisions within an organization. 
Gaining \naccess to an organization’s complete repository of data, however, \nrequires preparation.\nOur experience shows time and time again that the best tool for \nData Scientists to prepare for analysis is a lake – speci/f_ically, the Data \nLake.[11] /T_his is a new approach to collecting, storing and integrating', 'Organizations often make decisions based on inexact data. Data \nstovepipes mean that organizations may have blind spots. /T_hey are \nnot able to see the whole picture and fail to look at their data and \nchallenges holistically. /T_he end result is that valuable information is \nwithheld from decision makers. Research has shown almost 33% of \ndecisions are made without good data or information. [10] \nWhen Data Scientists are able to explore and analyze all the data, new \nopportunities arise for analysis and data-driven decision making. /T_he \ninsights gained from these new opportunities will signi/f_icantly change \nthe course of action and decisions within an organization. Gaining \naccess to an organization’s complete repository of data, however, \nrequires preparation.\nOur experience shows time and time again that the best tool for \nData Scientists to prepare for analysis is a lake – speci/f_ically, the Data \nLake.[11] /T_his is a new approach to collecting, storing and integrating']","Based on the provided context, the guide describes the transformation of data into actionable insights through the creation of data products, which provide actionable information without exposing decision makers to the underlying data or analytics. The guide highlights the importance of extracting timely and actionable information from diverse data sources to drive data products, and emphasizes the shift from traditional analysis approaches to inductive and deductive reasoning. 
Additionally, the guide notes that organizations can directly connect business decision makers to the data, allowing them to value the data rather than managing it as something separate.","The guide outlines the transformation of data into actionable insights through the creation of data products as a process that turns data into actions. This is achieved by developing data products that provide actionable information to decision-makers without requiring them to interact with the underlying data or analytics. Examples of such data products include buy/sell strategies, actions to improve product yield, or steps to enhance product marketing. This transformation requires extracting timely actionable information from various data sources to drive the data products, thus enabling decisions based on insights like which products to advertise more or how to improve compliance programs cost-effectively."
2,Llama-2-13b-chat-hf,"According to the guide, what role does the Data Lake play in the preparation and analysis of data for Data Science endeavors?","['Organizations often make decisions based on inexact data. Data \nstovepipes mean that organizations may have blind spots. /T_hey are \nnot able to see the whole picture and fail to look at their data and \nchallenges holistically. /T_he end result is that valuable information is \nwithheld from decision makers. Research has shown almost 33% of \ndecisions are made without good data or information. [10] \nWhen Data Scientists are able to explore and analyze all the data, new \nopportunities arise for analysis and data-driven decision making. /T_he \ninsights gained from these new opportunities will signi/f_icantly change \nthe course of action and decisions within an organization. Gaining \naccess to an organization’s complete repository of data, however, \nrequires preparation.\nOur experience shows time and time again that the best tool for \nData Scientists to prepare for analysis is a lake – speci/f_ically, the Data \nLake.[11] /T_his is a new approach to collecting, storing and integrating', 'Organizations often make decisions based on inexact data. Data \nstovepipes mean that organizations may have blind spots. /T_hey are \nnot able to see the whole picture and fail to look at their data and \nchallenges holistically. /T_he end result is that valuable information is \nwithheld from decision makers. Research has shown almost 33% of \ndecisions are made without good data or information. [10] \nWhen Data Scientists are able to explore and analyze all the data, new \nopportunities arise for analysis and data-driven decision making. /T_he \ninsights gained from these new opportunities will signi/f_icantly change \nthe course of action and decisions within an organization. 
Gaining \naccess to an organization’s complete repository of data, however, \nrequires preparation.\nOur experience shows time and time again that the best tool for \nData Scientists to prepare for analysis is a lake – speci/f_ically, the Data \nLake.[11] /T_his is a new approach to collecting, storing and integrating', 'Organizations often make decisions based on inexact data. Data \nstovepipes mean that organizations may have blind spots. /T_hey are \nnot able to see the whole picture and fail to look at their data and \nchallenges holistically. /T_he end result is that valuable information is \nwithheld from decision makers. Research has shown almost 33% of \ndecisions are made without good data or information. [10] \nWhen Data Scientists are able to explore and analyze all the data, new \nopportunities arise for analysis and data-driven decision making. /T_he \ninsights gained from these new opportunities will signi/f_icantly change \nthe course of action and decisions within an organization. Gaining \naccess to an organization’s complete repository of data, however, \nrequires preparation.\nOur experience shows time and time again that the best tool for \nData Scientists to prepare for analysis is a lake – speci/f_ically, the Data \nLake.[11] /T_his is a new approach to collecting, storing and integrating', 'of risks and opportunities by evaluating situational, operational and \nbehavioral data.\nWith the totality of data fully accessible in the Data Lake, \norganizations can use analytics to /f_ind the kinds of connections and \npatterns that point to promising opportunities. /T_his high-speed \nanalytic connection is done within the Data Lake, as opposed to \nolder style sampling methods that could only make use of a narrow \nslice of the data. In order to understand what was in the lake, you had \nto bring the data out and study it. Now you can dive into the lake, \nbringing your analytics to the data. 
/T_he /f_igure, Analytic Connection in \nthe Data Lake, highlights the concept of diving into the Data Lake to \ndiscover new connections and patterns . \nAnalytic Connection in the Data Lake\nTHE FIELD GUIDE to DATA SCIENCE\n29 Start Here for the Basics 29 Start Here for the BasicsData Scientists work across the spectrum of analytic goals – Describe,', 'of risks and opportunities by evaluating situational, operational and \nbehavioral data.\nWith the totality of data fully accessible in the Data Lake, \norganizations can use analytics to /f_ind the kinds of connections and \npatterns that point to promising opportunities. /T_his high-speed \nanalytic connection is done within the Data Lake, as opposed to \nolder style sampling methods that could only make use of a narrow \nslice of the data. In order to understand what was in the lake, you had \nto bring the data out and study it. Now you can dive into the lake, \nbringing your analytics to the data. /T_he /f_igure, Analytic Connection in \nthe Data Lake, highlights the concept of diving into the Data Lake to \ndiscover new connections and patterns . \nAnalytic Connection in the Data Lake\nTHE FIELD GUIDE to DATA SCIENCE\n29 Start Here for the Basics 29 Start Here for the BasicsData Scientists work across the spectrum of analytic goals – Describe,']","The Data Lake plays a crucial role in the preparation and analysis of data for Data Science endeavors by providing a centralized repository of all the organization's data, eliminating the need for ETL and enabling real-time analysis of all the data.","The guide emphasizes the Data Lake's critical role in Data Science for both the preparation and analysis of data. It is highlighted as the optimal tool for Data Scientists to prepare data for analysis, acting as a centralized repository that consolidates an organization's entire data collection into a unified view. 
This consolidation eliminates the need for traditional, cumbersome data-preparation processes like Extract/Transform/Load (ETL), making all information readily available for every inquiry. The Data Lake enables direct analysis of the complete data set, facilitating the discovery of connections and patterns that drive actionable insights, thereby significantly enhancing decision-making and opportunity identification within organizations."
3,Llama-2-13b-chat-hf,"Can you explain the concept of 'Data Science Maturity' within an organization as presented in the guide, and how does it impact the organization's analytical capabilities?","['unless it is intended to produce an output – that is, you have the \nintent to Advise . /T_his means simply that each step forward in \nmaturity drives you to the right in the model diagram. Moving \nto the right requires the correct processes, people, culture and \noperating model – a robust Data Science capability. What Does it \nTake to Create a Data Science Capability? addresses this topic. \nWe have observed very few organizations actually operating at \nthe highest levels of maturity, the Predict and Advise stages. /T_he \ntradecraft of Discover is only now maturing to the point that \norganizations can focus on advanced Predict and Advise activities. \n/T_his is the new frontier of Data Science. /T_his is the space in which \nwe will begin to understand how to close the cognitive gap between \nhumans and computers. Organizations that reach Advise will be \nmet with true insights and real competitive advantage. » Where does your organization \nfall in analytic maturity? \n Take the quiz!', 'unless it is intended to produce an output – that is, you have the \nintent to Advise . /T_his means simply that each step forward in \nmaturity drives you to the right in the model diagram. Moving \nto the right requires the correct processes, people, culture and \noperating model – a robust Data Science capability. What Does it \nTake to Create a Data Science Capability? addresses this topic. \nWe have observed very few organizations actually operating at \nthe highest levels of maturity, the Predict and Advise stages. /T_he \ntradecraft of Discover is only now maturing to the point that \norganizations can focus on advanced Predict and Advise activities. \n/T_his is the new frontier of Data Science. 
/T_his is the space in which \nwe will begin to understand how to close the cognitive gap between \nhumans and computers. Organizations that reach Advise will be \nmet with true insights and real competitive advantage. » Where does your organization \nfall in analytic maturity? \n Take the quiz!', 'unless it is intended to produce an output – that is, you have the \nintent to Advise . /T_his means simply that each step forward in \nmaturity drives you to the right in the model diagram. Moving \nto the right requires the correct processes, people, culture and \noperating model – a robust Data Science capability. What Does it \nTake to Create a Data Science Capability? addresses this topic. \nWe have observed very few organizations actually operating at \nthe highest levels of maturity, the Predict and Advise stages. /T_he \ntradecraft of Discover is only now maturing to the point that \norganizations can focus on advanced Predict and Advise activities. \n/T_his is the new frontier of Data Science. /T_his is the space in which \nwe will begin to understand how to close the cognitive gap between \nhumans and computers. Organizations that reach Advise will be \nmet with true insights and real competitive advantage. » Where does your organization \nfall in analytic maturity? \n Take the quiz!', 'tested, updated and improved until better models are found. \n\x03›Data Science is necessary for companies to stay with the \npack and compete in the future. \nOrganizations are constantly making decisions based on gut \ninstinct, loudest voice and best argument – sometimes they are \neven informed by real information. /T_he winners and the losers in \nthe emerging data economy are going to be determined by their \nData Science teams. \n\x03›Data Science capabilities can be built over time. \nOrganizations mature through a series of stages – Collect, \nDescribe, Discover, Predict, Advise – as they move from data \ndeluge to full Data Science maturity. 
At each stage, they can \ntackle increasingly complex analytic goals with a wider breadth \nof analytic capabilities. However, organizations need not reach \nmaximum Data Science maturity to achieve success. Signi/f_icant \ngains can be found in every stage.\n\x03›Data Science is a diﬀerent kind of team sport. \nData Science teams need a broad view of the organization. Leaders', 'tested, updated and improved until better models are found. \n\x03›Data Science is necessary for companies to stay with the \npack and compete in the future. \nOrganizations are constantly making decisions based on gut \ninstinct, loudest voice and best argument – sometimes they are \neven informed by real information. /T_he winners and the losers in \nthe emerging data economy are going to be determined by their \nData Science teams. \n\x03›Data Science capabilities can be built over time. \nOrganizations mature through a series of stages – Collect, \nDescribe, Discover, Predict, Advise – as they move from data \ndeluge to full Data Science maturity. At each stage, they can \ntackle increasingly complex analytic goals with a wider breadth \nof analytic capabilities. However, organizations need not reach \nmaximum Data Science maturity to achieve success. Signi/f_icant \ngains can be found in every stage.\n\x03›Data Science is a diﬀerent kind of team sport. \nData Science teams need a broad view of the organization. Leaders']","The concept of ""Data Science Maturity"" refers to an organization's ability to utilize data to drive decision-making and improve its analytical capabilities. It is a model that assesses an organization's progress in developing its data science capabilities, from basic data collection and description to more advanced predictive and advisory activities. The model has five stages of maturity, ranging from collecting and describing data to predicting and advising. 
Organizations that reach the highest levels of maturity, known as the ""Advise"" stage, are able to use data to gain true insights and achieve real competitive advantage. To create a data science capability, organizations must build a robust team with a broad view of the organization, and leaders must foster trust and transparent communication across all levels.","The concept of 'Data Science Maturity' in an organization is described as a progression through stages from data collection to advanced analytics and decision-making advice. It outlines a journey from handling simple data tasks to achieving deep insights that guide strategic actions. Each stage—Collect, Describe, Discover, Predict, and Advise—represents an increase in analytical capability, moving from basic data aggregation to providing sophisticated, actionable advice. Advancing through these stages enhances the organization's ability to make informed decisions, ultimately offering a competitive advantage."
4,Llama-2-13b-chat-hf,"The guide mentions a 'Data Science Venn Diagram' that includes domain expertise, computer science, and mathematics. How does this diagram illustrate the interdisciplinary nature of Data Science, and why are these areas critical?","['of the types of personalities that make Data Science possible, as well \nas a willingness to establish a culture of innovation and curiosity in \nyour organization. You must also consider how to deploy the team and \ngain widespread buy-in from across your organization. The Data Science Venn Diagram (inspired by [12])\nTHE FIELD GUIDE to DATA SCIENCE\n35 Start Here for the Basics 35 Start Here for the BasicsUnder standing What Makes \na Data Scientist \nData Science often requires a signi/f_icant investment of time across \na variety of tasks. Hypotheses must be generated and data must be \nacquired, prepared, analyzed, and acted upon. Multiple techniques \nare often applied before one yields interesting results. If that seems \ndaunting, it is because it is. Data Science is diﬃcult, intellectually \ntaxing work, which requires lots of talent: both tangible technical \nskills as well as the intangible ‘x-factors.’\n/T_he most important qualities of Data Scientists tend to be the', 'of the types of personalities that make Data Science possible, as well \nas a willingness to establish a culture of innovation and curiosity in \nyour organization. You must also consider how to deploy the team and \ngain widespread buy-in from across your organization. The Data Science Venn Diagram (inspired by [12])\nTHE FIELD GUIDE to DATA SCIENCE\n35 Start Here for the Basics 35 Start Here for the BasicsUnder standing What Makes \na Data Scientist \nData Science often requires a signi/f_icant investment of time across \na variety of tasks. Hypotheses must be generated and data must be \nacquired, prepared, analyzed, and acted upon. Multiple techniques \nare often applied before one yields interesting results. 
If that seems \ndaunting, it is because it is. Data Science is diﬃcult, intellectually \ntaxing work, which requires lots of talent: both tangible technical \nskills as well as the intangible ‘x-factors.’\n/T_he most important qualities of Data Scientists tend to be the', 'of the types of personalities that make Data Science possible, as well \nas a willingness to establish a culture of innovation and curiosity in \nyour organization. You must also consider how to deploy the team and \ngain widespread buy-in from across your organization. The Data Science Venn Diagram (inspired by [12])\nTHE FIELD GUIDE to DATA SCIENCE\n35 Start Here for the Basics 35 Start Here for the BasicsUnder standing What Makes \na Data Scientist \nData Science often requires a signi/f_icant investment of time across \na variety of tasks. Hypotheses must be generated and data must be \nacquired, prepared, analyzed, and acted upon. Multiple techniques \nare often applied before one yields interesting results. If that seems \ndaunting, it is because it is. Data Science is diﬃcult, intellectually \ntaxing work, which requires lots of talent: both tangible technical \nskills as well as the intangible ‘x-factors.’\n/T_he most important qualities of Data Scientists tend to be the', 'Science. Companies with strong Data Science teams often focus \non a single class of problems – graph algorithms for social \nnetwork analysis and recommender models for online shopping \nare two notable examples. Booz Allen is diﬀerent. In our role as \nconsultants, we support a diverse set of clients across a variety of \ndomains. /T_his allows us to uniquely understand the DNA of \nData Science. Our goal in creating /T_he Field Guide to Data Science \nis to capture what we have learned and to share it broadly. \nWe want this eﬀort to help drive forward the science and art \nof Data Science. \n/T_his /f_ield guide came from the passion our team feels for \nits work. 
It is not a textbook nor is it a super/f_icial treatment. \nSenior leaders will walk away with a deeper understanding \nof the concepts at the heart of Data Science. Practitioners \nwill add to their toolbox. We hope everyone will enjoy \nthe journey. »Why Data Science DNA? \nWe view Data Science as having \nDNA-like characteristics. Much like', 'Science. Companies with strong Data Science teams often focus \non a single class of problems – graph algorithms for social \nnetwork analysis and recommender models for online shopping \nare two notable examples. Booz Allen is diﬀerent. In our role as \nconsultants, we support a diverse set of clients across a variety of \ndomains. /T_his allows us to uniquely understand the DNA of \nData Science. Our goal in creating /T_he Field Guide to Data Science \nis to capture what we have learned and to share it broadly. \nWe want this eﬀort to help drive forward the science and art \nof Data Science. \n/T_his /f_ield guide came from the passion our team feels for \nits work. It is not a textbook nor is it a super/f_icial treatment. \nSenior leaders will walk away with a deeper understanding \nof the concepts at the heart of Data Science. Practitioners \nwill add to their toolbox. We hope everyone will enjoy \nthe journey. »Why Data Science DNA? \nWe view Data Science as having \nDNA-like characteristics. Much like']","The Data Science Venn Diagram illustrates the interdisciplinary nature of Data Science by showing the overlap between computer science, mathematics, and domain expertise. These areas are critical because they provide the foundation for creating data products, understanding the problem space, and examining data science problems theoretically. 
Building a successful Data Science capability requires a diverse set of skills, including computer science, mathematics, and domain expertise, to create a winning team that can achieve improved insights.","The 'Data Science Venn Diagram' showcases Data Science as an intersection of domain expertise, computer science, and mathematics. Domain expertise ensures relevance, computer science handles data manipulation, and mathematics provides the basis for analysis and prediction. This integration is crucial for deriving meaningful insights and solutions from data, highlighting Data Science's interdisciplinary nature and its reliance on these core areas for effective decision-making."


In [50]:
results["llm-model"].unique()

array(['Llama-2-13b-chat-hf', 'Mistral-7B-Instruct-v0.2',
       'LargeWorldModel-LWM-Text-Chat-256K',
       'AzureOpenai-GPT-35-turbo-16k'], dtype=object)

In [39]:
llm_model = []
eval_feedback = []
eval_score = []

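for model_name in results["llm-model"].unique():
    # Select this model's rows and turn them into a list of dicts, one per question.
    data = results[results["llm-model"] == model_name][
        ["question", "answer", "ground_truth", "contexts"]
    ].to_dict(orient="records")
    llm_model.append(model_name)
    for experiment in tqdm(data):
        # Fill the evaluation prompt with the question, the model's answer,
        # the retrieved contexts, and the reference answer.
        eval_prompt = evaluation_prompt_template.format_messages(
            instruction=experiment["question"],
            response=experiment["answer"],
            retrieved_information=experiment["contexts"],
            reference_answer=experiment["ground_truth"],
        )

        # Ask the judge model for feedback and a score.
        eval_result = eval_chat_model.invoke(eval_prompt)

        # The judge is instructed to reply as "<feedback> [RESULT] <score>".
        feedback, score = [
            item.strip() for item in eval_result.content.split("[RESULT]")
        ]
        eval_feedback.append(feedback)
        eval_score.append(score)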
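
The two-way unpacking on `split("[RESULT]")` raises a `ValueError` whenever the judge omits the marker or repeats it. A minimal sketch of a more tolerant parser follows; `parse_evaluation` is a hypothetical helper name, and returning `None` for an unusable score is an assumption about how such cases might be handled, not something the notebook does.

```python
import re
from typing import Optional, Tuple


def parse_evaluation(text: str) -> Tuple[str, Optional[int]]:
    """Split judge output into (feedback, score).

    Hypothetical helper: score is None when the marker is missing,
    no integer follows it, or the integer is outside 1..10.
    """
    # rpartition splits on the LAST marker, so a stray "[RESULT]" inside
    # the feedback text does not break the unpacking.
    feedback, marker, tail = text.rpartition("[RESULT]")
    if not marker:
        return text.strip(), None

    match = re.search(r"\d+", tail)
    if match is None:
        return feedback.strip(), None

    score = int(match.group())
    if not 1 <= score <= 10:
        return feedback.strip(), None
    return feedback.strip(), score


print(parse_evaluation("Mostly accurate. [RESULT] 7"))
print(parse_evaluation("The judge forgot the marker."))
```

Rows that come back with a `None` score can then be inspected or re-queried rather than aborting the whole run.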
    

In [47]:
# Expand the llm_model list: repeat each model name once per evaluated question (5 per model)
expanded_llm_model = [model for model in llm_model for _ in range(5)]

# Assemble the columns for the DataFrame
data = {
    'llm_model': expanded_llm_model,
    'evaluation_feedback': eval_feedback,
    'evaluation_score': eval_score
}

# Build the DataFrame and cast the scores to integers
llm_results = pd.DataFrame(data)
llm_results["evaluation_score"] = llm_results["evaluation_score"].astype("int")
llm_results
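
Repeating each model name a fixed five times only lines up with the feedback and score lists when every model has exactly five rows. One alternative, sketched below under stated assumptions, is to record the model name alongside each judged row so the columns cannot drift apart. `collect_records` and `fake_judge` are hypothetical names, and the toy rows stand in for the real `results` DataFrame.

```python
def collect_records(rows, judge):
    """Return one dict per judged row, keeping the model name with its score.

    rows  : iterable of dicts with at least an "llm-model" key
    judge : callable returning text shaped like "<feedback> [RESULT] <score>"
    """
    records = []
    for row in rows:
        raw = judge(row)
        feedback, _, score = raw.partition("[RESULT]")
        records.append(
            {
                "llm_model": row["llm-model"],
                "evaluation_feedback": feedback.strip(),
                "evaluation_score": score.strip(),
            }
        )
    return records


# Toy input with an UNEQUAL number of rows per model (illustrative only).
toy_rows = [
    {"llm-model": "model-a", "question": "q1"},
    {"llm-model": "model-a", "question": "q2"},
    {"llm-model": "model-b", "question": "q1"},
]


def fake_judge(row):
    # Stand-in for the real judge call; always returns the same text.
    return "Placeholder feedback. [RESULT] 8"


records = collect_records(toy_rows, fake_judge)
print([r["llm_model"] for r in records])
```

A list of dicts in this shape can be passed directly to `pd.DataFrame(records)`, which removes the need for the separate expansion step.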


,llm_model,evaluation_feedback,evaluation_score
0,Llama-2-13b-chat-hf,"The response accurately describes the fundamental differences between deductive and inductive reasoning in the context of Data Science. It correctly defines deductive reasoning as beginning with known premises to reach a certain conclusion, and inductive reasoning as drawing uncertain inferences based on probabilistic reasoning. The response also accurately describes how Data Science combines both reasoning types to constantly test, update, and improve models of reality. The response is slightly detailed than the reference answer but it maintains the factual correctness.",9
1,Llama-2-13b-chat-hf,"The response is mostly correct, accurate, and factual. It successfully outlines the guide's description of the transformation of data into actionable insights through the creation of data products. The response correctly mentions that data products provide actionable information without exposing decision makers to the underlying data or analytics. It also accurately highlights the shift from traditional analysis approaches to inductive and deductive reasoning. However, the response lacks the specific examples of data products mentioned in the reference answer, such as buy/sell strategies, actions to improve product yield, or steps to enhance product marketing. It also does not mention the potential decisions that can be made based on these insights, such as advertising products more or improving compliance programs cost-effectively. Nevertheless, the response is largely accurate and factual.",7
2,Llama-2-13b-chat-hf,"The response correctly identifies the Data Lake as a crucial tool in the preparation and analysis of data for Data Science endeavors. It correctly states that the Data Lake provides a centralized repository for all of an organization's data, which is aligned with the information in the guide. The response also correctly highlights the elimination of the need for ETL processes and the enablement of real-time data analysis, which is supported by the reference answer. However, the response could have mentioned the discovery of connections and patterns that drive actionable insights, which is an important point in the guide.",9
3,Llama-2-13b-chat-hf,"The response accurately describes the concept of 'Data Science Maturity' within an organization as the progression of an organization's ability to use data to drive decision-making and improve analytical capabilities. It correctly outlines the five stages of maturity from data collection to advising, and highlights the significance of reaching the highest levels of maturity. The response also accurately mentions the importance of a robust team and transparent communication. However, it could be improved by mentioning that organizations do not need to reach maximum Data Science maturity to achieve success, as indicated in the retrieved information.",9
4,Llama-2-13b-chat-hf,"The response correctly illustrates the interdisciplinary nature of Data Science using the Data Science Venn Diagram, highlighting the overlap between computer science, mathematics, and domain expertise. Moreover, it accurately explains why these areas are critical in the field of Data Science. It presents the information in a clear and factual manner, aligning with the details provided in the reference answer. The response could have further emphasized on how each of these areas contributes to Data Science: domain expertise ensures relevance, computer science handles data manipulation, and mathematics provides the basis for analysis and prediction.",9
5,Mistral-7B-Instruct-v0.2,"The response correctly identifies the fundamental differences between deductive and inductive reasoning in the context of Data Science as outlined in the guide. It accurately describes deductive reasoning as proceeding from known premises to a certain conclusion and inductive reasoning as drawing uncertain inferences based on probabilistic reasoning. It also correctly highlights their roles in Data Science, with deductive reasoning used for hypothesis formulation and inductive reasoning used for exploratory data analysis. The response, however, includes additional information that is not in the retrieved information or the reference answer, such as the comparison between Data Science and Business Intelligence. This additional information, while relevant to the broader context of Data Science, does not directly address the specific question about the differences between deductive and inductive reasoning.",9
6,Mistral-7B-Instruct-v0.2,"The response is mostly correct, accurate, and factual. The responder correctly mentions how data science is the art of turning data into actions, and this transformation occurs through the creation of data products. They also accurately state that these data products provide actionable information without exposing decision-makers to the underlying data or analytics. The responder also correctly states that these data products allow organizations to make informed decisions. However, the response lacks specific examples of data products and how they enable decisions based on insights, as mentioned in the reference answer. Also, the term 'consolidation of data' is confusing and not explicitly mentioned in the retrieved information.",7
7,Mistral-7B-Instruct-v0.2,"The response provided a good explanation of the role of the Data Lake in data science as per the guide. It accurately identified the Data Lake as a tool used for data preparation and analysis, and correctly explained that it consolidates an organization's data into a single view, thereby eliminating the need for ETL processes. It also correctly pointed out that the Data Lake makes all data available for inquiries and helps in discovering new connections and patterns. However, the response failed to mention the significant enhancement of decision-making and opportunity identification within organizations, which is a key point in the reference answer.",9
8,Mistral-7B-Instruct-v0.2,"The response accurately explains the concept of 'Data Science Maturity' within an organization. It correctly outlines the stages of maturity, from data collection to advanced analytics, which matches the reference answer. The response also accurately states that organizations don't need to reach the maximum maturity to make significant gains, which is also implied in the guide. The response includes the importance of the culture that fosters curiosity, transparency, and trust that was not explicitly mentioned in the reference but is a part of the retrieved information. Overall, the response is factual and correct, and it covers all the necessary aspects of the topic.",10
9,Mistral-7B-Instruct-v0.2,"The response accurately describes the interdisciplinary nature of Data Science as illustrated by the Venn Diagram, highlighting the importance of computer science, mathematics, and domain expertise. It correctly identifies the roles of these areas in Data Science - computer science for testing data-driven hypotheses, mathematics for identifying patterns in data, and domain expertise for applying Data Science to real-world problems. The response also emphasizes the collaborative nature of Data Science and its forward-thinking approach, which are also critical aspects. However, it could have been more precise in explaining why these areas are crucial for effective decision-making, as mentioned in the reference answer.",9


In [45]:
llm_results.groupby("llm_model")[["evaluation_score"]].mean()

llm_model,evaluation_score
AzureOpenai-GPT-35-turbo-16k,9.6
LargeWorldModel-LWM-Text-Chat-256K,8.4
Llama-2-13b-chat-hf,8.6
Mistral-7B-Instruct-v0.2,8.8
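
Each mean above is computed from only five scores per model, so it can help to report the spread and the sample size next to it. The sketch below shows one way to do that with `groupby(...).agg(...)`; the `toy` DataFrame and its values are invented for illustration and are not the notebook's results.

```python
import pandas as pd

# Toy scores, invented purely to demonstrate the aggregation.
toy = pd.DataFrame(
    {
        "llm_model": ["model-a", "model-a", "model-b", "model-b"],
        "evaluation_score": [9, 7, 10, 8],
    }
)

# Mean, sample standard deviation, and number of scored answers per model,
# ordered from the highest mean to the lowest.
summary = (
    toy.groupby("llm_model")["evaluation_score"]
    .agg(["mean", "std", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

Applied to `llm_results`, the same call would show how much of the gap between models is within the spread of the individual scores.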
