### Objective

In this notebook, we integrate the newly tested scenario generator with the dual-chatbot system.

#### 1. Load libraries

In [1]:
from langchain.chat_models import ChatOpenAI, AzureChatOpenAI
from langchain.llms import OpenAI, AzureOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts import (
    PromptTemplate,
    ChatPromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate
)
from langchain.chains import ConversationChain
import utilities
import os

#### 2. Problem definition

In [2]:
problem_type_list = ["classification", "regression", "clustering",
                    "anomaly detection", "recommendation", "time series analysis",
                    "natural language processing", "computer vision"]
business_size_list = ["small (less than 100 employees)",
                    "medium (100-500 employees)",
                    "large (more than 500 employees)"]
industry_list = ["healthcare", "finance", "retail", "technology",
                "manufacturing", "transportation", "energy",
                "real estate", "education", "government", "non-profit"]

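The lists above enumerate the available options, while the later cells pick one value of each by hand. A minimal sketch of drawing a combination at random is shown here; `sample_configuration` and the shortened `example_*` lists are hypothetical and not part of the original notebook.

```python
import random


# Hypothetical helper, not part of the original notebook.
def sample_configuration(problem_types, business_sizes, industries, seed=None):
    """Draw one (problem type, business size, industry) combination."""
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return {
        "problem_type": rng.choice(problem_types),
        "business_size": rng.choice(business_sizes),
        "industry": rng.choice(industries),
    }


# Shortened stand-ins for the lists defined above (assumption: same structure).
example_problem_types = ["classification", "regression", "anomaly detection"]
example_business_sizes = ["small (less than 100 employees)",
                          "medium (100-500 employees)"]
example_industries = ["healthcare", "manufacturing", "energy"]

config = sample_configuration(example_problem_types, example_business_sizes,
                              example_industries, seed=0)
print(config)
```

Passing a fixed `seed` makes the draw repeatable, which may help when comparing conversations generated from the same configuration.
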
#### 3. Generate scenarios

To generate scenarios, we adopt a two-stage approach. In the first stage, we prompt the LLM to generate a broad description of the scenario. In the second stage, we prompt it to fill in the details of the previously generated scenario.

In [3]:
scen_generator_llm = AzureChatOpenAI(openai_api_base="https://abb-chcrc.openai.azure.com/",
                        openai_api_version="2023-03-15-preview",
                        openai_api_key=os.environ["OPENAI_API_KEY_AZURE"],
                        openai_api_type="azure",
                        deployment_name="gpt-35-turbo-0301",
                        temperature=1.0)
# Alternative: use the OpenAI endpoint instead of Azure
# scen_generator_llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=1.0)

In [4]:
memory = ConversationBufferMemory(return_messages=True)
prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("""{input}""")
])

scen_generator = ConversationChain(memory=memory, prompt=prompt, 
                                  llm=scen_generator_llm, verbose=False)

In [5]:
# Generate description of the scenario
industry = "manufacturing"
business_size = "medium"
problem_type = "anomaly detection"
details = "Types of products manufactured, machines used in the production process, \
common issues faced by the company, tools and technologies used for quality control."

# Prompt
prompt = f"""For a {industry} company of {business_size} size focusing on {problem_type} problems, 
generate a concrete data science project scenario that a data scientist might encounter in real life. 
Please provide concrete and specific details relevant to the selected industry and problem type.

For the generated scenario, please provide:
1. A specific and realistic description of a problem faced by the company.
2. The desired outcome that the company is hoping to achieve by solving the problem.
3. A list of the top 3 most relevant data sources that might be available for solving the problem.

Output format:
Problem description: [content of problem description]
Desired outcome: [content of desired outcome]
Available data: [content of available data]
"""

interm_scenario = scen_generator.predict(input=prompt)
print(interm_scenario)

Problem description: A medium-sized manufacturing company specializes in producing electronic components for various industries. Recently, the quality control team has detected an increasing number of anomalies in the production process, resulting in defective products that need costly recalls and unsatisfied customers. The company suspects that these anomalies are caused by the malfunctioning of some of the machines involved in the production process.

Desired outcome: The company wants to identify the root cause(s) of the anomaly occurrences and take corrective actions to minimize costly recalls and disruptions in production. The data science project should aim to develop a predictive model that can detect anomalies early and precisely, as well as a dashboard that can provide real-time monitoring of the production process.

Available data: 
1. Production logs: The data recorded by the machines in the production process, including sensor measurement data, machine status, timestamps, a

In [6]:
# Prompt
additional_details = f"""Based on the previously generated scenario, please enrich the problem description 
by providing more specific details (such as {details}) about the problem.

Output format:
Enriched problem description: [content of enriched problem description]
Desired outcome: [content of desired outcome]
Available data: [content of available data]
"""

scenario = scen_generator.predict(input=additional_details)
print(scenario)

Enriched problem description: A medium-sized manufacturing company produces electronic components for automotive, health and beauty, and home appliance industries. The production process involves multiple machines, including CNC machines, robotics arms, and a pick and place machine to mention a few. The company has been facing issues with defects in the final products, including faulty components, irregular sizes, and inadequate finishing. Quality control is done manually by the team, which can result in human errors and missed defects. The company has invested in traditional quality control tools such as Statistical Process Control (SPC) and Failure Mode and Effect Analysis (FEMA) but is still struggling to curb the defects.

Desired outcome: The company is looking to reduce the waste produced, minimize recalls, and improve customer satisfaction. The data science project aims to develop a predictive model that can detect anomalies in real-time. Furthermore, a dashboard can provide ins
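
Both prompts ask the model to label its answer with fixed field names. If later steps need the fields separately, one possible approach is to split the text on those labels. The sketch below assumes the model actually follows the requested output format; `parse_scenario` and `FIELD_PATTERN` are hypothetical names that do not appear in the original notebook.

```python
import re

# Field labels requested in the two prompts above.
FIELD_PATTERN = re.compile(
    r"^(Enriched problem description|Problem description|Desired outcome|Available data):\s*",
    flags=re.MULTILINE,
)


def parse_scenario(text):
    """Split a generated scenario into a dict keyed by field label."""
    # With one capturing group, re.split returns
    # [preamble, label_1, body_1, label_2, body_2, ...].
    parts = FIELD_PATTERN.split(text)
    fields = {}
    for label, body in zip(parts[1::2], parts[2::2]):
        fields[label] = body.strip()
    return fields


# Made-up text in the requested format, used only to exercise the parser.
example_text = (
    "Problem description: Defects are increasing.\n\n"
    "Desired outcome: Detect anomalies early.\n\n"
    "Available data: \n1. Production logs\n2. Inspection records"
)
parsed = parse_scenario(example_text)
print(parsed["Desired outcome"])
```

A label the model omits is simply absent from the returned dict, so callers may prefer `parsed.get(...)` over direct indexing.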

#### 4. Define two chatbots

Client bot

In [7]:
class ClientBot():
    """Class definition for the client bot, created with LangChain."""

    
    def __init__(self, engine):
        """Set up the client bot.
        """   
        if engine == 'OpenAI':
            self.llm = ChatOpenAI(
                model_name='gpt-3.5-turbo',
                temperature=0.8
            )

        elif engine == 'Azure':
            self.llm = AzureChatOpenAI(openai_api_base="https://abb-chcrc.openai.azure.com/",
                    openai_api_version="2023-03-15-preview",
                    openai_api_key=os.environ["OPENAI_API_KEY_AZURE"],
                    openai_api_type="azure",
                    deployment_name="gpt-35-turbo-0301",
                    temperature=0.8)

        else:
            raise ValueError("Currently unsupported chat model type!")
            
        # Instantiate memory
        self.memory = ConversationBufferMemory(return_messages=True)

        
    def instruct(self, industry, business_size, scenario):
        """Determine the context of client chatbot. 
        """
        
        self.industry = industry
        self.business_size = business_size
        self.scenario = scenario
        
        # Define prompt template
        prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(self._specify_system_message()),
            MessagesPlaceholder(variable_name="history"),
            HumanMessagePromptTemplate.from_template("""{input}""")
        ])
        
        # Create conversation chain
        self.conversation = ConversationChain(memory=self.memory, prompt=prompt, 
                                              llm=self.llm, verbose=False)
        

    def step(self, prompt):
        """Client chatbot speaks. 
        """
        response = self.conversation.predict(input=prompt)
        
        return response
        

    def _specify_system_message(self):
        """Specify the behavior of the client chatbot.
        """      
        
        # Prompt
        prompt = f"""You are role-playing a representative from a {self.industry} company of {self.business_size} size and 
        you are meeting with a data scientist (which is played by another bot), to discuss how to leverage machine learning 
        to address a problem your company is facing. 
        
        The problem description, desired outcome, and available data are:
        {self.scenario}.
        
        Your ultimate goal is to work with the data scientist to define a clear problem and agree on a suitable data science solution or approach.

        Guidelines to keep in mind:
        - **Get Straight to the Point**: Start the conversation by directly addressing the problem at hand. There is no need for pleasantries or introductions.
        - **Engage in Conversation**: Respond to the data scientist's questions and prompts. Do not provide all the information at once or provide the entire conversation yourself.
        - **Clarify and Confirm**: Always make sure to clarify and confirm the problem, desired outcome, and any proposed solutions with the data scientist. 
        - **Stay in Role**: Your role as a client is to represent your company's needs and work with the data scientist to define a clear problem and agree on a suitable data science solution or approach. Do not try to propose solutions.
        - **Provide Information as Needed**: Provide information about the problem, available data, constraints, and requirements as it becomes relevant in the conversation. If the data scientist asks a question and the information was not provided in the problem description, it is okay to improvise and create details that seem reasonable.
        - **Collaborate**: Collaborate with the data scientist to clearly define the problem and to consider any proposed solutions or approaches.
        """

        return prompt

Data scientist bot

In [10]:
class DataScientistBot():
    """Class definition for the data scientist bot, created with LangChain."""

    
    def __init__(self, engine):
        """Set up the data scientist bot.
        """        
        if engine == 'OpenAI':
            self.llm = ChatOpenAI(
                model_name='gpt-3.5-turbo',
                temperature=0.8
            )

        elif engine == 'Azure':
            self.llm = AzureChatOpenAI(openai_api_base="https://abb-chcrc.openai.azure.com/",
                    openai_api_version="2023-03-15-preview",
                    openai_api_key=os.environ["OPENAI_API_KEY_AZURE"],
                    openai_api_type="azure",
                    deployment_name="gpt-35-turbo-0301",
                    temperature=0.8)

        else:
            raise ValueError("Currently unsupported chat model type!")
            
        # Instantiate memory
        self.memory = ConversationBufferMemory(return_messages=True)

        
    def instruct(self, industry, business_size, problem_type):
        """Determine the context of data scientist chatbot. 
        """
        
        self.industry = industry
        self.business_size = business_size
        # Use the argument passed by the caller rather than a notebook-level global
        self.problem_type = problem_type
        
        # Define prompt template
        prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(self._specify_system_message()),
            MessagesPlaceholder(variable_name="history"),
            HumanMessagePromptTemplate.from_template("""{input}""")
        ])
        
        # Create conversation chain
        self.conversation = ConversationChain(memory=self.memory, prompt=prompt, 
                                              llm=self.llm, verbose=False)
        

    def step(self, prompt):
        """Data scientist chatbot speaks. 
        """
        response = self.conversation.predict(input=prompt)
        
        return response
        

    def _specify_system_message(self):
        """Specify the behavior of the data scientist chatbot.
        """      
        
        # Prompt
        prompt = f"""You are role-playing a data scientist meeting with a representative (which is played by another chatbot) 
        from a {self.industry} company of {self.business_size} size. They are currently concerned with 
        a {self.problem_type} problem.

        Your ultimate goal is to understand the problem in depth and agree on a suitable data science solution or approach 
        by engaging in a conversation with the client representative. 

        Guidelines to keep in mind:
        - **Engage in Conversation**: You are only the data scientist. Do not provide the entire conversation yourself.
        - **Understand the Problem**: Make sure to ask questions to get a clear and detailed understanding of the problem, the desired outcome, available data, constraints, and requirements.
        - **Propose Solutions**: Based on the information provided by the client, suggest possible data science approaches or solutions to address the problem.
        - **Consider Constraints**: Be mindful of any constraints that the client may have, such as budget, timeline, or data limitations, and tailor your proposed solutions accordingly.
        - **Collaborate**: Collaborate with the client to refine the problem definition, proposed solutions, and ultimately agree on a suitable data science approach.
        """

        return prompt

#### 5. Simulate the conversation

In [11]:
# Create two chatbots
client = ClientBot('Azure')
data_scientist = DataScientistBot('Azure')

# Specify instructions
client.instruct(industry, business_size, scenario)
data_scientist.instruct(industry, business_size, problem_type)

In [12]:
# Book-keeping
question_list = []
answer_list = []

# Start conversation
for i in range(6):
    if i == 0:
        question = client.step('Start the conversation')
    else:
        question = client.step(answer)
    question_list.append(question)
    print("👨‍💼 Client: " + question)
    
    answer = data_scientist.step(question)
    answer_list.append(answer)

    print("👩‍💻 Data Scientist: " + answer)
    print("\n\n")

👨‍💼 Client: Hello, thank you for taking the time to meet with me. We are a manufacturing company that produces electronic components for the automotive, health and beauty, and home appliance industries. We have been facing issues with defects in the final products, including faulty components, irregular sizes, and inadequate finishing. The quality control is done manually by the team, which can result in human errors and missed defects. We have invested in traditional quality control tools such as Statistical Process Control (SPC) and Failure Mode and Effect Analysis (FEMA) but still struggling to curb the defects. We are hoping to work with a data scientist to develop a predictive model that can detect anomalies in real-time, in order to reduce the waste produced, minimize recalls, and improve customer satisfaction. We have some data available that we hope can be useful in developing this model.
👩‍💻 Data Scientist: Hello, thank you for providing an overview of the problem, it helps me

👨‍💼 Client: Thank you for providing detailed information regarding the approaches. Based on the information provided, we think that the multi-modal deep learning approach would be more suitable for our needs because we have a reasonable amount of labeled data, and we believe that the neural network could learn from the data to identify anomalies in real-time. However, we would like to consider the feasibility of implementing the solution in our production environment before making a final decision. We also would like to know more about the pre-processing and cleaning of data and the performance metrics used to evaluate the model. Can you provide more details about those?
👩‍💻 Data Scientist: Sure, I can provide more information.

For the pre-processing and cleaning of data, we would perform the following steps:

1. Data Cleaning: We would remove any irrelevant or inconsistent data, handle any missing values, and correct any data entry errors.

2. Feature Engineering: We would extract re
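
The turn-taking in the loop above depends only on each bot exposing a `step` method, so it can be exercised without calling an LLM. The following is a minimal sketch under that assumption; `run_dialogue` and `ScriptedBot` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
def run_dialogue(client, data_scientist, n_rounds, opening="Start the conversation"):
    """Alternate turns between two bots and collect both sides of each round."""
    questions, answers = [], []
    message = opening
    for _ in range(n_rounds):
        # The client speaks first, then the data scientist replies.
        question = client.step(message)
        answer = data_scientist.step(question)
        questions.append(question)
        answers.append(answer)
        # The data scientist's reply becomes the client's next input.
        message = answer
    return questions, answers


class ScriptedBot:
    """Stand-in for ClientBot / DataScientistBot that needs no API access."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def step(self, prompt):
        self.received.append(prompt)
        return f"{self.name}-{len(self.received)}"


stub_client = ScriptedBot("client")
stub_scientist = ScriptedBot("ds")
stub_questions, stub_answers = run_dialogue(stub_client, stub_scientist, n_rounds=3)
print(stub_questions, stub_answers)
```

Because the stubs record what they receive, the message routing can be checked before spending any tokens on the real bots.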

#### 6. Summarizer

The objective of the summarizer is to condense the conversation rounds between the client bot and the data scientist bot.

In [27]:
summarizer_llm = AzureOpenAI(
                    deployment_name="deployment-5af509f3323342ee919481751c6f8b7d",
                    model_name="text-davinci-003",
                    openai_api_base="https://abb-chcrc.openai.azure.com/",
                    openai_api_version="2023-03-15-preview",
                    openai_api_key=os.environ["OPENAI_API_KEY_AZURE"],
                    openai_api_type="azure")

In [50]:
template = """Please concisely summarize the following segment of a conversation between a client and 
a data scientist discussing a potential data science project:

{conversation}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["conversation"],
)

In [51]:
# Loop over individual rounds
conversation_summary = []
for i, (q, a) in enumerate(zip(question_list, answer_list)):
    print(f"Processing {i+1}/{len(question_list)}th conversation round.")
    
    # Compile one round of conversation
    conversation_round = ''
    conversation_round += 'Client: ' + q + '\n\n'
    conversation_round += 'Data scientist: ' + a

    response = summarizer_llm.predict(prompt.format(conversation=conversation_round))
    conversation_summary.append(response)

Processing 1/6th conversation round.
Processing 2/6th conversation round.
Processing 3/6th conversation round.
Processing 4/6th conversation round.
Processing 5/6th conversation round.
Processing 6/6th conversation round.
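
The per-round summaries are later concatenated into a single string for review. One optional variation, sketched here as an assumption rather than as the notebook's method, is to label each summary with its round number so that round boundaries stay visible in the combined text. `format_round` and `number_summaries` are hypothetical helpers, and the example summaries are made up.

```python
def format_round(question, answer):
    """Join one client turn and one data scientist turn into a transcript segment."""
    return "Client: " + question + "\n\n" + "Data scientist: " + answer


def number_summaries(summaries):
    """Prefix each summary with its round number and join them line by line."""
    return "\n".join(
        f"Round {i}: {summary.strip()}"
        for i, summary in enumerate(summaries, start=1)
    )


# Made-up summaries, used only to show the resulting layout.
example_summaries = [
    "\nThe client describes the defect problem.",
    " The data scientist proposes candidate approaches.\n",
]
digest = number_summaries(example_summaries)
print(digest)
```

The `strip()` call removes the leading and trailing whitespace that completion-style models often emit around their responses.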


#### 7. Analyst

The objective of the analyst bot is to examine the conversation between the client bot and the data scientist bot, provide feedback on the strategy the data scientist bot adopted for defining and scoping the problem, and indicate potential areas for improvement.

In [56]:
analyst_llm = AzureOpenAI(
                    deployment_name="deployment-5af509f3323342ee919481751c6f8b7d",
                    model_name="text-davinci-003",
                    openai_api_base="https://abb-chcrc.openai.azure.com/",
                    openai_api_version="2023-03-15-preview",
                    openai_api_key=os.environ["OPENAI_API_KEY_AZURE"],
                    openai_api_type="azure")

In [59]:
template = """You are a senior data scientist who has been asked to review a conversation between a data scientist 
and a client from a {industry} company of {business_size} size, focusing on a {problem_type} problem. 
The client and data scientist are discussing how to define and scope a data science project to address the problem.

Please provide an assessment of the conversation, focusing on the strategy adopted by the data scientist to 
define and scope the problem, any potential room for improvement, and any other key points you think are important.
Please organize your response with nicely formatted bullet points.

Here is the conversation: 
{conversation}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["industry", "business_size", "problem_type", "conversation"],
)

In [60]:
analysis = analyst_llm.predict(prompt.format(industry=industry,
                                            business_size=business_size,
                                            problem_type=problem_type,
                                            conversation=' '.join(conversation_summary)))
print(analysis)


Assessment: 
- The data scientist has adopted an effective strategy to define and scope the problem by providing detailed information on each approach, including the amount of data required, metrics to evaluate performance, expected timelines and costs, and the feasibility and cost of collecting and labeling large amounts of data.
- The data scientist has demonstrated knowledge and expertise in the domain of anomaly detection and has provided the client with a potential solution using a multi-modal deep learning approach. 
- The data scientist has taken into account the client's constraints and needs, offering their expertise and support, and welcoming the client's questions and concerns.
- The data scientist has provided information on the pre-processing and cleaning of data, performance metrics to evaluate the model, as well as the hardware and software requirements, data availability in real-time, and the integration with existing systems.
- There is room for improvement in terms o