### Objective

In this notebook, we develop a data scientist trainer that simulates the process of defining a data science project through a conversation between a client bot and a data scientist bot.

### 1. Load libraries

In [1]:
from langchain.prompts import (
    PromptTemplate,
    ChatPromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chat_models import ChatOpenAI, AzureChatOpenAI
from langchain.llms import OpenAI, AzureOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
import utilities
import os

### 2. Scenario generator

In [2]:
problem_type_list = ["classification", "regression", "clustering",
                    "anomaly detection", "recommendation", "time series analysis",
                    "natural language processing", "computer vision"]
business_size_list = ["small (less than 100 employees)",
                    "medium (100-500 employees)",
                    "large (more than 500 employees)"]
industry_list = ["healthcare", "finance", "retail", "technology",
                "manufacturing", "transportation", "energy",
                "real estate", "education", "government", "non-profit"]

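The three lists above are defined but the later cells set `industry`, `business_size`, and `problem_type` by hand. As a minimal sketch of how a scenario could instead be drawn at random from those lists, the snippet below uses the standard `random` module. The helper name `sample_scenario_inputs` and the fixed seed are assumptions for illustration, not part of the original notebook.

```python
import random

# Same option lists as in the cell above, repeated so the sketch runs on its own.
problem_type_list = ["classification", "regression", "clustering",
                     "anomaly detection", "recommendation", "time series analysis",
                     "natural language processing", "computer vision"]
business_size_list = ["small (less than 100 employees)",
                      "medium (100-500 employees)",
                      "large (more than 500 employees)"]
industry_list = ["healthcare", "finance", "retail", "technology",
                 "manufacturing", "transportation", "energy",
                 "real estate", "education", "government", "non-profit"]


def sample_scenario_inputs(rng):
    """Pick one industry, business size, and problem type (hypothetical helper)."""
    return {
        "industry": rng.choice(industry_list),
        "business_size": rng.choice(business_size_list),
        "problem_type": rng.choice(problem_type_list),
    }


# A seeded generator makes the draw reproducible between notebook runs.
rng = random.Random(0)
scenario_inputs = sample_scenario_inputs(rng)
print(scenario_inputs)
```

The returned dictionary has the same keys as three of the four prompt variables used further down, so it could be unpacked into the prompt formatting call together with a `details` string.
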
In [9]:
# Option 1: OpenAI API (run either this cell or the Azure cell below)
scen_generator = OpenAI(model_name="text-davinci-003", temperature=1.0)

In [4]:
# Option 2: Azure OpenAI deployment (replaces the OpenAI generator if both cells are run)
scen_generator = AzureOpenAI(
                    deployment_name="deployment-5af509f3323342ee919481751c6f8b7d",
                    model_name="text-davinci-003",
                    openai_api_base="https://abb-chcrc.openai.azure.com/",
                    openai_api_version="2023-03-15-preview",
                    openai_api_key=os.environ["OPENAI_API_KEY_AZURE"],
                    openai_api_type="azure",
                    temperature=1.0
                )

In [12]:
template = """For a {industry} company of {business_size} size focusing on {problem_type} problems, 
generate a concrete data science project scenario that a data scientist might encounter in real life. 
Please provide concrete and specific details relevant to the selected industry and problem type.
For example, if the industry is {industry}, the specific details could include {details}.

For the generated scenario, please provide:
1. A concrete, specific and realistic description of a problem faced by the company.
2. The desired outcome that the company is hoping to achieve by solving the problem.
3. A list of potential data sources that might be available for solving the problem.

Output format:
Problem description: [specific and realistic content of problem description]
Desired outcome: [specific and realistic content of desired outcome]
Available data: [list of potential data sources]
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["industry", "business_size", "problem_type", "details"],
)

Please provide a concrete, specific, and realistic description of a problem faced by a medium-sized manufacturing company specializing in producing electronic devices. Include details such as the types of products manufactured, machines used in the production process, common issues faced by the company, and tools and technologies used for quality control.


In [14]:
industry = "manufacturing"
business_size = "medium"
problem_type = "anomaly detection"
details = "Types of products manufactured, machines used in the production process, \
common issues faced by the company, tools and technologies used for quality control."

# Generate scenarios
response = scen_generator.predict(prompt.format(industry=industry, 
                                                business_size=business_size, 
                                                problem_type=problem_type,
                                                details=details))
print(response)


Problem description: A medium-sized manufacturing company is producing a wide variety of products such as electronic devices, furniture, and kitchen appliances. Currently, the company is facing a quality control problem because it is difficult to identify anomalous production results due to the uneven production environment. 

Desired outcome: The desired outcome for this company is to be able to accurately detect and identify anomalous production results through machine learning models to enable timely corrective action.

Available data: The following data sources can be used to solve the problem: 
(1) Machine data: recorded data from various production machines such as temperature, pressure, and flow readings.
(2) Specifications data: recorded specifications of parts used in the manufacturing process.
(3) Quality assurance data: recorded raw material and finished product inspections info.
(4) Production log data: recorded production logs from the factory. 
(5) Historical production 
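
The output above stops mid-item at "(5) Historical production", which suggests the completion ran into a token limit. The sketch below is a simple heuristic check for that situation, so a truncated scenario can be regenerated before it is handed to the bots. The function name `looks_truncated` and the punctuation rule are assumptions for illustration. If truncation is detected, one thing to try is passing a larger `max_tokens` value when constructing the LLM wrapper, assuming the installed LangChain version exposes that argument.

```python
def looks_truncated(text: str) -> bool:
    """Heuristic: a finished answer usually ends with closing punctuation."""
    stripped = text.rstrip()
    # Empty text is treated as "not truncated"; callers may want to handle it separately.
    return bool(stripped) and stripped[-1] not in ".!?)]\"'"


# Tail of the response printed above, which ends in the middle of item (5).
sample_tail = (
    "(4) Production log data: recorded production logs from the factory. \n"
    "(5) Historical production "
)
complete_text = "Desired outcome: enable timely corrective action."

print(looks_truncated(sample_tail))
print(looks_truncated(complete_text))
```

This only looks at the last character, so it can miss a cut that happens to land right after a full stop. It is meant as a cheap first check rather than a guarantee.
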

#### Parse results

In [None]:
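# Only the three known labels start a new section; numbered items such as
# "(1) Machine data: ..." also contain ': ' and are appended to the current one.
section_keys = ("Problem description", "Desired outcome", "Available data")
scenario, current_key = {}, None
for line in response.split('\n'):
    key, sep, value = line.partition(': ')
    if sep and key.strip() in section_keys:
        current_key = key.strip()
        scenario[current_key] = value.strip()
    elif current_key and line.strip():
        scenario[current_key] += '\n' + line.strip()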

### 3. Define two chatbots

#### 3.1 Client bot

In [None]:
class ClientBot():
    """Class definition for the client bot, created with LangChain."""

    
    def __init__(self, engine):
        """Set up the client bot.
        """
        
        if engine == 'OpenAI':
            self.llm = ChatOpenAI(
                model_name='gpt-3.5-turbo',
                temperature=0.8
            )

        elif engine == 'Azure':
            self.llm = AzureChatOpenAI(openai_api_base="https://abb-chcrc.openai.azure.com/",
                    openai_api_version="2023-03-15-preview",
                    openai_api_key=os.environ["OPENAI_API_KEY_AZURE"],
                    openai_api_type="azure",
                    deployment_name="gpt-35-turbo-0301",
                    temperature=0.8)

        else:
            raise ValueError("Currently unsupported chat model type!")
            
        
        # Instantiate memory
        self.memory = ConversationBufferMemory(return_messages=True)

        
    def instruct(self, industry, business_size, scenario):
        """Determine the context of client chatbot. 
        """
        
        self.industry = industry
        self.business_size = business_size
        self.scenario = scenario
        
        # Define prompt template
        prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(self._specify_system_message()),
            MessagesPlaceholder(variable_name="history"),
            HumanMessagePromptTemplate.from_template("""{input}""")
        ])
        
        # Create conversation chain
        self.conversation = ConversationChain(memory=self.memory, prompt=prompt, 
                                              llm=self.llm, verbose=False)
        

    def step(self, prompt):
        """Client chatbot speaks. 
        """
        response = self.conversation.predict(input=prompt)
        
        return response
        

    def _specify_system_message(self):
        """Specify the behavior of the client chatbot.
        """      
        
        # Prompt
        prompt = f"""You are role-playing a representative from a {self.industry} company of {self.business_size} size and 
        you are meeting with a data scientist (which is played by another bot), to discuss how to leverage machine learning 
        to address a problem your company is facing. 
        
        The problem description, desired outcome, and available data are:
        {self.scenario}.
        
        Your ultimate goal is to work with the data scientist to define a clear problem and agree on a suitable data science solution or approach.

        Guidelines to keep in mind:
        - **Engage in Conversation**: You are only the client. Do not provide the entire conversation yourself.
        - **Clarify and Confirm**: Always make sure to clarify and confirm the problem, desired outcome, and any proposed solutions with the data scientist. 
        - **Stay in Role**: Your role as a client is to represent your company's needs and work with the data scientist to define a clear problem and agree on a suitable data science solution or approach.
        - **Provide Detailed Information**: Provide as much detail as possible about the problem, available data, constraints, and requirements. If the data scientist asks a question and the information was not provided in the problem description, it is okay to improvise and create details that seem reasonable.
        - **Collaborate**: Collaborate with the data scientist to clearly define the problem and to consider any proposed solutions or approaches.
        """

        return prompt
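
One thing to watch in `_specify_system_message`: the scenario text is interpolated with an f-string, and the result is then passed to `SystemMessagePromptTemplate.from_template`, which treats `{name}` patterns as template variables. If a generated scenario ever contains literal curly braces (for example a JSON snippet), the template may report unexpected variables. A minimal sketch of escaping the scenario first is shown below. The helper `escape_braces` is hypothetical, and the sketch assumes the template uses Python's `str.format` conventions, where doubled braces are read as literal braces.

```python
def escape_braces(text: str) -> str:
    """Double every brace so format-style templating treats it as a literal."""
    return text.replace("{", "{{").replace("}", "}}")


# Illustrative scenario text containing literal braces (not real model output).
scenario_text = 'Available data: sensor logs as JSON, e.g. {"temp": 71.5}'
escaped = escape_braces(scenario_text)

# Formatting the escaped text with no variables gives back the original text.
restored = escaped.format()
print(restored == scenario_text)
```

With such a helper, the scenario could be escaped once in `instruct` before it is stored on the bot, so the f-string in `_specify_system_message` stays unchanged.
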

You are role-playing a representative from a {self.industry} company of {self.business_size} size and 
you are meeting with a data scientist (played by another bot) to discuss how to leverage machine learning 
to address a specific problem your company is facing. 

The problem description, desired outcome, and available data are:
{self.scenario}.

Your ultimate goal is to work with the data scientist to define a clear problem and agree on a suitable data science solution or approach.

Guidelines to keep in mind:
- **Get Straight to the Point**: Start the conversation by directly addressing the problem at hand. There is no need for pleasantries or introductions.
- **Engage in Conversation**: Respond to the data scientist's questions and prompts. Do not provide all the information at once or provide the entire conversation yourself.
- **Clarify and Confirm**: Always make sure to clarify and confirm the problem, desired outcome, and any proposed solutions with the data scientist.
- **Stay in Role**: Your role as a client is to represent your company's needs and work with the data scientist to define a clear problem and agree on a suitable data science solution or approach.
- **Provide Information as Needed**: Provide information about the problem, available data, constraints, and requirements as it becomes relevant in the conversation. If the data scientist asks a question and the information was not provided in the problem description, it is okay to improvise and create details that seem reasonable.
- **Collaborate**: Collaborate with the data scientist to clearly define the problem and to consider any proposed solutions or approaches.


#### 3.2 Data scientist bot

In [None]:
class DataScientistBot():
    """Class definition for the data scientist bot, created with LangChain."""

    
    def __init__(self, engine):
        """Set up the data scientist bot.
        """
        
        if engine == 'OpenAI':
            self.llm = ChatOpenAI(
                model_name='gpt-3.5-turbo',
                temperature=0.8
            )

        elif engine == 'Azure':
            self.llm = AzureChatOpenAI(openai_api_base="https://abb-chcrc.openai.azure.com/",
                    openai_api_version="2023-03-15-preview",
                    openai_api_key=os.environ["OPENAI_API_KEY_AZURE"],
                    openai_api_type="azure",
                    deployment_name="gpt-35-turbo-0301",
                    temperature=0.8)

        else:
            raise ValueError("Currently unsupported chat model type!")
            
        
        # Instantiate memory
        self.memory = ConversationBufferMemory(return_messages=True)

        
    def instruct(self, industry, business_size, problem_type):
        """Determine the context of data scientist chatbot. 
        """
        
        self.industry = industry
        self.business_size = business_size
        self.problem_type = problem_type
        
        # Define prompt template
        prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(self._specify_system_message()),
            MessagesPlaceholder(variable_name="history"),
            HumanMessagePromptTemplate.from_template("""{input}""")
        ])
        
        # Create conversation chain
        self.conversation = ConversationChain(memory=self.memory, prompt=prompt, 
                                              llm=self.llm, verbose=False)
        

    def step(self, prompt):
        """Data scientist chatbot speaks. 
        """
        response = self.conversation.predict(input=prompt)
        
        return response
        

    def _specify_system_message(self):
        """Specify the behavior of the data scientist chatbot.
        """      
        
        # Prompt
        prompt = f"""You are role-playing a data scientist meeting with a representative (which is played by another chatbot) 
        from a {self.industry} company of {self.business_size} size. They are currently concerned with 
        a {self.problem_type} problem.

        Your ultimate goal is to understand the problem in depth and agree on a suitable data science solution or approach 
        by engaging in a conversation with the client representative. 

        Guidelines to keep in mind:
        - **Engage in Conversation**: You are only the data scientist. Do not provide the entire conversation yourself.
        - **Understand the Problem**: Make sure to ask questions to get a clear and detailed understanding of the problem, the desired outcome, available data, constraints, and requirements.
        - **Propose Solutions**: Based on the information provided by the client, suggest possible data science approaches or solutions to address the problem.
        - **Consider Constraints**: Be mindful of any constraints that the client may have, such as budget, timeline, or data limitations, and tailor your proposed solutions accordingly.
        - **Collaborate**: Collaborate with the client to refine the problem definition, proposed solutions, and ultimately agree on a suitable data science approach.
        """

        return prompt

### 4. Simulating the conversation

In [None]:
# Create two chatbots
client = ClientBot('Azure')
data_scientist = DataScientistBot('Azure')

# Specify instructions
client.instruct(industry, business_size, response)
data_scientist.instruct(industry, business_size, problem_type)

In [None]:
# Book-keeping
question_list = []
answer_list = []

# Start conversation
for i in range(6):
    if i == 0:
        question = client.step('Start the conversation')
    else:
        question = client.step(answer)
    question_list.append(question)
    print("👨‍💼 Client: " + question)
    
    answer = data_scientist.step(question)
    answer_list.append(answer)

    print("👩‍💻 Data Scientist: " + answer)
    print("\n\n")
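
The turn-taking logic above can be tried without any API calls by swapping in stand-in bots. The sketch below wraps the same alternating loop in a function and runs it with a hypothetical `ScriptedBot` class that only mimics the `step()` interface of `ClientBot` and `DataScientistBot`. Both `ScriptedBot` and `simulate` are illustrative names, not part of the original notebook.

```python
class ScriptedBot:
    """Hypothetical stand-in exposing the same step() method as the real bots."""

    def __init__(self, name):
        self.name = name
        self.turn = 0

    def step(self, prompt):
        # Return a canned reply that records the turn number and part of the input.
        self.turn += 1
        return f"{self.name} reply {self.turn} to: {prompt[:30]}"


def simulate(client, data_scientist, n_rounds=6, opener="Start the conversation"):
    """Alternate client and data scientist turns and collect the transcript."""
    transcript = []
    message = opener
    for _ in range(n_rounds):
        question = client.step(message)          # client speaks first
        answer = data_scientist.step(question)   # data scientist responds
        transcript.append({"client": question, "data_scientist": answer})
        message = answer                         # the answer feeds the next client turn
    return transcript


transcript = simulate(ScriptedBot("client"), ScriptedBot("ds"), n_rounds=3)
for turn in transcript:
    print("Client: " + turn["client"])
    print("Data Scientist: " + turn["data_scientist"])
```

Keeping each round as one dictionary stores the question and its answer together, which may be easier to save or inspect than two parallel lists.
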

### Prompt repository

#### Client bot

In [None]:
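# Archived copy of the client system prompt. It is kept as a plain string (not an
# f-string) because `self` is not defined outside the class.
prompt = """You are role-playing a representative from a {self.industry} company of {self.business_size} size and 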
you are meeting with a data scientist (which is played by another bot), to discuss how to leverage machine learning 
to address a problem your company is facing. 

The problem description, desired outcome, and available data are:
{self.scenario}.

Your ultimate goal is to work with the data scientist to define a clear problem and agree on a suitable data science solution or approach.

Guidelines to keep in mind:
- **Engage in Conversation**: You are only the client. Do not provide the entire conversation yourself.
- **Clarify and Confirm**: Always make sure to clarify and confirm the problem, desired outcome, and any proposed solutions with the data scientist. 
- **Stay in Role**: Your role as a client is to represent your company's needs and work with the data scientist to define a clear problem and agree on a suitable data science solution or approach.
- **Provide Detailed Information**: Provide as much detail as possible about the problem, available data, constraints, and requirements. If the data scientist asks a question and the information was not provided in the problem description, it is okay to improvise and create details that seem reasonable.
- **Collaborate**: Collaborate with the data scientist to clearly define the problem and to consider any proposed solutions or approaches.
"""