# Conversational Issue Insights
## Goals:
- Generate a synthetic dataset of chemical manufacturing issues in English and Chinese using OpenAI's library
- Analyze the database with a POC in a Jupyter notebook to establish topic modeling using LDA
- Move the code and database into something more production-ready
- Create a Power BI dashboard to present the data
- Add the topic modeling insights to the Power BI dashboard
- Make the data conversational
- If time permits, add BERTopic for better performance and insights

## Description:
This project serves as a proof of concept for applying NLP and generative AI to user input data in the chemical manufacturing industry. For this POC specifically, I will analyze user input for the purpose of predictive maintenance.

In [42]:
# Importing the libraries
import string
import time
import nltk
import sys
import openai


#install nltk stopwords
#nltk.download('stopwords')

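from nltk.stem import WordNetLemmatizer
# Alias the corpus reader so the `stopwords` list below does not shadow it
# (otherwise re-running this cell fails on `stopwords.words`)
from nltk.corpus import wordnet, stopwords as nltk_stopwords
from gensim.models.ldamodel import LdaModel as Lda
# from gensim.utils import lemmatize
from gensim.corpora import Dictionary, MmCorpus
from io import StringIO


# Make the project root importable so that `config` can be found
# sys.path.insert(0, '/path/to/directory')
sys.path.insert(0, 'C:/Users/jacob/Programming/ConversationalIssueInsights')

from config import config

# English stop-word list (requires the NLTK 'stopwords' corpus)
stopwords = nltk_stopwords.words('english')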

# Step 1: Generate Synthetic Data with OpenAI

In [22]:
import pandas as pd
from openai import OpenAI

# Set up the OpenAI client with the API key from the local config module
client = OpenAI(api_key=config.openai_api_key)

# Define the fields for the database
fields = ['datetime', 'shift', 'shiftengineer', 'plant', 'location', 'variable', 'value', 'type', 'AC', 'CA', 'Comment']

# Generate synthetic data using OpenAI
response = client.chat.completions.create(
  model="gpt-4-1106-preview",
  messages=[
    {"role": "system", "content": "Retrieve data from Spanish maintenance reports for chemical manufacturing workers in the lithium and bromine industry. The intention of this database is to log reports that can be used to take action on predictive maintenance. The data fields include 'datetime', 'shift', 'shiftengineer', 'plant', 'location', 'variable', 'value', 'type', 'AC', 'CA', 'Comment'. The 'datetime' field will be logged once per hour. 'shift' refers to A, B, or C shifts. 'shiftengineer' are anonymized with a Greek letter. 'plant' refers to the specialized unit within the larger lithium and bromine mining and processing strategy. 'location' will be a section within the plant. 'variable' will generally be a technical term specific to that location, such as 'Saline_Purification_Temperature_Indicator_Controller_Process_Variable_Set_Point_Differential', 'value' will refer to the value of the variable, 'type' will refer to the classification or category of the variable being monitored, or the nature of the value recorded. It could be the type of specification or alarm that the value corresponds to, such as a specification limit, safety limit, operational range, or a performance indicator, etc."},
    {"role": "user", "content": "Provide the datetime, shift, shiftengineer, plant, location, variable, value, type, AC, CA, Comment for a random report row separated by commas."},
    {"role": "assistant", "content": "'11/26/2023 2:00:00, A, Alpha, LAN 1, Reactor Line 2, Saline_Purification_Temperature_Indicator_Controller_Process_Variable_Set_Point_Differential, -4.31321, Spec, Equipment Problem, Change_Set_Point_in_HMI_or_Terminal, HTX_does_not_reach_operating_temperature_enables_safety_measures'"},
    {"role": "user", "content": "Please provide the datetime, shift, shiftengineer, plant, location, variable, value, type, AC, CA, Comment for a random report row separated by commas."}
  ]
)

# Extract the generated data from the OpenAI response
data = response.choices[0].message.content

print(data)

# Create a dataframe from the generated data
# df = pd.DataFrame([row.split(',') for row in data], columns=fields)

# Print the dataframe
# print(df)

As an AI, I don't have access to real-time databases or any external databases to retrieve live or historical data. However, I can generate a fictional example based on the requested format. Please use this as a placeholder and replace it with actual data from your records:

'03/15/2023 08:00:00, B, Gamma, BRX-2, Evaporation Sector, Bromine_Concentration_Level, 26.5, Operational Range, Regular Monitoring, None, Bromine levels within expected operational range.' 

Remember to replace this with actual data from your maintenance reports for chemical manufacturing in the lithium and bromine industry.
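
The reply above wraps the single usable row in explanatory prose, so it cannot be handed to a CSV parser as-is. One way to recover the row is sketched below. The helper name `extract_rows` and the comma-count heuristic are assumptions made for illustration, not part of the original notebook; the heuristic relies on a report row having 11 fields and therefore at least 10 commas.

```python
def extract_rows(reply: str, n_fields: int = 11) -> list[str]:
    """Return the lines of a model reply that look like report rows.

    Hypothetical helper: a line is treated as a row when it contains at
    least n_fields - 1 commas. Wrapping quotes and whitespace are removed.
    """
    rows = []
    for line in reply.splitlines():
        # Drop surrounding whitespace, then any wrapping single or double quotes
        candidate = line.strip().strip("'\"").strip()
        if candidate.count(",") >= n_fields - 1:
            rows.append(candidate)
    return rows
```

Calling `extract_rows(data)` on the reply above would keep only the quoted report line and drop the two prose paragraphs. A prose sentence with ten or more commas would be a false positive, so this is a filter for quick experiments rather than a robust parser.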


In [44]:
import pandas as pd
from openai import OpenAI

# Set up the OpenAI client with the API key from the local config module
client = OpenAI(api_key=config.openai_api_key)

# Define the fields for the database
fields = ['datetime', 'shift', 'shiftengineer', 'plant', 'location', 'variable', 'value', 'type', 'AC', 'CA', 'Comment']

# Generate synthetic data using OpenAI
response = client.chat.completions.create(
  model="gpt-4-1106-preview",
  messages=[
    {"role": "system", "content": "You are a machine that retrieves Spanish maintenance reports for chemical manufacturing workers in the lithium and bromine industry which were created as logs used to take action on predictive maintenance. The data fields include 'datetime', 'shift', 'shiftengineer', 'plant', 'location', 'variable', 'value', 'type', 'AC', 'CA', 'Comment'. The 'datetime' field will be logged once per hour. 'shift' refers to A, B, or C shifts. 'shiftengineer' are anonymized with a Greek letter. 'plant' refers to the specialized unit within the larger lithium and bromine mining and processing strategy. 'location' will be a section within the plant. 'variable' will generally be a technical term specific to that location, such as 'Saline_Purification_Temperature_Indicator_Controller_Process_Variable_Set_Point_Differential', 'value' will refer to the value of the variable, 'type' will refer to the classification or category of the variable being monitored, or the nature of the value recorded. It could be the type of specification or alarm that the value corresponds to, such as a specification limit, safety limit, operational range, or a performance indicator, etc. You will always reply with a series of reports in csv format, with the fields 'datetime', 'shift', 'shiftengineer', 'plant', 'location', 'variable', 'value', 'type', 'AC', 'CA', 'Comment'."},
    {"role": "user", "content": "Provide the datetime, shift, shiftengineer, plant, location, variable, value, type, AC, CA, Comment for a random report row separated by commas."},
    {"role": "assistant", "content": "'11/26/2023 2:00:00, A, Alpha, LAN 1, Reactor Line 2, Saline_Purification_Temperature_Indicator_Controller_Process_Variable_Set_Point_Differential, -4.31321, Spec, Equipment Problem, Change_Set_Point_in_HMI_or_Terminal, HTX_does_not_reach_operating_temperature_enables_safety_measures'"},
    {"role": "user", "content": "Please provide the datetime, shift, shiftengineer, plant, location, variable, value, type, AC, CA, Comment for 3 random report rows"}
  ]
)

# Extract the generated data from the OpenAI response
data = response.choices[0].message.content
print(data)



'2023-07-15 08:00:00, B, Gamma, LEX-3, Evaporation Unit 4, Pressure_Transmitter_Reading, 2.56, Operational, Routine_Check, Adjust_Valve_4B, Pressure_below_optimal_range'

'2023-07-15 09:00:00, B, Gamma, BRX-2, Collection Basin A1, Bromine_Concentration_Level, 5.22, Safety, Emergency_Shutdown, Initiate_Containment_Protocol, Exceeds_maximum_allowable_concentration'

'2023-07-15 10:00:00, B, Gamma, LIX-5, Separator Vessel 7, Vibration_Frequency_Output, 0.88, Performance, Asset_Optimization, Rebalance_Rotating_Equipment, Unusual_vibration_detected_indicating_possible_wear'
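
The system prompt states a few constraints (hourly timestamps, shifts A, B, or C, a numeric value), and the sample rows so far use two different date formats. A minimal sanity check for a generated row, once it has been split into named fields, might look like the sketch below. The function names, the two accepted date formats, and the reading of "logged once per hour" as "on the hour" are assumptions for illustration.

```python
from datetime import datetime

# Assumption: the two date formats seen in the sample rows so far
DATETIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M:%S")
VALID_SHIFTS = {"A", "B", "C"}


def parse_datetime(text: str):
    """Try each known format; return a datetime, or None if none match."""
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def validate_row(row: dict) -> list[str]:
    """Return a list of problems found in one report row (empty if none)."""
    problems = []
    if row.get("shift") not in VALID_SHIFTS:
        problems.append("shift is not A, B, or C")
    stamp = parse_datetime(row.get("datetime") or "")
    if stamp is None:
        problems.append("datetime could not be parsed")
    elif stamp.minute or stamp.second:
        problems.append("datetime is not on the hour")
    try:
        float(row.get("value") or "")
    except ValueError:
        problems.append("value is not numeric")
    return problems
```

Rows that return a non-empty list could be dropped or regenerated before they reach the topic-modeling step.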


In [48]:
convertedData = StringIO(data)
# Create a dataframe from the generated data
df = pd.read_csv(convertedData, sep=',', names=fields)
# df = pd.DataFrame([row.split(',') for row in data], columns=fields)

# Print the dataframe
print(df)

               datetime shift shiftengineer   plant              location  \
0  '2023-07-15 08:00:00     B         Gamma   LEX-3    Evaporation Unit 4   
1  '2023-07-15 09:00:00     B         Gamma   BRX-2   Collection Basin A1   
2  '2023-07-15 10:00:00     B         Gamma   LIX-5    Separator Vessel 7   

                        variable  value          type                   AC  \
0   Pressure_Transmitter_Reading   2.56   Operational        Routine_Check   
1    Bromine_Concentration_Level   5.22        Safety   Emergency_Shutdown   
2     Vibration_Frequency_Output   0.88   Performance   Asset_Optimization   

                               CA  \
0                 Adjust_Valve_4B   
1   Initiate_Containment_Protocol   
2    Rebalance_Rotating_Equipment   

                                             Comment  
0                      Pressure_below_optimal_range'  
1           Exceeds_maximum_allowable_concentration'  
2   Unusual_vibration_detected_indicating_possibl...  
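
The printed frame shows two leftovers from the raw reply: the wrapping single quotes are still attached to `datetime` and `Comment`, and every field after the first keeps its leading space. A possible clean-up is sketched below with the standard-library `csv` module so it runs without extra dependencies. The helper name `parse_reports` is hypothetical, and the sketch assumes one report per line with no commas inside a field.

```python
import csv
from io import StringIO

FIELDS = ['datetime', 'shift', 'shiftengineer', 'plant', 'location',
          'variable', 'value', 'type', 'AC', 'CA', 'Comment']


def parse_reports(reply: str) -> list[dict]:
    """Parse a model reply into one dict per report row (hypothetical helper)."""
    # Skip blank lines and remove the single quotes wrapping each row
    cleaned = [line.strip().strip("'") for line in reply.splitlines() if line.strip()]
    reader = csv.DictReader(
        StringIO("\n".join(cleaned)),
        fieldnames=FIELDS,
        skipinitialspace=True,  # ignore the space that follows each comma
    )
    return [dict(row) for row in reader]
```

`pd.DataFrame(parse_reports(data))` would then give a frame without the stray quotes or padding. Alternatively, the cleaned text can be passed to `pd.read_csv(..., names=fields, skipinitialspace=True)` directly. All values remain strings here, so `value` still needs a numeric conversion.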
