# Generative AI Use Cases for GTTS

---
Welcome to Amazon [SageMaker JumpStart](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html)! You can use SageMaker JumpStart to solve many machine learning tasks with one click in SageMaker Studio, or through the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/overview.html#use-prebuilt-models-with-sagemaker-jumpstart).


In this demo notebook, we show how to use the SageMaker Python SDK to deploy foundation models as an endpoint and use them for various NLP tasks. These foundation models perform **Text2Text Generation**: they take a text prompt as input and return the text the model generates in response to that prompt.

Here, we show how to use the state-of-the-art pre-trained **FLAN T5 models** from [Hugging Face](https://huggingface.co/docs/transformers/model_doc/flan-t5) for Text2Text Generation in the following tasks. You can use FLAN-T5 models directly for many NLP tasks, without fine-tuning them.


* Text summarization
* Common sense reasoning / natural language inference
* Question and answering
* Sentence / sentiment classification
* Translation
* Pronoun resolution
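Every call to the endpoint exchanges plain JSON. The following is a minimal offline sketch. It assumes the request shape used later in this notebook (a `text_inputs` prompt plus generation parameters) and a response body with a `generated_texts` list. `build_payload` and `parse_generated_texts` are hypothetical helper names, and the response bytes are made up for illustration:

```python
import json


def build_payload(prompt, **generation_params):
    """Encode a prompt plus optional generation parameters as a UTF-8 JSON body."""
    payload = {"text_inputs": prompt, **generation_params}
    return json.dumps(payload).encode("utf-8")


def parse_generated_texts(body):
    """Decode a JSON response body and return its list of generated texts."""
    return json.loads(body)["generated_texts"]


# Offline round trip: nothing is sent to an endpoint here.
request_body = build_payload("Summarize: The meeting moved to Friday.", max_length=50)
print(json.loads(request_body))
print(parse_generated_texts(b'{"generated_texts": ["Meeting moved to Friday."]}'))
```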

---

1. [Set Up](#1.-Set-Up)
2. [Select a model](#2.-Select-a-model)
3. [Retrieve Artifacts & Deploy an Endpoint](#3.-Retrieve-Artifacts-&-Deploy-an-Endpoint)
4. [Query endpoint and parse response](#4.-Query-endpoint-and-parse-response)
5. [Advanced features: How to use various parameters to control the generated text](#5.-Advanced-features:-How-to-use-various-advanced-parameters-to-control-the-generated-text)
6. [Advanced features: How to use prompt engineering to solve different tasks](#6.-Advacned-features:-How-to-use-prompts-engineering-to-solve-different-tasks)
7. [Clean up the endpoint](#5.-Clean-up-the-endpoint)

Note: This notebook was tested on an ml.t3.medium instance in Amazon SageMaker Studio with the Python 3 (Data Science) kernel, and on an Amazon SageMaker notebook instance with the conda_python3 kernel.

# 1. Set Up and Model Upload

## Install Libraries

---
Before running the notebook, complete a few setup steps. This notebook requires ipywidgets.

---

In [2]:
!pip install ipywidgets==7.0.0 --quiet
!pip install --upgrade sagemaker --quiet

[0m[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sagemaker-datawrangler 0.4.3 requires sagemaker-data-insights==0.4.0, but you have sagemaker-data-insights 0.3.3 which is incompatible.[0m[31m
[0m[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spyder 5.1.5 requires pyqt5<5.13, which is not installed.
spyder 5.1.5 requires pyqtwebengine<5.13, which is not installed.
awscli 1.27.153 requires PyYAML<5.5,>=3.10, but you have pyyaml 6.0 which is incompatible.
docker-compose 1.29.2 requires PyYAML<6,>=3.10, but you have pyyaml 6.0 which is incompatible.
jupyterlab 3.2.1 requires jupyter-server~=1.4, but you have jupyter-server 2.6.0 which is incompatible.
jupyterlab 3.2.1 requires nbclassic~=0.2, but you have nbclassic 1.0.0 wh

#### Permissions and environment variables

---
To host on Amazon SageMaker, we need to set up and authenticate the use of AWS services. Here, we use the execution role associated with the current notebook as the AWS account role with SageMaker access. 

---

In [3]:
import sagemaker, boto3, json
from sagemaker.session import Session

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name

# Terminal formatting helpers used when printing prompts and results below.
newline, bold, unbold = "\n", "\033[1m", "\033[0m"



## Select a pre-trained model
***
You can continue with the default model, or choose a different model from the dropdown generated when you run the next cell. A complete list of SageMaker pre-trained models is also available at [SageMaker pre-trained Models](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html#).
***

In [4]:
model_id, model_version = (
    "huggingface-text2text-flan-t5-xl",
    "*",
)

***
[Optional] Select a different SageMaker pre-trained model. Here, we download the model_manifest file from the Built-In Algorithms S3 bucket, filter out all Text2Text Generation models, and select a model for inference.
***
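`list_jumpstart_models` evaluates the `task == text2text` filter. The sketch below is purely an offline illustration of a similar selection made from the IDs alone. It assumes JumpStart model IDs follow a `<framework>-<task>-<model>` pattern, which holds for the IDs used in this notebook but is not a documented guarantee. `filter_by_task` is a hypothetical helper, and the third ID is only an illustrative non-matching example:

```python
def filter_by_task(model_ids, task):
    """Keep IDs whose second hyphen-separated component equals `task`.

    Assumes IDs look like '<framework>-<task>-<model name>'.
    """
    selected = []
    for model_id in model_ids:
        parts = model_id.split("-", 2)
        if len(parts) == 3 and parts[1] == task:
            selected.append(model_id)
    return selected


candidate_ids = [
    "huggingface-text2text-flan-t5-xl",
    "huggingface-text2text-bigscience-t0pp",
    "huggingface-textgeneration-gpt2",  # illustrative non-matching ID
]
print(filter_by_task(candidate_ids, "text2text"))
```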

In [5]:
from ipywidgets import Dropdown
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Retrieve all Text2Text Generation models available as SageMaker built-in algorithms.
filter_value = "task == text2text"
text_generation_models = list_jumpstart_models(filter=filter_value)

# display the model-ids in a dropdown to select a model for inference.
model_dropdown = Dropdown(
    options=text_generation_models,
    value=model_id,
    description="Select a model",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)

#### Choose a model for inference

In [6]:
display(model_dropdown)


In [7]:
# model_version="*" fetches the latest version of the model
model_id, model_version = model_dropdown.value, "*"

## Retrieve Artifacts & Deploy an Endpoint

***

Using SageMaker, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the `deploy_image_uri`, `deploy_source_uri`, and `model_uri` for the pre-trained model. To host the pre-trained model, we create an instance of [`sagemaker.model.Model`](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it. This may take a few minutes.

***

In [8]:
def get_sagemaker_session(local_download_dir) -> sagemaker.Session:
    """Return the SageMaker session."""

    sagemaker_client = boto3.client(
        service_name="sagemaker", region_name=boto3.Session().region_name
    )

    session_settings = sagemaker.session_settings.SessionSettings(
        local_download_dir=local_download_dir
    )

    # Build a session that uses this client and downloads artifacts to local_download_dir.
    session = sagemaker.session.Session(
        sagemaker_client=sagemaker_client, settings=session_settings
    )

    return session

Create a local directory to hold the downloaded model artifacts.

In [9]:
!mkdir -p download_dir
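You can also create the same directory from Python instead of a shell command. `os.makedirs` with `exist_ok=True` behaves like `mkdir -p` and does nothing if the directory already exists:

```python
import os

# Create download_dir (and any missing parents); no error if it already exists.
os.makedirs("download_dir", exist_ok=True)
print(os.path.isdir("download_dir"))
```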

---
This text-to-text generation task supports a wide variety of model sizes with different compute requirements. For several large models, we set environment variables that limit the model server (Multi Model Server or TorchServe) to a single worker per model. Because no additional model copies consume GPU memory, the endpoint can support the longest possible token lengths. The instance type itself is retrieved with `instance_types.retrieve_default` in the deployment cell.

---

In [10]:
_large_model_env = {
    "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
    "TS_DEFAULT_WORKERS_PER_MODEL": "1"
}
_model_env_variable_map = {
    "huggingface-text2text-flan-t5-xxl": _large_model_env,
    "huggingface-text2text-flan-t5-xxl-fp16": _large_model_env,
    "huggingface-text2text-flan-t5-xxl-bnb-int8": _large_model_env,
    "huggingface-text2text-flan-t5-xl": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
    "huggingface-text2text-flan-t5-large": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
    "huggingface-text2text-flan-ul2-bf16": _large_model_env,
    "huggingface-text2text-bigscience-t0pp": _large_model_env,
    "huggingface-text2text-bigscience-t0pp-fp16": _large_model_env,
    "huggingface-text2text-bigscience-t0pp-bnb-int8": _large_model_env,
}
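The next cell chooses the `Model` arguments depending on whether `model_id` appears in `_model_env_variable_map`. Below is a minimal sketch of that decision as a plain function, so you can check it without AWS access. `model_kwargs_for` and the placeholder URIs are hypothetical. The real cell also passes `role`, `predictor_cls`, and `name`, and, in the second branch, a custom `sagemaker_session`:

```python
def model_kwargs_for(model_id, env_map, image_uri, model_uri, source_uri):
    """Return the varying keyword arguments for sagemaker.model.Model (sketch only).

    Models listed in env_map use pre-packaged artifacts plus environment overrides;
    all other models need the inference script via source_dir and entry_point.
    """
    kwargs = {"image_uri": image_uri, "model_data": model_uri}
    if model_id in env_map:
        kwargs["env"] = env_map[model_id]
    else:
        kwargs["source_dir"] = source_uri
        kwargs["entry_point"] = "inference.py"
    return kwargs


# Placeholder values; real URIs come from image_uris, model_uris and script_uris.
example_env_map = {"huggingface-text2text-flan-t5-xl": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"}}
for candidate in ["huggingface-text2text-flan-t5-xl", "example-unlisted-model"]:
    print(candidate, model_kwargs_for(
        candidate, example_env_map,
        image_uri="example-image-uri",
        model_uri="s3://example-bucket/model.tar.gz",
        source_uri="s3://example-bucket/sourcedir.tar.gz",
    ))
```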

In [11]:
from sagemaker import image_uris, instance_types, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base


endpoint_name = name_from_base(f"jumpstart-example-{model_id}")

# Retrieve the inference instance type for the specified model.
instance_type = instance_types.retrieve_default(
    model_id=model_id, model_version=model_version, scope="inference"
)

# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=instance_type,
)

# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance
if model_id in _model_env_variable_map:
    # For these large models, the inference script and model artifacts are already
    # repacked, so the `source_dir` argument to Model is not required.
    model = Model(
        image_uri=deploy_image_uri,
        model_data=model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        env=_model_env_variable_map[model_id],
    )
else:
    model = Model(
        image_uri=deploy_image_uri,
        source_dir=deploy_source_uri,
        model_data=model_uri,
        entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        sagemaker_session=get_sagemaker_session("download_dir"),
    )

# deploy the Model. Note that we need to pass Predictor class when we deploy model through Model class,
# for being able to run inference through the sagemaker API.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)


# 2. Introductory Examples

### Reasoning:

In [182]:
payload = {
    "text_inputs": "Can Geoffrey Hinton have a conversation with George Washington? \n Give the rationale before answering.",
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 3,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


def query_endpoint_with_json_payload(encoded_json, endpoint_name):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response


query_response = query_endpoint_with_json_payload(
    json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
)


def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text


generated_texts = parse_response_multiple_texts(query_response)
print(generated_texts)

['George Washington died in 1799. Geoffrey Hinton was born in 1928. So the final answer is no.', 'George Washington died in 1799. Geoffrey Hinton was born in 1890. So the final answer is no.', 'George Washington died in 1799. Geoffrey Hinton was born in 1928. So the final answer is no.']
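The payload above samples (`do_sample=True`) from a restricted set of candidate tokens. The toy sketch below shows what `top_k` and `top_p` do at a single decoding step on a hand-made distribution. It keeps the `top_k` most probable tokens, renormalizes them, and then keeps the smallest prefix whose cumulative probability reaches `top_p`. As far as we understand, this follows the usual Hugging Face order of applying top-k before top-p. It is an illustration, not the endpoint's code:

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Return the tokens kept by top-k then top-p (nucleus) filtering, renormalized."""
    # Rank tokens by probability (highest first) and keep the k most likely.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        p = p / k_total  # renormalize over the top-k survivors
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy = {"no": 0.5, "yes": 0.3, "maybe": 0.15, "banana": 0.05}
print(top_k_top_p_filter(toy, top_k=3, top_p=0.9))
```

Lower values of either parameter shrink the candidate set and make sampled outputs less varied.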


## Aspect-based sentiment analysis

In [150]:
def make_prompt(num_shots):
    # Build a few-shot prompt from the global `data`/`answer` lists: the first
    # `num_shots` examples include their answer; example `num_shots` is the open query.
    prompt = ''
    for i in range(num_shots + 1):
        dialogue = data[i]
        if i == num_shots:
            prompt = prompt + f'{start_prompt}{dialogue}{end_prompt}'
        else:
            summary = answer[i]
            prompt = prompt + f'{start_prompt}{dialogue}{end_prompt}{summary}\n{stop_sequence}\n'
    return prompt

In [151]:
data=['I like the film, but the actor James was not performing as usual',
     'I loved the food in the restaurant but the place was not clean',
     'I enjoyed the hotel room but the reception guy was unpleasant',
     'I liked the boat trip, but the weather was not good']
answer=['Film : positive, Actor James: Negative',
       'Food : positive, Place: negative',
       'Hotel room: positive, Reception guy: negative',
       'Boat trip: positive, Weather: negative']

In [152]:
start_prompt = 'Review:\n'
end_prompt = '\nWhat are the entities and their associated sentiments? '
stop_sequence = '\n\n\n'

In [153]:
few_shot_prompt = make_prompt(3)
print(few_shot_prompt)

Review:
I like the film, but the actor James was not performing as usual
What are the entities and their associated sentiments? Film : positive, Actor James: Negative




Review:
I loved the food in the restaurant but the place was not clean
What are the entities and their associated sentiments? Food : positive, Place: negative




Review:
I enjoyed the hotel room but the reception guy was unpleasant
What are the entities and their associated sentiments? Hotel room: positive, Reception guy: negative




Review:
I liked the boat trip, but the weather was not good
What are the entities and their associated sentiments? 
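`make_prompt` reads `data`, `answer`, `start_prompt`, `end_prompt`, and `stop_sequence` from global variables. Below is a minimal sketch of the same prompt layout as a pure function that takes everything as arguments, which makes it easier to reuse in the sections below. `build_few_shot_prompt` is a hypothetical name:

```python
def build_few_shot_prompt(examples, query, start_prompt, end_prompt, stop_sequence):
    """Build a few-shot prompt from (text, answer) pairs followed by an open query.

    Same layout as make_prompt: each solved example ends with its answer, a newline,
    the stop sequence and another newline; the final query is left unanswered.
    """
    parts = [
        f"{start_prompt}{text}{end_prompt}{answer}\n{stop_sequence}\n"
        for text, answer in examples
    ]
    parts.append(f"{start_prompt}{query}{end_prompt}")
    return "".join(parts)


print(build_few_shot_prompt(
    [("I loved the food but the place was not clean", "Food: positive, Place: negative")],
    "I liked the boat trip, but the weather was not good",
    "Review:\n",
    "\nWhat are the entities and their associated sentiments? ",
    "\n\n\n",
))
```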


In [154]:
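# Generation parameters (same values as in the later sections).
parameters = {"max_length": 50, "max_time": 50, "num_return_sequences": 1,
              "top_k": 50, "top_p": 0.95, "do_sample": True}
payload = {"text_inputs": few_shot_prompt, **parameters}
query_response = query_endpoint_with_json_payload(
    json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
)
generated_texts = parse_response_multiple_texts(query_response)
print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")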

[1m The reasoning result is[0m: '['negative, Boat trip: positive, Weather: negative']'

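The model returns the aspects as one comma-separated string, and the output above even starts with a stray `negative,` fragment. Below is a minimal sketch that turns such answers into a dictionary. It assumes aspect names contain no commas or colons. `parse_aspect_sentiments` is a hypothetical helper:

```python
def parse_aspect_sentiments(text):
    """Turn 'Aspect: sentiment, Aspect: sentiment' into a dict.

    Fragments without a colon are skipped; keys and values are stripped and
    sentiments are lower-cased so 'Negative' and 'negative' compare equal.
    """
    result = {}
    for fragment in text.split(","):
        if ":" not in fragment:
            continue
        aspect, sentiment = fragment.split(":", 1)
        result[aspect.strip()] = sentiment.strip().lower()
    return result


print(parse_aspect_sentiments("negative, Boat trip: positive, Weather: negative"))
```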


## Common sense reasoning:

In [1264]:
premise = "The film was long and there was scenes of beaches."
hypothesis = "Is the review positive or negative?"
options = """["positive", "negative"]"""

In [1265]:
prompts = [
    """{premise}\n\nBased on the paragraph above can we conclude that "\"{hypothesis}\"?\n\n{options_}""",
    """{premise}\n\nBased on that paragraph can we conclude that this sentence is true?\n{hypothesis}\n\n{options_}""",
    """{premise}\n\nCan we draw the following conclusion?\n{hypothesis}\n\n{options_}""",
    """{premise}\nDoes this next sentence follow, given the preceding text?\n{hypothesis}\n\n{options_}""",
    """{premise}\nCan we infer the following?\n{hypothesis}\n\n{options_}""",
    """Read the following paragraph and determine if the hypothesis is true:\n\n{premise}\n\nHypothesis: {hypothesis}\n\n{options_}""",
    """Read the text and determine if the sentence is true:\n\n{premise}\n\nSentence: {hypothesis}\n\n{options_}""",
    """Can we draw the following hypothesis from the context? \n\nContext:\n\n{premise}\n\nHypothesis: {hypothesis}\n\n{options_}""",
    """Determine if the sentence is true based on the text below:\n{hypothesis}\n\n{premise}\n{options_}""",
]

parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{premise}", premise)
    input_text = input_text.replace("{hypothesis}", hypothesis)
    input_text = input_text.replace("{options_}", options)
    print(f"{bold} For prompt{unbold}: '{input_text}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

[1m For prompt[0m: 'The film was long and there was scenes of beaches.

Based on the paragraph above can we conclude that ""Is the review positive or negative?"?

["positive", "negative"]'

[1m The reasoning result is[0m: '['negative']'

[1m For prompt[0m: 'The film was long and there was scenes of beaches.

Based on that paragraph can we conclude that this sentence is true?
Is the review positive or negative?

["positive", "negative"]'

[1m The reasoning result is[0m: '['negative']'

[1m For prompt[0m: 'The film was long and there was scenes of beaches.

Can we draw the following conclusion?
Is the review positive or negative?

["positive", "negative"]'

[1m The reasoning result is[0m: '['negative']'

[1m For prompt[0m: 'The film was long and there was scenes of beaches.
Does this next sentence follow, given the preceding text?
Is the review positive or negative?

["positive", "negative"]'

[1m The reasoning result is[0m: '['negative']'

[1m For prompt[0m: 'The film 

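The loops above fill placeholders with chained `str.replace` calls. If a value is never substituted, the placeholder silently stays in the prompt. Below is a minimal sketch of a stricter alternative. It assumes templates contain no literal braces other than the placeholders, since `str.format_map` would need those doubled. `fill_template` is a hypothetical helper:

```python
import string


def fill_template(template, **values):
    """Fill {name} placeholders, raising KeyError if any placeholder has no value."""
    # string.Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    field_names = {
        field for _, field, _, _ in string.Formatter().parse(template) if field
    }
    missing = field_names - set(values)
    if missing:
        raise KeyError(f"missing values for placeholders: {sorted(missing)}")
    return template.format_map(values)


template = "{premise}\nCan we infer the following?\n{hypothesis}\n\n{options_}"
print(fill_template(
    template,
    premise="The film was long.",
    hypothesis="Is the review positive or negative?",
    options_='["positive", "negative"]',
))
```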
# 3. CAP GPT (1.0)

## Dissociating problems from improvements (positive or negative change): Sentence / Sentiment Classification

In [1617]:
#sentence = 'Submitted override on Sunday due to severe weather in area + yard closures 6/21-6/22 - Site has in overrides stating Floor Layout'
sentence='Site has decreased labor from 49.1 to 43.7 in past 4 weeks due to increase of utilisation of Kangaroo volume and cube. Site improved utilization of their Linear Sorter and capouts problems are decreasing in current week.'


In [1618]:
options_ = """OPTIONS:\n-positive \n-negative """
prompts = [
    """Review:\n{sentence}\nIs this customer review sentence negative or positive?\n{options_}""",
    """Short review: {sentence}\nDid the critic think positively or negatively of the operations?\n{options_}""",
    """\"{sentence}\"\nHow would the sentiment of this sentence be perceived?\n\n{options_}""",
    """Is the sentiment of the following sentence positive or negative?\n{sentence}\n{options_}""",
    """What is the sentiment of the following review sentence?\n{sentence}\n{options_}""",
]

parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{sentence}", sentence)
    input_text = input_text.replace("{options_}", options_)
    print(f"{bold} For prompt{unbold}: '{input_text}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

[1m For prompt[0m: 'Review:
Site has decreased labor from 49.1 to 43.7 in past 4 weeks due to increase of utilisation of Kangaroo volume and cube. Site improved utilization of their Linear Sorter and capouts problems are decreasing in current week.
Is this customer review sentence negative or positive?
OPTIONS:
-positive 
-negative '

[1m The reasoning result is[0m: '['positive']'

[1m For prompt[0m: 'Short review: Site has decreased labor from 49.1 to 43.7 in past 4 weeks due to increase of utilisation of Kangaroo volume and cube. Site improved utilization of their Linear Sorter and capouts problems are decreasing in current week.
Did the critic think positively or negatively of the operations?
OPTIONS:
-positive 
-negative '

[1m The reasoning result is[0m: '['positive']'

[1m For prompt[0m: '"Site has decreased labor from 49.1 to 43.7 in past 4 weeks due to increase of utilisation of Kangaroo volume and cube. Site improved utilization of their Linear Sorter and capouts pro

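With several prompt templates per input, you can combine the answers instead of reading them one by one. Below is a minimal sketch of a majority vote across prompts. It assumes each answer should map to exactly one of the listed labels. Answers that mention neither label or both are ignored, and a tie returns `None`. The substring matching is deliberately crude: for example, "not positive" would count as positive. `normalize_label` and `majority_label` are hypothetical helpers:

```python
from collections import Counter


def normalize_label(text, labels=("positive", "negative")):
    """Map a free-text answer to one of `labels`, or None if none or several appear."""
    lowered = text.lower()
    found = [label for label in labels if label in lowered]
    return found[0] if len(found) == 1 else None


def majority_label(answers, labels=("positive", "negative")):
    """Majority vote over per-prompt answers; unparseable answers are ignored."""
    votes = Counter(
        label for label in (normalize_label(a, labels) for a in answers) if label
    )
    if not votes:
        return None
    (top, top_count), *rest = votes.most_common()
    # A tie between the two most common labels is treated as undecided.
    if rest and rest[0][1] == top_count:
        return None
    return top


print(majority_label(["positive", "Positive.", "negative", "positive"]))
```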
## Detecting what the problem was (Question Answering)

### Dissociating forecast errors from operational changes:

In [1623]:
# Alternative contexts to try:
# context='Submitted override on Sunday due to severe weather in area + yard closures 6/21-6/22 - Site has in overrides stating Floor Layout'
# context='Site has decreased labor from 49.1 to 43.7 in past 4 weeks due to increase of utilisation of Kangaroo volume and cube. Site improved utilization of their Linear Sorter and capouts problems are decreasing in current week.'
context='Problem in forecast, expected capacity is of 10k, while reality shows 5k capacity only. Had to move away volume'

question='Is the problem related to an error in a forecast or to an operational change?'


In [1624]:
prompts = [
    """Answer based on context:\n\n{context}\n\n{question}""",
    """{context}\n\nAnswer this question based on the article: {question}""",
    """{context}\n\n{question}""",
    """{context}\nAnswer this question: {question}""",
    """Read this article and answer this question {context}\n{question}""",
    """{context}\n\nBased on the above article, answer a question. {question}""",
    """Write an article that answers the following question: {question} {context}""",
    """Article: {context} \n\nAnswer this question based on the article:{question}""",
]


parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{context}", context)
    input_text = input_text.replace("{question}", question)
    print(f"{bold} For prompt{unbold}: '{each_prompt}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

[1m For prompt[0m: 'Answer based on context:

{context}

{question}'

[1m The reasoning result is[0m: '['An error in a forecast']'

[1m For prompt[0m: '{context}

Answer this question based on the article: {question}'

[1m The reasoning result is[0m: '['error in a forecast']'

[1m For prompt[0m: '{context}

{question}'

[1m The reasoning result is[0m: '['operational change']'

[1m For prompt[0m: '{context}
Answer this question: {question}'

[1m The reasoning result is[0m: '['']'

[1m For prompt[0m: 'Read this article and answer this question {context}
{question}'

[1m The reasoning result is[0m: '['an error in a forecast']'

[1m For prompt[0m: '{context}

Based on the above article, answer a question. {question}'

[1m The reasoning result is[0m: '['an error in a forecast']'

[1m For prompt[0m: 'Write an article that answers the following question: {question} {context}'

[1m The reasoning result is[0m: '['Capacity in this week was 5 k, actual was 5 k volume. 

### Detecting the problem and its root cause (Question Answering)

In [1627]:
# Alternative contexts to try:
# context='Submitted override on Sunday due to severe weather in area + yard closures 6/21-6/22 - Site has in overrides stating Floor Layout'
# context='Site has decreased labor from 49.1 to 43.7 in past 4 weeks due to increase of utilisation of Kangaroo volume and cube. Site improved utilization of their Linear Sorter and capouts problems are decreasing in current week.'
context='Problem in forecast, expected capacity is of 10k, while reality shows 5k capacity only. Had to move away volume'


question='What is the problem and what caused it?'

In [1628]:
prompts = [
    """Answer based on context:\n\n{context}\n\n{question}""",
    """{context}\n\nAnswer this question based on the article: {question}""",
    """{context}\n\n{question}""",
    """{context}\nAnswer this question: {question}""",
    """Read this article and answer this question {context}\n{question}""",
    """{context}\n\nBased on the above article, answer a question. {question}""",
    """Write an article that answers the following question: {question} {context}""",
    """Article: {context} \n\nAnswer this question based on the article:{question}""",
]


parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{context}", context)
    input_text = input_text.replace("{question}", question)
    print(f"{bold} For prompt{unbold}: '{each_prompt}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

[1m For prompt[0m: 'Answer based on context:

{context}

{question}'

[1m The reasoning result is[0m: '['Volume not reaching capacity.']'

[1m For prompt[0m: '{context}

Answer this question based on the article: {question}'

[1m The reasoning result is[0m: '['Forecast, expected capacity is of 10k, while reality shows 5k capacity only. Had to move away volume']'

[1m For prompt[0m: '{context}

{question}'

[1m The reasoning result is[0m: '['In reality there is lesser capacity than expected due to shortage of water.']'

[1m For prompt[0m: '{context}
Answer this question: {question}'

[1m The reasoning result is[0m: '['Problem in forecast, expected capacity is of 10k, while reality shows 5k capacity only. Had to move away volume']'

[1m For prompt[0m: 'Read this article and answer this question {context}
{question}'

[1m The reasoning result is[0m: '['There is a problem in forecast']'

[1m For prompt[0m: '{context}

Based on the above article, answer a question. {qu

### Personalizing CAP information extraction (Few-Shot Prompting):

In [1602]:
def make_prompt(num_shots):
    prompt = ''
    for i in range(num_shots + 1):
        if i == num_shots:
            dialogue = data[i]
            summary = answer[i]
            prompt = prompt + f'{start_prompt}{dialogue}{end_prompt}'
        else:
            dialogue = data[i]
            summary = answer[i]
            prompt = prompt + f'{start_prompt}{dialogue}{end_prompt}{summary}\n{stop_sequence}\n'
    return prompt

In [1604]:
# make_prompt reads the global `data` and `answer` lists, so the examples must use those names.
data=['Submitted override on Sunday due to severe weather in area + yard closures 6/21-6/22 - Site has in overrides stating Floor Layout',
     'Site has decreased labor from 49.1 to 43.7 in past 4 weeks because site improved utilization of their Linear Sorter and capouts problems are decreasing in current week.',
     'Problem in forecast, expected capacity is of 10k, while reality shows 5k capacity only. Had to move away volume',
     'Scheduled maintenance in site causes 47K total volume moving to LGA5 on 6/14 and 17K additional on 6/21'
]
answer=['Change: severe weather + yard closures, Consequence: override on Sunday, Type of change: Negative',
       'Change: improved utilization, Consequence: labor decrease, Type of change: Positive',
       'Change: forecast inaccuary, Consequence: moving volume, Type of change: Negative',
       'Change: scheduled maintenance, Consequence: moving volume, Type of change: Negative']

In [1605]:
start_prompt = 'Review:\n'
end_prompt = '\nWhat is the change, its consequence and its type? '
stop_sequence = '\n\n\n'

In [1606]:
few_shot_prompt = make_prompt(3)
print(few_shot_prompt)

Review:
Submitted override on Sunday due to severe weather in area + yard closures 6/21-6/22 - Site has in overrides stating Floor Layout
What is the change, its consequence and its type? Change: severe weather + yard closures, Consequence: override on Sunday, Type of change: Negative




Review:
Site has decreased labor from 49.1 to 43.7 in past 4 weeks because site improved utilization of their Linear Sorter and capouts problems are decreasing in current week.
What is the change, its consequence and its type? Change: improved utilization, Consequence: labor decrease, Type of change: Positive




Review:
Problem in forecast, expected capacity is of 10k, while reality shows 5k capacity only. Had to move away volume
What is the change, its consequence and its type? Change: forecast inaccuary, Consequence: moving volume, Type of change: Negative




Review:
Scheduled maintenance in site causes 47K total volume moving to LGA5 on 6/14 and 17K additional on 6/21
What is the change, its cons

In [1608]:
payload = {"text_inputs": few_shot_prompt, **parameters}
query_response = query_endpoint_with_json_payload(
    json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
)
generated_texts = parse_response_multiple_texts(query_response)
print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

[1m The reasoning result is[0m: '['Change: scheduled maintenance in site, Consequence: volume moves, Type of change: negative']'

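Because the few-shot examples fix the answer format, a regular expression can recover the structured fields. Below is a minimal sketch. It assumes answers follow the `Change: ..., Consequence: ..., Type of change: positive|negative` layout of the examples. Anything else returns `None`, so you can review it by hand. `CAP_PATTERN` and `parse_cap_answer` are hypothetical names:

```python
import re

# Matches answers shaped like the few-shot examples:
# "Change: ..., Consequence: ..., Type of change: positive|negative"
CAP_PATTERN = re.compile(
    r"Change:\s*(?P<change>.*?),\s*"
    r"Consequence:\s*(?P<consequence>.*?),\s*"
    r"Type of change:\s*(?P<type>positive|negative)\s*$",
    re.IGNORECASE,
)


def parse_cap_answer(text):
    """Return a dict with 'change', 'consequence' and 'type', or None if no match."""
    match = CAP_PATTERN.search(text.strip())
    if match is None:
        return None
    fields = {key: value.strip() for key, value in match.groupdict().items()}
    fields["type"] = fields["type"].lower()
    return fields


print(parse_cap_answer(
    "Change: scheduled maintenance in site, Consequence: volume moves, Type of change: negative"
))
```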


## CAP GPT results:

### Root-causing stories:

#### Many Weak Models:

In [210]:
import pandas as pd

In [211]:
df= pd.read_excel('./CapData.xlsx')

In [212]:
del df['Unnamed: 0']
del df['Unnamed: 1']

In [241]:
review_messages=df['Notes '].values

In [258]:
def make_prompt(num_shots,data,answer):
    prompt = ''
    for i in range(num_shots + 1):
        if i == num_shots:
            dialogue = data[i]
            summary = answer[i]
            prompt = prompt + f'{start_prompt}{dialogue}{end_prompt}'
        else:
            dialogue = data[i]
            summary = answer[i]
            prompt = prompt + f'{start_prompt}{dialogue}{end_prompt}{summary}\n{stop_sequence}\n'
    return prompt

In [261]:
# Few-shot examples shared by every query; each review message is appended as the open query.
example_data = ['Submitted override on Sunday due to severe weather in area + yard closures 6/21-6/22 - Site has in overrides stating Floor Layout',
    'Site has decreased labor from 49.1 to 43.7 in past 4 weeks because site improved utilization of their Linear Sorter and capouts problems are decreasing in current week.',
    'Problem in forecast, expected capacity is of 10k, while reality shows 5k capacity only. Had to move away volume',
    'Scheduled maintenance in site causes 47K total volume moving to LGA5 on 6/14 and 17K additional on 6/21']
example_answers = ['Change: severe weather + yard closures, Consequence: override on Sunday, Type of change: Negative',
    'Change: improved utilization, Consequence: labor decrease, Type of change: Positive',
    'Change: forecast inaccuary, Consequence: moving volume, Type of change: Negative',
    'Change: scheduled maintenance, Consequence: moving volume, Type of change: Negative']

for val in review_messages[3:]:
    print("*******************************************************************************************")
    # Fresh copies of the examples, with the review to analyse appended (its answer left empty).
    data = example_data + [val]
    answer = example_answers + ['']
    few_shot_prompt = make_prompt(4, data, answer)
    print(few_shot_prompt)
    payload = {"text_inputs": few_shot_prompt, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

*******************************************************************************************
Review:
Submitted override on Sunday due to severe weather in area + yard closures 6/21-6/22 - Site has in overrides stating Floor Layout
What is the change, its consequence and its type? Change: severe weather + yard closures, Consequence: override on Sunday, Type of change: Negative




Review:
Site has decreased labor from 49.1 to 43.7 in past 4 weeks because site improved utilization of their Linear Sorter and capouts problems are decreasing in current week.
What is the change, its consequence and its type? Change: improved utilization, Consequence: labor decrease, Type of change: Positive




Review:
Problem in forecast, expected capacity is of 10k, while reality shows 5k capacity only. Had to move away volume
What is the change, its consequence and its type? Change: forecast inaccuary, Consequence: moving volume, Type of change: Negative




Review:
Scheduled maintenance in site causes 47K

In [369]:
result = pd.DataFrame({'Text':[], 'Prompt_1':[], 'Prompt_2':[], 'Prompt_3':[], 'Prompt_4':[], 'Prompt_5':[], 'Prompt_6':[],'Prompt_7':[],'Prompt_8':[]})
for val in review_messages:
    liste=[]
    print('***********************************************************************')
    context=val
    liste.append(context)
    question='What is the problem and what caused it?'
    prompts = [
    """Answer based on context:\n\n{context}\n\n{question}""",
    """{context}\n\nAnswer this question based on the article: {question}""",
    """{context}\n\n{question}""",
    """{context}\nAnswer this question: {question}""",
    """Read this article and answer this question {context}\n{question}""",
    """{context}\n\nBased on the above article, answer a question. {question}""",
    """Write an article that answers the following question: {question} {context}""",
    """Article: {context} \n\nAnswer this question based on the article:{question}""",
    ]


    parameters = {
        "max_length": 50,
        "max_time": 50,
        "num_return_sequences": 1,
        "top_k": 50,
        "top_p": 0.95,
        "do_sample": True,
    }


    for each_prompt in prompts:
        input_text = each_prompt.replace("{context}", context)
        input_text = input_text.replace("{question}", question)
        print(f"{bold} For prompt{unbold}: '{each_prompt}'{newline}")
        payload = {"text_inputs": input_text, **parameters}
        query_response = query_endpoint_with_json_payload(
            json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
        )
        generated_texts = parse_response_multiple_texts(query_response)
        liste.append(generated_texts)
        print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")
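    row = pd.DataFrame([liste], columns=["Text", 'Prompt_1', 'Prompt_2', 'Prompt_3', 'Prompt_4', 'Prompt_5', 'Prompt_6', 'Prompt_7', 'Prompt_8'])
    # DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
    result = pd.concat([result, row], ignore_index=True)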

***********************************************************************
For prompt: 'Answer based on context:

{context}

{question}'

The reasoning result is: '['severe weather in area + yard closures 6/21-6/22 - Site has in overrides stating Floor Layout']'

For prompt: '{context}

Answer this question based on the article: {question}'

The reasoning result is: '['severe weather in area + yard closures 6/21-6/22']'

For prompt: '{context}

{question}'

The reasoning result is: '['I am having issues with overriding dst ffwd']'

For prompt: '{context}
Answer this question: {question}'

The reasoning result is: '['Site has in overrides stating Floor Layout, which was a temporary fix.']'

For prompt: 'Read this article and answer this question {context}
{question}'

The reasoning result is: '['The flooring contractor for the project failed to submit a floor layout']'

For prompt: '{context}

#### Concatenating all prompt outputs into one result: a promising approach

In [370]:
result['Prompts']= result['Prompt_1']+ result['Prompt_2']+result['Prompt_3']+result['Prompt_4']+result['Prompt_5']+result['Prompt_6']+result['Prompt_7']+result['Prompt_8']

In [372]:
result['Total_prompt']=result['Prompt_1']+ result['Prompt_2']+result['Prompt_3']+result['Prompt_4']+result['Prompt_5']+result['Prompt_6']+result['Prompt_7']+result['Prompt_8']
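The `+` chain above concatenates the eight outputs with no separator, and what it produces depends on whether the `Prompt_i` cells still hold the lists returned by `parse_response_multiple_texts` (list concatenation, as in the `Prompts` column) or have already been reduced to strings with `s[0]` (string concatenation, as in `Total_prompt`). Below is a minimal sketch that handles both cases and joins with a space. `concat_prompt_outputs` and `first_text` are hypothetical helpers, not part of the notebook; the column names follow the `result` DataFrame.

```python
import pandas as pd

# Column names follow the notebook's result DataFrame
PROMPT_COLS = [f"Prompt_{i}" for i in range(1, 9)]


def first_text(cell):
    """Return the first generated text if the cell holds a list, else the cell itself."""
    if isinstance(cell, list):
        return cell[0] if cell else ""
    return cell


def concat_prompt_outputs(df, cols=PROMPT_COLS):
    """Join each row's per-prompt outputs into one space-separated string."""
    joined = [
        " ".join(first_text(c).strip() for c in row)
        for row in df[cols].itertuples(index=False)
    ]
    return pd.Series(joined, index=df.index)
```

With this, `result['Total_prompt'] = concat_prompt_outputs(result)` would give the same text regardless of whether the `s[0]` conversion has run yet.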

In [921]:
result

| | Text | Prompt_1 | Prompt_2 | Prompt_3 | Prompt_4 | Prompt_5 | Prompt_6 | Prompt_7 | Prompt_8 | Prompts | Total_prompt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Submitted override on Sunday due to severe wea... | severe weather in area + yard closures 6/21-6... | severe weather in area + yard closures 6/21-6/22 | I am having issues with overriding dst ffwd | Site has in overrides stating Floor Layout, w... | The flooring contractor for the project faile... | The overrides had been submitted on Sunday du... | 6/14 - Site is in override stating Floor Layo... | Site has in overrides stating Floor Layout | [severe weather in area + yard closures 6/21-6... | severe weather in area + yard closures 6/21-6... |
| 1 | Site has decreased labor from 49.1 to 43.7 in ... | (III) | capouts | Capouts were occurring and this caused site t... | Capouts have decreased | a) No use of Kangaroo, cube and volume, which... | capouts problems are decreasing in current week | By 9 December 2011 PPL had a total of 7,921 c... | capouts problems are decreasing | [(III), capouts, Capouts were occurring and th... | (III) capouts Capouts were occurring and this... |
| 2 | 47K total volume moving to LGA5 on 6/14 and 17... | 47K total volume moving to LGA5 on 6/14 and 1... | 47K total volume moving to LGA5 on 6/14 and 1... | Scheduled maintenance | 47K total volume moving to LGA5 on 6/14 and 1... | a lanes were launching on June 22, 20, and 21 | Scheduled Maintenance is affecting cargo | The Port of Portland received a major setback... | scheduled Maintenance | [47K total volume moving to LGA5 on 6/14 and 1... | 47K total volume moving to LGA5 on 6/14 and 1... |
| 3 | 6/18 - Site had overrides because of sorter down | Site had overrides because of sorter down | Site had overrides because of sorter down | The site overrides were occurring. | The site had overrides because the sorter was... | Site had overrides because of sorter down. | Site had overrides because of sorter down. | 6/19 - WSPS (Federal Railroad Signal Supt.) r... | site had overrides because of sorter down | [Site had overrides because of sorter down, Si... | Site had overrides because of sorter down Sit... |
| 4 | 18/06/23 -Capped out because 100 Volunteer Ext... | The volunteer posting was too much, which cau... | The time slots were too limited. | The server was too small to handle the number... | The event had too many volunteers | Volunteer extra time needed but only 33 accepted | Volunteers did not accept their post and not ... | The "No More Banana Pod" campaign is ongoing ... | Several people were not accepted and were rej... | [The volunteer posting was too much, which cau... | The volunteer posting was too much, which cau... |
| 5 | 6/18 - Capped out at 110% on buffer of 110% be... | 8% increase from forecast | Capped out at 110% on buffer of 110% because ... | A small number of gallons of fuel was pumped ... | Capped out at 110% on buffer of 110% because ... | there was an 8% increase from forecast. | Capped out at 110% on buffer of 110% because ... | On 7/18, the CRB began to take positive meter... | Capped out at 110% on buffer of 110% because ... | [8% increase from forecast, Capped out at 110%... | 8% increase from forecast Capped out at 110% ... |
| 6 | 6/18 - 31 Volunteer Extra Time needed with 28 ... | The problem was caused by too few people want... | Volunteer Extra Time needed with 28 VET poste... | Volunteer Extra Time needed with 28 VET poste... | 28 VET posted but only 10 accepted, which cau... | volume exceeded capacity | Volunteer Extra Time needed with 28 VET poste... | 6/18 - 31 Volunteer Extra Time Needed with 28... | Voluntary Extra Time needed with 28 VET poste... | [The problem was caused by too few people want... | The problem was caused by too few people want... |
| 7 | Transfers from BFI5 offered, pending acceptanc... | Increased labor needed, caused by forecast in... | Increased labor needed | Increased labor was needed due to forecast in... | Increased labor needed | Increased labor needed | Increased labor needed, caused by forecast in... | The United States announced on April 5, 2006 ... | Increased labor needed caused by forecast inc... | [Increased labor needed, caused by forecast in... | Increased labor needed, caused by forecast in... |
| 8 | Site had Volunteer Extra Time needs of 120 wit... | Extra Volunteer time was not enough. | The forecast for extra time didn't reflect th... | The volunteer extra time was not accepted as ... | Volunteers are unable to complete volunteer time | The Site posted too few volunteer hours and a... | The forecast got bad and volunteer extra time... | The forecast indicates the site will be dry i... | The Volunteer Extra Time needs are high, but ... | [Extra Volunteer time was not enough., The for... | Extra Volunteer time was not enough. The fore... |
| 9 | 6/18 - Capped out at 123% on expected buffer o... | a) There was too much capacity on the system ... | Capped out at 123% on expected buffer of 110%... | Were trying to be competitive. | Capped out at 123% on expected buffer of 110%... | 12% on expected buffer of 110%, because of 6%... | Capped out at 123% on expected buffer of 110%... | The company has made its first payment from a... | 6% increase of forecast from Plan-3 to Plan-1. | [a) There was too much capacity on the system ... | a) There was too much capacity on the system ... |


In [718]:
promptss=result['Total_prompt'].values

##### Text preprocessing:

In [719]:
import string
from collections import Counter

In [None]:
### Problems encountered: row 4

##### Light text cleaning:

In [1128]:
txt=result['Total_prompt'].values[4].lower()
txt = ''.join([i for i in txt if not i.isdigit()])
txt=txt.translate(str.maketrans('', '', string.punctuation))

txt=txt.replace('because', '')
txt=txt.replace('had', '')
txt=txt.replace('of', '')
txt=txt.replace('overrides', '')
#txt=txt.replace('the', '')
txt=txt.replace('site', '')
txt=txt.replace('on', '')
txt=txt.replace('from', '')
txt=txt.replace('not', '')
txt=txt.replace('more', '')
txt=txt.replace('too', '')
txt=txt.replace('to', '')
txt=txt.replace('with', '')
txt=txt.replace('but', '')

txt=' '.join(txt.split())
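The `str.replace` calls above remove substrings, not whole words: stripping `'on'` also turns `only` into `ly` and `ongoing` into `going`, and stripping `'to'` turns `total` into `tal`, which is where tokens such as `ly` and `tal` in the counts below come from. Filtering on whole tokens avoids this. The sketch below is an illustrative alternative, not the notebook's method; `clean_words` and `word_counts` are hypothetical helpers, and the stopword set simply mirrors the words removed above.

```python
import re
import string
from collections import Counter

# Assumed stopword list: the same words the cell above strips
STOPWORDS = {"because", "had", "of", "overrides", "site", "on", "from",
             "not", "more", "too", "to", "with", "but"}


def clean_words(text, stopwords=STOPWORDS):
    """Lowercase, drop digits and punctuation, then remove whole-word stopwords."""
    text = text.lower()
    text = re.sub(r"\d", "", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Filtering split tokens means 'on' no longer eats into 'only' or 'ongoing'
    return " ".join(w for w in text.split() if w not in stopwords)


def word_counts(text):
    """Single-word frequencies, most frequent first."""
    return Counter(text.split()).most_common()
```

For example, `word_counts(clean_words(result['Total_prompt'].values[4]))` should give the row's word frequencies without the truncated tokens.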

##### Single-word importance (word frequency):

In [1129]:
txt

'the volunteer posting was much which caused the time limit be reached the time slots were limited the server was small handle the number volunteers needed the event many volunteers volunteer extra time needed ly accepted volunteers did accept their post and enough people applied for their post the no banana pod campaign is going collect datis the public for the no banana pod campaign since june tal has been collected date several people were accepted and were rejected no e posted an extra time'

In [1130]:
d=Counter(txt.split())
ow={k: v for k, v in sorted(d.items(), key=lambda item: item[1])}
ow 

{'posting': 1,
 'much': 1,
 'which': 1,
 'caused': 1,
 'limit': 1,
 'be': 1,
 'reached': 1,
 'slots': 1,
 'limited': 1,
 'server': 1,
 'small': 1,
 'handle': 1,
 'number': 1,
 'event': 1,
 'many': 1,
 'ly': 1,
 'did': 1,
 'accept': 1,
 'enough': 1,
 'applied': 1,
 'is': 1,
 'going': 1,
 'collect': 1,
 'datis': 1,
 'public': 1,
 'since': 1,
 'june': 1,
 'tal': 1,
 'has': 1,
 'been': 1,
 'collected': 1,
 'date': 1,
 'several': 1,
 'rejected': 1,
 'e': 1,
 'posted': 1,
 'an': 1,
 'volunteer': 2,
 'was': 2,
 'needed': 2,
 'extra': 2,
 'accepted': 2,
 'their': 2,
 'post': 2,
 'and': 2,
 'people': 2,
 'for': 2,
 'banana': 2,
 'pod': 2,
 'campaign': 2,
 'were': 3,
 'volunteers': 3,
 'no': 3,
 'time': 4,
 'the': 9}

##### Two-word importance (bigram frequency):

In [1131]:
list_txt=txt.split()
n=2
two_words=[list_txt[i:i+n] for i in range(0, len(list_txt)-n+1)]

In [1132]:
import numpy as np
tw=pd.Series(two_words).value_counts()
tw=pd.DataFrame(tw)
tw=tw.reset_index()
tw.columns=['chunk','occurrences']
tw

| | chunk | occurrences |
|---|---|---|
| 0 | [their, post] | 2 |
| 1 | [the, no] | 2 |
| 2 | [pod, campaign] | 2 |
| 3 | [banana, pod] | 2 |
| 4 | [no, banana] | 2 |
| ... | ... | ... |
| 72 | [the, event] | 1 |
| 73 | [needed, the] | 1 |
| 74 | [volunteers, needed] | 1 |
| 75 | [number, volunteers] | 1 |
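The bigrams above are stored as Python lists, which are unhashable; tuples are a more natural key for counting. A minimal sketch of the same table built with `collections.Counter`, offered as an alternative rather than the notebook's code (`bigram_table` is a hypothetical helper):

```python
from collections import Counter

import pandas as pd


def bigram_table(text):
    """Count adjacent word pairs and return them as a DataFrame, most frequent first."""
    words = text.split()
    # zip(words, words[1:]) yields each adjacent pair as a hashable tuple
    counts = Counter(zip(words, words[1:]))
    return pd.DataFrame(counts.most_common(), columns=["chunk", "occurrences"])
```

`Counter.most_common()` keeps tied pairs in first-seen order, so the ranking is deterministic for a given text.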



In [1041]:
chunk=tw.loc[tw['occurrences']==tw['occurrences'].max(),'chunk'][0]
chunk

['posted', 'ly']

#### Bayesian identification of payoffs:

In [1320]:
# Bayesian rule to separate the payoff of a two-word chunk from the payoffs of its individual words
def payout(tw,ow,two_words_chunk,number_occurences):
    
    num_occurrences_two_words= number_occurences
    if num_occurrences_two_words==0:
        return {},{}
    else:
        #num_occurrences_two_words=10
        print(num_occurrences_two_words)
        num_occurences_first_word=ow[two_words_chunk[0]]
        first_word=two_words_chunk[0]
        #num_occurences_first_word=10
        print(num_occurences_first_word)
        num_occurences_second_word=ow[two_words_chunk[1]]
        second_word=two_words_chunk[1]
        #num_occurences_second_word=15
        print(num_occurences_second_word)
        """
        max_occ=max(num_occurences_first_word,num_occurences_second_word)
        if max_occ==num_occurences_first_word:
            print('Good')
            first_word=two_words_chunk[0]
            second_word=two_words_chunk[1]
        else: 
            first_word=two_words_chunk[1]
            second_word=two_words_chunk[0]
            num_occurences_first_word=ow[first_word]
            num_occurences_second_word=ow[second_word]
        """
        ## First word Payout between bundle and individual contribution:
        proba_bayesian= num_occurrences_two_words/num_occurences_first_word
        print('PAYOUT OF THE BUNDLE '+str(two_words_chunk)+' :')
        payout_bundle=proba_bayesian
        print(payout_bundle)
        print('PAYOUT OF WORD '+first_word+' :')
        payout_indiv_1=1-proba_bayesian
        print(payout_indiv_1)

        ## Second word Payout between bundle and individual contribution:
        proba_bayesian= num_occurrences_two_words/num_occurences_second_word
        print('PAYOUT OF THE BUNDLE '+str(two_words_chunk)+' :')
        payout_bundle_2=payout_bundle
        print(payout_bundle_2)
        print('PAYOUT OF WORD '+second_word+' :')
        payout_indiv_2=(1-proba_bayesian)*payout_bundle_2/proba_bayesian
        print(payout_indiv_2)    

        results={str(two_words_chunk):payout_bundle_2, two_words_chunk[0]:payout_indiv_1, two_words_chunk[1]:payout_indiv_2}
        #Final Payouts
        results_fp={str(two_words_chunk):num_occurences_first_word*payout_bundle_2, two_words_chunk[0]:num_occurences_first_word*payout_indiv_1, two_words_chunk[1]:num_occurences_first_word*payout_indiv_2}
    return results,results_fp


In [1439]:
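# Bayesian rule to separate the payoff of a two-word chunk from the payoffs of its individual words
# (revised version: the more frequent word of the pair is treated as the first word)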
def payout(tw,ow,two_words_chunk,number_occurences):
    
    num_occurrences_two_words= number_occurences
    if num_occurrences_two_words==0:
        return {},{}
    else:
        #num_occurrences_two_words=10
        #print(num_occurrences_two_words)
        num_occurences_first_word=ow[two_words_chunk[0]]
        #num_occurences_first_word=10
        #print(num_occurences_first_word)
        num_occurences_second_word=ow[two_words_chunk[1]]
        #num_occurences_second_word=15
        #print(num_occurences_second_word)
        max_occ=max(num_occurences_first_word,num_occurences_second_word)
        if max_occ==num_occurences_first_word:
            #print('Good')
            first_word=two_words_chunk[0]
            second_word=two_words_chunk[1]
        else: 
            first_word=two_words_chunk[1]
            second_word=two_words_chunk[0]
            num_occurences_first_word=ow[first_word]
            num_occurences_second_word=ow[second_word]

        ## First word Payout between bundle and individual contribution:
        proba_bayesian= num_occurrences_two_words/num_occurences_first_word
        #print('PAYOUT OF THE BUNDLE '+str(two_words_chunk)+' :')
        payout_bundle=proba_bayesian
        #print(payout_bundle)
        #print('PAYOUT OF WORD '+first_word+' :')
        payout_indiv_1=1-proba_bayesian
        #print(payout_indiv_1)

        ## Second word Payout between bundle and individual contribution:
        proba_bayesian= num_occurrences_two_words/num_occurences_second_word
        #print('PAYOUT OF THE BUNDLE '+str(two_words_chunk)+' :')
        payout_bundle_2=payout_bundle
        #print(payout_bundle_2)
        #print('PAYOUT OF WORD '+second_word+' :')
        payout_indiv_2=(1-proba_bayesian)*payout_bundle_2/proba_bayesian
        #print(payout_indiv_2)    

        # Key individual payouts by first_word/second_word: after the swap above,
        # first_word may be two_words_chunk[1]
        results={str(two_words_chunk):payout_bundle_2, first_word:payout_indiv_1, second_word:payout_indiv_2}
        #Final Payouts
        results_fp={str(two_words_chunk):num_occurences_first_word*payout_bundle_2, first_word:num_occurences_first_word*payout_indiv_1, second_word:num_occurences_first_word*payout_indiv_2}
    return results,results_fp
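Reading the second `payout` definition term by term, its quantities can be written as below. Here $n_{12}$ is the bigram count passed as `number_occurences`, and $n_1 \ge n_2$ are the single-word counts from `ow` of the more and less frequent word of the pair after the swap. This is a transcription of the code, not an independent derivation; $p_1$ and $p_2$ only behave like conditional frequencies when $n_{12} \le n_2$.

$$
p_1 = \frac{n_{12}}{n_1}, \qquad p_2 = \frac{n_{12}}{n_2}
$$

$$
\text{payout}_{\text{bundle}} = p_1, \qquad
\text{payout}_{w_1} = 1 - p_1 = \frac{n_1 - n_{12}}{n_1}, \qquad
\text{payout}_{w_2} = \frac{(1 - p_2)\,p_1}{p_2} = \frac{n_2 - n_{12}}{n_1}
$$

$$
\text{results\_fp}: \quad n_1 \cdot \text{payout}_{\text{bundle}} = n_{12}, \qquad n_1 \cdot \text{payout}_{w_1} = n_1 - n_{12}, \qquad n_1 \cdot \text{payout}_{w_2} = n_2 - n_{12}
$$

In other words, the final payouts reduce to the bigram count and each word's count minus the bigram count. For the call `payout(tw, ow, chunk, 6)` in the next cell, both words occur once ($n_1 = n_2 = 1$, $n_{12} = 6$), which gives $6$, $-5$ and $-5$, matching its printed output.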


In [1440]:
payout(tw,ow,chunk,6)

({"['posted', 'ly']": 6.0, 'posted': -5.0, 'ly': -5.0},
 {"['posted', 'ly']": 6.0, 'posted': -5.0, 'ly': -5.0})

In [1441]:
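def final_payouts(ow, tw):
    # Work on a copy: the loop zeroes the counts it has used, which would otherwise
    # modify the caller's table and change the outcome of re-running this cell
    tw = tw.copy()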
    all_results=[]
    all_results_fp=[]
    for i in range(3):
        
        chunk=tw.loc[tw['occurrences']==tw['occurrences'].max(),'chunk'].values[0]
        print(chunk)
        number_occurences=tw['occurrences'].max()
        if number_occurences==1:
            return [],[]
        results,results_fp=payout(tw,ow,chunk,number_occurences)
        all_results.append(results)
        all_results_fp.append(results_fp)
        tw.loc[tw['occurrences']==tw['occurrences'].max(),'occurrences']=0
        #print(tw)
        
    return all_results, all_results_fp
    

In [1442]:
all_r,all_rpf=final_payouts(ow, tw)

['for', 'their']


In [1443]:
all_rpf

[]

In [1444]:
all_r


[]

In [1445]:
def update_final_dico(all_r):
    if all_r==[]:
        return {}
    d=all_r[0]
    dd=all_r[1]
    dd.update(d)
    d=all_r[2]
    dd.update(d)
    return dd

In [1497]:
def update_dico_and_get_kw(all_r):
    k=update_final_dico(all_r)
    # An explicit dtype avoids older pandas versions warning about the default dtype of an empty Series
    k=pd.Series(k, dtype=float)
    k=pd.DataFrame(k)
    k=k.reset_index()
    k.columns=['chunk','occurrences']
    if 'the' in k['chunk'].values:
        k.loc[k['chunk']=='the','occurrences']=0
    if 'due' in k['chunk'].values:
        k.loc[k['chunk']=='due','occurrences']=0
    k=k.sort_values(by='occurrences', ascending= False)
    print(k)
    kw=k.loc[:2,'chunk'].values
    #kw=k.loc[k['occurrences']==k['occurrences'].max(),'chunk'].values
    return kw
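`k.loc[:2, 'chunk']` slices by index *label* after sorting, so it returns every row up to the one labelled 2 (six entries for row 9 in the output below) rather than the three best-scoring chunks. If a strict top-k is intended, a positional selection is one option. A minimal sketch, where `top_keywords` and its `ignore` default are hypothetical and operate on the merged payout dictionary:

```python
def top_keywords(scores, k=3, ignore=("the", "due")):
    """Return the k highest-scoring chunks, skipping low-information words.

    scores: dict mapping a chunk (word or stringified bigram) to its payout.
    sorted() is stable even with reverse=True, so ties keep insertion order.
    """
    kept = [(chunk, s) for chunk, s in scores.items() if chunk not in ignore]
    ranked = sorted(kept, key=lambda item: item[1], reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

Dropping `'the'` and `'due'` outright, instead of setting their score to 0, also keeps them out of the result when fewer than k chunks have a positive score.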

In [1498]:
update_dico_and_get_kw(all_r)

Empty DataFrame
Columns: [chunk, occurrences]
Index: []




array([], dtype=object)

In [1499]:
def get_kw(ow, tw):
    all_r,all_rpf=final_payouts(ow, tw)
    kws=update_dico_and_get_kw(all_r)
    return kws
    

In [1500]:
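def clean(txt):
    # NOTE: str.replace strips substrings, not whole words (hence tokens such as 'ly',
    # which clean_tw() later filters out)
    txt=txt.translate(str.maketrans('', '', string.punctuation))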

    txt=txt.replace('because', '')
    txt=txt.replace('had', '')
    txt=txt.replace('of', '')
    txt=txt.replace('overrides', '')
    #txt=txt.replace('the', '')
    txt=txt.replace('site', '')
    txt=txt.replace('on', '')
    txt=txt.replace('from', '')
    txt=txt.replace('not', '')
    txt=txt.replace('more', '')
    txt=txt.replace('too', '')
    txt=txt.replace('to', '')
    txt=txt.replace('with', '')
    txt=txt.replace('but', '')

    txt=' '.join(txt.split())
    return txt

In [1501]:
def tw_dict(txt):
    list_txt=txt.split()
    n=2
    two_words=[list_txt[i:i+n] for i in range(0, len(list_txt)-n+1)]
    tw=pd.Series(two_words).value_counts()
    tw=pd.DataFrame(tw)
    tw=tw.reset_index()
    tw.columns=['chunk','occurrences']
    return tw

In [1546]:
def clean_tw(tw):
    # Zero the count of every bigram that contains one of these low-information words.
    # `word in s` tests list membership, so 'a' matches the word 'a', not the letter.
    excluded = ['out', 'at', 'ly', 'issue', 'which', 'awaiting', 'reply', 'did', 'a', 'need']
    # Also tried and left disabled: 'the', 'their', 'causing', 'due'
    for word in excluded:
        tw['check'] = tw['chunk'].apply(lambda s: word in s)
        tw.loc[tw['check']==True, 'occurrences'] = 0
    return tw

In [1547]:
def clean_ow(ow):
    if 'in' in list(ow.keys()):
        ow['in']=0
    return ow

In [1548]:
def get_kw_txt(result,index):
    txt=result['Total_prompt'].values[index].lower()
    txt = ''.join([i for i in txt if not i.isdigit()])
    txt=clean(txt)
    d=Counter(txt.split())
    ow={k: v for k, v in sorted(d.items(), key=lambda item: item[1])}
    tw=tw_dict(txt)
    tw=clean_tw(tw)
    kw=get_kw(ow, tw)
    #print('KEY WORDS:')
    #print(kw)
    return kw
    

In [1647]:
#1,10,3,4,14,18,
get_kw_txt(result,9)

['forecast', 'plan']
['expected', 'buffer']
['the', 'company']
                    chunk  occurrences
0  ['expected', 'buffer']     1.000000
3    ['forecast', 'plan']     0.500000
4                forecast     0.500000
6      ['the', 'company']     0.333333
1                expected     0.000000
2                  buffer     0.000000
5                    plan     0.000000
7                     the     0.000000
8                 company     0.000000


array(["['expected', 'buffer']", "['forecast', 'plan']", 'forecast',
       "['the', 'company']", 'expected', 'buffer'], dtype=object)



In [1637]:
promptss

array([' severe weather in area + yard closures 6/21-6/22 - Site has in overrides stating Floor Layout severe weather in area + yard closures 6/21-6/22 I am having issues with overriding dst ffwd Site has in overrides stating Floor Layout, which was a temporary fix. The flooring contractor for the project failed to submit a floor layout The overrides had been submitted on Sunday due to severe weather in the area + yard closures 6/21-6/22. 6/14 - Site is in override stating Floor Layout is set to No; however is not present for the customer. Submitted an override on Sunday due to severe weather in area. Yards remained open Site has in overrides stating Floor Layout',
       ' (III) capouts Capouts were occurring and this caused site to decrease labor. Capouts have decreased a) No use of Kangaroo, cube and volume, which has decreased capouts problems are decreasing in current week By 9 December 2011 PPL had a total of 7,921 cars on site. Of these, 5,813 were used for kerbside collection a

#### Concatenating all prompt outputs and running the LLM on the result: not working at all / detecting model-generated text?

In [1562]:
for i in range(promptss.shape[0]):
    print('***********************************************************************')
    context=promptss[i]
    question='What is the problem and what caused it?'
    prompts = [
    """Article: {context} \n\nAnswer this question based on the article:{question}"""
    ]


    parameters = {
        "max_length": 50,
        "max_time": 50,
        "num_return_sequences": 1,
        "top_k": 50,
        "top_p": 0.95,
        "do_sample": True,
    }


    for each_prompt in prompts:
        input_text = each_prompt.replace("{context}", context)
        input_text = input_text.replace("{question}", question)
        print(f"{bold} For prompt{unbold}: '{each_prompt}'{newline}")
        payload = {"text_inputs": input_text, **parameters}
        query_response = query_endpoint_with_json_payload(
            json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
        )
        generated_texts = parse_response_multiple_texts(query_response)
        print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

***********************************************************************
For prompt: 'Article: {context} 

Answer this question based on the article:{question}'

The reasoning result is: '['Site has in overrides stating Floor Layout severe weather in area + yard closures']'

***********************************************************************
For prompt: 'Article: {context} 

Answer this question based on the article:{question}'

The reasoning result is: '['Capouts were occurring and this caused site to decrease labor']'

***********************************************************************
For prompt: 'Article: {context} 

Answer this question based on the article:{question}'

The reasoning result is: '['schedule Maintenance']'

***********************************************************************
For prompt: 'Article: {context} 

Answer this question based on the article:{question}'

The reasoning result is

In [296]:
for i in range(1,9):
    column_name='Prompt_'+str(i)
    result[column_name]=result[column_name].apply(lambda s:s[0])

In [299]:
result['Total_prompt']=result['Prompt_1']+ result['Prompt_2']+result['Prompt_3']+result['Prompt_4']+result['Prompt_5']+result['Prompt_6']+result['Prompt_7']+result['Prompt_8']

# 4. Truck GPT 

### Sentence / Sentiment Classification

<b>context 1:</b> 'Hello. This truck can be canceled since we have another one and it is fully loaded. Thank you for your comprehension and collaboration'


<b>context 2:</b> 'NUE9 has been experiencing capacity breaches due to labour shortages. While NCM has implemented labour strategies to improve the situation, we have reason to believe NUE9 will not regain its capacity in the short term. Total costs incurred by end of May due to site capping: 5.6MM EUR We need to move volume away from the site in order to solve for capacity and mitigate further speed and attainment losses. '

In [1576]:
sentence= 'Hello. This truck can be canceled since we have another one and it is fully loaded. Thank you for your comprehension and collaboration'


#sentence='NUE9 has been experiencing capacity breaches due to labour shortages. While NCM has implemented labour strategies to improve the situation, we have reason to believe NUE9 will not regain its capacity in the short term. Total costs incurred by end of May due to site capping: 5.6MM EUR We need to move volume away from the site in order to solve for capacity and mitigate further speed and attainment losses. '

In [1577]:
options_ = """OPTIONS:\n-positive \n-negative """
prompts = [
    """Review:\n{sentence}\nIs this customer review sentence negative or positive?\n{options_}""",
    """Short review: {sentence}\nDid the critic think positively or negatively of the operations?\n{options_}""",
    """Sentence from a client review: {sentence}\nWas the customer seen positively or negatively based on the preceding review? \n\n{options_}""",
    """\"{sentence}\"\nHow would the sentiment of this sentence be perceived?\n\n{options_}""",
    """Is the sentiment of the following sentence positive or negative?\n{sentence}\n{options_}""",
    """What is the sentiment of the following movie review sentence?\n{sentence}\n{options_}""",
]

parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{sentence}", sentence)
    input_text = input_text.replace("{options_}", options_)
    print(f"{bold} For prompt{unbold}: '{input_text}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

For prompt: 'Review:
Hello. This truck can be canceled since we have another one and it is fully loaded. Thank you for your comprehension and collaboration
Is this customer review sentence negative or positive?
OPTIONS:
-positive 
-negative '

The reasoning result is: '['negative']'

For prompt: 'Short review: Hello. This truck can be canceled since we have another one and it is fully loaded. Thank you for your comprehension and collaboration
Did the critic think positively or negatively of the operations?
OPTIONS:
-positive 
-negative '

The reasoning result is: '['positive']'

For prompt: 'Sentence from a client review: Hello. This truck can be canceled since we have another one and it is fully loaded. Thank you for your comprehension and collaboration
Was the customer seen positively or negatively based on the preceding review? 

OPTIONS:
-positive 
-negative '

The reasoning result is: '['positive']'

For prompt: '"Hell

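The prompts disagree on the same email (the first returns 'negative', the next two 'positive'), and because `do_sample` is `True`, individual answers can also change between runs. One way to get a single label is to treat each prompt as a voter and keep the majority. The sketch below assumes the per-prompt outputs were collected as a list of lists of strings, one list per prompt, in the shape `parse_response_multiple_texts` returns in this notebook; `majority_sentiment` is a hypothetical helper, not part of the SageMaker SDK.

```python
from collections import Counter


def majority_sentiment(outputs, labels=("positive", "negative")):
    """Return the most frequent label across prompt outputs, or None.

    `outputs` holds one list of generated strings per prompt. Answers that are
    not exactly one of `labels` (after trimming case, spaces and punctuation)
    are ignored; a tie between the top labels also returns None.
    """
    votes = Counter()
    for texts in outputs:
        for text in texts:
            answer = text.strip().lower().strip("-.'\" ")
            if answer in labels:
                votes[answer] += 1
    ranked = votes.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


# Answers returned by the first three prompts above
print(majority_sentiment([["negative"], ["positive"], ["positive"]]))  # positive
```

Returning `None` on a tie makes close cases explicit instead of silently picking whichever label happened to be counted first.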
### Distinguishing shortage from excess:

In [1583]:
#context= 'Hello. This truck can be canceled since we have another one and it is fully loaded. Thank you for your comprehension and collaboration'


context='NUE9 has been experiencing capacity decrease due to labour shortages.  We need to add more trucks urgently.'

question='Should we cancel or add more capacities?'

In [1584]:
liste=[]

prompts = [
    """Answer based on context:\n\n{context}\n\n{question}""",
    """{context}\n\nAnswer this question based on the article: {question}""",
    """{context}\n\n{question}""",
    """{context}\nAnswer this question: {question}""",
    """Read this article and answer this question {context}\n{question}""",
    """{context}\n\nBased on the above article, answer a question. {question}""",
    """Write an article that answers the following question: {question} {context}""",
    """Article: {context} \n\nAnswer this question based on the article:{question}""",
]


parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{context}", context)
    input_text = input_text.replace("{question}", question)
    print(f"{bold} For prompt{unbold}: '{each_prompt}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    liste.append(generated_texts)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

For prompt: 'Answer based on context:

{context}

{question}'

The reasoning result is: '['add']'

For prompt: '{context}

Answer this question based on the article: {question}'

The reasoning result is: '['add more capacities']'

For prompt: '{context}

{question}'

The reasoning result is: '['add']'

For prompt: '{context}
Answer this question: {question}'

The reasoning result is: '['add more capacities']'

For prompt: 'Read this article and answer this question {context}
{question}'

The reasoning result is: '['add']'

For prompt: '{context}

Based on the above article, answer a question. {question}'

The reasoning result is: '['add more capacities']'

For prompt: 'Write an article that answers the following question: {question} {context}'

The reasoning result is: '['Although NUE9 is the highest rated network for capacity, there has been considerable capaci

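The question-answering cells in this section repeat the same loop with different `context` and `question` values. A minimal sketch of a reusable helper, assuming a `query_fn` callable that takes the encoded JSON payload and returns the list of generated texts; `run_prompts` and `fake_query` are hypothetical names used only for illustration, and `fake_query` stands in for the endpoint so the sketch runs offline.

```python
import json
from typing import Callable, Dict, List


def run_prompts(
    templates: List[str],
    fields: Dict[str, str],
    parameters: Dict,
    query_fn: Callable[[bytes], List[str]],
) -> List[List[str]]:
    """Fill each prompt template, query the model, and collect the answers.

    `query_fn` receives the UTF-8 encoded JSON payload and must return the list
    of generated texts for that payload.
    """
    results = []
    for template in templates:
        prompt = template
        # Same placeholder convention as the cells above: literal "{name}" tokens
        for name, value in fields.items():
            prompt = prompt.replace("{" + name + "}", value)
        payload = {"text_inputs": prompt, **parameters}
        results.append(query_fn(json.dumps(payload).encode("utf-8")))
    return results


# Stand-in for the endpoint so the sketch runs offline (hypothetical)
def fake_query(body: bytes) -> List[str]:
    prompt = json.loads(body)["text_inputs"]
    return ["add"] if "cancel or add" in prompt else ["unknown"]


answers = run_prompts(
    ["{context}\n\n{question}"],
    {"context": "We need more trucks.", "question": "Should we cancel or add more capacities?"},
    {"max_length": 50},
    fake_query,
)
print(answers)  # [['add']]
```

Inside this notebook, `query_fn` could be `lambda body: parse_response_multiple_texts(query_endpoint_with_json_payload(body, endpoint_name=endpoint_name))`, reusing the two functions the cells above already call, together with the `prompts` and `parameters` defined in each cell.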
#### Consolidating the answers by word count:

In [1585]:
# Keep the first generated text of each prompt and merge them into one lowercase string
liste=[val[0] for val in liste]
txt=' '.join(liste).lower()

In [1586]:
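from collections import Counter

# Count each word across all answers, then sort by frequency (ascending)
d=Counter(txt.split())
d_r={k: v for k, v in sorted(d.items(), key=lambda item: item[1])}
d_r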

{'although': 1,
 'is': 1,
 'highest': 1,
 'rated': 1,
 'network': 1,
 'for': 1,
 'capacity,': 1,
 'there': 1,
 'has': 1,
 'been': 1,
 'considerable': 1,
 'capacity': 1,
 'decline': 1,
 'on': 1,
 'over': 1,
 'last': 1,
 'two': 1,
 'years,': 1,
 'due': 1,
 'an': 1,
 'increased': 1,
 'number': 1,
 'of': 1,
 'shippers,': 1,
 'which': 1,
 'cannot': 1,
 'service.': 1,
 'therefore,': 1,
 'need': 1,
 'nue9': 2,
 'the': 2,
 'to': 2,
 'we': 2,
 'more': 4,
 'capacities': 4,
 'add': 8}
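The dictionary above counts every word in every answer, so a single long answer (such as the generated article) can contribute several occurrences of the same keyword. A variant that gives each answer at most one vote per option is sketched below, assuming `answers` is the flat list of strings built as `liste` two cells earlier. `count_votes` and the keyword patterns in `OPTIONS` are hypothetical; the fourth example answer is made up to show the one-vote cap.

```python
import re

# Hypothetical keyword patterns: whole words and common inflections only,
# so that e.g. "address" does not count as "add"
OPTIONS = {
    "cancel": r"\bcancel(?:s|ed|led|ing|ling|lation)?\b",
    "add": r"\badd(?:s|ed|ing)?\b",
}


def count_votes(answers, options=OPTIONS):
    """Count how many answers mention each option (at most one vote per answer)."""
    votes = {name: 0 for name in options}
    for answer in answers:
        text = answer.lower()
        for name, pattern in options.items():
            if re.search(pattern, text):
                votes[name] += 1
    return votes


answers = ["add", "add more capacities", "add", "We should add trucks and add staff."]
print(count_votes(answers))  # {'cancel': 0, 'add': 4}
```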

In [1588]:
# dict.get returns 0 when the keyword never appeared in any answer
print('Votes for Cancel:')
print(d_r.get('cancel', 0))
print('Votes for Add:')
print(d_r.get('add', 0))

Votes for Cancel:
0
Votes for Add:
8
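The cell above prints raw counts. To turn them into an action, a margin rule can flag close calls for human review instead of always picking the larger count. A minimal sketch; `decide` and the default `min_margin=2` are hypothetical choices, not values tested in this notebook.

```python
def decide(votes, min_margin=2):
    """Return the option with the most votes, or "review" if the lead is too small.

    `votes` maps each option (e.g. "cancel", "add") to its vote count.
    """
    ranked = sorted(votes.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return "review"  # no option received any vote
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if ranked[0][1] - runner_up < min_margin:
        return "review"  # too close to call automatically
    return ranked[0][0]


print(decide({"cancel": 0, "add": 8}))  # add
print(decide({"cancel": 3, "add": 4}))  # review
```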


### Getting dates and impacted FC:

<b>Mail sample:</b> 

#### Dates:

In [664]:
context='From: hamdaoum@amazon.com \n Sent: Jun 03, 2023 11:35 PM\n  To: eu-roc-ob-cancel-truck@amazon.com\n  Grettings,\n Please, we need to cancel this truck, because, we have got insufficient volume to load.If you need any info, please, contact with us.Thank you for your comprehension and collaboration\n'

question='When was the mail sent?'

In [665]:
liste=[]

prompts = [
    """Answer based on context:\n\n{context}\n\n{question}""",
    """{context}\n\nAnswer this question based on the article: {question}""",
    """{context}\n\n{question}""",
    """{context}\nAnswer this question: {question}""",
    """Read this article and answer this question {context}\n{question}""",
    """{context}\n\nBased on the above article, answer a question. {question}""",
    """Write an article that answers the following question: {question} {context}""",
    """Article: {context} \n\nAnswer this question based on the article:{question}""",
]


parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{context}", context)
    input_text = input_text.replace("{question}", question)
    print(f"{bold} For prompt{unbold}: '{each_prompt}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    liste.append(generated_texts)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

For prompt: 'Answer based on context:

{context}

{question}'

The reasoning result is: '['(III)']'

For prompt: '{context}

Answer this question based on the article: {question}'

The reasoning result is: '['June 3, 2023']'

For prompt: '{context}

{question}'

The reasoning result is: '['Jun 03, 2023']'

For prompt: '{context}
Answer this question: {question}'

The reasoning result is: '['Jun 03, 2023']'

For prompt: 'Read this article and answer this question {context}
{question}'

The reasoning result is: '['June 3rd 2023']'

For prompt: '{context}

Based on the above article, answer a question. {question}'

The reasoning result is: '['June, 3, 2023']'

For prompt: 'Write an article that answers the following question: {question} {context}'

The reasoning result is: '['Amazon Transportation Operations (Europe) Inc. is the shipping and transportation arm of A

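Most prompts return the correct date, but in different formats ('June 3, 2023', 'Jun 03, 2023', 'June 3rd 2023', 'June, 3, 2023'), and one returns '(III)'. Normalizing the answers before voting makes them comparable. A minimal sketch using `datetime.strptime`, assuming English month names (the default C locale) and only the layouts seen above; `parse_date` and `consensus_date` are hypothetical helpers.

```python
import re
from collections import Counter
from datetime import datetime

# Month-name layouts seen in the answers above ("June 3 2023", "Jun 03 2023")
FORMATS = ("%B %d %Y", "%b %d %Y")


def parse_date(answer):
    """Parse a free-text date answer into a datetime.date, or return None."""
    text = re.sub(r"(\d)(?:st|nd|rd|th)\b", r"\1", answer)  # "3rd" -> "3"
    text = " ".join(text.replace(",", " ").split())  # drop commas, collapse spaces
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def consensus_date(answers):
    """Return the date that most parseable answers agree on, or None."""
    dates = [d for d in map(parse_date, answers) if d is not None]
    return Counter(dates).most_common(1)[0][0] if dates else None


answers = ["(III)", "June 3, 2023", "Jun 03, 2023", "Jun 03, 2023",
           "June 3rd 2023", "June, 3, 2023"]
print(consensus_date(answers))  # 2023-06-03
```

Unparseable answers such as '(III)' or the generated article are simply dropped rather than counted as votes.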
#### Sender and receiver:

In [671]:
context='From: hamdaoum@amazon.com \n Sent: Jun 03, 2023 11:35 PM\n  To: eu-roc-ob-cancel-truck@amazon.com\n  Grettings,\n Please, we need to cancel this truck, because, we have got insufficient volume to load.If you need any info, please, contact with us.Thank you for your comprehension and collaboration\n'

question='Who sent the mail?'
#question='Who received the mail?'

In [672]:
liste=[]

prompts = [
    """Answer based on context:\n\n{context}\n\n{question}""",
    """{context}\n\nAnswer this question based on the article: {question}""",
    """{context}\n\n{question}""",
    """{context}\nAnswer this question: {question}""",
    """Read this article and answer this question {context}\n{question}""",
    """{context}\n\nBased on the above article, answer a question. {question}""",
    """Write an article that answers the following question: {question} {context}""",
    """Article: {context} \n\nAnswer this question based on the article:{question}""",
]


parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


for each_prompt in prompts:
    input_text = each_prompt.replace("{context}", context)
    input_text = input_text.replace("{question}", question)
    print(f"{bold} For prompt{unbold}: '{each_prompt}'{newline}")
    payload = {"text_inputs": input_text, **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    liste.append(generated_texts)
    print(f"{bold} The reasoning result is{unbold}: '{generated_texts}'{newline}")

For prompt: 'Answer based on context:

{context}

{question}'

The reasoning result is: '['Hamdaoum']'

For prompt: '{context}

Answer this question based on the article: {question}'

The reasoning result is: '['hamdaoum@amazon.com']'

For prompt: '{context}

{question}'

The reasoning result is: '['eu-roc-ob-cancel-truck@amazon.com']'

For prompt: '{context}
Answer this question: {question}'

The reasoning result is: '['Hamdaoum']'

For prompt: 'Read this article and answer this question {context}
{question}'

The reasoning result is: '['2).']'

For prompt: '{context}

Based on the above article, answer a question. {question}'

The reasoning result is: '['amazon']'

For prompt: 'Write an article that answers the following question: {question} {context}'

The reasoning result is: '["We had got insufficient volume to load. We will need this truck to deliver, whic

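The answers to "Who sent the mail?" range from the correct sender to the recipient's address and fragments such as '2).'. When the mail follows a fixed header layout like the sample, a deterministic parse of the header lines can serve as a cross-check on the generated answer. A minimal sketch with Python's `re` module, assuming each header sits on its own line as `Name: value`; `parse_mail_headers` is a hypothetical helper. For complete RFC 5322 messages, the standard-library `email` package is the more robust choice.

```python
import re


def parse_mail_headers(mail):
    """Extract the From / Sent / To header values from a plain-text email.

    Returns a dict (lowercase keys) with the headers that were found; each value
    is the rest of that header line, stripped of surrounding whitespace.
    """
    headers = {}
    for name in ("From", "Sent", "To"):
        # ^ and $ match at line boundaries because of re.MULTILINE
        match = re.search(rf"^\s*{name}:\s*(.+?)\s*$", mail, flags=re.MULTILINE)
        if match:
            headers[name.lower()] = match.group(1)
    return headers


# Header part of the sample mail used above
mail = ("From: hamdaoum@amazon.com \n Sent: Jun 03, 2023 11:35 PM\n"
        "  To: eu-roc-ob-cancel-truck@amazon.com\n  Grettings,\n")
print(parse_mail_headers(mail)["from"])  # hamdaoum@amazon.com
```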
# END

# Clean up the model and endpoint:

In [None]:
# Delete the SageMaker model and endpoint; a running endpoint keeps incurring charges
model_predictor.delete_model()
model_predictor.delete_endpoint()