<a href="https://colab.research.google.com/github/aviadm24/aviad2/blob/master/Welcome_To_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [6]:
data = {
  "test_cases": [
    {
      "test_requirement": "Verify aircraft engine start-up sequence.",
      "steps": [
        "Ensure aircraft is in a safe condition for engine start.",
        "Turn on the battery master switch.",
        "Set the fuel control lever to the 'RUN' position.",
        "Engage the starter motor.",
        "Monitor engine parameters for normal start-up indications.",
        "Verify engine oil pressure and temperature within limits."
      ]
    },
    {
      "test_requirement": "Test aircraft navigation system accuracy.",
      "steps": [
        "Power on the navigation system.",
        "Enter the desired flight plan or waypoints.",
        "Initiate navigation mode.",
        "Compare displayed position and track with known ground truth.",
        "Verify accuracy within acceptable tolerances."
      ]
    },
    {
      "test_requirement": "Evaluate aircraft response to control inputs during flight.",
      "steps": [
        "Establish stable flight conditions.",
        "Apply aileron input and observe roll rate.",
        "Apply elevator input and observe pitch rate.",
        "Apply rudder input and observe yaw rate.",
        "Verify aircraft response is within expected parameters."
      ]
    }
  ]
}

In [7]:
import json
import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer
from tensorflow.keras.optimizers import Adam

# Load the JSON data
# with open('/content/test_data.json', 'r') as f:
#     data = json.load(f)

# Load pre-trained GPT2 model and tokenizer
model = TFGPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')



# Extract the test cases from the loaded JSON data
test_cases = data["test_cases"]

# Prepare the dataset for fine-tuning
requirements = [case["test_requirement"] for case in test_cases]
steps = [case["steps"] for case in test_cases]  # Steps remain as an array of strings

# Tokenize the test requirements (input) and steps (output)
input_encodings = tokenizer(requirements, padding=True, truncation=True, return_tensors="tf")
output_encodings = tokenizer(['\n'.join(step) for step in steps], padding=True, truncation=True, return_tensors="tf")

input_ids = input_encodings['input_ids']
output_ids = output_encodings['input_ids']

# Fine-tune the pre-trained GPT-2 model on your custom data
model.compile(optimizer=Adam(learning_rate=5e-5))

# Fine-tune the model for 3 epochs (or more depending on the size of your data)
model.fit(input_ids, output_ids, epochs=3, batch_size=8)

# Function to generate test steps in JSON format
def generate_test_steps_in_json(test_requirement):
    # Tokenize the input test requirement
    input_ids = tokenizer(test_requirement, return_tensors="tf")['input_ids']

    # Generate the test steps using the fine-tuned model
    generated_ids = model.generate(input_ids, max_length=100, num_return_sequences=1)

    # Decode the generated sequence back into readable text
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

    # Split the generated text into individual steps based on newline
    generated_steps = generated_text.split('\n')

    # Create a dictionary to hold the test steps in JSON format
    test_steps_json = {
        "test_requirement": test_requirement,
        "steps": [step.strip() for step in generated_steps if step.strip()]
    }

    # Convert the dictionary to JSON string (pretty-printing for readability)
    test_steps_json_str = json.dumps(test_steps_json, indent=4)
    return test_steps_json_str

# Example test requirement
test_requirement = "Test aircraft performance during high-speed flight."

# Generate the corresponding test steps in JSON format
generated_steps_json = generate_test_steps_in_json(test_requirement)

# Print the generated test steps as JSON
print(generated_steps_json)


All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.