# Generate Scenarios, Step 1

This notebook generates the baseline design scenarios, which are later augmented with design details specific to a verifiable requirement. The procedure consists of the following steps:

1. Read the complete app descriptions downloaded from the app store.
2. Summarize each complete app description in a single sentence.
3. Identify the main user and app actions in each complete app description.
4. Identify prospective data types processed by each identified action.
5. Rewrite the summary, action, and relevant data types into a brief scenario.
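Each step adds fields to a per-app record. The sketch below mirrors the JSON layout that the cells in this notebook read and write: a `summary` string plus an `actions` list of `[action, datatypes]` pairs. `ScenarioRecord`, `add_action` and `brief_inputs` are hypothetical names used only for illustration.

```python
from typing import List, TypedDict


# Hypothetical type mirroring the JSON records written by this notebook:
# {"summary": str, "actions": [[action_text, [datatype, ...]], ...]}
class ScenarioRecord(TypedDict, total=False):
    summary: str          # step 2: one-sentence summary
    actions: List[list]   # steps 3-4: [action, datatypes] pairs


def add_action(record: ScenarioRecord, action: str) -> None:
    """Append an action with an empty data-type list, as Prompt 2 does."""
    record.setdefault("actions", []).append([action, []])


def brief_inputs(record: ScenarioRecord) -> List[tuple]:
    """Return (summary, action, datatypes) triples for step 5,
    skipping actions whose data types are still empty."""
    return [
        (record["summary"], action, datatypes)
        for action, datatypes in record.get("actions", [])
        if datatypes
    ]


record: ScenarioRecord = {"summary": "The mobile app helps users learn a language."}
add_action(record, "The user completes daily lessons.")
record["actions"][0][1] = ["lessons completed", "native language"]  # step 4
print(brief_inputs(record))
```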


In [1]:
#%pip install langchain
#%pip install langserve[all]

In [2]:
import json

ds_name = 'apple_app'
#ds_name = 'google_play'

app_desc_file = 'scenarios/%s_eu_200_en.json' % ds_name

# keep only the description text of each app
with open(app_desc_file, 'r') as f:
    raw_data = [d['description'] for d in json.load(f)]

print('Read %i App descriptions from: %s' % (len(raw_data), app_desc_file))

Read 200 App descriptions from: scenarios/apple_app_eu_200_en.json


In [3]:
from langchain.chat_models import ChatOpenAI
import os

model = ChatOpenAI(
    openai_api_key = os.environ["OPENAI_API_KEY"],
    model_name = 'gpt-3.5-turbo'
)

## Prompt 1: Summarize the complete app description

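Prompt the model to summarize each complete app description in a single sentence, referring to the app as "The mobile app" rather than by name. The summaries are written to both `scenarios/<ds_name>_scenarios1.json` and `scenarios/<ds_name>_scenarios2.json`; the second file is the working copy that Prompt 2 reads and extends.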
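The prompt only asks the model to avoid the app's name, so some summaries may not comply; the summary printed at the end of this notebook, for instance, still mentions a Telenet account. A rough heuristic screen (not part of the original notebook) could flag summaries that do not start with "The mobile app" or that seem to contain more than one sentence. It cannot detect the app name itself, since only the description text is loaded. `flag_summary` and its naive sentence-boundary rule are assumptions, and abbreviations such as "e.g." followed by a capital letter will cause false positives.

```python
import re
from typing import List


def flag_summary(summary: str) -> List[str]:
    """Return heuristic problems found in a supposedly one-sentence summary."""
    problems = []
    text = summary.strip()
    if not text.startswith("The mobile app"):
        problems.append("does not start with 'The mobile app'")
    # Naive boundary rule: '.', '!' or '?' followed by whitespace and a
    # capital letter is treated as the start of another sentence.
    if re.search(r"[.!?]\s+[A-Z]", text):
        problems.append("appears to contain more than one sentence")
    return problems


print(flag_summary("The mobile app lets users pay bills. It also tracks usage."))
```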

In [4]:
from langchain.prompts.chat import ChatPromptTemplate

prompt1 = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful assistant.'),
    ('human', """Summarize the following app description in one sentence. Refer to the app as 'The mobile app' and do not refer to the app's name. Do not comment or elaborate.

App Description: {app_desc}""")
])

chain = prompt1 | model

dataset = []
for i, app_desc in enumerate(raw_data):
    response = chain.invoke({'app_desc': app_desc})
    dataset.append({'summary': response.content})
    print(i)
    
print('Acquired %i app summaries.' % len(dataset))

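# write dataset to file: scenarios1 keeps the Prompt 1 output as-is,
# scenarios2 is the working copy that Prompt 2 reads and extends
with open('scenarios/%s_scenarios1.json' % ds_name, 'w+') as f:
    json.dump(dataset, f)
with open('scenarios/%s_scenarios2.json' % ds_name, 'w+') as f:
    json.dump(dataset, f)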

Acquired 200 app summaries.


## Prompt 2: Identify actions from the complete app description

Prompt the model to identify the user and app actions in each complete app description. The actions are requested as a JSON list so that they can be parsed and reused in later steps.
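Models sometimes wrap a JSON reply in a Markdown code fence or add text around it, which is why the reply is piped through an output parser. As a rough standard-library illustration of that parsing step (not LangChain's `JsonOutputParser` implementation), the hypothetical `parse_json_list` below strips an optional fence, parses the remainder with `json.loads`, and accepts the result only if it is a list of strings.

```python
import json
import re
from typing import List, Optional

FENCE = "`" * 3  # built this way so the snippet can sit inside a Markdown fence


def parse_json_list(reply: str) -> Optional[List[str]]:
    """Parse a model reply that should be a JSON list of strings.

    Returns None when the reply cannot be parsed or has the wrong shape.
    """
    text = reply.strip()
    # Drop an optional opening fence (with or without a language tag)
    # and an optional closing fence.
    if text.startswith(FENCE):
        text = text[len(FENCE):]
        text = re.sub(r"^[A-Za-z]*\s*", "", text, count=1)
    if text.endswith(FENCE):
        text = text[: -len(FENCE)]
    try:
        value = json.loads(text.strip())
    except json.JSONDecodeError:
        return None
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return value
    return None
```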

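If the prompt loop appears to hang, interrupt the execution and re-run the cell. The cell is designed to recover: it skips app descriptions that were already processed, i.e., entries that already have an `actions` key. Entries whose response failed to parse are stored with an empty action list, so they are also skipped on a re-run.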
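The recovery relies on the partially filled `dataset` staying in kernel memory; it is written to disk only in a later cell, so a kernel restart loses the progress. A minimal disk-backed variant is sketched below, assuming it is acceptable to rewrite a checkpoint file every few items. `run_resumable`, `process` and `checkpoint_every` are hypothetical names, and `process` stands in for the `chain.invoke` call.

```python
import json
import os
from typing import Callable, List


def run_resumable(items: List[str], path: str,
                  process: Callable[[str], list],
                  checkpoint_every: int = 10) -> List[dict]:
    """Fill an 'actions' key for each item, checkpointing progress to `path`."""
    # Resume from the checkpoint if one exists, otherwise start fresh.
    if os.path.exists(path):
        with open(path) as f:
            records = json.load(f)
    else:
        records = [{} for _ in items]

    for i, item in enumerate(items):
        if "actions" in records[i]:
            continue  # already processed in an earlier run
        records[i]["actions"] = [[a, []] for a in process(item)]
        if (i + 1) % checkpoint_every == 0:
            with open(path, "w") as f:
                json.dump(records, f)

    # Final save so the items after the last checkpoint are not lost.
    with open(path, "w") as f:
        json.dump(records, f)
    return records
```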

In [5]:
dataset = json.load(open('scenarios/%s_scenarios2.json' % ds_name, 'r'))
print('Read %i app summaries' % len(dataset))

Read 200 app summaries


In [6]:
from langchain.prompts.chat import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

prompt2 = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful assistant.'),
    ('human', """Identify the main actions described in the following app description, and respond with a list in JSON format where each item in the list is a user or app action description. Each action should have a subject identifying who performs the action. Do not comment or elaborate. The JSON output should look like the following example.
    
Example: ["The user reads and responds to posts by their friends.", "The app provides support for multiple file formats.", "The user creates a video from their favorite dance moves.", "The app instructs the user how to speak in a foreign language."]


App Description: {app_desc}""")
])

parser = JsonOutputParser(return_exceptions=True)
chain = prompt2 | model | parser
parse_errors2 = []

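for i, app_desc in enumerate(raw_data):
    # skip entries that already have actions, so a re-run resumes where it stopped
    if 'actions' in dataset[i]:
        print('%i -' % i)
        continue

    try:
        response = chain.invoke({'app_desc': app_desc})
        # pair each action with an empty data-type list, filled in by Prompt 3
        dataset[i]['actions'] = [[a, []] for a in response]

    except Exception:
        # record the failure; the empty list marks the entry as processed
        dataset[i]['actions'] = []
        parse_errors2.append(i)
        print('%i *' % i)
        continue

    print(i)

print('Processed %i summaries, encountered %i errors' % (len(dataset), len(parse_errors2)))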


Processed 200 summaries, encountered 0 errors


In [7]:
# write dataset to file
json.dump(dataset, open('scenarios/%s_scenarios2.json' % ds_name, 'w+'))

In [8]:
# export the actions to a CSV file for visual inspection

import csv

with open('scenarios/%s_scenarios2.csv' % ds_name, 'w+', newline='') as f:
    writer = csv.writer(f)
    for scenario in dataset:
        for action, _ in scenario['actions']:
            writer.writerow([scenario['summary'], action])

## Prompt 3: Identify data types from each identified action

Prompt the model to generate a list of data types that could be processed by each action. The data types are requested as a JSON list so that they can be parsed and reused. Remove any actions whose data types fail to parse, or that parse as empty lists.
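The removal rule in the previous sentence can be expressed as a small filter. The sketch below assumes the record layout used throughout this notebook and treats anything other than a non-empty list of strings (for example `None`, a bare string, or a parser error object) as a parse failure. `keep_valid_actions` is a hypothetical helper and is stricter than the truthiness test applied later in the notebook.

```python
from typing import List, Tuple


def keep_valid_actions(dataset: List[dict]) -> Tuple[List[dict], int]:
    """Drop actions without a usable data-type list, then drop empty scenarios.

    Returns the filtered dataset and the number of scenarios removed.
    """
    kept = []
    for entry in dataset:
        actions = [
            [action, datatypes]
            for action, datatypes in entry.get("actions", [])
            if isinstance(datatypes, list) and datatypes
            and all(isinstance(d, str) for d in datatypes)
        ]
        if actions:
            kept.append({**entry, "actions": actions})
    return kept, len(dataset) - len(kept)
```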

In [9]:
from langchain.prompts.chat import ChatPromptTemplate
from langchain.output_parsers.json import SimpleJsonOutputParser

dataset = json.load(open('scenarios/%s_scenarios2.json' % ds_name, 'r'))

def generate_datatypes(dataset):
    prompt3 = ChatPromptTemplate.from_messages([
        ('system', 'You are a helpful assistant.'),
        ('human', """For the following actions, list the personal data processed by the action. Respond with a list in JSON format where each item in the list is a personal data. Do not comment or elaborate.

    Action: The user reads and responds to posts by their friends.
    Personal data: ["posts", "shared photos", "links to articles"]

    Action: The user creates a video from their favorite dance moves.
    Personal data: ["videos", "dance styles", "music interests", "clothing preferences"]

    Action: The app instructs the user how to speak in a foreign language.
    Personal data: ["native language", "foreign language", "travel interests", "inferred gender"]

    Action: {action}
    Personal data: """)
    ])

    # `parser` is the JsonOutputParser created for Prompt 2
    chain = prompt3 | model | parser
    parse_errors3 = []
    action_count = 0

    for i, entry in enumerate(dataset):
        for j in range(len(entry['actions'])):
            # skip actions whose data types were filled in an earlier run
            if len(entry['actions'][j][1]) > 0:
                continue

            action_count += 1
            try:
                # invoke inside the try block so that model or parse errors
                # are recorded instead of stopping the loop
                datatypes = chain.invoke({'action': entry['actions'][j][0]})
                entry['actions'][j][1] = datatypes

            except Exception:
                parse_errors3.append([i, j])
        print(i)

    print('Processed %i actions, encountered %i errors' % (action_count, len(parse_errors3)))
    return dataset

# generate the data types and write the dataset to file
dataset = generate_datatypes(dataset)
with open('scenarios/%s_scenarios3.json' % ds_name, 'w+') as f:
    json.dump(dataset, f)

Processed 1928 actions, encountered 0 errors


## Remove incomplete entries from the dataset

In [10]:
dataset = json.load(open('scenarios/%s_scenarios3.json' % ds_name, 'r'))

removed_scenarios = 0

for i in reversed(range(len(dataset))):
    # drop actions whose data types are missing or empty; a truthiness test
    # covers both None and an empty list, so no separate length check is needed
    dataset[i]['actions'] = [a for a in dataset[i]['actions'] if a[1]]

    # drop scenarios left without any actions
    if len(dataset[i]['actions']) == 0:
        dataset.pop(i)
        removed_scenarios += 1

print('Removed %i scenarios with no actions, yielding %i scenarios' % (
    removed_scenarios, len(dataset)))
json.dump(dataset, open('scenarios/%s_scenarios4.json' % ds_name, 'w+'))

Removed 5 scenarios with no actions, yielding 195 scenarios


In [11]:
# export the actions to a CSV file for visual inspection

import csv

dataset = json.load(open('scenarios/%s_scenarios4.json' % ds_name))

row_count = 0
with open('scenarios/%s_scenarios4.csv' % ds_name, 'w+', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['summary', 'actions', 'datatypes'])
    for scenario in dataset:
        for [action, datatypes] in scenario['actions']:
            writer.writerow([scenario['summary'], action, datatypes])
            row_count += 1

print('Wrote %i rows.' % row_count)

Wrote 1156 rows.


In [12]:
# print the first action and its data types for five randomly chosen scenarios

from random import randrange

for i in range(0, 5):
    j = randrange(len(dataset))
    print(dataset[j]['actions'][0])
    print()

['The user customizes units for speed, altitude, visibility, temperature, and atmospheric pressure.', ['preferred units', 'location']]

['The user creates a customized package of their favorite content to enjoy on all their devices.', ['content preferences', 'devices used', 'viewing habits']]

["The user can create their own user account to track Aki Awards, unlocked accessories, and Genizs' balance.", ['user account information', 'Aki Awards progress', 'unlocked accessories', 'Genizs balance']]

["The user learns a new language through Duolingo's educational app.", ['language learning progress', 'lessons completed', 'time spent on exercises', 'correct/incorrect responses']]

['The user saves all user credentials in the Keychain for convenience.', ['user credentials']]



## Prompt 4: Rewrite responses into a brief scenario

Prompt the model to rewrite the summary, action, and relevant data types into a brief scenario. The prompt instructs the model not to name users or the app, and to minimize overly expressive language (e.g., superlatives seen in marketing material).
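In the template, `{datatypes}` receives the Python list itself, so the model most likely sees its string form with brackets and quotes (the same form as the `Datatypes:` line printed at the end of this notebook). The sketch below reproduces the substitution with plain `str.format`, which is an assumption about how the template is filled, and adds a hypothetical `join` option that renders the data types as prose instead; the joined form is not what this notebook sends.

```python
from typing import List

TEMPLATE = ("Information: {summary} {action} "
            "The app uses {datatypes} to perform this function.")


def render(summary: str, action: str, datatypes: List[str],
           join: bool = False) -> str:
    """Fill the Prompt 4 'Information' line the way str.format would."""
    if join:
        # Hypothetical alternative: "a, b and c" instead of "['a', 'b', 'c']".
        if len(datatypes) > 1:
            value = ", ".join(datatypes[:-1]) + " and " + datatypes[-1]
        else:
            value = "".join(datatypes)
    else:
        value = datatypes  # str.format inserts str(list), brackets included
    return TEMPLATE.format(summary=summary, action=action, datatypes=value)


print(render("S.", "A.", ["internet speed", "modem status"]))
print(render("S.", "A.", ["internet speed", "modem status"], join=True))
```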

Perform the rewrite for a random sample of 200 items, where each item is a (summary, action, data types) triple drawn from all scenarios.
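The sampling cell calls `random.sample` without a seed, so each run draws a different set of items. If the sample should be reproducible, a seeded generator is one option. The sketch below flattens the records into `[summary, action, datatypes]` triples, as the next cell does, and samples from a private `random.Random` instance; `sample_items` and the default `seed` value are assumptions.

```python
import random
from typing import List


def sample_items(dataset: List[dict], k: int, seed: int = 0) -> List[list]:
    """Flatten records into [summary, action, datatypes] triples and sample k.

    Raises ValueError if k exceeds the number of available triples.
    """
    itemized = [
        [entry["summary"], action, datatypes]
        for entry in dataset
        for action, datatypes in entry["actions"]
    ]
    rng = random.Random(seed)  # private generator: same seed, same sample
    return rng.sample(itemized, k)
```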

In [13]:
import random

dataset = json.load(open('scenarios/%s_scenarios4.json' % ds_name))

# itemize scenario elements for random sampling
itemized = []
for entry in dataset:
    for action, datatypes in entry['actions']:
        itemized.append([entry['summary'], action, datatypes])
        
samples = random.sample(itemized, 200)

In [14]:
with open('scenarios/%s_sample.csv' % ds_name, 'w+', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['summary', 'actions', 'datatypes'])
    for summary, action, datatypes in samples:
        writer.writerow([summary, action, datatypes])

In [15]:
prompt4 = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful assistant.'),
    ('human', """Rewrite the following sentences into a brief user scenario in third person. Refer to the user or app, but do not mention names. Do not mention the name of the app. Minimize overly expressive language.
    
Information: {summary} {action} The app uses {datatypes} to perform this function.

Scenario: """)
])

chain = prompt4 | model
scenarios = []

for i, [summary, action, datatypes] in enumerate(samples):
    response = chain.invoke({
        'summary': summary,
        'action': action,
        'datatypes': datatypes
    })

    scenarios.append(response.content)
    print(i)
        
print('Rewrote %i scenarios.' % len(scenarios))

Rewrote 200 scenarios.


In [16]:
json.dump(scenarios, open('scenarios/%s_sample.json' % ds_name, 'w+'))

In [17]:
# show the inputs and the rewritten scenario for the last sampled item
print('Summary: %s\nAction: %s\nDatatypes: %s\n' % (
    summary, action, datatypes))
print('Scenario: %s\n\n' % response.content)

Summary: The mobile app offers convenient access to check usage, manage Wi-Fi network, pay bills, and oversee services with ease, ensuring users are always up to date on their Telenet account.
Action: The user measures internet speed or reboots the modem via the app.
Datatypes: ['internet speed', 'modem status']

Scenario: The user quickly checks their internet speed and reboots the modem using the app. They easily manage their Wi-Fi network, pay bills, and stay updated on their account services.


