# Using AI/ML to aid in verification
### Pilot - Abishek Shyamsunder

# Contents  
| S.no | Contents                                           |
|------|----------------------------------------------------|
| 1    | Problem areas Identified                           |
| 2    | Current Implementations and corresponding problems |
| 3    | Innovations to try                                 |
| 4    | Scope and Limitations                              |

### Problem areas Identified
___  

## Areas that require Intervention of AI/ML  
1. Generating code that covers as much of the search space as possible (in search of errors)  
    - In the best case, 100% of the code is tested.
    - All possible instructions
    - All possible combinations of instructions (the order in which instructions are executed)
2. Identifying code that throws errors, and generating further code that either magnifies the error or at minimum maintains the same error rate.  
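
To give a feel for why exhaustive coverage of instruction orderings is out of reach, here is a minimal sketch that counts ordered sequences. The figure of 5 instruction kinds is an assumption borrowed from the toy block set used later in this notebook; a real instruction set is far larger.

```python
def count_sequences(num_instructions: int, length: int) -> int:
    """Number of distinct ordered instruction sequences of exactly `length`."""
    return num_instructions ** length


def count_sequences_up_to(num_instructions: int, max_length: int) -> int:
    """Number of distinct ordered sequences of length 1..max_length."""
    return sum(count_sequences(num_instructions, k) for k in range(1, max_length + 1))


# Assumed toy setting: 5 instruction kinds.
for length in (1, 5, 10, 50):
    print(length, count_sequences(5, length))
```

The count grows exponentially with program length, which is the motivation for guiding generation instead of enumerating every program.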


### Problem areas Identified  
___  
Solutions  
- Problem 1:  
    - Divide code into blocks, each corresponding to a specific area (such as branching, looping, etc.)    
    - Use AI to generate code uniformly across all blocks, so that testing proceeds in a spiral manner  
- Problem 2:  
    - Implement the solution as a subproblem of the previous problem  
        - Identify the blocks that produced errors and generate code concentrating on them  
    - Use natural language processing to generate code similar to the erroneous code  
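
As a rough illustration of dividing code into blocks, the sketch below maps instruction mnemonics to block ids and encodes a program as a block sequence. The mnemonics and the grouping in `BLOCK_OF` are hypothetical placeholders; the actual segregation of blocks is still an open question in this notebook.

```python
from collections import Counter

# Hypothetical mnemonic -> block id table (not a real grouping).
BLOCK_OF = {
    "add": 0, "sub": 0,   # block 0: arithmetic
    "beq": 1, "bne": 1,   # block 1: branching
    "lw": 2, "sw": 2,     # block 2: memory access
}


def to_block_sequence(program):
    """Encode a list of mnemonics as the list of block ids they belong to."""
    return [BLOCK_OF[op] for op in program]


def block_counts(program):
    """Count how many instructions of each block a program contains."""
    return Counter(to_block_sequence(program))


program = ["add", "beq", "lw", "sub", "bne"]
print(to_block_sequence(program))
print(block_counts(program))
```

A block sequence of this kind is the same integer encoding that the training data in the later cells uses.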

### Current Implementations and corresponding problems  
___ 

1. Any problem that requires the computer to understand a piece of written text falls under NLP (Natural Language Processing)    
2. AI/ML solutions are usually built on top of existing solutions (never from scratch, because of the extensive data requirements)  
3. Most existing solutions cater to human languages such as English, French, and German  
4. Developing a system that works with code is not easy and takes time!

### Problem 1: Generating code to match exhaustive testing  
___  
The code in the cells below mimics the basic abstract behaviour of the proposed algorithm  

In [5]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd

In [38]:
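# Five abstract code blocks (e.g. branching, looping), encoded as the integers 0-4
blocks = np.array([0,1,2,3,4])

# Inputs: for each block, 10000 sequences of length 50 in which that block dominates (p = 0.5)
#training_x1 = np.random.choice(blocks, size = (10000,50), p=[0.2, 0.2, 0.2, 0.2, 0.2])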
training_x2 = np.random.choice(blocks, size = (10000,50), p=[0.5, 0.125, 0.125, 0.125, 0.125])
training_x3 = np.random.choice(blocks, size = (10000,50), p=[0.125, 0.5, 0.125, 0.125, 0.125])
training_x4 = np.random.choice(blocks, size = (10000,50), p=[0.125, 0.125, 0.5, 0.125, 0.125])
training_x5 = np.random.choice(blocks, size = (10000,50), p=[0.125, 0.125, 0.125, 0.5, 0.125])
training_x6 = np.random.choice(blocks, size = (10000,50), p=[0.125, 0.125, 0.125, 0.125, 0.5])

# Labels: the id of the dominant block for the matching training_x set
#training_y1 = np.random.choice([0,1,2,3,4],size=(10000,1),p=[0.2, 0.2, 0.2, 0.2, 0.2])
training_y2 = np.random.choice([0,1,2,3,4],size=(10000,1),p=[1, 0, 0, 0, 0])
training_y3 = np.random.choice([0,1,2,3,4],size=(10000,1),p=[0, 1, 0, 0, 0])
training_y4 = np.random.choice([0,1,2,3,4],size=(10000,1),p=[0, 0, 1, 0, 0])
training_y5 = np.random.choice([0,1,2,3,4],size=(10000,1),p=[0, 0, 0, 1, 0])
training_y6 = np.random.choice([0,1,2,3,4],size=(10000,1),p=[0, 0, 0, 0, 1])


In [73]:
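# Stack the five per-block datasets into a single training set
train_x = np.concatenate((training_x2,training_x3,training_x4,training_x5,training_x6), axis=0)
train_y = np.concatenate((training_y2,training_y3,training_y4,training_y5,training_y6), axis=0)

# Add a trailing feature axis: the RNN expects (samples, timesteps, features)
train_x = train_x.reshape(50000,50,1)
train_y = train_y.reshape(50000,1)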

In [74]:
# Two stacked SimpleRNN layers and a 5-way softmax: predicts which block dominates a sequence
model1c = tf.keras.models.Sequential([
    tf.keras.layers.SimpleRNN(units=20,return_sequences=True,input_shape=[None,1]),
    tf.keras.layers.SimpleRNN(units=20),
    tf.keras.layers.Dense(units=5,activation="softmax")
])
model1c.compile(loss="sparse_categorical_crossentropy",optimizer="adam",metrics=['accuracy'])
model1c.fit(train_x,train_y,epochs=10,verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x144610390>

In [96]:
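import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # reduce TensorFlow log output
# Start from a seed sequence of three blocks and extend it by 50 blocks
x = np.array([1,2,3])
x = x.reshape(1,3,1)
for i in range(50):
    # The model scores how dominant each block is in the sequence so far;
    # argmin appends the block judged least dominant, nudging towards uniform coverage
    y = np.argmin(model1c.predict(x, verbose=0),axis=-1)
    y = y.reshape(1,1,1)
    x = np.append(x,y,axis=1)
print(x)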

[[[1]
  [2]
  [3]
  [0]
  [0]
  [0]
  [4]
  [0]
  [4]
  [1]
  [4]
  [2]
  [0]
  [4]
  [1]
  [3]
  [2]
  [2]
  [1]
  [4]
  [0]
  [4]
  [1]
  [4]
  [1]
  [4]
  [1]
  [4]
  [1]
  [3]
  [2]
  [2]
  [1]
  [4]
  [1]
  [4]
  [1]
  [0]
  [4]
  [3]
  [2]
  [2]
  [0]
  [4]
  [1]
  [4]
  [1]
  [4]
  [1]
  [4]
  [1]
  [2]
  [2]]]
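
The generation loop above can be mimicked without a trained network, which makes the intent of the `argmin` step easier to see. In this sketch `dominant_block_scores` is a hypothetical stand-in for `model1c.predict`: it returns the exact share of each block, whereas the real model only approximates this. It is an illustration under that assumption, not a replacement for the model.

```python
from collections import Counter

NUM_BLOCKS = 5


def dominant_block_scores(sequence):
    """Stand-in for the model: the share of each block in the sequence so far."""
    counts = Counter(sequence)
    return [counts[block] / len(sequence) for block in range(NUM_BLOCKS)]


def extend_uniformly(seed, steps):
    """Append `steps` blocks, each time choosing the least dominant block."""
    sequence = list(seed)
    for _ in range(steps):
        scores = dominant_block_scores(sequence)
        # argmin over the scores; ties resolve to the lowest block id
        sequence.append(scores.index(min(scores)))
    return sequence


generated = extend_uniformly([1, 2, 3], 50)
print(Counter(generated))
```

With exact scores, the block counts never drift more than one apart; the trained model's output is less regular because its scores are learned estimates.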


### Problem 1: Generating code to match exhaustive testing  
___  
Finer-grained control can be obtained by using sub-blocks as well as blocks.  
For example, if block 1 represents branch instructions, we can assign sub-blocks 1.1, 1.2, 1.3, ... each representing a different type of branch instruction.  
We can also keep the model modular: the above program can run perennially and output test programs at fixed intervals, each an increment of the previous output...  
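
One possible way to sketch both ideas: flatten sub-block labels into integer tokens, and wrap generation in a Python generator that emits a snapshot of the growing program at fixed intervals. The `SUB_BLOCKS` layout, the `interval` parameter, and the `round_robin` chooser (standing in for the trained model) are all assumptions made for illustration.

```python
from itertools import islice

# Hypothetical block -> sub-block layout.
SUB_BLOCKS = {
    0: ["0.1", "0.2"],
    1: ["1.1", "1.2", "1.3"],
}


def flatten(sub_blocks):
    """Assign a distinct integer token to every sub-block label."""
    labels = [label for block in sorted(sub_blocks) for label in sub_blocks[block]]
    return {label: token for token, label in enumerate(labels)}


def snapshots(next_token, seed, interval):
    """Yield a copy of the growing program after every `interval` new tokens."""
    program = list(seed)
    while True:
        for _ in range(interval):
            program.append(next_token(program))
        yield list(program)


tokens = flatten(SUB_BLOCKS)


def round_robin(program):
    """Stand-in for the model: cycle through the tokens in order."""
    return len(program) % len(tokens)


for program in islice(snapshots(round_robin, [], interval=10), 3):
    print(len(program), program[-5:])
```

Each snapshot is a prefix of the next one, so every emitted test program is an increment of the previous output.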

### Problem 2: Generating code, similar to one that throws errors  
___  
Key point to note from the implementation of the previous solution:  
`training_x2 = np.random.choice(blocks, size = (10000,50), p=[0.5, 0.125, 0.125, 0.125, 0.125])`  
- Given a set of blocks / sub-blocks, we can decide what percentage of the output each block should make up, depending on whether the block produced an error while running.  
- If a particular 'sequence of blocks' seems to throw errors, we can define that sequence as a new individual block and work from there.  
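
The two points above could be sketched as follows. `error_weighted_probabilities` turns per-block error counts into sampling weights of the kind passed as `p=` in the training cells, and `promote_sequence` replaces an error-prone sequence with a new block id. Both helper names, the `floor` parameter, and the example counts are assumptions for illustration, not part of the original implementation.

```python
import random


def error_weighted_probabilities(error_counts, floor=0.05):
    """Map per-block error counts to sampling probabilities.

    `floor` (an assumed knob) keeps error-free blocks reachable.
    """
    total = sum(error_counts)
    n = len(error_counts)
    if total == 0:
        return [1.0 / n] * n  # no errors observed yet: sample uniformly
    raw = [floor + count / total for count in error_counts]
    norm = sum(raw)
    return [value / norm for value in raw]


def promote_sequence(program, pattern, new_block):
    """Replace each occurrence of `pattern` in `program` with `new_block`."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    out, i, n = [], 0, len(pattern)
    while i < len(program):
        if program[i:i + n] == pattern:
            out.append(new_block)
            i += n
        else:
            out.append(program[i])
            i += 1
    return out


# Made-up example: block 0 produced most of the observed errors.
probs = error_weighted_probabilities([6, 1, 1, 1, 1])
sample = random.choices(range(5), weights=probs, k=50)
print(probs)
print(promote_sequence([1, 4, 1, 4, 2], [1, 4], 5))
```

Blocks with more observed errors receive a larger share of the generated sequence, while the floor prevents the remaining blocks from disappearing entirely.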


# Problems / Areas where I need help...  
- Help in identifying and segregating the blocks and sub-blocks  
    - Ideas for keeping the code modular at all times  
- Ideas on how to limit the size of the program, since with this algorithm the output program keeps growing.  