# Learning to localize and repair real bugs from real bug fixes
This accompanying notebook is an interactive demo of the RealiT bug localization and repair model. The goal is to localize and repair single-token bugs in Python 3 code.

## Setup

In [4]:
!git clone https://github.com/cedricrupb/nbfbaselines tmp/

In [None]:
%cd tmp

%pip install -r requirements.txt

import nbfbaselines

## RealiT

In [1]:
# Load the pre-trained RealiT checkpoint
from nbfbaselines import NBFModel

realit_model = NBFModel.from_pretrained("realit")

Now, we are ready for the first test. Let's start with a simple example:

In [None]:
# To run RealiT, we simply call the object with the Python 3 code for the analysis

realit_model("""

def f(x, y):
    return x + x

""")

[{'text': 'def f ( x , y ) :\n    return x + y\n    ',
  'before': 'x',
  'after': 'y',
  'prob': 0.981337789216316}]

If you run the cell above, you will see that RealiT detects that we most likely want to use `y` in the addition. It is also very confident (probability of about 0.98) that adding `x` to itself is not what was intended.
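The call returns a list of dictionaries, one per candidate repair, with the keys `text`, `before`, `after` and `prob` shown in the output above. As a minimal sketch of consuming such a result, the hypothetical helper `describe_repair` below formats one candidate as a short message; it works on a copy of the printed output, not on a live model call.

```python
# Copy of the prediction printed above (not a live model call)
prediction = [{'text': 'def f ( x , y ) :\n    return x + y\n    ',
               'before': 'x',
               'after': 'y',
               'prob': 0.981337789216316}]


def describe_repair(candidate: dict) -> str:
    """Hypothetical helper: render one candidate repair as a short message."""
    return "replace {!r} with {!r} (p={:.2f})".format(
        candidate['before'], candidate['after'], candidate['prob'])


print(describe_repair(prediction[0]))  # replace 'x' with 'y' (p=0.98)
```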

## More examples
Now, we try more examples from the README.

**Variable Misuse:**

In [None]:
realit_model("""

def compare(x1, x2):
    s1 = str(x1)
    s2 = str(x1)
    return s1 == s2

""")

[{'text': 'def compare ( x1 , x2 ) :\n    s1 = str ( x1 )\n    s2 = str ( x2 )\n    return s1 == s2\n    ',
  'before': 'x1',
  'after': 'x2',
  'prob': 0.9991145826164102}]
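To see why this variable misuse matters, we can run both versions of `compare` directly in plain Python, independently of RealiT. The names `compare_buggy` and `compare_fixed` are illustrative only.

```python
def compare_buggy(x1, x2):
    s1 = str(x1)
    s2 = str(x1)  # bug: x1 is used twice, x2 is ignored
    return s1 == s2


def compare_fixed(x1, x2):
    s1 = str(x1)
    s2 = str(x2)  # repair suggested by RealiT
    return s1 == s2


print(compare_buggy(1, 2))  # True, although 1 and 2 differ
print(compare_fixed(1, 2))  # False
```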

**Binary Operator Bug:**

In [None]:
realit_model("""

def add_one(L):
    i = 0
    while i <= len(L): 
        L[i] = L[i] + 1
        i += 1

""")

[{'text': 'def add_one ( L ) :\n    i = 0\n    while i < len ( L ) :\n        L [ i ] = L [ i ] + 1\n        i += 1\n        ',
  'before': '<=',
  'after': '<',
  'prob': 0.9939126001029219}]
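The `<=` comparison is an off-by-one error: in its last iteration the loop accesses `L[len(L)]`, which is out of range. A small plain-Python check (function names are illustrative) makes the failure visible:

```python
def add_one_buggy(L):
    i = 0
    while i <= len(L):  # bug: also visits index len(L)
        L[i] = L[i] + 1
        i += 1


def add_one_fixed(L):
    i = 0
    while i < len(L):  # repair suggested by RealiT
        L[i] = L[i] + 1
        i += 1


def raises_index_error(fn, values):
    """Return True if calling fn(values) raises an IndexError."""
    try:
        fn(values)
    except IndexError:
        return True
    return False


data = [1, 2, 3]
add_one_fixed(data)
print(data)  # [2, 3, 4]
print(raises_index_error(add_one_buggy, [1, 2, 3]))  # True
```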

**Unary Operator Bug:**

In [None]:
realit_model("""

if namespace:
    self.namespacesFilter = [ "prymatex", "user" ] 
else:
    self.namespacesFilter = namespace.split()

""")

[{'text': 'if not namespace :\n    self . namespacesFilter = [ "prymatex" , "user" ]\n    \nelse :\n    self . namespacesFilter = namespace . split ( )\n    ',
  'before': 'namespace',
  'after': 'not namespace',
  'prob': 0.9006187905922454}]

As this example shows, RealiT can also handle code fragments that are not wrapped in a function definition. However, RealiT usually performs better when it is given the complete function.

**Wrong Literal Bug:**

In [None]:
realit_model("""

def add_one(L):
    i = 0
    while i < len(L): 
        L[i] = L[i] + 1
        i += 2 

""")

[{'text': 'def add_one ( L ) :\n    i = 0\n    while i < len ( L ) :\n        L [ i ] = L [ i ] + 1\n        i += 1\n        ',
  'before': '2',
  'after': '1',
  'prob': 0.8798252121683507}]
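Unlike the previous example, this bug does not crash: with `i += 2` the loop silently skips every second element. A short plain-Python run of the buggy loop (the function name is illustrative) shows the effect:

```python
def add_one_step2(L):
    i = 0
    while i < len(L):
        L[i] = L[i] + 1
        i += 2  # bug: skips every second element


data = [1, 2, 3, 4]
add_one_step2(data)
print(data)  # [2, 2, 4, 4] instead of [2, 3, 4, 5]
```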

**Additional: Correct program:**

In the following, we look at a simple program that adds two numbers together:

In [6]:
realit_model('''

def add(x, y):
    """Adds two numbers x and y"""
    return x + y

''')

[{'text': 'def add ( x , y ) :\n    """Adds two numbers x and y"""\n    return x + y\n    ',
  'before': '[CLS]',
  'after': '#norepair#',
  'prob': 0.386080474034792}]

Here, the model predicts that the given code is correct (indicated by `#norepair#`), although with a comparatively low probability of about 0.39.
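When post-processing results, a "code looks correct" verdict can be recognized by the `#norepair#` marker in the `after` field of the top candidate. A minimal sketch, assuming the output format shown above; `predicts_correct` is a hypothetical helper, not part of `nbfbaselines`.

```python
NO_REPAIR = '#norepair#'  # marker RealiT emits when it proposes no change

# Copy of the prediction printed above (not a live model call)
prediction = [{'text': 'def add ( x , y ) :\n    """Adds two numbers x and y"""\n    return x + y\n    ',
               'before': '[CLS]',
               'after': '#norepair#',
               'prob': 0.386080474034792}]


def predicts_correct(candidates: list) -> bool:
    """Hypothetical helper: True if the top candidate is the no-repair hypothesis."""
    return bool(candidates) and candidates[0]['after'] == NO_REPAIR


print(predicts_correct(prediction))  # True
```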

## Limitations
While RealiT solves all of the previous examples, it can miss a bug, or report a bug in correct code, if the given implementation context is not sufficient.

In [None]:
realit_model("""

def add(x, y):
    return x + y

""")

[{'text': 'def add ( x , y ) :\n    return x * y\n    ',
  'before': '+',
  'after': '*',
  'prob': 0.31847917865549624}]

Here, although it is not confident in its decision (probability of about 0.32), RealiT still predicts that the plus operator should be changed into a multiplication operator, even though the code is correct.
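Because low-confidence predictions like this one can be false alarms, one option is to act only on repairs whose probability exceeds a threshold. The sketch below assumes the output format shown above; the threshold of 0.5 and the helper name `confident_repairs` are illustrative assumptions, not values or APIs provided by RealiT.

```python
# Copy of the prediction printed above (not a live model call)
prediction = [{'text': 'def add ( x , y ) :\n    return x * y\n    ',
               'before': '+',
               'after': '*',
               'prob': 0.31847917865549624}]


def confident_repairs(candidates, threshold=0.5):
    """Hypothetical helper: keep real repairs (not #norepair#) with prob >= threshold."""
    return [c for c in candidates
            if c['after'] != '#norepair#' and c['prob'] >= threshold]


print(confident_repairs(prediction))  # [] -> no repair is reported
```

With this rule, the false alarm above would be suppressed, at the price of also hiding correct repairs that the model reports with low confidence.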

Since RealiT is not very confident, let us look at the alternatives it ranks below its first hypothesis:

In [3]:
realit_model("""

def add(x, y):
    return x + y

""", topk = 3)

[{'text': 'def add ( x , y ) :\n    return x * y\n    ',
  'before': '+',
  'after': '*',
  'prob': 0.31847917865549624},
 {'text': 'def add ( x , y ) :\n    return x - y\n    ',
  'before': '+',
  'after': '-',
  'prob': 0.22648751427207245},
 {'text': 'def add ( x , y ) :\n    return x + y\n    ',
  'before': '[CLS]',
  'after': '#norepair#',
  'prob': 0.22007132045823513}]

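`topk = 3` returns the three most likely repairs according to the model. Note that the `#norepair#` hypothesis is ranked only third, with a probability close to that of the second candidate.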
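A ranked list like this can be cut at the `#norepair#` hypothesis, keeping only the repairs the model prefers over leaving the code unchanged. A minimal sketch on a copy of the output above (the `text` field is omitted for brevity); `repairs_above_norepair` is a hypothetical helper.

```python
# Copy of the top-3 output above, without the 'text' field
predictions = [
    {'before': '+', 'after': '*', 'prob': 0.31847917865549624},
    {'before': '+', 'after': '-', 'prob': 0.22648751427207245},
    {'before': '[CLS]', 'after': '#norepair#', 'prob': 0.22007132045823513},
]


def repairs_above_norepair(candidates):
    """Hypothetical helper: (before, after) pairs ranked above #norepair#."""
    result = []
    for c in candidates:
        if c['after'] == '#norepair#':
            break  # everything from here on is less likely than "no bug"
        result.append((c['before'], c['after']))
    return result


print(repairs_above_norepair(predictions))  # [('+', '*'), ('+', '-')]
```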

## Handling tokens
Sometimes when working with repair models, you might want to process the output further. For this, it is often more convenient to work with a token representation instead of code:

In [4]:
realit_model("""

def f(x, y):
    return x + x

""", 
    return_tokens = True
)

[{'text': 'def f ( x , y ) :\n    return x + y\n    ',
  'before': 'x',
  'after': 'y',
  'prob': 0.981337789216316,
  'tokens': ['[CLS]',
   'def',
   'f',
   '(',
   'x',
   ',',
   'y',
   ')',
   ':',
   '#INDENT#',
   'return',
   'x',
   '+',
   'y',
   '#NEWLINE#',
   '#DEDENT#',
   '[EOS]'],
  'token_error_loc': 13}]

Here, `tokens` refers to the internal token representation used by RealiT after applying the fix. In addition, we also provide access to the predicted error location `token_error_loc`. 
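In this example, `token_error_loc` appears to index directly into `tokens` (position 13 holds the repaired `y`, counting `[CLS]` as position 0). Assuming that interpretation, the sketch below looks up the repaired token and reconstructs the original buggy token sequence by writing `before` back at that position; the reconstruction step is our own illustration, not a feature of `nbfbaselines`.

```python
# Copy of the relevant fields from the output above (not a live model call)
result = {
    'before': 'x',
    'after': 'y',
    'tokens': ['[CLS]', 'def', 'f', '(', 'x', ',', 'y', ')', ':',
               '#INDENT#', 'return', 'x', '+', 'y',
               '#NEWLINE#', '#DEDENT#', '[EOS]'],
    'token_error_loc': 13,
}

loc = result['token_error_loc']
print(result['tokens'][loc])  # y (the repaired token)

# Illustrative reconstruction of the buggy input: undo the repair at loc
buggy_tokens = list(result['tokens'])
buggy_tokens[loc] = result['before']
print(buggy_tokens[10:14])  # ['return', 'x', '+', 'x']
```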