<a href="https://colab.research.google.com/github/agemagician/CodeTrans/blob/main/prediction/multitask/fine-tuning/function%20documentation%20generation/ruby/large_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**<h3>Predict the documentation for Ruby code using the CodeTrans multitask fine-tuning model</h3>**
<h4>You can make free predictions online through this 
<a href="https://huggingface.co/SEBIS/code_trans_t5_large_code_documentation_generation_ruby_multitask_finetune">link</a></h4> (When using the online prediction, you need to parse and tokenize the code first.)

**1. Load the necessary libraries, including Hugging Face transformers**

In [1]:
!pip install -q transformers sentencepiece


In [2]:
from transformers import AutoTokenizer, AutoModelWithLMHead, SummarizationPipeline

**2. Load the summarization pipeline and place it on the GPU if available**

In [3]:
pipeline = SummarizationPipeline(
    model=AutoModelWithLMHead.from_pretrained("SEBIS/code_trans_t5_large_code_documentation_generation_ruby_multitask_finetune"),
    tokenizer=AutoTokenizer.from_pretrained("SEBIS/code_trans_t5_large_code_documentation_generation_ruby_multitask_finetune", skip_special_tokens=True),
    device=0
)
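
The cell above hardcodes `device=0`, which assumes a CUDA GPU is present. A minimal sketch of choosing the device at run time is shown here. `pick_device` is a hypothetical helper, not part of transformers, and it relies on the pipeline convention that `device=-1` selects the CPU.

```python
def pick_device(cuda_available=None):
    """Return a pipeline device index: 0 for the first GPU, -1 for CPU."""
    if cuda_available is None:
        try:
            import torch  # optional; only used to probe for a GPU
            cuda_available = torch.cuda.is_available()
        except ImportError:
            # Without torch we cannot probe, so fall back to CPU.
            cuda_available = False
    return 0 if cuda_available else -1

device = pick_device()
print("pipeline device:", device)
```

Passing `device=pick_device()` instead of `device=0` to `SummarizationPipeline` would let the same cell run on a CPU-only runtime, although the large model will be slow there.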




**3. Provide the code to summarize, then parse and tokenize it**

In [4]:
code = "def add(severity, progname, &block)\n      return true if io.nil? || severity < level\n      message = format_message(severity, progname, yield)\n      MUTEX.synchronize { io.write(message) }\n      true\n    end" #@param {type:"raw"}

In [5]:
!pip install tree_sitter
!git clone https://github.com/tree-sitter/tree-sitter-ruby


In [6]:
from tree_sitter import Language, Parser

Language.build_library(
  'build/my-languages.so',
  ['tree-sitter-ruby']
)

RUBY_LANGUAGE = Language('build/my-languages.so', 'ruby')
parser = Parser()
parser.set_language(RUBY_LANGUAGE)

In [7]:
def get_string_from_code(node, lines, code_list):
  # Append the source text covered by `node` to code_list.
  line_start = node.start_point[0]
  line_end = node.end_point[0]
  char_start = node.start_point[1]
  char_end = node.end_point[1]
  if line_start != line_end:
    # Node spans several lines: join the pieces with single spaces.
    code_list.append(' '.join([lines[line_start][char_start:]] + lines[line_start+1:line_end] + [lines[line_end][:char_end]]))
  else:
    code_list.append(lines[line_start][char_start:char_end])

def my_traverse(node, code_list):
  # Walk the syntax tree depth-first, collecting leaf tokens.
  # `code` is the source string defined in the earlier cell.
  lines = code.split('\n')
  if node.child_count == 0:
    get_string_from_code(node, lines, code_list)
  elif node.type == 'string':
    # Keep string literals as a single token.
    get_string_from_code(node, lines, code_list)
  else:
    for n in node.children:
      my_traverse(n, code_list)

  return ' '.join(code_list)

In [8]:
tree = parser.parse(bytes(code, "utf8"))
code_list=[]
tokenized_code = my_traverse(tree.root_node, code_list)
print("Output after tokenization: " + tokenized_code)

Output after tokenization: def add ( severity , progname , & block ) return true if io . nil? || severity < level message = format_message ( severity , progname , yield ) MUTEX . synchronize { io . write ( message ) } true end
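
To make the slicing step concrete, here is a small sketch of how a node's `start_point` and `end_point` (row, column pairs) map back to source text. `FakeNode` and `node_text` are hypothetical stand-ins used so the example runs without tree-sitter. The sketch assumes ASCII input, since tree-sitter reports columns in bytes while Python slices by character.

```python
from collections import namedtuple

# Hypothetical stand-in for a tree-sitter node: only the two fields used here.
FakeNode = namedtuple("FakeNode", ["start_point", "end_point"])

def node_text(node, lines):
    """Return the text a node covers, joining multi-line spans with spaces."""
    (row_start, col_start) = node.start_point
    (row_end, col_end) = node.end_point
    if row_start == row_end:
        return lines[row_start][col_start:col_end]
    parts = (
        [lines[row_start][col_start:]]      # tail of the first line
        + lines[row_start + 1:row_end]      # any full lines in between
        + [lines[row_end][:col_end]]        # head of the last line
    )
    return " ".join(parts)

lines = ['puts "a', 'b"', "x = 1"]
single = FakeNode((2, 0), (2, 1))   # the identifier x on the third line
multi = FakeNode((0, 5), (1, 2))    # a string literal spanning two lines
print(node_text(single, lines))     # x
print(node_text(multi, lines))      # "a b"
```

Joining multi-line spans with a space is what flattens the function into the single-line token sequence shown in the output above.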


**4. Make Prediction**

In [9]:
pipeline([tokenized_code])

[{'summary_text': 'Writes a message if the severity is high enough . This method is executed asynchronously .'}]
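
The pipeline returns a list with one dictionary per input, and the generated text keeps the space-separated punctuation of the tokenized training data. A minimal post-processing sketch is shown here. `clean_summary` is a hypothetical helper, and `results` is copied from the output above rather than produced by a model call.

```python
import re

def clean_summary(results):
    """Extract 'summary_text' from each result and reattach punctuation."""
    cleaned = []
    for result in results:
        text = result["summary_text"]
        # Remove whitespace that precedes common punctuation marks.
        text = re.sub(r"\s+([.,;:!?])", r"\1", text)
        cleaned.append(text.strip())
    return cleaned

results = [{'summary_text': 'Writes a message if the severity is high enough . This method is executed asynchronously .'}]
print(clean_summary(results)[0])
```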