# Experiment: issue similarity

## Load packages

In [33]:
import tensorflow as tf
from github import Github
from transformers import (
    AutoTokenizer,
    TFAutoModelForQuestionAnswering
)
from tqdm.notebook import tqdm
from time import sleep

## Define global variables

In [34]:
question = "How to do question answering?"
github_org = "huggingface"
github_repo = "transformers"

## Define GitHub utilities

In [35]:
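import os

class GitHubUtils:
    """Thin wrapper around PyGithub for listing a repository's closed issues."""

    def __init__(self, org, repo, token=None):
        self._org = org
        self._repo_name = repo
        # Unauthenticated clients hit GitHub's rate limit quickly (see the 403
        # below). Reading the token from a GITHUB_TOKEN environment variable is
        # an assumed convention; pass `token` explicitly to override it.
        token = token or os.environ.get("GITHUB_TOKEN")
        self._github = Github(token) if token else Github()
        self._repo = self._github.get_repo(f"{self._org}/{self._repo_name}")
        # Lazily paginated: pages are fetched only as the list is iterated.
        self._issues = self._repo.get_issues(state="closed")

    def get_issues(self):
        return self._issues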
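GitHub's issues endpoint also returns pull requests (answers in the output at the end such as "fixes a typo or improves the docs" appear to come from the pull-request template), and an issue's body can be `None`. Below is a minimal sketch of two helpers that could run before the tokenizer. The names `is_plain_issue` and `build_issue_context` are hypothetical; they rely only on the `pull_request`, `title`, and `body` attributes that PyGithub's `Issue` objects expose.

```python
from types import SimpleNamespace
from typing import Optional


def is_plain_issue(issue) -> bool:
    """Return True for real issues; PyGithub sets `pull_request` only on PRs."""
    return getattr(issue, "pull_request", None) is None


def build_issue_context(title: str, body: Optional[str], max_chars: int = 500) -> str:
    """Join title and body with a newline (so words don't run together),
    treat a missing body as empty, and cap the result at `max_chars` characters."""
    return f"{title}\n{body or ''}"[:max_chars]


if __name__ == "__main__":
    # Stand-in objects for illustration; real ones would come from gh.get_issues().
    issues = [
        SimpleNamespace(title="Bug in pipeline", body=None, pull_request=None),
        SimpleNamespace(title="Fix typo", body="Docs fix", pull_request=object()),
    ]
    for issue in filter(is_plain_issue, issues):
        print(repr(build_issue_context(issue.title, issue.body)))
```

In the loop, `if not is_plain_issue(issue): continue` would skip pull requests before any tokenization work is done.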

In [36]:
gh = GitHubUtils(github_org, github_repo)

RateLimitExceededException: 403 {"message": "API rate limit exceeded for 71.198.193.157. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)", "documentation_url": "https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting"}
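The 403 above is GitHub's limit for unauthenticated clients (60 requests per hour per IP for the REST API); authenticated requests get 5,000 per hour. Besides passing a token, the loop could wait for the window to reset instead of pausing a fixed second. The sketch below assumes a PyGithub `Github` client, whose `rate_limiting` property returns `(remaining, limit)` and whose `rate_limiting_resettime` returns the reset time as a Unix timestamp. The helper name `wait_for_rate_limit` and the `min_remaining` threshold are hypothetical.

```python
import time
from typing import Callable


def wait_for_rate_limit(client, min_remaining: int = 5,
                        sleep: Callable[[float], None] = time.sleep,
                        now: Callable[[], float] = time.time) -> float:
    """Sleep until GitHub's rate-limit window resets if fewer than
    `min_remaining` requests are left. Returns the seconds slept (0 if none)."""
    remaining, _limit = client.rate_limiting
    if remaining >= min_remaining:
        return 0.0
    # Add one second of slack so we wake up just after the reset.
    delay = max(0.0, client.rate_limiting_resettime - now()) + 1.0
    sleep(delay)
    return delay


if __name__ == "__main__":
    # Fake client standing in for github.Github, so this runs offline.
    class FakeClient:
        rate_limiting = (2, 60)
        rate_limiting_resettime = 1_000_030

    slept = wait_for_rate_limit(FakeClient(), sleep=lambda s: None, now=lambda: 1_000_000)
    print(f"Would sleep {slept:.0f} seconds")  # prints "Would sleep 31 seconds"
```

Inside the loop, `wait_for_rate_limit(gh._github)` could replace `sleep(1)`, at the cost of reaching into the wrapper's private `_github` attribute. Injecting `sleep` and `now` keeps the helper testable without network access.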

## Load pre-trained DistilBERT question-answering model

In [25]:
model_id = "distilbert-base-uncased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForQuestionAnswering.from_pretrained(model_id)

Some layers from the model checkpoint at distilbert-base-uncased-distilled-squad were not used when initializing TFDistilBertForQuestionAnswering: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased-distilled-squad and are newly initialized: ['dropout_132']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
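The QA head returns one start logit and one end logit per token. Taking the two argmaxes independently can yield an end before the start (the empty answers in the output) or a span that runs into special tokens such as `[SEP]`. A common alternative, sketched here in NumPy, scores every pair with start ≤ end within a maximum length, considering only context tokens. This is an illustration, not the model's or the library's own decoding. `best_span`, `max_answer_len`, and the `context_mask` argument are assumptions. With a fast tokenizer, the mask could be built from `inputs.sequence_ids(0)`, where context tokens have id 1.

```python
import numpy as np


def best_span(start_logits, end_logits, context_mask, max_answer_len=30):
    """Return (start, end_exclusive) maximizing start_logit + end_logit
    over spans with start <= end < start + max_answer_len, restricted to
    positions where context_mask is True. Returns None if no context token."""
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    mask = np.asarray(context_mask, dtype=bool)
    n = len(start_logits)
    # scores[i, j] = start_logits[i] + end_logits[j]
    scores = start_logits[:, None] + end_logits[None, :]
    i, j = np.indices((n, n))
    valid = (j >= i) & (j - i < max_answer_len) & mask[:, None] & mask[None, :]
    if not valid.any():
        return None
    scores = np.where(valid, scores, -np.inf)
    start, end = np.unravel_index(np.argmax(scores), scores.shape)
    return int(start), int(end) + 1


if __name__ == "__main__":
    # Toy logits for 6 tokens: [CLS] q [SEP] c1 c2 [SEP]; only c1, c2 are context.
    start = [5.0, 0.1, 0.2, 2.0, 1.0, 0.3]
    end = [4.0, 0.1, 0.2, 0.5, 3.0, 6.0]
    mask = [False, False, False, True, True, False]
    print(best_span(start, end, mask))  # prints (3, 5)
```

On the toy input, independent argmaxes would pick start 0 and end 5, covering `[CLS]` through the final `[SEP]`. In the loop, the two argmax lines could become `best_span(outputs.start_logits[0].numpy(), outputs.end_logits[0].numpy(), [s == 1 for s in inputs.sequence_ids(0)])`, assuming `AutoTokenizer` loaded a fast tokenizer (its default for this checkpoint).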


In [26]:
issues = gh.get_issues()
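`get_issues()` returns a lazily paginated list. PyGithub fetches a new page only as the loop advances (30 items by default; the `Github(per_page=...)` constructor argument raises this to GitHub's maximum of 100), so each page costs one request against the rate limit. Below is a minimal sketch for budgeting a run. `requests_needed` and `take` are hypothetical helpers, and the estimate ignores any extra calls triggered by reading attributes that are not in the list response.

```python
import math
from itertools import islice


def requests_needed(n_items: int, per_page: int = 30) -> int:
    """Number of list requests to page through n_items (one request per page)."""
    return math.ceil(n_items / per_page)


def take(iterable, max_items):
    """Lazily yield at most max_items items, so the paginated source is not
    advanced past them."""
    return islice(iterable, max_items)


if __name__ == "__main__":
    # 60 requests/hour (unauthenticated) at 30 vs 100 items per page.
    for per_page in (30, 100):
        print(per_page, "per page ->", requests_needed(1800, per_page), "requests for 1800 issues")
    print(list(take(range(10), 3)))
```

For example, `for i, issue in tqdm(enumerate(take(issues, 200)), total=200):` would cap the experiment at 200 issues and give the progress bar a known length.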

In [27]:
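for i, issue in tqdm(enumerate(issues)):
    if i % 100 == 0:
        # A fixed 1-second pause does not by itself keep an unauthenticated
        # client under GitHub's hourly limit; authenticating raises the quota.
        print("Pausing 1 second between batches of GitHub API calls.")
        sleep(1)
    # Separate title and body so their words don't run together, and treat a
    # missing body (None) as empty.
    issue_context = f"{issue.title}\n{issue.body or ''}"[:500]
    inputs = tokenizer(question, issue_context, add_special_tokens=True,
                       truncation="only_second", max_length=512, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]
    outputs = model(inputs)
    answer_start = int(tf.argmax(outputs.start_logits, axis=1).numpy()[0])
    # Search for the end only at or after the start, so the span always
    # contains at least one token.
    answer_end = answer_start + int(tf.argmax(outputs.end_logits[0, answer_start:]).numpy()) + 1
    # decode() joins word pieces and drops special tokens such as [SEP].
    answer = tokenizer.decode(input_ids[answer_start:answer_end], skip_special_tokens=True)
    print(f"Question: {question}")
    print(f"Answer: {answer}")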

 # # environment info - transformers version : 4. 5. 1 - python version : python 3. 7 - using gpu in script? yes # # # who can help
Question: How to do question answering?
Answer: mention them, if possible by @ gh - username
Question: How to do question answering?
Answer: instantiating an automodelthe current ` _ baseautomodelclass ` class initialization does not accept any argument, and therefore fails with an arcane error when instantiating it incorrectly
Question: How to do question answering?
Answer: 
Question: How to do question answering?
Answer: 
Question: How to do question answering?
Answer: 
Question: How to do question answering?
Answer: 
Question: How to do question answering?
Answer: using my own modified scripts
Question: How to do question answering?
Answer: flax port vision transformer to flaxport the existing vision - transformer to flax. [SEP]
Question: How to do question answering?
Answer: fixes a typo or improves the docs
Question: How to do question answering?
Answ

RateLimitExceededException: 403 {"message": "API rate limit exceeded for 71.198.193.157. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)", "documentation_url": "https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting"}