<h1> Answering Questions with the DistilBERT Transformer Model </h1>
<p> ChatGPT is built on the GPT-3 family of transformer models. DistilBERT is a different and much smaller transformer, a distilled version of BERT rather than an approximation of GPT-3, but it is well suited to extractive question answering, where the answer is a span of text taken from a supplied context. The Hugging Face library has a plethora of tools for building an NLP solution. This walkthrough provides a high-level (emphasis on high-level) overview of the fundamentals of Hugging Face, with links to the official documentation. We'll import a pretrained DistilBERT model, then improve it with a wiki answers dataset. </p>


<h3> Instantiating a Model </h3>

<p> Hugging Face provides access to models through a 'pipeline'. Pipelines are an interface for interacting with an underlying model: objects created with the 'pipeline' function accept inputs as arguments and return the results as plain Python objects (a dictionary, in the question-answering case). Get started with pipelines <a href="https://huggingface.co/docs/transformers/v4.26.1/en/quicktour">here</a>. </p>

In [6]:
### Import from transformers
from transformers import pipeline

### Instantiate a pipeline object. 
question_answerer = pipeline(task="question-answering")

### Save results. A 'question-answering' pipeline requires a 'question' and a 'context' argument
result = question_answerer(question="What is the meaning of life?", context="A young person searching for his role")

### Output results 
print(
f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Answer: 'searching for his role', score: 0.519, start: 15, end: 37
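
<p> The 'start' and 'end' values in the result are character offsets into the context string. As a minimal sketch, the snippet below copies the values printed above into a plain dictionary (so it runs without loading a model) and slices the context with them to recover the answer text. </p>

```python
# Values copied from the cell output above; no model is loaded here.
context = "A young person searching for his role"
result = {"answer": "searching for his role", "score": 0.519, "start": 15, "end": 37}

# 'start' and 'end' index characters in the context,
# so slicing the context with them gives back the answer span.
span = context[result["start"]:result["end"]]
print(span)
```

<p> This is what makes the model 'extractive': it does not generate new text, it only points at a span of the context it was given. </p>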


<h3> Customizing a model </h3>
<p> Passing only a task argument to pipeline lets Hugging Face pick the pre-trained model, but the pipeline provides further customization options. We'll use them to load the DistilBERT model explicitly. Hugging Face provides classes for specific models; documentation on loading pretrained models can be found <a href="https://huggingface.co/docs/transformers/v4.26.1/en/model_doc/auto#transformers.AutoModel.from_pretrained">here</a>. </p>
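
<p> The simplest customization is to pin the model name and revision, which is what the warning in the earlier output recommends. The sketch below uses the name and revision reported in that log message. The dictionary and the helper function name are hypothetical, introduced only for illustration; the import is placed inside the function so that the settings can be inspected without downloading anything. </p>

```python
# Keyword arguments for a pinned question-answering pipeline.
# Model name and revision are the ones reported in the log message above.
PINNED_QA_KWARGS = {
    "task": "question-answering",
    "model": "distilbert-base-cased-distilled-squad",
    "revision": "626af31",
}


def build_pinned_question_answerer():
    """Hypothetical helper: build the pipeline with a fixed model and revision.

    Calling this downloads the model the first time it runs.
    """
    # Imported lazily so the settings above can be read without transformers loaded.
    from transformers import pipeline

    return pipeline(**PINNED_QA_KWARGS)
```

<p> Calling 'build_pinned_question_answerer()' should give a pipeline equivalent to the default one used earlier, without the "No model was supplied" warning. </p>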

In [7]:
### Import and instantiate the DistilBERT model. 'from_pretrained' accepts the name of the model as an argument.
### The name is found at the top of the model's 'model card'. https://huggingface.co/models?pipeline_tag=question-answering&sort=downloads
from transformers import DistilBertForQuestionAnswering, AutoTokenizer
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased-distilled-squad')

<h5> Tokenizers </h5>
A model cannot accept raw text as input; it needs 'tokenized' inputs. A tokenizer object <a href="https://huggingface.co/docs/transformers/main_classes/tokenizer"> prepares the input</a> by converting the text into 'tokens' that the transformer reads. Each model has an associated tokenizer that can be instantiated by passing the model name to the <a href="https://huggingface.co/docs/transformers/v4.26.1/en/model_doc/auto#transformers.AutoTokenizer">'AutoTokenizer'</a> Hugging Face class.
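
<p> To give a feel for what "converting text into tokens" means, here is a toy sketch of WordPiece-style tokenization, the scheme BERT-family tokenizers use: each word is split greedily into the longest pieces found in a vocabulary, and pieces that continue a word are prefixed with '##'. This is not the real DistilBERT tokenizer. The function name and the tiny vocabulary are made up for illustration, and the real tokenizer also handles punctuation, special tokens and a vocabulary of tens of thousands of entries. </p>

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Toy greedy longest-match-first split of a single word (illustrative only)."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation of the word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical vocabulary, just large enough for the example sentence.
toy_vocab = {"A", "young", "person", "search", "##ing", "for", "his", "role"}
token_to_id = {token: index for index, token in enumerate(sorted(toy_vocab))}

text = "A young person searching for his role"
tokens = [piece for word in text.split() for piece in wordpiece_tokenize(word, toy_vocab)]
token_ids = [token_to_id[token] for token in tokens]
print(tokens)
print(token_ids)
```

<p> Note how 'searching' is split into 'search' and '##ing'. The model itself only ever sees the integer IDs, which is why a model and its tokenizer have to be loaded from the same checkpoint name. </p>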

In [9]:
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-cased-distilled-squad')

### now, we create a new question answerer out of our model

question_answerer_distil_bert = pipeline(task="question-answering", model=model, tokenizer=tokenizer)

In [None]:
### Answer my question, print results
result_distil_bert = question_answerer_distil_bert(question="What is the meaning of life?", context="A young person searching for his role")
print(
f"Answer: '{result_distil_bert['answer']}', score: {round(result_distil_bert['score'], 4)}, start: {result_distil_bert['start']}, end: {result_distil_bert['end']}")