In [1]:
from src.llama2.generate import LLamaQueryGenerator
from src.utils.utils import merge

### Llama 2 Expansion Model

#### Option 1: Meta Website

Download the Llama 2 7B pre-trained weights by following the instructions on the [Llama 2 GitHub page](https://github.com/meta-llama/llama).

The checkpoint must be converted from its original format into the Hugging Face format. First, install the required dependencies:

```bash
pip install -r requirements.txt
```

Assuming the downloaded checkpoint resides under `./llama2/7B`, run the conversion script that ships with `transformers`:

```bash
# Locate the conversion script inside the installed transformers package.
TRANSFORM=$(python -c "import os, transformers; print(os.path.join(os.path.dirname(transformers.__file__), 'models', 'llama', 'convert_llama_weights_to_hf.py'))")
python "${TRANSFORM}" --input_dir llama2 --model_size 7B --output_dir llama2/hf/7B
```

In [2]:
LLAMA2_PATH = './llama2/hf/7B'

#### Option 2: Hugging Face

Request access to the model by accepting the license and filling out the form on the model card at [https://huggingface.co/meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Once access is granted, the model ID can be used directly as the path.

This option might take more time, since access has to be approved and the weights are downloaded when the model is first loaded.

In [3]:
# Uncomment to load the weights from the Hugging Face Hub instead of the local checkpoint.
# LLAMA2_PATH = 'meta-llama/Llama-2-7b-hf'

#### Hyperparameters used for expansion

In [4]:
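MAX_INPUT_TOKEN_LEN = 512  # maximum number of tokens of the input document
NUM_QUERIES = 80           # number of queries sampled per document
MAX_NEW_TOKENS = 50        # maximum number of tokens generated per query
TOP_K = 50                 # sample only from the 50 most likely next tokens
TOP_P = 0.95               # nucleus sampling: cumulative probability cutoff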
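
`TOP_K` and `TOP_P` restrict which tokens can be sampled at each generation step. The sketch below illustrates the idea in plain Python, assuming top-k filtering is applied first and nucleus (top-p) filtering is then applied to the renormalised survivors. The function names `top_k_top_p_filter` and `sample_token` are hypothetical, and this is a simplified illustration rather than the code that `transformers` runs internally.

```python
import math
import random


def top_k_top_p_filter(logits, top_k=50, top_p=0.95):
    """Return {token_id: probability} for tokens that survive both filters."""
    # Unnormalised softmax weights, shifted by the max for numerical stability.
    max_logit = max(logits)
    weights = [math.exp(x - max_logit) for x in logits]

    # Top-k: keep the k highest-scoring token ids, best first.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    top_k_mass = sum(weights[i] for i in ranked)

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += weights[i] / top_k_mass
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    kept_mass = sum(weights[i] for i in kept)
    return {i: weights[i] / kept_mass for i in kept}


def sample_token(logits, top_k=50, top_p=0.95, rng=random):
    """Draw one token id from the filtered distribution."""
    filtered = top_k_top_p_filter(logits, top_k, top_p)
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]


# Toy vocabulary of four tokens with probabilities 0.5, 0.3, 0.15, 0.05.
toy_logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
print(top_k_top_p_filter(toy_logits, top_k=3, top_p=0.8))
print(sample_token(toy_logits, top_k=3, top_p=0.8))
```

Lower values of either parameter make the sampled queries more conservative, while higher values allow more diverse (and noisier) queries.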

#### Loading Llama 2 with LoRA weights

In [5]:
generator = LLamaQueryGenerator(
    llama_path=LLAMA2_PATH,
    max_tokens=MAX_INPUT_TOKEN_LEN,
    peft_path='soyuj/llama2-doc2query'
)


### Document Expansion

In [59]:
document = "The presence of communication amid scientific minds was equally important to the success of the Manhattan Project as scientific intellect was. The only cloud hanging over the impressive achievement of the atomic researchers and engineers is what their success truly meant; hundreds of thousands of innocent lives obliterated."
inputs = [document]

In [66]:
expansions = generator.generate(
    inputs,
    num_return_sequences=NUM_QUERIES,
    max_new_tokens=MAX_NEW_TOKENS,
    do_sample=True,
    top_k=TOP_K,
    top_p=TOP_P
)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [67]:
for expansion in expansions[0]:
    print(expansion)

who was involved in the manhattan project
what is the meaning of the manhattan project
how important was communication in the manhattan project
what was the importance of communications in the success of the manhattan project?
why was the manhattan project a success
what was the scientific achievement of the atomic bomb
manhattan project research team
was the manhattan project an success or failure
what does it mean to be involved with the manhattan project
what was the manhattan project and why was it successful
what was the manhattan project and what did it accomplish
which scientific discipline had the most impact on the success of the manhattan project
how does the scientists feel about atomic bomb
what is the manhattan project?
why was science important to the manhattan project
which project included the development of nuclear weapons
who was behind the manhattan project
what was the manhattan project
was the manhattan project important
why was communication important in the manha

#### Appending new expansion terms

In [68]:
expanded_document = merge(document, expansions[0])
print(expanded_document)

The presence of communication amid scientific minds was equally important to the success of the Manhattan Project as scientific intellect was. The only cloud hanging over the impressive achievement of the atomic researchers and engineers is what their success truly meant; hundreds of thousands of innocent lives obliterated. team does place led leaders created weapons at so when how name be feel included failure discipline impact do did were fiction about who worked gained hiroshima mean most on stages ramifications significant science which that characteristic event nuclear social research war succeed secret significance early work ii bombing successful novel bomb meaning involved development accomplish it play goal behind for japan knowledge contributed a had during made roles statement have vital not an other interaction or importance world communications scientists background conducted purpose progress responsible reflect role they with in why justified necessary consider beginning
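
Judging from the output above, the appended text consists of terms from the generated queries that do not already occur in the document, each added once. The sketch below reproduces that behaviour under stated assumptions: `merge_sketch` is a hypothetical stand-in, not the repository's `merge`, and its tokenisation (lowercased alphanumeric runs) and term order (first appearance) are guesses that may differ from the real implementation.

```python
import re


def merge_sketch(document, queries):
    """Append query terms that the document does not already contain."""

    def tokenize(text):
        # Assumed tokenisation: lowercase runs of letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    seen = set(tokenize(document))
    new_terms = []
    for query in queries:
        for term in tokenize(query):
            if term not in seen:
                # Record each unseen term once, in order of first appearance.
                seen.add(term)
                new_terms.append(term)

    if not new_terms:
        return document
    return document + " " + " ".join(new_terms)


toy_document = "The Manhattan Project was a success."
toy_queries = [
    "who was involved in the manhattan project",
    "why was the manhattan project a success",
]
print(merge_sketch(toy_document, toy_queries))
```

Appending only unseen terms keeps the original passage intact while adding vocabulary that a user query might contain.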


### Inference with DeeperImpact

In [69]:
from src.deep_impact.models import DeepImpact

In [70]:
deeper_impact = DeepImpact.from_pretrained('soyuj/deeper-impact')
deeper_impact.eval();

In [71]:
impact_scores = deeper_impact.get_impact_scores(expanded_document)
print(impact_scores)

[('the', 0.65567946), ('presence', 2.607322), ('of', 0.48188052), ('communication', 3.691589), ('amid', 3.4650323), ('scientific', 2.6603425), ('minds', 2.482521), ('was', 0.7672796), ('equally', 2.8385103), ('important', 2.6603296), ('to', 0.67902374), ('success', 3.1919205), ('manhattan', 4.527264), ('project', 2.7042632), ('as', 0.31514826), ('intellect', 3.555631), ('only', 1.3839537), ('cloud', 3.512771), ('hanging', 2.7201285), ('over', 1.6333817), ('impressive', 2.7397997), ('achievement', 2.761028), ('atomic', 3.3154886), ('researchers', 1.7769612), ('and', 0.23050006), ('engineers', 2.0636017), ('is', 0.3808705), ('what', 0.44478962), ('their', 0.41361985), ('truly', 1.799064), ('meant', 0.88288176), ('hundreds', 1.0101533), ('thousands', 0.8593701), ('innocent', 2.6967251), ('lives', 1.3273858), ('obliterated', 3.5495284), ('team', 1.418661), ('does', 0.51023555), ('place', 0.5456477), ('led', 0.70278203), ('leaders', 0.79924226), ('created', 0.0), ('weapons', 0.40899017), ('
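
Each tuple pairs a document term with its learned impact. In DeepImpact-style retrieval, a document's score for a query is the sum of the impacts of the query terms that occur in the document. The sketch below illustrates this with a few values copied from the output above; `score_query` is a hypothetical helper, not part of the repository, and both the whitespace query tokenisation and the handling of repeated terms are assumptions.

```python
def score_query(query, impact_scores):
    """Sum the impacts of the distinct query terms found in the document."""
    # Map each term to its impact; if a term repeats, keep the highest value
    # (an assumption made for this sketch).
    term_impacts = {}
    for term, score in impact_scores:
        term_impacts[term] = max(score, term_impacts.get(term, 0.0))

    # Query terms that are absent from the document contribute nothing.
    query_terms = set(query.lower().split())
    return sum(term_impacts.get(term, 0.0) for term in query_terms)


# A handful of (term, impact) pairs taken from the printed scores.
example_scores = [
    ("manhattan", 4.527264),
    ("project", 2.7042632),
    ("communication", 3.691589),
    ("team", 1.418661),
]
print(score_query("manhattan project team", example_scores))
```

Note that `team` comes from the expansion rather than the original passage, so a query containing it can only match because the document was expanded first.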

To run expansion and inference over a larger collection of documents, refer to the `README.md`.