In [1]:
from initialize import *
%load_ext autoreload
%autoreload 2

In [2]:
### Load the model

base_model_path: str = "meta-llama/Meta-Llama-3-8B-Instruct"  # alternative: "meta-llama/Llama-2-13b-chat-hf"
model_path = base_model_path
### model_path = "cackerman/llama2_13b_chat_projection_tune_neg_in"
# Prefer Apple MPS, then CUDA, and fall back to CPU.
device: str = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"

model = load_model(model_path, base_model_path, device)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [3]:
def generate_completion_prompts(filename, comp_file=None, base=False):
    """Build one continuation prompt per start text.

    If comp_file is given, each system prompt asks for the word count of the
    reference completion with the same id, rounded to the nearest 10 words.
    """
    starts = load_from_json(filename)
    if comp_file: completions = load_from_json(comp_file)
    completion_prompts = []
    for start in starts:
        # Without the strip() calls below, the chat model sometimes returns null output
        # and the base model sometimes returns garbage.
        if base:
            prompt = LLAMA3_CONTINUATION_BASE_PROMPT_TEMPLATE.format(user_prompt=start['text']).strip()
        elif comp_file:
            # Fail loudly on a missing id instead of silently falling back to the last completion.
            completion = next((d for d in completions if d['id'] == start['id']), None)
            if completion is None:
                raise KeyError(f"No completion with id {start['id']} in {comp_file}")
            # Words are counted by whitespace splitting, not by tokenizer tokens.
            #textarr = model.tokenizer.encode(completion['text'], return_tensors="pt")[0]
            textarr = completion['text'].split()
            wdct = round(len(textarr) / 10) * 10
            system_prompt = CONTINUATION_SYSTEM_PROMPT_TEMPLATE.format(wordcount=wdct)
            prompt = LLAMA3_CONTINUATION_PROMPT_TEMPLATE.format(system_prompt=system_prompt, user_prompt=start['text']).strip()
        else:
            prompt = LLAMA3_CONTINUATION_PROMPT_TEMPLATE.format(system_prompt=CONTINUATION_SYSTEM_PROMPT, user_prompt=start['text']).strip()
        completion_prompts.append({'id': start['id'], 'source': start['source'], 'text': prompt})
    return completion_prompts
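
The word-count target relies on Python's built-in `round`, which rounds exact halves to the nearest even integer, so a 25-word completion gives a target of 20 rather than 30. A minimal sketch of that step, together with an id lookup that raises on a missing id. The names `nearest_ten` and `find_by_id` and the sample records are hypothetical and are not part of `initialize`:

```python
def nearest_ten(n_words: int) -> int:
    """Round a word count to a multiple of 10, exactly as round(n / 10) * 10 does."""
    return round(n_words / 10) * 10


def find_by_id(records: list, wanted_id: str) -> dict:
    """Return the first record with the given id, or raise instead of guessing."""
    match = next((r for r in records if r["id"] == wanted_id), None)
    if match is None:
        raise KeyError(f"no record with id {wanted_id!r}")
    return match


# Hypothetical stand-ins for the contents of the completions JSON file.
completions = [
    {"id": "a", "text": "word " * 243},
    {"id": "b", "text": "word " * 25},
]

target_a = nearest_ten(len(find_by_id(completions, "a")["text"].split()))
target_b = nearest_ten(len(find_by_id(completions, "b")["text"].split()))
print(target_a, target_b)
```

If round-half-up behaviour is preferred for the target, `int(n_words / 10 + 0.5) * 10` is one alternative for non-negative counts.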

In [4]:
completion_prompts=generate_completion_prompts('./starts_full/starts_train.json', comp_file='./completions_full/completions_human_train.json', base=False)
completion_prompts[:2]

[{'id': 'id633',
  'source': 'forum',
  'text': "<|start_header_id|>system<|end_header_id|>\n\nYou are an engine that repeats given text and writes a natural continuation of it. Your task is to repeat the text given by the user and continue it. Your generated text should have at least 400 words.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nYusunoha: Question regarding diffusing a led strip on a plaque.\n[removed]\nWolfiesden: Well, I would have to ask first, what is the purpose of the light?\n\nAre you adding it to illuminate the animal head?  Are you adding it as a decoration?  Trying to make it an eye catcher?\n\nJust remember that anyone looking at the head will be staring into the light which may make it less appealing to look at rather than more appealing if you end up putting it on the front like your image shows.\n\nNow, if you are going the route of lighting the object, then you need to use some sort of directed light that is at least partly shielded from the person vie

In [None]:
def do_batch_decode(generated_tokens, input_ids, tokenizer):
    """Decode only the newly generated tokens of each sequence in the batch."""
    # generate() returns prompt + continuation. With left padding every prompt spans
    # the same input_ids.shape[1] columns, so a single slice removes all prompts.
    prompt_len = input_ids.shape[1]
    tokens_to_decode = generated_tokens[:, prompt_len:].cpu()
    return tokenizer.batch_decode(tokens_to_decode, skip_special_tokens=True, clean_up_tokenization_spaces=False)
    
def generate_completions(model, model_name, prompts, batch_size=4, split="train"):
    dir_name = "completions_longer"
    outputfile = f"completions_{model_name}_{split}"
    filename = f"{dir_name}/{outputfile}.json"
    partial_filename = f"{dir_name}/{outputfile}_partial.json"    

    results = []    
    if os.path.exists(filename) or os.path.exists(partial_filename):
        results_tmp = load_from_json(filename) if os.path.exists(filename) else load_from_json(partial_filename)
        results = [result for result in results_tmp if result['text'] != ""]
        existing_combinations = {(result['id'], result['source']) for result in results}
        prompts = [q for q in prompts if (q['id'], q['source']) not in existing_combinations]
        if not prompts:
            print(f"All data in {filename} has been processed. Exiting.")
            return []
        else:
            print(f"Processing new data not in {filename}")
    initlenresults = len(results)

    model.eval()
    model.tokenizer.padding_side="left"
    sampling_kwargs={"use_cache": True, "pad_token_id": model.tokenizer.pad_token_id, "max_new_tokens": 1500, "do_sample": True}

    encodelens=[]
    for i in tqdm(range(initlenresults, initlenresults+len(prompts), batch_size)):
        #print(f"Processing batch {i} to {i + batch_size}")
        # i indexes into results; subtract initlenresults to index into the remaining prompts.
        batch = prompts[i-initlenresults:i-initlenresults+batch_size]
        input_texts = [item["text"] for item in batch]
        # truncation=True has no effect here: no max_length is set, so the tokenizer warns and does not truncate.
        inputs = model.tokenizer(input_texts, return_tensors="pt", truncation=True, padding=True)
        inputs = {k: v.to(model.device) for k, v in inputs.items()}
        with torch.no_grad():
            output_ids = model.generate(**inputs, **sampling_kwargs)

        output = do_batch_decode(output_ids, inputs['input_ids'], model.tokenizer)

        for j, item in enumerate(batch):
            if len(results) <= i + j:  # first time processing this input
                results.append({"id": item["id"], "source": item["source"]})

            result = results[i + j]
            result["text"] = output[j]
            if output[j] == "":
                print("Bad output at ", item["id"])
            encodelens.append(len(inputs['input_ids'][j]))  # padded prompt length in tokens

        # Checkpoint every 10 rows and after the last batch; cap the count so the final batch does not overshoot.
        done = min(i - initlenresults + batch_size, len(prompts))
        if done % 10 == 0 or done >= len(prompts):
            print(f"Completed {done} rows out of {len(prompts)}")
            save_to_json(results, partial_filename)

    save_to_json(results, filename)
    return encodelens

encodelens=generate_completions(model=model, model_name="llama3_8bbase", prompts=completion_prompts, batch_size=2, split="train")
if encodelens: print("Max encodelens=",max(encodelens))

  0%|          | 0/500 [00:00<?, ?it/s]Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
  1%|          | 5/500 [02:07<3:18:06, 24.01s/it]

Completed 10 rows out of 1000


  2%|▏         | 8/500 [02:55<2:36:07, 19.04s/it]

Bad output at  id1769


  2%|▏         | 10/500 [03:49<3:12:55, 23.62s/it]

Completed 20 rows out of 1000


  3%|▎         | 15/500 [06:40<3:53:40, 28.91s/it]

Bad output at  id445
Completed 30 rows out of 1000


  4%|▍         | 20/500 [08:41<3:18:54, 24.86s/it]

Completed 40 rows out of 1000


  5%|▌         | 25/500 [11:02<3:36:41, 27.37s/it]

Completed 50 rows out of 1000


  6%|▌         | 30/500 [12:54<2:57:08, 22.61s/it]

Completed 60 rows out of 1000


  7%|▋         | 35/500 [15:03<3:30:26, 27.15s/it]

Completed 70 rows out of 1000


  8%|▊         | 39/500 [16:32<2:52:15, 22.42s/it]

Bad output at  id587


  8%|▊         | 40/500 [16:51<2:43:05, 21.27s/it]

Completed 80 rows out of 1000


  9%|▉         | 45/500 [18:40<2:35:12, 20.47s/it]

Completed 90 rows out of 1000


 10%|█         | 50/500 [20:36<2:58:40, 23.82s/it]

Completed 100 rows out of 1000


 11%|█         | 55/500 [23:01<3:47:34, 30.69s/it]

Completed 110 rows out of 1000


 12%|█▏        | 60/500 [25:08<2:57:22, 24.19s/it]

Completed 120 rows out of 1000


 13%|█▎        | 65/500 [26:50<2:33:09, 21.13s/it]

Completed 130 rows out of 1000


 14%|█▍        | 70/500 [30:18<4:18:29, 36.07s/it]

Completed 140 rows out of 1000


 15%|█▌        | 75/500 [32:30<3:23:48, 28.77s/it]

Completed 150 rows out of 1000


 16%|█▌        | 80/500 [34:38<3:28:44, 29.82s/it]

Completed 160 rows out of 1000


 17%|█▋        | 85/500 [36:31<2:51:15, 24.76s/it]

Completed 170 rows out of 1000


 17%|█▋        | 86/500 [37:03<3:05:48, 26.93s/it]
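
The run above can be interrupted and restarted because results are checkpointed to a partial file and already-processed `(id, source)` pairs are skipped, while empty outputs are dropped on reload so they get retried. A simplified, model-free sketch of that pattern: `load_done`, `run_resumable`, `fake_generate`, and the sample prompts are hypothetical, and it checkpoints after every batch rather than every 10 rows:

```python
import json
import os
import tempfile


def load_done(path):
    """Load saved results, dropping empty ones so they are retried."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [r for r in json.load(f) if r["text"] != ""]


def run_resumable(prompts, path, generate, batch_size=2):
    """Process only the prompts whose (id, source) pair is not yet saved in path."""
    results = load_done(path)
    done = {(r["id"], r["source"]) for r in results}
    todo = [p for p in prompts if (p["id"], p["source"]) not in done]
    for start in range(0, len(todo), batch_size):
        batch = todo[start:start + batch_size]
        texts = generate([p["text"] for p in batch])
        for p, text in zip(batch, texts):
            results.append({"id": p["id"], "source": p["source"], "text": text})
        with open(path, "w") as f:  # checkpoint after every batch
            json.dump(results, f)
    return results, len(todo)


# Hypothetical prompts and a stand-in for model.generate plus decoding.
prompts = [{"id": f"id{i}", "source": "forum", "text": f"prompt {i}"} for i in range(5)]


def fake_generate(texts):
    # Simulate one "bad output" (empty string) for prompt 3.
    return ["" if t == "prompt 3" else t.upper() for t in texts]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "completions_partial.json")
    _, first_run = run_resumable(prompts, path, fake_generate)
    # The second run only retries the prompt whose output was empty.
    _, second_run = run_resumable(prompts, path, lambda texts: ["retried"] * len(texts))
    final = load_done(path)

print(first_run, second_run, len(final))
```
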

In [6]:
model.tokenizer.padding_side = "left"
index = 1  # or look up by id: next((i for i, d in enumerate(completion_prompts) if d['id'] == 'id1522'), -1)
# Other decoding options, left disabled: "repetition_penalty": 1.5, "penalty_alpha": 0.6, "top_k": 4, "no_repeat_ngram_size": 4
sampling_kwargs = {"use_cache": True, "pad_token_id": model.tokenizer.pad_token_id, "max_new_tokens": 1500, "do_sample": True}
#input_ids = model.tokenizer.encode(completion_prompts[3]['text'], return_tensors="pt").to(model.device)
inputs = model.tokenizer(completion_prompts[index]['text'], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
model.eval()  # Set model to evaluation mode
with torch.no_grad():
    outputs = model.generate(**inputs, **sampling_kwargs)[0]
# Drop the prompt tokens so that only the newly generated text is decoded.
prompt_len = len(inputs['input_ids'][0])
output = model.tokenizer.decode(outputs[prompt_len:])
print(len(output.split()))        # word count of the continuation
print(len(outputs) - prompt_len)  # number of newly generated tokens
print(output)

346
483
 will also discuss the contributions of other mathematicians and cartographers who played a significant role in the development of quasiconformal mappings.

Ptolemy's work, "Geographia", is a fundamental source of information on ancient cartography and its methods. He developed a system of coordinates and a method for projecting the curved surface of the Earth onto a flat plane. His work had a significant impact on the development of cartography and was widely used for many centuries. The development of quasiconformal mappings can be seen as an extension of Ptolemy's work, as it deals with the problem of projecting the curved surface of the sphere onto a flat plane while preserving the angles and shapes of the original surface.

In the 17th and 18th centuries, cartographers such as Gerardus Mercator and Johann Bayer developed new methods for projecting the Earth's surface onto a flat plane. Mercator's projection, which is still widely used today, is a conformal mapping that pre
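
The slice that separates the continuation from the prompt carries over to a batch as long as the prompts are left-padded to a common width, since every row then starts its new tokens at the same column. A dependency-free sketch using nested lists in place of tensors: `PAD`, `left_pad`, and `strip_prompt` are hypothetical names and the token ids are made up:

```python
PAD = 0  # hypothetical pad token id


def left_pad(seqs, pad=PAD):
    """Pad every sequence on the left to the length of the longest one."""
    width = max(len(s) for s in seqs)
    return [[pad] * (width - len(s)) + s for s in seqs]


def strip_prompt(generated, prompt_width):
    """Keep only the columns that come after the shared prompt width."""
    return [row[prompt_width:] for row in generated]


prompts = [[11, 12, 13], [21]]
padded = left_pad(prompts)
prompt_width = len(padded[0])

# Pretend generation: each row is its padded prompt plus new tokens. A row that
# finishes early is filled with PAD on the right up to the batch length.
generated = [padded[0] + [101, 102], padded[1] + [201, PAD]]
new_tokens = strip_prompt(generated, prompt_width)
print(new_tokens)
```

With right padding the new tokens of a short prompt would not start at a shared column, which is one reason to set `padding_side` to `"left"` before batched generation.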

In [7]:
completion_prompts[index]['text']

'<|start_header_id|>system<|end_header_id|>\n\nYou are an engine that repeats given text and writes a natural continuation of it. Your task is to repeat the text given by the user and continue it. Your generated text should have at least 240 words.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nThe origin of quasiconformal mappings, like that of conformal mappings, can be traced back to old cartography where the basic problem was the search for mappings from the sphere onto the plane with minimal deviation from conformality, subject to certain conditions which were made precise. In this paper, we survey the development of cartography, highlighting the main ideas that are related to quasiconformality. Some of these ideas were completely ignored in the previous historical surveys on quasiconformal mappings. We then survey early quasiconformal theory in the works of Gr\\"otzsch, Lavrentieff, Ahlfors and Teichm\\"uller, which are the 20th-century founders of the theory. The period we