Deduplication #7

Merged: 35 commits from dedup into main, Jun 29, 2023
Conversation

@poedator (Collaborator) commented on Jun 10, 2023:

Combining the compression code shared between llama/falcon and the perplexity/lm_eval scripts.

Tested perplexity with Llama-65B: results are negligibly better than the original.
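For reference, a rough sketch of how such a perplexity check can be run over non-overlapping windows of a pre-tokenized evaluation set; this is an illustration, not the repository's exact evaluation code.

```python
# Rough sketch of a perplexity check over non-overlapping windows of a
# pre-tokenized evaluation set (illustrative, not the repo's exact code).
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor, seqlen: int = 2048) -> float:
    """input_ids: a 1 x N tensor of token ids; model: a HF causal LM."""
    model.eval()
    nlls = []
    n_chunks = input_ids.shape[1] // seqlen
    for i in range(n_chunks):
        chunk = input_ids[:, i * seqlen : (i + 1) * seqlen].to(model.device)
        out = model(chunk, labels=chunk)       # HF causal LMs return mean cross-entropy
        nlls.append(out.loss.float() * seqlen)
    return torch.exp(torch.stack(nlls).sum() / (n_chunks * seqlen)).item()
```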

poedator and others added 9 commits (June 9, 2023, 15:57):

- main.py black
- interim main, modelU
- upd modelutils
- upd modelutils_2
- added einops to req
- main & hf & rest upd
- remove lm_eval/quant + some minors
- gitignore
- more black
main.py (outdated):

)
parser.add_argument("--load", type=str, default="", help="Load quantized model.")

Collaborator:

Do we need this argument?
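If the flag stays, a minimal sketch of how a --load flag is usually consumed; the checkpoint format here is an assumption, not necessarily what main.py does.

```python
# Minimal sketch of consuming --load; the checkpoint format is an assumption.
import torch

def maybe_load_quantized(model, args):
    """If --load points to a saved quantized checkpoint, restore it into `model`."""
    if args.load:
        state_dict = torch.load(args.load, map_location="cpu")
        model.load_state_dict(state_dict, strict=False)  # quantized layers may rename params
    return model
```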

@Godofnothing (Collaborator) left a review:

Overall, it looks fine to me.

datautils.py (review thread resolved)
datautils.py (outdated):

if "llama" in model_path:
    tokenizer = LlamaTokenizer.from_pretrained(model_path, use_fast=False)
    # addresses problem on inconsistent `LLaMATokenizer` capitalization

@Vahe1994 (Owner) commented on Jun 16, 2023:

Addresses the problem of inconsistent `LLaMATokenizer` capitalization, see ...

@poedator (Collaborator, Author):

Fixed, moved to docstring.
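The resulting pattern looks roughly like the sketch below, with the capitalization note living in the docstring rather than an inline comment; the exact wording in the repo may differ.

```python
# Sketch of the post-review pattern; exact wording in the repo may differ.
from transformers import AutoTokenizer, LlamaTokenizer

def get_tokenizer(model_path: str):
    """Load the tokenizer for `model_path`.

    LLaMA checkpoints go through `LlamaTokenizer` explicitly because some older
    configs reference the inconsistently capitalized `LLaMATokenizer` class,
    which breaks `AutoTokenizer` resolution.
    """
    if "llama" in model_path:
        return LlamaTokenizer.from_pretrained(model_path, use_fast=False)
    return AutoTokenizer.from_pretrained(model_path, use_fast=False)
```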

    torch.nn.init.normal_,
) # preserving

@Vahe1994 (Owner):

Check that low_cpu_mem_usage=True does the same thing.
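For context, a hedged sketch of the two approaches being compared: temporarily no-op'ing the initializers that the preserved tuple above restores, versus passing low_cpu_mem_usage=True. The loading details are illustrative, not the repo's exact code.

```python
# Two ways to skip wasteful random initialization before loading real weights
# (illustrative; the repo's loader may differ in details).
import torch
from transformers import AutoModelForCausalLM

def load_with_init_override(model_path: str):
    saved = (torch.nn.init.kaiming_uniform_, torch.nn.init.uniform_, torch.nn.init.normal_)
    # Turn the initializers into no-ops while the model skeleton is built.
    torch.nn.init.kaiming_uniform_ = lambda *a, **k: None
    torch.nn.init.uniform_ = lambda *a, **k: None
    torch.nn.init.normal_ = lambda *a, **k: None
    try:
        model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
    finally:
        # Restore the preserved initializers afterwards.
        torch.nn.init.kaiming_uniform_, torch.nn.init.uniform_, torch.nn.init.normal_ = saved
    return model

def load_with_low_cpu_mem(model_path: str):
    # The alternative suggested in the review: let transformers skip init and
    # materialize weights directly from the checkpoint shards.
    return AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype="auto", low_cpu_mem_usage=True
    )
```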

import os
import sys

import_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "../../..")

@poedator (Collaborator, Author):

Recheck imports.
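The full shim typically continues by putting that path on sys.path before the project imports; a sketch follows, where the imported module name is an assumption about the repo layout.

```python
# Typical completion of the relative-import shim above; the imported module
# name is an assumption about the repo layout.
import os
import sys

import_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "../../..")
if import_path not in sys.path:
    sys.path.insert(0, import_path)  # make the project root importable

from modelutils import quantize_model  # noqa: E402  (import after path setup)
```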

elif 'falcon' in pretrained.lower():
    falcon_sequential(self.model, train_data, quantization_config, device)

quantize_model(self.model, train_data, quantization_config, device)

@poedator (Collaborator, Author):

Recheck whether this is still needed.
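A sketch of how the unified quantize_model entry point might dispatch internally; llama_sequential is an assumed counterpart of falcon_sequential, and the body is illustrative rather than the repo's actual implementation.

```python
# Illustrative dispatcher behind the unified call; `llama_sequential` is an
# assumed counterpart of `falcon_sequential`, not confirmed by the diff.
def quantize_model(model, train_data, quantization_config, device):
    model_type = model.config.model_type.lower()
    if "llama" in model_type:
        return llama_sequential(model, train_data, quantization_config, device)
    if "falcon" in model_type:
        return falcon_sequential(model, train_data, quantization_config, device)
    raise NotImplementedError(f"No sequential quantizer for model type {model_type!r}")
```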

model.model.embed_tokens = model.model.embed_tokens.cpu()
model.model.norm = model.model.norm.cpu()

layers[0] = layers[0].to(layer_dev)
model.get_input_embeddings().to(emb_dev)

@poedator (Collaborator, Author):

Check that the old and new versions are equivalent.
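The equivalence in question, sketched: moving the embeddings via the LLaMA-specific attribute versus the generic accessor. The names follow the diff; the check itself is illustrative.

```python
# Sketch of the equivalence being checked (illustrative).
import torch

def move_embeddings_old(model, emb_dev):
    # LLaMA-specific attribute path used before the change.
    model.model.embed_tokens = model.model.embed_tokens.to(emb_dev)

def move_embeddings_new(model, emb_dev):
    # Architecture-agnostic accessor; nn.Module.to() moves parameters in place,
    # so no reassignment is needed.
    model.get_input_embeddings().to(emb_dev)

def check_moved(model, emb_dev):
    move_embeddings_new(model, emb_dev)
    assert model.get_input_embeddings().weight.device.type == torch.device(emb_dev).type
```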

main.py (outdated):

stats_payload["layer_time"] = time.time() - start_time
stats_payload["ol_share"] = round(normal_outlier_count / w_count, 6)
stats_payload["out_loss"] = torch.mean(out_losses).item()
stats_payload["layer_time"] = round(time.time() - start_time, 2)

@poedator (Collaborator, Author):

Get rid of round(), @Vahe1994 -- DONE
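The agreed direction, sketched: keep raw float values in the stats payload and round only when formatting output. Values below are placeholders for illustration.

```python
# Keep raw float values in the payload; round only when printing (sketch).
import time
import torch

start_time = time.time()
out_losses = torch.tensor([0.12, 0.08])   # placeholder values for illustration
normal_outlier_count, w_count = 10, 1000  # placeholder values for illustration

stats_payload = {
    "layer_time": time.time() - start_time,      # unrounded
    "ol_share": normal_outlier_count / w_count,  # unrounded
    "out_loss": torch.mean(out_losses).item(),
}
print(f"layer_time={stats_payload['layer_time']:.2f}s ol_share={stats_payload['ol_share']:.6f}")
```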


model.model.embed_tokens = model.model.embed_tokens.to(dev)
layers[0] = layers[0].to(dev)

def quantize_nearest(model, args, dev):

@poedator (Collaborator, Author):

Run one experiment to confirm.
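quantize_nearest refers to the round-to-nearest baseline; below is a minimal, self-contained sketch of symmetric per-output-channel RTN. The repo's actual routine likely differs in layout and options.

```python
# Minimal round-to-nearest (RTN) baseline: symmetric per-output-channel
# quantization of a weight matrix (illustrative only).
import torch

def rtn_quantize_weight(weight: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q * scale  # dequantized weights, same shape as the input
```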

model.model.norm = model.model.norm.to(dev)
model.lm_head = model.lm_head.to(dev)

get_model_head(model).to(dev)

@poedator (Collaborator, Author):

Checked on June 22: this code works, the head gets moved correctly.
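A hypothetical shape for the get_model_head helper referenced here; the Falcon attribute path is my assumption, not necessarily what the repo does. Because nn.ModuleList.to() moves its children in place, get_model_head(model).to(dev) is enough on the caller's side.

```python
# Hypothetical get_model_head helper; the Falcon attribute path is an assumption.
import torch

def get_model_head(model):
    head = torch.nn.ModuleList()
    if model.config.model_type == "llama":
        head.append(model.model.norm)        # final RMSNorm
    else:
        head.append(model.transformer.ln_f)  # assumed Falcon-style final norm
    head.append(model.lm_head)
    return head
```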

lmeval.py (review thread resolved)
@Vahe1994 (Owner) left a review:

LGTM.

@Vahe1994 merged commit 1c27ed6 into main on Jun 29, 2023.
@Vahe1994 deleted the dedup branch on June 29, 2023 at 09:09.