
Correct method to load 2.7B? #10

Open
BlinkDL opened this issue Jan 26, 2023 · 4 comments

Comments

@BlinkDL

BlinkDL commented Jan 26, 2023

Hi, I can run 1.3B using the benchmark code here, but 2.7B still gives bad results with the following params:

import argparse

parser = argparse.ArgumentParser(description='H3 generation benchmarking')
# Trailing comments show the defaults that these values replaced.
parser.add_argument('--dmodel', type=int, default=2560)  # 2048
parser.add_argument('--nlayer', type=int, default=32)  # 24
# nargs='+' with type=int parses "--attn-layer-idx 8 16 24" into a list of ints;
# type=list would split a command-line string into single characters.
parser.add_argument('--attn-layer-idx', nargs='+', type=int, default=[8, 16, 24])  # [8, 16]
parser.add_argument('--nheads', type=int, default=20)  # 16
parser.add_argument('--ckpt', type=str, default='/fsx/BlinkDL/CODE/_PUBLIC_/H3/H3-2.7B/model-3attn.pt')
parser.add_argument('--promptlen', type=int, default=1024)
parser.add_argument('--genlen', type=int, default=128)
args = parser.parse_args()
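
For reference, the two sets of values in the snippet above can be put side by side and checked for internal consistency. This is only a minimal sketch, not code from the H3 repo: the `H3Config` dataclass and the `check_config` helper are hypothetical names, and the 1.3B values are assumed to be the ones left in the trailing comments (2048, 24, [8, 16], 16). It checks the hyperparameters only and cannot detect a problem in how the checkpoint is loaded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class H3Config:
    """Hypothetical container for the benchmark flags; not an H3 API."""
    d_model: int
    n_layer: int
    attn_layer_idx: List[int]
    n_heads: int


# Values taken from the argparse defaults above. The 1.3B numbers are
# assumed to be the ones in the trailing comments.
CONFIGS = {
    "1.3B": H3Config(2048, 24, [8, 16], 16),
    "2.7B": H3Config(2560, 32, [8, 16, 24], 20),
}


def check_config(cfg: H3Config) -> int:
    """Validate basic consistency and return the per-head dimension."""
    if cfg.d_model % cfg.n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    bad = [i for i in cfg.attn_layer_idx if not 0 <= i < cfg.n_layer]
    if bad:
        raise ValueError(f"attention layer indices out of range: {bad}")
    return cfg.d_model // cfg.n_heads


for name, cfg in CONFIGS.items():
    print(name, "head dim:", check_config(cfg))
```

Under these assumptions both configurations pass the checks and give the same head dimension (2048 / 16 and 2560 / 20 are both 128), so the flags look consistent on their own.
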
@DanFu09
Contributor

DanFu09 commented Jan 28, 2023

We're looking into this, stay tuned!

@tridao
Contributor

tridao commented Jan 28, 2023

Thanks for the bug report; we've just fixed this.
The cause was a mistake in the mapping between old and new parameter names.
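
One way to catch a name-mapping mistake of this kind early is to compare the remapped keys against the keys the model expects. The sketch below is illustrative only: the key names and the `remap_state_dict` helper are hypothetical, plain dicts stand in for a real checkpoint, and none of it is H3's actual parameter naming or the actual fix.

```python
from typing import Any, Dict, Iterable


def remap_state_dict(
    old_state: Dict[str, Any],
    rename: Dict[str, str],
    expected_keys: Iterable[str],
) -> Dict[str, Any]:
    """Rename checkpoint keys and fail loudly if the result does not match
    the keys the model expects. All names here are hypothetical."""
    new_state = {}
    for key, value in old_state.items():
        # Replace the first matching old prefix; keep unmatched keys as-is.
        for old_prefix, new_prefix in rename.items():
            if key.startswith(old_prefix):
                key = new_prefix + key[len(old_prefix):]
                break
        new_state[key] = value

    expected = set(expected_keys)
    produced = set(new_state)
    missing = sorted(expected - produced)
    unexpected = sorted(produced - expected)
    if missing or unexpected:
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    return new_state


# Hypothetical example: an old "old_block." prefix becomes "layers.".
old = {"old_block.0.weight": 1.0, "old_block.0.bias": 0.0}
new = remap_state_dict(
    old,
    {"old_block.": "layers."},
    ["layers.0.weight", "layers.0.bias"],
)
print(sorted(new))
```

In PyTorch, `load_state_dict` with `strict=True` performs a similar missing/unexpected key check. A key check of this sort will not catch a mapping that produces valid names attached to the wrong tensors, so comparing generated output against a known-good sample is still worthwhile.
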

@BlinkDL
Author

BlinkDL commented Jan 29, 2023

Great. How about the configurations for 125M and 355M?

@DanFu09
Contributor

DanFu09 commented Jan 30, 2023

Here are examples of how to load all the models, along with example outputs: https://github.com/HazyResearch/H3/blob/main/examples/README.md


3 participants