Correct method to load 2.7B? #10

BlinkDL · 2023-01-26T13:58:26Z

Hi, I can run the 1.3B model using the benchmark code here, but the 2.7B model is still not working (it gives bad results) with the following params:

import argparse

# Values set for 2.7B; the trailing comments give the 1.3B values.
parser = argparse.ArgumentParser(description='H3 generation benchmarking')
parser.add_argument('--dmodel', type=int, default=2560)  # 1.3B: 2048
parser.add_argument('--nlayer', type=int, default=32)  # 1.3B: 24
parser.add_argument('--attn-layer-idx', type=list, default=[8, 16, 24])  # 1.3B: [8, 16]
parser.add_argument('--nheads', type=int, default=20)  # 1.3B: 16
parser.add_argument('--ckpt', type=str, default='/fsx/BlinkDL/CODE/_PUBLIC_/H3/H3-2.7B/model-3attn.pt')
parser.add_argument('--promptlen', type=int, default=1024)
parser.add_argument('--genlen', type=int, default=128)
args = parser.parse_args()
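
A side note on the snippet above: with `type=list`, argparse calls `list()` on the raw command-line string, so `--attn-layer-idx` only behaves as intended when it is left at its default. Below is a minimal sketch of an alternative, assuming the layer indices are passed space-separated on the command line. This is not the repository's code, and `build_parser` is a hypothetical helper used only for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser that reads the attention layer indices as a list of ints."""
    parser = argparse.ArgumentParser(description='H3 generation benchmarking')
    # nargs='+' collects one or more values; type=int converts each one.
    parser.add_argument('--attn-layer-idx', nargs='+', type=int, default=[8, 16, 24])
    return parser


# With type=list, the string given on the command line is split into characters.
naive = argparse.ArgumentParser()
naive.add_argument('--attn-layer-idx', type=list, default=[8, 16, 24])
naive_args = naive.parse_args(['--attn-layer-idx', '8,16'])

# With nargs='+' and type=int, each index arrives as an integer.
args = build_parser().parse_args(['--attn-layer-idx', '8', '16', '24'])
defaults = build_parser().parse_args([])
```

If the flag is never passed explicitly, both versions yield the same default list, so this does not by itself explain the bad 2.7B results.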

DanFu09 · 2023-01-28T02:21:30Z

We're looking into this, stay tuned!

tridao · 2023-01-28T16:48:31Z

Thanks for the bug report, we've just fixed this.
There was a mistake in the mapping between old and new parameter names.
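
The actual mapping is not shown in this thread. As a rough illustration of this kind of bug, here is a sketch of renaming checkpoint keys and then checking that nothing is left unmatched. All key names, `remap_keys`, and `diff_keys` are made up for the example, and plain dicts stand in for a real state dict. One way such a mistake can go unnoticed is non-strict loading, which skips mismatched keys and leaves those parameters at their initial values.

```python
def remap_keys(state_dict, renames):
    """Return a new dict whose key prefixes are renamed per `renames` (old -> new)."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in renames.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        if new_key in remapped:
            raise KeyError(f'two checkpoint keys map to {new_key!r}')
        remapped[new_key] = value
    return remapped


def diff_keys(expected_keys, loaded_keys):
    """Return (missing, unexpected) key lists, both sorted."""
    expected, loaded = set(expected_keys), set(loaded_keys)
    return sorted(expected - loaded), sorted(loaded - expected)


# Hypothetical old-style checkpoint and the names a newer model would expect.
old_ckpt = {'transformer.layers.0.weight': 1.0, 'transformer.layers.1.weight': 2.0}
expected = ['backbone.layers.0.weight', 'backbone.layers.1.weight']

good = remap_keys(old_ckpt, {'transformer.': 'backbone.'})
bad = remap_keys(old_ckpt, {'transformer.layers.0.': 'backbone.layers.0.'})

good_missing, good_unexpected = diff_keys(expected, good)
bad_missing, bad_unexpected = diff_keys(expected, bad)
```

With the incomplete mapping, `bad_missing` and `bad_unexpected` are both non-empty, which is the signal to look for before trusting any generated output.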

BlinkDL · 2023-01-29T21:31:17Z

Great. How about the configurations for 125M and 355M?

DanFu09 · 2023-01-30T02:34:25Z

Here are examples of how to load all the models, along with example outputs: https://github.com/HazyResearch/H3/blob/main/examples/README.md
