Maximum input token count 4919 exceeds limit of 4096 for train data in 03_Model_customization/03_continued_pretraining_titan_text.ipynb #224
Comments
Fix Maximum input token count 4919 exceeds limit of 4096 for train data in 03_Model_customization/03_continued_pretraining_titan_text.ipynb aws-samples#224
I face the same issue. I tried reducing the chunk size to 10000, but I still get the same error after about 2 hours of training.
I am also getting this error.
I was able to fix the issue by reducing the chunk size and chunk overlap to 5000 and 1000, respectively. The 5000 was a guess; I expect the model would still be created with anything below 10000 (someone above observed that the model does not get created with a chunk size of 10000):

text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=1000)
docs = text_splitter.split_documents(document)
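For anyone copying that workaround, here is a slightly fuller sketch with the import and a quick size check added; it assumes LangChain's RecursiveCharacterTextSplitter and that `document` has already been loaded earlier in the notebook:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Smaller chunks keep each training record comfortably under the service limit.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=5000,    # measured in characters, not tokens
    chunk_overlap=1000,
)
docs = text_splitter.split_documents(document)

# Sanity check: report the longest chunk the splitter produced.
longest = max(len(d.page_content) for d in docs)
print(f"{len(docs)} chunks, longest chunk is {longest} characters")
```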
+1 to this issue. Now retrying with @nmudkey000's fix (5000/1000).
Same problem here: even with a chunking strategy that stays below the maximum, I get the same error.
Amazon says "...Use 6 characters per token as an approximation for the number of tokens."
4096 tokens * 6 characters per token = a maximum chunk size of 24,576 characters.
That means every chunk below 24,576 characters should work, but that is not the case.
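To make that arithmetic concrete, the sketch below (mine, not from the notebook) applies the 6-characters-per-token approximation to the chunks produced earlier; if the approximation were reliable, no chunk under 24,576 characters would be flagged, yet the job still reports 4919 actual tokens, so the real tokenizer is evidently denser than the rule of thumb for this corpus:

```python
CHARS_PER_TOKEN = 6      # Amazon's documented rule of thumb
MAX_TOKENS = 4096        # limit reported in the error message

for i, doc in enumerate(docs):
    approx_tokens = len(doc.page_content) / CHARS_PER_TOKEN
    if approx_tokens > MAX_TOKENS:
        print(f"chunk {i}: ~{approx_tokens:.0f} estimated tokens exceeds {MAX_TOKENS}")
```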
"Maximum input token count 4919 exceeds limit of 4096 for train data" in model-customization-job/amazon.titan-text-lite-v1:0:4k/nhjsh25oes0i in notebook 03_Model_customization/03_continued_pretraining_titan_text.ipynb