Describe the Issue
I created a new chat with a bot on KoboldLite and wrote enough to go far over the context limit. Many context shifts happened and tokens were erased. I then undid a few messages and wrote something, and the next generation took a very long time to compute.
I'm not exactly sure what the intended behaviour is here, or how to fix it. I'm guessing this is a natural consequence of the frontend sending as many tokens as fit and the backend context-shifting, so undoing tries to prepend text before the start of the cached context and forces a full prompt reprocess. It would be nice to work around that somehow.
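In case it helps triage, here is a minimal Python sketch of what I assume is happening (all names are hypothetical, not koboldcpp's actual code): cached tokens can only be reused for the longest prefix they share with the new prompt, and a context shift evicts tokens from the front of the cache, so an undone prompt that restores earlier text shares no prefix and has to be reprocessed from token 0:

# Hypothetical sketch of prefix-based KV cache reuse; names are
# made up for illustration and are not koboldcpp's actual API.
def shared_prefix_len(cached_tokens, new_tokens):
    # Count leading tokens identical in both sequences.
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# After several context shifts the cache starts mid-story,
# because the oldest tokens were evicted:
cached = [105, 106, 107, 108]

# Undoing messages re-sends text from before the shift point:
new_prompt = [100, 101, 102, 103, 104, 105]

reused = shared_prefix_len(cached, new_prompt)
print(f"tokens reused from cache: {reused}")               # 0
print(f"tokens to recompute: {len(new_prompt) - reused}")  # 6 -> full reprocess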
Additional Information:
I use this unit file to run koboldcpp:
[Unit]
Description=koboldcpp daemon
[Service]
AmbientCapabilities=
CapabilityBoundingSet=
DeviceAllow=
DynamicUser=yes
ExecStart=koboldcpp --quiet --whispermodel whisper.gguf --ttsmodel tts.gguf --ttswavtokenizer ttswavtokenizer.gguf model.gguf
IPAddressAllow=127.0.0.1
IPAddressDeny=any
LockPersonality=yes
MemoryDenyWriteExecute=yes
PrivateDevices=yes
PrivateMounts=yes
PrivatePIDs=yes
PrivateUsers=yes
ProcSubset=pid
ProtectClock=yes
ProtectControlGroups=yes
ProtectHome=yes
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectProc=invisible
RemoveIPC=yes
RestrictAddressFamilies=AF_INET
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
SecureBits=
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@privileged
SystemCallFilter=~@resources
Type=simple
WorkingDirectory=/var/local/koboldcpp
[Install]
WantedBy=multi-user.target
I'm using the Cydonia-22B-v2k-Q4_K_M, OuteTTS-0.3-1B-Q4_0, and whisper-large-v3-f16 models.
I'm using Arch Linux with an AMD Ryzen 7 3700X processor. No GPU acceleration is used.
Log and story textdata:
LOG.txt
STORY_TEXTDATA.txt