
Unable to add new records to 'poller_time' table #2317

Closed
ikorzha opened this issue Jan 16, 2019 · 18 comments
Labels
resolved A fixed issue

Comments

ikorzha commented Jan 16, 2019

Cacti team, I am absolutely desperate. I am still on 1.1.38 with spine at 1.18. I have a Cacti system with 5 pollers, and recently I have started experiencing: "Spine: poller[1] error: sql failed! error:'1062', message:'duplicate entry'". Cacti still works, but spine times out at the 1-minute mark on every poller.
So far I have been fighting the problem by restoring from the last known working SQL backup of the master server, which works for about a day, and then the problem comes back.

Please help: what can I do to resolve this issue permanently when it occurs, without restoring from a SQL backup?

[screenshot: poller timeouts]

netniV (Member) commented Jan 16, 2019

How many pollers and devices per poller do you have?

ikorzha (Author) commented Jan 17, 2019

netniV, sorry for the slow reply:
This table is from when the duplicate-key error is taking place:
[screenshot: poller_time table in the broken state]
This table is from after the restore, before the duplicate-key issue reappears:
[screenshot: poller_time table after restore]

ikorzha (Author) commented Jan 18, 2019

netniV, can you please suggest a course of action to correct this problem, as I am stuck.
I even made an attempt to upgrade to 1.2 to escape the issue, but that was a total disaster: on the first poller run it deleted all 658 devices and graphs and presented me with a clean slate. Restoring back to 1.1.38 took a few hours.
In any case, I am looking for your advice on this issue. P.S. I will gladly share my DB if you are willing to look into the issue, or to check why the upgrade deletes all of the hosts on the first poller run.

netniV (Member) commented Jan 18, 2019

If you can share your main DB with us, that would be great. Send it to developers@cacti.net and we will take a look, because I don't see why it would suddenly delete all your devices.

ikorzha (Author) commented Jan 18, 2019

netniV, I sent you an email asking for a link to your Google Drive, as my DB is around 400 MB, I believe, and can't easily be emailed...

netniV (Member) commented Jan 18, 2019

Is that after you tar/gzip it? Then yes, you will need to send a link to download it from.

ikorzha (Author) commented Jan 18, 2019

Compressed, it came to 41.9 MB.
I can try to split the archive into 5 pieces and mail it...

ikorzha (Author) commented Jan 18, 2019

netniV, can you please check the developers mailbox? All 5 archive pieces were sent; please acknowledge receipt if you got them.

cigamit (Member) commented Jan 18, 2019

Just a note: with 16 processes and 15 threads per process, your spine data collector is killing your central MySQL/MariaDB database. This is your primary issue. Starting in 1.2, we have moved those settings, and many of the key polling tables, to the remote collectors alone, so there will not be as much overloading. But even then, 16 × 15 is too high; you are killing your central database, and that is as much the issue as anything else. I would reduce it to 1 or 2 processes and at most 20 threads. You also need to watch max_connections on the core database server.
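
To put rough numbers on that (a worst-case sketch; the 5-poller count comes from the opening comment, and it assumes each spine thread holds its own DB connection):

-- 5 pollers × 16 processes × 15 threads = 1,200 potential connections,
-- against a MySQL/MariaDB default max_connections of 151.
-- Check the configured limit and the high-water mark actually reached:
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Max_used_connections';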

Lastly, since you are in the middle of things with @netniV, I'll defer to him.

netniV (Member) commented Jan 18, 2019

I will try and get your DB loaded again, but if this is the same DB that we ran through last time, there were no issues with the main upgrade. I will also attempt to replicate the poller situation if I can; I will need to find a box to spin up the extra VMs, though. The connections are certainly an issue, though I can't yet see why that would remove the devices.

ikorzha (Author) commented Jan 18, 2019

Thank you netniV, I will wait for your assessment. Please try this latest DB that I have uploaded today...

cigamit (Member) commented Jan 26, 2019

Ope, this is an issue with running out of auto_increment keys. Wow. Near term, truncate this table to resolve.
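
For anyone landing here with the same error, that near-term fix is a one-liner (as I read the thread, poller_time only holds per-poll bookkeeping, but take a backup first if unsure; TRUNCATE also resets the AUTO_INCREMENT counter back to 1):

TRUNCATE TABLE poller_time;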

cigamit added a commit that referenced this issue Jan 26, 2019
Poller time table running out of auto increment keys
@cigamit cigamit changed the title Spine: poller[1] error: sql failed! error:'1062', message:'duplicate entry Poller time table running out of auto increment keys Jan 26, 2019
@cigamit cigamit added the resolved A fixed issue label Jan 26, 2019
cigamit (Member) commented Jan 26, 2019

This is resolved now. The original title was: Spine: poller[1] error: sql failed! error:'1062', message:'duplicate entry

cigamit (Member) commented Jan 26, 2019

We are going to revert that change. The correct change will be to do the following:

ALTER TABLE poller_time MODIFY column id bigint(20) unsigned auto_increment;

Once you do that, this problem will not happen again in your lifetime.
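
Two supporting checks (a sketch; it assumes the Cacti schema is literally named cacti, so adjust TABLE_SCHEMA to match your install):

-- Where the counter currently stands for poller_time:
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'cacti' AND TABLE_NAME = 'poller_time';

-- After the ALTER, the ceiling is an unsigned BIGINT:
-- 2^64 - 1 = 18,446,744,073,709,551,615 IDs. Even at 1,000 inserts
-- per minute, that is on the order of 3.5 × 10^10 years of headroom.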

cigamit added a commit that referenced this issue Jan 26, 2019
Poller time table running out of auto increment keys
Also correct two copyright years
cigamit added a commit that referenced this issue Jan 26, 2019
Poller time table running out of auto increment keys
ikorzha (Author) commented Jan 26, 2019

cigamit, I am grateful for the fix; I have already implemented it, and my Cacti is back online without poller timeouts. It now makes perfect sense: with my 16 poller processes (and I can't use fewer, as the poller doesn't finish within a minute; I have already tested that), in the 2 years this installation has been running it consumed 32 years' worth of polling-time IDs, hence the exhaustion error I was receiving.
Thank you so much for removing this limitation in the Cacti installation....
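
Spelling out the arithmetic above (assuming, as the comment implies, one poller_time ID consumed per process per one-minute poll):

16 processes × 1 ID/minute = 16× the single-process burn rate
2 years of running × 16 = 32 single-process years of IDs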

@ikorzha ikorzha closed this as completed Jan 26, 2019
@netniV netniV changed the title Poller time table running out of auto increment keys Unable to add new records to 'poller_time' table Feb 24, 2019
thurban (Contributor) commented Apr 16, 2020

I know this is closed, but I was wondering what that "id" column is actually used for. I have the very same habit of adding id columns to tables, but if that column isn't used anywhere, then it can actually be removed completely without any impact.

The primary key could just as well be the "pid, poller_id and start_time" combination.

Any thoughts?
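
A minimal sketch of that alternative (hypothetical; it assumes nothing in the code reads the id column, and that pid, poller_id, and start_time exist exactly as named above; dropping the column also drops the primary key that was defined on it):

ALTER TABLE poller_time
  DROP COLUMN id,
  ADD PRIMARY KEY (pid, poller_id, start_time);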

TheWitness (Member)

That's one approach, Thomas. Are there any design issues other than method, though?

netniV (Member) commented Apr 17, 2020

Generally speaking, for primary indexing you should have an id column. If PID is effectively the ID, then that works just as well, though it may be useful to have multiple key columns if the system reuses the same PID?
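
A schematic illustration of the PID-reuse point (hypothetical values; real poller_time rows carry more columns): under the composite key, a recycled PID stays unique only because start_time differs between runs:

-- Both rows coexist under PRIMARY KEY (pid, poller_id, start_time):
INSERT INTO poller_time (pid, poller_id, start_time) VALUES (1234, 2, '2020-04-17 10:00:00');
INSERT INTO poller_time (pid, poller_id, start_time) VALUES (1234, 2, '2020-04-17 10:01:00');
-- A reused PID 1234 at the same start_time on the same poller
-- would fail with the familiar 1062 duplicate-entry error.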

@github-actions github-actions bot locked and limited conversation to collaborators Jul 17, 2020