Replication file missed key information #33

Open
kevin00036 opened this issue Sep 1, 2016 · 2 comments

Comments

@kevin00036

The replication files contain updates like this one:

<event id="335617599" op="U" table="track_mbid"><keys><column name="track_id">9578103</column></keys><values><column name="submission_count">112</column></values></event>

For the track_mbid table, the update events always use only track_id as the key, but the unique key is (track_id, mbid); alternatively, the events could use the primary key id.
One track_id can correspond to many mbids, so in that case this update doesn't say which (track_id, mbid) row to update.
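
To make the ambiguity concrete, here is a minimal sketch that parses the event quoted above and counts how many rows its key would match. It uses an in-memory SQLite table as a stand-in for the real PostgreSQL one. The table layout, the two mbid values and the helper names (parse_event, rows_matched) are assumptions made for illustration, not the actual AcoustID schema or import code.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The update event quoted in this issue.
EVENT = (
    '<event id="335617599" op="U" table="track_mbid">'
    '<keys><column name="track_id">9578103</column></keys>'
    '<values><column name="submission_count">112</column></values>'
    '</event>'
)


def parse_event(xml_text):
    """Return (table, keys, values) for one replication event."""
    event = ET.fromstring(xml_text)
    keys = {c.get("name"): c.text for c in event.find("keys")}
    values = {c.get("name"): c.text for c in event.find("values")}
    return event.get("table"), keys, values


def rows_matched(conn, table, keys):
    """Count the rows that the event's key columns would select."""
    where = " AND ".join(f"{name} = ?" for name in keys)
    sql = f"SELECT COUNT(*) FROM {table} WHERE {where}"
    return conn.execute(sql, list(keys.values())).fetchone()[0]


# Assumed, simplified table layout; the mbid values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track_mbid ("
    " id INTEGER PRIMARY KEY,"
    " track_id INTEGER NOT NULL,"
    " mbid TEXT NOT NULL,"
    " submission_count INTEGER NOT NULL,"
    " UNIQUE (track_id, mbid))"
)
conn.executemany(
    "INSERT INTO track_mbid (track_id, mbid, submission_count) VALUES (?, ?, ?)",
    [(9578103, "mbid-a", 100), (9578103, "mbid-b", 7)],
)

table, keys, values = parse_event(EVENT)
matched = rows_matched(conn, table, keys)
print(f"{table}: key {keys} matches {matched} row(s)")
```

With two mbids attached to the same track_id, the key matches two rows, so the event cannot say which row's submission_count should become 112.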

@mihwas

mihwas commented Nov 29, 2016

Yes, kevin00036 is right.
Here is a bit more information, with an example:

I have hit a duplicate key violation:
2016-11-09 20:48:40 CET [7621-5990] acoustid@acoustid2 ERROR: duplicate key value violates unique constraint "track_mbid_idx_uniq"
2016-11-09 20:48:40 CET [7621-5991] acoustid@acoustid2 DETAIL: Key (track_id, mbid)=(9502104, 6029d549-5858-4936-9156-b90770d2ae92) already exists.

This error was produced by the following UPDATE statement from an update file:
UPDATE track_mbid SET mbid='6029d549-5858-4936-9156-b90770d2ae92' WHERE track_id='9502104'

But when we run
acoustid2=# SELECT * FROM track_mbid WHERE track_id=9502104;
http://pastebin.com/XTQsHmPh
we can see that the UPDATE statement above tries to update all 3 rows.
Yes, I can work around this by catching the exception, ignoring it and carrying on with the update, but...

There can be more than 3 rows with the same track_id or mbid. Suppose the update file contains something like this:
UPDATE track_mbid SET disabled='t' WHERE track_id=xxxxxxxxxxx;
All rows with track_id=xxxxxxxxxxx will be disabled, yes?
So the database becomes incomplete for search, and the whole installation stops working.
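
Both failure modes described above can be reproduced in a few lines. This is only a sketch: it uses an in-memory SQLite database instead of PostgreSQL, the table layout is a simplified assumption, and the second and third mbid values are made up. Only the track_id and the first mbid come from the log above.

```python
import sqlite3

TRACK_ID = 9502104
MBID = "6029d549-5858-4936-9156-b90770d2ae92"

conn = sqlite3.connect(":memory:")
# Assumed, simplified layout with the same unique key as track_mbid_idx_uniq.
conn.execute(
    "CREATE TABLE track_mbid ("
    " id INTEGER PRIMARY KEY,"
    " track_id INTEGER NOT NULL,"
    " mbid TEXT NOT NULL,"
    " disabled INTEGER NOT NULL DEFAULT 0,"
    " UNIQUE (track_id, mbid))"
)
# Three rows for one track; the last two mbids are placeholders.
conn.executemany(
    "INSERT INTO track_mbid (track_id, mbid) VALUES (?, ?)",
    [(TRACK_ID, MBID), (TRACK_ID, "mbid-b"), (TRACK_ID, "mbid-c")],
)
conn.commit()

# Failure 1: setting mbid on every row of the track collides with the
# unique key (track_id, mbid) as soon as a second row gets the same mbid.
try:
    conn.execute(
        "UPDATE track_mbid SET mbid = ? WHERE track_id = ?", (MBID, TRACK_ID)
    )
    violation = False
except sqlite3.IntegrityError as exc:
    violation = True
    print("rejected:", exc)
    conn.rollback()

# Failure 2: an update that cannot violate the constraint succeeds
# silently, but touches every row of the track instead of one.
cursor = conn.execute(
    "UPDATE track_mbid SET disabled = 1 WHERE track_id = ?", (TRACK_ID,)
)
disabled_rows = cursor.rowcount
print("rows disabled:", disabled_rows)
```

The first case at least fails loudly. The second is the more dangerous one, because nothing signals that rows other than the intended one were changed.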

So I think the export algorithm needs to be reworked so that the XML events carry unique keys; then a new full dump should be created and the hourly export process started again.
Unfortunately, this cannot be done without the maintainer, and he does not seem to be interested.
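
As a rough illustration of what "unique values for the XML constructions" could mean, the sketch below builds an update event only when its key columns cover a unique key of the table. Everything here is hypothetical: build_update_event, the UNIQUE_KEYS mapping and the sample mbid are illustration only and say nothing about how the real exporter is written. The event layout simply mirrors the one quoted at the top of this issue.

```python
import xml.etree.ElementTree as ET

# Assumption: unique keys per replicated table, declared by hand.
UNIQUE_KEYS = {
    "track_mbid": [("id",), ("track_id", "mbid")],
}


def build_update_event(event_id, table, keys, values):
    """Serialize an update event, refusing keys that do not identify one row."""
    if not any(set(unique) <= set(keys) for unique in UNIQUE_KEYS[table]):
        raise ValueError(
            f"keys {sorted(keys)} do not cover a unique key of {table}"
        )
    event = ET.Element(
        "event", {"id": str(event_id), "op": "U", "table": table}
    )
    for tag, columns in (("keys", keys), ("values", values)):
        parent = ET.SubElement(event, tag)
        for name, value in columns.items():
            column = ET.SubElement(parent, "column", {"name": name})
            column.text = str(value)
    return ET.tostring(event, encoding="unicode")


# The mbid here is a placeholder, not real data.
xml_text = build_update_event(
    335617599,
    "track_mbid",
    {"track_id": 9578103, "mbid": "example-mbid"},
    {"submission_count": 112},
)
print(xml_text)
```

An event keyed this way maps to exactly one row on the importing side, so neither the duplicate key error nor the silent multi-row update can occur.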

@lalinsky
Member

To explain my reason for not being interested:

The PostgreSQL-based replication was a failed experiment from my point of view. It was never actually used. I created it because I hoped that giving people access to the PostgreSQL database would get external contributors interested in working on AcoustID. That never happened. The same applies to the PostgreSQL database dumps. They are hard to generate at this size; I need a dedicated server just to do those dumps.

The only interest in the replication is from various companies, and to support those, it's far easier to just point them to the running service. Supporting the PostgreSQL database as a standalone product is just too hard for me to do, and I'm not going to do it for free for the few companies that are interested in that.

I'm currently trying to slowly rework the AcoustID backend, which will involve a lot of database changes. I'm going to change the data files to a database-neutral format and only include data that people need to experiment with the fingerprint database. That is, it will only include fingerprints and the mapping between fingerprints and MusicBrainz. I'm open to discussing the details of that.


3 participants