Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync issues when repository deleted and re-created #2658

Closed
bradrydzewski opened this issue Apr 11, 2019 · 28 comments
Closed

Sync issues when repository deleted and re-created #2658

bradrydzewski opened this issue Apr 11, 2019 · 28 comments
Milestone

Comments

@bradrydzewski
Copy link

bradrydzewski commented Apr 11, 2019

When a repository is deleted and then re-created with the same name, Drone is unable to add the repository to list because an entry with the same name exists but a different unique identifier.

We should be able to account for this in the synchornization process. If not, we should log some sort of error and then give a way to manually purge a repository by name from the database as a workaround.

@bradrydzewski bradrydzewski added this to the v1.1.0 milestone Apr 11, 2019
@bradrydzewski bradrydzewski changed the title Sync fails when repository deleted and re-created Sync issues when repository deleted and re-created Apr 14, 2019
@bradrydzewski
Copy link
Author

Turns out this is difficult to handle in synchronization in an efficient manner. For now, we will require manual deletion of the repository using DELETE /api/repos/:namespace/:name?remove=true. This endpoint was added by 727177d.

The command line utility still requires a drone repo delete command be added.

@bradrydzewski
Copy link
Author

bradrydzewski commented Apr 26, 2019

There is still a few edge cases that are not handled.

Scenario 1

  1. User creates repository foo/bar
  2. User deletes repository foo/bar
  3. User re-creates repository foo/bar

This will result in the newer repository being ignored, while the old repository remains active in the system. The permanent fix for this issue would be to check the number of rows updated after the insert. If 0, we know the insert was ignored due to a key violation. If this happens we should 1) check for the old repository and 2) remove the old repository and 3) re-try the insert.

Scenario 2

  1. User creates repository foo/bar
  2. User deletes repository foo/bar
  3. User creates repository foo/baz
  4. User renames repository foo/baz to foo/bar

This results in the following error (error message may vary by database vendor)

Error updating repository: UNIQUE constraint failed: repos.repo_slug

The permanent fix for this is to have a separate routine for repository updates vs repository renames. For repository renames we should check to see if a repository with the same name already exists. If yes, we can delete the old repository and insert the new repository. If no, we can update the existing repository.

Short Term Workaround

The short term solution is to delete the old repository from the system, and then re-synchronize, so that the new repository is inserted into the database. This endpoint must be invoked by an administrator.

DELETE /api/repos/:namespace/:name?remove=true

@sergey-lapin
Copy link

sergey-lapin commented May 24, 2019

@bradrydzewski I am on Scenario 1 currently

  1. create repo on gh with name foo/bar
  2. drone repo sync -> success I see repo
  3. remove repo from gh
  4. recreate repo on gh with name foo/bar
  5. drone repo sync -> fail I don't see repo

I tried /api/repos/lapanoid/foo/bar?remove=true (after step 3)) as well, but no luck.

@sergey-lapin
Copy link

Maybe I misunderstood how syncronization works, since when I run /api/repos/lapanoid/foo/bar?remove=true I get

{
    "message": "Not Found"
}

@bradrydzewski
Copy link
Author

in that case you may need to delete the repository from the database

@sergey-lapin
Copy link

sergey-lapin commented May 24, 2019

Can you elaborate? if it is "Not Found" I guess it is not in db already..

@bradrydzewski
Copy link
Author

bradrydzewski commented May 24, 2019

it is likely in the db. not found could indicate you no longer have access to the repository, which is expected because when you deleted it in github, your access entry was removed from the drone database.

@sergey-lapin
Copy link

Ok, how can I remove it from drone db, any api for that? Or any hints? I am willing to dig deeper)

@bradrydzewski
Copy link
Author

delete from repos where repo_slug = 'your/repo';

@sergey-lapin
Copy link

I ran drone with docker, so I guess I need to enter container

docker exec -it <CONTAINER_ID>  sh

and enter sqlite

sqlite3

But I have error sqlite3: not found

@sergey-lapin
Copy link

@bradrydzewski figured out! Sync works. Thanks, I guess I just was tired yesterday.

@sergey-lapin
Copy link

@bradrydzewski As a workaround I can add ssh to machine and delete from db step, but if I can help somehow with fixing this, feel free to write me.

@ChrisMcKenzie
Copy link

when trying to do this fix in sqlite i am getting the following.

Unable to open database "database.sqlite": file is encrypted or is not a database

I suspect this is because I have an encryption key for secrets, what is the process for decrypting so that it can be updated?

@bradrydzewski
Copy link
Author

I suspect this is because I have an encryption key for secrets, what is the process for decrypting so that it can be updated?

@ChrisMcKenzie this would not be the case. Drone encrypts secrets in memory and saves the encrypted strings to the database. It does not encrypt the database itself. This error comes from sqlite and I believe to be unrelated to Drone itself.

@ChrisMcKenzie
Copy link

interesting ok, I will see what is going on there.

@ChrisMcKenzie
Copy link

oh duh, I was using sqlite instead of sqlite3 my bad all good now. thanks

@ChrisMcKenzie
Copy link

So I guess my next question is, is there a way to tell which repo might have had this happen to it?

@bradrydzewski
Copy link
Author

bradrydzewski commented May 31, 2019

@ChrisMcKenzie you should see an error in the Drone server logs that specifies which unique constraint was violated and using which value. Assuming you are using latest version of Drone.

@ChrisMcKenzie
Copy link

ChrisMcKenzie commented May 31, 2019

The message I am getting it

UNIQUE constraint failed: repos.repo_slug

so I know that it is the repo_slug that is wrong I just don't know which repo it is trying sync.

we are running version 1.0.1 we are not able to upgrade past then as the next version does not parse the old 0.8 syntax properly and we are unable to upgrade the number of repos we are running that fast.

@bradrydzewski
Copy link
Author

bradrydzewski commented May 31, 2019

we have made a bunch of improvements to the conversion code over the past weeks and even in the past 24 hours. So you may want to try the 1.2 release or at least v1.1.1 of the CLI. If you have examples that are failing we may also be able to use them to improve the converter. But I'm afraid that troubleshooting may be limited without the option to upgrade :(

@bradrydzewski
Copy link
Author

@ChrisMcKenzie following up on previous comment, please consider creating an issue and providing a few samples of files that cannot automatically convert. There were a lot of improvements since 1.0.1 and I would hate for anyone to be stuck on an old version. We use the following issue tracker for our yaml converter https://github.com/drone/drone-yaml

@gempesaw
Copy link

gempesaw commented Jun 7, 2019

Do you have any suggestions on how to determine which repo is causing the duplicate key error? I'm not aware of any of our Github repos that were re-created, or renamed back to an existing name...

We've recently run a migration from 0.8.x to 1.1. (The migrate tool worked like a breeze, except for the final activate-repos step was failing authorization, so we ended up just scripting up a mass drone repo repair on our own). When attempting to synchronize, the request returns

{"message":"Error updating repository: pq: duplicate key value violates unique constraint \"repos_repo_slug_key\""}

and the server logs a similar message:

{"error":"Error updating repository: pq: duplicate key value violates unique constraint \"repos_repo_slug_key\"","level":"warning","login":"gempesaw","msg":"syncer: cannot batch update","time":"2019-06-07T04:02:50Z"}
{"error":"Error updating repository: pq: duplicate key value violates unique constraint \"repos_repo_slug_key\"","level":"warning","msg":"api: cannot synchronize account","request-id":"1MIlmnwJxG8PacECa1uMF3PHdup","time":"2019-06-07T04:02:50Z","user.login":"gempesaw"}

I pulled the Drone repos DB and my organization's Github repositories and compared the IDs and I couldn't find anything amiss - each ID from github corresponded uniquely to an ID in the drone DB.

@bradrydzewski
Copy link
Author

bradrydzewski commented Jun 7, 2019

I can push some improvements to capture the offending repository name and unique identifier in the error message. In this case I see repos_repo_slug_key was violated which means a repository already exists with the same name, but a different repository identifier. This usually happens when a repository is deleted and then re-created with the same name. However, without data, debating the root cause is not a productive use of our time. So we can revisit the root cause once you have data to analyze.

@gempesaw
Copy link

gempesaw commented Jun 7, 2019

I can push some improvements to capture the offending repository name and unique identifier in the error message

That sounds very helpful. We are running server version 1.1 and drone-cli version 1.1.1, and I tried looking through the changelog for 1.2 to see if logging for the offending repository was already added, and it seemed like it was not. Thanks!

@gempesaw
Copy link

gempesaw commented Jun 12, 2019

Cool, the added logging made it immediately clear which repo had the issue. I deleted the row and the sync started working. I'm inquiring with the developers to see if that repo has been renamed, or... another one deleted & renamed, or if anything has happened with it at all. Thanks!

edit: so yeah, the repo existed before, it was deleted, and then re-created with the exact same name.

@bradrydzewski bradrydzewski modified the milestones: v1.x.x, 1.5.0 Sep 17, 2019
@bradrydzewski
Copy link
Author

We added new batching logic that will appear in 1.5 behind a feature flag:

DRONE_DATABASE_EXPERIMENTAL_BATCH=true

The new batch logic should resolve this issue but at the expense of slower syncs times. This will be most noticeable the first time you sync your repository list if it is large (e.g. an increase from 15 seconds to 30 seconds). It may also be noticeable for subsequent syncing attempts if the list of deltas is large.

We will remove the feature flag in 1.6 pending successful testing at cloud.drone.io.

@bradrydzewski bradrydzewski modified the milestones: v1.5.0, v1.6.0 Sep 24, 2019
@ViRb3
Copy link

ViRb3 commented Oct 3, 2019

I believe I found another edge case:

  • fork repo
  • delete forked repo
  • create source repo with same name

Tried curl -i -X DELETE https://cloud.drone.io/api/repos/ViRb3/MagiskFrida?remove=true but it results in 404 {"message":"Not Found"}. @bradrydzewski Can you please purge the offending entry from the database?

@bradrydzewski
Copy link
Author

the new batch logic is now enabled by default in master via patch fa0ebed. this will be available in the tagged 1.6 release.

in the event the new batch logic causes issues, you can rollback to the previous batch logic using the DRONE_DATABASE_LEGACY_BATCH=true flag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants