Re-running archivebox init loses metadata #557

@shepner

This issue extracts a defect originally raised in a comment on #556.

Describe the bug

Running archivebox init is dangerous: metadata is lost when the DB is recreated.

Steps to reproduce

Run archivebox init

Software versions

archivebox/archivebox:latest as of 11/29/2020

Discussion

archivebox init should not be dangerous. Did some folders get wiped, or did some entries in the database get lost when you ran it? Can you reliably reproduce it? It would be very helpful if that is the case.

Originally posted by @cdvv7788 in #556 (comment)

I was mainly referring to the Timestamp and Title fields. I haven't used tags yet, so I haven't tested those. When archivebox init re-adds the snapshots to the DB, the timestamp gets overwritten and the title is regenerated. The net result is that I have a few hundred entries that claim to have been added in the same minute, and a handful of titles that reverted to "403 Forbidden" because the WARC method failed. Finally, the Files indicators don't seem to repopulate correctly in the admin panel.

Files indicators: [image]

Timestamps: [image]
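
For contrast with the overwrite described above, here is a minimal sketch of a non-destructive re-import. It is not ArchiveBox's actual code: `merge_snapshot` and the field names (`url`, `timestamp`, `title`, `tags`) are hypothetical, chosen only to illustrate keeping the stored timestamp and title and filling in just the fields that are missing.

```python
def merge_snapshot(existing, rediscovered):
    """Merge a snapshot rediscovered on disk into the record already stored.

    Hypothetical helper for illustration only; not part of ArchiveBox.
    Both arguments are plain dicts; `existing` may be None if the index
    has no record for this snapshot yet.
    """
    if existing is None:
        # Nothing stored yet, so the rediscovered data is all we have.
        return dict(rediscovered)

    merged = dict(existing)
    for key, value in rediscovered.items():
        # Fill in only missing or empty fields; never overwrite stored metadata.
        if merged.get(key) in (None, ""):
            merged[key] = value
    return merged


# Example: the stored record keeps its original timestamp and title,
# even though the re-import produced a new timestamp and a "403 Forbidden" title.
stored = {"url": "https://example.com/a", "timestamp": "1606600000", "title": "My Article"}
found = {"url": "https://example.com/a", "timestamp": "1606694400", "title": "403 Forbidden", "tags": "news"}
print(merge_snapshot(stored, found))
```

Under these assumptions, a re-import can only add information (here, the `tags` value), so running it repeatedly leaves existing records unchanged.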
