Search, index and tag your files and find duplicates.
The goal of tagfile is to manage and organize any sort of file (documents, music, pictures and videos in particular) in a way that is not tied to any file browser program, filesystem or operating system. The metadata that tagfile creates and uses to keep track of these files should be portable for use in multiple computer systems and be independent from any persistent mount points, filepaths or filenames.
Tagfile is primarily a unixy command-line application with a focus on simplicity, interactivity and scriptability through (shell) scripts. It is shaping up to be an amalgamation of features from applications such as locate, ls, find, file, cksum, sort and grep, performed in constrained scopes of specific sets of files that are defined and controlled by the user in one or more sqlite databases.
It is also a Python package (but the API is unstable at this point).
- Homes
- Features
- Quick Manual
- Help and synopses for commands
- Installing tagfile
- Relation between media-paths, databases and config files
- Status
- Software license
- Docs: https://babab.codeberg.page/tagfile
- Codeberg: https://codeberg.org/babab/tagfile
- Github: https://github.com/babab/tagfile
- scan all files in a directory (media-path) recursively
- ignore files when scanning according to rules in user config
- maintain a list of media-paths to prune/scan on a regular basis
- index files with their checksums, size and MIME-type into a sqlite database
- show a list of files in index, sortable by checksum, size and mimetype
- find duplicate files, based on checksums
- find files by matching on checksum, mimetype, size, name or substring of name and/or path
- prune the index of files that were moved or deleted
- print results of list and find commands terminated with a null character, for piping to other utilities like xargs
- configure aliases for certain commands and options (like git alias)
Features to be implemented in later versions:
- remove duplicate files in the same directory
- remove duplicate files interactively across directories
- add user defined tags to files (using checksums, independent from filenames)
Ideas that may or may not be implemented in later versions:
- ability to filter files using tags to create listings to use with other programs
- ability to use tags to create directory structures of symlinked content
Open a terminal and add one or more media-paths to be scanned for files:
cd ~/Music
tagfile add .
tagfile add ~/Videos
This will only save a reference to the directory. To actually walk through the directories and hash the files to get checksums, you can use the updatedb command. This will recursively scan all media-paths you've added and may take some time, especially the first time.
For both the prune and scan actions, progress bars will be shown with estimates of the remaining time to complete. Add a --verbose flag to also output every filename and action performed. Use Ctrl+C to cancel. All progress already made will be saved.
tagfile updatedb
To see statistics of indexed files and a list of media-paths:
tagfile info
Show duplicate files using the clones command with or without option flags (see tagfile help clones for all available options):
tagfile clones
Show hash, path, size and full MIME-type (using long opts):
tagfile clones --show-size --show-mime
Show hash, path, size and first part of MIME-type (using short opts):
tagfile clones -st
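To make the idea behind the clones command concrete, here is a minimal Python sketch of finding duplicates by checksum. It is not tagfile's implementation and does not use the tagfile package: the function names are hypothetical, and SHA-1 is only an assumed hash algorithm chosen for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_checksum(path, chunk_size=65536):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    # Assumption: SHA-1 is used for illustration; the algorithm tagfile
    # uses is not stated in this document.
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_clones(root):
    """Map each checksum to the sorted paths under root that share it."""
    by_checksum = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_checksum[file_checksum(path)].append(path)
    # Only checksums that occur more than once describe duplicates.
    return {
        checksum: sorted(paths)
        for checksum, paths in by_checksum.items()
        if len(paths) > 1
    }
```

Because files are grouped by content rather than by name, two copies are still reported as clones after one of them has been renamed or moved.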
The list and find commands are the most important part of tagfile and probably the reason why you might want to use it. What follows are some usage examples with both short and long optional arguments.
List all files sorted by filesize (showing checksum, filesize and mimetype columns):
tagfile list -aS size
tagfile list --show-all --sort=size
List all files with MIME-type text/plain sorted by filesize from small to big (showing checksum, filesize and mimetype columns):
tagfile find --mime text/plain -a -S size
tagfile find --mime=text/plain --show-all --sort=size
List all files, sorted by filetype (showing checksum, size and type):
tagfile list -HstS type
tagfile list --show-hash --show-size --show-type --sort=type
List all videos larger than 100MB, sorted by filesize from big to small (showing type and filesize):
tagfile find --type video --size-gt 104857600 -stS size --reverse
tagfile find --type video --size-gt 104857600 --show-size --show-type --sort=size --reverse
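When calling tagfile from a script, the byte count and the null-terminated output are the two details that are easy to get wrong. The sketch below is a hedged example, not an official API: the helper names are hypothetical, it only uses options listed in the help for find, and it assumes that with --print0 the output consists of file paths separated by null characters.

```python
import os
import subprocess


def mib_to_bytes(mib):
    """Convert mebibytes to the plain byte count that --size-gt expects."""
    return mib * 1024 * 1024


def build_find_args(mime_type, min_mib):
    """Build an argv list using only options shown in `tagfile help find`."""
    return [
        "tagfile", "find",
        "--type", mime_type,
        "--size-gt", str(mib_to_bytes(min_mib)),
        "--sort=size", "--reverse",
        "--print0",
    ]


def split_print0(raw):
    """Split null-terminated output into paths, dropping the empty tail."""
    # Assumption: each entry is a bare file path followed by a null byte.
    return [os.fsdecode(item) for item in raw.split(b"\0") if item]


def find_large_videos(min_mib=100):
    """Run tagfile and return matching paths (requires tagfile on PATH)."""
    result = subprocess.run(
        build_find_args("video", min_mib), capture_output=True, check=True
    )
    return split_print0(result.stdout)
```

Splitting on the null character instead of on newlines keeps filenames that contain spaces or newlines intact, which is the same reason to pair --print0 with xargs -0 in a shell pipeline.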
tagfile
Usage: tagfile [--config <filename>] [--db <name>] <command>
or: tagfile [-h | --help] | [-V | --version]
Search, index and tag your files and find duplicates
Options:
--config=<filename> use specified config file
--db=<name> use database <name>, defined in config file
-h, --help show this help information
-V, --version show version and platform information
Commands:
add add a directory to media paths
clones show all indexed duplicate files
find find files according to certain criterias
help show help information
info show statistics for index and media paths
list show all indexed files
updatedb scan media paths and index newly added files
version show version and platform information
See 'tagfile help <command>' for more information on a
specific command, before using it.
tagfile add
usage: tagfile add [-q | --quiet] <media-path>
or: tagfile add [-h | --help]
Add a directory to media paths
Options:
-h, --help show this help information
-q, --quiet print nothing except fatal errors
tagfile clones
usage: tagfile clones [-s | --show-size] [-t | --show-type] [-m | --show-mime]
or: tagfile clones [-h | --help]
Show files with matching checksums. In this overview the column
with hashes is always printed. Add `-stm` flags to display more
columns.
By default, an extra line is printed after each list of clones,
showing the total number of duplicates. This can be hidden with
`--hide-sum`.
Options:
-h, --help show this help information
-s, --show-size display column with filesizes
-t, --show-type display column with MIME type
-m, --show-mime display column with MIME type/subtype
--hide-sum do not print "X clones/duplicates" line
tagfile find
usage: tagfile find [--type=TYPE] [--mime=MIMETYPE] [--size-gt=BYTES]
[--size-lt=BYTES] [--hash=HEX] [--in-path=STRING]
[--name=NAME | --in-name=STRING] [-H | --show-hash]
[-s | --show-size] [-t | --show-type] [-m | --show-mime]
[-a | --show-all] [-S COL | --sort=COL] [--reverse]
or: tagfile find [--type=TYPE] [--mime=MIMETYPE] [--size-gt=BYTES]
[--size-lt=BYTES] [--hash=HEX] [--in-path=STRING]
[--name=NAME | --in-name=STRING] [-0 | --print0]
[-S COL | --sort=COL] [--reverse]
or: tagfile find [-h | --help]
Find files according to certain criterias
Options:
-h, --help show this help information
--type=TYPE match files on 1st part of MIME type
--mime=MIMETYPE match files on full MIME type/subtype
--size-gt=BYTES match files where size is greater than BYTES
--size-lt=BYTES match files where size is lesser than BYTES
--hash=HEX match files where checksum is (or starts with) HEX
--in-path=STRING match absolute paths with a substring of STRING
--name=NAME match filenames that are exactly NAME
--in-name=STRING match filenames with a substring of STRING
-H, --show-hash display column with checksum hash
-s, --show-size display column with filesizes
-t, --show-type display column with MIME type
-m, --show-mime display column with MIME type/subtype
-a, --show-all display hash, size, mime (same as -Hsm)
-S COL, --sort=COL sort on: name, hash, size, type or mime
--reverse reverse sort order
-0, --print0 end lines with null instead of newline
tagfile help
usage: tagfile help [<command>]
Show usage information (for subcommands)
Options:
-h, --help show usage information for help command
tagfile info
usage: tagfile info [-C | --show-config]
or: tagfile info [-h | --help]
Show media paths, user config and statistics for index.
Options:
-h, --help show this help information
-C, --show-config pretty print active config in python
tagfile list
usage: tagfile list [-H | --show-hash] [-s | --show-size] [-t | --show-type]
[-m | --show-mime] [-a | --show-all] [-S COL | --sort=COL]
[--reverse]
or: tagfile list [-0 | --print0] [-S COL | --sort=COL] [--reverse]
or: tagfile list [-h | --help]
Output a list of all indexed files.
By default, the list is sorted on file path.
Options:
-h, --help show this help information
-H, --show-hash display column with checksum hash
-s, --show-size display column with filesizes
-t, --show-type display column with MIME type
-m, --show-mime display column with MIME type/subtype
-a, --show-all display hash, size, mime (same as -Hsm)
-S COL, --sort=COL sort on: name, hash, size, type or mime
--reverse reverse sort order
-0, --print0 end lines with null instead of newline
tagfile updatedb
usage: tagfile updatedb [-v, --verbose] [-q, --quiet] [--prune] [--scan]
[-n ID, --path-id=ID]
or: tagfile updatedb [-h | --help]
Scan media paths. Index added files and prune removed files.
Use the option `--prune` if you only want to remove entries
from the index if files are missing. Use the option `--scan`
to only scan for newly added files without pruning.
To prune and/or scan for a single media-path only, use
`--path-id=ID`. See tagfile info for an overview of paths/ID's.
Options:
-h, --help show this help information
-v, --verbose display a message for every action
-q, --quiet display nothing except fatal errors
--prune prune removed files only; don't scan
--scan scan for new files only; don't prune
-n ID, --path-id=ID prune/scan only files in path with this id
When no options are specified, updatedb will both scan and prune.
It will always prune deleted files before scanning for new files.
tagfile version
usage: tagfile version [-h | --help]
Show version and platform information
Options:
-h, --help show this help information
All commands should be run as a regular user (not root).
Tagfile is a command-line end-user application written in Python that depends on packages from PyPI. You can install it using pip, but pipx (https://pypa.github.io/pipx/) is recommended because it avoids dependency problems and clashes with Python packages from your system's package manager.
Install latest release from PyPI:
pipx install tagfile
Install latest development version from git:
pipx install git+https://github.com/babab/tagfile@devel
To build and install from source you can use:
make install
To upgrade or uninstall tagfile in the future you can use:
pipx upgrade tagfile
pipx uninstall tagfile
By default, tagfile uses one config file and one database.
A config file:
- Contains a single set of ignore rules for all databases.
- Defines one or more databases. New databases must be defined in the config [databases] section with a name = "location-path" key-value pair.
- Can be specified with the tagfile --config=FILENAME option.
A database:
- Can contain zero, one or multiple media-paths.
- Is the scope in which the most used commands/actions (add, find, list and updatedb) are performed.
- Is used by default when it is either:
  - configured in the config file with the default_database = "name" setting, or
  - specified with the tagfile --db=<name> option.
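The relation between the config file and its databases can be pictured with a small Python sketch. This is an illustration under assumptions, not tagfile's code: the function name, database names and paths are made up, the config is represented as a plain dict, and it is assumed that a database given with --db takes precedence over default_database.

```python
def resolve_database(config, cli_db=None):
    """Return (name, location) of the database a command would operate on."""
    # Assumption: a name passed on the command line overrides the default.
    name = cli_db if cli_db is not None else config["default_database"]
    try:
        return name, config["databases"][name]
    except KeyError:
        raise ValueError(
            f"database {name!r} is not defined in [databases]"
        ) from None


# Hypothetical config contents; the names and paths are invented.
example_config = {
    "default_database": "main",
    "databases": {
        "main": "~/.local/share/tagfile/main.db",
        "photos": "/mnt/usb/photos.db",
    },
}
```

Here resolve_database(example_config) selects "main", while resolve_database(example_config, cli_db="photos") selects the second database, mirroring a call such as tagfile --db photos list.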
A media-path is a parent directory that contains one or more files you want to index. By scanning with updatedb, tagfile will walk recursively through all subdirectories and add any file that does not match the ignore rules from the config. Any files that are indexed but afterwards removed from the filesystem will be pruned from the index on the next run of updatedb.
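The prune-then-scan behaviour described above can be sketched in a few lines of Python. This is a simplified model and not how tagfile is implemented: a dict that maps paths to file sizes stands in for the sqlite index, the function names are hypothetical, and ignore rules are assumed to be shell-style filename patterns, which may differ from the rule format in the real config.

```python
import fnmatch
import os


def is_ignored(path, ignore_patterns):
    """Return True if the filename matches any shell-style ignore pattern."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in ignore_patterns)


def update_index(index, media_path, ignore_patterns=()):
    """Prune missing files, then add new ones. Returns (pruned, added)."""
    # Prune first: drop entries whose files no longer exist on disk.
    pruned = [path for path in index if not os.path.exists(path)]
    for path in pruned:
        del index[path]

    # Then scan: walk the media-path recursively and index unseen files.
    added = []
    for dirpath, _dirnames, filenames in os.walk(media_path):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path in index or is_ignored(path, ignore_patterns):
                continue
            index[path] = os.path.getsize(path)
            added.append(path)
    return sorted(pruned), sorted(added)
```

Running update_index a second time on an unchanged directory returns two empty lists, which matches the expectation that repeated updatedb runs only do work for files that were added or removed in the meantime.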
Until a stable version 1.0.0 is ready, the API, CLI and config settings are subject to change from 0.x version to 0.x version, likely without offering migrations. Tagfile adheres to Semantic Versioning.
- Current stable release: v0.1.0
- Current dev/git version: v0.2.0a13
Tagfile was written in a short time and used by me sporadically for 8 years after that. All code was contained in a single-file script in ~/bin, available from Github only.
Starting in March 2023, I decided to properly release it to PyPI and flesh out the current project structure, command interface and database handling before working on new features, so that it may live up to its name. At this moment in time, you cannot tag your files yet :)
Prerequisites:
- Python 3.8 or later
Dependencies (automatically installed with pipx / pip):
- Peewee ORM (https://peewee.readthedocs.org/en/latest/)
- pycommand (https://babab.github.io/pycommand/)
- python-magic (https://pypi.python.org/pypi/python-magic/)
- rich (https://pypi.python.org/pypi/rich/)
Copyright (c) 2015-2023 Benjamin Althues <benjamin at babab . nl>
Tagfile is open source software, licensed under a BSD-3-Clause license. See the LICENSE file for the full license text.