
Python project for finding duplicate files in a filesystem and moving them to an archive folder. Statistics are stored in an SQLite database.


binbash23/fscan


20221211

fscan by Jens Heine <binbash@gmx.net>

Scan a filesystem/path and store the result in a sqlite db. View statistics of
the scanned files. Find duplicate files and move them to a configurable archive
folder. Browse the sqlite database file with your favorite sqlite browser
(e.g. https://sqlitebrowser.org/dl/) to find out more about your files
(changes, new/deleted files, ...)
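
If you prefer a script over a GUI browser, the database file can also be
inspected with Python's built-in sqlite3 module. The sketch below only lists
the tables in a database file. It makes no assumption about fscan's schema;
the db_path argument is a placeholder for wherever your fscan database file
is located.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    con = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

Once the table names are known, ordinary SELECT statements can be used to
explore the stored file information.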

If you do not want to mess around with the Python code, you can use the
precompiled binaries for Linux and Windows:
https://github.com/binbash23/fscan/tree/master/source/dist

Best regards, Jens


./fscan.py -h 
(or with the precompiled binaries under windows: fscan.exe or under linux: ./fscan)

Usage: fscan.py [options]

Options:
  -h, --help            show this help message and exit
  -s, --scan            Start filesystem scan
  -a, --analyze         Start filesystem analysis (hashing and mime type
                        detection)
  -m, --moveduplicates  Move duplicate files to archive folder
  -c, --config          Show configuration
  -M, --mimetypes       Show mime type statistics
  -d, --duplicates      Show duplicate files
  -S, --stats           Show database statistics
  -l LOGLEVEL, --loglevel=LOGLEVEL
                        Set loglevel (DEBUG, INFO, WARN, ERROR, CRITICAL)
  -V, --version         Show fscan version info

  Set configuration parameter:
    -p PROPERTY, --property=PROPERTY
                        Set property
    -v VALUE, --value=VALUE
                        Set value


HOWTO start using fscan
-----------------------

1. Show configuration

> ./fscan.py -c

FILESCAN_COMMIT_BATCH_SIZE   15000                          The commit size for the file scanning process. Default is 50000.
WORKPATH                     /path_to/Pictures              The path where the filescanner will work.
FILEHASH_COMMIT_BATCH_SIZE   100                            The commit size for the process that calculates filehashes and mime types. Default is 50.
FILEHASH_BLOCK_SIZE          1073741824                     The block size that the hash calculator process will use. Default is 1073741824.
DUPLICATES_ARCHIVE_PATH      /path_to/duplicates            The path where the duplicates will be stored. This path can be inside the WORKPATH
LOG_LEVEL                    INFO                           Loglevel of the application: CRITICAL, ERROR, WARN, INFO, DEBUG
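
To illustrate what a block size like FILEHASH_BLOCK_SIZE means for hashing,
here is a minimal sketch of reading a file in fixed-size blocks and feeding
them to a hash object. This is not fscan's actual code: the function name
hash_file and the choice of sha256 are assumptions made for the example.

```python
import hashlib


def hash_file(path, block_size=1073741824, algorithm="sha256"):
    """Hash a file block by block so it never has to fit into memory at once.

    The algorithm is an assumption; this README does not say which hash
    fscan uses.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while True:
            block = handle.read(block_size)
            if not block:  # empty bytes object means end of file
                break
            digest.update(block)
    return digest.hexdigest()
```

A larger block size means fewer read calls but more memory per read. The
resulting hash is the same for any block size.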

2. Set scan path in configuration

> ./fscan.py -p WORKPATH -v /home/myusername/Bilder

3. Set archive path in configuration

> ./fscan.py -p DUPLICATES_ARCHIVE_PATH -v /home/myusername/Bilder_duplicates

4. Start scanning and hashing

> ./fscan.py -sa
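
The scan step can be pictured roughly as follows: walk the work path, record
each file in sqlite, and commit in batches (compare
FILESCAN_COMMIT_BATCH_SIZE). This is only a sketch under assumptions. The
table name files, its columns and the function scan_tree are hypothetical
and do not describe fscan's real schema or code.

```python
import os
import sqlite3


def scan_tree(con, workpath, batch_size=50000):
    """Walk workpath and store path, size and mtime of every file.

    The table layout is hypothetical and used for illustration only.
    """
    con.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, size INTEGER, mtime REAL)"
    )
    pending = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(workpath):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                info = os.stat(full_path)
            except OSError:
                continue  # file vanished or is unreadable, skip it
            con.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                (full_path, info.st_size, info.st_mtime),
            )
            pending += 1
            total += 1
            if pending >= batch_size:
                con.commit()  # commit once per batch, not once per file
                pending = 0
    con.commit()  # commit the last, partially filled batch
    return total
```

Committing in batches keeps the number of transactions low, which usually
matters when scanning many thousands of files.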

5. Show statistics

> ./fscan.py -S

Filecount                    116521                         Total number of files in workpath.
Filecount 0 byte files       5                              Total number of files with zero byte size.
Filesize                     583.147 GB                     Total size of all files in workpath.
Files hashed / not hashed    116517 / 0                     Total number of files which have been hashed / not hashed (yet).
Filesize hashed / not hashed 583.147 GB / 0.0 GB            Total size in bytes of files that have been hashed / not hashed (yet).
Percent bytes hashed         100.0 %                        Percentage of bytes that have been hashed.
Duplicate files              704                            Total number of files that can be deleted from the workpath because they are duplicates.
Duplicates filesize          11.432 GB                      Total size in bytes of the duplicate files which can be safely deleted.
Percent duplicate bytes      1.96 %                         Percentage of space from the workpath space usage that is used by duplicates.
Different mime types         42                             Number of distinct mime types that have been found inside the workpath.
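
The percentage values follow directly from the sizes shown above. As a small
worked example using the numbers from this output (the helper name percent
is made up for illustration):

```python
def percent(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2) if whole else 0.0


# Duplicates filesize / Filesize, both in GB
duplicate_share = percent(11.432, 583.147)

# Filesize hashed / Filesize, both in GB
hashed_share = percent(583.147, 583.147)
```

With these inputs duplicate_share is 1.96 and hashed_share is 100.0, which
matches the "Percent duplicate bytes" and "Percent bytes hashed" lines.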

6. Show mime type statistics

> ./fscan.py -M

image/jpeg                                                   100491    
video/mp4                                                    5207      
text/html                                                    3152      
video/quicktime                                              1354      
audio/x-hx-aac-adts                                          1346      
audio/x-m4a                                                  894       
audio/ogg                                                    759       
application/json                                             622       
video/x-msvideo                                              479       
image/png                                                    450       
application/dicom                                            426       
image/g3fax                                                  302       
image/gif                                                    227       
application/octet-stream                                     227       
text/xml                                                     145       
text/plain                                                   93        
...
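
A count per mime type like the one above can be sketched in a few lines.
Note that this example swaps in Python's standard mimetypes module, which
guesses from the file extension only. fscan itself relies on the magic
modules (see the install note at the end), which look at the file content.
The function name mime_type_counts is hypothetical.

```python
import mimetypes
from collections import Counter


def mime_type_counts(paths):
    """Count files per mime type, most frequent first.

    Uses extension-based guessing as a stand-in for content-based detection.
    """
    counts = Counter()
    for path in paths:
        guessed, _encoding = mimetypes.guess_type(path)
        # Unknown extensions fall back to the generic binary type
        counts[guessed or "application/octet-stream"] += 1
    return counts.most_common()
```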

7. Show duplicates

> ./fscan.py -d
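
The general idea behind hash-based duplicate detection is that files with
identical content produce identical hashes. The sketch below shows that idea
in plain Python without a database. It is not fscan's implementation; the
function name find_duplicates, the use of sha256 and the size pre-filter are
assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Return lists of paths below root that share identical content."""
    # Files of different size cannot be identical, so group by size first
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            by_size[os.path.getsize(full_path)].append(full_path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size means no duplicate is possible
        for path in paths:
            # Reads the whole file at once; fine for a sketch, but large
            # files should be hashed block by block instead
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_hash[digest].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```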

8. Move duplicates into the archive folder configured in step 3

> ./fscan.py -m
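
Conceptually, moving duplicates means keeping one file of every group of
identical files and moving the rest into the archive folder. The sketch
below illustrates this under assumptions: which copy fscan keeps and how it
names files in the archive is not described here, so keeping the
alphabetically first path and appending a counter on name clashes are
choices made for this example only. The function name move_duplicates is
hypothetical.

```python
import os
import shutil


def move_duplicates(groups, archive_path):
    """Keep the first path of each group and move the others to archive_path.

    groups is a list of lists of paths with identical content.
    Returns a list of (old_path, new_path) tuples.
    """
    os.makedirs(archive_path, exist_ok=True)
    moved = []
    for group in groups:
        for duplicate in sorted(group)[1:]:  # first entry stays in place
            base_name = os.path.basename(duplicate)
            stem, ext = os.path.splitext(base_name)
            target = os.path.join(archive_path, base_name)
            counter = 1
            # Never overwrite a file that is already in the archive
            while os.path.exists(target):
                target = os.path.join(archive_path, f"{stem}_{counter}{ext}")
                counter += 1
            shutil.move(duplicate, target)
            moved.append((duplicate, target))
    return moved
```

Moving instead of deleting keeps the operation reversible: a file can be
moved back from the archive folder if it turns out to be needed.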





Note on installing the magic (mime type detection) modules on Windows:

pip install python-libmagic
pip install python-magic-bin
