Removes duplicate files within a specified directory or directories (a Windows PowerShell script).

auberginehill/remove-duplicate-files

Remove-DuplicateFiles.ps1

OS: Windows
Type: A Windows PowerShell script
Language: Windows PowerShell
Description: Remove-DuplicateFiles searches for duplicate files in a directory specified with the -Path parameter. On machines that have PowerShell version 4 or later installed, the files of a folder are analysed with the inbuilt Get-FileHash cmdlet. On machines that are running PowerShell version 2 or 3, the .NET Framework commands (and a function called Check-FileHash, which is based on Lee Holmes' Get-FileHash script in "Windows PowerShell Cookbook (O'Reilly)") are invoked instead to determine whether any duplicate files exist in a particular folder.
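
The version-dependent hashing described above can be sketched roughly as follows. This is a minimal illustration, not the script's own Check-FileHash function: the helper name Get-HashString and the choice of SHA256 as the default algorithm are assumptions made for the example.

```powershell
# Hypothetical helper (not part of Remove-DuplicateFiles): returns an
# upper-case hex hash string for a single file.
function Get-HashString {
    param(
        [Parameter(Mandatory = $true)][string]$FilePath,
        [string]$Algorithm = 'SHA256'   # assumption: the README does not name the algorithm
    )

    if ($PSVersionTable.PSVersion.Major -ge 4) {
        # PowerShell 4 or later: the inbuilt cmdlet is available.
        return (Get-FileHash -LiteralPath $FilePath -Algorithm $Algorithm).Hash
    }

    # PowerShell 2 or 3: fall back to the .NET cryptography classes.
    # Note: .NET resolves relative paths against the process directory,
    # so a full path is the safer input here.
    $hasher = [System.Security.Cryptography.HashAlgorithm]::Create($Algorithm)
    $stream = [System.IO.File]::OpenRead($FilePath)
    try {
        $bytes = $hasher.ComputeHash($stream)
    }
    finally {
        $stream.Close()
    }
    return ([System.BitConverter]::ToString($bytes) -replace '-', '')
}
```

Two files with the same hash string are then treated as having identical content, regardless of their names.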

Multiple paths may be entered to the -Path parameter (separated with a comma), and sub-directories may be included in the list of folders to process by adding the -Recurse parameter to the launching command. By default the removal of files in Remove-DuplicateFiles is done on a 'per directory' basis: each individual folder is treated as its own separate entity, and the duplicate files are searched and removed within one particular folder realm at a time. For example, if a file exists twice in Folder A and also once in Folder B, only the second instance of the file in Folder A would be deleted by Remove-DuplicateFiles by default. To make Remove-DuplicateFiles also delete the duplicate file that is in Folder B (in the previous example), a parameter called -Global may be added to the launching command, which makes Remove-DuplicateFiles behave more holistically: it analyses all the items in every found directory in one go and compares each found file with every other.
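
The difference between the default 'per directory' mode and -Global comes down to which key the files are grouped by. The sketch below illustrates that idea only; the function name Find-DuplicateCandidate is hypothetical, it assumes PowerShell 4 or later (for Get-FileHash), and keeping the first file by sorted path is an assumption, since this README does not state which instance the script preserves.

```powershell
# Hypothetical helper: lists files that would count as duplicates.
# Nothing is deleted here.
function Find-DuplicateCandidate {
    param(
        [Parameter(Mandatory = $true)][string[]]$Folder,
        [switch]$Global
    )

    # Collect path, parent folder and content hash for every file (first level only).
    $items = foreach ($dir in $Folder) {
        Get-ChildItem -LiteralPath $dir -File | ForEach-Object {
            New-Object PSObject -Property @{
                Path   = $_.FullName
                Folder = $_.DirectoryName
                Hash   = (Get-FileHash -LiteralPath $_.FullName).Hash
            }
        }
    }

    # Per directory: same hash AND same folder. Global: same hash anywhere.
    $keys = if ($Global) { 'Hash' } else { 'Folder', 'Hash' }

    $items | Group-Object -Property $keys | Where-Object { $_.Count -gt 1 } | ForEach-Object {
        # Assumption: keep the first file (by path) of each group, report the rest.
        $_.Group | Sort-Object Path | Select-Object -Skip 1
    }
}
```

With a file that exists twice in Folder A and once in Folder B, the default call reports one candidate (in Folder A), while adding -Global reports two.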

If deletions are made, a log-file (deleted_files.txt by default) is created in $env:temp, which points to the current temporary file location set in the system (for more information about $env:temp, please see the Notes section). The filename of the log-file can be set with the -FileName parameter (a filename with a .txt ending is recommended) and the default output destination folder may be changed with the -Output parameter. During the possibly invoked log-file creation procedure Remove-DuplicateFiles tries to preserve any pre-existing content rather than overwrite the specified file, so if the -FileName parameter points to an existing file, new log-info data is appended to the end of that file.
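
The append-rather-than-overwrite behaviour can be pictured with Add-Content, which creates a missing file and otherwise appends to it. The sketch below is an illustration under assumptions: the function name Add-LogEntry and the tab-separated line format are hypothetical, as the actual log layout is not shown in this README.

```powershell
# Hypothetical helper: appends one line per deleted file to the log-file.
function Add-LogEntry {
    param(
        [string]$Output = $env:temp,                 # default location named in this README
        [string]$FileName = 'deleted_files.txt',     # default filename named in this README
        [Parameter(Mandatory = $true)][string[]]$DeletedFile
    )

    $logPath = Join-Path $Output $FileName
    $stamp = Get-Date -Format 'yyyy-MM-dd HH:mm:ss'

    foreach ($file in $DeletedFile) {
        # Add-Content never truncates: existing content is kept and new lines go to the end.
        Add-Content -LiteralPath $logPath -Value ("{0}`t{1}" -f $stamp, $file)
    }
    return $logPath
}
```

For instance, `Add-LogEntry -Output C:\Scripts -FileName log.txt -DeletedFile 'E:\chiore\copy.txt'` would add one line to C:\Scripts\log.txt (the paths here are placeholders).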

To invoke a simulation run, where no files would be deleted in any circumstances, a parameter -WhatIf may be added to the launching command. If the -Audio parameter has been used, an audible beep would be emitted after Remove-DuplicateFiles has deleted one or more files. Please note that if any of the parameter values (after the parameter name itself) includes space characters, the value should be enclosed in quotation marks (single or double) so that PowerShell can interpret the command correctly.
Homepage: https://github.com/auberginehill/remove-duplicate-files
Short URL: http://tinyurl.com/jv4jlbe
Version: 1.2
Sources:
Emojis: Emoji Table
Mekac: Get folder where Access is denied
Mike F Robbins: PowerShell Advanced Functions: Can we build them better?
Lee Holmes: Windows PowerShell Cookbook (O'Reilly): Get-FileHash script
Gisli: Unable to read an open file with binary reader
Twon of An: Get the SHA1,SHA256,SHA384,SHA512,MD5 or RIPEMD160 hash of a file
Downloads: For instance Remove-DuplicateFiles.ps1. Or everything as a .zip-file.

Parameters

📐
  • Parameter -Path

    with aliases -Start, -Begin, -Folder, and -From. The -Path parameter determines the starting point of the duplicate file analysis. The -Path parameter also accepts a collection of path names (separated by a comma). It's not mandatory to write -Path in the remove duplicate files command to invoke the -Path parameter, as is shown in the Examples below, since Remove-DuplicateFiles tries to decipher the inputted queries as well as is mechanically possible within a 50 KB size limit.

    The paths should be valid file system paths to a directory, i.e. a full path name of a folder such as C:\Windows. In case the path name includes space characters, please enclose the path name in quotation marks (single or double). If a collection of path names is defined for the -Path parameter, please separate the individual path names with a comma. The -Path parameter also takes an array of strings for paths, and objects can be piped to this parameter, too. If no path is defined in the command launching Remove-DuplicateFiles, the user will be prompted to enter a -Path value. Whether or not the subdirectories are added to the list of folders to be processed is toggled with the -Recurse parameter. Furthermore, the parameter -Global toggles whether the contents of found folders are compared with each other or not.

  • Parameter -Output

    with an alias -ReportPath. Specifies where the log-file (deleted_files.txt by default), which is created or updated when deletions are made, is to be saved. The default save location is $env:temp, which points to the current temporary file location set in the system. The default -Output save location is defined at line 16 with the $Output variable. In case the path name includes space characters, please enclose the path name in quotation marks (single or double). For usage, please see the Examples below, and for more information about $env:temp, please see the Notes section below.

  • Parameter -FileName

    with an alias -File. The filename of the log-file can be set with the -FileName parameter (a filename with a .txt ending is recommended, the default filename is deleted_files.txt). During the possibly invoked log-file creation procedure Remove-DuplicateFiles tries to preserve any pre-existing content rather than overwrite the specified file, so if the -FileName parameter points to an existing file, new log-info data is appended to the end of that file. If the filename includes space characters, please enclose the filename in quotation marks (single or double).

  • Parameter -Recurse

    If the -Recurse parameter is added to the command launching Remove-DuplicateFiles, every sub-folder at every level, no matter how deep in the directory structure, is also added to the list of folders to be processed by Remove-DuplicateFiles. If the -Recurse parameter is not used, the only folders that are processed are those which have been defined with the -Path parameter.

  • Parameter -Global

    with aliases -Combine and -Compare. If the -Global parameter is added to the command launching Remove-DuplicateFiles, the contents of different folders are combined and compared with each other, so for example if a file exists twice in Folder A and also once in Folder B, the second instance in Folder A and the file in Folder B would be deleted by Remove-DuplicateFiles (only one instance of a file would be universally kept). Before trying to remove files from multiple locations with the -Global parameter, it is recommended to use both the -WhatIf parameter and the -Global parameter in the command launching Remove-DuplicateFiles in order to make sure that the correct original file in the correct directory would be left untouched.

    If the -Global parameter is not used, the removal of files is done on a 'per directory' basis and the contents of different folders are not compared with each other, so those duplicate files which exist alone in their own folder will be preserved (by default, one instance of a file in each folder) even after Remove-DuplicateFiles has been run (each folder is regarded as a separate entity or realm).

  • Parameter -WhatIf

    The parameter -WhatIf toggles whether the deletion of files is actually done or not. By adding the -WhatIf parameter to the launching command only a simulation run is performed. When the -WhatIf parameter is added to the command launching Remove-DuplicateFiles, a -WhatIf parameter is also added to the underlying Remove-Item cmdlet that is deleting the files in Remove-DuplicateFiles. In that case, if one or more duplicate files were detected by Remove-DuplicateFiles, a list of files that would be deleted is displayed in the console ("What if:"). Since no real deletions are made, the script will return an "Exit Code 1" (A simulation run: the -WhatIf parameter was used).

    In case there were no duplicate files to begin with, the result is the same whether the -WhatIf parameter was used or not. Before trying to remove files from multiple locations with the -Global parameter, it is recommended to use both the -WhatIf parameter and the -Global parameter in the command launching Remove-DuplicateFiles in order to make sure that the correct original file in the correct directory would be left untouched.

  • Parameter -Audio

    If this parameter is used in the remove duplicate files command, an audible beep will occur if any deletions are made by Remove-DuplicateFiles (and if the system is not set to mute).
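
The parameters and aliases listed above could be declared along the following lines. This is a minimal sketch, not the script's actual source: the function name Remove-DuplicateSketch is hypothetical, and using SupportsShouldProcess to provide -WhatIf is one common approach that is assumed here rather than confirmed by this README. The body only reports which folders it would process and deletes nothing.

```powershell
# Hypothetical skeleton showing parameter names and aliases from this README.
function Remove-DuplicateSketch {
    [CmdletBinding(SupportsShouldProcess = $true)]   # assumption: supplies -WhatIf
    param(
        [Parameter(Mandatory = $true, Position = 0, ValueFromPipeline = $true)]
        [Alias('Start', 'Begin', 'Folder', 'From')]
        [string[]]$Path,

        [Alias('ReportPath')]
        [string]$Output = $env:temp,

        [Alias('File')]
        [string]$FileName = 'deleted_files.txt',

        [switch]$Recurse,

        [Alias('Combine', 'Compare')]
        [switch]$Global,

        [switch]$Audio
    )

    process {
        foreach ($item in $Path) {
            # With -WhatIf, ShouldProcess prints a "What if:" line and returns $false.
            if ($PSCmdlet.ShouldProcess($item, 'Search for duplicate files')) {
                $item
            }
        }
    }
}
```

Because -Path is declared at position 0, `Remove-DuplicateSketch C:\dc01` and `Remove-DuplicateSketch -From C:\dc01` bind the same value, and adding -WhatIf suppresses the output of the guarded block.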

Outputs

➡️
  • Deletes duplicate files in one or multiple folders.

  • Displays results about deleting duplicate files in console, and if any deletions were made, writes or updates a logfile (deleted_files.txt) at $env:temp. The filename of the log-file can be set with the -FileName parameter (a filename with a .txt ending is recommended) and the default output destination folder may be changed with the -Output parameter.
  • Default values (the log-file creation/updating procedure only occurs if deletion(s) is/are made by Remove-DuplicateFiles):

      Path                           Type        Name
      $env:temp\deleted_files.txt    TXT-file    deleted_files.txt

Notes

⚠️
  • Please note that all the parameters can be used in one remove duplicate files command and that each of the parameters can be "tab completed" before typing them fully (by pressing the [tab] key).

  • Please also note that the possibly generated log-file is created in a directory, which is end-user settable in each remove duplicate files command with the -Output parameter. The default save location is defined with the $Output variable (at line 16). The $env:temp variable points to the current temp folder. The default value of the $env:temp variable is C:\Users\<username>\AppData\Local\Temp (i.e. each user account has their own separate temp folder at path %USERPROFILE%\AppData\Local\Temp). To see the current temp path, for instance a command

    [System.IO.Path]::GetTempPath()

    may be used at the PowerShell prompt window [PS>]. To change the temp folder, for instance to C:\Temp, please follow, for example, the instructions at Temporary Files Folder - Change Location in Windows, which in essence are along these lines:
    1. Right click on Computer and click on Properties (or select Start → Control Panel → System). In the resulting window with the basic information about the computer...
    2. Click on Advanced system settings on the left panel and select Advanced tab on the resulting pop-up window.
    3. Click on the button near the bottom labeled Environment Variables.
    4. In the topmost section labeled User variables both TMP and TEMP may be seen. Each different login account is assigned its own temporary locations. These values can be changed by double clicking a value or by highlighting a value and selecting Edit. The specified path will be used by Windows and many other programs for temporary files. It's advisable to set the same value (a directory path) for both TMP and TEMP.
    5. Any running programs need to be restarted for the new values to take effect. In fact, probably also Windows itself needs to be restarted for it to begin using the new values for its own temporary files.
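
The same values can also be inspected from the PowerShell prompt. The sketch below only reads values; the two commented-out lines show how the user-level variables could be changed with the .NET Environment class, assuming that C:\Temp already exists (that path is just the example used above).

```powershell
# Show the temp folder as the current session and .NET see it.
$env:temp
[System.IO.Path]::GetTempPath()

# Read the stored per-user values (the "User variables" section of the dialog).
[System.Environment]::GetEnvironmentVariable('TEMP', 'User')
[System.Environment]::GetEnvironmentVariable('TMP', 'User')

# Assumption: C:\Temp exists. Uncomment to set both values for the current user.
# Already running programs keep the old values until they are restarted.
# [System.Environment]::SetEnvironmentVariable('TEMP', 'C:\Temp', 'User')
# [System.Environment]::SetEnvironmentVariable('TMP', 'C:\Temp', 'User')
```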

Examples

📖 To open this code in Windows PowerShell, for instance:

  1. ./Remove-DuplicateFiles -Path "E:\chiore"
    Run the script. Please remember to insert ./ or .\ before the script name. Removes duplicate files from the "E:\chiore" directory and saves the generated log-file at the default location ($env:temp), if any deletions were made. Regardless of how many subfolders there are or are not in "E:\chiore", the duplicate files are analysed at the first level only (i.e. the file analysis is non-recursive, similar to a common command "dir", for example). During the possibly invoked log-file creation procedure Remove-DuplicateFiles tries to preserve any pre-existing content rather than overwrite the file, so if the default log-file (deleted_files.txt) already exists, new log-info data is appended to the end of that file. Please note that -Path and the quotation marks can be omitted in this example, because

    ./Remove-DuplicateFiles E:\chiore

    will result in the exact same outcome, since the path name is accepted as a first defined value automatically and since the path name doesn't contain any space characters.
  2. help ./Remove-DuplicateFiles -Full
    Display the help file.
  3. ./Remove-DuplicateFiles -Path "E:\chiore", "C:\dc01" -Output "C:\Scripts" -Global
    Run the script and remove all duplicate files from the first level of "E:\chiore" and "C:\dc01" (i.e. those duplicate files which would be listed by combining the results of the "dir E:\chiore" and "dir C:\dc01" commands), and if any deletions are made, save the log-file to C:\Scripts with the default filename (deleted_files.txt). If a file exists in "E:\chiore" and also in "C:\dc01" (i.e. the other instance is a duplicate file), one instance would be preserved and the other would be deleted by Remove-DuplicateFiles. The word -Path and the quotation marks could be omitted in this example, too.
  4. ./Remove-DuplicateFiles -Path "C:\Users\Dropbox" -Recurse -WhatIf
    Because the -WhatIf parameter was used, only a simulation run occurs, so no files would be deleted under any circumstances. The script will look for duplicate files in C:\Users\Dropbox and will add every sub-directory at every level below it to the list of folders to process (the search for other folders to process is done recursively). Each of the found folders is searched separately (or individually) for duplicate files (so if a file exists twice in Folder A and also once in Folder B, only the second instance of the file in Folder A would be added to the list of files to be deleted).

    If duplicate files aren't found (when every folder is looked at separately and the contents of each folder are not compared with each other, since the -Global parameter was not used), the result would be identical regardless of whether the -WhatIf parameter was used or not. If, however, duplicate files were indeed found, only an indication of what the script would delete ("What if:") is displayed.

    The -Path parameter value is case-insensitive (as is most of PowerShell), and since the path name doesn't contain any space characters, it doesn't need to be enclosed in quotation marks. Actually the -Path parameter name may be left out from the command, too, since, for example,

    ./Remove-DuplicateFiles c:\users\dROPBOx -Recurse -WhatIf

    is the exact same command in nature.
  5. .\Remove-DuplicateFiles.ps1 -From C:\dc01 -ReportPath C:\Scripts -File log.txt -Recurse -Combine -Audio
    Run the script and delete all the duplicate files found in C:\dc01 and in every subfolder under C:\dc01 combined. The duplicate files are searched in one go from all the found folders and the contents of all folders are compared with each other.

    If any deletions were made, the log-file would be saved to C:\Scripts with the filename log.txt and an audible beep would occur. This command will work, because -From is an alias of -Path and -ReportPath is an alias of -Output, -File is an alias of -FileName and -Combine is an alias of -Global. Furthermore, since the path names or the file name don't contain any space characters, they don't need to be enclosed in quotation marks.
  6. Set-ExecutionPolicy remotesigned
    This command alters the Windows PowerShell rights to enable script execution for the default (LocalMachine) scope. Windows PowerShell has to be run with elevated rights (run as an administrator) to actually be able to change the script execution properties. The default value of the default (LocalMachine) scope is "Set-ExecutionPolicy restricted".

    Parameters:

      Restricted: Does not load configuration files or run scripts. Restricted is the default execution policy.
      AllSigned: Requires that all scripts and configuration files be signed by a trusted publisher, including scripts that you write on the local computer.
      RemoteSigned: Requires that all scripts and configuration files downloaded from the Internet be signed by a trusted publisher.
      Unrestricted: Loads all configuration files and runs all scripts. If you run an unsigned script that was downloaded from the Internet, you are prompted for permission before it runs.
      Bypass: Nothing is blocked and there are no warnings or prompts.
      Undefined: Removes the currently assigned execution policy from the current scope. This parameter will not remove an execution policy that is set in a Group Policy scope.

    For more information, please type "Get-ExecutionPolicy -List", "help Set-ExecutionPolicy -Full", "help about_Execution_Policies" or visit Set-ExecutionPolicy or about_Execution_Policies.

  7. New-Item -ItemType File -Path C:\Temp\Remove-DuplicateFiles.ps1
    Creates an empty ps1-file in the C:\Temp directory. The New-Item cmdlet has an inherent -NoClobber mode built into it, so that the procedure will halt if overwriting (replacing the contents) of an existing file is about to happen. Overwriting a file with the New-Item cmdlet requires using the -Force parameter. If the path name and/or the filename includes space characters, please enclose the whole -Path parameter value in quotation marks (single or double):

      New-Item -ItemType File -Path "C:\Folder Name\Remove-DuplicateFiles.ps1"

    For more information, please type "help New-Item -Full".
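
The no-overwrite behaviour of New-Item from the last example can be tried out safely in the temp folder. The file name below is a placeholder chosen for this sketch; the file is removed at the end.

```powershell
# Placeholder file in the temp folder (hypothetical name, used only for this demonstration).
$target = Join-Path ([System.IO.Path]::GetTempPath()) 'Remove-DuplicateFiles-demo.ps1'

# Create the file (or reset it) and put some content in it.
New-Item -ItemType File -Path $target -Force | Out-Null
Set-Content -LiteralPath $target -Value '# some content'

# Without -Force, New-Item refuses to replace the existing file and raises an error.
$created = $true
try {
    New-Item -ItemType File -Path $target -ErrorAction Stop | Out-Null
}
catch {
    $created = $false
}
"Second New-Item succeeded: $created"

# With -Force, the existing file is replaced by an empty one.
New-Item -ItemType File -Path $target -Force | Out-Null
$lengthAfterForce = (Get-Item -LiteralPath $target).Length
"Length after -Force: $lengthAfterForce"

Remove-Item -LiteralPath $target
```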

Contributing

Find a bug? Have a feature request? Here is how you can contribute to this project:

Bugs: Submit bugs and help us verify fixes.
Feature Requests: Feature requests can be submitted by creating an Issue.
Edit Source Files: Submit pull requests for bug fixes and features and discuss existing proposals.

www

Script Homepage
RemoveEmptyFolders.ps1
Remove all empty folders.ps1
Append Text to a File Using Add-Content in PowerShell
About Functions Advanced Parameters
SHA256CryptoServiceProvider Class
MD5CryptoServiceProvider Class
Get-FileHash
MACTripleDES Class
RIPEMD160 Class
System.Security.Cryptography Namespace
Path Methods
Test-Path
How do I get PowerShell 4 cmdlets such as Test-NetConnection to work on Windows 7?
Calculate MD5 and SHA1 File Hashes Using PowerShell
remove-duplicate-files.ps1
Get-FileHash.ps1
ASCII Art: http://www.figlet.org/ and ASCII Art Text Generator

Related scripts

Disable-Defrag
Firefox Customization Files
Get-AsciiTable
Get-BatteryInfo
Get-ComputerInfo
Get-CultureTables
Get-DirectorySize
Get-HashValue
Get-InstalledPrograms
Get-InstalledWindowsUpdates
Get-PowerShellAliasesTable
Get-PowerShellSpecialFolders
Get-RAMInfo
Get-TimeDifference
Get-TimeZoneTable
Get-UnusedDriveLetters
Emoji Table
Java-Update
Remove-EmptyFolders
Remove-EmptyFoldersLite
Rename-Files
Rock-Paper-Scissors
Toss-a-Coin
Update-AdobeFlashPlayer
Update-MozillaFirefox
