# The shutil Module (Copying Files and Folders)

The `shutil` module provides functions for copying files, as well as entire folders. Calling `shutil.copy(source, destination)` copies the file at the path `source` to the folder at the path `destination`. (Both `source` and `destination` are strings.) If `destination` is a filename, it is used as the new name of the copied file. The function returns a string of the path of the copied file.

In [1]:
import os
import shutil

Change the current working directory to the project folder that holds the example files:

In [2]:
os.chdir('/Users/carlos/intro-data-engineering/02-python-fundamentals/task-automation/organize-files/') 

In [3]:
!pwd

/Users/carlos/intro-data-engineering/02-python-fundamentals/task-automation/organize-files


In [4]:
# Create directory if it does not exist
DATASETS_DIR = "test/"
if not os.path.exists(DATASETS_DIR):
    os.makedirs(DATASETS_DIR)
    print(f"Directory '{DATASETS_DIR}' created successfully.")
else:
    print(f"Directory '{DATASETS_DIR}' already exists.")

Directory 'test/' created successfully.


In [5]:
shutil.copy('./dataset/wild_cat/image_0001.jpg','test/')

'test/image_0001.jpg'
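
The two destination forms described above can be tried safely in a throwaway folder. This is a minimal sketch, assuming a stand-in file created on the spot; the names `image_0001.jpg` and `cat.jpg` and the temporary folder are illustrative only, not part of the dataset.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a real image file.
    source = os.path.join(tmp, "image_0001.jpg")
    with open(source, "wb") as f:
        f.write(b"fake image data")

    target_dir = os.path.join(tmp, "test")
    os.makedirs(target_dir)

    # Destination is a folder: the copy keeps the original filename.
    kept_name = shutil.copy(source, target_dir)

    # Destination is a filename: the copy is stored under the new name.
    new_name = shutil.copy(source, os.path.join(target_dir, "cat.jpg"))

    copied = sorted(os.listdir(target_dir))

print(os.path.basename(kept_name))
print(os.path.basename(new_name))
print(copied)
```

The source file is left in place in both cases; `shutil.copy()` never removes it.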

While `shutil.copy()` copies a single file, `shutil.copytree()` copies an entire folder and every folder and file contained in it. Calling `shutil.copytree(source, destination)` copies the folder at the path `source`, along with all of its files and subfolders, to the folder at the path `destination`. The `source` and `destination` parameters are both strings. By default the destination folder must not already exist; on Python 3.8 and later, passing `dirs_exist_ok=True` allows copying into an existing folder. The function returns a string of the path of the copied folder.

In [7]:
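# test/ already exists from the earlier cell, so allow copying into it (Python 3.8+)
shutil.copytree('./dataset/wild_cat', './test/', dirs_exist_ok=True)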

'./test/'
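
As a rough illustration of how `shutil.copytree()` treats an existing destination, the sketch below builds a small folder tree in a temporary directory and copies it twice. The folder and file names are made up for the example, and `dirs_exist_ok` assumes Python 3.8 or later.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny source tree: wild_cat/ with one file and one nested folder.
    src = os.path.join(tmp, "dataset", "wild_cat")
    os.makedirs(os.path.join(src, "nested"))
    for rel in ("image_0001.jpg", os.path.join("nested", "image_0002.jpg")):
        with open(os.path.join(src, rel), "wb") as f:
            f.write(b"x")

    dst = os.path.join(tmp, "backup")

    # First copy: the destination does not exist yet, so this succeeds.
    returned = shutil.copytree(src, dst)

    # Second copy with default settings: the destination now exists.
    try:
        shutil.copytree(src, dst)
        second_copy_failed = False
    except FileExistsError:
        second_copy_failed = True

    # Explicitly allow copying into the existing folder.
    shutil.copytree(src, dst, dirs_exist_ok=True)

    nested_copied = os.path.isfile(os.path.join(dst, "nested", "image_0002.jpg"))

print(second_copy_failed, nested_copied)
```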

# Moving and Renaming Files and Folders
Calling `shutil.move(source, destination)` moves the file or folder at the path `source` to the path `destination` and returns a string of the path of the new location.

If `destination` points to a folder, the `source` file gets moved into destination and keeps its current filename.

In [9]:
# Create the folder test_2 first (no error if it already exists)
os.makedirs('./test_2', exist_ok=True)
shutil.move('./test/image_0001.jpg','./test_2/')

'./test_2/image_0001.jpg'

The `destination` path can also specify a filename. In the following example, the `source` file is moved and renamed.

In [10]:
shutil.move('./test/image_0002.jpg','./test_2/image_2.jpg')

'./test_2/image_2.jpg'

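Finally, the folders that make up the destination must already exist, or else Python raises a `FileNotFoundError`. Note that the message recorded below names the source path, most likely because `image_0002.jpg` had already been moved out of `test/` by the previous cell; a missing source raises the same exception.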

In [11]:
shutil.move('./test/image_0002.jpg','./does_not_exist/image_2.jpg')

FileNotFoundError: [Errno 2] No such file or directory: './test/image_0002.jpg'
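
The three cases above (move into a folder, move and rename, move into a folder that does not exist) can be reproduced without touching the dataset. This is a sketch under the assumption of a temporary folder with three placeholder files; all names are illustrative.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "test")
    dst_dir = os.path.join(tmp, "test_2")
    os.makedirs(src_dir)
    os.makedirs(dst_dir)
    for name in ("image_0001.jpg", "image_0002.jpg", "image_0003.jpg"):
        with open(os.path.join(src_dir, name), "wb") as f:
            f.write(b"x")

    # Destination is an existing folder: the filename is kept.
    moved = shutil.move(os.path.join(src_dir, "image_0001.jpg"), dst_dir)

    # Destination includes a filename: the file is moved and renamed.
    renamed = shutil.move(
        os.path.join(src_dir, "image_0002.jpg"),
        os.path.join(dst_dir, "image_2.jpg"),
    )

    # Destination folder does not exist: the move fails and the source stays put.
    try:
        shutil.move(
            os.path.join(src_dir, "image_0003.jpg"),
            os.path.join(tmp, "does_not_exist", "image_3.jpg"),
        )
        move_failed = False
    except FileNotFoundError:
        move_failed = True

    still_in_source = os.listdir(src_dir)

print(os.path.basename(moved), os.path.basename(renamed), move_failed)
```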

# Permanently Deleting Files and Folders
You can delete a single file or a single empty folder with functions in the os module, whereas to delete a folder and all of its contents, you use the shutil module. 

* Calling `os.unlink(path)` will delete the file at path. 
* Calling `os.rmdir(path)` will delete the folder at path. This folder must be empty of any files or folders.
* Calling `shutil.rmtree(path)` will remove the folder at path, and all files and folders it contains will also be deleted.

Be careful when using these functions in your programs! It’s often a good idea to first run your program with these calls commented out and with `print()` calls added to show the files that would be deleted.
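
One way to follow that advice is a dry-run switch, so the same loop can first report and later delete. The helper name `delete_jpgs` and its `dry_run` parameter are hypothetical, introduced only for this sketch, and the files live in a temporary folder.

```python
import os
import tempfile


def delete_jpgs(folder, dry_run=True):
    """Delete the .jpg files in folder; with dry_run=True only report them."""
    targets = sorted(name for name in os.listdir(folder) if name.endswith(".jpg"))
    for name in targets:
        path = os.path.join(folder, name)
        if dry_run:
            print(f"Would delete {path}")
        else:
            os.unlink(path)
            print(f"Deleted {path}")
    return targets


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.jpg", "notes.txt"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("placeholder")

    planned = delete_jpgs(tmp, dry_run=True)   # nothing is removed yet
    after_dry_run = sorted(os.listdir(tmp))

    delete_jpgs(tmp, dry_run=False)            # now the files are deleted
    after_delete = sorted(os.listdir(tmp))
```

Defaulting `dry_run` to `True` means a forgotten argument results in a report rather than a deletion.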

In [12]:
%cd test_2/

/Users/carlos/intro-data-engineering/02-python-fundamentals/task-automation/organize-files/test_2


In [13]:
import os

for filename in os.listdir():
    # print(filename)
    if filename.endswith('.jpg'):
        os.unlink(filename)
        print(f'{filename} file deleted')

image_2.jpg file deleted
image_0001.jpg file deleted


# Safe Deletes with the send2trash Module
Using `send2trash` is much safer than Python’s regular delete functions, because it will send folders and files to your computer’s trash or recycle bin instead of permanently deleting them. If a bug in your program deletes something with `send2trash` you didn’t intend to delete, you can later restore it from the recycle bin.

In [14]:
!pip install send2trash


In [15]:
import send2trash
with open('bacon.txt','a') as baconFile:
    baconFile.write('Bacon is not a vegetable.')

In [16]:
send2trash.send2trash('bacon.txt')

Note that the `send2trash()` function can only send files to the recycle bin; it cannot pull files out of it.

> **HINT**  
Open the Trash (macOS, Linux) or the Recycle Bin (Windows) to confirm that `bacon.txt` is there.

# Walking a Directory Tree
The `os.walk()` function is passed a single string value: the path of a folder. You can use `os.walk()` in a for loop statement to walk a directory tree, much like how you can use the `range()` function to walk over a range of numbers. Unlike `range()`, the `os.walk()` function yields three values on each iteration through the loop:
1. A string of the current folder’s path
2. A list of strings of the folders in the current folder 
3. A list of strings of the files in the current folder

In [17]:
%cd ..

/Users/carlos/intro-data-engineering/02-python-fundamentals/task-automation/organize-files


In [18]:
import os
for foldername, subfolders, filenames in os.walk('./dataset'):
  print(f'The current folder is{foldername}')
  for subfolder in subfolders:
    print('\t Subfolder of' + foldername + ': ' + subfolder)
    for filename in filenames:
      print('\t\t File inside ' + foldername + ': '+ filename)
    print('')

The current folder is./dataset
	 Subfolder of./dataset: wild_cat
		 File inside ./dataset: .DS_Store

	 Subfolder of./dataset: llama
		 File inside ./dataset: .DS_Store

The current folder is./dataset/wild_cat
The current folder is./dataset/llama
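
In the cell above, the loop over `filenames` is nested inside the loop over `subfolders`, which is why `.DS_Store` is listed once per subfolder and the image folders show no files at all. A minimal sketch with the two loops side by side, run on a small temporary tree (the `describe_tree` helper and the file names are made up for illustration):

```python
import os
import tempfile


def describe_tree(root):
    """Return one line per folder, subfolder, and file found under root."""
    lines = []
    for foldername, subfolders, filenames in os.walk(root):
        subfolders.sort()  # sorting in place also fixes the traversal order
        rel = os.path.relpath(foldername, root)
        lines.append(f"The current folder is {rel}")
        for subfolder in subfolders:
            lines.append(f"\tSubfolder of {rel}: {subfolder}")
        # Not nested in the loop above: files are listed once per folder.
        for filename in sorted(filenames):
            lines.append(f"\tFile inside {rel}: {filename}")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    for folder in ("llama", "wild_cat"):
        os.makedirs(os.path.join(tmp, folder))
        with open(os.path.join(tmp, folder, "image_0001.jpg"), "wb") as f:
            f.write(b"x")
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("placeholder")

    tree_lines = describe_tree(tmp)

print("\n".join(tree_lines))
```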


# Compressing Files with the zipfile Module
Compressing a file reduces its size, which is useful when transferring it over the Internet. And since a ZIP file can also contain multiple files and subfolders, it’s a handy way to package several files into one. This single file, called an archive file, can then be, say, attached to an email.

Your Python programs can both create and open (or extract) ZIP files using functions in the zipfile module.


In [20]:
!wget 'https://github.com/carloslme/intro-data-engineering/raw/main/02-python-fundamentals/task-automation/organize-files/datasets.zip' -P './'

--2023-08-02 20:42:33--  https://github.com/carloslme/intro-data-engineering/raw/main/02-python-fundamentals/task-automation/organize-files/datasets.zip
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/carloslme/intro-data-engineering/main/02-python-fundamentals/task-automation/organize-files/datasets.zip [following]
--2023-08-02 20:42:34--  https://raw.githubusercontent.com/carloslme/intro-data-engineering/main/02-python-fundamentals/task-automation/organize-files/datasets.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2644024 (2.5M) [application/zip]
Saving to: ‘./datasets.zip’


2023-08-02 20:42:34 (10.3 MB/s) - ‘./datasets.zip’ saved [2644024/2644024]



# Reading ZIP Files
To read the contents of a ZIP file, first you must create a ZipFile object (note the capital letters Z and F).

To create a ZipFile object, call `zipfile.ZipFile()`, passing it a string of the .zip file’s filename. Note that `zipfile` is the name of the Python module, and `ZipFile` is the name of the class it provides.


In [22]:
import zipfile, os
os.chdir('./')  # move to the directory with the .zip (here, the current directory)
exampleZip = zipfile.ZipFile('datasets.zip')

In [23]:
exampleZip.namelist()

['README.md',
 '__MACOSX/._README.md',
 'anscombe.json',
 '__MACOSX/._anscombe.json',
 'california_housing_test.csv',
 '__MACOSX/._california_housing_test.csv',
 'california_housing_train.csv',
 '__MACOSX/._california_housing_train.csv',
 'mnist_test.csv',
 '__MACOSX/._mnist_test.csv']

In [24]:
docInfo = exampleZip.getinfo('california_housing_test.csv')

In [25]:
docInfo.file_size

301141

In [26]:
docInfo.compress_size

71399

In [27]:
'Compressed file is %sx smaller!' % (round(docInfo.file_size / docInfo.compress_size, 2))

'Compressed file is 4.22x smaller!'

In [28]:
exampleZip.close()
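
The same reading steps can also be written with a `with` statement, which closes the ZIP file automatically even if an error occurs. The sketch below is self-contained: it first creates a small archive in a temporary folder, and the names `example.zip` and `housing.csv` are placeholders rather than files from `datasets.zip`.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "example.zip")

    # Create a small archive holding one highly repetitive text file.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("housing.csv", "longitude,latitude\n" * 1000)

    # Read it back; the file is closed when the with block ends.
    with zipfile.ZipFile(zip_path) as example_zip:
        names = example_zip.namelist()
        info = example_zip.getinfo("housing.csv")
        ratio = round(info.file_size / info.compress_size, 2)

print(names)
print(f"Compressed file is {ratio}x smaller!")
```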

# Extracting from ZIP Files 
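The `extractall()` method for ZipFile objects extracts all the files and folders from a ZIP file into the current working directory. Optionally, pass a folder name to extract into that folder instead, as the example below does with `datasets_`.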


In [29]:
!pwd

/Users/carlos/intro-data-engineering/02-python-fundamentals/task-automation/organize-files


In [31]:
import zipfile, os
exampleZip = zipfile.ZipFile('datasets.zip')
exampleZip.extractall("datasets_")
exampleZip.close()

The `extract()` method for ZipFile objects will extract a single file from the ZIP file.

In [32]:
exampleZip = zipfile.ZipFile('datasets.zip')

In [33]:
exampleZip.namelist()

['README.md',
 '__MACOSX/._README.md',
 'anscombe.json',
 '__MACOSX/._anscombe.json',
 'california_housing_test.csv',
 '__MACOSX/._california_housing_test.csv',
 'california_housing_train.csv',
 '__MACOSX/._california_housing_train.csv',
 'mnist_test.csv',
 '__MACOSX/._mnist_test.csv']

In [35]:
final_path = exampleZip.extract('california_housing_test.csv', './dataset/')
final_path

'dataset/california_housing_test.csv'

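The second argument to `extract()` is the folder to extract the file into. If this folder doesn’t yet exist, Python will create it. The value that `extract()` returns is the path to which the file was extracted (a relative path here, because the target folder was given as a relative path).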
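
To see `extractall()` and `extract()` side by side, here is a small sketch that builds its own archive in a temporary folder; the archive contents and the output folder names (`everything`, `just_one/nested`) are invented for the example.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "example.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("README.md", "# demo\n")
        zf.writestr("data/values.csv", "a,b\n1,2\n")

    out_all = os.path.join(tmp, "everything")
    out_one = os.path.join(tmp, "just_one", "nested")  # does not exist yet

    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_all)                        # every member, folders included
        single_path = zf.extract("README.md", out_one)  # one member only

    # Collect the extracted files as paths relative to out_all.
    extracted_all = sorted(
        os.path.relpath(os.path.join(folder, name), out_all)
        for folder, _, files in os.walk(out_all)
        for name in files
    )
    single_exists = os.path.isfile(single_path)

print(extracted_all)
print(os.path.basename(single_path), single_exists)
```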

# Creating and Adding to ZIP Files
To create your own compressed ZIP files, you must open the ZipFile object in write mode by passing 'w' as the second argument. (This is similar to opening a text file in write mode by passing 'w' to the `open()` function.) 

When you pass a path to the `write()` method of a ZipFile object, Python will compress the file at that path and add it into the ZIP file. The `write()` method’s first argument is a string of the filename to add. The `compress_type` keyword argument tells the computer what algorithm it should use to compress the file; you can always just set this value to `zipfile.ZIP_DEFLATED`. (This specifies the deflate compression algorithm, which works well on most types of data, although already-compressed formats such as JPEG shrink very little.)

In [36]:
import zipfile
newZip = zipfile.ZipFile('new.zip', 'w')

In [37]:
!ls

README.md                    datasets.zip.1
__MACOSX                     datasets_
anscombe.json                mnist_test.csv
california_housing_test.csv  new.zip
california_housing_train.csv organizing_files.ipynb
dataset                      test
dataset.zip                  test_2
datasets.zip


In [39]:
newZip.write('./dataset/llama/image_0001.jpg', compress_type=zipfile.ZIP_DEFLATED)
newZip.close()

Keep in mind that, just as with writing to files, write mode will erase all existing contents of a ZIP file. If you want to simply add files to an existing ZIP file, pass 'a' as the second argument to `zipfile.ZipFile()` to open the ZIP file in append mode.

In [40]:
# Adding to ZIP file
newZip = zipfile.ZipFile('new.zip','a')

In [42]:
newZip.write('./dataset/wild_cat/image_0001.jpg', compress_type=zipfile.ZIP_DEFLATED)
newZip.close()

Check the newly added files by reading the ZIP file:

In [43]:
newZip = zipfile.ZipFile('new.zip')

In [44]:
newZip.infolist()

[<ZipInfo filename='dataset/llama/image_0001.jpg' compress_type=deflate filemode='-rw-r-xr-x' file_size=12395 compress_size=12250>,
 <ZipInfo filename='dataset/wild_cat/image_0001.jpg' compress_type=deflate filemode='-rw-r-xr-x' file_size=14964 compress_size=14350>]
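
The difference between write mode and append mode can be checked with a short experiment. This sketch uses placeholder text files in a temporary folder instead of the images, and passes `arcname` (a real `write()` parameter not used in the cells above) so the archive stores short names rather than full paths.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "llama.txt")
    second = os.path.join(tmp, "wild_cat.txt")
    for path in (first, second):
        with open(path, "w") as f:
            f.write("placeholder\n")

    zip_path = os.path.join(tmp, "new.zip")

    # 'w' creates the archive (or empties an existing one).
    with zipfile.ZipFile(zip_path, "w") as new_zip:
        new_zip.write(first, arcname="llama.txt", compress_type=zipfile.ZIP_DEFLATED)

    # 'a' keeps what is already there and adds to it.
    with zipfile.ZipFile(zip_path, "a") as new_zip:
        new_zip.write(second, arcname="wild_cat.txt", compress_type=zipfile.ZIP_DEFLATED)

    with zipfile.ZipFile(zip_path) as new_zip:
        after_append = new_zip.namelist()

    # Opening with 'w' again discards the earlier contents.
    with zipfile.ZipFile(zip_path, "w") as new_zip:
        new_zip.write(second, arcname="wild_cat.txt", compress_type=zipfile.ZIP_DEFLATED)

    with zipfile.ZipFile(zip_path) as new_zip:
        after_rewrite = new_zip.namelist()

print(after_append)
print(after_rewrite)
```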