# Navigating inside the os



Nice explanations:
- https://www.pythonlearn.com/html-008/cfbook017.html

#### Relevant functions in `os`

- **`os.getcwd()`** returns a string with the path of the current working directory.

- **`os.path.abspath(file)`** returns a string with the absolute path of the file.

- **`os.listdir(path)`** returns a list with the names of all files and folders in `path`. If `path` is omitted, the current directory is used.


- **`os.mkdir(dirname)`** creates a directory with name `dirname`.


- **`os.makedirs(dirname)`** creates a directory with name `dirname`. If `dirname` contains several levels of directories, the missing intermediate ones are also created (this does not happen with `os.mkdir`).


- **`os.rmdir(dirname)`** deletes the directory `dirname`, which must be empty.


- **`os.removedirs(dirname)`** deletes the directory and then every parent directory in `dirname` that is left empty, stopping at the first parent that is not empty.


- **`os.rename(filename1, filename2)`** renames `filename1` to `filename2`.


- **`os.stat(filename)`** returns information about `filename` such as the file size.


- **`os.walk(path)`** returns a `generator` used to navigate throughout the filesystem tree below `path`. The generator yields a tuple of 3 values for each directory it visits: the dirpath, the dirnames (inside dirpath) and the filenames (inside dirpath).


- **`os.environ`** is a dictionary-like object (of type `os._Environ`) containing the environment variables. For example, on Unix-like systems `os.environ.get('HOME')` returns the home directory.
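`os.rename` and `os.rmdir` appear in the list above but are not exercised in the cells that follow. A minimal sketch of both, run inside a temporary directory so that no real files are touched; the folder names `draft` and `final` are made up for illustration:

```python
import os
import tempfile

# Work inside a throwaway directory that is deleted automatically at the end.
with tempfile.TemporaryDirectory() as tmp:
    old_dir = os.path.join(tmp, 'draft')   # hypothetical name
    new_dir = os.path.join(tmp, 'final')   # hypothetical name

    os.mkdir(old_dir)
    os.rename(old_dir, new_dir)        # 'draft' is now called 'final'
    after_rename = os.listdir(tmp)

    os.rmdir(new_dir)                  # works because 'final' is empty
    after_rmdir = os.listdir(tmp)

print("After rename:", after_rename)
print("After rmdir:", after_rmdir)
```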

In [6]:
import os

In [7]:
os.getcwd()

'/Users/davidbuchacagmail.com/Documents/git_stuff/python_tutorials/os_communication'

In [8]:
os.listdir()

['.DS_Store',
 'os_navigation.ipynb',
 'folder_for_tests',
 'test.txt',
 '.ipynb_checkpoints']

In [9]:
os.path.abspath('test.txt')

'/Users/davidbuchacagmail.com/Documents/git_stuff/python_tutorials/os_communication/test.txt'

In [10]:
os.path.abspath('')

'/Users/davidbuchacagmail.com/Documents/git_stuff/python_tutorials/os_communication'

In [11]:
# List everything that is inside the path
path = "./"
print(os.listdir(path))

['.DS_Store', 'os_navigation.ipynb', 'folder_for_tests', 'test.txt', '.ipynb_checkpoints']


In [12]:
os.mkdir('created_by_me')

In [13]:
os.listdir()

['created_by_me',
 '.DS_Store',
 'os_navigation.ipynb',
 'folder_for_tests',
 'test.txt',
 '.ipynb_checkpoints']

In [14]:
# This will not work since B must be created inside A
# and A does not exist
os.mkdir('created_by_me/A/B')

FileNotFoundError: [Errno 2] No such file or directory: 'created_by_me/A/B'

In [15]:
os.makedirs('created_by_me/A/B')

In [16]:
os.listdir()

['created_by_me',
 '.DS_Store',
 'os_navigation.ipynb',
 'folder_for_tests',
 'test.txt',
 '.ipynb_checkpoints']

In [17]:
os.removedirs('created_by_me/A/B')

In [18]:
os.listdir()

['.DS_Store',
 'os_navigation.ipynb',
 'folder_for_tests',
 'test.txt',
 '.ipynb_checkpoints']
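Two details of the calls above may be worth seeing in isolation: `os.makedirs` raises `FileExistsError` if the target already exists unless `exist_ok=True` is passed, and `os.removedirs` stops climbing as soon as it meets a parent that is not empty. The sketch below assumes a temporary directory and a made-up file `keep.txt` whose only role is to keep `created_by_me` non-empty:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    top = os.path.join(tmp, 'created_by_me')
    nested = os.path.join(top, 'A', 'B')

    os.makedirs(nested)
    os.makedirs(nested, exist_ok=True)   # no error: the directory already exists

    try:
        os.makedirs(nested)              # default is exist_ok=False
        raised = False
    except FileExistsError:
        raised = True

    # Hypothetical file that keeps 'created_by_me' non-empty.
    with open(os.path.join(top, 'keep.txt'), 'w') as fh:
        fh.write('x')

    # Removes B, then A (now empty), and stops at 'created_by_me'.
    os.removedirs(nested)
    remaining = os.listdir(top)

print("FileExistsError raised:", raised)
print("Left in created_by_me:", remaining)
```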

In [19]:
filename = './folder_for_tests/A_txt_files/f1.txt'
os.stat(filename)

os.stat_result(st_mode=33188, st_ino=6514416, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=16, st_atime=1514652219, st_mtime=1514652218, st_ctime=1514652218)

In [20]:
print("Size of the file in bytes:", os.stat(filename).st_size)

Size of the file in bytes: 16


In [21]:
from datetime import datetime
modification_time = os.stat(filename).st_mtime
print("The file was modified in:", 
      datetime.fromtimestamp(modification_time))

The file was modified in: 2017-12-30 17:43:38.912361
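When only the size or the modification time is needed, `os.path.getsize` and `os.path.getmtime` return the same values as the corresponding `os.stat` fields. A small sketch on a temporary file (the name `example.txt` is an assumption made for the example):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    example = os.path.join(tmp, 'example.txt')   # hypothetical file
    with open(example, 'wb') as fh:
        fh.write(b'0123456789abcdef')            # exactly 16 bytes

    info = os.stat(example)
    size_from_stat = info.st_size
    size_shortcut = os.path.getsize(example)
    mtime_matches = os.path.getmtime(example) == info.st_mtime

print("Size via os.stat:", size_from_stat)
print("Size via os.path.getsize:", size_shortcut)
print("Same modification time:", mtime_matches)
```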


In [22]:
os.environ.get('HOME')

'/Users/davidbuchacagmail.com'
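`os.environ.get` returns `None` (or a default of our choice) when a variable is not defined, whereas `os.environ['NAME']` would raise a `KeyError`. Since `HOME` is not guaranteed to be set on every platform, `os.path.expanduser('~')` is a commonly used alternative for finding the home directory. A brief sketch; the variable name below is deliberately invented and assumed not to exist:

```python
import os

# Invented variable name, assumed to be undefined on this machine.
name = 'SURELY_NOT_DEFINED_VARIABLE_12345'

missing = os.environ.get(name)                     # None when undefined
fallback = os.environ.get(name, 'default-value')   # explicit default instead

# Home directory of the current user, without reading HOME directly.
home = os.path.expanduser('~')

print(missing, fallback, home)
```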

#### About os.path

In [23]:
filepath=os.path.join(os.environ.get('HOME'), 'some_file_.txt')
filepath

'/Users/davidbuchacagmail.com/some_file_.txt'

In [24]:
os.path.basename('inventedpath/another_invented/f.txt')

'f.txt'

In [25]:
os.path.dirname('inventedpath/another_invented/f.txt')

'inventedpath/another_invented'

In [26]:
os.path.split('inventedpath/another_invented/f.txt')

('inventedpath/another_invented', 'f.txt')

In [107]:
os.path.exists('inventedpath/another_invented/f.txt')

False

In [28]:
os.path.isdir('inventedpath/another_invented/f.txt')

False

In [111]:
os.path.isfile('inventedpath/another_invented/f.txt')

False

In [113]:
os.path.exists('./folder_for_tests/test.txt')

True

In [109]:
# Split the path into root and extension
os.path.splitext('inventedpath/another_invented/f.txt')

('inventedpath/another_invented/f', '.txt')
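The `os.path` functions above combine naturally. As a rough sketch, the two helpers below (`change_extension` and `describe_path`, both names invented for this example) use `os.path.split` and `os.path.splitext` to take a path apart and put it back together. They only manipulate strings, so the path does not need to exist:

```python
import os

def change_extension(filepath, new_extension):
    """Return filepath with its extension replaced by new_extension (e.g. '.md')."""
    root, _old_extension = os.path.splitext(filepath)
    return root + new_extension

def describe_path(filepath):
    """Break filepath into directory, file name, stem and extension."""
    directory, name = os.path.split(filepath)
    stem, extension = os.path.splitext(name)
    return {'directory': directory, 'name': name,
            'stem': stem, 'extension': extension}

example = 'inventedpath/another_invented/f.txt'
print(change_extension(example, '.md'))
print(describe_path(example))
```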

## Copying files: the `shutil` library

We can use the `shutil` module from the Python standard library to move and copy files.

The following table summarizes the different functions for copying files and the main differences between them.


```
---------------------------------------------------------------------------
| Function          |Copies Metadata|Copies Permissions|Can Specify Buffer|
---------------------------------------------------------------------------
| shutil.copy       |      No       |        Yes       |        No        |
---------------------------------------------------------------------------
| shutil.copyfile   |      No       |         No       |        No        |
---------------------------------------------------------------------------
| shutil.copy2      |     Yes       |        Yes       |        No        |
---------------------------------------------------------------------------
| shutil.copyfileobj|      No       |         No       |       Yes        |
---------------------------------------------------------------------------
```


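- **`shutil.copy2(A, B)`**: copies the file `A` to `B`, trying to preserve its metadata (for example the modification time).
- **`shutil.disk_usage(path)`**: returns the total, used and free space (in bytes) of the disk that contains `path`.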


In [33]:
import shutil

In [34]:
os.listdir()

['.DS_Store',
 'os_navigation.ipynb',
 'folder_for_tests',
 'test.txt',
 '.ipynb_checkpoints']

In [35]:
os.listdir('folder_for_tests/')

['.DS_Store', 'A_txt_files', 'test.txt', 'B_txt_files']

In [29]:
# copies test.txt from the main folder to /folder_for_tests/
shutil.copy2('test.txt', './folder_for_tests/test.txt')

'./folder_for_tests/test.txt'

In [37]:
os.listdir('folder_for_tests/')

['.DS_Store', 'A_txt_files', 'test.txt', 'B_txt_files']

In [46]:
## Check space in a drive (total, used, free space)
shutil.disk_usage("/")

usage(total=499963170816, used=304449593344, free=192672837632)
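The "Copies Metadata" column of the table can be checked directly. The sketch below, which assumes a temporary directory and made-up file names, gives a source file an old modification time with `os.utime` and then compares what `shutil.copyfile` and `shutil.copy2` do with it:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'source.txt')   # hypothetical file
    with open(source, 'w') as fh:
        fh.write('some content')

    # Pretend the source was last accessed and modified back in 2001.
    old_time = 1_000_000_000
    os.utime(source, (old_time, old_time))

    # Both functions return the destination path.
    plain_copy = shutil.copyfile(source, os.path.join(tmp, 'copyfile.txt'))
    full_copy = shutil.copy2(source, os.path.join(tmp, 'copy2.txt'))

    mtime_plain = os.stat(plain_copy).st_mtime   # time of the copy
    mtime_full = os.stat(full_copy).st_mtime     # carried over from the source
    with open(full_copy) as fh:
        copied_text = fh.read()

print("copyfile kept the old mtime:", abs(mtime_plain - old_time) < 2)
print("copy2 kept the old mtime:", abs(mtime_full - old_time) < 2)
```

Both copies contain the same data; only the metadata differs.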


## Navigating the filesystem

We can walk over all the folders and subfolders of a given `path` using **`os.walk(path)`**. The `os.walk` function returns a `generator`.

Let us use this function to print all subfolders of `folder_for_tests` which is a folder containing several subfolders that we will use to test the different functions in `os`.

In [31]:
type(os.walk(path))

generator

In [32]:
path = './folder_for_tests/'
for dirpath, dirnames, filenames in os.walk(path):
    print('Current path:', dirpath)
    print('Directories:', dirnames)
    print('Files:', filenames)
    print()

Current path: ./folder_for_tests/
Directories: ['A_txt_files', 'B_txt_files']
Files: ['.DS_Store', 'test.txt']

Current path: ./folder_for_tests/A_txt_files
Directories: []
Files: ['f1.txt', 'f2.txt']

Current path: ./folder_for_tests/B_txt_files
Directories: ['A_2_txt_files']
Files: ['f5.txt', 'f4.txt', '.DS_Store', 'f3.txt']

Current path: ./folder_for_tests/B_txt_files/A_2_txt_files
Directories: []
Files: ['f6.txt', 'f7.txt']
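The listing above includes hidden entries such as `.DS_Store`. With the default top-down traversal, `os.walk` lets the caller prune the walk by modifying `dirnames` in place: any folder removed from that list is not visited. The following is a sketch under that assumption, using a small temporary tree and an invented helper name `walk_skipping_hidden`:

```python
import os
import tempfile

def walk_skipping_hidden(root):
    """Yield (dirpath, filenames) below root, ignoring folders starting with '.'."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment changes the very list os.walk uses to decide
        # which subfolders to descend into next.
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        yield dirpath, filenames

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: one visible folder and one hidden folder.
    for folder in ('visible', '.hidden'):
        os.makedirs(os.path.join(tmp, folder))
        with open(os.path.join(tmp, folder, 'f.txt'), 'w') as fh:
            fh.write('x')

    visited = sorted(os.path.relpath(dirpath, tmp)
                     for dirpath, _ in walk_skipping_hidden(tmp))

print(visited)
```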



#### Gathering filenames and folders

Print the names of all folders inside `path` (or inside folders that are inside path).

In [33]:
path = './folder_for_tests/'
for dirpath, dirnames, filenames in os.walk(path):
    for directory in dirnames:
        print('folder name:', directory)

folder name: A_txt_files
folder name: B_txt_files
folder name: A_2_txt_files


Print the path of `path` itself and of every folder below it. Each printed path starts with `path`.

In [34]:
for dirpath, dirnames, filenames in os.walk(path):
    print("folder path:", dirpath)

folder path: ./folder_for_tests/
folder path: ./folder_for_tests/A_txt_files
folder path: ./folder_for_tests/B_txt_files
folder path: ./folder_for_tests/B_txt_files/A_2_txt_files


Print all `.txt` files inside `path` and inside all of its subfolders.

In [36]:
path = './folder_for_tests/'
for dirpath, dirnames, filenames in os.walk(path):
    for f in filenames:
        if f.endswith('.txt'):
            print('File:', f)

File: test.txt
File: f1.txt
File: f2.txt
File: f5.txt
File: f4.txt
File: f3.txt
File: f6.txt
File: f7.txt


## Application for video search 

Now we will apply the previously seen functions to find all videos in a given folder and subfolders.

#### Find all `.mkv`, `.mp4` or `.avi` files in the home directory

In [53]:
def find_movie_files():
    path = os.environ.get('HOME')
    files = []
    files_full_path = []
    
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            # str.endswith accepts a tuple of suffixes
            if f.endswith(('.mkv', '.avi', '.mp4')):
                fpath = os.path.join(dirpath, f)
                files.append(f)
                files_full_path.append(fpath)
                
    return files, files_full_path

In [54]:
movie_files, full_path = find_movie_files()

In [61]:
len(movie_files)

24

In [91]:
def retrieve_movie_data():
    path = os.environ.get('HOME')
    files = []
    sizes = []
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fpath = os.path.join(dirpath, f)
            if fpath.endswith(('.mkv', '.avi', '.mp4')):
                files.append(fpath)
                # st_size is in bytes; convert to gigabytes (10**9 bytes)
                sizes.append(os.stat(fpath).st_size * 10**(-9))
                
    return files,sizes

In [92]:
filenames, sizes_GB = retrieve_movie_data()

In [93]:
movie_files = [os.path.basename(f) for f in filenames]

In [100]:
# Movie file names sorted by size, largest first
import numpy as np
[movie_files[x] for x in np.argsort(sizes_GB)[::-1]]

In [101]:
np.sort(sizes_GB)[::-1]
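The same ranking can be obtained without `numpy` by pairing each name with its size and sorting the pairs. A minimal sketch; the names and sizes below are made-up sample data, not results from the search above:

```python
# Made-up sample data standing in for movie_files and sizes_GB.
example_names = ['a.mkv', 'b.mp4', 'c.avi']
example_sizes = [1.5, 4.2, 0.7]

# Pair each name with its size, then sort by the size, largest first.
ranked = sorted(zip(example_names, example_sizes),
                key=lambda pair: pair[1],
                reverse=True)

for name, size in ranked:
    print(f"{name}: {size:.1f} GB")
```

Keeping name and size together in one list avoids having to keep two separate sorted results aligned.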

### Making the function more general

In [102]:

def find_files_with_provided_extension(source_path, file_extension):
    """Return names, full paths and sizes (in GB) of the files below
    source_path whose name ends with file_extension."""
    files = []
    files_full_path = []
    file_sizes_GB = []
    
    for dirpath, dirnames, filenames in os.walk(source_path):
        for f in filenames:
            if f.endswith(file_extension):
                fpath = os.path.join(dirpath, f)
                files.append(f)
                files_full_path.append(fpath)
                file_sizes_GB.append(os.stat(fpath).st_size*10**(-9)) 

    return files, files_full_path, file_sizes_GB


In [103]:
path = os.environ.get('HOME')

# Include the dot so that only names ending in '.mp3' match
files, pathfiles, file_sizes_GB = find_files_with_provided_extension(path, '.mp3')

In [117]:
os.path.split(pathfiles[0])[1]

'50 Countdown.mp3'
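One possible further generalization is to accept several extensions at once and to ignore case, so that `.MKV` and `.mkv` are treated alike. The function below is a sketch of that idea, not part of the original notebook; its name `find_files_with_extensions` and the temporary test tree are assumptions made for the example:

```python
import os
import tempfile

def find_files_with_extensions(source_path, extensions):
    """Return full paths of files below source_path ending in any of extensions."""
    # str.endswith needs a tuple; lower-casing makes the match case-insensitive.
    extensions = tuple(e.lower() for e in extensions)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(source_path):
        for f in filenames:
            if f.lower().endswith(extensions):
                matches.append(os.path.join(dirpath, f))
    return matches

with tempfile.TemporaryDirectory() as tmp:
    # Small made-up tree: two "videos" and one text file.
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ('a.mp4', 'b.txt', os.path.join('sub', 'c.MKV')):
        with open(os.path.join(tmp, rel), 'w') as fh:
            fh.write('x')

    found = sorted(os.path.basename(p)
                   for p in find_files_with_extensions(tmp, ('.mp4', '.mkv')))

print(found)
```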