# Directories and Paths

Up till now we've seen how to use Python to solve simple math problems, store data in lists, etc. But what if we wanted to make something more practical? Something that would help us create scripts that automate mundane tasks, such as copying, moving, renaming or deleting files and folders?

An important aspect of file organization in operating systems is **paths**. Paths are strings that tell us where a certain file or folder can be found inside the OS's directory structure, which is typically organized as a hierarchical tree.

For example in **Windows** systems, in the root of a boot partition one can find the following folders:
- `\Program Files`: Is where the programs are installed
- `\ProgramData`: Contains program data that are expected to be accessed by computer programs regardless of the user account in the context of which they run.
- `\Users`: Is the user profile folder, containing one subfolder for each user.
- `\Windows`: Windows is installed in this folder.
- `\Windows\System32`: Is where the DLLs that implement the core features of Windows are stored.  
etc.

A path in a Windows-like system looks like:
<pre>C:\Users\Thanos\Projects\Python\</pre>

This means that my user name is *`Thanos`* and that I have a folder called *`Python`* inside another folder called *`Projects`* in my user directory. The *`C:`* part at the beginning means that the path is on drive (partition) *`C`*. Windows separates directories with backslashes (`\`).

**UNIX**-based systems have their own directory structure:

- `/bin`: Contains the binaries for certain fundamental utilities.
- `/boot`: Contains all the files needed for booting.
- `/home`: Contains a subfolder for each user (similar to Windows' *`\Users`* folder).
- `/lib`: Contains essential libraries needed for the programs in *`/bin`*.  
etc.

Unix-type paths use forward slashes (`/`) to separate directories.  
The equivalent path in Unix would be:
<pre>/home/thanos/projects/python/</pre>

The first slash in the beginning indicates the root directory.
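
To see the two conventions side by side, here is a small sketch using `pathlib`'s "pure" path classes. These only parse path strings and never touch the file system, so both work on any OS; the paths are the examples from the text.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate strings; nothing is looked up on disk.
win = PureWindowsPath(r'C:\Users\Thanos\Projects\Python')
unix = PurePosixPath('/home/thanos/projects/python')

print(win.parts)   # the drive and its root come first, then each folder
print(unix.parts)  # the root '/' comes first, then each folder
print(win.drive)   # the drive letter part
```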

Paths are used extensively in computer science to represent the directory/file relationships common in modern operating systems, and are essential in the construction of Uniform Resource Locators (**URL**s). Resources can be represented by either **absolute** or **relative** paths.

- An **absolute** or full path points to the same location in a file system, regardless of the current working directory. To do that, it must include the root directory.
- A **relative** path starts from some given working directory, avoiding the need to provide the full absolute path. A relative path **never** starts with a slash.
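
A relative path only points somewhere once it is combined with a working directory. The sketch below uses `posixpath` (the UNIX implementation behind `os.path`, importable on any OS) and assumes a hypothetical working directory `/home/thanos`.

```python
import posixpath

cwd = '/home/thanos'            # hypothetical current working directory
relative = 'projects/python'    # no leading slash, so it is relative
absolute = posixpath.join(cwd, relative)

print(absolute)                   # /home/thanos/projects/python
print(posixpath.isabs(relative))  # False
print(posixpath.isabs(absolute))  # True
```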

From now on, we'll assume a UNIX based operating system.

# OS

Python's `os` module provides a portable way of using operating-system-dependent functionality. Let's dive in.

In [1]:
from __future__ import print_function
import os

os.getcwd()

'/home/thanos/Documents/Notes/python/tutorial'

This command returns the path of the current working directory (in this case the directory from which the notebook was launched).

What if we wanted to get a list of all items inside our working directory?

In [2]:
original_dir = os.getcwd()
os.listdir('.')

['05_basic_tuple_dict_operations.py',
 '16_marplotlib_seaborn.ipynb',
 'test_file',
 '03_basic_string_operations.ipynb',
 'scr_args.py',
 '01_basic_data_types.py',
 '08_input_output.py',
 'custom_module.pyc',
 '12_exception_handling.ipynb',
 '06_logical_operations.ipynb',
 '03_basic_string_operations.py',
 '18_sklearn.ipynb',
 '06_logical_operations.py',
 '15_numpy_scipy.ipynb',
 '08_input_output.ipynb',
 '10_classes.ipynb',
 '13_time_random_ordereddict.ipynb',
 '09_functions.py',
 'pyth',
 '02_basic_numerical_operations.py',
 'a_file.csv',
 '10_classes.py',
 '04_basic_list_operations.ipynb',
 'test_file.pkl',
 '04_basic_list_operations.py',
 '11_modules_and_packages.ipynb',
 '__pycache__',
 'custom_module.py',
 '01_basic_data_types.ipynb',
 '00_python_basics.ipynb',
 '17_pandas.ipynb',
 '07_iterations.ipynb',
 '12_exception_handling.py',
 '00_python_basics.py',
 '02_basic_numerical_operations.ipynb',
 '11_modules_and_packages.py',
 '09_functions.ipynb',
 '.ipynb_checkpoints',
 ...]

The previous command takes a path as an argument: `os.listdir(path)` (in Python 3, `path` defaults to the working directory if omitted).  
To refer to the working directory, we use the dot (`.`).
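
A self-contained sketch of `os.listdir()` on a throwaway directory created with `tempfile`, so nothing in your real folders is touched (the file names are made up). The names come back in arbitrary order, which is why the listing above looks unsorted; wrapping the call in `sorted()` gives a stable order.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # create two empty files and one subdirectory inside the temporary folder
    for name in ('b.py', 'a.txt'):
        open(os.path.join(tmp, name), 'w').close()
    os.mkdir(os.path.join(tmp, 'sub'))

    names = sorted(os.listdir(tmp))  # bare names, not full paths
    print(names)                     # ['a.txt', 'b.py', 'sub']
```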

Say we want to change the working directory to match the example above (`/home/thanos/projects/python/`).

In [3]:
os.chdir('/home/thanos/projects/python/')
# feel free to change this path to your home dir + projects + python  

FileNotFoundError: [Errno 2] No such file or directory: '/home/thanos/projects/python/'

Oops... the directory doesn't exist. We'll need to create it.

Let's first change to one we know exists.

In [4]:
home = os.path.expanduser("~")  # identify the user's home directory
os.chdir(home)  # change to home directory
os.getcwd()  # to confirm

'/home/thanos'

Now we need to create a directory named `projects` in our working directory.

In [5]:
os.mkdir('projects')  # make directory
os.chdir('projects')  # change directory
os.getcwd()  # confirm

'/home/thanos/projects'

Note that we are using relative paths! This means that we don't have to write the full path each time (`/home/thanos/projects`). 

In short, there are two ways to refer to a location:
- Absolute paths: We instruct Python to create a directory at `/home/thanos/projects`.
- Relative paths: We tell Python to create a directory called `projects`. Where? In the directory we are currently working from (`/home/thanos/`).

The difference between the two is that, on UNIX, absolute paths **always** start with a slash (`/`).
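
`os.mkdir()` creates a single directory and fails if its parent is missing, which is why `projects` has to exist before anything can be created inside it. `os.makedirs()` creates every missing intermediate directory in one call, and `exist_ok=True` keeps it from raising if the directory already exists. A sketch inside a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'projects', 'python')  # neither level exists yet

    try:
        os.mkdir(target)                # fails: 'projects' does not exist yet
    except FileNotFoundError:
        print('os.mkdir needs the parent directory to exist')

    os.makedirs(target)                 # creates 'projects' and then 'python'
    os.makedirs(target, exist_ok=True)  # no error the second time
    created = os.path.isdir(target)
    print(created)                      # True
```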

To merge two paths into one we can use `os.path.join()`. 

In [6]:
os.path.join(home, 'projects')

'/home/thanos/projects'
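
`os.path.join()` inserts the separator for us. One rule worth knowing: if a later component is itself an absolute path, everything before it is discarded. The sketch uses `posixpath` so the results are the same on every OS; the paths are illustrative.

```python
import posixpath

a = posixpath.join('/home/thanos', 'projects', 'python')  # separators added for us
b = posixpath.join('/home/thanos/', 'projects')           # no doubled slash
c = posixpath.join('/home/thanos', '/etc')                # absolute part resets the path

print(a)  # /home/thanos/projects/python
print(b)  # /home/thanos/projects
print(c)  # /etc
```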

Another choice we have is to use our operating system's own commands. We can do this easily:

```python
os.system(command)
```
where `command` is a string containing the command we want to pass to the OS's shell. For instance, in UNIX systems `mkdir /path/name` is the command used to make a directory.

In [7]:
os.system('mkdir python')  # this should work in windows systems too

0

The returned `0` is the command's exit status; zero means the command succeeded. If this worked correctly, we should have created a directory with the path `/home/thanos/projects/python`. We can easily confirm this.

In [8]:
os.chdir('python')
os.getcwd()

'/home/thanos/projects/python'

One last interesting function of the `os` module is `os.walk()`. It generates the file names in a directory tree by walking the tree either top-down or bottom-up.
```python
os.walk(top, topdown=True, onerror=None, followlinks=False)
```
- `top`: The directory at the root of the tree. For each directory in the tree rooted at `top` (including `top` itself), `os.walk()` yields a 3-tuple `(dirpath, dirnames, filenames)`.
- `topdown`: If `True` or not specified, directories are scanned top-down. If set to `False`, directories are scanned bottom-up.
- `onerror`: An optional function that is called with an `OSError` instance when an error occurs. It can report the error and let the walk continue, or raise the exception to abort the walk. By default, errors are ignored.
- `followlinks`: If set to `True`, directories pointed to by symlinks are visited too (the default is `False`).

In [9]:
for directory, subdirs, files in os.walk(os.path.join(home, 'projects')):
    for f in files:
        print(os.path.join(directory, f))  # this should recursively print all files under '/home/thanos/projects'
                                           # currently this is an empty directory, so it won't print anything
    for s in subdirs:
        print(os.path.join(directory, s))  # this should find the 'python' directory we created during the 
                                           # previous step

/home/thanos/projects/python
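
Here is a self-contained sketch of the 3-tuples `os.walk()` yields, run on a throwaway tree in a temporary directory (the file and folder names are made up). Because the default is `topdown=True`, the `dirnames` list can also be modified in place to prune the walk; here any folder called `skip` is left out.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # build: tmp/a.txt, tmp/docs/b.txt, tmp/skip/c.txt
    os.makedirs(os.path.join(tmp, 'docs'))
    os.makedirs(os.path.join(tmp, 'skip'))
    for rel in ('a.txt', os.path.join('docs', 'b.txt'), os.path.join('skip', 'c.txt')):
        open(os.path.join(tmp, rel), 'w').close()

    found = []
    for dirpath, dirnames, filenames in os.walk(tmp):  # topdown=True by default
        # removing an entry from dirnames (in place!) stops os.walk from entering it
        dirnames[:] = [d for d in dirnames if d != 'skip']
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), tmp))

    print(sorted(found))  # 'skip/c.txt' is missing because that folder was pruned
```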


We can turn this idea into a function that prints everything under a path:

In [10]:
def recurrsive_listdir(path):
    for directory, subdirs, files in os.walk(path, topdown=False):
        for name in files:
            print(os.path.join(directory, name))
        for name in subdirs:
            print(os.path.join(directory, name))
            
p = os.path.join(home, 'projects')
recurrsive_listdir(p)

/home/thanos/projects/python


The above code scans all the directories and subdirectories in `/home/thanos/projects/` bottom-up. If `topdown` were set to `True`, the directories would be scanned top-down.
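
To see the difference in order, this sketch records the `dirpath` of every step on a temporary tree `root/outer/inner` (hypothetical names). Top-down visits a directory before its subdirectories; bottom-up visits the subdirectories first, so the root comes last.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'outer', 'inner'))

    # keep only the path of each visited directory, relative to the root
    top_down = [os.path.relpath(d, root) for d, _, _ in os.walk(root)]
    bottom_up = [os.path.relpath(d, root) for d, _, _ in os.walk(root, topdown=False)]

print(top_down)   # ['.', 'outer', 'outer/inner'] on UNIX
print(bottom_up)  # ['outer/inner', 'outer', '.'] on UNIX
```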

The `os` module also offers a lot more functionality, such as renaming or removing files and managing users, groups, environment variables, process IDs, etc.

### Important! Don't use `from os import *`
This will cause the `os.open()` function to shadow the built-in `open()` function, which works differently!
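
The two really are different: `os.open()` is a low-level call that takes flags and returns an integer file descriptor, while the built-in `open()` returns a file object. A sketch on a temporary file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')

    with open(path, 'w') as f:        # built-in open(): returns a file object
        f.write('bla bla bla')

    fd = os.open(path, os.O_RDONLY)   # os.open(): returns an int file descriptor
    try:
        data = os.read(fd, 100)       # reads raw bytes, at most 100 of them
    finally:
        os.close(fd)                  # descriptors must be closed explicitly

print(type(fd).__name__, data)        # int b'bla bla bla'
```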

## os.path
This is the `os` module's submodule containing functions that operate on pathnames. We already saw a couple of functions from it: `os.path.expanduser('~')`, which returns the home directory of the current user, and `os.path.join()`, which merges paths.

In order to showcase some of this module's functionality, we'll first create a file. Remember that we're still working in `/home/thanos/projects/python`.

In [11]:
with open('test', 'w') as f:
    f.write('bla bla bla')

Now let's confirm that we actually created this file.

In [12]:
os.listdir('.')

['test']

Note that we can't tell whether this is a file, a directory or even a link. How can we check?

In [13]:
print('file:', os.path.isfile('test'))  # checks if 'test' is a file
print('directory:', os.path.isdir('test'))  # checks if 'test' is a directory
print('link:', os.path.islink('test'))  # checks if 'test' is a link
print('mount point:', os.path.ismount('test'))  # checks if 'test' is a mount point

file: True
directory: False
link: False
mount point: False


We can learn other things about our file, such as its size or the time of its last modification:

In [14]:
print('last modified: ', os.path.getmtime('test'))
print('file size: ', os.path.getsize('test'))

last modified:  1524836254.9658806
file size:  11


The first one returns the time of the last modification as the number of seconds since the epoch (on UNIX, 1 January 1970, 00:00:00 UTC). The second one is the file size in bytes (`'bla bla bla'` is 11 characters long).

The former can be converted to a more human-readable format with the `time` module.

In [15]:
import time
time.ctime(os.path.getmtime('test'))

'Fri Apr 27 16:37:34 2018'
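
Both numbers come from the file's status record, which `os.stat()` returns in full: its `st_size` and `st_mtime` fields are what `getsize()` and `getmtime()` report. The sketch below works on a temporary `test` file rather than the one in our folder, and uses `datetime.datetime.fromtimestamp()` as an alternative to `time.ctime()`.

```python
import datetime
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test')
    with open(path, 'w') as f:
        f.write('bla bla bla')

    info = os.stat(path)
    size = info.st_size     # same as os.path.getsize(path)
    mtime = info.st_mtime   # same as os.path.getmtime(path)
    same = size == os.path.getsize(path) and mtime == os.path.getmtime(path)

print(size)                                    # 11 (bytes)
print(datetime.datetime.fromtimestamp(mtime))  # local date and time of the last modification
```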

We can also check if the path is an absolute path or not.

In [16]:
print(os.path.isabs('/absolute/path'))
print(os.path.isabs('relative/path'))

True
False


On UNIX systems, this just checks whether the path starts with a slash. It **doesn't** check whether the path actually exists.

For this, we use the `os.path.exists(path)` function.

In [17]:
print(os.path.exists(os.path.join(p, 'python', 'test')))
print(os.path.exists(os.path.join(p, 'python', 'wrong')))

True
False


We can also convert a relative path to an absolute one.

In [18]:
print(os.path.abspath('test'))

/home/thanos/projects/python/test


This joins the **current working directory** with the relative path (and normalizes the result, e.g. collapsing any `..`):

<pre> absolute = cwd + relative </pre>

Alternatively, if we know where a file is located, we can join that with the filename to create the absolute path.

In [19]:
file_location = '/an/absolute/path'
filename = 'a_file'
os.path.join(file_location, filename)

'/an/absolute/path/a_file'

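Other functions of the `os.path` module can split a path into its directory and file name (`os.path.split()`), separate a file's extension (`os.path.splitext()`), extract just the file name (`os.path.basename()`) or just the directory (`os.path.dirname()`), etc.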
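
A few of these, sketched with `posixpath` (the UNIX flavour of `os.path`) on an illustrative path, so the results are the same on any OS:

```python
import posixpath

path = '/home/thanos/projects/python/script.py'   # illustrative path

head, tail = posixpath.split(path)     # directory part, last component
root, ext = posixpath.splitext(path)   # everything before the extension, extension
print(head, tail)                      # /home/thanos/projects/python script.py
print(root, ext)                       # /home/thanos/projects/python/script .py
print(posixpath.dirname(path))         # /home/thanos/projects/python
print(posixpath.basename(path))        # script.py
```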

# pathlib

While the `os` module provides most of the functionality we might need, some tasks can get a bit cumbersome. For example, to get a list of the *absolute* paths of everything under a certain path, probably the easiest way would be:

```python
path = '/a/path/to/list/its/contents'
[os.path.join(path, x) for x in os.listdir(path)]
```
Ok, that's not so hard. But what if we wanted to list **only** the Python scripts (the ones that end with *.py*) under that path?

```python
path = '/a/path/to/list/its/contents'
[os.path.join(path, x) for x in os.listdir(path) if x.endswith('.py')]
```
How about searching all directories one level below `path` for Python scripts?

```python
scripts = []
for s in [os.path.join(path, d) for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))]:
    # s is already a full path, so only the file name needs to be appended
    scripts += [os.path.join(s, x) for x in os.listdir(s) if x.endswith('.py')]
```

Things get out of hand really quickly. The **pathlib** module (part of the standard library since Python 3.4) offers an easier, object-oriented way to handle paths in Python.

In [20]:
from pathlib import Path

p = Path.cwd()
p

PosixPath('/home/thanos/projects/python')

Joining paths is much easier in pathlib.

In [21]:
print(Path(home) / 'projects' / 'python')  # join paths with a simple slash operator (/)
print(Path.home() / 'projects' / 'python')  # identify home folder instead of casting as Path
print(Path.home().joinpath(*['projects', 'python']))  # join paths from a list

/home/thanos/projects/python
/home/thanos/projects/python
/home/thanos/projects/python


Pathlib also supports most of the functionality provided by `os`.

In [22]:
print(p.is_dir())    # checks if p is a directory
print(p.is_file())   # checks if p is a file
print(p.parts)       # splits the path into parts
print(p.absolute())  # gets the absolute path of p
print(p.parent)      # returns the parent dir of p

True
False
('/', 'home', 'thanos', 'projects', 'python')
/home/thanos/projects/python
/home/thanos/projects
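
`Path` objects also expose the pieces of a file name directly. A sketch with a pure path (no file system access) and an illustrative file name:

```python
from pathlib import PurePosixPath

f = PurePosixPath('/home/thanos/projects/python/notes.tar.gz')

print(f.name)                 # notes.tar.gz
print(f.stem)                 # notes.tar  (name minus the last suffix)
print(f.suffix)               # .gz
print(f.suffixes)             # ['.tar', '.gz']
print(f.with_suffix('.zip'))  # /home/thanos/projects/python/notes.tar.zip
```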


Where pathlib really shines is in matching **glob** patterns. To better illustrate this, let's have a look at our original directory. To list everything in it (files and directories), we can use the `*` glob wildcard.

In [23]:
list(Path(original_dir).glob('*'))

[PosixPath('/home/thanos/Documents/Notes/python/tutorial/05_basic_tuple_dict_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/16_marplotlib_seaborn.ipynb'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/test_file'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/03_basic_string_operations.ipynb'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/scr_args.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/01_basic_data_types.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/08_input_output.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/custom_module.pyc'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/12_exception_handling.ipynb'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/06_logical_operations.ipynb'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/03_basic_string_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/18_sklearn.ipynb'),
 ...]


This lists everything directly under `original_dir`. Note that every path is absolute, because `original_dir` is absolute. If we want to find only the `.py` files, we can instruct pathlib to look for the pattern `'*.py'`, which means *every file ending in `.py`*.

In [24]:
list(Path(original_dir).glob('*.py'))

[PosixPath('/home/thanos/Documents/Notes/python/tutorial/05_basic_tuple_dict_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/scr_args.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/01_basic_data_types.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/08_input_output.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/03_basic_string_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/06_logical_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/09_functions.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/02_basic_numerical_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/10_classes.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/04_basic_list_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/custom_module.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/12_exception_handling.py'),
 ...]

To look for every `.py` file in the directories under `original_dir`, we could simply write:

```python
list(Path(original_dir).glob('*/*.py'))
```
Because none of our files match this pattern, we will move up one level first.

In [25]:
list(Path(original_dir).glob('../*/*.py'))

[PosixPath('/home/thanos/Documents/Notes/python/tutorial/../theano/theano.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/../tutorial/05_basic_tuple_dict_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/../tutorial/scr_args.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/../tutorial/01_basic_data_types.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/../tutorial/08_input_output.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/../tutorial/03_basic_string_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/../tutorial/06_logical_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/../tutorial/09_functions.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/../tutorial/02_basic_numerical_operations.py'),
 PosixPath('/home/thanos/Documents/Notes/python/tutorial/../tutorial/10_classes.py'),
 ...]

We will explain what the two dots (`..`) are in a bit. Finally, for a recursive search (looking under **all** subdirectories of a certain path) we can use `.rglob()`:

```python
list(path.rglob(pattern))
```
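
Putting it together, here are the three `os`-based tasks from the start of this section rewritten with `pathlib`, plus `rglob()`, on a temporary tree with hypothetical file names (only the names are kept, for brevity):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp)
    # build: top.py, notes.txt, pkg/mod.py, pkg/deep/inner.py
    (path / 'pkg' / 'deep').mkdir(parents=True)
    for rel in ('top.py', 'notes.txt', 'pkg/mod.py', 'pkg/deep/inner.py'):
        (path / rel).touch()

    everything = sorted(p.name for p in path.iterdir())    # everything at this level
    scripts = sorted(p.name for p in path.glob('*.py'))     # .py files at this level
    one_down = sorted(p.name for p in path.glob('*/*.py'))  # .py files one level below
    anywhere = sorted(p.name for p in path.rglob('*.py'))   # .py files at any depth

print(everything)  # ['notes.txt', 'pkg', 'top.py']
print(scripts)     # ['top.py']
print(one_down)    # ['mod.py']
print(anywhere)    # ['inner.py', 'mod.py', 'top.py']
```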

# shutil

The shutil module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal.  

It helps automate copying files and directories, saving us the steps of opening, reading, writing and closing files when no actual processing is needed.

```python
shutil.copy(source, destination)
```
Copies the file at path `source` to `destination`, which may be a file name or an existing directory.

In [26]:
import shutil

pt = os.path.join(home, 'projects', 'python')
shutil.copy('test', 'test2') # relative paths
shutil.copy(os.path.join(pt, 'test'), os.path.join(pt, 'test3')) # absolute paths
os.listdir('.')

['test3', 'test', 'test2']

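There are also more specific `shutil` functions: `shutil.copyfile()` copies only a file's contents, `shutil.copymode()` only its permission bits, `shutil.copystat()` its metadata (permission bits, timestamps, etc.), and `shutil.copy2()` both contents and metadata.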
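
A sketch contrasting `shutil.copy()` and `shutil.copy2()` on temporary files: both copy the contents, but only `copy2()` also tries to preserve metadata such as the modification time. Passing a directory as the destination keeps the original file name. The old timestamp set with `os.utime()` is only there to make the difference visible.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'test')
    with open(src, 'w') as f:
        f.write('bla bla bla')
    os.utime(src, (1_000_000_000, 1_000_000_000))  # pretend the file is from 2001

    plain = shutil.copy(src, os.path.join(tmp, 'plain'))  # contents + permission bits
    meta = shutil.copy2(src, os.path.join(tmp, 'meta'))   # contents + metadata

    os.mkdir(os.path.join(tmp, 'backup'))
    into_dir = shutil.copy(src, os.path.join(tmp, 'backup'))  # destination is a directory

    plain_mtime = os.path.getmtime(plain)  # time of the copy, i.e. "now"
    meta_mtime = os.path.getmtime(meta)    # copied from src
    into_dir_name = os.path.basename(into_dir)

print(meta_mtime, into_dir_name)
```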

Another useful feature of `shutil` is the `copytree` function.

```python
shutil.copytree(src, dst, symlinks=False, ignore=None)
```
- `src` is the source path.
- `dst` is the destination path.
- If `symlinks` is `True`, symbolic links in the source tree are represented as symbolic links in the new tree, but the metadata of the original links is not copied; if false or omitted, the contents and metadata of the linked files are copied to the new tree.
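- `ignore`, if given, is a callable. It is called with the directory being visited and the list of its contents (as returned by `os.listdir()`), and returns the names that won't be copied. `shutil.ignore_patterns()` builds such a callable from glob-style patterns.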
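
A sketch of the `ignore` callable, using `shutil.ignore_patterns()` to skip compiled `.pyc` files and `__pycache__` folders. The tree and its names are hypothetical and live in a temporary directory. Note that `dst` must not exist yet (unless `dirs_exist_ok=True` is passed, available since Python 3.8).

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'src')
    os.makedirs(os.path.join(src, '__pycache__'))
    for name in ('module.py', 'module.pyc', os.path.join('__pycache__', 'x.pyc')):
        open(os.path.join(src, name), 'w').close()

    # ignore_patterns() returns a callable(directory, names) -> set of names to skip
    dst = shutil.copytree(src, os.path.join(tmp, 'dst'),
                          ignore=shutil.ignore_patterns('*.pyc', '__pycache__'))
    copied = sorted(os.listdir(dst))

print(copied)  # ['module.py']
```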

In [27]:
shutil.copytree('../../projects', '../projects2')
# copies /home/thanos/projects to /home/thanos/projects/projects2
recurrsive_listdir(str(p.parent))
# prints the whole directory tree, along with its contents, in /home/thanos/projects

/home/thanos/projects/projects2/python/test3
/home/thanos/projects/projects2/python/test
/home/thanos/projects/projects2/python/test2
/home/thanos/projects/projects2/python
/home/thanos/projects/python/test3
/home/thanos/projects/python/test
/home/thanos/projects/python/test2
/home/thanos/projects/projects2
/home/thanos/projects/python


First of all, `shutil.copytree()` copied the whole `projects` directory to a subdirectory called `projects2` inside `projects` itself.

Secondly, we saw a new way of referring to paths: the two dots `..`.  
The two dots refer to the parent directory, i.e. the directory one level up in the hierarchy.

So if our working directory is `/home/thanos/projects/python/`:
- `..` is `/home/thanos/projects/`
- `../..` is `/home/thanos/`
etc.
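
`os.path.normpath()` collapses `..` (and `.`) purely as a string operation, which is a handy way to check what a relative path like `../..` points to. It does not resolve symbolic links, so on a real file system `Path.resolve()` may give a different answer. The working directory below is the one from the text, used as a plain string:

```python
import posixpath

cwd = '/home/thanos/projects/python'

up_one = posixpath.normpath(posixpath.join(cwd, '..'))
up_two = posixpath.normpath(posixpath.join(cwd, '../..'))
here = posixpath.normpath(posixpath.join(cwd, './test'))

print(up_one)  # /home/thanos/projects
print(up_two)  # /home/thanos
print(here)    # /home/thanos/projects/python/test
```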

With shutil we can also move a file or directory from a location `src` to another `dst`.

In [28]:
shutil.move('test3', '..')
# moves /home/thanos/projects/python/test3 to /home/thanos/projects/test3
print(os.listdir('..'))
# lists files and directories in /home/thanos/projects
print(os.listdir('.'))
# lists files and directories in /home/thanos/projects/python

['test3', 'projects2', 'python']
['test', 'test2']


The output confirms that `test3` has been moved.
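
`shutil.move()` can also rename: if the destination is an existing directory, the file is moved inside it; otherwise the destination is taken as the new path. A sketch in a temporary directory with hypothetical names:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'test3')
    open(src, 'w').close()
    os.mkdir(os.path.join(tmp, 'archive'))

    moved = shutil.move(src, os.path.join(tmp, 'archive'))              # into the directory
    renamed = shutil.move(moved, os.path.join(tmp, 'archive', 'old'))   # new name

    result = sorted(os.listdir(os.path.join(tmp, 'archive')))
    src_gone = not os.path.exists(src)

print(result)  # ['old']
```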

This module also offers tools for removing directory trees.

In [29]:
shutil.rmtree(os.path.join(home, 'projects'))
# removes everything under /home/thanos/projects
os.chdir(home)

We can confirm it's deleted.

In [30]:
os.listdir('projects')
# The error is pretty self-explanatory.

FileNotFoundError: [Errno 2] No such file or directory: 'projects'
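
Because `shutil.rmtree()` deletes everything without asking, a cautious pattern is to check the path first. The sketch below does that on a temporary tree; the helper name `remove_tree` is hypothetical.

```python
import os
import shutil
import tempfile

def remove_tree(path):
    """Delete the directory tree at `path`, refusing anything that isn't a real directory."""
    if not os.path.isdir(path) or os.path.islink(path):
        raise ValueError(f'not a directory: {path!r}')
    shutil.rmtree(path)

with tempfile.TemporaryDirectory() as tmp:
    tree = os.path.join(tmp, 'projects', 'python')
    os.makedirs(tree)
    open(os.path.join(tree, 'test'), 'w').close()

    remove_tree(os.path.join(tmp, 'projects'))
    still_there = os.path.exists(os.path.join(tmp, 'projects'))

print(still_there)  # False
```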

Another important module for automating computer tasks is `zipfile`, which helps with compressing files into, and extracting files from, zip archives.
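
As a taste of `zipfile`, this sketch writes two files into an archive and extracts them again, all in a temporary directory (the file names are made up):

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.txt', 'b.txt'):
        with open(os.path.join(tmp, name), 'w') as f:
            f.write('bla bla bla')

    archive = os.path.join(tmp, 'backup.zip')
    with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for name in ('a.txt', 'b.txt'):
            # arcname is the name stored inside the zip (here: no directories)
            zf.write(os.path.join(tmp, name), arcname=name)

    out = os.path.join(tmp, 'extracted')
    with zipfile.ZipFile(archive) as zf:
        members = zf.namelist()  # names stored in the archive
        zf.extractall(out)       # creates `out` if needed

    extracted = sorted(os.listdir(out))

print(members, extracted)
```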

# sys

This module helps with command-line input/output operations, among other interpreter-related tasks.

In shell environments there are three ways the shell interacts with the user. We call these I/O connections **standard streams**:

- stdin: Is the standard input. This is what we type into the terminal.
- stdout: Is the standard output. This is what we see as output.
- stderr: Is the standard error. This is an output as well, but it is used for error messages.

## stdin

We have used `sys.stdin` in the past without knowing it! The built-in `input()` function reads what we type from `sys.stdin`.
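
You can see the connection by temporarily replacing `sys.stdin` with an in-memory text stream: `input()` then reads its line from there instead of the keyboard. This is only a demonstration trick (handy in tests), with `io.StringIO` standing in for the terminal.

```python
import io
import sys

original_stdin = sys.stdin
sys.stdin = io.StringIO('Thanos\n')  # pretend the user typed "Thanos" and pressed Enter
try:
    name = input()                    # reads one line from sys.stdin, without the newline
finally:
    sys.stdin = original_stdin        # always put the real stdin back

print(name)  # Thanos
```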

## stderr & stdout

These two can be used to manipulate the two output channels.

In [31]:
import sys
sys.stderr.write('This is an error message!\n')
sys.stderr.flush()
sys.stdout.write('This is an output!\n')

This is an error message!


This is an output!


Output written to these streams may be buffered, i.e. held back and written out later; `.flush()` forces any buffered output to be written immediately.

Writing to `sys.stderr` is often followed by terminating the script. The easiest way to do that is:
```python
sys.exit() # terminates the script
```
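
A common pattern combines the two: write a message to `sys.stderr` and exit with a nonzero status, which the shell treats as failure. `sys.exit(code)` works by raising `SystemExit`, so the sketch below (with a hypothetical `fail()` helper) can catch it to show the exit code without actually stopping.

```python
import sys

def fail(message, code=1):
    """Print an error message to stderr and terminate with the given exit status."""
    print(message, file=sys.stderr)  # print() can target stderr directly
    sys.exit(code)                   # raises SystemExit(code)

try:
    fail('something went wrong', code=2)
except SystemExit as exc:
    exit_code = exc.code

print(exit_code)  # 2
```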

## Command line arguments

Probably the most common usage of this module is that it allows us to use command line arguments in our scripts.

This is done by utilizing `sys.argv`. This is a list that stores all arguments entered from the command line.

We have a script called *scr_args.py*. This script contains the following code.

```python
from __future__ import print_function
import sys

print(sys.argv)
```

If we call the script like this:
<pre>python scr_args.py arg1 arg2 arg3</pre>

Our output should be:
<pre>['scr_args.py', 'arg1', 'arg2', 'arg3']</pre>

How to incorporate these arguments into a script depends on the situation. Keep in mind that `sys.argv[0]` is the script name and that every argument arrives as a string. It is good practice to check the arguments thoroughly, to make sure the user entered what you expect to receive.
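
A minimal sketch of such checks, assuming a hypothetical script that expects exactly two integer arguments and prints their sum. The logic lives in a `main(argv)` function so it can be tried without a shell; in a real script you would finish with `sys.exit(main(sys.argv))`.

```python
import sys

def main(argv):
    """Add two integers given on the command line, e.g. `python add.py 2 3`."""
    if len(argv) != 3:
        print(f'usage: {argv[0]} A B', file=sys.stderr)
        return 2                             # conventional exit status for bad usage
    try:
        a, b = int(argv[1]), int(argv[2])    # arguments are strings until converted
    except ValueError:
        print('both arguments must be integers', file=sys.stderr)
        return 2
    print(a + b)
    return 0

# In a real script, finish with:  sys.exit(main(sys.argv))
```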

# argparse

`argparse` creates an easier interface for handling command line arguments. We won't go into much detail on this module, but it is very helpful when wanting to create scripts that accept command line arguments.

```python
import argparse

parser = argparse.ArgumentParser(description='a script that does this and that ...')

parser.add_argument('-v', '--verbose', action='store_true', dest='verbose',
                    default=False, help='print status messages to stdout')
# whenever the user types a '-v' or '--verbose' argument, a variable
# called verbose will become true. In any other case it will be False

parser.add_argument('-f', '--file', dest='filename', metavar='FILE',
                    help='write report to FILE')
# if the user types '-f a_file_name', a variable called filename 
# will take the value specified by the user.

parser.add_argument('-n', '--number', dest='n', default=10, type=int,
                    help='a number')
# if the user types '-n 13', a variable called n will take the value 13
# (converted to an int). The default value is 10.

# if the user types '-h' or '--help' the help strings for each argument
# will be displayed

args = parser.parse_args()

args.verbose   # verbose argument
args.filename  # file argument
args.n         # n argument
```
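
Because `parse_args()` also accepts an explicit list of strings, a parser can be tried without the command line. A sketch with a trimmed-down version of the parser above, wrapped in a hypothetical `build_parser()` function:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description='a script that does this and that ...')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='print status messages to stdout')
    parser.add_argument('-f', '--file', dest='filename', metavar='FILE',
                        help='write report to FILE')
    parser.add_argument('-n', '--number', dest='n', default=10, type=int,
                        help='a number')
    return parser

parser = build_parser()
defaults = parser.parse_args([])                                    # nothing given
custom = parser.parse_args(['-v', '-f', 'report.txt', '-n', '13'])

print(defaults.verbose, defaults.filename, defaults.n)  # False None 10
print(custom.verbose, custom.filename, custom.n)        # True report.txt 13
```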