<img src="assets/jeremy-lapak-CVvFVQ_-oUg-unsplash.png" alt="Python Envs" style="display: block; margin: 0 auto" />

# Learning Python 10 minutes a day #15
## Working with files and paths in Python
[Medium article link](https://towardsdatascience.com/learning-python-10-minutes-a-day-15-50523202db27)

This is a [series](https://towardsdatascience.com/tagged/10minutespython) of short 10-minute Python articles helping you to get started with Python. I try to post an article each day (no promises), starting from the very basics and going up to more complex idioms. Feel free to contact me on [LinkedIn](https://www.linkedin.com/in/dennisbakhuis/) with questions or requests on particular Python subjects you want to know about.

We have already learned the basics of handling files. Using the open() function we can open or create a file object and using this file object we can read and write a text file. But what if we need to access multiple files? The data that we need to process is often not in a single file but spread over multiple files or even multiple directories. To work with files and paths, there are a couple of standard library packages we can use. A couple of years ago I would always use the glob package for file listings and the os package for file system actions like deleting files. Nowadays, I mostly use the pathlib package from the standard library, which covers most of what I used those two for and is platform independent. The pathlib package introduces the Path() class, an object that represents a path to a file or directory.

To use the Path() class, we need to import it. Next week we will spend a session on what importing actually does. For now, we will just use the import statement to add the additional functionality to Python. We do this using the special from … import … statement, which imports a selected part of a package into the current namespace (don’t worry, we will discuss those terms next week). Let's have a look at a couple of examples:

In [None]:
from pathlib import Path

# specially named paths
cwd = Path.cwd()
home = Path.home()
print('current directory:', cwd)
print('home folder:', home)

# a specific file
path = Path('test.txt')
if not path.exists():
    print(f'{path} does not exist')

# there are many related characteristics:
print(path.name)
print(path.stem)
print(path.suffix)
print(path.absolute())  # returns a new Path object
print(path.absolute().parent)  # returns a new Path object
print(path.absolute().parent.parent)  # I think you get it

# you can directly open the file
with open(path, 'w') as f:
    f.write('PathLib is amazing!\n')
if path.exists():
    print('yup, the file exists now!')
    
# you can get file statistics (times, sizes, permissions, etc.)
stats = path.stat()
print(f'The file is {stats.st_size} bytes!')
import time  # time is used to convert system time to a date/time string
print('File was last modified on', time.ctime(stats.st_mtime))  # st_mtime is the modification time

# file system operations
path.rename('test2.txt')  # the file on disk is renamed, but the Path object stays 'test.txt'
path.touch()  # create an empty file (called test.txt)
path.unlink()  # to delete a file, use unlink; to delete empty folders rmdir()
path = Path('test2.txt')
if path.exists():
    path.unlink()

As you can see, the Path() class is very convenient. It carries a lot of information about the path or file. For example, it has an attribute for the stem of the filename and one for the extension. Many methods are built in, for example to get the modification time or the file size, to check for existence, to check whether it is a file or a directory, and many more. Also, many file system operations are available. Some of these operations return a new Path() object. All the examples are, in my opinion, relatively easy to understand. There are some small subtleties which are nice to know. The Path() object is an immutable and hashable data type. This means that you can use it as a dictionary key. It is also why the path stored in the object does not change when you rename a file: rename() changes the file on disk and, since Python 3.8, returns a new Path() object pointing to the new name. In the next examples, we will show how to work with multiple files.
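Before moving on, here is a minimal sketch of the immutability point. The file name and the sizes are made up for illustration, and nothing is created on disk: the example only manipulates Path objects.

```python
from pathlib import Path

original = Path('report.txt')  # hypothetical file name, not created on disk

# methods such as with_suffix() return a new Path object
renamed = original.with_suffix('.md')
print(original)  # the original object is unchanged
print(renamed)

# Path objects are hashable, so they can serve as dictionary keys
sizes = {original: 120, renamed: 95}  # made-up sizes, purely for illustration
print(sizes[Path('report.txt')])  # an equal Path finds the same entry
```

Because two Path objects with the same path compare equal, a freshly created Path('report.txt') retrieves the value stored under the original key.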

In [None]:
from pathlib import Path

# create a pointer for a directory
path = Path('temporary_directory/')  # Path removes the final '/'

# the is_dir() method checks the actual file system
print(f'{path} is a directory: {path.is_dir()}')
print(f'{path} is a file: {path.is_file()}')
path.mkdir(exist_ok=True)  # with exist_ok=True, it is fine when the dir exists
print(f'{path} is a directory: {path.is_dir()}')
print(f'{path} is a file: {path.is_file()}')

# let us create ten files
for number in range(10):
    file_path = path / f'some_file_{number + 1}.txt'
    file_path.touch()

# now lets find all .txt files and iterate over them
for filename in path.glob('*.txt'):
    print('filename:', filename.stem)
    filename.unlink()

# delete an empty directory
path.rmdir()

To work with multiple files, pathlib has incorporated the .glob() method. It works like the glob package: you can find all names that match a pattern. In this example we are finding all files ending with the .txt extension. The result is a generator that yields Path objects, each of which offers all the previously discussed options. You might notice that the files are not neatly ordered but come in the order in which the file system returns them. Another thing you might notice is the ‘/’ operator. Path() objects have the ‘/’ operator defined such that you can concatenate paths. We do this in our example with a directory (a Path() object) and a filename (a string).
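If you need a predictable order, one option is to pass the result of .glob() to sorted(). The sketch below assumes a throwaway directory made with the tempfile package and a few hypothetical file names, so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

# work in a temporary directory that is removed afterwards
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ['b.txt', 'a.txt', 'c.csv']:  # hypothetical file names
        (folder / name).touch()  # '/' joins a Path with a string

    matches = folder.glob('*.txt')  # yields Path objects lazily
    txt_files = sorted(matches)  # sorted() consumes them and returns an ordered list
    names = [p.name for p in txt_files]
    print(names)  # ['a.txt', 'b.txt']
```

Keep in mind that this sorts the paths alphabetically, so a file called some_file_10.txt would come before some_file_2.txt.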

Pathlib combines functionality from several packages in the standard library into a single object that can do practically any file system handling. Therefore, I would highly recommend using it instead of the older approach with the os package. While those methods are fine too, pathlib gives you a much neater format, with almost all the power you need close at hand.
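To make the comparison concrete, here is a small side-by-side sketch of the same task in both styles. The path is hypothetical and is never touched on disk; both versions only take the path apart.

```python
import os
from pathlib import Path

# the older style: string manipulation with os.path
old_path = os.path.join('data', 'raw', 'measurements.csv')  # hypothetical path
old_stem, old_ext = os.path.splitext(os.path.basename(old_path))

# the pathlib style: one object with attributes
new_path = Path('data') / 'raw' / 'measurements.csv'

print(old_stem, old_ext)
print(new_path.stem, new_path.suffix)
print(Path(old_path) == new_path)  # both describe the same path
```

Both approaches give the same stem and extension, but the pathlib version needs no nested function calls.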

## Practice for today:
Path() has many great use cases and today we will practice one as well. I have provided a zip-file with a folder of files ([here](https://www.dropbox.com/s/k5td7mfyio8rbis/data_files.zip?dl=1)). The files in the folder are of the type ‘word’ and each is a text-file containing a single word.

### Assignment:
Use pathlib to iterate through all files and read all the words. There is one problem: the glob result is not sorted in ascending order. Therefore, you have to order the words by the number in their filenames, in ascending order. Can you print the words in the right order? You might find a cheesy joke ;-).

### Hints:
1. Use the .glob() method to get the files.
2. When iterating over each file, store the words in a dictionary, using the file number as a key. You can extract the file number from the stem of the file using the .split() method. Make sure to convert the number to an integer.
3. Next you have to iterate over the sorted keys and print each word with a space (' ') as end character (with the end=' ' parameter in print()). Below is an example on how to sort dictionary keys, as we have not yet discussed sorted().

In [None]:
my_dict = {4: 'ba', 6: 'bing', 5: 'da', 8:'!!!'}
for key in sorted(my_dict.keys()):
    print(my_dict[key], end=' ')

I have posted a [solution](https://gist.github.com/dennisbakhuis/a5535ef28c398e8266d8fad810bf4989) on my GitHub.

If you have any questions, feel free to contact me through [LinkedIn](https://www.linkedin.com/in/dennisbakhuis/).