# Learning Python: The file system

[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/berniehogan/introducingpython/main?filepath=chapters%2FCh.06.TheFileSystem.ipynb)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/berniehogan/introducingpython/blob/main/chapters/Ch.06.TheFileSystem.ipynb)



Up until now we have only used Jupyter notebooks and stayed pretty closely in this environment. But we will need to branch out eventually. This involves learning a number of features about the operating system and how to interface with it. We will also learn how to create a file, write data to it and read the data from it. 

Windows, MacOS, Linux, Chrome or iOS, all computers have an __operating system__. The operating system runs programs, manages memory, and allocates computer tasks so that multiple programs can run at the same time. An operating system is what tends to differentiate modern computing from mechanical tasks. The OS is meant to be a general purpose means for accomplishing tasks. 

Virtually all operating systems operate on __[Von Neumann architecture](https://en.wikipedia.org/wiki/Von_Neumann_architecture)__. This means they will have an input device (typically a keyboard but could be any sensor), a processor that does calculations, memory (typically fast memory such as RAM and slow memory such as a Hard Drive for more long term storage) and an output device (often, but not necessarily a screen). Other architectures based on state machines and quantum computing exist but are often experimental and not quite relevant at the moment. 

Most operating systems these days are either Windows-based or \*nix-based (i.e., Unix or Linux). Macintosh used to have its own operating system kernel, but switched roughly 20 years ago to a Unix-based system with Mac OS X. There are many differences under the hood between Mac and Linux but herein, we will focus on the many similarities related to file storage. These operating systems store files on a hard drive which can be accessed using the __file path__. 

## Navigating the file path with Python and in the terminal

Files, or more properly speaking __file indices__, are stored in hierarchical structures.  The topmost structure would be considered __the root directory__. Under the root directory would be child directories. Even though it can be characterised as a 'tree', we tend to use the terms children and parent. Perhaps think of it more like a family tree, a very peculiar family tree. 

### Note: File systems are different on \*Nix and Windows. 

Windows was a small operating system back in the day. It wanted to preserve backwards compatibility with earlier DOS systems and IBM systems, but the `/` was already used in those systems as a 'switch', meaning it would not be easy (so it seemed) to use it to denote directory structures. Thus, __`\`__ was used instead since it looks similar. It didn't seem like a big deal at the time, but it has certainly become a nuisance for programmers ever since. Together with some other quirks, such as drives having letter names, this means that there are a number of little differences between how Windows and \*Nix users work with paths. 

In the code herein, I want to have notes that are as generic as possible across systems as well as notes that are 'robust'. So I will be using some Python libraries to ensure the code stays consistent. In particular we will be using the `os` module, as you will likely see a lot of code snippets that use this module. I will also be using the `pathlib` library, which has been available since Python 3.4. I think it is more tidy and coherent than the earlier `os` module, but it is less commonly used still. 

In the `os` library is a variable called `os.sep`. If you print it on Windows it will print a `\` and if you print it on a \*Nix system it will print `/`. You can use this to build strings representing paths that are operating system independent. However, a more robust way is to use `pathlib` to create a Path object, for example: 

~~~ python
from pathlib import Path
p = Path(".")
open(p / "file.txt")
~~~

Notice that if we initialise an object called `p` with a path of `"."` that means we are initialising it with a location of the current working directory. Then we can write Python strings with `/` as the folder divider, and the `/` here works regardless of Windows or Mac because it is not literally a `"/"` character. It is outside the string and represents _folder divider_ more generally.  
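
As a rough sketch of the difference between the two approaches, the block below builds the same relative path with `os.path.join` and with a `Path` object. The folder and file names (`data`, `raw`, `file.txt`) are made up for illustration, and nothing is opened or created on disk.

```python
import os
from pathlib import Path

# Build the same relative path two ways (all names here are hypothetical).
joined_with_os = os.path.join("data", "raw", "file.txt")   # a plain string
joined_with_pathlib = Path("data") / "raw" / "file.txt"    # a Path object

print(joined_with_os)
print(joined_with_pathlib)

# A Path also knows its own components, independent of the separator.
print(joined_with_pathlib.parts)
```

On Windows both paths print with `\` and on \*Nix with `/`, without the code changing.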

### What is a root directory and how do I get there?

If we have a hierarchical structure, we can say that there is something at the top and other things below (or vice versa). So our root directory contains other directories, which can contain more directories themselves. On \*Nix, the topmost directory for the OS is usually simply `/` or _"root"_. For Windows there is no single topmost directory, but rather a set of drives. The operating system usually lives on the `C` drive. So you might have `C:\Program Files\Mozilla\Firefox.exe` as a path to a browser. Whereas on Mac it would be `/Applications/Firefox.app`.  

Notably, in Windows, if you use the _PowerShell_, which is a souped-up terminal, you can use many of the \*Nix-style commands and forward slashes. I recommend using the PowerShell for Python or using the 'Ubuntu shell', which is like running Linux on your Windows computer. 

To navigate there you can also use a terminal inside Jupyter Lab. If you select a new Launcher from the side (either the + button, or 'new Launcher' from the 'File' menu), then you can select a new Terminal session. Do that and marvel at how plain the terminal looks. It should look something like this: 

~~~ zsh
(base) work@MacBookAir ~ % 
~~~
Then to the right of the % would be a blinking cursor, which is where you navigate the terminal by typing commands. First let's find out where we are. 

> As a tip for this section. If you are working on Jupyter Lab, you can drag the tabs at the top so that you can have multiple windows open, so that you can have these notes and a terminal window side-by-side. 

The first thing we are going to want to do is discover where we are in the directory structure. This is the 'current working directory'. For a Mac you would type in `pwd` or 'Print Working Directory'. For me, since I named this account 'work' my current working directory is:
~~~ zsh
(base) work@Bernies-Air ~ % pwd
/Users/work
~~~
This might be considered 'the default directory'. I can use `~` to substitute for this directory, as in `~/Documents` to refer to `/Users/work/Documents`. 
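
Python can find this same directory for you, which is one way to avoid hard-coding an account name such as `work`. The following is a minimal sketch using only the standard library. The `Documents` folder may or may not exist on your machine, so the code checks rather than assumes.

```python
import os
from pathlib import Path

home = Path.home()                 # the home directory as a Path object
print(home)
print(os.path.expanduser("~"))     # the same idea via the os module, as a string

documents = home / "Documents"     # hypothetical folder, so check before using it
print(documents, documents.exists())
```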

In this notebook, I have a different working directory, which is the directory that this notebook was in when I run it. Watch below how to get the current working directory of the notebook:  

In [None]:
import os 

print( os.getcwd() )
print("The separator on this computer is: ", os.sep)

This means that if we were to do something like create a file in Jupyter then the file will, by default, be written to this specific directory. Watch that happen below: 

In [None]:
fileout = open("example_file.txt", 'w')
fileout.write("Here is the example file!\n")
fileout.write("You can open it in a text editor, too.")
fileout.close()

You should now see a new file named 'example_file.txt' in the same directory as these notes. To write it to a different directory, you can specify the _absolute path_ to that directory. So in my case, since I know that I have an account called `work`, then I can create a file under that directory using the following: 

In [None]:
fileout = open("/Users/work/example_file2.txt", 'w')
fileout.write("Here is another example file!")
fileout.close()

That is a pretty bad form, for two reasons. 
1. You probably do not have a `/Users/work/` path on your computer. So when you run the above you would get a `FileNotFoundError`. 
2. Writing example files all over your computer will create a mess. 

To solve the second problem, I will first clean up my file with the help of `os.remove`, which will delete a file given the path name, or throw an error if the file is not found.  

In [None]:
try: 
    os.remove("/Users/work/example_file2.txt")
except FileNotFoundError:
    pass

To solve the first problem is a little trickier, since I want a solution that will work for both my computer and yours. The first thing we can consider, instead of _absolute_ paths, are _relative_ paths. The simplest relative path is `.`, which means "here". So if you see `./file.txt`, that refers to a file called `file.txt` in the current directory. If you see `..` that means the _parent_ directory. In my case, since this is in a folder called `Python`, under `2021MT`, if I wanted to create a new file in the parent directory, `/Users/work/OneDrive - Nexus365/Teaching/2021MT/`, I could write: 

In [None]:
fileout = open("../example_file2.txt", 'w')
fileout.write("Here is another example file!")
fileout.close()

Where this comes in handy is in having a folder for data separate from your notes. For my data courses, I recommend having a folder structure like the following: 

~~~
<course_name> 
  |- notebooks
  |- data
  |- output
  |- other
~~~

It's always a challenging moment when I have to help a student who has their entire course (and most others) running as notebooks from their downloads folder. It might seem like a good idea for the first file, but after a dozen or so files things will get messy and lost. But importantly, then even if you are working from multiple different files, you always know where to put your data and your output (like charts and tables).
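
One way to set up such a structure and reach the data folder from the notebooks folder is sketched below. It assumes a made-up course name (`my_course`) and data file (`survey.csv`), and it works inside a temporary directory so that nothing is left behind on your computer.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    course = Path(tmp) / "my_course"            # hypothetical course folder

    # Create the four sub-folders; exist_ok avoids an error on a second run.
    for sub in ("notebooks", "data", "output", "other"):
        (course / sub).mkdir(parents=True, exist_ok=True)

    # From the notebooks folder, '..' steps up to the course folder.
    notebooks = course / "notebooks"
    data_file = notebooks / ".." / "data" / "survey.csv"
    data_file.write_text("id,score\n1,10\n")

    created = sorted(p.name for p in course.iterdir())
    resolved_parent = data_file.resolve().parent.name

print(created)
print(resolved_parent)
```

In a real notebook stored in `notebooks`, the relative path `../data/survey.csv` would play the role of `data_file` here.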

This [StackOverflow](https://stackoverflow.com/questions/2632199/how-do-i-get-the-path-of-the-current-executed-file-in-python?lq=1) conversation discusses many of the tricky aspects of getting the path of a specific Python file. 

## How to navigate the file system through a terminal. (*Nix edition)

This section will be done in a terminal window, so you'll have to switch back and forth. If I write a command it will be preceded by `$` and if I write the expected output, it will be preceded by `>`. So we will want to open a terminal and type `ls`. As in don't type the \$, just:  

~~~
$ ls
~~~

That should list all the files in the current working directory _for the shell_. Notice that it is probably not the same directory as the one you saw above. But let's navigate to our current Python working directory. To do this we use `cd <desired directory>`. In my case (since I have learned the location above) it is: 

~~~
$ cd "/Users/work/OneDrive - Nexus365/Teaching/2021MT/Python"
~~~

### To shorten the prompt 
The prompt might be showing you the very long path to your directory, making it hard to type commands. To shorten it down to just the `$` type the following at the prompt: 

~~~
$ PS1='<label>$'
~~~

You can then confirm you are still in the directory you expected with `pwd`. 

### Making a new file

You can make a new empty file here by typing: 

~~~
$ touch temp_file.txt
~~~

### Removing a file.

You can delete a file with `rm` command. 

~~~
$ rm temp_file.txt
~~~

You can use * to pattern match in the shell. Thus, you can delete multiple files that match a pattern with 

~~~
$ rm *.txt
~~~

### Making a new directory
You can make a new directory by typing `mkdir` followed by the directory name. 

~~~
$ mkdir <directory name>
~~~

### Removing a directory

Directories have files in them. This means that they cannot be removed in the shell without also removing the files. 

~~~
$ rmdir <directory_name> 
~~~

will only work on an empty directory. If it has files you will need to use `rm` with the -r (or recursive) argument before the directory name. 

~~~
$ rm -r <directory_name>
~~~

### The trouble with spaces 

You might notice that my file names do not have spaces. This is because when you are entering commands in a terminal, space is considered a break. So a file named `FSDS Week1 Lecture1.txt` would be considered `FSDS` as the file name by the terminal. In order to operate on that file you have to encase it in quotes, which is a nuisance. Such as: 
    
~~~
$ rm "FSDS Week1 Lecture1.txt"
~~~
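
The shell commands in this section have close equivalents in Python. The sketch below is one possible mapping using `pathlib` and `shutil`. The file and folder names are invented, and everything happens inside a temporary directory. Note that a name containing a space needs no extra quoting once it is a Python string.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    temp_file = base / "temp file.txt"   # hypothetical name with a space in it
    temp_file.touch()                    # like `touch`
    file_existed = temp_file.exists()
    temp_file.unlink()                   # like `rm`

    empty_dir = base / "empty_dir"
    empty_dir.mkdir()                    # like `mkdir`
    empty_dir.rmdir()                    # like `rmdir`: needs an empty directory

    full_dir = base / "full_dir"
    full_dir.mkdir()
    (full_dir / "notes.txt").write_text("some text")
    try:
        full_dir.rmdir()                 # fails because the directory has a file
        rmdir_failed = False
    except OSError:
        rmdir_failed = True
    shutil.rmtree(full_dir)              # like `rm -r`: removes contents as well

    remaining = list(base.iterdir())

print(file_existed, rmdir_failed, remaining)
```

As with `rm -r`, `shutil.rmtree` does not ask for confirmation, so it is worth double-checking the path before calling it.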

## How to navigate the file system through PowerShell (Windows edition)

This is a rewritten section to reflect the fact that the Windows PowerShell should now include Python with the Anaconda install and even launch Jupyter Lab. Where possible, I would strongly encourage you to use the PowerShell over the Anaconda Prompt or the standard `cmd` command line. You can even run Jupyter by typing `jupyter lab` directly in the `Anaconda PowerShell Prompt`. 

If I write a command it will be preceded by `>` and if I write the expected output, it will be preceded by `|`. So we will want to open a console or 'command line window' and type `dir`. As in don't type the `>`:  
~~~
> dir
~~~ 

or on PowerShell only 
~~~
> ls
~~~

That should list all the files in the current working directory _for the shell_. Notice that it is probably not the same directory as the one you saw above. But let's navigate to our current Python working directory. To do this we use `cd <desired directory>`. 


### To shorten the prompt 
The prompt might be showing you the very long path to your directory, making it hard to type commands. Please note that unlike in the Command Prompt, shortening this name, to the best of my knowledge, requires editing the PowerShell profile. If you wish to do this, [this StackExchange conversation is useful](https://superuser.com/questions/446827/configure-windows-powershell-to-display-only-the-current-folder-name-in-the-shel). 

### Making a new file

You can make a new file here (just the file name) by typing: 

~~~ 
> $null > temp_file.txt
~~~

In this case, `$null` is PowerShell's empty value (meaning send 'nothing'). Normally output would be sent to standard out (i.e. to the terminal screen). But by using `>` in the terminal we are saying send it to a file. 

### Removing a file.

You can delete a file with `Remove-Item` or `rm` command. 
~~~ 
> Remove-Item temp_file.txt
~~~

You can use * to pattern match in the shell. Thus, you can delete multiple files that match a pattern with 
~~~
> Remove-Item *.txt
~~~

This [Microsoft help page](https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.management/remove-item) goes into greater detail on how to remove items under a variety of conditions. 

### Making a new directory
You can make a new directory by typing ```mkdir``` followed by the directory name. 
~~~
> mkdir <directory name>
~~~

### Removing a directory

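Directories have files in them. This means that on Windows they cannot be removed in the shell without also removing the files. 

~~~
> rmdir <directory_name> 
~~~

will only work without complaint on an empty directory. If it has files, then in the classic `cmd` Command Prompt you will need to add the /s argument after the directory name. In PowerShell, where `rmdir` is an alias for `Remove-Item`, the equivalent is `Remove-Item -Recurse <directory_name>`. 

~~~
> rmdir <directory_name> /s
| <directory_name>, Are you sure (Y/N)? Y
~~~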


### The trouble with spaces 

You might notice that my file names do not have spaces. This is because when you are entering commands in a terminal, space is considered a break. So a file named ```FSDS Week1 Lecture1.txt``` would be considered ```FSDS``` as the file name by windows. In order to operate on that file you have to encase it in quotes, which is a nuisance. Such as: 
~~~
> rm "FSDS Week1 Lecture1.txt"
~~~

However, on PowerShell if you type the first few characters of the name and press Tab, it will complete the name and automatically encase it in quotes for you.

# Writing and reading files with Python

## Creating files by creating a file 'opener'

In Python, you typically interact with a file by creating a __file opener__. You do this by calling the ```open()``` command. This command has a number of arguments that determine why you're opening the file: to read? to write? to append? If you forget the argument the default is to open the file for reading. Below we will:
- Open a file for writing; 
- Check if we wrote to it correctly by opening it for reading;
- Open it for appending;
- Check if we did this correctly by reading it again;
- Mercilessly write over our work with more text.

Pay attention to the argument in the `open()` function. For writing it is `'w'`, for reading it is `'r'` and for appending it is `'a'`. There are others as well, but we won't be using them today. They are primarily for __bytestrings__ which is relevant if you are writing image data or other streaming data rather than characters. You can review those in the [doc strings for the open command](https://docs.python.org/3.3/library/functions.html#open)

In [None]:
# Here is the first line from the Tao Te Ching (trans. Stephen Mitchell) 
# It reminds us that in life we can only give guidance but not specific instructions. 

str_to_be_written = '''The tao that can be told
is not the eternal Tao.
The name that can be named
is not the eternal Name.'''

# Writing: 
fileout = open("example_tao.txt",'w')
fileout.write(str_to_be_written)
fileout.close()

# Did it work? Let's open up the file and find out: 
filein = open("example_tao.txt", 'r')
print( filein.read() )
filein.close()

In [None]:
# So far so good, but the second line is also very useful. It reminds us 
# that operationalisations are always an approximation. 
# The real world is unnamed; it simply is. We create names for our use.

str_to_be_appended = '''
The unnamable is the eternally real.
Naming is the origin 
of all particular things.'''

# Appending: 
fileout = open("example_tao.txt",'a')
fileout.write(str_to_be_appended)
fileout.flush()
fileout.close()

# Did it work? Let's open up the file and find out: 
filein = open("example_tao.txt",'r')
print( filein.read() )
filein.close()

### Note: Remember to flush! 

Python might send things off to be written to disk and act as if its job is complete. The operating system might be busy writing other things to disk, however. This means that the operating system could, in some instances, not have written the characters to disk when you assume it has. By writing ```fileout.close()``` we ensure that everything has flushed before we move on. But if you are writing a very big file and are worried you can also periodically use ```fileout.flush()```.
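
A small sketch of what `flush()` is for: after flushing, a second reader can see the text even though the writer is still open. The file name `log.txt` is hypothetical and lives in a temporary directory. What a reader would see *before* the flush depends on buffering, so the example only looks at the state afterwards.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "log.txt"     # hypothetical file name

    fileout = open(log_path, "w")
    fileout.write("first line\n")
    fileout.flush()                      # push Python's buffer out to the file

    # Read the file through a separate handle while the writer is still open.
    seen_after_flush = log_path.read_text()

    fileout.close()                      # close() also flushes whatever is left

print(repr(seen_after_flush))
```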

### Note: Be careful. You can actually damage things with Python. 

The ```os``` package is pretty dangerous as you can literally delete files without question. Python, like most programming languages, will __clobber__ a file name. To clobber means that it will overwrite a file without asking you first. In actuality, the old data may not have disappeared from the disk straight away, but the pointer to that file is lost to the operating system. In practice, especially on encrypted computers, that means it's lost once you overwrite it. __Also...encrypt your computer!__ That means FileVault on Mac and BitLocker on Windows. 
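
If you want Python to refuse to clobber, one option is the `'x'` mode of `open()`, which creates a file only if it does not already exist and raises `FileExistsError` otherwise. The sketch below uses a made-up file name (`precious.txt`) in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "precious.txt"      # hypothetical file name

    with open(target, "w") as fileout:       # 'w' creates (or overwrites) the file
        fileout.write("original text")

    try:
        with open(target, "x") as fileout:   # 'x' refuses if the file exists
            fileout.write("replacement text")
        overwritten = True
    except FileExistsError:
        overwritten = False

    contents = target.read_text()

print(overwritten, contents)
```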

## Reading files in a loop

Often times you will want to read a file line by line rather than use __read()__ which dumps the whole file into memory. To do this, you can use the file opener in a loop, as files, like lists and dictionaries, are __iterable__. Try the following: 

In [None]:
filein = open("example_tao.txt",'r')

for i in filein: 
    print (i)

That seemed to print every line and then a blank line, unlike what happened above. Why is that? It's because it prints the _entire line including the new line character at the end_, and `print` then adds a new line of its own. Remember from day 1 that we can remove characters from a string using `[:-1]`. We can use this to remove the last character. However, sometimes that doesn't work as intended (if there's a `\r\n` for example, which is often the case with Excel documents). Luckily, there's a string method called __strip__ for removing whitespace characters from the ends of a string. As with most methods (outside of those pesky lists), it _returns_ the cleaned string rather than altering the variable in place. 

To remove whitespace from both sides: 

~~~
newvar = strvar.strip()
~~~

To remove it only from the left: 

~~~
newvar = strvar.lstrip()
~~~

But what we _really_ want is to remove the new lines on the right:

~~~
newvar = strvar.rstrip() 
~~~ 
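
To compare these options side by side, here is a short sketch on a single made-up line that has leading spaces and a Windows-style `\r\n` ending. `repr()` is used so that the invisible characters show up when printed.

```python
# A hypothetical line as it might arrive from a file with Windows line endings.
line = "  The tao that can be told\r\n"

print(repr(line[:-1]))      # slicing removes only '\n' and leaves the '\r' behind
print(repr(line.strip()))   # removes whitespace on both sides
print(repr(line.lstrip()))  # removes only the leading spaces
print(repr(line.rstrip()))  # removes '\r\n' but keeps the leading spaces
```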

In [None]:
filein = open("example_tao.txt",'r')

for i in filein: 
    print (i.rstrip())

print(filein.closed)

Now while this works, it is not necessarily the most robust or Pythonic way to open a file. For example, we have created a file opener, but we haven't closed it when we finished. Generally, you'll want to close the file when you're done with it using `<filein>.close()`. However, if you are doing something where you are reading the file line by line, you can condense this by using a `with` statement, such as the following. The with statement will automatically close the file when you exit that block of code. 

In [None]:
with open("example_tao.txt", 'r') as filein: 
    for line in filein:
        print(line.rstrip())

print(filein.closed)

# Running Python in the shell (and Python programs)

## Running Python in the console.

Python consoles are available for you on your computer right now. There are at least two! One of them is in JupyterLab and the other is in the shell (sometimes a shell is called a prompt or a CLI for command line interface). 

To open the Python console in Windows you should be running the PowerShell. On Mac you would run it through the terminal. You should be able to simply type `python` into either shell and receive a header that looks similar to the following: 

~~~
> python
Python 3.8.3 (default, Jul  2 2020, 11:26:31) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
~~~

However, it is a little bit slicker to run ipython. It gives you extra colours, tips, etc...

~~~
>ipython

Python 3.8.3 (default, Jul  2 2020, 11:26:31) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]:
~~~

Fun fact: ipython is the ancestor of JupyterLab. 

## Interacting with the Python shell. 

Regardless of which Python console you use, you can now interact with it by typing Python commands one by one and seeing their output.

Sometimes this is really nice if you just want to test a single line of code or two. Try typing 

~~~
for i in range(10): 
~~~

And see what happens. Notice that it doesn't execute, because it knows that there's going to be another line after the colon. 

You can also use the up and down arrow keys to navigate through previous statements. 

## Creating a Python program to run.  

If you give ```python``` a Python filename as an argument then it will execute that file rather than go to the console. So, since we know how to create files, let's create an incredibly simple one, "test.py", then run that file and see what happens. First run the code in the block below, then type the following in the console (without the '>' of course). 

~~~
> python test.py
~~~

Of course you will have to navigate to the directory of this file. Remember, you can use ```os.getcwd()``` as a quick way to determine this directory. 

In [None]:
# filetext is a string that we write to a program. 
# Normally you would edit a program in a text editor, such as SublimeText or Notepad++ 

import os

filetext = '''print("Hello, yet another world.\\nLet's count to ten!\\n")
for i in range(10): print(i)
'''

fileout = open("test.py",'w')
fileout.write(filetext)
fileout.close()
print("The file can be found by navigating with the following command:\n\ncd \"{}\" ".format(os.getcwd()))

## Running the Python program with file arguments. 

A file that you run from the command line might have some arguments that you want to send it. Imagine that it cleans up mp3 tags; you might want to send it the folder of mp3s. In this case we will just give it some words and then print them, because we're keeping it simple here. 

Now the thing about the arguments is that Python can only read them through a special list from the ```sys``` module, `sys.argv`. 

Below I'm going to write a file that takes arguments, then let's switch over to the console and run it. 

In [None]:
# filetext is a string that we write to a program. 
# Normally you would edit a program in a text editor, such as SublimeText or Notepad++ 

filetext = '''import sys

print('Number of arguments:', len(sys.argv), 'arguments.')
print('Argument List:', str(sys.argv))'''

fileout = open("example_argv.py",'w')
fileout.write(filetext)
fileout.close()
print("The file can be found by navigating with the following command:\n\ncd \"{}\"".format(os.getcwd()))

Now go to the console and navigate to that folder. Then you can run the command below (omitting the '>'). It will work if there's an `example_argv.py` file in that folder. 

~~~ bash
> python example_argv.py arg1 arg34 YetAnotherArgument "is this also an argument?"
~~~

You can see that if you encase words in quotes they are counted as one argument. Then the program will be able to access these as a list of arguments in the `sys.argv` object. The first one (`sys.argv[0]`) will be the file name. The next few will be the argument strings written in the command.  
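
As a sketch of how a script might separate the file name from the rest, the hypothetical helper `describe_arguments` below takes any list shaped like `sys.argv`. It is first called on a made-up list (already split the way the shell would split it) and then on the real `sys.argv`.

```python
import sys

def describe_arguments(argv):
    """Summarise a list shaped like sys.argv (script name first, then arguments)."""
    script_name = argv[0] if argv else "<no script>"
    user_args = argv[1:]
    return f"{script_name} received {len(user_args)} argument(s): {user_args}"

# A made-up command line: the quoted phrase arrives as a single argument.
print(describe_arguments(["example_argv.py", "arg1", "is this also an argument?"]))

# The real arguments of whatever process is running this code.
print(describe_arguments(sys.argv))
```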

## The 'main' statement 

A python program can be imported into other programs. It can have a series of functions or just a line of data. For example, if you type a program with `x = "potatoes"` and save it as `veggies.py`, then you can use the following:
~~~
import veggies

print(veggies.x)
> potatoes
~~~

However, sometimes you will want a file to behave differently depending on whether it is imported by another program or run directly as a program, so that certain code only runs in the second case. Thus, you will see many Python scripts with the following syntax in them: 

~~~
if __name__ == "__main__":
        <additional code>
~~~

For example, `veggies.py` might have a series of useful functions for slicing and dicing veggies. You might want to import them into your food processing program. However, if you run `veggies.py` as a standalone program from the terminal you would expect it not only to define those functions but also to run whatever is included under `if __name__ == "__main__":`. 
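
A hedged sketch of what such a `veggies.py` could look like is below. The `chop` function is invented for illustration and just returns labelled strings as a stand-in for real work.

```python
# Contents of a hypothetical veggies.py

def chop(vegetable, pieces=4):
    """Return a list of labelled pieces (a stand-in for real work)."""
    return [f"{vegetable} piece {n}" for n in range(1, pieces + 1)]

x = "potatoes"

if __name__ == "__main__":
    # This part runs for `python veggies.py` but not for `import veggies`.
    for piece in chop(x, pieces=2):
        print(piece)
```

With this layout, `import veggies` gives another program access to `veggies.chop` and `veggies.x` without printing anything.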

Try yourself to rename the files en masse using the `os.rename()` method. Be careful only to rename the ones you want to! 

# Navigating in Python effectively with `pathlib` 

`pathlib` is a library meant to streamline the way paths are managed in Python. To remind, paths are the names for a file location. Paths become 'objects' in Python. Below are some features of `pathlib` and some examples of how to navigate with this library. The most important to remember is that we import a `Path` object. Then we give that object a location. This location may or may not exist on your computer, but it is still a Path object. You could instantiate `Path('dfasdfjlsfgg/fsadgag')` but it would not work if you try to search there for a file. Depending on the operation, it might return `None`, raise a `FileNotFoundError`, or raise some other error. This matters because we often want to check whether a directory exists and if not, then to create it. 

Fortunately, the method `Path().exists()` is helpful as it returns `True` or `False`. So if you were on Windows, almost certainly `Path('C:/').exists()` would return `True` (written here with a forward slash, since `'C:\'` is not a valid Python string) and on Mac `Path('/Applications').exists()` would similarly return `True`. `'/Applications'` is a string that refers to the directory. `Path('/Applications')` is an object. 

`Path('.')` is a special object that refers to the current directory.

In [None]:
from pathlib import Path 

print(Path('/Applications').exists())
print(Path('dfasdfjlsfgg/fsadgag').exists())

## The current directory 
The 'current' or current working directory is the directory from which commands are run and where files are stored. So if you type `open('fileout.txt','w')` then it will create a file in this working directory with the name `fileout.txt`. You can change the working directory or navigate to other directories, but by default we act as if the directory that runs the code is the working directory. 

To display the working directory in the terminal or PowerShell, you can type `pwd` ("print working directory"). To capture it in Python using the `os` module you would write `os.getcwd()`.  To get it using a Path object, you would write `Path.cwd()`. 
Let's see all of these in action one after the other: 

### Using the `os` module

In [None]:
# This may or may not work on windows. If it does not, please continue. 
import os 

print(os.popen('pwd').read())

Notice that we did not use `os.system` to run the command in the terminal. If you run a terminal command with `os.system` it will run the command but only return the command's exit status (`0` for success, a non-zero number otherwise). Since we wanted the result from the terminal we can use `os.popen`, which opens a pipe from the terminal to here, so the output of the command gets piped in to Python. Then we read that result. It's a bit overcomplicated, so it makes sense that Python offers simpler ways of doing it. 

The first is to use the `os` module directly. Notice that it is now cwd (for current working directory) and not pwd (for print working directory). Because this is an old, old command dating back to Python 1.0, it doesn't use underscores like more recent commands.

In [None]:
print(os.getcwd())

This works well, but there is one problem with this approach. What gets returned is not a path, per se, as in a thing that you can navigate, but a string that represents the _address_ to that path. Just watch:  

In [None]:
result = os.getcwd()
print(type(result))
print(result.split(os.sep))

We used `os.sep` since that will be the correct separator, whether it is Windows or Mac, but it split the string. 

### Using `pathlib` 
You see you can transform this path just like a string. What might help us is to have a path as an object where we do things like navigate the folder structure. Then you can ask that object for the `stem` (meaning the file name without its extension) or add to it using the directory separator `/`. You can check what is the directory 'above' this one or navigate to a directory below. Notice that this directory separator works the same on Windows and Linux. Let's see that below:

In [None]:
from pathlib import Path 

print(Path.cwd()) 

In [None]:
# Expect a TypeError here: a Path is an object, not a string, so it cannot be sliced.
Path.cwd()[:5]

## Features of Path

The working directory is now an object with its own methods. In fact, it's a good place to explore what an object does. For that you can use the 'directory' command or `dir`. It will return both system methods (that begin with `_`) and regular methods. Below I'll use a list comprehension to filter to the non-system methods and display them.

In [None]:
curdir = Path.cwd()
print([mthd for mthd in dir(curdir) if not mthd.startswith("_")])

### Listing path objects

A path can refer to either a directory or a file. A directory contains files and other directories. If directory `A` contains directory `B`, we would say `A` is the parent directory and `B` the child directory. 
- `<pathobject>.is_dir()` will return `True` if the path is a directory and `False` otherwise. 
- `<pathobject>.is_file()` will return `True` if the path is a file and `False` otherwise. 

If the path refers to a directory then you can list the contents of the directory with the method `iterdir()`. Let's observe these below: 

In [None]:
if curdir.is_dir():
    for c,pp in enumerate(curdir.iterdir()):                                  
        if c > 5: break
        print(pp.name)

If you want to check for a specific kind of file, you can use the `glob` command, which refers to `global`. There is also the full `glob` module, but nowadays I actually recommend switching to `<pathobject>.glob()` instead. So I strongly suspect that unless you changed your working directory, this notebook is in there and will end in `.ipynb`. Do you have other notebooks in the same folder? Let's inspect with a wildcard search: 

In [None]:
for i in curdir.glob("Ch.0*.ipynb"):
    print(i.name)

### Parts of a path 

What do you call the specific parts of a path? You can query for the `parent`, `stem`, `suffix`, and the `name` (unshown, but it's `stem` + `suffix`). 
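
As a quick sketch on a made-up path, the block below uses `PurePosixPath`, a `pathlib` class that only manipulates the text of a path, so the file does not need to exist and the result is the same on any operating system.

```python
from pathlib import PurePosixPath

# A hypothetical location for a notebook; nothing is read from disk.
p = PurePosixPath("/Users/work/notes/Ch.06.TheFileSystem.ipynb")

print(p.parent)   # everything except the final component
print(p.name)     # the final component: stem + suffix
print(p.stem)     # the name without the last extension
print(p.suffix)   # the last extension, including the dot
```

Note that only the last dot counts for the suffix, so the stem here still contains `Ch.06.`.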

In [None]:
for i in curdir.glob("Ch.0*.ipynb"): 
    print(i.stem, i.suffix, sep=" >>> ")
else:
    print(f"These are in:{i.parent}")

The parent then is both the directory above and a way to get the part of the path address other than the file name. Thus, you can navigate to the folder above this one and discover what files are in there, too.

In [None]:
for i in curdir.parent.iterdir(): 
    print(i.name)
else:
    print(f"These are in: {i.parent}")

## Recursion: A thorough wildcard search

A wildcard search in one directory will just list the files and directories in that main directory. But what about the files _underneath_ those directories? There are a few ways to approach this. One uses a technique called recursion. We can see this in the exercises for next week once we learn how to write functions. 

For now, we can skip that with a very clever wildcard: `**/*`

In [None]:
for i in curdir.glob("**/*"): 
    if i.is_dir(): 
        print(i)
    else:
        print("FILE_REDACTED", i.suffix,sep=None)

The way that operation worked was to use 'recursion'. This means that it does the same operation within itself. So for each path object, if it is a file it just returns it, but if it is a directory it starts again inside that directory, listing files and descending into any further directories, and so forth. Once it runs out of directories to list, it finishes.
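
To make that description concrete, here is a sketch of the same idea written by hand. The function `list_recursively` is a hypothetical helper, not how `pathlib` itself is implemented, and the small folder tree is created in a temporary directory with invented names. The result is then compared with the `**/*` wildcard.

```python
import tempfile
from pathlib import Path

def list_recursively(directory):
    """Collect every file and directory underneath `directory`."""
    found = []
    for entry in directory.iterdir():
        found.append(entry)
        if entry.is_dir():
            # The function calls itself on each sub-directory: this is recursion.
            found.extend(list_recursively(entry))
    return found

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a" / "b").mkdir(parents=True)
    (root / "top.txt").write_text("top")
    (root / "a" / "middle.txt").write_text("middle")
    (root / "a" / "b" / "deep.txt").write_text("deep")

    by_hand = sorted(p.relative_to(root).as_posix() for p in list_recursively(root))
    by_glob = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*"))

print(by_hand)
print(by_hand == by_glob)
```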

## Changing files and directories with `pathlib`

It is a common practice to create a directory in which you store files, or to check if a directory exists and only try to create it if it does not exist already. You can do this with `<pathobject>.mkdir()`. Let's create a directory below this one. And then we will destroy it.   

In [None]:
(curdir / "temp").exists()

In [None]:
if (curdir / "temp").exists():
    (curdir / "temp").rmdir()
    print("I removed a directory called temp")
else:
    (curdir / "temp").mkdir()
    print("I made a directory called temp")

# Conclusion 

So you might be a little terrified at just how much you can get away with in Python. You can read and write files all over your computer. Your operating system will sandbox some of this but you really are out in the wild here with the ability to read and alter files on your computer. Hopefully, this skill will help you think of small Python projects you might do on your own. 

This is pretty much it for the start of the journey. With these skills you can set out to learn in a variety of directions. The next chapter signposts some places to go. And to remind as usual, in the appendices are exercises. There are no specific exercises for this chapter, but this chapter will really benefit you for the longer exercises which do ask you to think about writing files as well as running scripts. 