<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

There is no choice; you must always interact with your operating system (OS) to get anything done. 

This particularly applies to programming, as you must communicate with the OS to create, modify, move, copy, and delete files and directories (folders). 

This section is devoted to getting you up to speed with some Python modules (e.g., <span style="color:purple">os</span>, <span style="color:purple">glob</span>, <span style="color:purple">shutil</span>) that will allow you to execute these necessary actions. 

We will learn how to write code that will seamlessly run on both macOS and Windows.



# 1 Important concepts

These are some concepts you need to navigate the OS efficiently.

In what follows, please note that I use the terms **folder** and **directory** interchangeably; they refer to the same thing.

## 1.1 Path

When dealing with computers, you will often encounter the term ‘<span style="color:orange">path</span>’. 

The path is simply a way to specify a location on your computer. 

It is like an address, and if you follow the path, it will take you to your file or folder.



Like specifying location, you can specify your path **absolutely** or **relatively**. 

So, for example, I can specify that SPS is located on level 3 of block S16. However, if I am already on Level 5 of S16, I can say, go two floors down. The former is an absolute path, and the latter is relative. I have always found it easier to use relative paths, especially if I later want to move my folders about.



**Remember that the path tells us how to find a file or folder and that you can specify it absolutely or relatively.**
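
To make the distinction concrete, here is a minimal sketch using functions from the standard `os` module. The folder and file names are hypothetical; nothing needs to exist on disk for this to run.

```python
import os

# A relative path: it only makes sense starting from some folder.
# 'data-files' and 'data-01.txt' are hypothetical names.
relative_path = os.path.join('data-files', 'data-01.txt')

# os.path.abspath() anchors the relative path at the current working directory.
absolute_path = os.path.abspath(relative_path)

print(os.getcwd())                    # The folder that relative paths start from
print(os.path.isabs(relative_path))   # False
print(os.path.isabs(absolute_path))   # True
```

The absolute version changes if I move my project folder, which is why relative paths tend to survive a reorganisation better.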



## 1.2 More about relative paths

When dealing with relative paths, you will find it helpful to know <span style="color:purple">.</span> and <span style="color:purple">..</span> notation.



| Notation | Meaning          |
|----------|------------------|
| `.`      | 'this folder'    |
| `..`     | 'one folder above' |


<span style="color:purple">.\data-files\data-01.txt</span> means the file <span style="color:purple">data-01.txt</span> in the folder data-files in the current folder.

<span style="color:purple">..\data-files\data-01.txt</span> means the file <span style="color:purple">data-01.txt</span> in the folder data-files located in the folder above.



Remember <span style="color:purple">.</span> means **current folder**, and <span style="color:purple">..</span> means **one folder up**.
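
As a small illustration, `os.path.normpath()` tidies a path by resolving the `.` and `..` parts. This is only a sketch; `reports` is a hypothetical folder name and no files are touched.

```python
import os

# Go into 'reports', step back up with '..', then into 'data-files'.
path = os.path.join('reports', '..', 'data-files', 'data-01.txt')

# normpath() resolves the '..' so we can see where the path really points.
tidy = os.path.normpath(path)
print(tidy)

# A leading '.' adds nothing: it just means 'start in this folder'.
same_folder = os.path.normpath(os.path.join('.', 'data-files'))
print(same_folder)   # data-files
```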



### macOS or Linux

macOS and Linux allow you to use <span style="color:purple">~</span> to refer to your home directory. So, for example, you can access the <span style="color:purple">Desktop</span> in these systems ‘relatively’ with <span style="color:purple"> ~/Desktop</span>. 

So, I can look for a file on my Desktop using:



<span style="color:purple">~/Desktop/data-01.txt</span>
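
A path is not Python code, so it has to be passed around as a string. Also, Python's `open()` does not expand `~` by itself. A minimal sketch of one way to handle this is `os.path.expanduser()`, which replaces `~` with your home directory; the file name is taken from the example above and is not assumed to exist.

```python
import os

# expanduser() swaps '~' for the home directory of the current user.
home = os.path.expanduser('~')

# Build the full path to the (hypothetical) file on the Desktop.
desktop_file = os.path.join(home, 'Desktop', 'data-01.txt')
print(desktop_file)
```

`os.path.expanduser()` is also available on Windows, so this pattern is not limited to macOS or Linux.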

## 1.3 Path separator

Today’s major OSs (Windows, macOS, Linux) offer similar graphical environments. However, one of the most striking differences between Windows and macOS (or Linux) is the <span style="color:orange">path separator</span>.

Windows uses <span style="color:purple">\</span> as the path separator, while macOS (or Linux) uses <span style="color:purple">/</span>. 

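So, the absolute path to a file on the Desktop on each of these systems will look something like this (the Windows drive letter `C:` is typical, not guaranteed):

| OS      | Absolute path                         |
|---------|---------------------------------------|
| macOS   | `/Users/hazel/Desktop/data-01.txt`    |
| Windows | `C:\Users\hazel\Desktop\data-01.txt`  |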

If you want to share your code and want it to work on both systems, you must not **hardcode** either path separator. 

Later, I will show you how to use the Python <span style="color:purple">os</span> package to fix this problem.



## 1.4 Text files vs. Binary files

You can think of all files on your computer as being either **text files** or **binary files**. 

Text files are simple: almost any software (e.g., Notepad, TextEdit, Jupyter,…) can open them and show their contents. 

Examples of text file formats are <span style="color:purple">.txt</span>, <span style="color:purple">.md</span> or <span style="color:purple">.csv</span>.

Binary files, in contrast, require some processing to make sense of what they contain. 

For example, if you look at the raw data in a  <span style="color:purple">.png</span> file, you will see gibberish. 

In addition, some binary files will only run on specific OSs. For example, the  <span style="color:purple">Excel.app</span> on a Mac will not run on Windows, nor will the <span style="color:purple">Excel.exe</span> file run on macOS (or Linux). Some reasons for having binary files are speed and size; text files, though simple, can get bulky.
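
Here is a small sketch of the difference from Python's point of view. I write the 8-byte signature that every PNG file starts with into a temporary file (the name `fake.png` is hypothetical), read it back in binary mode, and then show that it cannot be decoded as text.

```python
import os
import tempfile

# The first 8 bytes of every PNG file.
png_signature = b'\x89PNG\r\n\x1a\n'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'fake.png')   # Hypothetical file name

    with open(path, 'wb') as file:            # 'wb' = write binary
        file.write(png_signature)

    with open(path, 'rb') as file:            # 'rb' = read binary
        raw = file.read()

print(raw)   # b'\x89PNG\r\n\x1a\n'

# Binary data is generally not valid text.
try:
    raw.decode('utf-8')
    decodes_as_text = True
except UnicodeDecodeError:
    decodes_as_text = False

print(decodes_as_text)   # False
```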



## 1.5 Extensions

Files are usually named to end with an <span style="color:orange">extension</span> separated from the name by a <span style="color:purple">.</span> like <span style="color:purple">name.extension</span>. 

This <span style="color:purple">extension</span> lets the OS know what software or app to use to extract the details in a file. 

For example, <span style="color:purple">.xlsx</span> means use Excel, and <span style="color:purple">.pptx</span> means use PowerPoint. Be careful about changing the extension of a file, as it will make your OS cough and throw a fit. If you don’t believe me, try changing a <span style="color:purple">.xlsx</span> to <span style="color:purple">.txt</span> and double-clicking it.



# 2 Opening and closing files

Let’s look at how we can open a file for reading and writing. 

We will see a slightly advanced but better way of doing this by using the <span style="color:purple">with</span> statement (called a context manager).

## 2.1 Reading data

In [5]:
with open('spectrum-01.txt', 'r') as file:
    file_content = file.read()

print(file_content)

Light Intensity, Ch A vs Actual Angular Position, Run #4
Actual Angular Position (  )	Light Intensity, Ch A ( % max )
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.3
0.000	-0.2
0.000	-0.2
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.004	-0.1
0.010	-0.2
0.018	-0.2
0.024	-0.3
0.029	-0.3
0.033	-0.3
0.036	-0.2
0.039	-0.1
0.043	-0.1
0.047	-0.1
0.053	-0.1
0.060	-0.1
0.066	-0.1
0.069	-0.1
0.073	-0.1
0.076	-0.1
0.079	-0.1
0.081	-0.1
0.082	-0.1
0.083	-0.2
0.083	-0.2
0.086	-0.2
0.090	-0.2
0.095	-0.2
0.100	-0.3
0.103	-0.3
0.104	-0.2
0.105	-0.3
0.107	-0.2
0.110	-0.2
0.115	-0.1
0.122	-0.2
0.128	-0.1
0.134	-0.2
0.139	-0.1
0.144	-0.2
0.150	-0.2
0.157	-0.2
0.164	-0.2
0.170	-0.3
0.175	-0.3
0.180	-0.2
0.185	-0.2
0.191	-0.1
0.195	-0.1
0.198	-0.2
0.201	-0.1
0.204	-0.2
0.206	-0.2
0.208	-0.3
0.210	-0.3
0.213	-0.1
0.217	0.3
0.222	0.6
0.226	0.2
0.230	0.0
0.233	-0.1
0.235	-0.1
0.237	

The <span style="color:purple">open()</span> function ‘opens’ your file. The <span style="color:purple">'r'</span> specifies that I only want to read from the file. Using <span style="color:purple">with</span> frees you from worrying about closing the file after you are done.
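
To see that `with` really does close the file, here is a minimal sketch. It uses a tiny stand-in file (the name `tiny-spectrum.txt` and its two rows are made up for illustration) in a temporary folder, and splits each row at the tab character.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'tiny-spectrum.txt')   # Hypothetical stand-in file

    # Create a small tab-separated file to read back.
    with open(path, 'w') as file:
        file.write('0.000\t-0.2\n0.001\t-0.1\n')

    with open(path, 'r') as file:
        rows = [line.split('\t') for line in file.read().splitlines()]
        print(file.closed)   # False: we are still inside the with block

    print(file.closed)       # True: the file was closed automatically

print(rows)   # [['0.000', '-0.2'], ['0.001', '-0.1']]
```

Note that everything read from a text file is a string; the numbers would still need converting with `float()` before any calculation.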



## 2.2 Writing data

In [7]:
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'

I will use two writing methods just so that I can show you how they work.



### Writing to a file in one go

In [8]:
with open('my-text-once.txt', 'w') as file:
    file.write(text)

You should now have a file <span style="color:purple"> my-text-once.txt</span> in your directory. By the way, the <span style="color:purple">'w'</span> indicates that I am opening the file for writing.
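
One thing to be careful about: `'w'` replaces whatever was already in the file. If I want to add to the end of an existing file instead, I can use `'a'` (append). A minimal sketch, using a hypothetical file `notes.txt` in a temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')   # Hypothetical file name

    with open(path, 'w') as file:
        file.write('first\n')

    with open(path, 'w') as file:              # 'w' again: 'first' is wiped out
        file.write('second\n')

    with open(path, 'a') as file:              # 'a' adds to the end
        file.write('third\n')

    with open(path, 'r') as file:
        content = file.read()

print(content)
```

Only `second` and `third` survive, because the second `'w'` overwrote the first write.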

### Writing to a file, line by line

Let me show you how to write a line at a time. This is useful when dealing with data generated on the fly. Since I don’t have such data now, I will split the lines of the previous text.


In [11]:
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():
        # splitlines() strips the newline, so add it back when writing
        file.write(line + '\n')
# Writing to a file is a slow operation, so doing it inside a loop will slow things down.
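
If the lines are already in memory, one way to avoid many small writes is to join them and write once. Note that `splitlines()` drops the newline characters, so they have to be put back. This is only a sketch; the short `text` here is a stand-in for the longer passage above.

```python
import os
import tempfile

text = 'line one\nline two\nline three'   # Short stand-in text

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'my-text-lines.txt')

    lines = text.splitlines()             # ['line one', 'line two', 'line three']

    with open(path, 'w') as file:
        # Join the lines with newlines and write everything in one call.
        file.write('\n'.join(lines) + '\n')

    with open(path, 'r') as file:
        lines_read_back = file.read().splitlines()

print(lines_read_back)   # ['line one', 'line two', 'line three']
```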

# 3 Some useful packages

| Package | Primarily used for                                          |
|---------|-------------------------------------------------------------|
| `os`    | To 'talk' to the OS to create, modify, delete folders and write OS-agnostic code. |
| `glob`  | To search for files.                                        |
| `shutil`| To copy files.                                              |


I am using <span style="color:purple">os</span> and <span style="color:purple">shutil</span> because <span style="color:purple">shutil</span> offers some functions (e.g., <span style="color:purple">shutil.copy()</span>) that <span style="color:purple">os</span> does not. There are also subtle differences in what these functions do.


In [12]:
import os
import glob
import shutil


# 4 OS safe paths

Consider <span style="color:purple">data-01.txt</span> in the sub-directory <span style="color:purple">sg-data</span> of the directory of <span style="color:purple">all-data</span>.

<span style="color:purple">all-data</span> –> <span style="color:purple">sg-data</span> –> <span style="color:purple">data-01.txt</span>

In [15]:
#To access data-01.txt,

path = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')
print(path)

# os.path.join() inserts / or \ as appropriate for the OS running the code, so the same code runs on macOS, Windows and Linux.

./all-data/sg-data/data-01.txt
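
I can only run this on one OS at a time, but Python ships the Windows and macOS/Linux flavours of `os.path` as the separate modules `ntpath` and `posixpath`. The sketch below uses them purely to preview what `os.path.join()` would produce on each system; in real code I would still just use `os.path.join()`.

```python
import ntpath      # The Windows version of os.path
import posixpath   # The macOS/Linux version of os.path

parts = ('.', 'all-data', 'sg-data', 'data-01.txt')

windows_style = ntpath.join(*parts)
posix_style = posixpath.join(*parts)

print(windows_style)   # .\all-data\sg-data\data-01.txt
print(posix_style)     # ./all-data/sg-data/data-01.txt
```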


# 5 Folders

## 5.1 Creating folders

You can create a folder programmatically using <span style="color:purple">os.mkdir()</span>. 

This is very useful because you can write a tiny bit of code to quickly organise your data. 

For example, let’s say we need to store information about the people ‘John’, ‘Paul’ and ‘Ringo’. I can quickly create some folders for this by:



In [20]:
os.mkdir('ppl')

for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('ppl', person)
    print(f'Creating {path}')
    os.mkdir(path)

Creating ppl/John
Creating ppl/Paul
Creating ppl/Ringo


In [21]:
#You don’t need the print() statement. I have included it so I have some feedback on what is (or is not) happening.

## 5.2 Checking for existence

Python will complain if you try to run this code twice, saying that the file (yes, Python refers to folders as files) already exists. 

So, when you create resources, it is a good idea to check if they already exist. 

There are two ways to do this: use <span style="color:purple">try-except</span> with the <span style="color:purple">FileExistsError</span> or use <span style="color:purple"> os.path.exists()</span>.



### Using try-except

In [26]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('ppl', person)
    try:
        os.mkdir(path)
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')

ppl/John already exists; skipping creation.
ppl/Paul already exists; skipping creation.
ppl/Ringo already exists; skipping creation.


### Using os.path.exists()

In [27]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('ppl', person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')

ppl/John already exists; skipping creation.
ppl/Paul already exists; skipping creation.
ppl/Ringo already exists; skipping creation.
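
A third option, not used above, is `os.makedirs()` with `exist_ok=True`. It quietly does nothing if the folder is already there, and it also creates any missing parent folders (here, `ppl`). The sketch below works inside a temporary folder so that it does not interfere with the folders created earlier.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for person in ['John', 'Paul', 'Ringo']:
        path = os.path.join(base, 'ppl', person)
        os.makedirs(path, exist_ok=True)   # Creates 'ppl' too, if it is missing
        os.makedirs(path, exist_ok=True)   # Running it again raises no error

    created = sorted(os.listdir(os.path.join(base, 'ppl')))

print(created)   # ['John', 'Paul', 'Ringo']
```

The trade-off is that I get no feedback about which folders already existed, so the earlier two approaches are better when that matters.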


## 5.3 Copying files

First, there should be a copy of the 73 logo (sp2273_logo.png) in the current folder. Then, I will copy this into the folders I created for ‘John’, ‘Paul,’ and ‘Ringo’.



In [40]:
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('ppl', person)
    shutil.copy('sp2273_logo.png', path_to_destination)
    print(f'Copied file to {path_to_destination}')

Copied file to ppl/John
Copied file to ppl/Paul
Copied file to ppl/Ringo


In [64]:
print(path_to_destination)

ppl/Ringo
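
Two details of `shutil.copy()` are worth knowing: if the destination is a folder, the copy keeps the original file name, and the function returns the path of the new file. If the destination is a full file path, the copy is renamed. A minimal sketch in a temporary folder, where `logo.png` is a hypothetical stand-in and not a real image:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Make a small stand-in file to copy.
    source = os.path.join(base, 'logo.png')
    with open(source, 'wb') as file:
        file.write(b'not a real image')

    destination_folder = os.path.join(base, 'ppl', 'John')
    os.makedirs(destination_folder)

    # Destination is a folder: the file name is kept.
    copied_to = shutil.copy(source, destination_folder)
    kept_name = os.path.basename(copied_to)

    # Destination is a file path: the copy gets the new name.
    shutil.copy(source, os.path.join(destination_folder, 'logo-copy.png'))

    names = sorted(os.listdir(destination_folder))

print(kept_name)   # logo.png
print(names)       # ['logo-copy.png', 'logo.png']
```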


# 6 Listing and looking for files

If I want to know what files are in a folder, then <span style="color:purple">glob</span> makes easy work of this. Let me show you how to use it.



## Example 1

I use this if I want all the files in the current directory.

The <span style="color:purple">*</span> is called a <span style="color:orange">wildcard</span> and is read as **‘anything’**. So, I am asking <span style="color:purple">glob</span> to give me anything in the folder.



In [41]:
glob.glob('*')

['my-text-once.txt',
 'ppl',
 'files,_folders_&_os_(need).ipynb',
 'sp2273_logo.png',
 'spectrum-01.txt',
 'people',
 'my-text-lines.txt']

## Example 2

In [42]:
glob.glob('pp*')

['ppl']

In [43]:
glob.glob('peo*')

['people']

In [44]:
#I want to refine my search and ask glob to give only those files that match the pattern ‘pp’ followed by ‘anything’.
#I want to refine my search and ask glob to give only those files that match the pattern ‘peo’ followed by ‘anything’

## Example 3

In [45]:
#I now want to know what is inside the folders that start with peo.
glob.glob('peo*/*')

['people/Paul', 'people/John', 'people/Ringo']

## Example 4

In [46]:
#Now, I want to see the whole, detailed structure of the folder people. 
#For this, I need to tell glob to search recursively (i.e. dig through all sub-directories) by putting recursive=True.

#I must also use two wildcards ** to say all ‘sub-directories’.

glob.glob('people/**', recursive=True)

['people/', 'people/Paul', 'people/John', 'people/Ringo']

## Example 5

In [48]:
#I want only the .png files. So, I just need to modify my pattern. 
#I am asking glob to go through the whole structure of ppl and show me those files with the pattern ‘anything’.png!

glob.glob('ppl/**/*.png', recursive=True)

['ppl/Paul/sp2273_logo.png',
 'ppl/John/sp2273_logo.png',
 'ppl/Ringo/sp2273_logo.png']
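
The patterns above hardcode `/`. To stay consistent with the OS-safe paths from earlier, the pattern itself can be built with `os.path.join()`. Since `glob` makes no promise about the order of its results, I also sort them. This is a sketch on a throwaway folder structure; all file names are hypothetical.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Build a small folder structure to search through.
    for person in ['John', 'Paul']:
        folder = os.path.join(base, 'ppl', person)
        os.makedirs(folder)
        for name in ['logo.png', 'notes.txt']:
            with open(os.path.join(folder, name), 'w') as file:
                file.write('placeholder')

    # Same idea as 'ppl/**/*.png', but with the separator chosen by the OS.
    pattern = os.path.join(base, 'ppl', '**', '*.png')
    matches = sorted(glob.glob(pattern, recursive=True))

    # Show the results relative to the temporary folder.
    relative_matches = [os.path.relpath(match, base) for match in matches]

print(relative_matches)
```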

# 7 Extracting file info

When dealing with files and folders, you often have to extract the filename, folder or extension. 

You can do this by simple string manipulation; for example if I want the filename and extension:



In [49]:
path = 'ppl/Ringo/imgs/sp2273_logo.png'
filename = path.split(os.path.sep)[-1]
extension = filename.split('.')[-1]
print(filename, extension)

sp2273_logo.png png


<span style="color:purple">os.path.sep</span> is the path separator (i.e. <span style="color:purple">\ </span> or <span style="color:purple">/</span>)  for the OS. 

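I split the path where the separator occurred and picked the last element in the list. I use a similar strategy for the file extension.

Alternatively, <span style="color:purple">os.path</span> provides functions that do this splitting for you: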


In [51]:
path = 'ppl/Ringo/imgs/sp2273_logo.png'

In [52]:
os.path.split(path)      # Split filename from the rest

('ppl/Ringo/imgs', 'sp2273_logo.png')

In [53]:
os.path.splitext(path)   # Split extension

('ppl/Ringo/imgs/sp2273_logo', '.png')

In [54]:
os.path.dirname(path)    # Show the directory

'ppl/Ringo/imgs'
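
There is one catch with the string manipulation approach: the path above is written with `/`, so splitting at `os.path.sep` only works where the separator is `/`. The sketch below uses `ntpath.sep` to stand in for what `os.path.sep` would be on Windows, and shows that the `os.path` functions still cope.

```python
import ntpath   # The Windows version of os.path
import os

path = 'ppl/Ringo/imgs/sp2273_logo.png'

# os.path functions pull the pieces apart without manual splitting.
filename = os.path.basename(path)
stem, extension = os.path.splitext(filename)
print(filename, stem, extension)   # sp2273_logo.png sp2273_logo .png

# On Windows the separator is a backslash, so there is nothing to split on here.
pieces = path.split(ntpath.sep)
print(len(pieces))   # 1
```

Note that `os.path.splitext()` keeps the dot in the extension, unlike the `split('.')` approach.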

# 8 Deleting stuff

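In [57]:
os.remove('ppl/Ringo/sp2273_logo.png')   # Delete a single file

In [61]:
os.rmdir('ppl/Ringo')                    # Delete a folder; it must be empty

In [63]:
shutil.rmtree('ppl/John')                # Delete a folder and everything inside it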

It goes without saying that you should be careful when using these functions. 

With great power comes great responsibility, so use with extreme caution!
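
One cautious habit is to look at what is inside a folder before removing it. The sketch below, which works in a throwaway temporary folder with hypothetical names, also shows that `os.rmdir()` refuses to delete a folder that still has something in it, while `shutil.rmtree()` does not hesitate.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, 'ppl', 'Ringo')
    os.makedirs(folder)
    with open(os.path.join(folder, 'notes.txt'), 'w') as file:
        file.write('placeholder')

    # os.rmdir() only removes empty folders.
    try:
        os.rmdir(folder)
        rmdir_refused = False
    except OSError:
        rmdir_refused = True

    print(rmdir_refused)        # True
    print(os.listdir(folder))   # ['notes.txt'] : check before deleting

    # shutil.rmtree() removes the folder and its contents, with no undo.
    shutil.rmtree(folder)
    still_there = os.path.exists(folder)

print(still_there)   # False
```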

