<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

#  Important concepts

## Path

A path specifies a location on the computer; following it takes you to a file or folder.
Paths can be specified absolutely or relatively: an absolute path starts from the root of the file system (or the drive, on Windows), while a relative path starts from the current folder.
Example of an absolute path: C:\NUS\Current Mods\SP2273\Learning portfolio\learning-portfolio-ishansgill

##  More about relative paths

- . means current folder
- .. means one folder above
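
To see what . and .. resolve to, here is a minimal sketch using os.path.abspath(), which turns a relative path into an absolute one based on the current working directory. The names data and results.txt are made up for illustration.

```python
import os

# Hypothetical relative paths, used only for illustration
here = '.'
parent = '..'
sibling = os.path.join('..', 'data', 'results.txt')

# os.path.abspath() resolves a relative path against the current working directory
cwd = os.getcwd()
print(os.path.abspath(here))      # the current folder
print(os.path.abspath(parent))    # the folder that contains the current folder
print(os.path.abspath(sibling))   # a file in a folder next to the current one
```

The printed paths depend on where the code is run from, which is exactly what makes relative paths portable between computers.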

### macOS or Linux

I am a Windows user, but macOS and Linux allow ~ to refer to the home directory.
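
In Python, ~ is not expanded automatically inside a string. A small sketch, assuming the standard os.path.expanduser() function (which also works on Windows); the file name notes.txt is hypothetical:

```python
import os

# '~' is only a shorthand; os.path.expanduser() turns it into the real home directory
home = os.path.expanduser('~')
notes = os.path.expanduser(os.path.join('~', 'notes.txt'))   # hypothetical file name

print(home)
print(notes)
```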


## Path separator

Windows uses \ as the path separator, while macOS and Linux use /.

Absolute path to a folder on Windows:
C:\NUS\Current Mods\SP2273\Learning portfolio\learning-portfolio-ishansgill

On mac or linux it would look like /NUS/Current Mods/SP2273/Learning portfolio/learning-portfolio-ishansgill

Try not to hardcode either path separator so that the code works on both systems.

##  Text files vs. Binary files

- You can think of all files on your computer as being either text files or binary files.
- Text files are simple and can be opened, and their contents examined, by almost any software (e.g., Notepad, TextEdit, Jupyter,…). Examples of text file formats are .txt, .md or .csv.
- Binary files, in contrast, require some processing to make sense of what they contain. For example, if you look at the raw data in a .png file, you will see gibberish. In addition, some binary files will only run on specific OSs. For example, the Excel.app on a Mac will not run on Windows, nor will the Excel.exe file run on macOS (or Linux). Some reasons for having binary files are speed and size; text files, though simple, can get bulky.
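
The difference shows up in how Python opens a file. In this rough sketch the same file is read in text mode ('r') and in binary mode ('rb'); the file name demo.txt and its contents are made up, and a temporary folder is used so nothing is left behind.

```python
import os
import tempfile

# Write a short message to a temporary file, then read it back two ways
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')   # hypothetical file name

    with open(path, 'w', encoding='utf-8') as file:
        file.write('café')

    with open(path, 'r', encoding='utf-8') as file:
        as_text = file.read()       # decoded into a str

    with open(path, 'rb') as file:
        as_bytes = file.read()      # raw bytes, no decoding

print(as_text)    # café
print(as_bytes)   # b'caf\xc3\xa9'
```

Text mode decodes the bytes into characters for you; binary mode hands over the raw bytes and leaves the interpretation to your code.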

##  Extensions

Files are usually named to end with an extension separated from the name by a . like name.extension. This extension lets the OS know what software or app to use to extract the details in a file. For example, a .xlsx means use Excel or .pptx means use PowerPoint. Be careful about changing the extension of a file: the contents stay the same, so the OS may then try to open it with the wrong app.

# Opening and closing files

Done by using a with statement (context manager).

##  Reading data

In [3]:
with open('spectrum-01.txt', 'r') as file:
    file_content = file.read()

print(file_content)

Light Intensity, Ch A vs Actual Angular Position, Run #4
Actual Angular Position (  )	Light Intensity, Ch A ( % max )
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.3
0.000	-0.2
0.000	-0.2
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.004	-0.1
0.010	-0.2
0.018	-0.2
0.024	-0.3
0.029	-0.3
0.033	-0.3
0.036	-0.2
0.039	-0.1
0.043	-0.1
0.047	-0.1
0.053	-0.1
0.060	-0.1
0.066	-0.1
0.069	-0.1
0.073	-0.1
0.076	-0.1
0.079	-0.1
0.081	-0.1
0.082	-0.1
0.083	-0.2
0.083	-0.2
0.086	-0.2
0.090	-0.2
0.095	-0.2
0.100	-0.3
0.103	-0.3
0.104	-0.2
0.105	-0.3
0.107	-0.2
0.110	-0.2
0.115	-0.1
0.122	-0.2
0.128	-0.1
0.134	-0.2
0.139	-0.1
0.144	-0.2
0.150	-0.2
0.157	-0.2
0.164	-0.2
0.170	-0.3
0.175	-0.3
0.180	-0.2
0.185	-0.2
0.191	-0.1
0.195	-0.1
0.198	-0.2
0.201	-0.1
0.204	-0.2
0.206	-0.2
0.208	-0.3
0.210	-0.3
0.213	-0.1
0.217	0.3
0.222	0.6
0.226	0.2
0.230	0.0
0.233	-0.1
0.235	-0.1
0.237	

The open() function ‘opens’ your file. The 'r' specifies that I only want to read from the file. Using with frees you from worrying about closing the file after you are done.
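
To check that with really closes the file, here is a small sketch that inspects the file object's closed attribute inside and after the block. It uses a temporary folder and a made-up file name (demo.txt) rather than spectrum-01.txt.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')   # hypothetical file name

    with open(path, 'w') as file:
        file.write('first line\nsecond line\n')

    with open(path, 'r') as file:
        lines = file.readlines()        # one list item per line
        closed_inside = file.closed     # still open inside the block

    closed_after = file.closed          # closed automatically on leaving the block

print(closed_inside, closed_after)   # False True
print(lines)                         # ['first line\n', 'second line\n']
```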

##  Writing data

In [4]:
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'

### Writing to a file in one go

In [5]:
with open('my-text-once.txt', 'w') as file:
    file.write(text)


'w' indicates that I am opening the file for writing. Note that this overwrites the file if it already exists.

This is the text in my-text-once.txt: Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.
Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.

### Writing to a file, line by line

In [6]:
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():
        file.write(line + '\n')   # splitlines() drops the newline, so add it back


- Useful when dealing with data generated on the fly
- Could slow things down in a loop, as writing to a file is a slow operation
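
One way to weigh the two approaches is to write the same data both ways and compare the results. This is only a sketch: the readings list stands in for data generated on the fly, and the file names are made up.

```python
import os
import tempfile

# Hypothetical data generated on the fly
readings = [f'{i}\t{i ** 2}' for i in range(5)]

with tempfile.TemporaryDirectory() as folder:
    path_loop = os.path.join(folder, 'loop.txt')
    path_once = os.path.join(folder, 'once.txt')

    # Option 1: one write() call per line
    with open(path_loop, 'w') as file:
        for reading in readings:
            file.write(reading + '\n')

    # Option 2: build the full text first, then write it in one go
    with open(path_once, 'w') as file:
        file.write('\n'.join(readings) + '\n')

    with open(path_loop, 'r') as file:
        content_loop = file.read()
    with open(path_once, 'r') as file:
        content_once = file.read()

print(content_loop == content_once)   # True
```

Both files end up identical. Option 1 suits data that arrives piece by piece, while option 2 needs all the text in memory before anything is written.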

#  Some useful packages

- os: To ‘talk’ to the OS to create, modify, delete folders and write OS-agnostic code.
- glob: To search for files.
- shutil: To copy files.

**shutil offers some functions (e.g., shutil.copy()) that os does not.**

In [7]:
import os
import glob
import shutil

# OS safe paths

Consider a file data-01.txt in the sub-directory sg-data of the directory all-data.

- all-data
  - sg-data
    - data-01.txt

If I want to access data-01.txt all I have to do is:

In [8]:
path = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')
print(path)

.\all-data\sg-data\data-01.txt


In [12]:
path = os.path.join('.', 'NUS', 'Current Mods', 'SP2273', 'Learning Portfolio')
print(path)

.\NUS\Current Mods\SP2273\Learning Portfolio


#  Folders

##  Creating folders

You can create a folder programmatically using os.mkdir(). This is very useful because you can write a tiny bit of code to quickly organise your data. For example, let’s say we need to store information about the people ‘Ishan’, ‘Keith’ and ‘Lippy’. I can quickly create some folders for this by:

In [13]:
os.mkdir('people')

for person in ['Ishan', 'Keith', 'Lippy']:
    path = os.path.join('people', person)
    print(f'Creating {path}')
    os.mkdir(path)

Creating people\Ishan
Creating people\Keith
Creating people\Lippy


The print statement isn't necessary; it's just for us to see what's happening here.

##  Checking for existence

Python will complain if you try to run this code twice, saying that the file (yes, Python refers to folders as files) already exists. So, when you create resources, it is a good idea to check if they already exist. There are two ways to do this: use try-except with the FileExistsError or use os.path.exists()

### Using try-except

In [14]:
for person in ['Ishan', 'Keith', 'Lippy']:
    path = os.path.join('people', person)
    try:
        os.mkdir(path)
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')

people\Ishan already exists; skipping creation.
people\Keith already exists; skipping creation.
people\Lippy already exists; skipping creation.


### Using os.path.exists()

In [15]:
for person in ['Ishan', 'Keith', 'Lippy']:
    path = os.path.join('people', person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')


people\Ishan already exists; skipping creation.
people\Keith already exists; skipping creation.
people\Lippy already exists; skipping creation.


In [16]:
for person in ['Ishan', 'Keith', 'Juinshin']:
    path = os.path.join('people', person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')

people\Ishan already exists; skipping creation.
people\Keith already exists; skipping creation.
Creating people\Juinshin
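
A third option, not covered in the notes above, is os.makedirs() with exist_ok=True. It creates any missing parent folders and stays quiet if the folder is already there. A sketch, run inside a temporary folder so the real people folder is untouched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for person in ['Ishan', 'Keith', 'Lippy']:
        path = os.path.join(base, 'people', person)
        os.makedirs(path, exist_ok=True)   # also creates 'people' if it is missing
        os.makedirs(path, exist_ok=True)   # running it again raises no error
    created = sorted(os.listdir(os.path.join(base, 'people')))

print(created)   # ['Ishan', 'Keith', 'Lippy']
```

The trade-off is that there is no message telling me whether a folder was created or skipped, so the explicit checks are still useful when I want that feedback.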


## Copying files

Copying files programmatically:

In [17]:
for person in ['Ishan', 'Keith', 'Lippy', 'Juinshin']:
    path_to_destination = os.path.join('people', person)
    shutil.copy('sp2273_logo_gray.png', path_to_destination)
    print(f'Copied file to {path_to_destination}')

Copied file to people\Ishan
Copied file to people\Keith
Copied file to people\Lippy
Copied file to people\Juinshin


Let’s say I want all the images in a sub-folder called imgs in each person’s directory. I can do this by first creating the folders imgs and then moving the logo file into that folder:

In [18]:
for person in ['Ishan', 'Keith', 'Lippy', 'Juinshin']:
    # Create folder 'imgs'
    path_to_imgs = os.path.join('people', person, 'imgs')
    if not os.path.exists(path_to_imgs):
        os.mkdir(path_to_imgs)

    # Move logo file
    current_path_of_logo = os.path.join('people', person, 'sp2273_logo_gray.png')
    new_path_of_logo = os.path.join('people', person, 'imgs', 'sp2273_logo_gray.png')

    shutil.move(current_path_of_logo, new_path_of_logo)
    print(f'Moved logo to {new_path_of_logo}')

Moved logo to people\Ishan\imgs\sp2273_logo_gray.png
Moved logo to people\Keith\imgs\sp2273_logo_gray.png
Moved logo to people\Lippy\imgs\sp2273_logo_gray.png
Moved logo to people\Juinshin\imgs\sp2273_logo_gray.png
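
The difference between copying and moving can be checked with os.path.exists(). In this sketch a small text file (logo.txt, a made-up stand-in for the logo) is copied into a folder and the copy is then moved; both shutil.copy() and shutil.move() return the path of the new file.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # A stand-in for the logo file (hypothetical name and contents)
    source = os.path.join(base, 'logo.txt')
    with open(source, 'w') as file:
        file.write('pretend this is an image')

    folder = os.path.join(base, 'imgs')
    os.mkdir(folder)

    # Copying into a folder keeps the file name; the new path is returned
    copied = shutil.copy(source, folder)
    source_still_there = os.path.exists(source)

    # Moving removes the file from its old location
    moved = shutil.move(copied, os.path.join(base, 'moved.txt'))
    copy_still_there = os.path.exists(copied)
    moved_exists = os.path.exists(moved)

print(source_still_there, copy_still_there, moved_exists)   # True False True
```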


#  Listing and looking for files

**Example 1**

Use this if I want all the files in the current directory.

The * is called a wildcard and is read as ‘anything’. So, I am asking glob to give me anything in the folder.

In [19]:
glob.glob('*')

['files,_folders_&_os_(need).ipynb',
 'my-text-lines.txt',
 'my-text-once.txt',
 'people',
 'sp2273_logo_gray.png',
 'spectrum-01.txt']

**Example 2**

I want to refine my search and ask glob to give only those files that match the pattern ‘peo’ followed by ‘anything’.

In [21]:
glob.glob('peo*')

['people']

**Example 3**

I now want to know what is inside the folders that start with peo.

In [22]:
glob.glob('peo*/*')

['people\\Ishan', 'people\\Juinshin', 'people\\Keith', 'people\\Lippy']

**Example 4**

Now, I want to see the whole, detailed structure of the folder people. For this, I need to tell glob to search recursively (i.e. dig through all sub-directories) by putting recursive=True.

I must also use two wildcards ** to say all ‘sub-directories’.

In [23]:
glob.glob('people/**', recursive=True)

['people\\',
 'people\\Ishan',
 'people\\Ishan\\imgs',
 'people\\Ishan\\imgs\\sp2273_logo_gray.png',
 'people\\Juinshin',
 'people\\Juinshin\\imgs',
 'people\\Juinshin\\imgs\\sp2273_logo_gray.png',
 'people\\Keith',
 'people\\Keith\\imgs',
 'people\\Keith\\imgs\\sp2273_logo_gray.png',
 'people\\Lippy',
 'people\\Lippy\\imgs',
 'people\\Lippy\\imgs\\sp2273_logo_gray.png']

**Example 5**

I want only the .png files. So, I just need to modify my pattern. I am asking glob to go through the whole structure of people and show me those files with the pattern ‘anything’.png!

In [24]:
glob.glob('people/**/*.png', recursive=True)

['people\\Ishan\\imgs\\sp2273_logo_gray.png',
 'people\\Juinshin\\imgs\\sp2273_logo_gray.png',
 'people\\Keith\\imgs\\sp2273_logo_gray.png',
 'people\\Lippy\\imgs\\sp2273_logo_gray.png']
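
The same kind of search can be written without hardcoding / in the pattern by building it with os.path.join(). The sketch below sets up a small, made-up folder tree in a temporary folder and sorts the matches, since glob does not promise any particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Build a small hypothetical folder tree to search
    for name in ['a', 'b']:
        folder = os.path.join(base, 'people', name, 'imgs')
        os.makedirs(folder)
        for filename in ['logo.png', 'notes.txt']:
            with open(os.path.join(folder, filename), 'w') as file:
                file.write('')

    # '**' with recursive=True digs through all sub-directories
    pattern = os.path.join(base, 'people', '**', '*.png')
    matches = sorted(glob.glob(pattern, recursive=True))

    # Show paths relative to the temporary folder to keep the output short
    relative = [os.path.relpath(match, base) for match in matches]

print(relative)
```

Only the two .png files are listed; the .txt files are filtered out by the pattern.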

#  Extracting file info

When dealing with files and folders, you often have to extract the filename, folder or extension. You can do this by simple string manipulation; for example if I want the filename and extension:

In [25]:
path = os.path.join('people', 'Lippy', 'imgs', 'sp2273_logo_gray.png')
filename = path.split(os.path.sep)[-1]   # Last piece after splitting at the separator
extension = filename.split('.')[-1]      # Last piece after splitting at the dots
print(filename, extension)

sp2273_logo_gray.png png


os.path.sep is the path separator (i.e. \ or /) for the OS. The path is split where the separator occurs and the last element in the list is picked. This only works if the path was actually written with that separator. A similar strategy is used for the file extension.

os provides some simple functions for these tasks.

In [26]:
path = 'people/Lippy/imgs/sp2273_logo_gray.png'

In [27]:
os.path.split(path)      # Split filename from the rest

('people/Lippy/imgs', 'sp2273_logo_gray.png')

In [28]:
os.path.splitext(path)   # Split extension

('people/Lippy/imgs/sp2273_logo_gray', '.png')

In [29]:
os.path.dirname(path)    # Show the directory

'people/Lippy/imgs'
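
These functions can be chained. As a rough sketch, here is how the folder, name and extension could be pulled apart and put back together to build the path of a renamed copy (the _copy suffix is made up for illustration):

```python
import os

path = os.path.join('people', 'Lippy', 'imgs', 'sp2273_logo_gray.png')

folder, filename = os.path.split(path)          # folder and file name
stem, extension = os.path.splitext(filename)    # name without extension, and extension

print(filename)    # sp2273_logo_gray.png
print(stem)        # sp2273_logo_gray
print(extension)   # .png

# Build the path of a hypothetical renamed copy in the same folder
new_path = os.path.join(folder, stem + '_copy' + extension)
print(new_path)
```

Note that os.path.splitext() keeps the dot in the extension, unlike the string-splitting approach.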

#  Deleting stuff

If you want to remove a file:

In [None]:
os.remove('people/Lippy/imgs/sp2273_logo_gray.png')

This won’t work with directories. For an empty directory, use:

In [None]:
os.rmdir('people/Lippy/imgs')

For a directory with files, use shutil:

In [None]:
shutil.rmtree('people/Lippy')

**USE WITH EXTREME CAUTION**: shutil.rmtree() permanently deletes the folder and everything inside it.
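
One way to be careful is to wrap the deletion in a check. The helper below, remove_folder(), is hypothetical (it is not part of os or shutil); it only calls shutil.rmtree() when the target is a folder strictly inside a parent folder that I name explicitly. The demonstration runs in a temporary folder.

```python
import os
import shutil
import tempfile

def remove_folder(path, allowed_parent):
    """Delete path with shutil.rmtree() only if it is a folder inside allowed_parent."""
    path = os.path.abspath(path)
    allowed_parent = os.path.abspath(allowed_parent)
    try:
        inside = os.path.commonpath([path, allowed_parent]) == allowed_parent
    except ValueError:
        # Raised on Windows when the two paths are on different drives
        inside = False
    if inside and path != allowed_parent and os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False

with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, 'people', 'Lippy', 'imgs'))
    removed = remove_folder(os.path.join(base, 'people', 'Lippy'), base)
    refused = remove_folder(base, base)      # refuses to delete the parent itself
    remaining = os.listdir(os.path.join(base, 'people'))

print(removed, refused, remaining)   # True False []
```

This does not make deletion reversible; it only reduces the chance of pointing rmtree at the wrong place.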