<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

In [3]:
!dir #windows

zsh:1: command not found: dir


In [5]:
!pwd #mac

/Users/teresaleong/Desktop/learning-portfolio-TeresaGraceL/files, folders & os


# What to expect in this chapter

In programming, you must communicate with the OS to create, modify, move, copy, and delete files and directories (folders). Different operating systems (e.g. macOS, Windows) have very different file structures, so our code must be written to run on both. There are also many instances where we need to read and write files.

# 1 Important concepts

folder and directory refer to the same thing here

## 1.1 Path

a way to specify the location on the computer (like an address, will take you to file or folder)  

tells us how to find a file or folder, can specify it two ways:

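- absolutely (the full location, starting from the root or drive; ties the code to your file structure)
- relatively (the location starting from the current folder; easier to use, especially if later moving folders about)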

Example: absolute path to a file on the desktop on Windows  
C:\Users\Chammika\Desktop\data-01.txt
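A minimal sketch of the difference between the two kinds of path, using only functions from os.path. The folder and file names are hypothetical and the file does not need to exist, because these functions only work on the path text.

```python
import os

# Hypothetical relative path; nothing is read from disk here
relative_path = os.path.join('all-data', 'data-01.txt')

# abspath() resolves the relative path against the current working directory
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))   # False
print(os.path.isabs(absolute_path))   # True
print(absolute_path)                  # depends on where the code is run
```

The last line prints something different on every computer, which is why absolute paths written directly into code tend to break when the code is shared.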

## 1.2 More about relative paths

.	‘this folder’  
..	‘one folder above/ up’

eg ../../../data.txt (go 3 folders up and look for this file)
- this can cause problems if your file structure is not stable/ things are moved around
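To see what a chain of .. actually points to, os.path.normpath() can collapse it. This is a small sketch with a made-up folder structure (project, code, notebooks and week-01 are assumptions for illustration only).

```python
import os

# Hypothetical nested folder, built with os.path.join so it works on any OS
start = os.path.join('project', 'code', 'notebooks', 'week-01')

# Go three folders up from 'week-01' and look for data.txt
messy = os.path.join(start, '..', '..', '..', 'data.txt')

# normpath() resolves each '..' against the folder before it
tidy = os.path.normpath(messy)

print(messy)
print(tidy)
```

Here the three .. steps cancel week-01, notebooks and code, leaving data.txt directly inside project.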

### macOS or Linux

macOS and Linux allow you to use ~ to refer to your home directory.

e.g. look for a file on the desktop using ~/Desktop/data-01.txt
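Python's open() does not expand ~ by itself, so a path like the one above usually needs os.path.expanduser() first. A minimal sketch, assuming a file called data-01.txt on the desktop (the file may not exist on your computer; only the path is built here):

```python
import os

# Replace '~' with the actual home directory of the current user
home = os.path.expanduser('~')

# Build the full path to the (possibly non-existent) desktop file
desktop_file = os.path.join(home, 'Desktop', 'data-01.txt')

print(home)
print(desktop_file)
```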

## 1.3 Path separator

Windows uses \ as the path separator while macOS or Linux uses /.  
if you want your code to work on both systems you must not hardcode either path separator.

Python's os module can fix this problem.
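A quick sketch of how the os module exposes the separator. os.sep holds the separator of the OS the code is running on, and os.path.join() uses it automatically. The folder names are hypothetical.

```python
import os

# '/' on macOS or Linux, '\\' on Windows
print(repr(os.sep))

# Hardcoded version: only correct where the separator is '/'
hardcoded = 'all-data/sg-data/data-01.txt'

# Portable version: os.path.join() inserts the right separator
portable = os.path.join('all-data', 'sg-data', 'data-01.txt')

# The two agree once '/' is swapped for this OS's separator
print(portable == hardcoded.replace('/', os.sep))   # True
```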

## 1.4 Text files vs. Binary files

Text files are simple and can be opened, and their contents examined by almost any software (eg .txt, .md, .csv).  

Binary files need some processing to make sense of what they contain. If we look at the raw data in, say, a png file, we cannot understand it directly; we must know the format to be able to extract the information. An important reason for using binary files is size, because patterns can be compressed and more data can be stored in less space.
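One way to see the difference is to open the same file in text mode ('r') and in binary mode ('rb'). This sketch writes a tiny file into a temporary folder (so nothing in the portfolio is touched) and reads it back both ways. The file name note.txt is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'note.txt')

    # Write four characters as UTF-8 text
    with open(path, 'w', encoding='utf-8') as file:
        file.write('café')

    # Text mode decodes the bytes back into characters
    with open(path, 'r', encoding='utf-8') as file:
        as_text = file.read()

    # Binary mode returns the raw bytes stored on disk
    with open(path, 'rb') as file:
        as_bytes = file.read()

print(len(as_text), len(as_bytes))   # 4 5
print(as_bytes)                      # b'caf\xc3\xa9'
```

The é takes two bytes in UTF-8, so the raw bytes only make sense if we know the encoding. Formats like png work the same way, just with a far more involved format.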

## 1.5 Extensions

Files are usually named to end with an extension (name.extension).  
This lets the operating system know what software or app to use to open the file.
Be careful about changing the file extension: renaming does not convert the contents, and the OS may no longer know what to do with the file.

# 2 Opening and closing files

## 2.1 Reading data

In [6]:
with open('spectrum-01.txt', 'r') as file:
    file_content = file.read()

print(file_content)

FileNotFoundError: [Errno 2] No such file or directory: 'spectrum-01.txt'

- The open() function ‘opens’ your file.
- The 'r' specifies we only want to read from the file. Using with frees us from worrying about closing the file after we are done.
- good to state the mode explicitly ('r' here) so as not to overwrite files by accident.
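The FileNotFoundError above usually means the file is not in the current working directory. One possible way to handle it is with try-except, sketched below. read_text() is a hypothetical helper written for this example, and the files live in a temporary folder.

```python
import os
import tempfile


def read_text(path):
    """Return the contents of a text file, or None if the file is missing."""
    try:
        with open(path, 'r') as file:
            return file.read()
    except FileNotFoundError:
        print(f'{path} not found; check the working directory.')
        return None


with tempfile.TemporaryDirectory() as folder:
    # A file that exists
    existing = os.path.join(folder, 'demo.txt')
    with open(existing, 'w') as file:
        file.write('hello')

    found = read_text(existing)
    missing = read_text(os.path.join(folder, 'no-such-file.txt'))

print(found)     # hello
print(missing)   # None
```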



## 2.2 Writing data

In [7]:
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'

### Writing to a file in one go

In [10]:
with open('my-text-once.txt', 'w') as file:
    file.write(text)

- w indicates opening the file for writing

### Writing to a file, line by line

In [9]:
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():
        file.write(line + '\n')  # splitlines() drops the newline, so add it back

- useful when dealing with data generated on the fly
- writing to a file is a slow operation, so doing it many times in a loop slows the code down
- 'w' overwrites the file if it already exists, so be careful not to overwrite important files
- 'a' appends to the end of the file instead
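A small sketch contrasting the two modes. It works in a temporary folder and the file name log.txt is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'log.txt')

    # 'w' creates the file (or wipes it if it already exists)
    with open(path, 'w') as file:
        file.write('first\n')

    # 'a' keeps what is there and adds to the end
    with open(path, 'a') as file:
        file.write('second\n')

    with open(path, 'r') as file:
        after_append = file.read()

    # Opening with 'w' again throws away both earlier lines
    with open(path, 'w') as file:
        file.write('third\n')

    with open(path, 'r') as file:
        after_overwrite = file.read()

print(repr(after_append))      # 'first\nsecond\n'
print(repr(after_overwrite))   # 'third\n'
```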

In [11]:
text.splitlines()

['Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.',
 'Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.']

In [2]:
!pwd

/Users/teresaleong/Desktop/learning-portfolio-TeresaGraceL/files, folders & os


# 3 Some useful packages

In [8]:
import os
import glob
import shutil

# 4 OS safe paths

In [9]:
path = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')
print(path)

./all-data/sg-data/data-01.txt


- best to write paths using this function.  
- using os.path.join() will join the parts with either / or \ as necessary, so the code will run on any OS.

# 5 Folders

## 5.1 Creating folders

- be careful about files existing or not existing

In [10]:
os.mkdir('people')

for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    print(f'Creating {path}')
    os.mkdir(path)

Creating people/John
Creating people/Paul
Creating people/Ringo


- case sensitive
- print statement is included to know what is happening
- DO NOT use the os commands until you are sure something correct is happening

## 5.2 Checking for existence

Python's error messages refer to folders as files: os.mkdir() will complain with a FileExistsError if the folder already exists. Two ways to check if something exists:

### Using try-except

In [11]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    try:
        os.mkdir(path)
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')

people/John already exists; skipping creation.
people/Paul already exists; skipping creation.
people/Ringo already exists; skipping creation.


### Using os.path.exists()

In [12]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')

people/John already exists; skipping creation.
people/Paul already exists; skipping creation.
people/Ringo already exists; skipping creation.
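A third option, not used in the cells above, is os.makedirs() with exist_ok=True. It creates any missing parent folders and does nothing if the folder is already there. The sketch below runs inside a temporary folder so it does not touch the real people folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for person in ['John', 'Paul', 'Ringo']:
        path = os.path.join(base, 'people', person, 'imgs')

        # Creates 'people', the person's folder and 'imgs' in one call
        os.makedirs(path, exist_ok=True)

        # Calling it again is harmless: no FileExistsError is raised
        os.makedirs(path, exist_ok=True)

    created = sorted(os.listdir(os.path.join(base, 'people')))

print(created)   # ['John', 'Paul', 'Ringo']
```

The trade-off is that exist_ok=True stays silent, so the explicit checks above are still the better choice when you want a message about what was skipped.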


## 5.3 Copying files

In [18]:
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('people', person)
    shutil.copy('sp2273_logo.png', path_to_destination)
    print(f'Copied file to {path_to_destination}')

Copied file to people/John
Copied file to people/Paul
Copied file to people/Ringo


In [19]:
file_to_copy = 'sp2273_logo.png'
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('people', person)
    file_path = os.path.join(path_to_destination, file_to_copy)

    # Check for the copied file itself, not just the folder it goes into
    if os.path.exists(file_path):
        print(f'{file_path} already exists!')
    else:
        shutil.copy(file_to_copy, path_to_destination)
        print(f'Copied {file_to_copy} to {path_to_destination}')

people/John/sp2273_logo.png already exists!
people/Paul/sp2273_logo.png already exists!
people/Ringo/sp2273_logo.png already exists!


- next, we want all the images in a sub-folder called imgs in each person’s directory.
- do this by first creating the folder imgs and then moving the logo file into it.

In [20]:
file_to_copy ='sp2273_logo.png'
for person in ['John', 'Paul', 'Ringo']:
    
    # Create folder 'imgs'
    path_to_imgs = os.path.join('people', person, 'imgs')
    if not os.path.exists(path_to_imgs):
        os.mkdir(path_to_imgs)

    # Move logo file
    current_path_of_logo = os.path.join('people', person, file_to_copy)
    new_path_of_logo = os.path.join('people', person, 'imgs', file_to_copy)

    shutil.move(current_path_of_logo, new_path_of_logo)
    print(f'Moved logo from {current_path_of_logo} to {new_path_of_logo}')

Moved logo from people/John/sp2273_logo.png to people/John/imgs/sp2273_logo.png
Moved logo from people/Paul/sp2273_logo.png to people/Paul/imgs/sp2273_logo.png
Moved logo from people/Ringo/sp2273_logo.png to people/Ringo/imgs/sp2273_logo.png


- be paranoid because this can mess things up
- the terminal and loops let you do things to many files very fast, mistakes included
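One way to be paranoid is a "dry run": print what would happen first, and only move files once the printout looks right. This is just a sketch. move_file() and its dry_run parameter are hypothetical names made up for this example, and the file is a placeholder inside a temporary folder.

```python
import os
import shutil
import tempfile


def move_file(source, destination, dry_run=True):
    """Move a file, or only report the planned move when dry_run is True."""
    if dry_run:
        print(f'[dry run] Would move {source} to {destination}')
        return False
    shutil.move(source, destination)
    print(f'Moved {source} to {destination}')
    return True


with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, 'imgs'))
    source = os.path.join(base, 'logo.png')
    destination = os.path.join(base, 'imgs', 'logo.png')

    # Placeholder bytes standing in for a real image
    with open(source, 'wb') as file:
        file.write(b'placeholder')

    # First pass: nothing on disk changes
    move_file(source, destination)
    still_there = os.path.exists(source)

    # Second pass: actually move the file
    move_file(source, destination, dry_run=False)
    moved = os.path.exists(destination) and not os.path.exists(source)

print(still_there, moved)   # True True
```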

# 6 Listing and looking for files

In [23]:
!pwd

/Users/teresaleong/Desktop/learning-portfolio-TeresaGraceL/files, folders & os


**Example 1**
- all the files in the current directory.
- the asterisk is a wildcard, read as 'anything'.

In [24]:
glob.glob('*')

['my-text-once.txt',
 'files,_folders_&_os_(need).ipynb',
 'sp2273_logo.png',
 'people',
 'my-text-lines.txt']

**Example 2** 
- files that match 'peo' followed by anything

In [26]:
glob.glob('peo*')

['people']

**Example 3**
- what is inside the folders that start with peo

In [27]:
glob.glob('peo*/*')

['people/Paul', 'people/John', 'people/Ringo']

**Example 4**
- see the whole, detailed structure of the folder people
- need to tell glob to search recursively (i.e. dig through all subdirectories) by putting recursive=True.
    - otherwise it cannot show you everything inside the subdirectories
- must use two wildcards to say "to all subdirectories"

In [28]:
glob.glob('people/**', recursive=True)

['people/',
 'people/Paul',
 'people/Paul/imgs',
 'people/Paul/imgs/sp2273_logo.png',
 'people/John',
 'people/John/imgs',
 'people/John/imgs/sp2273_logo.png',
 'people/Ringo',
 'people/Ringo/imgs',
 'people/Ringo/imgs/sp2273_logo.png']

**Example 5**
- looking only for png files

In [29]:
glob.glob('people/**/*.png', recursive=True)

['people/Paul/imgs/sp2273_logo.png',
 'people/John/imgs/sp2273_logo.png',
 'people/Ringo/imgs/sp2273_logo.png']

- glob understands shell-style wildcards (*, ? and [ ]) rather than full regular expressions
- the glob() function belongs to the glob module, which also has a few other functions (e.g. iglob())
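Besides the asterisk, glob also accepts ? (exactly one character) and [ ] (one character from a set). A short sketch with made-up file names in a temporary folder:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Create a few empty, hypothetical files to search through
    for name in ['data-01.txt', 'data-02.txt', 'data-10.txt', 'notes.md']:
        with open(os.path.join(base, name), 'w') as file:
            file.write('')

    # escape() protects any special characters in the folder name itself
    safe_base = glob.escape(base)

    # '?' matches exactly one character
    single = glob.glob(os.path.join(safe_base, 'data-0?.txt'))

    # '[01]' matches one character that is either 0 or 1
    choice = glob.glob(os.path.join(safe_base, 'data-[01]0.txt'))

    single_names = sorted(os.path.basename(p) for p in single)
    choice_names = sorted(os.path.basename(p) for p in choice)

print(single_names)   # ['data-01.txt', 'data-02.txt']
print(choice_names)   # ['data-10.txt']
```

The results are sorted here because glob does not promise any particular order, as the outputs above also show.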

# 7 Extracting file info

- you can extract the file name, folder and extension by simple string manipulation
- splitting on os.path.sep works on Windows and on Mac, as long as the path itself uses the OS's separator (e.g. it came from os.path.join() or glob)

In [32]:
path = 'people/Ringo/imgs/sp2273_logo.png'
filename = path.split(os.path.sep)[-1]
extension = filename.split('.')[-1]
print(filename, 'has the extension', extension)

sp2273_logo.png has the extension png


split() returns a list, so [-1] picks out the last item, which is the file name

**using functions in os**

In [37]:
os.path.split(path) #split filename from the rest

('people/Ringo/imgs', 'sp2273_logo.png')

In [38]:
os.path.splitext(path) #split extension

('people/Ringo/imgs/sp2273_logo', '.png')

In [40]:
os.path.dirname(path) #show the directory

'people/Ringo/imgs'
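The os.path functions can be chained to pull a path apart in one go. A minimal sketch using the same path as above, built with os.path.join() so that it also behaves on Windows:

```python
import os

path = os.path.join('people', 'Ringo', 'imgs', 'sp2273_logo.png')

# Split off the file name, then split the name from its extension
folder, filename = os.path.split(path)
stem, extension = os.path.splitext(filename)

print(folder)      # people/Ringo/imgs on macOS or Linux
print(filename)    # sp2273_logo.png
print(stem)        # sp2273_logo
print(extension)   # .png
```

Unlike the string version with split('.'), splitext() keeps the dot in the extension.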

# 8 Deleting stuff

**be very careful with this**, it can do some damage

- os.remove works with files
- os.rmdir works with empty directories
- shutil.rmtree works with directories that still have files in them

In [41]:
os.remove('people/Ringo/imgs/sp2273_logo.png')

In [46]:
os.rmdir('people/Ringo')

OSError: [Errno 66] Directory not empty: 'people/Ringo'

In [49]:
shutil.rmtree('people/Ringo')
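To practise the three functions without risking real files, the same steps can be rehearsed in a throwaway folder made by tempfile.mkdtemp(). This is only a sketch; the folder layout copies the one above and the png is a placeholder.

```python
import os
import shutil
import tempfile

# Throwaway folder somewhere in the system's temp area
base = tempfile.mkdtemp()
folder = os.path.join(base, 'people', 'Ringo', 'imgs')
os.makedirs(folder)

file_path = os.path.join(folder, 'logo.png')
with open(file_path, 'wb') as file:
    file.write(b'placeholder')   # stand-in for a real image

os.remove(file_path)   # delete the file
os.rmdir(folder)       # 'imgs' is now empty, so rmdir works

try:
    os.rmdir(base)     # 'people' is still inside, so this fails
    rmdir_failed = False
except OSError:
    rmdir_failed = True

shutil.rmtree(base)    # removes the folder and everything in it
gone = not os.path.exists(base)

print(rmdir_failed, gone)   # True True
```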

- side note: JPG is a lossy format
- good to know what format to save data in