# Lesson 9 - files, os

### modules

Modules provide a high-level way to separate concerns in Python. Like functions, modules let you store code in a reusable container. Technically, any `.py` file can be treated as a module. You have probably already seen a module being imported into a file; notice that only a simple name (without the file extension or the path) is used. The Python interpreter searches for the module automatically based on that name. The main search locations are the directory of the script being run (or the current working directory in an interactive session), the standard library, and the site-packages folder (located near the interpreter installation). The search path can be extended if a non-standard file placement needs to be supported, although this is rarely necessary.
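
The search locations are stored in the `sys.path` list. A minimal sketch for inspecting it follows; the exact entries depend on your installation, so no output is shown, and the `search_locations` name is only an illustration:

```python
import sys

# sys.path is a plain list of locations that the interpreter
# checks, in order, when it resolves an import by name.
for location in sys.path:
    print(repr(location))

# Keep a copy so the original list is not modified by accident
search_locations = list(sys.path)
print(len(search_locations), "locations in total")
```

An empty string in that list, if present, stands for the current working directory.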

Notice that the current directory contains file `test_module.py`. Let's try to import it:

In [None]:
import test_module # just the name, which becomes a variable

print(test_module.global_variable)
test_module.test_func(5) # through that name we can access the module's global variables and functions

global value
5
global value


Note how `test_module` ends up in the global scope after the import. This means we should not use the same name for any other variable, or we would overwrite the reference to the module. Sometimes it makes sense to have more control over the variable name:

In [5]:
import test_module as tm # the module is now accessible via the tm variable

tm.test_func(10)

10
global value
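
The point about a module name being an ordinary variable can be sketched with a standard-library module. This example uses `math` instead of `test_module` so that it runs anywhere; the `shadowed` flag is a hypothetical name added only for the demonstration:

```python
import math

print(math.sqrt(16))  # 4.0

# Assigning to the same name replaces the reference to the module
math = "just a string"

shadowed = False
try:
    math.sqrt(16)
except AttributeError as error:
    shadowed = True
    print("the module name was overwritten:", error)
```

Importing the module again (`import math`) restores the name.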


There is a way to do more precise importing:

In [6]:
from test_module import global_variable as gv, test_func as tf

print(gv)
tf(15)

global value
15
global value


As well as less precise:

In [8]:
from test_module import * # this adds all global names from the imported file to the current one

print(global_variable)
test_func(20)

global value
20
global value


The latter option is the least recommended one, since it carries the highest risk of a conflict between names in the current file and names in the imported one.

### os and file systems

Python, as a high-level programming language, offers several advantages when working with the operating system (OS) and files. Its standard library provides a wide range of modules and functions that simplify common tasks related to file and OS operations. Python's `os` module allows you to interact with the underlying operating system (even without knowing which os is being used), enabling you to perform operations such as file and directory management, process control, and environment configuration. The `os.path` module provides a convenient way to manipulate file paths, making it easy to work with files and directories across different operating systems. Python's file handling capabilities, including the built-in `open()` function and file object methods, allow you to read from and write to files efficiently, with support for various file modes and encodings.
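
As a small illustration of `os.path`, the sketch below builds and takes apart a path. The directory and file names are hypothetical, and the joined path is printed without an expected value because the separator differs between operating systems:

```python
import os

# Hypothetical path pieces, used only for illustration
path = os.path.join("data", "reports", "summary.txt")
print(path)  # the separator depends on the OS

file_name = os.path.basename(path)       # last component of the path
base, extension = os.path.splitext(path) # split off the extension

print(file_name)  # summary.txt
print(extension)  # .txt
```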

### working with OS

The `os` module is part of the standard library; import it to get high-level access to the current OS and a set of common operations. Since Python is cross-platform, it's important to write code that does not depend directly on a specific OS. The `os` module decouples your code from the specifics of the environment in which it runs.

In [1]:
import os

print(os.name) # current OS family: 'nt' on Windows, 'posix' on Linux and macOS

# here the OS name is printed for debugging only, it should not drive the actual program logic

nt


In [None]:
import os # repeated here so that the cell also works when run on its own

current_dir = os.getcwd() # the directory from which the current interpreter instance is running
print("Current working directory:", current_dir)

In [None]:
# List files and directories in the current directory
contents = os.listdir()
print("Contents of the current directory:")
for item in contents:
    print(item)

Contents of the current directory:
Lesson 9 - files, os.ipynb


In [6]:
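# checking whether a file or directory exists and deleting it
path = "/path/to/file_or_directory"
if os.path.exists(path):
    print(f"{path} exists.")
    if os.path.isdir(path):
        os.rmdir(path) # removes an empty directory only
    else:
        os.remove(path) # os.remove() works on files, not on directories
    print("not anymore")
else:
    print(f"{path} does not exist.")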

/path/to/file_or_directory does not exist.
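
To try these operations without touching real data, a scratch directory can be used. The following is a minimal sketch that assumes the standard `tempfile` module; the file name `scratch.txt` is hypothetical, and `open()` is covered in the next section:

```python
import os
import tempfile

# TemporaryDirectory creates a scratch folder and removes it on exit
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "scratch.txt")  # hypothetical file name

    # Create a small file so there is something to list and delete
    with open(file_path, "w") as f:
        f.write("temporary data")

    existed_before = os.path.exists(file_path)
    listing = os.listdir(tmp_dir)
    print(listing)  # ['scratch.txt']

    # Delete only if the path really points to a file
    if os.path.isfile(file_path):
        os.remove(file_path)

    exists_after = os.path.exists(file_path)
    print(existed_before, exists_after)  # True False
```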


### working with files

The standard way to work with files is to open them with `open()`. It returns a file object that lets you read from or write to a file on the computer's file system. Everything happens at a high level: an opened file is represented by an object, so in most cases there is no need to dive into I/O streams or file cursors.

In [14]:
try:
    file_obj = open("test.txt") # opening by file path (or a name of file from cwd)
except FileNotFoundError:
    print("the file does not exist")
else:
    file_obj.close() # closing file if it was open when it's not needed anymore

the file does not exist


Closing files after you're done using them is important for several reasons:

- Resource Management:

    When you open a file, the operating system allocates certain resources to handle the file operations, such as memory buffers and file descriptors.
    If you don't close the file properly, these resources remain allocated for as long as your program keeps running (the operating system reclaims them only when the process exits).
    Failing to close files can lead to resource leaks, where the allocated resources are not freed up for other programs or processes to use.
    Over time, if you open many files without closing them, it can lead to resource exhaustion, for example by hitting the limit on simultaneously open files, after which further `open()` calls fail.

- Data Integrity:

    When you write data to a file, it is often buffered in memory before being written to the physical storage device.
    If you don't close the file properly, there's a risk that some of the buffered data may not be written to the file, leading to data loss or inconsistencies.
    Closing the file ensures that all the buffered data is flushed and written to the file, guaranteeing data integrity.

- File Locking and Concurrency:

    In some cases, when you open a file, the operating system may lock the file to prevent other processes or programs from accessing it simultaneously.
    If you don't close the file, the lock may remain in place, preventing other parts of your program or other programs from accessing the file.
    This can lead to concurrency issues, where multiple processes or threads are unable to access the file simultaneously, potentially causing delays or deadlocks.
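
One explicit way to guarantee that a file gets closed, even if writing fails, is a `try`/`finally` block. This is a minimal sketch that works in a scratch directory; the file name `notes.txt` and the variable names are hypothetical:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()               # scratch directory for the demo
path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

file_obj = open(path, "w")
try:
    file_obj.write("important data")
finally:
    file_obj.close()  # runs whether or not write() raised an error

is_closed = file_obj.closed
size_after_close = os.path.getsize(path)
print(is_closed, size_after_close)  # True 14

# Clean up the scratch file and directory
os.remove(path)
os.rmdir(tmp_dir)
```

After `close()` the buffered data has been flushed, so the size on disk matches the number of bytes written.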

The simplest way to manage a file is to use the `with` statement, like this:

In [18]:
with open("test.txt", 'w') as f: # file object is assigned to variable f
    pass # some logic goes here 

# closing happens automatically as soon as the code block ends, no need to call close()

In the example above, the `open()` call has an additional parameter with the value `'w'`. This is the so-called mode. The mode determines which operations you can perform on the opened file.

Here are the commonly used modes:

- `'r'` (Read Mode): This is the default mode. It opens the file for reading. If the file does not exist, it raises a FileNotFoundError.
- `'w'` (Write Mode): Opens the file for writing. If the file already exists, its contents are truncated (deleted). If the file does not exist, a new file is created.
- `'a'` (Append Mode): Opens the file for appending. If the file exists, the new data is written at the end of the file, preserving the existing contents. If the file does not exist, a new file is created.
- `'x'` (Exclusive Creation Mode): Opens the file for exclusive creation. If the file already exists, a FileExistsError is raised.
- `'b'` (Binary sub-Mode): Used in conjunction with other modes (like `'wb'`) to open the file in binary mode, allowing you to read or write binary data.
- `'t'` (Text sub-Mode): Used in conjunction with other modes (like `'rt'`) to open the file in text mode, allowing you to read or write string data. This is the default mode.
- `'+'` (Plus sub-Mode): Used in conjunction with other modes (like `'a+'` or `'rb+'`). It adds the missing operation while preserving the main mode's logic, e.g. `'w+'` allows reading from the file while still truncating it on opening.

Choosing a mode can be hard. It's better not to enable functionality that is not needed at the moment, so use either `'r'` or `'w'` when that is enough. That said, `'a+'` is the safest mode that still provides the most functionality: it never truncates an existing file, creates the file if it is missing, and allows both reading and appending.
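
The difference between the main modes can be checked with a short experiment. This is a sketch that runs in a scratch directory; the file name `modes_demo.txt` and the variable names are hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(path, "w") as f:   # creates the file
        f.write("first\n")
    with open(path, "a") as f:   # keeps the content, writes at the end
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()
    print(repr(after_append))  # 'first\nsecond\n'

    exclusive_failed = False
    try:
        open(path, "x")          # the file already exists
    except FileExistsError:
        exclusive_failed = True
        print("'x' refused to overwrite the existing file")

    with open(path, "w") as f:   # opening with 'w' truncates immediately
        pass
    size_after_truncate = os.path.getsize(path)
    print(size_after_truncate)  # 0
```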

In [50]:
with open("example_file.txt", "r") as file:
    content = file.read()  # Read the whole content
    print(repr(content))

'line 1\nline 2\nline 3'


In [49]:
with open("example_file.txt", "r") as file:
    content = file.read(1)  # in text mode, read(1) returns one character, not one byte
    print(repr(content))

# read() decodes bytes using the file's encoding; with the wrong encoding the text may come out garbled or raise UnicodeDecodeError
# the encoding can be specified as an open() parameter, e.g. open(..., encoding="utf-8")

'l'
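
The difference between characters and bytes becomes visible with non-ASCII text. The sketch below assumes UTF-8 and a hypothetical file name; it writes the letter "é" (code point U+00E9), which takes two bytes in UTF-8:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "utf8_demo.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("\u00e9!")  # the letter "é" followed by "!"

    # Text mode: read(1) returns one whole character
    with open(path, "r", encoding="utf-8") as f:
        first_char = f.read(1)

    # Binary mode: read(1) returns one byte, here only half of "é"
    with open(path, "rb") as f:
        first_byte = f.read(1)

    print(len(first_char))  # 1
    print(first_byte)       # b'\xc3'
```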


In [48]:
with open("example_file.txt", "r") as file:
    content = file.read(10000000)  # it's possible to pass any huge number, the file will be read till the end
    print(repr(content))

'line 1\nline 2\nline 3'


In [47]:
with open("example_file.txt", "r") as file:
    content = file.readline()  # Read a single line
    print(repr(content))

# readline() reads up to and including the next newline (in text mode, \r\n is translated to \n by default)

'line 1\n'


In [46]:
with open("example_file.txt", "r") as file:
    for line in file:
        print(repr(line)) # Read the file line by line

'line 1\n'
'line 2\n'
'line 3'


In [None]:
with open("second_example_file.txt", "w") as file: # be careful, the file is recreated every time with 'w'
    print(file.read()) # nope

UnsupportedOperation: not readable

In [45]:
with open("second_example_file.txt", "w") as file:
    file.write("test test test") # writing

with open("second_example_file.txt", "r") as file:
    print(repr(file.read())) # reading

'test test test'


In [44]:
with open("second_example_file.txt", "w+") as file: # + should allow reading
    file.write("test test test")
    print(repr(file.read())) # yes, but no

# reading is allowed, but it returns an empty string because the file pointer is at the end after write()

''


In [43]:
with open("second_example_file.txt", "w+") as file: # + should allow reading
    file.write("test test test")
    file.seek(0) # we need to move the file pointer (or cursor) to the beginning
    print(file.read()) # now it reads from the beginning

# working with pointers can be complicated, but seek(0) will always move the pointer to the start of a file

test test test
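
The current pointer position can be inspected with `tell()`. This is a small sketch in binary mode, where positions are plain byte offsets; the file name and variable names are hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pointer_demo.bin")  # hypothetical file name

    with open(path, "wb+") as f:
        f.write(b"test test test")
        position_after_write = f.tell()  # the pointer sits at the end: 14

        f.seek(5)                        # jump to the second word
        middle = f.read(4)               # reading moves the pointer forward
        position_after_read = f.tell()

    print(position_after_write)  # 14
    print(middle)                # b'test'
    print(position_after_read)   # 9
```

In text mode `tell()` returns an opaque number, so it is safest to pass `seek()` only `0` or a value previously returned by `tell()`.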


It's possible to copy a file either character by character or byte by byte, although a bigger chunk size is used in most cases.

In [55]:
filename = "example_file.txt"
chunk_size = 1  # bytes per read; 1 means a byte-by-byte copy

with open(filename, "rb") as donor:
    with open(filename.replace(".txt", "_copy.txt"), "wb") as recipient:
        while True:
            next_chunk = donor.read(chunk_size)
            if not next_chunk:  # read() returns b"" at the end of the file
                break
            recipient.write(next_chunk)
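
The same idea with a larger chunk size can be wrapped into a reusable function. This is a sketch only: `copy_in_chunks`, the 64 KiB default, and the file names are assumptions made for illustration. For everyday use the standard library already offers `shutil.copyfile()`.

```python
import os
import tempfile


def copy_in_chunks(source_path, target_path, chunk_size=64 * 1024):
    """Copy a file in binary mode, reading chunk_size bytes at a time."""
    with open(source_path, "rb") as source, open(target_path, "wb") as target:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            target.write(chunk)


with tempfile.TemporaryDirectory() as tmp_dir:
    original = os.path.join(tmp_dir, "original.bin")    # hypothetical names
    duplicate = os.path.join(tmp_dir, "duplicate.bin")

    # Create a file larger than one chunk so the loop runs several times
    with open(original, "wb") as f:
        f.write(b"line 1\nline 2\nline 3" * 10000)

    copy_in_chunks(original, duplicate)

    with open(original, "rb") as a, open(duplicate, "rb") as b:
        copies_match = a.read() == b.read()
    print(copies_match)  # True
```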



## Homework

Task 1 - text justification

1. Prepare a file with somewhat long text content (copy something from Wikipedia for example). Put it somewhere to be read by your program.
2. When the program starts, allow the user to enter a parameter "maximum number of characters per line", which must be greater than 20.
3. Format the text so that no line exceeds the maximum number of characters. If a word does not fit entirely on a line, move it to the next one, and increase the spacing between words on the current line evenly (similar to the "Justify" function in text editors). The standard `textwrap` module can handle the line-wrapping part; you may take a look at it, but do not use it for this task.
4. Write the resulting text to a new file and notify the user about it.

The resulting file content should look like this (justified to 40 characters per line):

```
Python was created in the early 1990s by
Guido    van    Rossum    at   Stichting
Mathematisch  Centrum in the Netherlands
as a successor of a language called ABC.
Guido  remains  Python principal author,
although  it includes many contributions
from others.
In  1995,  Guido  continued  his work on
Python  at  the Corporation for National
Research Initiatives in Reston, Virginia
where  he  released  several versions of
the software.
In  May  2000, Guido and the Python core
development  team moved to BeOpen.com to
form  the  BeOpen  PythonLabs  team.  In
October of the same year, the PythonLabs
team  moved  to  Digital  Creations.  In
2001, the Python Software Foundation was
formed,    a   non-profit   organization
created      specifically     to     own
Python-related   Intellectual  Property.
Zope  Corporation is a sponsoring member
of the PSF.
All  Python  releases  are  Open Source.
Historically,  most, but not all, Python
releases  have also been GPL-compatible;
the  table  below summarizes the various
releases.
```