# Operating system and files

## Before we start... `import`
To interact with the operating system and read files, we need to *import a module*:
```python
import os
```

- What is a **module**? A file containing `python` code (variables, function definitions, classes).
- What does it mean to **import a module**? The first time we import a module, `python` runs all the code in the module, as simple as that.
- Most of the time, a module only contains *definitions* and is not meant to do any actual work when it is imported.
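
To make the point about "running all the code" concrete, here is a minimal sketch that writes a throwaway module to a temporary directory and imports it. The module name `demo_module` and everything inside it are made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source code of a tiny throwaway module (hypothetical, for illustration only).
module_source = (
    "print('demo_module is being executed')\n"
    "ANSWER = 42\n"
    "def greet(name):\n"
    "    return 'hello ' + name\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_module.py").write_text(module_source)
    sys.path.insert(0, tmp)  # make the temporary directory importable
    try:
        # The import runs the file from top to bottom, including the print call.
        demo_module = importlib.import_module("demo_module")
    finally:
        sys.path.remove(tmp)

# The definitions created while the module ran are now attributes of the module.
print(demo_module.ANSWER)
print(demo_module.greet("RUB"))
```

The `print` at the top of the module fires during the import itself, which is why well-behaved modules usually contain only definitions.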


In [None]:
import os
print(os.name)

## Paths
A path identifies a file or directory on a filesystem.

Examples:
- Windows path: `C:\User\Documents\file.ext`
- Linux path: `/home/user/file`.

In Windows, most filenames have an extension, and the extension is how the OS determines the file type.
In Linux, file type and extension are unrelated at a fundamental level, but extensions are still helpful to users and applications.

While it is technically possible to manipulate paths as strings, **don't do it**. It's messy, ugly and does not play well across different operating systems!
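
A small sketch of why hand-made string concatenation is fragile. `os.path` is an alias for either `posixpath` or `ntpath`, depending on the platform; importing both directly lets us compare the two flavours on any machine. The example paths are hypothetical.

```python
import ntpath      # the Windows flavour of os.path
import posixpath   # the Linux/macOS flavour of os.path

# Concatenating by hand hard-codes one particular separator.
by_hand = "/home/user" + "/" + "work"

# The join functions insert the separator that belongs to their platform.
linux_path = posixpath.join("/home/user", "work")
windows_path = ntpath.join("C:\\Users\\user", "work")

print(by_hand)
print(linux_path)
print(windows_path)
```

The hand-made string only matches the Linux result; on Windows the expected separator is a backslash, which the join function handles for us.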

### Python paths, the old way


In [None]:
# Let's find out our current directory
base_dir = os.getcwd()
print(base_dir)

Now let's define a new directory...

In [None]:
new_dir = os.path.join(base_dir, 'work')
print(new_dir)

We have our path, let's create it!

In [None]:
# if we try to create a directory with an existing name, we get an error
if not os.path.exists(new_dir):
    os.makedirs(new_dir)

print(type(new_dir))

### Python paths, the cool way
We have noticed that, after all, we are still manipulating a path as a string. Can we do better?

In [None]:
from pathlib import Path

In [None]:
base_dir = Path(base_dir)
print(base_dir)
type(base_dir)

In [None]:
new_dir = base_dir / "work"

# create the directory; no error if it already exists
new_dir.mkdir(exist_ok=True)

- We can manipulate paths as objects.
- Better functionality in the form of class and instance methods of `Path`.
- Awesome `/` operator! 
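
As a rough illustration of what "paths as objects" buys us, the sketch below builds a path with `/` and inspects it through attributes of `Path`. The directory and file names are hypothetical, and nothing is created on disk.

```python
from pathlib import Path

# Hypothetical relative path, built piece by piece with the `/` operator.
p = Path("work") / "results" / "report.txt"

print(p.name)                      # last component of the path
print(p.stem)                      # file name without the extension
print(p.suffix)                    # the extension, including the dot
print(p.parent.name)               # name of the containing directory
print(p.with_suffix(".csv").name)  # same file name with another extension
print(p.exists())                  # asks the filesystem whether the path exists
```

None of this requires splitting strings on a separator, so the same code behaves the same way on Windows and Linux.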

## First file

In [None]:
filename = 'datafile.dat'

filepath = new_dir / filename

data = ["hello world", "\n", "hello RUB", "\n"] # newline at the end of file is a good idea!

with open(filepath, 'w') as f:
    for string in data:
        f.write(string)
    # f.writelines(data) # shorter and smarter if `data` is an iterable/collection!

# no need to explicitly close the file!

In [None]:
with open(filepath, 'r') as f:
    data = f.read()

print(data)

If the file is really big, this is not ideal because the whole file content gets loaded into a variable (that is, into RAM). It is better to read line by line:

In [None]:
with open(filepath, 'r') as f:
    for line in f:
        print(line, end='')  # each line already ends with '\n'

By default, the file is opened in text mode, which means that:
- only strings can be written;
- everything is read back as a string.
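
The two points above can be sketched as follows. The file name `numbers.txt` is hypothetical, and the file lives in a temporary directory that is removed at the end.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "numbers.txt"  # hypothetical file name

    with open(demo_path, "w") as f:
        try:
            f.write(42)  # not a string: text mode refuses it
        except TypeError as err:
            error_message = str(err)
        f.write(str(42) + "\n")  # convert to a string first

    with open(demo_path, "r") as f:
        content = f.read()

print(error_message)
print(repr(content), type(content))

# What we read back is a string; converting it is our job.
number = int(content)
print(number + 1)
```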

## Binary files
- Writing binary content is complicated and messy.
- In `python` we can use `pickle` to dump an arbitrary object into a file.

In [None]:
import pickle

obj = list(range(10))

with open(filepath, 'wb') as f:
    pickle.dump(obj, f)


In [None]:
with open(filepath, 'rb') as f:
    obj = pickle.load(f)

print(obj)

It works with basically any object (even your own classes), but it is also very opaque:
- `python` specific, no cross-language standard;
- you need to know what's inside the file;
- writing and reading iteratively is possible but complicated.
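
One possible way to write and read iteratively is sketched below, assuming we simply call `pickle.dump` once per object and keep calling `pickle.load` until the file is exhausted. The file name `records.pkl` and the sample records are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

records = [{"id": 1}, [1, 2, 3], "hello"]  # sample objects, made up for the demo

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "records.pkl"  # hypothetical file name

    # Write the objects one after another into the same file.
    with open(demo_path, "wb") as f:
        for record in records:
            pickle.dump(record, f)

    # Read them back one at a time; pickle.load raises EOFError at the end.
    loaded = []
    with open(demo_path, "rb") as f:
        while True:
            try:
                loaded.append(pickle.load(f))
            except EOFError:
                break

print(loaded)
```

Note that the reader has to know that the file holds a sequence of pickled objects; nothing in the file says so, which is the opacity mentioned above.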

## Why not both?
- JSON (JavaScript Object Notation) is a standard encoding format that makes it possible to write multiple data types to a text file.
- You can think of a JSON file as a big nested dictionary.
- Most `python` native data types can be written as a JSON file. 

In [None]:
import json

data = {}

data['numbers'] = list(range(10))
data['words'] = ['foo', 'bar', 'baz']

with open(filepath, 'w') as f:
    json_data = json.dumps(data) # dumps returns a string
    json.dump(data, f) # dump writes to file!

In [None]:
print(json_data)
type(json_data)

- It closely resembles `python` syntax, but this is actually the JSON format itself!
- The file is human-readable!

In [None]:
with open(filepath, 'r') as f:
    obj = json.load(f)

print(obj)
type(obj)
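
"Most" native data types is not "all", and a round trip through JSON does not always give back exactly what went in. The sketch below uses made-up sample data to show a few cases worth knowing about.

```python
import json

# Sample data, made up for the demo: a tuple value and an integer key.
original = {"point": (1, 2), 3: "three", "flag": True, "nothing": None}

text = json.dumps(original)
restored = json.loads(text)

print(text)      # True/None appear as JSON true/null
print(restored)  # the tuple is now a list, the integer key is now a string

# Some types have no JSON equivalent at all, for example sets.
try:
    json.dumps({"unique": {1, 2, 3}})
except TypeError as err:
    print("cannot serialise a set:", err)
```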