# Operating system and files

## Before we start... `import`
To interact with the operating system and read files, we need to *import a module*:
```python
import os
```

- What is a **module**? A file containing `python` code (variables, function definitions, classes).
- What does it mean to **import a module**? The first time we import a module, `python` runs all the code in the module, as simple as that.
- Most of the time, a module only contains *definitions* and is not supposed to execute any function on import.
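
To see that importing really runs the code in a module, here is a minimal sketch. The module name `demo_module` and its contents are made up for this illustration; the file is written to a temporary directory so nothing is left behind.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a tiny module to disk (hypothetical name and contents)
    module_file = Path(tmp_dir) / "demo_module.py"
    module_file.write_text(
        "print('demo_module is being imported')\n"
        "ANSWER = 42\n"
        "def double(x):\n"
        "    return 2 * x\n"
    )

    # Make the temporary directory visible to the import system
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        # Same effect as writing `import demo_module`
        demo_module = importlib.import_module("demo_module")
    finally:
        sys.path.remove(tmp_dir)

# The print() inside the module ran during the import;
# the definitions are now available as attributes of the module.
print(demo_module.ANSWER)
print(demo_module.double(21))
```

The `print` call inside the module is there only to make the import visible; as noted above, real modules usually contain definitions only.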


In [None]:
import os
print(os.name)

## Paths
A path identifies a file or a directory on a filesystem.

Examples:
- Windows path: `C:\User\Documents\file.ext`
- Linux path: `/home/user/file`.

In Windows, most filenames have an extension, and the extension is how the OS determines the file type.
In Linux, file type and extension are unrelated at a fundamental level, but extensions are still of help to users and applications.

While it is technically possible to manipulate paths as strings, **don't do it**. It's messy, ugly and does not play well across different operating systems!

### Python paths, the old way


In [None]:
# Let's find out our current directory
base_dir = os.getcwd()
print(base_dir)
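
As a sketch of what the "old way" typically looks like, the functions in `os.path` build and inspect paths without hard-coding the separator. The `data` directory and `datafile.dat` file name below are only illustrative and do not need to exist.

```python
import os

base_dir = os.getcwd()

# Join path components with the separator of the current OS
# ("data" and "datafile.dat" are hypothetical names)
data_path = os.path.join(base_dir, "data", "datafile.dat")
print(data_path)

# Split the path into its parts
print(os.path.basename(data_path))     # datafile.dat
print(os.path.splitext(data_path)[1])  # .dat

# Ask the filesystem whether the path exists
print(os.path.exists(data_path))

# The result is still a plain string
print(type(data_path))
```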

### Python paths, the cool way
We have noticed that, after all, we are still manipulating a path as a string. Can we do better?

In [None]:
from pathlib import Path

In [None]:
base_dir = Path(base_dir)

- We can manipulate paths as objects.
- Better functionality in the form of class and instance methods of `Path`.
- Awesome `/` operator! 
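
A short sketch of these points, assuming a hypothetical `data/datafile.dat` under the current directory (the file does not need to exist for any of the calls below):

```python
from pathlib import Path

# Path.cwd() is a class method, equivalent to Path(os.getcwd())
base_dir = Path.cwd()

# The / operator joins path components
data_path = base_dir / "data" / "datafile.dat"

# Attributes and methods replace string manipulation
print(data_path.name)    # datafile.dat
print(data_path.stem)    # datafile
print(data_path.suffix)  # .dat
print(data_path.parent == base_dir / "data")  # True
print(data_path.with_suffix(".json").name)    # datafile.json

# Instance methods query the filesystem
print(base_dir.is_dir())   # True
print(data_path.exists())
```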

## First file: a text file

In [None]:
# first, some data
names = ["NGC 5128", "TXS 0506+056", "NGC 1068", "GB6 J1040+0617", "TXS 2226-184"]
distances = [3.7, 1.75e3, 14.4, 1.51e4, 107.1]  # Mpc
luminosities = [1e40, 3e46, 4.9e38, 6.2e45, 5.5e41] # erg/s

dataset = { 'names' : names, 'distances' : distances, 'luminosities' : luminosities }

In [None]:
filename = 'datafile.dat'

By default, the file is opened in text mode, which means that:
- only characters/strings can be written;
- everything is read as a string.
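
A minimal sketch of writing and reading in text mode. It uses a shortened copy of the data, a temporary directory, and a `;` separator; all three are choices made for this example only.

```python
import tempfile
from pathlib import Path

names = ["NGC 5128", "TXS 0506+056", "NGC 1068"]
distances = [3.7, 1.75e3, 14.4]  # Mpc

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / "datafile.dat"

    # "w" opens the file for writing, in text mode by default
    with open(filename, "w") as f:
        for name, distance in zip(names, distances):
            # numbers are converted to strings by the f-string
            f.write(f"{name};{distance}\n")

    # open() without a mode reads in text mode
    with open(filename) as f:
        lines = f.read().splitlines()

# Everything comes back as a string and must be converted explicitly
first_name, first_distance = lines[0].split(";")
print(type(first_distance))   # <class 'str'>
print(float(first_distance))  # 3.7
```

The `with` statement closes the file automatically, even if an error occurs inside the block.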

## Binary files
- Writing binary content by hand is complicated and messy.
- In `python` we can use `pickle` to dump an arbitrary object into a file.

In [None]:
import pickle

It works with basically any object (even your own classes), but it is also very opaque:
- `python` specific, no cross-language standard;
- basically you need to know in advance what's inside the file;
- writing and reading iteratively is possible but complicated.
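
A minimal round trip with `pickle`, sketched with a shortened copy of the dataset and a temporary file (the `.pkl` extension is only a convention chosen here):

```python
import pickle
import tempfile
from pathlib import Path

dataset = {
    "names": ["NGC 5128", "TXS 0506+056", "NGC 1068"],
    "distances": [3.7, 1.75e3, 14.4],  # Mpc
}

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / "datafile.pkl"

    # "wb": write in binary mode
    with open(filename, "wb") as f:
        pickle.dump(dataset, f)

    # "rb": read in binary mode
    with open(filename, "rb") as f:
        restored = pickle.load(f)

print(restored == dataset)  # True
print(restored is dataset)  # False
```

The restored object is an equal copy, not the same object. Note that unpickling can execute arbitrary code, so only load pickle files from sources you trust.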

## The magic of JSON
- JSON (JavaScript Object Notation) is a standard encoding format that allows writing multiple data types to a text file.
- You can think of a JSON file as a big nested dictionary.
- Most `python` native data types can be written as a JSON file. 

In [None]:
import json

- It looks like `python` syntax, but this is JSON.
- The file is human-readable!
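
As a sketch, the same kind of dictionary can be dumped to a JSON file and loaded back. The shortened dataset, the temporary file and `indent=2` are choices made for this example.

```python
import json
import tempfile
from pathlib import Path

dataset = {
    "names": ["NGC 5128", "TXS 0506+056", "NGC 1068"],
    "distances": [3.7, 1.75e3, 14.4],  # Mpc
}

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / "datafile.json"

    # JSON is text, so the file is opened in text mode
    with open(filename, "w") as f:
        json.dump(dataset, f, indent=2)

    # Keep the raw text to see what was actually written
    text = filename.read_text()

    with open(filename) as f:
        restored = json.load(f)

print(text)
print(restored == dataset)  # True
```

Not every `python` type survives the round trip unchanged: for example, tuples come back as lists and dictionary keys are always read back as strings.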

## CSV
CSV is an acronym for "comma-separated values"; it is the format of choice for tabular data. A CSV file consists of lines (entries) in which the different values (fields) are separated by commas.

In [None]:
import csv
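
A minimal sketch of writing and reading a table with the `csv` module. The column names in the header row are assumptions made for this example, and the data is a shortened copy of the dataset.

```python
import csv
import tempfile
from pathlib import Path

names = ["NGC 5128", "TXS 0506+056", "NGC 1068"]
distances = [3.7, 1.75e3, 14.4]        # Mpc
luminosities = [1e40, 3e46, 4.9e38]    # erg/s

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / "datafile.csv"

    # newline="" lets the csv module handle line endings itself
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        # header row (hypothetical column names)
        writer.writerow(["name", "distance_mpc", "luminosity_erg_s"])
        # one row per source
        writer.writerows(zip(names, distances, luminosities))

    # DictReader uses the header row as dictionary keys
    with open(filename, newline="") as f:
        rows = list(csv.DictReader(f))

print(len(rows))                       # 3
print(rows[0]["name"])                 # NGC 5128
# Fields are read as strings and must be converted explicitly
print(float(rows[0]["distance_mpc"]))  # 3.7
```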