# Working with files

Very often, you will be working with your own text files, stored locally on your computer.

Before we can start processing and analysing the data with Python, we need Python to open and "read" these files.

Python can work with files in many different formats (plain text, comma- or tab-separated values, JSON, etc.).

We will focus on unstructured data in this section (i.e. unformatted plain text).

## Knowing where you are

We are used to interacting with our computer via a graphical interface, but you can also navigate your file system using the command line (also called the terminal). Melanie Walsh's [course](https://melaniewalsh.github.io/Intro-Cultural-Analytics/01-Command-Line/01-The-Command-Line.html) has a good introduction to the command line.

<img src="images/terminalwindow.png" width="600" />

We only need to understand one command at the moment: `pwd`, which stands for "print working directory". It prints the path of the working directory or, in other words, the folder in which you are right now.

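Conveniently, Jupyter Notebook gives a way of accessing the command line without needing to use a terminal: if you add `!` at the beginning of a line in a code cell, whatever comes next will be passed to the system shell as if you had typed it in a terminal. Jupyter also has its own "magic" commands, which start with `%`; `%pwd` is one of them and does the same job as `pwd`.

Therefore, we can check which folder we are currently working in (normally the folder where this notebook is located) with `pwd`, like this: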

In [None]:
!pwd

The output of this will be different for each of us, because we all have different folder structures!
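
The same information is also available from within Python itself, without going through the command line. The snippet below is a minimal sketch using the standard library modules `os` and `pathlib`; the variable names are only illustrative.

```python
import os
from pathlib import Path

# Ask Python for the current working directory (the folder that `pwd` reports).
cwd_from_os = os.getcwd()          # returned as a plain string
cwd_from_pathlib = Path.cwd()      # returned as a Path object

print(cwd_from_os)
print(cwd_from_pathlib)
```

Both lines should print the same folder. As with `!pwd`, the exact output depends on your machine.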

## Paths

In order for Python to access a file, we need to indicate the location of the file we want to open, i.e. in which folder the file is located.

This can be done by providing the file's absolute path or its relative path.

* **Absolute path:** This is the complete "address" of the file or folder on your machine, starting from the root. The absolute path will be different on each machine. For example, on my machine, the absolute path to the file `tale_beginning.txt` is:
  ```
  /Users/myself/Documents/dhoxss-text2tech/Sessions/data/tale_beginning.txt
  ```

* **Relative path:** The location of a file or folder relative to the current working directory, which is normally the folder containing the Python script or notebook you are running. For example, from this notebook, the relative path to the file `tale_beginning.txt` is:
  ```
  data/tale_beginning.txt
  ```

  Note: If you need to go one level up in the folder structure, you can do it with `../`.
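
To see how a relative path relates to an absolute one, the following sketch uses `pathlib` from the standard library to turn a relative path into an absolute path. It assumes nothing about whether the file actually exists: it only combines the relative path with the current working directory.

```python
from pathlib import Path

# A relative path: it does not start from the root of the file system.
relative_path = Path("data/tale_beginning.txt")

# resolve() combines it with the current working directory
# to produce the corresponding absolute path.
absolute_path = relative_path.resolve()

print(relative_path.is_absolute())   # False
print(absolute_path.is_absolute())   # True
print(absolute_path)

# "../" means "one level up" and is resolved in the same way.
one_level_up = Path("../data/tale_beginning.txt").resolve()
print(one_level_up)
```

The absolute paths printed here will differ from machine to machine, while the relative path stays the same.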

## Opening and reading a file

Once you know the path to your file, you can open it using the following syntax:

```python
with open("path_to_file") as fr:
    text = fr.read()
```

Where:
* The first line opens the file (you should change `path_to_file` to the actual path to your file) and assigns the open file object to the variable `fr`. The `with` statement closes the file automatically once the indented block ends.
* The second line (note that it's indented) reads the whole content of the file from `fr` and stores it in the variable `text`.
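
The sketch below illustrates what the `with` statement does for us. To keep it self-contained, it first writes a small throwaway file in a temporary folder (the file name `sample.txt` and its content are made up for this illustration), then reads it back and checks whether the file is still open.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small throwaway file (hypothetical name and content).
    sample_path = Path(tmp_dir) / "sample.txt"
    sample_path.write_text("Once upon a time...\n", encoding="utf-8")

    with open(sample_path, encoding="utf-8") as fr:
        text = fr.read()
        print(fr.closed)   # False: we are still inside the `with` block

    print(fr.closed)       # True: the file was closed automatically
    print(text)            # the content is still available in `text`
```

Passing `encoding="utf-8"` is optional, but stating the encoding explicitly can avoid surprises when a text contains accented or other non-ASCII characters and the default encoding differs between machines.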

For example, to read the content of `tale_beginning.txt`:

In [None]:
# Read the content of "tale_beginning.txt":
with open("data/tale_beginning.txt") as fr:
    text = fr.read()

In [None]:
# What's the data type of variable `text`?
type(text)

In [None]:
# And print its contents:
print(text)

✏️ **Question:** Without running the cells: what do you think will happen in each case?

In [None]:
# Read the content of "tale_beginning.txt":
with open("../Sessions/data/tale_beginning.txt") as fr:
    text = fr.read()

print(text)

In [None]:
# Read the content of "tale_beginning.txt":
with open("../data/tale_beginning.txt") as fr:
    text = fr.read()

print(text)

In [None]:
# Read the content of "tale_beginning.txt":
with open("../../dhoxss-text2tech/Sessions/data/tale_beginning.txt") as fr:
    text = fr.read()

print(text)
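
Once you have made your predictions, one way to inspect a path before trying to open it is to ask Python where it points and whether a file exists there. This is only a sketch: `path_status` is a hypothetical helper written for this illustration, not part of any library.

```python
from pathlib import Path


def path_status(path_string):
    """Return the absolute path a string points to and whether a file exists there."""
    path = Path(path_string)
    return path.resolve(), path.is_file()


# Try it with any of the paths used in the cells above.
for candidate in ["data/tale_beginning.txt", "../data/tale_beginning.txt"]:
    resolved, exists = path_status(candidate)
    print(candidate, "->", resolved, "| file exists:", exists)
```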