# Learn Python File I/O

**April 2021**<br>

This lesson provides instruction, media content, examples, and links to resources that will help you build a foundation in Python. Jupyter Notebooks are available on [Google Colab](https://colab.research.google.com/drive/1gZGzJcaZ4un_4qc68SqceBKInRbBHIEp?usp=sharing) and [GitHub](https://github.com/AlphaWaveData/Jupyter-Notebooks/blob/master/Learn%20Python%20File%20IO.ipynb).

<b>Web Resources</b>
<br> <a href='https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files'>Docs.python.org - Reading and Writing Files</a>


#### <b>What is I/O?</b>

I/O, or input/output, is communication between a computer and the outside world.

Inputs are signals received by the computer. The computer can get inputs from hardware such as a keyboard and mouse, or from other computers via the internet. Outputs are signals sent by the computer. Your monitor is probably the most obvious output device. An internet modem is an example of a device that both receives inputs (web pages loading in your browser) and sends outputs (outbound emails).

     
#### <b>Why do we use I/O?</b>

You can always define data directly in your Python code:

```python
example_data = [1, 2, 3, 4, 5]
answer = sum(example_data)  # stands in for any function that processes the data
```


This is a great way to test functionality, but it isn't sustainable in the long term. It is much easier to read data from a file or API and to write results back to files. Doing so saves us from defining data by hand and lets our solutions scale to arbitrary datasets.
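As a minimal sketch of this idea, the snippet below writes a list of numbers to a text file and reads it back with Python's built-in `open`. The file name `example_data.txt` and the temporary directory are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path

example_data = [1, 2, 3, 4, 5]

# Hypothetical file name, placed in a temporary directory for this demo.
file_path = Path(tempfile.mkdtemp()) / "example_data.txt"

# Write one number per line; the `with` block closes the file automatically.
with open(file_path, "w") as f:
    for value in example_data:
        f.write(f"{value}\n")

# Read the numbers back and convert each line to an integer.
with open(file_path) as f:
    loaded_data = [int(line) for line in f]

print(loaded_data)  # [1, 2, 3, 4, 5]
```

Using `with` is a common choice here because the file is closed even if an error occurs inside the block.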

## Files

Common file types you likely interact with are CSV and Excel files. The pandas library has a number of convenience functions for reading data from these files and converting them directly into dataframes.

```python
import pandas as pd

csv_dataframe = pd.read_csv('csv_file.csv')
excel_dataframe = pd.read_excel('excel_file.xlsx')
```

Each function accepts the path of the data file as its first argument. A file path without a `/`, such as `csv_file.csv`, is resolved relative to the current working directory, which in Jupyter is normally the directory containing the notebook. To read a file inside a folder, include the folder in the path, e.g. `folder/csv_file.csv`.
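Relative paths like these can also be built with the standard library's `pathlib`. The sketch below reuses the illustrative folder and file names from the paragraph above and does not read anything from disk.

```python
from pathlib import Path

# Relative path to a file that sits next to the notebook.
same_directory = Path("csv_file.csv")

# Relative path to a file inside a subfolder; `/` joins path components.
in_folder = Path("folder") / "csv_file.csv"

print(same_directory.parent)  # "." means the current directory
print(in_folder.parent)       # folder
print(in_folder.as_posix())   # folder/csv_file.csv
```

The pandas reader functions accept `Path` objects as well as plain strings, so either form can be passed to `pd.read_csv`.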

Both functions accept a number of optional parameters that control more precisely how the data is read. Visit the [I/O section](https://pandas.pydata.org/docs/reference/io.html) of the online pandas documentation to learn more about these functions.
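To give a feel for these optional parameters, here is a small sketch that uses three of them: `sep`, `usecols`, and `nrows`. The CSV text is made-up sample data, and `io.StringIO` stands in for a real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up, semicolon-separated sample data (not from a real file).
csv_text = """name;price;rating
Bond A;101.5;AA
Bond B;99.2;A
Bond C;100.1;BBB
"""

df = pd.read_csv(
    io.StringIO(csv_text),      # a file path would normally go here
    sep=";",                    # columns are separated by semicolons
    usecols=["name", "price"],  # keep only these two columns
    nrows=2,                    # read only the first two data rows
)

print(df)
```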

We can use the `%pwd` [Jupyter magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html) to find the current working directory. Running the following code snippet in a notebook cell will print its path:

```python
x = %pwd
print(x)
```
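Note that `%pwd` only works inside IPython or Jupyter. In a plain Python script, a rough equivalent can be sketched with the standard library:

```python
import os
from pathlib import Path

# Both calls return the current working directory of the running process.
current_dir = os.getcwd()   # as a string
current_path = Path.cwd()   # as a Path object

print(current_dir)
```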

### Resources
Python has a rich set of packages and functions for working with files. [This online tutorial](https://www.programiz.com/python-programming/file-operation) explains file I/O in Python in more detail.

### File IO Examples

1. You're writing some code inside `notebook.ipynb` and want to read the data from `bonds.csv` which is inside the `data` folder (see below). What file path do you need to specify to read the data into a dataframe: `pd.read_csv(...)`?

```
notebook.ipynb
data/
└── bonds.csv
```
`pd.read_csv('data/bonds.csv')`

2. You have a CSV with 3 blank rows at the top of the file. What parameter do you need to pass to the `read_csv` function to ignore these blank rows when reading in the data?

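`skiprows=3`

Note that `read_csv` already skips completely empty lines by default (`skip_blank_lines=True`), so `skiprows` matters most when the rows contain only delimiters or notes.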

3. A colleague has sent you some tabular data in a [JSON file](https://www.w3schools.com/js/js_json_intro.asp) that you've moved into the same folder as your notebook. You can use pandas to read this data into a dataframe:

`pd.read_json`
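The answers to examples 2 and 3 can be tried out with the sketch below. All of the data is made up, and `io.StringIO` stands in for real files so that the snippet is self-contained. With real files, you would pass a path instead (for instance a hypothetical `bonds.json`).

```python
import io

import pandas as pd

# Example 2: three rows holding only delimiters sit above the header row.
csv_text = ",,\n,,\n,,\nname,price,rating\nBond A,101.5,AA\n"
csv_df = pd.read_csv(io.StringIO(csv_text), skiprows=3)
print(csv_df)

# Example 3: tabular JSON, written as a list of records (one object per row).
json_text = (
    '[{"name": "Bond A", "price": 101.5},'
    ' {"name": "Bond B", "price": 99.2}]'
)
json_df = pd.read_json(io.StringIO(json_text))
print(json_df)
```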