# Files

This topic will cover:
- opening files in different modes
- reading and writing to files 
- using different file formats

I will cover ``pandas`` in a later topic. Pandas is very useful for reading and writing Excel files and for manipulating data in a table format.

##### Before we start:
You can run a shell command in a notebook by prefixing it with ``!``.
So to see what is in this directory, use ``!dir`` on Windows or ``!ls`` on Mac or Linux.

There is also a "magic" command, ``%ls``, that will work on all operating systems (``%pwd`` will show the present working directory).

In [44]:
# this will run on all os
%ls

 Volume in drive C is OS
 Volume Serial Number is 2842-6AAD

 Directory of C:\Users\ABeatty\Desktop\pands2022\jupyternotebooks

27/02/2022  12:58    <DIR>          .
27/02/2022  12:58    <DIR>          ..
24/02/2022  15:56    <DIR>          .ipynb_checkpoints
11/02/2022  11:32    <DIR>          fig
27/02/2022  12:52                35 storeData.json
15/02/2022  11:28            21,944 topic03-variables.ipynb
11/02/2022  14:04            10,182 topic04-controlling the flow.ipynb
15/02/2022  11:28            24,043 topic05-data structures.ipynb
21/02/2022  20:58            31,797 Topic06-Defining-Functions.ipynb
27/02/2022  12:58            14,352 Topic07 files.ipynb
               6 File(s)        102,353 bytes
               4 Dir(s)  149,930,385,408 bytes free


## Opening files

Files should be opened with the ``with open(filename, mode) as f:`` statement.


For example, open a text file for writing. If you run this cell it will create a file in the same directory as the notebook. You can run the dir/ls command above before and after you run the ``open()`` to see whether test.txt is there.

In [45]:
filename = "test.txt"
with open(filename, 'wt') as f:
    f.write("hello world")

This will take care of closing the file once execution leaves the ``with`` block, even if an error occurs.

The old way of just using the ``open()`` function is not advised.

```
f = open(filename, 'wt')
f.write("hello World")
f.close()
```
While the code above looks like it does the same as the ``with`` pattern, the file will not be closed if ``f.write`` throws an error.
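
To make the difference concrete, here is a rough sketch of what the ``with`` pattern does for you, written out by hand with ``try``/``finally``. It is only an illustration of the idea, not a replacement for ``with``; it reuses the ``test.txt`` name from the example above.

```python
filename = "test.txt"  # same example file name as above

f = open(filename, "wt")
try:
    f.write("hello world")
finally:
    # this line runs whether or not write() raised an error,
    # so the file is always closed
    f.close()

print(f.closed)  # True
```

The ``with`` version is shorter and harder to get wrong, which is why it is the advised pattern.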


Once you have opened a file you can use the following functions to read from and write to it (depending on the mode).

| Function | Description |
| ---------------: | :----------------------------- |
| ``read()`` | Reads the rest of the file and returns it as one string; ``read(size)`` reads at most ``size`` characters (bytes in binary mode) |
| ``readline()`` | Reads the next line from the file and returns an empty string if the end of the file is reached. Be aware that the string this returns will usually end with a ``\n`` newline character |
| ``readlines()`` | Returns all the remaining lines as a list |
| ``for l in f:`` | Or you can use this pattern to loop through all the lines of a file |
| ``write(data)`` | Writes data to the file |
| ``print(data, file = f)`` | Or you can use the print function |
| ``seek(offset)`` | Moves the file pointer to the position ``offset`` bytes from the start of the file |

More information in the python documentation https://docs.python.org/3/library/functions.html#open
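
As a small sketch of how these read functions behave together, the cell below writes three lines to a file and reads them back in different ways. The file name ``lines_demo.txt`` is made up for this illustration.

```python
demo_file = "lines_demo.txt"  # hypothetical file name, used only for this sketch

with open(demo_file, "wt") as f:
    f.write("first\nsecond\nthird\n")

with open(demo_file, "rt") as f:
    first_line = f.readline()  # reads one line, including its newline character
    rest = f.readlines()       # reads the remaining lines into a list
    at_end = f.readline()      # nothing left to read, so this is an empty string
    f.seek(0)                  # move the file pointer back to the start
    everything = f.read()      # reads the whole file as one string

print(repr(first_line))
print(rest)
print(repr(at_end))
print(repr(everything))
```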

Which of these functions you can call depends on the mode the file was opened in.

### The various modes in open()

| Mode | Description |
|---:|:---|
| 'a' | Writes will append to the end of the file (the file is created if it does not exist) |
| 'r' | Read mode, cannot write (throws an error if the file does not exist) |
| 'r+' | Read and write to an existing file (throws an error if the file does not exist) |
| 'w' | Write mode (no read); creates the file, and this will delete the contents of the file if it already exists |
| 'w+' | Write and read; creates the file, deleting the contents of the old one |
| 'x' | Creates the file; this will throw an error if the file already exists |

#### Type of file
| Letter | Type of file |
|---:|:---|
| 't' | Text file |
| 'b' | File with binary data (e.g. a JPEG) |
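
Here is a minimal sketch that tries a few of these modes against one file. It works inside a temporary directory (via the standard ``tempfile`` module) so it does not touch your notebook folder; the file name ``modes_demo.txt`` and the variable names are made up for this illustration.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(path, "wt") as f:      # 'w' creates the file
        f.write("one\n")
    with open(path, "at") as f:      # 'a' adds to the end of the existing file
        f.write("two\n")
    with open(path, "rt") as f:
        appended = f.read()

    try:
        with open(path, "xt") as f:  # 'x' refuses because the file already exists
            f.write("never written")
        x_failed = False
    except FileExistsError:
        x_failed = True

    try:
        with open(path, "rt") as f:  # 'r' is read only, so write() is not allowed
            f.write("not allowed")
        write_blocked = False
    except io.UnsupportedOperation:
        write_blocked = True

    with open(path, "wt") as f:      # 'w' again wipes the old contents
        f.write("three\n")
    with open(path, "rt") as f:
        overwritten = f.read()

print(repr(appended), x_failed, write_blocked, repr(overwritten))
```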




### Some examples

Open the text file we created above and output its contents. It is a good idea to specify whether the file is text or binary.

In [24]:
with open('test.txt', 'rt') as f:
    for line in f: # read each line, one at a time
        print (line)
    

hello world


Try to open a file that does not exist in read/write mode. (This throws an error.)

In [25]:
with open('nofileofthisname.txt', 'r+t') as f:
    print (f.read())

FileNotFoundError: [Errno 2] No such file or directory: 'nofileofthisname.txt'

Let's overwrite the file we created and see its new contents (it will be empty).

In [26]:
with open('test.txt', 'w+t') as f:
    for line in f:
        print (line)
    print ("The file is now empty")

The file is now empty
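
The ``'w+'`` mode also lets you read back what you have just written, as long as you move the file pointer back to the start first. The sketch below shows this in a temporary directory; the file name ``wplus_demo.txt`` is made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "wplus_demo.txt")  # hypothetical file name

    with open(path, "w+t") as f:
        f.write("hello again")
        f.seek(0)            # go back to the start, otherwise read() returns ''
        contents = f.read()

print(contents)  # hello again
```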


## The ``os`` module

The ``os`` module is a built-in module that provides miscellaneous operating system interfaces.
It can be used for manipulating the file system. There are a lot of functions in it; I will only be looking at a few. See the references below.

### Deleting a file
Use the os module and its ``remove()`` function.


In [1]:
import os
os.remove('test.txt')  # this will remove the test.txt file

For directories use the ``rmdir()`` function (only on empty directories)
```
import os
os.rmdir('directory_name') # removes an empty directory
```

### To check if a file exists
Use ``os.path.exists()``.

In [28]:
# I am assuming that os is already imported
filename = 'test.txt'
if os.path.exists(filename):
    print (filename, "already exists")
else:
    print(filename, "does not exist do you want to create it?")
    

test.txt does not exist do you want to create it?
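
The two ideas above (checking and deleting) are often combined so that a delete does not throw an error when the file is missing. The helper name ``remove_if_exists`` and the file name ``scratch_demo.txt`` below are my own, made up for this sketch.

```python
import os

def remove_if_exists(path):
    """Delete the file at path if it is there; return True if it was deleted."""
    # note: this expects a file path, os.remove() will not delete a directory
    if os.path.exists(path):
        os.remove(path)
        return True
    return False

scratch = "scratch_demo.txt"  # hypothetical file name, used only for this sketch
with open(scratch, "wt") as f:
    f.write("temporary")

first = remove_if_exists(scratch)   # the file is there, so it is deleted
second = remove_if_exists(scratch)  # the file is already gone
print(first, second)  # True False
```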


More on this in the Real Python article and the Python documentation on ``os`` (see References).

## File Formats
Files can have different formats. Here are a few examples of different formats.

| Format | Description | Example |
| --- | :--- | :--- |
| text | Simple text format, easy for a human to read, not easy to parse the data from | The rain in Spain falls mainly in the plain |
| JSON | JavaScript Object Notation, very useful for passing and storing dictionary objects | {"weather": {"area": "Spanish plain", "forecast": "rain"}} |
| CSV | Comma-separated values, useful for storing tables of data | spain, plain, rain |
| TSV | Tab-separated values, same as CSV except with tabs | |
| Pickle | A binary file format used for storing Python objects | hard to read by humans |
| XML | Extensible Markup Language, used for storing data | ``<weather> <area> Spanish plain </area> <forecast> rain </forecast> </weather>`` |
| XLS | Excel spreadsheet, this is in a binary format | hard to read by humans |

There are many more formats.
Python has packages (modules) that are designed to help you manipulate files in different formats, e.g.

``import csv``: For manipulating CSV and TSV files

``import json``: For manipulating JSON files

``import pandas``: For Excel spreadsheets and SQL database tables, this is very handy for data analysis, see week 10

There are of course many more


### JSON 
The ``json`` module is part of Python's standard library, so there is nothing extra to install.
Here are two scripts:
1. Saves a dict object to a file called storeData.json. Note: we open the file in write mode, so the file will be created or overwritten each time we save.
2. Reads the file and prints some of the data.

Navigate to this directory and see what is in storeData.json.

See the references for more information on the JSON module.

In [2]:
import json

electricBill = {
    'name' : 'Andrew',
    'amount' : '999'
}

with open("storeData.json", "wt") as f:
    json.dump(electricBill, f) # writes the dictionary object to the file as a JSON object



In [3]:
# I am assuming that json has already been imported

# assuming that the file storeData.json exists and contains JSON
with open("storeData.json", "rt") as f:
    readDict = json.load(f) # reads the file and converts the JSON object into a dictionary
    print (readDict["name"])

Andrew
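
JSON is not limited to strings. The example above stores the amount as the string ``'999'``, but numbers, booleans, lists and nested dictionaries also survive the round trip. The sketch below uses ``json.dumps()`` and ``json.loads()``, which work on strings instead of files, so nothing is written to disk; the extra keys (``paid``, ``readings``) are made up for the illustration.

```python
import json

bill = {
    "name": "Andrew",
    "amount": 999,            # a number this time, not a string
    "paid": False,            # hypothetical extra fields for this sketch
    "readings": [120, 135],
}

text = json.dumps(bill, indent=2)  # convert the dictionary to a JSON string
restored = json.loads(text)        # and back to a dictionary

print(text)
print(restored["amount"] + 1)  # 1000
```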


### CSV
The ``csv`` module is also part of Python's standard library, so there is nothing extra to install.

CSV files are slightly more complicated to manipulate. I have made a tutorial video showing how you could extract an email domain from a list of email addresses stored in a CSV file, see learnonline.
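
As a rough sketch of that kind of task (not necessarily the approach taken in the video), the cell below writes a small CSV file and then reads it back to pull out the domain of each email address. The names, addresses and the ``emails.csv`` file name are all made up, and the file lives in a temporary directory.

```python
import csv
import os
import tempfile

# made-up data for this sketch: a header row followed by two records
rows = [
    ["name", "email"],
    ["Ann", "ann@example.com"],
    ["Bob", "bob@example.org"],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "emails.csv")  # hypothetical file name

    # newline="" is what the csv documentation recommends when opening csv files
    with open(path, "wt", newline="") as f:
        csv.writer(f).writerows(rows)

    domains = []
    with open(path, "rt", newline="") as f:
        for record in csv.DictReader(f):  # each record is a dict keyed by the header row
            domains.append(record["email"].split("@")[1])

print(domains)  # ['example.com', 'example.org']
```

For TSV files the same code should work if you pass ``delimiter="\t"`` to both the writer and the reader.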

### References

- W3schools on files : https://www.w3schools.com/python/python_file_handling.asp
- realpython on file manipulation and the OS module: https://realpython.com/working-with-files-in-python/
- python documentation : https://docs.python.org/3/library/os.html#files-and-directories
