# Files

This topic will cover:
- opening files in different modes
- reading and writing to files
- using different file formats

I will cover ``pandas`` in a later topic. Pandas is very useful for reading and writing Excel files and for manipulating data in a table format.

##### Before we start:
You can run a shell command in a notebook by prefixing it with ``!``.
So to see what is in this directory use ``!dir`` on Windows, or ``!ls`` on Mac or Linux.

There is also a "magic" command, ``%ls``, that will work on every OS (``%pwd`` will show the present working directory).

In [1]:
!ls # this does not run on my windows machine

'ls' is not recognized as an internal or external command,
operable program or batch file.


In [28]:
# this will run on all os including my windows machine
%ls

 Volume in drive C is OS
 Volume Serial Number is FC43-C010

 Directory of C:\Users\ABeatty\OneDrive - Atlantic TU\pands-course-material\jupyternotebooks

04/03/2024  15:55    <DIR>          .
05/02/2024  16:45    <DIR>          ..
04/03/2024  15:49    <DIR>          .ipynb_checkpoints
04/03/2024  15:53                62 data.csv
20/02/2024  13:32    <DIR>          fig
04/03/2024  15:49                37 storeData.json
04/03/2024  15:49                12 test.txt
04/03/2024  15:55            21,885 topic03-variables.ipynb
04/03/2024  15:56            11,385 topic04-controlling the flow.ipynb
04/03/2024  15:57            24,064 topic05-data structures.ipynb
04/03/2024  15:57            31,794 topic06-Functions.ipynb
04/03/2024  15:55            21,242 topic07-files.ipynb
               8 File(s)        110,481 bytes
               4 Dir(s)  111,029,784,576 bytes free
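
If you would rather not rely on shell or magic commands, a similar listing can be produced from Python itself with the ``os`` module (covered later in this topic). This is only a minimal sketch, assuming you just want the names in the current working directory; the variable names are my own.

```python
import os

# Show the present working directory (similar to %pwd)
current_dir = os.getcwd()
print(current_dir)

# List the names of the files and directories in it (similar to %ls)
entries = sorted(os.listdir(current_dir))
for name in entries:
    print(name)
```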


## Opening files and manipulating files

Files should be opened with the ``with open(filename, mode) as f:`` statement.


In [4]:
FILENAME = "test.txt"
with open(FILENAME) as f:
    contents = f.read()  # avoid calling the variable str, that would hide the built-in str type
    print(contents)

hello world


This will open a file called test.txt for reading (it will throw an error if the file does not exist).
As you can see, I ran this after opening the file in write mode below.

##### Opening a file in write mode (this will create the file)
For example, open a text file for writing. If you run this cell it will create a file in the same directory as the notebook. You can run the dir/ls command above before and after you run the ``open()`` to see whether test.txt is there.

In [12]:
FILENAME = "test.txt"
with open(FILENAME, 'wt') as f:
    f.write("hello world2")

The ``with`` statement will take care of closing the file once execution leaves the indented block, even if an error occurs.

##### The old way (you may see this in some sample code on the internet)
The old way of just using the ``open()`` function on its own is not advised.

```
f = open(FILENAME, 'wt')
f.write("hello World")
f.close()
```
While the code above looks like it does the same as the ``with`` pattern, the file will not be closed if the ``f.write`` throws an error.
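
To get the same safety without ``with``, the close has to go in a ``finally`` block. The sketch below is roughly what the ``with`` pattern does for you. The temporary directory and the file name are assumptions of mine, used so that nothing in your own folder is touched.

```python
import os
import tempfile

# Hypothetical file in a temporary directory
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "test.txt")

f = open(filename, "wt")
try:
    f.write("hello world")
finally:
    # this runs whether or not write() threw an error
    f.close()

print(f.closed)  # True
```

This is more typing than the ``with`` version and easier to get wrong, which is why ``with`` is the advised pattern.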

### The various modes in open()
The default mode when you open a file is read (which will throw an error if the file does not exist).
But you can open the file in other modes if you wish (e.g. depending on whether you want the file to be created or not).

#### Mode
| Mode | Description |
|---:|:---|
| 'a' | Append mode: writes are added to the end of the file (the file is created if it does not exist) |
| 'r' | Read mode, cannot write (throws an error if the file does not exist) |
| 'r+' | Read and write to an existing file (throws an error if the file does not exist) |
| 'w' | Write mode (no read); creates the file, and empties it if it already exists |
| 'w+' | Write and read; creates the file, and empties it if it already exists |
| 'x' | Creates the file; throws an error if the file already exists |

#### Type of file
| Letter | Type of file |
|---:|:---|
| 't' | Text file |
| 'b' | File with binary data (e.g. jpeg) |
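
The difference between the modes is easiest to see by trying them. Here is a small sketch that uses ``'x'`` twice on the same file and then ``'a'`` to add a line. The file name and the temporary directory are hypothetical, chosen so the example does not touch your own files.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "modes.txt")  # hypothetical file name

# 'x' creates the file, and fails if it is already there
with open(filename, "xt") as f:
    f.write("first line\n")

try:
    with open(filename, "xt") as f:
        f.write("this is never written\n")
except FileExistsError:
    print("the file already exists, so 'x' refused to open it")

# 'a' keeps the existing contents and adds to the end
with open(filename, "at") as f:
    f.write("second line\n")

with open(filename, "rt") as f:
    contents = f.read()
print(contents)
```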





### Functions for file manipulation

Once you have opened a file you can use the following functions to read from and write to the file (depending on the mode).

| Function | Description |
| ---------------: | :----------------------------- |
| ``read()`` | Reads the rest of the file and returns it as one string; ``read(n)`` reads at most n characters (or bytes in binary mode) |
| ``readline()`` | Reads the next line from the file and returns an empty string if the end of the file is reached. Be aware that the string this returns will have a ``\n`` newline character at the end |
| ``readlines()`` | Returns all the lines as a list |
| ``for l in f:`` | Or you can use this pattern to loop through all the lines of a file |
| ``write(data)`` | Writes data to the file |
| ``print(data, file = f)`` | Or you can use the print function |
| ``seek(offset)`` | Moves the file pointer to offset bytes from the start of the file |

More information is in the Python documentation: https://docs.python.org/3/library/functions.html#open

Which of these functions you can call depends on the mode the file was opened in.
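
The following sketch tries several of these functions on one small file. The file name and its contents are made up for the illustration, and the file is written to a temporary directory.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "lines.txt")  # hypothetical file name

with open(filename, "wt") as f:
    f.write("one\n")
    print("two", file=f)      # print adds the newline for us
    f.write("three\n")

with open(filename, "rt") as f:
    first = f.readline()      # the first line, including its newline character
    rest = f.readlines()      # the remaining lines as a list
    f.seek(0)                 # go back to the start of the file
    everything = f.read()     # the whole file as one string

print(repr(first))       # 'one\n'
print(rest)              # ['two\n', 'three\n']
print(repr(everything))  # 'one\ntwo\nthree\n'
```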

### Some examples

Open the text file we created above and output its contents. It is a good idea to specify whether the file is text or binary.

In [6]:
with open('test.txt', 'rt') as f:
    for line in f: # read each line, one at a time
        print (line)
    

hello world2


Try to open a file that does not exist in read/write mode. (This throws an error.)

In [7]:
with open('nofileofthisname.txt', 'r+t') as f:
    print (f.read())

FileNotFoundError: [Errno 2] No such file or directory: 'nofileofthisname.txt'

Let's overwrite the file we created, and see its new contents (it will be empty).

In [11]:
with open('test.txt', 'w+t') as f:
    for line in f:
        print (line)
    print ("The file is now empty")

The file is now empty
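
The ``'w+'`` mode becomes more useful once you write something and then want to read it back through the same file object. After a write the file pointer sits at the end of the file, so you need ``seek(0)`` first. A minimal sketch, with a hypothetical file name in a temporary directory:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "scratch.txt")  # hypothetical file name

with open(filename, "w+t") as f:
    f.write("hello again\n")
    f.seek(0)        # move back to the start before reading
    text = f.read()

print(text, end="")  # hello again
```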


## The ``os`` module

The ``os`` module is a built-in module that provides miscellaneous operating system interfaces.
It can be used for manipulating the file system. There are a lot of functions in it, and I will only be looking at a few. See the references below.

### Deleting a file
Use the os module and its ``remove()`` function.


In [9]:
import os
os.remove('test.txt')  # this will remove the test.txt file

For directories use the ``rmdir()`` function (only on empty directories)
```
import os
os.rmdir('directory_name') # removes an empty directory
```

### To check if a file exists 
Use ``os.path.exists()``.

In [10]:
# I am assuming that os is already imported
filename = 'test.txt'
if os.path.exists(filename):
    print (filename, "already exists")
else:
    print(filename, "does not exist do you want to create it?")
    

test.txt does not exist do you want to create it?
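
An alternative to checking first is to attempt the operation and handle the error if the file is not there. The sketch below tries to remove the same file twice; the second attempt finds nothing to remove. The file name and temporary directory are assumptions for the example.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "test.txt")  # hypothetical file

with open(filename, "wt") as f:
    f.write("hello world")

results = []
for attempt in range(2):
    try:
        os.remove(filename)
        results.append("removed")
    except FileNotFoundError:
        # raised by os.remove when the file does not exist
        results.append("nothing to remove")

print(results)  # ['removed', 'nothing to remove']
```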


There is more on this in the Real Python article and the Python documentation on ``os``, both listed in the references below.

## File Formats
Files can have different formats. Here are a few examples:

| Format | Description | Example |
| --- | :--- | :--- |
| text | Simple text format, easy for a human to read, not easy to parse the data from | The rain in Spain falls mainly in the plain |
| JSON | JavaScript Object Notation, very useful for passing and storing dictionary objects | {"weather": {"area":"Spanish plain", "forecast": "rain"}} |
| CSV | Comma separated values, useful for storing tables of data | spain, plain, rain |
| TSV | Tab separated values, same as CSV except with tabs | |
| Pickle | A binary file format used for storing Python objects (variables and functions) | hard to read by humans |
| XML | Extensible Markup Language, used for storing data | ```<weather> <area> Spanish plain </area> <forecast> rain </forecast> </weather>``` |
| XLS | Excel spreadsheet, this is in a binary format | hard to read by humans |

There are many more formats.
Python has packages (modules) that are designed to help you manipulate files in different formats, e.g.

``import csv``: for manipulating CSV and TSV files

``import json``: for manipulating JSON files

``import pandas``: for Excel spreadsheets and SQL database tables; this is very handy for data analysis, see week 10

There are of course many more.


### JSON 
**JavaScript Object Notation**

The format of JSON looks very like a dict object, except that strings always use ``"`` (double quotes), never ``'`` (single quotes),
e.g.
````
{
    "name" : "Andrew",
    "modules":[
        {"subject": "Math", "grade": 77},
        {"subject": "PANDS", "grade": 46}
    ]
    
}
````
|  | |  |
| :--- | :---: | :--- |
| Strings | have | ````""````|
| Integers | have | no quotes |
| Objects | have | ````{}```` |
| Lists | have | ```[]``` |


There is a module called ``json`` that helps with manipulation (it is part of the Python standard library, so there is nothing extra to install):
- ``dump()`` saves a dict as JSON into a file
- ``load()`` reads JSON from a file and returns a dict object

Here are two scripts:
1. Saves a dict object to a file called storeData.json. Note: we open the file in write mode, so the file will be created or overwritten each time we save.
2. Reads the file and prints some of the data.

Navigate to this directory and see what is in storeData.json.

See the references for more information on the ``json`` module.

In [13]:
import json

electricBill = {
    'name' : 'Andrew',
    'amount' : '99999'
}

with open("storeData.json", "wt") as f:
    json.dump(electricBill, f) # writes the dictionary object to the file as a JSON object



In [24]:
FILENAME="storeData.json"
with open(FILENAME, "rt") as file:
    for line in file:
        print (line, end='')

{"name": "Andrew", "amount": "99999"}

In [19]:
# I am assuming that json has already been imported

# assuming that the file storeData.json exists and contains JSON
with open("storeData.json", "rt") as f:
    readDict = json.load(f) # reads the file and converts the JSON object into a dictionary
    print (readDict["amount"])

99999
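
In the example above the amount was stored as a string, so it came back as a string. JSON can also hold numbers, lists, true/false and null. The sketch below shows how some common Python types are written. It uses ``json.dumps()`` and ``json.loads()``, which are the string versions of ``dump()`` and ``load()``, and the ``record`` dictionary is made up for the illustration.

```python
import json

record = {
    "name": "Andrew",        # str  -> JSON string
    "grade": 77,             # int  -> JSON number
    "modules": ["Math"],     # list -> JSON array
    "passed": True,          # bool -> true / false
    "notes": None,           # None -> null
}

as_json = json.dumps(record)
print(as_json)
# {"name": "Andrew", "grade": 77, "modules": ["Math"], "passed": true, "notes": null}

# loads() goes the other way and gives back an equal dictionary
round_trip = json.loads(as_json)
print(round_trip == record)  # True
```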


The docs on the ``json`` module: https://docs.python.org/3/library/json.html

### CSV 
The format is more basic than JSON, so it takes a little more work to manipulate.

**Comma Separated Values** (the delimiter could also be a tab or any other character),
i.e. it is a file that has data like this:
```
first, last, age
Andrew, Beatty, 21
Mary, Jones, 33
john, McGuire, 44
```
One way to extract this data is to use the ``csv`` module (it is part of the Python standard library).


##### Reading
Use ``csv.reader`` (the default delimiter is a comma).
When you use the ``for line in reader`` pattern,
``line`` will contain a list of the values on each line.

In [23]:
FILENAME="data.csv"
with open(FILENAME, "rt") as file:
    for line in file:
        print (line, end='')

first,last,age
Andrew,Beatty,2
joe,Bloggs,22
Mary,mc,2222


In [21]:
import csv
FILENAME="data.csv"
with open(FILENAME, "rt") as file:
    csvReader = csv.reader(file, delimiter = ',') # delimiter can be anything, in this case a comma
    for line in csvReader: # line will be a list containing the variables in each line
        age = line[2]   # the age
        print(age)      # note this is printing the header row, I provide a solution to this in the tutorial

age
2
22
2222
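
One possible way to avoid printing the header row (not necessarily the one used in the tutorial) is ``csv.DictReader``, which uses the first row as the keys for every following row. In this sketch an ``io.StringIO`` object stands in for data.csv so that it runs on its own.

```python
import csv
import io

# Stand-in for data.csv so the sketch does not depend on the file existing
sample = io.StringIO("first,last,age\nAndrew,Beatty,2\njoe,Bloggs,22\nMary,mc,2222\n")

reader = csv.DictReader(sample)   # the first row becomes the dictionary keys
ages = []
for row in reader:
    ages.append(int(row["age"]))  # values are read as strings, so convert them

print(ages)  # [2, 22, 2222]
```

With ``csv.reader`` you could instead call ``next(csvReader)`` once before the loop to skip the first row.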


##### Writing
Writing a CSV can be a little more complicated.
The ``csv`` module can write dictionaries, lists of lists and other formats.
I will do more of this in later weeks.

NOTE: ``newline=''`` fixes an issue on Windows where an extra carriage return is put in at the end of each row,
i.e. a blank line appears between rows.

In [26]:
mydict =[{'first': 'Andrew', 'last': 'Beatty', 'age':'2'},
         {'first': 'joe', 'last': 'Bloggs', 'age':'22'},
         {'first': 'Mary', 'last': 'mc', 'age':'2222'} 
        ] 
    
# field names 
fields = ['first', 'last', 'age'] 
    
# name of csv file 
FILENAME = "data.csv"
    
# writing to csv file 
with open(FILENAME, 'w', newline='') as csvfile: 
    # creating a csv dict writer object 
    writer = csv.DictWriter(csvfile, fieldnames = fields) 
        
    # writing headers (field names) 
    writer.writeheader() 
    for dictrow in mydict:
        print(dictrow)
        writer.writerow(dictrow) 

{'first': 'Andrew', 'last': 'Beatty', 'age': '2'}
{'first': 'joe', 'last': 'Bloggs', 'age': '22'}
{'first': 'Mary', 'last': 'mc', 'age': '2222'}
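
For a list of lists, ``csv.writer`` can be used instead of ``csv.DictWriter``. The sketch below also sets the delimiter to a tab, which gives a TSV file, and then reads the file back. The file name and the rows are assumptions for the example.

```python
import csv
import os
import tempfile

rows = [
    ["first", "last", "age"],
    ["Andrew", "Beatty", 2],
    ["joe", "Bloggs", 22],
]

tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "data.tsv")  # hypothetical file name

# writerows() writes every inner list as one row
with open(filename, "w", newline="") as tsvfile:
    writer = csv.writer(tsvfile, delimiter="\t")
    writer.writerows(rows)

# read it back; note that every value comes back as a string
with open(filename, "rt", newline="") as tsvfile:
    read_back = list(csv.reader(tsvfile, delimiter="\t"))

print(read_back)
```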


There is much more on this, but that is for future weeks.
I have made a tutorial video showing how you could extract an email domain from a list of email addresses stored in a CSV file, see VLE.

### References

- W3schools on files : https://www.w3schools.com/python/python_file_handling.asp
- realpython on file manipulation and the OS module: https://realpython.com/working-with-files-in-python/
- python documentation : https://docs.python.org/3/library/os.html#files-and-directories
- JSON Module: https://docs.python.org/3/library/json.html
- JSON Tutorial in real python: https://realpython.com/python-json/
- CSV Module: https://docs.python.org/3/library/csv.html
- CSV Tutorial in Real Python: https://realpython.com/python-csv/
