# <font color=darkred>Laboratory 9: Working with Files </font>

In [None]:
# Preamble script block to identify host, user, and kernel
import sys
! hostname
! whoami
print(sys.executable)
print(sys.version)
print(sys.version_info)

## Full name: 
## R#: 
## Title of the notebook:
## Date:
___

![](https://www.winzip.com/static/wz/images/learn/features/archive-file/archive-file.png) <br>


### <font color=purple>Background</font>

A computer file is a resource for recording data discretely (meaning as a separate, distinct unit stored somewhere on a piece of hardware, not "discreetly" in the secretive sense) on a computer storage device. Just as words can be written on paper, information can be written to a computer file. Files can be edited on the computer system where they are stored and transferred to other systems, for example through the internet.

There are different types of computer files, designed for different purposes. A file may be designed to store a picture, a written message, a video, a computer program, or a wide variety of other kinds of data. Some types of files can store several types of information at once.

By using computer programs, a person can open, read, change, save, and close a computer file. Computer files may be reopened, modified, and copied an arbitrary number of times.

Typically, files are organised in a file system, which keeps track of where the files are located on disk and enables user access. 

### <font color=purple>File system</font>

In computing, a file system (or filesystem) controls how data is stored and retrieved. 
Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stops and the next begins. 
By separating the data into pieces and giving each piece a name, the data is isolated and identified. 
Taking its name from the way paper-based data management systems are named, each group of data is called a “file”. 
The structure and logic rules used to manage the groups of data and their names are called a “file system”.

### <font color=purple>Path</font>

A path, the general form of the name of a file or directory, specifies a unique location in a file system. A path points to a file system location by following the directory tree hierarchy expressed in a string of characters in which path components, separated by a delimiting character, represent each directory. The delimiting character is most commonly the slash ("/"), the backslash ("\"), or the colon (":"), though some operating systems may use a different delimiter.
Paths are used extensively in computer science to represent the directory/file relationships common in modern operating systems, and are essential in the construction of Uniform Resource Locators (URLs). Resources can be represented by either absolute or relative paths.
As an example consider the following two files:

1. /Users/Afam/Downloads/ENGR1330_Syllabus_Fall2024.pdf     (Mac and Linux)
2. c:\Users\Afam\Downloads\ENGR1330_Syllabus_Fall2024.pdf    (Windows)

They both have the same file name but are located on different paths. 
Failing to provide the path when addressing a file can be a problem, because the name alone does not say which of the two files is meant. 
Another way to interpret this is that the two files actually have different names, and only part of those names is common (ENGR1330_Syllabus_Fall2024.pdf). 
The two names above (including the path) are called fully qualified filenames (or absolute names). A relative path, by contrast, is given relative to a starting directory (usually the current working directory of the program of interest), so it depends on where in the directory structure that starting point lives. 
If we are currently in the Downloads directory of the first file (/Users/Afam/Downloads), the path to that file is just the filename.
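
The ideas above (file name, absolute path, relative path) can be checked with Python's standard `pathlib` module. The following is a minimal sketch that only parses the two example path strings; it does not touch the disk, so it runs the same way on any operating system. The variable names are chosen for illustration only.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The two example paths from the text, parsed with the matching path flavour.
mac_path = PurePosixPath("/Users/Afam/Downloads/ENGR1330_Syllabus_Fall2024.pdf")
win_path = PureWindowsPath(r"c:\Users\Afam\Downloads\ENGR1330_Syllabus_Fall2024.pdf")

# The file name (the last path component) is the same in both cases.
print(mac_path.name)
print(win_path.name == mac_path.name)

# Both paths are absolute: they start at the root of the file system or drive.
print(mac_path.is_absolute(), win_path.is_absolute())

# The directory that holds the file, and the file's path relative to it.
downloads = mac_path.parent
print(downloads)
print(mac_path.relative_to(downloads))
```

The last line shows that, seen from the Downloads directory, the relative path is just the file name.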

### <font color=purple>File Manipulation</font>

 Files can be "created", "read", "updated", or "deleted" (CRUD). <br>
 You need to select the location where your files will be created, and you can use the os library to do so. <br>
 To check your current working directory, use **os.getcwd()** or **%pwd**<br>
 To change your current working directory, use **os.chdir()**<br>


### <font color=purple>File Types</font>
There are numerous file types used in the digital world, each designed for specific purposes. To determine the file type you are working with, you can check the file extension, which is typically found at the end of the file name. The extension usually consists of a few letters following a dot (e.g., .docx, .jpg, .pdf) and helps identify the file's format and associated software. Understanding file extensions is essential, as it allows you to select the appropriate software to open, edit, or share the file. In this course, we will consider some of the file types that are most relevant in a programming environment. 

1. Text Files. Text files are regular files that contain information readable by the user. This information is stored in ASCII. You can display and print these files. The lines of a text file must not contain NUL characters, and none can exceed a prescribed (by architecture) length, including the new-line character. The term text file does not prevent the inclusion of control or other nonprintable characters (other than NUL). Therefore, standard utilities that list text files as inputs or outputs are either able to process the special characters gracefully or they explicitly describe their limitations within their individual sections.

2. Binary Files. Binary files are regular files that contain information readable by the computer. Binary files may be executable files that instruct the system to accomplish a job. Commands and programs are stored in executable, binary files. Special compiling programs translate ASCII text into binary code. The only difference between text and binary files is that text files have lines of less than some length, with no NUL characters, each terminated by a new-line character.

3. Directory Files. Directory files contain information the system needs to access all types of files, but they do not contain the actual file data. As a result, directories occupy less space than a regular file and give the file system structure flexibility and depth. Each directory entry represents either a file or a subdirectory. Each entry contains the name of the file and the file's index node reference number (i-node). The i-node points to the unique index node assigned to the file. The i-node describes the location of the data associated with the file. Directories are created and controlled by a separate set of commands.

4. CSV or TSV Files
Many scientific data sets are stored in the comma-separated values (CSV) or tab-separated values (TSV) file format. A CSV file is a delimited text file that uses commas to separate values; a TSV file uses tabs. It is a very useful format that can store large tables of data (numbers and text) in plain text. Each line (row) in the file is one data record, and each record consists of one or more fields separated by the delimiter. Such a file can also be opened in Microsoft Excel to view the rows and columns. Python has its own csv module that handles reading and writing CSV files; you can see the details in the documentation. We will not cover the csv module in detail here. Instead, we will mainly use the numpy package, since we will often read a CSV file directly into a numpy array or a pandas DataFrame.

5. JSON Files
JSON is another format we are going to introduce. It stands for JavaScript Object Notation. A JSON file usually ends with the extension “.json”. Unlike pickle, which is Python dependent, JSON is a language-independent data format, which makes it attractive to use. Besides, it often takes less space on disk and can be faster to manipulate than pickle, depending on the data (if you are interested, search online to find more materials about it). Therefore, it is a good option for storing your data. In this section, we will briefly explore how to handle JSON files in Python.
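
To make the distinction between text and binary files from the list above more concrete, here is a small sketch that writes one short file and then reads it back twice, once in text mode and once in binary mode. The file name `sample.txt` and the temporary directory are assumptions made for the illustration, so no real files are touched.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name

    # Write two short lines in text mode with an explicit encoding.
    with open(sample_path, "w", encoding="utf-8", newline="\n") as f:
        f.write("abc\n123\n")

    # Text mode ("r") decodes the stored bytes and returns a str.
    with open(sample_path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode ("rb") returns the raw bytes stored on disk.
    with open(sample_path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, repr(as_text))
print(type(as_bytes).__name__, list(as_bytes))
```

The second print shows the numeric byte values; 10 is the new-line character that ends each line.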

___
### Examples
___
#### Directory/ Folder locations 

In [1]:
import os
#To get your current working directory
os.getcwd() 

'D:\\ENGR 1330\\Labs\\OneDrive_1_10-2-2024\\Lab009'

In [2]:
#To get your current working directory
%pwd  # list name of working directory, note it includes path, so it is an absolute path

'D:\\ENGR 1330\\Labs\\OneDrive_1_10-2-2024\\Lab009'

In [7]:
os.listdir()

['.ipynb_checkpoints',
 'OneDrive_1_10-2-2024',
 'OneDrive_1_10-2-2024.zip',
 'OneDrive_1_5-29-2024',
 'OneDrive_1_5-29-2024.zip',
 'OneDrive_1_5-29-2024_new',
 'OneDrive_2024-08-24',
 'OneDrive_2024-08-24.zip',
 'Section 1',
 'Section 2',
 'Week 1',
 'Week 4',
 'Week 5',
 'Week3']
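
The cells above show `os.getcwd()` and `os.listdir()`. As a hedged sketch of how `os.chdir()` fits in, the block below moves into a temporary directory, lists it, and returns to the starting directory. The temporary directory and the file name `placeholder.txt` are assumptions for the illustration; your own directories will differ.

```python
import os
import tempfile

start_dir = os.getcwd()              # remember where we started

with tempfile.TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)                # move into the temporary directory
    # create an empty file so that listdir has something to show
    with open("placeholder.txt", "w") as f:
        pass
    contents = os.listdir()          # no argument means "the current directory"
    os.chdir(start_dir)              # go back before the directory is removed

print(contents)
print(os.getcwd() == start_dir)
```

Saving the starting directory in a variable before calling `os.chdir()` is what makes it easy to return afterwards.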

### Exercise 9.1  (20 marks)
1. Change your current working directory to your system's default **documents** directory. 

2. Check your current working directory

3. Change your current working directory back to the first directory where this IPYNB file is saved. 

4. List all files in your directory

### <font color=purple> TXT Files</font>

So far, we have used the print function to display data on the screen. But there are many ways to store data on your disk and share it with other programs or colleagues. For example, if I have some strings in this notebook but want to use them in another notebook, the easiest way is to store the strings in a text file and then open it in the other notebook. A text file, often with the extension .txt, is a file containing only plain text. However, programs you write and programs that read your text file will usually expect the text file to be in a certain format; that is, organized in a specific way. <br>

To work with text files, we use the open function, which returns a file object. It is commonly used with two arguments:<br>
> ***f = open(filename, mode)*** <br>

The open() function takes two parameters; filename, and mode.<br>

There are four basic modes for opening a file:<br>

"r" - Read - Default value. Opens a file for reading, error if the file does not exist<br>

"a" - Append - Opens a file for appending, creates the file if it does not exist<br>

"w" - Write - Opens a file for writing, creates the file if it does not exist, discards the existing contents if it does<br>

"x" - Create - Creates the specified file, returns an error if the file exists<br>

In addition, you can specify whether the file should be handled in text or binary mode:<br>

"t" - Text - Default value. Text mode<br>

"b" - Binary - Binary mode (e.g. images) <br>

Adding "+" to a mode opens the file for both reading and writing:<br>

"r+" - Opens a file (does not create it) for reading and writing<br>

"w+" - Opens or creates a file for writing and reading, discards existing contents<br>

"a+" - Opens or creates a file for reading and writing, appends data to the end of the file<br>

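The difference between the "w", "a" and "x" modes is easiest to see by trying them on the same file. The following is a minimal sketch under the assumption of a temporary directory and a made-up file name, `modes_demo.txt`; it is an illustration, not part of the lab's required files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(demo_path, "w") as f:      # "w" creates the file
        f.write("first\n")
    with open(demo_path, "a") as f:      # "a" keeps the contents and adds to the end
        f.write("second\n")
    with open(demo_path, "r") as f:
        after_append = f.read()

    with open(demo_path, "w") as f:      # "w" on an existing file discards its contents
        f.write("replaced\n")
    with open(demo_path, "r") as f:
        after_rewrite = f.read()

    try:
        open(demo_path, "x")             # "x" refuses to touch an existing file
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_rewrite))   # 'replaced\n'
print(x_failed)              # True
```

Because "w" silently discards existing contents, "a" or "x" is the safer choice when a file may already hold data you want to keep.
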
**Delete a file**

A file can be deleted with a system (shell) command or with Python's own os module.<br>

In a JupyterLab notebook, we can either use<br>

    import os
    os.remove("myfirstfile.txt")


or <br>

    # Windows shell command; use rm -f on Mac or Linux
    ! del  myfirstfile.txt

Both have the same effect, and both are equally dangerous to your filesystem: the file is removed permanently rather than moved to a recycle bin.

___
### Example 1 - we create a file, then append to the file, read the contents and then delete it


In [34]:
# create file example, hence open with 'w'
externalfile = open("myfirstfile.txt",'w') # create connection to file, set to write (w), file does not need to exist
mymessage = 'message in a bottle\n' #some object to write, in this case a string
externalfile.write(mymessage)# write the contents of mymessage to the file
mymessage2 = 'More message in a bottle\t second line\n' #a second string, with a tab inside and a newline at the end
externalfile.write(mymessage2)# write the contents of mymessage2 to the file
mymessage3 = 'More message in a bottle\t third line' #a third string, this time without a trailing newline
externalfile.write(mymessage3)# write the contents of mymessage3 to the file
externalfile.close() # close the file connection

In [36]:
# We append to the file with 'a'. you can check where you save the file to see the changes.
externalfile = open("myfirstfile.txt",'a') # create connection to file, set to append (a), file does not need to exist
externalfile.write('\n') # adds a newline character
what_to_add = 'I love rock-and-roll, put another dime in the jukebox baby ... \n' 
externalfile.write(what_to_add) # add a string including the linefeed
what_to_add = '... the waiting is the hardest part \n' 
externalfile.write(what_to_add) # add a string including the linefeed
mylist = [1,2,3,4,5] # a list of numbers
what_to_add = ','.join(map(repr, mylist)) + "\n" # one way to write the list
externalfile.write(what_to_add)
what_to_add = ','.join(map(repr, mylist[0:len(mylist)])) + "\n" # another way to write the list
externalfile.write(what_to_add)
externalfile.close()

In [38]:
# we then read file using 'r'
externalfile = open("myfirstfile.txt",'r') # create connection to file, set to read (r), file must exist
silly_string = externalfile.read() # read the entire contents
externalfile.close() # close the file connection
print(silly_string)

message in a bottle
More message in a bottle	 second line
More message in a bottle	 third line
I love rock-and-roll, put another dime in the jukebox baby ... 
... the waiting is the hardest part 
1,2,3,4,5
1,2,3,4,5



In [40]:
import os
file2kill = "myfirstfile.txt"
try:
    os.remove(file2kill) # raises FileNotFoundError if the file does not exist
except FileNotFoundError:
    pass # nothing to delete; pass makes the "do nothing" choice explicit
print(file2kill, " missing or deleted !")

myfirstfile.txt  missing or deleted !
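
Example 1 opens and closes each file connection by hand. A common alternative is the `with` statement, which closes the file automatically when the block ends. The sketch below repeats part of Example 1 in that style and also reads the file one line at a time; the file name `bottle.txt` and the temporary directory are assumptions for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    bottle_path = os.path.join(tmp_dir, "bottle.txt")  # hypothetical file name

    # The file is closed automatically when the with block ends,
    # even if an error occurs inside it.
    with open(bottle_path, "w") as outfile:
        outfile.write("message in a bottle\n")
        outfile.write("More message in a bottle\t second line\n")

    # Iterating over the file object yields one line at a time.
    lines = []
    with open(bottle_path, "r") as infile:
        for line in infile:
            lines.append(line.rstrip("\n"))   # drop the trailing newline

print(outfile.closed, infile.closed)
print(lines)
```

Reading line by line gives a list of strings, which is convenient when only some of the lines are needed.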


### Exercise 9.2 (20 marks)
1. Write a Python program that creates a text file named **File1.txt**. The program should generate and write 100 lines of text into the file. Each line should follow the format: **This is line Number X**, where X is the line number, starting from 0 and ending at 99. After writing all 100 lines, the program should properly close the file to ensure that all data is saved. Finally print a confirmation message that indicates the writing process has been completed successfully.


2. Read the entire Content of the file and store in a variable called **Lines_Read**

3. Using append, create a file with **your name** (e.g., for Afam -> Afam.txt), write the contents of the variable **Lines_Read** and also append this line to the end of the file: **This is line Number 100**

4. Delete **File1.txt** and **File3.txt** and print a confirmation message for each file if it was found and deleted; otherwise state that it was not found

5. Read the text file with **your First Name** (e.g., for Afam -> Afam.txt) and print line 1 to line 6

___
### <font color=purple> CSV or TSV Files</font>
CSV (Comma-Separated Values) files are widely used for storing tabular data, which is data organized in rows and columns, similar to what you would see in a spreadsheet or database table. Each row in a CSV file typically represents a single record, and each column represents a specific attribute or field of that record. The simplicity of CSV files makes them an ideal format for storing and exchanging data between different applications, especially in cases where the data is structured but doesn't require the complexity of a database. CSV files are plain text, making them easy to create, read, and modify using various software tools like spreadsheet programs (e.g., Excel), text editors, or programming languages like Python.
When we create a CSV file, we can open it using Microsoft Excel.
![](https://pythonnumericalmethods.studentorg.berkeley.edu/_images/11.02.01-Write_csv.png)
We can also open the CSV file in a text editor, where we can see that the values are separated by commas.
![](https://pythonnumericalmethods.studentorg.berkeley.edu/_images/11.02.02-Open_csv_text.png)
### Example 2 - we create a csv file using numpy and read its contents

In [10]:
#Creating a csv file by generating 100 rows and 5 columns of data 
import numpy as np
data = np.random.random((100,5))
np.savetxt('test.csv', data, fmt = '%.2f', delimiter=',', header = 'c1, c2, c3, c4, c5')

In [12]:
#Reading the CSV file created (np.savetxt writes the header as a line starting with '#', which np.loadtxt skips as a comment)
my_csv = np.loadtxt('test.csv', delimiter=',')
my_csv[:5, :]

array([[0.19, 0.25, 0.15, 0.18, 0.45],
       [0.48, 0.2 , 0.77, 0.78, 0.1 ],
       [0.66, 0.13, 0.91, 0.29, 0.18],
       [0.46, 0.65, 0.25, 0.06, 0.31],
       [0.05, 0.92, 0.14, 0.23, 0.84]])

In [17]:
#Example of changing directory and then saving and loading the file from there
os.chdir('D:\\ENGR 1330\\Labs') 
np.savetxt('test2.csv', data, fmt = '%.2f', delimiter=',', header = 'c1, c2, c3, c4, c5')
my_csv2 = np.loadtxt('D:\\ENGR 1330\\Labs\\test2.csv', delimiter=',')
my_csv2[:5, :]

array([[0.57, 0.22, 0.14, 0.84, 0.54],
       [0.39, 0.51, 0.09, 0.56, 0.52],
       [0.34, 0.45, 0.21, 0.05, 0.38],
       [0.91, 0.89, 0.66, 0.2 , 0.97],
       [0.41, 0.24, 0.65, 0.39, 0.17]])
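
This lab reads and writes CSV files with NumPy. For comparison only, the standard-library `csv` module mentioned in the File Types section can handle the same kind of file. Below is a minimal sketch, assuming a small made-up table and a hypothetical file name `small_table.csv` in a temporary directory.

```python
import csv
import os
import tempfile

rows = [
    ["c1", "c2", "c3"],        # header row
    [0.19, 0.25, 0.15],
    [0.48, 0.20, 0.77],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "small_table.csv")  # hypothetical file name

    # newline="" is what the csv documentation recommends when opening files.
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

    with open(csv_path, "r", newline="") as f:
        reader = csv.reader(f)
        header = next(reader)                                  # first line: column names
        values = [[float(x) for x in row] for row in reader]   # remaining lines: numbers

print(header)
print(values)
```

Note that `csv.reader` returns every field as a string, so the numbers have to be converted with `float()`; `np.loadtxt` does that conversion for you.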

### Exercise 9.3 (10 marks)

1. Read the file named **CSV_ReadingFile.csv** and print the first line

2. Save the file as a Tab Separated file with your **First Name** <br>
 *If your name is Afam, save as Afam.tsv*

___
### <font color=purple> Binary Files using Pickle</font>
We talked about saving data into a text file or a CSV file. But in certain cases, we want to store dictionaries, tuples, lists, or any other data type on disk and use them later or send them to colleagues. This is where pickle comes in: it can serialize objects so that they can be saved into a file and loaded again later. Serializing a Python object structure means converting an object in memory to a byte stream that can be stored as a binary file on disk. When we load it back into a Python program, this binary file can be de-serialized back into a Python object.
### Example 3 - we create a dictionary, save it as dict.pkl and read it back to the screen

In [166]:
#Create a dictionary, and save it to a pickle file on disk
import pickle
dict_a = {'A':0, 'B':1, 'C':2}
with open('dict.pkl', 'wb') as f: # 'wb' = write binary; the with block closes the file
    pickle.dump(dict_a, f)

In [175]:
# reading back our file
with open('dict.pkl', 'rb') as f: # 'rb' = read binary; the with block closes the file
    my_dict = pickle.load(f)
my_dict

{'A': 0, 'B': 1, 'C': 2}
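
Pickle is not limited to flat dictionaries. As a hedged sketch, the block below pickles a nested object that mixes a tuple, a set and a list, and checks that the Python types survive the round trip. The values, the name `record`, and the file name `record.pkl` are made up for the illustration.

```python
import os
import pickle
import tempfile

# A nested object mixing several Python types (made-up values).
record = {"name": "sample", "point": (1.5, 2.5), "tags": {"a", "b"}, "counts": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "record.pkl")  # hypothetical file name

    with open(pkl_path, "wb") as f:     # pickle files are binary: note the "b"
        pickle.dump(record, f)

    with open(pkl_path, "rb") as f:
        restored = pickle.load(f)

print(restored == record)                # True
print(type(restored["point"]).__name__)  # tuple
print(type(restored["tags"]).__name__)   # set
```

Only load pickle files that come from a source you trust, because unpickling data can run arbitrary code.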

### Exercise 9.4 (10 marks)
1.  Create a **dictionary named after your last name**; it should have keys Rnumber, Gender and any other details you would like to add. Make sure the first values in the dictionary are yours, then store it in a **pickle file named after your first name.**

2. Store the content of your pickle file in a variable **Pickle_File** and print the contents

___
### <font color=purple>JSON Files</font>
JSON format
JSON text consists of key-value pairs enclosed in {}, with the keys written as quoted strings. It is very similar to the dictionary we saw in Python. To use json to serialize an object, we use the json.dump function, which takes two arguments: the first is the object, and the second is a file object returned by the open function. Note that the mode of the open function is ‘w’, which indicates that we write to the file. JSON supports different types, like strings and numbers, as well as nested arrays and objects. Python lists and tuples are both written as JSON arrays, and dictionaries are written as JSON objects. <br>

For example:


In [223]:
#To save a dictionary as a JSON file in Python, you can use the `json` module.
import json

# Define the dictionary
school = {
    "school": "TTU",
    "address": {
        "city": "Lubbock",
        "state": "Texas",
        "ZipCode": "79415"
    },
    "list": [
        "student 1",
        "student 2",
        "student 3"
    ],
    "array": [1, 2, 3]
}

# Save the dictionary to a JSON file
with open('school.json', 'w') as f: # the with block closes the file after writing
    json.dump(school, f)

In [225]:
# Read a JSON file
with open('./school.json', 'r') as f: # the with block closes the file after reading
    my_school = json.load(f)
my_school

{'school': 'TTU',
 'address': {'city': 'Lubbock', 'state': 'Texas', 'ZipCode': '79415'},
 'list': ['student 1', 'student 2', 'student 3'],
 'array': [1, 2, 3]}
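
JSON does not preserve every Python type. The sketch below uses `json.dumps` and `json.loads`, the string-based counterparts of `json.dump` and `json.load`, to show what happens to a tuple on a round trip. The dictionary `record` is made up for the illustration.

```python
import json

# Made-up record containing a tuple and a nested dictionary.
record = {"school": "TTU", "point": (1, 2), "address": {"city": "Lubbock"}}

text = json.dumps(record, indent=2)     # dumps returns a str instead of writing a file
print(text)

restored = json.loads(text)             # loads parses a str back into Python objects
print(restored["point"])                # [1, 2]  (the tuple came back as a list)
print(restored == record)               # False, because [1, 2] != (1, 2)
```

The `indent=2` argument only affects how the text is laid out, which makes the file easier to read in a text editor.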

### Exercise 9.4 (10 marks)
1. Create a dictionary named personal details; it should have keys **Name**, **Rnumber**, **School**, **Major**, **Address**, **Languages_Spoken** and **Siblings**. Make sure the first values in the dictionary are yours, then store it in a JSON file named after your **first name.**

In [22]:
import json

# Define the dictionary
me = {
    "name":"Nengi Harry",
    "Rnumber" : 1001,
    "school": "TTU",
    "Major" : "Computer Science",
    "address": {
        "city": "Lubbock",
        "state": "Texas",
        "ZipCode": "79415"
    },
    "Languages_Spoken": ["English", "Pidgin"],
    "Siblings": ['Bisi','Bola','Bukki']
}

# Save the dictionary to a JSON file
with open('me.json', 'w') as f: # the with block closes the file after writing
    json.dump(me, f)

2. Read the contents of the JSON file and print them.

In [23]:
# Read a JSON file
with open('me.json', 'r') as f: # the with block closes the file after reading
    Me = json.load(f)
Me

{'name': 'Nengi Harry',
 'Rnumber': 1001,
 'school': 'TTU',
 'Major': 'Computer Science',
 'address': {'city': 'Lubbock', 'state': 'Texas', 'ZipCode': '79415'},
 'Languages_Spoken': ['English', 'Pidgin'],
 'Siblings': ['Bisi', 'Bola', 'Bukki']}

---
## Lab Exercise 9.5 (30 marks)
Ensure you entered your full name and R Number in the top cell<br>
Convert to PDF and save as Lab009.pdf <br>
In a single submission submit:
- Lab009.pdf
- Lab009.IPYNB
- 'first name' txt file
- 'first name' tsv file
- 'first name' pkl file
- 'first name' json file
___

___
![](https://media2.giphy.com/media/5nj4ZZWl6QwneEaBX4/source.gif) <br>

 
*Here are some great reads on this topic:* 
- __"Python File Handling: How to Create Text File, Read, Write, Open, Append Files in Python"__ by __Steve Campbell__, available at https://www.guru99.com/reading-and-writing-files-in-python.html <br>
- __"Python Programming and Numerical Methods - A Guide for Engineers and Scientists"__ by __Qingkai Kong, Timmy Siauw, Alexandre Bayen__, available at https://pythonnumericalmethods.studentorg.berkeley.edu/notebooks/chapter11.00-Reading-and-Writing-Data.html <br>
- __"Python File Handling Tutorial: How To Create, Open, Read, Write"__, available at https://www.softwaretestinghelp.com/python/python-file-reading-writing/ <br>
- __"Python File Operations – Read and Write to files with Python"__, available at https://www.journaldev.com/14408/python-read-file-open-write-delete-copy <br>
- __"Python Tutorial: File Objects - Reading and Writing to Files"__ by __Corey Schafer__, available at https://www.youtube.com/watch?v=Uh2ebFW8OYM <br>

Learn more about CRUD with text files at https://www.guru99.com/reading-and-writing-files-in-python.html

Learn more about file delete at https://www.dummies.com/programming/python/how-to-delete-a-file-in-python/
    

___
![](https://media.csesoc.org.au/content/images/2019/10/learn11.gif) <br>


![](https://quotefancy.com/media/wallpaper/3840x2160/6361186-George-Bernard-Shaw-Quote-Life-isn-t-about-finding-yourself-Life.jpg)