# Reading and writing files in Python

Resources and outline

- [7.2. Reading and Writing Files](https://intranet.alxswe.com/rltoken/hFlrZ9E1XROVWcjwwyF52A)
- [8.7. Predefined Clean-up Actions](https://intranet.alxswe.com/rltoken/0OZ9fzPRjmKWZsID9IRJSg)
- [Dive Into Python 3: Chapter 11. Files](https://intranet.alxswe.com/rltoken/0osPfNU5d3Shh9PFWgYm9A) (*until “11.4 Binary Files” (included)*)
- [JSON encoder and decoder](https://intranet.alxswe.com/rltoken/l0B9_pFn1tgBvE7FrT14Zw)
- [Learn to Program 8 : Reading / Writing Files](https://intranet.alxswe.com/rltoken/ZvtAdnUzjnEVu1sjg3m_tQ)
- [Automate the Boring Stuff with Python](https://intranet.alxswe.com/rltoken/Ej8YjhxLXpzHW7_rNMd9XQ) (*ch. 8 p 180-183 and ch. 14 p 326-333*)
- [All about py-file I/O](https://intranet.alxswe.com/rltoken/TUatlpPV27S4zPogmQIPnQ)

Objectives

- How to open a file
- How to write text in a file
- How to read the full content of a file
- How to read a file line by line
- How to move the cursor in a file
- How to make sure a file is closed after using it
- What is and how to use the `with` statement
- What is `JSON`
- What is serialization
- What is deserialization
- How to convert a Python data structure to a JSON string
- How to convert a JSON string to a Python data structure

### Reading Files

Resources

  - https://realpython.com/read-write-files-python/#what-is-a-file
  - https://www.youtube.com/watch?v=Uh2ebFW8OYM

**What's a file?**

At its core, a file is a contiguous set of bytes used to store data. This data is organized in a specific format and can be anything as simple as a text file or as complicated as a program executable. Ultimately, these bytes are stored as binary 1s and 0s that the computer can process.
Files on most modern file systems are composed of three main parts:

1. **Header:** metadata about the contents of the file (file name, size, type, and so on)
2. **Data:** contents of the file as written by the creator or editor
3. **End of file (EOF):** special character that indicates the end of the file

What a file represents is usually indicated by its extension, for example `pdf`, `gif`, or `tif`. There are thousands of extensions; see this [list](https://en.wikipedia.org/wiki/List_of_filename_extensions).

**Considerations in reading and opening a file**<br>
**Line Endings**<br>
One consideration when reading and writing a file is the line ending. Line endings, also known as newline characters, indicate the end of a line of text. Different operating systems use different characters to represent them:
  - Unix and Linux use a single newline character (`\n`)
  - Windows uses two characters, a carriage return followed by a newline (`\r\n`)
    ```
    Pug\r\n
    Jack Russell Terrier\r\n
    English Springer Spaniel\r\n
    German Shepherd\r\n
    ```
  - Macintosh systems prior to Mac OS X use a single carriage return (`\r`)

This matters when a file is processed on a different operating system from the one that created it. A Windows file interpreted with Unix line endings keeps a stray carriage return at the end of each line:

```
Pug\r
\n
Jack Russell Terrier\r
\n
English Springer Spaniel\r
\n
```
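
A minimal sketch of how Python deals with these differences, assuming a throwaway file created with `tempfile` (the name `breeds.txt` is hypothetical). In text mode with the default `newline` setting, Python translates `\r\n` to `\n` when reading, while binary mode returns the raw bytes.

```python
import os
import tempfile

# Hypothetical sample file written with Windows-style line endings.
path = os.path.join(tempfile.mkdtemp(), "breeds.txt")
with open(path, "wb") as f:
    f.write(b"Pug\r\nJack Russell Terrier\r\n")

# Binary mode returns the bytes exactly as they are stored.
with open(path, "rb") as f:
    raw = f.read()

# Text mode (default newline=None) translates \r\n to \n on reading.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(repr(raw))
print(repr(text))
```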

**Character Encodings** Another common problem is character encoding. An encoding is a translation from byte data to human-readable characters, done by assigning a numerical value to each character. The two most common encodings are [ASCII](https://www.ascii-code.com/) and [Unicode](https://home.unicode.org/). It’s important to note that parsing a file with the incorrect character encoding can lead to failures or misrepresentation of the characters.
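
As a small illustration of that last point, the sketch below encodes a sample string (chosen here only for the demo) as UTF-8 and then decodes the same bytes with a different encoding. No error is raised, but the text comes back wrong.

```python
text = "café"

# Encode to bytes with UTF-8; the character "é" takes two bytes.
data = text.encode("utf-8")

# Decoding with the same encoding restores the original text.
good = data.decode("utf-8")

# Decoding with the wrong encoding silently produces different characters.
bad = data.decode("latin-1")

print(data)
print(good)
print(bad)
```

Passing `encoding="utf-8"` explicitly to `open()` avoids depending on the platform default.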

**Opening and closing a file** You can open a file with the built-in `open()` function and close it in two ways: call `file.close()` explicitly, or use the `with` statement, which closes the file automatically when the block ends.
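
The two ways of closing a file can be sketched as follows. The file name `example.txt` and the temporary folder are assumptions made for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Option 1: close explicitly. try/finally makes sure close() runs
# even if the write raises an exception.
f = open(path, "w", encoding="utf-8")
try:
    f.write("first line\n")
finally:
    f.close()

# Option 2: the with statement closes the file when the block ends.
with open(path, "r", encoding="utf-8") as reader:
    content = reader.read()

print(f.closed, reader.closed)
```

The `with` form is usually preferred because there is no `close()` call to forget.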

`open()` takes a mode argument. The default and most common mode is `'r'`, which opens the file read-only as a text file. Other modes include:

| Mode | Function |
| --- | --- |
| r | Open a file for reading. This is the default mode. |
| w | Open a file for writing. If the file already exists, overwrite its contents. Create a new file if the file does not exist. |
| a | Open a file for appending. Preserve the file’s contents, add new data to the end of the file. |
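| r+ | Open a file for reading and writing. The file must already exist. |
| w+ | Open a file for writing and reading. Existing contents are overwritten; the file is created if it does not exist. |
| a+ | Open a file for appending and reading. The file is created if it does not exist. |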
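
A minimal sketch of how the `w`, `a`, and `r` modes from the table behave, using a hypothetical file `modes.txt` in a temporary folder:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# "w" creates the file because it does not exist yet.
with open(path, "w", encoding="utf-8") as f:
    f.write("one\n")

# "a" keeps the existing content and adds to the end.
with open(path, "a", encoding="utf-8") as f:
    f.write("two\n")

with open(path, "r", encoding="utf-8") as f:
    after_append = f.read()

# "w" on an existing file discards what was there before.
with open(path, "w", encoding="utf-8") as f:
    f.write("three\n")

with open(path, "r", encoding="utf-8") as f:
    after_overwrite = f.read()

print(repr(after_append))
print(repr(after_overwrite))
```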

Methods for reading a file

| Method | What It Does |
| --- | --- |
| [.read(size=-1)](https://docs.python.org/3.7/library/io.html#io.RawIOBase.read) | This reads at most `size` characters (bytes in binary mode) from the file. If no argument is passed or None or -1 is passed, then the entire file is read. |
| [.readline(size=-1)](https://docs.python.org/3.7/library/io.html#io.IOBase.readline) | This reads at most `size` characters from the current line and stops at the end of the line. If no argument is passed or None or -1 is passed, then the entire line (or rest of the line) is read. |
| [.readlines()](https://docs.python.org/3.7/library/io.html#io.IOBase.readlines) | This reads the remaining lines from the file object and returns them as a list. |
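
The sketch below exercises the three methods on a small sample file (hypothetical content and file name) and also shows `seek()` and `tell()`, which move and report the cursor position. Each read starts where the previous one stopped.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "breeds.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Pug\nBoxer\nBorder Terrier\n")

with open(path, "r", encoding="utf-8") as reader:
    first_three = reader.read(3)      # first 3 characters
    rest_of_line = reader.readline()  # what is left of line 1
    next_line = reader.readline()     # the whole of line 2
    remaining = reader.readlines()    # list of the lines still unread
    at_end = reader.read()            # nothing left, so an empty string

    reader.seek(0)                    # move the cursor back to the start
    position = reader.tell()          # current cursor position
    everything = reader.read()        # the full content again

print(repr(first_three), repr(rest_of_line), repr(next_line))
print(remaining, repr(at_end), position)
```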


In [29]:
# THIS READS THE ENTIRE FILE
# with open('dog_breeds.txt', 'r') as reader:
#     # Read & print the entire file
#     print(reader.read())

# with open('dog_breeds.txt', 'r') as reader:
#     # Read the number of characters specified
#     fcontent = reader.read(20)
#     print(fcontent, end='')

with open('dog_breeds.txt', 'r') as reader:
    # Read the number of characters specified
    print(reader.read(100), end='')


# # THIS READS PART OF A LINE AT A TIME
# with open('dog_breeds.txt', 'r') as reader:
#     # Read & print the first 5 characters of the line
#     print(reader.readline(5))
#     # Read & print up to the next 7 characters of the same line
#     print(reader.readline(7))

# # THIS ITERATES OVER THE FILE LINE BY LINE
# with open('dog_breeds.txt', 'r') as reader:
#     for line in reader:
#         print(line, end='')

Pug
Jack Russell Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalie

In [107]:
# Read multiple files  
import os

directory = os.getcwd()
for filename in os.listdir(directory):
    if filename.endswith('.txt'):
        with open(os.path.join(directory, filename), 'r') as file:
            contents = file.read()
            print(contents)
            # Do something with the contents of the file


Pug
Jack Russell Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier
Pug
Jack Russell Terrier
English Springer Spaniel
German Shepherd
Staffordshire Bull Terrier
Cavalier King Charles Spaniel
Golden Retriever
West Highland White Terrier
Boxer
Border Terrier

This is the content 
 of the new file I am writing

This is the content 
 of the new file I am writing

We offer a truly innovative approach to education:
focus on building reliable applications and scalable systems, take on real-world challenges, collaborate with your peers. We offer a truly innovative approach to education:
 This is next

This School is so cool!

The file
1) This is a test file
2) With multiple lines of data...
3) Third line
4) Fourth line
5) Fifth line
6) Sixth line
7) Seventh line
8) Eighth line
9) Ninth line
10) Tenth line


### Writing Files

| Method | What It Does |
| --- | --- |
| .write(string) | This writes the string to the file. |
| .writelines(seq) | This writes the sequence to the file. No line endings are appended to each sequence item. It’s up to you to add the appropriate line ending(s). |
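
A minimal sketch of both methods, assuming a hypothetical list `breeds` and a temporary file. `write()` returns the number of characters written, and `writelines()` needs the line endings added by hand.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "breeds_out.txt")
breeds = ["Pug", "Boxer", "Border Terrier"]

with open(path, "w", encoding="utf-8") as f:
    # write() returns the number of characters written.
    count = f.write("Dog breeds\n")
    # writelines() adds nothing between items, so append "\n" to each one.
    f.writelines(breed + "\n" for breed in breeds)

with open(path, "r", encoding="utf-8") as f:
    lines = f.readlines()

print(count)
print(lines)
```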

In [47]:
with open('dog_breeds_write.txt', 'w') as f:  # The name of the new file is 'dog_breeds_write.txt'
    f.write('This is the content \n of the new file I am writing\n')

print(f.mode)


w


# Working With Files in Python

https://realpython.com/working-with-files-in-python/

Making Directories

In [60]:
import os

path = os.getcwd()
print(os.listdir(path))

# Rename the file
# os.rename('dog_breeds_write.txt','dog_breeds_write2.txt') 

# Make a new directory
# os.mkdir("newdir")

# Change the directory
# # os.chdir("newdir") 

# Check the directory
# os.getcwd() 

# path = os.getcwd()
os.chdir(r"../26_input-output") # Move backward directory
# os.getcwd() # Check the directory

['27_JSON', 'AlxNote_Inandout.ipynb', 'ALX_input_output_Ex.ipynb', 'dog_breeds.txt', 'dog_breeds_reversed.txt', 'dog_breeds_reverseds.txt', 'dog_breeds_write.txt', 'dog_breeds_write2.txt', 'filename', 'input_output.ipynb', 'my_file_0.txt', 'my_first_file.txt', 'newdir', 'New_file.txt', 'Test.py', 'test.txt']


Filename Pattern Matching

https://realpython.com/working-with-files-in-python/#filename-pattern-matching

**Using fnmatch.fnmatch()**
Suppose you want to find .txt files that meet certain criteria. For example, you might only be interested in .txt files whose names contain the word data, a number between a set of underscores, and the word backup. The `In [94]` cell uses the simpler pattern `dog*` to match the dog_breeds files.
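
The data/number/backup criteria described above can be sketched with `fnmatch.filter()`. The file names in `names` are made up for the demo, and the stricter pattern assumes the number always has two digits.

```python
import fnmatch

# Hypothetical file names, used only to illustrate the patterns.
names = [
    "data_01_backup.txt",
    "data_02_backup.txt",
    "data_01.txt",
    "report_03_backup.txt",
    "data_xx_backup.txt",
]

# "*" matches any run of characters, so non-numeric names also match.
loose = fnmatch.filter(names, "data_*_backup.txt")

# "[0-9]" matches exactly one digit; two of them require a 2-digit number.
strict = fnmatch.filter(names, "data_[0-9][0-9]_backup.txt")

print(loose)
print(strict)
```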

In [94]:
# Simple Filename Pattern Matching Using fnmatch
import os
path = os.getcwd()
# os.listdir() # List all files in the current directory

# Print all the files in the directory
# for file in os.listdir(path):
#     if file.endswith('.txt'):
#       print(file)


# Find all the files that start with 'dog'
import fnmatch
for file in os.listdir(path):
   if fnmatch.fnmatch(file, 'dog*'):
      print(file)

# Find all the files that start with 'dog' using glob. glob also supports shell-style wildcards to match patterns
import glob
for file in glob.glob('dog*'):
   print(file)

dog_breeds.txt
dog_breeds_reversed.txt
dog_breeds_reverseds.txt
dog_breeds_write.txt
dog_breeds_write2.txt
dog_breeds.txt
dog_breeds_reversed.txt
dog_breeds_reverseds.txt
dog_breeds_write.txt
dog_breeds_write2.txt


Traversing Directories and Processing Files
https://realpython.com/working-with-files-in-python/#traversing-directories-and-processing-files

In [95]:
for dirpath, dirnames, files in os.walk('.'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)

Found directory: .
AlxNote_Inandout.ipynb
ALX_input_output_Ex.ipynb
dog_breeds.txt
dog_breeds_reversed.txt
dog_breeds_reverseds.txt
dog_breeds_write.txt
dog_breeds_write2.txt
filename_dog
input_output.ipynb
my_file_0.txt
my_first_file.txt
New_file.txt
Test.py
test.txt
Found directory: .\27_JSON
data_file.json
Deserialization.py
Esri.json
Json_Practise.ipynb
Loading JSON.ipynb
moviefile.txt
new_states.json
new_states_TO.json
Seirealization.py
states.json
World Terrain with Labels_v1.json
Yahoo_API_Data_Ex.ipynb
Found directory: .\newdir
dog_breeds_write.txt


Read multiple files

https://realpython.com/working-with-files-in-python/#reading-multiple-files

In [103]:
# File: fileinput-example.py
import fileinput
import sys

files = fileinput.input()
for line in files:
    if fileinput.isfirstline():
        print(f'\n--- Reading {fileinput.filename()} ---')
    print(' -> ' + line, end='')
print()

FileNotFoundError: [Errno 2] No such file or directory: '--ip=127.0.0.1'
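
The error above most likely occurs because `fileinput.input()`, when called without arguments, takes its file names from `sys.argv[1:]`, and inside a notebook those are the kernel's own command-line options. A minimal sketch that passes the files explicitly instead, using two hypothetical sample files in a temporary folder:

```python
import fileinput
import os
import tempfile

# Hypothetical sample files created in a temporary folder.
folder = tempfile.mkdtemp()
paths = []
for name, body in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
    path = os.path.join(folder, name)
    with open(path, "w") as f:
        f.write(body)
    paths.append(path)

headers = []  # file names, in the order they are opened
lines = []    # every line read across all files

# Passing files= explicitly avoids falling back to sys.argv[1:].
with fileinput.input(files=paths) as stream:
    for line in stream:
        if stream.isfirstline():
            headers.append(os.path.basename(stream.filename()))
            print(f"\n--- Reading {stream.filename()} ---")
        lines.append(line)
        print(" -> " + line, end="")
print()
```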

# JSON encoder and decoder

A dedicated note on JSON can be found here:
[JSON Practise](./27_JSON/Json_Practise.ipynb)

https://realpython.com/python-json/

JSON (JavaScript Object Notation) is a format for storing and transporting data. It is often used when data is sent from a server to a web page.

The process of encoding JSON is usually called **serialization**. This term refers to the transformation of data into a series of bytes (hence serial) to be stored or transmitted across a network. You may also hear the term **marshaling**, but that’s a whole other discussion. Naturally, **deserialization** is the reciprocal process of decoding data that has been stored or delivered in the JSON standard.

**JSON Objects:** JSON objects are written inside curly braces. Example ```{"firstName":"John", "lastName":"Doe"}```

**JSON Arrays:** JSON arrays are written inside square brackets. Example ```["John", "Anna", "Peter"]```

Resource

- [Working With JSON Data in Python](https://www.geeksforgeeks.org/working-with-json-data-in-python/?ref=lbp)
- [Python – Difference between json.dump() and json.dumps()](https://www.geeksforgeeks.org/python-difference-between-json-dump-and-json-dumps/?ref=rp)




In [9]:
a = {"employees":[
  {"firstName":"John", "lastName":"Doe"},
  {"firstName":"Anna", "lastName":"Smith"},
  {"firstName":"Peter", "lastName":"Jones"}
]}

# a is a dictionary
print(type(a))

# Create a JSON string from the dictionary
import json
b = json.dumps(a)

print(type(b))

<class 'dict'>
<class 'str'>


Serializing JSON:

Serialization encodes Python data as JSON so that it can be stored or transmitted across a network. To write data to a file, Python's `json` module provides the `dump()` function, which converts Python objects into their JSON equivalents and writes them to a file object. The table below shows how Python types map to JSON types.


| Python | JSON |
| --- | --- |
| dict | object |
| list, tuple | array |
| str | string |
| int, float | number |
| True | true |
| False | false |
| None | null |
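
A short illustration of the table above, using a made-up `record` dictionary. A tuple comes back as a list after a round trip, because JSON only has arrays.

```python
import json

# Hypothetical record covering several rows of the table.
record = {"name": "Alice", "scores": (90, 85.5), "active": True, "nickname": None}

encoded = json.dumps(record)   # Python object -> JSON string
decoded = json.loads(encoded)  # JSON string -> Python object

print(encoded)
print(decoded)
```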

In [18]:
# Serializing Example 
import json


data = {
    "name": "Alice",
    "age": 30,
    "city": "New York"
}

# Using Dumps
json_string = json.dumps(data, indent=4)
print(json_string)

# Using dump
with open("data.json", "w") as f:
    json.dump(data, f, indent=4)

{
    "name": "Alice",
    "age": 30,
    "city": "New York"
}


dump and dumps

Both json.dump() and json.dumps() are functions in the Python json module that encode a Python object as JSON. The difference between them is in how they handle the output:

json.dumps() returns a string, which can be used to transmit data over a network, or to store data in a file, for example.

json.dump() writes the encoded object directly to a file-like object (e.g. a file object), without returning any value.
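
The difference can be sketched with an in-memory file. `io.StringIO` stands in for a real file here, which is an assumption made to keep the demo self-contained.

```python
import io
import json

data = {"name": "Alice", "age": 30}

# dumps() returns the JSON text as a str.
as_string = json.dumps(data)

# dump() writes the same text to a file-like object and returns None.
buffer = io.StringIO()
result = json.dump(data, buffer)

print(as_string)
print(result)
print(buffer.getvalue() == as_string)
```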

In [22]:
# Dump and dumps example
var = {
      "Subjects": {
                  "Maths":85,
                  "Physics":90
                   }
      }

print(type(var))

# json.dump writes the dictionary to a file as JSON
with open("data_file2.json", "w") as p:
    json.dump(var, p, indent= 4)
    



<class 'dict'>


Deserializing JSON:

Deserialization is the opposite of serialization, i.e. the conversion of JSON data into the corresponding Python objects. If the JSON is stored in a file, use `load()`; if you obtained it from another program as a string, use `loads()`. In both cases the result is usually a `dict` or a `list`, depending on the root element of the JSON.

Just like serialization, there is a simple conversion table for deserialization:

| JSON | Python |
| --- | --- |
| object | dict |
| array | list |
| string | str |
| number (int) | int |
| number (real) | float |
| true | True |
| false | False |
| null | None |

In [21]:
json_string = """
{
    "researcher": {
        "name": "Ford Prefect",
        "species": "Betelgeusian",
        "relatives": [
            {
                "name": "Zaphod Beeblebrox",
                "species": "Betelgeusian"
            }
        ]
    }
}
"""

data = json.loads(json_string)

print(type(data))

<class 'dict'>


In the json module, json.load() and json.loads() are two functions that can be used to parse JSON data. The difference between them is in how they handle the input:

json.load() reads JSON data from a file-like object (e.g. a file object), and returns a Python object corresponding to the JSON data.

json.loads() reads a string containing JSON data, and returns a Python object corresponding to the JSON data.
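
A minimal sketch of the two functions side by side. `io.StringIO` is used in place of a real file, which is an assumption made for the demo.

```python
import io
import json

text = '{"Subjects": {"Maths": 85, "Physics": 90}}'

# loads() parses a string.
from_string = json.loads(text)

# load() parses the content of a file-like object.
from_file = json.load(io.StringIO(text))

print(from_string)
print(from_string == from_file)
```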

# TASKS

Write a function that writes a string to a text file (`UTF8`) and returns the number of characters written:

- Prototype: `def write_file(filename="", text=""):`
- You must use the `with` statement
- You don’t need to manage file permission exceptions.
- Your function should create the file if it doesn’t exist.
- Your function should overwrite the content of the file if it already exists.
- You are not allowed to import any module

In [24]:
def read_file(filename=""):
    """Reads a text file (UTF8) and prints it to stdout."""
    with open(filename, "r", encoding="utf-8") as f:
        print(f.read(), end="")

read_file('dog_breeds_write.txt')

This is the content 
 of the new file I am writing


Write a function that appends a string at the end of a text file (`UTF8`) and returns the number of characters added:

- Prototype: `def append_write(filename="", text=""):`
- If the file doesn’t exist, it should be created
- You must use the `with` statement
- You don’t need to manage `file permission` or `file doesn't exist` exceptions.
- You are not allowed to import any module

In [None]:
def write_file(filename="", text=""):
    """returns the number of chars written to "filename" from "text" """
    with open(filename, 'w', encoding='utf-8') as f:
        return f.write(text)
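
For the `append_write` task described above, one possible sketch is shown here. It relies on the `'a'` mode, which creates the file when it is missing.

```python
def append_write(filename="", text=""):
    """Append text to a UTF-8 file and return the number of characters added."""
    with open(filename, "a", encoding="utf-8") as f:
        return f.write(text)
```

Calling it twice on the same file adds the second string after the first instead of replacing it.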

Write a function that returns the JSON representation of an object (string):

- Prototype: `def to_json_string(my_obj):`
- You don’t need to manage exceptions if the object can’t be serialized.



Write a function that returns an object (Python data structure) represented by a JSON string:

- Prototype: `def from_json_string(my_str):`
- You don’t need to manage exceptions if the JSON string doesn’t represent an object.
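
The two string-conversion tasks above can be sketched as thin wrappers around `json.dumps()` and `json.loads()`. This is one possible approach, not necessarily the expected solution.

```python
import json


def to_json_string(my_obj):
    """Return the JSON representation of my_obj as a string."""
    return json.dumps(my_obj)


def from_json_string(my_str):
    """Return the Python data structure represented by a JSON string."""
    return json.loads(my_str)


print(to_json_string([1, 2, 3]))
print(from_json_string('{"id": 12}'))
```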



Write a function that writes an Object to a text file, using a JSON representation:

- Prototype: `def save_to_json_file(my_obj, filename):`
- You must use the `with` statement
- You don’t need to manage exceptions if the object can’t be serialized.
- You don’t need to manage file permission exceptions.



Write a function that creates an Object from a “JSON file”:

- Prototype: `def load_from_json_file(filename):`
- You must use the `with` statement
- You don’t need to manage exceptions if the JSON string doesn’t represent an object.
- You don’t need to manage file permissions / exceptions.
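
A possible sketch of the two file-based tasks above, using `json.dump()` and `json.load()` inside `with` blocks. The UTF-8 encoding is an assumption, since the task text does not name one.

```python
import json


def save_to_json_file(my_obj, filename):
    """Write my_obj to filename using its JSON representation."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(my_obj, f)


def load_from_json_file(filename):
    """Create a Python object from the JSON stored in filename."""
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f)
```

Saving an object and loading it back should give an equal object, as long as it only contains JSON-compatible types.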

Write a script that adds all arguments to a Python list, and then save them to a file:

- You must use your function `save_to_json_file` from `5-save_to_json_file.py`
- You must use your function `load_from_json_file` from `6-load_from_json_file.py`
- The list must be saved as a JSON representation in a file named `add_item.json`
- If the file doesn’t exist, it should be created
- You don’t need to manage file permissions / exceptions.

Write a function that returns the dictionary description with simple data structure (list, dictionary, string, integer and boolean) for JSON serialization of an object:

- Prototype: `def class_to_json(obj):`
- `obj` is an instance of a Class
- All attributes of the `obj` Class are serializable: list, dictionary, string, integer and boolean
- You are not allowed to import any module
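
One way to sketch `class_to_json` is to copy the instance's `__dict__`, which holds its attributes. The `Point` class below is hypothetical and exists only to demonstrate the function.

```python
def class_to_json(obj):
    """Return a dictionary copy of the instance attributes of obj."""
    return dict(obj.__dict__)


class Point:
    """Hypothetical class used only to demonstrate class_to_json."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


description = class_to_json(Point(1, 2))
print(description)
```

Returning a copy means that changing the dictionary afterwards does not change the object.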

Write a class `Student` that defines a student by:

- Public instance attributes:
    - `first_name`
    - `last_name`
    - `age`
- Instantiation with `first_name`, `last_name` and `age`: `def __init__(self, first_name, last_name, age):`
- Public method `def to_json(self):` that retrieves a dictionary representation of a `Student` instance (same as `8-class_to_json.py`)
- You are not allowed to import any module

Write a class `Student` that defines a student by: (based on `9-student.py`)

- Public instance attributes:
    - `first_name`
    - `last_name`
    - `age`
- Instantiation with `first_name`, `last_name` and `age`: `def __init__(self, first_name, last_name, age):`
- Public method `def to_json(self, attrs=None):` that retrieves a dictionary representation of a `Student` instance (same as `8-class_to_json.py`):
    - If `attrs` is a list of strings, only attribute names contained in this list must be retrieved.
    - Otherwise, all attributes must be retrieved
- You are not allowed to import any module

Write a class `Student` that defines a student by: (based on `10-student.py`)

- Public instance attributes:
    - `first_name`
    - `last_name`
    - `age`
- Instantiation with `first_name`, `last_name` and `age`: `def __init__(self, first_name, last_name, age):`
- Public method `def to_json(self, attrs=None):` that retrieves a dictionary representation of a `Student` instance (same as `8-class_to_json.py`):
    - If `attrs` is a list of strings, only attribute names contained in this list must be retrieved.
    - Otherwise, all attributes must be retrieved
- Public method `def reload_from_json(self, json):` that replaces all attributes of the `Student` instance:
    - You can assume `json` will always be a dictionary
    - A dictionary key will be the public attribute name
    - A dictionary value will be the value of the public attribute
- You are not allowed to import any module
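
A minimal sketch of the final `Student` class, covering `to_json` with the optional `attrs` filter and `reload_from_json`. It is one possible reading of the task, assuming that names in `attrs` which are not attributes are simply skipped.

```python
class Student:
    """Defines a student by first name, last name and age."""

    def __init__(self, first_name, last_name, age):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age

    def to_json(self, attrs=None):
        """Return a dictionary representation, optionally filtered by attrs."""
        if isinstance(attrs, list) and all(isinstance(a, str) for a in attrs):
            return {k: v for k, v in self.__dict__.items() if k in attrs}
        return dict(self.__dict__)

    def reload_from_json(self, json):
        """Replace attributes with the key/value pairs of the json dictionary."""
        for key, value in json.items():
            setattr(self, key, value)


student = Student("John", "Doe", 23)
print(student.to_json(["first_name", "age"]))
student.reload_from_json({"first_name": "Jane", "age": 30})
print(student.to_json())
```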

Now you have a simple implementation of a serialization and deserialization mechanism: a way to represent an object in another format without losing any information, so that the object can be rebuilt from that representation.

**Technical interview preparation**:

- You are not allowed to google anything
- Whiteboard first

Create a function `def pascal_triangle(n):` that returns a list of lists of integers representing the Pascal’s triangle of `n`:

- Returns an empty list if `n <= 0`
- You can assume `n` will be always an integer
- You are not allowed to import any module
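
One possible sketch of `pascal_triangle`: each row starts and ends with 1, and every inner value is the sum of the two values above it in the previous row.

```python
def pascal_triangle(n):
    """Return the first n rows of Pascal's triangle as a list of lists."""
    triangle = []
    for i in range(n):
        # Row i has i + 1 entries; the edges stay 1.
        row = [1] * (i + 1)
        for j in range(1, i):
            row[j] = triangle[i - 1][j - 1] + triangle[i - 1][j]
        triangle.append(row)
    return triangle


for row in pascal_triangle(5):
    print(row)
```

Because `range(n)` is empty for `n <= 0`, the function returns an empty list in that case without a separate check.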

Write a function that inserts a line of text to a file, after each line containing a specific string (see example):

- Prototype: `def append_after(filename="", search_string="", new_string=""):`
- You must use the `with` statement
- You don’t need to manage `file permission` or `file doesn't exist` exceptions.
- You are not allowed to import any module
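
A possible sketch of `append_after`. The example mentioned in the task is not included in these notes, so this version assumes that `new_string` carries its own line ending and that a plain substring match is what is wanted.

```python
def append_after(filename="", search_string="", new_string=""):
    """Insert new_string after each line of filename containing search_string."""
    # Read all lines first so the file can be rewritten safely.
    with open(filename, "r", encoding="utf-8") as f:
        lines = f.readlines()

    result = []
    for line in lines:
        result.append(line)
        if search_string in line:
            result.append(new_string)

    # Write the updated content back to the same file.
    with open(filename, "w", encoding="utf-8") as f:
        f.writelines(result)
```

Reading everything into memory keeps the code short, which is fine for small files like the ones in these exercises.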

Write a script that reads `stdin` line by line and computes metrics:

- Input format: `<IP Address> - [<date>] "GET /projects/260 HTTP/1.1" <status code> <file size>`
- Every 10 lines and after a keyboard interruption (`CTRL + C`), prints these statistics since the beginning:
    - Total file size: `File size: <total size>`
    - where `<total size>` is the sum of all previous `<file size>` values (see input format above)
    - Number of lines by status code:
        - possible status code: `200`, `301`, `400`, `401`, `403`, `404`, `405` and `500`
        - if a status code doesn’t appear, don’t print anything for this status code
        - format: `<status code>: <number>`
        - status codes should be printed in ascending order