#### File Operations: Read and Write Files

File handling is a crucial part of any programming language. Python provides built-in functions and methods to read from and write to files, both text and binary. This lesson will cover the basics of file handling, including reading and writing text files and binary files.

In [1]:
### Read a Whole File

with open('example.txt','r') as file:
    content=file.read()
    print(content)

Hello world
This is a new line 



In [2]:
## Read a file line by line
with open('example.txt','r') as file:
    for line in file:
        print(line.strip()) ## strip() removes the newline character

'''The strip() function in Python removes leading and trailing whitespace characters 
(including spaces, tabs, and newline characters) from a string.'''

Hello world
This is a new line


'The strip() function in Python removes leading and trailing whitespace characters \n(including spaces, tabs, and newline characters) from a string.'
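
Because `strip()` removes whitespace on both sides, it can also drop indentation you meant to keep. The sketch below compares it with `rstrip()` and `lstrip()` on a made-up sample line (`raw_line` is an assumption for illustration, not content from `example.txt`).

```python
# Compare the three stripping methods on a line as it might come from a file.
raw_line = "  Hello world \n"

stripped = raw_line.strip()          # removes whitespace on both sides
right_only = raw_line.rstrip("\n")   # removes only the trailing newline
left_only = raw_line.lstrip()        # removes only leading whitespace

print(repr(stripped))    # 'Hello world'
print(repr(right_only))  # '  Hello world '
print(repr(left_only))   # 'Hello world \n'
```

If only the newline should go, `rstrip("\n")` is usually the safer choice.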

In [4]:
## Writing a file (overwriting)

with open('example.txt','w') as file:
    file.write('Hello World!\n')
    file.write('this is a new line.')
    print(file.tell())

32


In [5]:
## Write a file (without overwriting)
with open('example.txt','a') as file:
    print(file.tell())
    file.write("Append operation taking place!\n")

32
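
To see the difference between `'w'` and `'a'` side by side, here is a minimal sketch that works on a scratch file in a temporary folder (the path `mode_demo.txt` is a hypothetical name chosen so `example.txt` is left untouched).

```python
import os
import tempfile

# Hypothetical scratch file so the lesson's example.txt is not modified.
demo_path = os.path.join(tempfile.mkdtemp(), "mode_demo.txt")

with open(demo_path, "w") as file:
    file.write("first\n")

with open(demo_path, "w") as file:   # 'w' truncates: "first" is discarded
    file.write("second\n")

with open(demo_path, "a") as file:   # 'a' keeps the content and adds to the end
    file.write("third\n")

with open(demo_path, "r") as file:
    result = file.read()

print(repr(result))  # 'second\nthird\n'
```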


In [6]:
### Writing a list of lines to a file


'''
In Python, the writelines() method expects an iterable of strings (such as a list, tuple, or generator) to write to a file.
A set also works, but its iteration order is arbitrary. The elements are written to the file without any modification
(so each string must include the newline character \n if you want it on its own line).'''
lines=['First line \n','Second line \n','Third line\n']
with open('example.txt','a') as file:
    file.writelines(lines)
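
As a quick check of the "no modifications" behavior, the sketch below writes the same items twice: once as they are, and once with `\n` appended by a generator expression. The file name `writelines_demo.txt` and the `items` list are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file and sample data.
demo_path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")
items = ["alpha", "beta", "gamma"]

# writelines() adds nothing between the items.
with open(demo_path, "w") as file:
    file.writelines(items)
with open(demo_path, "r") as file:
    run_together = file.read()

# Appending "\n" to each item puts every item on its own line.
with open(demo_path, "w") as file:
    file.writelines(item + "\n" for item in items)
with open(demo_path, "r") as file:
    one_per_line = file.read()

print(repr(run_together))  # 'alphabetagamma'
print(repr(one_per_line))  # 'alpha\nbeta\ngamma\n'
```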

Here’s a brief comparison between **binary files** and **text files**:

### 1. **Text Files**
   - **Definition**: Text files store data as human-readable characters, typically encoded in formats like ASCII or UTF-8.
   - **Content**: Contains only readable characters such as letters, digits, punctuation, and special symbols. Data in these files is stored as plain text.
   - **Line Breaks**: Text files use special characters like newline (`\n` or `\r\n`) to indicate line breaks.
   - **Examples**: `.txt`, `.csv`, `.html`, `.py` (source code files), `.json` (with UTF-8 encoding).
   - **Usage**: Ideal for documents, configuration files, logs, and other text-based data.
   - **Editing**: Can be opened and edited by any text editor (e.g., Notepad, Vim, VS Code).
   - **Human-readable**: Can be read directly by a human with basic knowledge of text encodings.

   **Example (Text File Content)**:
   ```
   Hello, World!
   This is a text file.
   ```

### 2. **Binary Files**
   - **Definition**: Binary files store data in the form of binary code (0s and 1s), and are not directly human-readable.
   - **Content**: Contain data in raw format such as images, audio, video, or compiled programs. The data is not meant to be interpreted as text.
   - **No Line Breaks**: Binary files have no notion of lines. A byte with the same value as a newline character may occur, but it is just data; the layout is defined by the file format.
   - **Examples**: `.exe`, `.jpg`, `.mp3`, `.png`, `.dat`, `.zip` (compressed files).
   - **Usage**: Used for storing images, multimedia, executable programs, archives, and more.
   - **Editing**: Cannot be meaningfully edited with standard text editors, which display the bytes as garbled characters. Requires specialized software or hex editors.
   - **Human-readable**: Not human-readable; they require specific programs or libraries to interpret the data.

   **Example (Binary File Content)**:
   - A binary file would look like a sequence of bytes, and attempting to open it in a text editor will show unreadable characters.

### Key Differences:
| Aspect               | Text Files                         | Binary Files                         |
|----------------------|-------------------------------------|--------------------------------------|
| **Content**          | Human-readable text                | Raw binary data (not human-readable) |
| **Editing**          | Can be edited with text editors     | Requires specialized programs        |
| **Line Breaks**      | Uses newline characters (`\n`)      | No text-based line breaks            |
| **Examples**         | `.txt`, `.csv`, `.json`            | `.exe`, `.jpg`, `.mp3`, `.zip`       |
| **Size Efficiency**  | Less compact for numeric or media data | More compact for data such as images or audio |

### Summary:
- **Text files** are simple and used for storing plain text, while **binary files** are used to store data that cannot be directly interpreted as text (e.g., images, audio, software).
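
In Python the distinction shows up in the mode passed to `open()`: text mode decodes bytes into `str`, while binary mode returns the raw `bytes`. The following sketch reads one small file both ways; the file name `modes_demo.txt` and its content are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical scratch file containing one non-ASCII character.
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# newline="\n" keeps the line ending as a single byte on every platform.
with open(demo_path, "w", encoding="utf-8", newline="\n") as file:
    file.write("café\n")

with open(demo_path, "r", encoding="utf-8") as file:
    as_text = file.read()      # decoded characters

with open(demo_path, "rb") as file:
    as_bytes = file.read()     # raw bytes, no decoding

print(type(as_text).__name__, len(as_text))              # str 5
print(type(as_bytes).__name__, len(as_bytes), as_bytes)  # bytes 6 b'caf\xc3\xa9\n'
```

The text has 5 characters but the file holds 6 bytes, because UTF-8 stores `é` as two bytes.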


In [7]:
''' More depth'''

' More depth'

**All data, including text files**, is ultimately stored as **bits** (0s and 1s) on a computer. When we call **text files** "human-readable," we mean that their bytes follow a character encoding, so any text editor can decode them into meaningful characters without format-specific software.

Here's what happens in the distinction between text and binary files:

### **Text Files (Human-Readable)**

- **Character Encoding**: Text files use specific **character encodings** (like ASCII, UTF-8, or UTF-16) to map human-readable characters (letters, numbers, punctuation, etc.) to **binary representations**.
  - For example, in ASCII encoding:
    - The letter 'A' is stored as the binary number `01000001`.
    - The letter 'B' is stored as `01000010`.
  
- **Human-Readable Format**: When you open a text file with a program like Notepad or a text editor, the program reads these binary values (according to the chosen encoding) and converts them back into characters that humans can read.
  
- **Structure**: Text files are typically structured with line breaks and plain characters, which are easy to understand. For instance, a text file might contain:
  ```
  Hello, World!
  This is a text file.
  ```
  Internally, the text file would store this as a sequence of bits corresponding to the ASCII/UTF-8 values for each character.

### **Binary Files (Not Human-Readable)**

- **No Direct Character Representation**: In binary files, data is stored in raw form, and the binary content does not directly map to human-readable characters.
  
  For example:
  - A **JPEG image file** starts with the bytes `11111111 11011000 11111111` (`FF D8 FF` in hexadecimal), a marker that identifies the format; the metadata and compressed pixel data that follow are what an image viewer interprets to display the image.
  - A Linux **executable file** (ELF format) starts with `01111111 01000101 01001100 01000110` (the byte `7F` followed by the ASCII codes for `ELF`), which the operating system uses to recognize the format before loading the machine instructions for the CPU.
  
- **Not Intentionally Human-Readable**: Since the data isn't in a human-readable form, trying to open a binary file with a text editor results in a bunch of unreadable characters (random symbols or gibberish) because the text editor is trying to interpret the raw binary data as text.
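
Programs often recognize a binary format by its first few bytes (its "magic number"). The sketch below is one possible way to do that for two well-known signatures; `guess_format`, `SIGNATURES`, and the stand-in file `fake.jpg` are hypothetical names, and the file holds only a JPEG-style header rather than a real image.

```python
import os
import tempfile

# Well-known leading bytes ("magic numbers") for two formats.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
}

def guess_format(path):
    """Hypothetical helper: compare the first bytes of a file to SIGNATURES."""
    with open(path, "rb") as file:
        header = file.read(8)
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"

# Build a stand-in file that carries only a JPEG-style header, not a real image.
demo_path = os.path.join(tempfile.mkdtemp(), "fake.jpg")
with open(demo_path, "wb") as file:
    file.write(b"\xff\xd8\xff\xe0" + b"\x00" * 16)

detected = guess_format(demo_path)
print(detected)  # JPEG
```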

### The Key Difference

- **Text files** store **textual characters** (which have specific binary equivalents defined by encodings) in a manner that is meaningful when interpreted as **text** (using an encoding like ASCII, UTF-8, etc.).
- **Binary files** store **data** (images, videos, executables) in a form that requires specialized programs to interpret the data as its intended format.

### Example:

If you save the text `"Hello"` in a text file:
- Internally, it might be stored as the following binary values (in ASCII encoding):
  ```
  H -> 01001000
  e -> 01100101
  l -> 01101100
  l -> 01101100
  o -> 01101111
  ```
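
The same mapping can be reproduced in Python. This is a small sketch using `str.encode` and `format`; the variable names are chosen for the example.

```python
text = "Hello"
encoded = text.encode("ascii")   # bytes object: b'Hello'

# Iterating over bytes yields integers; format each one as 8 binary digits.
bit_strings = [format(byte, "08b") for byte in encoded]
for char, bits in zip(text, bit_strings):
    print(f"{char} -> {bits}")

# Decoding with the same encoding recovers the original text.
decoded = encoded.decode("ascii")
print(decoded)  # Hello
```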

If you save an image in a binary file (like a `.jpg`):
- The file will contain a sequence of bits that represent compressed pixel data and metadata. An image viewer interprets them to display the image, but a text editor shows them as unreadable symbols.

---

### Summary:
- **Human-readable** means that text files are stored in a format that can be directly interpreted as characters, using standard character encodings like ASCII or UTF-8. When opened in a text editor, these characters are **meaningful** to a person.
- **Binary files**, on the other hand, store raw data that doesn’t represent readable characters unless interpreted by the appropriate software. The data might represent images, audio, or programs in machine-readable formats that are not intended to be directly readable by humans.

In [12]:
### Binary Files

# Writing to a binary file
data = b'\x00\x01\x02\x03\x04'
with open('example.bin', 'wb') as file:
    file.write(data)


The expression `data = b'\x00\x01\x02\x03\x04'` represents **binary data** in Python using a **bytes literal**:

### Explanation of `b'\x00\x01\x02\x03\x04'`:

1. **`b''`**: The `b` prefix indicates that the data is a **byte string** in Python. Byte strings are used to represent **binary data**, such as raw bytes, that are not encoded in any particular text encoding (e.g., UTF-8 or ASCII). This is distinct from regular strings that are meant to represent human-readable text.
   
   - `"Hello"` (no prefix) is a regular string (text); `b"Hello"` is a byte string holding the ASCII bytes of those letters.
   - `b'\x00\x01\x02\x03\x04'` is a byte string that represents raw binary data.

2. **`\x00`, `\x01`, `\x02`, `\x03`, `\x04`**: The `\x` notation is used to represent **hexadecimal** (base 16) values. Each value after `\x` represents a **byte** in hexadecimal format.

   - `\x00` represents the **hexadecimal** value `00`, which is equivalent to the **binary** value `00000000`.
   - `\x01` represents the **hexadecimal** value `01`, which is equivalent to the **binary** value `00000001`.
   - `\x02` represents the **hexadecimal** value `02`, which is equivalent to the **binary** value `00000010`.
   - `\x03` represents the **hexadecimal** value `03`, which is equivalent to the **binary** value `00000011`.
   - `\x04` represents the **hexadecimal** value `04`, which is equivalent to the **binary** value `00000100`.

   Hexadecimal is often used to represent binary data because it's more compact and easier to read. One hexadecimal digit can represent **four bits** (a nibble), so two hexadecimal digits represent one byte (8 bits). 

### Why Use Hexadecimal (`\x`) Instead of Binary?

- **Compactness**: It's more practical to write data in **hexadecimal** format than binary. A byte in binary would require 8 digits to represent (e.g., `00000000`), while hexadecimal only requires 2 digits (e.g., `00`). This makes it much easier to read and write large binary data.
  
- **Readability**: Hexadecimal is easier for people to read than long runs of binary digits, and it is the common notation in low-level programming when dealing with binary data.

### Conversion Between Binary, Hexadecimal, and Decimal

- **Binary (Base 2)**: The raw form that machines understand. A byte is 8 bits (e.g., `00000000`).
  
- **Hexadecimal (Base 16)**: A shorthand way to represent binary. Two hexadecimal digits represent 1 byte (e.g., `0x00` = `00000000` in binary).

- **Decimal (Base 10)**: The base that humans commonly use (e.g., `0` to `255` for a byte).

### Example Conversion:
- `\x00` (hexadecimal) is `00000000` in binary and `0` in decimal.
- `\x01` (hexadecimal) is `00000001` in binary and `1` in decimal.
- `\x02` (hexadecimal) is `00000010` in binary and `2` in decimal.
- `\x03` (hexadecimal) is `00000011` in binary and `3` in decimal.
- `\x04` (hexadecimal) is `00000100` in binary and `4` in decimal.
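
These conversions can be checked with Python's built-in `format`, `int`, and `bytes.fromhex`. The snippet is a sketch; `rows`, `from_hex`, `from_bits`, and `rebuilt` are names chosen for illustration.

```python
data = b'\x00\x01\x02\x03\x04'

# One row per byte: two hex digits, eight binary digits, decimal value.
rows = [(format(byte, "02x"), format(byte, "08b"), byte) for byte in data]
for hex_text, bits, decimal in rows:
    print(hex_text, bits, decimal)

# Converting back: int() accepts the base as its second argument.
from_hex = int("ff", 16)         # 255
from_bits = int("11111111", 2)   # 255

# bytes.fromhex() builds a bytes object from a string of hex digits.
rebuilt = bytes.fromhex("0001020304")
print(from_hex, from_bits, rebuilt == data)  # 255 255 True
```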

### Why Not Just Use Bits?

While it’s possible to work directly with bits (e.g., `0b00000000`), it’s far more cumbersome. Hexadecimal notation provides a compact and readable way to represent bytes (8 bits) at a time, which is much easier to handle, especially when working with larger binary data, such as images, files, or memory addresses.

---

### Summary:
- `b'\x00\x01\x02\x03\x04'` is a byte string, where each `\x` followed by two hexadecimal digits represents a single byte (8 bits).
- Hexadecimal is used for its **compactness** and **readability** compared to binary, and each pair of hexadecimal digits represents 1 byte.


In [13]:
# Reading a binary file
with open('example.bin', 'rb') as file:
    content = file.read()
    print(content)

b'\x00\x01\x02\x03\x04'
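
Calling `read()` with no argument loads the whole file into memory. For larger binary files, one common approach is to read fixed-size chunks, sketched below. The file `chunks.bin` and the chunk size of 4 are assumptions picked to keep the example small.

```python
import os
import tempfile

# Hypothetical scratch file holding the bytes 0x00 .. 0x09.
demo_path = os.path.join(tempfile.mkdtemp(), "chunks.bin")
with open(demo_path, "wb") as file:
    file.write(bytes(range(10)))

chunks = []
with open(demo_path, "rb") as file:
    while True:
        chunk = file.read(4)   # at most 4 bytes per call
        if not chunk:          # b'' signals end of file
            break
        chunks.append(chunk)

print(len(chunks))      # 3
print(list(chunks[0]))  # [0, 1, 2, 3]
```

Iterating over a `bytes` object yields integers from 0 to 255, which is why `list(chunks[0])` prints numbers.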


In [14]:
### Read the content from a source text file and write to a destination text file
# Copying a text file
with open('example.txt', 'r') as source_file:
    content = source_file.read()

with open('destination.txt', 'w') as destination_file:
    destination_file.write(content)


The code above works because of how Python's `with` statement manages files and how Python scopes variables:

### Key Points:
1. **`with` Statement**: The `with` statement ensures that resources (like files) are properly managed: `open()` opens the file, and the context manager closes it automatically when the block exits, even if an exception is raised. This avoids issues like forgetting to close a file.

2. **Scope of Variables**: A `with` block does not create a new scope in Python. The variable `content`, assigned inside the first block, belongs to the enclosing function or module scope, so it is still available in the second block even though `source_file` has been closed.

   The `with` block manages only the file's lifetime (opening and closing); it has no effect on the lifetime of variables assigned inside it.

### Step-by-Step Execution:
1. **First `with` Block (Reading from the Source File)**:
   ```python
   with open('example.txt', 'r') as source_file:
       content = source_file.read()
   ```
   - The file `example.txt` is opened in **read mode** (`'r'`), and its contents are read into the variable `content`.
   - After the `with` block, the file `source_file` is automatically closed, but the `content` variable still holds the data from the file.

2. **Second `with` Block (Writing to the Destination File)**:
   ```python
   with open('destination.txt', 'w') as destination_file:
       destination_file.write(content)
   ```
   - The file `destination.txt` is opened in **write mode** (`'w'`).
   - The `content` variable, which holds the data read from the `source_file`, is written into `destination.txt`.
   - After the second `with` block, the `destination_file` is closed, and the file handling is completed.

### Why It Works:
- The **`content` variable** exists in the scope of the whole function or script, so once the first `with` block has finished, `content` is still available for use in the second `with` block.
- **File handling (open/close)** is automatically managed by the `with` statement. Once the first block is done, the `source_file` is closed, but the data in `content` can still be used outside of that block.

### Example with Clear Scope:
```python
# Copying a text file
content = ""  # optional: assigned here only to stress that content is not local to the 'with' block
with open('example.txt', 'r') as source_file:
    content = source_file.read()  # content is filled with the file's content

with open('destination.txt', 'w') as destination_file:
    destination_file.write(content)  # content is written to the destination file
```
- **`content`** is assigned before the `with` block here only to make the point visible. The assignment is optional, because a `with` block never limits a variable's scope.
- The `with` statement manages file opening and closing but does not affect variable scope.

---

### Summary:
- The `content` variable lives in the scope that encloses the `with` block, so you can use it in the second block even though the first file has already been closed. The `with` block only controls the file handling (opening/closing), not the scope of variables.
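
The behavior can be observed directly through the file object's `closed` attribute. The sketch below uses scratch files in a temporary folder (`source.txt` and `destination.txt` there are hypothetical, separate from the lesson's files).

```python
import os
import tempfile

# Hypothetical scratch files.
folder = tempfile.mkdtemp()
source_path = os.path.join(folder, "source.txt")
destination_path = os.path.join(folder, "destination.txt")

with open(source_path, "w") as file:
    file.write("copy me\n")

with open(source_path, "r") as source_file:
    content = source_file.read()
    open_inside = not source_file.closed   # True: still open inside the block

closed_after = source_file.closed          # True: the name survives, the file is closed

with open(destination_path, "w") as destination_file:
    destination_file.write(content)        # content is still usable here

with open(destination_path, "r") as file:
    copied = file.read()

print(open_inside, closed_after, repr(copied))  # True True 'copy me\n'
```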

In [12]:
# Read a text file and count the number of lines, words, and characters.
def count_text_file(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines()
        print(lines)
        line_count = len(lines)

        # A bare generator expression is lazy and prints only its repr:
        #   word_count = (len(line.split()) for line in lines)
        #   print(word_count)  # <generator object count_text_file.<locals>.<genexpr> at 0x1059d6a40>
        # A list comprehension shows the per-line counts:
        #   word_count = [len(line.split()) for line in lines]
        #   print(word_count)  # [2, 8, 2, 2, 2]
        word_count = sum(len(line.split()) for line in lines)
        char_count = sum(len(line) for line in lines)
    return line_count, word_count, char_count

file_path = 'example.txt'
lines, words, characters = count_text_file(file_path)
print(f'Lines: {lines}, Words: {words}, Characters: {characters}')


['Hello World!\n', 'this is a new line.Append operation taking place!\n', 'First line \n', 'Second line \n', 'Third line\n']
Lines: 5, Words: 16, Characters: 99
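
`readlines()` loads every line into a list first. For large files, a possible alternative is to iterate over the file object and keep running totals, as in this sketch. `count_text_stream` and the scratch file `count_demo.txt` are hypothetical names for the example.

```python
import os
import tempfile

def count_text_stream(file_path):
    """Hypothetical variant: count lines, words, and characters one line at a time."""
    line_count = word_count = char_count = 0
    with open(file_path, "r") as file:
        for line in file:                     # reads one line per iteration
            line_count += 1
            word_count += len(line.split())
            char_count += len(line)           # includes the newline character
    return line_count, word_count, char_count

# Hypothetical scratch file with two short lines.
demo_path = os.path.join(tempfile.mkdtemp(), "count_demo.txt")
with open(demo_path, "w") as file:
    file.write("Hello World!\nthis is a new line.\n")

counts = count_text_stream(demo_path)
print(counts)  # (2, 7, 33)
```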


The `w+` mode in Python opens a file for both reading and writing. If the file does not exist, it is created. If the file exists, its content is truncated (i.e., the file is overwritten).

In [16]:
### Writing and then reading a file

with open('example.txt','w+') as file:
    file.write("Hello world\n")
    file.write("This is a new line \n")

    ## Current position after writing (the end of the file)
    print(file.tell())

    ## Move the file cursor to the beginning
    file.seek(0)

    lines=file.readlines()
    char_count=sum(len(line) for line in lines)
    print(char_count)

    file.seek(0)

    ## Read the content of the file
    content=file.read()
    print(content)

32
32
Hello world
This is a new line 
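
To make the truncation visible, the sketch below opens the same file first with `r+` (read and write, no truncation) and then with `w+`. The scratch file `plus_modes.txt` is a hypothetical name so `example.txt` is not affected.

```python
import os
import tempfile

# Hypothetical scratch file with some existing content.
demo_path = os.path.join(tempfile.mkdtemp(), "plus_modes.txt")
with open(demo_path, "w") as file:
    file.write("original content\n")

with open(demo_path, "r+") as file:   # r+ keeps the existing content
    before = file.read()

with open(demo_path, "w+") as file:   # w+ empties the file as soon as it is opened
    after_open = file.read()
    file.write("replaced\n")
    file.seek(0)                      # go back to the start before reading
    after_write = file.read()

print(repr(before))       # 'original content\n'
print(repr(after_open))   # ''
print(repr(after_write))  # 'replaced\n'
```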

