### Some Theory

##### Types of data used for I/O:
- Text - '12345' as a sequence of Unicode characters
- Binary - 12345 as a sequence of bytes holding its binary representation
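
To make the difference concrete, here is a minimal sketch that stores the same value both ways. The variable names, the UTF-8 encoding, and the 2-byte big-endian layout are choices made for this illustration only.

```python
# The same value stored as text and as raw binary.
text_form = "12345".encode("utf-8")       # five characters, one byte each in UTF-8
binary_form = (12345).to_bytes(2, "big")  # the integer itself, packed into 2 bytes

print(text_form, len(text_form))      # b'12345' 5
print(binary_form, len(binary_form))  # b'09' 2  (0x30 0x39, which happen to be printable)
```

The text form grows with the number of digits, while the binary form depends only on the chosen integer width.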

##### Hence there are 2 file types to deal with
- Text files - all source code files are text files
- Binary files - images, music, video, executables

### How File I/O is done in most programming languages

- Open a file
- Read/Write data
- Close the file

In [38]:
# case 1 - if the file is not present
f = open('sample.txt','w')
f.write('Hello world')
f.close()
# the file is already closed, so this write raises ValueError
f.write('hello')

ValueError: I/O operation on closed file.

In [1]:
# write multiline strings
f = open('sample1.txt','w')
f.write('hello world')
f.write('\nhow are you?')
f.close()

In [37]:
# write lines
L = ['hello\n','hi\n','how are you\n','I am fine']

f = open('sample.txt','w')
f.writelines(L)
f.close()

In [5]:
# reading up to n chars
f = open('sample.txt','r')
s = f.read(10)
print(s)
f.close()

hello
hi
h


In [None]:
# reading the entire file line by line using readline
f = open('sample.txt','r')
while True:
    data = f.readline()
    if data == "":   # an empty string means end of file
        break
    print(data,end="")

f.close()

hello
hi
how are you
I am fine

| Mode    | Description                     | File Created If Not Exists? | Overwrites? | Reads? | Writes? | Appends? | Type   |
| ------- | ------------------------------- | --------------------------- | ----------- | ------ | ------- | -------- | ------ |
| `'r'`   | Read only (default)             | ❌ No                        | ❌ No        | ✅ Yes  | ❌ No    | ❌ No     | Text   |
| `'w'`   | Write only (truncate first)     | ✅ Yes                       | ✅ Yes       | ❌ No   | ✅ Yes   | ❌ No     | Text   |
| `'a'`   | Append only                     | ✅ Yes                       | ❌ No        | ❌ No   | ✅ Yes   | ✅ Yes    | Text   |
| `'r+'`  | Read + Write (no truncate)      | ❌ No                        | ❌ No        | ✅ Yes  | ✅ Yes   | ❌ No     | Text   |
| `'w+'`  | Read + Write (truncate)         | ✅ Yes                       | ✅ Yes       | ✅ Yes  | ✅ Yes   | ❌ No     | Text   |
| `'a+'`  | Read + Append                   | ✅ Yes                       | ❌ No        | ✅ Yes  | ✅ Yes   | ✅ Yes    | Text   |
| `'rb'`  | Read only (binary)              | ❌ No                        | ❌ No        | ✅ Yes  | ❌ No    | ❌ No     | Binary |
| `'wb'`  | Write only (truncate, binary)   | ✅ Yes                       | ✅ Yes       | ❌ No   | ✅ Yes   | ❌ No     | Binary |
| `'ab'`  | Append only (binary)            | ✅ Yes                       | ❌ No        | ❌ No   | ✅ Yes   | ✅ Yes    | Binary |
| `'rb+'` | Read + Write (binary)           | ❌ No                        | ❌ No        | ✅ Yes  | ✅ Yes   | ❌ No     | Binary |
| `'wb+'` | Read + Write (truncate, binary) | ✅ Yes                       | ✅ Yes       | ✅ Yes  | ✅ Yes   | ❌ No     | Binary |
| `'ab+'` | Read + Append (binary)          | ✅ Yes                       | ❌ No        | ✅ Yes  | ✅ Yes   | ✅ Yes    | Binary |
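
A few rows of the table can be checked with a short experiment. This is a sketch only: the file name `modes_demo.txt` and the temporary directory are assumptions made so that no real file is touched.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes_demo.txt")  # hypothetical file name

    with open(path, "w") as f:   # 'w' creates the file
        f.write("first\n")
    with open(path, "a") as f:   # 'a' keeps existing content and writes at the end
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()

    with open(path, "w") as f:   # 'w' truncates before writing
        f.write("third\n")
    with open(path, "r") as f:
        after_truncate = f.read()

    with open(path, "r+") as f:  # 'r+' does not truncate; writing starts at position 0
        f.write("THI")
        f.seek(0)
        after_overwrite = f.read()

print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_truncate))   # 'third\n'
print(repr(after_overwrite))  # 'THIrd\n'
```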


In [None]:
# write() accepts only strings, so convert other types with str() first
with open("append.txt","a+") as f:
    f.write(str(456))
    f.write("3455")

In [71]:
# more complex data
d = {
    'name':'nitish',
     'age':33,
     'gender':'male'
}

with open('dict.txt','w') as f:
  f.write(str(d))

In [None]:
with open('dict.txt','r') as f:
  print(dict(f.read()))   # fails: f.read() returns a str, which dict() cannot convert

ValueError: dictionary update sequence element #0 has length 1; 2 is required
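
One common way around this error is to store the dictionary in a structured text format and parse it back when reading. The sketch below uses the standard `json` module; the file name `dict.json` and the temporary directory are assumptions for the illustration.

```python
import json
import os
import tempfile

d = {"name": "nitish", "age": 33, "gender": "male"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dict.json")  # hypothetical file name

    with open(path, "w") as f:
        json.dump(d, f)          # serialise the dict as JSON text

    with open(path, "r") as f:
        restored = json.load(f)  # parse the text back into a dict

print(restored == d, type(restored))  # True <class 'dict'>
```

JSON keeps basic types such as strings, numbers, lists and dicts; other Python objects would need a different approach, for example `pickle`.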

### Using Context Manager (With)

- It's a good idea to close a file after use, as this frees the resources it holds
- If we don't close it, the garbage collector will eventually close it, but at an unpredictable time
- The `with` keyword closes the file as soon as its block ends, even if an exception is raised

### Why it's important to close the file
- If many files remain unclosed, your program could hit the system's file handle limit ("Too many open files" error)
- When writing to a file, the data may be buffered (held in memory) and may not reach the disk until the file is flushed or closed
- On some systems (like Windows), an unclosed file may stay locked and inaccessible to other programs until your script ends.
- If you rely on the garbage collector, it may close the file later than expected, causing intermittent bugs that are hard to debug.
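
The sketch below illustrates the points above: a `with` block closes the file, and flushes buffered data, even when the block is left through an exception. The file name and the simulated `RuntimeError` are made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    try:
        with open(path, "w") as f:
            f.write("partial data")
            raise RuntimeError("simulated failure")  # leave the block abnormally
    except RuntimeError:
        pass

    closed_after_error = f.closed  # the with block closed the file for us

    with open(path, "r") as check:
        saved = check.read()       # the buffered text was flushed on close

print(closed_after_error, saved)   # True partial data
```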

In [21]:
f= open("yamuna.txt","w")

f.write("Yamuna")
print("Before closing manually: Is file closed?",f.closed)  # check the file is closed or not
print(f.close())
print("After closing manually: Is file closed?",f.closed)

Before closing manually: Is file closed? False
None
After closing manually: Is file closed? True


| File Type    | Example        | Mode   | Module Needed?         |
| ------------ | -------------- | ------ | ---------------------- |
| Plain Text   | `.txt`         | `"r"`  | No                     |
| CSV          | `.csv`         | `"r"`  | `csv` (optional)       |
| JSON         | `.json`        | `"r"`  | `json`                 |
| Binary Image | `.jpg`, `.png` | `"rb"` | No                     |
| PDF          | `.pdf`         | `"rb"` | `PyPDF2` (for parsing) |
| Pickle       | `.pkl`         | `"rb"` | `pickle`               |
| Zip          | `.zip`         | `"r"`  | `zipfile`              |
| Gzip         | `.gz`          | `"rt"` | `gzip`                 |
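
As an illustration of the CSV row, here is a minimal sketch that writes and reads a small file with the standard `csv` module. The sample rows and the file name `scores.csv` are invented for the example.

```python
import csv
import os
import tempfile

rows = [["name", "score"], ["asha", "91"], ["ravi", "78"]]  # made-up sample data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores.csv")  # hypothetical file name

    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

    with open(path, "r", newline="") as f:
        loaded = list(csv.reader(f))  # every field comes back as a string

print(loaded == rows)  # True
```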


In [30]:
# try f.read() now
with open('sample.txt','r') as f:
  print(f.readlines())

['hello\n', 'hi\n', 'how are you\n', 'I am fine']


In [34]:
# moving within a file -> 10 char then 10 char
with open('sample.txt','r') as f:
  print(f.read(10))
  print(".....")
  print(f.read(10))

hello
hi
h
.....
ow are you


In [35]:
# benefit? -> a big file can be processed in chunks without loading all of it into memory
big_L = ['hello world ' for i in range(1000)]

with open('big.txt','w') as f:
  f.writelines(big_L)
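
A rough sketch of chunked reading follows. The helper name `count_chars_in_chunks` and the chunk size of 100 are assumptions chosen for illustration; the file is recreated in a temporary directory so the example runs on its own.

```python
import os
import tempfile

def count_chars_in_chunks(path, chunk_size=100):
    """Read a text file chunk_size characters at a time and count them."""
    total = 0
    chunks = 0
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:        # an empty string means end of file
                break
            total += len(chunk)  # only one chunk is held in memory at a time
            chunks += 1
    return total, chunks

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.txt")
    with open(path, "w") as f:
        f.writelines("hello world " for _ in range(1000))
    total, chunks = count_chars_in_chunks(path)

print(total, chunks)  # total is 12000 (12 characters x 1000)
```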

## seek(offset, whence)
- Moves the file pointer to a specific position
- Parameters:
    - offset: number of bytes to move
    - whence: optional (default is 0)
        - 0 → from start of file
        - 1 → from current position
        - 2 → from end of file
- In text mode, only offsets returned by tell() (or 0) are valid; nonzero offsets with whence 1 or 2 require binary mode

## tell()
- Returns the current position of the file pointer (in bytes).
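
The three `whence` values can be tried out on a small binary file. This is a sketch under assumed names: `seek_demo.bin` and its ten-byte content are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seek_demo.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"0123456789")

    with open(path, "rb") as f:
        f.seek(3)          # whence=0: 3 bytes from the start
        a = f.read(2)      # reads b'34', pointer is now at 5
        pos_a = f.tell()
        f.seek(2, 1)       # whence=1: skip 2 bytes forward from position 5
        b = f.read(1)      # reads b'7'
        f.seek(-3, 2)      # whence=2: 3 bytes before the end
        c = f.read()       # reads b'789'
        pos_end = f.tell()

print(a, pos_a, b, c, pos_end)  # b'34' 5 b'7' b'789' 10
```

The constants `os.SEEK_SET`, `os.SEEK_CUR` and `os.SEEK_END` can be used in place of 0, 1 and 2 for readability.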

In [None]:
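# seek, tell and fileno
f = open("yamuna.txt","r")
f.seek(0)            # move the pointer to the start of the file
print(f.tell())      # current position
f.read()
print(f.fileno())    # file descriptor number assigned by the OS, not a position
f.close()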


0
3


In [48]:
with open('sample2.txt','w') as f:
  f.write('Hello')
  f.seek(0)      # jump back to the start
  f.write('Xa')  # overwrites the first two characters -> file contains 'Xallo'

In [50]:
# Binary file reading
with open("space.png","rb") as f:
    print(f.read())

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\xa4\x00\x00\x01^\x08\x03\x00\x00\x00\xfd\x9b\x9bx\x00\x00\x02\xf7PLTE\x0c\x11,\x04\x1b:\x08\x176\n\x141\x01!D\x02\x1fA\x03\x1d=\xd4\x92m\x01\x1a6\x02\x15,\x02\x181\x02\x12&\x02\x10$\xde6L\xc4_]\x01\x01\x01\xff\xaba\x01\xff\xfc\xff\xf6X\xff\xf5S\n\x05\x06\xff\xff\xffW>6b\xe8G\xeb\xb0E\xff\xecK\xff\xfdqtDA\xff\xf6_\x98\x82y\xaaG\x95\x16\xd8\xff\x84[K\x12\x07\x08\x02\x06\x0c\x03\r\x1c\x1b\n\x0eS\xb0\xb9\xd0p0q#K\x9c\xf8\xff\x03\n\x14\xb7\xfe\xfe\x0e\x0b\x11\xff\xf7h\xd94J\xd2\x8dk\xcdze8E\x97{\x1e&\xff\xc5)\xce\x84g\xfd\x02\xb0\x87#)\xc8i`#\x15\x11\xffy\xe1O\x13\x1c\xff\xc9\xf3\xcaqb\x18\x14\x0b4\x0c\x15\xc6b^D$"7 \x1e\'\n\x11\xfd\xa0^l\x1a&\xd02H\x91(/_\x17"\xbd-A\xd1\x89i\xbbZV\xda\xbbK\xc71D-\x15\x1b\xe2@N\xe4JOV\x02f\x1c\x12\x1cA\x10\x18\x16\x97\xb7\xe5\xff\x11*!\x12\xefmV\x10\x17\x1f\xacRO\xaa):\x9c%5\x1d\x1e+\n\x10\x08\x14*>\xf8\x94\\S*)\xf5\x87Z\xe7UQ\xb4,@pqp\xeb`S\xb4WRi65\xf8\xedd\x86CA\xf2{XB\x13.y:8\x9fNLH4*C6\x16a\xc5\xc7\xb3w

In [None]:
from PIL import Image   # to decode and display a binary image, use a library such as PIL (Pillow), Tkinter or OpenCV

# Open and display image
with open("space.png", "rb") as f:
    img = Image.open(f)
    img.show()


In [None]:
# write binary file
with open("space.png","rb") as f, open("space-copy.png","wb") as wf:
    wf.write(f.read())

In [None]:
# Read the log file and write lines with "ERROR" to a new file
# Problem: You have a log.txt file. Extract and save all lines that contain the word "ERROR".
with open("log.txt", "r") as f, open("error_logs.txt", "w") as out:
    for line in f:
        if "ERROR" in line:
            out.write(line)


In [2]:
# Problem: Clean a file sensor_data.txt by removing empty lines and lines that start with # (comments).
with open("sensor_data.txt", "r") as f, open("clean_data.txt", "w") as out:
    for line in f:
        line = line.strip()
        if line and not line.startswith("#"):
            out.write(line + "\n")

In [3]:
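# Problem: write a function that counts how many times a specific word (e.g., "python") appears in a .txt document.
# Use case: content indexing, SEO analysis, or keyword tracking.
# Note: str.count matches substrings, so "python" is also counted inside "pythonic".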
def count_word(filename, keyword):
    with open(filename, "r") as f:
        content = f.read().lower()
        return content.count(keyword.lower())

print(count_word("w.txt", "python"))

3


In [4]:
# Problem: employees.txt holds one name,emp_id,score record per line. Save the employees whose score is above 80.
with open("employees.txt", "r") as f, open("top_performers.txt", "w") as out:
    for line in f:
        name, emp_id, score = line.strip().split(",")
        if int(score) > 80:
            out.write(line)

In [17]:
# [File Monitor] Detect duplicate lines in a text file
# Problem: count how often each line occurs and keep only the lines that appear exactly once.
from collections import Counter

# Read all lines; readlines() keeps the trailing "\n" on each line
with open("data_duplicate.txt", "r") as f:
    lines = f.readlines()

# Count occurrences of each line.
# Note: a last line without "\n" is counted separately from identical lines that end in "\n".
line_counts = Counter(lines)
print(line_counts)

# Write only unique lines (count == 1) to a new file
with open("unique_lines.txt", "w") as out:
    for line in lines:
        if line_counts[line] == 1:
            out.write(line)



Counter({'The system started successfully.\n': 2, 'User John logged in.\n': 2, 'Connection to database established.\n': 2, 'An error occurred while processing the request.\n': 1, 'Backup completed at midnight.\n': 1, 'The report was generated successfully.\n': 1, 'The system started successfully.': 1})
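
In the output above, the final line has no trailing newline, so it is counted separately from the two identical lines that do. One possible way to avoid this, sketched here with a hypothetical helper `find_duplicates` and made-up sample lines, is to drop the line ending before counting.

```python
from collections import Counter

def find_duplicates(lines):
    """Return {line: count} for lines that occur more than once."""
    # Drop only the line ending so 'abc\n' and a final 'abc' compare equal.
    counts = Counter(line.rstrip("\n") for line in lines)
    return {line: n for line, n in counts.items() if n > 1}

# Made-up sample mirroring the situation above: the last line has no newline.
sample = ["started\n", "login\n", "started\n", "backup\n", "started"]
duplicates = find_duplicates(sample)
print(duplicates)  # {'started': 3}
```

Using `rstrip("\n")` instead of `strip()` keeps leading spaces significant, which may or may not be what a given log format needs.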


In [1]:
# collections module example
from collections import Counter
a=[1,2,3,1,3,4,5,2,1,1]
c= Counter(a)
print(c)

Counter({1: 4, 2: 2, 3: 2, 4: 1, 5: 1})
