# File Handling
```
Types of data used in I/O
* Every string in Python is a sequence of Unicode characters.
* Text: a sequence of Unicode characters
* Binary: a sequence of raw bytes, e.g. the number 12345 stored as the bytes of its binary representation.
```
### Text File
```
* All program files (e.g. .py source files) are text files.
* Reading from a text file always returns strings, and write() accepts only strings, so lists, dictionaries, sets, tuples, integers, etc. must be converted to str before writing.
```
### Binary File

```
e.g. images, audio, video, executables (.exe), etc.
```

### open()
```
* Opens a file stored on disk (secondary storage such as a hard drive, not ROM) and returns a file object; data passes through a buffer in RAM.
```
### close()

```
* Flushes any buffered data and releases the file object; the file itself remains on disk.
```

### Modes
```
w - Write mode: creates the file if it does not exist; if it does exist, its old content is erased and replaced.
a - Append mode: creates the file if it does not exist; new content is added at the end without replacing the old content.
r - Read mode (the default): reads an existing file; raises FileNotFoundError if the file does not exist.
rb - Read binary files
wb - Write binary files
```
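A minimal sketch of how the three text modes behave, assuming we may create scratch files in a temporary directory (the file names `demo.txt` and `missing.txt` are hypothetical):

```python
import os
import tempfile

missing_raises = False

# Use a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    with open(path, "w") as f:   # "w" creates the file
        f.write("first")
    with open(path, "w") as f:   # opening with "w" again erases "first"
        f.write("second")
    with open(path, "a") as f:   # "a" keeps the old text and writes at the end
        f.write(" + appended")
    with open(path, "r") as f:
        content = f.read()

    try:
        open(os.path.join(tmp, "missing.txt"), "r")  # "r" never creates a file
    except FileNotFoundError:
        missing_raises = True

print(content)         # second + appended
print(missing_raises)  # True
```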

In [None]:
f = open("sample.txt", "w")
f.write("This content replaced the old content of the file")
f.close()

In [None]:
f = open("sample.txt", "a")
f.write("\nThis is the new content without replacing old content")
f.close()

In [None]:
f = open("sample.txt", "r")
print(f.read())
f.close()

This content replaced the old content of the file
This is the new content without replacing old content


### Write

```
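write()
* Writes a single string to the file (add "\n" yourself to end a line)
writelines()
* Writes every string in an iterable (e.g. a list) to the file
* A convenient way to write multiple lines without an explicit loop; it does not add newlines, so each string must already end with "\n"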
```
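A short sketch of the difference, using `io.StringIO` as an in-memory stand-in for a file opened in text mode so nothing is written to disk:

```python
import io

# io.StringIO is an in-memory stand-in for a file opened in text mode
buf = io.StringIO()
buf.writelines(["one", "two"])          # writelines() adds no separators
no_newlines = buf.getvalue()            # 'onetwo'

buf = io.StringIO()
buf.writelines(line + "\n" for line in ["one", "two"])  # add the newlines yourself
with_newlines = buf.getvalue()          # 'one\ntwo\n'

chars_written = io.StringIO().write("hello")  # write() returns the number of characters written
print(no_newlines, repr(with_newlines), chars_written)
```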

In [None]:
f = open("sample1.txt", "w")
f.write("Hello World!")
f.close()

In [None]:
l = ["This\n", "is\n","multiple\n", "lines\n", "sample\n", "text\n"]
f = open("sample2.txt", "w")
f.writelines(l)
f.close()

### read()
```
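read() / read(n)
* Reads the whole file, or at most n characters if n is given, and returns a single string
```
```
readline()
* Reads one line, including its trailing "\n"
When to use readline()?
* When the file is large: reading one line at a time avoids loading the whole file into memory
```
```
readlines()
* Reads all remaining lines and returns them as a list of strings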
```
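Besides readline(), a file object can itself be iterated line by line, which is the usual memory-friendly way to process a large file. A small sketch, again with `io.StringIO` standing in for a real file:

```python
import io

# io.StringIO behaves like a file opened in text mode, but lives in memory
f = io.StringIO("first line\nsecond line\nthird line")

first = f.readline()          # 'first line\n': readline() keeps the newline
rest = []
for line in f:                # the file object yields the remaining lines one at a time
    rest.append(line.rstrip("\n"))
at_eof = f.readline()         # '' once everything has been read

print(repr(first))
print(rest)                   # ['second line', 'third line']
```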

In [None]:
f = open("sample.txt", "r")
print(f.read())
f.close()


This content replaced the old content of the file
This is the new content without replacing old content


In [None]:
f = open("sample.txt", "r")
print(f.read(10))
f.close()

This conte


In [None]:
f = open("sample.txt", "r")
print(f.readline()) # readline() keeps the line's trailing "\n" and print() adds another one, so a blank line appears after each line
print(f.readline())
f.close()

This content replaced the old content of the file

This is the new content without replacing old content


In [None]:
f = open("sample.txt", "r")
print(f.readline(), end="")
print(f.readline(), end="")
f.close()

This content replaced the old content of the file
This is the new content without replacing old content

In [None]:
f = open("sample.txt", "r")
print(f.readlines(), end="")
print(f.readline(), end="") # prints nothing: readlines() already moved the cursor to the end of the file
f.close()

['This content replaced the old content of the file\n', 'This is the new content without replacing old content']

In [None]:

f = open("sample.txt", "r")
for i in f.readlines():
  print(i, end="")
f.close()

This content replaced the old content of the file
This is the new content without replacing old contentThis is another addition using with

In [None]:
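f = open("sample.txt", "r")
chunk_size = 10
chunk = f.read(chunk_size)    # read the first chunk
while len(chunk) != 0:        # read() returns "" once the end of the file is reached
  print(chunk)
  chunk = f.read(chunk_size)  # read the next chunk
f.close()

This conte
nt replace
d the old 
content of
 the file

This is th
e new cont
ent withou
t replacin
g old cont
entThis is
 another a
ddition us
ing with
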

In [None]:
f = open("sample.txt", "r")
print(f.readable(), end="") # readable() returns a boolean: True if the file was opened for reading
f.close()

True

# Using a context manager (with)
```
* The with statement replaces the explicit f.close() call.
* It closes the file automatically as soon as the block ends, even if an error occurs inside it.
```

In [None]:
with open("sample.txt", "r") as f:
  print(f.read())

This content replaced the old content of the file
This is the new content without replacing old content


In [None]:
f.write("Trying to add content after with but this will not work")

ValueError: I/O operation on closed file.

In [None]:
with open("sample.txt", "a") as f:
  print(f.write("This is another addition using with")) # write() returns the number of characters written

35


```
Reading a file in fixed-size chunks lets us process large files without loading the whole file into memory at once.
```
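The chunked pattern can also be written without a manual while loop, using the two-argument form of the built-in iter() together with functools.partial. A sketch; `count_chunks` is a hypothetical helper and `io.StringIO` stands in for a real file:

```python
import io
from functools import partial

def count_chunks(f, chunk_size=10):
    """Read f in chunks of chunk_size characters; return (number_of_chunks, total_characters)."""
    chunks = 0
    total = 0
    # iter(callable, sentinel) keeps calling f.read(chunk_size) until it returns "" (end of file)
    for chunk in iter(partial(f.read, chunk_size), ""):
        chunks += 1
        total += len(chunk)
    return chunks, total

# io.StringIO stands in for a large text file here
result = count_chunks(io.StringIO("x" * 25), chunk_size=10)
print(result)  # (3, 25)
```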

### tell()
```
* Returns the current cursor position, i.e. how much of the file has been processed so far (in binary mode this is a byte offset)
```
### seek()
```
* Moves the cursor to the given position, e.g. seek(0) goes back to the start of the file
```
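In text mode, Python only guarantees that seek() works with 0 or with values previously returned by tell(); arbitrary offsets, including offsets relative to the end of the file, are meant for binary mode. A small sketch using an in-memory binary file (`io.BytesIO`) to show how the two methods cooperate:

```python
import io

f = io.BytesIO(b"Hello World")
first = f.read(5)          # b'Hello'
pos = f.tell()             # 5: the cursor now sits right after 'Hello'
f.seek(6)                  # jump to absolute offset 6
word = f.read()            # b'World'
f.seek(-5, io.SEEK_END)    # 5 bytes before the end (fine in binary mode)
tail = f.read()            # b'World'
print(first, pos, word, tail)
```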

In [None]:
with open("sample.txt", "r") as f:
  print(f.readline())
  print(f.tell())

This content replaced the old content of the file

50


In [None]:
with open("sample.txt", "r") as f:
  print(f.read(50))
  print(f.seek(0))
  print(f.read(50))

This content replaced the old content of the file

0
This content replaced the old content of the file



# Problems with working in text mode
```
* Can't work with binary files such as images.
* Only strings can be written, so other data types (int, float, list, dict, etc.) must be converted to str, and their original type is lost.
```

### Working with binary

In [None]:
with open("/content/my_logo2.jpg", "r") as f:
  print(f.read())

UnicodeDecodeError

In [None]:
with open("/content/my_logo2.jpg", "rb") as open_file:
  with open("/content/my_logo2_copy.jpg", "wb") as write_file:
    write_file.write(open_file.read())
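For large binary files, the copy above can read and write in chunks instead of loading the whole image with read(). A sketch using the standard library's `shutil.copyfileobj`, with generated bytes in a temporary directory standing in for a real image (the file names are hypothetical):

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.bin")
    dst = os.path.join(tmp, "copy.bin")
    data = bytes(range(256)) * 4          # 1024 bytes, including values that are not valid UTF-8

    with open(src, "wb") as f:
        f.write(data)

    # Copy in 64-byte chunks instead of reading the whole file at once
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, 64)

    with open(dst, "rb") as f:
        copied = f.read()                 # binary mode returns bytes, not str

print(len(copied), copied == data)        # 1024 True
```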

### Working with other data type

In [None]:
with open("other_data_type.txt", "w") as f:
  f.write(5)

TypeError: write() argument must be str, not int

In [None]:
with open("other_data_type.txt", "w") as f:
  f.write("5")

In [None]:
with open("other_data_type.txt", "r") as f:
  print(f.read() + 5)

TypeError: can only concatenate str (not "int") to str

In [None]:
with open("other_data_type.txt", "r") as f:
  print(int(f.read()) + 5)

10


In [None]:
d = {
    "name": "Atif",
    "age" : 45,
    "gender" : "male"
}

with open("complex_data.txt", "w") as f:
  f.write(d)

TypeError: write() argument must be str, not dict

In [None]:
d = {
    "name": "Atif",
    "age" : 45,
    "gender" : "male"
}

with open("complex_data.txt", "w") as f:
  f.write(str(d))

In [None]:
with open("complex_data.txt", "r") as f:
  content = f.read()  # read once; a second f.read() would return "" because the cursor is at the end
  print(content)
  print(type(content))

{'name': 'Atif', 'age': 45, 'gender': 'male'}
<class 'str'>


In [None]:
with open("complex_data.txt", "r") as f:
  print(dict(f.read()))
# This is a big problem: once a complex data type is converted into a string, its structure and type are lost.
# dict() cannot turn that string back into a dictionary.

ValueError: dictionary update sequence element #0 has length 1; 2 is required

# Serialization And De-Serialization

## Serialization:
```
The process of converting Python objects into a format that can be stored or transmitted; here that format is JSON.
```

In [None]:
import json
l = [10, 100, 50, 5.5, 102.3]
with open("list_file.json", "w") as f:
  json.dump(l, f)

In [None]:
import json
student = {
    "name" : "Atif",
    "marks" : [98, 99.5, 99.75, 100]
}

with open("dict_file.json", "w") as f:
  json.dump(student, f, indent=4)

# JSON (JavaScript Object Notation)
```
* It is a language-independent data format supported by virtually all programming languages.
* That's why it is so widely used for storing and exchanging data.
* It looks like a Python dictionary (key-value pairs).
```
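JSON's types do not match Python's one-to-one, so a round trip can change data. A short sketch with `json.dumps()` / `json.loads()` (the string-based counterparts of `dump()` / `load()`): the tuple comes back as a list, the integer key becomes a string, and `True` / `None` are written as `true` / `null`.

```python
import json

original = {"name": "Atif", "scores": (98, 99.5), 1: True, "extra": None}

text = json.dumps(original)   # serialize to a JSON string instead of a file
print(text)      # {"name": "Atif", "scores": [98, 99.5], "1": true, "extra": null}

restored = json.loads(text)   # de-serialize back into Python objects
print(restored)  # {'name': 'Atif', 'scores': [98, 99.5], '1': True, 'extra': None}
```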


## De-Serialization:
```
The process of converting JSON back into Python objects.
```

In [None]:
import json
with open("dict_file.json", "r") as f:
  dict1 = json.load(f)
  print(dict1)
  print(type(dict1))

{'name': 'Atif', 'marks': [98, 99.5, 99.75, 100]}
<class 'dict'>


In [None]:
import json
with open("list_file.json", "r") as f:
  new_list = json.load(f)
  print(new_list)
  print(type(new_list))


[10, 100, 50, 5.5, 102.3]
<class 'list'>


# Serialize Custom Objects
```
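* Python's built-in types (dict, list, tuple, str, int, float, bool, None) can be serialized to JSON directly.
* Objects of our own custom classes cannot be serialized directly; json.dump() raises a TypeError.
* We can work around this by passing a function through the default parameter of json.dump(), which converts the object into something JSON can handle.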
```

In [None]:
class Person:

  def __init__(self, fname, lname, age, gender):
    self.fname = fname
    self.lname = lname
    self.age = age
    self.gender = gender

p1 = Person("Atif", "Salam", 45, "Male")

In [None]:
import json
with open("custom_object.json", "w") as f:
  json.dump(p1, f)

TypeError: Object of type Person is not JSON serializable

In [None]:
import json

def show_object(person):
  if isinstance(person, Person):
    return "{} {} age -> {} gender -> {}".format(person.fname, person.lname, person.age, person.gender)

with open("custom_object.json", "w") as f:
  json.dump(p1, f, default=show_object)

In [None]:
import json

def show_object(person):
  '''Shows how our custom object will look'''
  if isinstance(person, Person):
    return {
      "name": person.fname + " " + person.lname,
      "age" : person.age,
      "gender" : person.gender
    }

with open("custom_object.json", "w") as f:
  json.dump(p1, f, default=show_object, indent=4)

In [None]:
with open("custom_object.json", "r") as f:
  dict2 = json.load(f)
  print(dict2)
  print(type(dict2))

{'name': 'Atif Salam', 'age': 45, 'gender': 'Male'}
<class 'dict'>
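To get a Person object back rather than a plain dict, the de-serialization side needs a hook too. A sketch using the `object_hook` parameter of `json.loads()`; `person_to_dict` and `dict_to_person` are hypothetical helpers, and this version stores `fname` and `lname` separately so the object can be rebuilt exactly. Raising TypeError for unknown types, instead of implicitly returning None as `show_object` does, keeps unsupported objects from being silently written as `null`.

```python
import json

class Person:
    def __init__(self, fname, lname, age, gender):
        self.fname = fname
        self.lname = lname
        self.age = age
        self.gender = gender

FIELDS = ("fname", "lname", "age", "gender")

def person_to_dict(obj):
    # json calls this only for objects it cannot serialize on its own
    if isinstance(obj, Person):
        return {field: getattr(obj, field) for field in FIELDS}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def dict_to_person(d):
    # json calls this for every decoded JSON object; rebuild a Person when all fields are present
    if all(field in d for field in FIELDS):
        return Person(*(d[field] for field in FIELDS))
    return d

text = json.dumps(Person("Atif", "Salam", 45, "Male"), default=person_to_dict)
p = json.loads(text, object_hook=dict_to_person)
print(type(p).__name__, p.fname, p.lname, p.age)  # Person Atif Salam 45
```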


# Pickling
```
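* Pickling converts a Python object, including an instance of our own class, into a binary byte stream.
* Unpickling that stream gives back a full object of the same class, so its methods work again (the class definition must be available when unpickling).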
```
# Un-Pickling
```
* Is the inverse of pickling, whereby a byte stream is converted back into an object.
```

### Pickling vs JSON
```
* Pickle stores data in a Python-specific binary format; only unpickle data from sources you trust.
* JSON stores data in a human-readable text format that other languages can read too.

```

In [None]:
class Person:

  def __init__(self, fname, lname, age, gender):
    self.fname = fname
    self.lname = lname
    self.age = age
    self.gender = gender

  def display_info(self):
    return "My name is {} {}, my age is {} and I am {}".format(self.fname, self.lname, self.age, self.gender)

In [None]:
p2 = Person("Atif", "Salam", 45, "Male")

In [None]:
import pickle
with open("person.pkl", "wb") as f:
  pickle.dump(p2, f)

In [None]:
with open("person.pkl", "rb") as f:
  p = pickle.load(f)

In [None]:
p.display_info()

'My name is Atif Salam, my age is 45 and I am Male'
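`pickle.dumps()` and `pickle.loads()` are the in-memory counterparts of `dump()` and `load()`. A sketch (the `full_name` method is hypothetical) showing that the pickled form is bytes and that the restored object is a new instance whose methods work:

```python
import pickle

class Person:
    def __init__(self, fname, lname):
        self.fname = fname
        self.lname = lname

    def full_name(self):
        return f"{self.fname} {self.lname}"

original = Person("Atif", "Salam")
data = pickle.dumps(original)      # bytes, not text
restored = pickle.loads(data)      # a new Person object rebuilt from the bytes

print(type(data).__name__)         # bytes
print(restored.full_name())        # Atif Salam
print(restored is original)        # False: same state, different object
```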

# Tasks


In [None]:
txt = '''
Hello World!
This is the second line of the file
This is the third line of the file
This is the fourth line of the file
This is the fifth line of the file
'''

with open("sample_tsk.txt", "a") as f:
  f.write(txt)

### `Q-1:` Write a function `get_final_line(filename)`, which takes a filename as input and returns the final line of the file.

Note: You can choose any file of your choice.

In [None]:
# Write code here

def get_final_line(filename):
  with open(filename, "r") as f:
    lst = f.readlines()
    return lst[-1]

print(get_final_line("sample_tsk.txt"))


This is the fifth line of the file


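readlines() loads every line into a list just to keep the last one. For large files, a sketch using `collections.deque` with `maxlen=1`, which keeps only the most recent line while iterating; returning an empty string for an empty file is an assumption, since the task doesn't specify it:

```python
import os
import tempfile
from collections import deque

def get_final_line(filename):
    # deque(maxlen=1) discards older lines as new ones arrive, so memory use stays constant
    with open(filename, "r") as f:
        last = deque(f, maxlen=1)
    return last[0] if last else ""   # assumption: empty file -> empty string

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")   # hypothetical scratch file
    with open(path, "w") as f:
        f.write("first\nsecond\nlast\n")
    final = get_final_line(path)

print(repr(final))  # 'last\n'
```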

### `Q-2:` Read through a text file, line by line. Use a dict to keep track of how many times each vowel (a, e, i, o, and u) appears in the file. Print the resulting tally (the dictionary).

In [None]:
# Write code here
def vowel_detector(file):
  vowel = {"a":0, "e":0, "i":0, "o":0, "u":0}
  with open(file, "r") as f:
    for line in f:                # read the file line by line, as the task asks
      for ch in line.lower():
        if ch in vowel:
          vowel[ch] += 1
  return vowel

d = vowel_detector("sample_tsk.txt")
for key, value in d.items():
  print("{} = {}".format(key, value))

a = 2
e = 20
i = 19
o = 9
u = 1


In [None]:
def vowel_detector(file):
  vowel = {"a":0, "e":0, "i":0, "o":0, "u":0}
  with open(file, "r") as f:
    for text in f.read().lower():
      if text in vowel:
        vowel[text] = vowel[text] + 1
  for key, value in vowel.items():
    print("{} = {}".format(key, value))

vowel_detector("sample_tsk.txt")

a = 2
e = 20
i = 19
o = 9
u = 1

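The standard library's `collections.Counter` can do the tallying. A sketch; `vowel_counts` is a hypothetical helper that accepts any iterable of lines (an open file works), and `io.StringIO` stands in for the file here:

```python
import io
from collections import Counter

def vowel_counts(lines):
    """Count a, e, i, o, u (case-insensitive) in an iterable of lines, e.g. an open file."""
    counts = Counter()
    for line in lines:
        counts.update(ch for ch in line.lower() if ch in "aeiou")
    # Include every vowel, even those that never appear
    return {vowel: counts[vowel] for vowel in "aeiou"}

result = vowel_counts(io.StringIO("Hello World!\nAn apple a day\n"))
print(result)  # {'a': 4, 'e': 2, 'i': 0, 'o': 2, 'u': 0}
```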

### `Q-3:` Create a text file (using an editor, not necessarily Python) containing two tab-separated columns, with each column containing a number. Then use Python to read through the file you've created. For each line, multiply the first number by the second and write the product in a third column. Finally, add a Total line containing the sum of the third column.



Input file example (the file you need to create):
```
1   2
3   4
5   6
7   8
9   10

```

Output File Example:
```
1   2   2
3   4   12
5   6   30
7   8   56
9   10  90
Total   190
```


In [None]:
# write code here
total_val = 0
with open("number.txt", "r") as f, open("calculated.txt", "w") as cf:
  for line in f:
    if not line.strip():  # skip blank lines, such as a trailing empty line
      continue
    first, second = line.split()
    new_val = int(first) * int(second)
    total_val += new_val
    cf.write(f"{first}\t{second}\t{new_val}\n")  # third column holds the product
  cf.write(f"Total\t{total_val}\n")

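Since the columns are tab-separated, the standard library's `csv` module can handle the splitting and joining. A sketch assuming the input really uses tab characters (not spaces) between columns; `add_product_column` is a hypothetical helper, and `io.StringIO` stands in for the input and output files:

```python
import csv
import io

def add_product_column(src, dst):
    """Append a product column to each tab-separated row of src, plus a Total row; return the total."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    total = 0
    for row in reader:
        if not row:                      # csv yields [] for blank lines
            continue
        a, b = int(row[0]), int(row[1])
        total += a * b
        writer.writerow([a, b, a * b])
    writer.writerow(["Total", total])
    return total

out = io.StringIO()
total = add_product_column(io.StringIO("1\t2\n3\t4\n\n"), out)
print(out.getvalue())
```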

### `Q-4:` Create line-wise reverse of a file
Write a function which takes two arguments: the names of the input file (to be read from) and the output file (which will be created).

For example, if a file looks like
 ```
abc def
ghi jkl
```
then the output file will be
```
fed cba
lkj ihg
```
**Notice**: The newline remains at the end of the string, while the rest of the characters are all reversed.

In [None]:
# write code here
def reverse_text(input_file, output_file):
  with open(input_file, "r") as read_file, open(output_file, "w") as write_file:
    for txt in read_file:
      # Reverse everything except the trailing newline, then put the newline back at the end
      rev_text = txt.rstrip("\n")[::-1]
      write_file.write(rev_text + "\n")

reverse_text("string_data.txt", "rev_string.txt")

### `Q-5:` Create a serialized dict of the frequency of words in the file. Then, for a given list of words, use the serialized dict to show each word's count.

* A list of words will be given



Given String

```
strings = """Alice was beginning to get very tired of sitting by her sister
            on the bank, and of having nothing to do:  once or twice she had
            peeped into the book her sister was reading, but it had no
            pictures or conversations in it, `and what is the use of a book,'
            thought Alice `without pictures or conversation?'

            So she was considering in her own mind (as well as she could,
            for the hot day made her feel very sleepy and stupid), whether
            the pleasure of making a daisy-chain would be worth the trouble
            of getting up and picking the daisies, when suddenly a White
            Rabbit with pink eyes ran close by her.

            There was nothing so VERY remarkable in that; nor did Alice
            think it so VERY much out of the way to hear the Rabbit say to
            itself, `Oh dear!  Oh dear!  I shall be late!'  (when she thought
            it over afterwards, it occurred to her that she ought to have
            wondered at this, but at the time it all seemed quite natural);
            but when the Rabbit actually TOOK A WATCH OUT OF ITS WAISTCOAT-
            POCKET, and looked at it, and then hurried on, Alice started to
            her feet, for it flashed across her mind that she had never
            before seen a rabbit with either a waistcoat-pocket, or a watch to
            take out of it, and burning with curiosity, she ran across the
            field after it, and fortunately was just in time to see it pop
            down a large rabbit-hole under the hedge."""

word_list = ['alice', 'wonder', 'natural']
```

In [13]:
# write code here
strings = """
Alice was beginning to get very tired of sitting by her sister
on the bank, and of having nothing to do:  once or twice she had
peeped into the book her sister was reading, but it had no
pictures or conversations in it, `and what is the use of a book,'
thought Alice `without pictures or conversation?'
So she was considering in her own mind (as well as she could,
for the hot day made her feel very sleepy and stupid), whether
the pleasure of making a daisy-chain would be worth the trouble
of getting up and picking the daisies, when suddenly a White
Rabbit with pink eyes ran close by her.

There was nothing so VERY remarkable in that; nor did Alice
think it so VERY much out of the way to hear the Rabbit say to
itself, `Oh dear!  Oh dear!  I shall be late!'  (when she thought
it over afterwards, it occurred to her that she ought to have
wondered at this, but at the time it all seemed quite natural);
but when the Rabbit actually TOOK A WATCH OUT OF ITS WAISTCOAT-
POCKET, and looked at it, and then hurried on, Alice started to
her feet, for it flashed across her mind that she had never
before seen a rabbit with either a waistcoat-pocket, or a watch to
take out of it, and burning with curiosity, she ran across the
field after it, and fortunately was just in time to see it pop
down a large rabbit-hole under the hedge.
"""
word_list = ['alice', 'wonder', 'natural']
import json
with open("raw_string.txt", "w") as stringFile:
  stringFile.write(strings)


In [5]:
import json
import string

# Count the frequency of every word in the file (punctuation removed, lower-cased)
word_freq = {}
with open("raw_string.txt", "r") as raw_str:
  for text in raw_str.read().split():
    text = text.translate(str.maketrans("", "", string.punctuation)).lower() # To remove punctuation
    if text:
      word_freq[text] = word_freq.get(text, 0) + 1

# Serialize the frequency dict to JSON
with open("serialized_dict.json", "w") as sd:
  json.dump(word_freq, sd, indent=4)

# De-serialize it and look up the given words
with open("serialized_dict.json", "r") as rf:
  dict1 = json.load(rf)

word_list = ['alice', 'wondered', 'natural']
result = {word: dict1.get(word, 0) for word in word_list}
print(result)

{'alice': 4, 'wondered': 1, 'natural': 1}


### **`Q-6:`** Given a string calculate length of the string using recursion.

**Example 1:**

Input:
```bash
"abcd"
```

Output:

```bash
4
```

**Example 2:**

Input:
```bash
"DataScience"
```

Output:

```bash
11
```


In [17]:
# Iterative Approach
def len_string(text):
    counter = 0
    for i in text:
        counter += 1
    return counter

print(len_string(strings))

1333


In [18]:
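# Recursive Approach
def len_string(text):
    # Base case: the empty string has length 0 (this also handles empty input)
    if text == "":
        return 0
    # Recursive case: count the first character, then recurse on the rest
    # Note: every character adds one stack frame, so very long strings can hit Python's recursion limit
    return 1 + len_string(text[1:])
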

print(len_string(strings))

1333


### **`Q-7:`** Write a function that accepts two numbers and returns their greatest common divisor, without using any loop.

def gcd(a: int, b: int) -> int

```
gcd(16,24) will give 8
```

In [49]:
# Write code here
def gcd_loop(x, y):
       
    s1 = {i for i in range(1,x+1) if x % i == 0}
    s2 = {j for j in range(1,y+1) if y % j == 0}    
    return max(s1.intersection(s2))
    
gcd_loop(2,8)

2

In [60]:
def gcd_euclids(x, y):
    if y == 0:
        return x
    else:
        return gcd_euclids(y, x%y)
    
gcd_euclids(1025, 35)

5
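Tracing `gcd_euclids(16, 24)` by hand shows how each call replaces $(x, y)$ with $(y, x \bmod y)$ until the second argument reaches 0 (for real code, Python's standard library also offers `math.gcd`):

$$
\begin{aligned}
\gcd(16, 24) &= \gcd(24,\ 16 \bmod 24) = \gcd(24, 16) \\
             &= \gcd(16,\ 24 \bmod 16) = \gcd(16, 8) \\
             &= \gcd(8,\ 16 \bmod 8) = \gcd(8, 0) = 8
\end{aligned}
$$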

 ### `Q-8:` String Edit Distance

 Use your recursive function to write a program that reads two strings from the
user and displays the edit distance between them.

*The edit distance between two strings is a measure of their similarity—the smaller the edit distance, the more similar the strings are with regard to the minimum number of insert, delete and substitute operations needed to transform one string into the other.*

Consider the strings `kitten` and `sitting`. The first string can be transformed
into the second string with the following operations:
* Substitute the `k` with an `s`,
* substitute the `e` with an `i`,
* and insert a `g` at the end of the string.

This is the smallest number of operations that can be performed to transform kitten into sitting. As a result, the edit distance is `3`.


Write a recursive function that computes the edit distance between two strings.

Use the following algorithm:

```
Let s and t be the strings
    If the length of s is 0 then
        Return the length of t
    Else if the length of t is 0 then
        Return the length of s
    Else
        Set cost to 0
        If the last character in s does not equal the last character in t then
            Set cost to 1
        Set d1 equal to the edit distance between all characters except the last one in s, and all characters in t, plus 1
        Set d2 equal to the edit distance between all characters in s, and all characters except the last one in t, plus 1

        Set d3 equal to the edit distance between all characters except the last one in s, and all characters except the last one in t, plus cost
        Return the minimum of d1, d2 and d3
```





In [None]:
# write code here
def edit_distance(s, t):
    # If one string is empty, the distance is the length of the other (all inserts or deletes)
    if len(s) == 0:
        return len(t)
    elif len(t) == 0:
        return len(s)
    else:
        # Substitution costs nothing when the last characters already match
        cost = 0
        if s[-1] != t[-1]:
            cost = 1
        d1 = edit_distance(s[:-1], t) + 1          # delete the last character of s
        d2 = edit_distance(s, t[:-1]) + 1          # insert the last character of t
        d3 = edit_distance(s[:-1], t[:-1]) + cost  # substitute (or keep a match)
        return min(d1, d2, d3)

print(edit_distance("sunday", "saturday"))

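The plain recursion recomputes the same sub-problems many times, so its running time grows exponentially with the string lengths. A sketch of the same algorithm with memoization via `functools.lru_cache`, working on prefix lengths `i` and `j` instead of sliced strings; `edit_distance_memo` is a hypothetical name:

```python
from functools import lru_cache

def edit_distance_memo(s, t):
    @lru_cache(maxsize=None)
    def dist(i, j):
        # Edit distance between the prefixes s[:i] and t[:j]
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if s[i - 1] == t[j - 1] else 1
        return min(dist(i - 1, j) + 1,         # delete from s
                   dist(i, j - 1) + 1,         # insert into s
                   dist(i - 1, j - 1) + cost)  # substitute or match
    return dist(len(s), len(t))

print(edit_distance_memo("kitten", "sitting"))  # 3
```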
### `Q-9:` Run-Length Encoding

Run-length encoding is a simple data compression technique that can be effective when repeated values occur at adjacent positions within a list. Compression is achieved by replacing groups of repeated values with one copy of the value, followed by the number of times that the value should be repeated. For example, the list
```
["A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "A", "A", "A", "A", "A", "A", "B"]
```
would be compressed as `["A", 12, "B", 4, "A", 6, "B", 1]`.

Write a recursive function that implements the run-length compression technique
described above. Your function will take a list or a string as its only parameter. It should return the run-length compressed list as its only result. Include a main program that reads a string from the user, compresses it, and displays the run-length encoded result.

In [None]:
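# Write code here
def run_length_encode(data):
    if len(data) == 0:  # base case: nothing left to compress
        return []
    count = 1  # length of the run of values equal to data[0]
    while count < len(data) and data[count] == data[0]:
        count += 1
    return [data[0], count] + run_length_encode(data[count:])
lst1 = ["A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "A", "A", "A", "A", "A", "A", "B"]
print(run_length_encode(lst1))  # ['A', 12, 'B', 4, 'A', 6, 'B', 1]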

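If you'd rather avoid the while loop entirely, the run length can be built up recursively from the compressed tail. A sketch; `rle_recursive` is a hypothetical name, it accepts a list or a string, and its recursion depth equals the input length. For the "main program" part of the task, it could be called as `rle_recursive(input("Enter a string: "))`.

```python
def rle_recursive(data):
    """Run-length encode a list or string using recursion only (no loops)."""
    if len(data) == 0:
        return []                          # base case: nothing left to compress
    rest = rle_recursive(data[1:])         # compress everything after the first value
    if rest and rest[0] == data[0]:
        # The compressed tail starts with the same value: extend that run by one
        return [data[0], rest[1] + 1] + rest[2:]
    return [data[0], 1] + rest             # otherwise start a new run of length 1

letters = ["A"] * 12 + ["B"] * 4 + ["A"] * 6 + ["B"]
print(rle_recursive(letters))   # ['A', 12, 'B', 4, 'A', 6, 'B', 1]
print(rle_recursive("aaabcc"))  # ['a', 3, 'b', 1, 'c', 2]
```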
### `Q-10:` Write a recursive function to convert a decimal to binary

In [5]:
# Write code here
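def convert_decimal(num, convert_to=2):
    # No mutable default list, so repeated calls don't share (and corrupt) state
    # Base case: a number smaller than the base is a single digit
    if num < convert_to:
        return [num]
    # The digits of num // convert_to come first, followed by the last digit num % convert_to
    return convert_decimal(num // convert_to, convert_to) + [num % convert_to]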

convert_decimal(125, 2)

[1, 1, 1, 1, 1, 0, 1]

In [5]:
print(125//8)
print(125%8)

15
5


In [17]:
def decimal_to_hexa(num):
    if num == 0:
        return "0"
    hex_chars = "0123456789ABCDEF"
    hex_number = ""    
    
    while num > 0:
        remainder = num % 16
        hex_digit = hex_chars[remainder] # string indexing
        hex_number = hex_digit + hex_number
        num = num // 16
    return hex_number
decimal_to_hexa(125)

'7D'
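The loop in `decimal_to_hexa` can also be written recursively, and the same idea covers any base from 2 to 16. A sketch; `to_base` is a hypothetical helper that returns the digits as a string:

```python
DIGITS = "0123456789ABCDEF"

def to_base(num, base=2):
    """Return the non-negative integer num written in the given base (2-16), recursively."""
    if num < base:
        return DIGITS[num]                                   # base case: a single digit
    return to_base(num // base, base) + DIGITS[num % base]   # higher digits first, then the last one

print(to_base(125, 2))   # 1111101
print(to_base(125, 8))   # 175
print(to_base(125, 16))  # 7D
```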