# File Differences project for Week 4 of the Python Data Representations course
## The aim of the project is to build a program that finds differences between the contents of text files.

More specifically, the program finds the exact location of the first difference between two input files: the line number and the index of the first differing character within that line. To do that, I wrote the following five functions; their names, parameters, and general behavior were specified in the project guidelines.
The five functions are:
* singleline_diff(line1, line2) - Takes two single-line strings and returns the index of the first character that differs between them, or the string "IDENTICAL" if the lines are the same.
* singleline_diff_format(line1, line2, idx) - Returns a three-line string showing line1, a line of "=" characters ending in a "^" under position idx, and line2. It returns an empty string if either line contains a newline or carriage return, or if idx is not a valid index.
* multiline_diff(lines1, lines2) - Takes two lists of single-line strings and returns a tuple with the line number (starting from 0) and the index within that line where the first difference occurs, or ("IDENTICAL", "IDENTICAL") if the lists are the same.
* get_file_lines(filename) - Takes a filename and returns the file's lines as a list of strings, with newline characters removed.
* file_diff_format(filename1, filename2) - Takes two filenames and returns a four-line string showing the location of the first difference between the files, or "No differences\n" if they are identical.
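The core idea behind singleline_diff (the first differing index equals the length of the longest common prefix) can be cross-checked with the standard library. The sketch below assumes `os.path.commonprefix`, which the Python docs describe as comparing strings character by character; the helper name `first_diff_via_prefix` is hypothetical and not part of the project guidelines.

```python
import os.path


def first_diff_via_prefix(line1, line2):
    """Hypothetical cross-check for singleline_diff.

    os.path.commonprefix compares character by character, so the length of
    the common prefix is the index of the first differing character.
    """
    if line1 == line2:
        return "IDENTICAL"
    return len(os.path.commonprefix([line1, line2]))


print(first_diff_via_prefix("engineering classes", "enginneering classes"))  # 5
print(first_diff_via_prefix("abc", "abcd"))  # 3
print(first_diff_via_prefix("abc", "abc"))   # IDENTICAL
```

Note that `commonprefix` works on arbitrary strings, not just paths, which is why it fits this comparison.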

Originally this program was written in the Thonny IDE, but I converted it to this notebook so that the steps are clearer and each function is in a separate cell. The last cell runs the program.

```python
def singleline_diff(line1, line2):
    """
    Inputs:
      line1 - first single line string
      line2 - second single line string
    Output:
      Returns the index where the first difference between
      line1 and line2 occurs.

      Returns IDENTICAL if the two lines are the same.
    """
    # Only the characters that both lines share can be compared one by one
    len_for_check = min(len(line1), len(line2))

    # Go character by character and return the index of the first mismatch
    for idx in range(len_for_check):
        if line1[idx] != line2[idx]:
            return idx

    # No mismatch in the shared part: if the lengths differ, the first
    # difference is the first character beyond the end of the shorter line
    # (this also covers the case where one of the lines is empty)
    if len(line1) != len(line2):
        return len_for_check

    return "IDENTICAL"
```
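The same character-by-character scan can be written more compactly with `zip()` and `enumerate()`, since `zip()` stops at the end of the shorter line. This is a sketch of an alternative, not the version required by the guidelines; the name `first_diff_index` is hypothetical. The edge cases in the usage loop (empty lines, one line being a prefix of the other) are the ones the checks after the loop exist for.

```python
def first_diff_index(line1, line2):
    """Hypothetical compact variant of singleline_diff."""
    # zip() pairs up characters and stops at the end of the shorter line
    for idx, (char1, char2) in enumerate(zip(line1, line2)):
        if char1 != char2:
            return idx
    # The shared part matches, so the lines differ only if their lengths do
    if len(line1) != len(line2):
        return min(len(line1), len(line2))
    return "IDENTICAL"


for pair in [("abc", "abd"), ("abc", "abcde"), ("", "x"), ("", ""), ("same", "same")]:
    print(pair, "->", first_diff_index(*pair))
```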
    

```python
def singleline_diff_format(line1, line2, idx):
    """
    Inputs:
      line1 - first single line string
      line2 - second single line string
      idx   - index of first difference between the lines
    Output:
      Returns a three line formatted string showing the location
      of the first difference between line1 and line2.

      If either input line contains a newline or carriage return,
      then returns an empty string.

      If idx is not a valid index, then returns an empty string.
    """
    # Reject inputs that are not single lines
    for line in (line1, line2):
        if "\n" in line or "\r" in line:
            return ""

    # A valid idx points at a shared character or just past the shorter line
    if not isinstance(idx, int) or idx < 0 or idx > min(len(line1), len(line2)):
        return ""

    # line1, then a row of "=" with "^" under position idx, then line2
    return line1 + "\n" + "=" * idx + "^\n" + line2 + "\n"
```
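singleline_diff_format has to reject lines that contain a newline or carriage return. Checking for those characters directly is more reliable than counting the pieces returned by `str.splitlines()`, because `splitlines()` drops a trailing newline and therefore reports `"abc\n"` as a single line. The helper `is_single_line` below is a hypothetical illustration of the direct check.

```python
def is_single_line(text):
    """Hypothetical helper: True if text has no newline or carriage return."""
    return "\n" not in text and "\r" not in text


# Counting lines misses a trailing newline, because splitlines() drops it
print(len("abc\n".splitlines()))  # 1
print(is_single_line("abc\n"))    # False
print(is_single_line("abc"))      # True
```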

```python
def multiline_diff(lines1, lines2):
    """
    Inputs:
      lines1 - list of single line strings
      lines2 - list of single line strings
    Output:
      Returns a tuple containing the line number (starting from 0) and
      the index in that line where the first difference between lines1
      and lines2 occurs.

      Returns (IDENTICAL, IDENTICAL) if the two lists are the same.
    """
    if lines1 == lines2:  # the lists are identical
        return ("IDENTICAL", "IDENTICAL")

    # Compare the lines that both lists share, one pair at a time
    len_to_check = min(len(lines1), len(lines2))
    for line_idx in range(len_to_check):
        char_idx = singleline_diff(lines1[line_idx], lines2[line_idx])
        if char_idx != "IDENTICAL":
            # Index within this line, not a running total over all lines
            return (line_idx, char_idx)

    # All shared lines match, so one list has extra lines: the first extra
    # line differs from the missing line in the other list at index 0
    return (len_to_check, 0)
```
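As an alternative sketch of multiline_diff, `itertools.zip_longest` walks both lists to the end of the longer one, filling missing entries with `None` by default. Using `None` rather than `""` as the filler matters: with `""`, a list ending in an empty line would look identical to a list that is one line shorter. The function name `first_diff_position` is hypothetical.

```python
from itertools import zip_longest


def first_diff_position(lines1, lines2):
    """Hypothetical variant of multiline_diff built on zip_longest."""
    # zip_longest pads the shorter list with None so every line is visited
    for line_idx, (line1, line2) in enumerate(zip_longest(lines1, lines2)):
        if line1 is None or line2 is None:
            # One list ran out of lines: the extra line differs at index 0
            return (line_idx, 0)
        if line1 != line2:
            # Find the first differing character within this line
            for char_idx, (char1, char2) in enumerate(zip(line1, line2)):
                if char1 != char2:
                    return (line_idx, char_idx)
            # One line is a prefix of the other
            return (line_idx, min(len(line1), len(line2)))
    return ("IDENTICAL", "IDENTICAL")


print(first_diff_position(["a", "bc"], ["a", "bd"]))  # (1, 1)
print(first_diff_position(["a"], ["a", ""]))          # (1, 0)
```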

```python
def get_file_lines(filename):
    """
    Inputs:
      filename - name of file to read
    Output:
      Returns a list of lines from the file named filename.  Each
      line will be a single line string with no newline ('\n') or
      return ('\r') characters.

      If the file does not exist or is not readable, then the
      behavior of this function is undefined.
    """
    # The with block closes the file automatically once it has been read
    with open(filename) as infile:
        # Strip any trailing '\n' or '\r' characters from each line
        return [line.rstrip("\r\n") for line in infile]
```
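A quick way to try a line reader without preparing input files by hand is to write a temporary file first. The sketch below assumes a hypothetical `read_lines` that uses `str.splitlines()` instead of stripping each line; it produces the same result for ordinary text files, although `splitlines()` also splits on a few rarer separators such as form feed.

```python
import os
import tempfile


def read_lines(filename):
    """Hypothetical equivalent of get_file_lines using str.splitlines()."""
    with open(filename) as infile:
        return infile.read().splitlines()


# Write a small sample file, read it back, then delete it
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

try:
    print(read_lines(path))  # ['first line', 'second line']
finally:
    os.remove(path)
```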
       

```python
def file_diff_format(filename1, filename2):
    """
    Inputs:
      filename1 - name of first file
      filename2 - name of second file
    Output:
      Returns a four line string showing the location of the first
      difference between the two files named by the inputs.

      If the files are identical, the function instead returns the
      string "No differences\n".

      If either file does not exist or is not readable, then the
      behavior of this function is undefined.
    """
    # Read both files as lists of lines without '\n'
    list1 = get_file_lines(filename1)
    list2 = get_file_lines(filename2)

    line_no, char_no = multiline_diff(list1, list2)
    if line_no == "IDENTICAL":
        return "No differences\n"

    # If one file is shorter, its missing line is treated as an empty string
    list1_line = list1[line_no] if line_no < len(list1) else ""
    list2_line = list2[line_no] if line_no < len(list2) else ""

    # "Line<n>:" followed by the three-line marker from singleline_diff_format
    return "Line" + str(line_no) + ":\n" + singleline_diff_format(list1_line, list2_line, char_no)
```
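file_diff_format reports only the first difference. For comparison, the standard library's `difflib.unified_diff` lists every changed line in the familiar `diff -u` layout. This sketch feeds it the line from the sample run; the helper name `unified` is hypothetical, and the file names are passed only as header labels, not opened.

```python
import difflib


def unified(old_lines, new_lines):
    """Hypothetical helper returning a unified diff as a list of lines."""
    # lineterm="" because the input lines carry no trailing newlines
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile="file2.txt", tofile="file3.txt",
                                     lineterm=""))


for line in unified(["engineering classes"], ["enginneering classes"]):
    print(line)
```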

```python
print(file_diff_format("file2.txt", "file3.txt"))
```

```
Line0:
engineering classes
=====^
enginneering classes
```
