---
title: Tracking Changes
date: 2023-11-30 
authors:
  - name: Sébastien Boisgérault
    email: Sebastien.Boisgerault@minesparis.psl.eu
    url: https://github.com/boisgera
    affiliations:
      - institution: Mines Paris - PSL University
        department: Institut des Transformation Numériques (ITN)
github: boisgera
license: CC-BY-4.0
open_access: true
---

In order to understand how `.tldr` files are structured, we can add new graphical objects, change some of their properties, etc., and each time we modify the document, analyze the corresponding evolution of the file.

In this notebook, we develop some tooling to help us track such changes.

## Text comparison

We define two similar versions of the "zen of Python":

In [1]:
zen_1 = """The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Errors should never pass silently.
In the face of ambiguity, refuse the temptation to guess.
There should be one obvious way to do it.
Although that way may not be obvious at first.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it is a good idea.
"""

In [2]:
zen_2 = """\
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
"""

```{exercise}
 1. Transform `zen_1` and `zen_2` into lists of lines.
 2. Use the [`difflib`](https://docs.python.org/3/library/difflib.html) module of the Python standard library to [`compare`](https://docs.python.org/3/library/difflib.html#difflib.Differ.compare) the two sequences.
 3. Make a text out of the output of compare and print it.
 4. Interpret the result and list the differences between both versions of the zen of Python.
```

In [3]:
#Question 1
zen_1 = "The Zen of Python, by Tim Peter.\nBeautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex.\nComplex is better than complicated.\nFlat is better than nested.\nSparse is better than dense.\nReadability counts.\nSpecial cases aren't special enough to break the rules.\nErrors should never pass silently.\nIn the face of ambiguity, refuse the temptation to guess.\nThere should be one obvious way to do it.\nAlthough that way may not be obvious at first.\nNow is better than never.\nAlthough never is often better than right now.\nIf the implementation is hard to explain, it's a bad idea.\nIf the implementation is easy to explain, it is a good idea."

output_file_path = "output.txt"

# Open the file in write mode ('w')
with open(output_file_path, 'w') as output_file:
    # Write the text to the file
    output_file.write(zen_1)
print(f"The text was successfully exported to {output_file_path}")

The text was successfully exported to output.txt


In [4]:
with open('output.txt', 'r') as file:
    list_of_lines = file.readlines()

# Display the list of lines
print(list_of_lines)

['The Zen of Python, by Tim Peter.\n', 'Beautiful is better than ugly.\n', 'Explicit is better than implicit.\n', 'Simple is better than complex.\n', 'Complex is better than complicated.\n', 'Flat is better than nested.\n', 'Sparse is better than dense.\n', 'Readability counts.\n', "Special cases aren't special enough to break the rules.\n", 'Errors should never pass silently.\n', 'In the face of ambiguity, refuse the temptation to guess.\n', 'There should be one obvious way to do it.\n', 'Although that way may not be obvious at first.\n', 'Now is better than never.\n', 'Although never is often better than right now.\n', "If the implementation is hard to explain, it's a bad idea.\n", 'If the implementation is easy to explain, it is a good idea.']


In [5]:
zen_2 = "The Zen of Python, by Tim Peters\nBeautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex.\nComplex is better than complicated.\nFlat is better than nested.\nSparse is better than dense.\nReadability counts.\nSpecial cases aren't special enough to break the rules.\nAlthough practicality beats purity.\nErrors should never pass silently.\nUnless explicitly silenced.\nIn the face of ambiguity, refuse the temptation to guess.\nThere should be one-- and preferably only one --obvious way to do it.\nAlthough that way may not be obvious at first unless you're Dutch.\nNow is better than never.\nAlthough never is often better than *right* now.\nIf the implementation is hard to explain, it's a bad idea.\nIf the implementation is easy to explain, it may be a good idea.\nNamespaces are one honking great idea -- let's do more of those!"

output_file_path2 = "output2.txt"

# Open the file in write mode ('w')
with open(output_file_path2, 'w') as output_file:
    # Write the text to the file
    output_file.write(zen_2)
print(f"The text was successfully exported to {output_file_path2}")

The text was successfully exported to output2.txt


In [6]:
with open('output2.txt', 'r') as file:
    list_of_lines2 = file.readlines()

# Display the list of lines
print(list_of_lines2)

['The Zen of Python, by Tim Peters\n', 'Beautiful is better than ugly.\n', 'Explicit is better than implicit.\n', 'Simple is better than complex.\n', 'Complex is better than complicated.\n', 'Flat is better than nested.\n', 'Sparse is better than dense.\n', 'Readability counts.\n', "Special cases aren't special enough to break the rules.\n", 'Although practicality beats purity.\n', 'Errors should never pass silently.\n', 'Unless explicitly silenced.\n', 'In the face of ambiguity, refuse the temptation to guess.\n', 'There should be one-- and preferably only one --obvious way to do it.\n', "Although that way may not be obvious at first unless you're Dutch.\n", 'Now is better than never.\n', 'Although never is often better than *right* now.\n', "If the implementation is hard to explain, it's a bad idea.\n", 'If the implementation is easy to explain, it may be a good idea.\n', "Namespaces are one honking great idea -- let's do more of those!"]


In [7]:
# Question 2
import difflib

differ = difflib.Differ()
# Compare the lists of lines, not the raw strings: Differ.compare iterates
# over its arguments, so two strings would be compared character by character.
diff_result = list(differ.compare(list_of_lines, list_of_lines2))

# Question 3: turn the diff into a single text and print it
# (trailing newlines are stripped first so that every entry gets its own line)
print("\n".join(line.rstrip("\n") for line in diff_result))
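
To interpret such a result, it helps to look at a much smaller input first. In the output of `Differ.compare`, every line starts with a two-character code: `"- "` for a line found only in the first sequence, `"+ "` for a line found only in the second one, two spaces for a line common to both, and `"? "` for a hint line that belongs to neither input and points at the characters that changed. The sketch below is only an illustration; the helper name `diff_lines` and the two short texts are ours.

```python
import difflib


def diff_lines(text_1, text_2):
    """Return the Differ output for two texts, one string per diff line."""
    # splitlines() gives lists of lines, which is what Differ expects
    lines_1 = text_1.splitlines()
    lines_2 = text_2.splitlines()
    return list(difflib.Differ().compare(lines_1, lines_2))


before = "Readability counts.\nErrors should never pass silently.\n"
after = (
    "Readability counts.\n"
    "Although practicality beats purity.\n"
    "Errors should never pass silently.\n"
)

result = diff_lines(before, after)
print("\n".join(result))
```

Here the only marked line is the one prefixed with `"+ "`, the sentence that was inserted in the second text; the two other lines are reported as unchanged.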


We can make our job easier if we use HTML instead of plain text to visualise the differences between the two texts.


```{exercise}
  1. Use the [HtmlDiff](https://docs.python.org/3/library/difflib.html#difflib.HtmlDiff) class of difflib to produce a `diff.html` file that represents this difference in an HTML document.
  2. Use the [webbrowser](https://docs.python.org/3/library/webbrowser.html) module of the standard library to open it!
  3. Define a `display_diff_text` function that takes two arguments `text_1` and `text_2` and automates steps 1. and 2.
```

In [8]:
# Question 1
from difflib import HtmlDiff

# Create an HtmlDiff object
html_diff = HtmlDiff()

# Generate the HTML document that shows the differences
html_result = html_diff.make_file(zen_1.splitlines(), zen_2.splitlines())

# Save the HTML document to a file
with open("diff_output.html", "w", encoding="utf-8") as html_file:
    html_file.write(html_result)

In [9]:
#question 2 
import webbrowser 
webbrowser.open('diff_output.html')

True

In [10]:
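# Question 3
from difflib import HtmlDiff
import webbrowser

def display_diff_text(text_1, text_2):
    # Build the HTML comparison of the two texts, line by line
    html_diff = HtmlDiff()
    html_result = html_diff.make_file(text_1.splitlines(), text_2.splitlines())
    # Save it to a file, then open that file in the browser
    with open("diff_output.html", "w", encoding="utf-8") as html_file:
        html_file.write(html_result)
    webbrowser.open('diff_output.html')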

## Comparison of JSON documents

````{exercise} Comparison of dictionaries

 1. Create a `display_diff` function that takes two Python objects, converts them to strings then leverages `display_diff_text` to display the difference in a browser.

 2. Consider the 3 dictionaries defined by
    ```python
    d1 = {k:k+1 for k in range(100)}
    d2 = d1.copy(); d2[50] = 50
    d3 = {k:k+1 for k in range(99, -1, -1)}
    ```
    `d1` and `d2` differ slightly, while `d1` and `d3` are equal.
    Does your `display_diff` function make it easy to spot where the difference is in the first case, when it compares `d1` and `d2`?
    Does it make it easy to see that `d1` and `d3` are equal?

  3. Investigate the [`pprint`](https://docs.python.org/3/library/pprint.html) module of the standard library; use it to improve the behavior of `display_diff` in the two cases considered in the previous question.

````
 

In [11]:
# Question 1
def display_diff(object_1, object_2):
    # Convert both objects to strings, then delegate to display_diff_text
    text_1 = str(object_1)
    text_2 = str(object_2)
    display_diff_text(text_1, text_2)

In [12]:
# Question 2
d1 = {k: k+1 for k in range(100)}
d2 = d1.copy(); d2[50] = 50
d3 = {k: k+1 for k in range(99, -1, -1)}
# str() renders each dictionary on a single line, so the comparison is shown
# as one very long row and the changed entry is hard to locate.
display_diff(d1, d2)
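
For the third question, one possible direction (a sketch, not the only answer) is to replace `str` with `pprint.pformat`. Once the representation no longer fits on one line (80 characters by default), `pformat` puts each dictionary entry on its own line, and with `sort_dicts=True` (the default, available since Python 3.8) the keys are sorted, so insertion order no longer matters. The helper name `to_diffable_text` is ours.

```python
from pprint import pformat


def to_diffable_text(obj):
    """Convert a Python object to a multi-line text that is easy to diff."""
    # sort_dicts=True orders dictionary keys, whatever the insertion order was
    return pformat(obj, sort_dicts=True)


d1 = {k: k + 1 for k in range(100)}
d2 = d1.copy(); d2[50] = 50
d3 = {k: k + 1 for k in range(99, -1, -1)}

text_1, text_2, text_3 = (to_diffable_text(d) for d in (d1, d2, d3))

# Pairs of lines that differ between d1 and d2
changed = [
    (line_1, line_2)
    for line_1, line_2 in zip(text_1.splitlines(), text_2.splitlines())
    if line_1 != line_2
]
print(changed)
print(text_1 == text_3)
```

Passing such texts to `display_diff_text` instead of the output of `str` should reduce the comparison of `d1` and `d2` to a single highlighted line, and show no difference at all between `d1` and `d3`.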

```{exercise} tldraw documents comparator
Implement a function `tldraw_diff` that takes two filenames referring to tldraw documents as arguments and displays their differences in the browser.
```
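
A minimal sketch of such a comparator is given below. It assumes that tldraw documents are plain JSON files, which is our working assumption here and not something this notebook has verified. Each file is loaded with `json`, dumped again with a fixed indentation and sorted keys so that the two texts are comparable line by line, then handed to `HtmlDiff`. The helper names `load_as_text` and `tldraw_diff_html`, the `output` parameter and the demo documents are made up for the illustration.

```python
import json
import tempfile
import webbrowser
from difflib import HtmlDiff
from pathlib import Path


def load_as_text(filename):
    """Read a JSON document and return a normalized, multi-line text."""
    with open(filename, encoding="utf-8") as file:
        document = json.load(file)
    # Fixed indentation and sorted keys make the two texts comparable
    return json.dumps(document, indent=2, sort_keys=True, ensure_ascii=False)


def tldraw_diff_html(filename_1, filename_2):
    """Return an HTML page comparing two JSON documents line by line."""
    lines_1 = load_as_text(filename_1).splitlines()
    lines_2 = load_as_text(filename_2).splitlines()
    return HtmlDiff().make_file(
        lines_1, lines_2, fromdesc=str(filename_1), todesc=str(filename_2)
    )


def tldraw_diff(filename_1, filename_2, output="diff_output.html"):
    """Write the comparison to `output` and open it in the browser."""
    path = Path(output)
    path.write_text(tldraw_diff_html(filename_1, filename_2), encoding="utf-8")
    webbrowser.open(path.resolve().as_uri())


# Demo on two tiny made-up JSON documents (not real tldraw content)
with tempfile.TemporaryDirectory() as directory:
    path_1 = Path(directory) / "before.tldr"
    path_2 = Path(directory) / "after.tldr"
    path_1.write_text('{"b": [], "a": "demo"}', encoding="utf-8")
    path_2.write_text('{"a": "demo", "b": [1]}', encoding="utf-8")
    normalized = load_as_text(path_1)
    html = tldraw_diff_html(path_1, path_2)

print(normalized)
```

With two actual documents at hand, `tldraw_diff("before.tldr", "after.tldr")` (hypothetical file names) would open the comparison in the browser.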