🐍 Support app to get a diff results from two document πŸ“—



Generate the diff data between two files



The implementation started mainly as a support app for data science work. The current implementation helps analyse recommendation CSV files (e.g. if you need to compare the results of two algorithms, this lib will be very handy).

Recommendation Format

The CSV file contains one key-value pair per line. The key is a product (productCode/productID) and the value is a comma-separated list of recommended products (productCode/productID). The product code and the recommendation list are separated by a TAB.

Sample Recommendation 
1098808	1597549,1974410,1850731
1161889	1095554
1706909	2078866
1815368	2215327
1847624	2179582,2085753
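
As a rough illustration of the format above, the following sketch reads such lines into a dictionary. It is not part of doc-diff; the function name `parse_recommendations` is hypothetical and chosen only for this example.

```python
def parse_recommendations(lines):
    """Parse TAB-separated lines into {product_code: [recommended codes]}.

    Hypothetical helper for illustration; not the doc-diff API.
    """
    recs = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Everything before the first TAB is the key, the rest is the list.
        key, _, values = line.partition("\t")
        recs[key.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return recs


sample = (
    "1098808\t1597549,1974410,1850731\n"
    "1161889\t1095554\n"
    "1847624\t2179582,2085753\n"
)
recs = parse_recommendations(sample.splitlines())
print(recs["1098808"])
```

For a file on disk, the same function can be given an open file object, since iterating over a file yields its lines.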


$ pip install doc-diff


  • Generate the following comparison reports
    • common_in_doc1-and-doc2-%Y-%m-%d.csv
    • common_key_with_diff_values-%Y-%m-%d.csv
    • exclusive_in_doc1-%Y-%m-%d.csv
    • exclusive_in_doc2-%Y-%m-%d.csv
  • Compare two files and return the following dicts, each mapping prodCode to recommendation
    • common_in_doc1_and_doc2_list = dicts()
    • common_key_with_diff_values_list = dicts()
    • exclusive_in_doc1_list = dicts()
    • exclusive_in_doc2_list = dicts()
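
The four result groups listed above can be sketched in plain Python as follows. This is a minimal sketch and not the library's own code: the function name `compare_docs` is hypothetical, and treating two recommendation lists as equal regardless of order is an assumption made for this example.

```python
def compare_docs(doc1, doc2):
    """Split two {key: [values]} dicts into four groups.

    Hypothetical helper; assumes the order of recommendations does not matter.
    """
    common, diff_values, only_doc1, only_doc2 = {}, {}, {}, {}
    for key, recs in doc1.items():
        if key not in doc2:
            only_doc1[key] = recs                  # key appears only in doc1
        elif sorted(recs) == sorted(doc2[key]):
            common[key] = recs                     # same key, same values
        else:
            diff_values[key] = (recs, doc2[key])   # same key, different values
    for key, recs in doc2.items():
        if key not in doc1:
            only_doc2[key] = recs                  # key appears only in doc2
    return common, diff_values, only_doc1, only_doc2


doc1 = {"1098808": ["1597549", "1974410"], "1161889": ["1095554"], "1706909": ["2078866"]}
doc2 = {"1098808": ["1974410", "1597549"], "1161889": ["1095554", "2215327"], "1815368": ["2215327"]}
common, diff_values, only_doc1, only_doc2 = compare_docs(doc1, doc2)
print(sorted(common), sorted(diff_values), sorted(only_doc1), sorted(only_doc2))
```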


  • Generate the evaluation result files
  • Extract the comparison results as key-value lists
    • Using the different dictionary objects you can present the results however you like (e.g. graphs, Venn diagrams)
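
As one possible way to prepare the dictionaries for a chart, the sketch below counts the keys in each region of a two-set Venn diagram. The function name `overlap_summary` and the sample dictionaries are hypothetical; the dictionaries only mimic the shape of the four result dicts.

```python
def overlap_summary(common, diff_values, only_doc1, only_doc2):
    """Count keys per region of a two-set Venn diagram (doc1 vs doc2)."""
    shared = len(common) + len(diff_values)  # keys present in both documents
    total = shared + len(only_doc1) + len(only_doc2)
    return {
        "doc1_only": len(only_doc1),
        "shared_keys": shared,
        "identical": len(common),
        "doc2_only": len(only_doc2),
        # Fraction of all keys on which both documents fully agree.
        "agreement": len(common) / total if total else 0.0,
    }


# Hypothetical comparison results, used only to exercise the function.
common = {"1161889": ["1095554"]}
diff_values = {"1847624": (["2179582", "2085753"], ["2179582"])}
only_doc1 = {"1098808": ["1597549"]}
only_doc2 = {"1815368": ["2215327"]}

summary = overlap_summary(common, diff_values, only_doc1, only_doc2)
for region, value in summary.items():
    print(f"{region}: {value}")
```

The resulting counts can be passed to whichever plotting library you prefer.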

Comparison Report Format

  • In each CSV file, every line contains the product code and the corresponding recommendation. The product code and the recommendation list are separated by a TAB.
  • In the 'common_key_with_diff_values-%Y-%m-%d.csv' file the result format is slightly different. To show the mismatched recommendations, each line contains, after the product code and the TAB, the result of algorithm 'A' and the result of algorithm 'B' separated by two pipes '||'.
Sample common_key_with_diff_values-%Y-%m-%d.csv 
c36623	2256360,2398464,2503472,c27214||2256360,2398464,2503472,c27214,c79033
c973955	1965886,c340951,c752950,c973951||1965886,c24224,c340951,c752950,c906950,c973951
c25749	c25982||c205950,c25982,c65977
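
To make the '||' layout concrete, here is a small sketch that writes and reads one such line and builds a dated report name from the %Y-%m-%d pattern. All three function names are hypothetical and are not taken from doc-diff.

```python
from datetime import date


def report_filename(prefix, day=None):
    """Build a name such as 'exclusive_in_doc1-2020-01-02.csv'."""
    day = day or date.today()
    return f"{prefix}-{day.strftime('%Y-%m-%d')}.csv"


def format_diff_line(key, recs_a, recs_b):
    """Join key, TAB, algorithm A list, '||', algorithm B list."""
    return f"{key}\t{','.join(recs_a)}||{','.join(recs_b)}"


def parse_diff_line(line):
    """Split one report line back into (key, recs_a, recs_b)."""
    key, _, rest = line.rstrip("\n").partition("\t")
    part_a, _, part_b = rest.partition("||")
    recs_a = [v for v in part_a.split(",") if v]
    recs_b = [v for v in part_b.split(",") if v]
    return key, recs_a, recs_b


line = format_diff_line("c25749", ["c25982"], ["c205950", "c25982", "c65977"])
key, recs_a, recs_b = parse_diff_line(line)
only_in_b = sorted(set(recs_b) - set(recs_a))  # items only algorithm B recommended
print(report_filename("common_key_with_diff_values"), key, only_in_b)
```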

Package Directory Layout

├── LICENSE                         # Contains License Agreement file
├── README.md                       # Contains the details of doc-diff lib
├── doc_diff                        # Root package
│   ├── Diff.py                     # Diff class
│   ├── __init__.py                 # Package declaration
├── setup.py                        # Setup file for packaging
└── test                            # Test module (includes the usage)
    ├── __init__.py                 # Package declaration
    ├── data                        # Sample data
    │   ├── a-priori.csv            # A-Priori algo results
    │   └── pfp.csv                 # FP-Growth algo results
    └── doc_diff_app.py             # Main method file

Current Published Artifacts


For any problem or question, or if you think of a feature that could make the doc-diff lib more useful, do not hesitate to open an issue.


Thanks to Flat Icon for the free logo.


MIT © Renien