🐍 Support app to get a diff results from two document πŸ“—
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
doc/blob
doc_diff
test
.gitignore
.travis.yml
LICENSE
README.md
setup.cfg
setup.py

README.md

doc-diff
doc-diff

Generate the diff data between two files

License Travis Build

Summary

Implementation was started mainly focusing as a support app for Data Science work. The current implementation helps to analyse recommendations β€˜CSV’ files. (i.e If you need to analyse two algorithm results this lib will be very handy)

Recommendation Format

The CSV file contains list of key-value in each line. The key is product (productCode/porductID) and the value is list of recommended products (productCode/porductID). The product code and the recommendation list is separated with β€˜TAB’.

Sample Recommendation 
---------------------
1098808	1597549,1974410,1850731
1161889	1095554
1706909	2078866
1815368	2215327
1847624	2179582,2085753

Installation

$ pip install doc-diff

Features

  • Generate the following comparison reports
    • common_in_doc1-and-doc2-%Y-%m-%d.csv
    • common_key_with_diff_values-%Y-%m-%d.csv
    • exclusive_in_doc1-%Y-%m-%d.csv
    • exclusive_in_doc2-%Y-%m-%d.csv
  • Compare two files and return following 'dicts(prodCode, recommendation)'
    • common_in_doc1_and_doc2_list = dicts()
    • common_key_with_diff_values_list = dicts()
    • exclusive_in_doc1_list = dicts()
    • exclusive_in_doc2_list = dicts()

Usage

  • Allow to generate the evaluation result files
  • Able to extract the comparsion results as key-value list
    • Using the diifferent dictionary objects you can present the results as you like (i.e Graphs, Venn diagram)

Comparison Report Format

  • In CSV file each line contains the product code and the corresponding recommendation. The product code and the recommendation list is separated with β€˜TAB’.
  • In β€˜common_key_with_diff_values-%Y-%m-%d.csv’ file the result format is slightly different. To show the un-matching recommendation in each line after product code TAB separation you will find the result of β€˜A’ algorithm and the β€˜B' algorithm result separated with two pipes β€˜||’.
Sample common_key_with_diff_values-%Y-%m-%d.csv 
------------------------------------------------
c36623	2256360,2398464,2503472,c27214||2256360,2398464,2503472,c27214,c79033
c973955	1965886,c340951,c752950,c973951||1965886,c24224,c340951,c752950,c906950,c973951
c25749	c25982||c205950,c25982,c65977

Package Directory Layout

doc-diff
β”œβ”€β”€ LICENSE                         # Contains License Agreement file
β”œβ”€β”€ README.md                       # Contains the details of doc-diff lib
β”œβ”€β”€ doc_diff                        # Root package 
β”‚Β Β  β”œβ”€β”€ Diff.py                     # Diff class
β”‚Β Β  β”œβ”€β”€ __init__.py                 # Package declaration 
β”œβ”€β”€ setup.py                        # Setup file for packaging 
└── test                            # Test module (Includes the useage)
    β”œβ”€β”€ __init__.py                 # Package declaration 
    β”œβ”€β”€ data                        # Sample data
    β”‚Β Β  β”œβ”€β”€ a-priori.csv            # A-Priori algo results
    β”‚Β Β  └── pfp.csv                 # FP-Growth algo results
    └── doc_diff_app.py             # Main method file 

Current Published Artifacts

Contribute

For any problem/question or if you think a feature that could make doc-diff lib more useful, do not hesitate to open an issue.

Thanks

Thanks Flat Icon for the free logo.

License

MIT Β© Renien