- Louis Tochon
- Kévin Huguenin (Superviser)
Omitted files are not required to understand the project.
📦k-ano_Tool
┣ 📂.github\workflows //Configuration folder
┃ ┗ 📜python-app.yml //test environnement properties
┣ 📂Flask project //First Flask project to gain knowledge
┣ 📂K-anoTool //Flask project with the anonimization tool
┃ ┣ 📂Env //Virtual Python Env
┃ ┣ 📂Static //Static files (JS, CSS, IMG, Saved files)
┃ ┣ 📂Templates //HTML Templates
┃ ┣ 📂Upload //Sub-directory for Upload part of the project
┃ ┣ 📂Result //Sub-directory for Result part of the project
┃ ┣ 📂About //Sub-directory for About part of the project
┃ ┗ 📜app.py //launch project file
┣ 📂Presentation //Folder with powerpoint for meetings
┣ 📂Prototyping //Folder with prototypes of generalizytion interface
┣ 📂Test_algo //Algorithm's development folder
┃ ┣ 📂data
┃ ┣ 📂ideas //Ideas for improving algorithms
┃ ┣ 📂yed //Folder containing algorithm's graph
┃ ┣ 📜datafly_v1_weighted.py //Proposal of datafly algorithm with weighted attributes
┃ ┣ 📜datafly_v1.py //Basic datafly algorithm
┃ ┣ 📜incognito_v1.py //First try with incognito algorithm
┃ ┣ 📜mondarian.py //Open Source Code for mondarian algorithm
┃ ┣ 📜samarati_tree.py //File to generate the distance matrix lattice for samarati algorithm
┃ ┣ 📜sw.py //Original proposal algorithm to include weights
┃ ┗ 📜ssw.py //Suppr_weighted_algo combined with samarati
┣ 📜test.py //test file
┣ 📜LICENSE
┗ 📜README.md
The main goal of this project is to provide an anonymization tool to k-anonymize a database with algorithms based on weights for each attribute and to implement a tool to easily create the wanted tree of generalization.
To realize this project, the algorithm created needs to incorporate the weights at each level of generalization to describe the quantity of information lost. The algorithm proposed is called SW and basically tests every possible generalization level to find the smallest information lost. However, testing all combinations is not a viable solution due to the exponential time needed. To solve this problem, a second algorithm is proposed which is called SSW (for Samarati-SW). This one includes a binary search which enables avoiding combinations which are leading to a larger information loss than a previous solution found.
This anonymization algorithm will be deployed on a web application written in Python through the framework Flask.
In a second time, an interface on the same web application will be created to provide a tool to the user to easily create his own tree of generalization.
The crucial point of this project is to do a user-friendly tool with easy interactions to make the experience as simple as possible.
This project will focus on the medical domain to handle a single use case and go as deep as possible to provide a complete tool.
maximum generalization level = n
number of qid = k
number of combination of generalization's level = nk
one-level generalization = O(n)
max-level generalization = O(k * n2)
anonimity level check = O(n)
supression check = O(n)
Final complexity level : O(nk * (k * n2 + n + n + n)) = O(nk * (k * n2 + 3n)) = O(nk * n2)
maximum generalization level = n
number of qid = k
number of combination of generalization's level = nk
one-level generalization = O(n)
max-level generalization = O(k * n2)
anonimity level check = O(n)
supression check = O(n)
Final complexity level : O(log2(nk) * (k * n2 + n + n + n)) = O(log2(nk) * (k * n2 + 3n)) = O(klog2(n) * n2)*
(:heavy_check_mark: : Done | :hourglass_flowing_sand: : In Progress | :x: : Canceled)
- Research about the Datafly algorithm ✔️
- Gaining knowledge about Flask (first web application with Python) ✔️
- Implementing Datafly algorithm ✔️
- Algorithm's proposal with weighted attributes ✔️
- Implementation of Flask application with the algorithm ✔️
- Make Flask application be shiny ✔️
- Prototyping the interface of custom generalization ✔️
- Prototype testing ✔️
- Implementation of the generalization's interface ✔️
- Final application's usability tests ⏳
After cloning, just write this line in your terminal at the root of the repository :
python K-AnoTool\app.py
This will launch the web tool on the port 5000 of your localhost.
weight algo k-anon [28.02] : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6663954
weight k-ano [28.01] : https://www.sciencedirect.com/science/article/pii/S0167404820300377
Mondrian [01.03] : https://personal.utdallas.edu/~muratk/courses/privacy08f_files/MultiDim.pdf
Incognito explained [01.03] : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6406297
Datafly explained [02.03] : http://www.mathcs.emory.edu/~lxiong/cs573_s12/share/slides/03anonymity_generalization.pdf
Datafly wikipédia [02.03] : https://en.wikipedia.org/wiki/Datafly_algorithm
Improved Greedy vs Datafly [03.03] : http://www.tdp.cat/issues11/tdp.a100a12.pdf
Analysis of Datafly [03.03] : https://kanonymity.github.io/ola.
Datafly explained [10.03] : https://dataprivacylab.org/datafly/paper2.pdf
Schema of core's principles of Datafly [10.03] : https://www.researchgate.net/publication/287537219_A_Systematic_Comparison_and_Evaluation_of_k-Anonymization_Algorithms_for_Practitioners
Distortion calcul with weighted attributes [15.03] : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4606080
Anonymization by Local Recoding in Data with Attribute Hierarchical Taxonomies (Article)[22.03]
Datafly [4.4]: https://link.springer.com/content/pdf/10.1007/s00778-006-0008-z.pdf)
Incognito [4.4] : https://www.profsandhu.com/cs6393_s13/p49-lefevre_2005.pdf
Samarati [4.4] : http://www.sti.uniurb.it/events/fosad08/slides/samarati-2.pdf
Top-Down Specialization [4.4] : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1410123