# Challenge - GDPR Compliant

![](http://eleanorglanvillecentre.lincoln.ac.uk/assets/images/content/_large/adalovelacehero.jpg)

The `ada_lovelace.txt` file, located in the same folder, contains some information about Ada Lovelace. The problem is that this file is full of identifying information about people and, as such, is really not GDPR-compliant 😱 (info: the [General Data Protection Regulation](https://en.wikipedia.org/wiki/General_Data_Protection_Regulation) is a regulation in EU law on data protection and privacy).

## Guidelines
The objective of this exercise is to write a function that cleans up a file by replacing all mentions of people's names with "\[REDACTED\]", in order to comply with European law.

In [5]:
# TODO : Imports
import spacy
from spacy import displacy
from pathlib import Path

AttributeError: partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import)

In [3]:
# TODO : load file and have a look at it
file_path = 'ada_lovelace.txt'

with open(file_path, 'r', encoding='utf-8') as file:
    doc = file.read()

print(doc[:500])

Augusta Ada King, Countess of Lovelace (née Byron; 10 December 1815 – 27 November 1852) was an English mathematician and writer, chiefly known for her work on Charles Babbage's proposed mechanical general-purpose computer, the Analytical Engine. She was the first to recognise that the machine had applications beyond pure calculation, and published the first algorithm intended to be carried out by such a machine. As a result, she is sometimes regarded as the first to recognise the full potential 


**Q1.** Using the spaCy NER tools, identify the **entities** in this document and their corresponding tags.

In [4]:
# TODO : Named Entities Recognition
# Load the spaCy English model
nlp = spacy.load('en_core_web_sm')

# Process the document using spaCy
doc_spacy = nlp(doc)

# Keep only the text of the named entities labelled PERSON
persons = [entity.text for entity in doc_spacy.ents if entity.label_ == 'PERSON']
print(persons[:10])

NameError: name 'spacy' is not defined
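Q1 asks for every entity and its tag, not only the persons. A minimal sketch of one way to collect them is shown here, grouping entity texts by label. The helper name `group_entities_by_label` and the `fake_doc` stand-in are hypothetical; the sketch only assumes the `.ents`, `.text` and `.label_` attributes that spaCy documents and spans expose.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities_by_label(doc):
    """Map each entity label to the distinct entity texts carrying it, in first-seen order."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        # spaCy spans expose the surface text as `.text` and the tag as `.label_`
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Hypothetical stand-in for a processed document, so the sketch runs without a model.
# In the notebook, pass `doc_spacy` instead.
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Ada Lovelace", label_="PERSON"),
    SimpleNamespace(text="1815", label_="DATE"),
    SimpleNamespace(text="Charles Babbage", label_="PERSON"),
    SimpleNamespace(text="Ada Lovelace", label_="PERSON"),
])

print(group_entities_by_label(fake_doc))
```

In the notebook, `group_entities_by_label(doc_spacy)` should give the same kind of mapping for the real text, with whatever labels the loaded model assigns.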

**Q2.** Display the identified entities in a more visual manner.

In [4]:
# TODO : NER visualization

# Visualize the entities using spaCy's displacy
colors = {'PERSON': 'linear-gradient(90deg, #aa9cfc, #fc9ce7)'}
options = {'ents': ['PERSON'], 'colors': colors}

# Render the processed Doc directly; a plain list of strings cannot be rendered
displacy.render(doc_spacy, style='ent', options=options)
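displaCy can also render entities that do not come from a `Doc`. With `manual=True`, it expects a dict with `text`, `ents` and `title` keys rather than a list of strings. The sketch below builds such a dict from plain name strings; `build_displacy_input` is a hypothetical helper, and it assumes the names do not overlap each other in the text (for example, "Ada" and "Ada Lovelace" would produce overlapping spans).

```python
def build_displacy_input(text, names, label="PERSON"):
    """Build the dict that displaCy's manual 'ent' mode expects from plain name strings."""
    ents = []
    for name in set(names):
        if not name:
            # An empty string would match everywhere, so skip it
            continue
        start = text.find(name)
        # Record every occurrence of the name as a character span
        while start != -1:
            ents.append({"start": start, "end": start + len(name), "label": label})
            start = text.find(name, start + len(name))
    # Keep the spans in reading order
    ents.sort(key=lambda ent: ent["start"])
    return {"text": text, "ents": ents, "title": None}


sample = "Ada Lovelace worked with Charles Babbage."
manual_input = build_displacy_input(sample, ["Ada Lovelace", "Charles Babbage"])
print(manual_input["ents"])
```

The result could then be passed as `displacy.render(manual_input, style='ent', manual=True, options=options)`.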

**Q3.** Write a function `replace_name_by_redacted` that will modify the document in order to replace all occurrences of names by "\[REDACTED\]", and apply it to the file.

In [5]:
# TODO : `replace_name_by_redacted`
def replace_name_by_redacted(token):
    # Return the placeholder; the caller decides which tokens are names
    return "[REDACTED]"

**Q4.** Write a function `make_doc_GDPR_compliant` that will modify the document in order to replace all occurrences of names by "\[REDACTED\]", and apply it to the file.

In [9]:
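def make_doc_GDPR_compliant(doc):
    # Tokenize the raw text with the spaCy pipeline loaded in Q1
    doc_spacy = nlp(doc)
    redacted_text = []

    for token in doc_spacy:
        # A token is redacted when its text matches an entry of `persons` (from Q1)
        if token.text in persons:
            redacted_text.append(replace_name_by_redacted(token))
        else:
            redacted_text.append(token.text)

    # Joining with spaces also separates punctuation from the words around it
    return ' '.join(redacted_text)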

make_doc_GDPR_compliant(doc)

'[REDACTED] [REDACTED] [REDACTED] , [REDACTED] of [REDACTED] ( née [REDACTED] ; 10 December 1815 – 27 November 1852 ) was an English mathematician and writer , chiefly known for her work on [REDACTED] [REDACTED] [REDACTED] proposed mechanical general - purpose computer , the Analytical Engine . She was the first to recognise that the machine had applications beyond pure calculation , and published the first algorithm intended to be carried out by such a machine . As a result , she is sometimes regarded as the first to recognise the full potential of a " computing machine " and one of the first computer programmers . \n\n [REDACTED] became close friends with her tutor [REDACTED] [REDACTED] , who introduced her to [REDACTED] [REDACTED] in 1833 . She had a strong respect and affection for Somerville , and they corresponded for many years . Other acquaintances included the scientists [REDACTED] [REDACTED] , Sir [REDACTED] [REDACTED] , [REDACTED] [REDACTED] , [REDACTED] [REDACTED] and the a
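Joining tokens with spaces, as in the output above, detaches punctuation from words and emits one placeholder per token. A possible alternative, sketched here under assumptions, is to replace whole character spans in the original string so that spacing is preserved and each name becomes a single "[REDACTED]". The helper `redact_spans` is hypothetical and the spans are written by hand; with spaCy they could come from the `start_char` and `end_char` attributes of the PERSON entities.

```python
def redact_spans(text, spans, placeholder="[REDACTED]"):
    """Replace each (start, end) character span in `text` with `placeholder`."""
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one already redacted
            continue
        pieces.append(text[cursor:start])
        pieces.append(placeholder)
        cursor = end
    # Keep whatever follows the last redacted span
    pieces.append(text[cursor:])
    return "".join(pieces)


sample = "Ada Lovelace worked with Charles Babbage's engine."
# Hypothetical spans, written by hand; with spaCy they could be built as
# [(ent.start_char, ent.end_char) for ent in doc_spacy.ents if ent.label_ == "PERSON"]
spans = [(25, 40), (0, 12)]
print(redact_spans(sample, spans))
```

Because the replacement works on the original string, the result depends only on which spans the model labels as persons, so any name the model misses stays in the text.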