
## Task 1: Sentiment Feature Extractor

### Subtask:
Implement a Python script using spaCy to extract all (Adjective, Noun) pairs from the provided customer review text.


**Reasoning**:
The first step is to import the `spacy` library and load the small English pipeline `en_core_web_sm`, which provides the tokenizer, part-of-speech tagger, and lemmatizer used in the following cells.



In [6]:
import spacy

# Load the English language model
nlp = spacy.load('en_core_web_sm')

print("spaCy library imported and 'en_core_web_sm' model loaded.")

spaCy library imported and 'en_core_web_sm' model loaded.


**Reasoning**:
With the spaCy model loaded, the next step is to pass the customer review string `text` (assumed to be defined in an earlier cell) to `nlp` to create a `doc` object. Iterating over `doc` yields tokens, each exposing its part-of-speech tag (`pos_`) and lemma (`lemma_`).



In [7]:
doc = nlp(text)

print("Text processed by spaCy model.")

Text processed by spaCy model.


**Reasoning**:
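With the text processed into a `doc` object, the next step is to iterate through its tokens, find each adjective that is immediately followed by a noun, and store the lemmas of both tokens as a pair in a list. Because the check looks only at the next token, an adjective that is separated from its noun (as in "the camera is fantastic") is not captured.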



In [8]:
adj_noun_pairs = []

for i, token in enumerate(doc):
    # Keep the pair only when an adjective is immediately followed by a noun
    if token.pos_ == 'ADJ' and i + 1 < len(doc) and doc[i + 1].pos_ == 'NOUN':
        adj_noun_pairs.append((token.lemma_, doc[i + 1].lemma_))

print("Extracted (Adjective, Noun) pairs:")
print(adj_noun_pairs)

Extracted (Adjective, Noun) pairs:
[('new', 'phone'), ('fantastic', 'camera')]
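
The adjacency check used above can also be tried in isolation, without loading a model. The following is a minimal sketch, assuming a hand-written list of `(lemma, POS)` tuples; the `tagged` list and the helper name `adjacent_adj_noun_pairs` are hypothetical and stand in for the `token.lemma_` and `token.pos_` values that spaCy would supply.

```python
# Hypothetical pre-tagged tokens as (lemma, coarse POS) tuples. In the notebook
# these values would come from token.lemma_ and token.pos_ on a spaCy Doc.
tagged = [
    ("the", "DET"), ("new", "ADJ"), ("phone", "NOUN"), ("be", "AUX"),
    ("great", "ADJ"), ("and", "CCONJ"), ("the", "DET"),
    ("fantastic", "ADJ"), ("camera", "NOUN"),
]


def adjacent_adj_noun_pairs(tokens):
    """Return (adjective, noun) pairs where the noun directly follows the adjective."""
    return [
        (lemma, next_lemma)
        # zip(tokens, tokens[1:]) walks over each token together with its successor
        for (lemma, pos), (next_lemma, next_pos) in zip(tokens, tokens[1:])
        if pos == "ADJ" and next_pos == "NOUN"
    ]


pairs = adjacent_adj_noun_pairs(tagged)
print(pairs)
```

In this sample, "great" is tagged as an adjective but is followed by a conjunction, so it produces no pair. This is the same limitation that applies to the token loop in the cell above.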


## Task 2: GDPR Redactor

### Subtask:
Create a Python function `redact_sensitive_info(text)` using spaCy that detects PERSON and GPE entities in the provided text and replaces each one with '[REDACTED]'.


In [9]:
def redact_sensitive_info(text):
    """Return `text` with PERSON and GPE entities replaced by '[REDACTED]'."""
    doc = nlp(text)
    redacted_text = text
    # Replace entities from right to left so earlier character offsets stay valid
    for ent in sorted(doc.ents, key=lambda x: x.start_char, reverse=True):
        if ent.label_ in ('PERSON', 'GPE'):
            redacted_text = redacted_text[:ent.start_char] + '[REDACTED]' + redacted_text[ent.end_char:]
    return redacted_text

# Test the function with the provided input text
input_text_task2 = "Startups in Bangalore are growing fast. Mr. Rajesh Kumar met with Sarah Jones in London to discuss the new venture."
redacted_output = redact_sensitive_info(input_text_task2)

print("Original Text:")
print(input_text_task2)
print("\nRedacted Text:")
print(redacted_output)

Original Text:
Startups in Bangalore are growing fast. Mr. Rajesh Kumar met with Sarah Jones in London to discuss the new venture.

Redacted Text:
Startups in [REDACTED] are growing fast. Mr. [REDACTED] met with [REDACTED] in [REDACTED] to discuss the new venture.
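
The function processes entities in reverse order because each replacement changes the length of the string, which would shift the character offsets of any entity to its right. The sketch below isolates that step in plain Python. It assumes the spans are already known: the helper `redact_spans`, the `sample` sentence, and the spans located with `str.index` are all hypothetical stand-ins for the `start_char` and `end_char` values that spaCy entities provide.

```python
def redact_spans(text, spans, placeholder="[REDACTED]"):
    """Replace each (start, end) character span in `text` with `placeholder`."""
    # Work from the rightmost span to the leftmost so that the offsets of the
    # spans still to be processed are unaffected by earlier replacements.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


sample = "Sarah Jones met Rajesh Kumar in London."
# Hypothetical spans, located with str.index instead of a spaCy model.
names = ["Sarah Jones", "Rajesh Kumar", "London"]
spans = [(sample.index(name), sample.index(name) + len(name)) for name in names]

redacted = redact_spans(sample, spans)
print(redacted)  # [REDACTED] met [REDACTED] in [REDACTED].
```

Processing the same spans from left to right with the original offsets would cut the string at the wrong positions as soon as the first replacement changed its length.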
