#### Anonymizing Organization Names

This notebook demonstrates how to anonymize organization names in a text dataset using a pre-trained spaCy named-entity recognition (NER) model.

- Load the dataset.
- Load a pre-trained spaCy model.
- Detect organization names and replace them with '[ANONYMIZED]'.
- Display the anonymized data.

Let's begin!


#### Load Dataset

Read the CSV file into a pandas DataFrame. The text to anonymize is in the 'text' column.


In [None]:

import spacy
import pandas as pd

# Load the dataset
data = pd.read_csv('/drive/datasets/climatebert-climate-detection.csv')
data.head()  # Display the first few rows of the dataset
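
Before running NER it can help to confirm that the text column exists and contains no missing values, since a missing value is not a string and cannot be passed to the model. The following is a minimal sketch under stated assumptions: `load_text_dataset` is a hypothetical helper, and the in-memory CSV (including its `label` column) is illustrative data, not the real dataset.

```python
import io

import pandas as pd


def load_text_dataset(source, text_column="text"):
    """Read a CSV and keep only rows with usable text in `text_column`."""
    df = pd.read_csv(source)
    if text_column not in df.columns:
        raise KeyError(f"expected a '{text_column}' column, found {list(df.columns)}")
    # Drop rows with missing text so every remaining value is a string
    df = df.dropna(subset=[text_column]).reset_index(drop=True)
    df[text_column] = df[text_column].astype(str)
    return df


# Tiny in-memory CSV standing in for the real file (illustrative data only)
sample_csv = io.StringIO(
    "text,label\n"
    "Acme Corp cut emissions.,1\n"
    ",0\n"
    "Rainfall rose.,1\n"
)
sample = load_text_dataset(sample_csv)
print(sample)
```

The same helper could be pointed at the CSV path used above. Resetting the index keeps row labels contiguous after rows are dropped.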


#### Load NLP Model

Load a pre-trained spaCy model for entity recognition.


In [None]:

# Load a pre-trained NLP model
# (install it once with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')  # Efficient small English model


#### Anonymize Organization Names

Run each text through the model and replace every entity labeled 'ORG' with '[ANONYMIZED]'.


In [None]:

# Anonymize organization names directly within the DataFrame
# Iterate with .items() so idx is the actual index label, not a positional counter
for idx, text in data['text'].items():
    doc = nlp(text)
    anonymized_text = text
    for ent in doc.ents:
        if ent.label_ == 'ORG':  # Identify organization entities
            # str.replace masks every occurrence of the entity text in this row
            anonymized_text = anonymized_text.replace(ent.text, '[ANONYMIZED]')
    data.at[idx, 'text'] = anonymized_text  # Update the text with anonymized version

# Display a subset of data to see the anonymized dataset
data.head()
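
Replacing by string match masks every occurrence of the entity text, including occurrences inside longer words. An alternative is to replace only the character spans the model actually reported. The sketch below is one possible approach, not the notebook's method: `replace_spans` is a hypothetical helper, the spans are hard-coded stand-ins for NER output, and they are assumed not to overlap.

```python
def replace_spans(text, spans, placeholder="[ANONYMIZED]"):
    """Replace each (start, end) character span in `text` with `placeholder`."""
    # Work right to left so a replacement never shifts the offsets
    # of the spans that are still waiting to be processed.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


sentence = "Shell said Shellfish stocks fell."

# Offsets an NER model might report for the organization "Shell" (assumed)
org_spans = [(0, 5)]

by_offset = replace_spans(sentence, org_spans)
by_string = sentence.replace("Shell", "[ANONYMIZED]")

print(by_offset)  # [ANONYMIZED] said Shellfish stocks fell.
print(by_string)  # [ANONYMIZED] said [ANONYMIZED]fish stocks fell.
```

With spaCy, the spans could be built as `[(ent.start_char, ent.end_char) for ent in doc.ents if ent.label_ == 'ORG']`. The trade-off is that offset-based replacement leaves any mention the model did not detect untouched, whereas string replacement also masks repeated mentions the model missed.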


#### Conclusion

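We learned how to:
- Load a text dataset with pandas.
- Use spaCy to detect organization names and replace them with a placeholder.

Masking organization names is one step toward protecting privacy in text datasets. NER models miss some entities, so spot-check the output before sharing it.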

Great job!
