# Remove Duplicates using ensureIndex() in MongoDB

This notebook demonstrates how to remove duplicate agent records from a MongoDB collection by creating a unique compound index with the legacy `ensureIndex()` method (`ensure_index()` in PyMongo) and its `dropDups` option. The index enforces uniqueness on a combination of fields. While it is being built, MongoDB keeps the first document it indexes for each key combination and deletes every later document with the same key.

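**Important:** `dropDups` was deprecated in MongoDB 2.6 and removed in MongoDB 3.0. On newer servers duplicates are no longer dropped, so building a unique index on a collection that still contains duplicates fails with a duplicate-key error. PyMongo likewise deprecated `ensure_index()` in version 3.0 and removed it in 4.0 in favor of `create_index()`.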
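
On servers where `dropDups` is unavailable, one common alternative is to delete the extra documents explicitly and only then build the unique index with `create_index()`. The sketch below is a minimal illustration and is not part of the original notebook: the helper names (`duplicate_groups_pipeline`, `ids_to_delete`, `deduplicate`) are hypothetical, it assumes the same three key fields used in the index cell of this notebook, and it keeps an arbitrary document from each duplicate group.

```python
# Hypothetical helpers; field names are taken from the index cell of this notebook.
KEY_FIELDS = ["online_store_id", "online_store_name", "phone"]


def duplicate_groups_pipeline(key_fields):
    """Build an aggregation pipeline that groups documents by the key fields
    and keeps only the groups that contain more than one document."""
    return [
        {"$group": {
            "_id": {field: "$" + field for field in key_fields},
            "ids": {"$push": "$_id"},
            "count": {"$sum": 1},
        }},
        {"$match": {"count": {"$gt": 1}}},
    ]


def ids_to_delete(groups):
    """Keep the first _id of every duplicate group and return all the others.
    Which document comes first is not guaranteed without an explicit sort."""
    extra = []
    for group in groups:
        extra.extend(group["ids"][1:])
    return extra


def deduplicate(collection, key_fields=KEY_FIELDS):
    """Delete surplus duplicates, then enforce uniqueness with a unique index.
    `collection` is expected to be a pymongo Collection."""
    groups = collection.aggregate(
        duplicate_groups_pipeline(key_fields), allowDiskUse=True
    )
    extra = ids_to_delete(groups)
    if extra:
        collection.delete_many({"_id": {"$in": extra}})
    # 1 is the value of pymongo.ASCENDING
    return collection.create_index([(field, 1) for field in key_fields], unique=True)
```

Calling `deduplicate(collection)` would return the name of the created index. Because the deletion is irreversible, it is worth running the aggregation alone first and inspecting the groups it reports.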

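**This approach only works when every indexed field is populated in every document and the combination of values actually distinguishes one agent from another.** A missing field is indexed as `null`, so documents that lack the same field and match on the remaining ones would be treated as duplicates of each other.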
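
The completeness requirement can be checked before touching the index. The following is a rough sketch under stated assumptions: the helper names `incomplete_filter` and `count_incomplete` are hypothetical, and the field list mirrors the one used in the index cell of this notebook. It relies on the fact that a MongoDB query of the form `{field: null}` matches both explicit nulls and absent fields; empty strings are not caught.

```python
# Hypothetical helpers; field names are taken from the index cell of this notebook.
KEY_FIELDS = ["online_store_id", "online_store_name", "phone"]


def incomplete_filter(key_fields=KEY_FIELDS):
    """Build a query matching documents where at least one key field
    is null or missing ({field: None} matches both cases in MongoDB)."""
    return {"$or": [{field: None} for field in key_fields]}


def count_incomplete(collection, key_fields=KEY_FIELDS):
    """Count documents that would be indexed with a null key component.
    `collection` is expected to be a pymongo Collection."""
    return collection.count_documents(incomplete_filter(key_fields))
```

If `count_incomplete(collection)` is greater than zero, those documents should be reviewed before any duplicates are dropped on these fields.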

In [None]:
import pymongo

# MongoDB connection string provided for the challenge
conn_str = "mongodb://hiring:12341234@srv07.properbird.com:27017"
client = pymongo.MongoClient(conn_str)

# Database and collection used for the challenge
db = client['realEstateFr']
collection = db['agentSampleHiring']

# Check record count before deduplication
count_before = collection.count_documents({})
print('Record count before deduplication:', count_before)

In [None]:
# Create a unique compound index on the key fields to enforce uniqueness.
# ensure_index() with dropDups=True drops duplicate documents while the index is built.
# Note: this requires PyMongo < 4.0 (ensure_index() was removed in 4.0) and a
# MongoDB server older than 3.0, where the dropDups option was removed.

index_name = collection.ensure_index([
    ("online_store_id", pymongo.ASCENDING),
    ("online_store_name", pymongo.ASCENDING),
    ("phone", pymongo.ASCENDING)
], unique=True, dropDups=True)

print('Created unique index:', index_name)

In [None]:
# Check record count after deduplication (duplicates dropped during index creation)
count_after = collection.count_documents({})
print('Record count after deduplication:', count_after)
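
Comparing the two counts shows how many documents were removed, but not whether any duplicates remain. A small client-side check along the following lines could confirm that. It is only a sketch: the helper name `remaining_duplicates` is hypothetical, it assumes the key fields hold scalar (hashable) values, and it reads every document, so it suits modest collection sizes.

```python
from collections import Counter

# Hypothetical helper; field names are taken from the index cell of this notebook.
KEY_FIELDS = ("online_store_id", "online_store_name", "phone")


def remaining_duplicates(docs, key_fields=KEY_FIELDS):
    """Return {key_tuple: count} for every key combination seen more than once.
    `docs` can be any iterable of dict-like documents, such as a find() cursor."""
    counts = Counter(tuple(doc.get(field) for field in key_fields) for doc in docs)
    return {key: n for key, n in counts.items() if n > 1}
```

For example, `remaining_duplicates(collection.find({}, {f: 1 for f in KEY_FIELDS}))` should return an empty dictionary once deduplication has succeeded.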