Dropping duplicate names
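To list each dog name only once, we can call drop_duplicates(subset='name'). However, this also removes dogs that share a name but differ in breed (e.g., a second Max who is a Labrador rather than a Poodle). To keep such dogs, we drop duplicates on a combination of columns by passing a list to subset, here name and breed; weight_kg can be added to the list if name and breed alone do not tell the dogs apart.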

In [6]:
import pandas as pd

# Sample data with duplicated name, breed, and weight_kg
data = {
    'date': pd.date_range(start='2025-01-01', periods=20, freq='D'),
    'name': ['Buddy', 'Max', 'Bella', 'Charlie', 'Lucy'] * 4,
    'breed': ['Labrador', 'Poodle', 'Bulldog', 'Beagle', 'Poodle'] * 4,
    'weight_kg': [30, 10, 25, 20, 10] * 4  # notice 'Poodle' with 10kg appears multiple times
}

df = pd.DataFrame(data)

# Keep the first row for each name, then the first row for each (name, breed) pair
unique_names = df.drop_duplicates(subset="name")
unique = df.drop_duplicates(subset=["name", "breed"])

print(unique_names)
print(unique)


        date     name     breed  weight_kg
0 2025-01-01    Buddy  Labrador         30
1 2025-01-02      Max    Poodle         10
2 2025-01-03    Bella   Bulldog         25
3 2025-01-04  Charlie    Beagle         20
4 2025-01-05     Lucy    Poodle         10
        date     name     breed  weight_kg
0 2025-01-01    Buddy  Labrador         30
1 2025-01-02      Max    Poodle         10
2 2025-01-03    Bella   Bulldog         25
3 2025-01-04  Charlie    Beagle         20
4 2025-01-05     Lucy    Poodle         10
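
In the sample data every name belongs to a single breed, so both calls return the same five rows. As a minimal sketch of the case where they differ, the hypothetical `visits` table below (not part of the data above) contains two different dogs named Max:

```python
import pandas as pd

# Hypothetical data: two different dogs named Max, each recorded twice
visits = pd.DataFrame({
    "name": ["Max", "Max", "Max", "Max", "Bella"],
    "breed": ["Poodle", "Labrador", "Poodle", "Labrador", "Bulldog"],
    "weight_kg": [10, 29, 10, 29, 25],
})

# Deduplicating on name alone keeps only the first Max (the Poodle)
by_name = visits.drop_duplicates(subset="name")

# Deduplicating on name and breed keeps both Max the Poodle and Max the Labrador
by_name_breed = visits.drop_duplicates(subset=["name", "breed"])

print(by_name)
print(by_name_breed)
```

With subset="name" the Labrador is lost; with subset=["name", "breed"] both dogs survive, and only the repeated rows are dropped.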


Easy as 1, 2, 3
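To count how often each value occurs in a column, we select that column and call the value_counts method. The cells below count the name column; counting by breed works the same way with df["breed"]. value_counts sorts by count in descending order by default (sort=True), so the most frequent values appear on top, and normalize=True returns proportions instead of counts.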

In [7]:
import pandas as pd

# Sample data with duplicated name, breed, and weight_kg
data = {
    'date': pd.date_range(start='2025-01-01', periods=20, freq='D'),
    'name': ['Buddy', 'Max', 'Bella', 'Charlie', 'Lucy'] * 4,
    'breed': ['Labrador', 'Poodle', 'Bulldog', 'Beagle', 'Poodle'] * 4,
    'weight_kg': [30, 10, 25, 20, 10] * 4  # notice 'Poodle' with 10kg appears multiple times
}

df = pd.DataFrame(data)
print(df["name"].value_counts())

name
Buddy      4
Max        4
Bella      4
Charlie    4
Lucy       4
Name: count, dtype: int64
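
The same call on the breed column can be sketched as follows, assuming the sample data from the cells above. Note that it counts rows, not distinct dogs, because each dog appears four times in the data:

```python
import pandas as pd

# Same name and breed columns as in the sample data above
df = pd.DataFrame({
    "name": ["Buddy", "Max", "Bella", "Charlie", "Lucy"] * 4,
    "breed": ["Labrador", "Poodle", "Bulldog", "Beagle", "Poodle"] * 4,
})

# sort=True is the default: the most frequent breed comes first
breed_counts = df["breed"].value_counts(sort=True)
print(breed_counts)

# ascending=True reverses the order, so the most frequent breed comes last
breed_counts_ascending = df["breed"].value_counts(ascending=True)
print(breed_counts_ascending)
```

Poodle has the highest count here because two dogs, Max and Lucy, are both Poodles.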


In [8]:
import pandas as pd

# Sample data with duplicated name, breed, and weight_kg
data = {
    'date': pd.date_range(start='2025-01-01', periods=20, freq='D'),
    'name': ['Buddy', 'Max', 'Bella', 'Charlie', 'Lucy'] * 4,
    'breed': ['Labrador', 'Poodle', 'Bulldog', 'Beagle', 'Poodle'] * 4,
    'weight_kg': [30, 10, 25, 20, 10] * 4  # notice 'Poodle' with 10kg appears multiple times
}

df = pd.DataFrame(data)
print(df["name"].value_counts(normalize=True))

name
Buddy      0.2
Max        0.2
Bella      0.2
Charlie    0.2
Lucy       0.2
Name: proportion, dtype: float64
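
To count dogs rather than rows, one option is to drop the duplicate rows first and then count. A minimal sketch, assuming that a (name, breed) pair identifies a single dog in the sample data:

```python
import pandas as pd

# Same sample data as the cells above: five dogs, four rows each
df = pd.DataFrame({
    "date": pd.date_range(start="2025-01-01", periods=20, freq="D"),
    "name": ["Buddy", "Max", "Bella", "Charlie", "Lucy"] * 4,
    "breed": ["Labrador", "Poodle", "Bulldog", "Beagle", "Poodle"] * 4,
    "weight_kg": [30, 10, 25, 20, 10] * 4,
})

# One row per dog (assumption: name and breed together identify a dog)
unique_dogs = df.drop_duplicates(subset=["name", "breed"])

# Number of dogs per breed, and the same as a share of all dogs
breed_counts = unique_dogs["breed"].value_counts()
breed_shares = unique_dogs["breed"].value_counts(normalize=True)

print(breed_counts)
print(breed_shares)
```

After deduplication the table holds five dogs, two of them Poodles, so Poodle accounts for 0.4 of the total and every other breed for 0.2.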
