# Species names
Some species names, particularly for subspecies, pairs, and hybrids, are long.
This notebook looks at ways to shorten them.

## History

2025-05-04 Initial version

In [1]:
from django.db.models import Count
from django.db.models.functions import Length
from django.template.loader import render_to_string

from IPython.display import display, HTML

from data.models import Species

In [2]:
# The fifty longest common names, with the number of observations for each
longest = (
    Species.objects
    .annotate(length=Length("common_name"), observed=Count("observations"))
    .order_by("-length")[:50]
)

table = render_to_string("notebooks/species-names.html", {"species": longest})
display(HTML(table))

| Name | No. of characters | No. of observations |
|---|---|---|
| African Collared-Dove (Domestic type or Ringed Turtle-Dove) | 59 | 17 |
| Western Yellow Wagtail (iberiae/cinereocapilla/pygmaea) | 55 | 11 |
| Willow Warbler/Common Chiffchaff/Iberian Chiffchaff | 51 | 8 |
| Lesser Black-backed Gull (intermedius/graellsii) | 48 | 9 |
| European/African/Eastern Red-rumped Swallow | 43 | 3 |
| Caspian/European Herring/Yellow-legged Gull | 43 | 1 |
| Accipitrine hawk sp. (former Accipiter sp.) | 43 | 13 |
| Great Black-backed x Glaucous Gull (hybrid) | 43 | 1 |
| Great Spotted Woodpecker (Great Spotted) | 40 | 16 |
| Lesser Black-backed Gull (intermedius) | 38 | 3 |


## Results

There are only 8 names longer than 40 characters, and 40 names longer than 30 characters.
In many cases the longer names are for subspecies, so there is less scope for
reducing the length than there was for observer or location names.
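
The counts above are simple threshold tallies. As a minimal sketch, the helper below (the name `count_longer_than` is hypothetical, not part of the project) does the same tally on a plain list of names taken from the table. In the notebook itself, the equivalent would presumably be a filter such as `length__gt=40` on the annotated queryset, but that is an assumption about how the figures were obtained.

```python
def count_longer_than(names, threshold):
    """Count the names with strictly more than `threshold` characters."""
    return sum(1 for name in names if len(name) > threshold)


# A small sample copied from the table of longest names.
sample = [
    "African Collared-Dove (Domestic type or Ringed Turtle-Dove)",
    "Great Black-backed x Glaucous Gull (hybrid)",
    "Great Spotted Woodpecker (Great Spotted)",
    "Lesser Black-backed Gull (intermedius)",
]

print(count_longer_than(sample, 40))
print(count_longer_than(sample, 30))
```

Using a strict comparison matters here: 'Great Spotted Woodpecker (Great Spotted)' is exactly 40 characters, so it is not counted as longer than 40.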

Some systematic changes could be made, however:

1. Hybrid species are identified by the canonical 'x', e.g. 'Great Black-backed x
   Glaucous Gull (hybrid)', so '(hybrid)' could be removed since it is redundant.

2. '(Domestic type)' could be worth removing where the species is clearly domesticated
   or part of a collection, e.g. Indian Peafowl. However, the situation is less clear with
   Mallards, where it is worthwhile distinguishing between wild and domestic birds, though
   in urban situations that distinction is blurred by tame wild birds.

3. The prefixes 'Eurasian', 'European', 'Common' and 'Western' could in most cases
   be removed since there is normally only one species in the region, e.g. 'Eurasian Skylark'
   or 'European Turtle-Dove'. Even when there is a similar species, e.g. 'Common Chiffchaff'
   and 'Iberian Chiffchaff', or 'Eurasian Kestrel' and 'Lesser Kestrel', the prefix can
   be removed from the former without adding too much ambiguity. Clearly there are some
   exceptions, e.g. 'Common Tern' or 'European Storm-petrel'.
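
The three rules above can be sketched as a single function. This is only an illustration under stated assumptions: the function `shorten`, the constant names, and the exception lists are all hypothetical, and the exceptions contain only the cases mentioned in the list, so a real rule set would need many more.

```python
# Hypothetical rule set, built only from the examples given in the list above.
HYBRID_SUFFIX = " (hybrid)"
DOMESTIC_SUFFIX = " (Domestic type)"
REGIONAL_PREFIXES = ("Eurasian ", "European ", "Common ", "Western ")

# Assumed exceptions: names where the qualifier carries real information.
KEEP_DOMESTIC = {"Mallard (Domestic type)"}
KEEP_PREFIX = ("Common Tern", "European Storm-petrel")


def shorten(name: str) -> str:
    """Apply the three shortening rules to a species common name."""
    # Rule 1: the ' x ' already marks a hybrid, so '(hybrid)' is redundant.
    if " x " in name and name.endswith(HYBRID_SUFFIX):
        name = name[: -len(HYBRID_SUFFIX)]

    # Rule 2: drop '(Domestic type)' unless the distinction matters.
    if name.endswith(DOMESTIC_SUFFIX) and name not in KEEP_DOMESTIC:
        name = name[: -len(DOMESTIC_SUFFIX)]

    # Rule 3: drop a regional prefix, except for the listed exceptions.
    # Checking with startswith also protects subspecies forms of an exception.
    if not name.startswith(KEEP_PREFIX):
        for prefix in REGIONAL_PREFIXES:
            if name.startswith(prefix):
                name = name[len(prefix):]
                break

    return name


print(shorten("Great Black-backed x Glaucous Gull (hybrid)"))
print(shorten("Eurasian Skylark"))
print(shorten("Common Tern"))
```

Each prefix is matched with its trailing space, so slash-separated names such as 'European/African/Eastern Red-rumped Swallow' are left untouched. Every exception has to be listed by hand, which is the maintenance cost that an automated approach carries.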

Some long names are from incorrect identifications:

* Long-tailed Tit (europaeus Group) is from northern Europe and unlikely to occur here.
* Similarly, Long-tailed Tit (alpinus Group) is from the south edge of the Caspian Sea.

Some long names are from unnecessarily imprecise identifications:

* Common Tern (hirundo/tibetana). The tibetana subspecies of Common Tern is from
  the Himalayas / Mongolia.

* Western Yellow Wagtail (iberiae/cinereocapilla/pygmaea). The cinereocapilla subspecies
  is from Italy and the Adriatic, while pygmaea is from the Nile river and is resident.

* Caspian/European Herring/Yellow-legged Gull - there are only a handful of Caspian Gull
  reports each year. Sure, it's a possibility, but then juveniles of every gull species are
  also a possibility.

These could be manually edited to change the species, though this would need the list
of species recorded to be reviewed regularly. This introduces an editorial step that
would be better avoided: misidentifications are really a problem for eBird's
moderators.

## Conclusion

There is some scope for editing the species common name when adding a record to the database,
but since a species is only added once, and given the number of exceptions that would need to
be taken into account, it is probably simpler, easier and more reliable to edit the names in
the Django Admin.