# Victoria - UK name comparison
I was asked: what is a classic Australian name? My answer was that they are mostly the same as British names. But are they really? Are any names ubiquitous in one country but unheard of in the other? This experiment compares the top 100 boys' and girls' names of 2017 in Victoria, Australia with those in England and Wales to find similarities and differences. Data for Scotland and Northern Ireland is published separately, so it is not included here, but it could be added in the future.

In [74]:
import pandas as pd

[Victorian data](https://www.bdm.vic.gov.au/popular-baby-names-in-victoria-2017-data) is from the Registry of Births, Deaths and Marriages Victoria

In [75]:
df = pd.read_excel('Top 100 Baby Names 2017.xlsx', header=2)

Split the DataFrame into separate boys and girls DataFrames

In [76]:
# Select the columns first, then copy only that slice
victoria_boys = df[['Position', 'Name', 'Count']].copy()
victoria_girls = df[['Position.1', 'Name.1', 'Count.1']].copy()

Rename the columns of the girls DataFrame so they match the boys DataFrame

In [77]:
victoria_girls = victoria_girls.rename(columns={"Position.1": "Position", "Name.1": "Name", "Count.1": "Count"})
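
The split and rename above can also be done without typing each `.1` column by hand. This is only a sketch: `raw` is a made-up stand-in for the spreadsheet (the counts are invented), and it assumes the repeated headers carry the `.1` suffix seen in the column names above.

```python
import pandas as pd

# Hypothetical stand-in for the Victorian sheet: two tables side by side.
# Counts are invented for illustration.
raw = pd.DataFrame({
    "Position": [1, 2],
    "Name": ["Oliver", "Jack"],
    "Count": [10, 9],
    "Position.1": [1, 2],
    "Name.1": ["Olivia", "Amelia"],
    "Count.1": [8, 7],
})

base_cols = ["Position", "Name", "Count"]
suffix = ".1"

boys = raw[base_cols].copy()

# Pick the suffixed columns, then drop the suffix so both halves share a schema.
girls = raw[[col + suffix for col in base_cols]].copy()
girls = girls.rename(columns=lambda col: col[: -len(suffix)])

print(girls)
```

With both frames sharing the same column names, every later step can be written once and applied to either.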

The Victorian names need to be converted to uppercase so they can be compared with the UK data

In [78]:
victoria_boys['Name'] = victoria_boys['Name'].str.upper()
victoria_girls['Name'] = victoria_girls['Name'].str.upper()

Data for England and Wales is provided by the Office for National Statistics. There are separate datasets for boys and girls. These statistics are more in depth than those provided in Victoria, but we'll stick with just the rankings.

[Boys](https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalesbabynamesstatisticsboys)

[Girls](https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalesbabynamesstatisticsgirls)

Import files, normalise columns and trim to top 100 to match Victorian data

In [79]:
# Current pandas expects sheet_name; the older sheetname spelling was removed
uk_girls = pd.read_excel('2017girlsnames.xls', sheet_name='Table 6', header=5)
uk_girls = uk_girls[['Rank', 'Name', 'Count3']]
uk_girls = uk_girls.rename(columns={"Rank": "Position", "Count3": "Count"})
# Strip stray whitespace so names match exactly when merging
uk_girls['Name'] = uk_girls['Name'].str.strip()
uk_girls = uk_girls.head(100)

uk_boys = pd.read_excel('2017boysnames.xls', sheet_name='Table 6', header=5)
uk_boys = uk_boys[['Rank', 'Name', 'Count3']]
uk_boys = uk_boys.rename(columns={"Rank": "Position", "Count3": "Count"})
uk_boys['Name'] = uk_boys['Name'].str.strip()
uk_boys = uk_boys.head(100)
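
Since the same clean-up is applied to both files, it could be wrapped in a small helper. The function below is a hypothetical sketch, not part of the original notebook: `tidy_rankings`, its parameters and the sample rows are all made up for illustration.

```python
import pandas as pd


def tidy_rankings(raw, column_map, top_n=100):
    """Keep the mapped columns, rename them, clean the names, keep the top rows."""
    tidy = raw[list(column_map)].rename(columns=column_map)
    # Strip whitespace and uppercase so names compare exactly across datasets.
    tidy["Name"] = tidy["Name"].str.strip().str.upper()
    return tidy.head(top_n).reset_index(drop=True)


# Invented sample rows with the same column names as the ONS sheet.
sample = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Name": [" Olivia ", "Amelia  ", "isla"],
    "Count3": [30, 20, 10],
})

uk_sample = tidy_rankings(
    sample,
    {"Rank": "Position", "Name": "Name", "Count3": "Count"},
    top_n=2,
)
print(uk_sample)
```

The same call would work for the boys file, which keeps the two clean-ups from drifting apart.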

### Boys analysis

Firstly let's merge the two datasets on name. This will give us all the names that are common to both datasets.

In [80]:
merged_boys = pd.merge(uk_boys, victoria_boys, on=['Name'])
merged_boys['Name']

0        OLIVER
1         HARRY
2        GEORGE
3          NOAH
4          JACK
5         JACOB
6           LEO
7         OSCAR
8       CHARLIE
9      MUHAMMAD
10      WILLIAM
11        HENRY
12       THOMAS
13       JOSHUA
14        JAMES
15       ARCHIE
16       ARTHUR
17        LOGAN
18    ALEXANDER
19       EDWARD
20        ISAAC
21        LUCAS
22        ETHAN
23          MAX
24       JOSEPH
25       SAMUEL
26       DANIEL
27     BENJAMIN
28     HARRISON
29    SEBASTIAN
        ...    
37      ZACHARY
38         TOBY
39         HUGO
40         JUDE
41        JAXON
42         LUCA
43         JAKE
44      GABRIEL
45       HARVEY
46      MATTHEW
47      MICHAEL
48       JAYDEN
49      CHARLES
50      JACKSON
51         LUKE
52        CALEB
53       HUNTER
54        LOUIS
55         RYAN
56        BLAKE
57        LEWIS
58       NATHAN
59        JESSE
60         LIAM
61          KAI
62        TYLER
63         FINN
64       AUSTIN
65         LEON
66        FELIX
Name: Name, Length: 67, dtype: object
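
Because both frames have a `Position` column, `pd.merge` appends its default `_x` and `_y` suffixes to tell them apart. A minimal sketch of passing explicit suffixes instead, so it is obvious which ranking is which. The three-row frames are illustrative stand-ins, not the real data.

```python
import pandas as pd

# Illustrative stand-ins for uk_boys and victoria_boys (a few rows only).
uk = pd.DataFrame({"Position": [1, 2, 12], "Name": ["OLIVER", "HARRY", "ALFIE"]})
vic = pd.DataFrame({"Position": [1, 12, 17], "Name": ["OLIVER", "XAVIER", "HARRY"]})

# An inner merge (the default) keeps only names present in both frames.
merged = pd.merge(uk, vic, on="Name", suffixes=("_uk", "_vic"))

# Average of the two ranks; a lower value means more popular overall.
merged["Popularity"] = merged[["Position_uk", "Position_vic"]].mean(axis=1)

print(merged)
```

Only the names present in both frames survive the merge, and each keeps one clearly labelled rank per country.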

There are 67 boys' names common to the top 100 of both datasets. Let's average each name's two positions as a measure of combined popularity (lower is more popular), then show the top 10 names by that measure.

In [81]:
merged_boys['Popularity'] = (merged_boys['Position_x'] + merged_boys['Position_y']) / 2
merged_boys[['Name', 'Popularity']].sort_values(by='Popularity')[:10]

|  | Name | Popularity |
|---|---|---|
| 0 | OLIVER | 1.0 |
| 3 | NOAH | 4.0 |
| 4 | JACK | 4.0 |
| 10 | WILLIAM | 6.5 |
| 12 | THOMAS | 9.0 |
| 8 | CHARLIE | 9.0 |
| 1 | HARRY | 9.5 |
| 6 | LEO | 10.0 |
| 11 | HENRY | 10.5 |
| 14 | JAMES | 11.5 |


How about names that are in the Victorian top 100 but not in the England and Wales top 100?

In [82]:
unique_victoria_boys = victoria_boys[~victoria_boys['Name'].isin(uk_boys['Name'])]
unique_victoria_boys

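|  | Position | Name | Count |
|---|---|---|---|
| 11 | 12 | XAVIER | 263 |
| 21 | 22 | LEVI | 216 |
| 28 | 29 | LACHLAN | 188 |
| 29 | 30 | PATRICK | 186 |
| 30 | 31 | HUDSON | 184 |
| 33 | 34 | ARCHER | 183 |
| 36 | 37 | COOPER | 180 |
| 40 | 41 | JORDAN | 150 |
| 45 | 46 | AIDEN | 143 |
| 47 | 48 | DARCY | 141 |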
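
As an alternative to the `isin` filter, a single outer merge with `indicator=True` labels every name as shared or unique to one side. This is a sketch with small stand-in frames, not the method used in this notebook.

```python
import pandas as pd

# Illustrative stand-ins for uk_boys and victoria_boys (a few rows only).
uk = pd.DataFrame({"Position": [1, 2, 12], "Name": ["OLIVER", "HARRY", "ALFIE"]})
vic = pd.DataFrame({"Position": [1, 12, 17], "Name": ["OLIVER", "XAVIER", "HARRY"]})

# how="outer" keeps every name; the _merge column records where each came from.
comparison = pd.merge(
    uk, vic, on="Name", how="outer", suffixes=("_uk", "_vic"), indicator=True
)

both = comparison.loc[comparison["_merge"] == "both", "Name"].tolist()
only_uk = comparison.loc[comparison["_merge"] == "left_only", "Name"].tolist()
only_vic = comparison.loc[comparison["_merge"] == "right_only", "Name"].tolist()

print(comparison)
```

Here `left_only` means the name appears only in the first frame (UK) and `right_only` only in the second (Victoria), so the common and unique lists all come from one table.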


And vice versa

In [83]:
unique_uk_boys = uk_boys[~uk_boys['Name'].isin(victoria_boys['Name'])]
unique_uk_boys

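|  | Position | Name | Count |
|---|---|---|---|
| 11 | 12 | ALFIE | 3287.0 |
| 15 | 16 | FREDDIE | 3127.0 |
| 20 | 21 | THEO | 2616.0 |
| 29 | 30 | MOHAMMED | 1982.0 |
| 30 | 31 | FINLEY | 1944.0 |
| 38 | 39 | TEDDY | 1626.0 |
| 43 | 44 | DAVID | 1416.0 |
| 45 | 46 | LOUIE | 1365.0 |
| 48 | 49 | REUBEN | 1309.0 |
| 50 | 51 | REGGIE | 1280.0 |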


Because the search is confined to the top 100 names, these results will misrepresent the popularity of some names. For example, Nicholas is ranked 76 in the Victorian dataset and does not appear in the England and Wales top 100 at all. However, the raw data shows Nicholas ranked 147 in England and Wales, so it is still fairly common there.
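
One way to check for this effect would be to look up each unique name in the untrimmed ranking, kept from before the `head(100)` step. A rough sketch: `uk_full` and `vic_only` are hypothetical stand-ins, the Nicholas ranks are the ones quoted above, and the `HYPOTHETICAL` row is invented to show what a missing name looks like.

```python
import pandas as pd

# Stand-in for the full England and Wales ranking, before trimming to 100.
uk_full = pd.DataFrame({"Position": [1, 147], "Name": ["OLIVER", "NICHOLAS"]})

# Stand-in for names in the Victorian top 100 but not the UK top 100.
# HYPOTHETICAL is an invented row, not a real result.
vic_only = pd.DataFrame({"Position": [76, 99], "Name": ["NICHOLAS", "HYPOTHETICAL"]})

# A left merge keeps every Victorian name and adds its UK rank where one exists.
lookup = vic_only.merge(uk_full, on="Name", how="left", suffixes=("_vic", "_uk"))

# Positive gap: the name ranks lower in England and Wales than in Victoria.
lookup["Gap"] = lookup["Position_uk"] - lookup["Position_vic"]

print(lookup)
```

Names missing from the full ranking show up as `NaN` and are the strongest candidates for being genuinely regional.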

Most of the unique names also appear fairly low in the rankings. Only a handful of names ranked higher than 50th are unique:

In [84]:
unique_victoria_boys[unique_victoria_boys['Position'] < 50]

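|  | Position | Name | Count |
|---|---|---|---|
| 11 | 12 | XAVIER | 263 |
| 21 | 22 | LEVI | 216 |
| 28 | 29 | LACHLAN | 188 |
| 29 | 30 | PATRICK | 186 |
| 30 | 31 | HUDSON | 184 |
| 33 | 34 | ARCHER | 183 |
| 36 | 37 | COOPER | 180 |
| 40 | 41 | JORDAN | 150 |
| 45 | 46 | AIDEN | 143 |
| 47 | 48 | DARCY | 141 |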


In [85]:
unique_uk_boys[unique_uk_boys['Position'] < 50]

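|  | Position | Name | Count |
|---|---|---|---|
| 11 | 12 | ALFIE | 3287.0 |
| 15 | 16 | FREDDIE | 3127.0 |
| 20 | 21 | THEO | 2616.0 |
| 29 | 30 | MOHAMMED | 1982.0 |
| 30 | 31 | FINLEY | 1944.0 |
| 38 | 39 | TEDDY | 1626.0 |
| 43 | 44 | DAVID | 1416.0 |
| 45 | 46 | LOUIE | 1365.0 |
| 48 | 49 | REUBEN | 1309.0 |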


### Girls analysis

In [86]:
merged_girls = pd.merge(uk_girls, victoria_girls, on=['Name'])
merged_girls['Name']

0        OLIVIA
1        AMELIA
2          ISLA
3           AVA
4         EMILY
5      ISABELLA
6           MIA
7         POPPY
8          ELLA
9          LILY
10       SOPHIA
11    CHARLOTTE
12        GRACE
13         EVIE
14      JESSICA
15       SOPHIE
16        ALICE
17        DAISY
18     FLORENCE
19        FREYA
20       PHOEBE
21       EVELYN
22       SIENNA
23     ISABELLE
24          IVY
25       WILLOW
26      MATILDA
27        ELSIE
28         RUBY
29     SCARLETT
        ...    
37         MAYA
38        ELIZA
39         LOLA
40    ELIZABETH
41         ARIA
42         LUNA
43         LUCY
44        ELLIE
45      HARRIET
46         EMMA
47      ELEANOR
48     PENELOPE
49        HOLLY
50       HANNAH
51        BELLA
52         ROSE
53       VIOLET
54      GEORGIA
55        LILLY
56      JASMINE
57    ANNABELLE
58         ZARA
59      ABIGAIL
60         MILA
61         ANNA
62       AURORA
63        HEIDI
64       SUMMER
65     VICTORIA
66       BONNIE
Name: Name, Length: 67, dtype: object

Coincidentally, the number of common names is the same as for the boys: 67. Let's look at combined popularity again.

In [87]:
merged_girls['Popularity'] = (merged_girls['Position_x'] + merged_girls['Position_y']) / 2
merged_girls.sort_values(by='Popularity')[:10][['Name', 'Popularity']]

|  | Name | Popularity |
|---|---|---|
| 0 | OLIVIA | 1.5 |
| 1 | AMELIA | 2.5 |
| 3 | AVA | 4.0 |
| 2 | ISLA | 6.0 |
| 6 | MIA | 6.0 |
| 11 | CHARLOTTE | 6.5 |
| 8 | ELLA | 9.5 |
| 5 | ISABELLA | 9.5 |
| 4 | EMILY | 10.0 |
| 12 | GRACE | 10.5 |


Common Victorian names that don't feature in the UK dataset

In [88]:
victoria_girls[~victoria_girls['Name'].isin(uk_girls['Name'])]

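|  | Position | Name | Count |
|---|---|---|---|
| 5 | 6 | ZOE | 299 |
| 27 | 28 | AUDREY | 181 |
| 34 | 35 | HAZEL | 149 |
| 36 | 37 | BILLIE | 135 |
| 43 | 44 | FRANKIE | 105 |
| 44 | 45 | STELLA | 100 |
| 46 | 47 | PIPER | 97 |
| 51 | 52 | MACKENZIE | 88 |
| 53 | 54 | SARAH | 87 |
| 55 | 56 | MADDISON | 85 |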


Common UK names not in the Victorian dataset

In [89]:
uk_girls[~uk_girls['Name'].isin(victoria_girls['Name'])]

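|  | Position | Name | Count |
|---|---|---|---|
| 34 | 35 | ROSIE | 1388.0 |
| 36 | 37 | MILLIE | 1372.0 |
| 41 | 42 | ESME | 1235.0 |
| 44 | 45 | ERIN | 1161.0 |
| 45 | 46 | MAISIE | 1142.0 |
| 52 | 53 | THEA | 1041.0 |
| 57 | 58 | MOLLY | 978.0 |
| 60 | 61 | AMBER | 915.0 |
| 65 | 66 | DARCIE | 868.0 |
| 66 | 67 | NANCY | 861.0 |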


Probably the most significant finding here is Zoe, the highest-ranked unique name across both boys and girls. It does appear at 105 in the full UK dataset, but it is still clearly far more popular in Victoria than in England and Wales.