In this example, I analyze the popularity of baby names taken from the wonderful series Game of Thrones.

1) Get the data from the Social Security site: https://www.ssa.gov/oact/babynames/limits.html

2) Analyze how the popularity of the names changed relative to the show's run (before, during, and after the show)

3) Compute the most popular names for the most recent year (if that data exists!)

Let's get started!

In [5]:
# Import the required libraries

import os
import zipfile
import numpy as np
import matplotlib.pyplot as pp
import pandas as pd
import seaborn

In [6]:
# Render matplotlib plots inline in the notebook

%matplotlib inline

In [10]:
# Extracting data from the zip archive

zipfile.ZipFile('names.zip').extractall('names')
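
As an alternative to extracting everything to disk, a single year can be read straight from the archive. This is only a minimal sketch: it assumes the archive members keep the `yobYYYY.txt` naming used in this notebook, and `read_year_from_zip` is a hypothetical helper, not part of pandas or zipfile. A tiny in-memory archive stands in for names.zip so the snippet runs on its own.

```python
import io
import zipfile

import pandas as pd


def read_year_from_zip(archive, year):
    """Read one yobYYYY.txt member of the archive into a DataFrame.

    `archive` is a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(f'yob{year}.txt') as member:
            return pd.read_csv(member, names=['name', 'sex', 'number'])


# Small in-memory archive so the sketch runs without names.zip
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('yob2018.txt', 'Emma,F,18688\nOlivia,F,17921\n')
buffer.seek(0)

sample = read_year_from_zip(buffer, 2018)
```

With the real data, the call would presumably be `read_year_from_zip('names.zip', 2018)`.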

In [13]:
# Check the extracted data in the names folder
# As an example, list the first 10 files (os.listdir returns them in arbitrary order)

os.listdir('names')[:10]

['yob2000.txt',
 'yob2014.txt',
 'yob1938.txt',
 'yob1910.txt',
 'yob1904.txt',
 'yob1905.txt',
 'yob1911.txt',
 'yob1939.txt',
 'yob2015.txt',
 'yob2001.txt']
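
Because the listing comes back unsorted, it is hard to see which years are covered. A rough sketch of how the years could be pulled out of the file names and sorted is given here; `available_years` is a hypothetical helper, and `readme.pdf` is a made-up entry that only illustrates skipping files that are not data.

```python
import re


def available_years(filenames):
    """Return the sorted years found in names shaped like yobYYYY.txt."""
    pattern = re.compile(r'^yob(\d{4})\.txt$')
    years = []
    for filename in filenames:
        match = pattern.match(filename)
        if match:  # skip anything else in the folder
            years.append(int(match.group(1)))
    return sorted(years)


# A few file names from the listing, plus one hypothetical non-data file
listing = ['yob2000.txt', 'yob2014.txt', 'yob1938.txt', 'readme.pdf']
found = available_years(listing)
```

In the notebook, `available_years(os.listdir('names'))` would give the full range, which makes it easy to confirm that 2018 is the latest year present.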

In [12]:
# Let's check what the files contain
# For example, read the first 10 lines of yob2018.txt

open('names/yob2018.txt', 'r').readlines()[:10]

['Emma,F,18688\n',
 'Olivia,F,17921\n',
 'Ava,F,14924\n',
 'Isabella,F,14464\n',
 'Sophia,F,13928\n',
 'Charlotte,F,12940\n',
 'Mia,F,12642\n',
 'Amelia,F,12301\n',
 'Harper,F,10582\n',
 'Evelyn,F,10376\n']

In [16]:
# Load the 2018 file into a DataFrame (the file has no header row, so we name the columns)

names2018 = pd.read_csv('names/yob2018.txt', names=['name', 'sex', 'number'])

In [17]:
# Check the first rows
names2018.head()

       name sex  number
0      Emma   F   18688
1    Olivia   F   17921
2       Ava   F   14924
3  Isabella   F   14464
4    Sophia   F   13928
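
The `names=` argument matters here because the files have no header line. A small sketch, using the first three lines of yob2018.txt shown earlier as in-memory text, illustrates what happens without it: pandas takes the first record as the header and the data loses a row.

```python
import io

import pandas as pd

# First three lines of yob2018.txt as shown earlier in this notebook
raw = 'Emma,F,18688\nOlivia,F,17921\nAva,F,14924\n'

# Without names=, the first record is consumed as the header row
no_names = pd.read_csv(io.StringIO(raw))

# With names=, every line is treated as data
with_names = pd.read_csv(io.StringIO(raw), names=['name', 'sex', 'number'])
```

Comparing `no_names.columns` with `with_names.columns` makes the difference visible.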


In [18]:
# Let's load the data for the analysis from the files
# The show ran from April 2011 until May 2019,
# so our range is 2011 to 2019 (unfortunately, the data only goes up to 2018!)

names_list = []

for year in range(2011, 2018+1):
    names_list.append(pd.read_csv(f'names/yob{year}.txt', names=['name', 'sex', 'number']))
    names_list[-1]['year'] = year
    
years = pd.concat(names_list)

In [19]:
# Checking top result

years.head()

       name sex  number  year
0    Sophia   F   21842  2011
1  Isabella   F   19910  2011
2      Emma   F   18803  2011
3    Olivia   F   17322  2011
4       Ava   F   15503  2011


In [20]:
# Checking bottom result

years.tail()

        name sex  number  year
32028  Zylas   M       5  2018
32029  Zyran   M       5  2018
32030  Zyrie   M       5  2018
32031  Zyron   M       5  2018
32032  Zzyzx   M       5  2018
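
Note that the last index label is 32032 even though eight files were combined: `pd.concat` keeps each file's own row labels, so the labels repeat across years. That is harmless for grouping and filtering, but label-based lookups would return several rows. The sketch below uses two tiny stand-in frames (rows copied from the outputs above) to show the effect and the `ignore_index=True` option; whether to renumber is a design choice, not something the original analysis requires.

```python
import pandas as pd

# Two tiny stand-ins for yearly files (rows taken from the outputs above)
frames = []
for year, rows in [(2011, [('Sophia', 'F', 21842), ('Isabella', 'F', 19910)]),
                   (2018, [('Emma', 'F', 18688), ('Olivia', 'F', 17921)])]:
    frame = pd.DataFrame(rows, columns=['name', 'sex', 'number'])
    frames.append(frame.assign(year=year))

kept = pd.concat(frames)                      # labels restart for each frame
fresh = pd.concat(frames, ignore_index=True)  # labels renumbered from 0
```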


That completes the first part of the analysis: getting the data.

Now it is time to analyze the data we collected in the previous steps.