<h2>US Baby Names 1880-2016</h2>

<p>The United States Social Security Administration (SSA) has made available data on the
frequency of baby names from 1880 through 2016. The data files can be obtained here: <a href="https://www.ssa.gov/oact/babynames/limits.html">https://www.ssa.gov/oact/babynames/limits.html</a></p>
<h4>What can we do?</h4>
<p>There are many things we might want to do with the data set:</p>
<ol>
  <li>Verify the total births by gender in each year.</li>
  <li>Visualize the proportion of babies given a particular name (your own, or another name) over time.</li>
  <li>Determine the relative rank of a name.</li>
  <li>Determine the most popular names in each year or the names with largest increases or decreases.</li>
  <li>Analyze external sources of trends: biblical names, celebrities, demographic changes.</li>
  <li>Many other things.</li>
</ol>

<h4>Let's look at the data</h4>

In [None]:
# I used the UNIX head command to look at the first 10 lines of one of the files.
!head -n 10 '../data/names/yob1880.txt'
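
<p>The <code>!head</code> shell escape only works where a UNIX <code>head</code> command is available. A minimal pure-Python sketch of the same peek follows; the helper name <code>peek</code> is hypothetical and not part of the original notebook.</p>

```python
from itertools import islice


def peek(path, n=10):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, encoding='utf-8') as handle:
        # islice stops after n lines, so the whole file is never read.
        return [line.rstrip('\n') for line in islice(handle, n)]


# Example, using the path from this notebook:
# for line in peek('../data/names/yob1880.txt'):
#     print(line)
```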

In [None]:
# Use pandas to read the data.
import pandas as pd

# Column names for the file, which has no header row.
ubabys = ['name', 'sex', 'births']

# Read the 1880 file into a DataFrame.
names1880 = pd.read_csv('../data/names/yob1880.txt', names=ubabys)

In [None]:
# Show the first ten rows to check that the file was read correctly.
names1880.head(10)

In [None]:
# Now group the 1880 data by sex and total the births.
names1880.groupby('sex').births.sum()

In [None]:
# Next, combine all of the yearly files into a single DataFrame and add a year field, using pandas.

# Range of years to load. 2016 is the last available year right now;
# range() excludes its end value, so the upper bound must be 2017.
years = range(1880, 2017)

In [None]:
files = []  # list that collects one DataFrame per year
columns = ['name', 'sex', 'births']  # column names for each file

In [None]:
# Read one file for each year (1880 through 2016).
for year in years:
    path = '../data/names/yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year  # add the year column
    files.append(frame)  # collect the DataFrame

In [None]:
# Show the first two DataFrames to check that they were read correctly.
files[:2]

In [None]:
# Concatenate everything into a single DataFrame
names = pd.concat(files, ignore_index=True)

In [None]:
# Show the first ten rows to check the combined result.
names.head(10)

## Manipulating the data and answering the questions

### 1. Verify the total births by gender in each year.
<p>We can now start aggregating the data at the year and sex level, using either <code>groupby</code> or a pivot table.</p>
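<p>To see how the two approaches relate, here is a small sketch on toy data. The numbers are made up for illustration; only the column names match the real data set.</p>

```python
import pandas as pd

# Toy data (made-up numbers) with the same columns as the names DataFrame.
toy = pd.DataFrame({
    'name':   ['Ann', 'Bob', 'Ann', 'Bob', 'Cy'],
    'sex':    ['F', 'M', 'F', 'M', 'M'],
    'births': [10, 20, 30, 40, 5],
    'year':   [2000, 2000, 2001, 2001, 2001],
})

# groupby returns a Series indexed by (year, sex);
# unstack then moves the sex level into the columns.
by_groupby = toy.groupby(['year', 'sex']).births.sum().unstack('sex')

# pivot_table builds the year-by-sex table in one step.
by_pivot = toy.pivot_table(values='births', index='year',
                           columns='sex', aggfunc='sum')

print(by_groupby)
print(by_pivot)
```

<p>Both tables hold the same totals, with one row per year and one column per sex.</p>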

In [None]:
# Here we use groupby to aggregate the data.
total_births = names.groupby(['year', 'sex']).births.sum()
total_births.tail()

In [None]:
# Plot the data for a better view.
# unstack moves sex into the columns so that each sex gets its own line.
total_births.unstack('sex').plot(title='Total births by sex and year')

In [None]:
# Here we use a pivot table to aggregate the data.
total_births = names.pivot_table('births', index='year', columns='sex', aggfunc='sum')
total_births.tail()

In [None]:
# Plot the data for a better view.
total_births.plot(title='Total births by sex and year')

### 2. Visualize the proportion of babies given a particular name (your own, or another name) over time
<p>Now we group the data by year and sex and add a new column, <code>prop</code>, to each group: the fraction of babies given each name relative to the total number of births in that group. We write a helper function for this.</p>
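<p>As an aside, the same fraction can also be computed without a helper function. The sketch below uses <code>transform('sum')</code> on toy data (made-up numbers); it is an alternative technique, not the approach used in the cells that follow.</p>

```python
import pandas as pd

# Toy data (made-up numbers) with the same columns as the names DataFrame.
toy = pd.DataFrame({
    'name':   ['Ann', 'Bea', 'Bob', 'Ann', 'Bob'],
    'sex':    ['F', 'F', 'M', 'F', 'M'],
    'births': [30, 10, 20, 50, 50],
    'year':   [2000, 2000, 2000, 2001, 2001],
})

# Total births of each row's (year, sex) group, aligned with the original rows.
group_totals = toy.groupby(['year', 'sex']).births.transform('sum')

# Fraction of the group's births that went to each name.
toy['prop'] = toy.births / group_totals

print(toy)
```

<p>Because <code>transform</code> returns a result aligned with the original rows, the row index is left unchanged.</p>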

In [None]:
# Add the fraction of babies given each name relative to the total number of births in the group.
def add_prop(group):
    # Cast to float so the division is never integer (floor) division.
    births = group.births.astype(float)
    group['prop'] = births / births.sum()  # add the prop column to the group
    return group

In [None]:
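# Add the prop column to every (year, sex) group.
# group_keys=False keeps the original row index, so 'year' and 'sex' stay ordinary columns.
names = names.groupby(['year', 'sex'], group_keys=False).apply(add_prop)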

In [None]:
# Show the first five rows. The rows are still ordered by year and sex.
names.head()

In [None]:
# Sanity check: the prop values within each group should sum to 1.
# np.allclose returns True if add_prop worked correctly and False otherwise.
import numpy as np
np.allclose(names.groupby(['year', 'sex']).prop.sum(), 1)

In [None]:
# Now that this is done, I’m going to extract a subset of the data to facilitate further analysis:
# the top 1,000 names for each year/sex combination.
def get_top1000(group):
    return group.sort_values(by='births', ascending=False)[:1000]

In [None]:
grouped = names.groupby(['year', 'sex'])
top1000 = grouped.apply(get_top1000)

In [None]:
top1000[-1:]
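
<p>With the <code>prop</code> column in place, the proportion of babies given one name can be followed over time. The sketch below is one possible way to do this; the helper name <code>name_prop_by_year</code> is hypothetical and the toy numbers are made up.</p>

```python
import pandas as pd


def name_prop_by_year(names, name, sex):
    """Return a Series of prop values indexed by year for one name and sex."""
    subset = names[(names['name'] == name) & (names['sex'] == sex)]
    return subset.set_index('year')['prop'].sort_index()


# Toy stand-in for the names DataFrame (made-up numbers).
toy = pd.DataFrame({
    'name':   ['Ann', 'Bea', 'Ann', 'Bea'],
    'sex':    ['F', 'F', 'F', 'F'],
    'births': [30, 10, 10, 30],
    'year':   [2000, 2000, 2001, 2001],
    'prop':   [0.75, 0.25, 0.25, 0.75],
})

ann = name_prop_by_year(toy, 'Ann', 'F')
print(ann)
```

<p>On the full data, calling <code>.plot()</code> on the returned Series would draw the curve for that name, assuming matplotlib is installed.</p>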

### Analyzing Naming Trends