![baby_names](../../imgs/baby_names.jpg)


How have American baby name tastes changed since 1920? Which names have remained popular for over 100 years, and how do those names compare to more recent top baby names? These are questions many new parents ask, but the skills you'll practice while answering them are broadly applicable. After all, understanding trends and popularity is important for many businesses, too!

You'll be working with data provided by the United States Social Security Administration, which lists first names along with the number and sex of the babies given each name in each year. To keep processing fast, the dataset is limited to first names that were given to more than 5,000 American babies in a given year. The data spans 101 years, from 1920 through 2020.

## The Data

### `baby_names`

| column       | type    | description                                                 |
| ------------ | ------- | ----------------------------------------------------------- |
| `year`       | int     | year in which the name was given                            |
| `first_name` | varchar | first name                                                  |
| `sex`        | varchar | `sex` of babies given `first_name`                          |
| `num`        | int     | number of babies of `sex` given `first_name` in that `year` |


In [27]:
-- Run this code to view the data in baby_names
SELECT *
FROM baby_names
LIMIT 5;

year,first_name,sex,num
1920,Mary,F,70982
1920,Dorothy,F,36643
1920,Helen,F,35097
1920,Margaret,F,27994
1920,Ruth,F,26101
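
The dataset description can be checked with a single aggregate query. The sketch below is only an illustration: it loads the five preview rows shown above into an in-memory SQLite table (an assumption standing in for the real database) and reports the first year, last year, number of distinct years, and smallest count.

```python
import sqlite3

# The five preview rows shown above (year, first_name, sex, num).
rows = [
    (1920, "Mary", "F", 70982),
    (1920, "Dorothy", "F", 36643),
    (1920, "Helen", "F", 35097),
    (1920, "Margaret", "F", 27994),
    (1920, "Ruth", "F", 26101),
]

# Assumption: an in-memory SQLite table stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE baby_names (year INT, first_name TEXT, sex TEXT, num INT)"
)
conn.executemany("INSERT INTO baby_names VALUES (?, ?, ?, ?)", rows)

# First year, last year, number of distinct years, and smallest count.
summary = conn.execute(
    """
    SELECT MIN(year), MAX(year), COUNT(DISTINCT year), MIN(num)
    FROM baby_names
    """
).fetchone()
print(summary)
```

On the full table, the description implies a first year of 1920, a last year of 2020, 101 distinct years, and a smallest count above 5,000.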


In [28]:
-- Use this table for the answer to question 1:
-- List the overall top five names in alphabetical order and find out if each name is "Classic" or "Trendy."

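-- Correct answer, kept for reference: it ranks names by total count, keeps the top five, then sorts them alphabetically.
-- The project's expected answer is the active query further down, which sorts alphabetically before applying LIMIT.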
--with temp as (
--	select		first_name, sum(num) as sum, 
--				case when count(distinct year) > 50 then 'Classic' 
--				else 'Trendy' end as popularity_type
--	from		baby_names
--	group by 	first_name
--	order by    sum(num) desc
--	limit 		5)
--select *
--from temp
--order by first_name;

-- Select first_name, the sum of babies who have ever had that name, and popularity_type
SELECT first_name, SUM(num),
-- Classify first names as 'Classic' or 'Trendy'; COUNT(year) counts rows, not distinct years
    CASE WHEN COUNT(year) > 50 THEN 'Classic'
        ELSE 'Trendy' END AS popularity_type
FROM baby_names
-- Group by first_name to use aggregate functions
GROUP BY first_name
-- Order the results alphabetically by first_name
ORDER BY first_name
-- Limit to the first 5 names in alphabetical order (not the 5 highest totals)
LIMIT 5;

first_name,sum,popularity_type
Aaliyah,15870,Trendy
Aaron,530592,Classic
Abigail,338485,Trendy
Adam,497293,Trendy
Addison,107433,Trendy
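
The commented-out query and the active query above differ in two ways: where `LIMIT` is applied relative to the alphabetical sort, and whether years are counted with `COUNT(year)` or `COUNT(DISTINCT year)`. The sketch below illustrates both differences on made-up toy rows in an in-memory SQLite table. The rows, the `top_names` CTE name, and the limit of 2 are assumptions for illustration, not real SSA figures.

```python
import sqlite3

# Hypothetical toy rows (year, first_name, sex, num); not real SSA counts.
rows = [
    (1920, "Mary", "F", 900), (1921, "Mary", "F", 800),
    (1920, "John", "M", 700), (1921, "John", "M", 600),
    (1920, "Jamie", "F", 5), (1920, "Jamie", "M", 5),
    (1920, "Abe", "M", 10),
    (1921, "Zoe", "F", 500),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE baby_names (year INT, first_name TEXT, sex TEXT, num INT)"
)
conn.executemany("INSERT INTO baby_names VALUES (?, ?, ?, ?)", rows)

# Alphabetical ORDER BY followed by LIMIT keeps the alphabetically first names.
alphabetical_first = conn.execute(
    """
    SELECT first_name, SUM(num) AS total
    FROM baby_names
    GROUP BY first_name
    ORDER BY first_name
    LIMIT 2
    """
).fetchall()

# Ranking by total inside a CTE first, then sorting, keeps the top names.
top_then_sorted = conn.execute(
    """
    WITH top_names AS (
        SELECT first_name, SUM(num) AS total
        FROM baby_names
        GROUP BY first_name
        ORDER BY total DESC
        LIMIT 2
    )
    SELECT first_name, total
    FROM top_names
    ORDER BY first_name
    """
).fetchall()

# COUNT(year) counts rows; COUNT(DISTINCT year) counts separate years.
jamie_counts = conn.execute(
    """
    SELECT COUNT(year), COUNT(DISTINCT year)
    FROM baby_names
    WHERE first_name = 'Jamie'
    """
).fetchone()

print(alphabetical_first)  # [('Abe', 10), ('Jamie', 10)]
print(top_then_sorted)     # [('John', 1300), ('Mary', 1700)]
print(jamie_counts)        # (2, 1)
```

In the toy data, "Jamie" appears for both sexes in the same year, so it contributes two rows but only one distinct year.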


In [29]:
-- Use this table for the answer to question 2:
-- What were the top 20 male names overall, and how did the name Paul rank?
with temp as (
	select		row_number() over (order by sum(num) desc) as name_rank,
				first_name, sum(num) as sum
	from		baby_names
	where		sex = 'M'
	group by	first_name)
-- Sort in the outer query: row order coming out of a CTE is not guaranteed
select		*
from		temp
order by	name_rank
limit 20;

name_rank,first_name,sum
1,James,4748138
2,John,4510721
3,Robert,4495199
4,Michael,4278824
5,William,3614424
6,David,3571498
7,Richard,2414838
8,Joseph,2361382
9,Thomas,2166802
10,Charles,2112352
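
The question also asks how Paul ranked. One way to read that off directly, instead of scanning the ranked list, is to filter the ranked rows by name. The sketch below is a minimal illustration on made-up toy rows in an in-memory SQLite table. The data and the `totals` and `ranked` CTE names are assumptions, and it needs a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical toy rows (year, first_name, sex, num); not real SSA counts.
rows = [
    (1920, "James", "M", 300), (1921, "James", "M", 250),
    (1920, "John", "M", 280), (1921, "John", "M", 200),
    (1920, "Paul", "M", 90), (1921, "Paul", "M", 110),
    (1920, "Mary", "F", 900),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE baby_names (year INT, first_name TEXT, sex TEXT, num INT)"
)
conn.executemany("INSERT INTO baby_names VALUES (?, ?, ?, ?)", rows)

paul = conn.execute(
    """
    WITH totals AS (
        -- Total babies per male name across all years
        SELECT first_name, SUM(num) AS total
        FROM baby_names
        WHERE sex = 'M'
        GROUP BY first_name
    ),
    ranked AS (
        -- Rank names from the highest total to the lowest
        SELECT ROW_NUMBER() OVER (ORDER BY total DESC) AS name_rank,
               first_name,
               total
        FROM totals
    )
    SELECT name_rank, first_name, total
    FROM ranked
    WHERE first_name = 'Paul'
    """
).fetchone()
print(paul)  # (3, 'Paul', 200)
```

The filter has to be applied after the ranking step. Filtering on `first_name` before ranking would leave Paul as the only row and always give rank 1.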


In [30]:
-- Use this table for the answer to question 3:
-- Which female names appeared in both 1920 and 2020?
-- Correct answer, kept for reference: it compares 1920 with 2020. The project's expected answer is the active query further down.
--with girl_names_1920 as (
--	select		first_name, num, year
--	from		baby_names
--	where		sex = 'F' and year = 1920
--), girl_names_2020 as (
--	select		first_name, num, year
--	from		baby_names
--	where		sex = 'F' and year = 2020)
--select		girl_names_1920.first_name as first_name, 
--			(girl_names_1920.num + girl_names_2020.num) as sum
--			
--from		girl_names_1920 inner join girl_names_2020 
--			on girl_names_1920.first_name = girl_names_2020.first_name


-- Select first name and total occurrences
SELECT a.first_name, (a.num + b.num) AS total_occurrences
FROM baby_names a
JOIN baby_names b
-- Join on first name
ON a.first_name = b.first_name
-- Filter for sex 'F' in 1920 (a) and 1930 (b); note that the question above asks for 2020, not 1930
WHERE a.year = 1920 AND a.sex = 'F'
AND b.year = 1930 AND b.sex = 'F';

first_name,total_occurrences
Mary,135131
Dorothy,67052
Helen,55010
Margaret,46347
Ruth,41038
Mildred,27411
Virginia,30513
Elizabeth,26905
Frances,26529
Anna,23660
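
The question asks about 1920 and 2020, so it may help to see the same self-join pattern with those two years. The sketch below runs it on made-up toy rows in an in-memory SQLite table. The rows are assumptions for illustration, not real SSA counts.

```python
import sqlite3

# Hypothetical toy rows (year, first_name, sex, num); not real SSA counts.
rows = [
    (1920, "Mary", "F", 700),
    (1920, "Ruth", "F", 260),
    (1920, "Elizabeth", "F", 150),
    (1920, "James", "M", 500),
    (2020, "Elizabeth", "F", 75),
    (2020, "Olivia", "F", 175),
    (2020, "James", "M", 120),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE baby_names (year INT, first_name TEXT, sex TEXT, num INT)"
)
conn.executemany("INSERT INTO baby_names VALUES (?, ?, ?, ?)", rows)

# Self-join: alias a holds the 1920 rows, alias b holds the 2020 rows.
both_years = conn.execute(
    """
    SELECT a.first_name, a.num + b.num AS total_occurrences
    FROM baby_names AS a
    JOIN baby_names AS b
      ON a.first_name = b.first_name
    WHERE a.sex = 'F' AND a.year = 1920
      AND b.sex = 'F' AND b.year = 2020
    ORDER BY a.first_name
    """
).fetchall()
print(both_years)  # [('Elizabeth', 225)]
```

Only names with a female row in both years survive the inner join, so "James" (male in both years) and "Olivia" (2020 only) drop out of the toy result.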
