# Project 1: What's in a name?

## Deliverables
Use the provided template to submit your case study. The template has three sections:

1. A short summary that describes the results of the project and the tools you used. (Think “elevator pitch”.)
2. Answers to the grand questions. Each answer should include a written description of your results, and may also include charts or tables.
3. An appendix that provides your commented code. Your code comments should justify any decisions you had to make while programming.

In [None]:
# Libraries
import pandas as pd
import altair as alt
import numpy as np
# alt.data_transformers.enable('json')

In [None]:
# Read in data
dat = pd.read_csv("https://raw.githubusercontent.com/byuidatascience/data4names/master/data-raw/names_year/names_year.csv")

# Data info: https://github.com/byuidatascience/data4names/blob/master/data.md

In [None]:
# Clean Data

## Grand Question 1: 
How does your name at your birth year compare to its use historically?

In [None]:
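# Every year on record for the name Brigham (feeds the line chart below)
q1 = dat.query('name == "Brigham"')
# Brigham's count in my birth year, 1998
q1_1 = dat[['name', 'year', 'Total']].query('name == "Brigham" & year == 1998')
# q1_1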


In [None]:
# Altair Charts

brigham_data = pd.DataFrame({
        "year":[1998],
        "name":["Brigham"],
        "label":["Birth Year"],
        "y":[80]
})

yearchart = (alt.Chart(q1)
                .mark_line()
                .encode(
                        x='year:O',
                        y='Total:Q')
                .properties(width = 600, title = "Usage of Brigham Over Time")
)
# yearchart

line = (alt.Chart(brigham_data)
                .mark_rule()
                .encode(x = alt.X("year:O"))
)

# Annotation next to the birth-year rule; lineBreak makes each "\n" start a new line
text = (alt.Chart(brigham_data)
        .mark_text(dx = -70, 
                   dy = -10, 
                   color = "black",
                   lineBreak = "\n")
        .encode(
                x='year:O', 
                text = alt.condition(
                        'datum.year == 1998',
                        alt.value('This is when I was born :D \nThere were 30 at the time, \nI was the 31st!'),
                        alt.value('')
        ))
)

bigchart = yearchart + line + text
bigchart

bigchart.save('gq1chart.png')
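
As a numeric complement to the chart, one way to answer the question is to rank the birth-year count against every other year recorded for the same name. This is a minimal sketch, assuming `dat` keeps the `name`, `year`, and `Total` columns used above; `birth_year_rank` is a hypothetical helper, and the toy counts are made up only to show the call, not taken from the dataset.

```python
import pandas as pd


def birth_year_rank(df: pd.DataFrame, name: str, birth_year: int) -> dict:
    """Compare a name's count in one year with every year on record for it.

    Hypothetical helper; assumes df has 'name', 'year', and 'Total' columns.
    """
    history = df.loc[df["name"] == name, ["year", "Total"]]
    birth_total = history.loc[history["year"] == birth_year, "Total"].sum()
    # Percent of years whose count is at or below the birth-year count
    pct_at_or_below = (history["Total"] <= birth_total).mean() * 100
    # Row with the highest count (fails if the name is not in df)
    peak = history.loc[history["Total"].idxmax()]
    return {
        "birth_total": float(birth_total),
        "percentile": float(pct_at_or_below),
        "peak_year": int(peak["year"]),
        "peak_total": float(peak["Total"]),
    }


# Made-up counts for illustration only; in the notebook, pass `dat` instead.
toy = pd.DataFrame({
    "name": ["Brigham"] * 4,
    "year": [1996, 1997, 1998, 1999],
    "Total": [20.0, 25.0, 30.0, 40.0],
})
print(birth_year_rank(toy, "Brigham", 1998))
```

With the toy frame, 1998 sits at the 75th percentile (3 of 4 years are at or below it) and the peak is 1999. A percentile is used rather than a raw count so names of very different popularity can be compared on the same scale.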


## Grand Question 2:
If you talked to someone named Brittany on the phone, what is your guess of his or her age? What ages would you not guess?

In [None]:
brittany = (
    dat
    .query('name == "Brittany" and year > 1980')
    # Age in 2021 of someone born in each year
    .assign(age = lambda x: 2021 - x.year)
    .filter(["name", "year", "Total", "age"])
)
# brittany

brittany_chart = (
    alt.Chart(brittany)
    .mark_bar()
    .encode(
        x = alt.X('age:O', title = "Age in 2021"),
        y = alt.Y('Total:Q', title = "Number Named Brittany")
    )
    .properties(title = "Brittany Age (as of 2021)")
)
# brittany_chart

brittany_chart.save('gq2chart.png')
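
Because each bar counts births, the most likely ages for a caller named Brittany are where most of the counts pile up. A rough way to turn that into numbers is a count-weighted quantile of age. This sketch assumes every recorded birth is still living and counted once (no mortality or migration), which is a simplification; `age_quantiles` is a hypothetical helper and the toy counts are invented.

```python
import numpy as np
import pandas as pd


def age_quantiles(df, name, ref_year=2021, qs=(0.25, 0.5, 0.75)):
    """Count-weighted age quantiles for a name, measured in ref_year.

    Hypothetical helper. Treats every recorded birth as still living, so it
    ignores mortality and migration and is only a rough guess.
    """
    sub = df.loc[df["name"] == name, ["year", "Total"]].copy()
    sub["age"] = ref_year - sub["year"]
    sub = sub.sort_values("age")
    # Cumulative share of people at or below each age (youngest first)
    cum_share = (sub["Total"].cumsum() / sub["Total"].sum()).to_numpy()
    ages = sub["age"].to_numpy()
    # For each quantile, the first age whose cumulative share reaches it
    return {q: int(ages[np.searchsorted(cum_share, q)]) for q in qs}


# Made-up counts for illustration only; in the notebook, pass `dat` instead.
toy = pd.DataFrame({
    "name": ["Brittany"] * 4,
    "year": [1988, 1990, 1992, 1994],
    "Total": [10, 45, 25, 20],
})
print(age_quantiles(toy, "Brittany"))
```

Ages below the lowest quantile and above the highest would be the ones not to guess; the 25 to 75 percent band is an arbitrary choice and could be widened or narrowed.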

## Grand Question 3: 
Mary, Martha, Peter, and Paul are all Christian names. From 1920 to 2000, compare the name usage of each of the four names.

In [None]:
list3 = ["Mary", "Martha", "Peter", "Paul"]
# Keep only the four names and the 1920-2000 window the question asks about
dat3 = dat.query('name == @list3 and 1920 <= year <= 2000')

chart3 = (
    alt.Chart(dat3)
    .mark_line()
    .encode(
        x = alt.X('year:O', axis = alt.Axis(title = "Year", format = 'd')),
        y = alt.Y('Total:Q', title = "Name Count"),
        color = "name"
    )
    .properties(width = 500, title = "Martha, Mary, Paul, and Peter Names, 1920-2000")
)
chart3
# chart3.save('gq3chart.png')
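
To back the chart with a table, each name can be summarized by its total count, peak count, and peak year inside the same 1920-2000 window. This is a sketch, assuming the `name`, `year`, and `Total` columns used above; `summarize_names` is a hypothetical helper and the toy counts are made up.

```python
import pandas as pd


def summarize_names(df, names, start=1920, end=2000):
    """Total count, peak count, and peak year per name within [start, end].

    Hypothetical helper; assumes 'name', 'year', and 'Total' columns.
    """
    window = df[df["name"].isin(names) & df["year"].between(start, end)]
    summary = window.groupby("name").agg(
        total=("Total", "sum"),
        peak_total=("Total", "max"),
    )
    # Row label of each name's maximum, mapped back to that row's year
    peak_rows = window.groupby("name")["Total"].idxmax()
    summary["peak_year"] = peak_rows.map(window["year"])
    return summary.sort_values("total", ascending=False)


# Made-up counts for illustration only; rows outside 1920-2000 get dropped.
toy = pd.DataFrame({
    "name": ["Mary", "Mary", "Mary", "Mary", "Paul", "Paul"],
    "year": [1919, 1950, 1960, 2001, 1950, 1960],
    "Total": [999, 50, 30, 999, 20, 40],
})
print(summarize_names(toy, ["Mary", "Paul"]))
```

`Series.between` includes both endpoints by default, matching the `1920 <= year <= 2000` filter in the query above.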

## Grand Question 4: 
Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release.

In [None]:
# Movie release years, one row per movie so a single rule layer draws every line
movie_years = pd.DataFrame({"year": [1981, 1984, 1989, 2008]})

harrison = dat.query('name == "Harrison"')
# harrison

chart4 = (
    alt.Chart(harrison)
    .mark_line()
    .encode(
        x = alt.X('year:O', title = "Year"),
        y = alt.Y('Total:Q', title = "Count of Name")
    )
    .properties(title = "Use of 'Harrison' Over Time (Movies Shown With Black Line)", width = 500)
)
# chart4

movie_lines = (
    alt.Chart(movie_years)
    .mark_rule()
    .encode(x = alt.X('year:O'))
)

final_chart = chart4 + movie_lines
# final_chart

# final_chart.save('gq4chart.png')

In [195]:
# Second set of movie release years, one row per movie
movie_years_2 = pd.DataFrame({"year": [1977, 1980, 1983]})

movie_lines_2 = (
    alt.Chart(movie_years_2)
    .mark_rule()
    .encode(x = alt.X('year:O'))
)

# final_chart already holds the Harrison line and the first set of rules,
# so only the new rules are layered on top
final_chart_2 = final_chart + movie_lines_2
# final_chart_2

final_chart_2.save('gq4_2chart.png')
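
The rules make it easy to eyeball whether usage jumps after each release. To put a number on it, one option is to compare the average count in the few years before a release with the few years after. This is a crude sketch, assuming the `name`, `year`, and `Total` columns used above; `release_effect` and its two-year `window` are hypothetical choices, and the toy counts are made up.

```python
import pandas as pd


def release_effect(df, name, release_years, window=2):
    """Mean yearly count in the `window` years before vs. after each release.

    Hypothetical helper. A rise after a release is consistent with, but does
    not prove, a movie effect.
    """
    counts = (
        df.loc[df["name"] == name]
        .set_index("year")["Total"]
        .sort_index()  # label slicing below needs a sorted index
    )
    rows = []
    for yr in release_years:
        before = counts.loc[yr - window: yr - 1].mean()  # .loc slices include both ends
        after = counts.loc[yr + 1: yr + window].mean()
        rows.append({
            "release_year": yr,
            "before": before,
            "after": after,
            "pct_change": (after - before) / before * 100,
        })
    return pd.DataFrame(rows)


# Made-up counts for illustration only; in the notebook, pass `dat` instead.
toy = pd.DataFrame({
    "name": ["Harrison"] * 5,
    "year": [1979, 1980, 1981, 1982, 1983],
    "Total": [100, 110, 120, 150, 170],
})
print(release_effect(toy, "Harrison", [1981]))
```

The release year itself is left out of both averages because a movie released late in the year can only influence part of that year's births. When releases are only a few years apart, as with the years plotted above, the before and after windows overlap, so the per-movie numbers are not independent.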