looked at name growth
aviezerl committed Aug 15, 2023
1 parent c5a96c5 commit cfa3be0
Showing 3 changed files with 40 additions and 1 deletion.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ docs/
data-raw/*.xlsx
docs
inst/doc
work/
4 changes: 4 additions & 0 deletions streamlit/streamlit_app.py
@@ -131,6 +131,10 @@ def get_line_chart(data):
f"There were {total_male} male babies and {total_female} female babies named {name} from 1948 to 2021."
)

st.write(
f"Years that include less than 5 babies are shown as 0. Data was downloaded from the [Israeli Central Bureau of Statistics](https://www.cbs.gov.il/he/publications/LochutTlushim/2020/%D7%A9%D7%9E%D7%95%D7%AA-%D7%A4%D7%A8%D7%98%D7%99%D7%99%D7%9D.xlsx)."
)

st.write(
f"Additional analysis can be found [here](https://aviezerl.github.io/babynamesIL/articles/babynamesIL.html)"
)
36 changes: 35 additions & 1 deletion vignettes/articles/babynamesIL.Rmd
@@ -216,7 +216,7 @@ babynamesIL %>%
Or we can use the matrices we created before to find patterns in the ratio between male and female over time:

```{r}
- cluster_unisex_names <- function(sector, colors = colorRampPalette(c("blue", "white", "red"))(1000), epsilon = 1e-6) {
+ cluster_unisex_names <- function(sector, colors = colorRampPalette(c("blue", "white", "red"))(1000), epsilon = 1e-3) {
mat_M <- babynamesIL %>%
filter(sector == !!sector, sex == "M") %>%
tidyr::complete(sector, year, sex, name, fill = list(n = 0, prop = 0)) %>%
@@ -276,3 +276,37 @@ cluster_unisex_names("Muslim")
cluster_unisex_names("Christian")
cluster_unisex_names("Druze")
```

## Names that are growing in a short period of time

We can also look for names whose popularity grew sharply within a short period of time, e.g. names whose number of babies at least doubled from one year to the next.

```{r}
growth_names <- babynamesIL %>%
arrange(sector, sex, name, year) %>%
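group_by(sector, name, sex) %>%
mutate(next_n = lead(n), growth = next_n / n) %>% # compare each row with the next row of the same name
ungroup() %>%
filter(next_n >= 100) %>% # take only names with at least 100 babies after the jump
filter(growth >= 2) %>% # keep names that at least doubled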
arrange(desc(growth))
head(growth_names)
nrow(growth_names)
```
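
The same year-over-year growth idea can be sketched in plain Python, the language of the Streamlit app in this commit. This is only an illustration under stated assumptions: the function name `yearly_growth`, the tuple layout, and the sample counts are hypothetical and not part of the package. Unlike the R pipeline, the sketch explicitly requires the following calendar year to be present and skips base counts of zero.

```python
from collections import defaultdict


def yearly_growth(records, min_next_n=100, min_growth=2.0):
    """Find names whose count grew sharply from one year to the next.

    records: iterable of (sector, sex, name, year, n) tuples.
    This layout is an assumption made for the sketch.
    """
    # Group counts per (sector, sex, name) so growth never mixes different names.
    counts_by_name = defaultdict(dict)
    for sector, sex, name, year, n in records:
        counts_by_name[(sector, sex, name)][year] = n

    rows = []
    for (sector, sex, name), counts in counts_by_name.items():
        for year, n in counts.items():
            next_n = counts.get(year + 1)
            # Assumption: skip if the following year is missing or the base count is zero.
            if next_n is None or n <= 0:
                continue
            growth = next_n / n
            if next_n >= min_next_n and growth >= min_growth:
                rows.append(
                    {
                        "sector": sector,
                        "sex": sex,
                        "name": name,
                        "year": year,
                        "n": n,
                        "next_n": next_n,
                        "growth": growth,
                    }
                )

    # Largest jumps first, mirroring arrange(desc(growth)).
    rows.sort(key=lambda row: row["growth"], reverse=True)
    return rows


# Made-up counts for two placeholder names, not real data.
sample = [
    ("Jewish", "F", "A", 1990, 40),
    ("Jewish", "F", "A", 1991, 120),
    ("Jewish", "F", "A", 1992, 130),
    ("Jewish", "M", "B", 1990, 300),
    ("Jewish", "M", "B", 1991, 310),
]
result = yearly_growth(sample)
```

With this sample only name "A" qualifies, for the step from 1990 to 1991. Grouping before comparing adjacent years is the key design choice, because it keeps one name's last year from being compared with another name's first year.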

Plot:

```{r, fig.width = 15, fig.height = 10}
growth_names %>%
filter(sector == "Jewish") %>%
rename(`Number of babies` = next_n) %>%
ggplot(aes(x=year + 1, y=growth, size = `Number of babies`, label = name, color = sex)) +
geom_point() +
theme_classic() +
tgutil::scale_y_log2() +
ggsci::scale_color_aaas() +
ggrepel::geom_text_repel(size = 6) +
scale_x_continuous(breaks = seq(1948, 2021, 5)) +
xlab("Year") +
ylab("Growth")
```
