# Lab 2: Population Analysis

Goals:

* Learn how to pick out subpopulations
* Analyze the binaries in your subpopulation, understand their evolution history and formation channels
* Learn how to vary the parameters of your population and produce different outcomes

___

In this lab we are going to look at a much larger population than what we have done so far. The population that we will look at consists of 1,000,000 binaries and although you now know how to run a population like this yourself, it can take something like an hour to run on an HPC facility, depending on available resources. So, we'll simply load in a pre-run population and analyze it below -- we'll keep things simple for now and stick to one metallicity, but later on in this lab you will see how to work with a multi-metallicity population. First up, let's import the `Population` class so that we can begin our analysis.

# 1. Managing Large Binary Populations

As we have alluded to, you will probably be running much larger populations than the ones we have run so far when you use POSYDON in your own research. This is essential for getting realistic statistics on the populations that you might be interested in. So, we will now get some practice handling very large populations. In the last lab, you may have gotten to the point of running a population on an HPC system. This lab essentially picks up from that point, but we will pretend that you ran a population of 1 million binaries (even this is on the smaller side in comparison to real stellar populations in nature). Running populations this large will usually require HPC usage (unless you have a lot of time or a very fast PC!). On Quest, a population this size completed within a few hours, but this timeframe can be shorter or longer depending on your available system resources.

Thankfully, much of what you have seen already applies here as well; everything is just bigger. Also, since we are now using far more than 10 or 100 binaries, we can start to see statistically meaningful results, given the underlying physical assumptions made in POSYDON.

## 1.1 Loading Your Population

After you've run a population, you'll be met with its save file in your working directory. We have one prepared for you to load up and begin inspecting. 

In [None]:
from posydon.popsyn.synthetic_population import Population
from posydon.config import PATH_TO_POSYDON_DATA
import os

# This is just building the path to where our populations are saved for this lab.
# In your own research, you'd likely want to point to your own data set, wherever it is
data_path = os.path.join(os.path.dirname(PATH_TO_POSYDON_DATA), "2025_school_data/populations/1M_pops")

<div class='alert alert-success'>
    
#### Exercise 1: Load in the population

  (a) Load the population into memory using the path provided below.  
  (b) Print out the population's initial parameters to see the settings used to create it.  
  (c) Have a look at the history data frame (and a few binaries for fun, if you like).
    
</div>

#### Ex. 1: Parts (a) & (b)  

In [None]:
# Load the population 

path_to_pop = os.path.join(data_path, "1e+00_Zsun_population.h5")
pop = ### fill in ###

In [None]:
# Print the initial parameters

 ### fill in ###

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal):</summary></b>

```python
pop = Population(path_to_pop)
pop.ini_params
```
</details>
</div>

In [None]:
pop.mass_per_metallicity

#### Ex. 1: Part (c)

In [None]:
cols = ['time', 'step_names', 'state', 'event', 'S1_state', 'S2_state', 'S1_mass', 'S2_mass', 'orbital_period']
pop.history[12039][cols]  # look at different binaries if you like

## 1.2 Masking Your Population

Often when you are using POSYDON you will be interested in looking at a particular type of stellar system that you are researching, or you will simply want to partition all of the population data up into more digestible pieces so that you can work on understanding things piece by piece. We will demonstrate how this can be done, and get familiar with population analysis by setting ourselves the task of picking out all of the binary black hole merger systems in our population and studying some of their basic properties.

You can use all the regular tricks that you might know to manipulate your population's `DataFrame`s and get out the data that you want. In addition, the `history` of your `Population` has a nice function called `select` that lets you easily cut your `DataFrame` based on some logic that you define, or down to just the data columns that you specify. This is useful for a number of reasons, one of which is that our history `DataFrame` for this population is huge! If you try to look at it by running the code below, it may take several seconds (up to a minute) before the data are output to your notebook.

In [None]:
pop.history[cols]

Running around with all this data is cumbersome if you don't need it. We can cut it down to size using `select()` to pick out just the things we will need for now. For example, if our first task is to find all of the binary black holes in this population, we really only need the star states and the binary `event`. We can cut our `DataFrame` to include just those like this, with the `columns` argument of `select()`:

In [None]:
# temporarily store the cut history DataFrame for easy access/clarity
tmp_data = pop.history.select(columns=['S1_state', 'S2_state', 'event'])

If you try displaying this in your notebook, you should notice it takes much less time, as we are carrying around considerably less data.

In [None]:
tmp_data
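
To make the saving concrete, here is a small in-memory analog using plain pandas. It does not call POSYDON: the toy table, its size, and its values are made-up placeholders, and slicing an in-memory frame is only an illustration of why carrying fewer columns is cheaper, not a description of what `select()` does internally.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a history table: synthetic placeholder values, not POSYDON output.
n_rows = 10_000
rng = np.random.default_rng(0)
toy_history = pd.DataFrame({
    "time": rng.uniform(0.0, 1e7, n_rows),
    "S1_mass": rng.uniform(5.0, 50.0, n_rows),
    "S2_mass": rng.uniform(5.0, 50.0, n_rows),
    "orbital_period": rng.uniform(1.0, 1e3, n_rows),
    "S1_state": ["placeholder_state"] * n_rows,
    "S2_state": ["placeholder_state"] * n_rows,
    "event": ["ZAMS"] * n_rows,
})

# Keep only the columns needed for the next task.
needed = ["S1_state", "S2_state", "event"]
toy_subset = toy_history[needed]

# deep=True also counts the memory held by the string values.
full_bytes = toy_history.memory_usage(deep=True).sum()
subset_bytes = toy_subset.memory_usage(deep=True).sum()
print(f"full table: {full_bytes / 1e6:.2f} MB")
print(f"subset:     {subset_bytes / 1e6:.2f} MB")
```

The exact numbers depend on your pandas version, but the subset is always smaller because it holds a strict subset of the columns.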

With that out of the way, we can work on getting our merging binary black holes.

### 1.2.1 Designing a Simple BBH Mask

Above, we just saw how we can cut our data down to a more manageable size using the `select` function. This is one way to mask your data and pick out just what you want. However, since we are working with `DataFrame`s, we can also use all of the functionality that you normally would for manipulating those data structures. In this section, you will design your own mask to select merging binary black hole systems using conventional `DataFrame` manipulation techniques and POSYDON's `select()` function.

<div class='alert alert-success'>
    
#### Exercise 2: Masking a population for merging black holes
 (a) Using the `event`, `S1_state`, and `S2_state`, define a mask to find the binary indices of all merging BBH systems.    
 (b) Try doing the same thing using the `pop.history.select()` function. More instructions on this below.

</div>
    
#### Ex. 2: Part (a)

In [None]:
# Define a mask to slice out just systems that have merging black holes:
S1_is_BH =   ### Fill this in ###
S2_is_BH =   ### Fill this in ###
is_merging =   ### Fill this in ###

mask = S1_is_BH & S2_is_BH & is_merging

# get the indices of all systems
indices = tmp_data.index

# Use your mask to get just the binary indices for BBH systems
BBH_indices = indices[mask].to_list()

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Hint (click to reveal): </summary></b>

The event you want is called `CO_contact`.
</details>
</div>

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal): </summary></b>

```python
# Define a mask to slice out just systems that have merging black holes:
S1_is_BH = tmp_data['S1_state'] == 'BH' 
S2_is_BH = tmp_data['S2_state'] == 'BH' 
is_merging = tmp_data['event'] == 'CO_contact' 

mask = S1_is_BH & S2_is_BH & is_merging

# get the indices of all systems
indices = tmp_data.index

# Use your mask to get just the binary indices for BBH systems
BBH_indices = indices[mask].to_list()
```
</details>
</div>
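
If the mask logic feels abstract, the following toy example may help. It is a sketch in plain pandas with a hand-written table: the three binaries, the simplified state names (`MS`, `NS`), and the index name are assumptions made for illustration, and only the `BH` state and `CO_contact` event are taken from the exercise above.

```python
import pandas as pd

# Toy history: one row per evolutionary step, indexed by binary number.
# Binary 0 ends as a merging BBH, binary 1 as a BH-NS merger, binary 2 never merges.
toy = pd.DataFrame(
    {
        "S1_state": ["MS", "BH", "BH", "MS", "BH", "BH", "MS", "BH"],
        "S2_state": ["MS", "MS", "BH", "MS", "MS", "NS", "MS", "BH"],
        "event": ["ZAMS", "CC1", "CO_contact", "ZAMS", "CC1", "CO_contact", "ZAMS", "END"],
    },
    index=pd.Index([0, 0, 0, 1, 1, 1, 2, 2], name="binary_index"),
)

# Each comparison yields one boolean per row; & combines them row by row.
mask = (
    (toy["S1_state"] == "BH")
    & (toy["S2_state"] == "BH")
    & (toy["event"] == "CO_contact")
)

# Indexing the row labels with the mask keeps the label of every matching row.
toy_bbh_indices = toy.index[mask].to_list()
print(toy_bbh_indices)
```

Only binary 0 satisfies all three conditions in the same row. Binary 2 has two black holes but no `CO_contact` row, which is why the event condition is needed in addition to the two state conditions.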

Now we should have all the indices of the systems that we are after. We can check by running this (it will take a bit of time to run again):

#### Below, check that you have successfully masked your population to contain only merging binary black hole systems:

In [None]:
pop.history[BBH_indices][cols]

It should look right, but you won't be able to see everything since the `DataFrame` gets truncated. To check more closely, look at just one system:

#### Below, examine just a single BBH system's history DataFrame and check that it looks right:

In [None]:
pop.history[BBH_indices[0]][cols]


#### Ex. 2: Part (b)

Let's do all that another way, using the `select()` function. Besides specifying columns that you want to select from your `DataFrame`, you can also give `select()` a logic statement to pick out systems.

In [None]:
# We can make up just about anything we want for the logic -- it's a string.
# The string has to follow valid Python logic syntax, and we're restricted to the data available in our history DataFrame.
# Here's something simple to get merging BH-BH pairs:

logic_statement =  ### Fill this in ###

df = pop.history.select(where=logic_statement)

BBH_indices = df.index.to_list()

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal): </summary></b>

```python
logic_statement = "(event == CO_contact) & (S1_state == BH) & (S2_state == BH)"

df = pop.history.select(where=logic_statement)

BBH_indices = df.index.to_list()
```
</details>
</div>

The two resulting lists of indices should be the same. What we've done is to use the `select()` function to pick out just the rows where we have two black holes and the event is such that the two compact objects are merging.
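
One way to convince yourself that a mask and a string expression select the same systems is to try both on a tiny table and compare the results. The sketch below uses plain pandas `DataFrame.query` as an in-memory analog of a string-based selection; it is not POSYDON's `select()`, the toy table is invented, and note that `query` requires string literals to be quoted.

```python
import pandas as pd

# Invented toy rows; only 'BH' and 'CO_contact' are taken from the exercise.
toy = pd.DataFrame(
    {
        "S1_state": ["BH", "BH", "BH", "NS"],
        "S2_state": ["BH", "NS", "BH", "BH"],
        "event": ["CO_contact", "CO_contact", "END", "CO_contact"],
    },
    index=pd.Index([3, 7, 7, 9], name="binary_index"),
)

# Route 1: boolean mask built from column comparisons.
mask = (
    (toy["event"] == "CO_contact")
    & (toy["S1_state"] == "BH")
    & (toy["S2_state"] == "BH")
)
from_mask = toy.index[mask].to_list()

# Route 2: the same logic written as a string expression.
expr = "(event == 'CO_contact') & (S1_state == 'BH') & (S2_state == 'BH')"
from_query = toy.query(expr).index.to_list()

# Compare as sorted lists so that ordering differences do not matter.
routes_agree = sorted(from_mask) == sorted(from_query)
print(from_mask, from_query, routes_agree)
```

The same `sorted(...) == sorted(...)` comparison can be applied to the two `BBH_indices` lists from parts (a) and (b), provided you store them under different names.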

#### Take a look at your `select`ed `DataFrame` below to see this for yourself:

In [None]:
df

If you run the same code as before to look at one of your merging BBH systems, you should see that the index list you've found yields exactly the same results:

In [None]:
pop.history[BBH_indices[0]][cols]

#### Does the history DataFrame above match the binary history that you found before in part (a)?

## 1.3 Exporting your population selection to a file

Now that we've picked out our BBH systems, we can save our selection to a file, making it easier to come back and work on things if we have to give it a rest at some point. To do this, we can use the `export_selection()` function of the `Population` class to save our BBH systems to a `.h5` file. It is as simple as:

In [None]:
pop.export_selection(BBH_indices, '1e+00_Zsun_BBH_contact_subpop.h5', append=True)

If you look inside of your working directory, you should find your new subpopulation file there. It's the same thing as a regular population file, just containing only the systems that you've selected: the merging binary black holes.

# 2. Analyzing a population of merging black holes

At this point, you probably know how to get started. Let's load our population up, just as if we'd started a new day at the office and we're excited to find out what happened in our simulation of BBH systems.

In [None]:
BBH_pop = Population('1e+00_Zsun_BBH_contact_subpop.h5')

If we look at the mass per metallicity metadata of our population, we can see a few interesting things already.

In [None]:
BBH_pop.mass_per_metallicity

Out of 1 million binaries, we only got several hundred merging BBH systems (in this case, about 0.06% of the total population). So, these do indeed seem like pretty rare systems to find, just like what we see in nature.
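
As a quick back-of-the-envelope check of that percentage, and of why large populations matter for rare systems, here is a short sketch. The count of 600 is a hypothetical value chosen to match "about 0.06%"; substitute the number you actually find. The uncertainty uses a simple binomial standard error, which is an assumption of this sketch rather than anything POSYDON computes for you.

```python
import math

n_total = 1_000_000  # binaries in the full population (from the text)
n_bbh = 600          # hypothetical count, chosen to match ~0.06%

fraction = n_bbh / n_total

# Binomial standard error on the fraction, and its size relative to the fraction.
std_err = math.sqrt(fraction * (1.0 - fraction) / n_total)
relative_err = std_err / fraction

print(f"merging BBH fraction: {100 * fraction:.3f}%")
print(f"relative uncertainty: {100 * relative_err:.1f}%")
```

For rare outcomes the relative uncertainty scales roughly as one over the square root of the number of systems found, so a population ten times smaller would leave you with only a few dozen BBHs and a noticeably noisier estimate.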

There are many things that we could do at this point, and we will leave some for later in the week. For now, let's try and look at the mass distribution of our BBH systems. To start with, import `matplotlib.pyplot` for some basic plotting functionality.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

## 2.1 Plotting Distributions

You will often want to look at various distributions in your population, such as the mass or orbital period distribution, to compare to real data. Below, we will work on an example of how you might do this with your merging binary black hole population. POSYDON does have some built-in plotting functionalities that you will see later in the week. However, you will often be using Python's usual packages for making plots tailored to suit your specific research needs. You will have to run, load, and manipulate your data to analyze whatever it is you are interested in. You've already done the first couple of those steps, so in this next exercise you can practice with the last step: plotting and analyzing.

<div class='alert alert-success'>

#### Exercise 3: Plotting the BBH mass distribution

(a) Make a histogram of the primary black hole mass distribution at solar metallicity.  
(b) We have another population file available from 1 million binaries run at a lower metallicity. Load it in and export the BBH population.   
(c) Plot the mass distribution of the lower metallicity population, and compare to solar metallicity.  
(d) Note any differences and reasons for them.
    
</div>

#### Ex. 3: Part (a)

In [None]:
logic_statement = "(event == CO_contact) & (S1_state == BH) & (S2_state == BH)"

BBH_at_contact = BBH_pop.history.select(where=logic_statement)

In [None]:
# Might need to do this to avoid tex errors
import matplotlib as mpl
mpl.rcParams.update(mpl.rcParamsDefault)

In [None]:
BBH_m1 =  ### Extract primary BH mass ###
BBH_m2 =  ### Extract secondary BH mass ###

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal): </summary></b>

```python
BBH_m1 = BBH_at_contact['S1_mass']
BBH_m2 = BBH_at_contact['S2_mass']
```
</details>
</div>

In [None]:
# run this to plot your mass distribution
f, ax = plt.subplots(1,1, figsize=(8,8))

ax.hist(BBH_m1, bins=30, alpha = 0.3, ec='k')
ax.set_ylabel("N")
ax.set_xlabel("Primary Merging BH Mass")

plt.show()

#### Ex. 3: Part (b) 

Load the low Z population and export the merging BBH subpopulation. You just have to figure out a logical statement you can use to get the merging BBH subpopulation.

In [None]:
path_to_pop = os.path.join(data_path, "1e-04_Zsun_population.h5")
lowZ_pop = Population(path_to_pop)

In [None]:
lowZ_pop.mass_per_metallicity

In [None]:
logic_statement = ### Fill this in ###

df = lowZ_pop.history.select(where=logic_statement)

BBH_indices = df.index.to_list()

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal): </summary></b>

```python
logic_statement = "(event == CO_contact) & (S1_state == BH) & (S2_state == BH)" 

df = lowZ_pop.history.select(where=logic_statement)

BBH_indices = df.index.to_list()
```
</details>
</div>

In [None]:
lowZ_pop.history[BBH_indices[0]]

In [None]:
lowZ_pop.export_selection(BBH_indices, '1e-04_Zsun_BBH_contact_subpop.h5', append=True)

#### Ex. 3: Part (c)

Plot the low Z BBH merging primary mass distribution. You have to extract the BH masses using methods that you have learned.

In [None]:
lowZ_BBH_pop = Population('1e-04_Zsun_BBH_contact_subpop.h5')

In [None]:
lowZ_BBH_pop.mass_per_metallicity

In [None]:
logic_statement = "(event == CO_contact) & (S1_state == BH) & (S2_state == BH)"

lowZ_BBH_at_contact = lowZ_BBH_pop.history.select(where=logic_statement)

In [None]:
lowZ_BBH_m1 =  ### Extract primary BH mass ###
lowZ_BBH_m2 =  ### Extract secondary BH mass ###

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal): </summary></b>

```python
lowZ_BBH_m1 = lowZ_BBH_at_contact['S1_mass']
lowZ_BBH_m2 = lowZ_BBH_at_contact['S2_mass']
```
</details>
</div>

In [None]:
# Run this to plot your mass distribution
f, ax = plt.subplots(1,1, figsize=(8,8))

bins = np.linspace(min(BBH_m1.min(), lowZ_BBH_m1.min()), max(BBH_m1.max(), lowZ_BBH_m1.max()), 50)

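ax.hist(BBH_m1, bins=bins, alpha=0.3, ec='k', density=True, label='Z = Zsun')
ax.hist(lowZ_BBH_m1, bins=bins, alpha=0.3, ec='k', density=True, label='Z = 1e-4 Zsun')
ax.set_ylabel("Probability density")  # density=True normalizes each histogram to unit area
ax.set_xlabel("Primary Merging BH Mass")
ax.legend(loc=1)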

plt.show()
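
The plotting cell above relies on two ideas: shared bin edges and `density=True`. The sketch below illustrates both with `numpy.histogram` on synthetic placeholder samples (random numbers, not POSYDON results), so that you can see what the normalization does without needing the population files.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic placeholder samples of very different sizes; not POSYDON results.
sample_a = rng.normal(10.0, 1.5, size=600)
sample_b = rng.normal(25.0, 8.0, size=3000)

# Shared edges spanning both samples, so the bars are directly comparable.
lo = min(sample_a.min(), sample_b.min())
hi = max(sample_a.max(), sample_b.max())
edges = np.linspace(lo, hi, 51)  # 51 edges give 50 bins

dens_a, _ = np.histogram(sample_a, bins=edges, density=True)
dens_b, _ = np.histogram(sample_b, bins=edges, density=True)
widths = np.diff(edges)

# With density=True each histogram integrates to 1, regardless of sample size.
area_a = float(np.sum(dens_a * widths))
area_b = float(np.sum(dens_b * widths))
print(area_a, area_b)
```

Because both histograms integrate to one, their shapes can be compared even when one population contains many more merging BBHs than the other. The price is that the y-axis no longer shows counts.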

#### Ex. 3: Part (d)

Note the differences between the populations. How do the populations that you see compare to results from the literature?

<div class="alert alert-info">
    
**Note:**
    
There's not much to do here in code, but take some time to think through what could be causing the differences in the two populations above and talk to your neighbors or the TAs (if they aren't too busy).
    
</div>

An analysis can get much more complicated, with all of the data available in POSYDON, and ideally we would run a much larger population in a real analysis. The other classes during the week will build on what you have learned here and go into more depth on the different evolution steps and analysis capabilities of POSYDON.


## 2.2 Inspecting Formation Channels

Now, let's take a look at how we can robustly determine the formation channels of our BBH subpopulation.

One thing that we have not seen yet is how to calculate the formation channels of the binaries in your population. While you can always look at the history of your binaries and get an idea of the formation channel yourself, POSYDON has an automated way to pick out and summarize this information for you. Here is how:

In [None]:
# Calculate the formation channels for our BBH pop
BBH_pop.calculate_formation_channels()

After calculating the formation channels, they are automatically saved in your population's `.h5` file to access later. Anytime you load up your population now, you can access the formation channels like this:

In [None]:
BBH_pop.formation_channels

The `DataFrame` above shows you two columns, `channel_debug` and `channel`; we want to pay attention to the one on the right. This column shows the formation channel for each binary in our population, summarizing how it became a merger. For example, you should see in the first row that binary `0` formed like this:

1. ZAMS (they all start here)
2. The primary star initiates Roche lobe overflow (stable mass transfer)
3. The primary star reaches core collapse
4. The secondary eventually initiates Roche lobe overflow (unstable mass transfer)
5. A common envelope (CE) occurs, initiated by the secondary
6. The binary survives the CE, the secondary core collapses
7. The two black holes inspiral and merge

You can always refer back to the history for more details. Once again, for this particular binary:

In [None]:
BBH_pop.history[0][cols]

The binary is already so close together after the CE event that the secondary's supernova cannot unbind the system; it forms a BH and the two merge. Below, we will practice how you can use these formation channels to analyze your population further.

<div class='alert alert-success'>

#### Exercise 4: BBH Formation Channels 

 (a) Look at the full, solar-metallicity population and find all the formation channels of BBH mergers. Which are the most common?  
 (b) Overplot the mass distributions of the different formation channels. Explain any physical relations or patterns you find.

Repeat the same steps for the lower metallicity population:

 (c) Find the formation channels of BBH mergers. Which ones dominate in this case?  
 (d) Overplot the mass distributions of the different formation channels.  
 (e) Compare your findings between the solar and sub-solar metallicity populations. How do the dominant formation channels change with metallicity, if at all?
    
</div>

#### Ex. 4: Part (a)

In [None]:
# Count all the binaries for each formation channel

from collections import Counter
Counter(BBH_pop.formation_channels['channel'])

Which are the most common BBH formation channels at solar metallicity?

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal): </summary></b>
    
Most merging BBHs at solar metallicity require a double RLO and a CE event to form. The second most common formation channel is the double RLO (stable mass transfer, no CE).
    
</details>
</div>
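
Raw counts are easier to interpret as fractions of the subpopulation. Here is a minimal sketch using `collections.Counter` on a made-up list of labels; `channel_A`, `channel_B`, and `channel_C` are hypothetical names standing in for the values in `BBH_pop.formation_channels['channel']`.

```python
from collections import Counter

# Hypothetical channel labels, one per binary.
toy_channels = ["channel_A"] * 6 + ["channel_B"] * 3 + ["channel_C"]

counts = Counter(toy_channels)
total = sum(counts.values())

# most_common() returns (label, count) pairs sorted by descending count.
ranked = [(label, n, n / total) for label, n in counts.most_common()]
for label, n, frac in ranked:
    print(f"{label}: {n} binaries ({100 * frac:.0f}%)")
```

Passing your real channel column to `Counter` in place of `toy_channels` should give the same kind of ranked table for the BBH subpopulation.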

#### Ex. 4: Part (b)

In [None]:
# Print and store the top two most common formation channels

counts = Counter(BBH_pop.formation_channels['channel'])
print(counts.most_common(2))

most_common_channel = counts.most_common(2)[0][0]
sec_most_common_channel = counts.most_common(2)[1][0]

In [None]:
# Get indices of the two formation channels

most_common_i = BBH_pop.formation_channels.loc[BBH_pop.formation_channels['channel'] == most_common_channel].index
sec_most_common_i = BBH_pop.formation_channels.loc[BBH_pop.formation_channels['channel'] == sec_most_common_channel].index

In [None]:
# Plot mass distributions corresponding to the top two formation channels

f, ax = plt.subplots(1,1, figsize=(8,8))

ax.hist(BBH_m1, bins=30, alpha = 0.3, ec='k', label='All merging BBHs')
ax.hist(BBH_m1[most_common_i], bins=10, alpha = 0.3, ec='k', label=f'{most_common_channel}')
ax.hist(BBH_m1[sec_most_common_i], bins=30, alpha = 0.3, ec='k', label=f'{sec_most_common_channel}')
ax.set_ylabel("N")
ax.set_xlabel("Primary Merging BH Mass")
ax.legend(loc=1)

plt.show()
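
The per-channel histograms work because the mass `Series` and the formation channel table are labelled by the same binary index, so an index taken from one can be used to look up rows in the other. The toy example below sketches that idea in plain pandas; the masses, index values, and channel names are invented for illustration, and `.loc` is used here only to make the label-based lookup explicit.

```python
import pandas as pd

# Toy primary masses at merger, one value per binary (index = binary number).
toy_m1 = pd.Series([9.5, 11.0, 22.0, 8.7], index=[3, 8, 15, 42], name="S1_mass")

# Toy channel table sharing the same binary index; channel names are hypothetical.
toy_channels = pd.DataFrame(
    {"channel": ["channel_A", "channel_B", "channel_A", "channel_A"]},
    index=[3, 8, 15, 42],
)

# Index labels of the binaries in channel_A ...
idx_a = toy_channels.index[toy_channels["channel"] == "channel_A"]

# ... used to look up the matching masses by label, not by position.
m1_a = toy_m1.loc[idx_a]
print(m1_a.to_list())
```

If the two objects did not share the same labels, for instance if the masses came from a different population file, the lookup would fail or return the wrong binaries, so it is worth checking that both come from the same subpopulation.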

Discuss the plot with a neighbor! How do the mass distributions differ? Can you guess why?

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal): </summary></b>
    
Most merging BBHs form through the common envelope (CE) channel, which produces primary BHs tightly clustered around ~8–12 solar masses because the CE strips off most of the star’s envelope before collapse. The double stable mass transfer channel is less common, but it allows stars to keep more of their mass and make slightly heavier BHs. Overall, CE evolution is the dominant path, while stable transfer gives a rarer but broader range of BH masses.
    
</details>
</div>

#### Now, repeat all steps above for the lower metallicity population.

(Hint: We saved it above as `lowZ_BBH_pop`)

#### Ex. 4: Part (c)

Which are the most common formation channels for the lower metallicity population?

In [None]:
# Before accessing formation_channels, we need to calculate them for the new population

### Fill this in ###

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal): </summary></b>
    
```python
    
lowZ_BBH_pop.calculate_formation_channels()
    
```
</details>
</div>

In [None]:
# Print and store the top two most common formation channels

counts = Counter(lowZ_BBH_pop.formation_channels['channel'])
print(counts.most_common(2))

most_common_channel = counts.most_common(2)[0][0]
sec_most_common_channel = counts.most_common(2)[1][0]

#### Ex. 4: Part (d)

In [None]:
# Get indices of the top two formation channels

most_common_i = ### Fill this in ###
sec_most_common_i = ### Fill this in ###

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Hint (click to reveal): </summary></b>
    
See Ex. 4, Part (b) for how to grab the indices.
    
</details>
</div>

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal): </summary></b>
    
```python
    
most_common_i = lowZ_BBH_pop.formation_channels.loc[lowZ_BBH_pop.formation_channels['channel'] == most_common_channel].index
sec_most_common_i = lowZ_BBH_pop.formation_channels.loc[lowZ_BBH_pop.formation_channels['channel'] == sec_most_common_channel].index
    
```
</details>
</div>

In [None]:
# Plot mass distributions corresponding to the top two formation channels

f, ax = plt.subplots(1,1, figsize=(8,8))

ax.hist(lowZ_BBH_m1, bins=30, alpha = 0.3, ec='k', label='All merging BBHs')
ax.hist(lowZ_BBH_m1[most_common_i], bins=30, alpha = 0.3, ec='k', label=f'{most_common_channel}')
ax.hist(lowZ_BBH_m1[sec_most_common_i], bins=30, alpha = 0.3, ec='k', label=f'{sec_most_common_channel}')
ax.set_ylabel("N")
ax.set_xlabel("Primary Merging BH Mass")
ax.legend(loc=1)

plt.show()

#### Ex. 4: Part (e)

Comment on your results. How do the primary BH mass distributions at low metallicity compare to those at solar metallicity, and what physical reasons might explain the differences?

<div class="alert alert-warning" style="margin-top: 20px">
<details>

<b><summary>Solution (click to reveal): </summary></b>
        
At low metallicity, the primary BH masses extend to much higher values than in the solar-metallicity population. This is because weaker stellar winds in low-metallicity environments allow massive stars to form larger He cores before the CE phase, thus forming heavier black holes. Both the CE channel and the double stable mass-transfer channel are still present, but compared to solar metallicity their distributions shift toward higher masses and broaden.
    
</details>
</div>