<a href="https://colab.research.google.com/github/Raghuram-Veeramallu/Astro_Stat_Project2/blob/development/Group3_Project2_DataSummary.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## AST/STAT 5731 Project 2
### Research Synopsis
**Group 3**   
Daniel Warshofsky  
Hari Veeramallu  
Jacynda Alatoma  
Nicholas Kruegler

#### Research Question

*Synopsis here*


### Preamble

### Data   

Raw data is available at [Google Drive](https://drive.google.com/file/d/1v6LSAKvkuEjahtOWDNq3riBMLkD7rZD0/view)

##### 1. Loading the data

In [13]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [5]:
# download the dataset to colab / local environment

## for google colab
!gdown 1v6LSAKvkuEjahtOWDNq3riBMLkD7rZD0

## for local environment
# !gdown https://drive.google.com/uc?id=1v6LSAKvkuEjahtOWDNq3riBMLkD7rZD0

Downloading...
From: https://drive.google.com/uc?id=1v6LSAKvkuEjahtOWDNq3riBMLkD7rZD0
To: /content/snaeT1.tsv
  0% 0.00/47.4k [00:00<?, ?B/s]100% 47.4k/47.4k [00:00<00:00, 64.4MB/s]


In [6]:
## Load the data

# NOTE: change the data path to the download path mentioned in the cell above
# This is typically your downloads folder
data = pd.read_csv('/content/snaeT1.tsv', sep='\t')

##### 2. Looking at the data

In [7]:
data.head(5)

Unnamed: 0,zcmb,zhel,mb,e_mb,x1,e_x1,c,e_c,logMst
0,0.503,0.5043,23.002,0.088,1.273,0.15,-0.012,0.03,9.517
1,0.581,0.582,23.574,0.09,0.974,0.274,-0.025,0.037,9.169
2,0.495,0.496,22.96,0.088,-0.729,0.102,-0.1,0.03,11.58
3,0.346,0.347,22.398,0.087,-1.155,0.113,-0.041,0.027,10.821
4,0.678,0.679,24.078,0.098,0.619,0.404,-0.039,0.067,8.647


**Column Description:**.

| Column  | Description                 |
| ------- | --------------------------- |
| zcmb    | CMB Frame Redshift          |
| zhel    | Heliocentric Redshift       |
| mb      | B band peak magnitude (in mB)|
| e_mb    | Error in mb                 |
| x1      | SALT2 shape (stretch) parameter |
| e_x1    | Error in x1                 |
| c       | SALT2 color parameter       |
| e_c     | Error in c                  |
| logMst  | $Log_{10}$ Host Stellar Mass|

In [9]:
data.describe()

Unnamed: 0,zcmb,zhel,mb,e_mb,x1,e_x1,c,e_c,logMst
count,740.0,740.0,740.0,740.0,740.0,740.0,740.0,740.0,740.0
mean,0.32387,0.324449,20.904961,0.115899,0.036589,0.311096,-0.023878,0.038897,9.795342
std,0.276893,0.276748,2.655139,0.018767,0.988554,0.211399,0.084118,0.015694,1.395068
min,0.01,0.0094,14.148,0.085,-2.863,0.018,-0.25,0.012,5.0
25%,0.124,0.124425,19.7355,0.102,-0.65025,0.1485,-0.08425,0.026,9.31675
50%,0.229,0.2305,21.099,0.114,0.1655,0.268,-0.0305,0.035,10.1935
75%,0.498,0.499,23.043,0.124,0.77575,0.42,0.031,0.05,10.71025
max,1.299,1.3,26.047,0.175,2.337,1.641,0.26,0.107,11.817


##### 3. Converting the shape and color paramters to distance

Since we want to estimate the relationship between the Distance and the Redshift (CMB), we need to compute the distance from the parameters that we have available to us.

Distance can be computed through Distance Modulii $\mu$, defined as
$$ \mu = m_{B}^{*} - (M_{B} - \alpha X_{1} + \beta C)$$


where,  
$m_{B}^{*}$ is the observed peak magnitude in rest frame B band.  
$M_{B}$ is the absolute magnitude of the supernova (adjusted based on host galaxy's properties). It depends on the host galalxy's stellar mass ($M_{stellar}$).   
$$M_{B} = \begin{cases} M_{B}^{1} & \text{if $M_{stellar} < 10^{10} M_{\odot}$} \\
M_{B}^{1} + \Delta_{m} & \text{otherwise}
\end{cases}$$   
here, $M_{\odot}$ is the mass of the Sun.   
$\alpha, \beta$ are the nuisance parameters that account for the shape of the light curve ($X_{1}$) and the color of the supernova ($C$) respectively.  
$\beta, M_{B}$ are found to be dependent on the host galaxy properties.  


Modulus can be converted into distance using $ d_{L} = 10^{\mu/5 + 1} $.  
($\mu = 5 log_{10}{(d_{L} /10 pc)}$ according to the literature). **TODO: add link here**

Since there are errors associated with each measurement, we need to estimate the distance errors as well.   

Error parameter $\sigma_{\mu}$ = $\sqrt{\sigma_{m_{b}}^{2} + (\alpha \sigma_{X_{1}})^2 + (\beta \sigma_{C})^2}$.  
where $\sigma_{m_{b}}, \sigma_{X_{1}}, \sigma_{C}$ refer to the error terms of $m_{b}, X_{1}$ and $C$ (`e_mb`, `e_x1` and `e_c` from the dataset).  

Error in distance measurements
$$\sigma_{d} = \frac{\sigma_{\mu} . log(10) . |d|}{5}$$

From the literature the values used are, $\alpha = 0.141, \beta = 3.101, M_{B}^{1} = -19.05$ and $\Delta_{M} = -0.07$ (according to the C11 analysis)

The absolute magnitude of type Ia in B-band is consistent, typically around $-19.3 \pm 0.3 $ in magnitude.

In [11]:
# defining the nuiscance parameters
alpha = 0.141
beta = 3.101
MB1 = -19.05
DeltaM = -0.07

In [12]:
# MB
data['MB'] = MB1 + (DeltaM * (data['logMst'] > 10))

# mu
data['dist_moduli'] = data['mb'] - (data['MB'] - alpha * data['x1'] + beta * data['c'])

# distance
data['d'] = 10 ** (data['dist_moduli']/5 + 1)

In [14]:
# distance modulii (mu) error
data['e_mu'] = np.sqrt(data['e_mb'] ** 2 + (alpha * data['e_x1']) ** 2 + (beta * data['e_c']))

# distance error
data['e_d']= (data['e_mu'] * np.log(10) * np.abs(data['d']))/5

In [15]:
data.head(5)

Unnamed: 0,zcmb,zhel,mb,e_mb,x1,e_x1,c,e_c,logMst,MB,dist_moduli,d,e_mu,e_d
0,0.503,0.5043,23.002,0.088,1.273,0.15,-0.012,0.03,9.517,-19.05,42.268705,2842765000.0,0.318153,416507400.0
1,0.581,0.582,23.574,0.09,0.974,0.274,-0.025,0.037,9.169,-19.05,42.838859,3696339000.0,0.352604,600212100.0
2,0.495,0.496,22.96,0.088,-0.729,0.102,-0.1,0.03,11.58,-19.12,42.287311,2867228000.0,0.317775,419592200.0
3,0.346,0.347,22.398,0.087,-1.155,0.113,-0.041,0.027,10.821,-19.12,41.482286,1979052000.0,0.302572,275760300.0
4,0.678,0.679,24.078,0.098,0.619,0.404,-0.039,0.067,8.647,-19.05,43.336218,4647759000.0,0.469698,1005328000.0


In [19]:
# selecting only the required parameters from here
df = data[['zcmb', 'd', 'e_d']]
df.describe()

Unnamed: 0,zcmb,d,e_d
count,740.0,740.0,740.0
mean,0.32387,1881246000.0,351420300.0
std,0.276893,1872763000.0,397231500.0
min,0.01,41203570.0,6331576.0
25%,0.124,592337300.0,84478470.0
50%,0.229,1150148000.0,195752400.0
75%,0.498,2828445000.0,456350700.0
max,1.299,9490109000.0,2500113000.0
