Merge c3377c0 into 20a3b16

berkeley-stat159 · Nov 11, 2015 · fb688fa
2 parents 20a3b16 + c3377c0
commit fb688fa

Showing 5 changed files with 196 additions and 0 deletions.
diff --git a/paper/sections/abstract.tex b/paper/sections/abstract.tex
diff --git a/paper/sections/discussion.tex b/paper/sections/discussion.tex
@@ -0,0 +1,3 @@
+% Discussion Section of Paper Report
+
+\par \indent Our initial analysis...
diff --git a/paper/sections/introduction.tex b/paper/sections/introduction.tex
diff --git a/paper/sections/plan.tex b/paper/sections/plan.tex
@@ -0,0 +1,154 @@
+\documentclass[11pt]{article}
+
+\usepackage[margin=0.75in]{geometry}
+
+\title{The title of your project proposal}
+\author{
+  Qiu, Brian\\
+  \texttt{brianqiu}
+  \and
+  Hsieh, Benjamin\\
+  \texttt{BenjaminHsieh}
+  \and
+   Chang, Siyao \\
+  \texttt{changsiyao}
+  \and
+  Gong, Boying\\
+  \texttt{boyinggong}
+  \and
+  Zhu, Jiang\\
+  \texttt{pigriver123}
+}
+
+\bibliographystyle{siam}
+
+\begin{document}
+\maketitle
+
+\abstract{You should have a short abstract.}
+
+\section{Introduction}
+
+Identify a published fMRI paper and the accompanying data
+\cite{lindquist2008statistical}.  You should explain the basic idea of the
+paper in a paragraph.  You should also perform basic sanity checks on the data
+(e.g., can you download it, can you load the files, can you confirm that you
+have the correct number of subjects).
+
+Briefly explain what reproducibility means and in what sense you will
+try to reproduce this study.
+
+\section{Data}
+
+\section{Plan}
+
+\subsection{Models and analysis}
+
+\subsubsection{Behavioral analysis}
+
+We fit a logistic regression model to the behavioral data to examine how each individual's response relates to the size of the potential gain and loss of a gamble. The model is:
+
+\begin{equation}
+\mathrm{logit}\left(P(Y_{resp} = 1)\right) = \beta_0 + \beta_{loss} X_{loss} + \beta_{gain} X_{gain}
+\end{equation}
+
+where $X_{loss}$ and $X_{gain}$ are the potential loss and gain values, respectively, and $Y_{resp}$ is a binary dependent variable representing the subject's decision to accept or reject the gamble:
+
+\begin{displaymath}
+Y_{resp} = \left \{ \begin{array}{ll}
+1 & \textrm{If the subject accepted the gamble.} \\
+0 & \textrm{If the subject rejected the gamble.}
+\end{array} \right .
+\end{displaymath}
+
+We then calculate the behavioral loss aversion ($\lambda$) for each subject as follows. Note that, for simplicity, we collapse the 3 runs into one model for each participant.
+
+\begin{equation}
+\lambda = -\beta_{loss} / \beta_{gain}
+\end{equation}
+
+We use $\lambda$ as the metric for the degree of loss aversion of each participant. We used R to fit the logistic model, as the authors did in the paper, and obtained almost the same results as those the paper presents.
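+
+One way to read this ratio, as a sketch that assumes $X_{loss}$ is coded as a positive loss magnitude (so that $\beta_{loss} < 0$ and $\beta_{gain} > 0$): a gamble with gain $G$ and loss $L$ leaves the fitted log-odds of acceptance at the baseline $\beta_0$ when
+
+\begin{displaymath}
+\beta_{gain} G + \beta_{loss} L = 0 \quad \Longleftrightarrow \quad \frac{G}{L} = -\frac{\beta_{loss}}{\beta_{gain}} = \lambda
+\end{displaymath}
+
+Under that reading, $\lambda = 2$ would mean that a potential gain has to be about twice the potential loss to offset it. The symbols $G$ and $L$ are introduced here only for illustration and require $L \neq 0$ and $\beta_{gain} \neq 0$.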
+
+\subsubsection{Linear Regression on BOLD data}
+
+For each voxel $i$, we fit a multiple linear regression model:
+
+\begin{equation}
+Y_{i} = \beta_{i, 0} + \beta_{i, loss} X_{loss} + \beta_{i, gain} X_{gain} + \epsilon_i
+\end{equation}
+
+where $Y_{i}$ is the BOLD data of voxel $i$. For each voxel, we calculate the neural loss aversion $\eta_i$:
+
+\begin{equation}
+\eta_i = (-\beta_{i, loss}) - \beta_{i, gain}
+\end{equation}
+
+Using the voxelwise neural loss aversion, we do a region-specific analysis of the BOLD data for each participant. That is, we plot heat maps of $\eta_i$, $\beta_{i, loss}$ and $\beta_{i, gain}$ for each participant to find the regions with significant activation and the regions that show a significant positive or negative correlation with increasing loss or gain levels.
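+
+As a sketch of how the voxelwise fit could be computed, assume ordinary least squares and write $X$ for the design matrix whose columns are a constant, $X_{loss}$ and $X_{gain}$ (this matrix notation is introduced here and is not taken from the paper). If $X$ has full column rank, the coefficient estimates for voxel $i$ are
+
+\begin{displaymath}
+\hat{\beta}_i = (X^{T} X)^{-1} X^{T} Y_i
+\end{displaymath}
+
+where $\hat{\beta}_i$ stacks the estimates of $\beta_{i, 0}$, $\beta_{i, loss}$ and $\beta_{i, gain}$. Because $X$ is the same for every voxel of a given participant, the factor $(X^{T} X)^{-1} X^{T}$ need only be computed once and can then be applied to all voxel time courses.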
+
+\subsubsection{Whole brain analysis of correlation between neural activity and behavioral response across participants}
+
+We then apply the above model on the standard brain to analyze neural activity and behavioral response across participants. For each participant, we pick several regions with the highest activation level and calculate the mean neural loss aversion $\bar{\eta}$ within these regions. We can then examine the relationship between neural activity and behavioral response using the following regression model:
+
+\begin{equation}
+\lambda = \alpha_0 + \alpha_1 \bar{\eta} + \epsilon
+\end{equation}
+
+where the sample size is the number of participants (16).
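+
+For reference, a sketch of the least-squares estimates for this across-participant regression, assuming ordinary least squares and indexing participants by $s = 1, \ldots, 16$ (the index $s$ and the sample means $m_{\eta}$ and $m_{\lambda}$ are notation introduced here):
+
+\begin{displaymath}
+\hat{\alpha}_1 = \frac{\sum_{s=1}^{16} (\bar{\eta}_s - m_{\eta})(\lambda_s - m_{\lambda})}{\sum_{s=1}^{16} (\bar{\eta}_s - m_{\eta})^2}, \qquad \hat{\alpha}_0 = m_{\lambda} - \hat{\alpha}_1 m_{\eta}
+\end{displaymath}
+
+where $m_{\eta}$ and $m_{\lambda}$ are the means of $\bar{\eta}_s$ and $\lambda_s$ over the 16 participants. With so few observations, the estimated slope is likely to be sensitive to individual participants.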
+
+
+\subsection{Explanation on model simplification}
+
+\subsubsection{Use of Data}
+\indent \indent First of all, for simplicity, we are not using all the regressors the paper used. The model in the paper regressed the BOLD data on gain, loss and Euclidean distance to indifference. In our model, we leave out the Euclidean distance to indifference. The paper and its supplementary material do not document exactly how the authors calculated this regressor, and we have not been able to reproduce it. Therefore, we decided to leave it out of our own regression.
+
+\subsubsection{Simplification of regression on BOLD data}
+\indent \indent We plan to simplify the model for the neural data. In the original data analysis, the authors fit a mixed-effects model when regressing the BOLD data on the potential gain and loss values across runs, since there are three runs for each subject and the authors wanted to incorporate all three runs into one model. The mixed-effects model adds a random-effects term, which is associated with individual experimental units drawn at random from a population. In this case, it measures the difference between the average brain activation in a given run and the average brain activation across all three runs.
+
+We are simplifying the model because it is much easier to perform a simple linear regression in Python. In addition, we do not have a deep understanding of fMRI data, so a simple linear model should suffice while we are only performing exploratory data analysis and looking for obvious patterns in the data.
+
+After looking at the initial results from our linear regression model, we can decide whether to explore further the relationship between the dependent variable (BOLD data) and the independent variables (gain and loss), and whether to go on to fit a mixed-effects model.
+
+
+\subsection{Issues with analyses and potential solutions}
+\subsubsection{Selecting specific regions to further explore correlation between neural and behavioral activity}
+\indent \indent Since we do not know which sections of the brain might show large differences in activation, it is hard for us to pick the regions in which to explore further the correspondence between neural and behavioral loss aversion.
+
+There are two potential ways to deal with this issue. The first is to read more papers and related articles to learn which parts of the brain are likely to react in our scenario, in which subjects face combinations of potential gains and losses. The second is to fit a regression for every part of the brain and look for the areas with higher correspondence (higher slope). Then, we select and graph a few areas with the most significant positive or negative correlation between the parametric response to potential losses and behavioral loss aversion ($\ln(\lambda)$) across participants.
+
+
+\subsubsection{Producing heat map}
+\indent \indent Another issue that we are facing during our project is finding the same region to plot for each participant. We see that each region of the brain has its own standard coordinates. However, without much knowledge of fMRI, we are not sure how to use these standard coordinates to locate the regions of the brain.
+
+From our understanding, each subject's brain is mapped onto a standard brain and we then use the coordinates for the standard brain to extract data from the areas we are interested in. However, currently, we don't have the skill to perform this step.
+
+\subsubsection{Further Research}
+
+To examine the relationship between neural activity and behavioral response, we also fit a linear regression model that combines behavioral and BOLD data, a method different from the one described in the paper. We add the behavioral response to the regression model on BOLD data as a predictor, using the original 4-level response coded as shown below. \\
+
+\begin{tabular}{lllll}
+\hline
+behavioral response & strongly accept & weakly accept & weakly reject & strongly reject\\ 
+\hline
+$X_{behav}$ & 1 & 2 & 3 & 4 \\
+\hline
+\end{tabular}
+
+The model is as follows:
+
+\begin{equation}
+Y_{i} = \beta_{i, 0} + \beta_{i, behav} X_{behav} + \epsilon_i
+\end{equation}
+
+However, since the response and the levels of loss and gain are potentially correlated, we might need to use stepwise regression to choose the best predictors among the behavioral response, loss and gain.
+
+\subsubsection{Inferences on Data}
+\indent \indent After fitting regression models to our BOLD and behavioral data, we will try to assess and validate our models. To do this, we will calculate the residual sum of squares for each model. We can also calculate the t-statistic and p-value for each beta coefficient to check whether it is statistically significant at the 5\% significance level.
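+
+As a sketch of these quantities for the voxelwise linear model, assume ordinary least squares with $n$ time points and a design matrix $X$ with $p$ columns (the symbols $n$, $p$, $X$ and the coefficient index $k$ are notation introduced here):
+
+\begin{displaymath}
+RSS_i = \sum_{t=1}^{n} (Y_{i, t} - \hat{Y}_{i, t})^2, \qquad \hat{\sigma}_i^2 = \frac{RSS_i}{n - p}, \qquad t_{i, k} = \frac{\hat{\beta}_{i, k}}{\sqrt{\hat{\sigma}_i^2 \left[ (X^{T} X)^{-1} \right]_{kk}}}
+\end{displaymath}
+
+Under the usual assumption of independent Gaussian errors, $t_{i, k}$ is compared with a $t$ distribution with $n - p$ degrees of freedom. That assumption may not hold for fMRI time series, which are typically autocorrelated, so the resulting p-values should be read with caution.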
+
+
+\section{Results}
+\section{Discussion}
+
+
+\bibliography{project}
+
+\end{document}
diff --git a/slides/progress.md b/slides/progress.md
@@ -34,3 +34,42 @@
 ## Statistical Analysis
 
 - linear model
+
+# Our Process
+
+## Hardest part of process?
+- Working with the fMRI data and trying to understand our paper
+- Keeping up with documentation 
+
+## Success in overcoming these obstacles?
+- Using git workflows to raise issues and problems for the group.
+- Limited success in the fMRI part, still figuring things out.
+
+# Our Process Part 2
+## Issues facing the team?
+- Debugging each other's code when Travis CI fails in a pull request
+- Addressing this by meeting up for teamwork or asking for help
+
+## Most helpful?
+- Python/numpy, lab sections with git workflows
+
+## Most confusing?
+- fMRI lectures
+
+# Our Process Part 3
+
+## What do you need to do to successfully complete the project?
+- Have a clear idea of what we can get done
+- Make work as reproducible as possible
+
+## Difficulty in reproducibility?
+- Very frustrating if Travis fails on a pull request
+- Remembering to write documentation in the scripts
+- Test functions for plotting functions are hard to write/assert
+- With a lot of work we may be able to get most of it reproducible
+
+## Remaining weeks:
+- Mostly unstructured time would be helpful
+- Could cover:
+  - Software tools like statsmodels, make
+  - More regression: linear and logistic