Merge pull request #321 from berkeley-stat159/master

merge from master to final2

timothy1191xa committed Dec 16, 2015
2 parents b436d84 + e2417f5 commit ebb4751
Showing 26 changed files with 404 additions and 152 deletions.
44 changes: 33 additions & 11 deletions Makefile
@@ -1,4 +1,4 @@
.PHONY: all clean coverage test verbose
.PHONY: all clean coverage test verbose data

all: clean

@@ -15,35 +15,57 @@ verbose:
	nosetests data/tests code/utils/tests -v

data:
	cd data && make download_data
	cd data && make download_all

validate_data:
	cd data && make validate_data

eda:
	cd code/utils/scripts && python eda.py
	cd code/utils/scripts && python hist-outliers_script.py

linear:
	python code/stat159epsilon/code/utils/scripts/linear_regression_script.py
	cd code/utils/scripts && python linear_regression_script.py

lostic:
	python code/stat159epsilon/code/utils/scripts/log_regression_script.py
logistic:
	cd code/utils/scripts && python log_regression_script.py

t-test:
	python code/stat159epsilon/code/utils/scripts/t_test_plot_script.py
	cd code/utils/scripts && python t_test_plot_script.py

convolution-high:
	python code/stat159epsilon/code/utils/scripts/convolution_high_res_script.py
	cd code/utils/scripts && python convolution_high_res_script.py

convolution-normal:
	python code/stat159epsilon/code/utils/scripts/convolution_normal_script.py
	cd code/utils/scripts && python convolution_normal_script.py

correlation:
	python code/stat159epsilon/code/utils/scripts/correlation_script
	cd code/utils/scripts && python correlation_script.py

glm:
	cd code/utils/scripts && python glm_script.py

noise-pca:
	cd code/utils/scripts && python noise-pca_script.py
	cd code/utils/scripts && python noise-pca_filtered_script.py

multi-comparison:
	python code/stat159epsilon/code/utils/scripts/multi_comparison_script.py
	cd code/utils/scripts && python multi_beta_script.py
	cd code/utils/scripts && python multi_comparison_script.py

all-analysis:
	make eda
	make linear
	make logistic
	make t-test
	make convolution-high
	make convolution-normal
	make t-test
	make glm
	make correlation
	make noise-pca
	make multi-comparison

report:
	cd paper && make all


40 changes: 34 additions & 6 deletions README.md
@@ -9,28 +9,56 @@
_**Topic:**_ [The Neural Basis of Loss Aversion in Decision-Making Under Risk]

## Overview
This repository attempts to reproduce the original analysis on "The Neural Basis of Loss Aversion in Decision-Making Under Risk" done by Sabrina M. Tom, Craig R. Fox, Christopher Trepel, Russell A. Poldrack. The imaging data were collected using the fMRI method. They were processed and analyzed in order to identify the regions of the brain activated by the decision making process. This study also investigated the relationship between the brain activity and the behavior of the subjects towards the gambling situations using a whole-brain robust regression analysis.
This repository attempts to reproduce the original analysis on
"The Neural Basis of Loss Aversion in Decision-Making Under Risk"
done by Sabrina M. Tom, Craig R. Fox, Christopher Trepel, Russell A. Poldrack.
The imaging data were collected using the fMRI method. They were processed
and analyzed in order to identify the regions of the brain activated by the
decision making process. This study also investigated the relationship between
the brain activity and the behavior of the subjects towards the gambling situations
using a whole-brain robust regression analysis.
Please follow the instructions below to explore the repository.
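
For readers unfamiliar with the term, a robust regression downweights outlying observations instead of letting them dominate the fit. A minimal, illustrative sketch, assuming `statsmodels` is available (this is not part of the repository's code):

```python
import numpy as np
import statsmodels.api as sm

# Toy data: a linear trend plus a few outlying responses
rng = np.random.RandomState(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)
y[:5] += 10                    # contaminate five observations

X = sm.add_constant(x)         # add an intercept column
fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print(fit.params)              # the slope estimate stays close to 2
```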


## Directions
1. Clone the repo: `git clone https://github.com/berkeley-stat159/project-epsilon.git`
2. Install python dependencies with pip: `pip install -r requirements.txt`

### Navigation
- Data `make data` : Downloads the ds005 dataset including brain scan images of totl 16 subjects
- Data `make data` : Downloads the ds005 dataset, including brain scan images for a total of
16 subjects. When run from this repository, this command will download the raw data and
the filtered data provided. The total size of the files is ~17GB.

- Validate `make validate` : Validates the downloaded data to check if it's ok to run on
- Validate `make validate_data` : Validates the downloaded data

- All Analysis `make all-analysis` : Executes all analysis and creates relevant img files under fig/ folder
- Clean `make clean` : Removes compiled Python files

- Test `make test` : Tests the functions in code/utils folder

- Coverage `make coverage` : Creates a coverage report for the functions in code/utils/ folder

- Verbose `make verbose` : Tests the functions in the code/utils folder via the nosetests `-v` option

- Report `make verbose` : Creates a report in pdf file under paper/ folder
- Report `make report` : Creates final_report.pdf under paper/

- All Analysis `make all-analysis` : Executes all the analyses and creates the relevant
image files under the fig/ folder
- NOTICE : `make multi-comparison` will run for about 1 hour because it has to generate all the beta values for every single voxel for each subject over the time course (see the sketch after this list)

- If you want to run an analysis individually, please be aware of the following dependencies:
- noise-pca (prerequisites: convolution)
- noise-pca_filtered (prerequisites: convolution, download_all)

- To run each analysis individually:
- `make eda`
- `make linear`
- `make logistic`
- `make convolution-high`
- `make convolution-normal`
- `make t-test`
- `make glm`
- `make correlation`
- `make noise-pca`
- `make multi-comparison`
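
As a rough illustration of why `make multi-comparison` is slow: "generating the beta values for every voxel" amounts to a voxel-by-voxel regression over the whole 4D image. A hypothetical sketch (`voxelwise_betas_sketch` is invented for this example and is not the repository's code):

```python
import numpy as np

def voxelwise_betas_sketch(data4d, design):
    """Fit Y = X B for every voxel: data4d is (x, y, z, t), design is (t, p)."""
    nx, ny, nz, t = data4d.shape
    Y = data4d.reshape((-1, t)).T          # time by voxels
    B = np.linalg.pinv(design).dot(Y)      # least-squares betas, (p, n_voxels)
    return B.reshape((design.shape[1], nx, ny, nz))
```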

## Contributors
Min Gu Jo ([`mingujo`](https://github.com/mingujo))\\
10 changes: 10 additions & 0 deletions code/README.md
@@ -0,0 +1,10 @@
### Navigation
This directory contains the `utils` directory, which includes three subdirectories:
- `scripts`: all the scripts written for the data analysis, along with the plotting
functions that were not included in `functions` because they are not testable.
- `functions`: the functions written to be used in the scripts for the data
analysis. Each function in this directory has a test associated with it
(see the sketch below).
- `tests`: the tests written for the functions from the `functions` directory.
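
As a minimal, hypothetical illustration of this pairing (`scale_data` and `test_scale_data` are invented names, not functions from the repository):

```python
# code/utils/functions/scale_data.py (hypothetical)
import numpy as np

def scale_data(arr):
    """Rescale an array linearly to the [0, 1] range."""
    arr = np.asarray(arr, dtype=float)
    return (arr - arr.min()) / (arr.max() - arr.min())

# code/utils/tests/test_scale_data.py (hypothetical)
from numpy.testing import assert_allclose

def test_scale_data():
    assert_allclose(scale_data([2.0, 4.0, 6.0]), [0.0, 0.5, 1.0])
```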

42 changes: 21 additions & 21 deletions code/utils/functions/linear_regression.py
@@ -17,19 +17,19 @@
from logistic_reg import *


"""
Parameters:
def load_data(subject, data_dir = "/Users/macbookpro/Desktop/stat159_Project/"):

"""
Parameters:
subject: 1 - 16
data_dir: The working directory that you store your data
subject: 1 - 16
data_dir: The working directory that you store your data
Return:
run_total: the data for 3 runs
Return:
run_total: the data for 3 runs
"""
def load_data(subject, data_dir = "/Users/macbookpro/Desktop/stat159_Project/"):

"""
# Get the directory where data is stored
data_location = data_dir + 'ds005/sub' + subject

@@ -45,22 +45,22 @@ def load_data(subject, data_dir = "/Users/macbookpro/Desktop/stat159_Project/"):
return(run_total)


"""
To perform linear regression

Parameters:
def linear_regression(data, y, *arg):
"""
To perform linear regression
data: The dataset that contains variables
y: Dependent variable
args: Explanatory variable(s)
Parameters:
Return:
beta: The coefficients for explantatory variables
pvalues: The pvalues for each explantatory variables
data: The dataset that contains variables
y: Dependent variable
args: Explanatory variable(s)
"""
def linear_regression(data, y, *arg):
Return:
beta: The coefficients for explantatory variables
pvalues: The pvalues for each explantatory variables
"""
# Get the length of data
n = len(data)
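
The rest of the function body is not shown in this diff. For orientation, a generic least-squares fit that returns coefficients and p-values, in the spirit of the docstring above, might look like the following sketch (illustrative only, not the repository's implementation):

```python
import numpy as np
from scipy import stats

def ols_sketch(y, X):
    """Illustrative OLS: returns coefficients and two-sided t-test p-values."""
    X = np.column_stack((np.ones(len(y)), X))   # prepend an intercept column
    beta = np.linalg.pinv(X).dot(y)             # least-squares estimate
    resid = y - X.dot(beta)
    df = len(y) - X.shape[1]                    # residual degrees of freedom
    mse = resid.dot(resid) / df
    se = np.sqrt(mse * np.diag(np.linalg.pinv(X.T.dot(X))))
    pvalues = 2 * stats.t.sf(np.abs(beta / se), df)
    return beta, pvalues
```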

101 changes: 101 additions & 0 deletions code/utils/scripts/eda.py
@@ -0,0 +1,101 @@
"""
This script plots some exploratory analysis plots for the raw and filtered data:
- Moisaic of the mean voxels values for each brain slices
Run with:
python eda.py
from this directory
"""
from __future__ import print_function, division
import sys, os, pdb
import numpy as np
import matplotlib.pyplot as plt
import nibabel as nib
sys.path.append(os.path.join(os.path.dirname(__file__), "./"))
from plot_mosaic import *
from mask_filtered_data import *

# Locate the paths
project_path = '../../../'
data_path = project_path+'data/ds005/'
path_dict = {'data_filtered':{
'type' : 'filtered',
'feat' : '.feat',
'bold_img_name' : 'filtered_func_data_mni.nii.gz',
'run_path' : 'model/model001/'
},
'data_original':{
'type' : '',
'feat': '',
'bold_img_name' : 'bold.nii.gz',
'run_path' : 'BOLD/'
}}

#subject_list = [str(i) for i in range(1,17)]
#run_list = [str(i) for i in range(1,4)]

# Run only for subject 1 and 5 - run 1
run_list = [str(i) for i in range(1,2)]
subject_list = ['1', '5']

# set gray colormap and nearest neighbor interpolation by default
plt.rcParams['image.cmap'] = 'gray'
plt.rcParams['image.interpolation'] = 'nearest'

# Create the needed directories if they do not exist
dirs = [project_path+'fig/',\
project_path+'fig/BOLD']
for d in dirs:
    if not os.path.exists(d):
        os.makedirs(d)

# Template to plot the unmasked filtered data
template_path = project_path+'data/mni_icbm152_t1_tal_nlin_asym_09c_2mm.nii'

# Loop through the data types - raw or filtered
for dat in path_dict:
    d_path = path_dict[dat]
    # Set the data names and paths
    images_paths = [('ds005' + '_sub' + s.zfill(3) + '_t1r' + r,
                     data_path + 'sub%s/' % (s.zfill(3)) + d_path['run_path']
                     + 'task001_run%s%s/%s' % (r.zfill(3), d_path['feat'],
                                               d_path['bold_img_name']))
                    for r in run_list
                    for s in subject_list]
    for image_path in images_paths:
        name = image_path[0]
        # Plot
        if d_path['type'] == 'filtered':
            data = nib.load(image_path[1]).get_data()
            data = data.astype(float)
            mean_data = np.mean(data, axis=-1)
            Transpose = False
            template_data = nib.load(template_path).get_data()
            plt.imshow(plot_mosaic(template_data, transpose=Transpose),
                       cmap='gray', alpha=1)
        else:
            img = nib.load(image_path[1])
            data = img.get_data()
            data = data.astype(float)
            mean_data = np.mean(data, axis=-1)
            # Rough in-brain mask: voxels whose mean value exceeds 375
            in_brain_mask = mean_data > 375
            Transpose = True
            plt.contour(plot_mosaic(in_brain_mask, transpose=Transpose),
                        cmap='gray', alpha=1)
        plt.imshow(plot_mosaic(mean_data, transpose=Transpose),
                   cmap='gray', alpha=1)
        plt.colorbar()
        plt.title('Voxels mean values' + '\n' + (d_path['type'] + str(name)))
        plt.savefig(project_path + 'fig/BOLD/%s_mean_voxels.png'
                    % (d_path['type'] + str(name)))
        #plt.show()
        plt.clf()


print("======================================")
print("\nEDA analysis done")
print("Mosaic plots in project_epsilon/fig/BOLD/ \n\n")

25 changes: 15 additions & 10 deletions code/utils/scripts/hist-outliers_script.py
@@ -88,9 +88,10 @@
plt.plot([0, N], [thresholds[0], thresholds[0]], ':', label='IQR lo')
plt.plot([0, N], [thresholds[1], thresholds[1]], ':', label='IQR hi')
plt.title('voxels std ' + str(name))
plt.xlabel('time')
plt.legend(fontsize=11, \
ncol=2, loc=9, borderaxespad=0.2)
plt.savefig(project_path+'fig/outliers/vol_std_' + str(name) + '.png')
plt.savefig(project_path+'fig/outliers/%s_vol_std.png' %str(name))
plt.close()

#RMS difference values
@@ -103,28 +104,32 @@
plt.plot(x[rms_outliers], rms_dvals[rms_outliers], 'o', label='outliers')
plt.plot([0, N], [rms_thresholds[0], rms_thresholds[0]], ':', label='IQR lo')
plt.plot([0, N], [rms_thresholds[1], rms_thresholds[1]], ':', label='IQR hi')
plt.title('voxels rms difference' + str(name))
plt.legend()
plt.savefig(project_path+'fig/outliers/vol_rms_outliers_' + str(name) + '.png')
plt.title('voxels rms difference ' + str(name))
plt.xlabel('time')
plt.legend(fontsize=11, \
ncol=2, loc=9, borderaxespad=0.2)
plt.savefig(project_path+'fig/outliers/%s_vol_rms_outliers.png'%str(name))
plt.close()
#Label the outliers
T = data.shape[-1]
ext_outliers = diagnostics.extend_diff_outliers(rms_outliers)
np.savetxt(project_path+'txt_output/outliers/extended_vol_rms_outliers_' + \
str(name) + '.txt', ext_outliers)
np.savetxt(project_path+'txt_output/outliers/%s_extended_vol_rms_outliers.txt' \
%str(name), ext_outliers)
x = np.arange(T)
rms_dvals_ext = np.concatenate((rms_dvals, (0,)), axis=0)
plt.plot(rms_dvals_ext, label='vol RMS differences ' + str(name))
plt.plot(x[ext_outliers], rms_dvals_ext[ext_outliers], 'o', label='outliers')
plt.plot([0, N], [rms_thresholds[0], rms_thresholds[0]], ':', label='IQR lo')
plt.plot([0, N], [rms_thresholds[1], rms_thresholds[1]], ':', label='IQR hi')
plt.legend()
plt.xlabel('time')
plt.legend(fontsize=11, \
ncol=2, loc=9, borderaxespad=0.2)
plt.savefig(project_path+\
'txt_output/outliers/%s_extended_vol_rms_outliers.png'%str(name))
'fig/outliers/%s_extended_vol_rms_outliers.png'%str(name))
plt.close()

print("=============================")
print("\nHistograms and outliers plots generated")
print("\nSee project-epsilon/fig/histograms")
print("\nSee project-epsilon/fig/outliers")
print("See project-epsilon/fig/histograms")
print("See project-epsilon/fig/outliers\n\n")

