Fcma partial sim matrix #168

Merged
merged 8 commits into from Jan 26, 2017

yidawang
Member

add partial similarity matrix algorithm in correlation-based classification
add the corresponding test code
rename some methods
improve the example code

@@ -245,6 +303,12 @@ def fit(self, X, y):
and prepared for correlation computation.
assuming all elements of X has the same num_voxels value
y: labels, len(X) equals len(Y)
num_training_samples: int, default None
the number of samples that used in the training,
Member

Typo: "that used".

Member Author

done

@@ -79,27 +91,43 @@ class Classifier(BaseEstimator):

num_samples_: int
The number of samples of the training set

num_digits_: int
The number of digit of the first value of the kernel matrix,
Member

Typo: "number of digit".

Member Author

done

Used for SVM with precomputed kernel,
every time only compute correlation between num_process_voxels and
the whole brain to aggregate the kernel matrices.
This is to better use the memory
Member

You mean to save memory? Also, please write sentences in docstrings, including full stops and all other punctuation.

Member Author

Yes, computing the similarity matrix portion by portion saves memory, which lets us handle correlations at a larger scale.
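
As a rough illustration of the portion-by-portion idea described in this reply, here is a minimal NumPy sketch. It is not the code from this pull request: the function names `normalize_over_time` and `aggregate_similarity_matrix` are hypothetical, and it assumes a linear kernel over per-sample voxel-by-voxel Pearson correlations. Only `num_process_voxels` is taken from the docstring under review.

```python
import numpy as np


def normalize_over_time(sample):
    """Center and scale each voxel's time series to unit norm.

    After this step, the dot product of two columns is their Pearson r.
    """
    centered = sample - sample.mean(axis=0)
    norms = np.linalg.norm(centered, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for constant voxels
    return centered / norms


def aggregate_similarity_matrix(X, num_process_voxels):
    """Build the num_samples x num_samples linear kernel block by block.

    X is a list of arrays in shape [num_TRs, num_voxels] (as in the
    docstring above). Only one block of correlation features is held in
    memory at a time; the name and signature are assumptions.
    """
    normalized = [normalize_over_time(sample) for sample in X]
    num_samples = len(normalized)
    num_voxels = normalized[0].shape[1]
    kernel = np.zeros((num_samples, num_samples))
    for start in range(0, num_voxels, num_process_voxels):
        stop = min(start + num_process_voxels, num_voxels)
        # Correlate the current block of voxels with the whole brain, per
        # sample; each row has (stop - start) * num_voxels features.
        features = np.stack([
            (s[:, start:stop].T @ s).ravel() for s in normalized
        ])
        # Summing the per-block inner products gives the full kernel.
        kernel += features @ features.T
    return kernel
```

Because the linear kernel is a sum over features, adding up the per-block products yields the same matrix as computing all correlations at once, while peak memory scales with `num_process_voxels` rather than with the full voxel count.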

@@ -57,7 +63,10 @@ class Classifier(BaseEstimator):
default None
Member

Attributes are not set during initialization. They are not None.

Member Author

deleted this line

@@ -57,7 +63,10 @@ class Classifier(BaseEstimator):
default None
training_data\_ is None except clf is SVM.SVC with precomputed kernel,
in which case training data is needed to compute
the similarity vector for each sample to be classified
the similarity vector for each sample to be classified.
However, if the test samples are also provided during the fit,
Member

Please add a paragraph at the beginning of the docstring explaining how testing data can be passed to fit as an optimization.

Member Author

done
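
To make the optimization discussed in this thread concrete, the sketch below shows the general scikit-learn pattern for a precomputed kernel when the test samples are known up front. It is an assumption-laden illustration, not the Classifier implementation from this pull request: the toy data, the plain dot-product kernel, and the variable names (other than `num_training_samples` and the `SVC` arguments quoted from the example code) are made up for the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
num_training_samples = 20

# Toy data (assumed): 30 samples, two classes shifted apart in every feature.
labels = np.tile([0, 1], 15)
features = rng.standard_normal((30, 8)) + 3.0 * labels[:, None]

# One pass computes similarities for training and test samples together.
kernel = features @ features.T

svm_clf = SVC(kernel='precomputed', shrinking=False, C=1)
# Fit on the training-by-training block only.
svm_clf.fit(kernel[:num_training_samples, :num_training_samples],
            labels[:num_training_samples])

# Predict from the test-by-training block; the raw training data is no
# longer needed because its similarities to the test samples already exist.
test_kernel = kernel[num_training_samples:, :num_training_samples]
predictions = svm_clf.predict(test_kernel)
decision_values = svm_clf.decision_function(test_kernel)
```

With a precomputed kernel, `fit` expects a square training-by-training matrix and `predict` expects a test-by-training matrix, which is why slicing a single combined matrix is enough for both steps.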

subsequent operations, e.g. getting decision values of the prediction
subsequent operations, e.g. getting decision values of the prediction.
test_data\_ may also be set in the fit method
if SVM.SVC with precomputed kernel and the test samples are known.
Member

Use the full package name in docstrings for modules outside BrainIAK, i.e., sklearn.svm.SVC.

Member Author

done

Parameters
----------
X: a list of numpy array in shape [num_TRs, num_voxels]
len(X) is the number of samples
assuming all elements of X has the same num_voxels value
start_voxel: int, default 0
the starting voxel id for correlation computation
num_voxels_1: int, default None
Member

Please use a descriptive name.

Member Author

done

which is set when the similarity matrix is constructed
portion by portion so the similarity vectors of the
test data have to be computed here.
If it is set, only those samples will be used to fit the model
Member

Document also that this only applies to SVMs with precomputed kernels, where it must be present.

Member Author

done

@@ -27,25 +28,32 @@
logging.basicConfig(level=logging.INFO, format=format, stream=sys.stdout)
logger = logging.getLogger(__name__)

def example_of_aggregating_sim_matrix(raw_data, labels):
# aggregate the similarity matrix to save memory
use_clf = svm.SVC(kernel='precomputed', shrinking=False, C=1)
Member

use_clf seems like a Boolean name. How about svm_clf?

Member Author

done

@mihaic mihaic merged commit ae41eba into brainiak:master Jan 26, 2017
@yidawang yidawang deleted the fcma_partial_sim_matrix branch January 26, 2017 20:03
danielsuo pushed a commit that referenced this pull request Nov 16, 2017

2 participants