unexpected keyword argument 'size' #3

iwwwish · 2021-07-21T19:59:41Z

Hi Charles,

First, I'd like to thank you for contributing this interesting work and sharing the code with the community. I am trying to run the GuiltyTargets pipeline using the example data shared in the earlier issue #2 (after fixing the problems with the data files, of course). To do this, I created a conda env and installed all the packages as suggested, but ran into the issue below. I would really appreciate your help in fixing it.

I originally wanted to reproduce the results from your paper, but since the Open Targets REST API has been deprecated and replaced by a GraphQL API, I had to work directly with the code from this repo. Also, you mention that some data is provided in the supplementary information, but I couldn't find any supplementary data either with the original publication or on bioRxiv. It would be nice to see those tables too.

Thank you,
Vishal

# imports
from guiltytargets.pipeline import run

# define constants
input_directory = 'exampleData/'
targets_path = 'exampleData/known_targetID.csv'
ppi_graph_path = 'exampleData/ppi_graph.csv'
dge_path = 'exampleData/dge3.tsv'
auc_output_path = 'exampleData/'
probs_output_path = 'exampleData/'
max_padj = 0.05
lfc_cutoff = 1
entrez_id_name = 'Entrez id'
log_fold_change_name = 'Log fold change'
adjusted_p_value_name = 'Adjusted p value'
base_mean_name = 'Base mean'
split_char = ';'
confidence_cutoff = 0.1

# run GuiltyTargets
run(
    input_directory,
    targets_path,
    ppi_graph_path,
    dge_path,
    auc_output_path,
    probs_output_path,
    max_adj_p=max_padj,
    max_log2_fold_change=lfc_cutoff * -1,
    min_log2_fold_change=lfc_cutoff,
    entrez_id_header=entrez_id_name,
    log2_fold_change_header=log_fold_change_name,
    adj_p_header=adjusted_p_value_name,
    base_mean_header=base_mean_name,
    entrez_delimiter=split_char,
    ppi_edge_min_confidence=confidence_cutoff,
)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-713dbffdce87> in <module>
     14     base_mean_header=base_mean_name,
     15     entrez_delimiter=split_char,
---> 16     ppi_edge_min_confidence=confidence_cutoff
     17  )

~/work/mentoring/summer_2021/code/guiltytargets/src/guiltytargets/pipeline.py in run(input_directory, targets_path, ppi_graph_path, dge_path, auc_output_path, probs_output_path, max_adj_p, max_log2_fold_change, min_log2_fold_change, entrez_id_header, log2_fold_change_header, adj_p_header, base_mean_header, entrez_delimiter, ppi_edge_min_confidence)
     58         directory=input_directory,
     59         targets=targets,
---> 60         network=network,
     61     )
     62 

~/work/mentoring/summer_2021/code/guiltytargets/src/guiltytargets/pipeline.py in rank_targets(network, targets, directory)
    110         gat2vec_config.dimension,
    111         gat2vec_config.window_size,
--> 112         output=True,
    113     )
    114     classifier = Classification(directory, directory, tr=gat2vec_config.training_ratio)

~/anaconda3/envs/guiltytargets/lib/python3.6/site-packages/GAT2VEC/gat2vec.py in train_gat2vec(self, nwalks, wlength, dsize, wsize, output)
     66             fname = paths.get_embedding_path(self.dataset_dir, self.output_dir)
     67             gat2vec_model = self._train_gat2vec(dsize, fname, nwalks, output, walks_structure,
---> 68                                                 wlength, wsize)
     69         return gat2vec_model
     70 

~/anaconda3/envs/guiltytargets/lib/python3.6/site-packages/GAT2VEC/gat2vec.py in _train_gat2vec(self, dsize, fname, nwalks, output, walks_structure, wlength, wsize, add_structure)
     93         if add_structure:
     94             walks = walks_structure + walks
---> 95         gat2vec_model = self._train_word2Vec(walks, dsize, wsize, 4, output, fname)
     96         return gat2vec_model

~/anaconda3/envs/guiltytargets/lib/python3.6/site-packages/GAT2VEC/gat2vec.py in _train_word2Vec(self, walks, dimension_size, window_size, cores, output, fname)
     48         model = Word2Vec([list(map(str, walk)) for walk in walks],
     49                          size=dimension_size, window=window_size, min_count=0, sg=1,
---> 50                          workers=cores)
     51         if output is True:
     52             model.wv.save_word2vec_format(fname)

TypeError: __init__() got an unexpected keyword argument 'size'

cthoyt · 2021-07-21T23:13:28Z

This is a known issue because gensim changed their arguments in the word2vec model. See also:

At this point I'm not super enthusiastic about updating the code in this repo for a few reasons:

GAT2VEC didn't really provide any meaningful improvements over standard random walks with DeepWalk. Our supervisor was adamant that we needed this, so we included it, but I would suggest skipping this part completely
Node2vec has some much newer/better implementations that perform better than DeepWalk
I've re-implemented this pipeline a few times in different places - ultimately the training we did here is pretty routine, but we just didn't have the bandwidth to clean this code up at the end of my master's student's time

If you're gearing up for a publication and need a co-author, I could probably find some time to give some real support. In the meantime, I'd suggest checking out some of the follow-up work to GuiltyTargets that uses the same ideas but has somewhat cleaner and more reusable code:

iwwwish · 2021-07-22T00:49:58Z

Charles, thank you very much for the prompt response. I figured out the issue with gensim and a couple of other packages that gat2vec relies on, which had to be downgraded in order to use the code 'as is'.

My summer intern is trying to use the approach for target prioritization, so I am not sure we will be ready for a publication any time soon, but that is an encouraging thought. Thanks for offering to help. I'll reach out if we decide to pursue this idea further.

Vishal

ozlemmuslu · 2021-07-28T18:15:32Z

Dear Vishal,

Thank you for your interest in our work. If you let me know which versions Gat2Vec requires, I can update the documentation.

I'd like to add that using Gat2Vec as opposed to DeepWalk increased the performance by 1-2%, which could be important depending on how many candidates you are working with.

You can address your questions to me in the future as this is primarily my work. We no longer work on this project, but the purpose of making it open source was so that the community could also contribute.

Best,
Özlem

SalvatoreRa · 2021-11-04T16:07:43Z

Dear Özlem,

I have the same issue. How can I solve it?

Thank you for your help.

Best,
Salvatore

ozlemmuslu · 2021-11-11T16:38:23Z

Dear Salvatore,

The error is caused by a version mismatch in the gensim library. Originally, one of the parameters to initialize a Word2Vec object was named size, but it changed to vector_size.
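
The rename can also be handled at call time rather than by pinning a gensim version. The following is only a minimal sketch, assuming the rename took effect with gensim 4.0; word2vec_kwargs is a hypothetical helper written for illustration and is not part of gat2vec or gensim. The other keyword values mirror the Word2Vec call shown in the traceback above.

```python
def word2vec_kwargs(gensim_version, dimension_size, window_size, cores):
    """Build Word2Vec keyword arguments for a given gensim version string.

    Hypothetical helper for illustration; not part of gat2vec or gensim.
    """
    major = int(gensim_version.split(".")[0])
    # Assumption: gensim 4.0 is the release that renamed `size` to `vector_size`.
    size_key = "vector_size" if major >= 4 else "size"
    return {
        size_key: dimension_size,
        "window": window_size,
        "min_count": 0,
        "sg": 1,
        "workers": cores,
    }


print(word2vec_kwargs("3.8.3", 128, 5, 4))
print(word2vec_kwargs("4.1.2", 128, 5, 4))
```

With gensim installed, the result could be passed along as Word2Vec(walks, **word2vec_kwargs(gensim.__version__, dimension_size, window_size, cores)).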

I have now updated the gat2vec library, which uses gensim, to be compatible with more recent versions of gensim. Please do a clean install (including, and especially, gat2vec) and let me know if it works now.

Best,
Özlem

SalvatoreRa · 2021-11-12T16:29:53Z

Dear Özlem,

Thank you for updating the code. I have been working on this on my machine over the last few days: I uninstalled and reinstalled all the libraries, and it now appears to work.

However, it may be useful for you (or anyone else who wants to use it) to know that after uninstalling and reinstalling Gat2vec, GuiltyTargets and deepwalk, I was still getting errors.

The first error occurs during import guiltytargets; fixing it requires uninstalling and reinstalling gensim.

The second was caused by a deprecated function in the gat2vec parser file: as_matrix has been deprecated and removed from pandas. Here is the corrected code that resolved the error for me:

import pandas as pd


def get_embeddingDF(fname):
    """returns the embeddings read from file fname."""
    df = pd.read_csv(fname, header=None, skiprows=1, delimiter=' ')
    df.sort_values(by=[0], inplace=True)
    df = df.set_index(0)
    return df.to_numpy()
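
To make clear what this function hands back, here is a rough standard-library sketch of the same steps (skip the header row, sort by the first column, keep the remaining columns as vectors). It assumes the file is in the word2vec text format written by save_word2vec_format and that node IDs are integers; read_embedding_rows and the sample data are hypothetical and not part of gat2vec.

```python
import io


def read_embedding_rows(handle):
    """Parse word2vec text format into (node_id, vector) pairs sorted by node id."""
    next(handle)  # skip the "<count> <dimension>" header line
    rows = []
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        # Assumption: the first field is an integer node id.
        rows.append((int(parts[0]), [float(x) for x in parts[1:]]))
    rows.sort(key=lambda row: row[0])
    return rows


# Made-up sample: 3 nodes, 2 dimensions, stored out of order.
sample = io.StringIO("3 2\n2 0.5 0.1\n0 0.2 0.9\n1 0.7 0.3\n")
for node_id, vector in read_embedding_rows(sample):
    print(node_id, vector)
```

The pandas version above returns only the vectors as a NumPy array, with row order following the sorted node IDs.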

There is another error in the gat2vec evaluation file: the iid=False argument has been deprecated and removed from scikit-learn. Here is the updated code:

def evaluate_cv(self, clf, embedding, n_splits):
        """Do a repeated stratified cross validation.

        :param clf: Classifier object.
        :param embedding: The feature matrix.
        :param n_splits: Number of folds.
        :return: Dictionary containing numerical results of the classification.
        """
        embedding = embedding[self.label_ind, :]
        results = defaultdict(list)
        grid = {
            'C': np.logspace(-4, 4, 20),
            'tol': [0.0001, 0.001, 0.01]
        }
        log_reg = linear_model.LogisticRegression(solver='liblinear')

        # tol, C
        for i in range(10):
            inner_cv = StratifiedKFold(n_splits=n_splits, shuffle=True)
            outer_cv = StratifiedKFold(n_splits=n_splits, shuffle=True)

            for train_idx, test_idx in outer_cv.split(embedding, self.labels):
                clf = GridSearchCV(estimator=log_reg, param_grid=grid, cv=inner_cv)
                clf.fit(embedding, self.labels)

                print('Parameter fitting done. clf: {}'.format(clf))

                X_train, X_test, Y_train, Y_test = self._get_split(embedding, test_idx, train_idx)
                pred, probs = self.get_predictions(clf, X_train, X_test, Y_train, Y_test)
                results["TR"].append(i)
                results["accuracy"].append(accuracy_score(Y_test, pred))
                results["f1micro"].append(f1_score(Y_test, pred, average='micro'))
                results["f1macro"].append(f1_score(Y_test, pred, average='macro'))
                if self.label_count == 2:
                    results["auc"].append(roc_auc_score(Y_test, probs[:, 1]))
                else:
                    results["auc"].append(0)
        return results
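
If the same code has to run against library versions with and without a removed argument such as iid, one option is to drop keywords the callee no longer accepts. This is only a sketch of that general technique using the standard library; supported_kwargs and NewStyleSearch are hypothetical names, and NewStyleSearch merely stands in for an estimator whose iid argument has been removed.

```python
import inspect


def supported_kwargs(callable_obj, **kwargs):
    """Keep only the keyword arguments that callable_obj still accepts."""
    params = inspect.signature(callable_obj).parameters
    # A **kwargs catch-all means we cannot tell what is accepted; pass everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class NewStyleSearch:
    """Hypothetical stand-in for an estimator whose `iid` argument was removed."""

    def __init__(self, estimator, param_grid, cv=None):
        self.estimator = estimator
        self.param_grid = param_grid
        self.cv = cv


kwargs = supported_kwargs(
    NewStyleSearch, estimator="log_reg", param_grid={"C": [1.0]}, cv=5, iid=False
)
search = NewStyleSearch(**kwargs)
print(sorted(kwargs))
```

Silently dropping an argument is only reasonable when it no longer has any effect; otherwise simply deleting it from the call, as in the updated code above, is clearer.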

I did not receive any other errors, and it ran smoothly after that.

Since I had to slightly modify the code, I have a couple of additional questions.

a) After running GuiltyTargets with my files, it returns the following outputs:

_gat2vec.emb, which I suppose is the embedding
_na.adjlist (I suppose this is an adjacency list, but it is not clear of what)
_graph.adjlist, which I suppose is the adjacency list of the graph
labels_maped.txt, which I suppose is the list of labels (if I understood correctly, the provided labels are just mapped to the graph before the classification step)
probe_df.csv, which I suppose is the file produced by the classification step, where the other Entrez genes are classified as possible targets or not
auc_df, the results of the cross-validation

Is that correct?

b) probes_df returns a dataframe with 3 columns (0, 1, entrez), which I suppose are the probabilities for each Entrez gene of being a target (class 1) or not a target (class 0). Is that correct? Did you apply an argmax to the probabilities to decide whether a gene is a target?

c) Did you use the probabilities to rank the targets?

best,

Salvatore

ozlemmuslu · 2021-11-18T15:59:53Z

Dear Salvatore,

Thank you for your input. Would you like to do a pull request so your contribution to the repository is more visible?

Regarding your questions:
a) .emb is the embedding. The .adjlist files are needed by Gat2Vec: one is the adjacency list of the structural graph, the other that of the attribute graph.
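
For readers unfamiliar with adjacency lists, the sketch below shows how such files are commonly laid out and read. It assumes the usual whitespace-separated layout (a node followed by its neighbours on each line) and that the attribute graph links each node to attribute nodes; neither detail is confirmed in this thread, and parse_adjlist and the sample lines are hypothetical.

```python
def parse_adjlist(lines):
    """Parse 'node neighbour neighbour ...' lines into an undirected adjacency map."""
    graph = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        node, *neighbours = line.split()
        graph.setdefault(node, set())
        for neighbour in neighbours:
            # Record the edge in both directions.
            graph[node].add(neighbour)
            graph.setdefault(neighbour, set()).add(node)
    return graph


# Made-up examples: node-to-node edges versus node-to-attribute edges.
structural = parse_adjlist(["1 2 3", "2 3"])
attribute = parse_adjlist(["1 up", "2 up", "3 down"])
print(sorted(structural["3"]))
print(sorted(attribute["up"]))
```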

For the rest of your questions, I will need to double check the output files, then I will get back to you.

I am closing this issue, since it's no longer about the original question. Here are the new issues:

#4
#5

Best,
Özlem

This was referenced Nov 18, 2021

Add output file explanations #4

Closed

Update deprecated functions #5

Closed

ozlemmuslu closed this as completed Nov 18, 2021

ozlemmuslu added a commit to GuiltyTargets/GAT2VEC that referenced this issue Dec 22, 2021

Update parsers.py

18e2f9f

Update deprecated function based on GuiltyTargets/guiltytargets#3 (comment)

ozlemmuslu added a commit to GuiltyTargets/GAT2VEC that referenced this issue Dec 22, 2021

Update classification.py

b500f0f

Update based on GuiltyTargets/guiltytargets#3 (comment)
