MemoryError: Unable to allocate 51.0 GiB #131
Comments
Hi,
Hello, I get the following error:
~\Anaconda3\lib\site-packages\pke\unsupervised\graph_based\multipartiterank.py in candidate_weighting(self, threshold, method, alpha)
~\Anaconda3\lib\site-packages\pke\unsupervised\graph_based\multipartiterank.py in topic_clustering(self, threshold, method)
~\Anaconda3\lib\site-packages\scipy\spatial\distance.py in pdist(X, metric, *args, **kwargs)
MemoryError: Unable to allocate 51.0 GiB for an array with shape (6848888203,) and data type float64
I think your input document might be too big for pke to process. This code should be executed for each document (that is, for each abstract) in the file, not once on the whole file:
import string
import pke
from nltk.corpus import stopwords

extractor = pke.unsupervised.MultipartiteRank()
extractor.load_document(input='../rawdata/dm_abstracts.txt', language='en', normalization='stemming', max_length=12000000)
# keep only nouns, proper nouns and adjectives as candidate words
pos = {'NOUN', 'PROPN', 'ADJ'}
# stoplist: punctuation, bracket tokens and English stopwords
stoplist = list(string.punctuation)
stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
stoplist += stopwords.words('english')
extractor.candidate_selection(pos=pos, stoplist=stoplist)
extractor.candidate_weighting(alpha=1.1, threshold=0.74, method='average')
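A minimal sketch of what running the extractor once per abstract could look like. It assumes the file holds one abstract per line, which this thread does not confirm. `iter_abstracts`, `extract_keyphrases` and `keyphrases_per_abstract` are hypothetical helper names. The pke calls mirror the ones quoted in this thread, except that the text is passed directly instead of a path, and `get_n_best` is used to read the results.

```python
import string
from typing import Iterable, Iterator, List, Tuple


def iter_abstracts(lines: Iterable[str]) -> Iterator[str]:
    """Yield one non-empty abstract per input line (assumed file layout)."""
    for line in lines:
        text = line.strip()
        if text:
            yield text


def extract_keyphrases(text: str, n: int = 10) -> List[Tuple[str, float]]:
    """Run MultipartiteRank on a single abstract."""
    # Imported lazily so the splitting helper works without pke/nltk installed.
    import pke
    from nltk.corpus import stopwords

    extractor = pke.unsupervised.MultipartiteRank()
    # Assumption: load_document accepts the raw text of one abstract.
    extractor.load_document(input=text, language='en', normalization='stemming')

    stoplist = list(string.punctuation)
    stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
    stoplist += stopwords.words('english')

    extractor.candidate_selection(pos={'NOUN', 'PROPN', 'ADJ'}, stoplist=stoplist)
    extractor.candidate_weighting(alpha=1.1, threshold=0.74, method='average')
    return extractor.get_n_best(n=n)


def keyphrases_per_abstract(path: str, n: int = 10):
    """Yield the n best keyphrases of each abstract in the file, in order."""
    with open(path, encoding='utf-8') as handle:
        for abstract in iter_abstracts(handle):
            yield extract_keyphrases(abstract, n=n)
```

Each abstract then contributes only its own handful of candidates to the clustering step, so the pairwise distance array stays small.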
Okay, got it. But this will produce disconnected sets, one set for each abstract. Does this mean that the unsupervised methods in pke only support a single document, not a corpus of many documents? Thanks,
Well, pke is a library that aims to provide implementations of many keyphrase extraction methods. There are many methods in the literature; you can find the corresponding articles in the README.
extractor.candidate_weighting(alpha=1.1, threshold=0.74,method='average')
For: MultipartiteRank
Used text file size = 11 MB
Platform: Windows 10 with 32 GB RAM
Error:
MemoryError Traceback (most recent call last)
in
4 extractor.candidate_weighting(alpha=1.1,
5 threshold=0.74,
----> 6 method='average')
~\Anaconda3\lib\site-packages\pke\unsupervised\graph_based\multipartiterank.py in candidate_weighting(self, threshold, method, alpha)
213
214 # cluster the candidates
--> 215 self.topic_clustering(threshold=threshold, method=method)
216
217 # build the topic graph
~\Anaconda3\lib\site-packages\pke\unsupervised\graph_based\multipartiterank.py in topic_clustering(self, threshold, method)
98
99 # compute the distance matrix
--> 100 Y = pdist(X, 'jaccard')
101 Y = np.nan_to_num(Y)
102
~\Anaconda3\lib\site-packages\scipy\spatial\distance.py in pdist(X, metric, *args, **kwargs)
2002 out = kwargs.pop("out", None)
2003 if out is None:
-> 2004 dm = np.empty((m * (m - 1)) // 2, dtype=np.double)
2005 else:
2006 if out.shape != (m * (m - 1) // 2,):
MemoryError: Unable to allocate 51.0 GiB for an array with shape (6848888203,) and data type float64
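The size in the error can be checked by hand. The traceback shows `pdist` allocating `m * (m - 1) // 2` float64 entries, where `m` is the number of rows of `X` (presumably one row per keyphrase candidate). The sketch below inverts that formula for the reported shape; the helper names are hypothetical.

```python
import math


def condensed_length(m: int) -> int:
    """Number of pairwise distances pdist stores for m observations."""
    return m * (m - 1) // 2


def observations_from_length(length: int) -> int:
    """Invert condensed_length: find m with m * (m - 1) // 2 == length."""
    m = (1 + math.isqrt(1 + 8 * length)) // 2
    if condensed_length(m) != length:
        raise ValueError("not a valid condensed distance-vector length")
    return m


length = 6848888203           # shape reported in the MemoryError
m = observations_from_length(length)
gib = length * 8 / 2**30      # float64 takes 8 bytes per entry
print(m, round(gib, 1))       # prints: 117038 51.0
```

So the whole file produced about 117,000 rows in `X`, and because the array grows quadratically with `m`, it exceeds the 32 GB of RAM on this machine.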