MemoryError: Unable to allocate 51.0 GiB #131

Closed
ssmoha7 opened this issue May 14, 2020 · 5 comments

ssmoha7 commented May 14, 2020

extractor.candidate_weighting(alpha=1.1, threshold=0.74, method='average')
For: MultipartiteRank
Used text file size = 11 MB
Platform: Windows 10 with 32 GB RAM
Error:

MemoryError Traceback (most recent call last)
in
4 extractor.candidate_weighting(alpha=1.1,
5 threshold=0.74,
----> 6 method='average')

~\Anaconda3\lib\site-packages\pke\unsupervised\graph_based\multipartiterank.py in candidate_weighting(self, threshold, method, alpha)
213
214 # cluster the candidates
--> 215 self.topic_clustering(threshold=threshold, method=method)
216
217 # build the topic graph

~\Anaconda3\lib\site-packages\pke\unsupervised\graph_based\multipartiterank.py in topic_clustering(self, threshold, method)
98
99 # compute the distance matrix
--> 100 Y = pdist(X, 'jaccard')
101 Y = np.nan_to_num(Y)
102

~\Anaconda3\lib\site-packages\scipy\spatial\distance.py in pdist(X, metric, *args, **kwargs)
2002 out = kwargs.pop("out", None)
2003 if out is None:
-> 2004 dm = np.empty((m * (m - 1)) // 2, dtype=np.double)
2005 else:
2006 if out.shape != (m * (m - 1) // 2,):

MemoryError: Unable to allocate 51.0 GiB for an array with shape (6848888203,) and data type float64

ygorg (Collaborator) commented May 14, 2020

Hi,
Please explain your issue; an error message alone does not provide enough information.
Good day.

ssmoha7 (Author) commented May 14, 2020

Hello,
I tried to run MultipartiteRank on a text file of data mining abstracts (the file is 11 MB).
# This is my code
extractor = pke.unsupervised.MultipartiteRank()
extractor.load_document(input='../rawdata/dm_abstracts.txt', language='en', normalization='stemming', max_length=12000000)
pos = {'NOUN', 'PROPN', 'ADJ'}
stoplist = list(string.punctuation)
stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
stoplist += stopwords.words('english')
extractor.candidate_selection(pos=pos, stoplist=stoplist)
extractor.candidate_weighting(alpha=1.1, threshold=0.74, method='average')  # ERROR HERE

I get the following error:
MemoryError Traceback (most recent call last)
in
4 extractor.candidate_weighting(alpha=1.1,
5 threshold=0.74,
----> 6 method='average')

~\Anaconda3\lib\site-packages\pke\unsupervised\graph_based\multipartiterank.py in candidate_weighting(self, threshold, method, alpha)
213
214 # cluster the candidates
--> 215 self.topic_clustering(threshold=threshold, method=method)
216
217 # build the topic graph

~\Anaconda3\lib\site-packages\pke\unsupervised\graph_based\multipartiterank.py in topic_clustering(self, threshold, method)
98
99 # compute the distance matrix
--> 100 Y = pdist(X, 'jaccard')
101 Y = np.nan_to_num(Y)
102

~\Anaconda3\lib\site-packages\scipy\spatial\distance.py in pdist(X, metric, *args, **kwargs)
2002 out = kwargs.pop("out", None)
2003 if out is None:
-> 2004 dm = np.empty((m * (m - 1)) // 2, dtype=np.double)
2005 else:
2006 if out.shape != (m * (m - 1) // 2,):

MemoryError: Unable to allocate 51.0 GiB for an array with shape (6848888203,) and data type float64

ygorg (Collaborator) commented May 14, 2020

I think your input document might be too big for pke to process.
If you are processing abstracts, then I guess your file contains more than one abstract.
The code you executed:

extractor = pke.unsupervised.MultipartiteRank()
extractor.load_document(input='../rawdata/dm_abstracts.txt', language='en', normalization='stemming', max_length=12000000)
pos = {'NOUN', 'PROPN', 'ADJ'}
stoplist = list(string.punctuation)
stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
stoplist += stopwords.words('english')
extractor.candidate_selection(pos=pos, stoplist=stoplist)
extractor.candidate_weighting(alpha=1.1, threshold=0.74, method='average')

should be executed for each document (so for each abstract) in the file.
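
For scale: pdist allocates a condensed distance matrix of m*(m-1)/2 float64 entries over all candidates in one document. Your traceback shows 6,848,888,203 entries, i.e. roughly 117,000 candidates, and 6,848,888,203 * 8 bytes is about 51 GiB. Processing one abstract at a time keeps m small. A minimal sketch of the per-abstract loop, assuming one abstract per line and that load_document accepts a raw string (adjust the split to however your abstracts are actually delimited):

import string

import pke
from nltk.corpus import stopwords

pos = {'NOUN', 'PROPN', 'ADJ'}
stoplist = list(string.punctuation)
stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
stoplist += stopwords.words('english')

# Assumption: one abstract per line; change this if the abstracts are
# separated differently (e.g. by blank lines).
with open('../rawdata/dm_abstracts.txt', encoding='utf-8') as f:
    abstracts = [line.strip() for line in f if line.strip()]

for abstract in abstracts:
    # a fresh extractor per document, so candidates do not accumulate
    extractor = pke.unsupervised.MultipartiteRank()
    extractor.load_document(input=abstract, language='en',
                            normalization='stemming')
    extractor.candidate_selection(pos=pos, stoplist=stoplist)
    extractor.candidate_weighting(alpha=1.1, threshold=0.74,
                                  method='average')
    print(extractor.get_n_best(n=10))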

ssmoha7 (Author) commented May 14, 2020

Okay, got it. But this will produce disconnected sets, one per abstract. Does this mean that the unsupervised methods in pke only support a single document, not a corpus of many documents?

Thanks,

ygorg (Collaborator) commented May 14, 2020

Well, pke is a library that aims at providing implementations of many keyphrase extraction methods. There are many methods in the literature; you can find the corresponding articles in the README.
Methods using topic modeling or TfIdf account for the corpus, but the task of keyphrase extraction is to get keyphrases for individual documents. Maybe keyphrase extraction does not suit your needs.
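
For corpus-aware weighting, one option is the TfIdf model with document frequencies counted over the whole collection beforehand (e.g. with pke.utils.compute_document_frequency). A minimal sketch, with hypothetical paths:

import string

import pke

# document-frequency counts precomputed over the corpus; path is hypothetical
df = pke.load_document_frequency_file(input_file='path/to/df_counts.tsv.gz')

extractor = pke.unsupervised.TfIdf()
extractor.load_document(input='path/to/one_abstract.txt', language='en',
                        normalization='stemming')

# select 1-3 grams that do not contain punctuation as candidates
extractor.candidate_selection(n=3, stoplist=list(string.punctuation))

# weight candidates with tf x idf, where idf comes from the corpus counts
extractor.candidate_weighting(df=df)
keyphrases = extractor.get_n_best(n=10)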
I am closing this issue as this is not a problem with pke.
