Expected Behavior
I was expecting `current_corpus.save("./data")` to write to "metadata.json" and "spacy_doc.bin".
Current Behavior
Instead it seems to be writing to files that have an extra "s" on the end: "metadatas.json" and "spacy_docs.bin".
Possible Solution
I can't figure out why; looking at the code certainly suggests that the string should be "metadata.json".
Steps to Reproduce (for bugs)
```python
import textacy
from gutenberg.query import get_metadata


def getGutenbergMetadata(textno):
    # Collect a few Project Gutenberg metadata fields for one text number.
    meta = {
        'title': list(get_metadata('title', textno))[0],
        'author': list(get_metadata('author', textno)),
        'rights': list(get_metadata('rights', textno)),
        'subject': list(get_metadata('subject', textno)),
        'language': list(get_metadata('language', textno))[0],
        'guten_no': textno}
    return meta


def getGutenberg(filenumber):
    # Read the stripped plain-text file for one Gutenberg text.
    path = "./data/corpora/gutenberg/strip/{}.txt".format(filenumber)
    with open(path, mode="r", encoding="utf_8") as f:
        return f.read()


count = 0
actdocs = []
actmeta = []
for i in acttext:  # acttext: the Gutenberg text numbers to load (defined elsewhere)
    count += 1
    # if count % 10 == 0:
    print(".", end="")
    am = getGutenbergMetadata(i)
    ad = textacy.Doc(getGutenberg(i), None, "en")
    actdocs.append(ad)
    actmeta.append(am)
    print("m", end="")

# Build a corpus, save it to ./data, then try to load it back.
current_corpus = textacy.corpus.Corpus('en', docs=actdocs, metadatas=actmeta)
current_corpus.save("./data")
current_corpus = textacy.Doc.load("./data")
```
Renaming the files lets `textacy.Doc.load` find them again, of course, but then I get the following (possibly separate) error:
```
Traceback (most recent call last):
  File "<ipython-input-5-5605bd27e87b>", line 1, in <module>
    loadCorpus()
  File "./excalibur/action_catalog.py", line 125, in loadCorpus
    current_corpus = textacy.Doc.load("./data")
  File "C:\tools\Anaconda3\envs\genmoenv\lib\site-packages\textacy\doc.py", line 219, in load
    metadata = list(fileio.read_json(meta_fname))[0]
  File "C:\tools\Anaconda3\envs\genmoenv\lib\site-packages\textacy\fileio\read.py", line 69, in read_json
    yield json.load(f)
  File "C:\tools\Anaconda3\envs\genmoenv\lib\json\__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "C:\tools\Anaconda3\envs\genmoenv\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\tools\Anaconda3\envs\genmoenv\lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
JSONDecodeError: Extra data
```
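The "Extra data" error is what Python's `json` module raises when the input holds more than one top-level JSON value. Assuming the pluralized `metadatas.json` stores one metadata record per line (one per document), which would be consistent with the traceback, this minimal standard-library sketch shows the difference between parsing the whole file as a single value and parsing it line by line. The sample contents are made up for illustration.

```python
import json

# Hypothetical metadata file contents: one JSON object per line (JSON Lines),
# i.e. one record per document, as a corpus-level file might store them.
jsonl_text = '{"title": "A", "guten_no": 1}\n{"title": "B", "guten_no": 2}\n'

# json.loads (like json.load) expects exactly ONE JSON value in the input.
error_msg = None
try:
    json.loads(jsonl_text)
except json.JSONDecodeError as err:
    # The decoder stops after the first object and finds trailing data.
    error_msg = err.msg
print("single-value parse:", error_msg)  # → single-value parse: Extra data

# Parsing line by line recovers every record.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
print(len(records), records[1]["title"])  # → 2 B
```

If that assumption holds, renaming the file only moves the failure: a loader that expects a single JSON object cannot read a file with one object per document.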
Context
Your Environment
- `textacy` and `spacy` versions: textacy 0.3.2, spacy 1.2.0
- Python version: 3.5
- Operating system and version: Windows 7 x64
Hi @ikarth, in your example code you save a `textacy.Corpus` to disk via `current_corpus.save("./data")`, then try to load it as a `textacy.Doc` via `textacy.Doc.load("./data")`. As you've noticed, that doesn't work! The immediate cause of the failure is that `Corpus` and `Doc` instances are saved to disk with different filenames (pluralized for a corpus, singular for a doc). More fundamentally, though, you shouldn't save a `Corpus` and expect to load it back as a `Doc`: load it as a `Corpus` instead.
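A hypothetical pre-load check (not part of textacy's API) could make this distinction explicit: it inspects a save directory for the singular or pluralized filenames reported in this issue and refuses to treat a saved corpus as a doc. This is a sketch that assumes only these two filenames per kind matter; the files textacy actually writes may differ between versions, and the helper names `saved_kind` and `check_load_as_doc` are made up for illustration.

```python
import os
import tempfile

# Filenames as reported in this issue: a Doc uses singular names,
# a Corpus uses pluralized ones. Only these names are checked here.
DOC_FILES = ("metadata.json", "spacy_doc.bin")
CORPUS_FILES = ("metadatas.json", "spacy_docs.bin")


def saved_kind(path):
    """Return "corpus", "doc", or None depending on which files exist in path."""
    def has_all(names):
        return all(os.path.isfile(os.path.join(path, n)) for n in names)

    if has_all(CORPUS_FILES):
        return "corpus"
    if has_all(DOC_FILES):
        return "doc"
    return None


def check_load_as_doc(path):
    """Raise a descriptive error unless path looks like a saved Doc."""
    kind = saved_kind(path)
    if kind == "corpus":
        raise ValueError(
            "{} holds a saved Corpus; load it as a Corpus, not a Doc".format(path))
    if kind is None:
        raise FileNotFoundError("no saved Doc found in {}".format(path))


# Usage: simulate a directory written by a corpus save.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in CORPUS_FILES:
        open(os.path.join(demo_dir, name), "w").close()
    print(saved_kind(demo_dir))  # → corpus
```

Failing early with a message that names the right object type is friendlier than the missing-file or JSON errors reported above.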