This repository has been archived by the owner on Aug 24, 2020. It is now read-only.
Hello,
I ran the JORF archive download routine without any problem:
/home/dev/opt/bin/python3 -m dila2sql.runner --run-only downloader --base JORF --dumps-dir ./tarballs_jorf/
481 files were downloaded.
However, when running the import routine into an SQLite database, I get an error after some time: KeyError: 'etat'
Here are the traces:
/home/dev/opt/bin/python3 -m dila2sql.runner --run-only importer --base JORF --db-url sqlite:///JORF.sqlite --dumps-dir ./tarballs_jorf/ --raw
read format "warc" is not supported
read filter "lz4" is not supported
write format "warc" is not supported
write filter "lz4" is not supported
last_update is None
Skipped 1 old archives
Processing Freemium_jorf_global_20181129-070000.tar.gz...
counting entries in archive ...
counted 4560475 entries in archive.
big archive will be processed in 10 chunks...
chunk 0: generating XML jobs args for 500000 entries starting from idx 0
99% | 493555/500000 [00:07<00:00, 64945.25it/s]
chunk 0: start processing XML jobs...
starting process_xml tasks in a Process Pool...
459000it [25:58, 663.60it/s]
chunk 1: generating XML jobs args for 500000 entries starting from idx 500000
100% | 499428/500000 [00:07<00:00, 68653.81it/s]
chunk 1: start processing XML jobs...
starting process_xml tasks in a Process Pool...
495000it [24:19, 339.12it/s]
chunk 2: generating XML jobs args for 500000 entries starting from idx 1000000
100% | 498613/500000 [00:08<00:00, 48860.39it/s]
chunk 2: start processing XML jobs...
starting process_xml tasks in a Process Pool...
43% | 202000/468320 [11:53<15:30, 286.36it/s]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/dev/opt/lib/python3.7/concurrent/futures/process.py", line 232, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/dev/app/dila2sql/dila2sql/importer/process_xml_batch.py", line 64, in process_xml_jobs_batch
batch_counts, batch_skipped = process_xml_jobs_sync(jobs_args_batch, db=db, commit=False)
File "/home/dev/app/dila2sql/dila2sql/importer/process_xml_batch.py", line 34, in process_xml_jobs_sync
xml_counts, xml_skipped = process_xml(*arg_list)
File "/home/dev/app/dila2sql/dila2sql/importer/process_xml.py", line 401, in process_xml
'etat': attrs['etat'],
KeyError: 'etat'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/dev/opt/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/dev/opt/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/dev/app/dila2sql/dila2sql/runner.py", line 71, in <module>
dumps_directory=args.dumps_dir
File "/home/dev/app/dila2sql/dila2sql/importer/importer.py", line 102, in run_importer
archive_processor.run()
File "/home/dev/app/dila2sql/dila2sql/importer/archive_processor.py", line 43, in run
chunk_counts, chunk_skipped = self.process_chunk(chunk_idx, chunk)
File "/home/dev/app/dila2sql/dila2sql/importer/archive_processor.py", line 83, in process_chunk
return process_xml_batch(process_xml_jobs_args, self.db_url)
File "/home/dev/app/dila2sql/dila2sql/importer/process_xml_batch.py", line 23, in process_xml_batch
return process_xml_jobs_in_parallel(process_xml_jobs_args, db_url)
File "/home/dev/app/dila2sql/dila2sql/importer/process_xml_batch.py", line 53, in process_xml_jobs_in_parallel
batch_counts, batch_skipped = future.result()
File "/home/dev/opt/lib/python3.7/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/home/dev/opt/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
KeyError: 'etat'
If anyone has an idea how to solve this problem, I'm all ears!
Thanks
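The "_RemoteTraceback … direct cause" pattern in the traceback above is how `concurrent.futures` surfaces exceptions from pool workers: the `KeyError` is raised in a child process and re-raised in the parent when `future.result()` is called. A minimal, self-contained sketch of that behavior (generic Python, not dila2sql code; the attribute names mirror the traceback):

```python
# Sketch: a KeyError raised inside a ProcessPoolExecutor worker
# resurfaces at future.result() in the parent process.
from concurrent.futures import ProcessPoolExecutor


def process_entry(attrs):
    # Mirrors the failing lookup: attrs['etat'] raises KeyError
    # when the entry carries no 'etat' attribute.
    return attrs['etat']


def main():
    with ProcessPoolExecutor(max_workers=1) as pool:
        # Sample payload deliberately missing the 'etat' key.
        future = pool.submit(process_entry, {'date_debut': '2018-11-29'})
        try:
            future.result()
        except KeyError as exc:
            print(f"worker raised KeyError: {exc}")


if __name__ == '__main__':
    main()
```

This is also why disabling parallelism would not change the outcome: the lookup itself fails, regardless of where it runs.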
I disabled parallelism (export DILA2SQL_MAX_PROCESSES=1) but I still get the same error. At line 401 it looks like the attrs dictionary does not contain the key 'etat'. I can't find where this key is created anywhere else in process_xml.py: could that be the cause of the problem?
399     if model == TexteVersion:
400         Sommaire.update({
401             'etat': attrs['etat'],
402             'debut': attrs['date_debut'],
403             'fin': attrs['date_fin']
404         }).where(Sommaire.element == text_id).execute()
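One possible direction (a sketch only, not a confirmed fix for dila2sql): use `dict.get` so that a missing attribute yields `None` instead of raising `KeyError`. The snippet below isolates just that difference with plain dicts; the field names come from the snippet above, the sample dates are made up, and whether NULL values are acceptable in the Sommaire table is an assumption that would need checking.

```python
# attrs['etat'] raises KeyError when the attribute is absent;
# attrs.get('etat') returns None instead.
def sommaire_fields(attrs):
    """Build the update payload without assuming 'etat' is present."""
    return {
        'etat': attrs.get('etat'),        # None instead of KeyError
        'debut': attrs.get('date_debut'),
        'fin': attrs.get('date_fin'),
    }


# Sample entry missing 'etat', as in the failing archive (dates are dummies).
attrs = {'date_debut': '2018-11-29', 'date_fin': '2999-01-01'}
print(sommaire_fields(attrs))
```

Whether silently storing NULL is correct, or whether these JORF entries should be skipped or given a default state, depends on the schema and is worth confirming before patching.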