Cannot export data: the exported compressed file is empty #1749

amzfc · 2022-03-23T09:53:53Z

How to reproduce the behaviour

Your Environment

Operating System: Ubuntu 16.04
Python Version Used:
When you install doccano: 2022/3/23
How did you install doccano (Heroku button etc): docker

amzfc · 2022-03-23T10:01:29Z

After several attempts, a zip file was exported. It contains a txt file whose content is {"status":"Not ready"}.

amzfc · 2022-03-23T10:08:34Z

The project I created is a sequence labeling project.

liyp0095 · 2022-03-23T21:32:01Z

The same issue happens to me. I deployed the project on OpenShift Container Platform. The task is Text Classification, and the exported txt file also contains {"status":"Not ready"}.

Hironsan · 2022-03-24T02:32:56Z

Hmm, I can't reproduce the problem.

What is the output of docker logs?

> docker logs YOUR_CONTAINER_ID
[2022-03-24 02:25:47 +0000] [18] [INFO] Starting gunicorn 20.1.0
[2022-03-24 02:25:47 +0000] [18] [INFO] Listening at: http://0.0.0.0:8000 (18)
[2022-03-24 02:25:47 +0000] [18] [INFO] Using worker: sync
[2022-03-24 02:25:47 +0000] [23] [INFO] Booting worker with pid: 23
[2022-03-24 02:25:47 +0000] [24] [INFO] Booting worker with pid: 24
 
 -------------- celery@8d85ef7a6f5f v5.2.3 (dawn-chorus)
--- ***** ----- 
-- ******* ---- Linux-5.10.47-linuxkit-x86_64-with-glibc2.2.5 2022-03-24 02:25:50
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         config:0x7f59c7580190
- ** ---------- .> transport:   sqla+sqlite:////data/doccano.db
- ** ---------- .> results:     
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery
                

[tasks]
  . data_export.celery_tasks.export_dataset
  . data_import.celery_tasks.import_dataset
  . health_check.contrib.celery.tasks.add

[2022-03-24 02:25:50,824: INFO/MainProcess] Connected to sqla+sqlite:////data/doccano.db
[2022-03-24 02:25:50,956: INFO/MainProcess] celery@8d85ef7a6f5f ready.
[2022-03-24 02:26:38,314: INFO/MainProcess] Task data_import.celery_tasks.import_dataset[3490eade-4d8f-42cd-a3a7-a312f79c361a] received
[2022-03-24 02:26:38,380: INFO/ForkPoolWorker-1] Task data_import.celery_tasks.import_dataset[3490eade-4d8f-42cd-a3a7-a312f79c361a] succeeded in 0.05503669998142868s: {'error': []}
[2022-03-24 02:26:49,363: INFO/MainProcess] Task data_export.celery_tasks.export_dataset[d1c9a5ff-781f-4cec-a8d7-acc44503cfb4] received
[2022-03-24 02:26:49,400: INFO/ForkPoolWorker-2] Task data_export.celery_tasks.export_dataset[d1c9a5ff-781f-4cec-a8d7-acc44503cfb4] succeeded in 0.03536610002629459s: '/doccano/backend/media/66c0c1a6-549b-4d7d-a434-a029ec4926a7.zip'

Hironsan · 2022-03-24T02:42:57Z

By the way, here is how to copy the data from the container to the host as a quick fix:

Check the database file in the container:

> docker exec -it doccano bash
> doccano@8d85ef7a6f5f:/doccano/backend$ ls /data
doccano.db
> doccano@8d85ef7a6f5f:/doccano/backend$ exit
exit

Copy the file to the host:

# Replace 8d85ef7a6f5f with your container id.
> docker cp 8d85ef7a6f5f:/data/doccano.db .

Execute the SQL query:

> sqlite3 doccano.db
sqlite> select text from examples_example;
exampleA
exampleB
exampleA
exampleB
exampleC
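
The same check can be scripted. This is a minimal sketch, assuming the copied doccano.db is readable from the script and that the table is named examples_example with a text column, as in the query above. The helper name read_example_texts is hypothetical.

```python
import sqlite3
from typing import List


def read_example_texts(db_path: str) -> List[str]:
    """Return the text of every row in the examples_example table."""
    # Table and column names are taken from the sqlite3 query above.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT text FROM examples_example").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Calling read_example_texts("doccano.db") should return the same texts that the sqlite3 shell prints.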

Hironsan · 2022-03-24T03:02:39Z

I understand the problem.

The frontend tries to get the task status repeatedly. If the task is ready, it tries to download the file:

doccano/frontend/pages/projects/_id/dataset/export.vue

Lines 114 to 125 in 27eff5c

    
    pollData() {
      // @ts-ignore
      this.polling = setInterval(async () => {
        if (this.taskId) {
          const res = await this.$services.taskStatus.get(this.taskId)
          if (res.ready) {
            this.$services.download.download(this.projectId, this.taskId)
            this.reset()
          }
        }
      }, 1000)
    },
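
For readers more at home in the backend language, the polling pattern above can be sketched in Python. This only illustrates the pattern and is not doccano code: poll_until_ready, get_status, and on_ready are hypothetical names, and the max_attempts limit is an addition that the frontend code does not have.

```python
import time
from typing import Callable, Optional


def poll_until_ready(
    get_status: Callable[[], dict],
    on_ready: Callable[[], None],
    interval: float = 1.0,
    max_attempts: Optional[int] = 60,
) -> bool:
    """Poll get_status() and run on_ready() once it reports ready.

    Returns True if on_ready() ran, False if max_attempts was exhausted.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        status = get_status()
        if status.get("ready"):
            on_ready()
            return True
        attempts += 1
        time.sleep(interval)  # wait before asking again
    return False
```

Note that "ready" here only means the task has finished; it says nothing about whether the task succeeded.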

The download API is shown below. It should be called only after the task is ready, but it seems that the task is not ready for some reason, so the API returns {"status": "Not ready"}. As a result, that response ends up as the content of your downloaded file.

doccano/backend/data_export/views.py

Lines 25 to 35 in 27eff5c

    
    class DatasetExportAPI(APIView):
        permission_classes = [IsAuthenticated & IsProjectAdmin]

        def get(self, request, *args, **kwargs):
            task_id = request.GET["taskId"]
            task = AsyncResult(task_id)
            ready = task.ready()
            if ready:
                filename = task.result
                return FileResponse(open(filename, mode="rb"), as_attachment=True)
            return Response({"status": "Not ready"})

liyp0095 · 2022-03-24T03:12:44Z

The log stream from OpenShift shows something like this. I do not know if it helps.

10.128.5.154 - - [24/Mar/2022:02:58:45 +0000] "POST /v1/projects/11/download HTTP/1.0" 200 50 "https://qli-lab-doccano.apps.nimbus.las.iastate.edu/projects/11/dataset?limit=10&offset=0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"
10.128.5.154 - - [24/Mar/2022:02:58:46 +0000] "GET /v1/tasks/status/a6d6238c-0138-486b-892d-c1db4142cdaf HTTP/1.0" 200 42 "https://qli-lab-doccano.apps.nimbus.las.iastate.edu/projects/11/dataset?limit=10&offset=0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"
10.128.5.154 - - [24/Mar/2022:02:58:47 +0000] "GET /v1/tasks/status/a6d6238c-0138-486b-892d-c1db4142cdaf HTTP/1.0" 200 70 "https://qli-lab-doccano.apps.nimbus.las.iastate.edu/projects/11/dataset?limit=10&offset=0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"
10.128.5.154 - - [24/Mar/2022:02:58:47 +0000] "GET /v1/projects/11/download?taskId=a6d6238c-0138-486b-892d-c1db4142cdaf HTTP/1.0" 500 145 "https://qli-lab-doccano.apps.nimbus.las.iastate.edu/projects/11/dataset?limit=10&offset=0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"
Internal Server Error: /v1/projects/11/download
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/app-root/lib64/python3.8/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/django/views/generic/base.py", line 69, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/opt/app-root/lib64/python3.8/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/opt/app-root/lib64/python3.8/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/opt/app-root/lib64/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/opt/app-root/src/backend/api/views/export_dataset.py", line 34, in get
    return FileResponse(open(filename, mode='rb'), as_attachment=True)
TypeError: expected str, bytes or os.PathLike object, not OperationalError

amzfc · 2022-03-24T03:19:31Z

--- ***** -----
-- ******* ---- Linux-4.4.0-210-generic-x86_64-with-glibc2.2.5 2022-03-24 03:12:14
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         config:0x7f9fcffae190
- ** ---------- .> transport:   sqla+sqlite:////data/doccano.db
- ** ---------- .> results:
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery


[tasks]
  . data_export.celery_tasks.export_dataset
  . data_import.celery_tasks.import_dataset
  . health_check.contrib.celery.tasks.add

[2022-03-24 03:12:14,358: INFO/MainProcess] Connected to sqla+sqlite:////data/doccano.db
[2022-03-24 03:12:14,402: INFO/MainProcess] celery@372447ef8f6f ready.
[2022-03-24 03:14:18,363: INFO/MainProcess] Task data_export.celery_tasks.export_dataset[5010e35b-7c98-4526-9d4c-0d68a7427239] received
[2022-03-24 03:14:18,403: INFO/ForkPoolWorker-1] Task data_export.celery_tasks.export_dataset[5010e35b-7c98-4526-9d4c-0d68a7427239] succeeded in 0.03686487604863942s: '/doccano/backend/media/bf8fd420-b4a2-43f2-b208-917a82341006.zip'

There is nothing in the exported compressed file, and it can't even be decompressed

amzfc · 2022-03-24T03:32:28Z

The quick fix above (copying doccano.db to the host and querying it with sqlite3) works, but I want the annotated data.

amzfc · 2022-03-24T03:42:09Z

Hironsan · 2022-03-24T04:38:08Z

Regarding "I want the annotated data":

For example, if you want to write the example id, start offset, end offset, and label name to a CSV file, the following queries are useful:

sqlite> .headers on
sqlite> .mode csv
sqlite> .output annotation.csv
sqlite> SELECT example_id, start_offset, end_offset, text FROM labels_span, label_types_spantype WHERE labels_span.label_id=label_types_spantype.id;
sqlite> .quit

Contents:

> head annotation.csv
example_id,start_offset,end_offset,text
20757,0,8,LOC
20758,4,23,ORG
20758,59,65,MISC
...
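
The same export can be done from Python with the standard library. This is a minimal sketch, assuming the table and column names used in the query above (labels_span, label_types_spantype, example_id, start_offset, end_offset, text). The function name export_spans_to_csv is hypothetical.

```python
import csv
import sqlite3

# Same join as the sqlite3 session above: one row per span annotation.
QUERY = """
SELECT example_id, start_offset, end_offset, text
FROM labels_span, label_types_spantype
WHERE labels_span.label_id = label_types_spantype.id
"""


def export_spans_to_csv(db_path: str, csv_path: str) -> int:
    """Write the span annotations to csv_path and return the number of rows."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(QUERY)
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()
    # newline="" lets the csv module control line endings on every platform.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

For example, export_spans_to_csv("doccano.db", "annotation.csv") would write a file with the same header as the one shown above.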

Hironsan · 2022-03-24T04:40:22Z

#1749 (comment)

Thanks.

The task is ready, but something is wrong with the result. I need to investigate.
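
One possible explanation, stated as an assumption rather than a confirmed diagnosis: in Celery, AsyncResult.ready() is also true for a task that has failed, and in that case result holds the exception instance instead of a return value. That would match the traceback, where open() receives an OperationalError. Below is a minimal sketch of a stricter check. FakeResult is a hypothetical stand-in so the example runs without Celery, and resolve_export and its status codes are likewise illustrative, not doccano code.

```python
from typing import Any, Tuple


class FakeResult:
    """Hypothetical stand-in for the parts of celery.result.AsyncResult used here."""

    def __init__(self, state: str, result: Any = None) -> None:
        self.state = state
        self.result = result

    def ready(self) -> bool:
        # True once the task has finished, whether it succeeded or failed.
        return self.state in {"SUCCESS", "FAILURE", "REVOKED"}

    def successful(self) -> bool:
        return self.state == "SUCCESS"


def resolve_export(task: Any) -> Tuple[int, str]:
    """Map a task result to an (HTTP status, payload) pair."""
    if not task.ready():
        return 202, "Not ready"
    if not task.successful():
        # task.result is an exception here, so it must not be passed to open().
        return 500, f"Export failed: {task.result!r}"
    return 200, task.result  # path of the exported file
```

With a check like this, a failed export would surface as an error response instead of a TypeError inside the view.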

qiuminghai · 2022-03-28T06:54:01Z

I solved this bug. In the file D:\conda\envs\doccano\Lib\site-packages\backend\data_import\pipeline\writer.py, modify the code as shown below:
on line 9, add encoding="utf-8".

class LineWriter(BaseWriter):
    extension = 'txt'
    def write(self, records: Iterator[Record]) -> str:
        files = {}
        for record in records:
            filename = os.path.join(self.tmpdir, f'{record.user}.{self.extension}')
            if filename not in files:
                f = open(filename, mode='a', encoding="utf-8")  # encoding added here
                files[filename] = f
            f = files[filename]
            line = self.create_line(record)
            f.write(f'{line}\n')
        for f in files.values():
            f.close()
        save_file = self.write_zip(files)
        for file in files:
            os.remove(file)
        return save_file

When exporting the file, do not select "export only approved file".
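
Why the explicit encoding can matter: without an encoding argument, open() in text mode falls back to a locale-dependent default (unless UTF-8 mode is enabled), and on Windows that default is often not UTF-8, so writing non-ASCII text may raise UnicodeEncodeError. Below is a minimal sketch. The "ascii" codec is used only as a stand-in for a restrictive locale default, and write_line and demo are hypothetical helpers.

```python
import os
import tempfile
from typing import Tuple


def write_line(path: str, line: str, encoding: str) -> bool:
    """Append one line to path; return False if the codec cannot encode it."""
    try:
        with open(path, mode="a", encoding=encoding) as f:
            f.write(f"{line}\n")
        return True
    except UnicodeEncodeError:
        return False


def demo() -> Tuple[bool, bool]:
    """Write the same non-ASCII line with a restrictive codec and with UTF-8."""
    tmpdir = tempfile.mkdtemp()
    text = "東京 is a LOC"
    ascii_ok = write_line(os.path.join(tmpdir, "ascii.txt"), text, "ascii")
    utf8_ok = write_line(os.path.join(tmpdir, "utf8.txt"), text, "utf-8")
    return ascii_ok, utf8_ok
```

Here the write with the restrictive codec fails while the UTF-8 write succeeds, which is the kind of failure the encoding="utf-8" change avoids.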

Hironsan · 2022-03-29T06:00:18Z

Fixed #1754

teohsinyee · 2022-04-19T09:13:49Z

I can't find the writer.py file, but I have writers.py.
When I open that file, I can't find the class that you mentioned: class LineWriter(BaseWriter).

Hironsan added the bug (Something isn't working) label Mar 24, 2022

Hironsan added this to To do in v1.7.0 via automation Mar 24, 2022

Hironsan moved this from To do to In progress in v1.7.0 Apr 13, 2022

Hironsan mentioned this issue Apr 25, 2022

Enhancement/dataset export #1799

Merged

Hironsan closed this as completed in #1799 Apr 25, 2022

v1.7.0 automation moved this from In progress to Done Apr 25, 2022
