Cannot export data: the exported compressed file is empty #1749

Closed
amzfc opened this issue Mar 23, 2022 · 15 comments · Fixed by #1799
Labels
bug Something isn't working

@amzfc

amzfc commented Mar 23, 2022

How to reproduce the behaviour

Your Environment

  • Operating System: Ubuntu 16.04
  • Python Version Used:
  • When you install doccano: 2022/3/23
  • How did you install doccano (Heroku button etc): docker
@amzfc
Author

amzfc commented Mar 23, 2022

After several attempts, a zip file was exported. The txt file inside it contains only {"status":"Not ready"}.

@amzfc
Author

amzfc commented Mar 23, 2022

The project I created is a sequence annotation project.

@liyp0095

The same issue happens to me. I deployed the project on OpenShift Container Platform. The task is Text Classification, and I also got an exported txt file with the content {"status":"Not ready"}.

@Hironsan
Member

Hmm, I can't reproduce the problem.

What is the output of the docker logs?

> docker logs YOUR_CONTAINER_ID
[2022-03-24 02:25:47 +0000] [18] [INFO] Starting gunicorn 20.1.0
[2022-03-24 02:25:47 +0000] [18] [INFO] Listening at: http://0.0.0.0:8000 (18)
[2022-03-24 02:25:47 +0000] [18] [INFO] Using worker: sync
[2022-03-24 02:25:47 +0000] [23] [INFO] Booting worker with pid: 23
[2022-03-24 02:25:47 +0000] [24] [INFO] Booting worker with pid: 24
 
 -------------- celery@8d85ef7a6f5f v5.2.3 (dawn-chorus)
--- ***** ----- 
-- ******* ---- Linux-5.10.47-linuxkit-x86_64-with-glibc2.2.5 2022-03-24 02:25:50
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         config:0x7f59c7580190
- ** ---------- .> transport:   sqla+sqlite:////data/doccano.db
- ** ---------- .> results:     
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery
                

[tasks]
  . data_export.celery_tasks.export_dataset
  . data_import.celery_tasks.import_dataset
  . health_check.contrib.celery.tasks.add

[2022-03-24 02:25:50,824: INFO/MainProcess] Connected to sqla+sqlite:////data/doccano.db
[2022-03-24 02:25:50,956: INFO/MainProcess] celery@8d85ef7a6f5f ready.
[2022-03-24 02:26:38,314: INFO/MainProcess] Task data_import.celery_tasks.import_dataset[3490eade-4d8f-42cd-a3a7-a312f79c361a] received
[2022-03-24 02:26:38,380: INFO/ForkPoolWorker-1] Task data_import.celery_tasks.import_dataset[3490eade-4d8f-42cd-a3a7-a312f79c361a] succeeded in 0.05503669998142868s: {'error': []}
[2022-03-24 02:26:49,363: INFO/MainProcess] Task data_export.celery_tasks.export_dataset[d1c9a5ff-781f-4cec-a8d7-acc44503cfb4] received
[2022-03-24 02:26:49,400: INFO/ForkPoolWorker-2] Task data_export.celery_tasks.export_dataset[d1c9a5ff-781f-4cec-a8d7-acc44503cfb4] succeeded in 0.03536610002629459s: '/doccano/backend/media/66c0c1a6-549b-4d7d-a434-a029ec4926a7.zip'

@Hironsan
Member

By the way, here is how to copy the data from the container to the host as a quick workaround:

  1. Check the database file in the container.
> docker exec -it doccano bash
> doccano@8d85ef7a6f5f:/doccano/backend$ ls /data
doccano.db
> doccano@8d85ef7a6f5f:/doccano/backend$ exit
exit
  2. Copy the file to the host.
# Replace 8d85ef7a6f5f with your container id.
> docker cp 8d85ef7a6f5f:/data/doccano.db .
  3. Execute the SQL query.
> sqlite3 doccano.db
sqlite> select text from examples_example;
exampleA
exampleB
exampleA
exampleB
exampleC
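
The same query can also be run from Python's standard `sqlite3` module once `doccano.db` has been copied to the host. This is only a minimal sketch: the table and column names (`examples_example`, `text`) come from the query above, while the function name `read_example_texts` is hypothetical.

```python
import sqlite3
from typing import List


def read_example_texts(db_path: str) -> List[str]:
    """Return the `text` column of every row in `examples_example`."""
    conn = sqlite3.connect(db_path)
    try:
        # Same statement as in the sqlite3 shell session above.
        rows = conn.execute("SELECT text FROM examples_example").fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple; unwrap it to a plain string.
    return [row[0] for row in rows]
```

Calling `read_example_texts("doccano.db")` in the directory that holds the copied file should return the same texts that the shell query prints.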

Hironsan added the bug (Something isn't working) label Mar 24, 2022
Hironsan added this to To do in v1.7.0 via automation Mar 24, 2022
@Hironsan
Member

I understand the problem.

The frontend polls the task status once per second. Once the task is ready, it tries to download the file:

pollData() {
  // @ts-ignore
  this.polling = setInterval(async () => {
    if (this.taskId) {
      const res = await this.$services.taskStatus.get(this.taskId)
      if (res.ready) {
        this.$services.download.download(this.projectId, this.taskId)
        this.reset()
      }
    }
  }, 1000)
},

The download API is shown below. It should be called only after the task is ready. But it seems that, for some reason, the task is not ready when the download is requested, so the API returns {"status": "Not ready"}. That response is then saved as the content of your downloaded file.

class DatasetExportAPI(APIView):
    permission_classes = [IsAuthenticated & IsProjectAdmin]

    def get(self, request, *args, **kwargs):
        task_id = request.GET["taskId"]
        task = AsyncResult(task_id)
        ready = task.ready()
        if ready:
            filename = task.result
            return FileResponse(open(filename, mode="rb"), as_attachment=True)
        return Response({"status": "Not ready"})

@liyp0095

The log stream from OpenShift shows the following. I do not know if it helps.

10.128.5.154 - - [24/Mar/2022:02:58:45 +0000] "POST /v1/projects/11/download HTTP/1.0" 200 50 "https://qli-lab-doccano.apps.nimbus.las.iastate.edu/projects/11/dataset?limit=10&offset=0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"
10.128.5.154 - - [24/Mar/2022:02:58:46 +0000] "GET /v1/tasks/status/a6d6238c-0138-486b-892d-c1db4142cdaf HTTP/1.0" 200 42 "https://qli-lab-doccano.apps.nimbus.las.iastate.edu/projects/11/dataset?limit=10&offset=0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"
10.128.5.154 - - [24/Mar/2022:02:58:47 +0000] "GET /v1/tasks/status/a6d6238c-0138-486b-892d-c1db4142cdaf HTTP/1.0" 200 70 "https://qli-lab-doccano.apps.nimbus.las.iastate.edu/projects/11/dataset?limit=10&offset=0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"
10.128.5.154 - - [24/Mar/2022:02:58:47 +0000] "GET /v1/projects/11/download?taskId=a6d6238c-0138-486b-892d-c1db4142cdaf HTTP/1.0" 500 145 "https://qli-lab-doccano.apps.nimbus.las.iastate.edu/projects/11/dataset?limit=10&offset=0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"
Internal Server Error: /v1/projects/11/download
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/app-root/lib64/python3.8/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/django/views/generic/base.py", line 69, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/opt/app-root/lib64/python3.8/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/opt/app-root/lib64/python3.8/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/opt/app-root/lib64/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/opt/app-root/src/backend/api/views/export_dataset.py", line 34, in get
    return FileResponse(open(filename, mode='rb'), as_attachment=True)
TypeError: expected str, bytes or os.PathLike object, not OperationalError
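
The traceback suggests that `task.result` held an exception object (`OperationalError`) rather than a file path. Celery's `AsyncResult.ready()` is also true for a task that has failed, in which case `result` is the exception instance. A minimal sketch of a stricter check follows. It is not doccano's actual fix: `resolve_export` and the stub `FakeTask` are hypothetical names, and the stub only mimics the `ready()`, `successful()`, and `result` members of `AsyncResult` so that the example runs without Celery.

```python
from typing import Any, Protocol, Tuple


class TaskLike(Protocol):
    """The subset of Celery's AsyncResult interface used below."""

    result: Any

    def ready(self) -> bool: ...

    def successful(self) -> bool: ...


def resolve_export(task: TaskLike) -> Tuple[int, Any]:
    """Map a task to (HTTP status, payload) without trusting `result` blindly."""
    if not task.ready():
        # Still running: the client should keep polling.
        return 202, {"status": "Not ready"}
    if not task.successful() or not isinstance(task.result, str):
        # Finished, but `result` is an exception instead of a file path.
        return 500, {"status": "Failed", "detail": repr(task.result)}
    # Finished successfully: `result` is the path of the exported zip file.
    return 200, task.result


class FakeTask:
    """Hypothetical stand-in for AsyncResult, for illustration only."""

    def __init__(self, state: str, result: Any = None) -> None:
        self.state = state
        self.result = result

    def ready(self) -> bool:
        return self.state in {"SUCCESS", "FAILURE"}

    def successful(self) -> bool:
        return self.state == "SUCCESS"
```

Returning a non-200 status for the unfinished and failed cases is a design choice of this sketch. It would let a client avoid saving the JSON body as if it were the zip file.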

@amzfc
Author

amzfc commented Mar 24, 2022

--- ***** -----
-- ******* ---- Linux-4.4.0-210-generic-x86_64-with-glibc2.2.5 2022-03-24 03:12:14
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         config:0x7f9fcffae190
- ** ---------- .> transport:   sqla+sqlite:////data/doccano.db
- ** ---------- .> results:
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery


[tasks]
  . data_export.celery_tasks.export_dataset
  . data_import.celery_tasks.import_dataset
  . health_check.contrib.celery.tasks.add

[2022-03-24 03:12:14,358: INFO/MainProcess] Connected to sqla+sqlite:////data/doccano.db
[2022-03-24 03:12:14,402: INFO/MainProcess] celery@372447ef8f6f ready.
[2022-03-24 03:14:18,363: INFO/MainProcess] Task data_export.celery_tasks.export_dataset[5010e35b-7c98-4526-9d4c-0d68a7427239] received
[2022-03-24 03:14:18,403: INFO/ForkPoolWorker-1] Task data_export.celery_tasks.export_dataset[5010e35b-7c98-4526-9d4c-0d68a7427239] succeeded in 0.03686487604863942s: '/doccano/backend/media/bf8fd420-b4a2-43f2-b208-917a82341006.zip'

The exported compressed file is empty, and it cannot even be decompressed.
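
One way to tell whether a download is a real archive or a saved status response is to inspect the file itself. The sketch below uses only the Python standard library. `describe_download` is a hypothetical helper name, and it assumes the status body is the small JSON object quoted earlier in this thread.

```python
import json
import zipfile
from typing import Any


def describe_download(path: str) -> Any:
    """Return the member names for a valid zip, the parsed body if the
    file is JSON text, or None if it is neither."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Not a zip archive and not JSON text either.
        return None
```

If this returns a dictionary such as the "Not ready" status rather than a list of file names, the browser most likely saved the API's JSON response under a `.zip` name.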

@amzfc
Author

amzfc commented Mar 24, 2022

By the way, here is how to copy the data from the container to the host as a quick workaround:

  1. Check the database file in the container.
> docker exec -it doccano bash
> doccano@8d85ef7a6f5f:/doccano/backend$ ls /data
doccano.db
> doccano@8d85ef7a6f5f:/doccano/backend$ exit
exit
  2. Copy the file to the host.
# Replace 8d85ef7a6f5f with your container id.
> docker cp 8d85ef7a6f5f:/data/doccano.db .
  3. Execute the SQL query.
> sqlite3 doccano.db
sqlite> select text from examples_example;
exampleA
exampleB
exampleA
exampleB
exampleC

That works, but I want the annotated data.

@amzfc
Author

amzfc commented Mar 24, 2022

image

@Hironsan
Member

That works, but I want the annotated data.

For example, if you want to write the data id, start offset, end offset, and label name to a CSV file, the following queries are useful:

sqlite> .headers on
sqlite> .mode csv
sqlite> .output annotation.csv
sqlite> SELECT example_id, start_offset, end_offset, text FROM labels_span, label_types_spantype WHERE labels_span.label_id=label_types_spantype.id;
sqlite> .quit

Contents:

> head annotation.csv
example_id,start_offset,end_offset,text
20757,0,8,LOC
20758,4,23,ORG
20758,59,65,MISC
...
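
The same export can be scripted with Python's standard `sqlite3` and `csv` modules. This sketch reuses the query above unchanged. The function name `export_spans_to_csv` is hypothetical, and the table layout is assumed to match the doccano version discussed in this thread.

```python
import csv
import sqlite3

# Same join as in the sqlite3 shell session above.
QUERY = """
SELECT example_id, start_offset, end_offset, text
FROM labels_span, label_types_spantype
WHERE labels_span.label_id = label_types_spantype.id
"""


def export_spans_to_csv(db_path: str, csv_path: str) -> int:
    """Write span annotations to `csv_path`; return the number of data rows."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(QUERY)
        # Column names of the result set become the CSV header.
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

For example, `export_spans_to_csv("doccano.db", "annotation.csv")` would write a file with the same header as the output shown above.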

@Hironsan
Member

#1749 (comment)

Thanks.

The task is ready, but something is wrong with the result. I need to investigate.

@qiuminghai

qiuminghai commented Mar 28, 2022

I solved this bug. In the file D:\conda\envs\doccano\Lib\site-packages\backend\data_import\pipeline\writer.py,
modify the code below:
on line 9, add encoding="utf-8"

class LineWriter(BaseWriter):
    extension = 'txt'
    def write(self, records: Iterator[Record]) -> str:
        files = {}
        for record in records:
            filename = os.path.join(self.tmpdir, f'{record.user}.{self.extension}')
            if filename not in files:
                f = open(filename, mode='a', encoding="utf-8")  # here
                files[filename] = f
            f = files[filename]
            line = self.create_line(record)
            f.write(f'{line}\n')
        for f in files.values():
            f.close()
        save_file = self.write_zip(files)
        for file in files:
            os.remove(file)
        return save_file

When exporting the file, do not select "export only approved file".
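
A possible reason this change matters: when `encoding` is omitted, `open()` in the Python version seen in this thread (3.8) falls back to a platform-dependent default, which on some Windows setups cannot represent every character in the annotations. The following is a small standard-library sketch, not doccano code. `write_lines` and `demo` are hypothetical names, and the `ascii` codec merely stands in for a restrictive default encoding.

```python
import os
import tempfile
from typing import Iterable


def write_lines(path: str, lines: Iterable[str], encoding: str) -> None:
    """Append each line to `path` using an explicit encoding."""
    with open(path, mode="a", encoding=encoding) as f:
        for line in lines:
            f.write(f"{line}\n")


def demo() -> bool:
    """Return True if a restrictive codec fails where UTF-8 succeeds."""
    tmpdir = tempfile.mkdtemp()
    # Two non-ASCII characters followed by a label name.
    text = ["\u6771\u4eac LOC"]
    # UTF-8 can encode any string, so this write succeeds.
    write_lines(os.path.join(tmpdir, "ok.txt"), text, "utf-8")
    try:
        # ASCII cannot encode the first two characters.
        write_lines(os.path.join(tmpdir, "bad.txt"), text, "ascii")
    except UnicodeEncodeError:
        return True
    return False
```

If an exception like this is raised inside the export task, the task finishes without producing a usable file, which would be consistent with the symptoms reported above, although the thread does not confirm that this is the only cause.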

@Hironsan
Member

Fixed #1754

Hironsan moved this from To do to In progress in v1.7.0 Apr 13, 2022
@teohsinyee

teohsinyee commented Apr 19, 2022

I can't find a writer.py file, but I do have writers.py.
When I open that file, I can't find the class that you mentioned: class LineWriter(BaseWriter)

image

v1.7.0 automation moved this from In progress to Done Apr 25, 2022