MPII Human Pose Dataset #106

Closed
kristinagrig06 opened this issue Oct 19, 2020 · 21 comments · Fixed by #168

@kristinagrig06
Contributor

Describe the dataset

Add the MPII Human Pose Dataset to Hub so that the following works:

import hub
ds = hub.load("username/mpii-human-pose-dataset")

Steps

  1. Please take a look at the docs on uploading datasets.

  2. The upload script should be added to the examples folder.

Example

You can find an example of loading and uploading a large dataset here:

@sanchitvj
Contributor

@kristinagrig06 I would like to work on this issue, please assign me.

@mikayelh
Collaborator

Hi @sanchitvj ! Assigned you to this issue. Thanks for your willingness to contribute! Let me know if you have any questions! :)

@mikayelh
Collaborator

Hi, @sanchitvj! Hope this finds you well. Dropping a note to check in on you and ask if you need a hand with uploading the dataset. Feel free to ask us in the GitHub Discussions (we have beta access!) or our dedicated Slack channel. Thanks a mil!

@sanchitvj
Contributor

sanchitvj commented Oct 21, 2020 via email

@AbhinavTuli
Contributor

That's great @sanchitvj. Here's a tutorial for uploading datasets using Hub that might be helpful for you!

@sanchitvj
Contributor

I have one question: do I need to know the Hub codebase?

@AbhinavTuli
Contributor

@sanchitvj No, it's not required but feel free to take a look if you ever want to understand how something is working under the hood!

@sanchitvj
Contributor

sanchitvj commented Oct 22, 2020

@AbhinavTuli is there an example available of how to use Hub to load a dataset, visualize the data (for instance, to see what it contains), and train on it with TensorFlow? The dataset I'm working on is challenging to use, process, and train on.

@AbhinavTuli
Contributor

@sanchitvj did you take a look at the tutorial mentioned above? It has links to a couple of examples that would be helpful.
Here's an example that includes training as well, https://github.com/activeloopai/Hub/tree/master/examples/fashion-mnist.
Let me know if you have any particular doubts. I'd be happy to help.

@sanchitvj
Contributor

sanchitvj commented Oct 23, 2020

@AbhinavTuli Could you explain what the CocoGenerator class is doing? I'm having difficulty understanding it and what its output looks like. The COCO upload example doesn't make this clear because I can't see the outputs. I've done most of the work and just need to sort out the generator. The COCO upload example isn't very useful here because the MPII annotations are not in the same format as COCO's. Can you guide me on how to write a generator function for this purpose, and which code files from hub collections I should read to get the basic idea?

@AbhinavTuli
Contributor

@sanchitvj sorry for getting back to you so late, somehow missed this.
The purpose of the generator class is to take a single item from a list and return a dictionary of numpy arrays. The dictionary will contain separate keys corresponding to each feature of the dataset(i.e. for images and for all the different annotations in MPII). You don't really need to go too much into how hub collections work for this.
Did you get a chance to go through the tutorial: https://github.com/activeloopai/Hub/discussions/125?
Also, take a look at this example: https://github.com/activeloopai/omdena-aerial/blob/master/store_omdena.py. It's a little easier to understand than the COCO example.
If it's still not clear, do join our dedicated Slack channel and we can set up a call to discuss in detail.
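
To make the idea above concrete, here is a minimal sketch in plain Python and NumPy of a callable that takes one item and returns a dictionary of arrays. It deliberately does not use Hub's own base class or schema API. `MPIISampleGenerator`, `fake_loader`, the output key names, and the example values are all hypothetical, and the annotation field names are only assumed to match the MPII annotation file you are working from.

```python
import numpy as np


class MPIISampleGenerator:
    """Hypothetical generator: one annotation dict in, one dict of arrays out."""

    def __init__(self, load_image):
        # load_image is an assumed callable: file name -> HxWx3 uint8 array.
        self.load_image = load_image

    def __call__(self, annotation):
        # `annotation` is a single item of the input list, not an index into it.
        return {
            "image": np.asarray(self.load_image(annotation["img_paths"]), dtype=np.uint8),
            "joints": np.asarray(annotation["joint_self"], dtype=np.float32),
            "objpos": np.asarray(annotation["objpos"], dtype=np.float32),
            "scale": np.asarray(annotation["scale_provided"], dtype=np.float32),
        }


def fake_loader(file_name):
    # Stand-in for reading the JPEG from disk; returns a blank 720x1280 RGB image.
    return np.zeros((720, 1280, 3), dtype=np.uint8)


# Illustrative values only, not taken from the real dataset.
example_annotation = {
    "img_paths": "example.jpg",
    "joint_self": [[10.0, 20.0, 1.0], [30.0, 40.0, 0.0]],
    "objpos": [15.0, 25.0],
    "scale_provided": 2.5,
}

sample = MPIISampleGenerator(fake_loader)(example_annotation)
```

Each key of the returned dictionary corresponds to one feature of the dataset, which is the shape of output the generator is expected to produce.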

@sanchitvj
Contributor

@AbhinavTuli I'm almost done, but how can I check that the output is as expected? Here is my code. When I try to print the dataset, the output is '<hub.collections.dataset.core.Dataset object at 0x7f55ae0aac50>'. How can I check that it's working correctly?

@AbhinavTuli
Contributor

Hey @sanchitvj, you can test the code by using ds.store("./mpii"). This stores the dataset locally instead of uploading it to Hub and should be much faster.
You can then load the saved dataset and try iterating over it:

import hub
ds = hub.load("./mpii")
for item in ds:
    print(item["data"].compute())
    print(item["labels"].compute())

Just replace the keys ("data" and "labels") with your actual ones. Let me know how it goes!
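
If printing whole arrays is too noisy for large images, a small helper that reports only the shape and dtype of each field may be easier to read. This is a sketch in plain NumPy, not part of Hub: `describe_sample` is a hypothetical name, and it assumes each field has already been turned into an array with .compute() as in the loop above. The `sample` dictionary here is a stand-in for one such item.

```python
import numpy as np


def describe_sample(sample):
    """Return {key: (shape, dtype name)} for a dict of array-like values."""
    summary = {}
    for key, value in sample.items():
        arr = np.asarray(value)
        summary[key] = (arr.shape, arr.dtype.name)
    return summary


# Stand-in for one item after calling .compute() on each field.
sample = {
    "data": np.zeros((28, 28), dtype=np.uint8),
    "labels": np.array(3, dtype=np.int64),
}

summary = describe_sample(sample)
for key, (shape, dtype) in summary.items():
    print(key, shape, dtype)
```

Comparing these shapes and dtypes against what the upload script was meant to write is a quick way to spot a field that was stored incorrectly.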

@sanchitvj
Contributor

@AbhinavTuli This is the error I get after calling ds.store(). Can you help me with it?

`0
Traceback (most recent call last):
File "", line 45, in __call__
ds["image"][i] = np.array(Image.open(img_path + all[i]['img_paths']))
KeyError: 0
Stack (most recent call last):
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/distributed/threadpoolexecutor.py", line 55, in _worker
task.run()
File "/usr/local/lib/python3.6/dist-packages/distributed/_concurrent_futures_thread.py", line 65, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.6/dist-packages/distributed/worker.py", line 3411, in apply_function
result = function(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/distributed/worker.py", line 3304, in execute_task
return func(*map(execute_task, args))
File "/usr/local/lib/python3.6/dist-packages/hub/collections/dataset/__init__.py", line 13, in _generate
output = generator(input)
File "", line 65, in __call__
logger.error(e, exc_info=e, stack_info=True)
distributed.worker - WARNING - Compute Failed
Function: execute_task
args: ((<function generate at 0x7fdbcd107f28>, <main.MPIIGenerator object at 0x7fdb0e5c02e8>, (<class 'dict'>, [['dataset', 'MPI'], ['isValidation', 0.0], ['img_paths', '003353243.jpg'], ['img_width', 1280.0], ['img_height', 720.0], ['objpos', [984.0, 97.0]], ['joint_self', [[991.0, 109.0, 0.0], [972.0, 101.0, 0.0], [1040.0, 47.0, 1.0], [1071.0, 116.0, 1.0], [999.0, 222.0, 1.0], [1033.0, 248.0, 0.0], [1056.0, 82.0, 1.0], [942.0, 96.0, 1.0], [937.583, 95.954, 1.0], [851.417, 95.046, 1.0], [962.0, 39.0, 0.0], [0.0, 0.0, 0.0], [926.0, 52.0, 1.0], [957.0, 139.0, 1.0], [980.0, 211.0, 1.0], [926.0, 257.0, 1.0]]], ['scale_provided', 2.585], ['joint_others', [[672.0, 231.0, 1.0], [677.0, 151.0, 1.0], [672.0, 12.0, 1.0], [745.0, 89.0, 0.0], [757.0, 127.0, 1.0], [651.0, 65.0, 0.0], [709.0, 51.0, 0.0], [800.0, 67.0, 0.0], [780.16, 67.863, 1.0], [865.84, 64.137, 1.0], [707.0, 94.0, 1.0], [673.0, 22.0, 1.0], [763.0, 71.0, 1.0], [837.0, 62.0, 0.0], [814.0, 140.0, 1.0], [790.0, 220.0, 1.0]]], ['scale

kwargs: {}
Exception: AttributeError("'NoneType' object has no attribute 'keys'",)`

@AbhinavTuli
Contributor

I would probably need to look at the code to help you out, but this looks like an issue in the implementation of the __call__ method.
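
Without the code this is only a guess, but the two messages in the log (KeyError: 0, then 'NoneType' object has no attribute 'keys') are consistent with a __call__ that indexes its argument as if it were the whole list, catches the resulting exception, logs it, and then implicitly returns None. The sketch below reproduces that pattern in plain Python. `BrokenGenerator` and `FixedGenerator` are hypothetical and are not the code from this thread.

```python
import logging

logger = logging.getLogger(__name__)


class BrokenGenerator:
    def __call__(self, annotation):
        try:
            # Bug: `annotation` is already one dict, so indexing it with 0
            # raises KeyError: 0.
            return {"img_paths": annotation[0]["img_paths"]}
        except Exception as e:
            # Logging without re-raising means the method returns None.
            logger.error(e)


class FixedGenerator:
    def __call__(self, annotation):
        # Use the dict directly and let any exception propagate to the caller.
        return {"img_paths": annotation["img_paths"]}


annotation = {"img_paths": "example.jpg"}
broken_result = BrokenGenerator()(annotation)
fixed_result = FixedGenerator()(annotation)
```

Calling .keys() on `broken_result` would raise the same kind of AttributeError as in the log, so if your method has a similar try/except, re-raising the exception should surface the real error.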

@sanchitvj
Contributor

sanchitvj commented Oct 27, 2020

@AbhinavTuli Here is the code. How long do you think it will take to store this 13 GB of data?

@sanchitvj
Contributor

@AbhinavTuli @kristinagrig06 @davidbuniat The dataset is uploaded. It's visible in the app, and I've loaded and used it; it's working fine. Can I send the PR now with the example code?

sanchitvj added a commit to sanchitvj/Hub that referenced this issue Nov 1, 2020
@sanchitvj
Contributor

@AbhinavTuli I've sent the PR, but one of the checks is failing. Can you help me understand it?

sanchitvj added a commit to sanchitvj/Hub that referenced this issue Nov 1, 2020
@davidbuniat davidbuniat linked a pull request Nov 1, 2020 that will close this issue
2 tasks
@davidbuniat
Member

@sanchitvj there is a linting error from Black; if you can fix it, we're happy to merge! Thanks for making the dataset!

sanchitvj added a commit to sanchitvj/Hub that referenced this issue Nov 1, 2020
@sanchitvj
Contributor

@davidbuniat All build checks passed.

@davidbuniat
Member

@sanchitvj awesome! Once we've checked that the dataset is working, we'll merge the PR! Thanks for the awesome job!

sanchitvj added a commit to sanchitvj/Hub that referenced this issue Nov 5, 2020