
Add lookup table operator #1251

Merged: 4 commits, Sep 17, 2019

Conversation

jantonguirao
Contributor

@jantonguirao jantonguirao commented Sep 12, 2019

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Why do we need this PR?

We need a lookup operator, e.g. to look up weights for label values.

What happened in this PR?

  • Introduced LookupTable CPU and GPU implementations
  • Added Python tests to cover the new operator

JIRA TASK: [DALI-1050]
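
The label-to-weight use case mentioned above can be illustrated with a minimal pure-Python sketch. This is not the operator's implementation or its API: the function name `lookup`, the `default_value` fallback, and the example weights are assumptions made for illustration only.

```python
def lookup(inputs, keys, values, default_value=0.0):
    """Map each input through a key -> value table.

    Inputs that are not present in `keys` map to `default_value`.
    All names here are hypothetical, chosen for this sketch.
    """
    table = dict(zip(keys, values))
    return [table.get(x, default_value) for x in inputs]


# Hypothetical per-class weights for labels 0, 1 and 2; label 5 is unknown
labels = [0, 2, 1, 5]
weights = lookup(labels, keys=[0, 1, 2], values=[0.25, 1.0, 4.0], default_value=1.0)
print(weights)  # [0.25, 4.0, 1.0, 1.0]
```

Each label is replaced by its weight, and a label without an entry falls back to the default.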

@jantonguirao jantonguirao force-pushed the lookup_table_op branch 3 times, most recently from efe97d3 to 9003501 Compare September 13, 2019 11:48
@jantonguirao jantonguirao changed the title [WIP] Add lookup table operator Add lookup table operator Sep 13, 2019
@jantonguirao jantonguirao force-pushed the lookup_table_op branch 2 times, most recently from d9ecef5 to 052f67c Compare September 13, 2019 11:58
@jantonguirao
Contributor Author

!build

@dali-automaton
Collaborator

CI MESSAGE: [899100]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [899100]: BUILD PASSED

@klecki
Contributor

klecki commented Sep 16, 2019

Btw, did you forget to add python tests?

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao
Contributor Author

> Btw, did you forget to add python tests?

Yes, and the GPU implementation as well. I have pushed both now.

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao
Contributor Author

!build

@dali-automaton
Collaborator

CI MESSAGE: [902153]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [902153]: BUILD PASSED

Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao
Contributor Author

!build

@dali-automaton
Collaborator

CI MESSAGE: [903729]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [903729]: BUILD PASSED

@jantonguirao jantonguirao merged commit 9feeeb7 into NVIDIA:master Sep 17, 2019
00liujj pushed a commit to 00liujj/DALI that referenced this pull request Oct 10, 2019
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Jianjun Liu <00liujj@163.com>
@LiuHao-THU

Oops, only int is supported for input.

@JanuszL
Contributor

JanuszL commented Feb 10, 2020

@LiuHao-THU - what is your use case, and how do you want it to work?

@LiuHao-THU

LiuHao-THU commented Feb 10, 2020

> @LiuHao-THU - what is your use case, and how do you want it to work?

Thanks for the reply. Here is my situation:

I'm reimplementing Bayesian Personalized Ranking (a recommendation algorithm). The network is very small, but I have over 200000 users in my dataset. I'm currently using the PyTorch DataLoader; the code is below. self.cand is a dictionary (the lookup table in my dataset, with the user as key and the ratings as value). Even after I set num_workers in the data loader to a very large value, CPU usage is very high and the network still runs at low GPU utilization. So I want to load the dataset onto the GPU for preprocessing, but the LookupTable keys are limited to the range [0, 65535].

"""

"""
import random

import torch
from torch.utils import data


class BPRData(data.Dataset):
    """Yields (user, positive item, negative item) triples for BPR."""

    def __init__(self, users, items, candidates, num_items, is_training=True):
        super().__init__()
        self.users = users
        self.items = items
        self.is_training = is_training
        # Lookup table keyed by user; each value is iterated as that user's items
        self.cand = candidates
        self.all = set(range(num_items))
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def __len__(self):
        return len(self.users)

    def __getitem__(self, idx):
        user = self.users[idx]
        item_i = self.items[idx]
        if self.is_training:
            # Pick a random item that is not among this user's candidates
            neg_items = list(self.all - set(self.cand[int(user)]))
            item_j = random.choice(neg_items)
        else:
            # Outside training, the negative slot repeats the positive item
            item_j = item_i
        return user, item_i, item_j

"""
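
A possible contributor to the CPU load (an assumption, not something measured in this thread) is that `__getitem__` rebuilds the full list of negative items on every call. A minimal sketch of rejection sampling, which draws random item ids until one falls outside the user's rated set, is shown below. The helper name `sample_negative` and the example data are hypothetical, and the sketch assumes the rated items are stored as a set of ids in `[0, num_items)`.

```python
import random


def sample_negative(rated_items, num_items, rng=random):
    """Draw an item id in [0, num_items) that is not in rated_items.

    Assumes rated_items is a set of ids that all lie in [0, num_items).
    """
    if len(rated_items) >= num_items:
        raise ValueError("every item is rated; no negative item exists")
    while True:
        candidate = rng.randrange(num_items)
        if candidate not in rated_items:
            return candidate


# Hypothetical data: user 7 has rated items 1, 3 and 4 out of 6 items
cand = {7: {1, 3, 4}}
neg = sample_negative(cand[7], num_items=6)
```

When each user has rated only a small fraction of the items, the loop usually exits after one or two draws, so the per-sample cost no longer grows with the total number of items.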

@JanuszL
Contributor

JanuszL commented Feb 10, 2020

@LiuHao-THU - the lookup was designed for signal processing rather than for tabular data. Have you tried RAPIDS?
If you want, you can also extend the LookupTable operator to support a wider range of indices.
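
How the operator stores its table internally is not described in this thread. As a rough illustration of the general trade-off behind a bounded key range, the sketch below uses a flat list when all keys fit within the 65535 bound mentioned above and falls back to a hash map for wider or sparse key ranges. `build_lookup` and `MAX_DENSE_KEY` are hypothetical names, and this is not a description of DALI's code.

```python
MAX_DENSE_KEY = 65535  # upper key bound mentioned in this thread


def build_lookup(keys, values, default_value):
    """Return a function mapping an integer key to its value.

    Keys without an entry map to default_value. Hypothetical helper.
    """
    if keys and min(keys) >= 0 and max(keys) <= MAX_DENSE_KEY:
        # Small key range: a flat list indexed directly by key
        dense = [default_value] * (max(keys) + 1)
        for k, v in zip(keys, values):
            dense[k] = v
        return lambda k: dense[k] if 0 <= k < len(dense) else default_value
    # Wide or sparse key range: a hash map avoids allocating unused slots
    sparse = dict(zip(keys, values))
    return lambda k: sparse.get(k, default_value)


small = build_lookup([0, 2], [10, 20], default_value=-1)
wide = build_lookup([5, 200000], [1, 2], default_value=-1)
print(small(2), small(1), wide(200000), wide(7))  # 20 -1 2 -1
```

The dense path gives constant-time indexing at the cost of memory proportional to the largest key, which is one common reason for capping the key range.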
