disable dbm trick
junchenfeng committed Aug 2, 2015
1 parent e3f5351 commit 2bdff23
Showing 3 changed files with 13 additions and 15 deletions.
17 changes: 5 additions & 12 deletions README.md
@@ -39,18 +39,6 @@ when try to calibrate item parameters for online testing bank, such assumption
breaks down and the algorithm runs into sparse data problem, as well as severe
missing data problem.

## "Big" Data
The EM algorithm requires two essential dictionaries for its analysis routine. One
maps items to users and the other maps users to items. Python dictionaries are not
memory efficient, so pyirt uses an on-disk dbm instead. The limit on data size is
about 1/4 of the hard drive size. I doubt any dataset will be that large.

Performance suffers greatly when using the dbm. For a 10 million record
dataset, the loading time increases by about 5 times and the computation time
increases by about 3 times. Putting the temp folder in memory does not reduce
the time by 10%.

Users should be aware of this when invoking the 'dbm' mode.

## Missing Data

@@ -143,6 +131,11 @@ VII.ToDos
(3) The solver cannot handle group constraints.


## BIG DATA
dbm is a workaround when the data are too large for memory. However, Berkeley DB
is quite hard to install on some operating systems. Therefore, although the utl
module contains code snippets for the dbm trick, it is not shipped as standard.
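
For readers who want to see what the dbm trick amounts to, here is a minimal sketch, not pyirt's actual code. It assumes Python 3 and swaps in the standard library's `dbm.dumb` backend for bsddb so that nothing extra needs installing. The names `build_disk_maps` and `decode_pairs` are hypothetical; the `'id,atag;'` value encoding follows the `_process_data_dbm` lines shown in this commit.

```python
import dbm.dumb
import os


def build_disk_maps(records, tmp_dir):
    """Store item->user and user->item maps in two on-disk dbm files.

    `records` is an iterable of (uid, eid, atag) integer triples
    (an assumed layout). Each value is a byte string of 'id,atag;' pairs.
    """
    # Flag "n" always creates a new, empty database.
    item2user = dbm.dumb.open(os.path.join(tmp_dir, "item2user"), "n")
    user2item = dbm.dumb.open(os.path.join(tmp_dir, "user2item"), "n")
    for uid, eid, atag in records:
        ekey = ("%d" % eid).encode()
        ukey = ("%d" % uid).encode()
        # dbm values are bytes, so appending is read, concatenate, write back.
        item2user[ekey] = item2user.get(ekey, b"") + ("%d,%d;" % (uid, atag)).encode()
        user2item[ukey] = user2item.get(ukey, b"") + ("%d,%d;" % (eid, atag)).encode()
    return item2user, user2item


def decode_pairs(raw):
    """Turn b'1,1;2,0;' back into [(1, 1), (2, 0)]."""
    return [
        tuple(int(field) for field in pair.split(","))
        for pair in raw.decode().split(";")
        if pair
    ]
```

Every append costs a disk read and a disk write, which is consistent with the slowdown the old README reported for the dbm mode. Call `close()` on both handles when finished.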



VIII.Acknowledgement
1 change: 0 additions & 1 deletion pyirt/solver/model.py
@@ -13,7 +13,6 @@

from ..utl import clib, tools, loader
from ..solver import optimizer
# import cython


class IRT_MMLE_2PL(object):
10 changes: 8 additions & 2 deletions pyirt/utl/loader.py
@@ -12,11 +12,15 @@
import os
import subprocess

'''
# bsddb3 is hard to install
#
# example to install bsddb3 on OSX
# brew install Berkeley-db
# YES_I_HAVE_THE_RIGHT_TO_USE_THIS_BERKELEY_DB_VERSION=TRUE BERKELEYDB_DIR=/usr/local/Cellar/berkeley-db/6.1.19 pip install bsddb3
# YES_I_HAVE_THE_RIGHT_TO_USE_THIS_BERKELEY_DB_VERSION=TRUE
# BERKELEYDB_DIR=/usr/local/Cellar/berkeley-db/6.1.19
# pip install bsddb3
try:
import bsddb as diskdb
except:
@@ -28,6 +32,7 @@
# Compatible with bsddb3
if hasattr(diskdb, "hashopen"):
diskdb.open = diskdb.hashopen
'''

import collections as cos

@@ -224,7 +229,7 @@ def _process_data_dbm(self, uids, eids, atags):

self.item2user['%d' % eid] += '%d,%d;' % (uid, atag)
self.user2item['%d' % uid] += '%d,%d;' % (eid, atag)

'''
def _init_right_wrong_map_bdm(self):
os.system("rm -f %s" % (self.tmp_dir + '/right_map.db'))
os.system("rm -f %s" % (self.tmp_dir + '/wrong_map.db'))
@@ -248,6 +253,7 @@ def _init_right_wrong_map_bdm(self):
self.wrong_map[eidstr] = '%d' % uid_idx
else:
self.wrong_map[eidstr] += ',%d' % uid_idx
'''

def _init_right_wrong_map_memory(self):
self.right_map = cos.defaultdict(list)
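
The in-memory path that remains after this commit can be pictured as follows. This is a sketch under assumptions, not the body of `_init_right_wrong_map_memory`, which the diff truncates. The function name, the `(uid, eid, atag)` record layout, the `uid_index` lookup, and the convention that `atag == 1` marks a correct answer are all hypothetical.

```python
import collections as cos


def init_right_wrong_map_memory(records, uid_index):
    """Group user indices by item, split by right and wrong answers.

    `records` is an iterable of (uid, eid, atag) triples and `uid_index`
    maps each uid to a row index; both are assumed inputs.
    """
    right_map = cos.defaultdict(list)
    wrong_map = cos.defaultdict(list)
    for uid, eid, atag in records:
        # Assumption: atag == 1 means the user answered the item correctly.
        target = right_map if atag == 1 else wrong_map
        target[eid].append(uid_index[uid])
    return right_map, wrong_map
```

Compared with the disk-backed variant, lists of integers replace comma-joined strings, so nothing has to be parsed when the maps are read back.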