zohmg / zohmg

Zohmg is a data store for aggregation of multi-dimensional time series data, built on top of Hadoop, Dumbo and HBase.

This URL has Read+Write access

avtobiff (author)
Wed Aug 19 06:43:29 -0700 2009
commit  e8b962370e879d975969e15c8f100ada1af9ab8b
tree    69d62d2202ed5ec981c1ff85a94e5ba2dc16d350
parent  559df131b6c866f3e1bea72bab95520999bf20bf
zohmg / FAQ
100644 116 lines (68 sloc) 4.131 kb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
 
+ Can I use LZO compression?
 
Sure! Just add the --lzo flag to your import like so:
$> zohmg import mapper.py data/ --lzo
 
 
+ Can I pass any argument to Dumbo and/or Hadoop Streaming?
 
Sure! Just append the arguments of your choice like so:
$> zohmg import mapper.py data/ -hadoopconf someoption=yeah
 
The Dumbo wiki has more information on what options other than
hadoopconf are supported: http://wiki.github.com/klbostee/dumbo/running-programs
 
 
+ My mapper raises an ImportError. What's wrong?
 
The mapper fails to import a module. Make sure you put the eggs for
all non-standard python modules that your mapper use in the lib
directory of your application.
 
 
+ What does this exception mean and how do I fix it?
 
java.lang.NullPointerException
  at org.apache.hadoop.io.BytesWritable.(BytesWritable.java:50)
  at org.apache.hadoop.typedbytes.TypedBytesWritable.(TypedBytesWritable.java:41)
  at org.apache.hadoop.streaming.io.TypedBytesOutputReader.getLastOutput(TypedBytesOutputReader.java:73)
<snip>
 
Check the stdout and stderr in the Hadoop webui for more information. It could be:
* That your mapper outputs all the dimensions you have specified in your dataset.yaml file.
* That there's a python library missing, check the DEPENDENCIES file for more information.
 
 
+ What does this error mean?
TypeError: unsupported operand type(s) for +: 'int' and 'str'
 
The units you output have to be of type int. In this case it's a string instead,
you can typecast it like so: int(variable)
 
 
+ What does this error mean?
TypeError: sequence item 0: expected string, int found
 
The dimension attributes must be strings. In this case it's an int, so
you would have to typecast it like so: str(variable).
 
 
+ How can I speed up my mapper?
 
If you have the native libs built, enabling compression of map output may speed things up.
Set the flag like so: zohmg import <mapper> <input> -jobconf mapred.compress.map.output=true
 
 
+ How do I bundle python modules that my code needs?
 
Make an egg out of the module with easy_install and put that egg in the lib directory:
 
 $> easy_install -zmaxd . yrfavoritemodule
 $> cp yrfavoritemodule-0.1.egg ~/zohmg/lib/egg
 
 
+ Why am I getting 'OSError: Permission denied' when running an import?
 
It could be that you do not have write permissions to
'lib/usermapper.py'. Make sure the user you're running the job as has
sufficient permissions.
 
 
+ I get this error when installing:
build/temp.linux-i686-2.5/check_libyaml.c:2:18: error: yaml.h: No such file or directory
etc. Should I worry?
 
There's likely no reason to worry. PyYaml emits this error when it
tries (and fails) to build a C version of the yaml code. If you can
'import yaml' at the python interpreter you're golden.
 
 
+ I get this error when installing:
zipimport.ZipImportError: bad local file header in/usr/lib/python2.5/site-packages/zohmg-0.0.1-py2.5.egg
What's wrong?
 
That happens for no good reason when re-installing zohmg. Running the
installation again always seems to fix it (in fact, install.py already
tried a second time and is likely to have suceeded).
 
 
+ I get this error when importing: ImportError: cannot import name map
 
It seems like your mapper python file does not define a map() function.
 
 
+ My imports runs just fine locally but seem to do the wrong thing when I run
them on the cluster. What's up?
 
Make sure you don't have any old eggs laying around on the cluster node. Zohmg
ships everything it needs, so there should be no need to have the zohmg egg, for
example, installed on the nodes.
 
+ I see this on Hadoop 0.20 and HBase 0.20/trunk:
2009-06-16 14:36:40,464 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.NoClassDefFoundError: org/apache/zookeeper/Watcher
 
That's funny, it looks as if the Zookeeper jar is not on the classpath. Make sure it is.
 
 
+ I see this on Hadoop 0.20 and HBase 0.20/trunk:
2009-06-16 14:48:26,495 FATAL org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Fail to read properties from zoo.cfg
java.io.IOException: zoo.cfg not found
 
You need to have zoo.cfg in Hadoop's conf/ directory.
$> cp -v $HBASE_HOME/conf/zoo.cfg $HADOOP_HOME/conf