Skip to content
Newer
Older
100644 253 lines (191 sloc) 11.5 KB
015b879 Added the source files for all articles so far.
YorickPeterse authored
1 Almost any application will eventually need to store a collection of passwords or another
2 type of data that has to be stored using a hashing algorithm. Blogs, forums, issue
3 trackers, they all need to store user data and these passwords. This article covers the
4 common mistakes made when dealing with passwords and what you should use instead. In order
5 to fully understand this article some basic knowledge of programming and computers is
6 required, you should also know a bit about the common hashing algorithms such as MD5 and
7 SHA1.
8
9 ## The Problem
10
11 When developing applications developers make the common mistake of thinking they have a
12 solid understanding of how hashing works. They think that by doing X they're done and
13 perfectly safe. Guess what, that's not the case (not even close). The following mistakes
14 are the most common:
15
16 * Using a broken algorithm (MD5, SHA1)
17 * Hashing a password N times in the form of hash( hash(password) ) * N
18 * Limiting the length of passwords to N characters
19
20 We'll start with the first problem. Up until a few years ago MD5 was the most common
21 hashing algorithm used for passwords (and other data as well). MD5 was considered to be
22 pretty safe until a group of people managed to prove how weak it really was: they were
23 able to generate a set of collisions in a relatively short amount of time (a few hours or
24 so). This set off a chain reaction and many more flaws were found.
25
26 Luckily MD5 isn't the only hashing algorithm out there, there's SHA1 and the SHA2 family
27 as well as a few other ones. SHA1-SHA2 are much strong than MD5 and at the time of writing
28 (April 2011) only SHA1 has been compromised. Technically it would take serious amount of
29 time to crack SHA1 but the idea of using an algorithm that *can* be cracked before
30 humanity is wiped out should be enough for people to not use it for privacy related data.
31
32 So why are collisions bad? Can't we just use a very very long password or use method X
33 (insert your favorite counter measure)? Yes, you can. The problem however isn't fixed,
34 you're merely making the process slower rather than fixing the actual root of the problem.
35 Time for an example. Assuming we have a hashing function called "hash" and two strings,
36 A and B (where A and B are unique), our hashing process of these strings would look like
37 the following:
38
39 pwd1 = hash(A)
40 pwd2 = hash(B)
41
42 In this case both pwd1 and pwd2 are unique. At this point a lot of people think they're
43 good to go as they assume nobody is willing to wait for a certain period of time before
44 they're able to crack the password, this is a *very* stupid mistake. While trying to
45 crack a password (by bruteforcing it for example) may take a long time on a single computer
46 most hackers can easily boot up a few servers or even worse, use a botnet. All known
47 hashing algorithms (except BCrypt, more on that later) are affected by a single common
48 problem: [Moore's law][moore's law]. Moore's law states that every two years the amount
49 of transistors that can be put in a computer doubles. This means that the faster computers
50 get the quicker they're able to crack a password. A hacker merely has to use N computers
51 and the time required to retrieve the original password will be greatly reduced.
52
53 Because of this problem developers try to come up with solutions. These solutions don't
54 actually solve the problem, they just make it harder and require more time. A common "fix"
55 is to hash a password N times and then save it in the database. Developers do this for
56 a few reasons:
57
58 * It's supposed to be slower
59 * In order to retrieve the original password a hacker has to crack multiple
60 hashes instead of only one.
61
62 The fun thing is that this entire process doesn't actually make the password more secure,
63 in a lot of cases it will even make it *less* secure. The first reason is pretty easy to
64 bust: simply add more hardware (or better hardware) and you're good to go. The second
65 reason is a bit harder to bust as it depends on the algorithm that is used. If we look
66 back at our hash() function the process of hashing a hash multiple times would look like
67 the following:
68
69 hash = hash( hash(hash(A)) )
70
71 In this example there are 3 calls to the hashing function. If A was "yorick" this would
72 look a bit like the following:
73
74 hash(yorick) -> j238103
75 hash(j238103) -> a9shda9
76 hash(a9shda9) -> 11s08j1
77
78 In this case "11s08j1" is the final hash that will be stored in our database. At this
79 point developers usually lay down their work and take a coffee or a tea thinking they've
80 done a good job and are hacker proof. Guess what, they're not. What just happened is that
81 the process of hashing A multiple times actually increased the possibility of a hash
82 collision. While we do have to crack the hashing process N times for each call
83 to hash() we don't actually have to start at the very end (with "11s08j1"). The reason for
84 this is that "11s08j1" isn't directly based on "yorick" but on "a9shda9". This means that
85 we merely have to find the hash that results in "11s08j1" when using our hash function.
86 If we find a collision we can simply crack it again and we'd end up with our
87 original password.
88
89 In order to explain this properly I simplified the process of hashing A N times:
90
91 password --> hash 1 --> hash 2 --> final hash
92
93 In order to retrieve the original password ("password") we'd have to find a collision for
94 "hash 2". We can't use hash 1 as it's source ("password") can be considered totally random
95 and would take more time. However, the source of hash 2 is much easier due one big issue:
96 the entropy (the amount of possible combinations) of the password has been decreased. If
97 we look back at the previous example we know the final hash is "11s08j1" and that the
98 original password is "yorick". Using various techniques (rainbow tables, bruteforcing, etc)
99 we can quickly identify the source of "final hash". The value of "hash 2" is "a9shda9",
100 while in our example this looks more random (it is) than the original password common
101 hashing algorithms only use regular characters (letters and numbers) for their output. A
102 good example of this is the following Ruby example:
103
104 require 'digest'
105
106 password = 'as9(A*&SD&(@))'
107 hash = Digest::SHA1.new.hexdigest(password)
108
109 p hash # => "d4c36f9b1f003bee2e5dcafdf6b006110709dfb5"
110
111 The hash of the password (which is just something I randomly typed on my keyboard) may be
112 longer but it only uses letters and numbers opposed to all the gibberish in the original
113 password. The same happens with our hash() function and this allows us to quickly retrieve
114 the original password. If we have the original hash of "final hash" we can then simply
115 continue reversing the process until we end up at "yorick".
116
117 The reason why you can't initially find the source of "hash 2" is because you can't find
118 out what "hash 1" is because it's not stored somewhere while "final hash" is.
119
120 To cut a long story short, hashing a hash N times doesn't make your passwords more secure
121 and can actually make it less secure as a hacker can quite easily reverse the process by
122 generating hash collisions.
123
124 ## The Solution
125
126 It has already been mentioned before but the solution is to use an algorithm called
127 "BCrypt". BCrypt is a hashing algorithm based on [Blowfish][blowfish] with a small twist:
128 it keeps up with Moore's law. The idea of BCrypt is quite simple, don't just use regular
129 characters (and thus increasing the entropy) and make sure password X always takes the
130 same amount of time regardless of how powerful the hardware is that's used to generate X.
131 I'm not going to cover all the technical details but basically BCrypt requires you to
132 specify a cost/workfactor in order to generate a password. This workfactor not only makes
133 the entire process slower but is also used to generate the end hash. This means that if
134 somebody were to change the workfactor the hash would also be different. In other words,
135 hackers, you're fucked. In order for a hacker to gain the original password he must use
136 the same workfactor and thus has to wait N times longer than when not using a workfactor.
137
138 Time for an example in Ruby:
139
140 require 'benchmark'
141 require 'bcrypt'
142
143 password = 'yorick'
144 amount = 100
145
146 Benchmark.bmbm(20) do |run|
147
148 run.report("Cost of 5") do
149 amount.times do
150 hash = BCrypt::Password.create(password, :cost => 5)
151 end
152 end
153
154 run.report("Cost of 10") do
155 amount.times do
156 hash = BCrypt::Password.create(password, :cost => 10)
157 end
158 end
159
160 run.report("Cost of 15") do
161 amount.times do
162 hash = BCrypt::Password.create(password, :cost => 15)
163 end
164 end
165
166 end
167
168 For the non Ruby people, this is a simple benchmark script that shows the time it takes
169 to hash "yorick" with BCrypt with a cost/workfactor of 5, 10 and 15 a total of 100 times.
170 The results of this benchmark would look like the following:
171
172 Rehearsal -------------------------------------------------------
173 Cost of 5 0.250000 0.000000 0.250000 ( 0.249723)
174 Cost of 10 7.740000 0.010000 7.750000 ( 7.879849)
175 Cost of 15 247.510000 0.460000 247.970000 (255.346897)
176 -------------------------------------------- total: 255.970000sec
177
178 user system total real
179 Cost of 5 0.250000 0.000000 0.250000 ( 0.272549)
180 Cost of 10 7.750000 0.030000 7.780000 ( 8.442511)
181 Cost of 15 247.530000 0.480000 248.010000 (254.815985)
182
183 The column we're really interested in is the "real" column. As you can see a cost of 5
184 only takes about 250 miliseconds while a cost of 15 takes a whopping 250 seconds (around
185 4 minutes).
186
187 To cut another long story short: BCrypt adopts to Moore's law and makes it impossible for
188 a hacker to crack a password using rainbow tables or other techniques.
189
190 ## Implementations
191
192 The BCrypt hashing algorithm is implemented in quite a few languages. I've collected a
193 list of resources for various languages so you can start using BCrypt right away.
194
195 ### PHP
196
197 PHP allows you to use BCrypt passwords using the [crypt()][php crypt] function. This works
198 as following:
199
200 <?php
201
202 $hash = crypt('rasmuslerdorf', '$2a$07$usesomesillystringforsalt$');
203
204 ### Ruby
205
206 For Ruby there's a gem called "bcrypt-ruby" which can be installed using Rubygems:
207
208 $ gem install bcrypt-ruby
209
210 Once installed you can use it as following:
211
212 require 'bcrypt'
213
214 hash = BCrypt::Password.create('yorick', :cost => 10)
215
216 ### Perl
217
218 For Perl there's [Crypt::Eksblowfish][perl bcrypt] which works as following:
219
220 use Crypt::Eksblowfish::Bcrypt qw(bcrypt_hash);
221
222 $salt = '1p23j1-9381-23';
223 $password = 'yorick';
224 $hash = bcrypt_hash({
225 key_nul => 1,
226 cost => 10,
227 salt => $salt,
228 }, $password);
229
230 ### Others
231
232 * Python has [The Python Cryptography Toolkit][pycrypto]
233 * Lua seems to have [this][lua bcrypt] implementation
234 * There's an [Erlang implementation][erlang bcrypt] as well
235
236 ## Special Thanks
237
238 I'd like to thank the following IRC folks for helping me out (all of them can be found
239 on Freenode):
240
241 * squeeks from \#forrst-chat
242 * amr from \#forrst-chat
243 * dominikh from \#ramaze
244
245 [sha wikipedia]: http://en.wikipedia.org/wiki/SHA-1
246 [moore's law]: https://secure.wikimedia.org/wikipedia/en/wiki/Moore's_law
247 [blowfish]: http://en.wikipedia.org/wiki/Blowfish_(cipher)
248 [php crypt]: http://nl3.php.net/manual/en/function.crypt.php
249 [perl bcrypt]: http://search.cpan.org/dist/Crypt-Eksblowfish/
250 [pycrypto]: https://github.com/dlitz/pycrypto
251 [lua bcrypt]: https://github.com/silentbicycle/lua-bcrypt
252 [erlang bcrypt]: https://github.com/skarab/erlang-bcrypt
Something went wrong with that request. Please try again.