-
Notifications
You must be signed in to change notification settings - Fork 0
/
Discussion.txt
145 lines (124 loc) · 2.73 KB
/
Discussion.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
I tried thinking of a way to rank how different two strings are.
The simplest would be to iterate both and count how many characters are different:
abcze
abcde
|
1
but that'll lead to erroneously large ranks if a character is accidently inserted:
azbcdef
abcdef
||||||
123456
So I thought, I could try matching as many characters in order as possible
azbcdef
|/////
abcdef
mismatch:1 (z)
I was working out a rather complicated algorithm for this, and it could have worked, but I was having difficultly with deciding how to rank the difference. Number of mismatching characters? what if both strings contain mismatched characters? Should I prioritize one? That doesn't seem right.
But then I discovered the minimum edit distance algorithm. It works perfectly. It counts how many insertions, removals, or character swaps are needed to convert one string to another. It even works both ways because the same number of edits are needed to convert from string a to string b as is needed to convert from string b to string a, so I don't need to mark one of the stings as priority or master or anything like that.
It was also a lot simpler to program than I thought. Much simpler than the solution I was going to do.
And of course, I sent this file through the program too.
java SpellCheck words.txt Discussion.txt
itereate: did you mean:
1. iterate
2. overeate
3. mitergate
4. literate
5. terebate
6. biternate
7. iterates
8. iterated
9. terete
10. increate
11. itereate (no change)
0. something else
1
abcze: did you mean:
1. abaze
2. abc
3. ace
4. amaze
5. amuze
6. ancle
7. ancre
8. arcae
9. arche
10. abave
11. abcze (no change)
0. something else
11
abcde: did you mean:
1. abede
2. abide
3. abode
4. abrade
5. abd
6. abc
7. ace
8. ade
9. bcd
10. bde
11. abcde (no change)
0. something else
11
aroniously: did you mean:
1. ironiously
2. harmoniously
3. wrongously
4. atrociously
5. aroniously (no change)
0. something else
0
somehing else:erroneously
azbcdef: did you mean:
1. azbcdef (no change)
0. something else
1
abcdef: did you mean:
1. abided
2. abider
3. abides
4. abede
5. abide
6. abode
7. aboded
8. abodes
9. abcdef (no change)
0. something else
9
azbcdef: did you mean:
1. azbcdef (no change)
0. something else
1
abcdef: did you mean:
1. abided
2. abider
3. abides
4. abede
5. abide
6. abode
7. aboded
8. abodes
9. abcdef (no change)
0. something else
9
dificulty: did you mean:
1. difficulty
2. difficultly
3. difficult
4. dificulty (no change)
0. something else
2
missmatching: did you mean:
1. mismatching
2. mispatching
3. missmatching (no change)
0. something else
1
missmatched: did you mean:
1. mismatched
2. mismatches
3. mispatched
4. missmatched (no change)
0. something else
1