first commit

Patrick Wilson-Welsh committed Mar 2, 2011
0 parents commit cead14b6446bdfdf56cbf38a0ff8457cc9885801
Showing 401 changed files with 16,229 additions and 0 deletions.
@@ -0,0 +1,10 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<classpath>
+ <classpathentry kind="src" path="src"/>
+ <classpathentry kind="src" path="test"/>
+ <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
+ <classpathentry kind="con" path="org.eclipse.jdt.junit.JUNIT_CONTAINER/4"/>
+ <classpathentry kind="lib" path="lib/cyvis.jar"/>
+ <classpathentry kind="lib" path="lib/asm-all-2.1.jar"/>
+ <classpathentry kind="output" path="bin"/>
+</classpath>
@@ -0,0 +1,17 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<projectDescription>
+ <name>Refactor_BankOCR_Whack_A_Method</name>
+ <comment></comment>
+ <projects>
+ </projects>
+ <buildSpec>
+ <buildCommand>
+ <name>org.eclipse.jdt.core.javabuilder</name>
+ <arguments>
+ </arguments>
+ </buildCommand>
+ </buildSpec>
+ <natures>
+ <nature>org.eclipse.jdt.core.javanature</nature>
+ </natures>
+</projectDescription>
@@ -0,0 +1,217 @@
+You can find the original description of this kata here: http://codingdojo.org/cgi-bin/wiki.pl?KataBankOCR
+
+The text as of December, 2010 is repeated here:
+
+-----------------------------------
+
+This Kata was presented at XP2006 by Emmanuel Gaillot and Christophe Thibaut.
+Problem Description
+
+User Story 1
+
+You work for a bank, which has recently purchased an ingenious machine to assist in reading letters and faxes sent in by branch offices. The machine scans the paper documents, and produces a file with a number of entries which each look like this:
+
+    _  _     _  _  _  _  _ 
+  | _| _||_||_ |_   ||_||_|
+  ||_  _|  | _||_|  ||_| _|
+
+Each entry is 4 lines long, and each line has 27 characters. The first 3 lines of each entry contain an account number written using pipes and underscores, and the fourth line is blank. Each account number should have 9 digits, all of which should be in the range 0-9. A normal file contains around 500 entries.
+
+Your first task is to write a program that can take this file and parse it into actual account numbers.
+
+User Story 2
+
+Having done that, you quickly realize that the ingenious machine is not in fact infallible. Sometimes it goes wrong in its scanning. The next step therefore is to validate that the numbers you read are in fact valid account numbers. A valid account number has a valid checksum. This can be calculated as follows:
+
+account number: 3 4 5 8 8 2 8 6 5
+position names: d9 d8 d7 d6 d5 d4 d3 d2 d1
+
+checksum calculation:
+(d1+ 2*d2 + 3*d3 + .. + 9*d9) mod 11 = 0
+So now you should also write some code that calculates the checksum for a given number, and identifies if it is a valid account number.
+
+User Story 3
+
+Your boss is keen to see your results. He asks you to write out a file of your findings, one for each input file, in this format:
+
+457508000
+664371495 ERR
+86110??36 ILL
+i.e., the file has one account number per row. If some characters are illegible, they are replaced by a ?. A wrong checksum or an illegible number is noted in a second column indicating status.
+
+User Story 4
+
+It turns out that often when a number comes back as ERR or ILL it is because the scanner has failed to pick up on one pipe or underscore for one of the figures. For example
+
+    _  _  _  _  _  _     _ 
+|_||_|| || ||_   |  |  ||_ 
+  | _||_||_||_|  |  |  | _|
+The 9 could be an 8 if the scanner had missed one |. Or the 0 could be an 8. Or the 1 could be a 7. The 5 could be a 9 or 6. So your next task is to look at numbers that have come back as ERR or ILL, and try to guess what they should be, by adding or removing just one pipe or underscore. If there is only one possible number with a valid checksum, then use that. If there are several options, the status should be AMB. If you still can't work out what it should be, the status should be reported ILL.
+
+Clues
+
+I recommend finding a way to write out 3x3 cells on 3 lines in your code, so they form identifiable digits. Even if your code actually doesn't represent them like that internally. I'd much rather read
+
+"   " +
+"|_|" +
+"  |"
+than
+"   |_|  |"
+any day.
+When Christophe and Emmanuel presented this Kata at XP2005 they worked on a solution that made extensive use of recursion rather than iteration. Many people are more comfortable with iteration than recursion. Try this kata both ways.
+
+Some gotchas to avoid:
+
+ - Be very careful to read the definition of the checksum correctly. It is not a simple dot product; the digits are reversed from what you might expect.
+ - The spec does not list all the possible alternatives for valid digits when one pipe or underscore has been removed or added.
+ - Don't forget to try to work out what a ? should have been by adding or removing one pipe or underscore.
+Suggested Test Cases
+
+If you want to just copy and paste these test cases into your editor, I suggest first clicking "edit this page" so you can see the source. Then you can be sure to copy across all the whitespace necessary. Just don't save any changes by mistake.
+
+use case 1
+ _  _  _  _  _  _  _  _  _ 
+| || || || || || || || || |
+|_||_||_||_||_||_||_||_||_|
+
+=> 000000000
+                           
+  |  |  |  |  |  |  |  |  |
+  |  |  |  |  |  |  |  |  |
+
+=> 111111111
+ _  _  _  _  _  _  _  _  _ 
+ _| _| _| _| _| _| _| _| _|
+|_ |_ |_ |_ |_ |_ |_ |_ |_ 
+
+=> 222222222
+ _  _  _  _  _  _  _  _  _ 
+ _| _| _| _| _| _| _| _| _|
+ _| _| _| _| _| _| _| _| _|
+
+=> 333333333
+                           
+|_||_||_||_||_||_||_||_||_|
+  |  |  |  |  |  |  |  |  |
+
+=> 444444444
+ _  _  _  _  _  _  _  _  _ 
+|_ |_ |_ |_ |_ |_ |_ |_ |_ 
+ _| _| _| _| _| _| _| _| _|
+
+=> 555555555
+ _  _  _  _  _  _  _  _  _ 
+|_ |_ |_ |_ |_ |_ |_ |_ |_ 
+|_||_||_||_||_||_||_||_||_|
+
+=> 666666666
+ _  _  _  _  _  _  _  _  _ 
+  |  |  |  |  |  |  |  |  |
+  |  |  |  |  |  |  |  |  |
+
+=> 777777777
+ _  _  _  _  _  _  _  _  _ 
+|_||_||_||_||_||_||_||_||_|
+|_||_||_||_||_||_||_||_||_|
+
+=> 888888888
+ _  _  _  _  _  _  _  _  _ 
+|_||_||_||_||_||_||_||_||_|
+ _| _| _| _| _| _| _| _| _|
+
+=> 999999999
+    _  _     _  _  _  _  _ 
+  | _| _||_||_ |_   ||_||_|
+  ||_  _|  | _||_|  ||_| _|
+
+=> 123456789
+
+use case 2
+
+The checksum validation should work for the following test account numbers:
+
+123456789
+000000000
+345882865
+000000051
+
+
+use case 3
+ _  _  _  _  _  _  _  _    
+| || || || || || || ||_   |
+|_||_||_||_||_||_||_| _|  |
+
+=> 000000051
+    _  _  _  _  _  _     _ 
+|_||_|| || ||_   |  |  | _ 
+  | _||_||_||_|  |  |  | _|
+
+=> 49006771? ILL
+    _  _     _  _  _  _  _ 
+  | _| _||_| _ |_   ||_||_|
+  ||_  _|  | _||_|  ||_| _ 
+
+=> 1234?678? ILL
+
+use case 4
+                           
+  |  |  |  |  |  |  |  |  |
+  |  |  |  |  |  |  |  |  |
+
+=> 711111111
+ _  _  _  _  _  _  _  _  _ 
+  |  |  |  |  |  |  |  |  |
+  |  |  |  |  |  |  |  |  |
+
+=> 777777177
+ _  _  _  _  _  _  _  _  _ 
+ _|| || || || || || || || |
+|_ |_||_||_||_||_||_||_||_|
+
+=> 200800000
+ _  _  _  _  _  _  _  _  _ 
+ _| _| _| _| _| _| _| _| _|
+ _| _| _| _| _| _| _| _| _|
+
+=> 333393333
+ _  _  _  _  _  _  _  _  _ 
+|_||_||_||_||_||_||_||_||_|
+|_||_||_||_||_||_||_||_||_|
+
+=> 888888888 AMB ['888886888', '888888880', '888888988']
+ _  _  _  _  _  _  _  _  _ 
+|_ |_ |_ |_ |_ |_ |_ |_ |_ 
+ _| _| _| _| _| _| _| _| _|
+
+=> 555555555 AMB ['555655555', '559555555']
+ _  _  _  _  _  _  _  _  _ 
+|_ |_ |_ |_ |_ |_ |_ |_ |_ 
+|_||_||_||_||_||_||_||_||_|
+
+=> 666666666 AMB ['666566666', '686666666']
+ _  _  _  _  _  _  _  _  _ 
+|_||_||_||_||_||_||_||_||_|
+ _| _| _| _| _| _| _| _| _|
+
+=> 999999999 AMB ['899999999', '993999999', '999959999']
+    _  _  _  _  _  _     _ 
+|_||_|| || ||_   |  |  ||_ 
+  | _||_||_||_|  |  |  | _|
+
+=> 490067715 AMB ['490067115', '490067719', '490867715']
+    _  _     _  _  _  _  _ 
+ _| _| _||_||_ |_   ||_||_|
+  ||_  _|  | _||_|  ||_| _|
+
+=> 123456789
+ _     _  _  _  _  _  _    
+| || || || || || || ||_   |
+|_||_||_||_||_||_||_| _|  |
+
+=> 000000051
+    _  _  _  _  _  _     _ 
+|_||_|| ||_||_   |  |  | _ 
+  | _||_||_||_|  |  |  | _|
+
+=> 490867715
+
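The checksum rule (User Story 2) and the output format (User Story 3) described in the README above can be sketched in a few lines of Java. This is only an illustration: `ChecksumSketch`, `hasValidChecksum`, and `statusLine` are hypothetical names that do not appear in this commit, and the sketch assumes the account number has already been parsed into a string of digits and `?` characters.

```java
// Hypothetical helper, not part of this commit.
final class ChecksumSketch {

    // True when (d1 + 2*d2 + ... + 9*d9) mod 11 == 0, where d1 is the RIGHTMOST digit.
    // Assumes the string contains only the characters '0'..'9'.
    static boolean hasValidChecksum(String accountNumber) {
        int sum = 0;
        int length = accountNumber.length();
        for (int i = 0; i < length; i++) {
            int digit = accountNumber.charAt(i) - '0';
            int weight = length - i; // leftmost digit is d9, rightmost is d1
            sum += weight * digit;
        }
        return sum % 11 == 0;
    }

    // Formats one output row: the number alone, or the number followed by ILL or ERR.
    static String statusLine(String accountNumber) {
        if (accountNumber.indexOf('?') >= 0) {
            return accountNumber + " ILL"; // at least one illegible digit
        }
        return hasValidChecksum(accountNumber) ? accountNumber : accountNumber + " ERR";
    }
}
```

With the README's own examples, `statusLine("457508000")` yields the bare number, `statusLine("664371495")` appends `ERR`, and `statusLine("86110??36")` appends `ILL`.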
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,81 @@
+package convert;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.Map;
+
+public class AccountNumber {
+ public String textVersion = "";
+ public boolean isValid;
+ final static int WIDTH_OF_OCR_NUMERAL = 3;
+ final static int NUMBER_OF_DIGITS = 9;
+
+ Map<String, String> stickDigitsMappedToNumerals = new HashMap<String, String>();
+
+ public final static String ZERO = " _ | ||_|";
+ public final static String ONE = "     |  |";
+ public final static String TWO = " _  _||_ ";
+ public final static String THREE = " _  _| _|";
+ public final static String FOUR = "   |_|  |";
+ public final static String FIVE = " _ |_  _|";
+ public final static String SIX = " _ |_ |_|";
+ public final static String SEVEN = " _   |  |";
+ public final static String EIGHT = " _ |_||_|";
+ public final static String NINE = " _ |_| _|";
+
+ public AccountNumber(String ocrAccountNumber) {
+ stickDigitsMappedToNumerals.put(ZERO, "0");
+ stickDigitsMappedToNumerals.put(ONE, "1");
+ stickDigitsMappedToNumerals.put(TWO, "2");
+ stickDigitsMappedToNumerals.put(THREE, "3");
+ stickDigitsMappedToNumerals.put(FOUR, "4");
+ stickDigitsMappedToNumerals.put(FIVE, "5");
+ stickDigitsMappedToNumerals.put(SIX, "6");
+ stickDigitsMappedToNumerals.put(SEVEN, "7");
+ stickDigitsMappedToNumerals.put(EIGHT, "8");
+ stickDigitsMappedToNumerals.put(NINE, "9");
+ this.textVersion = createAccountNumberFromOcr(ocrAccountNumber);
+ }
+
+ private String createAccountNumberFromOcr(String ocrToInterpret) {
+ ArrayList<String> accountNumberAsOcrDigits1 = new ArrayList<String>();
+ for (int digit1 = 0; digit1 < NUMBER_OF_DIGITS; digit1++) {
+
+ int startOfFirstLine = (digit1 * WIDTH_OF_OCR_NUMERAL);
+ int startOfSecondLine = startOfFirstLine + (WIDTH_OF_OCR_NUMERAL * NUMBER_OF_DIGITS);
+ int startOfThirdLine = startOfSecondLine + (WIDTH_OF_OCR_NUMERAL * NUMBER_OF_DIGITS);
+
+ String firstLineOfOcrDigit = ocrToInterpret.substring(startOfFirstLine, (startOfFirstLine + WIDTH_OF_OCR_NUMERAL));
+ String secondLineOfOcrDigit = ocrToInterpret.substring(startOfSecondLine, (startOfSecondLine + WIDTH_OF_OCR_NUMERAL));
+ String thirdLineOfOcrDigit = ocrToInterpret.substring(startOfThirdLine, (startOfThirdLine + WIDTH_OF_OCR_NUMERAL));
+
+ String nextDigit = firstLineOfOcrDigit + secondLineOfOcrDigit
+ + thirdLineOfOcrDigit;
+ accountNumberAsOcrDigits1.add(nextDigit);
+ }
+
+ ArrayList<String> accountNumberAsOcrDigits = accountNumberAsOcrDigits1;
+
+ String accountNumber = "";
+
+ final int ACCOUNT_NUMBER_LENGTH = accountNumberAsOcrDigits.size();
+
+ for (int digit = 0; digit < ACCOUNT_NUMBER_LENGTH; digit++) {
+ accountNumber = accountNumber
+ + (stickDigitsMappedToNumerals.get(accountNumberAsOcrDigits.get(digit)));
+ }
+
+ int checkSumCalculation = 0;
+ int currentDigit;
+
+ for (int digit = 0; digit < NUMBER_OF_DIGITS; digit++) {
+
+ String thisCharacter = accountNumber.substring(digit, digit + 1);
+ currentDigit = Integer.parseInt(thisCharacter);
+ checkSumCalculation = checkSumCalculation + ((NUMBER_OF_DIGITS - digit) * currentDigit);
+ }
+
+ isValid = ((checkSumCalculation % 11) == 0);
+ return accountNumber;
+ }
+}
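One thing worth noting about the class above: when a glyph is not in the map, `get` returns `null`, the string `"null"` is concatenated into the account number, and the checksum loop then appears to fail with a `NumberFormatException` from `Integer.parseInt`. A minimal sketch of a parse step that instead emits `?` for unrecognized glyphs (as User Story 3 asks) might look like the following. `OcrParseSketch` and its members are hypothetical names, not part of this commit, and the input is assumed to hold at least the three 27-character rows concatenated, as in the tests.

```java
// Hypothetical sketch, not part of this commit.
final class OcrParseSketch {
    static final int GLYPH_WIDTH = 3;
    static final int DIGITS = 9;
    static final int LINE_WIDTH = GLYPH_WIDTH * DIGITS;

    // The array index is the numeral; each glyph is its three 3-character rows joined.
    static final String[] GLYPHS = {
        " _ | ||_|", // 0
        "     |  |", // 1
        " _  _||_ ", // 2
        " _  _| _|", // 3
        "   |_|  |", // 4
        " _ |_  _|", // 5
        " _ |_ |_|", // 6
        " _   |  |", // 7
        " _ |_||_|", // 8
        " _ |_| _|"  // 9
    };

    // Converts one OCR entry into nine characters, using '?' for any unknown glyph.
    static String parse(String ocr) {
        StringBuilder number = new StringBuilder();
        for (int d = 0; d < DIGITS; d++) {
            int start = d * GLYPH_WIDTH;
            String glyph = ocr.substring(start, start + GLYPH_WIDTH)
                    + ocr.substring(start + LINE_WIDTH, start + LINE_WIDTH + GLYPH_WIDTH)
                    + ocr.substring(start + 2 * LINE_WIDTH, start + 2 * LINE_WIDTH + GLYPH_WIDTH);
            number.append(toNumeral(glyph));
        }
        return number.toString();
    }

    // Linear search is enough for ten glyphs; returns '?' when nothing matches.
    static char toNumeral(String glyph) {
        for (int n = 0; n < GLYPHS.length; n++) {
            if (GLYPHS[n].equals(glyph)) {
                return (char) ('0' + n);
            }
        }
        return '?';
    }
}
```

Keeping the parse free of checksum logic means an illegible entry can be reported as `ILL` without ever attempting arithmetic on it.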
@@ -0,0 +1,61 @@
+package convert;
+
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import org.junit.Before;
+import org.junit.Test;
+
+
+public class AccountNumberEndToEndTest {
+ Map<String, String> sampleOCRInputString = new HashMap<String, String>();
+
+ @Before
+ public void setUp() {
+ initializeInputData();
+ }
+
+ @Test
+ public void checkSumShows_InValidAccountNumberIs_InValid() {
+ String numberToCheck = "999982865";
+ AccountNumber accountNumber = new AccountNumber(sampleOCRInputString.get(numberToCheck));
+ assertFalse("Expected " + numberToCheck + " to be invalid.", accountNumber.isValid);
+ }
+
+ @Test
+ public void checkSumShows_ValidAccountNumber_IsValid() {
+ String numberToCheck = "123456789";
+ AccountNumber accountNumber = new AccountNumber(sampleOCRInputString.get(numberToCheck));
+ assertTrue("Expected " + numberToCheck + " to be valid.", accountNumber.isValid);
+ }
+
+ private void initializeInputData() {
+ sampleOCRInputString.put("999982865",
+ " _  _  _  _  _  _  _  _  _ " +
+ "|_||_||_||_||_| _||_||_ |_ " +
+ " _| _| _| _||_||_ |_||_| _|" +
+ "                           ");
+
+ sampleOCRInputString.put("000000051",
+ " _  _  _  _  _  _  _  _    " +
+ "| || || || || || || ||_   |" +
+ "|_||_||_||_||_||_||_| _|  |" +
+ "                           ");
+
+ sampleOCRInputString.put("123456789",
+ "    _  _     _  _  _  _  _ " +
+ "  | _| _||_||_ |_   ||_||_|" +
+ "  ||_  _|  | _||_|  ||_| _|" +
+ "                           ");
+
+ sampleOCRInputString.put("000000000",
+ " _  _  _  _  _  _  _  _  _ " +
+ "| || || || || || || || || |" +
+ "|_||_||_||_||_||_||_||_||_|" +
+ "                           ");
+ }
+
+}
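The tests above exercise only the checksum flag. For User Story 4 in the README, one possible building block is a function that lists which numerals a scanned glyph could become by adding or removing a single pipe or underscore. The sketch below treats that as "differs in exactly one of the nine cells". `StrokeRepairSketch` and its members are hypothetical names, not part of this commit, and the glyph is assumed to be exactly nine characters long (three rows of three).

```java
// Hypothetical sketch, not part of this commit.
final class StrokeRepairSketch {

    // The array index is the numeral; each glyph is its three 3-character rows joined.
    static final String[] GLYPHS = {
        " _ | ||_|", // 0
        "     |  |", // 1
        " _  _||_ ", // 2
        " _  _| _|", // 3
        "   |_|  |", // 4
        " _ |_  _|", // 5
        " _ |_ |_|", // 6
        " _   |  |", // 7
        " _ |_||_|", // 8
        " _ |_| _|"  // 9
    };

    // Returns, in ascending order, the numerals whose glyph differs from the
    // given glyph in exactly one cell (one stroke added or removed).
    static String oneStrokeAlternatives(String glyph) {
        StringBuilder alternatives = new StringBuilder();
        for (int n = 0; n < GLYPHS.length; n++) {
            if (differingCells(glyph, GLYPHS[n]) == 1) {
                alternatives.append((char) ('0' + n));
            }
        }
        return alternatives.toString();
    }

    // Counts the positions at which two equal-length glyphs disagree.
    static int differingCells(String a, String b) {
        int count = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                count++;
            }
        }
        return count;
    }
}
```

This agrees with the README's examples: the glyph for 9 yields 3, 5, and 8; the glyph for 5 yields 6 and 9; 0 yields 8; and 1 yields 7. A full solution would still need to substitute each alternative into the account number and keep only the candidates with a valid checksum.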