## Exercise 122: Tokenizing a String

Tokenizing is the process of converting a string into a list of substrings, known as
tokens. In many circumstances, a list of tokens is far easier to work with than the
original string because the original string may have irregular spacing. In some cases
substantial work is also required to determine where one token ends and the next one
begins.
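To illustrate why irregular spacing matters, here is a minimal sketch using Python's built-in `str.split`. It is not the tokenizer this exercise asks for, and the two sample expressions are made up for illustration.

```python
# Splitting on whitespace only works when every token is separated by spaces.
spaced = '( 2 / 5 + 1 ) - 34'
unspaced = '(2/5 + 1)-34'

spaced_tokens = spaced.split()
unspaced_tokens = unspaced.split()

print(spaced_tokens)    # every token is its own list element
print(unspaced_tokens)  # some tokens are still glued together
```

With the second string, pieces such as `(2/5` remain a single element, so the token boundaries have to be found by examining individual characters.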

In a mathematical expression, tokens are items such as operators, numbers and
parentheses. Some tokens, such as *, /, ^, ( and ) are easy to identify because the
token is a single character, and the character is never part of another token. The + and
- symbols are a little bit more challenging to handle because they might represent
the addition or subtraction operator, or they might be part of a number token.

```
Hint: A + or - is an operator if the non-whitespace character immediately
before it is part of a number, or if the non-whitespace character immediately
before it is a close parenthesis. Otherwise it is part of a number.
```
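The rule in the hint can be sketched as a small predicate. The helper name `is_operator` and the sample expression are hypothetical, chosen only for illustration; the helper assumes index `i` points at a + or - character.

```python
def is_operator(expression, i):
    """Hypothetical helper: True if the + or - at index i is an operator."""
    # Walk backwards to the nearest non-whitespace character.
    j = i - 1
    while j >= 0 and expression[j].isspace():
        j -= 1
    # Nothing precedes the sign, so it must belong to a number.
    if j < 0:
        return False
    # It is an operator only after a digit or a close parenthesis.
    return '0' <= expression[j] <= '9' or expression[j] == ')'

expr = '-3 + (4 - -5)'
results = [(i, is_operator(expr, i)) for i, ch in enumerate(expr) if ch in '+-']
print(results)
```

In this sample, the signs at indices 3 and 8 are classified as operators, while the signs at indices 0 and 10 are classified as parts of numbers.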

Write a function that takes a string containing a mathematical expression as its
only parameter and breaks it into a list of tokens. Each token should be a parenthesis,
an operator, or a number with an optional leading + or - (for simplicity we will
only work with integers in this problem). Return the list of tokens as the function’s
result.

You may assume that the string passed to your function always contains a valid
mathematical expression consisting of parentheses, operators and integers. However,
your function must handle variable amounts of whitespace between these
elements. Include a main program that demonstrates your tokenizing function by
reading an expression from the user and printing the list of tokens. Ensure that the
main program will not run when the file containing your solution is imported into
another program.

In [1]:
def tokenizer(string):
	# Remove all whitespace (spaces, tabs and newlines), not only spaces
	string = ''.join(string.split())
	tokens = []
	
	i = 0
	while i < len(string):
		# Handle the tokens that are always a single character
		if string[i] == '*' or string[i] == '/' or string[i] == '^' \
			or string[i] == '(' or string[i] == ')':
			tokens.append(string[i])
			i += 1
		# Handle '-' and '+'
		elif string[i] == '-' or string[i] == '+':
			# If there is a previous character and it is a digit or a
			# close parenthesis then the - or + is a token on its own
			if i > 0 and ('0' <= string[i-1] <= '9' or string[i-1] == ')'):
				tokens.append(string[i])
				i += 1
			else:
				# The + or - is part of a number
				num = string[i]
				i += 1
				# Keep on adding characters to the token as long as they are digits
				while i < len(string) and string[i] >= '0' and string[i] <= '9':
					num = num + string[i]
					i += 1
				tokens.append(num)

		# Handle a number without a leading + or -
		elif string[i] >= '0' and string[i] <= '9':
			num = ''
			# Keep on adding characters as long as they are digits
			while i < len(string) and string[i] >= '0' and string[i] <= '9':
				num = num + string[i]
				i += 1
			tokens.append(num)

		# Any other character means the expression is not valid
		# Return an empty list to indicate that an error occurred
		else:
			return []
	return tokens

def main():
	exp = input('Enter a mathematical expression: ')
	tokens = tokenizer(exp)
	print('The tokens are:', tokens)

if __name__ == '__main__':
	main()


Enter a mathematical expression: (2/5 + 1) - 34
The tokens are: ['(', '2', '/', '5', '+', '1', ')', '-', '34']
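As a cross-check on the character-by-character solution, the same sign rule can also be expressed with a regular expression. This is an alternative sketch rather than the approach used above; the names `TOKEN_PATTERN` and `tokenize_with_regex` are hypothetical, and the sketch assumes the input is a valid expression.

```python
import re

# A + or - is absorbed into a number only when the character before it is
# neither a digit nor a close parenthesis; otherwise it matches on its own.
TOKEN_PATTERN = re.compile(r'(?:(?<![0-9)])[-+])?[0-9]+|[-+*/^()]')

def tokenize_with_regex(expression):
    # Remove all whitespace first so the lookbehind sees the previous token.
    compact = ''.join(expression.split())
    return TOKEN_PATTERN.findall(compact)

sample_tokens = tokenize_with_regex('(2/5 + 1) - 34')
signed_tokens = tokenize_with_regex('-3 + (4 - -5)')
print(sample_tokens)
print(signed_tokens)
```

Unlike the solution above, which returns an empty list when it meets an unexpected character, `findall` silently skips anything the pattern does not match, so this sketch is only suitable when the input is known to be valid.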
