## Problem set 2

**Problem 0** (-2 points for every missing green OK sign. If you don't run the cell below, that's -14 points.)

Make sure you are in the DATA1030 environment.

In [None]:
from __future__ import print_function
from distutils.version import LooseVersion as Version
import sys

OK = '\x1b[42m[ OK ]\x1b[0m'
FAIL = "\x1b[41m[FAIL]\x1b[0m"

try:
    import importlib
except ImportError:
    print(FAIL, "Python version 3.7 is required,"
                " but %s is installed." % sys.version)

def import_version(pkg, min_ver, fail_msg=""):
    mod = None
    try:
        mod = importlib.import_module(pkg)
        if pkg in {'PIL'}:
            ver = mod.VERSION
        else:
            ver = mod.__version__
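        if Version(ver) == min_ver:
            # report the package that was actually checked (pkg),
            # not the global loop variable from the cell's for-loop
            print(OK, "%s version %s is installed."
                  % (pkg, min_ver))
        else:
            print(FAIL, "%s version %s is required, but %s installed."
                  % (pkg, min_ver, ver))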
    except ImportError:
        print(FAIL, '%s not installed. %s' % (pkg, fail_msg))
    return mod


# first check the python version
pyversion = Version(sys.version)
if pyversion >= "3.7":
    print(OK, "Python version is %s" % sys.version)
elif pyversion < "3.7":
    print(FAIL, "Python version 3.7 is required,"
                " but %s is installed." % sys.version)
else:
    print(FAIL, "Unknown Python version: %s" % sys.version)

    
print()
requirements = {'numpy': "1.18.5", 'matplotlib': "3.2.2", 'sklearn': "0.23.1",
                'pandas': "1.0.5", 'xgboost': "1.1.1", 'shap': "0.35.0"}

# now the dependencies
for lib, required_version in list(requirements.items()):
    import_version(lib, required_version)

You will practice standard Python coding in this problem set. The questions you see here are actual coding interview questions I was asked in the past. Please solve these problems using only the [Python standard library](https://docs.python.org/3/library/), with no additional packages.

**Problem 1a** (5 points)

Write a function that takes a number as input and returns how many unique digits it contains. For example, if the input is 2020, the output is 2 because 2020 contains two unique digits, 2 and 0.

Problem 1a uses simple integers, which is usually how an interview starts. You'll get a couple of simple test cases first.

We will practice test-driven development, so I give you the skeleton of the function together with the test cases.

In [None]:

# here are our test cases with solutions we verified
tests = [2020, 11, 10, 55411, 0]
results = [2, 1, 2, 3, 1]

def unique_digits(number):
    '''function to check how many unique digits we have in a number'''
    # test the input
    if not isinstance(number, (int, float)):
        raise ValueError('input is not a number')
    
    # feel free to delete this line once you add your solution
    nr_digits = 0
    
    # add your code here:
    
    
    return nr_digits

# we iterate through each test case 
for (test,result) in zip(tests,results):
    output = unique_digits(test)
    if result != output:
        print('input:',test)
        print('the expected result:',result)
        print('what we got instead:',output)
# Your code is correct if no line is printed in the output.

**Problem 1b** (5 points)

Here are a couple of new test cases. What happens if the number is a float or is negative? Does your `unique_digits` function still work correctly? If not, please create a new function called `unique_digits2` in the cell below and update it. Leave `unique_digits` in the cell above unchanged.

This is typically what happens during an interview. The interviewer points out that your solution doesn't work on more complicated examples and prompts you to fix the issue.
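
To see why the new inputs can be harder, it may help to look at how Python writes them out as text. The snippet below is only an illustration, not a solution, and `non_digit_chars` is a hypothetical helper name that is not part of the assignment.

```python
def non_digit_chars(number):
    '''hypothetical helper: sorted non-digit characters in str(number)'''
    return sorted(ch for ch in set(str(number)) if not ch.isdigit())

# print the string form of a few inputs and the non-digit characters in it
for number in [2020, -1, 0.0, -222, -67.67]:
    print(repr(str(number)), '->', non_digit_chars(number))
```

Negative numbers and floats carry extra characters (a sign, a decimal point) that a digit counter has to account for.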

In [None]:
tests = [-1, 0.0, -222, -67.67, 1234.5678]
results = [1, 1, 1, 2, 8]


**Problem 1c** (5 points)

We add one more layer of complexity: numbers can also be given in scientific notation.

XeY means X * 10^(Y). For example, 1.5e2 = 1.5 * 10^2 = 150 and -5.0e-3 = -5.0 * 10^(-3) = -0.005.

X can be a float or an integer such that 1<=|X|<10, and Y is always an integer. 0e0 is an exception because |X| is less than 1, but all other numbers can be expressed with 1<=|X|<10. For example, 0.8e0 and 12e0 are technically valid and Python won't give you an error message, but 0.8 is usually written as 8e-1 and 12 as 1.2e1 in scientific notation.
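
As a quick check of the notation, the sketch below compares a few scientific-notation literals with the same values written out in full. The list name `pairs` is only for illustration.

```python
# each pair is (scientific-notation literal, the same value written out)
pairs = [(1.5e2, 150.0), (-5.0e-3, -0.005), (8e-1, 0.8), (1.2e1, 12.0)]

for sci, plain in pairs:
    # a scientific-notation literal is an ordinary float in Python
    print(repr(sci), sci == plain, type(sci).__name__)
```

Note that Python stores the value, not the way it was typed, so the function never sees whether the caller wrote `1.5e2` or `150.0`.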

Check the new test examples and if your `unique_digits2` fails, create a `unique_digits3` function to correctly pass the new and previous tests. Please leave `unique_digits` and `unique_digits2` unchanged in the cells above.

Counting unique digits can be ambiguous. For example, 1.1e1 can be represented as 11.0 but also as 11. In such cases I go with the simplest integer representation (11), so the correct solution is that 1.1e1 has 1 unique digit.
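
The ambiguity is visible directly in Python. This small sketch (variable names are illustrative only) shows the default text form of the float next to its integer form; how you use this in your function is up to you.

```python
value = 1.1e1

as_text = repr(value)          # default string form of the float
is_whole = value.is_integer()  # True when the float has no fractional part
as_int = int(value)            # the integer representation of the value

print(as_text, is_whole, as_int)
```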


In [None]:
tests = [2020, 11, 10, 55411, 0, -1, 0.0, -222, -67.67, 1234.5678, 1e0, 1.1e1, -5.23e4, 6.6e-3, 0e0]
results = [2, 1, 2, 3, 1, 1, 1, 1, 2, 8, 1, 1, 4, 2, 1]


**Problem 2a** (5 points)

Write a function that takes a string as input and returns a dictionary with two keys, 'vowels' and 'consonants'. The values of these keys are the numbers of unique vowels and unique consonants in the string. Follow the coding structure of 1a.

- Come up with 5 simple lower-case words (like 'apple', 'tree') and manually create the dictionary with the correct solution for each word (like 'apple' - 2 vowels, 2 consonants; 'tree' - 1 vowel, 2 consonants). 
- Write your function. 
- Loop through the words and the dictionaries, call your function, and check if the function's output dictionary is the same as the solution you expect.

This is the core idea of test-driven development. You first come up with how you'll test your code, and then you start writing it.
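
The test loop from the third bullet can be sketched as follows. To avoid giving away the solution, the sketch uses a deliberately different toy function, `count_case`, which is hypothetical and not part of this problem set. The point is only the pattern: expected dictionaries are written first and then compared with the function's output using `!=`.

```python
def count_case(text):
    '''hypothetical toy function: count upper- and lower-case letters'''
    if not isinstance(text, str):
        raise ValueError('input is not a string')
    return {'upper': sum(ch.isupper() for ch in text),
            'lower': sum(ch.islower() for ch in text)}

# test cases and the expected dictionaries, written before the function
tests = ['Apple', 'TREE', 'abc']
results = [{'upper': 1, 'lower': 4},
           {'upper': 4, 'lower': 0},
           {'upper': 0, 'lower': 3}]

failures = []
for test, result in zip(tests, results):
    output = count_case(test)
    # two dictionaries are equal when they have the same keys and values
    if result != output:
        failures.append((test, result, output))
        print('input:', test)
        print('the expected result:', result)
        print('what we got instead:', output)
```

As in 1a, the code passes if nothing is printed; collecting the mismatches in `failures` is an optional extra that makes the outcome easy to inspect afterwards.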

**Problem 2b** (10 points)

Add five new strings to your tests. Two strings should contain simple words with a mix of lower and upper case characters (like 'Andras' - 1 vowel and 4 consonants, because 'A' and 'a' count as the same letter). The other three strings should be whole sentences with upper and lower case characters, punctuation, and numbers too (for example "I'd like to have 5 apples please!" - 4 vowels, 8 consonants). Write a new function that correctly solves your test cases.