## Leaf Group - Technical Assessment Quiz
#### Candidate: Darren Higa

### Data Science
***You fit a model to a set of training data and it achieves 80% accuracy. You put the model into production and it achieves 60% accuracy. Why? List as many reasons as you can think of.***




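When a model's accuracy in production is significantly lower than its accuracy on the training data set, this could be an indication of overfitting: the model has learned patterns specific to the training data that do not generalize. The gap can also mean that the production data differs from the training data. Some reasons that this might occur: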
  
1.  The training data set is **not large enough** to cover most of the kinds of cases the model will encounter in the real world (i.e., production).
2.  The training data is **not timely**.  Certain types of data (like consumer choice or consumer behavior) can change rapidly, and training data from a year ago (or earlier) may have little to no relevance to the target being predicted today.
3.  There could be **data-gathering** issues in production.  Bugs in the production code could drop some of the production data or mishandle it before it reaches the model.
4.  **New values or labels** could appear in the production data from open-ended data items (e.g., a text box) or from new versions of an application or web page.  The model never saw these values during training, which is especially harmful when they occur in key features used by the model.
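Reason 4 can be checked directly by comparing the categorical values seen in training with the values arriving in production. The following is a minimal sketch that assumes each row is a Python dict keyed by column name. The function name `unseen_value_report` and the `browser` sample data are hypothetical and were made up for illustration.

```python
from collections import Counter

def unseen_value_report(train_rows, prod_rows, column):
    """Count production rows whose `column` value never appeared in training.

    Assumes rows are dicts keyed by column name (hypothetical data layout).
    Returns (fraction of production rows affected, Counter of the unseen values).
    """
    seen = {row[column] for row in train_rows}
    unseen = Counter(row[column] for row in prod_rows if row[column] not in seen)
    fraction = sum(unseen.values()) / len(prod_rows) if prod_rows else 0.0
    return fraction, unseen

# Illustrative data only: 'edge' never appears in the training rows
train = [{'browser': 'chrome'}, {'browser': 'firefox'}, {'browser': 'safari'}]
prod = [{'browser': 'chrome'}, {'browser': 'edge'}, {'browser': 'edge'}, {'browser': 'firefox'}]
fraction, unseen = unseen_value_report(train, prod, 'browser')
print(fraction, dict(unseen))  # 0.5 {'edge': 2}
```

A large fraction for a key feature is a hint that the model is scoring many production rows with values it never learned from.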




***You have a set of data for a consumer bank which includes information about account activity and whether or not that account was closed. Your boss tells you to make a model of churn to predict whose account may close soon. You are told to use either xgboost or deep learning. What do you choose and why?***

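Deep learning is primarily used with human-generated, unstructured data such as digital images, video, and voice, where the network learns features directly from the raw input.  For predicting customer churn at a consumer bank, the data provided here is structured account activity, most likely in a tabular format.  In addition, this is a binary classification problem (either the account will close or it will stay open).  I would choose XGBoost: gradient-boosted decision trees typically perform very well on this kind of tabular data and prediction problem, usually need less data and tuning than a deep network, and report feature importances that help explain which account activities drive churn.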

### General & Web

***What happens when you enter a search phrase on google.com? Summarize in 3-5 sentences.***

When you enter a search phrase at google.com, the first thing that happens is an analysis of the text being typed (keyword and phrase recognition, etc.).  Google then retrieves from the Google index the webpages that match or closely match the search terms parsed from the query.  Because a query can match an enormous number of pages, Google must then rank the results by relevance before presenting them to the user.  One of the factors used to determine relevance is Google’s well-known PageRank.  Google also uses several other factors, such as the user's location and language, to determine both the *relevance* and the *context* of the search.
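The following toy sketch illustrates two steps from the paragraph above. It looks up matching pages in an inverted index and then orders the matches by PageRank, computed by power iteration. Google's real pipeline is far more complex and is not public. The pages, links, and function names below are hypothetical. The damping factor of 0.85 is the commonly cited default, and the iteration count is an arbitrary choice.

```python
from collections import defaultdict

def build_index(pages):
    """Inverted index: map each lowercase word to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links[p]
            if targets:
                # Split this page's rank evenly across its outgoing links
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no links): spread its rank over all pages
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank

def search(query, pages, links):
    """Return ids of pages containing every query word, highest PageRank first."""
    index = build_index(pages)
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    ranks = pagerank(links)
    return sorted(matches, key=lambda p: ranks[p], reverse=True)

# Hypothetical three-page web: 'b' receives links from both 'a' and 'c'
pages = {
    'a': 'python tutorial for beginners',
    'b': 'advanced python tips',
    'c': 'cooking tips for beginners',
}
links = {'a': ['b'], 'b': ['a'], 'c': ['b']}
print(search('python', pages, links))  # ['b', 'a']
```

Both 'a' and 'b' match "python", but 'b' ranks first because it has more inbound links.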

### Coding & Python

***Create a function to convert an integer in base 8 to base 10***

```python
def octaldecimal(num):
    """Convert an integer written with base 8 digits (e.g. 137) to its base 10 value.

    Prints an error message and returns None if the input is not a valid base 8 integer.
    """
    if isinstance(num, bool) or not isinstance(num, int):   # Check that input is an integer
        print('The value ', num, ' is not a valid integer!')
        return None

    sign = -1 if num < 0 else 1                    # Remember the sign, convert the magnitude
    digits = [int(x) for x in str(abs(num))]       # Get each digit and put it into a list

    baseten = 0
    for i, digit in enumerate(reversed(digits)):   # Loop on each digit from right to left
        if digit > 7:                              # Digits 8 and 9 do not exist in base 8
            print('This is not a valid base 8 integer!')
            return None
        baseten += digit * 8**i                    # Multiply each digit by its power of 8
    return sign * baseten


# Run sample test cases:

for base8 in [137, 114.50, 234864, 234764]:
    print()
    print('The base 8 number for conversion is: ', base8)
    base10 = octaldecimal(base8)
    if base10 is not None:
        print('Base 8:', base8, ' is equivalent to base 10:', base10)
```


```
The base 8 number for conversion is:  137
Base 8: 137  is equivalent to base 10: 95

The base 8 number for conversion is:  114.5
The value  114.5  is not a valid integer!

The base 8 number for conversion is:  234864
This is not a valid base 8 integer!

The base 8 number for conversion is:  234764
Base 8: 234764  is equivalent to base 10: 80372
```
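Python's built-in `int()` accepts a base argument, so it can serve as a quick cross-check for `octaldecimal()`. The following is a minimal sketch, and the helper name `octal_to_decimal_builtin` is hypothetical.

```python
def octal_to_decimal_builtin(num: int) -> int:
    """Interpret the decimal digits of num as base 8 digits (hypothetical helper)."""
    # int(text, 8) parses text as base 8 and raises ValueError on the digits 8 or 9
    return int(str(num), 8)

print(octal_to_decimal_builtin(137))     # 95
print(octal_to_decimal_builtin(234764))  # 80372
```

Unlike `octaldecimal()`, this version raises `ValueError` for an invalid input such as `234864` instead of printing a message.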


***Create a function to convert a base 10 (decimal) number to octal (base 8)***

```python
def decimaloctal(num):
    """Convert a base 10 integer to an integer whose digits are its base 8 representation.

    Prints an error message and returns None if the input is not an integer.
    """
    if isinstance(num, bool) or not isinstance(num, int):   # Check that input is an integer
        print('The value ', num, ' is not a valid integer!')
        return None

    sign = -1 if num < 0 else 1      # Remember the sign, convert the magnitude
    num = abs(num)
    i = 0
    base8 = 0
    while num > 0:
        remainder = num % 8          # Next base 8 digit, least significant first
        num = num // 8
        base8 += remainder * 10**i   # Place the digit in the i-th decimal position
        i += 1
    return sign * base8


# Run sample test cases:

for base10 in [95, 1378, 80372]:
    print()
    print('The base 10 number for conversion is: ', base10)
    base8 = decimaloctal(base10)
    if base8 is not None:
        print('Base 10:', base10, ' is equivalent to base 8:', base8)
```
    


```
The base 10 number for conversion is:  95
Base 10: 95  is equivalent to base 8: 137

The base 10 number for conversion is:  1378
Base 10: 1378  is equivalent to base 8: 2542

The base 10 number for conversion is:  80372
Base 10: 80372  is equivalent to base 8: 234764
```
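`format(num, 'o')` returns the base 8 digits as a string, without the `0o` prefix that `oct()` adds. This gives a one-line cross-check for `decimaloctal()`. The helper name `decimal_to_octal_builtin` is hypothetical.

```python
def decimal_to_octal_builtin(num: int) -> int:
    """Return the base 8 digits of num as an int, mirroring decimaloctal() (hypothetical helper)."""
    # format(n, 'o') gives base 8 digits with no '0o' prefix; a leading '-' is kept for negatives
    return int(format(num, 'o'))

print(decimal_to_octal_builtin(95))     # 137
print(decimal_to_octal_builtin(1378))   # 2542
print(oct(1378))                        # 0o2542
```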


***Write a function that will return every permutation of inclusion for elements of a list.***

```python
# Every "permutation of inclusion" of a list is one of its subsets, so the task is to
# build the power set.  I use itertools.combinations to find all of the combinations
# of size i (for i from 0 to the length of the list).  Combining all of these results
# gives the final list of 2**n sublists.

from itertools import combinations

def allsublists(lst):
    resultlst = []

    for i in range(len(lst) + 1):
        # combinations() yields tuples; convert each one to a list
        resultlst.extend(list(combo) for combo in combinations(lst, i))

    return resultlst

## Test the function on sample lists.

sublists = allsublists([1, 2, 3, 4, 5])
print('Result List 1:')
print(sublists)
print()

sublists = allsublists([1, 3, 5, 'x', 'y', 'z'])
print('Result List 2:')
print(sublists)
print()

sublists = allsublists(['cat', 'dog', 'cow', 'mouse'])
print('Result List 3:')
print(sublists)
```



Result List 1:
[[], [1], [2], [3], [4], [5], [1, 2], [1, 3], [1, 4], [1, 5], [2, 3], [2, 4], [2, 5], [3, 4], [3, 5], [4, 5], [1, 2, 3], [1, 2, 4], [1, 2, 5], [1, 3, 4], [1, 3, 5], [1, 4, 5], [2, 3, 4], [2, 3, 5], [2, 4, 5], [3, 4, 5], [1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 4, 5], [1, 3, 4, 5], [2, 3, 4, 5], [1, 2, 3, 4, 5]]

Result List 2:
[[], [1], [3], [5], ['x'], ['y'], ['z'], [1, 3], [1, 5], [1, 'x'], [1, 'y'], [1, 'z'], [3, 5], [3, 'x'], [3, 'y'], [3, 'z'], [5, 'x'], [5, 'y'], [5, 'z'], ['x', 'y'], ['x', 'z'], ['y', 'z'], [1, 3, 5], [1, 3, 'x'], [1, 3, 'y'], [1, 3, 'z'], [1, 5, 'x'], [1, 5, 'y'], [1, 5, 'z'], [1, 'x', 'y'], [1, 'x', 'z'], [1, 'y', 'z'], [3, 5, 'x'], [3, 5, 'y'], [3, 5, 'z'], [3, 'x', 'y'], [3, 'x', 'z'], [3, 'y', 'z'], [5, 'x', 'y'], [5, 'x', 'z'], [5, 'y', 'z'], ['x', 'y', 'z'], [1, 3, 5, 'x'], [1, 3, 5, 'y'], [1, 3, 5, 'z'], [1, 3, 'x', 'y'], [1, 3, 'x', 'z'], [1, 3, 'y', 'z'], [1, 5, 'x', 'y'], [1, 5, 'x', 'z'], [1, 5, 'y', 'z'], [1, 'x', 'y', 'z'], [3, 5, 'x', 'y'