# Introduction

Purpose of this notebook: gather notes to prepare for the CodeSignal Data Science Assessment. The requirements come from the assessment framework: https://codesignal.com/wp-content/uploads/2019/08/data-science-assessment-framework.pdf
<br>

The maximum allowed completion time for the test is 80 minutes. Each test has 5 implementation tasks and 12 multiple-choice quiz questions, for 17 tasks in total.
<br>

__Tasks__:

- Programming implementation, 20 minutes, see example 2.2.1.
- Bug-fixing, 5 minutes, see example 2.2.2.

# Programming Basics

- Primitive data types (ints/floats/strings)
- Basic arithmetic/logical operations
- Loops and decision constructs
- Basic collections (arrays, lists, and dictionaries/sets)
- Lambda functions and list comprehensions
- Libraries highly relevant to data science/analysis (e.g., Python numpy/pandas/scikit-learn)
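
As a quick refresher on the items above, here is a minimal sketch that touches a dictionary, a set, a list comprehension, and a lambda in a few lines. The `records` data and the 20-minute threshold are made up for illustration.

```python
# Hypothetical sample data: (device_id, usage_in_minutes) pairs.
records = [("abxyz", 10), ("cdfgh", 25), ("abxyz", 5)]

# Dictionary: accumulate total usage per device.
totals = {}
for device, minutes in records:
    totals[device] = totals.get(device, 0) + minutes

# Set comprehension: distinct device ids.
devices = {device for device, _ in records}

# List comprehension with a condition: devices above 20 total minutes.
heavy = [d for d, m in totals.items() if m > 20]

# Lambda as a sort key: order devices by total usage, largest first.
ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)

print(totals)   # {'abxyz': 15, 'cdfgh': 25}
print(heavy)    # ['cdfgh']
print(ranked)   # [('cdfgh', 25), ('abxyz', 15)]
```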

## Example 2.2.1

You are given a list of strings (`data`) where each string has the form `device_id,usage_in_minutes`, such that `device_id` contains exactly five lowercase English letters ('a'-'z') and `usage_in_minutes` is a positive integer between 1 and 1,440, padded with leading zeroes where necessary to make its length equal to 4. For instance, "abxyz,0010" describes `device_id` = "abxyz" and `usage_in_minutes` = 10 minutes. Given `data`, return the `device_id` with the largest value of `usage_in_minutes`. You may assume that the `device_id` values are distinct and the `usage_in_minutes` values are distinct in `data`.

In [4]:
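def return_largest_device_id(list_of_strings):
    """Return the device_id whose usage_in_minutes is largest."""
    max_usage = 0  # usage is at least 1, so 0 is a safe starting value
    max_device = ''
    for item in list_of_strings:
        # Each item looks like "abxyz,0010"; int() drops the leading zeroes.
        current_device, current_usage = item.split(',')
        current_usage = int(current_usage)
        if current_usage > max_usage:
            max_usage = current_usage
            max_device = current_device

    return max_device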
        
test_list = ["abxyz,0016", "cdfgh,0015"]
return_largest_device_id(test_list)
        

'abxyz'
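
A more compact variant of the same idea, offered as an alternative sketch rather than the expected answer: let `max` do the scan and use a lambda as the comparison key. The function name `largest_device_id` is my own. Note that `max` raises `ValueError` on an empty list, whereas the loop version returns an empty string.

```python
def largest_device_id(data):
    # Split each "device_id,usage" string and compare on the integer usage.
    device, _usage = max(
        (item.split(",") for item in data),
        key=lambda pair: int(pair[1]),
    )
    return device

print(largest_device_id(["abxyz,0016", "cdfgh,0015"]))  # abxyz
```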

## Example 2.2.2

Consider a function that is given an array/list A of distinct integers and an integer k (where 1 ≤ k ≤ len(A)), and returns all possible subarrays of A by removing k contiguous elements in A. That is, you wish to obtain a subarray of A by removing the first k elements in A, another subarray of A by removing the next k elements in A, and so on.
<br>

For instance, when A = [2, 4, 6, 8, 10] and k = 3, you can remove 
- [2, 4, 6] (which results in [8, 10]), 
- [4, 6, 8] (which results in [2, 10]), or 
- [6, 8, 10] (which results in [2, 4]).
<br>

Since you are removing k elements from A, you always obtain a subarray of length (len(A) − k), and there are (len(A) − k + 1) such subarrays. 

<br>
In the provided code, the given function contains a buggy line of code. You are asked to find and fix one line of code so that the function will return a list of subarrays correctly.

In [21]:
def remove_k_elems(A, k):
    n = len(A)
    result = lambda arr, idx: arr[0:(idx-1)] + arr[(idx+1):n]
    return [ result(A, i) for i in range(n-k+1) ]

A = [2, 4, 6, 8, 10] 
k = 3
remove_k_elems(A,k)

[[2, 4, 6, 8, 4, 6, 8, 10], [6, 8, 10], [2, 8, 10]]

In [20]:
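# Intermediate attempt (uses A and k from the cell above): the prefix slice is
# now right, but idx+k+1 skips one element too many; the suffix should start at idx+k.
result = lambda arr, idx: arr[0:idx] + arr[(idx+k+1):len(A)]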

result(A, 3)

[2, 4, 6]

In [23]:
def remove_k_elems_corrected(A, k):
    n = len(A)
    result = lambda arr, idx: arr[0:idx] + arr[(idx+k):n]
    return [ result(A, i) for i in range(n-k+1) ]

A = [2, 4, 6, 8, 10] 
k = 3
remove_k_elems_corrected(A,k)

[[8, 10], [2, 10], [2, 4]]
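
One way to gain confidence in the fix is to check it against the two properties stated in the problem: every result has length `len(A) - k`, and there are `len(A) - k + 1` results. The sketch below restates the corrected function as a list comprehension so that it runs on its own, then checks both properties for every valid `k`.

```python
def remove_k_elems_corrected(A, k):
    n = len(A)
    # Keep everything before index i and everything from index i + k onward.
    return [A[:i] + A[i + k:] for i in range(n - k + 1)]

A = [2, 4, 6, 8, 10]
for k in range(1, len(A) + 1):
    subarrays = remove_k_elems_corrected(A, k)
    assert len(subarrays) == len(A) - k + 1
    assert all(len(s) == len(A) - k for s in subarrays)

print(remove_k_elems_corrected(A, 3))  # [[8, 10], [2, 10], [2, 4]]
```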

# Query Language

- Basics: filtering, sorting, aggregate functions
- IF, CASE, and string functions
- Subqueries (inner queries)
- Joins (inner, left, right)
- Window functions and window-specific aggregates
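
To practice these without a database server, the topics can be tried from Python through the standard-library `sqlite3` module. The sketch below is only an illustration: the `devices` and `usage` tables, their columns, and the 30-minute threshold are invented, and SQLite's dialect may differ from the one used in the assessment. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (device_id TEXT PRIMARY KEY, owner TEXT);
    CREATE TABLE usage (device_id TEXT, day INTEGER, minutes INTEGER);
    INSERT INTO devices VALUES ('abxyz', 'ann'), ('cdfgh', 'bob'), ('ijklm', 'cat');
    INSERT INTO usage VALUES
        ('abxyz', 1, 10), ('abxyz', 2, 30),
        ('cdfgh', 1, 20), ('cdfgh', 2, 5);
""")

# Left join + aggregate + CASE: total minutes per device, keeping devices
# that have no usage rows at all.
totals = conn.execute("""
    SELECT d.device_id,
           COALESCE(SUM(u.minutes), 0) AS total,
           CASE WHEN COALESCE(SUM(u.minutes), 0) >= 30
                THEN 'heavy' ELSE 'light' END AS label
    FROM devices AS d
    LEFT JOIN usage AS u ON u.device_id = d.device_id
    GROUP BY d.device_id
    ORDER BY total DESC
""").fetchall()

# Subquery: rows whose minutes exceed the overall average (16.25 here).
above_avg = conn.execute("""
    SELECT device_id, day, minutes
    FROM usage
    WHERE minutes > (SELECT AVG(minutes) FROM usage)
    ORDER BY minutes DESC
""").fetchall()

# Window function: running total of minutes per device, ordered by day.
running = conn.execute("""
    SELECT device_id, day, minutes,
           SUM(minutes) OVER (PARTITION BY device_id ORDER BY day) AS running_total
    FROM usage
    ORDER BY device_id, day
""").fetchall()

print(totals)     # [('abxyz', 40, 'heavy'), ('cdfgh', 25, 'light'), ('ijklm', 0, 'light')]
print(above_avg)  # [('abxyz', 2, 30), ('cdfgh', 1, 20)]
print(running)    # [('abxyz', 1, 10, 10), ('abxyz', 2, 30, 40), ('cdfgh', 1, 20, 20), ('cdfgh', 2, 5, 25)]
```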

# Probability Basics

Random Variables, Events, Probability Distributions
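
A small worked example may help tie the three terms together. Take the sum of two fair dice as the random variable, build its probability distribution by enumerating all 36 outcomes, and read off the probability of a couple of events. The dice setup is my own choice of illustration; exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Random variable: the sum of the two dice. Its distribution (pmf) maps
# each possible sum to its probability.
counts = Counter(a + b for a, b in outcomes)
pmf = {total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())}

# Events are sets of outcomes; their probability is a sum over the pmf.
p_seven = pmf[7]
p_at_least_10 = sum(p for total, p in pmf.items() if total >= 10)

# Expected value of the random variable.
expected = sum(total * p for total, p in pmf.items())

print(p_seven)        # 1/6
print(p_at_least_10)  # 1/6
print(expected)       # 7
```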

# Statistics Basics

Mean/Median/Mode, Standard Deviation, z-score, p-value, t-statistic
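
Most of these quantities can be computed with the standard-library `statistics` module. The sketch below uses a made-up sample and a hypothetical null-hypothesis mean `mu_0 = 4` for a one-sample t-statistic. Turning the t-statistic into a p-value needs the CDF of the t distribution, which the standard library does not provide, so that step is left out here.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
n = len(sample)

mean = statistics.mean(sample)        # 5
median = statistics.median(sample)    # 4.5
mode = statistics.mode(sample)        # 4
pop_sd = statistics.pstdev(sample)    # population standard deviation: 2.0
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# z-score: how many population standard deviations the value 9 lies from the mean.
z_of_9 = (9 - mean) / pop_sd          # 2.0

# One-sample t-statistic against a hypothetical null mean mu_0.
mu_0 = 4
t_stat = (mean - mu_0) / (sample_sd / math.sqrt(n))

print(mean, median, mode, pop_sd, z_of_9)
print(round(t_stat, 4))  # 1.3229
```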

# Conditional Probability and Bayes' Theorem
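
Bayes' theorem states that P(A | B) = P(B | A) P(A) / P(B). A classic way to practice it is a diagnostic-test question. The numbers below (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical, and `posterior` is an illustrative helper name.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # P(positive) by the law of total probability.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 4))  # 0.161
```

Even with an accurate test, the posterior stays near 16% because the condition is rare, which is the usual point of this kind of question.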

# Linear Regression
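
As a refresher, simple (one-feature) least-squares regression has a closed form: the slope is S_xy / S_xx and the intercept is the mean of y minus the slope times the mean of x. The sketch below implements that in plain Python on made-up data that lies exactly on y = 2x + 1. `fit_line` is an illustrative name, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```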

# Logistic Regression
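
Logistic regression models P(y = 1 | x) as sigmoid(w x + b) and is usually fitted by minimizing the log-loss. The sketch below fits a one-feature model with plain gradient descent on a tiny made-up dataset. The learning rate and step count are arbitrary choices, and a real task would more likely use a library implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Mean negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -math.log(p) if y == 1 else -math.log(1 - p)
    return total / len(xs)

def fit_logistic(xs, ys, lr=0.5, steps=200):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log-loss: average of (prediction - label) terms.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

xs = [-2.0, -1.0, 1.0, 2.0]  # made-up, linearly separable data
ys = [0, 0, 1, 1]
w, b = fit_logistic(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(predictions)  # [0, 0, 1, 1]
```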

# Clustering Algorithms
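
k-means is one commonly discussed clustering algorithm: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. Below is a minimal one-dimensional sketch with made-up points and hand-picked starting centroids. Real implementations work in higher dimensions and choose initial centroids more carefully.

```python
def kmeans_1d(points, centroids, max_iter=100):
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged: nothing moved
            break
        centroids = updated
    return centroids, clusters

points = [1, 2, 3, 10, 11, 12]
centroids, clusters = kmeans_1d(points, centroids=[1, 2])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
```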

# Regularization

Regularization in Linear Regression and Logistic Regression
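
For intuition, L2 (ridge) regularization in one-feature linear regression has a simple closed form. Minimizing the squared error plus `lam * slope**2`, with the intercept left unpenalized, gives slope = S_xy / (S_xx + lam), so a larger `lam` shrinks the slope toward zero. The sketch below shows that effect on made-up data. `fit_ridge_line` and the `lam` values are illustrative. In logistic regression the same kind of penalty is added to the log-loss, but there is no closed form.

```python
def fit_ridge_line(xs, ys, lam):
    """One-feature ridge regression with an unpenalized intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / (s_xx + lam)  # lam = 0 recovers ordinary least squares
    intercept = y_bar - slope * x_bar
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 2x + 1

slopes = {lam: fit_ridge_line(xs, ys, lam)[0] for lam in (0, 10, 100)}
print(slopes[0])             # 2.0
print(slopes[10])            # 1.0
print(round(slopes[100], 4)) # 0.1818
```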

# Model Evaluation

Training/test error, validation methods such as k-fold cross-validation, and various evaluation metrics
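
To make the mechanics concrete, here is a minimal sketch of how k-fold cross-validation partitions sample indices, plus precision and recall as two example evaluation metrics. The helper names and the toy labels are my own. The folds are contiguous and unshuffled, which is only one of several possible splitting schemes.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print(test)  # [0, 1, 2, 3], then [4, 5, 6], then [7, 8, 9]

precision, recall = precision_recall([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
print(precision, recall)  # both 2/3
```

Each sample appears in exactly one test fold, so averaging the per-fold test error uses every sample for validation once.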