# Data Analysis with Python

> Exercises: Array Computing with NumPy.

Kuo, Yao-Jen <yaojenkuo@datainpoint.com> from [DATAINPOINT](https://www.datainpoint.com)

## Instructions

- We've imported the necessary modules/libraries at the beginning of each exercise.
- We've defined the names of functions/inputs/arguments for you.
- Write down your solution between the comments `### BEGIN SOLUTION` and `### END SOLUTION`.
- Run the tests to check whether your solutions are right: Kernel -> Restart & Run All -> Restart and Run All Cells.
- You can run tests after each question or after finishing all questions.

In [1]:
import numpy as np
import unittest

## Define a function named `create_nn_array` that generates the `(9, 9)` multiplication table shown below.

- Expected inputs: None.
- Expected outputs: a (9, 9) array.

```
[[ 1  2  3  4  5  6  7  8  9]
 [ 2  4  6  8 10 12 14 16 18]
 [ 3  6  9 12 15 18 21 24 27]
 [ 4  8 12 16 20 24 28 32 36]
 [ 5 10 15 20 25 30 35 40 45]
 [ 6 12 18 24 30 36 42 48 54]
 [ 7 14 21 28 35 42 49 56 63]
 [ 8 16 24 32 40 48 56 64 72]
 [ 9 18 27 36 45 54 63 72 81]]
```

In [2]:
def create_nn_array():
    """
    >>> nn_array = create_nn_array()
    >>> nn_array.shape
    (9, 9)
    """
    ### BEGIN SOLUTION
    nn_arr = (np.arange(1, 10).reshape(9, 1)).dot(np.arange(1, 10).reshape(1, 9))
    return nn_arr
    ### END SOLUTION
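
The same table can also be built without an explicit matrix product. The sketch below is one possible alternative, not the reference solution: the names `n`, `table_outer`, and `table_broadcast` are made up for illustration, and it relies only on `np.outer` and NumPy broadcasting.

```python
import numpy as np

# Hypothetical names; this is an illustrative alternative, not the reference solution.
n = np.arange(1, 10)

# np.outer multiplies every element of the first vector by every element of the second.
table_outer = np.outer(n, n)

# Broadcasting: a (9, 1) column times a (9,) row expands to a (9, 9) array.
table_broadcast = n[:, np.newaxis] * n

print(table_outer.shape)                              # (9, 9)
print(np.array_equal(table_outer, table_broadcast))   # True
```

Both forms give the same values as the `dot` version for this input; which one reads best is largely a matter of taste.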

## Define a function named `filter_evens_product` that computes the product of the even numbers in a given array.

- Expected inputs: an array.
- Expected outputs: a numeric.

In [3]:
def filter_evens_product(x):
    """
    >>> filter_evens_product(np.array([5, 5, 6, 6]))
    36
    >>> filter_evens_product(np.array([1, 2, 3, 4]))
    8
    >>> filter_evens_product(np.arange(4))
    0
    """
    ### BEGIN SOLUTION
    evens = x[x % 2 == 0]
    return evens.prod()
    ### END SOLUTION
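
To see what the boolean mask does step by step, here is a small sketch with illustrative variable names (`mask`, `evens`, `odds_only`, `no_evens` are not part of the exercise). It also shows an edge case the tests do not cover: when an array has no even numbers, the product of the empty selection is 1.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
mask = x % 2 == 0        # True where the element is even
evens = x[mask]          # keeps only the elements where mask is True
print(evens.tolist())    # [2, 4]
print(evens.prod())      # 8

# Edge case (not covered by the tests): no even numbers at all.
odds_only = np.array([1, 3, 5])
no_evens = odds_only[odds_only % 2 == 0]
print(no_evens.size)     # 0
print(no_evens.prod())   # 1, the product of an empty array
```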

## Define a function named `find_divisors` that is able to find the divisors of a given integer.

- Expected inputs: an integer.
- Expected outputs: an array.

In [4]:
def find_divisors(x):
    """
    >>> find_divisors(1)
    array([1])
    >>> find_divisors(2)
    array([1, 2])
    >>> find_divisors(3)
    array([1, 3])
    >>> find_divisors(4)
    array([1, 2, 4])
    >>> find_divisors(5)
    array([1, 5])
    """
    ### BEGIN SOLUTION
    possible_divisors = np.arange(1, x+1)
    modulos = x % possible_divisors
    divisors = possible_divisors[modulos == 0]
    return divisors
    ### END SOLUTION
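
The intermediate arrays make the idea easier to follow. The sketch below traces the same three steps for the input 6, using illustrative names (`candidates`, `remainders`) that are not required by the exercise.

```python
import numpy as np

x = 6
candidates = np.arange(1, x + 1)         # every integer from 1 to x
remainders = x % candidates              # remainder of x divided by each candidate
divisors = candidates[remainders == 0]   # keep the candidates that divide x evenly

print(candidates.tolist())   # [1, 2, 3, 4, 5, 6]
print(remainders.tolist())   # [0, 0, 0, 2, 1, 0]
print(divisors.tolist())     # [1, 2, 3, 6]
```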

## Define a function named `var` that is able to calculate the variance of a given array.

PS You may refer to the definition of variance: <https://en.wikipedia.org/wiki/Variance>

\begin{equation}
Var(x) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2
\end{equation}

- Expected inputs: an array.
- Expected outputs: a numeric.

In [5]:
def var(x):
    """
    >>> var(np.arange(10))
    8.25
    >>> var(np.arange(100))
    833.25
    """
    ### BEGIN SOLUTION
    N = x.size
    x_bar = x.mean()
    se = (x - x_bar)**2
    sse = se.sum()
    return sse / N
    ### END SOLUTION
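
As a cross-check, the formula above divides by $N$, which matches the default behaviour of `np.var` (`ddof=0`). The sketch below compares it with the sample variance (`ddof=1`, dividing by $N - 1$); the variable names are illustrative only.

```python
import numpy as np

x = np.arange(10)

population_var = np.var(x)        # ddof=0 by default: divide by N
sample_var = np.var(x, ddof=1)    # divide by N - 1 instead

print(population_var)   # 8.25
print(sample_var)       # larger, because the divisor is smaller
```

If a hand-written `var` disagrees with an expected value by a small factor, a mismatch between $N$ and $N - 1$ is a likely cause.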

## Define a function named `std` that is able to calculate the standard deviation of a given array.

\begin{equation}
SD(x) = \sqrt{Var(x)} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}
\end{equation}

PS You may refer to the definition of standard deviation: <https://en.wikipedia.org/wiki/Standard_deviation>

- Expected inputs: an array.
- Expected outputs: a numeric.

In [6]:
def std(x):
    """
    >>> std(np.arange(10))
    2.8722813232690143
    >>> std(np.arange(100))
    28.86607004772212
    """
    ### BEGIN SOLUTION
    return np.sqrt(var(x))
    ### END SOLUTION
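
A small worked example can help verify the formula by hand. For the array below the mean is 5, the squared deviations sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2. The sketch uses `np.std`, which also divides by $N$ by default; the names are illustrative.

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9])

squared_deviations = (x - x.mean()) ** 2
manual_sd = np.sqrt(squared_deviations.sum() / x.size)

print(squared_deviations.sum())   # 32.0
print(manual_sd)                  # 2.0
print(np.std(x))                  # 2.0, same divisor N as the formula above
```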

## Define a function named `cov` that is able to calculate the covariance of 2 same-length arrays.

\begin{equation}
cov(x, y) = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})
\end{equation}

PS You may refer to the definition of covariance: <https://en.wikipedia.org/wiki/Covariance>

- Expected inputs: 2 arrays.
- Expected outputs: a numeric.

In [7]:
def cov(x, y):
    """
    >>> np.random.seed(123)
    >>> x = np.random.randint(0, 50, 10)
    >>> y = np.random.randint(0, 50, 10)
    >>> cov(x, y)
    -54.7
    >>> np.random.seed(456)
    >>> x = np.random.randint(0, 50, 10)
    >>> y = np.random.randint(0, 50, 10)
    >>> cov(x, y)
    -23.249999999999996
    """
    ### BEGIN SOLUTION
    N = x.size
    x_bar = x.mean()
    y_bar = y.mean()
    sum_err_prod = ((x - x_bar)*(y - y_bar)).sum()
    return sum_err_prod / N
    ### END SOLUTION
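
For comparison, `np.cov` returns a 2 x 2 covariance matrix and divides by $N - 1$ unless told otherwise. The sketch below, with made-up data and illustrative names, passes `ddof=0` so that the off-diagonal entry matches the $\frac{1}{N}$ formula above.

```python
import numpy as np

# Made-up data for illustration.
x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 6, 9])

# Manual computation following the 1/N formula.
manual_cov = ((x - x.mean()) * (y - y.mean())).sum() / x.size

# np.cov returns [[cov(x, x), cov(x, y)], [cov(y, x), cov(y, y)]].
cov_matrix = np.cov(x, y, ddof=0)

print(manual_cov)         # 2.875
print(cov_matrix[0, 1])   # agrees with manual_cov
```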

## Define a function named `corr` that is able to calculate the correlation coefficient of 2 same-length arrays.

\begin{equation}
r_{xy} = \frac{cov(x, y)}{\sqrt{cov(x, x)cov(y, y)}}
\end{equation}

PS You may refer to the definition of correlation coefficient: <https://en.wikipedia.org/wiki/Correlation_coefficient>

- Expected inputs: 2 arrays.
- Expected outputs: a numeric.

In [8]:
def corr(x, y):
    """
    >>> np.random.seed(123)
    >>> x = np.random.randint(0, 50, 10)
    >>> y = np.random.randint(0, 50, 10)
    >>> corr(x, y)
    -0.3409853364175933
    >>> np.random.seed(456)
    >>> x = np.random.randint(0, 50, 10)
    >>> y = np.random.randint(0, 50, 10)
    >>> corr(x, y)
    -0.16475204420969639
    """
    ### BEGIN SOLUTION
    cov_x_y = cov(x, y)
    cov_x_x = cov(x, x)
    cov_y_y = cov(y, y)
    return cov_x_y / np.sqrt(cov_x_x * cov_y_y)
    ### END SOLUTION
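
The result can be checked against `np.corrcoef`, which returns a 2 x 2 correlation matrix. Because the $\frac{1}{N}$ factors cancel between numerator and denominator, the choice of divisor does not affect $r_{xy}$. The sketch below uses made-up data and illustrative names.

```python
import numpy as np

# Made-up data for illustration.
x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 6, 9])

dx = x - x.mean()
dy = y - y.mean()

# The 1/N factors cancel, so plain sums are enough here.
manual_r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The off-diagonal entry of the correlation matrix is r_xy.
numpy_r = np.corrcoef(x, y)[0, 1]

print(np.isclose(manual_r, numpy_r))   # True
```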

## Run tests!

Kernel -> Restart & Run All -> Restart and Run All Cells.

In [9]:
class TestArrayComputing(unittest.TestCase):
    def test_00_create_nn_array(self):
        nn_array = create_nn_array()
        self.assertEqual(nn_array.shape, (9, 9))
        self.assertTrue(1 in nn_array)
        self.assertTrue(4 in nn_array)
        self.assertTrue(64 in nn_array)
        self.assertTrue(81 in nn_array)
    def test_01_filter_evens_product(self):
        self.assertEqual(filter_evens_product(np.array([5, 5, 6, 6])), 36)
        self.assertEqual(filter_evens_product(np.array([1, 2, 3, 4])), 8)
        self.assertEqual(filter_evens_product(np.arange(4)), 0)
    def test_02_find_divisors(self):
        np.testing.assert_equal(find_divisors(1), np.array([1]))
        np.testing.assert_equal(find_divisors(2), np.array([1, 2]))
        np.testing.assert_equal(find_divisors(3), np.array([1, 3]))
        np.testing.assert_equal(find_divisors(4), np.array([1, 2, 4]))
        np.testing.assert_equal(find_divisors(5), np.array([1, 5]))
    def test_03_var(self):
        self.assertAlmostEqual(var(np.arange(10)), 8.25)
        self.assertAlmostEqual(var(np.arange(100)), 833.25)
    def test_04_std(self):
        self.assertAlmostEqual(std(np.arange(10)), 2.8722813232690143)
        self.assertAlmostEqual(std(np.arange(100)), 28.86607004772212)
    def test_05_cov(self):
        np.random.seed(123)
        x = np.random.randint(0, 50, 10)
        y = np.random.randint(0, 50, 10)
        self.assertAlmostEqual(cov(x, y), -54.7)
        np.random.seed(456)
        x = np.random.randint(0, 50, 10)
        y = np.random.randint(0, 50, 10)
        self.assertAlmostEqual(cov(x, y), -23.249999999999996)
    def test_06_corr(self):
        np.random.seed(123)
        x = np.random.randint(0, 50, 10)
        y = np.random.randint(0, 50, 10)
        self.assertAlmostEqual(corr(x, y), -0.3409853364175933)
        np.random.seed(456)
        x = np.random.randint(0, 50, 10)
        y = np.random.randint(0, 50, 10)
        self.assertAlmostEqual(corr(x, y), -0.16475204420969639)

suite = unittest.TestLoader().loadTestsFromTestCase(TestArrayComputing)
runner = unittest.TextTestRunner(verbosity=2)
test_results = runner.run(suite)
number_of_failures = len(test_results.failures)
number_of_errors = len(test_results.errors)
number_of_test_runs = test_results.testsRun
number_of_successes = number_of_test_runs - (number_of_failures + number_of_errors)

test_00_create_nn_array (__main__.TestArrayComputing) ... ok
test_01_filter_evens_product (__main__.TestArrayComputing) ... ok
test_02_find_divisors (__main__.TestArrayComputing) ... ok
test_03_var (__main__.TestArrayComputing) ... ok
test_04_std (__main__.TestArrayComputing) ... ok
test_05_cov (__main__.TestArrayComputing) ... ok
test_06_corr (__main__.TestArrayComputing) ... ok

----------------------------------------------------------------------
Ran 7 tests in 0.012s

OK


In [10]:
print("You've got {} successes among {} questions.".format(number_of_successes, number_of_test_runs))

You've got 7 successes among 7 questions.
