# DS-SF-34 | 01 | What is Data Science | Assignment | Answer Key

## Python Review

Programming is a must-have skill for data scientists.  Today, to give you more practice beyond the course prerequisites, we are going to implement a few functions in Python.  This assignment touches on the following topics:

- Functions (defining and using your own functions but also calling functions from packages)
- Loops
- Arithmetic operations
- Conditional statements

**Don't worry if you get stuck.  Ask around, review the answer key, and ask around more.  As this course progresses, your programming proficiency will increase.**

> ### Question 1.  Multiples of 3 and 5
>
> If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9.  The sum of these multiples is 23.
>
> Find the sum of all the multiples of 3 or 5 below 1,000.
>
> (Source: [Project Euler | Problem 1](https://projecteuler.net/problem=1))

In [1]:
def multiples_of_3_or_5(n):
    accumulator = 0
    for i in range(1, n):
        if (i % 3 == 0) or (i % 5 == 0):
            accumulator += i
    return accumulator

multiples_of_3_or_5(10)

23

In [2]:
multiples_of_3_or_5(1000)

233168

Answer: 233168
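Looping over every number is fine for $n = 1000$, but the same sum can also be computed without a loop.  The multiples of $k$ below $n$ are $k, 2k, \ldots, pk$ with $p = \lfloor (n - 1) / k \rfloor$, so their sum is $k \cdot p(p + 1) / 2$.  Adding the sums for 3 and 5 counts the multiples of 15 twice, so they are subtracted once (inclusion-exclusion).  A minimal sketch; the function names are hypothetical, not part of the assignment:

```python
def sum_of_multiples_below(k, n):
    """Sum of the positive multiples of k that are strictly below n."""
    p = (n - 1) // k              # number of multiples of k below n
    return k * p * (p + 1) // 2   # k * (1 + 2 + ... + p); p * (p + 1) is always even

def multiples_of_3_or_5_closed_form(n):
    # inclusion-exclusion: multiples of 15 appear in both the 3 and the 5 sums
    return (sum_of_multiples_below(3, n)
            + sum_of_multiples_below(5, n)
            - sum_of_multiples_below(15, n))

print(multiples_of_3_or_5_closed_form(10))    # 23
print(multiples_of_3_or_5_closed_form(1000))  # 233168
```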

> ### Question 2.  Estimating square roots
>
> Given a real number $m > 0$, let's define the sequence $u$ as follows:
> - $u_0 = 1$
> - $u_{n+1} = \frac{u_n ^ 2 + m}{2u_n}$
>
> Implement the sequence $u$ above to estimate square roots: as $n$ grows, $u_n$ converges to $\sqrt{m}$.  Verify that $\sqrt{144} = 12$ and use your function to calculate $\sqrt{1024}$.

In [3]:
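def sqrt(m):
    # estimate the square root of m > 0 with u_{n+1} = (u_n^2 + m) / (2 u_n)
    u_n = 1.  # u_0 = 1, as a float so that no division is an integer division

    while True:
        u_n_plus_1 = (u_n ** 2 + m) / (2 * u_n)

        # stop once two consecutive terms agree to within one part in a million
        if abs(u_n_plus_1 / u_n - 1) < 10 ** -6:
            break

        u_n = u_n_plus_1

    return u_n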

- Let's test it out with 144.  We expect to get 12.

In [4]:
sqrt(144)

12.000000012408687

- Now, let's estimate $\sqrt{1024}$.

In [5]:
sqrt(1024)

32.0000071648159

- Now, let's double check this result using `math.sqrt`: (https://docs.python.org/2/library/math.html)

In [6]:
import math

math.sqrt(1024)

32.0

Answer: 32
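A short derivation, assuming $m > 0$, of why the sequence targets $\sqrt{m}$ and why both estimates above land slightly above the exact values.  If $u_n$ converges to a limit $\ell > 0$, taking the limit on both sides of the recurrence gives

$$\ell = \frac{\ell^2 + m}{2\ell} \iff 2\ell^2 = \ell^2 + m \iff \ell^2 = m \iff \ell = \sqrt{m}.$$

The recurrence is Newton's method applied to $f(u) = u^2 - m$:

$$u_{n+1} = u_n - \frac{f(u_n)}{f'(u_n)} = u_n - \frac{u_n^2 - m}{2u_n} = \frac{u_n^2 + m}{2u_n}.$$

Writing it as $u_{n+1} = \frac{1}{2}\left(u_n + \frac{m}{u_n}\right)$, the AM-GM inequality gives $u_{n+1} \geq \sqrt{u_n \cdot \frac{m}{u_n}} = \sqrt{m}$ for every $n \geq 0$.  From $u_1$ on, the sequence is therefore bounded below by $\sqrt{m}$ and decreasing, since $u_{n+1} - u_n = \frac{m - u_n^2}{2u_n} \leq 0$, so it converges, and every estimate overestimates $\sqrt{m}$, which matches `sqrt(144)` and `sqrt(1024)` returning values just above 12 and 32.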

> ### Question 3.  Prime Numbers
>
> A prime (number) is a natural number greater than 1 that has no positive divisors other than 1 and itself.  ([Wikipedia](https://en.wikipedia.org/wiki/Prime_number))
>
> Calculate all primes below 1,000.  What's their sum?

In [7]:
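def is_prime(n):
    # n is prime if no integer in [2, sqrt(n)] divides it
    # (uses the sqrt() defined in Question 2)
    if n < 2:
        return False
    for i in range(2, int(sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

def primes(n):
    # all primes strictly below n
    l = []
    for i in range(2, n):
        if is_prime(i):
            l.append(i)
    return l

l = primes(1000)

In [8]:
sum(l)

76127

Answer: 76127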
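Trial division calls `sqrt` and loops over candidate divisors for every number.  A common alternative, swapped in here for comparison rather than taken from the answer key, is the Sieve of Eratosthenes: start with every number marked as a candidate and cross out the multiples of each prime.  A minimal sketch; `primes_below` is a hypothetical name:

```python
def primes_below(n):
    """Sieve of Eratosthenes: return all primes strictly below n."""
    if n < 3:
        return []
    candidate = [True] * n        # candidate[k] is True while k may still be prime
    candidate[0] = candidate[1] = False
    i = 2
    while i * i < n:
        if candidate[i]:
            # multiples of i below i*i were already crossed out by smaller primes
            for j in range(i * i, n, i):
                candidate[j] = False
        i += 1
    return [k for k, flag in enumerate(candidate) if flag]

primes_1000 = primes_below(1000)
print(len(primes_1000))  # 168
print(sum(primes_1000))  # 76127
```

Integer comparisons (`i * i < n`) avoid floating-point square roots altogether.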

> ### Question 4.  Largest prime factor
>
> The prime factors of 13195 are 5, 7, 13 and 29.
>
> What is the largest prime factor of the number 600851475143?
>
> (Source: [Project Euler | Problem 3](https://projecteuler.net/problem=3))

In [9]:
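def largest_prime_factor(n):
    factors = set()  # distinct prime factors found so far

    i = 2
    while n > 1:
        # any i that divides n here is prime, because all smaller
        # factors have already been divided out
        while n % i == 0:
            factors.add(i)
            n //= i  # floor division keeps n an integer
        i += 1

    return max(factors)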

- Let's test it out with 13195.  We expect to get 29.

In [10]:
largest_prime_factor(13195)

29

- Now, let's find the largest prime factor of 600851475143.

In [11]:
largest_prime_factor(600851475143)

6857

Answer: 6857
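The loop above keeps increasing `i` until `n` is reduced to 1, so when `n` is a large prime it tries roughly `n` divisors.  A minimal sketch of a common refinement (the function name is hypothetical): stop as soon as `i * i > n`.  Whatever remains above 1 at that point has no divisor up to its square root, so it is prime and it is the largest factor.

```python
def largest_prime_factor_fast(n):
    """Largest prime factor of an integer n >= 2, by trial division up to sqrt(n)."""
    largest = None
    i = 2
    while i * i <= n:
        while n % i == 0:
            largest = i   # i is prime: all smaller factors were already divided out
            n //= i
        i += 1
    if n > 1:
        largest = n       # leftover n has no divisor <= its square root, so it is prime
    return largest

print(largest_prime_factor_fast(13195))         # 29
print(largest_prime_factor_fast(600851475143))  # 6857
```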

> ### Question 5.  Mean
>
> Write a function to calculate the mean (average) of a list.
>
> What's the mean of 10, 8, 13, 9, 11, 14, 6, 4, 12, 7, and 5?

In [12]:
def mean(xs):
    n = len(xs)
    if n == 0:
        return None

    accumulator = 0.

    for x in xs:
        accumulator += x

    return accumulator / n

In [13]:
l = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

mean(l)

9.0

- Let's double check this result using `numpy.mean`: (https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html)

In [14]:
import numpy as np

np.mean(l)

9.0

Answer: 9
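The accumulator loop in `mean` can be replaced by the built-in `sum`.  A small sketch; the `statistics.mean` line assumes Python 3.4 or later, where the `statistics` module is part of the standard library:

```python
import statistics

l = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

# built-in sum() replaces the explicit loop;
# float() avoids integer division under Python 2
print(sum(l) / float(len(l)))  # 9.0

# standard-library equivalent (Python 3.4+)
print(statistics.mean(l))
```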

> ### Question 6.  Sample standard deviation
>
> Write a function to calculate the standard deviation of a sample.
>
> Given the sample $x_1, x_2, ..., x_N$, its standard deviation is defined as $s = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^{N} (x_i  - \bar{x})^2}$, with $\bar{x}$ as the sample mean.
>
> What's the standard deviation of the following sample: 10, 8, 13, 9, 11, 14, 6, 4, 12, 7, and 5?
>
> ([Wikipedia](https://en.wikipedia.org/wiki/Standard_deviation#Sample_standard_deviation))

In [15]:
def std(xs):
    n = len(xs)
    if n <= 1:
        return None

    accumulator = 0.

    x_bar = mean(xs)

    for x in xs:
        accumulator += (x - x_bar) ** 2

    return sqrt(accumulator / (n - 1))  # sqrt() from Question 2, not math.sqrt

In [16]:
std(l)

3.3166248052315686

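- Let's double check this result using `numpy.std` with `ddof=1`, which divides by $N - 1$ as in the definition above (the default, `ddof=0`, divides by $N$): (https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html)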

In [17]:
np.std(l, ddof=1)

3.3166247903553998

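Answer: ~3.32.  The exact value is $\sqrt{11}$: the squared deviations from $\bar{x} = 9$ sum to 110, and $110 / 10 = 11$.  The two results above differ by about $1.5 \times 10^{-8}$ because our `sqrt` stops as soon as consecutive estimates agree to within one part in a million.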
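Replacing the hand-written `sqrt` with `math.sqrt` removes that small gap.  A minimal sketch (`sample_std` is a hypothetical name), cross-checked against `statistics.stdev`, which assumes Python 3.4 or later and also uses the $N - 1$ denominator:

```python
import math
import statistics

l = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

def sample_std(xs):
    n = len(xs)
    if n <= 1:
        return None  # the sample standard deviation needs at least two values
    x_bar = sum(xs) / float(n)
    squared_deviations = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - 1))

print(sample_std(l))        # equals math.sqrt(11)
print(statistics.stdev(l))  # same definition, standard library
```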

> ### Question 7.  Median
>
> Write a function to calculate the median ("middle value") of a list.  ([Wikipedia](https://en.wikipedia.org/wiki/Median))
>
> What's the median of 10, 8, 13, 9, 11, 14, 6, 4, 12, 7, and 5?

In [18]:
def median(xs):
    n = len(xs)
    if n == 0:
        return None

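    sorted_xs = sorted(xs)

    i = n // 2  # middle index; floor division keeps it an int in Python 3

    if n % 2 == 1:
        # odd length: the single middle value
        return sorted_xs[i]
    else:
        # even length: the mean of the two middle values
        return mean(sorted_xs[i - 1:i + 1])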

In [19]:
median(l)

9

- Let's double check this result using `numpy.median`: (https://docs.scipy.org/doc/numpy/reference/generated/numpy.median.html)

In [20]:
np.median(l)

9.0

Answer: 9.
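The list above has an odd length, so only the first branch of `median` runs.  A short check of both cases with the standard library, assuming Python 3.4 or later; `median_low` and `median_high` return one of the two middle values instead of their mean:

```python
import statistics

l = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
even = [1, 2, 3, 4]

print(statistics.median(l))          # odd length: the middle value, 9
print(statistics.median(even))       # even length: (2 + 3) / 2 = 2.5
print(statistics.median_low(even))   # 2, the lower middle value
print(statistics.median_high(even))  # 3, the higher middle value
```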

> ### Question 8.  Mode
>
> Write a function to calculate the mode ("most frequent value") of a list.  ([Wikipedia](https://en.wikipedia.org/wiki/Mode_(statistics)))
>
> What's the mode of 10, 8, 13, 9, 11, 14, 6, 4, 12, 7 and 5?  How about the mode of 8, 8, 8, 8, 8, 8, 19, 8, 8 and 8?

In [21]:
def mode(xs):
    counts = {}
    for x in xs:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1

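    # a value must appear at least twice to count as a mode,
    # so a list of all-distinct values has no mode
    mode_count = 2
    mode_xs = []

    for x, count in counts.items():  # works in both Python 2 and 3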
        if mode_count < count:
            mode_count = count
            mode_xs = [x]
        elif count == mode_count:
            mode_xs.append(x)

    return mode_xs

In [22]:
mode(l)

[]

In [23]:
l2 = [8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

mode(l2)

[8]

- Let's double check this result using `scipy.stats.mode`: (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mode.html)

In [24]:
from scipy import stats

stats.mode(l)

ModeResult(mode=array([4]), count=array([1]))

In [25]:
stats.mode(l2)

ModeResult(mode=array([8]), count=array([9]))

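Answer: The first list has no mode, since every value appears exactly once (`scipy.stats.mode` still reports a single value, 4, with a count of 1, because it returns the smallest of the tied values).  The mode of the second list is 8.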
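The counting loop in `mode` can be written with `collections.Counter`.  A minimal sketch that keeps the convention used above (a value must appear at least twice); `modes` and its `min_count` parameter are hypothetical names.  For contrast, `statistics.multimode` (Python 3.8+) has no such threshold and treats the eleven single occurrences as an eleven-way tie:

```python
from collections import Counter
import statistics

def modes(xs, min_count=2):
    """Every value tied for the highest count, or [] if that count is below min_count."""
    counts = Counter(xs)
    if not counts:
        return []
    top = max(counts.values())
    if top < min_count:
        return []
    return [x for x, c in counts.items() if c == top]

l = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
l2 = [8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

print(modes(l))   # []
print(modes(l2))  # [8]

print(statistics.multimode(l))   # all 11 values, in order of first appearance
print(statistics.multimode(l2))  # [8]
```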