# Homework 5: Multiprocessing and Pruning
### Author: Calvin Henggeler
### Dataset: Housing.csv

## A. Multiprocessing.
The Collatz conjecture states that if you start with any positive integer and repeatedly divide by 2 when it is even, or multiply by 3 and add 1 when it is odd, you will eventually arrive at 1. For example:
12 -> 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
9 -> 28 -> 14 -> 7 -> 22 -> 11 -> 34 -> 17 -> 52 -> 26 -> 13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
19 -> 58 -> 29 -> 88 -> 44 -> 22 -> 11 -> 34 -> 17 -> 52 -> 26 -> 13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
We can define the Collatz length of n to be the number of steps required to arrive at 1 from n. So, the Collatz length of 12 is 9, the Collatz length of 9 is 19, and the Collatz length of 19 is 20.
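The definition can be checked directly. The sketch below uses a hypothetical helper, `collatz_trajectory`, that records every value visited. The Collatz length is then the number of arrows, which is one less than the number of values in the sequence.

```python
def collatz_trajectory(n):
    """Return the Collatz sequence from n down to 1 (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    seq = [n]
    while n != 1:
        # halve even numbers; send odd n to 3n + 1 (integer arithmetic throughout)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq


# The Collatz length counts arrows, i.e. len(sequence) - 1.
for start in (12, 9, 19):
    seq = collatz_trajectory(start)
    print(start, len(seq) - 1, " -> ".join(map(str, seq)))
```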

1. Write a function which accepts a positive integer n and returns the Collatz length of n.
2. Use multiprocessing to get the Collatz lengths of the integers from 1 to 10**5.
3. Create a scatterplot of the Collatz length of n vs n.
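Step 3 can be sketched as follows, assuming `lengths` is the list of Collatz lengths in order, starting at n = 1. The function name `plot_collatz_lengths`, the small marker size, and the transparency are illustrative choices (they help when plotting 10**5 points), not part of the assignment.

```python
import matplotlib.pyplot as plt


def plot_collatz_lengths(lengths, start=1):
    """Scatter Collatz length against n; lengths[i] is assumed to belong to n = start + i."""
    ns = range(start, start + len(lengths))
    fig, ax = plt.subplots(figsize=(10, 5))
    # tiny, semi-transparent markers keep dense regions readable
    ax.scatter(ns, lengths, s=1, alpha=0.3)
    ax.set_xlabel("n")
    ax.set_ylabel("Collatz length of n")
    ax.set_title("Collatz length vs n")
    return fig, ax


# Example with the first ten lengths (n = 1..10).
fig, ax = plot_collatz_lengths([0, 1, 7, 2, 5, 8, 16, 3, 19, 6])
```

In a notebook the figure is displayed at the end of the cell; in a script, call `plt.show()` or `fig.savefig(...)`.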

## B. Pruning.

1. Use the Titanic dataset.
2. Split the dataset into training and testing subsets.
3. Select a suitable feature subset to predict the survived/died column.
4. Train an unrestricted tree and compute its cost_complexity_pruning_path.
5. Find the optimum value of alpha. Use 3-fold cross-validation.
6. Train a tree, supplying the optimum value of alpha as a parameter.
7. Report the accuracy on the test data.
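Steps 4 to 6 can be sketched with scikit-learn's `DecisionTreeClassifier.cost_complexity_pruning_path` and a 3-fold `GridSearchCV` over the returned `ccp_alphas`. This is a hedged sketch: the function name `best_ccp_alpha` is hypothetical, `X_train` and `y_train` are assumed to be an already numerically encoded Titanic feature subset and the survived/died labels, and `random_state=0` is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def best_ccp_alpha(X_train, y_train, random_state=0):
    """Return (best ccp_alpha by 3-fold CV accuracy, tree refit with that alpha)."""
    # Step 4: the pruning path of an unrestricted tree yields the candidate alphas.
    path = DecisionTreeClassifier(random_state=random_state).cost_complexity_pruning_path(
        X_train, y_train
    )
    # Clip tiny negative values from floating-point error; ccp_alpha must be >= 0.
    alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))
    # Step 5: pick alpha with 3-fold cross-validation.
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=random_state),
        param_grid={"ccp_alpha": alphas},
        cv=3,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    # Step 6: with refit=True (the default), best_estimator_ is a tree trained
    # on the whole training set using the best alpha.
    return search.best_params_["ccp_alpha"], search.best_estimator_
```

Step 7 is then `tree.score(X_test, y_test)` on the held-out split.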

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## A. Multiprocessing

#### 1. Write a function which accepts a positive integer n and returns the Collatz length of n

In [1]:
def collatz_len(n):
    """Return the Collatz length of n: the number of steps needed to reach 1 from n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # n is even: halve it (integer division keeps n an int instead of a float)
        if n % 2 == 0:
            n = n // 2
        # n is odd
        else:
            n = 3*n + 1

        steps = steps + 1
    return steps

Test collatz_len()

In [3]:
a = collatz_len(12)

In [4]:
collatz_len(9)

19

In [5]:
collatz_len(100000)

128
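Computing each length from 1 to 10**5 independently repeats work, because trajectories merge (both 9 and 19 pass through 22). A hedged alternative, not something the assignment asks for, is to cache lengths already found. The names `collatz_len_cached` and `_CACHE` are hypothetical. A module-level cache lives in one process, so under `Pool` each worker would build its own.

```python
_CACHE = {1: 0}  # hypothetical cache: n -> Collatz length


def collatz_len_cached(n):
    """Collatz length of n, reusing lengths of numbers seen before."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    path = []
    # Walk forward until we hit a number whose length is already known.
    while n not in _CACHE:
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    length = _CACHE[n]
    # Unwind: each earlier number is one more step away from 1.
    for m in reversed(path):
        length += 1
        _CACHE[m] = length
    return length


lengths = [collatz_len_cached(n) for n in range(1, 10**5 + 1)]
```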

#### 2. Use multiprocessing to get the Collatz lengths for the integers from 1 to 10**5 (100000)

In [7]:
inpt = range(1, 10**5 + 1)  # range excludes its stop value, so +1 includes 10**5
# inpt = [1, 100, 1000, 10000, 100000]

In [3]:
from multiprocessing import Pool, cpu_count

In [4]:
processes = cpu_count() - 1
processes

7

In [5]:
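# Serial baseline using the built-in map (no multiprocessing).
# The output below was produced with the short test input commented out above.
res = map(collatz_len, inpt)
print(list(res))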

[0, 25, 111, 29, 128]


In [8]:
with Pool(processes=processes) as pool:
    res = pool.map(collatz_len, inpt)
res

[0, 1, 7, 2, 5, 8, 16, 3, 19, 6,
 14, 9, 9, 17, 17, 4, 12, 20, 20, 7,
 7, 15, 15, 10, 23, 10, 111, 18, 18, 18,
 106, 5, 26, 13, 13, 21, 21, 21, 34, 8,
 109, 8, 29, 16, 16, 16, 104, 11, 24, 24,
 24, 11, 11, 112, 112, 19, 32, 19, 32, 19,
 19, 107, 107, 6, 27, 27, 27, 14, 14, 14,
 102, 22, 115, 22, 14, 22, 22, 35, 35, 9,
 22, 110, 110, 9, 9, 30, 30, 17, 30, 17,
 92, 17, 17, 105, 105, 12, 118, 25, 25, 25,
 25, 25, 87, 12, 38, 12, 100, 113, 113, 113,
 69, 20, 12, 33, 33, 20, 20, 33, 33, 20,
 95, 20, 46, 108, 108, 108, 46, 7, 121, 28,
 28, 28, 28, 28, 41, 15, 90, 15, 41, 15,
 15, 103, 103, 23, 116, 116, 116, 23, 23, 15,
 15, 23, 36, 23, 85, 36, 36, 36, 54, 10,
 98, 23, 23, 111, 111, 111, 67, 10, 49, 10,
 124, 31, 31, 31, 80, 18, 31, 31, 31, 18,
 18, 93, 93, 18, 44, 18, 44, 106, 106, 106,
 44, 13, 119, 119, 119, 26, 26, ...
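Run as a plain script rather than a notebook cell, the pool code needs one precaution on platforms where the default start method is spawn (Windows, and macOS since Python 3.8): each worker re-imports the main module, so the pool should be created under an `if __name__ == "__main__":` guard and the mapped function must be defined at module level. A hedged script-form sketch follows; `collatz_lengths` is a hypothetical wrapper, and `chunksize=1000` is an arbitrary value that batches the many tiny tasks to reduce inter-process overhead.

```python
from multiprocessing import Pool, cpu_count


def collatz_len(n):
    """Number of steps for n to reach 1 under the Collatz map (integer arithmetic)."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def collatz_lengths(limit, processes=None, chunksize=1000):
    """Collatz lengths of 1..limit (inclusive), computed in a process pool."""
    # Leave one core free by default, but always use at least one worker.
    processes = processes or max(1, cpu_count() - 1)
    with Pool(processes=processes) as pool:
        # Larger chunks mean fewer round trips between the parent and the workers.
        return pool.map(collatz_len, range(1, limit + 1), chunksize=chunksize)


if __name__ == "__main__":
    lengths = collatz_lengths(10**5)
    print(len(lengths), lengths[:10])
```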