# Midterm.todo.ipynb

## Purpose

This is the practical portion of the midterm.  The goal is to test your ability to put together pieces from the previous modules, up through some Pandas (although the Pandas content is intentionally light, since we didn't cover it in as much detail as we needed).

This will consist of multiple "problems" (please see Procedure).  They are written to emulate what you will see in the business world, where you apply the tools you have gained so far to solve those issues.

## Procedure

You'll be presented with 3 "problems".  Each problem/task is self-contained, meaning the output of problem 1 should not be used in problem 2.  Each problem stands on its own.

Because these problems require you to break the work into subtasks, please leave your work for those subtasks in the notebook.  List each subtask in a markdown cell, followed by the code you used to complete it.

For all random number generation, please use a seed of **5000**.

I really do encourage you to think about each task and how you want to get to the end.  Please note that there's no single way to do these problems; you can accomplish them any number of ways.  But your output at the very end should match my output.  If it doesn't, then something was done incorrectly.  Showing your work is incredibly important here.

# Problem 1

Please use elementary Python code for this problem, so no NumPy or Pandas.  Please also study the output that I provided; yours should follow the same idea, but depending on the dates and sample values you pick, your results can differ from mine.

1.  Generate a random list of 10 dates (between 2020/01/01 and 2020/01/03)
2.  Generate a list of 10 integer values (I used between 5 and 7)
3.  Print the results for both.
4.  Create a list of tuples.  The first entry is the date, the second is the test value
5.  Create a set from the resulting list (unique)
6.  Print out the lengths of both, as well as the resulting set.
7.  Create a loop that groups by the date and sums the test values

In [1]:
import random as rnd
rnd.seed(5000)

sampleTen = range(0, 10)
testDates = [f"2020/01/0{x}" for x in [rnd.randint(1,3) for _ in sampleTen]]
testValues = [rnd.randint(5, 7) for _ in sampleTen]

print(f"Generation of testDates = {testDates}")
print(f"Generation of testValues = {testValues}")

Generation of testDates = ['2020/01/01', '2020/01/02', '2020/01/02', '2020/01/02', '2020/01/01', '2020/01/03', '2020/01/02', '2020/01/01', '2020/01/03', '2020/01/02']
Generation of testValues = [6, 7, 6, 7, 5, 5, 6, 7, 6, 7]


In [2]:
dataSet = [(testDates[x], testValues[x]) for x in sampleTen]
dataSet

[('2020/01/01', 6),
 ('2020/01/02', 7),
 ('2020/01/02', 6),
 ('2020/01/02', 7),
 ('2020/01/01', 5),
 ('2020/01/03', 5),
 ('2020/01/02', 6),
 ('2020/01/01', 7),
 ('2020/01/03', 6),
 ('2020/01/02', 7)]

In [3]:
dataSetT = set(dataSet)

In [4]:
print(f"Length of dataSet = {len(dataSet)}")
print(f"Length of dataSetT = {len(dataSetT)}")
dataSetT

Length of dataSet = 10
Length of dataSetT = 7


{('2020/01/01', 5),
 ('2020/01/01', 6),
 ('2020/01/01', 7),
 ('2020/01/02', 6),
 ('2020/01/02', 7),
 ('2020/01/03', 5),
 ('2020/01/03', 6)}

In [5]:
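groupedSum = {}
# pop() removes an arbitrary tuple each pass, so dataSetT is empty when the loop ends
while len(dataSetT) > 0:
    record = dataSetT.pop()
    date, value = record
    if date in groupedSum:
        # date already seen: add to its running total
        groupedSum[date] = groupedSum[date] + value
    else:
        # first time this date appears
        groupedSum[date] = value
groupedSum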

{'2020/01/02': 13, '2020/01/03': 11, '2020/01/01': 18}
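
As noted in the procedure, there is more than one way to do the grouping.  A minimal sketch of one alternative, assuming the same seven unique tuples shown in the output above: a `for` loop with `dict.get` should give the same sums without emptying the set.  The names `uniquePairs` and `totals` are illustrative and not part of the original solution.

```python
# Hypothetical names; the pairs are copied from the dataSetT output above.
uniquePairs = {
    ("2020/01/01", 5), ("2020/01/01", 6), ("2020/01/01", 7),
    ("2020/01/02", 6), ("2020/01/02", 7),
    ("2020/01/03", 5), ("2020/01/03", 6),
}

totals = {}
for date, value in uniquePairs:
    # dict.get returns 0 the first time a date is seen
    totals[date] = totals.get(date, 0) + value

# Sort by date so the printed order does not depend on set ordering
for date in sorted(totals):
    print(date, totals[date])

print(len(uniquePairs))  # the set is left intact
```

Iterating instead of popping keeps the set available for later cells, which can help when re-running part of a notebook.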

# Problem 2
This problem is very specific, and it has to be done a specific way to get decent marks.  For both loops, we're working **backwards**: instead of going through a list from index 0 to n, we're going from n to 0.  This is different from what we've done in the past.  I strongly recommend working with a smaller data set than the final answer to get an idea of how to do this.  I also suggest you spend time on paper before coding.

Please note, you should only use primitive/basic Python for this, so no NumPy or Pandas.  Also, do not reverse the list before looping.

1.  Generate a list of random integers.  The range of random values is flexible; I used 1 to 100.  The resulting list must have 199 elements.
2.  Print out the last 10 elements of this data set, and the length of the data set.
3.  Using a for loop, crawl through the list backwards.
    -  Start at the highest index and go down.
    -  For each iteration, add the value you're at and the next value going down (see the example logic below).
    -  Your resulting list should still have 199 elements.  Note that the resulting list after this loop will be in reverse order from the original list.  The last element in this list should equal the first element from the data set generated in #1.
4.  Using a while loop, perform the same set of operations as in #3.  The logic inside the loop is **very similar**.
5.  Compare the two lists.  They should be exactly the same size and contain the same elements.

The logic of this problem is a bit difficult to describe, so I'll try to explain with a much smaller sample.  Let's assume you have the following list defined.

myList = $[x_{1}, x_{2}, x_{3}, x_{4}, x_{5}]$

Mathematically, the result list would be:

resultList = $[x_{5}+x_{4}, x_{4}+x_{3}, x_{3}+x_{2}, x_{2}+x_{1}, x_{1}]$

As a reminder, you **must** work with the list going from $x_{5}$ to $x_{1}$.  This must not be done by reversing the list before looping through.  The goal here is to take a concept we've worked with already, and apply it in a different way.
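
To make the formula concrete, here is a minimal sketch on a five-element list, assuming the values 1 through 5 stand in for $x_{1}$ through $x_{5}$.  The names `resultFor`, `resultWhile`, and `below` are illustrative.  Both loops walk the indices from the end of the list to the start without reversing the list first.

```python
myList = [1, 2, 3, 4, 5]  # stands in for [x1, x2, x3, x4, x5]

# for loop: walk the indices from the last one down to 0
resultFor = []
for i in range(len(myList) - 1, -1, -1):
    # the first element has no neighbour below it, so add 0
    below = myList[i - 1] if i > 0 else 0
    resultFor.append(myList[i] + below)

# while loop: same logic with a manual counter
resultWhile = []
i = len(myList) - 1
while i >= 0:
    below = myList[i - 1] if i > 0 else 0
    resultWhile.append(myList[i] + below)
    i -= 1

print(resultFor)                 # [9, 7, 5, 3, 1]
print(resultFor == resultWhile)  # True
```

The last entry of the result is 1, the first element of `myList`, which is a quick way to check that the boundary case is handled.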

In [6]:
rnd.seed(5000)

In [7]:
testSampleData = [rnd.randint(1, 100) for _ in range(1, 200)]
print(f"Length of testSampleData: {len(testSampleData)}")
print(f"Last 10 elements of testSampleData: {testSampleData[-10:]}")
print(f"First 10 elements of testSampleData: {testSampleData[0:10]}")

Length of testSampleData: 199
Last 10 elements of testSampleData: [39, 81, 73, 41, 3, 69, 69, 3, 92, 91]


In [8]:
nextAdditionFor = []
for x in range(len(testSampleData)-1, -1, -1):
    firstVal = testSampleData[x]
    secondVal = 0
    if x > 0:
        secondVal = testSampleData[x-1]
    nextAdditionFor.append(firstVal + secondVal)
print(f"Length of nextAdditionFor: {len(nextAdditionFor)}")
print(f"Contents(First 10): {nextAdditionFor[0:10]}")
print(f"Contents(Last 10): {nextAdditionFor[-10:]}")

Length of nextAdditionFor: 199
Contents(First 10): [183, 95, 72, 138, 72, 44, 114, 154, 120, 71]
Contents(Last 10): [109, 71, 58, 117, 80, 74, 110, 98, 77, 30]


In [9]:
nextAdditionWhile = []
counter = len(testSampleData) - 1
while counter >= 0:
    firstVal = testSampleData[counter]
    secondVal = 0
    if counter > 0:
        secondVal = testSampleData[counter-1]
    nextAdditionWhile.append(firstVal + secondVal)
    counter = counter - 1
print(f"Length of nextAdditionWhile: {len(nextAdditionWhile)}")
print(f"Contents(First 10): {nextAdditionWhile[0:10]}")
print(f"Contents(Last 10): {nextAdditionWhile[-10:]}")

Length of nextAdditionWhile: 199
Contents(First 10): [183, 95, 72, 138, 72, 44, 114, 154, 120, 71]
Contents(Last 10): [109, 71, 58, 117, 80, 74, 110, 98, 77, 30]


# Problem 3

This problem should use NumPy for the entire operation.  The basic setup is done for you already.

This will also be a challenging problem to think through (like #2).  I've provided a small output example below that should help.  In story form: think of the array in slices (horizontal and vertical) relative to each diagonal element.  We want to divide the elements along those slices by the diagonal value.  In the code below, the diagonal is:

[28, 68, 45, ..., 9]

Study the output of the code to see what I'm doing.  As one example, look at the first diagonal element (28).  Its horizontal slice is the rest of its row, and its vertical slice is the rest of its column.  Take the first horizontal element, 67:  $67 / 28 = 2.3928...$, which we round to 2 decimal places (2.39).  Note that, to keep the numbers from getting too small, any one number is divided _once_ and only _once_.  So as we move down the diagonal, fewer and fewer elements are divided.
 
1.  Create a 10x10 matrix of random integers
2.  Convert this matrix from integers to floats
    - Base documentation at https://numpy.org/doc/stable/reference/arrays.html
    - I'm being intentionally vague as to what you need specifically.  Please search for the right method, and the right destination type to use.
    - Do not loop through the matrix, or compose a new matrix with those types.  Also, do not regenerate #1 as floats.
3. Divide the elements on the horizontal and vertical axes (from that point on) by the diagonal element.

A few important notes.  The last element in the diagonal (9) does not divide _any_ element.  The previous diagonal element (81) divides only 82 (horizontal) and 81 (vertical).

To put this in picture form...

In [43]:
import numpy as np
import numpy.random as nrand
nrand.seed(5000)
np.set_printoptions(suppress=True)

In [44]:
intmatrix = nrand.randint(1, 100, (10, 10))
intmatrix

array([[28, 67, 98,  2, 30,  3, 15,  4, 97, 60],
       [27, 68, 47,  4, 65, 85, 93, 38, 60, 67],
       [77, 30, 45, 49, 53, 66, 46, 40, 33, 63],
       [81, 25, 72, 86, 45, 73, 64, 50, 51, 86],
       [68, 71, 64, 32, 56, 71, 58, 74, 95, 72],
       [89, 82, 71, 41, 42, 51, 95, 91, 38, 27],
       [17, 63, 57, 80, 43, 81, 36, 44, 91, 87],
       [ 2, 23, 30, 67, 83,  8,  6, 97, 30, 36],
       [43, 24, 19, 63, 30, 36, 76, 49, 81, 82],
       [55, 82, 83, 49, 51, 14, 20, 95, 81,  9]])

In [45]:
fmatrix = intmatrix.astype(dtype=float)
fmatrix

array([[28., 67., 98.,  2., 30.,  3., 15.,  4., 97., 60.],
       [27., 68., 47.,  4., 65., 85., 93., 38., 60., 67.],
       [77., 30., 45., 49., 53., 66., 46., 40., 33., 63.],
       [81., 25., 72., 86., 45., 73., 64., 50., 51., 86.],
       [68., 71., 64., 32., 56., 71., 58., 74., 95., 72.],
       [89., 82., 71., 41., 42., 51., 95., 91., 38., 27.],
       [17., 63., 57., 80., 43., 81., 36., 44., 91., 87.],
       [ 2., 23., 30., 67., 83.,  8.,  6., 97., 30., 36.],
       [43., 24., 19., 63., 30., 36., 76., 49., 81., 82.],
       [55., 82., 83., 49., 51., 14., 20., 95., 81.,  9.]])
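
The conversion step can be checked on a tiny array.  This is a minimal sketch, assuming a hypothetical 2x2 integer matrix named `ints`; it shows that `astype` returns a new array and leaves the source untouched.

```python
import numpy as np

ints = np.array([[1, 2], [3, 4]])   # hypothetical small integer matrix
floats = ints.astype(float)         # new array; ints is unchanged

print(ints.dtype.kind)   # 'i' (some integer type)
print(floats.dtype)      # float64
```

Because the result is a separate array, the later in-place divisions on the float matrix do not alter the integer matrix.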

In [46]:
x = 0
while x < len(fmatrix):
    divisor = fmatrix[x, x]      # diagonal element for this step
    ySlice = fmatrix[x+1:, x]    # column below the diagonal (handy to print when debugging)
    xSlice = fmatrix[x, x+1:]    # row to the right of the diagonal (handy to print when debugging)

    # Process x-dimension
    fmatrix[x, x+1:] = np.around(xSlice / divisor, 2)

    # Process y-dimension
    fmatrix[x+1:, x] = np.around(ySlice / divisor, 2)
    x += 1

fmatrix


array([[28.  ,  2.39,  3.5 ,  0.07,  1.07,  0.11,  0.54,  0.14,  3.46,
         2.14],
       [ 0.96, 68.  ,  0.69,  0.06,  0.96,  1.25,  1.37,  0.56,  0.88,
         0.99],
       [ 2.75,  0.44, 45.  ,  1.09,  1.18,  1.47,  1.02,  0.89,  0.73,
         1.4 ],
       [ 2.89,  0.37,  1.6 , 86.  ,  0.52,  0.85,  0.74,  0.58,  0.59,
         1.  ],
       [ 2.43,  1.04,  1.42,  0.37, 56.  ,  1.27,  1.04,  1.32,  1.7 ,
         1.29],
       [ 3.18,  1.21,  1.58,  0.48,  0.75, 51.  ,  1.86,  1.78,  0.75,
         0.53],
       [ 0.61,  0.93,  1.27,  0.93,  0.77,  1.59, 36.  ,  1.22,  2.53,
         2.42],
       [ 0.07,  0.34,  0.67,  0.78,  1.48,  0.16,  0.17, 97.  ,  0.31,
         0.37],
       [ 1.54,  0.35,  0.42,  0.73,  0.54,  0.71,  2.11,  0.51, 81.  ,
         1.01],
       [ 1.96,  1.21,  1.84,  0.57,  0.91,  0.27,  0.56,  0.98,  1.  ,
         9.  ]])
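
The same logic is easier to check by hand on a smaller matrix.  Below is a minimal sketch, assuming a hypothetical 3x3 input named `small` whose values were picked so the divisions come out cleanly; it uses a `for` loop over the diagonal rather than the `while` loop above, which is one of several equivalent ways to write it.

```python
import numpy as np

# Hypothetical 3x3 input, chosen so the divisions are easy to check by hand
small = np.array([[2, 4, 6],
                  [8, 4, 2],
                  [10, 6, 5]]).astype(float)

for i in range(len(small)):
    divisor = small[i, i]
    # rest of the row, to the right of the diagonal
    small[i, i+1:] = np.around(small[i, i+1:] / divisor, 2)
    # rest of the column, below the diagonal
    small[i+1:, i] = np.around(small[i+1:, i] / divisor, 2)

print(small)
```

Working through it: the first diagonal element (2) divides 4 and 6 in its row and 8 and 10 in its column; the second (4) divides only the 2 to its right and the 6 below it; the last (5) divides nothing.  The diagonal itself is never changed, and each off-diagonal element is divided exactly once.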