In [3]:
import pandas as pd

# Numpy and Arrays

Another of the common libraries and data structures that we commonly use in Python is the Numpy library and the array data structure. Numpy is a library that is used for scientific computing including data science. In particular, arrays are used heavily when creating neural network models in TensorFlow and Keras, which is what we will do towards the end of the machine learning notebooks. 

## Numpy Library

The Numpy library is a Python library that is used for scientific computing, largely for creating arrays and matrices of data for machine learning models. Numpy also includes an extensive library of, largely mathematical, functions that can be applied to arrays and matrices.

We typically import numpy with an alias of "np", so most code examples you'll see will have "np.WHATEVER" for any numpy functions. Like Pandas, this is not a rule, but it is a convention that is used by most data scientists.

In [4]:
import numpy as np

## Numpy Arrays

Numpy arrays are a data structure that is used to store data, and are largely similar to a list, though a more rigid one. Numpy arrays are used heavily in machine learning models, and are the data structure that is used to store the data that is actually passed into the models.

For the most part, numpy arrays will work seamlessly in many of the functions that we use with other data structures. For example, the visualizations of data that we'll use through the machine learning work will make use of lists, series, arrays, or dataframes. 

![Array](../../images/array.png "Array")
![Array](../images/array.png "Array")

### Arrays as a Concept

The array is perhaps the most foundational data structure of all, and if we use more lower-level languages such as C, arrays are <i>the</i> data structure that we can use without importing any outside libraries. An array is structurally a lot like a list, but with a few key restrictions and differences:
<ul>
<li> All elements of an array must be of the same type, whereas a list can contain elements of different types. </li>
<li> Arrays are of a fixed size, whereas lists can grow and shrink. Array sizes are declared when created and can't change. </li>
<li> The items in an array are mutable, we can modify any value inside an array. </li>
<li> Arrays are stored in contiguous memory, whereas lists are not, or may not be. </li>
</ul>

The final point is one that is mainly irrelevant when using an array in practice, but it is one that explains why arrays were so much more critical in older languages such as C than they are in new high-level languages such as Python. It also helps illustrate the difference between an array and a list, as well as the two types of languages. 

![Array Memory](../../images/array_memory.png "Array Memory")
![Array Memory](../images/array_memory.png "Array Memory")

Because an array is a fixed size, such as 100 integers, the computer can allocate a block of memory that is 100 integers long, and then store the array in that block of memory. This is what is meant by contiguous memory, the array is stored in a single block of memory. Since arrays never shrink or grow, as long as we have the address of the first element in the array, we can calculate the address of any element in the array. We also never need to worry about what happens if we need more space, as that's not possible. This definitive location in memory, leads to a few outcomes:
<ul>
<li> First, this specific memory address was one of the building blocks of modern data structures such as a list - first adding indexing, then adding flexibility. Needing to manually manage memory is a lot of work, and avoiding doing so is one benefit of using a high-level language such as Python. </li>
<li> Second, this is why arrays are so much faster than lists. If we want to access the 50th element of an array, we can calculate the memory address of that element and go directly to it. If we want to access the 50th element of a list, we have to start at the beginning of the list and iterate through it until we get to the 50th element. With a list we need to nagivate from item to item, and ensure we are going to the correct item; with an array, we know by definition where exactly every item is, so accessing each one is extremely fast. </li>
</ul>

### Array Usage

While the restrictions of an array make it less useful than something like a list in many cases, we do still have uses for it, primarily for the final datasets that we actually feed to machine learning models. Since the data we feed in always has consistent types as well as a fixed size, and we really care about efficiency if we have large data that is going to be repeatedly accessed, arrays are a good choice for this. In general, we can do most or all of the preparation of our data while it is stored in dataframes, and then convert it to arrays when we are ready to feed it into a model.

We can make some simple arrays, though this step is actually a little more complex than creating a list or a dictionary - since an array needs to be a set size, we need to specify that size at creation. We can do this by passing in values to use in a list, or by using one of the functions to prefill an array with a specific value. We can also specify the type of the array, which we normally do implicitly, by filling it with some object. The types of arrays are the same as the types of variables, so we can have an array of integers, floats, strings, etc. For our purposes, we will virtually always use arrays of floats, as that is what we need to feed into machine learning models.

In [5]:
# Make an array
array1 = np.array([1,2,3,4,5,6,7,8,9,10])
print(array1)

# Create an empty array of 20 0's
array2 = np.zeros(20)
print(array2)

# Create an array of 10 empty values
array3 = np.full(10, None)
print(array3)

[ 1  2  3  4  5  6  7  8  9 10]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[None None None None None None None None None None]


#### Array Shape

The shape of an array is the number of dimensions that it has, and the size of each dimension. For example, a 1-dimensional array of 10 elements has a shape of (10), a 2-dimensional array of 10 rows and 5 columns has a shape of (10, 5), and a 3-dimensional array of 10 rows, 5 columns, and 2 layers has a shape of (10, 5, 2). We can get the shape with the "shape" attribute of an array.

Note that the dimensions of a 1 dimension array will show as something like (10,), not (10,1) - the first is an array of 10 items, the second is a 2-dimensional array of 10 rows and 1 column. These two are different and we do need to be careful about the distinction.

In [6]:
print(array1.shape)
print(array2.shape)
print(array3.shape)

(10,)
(20,)
(10,)


#### Operations on Arrays

When we want to perform some operation on the data in an array, we can create loops just like we would with a list. We can also use the built-in functions that are part of the Numpy library, which are generally faster and more efficient than using a loop. For simple operations, we can also just "math" on the arrays directly - this will apply the operation to the entire array, and is the fastest and most efficient way to do so.

In [7]:
print("Before: ", array2)
array2 += 1
print("After: ", array2)

Before:  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
After:  [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [8]:
for index, item in enumerate(array2):
    print(index, item)

0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 1.0
7 1.0
8 1.0
9 1.0
10 1.0
11 1.0
12 1.0
13 1.0
14 1.0
15 1.0
16 1.0
17 1.0
18 1.0
19 1.0


In [9]:
array1.sum()

55

#### Arrays from Dataframes

Probably the most common way that we'll use arrays is to take prepared data from a dataframe and convert it to an array. Since we already know the size and type of the data, the array can carve itself out a spot in memory and then copy the data over. 

This is a pretty typical example of what we'd do with arrays when doing some machine learning. We have some data in a dataframe that is ready to be used in machine learning, we split the target and features (what we want to predict from the inputs we use to predict), make them both arrays, and Bob's your uncle. As we can see, the array version is much less reader friendly than the dataframe version, but it is much more efficient to use in a model.

In general, an array can be created from pretty much any other data structure, including a list, a tuple, or a dictionary.

In [10]:
df = pd.read_excel("../../data/sportsref_download.xlsx", header=1)
#df = pd.read_excel("../data/sportsref_download.xlsx", header=1)
df.head()

FileNotFoundError: [Errno 2] No such file or directory: '../../data/sportsref_download.xlsx'

In [None]:
target_column = np.array(df['PTS'])
label_column = np.array(df['Unnamed: 1'])
feature_set = np.array(df.drop(columns={'PTS', "Unnamed: 1", "Rk"}, axis=1))
print(target_column)
print(feature_set)

[12 10 10 10 10 10  9  9  9  8  8  8  7  7  7  7  7  6  6  6  6  5  5  5
  5  4  4  4  3  2  1  1  7]
[[ 2.750e+01  6.000e+00  6.000e+00  0.000e+00  0.000e+00  1.000e+00
   2.700e+01  1.200e+01  0.000e+00  0.000e+00  2.390e+00 -1.100e-01
   4.500e+00  2.000e+00  5.000e+00  2.500e+01  2.000e+01  4.000e+00
   2.700e+01  8.519e+01  1.000e+00  1.000e+00  1.080e+01  1.180e+01
   2.100e+02  1.290e+01  1.890e+02  9.370e-01  0.000e+00]
 [ 2.790e+01  5.000e+00  5.000e+00  0.000e+00  0.000e+00  1.000e+00
   2.200e+01  8.000e+00  0.000e+00  0.000e+00  2.130e+00 -6.700e-01
   4.400e+00  1.600e+00  6.000e+00  1.900e+01  3.158e+01  2.000e+00
   2.000e+01  9.000e+01  0.000e+00  0.000e+00  8.000e+00  8.000e+00
   1.750e+02  1.260e+01  1.490e+02  9.460e-01  0.000e+00]
 [ 2.940e+01  5.000e+00  5.000e+00  0.000e+00  0.000e+00  1.000e+00
   2.300e+01  1.300e+01  1.000e+00  0.000e+00  1.940e+00 -2.600e-01
   4.600e+00  2.600e+00  8.000e+00  1.700e+01  4.706e+01  2.000e+00
   1.700e+01  8.824e+01  1.000e+00

We can also combine arrays, or other data structures, together using several functions, including the zip command. This is a common operation when we are combining data from multiple sources, or when we are combining the target and features together to make a single array. In machine learning work, something like this is pretty common as we want to take an array of predictions and compare it to an array of correct values. Since our sizes are fixed, we know that the values in the array should line up perfectly. 

In [None]:
# combine arrays
combined = zip(target_column, label_column)
for item in combined:
    print(item)

(12, 'Florida Panthers')
(10, 'Carolina Hurricanes')
(10, 'Edmonton Oilers')
(10, 'St. Louis Blues')
(10, 'Minnesota Wild')
(10, 'Washington Capitals')
(9, 'Buffalo Sabres')
(9, 'Calgary Flames')
(9, 'New York Rangers')
(8, 'San Jose Sharks')
(8, 'Columbus Blue Jackets')
(8, 'Pittsburgh Penguins')
(7, 'New York Islanders')
(7, 'Vancouver Canucks')
(7, 'Detroit Red Wings')
(7, 'Winnipeg Jets')
(7, 'Tampa Bay Lightning')
(6, 'Nashville Predators')
(6, 'Boston Bruins')
(6, 'Dallas Stars')
(6, 'New Jersey Devils')
(5, 'Anaheim Ducks')
(5, 'Philadelphia Flyers')
(5, 'Toronto Maple Leafs')
(5, 'Seattle Kraken')
(4, 'Ottawa Senators')
(4, 'Colorado Avalanche')
(4, 'Vegas Golden Knights')
(3, 'Los Angeles Kings')
(2, 'Montreal Canadiens')
(1, 'Chicago Blackhawks')
(1, 'Arizona Coyotes')
(7, 'League Average')


## Multi-Dimensional Arrays

A multi-dimensional array is an array that has more than one dimension, and is also known as a matrix when there are two dimensions, and a tensor when there are three or more dimensions. Arrays that are in two or more dimensions are commonly used to create a data structure that mirrors a dataframe, or to encode the data for an image. 

![3d Array](../../images/3d_array.webp "3d Array")
![3d Array](../images/3d_array.webp "3d Array")

### Multi-Dimensional Array Usage

We can access items in a multi-dimensional array by specifying the index of each dimension, separated by commas. For example, if we have a two-dimensional array, we can access the item in the first row and second column by using the index [0, 1]. If we have a three-dimensional array, we can access the item in the first row, second column, and third dimension by using the index [0, 1, 2]. Row is the first digit, column the second, and "depth" or "dimension" the third.

In [None]:
array2d1 = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(array2d1)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [None]:
print(feature_set)

[[ 2.750e+01  6.000e+00  6.000e+00  0.000e+00  0.000e+00  1.000e+00
   2.700e+01  1.200e+01  0.000e+00  0.000e+00  2.390e+00 -1.100e-01
   4.500e+00  2.000e+00  5.000e+00  2.500e+01  2.000e+01  4.000e+00
   2.700e+01  8.519e+01  1.000e+00  1.000e+00  1.080e+01  1.180e+01
   2.100e+02  1.290e+01  1.890e+02  9.370e-01  0.000e+00]
 [ 2.790e+01  5.000e+00  5.000e+00  0.000e+00  0.000e+00  1.000e+00
   2.200e+01  8.000e+00  0.000e+00  0.000e+00  2.130e+00 -6.700e-01
   4.400e+00  1.600e+00  6.000e+00  1.900e+01  3.158e+01  2.000e+00
   2.000e+01  9.000e+01  0.000e+00  0.000e+00  8.000e+00  8.000e+00
   1.750e+02  1.260e+01  1.490e+02  9.460e-01  0.000e+00]
 [ 2.940e+01  5.000e+00  5.000e+00  0.000e+00  0.000e+00  1.000e+00
   2.300e+01  1.300e+01  1.000e+00  0.000e+00  1.940e+00 -2.600e-01
   4.600e+00  2.600e+00  8.000e+00  1.700e+01  4.706e+01  2.000e+00
   1.700e+01  8.824e+01  1.000e+00  0.000e+00  1.300e+01  8.600e+00
   1.680e+02  1.370e+01  1.880e+02  9.310e-01  0.000e+00]
 [ 2.880e+

### Slicing

Like lists, we can slice arrays by their index. 2D arrays can be sliced by row, column, or both. 3D arrays can be sliced by row, column, depth, or any combination of the three. We can address the items in the array in two ways:
<ul>
<li> By specifying the indicies separated by commas. </li>
<li> By using a new [] for each dimension. </li>
</ul>

These two are equivalent, we can use whatever is more convenient. Personally, I think the first is easier to manage when we have multiple dimensions or are doing elaborate slicing. 

We can also use the colon ":" in a dimension to get all of the items in that dimension. For example, if we have a 2D array, we can use [0, :] to get all of the items in the first row, and [:, 0] to get all of the items in the first column. We can also use the colon to get a range of items, such as [0:2, 0:2] to get the first two rows and columns. As well, the -1 (or other negative) index can be used to get the last item, or items counting in from the end.

![Numpy Indexing](../../images/numpy_indexing.png "Numpy Indexing")
![Numpy Indexing](../images/numpy_indexing.png "Numpy Indexing")

With the examples from above...

In [None]:
# Row 1
array2d1[1]

array([4, 5, 6])

In [None]:
# Row 1, all columns
array2d1[1][:]

array([4, 5, 6])

In [None]:
# The second item in the second column
array2d1[1][1]

5

In [None]:
# The second item in the second column
array2d1[:,1][1]

5

In [None]:
# Also the second item in the second column
array2d1[1:2,1:2][0][0]

5

In [None]:
# Also the second item in the second column
array2d1[-2][-2]

5

## Exercise

Do some slicing and dicing on the feature set array. 
<ol>
<li> Get the first 5 columns. </li>
<li> Get the first 5 columns for the bottom 10 teams. </li>
<li> Get every third team GF. </li>
</ol>

In [None]:
# First 5 columns
feature_set[:,0:5]

array([[27.5,  6. ,  6. ,  0. ,  0. ],
       [27.9,  5. ,  5. ,  0. ,  0. ],
       [29.4,  5. ,  5. ,  0. ,  0. ],
       [28.8,  5. ,  5. ,  0. ,  0. ],
       [29.4,  6. ,  5. ,  1. ,  0. ],
       [29.1,  6. ,  4. ,  0. ,  2. ],
       [28.3,  6. ,  4. ,  1. ,  1. ],
       [28. ,  6. ,  4. ,  1. ,  1. ],
       [26.2,  7. ,  4. ,  2. ,  1. ],
       [28.6,  6. ,  4. ,  2. ,  0. ],
       [26. ,  6. ,  4. ,  2. ,  0. ],
       [28.2,  6. ,  3. ,  1. ,  2. ],
       [29.5,  6. ,  3. ,  2. ,  1. ],
       [26.8,  7. ,  3. ,  3. ,  1. ],
       [26.9,  6. ,  3. ,  2. ,  1. ],
       [27.7,  6. ,  3. ,  2. ,  1. ],
       [28.9,  7. ,  3. ,  3. ,  1. ],
       [27.2,  7. ,  3. ,  4. ,  0. ],
       [28. ,  4. ,  3. ,  1. ,  0. ],
       [30.4,  6. ,  3. ,  3. ,  0. ],
       [26.2,  5. ,  3. ,  2. ,  0. ],
       [27.5,  7. ,  2. ,  4. ,  1. ],
       [28.5,  4. ,  2. ,  1. ,  1. ],
       [28.2,  7. ,  2. ,  4. ,  1. ],
       [28.5,  7. ,  2. ,  4. ,  1. ],
       [26.2,  6. ,  2. ,

In [None]:
# First 5 columns, bottom 10 rows
feature_set[-10:,0:5]

array([[28.2,  7. ,  2. ,  4. ,  1. ],
       [28.5,  7. ,  2. ,  4. ,  1. ],
       [26.2,  6. ,  2. ,  4. ,  0. ],
       [27.4,  6. ,  2. ,  4. ,  0. ],
       [28.2,  6. ,  2. ,  4. ,  0. ],
       [28.2,  6. ,  1. ,  4. ,  1. ],
       [28.3,  7. ,  1. ,  6. ,  0. ],
       [27.8,  6. ,  0. ,  5. ,  1. ],
       [28.4,  6. ,  0. ,  5. ,  1. ],
       [28. ,  6. ,  3. ,  2. ,  1. ]])

In [None]:
# Every third team points
# Every third row
# Column number 6
feature_set[::3,6]

array([27., 25., 18., 20., 15., 24., 14., 20., 18., 13., 12.])

#### Slice Steps

We can also use a third number in the slice to specify the step size. For example, if we have a 2D array, we can use [0, ::2] to get every other item in the first row, and [:, 0::2] to get every other item in the first column. We can also use the step size to reverse the order of the items, such as [::-1] to get all of the items in reverse order.

<b>Note:</b> unless there's some clear reason, we probably don't want to use a step size other than one or have too complex of array slices. It can just become hard to debug and understand if there are long shortcuts slicing up the data. 

In [None]:
array2d1[::-1]

array([[7, 8, 9],
       [4, 5, 6],
       [1, 2, 3]])

In [None]:
# Every other row, every other column
array2d1[0::2,0::2]

array([[1, 3],
       [7, 9]])

#### Shape and Reshape

The shape of an array is the number of elements in each dimension. For example, if we have a two-dimensional array that is 10 rows by 5 columns, the shape of the array is (10, 5). We can get the shape of an array by using the shape attribute, which is a tuple of the dimensions. We can also use the reshape function to change the shape of an array, which is useful when we want to convert a one-dimensional array into a multi-dimensional array, or when we want to change the shape of a multi-dimensional array.

Reshape() will force an array into a different shape, assuming the dimensions actually work out. This sounds odd, but we actually use it fairly regularly in machine learning. Two common places where we'll see it are:
<ul>
<li> We have an individual (usually target) column from a dataframe and we need to make it "tall" instead of "wide". </li>
<li> We have a multi-dimensional array, like an image, that we need to flatten into a single dimension. </li>
</ul>

These reshaping operations can be a little tricky to wrap your head around, but they do start to make sense with practice. We must make sure that the new shape has the same number of elements as the old shape, or we'll get an error and the operation will fail. 

In [None]:
print(array2d1.shape)
print(feature_set.shape)
print(target_column.shape)

(3, 3)
(33, 29)
(33,)


In [None]:
print(target_column.shape)
print(target_column)

(33,)
[12 10 10 10 10 10  9  9  9  8  8  8  7  7  7  7  7  6  6  6  6  5  5  5
  5  4  4  4  3  2  1  1  7]


When using the reshape function, one common thing to see is a -1 in one of the dimensions, that means "however many is needed". For example, if we have a 100 element array, and we want to reshape it, using reshape(20, -1) will set the output to be 20 rows tall, and determine that the array needs to be 5 columns wide to fit all 100 elements. This is something we use very frequently to prepare one column data - we tell it to be one column wide, and however many rows are needed to fit all the data.

In [None]:
print(target_column.reshape(-1, 1).shape)
print(target_column.reshape(-1, 1))

(33, 1)
[[12]
 [10]
 [10]
 [10]
 [10]
 [10]
 [ 9]
 [ 9]
 [ 9]
 [ 8]
 [ 8]
 [ 8]
 [ 7]
 [ 7]
 [ 7]
 [ 7]
 [ 7]
 [ 6]
 [ 6]
 [ 6]
 [ 6]
 [ 5]
 [ 5]
 [ 5]
 [ 5]
 [ 4]
 [ 4]
 [ 4]
 [ 3]
 [ 2]
 [ 1]
 [ 1]
 [ 7]]


### Shaping Arrays

When reshaping arrays we are obviously performing a transformation on the data, changing it from one format to another. The key thing to remember when doing so is that we normally don't really care about looking at the data after it has been reshaped, we are using it for some other function. For example, we might flatten a 2D array to make it into one row of data. As long as we are consistent in exactly how we reshape the data, we will get consistent results. 

![Array Reshape](../../images/array_reshape.jpeg "Array Reshape")
![Array Reshape](../images/array_reshape.jpeg "Array Reshape")

#### Mini-Exercise

Fill in the two functions below. Each one should be pretty simple, likely one line. Don't use the flatten() function, use reshape. 

In [None]:
def flattenArray(array):
    # Make the array into one row. 
    return array.reshape(1, -1)
    
def unflattenArray(array, num_col):
    # Make the array into a 2d array with num_col columns
    return array.reshape(-1, num_col)

These should all work if the functions above do the job. 

In [None]:
array_2D = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12], [13, 14, 15, 16]])
array_2D

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [None]:
tmp = flattenArray(array_2D)
tmp

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16]])

In [None]:
unflattenArray(tmp, 4)

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [None]:
tmp2 = flattenArray(unflattenArray(tmp, 8))
tmp2

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16]])

In [None]:
unflattenArray(tmp2, 4)

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

## Numpy Functions

In addition to the array, numpy provides a large number of functions that can be applied to arrays. These functions are largely mathematical in nature, and are used to perform mathematical operations on arrays. The numpy array API page is located here https://numpy.org/doc/1.26/reference/arrays.ndarray.html and is a good reference, and a good reference page to practice referring to for more information. It is fairly clear and easy to read, and has examples. We won't focus on these different functions in detail at all, if we need one we should be able to look up what the function is and the details on how it works. 

Some common numpy functions include:
<ul>
<li> np.zeros() - Creates an array of all zeros. </li>
<li> np.ones() - Creates an array of all ones. </li>
<li> np.full() - Creates an array of a specified size, filled with a specified value. </li>
<li> np.random.random() - Creates an array of a specified size, filled with random values between 0 and 1. </li>
<li> np.random.normal() - Creates an array of a specified size, filled with random values from a normal distribution with a specified mean and standard deviation. </li>
<li> np.random.shuffle() - Shuffles the values in an array. </li>
<li> np.random.seed() - Sets the random seed, so that random values are reproducible. </li>
<li> np.arange() - Creates an array of a specified size, filled with a sequence of numbers. </li>
<li> np.linspace() - Creates an array of a specified size, filled with a sequence of numbers that are evenly spaced between a specified minimum and maximum value. </li>
<li> np.reshape() - Reshapes an array into a specified shape. </li>
<li> np.concatenate() - Combines two arrays. </li>
<li> np.split() - Splits an array into two arrays. </li>
<li> np.transpose() - Transposes an array. </li>
<li> np.dot() - Performs matrix multiplication on two arrays. </li>
<li> np.sum() - Calculates the sum of an array. </li>
<li> np.mean() - Calculates the mean of an array. </li>
<li> np.std() - Calculates the standard deviation of an array. </li>
</ul>

Many of these functions are equivalent to ones we used on dataframes or lists to get statistics from the data. Others, such as the ones to initialize an array or do actions like transpose or dot product, are commonly used in the innards of machine learning models - we will later look at using them inside a homemade neural network. These functions tend to be more efficient than their equivalents, so if we already have data in an array, using the built-in functions is preferable. 

In [None]:
print(array_2D.sum())
print(array_2D[0].sum())
print(array_2D[:,0].sum())

136
10
28


#### Transpose

One of the numpy functions that is commonly useful, largely outside of manipulating arrays is the transpose function. This returns a new array that is the transpose of the original array, which means that the rows become columns and the columns become rows. In calculating things, this is useful if we want to do something like average a row, or (how we'll see it later) perform operations between two arrays. It is also useful with some of the information functions, such as describe, especially if we have a large number of columns. The transpose function has a shortcut of T, so we can use array.T to transpose an array.

For the describe example below, we can see that the results are now in a datasheet format, that is more useful if we want to do any kind of analysis or automated processing on the data.

In [None]:
print(array_2D)
print("\n\n")
print(array_2D.T)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]



[[ 1  5  9 13]
 [ 2  6 10 14]
 [ 3  7 11 15]
 [ 4  8 12 16]]


In [None]:
df.describe(include="all").T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Rk,32.0,,,,16.5,9.380832,1.0,8.75,16.5,24.25,32.0
Unnamed: 1,33.0,33.0,Florida Panthers,1.0,,,,,,,
AvAge,33.0,,,,28.006061,1.030152,26.0,27.5,28.2,28.5,30.4
GP,33.0,,,,6.0,0.790569,4.0,6.0,6.0,6.0,7.0
W,33.0,,,,3.0,1.414214,0.0,2.0,3.0,4.0,6.0
L,33.0,,,,2.393939,1.675921,0.0,1.0,2.0,4.0,6.0
OL,33.0,,,,0.606061,0.609272,0.0,0.0,1.0,1.0,2.0
PTS,33.0,,,,6.606061,2.737921,1.0,5.0,7.0,9.0,12.0
PTS%,33.0,,,,0.564879,0.256842,0.083,0.357,0.583,0.75,1.0
GF,33.0,,,,18.060606,4.554801,11.0,14.0,18.0,21.0,27.0


## Exercise - Array All Day

Complete the Calendar class below. This class is meant to represent a calendar that holds the number of meetings that you have on any given day. 

The calendar itself should be stored in a 2D array, where each row represents a week, and each column represents a day of the week. The calendar should be initialized to have 0 meetings on every day. This means that the calendar should start life as a 2D array of 0s, with 6 rows and 7 columns. Every time a meeting is added, the value in the array should increase by 1. The purpose of this object is to act as a simple counter for the number of meetings in a day, and provide for simple related functions.

#### Output

Printing a calendar month should look (some version of) nice. Mine looks like this, with the number of meetings in each spot (I just played around with the number of spaces before/after the values, it only took a few trials. Don't spend forever making sure it lines up perfectly):

![Array Calendar](../../images/array_calendar.png "Array Calendar")
![Array Calendar](../images/array_calendar.png "Array Calendar")

In [16]:

class MyCalendarMonth():
    
    days_week = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
    months_31 = ["January", "March", "May", "July", "August", "October", "December"]
    months_30 = ["April", "June", "September", "November"]
    months_28 = ["February"]
    months = ["January", "February", "March", "April", "May", "June", "July", "August", "September","October", "November", "December"]

    def __init__(self, month="January", start_day="Monday"):
        self._days = np.zeros((6,7))
        self._month = month
        self._start_day = start_day
        self._first_day_index = self.setFirstDay(start_day)
        self.month_length = self.setMonthLength(month)
        self.setNone()
        #print(self._first_day_index)
    
    def setMonthLength(self, month):
        if month in MyCalendarMonth.months_31:
            return 31
        elif month in MyCalendarMonth.months_30:
            return 30
        elif month in MyCalendarMonth.months_28:
            return 28
        else:
            raise ValueError("Invalid month")
    
    def setFirstDay(self, day):
        value = [-1,-1]
        offset = MyCalendarMonth.days_week.index(day)
        value = [0, offset]
        return value
    
    def addMeeting(self, date):
        self.updateCal(date, 1)
    def deleteMeeting(self, date):
        self.updateCal(date, -1)
    
    def getCalendar(self):
        return self._days
    def getWeek(self, week):
        return self._days[week]
    def printWeek(self, week):
        temp_week = self.getWeek(week)
        for ind, day in enumerate(temp_week):
            print(MyCalendarMonth.days_week[ind], ":", int(day))
    
    def getPosition(self, date):
        #total_days = date+self._first_day_index[1]
        row = (date + self._first_day_index[0]) // 7
        col = (date + self._first_day_index[1]) % 7
        pos = (row, col)
        #print(date, self._first_day_index, pos, total_days)
        return pos
    def updateCal(self, date, val=1):
        print(self._first_day_index, date, self._start_day, val)
        spot_row, spot_col = self.getPosition(date)
        tmp = self._days[spot_row][spot_col]
        tmp += val
        if tmp < 0:
            tmp = 0
        self._days[spot_row][spot_col] = tmp
        
    def setNone(self):
        # set the days before the start and after the end to None
        wk1_limit = MyCalendarMonth.days_week.index(self._start_day)
        wk6_start = self.getPosition(self.month_length)[1]
        #print(wk1_limit, wk6_start, self._first_day_index)
        for i in range(0, wk1_limit):
            self._days[0][i] = None
        for i in range(wk6_start, 7):
            #print(i)
            self._days[5][i] = None
    
    def __str__(self):
        return_string = self._month + "\n"
        return_string += " ".join("S M T W T F S") + "\n"
        for ind, week in enumerate(self._days):
            #print(ind, week)
            if ind == 0:
                start = self._first_day_index[1]
                for i in range(0, start):
                    return_string += "    "
                return_string += " ".join([str(int(week[i]))+"  " for i in range(start, 7)])+"\n"
            elif ind == 5:
                end = self.getPosition(self.month_length)[1]
                return_string += " ".join([str(int(week[i]))+"  " for i in range(0, end)])+"\n"
            else:
                return_string += " ".join([str(int(day))+"  " if day != None else " " for day in week]) + "\n"
        return return_string

    # This will add some meetings randomly, so we can test. 
    def fakeMeetings(self):
        for i in range(0, 42):
            self.updateCal(i, np.random.randint(0, 3))

##### Testing

Let's see if this works... If you used the same method names and parameters as I did, this should work. 

In [17]:
a = MyCalendarMonth(start_day="Friday")

5 1 [0, 5]


In [18]:
a.getCalendar()


array([[nan, nan, nan, nan, nan,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0., nan, nan, nan, nan, nan, nan]])

In [19]:
# Fill with dummy data
a.fakeMeetings()
a.getCalendar()


[0, 5] 0 Friday 0
[0, 5] 1 Friday 2
[0, 5] 2 Friday 1
[0, 5] 3 Friday 2
[0, 5] 4 Friday 2
[0, 5] 5 Friday 0
[0, 5] 6 Friday 0
[0, 5] 7 Friday 0
[0, 5] 8 Friday 1
[0, 5] 9 Friday 2
[0, 5] 10 Friday 0
[0, 5] 11 Friday 0
[0, 5] 12 Friday 2
[0, 5] 13 Friday 1
[0, 5] 14 Friday 1
[0, 5] 15 Friday 2
[0, 5] 16 Friday 0
[0, 5] 17 Friday 0
[0, 5] 18 Friday 0
[0, 5] 19 Friday 2
[0, 5] 20 Friday 0
[0, 5] 21 Friday 1
[0, 5] 22 Friday 2
[0, 5] 23 Friday 2
[0, 5] 24 Friday 1
[0, 5] 25 Friday 0
[0, 5] 26 Friday 1
[0, 5] 27 Friday 1
[0, 5] 28 Friday 1
[0, 5] 29 Friday 0
[0, 5] 30 Friday 2
[0, 5] 31 Friday 1
[0, 5] 32 Friday 1
[0, 5] 33 Friday 0
[0, 5] 34 Friday 2
[0, 5] 35 Friday 2
[0, 5] 36 Friday 2
[0, 5] 37 Friday 0
[0, 5] 38 Friday 1
[0, 5] 39 Friday 0
[0, 5] 40 Friday 1
[0, 5] 41 Friday 2


array([[nan, nan, nan, nan, nan,  0.,  2.],
       [ 2.,  0.,  0.,  2.,  1.,  0.,  1.],
       [ 0.,  0.,  0.,  2.,  0.,  1.,  2.],
       [ 2.,  1.,  0.,  1.,  1.,  1.,  2.],
       [ 2.,  1.,  1.,  0.,  2.,  1.,  0.],
       [ 0., nan, nan, nan, nan, nan, nan]])

In [20]:
print(a)
print(MyCalendarMonth("July", "Tuesday"))

January
S   M   T   W   T   F   S
                    0   2  
2   0   0   2   1   0   1  
0   0   0   2   0   1   2  
2   1   0   1   1   1   2  
2   1   1   0   2   1   0  
0  

2 5 [0, 2]
July
S   M   T   W   T   F   S
        0   0   0   0   0  
0   0   0   0   0   0   0  
0   0   0   0   0   0   0  
0   0   0   0   0   0   0  
0   0   0   0   0   0   0  
0   0   0   0   0  

