## Exercises

In your `numpy-pandas-visualization-exercises` repo, create a file named `numpy_exercises.py` for this exercise.

Use the following code for the questions below:

```
import numpy as np

a = np.array([4, 10, 12, 23, -2, -1, 0, 0, 0, -6, 3, -7])
```

1. How many negative numbers are there?

2. How many positive numbers are there?

3. How many even positive numbers are there?

4. If you were to add 3 to each data point, how many positive numbers would there be?

5. If you squared each number, what would the new mean and standard deviation be?

6. A common statistical operation on a dataset is **centering**: adjusting the data so that its mean is 0. This is done by subtracting the mean from each data point. Center the data set. See [this link](https://www.theanalysisfactor.com/centering-and-standardizing-predictors/) for more on centering.

7. Calculate the z-score for each data point. Recall that the z-score is given by:

    $$
    Z = \frac{x - \mu}{\sigma}
    $$
    
    
8. Copy the setup and exercise directions from [More Numpy Practice](https://gist.github.com/ryanorsinger/c4cf5a64ec33c014ff2e56951bc8a42d) into your `numpy_exercises.py` and add your solutions.

In [1]:
import numpy as np

In [2]:
a = np.array([4, 10, 12, 23, -2, -1, 0, 0, 0, -6, 3, -7])
a

array([ 4, 10, 12, 23, -2, -1,  0,  0,  0, -6,  3, -7])

In [3]:
# How many negative numbers are there?
# a[a < 0] gives the negatives
print(a[a < 0])
len(a[a < 0])

[-2 -1 -6 -7]


4

In [4]:
# How many positive numbers are there?
# .size tells us how many elements an array has.
print(a[a > 0])
print(a[a > 0].size)

[ 4 10 12 23  3]
5


In [5]:
# How would we find the even positive numbers without using &? Filter in two steps.
positives = a[a > 0]
evens = positives[positives % 2 == 0]
evens

array([ 4, 10, 12])

In [6]:
# How many even positive numbers are there?
print(a[(a > 0) & (a % 2 == 0)])
print(a[(a > 0) & (a % 2 == 0)].shape)

[ 4 10 12]
(3,)


### Aside on `.size` and `.shape` vs `len`

In [7]:
x = np.array([[1, 2, 3], [4, 5, 6]])
print(".size returns the total number of elements in the array, regardless of any nesting")
x.size

.size returns the total number of elements in the array, regardless of any nesting


6

In [8]:
x

array([[1, 2, 3],
       [4, 5, 6]])

In [9]:
print(".shape returns the dimensions of the array")
x.shape

.shape returns the dimensions of the array


(2, 3)

In [10]:
print("The Python len function returns the number of elements contained only in the outer-most array")
len(x)

The Python len function returns the number of elements contained only in the outer-most array


2
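The difference between the three is easier to see on an array with more than two dimensions. The sketch below uses a hypothetical 3-D array `y` (not part of the exercises) to compare them side by side.

```python
import numpy as np

# Hypothetical 3-D array: 2 blocks, each with 3 rows and 4 columns
y = np.zeros((2, 3, 4))

print(y.size)   # total number of elements (2 * 3 * 4)
print(y.shape)  # length along each axis
print(len(y))   # length of the first axis only, same as y.shape[0]
print(y.ndim)   # number of axes
```

For a 1-D array such as `a`, `len(a)`, `a.size`, and `a.shape[0]` all agree, which is why any of them works for counting filtered values.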

In [11]:
# If you were to add 3 to each data point, how many positive numbers would there be?
add_three = a + 3
print(add_three[add_three > 0])
add_three[add_three > 0].size

[ 7 13 15 26  1  2  3  3  3  6]


10

In [12]:
# If you squared each number, what would the new mean and standard deviation be?
squares = a**2

{
    "mean": squares.mean(),
    "standard_deviation": squares.std()
}

{'mean': 74.0, 'standard_deviation': 144.0243035046516}

In [13]:
# Center the array
centered = a - a.mean()
centered

array([  1.,   7.,   9.,  20.,  -5.,  -4.,  -3.,  -3.,  -3.,  -9.,   0.,
       -10.])

In [14]:
centered.mean()

0.0

In [15]:
# Calculate the z-scores for each datapoint
zscores = (a - a.mean()) / a.std()
zscores

array([ 0.12403473,  0.86824314,  1.11631261,  2.48069469, -0.62017367,
       -0.49613894, -0.3721042 , -0.3721042 , -0.3721042 , -1.11631261,
        0.        , -1.24034735])

In [16]:
a.std()

8.06225774829855

In [17]:
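# Sanity check: the centered value 20 divided by its z-score (about 2.48)
# should come out close to a.std()
20 / 2.48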

8.064516129032258
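A more general check, sketched here under the assumption that the population standard deviation is intended (`a.std()` uses `ddof=0` by default): z-scores should have a mean of 0 and a standard deviation of 1, up to floating-point error.

```python
import numpy as np

a = np.array([4, 10, 12, 23, -2, -1, 0, 0, 0, -6, 3, -7])
zscores = (a - a.mean()) / a.std()

# z-scores should have mean 0 and standard deviation 1 (within float tolerance)
print(np.isclose(zscores.mean(), 0))
print(np.isclose(zscores.std(), 1))

# Multiplying each z-score by the standard deviation recovers the centered data
print(np.allclose(zscores * a.std(), a - a.mean()))
```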

## Exercises from the Gist 
https://gist.github.com/ryanorsinger/c4cf5a64ec33c014ff2e56951bc8a42d

In [18]:
## Setup 1
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Use python's built in functionality/operators to determine the following:
# Exercise 1 - Make a variable called sum_of_a to hold the sum of all the numbers in above list

sum_of_a = sum(a)
sum_of_a

55

In [19]:
# Exercise 2 - Make a variable named min_of_a to hold the minimum of all the numbers in the above list
min_of_a = min(a)
min_of_a

1

In [20]:
# Exercise 3 - Make a variable named max_of_a to hold the max number of all the numbers in the above list
max_of_a = max(a)
max_of_a

10

In [21]:
# Exercise 4 - Make a variable named mean_of_a to hold the average of all the numbers in the above list
mean_of_a = sum(a) / len(a)
mean_of_a

5.5

In [22]:
# Exercise 5 - Make a variable named product_of_a to hold the product of multiplying all the numbers in the above list together
product_of_a = 1
for n in a:
    product_of_a *= n

product_of_a

3628800

In [23]:
# Exercise 6 - Make a variable named squares_of_a. It should hold each number in a squared like [1, 4, 9, 16, 25...]
squares_of_a = [n ** 2 for n in a]
squares_of_a

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [24]:
# Exercise 7 - Make a variable named odds_in_a. It should hold only the odd numbers
odds_in_a = [n for n in a if n % 2 != 0]
odds_in_a

[1, 3, 5, 7, 9]

In [25]:
# Exercise 8 - Make a variable named evens_in_a. It should hold only the evens.
evens_in_a = [n for n in a if n % 2 == 0]
evens_in_a

[2, 4, 6, 8, 10]

In [26]:
## Setup 2: Consider what it would take to find the sum, min, max, average, product, and list of squares for this list of two lists.
b = [
    [3, 4, 5],
    [6, 7, 8]
]

# base python approach to sum
# sum(b) throws an error since we have a list of lists
# we could do two loops.. or we could flatten the list of lists into one list
flattened_list = []
for numbers in b:
    for n in numbers:
        flattened_list.append(n)
flattened_list

[3, 4, 5, 6, 7, 8]

In [27]:
output = {
    "sum": sum(flattened_list),
    "min": min(flattened_list),
    "max": max(flattened_list),
    "avg": sum(flattened_list) / len(flattened_list),
    "squares": [n**2 for n in flattened_list]
}
output

{'sum': 33, 'min': 3, 'max': 8, 'avg': 5.5, 'squares': [9, 16, 25, 36, 49, 64]}

In [28]:
product = 1
for n in flattened_list:
    product *= n

output["product"] = product
output

{'sum': 33,
 'min': 3,
 'max': 8,
 'avg': 5.5,
 'squares': [9, 16, 25, 36, 49, 64],
 'product': 20160}
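The nested loop and the running product can also be written with the standard library. This is only an alternative sketch, not the approach the exercise asks for; it assumes Python 3.8 or later, where `math.prod` is available.

```python
from itertools import chain
from math import prod  # added in Python 3.8

b = [
    [3, 4, 5],
    [6, 7, 8]
]

# chain.from_iterable walks each inner list in turn, flattening one level
flattened = list(chain.from_iterable(b))

# prod multiplies all the values together, like sum does for addition
product = prod(flattened)

print(flattened)
print(product)
```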

In [29]:
# Let's do things the numpy way!
b = [
    [3, 4, 5],
    [6, 7, 8]
]

b = np.array(b)
b

array([[3, 4, 5],
       [6, 7, 8]])

In [30]:
numpy_way = {
    "sum": b.sum(),
    "min": b.min(),
    "max": b.max(),
    "mean": b.mean(),
    "squares": b**2,
    "product": b.prod()
}
numpy_way

{'sum': 33,
 'min': 3,
 'max': 8,
 'mean': 5.5,
 'squares': array([[ 9, 16, 25],
        [36, 49, 64]]),
 'product': 20160}

In [31]:
# Exercise 9 - print out the shape of the array b.
b.shape

(2, 3)

In [32]:
# Exercise 10 - transpose the array b.
b.T

array([[3, 6],
       [4, 7],
       [5, 8]])

In [33]:
# Exercise 11 - reshape the array b to be a single list of 6 numbers. (1 x 6)
# .flatten() returns a 1-D copy with shape (6,), which is not quite 1 x 6
b.flatten()

array([3, 4, 5, 6, 7, 8])

In [34]:
# reshape the array b to be a single list of 6 numbers. (1 x 6)
b.reshape(1, 6)

array([[3, 4, 5, 6, 7, 8]])

In [35]:
# reshape the array b into a single column of 6 numbers. (6 x 1)
b.reshape(6, 1)

array([[3],
       [4],
       [5],
       [6],
       [7],
       [8]])
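The three results above have different shapes even though they hold the same six numbers. The sketch below makes that explicit and uses `-1` as a dimension, which asks NumPy to infer that length from the total number of elements.

```python
import numpy as np

b = np.array([[3, 4, 5], [6, 7, 8]])

flat = b.flatten()      # 1-D copy, shape (6,)
row = b.reshape(1, -1)  # 2-D, one row; -1 is inferred to be 6
col = b.reshape(-1, 1)  # 2-D, one column; -1 is inferred to be 6

print(flat.shape, row.shape, col.shape)
```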

In [36]:
## Setup 3
c = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

c = np.array(c)
c

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [37]:
{
    "sum": c.sum(),
    "min": c.min(),
    "max": c.max(),
    "mean": c.mean(),
    "product": c.prod()
}

{'sum': 45, 'min': 1, 'max': 9, 'mean': 5.0, 'product': 362880}

In [38]:
c.std()

2.581988897471611

In [39]:
c.var()

6.666666666666667

In [40]:
c.shape

(3, 3)

In [41]:
np.transpose(c)

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [42]:
c.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [43]:
np.dot(c, c)

array([[ 30,  36,  42],
       [ 66,  81,  96],
       [102, 126, 150]])

In [44]:
c * c.T

array([[ 1,  8, 21],
       [ 8, 25, 48],
       [21, 48, 81]])

In [45]:
(c * c.T).sum()

261

In [46]:
(c * c.T).prod()

131681894400
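Note that `np.dot(c, c)` and `c * c.T` are different operations: the first is matrix multiplication, the second multiplies elements position by position. A short sketch of the distinction, using the `@` operator as an equivalent spelling of matrix multiplication for 2-D arrays:

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

matrix_product = c @ c  # matrix multiplication, same result as np.dot(c, c) here
elementwise = c * c     # element-wise multiplication, same result as c ** 2

print(matrix_product)
print(elementwise)
```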

In [47]:

## Setup 4
d = [
    [90, 30, 45, 0, 120, 180],
    [45, -90, -30, 270, 90, 0],
    [60, 45, -45, 90, -45, 180]
]

d = np.array(d)

In [48]:
# NumPy's trig functions expect radians, so the values in d are treated as radians here
np.sin(d)

array([[ 0.89399666, -0.98803162,  0.85090352,  0.        ,  0.58061118,
        -0.80115264],
       [ 0.85090352, -0.89399666,  0.98803162, -0.17604595,  0.89399666,
         0.        ],
       [-0.30481062,  0.85090352, -0.85090352,  0.89399666, -0.85090352,
        -0.80115264]])

In [49]:
np.cos(d)

array([[-0.44807362,  0.15425145,  0.52532199,  1.        ,  0.81418097,
        -0.59846007],
       [ 0.52532199, -0.44807362,  0.15425145,  0.98438195, -0.44807362,
         1.        ],
       [-0.95241298,  0.52532199,  0.52532199, -0.44807362,  0.52532199,
        -0.59846007]])

In [50]:
np.tan(d)

array([[-1.99520041, -6.4053312 ,  1.61977519,  0.        ,  0.71312301,
         1.33869021],
       [ 1.61977519,  1.99520041,  6.4053312 , -0.17883906, -1.99520041,
         0.        ],
       [ 0.32004039,  1.61977519, -1.61977519, -1.99520041, -1.61977519,
         1.33869021]])
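The values in `d` look like angles in degrees, although the exercise does not say so. If that assumption holds, they would need converting to radians first, since `np.sin`, `np.cos`, and `np.tan` all work in radians. A minimal sketch:

```python
import numpy as np

d = np.array([
    [90, 30, 45, 0, 120, 180],
    [45, -90, -30, 270, 90, 0],
    [60, 45, -45, 90, -45, 180]
])

# Assumption: the numbers in d are degrees; convert before applying sin
radians = np.deg2rad(d)
sines = np.sin(radians)

# Rounded for readability; tiny floating-point residue is expected near 0
print(np.round(sines, 3))
```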

In [51]:
d[d < 0]

array([-90, -30, -45, -45])

In [52]:
d[d > 0]

array([ 90,  30,  45, 120, 180,  45, 270,  90,  60,  45,  90, 180])

In [53]:
np.unique(d)

array([-90, -45, -30,   0,  30,  45,  60,  90, 120, 180, 270])

In [54]:
np.unique(d).size

11

In [55]:
d.shape

(3, 6)

In [56]:
d.T.shape

(6, 3)

In [57]:
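# np.function_name
# Note: np.product is an alias of np.prod that was deprecated in NumPy 1.25
# and removed in NumPy 2.0; prefer np.prod on current versions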
np.product(d)

0

In [58]:
# object.method()
d.prod()

0

In [59]:
np.prod(d)

0

In [60]:
d.reshape(9, 2)

array([[ 90,  30],
       [ 45,   0],
       [120, 180],
       [ 45, -90],
       [-30, 270],
       [ 90,   0],
       [ 60,  45],
       [-45,  90],
       [-45, 180]])

In [61]:
x = np.array([1, 2, 3])
y = np.array([0, 1, 2])

np.dot(x, y)

8

#### Dot Product AKA inner product
- Sum of the products of elements at corresponding indices
- Algorithm: multiply values at each corresponding index, then take the sum of that output

`a = [1, 2, 3]`

`b = [0, 1, 2]`

`[1*0, 2*1, 3*2]`

`[0, 2, 6]`

`sum([0, 2, 6])`

`8`
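The steps above can be sketched in plain Python and compared against `np.dot`. The helper name `dot` is made up for this illustration, and it assumes both inputs have the same length.

```python
import numpy as np


def dot(xs, ys):
    """Sum of pairwise products; assumes xs and ys have the same length."""
    return sum(x * y for x, y in zip(xs, ys))


x = [1, 2, 3]
y = [0, 1, 2]

print(dot(x, y))
print(dot(x, y) == np.dot(x, y))
```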

In [62]:
# Practical example of a dot product is a weighted average
# [attendance, exercises, projects, quiz]
weights = [.1, .2, .4, .3]

# actual grades for someone
grades = [100, 99, 72, 80]

weights = np.array(weights)
grades = np.array(grades)

np.dot(grades, weights)

82.6

In [63]:
grades.mean()

87.75
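The same weighted average can be computed with `np.average` and its `weights` argument. `np.average` divides by the sum of the weights, so it matches the dot product here only because these weights add up to 1.

```python
import numpy as np

# [attendance, exercises, projects, quiz]
weights = np.array([.1, .2, .4, .3])
grades = np.array([100, 99, 72, 80])

weighted = np.average(grades, weights=weights)

print(weighted)
print(np.isclose(weighted, np.dot(grades, weights)))
```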

## Numpy Key Takeaways (transfer to Pandas seamlessly)
- Vectorized operations with comparison operators `==`, `>`, `>=`, etc.
- Vectorized operations for mathematical operators like `*`, `+`, `/`, etc.
- We use arrays of booleans (usually produced by comparison operators) to filter our arrays of data
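
As a compact recap of these takeaways, the sketch below reuses the array from the first exercise. The names `mask`, `negatives`, and `count` are chosen only for illustration.

```python
import numpy as np

a = np.array([4, 10, 12, 23, -2, -1, 0, 0, 0, -6, 3, -7])

mask = a < 0         # vectorized comparison produces an array of booleans
negatives = a[mask]  # the boolean array filters the data
count = mask.sum()   # True counts as 1, so summing the mask counts the matches

print(mask)
print(negatives)
print(count)
```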