### Package Imports

In [3]:
import numpy as np

# Assignment 1: Array Basics

Hi there,

Can you import NumPy and convert the following list comprehension (I just learned about comprehensions in an awesome course by Maven) into an array?

Once you've done that, report the following about the array:
* The number of dimensions 
* The shape
* The number of elements in the array
* The type of data contained inside

In [4]:
my_list = [x * 10 for x in range(1, 11)]
my_list

[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

In [5]:
my_list = np.array(my_list)

In [6]:
my_list.ndim

1

In [7]:
my_list.shape

(10,)

In [8]:
my_list.size

10

In [9]:
my_list.dtype

dtype('int64')
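The four attributes can also be reported together, as in the minimal sketch below. The name `my_array` is an assumption, chosen so the original list is not overwritten. The integer width of the dtype depends on the platform, so it is printed rather than assumed.

```python
import numpy as np

my_list = [x * 10 for x in range(1, 11)]
my_array = np.array(my_list)  # hypothetical name; keeps my_list as a plain list

print(my_array.ndim)   # number of dimensions
print(my_array.shape)  # length along each dimension
print(my_array.size)   # total number of elements
print(my_array.dtype)  # integer dtype; exact width depends on the platform
```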

# Assignment 2: Array Creation

Thanks for your help with the first piece - I'm starting to understand some of the key differences between base Python data types and NumPy arrays. 

Does NumPy have anything like the range() function from base Python?

If so:
* Create the same array from assignment 1 using a NumPy function.
* Make it 5 rows and 2 columns.
* It's OK if the datatype is float or int.

In [10]:
np.linspace(10, 100, 10, dtype="int").reshape(5,2)

array([[ 10,  20],
       [ 30,  40],
       [ 50,  60],
       [ 70,  80],
       [ 90, 100]])

In [11]:
np.arange(10, 101, 10, dtype="int").reshape(5,2)

array([[ 10,  20],
       [ 30,  40],
       [ 50,  60],
       [ 70,  80],
       [ 90, 100]])
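The two cells above differ mainly in how the end point is treated. The sketch below leaves out the `dtype` argument to show each function's default; the variable names are assumptions for illustration.

```python
import numpy as np

# arange: start, stop (exclusive), step. This is the closest match to range().
from_arange = np.arange(10, 101, 10).reshape(5, 2)

# linspace: start, stop (inclusive), number of points. Returns floats by default.
from_linspace = np.linspace(10, 100, 10).reshape(5, 2)

print(from_arange.dtype, from_linspace.dtype)
print(np.allclose(from_arange, from_linspace))
```

`np.arange` is usually the simpler choice when the step is known, and `np.linspace` when the number of points is known.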

Looking good so far! One of our data scientists asked about random number generation in NumPy.

Can you create a 3x3 array of random numbers between 0 and 1? Use a random state of 2022.

Store the random array in a variable called `random_array`.

In [12]:
from numpy.random import default_rng

rng = default_rng(2022)

random_array = rng.random(9).reshape(3,3)

random_array


array([[0.24742606, 0.09299006, 0.61176337],
       [0.06066207, 0.66103343, 0.75515778],
       [0.1108689 , 0.04305584, 0.41441747]])
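As an alternative sketch, `Generator.random` also accepts a shape tuple, which avoids the separate `reshape` call. The names `rng_a`, `rng_b`, `reshaped`, and `direct` are assumptions for illustration.

```python
import numpy as np

rng_a = np.random.default_rng(2022)
reshaped = rng_a.random(9).reshape(3, 3)  # approach used in the cell above

rng_b = np.random.default_rng(2022)       # fresh generator with the same seed
direct = rng_b.random((3, 3))             # shape passed directly as a tuple

# Both approaches should draw the same values in the same order
print(np.array_equal(reshaped, direct))
```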

# Assignment 3: Accessing Array Data


Slice and index the `random_array` we created in the previous exercise. Perform the following:

* Grab the first two 'rows' of the array
* Grab the entire first column
* Finally, grab the second element of the third row.

Thanks!


In [13]:
random_array[:2, : ]

array([[0.24742606, 0.09299006, 0.61176337],
       [0.06066207, 0.66103343, 0.75515778]])

In [14]:
random_array[:, :1]

array([[0.24742606],
       [0.06066207],
       [0.1108689 ]])

In [15]:
random_array[2,1]

0.04305584439252108
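One detail worth noting: slicing with `:1` keeps the column as a 2-D array, while indexing with `0` returns a 1-D array. The sketch below uses a small hypothetical array named `grid` instead of `random_array` so the values are easy to follow.

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)  # hypothetical stand-in: values 1..9

first_two_rows = grid[:2, :]  # rows 0 and 1, all columns
col_2d = grid[:, :1]          # a slice keeps the axis, shape (3, 1)
col_1d = grid[:, 0]           # an integer index drops the axis, shape (3,)
element = grid[2, 1]          # third row, second column

print(first_two_rows.shape, col_2d.shape, col_1d.shape, element)
```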

# Assignment 4: Arithmetic Operations

The creativity of our marketing team knows no bounds!

They've asked us to come up with a simple algorithm to provide a random discount to our list of prices below. 

Before we do that, 

* Add a 5 dollar shipping fee to each price. Call this array `total`.

Once we have that, we want to take the values in the `random_array` created in assignment 2 and apply them as discounts to the 6 prices.

* Grab the first 6 numbers from `random_array` and reshape them to one dimension. Call this `discount_pct`.
* Subtract `discount_pct` FROM 1, store this in `pct_owed`.
* Multiply `pct_owed` by `total` to get the final amount owed.

In [16]:
prices = np.array([5.99, 6.99, 22.49, 99.99, 4.99, 49.99])
total = prices + 5
total

array([ 10.99,  11.99,  27.49, 104.99,   9.99,  54.99])

In [17]:
discount_pct=random_array[:2,:].reshape(6)

In [18]:
pct_owed = 1 - discount_pct

In [19]:
(total*pct_owed).round(2)

array([ 8.27, 10.88, 10.67, 98.62,  3.39, 13.46])
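The steps above rely on broadcasting a scalar across an array and on element-wise arithmetic between arrays of equal shape. Here is a self-contained sketch of the same pipeline; the discount values are hypothetical and stand in for the first six numbers of `random_array`.

```python
import numpy as np

prices = np.array([5.99, 6.99, 22.49, 99.99, 4.99, 49.99])
total = prices + 5  # the scalar 5 is broadcast to every element

# Hypothetical discounts, used here in place of the random values
discount_pct = np.array([0.10, 0.20, 0.30, 0.00, 0.25, 0.40])
pct_owed = 1 - discount_pct  # share of the total each customer still pays

amount_owed = (total * pct_owed).round(2)  # element-wise product, rounded to cents
print(amount_owed)
```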

# Assignment 5: Filtering Arrays

Filter the product array to only include those with prices greater than 25.

Modify your logic to include cola, despite it not having a price greater than 25. 
Store the elements returned in an array called `fancy_feast_special`.

Next, create a shipping cost array where the cost is 0 if price is greater than 20, and 5 if not. 

In [20]:
products = np.array(
    ["salad", "bread", "mustard", "rare tomato", "cola", "gourmet ice cream"]
)

products

array(['salad', 'bread', 'mustard', 'rare tomato', 'cola',
       'gourmet ice cream'], dtype='<U17')

In [25]:
fancy_feast_special = products[(prices > 25) | (products == 'cola')]

In [27]:
shipping_cost = np.where((prices > 20), 0, 5)

In [28]:
shipping_cost

array([5, 5, 0, 0, 5, 0])
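To make the filter easier to inspect, the two boolean masks can be built separately and then combined, as sketched below. The mask names `over_25` and `is_cola` are assumptions. When the comparisons are written inline, each one needs its own parentheses because `|` binds more tightly than `>` and `==`.

```python
import numpy as np

prices = np.array([5.99, 6.99, 22.49, 99.99, 4.99, 49.99])
products = np.array(
    ["salad", "bread", "mustard", "rare tomato", "cola", "gourmet ice cream"]
)

over_25 = prices > 25         # True where the price exceeds 25
is_cola = products == "cola"  # True only for the cola entry

# Element-wise OR keeps a product if either mask is True
fancy_feast_special = products[over_25 | is_cola]
print(fancy_feast_special)

# np.where chooses 0 where the condition holds and 5 elsewhere
shipping_cost = np.where(prices > 20, 0, 5)
print(shipping_cost)
```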

# Assignment 6: Aggregating and Sorting Arrays

First, grab the top 3 highest priced items in our list. 

Then, calculate the mean, min, max, and median of the top three prices.

Finally, calculate the number of unique price tiers in our `price_tiers` array.

In [None]:
prices = np.array([5.99, 6.99, 22.49, 99.99, 4.99, 49.99])

prices

In [45]:
prices.sort()

In [46]:
prices[-3:]

array([22.49, 49.99, 99.99])

In [48]:
prices.mean()
prices.min()
prices.max()

99.99
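Note that the cell above displays only its last expression (the max), runs on the full `prices` array, and does not compute the median. A sketch that restricts the statistics to the top three prices and prints all four follows; the names `top_three` and `stats` are assumptions. `np.sort` returns a sorted copy, so `prices` keeps its original order.

```python
import numpy as np

prices = np.array([5.99, 6.99, 22.49, 99.99, 4.99, 49.99])

top_three = np.sort(prices)[-3:]  # sorted copy; the last three are the highest

stats = {
    "mean": top_three.mean(),
    "min": top_three.min(),
    "max": top_three.max(),
    "median": np.median(top_three),  # median is a function, not an array method
}
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```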

In [50]:
price_tiers = np.array(["budget", "budget", "mid-tier", "luxury", "mid-tier", "luxury"])

In [51]:
np.unique(price_tiers)

array(['budget', 'luxury', 'mid-tier'], dtype='<U8')
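The task asks for the number of unique tiers, so the length of the `np.unique` result is what answers it. A short sketch follows; `return_counts=True` is optional and shown only to illustrate how often each tier appears.

```python
import numpy as np

price_tiers = np.array(
    ["budget", "budget", "mid-tier", "luxury", "mid-tier", "luxury"]
)

tiers, counts = np.unique(price_tiers, return_counts=True)

print(len(tiers))                                  # number of unique tiers
print(dict(zip(tiers.tolist(), counts.tolist())))  # occurrences per tier
```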

# Assignment 7: Bringing it All Together

Ok, final NumPy task - let's read in some data with the help of Pandas.

Our data scientist provided the code to read in a CSV as a Pandas DataFrame, and has converted the two columns of interest to arrays.

* Filter `sales_array` down to only sales where the product family was produce. 

* Then, randomly sample roughly half (random number < .5) of the produce sales and report the mean and median sales. Use a random seed of 2022.

* Finally, create a new array that has the values 'above_both', 'above_median', and 'below_both' based on whether the sales were above the median and mean of the sample, just above the median of the sample, or below both the median and mean of the sample. 

In [63]:
import pandas as pd
import numpy as np

retail_df = pd.read_csv(
    "../retail/retail_2016_2017.csv", skiprows=range(1, 11000), nrows=1000
)

family_array = np.array(retail_df["family"])
sales_array = np.array(retail_df["sales"])

In [64]:
sales_array = sales_array[family_array == "PRODUCE"]
sales_array

array([1662.394,  447.064, 2423.944,  962.866, 1236.404,  298.441,
       1077.44 , 3404.531,  962.96 ,  279.505, 1852.786, 1089.319,
        726.516, 7860.031,  446.038, 1155.385,  120.202,  862.092,
        473.952,  254.263, 1272.755, 2775.771, 2030.762, 1657.432,
       2339.906,  722.333, 1567.843, 2458.456,  673.885, 8834.15 ])

In [65]:
rng = default_rng(2022)
random_array = rng.random(sales_array.size)
random_array


array([0.24742606, 0.09299006, 0.61176337, 0.06066207, 0.66103343,
       0.75515778, 0.1108689 , 0.04305584, 0.41441747, 0.98862926,
       0.96919869, 0.25697153, 0.55876211, 0.24234798, 0.32202029,
       0.89135975, 0.94611366, 0.72253931, 0.92847437, 0.99608701,
       0.2494223 , 0.06229007, 0.94479027, 0.65028587, 0.32167568,
       0.08336384, 0.21924361, 0.08417791, 0.05213927, 0.20525022])

In [66]:
filtered = sales_array[random_array < .5]

In [67]:
filtered

array([1662.394,  447.064,  962.866, 1077.44 , 3404.531,  962.96 ,
       1089.319, 7860.031,  446.038, 1272.755, 2775.771, 2339.906,
        722.333, 1567.843, 2458.456,  673.885, 8834.15 ])

In [79]:
mean = filtered.mean()
median = np.median(filtered)
# Label each sale relative to the sample mean and median
new_array = np.where(
    (filtered > mean) & (filtered > median),
    "above_both",
    np.where((filtered > median) & (filtered <= mean), "above_median", "below_both"),
)
print(new_array[:5])
print(mean)
print(median)

['above_median' 'below_both' 'below_both' 'below_both' 'above_both']
2268.102470588235
1272.755
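For three-way labelling like this, `np.select` can be easier to read than nested `np.where` calls. The sketch below is an alternative, not the approach used above, and it runs on a small hypothetical `sample` array rather than the retail data. Conditions are checked in order, and the first one that holds decides the label.

```python
import numpy as np

# Hypothetical sales figures, chosen so the mean (600) sits above the median (300)
sample = np.array([100.0, 200.0, 300.0, 400.0, 2000.0])
mean = sample.mean()
median = np.median(sample)

conditions = [
    (sample > mean) & (sample > median),  # above both statistics
    sample > median,                      # above the median only
]
choices = ["above_both", "above_median"]

# Elements matching no condition receive the default label
labels = np.select(conditions, choices, default="below_both")
print(labels)
```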
