# Week 3
# NumPy Arrays

[NumPy](https://numpy.org/) is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

**Resources:**
- Textbook Chapter 4
- [NumPy Documentation](https://numpy.org/doc/stable/)
- [NumPy Exercises from w3resource](https://www.w3resource.com/python-exercises/numpy/index.php)

In [1]:
import numpy as np  # np is the universally used abbreviation for numpy

## NumPy Arrays

A NumPy array is a grid of values, all of the same type, indexed by a tuple of nonnegative integers.
- **rank**: the number of dimensions (available as the `ndim` attribute)
- **shape**: a tuple of integers giving the size of the array along each dimension
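
As a quick illustration of these two terms, the sketch below prints the rank and shape of a 1D and a 2D array. The variable names are made up for this example.

```python
import numpy as np

# Illustrative arrays (names are not used elsewhere in these notes)
vector = np.array([1, 2, 3])           # one dimension
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])         # two dimensions

print(vector.ndim, vector.shape)       # 1 (3,)
print(matrix.ndim, matrix.shape)       # 2 (2, 3)
```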

In [4]:
# Create a numpy array from a Python list
py_list = [1, 2, 3, 4]
np_ary = np.array(py_list)
print("Numpy array:", np_ary)
print("Shape:", np_ary.shape)
print("Type:", np_ary.dtype)

Numpy array: [1 2 3 4]
Shape: (4,)
Type: int32


In [6]:
# Access elements using square brackets
# Ex: print the elements of ary at indices 0, 2, and 4
ary = np.array([3, 1, 4, 1, 5, 9])
print(ary[0])
print(ary[0], ary[2], ary[4])

3
3 4 5


In [7]:
# Ex: print the last element of ary
print(ary[-1])

9


In [13]:
# Ex: print the first 3 elements of ary, then the last 3 elements
print(ary[0:3])
print(ary[:3])
print(ary[-3:])
print(ary[-3:6])

[3 1 4]
[3 1 4]
[1 5 9]
[1 5 9]


In [16]:
# Ex: print the last 2 elements of ary
print(ary[-2:])
print(ary[-2:6])
print(ary[4:6])

[5 9]
[5 9]
[5 9]


In [17]:
# Create a 2D array
ary2d = np.array([[1, 2, 3],
                  [4, 5, 6]])
print(ary2d)

[[1 2 3]
 [4 5 6]]


In [19]:
# Ex: print the shape and data type (dtype) of ary2d
print(ary2d.shape)
print(ary2d.dtype)

(2, 3)
int32


In [21]:
# Ex: print the element on the first row and the second column
print(ary2d[0][1])
print(ary2d[0, 1])

2
2


In [23]:
# Ex: print the entire first row
print(ary2d[0])
print(ary2d[0, :])

[1 2 3]
[1 2 3]


In [24]:
# Ex: print the entire first column
print(ary2d[:, 0])

[1 4]


NumPy also provides many functions to create arrays:

In [28]:
ary = np.zeros((2, 3))
print(ary)
print(ary.dtype)

[[0. 0. 0.]
 [0. 0. 0.]]
float64


In [26]:
ary = np.ones((3, 2))
print(ary)

[[1. 1.]
 [1. 1.]
 [1. 1.]]


In [27]:
ary = np.random.rand(2, 2)
print(ary)

[[0.72916549 0.12643226]
 [0.95844534 0.79730669]]


In [33]:
# What if we want to create an array of random integers?
int_ary = np.random.randint(0, 10, size=(4, 3))
print(int_ary)

[[7 4 6]
 [6 8 2]
 [7 4 4]
 [8 8 4]]
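
A few more array-creation functions, as a minimal sketch. The variable names and the seed value are arbitrary choices for this example. Note that the upper bound is excluded both in `np.random.randint(0, 10, ...)` above and in `rng.integers(1, 7, ...)` below.

```python
import numpy as np

zeros_int = np.zeros((2, 3), dtype=int)   # integer zeros instead of the default float64
sevens = np.full((2, 2), 7)               # constant array filled with 7
identity = np.eye(3)                      # 3x3 identity matrix
evens = np.arange(0, 10, 2)               # 0, 2, 4, 6, 8 (stop value is excluded)
grid = np.linspace(0.0, 1.0, 5)           # 5 evenly spaced values, endpoints included

rng = np.random.default_rng(seed=0)       # seeded generator for repeatable results
dice = rng.integers(1, 7, size=(2, 3))    # random integers from 1 to 6

print(evens)                              # [0 2 4 6 8]
print(grid)                               # [0.   0.25 0.5  0.75 1.  ]
print(dice.shape)                         # (2, 3)
```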


## Advanced Array Indexing
NumPy arrays support **integer array indexing** and **boolean indexing**, which provide additional ways to select a subarray.

In [34]:
# Integer array index: create a subarray whose indices come from another array
data_ary = np.array([0, 2, 4, 6, 8, 10])
idx_ary = [0, 4, 5]
# What values will be printed?
print(data_ary[idx_ary])

[ 0  8 10]


In [35]:
# Python lists do not support integer array indexing
data_list = [0, 2, 4, 6, 8, 10]
data_list[idx_ary]

TypeError: list indices must be integers or slices, not list
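
To make the same selection from a plain Python list, one option is a list comprehension; another is to convert the list to a NumPy array first. This is a small sketch, and `idx_list` is a name introduced only for this example.

```python
import numpy as np

data_list = [0, 2, 4, 6, 8, 10]
idx_list = [0, 4, 5]

# Option 1: pick the elements one at a time with a list comprehension
selected = [data_list[i] for i in idx_list]
print(selected)                         # [0, 8, 10]

# Option 2: convert to a NumPy array, then use integer array indexing
selected_ary = np.array(data_list)[idx_list]
print(selected_ary)                     # [ 0  8 10]
```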

In [37]:
# Boolean indexing: select values that satisfy a condition
ary = np.array([1, 3, -5, -2, 0, -1])
idx = (ary > 0)
# What will be printed?
print(idx)
print(ary[idx])

[ True  True False False False False]
[1 3]


In [40]:
# plug in the condition directly
print(ary[ary >= 0])

[1 3 0]
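
Conditions can also be combined or negated before they are used as an index. The sketch below reuses the same values; `in_range` and `clipped` are names made up for this example.

```python
import numpy as np

ary = np.array([1, 3, -5, -2, 0, -1])

# Combine conditions with & (and) and | (or); each condition needs parentheses
in_range = ary[(ary >= -2) & (ary <= 1)]
print(in_range)                 # [ 1 -2  0 -1]

# ~ negates a boolean mask
print(ary[~(ary > 0)])          # [-5 -2  0 -1]

# Boolean indexing also works on the left-hand side of an assignment
clipped = ary.copy()
clipped[clipped < 0] = 0        # replace every negative value with 0
print(clipped)                  # [1 3 0 0 0 0]
```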


## Array Math
Basic arithmetic operators (`+`, `*`, and so on) operate elementwise on arrays, and a scalar operand is applied to every element.

In [41]:
x = np.array([[1, 2, 3],
              [4, 5, 6]])
print(x + 1)

[[2 3 4]
 [5 6 7]]


In [42]:
print(x * 2)

[[ 2  4  6]
 [ 8 10 12]]


In [43]:
y = np.array([[10, 20, 30],
              [40, 50, 60]])
print(x + y)

[[11 22 33]
 [44 55 66]]


In [44]:
print(x * y)

[[ 10  40  90]
 [160 250 360]]


In [45]:
z = np.array([[-1, -2, -3, -4],
              [-5, -6, -7, -8]])
print(x + z)

ValueError: operands could not be broadcast together with shapes (2,3) (2,4) 
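
The error above occurs because shapes (2,3) and (2,4) differ in the last dimension. Under NumPy's broadcasting rules, shapes are compared starting from the last dimension, and two sizes are compatible when they are equal or when one of them is 1. A minimal sketch of two shapes that do combine with `x` (the names `row` and `col` are made up for this example):

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])            # shape (2, 3)

row = np.array([10, 20, 30])         # shape (3,): added to every row of x
col = np.array([[100],
                [200]])              # shape (2, 1): added to every column of x

print(x + row)
# [[11 22 33]
#  [14 25 36]]
print(x + col)
# [[101 102 103]
#  [204 205 206]]
```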

## NumPy Math Functions
NumPy provides many math functions that operate elementwise on NumPy arrays.

In [48]:
# Calculate the square-root of 1, 2, ..., 10
ary = np.arange(1, 11)
print(ary)
print(ary.shape)
ary2 = np.sqrt(ary)
print(ary2)

[ 1  2  3  4  5  6  7  8  9 10]
(10,)
[1.         1.41421356 1.73205081 2.         2.23606798 2.44948974
 2.64575131 2.82842712 3.         3.16227766]


In [49]:
# Statistical functions
data = np.array([12, 34, 56, 78, 90])
print("Minimum:", data.min())
print("Maximum:", data.max())
print("Mean:", data.mean())
print("Variance:", data.var())
print("Standard deviation:", data.std())

Minimum: 12
Maximum: 90
Mean: 54.0
Variance: 808.0
Standard deviation: 28.42534080710379


In [51]:
# These functions can also be called as top-level NumPy functions
print(np.min(data))
print(np.max(data))

12
90
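
On a 2D array, the statistical functions accept an `axis` argument that controls whether the result is computed over the whole array, per column, or per row. The sketch below uses a made-up array named `scores`, and also shows that `var()` divides by n by default, while `ddof=1` gives the sample variance.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(scores.sum())          # 21, over all elements
print(scores.sum(axis=0))    # [5 7 9], one value per column
print(scores.sum(axis=1))    # [ 6 15], one value per row
print(scores.mean(axis=0))   # [2.5 3.5 4.5]
print(scores.argmax())       # 5, position of the largest value in the flattened array

data = np.array([12, 34, 56, 78, 90])
print(data.var())            # 808.0, divides by n
print(data.var(ddof=1))      # 1010.0, divides by n - 1
```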


# Example: 80 Cereals - Nutrition data on 80 cereal products

In this example, we will use NumPy to analyze a dataset that contains nutrition facts on 80 cereal products. In particular, we will:
- Download and load the dataset
- Explore the contents for interesting information
- Examine the sugar content

The data file can be downloaded from [Kaggle.com](https://www.kaggle.com/crawford/80-cereals).

>If you like to eat cereal, do yourself a favor and avoid this dataset at all costs. After seeing these data it will never be the same for me to eat Fruity Pebbles again. - Kaggle

- Download the zip file from Kaggle (login required)
- Unzip it to get the `cereal.csv` file
- Move the CSV file to the folder from which the notebook is run, since the code below loads it by the relative path `"cereal.csv"`
- Open the CSV file in Notepad or Excel to examine its contents

## Load And Examine The Data

In [2]:
# Load the csv file with np.loadtxt()
# Spoiler alert: in the next chapter we will learn a more user-friendly
# way of loading data.
import numpy as np # import numpy again to make this section self-contained
raw_data = np.loadtxt("cereal.csv", delimiter=",", skiprows=1, dtype=str)
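
To try the same call without downloading the file, the sketch below feeds `np.loadtxt()` a small in-memory stand-in. The three data rows are copied from the output shown in this notebook; the header line is an assumption built from the feature names and is skipped by `skiprows=1` anyway. The names `csv_text` and `sample` are made up for this example.

```python
import io
import numpy as np

# Stand-in for cereal.csv: an assumed header line plus three rows from this notebook
csv_text = """name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
All-Bran,K,C,70,4,1,260,9,7,5,320,25,3,1,0.33,59.425505
"""

# delimiter="," splits on commas, skiprows=1 drops the header,
# dtype=str keeps every field as text because the columns mix names and numbers
sample = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, dtype=str)

print(sample.shape)      # (3, 16)
print(sample[0, 0])      # 100% Bran
print(sample[:, -1])     # the rating column, still stored as strings
```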

In [7]:
# Show values in raw_data
print(raw_data[0:5, :])

[['100% Bran' 'N' 'C' '70' '4' '1' '130' '10' '5' '6' '280' '25' '3' '1'
  '0.33' '68.402973']
 ['100% Natural Bran' 'Q' 'C' '120' '3' '5' '15' '2' '8' '8' '135' '0'
  '3' '1' '1' '33.983679']
 ['All-Bran' 'K' 'C' '70' '4' '1' '260' '9' '7' '5' '320' '25' '3' '1'
  '0.33' '59.425505']
 ['All-Bran with Extra Fiber' 'K' 'C' '50' '4' '0' '140' '14' '8' '0'
  '330' '25' '3' '1' '0.5' '93.704912']
 ['Almond Delight' 'R' 'C' '110' '2' '2' '200' '1' '14' '8' '-1' '25' '3'
  '1' '0.75' '34.384843']]


In [11]:
# Ex: What is the shape of raw_data?
# len(raw_data)
raw_data.shape

(77, 16)

In [12]:
# Ex: Create a list of feature names (call it feature_names)
feature_names = ["name","mfr","type","calories","protein","fat","sodium",
                 "fiber","carbo","sugars","potass","vitamins","shelf",
                 "weight","cups","rating"]

## Explore The Contents

In [13]:
# Display the list of cereal names
raw_data[:, 0]

array(['100% Bran', '100% Natural Bran', 'All-Bran',
       'All-Bran with Extra Fiber', 'Almond Delight',
       'Apple Cinnamon Cheerios', 'Apple Jacks', 'Basic 4', 'Bran Chex',
       'Bran Flakes', "Cap'n'Crunch", 'Cheerios', 'Cinnamon Toast Crunch',
       'Clusters', 'Cocoa Puffs', 'Corn Chex', 'Corn Flakes', 'Corn Pops',
       'Count Chocula', "Cracklin' Oat Bran", 'Cream of Wheat (Quick)',
       'Crispix', 'Crispy Wheat & Raisins', 'Double Chex', 'Froot Loops',
       'Frosted Flakes', 'Frosted Mini-Wheats',
       'Fruit & Fibre Dates; Walnuts; and Oats', 'Fruitful Bran',
       'Fruity Pebbles', 'Golden Crisp', 'Golden Grahams',
       'Grape Nuts Flakes', 'Grape-Nuts', 'Great Grains Pecan',
       'Honey Graham Ohs', 'Honey Nut Cheerios', 'Honey-comb',
       'Just Right Crunchy  Nuggets', 'Just Right Fruit & Nut', 'Kix',
       'Life', 'Lucky Charms', 'Maypo',
       'Muesli Raisins; Dates; & Almonds',
       'Muesli Raisins; Peaches; & Pecans', 'Mueslix Crispy Blend',
  

In [29]:
# Extract the cereal ratings (the last column) and check the shape
ratings = raw_data[:, -1]
ratings.shape

(77,)

In [None]:
# Display the amount of sugars for each product.
# How can we identify the column index for sugar?

# index_sugars = feature_names.index("sugars")
# raw_data[:, index_sugars]

# raw_data[:, feature_names.index("sugars")]

raw_data[:, np.array(feature_names) == "sugars"] # use a condition to select the sugars column
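
The two ways of selecting a column give results of different shapes: an integer index returns a 1D array, while a boolean mask keeps the column axis. A minimal sketch on a made-up miniature table (the names `names`, `table`, `by_index`, and `by_mask` are hypothetical and the values are not from the real dataset):

```python
import numpy as np

# Made-up miniature table: 3 rows, 3 columns
names = ["name", "sugars", "rating"]
table = np.array([["A", "6", "68.4"],
                  ["B", "8", "34.0"],
                  ["C", "5", "59.4"]])

by_index = table[:, names.index("sugars")]          # integer index: 1D result
by_mask = table[:, np.array(names) == "sugars"]     # boolean mask: 2D result with one column

print(by_index.shape)   # (3,)
print(by_mask.shape)    # (3, 1)
print(by_mask.ravel())  # ['6' '8' '5'], flattened back to 1D
```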

In [36]:
# What is the highest rating?
ratings.max() # This will cause an error since the array contains strings

TypeError: cannot perform reduce with flexible type

In [37]:
# Convert the strings in ratings to floating-point numbers.
# astype returns a new array; ratings itself is unchanged.
ratings.astype(float)

array([68.402973, 33.983679, 59.425505, 93.704912, 34.384843, 29.509541,
       33.174094, 37.038562, 49.120253, 53.313813, 18.042851, 50.764999,
       19.823573, 40.400208, 22.736446, 41.445019, 45.863324, 35.782791,
       22.396513, 40.448772, 64.533816, 46.895644, 36.176196, 44.330856,
       32.207582, 31.435973, 58.345141, 40.917047, 41.015492, 28.025765,
       35.252444, 23.804043, 52.076897, 53.371007, 45.811716, 21.871292,
       31.072217, 28.742414, 36.523683, 36.471512, 39.241114, 45.328074,
       26.734515, 54.850917, 37.136863, 34.139765, 30.313351, 40.105965,
       29.924285, 40.69232 , 59.642837, 30.450843, 37.840594, 41.50354 ,
       60.756112, 63.005645, 49.511874, 50.828392, 39.259197, 39.7034  ,
       55.333142, 41.998933, 40.560159, 68.235885, 74.472949, 72.801787,
       31.230054, 53.131324, 59.363993, 38.839746, 28.592785, 46.658844,
       39.106174, 27.753301, 49.787445, 51.592193, 36.187559])
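
Once the ratings are numeric, functions such as `max()`, `argmax()`, and `mean()` work on them. The sketch below is one possible approach, shown on only the first five names and ratings from the output above so that it runs without the data file; `ratings_str`, `ratings_num`, `best`, and `above_60` are names made up for this example.

```python
import numpy as np

names = np.array(["100% Bran", "100% Natural Bran", "All-Bran",
                  "All-Bran with Extra Fiber", "Almond Delight"])
ratings_str = np.array(["68.402973", "33.983679", "59.425505",
                        "93.704912", "34.384843"])

ratings_num = ratings_str.astype(float)   # store the converted array under a new name

best = ratings_num.argmax()               # position of the highest rating
print(names[best], ratings_num[best])     # All-Bran with Extra Fiber 93.704912

above_60 = ratings_num > 60               # boolean mask
print(above_60.sum())                     # 2, since True counts as 1
print(names[above_60])                    # names of the cereals rated above 60
print(ratings_num.mean())                 # average rating of these five rows
```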

In [None]:
# Which cereal receives the highest rating?


In [None]:
# How many cereals receive a rating above 60? Which ones are they?


In [None]:
# What is the average rating?


## Sugar

In [None]:
# Display the amount of sugar per serving for each product


In [None]:
# Display the weight per serving for each product


In [None]:
# Calculate sugar per ounce


In [None]:
# Which product has the highest amount of sugar per ounce?
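
The per-ounce calculation is an elementwise division of two numeric arrays, followed by `argmax()` to locate the largest value. A minimal sketch with hypothetical values (not taken from `cereal.csv`), assuming sugar is measured in grams per serving and weight in ounces per serving:

```python
import numpy as np

# Hypothetical values for illustration only (not taken from cereal.csv)
names = np.array(["Cereal A", "Cereal B", "Cereal C"])
sugars = np.array([6.0, 12.0, 9.0])    # assumed unit: grams of sugar per serving
weight = np.array([1.0, 1.5, 1.0])     # assumed unit: ounces per serving

sugar_per_ounce = sugars / weight      # elementwise division
print(sugar_per_ounce)                 # [6. 8. 9.]
print(names[sugar_per_ounce.argmax()]) # Cereal C
```

With the real data, the sugars and weight columns would first need to be converted from strings with `astype(float)`.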
