# Week 3
# NumPy Arrays

[NumPy](https://numpy.org/) is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

**Resources:**
- Textbook Chapter 4
- [NumPy Documentation](https://numpy.org/doc/1.19/user/quickstart.html)
- [NumPy Exercises from w3resource](https://www.w3resource.com/python-exercises/numpy/index.php)

In [2]:
import numpy as np  # np is the conventional abbreviation for numpy

## NumPy Arrays

A NumPy array is a grid of values, all of the same type, indexed by a tuple of nonnegative integers.
- **rank**: the number of dimensions
- **shape**: a tuple of integers giving the size of the array along each dimension
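
As a small illustration of these two terms, the sketch below builds a 1D and a 2D array and inspects `ndim` (the rank) and `shape`. The names `vec` and `mat` are made up for this example.

```python
import numpy as np

vec = np.array([1, 2, 3, 4])   # one dimension with 4 values
mat = np.array([[1, 2, 3],
                [4, 5, 6]])    # two dimensions: 2 rows, 3 columns

print(vec.ndim, vec.shape)     # 1 (4,)
print(mat.ndim, mat.shape)     # 2 (2, 3)
```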

In [5]:
# Create a numpy array from a Python list
py_list = [1, 2, 3, 4]
np_ary = np.array(py_list)
print("Numpy array:", np_ary)
print("Shape:", np_ary.shape)
print("Type:", np_ary.dtype)

Numpy array: [1 2 3 4]
Shape: (4,)
Type: int64


In [6]:
# Access elements using square brackets
# Ex: print the elements at indices 0, 2, and 4 of ary
ary = np.array([3, 1, 4, 1, 5, 9])
print(ary[0])
print(ary[1])
print(ary[0], ary[2], ary[4])


3
1
3 4 5


In [7]:
# Ex: print the last element of ary
print(ary[-1])


9


In [10]:
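# Ex: print the first 3 elements of ary
print(ary[0:3])
print(ary[:3])    # a missing start defaults to 0
# The last 3 elements can be selected the same way
print(ary[-3:])   # a missing stop defaults to the end
print(ary[-3:6])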

[3 1 4]
[3 1 4]
[1 5 9]
[1 5 9]


In [12]:
# Ex: print the last 2 elements of ary
print(ary[-2:6])
print(ary[4:6])
print(ary[-2:])

[5 9]
[5 9]
[5 9]
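
A slice can also take a third value, the step, written `start:stop:step`. The sketch below reuses the same values as `ary` to pick every other element and to reverse the array; the names `every_other` and `reversed_ary` are only illustrative.

```python
import numpy as np

ary = np.array([3, 1, 4, 1, 5, 9])

every_other = ary[::2]     # indices 0, 2, 4
reversed_ary = ary[::-1]   # a step of -1 walks from the last element to the first

print(every_other)         # [3 4 5]
print(reversed_ary)        # [9 5 1 4 1 3]
```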


In [13]:
# Create a 2D array
ary2d = np.array([[1, 2, 3],
                  [4, 5, 6]])
print(ary2d)

[[1 2 3]
 [4 5 6]]


In [15]:
# Ex: print the shape and variable type of ary2d
print(ary2d.shape)
print(ary2d.dtype)

(2, 3)
int64


In [19]:
# Ex: print the element on the first row and the second column
print(ary2d[0][1])
print(ary2d[0, 1])

2
2


In [20]:
# Ex: print the entire first row
print(ary2d[0])
print(ary2d[0, :])


[1 2 3]
[1 2 3]


In [21]:
# Ex: print the entire first column
print(ary2d[:, 0])


[1 4]
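
Row and column indices can each be a slice or a negative index, just as in the 1D case. A minimal sketch using the same values as `ary2d` (the names `last_two_cols` and `bottom_right` are made up for illustration):

```python
import numpy as np

ary2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

last_two_cols = ary2d[:, 1:]   # every row, columns 1 and 2
bottom_right = ary2d[-1, -1]   # last row, last column

print(last_two_cols)
# [[2 3]
#  [5 6]]
print(bottom_right)            # 6
```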


NumPy also provides many functions to create arrays:

In [22]:

# A 2x3 array filled with zeros (floats by default)
ary = np.zeros((2, 3))
print(ary)

[[0. 0. 0.]
 [0. 0. 0.]]


In [23]:
ary = np.ones((3, 2))
print(ary)

[[1. 1.]
 [1. 1.]
 [1. 1.]]


In [28]:
# Random floats drawn uniformly from [0, 1)
ary = np.random.rand(2, 2)
# Random integers from 0 (inclusive) to 10 (exclusive)
ary2 = np.random.randint(0, 10, size=(4, 3))
print(ary)
print(ary2)

[[0.14871143 0.47620518]
 [0.91846721 0.65636063]]
[[5 0 7]
 [3 6 1]
 [8 9 1]
 [7 1 2]]
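
A few other creation functions are worth knowing. The sketch below is not exhaustive, and the variable names are only illustrative. It also shows one way to make random results repeatable, assuming a NumPy version that provides `np.random.default_rng`.

```python
import numpy as np

sevens = np.full((2, 2), 7)       # 2x2 array filled with 7
identity = np.eye(3)              # 3x3 identity matrix
evens = np.arange(0, 10, 2)       # start, stop (exclusive), step
points = np.linspace(0, 1, 5)     # 5 evenly spaced values from 0 to 1, inclusive

print(sevens)
print(identity)
print(evens)                      # [0 2 4 6 8]
print(points)

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
draw_a = rng_a.integers(0, 10, size=(2, 3))
draw_b = rng_b.integers(0, 10, size=(2, 3))
print(np.array_equal(draw_a, draw_b))   # True
```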


## Advanced Array Indexing
NumPy arrays support **integer array indexing** and **boolean indexing**, which provide additional ways to create a subarray.

In [29]:
# Integer array index: create a subarray whose indices come from another array
data_ary = np.array([0, 2, 4, 6, 8, 10])
idx_ary = [0, 4, 5]
# What values will be printed?
print(data_ary[idx_ary])

[ 0  8 10]


In [None]:
# This does not work with a Python list; it only works with a NumPy array.
data_list = [0, 2, 4, 6, 8, 10]
try:
    data_list[idx_ary]
except TypeError as err:
    print("TypeError:", err)

In [33]:
# Boolean indexing: select the values that satisfy some condition
ary = np.array([1, 3, -5, -2, 0, -1])
idx = (ary > 0)
# What will be printed?
print(idx)
print(ary[idx])

[ True  True False False False False]
[1 3]


In [37]:
# The condition can also be written directly inside the brackets
print(ary[ary >= 0])

[1 3 0]
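
Conditions can be combined, and a mask can be used for assignment as well as selection. The sketch below uses the same values as `ary`; the names `in_range` and `clipped` are made up for illustration.

```python
import numpy as np

ary = np.array([1, 3, -5, -2, 0, -1])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
in_range = ary[(ary >= -2) & (ary <= 1)]
print(in_range)            # [ 1 -2  0 -1]

# A boolean mask can also choose which elements to overwrite
clipped = ary.copy()       # copy so the original array is left unchanged
clipped[clipped < 0] = 0
print(clipped)             # [1 3 0 0 0 0]
```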


## Array Math
Basic mathematical functions operate elementwise on the arrays.

In [38]:
x = np.array([[1, 2, 3],
              [4, 5, 6]])
print(x + 1)

[[2 3 4]
 [5 6 7]]


In [39]:
print(x * 2)

[[ 2  4  6]
 [ 8 10 12]]


In [40]:
y = np.array([[10, 20, 30],
              [40, 50, 60]])
print(x + y)

[[11 22 33]
 [44 55 66]]


In [41]:
print(x * y)

[[ 10  40  90]
 [160 250 360]]


In [None]:
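# z has shape (2, 4) but x has shape (2, 3): the shapes do not match,
# so z * x raises a ValueError
z = np.array([[-1, -2, -3, -4],
              [-5, -6, -7, -8]])
print(z * x)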
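
Elementwise operations need compatible shapes. Two arrays of identical shape always work, and NumPy can also stretch ("broadcast") a smaller array across a larger one when the trailing dimensions agree. A minimal sketch, where `row` and `z` are illustrative names:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# The 3-element row is applied to each of the two rows of x
scaled = x * row
print(scaled)
# [[ 10  40  90]
#  [ 40 100 180]]

# Shapes (2, 3) and (2, 4) cannot be matched, so NumPy raises a ValueError
z = np.ones((2, 4))
try:
    x * z
except ValueError as err:
    print("Cannot broadcast:", err)
```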

## NumPy Math Functions
NumPy provides many math functions that operate elementwise on NumPy arrays.

In [44]:
# Calculate the square-root of 1, 2, ..., 10
ary = np.arange(1, 11)
print(ary)
print(ary.shape)
ary2 = np.sqrt(ary)
print(ary2)

[ 1  2  3  4  5  6  7  8  9 10]
(10,)
[1.         1.41421356 1.73205081 2.         2.23606798 2.44948974
 2.64575131 2.82842712 3.         3.16227766]


In [45]:
# Statistical functions
data = np.array([12, 34, 56, 78, 90])
print("Minimum:", data.min())
print("Maximum:", data.max())
print("Mean:", data.mean())
print("Variance:", data.var())
print("Standard deviation:", data.std())

Minimum: 12
Maximum: 90
Mean: 54.0
Variance: 808.0
Standard deviation: 28.42534080710379


In [46]:
# These functions can also be called directly from the np module
print(np.min(data))
print(np.max(data))

12
90
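
Two related tools are often useful alongside these functions: `argmax()` returns the position of the largest value rather than the value itself, and the `axis` argument applies a statistic along one dimension of a 2D array. In the sketch below, `table` holds made-up numbers for illustration only.

```python
import numpy as np

data = np.array([12, 34, 56, 78, 90])
best_index = data.argmax()
print(best_index)              # 4, the index of the largest value (90)

table = np.array([[12, 34, 56],
                  [78, 90, 11]])
col_max = table.max(axis=0)    # one maximum per column
row_mean = table.mean(axis=1)  # one mean per row
print(col_max)                 # [78 90 56]
print(row_mean)
```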


# Example: 80 Cereals - Nutrition data on 80 cereal products

In this example, we will use NumPy to analyze a dataset that contains nutrition facts on 80 cereal products. In particular, we will:
- Download and load the dataset.
- Explore the data for interesting information.
- Examine sugar content.

The data file can be downloaded from [Kaggle.com](https://www.kaggle.com/crawford/80-cereals)

>If you like to eat cereal, do yourself a favor and avoid this dataset at all costs. After seeing these data it will never be the same for me to eat Fruity Pebbles again. - Kaggle

- Download the zip file from Kaggle (login required).
- Unzip it to get the `cereal.csv` file.
- Move the CSV file to the same folder as this notebook.
- Open the CSV file in Notepad or Excel to examine its contents.

## Load And Examine The Data

In [None]:
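# Load the csv file with np.loadtxt()
# Spoiler alert: in the next chapter we will learn a more user-friendly
# way of loading data.
# Assumes a comma-delimited file; dtype=str is needed because the file
# mixes text (header, cereal names) with numbers.
raw_data = np.loadtxt("cereal.csv", delimiter=",", dtype=str)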
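
The loading and splitting steps can be tried without the real file. The sketch below reads a tiny in-memory CSV instead of `cereal.csv`. The product names and numbers are invented placeholders, not values from the Kaggle dataset, and the layout (comma-delimited, one header row) is an assumption about the file.

```python
import io
import numpy as np

# Hypothetical stand-in for cereal.csv: invented names and numbers
csv_text = """name,sugars,weight,rating
Cereal A,6,1.0,68.4
Cereal B,12,1.33,29.5
Cereal C,3,1.0,74.5
"""

# dtype=str keeps every cell as text, so names and numbers can share one array
raw_data = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=str)
print(raw_data.shape)          # (4, 4): the header row plus 3 products

feature_names = raw_data[0]    # first row holds the column names
data = raw_data[1:]            # remaining rows hold the products

# Convert a text column to numbers before doing math on it
rating = data[:, 3].astype(float)
print(feature_names)
print(rating.max())            # 74.5
```

Because everything is loaded as text, each numeric column needs an explicit `astype(float)` before it can be used in calculations.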



In [None]:
# Show values in raw_data


In [None]:
# Ex: What is the shape of raw_data?


In [None]:
# Ex: Create a list of feature names (call it feature_names)


In [None]:
# Split raw_data into feature_names and data



## Explore The Contents

In [None]:
# Display the list of cereal names


In [None]:
# Display the list of cereal ratings


In [None]:
# What is the highest rating?


In [None]:
# Which cereal receives the highest rating?


In [None]:
# How many cereals receive rating above 60? What are they?


In [None]:
# What is the average rating?
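
The questions in this section combine `max()`, `argmax()`, boolean indexing, and `mean()`. The sketch below shows the pattern on invented values; the names and ratings are hypothetical and are not taken from the dataset.

```python
import numpy as np

# Hypothetical names and ratings, stored as text like a CSV loaded with dtype=str
names = np.array(["Cereal A", "Cereal B", "Cereal C", "Cereal D"])
rating = np.array(["68.4", "29.5", "74.5", "61.0"]).astype(float)

print(rating.max())             # 74.5
print(names[rating.argmax()])   # Cereal C

above_60 = rating > 60          # boolean mask, one entry per product
print(above_60.sum())           # 3 (True counts as 1)
print(names[above_60])          # ['Cereal A' 'Cereal C' 'Cereal D']

print(rating.mean())
```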


## Sugar

In [None]:
# Display the list of sugar per serving


In [None]:
# Display the list of weight per serving


In [None]:
# Calculate sugar per ounce


In [None]:
# Which product has the highest amount of sugar per ounce?
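
Sugar per ounce is an elementwise division of two columns. The sketch below uses invented values and assumes that weight is recorded in ounces per serving, as the exercise wording suggests.

```python
import numpy as np

# Hypothetical values, not taken from the dataset
names = np.array(["Cereal A", "Cereal B", "Cereal C"])
sugars = np.array([6.0, 12.0, 3.0])    # sugar per serving
weight = np.array([1.0, 1.5, 1.0])     # ounces per serving (assumed unit)

sugar_per_ounce = sugars / weight      # elementwise division
print(sugar_per_ounce)                 # [6. 8. 3.]

sweetest = names[sugar_per_ounce.argmax()]
print(sweetest)                        # Cereal B
```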
