# Lab 2 

# Introduction to NumPy

In this lab, you'll be working through Chapter 2 to get an introduction to the numerical computing package for Python, NumPy. This notebook is made up of two sections.

- Section 1: Work through the code samples in Chapter 2
- Section 2: Exercises

# Section 1: Code Practice

In this section, you will be reading through the various chapter sections and typing out/running the code samples given in the sections. The purpose of this is for you to practice using Jupyter to run Python code as well as learn about the functionality available to you in both IPython and Jupyter.

##### Executing code in Jupyter

When typing and executing code in Jupyter, it is helpful to know the various keyboard shortcuts. You can find the full list by clicking **Help &rarr; Keyboard Shortcuts** in the menu. However, the most useful keyboard shortcuts are:

- `Shift-Enter`: Execute the current cell and advance to the next cell. A new cell is created only if there is no cell below the current one; if a cell already exists below, **no** new cell is created.
- `Alt-Enter`: Execute the current cell and **create** a new cell below.
- `Control-Enter`: Execute the current cell without advancing to the next cell.

When writing your code, use these shortcuts so that your input/output (`In`/`Out`) cells stay consistent with what is shown in the chapter. If you create a cell by mistake, you can always go to **Edit &rarr; Delete Cells** to remove it.

#### Purpose of Section 1

Your goals in this section are to:

- **Type out** the code examples from the chapter (do not copy and paste)
- **Run** them
- **Check** to **make sure** you are getting the same results as what is contained in the chapter

---




## Understanding Data Types in Python

#### A Python List is More Than Just a List

In [None]:
L = list(range(10))
L

In [None]:
type(L[0])

In [None]:
L2 = [str(c) for c in L]
L2

In [None]:
type(L2[0])

In [None]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]
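A Python list can mix types because every element is a full Python object. A NumPy array has a single dtype, so handing `np.array` the same mixed list forces a common type. A small sketch showing that here the common type turns out to be a Unicode string:

```python
import numpy as np

L3 = [True, "2", 3.0, 4]
print([type(item) for item in L3])   # four different Python types

arr = np.array(L3)                   # NumPy must pick one dtype for every element
print(arr.dtype, arr)                # a Unicode string dtype ('<U...')
print(type(arr[2]))                  # np.str_: the float was converted to a string
```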

#### Fixed-Type Arrays in Python

In [None]:
import array
L = list(range(10))
A = array.array('i', L)
A
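`array.array` enforces one element type, chosen by its type code. A minimal sketch using only the standard library: `itemsize` reports the compact per-element storage, and a value of the wrong type is rejected instead of being stored.

```python
import array

# 'i' is the type code for a C signed int
A = array.array('i', range(10))
print(A.typecode, A.itemsize)   # itemsize is platform-dependent, usually 4 bytes

try:
    A.append(3.5)               # a float is not accepted by an 'i' array
except TypeError as err:
    print("TypeError:", err)
```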

In [None]:
import numpy as np

#### Creating Arrays from Python Lists

In [None]:
# integer array:
np.array([1, 4, 2, 5, 3])

In [None]:
np.array([3.14, 4, 2, 3])

In [None]:
np.array([1, 2, 3, 4], dtype='float32')
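The `dtype` argument controls both the type and the memory footprint of each element. A short comparison of the default 64-bit float against `float32`, plus the upcasting that happens when ints and floats are mixed without an explicit dtype:

```python
import numpy as np

a64 = np.array([1, 2, 3, 4], dtype=float)      # Python float maps to float64
a32 = np.array([1, 2, 3, 4], dtype='float32')

# Same values, half the memory per element
print(a64.dtype, a64.itemsize)   # float64 8
print(a32.dtype, a32.itemsize)   # float32 4

# Mixing ints and floats without a dtype upcasts everything to float64
print(np.array([3.14, 4, 2, 3]).dtype)   # float64
```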

In [None]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])

#### Creating Arrays from Scratch

In [None]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

In [None]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

In [None]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

In [None]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

In [None]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

In [None]:
# Create a 3x3 array of uniformly distributed
# random values in the half-open interval [0, 1)
np.random.random((3, 3))

In [None]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

In [None]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))
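The cells above use NumPy's legacy `np.random` functions, which is what the chapter uses. Newer NumPy releases (1.17 and later) also offer a `Generator` object created with `np.random.default_rng`. A sketch of rough equivalents, assuming a NumPy version that has this API; note the generator produces a different stream of numbers, so the values will not match the chapter even with a seed.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # independent, seeded generator

u = rng.random((3, 3))                # uniform in [0, 1)
n = rng.normal(0, 1, (3, 3))          # mean 0, standard deviation 1
k = rng.integers(0, 10, (3, 3))       # integers in [0, 10)
print(u, n, k, sep="\n\n")
```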

In [None]:
# Create a 3x3 identity matrix
np.eye(3)

In [None]:
# Create an uninitialized array of three values (float64 by default)
# The values will be whatever happens to already exist at that memory location
np.empty(3)
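`np.empty` defaults to `float64`, like `np.zeros` and `np.ones`; pass `dtype` to get integers. Because its contents are arbitrary, prefer `np.zeros` or `np.full` whenever you need defined starting values. A quick check:

```python
import numpy as np

e = np.empty(3)                 # default dtype is float64, contents arbitrary
ei = np.empty(3, dtype=int)     # request integers explicitly
print(e.dtype, ei.dtype)

# When starting values matter, initialize explicitly
safe = np.zeros(3, dtype=int)
print(safe)
```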

## The Basics of NumPy Arrays

### NumPy Array Attributes

In [None]:
import numpy as np
np.random.seed(0) # seed for reproducibility

x1 = np.random.randint(10, size=6) # One-dimensional array
x2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array

In [None]:
print("x3 ndim:", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size:", x3.size)

In [None]:
print("dtype:", x3.dtype)

In [None]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")
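These attributes are related: `size` is the product of the entries of `shape`, and `nbytes` is `itemsize * size`. A small check, using an explicit `int64` dtype (an assumption chosen so the byte counts do not depend on the platform's default integer):

```python
import numpy as np

arr = np.zeros((3, 4, 5), dtype=np.int64)
print(arr.ndim, arr.shape, arr.size)   # 3 (3, 4, 5) 60
print(arr.itemsize, arr.nbytes)        # 8 480

# nbytes is simply itemsize * size
assert arr.nbytes == arr.itemsize * arr.size
```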

### Array Indexing: Accessing Single Elements

In [None]:
x1

In [None]:
x1[0]

In [None]:
x1[4]

In [None]:
x1[-1]

In [None]:
x1[-2]

In [None]:
x2

In [None]:
x2[0, 0]

In [None]:
x2[2, 0]

In [None]:
x2[2, -1]

In [None]:
x2[0, 0] = 12
x2

In [None]:
x1[0] = 3.14159  # this will be truncated!
x1
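The truncation drops the fractional part, so it moves toward zero rather than rounding down. If you want rounding, apply it explicitly before assigning. A short sketch:

```python
import numpy as np

x = np.array([1, 2, 3])
x[0] = 3.9      # stored as 3: the fractional part is dropped
x[1] = -2.7     # stored as -2: truncation is toward zero, not rounding down
print(x)        # [ 3 -2  3]

# To round instead, do it explicitly before assigning
x[2] = np.round(4.6)   # stored as 5
print(x)
```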

### Array Slicing: Accessing Subarrays

#### One-dimensional subarrays

In [None]:
x = np.arange(10)
x

In [None]:
x[:5] # first five elements

In [None]:
x[5:] # elements from index 5 onward

In [None]:
x[4:7] # middle sub-array

In [None]:
x[::2] # every other element

In [None]:
x[1::2] # every other, starting at index 1

In [None]:
x[::-1] # all elements, reversed

In [None]:
x[5::-2] # reversed every other from index 5
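The `start:stop:step` syntax inside brackets builds a Python `slice` object, so a slice can also be stored in a variable and reused. With a negative step and no start, the slice begins at the last element. A sketch:

```python
import numpy as np

x = np.arange(10)
every_other = slice(1, None, 2)   # same as writing 1::2 inside the brackets
print(x[every_other])             # [1 3 5 7 9]

# Negative step with an omitted start begins at the last element
print(x[::-3])                    # [9 6 3 0]
```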

#### Multi-dimensional subarrays

In [None]:
x2

In [None]:
x2[:2, :3] # first two rows, first three columns

In [None]:
x2[:3, ::2] # all rows, every other column

In [None]:
x2[::-1, ::-1] # rows and columns both reversed

In [None]:
print(x2[:, 0]) # first column of x2

In [None]:
print(x2[0, :]) # first row of x2

In [None]:
print(x2[0]) # equivalent to x2[0, :]

#### Subarrays as no-copy views

In [None]:
print(x2)

In [None]:
x2_sub = x2[:2, :2]
print(x2_sub)

In [None]:
x2_sub[0, 0] = 99
print(x2_sub)

In [None]:
print(x2)

#### Creating copies of arrays

In [None]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

In [None]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

In [None]:
print(x2)
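To check whether two arrays share the same underlying buffer, `np.shares_memory` answers directly. A sketch on a fresh array (not `x2`, so it runs independently):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
view = a[:2, :2]          # slicing returns a view
copy = a[:2, :2].copy()   # .copy() allocates new memory

print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False

view[0, 0] = 99
print(a[0, 0])                     # 99: the change shows up in the parent
```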

### Reshaping of Arrays

In [None]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

In [None]:
x = np.array([1, 2, 3])

# row vector via reshape
x.reshape((1, 3))

In [None]:
# row vector via newaxis
x[np.newaxis, :]

In [None]:
# column vector via reshape
x.reshape((3, 1))

In [None]:
# Column vector via newaxis
x[:, np.newaxis]
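`reshape` accepts `-1` for at most one dimension and infers it from the array's size; the total number of elements must still match. A sketch:

```python
import numpy as np

x = np.arange(12)
print(x.reshape(3, -1).shape)    # (3, 4): NumPy infers the -1 dimension
print(x.reshape(-1, 1).shape)    # (12, 1): a column vector

try:
    x.reshape(5, -1)             # 12 elements cannot fill 5 equal rows
except ValueError as err:
    print("ValueError:", err)
```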

### Array Concatenation and Splitting

#### Concatenation of arrays

In [None]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

In [None]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

In [None]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [None]:
# concatenate along the first axis
np.concatenate([grid, grid])

In [None]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

In [None]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

In [None]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])
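`np.concatenate` joins arrays along an axis they already have, while `np.stack` creates a new axis. For 2-D inputs, `vstack` and `hstack` behave like `concatenate` on axis 0 and axis 1. A sketch:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])

# concatenate joins along an existing axis; stack creates a new one
print(np.concatenate([x, y]).shape)     # (6,)
print(np.stack([x, y]).shape)           # (2, 3)
print(np.stack([x, y], axis=1).shape)   # (3, 2)

# For 2-D inputs, vstack/hstack match concatenate on axis 0/1
grid = np.array([[9, 8, 7], [6, 5, 4]])
assert (np.vstack([grid, grid]) == np.concatenate([grid, grid], axis=0)).all()
assert (np.hstack([grid, grid]) == np.concatenate([grid, grid], axis=1)).all()
```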

#### Splitting of arrays

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

In [None]:
grid = np.arange(16).reshape((4, 4))
grid

In [None]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

In [None]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)
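Besides a list of split points, `np.split` accepts an integer meaning "this many equal parts"; it raises an error if the array cannot be divided evenly. `np.array_split` allows unequal parts. A sketch:

```python
import numpy as np

a = np.arange(9)
parts = np.split(a, 3)               # an integer means "3 equal parts"
print([p.tolist() for p in parts])   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

b = np.arange(10)
# np.split(b, 3) would raise ValueError; array_split allows unequal parts
uneven = np.array_split(b, 3)
print([len(p) for p in uneven])      # [4, 3, 3]
```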

## Computation on NumPy Arrays: Universal Functions

### The Slowness of Loops

In [None]:
import numpy as np
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
        
values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

In [None]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

### Introducing UFuncs

In [None]:
print(compute_reciprocals(values))
print(1.0 / values)

In [None]:
%timeit (1.0 / big_array)
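`%timeit` is an IPython/Jupyter magic, so it does not work in a plain Python script. A rough equivalent using the standard library's `timeit` module is sketched below; it assumes a smaller 100,000-element array generated with `np.random.default_rng` to keep it quick, so the timings will not match the chapter. It first checks that the loop and the ufunc agree.

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.default_rng(0)
big_array = rng.integers(1, 100, size=100_000)

# Both approaches must agree before comparing speed
assert np.allclose(compute_reciprocals(big_array), 1.0 / big_array)

loop_t = timeit.timeit(lambda: compute_reciprocals(big_array), number=3)
ufunc_t = timeit.timeit(lambda: 1.0 / big_array, number=3)
print(f"loop: {loop_t:.3f} s, ufunc: {ufunc_t:.4f} s (3 runs each)")
```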

In [None]:
np.arange(5) / np.arange(1, 6)

In [None]:
x = np.arange(9).reshape((3, 3))
2 ** x

### Exploring NumPy's UFuncs

#### Array arithmetic

In [None]:
x = np.arange(4)
print("x      =", x)
print("x + 5  =", x + 5)
print("x - 5  =", x - 5)
print("x * 2  =", x * 2)
print("x / 2  =", x / 2)
print("x // 2 =", x // 2)  # floor division

In [None]:
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

In [None]:
-(0.5*x + 1) ** 2

In [None]:
np.add(x, 2)
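Each arithmetic operator on an array is a thin wrapper around a named ufunc, as `np.add(x, 2)` shows for `+`. A sketch that checks the operator form against the ufunc form for each operator used above:

```python
import numpy as np

x = np.arange(4)
# operator result vs. the equivalent ufunc call
pairs = {
    "x + 2":  (x + 2,  np.add(x, 2)),
    "x - 2":  (x - 2,  np.subtract(x, 2)),
    "-x":     (-x,     np.negative(x)),
    "x * 2":  (x * 2,  np.multiply(x, 2)),
    "x / 2":  (x / 2,  np.divide(x, 2)),
    "x // 2": (x // 2, np.floor_divide(x, 2)),
    "x ** 2": (x ** 2, np.power(x, 2)),
    "x % 2":  (x % 2,  np.mod(x, 2)),
}
for expr, (op_result, ufunc_result) in pairs.items():
    print(f"{expr:7} -> {op_result}  matches ufunc: {np.array_equal(op_result, ufunc_result)}")
```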

#### Absolute value

In [None]:
x = np.array([-2, -1, 0, 1, 2])
abs(x)

In [None]:
np.absolute(x)

In [None]:
np.abs(x)

In [None]:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)
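For complex input, `np.abs` returns the magnitude $\sqrt{\mathrm{re}^2 + \mathrm{im}^2}$, which is why `3 - 4j` gives `5`. A check against `np.hypot` applied to the real and imaginary parts:

```python
import numpy as np

z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
mag = np.abs(z)                  # magnitude of each complex number
print(mag)                       # [5. 5. 2. 1.]

# hypot(a, b) computes sqrt(a**2 + b**2)
assert np.allclose(mag, np.hypot(z.real, z.imag))
```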

#### Trigonometric functions

In [None]:
theta = np.linspace(0, np.pi, 3)

In [None]:
print("theta      = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

In [None]:
x = [-1, 0, 1]
print("x         = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))
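The trigonometric output above is computed in floating point, so `sin(pi)` comes out as a tiny nonzero number rather than exactly 0. Compare such results with a tolerance, for example with `np.isclose`:

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)
s = np.sin(theta)
print(s)                      # last entry is about 1.2e-16, not exactly 0
print(s[-1] == 0)             # False
print(np.isclose(s[-1], 0))   # True: compare floats with a tolerance
```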

#### Exponents and logarithms

In [None]:
x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))

In [None]:
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

In [None]:
x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))
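`expm1` and `log1p` exist because forming `1 + x` for a very small `x` throws away most of `x`'s digits before the subtraction or logarithm happens. A sketch comparing both forms at `x = 1e-10`, using the leading Taylor terms as a reference ($e^x - 1 \approx x + x^2/2$ and $\ln(1 + x) \approx x - x^2/2$; higher-order terms are negligible at this size):

```python
import numpy as np

x = 1e-10
# Forming 1 + x rounds away most of x's digits
naive_exp = np.exp(x) - 1
naive_log = np.log(1 + x)
# expm1/log1p avoid forming 1 + x
good_exp = np.expm1(x)
good_log = np.log1p(x)

# Leading Taylor terms as reference values
ref_exp = x + x**2 / 2
ref_log = x - x**2 / 2

def rel_err(value, ref):
    return abs(value - ref) / ref

print(f"exp(x) - 1: {rel_err(naive_exp, ref_exp):.1e}  expm1: {rel_err(good_exp, ref_exp):.1e}")
print(f"log(1 + x): {rel_err(naive_log, ref_log):.1e}  log1p: {rel_err(good_log, ref_log):.1e}")
```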

#### Specialized ufuncs

In [None]:
from scipy import special

In [None]:
# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x)     =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2)   =", special.beta(x, 2))

In [None]:
# Error function (integral of Gaussian)
# its complement, and its inverse
x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x)  =", special.erf(x))
print("erfc(x) =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))
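Two identities make these outputs easy to sanity-check: for a positive integer $n$, $\Gamma(n) = (n - 1)!$, and `erfinv` is the inverse of `erf`. A sketch that verifies both (it assumes SciPy is installed, as the cells above already do):

```python
import math
import numpy as np
from scipy import special

x = np.array([1, 5, 10])
# For positive integers n, gamma(n) equals (n - 1)!
factorials = np.array([math.factorial(int(n) - 1) for n in x], dtype=float)
print(special.gamma(x))
print(factorials)

# erfinv undoes erf, so this round trip recovers y
y = np.array([0, 0.3, 0.7])
print(special.erfinv(special.erf(y)))
```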

### Advanced UFunc Features

#### Specifying output

In [None]:
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)

In [None]:
y = np.zeros(10)
np.power(2, x, out=y[::2])
print(y)
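`out=` can also point at one of the inputs, which updates an array in place without allocating a temporary result. This has the same effect as the augmented operators such as `a *= 2`. A sketch:

```python
import numpy as np

a = np.arange(5, dtype=float)
b = np.ones(5)

np.add(a, b, out=a)        # writes the result back into a
print(a)                   # [1. 2. 3. 4. 5.]

np.multiply(a, 2, out=a)   # same effect as a *= 2
print(a)                   # [ 2.  4.  6.  8. 10.]
```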

#### Aggregates

In [None]:
x = np.arange(1, 6)
np.add.reduce(x)

In [None]:
np.multiply.reduce(x)

In [None]:
np.add.accumulate(x)

In [None]:
np.multiply.accumulate(x)
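NumPy provides dedicated shortcuts for the most common reductions and accumulations: `np.sum`, `np.prod`, `np.cumsum`, and `np.cumprod`. A check that they agree with the ufunc methods above:

```python
import numpy as np

x = np.arange(1, 6)
assert np.add.reduce(x) == np.sum(x) == 15
assert np.multiply.reduce(x) == np.prod(x) == 120
assert (np.add.accumulate(x) == np.cumsum(x)).all()         # [ 1  3  6 10 15]
assert (np.multiply.accumulate(x) == np.cumprod(x)).all()   # [  1   2   6  24 120]
print(np.cumsum(x), np.cumprod(x))
```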


#### Outer products

In [None]:
x = np.arange(1, 6)
np.multiply.outer(x, x)
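The same multiplication table can be built by combining a column vector with a row vector using `np.newaxis`, which relies on broadcasting (covered later in the chapter). A sketch comparing the two:

```python
import numpy as np

x = np.arange(1, 6)
table = np.multiply.outer(x, x)
# Same 5x5 table: a (5, 1) column times a (1, 5) row
table_bc = x[:, np.newaxis] * x[np.newaxis, :]
print(table_bc[2])    # row for 3: [ 3  6  9 12 15]
```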

## Aggregations: Min, Max, and Everything In Between

### Summing the Values in an Array

In [None]:
import numpy as np

In [None]:
L = np.random.random(100)

In [None]:
np.sum(L)

In [None]:
big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)

### Minimum and Maximum

In [None]:
min(big_array), max(big_array)

In [None]:
np.min(big_array), np.max(big_array)

In [None]:
%timeit min(big_array)
%timeit np.min(big_array)

In [None]:
print(big_array.min(), big_array.max(), big_array.sum())

#### Multi-dimensional aggregates

In [None]:
M = np.random.random((3, 4))
print(M)

In [None]:
M.sum()

In [None]:
M.min(axis=0)

In [None]:
M.max(axis=1)
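Because `M` is random, it can be hard to see what `axis` does. A deterministic sketch: `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row). It also shows a related pitfall: Python's built-in `sum()` iterates over the first axis, so on a 2-D array it returns column sums rather than a single total.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]
print(M.sum(axis=0))   # [12 15 18 21]: one value per column
print(M.sum(axis=1))   # [ 6 22 38]: one value per row

# Built-in sum() adds whole rows together, giving column sums
print(sum(M))          # [12 15 18 21]
print(M.sum())         # 66: the total over every element
```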

### Example: What is the Average Height of US Presidents?

For this section, you'll need to execute the following cell first. 

In [None]:
def array_from_url(url, column):
    import pandas as pd
    import numpy as np
    data = pd.read_csv(url)
    return np.array(data[column])

heights = array_from_url('https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/8a34a4f653bdbdc01415a94dc20d4e9b97438965/notebooks/data/president_heights.csv','height(cm)')
print(heights)

For this portion, start from the chapter's cell labeled `In [15]:`.

In [None]:
print("Mean height:       ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height:    ", heights.min())
print("Maximum height:    ", heights.max()) 

In [None]:
print("25th percentile:", np.percentile(heights, 25))
print("Median:         ", np.median(heights))
print("75th percentile:", np.percentile(heights, 75))
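By default, `np.percentile` interpolates linearly between sorted values, and the 50th percentile equals `np.median`. A sketch on small made-up inputs where the results are easy to verify by hand:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])
print(np.percentile(data, 25))                    # 2.0
print(np.percentile(data, 50), np.median(data))   # 3.0 3.0
print(np.percentile([1, 2, 3, 4], 50))            # 2.5: halfway between 2 and 3
```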

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # set plot style

In [None]:
plt.hist(heights)
plt.title('Height Distribution of US Presidents')
plt.xlabel('height (cm)')
plt.ylabel('number');
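To see the bin counts behind a histogram without plotting, `np.histogram` returns the counts and the bin edges directly. A sketch on a small made-up array:

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3])
counts, edges = np.histogram(values, bins=3)
print(counts)   # [1 2 3]
print(edges)    # bin boundaries: 1, about 1.67, about 2.33, 3
```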

---

# Section 2: Exercises

In this section, you will work through a few exercises that demonstrate your understanding of the chapter. Each exercise has a Markdown cell describing the problem; add cells below the description containing your code, comments, and any visualizations of your solution.

---

### Problem 1

Make sure you have the `array_from_url` function defined:

```python
def array_from_url(url, column):
    import pandas as pd
    import numpy as np
    data = pd.read_csv(url)
    return np.array(data[column])
```

Call the `array_from_url` function with the following arguments

- URL: `"https://raw.githubusercontent.com/jakevdp/data-USstates/master/state-areas.csv"`
- column: `"area (sq. mi)"`

to load the data into a NumPy array named `areas`.

Print out the `mean` area for all the states in the US. Use built-in methods and UFuncs where appropriate.

In [None]:
def array_from_url(url, column):
    import pandas as pd
    import numpy as np
    data = pd.read_csv(url)
    return np.array(data[column])

# URL and column for state areas
url = "https://raw.githubusercontent.com/jakevdp/data-USstates/master/state-areas.csv"
column = "area (sq. mi)"

# Load the areas into a NumPy array
areas = array_from_url(url, column)

# Calculate and print the mean area
mean_area = areas.mean()
print(f"The mean area for all the states in the US is {mean_area:.2f} square miles.")

---

### Problem 2

Using the `areas` array created above, assign the total area of the United States and D.C. to a new variable, `total_area`, by using the `sum` method of `areas`.

In [None]:
# Reuse the `areas` array loaded in Problem 1 (no need to download it again)

# Total area of the US and D.C. via the array's sum method
total_area = areas.sum()
print(f"The total area of the United States and D.C. is {total_area:.2f} square miles.")

---

### Problem 3

Using NumPy's various UFuncs, create a new array, `area_percentage`, that contains each state's area as a percentage of `total_area`.

E.g. given that Alaska's area is the second element of the array `areas` (i.e. `areas[1]`), Alaska's share of the total is `areas[1]/total_area`; multiply by 100 to express it as a percentage.

In [None]:
import numpy as np

# Reuse `areas` and `total_area` from the previous problems

# np.divide and np.multiply are the ufuncs behind the / and * operators
area_percentage = np.multiply(np.divide(areas, total_area), 100)

# Print each state's share of the total area of the US and D.C.
print("Area of each state as a percentage of the total area of the US and D.C.:")
print(np.round(area_percentage, 2))
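A quick sanity check for a percentage array is that its entries sum to 100. The sketch below applies the divide-then-multiply-by-100 formula to hypothetical toy areas (not the real state data) so the expected values are obvious:

```python
import numpy as np

# Hypothetical toy areas, chosen so the percentages are easy to verify
toy_areas = np.array([10.0, 30.0, 60.0])
toy_pct = np.multiply(np.divide(toy_areas, toy_areas.sum()), 100)
print(toy_pct)

# Percentages of a whole should add up to 100 (up to rounding error)
assert np.isclose(toy_pct.sum(), 100)
```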

### Problem 4

Print out the heights of the American Presidents in feet (rather than cm). Use UFuncs for this. You'll need to look up the formula to convert cm to feet.

In [None]:
import numpy as np

def array_from_url(url, column):
    import pandas as pd
    data = pd.read_csv(url)
    return np.array(data[column])

# Same president-heights data and column name used in Section 1
url = 'https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/8a34a4f653bdbdc01415a94dc20d4e9b97438965/notebooks/data/president_heights.csv'
column = 'height(cm)'

# Load the heights (in cm) into a NumPy array
heights_cm = array_from_url(url, column)

# 1 foot = 30.48 cm exactly, so divide with the np.divide ufunc
heights_feet = np.divide(heights_cm, 30.48)

# Print the heights in feet, rounded to two decimals
print("Heights of American Presidents in feet:")
print(np.round(heights_feet, 2))
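Heights are often quoted in feet and inches rather than decimal feet. A sketch using two hypothetical sample heights (not the presidents' data): convert to total inches (1 inch = 2.54 cm), then use the `np.divmod` ufunc, which returns the quotient and remainder together, to split the inches into whole feet plus leftover inches.

```python
import numpy as np

heights_cm = np.array([189.0, 170.0])        # hypothetical sample values
inches_total = np.divide(heights_cm, 2.54)   # 1 inch = 2.54 cm exactly
feet, inches = np.divmod(inches_total, 12)   # 12 inches per foot

for f, i in zip(feet, inches):
    print(f"{int(f)} ft {i:.1f} in")
```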
