# <center> Problem Set 0 </center>
<center> Spring 2024 </center>
<center> 3.C01/3.C51, 7.C01/7.C51, 10.C01/10.C51, 20.C01/20.C51 </center>
<center> Due: February 16, 2024 (suggested) </center>

# <center>Part 1: Introduction to Colab</center>

This notebook shows how to use Colab to write code for this class.

If you have used Jupyter Notebook or JupyterLab before, you will find Colab very similar.

First, you can try the introductory example notebook here:

https://colab.research.google.com/notebooks/intro.ipynb



Click the "Connect" button on the right-hand side to connect this notebook to a CPU runtime. Use Shift+Enter to run the following cell.

In [None]:
print("Hello world")

We expect you to submit a Colab notebook link for grading. Before submitting, make sure that the notebook runs, that the figures are labeled, and that you include comments or text explaining what each part of the code does.

Your Colab environment is a temporary computer, and you can check out its file system by clicking the `Files` button on the left sidebar. You can use standard Linux commands to navigate the system; just put a `!` in front of the command, like this:

In [None]:
! pwd 
! ls

## Programming Environment on Colab

On Colab, most packages (e.g. scikit-learn, PyTorch, numpy) are pre-installed. You can even request a GPU for computation; you will learn how to do this in PSet 2.

In [None]:
import sklearn
import numpy
import torch

## Saving your work 

You can save your work on Colab. There are three options:

* Save in Google Drive: File -> Save in Drive 
* Save on GitHub: File -> Save a copy in GitHub
* Save locally: File -> Download .ipynb or .py 

Note that a Colab session will time out if it is left idle for too long. When that happens, all cached data and variables are lost and you will need to rerun the notebook.

## Installing additional packages

Sometimes you will need to install additional packages to finish a project. You can install a package with pip by putting a `!` in front of the command, e.g. `!pip install scikit-learn` (scikit-learn is used here only as an example; it is already pre-installed on Colab).

However, Anaconda (conda) is not pre-installed in the Colab environment. We will provide additional instructions for installing conda-only packages on Colab.


In [None]:
!pip install scikit-learn

## Uploading your data/files to the Colab virtual environment

Click the file icon in the sidebar, then click the upload button and choose your files.




## Cloning a git repository 

Sometimes you will want to clone an existing repository from GitHub or GitLab. You can do this by typing `!git clone <repo url>`:

In [None]:
!git clone https://github.com/wwang2/thermo-notes.git

Note that in Colab you have to clone over HTTPS, not SSH, since your Colab environment has no SSH key pair set up.

## Some basic computations

Now let's try running some simple programs. First some basic computation examples:

In [None]:
x = 1+1
print(x)

In [None]:
for i in range(10):
    print(i)

In [None]:
i

As you can see, variables persist across cells, just like in any notebook: `i` still holds 9, its value from the last iteration of the loop above. This means that you can change the outcome of a cell by going back and rerunning a previous cell.

### Warning: 
Please be careful to rerun all subsequent cells if you change a previous one! If you don't do this consistently, you can get some very confusing bugs. When you are submitting code, please make sure your cells are ordered correctly; your TAs will be very confused if your cells must be run out of order for your code to work.

You can also define and use functions (or classes).

In [None]:
def multiply(x, y):
    return x*y

In [None]:
multiply(2, 3)

## Visualizations

One advantage of Colab is that we can show plots right underneath our code. Look at the following plots as examples (don't worry about the code for now).

In [None]:
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-10, 10, 0.01)
plt.plot(x, x**2, c='red');

In [None]:
x = np.arange(-10, 10, 0.1)
rand_data = x + 5*np.random.rand(*np.shape(x))
plt.scatter(x, rand_data, c='blue');

This is very useful in data science and machine learning for visualizing datasets. For most of this class, we will provide you with plotting routines you can use to visualize your data. If you are interested in learning more about plotting with Python, check out the matplotlib package: https://matplotlib.org/stable/tutorials/introductory/usage.html.

# <center>Part 2: Introduction to numpy</center>

## Creating numpy arrays

Just like any other Python library, you can import numpy as follows.

In [None]:
import numpy as np

Creating a numpy array is as simple as wrapping a Python list with a np.array() command. You can then print the numpy array.

In [None]:
x = np.array([1, 2, 3])
print(x)

You can create a numpy array from elements of different types, but they will all be cast to a common type (in this case, a string).

In [None]:
a = np.array([1, 'test'])
print(a)
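To see which common type numpy picked, you can inspect an array's dtype attribute. A short sketch (the variable names are illustrative), which also shows the dtype argument for requesting a type explicitly:

```python
import numpy as np

# Mixing an int and a string: everything becomes a fixed-width Unicode string
mixed = np.array([1, 'test'])
print(mixed.dtype.kind)  # 'U' means Unicode string

# Mixing ints and floats: everything is upcast to floating point
numbers = np.array([1, 2.5, 3])
print(numbers.dtype)  # float64

# You can also request a type explicitly with the dtype argument
as_float = np.array([1, 2, 3], dtype=float)
print(as_float)
```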

You can also create multidimensional arrays in numpy, i.e. arrays where each element is itself an array. This can be used to represent 2D matrices or 3D tensors.

In [None]:
y = np.array([[1, 2, 3], [4, 5, 6]])
print(y)

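numpy does not support multidimensional arrays whose sub-arrays have different lengths, i.e. ragged arrays. Since numpy 1.24, passing a ragged nested list to np.array raises a ValueError unless you explicitly pass dtype=object. With dtype=object, as in this example, the elements of the numpy array are Python lists, so it is not a true multidimensional array.

In [None]:
# dtype=object is required for ragged input in numpy >= 1.24
z = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(z)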

All numpy arrays have an important attribute called shape, which describes their dimensions. The shape is a tuple giving the length of the array along each dimension.

In [None]:
print(f"x's shape is: {x.shape} and y's shape is {y.shape}.")
print(f"This is weird: z's shape is {z.shape}.")

Be sure you understand why z's shape is (2,) above.
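Alongside shape, two related attributes are often handy: ndim (the number of dimensions) and size (the total number of elements). A minimal sketch comparing them, plus Python's len(), which only looks at the first axis:

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6]])
print(y.shape)  # (2, 3): 2 rows, 3 columns
print(y.ndim)   # 2: the number of dimensions, i.e. len(y.shape)
print(y.size)   # 6: the total number of elements
print(len(y))   # 2: len() only reports the length of the first axis
```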

There are numpy commands to quickly create an array of all zeroes or ones, depending on the desired shape.

In [None]:
print(np.zeros((3, 4)))
print(np.ones((2, 3)))

It is occasionally useful to create an array that is a list of numbers spanning a given range, like the range() command in Python. The np.arange command does that and also gives you the option of specifying the step size.

In [None]:
print(np.arange(0, 10))
print(np.arange(0, 10, 2))
print(np.arange(0, 1, 0.1))
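With a non-integer step such as 0.1, floating-point rounding can make the length of the np.arange output hard to predict, and the stop value is never included. For such cases the numpy documentation suggests np.linspace, which takes the number of points rather than the step size. A short sketch:

```python
import numpy as np

# 11 evenly spaced points from 0 to 1, endpoint included by default
points = np.linspace(0, 1, 11)
print(points)

# endpoint=False excludes the stop value, similar to np.arange(0, 1, 0.1)
points_open = np.linspace(0, 1, 10, endpoint=False)
print(points_open)
```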

To turn a numpy array back into a list, simply use list() or .tolist().

In [None]:
list(x)
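The two are not identical: list() only converts the outermost axis and keeps numpy's own element types, while .tolist() converts recursively into nested Python lists of plain Python numbers. The difference shows up with a 2D array:

```python
import numpy as np

y = np.array([[1, 2], [3, 4]])

outer = list(y)      # a Python list holding two 1-D numpy arrays
nested = y.tolist()  # [[1, 2], [3, 4]] built from plain Python ints

print(type(outer[0]))      # <class 'numpy.ndarray'>
print(type(nested[0][0]))  # <class 'int'>
```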

## Basic elementwise operations in numpy

Most standard arithmetic operations in numpy apply elementwise. This includes the * operator, which computes an elementwise product, not a dot product.

In [None]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
print(f"x+y is: \n{x+y}")
print(f"x-y is: \n{x-y}")
print(f"x*y is: \n{x*y}")
print(f"x^2 is: \n{x**2}")
print(f"x/y is: \n{x/y}")
print(f"sin(x) is: \n{np.sin(x)}")

If you want the matrix (dot) product instead, use the @ operator or the .dot method.

In [None]:
x.dot(y)

In [None]:
x@y
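To make the difference concrete, here is a small sketch that computes both products for the same x and y and checks one entry of the matrix product by hand:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

elementwise = x * y  # [[ 5, 12], [21, 32]]
matrix = x @ y       # [[19, 22], [43, 50]]

# Entry (0, 0) of the matrix product is row 0 of x dotted with column 0 of y
manual_00 = x[0, 0] * y[0, 0] + x[0, 1] * y[1, 0]  # 1*5 + 2*7 = 19
print(matrix[0, 0] == manual_00)  # True
```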

Other useful operations in numpy are the comparison operators (>, <, <=, ==, and so on), which produce boolean arrays. We'll see why this is useful when discussing indexing.

In [None]:
print(f"x > 3 is: \n {x>3}")
print(f"x <= 2 is: \n {x<=2}")
print(f"x == 2 is: \n {x==2}")
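Since True counts as 1 and False as 0 in arithmetic, summing or averaging a boolean array is a quick way to count how many elements satisfy a condition. A minimal sketch using the same x as above:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

mask = x > 1              # [[False, True], [True, True]]
count = np.sum(mask)      # 3 elements are greater than 1
fraction = np.mean(mask)  # 0.75 of the elements are greater than 1
print(count, fraction)
```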

Many common arithmetic operations, like exp, sqrt, log, sin, cos, tan, and others, are implemented as functions in numpy (e.g. np.exp, np.sqrt, etc.). You can check the documentation if you want to use any of these operations.
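For instance, a few of these functions applied to a small array (a sketch; the values are arbitrary):

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(values)    # [1., 2., 3.]
logs = np.log(values)      # natural logarithm, applied elementwise
round_trip = np.exp(logs)  # exp undoes log (up to floating-point error)

print(roots)
print(np.allclose(round_trip, values))  # True
```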

## Indexing and slicing

One of the biggest advantages of numpy over standard Python lists is the number of different ways you can index into an array. Standard Python indexing with a single index works as normal, but you can also provide a range with a colon.

In [None]:
x = np.arange(10)
print(f"x is {x}")
x[2]

In [None]:
x[2:10]

In [None]:
x[2:8]

Note that the end of the range is always excluded from the output.

You can select every Nth element of a range by adding a third value N (the step) to your range specification, as in x[2:8:2]. You can make a range run backwards by using a step of -1 instead.

In [None]:
x[2:8:2]

In [None]:
x[8:2:-1]
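Any of the three parts of a slice can be left out: a missing start or stop means "from the beginning" or "to the end" (in the direction of the step), and a missing step means 1. This gives a few handy idioms:

```python
import numpy as np

x = np.arange(10)

evens = x[::2]        # every 2nd element: [0 2 4 6 8]
reversed_x = x[::-1]  # the whole array backwards: [9 8 7 ... 0]
first_three = x[:3]   # [0 1 2]
last_three = x[-3:]   # [7 8 9]
```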

You can also specify the very last element by using an index of -1, the second-to-last element using an index of -2, and so on.

In [None]:
x[-1]

In [None]:
x[-2]

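Unlike with Python lists, you can assign a single value to every element of a slice of a numpy array. With a list, the same assignment raises a TypeError, since Python expects an iterable of replacement values.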

In [None]:
x[2:8:2] = 1000
print(f"x is {x}")
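A related subtlety: a basic slice of a numpy array is a view that shares memory with the original, so writing into the slice also changes the original array. If you need an independent array, call .copy(). A small sketch:

```python
import numpy as np

x = np.arange(10)

view = x[2:5]
view[0] = -1  # writes through to the original: x[2] is now -1
print(x)

independent = x[2:5].copy()
independent[0] = 99  # only the copy changes; x[2] is still -1
print(x)
```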

Indexing in numpy is most powerful on multidimensional arrays, where you can index separately along each axis by using a comma to separate the indexing arguments. The first index selects along the outermost axis (the rows of a 2D array), the second along the next axis (the columns), and so on if you have more dimensions.

In [None]:
y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(f"y is: \n {y}")
print(f"y[3, 2] is {y[3, 2]}")
print(f"y[3, 1] is {y[3, 1]}")
print(f"y[1, 0] is {y[1, 0]}")

In [None]:
y[1:3, 0:2]

A bare colon in an index means take every element along that axis.

In [None]:
y[1:3, :]
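Indexing an axis with a single integer removes that axis, while a slice keeps it, even when the slice has length 1. This matters when you pull out a single column:

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

col_1d = y[:, 1]    # integer index: shape (4,), values [2 5 8 11]
col_2d = y[:, 1:2]  # length-1 slice: shape (4, 1), still a 2D column
print(col_1d.shape, col_2d.shape)
```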

Beyond indexing with ranges and single elements, you can index an array with another array! If you use a Boolean array with the same shape as the given array, you'll get the elements where the Boolean array is True.

In [None]:
x = np.arange(10)
index_arr = np.array([True, True, False, False, True, False, False, False, False, False])
x[index_arr]

In [None]:
y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
index_arr = np.array([[True, True, False], [True, False, False], [False, True, False], [False, False, True]])
y[index_arr]

Note that with a multidimensional array, the output is automatically flattened into a 1-D array. 

This operation is particularly useful when combined with a comparison operator to extract particular elements of an array.

In [None]:
y[y > 5]

Make sure you understand the above example.
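To combine several conditions, use & (and), | (or), and ~ (not) rather than Python's and/or/not keywords, which do not work elementwise on arrays. Each comparison needs its own parentheses, because & and | bind more tightly than the comparison operators. A sketch on the same y:

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

between = y[(y > 3) & (y < 8)]     # [4 5 6 7]
outside = y[(y <= 3) | (y >= 11)]  # [ 1  2  3 11 12]
odd = y[~(y % 2 == 0)]             # [ 1  3  5  7  9 11]
```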

You can also index an array with an array of indices. See below for an example.

In [None]:
index_arr = [0, 1, 5]
x[index_arr]

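With a 2D array, you can use two arrays of indices: the first gives the row index and the second gives the column index of each element to select, so the output here is [y[1, 2], y[0, 1], y[3, 0]]. Both arrays must have the same length, and the output has one element per (row, column) pair. This extends to arrays with more dimensions as well, with one index array per axis.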

In [None]:
index_arr1 = [1, 0, 3]
index_arr2 = [2, 1, 0]
y[index_arr1, index_arr2]
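Concretely, the two index arrays are paired up element by element, and each (row, column) pair picks one entry. The sketch below checks this against an explicit loop over the pairs:

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
rows = [1, 0, 3]
cols = [2, 1, 0]

fancy = y[rows, cols]  # picks y[1, 2], y[0, 1], y[3, 0]
looped = np.array([y[r, c] for r, c in zip(rows, cols)])
print(fancy, looped)
```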

## Reshaping arrays

numpy has a large number of functions used for reshaping arrays. We'll cover some of the most important ones you will want for the psets.

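np.ravel allows you to flatten an array. By default it reads the elements in row-major order, i.e. the last index changes fastest, so each row is read in full before moving on to the next.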

In [None]:
np.ravel(y)
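The reading order can be changed with the order argument: the default order='C' is row-major, while order='F' (column-major, as in Fortran) walks down each column instead. A small sketch:

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6]])

row_major = np.ravel(y)             # [1 2 3 4 5 6]
col_major = np.ravel(y, order='F')  # [1 4 2 5 3 6]
```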

np.reshape allows you to reshape an array into a specified shape (as long as the total number of elements remains the same). If you do not want to compute all the dimensions of the new array, you can leave one dimension as -1 and numpy will compute the appropriate number for you.

In [None]:
np.reshape(y, (6, 2))
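Using -1 for one dimension lets numpy infer it from the total number of elements. A sketch with the same 12-element y, showing both the function and the .reshape method form:

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # 12 elements

a = np.reshape(y, (-1, 2))  # numpy infers 12 / 2 = 6 rows -> shape (6, 2)
b = y.reshape(3, -1)        # method form; infers 4 columns -> shape (3, 4)
print(a.shape, b.shape)
```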

np.transpose or .T allows you to transpose an array. For a 2D array this swaps the first and second dimensions, just like matrix transposition; for arrays with more dimensions, the default is to reverse the order of all axes.

In [None]:
np.transpose(y)

In [None]:
y.T
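For arrays with more than two dimensions, .T reverses all axes, while np.transpose also accepts an explicit permutation of the axes. A sketch that only looks at the resulting shapes:

```python
import numpy as np

t = np.zeros((2, 3, 4))

print(t.T.shape)                         # (4, 3, 2): all axes reversed
print(np.transpose(t, (1, 0, 2)).shape)  # (3, 2, 4): only the first two swapped
```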

Another useful method of constructing arrays is by stacking them. numpy offers two functions for this: np.hstack and np.vstack, which operate horizontally and vertically, respectively. They can work on an arbitrary number of arrays.

In [None]:
a = np.array([[0, 1], [2, 3]])
b = np.array([[4, 5], [6, 7]])
np.hstack((a, b))

In [None]:
np.vstack((a, b))

Your arrays must have compatible dimensions (i.e. the same length in every axis aside from the one being stacked) for this operation to work.

Another useful operation is np.concatenate, which concatenates multiple arrays along the specified axis. 0 refers to the 0th axis (the outer axis) and 1 refers to the 1st axis (the inner axis).

In [None]:
np.concatenate((a, b), axis=0)

In [None]:
np.concatenate((a, b), axis=1)
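For 2D arrays like a and b, np.vstack and np.hstack give the same results as np.concatenate along axis 0 and axis 1, respectively. A quick check:

```python
import numpy as np

a = np.array([[0, 1], [2, 3]])
b = np.array([[4, 5], [6, 7]])

same_vertical = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
same_horizontal = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
print(same_vertical, same_horizontal)                    # True True
print(np.vstack((a, b)).shape, np.hstack((a, b)).shape)  # (4, 2) (2, 4)
```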

Occasionally (for a variety of reasons), you might end up with an axis of length 1 in a numpy array. Eliminate it with the squeeze() function.

In [None]:
c = np.array([[1], [2]])
np.squeeze(c)
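Checking the shapes makes the effect of squeeze visible. The sketch also shows the reverse operation, indexing with np.newaxis (an alias for None) to add a length-1 axis back:

```python
import numpy as np

c = np.array([[1], [2]])
print(c.shape)         # (2, 1)

squeezed = np.squeeze(c)
print(squeezed.shape)  # (2,)

# Indexing with np.newaxis inserts a length-1 axis at that position
restored = squeezed[:, np.newaxis]
print(restored.shape)  # (2, 1)
```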

## Other important numpy operations

There are a number of numpy operations, like np.sum, np.max, np.argmax, and so on, that operate on the whole array and return a scalar.

In [None]:
np.sum(y)

In [None]:
np.max(y)

In [None]:
np.argmax(x)

In [None]:
np.mean(y)

In [None]:
np.std(y)

In [None]:
np.argmax(y)

On its own, np.argmax is not very useful for a multidimensional array, because it returns an index into the flattened array. The following code snippet converts that flat index into the (row, column) indices you want.

In [None]:
np.unravel_index(np.argmax(y, axis=None), y.shape)
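For a 2D array with row-major flattening, the conversion is plain integer arithmetic: the row is the flat index divided by the number of columns, and the column is the remainder. A sketch comparing that by-hand version with np.unravel_index:

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

flat = np.argmax(y)                  # 11: position of 12 in np.ravel(y)
row, col = divmod(flat, y.shape[1])  # manual conversion for a 2D array
print(row, col)                      # 3 2
print(np.unravel_index(flat, y.shape))
```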

All of these functions can also be applied along an individual axis, in which case you get the sum, max, etc. of each column (axis=0) or of each row (axis=1).

In [None]:
np.sum(y, axis=0)

In [None]:
np.sum(y, axis=1)

In [None]:
np.max(y, axis=0)

In [None]:
np.max(y, axis=1)

In [None]:
np.mean(y, axis=1)


In [None]:
np.std(y, axis=1)

It's easy to get confused here with which axis is which, so always be sure to check your code.
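One way to remember the convention: the axis you pass is the one that gets collapsed. For the 4-by-3 array y, axis=0 collapses the 4 rows and leaves one value per column, while axis=1 collapses the 3 columns and leaves one value per row. Passing keepdims=True keeps the collapsed axis with length 1, which makes the result line up with the original array in later arithmetic. A sketch:

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

col_sums = np.sum(y, axis=0)                    # shape (3,): [22 26 30]
row_sums = np.sum(y, axis=1)                    # shape (4,): [ 6 15 24 33]
row_sums_2d = np.sum(y, axis=1, keepdims=True)  # shape (4, 1)

# With keepdims, each row of y is divided by its own sum
row_fractions = y / row_sums_2d
print(row_fractions.sum(axis=1))  # each row now sums to 1
```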

Other useful functions that work similarly include np.min, np.argmin, and np.median.

When dealing with Boolean arrays, np.all and np.any tell you whether all or any of the elements are True.

In [None]:
print(np.all(y == 5), np.any(y == 5), np.all(y < 13))
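Like the reductions above, np.all and np.any accept an axis argument, which answers the question per row or per column. A short sketch on the same y:

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

rows_with_big = np.any(y > 10, axis=1)  # per row: [False False False  True]
cols_all_pos = np.all(y > 0, axis=0)    # per column: [ True  True  True]
```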

np.sort allows you to sort a given array (assuming its elements are sortable), and np.argsort returns the indices that would sort the array.

In [None]:
z = np.array([1, 4, 6, 8, 3, 2, 5, 7, 9])
np.sort(z)

In [None]:
print(np.argsort(z), z[np.argsort(z)])
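Because np.argsort returns indices rather than values, you can use it to reorder one array according to another, for example sorting labels by their scores (the data below is hypothetical). Reversing the index array with [::-1] gives descending order:

```python
import numpy as np

# Hypothetical example data: labels and the scores that go with them
labels = np.array(['b', 'c', 'a', 'd'])
scores = np.array([0.5, 0.9, 0.1, 0.7])

order = np.argsort(scores)   # indices that sort scores ascending: [2 0 3 1]
print(labels[order])         # labels by increasing score: ['a' 'b' 'd' 'c']
print(labels[order[::-1]])   # labels by decreasing score: ['c' 'd' 'b' 'a']
```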

numpy has a built-in random number generator (RNG), which you can access through np.random. For example, np.random.choice draws random elements from a 1-D array. The size argument sets the shape of the output (here, a 5-by-2 array of samples), and the replace argument determines whether selection is done with or without replacement.

In [None]:
np.random.choice(x, size=(5, 2), replace=True)
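Each run of the cell above draws different numbers, which can make results hard to reproduce. One common approach, sketched below, is numpy's Generator interface (np.random.default_rng, available since numpy 1.17) with a fixed seed; the seed value 0 is an arbitrary choice:

```python
import numpy as np

x = np.arange(10)

rng = np.random.default_rng(seed=0)
samples = rng.choice(x, size=(5, 2), replace=True)  # 5-by-2 array drawn from x
print(samples)

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=0)
repeat = rng_again.choice(x, size=(5, 2), replace=True)
print(np.array_equal(samples, repeat))  # True

# Without replacement, every selected element is distinct
no_repeats = rng.choice(x, size=5, replace=False)
```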

If you're interested in learning more about numpy functions, please look them up in the official numpy documentation or tutorials.


# **The Takeaways:**
* numpy has plenty of useful pre-defined functions. Before you implement any piece of code yourself, check the numpy documentation to see if you can build on what has already been written!
* Everything in numpy, from indexing to function calls, is optimized for multidimensional arrays, so be mindful of which axis and shape you are working with.
* Google Colab is a great resource. Think of it, and Jupyter notebooks in general, as a foundational tool for data science.
