# CHEM60 - Class 0 (Computing in Colab)

Welcome to computing for chemistry in Colab! This notebook will serve as the first of our in-class exercises together, and this one is all about getting familiar with the environment we'll be working in - a Jupyter Notebook hosted on Google Colab.

To get started, click on '**File**' in the left menu, then '**Save a copy in Drive**' to ensure you are editing *your* version of this assignment (if you don't, your changes won't be saved!). I strongly recommend creating a directory (folder) in your personal Drive to store your coursework (don't just save them into the generic `Colab Notebooks` directory Drive makes for you because that will get messy if you use Colab regularly).

![Screenshot of Menu to Save a Copy in Drive](https://kavassalis.space/s/save_a_copy_in_drive_CHEM60.png)



After you click '**Save a copy in Drive**' a popup that says **Notebook copy complete** should appear, and it may ask you to <font color='blue'>**Open in a new tab**</font>. When open, your new file will be named `Copy of CHEM60_Class_0_SP24.ipynb` (you may want to rename it before/after you move it to your chosen directory).

# What is a Jupyter Notebook again?

Most of you have used Jupyter Notebooks already, whether in CS5 or personal projects. But what exactly makes them so valuable, and why would we structure our entire course around their use? Jupyter notebooks are an *interactive* computing platform - more broadly, they are known as computational notebooks. Computational notebooks allow us to weave together data, code, and commentary effortlessly to tell an interactive story. Whether we are delving into a large data set produced by a mass spectrometer or visualizing output from a complex numerical model — these notebooks can merge textbook-like explanations with the active engagement of a software application.



The key thing to remember - notebooks merge code and meaningful text. This is not the same as well-commented code. This is the merger of computation *and* communication. Computational devices are an intrinsic part of our lives, and merging natural language with computing opens up unique avenues for scientific exploration, presentation of concepts, and enriched learning experiences.

Jupyter Notebooks are one variant of a computational notebook ([Wolfram](https://www.wolfram.com/computational-notebooks/) makes a popular, non-open-source alternative). They are managed by [Project Jupyter](https://jupyter.org) and they actually support many programming languages. We'll be using Python exclusively in this course, in part because it is a language common to all Mudd students and in part because the tools for scientific computing and chemical simulation are particularly well-developed for Python.

Unlike the locally hosted Jupyter notebooks you may have used in CS5, these notebooks are being run through Google Colaboratory ("Colab"). Google Colab is a platform that allows you to write and execute notebooks in your browser, with no need for a local setup or Python installation. Plus, it provides free access to computing resources that exceed what many folks have available to them locally and has many standard libraries pre-installed. You can even leave 'Comments' like in a Google Doc!

# Basic Structure of a Jupyter Notebook

You will be opening and working through multiple notebooks a week in this course (for our in-class work and your homework). You will also be creating your own notebook for your final project! The basic structure of a well formatted notebook is below (although there are always exceptions possible for specific use cases - tutorials, in fact, often import things as you go, rather than up top):

- Preamble Text (written as Markdown or "Text" boxes in Colab speak)
- Installs (if needed)
- Imports
- Exploration!

A notebook consists of various units, also known as cells. These cells can encompass explanatory text, executable code, and the results it yields. Clicking on a cell will select it. Let's first get some practice with the two main cells you'll be using: Markdown (Text) and Code.

## Text

*This* represents a **text cell**. It can be edited by **double-clicking** on it (you can delete and edit the things I write!). Text cells use *markdown* syntax which you can see more examples of [here](https://colab.research.google.com/notebooks/markdown_guide.ipynb).


### Practice QUESTION 1
**Make a new text box *between* the two horizontal lines below and tell us your name *and* one skill you are excited to learn in this course!**

You can use the +Text button. Try out any formatting you would like.

---





---


### Practice QUESTION 2


Mathematical equations can be inserted into text cells using [$\LaTeX$](http://www.latex-project.org/). Simply enclose the formula
within two `$` symbols. For instance, `$(\sqrt{3\mu}-1)^2$` will be displayed as $(\sqrt{3\mu}-1)^2$. Click through to learn more on how to format $\LaTeX$ in markdown [here](https://colab.research.google.com/github/bebi103a/bebi103a.github.io/blob/master/lessons/00/intro_to_latex.ipynb#scrollTo=qAJQER9D3tX5).


**Write 2 equations of your choosing in $\LaTeX$. One should be inline (using the `$ [equation] $` syntax) and one should be centred on a separate line (using the `\begin{align}` syntax).**


---



**Double click on me to write your two equations**



---

You will get a lot more practice with markdown as we go on in the course.

## Code

The section below is a **code cell**. Once the toolbar button shows CONNECTED on the top right-hand side, you can select the cell and run its content by either:

* Clicking the **Play icon** on the cell's left margin
* Pressing **Cmd/Ctrl+Enter** to run the cell where it's positioned

Look at the **Runtime** menu for extra options for running some or all cells.

In [1]:
chemistry = 'amazing'
print('chemistry is', chemistry)

chemistry is amazing


Code cells should always be mixed with text cells - notebooks aren't the same as a .py file where code follows code. You want to break code up with narrative explanation - not 'doc string' kind of things, but why we are doing what we are doing kind of things.

### Practice QUESTION 3

**In the Code cell below, write some code! It can be anything you want that Python can do. For a Python refresher, you can look at this tutorial I wrote for pre-College students [here](https://colab.research.google.com/drive/1M1M1AuqMUQR9CfMK7A1m9_7TKYUx2Mvd?usp=sharing).**


---



In [None]:
# Write something here! Maybe several lines!



---

I hope you wrote some comments too!

You can also use Python [`magics`](https://ipython.readthedocs.io/en/stable/interactive/magics.html), if that's your thing. (I... have never had a particularly good reason to use them in a notebook, but if you are familiar with the `%%` syntax, you are more than welcome to include them.) You can run this example below:

In [2]:
%%html
<marquee style='width: 45%; color: #d95f02;'><b>Why is this happening?!</b></marquee>

### Practice QUESTION 4
You can access `help` using the `?` syntax at the end of a line. The below code should bring up details on the object `chemistry`.

In [3]:
chemistry?

This is honestly a lot more useful when asking about functions, or libraries. **Ask for help about some object related to your code sample above (and describe what information you gained).**
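The `?` syntax only works inside notebooks (it comes from IPython). In plain Python, the built-in `help()` function gives similar information. Here is a small sketch, using `str.upper` as an arbitrary example object (swap in whatever you are curious about):

```python
# help() works anywhere Python runs, not only in notebooks
help(str.upper)

# The same text is stored on the object itself as its docstring
doc = str.upper.__doc__
print(doc)
```
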


---



In [None]:
# what are you asking for help with?

**Your written answer here!**



---



## Installs

"Installs"? Installing what? I thought Colab came with everything we needed! Well... not everything. While Colab has many popular libraries pre-installed, we sometimes find ourselves needing to install things above and beyond what the Colab Python image offers. A reminder, a library is a collection of pre-written code modules that you can use to simplify your coding process. These modules or packages include a variety of pre-written functions, classes, etc. that allow us to perform many tasks without writing exhaustive code ourselves. Libraries help save time and effort by providing ready solutions to common programming tasks. Some of the most commonly used Python libraries that you have potentially used before are:

1. **NumPy**: NumPy, short for 'Numerical Python', is the fundamental package for numerical computation in Python. It provides support for arrays (including multidimensional arrays), as well as functions for mathematical operations such as linear algebra, random number generation, Fourier transform, and more.

2. **Pandas**: This library is useful for data manipulation and analysis. It provides the data structures and functions needed to work with structured data, including tools for manipulating tables, handling time series, and aggregating data.

3. **Matplotlib**: Matplotlib is a plotting library. It allows you to create static, animated, and interactive visualizations in Python, offering an interface that mimics MATLAB's graphical plotting.

These are already here and we don't need to install them. When we do need to do installs, we can do it directly into the notebook using install statements, typically leveraging the package manager `pip`.

Any library installed directly in a notebook will only be available within that specific notebook session. When you end the runtime (with free Colab accounts, this happens every time you close the notebook), you lose whatever you have installed (so you have to run your installs and imports each time). This has some advantages though (when you are working with many libraries, sometimes they don't play nice together and you can break stuff - when working with libraries locally, using Docker or a conda environment is strongly recommended. Colab is our virtual environment, so we don't have to worry about breaking things).


### Executing Install Statements

To install a new package in a notebook, preface the usual pip install command with an exclamation mark (!). This tells the notebook to execute the command as a system command line (shell) command, not as a Python statement.

```python
!pip install numpy
```

This will download and install the numpy package. We don't want to do this for numpy though (because it's here already), so we won't actually run this in a code block.
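If you are not sure whether a library is already available in your runtime, you can check before installing anything. Below is a minimal sketch using the standard library; `is_installed` is a helper name I made up for illustration (it is not part of Colab or pip), and the last package name is deliberately fake.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported in this runtime."""
    return importlib.util.find_spec(module_name) is not None


# `is_installed` is a hypothetical helper, not a Colab feature
for name in ["os", "numpy", "surely_not_a_real_package_xyz"]:
    status = "available" if is_installed(name) else "needs a pip install"
    print(name, "->", status)
```

Only the libraries reported as missing would need a `!pip install` line at the top of the notebook.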



### Using Versions

It is possible to specify the version of the library to be installed. This is important when your notebook requires a specific version of a library. Google Colab periodically updates versions for standard libraries too, so sometimes functionality of code will change and you may want to go back to a specific library version. Sometimes libraries have careful dependencies and they require specific versions to play nice together. If this comes up in this course, I'll make a note of it (it's just good to know right now).

```python
!pip install numpy==1.18.5
```

You'll get familiar with running 'installs' at the top of notebooks when needed (I don't like to do installs when not needed because it is wasteful of time and energy, so no need to run them today).
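Before pinning a version, it helps to know which version you currently have. Here is one way to check, sketched with the standard library's `importlib.metadata`; `installed_version` is a made-up helper name, and the last package name is intentionally fake to show what happens when something is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# `installed_version` is a hypothetical helper for illustration
for name in ["numpy", "pandas", "surely-not-a-real-package-xyz"]:
    print(name, installed_version(name))
```

Many libraries also expose a `__version__` attribute (for example `numpy.__version__`), which is a quick alternative once the library is imported.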

## Import Statements

The `import` statement allows us to incorporate external library functionalities in our notebook code. These libraries can range from Python's standard library modules to third-party packages. A robust understanding of the best practices in organizing import statements will enhance the readability, simplicity, and maintainability of our code (we should all strive to make code that works *and* is nice to read and share - notebooks are really meant for sharing, so standards and readability are critical). The widely accepted conventions surrounding import statements are from [PEP 8](https://peps.python.org/pep-0008/), Python's style guide. Below, we discuss these best practices:



### Import Order

According to PEP 8 guidelines, import statements should be grouped in the following order (with an empty line separating each group):

1. **Standard library imports**: This includes Python's built-in libraries. For example, `import os`, `import math`, etc.

2. **Third party imports**: These are the libraries that are not Python built-ins but are installed and used for your project. They might include libraries like `import numpy as np`, `import pandas as pd`, etc.

3. **Local application/library specific imports**: This includes modules you or I wrote that aren't published as maintained or supported libraries, like `from my_lib import my_function`.


### Import Styles

There are three styles to import:

1. **Whole Module Import** (e.g., `import os`): This style imports the whole module and you access the functions or classes using the module name.

2. **Specific Items Importing** (e.g., `from os import path`): Here, you import only specific items from the module. This can make code easier to read but may cause confusion with local variables (don't name an object in the notebook `path` then!).

3. **Importing everything from a module** (e.g., `from os import *`): This is discouraged as it can clutter the namespace, making it unclear what's imported and where certain symbols come from. Every public name in the `os` module (its functions, constants, and submodules like `path`) will be imported under its own name, and some of them, like `os.open`, will quietly replace Python built-ins of the same name. If you do this with every library, things can get confusing.
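To see the difference between the three styles, here is a small sketch. The star import is run inside a throwaway dictionary (`scratch`, a name chosen just for this example) so that it does not clutter your actual notebook namespace.

```python
import math
from math import pi

# Style 1: whole-module import, accessed through the module name
print(math.sqrt(16.0))

# Style 2: specific item, accessed directly by its own name
print(pi)

# Style 3 (discouraged): run the star import in a scratch namespace
scratch = {}
exec("from os import *", scratch)

# Count how many names arrived (ignoring Python's internal dunder entries)
star_names = [name for name in scratch if not name.startswith("__")]
print(len(star_names), "names were imported from os")

# One of those names is `open`, which is os.open, not the built-in open
shadows_open = scratch["open"] is not open
print("built-in open would be shadowed:", shadows_open)
```

The exact number of names depends on your operating system and Python version, which is part of why star imports are hard to reason about.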



### Aliases

If a module's name is complex or we just know we're going to call many things from it, we can create an alias to simplify it using the `as` keyword (e.g., `import numpy as np`, `import pandas as pd`). Many of these are standard and you'll see lots of examples online using `np` and `pd` if you end up seeking out other tutorials.
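An alias is just a second name for the same module, nothing more. A tiny sketch with the standard library (the alias `m` is made up for this illustration and is not a community convention the way `np` and `pd` are):

```python
import math
import math as m  # `m` is a hypothetical alias, used here only to demonstrate `as`

# Both names point to the very same module object
print(m is math)

# So functions can be reached through either name
print(m.sqrt(9.0))
```
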



### Final Recommendations

- Within each section, list each library in alphabetical order. When imports are long, it helps to have them in an easy order to find things.
- Import each library on a separate line. This makes it easier to track what libraries are used and to remove or add new ones.
   
By adhering to these conventions, we can ensure our code is more readable (making it easier to share and debug). I am a big fan of teaching good code practices, and some weeks we'll have reason to dig more into it. I recommend the [The Good Research Code Handbook](https://goodresearch.dev/) if you are excited about this.

Here is an example of a perfectly fine, if a bit boring, import statement section.

In [None]:
# Standard library imports
import math
import os
import sys

# Third party imports
import pandas as pd

A few things to note:

1. Comments are added to denote each group of imports, improving readability (this gets important when you have many things in each section).

2. Standard library imports (like `math`, `os`, and `sys`) are at the top, followed by third party imports (like `pandas`).

3. Each import is on a new line, and standard aliases are used concisely for frequently used libraries like pandas.

4. There is a clear separation (an empty line) between different groups of imports.

Here is an example of a *poorly* formatted import code block:

In [None]:
import matplotlib.pyplot as plt
from os import *
from math import pi, e
import numpy, pandas as pd
import sys, os

### Practice QUESTION 5

**Rewrite the above import statements to follow best practices:**


---



In [None]:
# write them here!



---



### Practice QUESTION 6

Thinking about the above line:


```python
import numpy, pandas as pd
```

**How would we access functions from within `numpy` if it is written like this?** Try using the help syntax - `numpy?`. Try creating a numpy [array](https://numpy.org/doc/stable/reference/generated/numpy.array.html#numpy.array) to test things out. `numpy.array?` might be helpful too!


---




In [None]:
# Maybe you want some code to try things out?

**Write some text here**



---



## Loading Data into Google Colab

Google Colab is designed to integrate with Google Drive. This means you can store your datasets on your Google Drive and access them directly in your Google Colab notebooks (for the in-class work and homework, you'll access data stored in the class Shared Drive). To access files in your Drive, you must first "mount" the Drive onto the notebook.



### Mounting the Drive

When we "mount" the drive, we are essentially linking our Drive to our notebook, allowing us to read files from and write files to our Google Drive directly. This connection can be established using this code snippet:

```python
from google.colab import drive
drive.mount('/content/gdrive')
```

Running this code will display a prompt to authorize this notebook to access your Drive (actually, several prompts). We'll go through it below.


###  Data Access

Now you can load files. For example, the code below uses pandas to load a CSV file:

```python
data = pd.read_csv('/content/gdrive/My Drive/Path_to_your_file/file_name.csv')
```
Remember to replace `'Path_to_your_file/file_name.csv'` with the actual path and name of your file.
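Most "file not found" errors come from a typo in the path or an unmounted Drive. Here is a minimal sketch of a check you could run before calling `pd.read_csv`. The helper name `resolve_data_file` is made up for this example, and a temporary folder stands in for a mounted Drive so the code runs anywhere.

```python
from pathlib import Path
import tempfile


def resolve_data_file(base_dir, relative_path):
    """Join the two pieces of the path and fail with a readable message if missing."""
    full_path = Path(base_dir) / relative_path
    if not full_path.is_file():
        raise FileNotFoundError(
            f"No file at {full_path}. Is the Drive mounted and the path spelled correctly?"
        )
    return full_path


# A temporary folder plays the role of the mounted Drive in this demo
with tempfile.TemporaryDirectory() as fake_drive:
    (Path(fake_drive) / "file_name.csv").write_text("a,b\n1,2\n")

    # An existing file resolves to its full path
    found = resolve_data_file(fake_drive, "file_name.csv")
    print(found.name)

    # A missing file raises an error that names the path that was tried
    missing_caught = False
    try:
        resolve_data_file(fake_drive, "missing.csv")
    except FileNotFoundError as err:
        missing_caught = True
        print(err)
```

In a real notebook, the returned path could be passed straight to `pd.read_csv`.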



### Final Recommendations

- **Organisation**: Keep your Google Drive folders well-organized and follow a consistent folder naming system. This will make it easier to find file paths and access data. We'll talk more about meta-data and file organization in another class.

- **Permission**: Make sure the required data files are open for access. If a file is shared, ensure sharing permissions are set properly. This won't be an issue when accessing things in the class Shared Drive, but may be for your final project.


Any changes you make to your Drive (e.g., adding/removing files or folders) after it is mounted will get reflected in the notebook nearly instantly.

# Putting pieces together
First, mount the Drive.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Many clicks need to occur now to give permissions! A popup that asks you to **Permit this notebook to access your Google Drive files?** will appear. You will want to click <font color='blue'>**Connect to Google Drive**</font>.


Then, you will have to select which Google account you want to use (you need to click on your *g.hmc.edu* account).


Because it wants you to be *so* sure you want to give this Notebook access to Google Drive, you will need to confirm a final time by clicking <font color='blue'>**Allow**</font> (you may have to scroll here).

When all works, the output of the above should be `Mounted at /content/gdrive`. You'll need to re-do this step each time you open a new runtime (typically, when you re-open the notebook or if you've lost wifi access).

That was a lot. Now, let's open a file using pandas!

In [None]:
data = pd.read_csv('/content/gdrive/Shared drives/Chem_60_Spring_2024/In_Class_Notebooks/data/class0_example_data_Claremont_pm25.csv', parse_dates=['DateTime'])

If that worked, nothing will be displayed! Let's look at what we just loaded by calling the `data` object.

In [None]:
data

No metadata came with this file I just had you load (*bad practices, Prof. Kavassalis!*). Still, I can tell you what it is - it's PM2.5 (particulate matter less than 2.5 micrometres in diameter) measured outside my house with a PurpleAir sensor reported as 30-minute averages (the pollution is bad this morning, so I was thinking about this data!).

Let's look at the data now. We loaded it in as a [pandas](https://pandas.pydata.org/about/) dataframe, and Google Colab has some helpful shortcuts for looking at pandas objects. Click the little icons next to the output above to see the 'interactive' table mode and suggested plots.
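You may have noticed the `parse_dates=['DateTime']` argument in the `read_csv` call. It asks pandas to convert that column into real timestamps rather than leaving it as text. Here is a small sketch of the difference; the two rows of numbers are invented purely for illustration and are *not* the class data, though the column names match the ones in our file.

```python
import io

import pandas as pd

# Made-up values for illustration only; this is NOT the class data set
csv_text = (
    "DateTime,West Village A\n"
    "2024-01-16 08:00:00,10.0\n"
    "2024-01-16 08:30:00,12.5\n"
)

# Without parse_dates, the DateTime column is read in as plain text
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["DateTime"].dtype)

# With parse_dates, the column becomes a proper datetime type
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["DateTime"])
print(parsed["DateTime"].dtype)

# Real timestamps support time arithmetic, e.g. the gap between rows
gap = parsed["DateTime"].diff().iloc[1]
print(gap)
```

Having real timestamps is also what lets `matplotlib` draw a sensible date axis in a time series plot.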

We are going to be making a lot of plots (figures) in this course, but we'll want a lot more control and customization than those little preview figures offer. Let's make a quick plot using `matplotlib` (assuming you kept the `import matplotlib.pyplot as plt` line above).

There are many, many kinds of plots we can (and will later) make. Today, let's just make a quick time series. I recommend looking at the documentation for the [subplots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html) functionality. The help function will also bring up details (but some folks find the all-text version overwhelming to read).

In [None]:
plt.subplots?

Let's make an example plot. There are far more comments on the below code than a normal plot might have, but we'll use overly commented plotting code so you learn the elements of a `matplotlib` figure.

In [None]:
# Create a new figure (fig) with 1 subplot, set figsize as width=12, height=6, and an axis-plane named ax
fig, ax = plt.subplots(1, figsize=(12, 6))

# Plot 'DateTime' values on x-axis and 'West Village A' values on y-axis, set color of the line to #d95f02 (I like this colour!) and line-width to 3
ax.plot(data['DateTime'], data['West Village A'], c='#d95f02', linewidth=3.0)

# Set labels for y and x axis and their font sizes
ax.set_ylabel('PM2.5 ($μg/m^3$)',fontsize=16)  # Set Y label as PM2.5 and set its fontsize to 16
ax.set_xlabel('Date',fontsize=16)  # Set X label as Date and set its fontsize to 16

# Set plot title and its fontsize
ax.set_title("Outdoor PM2.5 Concentrations at Prof. Kavassalis's House",fontsize=20)

# Draw horizontal line (y=35) in the plot, set its color to #7570b3 and line style to '--'
ax.axhline(35, c='#7570b3', linestyle='--')

# Add text above the horizontal line
ax.text(data['DateTime'].values[0], 35, "US PM2.5 24-hour standard", va='bottom', fontsize=16)

# Display the plot
plt.show()

The above code should have created a time series showing the last few days of particulate matter pollution concentrations. The horizontal line is showing one of the national (USA) air quality standards for PM2.5. Some aspects of the figure have not been optimized aesthetically - in notebooks, I typically set default font sizes after my import statements using the `params` notation:

```python
# some parameters to make the plots look nicer (larger default font sizes)
params = {'legend.fontsize': 'x-large',
         'axes.labelsize': 'xx-large',
         'axes.titlesize':'xx-large',
         'xtick.labelsize':'xx-large',
         'ytick.labelsize':'xx-large'}
plt.rcParams.update(params)
```

Figures for papers, figures for talks, and figures for computational notebooks are often styled differently (you'll get practice with all three variants in this course). Have some fun now.

### Practice QUESTION 7
**Recreate the above figure changing as many aspects of the plot as you would like.** I strongly recommend looking at the 'Anatomy of a Figure' page [here](https://matplotlib.org/stable/gallery/showcase/anatomy.html) for inspiration.

In [None]:
# your new figure will be here

# Submit your notebook

It's time to download your notebook and submit it on Canvas. Go to the File menu and click Download -> Download .ipynb

![Screenshot of how to download your notebook](https://kavassalis.space/s/download_chem60.png)

Then, go to Canvas and submit your assignment on the assignment page. It should look something like this:

![Screenshot of Assignment 1 Canvas submission page](https://kavassalis.space/s/canvas_chem60.png)

Once it is submitted, you are all done this week!