# Getting started with Google Colaboratory

For the first tutorial of the course, you'll be writing and executing your code with a tool called Google Colaboratory (Google Colab for short).

Colab is like an interactive coding notebook: you can write chunks of text just like you would in a Word document or Google Doc, interspersed with chunks of Python code. Colab is especially nifty because it allows you to run your code using Google's cloud computing resources instead of your own computer's hardware. What's more, your code is saved to the cloud in real time, just like with the other tools in Google Drive.

At this point, you should have your own copies of the Colab notebooks we're going to use throughout the week. During the daily lectures, the instructors will be working on their copy of the notebook. You can (and should!) work through your own copy of the notebook side by side in order to get the most out of the course.

Note: After the lecture, we'll upload the instructor's "filled out" notebook to the Google Drive.

Let's get started by familiarizing ourselves with the basic tools in Colab, many of which may be familiar to you if you've used Jupyter Notebook.

## Editing text cells
Much like with live lectures, you'll probably want to take notes as we go. Luckily, Colab allows you to write notes in a number of different ways. Let's start by editing some text within an already established text cell.

Editing text cells in Colab works exactly the same way as editing "Markdown" cells in Jupyter Notebooks. Here we will just go through the basics. For future reference, if you need more extensive text editing, see:

https://medium.com/analytics-vidhya/the-ultimate-markdown-guide-for-jupyter-notebook-d5e5abf728fd



1. Single click on the sentence below to reveal the text cell. Afterwards, double click to fill in the blanks, then press Shift + Enter (both Windows and macOS) to save your changes.

Double-click (or enter) to edit

2. Add text cells by clicking "+ Text" in the upper-left corner. Colab supports basic formatting like **bold**, *italic*, and even bullet points.

## Connecting to a runtime
A runtime in Google Colab is a computing environment that allows us to execute the code in our notebook. The runtime is hosted on Google's computing resources, which means that you can run code and execute analyses on Colab regardless of your own laptop's computing capabilities.

Click on the "Connect" button on the right hand side of the top toolbar. You should see it flash through a few different status messages. After the runtime is connected, you should see a green checkmark and two bars labeled "RAM" and "Disk".

## Working with code cells
Now that we're connected to our runtime, we can start running some code cells: these are the bread and butter of our interactive notebooks.

When you're ready, go ahead and run the cell by doing one of the following:

1. Hover over the cell and press the Play button that appears on the left side.
2. Click on the cell and press Shift + Enter (both Windows and macOS).

In [1]:
print("Hello World.")

Hello World.


You can edit and run code cells just like you would with text cells. In fact, editing cells to annotate code is encouraged: you can do this with **comments**, which are pieces of text prefaced with a `#` character that Python ignores when the cell runs.

In [None]:
# write your first code here!
print("Hello World.")

## Useful Shortcuts
`shift` + `enter` : Run the current cell and move to the next cell.   
`control` + `enter` : Run the current cell and stay on the cell.   
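`esc` + `a` : Create a new code cell above the current cell (Jupyter; in Colab the default is `control` or `command` + `m`, then `a`).    
`esc` + `b` : Create a new code cell below the current cell (Jupyter; in Colab the default is `control` or `command` + `m`, then `b`).   
`command` or `control` + `/` : Comment out the selected code.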

In [None]:
# try commenting out the lines below.
print("Hello World.")
print("Hello World.")
print("Hello World.")

Congratulations! You've just taken your first baby step to writing code in Python. 🐍

# Python Basics

Useful External Resources for Python Introduction:

CCB Python bootcamp: https://ccb.berkeley.edu/outreach/workshops-bootcamps/

Many of the tasks scientists usually want to do with Python involve sorting and manipulating data. However, before we can do that, we have to understand how Python parses the data we give it.

Python follows what we call an object-oriented programming structure. In short, Python is a language that centers around manipulation of "objects" of different types. Data, which we would like to analyze, can be represented as certain types of objects. Tasks or functions can also be represented as different types of objects.

Although Python can be a very powerful language, it can also be a very particular language, especially for learners who are unfamiliar with the (in)flexibility of code. Let's start by going over some basic data types.

## Data types & variables

In [None]:
# strings

# integer

# float

# NoneType

# converting datatypes
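
One possible way to fill in the cell above, as a minimal sketch. The variable names and values here are made up for illustration and are not part of the course material.

```python
# strings: text wrapped in quotes
greeting = "Hello World."

# integer: a whole number
n_samples = 3

# float: a number with a decimal point
temperature = 36.6

# NoneType: the type of the special value None, which stands for "no value"
nothing = None

# type() reports the data type of any object
print(type(greeting), type(n_samples), type(temperature), type(nothing))

# converting datatypes
as_int = int("42")            # string -> integer
as_float = float(n_samples)   # integer -> float
as_str = str(temperature)     # float -> string
print(as_int, as_float, as_str)
```

Conversions only work when the value makes sense in the target type: `int("42")` succeeds, while `int("forty-two")` raises a `ValueError`.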


**Variables** are simply representations of the data that you want to work with. For example, we use variables in mathematical equations like `F = ma`, where each variable `(F, m, a)` represents a value corresponding to force, mass, and acceleration.

To create a variable, you write out your variable name, an assignment operator (=), and then what you want it to represent. Spaces around the equal sign are optional but encouraged for code readability.

In [None]:
# assigning variables

# calling variables
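
As a small illustration of the `F = ma` example, the sketch below assigns two variables and computes a third from them. The numbers are arbitrary.

```python
# assigning variables
m = 2.0   # mass (arbitrary example value)
a = 9.8   # acceleration (arbitrary example value)
F = m * a # force, computed once from the current values of m and a

# calling variables: using a name gives back the value it represents
print(F)

# Reassigning m later does not change F automatically;
# F keeps the value that was computed when the line above ran.
m = 3.0
print(m, F)
```

A variable stores a value, not a formula, so `F` has to be recomputed if you want it to reflect the new `m`.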


**Math & Logic**

Math is pretty straightforward in Python 3. We can use all the normal symbols:

> `+`: plus  
`-`: minus  
`/`: divide  
`*`: times  
`%`: modulo  
`<`: less than  
`>`: greater than  
`<=`: less than or equal to  
`>=`: greater than or equal to  
`==`: equivalence  
`!=`: non-equivalence

Further documentation:
https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex

In [None]:
a = 2
print(a+1, a-1, a/2, a*3)
# # The '%' modulo operator gives the remainder
# print(a%2, a%4)
# # basic logic
# print(a==2, a<=1, a!=2)


3 1 1.0 6


In [None]:
# boolean logic
t = True
f = False

# print(type(t)) # "<class 'bool'>"
# print(t and f) # Logical AND; prints "False"
# print(t or f)  # Logical OR; prints "True"
# print(not t)   # Logical NOT; prints "False"

## Basic data structures
### **List**
Lists are exactly what they sound like: they are containers that house elements in a given sequence/order. 

In [None]:
number_list = [1, 2, 3, 4, 5]
string_list = ['a', 'p', 'p', 'l', 'e']
mixed_list = ['orange', 22, 'f', 67.2] 
nested_list = [['apple', 'banana'], ['onion', 'potato']]

How could we access the objects that we store in lists?

In Python, **sequences** such as lists and strings are ordered **iterable objects** whose elements can be accessed using something called **indexing**. In practical terms, that means that you can access the first, second, third, etc. elements of these objects. Consider the following examples, which use the lists defined above.

In [None]:
# indexing
# return the first item of the list; note that Python indexing starts from 0

# return the last item of the list

# return 2nd to 3rd item of the list, note number_list[3] would not be printed

# return 1st to 3rd item of the list


# concatenation
# return a list containing all elements in string_list and number_list



# splitting strings into lists
student_names = "Debora,Miguel,Stacy,Xu"
list_of_names = ""
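
One possible way to fill in the cell above, as a sketch. It reuses `number_list`, `string_list`, and `student_names` from the notebook; the other variable names are made up for illustration.

```python
number_list = [1, 2, 3, 4, 5]
string_list = ['a', 'p', 'p', 'l', 'e']

# indexing
first = number_list[0]               # first item (index 0)
last = number_list[-1]               # last item (negative indices count from the end)
second_to_third = number_list[1:3]   # items at index 1 and 2; index 3 is excluded
first_to_third = number_list[:3]     # items at index 0, 1 and 2
print(first, last, second_to_third, first_to_third)

# concatenation: + joins two lists into a new list
combined = string_list + number_list
print(combined)

# splitting strings into lists: split on the comma separator
student_names = "Debora,Miguel,Stacy,Xu"
list_of_names = student_names.split(",")
print(list_of_names)
```

Slices use a half-open range, `start:stop`, where `stop` itself is left out. That is why `number_list[1:3]` returns two items rather than three.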

Here are some useful functions for list operations:


> Get the length of a list: `len()`.  
Get the maximum value in a list: `max()`.  
Check if a certain element or value is in a list: `in`.  
Sort the list: `sorted()`.  
Count the instances of an element in the list: `.count()`.  
Coerce strings to a list: `list()`

Further documentation: https://www.programiz.com/python-programming/methods/list

In [None]:
a = [1,54, 8, 0, 0,  3, 99]
# print(len(a))
# print(max(a))
# print(9 in a)
# print(sorted(a))
# print(a.count(0))
# b = 'aaaaa'
# print(list(b))

### **Dictionary**
Dictionaries hold key:value pairs - in some languages these are called "hash tables".

Dictionaries can store a mixture of different "types": both keys and values can be of mixed types. Note that keys must be hashable, so mutable data structures like lists and sets cannot be used as keys.

In [None]:
# student_info_dict = {2: 120120, 
#                      "first_name": ["Jane","Marry"],
#                      "last_name": "Dough"} 
# print(student_info_dict)

In [None]:
# indexing with keys (you cannot index with values)


# change values
# student_info_dict['first_name'][0] = 1
# print(student_info_dict['first_name'][0])

# iterating through dictionaries


### **Numpy Arrays**
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a **multidimensional array object**, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

We will not go deep into NumPy here, but the array is a basic and very useful data structure that people use every day. If you want to learn more about coding with Python, do check out the following link on how to use this powerful package:

https://numpy.org/doc/stable/user/index.html
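
To give a flavor of what a multidimensional array looks like, here is a very small sketch. The values are arbitrary.

```python
import numpy as np

# A 2-dimensional array with 2 rows and 3 columns
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix.shape)

# Arithmetic applies to every element at once, with no loop needed
doubled = matrix * 2
print(doubled)

# Many operations can run along one axis; axis=0 averages down each column
column_means = matrix.mean(axis=0)
print(column_means)
```

Unlike a Python list, multiplying an array by 2 doubles each element instead of repeating the container.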

## Basic Pandas usage

***Pandas*** is a powerful Python data science package that provides infrastructure for working with complex tabular data.

The main attraction of pandas is the data type it introduces: the **DataFrame**.

**DataFrames** support row and column names. You can index with row and column names! This can be handy if you know your sample/variable names by heart.

**DataFrames** support easy database-like operations. Merging, joining, grouping, sorting on a column's values – all possible with pandas DataFrames!

**DataFrames** allow mixed types. Unlike with arrays, you can store strings, integers, floats, etc. in the same DataFrame.

Documentation: https://pandas.pydata.org/

In [2]:
# import the package

# Below are some lines to download the data we'll be using:
# all of the data is stored on a website called "GitHub" where people can store and update 
# their own code and small datasets.
# The data for this tutorial is stored in our course GitHub repo.
# This takes ~1 minute to run, since it has to download data.

!git clone https://github.com/CCB293/Fall-2022.git

# set current directory
# (use %cd rather than !cd: a !cd runs in a temporary subshell and does not persist)
%cd Fall-2022/week1

Cloning into 'Fall-2022'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
/content/Fall-2022


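To inspect files from the command line, we use a shell language called "Bash".

Colab cells expect Python by default, so to run a Bash command in a cell we add `!` in front of the command. Each `!` command runs in its own temporary subshell, so commands that must change the notebook's own state, such as `cd`, should use the `%` magic prefix instead (for example `%cd`).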

In [6]:
# show your current path

# list all files under the current path


Now let's get into a dataset and use Pandas to manipulate and analyze the data.

In [None]:
# read in an example dataset about dogs. Since this is a comma-separated file (csv), use pd.read_csv


# Let's look at what the dataframe looks like
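
The course file itself is not shown here, so the sketch below reads a tiny made-up CSV from a string instead. The column names (`name`, `age`, `color`, `meals_per_day`, `meal_size`) and all values are hypothetical stand-ins, not the real dataset. With a real file, you would pass its path to `pd.read_csv` in place of the `io.StringIO` object.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the course's dog dataset
csv_text = """name,age,color,meals_per_day,meal_size
Freddie,3,brown,2,1.5
Luna,7,black,3,1.0
Rex,10,white,2,2.0
"""

# pd.read_csv accepts a file path or any file-like object
dogs = pd.read_csv(io.StringIO(csv_text))

# Look at what the dataframe looks like
print(dogs)
```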


Now let's get a sense of the dataset structure.

In [None]:
# how many rows are in the dataframe?


# the column names can be selected by: df.columns
# print the dataframe's columns:


# how many columns are in the dataframe?


# look at just the first 3 rows using df.head():
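
A sketch of how these structure checks might look. The small `dogs` table is a made-up stand-in with hypothetical column names, not the course data.

```python
import pandas as pd

# Made-up stand-in for the dogs dataset
dogs = pd.DataFrame({
    "name": ["Freddie", "Luna", "Rex", "Bella"],
    "age": [3, 7, 10, 5],
    "color": ["brown", "black", "white", "golden"],
})

# how many rows are in the dataframe?
n_rows = len(dogs)
print(n_rows)

# print the dataframe's columns
column_names = list(dogs.columns)
print(column_names)

# how many columns are in the dataframe?
n_columns = len(dogs.columns)
print(n_columns)

# look at just the first 3 rows using df.head()
first_three = dogs.head(3)
print(first_three)
```

`dogs.shape` returns the row and column counts together as a tuple, which is a handy shortcut.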


### **Indexing**

Columns of DataFrames are much more intuitive to index than columns of arrays. You can select the values of a single column by indexing with the column label. The result is a one-dimensional object called a "Series".

```
dataframe[column_label]
```

<img src ='https://github.com/ccbskillssem/pythonbootcamp/raw/main/day_4/ColumnIndex.png'> 

In [None]:
# select the column with the dogs' colors


DataFrame rows are indexed similarly to array rows, with a minor syntax difference. With DataFrames, we use `.loc[]` instead of `[]` to access rows.

```
dataframe.loc[row_label]
```

Just as with columns, rows of DataFrames are also stored in Series structures: the only difference is that the column labels have been transposed to row labels, as a Series only has one Index for row labels.

<img src='https://raw.githubusercontent.com/ccbskillssem/pythonbootcamp/main/day_4/RowIndex.png'>

It can help to think of a DataFrame as a Lego-like structure composed of Series. When we select parts of the DataFrame, we're snapping off some of the Legos from the main structure.

In [None]:
# select the row with index 5
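
Both kinds of indexing can be sketched on a small made-up table. The column names are hypothetical, and because this toy table has only four rows, the example selects row label 1 rather than 5.

```python
import pandas as pd

# Made-up stand-in for the dogs dataset
dogs = pd.DataFrame({
    "name": ["Freddie", "Luna", "Rex", "Bella"],
    "age": [3, 7, 10, 5],
    "color": ["brown", "black", "white", "golden"],
})

# select a column by its label; the result is a Series
colors = dogs["color"]
print(colors)

# select a row by its index label with .loc; the result is also a Series,
# whose index is made of the column labels
row = dogs.loc[1]
print(row)
print(row["name"])
```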


### **Simple methods**

Now that we're up to speed on how to parse DataFrames, we can learn about the utility that pandas has for data exploration.

In [None]:
# print the oldest dog's age


# print the average number of meals per day


# print the median meal size of the dogs
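
These summaries map onto the Series methods `.max()`, `.mean()`, and `.median()`. The sketch below uses a made-up table with hypothetical column names.

```python
import pandas as pd

# Made-up stand-in for the dogs dataset
dogs = pd.DataFrame({
    "name": ["Freddie", "Luna", "Rex", "Bella"],
    "age": [3, 7, 10, 5],
    "meals_per_day": [2, 3, 2, 1],
    "meal_size": [1.5, 1.0, 2.0, 0.5],
})

# print the oldest dog's age
oldest_age = dogs["age"].max()
print(oldest_age)

# print the average number of meals per day
mean_meals = dogs["meals_per_day"].mean()
print(mean_meals)

# print the median meal size of the dogs
median_meal_size = dogs["meal_size"].median()
print(median_meal_size)
```

Calling `dogs.describe()` prints several of these summary statistics for every numeric column at once.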



New columns can be added by combining existing columns.

In [None]:
# create a column with the total food eaten each day by each dog using the equation: meals per day * size of each meal
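
A sketch of the column arithmetic described above, on a made-up table. The column names, including the new `total_food` column, are hypothetical.

```python
import pandas as pd

# Made-up stand-in for the dogs dataset
dogs = pd.DataFrame({
    "name": ["Freddie", "Luna", "Rex", "Bella"],
    "meals_per_day": [2, 3, 2, 1],
    "meal_size": [1.5, 1.0, 2.0, 0.5],
})

# Multiplying two columns works row by row;
# assigning to a new label creates the new column
dogs["total_food"] = dogs["meals_per_day"] * dogs["meal_size"]
print(dogs)
```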


The dataframe can be filtered by specific parameters. Let's try selecting one dog.

In [None]:
# print the column with the dog names


# Are any of the names "Freddie"?


# print Freddie's information
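
Selecting one dog comes down to building a column of True/False values and using it to pick rows. The sketch below uses a made-up table with hypothetical column names.

```python
import pandas as pd

# Made-up stand-in for the dogs dataset
dogs = pd.DataFrame({
    "name": ["Freddie", "Luna", "Rex", "Bella"],
    "age": [3, 7, 10, 5],
})

# print the column with the dog names
print(dogs["name"])

# Are any of the names "Freddie"? The comparison gives one True/False per row
is_freddie = dogs["name"] == "Freddie"
print(is_freddie.any())

# print Freddie's information: keep only the rows where the mask is True
freddie_info = dogs[is_freddie]
print(freddie_info)
```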



The dataframe can be filtered to only include some values.

Let's create a dataframe with just older dogs (age over 5 years)

In [None]:
# print the column with the dog ages


# print whether each dog's age is greater than 5


# create a new dataframe with only dogs that are older than 5 years
# to do this, select the rows of the dataframe that meet a specific criterion



How much food was eaten by the older dogs on average?

Create a dataframe of just the younger dogs who are 5 years old or younger.

What is the mean food eaten by the young dogs?

How much food does Freddie eat in a day?
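
The questions above can be approached with the same boolean filtering. The sketch below works on a made-up table with hypothetical column names, so the numbers it prints say nothing about the real dataset.

```python
import pandas as pd

# Made-up stand-in for the dogs dataset, including a total_food column
dogs = pd.DataFrame({
    "name": ["Freddie", "Luna", "Rex", "Bella"],
    "age": [3, 7, 10, 5],
    "total_food": [3.0, 3.0, 4.0, 0.5],
})

# print whether each dog's age is greater than 5
is_older = dogs["age"] > 5
print(is_older)

# new dataframes holding only the rows that meet each criterion
older_dogs = dogs[is_older]
younger_dogs = dogs[dogs["age"] <= 5]

# mean food eaten per day in each group
older_mean_food = older_dogs["total_food"].mean()
younger_mean_food = younger_dogs["total_food"].mean()
print(older_mean_food, younger_mean_food)

# .loc accepts a row filter and a column label together
freddie_food = dogs.loc[dogs["name"] == "Freddie", "total_food"].iloc[0]
print(freddie_food)
```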

## Plotting with matplotlib

You can make publication-ready figures using matplotlib, and you can change almost anything about a plot (e.g. labels, fonts, colors, resolution, size, grid, axes, the file format used when saving, etc.).

Details: https://matplotlib.org/stable/tutorials/index.html

In [None]:
# import required packages
import matplotlib.pyplot as plt

In [None]:
# now check the data again
dogs

In [None]:
 # plot dots
 # plot a line
 # set x label
 # set y label
 # set title
 # save the figure
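
One way the plotting steps above could look, as a sketch. The two lists are made-up values standing in for two numeric columns of the dogs table, and the output file name is arbitrary.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for two numeric columns
ages = [3, 5, 7, 10]
total_food = [3.0, 0.5, 3.0, 4.0]

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(ages, total_food, color="tab:blue")                # plot dots
ax.plot(ages, total_food, color="tab:gray", linestyle="--")   # plot a line
ax.set_xlabel("Age (years)")                                  # set x label
ax.set_ylabel("Total food per day")                           # set y label
ax.set_title("Food eaten vs. age (toy data)")                 # set title
fig.savefig("dogs_food_vs_age.png", dpi=150)                  # save the figure
```

The file extension passed to `savefig` decides the output format, so changing it to `.pdf` or `.svg` saves a vector version of the same figure.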

There are many packages developed for plotting in Python; Matplotlib is the most basic one. There are often other packages, built on top of Matplotlib, that are optimized for particular types of analysis. **Seaborn** is a well-known example. Check out the following link for plotting with seaborn:

https://seaborn.pydata.org/