# Getting started with Google Colaboratory

Colab is like an interactive coding notebook: you can write chunks of text, just as you would in a Word document or Google Doc, interspersed with chunks of Python code. Colab is especially nifty because it lets you run your code on Google's cloud computing resources instead of your own computer's hardware. What's more, your notebook is saved to the cloud in real time, just like the other tools in Google Drive.

Note: After the lecture, we'll upload the instructor's "filled out" notebook to the Google Drive.

In [None]:
# 


## Useful Shortcuts
`shift` + `enter` : Run the current cell and move to the next cell.   
`control` + `enter` : Run the current cell and stay on the cell.   
`esc` + `a` : Create a new code cell above the current cell.    
`esc` + `b` : Create a new code cell below the current cell.   
`command` or `control` + `/` : Comment out the selected code.

# Python Basics

Useful External Resources for Python Introduction:

CCB Python bootcamp: https://ccb.berkeley.edu/outreach/workshops-bootcamps/

Python is a language that centers around manipulation of "objects" of different types. Data, which we would like to analyze, can be represented as certain types of objects.

## Data types & variables

Variables are simply names that refer to the data you want to work with.
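
As a minimal sketch (the variable names and values below are made up for illustration), assigning a value to a name creates a variable, and `type()` reports what kind of object that name refers to:

```python
# Hypothetical example values, chosen only for illustration
n_samples = 12          # int: a whole number
temperature = 36.6      # float: a number with a decimal point
gene_name = "BRCA1"     # str: text
is_control = True       # bool: True or False

# type() reports what kind of object each variable refers to
print(type(n_samples))
print(type(temperature))
print(type(gene_name))
print(type(is_control))

# A variable can be reassigned to a new value at any time
n_samples = n_samples + 1
print(n_samples)
```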

**Math & Logic**

We can use all the normal symbols:

> +: plus.  
-: minus.  
/: divide.  
*: times.  
%: modulo.  
<: less than.  
>: greater than.  
\<=: less than or equal to.  
\>=: greater than or equal to.  
==: equal to.  
!=: not equal to.

Further documentation:
https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex
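
A short sketch of the symbols listed above in action; the values of `a` and `b` are arbitrary and only serve as an illustration:

```python
a = 7
b = 2

print(a + b)   # 9
print(a - b)   # 5
print(a / b)   # 3.5 (division always gives a float)
print(a * b)   # 14
print(a % b)   # 1 (the remainder of 7 divided by 2)

# Comparisons evaluate to the booleans True or False
print(a < b)   # False
print(a >= b)  # True
print(a == b)  # False
print(a != b)  # True
```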

## Basic data structures
### **List**
Lists are exactly what they sound like: they are containers that house elements in a given sequence/order. 

Here are some useful functions, operators, and methods for working with lists:


> Get the length of a list: `len()`.  
Get the maximum value in a list: `max()`.  
Check if a certain element or value is in a list: `in`.  
Get a sorted copy of the list: `sorted()`.  
Count the instances of an element in the list: `.count()`.  
Convert a string to a list of its characters: `list()`.

Further documentation: https://www.programiz.com/python-programming/methods/list
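
The list operations above can be tried on a small example. The list `values` below is made up for illustration:

```python
# A made-up list of numbers, for illustration only
values = [3, 1, 4, 1, 5]

print(len(values))       # 5
print(max(values))       # 5
print(4 in values)       # True
print(sorted(values))    # [1, 1, 3, 4, 5] (a new list; values itself is unchanged)
print(values.count(1))   # 2
print(list("ACGT"))      # ['A', 'C', 'G', 'T']

# Elements are accessed by position, counting from 0
print(values[0])         # 3
```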

### **Dictionary**
Dictionaries hold key:value pairs. In some other languages, this kind of structure is called a "hash table".
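
A minimal sketch of creating and using a dictionary; the name `codon_table` and its entries are chosen here purely for illustration:

```python
# Illustrative mapping from a DNA codon (key) to an amino acid (value)
codon_table = {"ATG": "Met", "TGG": "Trp"}

print(codon_table["ATG"])         # Met (look up a value by its key)

codon_table["TAA"] = "Stop"       # add a new key:value pair
print("TAA" in codon_table)       # True (`in` checks the keys)
print(len(codon_table))           # 3
print(list(codon_table.keys()))   # ['ATG', 'TGG', 'TAA']
```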


### **NumPy Arrays**
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a **multidimensional array object**, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

We won't go deep into NumPy here, but the NumPy array is a basic and very useful data structure that people use every day. If you want to learn more about coding in Python, do check out the following link to see how to use this powerful package:

https://numpy.org/doc/stable/user/index.html
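
For a first taste, here is a small sketch of a two-dimensional array. The array contents are made up, and only a handful of common operations are shown:

```python
import numpy as np

# A made-up 2 x 3 array, for illustration only
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.shape)          # (2, 3): 2 rows, 3 columns
print(arr * 2)            # arithmetic is applied element by element
print(arr.sum())          # 21
print(arr.mean(axis=0))   # column means: [2.5 3.5 4.5]
print(arr[1, 2])          # 6 (row 1, column 2, counting from 0)
```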

## Basic Pandas usage

***Pandas*** is a powerful Python data science package that provides infrastructure for working with complex tabular data.

Documentation: https://pandas.pydata.org/

In [1]:
# import the package
import pandas as pd

# Below are some lines to download the data we'll be using.
# All of the data is stored on a website called GitHub, where people can store and update
# their own code and small datasets.
# The data for this tutorial is stored in our course GitHub repo.
# This takes ~1 minute to run, since it has to download data.

!git clone https://github.com/CCB293/Fall-2022.git

# set the current directory
# (%cd changes the notebook's working directory; !cd would only affect a temporary subshell)
%cd Fall-2022/week1

In [None]:
# show your current path
!pwd

# list all files under the current path
!ls


Now let's get into a dataset and use Pandas to manipulate and analyze the data.

Now let's get a sense of the dataset structure.
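
A file from the cloned repository would typically be loaded with `pd.read_csv()`. Since the file name is not given here, the sketch below builds a small toy table instead (the column names and values are invented, not the course data) and shows a few common ways to inspect a DataFrame's structure:

```python
import pandas as pd

# Toy data invented for illustration; it is NOT the course dataset
df = pd.DataFrame(
    {
        "sample": ["s1", "s2", "s3", "s4"],
        "condition": ["control", "control", "treated", "treated"],
        "expression": [1.2, 0.8, 3.4, 2.9],
    }
)

print(df.head(2))    # the first two rows
print(df.shape)      # (4, 3): number of rows, number of columns
print(df.columns)    # the column labels
print(df.dtypes)     # the data type of each column
```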

### **Indexing**

`dataframe[column_label]`.  
`dataframe.loc[row_label]`
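
The two indexing patterns above can be sketched on a toy DataFrame. The row labels `s1` to `s3` and the column names are assumptions made for this example:

```python
import pandas as pd

# Toy data invented for illustration; row labels are set with index=
df = pd.DataFrame(
    {
        "condition": ["control", "treated", "treated"],
        "expression": [1.2, 3.4, 2.9],
    },
    index=["s1", "s2", "s3"],
)

col = df["expression"]               # one column, selected by its label (a Series)
row = df.loc["s2"]                   # one row, selected by its label (a Series)
cell = df.loc["s2", "expression"]    # a single value, by row and column label

print(col)
print(row)
print(cell)                          # 3.4
```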


### **Simple methods**

Now that we're up to speed on how to parse DataFrames, we can learn about the utility that pandas has for data exploration.
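
As a rough sketch of what this exploration can look like, here are a few commonly used DataFrame methods applied to a toy table (the data is invented for illustration):

```python
import pandas as pd

# Toy data invented for illustration
df = pd.DataFrame(
    {
        "condition": ["control", "control", "treated", "treated"],
        "expression": [1.0, 2.0, 4.0, 6.0],
    }
)

print(df.describe())                    # summary statistics for numeric columns

overall_mean = df["expression"].mean()
print(overall_mean)                     # 3.25

counts = df["condition"].value_counts() # number of rows per condition
print(counts)

group_means = df.groupby("condition")["expression"].mean()  # mean per condition
print(group_means)

ranked = df.sort_values("expression", ascending=False)      # largest values first
print(ranked)
```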

## Plotting with matplotlib

You can make publication-ready figures using matplotlib, and you can change almost anything about a plot (e.g. labels, fonts, colors, resolution, size, grid, axes, and output format).

Details: https://matplotlib.org/stable/tutorials/index.html

In [None]:
 # plot dots
 # plot a line
 # set x label
 # set y label
 # set title
 # save the figure
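
One possible way to carry out the steps listed above is sketched here. The data points, labels, and the output file name `example_plot.png` are all made up for illustration:

```python
import matplotlib.pyplot as plt

# Made-up data points, for illustration only
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")              # plot dots
ax.plot(x, y, label="connecting line")      # plot a line
ax.set_xlabel("x")                          # set x label
ax.set_ylabel("x squared")                  # set y label
ax.set_title("A simple example plot")       # set title
ax.legend()                                 # show the labels in a legend
fig.savefig("example_plot.png", dpi=300)    # save the figure (hypothetical file name)
```

Changing the file extension in `savefig` (for example to `.pdf` or `.svg`) saves the figure in a different format.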

Many Python packages have been developed for plotting; Matplotlib is the most basic one. Other packages build on Matplotlib and are optimized for particular types of analysis. **Seaborn** is a well-known example of this. Check out the following link for plotting with seaborn:

https://seaborn.pydata.org/