# Google Colab & GitHub Repository

This workbook shows how to open a notebook from a GitHub repository (which is where all the files we will use are stored) in Google Colab. It also covers how to get data files that live in a different location and how to install packages that Colab does not already provide. These last two are workarounds you may need when a notebook was originally written to run on your own computer.

<h3>Outline<span class="tocSkip"></span></h3>
<hr>
<div class="toc"><ul class="toc-item"><li><span><a href="#1.-Google-Colab-A-Quick-Overview" data-toc-modified-id="1.-Google-Colab-A-Quick-Overview-1">1. Google Colab: A Quick Overview</a></span></li>
    <li><span><a href="#2.-Advantages-of-Using-Google-Colab" data-toc-modified-id="2.-Advantages of-Using-Google-Colab-2">2. Advantages of Using Google Colab</a></span></li>
    <li><span><a href="#3.-How-to-Run-GitHub-Notebooks-in-Google-Colab" data-toc-modified-id="3.-How-to-Run-GitHub-Notebooks-in-Google-Colab-3">3. How to Run GitHub Notebooks in Google Colab</a></span></li>
    <li><span><a href="#4.-Grabbing-Data" data-toc-modified-id="4.-Grabbing-Data-4">4. Grabbing Data</a></span></li>
    <li><span><a href="#5.-Magic-Command" data-toc-modified-id="5.-Magic-Command-5">5. Magic Command</a></span></li>
    <li><span><a href="#6.-Installing-Packages" data-toc-modified-id="6.-Installing-Packages-6">6. Installing Packages</a></span></li>
    <li><span><a href="#7.-Checking-if-we-are-on-Colab" data-toc-modified-id="7.-Checking-if-we-are-on-Colab-7">7. Checking if we are on Colab</a></span></li>
</ul></div>

### 1. Google Colab: A Quick Overview
<hr>

Google Colab, or Google Colaboratory, is a cloud-based platform for writing and running Jupyter notebooks on Google’s cloud resources. It provides free access to CPUs, GPUs, and TPUs (subject to usage limits and availability), making it well suited to resource-intensive tasks like machine learning and data analysis.

### 2. Advantages of Using Google Colab
<hr>

Why should you consider running notebooks in Google Colab?

- **Powerful Hardware**: Colab offers access to robust hardware resources, enabling you to run demanding models that may not be feasible on your local machine.
- **Collaboration**: It allows for easy sharing of your notebooks with others, making it an excellent tool for collaboration.
- **Language Support**: Colab supports multiple programming languages, including Python, R, and Julia, making it a versatile platform for running notebooks.

### 3. How to Run GitHub Notebooks in Google Colab
<hr>

To open a file from GitHub in Colab, we need to know where that file is located; for our purposes, the repository and the filename are enough. The steps are:

1. Open Google Colab and choose "Open notebook" (from the File menu).

![Open from GH 1](../images/colab.gif "colab")

2. Select the GitHub tab and enter the repository. The link to the workbooks for the course is: https://github.com/Dong2Yo/PredcitiveAnalysis_BusinessProfessional

![Open from GH 2](../images/colab2.gif "colab2")

3. Choose the file to open

![Open from GH 3](../images/colab3.gif "Open from GH 3")
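As an alternative to the dialog, Colab can open a notebook on GitHub directly from a URL: replacing `https://github.com/` at the start of the notebook's GitHub address with `https://colab.research.google.com/github/` gives a link that opens the same file in Colab. The small helper below sketches that conversion. The function name `colab_url_from_github` and the example path (`blob/main/example.ipynb`) are hypothetical, so substitute the branch and file you actually want.

```python
def colab_url_from_github(github_url: str) -> str:
    """Turn a github.com notebook URL into a link that opens it in Colab.

    Hypothetical helper: it only swaps the URL prefix, which is how
    Colab's /github/ links are built.
    """
    github_prefix = "https://github.com/"
    colab_prefix = "https://colab.research.google.com/github/"
    if not github_url.startswith(github_prefix):
        raise ValueError(f"Expected a URL starting with {github_prefix!r}")
    # Keep everything after github.com/ (user/repo/blob/branch/path.ipynb)
    return colab_prefix + github_url[len(github_prefix):]


# Hypothetical notebook path: replace "main" and "example.ipynb" with a real branch and file
url = "https://github.com/Dong2Yo/PredcitiveAnalysis_BusinessProfessional/blob/main/example.ipynb"
print(colab_url_from_github(url))
```

Pasting the printed link into a browser should open the notebook in Colab, provided the repository is public.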

### 4. Grabbing Data
<hr>

Many notebooks also read a separate data file, such as a CSV. When we open a notebook remotely, that data file is not copied along with it, so we may need to "grab" the file ourselves. We can do this with the `wget` command. For example, to grab the `iris.csv` file from the `data` folder in the `data-science-for-bioscientists` repository, we would use the following command. It points at `raw.githubusercontent.com` so that we download the file itself rather than the GitHub web page that displays it:

```python
!wget https://raw.githubusercontent.com/SmithsonianWorkshops/data-science-for-bioscientists/main/data/iris.csv
```
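`wget` is a separate command-line program, so it needs the `!` prefix and may not be available on every computer. A minimal pure-Python sketch of the same download, using the standard library's `urllib.request.urlretrieve`, is below. The helper name `download_if_missing` is hypothetical; like running `wget` with its `-nc` (no-clobber) option, it skips the download when the file already exists.

```python
import os
import urllib.request
from typing import Optional


def download_if_missing(url: str, filename: Optional[str] = None) -> str:
    """Download url to filename, skipping the download if the file already exists.

    Hypothetical helper that mimics `wget -nc` (no-clobber) using only the
    standard library.
    """
    if filename is None:
        # Like wget, name the file after the last part of the URL path
        filename = url.rstrip("/").split("/")[-1]
    if os.path.exists(filename):
        print(f"{filename} already exists; not downloading again")
    else:
        urllib.request.urlretrieve(url, filename)
        print(f"Saved {url} to {filename}")
    return filename


# Example (needs an internet connection):
# download_if_missing(
#     "https://raw.githubusercontent.com/SmithsonianWorkshops/data-science-for-bioscientists/main/data/iris.csv"
# )
```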


### 5. Magic Command
<hr>


The `!` at the beginning of the line tells the notebook to run the rest of the line as a shell command, as if it had been typed into a terminal, instead of as Python code. (Strictly speaking, IPython calls `!` the shell escape and reserves the name "magic command" for commands that start with `%`, such as `%pip`, but the two are often lumped together.) Without the `!`, Python would try to read `wget ...` as Python code and fail with an error. Because Colab runs the notebook on a remote virtual machine, the command runs in that machine's shell, not on your own computer. This is basically the same as opening a terminal and running the command there, and it is something we commonly need to do when working in a remote environment.

If you have some familiarity with Linux/Unix or macOS shell commands, the specific commands that come up here may look familiar. If not, we can look up what we need; most of the time we use a command like this either to download a file or to install a package.

In the command below, I ask `pip`, the program used to install Python packages, to print its help text. If I were to open a terminal (Terminal -> New Terminal) and type the same command without the `!` at the beginning, I would get the same result.

```python
!pip --help
```

```
Usage:
  pip <command> [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  inspect                     Inspect the python environment.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  config                      Manage local and global configuration.
  search                      Search PyPI for packages.
  cache                       Inspect and manage pip's wheel cache.
  index                       Inspect information available from package indexes.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper co
```
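Outside a notebook (for example in a plain `.py` script), the `!` prefix is not available. A rough standard-library equivalent is `subprocess.run`, sketched below with a hypothetical helper `run_command`. Passing `sys.executable -m pip` rather than plain `pip` makes sure the command uses the same Python installation that is running the code.

```python
import subprocess
import sys
from typing import List


def run_command(cmd: List[str]) -> str:
    """Run a command (like a `!` line in a notebook) and return what it printed.

    Hypothetical helper; check=True raises CalledProcessError if the
    command exits with a non-zero status.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Equivalent of `!pip --help`, tied to the interpreter running this code:
# print(run_command([sys.executable, "-m", "pip", "--help"]))

# A command that works in any Python environment:
version_text = run_command([sys.executable, "--version"])
print(version_text.strip())
```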

### 6. Installing Packages
<hr>

If you are using a package that is not already installed on Google Colab, you will need to install it, which you can do with `pip`. For example, to install the `pandas` package we would use the command below (pandas actually comes preinstalled on Colab, so here it only illustrates the syntax):

```python
!pip install pandas
```

Most common data-science packages are already installed, but not all. Each Colab session starts from a fresh virtual machine, so anything we install is lost when the runtime is reset or disconnected, and we may need to reinstall it each time we open the file.
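Because a fresh Colab session may or may not already have a given package, one option is to install it only when it cannot be imported. The sketch below does that with the standard library's `importlib.util.find_spec`; the helper name `ensure_package` is hypothetical. In a notebook you could also simply run `%pip install ...`, IPython's magic that installs into the environment the notebook is using.

```python
import importlib.util
import subprocess
import sys
from typing import Optional


def ensure_package(import_name: str, pip_name: Optional[str] = None) -> bool:
    """Install a package with pip only if it cannot already be imported.

    Hypothetical helper. Returns True if an install was run, False if the
    package was already available. pip_name covers packages whose install
    name differs from the import name (e.g. scikit-learn vs sklearn).
    """
    if importlib.util.find_spec(import_name) is not None:
        return False
    subprocess.run(
        [sys.executable, "-m", "pip", "install", pip_name or import_name],
        check=True,
    )
    return True


# csv is part of the standard library, so nothing gets installed here
print(ensure_package("csv"))

# Example that may trigger an install (needs pip and an internet connection):
# ensure_package("sklearn", pip_name="scikit-learn")
```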

### 7. Checking if we are on Colab
<hr>

If we want to check whether we are on Colab, we can test whether the `google.colab` module has been loaded, as in the code below.

This lets us write code that takes a different action depending on the answer. For example, we might want to install a package or grab a file only when we are on Colab.

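```python
import sys

# On Colab, the google.colab module is already loaded when the notebook starts,
# so its presence in sys.modules tells us where we are running
IN_COLAB = 'google.colab' in sys.modules

print(f"Am I in Colab? {IN_COLAB}")
```

```
Am I in Colab? False
```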
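An alternative sketch tries to import `google.colab` directly. The two checks ask slightly different questions: the `sys.modules` test is true only if the module has already been imported, while the import test is true whenever the module can be imported. On Colab both should be true. The function name `running_in_colab` is hypothetical.

```python
import importlib


def running_in_colab() -> bool:
    """Return True if the google.colab module can be imported.

    Hypothetical helper: google.colab is provided by the Colab runtime,
    so a failed import suggests we are running somewhere else.
    """
    try:
        importlib.import_module("google.colab")
    except ImportError:
        return False
    return True


print(f"Am I in Colab? {running_in_colab()}")
```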


#### Example Code Block

This code snippet checks whether we are in Colab. If we are, it downloads the data file and sets the file path to point at the downloaded copy; if we are not, it uses the local copy of the file instead.

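```python
import sys
import pandas as pd

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    # -nc ("no clobber") skips the download if titanic.csv already exists
    !wget -nc https://raw.githubusercontent.com/Dong2Yo/Dataset/main/titanic.csv
    FILE_PATH = 'titanic.csv'
else:
    # Running locally: use the copy in the repository's data folder
    FILE_PATH = '../data/titanic.csv'

df = pd.read_csv(FILE_PATH)
df.head()
```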

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |
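The `Unnamed: 0` column in this output most likely comes from the CSV having been saved together with its row index (for example by `DataFrame.to_csv()` with its default settings), which leaves a first column with an empty header. If so, passing `index_col=0` to `pd.read_csv` turns that column back into the index instead of keeping it as data. The tiny CSV below is made up purely to show the effect.

```python
import io

import pandas as pd

# Made-up CSV shaped like a file saved with its index:
# the first header cell is empty, and the first column holds the old index
csv_text = ",PassengerId,Survived\n0,1,0\n1,2,1\n"

df_default = pd.read_csv(io.StringIO(csv_text))
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))  # ['Unnamed: 0', 'PassengerId', 'Survived']
print(list(df_indexed.columns))  # ['PassengerId', 'Survived']
```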
