In [1]:
import pandas as pd

# Opening a File on Google Colab

This workbook shows how to open a file on Google Colab from a GitHub repository, which is where all the files we will use live. We will also cover how to deal with data that might be in a different location, and how to install packages that are not already installed on Google Colab. The latter two are workarounds that you may need if the file was originally intended to be run on your own computer.

## Opening a File from GitHub

To open a file from GitHub on Colab, we need to know where that file is located. For our purposes, the repository and the filename are enough. The steps are:

1. Open Google Colab

![Open from GH 1](../images/open_gh_1.png "Open from GH 1")

2. Select the repository. The link to the workbooks for the course is: https://github.com/AkeemSemper/ML_for_Non_DS_Students

![Open from GH 2](../images/open_gh_2.png "Open from GH 2")

3. Choose the file to open

![Open from GH 3](../images/open_gh_3.png "Open from GH 3")

## Grabbing Data

Many notebooks rely on a separate data file, such as a CSV. If we open a notebook remotely, that data file is not on the Colab machine, so we may need to "grab" it before we can use it. We can do this using the `wget` command. For example, if we wanted to grab the `iris.csv` file from the `data` folder in the `data-science-for-bioscientists` repository, we would use the following command:

```python

!wget https://raw.githubusercontent.com/SmithsonianWorkshops/data-science-for-bioscientists/main/data/iris.csv

```
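The URL passed to `wget` follows a regular pattern: the raw-content host, then the owner, repository, branch, and path to the file. As a minimal sketch, the helper below assembles such a URL from its parts. The function name `raw_github_url` is hypothetical and used only for illustration, and the pattern is assumed to apply to files in public repositories.

```python
def raw_github_url(owner, repo, branch, path):
    """Build a raw.githubusercontent.com URL for a file in a public repo.

    Hypothetical helper for illustration; it only formats a string and
    does not check that the file actually exists.
    """
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path.lstrip('/')}"


# Rebuild the iris.csv URL used in the wget command above
url = raw_github_url(
    "SmithsonianWorkshops",
    "data-science-for-bioscientists",
    "main",
    "data/iris.csv",
)
print(url)
```

For CSV files specifically, `pd.read_csv` also accepts a URL, so passing a URL like this one directly to it is another option when a local copy is not needed.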

### Magic Commands

The `!` at the beginning of the line tells Colab to run the rest of the line as a shell command, as if we had typed it in a terminal, rather than as Python. Strictly speaking, the notebook reserves the name "magic command" for commands that start with `%`, and the `!` prefix is a shell escape, but the two are often grouped together. The `!` is necessary because Colab runs on a remote virtual machine, and the command has to run in that machine's shell, not on our own computer. This is basically the same as opening a terminal on your own computer and running the command there, and it is something we commonly need to do when working in a remote environment.
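For comparison, plain Python can launch an external command with the standard-library `subprocess` module, which is roughly what the `!` prefix does on our behalf. This is a sketch under assumptions: it runs the current Python interpreter with `--version` only because that command is available wherever Python is, and it does not go through a shell the way `!` does.

```python
import subprocess
import sys

# Run "<python> --version" as a separate process and capture what it prints.
# sys.executable is the path of the interpreter running this code.
result = subprocess.run(
    [sys.executable, "--version"],
    capture_output=True,
    text=True,
)

print("exit code:", result.returncode)
print("output:", (result.stdout or result.stderr).strip())
```

An exit code of 0 conventionally means the command succeeded, which is the same signal a terminal would give.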

If you have some familiarity with Linux/Unix or Mac OS commands, then the specific commands that come up here may be familiar to you. If not, we can just look up what we need. Most times that we use something like this it is to either download a file, or install some package. 

In the command below, I'll just ask pip, which is a program used to install Python packages, to give me some info. If I were to open a terminal (Terminal -> New Terminal) and type the command without the `!` at the beginning, I would get the same result.

In [5]:
!pip --help


Usage:   
  pip <command> [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  inspect                     Inspect the python environment.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  config                      Manage local and global configuration.
  search                      Search PyPI for packages.
  cache                       Inspect and manage pip's wheel cache.
  index                       Inspect information available from package indexes.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper co

## Installing Packages

If you are using a package that is not already installed on Google Colab, you will need to install it. You can do this using the `pip` command. For example, if we wanted to install the `pandas` package, we would use the following command:

```python

!pip install pandas

```

Most of the common packages we might need are already installed, but not all. Each new Colab session starts from a fresh environment, so anything we install ourselves may need to be reinstalled each time we open the file.
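Because installs do not persist, it can help to check which packages are missing before running `pip`. The sketch below uses the standard-library `importlib.util.find_spec` to test whether a top-level module can be imported. The function name `missing_packages` and the package name `not_a_real_package_xyz` are made up for illustration.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be imported.

    Hypothetical helper: find_spec returns None when no module with
    that name is found in the current environment.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is deliberately fake.
to_install = missing_packages(["json", "not_a_real_package_xyz"])
print(to_install)
```

Note that the import name and the name used with `pip install` are not always the same (for example, `sklearn` is installed as `scikit-learn`), so the result is a hint, not a ready-made install list.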

## Checking if we are on Colab

We can check whether we are running on Colab by testing whether the `google.colab` module has been loaded.

This lets us build something that takes a different action depending on the answer. For example, we might install a package or grab a data file only when we are on Colab. The following code does the check:

In [None]:
import sys
IN_COLAB = 'google.colab' in sys.modules

print(f"Am I in Colab? {IN_COLAB}")

## Example Code Block

This code snippet checks whether we are in Colab. If we are, it downloads the data files (the `-nc` flag tells `wget` to skip a file that already exists) and points the file path variables at the downloaded copies. If we are not in Colab, it points them at the local files instead.

In [2]:
import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    !wget -nc https://raw.githubusercontent.com/AkeemSemper/ML_Introduction_for_non_Analysts/combo_content/data/titanic_train.csv
    FILE_PATH = 'titanic_train.csv'
    !wget -nc https://raw.githubusercontent.com/AkeemSemper/ML_Introduction_for_non_Analysts/combo_content/data/sportsref_download.xlsx
    FILE_PATH_2 = 'sportsref_download.xlsx'
else:
    FILE_PATH = '../data/titanic_train.csv'
    FILE_PATH_2 = '../data/sportsref_download.xlsx'

df = pd.read_csv(FILE_PATH)
df.head()

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |
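The path-selection part of the cell above can also be factored into a small function, which keeps the Colab/local decision in one place when a notebook uses several data files. This is only a sketch: the name `resolve_data_path` and the `local_dir` parameter are hypothetical, and it assumes local data lives in `../data` as in the example.

```python
from pathlib import PurePosixPath


def resolve_data_path(filename, in_colab, local_dir="../data"):
    """Return where a data file is expected to be.

    On Colab, wget saves the file into the working directory, so the
    bare filename is enough. Locally, the file sits in local_dir.
    """
    if in_colab:
        return filename
    return str(PurePosixPath(local_dir) / filename)


print(resolve_data_path("titanic_train.csv", in_colab=True))
print(resolve_data_path("titanic_train.csv", in_colab=False))
```

The function only builds the path. Downloading the file on Colab would still be done with `wget` as shown earlier.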


In [6]:
import plotly.graph_objects as go
import numpy as np

# Define the function to optimize
def f(x, y):
    return x**2 + y**2

# Define the gradient of the function
def gradient(x, y):
    return np.array([2*x, 2*y])

# Define the learning rate and number of iterations
learning_rate = 0.1
num_iterations = 100

# Initialize the starting point
x = -5
y = -5

# Perform gradient descent
trajectory = []
for i in range(num_iterations):
    trajectory.append((x, y))
    grad = gradient(x, y)
    x -= learning_rate * grad[0]
    y -= learning_rate * grad[1]

# Create the surface plot
x_vals = np.linspace(-6, 6, 100)
y_vals = np.linspace(-6, 6, 100)
X, Y = np.meshgrid(x_vals, y_vals)
Z = f(X, Y)

# Create the trajectory plot
trajectory_x = [point[0] for point in trajectory]
trajectory_y = [point[1] for point in trajectory]
trajectory_z = [f(point[0], point[1]) for point in trajectory]

# Create the figure
fig = go.Figure()

# Add the surface plot
fig.add_trace(go.Surface(x=x_vals, y=y_vals, z=Z, colorscale='Viridis', showscale=False))

# Add the trajectory plot
fig.add_trace(go.Scatter3d(x=trajectory_x, y=trajectory_y, z=trajectory_z, mode='lines', line=dict(color='red', width=3)))

# Set the layout
fig.update_layout(scene=dict(xaxis_title='X', yaxis_title='Y', zaxis_title='Z'))

# Show the plot
fig.show()
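For the bowl-shaped function `x**2 + y**2`, each gradient descent step has a simple closed form: since the gradient in `x` is `2*x`, one update gives `x - learning_rate * 2 * x`, which is `x` multiplied by `(1 - 2 * learning_rate)`. With a learning rate of 0.1, every step shrinks the coordinate by a factor of 0.8. The short check below, a sketch that reuses the same learning rate, iteration count, and starting value as the cell above, compares the loop against that formula.

```python
learning_rate = 0.1
num_iterations = 100
start = -5.0

# Iterative update, one coordinate only (y behaves identically)
x = start
for _ in range(num_iterations):
    x -= learning_rate * 2 * x

# Closed form: each step multiplies x by (1 - 2 * learning_rate)
closed_form = start * (1 - 2 * learning_rate) ** num_iterations

print(x, closed_form)
```

Both values are extremely close to zero, which is why the red trajectory in the plot ends at the bottom of the bowl.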


In [11]:
import plotly.graph_objects as go
import numpy as np

# Define the function to optimize
def f(x, y):
    r = np.sqrt(x**2 + y**2)
    return np.sin(r) / r

# Define the gradient of the function
def gradient(x, y):
    r = np.sqrt(x**2 + y**2)
    # Chain rule: df/dr = cos(r)/r - sin(r)/r**2, with dr/dx = x/r and dr/dy = y/r
    dfdr = np.cos(r) / r - np.sin(r) / r**2
    return np.array([dfdr * x / r, dfdr * y / r])

# Define the learning rate and number of iterations
learning_rate = 0.1
num_iterations = 100

# Initialize the starting point
x = -5
y = -5

# Perform gradient descent (plain, not stochastic: the full gradient is used at every step)
trajectory = []
for i in range(num_iterations):
    trajectory.append((x, y))
    grad = gradient(x, y)
    x -= learning_rate * grad[0]
    y -= learning_rate * grad[1]

# Create the surface plot
x_vals = np.linspace(-6, 6, 100)
y_vals = np.linspace(-6, 6, 100)
X, Y = np.meshgrid(x_vals, y_vals)
Z = f(X, Y)

# Create the trajectory plot
trajectory_x = [point[0] for point in trajectory]
trajectory_y = [point[1] for point in trajectory]
trajectory_z = [f(point[0], point[1]) for point in trajectory]

# Create the figure
fig = go.Figure()

# Add the surface plot
fig.add_trace(go.Surface(x=x_vals, y=y_vals, z=Z, colorscale='Viridis', showscale=False))

# Add the trajectory plot
fig.add_trace(go.Scatter3d(x=trajectory_x, y=trajectory_y, z=trajectory_z, mode='markers', marker=dict(color='red', size=3)))

# Set the layout
fig.update_layout(scene=dict(xaxis_title='X', yaxis_title='Y', zaxis_title='Z'))

# Show the plot
fig.show()
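A hand-derived gradient is easy to get slightly wrong, so a common sanity check is to compare it against a finite-difference estimate. The sketch below does this for `sin(r) / r` at the starting point `(-5, -5)`. The analytic form is written from the chain rule, `(x / r) * (cos(r) / r - sin(r) / r**2)` for the `x` component, and the helper `numerical_gradient` and the step size `h` are assumptions chosen for illustration.

```python
import numpy as np


def f(x, y):
    r = np.sqrt(x**2 + y**2)
    return np.sin(r) / r


def gradient(x, y):
    # Chain rule: df/dr times dr/dx (= x/r) and dr/dy (= y/r)
    r = np.sqrt(x**2 + y**2)
    dfdr = np.cos(r) / r - np.sin(r) / r**2
    return np.array([dfdr * x / r, dfdr * y / r])


def numerical_gradient(x, y, h=1e-6):
    # Central differences: (f(x + h) - f(x - h)) / (2h) in each direction
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return np.array([dfdx, dfdy])


analytic = gradient(-5.0, -5.0)
numeric = numerical_gradient(-5.0, -5.0)
print("analytic:", analytic)
print("numeric: ", numeric)
print("match:", np.allclose(analytic, numeric, atol=1e-6))
```

Note that `f` is undefined at exactly `r = 0` because of the division by `r`, so the check should be run away from the origin.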