Goals:
- Set up Python and Jupyter
    - Show Colab
    - Show VS Code
    - Show miniconda
    - Set up conda environment
- Get data
    - UCI, pick regression task
    - Put data on GitHub
    - Show how to access data in Colab
- Load data
    - Load with numpy
    - Load with pandas
- Show data
    - Matplotlib
    - Pick a couple features
- Regression
    - Derive gradient
    - Gradient descent

# Set up Python & Jupyter

### Google Colab
- runs in the cloud
- comes with most popular libraries preinstalled
- has good support for widgets (better for data visualization)
- [https://colab.research.google.com](https://colab.research.google.com)
- supports cell magics and shell commands for finer control (installing extra packages, [etc.](https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.05-IPython-And-Shell-Commands.ipynb#scrollTo=WclyQSP7cvmP))

### VS Code
- runs locally (though it can also connect to a remote machine)
- better for larger projects with multiple Python modules
- you manage the Python environment yourself
- [https://code.visualstudio.com/](https://code.visualstudio.com/)

### Miniconda
- Set up Python environments without messing with your system environment
- No need for its larger sibling, Anaconda
- [https://www.anaconda.com/docs/getting-started/miniconda/main](https://www.anaconda.com/docs/getting-started/miniconda/main)

### Set up an environment
- Open a terminal
- (ahead of time) `conda deactivate && clear`
- `conda create -n recitation python=3.12 jupyter numpy pandas matplotlib`
- `conda activate recitation`
- Export an environment's package list to a YAML file: `conda env export > env.yaml`
- Remove an environment: `conda env remove -n recitation`
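
Once the environment is active, it can help to confirm that the packages from the `conda create` command are importable. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this example.

```python
from importlib.util import find_spec

# Packages requested in the `conda create` command above
REQUIRED = ["numpy", "pandas", "matplotlib"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported missing, the most likely cause is that the `recitation` environment is not the active one.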

# Get Data

### UCI's Machine Learning Repository
- [https://archive.ics.uci.edu](https://archive.ics.uci.edu)
- View datasets
- Find a dataset for regression that looks fun
    - Task: Regression
    - Data type: Multivariate
- Download

### Put data on GitHub
- [https://github.com](https://github.com)
- Create repository (empty)
- Copy repo URL
- In the VS Code terminal: `cd ~/Developer`
- `git clone <repo URL>`
- Open in VS Code
- Add to the repo: this notebook, the README.md, and the data folders
    - Notebooks are optional. I recommend using Colab, but will set up both here.
- Push the data to GitHub
    - `cd <new repo>`
    - `git branch -M main`
    - `git add .`
    - `git commit -m "Initial commit"`
    - `git push -u origin main`
    - Publish branch
- View data on GitHub

### Data in Colab
- In a code cell: `!wget <URL>` (use the file's raw link on GitHub)
- Browser
    - Open data in side panel
- Edit the cell to suppress the download output:
```
%%capture
!wget <URL>
```
- New cell:
```
from pathlib import Path
DATA_PATH = "imports-85.data"
Path(DATA_PATH).exists()
```
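
Outside Colab, `!wget` may not be available. A possible pure-Python alternative is sketched below; `fetch_if_missing` is a hypothetical helper, and the URL stays a placeholder for the raw link copied from GitHub.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_if_missing(url, dest):
    """Download `url` to `dest` unless the file is already there."""
    dest = Path(dest)
    if not dest.exists():
        # Only reaches the network when the file is absent
        urlretrieve(url, str(dest))
    return dest


# Usage (replace <URL> with the raw link to the data file):
# fetch_if_missing("<URL>", "imports-85.data")
```

Skipping the download when the file already exists keeps the cell cheap to re-run.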

# Load Data

In [None]:
# Check existence of data
from pathlib import Path
DATA_PATH = "imports-85.data"
Path(DATA_PATH).exists()

### Numpy

In [None]:
import numpy as np

In [None]:
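# Note: this raises a ValueError on imports-85.data, because the file is
# comma-separated and mixes text fields with numbers
np.loadtxt(DATA_PATH)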
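
NumPy can still read files like this if it is told the delimiter and restricted to numeric columns. The sketch below uses `np.genfromtxt` on a few made-up rows in the same style (comma-separated, text in the first field, `?` for a missing value). The rows and column positions are illustrative only, not taken from the real file.

```python
from io import StringIO

import numpy as np

# Made-up sample rows: fuel type, engine size, city mpg, price
sample = StringIO(
    "gas,130,21,13495\n"
    "gas,152,19,16500\n"
    "diesel,109,24,?\n"
)

arr = np.genfromtxt(
    sample,
    delimiter=",",          # fields are comma-separated
    usecols=(1, 2, 3),      # skip the text column
    missing_values="?",     # treat "?" as missing
    filling_values=np.nan,  # and store it as NaN
)
print(arr.shape)
print(arr)
```

For the real file, `sample` would be replaced by `DATA_PATH` and `usecols` by the indices of the numeric columns of interest.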

### Pandas

Let's see how these features relate (zero-based column index in parentheses):
- Engine size (16) 
- City mpg (23)
- Highway mpg (24)
- Price (25)

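In [None]:
import pandas as pd

# The file has no header row, and "?" marks missing values
df = pd.read_csv(DATA_PATH, header=None, na_values="?")
data = df.iloc[:, [16, 23, 24, 25]].dropna()  # zero-based column indices; drop rows with missing values
data.columns = ["engine-size", "city-mpg", "hwy-mpg", "price"]
data

# Viewing data

In [None]:
import matplotlib.pyplot as plt

Pick a pair of features

In [None]:
x_label = "engine-size"
y_label = "price"
plt.scatter(data[x_label], data[y_label])
plt.xlabel(x_label)
plt.ylabel(y_label)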

# Regression

Dr. Ruozzi discussed linear regression in lecture. 

What's the goal of supervised learning? (Slide 10)

What hypothesis space do we use for linear regression? (Slide 13)

What's a loss function? Which loss function is suggested? (Slide 16)

How do we use a loss function? (Slide 16)

What's an alternative to an exact solution? (Slide 21)

## From Theory to Practice

How do we use that alternative?

Implement it!

In [None]:
# Operation with vectors
x = data[x_label]
y = data[y_label]
a = 1
b = 0

# IMPLEMENTATION OF PARTIAL w.r.t $a$ HERE
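
One way to fill in the partial derivative, assuming the hypothesis $f(x) = ax + b$ and a mean squared error loss over $n$ data points. The lecture slides may scale the loss differently (for example with a factor of $\frac{1}{2}$), which changes the constant in front but not the direction of the gradient.

$$J(a, b) = \frac{1}{n}\sum_{i=1}^{n}\left(a x_i + b - y_i\right)^2$$

$$\frac{\partial J}{\partial a} = \frac{2}{n}\sum_{i=1}^{n}\left(a x_i + b - y_i\right)x_i, \qquad \frac{\partial J}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}\left(a x_i + b - y_i\right)$$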

In [None]:
# Check that this is a reasonable first value
x[0], y[0]

In [None]:
# Plug those values in


$$\theta_{t+1} = \theta_t - \gamma_t\nabla J(\theta_t)$$

In [None]:
# Define functions of (a, b, x, y)
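
A minimal sketch of what these functions could look like, assuming the model $f(x) = ax + b$ and a mean squared error loss, so that $\theta = (a, b)$ in the update rule. The function names and the toy data are made up for illustration. A finite-difference check is included as a quick way to catch sign or scaling mistakes in the hand-derived gradient.

```python
import numpy as np


def predict(a, b, x):
    """Linear hypothesis f(x) = a * x + b."""
    return a * x + b


def mse_loss(a, b, x, y):
    """Mean squared error of the linear hypothesis."""
    return np.mean((predict(a, b, x) - y) ** 2)


def mse_gradient(a, b, x, y):
    """Partial derivatives of the MSE loss with respect to a and b."""
    residual = predict(a, b, x) - y
    grad_a = 2.0 * np.mean(residual * x)
    grad_b = 2.0 * np.mean(residual)
    return grad_a, grad_b


# Made-up toy data lying exactly on y = 2x + 1
x_toy = np.array([1.0, 2.0, 3.0, 4.0])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])

a, b = 1.0, 0.0
grad_a, grad_b = mse_gradient(a, b, x_toy, y_toy)

# Central finite differences should agree with the analytic gradient
eps = 1e-6
fd_a = (mse_loss(a + eps, b, x_toy, y_toy) - mse_loss(a - eps, b, x_toy, y_toy)) / (2 * eps)
fd_b = (mse_loss(a, b + eps, x_toy, y_toy) - mse_loss(a, b - eps, x_toy, y_toy)) / (2 * eps)
print(grad_a, fd_a)
print(grad_b, fd_b)

# One gradient descent step with step size gamma
gamma = 0.01
loss_before = mse_loss(a, b, x_toy, y_toy)
a_new, b_new = a - gamma * grad_a, b - gamma * grad_b
loss_after = mse_loss(a_new, b_new, x_toy, y_toy)
print(loss_before, loss_after)
```

With these starting values the loss should go down after the single step, which is a useful sanity check before writing the full loop.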

In [None]:
# Define hyperparameters (learning rate, initial values)

In [None]:
# Implement training loop
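
A possible shape for the training loop, sketched on synthetic data so it runs on its own. The constant learning rate, the step count, and the choice to standardize `x` first are assumptions of this example, not something fixed by the lecture. Standardizing is included because features on a scale of hundreds can force a very small learning rate before plain gradient descent stops diverging.

```python
import numpy as np


def gradient_descent(x, y, a=0.0, b=0.0, learning_rate=0.1, n_steps=500):
    """Fit y ~ a * x + b by gradient descent on the mean squared error."""
    history = []
    for _ in range(n_steps):
        residual = a * x + b - y
        grad_a = 2.0 * np.mean(residual * x)
        grad_b = 2.0 * np.mean(residual)
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
        history.append(np.mean((a * x + b - y) ** 2))
    return a, b, history


# Synthetic data (made up): a noisy line with a large-scale feature
rng = np.random.default_rng(0)
x_raw = rng.uniform(60.0, 330.0, size=200)
y_raw = 150.0 * x_raw - 6000.0 + rng.normal(0.0, 2000.0, size=200)

# Standardize x so one learning rate suits both parameters
x_mean, x_std = x_raw.mean(), x_raw.std()
x_scaled = (x_raw - x_mean) / x_std

a_scaled, b_scaled, history = gradient_descent(x_scaled, y_raw)

# Convert the parameters back to the original units of x
a_fit = a_scaled / x_std
b_fit = b_scaled - a_scaled * x_mean / x_std
print(a_fit, b_fit)

# Compare against the closed-form least-squares fit
print(np.polyfit(x_raw, y_raw, 1))
```

Comparing against `np.polyfit` is a convenient check that the loop converged to the least-squares solution rather than stalling early.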

In [None]:
# Get final model
_, _ = _

# Apply model to even intervals of x
pred_x = np.linspace(x.min(), x.max(), 10)
pred_y = None
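
Once the loop has produced final parameters, `pred_y` is just the model evaluated at each `pred_x`. A small sketch with placeholder values follows. Both `a_fit`, `b_fit` and the observations are hypothetical and stand in for the trained parameters and the real feature column.

```python
import numpy as np

# Hypothetical fitted parameters (stand-ins for the training result)
a_fit, b_fit = 150.0, -6000.0

# Made-up observations of the input feature
x_obs = np.array([70.0, 110.0, 150.0, 230.0, 320.0])

# Evaluate the line at 10 evenly spaced points across the observed range
pred_x = np.linspace(x_obs.min(), x_obs.max(), 10)
pred_y = a_fit * pred_x + b_fit
print(pred_x)
print(pred_y)
```

Plotting `pred_x` against `pred_y` on top of the scatter plot then shows the fitted line over the data.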

In [None]:
# Show result