# Software Upgrade
### `! git clone https://github.com/ds4e/upgrade`

## Software Upgrade
- A lot of people work on Google Colab, and there's nothing wrong with that! But it can be frustrating to use Colab for a variety of reasons.
- At this point in the semester, regardless of where you started, you've played with Python and GitHub enough to understand packages, import statements, and some GitHub. Since you've seen the tools a bit, now is a great time to move over to a new IDE.
- I want to help people jump from Google Colab or Jupyter to VS Code: Its GitHub integration is very smooth, in part because Microsoft owns both GitHub and VS Code
- Disclaimer: I have literally never used a Mac and only use RStudio in Windows. I do not know what happens when you turn on a Mac. One reason I use VS Code below is to avoid dealing with your operating systems directly.
- If you use Windows, I highly recommend using the Windows Subsystem for Linux (WSL)
- In general, knowing Linux is worth it: Android is built on the Linux kernel, most servers run Linux, and it's a lot easier to work in than Windows

## Required Software
- There is some software you need installed:
    1. Python, version 3: https://www.python.org/downloads/
    2. Git: https://git-scm.com/downloads 
    3. VS Code: https://code.visualstudio.com/ 
    4. Pip should be included automatically with Python, but if it is missing you can install it separately: https://pip.pypa.io/en/stable/installation/
- The details of this depend on your computer a lot; I can't really help you with this

# Some Command Line Interface

## Opening a Command Line/Terminal Window
- We want to talk directly to the computer:
    - On Windows, press the Windows key, type `cmd`, and press Enter
    - On a Mac, click Applications > Utilities > Terminal
    - On Linux, if it has a desktop interface, CTRL+ALT+T or there's a terminal option in the menu
- This lets you talk directly to your operating system, as it were, and ask it to run programs for you
- Typically, the command line also tells you where you are in the computer's file system

<img src="./src/cli.png" width="500">

- Type `dir` on Windows, or `ls` on Mac/Linux, to see a list of the files and directories at your current location
- Use `cd <directory>` to navigate into `<directory>` and `cd ..` to go up a level in the filesystem; `cd <path>` will take you directly to `<path>`
- To make a new directory, `mkdir <name>`

<img src="./src/shell_commands.png" width="500">
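- If it helps to see these commands in a language you already know, here is a rough Python sketch of the same moves (list, change directory, make a directory). The function name `tour` and the folder `demo_project` are made up for illustration, and everything happens in a throwaway temporary directory so nothing on your computer is touched

```python
import os
import tempfile
from pathlib import Path


def tour(start: Path) -> list[str]:
    """Make a directory, move into it and back out, then list what is in `start`."""
    original = Path.cwd()
    try:
        os.chdir(start)                  # like: cd <path>
        Path("demo_project").mkdir()     # like: mkdir demo_project
        os.chdir("demo_project")         # like: cd demo_project
        os.chdir("..")                   # like: cd ..
        return sorted(os.listdir("."))   # like: ls (or dir on Windows)
    finally:
        os.chdir(original)               # go back to where we started


# Work in a scratch directory that is deleted afterwards
with tempfile.TemporaryDirectory() as scratch:
    print(tour(Path(scratch)))
```

- The point is only that the terminal commands and the `os`/`pathlib` calls are doing the same thing: asking the operating system about the file system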

## Getting Started with VS Code
- I think it's great to be experienced and comfortable using the command line
- But VS Code certainly flattens the learning curve for using GitHub and Python more effectively
- Open a terminal and navigate somewhere "safe" to work
- Create a new directory, and open that folder in VS Code by typing `code <folder name>` or cd'ing into it and typing `code .`

<img src="./src/launch_code.png" width="500">

## Using VS Code
- In VS Code, click `Terminal`, `New Terminal`. A terminal panel should pop up at the bottom of your VS Code session, already in the directory where we're working
- Type
    - `python3 --version`
    - `pip --version`
- If you get an error, Python or pip is either not installed or not on your system's path
- On some Macs and Linux distributions, especially older ones, `python` refers to version 2 of Python, but we want version 3. If `python --version` reports version 3, go ahead and just type `python` instead of `python3`. On Windows, the command is often `python` or `py` rather than `python3`

<img src="./src/vscode_terminal.png" width="500">
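- You can also ask Python itself which version is running and whether it can see pip. This is a minimal sketch using only the standard library; `describe_interpreter` is a name I made up, and "can import pip" is only a rough stand-in for "the `pip` command works"

```python
import importlib.util
import sys


def describe_interpreter() -> dict:
    """Report the running Python version and whether pip is importable."""
    return {
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "is_python3": sys.version_info.major == 3,
        # find_spec returns None when the module cannot be found
        "has_pip": importlib.util.find_spec("pip") is not None,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```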

## Nice VS Code Features
- `CTRL + ENTER` : Run the code chunk
- `ALT + ENTER` : Run this code chunk, and open another of the same kind below
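- `CTRL + K`, then `CTRL + SHIFT + \` : Split a cell at the cursor (`CTRL + C` is copy; search for "Split Cell" in Keyboard Shortcuts if your bindings differ)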
- `SHIFT + TAB` : Reverse indent selected lines of code
- `CTRL + /` : Comment out selected lines of code

# Virtual Environments

## Virtual Environments
- As a programmer working on many different projects, you will probably need many different versions of packages
- This becomes a problem: I don't necessarily want everything related to machine learning installed in the same space as everything related to, say, web design. Or, one project needs version 1.2 of a package while another project needs version 2.4 of the same package.
- In order to compartmentalize your projects, Python includes a concept of a "virtual environment", where packages are installed into a directory with more limited scope
- This makes it easier to tailor your computing environment to the task at hand

## Virtual Environments 
- A virtual environment is just a directory with the desired packages (e.g. NumPy, SciPy, scikit-learn) installed in it, typically kept in the same directory as your code
- Often, it's named just `.venv`
- VS Code can "see" your virtual environments and make them available as **kernels** for your work
- This allows us to curate our computing environments and manage them more effectively

## Creating/Activating a Virtual Environment
- In the terminal panel, type `python3 -m venv test_venv`
- You should see a new directory appear called `test_venv`. This is like a fresh install of Python, with no packages already downloaded
- To switch to this virtual environment, type
    - Mac/Linux: `source test_venv/bin/activate`
    - Windows: `test_venv\Scripts\activate`
- You should see `(test_venv)` appear in your command line, telling you the virtual environment is activated
- In both cases, there is a script called `activate`, and you are telling your operating system to run it
- To turn the virtual environment off, type `deactivate`

<img src="./src/activate_venv.png" width="500">
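- To see that a virtual environment really is "just a directory", here is a sketch that builds one with the standard library's `venv` module and looks inside. The helper names `make_env` and `activate_script` are mine, and I pass `with_pip=False` only to keep the demo fast; `python3 -m venv test_venv` at the terminal installs pip for you

```python
import os
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str = "test_venv") -> Path:
    """Create a bare virtual environment at parent/name and return its path."""
    env_dir = parent / name
    # symlinks on Mac/Linux mirrors the default of the command-line tool
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_dir


def activate_script(env_dir: Path) -> Path:
    """Where the activate script lives: Scripts\\ on Windows, bin/ elsewhere."""
    if os.name == "nt":
        return env_dir / "Scripts" / "activate.bat"
    return env_dir / "bin" / "activate"


with tempfile.TemporaryDirectory() as scratch:
    env_dir = make_env(Path(scratch))
    print("Config file exists:", (env_dir / "pyvenv.cfg").exists())
    print("Activate script:", activate_script(env_dir).name)
```

- The `pyvenv.cfg` file is what marks the directory as a virtual environment; the `activate` script just adjusts your terminal so `python` and `pip` point into that directory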

## Virtual Environments and .ipynb
- In VS Code, right click in the file panel and create a new file called `test.ipynb`
- Open `test.ipynb` as a Jupyter notebook, click on `Select Kernel` in the upper right. Under `Python Environments` you should see `test_venv`
- This will probably set off a bunch of installing packages and VS Code extensions, particularly `ipykernel`, in order to use VS Code and Jupyter together; just approve everything
- Now your terminal and ipynb file are both using the same `test_venv` virtual environment

<img src="./src/select_kernel.png" width="500">
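- If you are ever unsure which environment a notebook is using, a cell like this one can tell you. It is a small sketch that relies on the fact that, inside a virtual environment, `sys.prefix` differs from `sys.base_prefix`; the function name is my own

```python
import sys


def in_virtual_environment() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


# The path printed here should sit inside your environment's directory
print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```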

## Installing Packages into the Virtual Environment
- In the `test.ipynb` notebook, try importing NumPy; it should fail
- In the terminal at the bottom of VS Code -- with the virtual environment activated -- type `pip3 install numpy`
- In the `test.ipynb` notebook, try importing NumPy again; it should succeed 
- You can quickly set up an environment for our class with the command
    - `pip install numpy matplotlib pandas seaborn scikit-learn`
- Remember, in order to install into your virtual environment, **it must be activated in the terminal where you're running the pip command**
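- Here is a sketch of a notebook cell that checks whether the class packages landed in the environment your kernel is using. The dictionary and function names are made up for this example; note that the name you `pip install` is not always the name you `import` (scikit-learn is imported as `sklearn`)

```python
import importlib.util

# pip install name -> name used in the import statement
CLASS_PACKAGES = {
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(packages: dict[str, str]) -> list[str]:
    """Return the pip names of packages the current interpreter cannot import."""
    return [
        pip_name
        for pip_name, import_name in packages.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages(CLASS_PACKAGES)
if missing:
    print("Still missing; try: pip install " + " ".join(missing))
else:
    print("All class packages are importable")
```

- If this reports missing packages right after you installed them, the usual culprit is that the terminal and the notebook kernel are pointed at different environments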

## Hiding Your Virtual Environment
- You typically want to hide the virtual environment, since it's not really part of your project but instead part of Python's interpreter and libraries
- To set up a hidden virtual environment, put a . at the beginning of the name of the virtual environment: `python3 -m venv .secret_venv`
- In my VS Code session, I can see the virtual environment in the files pane, but it does not show up when I type `ls` in the terminal (`ls -a` will show it)
- Most importantly, it does show up as an option for picking a kernel for VS Code
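- The "hiding" is only a naming convention: The directory is still there, and tools simply choose not to list names that start with a dot. A small sketch of that idea, with the made-up helper `split_hidden` and a throwaway directory standing in for your project:

```python
import tempfile
from pathlib import Path


def split_hidden(folder: Path) -> tuple[list[str], list[str]]:
    """Separate names in `folder` into (visible, hidden) by the leading-dot rule."""
    names = sorted(entry.name for entry in folder.iterdir())
    visible = [name for name in names if not name.startswith(".")]
    hidden = [name for name in names if name.startswith(".")]
    return visible, hidden


with tempfile.TemporaryDirectory() as scratch:
    project = Path(scratch)
    (project / ".secret_venv").mkdir()       # stands in for the environment
    (project / "analysis.ipynb").touch()     # stands in for your work
    print(split_hidden(project))
```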

# Working with VS Code and GitHub

## Creating a New Repo on GitHub
- Now's a good time to go to GitHub and create your own repo: On your GitHub page, click `Repositories` and then the green `New` button
- Call it something like `test`. Add a `README` file so it has content when initialized. It can be public or private: We'll sign into GitHub from VS Code, which handles access either way
- Starting your project on GitHub with a `README` file and cloning it avoids a bunch of tedious setup, like connecting a local folder to GitHub by hand

<img src="./src/new_repo.png" width="500">

## Cloning a Local Copy
- Now clone your test repo in the terminal panel: `git clone https://github.com/username/test`
- In the terminal panel, type `code test` (or `code <your repo's name>`) to launch a new session of VS Code, inside that repo
- Open a terminal panel in your new VS Code session, and type `git status` to check that Git is running
- To create a new notebook, right-click in the file panel, choose `New File`, and name it `new_notebook.ipynb`
- You'll need to create a new virtual environment, but you can do that inside VS Code: Click `Select Kernel`, then `+ Add New Python Environment`, `Venv`, and then the version of Python 3 that you use
- This creates a hidden virtual environment called `.venv` that you can activate and manage, but it will be ignored by Git

<img src="./src/clone.png" width="500">

- To set up your new virtual environment, 
    1. Open a new terminal window
    2. Type `source .venv/bin/activate` (Mac/Linux) or `.venv\Scripts\activate` (Windows) to activate the virtual environment
    3. Type `pip install numpy matplotlib pandas seaborn scikit-learn`, or whatever list of packages you need
- This gives `.venv` the main packages you are used to having available in Colab
- If you come back to this repo, it will have the virtual environment set up, so there's no need to activate it in the terminal window unless you need to install new packages: You can just select `.venv` from `Select Kernel` as you work

## Signing into GitHub from VS Code
- Click the "Person" icon in the bottom left, "Turn on Cloud Changes...", then "Sign in with GitHub". This will connect VS Code to your GitHub account and manage all the security issues.
- If you click the person again, you should see your GitHub username mentioned.
- When cloning a private repo, this can save you substantial typing: Open a VS Code session, open a terminal, log in to GitHub through VS Code, use the terminal to clone the private repo, type `code <repo>` to launch a new session in the cloned repo 

<img src="./src/github_sign_in.png" width="500">

# VS Code and GitHub

- Do some work in your new .ipynb file in your test directory
- Click the Source Control icon in the left sidebar; it looks like a little branching tree
- Now you can fetch from, pull from, and push back to your repos from VS Code:
    1. Write a commit message and press `Commit`; you should see your new commit show up in the log below
    2. Now you can click `Sync Changes` or `Push`, and VS Code will push your work back to GitHub
    3. If you want to update your local copy with other changes that have hit GitHub, click the "downarrow" `Pull` button on the bottom half of the Source Control panel
- This makes working with GitHub outrageously easy 

<img src="./src/github_panel.png" width="500">
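- Under the hood, the buttons run ordinary Git commands. Here is a rough sketch of the stage/commit part, driven from Python in a throwaway repo. The helper names, file name, and demo identity are all made up; push and pull are left out because this scratch repo has no GitHub remote, and the whole thing is skipped if `git` isn't installed

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its trimmed output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def commit_demo(repo: Path) -> str:
    """Create a repo, commit one file, and return the one-line log."""
    git(repo, "init")
    # Identity set only inside this scratch repo so the commit cannot fail
    git(repo, "config", "user.name", "Demo Student")
    git(repo, "config", "user.email", "student@example.com")
    git(repo, "config", "commit.gpgsign", "false")
    (repo / "notes.txt").write_text("first draft\n")
    git(repo, "add", "notes.txt")            # stage the change
    git(repo, "commit", "-m", "Add notes")   # roughly what Commit does
    return git(repo, "log", "--oneline")     # the history VS Code displays


if shutil.which("git") is None:
    print("git is not installed or not on the path")
else:
    with tempfile.TemporaryDirectory() as scratch:
        print(commit_demo(Path(scratch)))
```

- In a cloned repo, `git push` and `git pull` at the terminal correspond to the `Push`/`Sync Changes` and `Pull` buttons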

## Git Commands
- The little arrow/circle buttons let you issue some useful Git commands:
    - The dashed downarrow is *Fetch*: Download changes for user to review
    - The solid downarrow is *Pull*: Download changes and merge them into your local copy
    - The uparrow into the cloud is *Publish Branch*: Return work to GitHub, potentially as a new branch or repo 
- The `...` by **SOURCE CONTROL** allows you to run most Git commands from VS Code

# Branches

## Branches
- In your groups, you probably want to work on features of a project separately but in parallel
- One nice way to do this is **branches**: Splits in the tree underlying the Git version control system, allowing people to work on different parts of the project at the same time without creating conflicts

## Creating a Branch
- At the command line, creating a branch in Git is easy, but it is also easy to say "Git" when you really mean "GitHub": A branch you create locally does not exist on GitHub until you push it
- To create a new branch on GitHub for your repo, go to the repo, and click on the `main` dropdown button with the branch icon
- Type the name of the branch `<name>` you want to create, then click `Create branch <name> from main`
- This creates "parallel" copies of the project that different people can work on, that can hopefully be **merged** back together later

<img src="./src/new_branch.png" width="500">

## Listing/Switching Branches
- First, clone the repo or pull the current state of the repo 
- To see what branches are available, at the command line in the repo, type `git branch -r`
- To switch to working on another branch, `git checkout <branch name>`
- To check your current branch, `git branch`
- Follow these steps to work on your branch, make a commit, and then push the changes back

<img src="./src/get_new_branch.png" width="500">

<img src="./src/commit_to_new_branch.png" width="500">
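- To practice the branch commands without touching a real project, here is a sketch that runs them in a throwaway repo from Python. The branch name `feature` and the helper names are made up, the branch is created locally with `git branch` because this scratch repo has no GitHub remote, and nothing runs if `git` isn't installed

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its trimmed output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def branch_demo(repo: Path) -> tuple[str, list[str]]:
    """Create and switch to a branch; return (current branch, all branches)."""
    git(repo, "init")
    git(repo, "config", "user.name", "Demo Student")
    git(repo, "config", "user.email", "student@example.com")
    git(repo, "config", "commit.gpgsign", "false")
    # A branch needs at least one commit to point at
    git(repo, "commit", "--allow-empty", "-m", "Initial commit")
    git(repo, "branch", "feature")        # create the branch locally
    git(repo, "checkout", "feature")      # switch to it
    lines = git(repo, "branch").splitlines()
    # `git branch` marks the current branch with a leading asterisk
    current = next(line.lstrip("* ") for line in lines if line.startswith("*"))
    return current, sorted(line.lstrip("* ") for line in lines)


if shutil.which("git") is None:
    print("git is not installed or not on the path")
else:
    with tempfile.TemporaryDirectory() as scratch:
        current, branches = branch_demo(Path(scratch))
        print("On branch:", current)
        print("All branches:", branches)
```

- The second branch in the list is whatever your Git installation uses as its default name, usually `main` or `master`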

## Pull/Merge Requests
- Back on GitHub, the repo will recognize the push to the branch: There will be a yellow box reporting the push, and a green button saying `Compare & pull request` 
- Clicking the button takes you to a comparison of the proposed changes and a check of whether they can be merged without conflicts; if they can, clicking through the green buttons will merge the changes on the development branch into the main branch, updating the whole project
- As a group, you should practice creating branches, pushing, and handling pull requests with the labs: It's a great way to collaborate on the labs and learn how to use GitHub for group work

<img src="./src/pull_request.png" width="500">

<img src="./src/pull_request_1.png" width="500">

<img src="./src/merge.png" width="500">

## Managing Teams
- Ideally, you should always work on development branches and merge big changes into the main branch, rather than working directly on the main branch and pushing changes to it
- Honestly, though, the simplest approach is probably to have people work in different .ipynb or .py files on the main branch, so that pulls are really easy to handle and branching/merging isn't necessary
- Otherwise, you have to work out some understanding of who is working on what part of the project when, to avoid merge conflicts

# Conclusion

## Miniconda
- It might strike you: "Wow, for data science/machine learning, virtual environments are really inefficient"
- Indeed! Having dozens of copies of NumPy, Pandas, scikit-learn, and Keras on your computer will take up a lot of space
- An alternative system has been created that has a "centralized" set of virtual environments that can be accessed from any repo
- This is called **Miniconda**: https://www.anaconda.com/docs/getting-started/miniconda/main 
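- The integration with VS Code is very similar through `Select Kernel`, and there are hardware-related reasons to use Miniconda over venv: For example, conda can install non-Python dependencies, such as GPU libraries, alongside your Python packages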
- Installing requires some work: https://www.anaconda.com/docs/getting-started/miniconda/install#quickstart-install-instructions
- Creating environments, activating them, and so on are very similar to venv

## Conclusion
- You don't "need" to use VS Code: There are lots of great IDEs and editors out there (PyCharm, Sublime Text, Jupyter, Spyder)
- But VS Code is a great tool for increasing your productivity locally
- Especially if you are working on Colab and starting to find its limitations frustrating, now is a great time to switch