# Section 3: GitHub Tutorial and Project Ideas

## Version Control

Version control (or revision control, or source control) is all about managing multiple versions of documents, programs, and web sites.

**A version control system does these things:**
- Keeps multiple (older and newer) versions of everything (not just source code).
- Requests comments regarding every change.
- Allows “check in” and “check out” of files so you know which files someone else is working on.
- Displays differences between versions.
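A minimal sketch of three of these features in Git itself: older and newer versions kept side by side, a comment (commit message) on every change, and a diff between versions. It runs in a throwaway directory; the file name `notes.txt`, the messages, and the demo identity are made up for illustration, and `git init -b` assumes Git 2.28 or newer.

```shell
set -e
# Work in a throwaway directory with a made-up identity so nothing real is touched
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q -b main demo
cd demo

# Version 1, recorded with a comment describing the change
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"

# Version 2, also with its own comment
echo "second draft" > notes.txt
git commit -q -am "Revise notes"

git log --oneline            # both versions, newest first, with their messages
git diff HEAD~1 HEAD         # differences between the two versions
git show HEAD~1:notes.txt    # the older version is still available: first draft
```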

**Benefits of version control:**
- For working by yourself: Gives you a “time machine” for going back to earlier versions.
- For working by yourself: Gives you great support for different versions of the same basic project.
- For working with others: Greatly simplifies concurrent work and merging changes.
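For the single-user benefits, here is a sketch of the "time machine" (`git restore` from an earlier commit) and of keeping a different version of the same project on its own branch (`git switch -c`). File contents and the branch name `experiment` are illustrative; `git restore` and `git switch` need Git 2.23 or newer, and `git init -b` needs 2.28 or newer.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q -b main project
cd project
echo "version 1" > model.txt
git add model.txt
git commit -q -m "Version 1"
echo "version 2 (broken)" > model.txt
git commit -q -am "Version 2"

# Time machine: bring the file back as it was one commit ago, then record that
git restore --source=HEAD~1 model.txt
git commit -q -am "Go back to version 1"

# A different version of the same project lives on its own branch
git switch -q -c experiment
echo "version 3 (experimental)" > model.txt
git commit -q -am "Try an experimental version"

# main is unaffected by the experiment
git switch -q main
cat model.txt                     # version 1
git log --oneline --graph --all   # both lines of history
```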

## GitHub Tutorial

**Git** is a free and open source distributed **version control system** designed to handle everything from small to very large projects with speed and efficiency.

**GitHub** is a **web-based Git repository hosting service** that offers all of the distributed revision control and source code management (SCM) functionality of Git and adds its own features.

### 1. Install Git

- Installing

    Open the Terminal / Command Prompt and type in the following to check whether Git is installed:

    `git --version`

    On macOS, if Git is not installed, this command will prompt you to install it.

    To install Git on Windows, or to get an updated version, go to https://git-scm.com/download/mac or https://git-scm.com/download/win

- Set up (only needs to be done once)

    Set up your Git configuration as follows, so that every commit you make is labeled with your name and email (which makes collaboration easier):

    `git config --global user.name "<your_name_here>"`

    `git config --global user.email "<your_email@email.com>"`
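A sketch for checking that the one-time setup worked. The name and email are placeholders, and the script points `HOME` at a throwaway directory (and unsets `XDG_CONFIG_HOME`) so that running it does not change your real configuration; leave those two lines out when you configure your own machine.

```shell
set -e
# Demo only: isolate the "global" config in a throwaway home directory
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.name "Your Name"
git config --global user.email "you@example.com"

git config --global --list   # lines such as user.name=Your Name
git config user.name         # the name Git will record on your commits
```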

### 2. Create a new repository & Clone a remote repository

**Your project team should create a GitHub repository.** Each team member should have push access to the repository. 

**Add a file named README.md to the repository**, in which you state the name of your project, list the names and NetIDs of the project members, and describe your project in a paragraph or two.

![Screen%20Shot%202020-09-22%20at%2011.15.33%20PM.png](attachment:Screen%20Shot%202020-09-22%20at%2011.15.33%20PM.png)
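If you prefer to write the README locally instead of on the GitHub website, a sketch follows. The project name, member names, and NetIDs are placeholders, and it runs in a throwaway repository; in practice you would run the same commands inside your cloned repository (step 2 below), where the commented-out `git push` uploads the file.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
git init -q -b main project
cd project

# Placeholder content: replace with your project's name, members, and description
cat > README.md <<'EOF'
# Project Name

## Team members
- First Member (netid1)
- Second Member (netid2)

## Description
One or two paragraphs describing the project.
EOF

git add README.md
git commit -q -m "Add README"
# git push    # once the repository is connected to GitHub
```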

You can **clone** your newly created repository, or any other remote repository, to get a copy of it as a local repository on your computer.

![Screen%20Shot%202020-09-22%20at%2011.18.00%20PM.png](attachment:Screen%20Shot%202020-09-22%20at%2011.18.00%20PM.png)

1. Open the repository website and copy its URL (click **Code**, then select **HTTPS**). In the Terminal / Command Prompt, clone the repository:

`git clone https://github.com/ORIE4741/section.git`

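2. Move into the cloned directory. `cd` ("change directory") changes your working directory. `git clone` creates a folder named after the repository (here `section`) inside the directory you ran it from, so adjust the path below to match.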

`cd ./Desktop/section`
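To practise cloning without a network connection, the sketch below uses a local bare repository as a stand-in for the GitHub repository. Apart from a local path in place of the HTTPS URL, the `git clone` and `cd` steps are the same; the names `seed` and `section.git` are made up for the demo.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for the GitHub repository: a bare repository with one commit
git init -q -b main seed
echo "# section" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed section.git

# Clone it and move into the new folder, as in steps 1 and 2
git clone -q section.git section
cd section
git remote -v    # "origin" is the repository you cloned from
ls               # README.md
```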

Your top-level working directory contains everything about your project. 

At any time, you can take a “snapshot” of everything (or selected things) in your project directory, and put it in your repository. This process is called **commit**.

You can work as much as you like in your working directory, but the repository isn’t updated until you commit something.

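To check which files have changed since the last commit, and which files Git is not tracking yet, run:

`git status`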
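A small sketch of what `git status` reports before and after adding a new file, using the compact `--short` format. The file `section3.py` is hypothetical and the repository is a throwaway one.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main section
cd section

echo "print('hello')" > section3.py   # hypothetical new file
git status --short    # ?? section3.py  (untracked: Git is not watching it yet)
git add section3.py
git status --short    # A  section3.py  (staged: it will be in the next commit)
```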

### 3. Add, Commit, Push and Pull

We can see that our file is untracked, so we **add it and then commit it**. Committing tells Git: “OK, record this point, so that if I mess things up in the future I can return to this state.”

`git add -A`

`-A` stands for "all": it stages every change in the working directory, including new and deleted files.

`git commit -m "add section3"`

`-m` stands for message. Git requires a message for every commit (without `-m`, it opens a text editor for you to write one), so that later you can look back through the history and tell what each snapshot was for.
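A sketch contrasting staging one named file with `git add -A`, and then reading the history back with `git log`. File names and messages are illustrative; the demo runs in a throwaway repository.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
git init -q -b main section
cd section

echo "notes" > notes.md
echo "scratch" > scratch.txt
git add notes.md                      # stage only the named file
git commit -q -m "Add section notes"  # the message is stored with the snapshot
git add -A                            # stage every remaining change (scratch.txt)
git commit -q -m "Add scratch file"
git log --oneline                     # short hash and message for each commit
```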

Then upload your commits to the remote repository on GitHub:

`git push`

![Screen%20Shot%202020-09-22%20at%209.48.06%20PM.png](attachment:Screen%20Shot%202020-09-22%20at%209.48.06%20PM.png)

Every time before you push, and before you start working in your local repo, **run `git pull` to get any changes your teammates have pushed, so you avoid unexpected conflicts. (Important!!)**

`git pull`
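A sketch of why pulling first matters, using two local clones (`alice` and `bob`, hypothetical teammates) of a local bare repository that stands in for the shared GitHub repository. Alice pushes a change; Bob pulls before working, so his copy includes it.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for the shared GitHub repository, and one clone per teammate
git init -q -b main seed
echo "v1" > seed/analysis.txt
git -C seed add analysis.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed shared.git
git clone -q shared.git alice
git clone -q shared.git bob

# Alice commits and pushes her change
echo "v2 from alice" > alice/analysis.txt
git -C alice commit -q -am "Update analysis"
git -C alice push -q

# Bob pulls before starting work, so he builds on Alice's change
git -C bob pull -q
cat bob/analysis.txt     # v2 from alice
```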

### 4. Fork & pull request

A fork is a copy of a repository that you manage. Forks let you make changes to a project without affecting the original repository. You can fetch updates from or submit changes to the original repository with pull requests.

A great example of using forks to propose changes is for bug fixes. Rather than logging an issue for a bug you've found, you can:
- Fork the repository.
- Make the fix.
- Submit a pull request to the project owner.

You will use this workflow to submit your project to https://github.com/ORIE4741/ProjectsFall2020 (only one person per group needs to do this).

The following steps can all be done on the GitHub website:
1. Create a fork.

2. **Add your project link to your fork**: create a new file containing the link to your project repository and commit it.

3. Create a pull request from your fork to the ProjectsFall2020 repository.

If you've forked a repository and added your repo link in a new file, you can create a pull request to ask us to accept your changes.
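The steps above are done on the GitHub website. For reference, here is a sketch of the command-line side: working in a clone of your fork, pushing a branch that a pull request can be opened from, and keeping the fork in sync with the original repository through a second remote, conventionally named `upstream`. Local bare repositories stand in for the course repository and your fork; the branch name, file name, and link are hypothetical, and the pull request itself is still opened on GitHub.

```shell
set -e
cd "$(mktemp -d)"
root="$(pwd)"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-ins: the original (course) repository and your fork of it
git init -q -b main seed
echo "# Projects" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git

# Clone your fork, and register the original repository as "upstream"
git clone -q fork.git work
cd work
git remote add upstream "$root/upstream.git"

# Add the project link on a new branch and push that branch to the fork
git switch -q -c add-project-link
echo "https://github.com/<team>/<repo>" > our_project.md   # placeholder link
git add our_project.md
git commit -q -m "Add project link"
git push -q origin add-project-link

# Later: bring in changes that were merged into the original repository
git switch -q main
git pull -q upstream main
```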

### 5. Raise an issue

You can submit comments on a proposal by opening an issue on that group's GitHub repository (do not write grades in the issue).

You may read this [tutorial](https://www.atlassian.com/git/tutorials/learn-git-with-bitbucket-cloud) to learn more about Git.

### 6. Use the GUI

After installing Git, we can also use it through a graphical user interface (GUI). There are many GUI Git clients, e.g. [GitHub Desktop](https://desktop.github.com/).

## Project Ideas

### Checklist for Data analysis
- Find a dataset you want to use.
- Formulate an important question based on your selected dataset.
- Pick a few algorithms that are suitable for your question. A linear model often finds something interesting and can serve as a baseline.
- Apply them to your dataset. Pay attention to missing values, tuning parameter selection, training/validation/test data split, feature selection, ...

### Checklist for Algorithm development
- State the problem you want to solve and previous work on solving this problem.
- State your proposed new method and its advantages over previous work.
- Design experiments (synthetic and real data) and show that your proposed method outperforms previous work. Performance might mean many things: accuracy, speed, computational resources, interpretability, fairness, robustness to corruptions in the data, …

### Example projects

[Can we predict basketball players’ performance?](https://github.com/andrewkozma/4741-Project/blob/master/Final%20Report.pdf)

### Project ideas & Interesting data sets 

[Course Website](https://people.orie.cornell.edu/mru8/orie4741/projects.html)


[UCI](https://archive.ics.uci.edu/ml/index.php)

[OpenML](https://www.openml.org/)
