# Git introduction

## Version control

Imagine you wrote code for a customer. Everything works. Then the customer asks you to rework some of the functionality, believing the change is necessary. You make the changes, update everything, and forget about it for a while.

A month later, the customer writes that the assumption was wrong and asks you to return everything to the way it was. But you did not save the previous version of the files, so you have to rewrite everything.


This would not happen if versions of the files from different points in time were saved. You would just go back to the previous version without redoing any work. Imagine that the entire history is saved and you can return to the code from the past.

As code is written and revised, Git saves its different versions. This is why Git is called a version control system.

## Teamwork

In real work, you rarely write code alone. Several people usually work on one project.

For example, you are building a feature with a friend. Every time you change the code, you send your friend an updated archive. Your friend downloads it, makes changes, and sends it back to you. The next time, you do not wait for your friend to finish and make your own changes, because working in parallel is faster.

As a result, you sent each other archives with changes at the same time. But how do you combine them? There are a lot of changes, and you don’t really remember what each of you changed. You would have to open your code on the left and your friend's code on the right and check everything manually. Git does this automatically.

# Install Git

Download and install Git on your computer. https://git-scm.com/download/

When you run the installer, you will be prompted to install Git Bash - a separate program that gives you a full command line for working with Git. Be sure to check the box to install it. Also check the box: Use Git... from the Command Prompt.

# Command line

The command line is also an interface, only text-based. You need to enter commands in it to interact with programs. For example, usually, to create a folder, you right-click and select the desired item from the graphical menu. But you can also create a folder through the command line by entering the appropriate command and folder name.

## Navigation

The GUI always makes it clear where exactly you are in the file system. If you have a desktop in front of you, then you are in the “Desktop” folder. If the Documents folder is open, you are in it.

In the command line, you are also always in some folder; it’s just not visible. The `pwd` (print working directory) command shows where you are now. Type `pwd` into the command line and press Enter:

<img src="./pictures/9.png"  
  width="600"
/>

The command line displayed the answer - this is the path to the folder in which we are now. Instead of `MSI` you will have your username. This is the home directory - the directory with user files.

In the GUI you simply see the contents of the open folder. On the command line, the `ls` command does this. The name is short for “list”, as in “list directory contents”:

<img src="./pictures/10.png"  
  width="600"
/>

We have found out where we are and looked around; now it’s time to learn to walk. To move from one folder to another, use the `cd` command, short for “change directory”. The command syntax is `cd folder_name`:

<img src="./pictures/11.png"  
  width="600"
/>

If there are spaces in the folder name, wrap the name in quotes, for example `cd "My Projects"`.

To return to the parent directory - that is, to a higher level - instead of the folder name, you need to write two dots: `..`:
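A minimal sketch of both cases. The folder name `My Projects` is hypothetical, and a scratch folder is used so that no real files are touched; `mkdir`, which creates the folder, is explained in the next section.

```shell
cd "$(mktemp -d)"        # scratch folder, so no real files are touched
mkdir "My Projects"      # hypothetical folder with a space in its name
cd "My Projects"         # the quotes keep both words together as one name
pwd                      # the printed path now ends in /My Projects
cd ..                    # go one level up, back to the scratch folder
pwd
```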

## Creating folders and files

It's time to act. The `mkdir` command creates a folder. The command needs to be passed the name of the new folder.

Create a folder `py_proj` in your home directory - it will become the place for your educational projects:

Let's move to the `py_proj` folder, create a `kaggle_riga` folder in it and move to it:
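The folder-creation steps above can be sketched as one sequence. The `-p` option is an addition here: it only keeps `mkdir` from reporting an error if the folder already exists.

```shell
cd ~                    # start from the home directory
mkdir -p py_proj        # create the folder for educational projects
cd py_proj              # move into it
mkdir -p kaggle_riga    # create the project folder inside
cd kaggle_riga          # move into the project folder
pwd                     # print the current path to confirm where we are
```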

Let's create the necessary files. This is done with the `touch` command, passing it the file name, for example:

<img src="./pictures/12.png"  
  width="700"
/>

## Deleting folders and files

Finally, let's learn how to delete files and folders via the command line. To remove a file, use the `rm` (remove) command followed by the file name, for example `rm file_name`.

You can delete a folder with the `rmdir` command.

But if the folder you are trying to delete contains any files, the command line will not delete it and will display a message that the folder is not empty:

<img src="./pictures/13.png"  
  width="600"
/>

This is protection against accidental deletion of necessary files. If you still need to delete the folder, you can use the `rm` command like this:
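A sketch of all three deletion commands in a scratch folder. The names `old_proj`, `notes.txt` and `draft.txt` are hypothetical.

```shell
cd "$(mktemp -d)"                     # scratch folder, so no real files are at risk
mkdir old_proj                        # hypothetical folder
touch old_proj/notes.txt draft.txt    # hypothetical files
rm draft.txt                          # delete a single file
# rmdir refuses because old_proj still contains notes.txt
rmdir old_proj || echo "rmdir refused: the folder is not empty"
rm -r old_proj                        # delete the folder together with its contents
ls                                    # prints nothing: the scratch folder is empty again
```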

Here `-r` is called an option (also a switch or a key). We say that we called the `rm` command with the `-r` option. This option makes the deletion recursive: it is applied to the entire contents of the directory. Be careful: deleting files with the `rm` and `rmdir` commands is irreversible. The files bypass the recycle bin and are deleted permanently.

# Git setup

## Settings

To make it clear to other developers who made what changes, you need to choose a name for yourself - just like in a computer game. Let's do this and start using Git to its full potential.

To do this, run the `git config` command on the command line with the `--global` option, passing your data as `user.name` and `user.email`:
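A minimal sketch of the two commands. The name and email below are placeholders, not values from this guide.

```shell
# Placeholder values: replace them with your own name and email
git config --global user.name "Your Name"
git config --global user.email "your_email@example.com"
```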

Now check what happened: run the `git config` command with the `--list` option.

<img src="./pictures/14.png"  
  width="600"
/>

# Connect Git to the project

For Git to start working in a project, the project folder needs to be made a Git repository. This means that Git will start tracking all changes within this directory. To do this, go to the `project` folder with the `cd` command and run `git init` in the terminal. The command line will tell you that the repository has been initialized.
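For the `project` folder this might look as follows. The `mkdir -p` line is an addition that only matters if the folder does not exist yet.

```shell
mkdir -p project   # create the folder only if it does not exist yet
cd project         # go to the project folder
git init           # turn the folder into a Git repository
ls -a              # the hidden .git folder is where Git keeps the history
```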

Run the `git status` command. Git will report that this folder contains two files, `riga.ipynb` and `riga.csv`, and the `data` folder:

<img src="./pictures/15.png"  
  width="600"
/>

Git calls these files “Untracked files”. This means that Git sees them, but if you try to save their version now, Git won't do it.

This is not what we want: our goal is to learn how to save versions. To do that, the files first need to be prepared for saving. The `git add` command is responsible for this.

When using it, specify the name of the file whose current state you want to record, for example: `git add riga.ipynb`.

If we want to save the state of all files, we can use the `--all` option: `git add --all`, or the shorter `git add -A`:

<img src="./pictures/16.png"  
  width="600"
/>

Files marked in green are ready to be saved in their current states. This is exactly what the git add command told Git. But saving has not happened yet - first we need to tell Git what we want to save, and only then save.

Saving a version of files in Git is called a “commit.” Making a commit means saving the current state of the files.

If you edit any of these files now, the edited file will turn red again:

<img src="./pictures/17.png"  
  width="600"
/>

For the updated version of the file to be included in the commit, run `git add` again.
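The same cycle can be followed without colors using `git status --short`, which prints a two-letter code before each file name. This is a sketch in a scratch repository; the text written into `riga.csv` is a made-up placeholder.

```shell
cd "$(mktemp -d)" && git init      # scratch repository, so a real project is not touched
touch riga.csv
git status --short                 # "??" marks an untracked file
git add riga.csv
git status --short                 # "A " marks a file prepared for the next commit
echo "id,price" >> riga.csv        # edit the file after preparing it
git status --short                 # "AM": prepared, then modified again
git add riga.csv                   # prepare the updated version
git status --short                 # back to "A "
```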

# Make the first commit

When all the files are ready to be saved, we will make our first commit - that is, record all the prepared changes in the “production” version.

This is done with the `git commit` command with the `-m` option. After `-m` comes the commit message in quotes:
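A sketch of a first commit in a scratch repository. The identity values are placeholders, set for this repository only so that the commit also works on a machine where `git config --global` has not been run yet.

```shell
cd "$(mktemp -d)" && git init                     # scratch repository
git config user.name "Your Name"                  # placeholder identity for this repository only
git config user.email "your_email@example.com"
touch riga.ipynb                                  # a file to save
git add riga.ipynb                                # prepare it for the commit
git commit -m "first commit"                      # save the version with a message
git log --oneline                                 # show the history, one commit per line
```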

"first commit" is not a good name. Commits should be named so that you can later understand what changes were made.

For example:
- 20230801 data preparation
- 20230912 add linearregressor model

The `git log` command will show the commit history:

<img src="./pictures/18.png"  
  width="600"
/>

# Teamwork. GitHub

Up to this point we have been using Git locally. To share the repository, you need to create a remote copy of it. Several platforms allow you to do this. The most popular is GitHub, which is what we will use.
To start using GitHub, you need to register on it.

## Registration

https://github.com/signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home

Everything is simple here: enter your email, login and password, and you will be taken to the GitHub interface. Choose the free account (“Choose free”). You will be asked to take a short survey about what you know and what you do. You can skip it: scroll to the bottom of the page and click the “Complete setup” button. All that remains is to confirm your registration. An email with a link to activate your account will be sent to the address you entered. Follow that link.

## Generate SSH keys

GitHub is a service where code is stored in “safety deposit boxes” - repositories. When someone tries to download this code or change it, GitHub must make sure that the computer making the request has the right to read or change the data. To create such a “passport” for the computer from which you will interact with GitHub, you need to create an SSH key and add it to your GitHub account.

The SSH transport protocol is like a “toll highway”. Through it, you can read data from a remote computer or write data to it. The protocol encrypts traffic and is therefore secure.

### Checking for an SSH key

First make sure you don't already have such a “passport”, that is, an SSH key.

By default, the directory with SSH keys is located in the user's home directory, so go there first with `cd ~`.

Typically, SSH keys are located in the `.ssh/` directory; you can check the presence of this directory and the files in it using the following command:
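A minimal sketch of such a check. The `echo` part is an addition that prints a note when the folder does not exist.

```shell
# List everything in ~/.ssh, including hidden files; print a note if the folder is missing
ls -al ~/.ssh || echo "No .ssh folder yet"
```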

If the folder is empty or does not exist, then everything is fine.

If it contains files with names like `id_dsa.pub`, `id_ecdsa.pub` or `id_ed25519.pub`, SSH keys have already been created:

<img src="./pictures/19.png"  
  width="600"
/>

### Generating an SSH key

1. Generate a private and a public key in the terminal with `ssh-keygen -t ed25519 -C "your_email@example.com"`. The public key file has the `.pub` extension; the private key file has no extension. Both keys will be saved on your computer. The public key is the one you will link to GitHub. In place of the example address, be sure to use the email address linked to your GitHub account.

If you receive an error message, your system most likely does not support the Ed25519 algorithm. That is fine; in this case it is enough to use another algorithm, for example RSA: `ssh-keygen -t rsa -b 4096 -C "your_email@example.com"`.

2. Specify the location where the keys are stored. The simple option is to accept the default path in the user's home directory. To do this, press Enter when asked where to save the key.

3. Create an access password for the SSH key. It must be entered every time you connect via the protocol, so remember it or write it down.

When you type the password, the characters do not appear on the screen, but they are being entered. You can also leave the field blank so that you never have to enter a password. To do this, press Enter:

<img src="./pictures/20.png"  
  width="600"
/>

4. Start `ssh-agent` in the background with `eval "$(ssh-agent -s)"`. The agent is a program that keeps your SSH keys in memory and supplies them when a connection needs them.

Add the private key to `ssh-agent` with `ssh-add ~/.ssh/id_ed25519`. Then you will not have to enter the password every time you work with the repository. Note that it is the private key that is added to the agent - the file without the `.pub` extension.
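To see what key generation produces without touching your real `~/.ssh` folder, the prompts can also be answered in advance on the command line. This is a sketch under assumptions: the scratch folder, the example email and the empty password are all placeholders; `-f` sets the key file and `-N` sets the password.

```shell
KEY_DIR="$(mktemp -d)"                    # scratch folder instead of ~/.ssh
# -f: where to save the key, -N: the key password (empty here)
ssh-keygen -t ed25519 -C "your_email@example.com" -f "$KEY_DIR/id_ed25519" -N ""
ls "$KEY_DIR"                             # the private key and the public key with .pub
cat "$KEY_DIR/id_ed25519.pub"             # the public key is a single line of text
```

For real use, follow the numbered steps so that the keys land in the default location.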

### Linking an SSH key to a GitHub account

1. Copy the public key to the clipboard, for example with `clip < ~/.ssh/id_ed25519.pub`.

If `clip` doesn't work, find the hidden `.ssh` folder and open the `id_rsa.pub` or `id_ed25519.pub` file in any text editor. Copy its contents to the clipboard.

2. Go to GitHub and open your account settings:

<img src="./pictures/21.png"  
  width="1000"
/>

3. In the left menu, select “SSH and GPG keys”:

<img src="./pictures/22.png"  
  width="1000"
/>

4. In the tab that opens, select “New SSH key” or “Add SSH key”.

5. In the “Title” field, write a title. For example, “Personal key”.

6. Paste your key from the clipboard into the “Key” field.

7. Click the “Add SSH key” button:

<img src="./pictures/23.png"  
  width="1000"
/>

That's it, your key is linked to GitHub. If you set a password for the SSH key, you will have to enter it when working with the repository.

# Link local and remote repositories

## Creating a repository on GitHub

Go to your profile at `https://github.com/your-username`. This page presents you and your projects, and other users can see it.

1. Let's create a repository. Click on the “Repositories” link

<img src="./pictures/24.png"  
  width="1000"
/>

2. The window for creating a new repository opens. Give it the name `kaggle_riga`.

The name of the remote repository does not have to match the name of the project folder on your computer. But to avoid confusion, we will call them the same.

After entering the name, click “Create Repository”. The other fields do not matter for now.

Then switch to the SSH tab (next to HTTPS) and copy the remote repository address. You will need it in the next step.

<img src="./pictures/25.png"  
  width="1000"
/>

## Linking local and remote repositories

Once the GitHub repository has been created, you will see instructions with commands for uploading your code.

Note the command `git push -u origin main`. Here `main` is the name of the branch to which you are offered to upload the code. Remember it.

Now move to the project folder with the `cd` command. To link a remote repository to the local one, use the `git remote add` command. It takes two arguments: the name of the remote repository and its address (from the SSH tab). Like this (just replace the address with yours):
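A sketch of the command. The address uses the placeholder `your-username`, and the first line creates a scratch repository only so that the example can be run anywhere; in your own project, run the remaining lines inside the project folder.

```shell
cd "$(mktemp -d)" && git init     # scratch repository standing in for the project folder
# origin is the local name of the remote; the address is a placeholder
git remote add origin git@github.com:your-username/kaggle_riga.git
git remote -v                     # list the configured remotes to check the result
```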

By default, the main branch of your local repository may be named `master`. If the remote repository uses a different name, for example `main` as on GitHub, rename the local branch with the command:
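A sketch of the rename with `git branch -M`, shown in a scratch repository with placeholder identity values. In your own project only the `git branch -M main` line is needed.

```shell
cd "$(mktemp -d)" && git init            # scratch repository standing in for the project
git config user.name "Your Name"         # placeholder identity for this repository only
git config user.email "your_email@example.com"
touch riga.ipynb
git add riga.ipynb
git commit -m "first commit"             # a commit first, so the branch exists before the rename
git branch -M main                       # rename the current branch to main
git branch                               # list local branches; the current one is marked with *
```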

Everything is clear with the address, but why is the repository name `origin` and not `kaggle_riga`? This is where things get a little confusing: the name given here does not have to match the repository's name on GitHub.

This is the name we use locally to refer to the remote repository when entering commands. We could call it `kaggle_riga`, but `origin` is the conventional name for the remote repository. Later this will let us omit it in commands, because Git by default looks for a remote named `origin`.

The repositories are linked; all that remains is to upload the code to GitHub. The `git push` command does this. Call it like this: `git push -u origin main`.

We will discuss the `git push` command in detail in the next lesson, and in the next sprint we will look more closely at what `main` means. In the meantime, go to the remote repository on GitHub and refresh the page:

<img src="./pictures/26.png"  
  width="1000"
/>

# Synchronize local and remote repositories

You have registered, created and linked a local repository to a remote one. The hardest part is over, well done. All that remains is to figure out how to take changes from a remote repository and upload your own. This is what we will talk about.

## git push

You have already gone through the entire “commit cycle”: preparing files with `git add` and committing them with a message using `git commit`. After you linked the local repository with the remote one, another command was added to this cycle: `git push`, which sends the changes, or simply “pushes” them.

Add data loading to the `riga.ipynb` file. Prepare the file for the commit, commit it, and then push with `git push`. If everything is correct, the command line window after `git status` should look like this:

<img src="./pictures/27.png"  
  width="600"
/>

Go to the `kaggle_riga` repository in your GitHub account. You will see the changed files in the GitHub interface.

<img src="./pictures/28.png"  
  width="800"
/>

## git pull

Right now you are working on the repository alone, but on real “production” projects there are usually at least two of you. Imagine that your colleague has been working on the project all weekend, and you come in on Monday and want to publish the changes you made on Friday. But the project already has a new version. To get the changes made by your colleague, there is the `git pull` command - it “pulls” the changes.

If you are not working on a repository alone, always run `git pull` before publishing changes. While you were writing code, someone may have published new changes.
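The situation can be tried out offline. The sketch below rests on assumptions: a local bare repository (`git init --bare`) stands in for GitHub, the folders `mine` and `colleague` stand in for two computers, and the names, emails and commit messages are placeholders. It uses `git clone`, which is covered in the next section.

```shell
WORK="$(mktemp -d)"                      # scratch folder for the whole sketch
git init --bare "$WORK/remote.git"       # stand-in for the repository on GitHub

# Your copy: first commit, published earlier
git init "$WORK/mine"
cd "$WORK/mine"
git config user.name "You"               # placeholder identity for this repository only
git config user.email "you@example.com"
touch riga.ipynb
git add riga.ipynb
git commit -m "20230801 data preparation"
git branch -M main
git remote add origin "$WORK/remote.git"
git push -u origin main

# Colleague's copy: cloned, changed and pushed over the weekend
git clone -b main "$WORK/remote.git" "$WORK/colleague"
cd "$WORK/colleague"
git config user.name "Colleague"
git config user.email "colleague@example.com"
touch riga.csv
git add riga.csv
git commit -m "20230805 add raw data"
git push

# Monday: take the colleague's changes before publishing your own
cd "$WORK/mine"
git pull
ls                                       # riga.csv is now in your copy too
```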

# Cloning the repository

## git clone

Cloning a repository is usually the first thing a developer does at a new job. Let's simulate this situation. Create a second repository in your account and name it `ML_practice`.

<img src="./pictures/29.png"  
  width="600"
/>

Copy the repository's SSH address and clone it with the `git clone` command, passing the address as the argument. Once cloning is complete, go inside the repository folder with the `cd` command, create the `lyb.py` and `ml_proj.ipynb` files, and commit them. After committing, push the changes to the remote repository with the command `git push -u origin main`. Then open your GitHub account, select the `ML_practice` repository, and you will see the published commit:

<img src="./pictures/30.png"  
  width="600"
/>
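The clone, commit and push sequence above can be rehearsed offline. This is a sketch, assuming a local bare repository as a stand-in for the empty `ML_practice` repository on GitHub; with GitHub you would pass the SSH address to `git clone` instead. The identity values and the commit message are placeholders.

```shell
WORK="$(mktemp -d)"                          # scratch folder for the sketch
git init --bare "$WORK/ML_practice.git"      # stand-in for the empty GitHub repository
cd "$WORK"
git clone "$WORK/ML_practice.git"            # with GitHub, pass the SSH address instead
cd ML_practice                               # the new folder is named after the repository
git config user.name "Your Name"             # placeholder identity for this repository only
git config user.email "your_email@example.com"
touch lyb.py ml_proj.ipynb
git add --all
git commit -m "20230912 add project files"
git branch -M main                           # make sure the branch is called main
git push -u origin main
```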

You now know two ways to create a repository. But there is a nuance: you can clone any public repository, but you can push only to a repository that you created or for which the owner has given you push rights.