<a href="https://colab.research.google.com/github/Talantttt/Error-Generator/blob/main/notebooks/colab-github-demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using Google Colab with GitHub




[Google Colaboratory](http://colab.research.google.com) is designed to integrate cleanly with GitHub: you can load notebooks from GitHub and save notebooks back to GitHub.

## Loading Public Notebooks Directly from GitHub

Colab can load public GitHub notebooks directly, with no authorization step required.

For example, consider the notebook at this address: https://github.com/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb.

The direct Colab link to this notebook is: https://colab.research.google.com/github/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb.

To generate such links in one click, you can use the [Open in Colab](https://chrome.google.com/webstore/detail/open-in-colab/iogfkhleblhcpcekbiedikdehleodpjo) Chrome extension.
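
The mapping between the two addresses above is mechanical: the path of the GitHub URL is appended to the Colab `/github` prefix. A minimal Python sketch of that conversion follows. The function name `github_to_colab_url` is hypothetical and is not part of any Colab API.

```python
from urllib.parse import urlparse

# Prefix used by the direct Colab links shown above.
COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def github_to_colab_url(github_url: str) -> str:
    """Turn a github.com notebook URL into the matching Colab URL."""
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"Not a github.com URL: {github_url}")
    # parsed.path starts with "/", e.g. "/googlecolab/colabtools/blob/main/...".
    return COLAB_GITHUB_PREFIX + parsed.path


print(github_to_colab_url(
    "https://github.com/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb"
))
```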

## Browsing GitHub Repositories from Colab

Colab also supports special URLs that link directly to a GitHub browser for any user/organization, repository, or branch. For example:

- http://colab.research.google.com/github will give you a general GitHub browser, where you can search for any GitHub organization or username.
- http://colab.research.google.com/github/googlecolab/ will open the repository browser for the ``googlecolab`` organization. Replace ``googlecolab`` with any other GitHub org or user to see their repositories.
- http://colab.research.google.com/github/googlecolab/colabtools/ will let you browse the main branch of the ``colabtools`` repository within the ``googlecolab`` organization. Substitute any user/org and repository to see its contents.
- http://colab.research.google.com/github/googlecolab/colabtools/blob/main will let you browse the ``main`` branch of the ``colabtools`` repository within the ``googlecolab`` organization. (Don't forget the ``blob`` here!) You can specify any valid branch for any valid repository.
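
The four URL shapes listed above can be built from the same three parts. The sketch below is only an illustration of the pattern: `colab_browse_url` is a hypothetical helper, and it uses `https` where the list uses `http`.

```python
from typing import Optional

COLAB_GITHUB_BASE = "https://colab.research.google.com/github"


def colab_browse_url(owner: Optional[str] = None,
                     repo: Optional[str] = None,
                     branch: Optional[str] = None) -> str:
    """Build a Colab GitHub-browser URL for an owner, repository, or branch."""
    if owner is None:
        if repo or branch:
            raise ValueError("repo and branch require an owner")
        return COLAB_GITHUB_BASE              # general GitHub browser
    if repo is None:
        if branch:
            raise ValueError("branch requires a repo")
        return f"{COLAB_GITHUB_BASE}/{owner}/"          # repositories of one owner
    if branch is None:
        return f"{COLAB_GITHUB_BASE}/{owner}/{repo}/"   # one repository
    # A specific branch needs the "blob" path segment.
    return f"{COLAB_GITHUB_BASE}/{owner}/{repo}/blob/{branch}"


print(colab_browse_url("googlecolab", "colabtools", "main"))
```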

## Loading Private Notebooks

Loading a notebook from a private GitHub repository is possible, but requires an additional step to allow Colab to access your files.
Do the following:

1. Navigate to http://colab.research.google.com/github.
2. Click the "Include Private Repos" checkbox.
3. In the popup window, sign in to your GitHub account and authorize Colab to read the private files.
4. Your private repositories and notebooks will now be available via the GitHub navigation pane.

## Saving Notebooks To GitHub or Drive

Any time you open a GitHub-hosted notebook in Colab, it opens a new editable view of the notebook. You can run and modify the notebook without worrying about overwriting the source.

If you would like to save your changes from within Colab, you can use the File menu to save the modified notebook either to Google Drive or back to GitHub. Choose **File→Save a copy in Drive** or **File→Save a copy to GitHub** and follow the resulting prompts. Saving a Colab notebook to GitHub requires giving Colab permission to push the commit to your repository.

## Open In Colab Badge

Anybody can open a copy of any GitHub-hosted notebook within Colab. To make it easier to give people access to live views of GitHub-hosted notebooks,
Colab provides a [shields.io](http://shields.io/)-style badge, which appears as follows:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb)

The markdown for the above badge is the following:

```markdown
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb)
```

The HTML equivalent is:

```HTML
<a href="https://colab.research.google.com/github/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
```

Remember to replace the notebook URL in this template with the notebook you want to link to.
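
If you maintain several notebooks, the markdown template above can be filled in programmatically. The following is a small sketch under that assumption; `colab_badge_markdown` and its parameters are hypothetical names chosen for illustration.

```python
BADGE_SVG = "https://colab.research.google.com/assets/colab-badge.svg"
COLAB_GITHUB_BASE = "https://colab.research.google.com/github"


def colab_badge_markdown(owner: str, repo: str, branch: str, notebook_path: str) -> str:
    """Return the markdown for an Open In Colab badge pointing at one notebook."""
    notebook_url = f"{COLAB_GITHUB_BASE}/{owner}/{repo}/blob/{branch}/{notebook_path}"
    return f"[![Open In Colab]({BADGE_SVG})]({notebook_url})"


print(colab_badge_markdown(
    "googlecolab", "colabtools", "main", "notebooks/colab-github-demo.ipynb"
))
```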

In [5]:
import random

def break_python_code(code: str) -> tuple[str, str, str]:
  """Return (original code, broken code, error description)."""
  if ':' in code:
    # Drop the first colon, e.g. the one ending a loop or function header.
    broken = code.replace(':', '', 1)
    return code, broken, "Error: missing ':' character"
  elif 'def ' in code:
    # Drop the first 'def' keyword.
    broken = code.replace('def', '', 1)
    return code, broken, "Error: missing 'def' keyword"
  elif 'print' in code:
    # Drop the opening parenthesis of the first print call.
    broken = code.replace('print(', 'print', 1)
    return code, broken, "Error: missing parenthesis in print call"
  else:
    # No rule applies: return the code unchanged so callers can detect it.
    return code, code, "Could not introduce an error"


original, broken, error = break_python_code('for i in range(10): print(i)')
print('Original:', original)
print('Broken:', broken)
print('Error:', error)

Original: for i in range(10): print(i)
Broken: for i in range(10) print(i)
Error: Error: missing ':' character
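
A string replacement does not by itself guarantee that the result is invalid Python. One way to check is to try parsing both versions with the standard-library `ast` module, as sketched below. `is_valid_python` is a hypothetical helper and is not part of the notebook's own code.

```python
import ast


def is_valid_python(code: str) -> bool:
    """Return True if `code` parses as Python source, False on a syntax error."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


print(is_valid_python("for i in range(10): print(i)"))  # True
print(is_valid_python("for i in range(10) print(i)"))   # False
```

A pair is only useful as a training example when the original parses and the broken version does not.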


In [6]:
import json

examples = [
    "for i in range(10): print(i)",
    "def hello():\n print('Hello')",
    "if x == 5:\n print('ok')",
    "print('Done')",
]

dataset = []

for _ in range(5000):
  code = random.choice(examples)
  original, broken, error = break_python_code(code)
  if original != broken:
    dataset.append({
        "input": broken,
        "output": f"{error}. Corrected code: {original}"
    })

with open('synthetic_error_dataset.json', 'w', encoding = 'utf-8') as f:
  json.dump(dataset, f, indent = 2, ensure_ascii = False)
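
Because `break_python_code` is deterministic and `examples` holds four snippets, the 5000 sampled records can contain at most four distinct entries. A quick way to confirm how much variety a dataset has is to count its distinct records. The sketch below uses a hypothetical helper, `count_distinct_records`, and placeholder records shaped like the ones written above.

```python
import json
from collections import Counter


def count_distinct_records(records: list[dict]) -> Counter:
    """Count how often each distinct record occurs, keyed by its JSON form."""
    return Counter(
        json.dumps(record, sort_keys=True, ensure_ascii=False) for record in records
    )


# Placeholder records with the same "input"/"output" keys as the dataset.
sample = [
    {"input": "for i in range(10) print(i)", "output": "fix A"},
    {"input": "print'Done')", "output": "fix B"},
    {"input": "for i in range(10) print(i)", "output": "fix A"},
]

counts = count_distinct_records(sample)
print(len(counts), "distinct records out of", len(sample))
```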

In [9]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel, Trainer, TrainingArguments, TextDataset, DataCollatorForLanguageModeling

# Load the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Prepare the dataset
# Note: TextDataset is deprecated in recent transformers releases in favor of the datasets library.
def create_dataset(path, tokenizer):
  return TextDataset(
      tokenizer = tokenizer,
      file_path = path,
      block_size = 128
  )

# Write each example as an "Input: ... / Answer: ..." pair separated by a blank line
with open("train.txt", 'w', encoding = 'utf-8') as f:
  for ex in dataset:
    f.write(f"Input: {ex['input']}\nAnswer: {ex['output']}\n\n")

train_dataset = create_dataset("train.txt", tokenizer)
data_collator = DataCollatorForLanguageModeling(tokenizer = tokenizer, mlm = False)

# Training
training_args = TrainingArguments(
    output_dir = "./code_error_model",
    overwrite_output_dir = True,
    num_train_epochs = 3,
    per_device_train_batch_size = 2,
    save_steps = 10_000,
    save_total_limit = 2,
    logging_steps = 500,
    report_to = "none"  # replaces the deprecated WANDB_DISABLED environment variable
)

trainer = Trainer(
    model = model,
    args = training_args,
    data_collator = data_collator,
    train_dataset = train_dataset
)

trainer.train()


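
At inference time, the prompt presumably needs the same layout as the lines written to `train.txt`, stopping right after the answer label so that the model completes it. A minimal sketch of keeping the two formats in sync follows. It assumes the labels `Input:` and `Answer:`, and both helper names are hypothetical.

```python
# Label strings are assumptions; they must match whatever was written to train.txt.
INPUT_LABEL = "Input:"
ANSWER_LABEL = "Answer:"


def format_training_example(example: dict) -> str:
    """Render one dataset record as a block of training text."""
    return (
        f"{INPUT_LABEL} {example['input']}\n"
        f"{ANSWER_LABEL} {example['output']}\n\n"
    )


def format_prompt(broken_code: str) -> str:
    """Render an inference prompt that ends where the model should start writing."""
    return f"{INPUT_LABEL} {broken_code}\n{ANSWER_LABEL}"


record = {"input": "for i in range(10) print(i)", "output": "placeholder answer"}
print(format_training_example(record))
print(format_prompt(record["input"]))
```

Building both strings from the same labels means a prompt is always a prefix of the corresponding training text.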