# **GPT-NEO MODEL**

In [1]:
!pip install transformers torch textstat language-tool-python

Collecting textstat
  Downloading textstat-0.7.4-py3-none-any.whl.metadata (14 kB)
Collecting language-tool-python
  Downloading language_tool_python-2.8.1-py3-none-any.whl.metadata (12 kB)
Collecting pyphen (from textstat)
  Downloading pyphen-0.17.0-py3-none-any.whl.metadata (3.2 kB)
Downloading textstat-0.7.4-py3-none-any.whl (105 kB)
Downloading language_tool_python-2.8.1-py3-none-any.whl (35 kB)
Downloading pyphen-0.17.0-py3-none-any.whl (2.1 MB)
Installing collected packages: pyphen, textstat, language-tool-python
Successfully installed language-tool-python-2.8.1 pyphen-0.17.0 textstat-0.7.4


In [2]:
from transformers import GPTNeoForCausalLM, AutoTokenizer
import language_tool_python
import textstat
import torch

# Initialize the model and tokenizer
model_name = "EleutherAI/gpt-neo-1.3B"
model = GPTNeoForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize LanguageTool for grammar checking
tool = language_tool_python.LanguageTool('en-US')

# Generate an article using temperature, top-p (nucleus) sampling, and a repetition penalty
def generate_article(prompt, max_length=500, temperature=0.7, top_p=0.9, repetition_penalty=1.2):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # Mask already returned by the tokenizer
        max_length=max_length,  # Total length in tokens, prompt included
        do_sample=True,
        temperature=temperature,  # Lower values make sampling more conservative
        top_p=top_p,  # Sample only from the smallest token set whose cumulative probability reaches top_p
        repetition_penalty=repetition_penalty,  # Values above 1 discourage repeated tokens
        pad_token_id=tokenizer.eos_token_id,  # GPT-Neo defines no pad token, so reuse EOS
        num_return_sequences=1,
    )
    article = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return article

# Performance Metric 1: Grammar Check (number of issues flagged by LanguageTool)
def grammar_check(article):
    matches = tool.check(article)
    return len(matches)

# Performance Metric 2: Readability Score (Flesch Reading Ease; higher means easier to read)
def readability_score(article):
    return textstat.flesch_reading_ease(article)

# Performance Metric 3: Word Count (whitespace-separated tokens)
def word_count(article):
    return len(article.split())

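# Performance Metric 4: Perplexity (exp of the mean token-level cross-entropy loss)
def calculate_perplexity(article):
    # Truncate to the model's context window so long inputs do not raise an error
    inputs = tokenizer(article, return_tensors="pt", truncation=True,
                       max_length=model.config.max_position_embeddings)
    with torch.no_grad():  # Evaluation only; skip gradient tracking
        outputs = model(**inputs, labels=inputs["input_ids"])
    return torch.exp(outputs.loss).item()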

# Evaluate all metrics
def evaluate_article(article):
    errors = grammar_check(article)
    readability = readability_score(article)
    word_count_value = word_count(article)
    perplexity_value = calculate_perplexity(article)

    print("Performance Metrics:\n")
    print(f"Grammar Errors: {errors}")
    print(f"Readability Score: {readability}")
    print(f"Word Count: {word_count_value}")
    print(f"Perplexity: {perplexity_value}")

# Example usage
prompt = "How does climate change affect agriculture in various regions around the world?"
article = generate_article(prompt)

print("Generated Article:\n")
print(article)

# Evaluate the article's performance
evaluate_article(article)

# Save the generated article to a file (explicit encoding avoids platform-dependent defaults)
with open("generated_article.txt", "w", encoding="utf-8") as file:
    file.write(article)
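
The `calculate_perplexity` function in the cell above exponentiates the loss returned by the model, which is the mean negative log-likelihood per token. The arithmetic behind that step can be sketched without the model, as below. The helper name `perplexity_from_log_probs` and the sample probabilities are hypothetical, chosen only for illustration; this is a minimal sketch of the formula, not the notebook's implementation.

```python
import math

def perplexity_from_log_probs(token_log_probs):
    """Return exp(mean negative log-likelihood) for a list of per-token log-probabilities.

    Hypothetical helper for illustration; the notebook obtains the same
    quantity from the model's cross-entropy loss.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Made-up example: a model that assigns probability 0.25 to every token
# is as uncertain as a uniform choice among 4 options, so the perplexity
# comes out close to 4.
uniform_log_probs = [math.log(0.25)] * 8
print(perplexity_from_log_probs(uniform_log_probs))
```

Because the notebook scores the article with the same model that generated it, the resulting perplexity is probably best read as a relative signal across runs or settings rather than as an absolute quality score.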


Generated Article:

How does climate change affect agriculture in various regions around the world?

Climate change is a global phenomenon that is affecting agriculture, but there are many different regions around the world that are facing different kinds of challenges. The regions affected by climate change are:

Europe

The European Union is facing the most severe challenges due to climate change. The EU is a large region with many different countries. This includes countries like Italy, France, Greece, Ireland, Portugal, and Spain.

The EU is facing the most severe challenges due to climate change. The EU is a large region with many different countries. This includes countries like Italy, France, Greece, Ireland, Portugal, and Spain. Asia

Asia is facing the most severe challenges due to climate change. The region is also home to many different countries. The most affected region is the Asia-Pacific region. It includes countries like Australia, New Zealand, and China.

The region is