># <center> **NLP CAC3 Project**
### **Project Title :** A Document Read-Chat-Note Making Assistant
**A joint initiative by: <br>
Krish Goyal(21112015) and Joan Job(21112037)**

#### **Key Highlights, Features, Steps and Research Gaps to Understand:**
##### **Features:** 
1. The applications supports <u>*.pdf*</u> and <u>*.txt*</u> file formats for document uploads.<br><hr>
2. If not, the user has the liberty to copy-paste the required text into the text-box provided. <br><hr>
3. The application then scans through the text provided, check for grammatical errors and corrects them <br>
    if necessary. (If time permits us to do it.) <br><hr>
4. All keywords present in the data will be highlighted/ underlined 
    (mostly Proper Nouns such as names of places, states, countries, unique personalities etc.)<br><hr>
5. There is also an option for the user to chat with the extension for text summaries, word meanings, synonyms etc. <br><hr>
6. The Notes file generated can be downloaded seperately as .txt or .pdf file for offline use (as proposed).<br><hr>

##### **Basic Architecture Requirements:**  <u></u>
1. PDF to Text File Conversion <br> 
2. Text Preprocessing <br>
3. Wikipedia API/Library or Webscraping for extracting text for Named-Entities.
4. Text Summarizing Model with the Metrics for Comparative Study for both:
    - Named-Entities with and their introductory paragraphs.
    - Also for the text document as a whole. 
    - Summary for Each Subheadings (If Possible.) <br>
5. Name-Entity-Recognition(NER) Model with most apt results.<br>
6. A Chatbot to chat with User and Understand User Inputs to the Above mentioned Functions.<br> 

#### References:
1.
1.
1.
1.
1.
1.
1.
1.

## 1. PDF to Text File Conversion and Cleaning/Formatting(Basic) <br> 

In [None]:
import PyPDF2
import re 
from datetime import datetime

# Function to read PDF files and store the extracted text in a text file
def read_pdf(file_path):
    timestamp = datetime.now().strftime("Y%Y_M%m_D%dT_%H_%M_%S") 
    cleaned_path = re.sub(r'[:.\/\\*\?"<>|]', '_', file_path)
    new_text_file_name =  f"{cleaned_path}_output_{timestamp}.txt"
    with open(file_path, 'rb') as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        text = ""
        for page_num in range(len(pdf_reader.pages)):
            page = pdf_reader.pages[page_num]
            text += page.extract_text()

        text_to_txt = clean_text(text)
    try:
        with open(new_text_file_name, 'w', encoding='utf-8') as file:
            file.write(text_to_txt)
        print(f"Text has been successfully written to {new_text_file_name}")
        return new_text_file_name
    except Exception as e:
        print(f"Error while writing to the file: {e}")


# Function to read text files and store the text in a variable
def read_text(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as txt_file:
            text = txt_file.read()
        return text
    except Exception as e:
        print(f"Error while processing the text file: {e}")
        return None
    

def clean_text(text):
    # Remove line breaks and extra spaces
    text = re.sub(r'\n+', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    

    # Remove unwanted characters
    text = re.sub(r'[^A-Za-z0-9.,?!()\'":;\- ]', '', text)
    
    # Add missing spaces after punctuation marks
    text = re.sub(r'([.,?!();:])', r'\1 ', text)
    
    # Remove extra spaces after punctuation marks
    text = re.sub(r' +([.,?!();:])', r'\1', text)
    
    # Remove spaces before punctuation marks
    text = re.sub(r' ([.,?!();:])', r'\1', text)
    
    # Remove spaces before and after hyphens
    text = re.sub(r' - ', ' -', text)
    text = re.sub(r' -', '-', text)
    text = re.sub(r'- ', '-', text)
    return text

# Function to identify and process the uploaded file
def process_uploaded_file(uploaded_file_path):
    if uploaded_file_path.endswith('.pdf'):
        # It's a PDF, read and store the text in a text file
        file_name = read_pdf(uploaded_file_path)
        with open(file_name, 'r', encoding='utf-8') as txt_file:
            text = txt_file.readlines()
        return text

    elif uploaded_file_path.endswith('.txt'):
        # It's a text file, read the text directly into a variable
        return read_text(uploaded_file_path)
    else:
        print("Unsupported file format. Please upload a PDF or a text file.")
        return None



## 2. Text Preprocessing 

In [None]:
import re
import unicodedata

def remove_special_characters(input_string):
    # Remove special characters and brackets using regex
    cleaned_string = re.sub(r'[^a-zA-Z0-9\s]', '', input_string)

    # Remove accents using Unicode normalization
    cleaned_string = unicodedata.normalize('NFKD', cleaned_string).encode('ASCII', 'ignore').decode('utf-8')

    return cleaned_string

### Pegasus 

In [3]:
from transformers import AutoTokenizer, PegasusForConditionalGeneration

model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")
tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")

ARTICLE_TO_SUMMARIZE = ('''The biggest disadvantage of deep residual networks, according to some, is the feature reuse problem, in which some feature changes or blocks may contribute relatively little to learning. Wide ResNet was formed to solve this issue. The major learning potential of deep residual networks, according to Zagoruyko and Komodakis, is attributable to the residual units, whereas depth has a supplemental influence. ResNet was made wide rather than deep to take use of the residual blocks' strength''')
inputs = tokenizer(ARTICLE_TO_SUMMARIZE, max_length=1024, return_tensors="pt")

# Generate Summary
summary_ids = model.generate(inputs["input_ids"])
tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-xsum and are newly initialized: ['model.encoder.embed_positions.weight', 'model.decoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


'A new deep residual network, called Wide ResNet, has been developed.'

In [5]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Trainer, TrainingArguments

# Load the tokenizer and model
model_name = "google/pegasus-xsum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Define your training dataset and training arguments
# Replace `train_dataset` and other placeholders with your actual data and arguments
train_dataset = """New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18."""
training_args = TrainingArguments(
    output_dir="./output",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=16,
    save_steps=5000
)

# Create the trainer and fine-tune the model
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

trainer.train()

Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-xsum and are newly initialized: ['model.encoder.embed_positions.weight', 'model.decoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.20.1`: Please run `pip install transformers[torch]` or `pip install accelerate -U`

In [4]:
# from transformers import PegasusForConditionalGeneration, PegasusTokenizer
# import torch
# src_text = [
#     """ PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."""
# ]

# model_name = 'google/pegasus-xsum'
# torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
# tokenizer = PegasusTokenizer.from_pretrained(model_name)
# model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
# batch = tokenizer.prepare_seq2seq_batch(src_text, truncation=True, padding='longest').to(torch_device)
# translated = model.generate(**batch)
# tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
# assert tgt_text[0] == "California's largest electricity provider has turned off power to hundreds of thousands of customers."

Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-xsum and are newly initialized: ['model.encoder.embed_positions.weight', 'model.decoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and targets.

Here is a short example:

model_inputs = tokenizer(src_texts, text_target=tgt_texts, ...)

If you either need to use different keyword arguments for the source and target texts, you should do two calls like
this:

model_inputs = tokenizer(src_texts, ...)
labels = tokenizer(text_target=tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete exam

AttributeError: 'list' object has no attribute 'to'

### Facebook - BART

In [2]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))

Downloading pytorch_model.bin:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

[{'summary_text': 'Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.'}]


### 

In [5]:
from transformers import pipeline
summarizer = pipeline("summarization", model="philschmid/bart-large-cnn-samsum")

conversation = '''Jeff: Can I train a 🤗 Transformers model on Amazon SageMaker? 
Philipp: Sure you can use the new Hugging Face Deep Learning Container. 
Jeff: ok.
Jeff: and how can I get started? 
Jeff: where can I find documentation? 
Philipp: ok, ok you can find everything here. https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face                                           
'''
summarizer(conversation)


Downloading (…)lve/main/config.json:   0%|          | 0.00/1.63k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/300 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

[{'summary_text': "Jeff wants to train a Transformers model on Amazon SageMaker. He can use the new Hugging Face Deep Learning Container. Jeff can find the documentation on Huggingface's blog.    .   The blog is available at: https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugling-face."}]

### 

In [6]:
import torch
from transformers import pipeline

hf_name = 'pszemraj/led-large-book-summary'

summarizer = pipeline(
    "summarization",
    hf_name,
    device=0 if torch.cuda.is_available() else -1,
)
wall_of_text = "your words here"

result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=3,
    repetition_penalty=3.5,
    num_beams=4,
    early_stopping=True,
)

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.44k [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/1.84G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/1.32k [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/772 [00:00<?, ?B/s]

Your max_length is set to 256, but your input_length is only 5. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=2)


# T5F 

In [3]:
%pip install torch
%pip install sentencepiece






Note: you may need to restart the kernel to use updated packages.




_https://huggingface.co/sarakolding/daT5-summariser_

In [4]:
# from transformers import AutoTokenizer, AutoModelForTokenClassification
# from transformers import pipeline

# # Load pre-trained T5 model and tokenizer:
# model_name = "t5-large"
# model = T5ForConditionalGeneration.from_pretrained(model_name)
# tokenizer = T5Tokenizer.from_pretrained(model_name)

# for intro_text in df["Intro_text"]:
#   inputs = tokenizer.encode("summarize: " + text_to_summarize, return_tensors="pt", max_length=1024, truncation=True)
#   summary_ids = model.generate(inputs, max_length=120, min_length=150, length_penalty=2.0, num_beams=4, early_stopping=True)

#   # Decode and print the summary
#   summary_tf_small = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
#   df["Summary_tf-small"] = summary

NameError: name 'T5ForConditionalGeneration' is not defined

https://keras.io/examples/nlp/t5_hf_summarization/

In [1]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("summarization", model="sarakolding/daT5-summariser")

Downloading (…)lve/main/config.json:   0%|          | 0.00/892 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


Downloading model.safetensors:   0%|          | 0.00/977M [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/419 [00:00<?, ?B/s]

Downloading spiece.model:   0%|          | 0.00/767k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.21M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/65.0 [00:00<?, ?B/s]

_https://huggingface.co/aszfcxcgszdx/article-summarizer-t5-large?text=I+love+AutoTrain+%F0%9F%A4%97_   --- See this --- (best model in gaining all the validation scores)

In [11]:
%pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html


Looking in links: https://download.pytorch.org/whl/cu111/torch_stable.html




In [12]:
import torch
print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())


2.0.1+cpu
None
None


In [20]:
# Step 1: Check Pytorch (optional)
import torch
print("Cuda available: ", torch.cuda.is_available())
print("Device name:", torch.cuda.get_device_name())
# Step 2: Check Tensorflow
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
# Step 3: Check Keras (optional)
from keras import backend as K
print(K.tensorflow_backend._get_available_gpus())

Cuda available:  False


AssertionError: Torch not compiled with CUDA enabled

In [27]:
import torch

print(torch.__version__)
my_tensor = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32, device="cpu")
print(my_tensor)
torch.cuda.is_available()

2.0.1+cpu
tensor([[1., 2., 3.],
        [4., 5., 6.]])


False

In [None]:
text = """New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18."""
paraphrase(text)

In [15]:
import torch
print(torch.cuda.is_available())


False


In [14]:
# Example code to move a tensor to the GPU
tensor_on_cpu = torch.randn(3, 3)
tensor_on_gpu = tensor_on_cpu.to('cuda')

AssertionError: Torch not compiled with CUDA enabled

In [2]:
from validate_email import validate_email
is_valid = validate_email(email_address='vt@alliswell.in', check_regex=True, check_mx=True, from_address='cireta7980@othao.com', helo_host='my.host.name', smtp_timeout=10, dns_timeout=10, use_blacklist=True, debug=False)

AttributeError: module 'httpcore' has no attribute 'NetworkBackend'

In [2]:
%load_ext_cudf.pandas

UsageError: Line magic function `%load_ext_cudf.pandas` not found.


In [13]:
text_input = """New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18"""

In [14]:
"""
At the command line, only need to run once to install the package via pip:

$ pip install google-generativeai
"""

import google.generativeai as genai

genai.configure(api_key="AIzaSyBeKXOpP1-_Uuxl8BseTdR19uvlAnIbGlo")

defaults = {
  'model': 'models/text-bison-001',
  'temperature': 0.6,
  'candidate_count': 1,
  'top_k': 40,
  'top_p': 0.95,
  'max_output_tokens': 1024,
  'stop_sequences': [],
  'safety_settings': [{"category":"HARM_CATEGORY_DEROGATORY","threshold":1},{"category":"HARM_CATEGORY_TOXICITY","threshold":1},{"category":"HARM_CATEGORY_VIOLENCE","threshold":2},{"category":"HARM_CATEGORY_SEXUAL","threshold":2},{"category":"HARM_CATEGORY_MEDICAL","threshold":2},{"category":"HARM_CATEGORY_DANGEROUS","threshold":2}],
}

prompt = f"""Summarize this paragraph and detail some relevant context.

Text: "{text_input}"""

response = genai.generate_text(
  **defaults,
  prompt=prompt
)

print(response.result)

. **Summary** Liana Barrientos, 39, of New York, is facing two criminal counts of "offering a false instrument for filing in the first degree" for allegedly lying on a 2010 marriage license application.
Prosecutors allege that Barrientos married 10 times between 1999 and 2002, sometimes to multiple men at once, in an immigration scam.
She is believed to still be married to four men, and at one time, she was married to eight men at once.
Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Barrientos' next court appearance is scheduled for May 18.


In [7]:
def generate_summary(request):
    text_input = request.POST.get('text_input', '')

    defaults = {
        'model': 'models/text-bison-001',
        'temperature': 0.6,
        'candidate_count': 1,
        'top_k': 40,
        'top_p': 0.95,
        'max_output_tokens': 1024,
        'stop_sequences': [],
        'safety_settings': [
            {"category": "HARM_CATEGORY_DEROGATORY", "threshold": 1},
            {"category": "HARM_CATEGORY_TOXICITY", "threshold": 1},
            {"category": "HARM_CATEGORY_VIOLENCE", "threshold": 2},
            {"category": "HARM_CATEGORY_SEXUAL", "threshold": 2},
            {"category": "HARM_CATEGORY_MEDICAL", "threshold": 2},
            {"category": "HARM_CATEGORY_DANGEROUS", "threshold": 2},
        ],
    }

    prompt = f"""Summarize this paragraph and detail some relevant context.
      Text: {text_input}"""

    # Define response after the genai.generate_text call
    response = genai.generate_text(
        **defaults,
        prompt=prompt
    )

    summary = response.result

    content = {'summary': summary}
    return render(request, 'summary.html', content)