<a href="https://colab.research.google.com/github/ThatCodeCodingGuy/Text-Translation-and-Summarization-App-from-Turkish-to-English/blob/main/app.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Installing Necessary Packages** 

In [None]:
!pip install gradio
!pip install transformers sentencepiece

Collecting gradio
  Downloading gradio-2.7.5.2-py3-none-any.whl (871 kB)

# **Importing Necessary Modules**

In [None]:
import gradio as gr
import warnings
import re

from gradio.mix import Series
from transformers import pipeline, MarianMTModel, MarianTokenizer

warnings.filterwarnings("ignore") # To ignore the warnings 

# **Cleaning Raw Data**

In [None]:
def clean_text(text):
    """Normalize raw input text before it is passed to a model."""
    # Drop every non-ASCII character. Note that this also strips
    # Turkish letters such as ç, ğ, ı, ö, ş and ü.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Replace newlines with a space
    text = re.sub(r"\n", " ", text)
    # Replace tabs with a space
    text = re.sub(r"\t", " ", text)
    # Collapse runs of whitespace into a single space
    text = re.sub(r"\s+", " ", text)
    # Remove spaces at the beginning and end of the string
    text = text.strip(" ")
    return text
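
Because the ASCII step drops Turkish letters, the text reaching the translation model can differ from what the user typed. The sketch below compares that behavior with a hypothetical variant, `clean_text_keep_unicode`, which is not part of the original notebook and only normalizes whitespace. Whether keeping the letters improves the translation is an assumption to check against your own inputs.

```python
import re

def clean_text_keep_unicode(text):
    """Hypothetical variant: normalize whitespace but keep all letters."""
    # Collapse newlines, tabs and repeated spaces into single spaces
    text = re.sub(r"\s+", " ", text)
    # Trim leading and trailing whitespace
    return text.strip()

sample = "  Merhaba\n\tdünya  "

# Same ASCII filtering as in clean_text, followed by whitespace cleanup
ascii_only = sample.encode("ascii", errors="ignore").decode("ascii")
ascii_cleaned = re.sub(r"\s+", " ", ascii_only).strip()

print(repr(ascii_cleaned))                    # 'Merhaba dnya'
print(repr(clean_text_keep_unicode(sample)))  # 'Merhaba dünya'
```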

# **First Part of the Gradio Series: Defining the Translation Model**

In [None]:
model_name = "Helsinki-NLP/opus-mt-tr-en"

# Load the model and tokenizer once, rather than on every call
model = MarianMTModel.from_pretrained(model_name)
tokenizer = MarianTokenizer.from_pretrained(model_name)

# Our first function in our Gradio "Series"
def tr_to_eng(text):
    input_text = clean_text(text)
    # Tokenize the cleaned text as a batch of one sentence
    batch = tokenizer([input_text], return_tensors="pt", padding=True, truncation=True)
    # Generate the translated token IDs
    output = model.generate(**batch)
    # Decode the token IDs back into text
    translated = tokenizer.batch_decode(output, skip_special_tokens=True)
    return translated[0]


# The first app to be put in "Series"
translation = gr.Interface(fn=tr_to_eng, inputs=gr.inputs.Textbox(), outputs=gr.outputs.Textbox())
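
Loading a model and tokenizer is expensive, so it is worth making sure it happens only once per process. One way to do that while still deferring the load until the first request is `functools.lru_cache`. The sketch below uses a hypothetical stand-in loader (`load_resources`) and a stub (`translate_stub`) instead of the real `from_pretrained` calls, so it runs without downloading anything.

```python
from functools import lru_cache

load_count = 0  # counts how often the stand-in loader actually runs

@lru_cache(maxsize=None)
def load_resources(name):
    """Hypothetical stand-in for loading a model and tokenizer by name."""
    global load_count
    load_count += 1
    # In the notebook, this is where the from_pretrained(...) calls would go.
    return {"model": f"model:{name}", "tokenizer": f"tokenizer:{name}"}

def translate_stub(text, name="Helsinki-NLP/opus-mt-tr-en"):
    """Stub that only shows the caching pattern, not a real translation."""
    resources = load_resources(name)  # served from the cache after the first call
    return f"[{resources['model']}] {text}"

translate_stub("ilk")
translate_stub("ikinci")
print(load_count)  # 1
```

The loader runs on the first call only; later calls with the same name reuse the cached objects.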

# **Second Part of the Gradio Series: Defining the Summarization Model**

In [None]:
pipeline_sum = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",  
    tokenizer="facebook/bart-large-cnn", 
    framework="pt",
)

# The second function in our Gradio "Series"
def fb_summarizer(text):
    input_text = clean_text(text)
    results = pipeline_sum(input_text)
    return results[0]["summary_text"]


# The second app to be put in "Series"
summary = gr.Interface(fn=fb_summarizer, inputs=gr.inputs.Textbox(), outputs=gr.outputs.Textbox())

In [None]:
Series(
    translation,
    summary,
    title="Translate And Summarize Turkish Text Into English",
    inputs=gr.inputs.Textbox(lines=20, label="Paste some Turkish text here"),
    outputs=gr.outputs.Textbox(label="English Summary"),
).launch()

Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://26862.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)


(<fastapi.applications.FastAPI at 0x7fd0562b5fd0>,
 'http://127.0.0.1:7860/',
 'https://26862.gradio.app')
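
Conceptually, `Series` feeds the output of the first interface into the second, so the Turkish input is translated and the English result is then summarized. The sketch below shows that chaining with plain Python functions. The helper `series` and the two stand-in functions are hypothetical and do not call Gradio or any model.

```python
def series(*functions):
    """Chain functions so each one receives the previous one's output."""
    def chained(value):
        for function in functions:
            value = function(value)
        return value
    return chained

def fake_translate(text):
    # Stand-in for tr_to_eng: upper-casing marks the "translated" text
    return text.upper()

def fake_summarize(text):
    # Stand-in for fb_summarizer: keep only the first five characters
    return text[:5]

app = series(fake_translate, fake_summarize)
print(app("merhaba dunya"))  # MERHA
```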