# Spanglish

*You choose and we translate to help*

*by* **Jesús Montesinos** *for* **Solo Hacks 1.0**

## Motivation to create the project

When learning a new language, such as *English*, it helps to start with readings you actually enjoy, not only the most popular ones. However, finding such readings in the new language can be challenging.

Just feed in the PDF of your favorite reading, in Spanish, and this app translates the page you choose into English and produces an audio file of the translation, so you can practice reading and listening. You can also practice speaking by comparing your pronunciation with the audio!


## ***Let's Start!!!***

# 1. Greetings, importing and installing the libraries

*Greetings*

In [None]:
print("Hello Hack!")

Hello Hack!


**Importing the libraries**

In [None]:
# To read the PDF file
! pip install pypdf
from pypdf import PdfReader
#---

# To generate the mp3 file
! pip install gTTS
from gtts import gTTS
#---

# To use the model from Hugging Face
! pip install transformers sentencepiece
from transformers import MarianMTModel, MarianTokenizer
#--

# To create the PDF with the text
! pip install reportlab
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle

# To download the files
from google.colab import files
#--

Collecting pypdf
  Downloading pypdf-3.17.1-py3-none-any.whl (277 kB)
Installing collected packages: pypdf
Successfully installed pypdf-3.17.1
Collecting gTTS
  Downloading gTTS-2.4.0-py3-none-any.whl (29 kB)
Installing collected packages: gTTS
Successfully installed gTTS-2.4.0
Collecting sentencepiece
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99
Collecting reportlab
  Downloading reportlab-4.0.7-py3-none-any.whl (1.9 MB)
Installing collected packages: reportlab
Successfully install

# 2. Getting the path, the page, and the text

**Importing the pages from the PDF**

In [None]:
path="/content/drive/MyDrive/Colab Notebooks/Projects/Translation/miCuento.pdf"
myPage=2

# Reading the PDF file
with open(path,"rb") as pdf_file:
  reader=PdfReader(pdf_file)

  # Page
  page = reader.pages[myPage-1]

  # Extract the text content from the page
  myText = page.extract_text()

print(f"Datatype: {type(myText)}, its length {len(myText)}")

Datatype: <class 'str'>, its length 1005
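
Text extracted from a PDF often keeps the line breaks of the printed page, which can split sentences and hyphenated words before they reach the translation model. The cell below is a minimal sketch of a cleanup step, not part of the original notebook. The helper name `normalize_extracted_text` is hypothetical, and the rule for rejoining hyphenated words is an assumption that may also merge genuine compound words that happen to end a line.

```python
import re

def normalize_extracted_text(raw: str) -> str:
    """Tidy text returned by a PDF extractor before translating it (hypothetical helper)."""
    # Assumption: a hyphen right before a line break is a split word, so rejoin it
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Blank lines are treated as paragraph breaks
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, single line breaks and runs of spaces become one space
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    # Drop empty paragraphs and keep one blank line between the rest
    return "\n\n".join(p for p in cleaned if p)
```

It could be applied right after extraction, for example `myText = normalize_extracted_text(myText)`.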


In [None]:
print("Hello")

Hello


# 3. Getting the model

In [None]:
# Name of the model
model_name = "Helsinki-NLP/opus-mt-es-en"

# Declaring the model and tokenizer
model = MarianMTModel.from_pretrained(model_name)
tokenizer = MarianTokenizer.from_pretrained(model_name)

config.json:   0%|          | 0.00/1.44k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/312M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/44.0 [00:00<?, ?B/s]

source.spm:   0%|          | 0.00/826k [00:00<?, ?B/s]

target.spm:   0%|          | 0.00/802k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.59M [00:00<?, ?B/s]



# 4. Translating from *Spanish* to *English*

In [None]:
# Rough guard: this counts characters, while the model's limit is measured in tokens
if len(myText)>1024:
  print("Text longer than the maximum sequence length of the model")
else:
  print("Your text is nice!")
  # Tokenizing the text
  inputs = tokenizer(myText, return_tensors="pt")

  # Translating
  outputs = model.generate(**inputs)

  # Decoding the translation
  translatedText = tokenizer.decode(outputs[0], skip_special_tokens=True)

  print("Text from Spanish to English ready")

Your text is nice!
Text from Spanish to English ready
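
The check above skips any page that is too long. One possible workaround, sketched here under stated assumptions, is to split the page into groups of whole sentences and translate each group separately. The helper `split_into_chunks` and its `max_chars` parameter are hypothetical names, and the sentence splitting is a simple punctuation rule, not a full sentence tokenizer.

```python
import re

def split_into_chunks(text: str, max_chars: int = 1024) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters (hypothetical helper)."""
    # Assumption: a sentence ends with ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        # A single sentence longer than max_chars is kept whole in its own chunk
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then go through the same `tokenizer(...)`, `model.generate(...)` and `tokenizer.decode(...)` calls used in this section, joining the decoded pieces with a space to form `translatedText`.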


# 5. Generating the audio (`mp3`) file

In [None]:
# Declaring the text to convert into an MP3 file (gTTS speaks English by default)
tts = gTTS(translatedText)

# Saving and downloading the file
nameAudioEnglish="English Audio.mp3"
tts.save(nameAudioEnglish)  # save() writes the file and returns None
files.download(nameAudioEnglish)


# 6. Creating the PDF File with the text in *Spanish* and *English*

*Adjusted text*

In [None]:
namePDF="Spanglish text.pdf"
doc=SimpleDocTemplate(namePDF, pagesize=letter)
styles= getSampleStyleSheet()

# Style for the title
title_style = ParagraphStyle(
    "TitleStyle",
    parent=styles["Title"],
    fontSize=20,
    spaceAfter=12,
    textColor="black",  # Customize the text color
)

# Elements of the document
story = []

# General title
title_paragraph = Paragraph("Spanglish Document", title_style)
story.append(title_paragraph)
story.append(Spacer(1, 12)) # Space title-text

paragraph_style = styles["BodyText"]
intro="by Chucho Montesinos for Solo Hacks 1.0"
paragraph1 = Paragraph(intro, paragraph_style)
story.append(paragraph1)

story.append(PageBreak()) # Page break

# Add a title to the Spanish section with the custom style
title_paragraph = Paragraph("Español (Spanish)", title_style)
story.append(title_paragraph)
story.append(Spacer(1, 12)) # Space title-text

# Content in Spanish
paragraph_style = styles["BodyText"]
page1=myText
paragraph1 = Paragraph(page1, paragraph_style)
story.append(paragraph1)

story.append(PageBreak()) # Page break

# Add a title to the English section with the custom style
title_paragraph = Paragraph("English (Inglés)", title_style)
story.append(title_paragraph)
story.append(Spacer(1, 12)) # Space title-text

# Content in English
page2=translatedText
paragraph2 = Paragraph(page2, paragraph_style)
story.append(paragraph2)

# Building the document
doc.build(story)

# Downloading the file
files.download(namePDF)
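
ReportLab's `Paragraph` reads its text as XML-like markup and collapses line breaks, so a page containing `&` or `<` may fail to render or lose its layout. The cell below is a minimal sketch of a preprocessing step, not part of the original notebook. The helper name `to_paragraph_markup` is hypothetical.

```python
from xml.sax.saxutils import escape

def to_paragraph_markup(text: str) -> str:
    """Escape markup characters and keep line breaks for a ReportLab Paragraph (hypothetical helper)."""
    # &, < and > would otherwise be read as markup by Paragraph
    safe = escape(text)
    # Paragraph collapses newlines, so make them explicit with <br/>
    return safe.replace("\n", "<br/>")
```

It could wrap both texts before they are passed in, for example `Paragraph(to_paragraph_markup(myText), paragraph_style)`.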


# 7. Done?

In [None]:
print("Yes")