1. Library Installation and Imports
First, install and import the necessary libraries:

In [None]:
# Install required libraries (textblob is needed for the spell-check step)
!pip install googletrans==4.0.0-rc1 navertrans nltk textblob

import pandas as pd
from googletrans import Translator
from navertrans import navertrans
import nltk
from nltk.tokenize import sent_tokenize

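# Download the Punkt tokenizer data for sentence splitting
# (recent NLTK releases look for 'punkt_tab'; older ones use 'punkt')
nltk.download('punkt')
nltk.download('punkt_tab')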


2. Translation Function
This function translates text to English with Naver's translator. If source_lang is not supplied, the language is first detected with Google Translate. The source language passed to Naver is then chosen as follows:

If the language is Chinese (Simplified or Traditional), the text is sent to Naver with zh-CN as the source language.
If the language is Japanese, the text is translated from Japanese to English.
For any other language, the detected language code is passed to Naver unchanged as the source language.

In [None]:
def translate_text(text, source_lang=None):
    # Initialize Google Translate for language detection
    translator = Translator(service_urls=['translate.google.com', 'translate.google.co.kr'])
    
    # Detect language if not provided
    detected_lang = source_lang or translator.detect(text).lang
    
    # Translate based on detected language; compare case-insensitively,
    # since language codes may come back as 'zh-CN' rather than 'zh-cn'
    if str(detected_lang).lower() in ('zh-cn', 'zh-tw'):
        # Translate Chinese to English
        return navertrans.translate(text, src_lan="zh-CN", tar_lan="en")
    elif detected_lang == 'ja':
        # Translate Japanese to English
        return navertrans.translate(text, src_lan="ja", tar_lan="en")
    else:
        # For other languages, use detected language with Naver to translate to English
        return navertrans.translate(text, src_lan=detected_lang, tar_lan="en")

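The routing rules in translate_text can be separated from the network calls so that they are easy to check on their own. The sketch below is only an illustration and not part of the original notebook: resolve_source_lang is a hypothetical helper, and sending zh-CN for both Chinese variants simply mirrors what translate_text does.

```python
def resolve_source_lang(detected_lang):
    """Map a detected language code to the source code passed to the translator.

    Hypothetical helper: it only mirrors the branching in translate_text.
    """
    code = detected_lang.strip().lower()
    # Both Chinese variants are sent as "zh-CN", as in translate_text
    if code in ("zh-cn", "zh-tw"):
        return "zh-CN"
    # Any other code (including "ja") is passed through, lowercased
    return code


print(resolve_source_lang("zh-TW"))
print(resolve_source_lang("ja"))
```

Keeping this mapping in one place means a new special case can be added without touching the code that calls the translation service.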

3. Spell-Check Function
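The TextBlob library handles spell-checking of the English text. Its correct() method replaces words it considers misspelled with the most likely correction, so proper nouns and domain-specific terms may be changed as well: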

In [None]:
from textblob import TextBlob

def spell_check(text):
    # Perform spell-check on English text
    corrected_text = str(TextBlob(text).correct())
    return corrected_text

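When spell_check is applied to a whole column, a missing translation (None or NaN) reaches it as a non-string value. A minimal sketch of a guard follows, assuming such values should be passed through untouched. The names safe_spell_check, checker, and demo_checker are hypothetical; demo_checker is a stand-in so the example runs without TextBlob installed.

```python
def safe_spell_check(text, checker):
    """Apply `checker` only to non-empty strings; return other values unchanged."""
    if not isinstance(text, str) or not text.strip():
        return text
    return checker(text)


# Stand-in checker for illustration; in the notebook, pass spell_check instead
def demo_checker(text):
    return text.replace("teh", "the")


print(safe_spell_check("teh cat", demo_checker))
print(safe_spell_check(None, demo_checker))
```

In the notebook this could be used as df['Translated_Review'].apply(lambda t: safe_spell_check(t, spell_check)).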

4. Sentence Splitting Function
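This function splits text into individual sentences and returns a new DataFrame with one row per sentence. It takes the original DataFrame (sentence_df) and the name of the column containing the text to split (target_column). Each output row copies the original row and stores the sentence in a new column named 'Separated' + target_column.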

In [None]:
def split_sentence(sentence_df, target_column):
    # Collect one record per sentence and build the DataFrame once at the end
    # (concatenating inside the loop copies all data on every iteration)
    split_rows = []
    
    for _, row in sentence_df.iterrows():
        review = row[target_column]
        # Skip rows whose text is missing (NaN/None) or not a string
        if not isinstance(review, str):
            continue
            
        # Split text into sentences
        for sent in sent_tokenize(review):
            # Copy the original row and add the split sentence
            row_copy = row.to_dict()
            row_copy['Separated' + target_column] = sent
            split_rows.append(row_copy)
    
    return pd.DataFrame(split_rows)

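To make the output shape concrete, the sketch below reproduces the row-expansion step on plain dictionaries. It is an illustration only: expand_rows and naive_split are hypothetical names, and the regex splitter is a naive stand-in for sent_tokenize (it does not handle abbreviations such as "Dr.").

```python
import re


def naive_split(text):
    # Naive stand-in for sent_tokenize: split after ., ! or ? followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def expand_rows(rows, target_column):
    """Return one output record per sentence, keeping the other columns."""
    expanded = []
    for row in rows:
        text = row.get(target_column)
        if not isinstance(text, str):
            continue  # skip missing or non-string values
        for sent in naive_split(text):
            record = dict(row)
            record["Separated" + target_column] = sent
            expanded.append(record)
    return expanded


rows = [
    {"id": 1, "Review": "Great phone. Battery lasts long!"},
    {"id": 2, "Review": None},
]
for record in expand_rows(rows, "Review"):
    print(record["id"], record["SeparatedReview"])
```

The first input row yields two output records and the second is skipped, which is the same behavior split_sentence has for rows with missing text.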

5. Complete Code Workflow Example
To integrate everything, here’s how to process a DataFrame using the functions defined above:

In [None]:
# Sample data and DataFrame
data = {
    'Review': ['이 문장은 한국어로 작성되었습니다.', 'これは日本語で書かれています。', '这是一段中文文本。']
}
df = pd.DataFrame(data)

# Apply translation and spell-check
df['Translated_Review'] = df['Review'].apply(translate_text)
df['SpellChecked_Review'] = df['Translated_Review'].apply(spell_check)

# Apply sentence splitting
splitted_df = split_sentence(df, 'SpellChecked_Review')


Explanation of the Workflow:
Translation: translate_text translates the text in the Review column to English based on the detected language.
Spell-Check: spell_check attempts to correct English spelling errors in the translated text.
Sentence Splitting: split_sentence divides the spell-checked English text into individual sentences and stores them, one per row, in splitted_df.
Because each step is a separate function with its own output, intermediate results can be inspected and any single step can be replaced without changing the others.
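
The translation step makes a network request for every row, so repeated review texts trigger repeated requests. One option is to memoize the results in memory. The sketch below assumes that translating the same text twice should give the same result; make_cached and fake_translate are hypothetical names, and fake_translate stands in for translate_text so the example runs offline.

```python
from functools import lru_cache


def make_cached(translate_fn, maxsize=1024):
    """Wrap a one-argument translation function with an in-memory cache."""
    @lru_cache(maxsize=maxsize)
    def cached(text):
        return translate_fn(text)
    return cached


calls = []


def fake_translate(text):
    # Stand-in for translate_text; records each real call
    calls.append(text)
    return text.upper()


cached_translate = make_cached(fake_translate)
results = [cached_translate(t) for t in ["hola", "hola", "adios"]]
print(results)
print(len(calls))
```

In the workflow above, this could be applied as df['Review'].apply(make_cached(translate_text)); the cache lives only as long as the wrapped function object.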