# Machine Translation for E-commerce

## Business Context: eBay's Cross-Border Commerce

eBay implemented Neural Machine Translation (NMT) to:
- Enable sellers to reach international customers regardless of language
- Automatically translate product listings across numerous languages
- Improve search relevance across language barriers
- Increase cross-border sales by making listings accessible to global buyers

After implementing NMT, eBay reported a 10.9% increase in exports on translated listings, demonstrating the significant business impact of effective translation technology.

In this notebook, we'll explore how machine translation works and implement a simple version for e-commerce product listings.

## 1. Setup and Installation

First, let's install and import the necessary libraries. We'll use the Hugging Face `transformers` library, which provides access to state-of-the-art pre-trained translation models.

In [None]:
# Install required libraries (if not already installed)
# transformers needs a deep learning backend; PyTorch (torch) is assumed here
# !pip install transformers torch sentencepiece pandas matplotlib

# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from transformers import pipeline, MarianMTModel, MarianTokenizer

print("Libraries imported successfully!")

## 2. Loading Translation Models

We'll use the Helsinki-NLP Opus-MT models, which are specialized for machine translation between specific language pairs. For an e-commerce platform like eBay, multiple language pairs would be needed, but we'll focus on English-Spanish translation for this demonstration.

In [None]:
# Set up translation pipelines
# English to Spanish
en_to_es_translator = pipeline('translation_en_to_es', model="Helsinki-NLP/opus-mt-en-es")

# Spanish to English (for back-translation)
es_to_en_translator = pipeline('translation_es_to_en', model="Helsinki-NLP/opus-mt-es-en")

print("Translation models loaded successfully!")

## 3. Sample E-commerce Product Listings

Let's create a dataset of product listings similar to what you might find on an e-commerce platform like eBay.

In [None]:
# Create sample product listings
product_listings = [
    "Red leather wallet - brand new with multiple card slots and coin purse",
    "Wireless Bluetooth headphones with noise cancellation, 20-hour battery life",
    "Vintage mid-century modern coffee table, solid oak, minor scratches",
    "iPhone 12 Pro case, shockproof, black carbon fiber design",
    "Women's running shoes, size 8, breathable mesh, pink/gray color",
    "Professional DSLR camera with 18-55mm lens, includes carrying case and SD card",
    "Antique silver pocket watch, working condition, light patina",
    "Baby stroller with car seat attachment, folds flat for easy storage",
    # Single-quoted so the inch mark (") does not end the string early
    'Gaming laptop, 16GB RAM, NVIDIA RTX 3070, 1TB SSD, 15.6" display',
    "Handmade ceramic dinnerware set, 4 place settings, dishwasher safe"
]

# Create a DataFrame to organize our data
df = pd.DataFrame({'Original (English)': product_listings})

# Display the product listings
df

## 4. Translating Product Listings

Now, let's translate our product listings from English to Spanish, simulating how eBay would make listings available to Spanish-speaking customers.

In [None]:
# Translate each product listing to Spanish
spanish_translations = []

for listing in product_listings:
    # Get the translation
    translation = en_to_es_translator(listing)
    # Extract the translated text from the result
    translated_text = translation[0]['translation_text']
    spanish_translations.append(translated_text)

# Add translations to our DataFrame
df['Spanish Translation'] = spanish_translations

# Display the original listings and their translations
df[['Original (English)', 'Spanish Translation']]
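
The loop above calls the pipeline once per listing. Hugging Face pipelines also accept a list of strings and return one result dictionary per input, so the same step can be written as a single call. The sketch below assumes that behavior; `translate_all` is a hypothetical helper name, and the translator is passed in as an argument so that any callable with the same return shape will work.

```python
from typing import Callable, Dict, List


def translate_all(
    texts: List[str],
    translator: Callable[[List[str]], List[Dict[str, str]]],
) -> List[str]:
    """Translate a list of strings with one call to `translator`.

    `translator` is assumed to return a list of dicts, each with a
    'translation_text' key, in the same order as the inputs.
    """
    if not texts:
        return []
    results = translator(texts)
    return [result['translation_text'] for result in results]
```

With the pipelines loaded earlier, this could be used as `df['Spanish Translation'] = translate_all(product_listings, en_to_es_translator)`.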

## 5. Back-Translation: Evaluating Quality

One way to evaluate translation quality is back-translation: translating the Spanish text back to English and comparing it with the original. This can reveal meaning loss or translation errors, but it is only a rough check, because errors in the return pass can hide or add to errors in the forward pass.

In [None]:
# Translate each Spanish translation back to English
back_translations = []

for spanish_text in spanish_translations:
    # Get the back-translation
    back_translation = es_to_en_translator(spanish_text)
    # Extract the translated text from the result
    back_translated_text = back_translation[0]['translation_text']
    back_translations.append(back_translated_text)

# Add back-translations to our DataFrame
df['Back to English'] = back_translations

# Display all three columns
df
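
Comparing the original and back-translated columns by eye does not scale well, so a numeric score can help flag listings for review. The following is a minimal sketch using the standard library's `difflib.SequenceMatcher`; the function name `back_translation_similarity` is hypothetical, and character-level similarity is only a surface measure, not a substitute for metrics designed for translation.

```python
from difflib import SequenceMatcher


def back_translation_similarity(original: str, back_translated: str) -> float:
    """Return a similarity ratio between 0 and 1.

    Case and repeated whitespace are ignored before comparing.
    """
    a = " ".join(original.lower().split())
    b = " ".join(back_translated.lower().split())
    return SequenceMatcher(None, a, b).ratio()
```

For example, `df['Similarity'] = [back_translation_similarity(o, b) for o, b in zip(product_listings, back_translations)]` adds a score per listing. A low score suggests a listing worth inspecting; a high score does not prove that the Spanish text is correct.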

## 6. Analyzing Translation Quality

Let's analyze some specific translation challenges in e-commerce product listings:

In [None]:
# Create a function to analyze the translation of specific terms
def analyze_term_translation(term, context):
    """Analyze how a specific term is translated in context"""
    # Translate the context containing the term
    translation = en_to_es_translator(context)
    translated_text = translation[0]['translation_text']
    
    print(f"Term: '{term}'")
    print(f"Context: '{context}'")
    print(f"Translation: '{translated_text}'")
    print("---")
    
    return translated_text

# Test brand names and product-specific terminology
print("ANALYSIS OF BRAND NAMES AND TECHNICAL TERMS:\n")

# Brand names
analyze_term_translation("Apple", "Apple iPhone 13 with 128GB storage")
analyze_term_translation("Apple", "Fresh apple pie made with organic fruit")

# Technical specifications
analyze_term_translation("16GB RAM", "Laptop with 16GB RAM and fast processor")
analyze_term_translation("4K", "4K Ultra HD Smart TV with built-in streaming")

# Clothing sizes
analyze_term_translation("Size 8", "Women's dress, Size 8, formal black design")

### Testing Special Cases: E-commerce-Specific Phrases

Let's examine how well our translation model handles some common e-commerce phrases:

In [None]:
# Define common e-commerce phrases
ecommerce_phrases = [
    "Free shipping on orders over $50",
    "Buy one, get one 50% off",
    "Available for in-store pickup",
    "30-day money-back guarantee",
    "Limited quantity available",
    "Final sale - no returns or exchanges",
    "Pre-order now for delivery in June"
]

# Translate each phrase and back-translate
results = []

for phrase in ecommerce_phrases:
    # Translate to Spanish
    spanish = en_to_es_translator(phrase)[0]['translation_text']
    # Back-translate to English
    back_to_english = es_to_en_translator(spanish)[0]['translation_text']
    
    results.append({
        'Original': phrase,
        'Spanish': spanish,
        'Back to English': back_to_english
    })

# Display results
pd.DataFrame(results)
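
Phrases like these carry prices, percentages, and durations, and a wrong number is more costly than awkward wording. One cheap check is to confirm that the same numbers appear on both sides. This is a sketch only: `numbers_preserved` is a hypothetical helper, and treating commas and periods as equivalent is a crude assumption that ignores thousands separators.

```python
import re

# Integers and simple decimals such as 50, 15.6, or 15,6
NUMBER_PATTERN = re.compile(r'\d+(?:[.,]\d+)?')


def _numbers(text: str) -> list:
    """Extract numeric tokens, normalizing decimal commas to periods."""
    return sorted(n.replace(',', '.') for n in NUMBER_PATTERN.findall(text))


def numbers_preserved(source: str, translation: str) -> bool:
    """Return True if both texts contain the same numeric tokens."""
    return _numbers(source) == _numbers(translation)
```

Applied to the results above, `[numbers_preserved(r['Original'], r['Spanish']) for r in results]` gives one flag per phrase.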

## 7. Translation Challenges in E-commerce

Based on our experiments, let's identify specific challenges in e-commerce translation:

### Common Translation Challenges

1. **Brand Names and Product Names**
   - Should remain untranslated (e.g., "Apple iPhone" should not become "Manzana iPhone")
   - Context matters (Apple as a company vs. apple as a fruit)

2. **Technical Specifications**
   - Units and measurements may have different conventions in different countries
   - Technical abbreviations (RAM, HD, SSD) should usually remain in original form

3. **Sizes and Measurements**
   - Clothing sizes vary by country (US Size 8 ≠ European Size 8)
   - Units might need conversion (inches to centimeters, pounds to kilograms)

4. **Promotional Language**
   - Idiomatic expressions ("Buy one, get one free") need cultural adaptation
   - Legal terms for warranties and returns have specific meanings

5. **Product Categories**
   - Category names might not have direct translations
   - Search terms differ across languages and cultures
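
The unit-conversion point in the list above can be illustrated with a small pre-processing step that adds a metric equivalent next to lengths given in inches. This is a sketch under stated assumptions: `add_metric_lengths` is a hypothetical helper, it recognizes only the `"` mark and the words "inch" and "inches", and it keeps the original value rather than replacing it.

```python
import re

CM_PER_INCH = 2.54

# A number followed by an inch mark or the word "inch"/"inches"
INCHES_PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s*(?:"|inch(?:es)?\b)')


def add_metric_lengths(text: str) -> str:
    """Append a centimeter equivalent after each length given in inches."""
    def _convert(match):
        centimeters = float(match.group(1)) * CM_PER_INCH
        return f"{match.group(0)} ({centimeters:.1f} cm)"

    return INCHES_PATTERN.sub(_convert, text)
```

Running this on the English text before translation keeps the seller's original figure visible while giving buyers in metric markets a familiar unit. Clothing sizes are a different problem, since they need lookup tables rather than arithmetic.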

### How eBay Addresses These Challenges

eBay's custom Neural Machine Translation system likely includes:

1. **Domain-Specific Training**: Models trained specifically on e-commerce data
2. **Entity Recognition**: Special handling for brand names, product names, and measurements
3. **Custom Dictionaries**: Specialized vocabulary for product categories and features
4. **User Feedback Loop**: Learning from user corrections and preferences
5. **Hybrid Approach**: Combining NMT with rule-based systems for special cases

## 8. Building a Simple E-commerce Translation System

Now let's implement a more sophisticated translation function that handles some of the special cases we've identified:

In [None]:
import re

# Known brands that should never be translated (illustrative, not exhaustive)
KNOWN_BRANDS = ['Apple', 'iPhone', 'Samsung', 'Sony', 'Nike', 'Adidas', 'NVIDIA', 'AMD']

def ecommerce_translate(text, source_lang='en', target_lang='es'):
    """Translate product listings with special handling for e-commerce content"""

    # Pick the translator first so unsupported pairs fail before any work is done
    if source_lang == 'en' and target_lang == 'es':
        translator = en_to_es_translator
    elif source_lang == 'es' and target_lang == 'en':
        translator = es_to_en_translator
    else:
        raise ValueError(f"Unsupported language pair: {source_lang} -> {target_lang}")

    # 1. Pre-processing: Identify and protect special elements

    # Match known brands as whole words. Searching for the list entries directly
    # (instead of any capitalized word) also catches names like "iPhone".
    brand_pattern = r'\b(?:' + '|'.join(map(re.escape, KNOWN_BRANDS)) + r')\b'
    do_not_translate = re.findall(brand_pattern, text)

    # Protect technical specifications such as "16GB" or "4K"
    tech_pattern = r'\b\d+(?:GB|TB|MP|K|MHz|GHz)\b'
    do_not_translate.extend(re.findall(tech_pattern, text))

    # Remove duplicates while preserving order
    do_not_translate = list(dict.fromkeys(do_not_translate))

    # Replace protected terms with placeholders
    protected_text = text
    replacements = {}

    for i, term in enumerate(do_not_translate):
        placeholder = f"PROTECTED_{i}"
        replacements[placeholder] = term
        # Whole-word replacement so "Apple" does not match inside a longer word
        protected_text = re.sub(r'\b' + re.escape(term) + r'\b', placeholder, protected_text)

    # 2. Translate the modified text
    # Note: the model is not guaranteed to leave placeholders untouched
    translation = translator(protected_text)[0]['translation_text']

    # 3. Post-processing: Restore protected terms
    # Restore the highest index first so PROTECTED_1 never clobbers PROTECTED_10
    final_translation = translation
    for placeholder, original in reversed(list(replacements.items())):
        final_translation = final_translation.replace(placeholder, original)

    return final_translation

# Test our enhanced translation function
enhanced_translations = [ecommerce_translate(listing) for listing in product_listings]

# Compare standard and enhanced translations
comparison_df = pd.DataFrame({
    'Original': product_listings,
    'Standard Translation': spanish_translations,
    'Enhanced Translation': enhanced_translations
})

comparison_df
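
The function above relies on the placeholders coming back from the model unchanged, which a general-purpose translation model does not guarantee. A quick way to catch failures is to check that every term we meant to protect appears verbatim in the final output. This is a minimal sketch; `missing_terms` is a hypothetical helper name.

```python
from typing import Iterable, List


def missing_terms(protected_terms: Iterable[str], translated_text: str) -> List[str]:
    """Return the protected terms that do not appear verbatim in the translation."""
    return [term for term in protected_terms if term not in translated_text]
```

For the gaming laptop listing, `missing_terms(['NVIDIA', '16GB', '1TB'], enhanced_translations[8])` should return an empty list if protection worked. Any term it returns points to a placeholder that was altered or dropped during translation.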

## 9. Business Value and Implementation

### Business Benefits of Machine Translation in E-commerce

1. **Market Expansion**
   - Reach customers in new geographical markets without requiring sellers to be multilingual
   - eBay reported a 10.9% increase in exports after implementing NMT

2. **Increased User Engagement**
   - Customers prefer browsing in their native language
   - Higher conversion rates when content is in the user's language

3. **Improved Search Relevance**
   - Translate search queries to match listings in other languages
   - Index translated content for better searchability

4. **Cost Efficiency**
   - Automated translation is more cost-effective than manual translation
   - Scales to millions of listings without linear cost increase

5. **Competitive Advantage**
   - Platforms with better translation gain advantage in global markets
   - Creates network effects between international buyers and sellers

### Implementation Considerations

A full-scale e-commerce translation system would include:

1. **Custom-trained Models**: Fine-tuned on e-commerce data with domain-specific vocabulary
2. **Real-time Translation API**: Fast enough for dynamic content like search results
3. **Translation Memory**: Store previous translations to maintain consistency
4. **User Feedback Loop**: Allow users to suggest better translations
5. **Quality Monitoring**: Regularly evaluate translation quality across languages
6. **Language Detection**: Automatically identify the source language
7. **Content Prioritization**: Focus highest quality translation on most-viewed listings
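
As an illustration of the translation memory idea in the list above, the sketch below caches results so that a repeated source string is translated once and returned consistently afterwards. It is a simplified in-memory version: the `TranslationMemory` class is hypothetical, and a production system would more likely use a persistent store and fuzzy matching.

```python
from typing import Callable, Dict


class TranslationMemory:
    """Cache translations keyed by whitespace-normalized source text."""

    def __init__(self, translate_fn: Callable[[str], str]):
        self._translate_fn = translate_fn
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def translate(self, text: str) -> str:
        # Collapse repeated whitespace so trivial variants share one entry
        key = " ".join(text.split())
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._translate_fn(key)
        return self._cache[key]
```

With the pipeline from earlier sections, it could be created as `memory = TranslationMemory(lambda t: en_to_es_translator(t)[0]['translation_text'])` and called with `memory.translate(listing)`. The `hits` and `misses` counters give a rough sense of how much repeated text the listings contain.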

## 10. Learning Challenge

Now it's your turn to experiment with machine translation for e-commerce!

### Challenges:

1. Try translating these product descriptions and analyze the results:
   - "Wireless noise-cancelling headphones with 24-hour battery life"
   - "Handmade artisanal coffee mug - dishwasher safe"

2. Identify translation challenges with these phrases:
   - "One size fits all"
   - "Limited edition collectible"
   - "Refurbished Apple MacBook Pro"

3. Consider how you would handle these e-commerce translation issues:
   - How would you ensure brand names aren't translated inappropriately?
   - How would you handle clothing sizes across different countries?
   - How would you translate region-specific promotional terms?

Use the cell below to experiment:

In [None]:
# Your translation experiments here
challenge_phrases = [
    "Wireless noise-cancelling headphones with 24-hour battery life",
    "Handmade artisanal coffee mug - dishwasher safe",
    "One size fits all",
    "Limited edition collectible",
    "Refurbished Apple MacBook Pro"
]

# Translate and analyze
for phrase in challenge_phrases:
    print(f"Original: {phrase}")
    
    # Standard translation
    standard = en_to_es_translator(phrase)[0]['translation_text']
    print(f"Standard translation: {standard}")
    
    # Enhanced translation
    enhanced = ecommerce_translate(phrase)
    print(f"Enhanced translation: {enhanced}")
    
    # Back-translation
    back = es_to_en_translator(standard)[0]['translation_text']
    print(f"Back to English: {back}")
    print("\n---\n")

## Conclusion

In this notebook, we've explored machine translation for e-commerce, based on eBay's successful implementation. We've seen how neural machine translation can automatically convert product listings between languages, enabling global commerce across language barriers.

Key takeaways include:

1. Modern neural translation models provide high-quality translations with minimal setup
2. E-commerce translation has specific challenges (brand names, technical terms, sizes)
3. Custom processing can enhance generic translation for domain-specific needs
4. Back-translation provides a simple, if rough, way to spot-check translation quality
5. The business impact of translation is significant, with eBay reporting a 10.9% increase in exports

By implementing machine translation, e-commerce platforms can dramatically expand their reach, connecting buyers and sellers worldwide regardless of language barriers. This technology exemplifies how AI can remove friction from global commerce, creating economic opportunities and enhancing the customer experience.