# Translation Performance Visualization

In [1]:
import pandas as pd
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Cargar los datos
df = pd.read_csv('Data/Performance Data/translation_performance_results.csv')  # Asegúrate de ajustar la ruta

# Verificar los primeros registros para entender la estructura de datos
print(df.head())

  Source Language Source ISO Code Target Language Target ISO Code Text Type  \
0         English              en         Spanish              es      word   
1         English              en         Spanish              es      word   
2         English              en         Spanish              es      word   
3         English              en         Spanish              es      word   
4         English              en         Spanish              es      word   

    Text  Trial HF Translated Text  HF Translation Time (s)  \
0  Hello      1              Hola.                 0.206480   
1  Hello      2              Hola.                 0.167595   
2  Hello      3              Hola.                 0.161085   
3  Hello      4              Hola.                 0.160584   
4  Hello      5              Hola.                 0.162623   

  Gemini Translated Text  Gemini Translation Time (s)  
0                   Hola                     1.190209  
1                   Hola          

# Comparison of Translation Times by Model

This bar chart compares the translation times between two different translation models, "HF" and "Gemini." The data is melted to align both models on a single axis, showing the translation time (in seconds) for each model. The bars are grouped by model, allowing for a clear comparison of performance in terms of speed.

- **HF Model**: Represents the translation times taken by the HF translation system.
- **Gemini Model**: Represents the translation times taken by the Gemini translation system.

This visualization helps to quickly assess which model performs translations faster under the given conditions.

In [None]:
# Crear un gráfico de barras para comparar los tiempos de traducción de HF y Gemini
df_melted = df.melt(id_vars=['Text Type', 'Source Language', 'Target Language'], 
                    value_vars=['HF Translation Time (s)', 'Gemini Translation Time (s)'],
                    var_name='Model', value_name='Translation Time (s)')

fig = px.bar(df_melted, x='Model', y='Translation Time (s)', color='Model', barmode='group',
             title='Comparison of Translation Times by Model')

fig.write_html("../docs/Resources/comparison_translation_times.html")

fig.show()

# Translation Times by Target Language and Model

This bar chart illustrates the translation times (in seconds) for two different translation models, "HF" and "Gemini," across various target languages. The chart groups the bars by target language and distinguishes between the models using color.

- **Target Language**: The language into which the text is being translated.
- **HF Model**: Represents the translation times for the HF translation system.
- **Gemini Model**: Represents the translation times for the Gemini translation system.

This visualization allows for a comparative analysis of how each model performs in translating into different target languages, highlighting variations in speed across languages and models.


In [5]:
fig = px.bar(df_melted, x='Target Language', y='Translation Time (s)', color='Model',
             barmode='group', title='Translation Times by Target Language and Model')
fig.show()

# Translation Times by Text Type and Model

This bar chart displays the translation times (in seconds) for two different translation models, "HF" and "Gemini," categorized by text type. The chart groups the bars by text type, with color distinguishing the two models.

- **Text Type**: The category of text being translated, such as word, sentence, or paragraph.
- **HF Model**: Represents the translation times for the HF translation system.
- **Gemini Model**: Represents the translation times for the Gemini translation system.

This visualization provides insight into how each model performs across different types of text, allowing for a comparative analysis of translation speed based on the complexity or length of the text.

In [6]:
fig = px.bar(df_melted, x='Text Type', y='Translation Time (s)', color='Model',
             barmode='group', title='Translation Times by Text Type and Model')
fig.show()

# Heatmap of Translation Times by Source and Target Language

This heatmap visualizes the translation times (in seconds) for different source and target language pairs across two translation models, "Hugging Face" and "Gemini." The data is displayed in two facets, one for each model, allowing a side-by-side comparison of how each model performs across various language pairs.

- **Source Language**: The original language of the text before translation.
- **Target Language**: The language into which the text is translated.
- **Translation Time (s)**: The time taken by each model to perform the translation, represented by color intensity on the heatmap.

The heatmap provides a clear overview of how translation time varies depending on the source and target language combination, as well as differences in performance between the two models.

In [8]:
# Reestructuración de datos para facilitar la visualización
df_melted = df.melt(id_vars=['Source Language', 'Target Language', 'Text Type', 'Text', 'Trial'],
                    value_vars=['HF Translation Time (s)', 'Gemini Translation Time (s)'],
                    var_name='Model', value_name='Translation Time (s)')

# Asegúrate de que los modelos están correctamente etiquetados en la columna 'Model'
df_melted['Model'] = df_melted['Model'].map({
    'HF Translation Time (s)': 'Hugging Face',
    'Gemini Translation Time (s)': 'Gemini'
})

# Ahora puedes crear el heatmap
fig = px.density_heatmap(df_melted, x='Source Language', y='Target Language', z='Translation Time (s)',
                         facet_col='Model', title='Heatmap of Translation Times by Source and Target Language')
fig.show()

# Translation Times by Source and Target Language for Each Model

This bar chart illustrates the translation times (in seconds) for various source and target language pairs across two different translation models: "HF" and "Gemini." The data is displayed in two separate facets, one for each model, allowing for a direct comparison of performance across languages within each model.

- **Source Language**: The original language of the text before translation.
- **Target Language**: The language into which the text is translated.
- **Translation Time (s)**: The time taken by each model to perform the translation.

The bars are grouped by target language within each source language category, providing a visual comparison of translation times between different language pairs for both the HF and Gemini models. This chart highlights the variations in translation speed depending on the language combination and the translation model used.


In [9]:
# Reestructuración de datos para facilitar la visualización
df_melted = df.melt(id_vars=['Source Language', 'Target Language'], 
                    value_vars=['HF Translation Time (s)', 'Gemini Translation Time (s)'],
                    var_name='Model', value_name='Translation Time (s)')

# Crear un gráfico de barras para cada par de idiomas
fig = px.bar(df_melted, x='Source Language', y='Translation Time (s)', color='Target Language',
             facet_col='Model', barmode='group',
             title='Translation Times by Source and Target Language for Each Model')
fig.show()

# Distribution of Translation Times by Text Type and Model

A box plot can effectively display the distribution of translation times across different text types for the two translation models, "HF" and "Gemini." This plot will highlight the median, quartiles, and potential outliers in the translation times, providing insights into the consistency and variability of each model's performance. By comparing these distributions, you can assess which model is more consistent across different text types and identify any significant differences in translation time.

In [14]:
# Create a box plot with target language in hover data
fig = px.box(df_melted, x='Source Language', y='Translation Time (s)', color='Model',
             title='Distribution of Translation Times by Source Language and Model',
             hover_data=['Target Language'])
fig.show()

# Correlation Between HF and Gemini Translation Times by Language Pair

A scatter plot can be used to explore the correlation between translation times for the HF and Gemini models across different language pairs. This type of plot will help identify if there are specific source-target language pairs that consistently require more time, potentially revealing linguistic complexities or inefficiencies in the models. The scatter plot can be color-coded by the source language and use different symbols for the target languages to enhance interpretability.

In [12]:
# Create a scatter plot to examine correlation between source and target language translation times
fig = px.scatter(df, x='HF Translation Time (s)', y='Gemini Translation Time (s)',
                 color='Source Language', symbol='Target Language',
                 title='Correlation Between HF and Gemini Translation Times by Language Pair')
fig.show()

# BLEU Score Calculation for Translation Performance

This snippet of code calculates the BLEU (Bilingual Evaluation Understudy) score, a common metric for evaluating the quality of machine-translated text. In this case, the BLEU score is used to compare the translations produced by the "HF" (Hugging Face) model against the "Gemini" model, which is treated as the reference translation.

## Explanation

1. **BLEU Score Calculation**:
   - The BLEU score is calculated by comparing the candidate translation (produced by the HF model) against a reference translation (produced by the Gemini model).
   - The `sentence_bleu` function from the `nltk.translate.bleu_score` module is used, which requires splitting the text into words (tokens) and then comparing these tokens between the candidate and reference translations.
   - A smoothing function (`SmoothingFunction().method1`) is applied to handle cases where the candidate translation may not have any n-grams in common with the reference.

2. **Function Definition**:
   - `calculate_bleu(row)`: This function takes a row from the dataframe, splits the "Gemini Translated Text" and "HF Translated Text" into word tokens, and calculates the BLEU score between them.

3. **Applying the Function**:
   - The `apply` method is used to apply the `calculate_bleu` function to each row of the dataframe, creating a new column `BLEU Score` that stores the result.

4. **Visualization**:
   - The script prints the `HF Translated Text`, `Gemini Translated Text`, and the corresponding `BLEU Score` for each row in the dataframe, allowing for a quick comparison of translation quality between the two models.

In [20]:
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
import pandas as pd

# Cargar los datos
df = pd.read_csv('../data/translation_performance_results.csv')

# Definir una función para calcular el BLEU score usando las traducciones de Gemini como referencia
def calculate_bleu(row):
    reference = [row['Gemini Translated Text'].split()]  # Traducción de referencia (Gemini)
    candidate = row['HF Translated Text'].split()        # Traducción candidata (Hugging Face)
    return sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)

# Aplicar la función a cada fila
df['BLEU Score'] = df.apply(calculate_bleu, axis=1)

# Visualizar los resultados con BLEU Score
print(df[['HF Translated Text', 'Gemini Translated Text', 'BLEU Score']])

                                     HF Translated Text  \
0                                                 Hola.   
1                                                 Hola.   
2                                                 Hola.   
3                                                 Hola.   
4                                                 Hola.   
...                                                 ...   
3145  Efficient translation can transform communicat...   
3146  Efficient translation can transform communicat...   
3147  Efficient translation can transform communicat...   
3148  Efficient translation can transform communicat...   
3149  Efficient translation can transform communicat...   

                                 Gemini Translated Text  BLEU Score  
0                                                  Hola    0.000000  
1                                                  Hola    0.000000  
2                                                  Hola    0.000000  
3          

# Distribution and Comparison of BLEU Scores

This code snippet generates two key visualizations to analyze the BLEU scores calculated from the comparison of translations between the "HF" (Hugging Face) and "Gemini" models.

### 1. Histogram of BLEU Scores
- **Purpose**: The histogram visualizes the distribution of BLEU scores across all translation pairs.
- **Details**: 
  - The BLEU scores are binned into 20 intervals (`nbins=20`), providing a clear view of how the scores are spread out.
  - The histogram helps to identify the range of BLEU scores, showing whether most translations are close to the reference or if there's significant variability in the quality of translations.

### 2. Box Plot of BLEU Scores by Text Type
- **Purpose**: The box plot compares the distribution of BLEU scores across different text types (e.g., word, sentence, paragraph).
- **Details**:
  - The x-axis represents different text types, while the y-axis shows the corresponding BLEU scores.
  - The box plot displays the median, quartiles, and any outliers in the BLEU scores for each text type, allowing for a comparison of translation quality between text types.
  - This visualization helps identify whether certain types of text are consistently harder to translate accurately for the HF model compared to the Gemini model.

In [21]:
# Histograma para visualizar la distribución de los BLEU Scores
fig = px.histogram(df, x='BLEU Score', nbins=20, title='Distribution of BLEU Scores (Hugging Face vs Gemini)')
fig.show()

# Gráfico de caja para comparar los BLEU Scores por tipo de texto
fig = px.box(df, x='Text Type', y='BLEU Score', title='BLEU Scores by Text Type')
fig.show()

# BLEU Score Distribution by Source Language

This code snippet calculates the BLEU score for each translation in the dataset, comparing the "HF" (Hugging Face) model translations against the "Gemini" model as the reference. The BLEU scores are then visualized in a histogram, with the distribution separated by source language.

### BLEU Score Calculation
- **Purpose**: The function `calculate_bleu` computes the BLEU score for each translation pair, using the "Gemini" translation as the reference and the "HF" translation as the candidate.
- **Details**: The BLEU score provides a numerical measure of how closely the HF translations match the Gemini translations, with smoothing applied to handle rare or zero n-gram overlaps.

### Histogram of BLEU Scores by Source Language
- **Purpose**: The histogram visualizes the distribution of BLEU scores across different source languages.
- **Details**:
  - The BLEU scores are binned into 20 intervals (`nbins=20`), and the bars are color-coded by source language.
  - This visualization highlights how translation quality, as measured by BLEU score, varies depending on the source language.
  - The histogram allows for an easy comparison of BLEU score distributions across different languages, identifying which languages may present more challenges in achieving high-quality translations.


In [23]:
# Definir una función para calcular el BLEU score usando las traducciones de Gemini como referencia
def calculate_bleu(row):
    reference = [row['Gemini Translated Text'].split()]  # Traducción de referencia (Gemini)
    candidate = row['HF Translated Text'].split()        # Traducción candidata (Hugging Face)
    return sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)

# Aplicar la función a cada fila
df['BLEU Score'] = df.apply(calculate_bleu, axis=1)

# Histograma para visualizar la distribución de los BLEU Scores separados por el idioma de origen
fig = px.histogram(df, x='BLEU Score', nbins=20, color='Source Language',
                   title='Distribution of BLEU Scores by Source Language (Hugging Face vs Gemini)')
fig.show()

# Comparison of BLEU Scores by Text Type and Source Language

This code snippet generates a box plot to compare the BLEU scores across different text types and source languages. The plot provides insights into how translation quality, as measured by BLEU score, varies depending on the type of text and the language of the original text.

### Box Plot of BLEU Scores
- **Purpose**: The box plot visually compares the distribution of BLEU scores for different text types (e.g., word, sentence, paragraph) and separates these comparisons by source language.
- **Details**:
  - The x-axis represents the text type, while the y-axis shows the corresponding BLEU scores.
  - The boxes are color-coded by source language, allowing for a clear comparison of translation quality across languages for each text type.
  - The box plot displays the median, quartiles, and any outliers in BLEU scores, highlighting variations in translation accuracy based on both the text type and the language of origin.
  - This visualization helps identify which combinations of text type and source language result in more accurate or less accurate translations according to the BLEU metric.


In [27]:
# Gráfico de caja para comparar los BLEU Scores por tipo de texto y por idioma
fig = px.box(df, x='Text Type', y='BLEU Score', color='Source Language',
             title='BLEU Scores by Text Type and Source Language')
fig.show()

# Heatmap of Average BLEU Scores by Language Pair

This code snippet creates a heatmap to visualize the average BLEU scores for each source-target language pair. The heatmap provides a clear, comparative view of translation quality across different language pairs.

### Grouping and Averaging BLEU Scores
- **Purpose**: The data is first grouped by source and target languages, and the mean BLEU score is calculated for each language pair.
- **Details**:
  - The `groupby` function is used to aggregate BLEU scores by language pairs, providing the average BLEU score for each combination of source and target language.
  - This aggregation allows for a high-level comparison of translation accuracy across different language pairs.

### Heatmap of Average BLEU Scores
- **Purpose**: The heatmap visually represents the average BLEU scores across all language pairs, making it easy to identify which pairs yield higher or lower translation quality.
- **Details**:
  - The x-axis represents the source language, and the y-axis represents the target language.
  - The color intensity in each cell corresponds to the average BLEU score for that particular language pair.
  - This visualization helps in identifying which language pairs are more challenging to translate accurately and which pairs yield better translation performance, as indicated by higher BLEU scores.

In [25]:
# Agrupar los datos por idioma de origen y destino y calcular la media de los BLEU scores
df_heatmap = df.groupby(['Source Language', 'Target Language'])['BLEU Score'].mean().reset_index()

# Crear un heatmap de los BLEU scores promedio por par de idiomas
fig = px.density_heatmap(df_heatmap, x='Source Language', y='Target Language', z='BLEU Score',
                         title='Average BLEU Score by Language Pair')
fig.show()

# Linear Regression Analysis on Translation Time

These two code snippets perform linear regression analysis to predict translation times for two different translation models: "HF" (Hugging Face) and "Gemini." The aim is to evaluate how well certain features (source language, target language, and text type) can predict the time it takes for these models to translate text.

### Data Preparation and Model Training

1. **Feature Selection**:
   - The features selected for prediction are `Source Language`, `Target Language`, and `Text Type`. These are categorical variables, so they are converted into a numerical format using one-hot encoding (`pd.get_dummies`), which creates binary columns for each category.

2. **Target Variable**:
   - In the first snippet, the target variable (`y`) is `HF Translation Time (s)`.
   - In the second snippet, the target variable is `Gemini Translation Time (s)`.

3. **Data Splitting**:
   - The data is split into training and test sets using an 80/20 split (`train_test_split`). The model is trained on 80% of the data and tested on the remaining 20%.

4. **Model Training**:
   - A linear regression model is created using `LinearRegression()` from Scikit-learn.
   - The model is then trained (`model.fit`) on the training data.

### Prediction and Evaluation

- **Prediction**:
  - After training, the model makes predictions on the test set (`model.predict`).

- **Evaluation**:
  - The performance of the model is evaluated using two metrics:
    - **Mean Squared Error (MSE)**: This metric indicates the average squared difference between the predicted and actual values, with lower values indicating better performance.
    - **R-Squared (R²)**: This metric indicates how well the features explain the variance in the target variable, with values closer to 1 indicating a better fit.

### Results Interpretation

- The results, including MSE and R² values, are printed for each model:
  - These metrics provide insight into the accuracy and explanatory power of the model when predicting translation times based on the given features.
  - By comparing the results for "HF" and "Gemini," one can determine which model’s translation time is more predictable based on language pairs and text type.

These analyses help in understanding the factors that influence translation time and how well they can be modeled using linear regression.

### 1.- HF Translation Time (s)

In [30]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import pandas as pd

# Preparar los datos
X = df[['Source Language', 'Target Language', 'Text Type']]  # Asegúrate de que estas columnas existen
y = df['HF Translation Time (s)']  # Puedes cambiar esto a 'Gemini Translation Time (s)' si lo prefieres

# Codificación One-hot para variables categóricas
X = pd.get_dummies(X, drop_first=True)

# Dividir los datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Crear y entrenar el modelo de regresión lineal
model = LinearRegression()
model.fit(X_train, y_train)

# Realizar predicciones
y_pred = model.predict(X_test)

# Calcular métricas de evaluación
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'MSE: {mse}')
print(f'R^2: {r2}')

MSE: 5.863845226780196
R^2: 0.11528634557748074


### 2.- Gemini Translation Time (s)

In [31]:
# Preparar los datos
X = df[['Source Language', 'Target Language', 'Text Type']] 
y = df['Gemini Translation Time (s)']

# Codificación One-hot para variables categóricas
X = pd.get_dummies(X, drop_first=True)

# Dividir los datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Crear y entrenar el modelo de regresión lineal
model = LinearRegression()
model.fit(X_train, y_train)

# Realizar predicciones
y_pred = model.predict(X_test)

# Calcular métricas de evaluación
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'MSE: {mse}')
print(f'R^2: {r2}')

MSE: 0.02155730273336661
R^2: 0.3850547985153001


# Interpretation of Linear Regression Results for Translation Time Prediction

The linear regression models were used to predict the translation times for two different models, "HF" (Hugging Face) and "Gemini," based on features such as source language, target language, and text type. The performance of these models is evaluated using Mean Squared Error (MSE) and R-squared (R²) metrics.

### HF Model Results
- **MSE: 5.8638**
  - This relatively high MSE indicates that there is a significant average squared difference between the predicted and actual translation times for the HF model.
  - A higher MSE suggests that the model's predictions are not very accurate, with large deviations from the true translation times.

- **R²: 0.1153**
  - The low R² value indicates that only about 11.53% of the variance in translation times is explained by the source language, target language, and text type.
  - This suggests that the selected features have limited predictive power for the HF model, implying that other factors not captured by the model might significantly influence translation time.

### Gemini Model Results
- **MSE: 0.0216**
  - The much lower MSE for the Gemini model indicates that the predictions are closer to the actual translation times, with smaller deviations.
  - This suggests that the linear regression model is better at predicting translation times for the Gemini model compared to the HF model.

- **R²: 0.3851**
  - The R² value of 0.3851 indicates that about 38.51% of the variance in Gemini translation times is explained by the selected features.
  - Although still not a perfect fit, this R² is significantly higher than that of the HF model, suggesting that the source language, target language, and text type have a stronger influence on the translation times for the Gemini model.

### Overall Interpretation
- The Gemini model's translation times are more predictable based on the features used in the linear regression, as indicated by its lower MSE and higher R² values compared to the HF model.
- The HF model shows a higher degree of variability and lower predictability in translation times, suggesting that factors beyond source language, target language, and text type play a more substantial role in determining how long it takes to translate text using this model.
- These results highlight the importance of model-specific factors and suggest that further investigation is needed to identify additional features that could improve prediction accuracy for the HF model.

# Decision Tree Regression Analysis for Translation Time Prediction

These two code snippets utilize decision tree regression models to predict the translation times for the "HF" (Hugging Face) and "Gemini" translation models. The objective is to assess how well decision trees can predict translation times based on features like source language, target language, and text type.

### Data Preparation and Model Training

1. **Feature Selection**:
   - The features used for prediction are `Source Language`, `Target Language`, and `Text Type`. These categorical variables are converted into numerical format through one-hot encoding (`pd.get_dummies`).

2. **Target Variable**:
   - In the first snippet, the target variable is `HF Translation Time (s)`.
   - In the second snippet, the target variable is `Gemini Translation Time (s)`.

3. **Data Splitting**:
   - The dataset is split into training and test sets using an 80/20 split (`train_test_split`), ensuring that the model is trained on a majority of the data and tested on the remainder.

4. **Model Training**:
   - A decision tree regression model is created using `DecisionTreeRegressor` with a fixed random state for reproducibility.
   - The model is trained on the training data (`model.fit`).

### Prediction and Evaluation

- **Prediction**:
  - The trained model is used to predict translation times on the test set (`model.predict`).

- **Evaluation**:
  - The model's performance is evaluated using the following metrics:
    - **Mean Squared Error (MSE)**: Indicates the average squared difference between the predicted and actual translation times. Lower MSE values signify better model accuracy.
    - **R-Squared (R²)**: Measures the proportion of variance in the translation times that is explained by the features. Higher R² values indicate better model performance.

### Results Interpretation

#### HF Model Results:
- **MSE** and **R²** will provide insights into how well the decision tree model predicts translation times for the HF model.
- If the MSE is high and the R² is low, it suggests that the decision tree model struggles to accurately predict translation times for the HF model, potentially due to the complexity or variability in the data that the model cannot capture effectively.

#### Gemini Model Results:
- Similarly, the MSE and R² for the Gemini model will reveal the effectiveness of the decision tree in predicting its translation times.
- A lower MSE and higher R² compared to the HF model would suggest that the decision tree model is better suited for predicting translation times for the Gemini model, potentially due to more consistent or predictable patterns in the translation process.

### Overall Analysis:
- By comparing the MSE and R² values for both models, we can determine which translation model's time predictions are better captured by decision trees.
- Decision trees are typically good at handling non-linear relationships, so the results may vary depending on how the features interact with translation time. Lower performance may indicate that other modeling approaches or additional features are necessary to improve prediction accuracy.

### 1.- HF Translation Time (s)

In [32]:
from sklearn.tree import DecisionTreeRegressor

# Preparar los datos
X = df[['Source Language', 'Target Language', 'Text Type']] 
y = df['HF Translation Time (s)']  # Cambia a 'Gemini Translation Time (s)' si prefieres

# Codificación One-hot para variables categóricas
X = pd.get_dummies(X, drop_first=True)

# Dividir los datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Crear y entrenar el modelo de árbol de decisión
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# Realizar predicciones
y_pred = model.predict(X_test)

# Calcular métricas de evaluación
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'MSE: {mse}')
print(f'R^2: {r2}')

MSE: 3.682747776644598
R^2: 0.44436165727711463


### 2.- Gemini Translation Time (s)

In [33]:
from sklearn.tree import DecisionTreeRegressor

# Preparar los datos
X = df[['Source Language', 'Target Language', 'Text Type']] 
y = df['Gemini Translation Time (s)']  # Cambia a 'Gemini Translation Time (s)' si prefieres

# Codificación One-hot para variables categóricas
X = pd.get_dummies(X, drop_first=True)

# Dividir los datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Crear y entrenar el modelo de árbol de decisión
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# Realizar predicciones
y_pred = model.predict(X_test)

# Calcular métricas de evaluación
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'MSE: {mse}')
print(f'R^2: {r2}')

MSE: 0.02128063850996236
R^2: 0.39294694247733675


# Interpretation of Decision Tree Regression Results for Translation Time Prediction

The decision tree regression models were used to predict the translation times for two translation models, "HF" (Hugging Face) and "Gemini," based on features like source language, target language, and text type. The performance of these models is evaluated using Mean Squared Error (MSE) and R-squared (R²) metrics.

### HF Model Results
- **MSE: 3.6827**
  - This MSE value indicates the average squared difference between the predicted and actual translation times for the HF model. While lower than previous results with linear regression, the MSE suggests there is still some deviation between predicted and actual times, though the model performs reasonably well.
  
- **R²: 0.4444**
  - The R² value indicates that approximately 44.44% of the variance in HF translation times is explained by the decision tree model. This suggests that the decision tree captures some significant patterns in the data, though more than half of the variance remains unexplained, indicating room for improvement.

### Gemini Model Results
- **MSE: 0.0213**
  - The MSE for the Gemini model is much lower, indicating that the decision tree model predictions are very close to the actual translation times. This reflects good predictive accuracy for the Gemini model.
  
- **R²: 0.3929**
  - The R² value for the Gemini model is 39.29%, slightly lower than that of the HF model. This suggests that, while the decision tree model is quite accurate in terms of MSE, it doesn't explain as much of the variance in translation times for the Gemini model as it does for the HF model. This could indicate that other factors not captured in the model might play a significant role in translation time variability for the Gemini model.

### Overall Interpretation
- **HF Model**: The decision tree model provides a relatively good fit for the HF translation times, explaining a moderate portion of the variance. However, the MSE indicates that there is still significant room for improvement, possibly by incorporating additional features or using more complex models.

- **Gemini Model**: The decision tree model performs exceptionally well in terms of MSE, suggesting it can predict translation times with high accuracy. However, the slightly lower R² value compared to the HF model indicates that while the predictions are accurate, the model might not fully capture all the factors that influence translation time for the Gemini model.

- **Comparison**: Both models show that decision trees are effective at capturing translation time patterns, but the HF model has a slightly better R² score, indicating it benefits more from the model's structure. The Gemini model, despite having a lower R², shows very precise predictions as reflected in the low MSE.

# Ridge and Lasso Regression for Translation Time Prediction

These two code snippets perform Ridge and Lasso regression analyses to predict translation times for the "HF" (Hugging Face) and "Gemini" translation models. The goal is to identify the best regularization parameters (alpha) for each model and assess how well these regularized models can predict translation times based on features like source language, target language, and text type.

### Data Preparation and Model Training

1. **Feature Selection**:
   - The features used for prediction are `Source Language`, `Target Language`, and `Text Type`. These categorical variables are converted into numerical format using one-hot encoding (`pd.get_dummies`).

2. **Target Variable**:
   - In the first snippet, the target variable is `HF Translation Time (s)`.
   - In the second snippet, the target variable is `Gemini Translation Time (s)`.

3. **Data Splitting**:
   - The dataset is split into training and test sets using an 80/20 split (`train_test_split`). The models are trained on the training set and tested on the remaining 20%.

4. **Model Definition**:
   - Both Ridge and Lasso regression models are defined:
     - **Ridge Regression** adds an L2 penalty to the loss function, which helps prevent overfitting by shrinking the coefficients of less important features.
     - **Lasso Regression** adds an L1 penalty, which can lead to sparse models by driving some coefficients to zero, effectively performing feature selection.

5. **Hyperparameter Tuning**:
   - The `GridSearchCV` method is used to perform cross-validated searches over specified alpha values for both Ridge and Lasso regressions. The goal is to find the best alpha that maximizes the R² score.

### Prediction and Evaluation

- **Prediction**:
  - After the best hyperparameters are found, the models make predictions on the test set (`model.predict`).

- **Evaluation**:
  - The performance of the models is evaluated using the following metrics:
    - **Mean Squared Error (MSE)**: Indicates the average squared difference between the predicted and actual translation times. Lower MSE values signify better model accuracy.
    - **R-Squared (R²)**: Measures the proportion of variance in the translation times that is explained by the features. Higher R² values indicate better model performance.

- **Best Parameters**:
  - The best-performing alpha values for both Ridge and Lasso are identified and printed, indicating the optimal level of regularization for each model.

### Results Interpretation

- **HF Model**: The results (MSE and R²) for Ridge and Lasso regression will provide insights into how well these regularized models predict translation times for the HF model, and whether Ridge or Lasso is more effective for this particular dataset.

- **Gemini Model**: Similarly, the performance metrics for the Gemini model will reveal the effectiveness of Ridge and Lasso regression in predicting translation times, and the best regularization strategy for this model.

### Overall Analysis:
- By comparing the MSE and R² values for both models and regression types, we can determine the most effective regularization approach and its impact on the predictive accuracy for translation times.


### 1.- HF Translation Time (s)

In [34]:
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import GridSearchCV

# Preparar los datos
X = df[['Source Language', 'Target Language', 'Text Type']] 
y = df['HF Translation Time (s)']  # Target variable set to HF model

# Codificación One-hot para variables categóricas
X = pd.get_dummies(X, drop_first=True)

# Dividir los datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Definir los modelos de Ridge y Lasso
ridge = Ridge()
lasso = Lasso()

# Definir los hiperparámetros para buscar
param_grid_ridge = {'alpha': [0.1, 1.0, 10.0, 100.0]}
param_grid_lasso = {'alpha': [0.01, 0.1, 1.0, 10.0]}

# Búsqueda de hiperparámetros para Ridge
grid_ridge = GridSearchCV(estimator=ridge, param_grid=param_grid_ridge, cv=5, scoring='r2')
grid_ridge.fit(X_train, y_train)

# Búsqueda de hiperparámetros para Lasso
grid_lasso = GridSearchCV(estimator=lasso, param_grid=param_grid_lasso, cv=5, scoring='r2')
grid_lasso.fit(X_train, y_train)

# Mejor modelo y sus métricas para Ridge
best_ridge = grid_ridge.best_estimator_
y_pred_ridge = best_ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
r2_ridge = r2_score(y_test, y_pred_ridge)
print(f'Ridge MSE: {mse_ridge}')
print(f'Ridge R^2: {r2_ridge}')
print(f'Best Ridge Parameters: {grid_ridge.best_params_}')

# Mejor modelo y sus métricas para Lasso
best_lasso = grid_lasso.best_estimator_
y_pred_lasso = best_lasso.predict(X_test)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
r2_lasso = r2_score(y_test, y_pred_lasso)
print(f'Lasso MSE: {mse_lasso}')
print(f'Lasso R^2: {r2_lasso}')
print(f'Best Lasso Parameters: {grid_lasso.best_params_}')

Ridge MSE: 5.881459998580357
Ridge R^2: 0.1126286988408366
Best Ridge Parameters: {'alpha': 10.0}
Lasso MSE: 5.889653762512118
Lasso R^2: 0.11139245631546468
Best Lasso Parameters: {'alpha': 0.01}


### 2.- Gemini Translation Time (s)

In [35]:
# Preparar los datos
X = df[['Source Language', 'Target Language', 'Text Type']] 
y = df['Gemini Translation Time (s)']  # Target variable set to Gemini model

# Codificación One-hot para variables categóricas
X = pd.get_dummies(X, drop_first=True)

# Dividir los datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Definir los modelos de Ridge y Lasso
ridge = Ridge()
lasso = Lasso()

# Definir los hiperparámetros para buscar
param_grid_ridge = {'alpha': [0.1, 1.0, 10.0, 100.0]}
param_grid_lasso = {'alpha': [0.01, 0.1, 1.0, 10.0]}

# Búsqueda de hiperparámetros para Ridge
grid_ridge = GridSearchCV(estimator=ridge, param_grid=param_grid_ridge, cv=5, scoring='r2')
grid_ridge.fit(X_train, y_train)

# Búsqueda de hiperparámetros para Lasso
grid_lasso = GridSearchCV(estimator=lasso, param_grid=param_grid_lasso, cv=5, scoring='r2')
grid_lasso.fit(X_train, y_train)

# Mejor modelo y sus métricas para Ridge
best_ridge = grid_ridge.best_estimator_
y_pred_ridge = best_ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
r2_ridge = r2_score(y_test, y_pred_ridge)
print(f'Ridge MSE: {mse_ridge}')
print(f'Ridge R^2: {r2_ridge}')
print(f'Best Ridge Parameters: {grid_ridge.best_params_}')

# Mejor modelo y sus métricas para Lasso
best_lasso = grid_lasso.best_estimator_
y_pred_lasso = best_lasso.predict(X_test)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
r2_lasso = r2_score(y_test, y_pred_lasso)
print(f'Lasso MSE: {mse_lasso}')
print(f'Lasso R^2: {r2_lasso}')
print(f'Best Lasso Parameters: {grid_lasso.best_params_}')

Ridge MSE: 0.02155880656542802
Ridge R^2: 0.38501190009143693
Best Ridge Parameters: {'alpha': 1.0}
Lasso MSE: 0.02520346969534925
Lasso R^2: 0.28104397189120767
Best Lasso Parameters: {'alpha': 0.01}


# Interpretation of Ridge and Lasso Regression Results for Translation Time Prediction

The Ridge and Lasso regression models were applied to predict translation times for the "HF" (Hugging Face) and "Gemini" models. Below are the key findings from the model evaluation based on the Mean Squared Error (MSE) and R-squared (R²) metrics, as well as the optimal regularization parameters.

### HF Model Results

- **Ridge Regression**:
  - **MSE: 5.8815**: The MSE for Ridge regression is quite high, indicating that the model's predictions deviate significantly from the actual translation times.
  - **R²: 0.1126**: The R² value is very low, suggesting that only about 11.26% of the variance in the HF translation times is explained by the features in the model. This implies that the Ridge regression model with an alpha of 10.0 has limited effectiveness in predicting translation times for the HF model.

- **Lasso Regression**:
  - **MSE: 5.8897**: The MSE is slightly higher than that of Ridge regression, indicating similar predictive performance but with slightly worse accuracy.
  - **R²: 0.1114**: The R² value is almost identical to that of Ridge regression, suggesting that Lasso regression also struggles to capture the variance in translation times effectively. The best alpha for Lasso is 0.01, but it doesn't provide a significant improvement over Ridge.

### Gemini Model Results

- **Ridge Regression**:
  - **MSE: 0.0216**: The MSE is much lower compared to the HF model, indicating that the Ridge regression model is able to predict Gemini translation times with relatively high accuracy.
  - **R²: 0.3850**: The R² value indicates that about 38.50% of the variance in the Gemini translation times is explained by the features. This suggests that Ridge regression with an alpha of 1.0 is more effective for the Gemini model than for the HF model.

- **Lasso Regression**:
  - **MSE: 0.0252**: The MSE is higher than that of Ridge regression, indicating that Lasso regression is less accurate in predicting Gemini translation times.
  - **R²: 0.2810**: The R² value is significantly lower than that of Ridge regression, suggesting that Lasso regression explains only 28.10% of the variance in Gemini translation times. Despite the best alpha being 0.01, Lasso regression does not perform as well as Ridge for the Gemini model.

### Overall Analysis

- **HF Model**: Both Ridge and Lasso regression models show limited effectiveness in predicting HF translation times, with low R² values and high MSEs. Ridge regression slightly outperforms Lasso, but neither model captures much of the variance in the data, indicating that the HF model's translation times are influenced by factors not captured by these features.

- **Gemini Model**: Ridge regression performs significantly better than Lasso for the Gemini model, as indicated by lower MSE and higher R² values. Ridge regression with an alpha of 1.0 is the better choice for predicting Gemini translation times, while Lasso regression's performance is notably weaker.

- **Comparison**: The Gemini model's translation times are more predictable with the given features compared to the HF model. Ridge regression is more suitable for both models, but it is particularly more effective for the Gemini model, suggesting that the data's structure aligns better with the assumptions of Ridge regression.

# Random Forest Regression for Translation Time Prediction

These two code snippets perform Random Forest regression analysis to predict translation times for the "HF" (Hugging Face) and "Gemini" translation models. The goal is to find the optimal combination of hyperparameters for the Random Forest model and evaluate its performance in predicting translation times based on features like source language, target language, and text type.

### Data Preparation and Model Training

1. **Feature Selection**:
   - The features used for prediction are `Source Language`, `Target Language`, and `Text Type`. These categorical variables are converted into numerical format through one-hot encoding (`pd.get_dummies`).

2. **Target Variable**:
   - In the first snippet, the target variable is `HF Translation Time (s)`.
   - In the second snippet, the target variable is `Gemini Translation Time (s)`.

3. **Data Splitting**:
   - The dataset is split into training and test sets using an 80/20 split (`train_test_split`). The model is trained on the training set and evaluated on the test set.

### Hyperparameter Tuning and Model Training

- **Hyperparameter Tuning**:
  - The Random Forest model is tuned using `GridSearchCV`, which performs a cross-validated search over a grid of hyperparameters:
    - **n_estimators**: Number of trees in the forest (e.g., 100, 200, 500).
    - **max_depth**: Maximum depth of each tree (e.g., None, 10, 20).
    - **min_samples_split**: Minimum number of samples required to split an internal node (e.g., 2, 5, 10).
    - **min_samples_leaf**: Minimum number of samples required to be at a leaf node (e.g., 1, 2, 4).
    - **max_features**: Number of features to consider when looking for the best split (e.g., 'sqrt', 'log2').
  - The best combination of hyperparameters is identified based on the R² score.

- **Model Training**:
  - The Random Forest model is trained using the best combination of hyperparameters determined from the grid search.

### Prediction and Evaluation

- **Prediction**:
  - The model makes predictions on the test set using the best-performing Random Forest model (`best_rf.predict`).

- **Evaluation**:
  - The model's performance is evaluated using the following metrics:
    - **Mean Squared Error (MSE)**: Indicates the average squared difference between the predicted and actual translation times. Lower MSE values signify better model accuracy.
    - **R-Squared (R²)**: Measures the proportion of variance in the translation times that is explained by the features. Higher R² values indicate better model performance.

- **Best Parameters**:
  - The best hyperparameters for the Random Forest model are printed, providing insights into the optimal configuration for predicting translation times.

### Results Interpretation

- **HF Model**: The MSE and R² values for the HF model will indicate how well the Random Forest model, with the best-selected hyperparameters, predicts translation times.

- **Gemini Model**: Similarly, the performance metrics for the Gemini model will reveal the effectiveness of the Random Forest model in predicting its translation times.

### Overall Analysis:
- By comparing the MSE and R² values for both models, we can determine the most effective hyperparameter configuration and assess the predictive accuracy of the Random Forest model for translation times.

### 1.- HF Translation Time (s)

In [36]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score

# Preparar los datos
X = df[['Source Language', 'Target Language', 'Text Type']] 
y = df['HF Translation Time (s)']  # Target variable set to HF model

# Codificación One-hot para variables categóricas
X = pd.get_dummies(X, drop_first=True)

# Dividir los datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Configurar la búsqueda de hiperparámetros para Random Forest
param_grid = {
    'n_estimators': [100, 200, 500],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2']
}

grid_search_rf = GridSearchCV(estimator=RandomForestRegressor(random_state=42),
                              param_grid=param_grid, cv=5, n_jobs=-1, verbose=1, scoring='r2')

# Entrenar el modelo con la mejor combinación de hiperparámetros
grid_search_rf.fit(X_train, y_train)
best_rf = grid_search_rf.best_estimator_

# Realizar predicciones
y_pred_rf = best_rf.predict(X_test)

# Calcular métricas de evaluación
mse_rf = mean_squared_error(y_test, y_pred_rf)
r2_rf = r2_score(y_test, y_pred_rf)

print(f'Random Forest MSE: {mse_rf}')
print(f'Random Forest R^2: {r2_rf}')
print(f'Best Random Forest Parameters: {grid_search_rf.best_params_}')

Fitting 5 folds for each of 162 candidates, totalling 810 fits
Random Forest MSE: 3.9947326268905345
Random Forest R^2: 0.3972906234570245
Best Random Forest Parameters: {'max_depth': 10, 'max_features': 'sqrt', 'min_samples_leaf': 4, 'min_samples_split': 2, 'n_estimators': 100}


### 2.- Gemini Translation Time (s)

In [37]:
# Preparar los datos
X = df[['Source Language', 'Target Language', 'Text Type']] 
y = df['Gemini Translation Time (s)']  # Target variable set to Gemini model

# Codificación One-hot para variables categóricas
X = pd.get_dummies(X, drop_first=True)

# Dividir los datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Configurar la búsqueda de hiperparámetros para Random Forest
param_grid = {
    'n_estimators': [100, 200, 500],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2']
}

grid_search_rf = GridSearchCV(estimator=RandomForestRegressor(random_state=42),
                              param_grid=param_grid, cv=5, n_jobs=-1, verbose=1, scoring='r2')

# Entrenar el modelo con la mejor combinación de hiperparámetros
grid_search_rf.fit(X_train, y_train)
best_rf = grid_search_rf.best_estimator_

# Realizar predicciones
y_pred_rf = best_rf.predict(X_test)

# Calcular métricas de evaluación
mse_rf = mean_squared_error(y_test, y_pred_rf)
r2_rf = r2_score(y_test, y_pred_rf)

print(f'Random Forest MSE: {mse_rf}')
print(f'Random Forest R^2: {r2_rf}')
print(f'Best Random Forest Parameters: {grid_search_rf.best_params_}')

Fitting 5 folds for each of 162 candidates, totalling 810 fits
Random Forest MSE: 0.02116929977854491
Random Forest R^2: 0.39612299930927897
Best Random Forest Parameters: {'max_depth': 10, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 200}


# Interpretation of Random Forest Regression Results for Translation Time Prediction

The Random Forest regression models were applied to predict translation times for the "HF" (Hugging Face) and "Gemini" models. Below are the key findings based on the Mean Squared Error (MSE) and R-squared (R²) metrics, along with the optimal hyperparameters identified through the grid search.

### HF Model Results

- **MSE: 3.9947**:
  - The MSE indicates the average squared difference between the predicted and actual translation times. For the HF model, the MSE suggests that the predictions made by the Random Forest model are reasonably accurate but still exhibit some variance from the true values.

- **R²: 0.3973**:
  - The R² value shows that approximately 39.73% of the variance in HF translation times is explained by the features used in the model. While this is a significant improvement over the Ridge and Lasso regression models, it still suggests that a substantial portion of the variance remains unexplained, indicating that the model could benefit from additional features or complexity.

- **Best Parameters**:
  - The optimal configuration for the Random Forest model includes:
    - **max_depth: 10**
    - **max_features: 'sqrt'**
    - **min_samples_leaf: 4**
    - **min_samples_split: 2**
    - **n_estimators: 100**
  - These parameters indicate that a moderately deep Random Forest with 100 trees and a minimum of 4 samples per leaf node performs best for predicting HF translation times.

### Gemini Model Results

- **MSE: 0.0212**:
  - The MSE for the Gemini model is much lower, indicating that the Random Forest model predicts Gemini translation times with high accuracy. This low MSE reflects the model's ability to make predictions that closely match the actual values.

- **R²: 0.3961**:
  - The R² value indicates that about 39.61% of the variance in Gemini translation times is explained by the features. This is slightly lower than expected given the low MSE, suggesting that while the model is accurate, it might not fully capture all factors influencing translation times, similar to the HF model.

- **Best Parameters**:
  - The optimal configuration for the Gemini model includes:
    - **max_depth: 10**
    - **max_features: 'sqrt'**
    - **min_samples_leaf: 1**
    - **min_samples_split: 2**
    - **n_estimators: 200**
  - This configuration points to a Random Forest with a moderate depth and more trees (200), allowing it to capture more detailed patterns in the data with a high level of accuracy.

### Overall Analysis

- **HF Model**: The Random Forest model shows a significant improvement over previous models (Ridge and Lasso) for the HF model, with a much higher R² and lower MSE. However, it still leaves over 60% of the variance unexplained, suggesting that other features or more complex models may be needed to improve predictions further.

- **Gemini Model**: The Random Forest model performs exceptionally well with the Gemini model, achieving a very low MSE and a respectable R² score. The model effectively predicts Gemini translation times, though there is still room for improvement in capturing more of the variance.

- **Comparison**: Both models benefit from the Random Forest approach, with the Gemini model showing slightly better accuracy but a similar R² to the HF model. The optimal parameters differ slightly, with the Gemini model favoring a larger number of trees and smaller leaf sizes, indicating it may benefit from a more granular model structure.

These results suggest that Random Forest is a strong model choice for predicting translation times, especially when tuned to the specific characteristics of each translation model.


# Comprehensive Analysis and Conclusions on Translation Time Prediction

### Overview of Models and Methods
We applied several regression techniques—Linear Regression, Decision Trees, Ridge and Lasso, and Random Forest—to predict translation times for the "HF" (Hugging Face) and "Gemini" models. Each method brought different strengths and weaknesses to the task, and by comparing their performance, we can draw several important conclusions.

### Performance Summary

1. **Linear Regression**:
   - **HF Model**: The linear regression model performed poorly, with low R² values (around 0.11) and high MSE, indicating that the linear model could not capture the complexities of the translation time data for HF.
   - **Gemini Model**: While slightly better, the linear regression for Gemini also had low R² values (around 0.39), suggesting limited effectiveness in capturing the variance in translation times.

2. **Decision Trees**:
   - **HF Model**: Decision Trees showed improved performance over linear regression with an R² of approximately 0.44, but the MSE remained relatively high, indicating some predictive accuracy but still significant unexplained variance.
   - **Gemini Model**: The Decision Tree model also performed better for Gemini, with an R² of around 0.39. However, it still struggled to fully capture the complexities of translation time variance.

3. **Ridge and Lasso Regression**:
   - **HF Model**: Both Ridge and Lasso regression showed similar performance to linear regression, with low R² values (around 0.11) and high MSE, indicating that regularization did not significantly improve the linear model's predictive power for HF translation times.
   - **Gemini Model**: For Gemini, Ridge regression performed better than Lasso, with R² values approaching 0.39, suggesting that the slight regularization provided by Ridge was more effective than the sparsity enforced by Lasso.

4. **Random Forest**:
   - **HF Model**: Random Forest significantly outperformed the previous models for HF, achieving an R² of approximately 0.40 and a much lower MSE. This indicates that the Random Forest model is better suited to capturing the nonlinear relationships in the data.
   - **Gemini Model**: Random Forest also performed well for Gemini, with an R² of around 0.40 and the lowest MSE of all models tested. This suggests that Random Forest is highly effective for predicting translation times for the Gemini model.

### Key Conclusions

1. **Model Complexity and Nonlinearity**:
   - The performance improvements seen with Decision Trees and especially Random Forests indicate that translation time data for both HF and Gemini models exhibit nonlinear relationships that simple linear models cannot capture effectively. This highlights the importance of using models capable of handling complexity when predicting translation times.

2. **Feature Importance**:
   - Despite the improvements with more complex models, the R² values for all models suggest that the selected features (source language, target language, text type) explain only a portion of the variance in translation times. This indicates that other factors, possibly related to the specific content of the text or the underlying algorithms used by the translation models, play a significant role in determining translation time.

3. **Random Forest as the Preferred Model**:
   - Among all the models tested, Random Forest consistently provided the best performance for both HF and Gemini models, with the lowest MSE and the highest R² values. This suggests that for tasks involving translation time prediction, Random Forest is the preferred model, particularly when the goal is to achieve high accuracy with moderately complex data.

4. **HF vs. Gemini Models**:
   - The results indicate that while both models benefit from complex algorithms like Random Forest, the Gemini model’s translation times are slightly more predictable than those of the HF model, as evidenced by generally lower MSE values. However, the variance explained by the models is similar for both HF and Gemini, suggesting comparable challenges in predicting translation times across both models.

### Final Recommendations
- **Further Feature Engineering**: To improve prediction accuracy, it is recommended to explore additional features, such as text length, sentence complexity, or model-specific characteristics that could better capture the factors influencing translation times.
- **Model Selection**: For practical applications, Random Forest should be considered the model of choice for predicting translation times, especially when high accuracy is required. However, it may also be beneficial to experiment with other ensemble methods or hybrid models to further improve predictive performance.
- **Understanding Translation Models**: The insights gained suggest that a deeper understanding of the translation models themselves could be valuable. Incorporating features that capture model-specific behaviors or inefficiencies could enhance prediction accuracy.

In conclusion, while significant progress has been made using complex models like Random Forest, further research and feature development are needed to fully understand and predict translation times for both HF and Gemini models.
