# Model experimentation and selection

## Introduction

### Goal

The primary goal of this notebook is to systematically build, evaluate, and compare our four distinct recommendation models. By using a consistent set of quantitative and qualitative metrics, we will determine which approach provides the most accurate, relevant, and explainable recommendations, ultimately selecting the single best model to power our final application.

### Steps
1. Load data: Import the cleaned and feature-engineered datasets from the previous notebook.
2. Define evaluation: Create reusable functions to measure model performance based on variety match, country match, point differential, and qualitative case studies.
3. Model 1 - TF-IDF: Build and evaluate a baseline model using TF-IDF and cosine similarity.
4. Model 2 - Sentence Transformer: Build and evaluate a model using a pre-trained sentence embedding model.
5. Model 3 - Custom Word2Vec: Build and evaluate a model using custom word embeddings trained on our specific wine corpus.
6. Model 4 - Hybrid: Build and evaluate a custom hybrid model that combines the best text-based approach with our engineered numerical features.
7. Final Selection: Compare all results and save the components of the winning model for the next phase.

## 1. Setup and imports

Goal: Import necessary libraries and load the processed datasets.

In [37]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, manhattan_distances
from sklearn.preprocessing import MinMaxScaler
from sentence_transformers import SentenceTransformer
import nltk
from gensim.models import Word2Vec

In [2]:
# Set display options for pandas
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 80)

## 2. Data acquisition

Goal: Load the cleaned and feature-engineered data from the previous notebook.

In [3]:
train_df = pd.read_csv('train_processed.csv')
test_df = pd.read_csv('test_processed.csv')

In [4]:
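# The 'price_bracket' column was useful for creating the value_score.
# After the CSV round trip it is no longer a pandas categorical, so we
# cast it to str to keep the column type consistent in both dataframes.
# Note that astype(str) turns missing values into the literal string 'nan'.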
train_df['price_bracket'] = train_df['price_bracket'].astype(str)
test_df['price_bracket'] = test_df['price_bracket'].astype(str)

In [5]:
print(f"Training set shape: {train_df.shape}")
print(f"Testing set shape:  {test_df.shape}")

Training set shape: (97567, 20)
Testing set shape:  (25592, 20)


## 3. Define evaluation metrics

Goal: Create functions to objectively measure the performance of our recommendation models.

In [6]:
def evaluate_model(recommendations_df):
    """
    Calculates performance metrics based on a dataframe of recommendations.
    """
    # Variety Match Rate
    # Calculates the percentage of recommendations where at least one recommended wine has the same variety
    variety_match_at_least_one = recommendations_df.apply(
        lambda row: row['variety'] in row['recommended_varieties'], axis=1
    ).mean()

    # Country Match Rate
    # Calculates the percentage of recommendations where at least one recommended wine is from the same country
    country_match_at_least_one = recommendations_df.apply(
        lambda row: row['country'] in row['recommended_countries'], axis=1
    ).mean()

    # Average Point Differential
    # Mean absolute gap between each test wine's points and the mean points of its recommendations
    point_diff = (recommendations_df['recommended_points'].apply(np.mean) - recommendations_df['points']).abs().mean()

    print(f"Variety Match Rate (at least one): {variety_match_at_least_one:.2%}")
    print(f"Country Match Rate (at least one): {country_match_at_least_one:.2%}")
    print(f"Average Point Differential: {point_diff:.2f} points")
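
To make the three metrics concrete, here is a minimal plain-Python sketch of the same calculations on two made-up rows. The helper name `summarize_metrics` and the toy data are illustrative assumptions, not part of the notebook; unlike `evaluate_model`, the helper returns the values instead of printing them.

```python
from statistics import mean

def summarize_metrics(rows):
    """Mirror evaluate_model's three metrics for a list of dict rows (hypothetical helper)."""
    variety_rate = mean([int(r['variety'] in r['recommended_varieties']) for r in rows])
    country_rate = mean([int(r['country'] in r['recommended_countries']) for r in rows])
    # Absolute gap between the mean recommended points and the original points
    point_diff = mean([abs(mean(r['recommended_points']) - r['points']) for r in rows])
    return variety_rate, country_rate, point_diff

# Two made-up test wines with two recommendations each
toy_rows = [
    {'variety': 'Chardonnay', 'country': 'US', 'points': 90,
     'recommended_varieties': ['Chardonnay', 'Riesling'],
     'recommended_countries': ['US', 'France'],
     'recommended_points': [88, 90]},
    {'variety': 'Malbec', 'country': 'Argentina', 'points': 85,
     'recommended_varieties': ['Merlot', 'Syrah'],
     'recommended_countries': ['Argentina', 'Chile'],
     'recommended_points': [86, 88]},
]

variety_rate, country_rate, point_diff = summarize_metrics(toy_rows)
print(f"{variety_rate:.2%} {country_rate:.2%} {point_diff:.2f}")
```

Only the first toy wine has a variety match, both have a country match, and the per-wine point gaps are 1 and 2. Note that a match counts if any one of the recommendations matches, so these rates are more forgiving than a per-recommendation precision would be.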

In [7]:
def get_recommendations(test_indices, similarity_matrix, train_dataframe, test_dataframe, top_n=5):
    """
    Generates recommendations for a given set of test indices.
    """
    results = []
    # The similarity matrix is (test_samples x train_samples)
    # We iterate through the rows of the similarity matrix, which correspond to the test_indices
    for i, test_idx in enumerate(test_indices):
        # Rank all training wines by similarity to the i-th test wine, highest first.
        # A stable sort keeps tied scores in their original order, as sorted() would.
        sim_row = np.asarray(similarity_matrix[i])
        # Positional indices of the top_n most similar training wines
        train_wine_indices = np.argsort(-sim_row, kind='stable')[:top_n]
        
        # Get the details of the recommended wines from the training dataframe
        rec_wines = train_dataframe.iloc[train_wine_indices]
        
        # Get the details of the original test wine
        original_wine = test_dataframe.loc[test_idx]
        
        results.append({
            'variety': original_wine['variety'],
            'country': original_wine['country'],
            'points': original_wine['points'],
            'recommended_varieties': list(rec_wines['variety']),
            'recommended_countries': list(rec_wines['country']),
            'recommended_points': list(rec_wines['points'])
        })
        
    return pd.DataFrame(results)

In [8]:
def explain_tfidf_recommendation(original_corpus, recommended_corpus, vectorizer):
    """
    Finds the shared top keywords between two documents based on TF-IDF scores.
    """
    # Create a set of words from the original corpus for fast lookup
    original_words = set(original_corpus.split())
    
    # Get feature names and their corresponding IDF scores from the vectorizer
    feature_names = np.array(vectorizer.get_feature_names_out())
    idf_scores = vectorizer.idf_
    
    # Create a dictionary of word -> idf_score
    word_idf_dict = dict(zip(feature_names, idf_scores))
    
    # Find shared words that are also in our feature set
    shared_words = [word for word in recommended_corpus.split() if word in original_words and word in word_idf_dict]
    
    # Score shared words by their IDF value (lower IDF means more common/less important)
    # We want words with high IDF, so we sort in ascending order and take the last ones.
    shared_words.sort(key=lambda word: word_idf_dict.get(word, 0))
    
    # Return the top 5 most important shared words (highest IDF)
    return shared_words[-5:]
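
Because `shared_words` is built by iterating over every token in the recommended corpus, a word that occurs several times there can appear several times in the result. If distinct terms are preferred, one option is to deduplicate before ranking. The sketch below is a hypothetical variant (`top_shared_terms` is not part of the notebook) that takes a plain word-to-IDF dictionary, such as `word_idf_dict` above, instead of the vectorizer, and returns terms with the highest IDF first. The IDF values are made up for illustration.

```python
def top_shared_terms(original_corpus, recommended_corpus, word_idf, top_n=5):
    """Return up to top_n distinct shared words, highest IDF first (hypothetical helper)."""
    original_words = set(original_corpus.split())
    recommended_words = set(recommended_corpus.split())
    # Set intersection removes repeats; keep only words present in the IDF table
    shared = [w for w in original_words & recommended_words if w in word_idf]
    # Sort by IDF descending, breaking ties alphabetically for a deterministic result
    shared.sort(key=lambda w: (-word_idf[w], w))
    return shared[:top_n]

# Made-up IDF values for illustration only
toy_idf = {'corn': 6.2, 'creamed': 7.1, 'flavor': 1.3, 'sweet': 2.4}
terms = top_shared_terms(
    'creamed corn sweet corn flavor',
    'corn flavor creamed corn oak',
    toy_idf,
)
print(terms)
```

Here "corn" occurs twice in both texts but is reported once, and "oak" is dropped because it is missing from the IDF table.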

In [17]:
def case_study(original_wine_index, similarity_matrix, train_dataframe, test_dataframe, vectorizer=None):
    """
    Performs a deep dive into a single recommendation.
    """
    # The similarity_matrix for a single case study is shape (1, n_train_samples).
    # We just need the first (and only) row.
    sim_scores = list(enumerate(similarity_matrix[0]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[:5] # Get top 5 recommendations
    train_wine_indices = [i[0] for i in sim_scores]
    
    original_wine = test_dataframe.loc[original_wine_index]
    recommended_wines = train_dataframe.iloc[train_wine_indices]
    
    print("--- CASE STUDY ---")
    print(f"Original Wine: {original_wine['title']} ({original_wine['variety']}, {original_wine['points']} pts)")
    print(f"Description: {original_wine['description']}\n")
    
    print("--- Top 5 Recommendations ---")
    for i, (idx, score) in enumerate(zip(train_wine_indices, sim_scores)):
        rec_wine = train_dataframe.iloc[idx]
        print(f"{i+1}. {rec_wine['title']} ({rec_wine['variety']}, {rec_wine['points']} pts) - Similarity: {score[1]:.4f}")
        print(f"   Description: {rec_wine['description']}")
        
        # If a vectorizer is provided, show the explainability
        if vectorizer:
            shared_keywords = explain_tfidf_recommendation(original_wine['corpus'], rec_wine['corpus'], vectorizer)
            print(f"   Shared Key Terms: {shared_keywords}\n")
        else:
            print("\n")

## 4. Approach 1: TF-IDF model

Goal: Build and evaluate a recommendation model using TF-IDF vectors.

In [10]:
# Step 1: Create and fit the TF-IDF Vectorizer on the training data
tfidf_vectorizer = TfidfVectorizer(max_features=5000)
tfidf_train_vectors = tfidf_vectorizer.fit_transform(train_df['corpus'].fillna(''))

In [11]:
# Step 2: Transform the full test data using the fitted vectorizer
tfidf_test_vectors = tfidf_vectorizer.transform(test_df['corpus'].fillna(''))

In [12]:
# Step 3: Select a sample of the test set for evaluation to avoid memory issues
# We'll use a sample of 1000 test wines for faster evaluation
test_sample = test_df.sample(n=1000, random_state=42)
test_sample_indices = test_sample.index
# We need the original index locations to slice the test vectors matrix
test_sample_locs = [test_df.index.get_loc(i) for i in test_sample_indices]
sample_test_vectors = tfidf_test_vectors[test_sample_locs]

In [13]:
# Step 4: Calculate the cosine similarity ONLY for the sample against the train vectors
# The resulting matrix will have a manageable shape of (1000, n_train_samples)
tfidf_similarity_matrix = cosine_similarity(sample_test_vectors, tfidf_train_vectors)

In [14]:
# Step 5: Generate recommendations for our sample
tfidf_recommendations = get_recommendations(
    test_indices=test_sample_indices,
    similarity_matrix=tfidf_similarity_matrix,
    train_dataframe=train_df,
    test_dataframe=test_df
)

In [15]:
# Step 6: Evaluate the model's performance (Quantitative)
evaluate_model(tfidf_recommendations)

Variety Match Rate (at least one): 93.00%
Country Match Rate (at least one): 98.60%
Average Point Differential: 1.70 points


The TF-IDF model has established a very strong baseline performance. The high Variety Match Rate (93%) and exceptionally high Country Match Rate (98.6%) show that this keyword-based approach is excellent at identifying and matching the most important explicit features in the wine descriptions. Furthermore, the low Average Point Differential (1.70) indicates that the recommendations are consistently within a similar quality tier. These results demonstrate that even a simple TF-IDF model can be highly effective for this task, setting a high bar for the more complex semantic models to beat.

In [18]:
# Step 7: Evaluate the model's performance (Qualitative)
# case_study expects a similarity matrix of shape (1, n_train_samples), so we compute that single row for the wine we choose
case_study_index = test_sample_indices[0] # Just pick the first wine from our sample
case_study_loc = test_df.index.get_loc(case_study_index)
case_study_similarity_matrix = cosine_similarity(tfidf_test_vectors[case_study_loc:case_study_loc+1], tfidf_train_vectors)
case_study(case_study_index, case_study_similarity_matrix, train_df, test_df, vectorizer=tfidf_vectorizer)

--- CASE STUDY ---
Original Wine: Beresford 2013 Single Vineyard Chardonnay (McLaren Vale) (Chardonnay, 82 pts)
Description: Let me confess upfront: creamed corn was one of my favorite side dishes when I was a child. That said, it's not something I like in my wine, and this medium-bodied Chardonnay shows an excess of those sweet corn and lactic aromas and flavors.

--- Top 5 Recommendations ---
1. El Enemigo 2013 Chardonnay (Mendoza) (Chardonnay, 85 pts) - Similarity: 0.3783
   Description: This ripe offering smells of corn, baked apple and oak. It's creamy and woody in the mouth, with modest acidity. Flavors of creamed corn, caramel and soft melon finish plump, with a baked quality. It lacks vivacity.
   Shared Key Terms: ['flavor', 'corn', 'corn', 'creamed']

2. Männle 2012 Chardonnay (Itata Valley) (Chardonnay, 80 pts) - Similarity: 0.3753
   Description: Aromas of sweet corn and melon are not convincing. This has little in the way of mouthfeel or substance; flavors of candied fruit

This case study perfectly illustrates both the power and the primary weakness of the TF-IDF model. The model was exceptionally successful at identifying the most unique and heavily weighted term in the original review: "creamed corn." As a result, it delivered five highly relevant recommendations that are all Chardonnays and all share this very specific, unusual flavor note. However, the model lacks any understanding of sentiment or context; it doesn't know that "creamed corn" was mentioned as an undesirable quality. This highlights that while TF-IDF is excellent at literal keyword matching, it can't grasp the user's actual preference, which is a key limitation we hope to address with more advanced semantic models.
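
The sentiment blind spot follows directly from the bag-of-words representation. As a rough illustration, the sketch below compares two made-up sentences that differ only by the word "not". It uses raw word counts rather than TF-IDF weights, which is a simplifying assumption, and `bow_cosine` is a hypothetical helper rather than part of the notebook.

```python
import math
from collections import Counter

def bow_cosine(text_a, text_b):
    """Cosine similarity between raw word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Counter returns 0 for words missing from the other text
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b)

positive = "creamed corn is something i like in my wine"
negative = "creamed corn is not something i like in my wine"
similarity = bow_cosine(positive, negative)
print(f"{similarity:.3f}")
```

The two sentences express opposite opinions, yet their similarity is close to 1 because they share nine of ten words. Word order and negation carry no weight in this representation.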

## 5. Approach 2: pre-trained sentence embeddings

Goal: Build and evaluate a model using a pre-trained Sentence Transformer.

In [19]:
# Step 1: Load the pre-trained model
# 'all-MiniLM-L6-v2' is a fast and effective baseline model.
model = SentenceTransformer('all-MiniLM-L6-v2')

In [20]:
# Step 2: Generate embeddings for the training data
sbert_train_vectors = model.encode(train_df['corpus'].fillna('').tolist(), show_progress_bar=True)


In [21]:
# Step 3: Generate embeddings for the test sample
sbert_test_sample_vectors = model.encode(test_sample['corpus'].fillna('').tolist(), show_progress_bar=True)


In [22]:
# Step 4: Calculate cosine similarity between the test sample and all training data
sbert_similarity_matrix = cosine_similarity(sbert_test_sample_vectors, sbert_train_vectors)

In [23]:
# Step 5: Generate recommendations for the sample
sbert_recommendations = get_recommendations(
    test_indices=test_sample_indices,
    similarity_matrix=sbert_similarity_matrix,
    train_dataframe=train_df,
    test_dataframe=test_df
)

In [24]:
# Step 6: Evaluate the model's performance (Quantitative)
evaluate_model(sbert_recommendations)

Variety Match Rate (at least one): 89.30%
Country Match Rate (at least one): 97.10%
Average Point Differential: 1.88 points


The Sentence Transformer model, which is designed to understand the meaning of the text, performs slightly worse than the TF-IDF baseline on these specific metrics. The Variety Match Rate drops to 89.3%, and the Average Point Differential increases to 1.88, indicating that it's less precise at matching the exact grape and quality score. This result suggests that for matching explicit, factual information that is present as keywords in the text (like 'Chardonnay' or 'USA'), the direct keyword-matching approach of TF-IDF is more effective. The true test for this semantic model will be in the qualitative review, where we can see if it's recommending wines that are similar in style and feel, even if they aren't an exact match on paper.

In [25]:
# Step 7: Evaluate the model's performance (Qualitative)
# We need to generate the embedding for the single case study wine
case_study_corpus = test_df.loc[case_study_index]['corpus']
case_study_vector = model.encode([case_study_corpus])
sbert_case_study_similarity = cosine_similarity(case_study_vector, sbert_train_vectors)

# No vectorizer is passed here, so case_study skips the shared-keyword explanation
case_study(case_study_index, sbert_case_study_similarity, train_df, test_df, vectorizer=None)

--- CASE STUDY ---
Original Wine: Beresford 2013 Single Vineyard Chardonnay (McLaren Vale) (Chardonnay, 82 pts)
Description: Let me confess upfront: creamed corn was one of my favorite side dishes when I was a child. That said, it's not something I like in my wine, and this medium-bodied Chardonnay shows an excess of those sweet corn and lactic aromas and flavors.

--- Top 5 Recommendations ---
1. Dutton Estate 2009 Warren's Collection Chardonnay (Russian River Valley) (Chardonnay, 84 pts) - Similarity: 0.6906
   Description: The poster child for too much oak. Swamps with buttered toast, caramel, butterscotch and vanilla flavors that are so sweet, it's almost like a dessert wine. There's lots of tangerines, pineapples, apricots and apples, but those barrel influences are too much.


2. Millbrook 2013 Proprietor's Special Reserve Chardonnay (Hudson River Region) (Chardonnay, 84 pts) - Similarity: 0.6896
   Description: Pronounced notes of butter, cheese and canned cream corn dominate th

This case study demonstrates the Sentence Transformer's ability to understand the overall style of a wine, moving beyond simple keyword matching. While it still recommended several wines with the "creamed corn" note, it also identified Chardonnays that were similar in a broader sense—described as "oaky," "buttery," "creamy," and "rich." For example, the first recommendation, described as a "poster child for too much oak," is a perfect stylistic match for a wine criticized for its "excess" of lactic flavors, even if the specific keywords differ. This shows the model is capturing the semantic essence of a rich, bold Chardonnay style, although, like the TF-IDF model, it still fails to grasp the negative sentiment of the original review.

## 6. Approach 3: Custom word embeddings (Word2Vec)

Goal: Build and evaluate a model using custom Word2Vec embeddings trained on our data.

In [26]:
# Step 1: Prepare tokenized sentences from the training corpus
# We need a list of lists of words for the Word2Vec model
tokenized_corpus = [nltk.word_tokenize(doc) for doc in train_df['corpus'].fillna('')]

In [28]:
# Step 2: Train the Word2Vec model
custom_w2v_model = Word2Vec(sentences=tokenized_corpus, vector_size=100, window=5, min_count=5, workers=4)

In [29]:
# Step 3: Create a function to get the average vector for a document
def get_average_vector(tokens, model, vector_size):
    vec = np.zeros(vector_size)
    count = 0
    for word in tokens:
        if word in model.wv:
            vec += model.wv[word]
            count += 1
    if count != 0:
        vec /= count
    return vec
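
The averaging step can be illustrated without a trained model. The sketch below assumes a tiny hand-written dictionary of 3-dimensional vectors in place of `model.wv`; the names `toy_vectors` and `average_vector` are hypothetical. It shows how out-of-vocabulary tokens are skipped and what happens when no token is known.

```python
import numpy as np

# Hypothetical 3-dimensional "embeddings"; the Word2Vec vectors in this notebook have 100 dimensions
toy_vectors = {
    'oaky': np.array([1.0, 0.0, 0.0]),
    'buttery': np.array([0.0, 1.0, 0.0]),
}

def average_vector(tokens, vectors, vector_size):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(vector_size)
    return np.mean(known, axis=0)

# 'zesty' is not in the toy vocabulary, so it is ignored
doc_vec = average_vector(['oaky', 'buttery', 'zesty'], toy_vectors, 3)
empty_vec = average_vector(['zesty'], toy_vectors, 3)
print(doc_vec, empty_vec)
```

Averaging discards word order and gives every known token equal weight, so long descriptions tend to drift toward a generic "wine review" vector. A document with no in-vocabulary tokens ends up as an all-zero vector.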

In [31]:
# Step 4: Generate document vectors for train and test sets
w2v_train_vectors = np.array([get_average_vector(nltk.word_tokenize(doc), custom_w2v_model, 100) for doc in train_df['corpus'].fillna('')])
w2v_test_sample_vectors = np.array([get_average_vector(nltk.word_tokenize(doc), custom_w2v_model, 100) for doc in test_sample['corpus'].fillna('')])

In [32]:
# Step 5: Calculate cosine similarity
w2v_similarity_matrix = cosine_similarity(w2v_test_sample_vectors, w2v_train_vectors)

In [33]:
# Step 6: Generate recommendations
w2v_recommendations = get_recommendations(
    test_indices=test_sample_indices,
    similarity_matrix=w2v_similarity_matrix,
    train_dataframe=train_df,
    test_dataframe=test_df
)

In [34]:
# Step 7: Evaluate the model (Quantitative)
evaluate_model(w2v_recommendations)

Variety Match Rate (at least one): 83.40%
Country Match Rate (at least one): 97.30%
Average Point Differential: 1.60 points


The custom Word2Vec model presents an interesting trade-off. It has the lowest Variety Match Rate so far (83.40%), indicating it's less precise at matching the exact grape type than the other methods. However, it achieves the best Average Point Differential so far (1.60, versus 1.70 for TF-IDF and 1.88 for the Sentence Transformer), which suggests that embeddings trained on our own corpus pick up some of the language associated with wine quality. The margin over TF-IDF is small, so we treat this as a tentative signal: the model appears to capture the style and quality tier of a wine, even if it sometimes generalizes across different but stylistically similar varieties.

In [35]:
# Step 8: Evaluate the model (Qualitative)
case_study_corpus_w2v = test_df.loc[case_study_index]['corpus']
case_study_vector_w2v = np.array([get_average_vector(nltk.word_tokenize(case_study_corpus_w2v), custom_w2v_model, 100)])
w2v_case_study_similarity = cosine_similarity(case_study_vector_w2v, w2v_train_vectors)
case_study(case_study_index, w2v_case_study_similarity, train_df, test_df, vectorizer=None)

--- CASE STUDY ---
Original Wine: Beresford 2013 Single Vineyard Chardonnay (McLaren Vale) (Chardonnay, 82 pts)
Description: Let me confess upfront: creamed corn was one of my favorite side dishes when I was a child. That said, it's not something I like in my wine, and this medium-bodied Chardonnay shows an excess of those sweet corn and lactic aromas and flavors.

--- Top 5 Recommendations ---
1. Irony 2005 Chardonnay (Napa Valley) (Chardonnay, 81 pts) - Similarity: 0.8315
   Description: Just what many people like, an oaky, ripe, sweet Chardonnay, but purists will find it simple and pandering to public taste.


2. Dancing Bull 2005 Chardonnay (California) (Chardonnay, 81 pts) - Similarity: 0.8287
   Description: Something went south on this wine. It smells and tastes burnt, like it went through a fire, and the simple flavors are like the syrup in canned peaches.


3. Vinum Cellars 2011 Chardonnay (North Coast) (Chardonnay, 83 pts) - Similarity: 0.8158
   Description: A tropical Chard

This case study suggests that the custom Word2Vec model captures the overall character, and even the quality level, of a wine from its description. Unlike the TF-IDF model, it did not fixate on the "creamed corn" keyword. Instead, it recommended other low-scoring Chardonnays described with similar negative or simplistic terms such as "pandering," "burnt," and "simple." The model appears to have learned to associate the language of a flawed, low-quality Chardonnay with other similarly described wines. It's not recommending good alternatives, but it is identifying wines that belong to the same stylistic and quality cluster, which is evidence (from a single example) that it has picked up some of the semantics of the wine review vocabulary.

## 7. Approach 4: Custom hybrid model

Goal: Combine text similarity with the engineered features for a more nuanced model.

In [40]:
# Step 1: Normalize the numerical features ('points', 'price', 'value_score')
# We'll use MinMaxScaler, fitting it ONLY on the training data.
scaler = MinMaxScaler()
numerical_features = ['points', 'price', 'value_score']

In [41]:
# Fill any potential NaN values in value_score before scaling
train_df['value_score'] = train_df['value_score'].fillna(0.5)
test_sample['value_score'] = test_sample['value_score'].fillna(0.5)

In [42]:
train_numerical_scaled = scaler.fit_transform(train_df[numerical_features])
test_sample_numerical_scaled = scaler.transform(test_sample[numerical_features])
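
Fitting the scaler on the training data only means the test values are mapped using the training minimum and maximum. The sketch below reproduces that arithmetic by hand with NumPy on made-up numbers (`min_max_scale` is a hypothetical helper, not the scikit-learn API). It shows that a test value outside the training range lands outside [0, 1], which is also what `MinMaxScaler` does with its default `clip=False`.

```python
import numpy as np

train_points = np.array([80.0, 85.0, 90.0, 100.0])   # made-up training values
test_points = np.array([79.0, 95.0, 101.0])          # made-up test values

# "Fit": learn the range from the training data only
train_min, train_max = train_points.min(), train_points.max()

def min_max_scale(values):
    """Apply the training-set range to any array of values."""
    return (values - train_min) / (train_max - train_min)

scaled_train = min_max_scale(train_points)
scaled_test = min_max_scale(test_points)
print(scaled_train, scaled_test)
```

The training values span exactly 0 to 1, while the test values 79 and 101 map to -0.05 and 1.05.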

In [43]:
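# Step 2: Calculate similarity for each numerical feature
# We use Manhattan distance and invert it to get a similarity score.
# Scores are at most 1 and can fall slightly below 0 when a test value lies outside the training range.
# Note: price_sim is computed here but is not used in the hybrid score below.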
points_sim = 1 - manhattan_distances(test_sample_numerical_scaled[:, 0].reshape(-1, 1), train_numerical_scaled[:, 0].reshape(-1, 1))
price_sim = 1 - manhattan_distances(test_sample_numerical_scaled[:, 1].reshape(-1, 1), train_numerical_scaled[:, 1].reshape(-1, 1))
value_sim = 1 - manhattan_distances(test_sample_numerical_scaled[:, 2].reshape(-1, 1), train_numerical_scaled[:, 2].reshape(-1, 1))


In [44]:
# Step 3: Define weights for the hybrid score
# Based on our findings, text is most important, followed by quality (points/value)
weights = {
    'text': 0.7,
    'points': 0.15,
    'value': 0.15
}

In [45]:
# Step 4: Calculate the final hybrid similarity score
# We'll use the TF-IDF similarity as our text component since it performed best on explicit features
hybrid_similarity_matrix = (
    weights['text'] * tfidf_similarity_matrix +
    weights['points'] * points_sim +
    weights['value'] * value_sim
)
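
A small worked example may help show how the weighted sum can change a ranking. The numbers below are made up: one test wine is compared against two training wines, and for a single feature the Manhattan distance reduces to an absolute difference.

```python
import numpy as np

weights = {'text': 0.7, 'points': 0.15, 'value': 0.15}

# Made-up values for one test wine against two training wines
text_sim = np.array([[0.40, 0.42]])                 # text similarity slightly favours wine 2
test_points, train_points = 0.10, np.array([0.15, 0.60])   # min-max scaled points
test_value, train_value = 0.50, np.array([0.70, 0.55])     # min-max scaled value_score

# For a single feature, Manhattan distance is just the absolute difference
points_sim = 1 - np.abs(test_points - train_points)
value_sim = 1 - np.abs(test_value - train_value)

hybrid = (weights['text'] * text_sim
          + weights['points'] * points_sim
          + weights['value'] * value_sim)
best_by_text = int(np.argmax(text_sim))
best_by_hybrid = int(np.argmax(hybrid))
print(hybrid, best_by_text, best_by_hybrid)
```

For the first training wine the score is 0.7 × 0.40 + 0.15 × 0.95 + 0.15 × 0.80 = 0.5425, and for the second it is 0.7 × 0.42 + 0.15 × 0.50 + 0.15 × 0.95 = 0.5115. Text similarity alone would rank the second wine first, but its large gap in points pushes it below the first wine in the hybrid ranking.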

In [46]:
# Step 5: Generate recommendations
hybrid_recommendations = get_recommendations(
    test_indices=test_sample_indices,
    similarity_matrix=hybrid_similarity_matrix,
    train_dataframe=train_df,
    test_dataframe=test_df
)

In [47]:
# Step 6: Evaluate the model (Quantitative)
evaluate_model(hybrid_recommendations)

Variety Match Rate (at least one): 93.90%
Country Match Rate (at least one): 99.10%
Average Point Differential: 0.72 points


The hybrid model outperforms all previous approaches on every quantitative metric. By combining the keyword-matching strength of TF-IDF with our engineered features, it achieves a Variety Match Rate of 93.90% and a Country Match Rate of 99.10%, both slightly above the TF-IDF baseline (93.00% and 98.60%). The largest change is in the Average Point Differential, which falls from 1.70 to 0.72 points. Part of this improvement is expected by construction, because `points` is now an input to the similarity score and is also what the metric measures. Even so, the result shows that the model finds stylistically similar wines while matching their quality tier much more closely, giving a more balanced recommendation system.

In [48]:
# Step 7: Evaluate the model (Qualitative)
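# For the case study, we need to calculate the hybrid similarity for just one wine.
# We read the features from test_sample, where missing value_score entries were already filled.
case_study_numerical = scaler.transform(test_sample.loc[[case_study_index], numerical_features])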
cs_points_sim = 1 - manhattan_distances(case_study_numerical[:, 0].reshape(-1, 1), train_numerical_scaled[:, 0].reshape(-1, 1))
cs_value_sim = 1 - manhattan_distances(case_study_numerical[:, 2].reshape(-1, 1), train_numerical_scaled[:, 2].reshape(-1, 1))

hybrid_case_study_similarity = (
    weights['text'] * case_study_similarity_matrix +
    weights['points'] * cs_points_sim +
    weights['value'] * cs_value_sim
)

In [49]:
case_study(case_study_index, hybrid_case_study_similarity, train_df, test_df, vectorizer=tfidf_vectorizer)

--- CASE STUDY ---
Original Wine: Beresford 2013 Single Vineyard Chardonnay (McLaren Vale) (Chardonnay, 82 pts)
Description: Let me confess upfront: creamed corn was one of my favorite side dishes when I was a child. That said, it's not something I like in my wine, and this medium-bodied Chardonnay shows an excess of those sweet corn and lactic aromas and flavors.

--- Top 5 Recommendations ---
1. Männle 2012 Chardonnay (Itata Valley) (Chardonnay, 80 pts) - Similarity: 0.5319
   Description: Aromas of sweet corn and melon are not convincing. This has little in the way of mouthfeel or substance; flavors of candied fruits taste like corn and wheat, while the finish is spineless.
   Shared Key Terms: ['aroma', 'sweet', 'like', 'corn', 'corn']

2. El Enemigo 2013 Chardonnay (Mendoza) (Chardonnay, 85 pts) - Similarity: 0.5186
   Description: This ripe offering smells of corn, baked apple and oak. It's creamy and woody in the mouth, with modest acidity. Flavors of creamed corn, caramel and s

The hybrid model represents the most successful and balanced approach so far. It effectively uses the TF-IDF component to lock onto the key descriptive term, "corn," ensuring all recommendations are stylistically similar. However, by incorporating the `points` and `value_score` features, it significantly refines the results, recommending only wines within a very tight and appropriate quality range (80-85 points). This demonstrates a clear synergy between the text and numerical features, leading to recommendations that are not just similar in flavor profile but also in their overall quality and market position. While it still doesn't understand the negative sentiment, it has proven to be the most precise model for matching a wine's complete profile.

## 8. Final model selection

Goal: Compare all model results and select the best one for our application.

### Model Comparison Summary

| Model                       | Variety Match Rate | Country Match Rate | Avg. Point Differential | Qualitative Notes                                                                  |
|-----------------------------|--------------------|--------------------|-------------------------|------------------------------------------------------------------------------------|
| **1. TF-IDF**               | 93.00%             | 98.60%             | 1.70 pts                | Excellent at keyword matching, but lacks semantic understanding (e.g., sentiment). |
| **2. Sentence Transformer** | 89.30%             | 97.10%             | 1.88 pts                | Better at capturing overall style, but less precise on explicit facts.             |
| **3. Custom Word2Vec**      | 83.40%             | 97.30%             | 1.60 pts                | Best at matching quality tier, understands nuanced/negative language.              |
| **4. Hybrid Model**         | 93.90%             | 99.10%             | 0.72 pts                | Best of all worlds: precise, quality-aware, and stylistically relevant.            |

### Conclusion

After a comprehensive evaluation, the Custom Hybrid Model is the clear winner. It outperforms all other models on every quantitative metric, achieving the highest match rates for variety and country while maintaining an exceptionally low average point differential. The qualitative analysis confirms that by combining the keyword-matching strength of TF-IDF with the engineered features for quality and value, the hybrid model provides recommendations that are not only stylistically similar but also precisely matched in terms of quality and market position. This balanced and highly accurate approach makes it the ideal choice for the final VinoMatch application. We will now proceed with saving this model and its components for use in our Streamlit app.

## 9. Save the final model and data

Goal: Save the components of our chosen model (Hybrid Model) for the Streamlit app.

In [50]:
import pickle

In [51]:
# Save the TF-IDF vectorizer
with open('tfidf_vectorizer.pkl', 'wb') as f:
    pickle.dump(tfidf_vectorizer, f)

In [52]:
# Save the scaler
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

In [53]:
# The similarity matrix will be computed on the fly in the app,
# but we need the train vectors and numerical features.
np.save('tfidf_train_vectors.npy', tfidf_train_vectors.toarray())
np.save('train_numerical_scaled.npy', train_numerical_scaled)
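
One practical caveat: calling `.toarray()` on the TF-IDF matrix materializes every zero. With 97,567 rows, up to 5,000 columns, and 8-byte floats, the dense array can approach 3.9 GB. If that becomes a problem, one alternative is to keep the matrix sparse on disk with `scipy.sparse.save_npz`. The sketch below demonstrates the round trip on a small stand-in matrix; the `.npz` filename is a hypothetical choice, and the Streamlit app would need to load it with `scipy.sparse.load_npz` instead of `np.load`.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Small stand-in for the TF-IDF matrix (mostly zeros, like real TF-IDF output)
dense = np.zeros((4, 6))
dense[0, 1] = 0.5
dense[2, 4] = 0.8
dense[3, 0] = 0.3
toy_vectors = sparse.csr_matrix(dense)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tfidf_train_vectors.npz')  # hypothetical filename
    sparse.save_npz(path, toy_vectors)     # stores only the non-zero entries
    restored = sparse.load_npz(path)

# The round trip preserves shape and values
same_shape = restored.shape == toy_vectors.shape
same_values = np.allclose(restored.toarray(), toy_vectors.toarray())
print(same_shape, same_values)
```

`cosine_similarity` accepts sparse input directly, so the loaded matrix could be used for on-the-fly similarity without converting it back to a dense array.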

## Conclusion and key insights

This notebook completed the core experimental phase of the project. After testing four different models on the same 1,000-wine sample, the Custom Hybrid Model emerged as the winner. While simpler models like TF-IDF performed well on basic keyword matching, and semantic models showed a more nuanced grasp of style and quality, the hybrid approach combined the strengths of both. It achieved the best results on all three quantitative metrics, most notably reducing the average point differential to just 0.72, while the qualitative case study confirmed its ability to provide stylistically relevant recommendations within the correct quality tier. This data-driven process gives us confidence in our final model choice. The next and final step is to take the saved components of this winning model and build the interactive Streamlit application.