# Problem Statement

TripAdvisor aims to enhance user engagement and satisfaction by streamlining the process of finding suitable bars. Current methods require users to sift through extensive lists and reviews, which can be time-consuming and overwhelming. There is also a need to leverage the vast amount of review data to provide actionable insights to users. An intelligent chatbot can resolve these issues by offering personalized bar recommendations based on user preferences extracted from natural language input, thereby improving the decision-making process for users.

### The Current Problem at Hand

1. Information Overload: Users face a paradox of choice with too many options and reviews to consider.

2. Time-Consuming Process: Finding the right bar involves navigating through filters, reading numerous reviews, and comparing options manually.

3. Underutilized Data: The wealth of review data TripAdvisor possesses is not being actively used to offer personalized, data-driven recommendations.

4. User Engagement: TripAdvisor needs to deepen user interaction with the platform, encouraging users to spend more time on it and return more often.

5. Monetization Strategy: There is potential for better monetization of the platform by providing targeted advertising and featured listings.


### Benefits of the Solution

1. Personalization: The chatbot will use natural language processing to understand user queries and preferences, offering a tailored set of recommendations.

2. Efficiency: It streamlines the decision-making process, saving users time by reducing the need to manually filter through options.

3. Data-Driven Insights: Harnessing the power of TripAdvisor's extensive review database, the chatbot can provide more accurate and relevant suggestions.

4. Increased Engagement: An interactive chatbot keeps users on the platform longer and encourages repeat usage.

5. New Revenue Streams: Featured listings and targeted advertising within the chatbot interactions can generate additional income.

6. Competitive Edge: Offering a sophisticated recommendation tool can differentiate TripAdvisor from competitors, positioning it as an innovative leader in travel tech.

7. Scalability: The chatbot can be expanded to other areas such as hotels, restaurants, and activities, making it a comprehensive travel planning tool.

# Vision

The long-term vision for integrating chatbots into TripAdvisor's services is to revolutionize the travel planning experience by creating a seamless, personalized, and interactive user journey. The initial focus on bar recommendations is just the first phase in a strategic roadmap to incorporate artificial intelligence across various travel-related decision-making scenarios. Here's the broader vision:

## Immediate Goals
- Establish a Foundation: Starting with bar recommendations allows TripAdvisor to build a solid foundation for its AI capabilities. It's an opportunity to fine-tune the chatbot technology in a controlled environment before scaling up.
- Learn from User Interactions: Early iterations will gather crucial data on user preferences and interaction patterns, which will be invaluable for enhancing the recommendation algorithms.
- Iterative Improvement: The feedback loop from user interactions will help in continuously refining the chatbot's accuracy and user experience.

## Medium-Term Expansion
- Restaurant Recommendations: After mastering bar recommendations, the next logical step is to assist users in finding the perfect dining experiences. A restaurant recommendation chatbot would analyze user preferences, dietary restrictions, ambiance choices, and other factors to suggest ideal eateries.
- Accommodation Suggestions: The chatbot could extend to recommending hotels and other forms of accommodation, considering factors like location, amenities, price range, and user reviews.

## Long-Term Vision
- Comprehensive Trip Planning: Eventually, the chatbot will evolve into a full-fledged virtual travel assistant, capable of curating entire trips based on user input. It would suggest flights, accommodation, dining, and activities, all within the user's budget and preference parameters.
- Integrated Ecosystem: The chatbot will operate within a larger AI ecosystem, connecting with booking systems, calendars, and weather forecasts to provide a holistic travel planning service.
- Enhanced User Profiling: By leveraging machine learning and data analytics, the chatbot will create detailed user profiles to predict future preferences and make proactive suggestions.
- Dynamic Adaptation: The system will dynamically adapt recommendations in real-time based on contextual factors such as weather, location, or even global events.
- Personal Travel Concierge: The ultimate goal is for the chatbot to function as a personal travel concierge, offering end-to-end planning and in-trip assistance, with the ability to make reservations, provide navigation, and even offer language support.

## Strategic Benefits
- User Retention and Engagement: A chatbot that simplifies trip planning encourages users to return and engage with the platform more deeply and frequently.
- Data Monetization: The insights gained from user interactions can be monetized through targeted advertising, affiliate marketing, and premium service offerings.
- Market Differentiation: By offering a level of personalization and convenience that competitors may struggle to match, TripAdvisor can position itself as a leader in the travel tech industry.
- Scalability and Diversification: This technology is scalable to other markets and sectors within travel, such as cruise planning, adventure tourism, and business travel services.

In summary, the bar recommendation chatbot is just the inception of a transformative journey toward creating a comprehensive, user-centric travel assistant. It's a strategic move to harness the power of AI to enhance the travel planning process, making it more intuitive, efficient, and personalized, which aligns with the evolving needs and expectations of modern travelers.

# Implementation with PyCaret

PyCaret, an open-source ML library, is pivotal for our chatbot due to its:

1. Ease of Use: It simplifies the process of creating complex ML models with an intuitive API.
2. Rapid Prototyping: PyCaret accelerates the time from data to insights, crucial for agile development.
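3. AutoML Capabilities: It trains and cross-validates many candidate models and ranks them by a chosen metric (Accuracy by default for classification), making it easy to pick the strongest performer.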
4. End-to-End Framework: From data preprocessing to model deployment, PyCaret provides a seamless workflow.
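5. Scalability: PyCaret builds on established libraries such as scikit-learn, XGBoost and LightGBM, so the same workflow can be rerun as TripAdvisor's review data grows; very large datasets may still call for more specialized infrastructure.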

```python
import pandas as pd
from pycaret.classification import *
```

```python
# Load the dataset
df = pd.read_csv('../data/bars.csv')
```

```python
# Count the number of reviews per bar
review_counts = df['bar_name'].value_counts()

# Identify bars with fewer than 10 reviews
bars_with_few_reviews = review_counts[review_counts < 10].index

# Drop those bars from the dataset
df_filtered = df[~df['bar_name'].isin(bars_with_few_reviews)]
```

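```python
# Set up the PyCaret environment. fix_imbalance=True oversamples minority
# classes in the training data (SMOTE by default).
# Note: 'Unnamed: 0' is a saved row index, not a review attribute; the outputs
# below were produced with it included as a feature. Passing
# ignore_features=['Unnamed: 0'] to setup() would exclude it.
clf1 = setup(data=df_filtered, target='bar_name', session_id=123, fix_imbalance=True, verbose=False)

# Train and cross-validate the available models, ranked by Accuracy
best_model = compare_models()

# Create a Random Forest explicitly. compare_models() ranked 'gbc' highest
# (about 252 s of training time vs about 4.9 s for 'rf'); substitute any model ID.
model = create_model('rf')

# Tune hyperparameters (PyCaret returns the original model if tuning
# does not improve it, as happened in this run)
tuned_model = tune_model(model)

# Retrain the chosen model on the full dataset, including the holdout set
final_model = finalize_model(tuned_model)

# Retrieve the last scoring grid (the tune_model cross-validation results)
performance = pull()
print(performance)
```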

Output of `compare_models()` (mean over 10 cross-validation folds):

| ID | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| gbc | Gradient Boosting Classifier | 0.9875 | 0.1995 | 0.9875 | 0.9896 | 0.9875 | 0.9848 | 0.9849 | 252.351 |
| knn | K Neighbors Classifier | 0.983 | 0.1985 | 0.983 | 0.9865 | 0.9832 | 0.9793 | 0.9794 | 0.391 |
| dt | Decision Tree Classifier | 0.98 | 0.1984 | 0.98 | 0.9868 | 0.9819 | 0.9758 | 0.9759 | 0.253 |
| et | Extra Trees Classifier | 0.967 | 0.1996 | 0.967 | 0.9736 | 0.9684 | 0.96 | 0.9601 | 1.561 |
| rf | Random Forest Classifier | 0.9597 | 0.1992 | 0.9597 | 0.9672 | 0.9613 | 0.9511 | 0.9513 | 4.892 |
| lightgbm | Light Gradient Boosting Machine | 0.9179 | 0.1995 | 0.9179 | 0.934 | 0.9209 | 0.9013 | 0.902 | 9.295 |
| xgboost | Extreme Gradient Boosting | 0.9176 | 0.1997 | 0.9176 | 0.9327 | 0.9205 | 0.9008 | 0.9014 | 7.877 |
| nb | Naive Bayes | 0.3677 | 0.16 | 0.3677 | 0.3856 | 0.2496 | 0.1614 | 0.2176 | 0.137 |
| ridge | Ridge Classifier | 0.2362 | 0.0 | 0.2362 | 0.3283 | 0.2044 | 0.1361 | 0.1501 | 0.168 |
| lda | Linear Discriminant Analysis | 0.1982 | 0.1461 | 0.1982 | 0.5028 | 0.2719 | 0.1392 | 0.1487 | 0.182 |


Output of `create_model('rf')` (10-fold cross-validation):

| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9609 | 0.996 | 0.9609 | 0.9638 | 0.9598 | 0.9526 | 0.9527 |
| 1 | 0.9598 | 0.9961 | 0.9598 | 0.9639 | 0.9596 | 0.9512 | 0.9512 |
| 2 | 0.9567 | 0.0 | 0.9567 | 0.9622 | 0.9566 | 0.9474 | 0.9475 |
| 3 | 0.9481 | 0.0 | 0.9481 | 0.9679 | 0.9556 | 0.9373 | 0.9376 |
| 4 | 0.9556 | 0.0 | 0.9556 | 0.9686 | 0.9601 | 0.9462 | 0.9464 |
| 5 | 0.964 | 0.0 | 0.964 | 0.9824 | 0.972 | 0.9565 | 0.9569 |
| 6 | 0.9587 | 0.0 | 0.9587 | 0.969 | 0.9614 | 0.95 | 0.9501 |
| 7 | 0.9608 | 0.0 | 0.9608 | 0.9597 | 0.9574 | 0.9525 | 0.9526 |
| 8 | 0.9693 | 0.0 | 0.9693 | 0.9699 | 0.9682 | 0.9628 | 0.9628 |
| 9 | 0.963 | 0.0 | 0.963 | 0.9643 | 0.9624 | 0.9551 | 0.9552 |


Output of `tune_model(model)` (metrics for the tuned model):

| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.3288 | 0.8615 | 0.3288 | 0.7581 | 0.3644 | 0.2702 | 0.294 |
| 1 | 0.3552 | 0.8585 | 0.3552 | 0.6764 | 0.3898 | 0.2917 | 0.3133 |
| 2 | 0.3414 | 0.0 | 0.3414 | 0.6968 | 0.3988 | 0.2835 | 0.3061 |
| 3 | 0.3249 | 0.0 | 0.3249 | 0.6851 | 0.3562 | 0.264 | 0.2873 |
| 4 | 0.3503 | 0.0 | 0.3503 | 0.7002 | 0.3789 | 0.2889 | 0.3139 |
| 5 | 0.3259 | 0.0 | 0.3259 | 0.6793 | 0.3704 | 0.2665 | 0.2897 |
| 6 | 0.3534 | 0.0 | 0.3534 | 0.6444 | 0.3848 | 0.2857 | 0.3074 |
| 7 | 0.3545 | 0.0 | 0.3545 | 0.7001 | 0.4012 | 0.295 | 0.3205 |
| 8 | 0.3249 | 0.0 | 0.3249 | 0.6943 | 0.373 | 0.2645 | 0.29 |
| 9 | 0.3481 | 0.0 | 0.3481 | 0.6582 | 0.377 | 0.2857 | 0.3072 |


```text
Fitting 10 folds for each of 10 candidates, totalling 100 fits
Original model was better than the tuned model, hence it will be returned. NOTE: The display metrics are for the tuned model (not the original one).
      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Fold
0       0.3288  0.8615  0.3288  0.7581  0.3644  0.2702  0.2940
1       0.3552  0.8585  0.3552  0.6764  0.3898  0.2917  0.3133
2       0.3414  0.0000  0.3414  0.6968  0.3988  0.2835  0.3061
3       0.3249  0.0000  0.3249  0.6851  0.3562  0.2640  0.2873
4       0.3503  0.0000  0.3503  0.7002  0.3789  0.2889  0.3139
5       0.3259  0.0000  0.3259  0.6793  0.3704  0.2665  0.2897
6       0.3534  0.0000  0.3534  0.6444  0.3848  0.2857  0.3074
7       0.3545  0.0000  0.3545  0.7001  0.4012  0.2950  0.3205
8       0.3249  0.0000  0.3249  0.6943  0.3730  0.2645  0.2900
9       0.3481  0.0000  0.3481  0.6582  0.3770  0.2857  0.3072
Mean    0.3407  0.1720  0.3407
```
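The AUC figures in these grids need care. In the Random Forest cross-validation table, folds 0 and 1 report an AUC of about 0.996 while folds 2 to 9 report 0.0 (most likely because the metric could not be computed for those folds), and the 0.1992 shown for `rf` in the comparison grid is simply the mean over all ten folds. The same pattern explains the tuned model's mean AUC of 0.1720. A quick check of that arithmetic, using fold values copied from the table:

```python
from statistics import mean

# AUC per fold for the Random Forest model, copied from its cross-validation table
rf_fold_auc = [0.996, 0.9961, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Plain mean over all ten folds: this reproduces the 0.1992 in the comparison grid
mean_all_folds = mean(rf_fold_auc)

# Mean over only the folds that reported a non-zero AUC
mean_reported_folds = mean(v for v in rf_fold_auc if v > 0)

print(f"mean over all folds:      {mean_all_folds:.4f}")
print(f"mean over reported folds: {mean_reported_folds:.4f}")
```

So the AUC column says little about ranking quality in this notebook; Accuracy, F1, Kappa and MCC are the more informative columns here.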

# K-Nearest Neighbours (KNN)

The decision to implement a K-Nearest Neighbors (KNN) algorithm for the TripAdvisor chatbot recommendation system is based on several of its intrinsic advantages:

1. Simplicity and Intuitiveness:
KNN is one of the simplest machine learning algorithms, relying on the proximity of data points to make predictions. This simplicity is particularly effective for a recommendation system where similarity between user preferences and bar attributes can directly translate into recommendations.

2. No Distributional Assumptions:
KNN is non-parametric: it makes no assumptions about the underlying distribution of the data. This is beneficial when dealing with the diverse and complex dataset of user reviews on TripAdvisor, which may not follow a specific statistical distribution. It does, however, depend on a meaningful distance measure, so features need to be on comparable scales.

3. Naturally Handles Multi-Class Cases:
KNN handles multi-class problems directly: it predicts the most common class among a query's nearest neighbors, however many classes there are. This suits the present task, where the target (`bar_name`) can take many values, one per bar.

4. Adaptability:
KNN is an instance-based ("lazy") learner with no separate model-fitting phase, so new user reviews can be added to the stored examples without retraining. The trade-off is that prediction cost grows with the number of stored examples.

5. Interpretability:
The decisions made by a KNN model are easy for humans to understand because they're based on the most similar items. This interpretability is crucial for a chatbot, where users may seek explanations for why a particular bar was recommended.

The suitability of KNN for TripAdvisor's recommendation system lies in its straightforward approach to leveraging the wealth of user review data, providing personalized and insightful suggestions that can enhance the overall user experience on the platform.
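To make the points above concrete, here is a minimal from-scratch KNN classifier in plain Python. It illustrates the technique only and is not PyCaret's implementation (PyCaret's `'knn'` model wraps scikit-learn's `KNeighborsClassifier`, which defaults to k=5). The two features (price level, noise level), the bar labels and the helper name `knn_predict` are hypothetical toy choices, not columns or functions from this project.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k stored examples nearest to `query`.

    Returns the label together with the neighbors used, so the prediction can be explained.
    """
    # No fitting step: rank every stored example by Euclidean distance to the query
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    label, _ = votes.most_common(1)[0]
    return label, neighbors

# Hypothetical toy data: (price_level, noise_level) -> bar label
train = [
    ((1.0, 1.0), "quiet_wine_bar"),
    ((1.2, 0.8), "quiet_wine_bar"),
    ((0.9, 1.1), "quiet_wine_bar"),
    ((4.0, 4.5), "rooftop_club"),
    ((4.2, 4.0), "rooftop_club"),
    ((3.8, 4.2), "rooftop_club"),
]

label, neighbors = knn_predict(train, (1.1, 0.9))
print(label)  # quiet_wine_bar
for features, bar in neighbors:
    print("similar example:", features, bar)

# Adapting to new data is just appending examples; nothing is refit
train.append(((2.5, 2.5), "cocktail_lounge"))
print(knn_predict(train, (2.5, 2.5), k=1)[0])  # cocktail_lounge
```

Returning the neighbors alongside the label is what supports the interpretability point (a chatbot could cite the most similar examples), and the `append` call shows why adding data is cheap while each prediction still scans every stored example.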

```python
# Train and cross-validate a K-Nearest Neighbors model
knn_model = create_model('knn')

# Score the model on the holdout set created by setup()
knn_holdout_predictions = predict_model(knn_model)

# Show the first five holdout predictions
knn_holdout_predictions.head()
```


Output of `create_model('knn')` (10-fold cross-validation):

| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9789 | 0.9925 | 0.9789 | 0.9855 | 0.9796 | 0.9743 | 0.9744 |
| 1 | 0.9831 | 0.9923 | 0.9831 | 0.9864 | 0.9828 | 0.9795 | 0.9795 |
| 2 | 0.9789 | 0.0 | 0.9789 | 0.9847 | 0.9799 | 0.9744 | 0.9744 |
| 3 | 0.9841 | 0.0 | 0.9841 | 0.9884 | 0.9843 | 0.9807 | 0.9808 |
| 4 | 0.9841 | 0.0 | 0.9841 | 0.9834 | 0.9829 | 0.9807 | 0.9808 |
| 5 | 0.9947 | 0.0 | 0.9947 | 0.9945 | 0.9943 | 0.9936 | 0.9936 |
| 6 | 0.9884 | 0.0 | 0.9884 | 0.9926 | 0.9893 | 0.9859 | 0.9859 |
| 7 | 0.9778 | 0.0 | 0.9778 | 0.9822 | 0.9786 | 0.9731 | 0.9731 |
| 8 | 0.9799 | 0.0 | 0.9799 | 0.9846 | 0.9811 | 0.9756 | 0.9756 |
| 9 | 0.9799 | 0.0 | 0.9799 | 0.9823 | 0.9798 | 0.9756 | 0.9756 |


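Holdout metrics from `predict_model(knn_model)`:

| Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| K Neighbors Classifier | 0.9842 | 0.9933 | 0 | 0 | 0 | 0.9808 | 0.9808 |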


First five holdout predictions:

| Index | Unnamed: 0 | review_heading | review_text | review_rating | bar_name | prediction_label | prediction_score |
|---|---|---|---|---|---|---|---|
| 5292 | 5292 | good food but expensive | food was excellent but on my pricey side. had ... | 40 | kinki_restaurant_bar | kinki_restaurant_bar | 1.0 |
| 117 | 117 | stunning view of a singapore | going up this rooftop bar/lounge cost $23 sing... | 50 | ce_la_vi_singapore | ce_la_vi_singapore | 1.0 |
| 6592 | 6592 | stunning views by the pool | this is one of our best experiences in singapo... | 45 | ce_la_vi_singapore | ce_la_vi_singapore | 1.0 |
| 11222 | 11222 | nice view, very poor service | dear coralie, thank you so much for taking the... | 50 | smoke_mirrors | smoke_mirrors | 1.0 |
| 10602 | 10602 | awesome staff, food and vibes | abbiey served 4 of us at woo bar and her aweso... | 50 | woobar_w_singapore_sentosa_cove | woobar_w_singapore_sentosa_cove | 1.0 |


```python
# Export the holdout predictions
knn_holdout_predictions.to_csv('../data/knn_predictions.csv', index=False)
```
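The holdout grid reports 0 for Recall, Prec. and F1 even though Accuracy is 0.9842, so an independent check on the exported file is worthwhile. A minimal standard-library sketch, assuming the CSV keeps the `bar_name` and `prediction_label` columns shown in the predictions preview; the helper name `holdout_accuracy` and the in-memory sample rows (with one deliberate mismatch) are hypothetical.

```python
import csv
import io

def holdout_accuracy(csv_file):
    """Share of rows where the predicted bar matches the actual bar."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("no rows to score")
    hits = sum(row["bar_name"] == row["prediction_label"] for row in rows)
    return hits / len(rows)

# With the exported file (path as used above):
# with open("../data/knn_predictions.csv", newline="", encoding="utf-8") as f:
#     print(holdout_accuracy(f))

# Hypothetical sample: three correct predictions and one mismatch
sample = io.StringIO(
    "bar_name,prediction_label\n"
    "smoke_mirrors,smoke_mirrors\n"
    "ce_la_vi_singapore,ce_la_vi_singapore\n"
    "kinki_restaurant_bar,smoke_mirrors\n"
    "woobar_w_singapore_sentosa_cove,woobar_w_singapore_sentosa_cove\n"
)
print(holdout_accuracy(sample))  # 0.75
```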

# Practical Application

Here's how we're implementing the solution with PyCaret:

1. We began by filtering the dataset to keep only bars with at least 10 reviews, so that every bar (class) has enough examples to train and evaluate on.

2. The PyCaret environment is set up with `fix_imbalance=True` to handle imbalanced data, an important consideration given the varied popularity of bars.

3. The `compare_models` function trains and cross-validates multiple algorithms to find the most suitable base model.

4. We then tune the chosen model with `tune_model`. In this run the tuned model scored worse than the original, so PyCaret returned the original model.

5. `finalize_model` retrains the model on the full dataset, including the holdout set, and `pull()` retrieves the latest scoring grid for review.

6. Finally, we train a K-Nearest Neighbors (KNN) model, which predicts the bar associated with a review and reached 0.9842 accuracy on the holdout set.
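Step 2 relies on `fix_imbalance=True`, which by default oversamples minority classes with SMOTE from the imbalanced-learn library. SMOTE's core idea is to create synthetic examples by interpolating between a minority-class point and one of its nearest same-class neighbors. The sketch below is a simplified, illustrative version of that idea in plain Python, not imbalanced-learn's implementation; the function name `smote_like_oversample` and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict

def smote_like_oversample(points, labels, k=2, seed=0):
    """Add synthetic points until every class has as many samples as the largest one.

    Each synthetic point lies on the segment between a real point and one of its
    k nearest neighbors from the same class, which is the core idea behind SMOTE.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for point, label in zip(points, labels):
        by_class[label].append(point)

    target = max(len(members) for members in by_class.values())
    out_points, out_labels = list(points), list(labels)

    for label, members in by_class.items():
        if len(members) < 2:
            continue  # interpolation needs at least two real samples
        for _ in range(target - len(members)):
            i = rng.randrange(len(members))
            base = members[i]
            # k nearest same-class neighbors of the base point (excluding itself)
            candidates = sorted(
                (m for j, m in enumerate(members) if j != i),
                key=lambda m: math.dist(base, m),
            )[:k]
            neighbor = rng.choice(candidates)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic = tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
            out_points.append(synthetic)
            out_labels.append(label)

    return out_points, out_labels

# Hypothetical toy data: one popular bar with 5 reviews, one niche bar with 2
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.3, 0.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["popular_bar"] * 5 + ["niche_bar"] * 2

new_points, new_labels = smote_like_oversample(points, labels)
print(new_labels.count("popular_bar"), new_labels.count("niche_bar"))  # 5 5
```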

```python
# Save the preprocessing pipeline and KNN model (written as knn_model_for_streamlit.pkl)
from pycaret.classification import save_model

save_model(knn_model, 'knn_model_for_streamlit')
```

```text
Transformation Pipeline and Model Successfully Saved

(Pipeline(memory=FastMemory(location=C:\Users\marcu\AppData\Local\Temp\joblib),
          steps=[('label_encoding',
                  TransformerWrapperWithInverse(exclude=None, include=None,
                                                transformer=LabelEncoder())),
                 ('numerical_imputer',
                  TransformerWrapper(exclude=None,
                                     include=['Unnamed: 0', 'review_rating'],
                                     transformer=SimpleImputer(add_indicator=False,
                                                               copy=True,
                                                               fill_value=None...
                                                                               n_jobs=None,
                                                                               random_state=None,
                                                                               sampling_strategy='auto')))),
                 ('clea
```

```python
predict_model
```

```text
<function pycaret.classification.functional.predict_model(estimator, data: Optional[pandas.core.frame.DataFrame] = None, probability_threshold: Optional[float] = None, encoded_labels: bool = False, raw_score: bool = False, round: int = 4, verbose: bool = True) -> pandas.core.frame.DataFrame>
```
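The `raw_score` parameter in this signature makes `predict_model` return a score for every class rather than only the top label, which a chatbot could use to offer several bars instead of one. A minimal sketch of that ranking step in plain Python. It assumes the per-class scores arrive as columns named with a `prediction_score_` prefix, which should be checked against the output of the installed PyCaret version; `scores_from_row` and `top_n_bars` are hypothetical helpers, and the example row and scores are made up.

```python
SCORE_PREFIX = "prediction_score_"  # assumed column prefix for per-class scores

def scores_from_row(row):
    """Extract {bar_name: score} from one prediction row (a dict of column -> value)."""
    return {
        column[len(SCORE_PREFIX):]: float(value)
        for column, value in row.items()
        if column.startswith(SCORE_PREFIX)
    }

def top_n_bars(class_scores, n=3):
    """Return the n bars with the highest predicted scores, best first."""
    ranked = sorted(class_scores.items(), key=lambda item: item[1], reverse=True)
    return [bar for bar, _ in ranked[:n]]

# Made-up row, shaped the way raw_score=True output is assumed to look
row = {
    "review_text": "great rooftop views, pricey drinks",
    "prediction_label": "ce_la_vi_singapore",
    "prediction_score_ce_la_vi_singapore": 0.6,
    "prediction_score_smoke_mirrors": 0.3,
    "prediction_score_kinki_restaurant_bar": 0.1,
}
print(top_n_bars(scores_from_row(row), n=2))  # ['ce_la_vi_singapore', 'smoke_mirrors']
```

With a real predictions DataFrame, rows in this dict form can be obtained with `predictions.to_dict(orient="records")`.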

# Cost-Benefit Analysis for AI-Powered Chatbot Implementation

## Costs

1. Development Costs:
- Software Development: Costs associated with the development team to build and integrate the chatbot into TripAdvisor's platform.
- Data Preparation: Expenses for data scientists to clean and prepare the dataset for model training.
- Technology Stack: Investment in the necessary technology stack, including servers, databases, and analytics tools.

2. Operational Costs:
- Maintenance: Ongoing costs to update and maintain the chatbot, including server costs and personnel.
- Training: Costs for training staff to manage and update the chatbot system.
- Customer Support: Additional support costs if users require assistance using the new feature.

3. Miscellaneous Costs:
- Marketing: Expenses for marketing campaigns to promote the new chatbot feature to users.
- Contingency: A contingency budget to address unforeseen expenses during the chatbot implementation.

## Benefits

1. Direct Financial Gains:
- Increased Revenue: Additional income from targeted advertising and premium listings within the chatbot.
- Enhanced Conversion Rates: Improved user experience leading to more bookings and transactions on the platform.

2. Cost Savings:
- Efficiency Gains: If users find suitable bars with fewer searches and page views, server load could fall, although running the chatbot itself adds hosting costs.
- Customer Service: Decreased load on customer service due to more self-service options for users.

3. Indirect Benefits:
- User Engagement: Increased time spent on the platform can lead to higher retention rates and more opportunities for monetization.
- Data Collection: Valuable insights from user interactions with the chatbot can inform future business strategies and optimizations.
- Brand Image: Being an early adopter of such technology can enhance TripAdvisor's brand as a leader in travel innovation.

4. Strategic Advantages:
- Market Positioning: The chatbot could provide a significant competitive edge in the marketplace.
- Scalability: The chatbot can be expanded to other services, multiplying its benefits across the platform.

## Quantifying the Analysis

To conduct a quantitative cost-benefit analysis, we'd assign monetary values to each cost and benefit item. For instance:
- Development costs might be estimated from the number of hours required, multiplied by the hourly rates of the developers, data scientists, and other team members involved.
- Operational costs would include recurring expenses such as hosting and support staff.
- Direct financial gains could be projected from current conversion rates and the expected increases due to the chatbot.

We'd then calculate the project's net present value (NPV) by discounting each year's benefits and costs to their present value and summing the net amounts.
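In formula form, with $B_t$ and $C_t$ the benefits and costs in year $t$, $r$ the discount rate and $T$ the horizon in years:

$$\text{NPV} = \sum_{t=0}^{T} \frac{B_t - C_t}{(1+r)^t}$$

A minimal sketch of the calculation, together with a simple undiscounted ROI ratio. All figures are hypothetical placeholders, not TripAdvisor estimates, and the helper names `npv` and `roi` are illustrative.

```python
def npv(net_cash_flows, rate):
    """Net present value of yearly net cash flows (benefits minus costs).

    net_cash_flows[0] is year 0 and is not discounted.
    """
    return sum(flow / (1 + rate) ** year for year, flow in enumerate(net_cash_flows))

def roi(total_benefits, total_costs):
    """Simple return on investment: net gain per unit of cost."""
    return (total_benefits - total_costs) / total_costs

# Hypothetical figures: year-0 build cost, then net gains in years 1 to 3
flows = [-500_000, 200_000, 250_000, 300_000]
print(f"NPV at 10%: {npv(flows, 0.10):,.2f}")  # NPV at 10%: 113,824.19
print(f"ROI: {roi(750_000, 500_000):.0%}")     # ROI: 50%
```

A positive NPV at TripAdvisor's chosen discount rate, together with an ROI above its benchmark, is the feasibility test described in the conclusion.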


# Conclusion

If the NPV is positive and the return on investment (ROI) meets TripAdvisor's benchmark, the chatbot project would be financially feasible. It's also important to consider the strategic benefits that don't directly translate to immediate financial gains but contribute to long-term growth and sustainability.

Because the chatbot has not been implemented yet, a precise CBA would require detailed financial data and market analysis.