# Building the Model

For this model, I want to build a recommender system where the user inputs their symptoms and the model returns the cannabis strains best suited to them.

In [2]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [3]:
file_path = r"C:\Users\Baile\Documents\DSI 24\capstone\clean_weed.csv"
weed = pd.read_csv(file_path)

In [9]:
# symptom to effects mapping
symptom_to_effects = {
    'ADD/ADHD': ['Focused', 'Energetic', 'Creative', 'Aroused', 'Talkative'],
    'Alzheimer\'s': ['Relaxed', 'Calm'],
    'Anorexia': ['Hungry', 'Relaxed', 'Euphoric', 'Giggly'],
    'Anxiety': ['Calm', 'Relaxed'],
    'Appetite Loss': ['Hungry'],
    'Arthritis': ['Relaxed', 'Calm'],
    'Asthma': ['Relaxed', 'Calm'],
    'Autism': ['Calm', 'Relaxed'],
    'Bipolar Disorder': ['Calm', 'Uplifted'],
    'Cancer': ['Relaxed', 'Hungry', 'Giggly', 'Happy', 'Euphoric'],
    'Chronic Pain': ['Relaxed', 'Calm'],
    'Cramps': ['Relaxed', 'Euphoric'],
    'Crohn\'s Disease': ['Relaxed', 'Hungry'],
    'Depression': ['Happy', 'Uplifted', 'Euphoric'],
    'Epilepsy': ['Calm', 'Relaxed'],
    'Eye Pressure': ['Calm', 'Relaxed', 'Sleepy'],
    'Tired': ['Energized', 'Energetic', 'Social'],
    'Fibromyalgia': ['Relaxed', 'Calm'],
    'Gastrointestinal Disorder': ['Hungry', 'Relaxed'],
    'Glaucoma': ['Calm', 'Relaxed', 'Happy'],
    'Headaches': ['Relaxed', 'Calm'],
    'HIV/AIDS': ['Hungry', 'Relaxed'],
    'Hypertension': ['Calm', 'Relaxed'],
    'Inflammation': ['Calm', 'Relaxed'],
    'Insomnia': ['Sleepy', 'Relaxed'],
    'Irritable Bowel Syndrome': ['Hungry', 'Relaxed'],
    'Loss of Appetite': ['Hungry'],
    'Migraines': ['Relaxed', 'Calm'],
    'Mood Swings': ['Calm', 'Uplifted'],
    'Multiple Sclerosis': ['Relaxed', 'Calm'],
    'Muscle Spasms': ['Relaxed'],
    'Narcolepsy': ['Energetic', 'Energized'],
    'Nausea': ['Relaxed', 'Calm', 'Giggly'],
    'Neuropathy': ['Calm', 'Relaxed'],
    'Nightmares': ['Calm', 'Relaxed'],
    'Parkinson\'s': ['Calm', 'Relaxed'],
    'PMS': ['Calm', 'Relaxed', 'Euphoric'],
    'PTSD': ['Calm', 'Relaxed', 'Happy', 'Euphoric'],
    'Seizures': ['Calm', 'Relaxed'],
    'Spasticity': ['Relaxed'],
    'Spinal Cord Injury': ['Relaxed'],
    'Stress': ['Calm', 'Relaxed', 'Happy', 'Euphoric'],
    'Tinnitus': ['Calm', 'Relaxed', 'Uplifted'],
    'Tremors': ['Calm', 'Relaxed']
}

# map effects to symptoms
def map_effects_to_symptoms(effects, symptom_to_effects):
    """Map effects to corresponding symptoms."""
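    # strip whitespace so "Relaxed, Happy" and "Relaxed,Happy" behave the same
    effects_list = [effect.strip() for effect in effects.split(',')]
    symptoms = set()
    for symptom, effect_list in symptom_to_effects.items():
        if any(effect in effects_list for effect in effect_list):
            symptoms.add(symptom)
    # sort so the joined string is deterministic across runs
    return ','.join(sorted(symptoms))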

# apply the mapping to the dataframe
weed['symptoms'] = weed['effects'].apply(lambda x: map_effects_to_symptoms(x, symptom_to_effects))

# combine symptoms and effects
weed['symptoms_and_effects'] = weed.apply(lambda row: ','.join(filter(None, [row['symptoms'], row['effects']])), axis=1)

# convert the symptoms_and_effects column to lowercase
weed['symptoms_and_effects'] = weed['symptoms_and_effects'].str.lower()

# vectorize the combined symptoms_and_effects column using TF-IDF
vectorizer = TfidfVectorizer(tokenizer=lambda x: x.split(','))
combined_matrix = vectorizer.fit_transform(weed['symptoms_and_effects'])

# cosine similarity matrix
similarity_matrix = cosine_similarity(combined_matrix)

# normalize values
def normalize(series):
    """Normalize a pandas Series to the range [0, 1]."""
    if series.max() == series.min():
        return np.zeros_like(series)
    return (series - series.min()) / (series.max() - series.min())

# recommendation system
def recommendations(*symptoms_or_effects, similarity_matrix, weed, top_n=5, weight_similarity=0.5, weight_rating=0.5):
    """
    Get content-based recommendations for strains based on given symptoms or effects.
    
    Parameters:
    - *symptoms_or_effects (str): The symptoms or effects to base recommendations on.
    - similarity_matrix (ndarray): Precomputed cosine similarity matrix.
    - weed (DataFrame): The DataFrame containing strain data.
    - top_n (int): The number of top recommendations to return.
    - weight_similarity (float): The weight for similarity in the hybrid score.
    - weight_rating (float): The weight for rating in the hybrid score.
    
    Returns:
    - DataFrame: The top recommended strains sorted by the hybrid score, or a message string if nothing matched.
    """
    # convert the input to lowercase to ensure case insensitivity
    symptoms_or_effects = [symptom_or_effect.lower() for symptom_or_effect in symptoms_or_effects]

    # array to store average similarity scores
    average_similarity_scores = np.zeros(len(weed))

    # track whether any symptoms or effects matched
    any_match = False

    for symptom_or_effect in symptoms_or_effects:
        # row positions of strains whose combined text contains the term
        # (literal match, not a regex; str.contains never raises IndexError)
        mask = weed['symptoms_and_effects'].str.contains(symptom_or_effect, regex=False, na=False)
        indices = np.flatnonzero(mask.to_numpy())
        if len(indices) == 0:
            continue
        any_match = True

        # add the average similarity scores for all matching indices
        for idx in indices:
            similarity_scores = similarity_matrix[idx]
            average_similarity_scores[idx] += similarity_scores.mean()

    if not any_match:
        return "No strains found with the given symptom(s) or effect(s)."

    # normalize the average similarity scores
    normalized_similarity = normalize(average_similarity_scores)

    # top_n indices based on average similarity scores
    similar_strains_indices = normalized_similarity.argsort()[-top_n:][::-1]

    # top_n strains and their ratings
    # copy so that adding the hybrid_score column does not write into a view of weed
    similar_strains = weed.iloc[similar_strains_indices][['strain', 'type', 'rating']].copy()

    # normalize ratings
    ratings = similar_strains['rating'].values
    normalized_ratings = normalize(ratings)

    # calculate the hybrid score by combining normalized similarity and normalized ratings
    hybrid_score = normalized_similarity[similar_strains_indices] * weight_similarity + normalized_ratings * weight_rating

    # add the hybrid score to the similar strains DataFrame
    similar_strains['hybrid_score'] = hybrid_score

    # sort the DataFrame by hybrid score in descending order
    similar_strains = similar_strains.sort_values(by='hybrid_score', ascending=False)

    return similar_strains

# Model Testing

In [5]:
# test with a single symptom
recommendations('anxiety', similarity_matrix=similarity_matrix, weed=weed)

| | strain | type | rating | hybrid_score |
|---|---|---|---|---|
| 221 | Black-Cherry-Pie | hybrid | 5.0 | 1.0 |
| 741 | Easy-Bud | hybrid | 4.7 | 0.7 |
| 381 | Bruce-Banner | hybrid | 4.6 | 0.6 |
| 77 | Alien-Og | hybrid | 4.5 | 0.5 |
| 112 | Animal-Cookies | hybrid | 4.5 | 0.5 |


In [6]:
recommendations('anxiety', 'insomnia', similarity_matrix=similarity_matrix, weed=weed)

| | strain | type | rating | hybrid_score |
|---|---|---|---|---|
| 221 | Black-Cherry-Pie | hybrid | 5.0 | 1.0 |
| 741 | Easy-Bud | hybrid | 4.7 | 0.7 |
| 381 | Bruce-Banner | hybrid | 4.6 | 0.6 |
| 77 | Alien-Og | hybrid | 4.5 | 0.5 |
| 112 | Animal-Cookies | hybrid | 4.5 | 0.5 |


In [7]:
recommendations('cancer', 'tired', similarity_matrix=similarity_matrix, weed=weed)

| | strain | type | rating | hybrid_score |
|---|---|---|---|---|
| 847 | Georgia-Pine | hybrid | 4.8 | 1.0 |
| 886 | Golden-Goat | hybrid | 4.5 | 0.7 |
| 24 | Acapulco-Gold | sativa | 4.5 | 0.7 |
| 290 | Blue-Hawaiian | hybrid | 4.3 | 0.5 |
| 708 | Dream-Lotus | hybrid | 4.3 | 0.5 |


In [8]:
recommendations('Narcolepsy', similarity_matrix=similarity_matrix, weed=weed)

| | strain | type | rating | hybrid_score |
|---|---|---|---|---|
| 213 | Birds-Eye | sativa | 5.0 | 1.0 |
| 847 | Georgia-Pine | hybrid | 4.8 | 0.833333 |
| 24 | Acapulco-Gold | sativa | 4.5 | 0.583333 |
| 792 | Fire-Og | hybrid | 4.4 | 0.5 |
| 694 | Double-Tangie-Banana | hybrid | 4.4 | 0.5 |


# Logic & Reasoning

## Symptom to Effects Mapping
- Purpose: This dictionary maps each symptom to a list of effects that are beneficial for that symptom
- Usage: Helps match user-selected symptoms to relevant effects, which drives the strain recommendations
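
A minimal sketch of the lookup direction, symptom to effects. The `target_effects` helper is hypothetical (it is not part of the notebook), and the dictionary is a trimmed copy of `symptom_to_effects`.

```python
# Trimmed, illustrative copy of the symptom_to_effects dictionary.
symptom_to_effects = {
    'Anxiety': ['Calm', 'Relaxed'],
    'Insomnia': ['Sleepy', 'Relaxed'],
    'Depression': ['Happy', 'Uplifted', 'Euphoric'],
}


def target_effects(*symptoms):
    """Collect the effects mapped to the given symptoms (case-insensitive)."""
    lookup = {name.lower(): effects for name, effects in symptom_to_effects.items()}
    wanted = set()
    for symptom in symptoms:
        # unknown symptoms simply contribute no effects
        wanted.update(lookup.get(symptom.lower(), []))
    return sorted(wanted)


print(target_effects('anxiety', 'insomnia'))  # ['Calm', 'Relaxed', 'Sleepy']
```

Symptoms that share most of their effects, as these two do, will pull in largely the same strains.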

## Mapping Effects to Symptoms
- Purpose: This function takes a strain's comma-separated effects string and maps it back to the symptoms those effects may alleviate
- Usage: Generates a list of symptoms based on the strain's effects
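
The reverse mapping can be illustrated on a toy input. This is a sketch, not the notebook's exact function: it uses a trimmed dictionary, made-up effect strings, and a set intersection in place of the `any(...)` check.

```python
# Trimmed, illustrative copy of the symptom_to_effects dictionary.
symptom_to_effects = {
    'Anxiety': ['Calm', 'Relaxed'],
    'Insomnia': ['Sleepy', 'Relaxed'],
    'Narcolepsy': ['Energetic', 'Energized'],
}


def map_effects_to_symptoms(effects, symptom_to_effects):
    """Return the symptoms whose mapped effects overlap the strain's effects."""
    strain_effects = {effect.strip() for effect in effects.split(',')}
    matched = [
        symptom
        for symptom, wanted in symptom_to_effects.items()
        if strain_effects & set(wanted)  # at least one effect in common
    ]
    return ','.join(sorted(matched))


print(map_effects_to_symptoms('Relaxed, Happy', symptom_to_effects))  # Anxiety,Insomnia
print(map_effects_to_symptoms('Energetic', symptom_to_effects))       # Narcolepsy
```

A single shared effect is enough for a match, so a common effect such as Relaxed tags a strain with every symptom that lists it.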

## Applying the Mapping to the DataFrame
- Purpose: Applies the map_effects_to_symptoms function to each row in the DataFrame, creating a new symptoms column
- Usage: Adds a list of potential symptoms to each strain based on its effects

## Combining Symptoms and Effects
- Purpose: Combines the symptoms and effects columns into a single column and converts it to lowercase
- Usage: Facilitates text processing and keeps both matching against user input and the similarity calculation case-insensitive

## Vectorizing the Combined Column
- Purpose: Converts the combined symptoms_and_effects text into a matrix of TF-IDF features
- Usage: Prepares the data for computing cosine similarity, which measures the similarity between strains
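
To make the TF-IDF step concrete, here is a pure-Python sketch of what it computes. It assumes scikit-learn's default settings as I understand them (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2-normalized rows). The `tfidf` helper and the three sample strings are hypothetical; the notebook itself uses `TfidfVectorizer`.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) of L2-normalized TF-IDF weights.

    Each document is a comma-separated string, split the same way as the
    notebook's custom tokenizer.
    """
    tokenized = [[t.strip() for t in doc.split(',') if t.strip()] for doc in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(tokenized)
    # document frequency: number of documents that contain each token
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        rows.append([w / norm for w in raw])
    return vocab, rows


docs = ['anxiety,relaxed,happy', 'insomnia,relaxed,sleepy', 'narcolepsy,energetic']
vocab, rows = tfidf(docs)
for token, weight in zip(vocab, rows[0]):
    print(f'{token:12s} {weight:.3f}')
```

Tokens that appear in many strains, such as `relaxed` here, get a lower weight than rarer ones, so common effects contribute less to the similarity between two strains.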

## Cosine Similarity Matrix
- Purpose: Computes the cosine similarity between all pairs of strains based on their TF-IDF features
- Usage: Provides the similarity measure used to rank strains, so that more than one strain can be recommended for the same issue
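
Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal pure-Python sketch on three made-up feature vectors follows; `cosine` and `pairwise_cosine` are hypothetical helpers, while the notebook uses scikit-learn's `cosine_similarity`.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def pairwise_cosine(rows):
    """Square matrix holding the cosine similarity of every pair of rows."""
    return [[cosine(u, v) for v in rows] for u in rows]


# Illustrative vectors: one column per feature, one row per strain.
rows = [
    [1, 1, 0],  # strain A
    [1, 0, 1],  # strain B
    [0, 0, 1],  # strain C
]
sim = pairwise_cosine(rows)
for row in sim:
    print([round(value, 3) for value in row])
```

Strains A and C share no features, so their similarity is 0; the diagonal is 1 because every strain is identical to itself.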

## Normalizing Values
- Purpose: Normalizes a pandas Series to a range between 0 and 1
- Usage: Ensures that different metrics (like ratings and similarity scores) are on a comparable scale
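
Min-max normalization maps the smallest value to 0 and the largest to 1. A small sketch on plain lists (the notebook version operates on pandas or NumPy data), including the constant-input case that `normalize` handles by returning zeros:

```python
def normalize(values):
    """Min-max scale a list of numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # every value is identical, so there is no spread to rescale
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


print(normalize([0.0, 5.0, 10.0]))  # [0.0, 0.5, 1.0]
print(normalize([4.4, 4.4, 4.4]))   # [0.0, 0.0, 0.0]
```

One consequence of the constant-input case: if all candidate strains have the same rating, the rating term contributes nothing to the hybrid score.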

## Hybrid Score
- Purpose: Combines normalized similarity and normalized rating, weighted by weight_similarity and weight_rating, to improve the accuracy and relevance of the recommendations
- Usage: Ensures that a variety of strains are recommended and not just 5-star-rated strains
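
The weighting can be traced by hand. The sketch below assumes the default weights of 0.5 each and assumes that all five candidates have a normalized similarity of 1.0 (an assumption, though it is consistent with the hybrid scores shown for 'anxiety' in the Model Testing output). `hybrid_scores` is a hypothetical helper that mirrors the last steps of `recommendations` in plain Python.

```python
def normalize(values):
    """Min-max scale a list of numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def hybrid_scores(similarities, ratings, weight_similarity=0.5, weight_rating=0.5):
    """Weighted sum of (already normalized) similarity and normalized rating."""
    normalized_ratings = normalize(ratings)
    return [
        weight_similarity * similarity + weight_rating * rating
        for similarity, rating in zip(similarities, normalized_ratings)
    ]


ratings = [5.0, 4.7, 4.6, 4.5, 4.5]       # ratings from the 'anxiety' output
similarities = [1.0, 1.0, 1.0, 1.0, 1.0]  # assumed: all candidates tie on similarity
scores = hybrid_scores(similarities, ratings)
print([round(score, 2) for score in scores])  # [1.0, 0.7, 0.6, 0.5, 0.5]
```

When the similarity term is tied like this, the ranking is decided entirely by rating, which is worth keeping in mind when choosing the two weights.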

# Results & Conclusions 

## Results
The system provides a list of top recommended strains based on user inputs. It also visualizes the distribution of effects and flavors using word clouds and various graphs, giving users an intuitive understanding of the available options. The Streamlit app gives users an interactive experience with the model so they can see how it would work in a real-world application.


## Future Work
- Better Dataset: The DoltHub dataset hasn’t been updated in four years, so the model is not up to date with current strains

- Improve UI/UX: Enhance the user interface for better interaction

- More Diverse Effects and Symptoms: A more comprehensive list of symptoms and their effects would improve the model’s recommendations. The model tends to recommend the same strains for different symptoms when the effects mapped to those symptoms are identical or very similar

- Advanced Filtering: Allow users to filter recommendations based on additional criteria like potency, price or terpene profile

- Enhanced Model: Experiment with more sophisticated machine learning models to improve recommendation accuracy
  
- Collaboration with Experts: Collaborate with cannabis industry experts, healthcare professionals, researchers, and regulatory entities to validate recommendations
  
- User Response Data: To check that the model works properly and to fine-tune it, user feedback should be collected on whether the recommended strains actually relieve the users’ symptoms
  
- The Future: If used and implemented properly, this could cut down on time spent in dispensaries and on time and money spent on cannabis that doesn’t work for the user. All in all, the model promotes responsible use and informed decision-making
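
The overlap described under More Diverse Effects and Symptoms can be checked directly from the mapping. A minimal sketch, using a hypothetical helper `symptoms_sharing_effects` and a trimmed copy of the `symptom_to_effects` dictionary:

```python
from collections import defaultdict


def symptoms_sharing_effects(symptom_to_effects):
    """Group symptoms whose effect lists contain exactly the same effects."""
    groups = defaultdict(list)
    for symptom, effects in symptom_to_effects.items():
        # frozenset ignores the order in which effects are listed
        groups[frozenset(effects)].append(symptom)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Trimmed, illustrative copy of the mapping.
symptom_to_effects = {
    'Anxiety': ['Calm', 'Relaxed'],
    'Arthritis': ['Relaxed', 'Calm'],
    'Epilepsy': ['Calm', 'Relaxed'],
    'Insomnia': ['Sleepy', 'Relaxed'],
}

print(symptoms_sharing_effects(symptom_to_effects))
# [['Anxiety', 'Arthritis', 'Epilepsy']]
```

Symptoms in the same group are tagged onto exactly the same strains, so their recommendations will largely coincide until the mapping is made more specific.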
