In [8]:
import json
import nltk 
import numpy as np
import random
import string

In [24]:
path_feature_database = 'data/all_extracted_features.json'
with open(path_feature_database, "r", encoding="utf-8") as file:
    corpus_data = json.load(file)

In [9]:
# f = open(r'C:\Users\34640\Desktop\VSCode\z_Cursos_Python\chatbots\data\all_extracted_features.txt', 'r', errors= 'ignore')
# raw = f.read()

In [21]:
# print(raw[:750])

[
    {
        "mood_1": "Weightlessness",
        "text_1": "Input (Mood: Anxious, restless):\n\nI can't seem to sit still, my mind racing with \"what ifs\" and worst-case scenarios.  A low, throbbing bassline would capture that feeling of unease perfectly, maybe with high-pitched strings weaving in and out, mirroring the chaotic thoughts.  It needs to be fast, frantic even, but with a driving rhythm to keep the anxiety from overwhelming everything.  I need something to ground me, but something that acknowledges this feeling.\n",
        "features_1": "```json\n{\n  \"Tempo\": \"140 bpm\",\n  \"Intensity/Dynamics\": \"mf - crescendo to ff during the \"what ifs\" section, then diminuendo to mp\",\n  \"Timbre\": \"Dark, with a focus on low 


The raw corpus must retain '\n' and similar markers so that our future chatbot can correctly segment sentences and paragraphs.

## Preprocessing the corpus

This stage applies several NLP techniques: lowercasing, punctuation removal, tokenization, and lemmatization. Stop word removal and stemming are common additions, but the cells below do not apply them.
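As a rough illustration of the tokenization, punctuation removal, and stop word removal steps named above, here is a minimal sketch that uses only the standard library. The names `TOY_STOP_WORDS` and `simple_preprocess` are hypothetical, the stop word set is a deliberately tiny stand-in for a real list, and lemmatization and stemming are left out.

```python
import string

# Hypothetical, deliberately tiny stop word list, for illustration only.
TOY_STOP_WORDS = {"a", "an", "the", "i", "my", "is", "to", "of", "and", "in", "it"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def simple_preprocess(text, stop_words=TOY_STOP_WORDS):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    tokens = text.lower().translate(_PUNCT_TABLE).split()
    return [token for token in tokens if token not in stop_words]


example_tokens = simple_preprocess("I can't seem to sit still, my mind is racing!")
print(example_tokens)
```

Note that stripping punctuation before tokenizing turns "can't" into "cant", so contractions end up as single tokens.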

In [14]:
# Make sure these NLTK resources are downloaded before running the cells below
# nltk.download('punkt')
# nltk.download('wordnet')

In [25]:
# Extract the raw texts from the corpus while keeping a reference to their features
corpus_texts = []
corpus_features = []  

for entry in corpus_data:
    for key in entry:
        if key.startswith("text_"):  
            corpus_texts.append(entry[key]) 
            corpus_features.append(entry)  # keeps the whole entry (mood, features and text)

We split the raw corpus into two parallel lists, one of texts and one of full entries, so that both the corpus texts and any new input from the person interacting with the bot can be vectorized later. We can then find the most similar corpus text by cosine similarity and retrieve its corresponding features.
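To make the retrieval idea concrete, the following is a minimal, standard-library sketch of TF-IDF weighting and cosine similarity on a toy corpus. It is not the notebook's implementation, which relies on scikit-learn further down. The helper names (`tfidf_vectors`, `cosine`) and the toy sentences are assumptions made for illustration, and the IDF formula is the smoothed variant `ln((1 + n) / (1 + df)) + 1`.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) and the IDF table for the given docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({term: freq * idf[term] for term, freq in term_freq.items()})
    return vectors, idf


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (hypothetical sentences, not taken from the real data).
toy_corpus = [
    "my mind is racing and restless",
    "calm quiet peaceful evening",
    "so much joy and energy today",
]
toy_vectors, toy_idf = tfidf_vectors(toy_corpus)

# Vectorize the query with the corpus vocabulary; unseen terms are ignored.
query_tf = Counter("restless racing thoughts".split())
query_vec = {t: f * toy_idf[t] for t, f in query_tf.items() if t in toy_idf}

scores = [cosine(query_vec, vec) for vec in toy_vectors]
best_idx = max(range(len(scores)), key=scores.__getitem__)
print(best_idx, toy_corpus[best_idx])
```

Because the index of the best score lines up with the index of the original entry, the same position can be used to look up that entry's features.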

In [33]:
corpus_texts[2]

"Overwhelmed.  A cacophony of noise in my head, a frantic rhythm that won't slow down.  I need music to match – something chaotic but ultimately resolving, maybe a crescendo into a quiet, peaceful ending.  Please, something to calm this storm.\n"

In [34]:
lemmatizer = nltk.stem.WordNetLemmatizer()
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

In [35]:
def preprocess_text(text):
    ''' 
    tokenizes, removes punctuation and lemmatizes
    '''
    tokens = nltk.word_tokenize(text.lower().translate(remove_punct_dict))
    return " ".join(lemmatizer.lemmatize(token) for token in tokens)

In [36]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

vectorizer = TfidfVectorizer()
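# Fit on the preprocessed texts so the corpus and the user input share the same vocabulary
tfidf_matrix = vectorizer.fit_transform([preprocess_text(text) for text in corpus_texts])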

In [37]:
def chatbot_response(user_input):
    """ Find the most relevant entry in the corpus using TF-IDF and cosine similarity """
    
    processed_input = preprocess_text(user_input)

    user_vector = vectorizer.transform([processed_input])

    similarities = cosine_similarity(user_vector, tfidf_matrix)

    # find best index to return the features
    best_match_idx = np.argmax(similarities)
    best_match_features = corpus_features[best_match_idx]

    return best_match_features
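The `features_N` values in the corpus excerpt are strings wrapped in a Markdown json code fence rather than parsed objects. A small helper can unwrap them before display. The sketch below is one possible approach, with `parse_features` as a hypothetical name. Some entries appear to contain unescaped inner quotes, so the helper returns `None` instead of raising when the payload is not valid JSON.

```python
import json
import re

# Build the fence marker programmatically to keep this snippet easy to embed.
FENCE = "`" * 3
_FENCED_JSON = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE),
    flags=re.DOTALL,
)


def parse_features(raw_features):
    """Strip an optional code fence and parse the JSON payload.

    Returns a dict on success, or None if the payload is not valid JSON.
    """
    match = _FENCED_JSON.search(raw_features)
    payload = match.group(1) if match else raw_features
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


# Hypothetical sample shaped like the corpus values.
sample = FENCE + 'json\n{\n  "Tempo": "140 bpm"\n}\n' + FENCE
parsed = parse_features(sample)
print(parsed)
```

A caller could fall back to printing the raw string whenever `parse_features` returns `None`.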

---

## Usage example

In [None]:
while True:
    user_input = input("\n👤 Share how you're feeling or what's on your mind (or type 'exit' to close this session): ")
    
    if user_input.lower() == "exit":
        print("\n🤖 good bye!")
        break
    
    print('🤖 We are processing your feelings to return the best musical match for you... ⏳')
    response = chatbot_response(user_input)
    print("\n🤖 here it is!:")
    print(json.dumps(response, indent=4, ensure_ascii=False))
    print('hope it helped :)')

🤖 We are processing your feelings to return the best musical match for you... ⏳

🤖 here it is!:
{
    "mood_233": "Pride",
    "text_233": "Input (Mood: Anxious, restless):\n\nI can't seem to sit still, my mind racing with a million things.  A low hum of unease is constantly there, a feeling of impending doom I can't quite shake.  I need something fast-paced, maybe even dissonant, to match this internal chaos – something that will help me process all this energy.  Something abrasive and urgent,  to mirror the feeling of being trapped in my own head.\n",
    "features_233": "```json\n{\n  \"Tempo\": \"Allegro molto (160 bpm)\",\n  \"Intensity/Dynamics\": \"mf-ff, frequent crescendos, sudden sforzandi\",\n  \"Timbre\": \"Dark, abrasive; use of brass and distorted electric guitar\",\n  \"Rhythm\": \"Irregular, syncopated, polyrhythmic sections\",\n  \"Harmonic progression\": \"e minor - B minor - G major - D major - C major\",\n  \"Melody\": \"Fragmentary, jagged, rapidly ascending and de