# Financial Literacy Chatbot: Budgeting and Saving Tips

This Jupyter Notebook contains the code and explanation for creating a simple chatbot using Streamlit. The chatbot is designed to answer questions and provide guidance on financial literacy, focusing on budgeting, saving tips, and basic financial planning. By interacting with the chatbot, users can explore practical advice for managing finances effectively.

### Project Objectives:
1. **Data Preparation**: Preprocess a text file on financial literacy to extract relevant responses for common questions on budgeting and saving.
2. **Similarity Matching**: Implement a similarity function that identifies the most relevant advice based on user questions.
3. **User Interface**: Build a Streamlit interface for interactive Q&A with the chatbot.

### Topic and Source:
The chosen topic, **Financial Literacy and Budgeting Tips**, is inspired by timeless financial principles that are widely applicable to individuals seeking to improve their financial health. The primary text source for this project is *"The Richest Man in Babylon"* by George S. Clason, available on [Project Gutenberg](https://www.gutenberg.org/). This classic provides foundational insights into personal finance, saving habits, and financial discipline.

### Instructions for Use:
1. **Run the Notebook Cells**: Follow the code cells to load, preprocess, and set up the chatbot.
2. **Chatbot Interaction**: Once deployed, the Streamlit interface lets users enter financial questions and receive practical responses based on principles from *"The Richest Man in Babylon"*.
3. **Limitations**: This chatbot is a basic informational tool designed to share general budgeting and saving tips and does not replace professional financial advice.

*Note: The chatbot draws on a public domain text. It provides general guidance on financial wellness and does not constitute personalized financial advice.*

In [1]:
# Loading the text file.

def load_text_file(filename):
    # 'utf-8-sig' reads plain UTF-8 and also drops a leading byte-order mark if the file has one
    with open(filename, 'r', encoding='utf-8-sig') as file:
        text = file.read()
    return text

# Loading and displaying a preview of the text.
text_data = load_text_file("Clason-RichestManInBabylon.txt")
print(text_data[:1000])  # first 1000 characters for reference

﻿“A Classic From  
The Diamond’s Mine Library”  
The Richest  Man In Babylon 
1926  
George S. Clason
1  
Brought To You By  
http://TheDiamondsMine.com 
The Richest Man In Babylon  
1926  
Public Domain Notice  
This classic writing compliments of The Diamond’s Mine Online Library. It is  public domain and may be distributed freely. 
  

2  
Brought To You By  
http://TheDiamondsMine.com 
Index  
About The Author ...................................................... 6 Foreword ................................................................ 7 An Historical Sketch of Babylon ..................................... 8 The Man Who Desired Gold..........................................14 The Richest Man In Babylon.........................................20 Seven Cures For a Lean Purse ......................................30 
The First Cure .........................................................34 Start thy purse to fattening.................................................... 34 
The Sec
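
The preview above shows material that is not part of the book's prose: a repeated "Brought To You By" page header, the site URL, bare page numbers, and index entries with dot leaders. A minimal sketch of stripping such lines before tokenization follows. It assumes these patterns hold for the rest of the file, and `strip_page_furniture` is a hypothetical helper that is not part of the original notebook.

```python
import re

# Header lines seen in the preview; assumed to repeat on every page
PAGE_HEADERS = {"Brought To You By", "http://TheDiamondsMine.com"}

def strip_page_furniture(text):
    """Drop lines that look like page headers, page numbers, or index entries."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in PAGE_HEADERS:
            continue  # repeated page header
        if stripped.isdigit():
            continue  # bare page number (this also drops a year that sits alone on a line)
        if re.search(r"\.{4,}", stripped):
            continue  # index entry with dot leaders
        kept.append(line)
    return "\n".join(kept)

sample = (
    "2  \n"
    "Brought To You By  \n"
    "http://TheDiamondsMine.com \n"
    "Index  \n"
    "Foreword ................ 7\n"
    "Start thy purse to fattening.\n"
)
cleaned_sample = strip_page_furniture(sample)
print(cleaned_sample)
```

The rules are deliberately conservative. It may be worth checking a few pages of the real file by eye before relying on them, since a digit-only line could occasionally be real content.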

# Text Processing
Split the text into sentences, then lowercase each sentence, drop punctuation and stopwords, and lemmatize the remaining words.

In [3]:
import re
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import WordNetLemmatizer
import nltk

nltk.download("punkt_tab")
nltk.download("stopwords")
nltk.download("wordnet")

def preprocess_text(text):
    # Split text into sentences
    sentences = sent_tokenize(text)
    
    # Set up the lemmatizer and stopword set once, outside the loop
    lemmatizer = WordNetLemmatizer()
    stop_words = set(stopwords.words("english"))
    cleaned_sentences = []
    
    for sentence in sentences:
        # Lowercase and tokenize, keep alphabetic non-stopword tokens, then lemmatize them
        words = word_tokenize(sentence.lower())
        words = [lemmatizer.lemmatize(word) for word in words if word.isalpha() and word not in stop_words]
        cleaned_sentences.append(" ".join(words))
    
    # The lists are index-aligned: cleaned_sentences[i] is the cleaned form of sentences[i]
    return sentences, cleaned_sentences

# Preprocess the loaded text data
original_sentences, processed_sentences = preprocess_text(text_data)


[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/kingoriwangui/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/kingoriwangui/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/kingoriwangui/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
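
`preprocess_text` returns two parallel lists, and the matching step later uses an index found in the cleaned list to look up the original sentence. Any filtering therefore has to remove entries from both lists together. Cleaned sentences can come out empty, for example when a "sentence" holds only digits or stopwords. The sketch below drops those pairs while keeping the lists aligned. `drop_empty_pairs` is a hypothetical helper, and the toy lists stand in for `original_sentences` and `processed_sentences`.

```python
def drop_empty_pairs(original, cleaned):
    """Remove sentence pairs whose cleaned form is empty, keeping both lists aligned."""
    kept_original, kept_cleaned = [], []
    for raw, clean in zip(original, cleaned):
        if clean.strip():  # keep only pairs with at least one surviving token
            kept_original.append(raw)
            kept_cleaned.append(clean)
    return kept_original, kept_cleaned

# Toy data standing in for original_sentences and processed_sentences
toy_original = ["2", "Start thy purse to fattening.", "The First Cure"]
toy_cleaned = ["", "start thy purse fattening", "first cure"]

kept_original, kept_cleaned = drop_empty_pairs(toy_original, toy_cleaned)
print(kept_original)
print(kept_cleaned)
```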


# Defining the Similarity Function
This function finds the most relevant response by ranking the sentences by their TF-IDF cosine similarity to the user's query.

In [4]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def get_most_relevant_sentence(user_query, sentences, processed_sentences):
    # Vectorize both user query and text sentences
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform([user_query] + processed_sentences)
    
    # Calculate cosine similarity between the user query and all sentences
    similarities = cosine_similarity(vectors[0:1], vectors[1:]).flatten()
    
    # Find the most similar sentence index
    most_similar_index = similarities.argmax()
    return sentences[most_similar_index]  # Return original sentence for better readability
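
To make the scoring step concrete, here is a small worked example of cosine similarity in plain Python. It is a simplification: it uses raw term counts, whereas `TfidfVectorizer` also down-weights terms that appear in many sentences. The `cosine` helper and the toy sentences are illustrative and not part of the original notebook.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two whitespace-tokenized strings, using raw term counts."""
    va, vb = Counter(a.split()), Counter(b.split())
    # Dot product over the terms the two strings share
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

query = "save money"
toy_sentences = [
    "save part of all money earned",
    "babylon was a wealthy city",
]

scores = [cosine(query, s) for s in toy_sentences]
best = max(range(len(scores)), key=scores.__getitem__)
print(scores)
print(toy_sentences[best])
```

The first sentence shares two terms with the query and scores 2 / (√2 · √6) ≈ 0.577. The second shares none and scores 0, so the first is selected.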

# Defining the Chatbot Function
This function combines the components above and returns a response based on the user’s input.

In [5]:
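def chatbot_response(user_input):
    # Clean the query the same way as the corpus so lemmatized tokens line up
    _, cleaned_query = preprocess_text(user_input)
    query = " ".join(cleaned_query) or user_input  # fall back to raw input if cleaning removes everything
    response = get_most_relevant_sentence(query, original_sentences, processed_sentences)
    return response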
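
One edge case is worth handling. When no sentence shares a term with the query, every similarity is 0 and `argmax` returns index 0, so the chatbot replies with the first sentence of the file. The sketch below shows one possible fallback: return a fixed message when the best score is below a cutoff. The `pick_answer` helper, the `0.1` threshold, and the fallback wording are all assumptions chosen for illustration, and the threshold would need tuning against real queries.

```python
FALLBACK = "I could not find a relevant passage. Try rephrasing your question."

def pick_answer(similarities, sentences, threshold=0.1, fallback=FALLBACK):
    """Return the best-scoring sentence, or a fallback message if nothing scores well."""
    if len(similarities) == 0:
        return fallback
    # Index of the highest score; ties resolve to the first occurrence, like argmax
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    if similarities[best] < threshold:
        return fallback
    return sentences[best]

toy_sentences = ["Start thy purse to fattening.", "Control thy expenditures."]

print(pick_answer([0.0, 0.0], toy_sentences))    # no overlap with the query
print(pick_answer([0.05, 0.62], toy_sentences))  # second sentence is a clear match
```

In the notebook, `similarities` would be the array computed inside `get_most_relevant_sentence`. Plain lists are used here so the example runs on its own.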