# Fashbot

## Abstract
Fashbot is an AI-driven fashion recommendation system that analyzes user preferences and trends to generate personalized outfit suggestions. By leveraging machine learning and natural language processing (NLP), the model curates style recommendations based on factors such as season, occasion, and current fashion trends.

### **Objective**
To develop an intelligent fashion assistant that provides tailored outfit recommendations, helping users navigate fashion choices effortlessly.

### **Problem Statement**
Choosing the right outfit can be time-consuming and overwhelming due to the vast array of fashion choices available. Traditional recommendation systems often lack personalization and fail to adapt to individual style preferences. Fashbot addresses this gap by using machine learning to deliver dynamic, context-aware fashion recommendations.

### **Key Features**
- AI-powered fashion recommendation system  
- Personalized outfit suggestions based on user preferences  
- Trend-aware recommendations using NLP and data retrieval  
- Scalable and adaptable to different fashion categories  

This implementation can be extended by integrating e-commerce platforms, user feedback loops, and real-time fashion trend analysis to enhance recommendation accuracy.


In [1]:
# Installing required libraries:
# - 'streamlit' for building the web interface of the chatbot
# - 'praw' for accessing and interacting with Reddit's API to gather fashion-related data
# - 'nltk' for natural language processing tasks (e.g., text preprocessing and tokenization)
# - 'transformers' and 'torch' for the GPT-2 text-generation pipeline that app.py imports
#   (often preinstalled on hosted notebooks, in which case pip reports them as already satisfied)
# - 'pyngrok' for creating a secure tunnel to expose the app on the web (useful for deployment)

!pip install streamlit praw nltk transformers torch
!pip install pyngrok

Collecting streamlit
  Downloading streamlit-1.43.2-py2.py3-none-any.whl.metadata (8.9 kB)
Collecting praw
  Downloading praw-7.8.1-py3-none-any.whl.metadata (9.4 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl.metadata (44 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Collecting prawcore<3,>=2.4 (from praw)
  Downloading prawcore-2.4.0-py3-none-any.whl.metadata (5.0 kB)
Collecting update_checker>=0.18 (from praw)
  Downloading update_checker-0.18.0-py3-none-any.whl.metadata (2.3 kB)
Downloading streamlit-1.43.2-py2.py3-none-any.whl (9.7 MB)
Downloading praw-7.8.1-py3-none-any.whl (189 kB)

In [2]:
%%writefile app.py
"""
Fashion Trend Chatbot
---------------------
This app does the following:
1. Connects to Reddit using PRAW.
2. Fetches posts from one or more fashion-related subreddits that mention a given keyword.
3. Analyzes how many posts mention the keyword in the last day, week, and month.
4. Extracts context snippets (5 words before and after) for each occurrence.
5. Reports a trend status message based on thresholds.
6. Provides an additional function to list overall trending keywords (that appear at least 15 times) in the past 2 days.
7. Has a chat section that uses a GPT-2 LLM for general conversation.
"""

import streamlit as st
import praw
import nltk
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from collections import Counter
from datetime import datetime, timedelta
from transformers import pipeline

# --------------------------
# 1. Download NLTK Resources
# --------------------------
# Downloading necessary NLTK datasets for tokenization and stopword removal.
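nltk.download('punkt', quiet=True)  # For word tokenization
nltk.download('punkt_tab', quiet=True)  # Tokenizer tables that recent NLTK releases look up for word_tokenize
nltk.download('stopwords', quiet=True)  # For filtering common stopwords from the text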

# --------------------------
# 2. Reddit API Setup
# --------------------------
# Reddit API credentials setup
REDDIT_CLIENT_ID = "h**"
REDDIT_CLIENT_SECRET = "x**"
REDDIT_USER_AGENT = "f**"

# Create a Reddit instance using PRAW (Python Reddit API Wrapper)
reddit = praw.Reddit(
    client_id=REDDIT_CLIENT_ID,
    client_secret=REDDIT_CLIENT_SECRET,
    user_agent=REDDIT_USER_AGENT
)

# --------------------------
# 3. Function Definitions
# --------------------------

# 3a. Function to fetch posts using Reddit search.
@st.cache_data(show_spinner=False, ttl=600)  # Expire cached results after 10 minutes so counts stay fresh
def fetch_keyword_posts(keyword, subreddits=("fashion",), time_filter="month", limit=100):
    """
    Searches for posts in each subreddit that mention 'keyword' using Reddit's search API.
    'time_filter' allows you to search by time ranges such as "day", "week", "month", "year", or "all".
    Returns a list of tuples: (created_utc, title, selftext).
    """
    posts = []
    for subreddit in subreddits:
        # Searching in the specified subreddit for posts mentioning the keyword
        for submission in reddit.subreddit(subreddit).search(keyword, sort="new", time_filter=time_filter, limit=limit):
            posts.append((submission.created_utc, submission.title, submission.selftext))
    return posts

# 3b. Function to fetch recent posts for overall trending keywords.
@st.cache_data(show_spinner=False, ttl=600)  # Expire cached results after 10 minutes so new posts are picked up
def fetch_recent_posts(subreddits=("fashion",), days=2, limit=200):
    """
    Fetches posts from each subreddit using .new() and filters those created within the last 'days' days.
    This helps identify trending topics by looking at the most recent posts.
    Returns a list of tuples: (created_utc, title, selftext).
    """
    posts = []
    now = datetime.utcnow()
    time_threshold = now - timedelta(days=days)  # Filters posts within the last 'days' days
    for subreddit in subreddits:
        for submission in reddit.subreddit(subreddit).new(limit=limit):
            post_time = datetime.utcfromtimestamp(submission.created_utc)
            if post_time > time_threshold:
                posts.append((submission.created_utc, submission.title, submission.selftext))
    return posts

# 3c. Function to analyze timestamps of posts.
# Not cached: the counts depend on the current time, so a cached result would go stale.
def analyze_post_times(posts):
    """
    Given a list of posts, counts how many were made in the last day, week, and month.
    This helps in determining how recent or old a particular trend is.
    Returns a dictionary with counts for "day", "week", and "month".
    """
    now = datetime.utcnow()
    counts = {"day": 0, "week": 0, "month": 0}
    for created_utc, title, selftext in posts:
        post_time = datetime.utcfromtimestamp(created_utc)
        delta = now - post_time
        if delta < timedelta(days=1):
            counts["day"] += 1
        if delta < timedelta(weeks=1):
            counts["week"] += 1
        if delta < timedelta(days=30):
            counts["month"] += 1
    return counts

# 3d. Function to extract keyword context snippets from text.
def extract_keyword_snippets(text, keyword, window=5):
    """
    Extracts snippets of context around the keyword occurrence in the text.
    Returns snippets that include 'window' number of words before and after the keyword.
    """
    pattern = r'(?:\S+\s+){0,' + str(window) + r'}\b' + re.escape(keyword) + r'\b(?:\s+\S+){0,' + str(window) + r'}'
    matches = re.findall(pattern, text, flags=re.IGNORECASE)
    return matches

# 3e. Function to get trend status for a given keyword.
def get_trend_status(keyword, subreddits, time_filter="month"):
    """
    Analyzes how many posts mention the keyword over the last 'time_filter' period (day, week, month).
    Uses thresholds to determine if a term is trending and extracts context snippets.
    Returns a status message with trend details.
    """
    posts = fetch_keyword_posts(keyword, subreddits=subreddits, time_filter=time_filter, limit=100)
    if not posts:
        return f"The keyword '{keyword}' was not mentioned on Reddit in the selected time period."

    counts = analyze_post_times(posts)
    message = f"For '{keyword}' (search window: {time_filter}):\n"
    message += f"- Last day: {counts['day']} mentions\n"
    message += f"- Last week: {counts['week']} mentions\n"
    message += f"- Last month: {counts['month']} mentions\n\n"

    # Determine trend status based on mention counts.
    # Only the day and week thresholds decide the status, so the message states those.
    threshold_day = 5
    threshold_week = 10
    if counts["day"] > threshold_day or counts["week"] > threshold_week:
        message += "This term appears to be trending!\n\n"
    else:
        message += "This term is not currently trending. A term counts as trending here when it has more than "
        message += f"{threshold_day} mentions in the last day or more than {threshold_week} in the last week.\n\n"

    # Extract context snippets from posts.
    snippet_list = []
    total_occurrences = 0
    for created_utc, title, selftext in posts:
        title_snippets = extract_keyword_snippets(title, keyword, window=5)
        body_snippets = extract_keyword_snippets(selftext, keyword, window=5)
        occurrences = len(title_snippets) + len(body_snippets)
        total_occurrences += occurrences
        snippet_list.extend(title_snippets)
        snippet_list.extend(body_snippets)

    unique_snippets = list(dict.fromkeys(snippet_list))  # Remove duplicates
    message += f"Total keyword occurrences found: {total_occurrences}\n\n"
    if unique_snippets:
        message += "What people are saying:\n"
        for snippet in unique_snippets[:5]:  # Show up to 5 unique snippets
            message += f"\"{snippet}\"\n"
    else:
        message += "No context excerpts could be extracted."
    return message

# 3f. Function to list overall trending keywords over the past 2 days.
def find_trending_keywords(subreddits, days=2, min_occurrences=15):
    """
    Fetches recent posts and extracts keywords mentioned at least 'min_occurrences' times.
    Helps identify the most mentioned keywords across subreddits in the past 'days' days.
    """
    posts = fetch_recent_posts(subreddits=subreddits, days=days, limit=200)
    if not posts:
        return "No posts found in the selected time period."

    combined_text = " ".join(title + " " + selftext for (_, title, selftext) in posts)
    tokens = word_tokenize(combined_text.lower())
    stop_words = set(stopwords.words('english'))  # Build the set once instead of once per token
    filtered_tokens = [word for word in tokens if word.isalpha() and word not in stop_words]
    freq = Counter(filtered_tokens)
    trending_words = [(word, count) for word, count in freq.items() if count >= min_occurrences]
    trending_words.sort(key=lambda x: x[1], reverse=True)
    if trending_words:
        message = f"Trending keywords (mentioned at least {min_occurrences} times in the past {days} days):\n"
        for word, count in trending_words:
            message += f"- {word}: {count} times\n"
        return message
    else:
        return f"No keywords were mentioned at least {min_occurrences} times in the past {days} days."

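# 3g. Set up an LLM for chat responses using GPT-2.
@st.cache_resource(show_spinner=False)
def load_llm():
    """Loads the GPT-2 pipeline once and reuses it across Streamlit reruns."""
    return pipeline("text-generation", model="gpt2")

llm = load_llm()

def generate_llm_response(prompt):
    """
    Uses GPT-2 to generate responses for user inputs in the chatbot.
    This allows for conversational interactions with the bot.
    Returns only the generated continuation, without echoing the prompt.
    """
    response = llm(prompt, max_new_tokens=100, num_return_sequences=1, return_full_text=False)
    return response[0]['generated_text'].strip()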

# --------------------------
# 4. Streamlit App Interface
# --------------------------
st.title("What's Trending on Fashion Reddit?")
st.write("""
This chatbot checks whether a fashion-related keyword is trending on Reddit by counting recent mentions, and shows short excerpts for context.
Use the "List Trending Keywords" button to see which words were mentioned at least 15 times in the selected subreddits over the past 2 days.
""")

# Option to choose time filter for trend analysis (for keyword search)
time_filter = st.selectbox("Select time period for trend analysis (search):", ["day", "week", "month", "year", "all"])

# Input for subreddits (comma-separated; default includes several fashion-related communities)
subreddit_input = st.text_input("Enter subreddits to search (comma-separated):", "fashion, mensfashion, womensfashion, streetwear")
subreddits = [s.strip() for s in subreddit_input.split(",") if s.strip()]

# User input for keyword (for direct trend analysis)
keyword = st.text_input("Enter a fashion keyword for trend analysis:")

# Button to check trend for a given keyword (using search)
if st.button("Check Trend for Keyword"):
    if not keyword.strip():
        st.write("Please enter a keyword.")
    else:
        with st.spinner("Fetching data..."):
            result = get_trend_status(keyword, subreddits, time_filter=time_filter)
        st.write(result)

# Button to list trending keywords (based on overall frequency) for the past 2 days.
if st.button("List Trending Keywords (Past 2 Days)"):
    with st.spinner("Fetching recent posts and analyzing..."):
        trending_keywords_message = find_trending_keywords(subreddits, days=2, min_occurrences=15)
    st.write(trending_keywords_message)

# Chat Section for Additional Conversation
st.write("### Chat with the Trend Bot")
user_input = st.text_input("Enter your message:")
if user_input:
    st.write("**You:**", user_input)
    prompt = f"User: {user_input}\nBot:"
    llm_reply = generate_llm_response(prompt)
    st.write("**Trend Bot:**", llm_reply)

Writing app.py
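
The context-snippet step in `app.py` relies on a single regular expression, so it can be checked on its own, without Reddit or Streamlit. The sketch below copies the `extract_keyword_snippets` logic into a standalone cell; the sample sentence is invented for illustration.

```python
import re

def extract_keyword_snippets(text, keyword, window=5):
    """Return each keyword occurrence with up to `window` words of context on both sides."""
    pattern = (
        r'(?:\S+\s+){0,' + str(window) + r'}'      # up to `window` words before
        r'\b' + re.escape(keyword) + r'\b'          # the keyword as a whole word
        r'(?:\s+\S+){0,' + str(window) + r'}'       # up to `window` words after
    )
    return re.findall(pattern, text, flags=re.IGNORECASE)

# Invented sample text, not real Reddit data.
sample = "I finally bought a linen blazer for summer and I love it"
snippets = extract_keyword_snippets(sample, "blazer", window=2)
print(snippets)  # ['a linen blazer for summer']
```

Because the keyword is wrapped in `\b` word boundaries, a search for "blazer" does not match "blazers". If plural forms should count, the pattern would need to be loosened.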

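The trending-keyword step counts word frequencies after dropping stopwords and keeps words that reach a minimum count. Here is a minimal sketch of that idea that runs without downloading NLTK data. The `trending_words` helper, the tiny `STOPWORDS` set, the regex tokenizer, and the sample titles are all stand-ins for illustration; `app.py` itself uses `word_tokenize` and NLTK's English stopword list.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); app.py uses NLTK's English list.
STOPWORDS = {"a", "an", "and", "the", "i", "is", "it", "my", "for", "with", "this", "to", "of"}

def trending_words(texts, min_occurrences=2):
    """Return (word, count) pairs seen at least `min_occurrences` times, most frequent first."""
    tokens = re.findall(r"[a-z]+", " ".join(texts).lower())
    freq = Counter(t for t in tokens if t not in STOPWORDS)
    hits = [(word, count) for word, count in freq.items() if count >= min_occurrences]
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Invented post titles, not real Reddit data.
posts = [
    "Linen blazer for summer",
    "Is this blazer too big?",
    "My summer linen haul",
    "Blazer and loafers",
]
result = trending_words(posts)
print(result)  # [('blazer', 3), ('linen', 2), ('summer', 2)]
```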

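The day/week/month counting in `analyze_post_times` uses `datetime.utcnow()` and `datetime.utcfromtimestamp()`, which are deprecated as of Python 3.12. The sketch below shows the same bucketing with timezone-aware datetimes. The `count_recent` name and its `now` parameter are hypothetical; `now` is added only so the example can be run against a fixed clock.

```python
from datetime import datetime, timedelta, timezone

def count_recent(timestamps, now=None):
    """Count Unix timestamps (seconds, UTC) that fall within the last day, week, and 30 days."""
    now = now or datetime.now(timezone.utc)
    counts = {"day": 0, "week": 0, "month": 0}
    for ts in timestamps:
        age = now - datetime.fromtimestamp(ts, tz=timezone.utc)
        # The buckets are cumulative: a post from today also counts for the week and month.
        if age < timedelta(days=1):
            counts["day"] += 1
        if age < timedelta(weeks=1):
            counts["week"] += 1
        if age < timedelta(days=30):
            counts["month"] += 1
    return counts

# Fixed reference time and invented post ages, so the result is reproducible.
fixed_now = datetime(2025, 3, 20, 12, 0, tzinfo=timezone.utc)
ages = [timedelta(hours=3), timedelta(days=3), timedelta(days=20), timedelta(days=45)]
stamps = [(fixed_now - age).timestamp() for age in ages]
bucket_counts = count_recent(stamps, now=fixed_now)
print(bucket_counts)  # {'day': 1, 'week': 2, 'month': 3}
```
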
In [3]:
# pyngrok (installed in the first cell) creates a secure tunnel to the local server for external access

# Importing the ngrok module from pyngrok to set up the tunnel
from pyngrok import ngrok

# Set your ngrok auth token (replace with your own token from ngrok dashboard)
# This is required to authenticate and access ngrok's features.
ngrok.set_auth_token("2************************")

# Create a public URL that routes to the local streamlit app running on port 8501
# This allows you to access the app from anywhere via the ngrok tunnel.
public_url = ngrok.connect(addr=8501, proto="http")

# Print the public URL to the console so it can be accessed externally
print("Streamlit URL:", public_url)

# Launch the Streamlit app in the background using the system's shell
# This command runs the app without blocking the execution of the script.
get_ipython().system_raw('streamlit run app.py &')


Streamlit URL: NgrokTunnel: "https://fcca-34-56-90-159.ngrok-free.app" -> "http://localhost:8501"
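
The Reddit credentials and the ngrok token appear as literals in the cells above. One common alternative is to read them from environment variables so they stay out of the notebook. This is a minimal sketch of that approach, not part of the original notebook. The `read_secret` helper and the variable name `NGROK_AUTH_TOKEN` are assumptions chosen for illustration.

```python
import os

def read_secret(name, env=None):
    """Return the value of environment variable `name`, or raise if it is missing or empty."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return value

# Demonstration with a stand-in mapping, so no real secret is read or printed.
demo_env = {"NGROK_AUTH_TOKEN": "example-token"}
token = read_secret("NGROK_AUTH_TOKEN", env=demo_env)
print(len(token))  # 13
```

In the notebook, the literal in `ngrok.set_auth_token(...)` could then be replaced by `read_secret("NGROK_AUTH_TOKEN")`, and the same helper could supply the three Reddit values in `app.py`.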
