<a href="https://colab.research.google.com/github/Aidin12/gpt4-smart-chatbot/blob/main/Scraper_GPT_Small_Business_Chatbot_Project_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## <font color='DeepPink'> Building Your Own Smart Chatbot: Engaging Customer Service with GPT-4 and OpenAI API </font>
# Learning Objectives:  
<font color='lime'>

1. Learn how to use large language models (LLMs) to create a custom chatbot for a small business.  
2. Understand how to extract business-specific data and use it to fine-tune a GPT-based model.  
3. Learn about the benefits of LLM-based chatbots compared to rule-based approaches.  </font>

# Technologies Used:
<font color='DeepSkyBlue'>  
 - Python for general programming.
 - OpenAI API for accessing GPT-4 for language generation.
 - Pandas for data handling.
 - Requests and BeautifulSoup for web scraping.
 - Flask for creating a lightweight web API.
 - Google Colab as the working environment.
 - PyTorch (`torch`) to check for GPU availability so computations can run on the GPU when one is present.
 </font>

In [None]:
# Colab requirements: Install necessary libraries
!pip install flask
!pip install beautifulsoup4



In [None]:
# Check if a GPU (CUDA) is available for faster computations; otherwise, use the CPU
# A GPU is recommended for faster generation when a language model runs locally in the notebook.
import torch
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    print(f"You're using: {gpu_name}")
else:
    print("No GPU detected.")
device = "cuda" if torch.cuda.is_available() else "cpu"

You're using: Tesla T4


In [None]:
# Import libraries
import pandas as pd
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from flask import Flask, request, jsonify
import re

# Step 1: Data Collection
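A web scraper gathers the business data for the chatbot, such as product data and delivery information. The project targets the CBD-UK website; the cells below demonstrate the scraper on the Books to Scrape site (http://books.toscrape.com/), collecting each book's title, stock status, and price.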
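Before scraping any site, it can be worth checking whether its `robots.txt` permits the pages you plan to fetch. The sketch below uses only Python's standard `urllib.robotparser` module. The helper name `is_allowed`, the user agent `my-scraper`, the `example.com` URLs, and the sample rules are all hypothetical and chosen for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used only for illustration
sample_robots = """\
User-agent: *
Disallow: /checkout/
"""

print(is_allowed(sample_robots, "my-scraper", "https://example.com/products/1"))
print(is_allowed(sample_robots, "my-scraper", "https://example.com/checkout/cart"))
```

In a real run, the rules would come from the target site's own `robots.txt` (for example, by downloading it with `requests`) rather than from a hard-coded string.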

In [None]:
# Mount Google Drive so files can be saved to and read from it
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Create the 'CBD' folder in Google Drive (no error if it already exists)
import os
os.makedirs('/content/drive/My Drive/CBD', exist_ok=True)

In [None]:
# Import necessary libraries for web scraping
import requests
from bs4 import BeautifulSoup

In [None]:

# Step 1: Scrape Book Information from Books to Scrape Website
def scrape_books_to_scrape(url):
    """
    Scrapes book title, stock status, and price from the Books to Scrape website.
    Args:
    - url (str): The URL of the website to scrape.

    Returns:
    - List of dictionaries containing book data.
    """
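    # Send a request to the website to get the content (time out rather than hang)
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return []

    # Check if the request was successful
    if response.status_code != 200:
        print(f"Failed to retrieve the webpage (status code {response.status_code}).")
        return []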

    # Parse the website's content using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract book data
    books = []

    # Locate all book containers within the page using appropriate classes
    book_containers = soup.find_all('article', class_='product_pod')

    if not book_containers:
        print("No book containers found. Please check the HTML structure or the URL.")
    else:
        for book_container in book_containers:
            # Extract book title
            book_title = book_container.h3.a['title']

            # Extract stock information
            stock_tag = book_container.find('p', class_='instock availability')
            in_stock = 'No'
            if stock_tag:
                if 'in stock' in stock_tag.text.lower():
                    in_stock = 'Yes'

            # Extract price information
            price_tag = book_container.find('p', class_='price_color')
            price = price_tag.text.strip() if price_tag else 'No price available'

            # Append book details to the list
            books.append({
                'Title': book_title,
                'InStock': in_stock,
                'Price': price
            })

    return books

# Define the URL of the Books to Scrape website
url = "http://books.toscrape.com/"

# Scrape the book data
book_data = scrape_books_to_scrape(url)

# Step 2: Save the data to a CSV file on Google Drive
# Make sure Google Drive is mounted (prints a notice if it already is)
drive.mount('/content/drive')

# Create a folder in Google Drive to save the CSV file
os.makedirs('/content/drive/My Drive/BooksToScrape', exist_ok=True)

# Convert the scraped book data to a DataFrame
books_df = pd.DataFrame(book_data)

# Print a sample of the scraped book data
print("Sample of scraped book data:")
print(books_df.head())

# Save the DataFrame to a CSV file in the created folder in Google Drive
csv_file_path = '/content/drive/My Drive/BooksToScrape/books_to_scrape_data.csv'
if not books_df.empty:
    books_df.to_csv(csv_file_path, index=False)
    print(f"Data saved successfully to {csv_file_path}")
else:
    print("No data scraped. CSV file not saved.")

# Step 3: Read the data from CSV and print
try:
    loaded_books_df = pd.read_csv(csv_file_path)
    print("Loaded book data from CSV:")
    print(loaded_books_df.head())
except FileNotFoundError:
    print(f"File not found: {csv_file_path}. Please check the file path and try again.")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Sample of scraped book data:
                                   Title InStock   Price
0                   A Light in the Attic     Yes  £51.77
1                     Tipping the Velvet     Yes  £53.74
2                             Soumission     Yes  £50.10
3                          Sharp Objects     Yes  £47.82
4  Sapiens: A Brief History of Humankind     Yes  £54.23
Data saved successfully to /content/drive/My Drive/BooksToScrape/books_to_scrape_data.csv
Loaded book data from CSV:
                                   Title InStock   Price
0                   A Light in the Attic     Yes  £51.77
1                     Tipping the Velvet     Yes  £53.74
2                             Soumission     Yes  £50.10
3                          Sharp Objects     Yes  £47.82
4  Sapiens: A Brief History of Humankind     Yes  £54.23
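
The `Price` column above is stored as text (for example `£51.77`), so it cannot be sorted or averaged numerically as it stands. A minimal sketch of one way to convert it, assuming prices always contain a plain decimal number; the helper name `parse_price` and the sample list are hypothetical and not part of the original notebook:

```python
import re
from typing import Optional


def parse_price(price_text: str) -> Optional[float]:
    """Extract the numeric part of a price string such as '£51.77'.

    Returns None when the text contains no number, e.g. the scraper's
    'No price available' placeholder.
    """
    match = re.search(r"\d+(?:\.\d+)?", price_text)
    return float(match.group()) if match else None


# Sample values in the same shape as the scraped 'Price' column
sample_prices = ["£51.77", "£53.74", "No price available"]
parsed = [parse_price(p) for p in sample_prices]
print(parsed)  # [51.77, 53.74, None]
```

With the DataFrame loaded above, the same helper could be applied with `loaded_books_df["Price"].map(parse_price)` and stored in a new column of your choosing.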
