# Python Web Scraping and GUI Programming
This notebook covers web scraping, GUI programming, and web frameworks, with real-life use cases, best practices, and code examples.

## 1. Web Scraping
**Definition:** Web scraping is the process of extracting data from websites using code.

### Importing Required Libraries for Web Scraping

**Introduction:**
To perform web scraping, you need a library for HTTP requests (`requests`) and one for HTML parsing (`BeautifulSoup` from the `bs4` package).

**Real-life use case:**
Automating data collection from websites for analysis or reporting.

**What the code does:**
The next code cell imports the necessary libraries for web scraping.

In [None]:
import requests
from bs4 import BeautifulSoup
import time
import re
from pprint import pprint

### Basic Web Scraping Example: Extracting Title and Headings

**Introduction:**
A simple web scraping task involves fetching a web page and extracting its title and main heading.

**Real-life use case:**
Collecting article titles and headings from news websites.

**What the code does:**
The next code cell fetches a web page and extracts the title and main heading.

In [None]:
url = 'https://www.example.com'
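try:
    # A timeout keeps the call from hanging indefinitely on an unresponsive server
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        # soup.title is None when the page has no <title> tag
        page_title = soup.title.string if soup.title else 'Not found'
        print(f"Page title: {page_title}")
        main_heading = soup.find('h1')
        print(f"Main heading: {main_heading.text if main_heading else 'Not found'}")
    else:
        print(f"Failed to retrieve page: Status code {response.status_code}")
except requests.RequestException as e:
    # Covers connection errors, timeouts, and invalid URLs raised by requests
    print(f"Error: {e}")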

### Extracting Paragraphs from a Web Page

**Introduction:**
You can extract all paragraph text from a web page for further analysis.

**Real-life use case:**
Gathering article content for text analysis or summarization.

**What the code does:**
The next code cell extracts and prints all paragraph text from the web page.

In [None]:
try:
    # Reuses the url defined in the previous cell; the timeout avoids hanging forever
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        paragraphs = soup.find_all('p')
        print("\nParagraph content:")
        for p in paragraphs:
            print(f"- {p.text.strip()}")
    else:
        print(f"Failed to retrieve page: Status code {response.status_code}")
except requests.RequestException as e:
    print(f"Error: {e}")

### Using CSS Selectors in BeautifulSoup

**Introduction:**
CSS selectors allow you to target specific elements in the HTML for extraction.

**Real-life use case:**
Extracting headlines, links, or table data from structured web pages.

**What the code does:**
The next code cell demonstrates how to use CSS selectors with BeautifulSoup.

In [None]:
# Example usage of CSS selectors (demonstrative, not executed)
# headlines = soup.select('h2.headline')
# user_links = soup.select('a.user-link')
# table_rows = soup.select('table.data-table tr')
# first_column = soup.select('table.data-table tr > td:first-child')
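
The commented selectors above can be tried on a small inline document. The following is a minimal sketch; the HTML snippet and its class names are invented for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example
html = """
<h2 class="headline">Markets rally</h2>
<h2 class="headline">Storm warning issued</h2>
<h2>Unrelated heading</h2>
<a class="user-link" href="/users/ann">Ann</a>
<table class="data-table">
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag plus class: only <h2> elements that carry the "headline" class
headlines = [h.get_text() for h in soup.select("h2.headline")]

# select_one returns the first match, or None when nothing matches
user_link = soup.select_one("a.user-link")

# Child combinator plus :first-child: the first cell of every table row
first_column = [
    td.get_text() for td in soup.select("table.data-table tr > td:first-child")
]

print(headlines)
print(user_link["href"])
print(first_column)
```

`select` always returns a list (possibly empty), while `select_one` returns a single element or `None`, so check for `None` before reading attributes on real pages.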

### Finding Elements by Attributes

**Introduction:**
You can find elements by their attributes, such as class, id, or custom data attributes.

**Real-life use case:**
Extracting links, user data, or posts from a web page with specific attributes.

**What the code does:**
The next code cell shows how to find elements by attributes using BeautifulSoup.

In [None]:
# Example usage of finding by attributes (demonstrative, not executed)
# external_links = soup.find_all('a', attrs={'rel': 'external'})
# data_elements = soup.find_all(attrs={'data-type': 'user-content'})
# recent_posts = soup.find_all('div', attrs={'class': 'post', 'data-date-created': lambda x: x and '2023' in x})
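
A runnable variant of the attribute lookups above, as a sketch on invented markup. The attribute names mirror the commented examples; the values are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example
html = """
<a href="https://other.example/" rel="external">Partner site</a>
<a href="/about">About</a>
<div class="post" data-date-created="2023-05-01">Spring update</div>
<div class="post" data-date-created="2022-11-15">Older update</div>
<div class="post">Undated update</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match on an attribute value
external_links = soup.find_all("a", attrs={"rel": "external"})

# A function as the attribute filter; it receives None when the attribute is missing
recent_posts = soup.find_all(
    "div",
    attrs={
        "class": "post",
        "data-date-created": lambda value: value is not None and value.startswith("2023"),
    },
)

external_hrefs = [a["href"] for a in external_links]
recent_titles = [div.get_text() for div in recent_posts]
print(external_hrefs)
print(recent_titles)
```

The explicit `value is not None` check matters because the filter is also called for elements that lack the attribute.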

### Navigating the DOM with BeautifulSoup

**Introduction:**
BeautifulSoup allows you to navigate the HTML tree to find parents, siblings, and children of elements.

**Real-life use case:**
Extracting related data from complex HTML structures.

**What the code does:**
The next code cell demonstrates DOM navigation methods.

In [None]:
# Example DOM navigation (demonstrative, not executed)
# element = soup.find('span', class_='highlight')
# parent_div = element.parent
# ancestor = element.find_parent('section')
# next_element = element.next_sibling
# prev_element = element.previous_sibling
# children = list(element.children)
# descendants = list(element.descendants)
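
The navigation attributes above can be checked on a compact inline document. This is a sketch on invented markup, written without whitespace between tags so that siblings are elements rather than whitespace text nodes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example
html = (
    '<section id="intro">'
    '<div class="box">'
    '<b>Before</b><span class="highlight">Target <i>text</i></span><em>After</em>'
    '</div>'
    '</section>'
)
soup = BeautifulSoup(html, "html.parser")
element = soup.find("span", class_="highlight")

parent_name = element.parent.name                   # direct parent tag
section_id = element.find_parent("section")["id"]   # nearest matching ancestor
prev_name = element.previous_sibling.name           # sibling just before
next_name = element.next_sibling.name               # sibling just after
children = list(element.children)                   # direct children only
descendants = list(element.descendants)             # children and everything nested inside them

print(parent_name, section_id, prev_name, next_name)
print(len(children), len(descendants))
```

On real pages, `next_sibling` and `previous_sibling` often return the whitespace between tags; `find_next_sibling()` and `find_previous_sibling()` skip text nodes and return the nearest tag instead.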

### Using Regular Expressions in Web Scraping

**Introduction:**
Regular expressions help you find elements or text matching specific patterns.

**Real-life use case:**
Extracting emails, image URLs, or headers from web pages.

**What the code does:**
The next code cell shows how to use regular expressions with BeautifulSoup.

In [None]:
# Example regex usage (demonstrative, not executed)
# emails = soup.find_all(string=re.compile(r'[\w\.-]+@[\w\.-]+'))  # string= replaces the deprecated text= argument
# images = soup.find_all('img', attrs={'src': re.compile(r'\.jpg$|\.jpeg$')})
# all_headers = soup.find_all(re.compile(r'^h[1-6]$'))
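
The three regex patterns above can be exercised on inline HTML. This is a sketch on invented markup; the email pattern is deliberately simple and is not a full address validator.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example
html = """
<h1>Contact</h1>
<p>Write to support@example.com for help.</p>
<h3>Gallery</h3>
<img src="/img/cat.jpg"><img src="/img/dog.png"><img src="/img/bird.jpeg">
"""
soup = BeautifulSoup(html, "html.parser")

email_pattern = re.compile(r"[\w.-]+@[\w.-]+\.\w+")
# string= matches whole text nodes, so the pattern is applied again
# to pull just the address out of each matching node
emails = [
    email_pattern.search(node).group(0)
    for node in soup.find_all(string=email_pattern)
]

# A compiled pattern as an attribute filter: src ending in .jpg or .jpeg
jpeg_sources = [img["src"] for img in soup.find_all("img", src=re.compile(r"\.jpe?g$"))]

# A compiled pattern as the tag-name filter: h1 through h6
header_names = [tag.name for tag in soup.find_all(re.compile(r"^h[1-6]$"))]

print(emails)
print(jpeg_sources)
print(header_names)
```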

### Simulated E-commerce Product Scraping Example

**Introduction:**
You can scrape product data from e-commerce sites for price comparison or market analysis.

**Real-life use case:**
Building a tool to compare prices and ratings of products across different websites.

**What the code does:**
The next code cell parses simulated HTML and extracts product information.

In [None]:
simulated_html = """
<div class="product-listing">
  <div class="product" id="prod-1234">
    <h2 class="product-title">Wireless Bluetooth Headphones</h2>
    <div class="product-price">$79.99</div>
    <div class="product-rating">4.5/5 (243 reviews)</div>
    <div class="product-stock">In Stock</div>
  </div>
  <div class="product" id="prod-5678">
    <h2 class="product-title">Smart Fitness Watch</h2>
    <div class="product-price">$149.99</div>
    <div class="product-rating">4.2/5 (187 reviews)</div>
    <div class="product-stock">Low Stock</div>
  </div>
  <div class="product" id="prod-9012">
    <h2 class="product-title">Bluetooth Portable Speaker</h2>
    <div class="product-price">$39.99</div>
    <div class="product-rating">4.7/5 (312 reviews)</div>
    <div class="product-stock">Out of Stock</div>
  </div>
</div>
"""
soup = BeautifulSoup(simulated_html, 'html.parser')
products = []
for product_elem in soup.select('.product'):
    product_id = product_elem.get('id')
    title = product_elem.select_one('.product-title').text
    price_text = product_elem.select_one('.product-price').text
    price = float(price_text.replace('$', ''))
    rating_text = product_elem.select_one('.product-rating').text
    rating_match = re.search(r'([\d\.]+)/5', rating_text)
    rating = float(rating_match.group(1)) if rating_match else None
    reviews_match = re.search(r'\((\d+) reviews\)', rating_text)
    reviews = int(reviews_match.group(1)) if reviews_match else 0
    stock_status = product_elem.select_one('.product-stock').text
    products.append({
        'id': product_id,
        'title': title,
        'price': price,
        'rating': rating,
        'reviews': reviews,
        'stock_status': stock_status
    })
pprint(products)

### Calculating Average Price and Rating

**Introduction:**
After extracting product data, you can analyze it, such as calculating averages.

**Real-life use case:**
Summarizing product data for business intelligence or reporting.

**What the code does:**
The next code cell calculates and prints the average price and rating of the products.

In [None]:
avg_price = sum(p['price'] for p in products) / len(products) if products else 0
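# Average only over products whose rating was parsed (rating is None otherwise)
ratings = [p['rating'] for p in products if p['rating'] is not None]
avg_rating = sum(ratings) / len(ratings) if ratings else 0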
print(f"Average price: ${avg_price:.2f}")
print(f"Average rating: {avg_rating:.1f}/5")

### Web Scraping Ethics and Best Practices

**Introduction:**
Ethical web scraping respects website rules and avoids overloading servers.

**Real-life use case:**
Building scrapers that are less likely to be blocked and are legally compliant.

**What the code does:**
The next code cell lists best practices for ethical web scraping.

In [None]:
ethics_tips = [
    "1. Always check the website's robots.txt file and terms of service",
    "2. Add delays between requests (don't overwhelm the server)",
    "3. Identify your scraper with appropriate User-Agent headers",
    "4. Cache results to avoid repeated requests",
    "5. Only collect the data you need",
    "6. Consider using the site's API if available",
    "7. Respect copyright and data ownership"
]
for tip in ethics_tips:
    print(tip)
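
Tips 2 and 4 (delays and caching) can be combined in a small wrapper. The following is a sketch under stated assumptions: `PoliteFetcher`, `min_delay`, and `fake_fetch` are hypothetical names introduced here, and the fetch function is injected so the example runs offline. In a real scraper it might be something like `lambda u: requests.get(u, timeout=10).text`.

```python
import time
from typing import Callable, Dict, Optional


class PoliteFetcher:
    """Wrap a fetch function with a minimum delay between calls and an in-memory cache."""

    def __init__(self, fetch: Callable[[str], str], min_delay: float = 1.0):
        self._fetch = fetch
        self._min_delay = min_delay
        self._cache: Dict[str, str] = {}
        self._last_call: Optional[float] = None  # start time of the previous real fetch

    def get(self, url: str) -> str:
        # Cached pages are returned immediately and cost the server nothing
        if url in self._cache:
            return self._cache[url]
        # Sleep just long enough to keep min_delay seconds between real fetches
        if self._last_call is not None:
            wait = self._min_delay - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
        self._last_call = time.monotonic()
        body = self._fetch(url)
        self._cache[url] = body
        return body


# Stand-in for a real HTTP call so the sketch runs without network access
calls = []

def fake_fetch(url: str) -> str:
    calls.append(url)
    return f"<html>{url}</html>"


fetcher = PoliteFetcher(fake_fetch, min_delay=0.2)
start = time.monotonic()
fetcher.get("https://www.example.com/a")
fetcher.get("https://www.example.com/a")  # served from the cache, no delay
fetcher.get("https://www.example.com/b")  # a real fetch, so it waits first
elapsed = time.monotonic() - start
print(f"{len(calls)} real fetches in {elapsed:.2f}s")
```

The cache here lives only for the lifetime of the object; a longer-running scraper would likely persist results to disk instead.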

### Respecting robots.txt in Web Scraping

**Introduction:**
The robots.txt file tells scrapers which parts of a website can be crawled.

**Real-life use case:**
Avoiding legal issues and being a responsible web citizen.

**What the code does:**
The next code cell demonstrates how to check robots.txt before scraping.

In [None]:
# Demonstrative code for respecting robots.txt (not executed)
# from urllib.robotparser import RobotFileParser
# def is_scraping_allowed(url, user_agent="MyScraperBot"):
#     from urllib.parse import urlparse
#     parsed_url = urlparse(url)
#     domain = f"{parsed_url.scheme}://{parsed_url.netloc}"
#     robots_url = f"{domain}/robots.txt"
#     rp = RobotFileParser()
#     rp.set_url(robots_url)
#     try:
#         rp.read()
#         return rp.can_fetch(user_agent, url)
#     except Exception as e:
#         print(f"Error reading robots.txt: {e}")
#         return False
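
To see how `can_fetch` behaves without contacting a server, `RobotFileParser.parse` accepts the lines of a robots.txt file directly. This is a sketch; the rules and the `MyScraperBot` agent name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())  # parse rules from memory instead of calling read()

allowed_public = rp.can_fetch("MyScraperBot", "https://www.example.com/articles/1")
allowed_private = rp.can_fetch("MyScraperBot", "https://www.example.com/private/data")
delay = rp.crawl_delay("MyScraperBot")  # None when the file sets no Crawl-delay

print(allowed_public, allowed_private, delay)
```

When a site publishes a `Crawl-delay`, it is a reasonable value to use as the pause between requests.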

### Using Proper Headers in Web Scraping

**Introduction:**
Setting request headers such as `User-Agent` and `Accept` makes your requests look like those of a real browser, which can keep servers from rejecting them.

**Real-life use case:**
Accessing content that is only available to browsers or avoiding detection as a bot.

**What the code does:**
The next code cell shows how to set headers in a request.

In [None]:
# Example of setting headers (demonstrative, not executed)
# headers = {
#     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
#     'Accept': 'text/html,application/xhtml+xml,application/xml',
#     'Accept-Language': 'en-US,en;q=0.9',
#     'Referer': 'https://www.google.com/',
#     'DNT': '1'
# }
# response = requests.get('https://www.example.com', headers=headers)
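
The ethics tips earlier recommend identifying your scraper through its User-Agent. One way to apply headers consistently is a `requests.Session`. This is a sketch; the bot name and contact URL are hypothetical. Preparing the request without sending it lets you inspect the final headers offline.

```python
import requests

session = requests.Session()
session.headers.update({
    # Hypothetical bot name and contact URL, invented for this example
    "User-Agent": "MyScraperBot/1.0 (+https://www.example.com/bot-info)",
    "Accept-Language": "en-US,en;q=0.9",
})

# Per-request headers are merged with the session headers and win on conflict
request = requests.Request(
    "GET", "https://www.example.com", headers={"Accept": "text/html"}
)
prepared = session.prepare_request(request)  # builds the request, sends nothing

print(prepared.headers["User-Agent"])
print(prepared.headers["Accept"])
print(prepared.headers["Accept-Language"])
```

Calling `session.send(prepared, timeout=10)` would then perform the actual request.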

## 2. GUI Programming (tkinter)
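**Definition:** GUI programming lets you build graphical user interfaces (windows, buttons, forms) for your applications. `tkinter` is the GUI toolkit included in Python's standard library.

**Syntax and Example:** The next code cell builds a simple window with a button and a label, prints the structure of a larger tabbed application, and lists alternative GUI libraries, design best practices, and data science use cases.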

In [None]:
import tkinter as tk
from tkinter import ttk, messagebox, filedialog
import matplotlib.pyplot as plt
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
import numpy as np

# Basic tkinter window example
def basic_window_example():
    """Create a basic window with a button"""
    # Create the main window
    root = tk.Tk()
    root.title('Simple GUI Example')  # Set window title
    root.geometry('300x200')  # Set window size (width x height)
    
    # Function to handle button clicks
    def say_hello():
        print('Hello from GUI!')
        # Show a message box
        messagebox.showinfo("Greeting", "Hello from the GUI!")
    
    # Create a button widget
    button = tk.Button(root, text='Click Me', command=say_hello)
    # Place the button in the window (using pack layout manager)
    button.pack(padx=20, pady=20)
    
    # Create a label widget
    label = tk.Label(root, text="This is a simple GUI example", font=("Arial", 12))
    label.pack(pady=10)
    
    # Start the GUI event loop (would block execution in a regular script)
    print("\nNote: The following code would start the event loop and display the window.")
    print("In a Jupyter notebook, we'll skip this to avoid blocking the execution.\n")
    # root.mainloop()  # Uncomment to run the event loop

# Show basic window example code
print("Basic tkinter window example:")
basic_window_example()

# More comprehensive GUI example
print("\nMore comprehensive GUI example:")

def comprehensive_gui_example():
    """A more comprehensive GUI with multiple widgets and layouts"""
    # Simulated application code without actually running it
    
    print("This example would create a data entry form with:")
    print("- Text entry fields")
    print("- Dropdown menus")
    print("- Checkboxes")
    print("- Radio buttons")
    print("- A data table")
    print("- File selection dialog")
    print("- A matplotlib chart embedded in the GUI")
    
    # Sample code that would be used (for demonstration)
    code_sample = """
    # Create main application window
    app = tk.Tk()
    app.title("Data Analysis Tool")
    app.geometry("800x600")
    
    # Create a notebook (tabbed interface)
    notebook = ttk.Notebook(app)
    notebook.pack(fill='both', expand=True, padx=10, pady=10)
    
    # First tab - Data Entry
    data_tab = ttk.Frame(notebook)
    notebook.add(data_tab, text="Data Entry")
    
    # Create a form in the first tab
    tk.Label(data_tab, text="Name:").grid(row=0, column=0, sticky='w', pady=5, padx=5)
    name_entry = tk.Entry(data_tab, width=30)
    name_entry.grid(row=0, column=1, pady=5, padx=5)
    
    tk.Label(data_tab, text="Age:").grid(row=1, column=0, sticky='w', pady=5, padx=5)
    age_entry = tk.Entry(data_tab)
    age_entry.grid(row=1, column=1, pady=5, padx=5)
    
    tk.Label(data_tab, text="Occupation:").grid(row=2, column=0, sticky='w', pady=5, padx=5)
    occupation = ttk.Combobox(data_tab, values=["Data Scientist", "Software Engineer", "Analyst", "Manager", "Other"])
    occupation.grid(row=2, column=1, pady=5, padx=5)
    
    # Checkboxes
    tk.Label(data_tab, text="Skills:").grid(row=3, column=0, sticky='w', pady=5, padx=5)
    skills_frame = tk.Frame(data_tab)
    skills_frame.grid(row=3, column=1, sticky='w')
    
    python_var = tk.BooleanVar()
    r_var = tk.BooleanVar()
    sql_var = tk.BooleanVar()
    
    tk.Checkbutton(skills_frame, text="Python", variable=python_var).pack(anchor='w')
    tk.Checkbutton(skills_frame, text="R", variable=r_var).pack(anchor='w')
    tk.Checkbutton(skills_frame, text="SQL", variable=sql_var).pack(anchor='w')
    
    # Radio buttons
    tk.Label(data_tab, text="Experience Level:").grid(row=4, column=0, sticky='w', pady=5, padx=5)
    exp_frame = tk.Frame(data_tab)
    exp_frame.grid(row=4, column=1, sticky='w')
    
    exp_var = tk.StringVar(value="intermediate")
    tk.Radiobutton(exp_frame, text="Beginner", variable=exp_var, value="beginner").pack(anchor='w')
    tk.Radiobutton(exp_frame, text="Intermediate", variable=exp_var, value="intermediate").pack(anchor='w')
    tk.Radiobutton(exp_frame, text="Expert", variable=exp_var, value="expert").pack(anchor='w')
    
    # Submit button
    def submit_data():
        data = {
            "name": name_entry.get(),
            "age": age_entry.get(),
            "occupation": occupation.get(),
            "skills": {
                "python": python_var.get(),
                "r": r_var.get(),
                "sql": sql_var.get()
            },
            "experience": exp_var.get()
        }
        messagebox.showinfo("Data Submitted", f"Submitted: {data}")
    
    submit_btn = tk.Button(data_tab, text="Submit", command=submit_data)
    submit_btn.grid(row=5, column=1, pady=10, padx=5, sticky='e')
    
    # Second tab - Data Visualization
    viz_tab = ttk.Frame(notebook)
    notebook.add(viz_tab, text="Visualization")
    
    # Add a matplotlib figure to the visualization tab
    fig = plt.Figure(figsize=(6, 4), dpi=100)
    ax = fig.add_subplot(111)
    
    # Sample data
    x = np.arange(0, 10, 0.1)
    y = np.sin(x)
    
    # Plot data
    ax.plot(x, y)
    ax.set_title('Sample Visualization')
    ax.set_xlabel('X axis')
    ax.set_ylabel('Y axis')
    
    # Embed the matplotlib figure in the tkinter window
    canvas = FigureCanvasTkAgg(fig, master=viz_tab)
    canvas.draw()
    canvas.get_tk_widget().pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
    
    # Third tab - Data Table
    table_tab = ttk.Frame(notebook)
    notebook.add(table_tab, text="Data Table")
    
    # Create a treeview (table widget)
    columns = ('name', 'age', 'occupation', 'experience')
    tree = ttk.Treeview(table_tab, columns=columns, show='headings')
    
    # Define headings
    tree.heading('name', text='Name')
    tree.heading('age', text='Age')
    tree.heading('occupation', text='Occupation')
    tree.heading('experience', text='Experience')
    
    # Sample data
    sample_data = [
        ('Alice', 28, 'Data Scientist', 'Expert'),
        ('Bob', 34, 'Software Engineer', 'Intermediate'),
        ('Charlie', 22, 'Analyst', 'Beginner'),
        ('Diana', 41, 'Manager', 'Expert')
    ]
    
    # Add data to the table
    for item in sample_data:
        tree.insert('', tk.END, values=item)
    
    # Add scrollbar
    scrollbar = ttk.Scrollbar(table_tab, orient=tk.VERTICAL, command=tree.yview)
    tree.configure(yscrollcommand=scrollbar.set)
    
    # Pack widgets
    tree.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
    scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
    
    # Menu bar
    menubar = tk.Menu(app)
    
    # File menu
    file_menu = tk.Menu(menubar, tearoff=0)
    file_menu.add_command(label="New", command=lambda: print("New file"))
    file_menu.add_command(label="Open", command=lambda: filedialog.askopenfilename())
    file_menu.add_command(label="Save", command=lambda: filedialog.asksaveasfilename())
    file_menu.add_separator()
    file_menu.add_command(label="Exit", command=app.quit)
    menubar.add_cascade(label="File", menu=file_menu)
    
    # Help menu
    help_menu = tk.Menu(menubar, tearoff=0)
    help_menu.add_command(label="About", command=lambda: messagebox.showinfo("About", "Data Analysis Tool v1.0"))
    menubar.add_cascade(label="Help", menu=help_menu)
    
    # Set the menu bar
    app.config(menu=menubar)
    
    # Start the application
    app.mainloop()
    """
    
    print("\nSample code structure (not executed):")
    print(code_sample[:500] + "...\n[code continues]")

# Show comprehensive GUI example
comprehensive_gui_example()

# Alternative GUI libraries
print("\nAlternative Python GUI libraries:")

gui_libraries = {
    "PyQt/PySide": [
        "Pros: Professional look, comprehensive, cross-platform", 
        "Cons: Licensing considerations, more complex",
        "Best for: Complex, feature-rich applications"
    ],
    "wxPython": [
        "Pros: Native look on each platform, free for commercial use", 
        "Cons: Documentation can be lacking",
        "Best for: Business applications requiring native look"
    ],
    "Kivy": [
        "Pros: Great for multi-touch, mobile-friendly, cross-platform", 
        "Cons: Non-native look and feel",
        "Best for: Mobile apps, touch applications, games"
    ],
    "PySimpleGUI": [
        "Pros: Simple API, low learning curve", 
        "Cons: Limited for complex applications",
        "Best for: Quick utility tools, simple interfaces"
    ],
    "Dash": [
        "Pros: Web-based, great for data visualization", 
        "Cons: Requires web browser to run",
        "Best for: Data dashboards, interactive data apps"
    ]
}

for lib, details in gui_libraries.items():
    print(f"\n{lib}:")
    for detail in details:
        print(f"- {detail}")

# GUI Design Best Practices
print("\nGUI Design Best Practices:")
best_practices = [
    "1. Keep the interface simple and intuitive",
    "2. Group related elements together",
    "3. Provide feedback for user actions",
    "4. Be consistent with layout and design",
    "5. Use appropriate widgets for each task",
    "6. Handle errors gracefully with informative messages",
    "7. Make important actions visible and accessible",
    "8. Test your UI with actual users"
]

for practice in best_practices:
    print(practice)

# Data science specific GUI applications
print("\nGUI applications for data science:")
ds_applications = {
    "Data visualization tools": "Interactive plots, dashboards, and exploratory data analysis tools",
    "Parameter tuning interfaces": "GUIs for adjusting model parameters and seeing results in real-time",
    "Data labeling tools": "Interfaces for annotating training data",
    "Result presentation": "Professional presentations of analysis results for non-technical audiences",
    "Workflow management": "Visual pipeline creation for data processing workflows"
}

for app_type, description in ds_applications.items():
    print(f"- {app_type}: {description}")

# Expected output:
# Basic tkinter window example:
# 
# Note: The following code would start the event loop and display the window.
# In a Jupyter notebook, we'll skip this to avoid blocking the execution.
#
# More comprehensive GUI example:
# This example would create a data entry form with:
# - Text entry fields
# - Dropdown menus
# - Checkboxes
# - Radio buttons
# - A data table
# - File selection dialog
# - A matplotlib chart embedded in the GUI
#
# Sample code structure (not executed):
# [code sample]
#
# Alternative Python GUI libraries:
# PyQt/PySide:
# - Pros: Professional look, comprehensive, cross-platform
# - Cons: Licensing considerations, more complex
# - Best for: Complex, feature-rich applications
# [other libraries...]
#
# GUI Design Best Practices:
# 1. Keep the interface simple and intuitive
# [other best practices...]
#
# GUI applications for data science:
# - Data visualization tools: Interactive plots, dashboards, and exploratory data analysis tools
# [other applications...]
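
Best practice 6 (handle errors gracefully) can be applied to a form like the one sketched in the cell above by validating input before using it. The following is a minimal sketch: `validate_form`, `build_form`, and the accepted age range are assumptions made for illustration. The validation logic is kept separate from the widgets so it can be tested without opening a window.

```python
def validate_form(name: str, age_text: str):
    """Return (data, errors) for the raw strings read from the entry widgets."""
    errors = []
    name = name.strip()
    if not name:
        errors.append("Name is required.")
    age = None
    try:
        age = int(age_text.strip())
        # The accepted range is an arbitrary choice for this example
        if not 0 < age < 130:
            errors.append("Age must be between 1 and 129.")
    except ValueError:
        errors.append("Age must be a whole number.")
    data = {"name": name, "age": age} if not errors else None
    return data, errors


def build_form():
    """Create the window; call .mainloop() on the returned root to show it."""
    # Imported here so validate_form stays usable on machines without a display
    import tkinter as tk
    from tkinter import messagebox

    root = tk.Tk()
    root.title("Validated Form")
    tk.Label(root, text="Name:").grid(row=0, column=0, sticky="w", padx=5, pady=5)
    name_entry = tk.Entry(root, width=30)
    name_entry.grid(row=0, column=1, padx=5, pady=5)
    tk.Label(root, text="Age:").grid(row=1, column=0, sticky="w", padx=5, pady=5)
    age_entry = tk.Entry(root)
    age_entry.grid(row=1, column=1, sticky="w", padx=5, pady=5)

    def submit():
        data, errors = validate_form(name_entry.get(), age_entry.get())
        if errors:
            # One dialog listing every problem, instead of failing silently
            messagebox.showerror("Invalid input", "\n".join(errors))
        else:
            messagebox.showinfo("Data Submitted", f"Submitted: {data}")

    tk.Button(root, text="Submit", command=submit).grid(
        row=2, column=1, sticky="e", padx=5, pady=10
    )
    return root


good = validate_form("Alice", "28")
bad = validate_form("", "abc")
print(good)
print(bad)
# build_form().mainloop()  # Uncomment to display the window
```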

## 3. Web Frameworks
**Definition:** Web frameworks enable Python developers to create web applications. Popular options include Flask and Django.

### Overview of Python Web Frameworks

**Introduction:**
Python offers several frameworks for building web applications, each with different features and use cases.

**Real-life use case:**
Choosing the right framework for a web app, API, or dashboard.

**What the code does:**
The next code cell lists popular Python web frameworks and their characteristics.

In [None]:
web_frameworks = [
    {"name": "Flask", "desc": "Lightweight, flexible, minimal structure, good for small to medium apps and APIs"},
    {"name": "Django", "desc": "Full-featured, batteries-included, with admin interface, ORM, and security features"},
    {"name": "FastAPI", "desc": "Modern, high-performance, built for API development with automatic docs"},
    {"name": "Pyramid", "desc": "Flexible but includes more features than Flask, less opinionated than Django"},
    {"name": "Bottle", "desc": "Ultra-lightweight, single-file framework for simple apps"}
]
for fw in web_frameworks:
    print(f"- {fw['name']}: {fw['desc']}")

### Flask Example: Simple Web Application

**Introduction:**
Flask is a lightweight web framework for building web apps and APIs.

**Real-life use case:**
Creating a simple web form or API endpoint for a data project.

**What the code does:**
The next code cell shows a basic Flask app with an HTML form and a JSON API endpoint.

In [None]:
from flask import Flask, request, jsonify
app = Flask(__name__)

@app.route('/')
def home():
    return """
    <!DOCTYPE html>
    <html>
    <head>
        <title>Python Web App</title>
    </head>
    <body>
        <h1>Simple Flask App</h1>
        <form action="/calculate" method="post">
            <label>Enter value 1:</label>
            <input type="number" name="value1" required><br>
            <label>Enter value 2:</label>
            <input type="number" name="value2" required><br>
            <button type="submit">Calculate Sum</button>
        </form>
    </body>
    </html>
    """

@app.route('/calculate', methods=['POST'])
def calculate():
    try:
        value1 = float(request.form['value1'])
        value2 = float(request.form['value2'])
        result = value1 + value2
        return f"<h1>Result: {value1} + {value2} = {result}</h1>"
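    except (KeyError, ValueError):
        # Missing or non-numeric fields are client errors: return HTTP 400,
        # and avoid echoing the raw input back into the HTML response
        return "Error: both values must be numbers", 400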

@app.route('/api/sum', methods=['GET'])
def api_sum():
    try:
        a = float(request.args.get('a', 0))
        b = float(request.args.get('b', 0))
        return jsonify({'a': a, 'b': b, 'sum': a + b})
    except ValueError as e:  # raised by float() for non-numeric query values
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    print("Starting Flask development server...")
    # app.run(debug=True, port=5000)
    print("This Flask application would:")
    print("1. Start a web server on http://localhost:5000")
    print("2. Serve an HTML form at the root URL (/)")
    print("3. Process form submissions at /calculate")
    print("4. Provide a JSON API endpoint at /api/sum?a=5&b=10")
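
Since the cell above does not start the server, the endpoints can still be exercised with Flask's built-in test client, which issues requests in-process. This is a sketch that re-declares a minimal version of the `/api/sum` route so the block is self-contained.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/api/sum", methods=["GET"])
def api_sum():
    try:
        a = float(request.args.get("a", 0))
        b = float(request.args.get("b", 0))
    except ValueError as e:
        # Non-numeric query values are reported as a client error
        return jsonify({"error": str(e)}), 400
    return jsonify({"a": a, "b": b, "sum": a + b})


# The test client calls the app directly, so no server or port is needed
client = app.test_client()
ok = client.get("/api/sum?a=5&b=10")
bad = client.get("/api/sum?a=five")

print(ok.status_code, ok.get_json())
print(bad.status_code, bad.get_json())
```

The same client supports `client.post(..., data={...})`, which would be one way to check the form handler at `/calculate`.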