Harrypatria/SQLite_Advanced_Tutorial_Google_Colab

🗄️ SQLite Advanced Tutorial by Patria & Co.

License: MIT

🌟 Overview

A comprehensive, production-ready SQLite tutorial for Python developers, from basic operations to advanced implementations. Perfect for data scientists, web developers, embedded systems engineers, and anyone looking to leverage the power of this serverless, zero-configuration database.

This repository contains real-world examples, proven optimization techniques, and industry best practices that you can immediately implement in your projects.

✨ Key Features

  • 📚 Complete Learning Path: Progressive examples from beginner to advanced
  • ⚡ Performance Optimization: Practical techniques to squeeze maximum performance
  • 🔄 Real-world Applications: Task manager, IoT sensor data logger, sales analytics dashboard
  • 📊 Data Science Integration: Seamless integration with pandas, numpy, and visualization tools
  • 🧪 Interactive Notebooks: Run all examples directly in Google Colab
  • 📱 Mobile & Embedded: Techniques for resource-constrained environments
  • 🛡️ Production-ready Code: Industry best practices for error handling, security, and robustness
  • 🧪 Healthcare Analytics: Patient Monitoring System for tracking patient vital signs and clinical outcomes

🚀 Quick Start

Option 1: Run in Google Colab (No Installation Required)

Open Main Tutorial In Colab | Open Social Network Analysis In Colab | Open Biological Networks In Colab

Option 2: Local Installation

# Clone the repository
git clone https://github.com/Harrypatria/SQLite_Advanced_Tutorial_Google_Colab.git
cd SQLite_Advanced_Tutorial_Google_Colab

# Set up a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter notebook
jupyter notebook
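
Before opening the notebooks, it can help to confirm that Python's built-in sqlite3 module works in the new environment. The snippet below is a minimal sanity check, not part of the repository: it prints the version of the bundled SQLite library and runs one query against an in-memory database.

```python
import sqlite3

# Version of the SQLite library bundled with this Python build
print("SQLite library version:", sqlite3.sqlite_version)

# An in-memory database needs no file and disappears when the connection closes
conn = sqlite3.connect(":memory:")
answer = conn.execute("SELECT 1 + 1").fetchone()[0]
print("Test query result:", answer)
conn.close()
```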

📋 Notebooks Included

  1. SQLite.ipynb - Core SQLite functionality from basics to advanced techniques
  2. SQLite_TimeSeries_Visualization.ipynb - Time series data visualization with SQLite
  3. SQLite_Dashboard_with_Plotly_Dash.ipynb - Building interactive dashboards with SQLite and Plotly Dash
  4. SQLite_Geographic_Visualization.ipynb - Geographic data visualization using SQLite and Folium

📋 Table of Contents

  1. Fundamentals

    • Database Connection
    • Creating Tables
    • CRUD Operations (Create, Read, Update, Delete)
    • SQL Query Basics
  2. Intermediate Operations

    • SQLite with Pandas Integration
    • Transactions & Error Handling
    • Using Python Functions with SQLite
    • Relationships & Foreign Keys
  3. Advanced Techniques

    • Performance Optimization & Indexing
    • Full-Text Search
    • Binary Data (BLOB) Storage
    • Web & Mobile Integration
  4. Data Visualization

    • Time Series Analysis
    • Geographic Visualization
    • Interactive Dashboards
    • Custom Reports and Charts
  5. Real-world Applications

    • Task Management Application
    • IoT Sensor Data Logger
    • Sales Analytics Dashboard
    • Connection Pooling for Multi-threaded Apps
    • Geographic Visualization using Folium
    • Healthcare Analytics
  6. Production Best Practices

    • Security Considerations
    • Backup & Recovery
    • Monitoring & Debugging
    • Deployment Strategies

🔥 Optimization Techniques Showcase

| Technique | Description | Performance Gain |
|---|---|---|
| WAL Journal Mode | Write-Ahead Logging for concurrent operations | Up to 50x faster writes |
| Prepared Statements | Pre-compiled SQL for repeated execution | 3-5x faster for repeated queries |
| Bulk Operations | Batch inserts/updates in transactions | 10-100x faster than individual ops |
| Optimized Indexes | Strategic indexing for query patterns | 10-1000x faster lookups |
| Memory-Mapped I/O | Direct memory access for databases | 2-3x faster for large databases |
| Custom Collations | Performance-optimized string comparison | 2-4x faster text operations |
💡 Real-world Example: IoT Sensor Dashboard

import random
import sqlite3
from datetime import datetime

import pandas as pd

# Create an optimized database for IoT sensor data
conn = sqlite3.connect('sensors.db')
conn.execute('PRAGMA journal_mode = WAL')
conn.execute('PRAGMA synchronous = NORMAL')

# Define schema with indexes for time-series queries
# IF NOT EXISTS keeps the script safe to re-run against the same file
conn.execute('''
CREATE TABLE IF NOT EXISTS sensors (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    device_id TEXT NOT NULL,
    temperature REAL,
    humidity REAL
);
''')
conn.execute('CREATE INDEX IF NOT EXISTS idx_sensors_time_device ON sensors(timestamp, device_id);')

# Insert 10,000 data points in a single transaction
# "with conn" commits on success and rolls back if an exception is raised
rows = [
    (datetime.now().isoformat(), f'device_{i % 10}', 20 + random.random() * 10, 30 + random.random() * 20)
    for i in range(10000)
]
with conn:
    conn.executemany(
        'INSERT INTO sensors (timestamp, device_id, temperature, humidity) VALUES (?, ?, ?, ?)',
        rows
    )

# Query with pandas for analysis and visualization
df = pd.read_sql_query(
    "SELECT device_id, avg(temperature) as avg_temp FROM sensors GROUP BY device_id",
    conn
)
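
A dashboard usually needs readings grouped into time buckets rather than one average per device. The following sketch reuses the sensors schema from the example above with a few made-up readings and aggregates them per device and hour. It relies on ISO-8601 timestamps sorting chronologically as text, which is what makes the range filter on the timestamp column work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sensors (
        id INTEGER PRIMARY KEY,
        timestamp TEXT NOT NULL,
        device_id TEXT NOT NULL,
        temperature REAL,
        humidity REAL
    )
""")
conn.execute("CREATE INDEX idx_sensors_time_device ON sensors(timestamp, device_id)")

# Hypothetical sample readings, for illustration only
readings = [
    ("2024-01-01T10:05:00", "device_0", 20.0, 40.0),
    ("2024-01-01T10:35:00", "device_0", 22.0, 42.0),
    ("2024-01-01T11:10:00", "device_0", 25.0, 45.0),
    ("2024-01-01T10:15:00", "device_1", 30.0, 50.0),
]
with conn:
    conn.executemany(
        "INSERT INTO sensors (timestamp, device_id, temperature, humidity) "
        "VALUES (?, ?, ?, ?)",
        readings,
    )

# Average temperature per device and hour within a one-day window
hourly = conn.execute(
    """
    SELECT device_id,
           strftime('%Y-%m-%dT%H:00', timestamp) AS hour,
           AVG(temperature) AS avg_temp,
           COUNT(*) AS n
    FROM sensors
    WHERE timestamp >= ? AND timestamp < ?
    GROUP BY device_id, hour
    ORDER BY device_id, hour
    """,
    ("2024-01-01T00:00:00", "2024-01-02T00:00:00"),
).fetchall()

for row in hourly:
    print(row)
```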

🔧 Advanced Use Cases

Web Application Integration (Flask)

from flask import Flask, g, jsonify
import sqlite3

app = Flask(__name__)

def get_db():
    db = getattr(g, '_database', None)
    if db is None:
        db = g._database = sqlite3.connect('app.db')
        db.row_factory = sqlite3.Row
    return db

@app.teardown_appcontext
def close_connection(exception):
    db = getattr(g, '_database', None)
    if db is not None:
        db.close()

@app.route('/api/data')
def get_data():
    cursor = get_db().execute('SELECT * FROM items ORDER BY timestamp DESC LIMIT 100')
    items = [dict(row) for row in cursor.fetchall()]
    return jsonify({'items': items})
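
The route above works because `sqlite3.Row` objects convert cleanly to dictionaries. As a sketch of how that query logic could be pulled out of the view and tested without Flask, the helper below takes the row limit as a bound parameter. The function name `fetch_recent_items` and the columns of the `items` table are assumptions for illustration; the original only shows that the table has a `timestamp` column.

```python
import sqlite3

def fetch_recent_items(conn, limit=100):
    """Return the newest rows from the items table as plain dictionaries."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    cursor = conn.execute(
        "SELECT * FROM items ORDER BY timestamp DESC LIMIT ?", (limit,)
    )
    return [dict(row) for row in cursor]

# Demo with an in-memory database and a hypothetical schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, timestamp TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO items (name, timestamp) VALUES (?, ?)",
        [("a", "2024-01-01"), ("b", "2024-01-03"), ("c", "2024-01-02")],
    )

items = fetch_recent_items(conn, limit=2)
print(items)
```

Inside the Flask view, the same helper could be called as `fetch_recent_items(get_db())`.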

Data Science Pipeline

import pandas as pd
import sqlite3

# Load data
df = pd.read_csv('large_dataset.csv')

# Process and clean data
# transform_data is a placeholder for your own cleaning function
df_processed = transform_data(df)

# Store in SQLite for efficient querying
conn = sqlite3.connect('analysis.db')
df_processed.to_sql('cleaned_data', conn, if_exists='replace', index=False)

# Run complex aggregations with SQL
results = pd.read_sql_query('''
    SELECT 
        category,
        COUNT(*) as count,
        AVG(value) as avg_value,
        SUM(CASE WHEN flag = 1 THEN 1 ELSE 0 END) as flag_count
    FROM cleaned_data
    GROUP BY category
    HAVING COUNT(*) > 100
    ORDER BY avg_value DESC
''', conn)
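
To see what this aggregation returns without a CSV file or pandas, here is a small self-contained sketch that runs the same query shape on four hand-written rows. The sample data and the threshold of 2 are invented for the demo; the `SUM(CASE ...)` expression counts only the flagged rows in each category, and `HAVING` then drops categories that are too small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cleaned_data (category TEXT, value REAL, flag INTEGER)")
with conn:
    conn.executemany(
        "INSERT INTO cleaned_data VALUES (?, ?, ?)",
        [("A", 1.0, 1), ("A", 2.0, 0), ("A", 3.0, 1), ("B", 10.0, 0)],
    )

min_rows = 2  # demo threshold; the pipeline above uses 100
results = conn.execute(
    """
    SELECT
        category,
        COUNT(*) AS count,
        AVG(value) AS avg_value,
        SUM(CASE WHEN flag = 1 THEN 1 ELSE 0 END) AS flag_count
    FROM cleaned_data
    GROUP BY category
    HAVING COUNT(*) > ?
    ORDER BY avg_value DESC
    """,
    (min_rows,),
).fetchall()

print(results)  # category B has a single row, so HAVING filters it out
```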

Interactive Dashboard with Plotly Dash

from dash import Dash, dcc, html
from dash.dependencies import Input, Output
import plotly.express as px
import pandas as pd
import sqlite3

# Connect to database
conn = sqlite3.connect('sales.db')

# Load data
df = pd.read_sql_query("""
    SELECT date, product, category, region, sales_amount 
    FROM sales 
    ORDER BY date
""", conn)

# Initialize Dash app
app = Dash(__name__)

# App layout
app.layout = html.Div([
    html.H1("Sales Dashboard"),
    
    html.Div([
        html.Div([
            html.Label("Select Category:"),
            dcc.Dropdown(
                id='category-dropdown',
                options=[{'label': c, 'value': c} for c in df['category'].unique()],
                value=df['category'].unique()[0]
            ),
        ], style={'width': '30%', 'display': 'inline-block'}),
        
        html.Div([
            html.Label("Select Region:"),
            dcc.Dropdown(
                id='region-dropdown',
                options=[{'label': r, 'value': r} for r in df['region'].unique()],
                value=df['region'].unique()[0]
            ),
        ], style={'width': '30%', 'display': 'inline-block'}),
    ]),
    
    dcc.Graph(id='time-series-chart'),
    
    dcc.Graph(id='category-chart')
])

# Callbacks
@app.callback(
    Output('time-series-chart', 'figure'),
    [Input('category-dropdown', 'value'),
     Input('region-dropdown', 'value')]
)
def update_time_series(selected_category, selected_region):
    filtered_df = df[(df['category'] == selected_category) & (df['region'] == selected_region)]
    
    fig = px.line(
        filtered_df, 
        x='date', 
        y='sales_amount',
        color='product',
        title=f'Sales Trend for {selected_category} in {selected_region}'
    )
    
    return fig

if __name__ == '__main__':
    app.run(debug=True)  # app.run replaces app.run_server in recent Dash releases

📚 Learning Path

This tutorial is designed to progressively build your SQLite expertise:

  1. Day 1: Basic operations and CRUD
  2. Day 2: Relationships and advanced queries
  3. Day 3: Integration with pandas and data analysis
  4. Day 4: Performance optimization and indexing
  5. Day 5: Building real-world applications
  6. Day 6: Data visualization and dashboards
  7. Day 7: Advanced applications and deployment

🤔 Why SQLite?

  • Zero Configuration: No server setup, just import and use
  • Cross-Platform: Works identically across Windows, macOS, Linux, Android, iOS
  • Incredibly Reliable: 100% branch test coverage and battle-tested in billions of deployments
  • Self-Contained: Single file database with no external dependencies
  • Full-Featured SQL: Supports most SQL features including complex joins, views, and triggers
  • Excellent Performance: Often faster than client-server databases for local workloads, because queries avoid a network round trip
  • Public Domain: Free for any use (commercial or private)
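
As a small illustration of the "Full-Featured SQL" point, the sketch below defines a trigger and a view on a made-up task table. The table, trigger, and view names are invented for this example; the trigger writes an audit row whenever a task is marked done, and the view exposes only the tasks that are still open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        done INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE task_log (task_id INTEGER, event TEXT);

    -- Trigger: record every completion in an audit table
    CREATE TRIGGER log_completion AFTER UPDATE OF done ON tasks
    WHEN NEW.done = 1 AND OLD.done = 0
    BEGIN
        INSERT INTO task_log (task_id, event) VALUES (NEW.id, 'completed');
    END;

    -- View: only the tasks that are still open
    CREATE VIEW open_tasks AS SELECT id, title FROM tasks WHERE done = 0;
""")

with conn:
    conn.executemany(
        "INSERT INTO tasks (title) VALUES (?)", [("write docs",), ("fix bug",)]
    )
    conn.execute("UPDATE tasks SET done = 1 WHERE title = ?", ("fix bug",))

open_titles = [row[0] for row in conn.execute("SELECT title FROM open_tasks")]
log = conn.execute("SELECT task_id, event FROM task_log").fetchall()
print(open_titles, log)
```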

📊 Industry Applications

  • Mobile Apps: Local data storage on iOS and Android
  • Desktop Applications: Configuration and data storage
  • Embedded Systems: IoT devices and edge computing
  • Web Development: Small to medium websites and prototyping
  • Data Science: Data cleaning, transformation, and analysis
  • Testing: Isolated test databases that require no setup
  • Data Visualization: Interactive dashboards and reports
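
The testing use case is easy to sketch: each test can build its own in-memory database in `setUp`, so tests never share state and need no cleanup on disk. The `add_user` function and the `users` table below are hypothetical stand-ins for application code.

```python
import sqlite3
import unittest

def add_user(conn, name):
    """Hypothetical application function under test."""
    cursor = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return cursor.lastrowid

class UserTests(unittest.TestCase):
    def setUp(self):
        # A fresh, isolated database for every test
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
        )

    def tearDown(self):
        self.conn.close()

    def test_add_user(self):
        self.assertEqual(add_user(self.conn, "ada"), 1)

    def test_duplicate_name_rejected(self):
        add_user(self.conn, "ada")
        with self.assertRaises(sqlite3.IntegrityError):
            add_user(self.conn, "ada")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```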

🛠️ Troubleshooting Common Issues

Database Locked Error

When encountering "database is locked" errors:

# Solution 1: Use WAL mode
conn.execute('PRAGMA journal_mode = WAL')

# Solution 2: Increase the busy timeout (Python's sqlite3 default is already 5 seconds)
conn.execute('PRAGMA busy_timeout = 30000')  # wait up to 30 seconds for a lock

# Solution 3: Better transaction management
conn.isolation_level = None  # Control transactions manually
conn.execute('BEGIN IMMEDIATE')  # Take the write lock up front
try:
    # Your operations here
    conn.execute('COMMIT')
except Exception:
    conn.execute('ROLLBACK')
    raise  # Re-raise so the failure is not silently swallowed
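
The manual BEGIN IMMEDIATE pattern can be wrapped in a context manager so the commit and rollback logic lives in one place. This is a sketch under the assumption that the connection is opened with `isolation_level=None`; the name `immediate_transaction` and the `accounts` table are made up for the example.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def immediate_transaction(conn):
    """Run a block inside BEGIN IMMEDIATE; commit on success, roll back on error."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield conn
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")

# isolation_level=None stops the sqlite3 module from opening transactions itself
conn = sqlite3.connect(":memory:", timeout=5.0, isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")

with immediate_transaction(conn):
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")

try:
    with immediate_transaction(conn):
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")
        raise ValueError("insufficient funds")  # simulated failure
except ValueError:
    pass

# The failed update was rolled back, so the balance is unchanged
balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'"
).fetchone()[0]
print(balance)
```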

Performance Issues with Large Datasets

When SQLite feels slow with large datasets:

# Solution 1: Strategic indexing
conn.execute('CREATE INDEX idx_column_name ON table_name(column_name)')

# Solution 2: Bulk operations in transactions
conn.execute('BEGIN TRANSACTION')
# Insert many rows
conn.execute('COMMIT')

# Solution 3: Optimize queries
# Instead of:
# SELECT * FROM large_table WHERE condition
# Use:
# SELECT specific_column FROM large_table WHERE condition LIMIT 1000
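
Before adding an index, it helps to check whether SQLite would actually use it. `EXPLAIN QUERY PLAN` reports how a statement will be executed; the sketch below compares the plan for the same query before and after creating an index. The `orders` table and the `query_plan` helper are illustrative names. The exact plan text varies between SQLite versions, but a full table scan typically starts with SCAN and an index lookup with SEARCH.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan descriptions SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

sql = "SELECT total FROM orders WHERE customer = ?"

before = query_plan(conn, sql, ("acme",))
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer)")
after = query_plan(conn, sql, ("acme",))

print("before:", before)
print("after: ", after)
```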

Memory Management for Large Queries

When working with large result sets:

# Solution 1: Use iterators instead of fetchall()
cursor = conn.execute("SELECT * FROM large_table")
for row in cursor:  # Process one row at a time
    process_row(row)

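# Solution 2: Chunk your queries
# Assumes pandas is imported as pd and conn is an open connection
def process_in_chunks(table, chunk_size=10000):
    offset = 0
    while True:
        # ORDER BY rowid keeps the paging order stable between queries.
        # Table names cannot be bound parameters, so only pass trusted names.
        query = f'SELECT * FROM "{table}" ORDER BY rowid LIMIT ? OFFSET ?'
        chunk = pd.read_sql_query(query, conn, params=(chunk_size, offset))
        
        if chunk.empty:
            break
            
        # Process chunk
        process_data(chunk)
        
        offset += chunk_size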
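
OFFSET paging gets slower as the offset grows, because SQLite still has to step over all the skipped rows. A common alternative is keyset pagination, where each query resumes after the last key seen. The sketch below assumes a hypothetical `events` table with a positive INTEGER PRIMARY KEY named `id`; it is an illustration of the idea rather than code from the notebooks.

```python
import sqlite3

def iter_chunks(conn, chunk_size=10000):
    """Yield lists of (id, payload) rows from the events table, ordered by id."""
    last_id = 0  # assumes ids are positive integers
    while True:
        chunk = conn.execute(
            "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not chunk:
            break
        yield chunk
        last_id = chunk[-1][0]  # resume after the last row of this chunk

# Demo with 25 made-up rows and a chunk size of 10
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        [(f"event {i}",) for i in range(25)],
    )

chunk_sizes = [len(chunk) for chunk in iter_chunks(conn, chunk_size=10)]
print(chunk_sizes)
```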

Database Corruption Recovery

If you suspect database corruption:

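# Solution 1: Check database integrity
# A healthy database returns a single row containing 'ok'
print(conn.execute("PRAGMA integrity_check").fetchall())

# Solution 2: Create a recovery copy
import os
import sqlite3
import subprocess

def recover_sqlite_db(corrupt_db_path, recovered_db_path):
    """Attempt to recover a corrupted SQLite database by dumping and reloading it.

    Requires the sqlite3 command-line shell to be available on PATH.
    """
    if os.path.exists(recovered_db_path):
        os.remove(recovered_db_path)

    # Dump whatever SQL the shell can still read from the damaged file.
    # Passing arguments as a list avoids shell quoting problems with paths.
    dump = subprocess.run(
        ["sqlite3", corrupt_db_path, ".dump"],
        capture_output=True, text=True
    )

    # Replay the dump into a fresh database
    subprocess.run(["sqlite3", recovered_db_path], input=dump.stdout, text=True)

    # Verify the recovered file opens and passes an integrity check
    try:
        test_conn = sqlite3.connect(recovered_db_path)
        status = test_conn.execute("PRAGMA integrity_check").fetchone()[0]
        test_conn.close()
        return status == "ok"
    except sqlite3.Error:
        return False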
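
Recovery is a last resort; regular backups make it unnecessary in most cases. Python's sqlite3 module exposes SQLite's online backup API through `Connection.backup` (available since Python 3.7), which can copy a database while it is in use. The helper name `backup_database` and the `notes` table below are illustrative.

```python
import os
import sqlite3
import tempfile

def backup_database(source_conn, dest_path):
    """Copy a live database to dest_path using SQLite's online backup API."""
    dest = sqlite3.connect(dest_path)
    try:
        source_conn.backup(dest)
    finally:
        dest.close()

# Demo: back up an in-memory database to a temporary file
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
with source:
    source.executemany(
        "INSERT INTO notes (body) VALUES (?)", [("first",), ("second",)]
    )

backup_path = os.path.join(tempfile.mkdtemp(), "notes_backup.db")
backup_database(source, backup_path)

# Reopen the copy and confirm it is intact
check = sqlite3.connect(backup_path)
copied = check.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
status = check.execute("PRAGMA integrity_check").fetchone()[0]
check.close()
source.close()
print(copied, status)
```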

🌟 Star History

If this project has helped you on your Python journey, please consider giving it a star! ⭐

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'Add some amazing feature'
  4. Push to the branch: git push origin feature/amazing-feature
  5. Open a Pull Request

🌟 Support This Project

Follow me on GitHub | Connect on LinkedIn

Click the buttons above to show your support!
