# Air Quality Data Exploration
This notebook explores air quality data stored in a PostgreSQL database. We will:
- Load the necessary libraries
- Connect to the PostgreSQL database
- Retrieve the air quality data into a pandas DataFrame
- Perform initial data exploration and processing

In [None]:
# Import necessary libraries
import warnings

import numpy as np
import pandas as pd
import psycopg2
from tqdm import tqdm

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling utilities
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Suppress all warnings for cleaner output (this already includes DeprecationWarning)
warnings.filterwarnings('ignore')

# Setting display options for pandas
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
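
The `pd.set_option` calls above change the display settings for the rest of the session. When a wider display is only needed for a single output, `pd.option_context` is a scoped alternative. The sketch below only illustrates that the previous value comes back once the block exits; the variable names are illustrative and not part of the notebook.

```python
import pandas as pd

# Remember the current setting so the effect of the context is visible
before = pd.get_option("display.max_columns")

# Options passed to option_context apply only inside the `with` block
with pd.option_context("display.max_columns", 500, "display.width", 1000):
    inside = pd.get_option("display.max_columns")

# Outside the block the earlier value is in effect again
after = pd.get_option("display.max_columns")

print(inside, after == before)
```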


## Connecting to the Database and Loading Data
In this section, we establish a connection to the PostgreSQL database, execute a SQL query to retrieve air quality data, and store it in a pandas DataFrame.
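
The connect, execute, fetch, and wrap-in-a-DataFrame pattern described here can be tried without a running PostgreSQL server. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in, since both `sqlite3` and `psycopg2` follow the Python DB-API. The helper name `fetch_dataframe`, the column names `station` and `pm25`, and the sample rows are assumptions made up for illustration; they are not taken from the real `AirQuality` table.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def fetch_dataframe(connection, query):
    """Run `query` on a DB-API connection and return the rows as a DataFrame."""
    with closing(connection.cursor()) as cursor:
        cursor.execute(query)
        rows = cursor.fetchall()
        # cursor.description has one entry per column; item 0 is the column name
        colnames = [desc[0] for desc in cursor.description]
    return pd.DataFrame(rows, columns=colnames)


# In-memory SQLite stand-in; the schema and values are invented for this demo
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute('CREATE TABLE "AirQuality" (station TEXT, pm25 REAL)')
    conn.executemany(
        'INSERT INTO "AirQuality" VALUES (?, ?)',
        [("A", 12.5), ("B", 30.1)],
    )
    demo_df = fetch_dataframe(conn, 'SELECT * FROM "AirQuality"')

print(demo_df.shape)  # (2, 2)
```

Because the helper only relies on `cursor()`, `execute()`, `fetchall()`, and `description`, the same function should also accept a `psycopg2` connection.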

In [None]:
# Initialize so the finally block can check it even if connect() fails
connection = None
try:
    # Establishing the connection to the PostgreSQL database
    connection = psycopg2.connect(
        dbname='AirQualityDB',
        user='root',
        password='root',
        host='localhost',
        port='5432'
    )

    # Using a cursor to execute a query
    with connection.cursor() as cursor:
        # Executing the SQL query to fetch all records from the AirQuality table
        cursor.execute('SELECT * FROM "AirQuality"')
        
        # Fetching all rows from the executed query
        rows = cursor.fetchall()

        # Extracting column names
        colnames = [desc[0] for desc in cursor.description]

        # Creating a DataFrame from the fetched data
        df = pd.DataFrame(rows, columns=colnames)
        
        # Dropping the index column if it exists (assumption)
        if 'index' in df.columns:
            df.drop('index', axis=1, inplace=True)

except Exception as e:
    # Report the failure, then re-raise so later cells do not run without df
    print(f'An error occurred: {e}')
    raise

finally:
    # Ensuring the database connection is closed
    if connection is not None:
        connection.close()
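
The cell above hard-codes the database credentials. One possible alternative, sketched here under assumptions, reads them from the standard libpq environment variables (`PGDATABASE`, `PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`) and falls back to the notebook's values where that seems harmless. The helper name `connection_params` is hypothetical, and the password is deliberately given no default.

```python
import os


def connection_params(env=None):
    """Build keyword arguments for psycopg2.connect from environment variables."""
    env = os.environ if env is None else env
    return {
        "dbname": env.get("PGDATABASE", "AirQualityDB"),
        "user": env.get("PGUSER", "root"),
        # No fallback for the password: a missing value raises KeyError early
        "password": env["PGPASSWORD"],
        "host": env.get("PGHOST", "localhost"),
        "port": env.get("PGPORT", "5432"),
    }


# A plain dict stands in for os.environ so the sketch runs anywhere
params = connection_params({"PGPASSWORD": "example-only"})
print(sorted(params))
```

The resulting dictionary could then be passed as `psycopg2.connect(**params)` in place of the literal arguments.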


## Initial Data Exploration
We now inspect the structure of the DataFrame, starting with its dimensions.

In [None]:
# Display the shape of the DataFrame (rows, columns)
df.shape
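
Beyond the shape, a first pass usually also looks at column types, missing values, and summary statistics. The sketch below shows these checks on a small made-up frame so that it runs without the database; the column names `station`, `pm25`, and `no2` and all values are assumptions, not the actual schema. The same calls would apply to `df`.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for df; names and values are illustrative only
sample = pd.DataFrame({
    "station": ["A", "A", "B", "B"],
    "pm25": [12.5, np.nan, 30.1, 28.4],
    "no2": [40.0, 38.5, np.nan, np.nan],
})

# Data type of each column
print(sample.dtypes)

# Number of missing values per column
missing = sample.isna().sum()
print(missing)

# Summary statistics for the numeric columns
print(sample.describe())
```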