# This notebook contains the Python code that performs operations on a database

## Import all necessary libraries

In [12]:
# import data manipulation libraries used in data science and analytics
import numpy as np
import pandas as pd 
from matplotlib import pyplot as plt
import itertools

# This library lets us display data as an HTML table without relying on pandas
from IPython.display import display, HTML

# import psycopg library to talk to the database
import psycopg2

# import pydantic for type checking and ensuring type safety
from pydantic import (BaseModel, EmailStr, Field, SecretStr, ValidationError)
from typing import (Optional, Any, Dict, List, Tuple)

# import sqlalchemy to manage database connections and carry out db operations
from sqlalchemy import (create_engine, inspect, text)
from sqlalchemy.engine import Engine

# importing python-decouple to help us access the secrets from a .env file
from decouple import (config, Config, RepositoryEnv)
# import path lib to set path for the env file
from pathlib import Path

## Create a database connection model and a function to validate the database parameters
I could have passed the database credentials to the function directly, as people normally do, but I chose pydantic to enforce type safety and to make sure that the parameters cannot be changed during execution once they are set. <br>
- ```frozen=True``` : Tells pydantic that a field cannot be reassigned after the model has been created.
- ```default=None``` : If nothing is passed for a field, it falls back to ```None```, because I set ```None``` as the default value. Pydantic does not validate default values unless ```validate_default=True``` is set, so a missing field silently becomes ```None``` even though it is annotated as ```str``` or ```int```.
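
A minimal sketch of how these two settings behave, assuming pydantic v2. `ExampleConn` is a hypothetical two-field model used only for illustration; it is not the model defined in this notebook.

```python
from pydantic import BaseModel, Field, ValidationError


class ExampleConn(BaseModel):
    # hypothetical two-field model, for illustration only
    host: str = Field(default=None, frozen=True)
    port: int = Field(default=None, frozen=True)


# the string "5432" is coerced to the int 5432 during validation
conn = ExampleConn(host="localhost", port="5432")

try:
    conn.port = 6543      # reassigning a frozen field is rejected
    changed = True
except ValidationError:
    changed = False

# no values passed, so every field falls back to its default of None
empty = ExampleConn()
```

Here `changed` ends up `False`, because the assignment raises before the value is replaced, and `empty.host` is `None`.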

In [13]:
# A pydantic model for the database connection
# The fields are frozen because I don't want them to change after they are set for the first time
class DbConnManagerModel (BaseModel):
    host : str = Field(default=None, examples=["localhost"], frozen=True)
    database : str = Field(default=None, examples=["db_name"], frozen=True)
    user : str = Field(default=None, examples=["username"], frozen=True)
    password : str = Field(default=None, examples=["your_password"], frozen=True)
    port : int = Field(default=None, examples=[5432], frozen=True)           

## Tell jupyter notebook where to look for .env file

Here I have to tell Jupyter Notebook where to look for the ```.env``` file; otherwise it won't know where to get the database credentials from.
### Explanation : 
#### RepositoryEnv:
- This class comes from the ```python-decouple``` library.
- It knows how to read a ```.env``` file.
- It parses the ```.env``` file and makes its key-value pairs available.
#### Config:
- The ```Config``` class also comes from the ```python-decouple``` library.
- ```Config``` is a wrapper around different configuration sources (such as ```.env``` files).
- By creating ```Config(RepositoryEnv(env_path))``` I am telling it to load configuration values from the given ```.env``` file (via ```RepositoryEnv```).

### **NOTICE :** 
If your OS is Windows and you create a ```.env``` file using File Explorer, the way I used to do on Linux, Windows tends to append ```.txt``` to the file name, so the file becomes ```.env.txt```. Beware of this when using Windows File Explorer to create an env file. Use VS Code or Jupyter Notebook to create the ```.env``` file instead. <br>
I will have to look into this a bit more; no biggie.
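
A quick way to spot a misnamed file such as ```.env.txt``` is to list everything in the directory whose name starts with ```.env```. The sketch below uses only the standard library; the helper names `find_env_candidates` and `check_env_file` are hypothetical and are not part of ```python-decouple```.

```python
from pathlib import Path


def find_env_candidates(directory: Path) -> list[str]:
    """Return the names of all files in `directory` whose name starts with '.env'."""
    return sorted(
        p.name for p in directory.iterdir()
        if p.is_file() and p.name.startswith(".env")
    )


def check_env_file(directory: Path) -> str:
    """Describe whether a usable '.env' file exists in `directory`."""
    candidates = find_env_candidates(directory)
    if ".env" in candidates:
        return "ok: found .env"
    if candidates:
        return f"missing .env, but found similar files: {', '.join(candidates)}"
    return "missing .env, and no similar files found"
```

Calling `check_env_file(env_path.parent)` before building the `Config` object would report a stray ```.env.txt``` instead of leaving you to guess why the credentials cannot be loaded.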

In [14]:
# The path below is windows specific
# env_path = Path("D:/training/credencys-training-pyspark/python/database_operation/.env")
# The path below is an arch linux specific path
env_path = Path("/home/aditya/github/credencys-training-pyspark/python/database_operation/.env")
config = Config(RepositoryEnv(env_path))

# for debugging purposes 
# I wrote this code when I was trying to find out the reasons for my changes not being saved by the jupyter notebook.
import os
# The path below is windows specific
# os.path.exists(r"D:\training\credencys-training-pyspark\python\database_operation")

## Create a database connection using sql alchemy and create a connection object

In [15]:
# get the database connection parameters from the .env file and create a dictionary out of them
db_connection_params = {
    "host":config('host'),
    "database":config('database'),
    "user":config('user'),
    "password":config('password'),
    "port":config('port'),
}

# create a function to manage the database connection
def database_engine(conn_params: Dict[str, Any]) -> Engine:
    try:
        # validate the connection parameters passed in before creating a database engine
        DBConnObj = DbConnManagerModel.model_validate(conn_params)
        print(f"DBConnObj : {DBConnObj}")
    
        # create a database engine
        engine = create_engine(
            f"postgresql+psycopg2://{DBConnObj.user}:{DBConnObj.password}@{DBConnObj.host}:{DBConnObj.port}/{DBConnObj.database}"
        )
        return engine
    except Exception as e:
        # keep the original exception attached as the cause
        raise RuntimeError(f"failed to make connection with the database : {e}") from e

# create a database engine object
engine_obj = database_engine(db_connection_params)

DBConnObj : host='localhost' database='training' user='postgres' password='1212' port=5432
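
One thing to watch when the connection URL is built with an f-string: a password containing characters such as ```@```, ```:``` or ```/``` makes the URL ambiguous unless it is percent-encoded first. The sketch below shows one way to do that with the standard library's ```quote_plus```. The function name `build_postgres_url` and the credentials are hypothetical, for illustration only.

```python
from urllib.parse import quote_plus


def build_postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a postgresql+psycopg2 URL with the user and password percent-encoded."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# hypothetical credentials, for illustration only
url = build_postgres_url("postgres", "p@ss:word/1", "localhost", 5432, "training")
print(url)
# prints postgresql+psycopg2://postgres:p%40ss%3Aword%2F1@localhost:5432/training
```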


## Create a table in the database

### Method 1 : Create the table using raw sql query without using sql alchemy models

## Read a table from the database

### Create a function to display data 
This function displays the data coming from the database as a table in the Jupyter notebook, using HTML and CSS through IPython.
- I build the table manually in HTML and insert it into the notebook's output area with the IPython library, instead of using pandas, because I want to run raw SQL queries and don't want to rely on pandas functions to carry out SQL operations on the database. I wanted to do it manually using SQLAlchemy.

### Explanation : 
- ```display(HTML(html))``` :
    - ```HTML(html)``` → takes the raw HTML string and tells Jupyter: “Render this as HTML, not plain text.”
    - ```display(...)``` → is the IPython/Jupyter function that actually shows the formatted output in the notebook cell's output area. Without it, nothing is rendered when ```HTML(html)``` is created inside a function, and printing ```html``` would only show the raw markup string.
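
Because the table is assembled as a plain string, any value that contains characters like ```<``` or ```&``` would be interpreted as markup. A minimal sketch of the same table-building idea with escaping added, using the standard library's ```html.escape```. The function name `build_html_table` and the sample row are hypothetical; in a notebook the returned string would be passed to ```display(HTML(...))```.

```python
from html import escape
from typing import Any, Sequence


def build_html_table(headers: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Return an HTML table string with every header and cell value escaped."""
    parts = ["<table border='1' style='border-collapse: collapse;'>"]
    # Header row, with a leading serial-number column
    header_cells = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    parts.append(f"<tr><th>S.no</th>{header_cells}</tr>")
    # Data rows, numbered from 1
    for i, row in enumerate(rows, start=1):
        cells = "".join(f"<td>{escape(str(col))}</td>" for col in row)
        parts.append(f"<tr><td>{i}</td>{cells}</tr>")
    parts.append("</table>")
    return "".join(parts)


# hypothetical sample data containing markup characters
table_html = build_html_table(["city"], [("<b>Naperville</b>",)])
```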

In [8]:
# Create a function that uses the data returned by the "read_db_table_data" function
# to display it nicely in a table using HTML
class DisplayDataModel(BaseModel):
    headers : List[str]
    rows : List[Tuple[Any,...]]
    
def display_data(display_model_config:DisplayDataModel) -> None:
    html = "<table border='1' style='border-collapse: collapse;'>"
    # Header row
    html += "<tr><th>S.no</th>" + "".join([f"<th>{h}</th>" for h in display_model_config.headers]) + "</tr>"
    # Data rows
    # use enumerate to pair each row with a serial number starting at 1
    for i, row in enumerate(display_model_config.rows,start=1):
        html += f"<tr><td>{i}</td>" + "".join([f"<td>{col}</td>" for col in row]) + "</tr>"
    html += "</table>"
    display(HTML(html))

### Create a function to read data from the database
#### Explanation : 
```python
model_config = {
        "arbitrary_types_allowed":True
    }
``` 

<br>

- I have to use ```model_config``` to set the ```arbitrary_types_allowed``` flag to ```True``` in order to avoid the error raised by pydantic saying ```RuntimeError: Unable to validate field "engine"```
- This tells pydantic to accept the object as it is: it only checks that the value is an instance of the annotated class and does not try to validate its contents.
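
A small sketch of what the flag changes, assuming pydantic v2. `FakeEngine` and `ReaderConfig` are hypothetical stand-ins so the example runs without a database; they are not the classes used in this notebook.

```python
from pydantic import BaseModel, ValidationError


class FakeEngine:
    """Hypothetical stand-in for a class pydantic has no built-in schema for."""


class ReaderConfig(BaseModel):
    # allow fields whose type pydantic cannot build a schema for
    model_config = {"arbitrary_types_allowed": True}
    engine: FakeEngine


# accepted: the value is a FakeEngine instance
ok = ReaderConfig(engine=FakeEngine())

try:
    # rejected: a string is not a FakeEngine instance
    ReaderConfig(engine="not an engine")
    rejected = False
except ValidationError:
    rejected = True
```

So the flag does not switch validation off entirely. A value of the wrong type is still refused.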

In [9]:
# Created a pydantic model.
# This model is used to validate the parameters required to read the data from a table from the database
class ReadDbTableModel(BaseModel):
    table_name:str=Field(default=None)
    columns:List[str]=Field(default=[])
    engine:Engine
    max_rows : Optional[int] = Field(default=20)
    model_config = {
        "arbitrary_types_allowed":True
    }
    
# Create a function to read the data from the table in the database
def read_db_table_data(table_model_config:ReadDbTableModel) -> tuple[list[str], list[tuple[Any,...]]]:
    try:
        with table_model_config.engine.connect() as conn:
            if table_model_config.columns:
                query = text(f"SELECT {', '.join(table_model_config.columns)} FROM {table_model_config.table_name} LIMIT {table_model_config.max_rows}")
            else:
                query = text(f"SELECT * FROM {table_model_config.table_name} LIMIT {table_model_config.max_rows}")
            result = conn.execute(query)
            rows_data = result.fetchall()
            headers_data = list(result.keys())  # plain list of column names, matching the return annotation
        return headers_data,rows_data
    except Exception as e:
        raise RuntimeError(f"Cannot read the data from the table {table_model_config.table_name} : {e}")
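
Since the query above is assembled with an f-string, table and column names go into the SQL text unchecked. One possible safeguard is to validate every identifier before interpolating it. The helper `build_select_query` below is a hypothetical sketch. It assumes that only plain names made of letters, digits and underscores are needed, so schema-qualified or quoted names would be rejected.

```python
import re
from typing import List, Optional

# Plain identifiers only: letters, digits and underscores, not starting with a digit
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_select_query(table_name: str, columns: Optional[List[str]] = None, max_rows: int = 20) -> str:
    """Build a SELECT statement after validating every identifier and the row limit."""
    for name in [table_name, *(columns or [])]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid SQL identifier: {name!r}")
    if not isinstance(max_rows, int) or max_rows < 0:
        raise ValueError(f"max_rows must be a non-negative integer, got {max_rows!r}")
    column_list = ", ".join(columns) if columns else "*"
    return f"SELECT {column_list} FROM {table_name} LIMIT {max_rows}"
```

The returned string could then be wrapped in ```text(...)``` exactly as in the function above.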

### NOTICE : 
If you have a password field in your pydantic model, declaring it as ```SecretStr``` does not work with the code above as written. The connection URL is built with an f-string, and formatting a ```SecretStr``` yields a masked value instead of the real password, so the connection fails with the error : "Authentication failed unable login into postgres". Either keep the field as ```str```, as I did, or call ```get_secret_value()``` on the field when building the URL.
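
A minimal sketch of the difference, assuming pydantic v2. `Credentials` is a hypothetical model with a made-up password; it is not the ```DbConnManagerModel``` used in this notebook.

```python
from pydantic import BaseModel, SecretStr


class Credentials(BaseModel):
    # hypothetical model, for illustration only
    user: str
    password: SecretStr


creds = Credentials(user="postgres", password="example-password")

# formatting the SecretStr directly inserts a masked placeholder
masked = f"{creds.user}:{creds.password}"

# get_secret_value() returns the real string
real = f"{creds.user}:{creds.password.get_secret_value()}"
```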

### Get all the columns from the table 

In [11]:
config_read_customers_table = ReadDbTableModel(table_name="customers", engine=engine_obj)
headers_data, rows_data = read_db_table_data(config_read_customers_table)
display_obj = DisplayDataModel(headers=headers_data, rows=rows_data)
display_data(display_obj)

RuntimeError: Cannot read the data from the table customers : (psycopg2.errors.UndefinedTable) relation "customers" does not exist
LINE 1: SELECT * FROM customers LIMIT 20
                      ^

[SQL: SELECT * FROM customers LIMIT 20]
(Background on this error at: https://sqlalche.me/e/20/f405)

### Get selected columns from the table 

In [22]:
config_read_customers_table = ReadDbTableModel(table_name="customers", engine=engine_obj
                                               ,columns=["city","state_province"])
headers_data, rows_data = read_db_table_data(config_read_customers_table)
display_obj = DisplayDataModel(headers=headers_data, rows=rows_data)
display_data(display_obj)

S.no,city,state_province
1,Naperville,Illinois
2,Henderson,Kentucky
3,Los Angeles,California
4,Huntsville,Texas
5,Laredo,Texas
6,Springfield,Virginia
7,San Francisco,California
8,Bossier City,Louisiana
9,Mount Pleasant,South Carolina
10,Newark,Ohio


### Get data of 80 rows from the table

In [23]:
config_read_customers_table = ReadDbTableModel(table_name="customers", engine=engine_obj
                                               ,columns=["city","state_province"], max_rows=80)
headers_data, rows_data = read_db_table_data(config_read_customers_table)
display_obj = DisplayDataModel(headers=headers_data, rows=rows_data)
display_data(display_obj)

S.no,city,state_province
1,Naperville,Illinois
2,Henderson,Kentucky
3,Los Angeles,California
4,Huntsville,Texas
5,Laredo,Texas
6,Springfield,Virginia
7,San Francisco,California
8,Bossier City,Louisiana
9,Mount Pleasant,South Carolina
10,Newark,Ohio


## Unable to save changes in the jupyter notebook file

I randomly faced an issue where I was not able to save any changes I made to my Jupyter notebook. <br>

The error message looks something like this : <br>
```Unexpected error while saving file: python/database_operation/Database_operations_python.ipynb [Errno 22] Invalid argument: 'D:\\training\\credencys-training-pyspark\\python\\database_operation\\Database_operations_python.ipynb'```

<br>

I googled this issue and found out :
- Windows doesn't allow you to make changes to a file while another program is using it. In my case this was irrelevant, because Jupyter Notebook was the only program accessing the file. I hadn't opened it in any other application.
- I used this code to check whether the directory actually exists where it is supposed to be: ```import os; os.path.exists(r"D:\training\credencys-training-pyspark\python\database_operation")``` The result was True.
- I opened the directory in the file manager and noticed two temp files in it. I simply deleted them, and that resolved the issue.
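
The manual checks above can be sketched as one helper that reports whether the notebook file itself exists and which other files sit next to it, so that stray files are easy to spot. The function name `inspect_notebook_dir` is hypothetical, and the sketch only lists files; it does not guess which ones are temp files or delete anything.

```python
from pathlib import Path


def inspect_notebook_dir(notebook_path: Path) -> dict:
    """Report whether the notebook file exists and which other files sit next to it."""
    directory = notebook_path.parent
    if directory.is_dir():
        # every regular file in the directory except the notebook itself
        siblings = sorted(
            p.name for p in directory.iterdir()
            if p.is_file() and p.name != notebook_path.name
        )
    else:
        siblings = []
    return {
        "directory_exists": directory.is_dir(),
        "notebook_exists": notebook_path.is_file(),
        "other_files": siblings,
    }
```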

<br>

I still don't know why this glitch happened. I will look into it at a later date. This has never happened to me before when I used Jupyter Notebook on Linux platforms. 

<br>

**NOTICE:** It took a while to resolve this issue, and I found the solution by accident. 