<a href="https://colab.research.google.com/github/Animeshcoder/MySQL-Python/blob/main/Python_MySQL_P8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Introduction:**
This project extracts data from an existing table in a MySQL database, transforms it with a custom function that checks for specific ID values and fills new rows with values from the Value column, and inserts the transformed data into a new table in a different database. The result is the same data reorganized so that each kind of value (Name, Phone No, Date of Birth, Age) has its own column.

The code connects to a MySQL database by passing the necessary credentials to the create_engine function from the sqlalchemy library. It then builds an SQL query to extract data from an existing table in that database.

The query is executed using the read_sql_query function from the pandas library, which returns the result as a DataFrame. The code then creates a connection to a different database where the transformed data will be inserted.

A function named process_group is defined to process each group of rows with the same values in two columns. This function checks whether any row in the group has an ID value that belongs to a specific set of values ('1' or '2' in this code). If so, it starts a new entry with all columns set to NULL. Otherwise, it starts from an empty row and fills in only the columns that match.

The function then fills a new row with values from the Value column corresponding to each ID. This new row is returned by the function.

The process_group function is applied to each group of rows with the same values in the ID and Value columns using the groupby and apply methods of the DataFrame. The result is a new DataFrame containing the transformed data.

Finally, this new DataFrame is saved to the new database with the DataFrame's to_sql method, which takes the table name, the engine for the target database, and options that control how existing data is handled.

Here’s a step-by-step tutorial explaining each part of the code:

**Import necessary libraries:** Import the pandas, sqlalchemy, and urllib.parse libraries.

In [None]:
import pandas as pd
from sqlalchemy import create_engine
import urllib.parse

**Create a connection to the MySQL database:** Use the create_engine function from the sqlalchemy library to create a connection to the MySQL database. Provide the necessary credentials such as host, user, password, and database name. The mysql+pymysql prefix in the URL means the PyMySQL driver must also be installed. Use the quote function from the urllib.parse library to percent-encode special characters in the password, such as "@", which would otherwise be read as part of the URL syntax.

In [None]:
password = "yourpassword@123"
# Percent-encode special characters (e.g. "@") so they do not break the URL
password = urllib.parse.quote(password)
engine = create_engine(f"mysql+pymysql://youruser:{password}@yourhost/yourdatabasename")
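As a quick check of what the encoding step does, the sketch below builds the same kind of connection URL using only urllib.parse. The helper name build_mysql_url is hypothetical, and passing safe="" to quote is an assumption made here so that "/" in a password is escaped as well (quote leaves "/" untouched by default).

```python
from urllib.parse import quote

def build_mysql_url(user, password, host, database):
    """Hypothetical helper: assemble a mysql+pymysql URL with an encoded password."""
    # safe="" makes quote escape "/" too, which it would otherwise leave as-is
    encoded = quote(password, safe="")
    return f"mysql+pymysql://{user}:{encoded}@{host}/{database}"

# The "@" in the tutorial's placeholder password becomes %40
print(quote("yourpassword@123"))  # yourpassword%40123

# A made-up password containing both "@" and "/"
print(build_mysql_url("youruser", "p@ss/word", "yourhost", "yourdatabasename"))
```

The resulting string can be passed to create_engine in place of the f-string above.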

**Write an SQL query to extract data from an existing table:** Write an SQL query to select data from an existing table in the database based on certain conditions.

In [None]:
query = """
    SELECT ID, Value FROM yourdatabasename.tablename
    WHERE Value IS NOT NULL AND ID_No > '363000' AND ID IN ('1', '2', '3', '4', '5', '6', '7', '8')
"""

**Execute the query and store the result in a DataFrame:** Use the read_sql_query function from the pandas library to execute the query and store the result in a DataFrame.

In [None]:
df = pd.read_sql_query(query, engine)
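The filter in the query can be tried without a MySQL server. The sketch below is an illustration only: it swaps in the standard-library sqlite3 module and an in-memory table with made-up rows (the sample values are assumptions, not data from the original table), and it passes the threshold and the ID list as bound parameters instead of writing them into the SQL string.

```python
import sqlite3

# In-memory stand-in for the source table; the rows are invented for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ID_No INTEGER, ID TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        (363001, "1", "Alice"),
        (363001, "3", "555-0100"),
        (363001, "9", "skipped"),   # dropped: ID not in the wanted list
        (363002, "2", None),        # dropped: Value is NULL
        (362999, "1", "too old"),   # dropped: ID_No is not above the threshold
    ],
)

wanted_ids = [str(i) for i in range(1, 9)]           # '1' .. '8'
placeholders = ", ".join("?" for _ in wanted_ids)    # one "?" per ID
sql = (
    "SELECT ID, Value FROM tablename "
    f"WHERE Value IS NOT NULL AND ID_No > ? AND ID IN ({placeholders}) "
    "ORDER BY ID"
)
rows = conn.execute(sql, [363000, *wanted_ids]).fetchall()
print(rows)
```

Only the two rows that satisfy all three conditions come back. With pandas and SQLAlchemy the placeholder syntax differs by driver, so treat this as a picture of the filtering logic and not as a drop-in replacement.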

**Create a connection to a different database:** Create a connection to a different MySQL database where you want to insert the transformed data.

In [None]:
# "yournewdatabasename" is a placeholder for the target database, which differs from the source
new_engine = create_engine(f"mysql+pymysql://youruser:{password}@yourhost/yournewdatabasename")

**Define a function to process each group of rows:** Define a function named process_group that takes as input a group of rows with the same values in two columns. The function checks whether any row in the group has an ID value that belongs to a specific set of values ('1' or '2' here). If so, it starts a new entry with the Name, Phone No, Date of Birth, and Age columns set to NULL. Otherwise, it starts from an empty row. It then loops over the group's ID and Value pairs, writes each Value into the column that corresponds to its ID, and returns the resulting row.

In [None]:
def process_group(group):
    # check if any of these rows have an ID value that is in a specific group of values
    if group["ID"].isin(['1', '2']).any():
        # start a new entry with other remaining columns set to NULL
        new_row = pd.Series({"Name": None, "Phone No": None, "Date of Birth": None, "Age": None})
    else:
        # otherwise start from an empty row; only the columns matched below get set
        new_row = pd.Series(dtype=object)

    # fill new_row with values from Value column corresponding to each ID
    for form_meta_id, value in group[["ID", "Value"]].values:
        if form_meta_id in ['1', '2']:
            new_row["Name"] = value
        elif form_meta_id in ['3','4']:
            new_row["Phone No"] = value
        elif form_meta_id in ['5','6']:
            new_row["Date of Birth"] = value
        elif form_meta_id in ['7','8']:
            new_row["Age"] = value
    return new_row
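The if/elif chain in process_group is effectively a lookup from ID to column name. As an illustration of that mapping only, the sketch below expresses it with a plain dictionary and no pandas. The names ID_TO_COLUMN and row_from_pairs are hypothetical, and the sample values are made up.

```python
# Hypothetical lookup table mirroring the ID sets used in process_group
ID_TO_COLUMN = {
    "1": "Name", "2": "Name",
    "3": "Phone No", "4": "Phone No",
    "5": "Date of Birth", "6": "Date of Birth",
    "7": "Age", "8": "Age",
}

def row_from_pairs(pairs):
    """Build one output row (a dict) from an iterable of (ID, Value) pairs."""
    row = {}
    for id_value, value in pairs:
        column = ID_TO_COLUMN.get(id_value)
        if column is not None:   # IDs outside the table are skipped
            row[column] = value  # a later value for the same column overwrites an earlier one
    return row

example = row_from_pairs([("1", "Alice"), ("3", "555-0100"), ("7", "30"), ("9", "x")])
print(example)
```

As in the original loop, an ID that matches no branch is ignored, and when two IDs map to the same column the last one wins.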

**Apply the process_group function to each group of rows:** Use the groupby and apply methods of the DataFrame to apply the process_group function to each group of rows with the same values in the ID and Value columns. The result is a new DataFrame containing the transformed data; reset_index turns the group keys back into ordinary columns.

In [None]:
column_data = df.groupby(["ID", "Value"]).apply(process_group).reset_index()

**Save the transformed data to the new database:** Use the to_sql method of the DataFrame to save the transformed data to a new table in the new database. Pass the table name and the engine for the target database. Setting index=False leaves the DataFrame index out of the table, and if_exists="append" adds the rows to the table if it already exists instead of raising an error or replacing it.

In [None]:
column_data.to_sql("newtable", new_engine, index=False, if_exists="append")