#### You can use Parquet, a columnar storage file format optimized for big data processing frameworks. The function below first writes the DataFrame to a Parquet file, then reads that file back and loads its contents into MySQL:

#### **Parquet** and **PyArrow** are two different but related technologies:

##### **Parquet** is an efficient, compressed, column-oriented storage format for arrays and tables of data. Because the values of each column are stored together, a reader can load only the columns it needs, which makes processing faster. It is widely used in big data processing frameworks.

##### **PyArrow**, on the other hand, is the Python library of the Apache Arrow project. It provides Python APIs for creating and manipulating Arrow's in-memory columnar format, and it includes a Pythonic API for reading and writing Parquet files.

##### In short, Parquet is a storage format, and PyArrow is a Python library that can read and write that format, among other things. The two are often used together in data processing pipelines, especially in big data contexts.


In [42]:
!pip install mysqlclient
!pip install python-dotenv
!pip install pymysql
# The next cell also imports these packages
!pip install pandas sqlalchemy pyarrow



In [43]:
import pandas as pd
from sqlalchemy import create_engine
from dotenv import load_dotenv
import os
import logging
from sqlalchemy.exc import SQLAlchemyError
import pyarrow.parquet as pq
import pyarrow as pa

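# Load variables from the .env file into the environment
# (the encoding argument must match the encoding the .env file was saved in)
load_dotenv(override=True, encoding='utf-16')

# Read the database connection string from the environment
database_url = os.getenv('DATABASE_URL')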

def load_csv_to_mysql_parquet(file_path, table_name, database_url):
    """Load a CSV file into a MySQL table, staging the data in a Parquet file."""
    engine = None
    try:
        # Create a connection to the MySQL database using the provided database URL
        engine = create_engine(database_url, pool_size=10, max_overflow=20)

        # Read the data from the csv file into a pandas dataframe
        df = pd.read_csv(file_path)

        # Convert the pandas dataframe to a PyArrow Table
        table = pa.Table.from_pandas(df)

        # Write the PyArrow Table to a Parquet file
        pq.write_table(table, 'temp.parquet')

        # Read the Parquet file into a new pandas dataframe
        df_new = pd.read_parquet('temp.parquet')

        # Append the rows to the MySQL table in chunks of 100,000
        df_new.to_sql(table_name, con=engine, if_exists='append', chunksize=100000, index=False)

        print(f"Data loaded successfully into the '{table_name}' table.")

    except FileNotFoundError as e:
        logging.error(f"Error: CSV file '{file_path}' not found. {e}")

    except pd.errors.EmptyDataError as e:
        logging.error(f"Error: CSV file '{file_path}' is empty. {e}")

    except SQLAlchemyError as e:
        logging.error(f"SQLAlchemy Error: {e}")

    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}")

    finally:
        # Release pooled connections if the engine was created
        if engine is not None:
            engine.dispose()

##### In this modified version of the function, `pd.read_csv(file_path)` returns a pandas DataFrame. The DataFrame is converted to a PyArrow Table and written to a Parquet file. The Parquet file is then read back into a new pandas DataFrame, which is appended to the MySQL table with the `to_sql` method.

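##### Please note that this approach writes a temporary Parquet file (`temp.parquet`) to the current working directory and reads it back, so make sure you have enough disk space. The file name is fixed, so each call overwrites the file left by the previous call.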
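
If leaving `temp.parquet` on disk is a concern, one option is to stage the file in a temporary directory that is deleted automatically. The sketch below is an assumption about how that could look, not part of the original function; `temporary_parquet_path` is a hypothetical helper name.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_parquet_path(file_name="temp.parquet"):
    """Yield a file path inside a fresh temporary directory.

    The directory and everything in it are removed when the block exits,
    even if an exception is raised inside the block.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        yield os.path.join(tmp_dir, file_name)


# Hypothetical use inside the loading function:
# with temporary_parquet_path() as parquet_path:
#     pq.write_table(table, parquet_path)
#     df_new = pd.read_parquet(parquet_path)
```

Because each call gets its own directory, two loads running at the same time would not overwrite each other's staging file.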

#### Load users data into the Users Table

In [44]:
# Usage: call the function
file_path = "C:/Users/Bookie/Documents/Dufuna_Documentation/CapstoneProject/data_tables/users.csv"
table_name = "users"
database_url = os.getenv('DATABASE_URL')

if __name__=="__main__":
    load_csv_to_mysql_parquet(file_path, table_name, database_url)

Data loaded successfully into the 'users' table.


#### Load events data into the Events Table

In [45]:
file_path = "C:/Users/Bookie/Documents/Dufuna_Documentation/CapstoneProject/data_tables/events.csv"
table_name = "events"
database_url = os.getenv('DATABASE_URL')

if __name__=="__main__":
    load_csv_to_mysql_parquet(file_path, table_name, database_url)

Data loaded successfully into the 'events' table.


#### Load distribution centers data into the Distribution_Centers Table

In [46]:
file_path = "C:/Users/Bookie/Documents/Dufuna_Documentation/CapstoneProject/data_tables/distribution_centers.csv"
table_name = "distribution_centers"
database_url = os.getenv('DATABASE_URL')

if __name__=="__main__":
    load_csv_to_mysql_parquet(file_path, table_name, database_url)

Data loaded successfully into the 'distribution_centers' table.


#### Load orders data into the Orders Table

In [47]:
file_path = "C:/Users/Bookie/Documents/Dufuna_Documentation/CapstoneProject/data_tables/orders.csv"
table_name = "orders"
database_url = os.getenv('DATABASE_URL')

if __name__=="__main__":    
    load_csv_to_mysql_parquet(file_path, table_name, database_url)

Data loaded successfully into the 'orders' table.


#### Load products data into the Products Table

In [48]:
file_path = "C:/Users/Bookie/Documents/Dufuna_Documentation/CapstoneProject/data_tables/products.csv"
table_name = "products"
database_url = os.getenv('DATABASE_URL')

if __name__=="__main__":    
    load_csv_to_mysql_parquet(file_path, table_name, database_url)

Data loaded successfully into the 'products' table.


#### Load inventory items data into the Inventory_Items Table

In [49]:
file_path = "C:/Users/Bookie/Documents/Dufuna_Documentation/CapstoneProject/data_tables/inventory_items.csv"
table_name = "inventory_items"
database_url = os.getenv('DATABASE_URL')

if __name__=="__main__":    
    load_csv_to_mysql_parquet(file_path, table_name, database_url)

Data loaded successfully into the 'inventory_items' table.


#### Load order items data into the Order_Items Table

In [50]:
file_path = "C:/Users/Bookie/Documents/Dufuna_Documentation/CapstoneProject/data_tables/order_items.csv"
table_name = "order_items"
database_url = os.getenv('DATABASE_URL')

if __name__=="__main__":    
    load_csv_to_mysql_parquet(file_path, table_name, database_url)

Data loaded successfully into the 'order_items' table.
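
The seven cells above differ only in the table name, and each CSV file is named after its table, so the calls could be driven by a loop. The following is a sketch under that assumption; `load_all_tables` and its `loader` parameter are hypothetical names introduced here, and the loader is passed in so the helper does not depend on a database connection.

```python
from pathlib import Path

# Table names taken from the cells above; each CSV file is named after its table
TABLE_NAMES = [
    "users",
    "events",
    "distribution_centers",
    "orders",
    "products",
    "inventory_items",
    "order_items",
]


def load_all_tables(data_dir, database_url, loader, table_names=TABLE_NAMES):
    """Call loader(file_path, table_name, database_url) once per table, in order."""
    loaded = []
    for table_name in table_names:
        file_path = Path(data_dir) / f"{table_name}.csv"
        loader(str(file_path), table_name, database_url)
        loaded.append(table_name)
    return loaded
```

In this notebook it would be called as `load_all_tables(data_dir, database_url, load_csv_to_mysql_parquet)`, where `data_dir` is the `data_tables` folder used in the cells above.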
