## Importing *people_increased.csv* into `MySQL`.

In [1]:
# Importing needed packages.

import time                           # Built-in, no version.
import pandas as pd                   # Version 2.2.1.
from datetime import datetime         # Built-in, no version.
from sqlalchemy import create_engine  # Version 2.0.28.

In [2]:
start_time = time.time()  # Start the clock; the elapsed time is used to evaluate performance.

# Connection details: these are my credentials and database name; change them if you intend to replicate this.
user = 'root'
password = 'password'
host = 'localhost'
database = 'people'

# Creating the engine for database connection.
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')

# Specify the chunk size: the file must be read in smaller pieces, otherwise the system runs out of RAM and crashes.
chunksize = 10000

# Specify the path to your csv file, mine is in the same directory.
csv_file = 'people_increased.csv'

# Read and insert the data in chunks.
for chunk in pd.read_csv(csv_file, chunksize=chunksize):
    chunk.to_sql('people_data', con=engine, if_exists='append', index=False)
    
end_time = time.time() # Stopping the clock.
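
The chunked loop above can be tried on a small scale without a MySQL server. The following is a minimal sketch under stated assumptions, not the run recorded in this notebook: it swaps in an in-memory SQLite database for MySQL, uses a hypothetical five-row CSV in place of *people_increased.csv*, and wraps the loop in a hypothetical helper named `load_csv_in_chunks`. It shows that `pd.read_csv(..., chunksize=...)` yields one DataFrame per chunk and that `if_exists='append'` accumulates all of them in the same table.

```python
import io
import sqlite3

import pandas as pd


def load_csv_in_chunks(csv_source, con, table, chunksize):
    """Append a CSV to `table` chunk by chunk and return the rows written.

    Hypothetical helper for illustration; it mirrors the loop in the cell above.
    """
    rows_written = 0
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Each chunk is an ordinary DataFrame holding at most `chunksize` rows.
        chunk.to_sql(table, con=con, if_exists='append', index=False)
        rows_written += len(chunk)
    return rows_written


# Assumption: a tiny made-up sample standing in for people_increased.csv.
sample_csv = io.StringIO("name,age\nAna,31\nBen,45\nCai,27\nDee,52\nEli,38\n")

# Assumption: in-memory SQLite instead of the MySQL engine used in this notebook.
con = sqlite3.connect(':memory:')

rows_written = load_csv_in_chunks(sample_csv, con, 'people_data', chunksize=2)
rows_in_table = con.execute('SELECT COUNT(*) FROM people_data').fetchone()[0]
print(f"Rows written: {rows_written}, rows in table: {rows_in_table}")
```

With a chunk size of 2, the five rows arrive as three chunks (2, 2 and 1 rows), so only one chunk is held in memory at a time.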

In [3]:
execution_time = end_time - start_time # Calculating running time.

# Convert execution time to minutes and seconds.
minutes = int(execution_time // 60)
seconds = int(execution_time % 60)

print(f"Running time to import a 1.6GB csv into MySQL: {minutes} minutes {seconds} seconds")

Running time to import a 1.6GB csv into MySQL: 6 minutes 31 seconds
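
The minutes-and-seconds conversion above can also be written with `divmod`. The sketch below is one possible variant, not the code that produced the figure above: `format_elapsed` is a hypothetical helper, and `time.perf_counter()` is used in place of `time.time()` because it is a monotonic clock intended for measuring intervals.

```python
import time


def format_elapsed(elapsed_seconds):
    """Format a duration in seconds as 'M minutes S seconds' (hypothetical helper)."""
    # divmod returns the quotient and remainder in one step.
    minutes, seconds = divmod(int(elapsed_seconds), 60)
    return f"{minutes} minutes {seconds} seconds"


start = time.perf_counter()   # Monotonic clock, unaffected by system clock changes.
time.sleep(0.01)              # Placeholder for the real workload.
elapsed = time.perf_counter() - start

print(f"Running time: {format_elapsed(elapsed)}")
```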


## Why this script?
* A 1.6GB dataset had to be uploaded urgently into the table *people_data* of the schema *people*, but the `MySQL Workbench` feature `Data Import` under the `Server` option was not functioning, so another way to perform the upload into `MySQL` was needed. This script made the import possible.

Because of the recording requirement and the time constraint of 5 to 7 minutes, I am adding a timestamp that shows exactly when the script was last run.

In [4]:
# Getting current date and time.
current_time = datetime.now()

# Formatting the date and time in a readable format:
formatted_time = current_time.strftime('%B %d, %Y, %H:%M:%S')

# Print the formatted date and time.
print(f"2.Importing_1.6GB_CSV_to_MySQL.ipynb was last run on: {formatted_time}")

2.Importing_1.6GB_CSV_to_MySQL.ipynb was last run on: April 05, 2024, 09:44:49
