**Data Loading and Extraction (SQL Server)**
Tasks:
1. Load all provided datasets into SQL Server.
2. Extract Customers and Orders datasets separately using Python.
3. Perform necessary data joins to create a Unified Customer View that combines:
     * Customer Profiles
     * Order Information
     * All transformations listed below.
4. Load the Unified Customer View back into SQL Server for further analysis and reporting.
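
Tasks 3 and 4 can be sketched roughly as follows. This is only an illustration on tiny in-memory frames: the order columns (`order_id`, `order_amount`) and the name `unified_df` are assumptions, since the real Orders schema is not shown in this section.

```python
import pandas as pd

# Toy stand-ins for the real tables; the order columns are hypothetical
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "customer_name": ["Michelle Kidd", "Brad Newton", "Larry Torres"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 3],
    "order_amount": [25.0, 40.0, 15.5],
})

# A left join keeps every customer, including those with no orders
unified_df = customers.merge(orders, on="customer_id", how="left")
print(unified_df)
```

With the real data, the same `merge` call would presumably run on `customer_df` and `order_df`, and `to_sql` would write the result back to SQL Server.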

In [57]:
server=r'DESKTOP-HJVSCEN\MSSQLSERVER1'  # raw string so the backslash is not treated as an escape
database='Python ETL'
username='sa'
password='Ka@12345678'

In [58]:
import urllib.parse

ConnectionString = f"""
    DRIVER={{ODBC Driver 18 for SQL Server}};
    SERVER={server};
    DATABASE={database};
    UID={username};
    PWD={password};
    TrustServerCertificate=yes;
"""
# URL-encode the connection string for SQLAlchemy
params=urllib.parse.quote_plus(ConnectionString)
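
Hard-coding credentials in a notebook makes them easy to leak. One possible alternative, shown here as a sketch only, is to read them from environment variables. The variable names (`ETL_SQL_SERVER` and so on) and the fallback values are hypothetical placeholders, not settings from this project.

```python
import os
import urllib.parse

# Hypothetical variable names; the fallbacks are placeholders, not real credentials
server = os.environ.get("ETL_SQL_SERVER", r"localhost\SQLEXPRESS")
database = os.environ.get("ETL_SQL_DATABASE", "Python ETL")
username = os.environ.get("ETL_SQL_USER", "etl_user")
password = os.environ.get("ETL_SQL_PASSWORD", "change-me")

# Same ODBC keywords as above, built without embedded newlines
connection_string = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    f"SERVER={server};"
    f"DATABASE={database};"
    f"UID={username};"
    f"PWD={password};"
    "TrustServerCertificate=yes;"
)

# URL-encode so the string can be passed as the odbc_connect query parameter
params = urllib.parse.quote_plus(connection_string)
engine_url = f"mssql+pyodbc:///?odbc_connect={params}"
```

The resulting `params` value can be used in the `create_engine` call exactly as before.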

In [59]:
from sqlalchemy import create_engine
import pandas as pd

engine=create_engine(f"mssql+pyodbc:///?odbc_connect={params}")

In [60]:
customer_df=pd.read_sql("SELECT * FROM CUSTOMERS",con=engine)
order_df=pd.read_sql("SELECT * FROM ORDERS",con=engine)
transaction_df=pd.read_sql("SELECT * FROM TRANSACTIONS",con=engine)

In [61]:
customer_df

,customer_id,customer_name,email,phone,address,registration_date,loyalty_status
0,1,Michelle Kidd,vayala@example.net,6197234258,"USNS Santiago, FPO AE 80872",2025-01-25,Gold
1,2,Brad Newton,taylorcatherine@example.net,5376741158,"38783 Oliver Street, West Kristenborough, MT 9...",2023-07-13,Silver
2,3,Larry Torres,dsanchez@example.net,8102564505,"6845 Steele Turnpike, West Erikabury, UT 37487",2023-08-18,Bronze
3,4,Kimberly Price,jessicaknight@example.com,4232229779,"1631 Alexis Meadows, Lake Amanda, CA 75179",2024-12-08,Gold
4,5,Matthew Phillips,qwilliams@example.com,2207633522,"2274 Williams Heights Suite 895, Andersonhaven...",2024-02-03,Gold
...,...,...,...,...,...,...,...
995,996,Jerry Mcdaniel,walkerlisa@example.net,6389899441,"34746 Smith Gateway, New Sarah, AS 12715",2025-02-10,Silver
996,997,Jodi Simpson,eric24@example.org,4836252940,"2876 Tucker Road Suite 947, North Tommyborough...",2024-04-18,Bronze
997,998,Crystal Brown,pshaffer@example.net,3907473088,"095 Janice Forest Suite 570, Boltonmouth, DE 7...",2024-08-30,Bronze
998,999,Gregory Duarte,caitlindunlap@example.org,2574098196,"Unit 6377 Box 7662, DPO AP 03300",2024-05-16,Gold


Trim leading and trailing whitespace from the customer names using **strip()**

In [62]:
# customer_df['customer_name']=customer_df['customer_name'].str.strip()

Now split the customer name into first name and last name using split()

In [63]:
# customer_df[['first_name','last_name']]=customer_df['customer_name'].str.split(' ',n=1,expand=True)
# customer_df
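
To make the behaviour of the two commented-out steps concrete, here is a small sketch on made-up sample names (only the first two come from the data above; "Mary Ann Smith" and "Cher" are invented to show edge cases).

```python
import pandas as pd

names = pd.Series([" Michelle Kidd ", "Brad Newton", "Mary Ann Smith", "Cher"])

# strip() removes only leading and trailing whitespace
cleaned = names.str.strip()

# n=1 splits at the first space only, so everything after it lands in the second column
parts = cleaned.str.split(" ", n=1, expand=True)
parts.columns = ["first_name", "last_name"]
print(parts)
```

With `n=1`, a middle name ends up in `last_name`, and a single-word name leaves `last_name` missing. This is one reason a more careful splitter may be worth having.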

**Trimming and splitting the names using regex and NLP (spaCy)**

In [64]:
# `re` is part of the Python standard library, so it needs no installation
# %pip install spacy
# !python -m spacy download en_core_web_sm



In [65]:
import sys
print(sys.executable)
!python -m spacy validate



c:\Users\Tanuj\Documents\UseCases\Usecases\Scripts\python.exe

✔ Loaded compatibility table

ℹ spaCy installation:
c:\Users\Tanuj\Documents\UseCases\Usecases\Lib\site-packages\spacy

NAME             SPACY            VERSION
en_core_web_sm   >=3.8.0,<3.9.0   3.8.0   ✔



In [66]:
!c:/Users/Tanuj/Documents/UseCases/Usecases/Scripts/python.exe -m spacy download en_core_web_sm


Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)


In [67]:
!python -m spacy link en_core_web_sm en-core-web-sm


⚠ As of spaCy v3.0, model symlinks are not supported anymore. You can
load trained pipeline packages using their full names or from a directory
path.




In [54]:
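import re
import spacy

# Load the small English pipeline; only its token attributes are used below
nlp = spacy.load("en_core_web_sm")

def clean_name(name):
    """Strip whitespace, a leading title (Mr, Mrs, Ms, Dr, Jr) and a trailing suffix (Jr, Sr, II, III, IV)."""
    name=name.strip()
    name=re.sub(r'^(Mr\.?|Mrs\.?|Ms\.?|Dr\.?|Jr\.?)\s+','',name,flags=re.IGNORECASE)
    name=re.sub(r'\s+(Jr\.?|Sr\.?|II|III|IV)$','',name,flags=re.IGNORECASE)
    return name


def get_parts(name):
    """Return the first and last token as a Series; any middle tokens are dropped."""
    name=clean_name(name)
    doc=nlp(name)
    # Keep word tokens only, skipping punctuation and whitespace tokens
    tokens=[token.text for token in doc if not token.is_punct and not token.is_space]

    first=tokens[0] if len(tokens)>0 else ''
    last=tokens[-1] if len(tokens)>1 else ''
    return pd.Series([first,last])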
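
For comparison, the same cleanup can be tried without spaCy. The sketch below reuses the two regular expressions from the cell above and then splits on whitespace. `split_name` is a hypothetical helper, not part of the notebook, and the sample names are made up. A plain whitespace split keeps a hyphenated name such as "Anne-Marie" as one token, whereas spaCy's tokenizer may break it into several.

```python
import re

# Same patterns as in clean_name above, precompiled
TITLE = re.compile(r'^(Mr\.?|Mrs\.?|Ms\.?|Dr\.?|Jr\.?)\s+', flags=re.IGNORECASE)
SUFFIX = re.compile(r'\s+(Jr\.?|Sr\.?|II|III|IV)$', flags=re.IGNORECASE)

def split_name(name):
    """Hypothetical helper: regex cleanup followed by a plain whitespace split."""
    name = SUFFIX.sub('', TITLE.sub('', name.strip()))
    tokens = name.split()
    first = tokens[0] if tokens else ''
    last = tokens[-1] if len(tokens) > 1 else ''
    return first, last

for raw in ["Dr. John Smith Jr.", "Mrs. Anne-Marie Lee", "  Larry Torres  ", "Cher"]:
    print(repr(raw), "->", split_name(raw))
```

Whether the spaCy version or this lighter one fits better depends on how messy the real names are; for 1,000 rows either should be fast enough.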

In [72]:
customer_df[['first_name', 'last_name']] = customer_df['customer_name'].apply(get_parts)

In [73]:
customer_df.columns

Index(['customer_id', 'customer_name', 'email', 'phone', 'address',
       'registration_date', 'loyalty_status', 'first_name', 'last_name'],
      dtype='object')

In [74]:
new_order=['customer_id','customer_name', 'first_name', 'last_name', 'email', 'phone', 'address',
       'registration_date', 'loyalty_status' ]

In [75]:
customer_df=customer_df[new_order]

In [76]:
customer_df

,customer_id,customer_name,first_name,last_name,email,phone,address,registration_date,loyalty_status
0,1,Michelle Kidd,Michelle,Kidd,vayala@example.net,6197234258,"USNS Santiago, FPO AE 80872",2025-01-25,Gold
1,2,Brad Newton,Brad,Newton,taylorcatherine@example.net,5376741158,"38783 Oliver Street, West Kristenborough, MT 9...",2023-07-13,Silver
2,3,Larry Torres,Larry,Torres,dsanchez@example.net,8102564505,"6845 Steele Turnpike, West Erikabury, UT 37487",2023-08-18,Bronze
3,4,Kimberly Price,Kimberly,Price,jessicaknight@example.com,4232229779,"1631 Alexis Meadows, Lake Amanda, CA 75179",2024-12-08,Gold
4,5,Matthew Phillips,Matthew,Phillips,qwilliams@example.com,2207633522,"2274 Williams Heights Suite 895, Andersonhaven...",2024-02-03,Gold
...,...,...,...,...,...,...,...,...,...
995,996,Jerry Mcdaniel,Jerry,Mcdaniel,walkerlisa@example.net,6389899441,"34746 Smith Gateway, New Sarah, AS 12715",2025-02-10,Silver
996,997,Jodi Simpson,Jodi,Simpson,eric24@example.org,4836252940,"2876 Tucker Road Suite 947, North Tommyborough...",2024-04-18,Bronze
997,998,Crystal Brown,Crystal,Brown,pshaffer@example.net,3907473088,"095 Janice Forest Suite 570, Boltonmouth, DE 7...",2024-08-30,Bronze
998,999,Gregory Duarte,Gregory,Duarte,caitlindunlap@example.org,2574098196,"Unit 6377 Box 7662, DPO AP 03300",2024-05-16,Gold


In [77]:
customer_df.to_sql("customers",con=engine,index=False,if_exists='replace')

68
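
A quick way to check a load like this is to read the table back and compare row counts. The sketch below uses an in-memory SQLite database as a stand-in for SQL Server (an assumption made so the example runs anywhere) and a three-row toy frame.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for SQL Server here; this is only for the demo
conn = sqlite3.connect(":memory:")

df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "first_name": ["Michelle", "Brad", "Larry"],
    "last_name": ["Kidd", "Newton", "Torres"],
})

# if_exists='replace' drops any existing table of that name before writing
df.to_sql("customers", con=conn, index=False, if_exists="replace")

# Read the table back and compare row counts as a basic load check
loaded = pd.read_sql("SELECT * FROM customers", con=conn)
print(len(df), len(loaded))
conn.close()
```

Because `if_exists='replace'` drops the existing table, writing to `customers` in the notebook likely overwrites the source `CUSTOMERS` table if the database collation is case-insensitive. Writing to a separate table name would keep the raw data intact.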