# String Functions & Text Analysis in SQL

## Introduction

String functions in SQL are used to manipulate, clean, and analyze textual data stored in database tables. In real-world datasets, text fields such as names, cities, emails, product descriptions, and categories often require transformation before analysis.

SQL provides built-in string functions to modify case, remove unwanted spaces, extract substrings, replace patterns, and perform text-based filtering.

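Text cleaning is a routine step in data preprocessing, reporting, and validation workflows. For example, a stray leading space makes `' Mumbai'` and `'Mumbai'` count as two different cities in a `GROUP BY` until the value is trimmed. Consistent use of string functions keeps such values comparable and the resulting counts accurate.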

This notebook focuses on practical string operations commonly used in data analysis environments.

---

## Topics Covered

- Case conversion (`UPPER`, `LOWER`)
- String length (`LENGTH`)
- Concatenation (`CONCAT`)
- String extraction (`SUBSTRING`, `LEFT`, `RIGHT`)
- Trimming and cleaning text (`TRIM`, `LTRIM`, `RTRIM`)
- Text replacement (`REPLACE`)
- Pattern matching with `LIKE` and `REGEXP`
- Locating substrings (`INSTR`, `POSITION`)
- Handling inconsistent or messy text data


---

### Database Connection 

In [1]:
%reload_ext sql
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
%sql mysql+pymysql://root:<password>@localhost/customers

In [3]:
%%sql
select * from employees;

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


employee_id,name,department,role,salary,joining_date,city
1,Amit,IT,Data Analyst,60000,2022-01-15,Mumbai
2,Neha,HR,HR Executive,45000,2021-07-10,Pune
3,Rahul,IT,Data Scientist,85000,2020-03-20,Bangalore
4,Priya,Finance,Accountant,50000,2022-11-01,Delhi
5,Suresh,IT,Backend Engineer,75000,2019-06-18,Hyderabad
6,Anita,Marketing,Marketing Manager,55000,2021-02-25,Mumbai
7,Vikram,Finance,Financial Analyst,65000,2020-09-12,Chennai
8,Rohit,Sales,Sales Executive,48000,2023-02-05,Delhi


## Basic String Functions

### UPPER and LOWER 

### Example 1: Convert names to uppercase

In [4]:
%%sql 
select name,
UPPER(name) as upper_name
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,upper_name
Amit,AMIT
Neha,NEHA
Rahul,RAHUL
Priya,PRIYA
Suresh,SURESH
Anita,ANITA
Vikram,VIKRAM
Rohit,ROHIT


### Example 2: Convert department to lowercase

In [5]:
%%sql 
select department, 
lower(department) as low_dept 
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


department,low_dept
IT,it
HR,hr
IT,it
Finance,finance
IT,it
Marketing,marketing
Finance,finance
Sales,sales


### LENGTH()

### Example 3: Find the length of each employee's name

In [7]:
%%sql
select name , 
LENGTH(name) as name_len 
from employees;

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,name_len
Amit,4
Neha,4
Rahul,5
Priya,5
Suresh,6
Anita,5
Vikram,6
Rohit,5
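
In MySQL, `LENGTH()` counts bytes, while `CHAR_LENGTH()` counts characters. The two agree for the ASCII names above but diverge once a value contains multibyte characters. The following Python sketch illustrates the distinction only; it does not query MySQL, it assumes UTF-8 storage, and the second sample name is hypothetical rather than taken from the `employees` table.

```python
# Illustration of character count vs. byte count (assumes UTF-8 encoding).
# "Amit" comes from the employees table; "Jos\u00e9" is a hypothetical value.
names = ["Amit", "Jos\u00e9"]


def char_and_byte_length(text: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


lengths = {name: char_and_byte_length(name) for name in names}

for name, (chars, size) in lengths.items():
    # The character count corresponds to CHAR_LENGTH(), the byte count to LENGTH().
    print(f"{name}: {chars} characters, {size} bytes")
```

When the goal is "how many letters", `CHAR_LENGTH(name)` is usually the safer choice in MySQL.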


### CONCAT & String Combination

### Example 4: Create full description

In [11]:
%%sql 
select CONCAT(name ,' works in ', department) as description 
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


description
Amit works in IT
Neha works in HR
Rahul works in IT
Priya works in Finance
Suresh works in IT
Anita works in Marketing
Vikram works in Finance
Rohit works in Sales


### Example 5: Create employee label

In [12]:
%%sql 
select CONCAT('EMP-',employee_id,'-',department) as emp_code 
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


emp_code
EMP-1-IT
EMP-2-HR
EMP-3-IT
EMP-4-Finance
EMP-5-IT
EMP-6-Marketing
EMP-7-Finance
EMP-8-Sales


### SUBSTRING & Extraction

### Example 6: First 3 letters of name

In [14]:
%%sql 
select name,  
SUBSTRING(name,1,3) as short_name 
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,short_name
Amit,Ami
Neha,Neh
Rahul,Rah
Priya,Pri
Suresh,Sur
Anita,Ani
Vikram,Vik
Rohit,Roh


### Example 7: Extract year from joining date (as string)

In [15]:
%%sql 
select name,
SUBSTRING(joining_date,1,4) as Join_year
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,Join_year
Amit,2022
Neha,2021
Rahul,2020
Priya,2022
Suresh,2019
Anita,2021
Vikram,2020
Rohit,2023


### TRIM & Cleaning

### Example 8: Remove extra spaces

In [17]:
%%sql
select TRIM('        SQL      ') as cleaned_text

 * mysql+pymysql://root:***@localhost/customers
1 rows affected.


cleaned_text
SQL


### Example 9: Remove leading & trailing spaces from name

In [19]:
%%sql 
select name ,
TRIM(name) as clean_name 
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,clean_name
Amit,Amit
Neha,Neha
Rahul,Rahul
Priya,Priya
Suresh,Suresh
Anita,Anita
Vikram,Vikram
Rohit,Rohit
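
`LTRIM` and `RTRIM` strip spaces from one side only. As a minimal sketch, the snippet below runs all three trimming functions on the same literal. It uses an in-memory SQLite database from Python's standard library as a stand-in for MySQL (an assumption made so the example is self-contained), and SQLite's `||` operator where MySQL would use `CONCAT`.

```python
import sqlite3

# SQLite stands in for MySQL here; LTRIM/RTRIM/TRIM behave the same for spaces.
conn = sqlite3.connect(":memory:")

# Brackets make the surviving spaces visible in the result.
row = conn.execute(
    """
    SELECT '[' || LTRIM('  SQL  ') || ']' AS left_trimmed,
           '[' || RTRIM('  SQL  ') || ']' AS right_trimmed,
           '[' || TRIM('  SQL  ')  || ']' AS both_trimmed
    """
).fetchone()
conn.close()

print(row)  # ('[SQL  ]', '[  SQL]', '[SQL]')
```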


### REPLACE()

### Example 10: Replace 'IT' with 'Information Technology'

In [20]:
%%sql 
select department ,
REPLACE(department,'IT','Information Technology') as updated_dept
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


department,updated_dept
IT,Information Technology
HR,HR
IT,Information Technology
Finance,Finance
IT,Information Technology
Marketing,Marketing
Finance,Finance
Sales,Sales
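
`REPLACE` substitutes every case-sensitive occurrence of the search string, including occurrences inside longer values. The sketch below contrasts it with a `CASE` expression that maps only exact matches. It assumes an in-memory SQLite database as a stand-in for MySQL, and the `dept_demo` table and its `AUDIT` row are hypothetical values added to expose the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.execute("CREATE TABLE dept_demo (department TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO dept_demo VALUES (?)",
    [("IT",), ("AUDIT",), ("HR",)],
)

rows = conn.execute(
    """
    SELECT department,
           -- Replaces 'IT' wherever it appears, even inside 'AUDIT'.
           REPLACE(department, 'IT', 'Information Technology') AS replaced,
           -- Maps only values that are exactly 'IT'.
           CASE WHEN department = 'IT' THEN 'Information Technology'
                ELSE department
           END AS mapped
    FROM dept_demo
    ORDER BY rowid
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

For the `AUDIT` row the `replaced` column becomes `AUDInformation Technology`, while `mapped` leaves it unchanged.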


### Example 11: Format salary with a currency symbol

In [26]:
%%sql
select name, 
CONCAT('₹',salary) as formatted_salary
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,formatted_salary
Amit,₹60000
Neha,₹45000
Rahul,₹85000
Priya,₹50000
Suresh,₹75000
Anita,₹55000
Vikram,₹65000
Rohit,₹48000


### Pattern Matching with LIKE

### Example 12: Names starting with 'A'

In [28]:
%%sql 
select name
from employees 
where name LIKE 'A%'

 * mysql+pymysql://root:***@localhost/customers
2 rows affected.


name
Amit
Anita


### Example 13: Names ending with 'a'

In [32]:
%%sql 
select name 
from employees 
where name LIKE '%a'

 * mysql+pymysql://root:***@localhost/customers
3 rows affected.


name
Neha
Priya
Anita


### Example 14: Names containing 'it'

In [34]:
%%sql
select name 
from employees
where name LIKE '%it%'

 * mysql+pymysql://root:***@localhost/customers
3 rows affected.


name
Amit
Anita
Rohit
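
Besides `%` (any run of characters, including none), `LIKE` supports `_`, which matches exactly one character. A small sketch of both wildcards follows, run against the eight names from the `employees` table. It assumes an in-memory SQLite database as a stand-in for MySQL, and `names_like` is a hypothetical helper written for this example.

```python
import sqlite3

names = ["Amit", "Neha", "Rahul", "Priya", "Suresh", "Anita", "Vikram", "Rohit"]

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.execute("CREATE TABLE employees (name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)", [(n,) for n in names])


def names_like(pattern: str) -> list[str]:
    """Return names matching a LIKE pattern, in insertion order."""
    cur = conn.execute(
        "SELECT name FROM employees WHERE name LIKE ? ORDER BY rowid",
        (pattern,),
    )
    return [row[0] for row in cur]


four_letters = names_like("____")  # exactly four characters
second_is_o = names_like("_o%")    # any first character, then 'o', then anything
conn.close()

print(four_letters)  # ['Amit', 'Neha']
print(second_is_o)   # ['Rohit']
```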


### REGEXP (Advanced Pattern Matching)

### Example 15: Names starting with A or R

In [36]:
%%sql 
select name
from employees 
where name REGEXP '^[AR]'

 * mysql+pymysql://root:***@localhost/customers
4 rows affected.


name
Amit
Rahul
Anita
Rohit
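
Regular expressions can combine anchors and character classes in a single pattern, for example `^[AR].*t$` for names that start with A or R and end with t. The sketch below is an approximation rather than MySQL itself: SQLite has no built-in `REGEXP`, so the example registers a Python function under that name (an assumption of this sketch), and Python's `re` module is case-sensitive, whereas MySQL's `REGEXP` follows the column collation.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs SQLite's `value REGEXP pattern`; returns 1 on a match, else 0."""
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


names = ["Amit", "Neha", "Rahul", "Priya", "Suresh", "Anita", "Vikram", "Rohit"]

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.create_function("REGEXP", 2, regexp)  # SQLite calls regexp(pattern, value)
conn.execute("CREATE TABLE employees (name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)", [(n,) for n in names])

cur = conn.execute(
    "SELECT name FROM employees WHERE name REGEXP ? ORDER BY rowid",
    ("^[AR].*t$",),  # starts with A or R, ends with t
)
matches = [row[0] for row in cur]
conn.close()

print(matches)  # ['Amit', 'Rohit']
```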


### LEFT() and RIGHT()

### Example 16: First letter of name

In [38]:
%%sql 
select name,
LEFT(name,1) as first_let 
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,first_let
Amit,A
Neha,N
Rahul,R
Priya,P
Suresh,S
Anita,A
Vikram,V
Rohit,R


### Example 17: Last 2 letters of name

In [39]:
%%sql 
select name , 
RIGHT(name,2) as last_two
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,last_two
Amit,it
Neha,ha
Rahul,ul
Priya,ya
Suresh,sh
Anita,ta
Vikram,am
Rohit,it


### INSTR() / POSITION()

### Example 18: Position of 'a' in name

In [40]:
%%sql 
select name,
INSTR(name,'a') as position 
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,position
Amit,1
Neha,4
Rahul,2
Priya,5
Suresh,0
Anita,1
Vikram,5
Rohit,0


In [48]:
%%sql 
select name,
POSITION('a' IN name) as position 
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,position
Amit,1
Neha,4
Rahul,2
Priya,5
Suresh,0
Anita,1
Vikram,5
Rohit,0
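
In the output above, `Amit` and `Anita` return position 1 even though the search string is a lowercase `'a'`. MySQL's `INSTR` and `POSITION` are case-insensitive unless one of the arguments is a binary string, so the capital `A` counts as a match. The sketch below makes the difference visible using SQLite, whose `INSTR` is case-sensitive; SQLite is an assumption made here to keep the example self-contained, and lowercasing both arguments reproduces the MySQL result.

```python
import sqlite3

names = ["Amit", "Neha", "Anita"]  # a subset of the employees table

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.execute("CREATE TABLE employees (name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)", [(n,) for n in names])

positions = conn.execute(
    """
    SELECT name,
           INSTR(name, 'a')               AS case_sensitive,
           INSTR(LOWER(name), LOWER('a')) AS case_insensitive
    FROM employees
    ORDER BY rowid
    """
).fetchall()
conn.close()

for row in positions:
    print(row)
```

The `case_insensitive` column matches the MySQL output above, while `case_sensitive` shows what a binary comparison would return.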


## Text-Based Analysis (Business Level)

### Example 19: Count employees by first letter

In [55]:
%%sql
select LEFT(name,1) as first_letter,
count(*) as emp_count
from employees 
group by first_letter 
order by first_letter

 * mysql+pymysql://root:***@localhost/customers
6 rows affected.


first_letter,emp_count
A,2
N,1
P,1
R,2
S,1
V,1


### Example 20: Standardize department names

In [57]:
%%sql 
select distinct UPPER(TRIM(department)) as standardized_dept 
from employees

 * mysql+pymysql://root:***@localhost/customers
5 rows affected.


standardized_dept
IT
HR
FINANCE
MARKETING
SALES


### Example 21: Find department values that repeat after case normalization

In [60]:
%%sql 
select LOWER(department),count(*)
from employees 
group by LOWER(department)
having count(*) > 1; 

 * mysql+pymysql://root:***@localhost/customers
2 rows affected.


LOWER(department),count(*)
it,3
finance,2
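
The query above counts rows per lowercased department, so it also returns departments that simply have several employees. To flag only departments stored with more than one spelling, one option is to count the distinct raw values inside each lowercased group. The sketch below assumes an in-memory SQLite database as a stand-in for MySQL, and the `dept_demo` table with its mixed-case rows is hypothetical. In MySQL with a case-insensitive collation, the distinct count would need a binary comparison, for example `COUNT(DISTINCT BINARY department)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.execute("CREATE TABLE dept_demo (department TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO dept_demo VALUES (?)",
    [("IT",), ("it",), ("Finance",), ("Finance",), ("HR",)],
)

case_variants = conn.execute(
    """
    SELECT LOWER(department)          AS dept_key,
           COUNT(DISTINCT department) AS spellings,
           COUNT(*)                   AS row_count
    FROM dept_demo
    GROUP BY LOWER(department)
    HAVING COUNT(DISTINCT department) > 1
    """
).fetchall()
conn.close()

print(case_variants)  # [('it', 2, 2)]
```

`Finance` appears twice but with a single spelling, so it is not reported.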


## Advanced Interview Problems

### Extract Initials from Name

In [67]:
%%sql 
select name, 
CONCAT(LEFT(name,1),'.') as initial
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,initial
Amit,A.
Neha,N.
Rahul,R.
Priya,P.
Suresh,S.
Anita,A.
Vikram,V.
Rohit,R.


### Create Email ID Automatically

In [68]:
%%sql
select name , CONCAT(LOWER(name),'@company.com') as email
from employees

 * mysql+pymysql://root:***@localhost/customers
8 rows affected.


name,email
Amit,amit@company.com
Neha,neha@company.com
Rahul,rahul@company.com
Priya,priya@company.com
Suresh,suresh@company.com
Anita,anita@company.com
Vikram,vikram@company.com
Rohit,rohit@company.com
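
Real name columns often contain surrounding spaces or a first and last name, which would produce invalid addresses if lowercased as-is. A possible extension, sketched below, trims the value and replaces inner spaces with dots before lowercasing. It assumes an in-memory SQLite database as a stand-in for MySQL (`||` takes the place of `CONCAT`); the `people` table, its sample names, and the `example.com` domain are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.execute("CREATE TABLE people (name TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("  Amit Sharma ",), ("Neha",)],  # hypothetical messy input
)

cur = conn.execute(
    """
    SELECT LOWER(REPLACE(TRIM(name), ' ', '.')) || '@example.com' AS email
    FROM people
    ORDER BY rowid
    """
)
emails = [row[0] for row in cur]
conn.close()

print(emails)  # ['amit.sharma@example.com', 'neha@example.com']
```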
