# Text Preprocessing Notebook (Up to Lemmatization with spaCy)

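This notebook cleans the text columns of a dataset and lemmatizes them with **spaCy**. The steps are: lowercase the text, remove everything except letters, drop stop words, and reduce each remaining token to its lemma.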

In [None]:

# Install spaCy and download the small English model.
# openpyxl is the engine pandas uses to read .xlsx files.
!pip install -q spacy openpyxl
!python -m spacy download en_core_web_sm


In [None]:

# Import libraries
import pandas as pd
import spacy
import re


In [None]:

# Load spaCy English model
nlp = spacy.load("en_core_web_sm")


## Load Dataset
Replace the file path with your dataset location. The cells below expect the dataset to contain `Summary` and `Text` columns.

In [None]:

# Load dataset
df = pd.read_excel("Delinquency_prediction_dataset.xlsx")

# Display first few rows
df.head()


## Basic Text Cleaning

In [None]:

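def clean_text(text):
    """Lowercase, keep only a-z and whitespace, and collapse repeated spaces."""
    if pd.isna(text):  # avoid turning missing values into the string "nan"
        return ''
    text = str(text).lower()
    text = re.sub(r'[^a-z\s]', '', text)  # drop digits, punctuation, and non-ASCII characters
    text = re.sub(r'\s+', ' ', text).strip()  # collapse whitespace runs
    return text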

df['Summary'] = df['Summary'].apply(clean_text)
df['Text'] = df['Text'].apply(clean_text)

df.head()
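
To see what each cleaning step does, the sketch below traces two sample strings through the same three operations used in `clean_text`. The helper name `trace_clean` and the sample sentences are illustrative assumptions, not part of the dataset.

```python
import re

def trace_clean(text):
    """Return the intermediate strings produced by each cleaning step."""
    lowered = str(text).lower()                          # step 1: lowercase
    letters_only = re.sub(r'[^a-z\s]', '', lowered)      # step 2: keep a-z and whitespace
    collapsed = re.sub(r'\s+', ' ', letters_only).strip()  # step 3: collapse whitespace
    return lowered, letters_only, collapsed

# Hypothetical sample inputs, chosen to show digits, punctuation, and accents.
samples = [
    "Great product!! Arrived in 2 days :)",
    "Don't buy; caf\u00e9 was 100% bad",
]

for sample in samples:
    for step in trace_clean(sample):
        print(repr(step))
    print()
```

The final strings are `'great product arrived in days'` and `'dont buy caf was bad'`. The second one shows a side effect worth keeping in mind: apostrophes and accented letters are deleted rather than normalized, so "don't" becomes "dont" and "café" becomes "caf" before spaCy ever sees the text.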


## Lemmatization using spaCy

In [None]:

def lemmatize_spacy(text):
    """Return the lemmas of all non-stop-word tokens, joined by single spaces."""
    doc = nlp(text)
    return ' '.join(token.lemma_ for token in doc if not token.is_stop)

df['Summary'] = df['Summary'].apply(lemmatize_spacy)
df['Text'] = df['Text'].apply(lemmatize_spacy)

df.head()
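
Calling `nlp` once per row can be slow on larger datasets. A possible alternative, sketched here, is to stream the texts through `nlp.pipe`, which processes them in batches. The helper name `lemmatize_batch` and the default batch size of 256 are assumptions for illustration. Disabling the parser and named-entity recognizer assumes those components are not needed for lemmas, which should hold for `en_core_web_sm` but is worth checking against your spaCy version.

```python
def lemmatize_batch(texts, nlp, batch_size=256):
    """Lemmatize an iterable of strings in batches.

    `nlp` is a loaded spaCy pipeline (for example the one created with
    spacy.load("en_core_web_sm")). Returns a list with one lemmatized
    string per input text, in the same order.
    """
    docs = nlp.pipe(
        (str(text) for text in texts),
        batch_size=batch_size,        # assumed value; tune for your data
        disable=["parser", "ner"],    # assumption: not required for lemmas
    )
    return [
        ' '.join(token.lemma_ for token in doc if not token.is_stop)
        for doc in docs
    ]
```

In this notebook it could be used as `df['Text'] = lemmatize_batch(df['Text'], nlp)`, and likewise for `Summary`. The output should match `lemmatize_spacy` applied row by row, since the same token filter and join are used.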


## Preprocessing Complete
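The `Summary` and `Text` columns now contain lowercased, lemmatized text with stop words removed. Both columns were overwritten in place, so reload the dataset if you need the raw text.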