This Python project involves web scraping and topic modeling. The steps are as follows:

1. **Web Scraping:**
   - Import necessary libraries (`requests`, `BeautifulSoup`).
   - Define the target URL: `https://www.bajajfinserv.in/insights/personal-loan-terms-conditions`.
   - Use a user-agent header to mimic a web browser request.
   - Fetch the web page content and parse it using BeautifulSoup.
   - Extract the text of the HTML `div` element with the class `aem-rte-content` and store it.

2. **Topic Modeling:**
   - Import libraries for topic modeling (`gensim`), visualization (`pyLDAvis`, `matplotlib`), and text processing (`nltk`, including its `WordNetLemmatizer` for lemmatization).
   - Download necessary NLTK datasets (`punkt`, `stopwords`, `wordnet`, `omw-1.4`).
   - Define a function `topic_modeling` to:
     - Tokenize the text into lowercase words with `gensim.utils.simple_preprocess`, dropping tokens shorter than three characters.
     - Remove stopwords.
     - Lemmatize the words to their base forms.
     - Create a dictionary and a bag of words (BoW) representation of the text.
     - Train a Latent Dirichlet Allocation (LDA) model with a specified number of topics (`num_topics`).
     - Print the top `num_words` words for each topic.
     - Visualize the topics using `pyLDAvis`.

3. **Execution:**
   - Call the `topic_modeling` function with the scraped content to perform topic modeling and visualize the results.

This project combines web scraping to gather data from a web page and natural language processing (NLP) techniques to identify and visualize topics within the extracted text.
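To make the dictionary and bag-of-words step concrete, here is a minimal pure-Python sketch of what that representation looks like. It is not the `gensim` implementation: the `tokenize` and `build_bow` helpers, the regex tokenizer, and the tiny stopword set are all assumptions made for illustration, whereas the notebook itself relies on `gensim` and `nltk`.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set; the notebook uses NLTK's English list.
STOP_WORDS = {"the", "and", "for", "you", "can"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of at least three letters."""
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if len(tok) >= 3]


def build_bow(tokens):
    """Return (token -> id mapping, sorted list of (id, count) pairs)."""
    kept = [tok for tok in tokens if tok not in STOP_WORDS]
    token2id = {tok: idx for idx, tok in enumerate(sorted(set(kept)))}
    counts = Counter(token2id[tok] for tok in kept)
    return token2id, sorted(counts.items())


token2id, bow = build_bow(tokenize("The loan terms and the loan conditions"))
print(token2id)  # {'conditions': 0, 'loan': 1, 'terms': 2}
print(bow)       # [(0, 1), (1, 2), (2, 1)]
```

Each `(id, count)` pair records how often one vocabulary entry occurs in the document, which is the kind of input an LDA model is trained on.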

In [None]:
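import requests
from bs4 import BeautifulSoup

url = 'https://www.bajajfinserv.in/insights/personal-loan-terms-conditions'

# Browser-like user-agent header to mimic a web browser request
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'}
data = []

# Fetch the page and fail early on an HTTP error
response0 = requests.get(url, headers=headers, timeout=30)
response0.raise_for_status()
soup0 = BeautifulSoup(response0.content, 'html.parser')

# Extract the article text; find() returns None if the div is missing
content_div = soup0.find('div', class_='aem-rte-content')
if content_div is None:
    raise ValueError("No <div class='aem-rte-content'> found on the page")
content = content_div.text
data.append(content)
data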



[' Data from the RBI reveals an increase in takers of personal loans in India in the last decade. This can be credited to the fact that it is both a collateral-free loan that doesn’t risk your assets, and is offered instantly online by lenders who often disburse funds the same or next day.\n\nWhile personal loan rules and regulations in India are usually governed by the RBI and almost identical amongst lenders, it is important you know the more detailed loan\xa0terms and conditions\xa0set by your specific lender before you sign the dotted line. So, keep the following in mind.\n\nThe use of the loan\n\nThe terms and conditions of a personal loan specify that it is versatile and you can use it for any legal purpose. You can use it towards personal needs like paying your child’s school or college fees, renovating your home, or even taking an overseas vacation.\n\nAdditional Read: Personal Loan Glossary & Terminology\n\nThe eligibility criteria to avail of the loan\n\nSince a personal loan
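The scraped string contains non-breaking spaces (`\xa0`) and runs of newlines, as the output above shows. Below is a minimal sketch of one way to normalize it before further processing; the `normalize_whitespace` helper is hypothetical and not part of the original notebook.

```python
import re


def normalize_whitespace(raw):
    """Replace non-breaking spaces, collapse spaces, and squeeze blank lines."""
    text = raw.replace("\xa0", " ")
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces and tabs
    text = re.sub(r"\s*\n\s*", "\n", text)   # one newline between paragraphs
    return text.strip()


sample = " detailed loan\xa0terms and conditions\xa0set by your lender.\n\nThe use of the loan\n\n"
print(normalize_whitespace(sample))
```

Tokenizers such as `gensim.utils.simple_preprocess` tend to cope with stray whitespace anyway, so this step matters mostly when the text is displayed or split into paragraphs.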

In [None]:
!pip install pyLDAvis

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pyLDAvis
  Downloading pyLDAvis-3.4.0-py3-none-any.whl (2.6 MB)
Collecting joblib>=1.2.0
  Downloading joblib-1.2.0-py3-none-any.whl (297 kB)
Collecting funcy
  Downloading funcy-2.0-py2.py3-none-any.whl (30 kB)
Installing collected packages: funcy, joblib, pyLDAvis
  Attempting uninstall: joblib
    Found existing installation: joblib 1.1.1
    Uninstalling joblib-1.1.1:
      Successfully uninstalled joblib-1.1.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pandas-profiling 3.2.0 requires joblib~=1.1.0

In [None]:
# Import necessary libraries
import gensim
from gensim import corpora
import pyLDAvis.gensim_models
import matplotlib.pyplot as plt
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

# Define a function for topic modeling
def topic_modeling(text, num_topics=5, num_words=10):
    # Tokenize the text
    text_tokens = gensim.utils.simple_preprocess(text, deacc=True, min_len=3)
    print(text_tokens)

    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [token for token in text_tokens if token not in stop_words]

    # Lemmatize words
    lemmatizer = WordNetLemmatizer()
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]

    # Create a dictionary from the tokenized text
    # (the whole page is treated as a single document)
    dictionary = corpora.Dictionary([lemmatized_tokens])

    # Create a bag-of-words corpus containing that one document
    bow_corpus = [dictionary.doc2bow(lemmatized_tokens)]

    # Train the LDA model
    lda_model = gensim.models.LdaModel(bow_corpus, num_topics=num_topics, id2word=dictionary, passes=10)

    # Print the top words for each topic
    for idx, topic in lda_model.print_topics(num_topics=num_topics, num_words=num_words):
        print("Topic: {} \nWords: {}".format(idx, topic))

    # Visualize the topics
    pyLDAvis.enable_notebook()
    vis = pyLDAvis.gensim_models.prepare(lda_model, bow_corpus, dictionary, R=30)
    return vis

# Perform topic modeling and visualization
topic_modeling(content)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


['data', 'from', 'the', 'rbi', 'reveals', 'increase', 'takers', 'personal', 'loans', 'india', 'the', 'last', 'decade', 'this', 'can', 'credited', 'the', 'fact', 'that', 'both', 'collateral', 'free', 'loan', 'that', 'doesn', 'risk', 'your', 'assets', 'and', 'offered', 'instantly', 'online', 'lenders', 'who', 'often', 'disburse', 'funds', 'the', 'same', 'next', 'day', 'while', 'personal', 'loan', 'rules', 'and', 'regulations', 'india', 'are', 'usually', 'governed', 'the', 'rbi', 'and', 'almost', 'identical', 'amongst', 'lenders', 'important', 'you', 'know', 'the', 'more', 'detailed', 'loan', 'terms', 'and', 'conditions', 'set', 'your', 'specific', 'lender', 'before', 'you', 'sign', 'the', 'dotted', 'line', 'keep', 'the', 'following', 'mind', 'the', 'use', 'the', 'loan', 'the', 'terms', 'and', 'conditions', 'personal', 'loan', 'specify', 'that', 'versatile', 'and', 'you', 'can', 'use', 'for', 'any', 'legal', 'purpose', 'you', 'can', 'use', 'towards', 'personal', 'needs', 'like', 'paying',

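The function above builds its corpus from a single document: the whole page. LDA infers topics from how words are distributed across documents, so a common alternative is to treat each paragraph as its own document. Here is a minimal sketch of that split, under stated assumptions: `paragraphs_to_documents` and its `min_tokens` threshold are hypothetical, and the regex tokenizer only approximates `gensim.utils.simple_preprocess`.

```python
import re


def paragraphs_to_documents(text, min_tokens=5):
    """Split text on blank lines and tokenize each paragraph into its own document."""
    documents = []
    for paragraph in re.split(r"\n\s*\n", text):
        tokens = [t for t in re.findall(r"[a-z]+", paragraph.lower()) if len(t) >= 3]
        if len(tokens) >= min_tokens:  # skip headings and other very short fragments
            documents.append(tokens)
    return documents


sample = (
    "You can use the loan for any legal purpose.\n\n"
    "The use of the loan\n\n"
    "Lenders often disburse funds the same or next day."
)
docs = paragraphs_to_documents(sample)
print(len(docs))  # 2 (the short heading is skipped)
print(docs[0])
```

After stopword removal and lemmatization, such a list of token lists could be passed to `corpora.Dictionary(docs)` and converted with `[dictionary.doc2bow(doc) for doc in docs]`, giving the model several documents to compare instead of one.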
