These are the libraries used for web scraping and data processing: pandas for tabular data, requests for HTTP requests, and BeautifulSoup for HTML parsing.

In [None]:
import pandas as pd
import requests
from requests.exceptions import RequestException
from bs4 import BeautifulSoup

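This code reads a CSV file named 'links_during_new.csv' and stores it in a Pandas dataframe 'df'. The file is expected to contain a single column of URLs. Because 'header=None' is passed, the first line of the file is treated as data rather than as a header, and the column is given the name 'links-href' through the 'names' argument. If the file does begin with a header line, that line ends up in the dataframe as an ordinary row.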

In [None]:
df = pd.read_csv('links_during_new.csv', header=None, names=['links-href'])
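The effect of 'header=None' together with 'names' can be checked on a small in-memory example. This is a minimal sketch: the sample URLs and the variable names `csv_text`, `sample`, and `with_header` are illustrative assumptions and do not come from 'links_during_new.csv'.

```python
import io

import pandas as pd

# Two made-up URLs standing in for the real file contents (assumption).
csv_text = "example.com/page-one\nhttps://example.org/page-two\n"

# header=None: every line is data; names= supplies the column label.
sample = pd.read_csv(io.StringIO(csv_text), header=None, names=['links-href'])
print(sample)

# If the file did start with a header line, header=None keeps it as a data row.
with_header = pd.read_csv(
    io.StringIO("links-href\n" + csv_text),
    header=None,
    names=['links-href'],
)
print(with_header)
```

In the second dataframe the literal string 'links-href' appears as the first value, which is the case to watch for if the real file carries a header.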

This code initializes an empty list 'texts'. This list will store the text from web pages.

In [None]:
texts = []

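This code opens a session using the 'requests' library and iterates through each URL in the 'links-href' column of the 'df' dataframe. Using one session for all requests lets the library reuse connections, and the 'with' statement closes the session when the loop ends.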

In [None]:
with requests.Session() as session:
    for url in df['links-href']:

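This code is the body of the loop and runs inside a 'try' block. It first checks whether the URL starts with 'http://' or 'https://' and, if it does not, prepends 'http://'. It then sends a GET request to the URL through the session and calls 'raise_for_status()', so that a 4xx or 5xx response raises an exception instead of being parsed. For a successful response, it parses the content with BeautifulSoup, finds all the 'p' tags, extracts the text of each one, and joins the pieces with a newline separator. The resulting text is appended to the 'texts' list.

In [None]:
        try:
            # Add a scheme if the URL has none
            if not url.startswith(('http://', 'https://')):
                url = 'http://' + url

            # The timeout (10 seconds, an arbitrary choice) keeps a stalled
            # server from blocking the loop indefinitely
            response = session.get(url, timeout=10)
            # Raise HTTPError (a RequestException) on 4xx/5xx responses
            response.raise_for_status()

            soup = BeautifulSoup(response.content, 'html.parser')
            paragraphs = soup.find_all('p')

            # Join the text of every <p> tag, one paragraph per line
            text = '\n'.join(p.get_text() for p in paragraphs)
            texts.append(text)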
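The two steps in this cell that need no network access, adding a scheme and extracting paragraph text, can be tried in isolation. The following is a minimal sketch under stated assumptions: the helper names `ensure_scheme` and `paragraph_text` and the sample HTML are hypothetical and are not part of the notebook.

```python
from bs4 import BeautifulSoup


def ensure_scheme(url):
    """Return url unchanged if it has an http(s) scheme, else prepend 'http://'."""
    if url.startswith(('http://', 'https://')):
        return url
    return 'http://' + url


def paragraph_text(html):
    """Join the text of every <p> tag in html, one paragraph per line."""
    soup = BeautifulSoup(html, 'html.parser')
    return '\n'.join(p.get_text() for p in soup.find_all('p'))


# Made-up page used only for illustration (assumption).
sample_html = "<html><body><p>One</p><div>skip</div><p>Two</p></body></html>"

fixed_url = ensure_scheme('example.com')
extracted = paragraph_text(sample_html)
print(fixed_url)
print(extracted)
```

Only the contents of 'p' tags are kept, so text that a page places in other elements such as 'div' or 'li' is not captured by this approach.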

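This code catches request-related errors that occur while a URL is processed, that is, 'RequestException' and its subclasses such as connection errors and timeouts, and prints an error message with the URL and the specific exception. Other exception types are not caught. It then appends an empty string to the 'texts' list, so that the list keeps one entry per URL and stays aligned with the rows of 'df'.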

In [None]:
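        # Aligned with the 'try' inside the loop
        except RequestException as e:
            print(f"Error processing URL {url}: {e}")
            # Keep one entry per URL so 'texts' stays aligned with 'df'
            texts.append('')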
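The reason for appending an empty string on failure is easier to see with a simulated error. The sketch below assumes a hypothetical `collect_texts` function and a stand-in `fake_fetch` that raises a 'requests' connection error for one URL; neither is part of the notebook, and no real request is sent.

```python
import requests
from requests.exceptions import RequestException


def collect_texts(urls, fetch_text):
    """Call fetch_text on each URL; store '' when a request error occurs."""
    texts = []
    for url in urls:
        try:
            texts.append(fetch_text(url))
        except RequestException as e:
            print(f"Error processing URL {url}: {e}")
            texts.append('')  # placeholder keeps positions aligned with urls
    return texts


def fake_fetch(url):
    """Stand-in for a real download (assumption): fails for URLs containing 'bad'."""
    if 'bad' in url:
        raise requests.exceptions.ConnectionError('simulated failure')
    return f'text from {url}'


urls = ['http://ok.example', 'http://bad.example']
result = collect_texts(urls, fake_fetch)
print(result)
```

The result has the same length as the input list, with an empty string in the position of the URL that failed.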

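This code adds the collected texts to the 'df' dataframe as a new column called 'text'. Because the 'texts' list is wrapped in a Series, its values are matched to the dataframe rows by index. It then saves the resulting dataframe to an Excel file called 'output_during.xlsx' without including an index column. Writing '.xlsx' files requires an Excel engine such as openpyxl to be installed.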

In [None]:
df['text'] = pd.Series(texts)

df.to_excel('output_during.xlsx', index=False)
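Assigning a Series and assigning a plain list behave differently when the number of texts does not match the number of rows. The sketch below illustrates this on made-up data; the names `demo` and `short_texts` are assumptions, and nothing is written to disk.

```python
import pandas as pd

# Three made-up rows but only two texts (assumption, to force a mismatch).
demo = pd.DataFrame({'links-href': ['a.example', 'b.example', 'c.example']})
short_texts = ['text a', 'text b']

# A Series is aligned on the index, so the unmatched row silently gets NaN.
demo['text'] = pd.Series(short_texts)
missing = int(demo['text'].isna().sum())

# A plain list must match the number of rows, otherwise pandas raises.
try:
    demo['text_from_list'] = short_texts
    list_raised = False
except ValueError:
    list_raised = True

print(demo)
print(missing, list_raised)
```

With the Series form, a missing entry shows up as an empty cell in the saved file rather than as an error, which is one reason to keep 'texts' at exactly one entry per URL.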