# Scrape values texts
Cycle 8. 13th Nov 2025

https://thephilosophyforum.com/discussion/5463/what-are-our-values

In [1]:
# scrape html content from a webpage
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://thephilosophyforum.com/discussion/5463/what-are-our-values/p1"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
soup = BeautifulSoup(response.content, 'html.parser')

# Get the specific ul element
discussion_list_1 = soup.find('ul', class_='DataList MessageList Discussion FirstPage')

# Get all li elements within the ul
li_elements_1 = discussion_list_1.find_all('li')


# Page 2
url = "https://thephilosophyforum.com/discussion/5463/what-are-our-values/p2"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
soup = BeautifulSoup(response.content, 'html.parser')
discussion_list_2 = soup.find('ul', class_='DataList MessageList Discussion')

# Get all li elements within the ul
li_elements_2 = discussion_list_2.find_all('li')


# Combine li elements from both pages
li_elements = li_elements_1 + li_elements_2
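
The two pages above differ only in the page suffix of the URL and in the trailing `FirstPage` class on the list. A minimal sketch of one way to share the parsing logic, assuming that every page wraps its posts in a `ul` carrying the `MessageList` class (as both pages here do) and that two pages cover the thread. `page_urls` and `comment_items` are hypothetical helper names, not part of the notebook:

```python
from bs4 import BeautifulSoup

BASE_URL = "https://thephilosophyforum.com/discussion/5463/what-are-our-values"

def page_urls(base_url, n_pages):
    """Build the /p1, /p2, ... URLs for a paginated thread."""
    return [f"{base_url}/p{page}" for page in range(1, n_pages + 1)]

def comment_items(html):
    """Return the li elements that represent posts on one page."""
    soup = BeautifulSoup(html, "html.parser")
    # A single class name matches regardless of the other classes on the tag,
    # so this finds the list on the first page and on later pages alike.
    message_list = soup.find("ul", class_="MessageList")
    if message_list is None:
        return []  # page layout differs from the assumption above
    # Keep only post items; bullet lists inside a post body lack the Comment class.
    return [li for li in message_list.find_all("li")
            if "Comment" in li.get("class", [])]

print(page_urls(BASE_URL, 2))
```

Each URL could then be fetched with `requests.get` and its `response.content` passed to `comment_items`, with the per-page results concatenated into one list.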

In [2]:
li_elements_1[0]

<li class="Item Comment FirstComment nc" id="Discussion_5463">
<div class="Comment">
<span class="Author">
<a class="ProfileLink" href="/profile/694/t-clark" title="T Clark"><img class="ProfilePhotoMedium" src="//i-6uf0utvje8gy-cdn.plushcontent.com/uploads/userpics/635/nQN2X097HS9ID.jpg?v=YBM"/></a><a href="/profile/694/t-clark">T Clark</a> </span>
<div class="CommentInfo">
<span style="cursor:default;" title="Posts">15.6k</span> </div>
<div class="Message">
			In the last week or so, in several different contexts, I have found myself pontificating on human values and the part they play in philosophy and human life. For me, the subject always comes forward when there is a discussion of objectivity - objective truth, objective morals, objective beauty. I believe the idea of objectivity can be useful in a relatively limited set of circumstances as long as it is recognized that applying it outside it's limits will lead to misunderstanding and confusion. I think that's true because talk of
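
The `Message` div shown above mixes plain text with inline elements. When such a div is flattened with `get_text(strip=True)`, adjacent text nodes are joined without any separator, so a reply link and the sentence after it run together. A small sketch of the difference, using a hypothetical HTML fragment (the forum's actual reply markup is an assumption here):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a reply link followed by a paragraph, inside a Message div
html = '<div class="Message"><a href="#">↪T Clark</a><p>I value different things.</p></div>'
message = BeautifulSoup(html, "html.parser").find("div", class_="Message")

fused = message.get_text(strip=True)        # text nodes joined with no separator
spaced = message.get_text(" ", strip=True)  # text nodes joined with a single space

print(fused)
print(spaced)
```

Passing a separator keeps word boundaries intact, which matters if the texts are tokenized later.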

In [3]:
# Extract data from each li element
data = []
for li in li_elements:
    # Skip nested list items inside a post body (li elements without the Comment class)
    if 'Comment' not in li.get('class', []):
        continue

    # The ProfileLink anchor carries the author's display name in its title attribute
    author_link = li.find('a', class_='ProfileLink')
    post_count = li.find('span', style='cursor:default;')
    content = li.find('div', class_='Message')
    date = li.find('time', class_='newtime')
    comment_id = li.get('id')

    data.append({
        'comment_id': comment_id,
        'author': author_link.get('title') if author_link else None,
        'post_count': post_count.text if post_count else None,
        'content': content.get_text(strip=True) if content else None,
        'date': date.get('datetime') if date else None
    })

# Create DataFrame
df = pd.DataFrame(data)
df

Unnamed: 0,comment_id,author,post_count,content,date
0,Discussion_5463,T Clark,15.6k,"In the last week or so, in several different c...",2019-03-30T18:17:56+00:00
1,Comment_270772,T Clark,15.6k,Following up on the post I sent out a few minu...,2019-03-30T18:22:56+00:00
2,Comment_270776,RegularGuy,2.6k,↪T ClarkI value different things in different ...,2019-03-30T18:34:29+00:00
3,Comment_270788,Joshs,6.5k,↪T ClarkYou missed something here. You left ou...,2019-03-30T19:34:39+00:00
4,Comment_270803,praxis,7k,↪T ClarkJudging by your declared values I woul...,2019-03-30T20:01:40+00:00
5,Comment_270827,T Clark,15.6k,Judging by your declared values I would guess ...,2019-03-30T20:28:34+00:00
6,Comment_270833,T Clark,15.6k,You missed something here. You left out philos...,2019-03-30T20:32:58+00:00
7,Comment_270847,Shawn,13.5k,"I understand the importance of values; but, ho...",2019-03-30T21:10:27+00:00
8,Comment_270854,Joshs,6.5k,"↪T Clark""As I indicated in the OP, broader iss...",2019-03-30T21:23:39+00:00
9,Comment_270856,VagabondSpectre,1.9k,↪T ClarkSome of us will have conflicting value...,2019-03-30T21:26:32+00:00
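
The `post_count` and `date` columns above are still strings. If they are needed for sorting or filtering, they can be converted. The sketch below assumes that a trailing `k` means thousands (so the result is only as precise as the abbreviated display) and that dates are ISO 8601 strings as in the table. `parse_post_count` is a hypothetical helper name:

```python
from datetime import datetime

def parse_post_count(text):
    """Convert a display string such as '15.6k' or '842' to an integer."""
    cleaned = text.strip().lower().replace(",", "")
    if cleaned.endswith("k"):
        # round() absorbs floating-point noise from the multiplication
        return round(float(cleaned[:-1]) * 1000)
    return int(cleaned)

counts = [parse_post_count(s) for s in ["15.6k", "7k", "2.6k"]]
posted = datetime.fromisoformat("2019-03-30T18:17:56+00:00")  # timezone-aware
print(counts, posted.date())
```

With pandas, the same conversions could be applied to rows where the values are present, for example with `df['post_count'].map(parse_post_count)` and `pd.to_datetime(df['date'])`.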


In [4]:
# Concatenate all comments of each author into one text, separated by a space
df_grouped = df.groupby('author')['content'].apply(lambda texts: ' '.join(texts)).reset_index()

# Add a user_id column (sequentially assigned integer, starting at 1)
df_grouped.insert(0, 'user_id', range(1, len(df_grouped) + 1))
df_grouped['user_id'] = df_grouped['user_id'].astype(int)

# Change content column name to value_text
df_grouped = df_grouped.rename(columns={'content': 'value_text'})

# Save to CSV
df_grouped.to_csv('values_scraped_comments_thephilosophyforum.csv', index=False)
df_grouped

Unnamed: 0,user_id,author,value_text
0,1,BC,obedient ... reverent ... capitalism—T ClarkAn...
1,2,I like sushi,The basic principle of anarchism seems to be a...
2,3,Janus,I have a feeling these are too simplistic and ...
3,4,Joshs,↪T ClarkYou missed something here. You left ou...
4,5,Judaka,↪T ClarkYou take anything and express all the ...
5,6,Possibility,"Personally I see nationalism, patriotism and l..."
6,7,RegularGuy,↪T ClarkI value different things in different ...
7,8,S,↪T ClarkDiogenes and Nietzsche were good on va...
8,9,Shawn,"I understand the importance of values; but, ho..."
9,10,T Clark,"In the last week or so, in several different c..."
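
The grouping step joins strings directly, so it would fail with a `TypeError` if any `content` value were `None`, which the extraction loop allows when a `Message` div is missing. A hedged sketch of a more defensive variant, assuming rows without an author or content can simply be dropped. `group_texts_by_author` is a hypothetical name, and the separator is exposed as a parameter:

```python
import pandas as pd

def group_texts_by_author(df, sep="\n"):
    """Join each author's comments into one text and number the authors."""
    # Drop rows that cannot be grouped or joined
    clean = df.dropna(subset=["author", "content"])
    grouped = clean.groupby("author")["content"].agg(sep.join).reset_index()
    # Sequential IDs follow the alphabetical author order produced by groupby
    grouped.insert(0, "user_id", range(1, len(grouped) + 1))
    return grouped.rename(columns={"content": "value_text"})

sample = pd.DataFrame({
    "author": ["B", "A", "B", "B", None],
    "content": ["x", "y", None, "w", "z"],
})
result = group_texts_by_author(sample)
print(result)
```

Because the IDs depend on the sorted set of authors, they would change if the thread were scraped again with new participants.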
