# Ask HN Books

Hacker News has a type of post called "Ask HN" where people put questions to the community. Requests for book recommendations are one frequent kind.

In [1]:
import numpy as np
import pandas as pd

import html

from pathlib import Path

In [2]:
hn_path = Path('../data/01_raw/hackernews2021.parquet')

df = pd.read_parquet(hn_path, use_nullable_dtypes=True).set_index('id')

In [3]:
pd.options.display.max_colwidth = 400

Let's find every ancestor of each item by repeatedly looking up the parent of the previous parent.

In [4]:
from tqdm.notebook import tqdm

from collections import defaultdict

parent_dict = df['parent'].dropna().to_dict()

parent_dict = defaultdict(lambda: pd.NA, parent_dict)


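MAX_DEPTH = 50

# parent0 is the direct parent; parent{n+1} is the parent of parent{n}.
df['parent0'] = df['parent']

for idx in tqdm(range(MAX_DEPTH)):
    last_col = f'parent{idx}'
    col = f'parent{idx+1}'

    df[col] = df[last_col].map(parent_dict, na_action='ignore')
    # Stop once no item has an ancestor at this level.
    if df[col].isna().all():
        del df[col]
        break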


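# The root is the most distant non-null ancestor: start from the last
# parent column and fall back to the nearer ones.
root = None

for col in df.filter(regex=r'parent\d+').iloc[:, ::-1]:
    if root is None:
        root = df[col]
    else:
        root = root.combine_first(df[col])
df['root'] = root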


In [5]:
# Depth is the number of ancestors: 0 for a story, 1 for a top-level comment.
df['depth'] = df.filter(regex=r'parent\d+').notna().sum(axis=1)
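To check what `root` and `depth` should contain, here is a minimal sketch that walks a tiny, made-up comment tree one item at a time. The ids and the `ancestors` helper are hypothetical. The notebook itself works column-wise with `Series.map`, which is much faster on the full dataset.

```python
# Hypothetical toy tree: item 1 is a story, 2 and 3 reply to it, 4 replies to 2.
parent_of = {2: 1, 3: 1, 4: 2}


def ancestors(item_id, parent_of, max_depth=50):
    """Return the ancestors of item_id, nearest first."""
    chain = []
    while item_id in parent_of and len(chain) < max_depth:
        item_id = parent_of[item_id]
        chain.append(item_id)
    return chain


for item_id in [1, 2, 3, 4]:
    chain = ancestors(item_id, parent_of)
    depth = len(chain)                     # number of ancestors
    root = chain[-1] if chain else None    # most distant ancestor, None for a story
    print(item_id, chain, depth, root)
```

As in the notebook, a story has depth 0 and no root, while every comment's root is the story that started its thread.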

## Ask HN Books

Let's search for all Ask HN posts with books or textbooks in the title.

In [6]:
ask_hn_books = df[df['title'].str.contains(r'Ask HN.*\b(?:text)?books?\b', regex=True)]

ask_hn_books

id,title,url,text,dead,by,score,time,timestamp,type,parent,...,parent36,parent37,parent38,parent39,parent40,parent41,parent42,parent43,root,depth
29622942,Ask HN: How much do you love discussing books with the people who read them?,,And how do you discuss them? what type of discussions you love about books?. What do you think about discussing books with a stranger on the internet?,,NithurM,1,1639993717,2021-12-20 09:48:37+00:00,story,,...,,,,,,,,,,0
26624324,Ask HN: Is it worth it for a tech startup founder to write a chapter in a book?,,"I was approached with a proposal to write a chapter in an Elseveir book. I am a startup founder so time is pretty scarce and this will take up all my free time for the next couple months. It sounds interesting to think about our field from a different, more scientific perspective, but 2 months of life might be a bit too high of a price if there are no other outputs.<p>Problem is that when I w...",True,isitworthit,1,1617035558,2021-03-29 16:32:38+00:00,story,,...,,,,,,,,,,0
29304013,"Ask HN: Forgot this name of product design book, any ideas?",,"I forgot the title of the book and name of the author, but all I can recall is it&#x27;s written by a guy that worked at Uber.. Anyone got ideas? Thank you very much. If you have other recommendations too, I&#x27;d love to check them out, thank you :)",,joshxyz,7,1637568639,2021-11-22 08:10:39+00:00,story,,...,,,,,,,,,,0
27394502,Ask HN: Have you stopped reading books?,,"Reading has always been a big part of my life but I was out with friends the other day and they asked what I was reading and I said that I feel like I&#x27;m reading more than ever but that I haven&#x27;t bought or read a book for over a year. In their place is a mix of podcasts, blogs, other articles, YouTube videos, and HN (a lot of which aren&#x27;t reading at all but scratch the same itch ...",,anm89,31,1622820202,2021-06-04 15:23:22+00:00,story,,...,,,,,,,,,,0
25831897,Ask HN: What programming tutorials/courses/e-books do you wish existed?,,"Even though we are spoiled for choice with the amount of programming learning materials produced each year, I imagine there are still some important topics which do not get as much love as others.",,carlmungz,2,1611051711,2021-01-19 10:21:51+00:00,story,,...,,,,,,,,,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27225583,Ask HN: Take notes with Apple Pencil on a ePub book?,,"I read ePub books on Apple Books on my Mac or iPad. I hate that I have to take notes in a separate notebook while iPad’s pencil is so great, and taking notes as annotations on the book itself would be so much more intuitive and useful. Anybody here have any solutions for this? Any app that solves this problem creatively by automagically converting ePub to something else?",,reacharavindh,3,1621537276,2021-05-20 19:01:16+00:00,story,,...,,,,,,,,,,0
29688332,Ask HN: What are the best tech books you read in 2021?,,"Mine are Statistical Inference in Computer Age, and Transactional Information Systems. Curious what other books teach us powerful concepts and tools that carry us a long way",,hintymad,95,1640477353,2021-12-26 00:09:13+00:00,story,,...,,,,,,,,,,0
28208817,Ask HN: Any good books about stress and management?,,I am constantly involved in multiple projects with different clients. I have read about that multitasking is stressful and that meditation is good against stress. But I wonder if there is a book on how to manage the situation without feeling stressed in the first place. I want a system that works rather than a cure for the symptoms or something like that. Any suggestions?,,waspight,2,1629200669,2021-08-17 11:44:29+00:00,story,,...,,,,,,,,,,0
26888567,Ask HN: Publishing a book about my time at a prominent startup,,"Looking for advice, throwaway account for obvious reasons.<p>Early employee at a startup for ~6 years from seed stage. Startup has successfully raised multiple rounds of funding. Profitable but pre-exit.<p>I&#x27;ve kept detailed notes and have numerous interesting -&gt; to me at least, stories to tell. Stories from founder histories. Growing Pains. Customer problems. Raising. Finding fit and ...",True,startupinsider,1,1619007198,2021-04-21 12:13:18+00:00,story,,...,,,,,,,,,,0


Some of these are about other things, such as reading habits or writing a book, rather than recommendations.
There are only 176 threads, so we could classify them manually.
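To see what the title pattern accepts, here is a small sketch that runs it with the standard `re` module. The first title comes from the table above and the other three are invented for illustration.

```python
import re

# Same pattern as the notebook cell; like str.contains, search() looks anywhere in the string.
TITLE_PATTERN = re.compile(r'Ask HN.*\b(?:text)?books?\b')

titles = [
    'Ask HN: What programming tutorials/courses/e-books do you wish existed?',
    'Ask HN: Good textbooks on statistics?',   # invented
    'Ask HN: Which notebooks do you use?',     # invented
    'Show HN: A book about Rust',              # invented
]

for title in titles:
    print(bool(TITLE_PATTERN.search(title)), title)
```

The word boundary `\b` is what keeps "notebooks" out while still letting "e-books" through, since the hyphen counts as a boundary.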

However, we could go further and search for recommendation threads using title words like "recommended", "best", "favourite" or "top".

In [7]:
book_recommendations = ask_hn_books[ask_hn_books['title'].str.contains(r'\b(?:recommend(?:ed)|best|favou?rite|top)\b', case=False, regex=True)]

There are 42 threads and almost all of them are asking for book recommendations.
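One detail worth checking is that the `recommend(?:ed)` branch only matches the word "recommended". The sketch below compares the notebook's pattern with a looser variant. `LOOSER` is an assumption for illustration, not something the notebook uses, and adopting it would change the thread counts reported here. The last two titles are invented.

```python
import re

# The pattern from the notebook cell (case=False corresponds to re.IGNORECASE).
NOTEBOOK = re.compile(r'\b(?:recommend(?:ed)|best|favou?rite|top)\b', re.IGNORECASE)

# Hypothetical looser variant that also accepts "recommend" and "recommendation(s)".
LOOSER = re.compile(r'\b(?:recommend(?:ed|ations?)?|best|favou?rite|top)\b', re.IGNORECASE)

titles = [
    'Ask HN: Recommended books and papers on distributed systems?',
    'Ask HN: What are your top 5 favorite computer books?',
    'Ask HN: Can you recommend a book on Rust?',        # invented
    'Ask HN: Book recommendations for new managers?',   # invented
]

for title in titles:
    print(bool(NOTEBOOK.search(title)), bool(LOOSER.search(title)), title)
```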

In [8]:
book_recommendations

id,title,url,text,dead,by,score,time,timestamp,type,parent,...,parent36,parent37,parent38,parent39,parent40,parent41,parent42,parent43,root,depth
29710173,Ask HN: What are your favourite computer science books?,,,,wizardofmysore,5,1640663827,2021-12-28 03:57:07+00:00,story,,...,,,,,,,,,,0
28659626,Ask HN: What is the best programming book you've read?,,,,cylde_frog,3,1632650305,2021-09-26 09:58:25+00:00,story,,...,,,,,,,,,,0
25987664,Ask HN: Recommended books and papers on distributed systems?,,"The most recent and complete book on Distributed Systems that I&#x27;m aware of is Design Data Intensive Application (2017). I&#x27;m currently reading it. I also want to learn about other problems and ideas:<p>- Ideas that stood the test of times<p>- Ideas that were not feasible but now possible thanks to hardware improvement.<p>So, what&#x27;s your recommendations for books and papers on the...",,letientai299,302,1612178191,2021-02-01 11:16:31+00:00,story,,...,,,,,,,,,,0
29668228,Ask HN: What's the best book you read in 2021?,,Yearly thread. It can be books published on 2021 or in previous years (but that you read this year.),,AccountAccount1,515,1640304895,2021-12-24 00:14:55+00:00,story,,...,,,,,,,,,,0
28391738,Ask HN: Best books on modern distributed systems,,"I&#x27;ve read designing data intensive systems and it covered distributed systems a bit.<p>I don&#x27;t find most textbooks to be an actually good intro outside of a course setting. For example, I own Andrew Tannenbaum&#x27;s Distributed System book and a few others of his. But his writing style is too dense for me to make enough progress without giving up.<p>What other books (probably not te...",,eatonphil,62,1630589316,2021-09-02 13:28:36+00:00,story,,...,,,,,,,,,,0
28308141,Ask HN: What are your top 5 favorite computer books?,,I&#x27;m looking to expand my book collection and I&#x27;d like to know what this community is reading related to computers.,,justinzollars,11,1629931501,2021-08-25 22:45:01+00:00,story,,...,,,,,,,,,,0
29602228,Ask HN: What are the best books for professional effectiveness?,,What books have helped you be more effective at work that apply to most “knowledge work” jobs?,,arikr,107,1639808795,2021-12-18 06:26:35+00:00,story,,...,,,,,,,,,,0
29634694,Ask HN: What was the best book you read in 2021?,,,,ent101,12,1640069172,2021-12-21 06:46:12+00:00,story,,...,,,,,,,,,,0
28456318,Ask HN: What's the best book on AWS Lambda?,,,True,gilbertmpanga12,1,1631105874,2021-09-08 12:57:54+00:00,story,,...,,,,,,,,,,0
28181074,Ask HN: Best (practical) books on web security?,,"I would like to learn more about topics like:<p>- DMZ<p>- bastion hosts (should we use them? Why or why not)<p>- ssh<p>- best practices<p>in the context of web development on the cloud. I&#x27;ve found a lot of material but they are very cloud-focused (aws&#x2F;gcp security, for example) or rely a lot on Kubernetes (which I&#x27;m not using). I&#x27;m a solo-developer maintaining a simple Saas...",,ingvul,10,1628952503,2021-08-14 14:48:23+00:00,story,,...,,,,,,,,,,0


We can now find every comment in those threads.

In [9]:
book_recommendation_threads = df[df.root.isin(book_recommendations.index)]

Interconnections by Radia Perlman

How to Build a Car by Adrian Newey

Liftoff by Eric Berger

The Pragmatic Programmer: From Journeyman to Master by Andy Hunt

Let's clean the text to make it easier to read.

In [10]:
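import re

def clean(text):
    """Convert Hacker News comment HTML into readable plain text."""
    # Rewrite the markup first, so escaped literals such as &lt;p&gt; are not mistaken for tags.
    text = text.replace('<i>', '*')
    text = text.replace('</i>', '*')
    text = text.replace('<p>', '\n\n')
    # Replace each link with its target URL.
    text = re.sub(r'<a href="(.*?)".*?>.*?</a>', r'\1', text)
    # Decode entities such as &#x27; last.
    return html.unescape(text)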
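As a quick check of what this cleaning does, the sketch below applies the same substitutions and entity decoding to one made-up comment. The `to_plain_text` name, the sample text and the example.com URL are all hypothetical.

```python
import html
import re


def to_plain_text(text):
    """Illustrative stand-in that applies the same substitutions as clean()."""
    text = text.replace('<i>', '*').replace('</i>', '*')
    text = text.replace('<p>', '\n\n')
    text = re.sub(r'<a href="(.*?)".*?>.*?</a>', r'\1', text)
    return html.unescape(text)


# Invented comment in the style of the raw Hacker News text field.
raw = (
    'It&#x27;s in <i>SICP</i>.<p>See '
    '<a href="https:&#x2F;&#x2F;example.com&#x2F;sicp" rel="nofollow">'
    'https:&#x2F;&#x2F;example.com&#x2F;s...</a>'
)

cleaned = to_plain_text(raw)
print(cleaned)
```

The link is replaced by its full `href` target, so URLs that Hacker News truncates in the visible link text are restored.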

Almost all the top-level comments contain a book recommendation.

In [11]:
for _id, row in book_recommendation_threads.query('depth==1').dropna(subset='text').sample(10).iterrows():
    print(_id, row.depth, df.loc[row.root].title)
    print(clean(row.text))
    print()

29288060 1 Ask HN: What are some of the best well-written books on computer science?
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321

You can learn a lot of algorithms. It's useless unless you start to create architecture and use them in practice.

29603708 1 Ask HN: What are the best books for professional effectiveness?
yep, the happiness project: https://gretchenrubin.com/books/the-happiness-project/about-the-book/

Hear me out :)

It teaches you to do a lot of little things that over time make a huge impact on your well being and the people around you. I loved this book and found a huge impact on my life year after year as little habits it teaches started to add up.

In the workforce it also helped me. I am already very empathetic, but it helped me do a lot of little things for my team and the people I manage t

Second-level comments contain some books too, but much less often.

In [12]:
for _id, row in book_recommendation_threads.query('depth==2').dropna(subset='text').sample(10).iterrows():
    print(_id, row.depth, df.loc[row.root].title)
    print(clean(row.text))
    print()

29675450 2 Ask HN: What's the best book you read in 2021?
The most important part of this book is the thoughtful explanation and deep dive into psychedelics. Sure you might come away with "I should try this", but it explains how and why these drugs (*medicines) should be used, especially the potential downsides and adverse effects.

29671338 2 Ask HN: What's the best book you read in 2021?
I don't mean this in a cynical way, but how come you didn't expect a literary classic to be well-written?

29670440 2 Ask HN: What's the best book you read in 2021?
(I) think of the baroque cycle as fictionalized history rather than historical fiction

29607659 2 Ask HN: What are the best books for professional effectiveness?
I just picked up How to take Smart Notes. In undergrad a lifetime ago, I remember feeling that my note taking skills were woefully inadequate. My notes became more scattered, less comprehensive, and on the whole less reliable to use as semesters went on.

I think there's somethi

Restricting to top-level comments gives us over 500 posts, most of which recommend a book.

In [13]:
len(book_recommendation_threads.query('depth==1'))

537

In [14]:
(
    book_recommendation_threads
    .query('depth==1 & dead.isna() & text.notna()')
    .merge(df[['title', 'text']], how='left', left_on='root', right_index=True, suffixes=('', '_parent'))
    [['text', 'by', 'timestamp', 'root', 'title_parent', 'text_parent']]
).to_csv('../data/02_intermediate/hn_ask_book_recommendations.csv')

We could get some titles from here by looking for italics.

In [15]:
book_recommendation_threads.text.dropna().str.extractall('<i>(.*?)</i>').head(30)

id,match,0
29673970,0,The Peripheral
29669067,0,Reaper
29670564,0,The Unwomanly Face of War: An Oral History of Women in World War II
29670564,1,Beanpole
29670564,2,Klara and the Sun
29671265,0,Three Men in a Boat
29671265,1,Tristram Shandy
29672389,0,essential
29692384,0,Designing Distributed Control Systems: A Pattern Language Approach
29616178,0,Difficult Conversations


We can find others by looking for the pattern "Book by Author", which works somewhat but also picks up ordinary phrases containing "by".

In [16]:
book_recommendation_threads.text.dropna().apply(clean).str.extractall(r'[*"“]?([^.*"“\n]+?)[*"”]?[, ]*\bby\b((?:[^\[A-Z]\w+)+)').head(30)

id,match,0,1
29669495,0,The Coming of Neo-Feudalism,Joel Kotkin
29669067,0,Reaper,Will Wight
29670033,0,A Gentleman in Moscow,Amor Towles
29670439,0,In Cold Blood,Truman Capote
25989272,0,CSP,Tony Hoare
25989272,1,Making reliable distributed systems in the presence of software errors,Joe Armstrong
25989272,2,Conflict-free Replicated Data Types,Marc Shapiro
25867397,0,very relatable,modern standards
25988717,0,The best and most accessible book on theory is probably Reliable and Secure Distributed systems,Cachin
29672082,0,Thanks for pointing out the new book,Neal Stephenson
