# Web Scraping of Haodoo Backup Using BeautifulSoup Take 1
### David Lowe
### July 8, 2022

SUMMARY: This project practices web scraping by extracting specific pieces of information from a website. The Python code uses the requests and BeautifulSoup modules.

INTRODUCTION: Haodoo is a website that hosts classic Chinese literature for online reading. The name “Haodoo” can be translated into English as “Good Reads.” The site collects hard-to-find Chinese texts and books, and its collection includes over 3,500 text and audiobook titles.

In this Take 1 iteration, we scrape the website to obtain all the book titles and their assigned categories.

Starting URL: https://haodoo.org/

## Task 1. Prepare Environment

In [1]:
# import os
import sys
import requests
from bs4 import BeautifulSoup
import pandas as pd
from time import sleep

In [2]:
# Specify the URL of the web page to be scraped
website_url = "https://haodoo.org/"

## Task 2. Get Categories

In [3]:
df_catogory = pd.DataFrame(columns=['category_name', 'category_link'])

try:
    sess = requests.Session()
    resp = sess.get(website_url)
    # print(resp.text)
except requests.HTTPError as e:
    print('The server could not serve up the web page!')
    sys.exit("Script processing cannot continue!!!")
except requests.ConnectionError as e:
    print('The server could not be reached due to connection issues!')
    sys.exit("Script processing cannot continue!!!")

if resp.status_code==requests.codes.ok :
    print('Successfully accessed the web page: ' + website_url)
    home_page = BeautifulSoup(resp.text, 'lxml')
    category_tags = home_page.find("td", class_="a03").find_all("a")
    # print(category_tags)
    for tag in category_tags :
        category_name = tag.string
        category_link = website_url + tag.get('href')
        df_catogory.loc[len(df_catogory)] = [category_name, category_link]

    # Drop the last row so that only the nine category links shown in the output remain
    df_catogory.drop(df_catogory.tail(1).index, inplace=True)
    print(df_catogory)

Successfully accessed the web page: https://haodoo.org/
  category_name                       category_link
0          世紀百強      https://haodoo.org/?M=hd&P=100
1          隨身智囊   https://haodoo.org/?M=hd&P=wisdom
2          歷史煙雲  https://haodoo.org/?M=hd&P=history
3          武俠小說  https://haodoo.org/?M=hd&P=martial
4          懸疑小說  https://haodoo.org/?M=hd&P=mystery
5          言情小說  https://haodoo.org/?M=hd&P=romance
6          奇幻小說    https://haodoo.org/?M=hd&P=scifi
7          小說園地  https://haodoo.org/?M=hd&P=fiction
8          有聲書籍    https://haodoo.org/?M=hd&P=audio
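
The cell above builds absolute links by concatenating `website_url` with each `href`. That works here because every category link is a bare query string. As a minimal sketch, the standard library's `urllib.parse.urljoin` resolves other relative forms as well. The sample `href` values are copied from outputs elsewhere in this notebook, and nothing in the sketch contacts the site.

```python
from urllib.parse import urljoin

website_url = "https://haodoo.org/"

# Relative hrefs as they appear in this notebook's outputs.
sample_hrefs = ["?M=hd&P=100", "?M=book&P=435", "PDB/J/090620a.pdb"]

# urljoin resolves each href against the base URL, whatever its form.
absolute_links = [urljoin(website_url, href) for href in sample_hrefs]

for link in absolute_links:
    print(link)
```

The same call could be applied to the `title_link` values collected in Task 3, which are stored as relative links.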


## Task 3. Get Book Titles for each Category

### Task 3.1 Get Titles and Links for the 100 Category

In [4]:
df_title_100 = pd.DataFrame(columns=['category_name', 'book_title', 'title_link'])

category_name = df_catogory.loc[0].category_name
category_link = df_catogory.loc[0].category_link

max_grouping = 5
for j in range(max_grouping) :
    grouping_link = category_link + '-' + str(j+1)
    try:
        sess = requests.Session()
        resp = sess.get(grouping_link)
        # print(resp.text)
    except requests.HTTPError as e:
        print('The server could not serve up the web page!')
        sys.exit("Script processing cannot continue!!!")
    except requests.ConnectionError as e:
        print('The server could not be reached due to connection issues!')
        sys.exit("Script processing cannot continue!!!")

    if resp.status_code==requests.codes.ok :
        print('Successfully accessed the web page: ' + grouping_link)
        grouping_page = BeautifulSoup(resp.text, 'lxml')
        book_tags = grouping_page.find("div", class_="a03").find_all("a")
        # print(book_tags)
        for tag in book_tags :
            if tag.string is not None:
                book_title = tag.string
                title_link = tag.get('href')
                df_title_100.loc[len(df_title_100)] = [category_name, book_title, title_link]
    sleep(2)

print(df_title_100)

Successfully accessed the web page: https://haodoo.org/?M=hd&P=100-1
Successfully accessed the web page: https://haodoo.org/?M=hd&P=100-2
Successfully accessed the web page: https://haodoo.org/?M=hd&P=100-3
Successfully accessed the web page: https://haodoo.org/?M=hd&P=100-4
Successfully accessed the web page: https://haodoo.org/?M=hd&P=100-5
    category_name book_title      title_link
0            世紀百強       【吶喊】   ?M=book&P=435
1            世紀百強       【邊城】   ?M=book&P=394
2            世紀百強     《駱駝祥子》   ?M=book&P=401
3            世紀百強       【傳奇】   ?M=book&P=430
4            世紀百強       【圍城】   ?M=book&P=399
..            ...        ...             ...
103          世紀百強       【活著】   ?M=book&P=471
104          世紀百強   《岡底斯的誘惑》   ?M=book&P=555
105          世紀百強     《十年十癔》  ?M=book&P=11M7
106          世紀百強    【北極風情畫】   ?M=book&P=548
107          世紀百強     【雍正皇帝】   ?M=book&P=240

[108 rows x 3 columns]
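
Each category is split across numbered grouping pages whose URLs differ only in a numeric suffix. As a minimal sketch, the suffix logic can live in a small helper so that it is written once. The name `grouping_links` is hypothetical and not part of the notebook.

```python
def grouping_links(category_link, max_grouping):
    """Return the URLs of grouping pages 1..max_grouping for one category."""
    return [f"{category_link}-{page}" for page in range(1, max_grouping + 1)]


# The 100 category has five grouping pages, as in the cell above.
links_100 = grouping_links("https://haodoo.org/?M=hd&P=100", 5)
for link in links_100:
    print(link)
```

The printed URLs should match the five pages reported as successfully accessed in the output above.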


### Task 3.2 Get Titles and Links for the Wisdom Category

In [5]:
df_title_wisdom = pd.DataFrame(columns=['category_name', 'book_title', 'title_link'])

category_name = df_catogory.loc[1].category_name
category_link = df_catogory.loc[1].category_link

max_grouping = 6
for j in range(max_grouping) :
    grouping_link = category_link + '-' + str(j+1)
    try:
        sess = requests.Session()
        resp = sess.get(grouping_link)
        # print(resp.text)
    except requests.HTTPError as e:
        print('The server could not serve up the web page!')
        sys.exit("Script processing cannot continue!!!")
    except requests.ConnectionError as e:
        print('The server could not be reached due to connection issues!')
        sys.exit("Script processing cannot continue!!!")

    if resp.status_code==requests.codes.ok :
        print('Successfully accessed the web page: ' + grouping_link)
        grouping_page = BeautifulSoup(resp.text, 'lxml')
        book_tags = grouping_page.find("div", class_="a03").find_all("a")
        # print(book_tags)
        for tag in book_tags :
            if tag.string is not None:
                book_title = tag.string
                title_link = tag.get('href')
                df_title_wisdom.loc[len(df_title_wisdom)] = [category_name, book_title, title_link]
    sleep(2)

print(df_title_wisdom)

Successfully accessed the web page: https://haodoo.org/?M=hd&P=wisdom-1
Successfully accessed the web page: https://haodoo.org/?M=hd&P=wisdom-2
Successfully accessed the web page: https://haodoo.org/?M=hd&P=wisdom-3
Successfully accessed the web page: https://haodoo.org/?M=hd&P=wisdom-4
Successfully accessed the web page: https://haodoo.org/?M=hd&P=wisdom-5
Successfully accessed the web page: https://haodoo.org/?M=hd&P=wisdom-6
    category_name  book_title         title_link
0            隨身智囊    【簡化你的生活】     ?M=book&P=17S4
1            隨身智囊     《男女大不同》    ?M=Share&P=0836
2            隨身智囊     《人性的弱點》      ?M=book&P=519
3            隨身智囊       【沉思錄】    ?M=Share&P=09E5
4            隨身智囊      【勵志文粹】    ?M=Share&P=09D9
..            ...         ...                ...
390          隨身智囊        金匱要略  ?M=m&P=J090620a:0
391          隨身智囊        [下載]  PDB/J/090620a.pdb
392          隨身智囊         傷寒論  ?M=m&P=J090620b:0
393          隨身智囊        [下載]  PDB/J/090620b.pdb
394          隨身智囊  【麻瑞亭治驗集總論】
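
The output above includes rows such as `[下載]` paired with `PDB/J/090620a.pdb`, which are file download links rather than book titles. One possible way to separate them, sketched here under assumptions, is to classify each link by its `M` query parameter. The helper name `link_kind` is hypothetical, and treating any link without an `M` parameter as a file download is an assumption inferred from the sample rows, not something the site documents.

```python
from urllib.parse import parse_qs, urlsplit


def link_kind(title_link):
    """Return the value of the M query parameter, or 'file' when there is none."""
    query = parse_qs(urlsplit(title_link).query)
    # Links such as 'PDB/J/090620a.pdb' carry no query string at all.
    return query.get("M", ["file"])[0]


# Sample rows copied from the output above.
rows = [
    ("【簡化你的生活】", "?M=book&P=17S4"),
    ("《男女大不同》", "?M=Share&P=0836"),
    ("金匱要略", "?M=m&P=J090620a:0"),
    ("[下載]", "PDB/J/090620a.pdb"),
]

# Assumption: a link classified as 'file' is a download, not a title.
titles_only = [(title, link) for title, link in rows if link_kind(link) != "file"]
print(titles_only)
```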

### Task 3.3 Get Titles and Links for the History Category

In [6]:
df_title_history = pd.DataFrame(columns=['category_name', 'book_title', 'title_link'])

category_name = df_catogory.loc[2].category_name
category_link = df_catogory.loc[2].category_link

max_grouping = 3
for j in range(max_grouping) :
    grouping_link = category_link + '-' + str(j+1)
    try:
        sess = requests.Session()
        resp = sess.get(grouping_link)
        # print(resp.text)
    except requests.HTTPError as e:
        print('The server could not serve up the web page!')
        sys.exit("Script processing cannot continue!!!")
    except requests.ConnectionError as e:
        print('The server could not be reached due to connection issues!')
        sys.exit("Script processing cannot continue!!!")

    if resp.status_code==requests.codes.ok :
        print('Successfully accessed the web page: ' + grouping_link)
        grouping_page = BeautifulSoup(resp.text, 'lxml')
        book_tags = grouping_page.find("div", class_="a03").find_all("a")
        # print(book_tags)
        for tag in book_tags :
            if tag.string is not None:
                book_title = tag.string
                title_link = tag.get('href')
                df_title_history.loc[len(df_title_history)] = [category_name, book_title, title_link]
    sleep(2)

print(df_title_history)

Successfully accessed the web page: https://haodoo.org/?M=hd&P=history-1
Successfully accessed the web page: https://haodoo.org/?M=hd&P=history-2
Successfully accessed the web page: https://haodoo.org/?M=hd&P=history-3
    category_name       book_title      title_link
0            歷史煙雲           【考古中國】  ?M=book&P=17I7
1            歷史煙雲             【炎黃】  ?M=book&P=17G0
2            歷史煙雲             【孔子】  ?M=book&P=1745
3            歷史煙雲           《大秦帝國》   ?M=book&P=356
4            歷史煙雲   【秦始皇：從戰國到一統天下】  ?M=book&P=14S0
..            ...              ...             ...
271          歷史煙雲           【逃出西貢】  ?M=book&P=17O7
272          歷史煙雲           【銀海千秋】  ?M=book&P=1280
273          歷史煙雲           《文武北洋》  ?M=book&P=1398
274          歷史煙雲         《美國種族簡史》  ?M=book&P=1364
275          歷史煙雲  【公主柳－西藏文化的變遷模式】  ?M=book&P=1383

[276 rows x 3 columns]
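
The cells in this task stop the script on the first connection error and pause two seconds between pages. A hedged alternative is to retry a failed request a few times before giving up. The sketch below is library-agnostic: `fetch` stands for any callable that returns page text or raises an exception, for example a thin wrapper around `sess.get`. The name `fetch_with_retry` and its parameters are hypothetical, and the stand-in fetcher simulates failures without any network access.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=2.0):
    """Call fetch(url), making up to `retries` attempts before re-raising."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # a real scraper would catch narrower errors
            last_error = error
            print(f"Attempt {attempt} failed for {url}: {error}")
            if attempt < retries:
                time.sleep(delay)  # pause before the next attempt
    raise last_error


# Stand-in fetcher that fails twice, then succeeds (no network access).
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated connection problem")
    return "<html>ok</html>"


page_text = fetch_with_retry(flaky_fetch, "https://haodoo.org/?M=hd&P=history-1", delay=0)
print(page_text)
```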


### Task 3.4 Get Titles and Links for the Martial Category

In [7]:
df_title_martial = pd.DataFrame(columns=['category_name', 'book_title', 'title_link'])

category_name = df_catogory.loc[3].category_name
category_link = df_catogory.loc[3].category_link

max_grouping = 10
for j in range(max_grouping) :
    grouping_link = category_link + '-' + str(j+1)
    try:
        sess = requests.Session()
        resp = sess.get(grouping_link)
        # print(resp.text)
    except requests.HTTPError as e:
        print('The server could not serve up the web page!')
        sys.exit("Script processing cannot continue!!!")
    except requests.ConnectionError as e:
        print('The server could not be reached due to connection issues!')
        sys.exit("Script processing cannot continue!!!")

    if resp.status_code==requests.codes.ok :
        print('Successfully accessed the web page: ' + grouping_link)
        grouping_page = BeautifulSoup(resp.text, 'lxml')
        book_tags = grouping_page.find("div", class_="a03").find_all("a")
        # print(book_tags)
        for tag in book_tags :
            if tag.string is not None:
                book_title = tag.string
                title_link = tag.get('href')
                df_title_martial.loc[len(df_title_martial)] = [category_name, book_title, title_link]
    sleep(2)

print(df_title_martial)

Successfully accessed the web page: https://haodoo.org/?M=hd&P=martial-1
Successfully accessed the web page: https://haodoo.org/?M=hd&P=martial-2
Successfully accessed the web page: https://haodoo.org/?M=hd&P=martial-3
Successfully accessed the web page: https://haodoo.org/?M=hd&P=martial-4
Successfully accessed the web page: https://haodoo.org/?M=hd&P=martial-5
Successfully accessed the web page: https://haodoo.org/?M=hd&P=martial-6
Successfully accessed the web page: https://haodoo.org/?M=hd&P=martial-7
Successfully accessed the web page: https://haodoo.org/?M=hd&P=martial-8
Successfully accessed the web page: https://haodoo.org/?M=hd&P=martial-9
Successfully accessed the web page: https://haodoo.org/?M=hd&P=martial-10
    category_name book_title       title_link
0            武俠小說    【射鵰英雄傳】     ?M=book&P=55
1            武俠小說     【神鵰俠侶】     ?M=book&P=56
2            武俠小說    【倚天屠龍記】     ?M=book&P=57
3            武俠小說    【書劍恩仇錄】     ?M=book&P=58
4            武俠小說      【碧血劍】     ?M=boo

### Task 3.5 Get Titles and Links for the Mystery Category

In [8]:
df_title_mystery = pd.DataFrame(columns=['category_name', 'book_title', 'title_link'])

category_name = df_catogory.loc[4].category_name
category_link = df_catogory.loc[4].category_link

max_grouping = 5
for j in range(max_grouping) :
    grouping_link = category_link + '-' + str(j+1)
    try:
        sess = requests.Session()
        resp = sess.get(grouping_link)
        # print(resp.text)
    except requests.HTTPError as e:
        print('The server could not serve up the web page!')
        sys.exit("Script processing cannot continue!!!")
    except requests.ConnectionError as e:
        print('The server could not be reached due to connection issues!')
        sys.exit("Script processing cannot continue!!!")

    if resp.status_code==requests.codes.ok :
        print('Successfully accessed the web page: ' + grouping_link)
        grouping_page = BeautifulSoup(resp.text, 'lxml')
        book_tags = grouping_page.find("div", class_="a03").find_all("a")
        # print(book_tags)
        for tag in book_tags :
            if tag.string is not None:
                book_title = tag.string
                title_link = tag.get('href')
                df_title_mystery.loc[len(df_title_mystery)] = [category_name, book_title, title_link]
    sleep(2)

print(df_title_mystery)

Successfully accessed the web page: https://haodoo.org/?M=hd&P=mystery-1
Successfully accessed the web page: https://haodoo.org/?M=hd&P=mystery-2
Successfully accessed the web page: https://haodoo.org/?M=hd&P=mystery-3
Successfully accessed the web page: https://haodoo.org/?M=hd&P=mystery-4
Successfully accessed the web page: https://haodoo.org/?M=hd&P=mystery-5
    category_name  book_title             title_link
0            懸疑小說    柯賴二氏探案系列  ?M=hd&P=Mimic-Gardner
1            懸疑小說   《初出茅廬破大案》          ?M=book&P=357
2            懸疑小說      《險中取勝》          ?M=book&P=366
3            懸疑小說     《黃金的秘密》          ?M=book&P=367
4            懸疑小說  《拉斯維加斯錢來了》          ?M=book&P=368
..            ...         ...                    ...
596          懸疑小說      《一的悲劇》        ?M=Share&P=0939
597          懸疑小說      《斜屋犯罪》        ?M=Share&P=0959
598          懸疑小說     【１３級階梯】         ?M=book&P=14R5
599          懸疑小說      【花園迷宮】         ?M=book&P=15E3
600          懸疑小說      《大海獠牙》         ?M=book&P=15Y6

### Task 3.6 Get Titles and Links for the Romance Category

In [9]:
df_title_romance = pd.DataFrame(columns=['category_name', 'book_title', 'title_link'])

category_name = df_catogory.loc[5].category_name
category_link = df_catogory.loc[5].category_link

max_grouping = 6
for j in range(max_grouping) :
    grouping_link = category_link + '-' + str(j+1)
    try:
        sess = requests.Session()
        resp = sess.get(grouping_link)
        # print(resp.text)
    except requests.HTTPError as e:
        print('The server could not serve up the web page!')
        sys.exit("Script processing cannot continue!!!")
    except requests.ConnectionError as e:
        print('The server could not be reached due to connection issues!')
        sys.exit("Script processing cannot continue!!!")

    if resp.status_code==requests.codes.ok :
        print('Successfully accessed the web page: ' + grouping_link)
        grouping_page = BeautifulSoup(resp.text, 'lxml')
        book_tags = grouping_page.find("div", class_="a03").find_all("a")
        # print(book_tags)
        for tag in book_tags :
            if tag.string is not None:
                book_title = tag.string
                title_link = tag.get('href')
                df_title_romance.loc[len(df_title_romance)] = [category_name, book_title, title_link]
    sleep(2)

print(df_title_romance)

Successfully accessed the web page: https://haodoo.org/?M=hd&P=romance-1
Successfully accessed the web page: https://haodoo.org/?M=hd&P=romance-2
Successfully accessed the web page: https://haodoo.org/?M=hd&P=romance-3
Successfully accessed the web page: https://haodoo.org/?M=hd&P=romance-4
Successfully accessed the web page: https://haodoo.org/?M=hd&P=romance-5
Successfully accessed the web page: https://haodoo.org/?M=hd&P=romance-6
    category_name   book_title       title_link
0            言情小說       《我的故事》   ?M=book&P=10S2
1            言情小說       《還珠格格》    ?M=book&P=319
2            言情小說      【還珠格格三】   ?M=book&P=15R1
3            言情小說         《窗外》    ?M=book&P=320
4            言情小說        《幸運草》    ?M=book&P=321
..            ...          ...              ...
350          言情小說       《冬季戀歌》    ?M=book&P=422
351          言情小說  《未來，我是你的老婆》  ?M=Share&P=0912
352          言情小說       【三世守護】   ?M=book&P=1173
353          言情小說        【朝天闕】   ?M=book&P=11K6
354          言情小說        【玉蟾記】   ?

### Task 3.7 Get Titles and Links for the SciFi Category

In [10]:
df_title_scifi = pd.DataFrame(columns=['category_name', 'book_title', 'title_link'])

category_name = df_catogory.loc[6].category_name
category_link = df_catogory.loc[6].category_link

max_grouping = 10
for j in range(max_grouping) :
    grouping_link = category_link + '-' + str(j+1)
    try:
        sess = requests.Session()
        resp = sess.get(grouping_link)
        # print(resp.text)
    except requests.HTTPError as e:
        print('The server could not serve up the web page!')
        sys.exit("Script processing cannot continue!!!")
    except requests.ConnectionError as e:
        print('The server could not be reached due to connection issues!')
        sys.exit("Script processing cannot continue!!!")

    if resp.status_code==requests.codes.ok :
        print('Successfully accessed the web page: ' + grouping_link)
        grouping_page = BeautifulSoup(resp.text, 'lxml')
        book_tags = grouping_page.find("div", class_="a03").find_all("a")
        # print(book_tags)
        for tag in book_tags :
            if tag.string is not None:
                book_title = tag.string
                title_link = tag.get('href')
                df_title_scifi.loc[len(df_title_scifi)] = [category_name, book_title, title_link]
    sleep(2)

print(df_title_scifi)

Successfully accessed the web page: https://haodoo.org/?M=hd&P=scifi-1
Successfully accessed the web page: https://haodoo.org/?M=hd&P=scifi-2
Successfully accessed the web page: https://haodoo.org/?M=hd&P=scifi-3
Successfully accessed the web page: https://haodoo.org/?M=hd&P=scifi-4
Successfully accessed the web page: https://haodoo.org/?M=hd&P=scifi-5
Successfully accessed the web page: https://haodoo.org/?M=hd&P=scifi-6
Successfully accessed the web page: https://haodoo.org/?M=hd&P=scifi-7
Successfully accessed the web page: https://haodoo.org/?M=hd&P=scifi-8
Successfully accessed the web page: https://haodoo.org/?M=hd&P=scifi-9
Successfully accessed the web page: https://haodoo.org/?M=hd&P=scifi-10
    category_name  book_title       title_link
0            奇幻小說       【玫瑰紅】   ?M=book&P=17E4
1            奇幻小說      【倪匡傳奇】  ?M=Share&P=0389
2            奇幻小說     【香港鬼故事】   ?M=book&P=12I2
3            奇幻小說  【香港鬼故事第二集】   ?M=book&P=1672
4            奇幻小說     【城市怪故事】   ?M=book&P=12F8
..     

### Task 3.8 Get Titles and Links for the Fiction Category

In [11]:
df_title_fiction = pd.DataFrame(columns=['category_name', 'book_title', 'title_link'])

category_name = df_catogory.loc[7].category_name
category_link = df_catogory.loc[7].category_link

max_grouping = 7
for j in range(max_grouping) :
    grouping_link = category_link + '-' + str(j+1)
    try:
        sess = requests.Session()
        resp = sess.get(grouping_link)
        # print(resp.text)
    except requests.HTTPError as e:
        print('The server could not serve up the web page!')
        sys.exit("Script processing cannot continue!!!")
    except requests.ConnectionError as e:
        print('The server could not be reached due to connection issues!')
        sys.exit("Script processing cannot continue!!!")

    if resp.status_code==requests.codes.ok :
        print('Successfully accessed the web page: ' + grouping_link)
        grouping_page = BeautifulSoup(resp.text, 'lxml')
        book_tags = grouping_page.find("div", class_="a03").find_all("a")
        # print(book_tags)
        for tag in book_tags :
            if tag.string is not None:
                book_title = tag.string
                title_link = tag.get('href')
                df_title_fiction.loc[len(df_title_fiction)] = [category_name, book_title, title_link]
    sleep(2)

print(df_title_fiction)

Successfully accessed the web page: https://haodoo.org/?M=hd&P=fiction-1
Successfully accessed the web page: https://haodoo.org/?M=hd&P=fiction-2
Successfully accessed the web page: https://haodoo.org/?M=hd&P=fiction-3
Successfully accessed the web page: https://haodoo.org/?M=hd&P=fiction-4
Successfully accessed the web page: https://haodoo.org/?M=hd&P=fiction-5
Successfully accessed the web page: https://haodoo.org/?M=hd&P=fiction-6
Successfully accessed the web page: https://haodoo.org/?M=hd&P=fiction-7
    category_name book_title       title_link
0            小說園地       【傳奇】    ?M=book&P=430
1            小說園地      【半生緣】    ?M=book&P=431
2            小說園地      【色，戒】  ?M=Share&P=10B3
3            小說園地       【流言】   ?M=book&P=15K1
4            小說園地       【張看】   ?M=book&P=15M7
..            ...        ...              ...
711          小說園地     【歡喜冤家】   ?M=book&P=17H3
712          小說園地    【歡喜冤家續】   ?M=book&P=17Q2
713          小說園地      【巫夢緣】   ?M=book&P=17R5
714          小說園地      【鬧花叢】 

### Task 3.9 Get Titles and Links for the Audio Category

In [12]:
df_title_audio = pd.DataFrame(columns=['category_name', 'book_title', 'title_link'])

category_name = df_catogory.loc[8].category_name
category_link = df_catogory.loc[8].category_link

grouping_link = category_link
try:
    sess = requests.Session()
    resp = sess.get(grouping_link)
    # print(resp.text)
except requests.HTTPError as e:
    print('The server could not serve up the web page!')
    sys.exit("Script processing cannot continue!!!")
except requests.ConnectionError as e:
    print('The server could not be reached due to connection issues!')
    sys.exit("Script processing cannot continue!!!")

if resp.status_code==requests.codes.ok :
    print('Successfully accessed the web page: ' + grouping_link)
    grouping_page = BeautifulSoup(resp.text, 'lxml')
    book_tags = grouping_page.find("div", class_="a03").find_all("a")
    # print(book_tags)
    for tag in book_tags :
        if tag.string is not None:
            book_title = tag.string
            title_link = tag.get('href')
            if book_title != 'Audacity' :
                df_title_audio.loc[len(df_title_audio)] = [category_name, book_title, title_link]

print(df_title_audio)

Successfully accessed the web page: https://haodoo.org/?M=hd&P=audio
   category_name        book_title             title_link
0           有聲書籍            《雅舍小品》     ?M=book&P=audio229
1           有聲書籍       《又見棕櫚．又見棕櫚》     ?M=book&P=audio573
2           有聲書籍           《早起看人間》    ?M=book&P=audio12D0
3           有聲書籍         《把話說到心窩裡》    ?M=book&P=audio0903
4           有聲書籍           《放下的幸福》    ?M=book&P=audio1147
5           有聲書籍         《方法總比問題多》    ?M=book&P=audio0969
6           有聲書籍      《男女大不同 1-6章》    ?M=book&P=audio0836
7           有聲書籍     《男女大不同 7-13章》  ?M=book&P=audio0836-1
8           有聲書籍       《爸爸，我們去哪裡？》    ?M=book&P=audio1413
9           有聲書籍              《深情》    ?M=book&P=audio14A5
10          有聲書籍           《河上的月光》    ?M=book&P=audio14T3
11          有聲書籍         《愛廬小品：生活》   ?M=book&P=audio12J9B
12          有聲書籍            《旅美小簡》    ?M=book&P=audio1289
13          有聲書籍       《過得好，因為我值得》    ?M=book&P=audio12E6
14          有聲書籍   《震撼心靈的116個生命感悟》    ?M=book&P=audio09B4
15 

### Task 3.10 Organize Data and Produce Outputs

In [13]:
df_title_all = pd.concat([df_title_100, df_title_wisdom, df_title_history,
                          df_title_martial, df_title_mystery, df_title_romance,
                          df_title_scifi, df_title_fiction, df_title_audio])
# Write the CSV directly to disk. The line_terminator argument used previously
# was renamed to lineterminator in pandas 1.5 and removed in pandas 2.0.
df_title_all.to_csv('py_webscraping_beautifulsoup_haodoo_titles.csv', index=False, encoding='utf-8')
print('Total number of titles found from web scraping:', len(df_title_all))

Total number of titles found from web scraping: 3545
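
The total counts one row per category listing, and a book can be listed under more than one category: 【傳奇】 (`?M=book&P=430`) appears in both the 世紀百強 and 小說園地 outputs above. As a minimal sketch, distinct books can be counted by keeping only the first row seen for each link. Plain tuples stand in for the DataFrame rows here, and the four sample rows are copied from the outputs above.

```python
# (category_name, book_title, title_link) rows copied from earlier outputs.
rows = [
    ("世紀百強", "【傳奇】", "?M=book&P=430"),
    ("世紀百強", "【圍城】", "?M=book&P=399"),
    ("小說園地", "【傳奇】", "?M=book&P=430"),
    ("小說園地", "【半生緣】", "?M=book&P=431"),
]

seen_links = set()
unique_rows = []
for category_name, book_title, title_link in rows:
    # Keep only the first row encountered for each link.
    if title_link not in seen_links:
        seen_links.add(title_link)
        unique_rows.append((category_name, book_title, title_link))

print(len(rows), "rows,", len(unique_rows), "distinct links")
```

With pandas, `df_title_all.drop_duplicates(subset='title_link')` should give the equivalent result on the full table.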
