# Webscraping Nostradamus' Prophecies
## by Andreea Ion

In this notebook, I scrape the English text of Nostradamus' *Prophecies* from the Internet Sacred Text Archive (sacred-texts.com), a freely accessible website.

**Importing Pandas**

In [81]:
import pandas as pd

**Read CSV file**

In [82]:
urls = pd.read_csv("metadata_WebScraping.csv", delimiter=',', encoding='utf-8')

**Display DataFrame**

In [83]:
urls

Unnamed: 0,ID,TITLE,AUTHOR,YEAR,URL
0,nos_01,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/preface.htm
1,nos_02,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen1eng.htm
2,nos_03,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen2eng.htm
3,nos_04,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen3eng.htm
4,nos_05,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen4eng.htm
5,nos_06,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen5eng.htm
6,nos_07,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen6eng.htm
7,nos_08,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen7eng.htm
8,nos_09,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/epistle.htm
9,nos_10,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen8eng.htm


Each row in this CSV pairs one section of the *Prophecies* (the Preface, a "Century", or the Epistle) with the URL of its text.
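
The `Unnamed: 0` column in the table above is what pandas typically produces when a CSV was saved together with its row index and then read back without declaring that column as the index. Below is a minimal sketch of one way to avoid it, assuming the first column of the file really is just a saved index. The two-row sample is an inline stand-in, not the real `metadata_WebScraping.csv`, and the name `demo` is only illustrative.

```python
import io
import pandas as pd

# Inline stand-in for the metadata file (assumption): the first column has
# no header, which is what a CSV written with its index looks like.
sample_csv = io.StringIO(
    ",ID,TITLE,AUTHOR,YEAR,URL\n"
    "0,nos_01,The Prophecies of Nostradamus,Nostradamus,1555/1558,"
    "https://www.sacred-texts.com/nos/preface.htm\n"
    "1,nos_02,The Prophecies of Nostradamus,Nostradamus,1555/1558,"
    "https://www.sacred-texts.com/nos/cen1eng.htm\n"
)

# index_col=0 tells pandas to use the first column as the row index
# instead of loading it as a data column named "Unnamed: 0".
demo = pd.read_csv(sample_csv, index_col=0, encoding="utf-8")

print(list(demo.columns))
```

With `index_col=0`, the remaining columns are only the metadata fields (`ID`, `TITLE`, `AUTHOR`, `YEAR`, `URL`).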

**Importing Requests**

In [84]:
import requests

**Get HTML Data**

In [85]:
sample_urls = urls.copy()  # work on a copy so the original metadata stays unchanged

In [86]:
def scrape_screenplay(url):
    """Download a page and return its HTML as a string."""
    response = requests.get(url)  # add verify=False only to skip TLS certificate checks
    html_string = response.text
    return html_string
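
The helper above returns whatever the server sends back, including error pages. The sketch below shows one possible hardening, under assumptions that are not in the original notebook: the name `fetch_html`, the 10-second timeout, and the one-second pause between requests are all illustrative choices. It calls `raise_for_status()` so that a 404 or 500 response raises an exception instead of being scraped as if it were text.

```python
import time
import requests

def fetch_html(url, session=None, timeout=10, delay=1.0):
    """Return the HTML of `url`, raising on HTTP errors (hypothetical helper)."""
    if session is None:
        session = requests.Session()
    # A timeout keeps the notebook from hanging on an unresponsive server.
    response = session.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    # Pause briefly so consecutive requests do not hit the site back to back.
    time.sleep(delay)
    return response.text
```

Passing one shared `requests.Session()` to every call would reuse the same connection across the ten pages. Whether a one-second delay is appropriate for this site is a guess.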

In [87]:
sample_urls

Unnamed: 0,ID,TITLE,AUTHOR,YEAR,URL
0,nos_01,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/preface.htm
1,nos_02,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen1eng.htm
2,nos_03,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen2eng.htm
3,nos_04,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen3eng.htm
4,nos_05,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen4eng.htm
5,nos_06,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen5eng.htm
6,nos_07,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen6eng.htm
7,nos_08,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen7eng.htm
8,nos_09,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/epistle.htm
9,nos_10,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen8eng.htm


In [88]:
from bs4 import BeautifulSoup
import re

In [89]:
prophecies = []
for url in sample_urls['URL']:
    response = requests.get(url)
    document = BeautifulSoup(response.text, "html.parser")
    paragraphs = document.find_all("p")
    clean_paragraphs = []
    for p in paragraphs:
        text = p.get_text(separator = " ") # get the text inside p with spaces not breaks
        text = re.sub(r"\s+", " ", text) # remove \n and collapse any extra spaces
        clean_paragraphs.append(text)

    prophecy_text = " ".join(clean_paragraphs)
    prophecies.append(prophecy_text)

sample_urls['Text'] = prophecies
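
To see what the two cleaning steps in the loop do, the sketch below runs them on a small inline HTML snippet instead of a live page. The snippet is invented for illustration and only imitates quatrains whose lines are separated by `<br>` tags and line breaks. The real pages on the site may be marked up differently.

```python
import re
from bs4 import BeautifulSoup

# Made-up sample markup (assumption), not copied from the website.
html = """
<p>1<br>
Sitting alone at night<br>
in secret   study;</p>
<p>1<br>
Towards Aquitaine
by the British Isles</p>
"""

document = BeautifulSoup(html, "html.parser")

clean_paragraphs = []
for p in document.find_all("p"):
    # separator=" " puts a space where tags such as <br> split the text,
    # so "1" and "Sitting" do not run together.
    text = p.get_text(separator=" ")
    # Collapse newlines and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    clean_paragraphs.append(text)

print(clean_paragraphs)
# ['1 Sitting alone at night in secret study;',
#  '1 Towards Aquitaine by the British Isles']
```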

In [90]:
sample_urls

Unnamed: 0,ID,TITLE,AUTHOR,YEAR,URL,Text
0,nos_01,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/preface.htm,Preface by M. Nostradamus to His Prophecies G...
1,nos_02,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen1eng.htm,1 Sitting alone at night in secret study; it ...
2,nos_03,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen2eng.htm,1 Towards Aquitaine by the British Isles By t...
3,nos_04,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen3eng.htm,"1 After combat and naval battle, The great Ne..."
4,nos_05,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen4eng.htm,1 That of the remainder of blood unshed: Veni...
5,nos_06,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen5eng.htm,"1 Before the coming of Celtic ruin, In the te..."
6,nos_07,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen6eng.htm,2 In the year five hundred eighty more or les...
7,nos_08,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen7eng.htm,1 The arc of the treasure deceived by Achille...
8,nos_09,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/epistle.htm,TO THE MOST INVINCIBLE MOST POWERFUL AND MOST...
9,nos_10,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen8eng.htm,"1 Pau, Nay, Loron will be more of fire than b..."


In [94]:
titles = []

for url in sample_urls["URL"]:
    response = requests.get(url)
    document = BeautifulSoup(response.text, "html.parser")

    title_tag = document.find("h1")
    title = title_tag.get_text().strip()

    titles.append(title)
sample_urls["Chapter"] = titles
sample_urls

Unnamed: 0,ID,TITLE,AUTHOR,YEAR,URL,Text,Title,Chapter
0,nos_01,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/preface.htm,Preface by M. Nostradamus to His Prophecies G...,Preface,Preface
1,nos_02,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen1eng.htm,1 Sitting alone at night in secret study; it ...,Century I,Century I
2,nos_03,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen2eng.htm,1 Towards Aquitaine by the British Isles By t...,Century II,Century II
3,nos_04,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen3eng.htm,"1 After combat and naval battle, The great Ne...",Century III,Century III
4,nos_05,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen4eng.htm,1 That of the remainder of blood unshed: Veni...,Century IV,Century IV
5,nos_06,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen5eng.htm,"1 Before the coming of Celtic ruin, In the te...",Century V,Century V
6,nos_07,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen6eng.htm,2 In the year five hundred eighty more or les...,Century VI,Century VI
7,nos_08,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen7eng.htm,1 The arc of the treasure deceived by Achille...,Century VII,Century VII
8,nos_09,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/epistle.htm,TO THE MOST INVINCIBLE MOST POWERFUL AND MOST...,Epistle to Henry II,Epistle to Henry II
9,nos_10,The Prophecies of Nostradamus,Nostradamus,1555/1558,https://www.sacred-texts.com/nos/cen8eng.htm,"1 Pau, Nay, Loron will be more of fire than b...",Century VIII,Century VIII
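
The `In [94]` cell downloads every page a second time just to read its heading, and `document.find("h1")` returns `None` when a page has no `<h1>`, which would make `.get_text()` fail. As a sketch of one alternative, the hypothetical helper `parse_page` below extracts both the heading and the text from HTML that has already been fetched, and falls back to an empty title when no `<h1>` is present. The inline HTML strings are made-up samples, not copies of real pages.

```python
import re
from bs4 import BeautifulSoup

def parse_page(html):
    """Return (chapter, text) for one already-downloaded page (hypothetical helper)."""
    document = BeautifulSoup(html, "html.parser")

    # find() returns None when there is no <h1>, so guard before get_text().
    title_tag = document.find("h1")
    chapter = title_tag.get_text().strip() if title_tag is not None else ""

    # Same cleaning as the scraping loop: spaces for tags, collapsed whitespace.
    paragraphs = [
        re.sub(r"\s+", " ", p.get_text(separator=" "))
        for p in document.find_all("p")
    ]
    return chapter, " ".join(paragraphs)

# Made-up sample pages (assumption) to exercise both branches.
chapter, text = parse_page("<h1> Century I </h1><p>1<br>Sitting alone at night</p>")
no_heading = parse_page("<p>Text without a heading</p>")

print(chapter, "|", text)
print(no_heading)
```

In the notebook, calling such a helper once per `response.text` would fill the `Text` and `Chapter` columns in a single pass over the URLs.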


In [95]:
sample_urls.to_csv("nostradamus_corpus.csv", index=False)  # index=False avoids writing the row index as an extra unnamed column