# Scraping Authors and Quotes from the Website

# Importing Libraries

In [265]:
import random
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

# Show long quotes in full instead of truncating them in the output
pd.set_option('display.max_colwidth', 500)

# Extracting content from Website's Page 1

## Accessing the Website's Page 1

In [266]:
page = requests.get("https://www.kdnuggets.com/2017/05/42-essential-quotes-data-science-thought-leaders.html/")
page

<Response [200]>
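
The notebook imports `time` and `random` but never uses them. One plausible use is pausing for a random interval before each request and failing early on a non-200 status. The sketch below is an assumption, not part of the original notebook: `fetch_html`, its delay bounds, and the 10-second timeout are hypothetical choices, and the `get` argument is expected to behave like `requests.get`.

```python
import random
import time


def fetch_html(url, get, min_delay=1.0, max_delay=3.0, timeout=10, sleep=time.sleep):
    """Fetch `url` with a requests-style `get` callable after a random pause.

    `get` is passed in (for example `requests.get`) so the helper itself
    only depends on the standard library.
    """
    # Wait a random interval so repeated calls do not hit the site in a burst
    sleep(random.uniform(min_delay, max_delay))
    response = get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.content


# Hypothetical usage inside the notebook:
# html = fetch_html("https://www.kdnuggets.com/2017/05/42-essential-quotes-data-science-thought-leaders.html/", requests.get)
```

The returned bytes can be passed to `bs(...)` in the same way as `page.content`.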

## Parsing the Website's Page 1

In [267]:
soup = bs(page.content, "html.parser")
soup

<!DOCTYPE html>
<html lang="en-US" xmlns="https://www.w3.org/1999/xhtml">
<head profile="https://gmpg.org/xfn/11">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="max-image-preview:large" name="robots"/>
<title>  42 Essential Quotes by Data Science Thought Leaders - KDnuggets</title>
<link href="/wp-content/themes/kdn17/images/favicon.ico" rel="shortcut icon"/>
<link href="/wp-content/themes/kdn17/style.css" media="screen" rel="stylesheet" type="text/css"/>
<script src="/wp-content/themes/kdn17/js/jquery-1.9.1.min.js" type="text/javascript"></script>
<script src="/aps/kda_all.js" type="text/javascript"></script>
<link href="/feed/" rel="alternate" title="KDnuggets: AI, Analytics, Data Science, Machine Learning Feed" type="application/rss+xml"/>
<meta content="42 illuminating quotes you need to read if you’re a data scientist or considering a career in the field - insights from indu
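
Naming the parser explicitly keeps the parse tree the same across machines, because Beautiful Soup otherwise picks whichever parser happens to be installed. A minimal sketch on a made-up HTML string (the markup and variable names are illustrative only):

```python
from bs4 import BeautifulSoup

# A tiny stand-in document; the real page is far larger
html = "<html><head><title>  Demo page </title></head><body><p>Hello</p></body></html>"

# "html.parser" ships with Python, so no extra install is needed
soup_demo = BeautifulSoup(html, "html.parser")

title = soup_demo.title.get_text(strip=True)   # text of <title>, whitespace trimmed
paragraph = soup_demo.find("p").get_text()     # text of the first <p>
```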

# Creating function for extracting quotes

In [268]:
def get_quotes(soup):
    # Locate the article body, then collect the text of every <ol> inside it
    q = soup.find("div", id="post-")

    quotes = q.select('ol')
    quotes = [quote.text.strip() for quote in quotes]
    return quotes
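
`soup.find` returns `None` when no matching element exists, so `get_quotes` would raise an `AttributeError` on a page without the expected `div`. A defensive variant might look like the sketch below; `get_quotes_safe` and the sample markup are hypothetical, and the assumption that each quote sits in its own `<ol>` is inferred from how `get_quotes` is written.

```python
from bs4 import BeautifulSoup


def get_quotes_safe(soup, container_id="post-"):
    """Return the stripped text of every <ol> in the post container, or [] if absent."""
    container = soup.find("div", id=container_id)
    if container is None:
        # Page layout differs from what we expect: report no quotes rather than crash
        return []
    return [ol.text.strip() for ol in container.select("ol")]


# Made-up markup imitating one <ol> per quote
sample = '<div id="post-"><ol><li> First quote </li></ol><ol start="2"><li>Second quote</li></ol></div>'
found = get_quotes_safe(BeautifulSoup(sample, "html.parser"))
missing = get_quotes_safe(BeautifulSoup("<p>no post here</p>", "html.parser"))
```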

## Navigating the soup for Website's Page 1

In [269]:
# Extracting quotes from the website's page 1

quotes = get_quotes(soup)
quotes

['“By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo―starry-eyed explorers and skeptical detectives.”',
 '“‘Possessed’ is probably the right word. I often tell people, ‘I don’t want to necessarily be a data scientist. You just kind of are a data scientist. You just can’t help but look at that data set and go, ‘I feel like I need to look deeper. I feel like that’s not the right fit.’”',
 '“I think of data science as more like a practice than a job. Think of the scientific method, where you have to have a problem statement, generate a hypothesis, collect data, analyze data and then communicate the results and take action…. If you just use the scientific method as a way to approach data-intensive projects, I think you’re more apt to be successful with your outcome.”',
 '“As a data scientist, I can predict what is likely to happen, but I cannot explain why it is 

In [270]:
# Checking the length of quotes

len(quotes)

22

In [271]:
# Selecting the <em> tags nested inside links, which hold the author names

soup.select('a em')

[<em>Monica Rogati</em>,
 <em>Jennifer Shin</em>,
 <em>Bob Hayes</em>,
 <em>Bill Schmarzo</em>,
 <em>John Foreman</em>,
 <em>Josh Wills</em>,
 <em>Daniel Tunkelang</em>,
 <em>Hilary Mason</em>,
 <em>Chris Pehura</em>,
 <em>Peter Skomoroch</em>,
 <em>Carla Gentry</em>,
 <em>Brendan Tierney</em>,
 <em>Dr. Kirk Borne</em>,
 <em>DJ Patil</em>,
 <em>Vik Paruchuri</em>,
 <em>Shelly D. Farnham, Ph.D.</em>,
 <em>Gregory Piatetsky-Shapiro</em>,
 <em>Claudia Perlich</em>,
 <em>James Kobielus</em>,
 <em>Devavrat Shah</em>,
 <em>Lutz Finger</em>,
 <em>Stephan Kolassa</em>]

In [272]:
# Creating list of authors after extracting author names

authors = [i.text for i in soup.select('a em')]
authors

['Monica Rogati',
 'Jennifer Shin',
 'Bob Hayes',
 'Bill Schmarzo',
 'John Foreman',
 'Josh Wills',
 'Daniel Tunkelang',
 'Hilary Mason',
 'Chris Pehura',
 'Peter Skomoroch',
 'Carla Gentry',
 'Brendan Tierney',
 'Dr. Kirk Borne',
 'DJ Patil',
 'Vik Paruchuri',
 'Shelly D. Farnham, Ph.D.',
 'Gregory Piatetsky-Shapiro',
 'Claudia Perlich',
 'James Kobielus',
 'Devavrat Shah',
 'Lutz Finger',
 'Stephan Kolassa']

In [273]:
# Checking length of authors

len(authors)

22
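
The two length checks matter because `pd.DataFrame` refuses a dictionary whose lists differ in length. The comparison can be made explicit before building the frame; `pair_quotes` below is a hypothetical helper written for illustration, not something from the notebook.

```python
def pair_quotes(quotes, authors):
    """Return (quote, author) pairs, refusing lists that do not line up."""
    if len(quotes) != len(authors):
        raise ValueError(
            f"{len(quotes)} quotes but {len(authors)} authors; "
            "check the selectors before building the DataFrame"
        )
    return list(zip(quotes, authors))


pairs = pair_quotes(["q1", "q2"], ["Author A", "Author B"])
```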

## DataFrame of Quotes & Authors from Page 1

In [274]:
# Creating DataFrame using dictionary

data1 = {'Quote' : quotes, 'Author' : authors}
df1 = pd.DataFrame.from_dict(data1)
df1

Unnamed: 0,Quote,Author
0,"“By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo―starry-eyed explorers and skeptical detectives.”",Monica Rogati
1,"“‘Possessed’ is probably the right word. I often tell people, ‘I don’t want to necessarily be a data scientist. You just kind of are a data scientist. You just can’t help but look at that data set and go, ‘I feel like I need to look deeper. I feel like that’s not the right fit.’”",Jennifer Shin
2,"“I think of data science as more like a practice than a job. Think of the scientific method, where you have to have a problem statement, generate a hypothesis, collect data, analyze data and then communicate the results and take action…. If you just use the scientific method as a way to approach data-intensive projects, I think you’re more apt to be successful with your outcome.”",Bob Hayes
3,"“As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen. I can predict when someone is likely to attrite, or respond to a promotion, or commit fraud, or pick the pink button over the blue button, but I cannot tell you why that’s going to happen. And I believe that the inability to explain why something is going to happen is why I struggle to call ‘data science’ a science.”",Bill Schmarzo
4,"“Data scientists are kind of like the new Renaissance folks, because data science is inherently multidisciplinary.”",John Foreman
5,“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”,Josh Wills
6,"“As data scientists, our job is to extract signal from noise.”",Daniel Tunkelang
7,"“The job of the data scientist is to ask the right questions. If I ask a question like ‘how many clicks did this link get?’ which is something we look at all the time, that’s not a data science question. It’s an analytics question. If I ask a question like, ‘based on the previous history of links on this publisher’s site, can I predict how many people from France will read this in the next three hours?’ that’s more of a data science question.”",Hilary Mason
8,"“A data scientist does model-driven analyses of our data; analyzes to improve our planning, increase our productivity, and develop our deeper levels of subject matter expertise. A data scientist works at the tactical, operational, and strategic levels, sharing insights with the business.”",Chris Pehura
9,"“[Data scientists are] able to think of ways to use data to solve problems that otherwise would have been unsolved, or solved using only intuition.”",Peter Skomoroch


# Extracting content from Website's Page 2

## Accessing the Website's Page 2

In [275]:
page2 = requests.get('https://www.kdnuggets.com/2017/05/42-essential-quotes-data-science-thought-leaders.html/2')
page2

<Response [200]>

## Parsing the Website's Page 2

In [276]:
soup2 = bs(page2.content, "html.parser")
soup2

<!DOCTYPE html>
<html lang="en-US" xmlns="https://www.w3.org/1999/xhtml">
<head profile="https://gmpg.org/xfn/11">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="max-image-preview:large" name="robots"/>
<title>  42 Essential Quotes by Data Science Thought Leaders - KDnuggets</title>
<link href="/wp-content/themes/kdn17/images/favicon.ico" rel="shortcut icon"/>
<link href="/wp-content/themes/kdn17/style.css" media="screen" rel="stylesheet" type="text/css"/>
<script src="/wp-content/themes/kdn17/js/jquery-1.9.1.min.js" type="text/javascript"></script>
<script src="/aps/kda_all.js" type="text/javascript"></script>
<link href="/feed/" rel="alternate" title="KDnuggets: AI, Analytics, Data Science, Machine Learning Feed" type="application/rss+xml"/>
<meta content="42 illuminating quotes you need to read if you’re a data scientist or considering a career in the field - insights from indu

## Navigating the soup (soup2)

In [277]:
# extracting quotes from website's page 2 using get_quotes function

quotes2 = get_quotes(soup2)

In [278]:
# Checking length of quotes2

len(quotes2)

21

### Extracting Authors from Page 2

In [286]:
# Selecting the <em> tags nested inside links, which hold the author names

soup2.select('a em')

[<em>Jean-Paul Isson</em>,
 <em>Martyn Jones</em>,
 <em>Bill Franks</em>,
 <em>Vincent Granville</em>,
 <em>Monica Rogati</em>,
 <em>Drew Conway</em>,
 <em>Jake Porway</em>,
 <em>Cathy O’Neil</em>,
 <em>Douglas Merrill</em>,
 <em>Mike Driscoll</em>,
 <em>KarolisUrbonas</em>,
 <em>Edwin Chen</em>,
 <em>Thomas C. Redman, Ph.D.</em>,
 <em>Nate Silver</em>,
 <em>Victor Hu</em>,
 <em>Alexander Linden</em>,
 <em>YanirSeroussi</em>,
 <em>Amy Heineike</em>,
 <em>Foster Provost</em>,
 <em>Tom Fawcett</em>,
 <em>ShanjiXiong</em>]

In [287]:
# Creating list of authors after extracting author names

authors2 = [i.text for i in soup2.select('a em')]
authors2

['Jean-Paul Isson',
 'Martyn Jones',
 'Bill Franks',
 'Vincent Granville',
 'Monica Rogati',
 'Drew Conway',
 'Jake Porway',
 'Cathy O’Neil',
 'Douglas Merrill',
 'Mike Driscoll',
 'KarolisUrbonas',
 'Edwin Chen',
 'Thomas C. Redman, Ph.D.',
 'Nate Silver',
 'Victor Hu',
 'Alexander Linden',
 'YanirSeroussi',
 'Amy Heineike',
 'Foster Provost',
 'Tom Fawcett',
 'ShanjiXiong']

In [288]:
# Creating a separate list for the two authors at positions 18 and 19

li = authors2[18:20]
li

['Foster Provost', 'Tom Fawcett']

In [289]:
# Removing the two authors from the original author list

authors2.remove('Foster Provost')
authors2.remove('Tom Fawcett')
authors2

['Jean-Paul Isson',
 'Martyn Jones',
 'Bill Franks',
 'Vincent Granville',
 'Monica Rogati',
 'Drew Conway',
 'Jake Porway',
 'Cathy O’Neil',
 'Douglas Merrill',
 'Mike Driscoll',
 'KarolisUrbonas',
 'Edwin Chen',
 'Thomas C. Redman, Ph.D.',
 'Nate Silver',
 'Victor Hu',
 'Alexander Linden',
 'YanirSeroussi',
 'Amy Heineike',
 'ShanjiXiong']

In [290]:
# Inserting the two-author list (li) back at index 18 as a single nested entry

authors2.insert(18, li)
authors2

['Jean-Paul Isson',
 'Martyn Jones',
 'Bill Franks',
 'Vincent Granville',
 'Monica Rogati',
 'Drew Conway',
 'Jake Porway',
 'Cathy O’Neil',
 'Douglas Merrill',
 'Mike Driscoll',
 'KarolisUrbonas',
 'Edwin Chen',
 'Thomas C. Redman, Ph.D.',
 'Nate Silver',
 'Victor Hu',
 'Alexander Linden',
 'YanirSeroussi',
 'Amy Heineike',
 ['Foster Provost', 'Tom Fawcett'],
 'ShanjiXiong']

In [291]:
# Checking length of authors2

len(authors2)

20
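
The slice, remove, and insert steps above collapse two adjacent names into one entry. The same result can be reached in one step with slice assignment. The sketch below is an alternative, not the notebook's method: `group_coauthors` is a hypothetical helper, and it joins the names into a single string rather than a nested list so the `Author` column would hold only strings.

```python
def group_coauthors(authors, start, count, sep=" & "):
    """Return a copy of `authors` with `count` names from `start` merged into one entry."""
    grouped = list(authors)  # copy, so the caller's list is left untouched
    grouped[start:start + count] = [sep.join(authors[start:start + count])]
    return grouped


names = ["Amy Heineike", "Foster Provost", "Tom Fawcett", "ShanjiXiong"]
grouped_names = group_coauthors(names, 1, 2)
```

With the full page 2 list, the equivalent call would be `group_coauthors(authors2, 18, 2)`, assuming it runs on the list as first extracted.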

## DataFrame of Quotes & Authors from Page 2

In [259]:
# Creating DataFrame using dictionary

data2 = {'Quote': quotes2, 'Author': authors2}
df2 = pd.DataFrame.from_dict(data2)
df2

Unnamed: 0,Quote,Author
0,"“Being a data scientist is not only about data crunching. It’s about understanding the business challenge, creating some valuable actionable insights to the data, and communicating their findings to the business.”",Jean-Paul Isson
1,"“Without a grounding in statistics, a Data Scientist is a Data Lab Assistant.”",Martyn Jones
2,"“Having skills in statistics, math, and programming is certainly necessary to be a great analytic professional, but they are not sufficient to make a person a great analytic professional.”",Bill Franks
3,“Talented data scientists leverage data that everybody sees; visionary data scientists leverage data that nobody sees.”,Vincent Granville
4,"“What makes a good scientist great is creativity with data, skepticism and good communication skills. Getting all of that together in the same person is difficult―because traditionally, different people follow different paths in their careers―some are more technical, others are more creative and communicative. A data scientist has to have both.”",Monica Rogati
5,"“Good data science is exactly the same [as] good science…. Good data science will never be measured by the terabytes in your Cassandra database, the number of EC2 nodes your jobs is using, or the volume of mappers you can send through a Hadoop instance. Having a lot of data does not license you to have a lot to say about it.”",Drew Conway
6,"“Critical thinking skills…really [set] apart the hackers from the true scientists, for me…. You must must MUST be able to question every step of your process and every number that you come up with.”",Jake Porway
7,"“How do we start to regulate the mathematical models that run more and more of our lives? I would suggest that the process begin with the modelers themselves. Like doctors, data scientists should pledge a Hippocratic Oath, one that focuses on the possible misuses and misinterpretations of their models.”",Cathy O’Neil
8,"“With too little data, you won’t be able to make any conclusions that you trust. With loads of data you will find relationships that aren’t real… Big data isn’t about bits, it’s about talent.”",Douglas Merrill
9,"“Data analysts who don’t organize their transformation pipelines often end up not being able to repeat their analyses, so the advice I would give to myself is the same advice often given to traditional scientists: make your experiments repeatable!”",Mike Driscoll


# Merging DataFrames

In [260]:
# Concatenating the two dataframes

merged = pd.concat([df1, df2], axis=0)
merged

Unnamed: 0,Quote,Author
0,"“By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo―starry-eyed explorers and skeptical detectives.”",Monica Rogati
1,"“‘Possessed’ is probably the right word. I often tell people, ‘I don’t want to necessarily be a data scientist. You just kind of are a data scientist. You just can’t help but look at that data set and go, ‘I feel like I need to look deeper. I feel like that’s not the right fit.’”",Jennifer Shin
2,"“I think of data science as more like a practice than a job. Think of the scientific method, where you have to have a problem statement, generate a hypothesis, collect data, analyze data and then communicate the results and take action…. If you just use the scientific method as a way to approach data-intensive projects, I think you’re more apt to be successful with your outcome.”",Bob Hayes
3,"“As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen. I can predict when someone is likely to attrite, or respond to a promotion, or commit fraud, or pick the pink button over the blue button, but I cannot tell you why that’s going to happen. And I believe that the inability to explain why something is going to happen is why I struggle to call ‘data science’ a science.”",Bill Schmarzo
4,"“Data scientists are kind of like the new Renaissance folks, because data science is inherently multidisciplinary.”",John Foreman
5,“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”,Josh Wills
6,"“As data scientists, our job is to extract signal from noise.”",Daniel Tunkelang
7,"“The job of the data scientist is to ask the right questions. If I ask a question like ‘how many clicks did this link get?’ which is something we look at all the time, that’s not a data science question. It’s an analytics question. If I ask a question like, ‘based on the previous history of links on this publisher’s site, can I predict how many people from France will read this in the next three hours?’ that’s more of a data science question.”",Hilary Mason
8,"“A data scientist does model-driven analyses of our data; analyzes to improve our planning, increase our productivity, and develop our deeper levels of subject matter expertise. A data scientist works at the tactical, operational, and strategic levels, sharing insights with the business.”",Chris Pehura
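9,"“[Data scientists are] able to think of ways to use data to solve problems that otherwise would have been unsolved, or solved using only intuition.”",Peter Skomoroch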


In [261]:
# Resetting the index of merged dataframe

merged.reset_index(drop=True, inplace=True)
merged

Unnamed: 0,Quote,Author
0,"“By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo―starry-eyed explorers and skeptical detectives.”",Monica Rogati
1,"“‘Possessed’ is probably the right word. I often tell people, ‘I don’t want to necessarily be a data scientist. You just kind of are a data scientist. You just can’t help but look at that data set and go, ‘I feel like I need to look deeper. I feel like that’s not the right fit.’”",Jennifer Shin
2,"“I think of data science as more like a practice than a job. Think of the scientific method, where you have to have a problem statement, generate a hypothesis, collect data, analyze data and then communicate the results and take action…. If you just use the scientific method as a way to approach data-intensive projects, I think you’re more apt to be successful with your outcome.”",Bob Hayes
3,"“As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen. I can predict when someone is likely to attrite, or respond to a promotion, or commit fraud, or pick the pink button over the blue button, but I cannot tell you why that’s going to happen. And I believe that the inability to explain why something is going to happen is why I struggle to call ‘data science’ a science.”",Bill Schmarzo
4,"“Data scientists are kind of like the new Renaissance folks, because data science is inherently multidisciplinary.”",John Foreman
5,“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”,Josh Wills
6,"“As data scientists, our job is to extract signal from noise.”",Daniel Tunkelang
7,"“The job of the data scientist is to ask the right questions. If I ask a question like ‘how many clicks did this link get?’ which is something we look at all the time, that’s not a data science question. It’s an analytics question. If I ask a question like, ‘based on the previous history of links on this publisher’s site, can I predict how many people from France will read this in the next three hours?’ that’s more of a data science question.”",Hilary Mason
8,"“A data scientist does model-driven analyses of our data; analyzes to improve our planning, increase our productivity, and develop our deeper levels of subject matter expertise. A data scientist works at the tactical, operational, and strategic levels, sharing insights with the business.”",Chris Pehura
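9,"“[Data scientists are] able to think of ways to use data to solve problems that otherwise would have been unsolved, or solved using only intuition.”",Peter Skomoroch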
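
Concatenating and then resetting the index can also be done in one call: `pd.concat` accepts `ignore_index=True`, which renumbers the rows from 0. A minimal sketch on two made-up frames (the data is illustrative only):

```python
import pandas as pd

first = pd.DataFrame({"Quote": ["q1", "q2"], "Author": ["Author A", "Author B"]})
second = pd.DataFrame({"Quote": ["q3"], "Author": ["Author C"]})

# ignore_index=True discards the original row labels (0, 1 and 0)
# and assigns a fresh 0..n-1 index to the combined frame
combined = pd.concat([first, second], ignore_index=True)
```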


In [262]:
# setting the maximum column width for the dataframe

pd.set_option('display.max_colwidth', 750)
merged

Unnamed: 0,Quote,Author
0,"“By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo―starry-eyed explorers and skeptical detectives.”",Monica Rogati
1,"“‘Possessed’ is probably the right word. I often tell people, ‘I don’t want to necessarily be a data scientist. You just kind of are a data scientist. You just can’t help but look at that data set and go, ‘I feel like I need to look deeper. I feel like that’s not the right fit.’”",Jennifer Shin
2,"“I think of data science as more like a practice than a job. Think of the scientific method, where you have to have a problem statement, generate a hypothesis, collect data, analyze data and then communicate the results and take action…. If you just use the scientific method as a way to approach data-intensive projects, I think you’re more apt to be successful with your outcome.”",Bob Hayes
3,"“As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen. I can predict when someone is likely to attrite, or respond to a promotion, or commit fraud, or pick the pink button over the blue button, but I cannot tell you why that’s going to happen. And I believe that the inability to explain why something is going to happen is why I struggle to call ‘data science’ a science.”",Bill Schmarzo
4,"“Data scientists are kind of like the new Renaissance folks, because data science is inherently multidisciplinary.”",John Foreman
5,“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”,Josh Wills
6,"“As data scientists, our job is to extract signal from noise.”",Daniel Tunkelang
7,"“The job of the data scientist is to ask the right questions. If I ask a question like ‘how many clicks did this link get?’ which is something we look at all the time, that’s not a data science question. It’s an analytics question. If I ask a question like, ‘based on the previous history of links on this publisher’s site, can I predict how many people from France will read this in the next three hours?’ that’s more of a data science question.”",Hilary Mason
8,"“A data scientist does model-driven analyses of our data; analyzes to improve our planning, increase our productivity, and develop our deeper levels of subject matter expertise. A data scientist works at the tactical, operational, and strategic levels, sharing insights with the business.”",Chris Pehura
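9,"“[Data scientists are] able to think of ways to use data to solve problems that otherwise would have been unsolved, or solved using only intuition.”",Peter Skomoroch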
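
`pd.set_option` changes the column width for the rest of the session. When a wider display is only needed for one cell, `pd.option_context` applies the setting temporarily and restores the previous value afterwards. A small sketch with a made-up frame:

```python
import pandas as pd

frame = pd.DataFrame({"Quote": ["x" * 120], "Author": ["Author A"]})

before = pd.get_option("display.max_colwidth")
with pd.option_context("display.max_colwidth", 750):
    # Inside the block, long cells are rendered up to 750 characters
    inside = pd.get_option("display.max_colwidth")
    rendered = repr(frame)
# On exit, the earlier setting is back in effect
after = pd.get_option("display.max_colwidth")
```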
