# Daily Motivation Quotes


## Business Understanding

In our increasingly fast-paced world, people encounter numerous challenges and responsibilities on a daily basis. To address the need for consistent motivation, we propose a data science project that revolves around curating and delivering carefully selected quotes. These quotes, extracted from diverse sources including historical figures, popular literature, and prominent personalities, will serve as a source of encouragement, reflection, and empowerment for individuals.

#### Objectives:

The primary objectives of this project are as follows:
1. Curate Inspirational Quotes: Gather a diverse collection of quotes from the Goodreads website, which hosts an extensive compilation of quotes spanning various genres and themes.
2. Daily Motivational Updates: Develop a system that gives users a daily update featuring a thoughtfully chosen quote. These updates will cover different areas of life so that the experience stays broad and relatable.
3. Tag-based Grouping: Implement a categorization mechanism that tags each quote by its thematic content, so users can easily find quotes that match their preferences or current situation.
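
As a rough illustration of the daily-update objective, the sketch below picks one quote per calendar day by cycling through a list. The function name `quote_of_the_day` and the sample list are hypothetical and not part of the notebook; a real system might instead weight by tag or user preference.

```python
from datetime import date


def quote_of_the_day(quotes, day):
    """Pick one quote per calendar day by cycling through the list in order."""
    if not quotes:
        raise ValueError("quotes must not be empty")
    # The ordinal grows by 1 each day, so consecutive days step through the list.
    return quotes[day.toordinal() % len(quotes)]


# Hypothetical sample data; in the project this would come from the scraped dataset.
sample_quotes = [
    "“Be yourself; everyone else is already taken.”",
    "“So many books, so little time.”",
    "“Dreams are the touchstones of our characters.”",
]

today_pick = quote_of_the_day(sample_quotes, date(2024, 1, 1))
next_pick = quote_of_the_day(sample_quotes, date(2024, 1, 2))
```

Because the choice depends only on the date, every user sees the same quote on a given day and repeated calls within a day return the same result.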


## Data Understanding

•	Source quotes from the Goodreads website, exploring the wide array of authors and themes available.

•	Analyze the structure of the collected data, including metadata such as author names, publication dates, and associated tags.


In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
import scrapy 
#from pathlib import path


## Data Preparation
We begin by scraping the data from the website, then preprocess it so that the resulting dataset is accurate.

In [23]:
# Make a get request to retrieve the page
Web_page = requests.get('https://www.goodreads.com/quotes?page=1')
# Pass the page contents to beautiful soup for parsing
soup = BeautifulSoup(Web_page.content, 'html.parser')
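
The request above assumes the page loads successfully. A slightly more defensive fetch might look like the sketch below. `fetch_html` is a hypothetical helper, and the `get` parameter exists only so the function can be exercised without network access; by default it falls back to `requests.get`.

```python
def fetch_html(url, get=None, timeout=10):
    """Return the raw page bytes, raising if the server reports an HTTP error."""
    if get is None:
        # Imported lazily so the helper can be tested offline with a stand-in.
        import requests
        get = requests.get
    response = get(url, timeout=timeout)
    # raise_for_status() raises for 4xx/5xx responses instead of silently
    # handing an error page to the parser.
    response.raise_for_status()
    return response.content
```

The returned bytes can be passed to `BeautifulSoup(..., 'html.parser')` exactly as in the cell above.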

In [24]:
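# Preview the structure
# Note: without parentheses this displays the bound method's repr (which embeds the raw document);
# call soup.prettify() to get the indented HTML string instead.
soup.prettify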

<bound method Tag.prettify of <!DOCTYPE html>

<html class="desktop withSiteHeaderTopFullImage">
<head>
<title>Popular Quotes</title>
<meta content="Popular quotes from Goodreads members. Oscar Wilde: ‘Be yourself; everyone else is already taken.’, Marilyn Monroe: ‘I'm selfish, impatient and a little ..." name="description"/>
<meta content="telephone=no" name="format-detection"/>
<link href="https://www.goodreads.com/quotes" rel="canonical"/>
<!-- * Copied from https://info.analytics.a2z.com/#/docs/data_collection/csa/onboard */ -->
<script>
  //<![CDATA[
    !function(){function n(n,t){var r=i(n);return t&&(r=r("instance",t)),r}var r=[],c=0,i=function(t){return function(){var n=c++;return r.push([t,[].slice.call(arguments,0),n,{time:Date.now()}]),i(n)}};n._s=r,this.csa=n}();
    
    if (window.csa) {
      window.csa("Config", {
        "Application": "GoodreadsMonolith",
        "Events.SushiEndpoint": "https://unagi.amazon.com/1/events/com.amazon.csm.csa.prod",
        "Events.Name

In [25]:
# Selecting the Div container with the quote
quote_container = soup.find('div', class_="leftContainer")
quote_container

<div class="leftContainer">
<div class="quoteSearchBox u-marginBottomMedium">
<form accept-charset="UTF-8" action="/quotes/search" class="gr-form gr-form--compact gr-form--fullWidth" method="get" name="quoteSearchForm"><input name="utf8" type="hidden" value="✓"/>
<input class="searchBox--large__input" id="explore_search_query" name="q" placeholder="Find quotes by keyword, author" type="text"/>
<input class="searchBox__button searchBox--large__button" name="commit" type="submit" value="Search"/>
</form>
</div>
<div class="tabs mediumTabs">
<span class="selectedTab" id="popularLink" url="/quotes">Popular</span>
<a class="tab" href="/quotes/recently_added" id="recentLink">Recent</a>
<a class="tab" href="/quotes/recently_created" id="newLink">New</a>
<a class="tab" href="/quotes/friend_quotes" id="friendsLink" rel="nofollow">Friends</a>
<a class="tab" href="/quotes/my_authors" id="my_authorsLink" rel="nofollow">Authors</a>
<div class="clear"> </div></div><script charset="utf-8" type="text/

In [26]:
quote_author = quote_container.find_all('span', class_='authorOrTitle')
quote_author

[<span class="authorOrTitle">
     Oscar Wilde
   </span>,
 <span class="authorOrTitle">
     Marilyn Monroe
   </span>,
 <span class="authorOrTitle">
     Albert Einstein
   </span>,
 <span class="authorOrTitle">
     Frank Zappa
   </span>,
 <span class="authorOrTitle">
     Marcus Tullius Cicero
   </span>,
 <span class="authorOrTitle">
     Bernard M. Baruch
   </span>,
 <span class="authorOrTitle">
     William W. Purkey
   </span>,
 <span class="authorOrTitle">
     Dr. Seuss
   </span>,
 <span class="authorOrTitle">
     Mae West
   </span>,
 <span class="authorOrTitle">
     Mahatma Gandhi
   </span>,
 <span class="authorOrTitle">
     Robert Frost
   </span>,
 <span class="authorOrTitle">
     J.K. Rowling,
   </span>,
 <span class="authorOrTitle">
     Albert Camus
   </span>,
 <span class="authorOrTitle">
     Mark Twain
   </span>,
 <span class="authorOrTitle">
     C.S. Lewis,
   </span>,
 <span class="authorOrTitle">
     Maya Angelou
   </span>,
 <span class="authorOrTit

In [27]:

# Create an empty list to store the author names
authors = []

# Loop through each element in quote_author
for author_html in quote_author:
    # Get the text from the HTML element
    author_name = author_html.text.strip()
    
    # Filter out book names based on length and presence of colons or semicolons
    if len(author_name) < 40 and ':' not in author_name and ';' not in author_name:
        # Add the author name to the authors list
        authors.append(author_name)
        
# Create a dataframe from the authors list
df = pd.DataFrame(authors, columns=['Author Name'])

# Print the dataframe
df

|   | Author Name |
|---|---|
| 0 | Oscar Wilde |
| 1 | Marilyn Monroe |
| 2 | Albert Einstein |
| 3 | Frank Zappa |
| 4 | Marcus Tullius Cicero |
| 5 | Bernard M. Baruch |
| 6 | William W. Purkey |
| 7 | Dr. Seuss |
| 8 | Mae West |
| 9 | Mahatma Gandhi |


In [28]:
quote = quote_container.find_all(class_='quoteText')
quote[0]

<div class="quoteText">
      “Be yourself; everyone else is already taken.”
    <br/>
  ―
  <span class="authorOrTitle">
    Oscar Wilde
  </span>
</div>

In [29]:

# Create an empty list to store the quote texts
quotes = []

# Loop through each element in quote
for quote_html in quote:
    # Get the text from the HTML element
    quote_text = quote_html.text.strip().replace('\n', '')
    
    # Add the quote to the quotes list
    quotes.append(quote_text)

# Create a dataframe from the quotes list
df2 = pd.DataFrame(quotes, columns=['Quote'])

# Print the dataframe
print(df2)

                                                Quote
0   “Be yourself; everyone else is already taken.”...
1   “I'm selfish, impatient and a little insecure....
2   “Two things are infinite: the universe and hum...
3   “So many books, so little time.”      ―      F...
4   “A room without books is like a body without a...
5   “Be who you are and say what you feel, because...
6   “You've gotta dance like there's nobody watchi...
7   “You know you're in love when you can't fall a...
8   “You only live once, but if you do it right, o...
9   “Be the change that you wish to see in the wor...
10  “In three words I can sum up everything I've l...
11  “If you want to know what a man's like, take a...
12  “Don’t walk in front of me… I may not followDo...
13  “If you tell the truth, you don't have to reme...
14  “Friendship ... is born at the moment when one...
15  “I've learned that people will forget what you...
16  “A friend is someone who knows all about you a...
17  “To live is the rarest t

In [30]:
df2['Quote'][1]

"“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”      ―      Marilyn Monroe"

In [31]:
# Split each entry at the last horizontal bar (―) into 'quote' and 'author' columns;
# n=1 guarantees exactly two columns even if a quote itself contains a bar
df2[['quote', 'author']] = df2['Quote'].str.rsplit('―', n=1, expand=True)

# Strip leading and trailing whitespaces from 'quote' and 'author' columns
df2['quote'] = df2['quote'].str.strip()
df2['author'] = df2['author'].str.strip()

# Drop the original 'Quote' column since we have extracted its contents
df2.drop('Quote', axis=1, inplace=True)
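
Some entries on later pages carry a book title after the author (for example "Cassandra Clare, Clockwork Angel"). The sketch below shows one way to separate all three parts in plain Python. `split_quote` is a hypothetical helper, and it assumes the first comma after the bar separates author from title, which would misfire for an author name that itself contains a comma.

```python
def split_quote(raw):
    """Split 'quote ― author, title' into (quote, author, title)."""
    # rpartition splits at the LAST bar, so a bar inside the quote is preserved.
    quote, bar, attribution = raw.rpartition("―")
    if not bar:
        # No attribution found: treat the whole string as the quote.
        return raw.strip(), None, None
    # Assumption: the first comma separates the author from an optional title.
    author, _, title = attribution.partition(",")
    return quote.strip(), author.strip(), (title.strip() or None)


plain = split_quote("“So many books, so little time.”      ―      Frank Zappa")
with_title = split_quote(
    "“Black hair and blue eyes are my favorite comb...      ―      Cassandra Clare,      Clockwork Angel"
)
```

Applied to a column, this could be used as `df2['Quote'].map(split_quote)` and the resulting tuples expanded into three columns.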

In [32]:
df2

|   | quote | author |
|---|---|---|
| 0 | “Be yourself; everyone else is already taken.” | Oscar Wilde |
| 1 | “I'm selfish, impatient and a little insecure.... | Marilyn Monroe |
| 2 | “Two things are infinite: the universe and hum... | Albert Einstein |
| 3 | “So many books, so little time.” | Frank Zappa |
| 4 | “A room without books is like a body without a... | Marcus Tullius Cicero |
| 5 | “Be who you are and say what you feel, because... | Bernard M. Baruch |
| 6 | “You've gotta dance like there's nobody watchi... | William W. Purkey |
| 7 | “You know you're in love when you can't fall a... | Dr. Seuss |
| 8 | “You only live once, but if you do it right, o... | Mae West |
| 9 | “Be the change that you wish to see in the wor... | Mahatma Gandhi |


In [33]:
df2['quote'][19]

'“Darkness cannot drive out darkness: only light can do that. Hate cannot drive out hate: only love can do that.”'

In [34]:
quote_tags = quote_container.find_all('div', class_='greyText smallText left')
quote_tags

[<div class="greyText smallText left">
      tags:
        <a href="/quotes/tag/attributed-no-source">attributed-no-source</a>,
        <a href="/quotes/tag/be-yourself">be-yourself</a>,
        <a href="/quotes/tag/gilbert-perreira">gilbert-perreira</a>,
        <a href="/quotes/tag/honesty">honesty</a>,
        <a href="/quotes/tag/inspirational">inspirational</a>,
        <a href="/quotes/tag/misattributed-oscar-wilde">misattributed-oscar-wilde</a>,
        <a href="/quotes/tag/quote-investigator">quote-investigator</a>
 </div>,
 <div class="greyText smallText left">
      tags:
        <a href="/quotes/tag/attributed-no-source">attributed-no-source</a>,
        <a href="/quotes/tag/best">best</a>,
        <a href="/quotes/tag/life">life</a>,
        <a href="/quotes/tag/love">love</a>,
        <a href="/quotes/tag/misattributed-marilyn-monroe">misattributed-marilyn-monroe</a>,
        <a href="/quotes/tag/mistakes">mistakes</a>,
        <a href="/quotes/tag/out-of-control">out-of-c

In [35]:
# Create an empty list to store tags for each quote
tags_list = []

# Loop through each element in quote_tags
for quote_tag in quote_tags:
    # Get tag texts and store them in a list
    tags = [tag.text for tag in quote_tag.find_all('a')]
    tags_list.append(tags)

# Create a DataFrame from the tags_list
df_tags = pd.DataFrame({'Tags': tags_list})

print(df_tags)

                                                 Tags
0   [attributed-no-source, be-yourself, gilbert-pe...
1   [attributed-no-source, best, life, love, misat...
2   [attributed-no-source, human-nature, humor, in...
3                                      [books, humor]
4         [attributed-no-source, books, simile, soul]
5   [ataraxy, be-yourself, confidence, fitting-in,...
6   [dance, heaven, hurt, inspirational, life, lov...
7   [attributed-no-source, dreams, love, reality, ...
8                                       [humor, life]
9   [action, change, inspirational, misattributed-...
10                                             [life]
11                     [from-charles-bayard-mitchell]
12  [attributed-no-source, friends, friendship, mi...
13                       [lies, lying, memory, truth]
14                                       [friendship]
15              [friend, friendship, knowledge, love]
16                                             [life]
17  [attributed-no-source, e

In [36]:
df_tags['Tags'][19]

['carpe-diem', 'education', 'inspirational', 'learning']
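
Tag lists like the one above can support the tag-based grouping objective through an inverted index from each tag to the positions of the quotes that carry it. The sketch below is one minimal way to build it; `build_tag_index` is a hypothetical name and the sample lists are copied from three rows of the tags output.

```python
from collections import defaultdict


def build_tag_index(tags_per_quote):
    """Map each tag to the list of row positions whose quote carries that tag."""
    index = defaultdict(list)
    for position, tags in enumerate(tags_per_quote):
        for tag in tags:
            index[tag].append(position)
    return dict(index)


# Three tag lists taken from the output above (rows 3, 8 and 10).
sample_tags = [["books", "humor"], ["humor", "life"], ["life"]]
tag_index = build_tag_index(sample_tags)
```

Looking up `tag_index["humor"]` then gives the positions of every quote tagged as humor, which can be used to select rows from the quotes dataframe.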

In [37]:
# Combine the author dataframe and the quotes dataframe column-wise (rows are matched by position)
data = pd.concat([df, df2], axis=1)

In [38]:
data

|   | Author Name | quote | author |
|---|---|---|---|
| 0 | Oscar Wilde | “Be yourself; everyone else is already taken.” | Oscar Wilde |
| 1 | Marilyn Monroe | “I'm selfish, impatient and a little insecure.... | Marilyn Monroe |
| 2 | Albert Einstein | “Two things are infinite: the universe and hum... | Albert Einstein |
| 3 | Frank Zappa | “So many books, so little time.” | Frank Zappa |
| 4 | Marcus Tullius Cicero | “A room without books is like a body without a... | Marcus Tullius Cicero |
| 5 | Bernard M. Baruch | “Be who you are and say what you feel, because... | Bernard M. Baruch |
| 6 | William W. Purkey | “You've gotta dance like there's nobody watchi... | William W. Purkey |
| 7 | Dr. Seuss | “You know you're in love when you can't fall a... | Dr. Seuss |
| 8 | Mae West | “You only live once, but if you do it right, o... | Mae West |
| 9 | Mahatma Gandhi | “Be the change that you wish to see in the wor... | Mahatma Gandhi |


In [39]:
# Combine the data with the tags dataframe column-wise
data = pd.concat([data, df_tags], axis=1)

In [40]:
data

|   | Author Name | quote | author | Tags |
|---|---|---|---|---|
| 0 | Oscar Wilde | “Be yourself; everyone else is already taken.” | Oscar Wilde | [attributed-no-source, be-yourself, gilbert-pe... |
| 1 | Marilyn Monroe | “I'm selfish, impatient and a little insecure.... | Marilyn Monroe | [attributed-no-source, best, life, love, misat... |
| 2 | Albert Einstein | “Two things are infinite: the universe and hum... | Albert Einstein | [attributed-no-source, human-nature, humor, in... |
| 3 | Frank Zappa | “So many books, so little time.” | Frank Zappa | [books, humor] |
| 4 | Marcus Tullius Cicero | “A room without books is like a body without a... | Marcus Tullius Cicero | [attributed-no-source, books, simile, soul] |
| 5 | Bernard M. Baruch | “Be who you are and say what you feel, because... | Bernard M. Baruch | [ataraxy, be-yourself, confidence, fitting-in,... |
| 6 | William W. Purkey | “You've gotta dance like there's nobody watchi... | William W. Purkey | [dance, heaven, hurt, inspirational, life, lov... |
| 7 | Dr. Seuss | “You know you're in love when you can't fall a... | Dr. Seuss | [attributed-no-source, dreams, love, reality, ... |
| 8 | Mae West | “You only live once, but if you do it right, o... | Mae West | [humor, life] |
| 9 | Mahatma Gandhi | “Be the change that you wish to see in the wor... | Mahatma Gandhi | [action, change, inspirational, misattributed-... |


Now that the approach works on the first page, we loop through pages 1 to 100 of the site.

In [41]:
# Base URL
base_url = 'https://www.goodreads.com/quotes?page='

# Create empty lists to store data
authors = []
quotes = []
tags_list = []

# Loop through pages
for page_num in range(1, 101):  # Loop through pages 1 to 100
    url = base_url + str(page_num)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Selecting the Div container with the quote
    quote_container = soup.find('div', class_='leftContainer')
    if quote_container is None:
        # Skip pages that did not return the expected layout (e.g. an error page)
        continue
    
    # Finding author names (entries that look like book titles are filtered out)
    quote_author = quote_container.find_all('span', class_='authorOrTitle')
    for author_html in quote_author:
        author_name = author_html.text.strip()
        if len(author_name) < 40 and ':' not in author_name and ';' not in author_name:
            authors.append(author_name)
    
    # Finding quotes
    quote = quote_container.find_all(class_='quoteText')
    for quote_html in quote:
        quote_text = quote_html.text.strip().replace('\n', '')
        quotes.append(quote_text)
    
    # Finding tags
    quote_tags = quote_container.find_all('div', class_='greyText smallText left')
    for quote_tag in quote_tags:
        tags = [tag.text for tag in quote_tag.find_all('a')]
        tags_list.append(tags)

# Create DataFrames for authors, quotes, and tags
df_authors = pd.DataFrame({'Author Name': authors})
df_quotes = pd.DataFrame({'Quote': quotes})
df_tags = pd.DataFrame({'Tags': tags_list})

# Combine DataFrames
combined_df = pd.concat([df_authors, df_quotes, df_tags], axis=1)
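
Because the three lists above are filled independently and then joined by position, a quote with no tag block or a filtered-out author entry shifts every later row out of alignment. One way to avoid this is to extract author, quote and tags from each quote block together. The sketch below does so on a small inline sample; it assumes each quote is wrapped in a `div` with class `quoteDetails`, which does not appear in the output shown in this notebook and should be checked against the live page.

```python
from bs4 import BeautifulSoup

# Inline sample mirroring the markup shown earlier; the "quoteDetails" wrapper
# is an assumption about the page structure, not confirmed by the notebook.
SAMPLE_HTML = """
<div class="leftContainer">
  <div class="quoteDetails">
    <div class="quoteText">
      “So many books, so little time.”
      <br/>
      ―
      <span class="authorOrTitle">Frank Zappa</span>
    </div>
    <div class="greyText smallText left">
      tags: <a href="/quotes/tag/books">books</a>, <a href="/quotes/tag/humor">humor</a>
    </div>
  </div>
  <div class="quoteDetails">
    <div class="quoteText">
      “Dreams are the touchstones of our characters.”
      <br/>
      ―
      <span class="authorOrTitle">Henry David Thoreau</span>
    </div>
  </div>
</div>
"""


def parse_quotes(html):
    """Return one record per quote block so author, quote and tags stay aligned."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.find_all("div", class_="quoteDetails"):
        text_div = block.find("div", class_="quoteText")
        if text_div is None:
            continue
        author_span = text_div.find("span", class_="authorOrTitle")
        author = author_span.get_text(strip=True).rstrip(",") if author_span else None
        full_text = text_div.get_text(" ", strip=True)
        # Keep only the part before the last horizontal bar.
        quote = full_text.rpartition("―")[0].strip() or full_text
        tags_div = block.find("div", class_="greyText smallText left")
        # A quote without a tag block gets an empty list instead of shifting rows.
        tags = [a.get_text(strip=True) for a in tags_div.find_all("a")] if tags_div else []
        records.append({"author": author, "quote": quote, "tags": tags})
    return records


records = parse_quotes(SAMPLE_HTML)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(records)`. When looping over many pages, pausing briefly between requests (for example with `time.sleep`) is a common courtesy to the server.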


In [48]:
combined_df

|   | Author Name | Tags | Quote | Author |
|---|---|---|---|---|
| 0 | Oscar Wilde | [attributed-no-source, be-yourself, gilbert-pe... | “Be yourself; everyone else is already taken.” | Oscar Wilde |
| 1 | Marilyn Monroe | [attributed-no-source, best, life, love, misat... | “I'm selfish, impatient and a little insecure.... | Marilyn Monroe |
| 2 | Albert Einstein | [attributed-no-source, human-nature, humor, in... | “Two things are infinite: the universe and hum... | Albert Einstein |
| 3 | Frank Zappa | [books, humor] | “So many books, so little time.” | Frank Zappa |
| 4 | Marcus Tullius Cicero | [attributed-no-source, books, simile, soul] | “A room without books is like a body without a... | Marcus Tullius Cicero |
| ... | ... | ... | ... | ... |
| 2995 | Cassandra Clare, |  | “Black hair and blue eyes are my favorite comb... | Cassandra Clare, Clockwork Angel |
| 2996 | A.A. Milne, |  | “I'm not lost for I know where I am. But howev... | A.A. Milne, Winnie-the-Pooh |
| 2997 | Henry David Thoreau |  | “Dreams are the touchstones of our characters.” | Henry David Thoreau |
| 2998 | Nicholas Sparks, |  | “In times of grief and sorrow I will hold you ... | Nicholas Sparks, The Notebook |


We observe that the Tags column is missing values for some rows, so we need to either fill them in or, if that is not possible, remove those rows.
Removing them would noticeably reduce the number of quotes available to us, so it is a last resort.
We will first attempt to fill them based on the author.

In [49]:
# Checking actual number of missing values. 
combined_df.isna().sum()

Author Name      0
Tags           502
Quote            0
Author           0
dtype: int64
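
One possible way to fill missing tags from the author, sketched below, is to give each untagged quote the most frequent tags seen on that author's other quotes. `fill_tags_by_author` and the `top_n` parameter are hypothetical, the sample frame is invented for illustration, and the approach only makes sense if each row's tags genuinely belong to that row's quote.

```python
from collections import Counter

import pandas as pd


def fill_tags_by_author(df, top_n=3):
    """Fill missing tag lists with each author's most frequent known tags."""
    known = df[df["Tags"].notna()]
    # For every author with at least one tagged quote, collect their top tags.
    common = {
        author: [tag for tag, _ in Counter(t for tags in group for t in tags).most_common(top_n)]
        for author, group in known.groupby("Author Name")["Tags"]
    }
    filled = df.copy()
    filled["Tags"] = [
        tags if isinstance(tags, list) else common.get(author)
        for author, tags in zip(filled["Author Name"], filled["Tags"])
    ]
    return filled


# Invented sample: author "A" has tagged quotes, author "B" has none.
sample = pd.DataFrame({
    "Author Name": ["A", "A", "A", "B"],
    "Tags": [["life", "love"], ["life"], None, None],
})
filled = fill_tags_by_author(sample)
```

Authors with no tagged quotes at all stay unfilled, so a follow-up `isna().sum()` shows how many rows would still have to be dropped or tagged another way.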

In [52]:
# checking contents of the quote column. 
combined_df['Quote'][3]

'“So many books, so little time.”'

In [57]:
combined_df.groupby('Author Name').sum()

| Author Name | Tags | Quote | Author |
|---|---|---|---|
| A. A. Milne, | [antolini] | “It is more fun to talk with someone who doesn... | A. A. Milne, Winnie-the-Pooh |
| A.A. Milne | [writing, secrets, inspirational-ship-storms, ... | “Weeds are flowers, too, once you get to know ... | A.A. MilneA.A. MilneA.A. MilneA.A. MilneA.A. M... |
| A.A. Milne, | [live-death-love, activism, dave-matthews, fri... | “Piglet sidled up to Pooh from behind. "Pooh!"... | A.A. Milne, The House at Pooh CornerA.A. Miln... |
| A.J. Cronin | [writing] | “Worry never robs tomorrow of its sorrow, but ... | A.J. Cronin |
| Abigail Van Buren | [life] | “The best index to a person's character is how... | Abigail Van Buren |
| ... | ... | ... | ... |
| جلال الدين الرومي | 0 | “لا تجزع من جرحك، وإلا فكيف للنور أن يتسلل إلى... | جلال الدين الرومي |
| عباس محمود العقاد | 0 | “ليس هناك كتابا أقرأه و لا أستفيد منه شيئا جدي... | عباس محمود العقاد |
| غسان كنفاني | [identity] | “!لك شيء في هذا العالم.. فقم” | غسان كنفاني |
| محمود درويش | [disappointment, dorian-gray, marriage, men, r... | “و كن من أنتَ حيث تكون و احمل عبءَ قلبِكَ وحدهُ” | محمود درويش |
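
The grouped output lists "A. A. Milne,", "A.A. Milne" and "A.A. Milne," as three separate authors, so grouping by the raw name splits one author's quotes. A small normalization step before grouping might look like the sketch below. `normalize_author` is a hypothetical helper that only strips trailing commas and removes spaces between initials; other spelling variants would need additional rules.

```python
import re


def normalize_author(name):
    """Strip trailing commas and collapse spaces between initials."""
    cleaned = name.strip().rstrip(",").strip()
    # "A. A. Milne" -> "A.A. Milne": drop whitespace that sits between two initials.
    return re.sub(r"(?<=\b[A-Z]\.)\s+(?=[A-Z]\.)", "", cleaned)


variants = ["A. A. Milne,", "A.A. Milne", "A.A. Milne,"]
normalized = {normalize_author(v) for v in variants}
```

Applied with `combined_df['Author Name'].map(normalize_author)` before the `groupby`, the three variants would fall into a single group. Counting rows per author with `.size()` may also be easier to read than `.sum()`, which concatenates the text columns.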
