### Webscraping popular websites to find a source of truth

From the Google Trends analysis, this project will focus on three design aesthetics that are recently popular and possibly more unfamiliar to the public. 
1. Midcentury Modern (MCM)
2. Boho Chic
3. Farmhouse 

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import numpy as np 
import matplotlib.pyplot as plt
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize 
import re
import seaborn as sns
import string
from sklearn.feature_extraction import text 
from sklearn.feature_extraction.text import CountVectorizer

In [2]:
# url = 'https://www.apartmenttherapy.com/'
# html = requests.get(url)
# soup = BeautifulSoup(html.content, 'lxml')

In [3]:
# html.status_code

Well! Apartment Therapy doesn't want me to grab their data, so we'll try a different site.

A 403 status code is a "Forbidden" status, meaning I don't have proper permissions to scrape this site. 

### Midcentury Modern

#### Architectual Digest

test out an article first to get familiar with the html

In [4]:
# url = 'https://www.architecturaldigest.com/story/lauren-goodman-rose-garden'
# html = requests.get(url)
# soup = BeautifulSoup(html.content, 'lxml')

# html.status_code

In [5]:
# container = soup.findAll('div', class_ = 'article__chunks')

In [6]:
# article_text = container[0].findAll('p')

Now create a function to scrape multiple articles at once!

In [7]:
#create a function for AD articles that captures both article body and caption text

def AD_scrape(urls, style):
    '''scrape articles from architectual digest'''
    ad_scraped = {}
    c = 1
    for link in urls: 
        html = requests.get(link)
        soup = BeautifulSoup(html.content, 'lxml')
        
        container = soup.findAll('div', class_ = 'article__chunks')
        smaller_container = container[0].findAll('p')
        paragraph = []
        for i in range(len(smaller_container)):
            clean = smaller_container[i].text.strip()
            ad_scraped.update({style + str(c): clean})
            c += 1
            
    return ad_scraped 

In [8]:
urls = ['https://www.architecturaldigest.com/story/lauren-goodman-rose-garden',
        'https://www.architecturaldigest.com/story/midcentury-and-tropical-vibes-meet-in-this-barcelona-home',
       'https://www.architecturaldigest.com/story/osklo-hollywood-hills-home',
       'https://www.architecturaldigest.com/story/lauren-rottet-montauk-home',
       'https://www.architecturaldigest.com/story/in-bucks-county-a-glassy-homage-to-florence-and-hans-knoll']

ad_mcm = AD_scrape(urls, 'm')

{'m1': 'When I do something, I go deep. So, when it came time to finding a new home in Marin County last October—after selling a much-loved 1894 Ernest Coxhead in Pacific Heights—it was time for an architectural palette cleanser. (Goodbye lead pane windows and hand-carved moody majesty; hello open floor plan, democratic design, and glass walls.) I wanted midcentury.',
 'm2': 'In California, midcentury modern often means Eichler. The 1957 specimen I found was pure—with original luminous Philippine mahogany walls and sapphire wall-to-wall carpet (a dream manifestation).',
 'm3': 'BEFORE: The original walls left something to be desired, both in paint condition and color choice. “The green walls competed with the greenery,” says Lauren, who wanted a backdrop for the roses to be the star, and that kept the historical integrity of the 1957 space. Swiss architect Le Corbusier, a pioneer of midcentury-modern design—also a painter and color theorist—was top of the list when it came to a choosin

#### BHG

In [9]:
# url = 'https://www.bhg.com/decorating/decorating-style/midcentury-modern/?slide=slide_434d4501-3327-4249-944d-a8b2bef642ff#slide_434d4501-3327-4249-944d-a8b2bef642ff'
# html = requests.get(url)
# soup = BeautifulSoup(html.content, 'lxml')

# html.status_code

In [10]:
# container = soup.findAll('p')

In [11]:
urls = ['https://www.bhg.com/decorating/decorating-style/midcentury-modern/?slide=slide_e60301f4-f633-48ea-b612-b8a8243daab0#slide_e60301f4-f633-48ea-b612-b8a8243daab0',
       'https://www.bhg.com/decorating/do-it-yourself/wall-art/midcentury-modern-tissue-paper-art/',
       'https://www.bhg.com/decorating/makeovers/before-and-after/tour-this-midcentury-modern-home-with-a-bohemian-twist/',
       'https://www.bhg.com/decorating/decorating-style/modern/colorful-palm-springs-showhouse/',
       'https://www.bhg.com/decorating/decorating-style/modern/midcentury-modern-showhouse/']

def BHG_scrape(urls, style):
    '''scrape articles from better homes and gardens'''
    bhg_scraped = {}
    c = 1
    for link in urls: 
        html = requests.get(link)
        soup = BeautifulSoup(html.content, 'lxml')
        container = soup.findAll('p')
        paragraph = []
        for i in range(len(container)):
            clean = container[i].text.strip()
            bhg_scraped.update({style + str(i): clean})
            c += 1
    return bhg_scraped

In [12]:
bhg_mcm = BHG_scrape(urls, 'm')

#### ElleDecor

In [14]:
# url = 'https://www.elledecor.com/design-decorate/house-interiors/a31102258/wasco-reynolds-midcentury-california-home/'
# html = requests.get(url)
# soup = BeautifulSoup(html.content, 'lxml')

# html.status_code

In [15]:
# container = soup.findAll('div', class_ = 'article-body-content standard-body-content')
# #container

In [16]:
# article_text = container[0].findAll('p')
# #article_text

In [17]:
urls = ['https://www.elledecor.com/design-decorate/house-interiors/a31102258/wasco-reynolds-midcentury-california-home/',
       'https://www.elledecor.com/design-decorate/house-interiors/a34862121/silvio-rech-lesley-carstens-south-africa-bungalow/']

def ELLE_scrape(urls, style):
    '''scrape articles from ElleDecor'''
    elle_scraped = {}
    c = 1
    for link in urls: 
        html = requests.get(link)
        soup = BeautifulSoup(html.content, 'lxml')
        
        container = soup.findAll('div', class_ = 'article-body-content standard-body-content')
        cont = (container[0].findAll('p'))
        paragraph = []
        for i in range(len(cont)):
            clean = cont[i].text.strip()
            elle_scraped.update({style + str(c): clean})
            c += 1
    return elle_scraped

In [18]:
elle_mcm = ELLE_scrape(urls, 'm')

#### Dwell

In [19]:
# url = 'https://www.dwell.com/home/mid-century-remodel-in-marin-county-6695884e'
# html = requests.get(url)
# soup = BeautifulSoup(html.content, 'lxml')

# html.status_code

In [20]:
# container = soup.findAll('p')

In [21]:
urls = ['https://www.dwell.com/article/a-renovated-midcentury-gem-in-austin-776ce7b6',
       'https://www.dwell.com/home/midcentury-modern-summer-home-fbbf2900',
       'https://www.dwell.com/home/clyde-hill-mid-century-4573648a',
       'https://www.dwell.com/article/midcentury-mash-up-6b12b153',
       'https://www.dwell.com/home/south-tyrol-mid-century-96160b3c',
       'https://www.dwell.com/home/mid-century-remodel-in-marin-county-6695884e',
       'https://www.dwell.com/home/midcentury-makeover-c40efb7d']

def Dwell_scrape(urls, style):
    '''scrape articles from dwell'''
    dwell_scraped = {}
    c = 1
    for link in urls: 
        html = requests.get(link)
        soup = BeautifulSoup(html.content, 'lxml')
        container = soup.findAll('p')
        paragraph = []
        for i in range(len(container)):
            clean = container[i].text.strip()
            dwell_scraped.update({style + str(c): clean})
            c += 1
    return dwell_scraped

In [22]:
dwell_mcm = Dwell_scrape(urls, 'm')

{'m1': 'The midcentury cognoscenti in Austin, Texas, know where to spot the houses designed by a man named Arthur Dallas Stenger: dotted throughout a couple of neighborhoods south of the Colorado river, mainly on two streets in the hilly enclave of Rollingwood and another cluster in next-door Barton Hills. Their front elevations include a few hallmarks, like riverstone walls and clerestory windows that rise to meet canted or lightly gabled rooflines, and carports incorporated into the overall plan. Some even call A.D. Stenger the "Eichler of Austin." The designer-builder’s 1964 house on Ridgewood Drive, however, has something extra going for it. Its roof is an undulating affair that rests lightly on its glass-and-steel frame, and the transparent central volume is bookended by symmetrical wings, fronted by riverstone walls, and all set atop a cantilevered concrete foundation.',
 'm2': 'Subscribe to Dwell+ to get everything you already love about Dwell, plus exclusive home tours, video f

#### HGTV

In [23]:
# url = 'https://www.hgtv.com/design/decorating/design-101/add-midcentury-modern-style-to-your-home-pictures'
# html = requests.get(url)
# soup = BeautifulSoup(html.content, 'lxml')

# html.status_code

In [24]:
# container = soup.findAll('div', attrs = {'class' : 'photo-viewer', 'data-module' : 'inline-gallery', 'data-gallery-title' : "38 Bohemian Living Rooms You'll Love", 
#                                         'id' : 'mod-inline-gallery-1'})
# container = soup.findAll('div', class_ = 'o-PhotoGalleryPromo__a-Description asset-description')
# slide = []
# for i in range(len(container)): 
#     slide.append(container[i].findAll('p'))

In [25]:
#Old function

# def HGTV_scrape(urls):
#     hgtv_scraped = []
#     for link in urls: 
#         html = requests.get(link)
#         soup = BeautifulSoup(html.content, 'lxml')
#         container = soup.findAll('div', class_ = 'o-PhotoGalleryPromo__a-Description asset-description')
#         for i in range(len(container)):
#             clean = container[i].text.strip()
#             hgtv_scraped.append(clean)
#     return hgtv_scraped

In [26]:
def HGTV_scrape(urls, style):
    hgtv_scraped = {}
    c = 1
    for link in urls: 
        html = requests.get(link)
        soup = BeautifulSoup(html.content, 'lxml')
        container = soup.findAll('div', class_ = 'o-PhotoGalleryPromo__a-Description asset-description')
        
        for i in range(len(container)):
            clean = container[i].text.strip()
            hgtv_scraped.update({style + str(c): clean})
            c += 1
    return hgtv_scraped

In [27]:
urls = ['https://www.hgtv.com/design/decorating/design-101/add-midcentury-modern-style-to-your-home-pictures',
    'https://www.hgtv.com/design/decorating/design-101/midcentury-modern-ranch-renovation--pictures',
    'https://www.hgtv.com/design/rooms/kitchens/midcentury-modern-kitchens-pictures',
    'https://www.hgtv.com/design/remodel/interior-remodel/design-a-midcentury-modern-space-pictures',
    'https://www.hgtv.com/design/decorating/design-101/15-ways-to-give-your-rooms-midcentury-modern-mojo-pictures',
    'https://www.hgtv.com/design/decorating/design-101/bringing-classic-midcentury-modern-touches-to-your-home-pictures']

hgtv_mcm = HGTV_scrape(urls, 'm')

In [28]:
#hgtv_mcm

### Boho Chic

#### Architectual Digest

In [29]:
urls = ['https://www.architecturaldigest.com/story/josh-greene-brentwood-house',
       'https://www.architecturaldigest.com/story/inside-fashion-designer-ulla-johnsons-bohemian-brownstone']
ad_boho = AD_scrape(urls, 'b')

In [30]:
url = 'https://www.architecturaldigest.com/gallery/bring-on-the-bedroom-drama-with-a-rattan-bed'
html = requests.get(url)
soup = BeautifulSoup(html.content, 'lxml')

In [31]:
container = soup.findAll('div', class_ = 'content-chunks')

In [32]:
clever_boho = []
c = 1
cont = container[0].findAll('p')
for i in range(len(cont)):
    clean = cont[i].text.strip()
    clever_bohod = {'b' + str(c): clean}
    c += 1

#### ElleDecor

Unfortunately, these next ElleDecor articles don't follow the same structure as the ones above. This means they'll have to be scraped individually instead of using a funtion. 

In [34]:
url = 'https://www.elledecor.com/design-decorate/house-interiors/a18930010/kathryn-ireland-bohemian-decor-santa-monica-homes/'
html = requests.get(url)
soup= BeautifulSoup(html.content, 'lxml')

In [35]:
container = soup.findAll('div', class_ = 'article-body longform-body')
elle_boho_2 = []
c = 1
cont = container[0].findAll('p')
for i in range(len(cont)):
    clean = cont[i].text.strip()
    elle_boho2 = {'b' + str(c): clean}
    c += 1

In [36]:
url = 'https://www.elledecor.com/design-decorate/trends/g33219844/the-edit-chic-boho-items-that-channel-your-inner-wanderlust-walmart/'
html = requests.get(url)
soup= BeautifulSoup(html.content, 'lxml')

In [37]:
container = soup.findAll('div', class_ = 'listicle-body-content')
cont = container[0].findAll('p')
elle_boho_3 = []
c = 1
for i in range(len(cont)):
    clean = cont[i].text.strip()
    elle_boho3 = {'b' + str(c): clean}
    c += 1

In [38]:
url = 'https://www.elledecor.com/shopping/a26098714/the-inside-victoria-smith-collaboration/'
html = requests.get(url)
soup= BeautifulSoup(html.content, 'lxml')

In [39]:
container = soup.findAll('div', class_='standard-body')
c =1
cont = container[0].findAll('p')
for i in range(len(cont)):
    clean = cont[i].text.strip()
    elle_boho4 = {'b' + str(c): clean}
    c += 1

In [40]:
url = 'https://www.elledecor.com/design-decorate/house-interiors/a34886814/natasha-baradaran-santa-monica-house/'
html = requests.get(url)
soup= BeautifulSoup(html.content, 'lxml')

In [41]:
container = soup.findAll('div', class_='standard-body')
c = 1
cont = container[0].findAll('p')
for i in range(len(cont)):
    clean = cont[i].text.strip()
    elle_boho5 = {'b' + str(c): clean}
    c += 1

In [42]:
url = 'https://www.elledecor.com/design-decorate/room-ideas/g25619140/eclectic-bohemian-kitchens/'
html = requests.get(url)
soup= BeautifulSoup(html.content, 'lxml')

container = soup.findAll('main', class_='site-content')
c = 1
cont = container[0].findAll('p')
for i in range(len(cont)):
    clean = cont[i].text.strip()
    elle_boho6 = {'b' + str(c): clean}
    c += 1

In [43]:
elle_boho2

{'b29': 'Want more ELLE Decor? Get Instant Access!'}

#### BHG

In [44]:
urls = ['https://www.bhg.com/decorating/decorating-style/bohemian-decor-ideas-281474979538765/',
       'https://www.bhg.com/decorating/decorating-style/flea-market/eclectic-global-vintage-style/',
       'https://www.bhg.com/decorating/makeovers/retro-bohemian-home-tour/',
       'https://www.bhg.com/decorating/decorating-style/bohemian-decor-ideas-281474979538765/']

bhg_boho = BHG_scrape(urls, 'b')

#### Dwell 

In [45]:
urls = ['https://www.dwell.com/home/venice-bohemian-d2b7d91a',
       'https://www.dwell.com/home/bohemian-modern-kitchen-604f9984',
       'https://www.dwell.com/article/silver-lake-bohemian-renovation-nickey-kehoe-los-angeles-real-estate-9bd26c42']

dwell_boho = Dwell_scrape(urls,'b')
dwell_boho

{'b1': "Designer curated, light and airy, architectural home between Abbot Kinney and the Beach, on quiet side of Cabrillo, steps away from eateries and shopping. A gentle entry fountain leads to a voluminous living room with grand modern chandelier, fireplace, and rich walnut millwork throughout downstairs level. Gourmet Chef's Kitchen. Dining room feels like a high-end restaurant. Downstairs patio beyond Fleetwood sliding doors is a beautiful extension of living area. Dramatic staircase leads to second story with two generous bedrooms, a wrap-around corridor, full bath, and a laundry room. Entire third story is a luxurious master suite, with a 15-ft glass ceiling in the bedroom with Velux skylights that open. See the stars at night through the ceiling or cover the skylights with the automatic shades. A huge dressing area/hangout space leads to a giant walk-in closet, and a sumptuous spa-like bathroom with sexy shower and oversized bathtub. Outside the master bedroom is a spacious dec

#### HGTV

In [46]:
urls = ['https://www.hgtv.com/design/decorating/design-101/bohemian-kid-rooms-pictures',
       'https://www.hgtv.com/design/decorating/design-101/bohemian-living-rooms-pictures'
       'https://www.hgtv.com/design/decorating/design-101/bohemian-bedrooms-pictures',
       'https://www.hgtv.com/profiles/professionals/natalie-myers/bright-bohemian-inspired-home-in-calabasas-california-pictures',
       'https://www.hgtv.com/profiles/professionals/Lexi-Grace-Designs/bohemian-hainsworth-and-co-salon-with-glam-gold-touches-pictures']
hgtv_boho = HGTV_scrape(urls,'b')

### Farmhouse

#### Architectual Digest

In [47]:
urls = ['https://www.architecturaldigest.com/story/sam-page-pacific-palisades-home',
       'https://www.architecturaldigest.com/story/casey-wilson-los-angeles-home']

ad_farm = AD_scrape(urls,'f')
#ad_farm

#### ElleDecor

In [48]:
url = 'https://www.elledecor.com/design-decorate/trends/g32338857/farmhouse-living-room/'
html = requests.get(url)
soup = BeautifulSoup(html.content, 'lxml')

In [50]:
container = soup.findAll('div', class_ = 'listicle-body')
cont = container[0].findAll('p')
c = 1
for i in range(len(cont)):
        clean = cont[i].text.strip()
        elle_farm1 = {'f' + str(c): clean}
        c += 1
#elle_farm

#### BHG

In [51]:
urls = ['https://www.bhg.com/farmhouse-table-finds-to-fit-the-whole-family/',
       'https://www.bhg.com/decorating/decorating-style/farmhouse-fabrics-281474979509404/',
       'https://www.bhg.com/decorating/decorating-style/fabulous-farmhouse-mudrooms/',
       'https://www.bhg.com/decorating/makeovers/family-modern-farmhouse-tour/',
       'https://www.bhg.com/decorating/decorating-style/country/farmhouse-bedroom-ideas/',
       'https://www.bhg.com/decorating/decorating-style/country/farmhouse-wall-decor/',
       'https://www.bhg.com/kitchen/styles/country/decorate-a-farmhouse-kitchen/',
       'https://www.bhg.com/bathroom/remodeling/makeover/farmhouse-master-bath-makeover/',
       'https://www.bhg.com/these-farmhouse-finds-will-warm-up-your-bedroom/',
       'https://www.bhg.com/decorating/decorating-style/modern-farmhouse-decor/',
       'https://www.bhg.com/decorating/decorating-style/country/farmhouse-living-room-ideas/',
       'https://www.bhg.com/decorating/makeovers/farmhouse-style-living-room/',
       'https://www.bhg.com/decorating/makeovers/antique-modern-farmhouse-design/',
       'https://www.bhg.com/farmhouse-chic-walmart-finds/',
       'https://www.bhg.com/decorating/makeovers/farmhouse-vintage-ranch-house-design/']

bhg_farm = BHG_scrape(urls,'f')

#### Dwell

In [52]:
urls = ['https://www.dwell.com/article/texas-farmhouse-patrick-ousey-aa57de8c',
       'https://www.dwell.com/article/a-maine-farmhouse-built-with-salvaged-materials-e6aa4fc4',
       'https://www.dwell.com/article/modern-farmhouses-a9bf6970']

dwell_farm = Dwell_scrape(urls,'f')

#### HGTV Magazine

In [53]:
# def HGTV_scrape(urls, style):
#     hgtv_scraped = {}

#     for c,link in enumerate(urls): 
#         html = requests.get(link)
#         soup = BeautifulSoup(html.content, 'lxml')
#         container = soup.findAll('div', class_ = 'o-PhotoGalleryPromo__a-Description asset-description')
#         paragraph = []
#         for i in range(len(container)):
#             clean = container[i].text.strip()
#             paragraph.append(clean)
#             combined_text = ' '.join(paragraph)
#         hgtv_scraped.update({style + str(c): combined_text})
#     return hgtv_scraped

In [54]:
urls = ['https://www.hgtv.com/design/rooms/living-and-dining-rooms/farmhouse-living-room-designs-pictures',
       'https://www.hgtv.com/shows/farmhouse-fixer/create-farmhouse-style-pictures',
       'https://www.hgtv.com/design/rooms/living-and-dining-rooms/farmhouse-dining-room-ideas-pictures',
       'https://www.hgtv.com/design/rooms/bathrooms/farmhouse-bathroom-design-ideas-pictures',
       'https://www.hgtv.com/design/decorating/design-101/rustic-farmhouse-designs-pictures']

hgtv_farm = HGTV_scrape(urls, 'f')

#### Combining Scraped Lists

In [56]:
columns = ['text']
def create_df(dictionaries):
    all_df = pd.DataFrame()
    for d in dictionaries: 
        df = pd.DataFrame.from_dict(d, orient = 'index', columns = columns).reset_index()
        all_df = all_df.append(df)
    return all_df

In [57]:
dicts = [ad_farm, dwell_farm, hgtv_farm, elle_farm1, bhg_farm, ad_mcm, dwell_mcm, hgtv_mcm, elle_mcm, bhg_mcm, 
        clever_bohod, dwell_boho, hgtv_boho, bhg_boho, ad_boho, elle_boho2, elle_boho3, elle_boho4, elle_boho5, elle_boho6]

data = create_df(dicts)

In [58]:
data

Unnamed: 0,index,text
0,f1,When actor Sam Page and his wife Cassidy Ellio...
1,f2,The couple ultimately decided on Pacific Palis...
2,f3,The kitchen remodel was the hardest part of th...
3,f4,"To modernize the kitchen and bathrooms, as wel..."
4,f5,Elliott describes her personal design style as...
...,...,...
0,b29,Want more ELLE Decor? Get Instant Access!
0,b9,A Turkish towel is one of the chicest textiles...
0,b9,"Removable Wallpaper, $59Shop Now"
0,b5,“We didn’t want the house to take itself so se...


In [59]:
#data['index'] = [x.rstrip(x[-1]) for x in data['index']]

In [60]:
data.set_index('index', inplace=True)

In [61]:
data.to_pickle('paragraphed_corpus.pkl')

### Additional Webscraping - Round 2

#### House Beaufiful

In [62]:
url = 'https://www.housebeautiful.com/design-inspiration/house-tours/a35940421/melanie-turner-lewis-buck-crook-atlanta-home/'
html = requests.get(url)
soup = BeautifulSoup(html.text, 'lxml')

In [63]:
container = soup.findAll('div', class_ = 'article-body-content standard-body-content')

In [64]:
cont = container[0].findAll('p', class_ = 'body-text')

In [67]:
def HouseBeautifulScrape(urls, style): 
    hs_scraped = {}
    c = 1
    for link in urls: 
        html = requests.get(link)
        soup = BeautifulSoup(html.content, 'lxml')
    
        container = soup.findAll('div', class_ = 'article-body-content standard-body-content')
        cont = container[0].findAll('p', class_ = 'body-text')
                
        for i in range(len(cont)):
            clean = cont[i].text.strip()
            hs_scraped.update({style + str(c): clean})
            c += 1
    return hs_scraped

In [68]:
urls = ['https://www.housebeautiful.com/design-inspiration/house-tours/a35940421/melanie-turner-lewis-buck-crook-atlanta-home/',
       'https://www.housebeautiful.com/design-inspiration/house-tours/a34716459/formarch-architecture-palm-springs-home/',
       'https://www.housebeautiful.com/design-inspiration/house-tours/a33437685/ana-claudia-schultz-brownstone/',
       'https://www.housebeautiful.com/shopping/furniture/a36362399/christian-siriano-furniture-collection/']

hs_mcm = HouseBeautifulScrape(urls, 'm')

In [69]:
urls = ['https://www.housebeautiful.com/design-inspiration/house-tours/a35809069/banner-day-interiors-home-tour/',
       'https://www.housebeautiful.com/design-inspiration/a35153638/colleen-bashaw-new-jersey-home/',
       'https://www.housebeautiful.com/design-inspiration/a34587195/studio-munroe-bay-area-family-home/',
       'https://www.housebeautiful.com/design-inspiration/house-tours/a32661987/montauk-bungalow-brooklyn-home-company/',
       'https://www.housebeautiful.com/lifestyle/a29427124/christina-anstead-house-photos/',
       'https://www.housebeautiful.com/home-remodeling/interior-designers/q-and-a/a6968/colleen-bashaw-beach-house-interview/',
       'https://www.housebeautiful.com/design-inspiration/a23748087/what-is-bohemian-design-style/']

hs_boho = HouseBeautifulScrape(urls, 'b')

In [70]:
urls = ['https://www.housebeautiful.com/design-inspiration/house-tours/a35486331/krystal-matthews-louisiana-farmhouse/']

hs_farm = HouseBeautifulScrape(urls, 'f')

#### Martha Stewart

In [71]:
url = 'https://www.marthastewart.com/1011251/midcentury-modern-home-los-angeles?search_key=los+angeles+home&slide=af5aeb0b-ec3d-4795-9d67-ee90f73acf90#af5aeb0b-ec3d-4795-9d67-ee90f73acf90'
html = requests.get(url)
soup = BeautifulSoup(html.content, 'lxml')

In [72]:
container = soup.findAll('div', class_ = 'glide-track')

In [73]:
cont = container[0].findAll('p')

In [74]:
def MarthaScrape(urls, style): 
    martha_scraped = {}
    c = 1
    for link in urls: 
        html = requests.get(link)
        soup = BeautifulSoup(html.content, 'lxml')
    
        container = soup.findAll('div', class_ = 'glide-track')
        cont = container[0].findAll('p')
                
        for i in range(len(cont)):
            clean = cont[i].text.strip()
            martha_scraped.update({style + str(c): clean})
            c += 1
    return martha_scraped

In [75]:
urls = ['https://www.marthastewart.com/1011251/midcentury-modern-home-los-angeles?search_key=los+angeles+home&slide=8694dd6f-4bcb-4607-aec0-9fcedead4dfa#8694dd6f-4bcb-4607-aec0-9fcedead4dfa',
       'https://www.marthastewart.com/7949226/midcentury-modern-palm-springs-wedding-rebecca-yale?slide=16e3b728-6f9b-47d9-9b4f-0aec23052106#16e3b728-6f9b-47d9-9b4f-0aec23052106']

martha_mcm = MarthaScrape(urls, 'm')

In [76]:
url = 'https://www.sfgirlbybay.com/2020/08/19/on-trend-bohemian-whimsy/'
html = requests.get(url)
soup = BeautifulSoup(html.content, 'lxml')

html.status_code

403

In [77]:
container = soup.findAll('div', class_ = 'block text this-week dropcap')

In [78]:
container

[]

In [79]:
url = 'https://design-milk.com/mid-century-modern-bent-plywood-home-decor-by-ciseal/'
html = requests.get(url)
soup = BeautifulSoup(html.content, 'lxml')

html.status_code

200

In [80]:
container = soup.findAll('div', class_ = 'article-content')

In [81]:
cont = container[0].findAll('p')

In [82]:
def MilkScrape(urls, style): 
    milk_scraped = {}
    c = 1
    for link in urls: 
        html = requests.get(link)
        soup = BeautifulSoup(html.content, 'lxml')
    
        container = soup.findAll('div', class_ = 'article-content')
        cont = container[0].findAll('p')
                
        for i in range(len(cont)):
            clean = cont[i].text.strip()
            milk_scraped.update({style + str(c): clean})
            c += 1
    return milk_scraped

In [83]:
urls = ['https://design-milk.com/mid-century-modern-bent-plywood-home-decor-by-ciseal/',
       'https://design-milk.com/a-mid-century-wexler-is-transformed-with-vibrant-colors-and-bold-patterns/',
       'https://design-milk.com/a-remodeled-mid-century-home-located-steps-away-from-huntington-harbor/',
       'https://design-milk.com/spark-grills-inspired-by-mid-century-modern-design/',
       'https://design-milk.com/stuart-silk-architects-updates-a-mid-century-house-in-rancho-mirage/',
       'https://design-milk.com/warm-nordic-unveils-a-mid-century-table-lamp-named-ambience/',
       'https://design-milk.com/post-11-15-later-block-shop-textiles-amsterdam-modern-collaborate-limited-edition-collection-mid-century-furniture/',
       'https://design-milk.com/newmade-la-great-mid-century-design-at-affordable-prices/',
       'https://design-milk.com/smilow-design-launches-authentic-mid-century-lighting-designs/']

milk_mcm = MilkScrape(urls, 'm')

In [84]:
urls = ['https://design-milk.com/sophisticated-brooklyn-apartment-bohemian-edge/',
       'https://design-milk.com/french-fine-dining-restaurant-bohemian-twist/',
        'https://design-milk.com/beth-buccinis-home-exhibits-a-playful-mix-of-texture-pattern-color/']

milk_boho = MilkScrape(urls, 'b')

In [85]:
url = 'https://www.thespruce.com/decorating-in-farmhouse-style-1977571'
html = requests.get(url)
soup = BeautifulSoup(html.content, 'lxml')

html.status_code

200

In [86]:
container = soup.findAll('div', class_ = 'article__body right-rail')

In [87]:
cont = container[0].findAll('p')

In [88]:
def SpruceScrape(urls, style): 
    spruce_scraped = {}
    c = 1
    for link in urls: 
        html = requests.get(link)
        soup = BeautifulSoup(html.content, 'lxml')
    
        container = soup.findAll('div', class_ = 'article__body right-rail')
        cont = container[0].findAll('p')
        
        for i in range(len(cont)):
            clean = cont[i].text.strip()
            spruce_scraped.update({style + str(c): clean})
            c += 1
    return spruce_scraped

In [89]:
urls = ['https://www.thespruce.com/farmhouse-style-bedroom-ideas-4136415',
       'https://www.thespruce.com/inside-beautiful-farmhouse-style-kitchens-4129346',
       'https://www.thespruce.com/modern-farmhouse-bathroom-ideas-4147466',
       'https://www.thespruce.com/modern-farmhouse-style-living-rooms-4135941',
       'https://www.thespruce.com/10-unmistakable-signs-you-re-low-key-obsessed-with-farmhouse-decor-5095348',
       'https://www.thespruce.com/ways-to-add-farmhouse-style-4102345',
       'https://www.thespruce.com/farmhouse-architecture-4692188',
       'https://www.thespruce.com/classic-farmhouse-style-2213409',
       'https://www.thespruce.com/ways-to-bring-your-farmhouse-style-outdoors-5115287',
       'https://www.thespruce.com/hearth-and-hand-decor-at-target-4154081']

spruce_farm = SpruceScrape(urls, 'f')

In [90]:
urls = ['https://www.thespruce.com/decorators-guide-to-bohemian-style-1977570',
       'https://www.thespruce.com/chic-modern-nurseries-with-bohemian-charm-4051727',
       'https://www.thespruce.com/beautiful-boho-bedroom-decorating-ideas-4119470',
       'https://www.thespruce.com/create-a-boho-chic-home-1791335',
#        'https://www.thespruce.com/best-boho-bedding-4163343',
        'https://www.thespruce.com/best-instagram-accounts-for-boho-lovers-5075353']

spruce_boho = SpruceScrape(urls, 'b')

In [91]:
url = 'https://www.thespruce.com/best-boho-bedding-4163343'
html = requests.get(url)
soup = BeautifulSoup(html.content, 'lxml')

container = soup.findAll('div', class_ = 'loc content-body')
cont = container[0].findAll('p')

c = 1
for i in range(len(cont)):
    clean = cont[i].text.strip()
    spruce_boho.update({'bb' + str(c): clean})
    c += 1

In [92]:
urls = ['https://www.thespruce.com/midcentury-modern-kitchen-ideas-4582208',
       'https://www.thespruce.com/top-mid-century-modern-paint-colors-798000',
       'https://www.thespruce.com/mid-century-modern-furniture-designers-to-know-4123681',
       'https://www.thespruce.com/midcentury-modern-bedroom-ideas-4142250',
       'https://www.thespruce.com/mid-century-modern-furniture-designers-to-know-4123681',
       'https://www.thespruce.com/mid-century-modern-architecture-5072981',
       'https://www.thespruce.com/mid-century-furniture-liven-up-decor-4098179',
       'https://www.thespruce.com/mid-century-modern-lighting-4685246',
       'https://www.thespruce.com/mid-century-modern-living-rooms-4769744',
       'https://www.thespruce.com/best-mid-century-modern-instagram-accounts-5072412',
       'https://www.thespruce.com/mid-century-modern-home-furnishings-148808',
       'https://www.thespruce.com/things-you-should-know-about-mid-century-1391827',
       'https://www.thespruce.com/guide-to-mid-century-modern-patio-furniture-4145802',
       'https://www.thespruce.com/mid-century-modern-living-room-elements-4120959']

spruce_mcm = SpruceScrape(urls, 'm')

In [93]:
dicts = [spruce_mcm, spruce_boho, spruce_farm, milk_boho, milk_mcm, martha_mcm, hs_farm, hs_boho, hs_mcm]

data2 = create_df(dicts)

In [94]:
data2.shape

(1069, 2)

In [95]:
data2.set_index('index', inplace=True)

In [96]:
data2.to_pickle('paragraphed_corpus2.pkl')