# Web Scraping — Part 2 — Workbook Solutions

*Don't forget to rename this notebook if you want to save changes!*

In this lesson, we're going to learn how to scrape multiple web pages with the Python libraries `requests` and `BeautifulSoup`.

---

## Quick Demonstration of Image Scraping — NYT Front Page

### Import Requests and BeautifulSoup

Once again, we're going to use the `requests` library and the `BeautifulSoup` library to scrape data.

In [1]:
import requests
from bs4 import BeautifulSoup

### Get HTML Data and Extract Text

*The New York Times* Front Page: https://nytimes.com

Here we're going to request the URL for *The New York Times* front page, extract the text of the web page, and then transform it into a BeautifulSoup document.

In [2]:
response = requests.get("https://nytimes.com")
html_string = response.text
document = BeautifulSoup(html_string, "html.parser")

Here we search through the HTML code to find all the `<img>` tags:

In [3]:
document.find_all('img')

[<img alt="" decoding="async" src="https://static01.nyt.com/images/2017/01/29/podcasts/the-daily-album-art/the-daily-album-art-square320-v4.png"/>,
 <img alt="" decoding="async" src="https://static01.nyt.com/images/2021/02/11/podcasts/00argument-albumart/00argument-albumart-square320.png"/>,
 <img alt="" decoding="async" src="https://static01.nyt.com/images/2021/03/31/insider/event-bar/event-bar-square640.png?quality=75&amp;auto=webp&amp;disable=upscale&amp;width=350"/>,
 <img class="desktop" src="https://static01.nyt.com/images/2020/07/21/us/cases_orphan_usa-1595349567192/cases_orphan_usa-1595349567192-master1050-v834.png"/>,
 <img class="mobile" src="https://static01.nyt.com/images/2020/07/21/us/cases_orphan_usa-1595349567192/cases_orphan_usa-1595349567192-square640-v1659.png"/>,
 <img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-JCQZ/PM-virus-fader-update-slide-JCQZ-master1050.jpg"/>,
 <img src="https://static01.nyt.com/images/2021/04/01/homep

To display these images in our Jupyter notebook, we're going to import `Markdown` and `display` from the `IPython.display` module. Together they let us render each `<img>` tag as Markdown, which displays the image in this notebook.

In [4]:
from IPython.display import Markdown, display

# Loop through all the images on the NYT front page
for image in document.find_all('img'):
    
    # Convert the image tag to a string
    image_string = str(image)
    
    # Transform the tag to Markdown and then display it as Markdown
    display(Markdown(image_string))

<img alt="" decoding="async" src="https://static01.nyt.com/images/2017/01/29/podcasts/the-daily-album-art/the-daily-album-art-square320-v4.png"/>

<img alt="" decoding="async" src="https://static01.nyt.com/images/2021/02/11/podcasts/00argument-albumart/00argument-albumart-square320.png"/>

<img alt="" decoding="async" src="https://static01.nyt.com/images/2021/03/31/insider/event-bar/event-bar-square640.png?quality=75&amp;auto=webp&amp;disable=upscale&amp;width=350"/>

<img class="desktop" src="https://static01.nyt.com/images/2020/07/21/us/cases_orphan_usa-1595349567192/cases_orphan_usa-1595349567192-master1050-v834.png"/>

<img class="mobile" src="https://static01.nyt.com/images/2020/07/21/us/cases_orphan_usa-1595349567192/cases_orphan_usa-1595349567192-square640-v1659.png"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-JCQZ/PM-virus-fader-update-slide-JCQZ-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-JCQZ/PM-virus-fader-update-slide-JCQZ-square640.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-QHJS/PM-virus-fader-update-slide-QHJS-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-QHJS/PM-virus-fader-update-slide-QHJS-square640.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-2NOK/PM-virus-fader-update-slide-2NOK-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-2NOK/PM-virus-fader-update-slide-2NOK-square640.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-UMJT/PM-virus-fader-update-slide-UMJT-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-UMJT/PM-virus-fader-update-slide-UMJT-square640.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-5GAV/PM-virus-fader-update-slide-5GAV-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-5GAV/PM-virus-fader-update-slide-5GAV-square640.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-48H1/PM-virus-fader-update-slide-48H1-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-48H1/PM-virus-fader-update-slide-48H1-square640.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-V603/PM-virus-fader-update-slide-V603-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-V603/PM-virus-fader-update-slide-V603-square640.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-VC94/PM-virus-fader-update-slide-VC94-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-VC94/PM-virus-fader-update-slide-VC94-square640.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-38H0/PM-virus-fader-update-slide-38H0-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2021/04/01/homepage/PM-virus-fader-update-slide-38H0/PM-virus-fader-update-slide-38H0-square640.jpg"/>

<img alt="US coronavirus cases" class="svelte-zl0f6y" src="https://static01.nyt.com/newsgraphics/2020/03/16/coronavirus-maps/8c96f04c6de658a5d841582ab5d0dcbae9109438/images/orphan_usa-threeByTwoSmallAt2X.png"/>

<img alt="Worldwide coronavirus cases" class="svelte-zl0f6y" src="https://static01.nyt.com/newsgraphics/2020/03/16/coronavirus-maps/8c96f04c6de658a5d841582ab5d0dcbae9109438/images/orphan_world-threeByTwoSmallAt2X.png"/>

<img alt="Where states are reporting vaccines given" class="svelte-zl0f6y" src="https://static01.nytimes.com/newsgraphics/2020/12/09/vaccine-distribution-tracker/assets/scoop-vaccine-distribution-tracker-threeByTwoSmallAt2X.png"/>

<img alt="US coronavirus cases" class="svelte-zl0f6y" src="https://static01.nyt.com/newsgraphics/2020/03/16/coronavirus-maps/8c96f04c6de658a5d841582ab5d0dcbae9109438/images/orphan_usa-threeByTwoSmallAt2X.png"/>

<img alt="Worldwide coronavirus cases" class="svelte-zl0f6y" src="https://static01.nyt.com/newsgraphics/2020/03/16/coronavirus-maps/8c96f04c6de658a5d841582ab5d0dcbae9109438/images/orphan_world-threeByTwoSmallAt2X.png"/>

<img alt="Vaccine tracker" class="svelte-zl0f6y" src="https://static01.nyt.com/newsgraphics/2020/03/16/coronavirus-maps/8c96f04c6de658a5d841582ab5d0dcbae9109438/images/footer-thumbs/vaccines.png"/>

<img alt="US coronavirus cases" class="svelte-zl0f6y" src="https://static01.nyt.com/newsgraphics/2020/03/16/coronavirus-maps/8c96f04c6de658a5d841582ab5d0dcbae9109438/images/orphan_usa-threeByTwoSmallAt2X.png"/>

<img alt="Worldwide coronavirus cases" class="svelte-zl0f6y" src="https://static01.nyt.com/newsgraphics/2020/03/16/coronavirus-maps/8c96f04c6de658a5d841582ab5d0dcbae9109438/images/orphan_world-threeByTwoSmallAt2X.png"/>

<img alt="Vaccine tracker" class="svelte-zl0f6y" src="https://static01.nyt.com/newsgraphics/2020/03/16/coronavirus-maps/8c96f04c6de658a5d841582ab5d0dcbae9109438/images/footer-thumbs/vaccines.png"/>

<img src="https://static01.nyt.com/images/2021/04/01/multimedia/01chauvin-hpfader1b/merlin_185848461_c216fa52-b810-4451-9abb-41ea4351227c-threeByTwoMediumAt2X.jpg">
</img>

<img src="https://static01.nyt.com/images/2021/04/01/multimedia/01chauvin-hpfader2/merlin_185847222_09a383e8-7ba2-45f3-8726-30aefdf04778-threeByTwoMediumAt2X.jpg">
</img>

<img src="https://static01.nyt.com/images/2021/04/01/multimedia/01chauvin-hpfader-slide1/merlin_185786850_d1d9fe71-4c62-4d4a-bc2a-4a1048ecec2b-threeByTwoMediumAt2X.jpg">
</img>

<img src="https://static01.nyt.com/images/2021/04/01/business/01nytchauvin-HPfader-slide-5UKS/01nytchauvin-HPfader-slide-5UKS-threeByTwoMediumAt2X.jpg">
</img>

<img src="https://static01.nyt.com/images/2021/04/01/business/01nytchauvin-HPfader-slide-OQ08/01nytchauvin-HPfader-slide-OQ08-threeByTwoMediumAt2X.jpg">
</img>

<img src="https://static01.nyt.com/images/2021/03/31/multimedia/31nytchauvin13/merlin_185807448_2721d030-0d49-4306-92b7-08eccad98588-threeByTwoMediumAt2X.jpg">
</img>

<img src="https://static01.nyt.com/images/2021/04/01/business/01nytchauvin-HPfader-slide-MK68/01nytchauvin-HPfader-slide-MK68-threeByTwoMediumAt2X.jpg">
</img>

<img class="css-hdqqnp" loading="lazy" role="presentation" src="https://static01.nyt.com/images/2021/03/31/science/31cli-bidenclimate-1/merlin_180788583_3a9e790e-f13d-4d3d-8036-f5b90d213506-threeByTwoSmallAt2X.jpg?format=pjpg&amp;quality=75&amp;auto=webp&amp;disable=upscale"/>

<img class="css-hdqqnp" loading="lazy" role="presentation" src="https://static01.nyt.com/images/2021/03/25/world/00ethiopia-assaults-1/00ethiopia-assaults-1-threeByTwoMediumAt2X-v4.jpg?format=pjpg&amp;quality=75&amp;auto=webp&amp;disable=upscale"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-LCPY/home-fader-hp-2-slides-slide-LCPY-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-LCPY/home-fader-hp-2-slides-slide-LCPY-square640.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-R65N/home-fader-hp-2-slides-slide-R65N-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-R65N/home-fader-hp-2-slides-slide-R65N-square640.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-9OFH/home-fader-hp-2-slides-slide-9OFH-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-9OFH/home-fader-hp-2-slides-slide-9OFH-square640.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-DSSY/home-fader-hp-2-slides-slide-DSSY-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-DSSY/home-fader-hp-2-slides-slide-DSSY-square640.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-G9KM/home-fader-hp-2-slides-slide-G9KM-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-G9KM/home-fader-hp-2-slides-slide-G9KM-square640.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-PS44/home-fader-hp-2-slides-slide-PS44-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-PS44/home-fader-hp-2-slides-slide-PS44-square640.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-CNRH/home-fader-hp-2-slides-slide-CNRH-master1050.jpg"/>

<img src="https://static01.nyt.com/images/2020/12/08/multimedia/home-fader-hp-2-slides-slide-CNRH/home-fader-hp-2-slides-slide-CNRH-square640.jpg"/>
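
The output above repeats several images (the US coronavirus map thumbnail, for example, appears three times). As a minimal sketch, assuming the `src` values have already been collected into a list (for instance with `image.get('src')` inside the loop), `dict.fromkeys()` can drop the repeats while keeping the original order. The URLs below are hypothetical stand-ins, not real NYT addresses.

```python
# Hypothetical stand-ins for the src values collected from the <img> tags
image_urls = [
    "https://example.com/usa-map.png",
    "https://example.com/world-map.png",
    "https://example.com/usa-map.png",
    "https://example.com/vaccines.png",
    "https://example.com/world-map.png",
]

# Dictionary keys are unique and keep insertion order, so this drops repeats
unique_urls = list(dict.fromkeys(image_urls))

for url in unique_urls:
    print(url)
```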

## Quick Demonstration of Image Scraping — Bill Gates's LinkedIn Page

https://www.linkedin.com/in/williamhgates/

In [5]:
response = requests.get("https://www.linkedin.com/in/williamhgates/")
html_string = response.text
document = BeautifulSoup(html_string, "html.parser")

In [6]:
from IPython.display import Markdown, display

# Loop through all the images on the LinkedIn page
for image in document.find_all('img'):
    # Convert the image tag to a string
    image_string = str(image)
    # Transform the tag to Markdown and then display it as Markdown
    display(Markdown(image_string))

What's going wrong here?

In [7]:
response

<Response [999]>
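
The `[999]` in this output is the response's status code. A successful request returns a code in the 200s, and 999 is not one of the standard HTTP status codes. LinkedIn is widely reported to send it when it declines automated requests, which would explain why there were no images for BeautifulSoup to find. The sketch below uses a hypothetical helper, `describe_status()`, to label a code by its range; with a real response you would pass in `response.status_code`.

```python
def describe_status(status_code):
    """Return a rough label for an HTTP status code based on its range."""
    if 200 <= status_code < 300:
        return "success"
    if 300 <= status_code < 400:
        return "redirect"
    if 400 <= status_code < 500:
        return "client error"
    if 500 <= status_code < 600:
        return "server error"
    # Anything else falls outside the standard ranges
    return "non-standard"

# Hard-coded codes stand in for response.status_code
for code in [200, 404, 999]:
    print(code, describe_status(code))
```

Checking the status code before parsing is a quick way to tell "the page has no images" apart from "the site never sent the page."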

## Scraping Multiple Web Pages At a Time

In the last lesson, we figured out how to scrape the lyrics for a single Missy Elliott song.

In [8]:
response = requests.get("https://genius.com/Missy-elliott-work-it-lyrics")
html_string = response.text
document = BeautifulSoup(html_string, "html.parser")

In [9]:
document.find('p').text

"[Intro]\nDJ, please pick up your phone, I'm on the request line\nThis is a Missy Elliott one-time exclusive, come on\n\n[Chorus]\nIs it worth it? Let me work it\nI put my thing down, flip it and reverse it\nTi esrever dna ti pilf, nwod gniht ym tup\nTi esrever dna ti pilf, nwod gniht ym tup\nIf you got a big *elephant trumpet*, let me search ya\nAnd find out how hard I gotta work ya\nTi esrever dna ti pilf, nwod gniht ym tup\nTi esrever dna ti pilf, nwod gniht ym tup\nC'mon\n\n[Verse 1]\nI'd like to get to know ya so I could show ya\nPut the pussy on ya like I told ya\nGive me all your numbers so I can phone ya\nYour girl acting stank, then call me over\nNot on the bed, lay me on your sofa\nCall before you come, I need to shave my chocha\nYou do or you don't or you will or won't ya?\nGo downtown and eat it like a vulture\nSee my hips and my tips, don't ya?\nSee my ass and my lips, don't ya?\nLost a few pounds and my waist for ya\nThis the kinda beat that go ra-ta-ta\nRa-ta-ta-ta-ta-ta

But how can we scrape lyrics for multiple Missy Elliott songs at a time?

### Figure Out the Pattern

What we need to do is figure out how to programmatically generate the correct Genius web page URL for each song we're interested in:

`f"https://genius.com/Missy-elliott-{formatted_song}-lyrics"`

In [152]:
song_titles = ['Work It', 'WTF (Where They From)', 'The Rain (Supa Dupa Fly)']

```
for song in song_titles:
    formatted_song = ?????
    response = requests.get(f"https://genius.com/Missy-elliott-{formatted_song}-lyrics")
    html_string = response.text
    document = BeautifulSoup(html_string, "html.parser")
    document.find('p').text
```

Let's inspect the Genius web pages for each of these songs:

https://genius.com/Missy-elliott-work-it-lyrics

https://genius.com/Missy-elliott-the-rain-supa-dupa-fly-lyrics

https://genius.com/Missy-elliott-wtf-where-they-from-lyrics

### Make Song Titles Fit Pattern — Your Turn!

Create a function called `format_song()` that will take in a song title and then return the song title correctly formatted for its Genius web page.

For example, the song `WTF (Where They From)` needs to be converted to `wtf-where-they-from`.

Hint: You will need to use [string methods](https://info1350.github.io/Intro-CA-SP21/02-Python/06-String-Methods.html#id1)!

In [18]:
def format_song(song):
    formatted_song = song.lower()
    formatted_song = formatted_song.replace(' ', '-')
    formatted_song = formatted_song.replace('(', '')
    formatted_song = formatted_song.replace(')', '')
    
    return formatted_song

Test your function on these two song titles to make sure it's working correctly.

In [19]:
format_song('WTF (Where They From)')

'wtf-where-they-from'

In [20]:
format_song('Work It')

'work-it'
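
The chained `.replace()` calls handle spaces and parentheses, but any other punctuation would pass through unchanged. As a sketch only, the hypothetical `format_song_regex()` below collapses every run of characters other than lowercase letters and digits into a single hyphen. Whether the result matches the real Genius URL for a given title is an assumption you would need to check in the browser.

```python
import re

def format_song_regex(song):
    # Lowercase first so the pattern only needs to allow a-z and 0-9
    lowered = song.lower()
    # Replace each run of other characters (spaces, parentheses, etc.) with one hyphen
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Trim hyphens left at the ends, for example by a closing parenthesis
    return hyphenated.strip("-")

for title in ['Work It', 'WTF (Where They From)', 'The Rain (Supa Dupa Fly)']:
    print(format_song_regex(title))
```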

### Put It All Together

In [22]:
song_titles = ['Work It', 'WTF (Where They From)', 'The Rain (Supa Dupa Fly)']

Now use your `format_song()` function to create the variable `formatted_song`, which will allow the code below to work.

In [23]:
for song in song_titles:
    formatted_song = format_song(song)
    response = requests.get(f"https://genius.com/Missy-elliott-{formatted_song}-lyrics")
    html_string = response.text
    document = BeautifulSoup(html_string, "html.parser")
    lyrics = document.find('p').text
    print(lyrics)

[Intro]
DJ, please pick up your phone, I'm on the request line
This is a Missy Elliott one-time exclusive, come on

[Chorus]
Is it worth it? Let me work it
I put my thing down, flip it and reverse it
Ti esrever dna ti pilf, nwod gniht ym tup
Ti esrever dna ti pilf, nwod gniht ym tup
If you got a big *elephant trumpet*, let me search ya
And find out how hard I gotta work ya
Ti esrever dna ti pilf, nwod gniht ym tup
Ti esrever dna ti pilf, nwod gniht ym tup
C'mon

[Verse 1]
I'd like to get to know ya so I could show ya
Put the pussy on ya like I told ya
Give me all your numbers so I can phone ya
Your girl acting stank, then call me over
Not on the bed, lay me on your sofa
Call before you come, I need to shave my chocha
You do or you don't or you will or won't ya?
Go downtown and eat it like a vulture
See my hips and my tips, don't ya?
See my ass and my lips, don't ya?
Lost a few pounds and my waist for ya
This the kinda beat that go ra-ta-ta
Ra-ta-ta-ta-ta-ta-ta-ta-ta-ta
Sex me so good I

## Write Lyrics to a Text File

In [152]:
song_titles = ['Work It', 'WTF (Where They From)', 'The Rain (Supa Dupa Fly)']

Here we are writing the lyrics to a text file rather than printing them out.

Again, use your `format_song()` function to create the variable `formatted_song`, which will allow the code below to work.

In [160]:
with open('Missy-Elliott-Lyrics.txt', mode='w') as file_object:
    
    for song in song_titles:
        formatted_song = format_song(song)  # Use your format_song() function here
        response = requests.get(f"https://genius.com/Missy-elliott-{formatted_song}-lyrics")
        html_string = response.text
        document = BeautifulSoup(html_string, "html.parser")
        lyrics = document.find('p').text
        
        # Add a blank line after each song so that the last word of one song
        # doesn't run into the first word of the next
        file_object.write(lyrics + "\n\n")

## Count Top Words From File

If we wanted to find out the most frequent words in Missy Elliott's lyrics, we could use the word counter code that we've used in previous lessons.

In [24]:
import re
from collections import Counter

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']


def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

def get_top_words(full_text, number_of_words=20):
    all_the_words = split_into_words(full_text)
    meaningful_words = [word for word in all_the_words if word not in stopwords]
    meaningful_words_tally = Counter(meaningful_words)
    most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_words)
    return most_frequent_meaningful_words

Let's read in the file that we created and get the top words.

In [25]:
with open('Missy-Elliott-Lyrics.txt') as file_object:
    missy_lyrics = file_object.read()

get_top_words(missy_lyrics)

[('ti', 33),
 ('rain', 32),
 ('stand', 29),
 ('ya', 23),
 ('like', 21),
 ('m', 20),
 ('window', 20),
 ('esrever', 16),
 ('dna', 16),
 ('pilf', 16),
 ('nwod', 16),
 ('gniht', 16),
 ('ym', 16),
 ('tup', 16),
 ('missy', 14),
 ('elliott', 13),
 ('make', 13),
 ('let', 12),
 ('get', 12),
 ('work', 11)]
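
Some entries in this list come from how the text is processed rather than from word choice. Words like `esrever` and `pilf` are the reversed chorus line in "Work It," and `m` is what is left of `I'm`, because `\W+` treats the apostrophe as a separator. The sketch below compares the notebook's `re.split()` approach with one possible alternative, a `re.findall()` pattern that keeps apostrophes inside words. The alternative pattern is an illustration, not part of the original word counter.

```python
import re

line = "DJ, please pick up your phone, I'm on the request line"

# The notebook's approach: split on runs of non-word characters
split_words = re.split(r"\W+", line.lower())
print(split_words)

# Alternative: keep letters, digits, and apostrophes together
kept_words = re.findall(r"[a-z0-9']+", line.lower())
print(kept_words)
```

With the second pattern, contractions such as `i'm` stay whole, so you would probably want to add them to the stopwords list as well.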

## What patterns do you notice about the top 20 words from these Missy Elliott songs?
Feel free to open the text file in the file browser at the left and inspect the lyrics manually.

## Bonus: If You Wanted to Change the Artist...

In [26]:
artist = 'Bts'
song_titles = ['Dynamite', 'Euphoria', 'Fake Love']

for song in song_titles:
    formatted_song = format_song(song)
    response = requests.get(f"https://genius.com/{artist}-{formatted_song}-lyrics")
    html_string = response.text
    document = BeautifulSoup(html_string, "html.parser")
    lyrics = document.find('p').text
    print(lyrics)

[Intro: Jungkook]
'Cause I, I, I'm in the stars tonight
So watch me bring the fire and set the night alight

[Verse 1: Jungkook]
Shoes on, get up in the morn'
Cup of milk, let's rock and roll
King Kong, kick the drum
Rolling on like a Rolling Stone
Sing song when I'm walkin' home
Jump up to the top, LeBron
Ding-dong, call me on my phone
Ice tea and a game of ping pong

[Pre-Chorus: RM, j-hope]
This is gettin' heavy, can you hear the bass boom? I'm ready (Woo-hoo)
Life is sweet as honey, yeah, this beat cha-ching like money, huh
Disco overload, I'm into that, I'm good to go
I'm diamond, you know I glow up
Hey, so let's go

[Chorus: Jungkook, Jimin]
'Cause I, I, I'm in the stars tonight
So watch me bring the fire and set the night alight (Hey)
Shinin' through the city with a little funk and soul
So I'ma light it up like dynamite, woah-oh-oh

[Verse 2: V, RM]
Bring a friend, join the crowd, whoever wanna come along
Word up, talk the talk, just move like we off the wall
Day or night, the s
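
To reuse the same pattern for any artist, the URL-building step can be pulled into its own function. The sketch below is a minimal version: `genius_url()` is a hypothetical helper, and its rule for the artist part of the URL (hyphens for spaces, only the first letter capitalized) is inferred from the two artists in this notebook, `Missy-elliott` and `Bts`, so it may not hold for every artist on Genius.

```python
def format_song(song):
    # Same steps as the format_song() function defined earlier
    formatted_song = song.lower()
    formatted_song = formatted_song.replace(' ', '-')
    formatted_song = formatted_song.replace('(', '')
    formatted_song = formatted_song.replace(')', '')
    return formatted_song

def genius_url(artist, song):
    # Assumed rule: hyphens for spaces, first letter upper case, the rest lower case
    formatted_artist = artist.replace(' ', '-').capitalize()
    return f"https://genius.com/{formatted_artist}-{format_song(song)}-lyrics"

# Print the URLs instead of requesting them, so they can be checked by eye first
for song in ['Dynamite', 'Euphoria', 'Fake Love']:
    print(genius_url('BTS', song))
```

Printing the URLs before passing them to `requests.get()` makes it easy to spot a badly formatted title without sending any requests.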

## Group Discussion

* Do you think scholars should use web scraping in their research? Why or why not?
* How would you feel if you found out that one of your social media posts had been included in an academic article without your knowledge?
* What are some strategies that you think scholars might use to do web scraping in an ethical way?