# Getting Data from the Web

We can use the `requests` library to make HTTP requests, including calls to a web API, and the [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) library to parse the HTML that a web page returns.

In [1]:
# import our libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup

Let's start by making [an API request](https://www.cloudflare.com/learning/security/api/what-is-api-call/) to https://icanhazdadjoke.com/, a site that provides random dad jokes. The documentation for the API is at https://icanhazdadjoke.com/api.

In [2]:
# request a random joke; the Accept header asks for JSON rather than the default HTML page
res = requests.get("https://icanhazdadjoke.com/", headers={"Accept": "application/json"})

# parse the JSON response body into a Python dict
res_json = res.json()

# add the joke to a dataframe
df = pd.DataFrame([res_json])

# display the dataframe
df

|  | id | joke | status |
|---|---|---|---|
| 0 | RCQKuHJBdib | I been watching a channel on TV that is strict... | 200 |
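
The cell above assumes the request succeeds. As a minimal sketch, the same call can be wrapped in a function that sets a timeout and raises on HTTP errors before parsing the body. The helper names `fetch_joke` and `joke_to_frame` and the 10-second timeout are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd
import requests

JOKE_URL = "https://icanhazdadjoke.com/"


def fetch_joke(timeout=10):
    """Request one random joke as JSON and return the parsed dict.

    Hypothetical helper: the name and the 10-second default timeout
    are assumptions made for this sketch.
    """
    res = requests.get(
        JOKE_URL,
        headers={"Accept": "application/json"},
        timeout=timeout,  # give up instead of waiting indefinitely
    )
    # raise requests.HTTPError on 4xx/5xx responses before trying to parse
    res.raise_for_status()
    return res.json()


def joke_to_frame(payload):
    """Build a one-row dataframe from a joke payload (hypothetical helper)."""
    return pd.DataFrame([payload])
```

Calling `joke_to_frame(fetch_joke())` should give the same kind of one-row dataframe as above, while a slow server or an error status surfaces as an exception instead of a failure inside `res.json()`.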


We can also use Beautiful Soup to parse the HTML from a web page. Let's try scraping the [Wikipedia page for the Python programming language](https://en.wikipedia.org/wiki/Python_(programming_language)).

In [7]:
# Scrape the Wikipedia page

# use requests to get the page html
res = requests.get('https://en.wikipedia.org/wiki/Python_(programming_language)')

# use beautiful soup to parse the html
soup = BeautifulSoup(res.text, 'html.parser')

# find all the links on the page
links = soup.find_all('a', href=True)

# keep internal /wiki/ links (skipping File: pages) and build absolute urls
link_data = []
for link in links:
    if link['href'].startswith('/wiki/') and not link['href'].startswith('/wiki/File:'):
        link_data.append({'Text': link.get_text(), 'URL': 'https://en.wikipedia.org' + link['href']})

# make a dataframe from the link data
df = pd.DataFrame(link_data)

df

|  | Text | URL |
|---|---|---|
| 0 | Main page | https://en.wikipedia.org/wiki/Main_Page |
| 1 | Contents | https://en.wikipedia.org/wiki/Wikipedia:Contents |
| 2 | Current events | https://en.wikipedia.org/wiki/Portal:Current_e... |
| 3 | Random article | https://en.wikipedia.org/wiki/Special:Random |
| 4 | About Wikipedia | https://en.wikipedia.org/wiki/Wikipedia:About |
| ... | ... | ... |
| 1128 | Articles with NKC identifiers | https://en.wikipedia.org/wiki/Category:Article... |
| 1129 | Articles with SUDOC identifiers | https://en.wikipedia.org/wiki/Category:Article... |
| 1130 | Articles with example Python (programming lang... | https://en.wikipedia.org/wiki/Category:Article... |
| 1131 | About Wikipedia | https://en.wikipedia.org/wiki/Wikipedia:About |
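
The result above still contains namespace pages such as `Wikipedia:`, `Portal:`, `Special:`, and `Category:`, and some links repeat (About Wikipedia appears at rows 4 and 1131). As a rough sketch, one way to narrow it down is to skip any path with a colon after `/wiki/` and then drop duplicate URLs. The colon rule is a simplifying assumption, since a few article titles legitimately contain colons. The function name `extract_article_links` and the inline HTML sample are hypothetical stand-ins so the example runs without a network call.

```python
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://en.wikipedia.org"

# Tiny stand-in for a downloaded page (made up for this sketch)
SAMPLE_HTML = """
<a href="/wiki/Main_Page">Main page</a>
<a href="/wiki/Wikipedia:About">About Wikipedia</a>
<a href="/wiki/Guido_van_Rossum">Guido van Rossum</a>
<a href="/wiki/Guido_van_Rossum">Van Rossum</a>
<a href="/wiki/File:Python-logo.svg">logo</a>
<a href="https://www.python.org/">Official site</a>
<a>no href</a>
"""


def extract_article_links(html, base_url=BASE_URL):
    """Return a dataframe of unique internal article links (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for link in soup.find_all("a", href=True):
        href = link["href"]
        # assumption: a colon after /wiki/ marks a namespace page (File:, Category:, ...)
        if href.startswith("/wiki/") and ":" not in href[len("/wiki/"):]:
            rows.append({
                "Text": link.get_text(strip=True),
                "URL": urljoin(base_url, href),
            })
    df = pd.DataFrame(rows, columns=["Text", "URL"])
    # keep the first occurrence of each URL and renumber the rows
    return df.drop_duplicates(subset="URL").reset_index(drop=True)


article_links = extract_article_links(SAMPLE_HTML)
```

With the response from the cell above, `extract_article_links(res.text)` would apply the same filter to the real page.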


To learn more, check out the [documentation for requests](https://docs.python-requests.org/en/master/) and the [Beautiful Soup documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).