# 00 Introduction

## HTML

If our aim is to scrape websites, we first have to talk about HTML, because behind every web page is an HTML document. While we're not going to write any HTML in this course, we do have to know how to read it!

If you're coming from a web development background, or if you've written some HTML, this little introduction will be a breeze! And if you have no idea what HTML is or what it looks like, don't sweat! We'll start at the beginning...

Fire up your favourite web browser (I like Firefox), and bring up [Google](https://www.google.com):

<img src="images/google_home.png" width=500 />

Google is a great case study in HTML because it's famously minimal. To see the underlying HTML that renders the Google home page inside the browser, right click anywhere on the page and select `Inspect Element`:

<img src="images/google_inspect.png" width=500 />

This will bring up the "Inspector":

<img src="images/google_html.png" width=500 />

The Inspector connects each section of HTML code to each section of the displayed page. Hovering over a piece of code in the Inspector will highlight the linked element inside the browser.

## Boilerplate

There are a lot of `<angled>` brackets in HTML. And the Google home page is no exception. The page is riddled with `<div>`, `<span>` and `<style>` tags, each helping, in their own way, to structure and render the result that we see inside the browser. Though Google is (relatively) simple in HTML terms, there's a lot of code in the Inspector that deserves unpacking. We won't. Instead, let's take a couple of gigantic steps back to look at, and appreciate, the minimum amount of boilerplate HTML code required to render a (blank) page:

```html
<!DOCTYPE html>
<html>
  <head>
    <title></title>
  </head>
  <body>
  </body>
</html>
```

A couple of things to note:

1. The document type is declared at the top
2. The entire page is wrapped in an `<html>` tag
3. Open tags (`<tag>`) are eventually followed by close tags (`</tag>`)
4. The page is divided into two parts (`head` and `body`)


Every HTML document is neatly segmented into two parts:

- head: metadata and scripts and styling
- body: actual content
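
To make the nesting concrete, here is a minimal sketch that walks the boilerplate with `html.parser` from Python's standard library (not the tool used in the rest of this notebook) and prints each opening tag indented by its depth. The `Outline` class is a made-up name for illustration only.

```python
from html.parser import HTMLParser

BOILERPLATE = """<!DOCTYPE html>
<html>
  <head>
    <title></title>
  </head>
  <body>
  </body>
</html>"""


class Outline(HTMLParser):
    """Record each opening tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        # An open tag starts a new level of nesting
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        # The matching close tag ends that level
        self.depth -= 1


parser = Outline()
parser.feed(BOILERPLATE)
outline = parser.outline

for depth, tag in outline:
    print("  " * depth + tag)
```

The doctype declaration is not a tag, so it doesn't show up in the outline; `head` and `body` both sit one level inside `html`.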

Here's a more complete page (still not very impressive):

In [1]:
with open('data/bad.html', 'r') as f:
    html = f.read()
    
from IPython.display import HTML; HTML(html)

Looking at the raw html text we can see the "page" rendered with the following code:

In [2]:
print(html)

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>This is HTML</title>
  </head>
  <body>
    <h1>This is HTML</h1>
    <p>It's not the greatest...</p>
    <div class='foo'>...but it is <i>functional</i>.</div>
    <br />
    <div>For good measure, here's some more of it!</div>
    <p>And an image:</p>
    <img src='https://invisiblebread.com/comics-firstpanel/2015-03-03-scrape.png' height='200' />
    <p id='bar'>Isn't HTML great?!</p>
  </body>
</html>



### gazpacho

Notice the various different tags in the "This is HTML" document. And now imagine that we want to extract information from this page. In order to get all of the `<p>` tags, for instance, we'll use a tool called [gazpacho](https://github.com/maxhumber/gazpacho) that can be installed at the command line with:

In [None]:
!pip install gazpacho

The main part of gazpacho is the `Soup` wrapper, which allows us to parse HTML documents. It's imported like so:

In [3]:
from gazpacho import Soup

To enable parsing, first wrap the html string in a gazpacho `Soup` object:

In [4]:
soup = Soup(html)

And use the main `find` method on the tag you wish to target:

In [5]:
soup.find('p')

[<p>It's not the greatest...</p>,
 <p>And an image:</p>,
 <p id="bar">Isn't HTML great?!</p>]

The `find` method, by default, will return a list if there is more than one element that shares that tag, or a soup object if there's just one.
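
Because the shape of the result depends on how many elements match, code that loops over a `find` result can trip up when only one element (or none) is found. Here's a small sketch of a helper that always hands back a list. The name `as_list` is hypothetical (it is not part of gazpacho), and the sketch assumes that `find` returns `None` when nothing matches. Plain strings stand in for soup objects so the example runs without gazpacho installed.

```python
def as_list(result):
    """Normalise a find()-style result (None, one object, or a list) to a list."""
    if result is None:
        return []
    if isinstance(result, list):
        return result
    return [result]


# Stand-in values; in practice these would come from soup.find(...)
nothing = as_list(None)
one = as_list("<p>one</p>")
many = as_list(["<p>one</p>", "<p>two</p>"])

for result in (nothing, one, many):
    print(len(result), result)
```

If I recall the gazpacho API correctly, passing `mode='all'` to `find` also forces a list, but treat that as an assumption to check against the gazpacho README.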

To isolate on specific tags, we can target tag attributes (`attrs`) in a Python dictionary. So, if we're interested in scraping this slice of html: 

`<p id="bar">Isn't HTML great?!</p>` 

We can run:

In [6]:
soup.find('p', attrs={'id': 'bar'})

<p id="bar">Isn't HTML great?!</p>

To get the text inside the HTML, we can ask gazpacho to return the `.text` attribute:

In [7]:
soup.find('p', {'id': 'bar'}).text

"Isn't HTML great?!"

And to find all the `div`s on the page we can do the same thing but with `div` as the first argument:

In [8]:
soup.find('div')

[<div class="foo">...but it is <i>functional</i>.</div>,
 <div>For good measure, here's some more of it!</div>]

To get just the first `div` (and ignore the rest):

In [9]:
soup.find('div', mode='first')

<div class="foo">...but it is <i>functional</i>.</div>

And to isolate the `div` tags that have `class=foo`:

In [10]:
soup.find('div', {'class': 'foo'}).text

'...but it is'

You can literally isolate any tag!

In [11]:
soup.find('i').text

'functional'

But sometimes you just want to get rid of the tags. This is accomplished by calling:

In [12]:
soup.find('div', {'class': 'foo'}).remove_tags()

'...but it is functional.'

# 01 get

HTML is the stuff of websites. Importing HTML documents from our computer is neither fun nor realistic! So let's "get" HTML from an actual website.

To get, or download the HTML from a specific page we'll use the `get` function from gazpacho:

In [13]:
from gazpacho import get

### Status Codes

If everything is hunky-dory, `get` will just return the raw HTML. But if something is wrong, it will raise an error that carries the HTTP status code.

While everyone is familiar with 404 and maybe 503, here's a helpful list of some common codes that you might encounter in the wild. Most importantly, 400s are your fault and 500s are the website's fault:

- 1xx Informational
- 2xx Success
    - 200 - OK
- 3xx Redirection
- 4xx Client Error (a.k.a. **your fault**)
    - 400 - Bad Request
    - 401 - Unauthorized
    - 403 - Forbidden
    - 404 - Not Found
    - 418 - 🍵
    - 429 - Too many requests
- 5xx Server Error (a.k.a. **their fault**)
    - 500 - Internal Server Error
    - 501 - Not Implemented
    - 502 - Bad Gateway
    - 503 - Service Unavailable
    - 504 - Gateway Timeout
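
The first digit is all you need to sort a code into one of these classes. Here's a rough sketch using only the standard library; `status_class` and `whose_fault` are made-up helper names, and `http.HTTPStatus` is used just to look up the official phrase for each code.

```python
from http import HTTPStatus


def status_class(code):
    """Map a status code to the class names used in the list above."""
    classes = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    return classes.get(code // 100, "Unknown")


def whose_fault(code):
    """Rule of thumb from the text: 4xx is on the client, 5xx is on the server."""
    return {4: "yours", 5: "theirs"}.get(code // 100, "nobody's")


for code in (200, 403, 404, 503):
    print(code, HTTPStatus(code).phrase, "|", status_class(code), "|", whose_fault(code))
```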

Uncomment and run to see how gazpacho handles HTTP status codes:

In [None]:
# get('https://httpstat.us/403')

In [None]:
# get('https://httpstat.us/404')

In [None]:
# get('https://httpstat.us/418')

### Structuring a `get` request

Often we'll just need to point `get` at a URL. But sometimes we'll need to manipulate the URL string to return specific information from a page. Here's a query string that might search for all cars with a model year of 2020 and a colour equal to black:

In [14]:
url = 'https://httpbin.org/anything?year=2020&colour=black'

get(url)

{'args': {'colour': 'black', 'year': '2020'},
 'data': '',
 'files': {},
 'form': {},
 'headers': {'Accept-Encoding': 'identity',
  'Host': 'httpbin.org',
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0',
  'X-Amzn-Trace-Id': 'Root=1-5f47b813-16dff67d45adcf10433f4cd3'},
 'json': None,
 'method': 'GET',
 'origin': '50.101.35.196',
 'url': 'https://httpbin.org/anything?year=2020&colour=black'}

If instead we wanted red cars made in 2016 we could edit the string, or we could do something a little more Pythonic and use a params dictionary instead:

In [15]:
url = 'https://httpbin.org/anything'

r = get(
    url, 
    params={'year': 2016, 'colour': 'red'}, 
    headers={'User-Agent': 'gazpacho'}
)

r

{'args': {'colour': 'red', 'year': '2016'},
 'data': '',
 'files': {},
 'form': {},
 'headers': {'Accept-Encoding': 'identity',
  'Host': 'httpbin.org',
  'User-Agent': 'gazpacho',
  'X-Amzn-Trace-Id': 'Root=1-5f47b863-36773bcc54045184914673e4'},
 'json': None,
 'method': 'GET',
 'origin': '50.101.35.196',
 'url': 'https://httpbin.org/anything?year=2016&colour=red'}
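
To see how a params dictionary relates to the query string, here's a sketch using `urllib.parse` from the standard library. It isn't necessarily how gazpacho builds the URL internally; it just shows the equivalent encoding, and the round trip back to a dictionary.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base = "https://httpbin.org/anything"
params = {"year": 2016, "colour": "red"}

# Dictionary -> query string
query = urlencode(params)
url = f"{base}?{query}"
print(url)

# Query string -> dictionary (every value comes back as a list of strings)
args = parse_qs(urlsplit(url).query)
print(args)
```

Notice that the integer `2016` comes back as the string `'2016'`, just like in the `args` echoed by httpbin above.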

# 02 Scrape World

The `get` requests that we've been looking at are still somewhat artificial... I bet you just want to start scraping already! Me too! But there's a problem...

Building a web scraping course is hard. Because by the time this is published it could be that all of the examples are out of date. And it wouldn't be my fault. The web is always changing! 

So, to solve this problem, I've created a Web Scraping Sandbox that replicates some familiar pages (that won't change) available at: www.scrape.world

If, for some reason www.scrape.world is down ($$$) you can grab source code from the repo [here](https://github.com/maxhumber/scrape.world), spin up a local application and change all the base urls accordingly:

In [16]:
local = False

if local: 
    url = 'http://localhost:5000'
else:
    url = "https://scrape.world"

In this first www.scrape.world example let's scrape all of the link tags in the `section-speech` part of the page:

In [18]:
from gazpacho import get, Soup

url = "https://scrape.world/soup"
html = get(url)
soup = Soup(html)

fos = soup.find("div", {"class": "section-speech"})

links = []
for a in fos.find("a"):
    try:
        link = a.attrs["href"]
        links.append(link)
    except (AttributeError, KeyError, TypeError):  # no attrs, or no href
        pass

links = [l for l in links if "wikipedia.org" in l]

links

['https://en.wikipedia.org/wiki/Alphabet_soup_(linguistics)',
 'https://en.wikipedia.org/wiki/Alphabet',
 'https://en.wikipedia.org/wiki/Abiogenesis',
 'https://en.wikipedia.org/wiki/Soup_kitchen',
 'https://en.wikipedia.org/wiki/Stone_soup',
 'https://en.wikipedia.org/wiki/Souperism',
 'https://en.wikipedia.org/wiki/Great_Famine_(Ireland)',
 'https://en.wikipedia.org/wiki/Tag_soup',
 'https://en.wikipedia.org/wiki/HTML']
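
The substring check (`"wikipedia.org" in l`) works for this page, but it would also let through any link that merely mentions wikipedia.org somewhere in the URL. Here's a slightly stricter sketch that compares the hostname instead. The helper name `is_wikipedia` and the last two links in the sample list are made up for illustration.

```python
from urllib.parse import urlsplit


def is_wikipedia(link):
    """True only if the link's hostname is wikipedia.org or one of its subdomains."""
    host = urlsplit(link).hostname or ""
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


links = [
    "https://en.wikipedia.org/wiki/Stone_soup",
    "https://en.wikipedia.org/wiki/Tag_soup",
    "https://example.com/?ref=wikipedia.org",  # made-up link that only mentions Wikipedia
    "/soup",  # made-up relative link with no hostname at all
]

wiki_links = [link for link in links if is_wikipedia(link)]
print(wiki_links)
```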

# 03 Tables

Here's how we might scrape the total spend for each team on this fictional Salary Cap page:

In [19]:
from gazpacho import get, Soup

url = "https://scrape.world/spend"
html = get(url)
soup = Soup(html)

trs = soup.find("tr", {"class": "tmx"})


def parse_tr(tr):
    team = tr.find("td", {"data-label": "TEAM"}).text
    spend = float(
        tr.find("td", {"data-label": "TODAYS CAP HIT"}).text.replace(",", "")[1:]
    )
    return team, spend


spend = [parse_tr(tr) for tr in trs]

spend

[('Toronto Pine Needles', 95929643.0),
 ('Arizona Dingos', 87349818.0),
 ('Buffalo Knives', 86968691.0),
 ('Dallas Celebrities', 82349165.0),
 ('St. Louis Doldrums', 82862927.0),
 ('Vancouver Whales', 83580706.0),
 ('Philadelphia Travellers', 83494245.0),
 ('Boston Kodiaks', 81394166.0),
 ('Chicago Greyfalcons', 82984294.0),
 ('Vegas Shining Templars', 81833332.0),
 ('Florida Jaguars', 82432002.0),
 ('San Jose Charlatans', 81395750.0),
 ('Washington Investments', 80589294.0),
 ('Edmonton Workers', 80901164.0),
 ('Detroit Carmine Feathers', 82133668.0),
 ('Pittsburgh Puffins', 80657875.0),
 ('Carolina Cyclones', 80405665.0),
 ('Calgary Flares', 78848375.0),
 ('Nashville Carnivores', 79779643.0),
 ('Tampa Bay Thunder', 79103331.0),
 ('Minnesota Savage', 78420255.0),
 ('New York Officials', 78837300.0),
 ('Anaheim Mallards', 78173090.0),
 ('Montreal Quebecers', 79868809.0),
 ('Winnipeg Airplanes', 77652021.0),
 ('Los Angeles Monarchs', 76517727.0),
 ('New York Indwellers', 76554999.0),
 (
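
The trickiest line in `parse_tr` is the one that turns the cap hit text into a number. Here's that step pulled out on its own as a sketch. It assumes the cell text looks like `$95,929,643` (a leading dollar sign and comma separators), which is what the `[1:]` slice and the `replace(",", "")` call in `parse_tr` suggest. The helper name `parse_money` is hypothetical.

```python
def parse_money(text):
    """Turn a string like '$95,929,643' into a float."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    return float(cleaned)


print(parse_money("$95,929,643"))
```

Using `lstrip("$")` rather than slicing off the first character means the helper still works if the dollar sign happens to be missing.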

# 04 Credentials

Sometimes what you're looking for is locked behind a login page. As long as you have a user account for that website, you can use Selenium to drive a real browser, capture the rendered HTML, and use gazpacho as normal.

To install Selenium run:

In [None]:
!pip install selenium

And follow the additional setup instructions [here](https://stackoverflow.com/a/42231328/3731467).

Using Selenium to log in with our credentials, we can grab the data at the `/season` endpoint by running:

In [20]:
%%writefile credentials.py

from gazpacho import Soup
import pandas as pd
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options

url = "https://scrape.world/season"

options = Options()
options.headless = True
browser = Firefox(executable_path="/usr/local/bin/geckodriver", options=options)
browser.get(url)

# username
username = browser.find_element_by_id("username")
username.clear()
username.send_keys("admin")

# password
password = browser.find_element_by_name("password")
password.clear()
password.send_keys("admin")

# submit
browser.find_element_by_xpath("/html/body/div/div/form/div/input[3]").click()

# refetch page (just in case)
browser.get(url)

html = browser.page_source
soup = Soup(html)

tables = pd.read_html(browser.page_source)
east = tables[0]
west = tables[1]
df = pd.concat([east, west], axis=0)
df["W"] = df["W"].apply(pd.to_numeric, errors="coerce")
df = df.dropna(subset=["W"])
df = df[["Team", "W"]]
df = df.rename(columns={"Team": "team", "W": "wins"})
df = df.sort_values("wins", ascending=False)

print(df)

Writing credentials.py


In [None]:
!python credentials.py

# 05 Interactions 1

Sometimes a website allows us to filter the data displayed on the page with dropdowns and search bars. To interact with dropdowns and other page elements we can use Selenium as well:

In [21]:
%%writefile interactions1.py

import time
from gazpacho import Soup
import pandas as pd
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import Select

url = "https://scrape.world/results"

options = Options()
options.headless = True
browser = Firefox(executable_path="/usr/local/bin/geckodriver", options=options)
browser.get(url)

# username
username = browser.find_element_by_id("username")
username.clear()
username.send_keys("admin")
time.sleep(0.5)

# password
password = browser.find_element_by_name("password")
password.clear()
password.send_keys("admin")
time.sleep(0.5)

# submit
browser.find_element_by_xpath("/html/body/div/div/form/div/input[3]").click()
time.sleep(0.5)

# refetch page (just in case)
browser.get(url)

search = browser.find_element_by_xpath("/html/body/div/div/div[2]/div[2]/label/input")
search.clear()
search.send_keys("toronto")
time.sleep(0.5)

drop_down = Select(
    browser.find_element_by_xpath("/html/body/div/div/div[2]/div[1]/label/select")
)
drop_down.select_by_visible_text("100")
time.sleep(0.5)

html = browser.page_source
soup = Soup(html)
df = pd.read_html(str(soup.find("table")))[0]

print(df)

Writing interactions1.py


In [None]:
!python interactions1.py
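
The fixed `time.sleep(0.5)` pauses in the script are a guess: too short and the page isn't ready, too long and the scraper wastes time. A common alternative is to poll until some condition holds. Here's a minimal pure-Python sketch of that idea. The `wait_until` helper is hypothetical, and `page_ready` is a stand-in for a real check such as "the table has rows". Selenium ships its own waiting helper, `WebDriverWait`, which is probably the better choice in a real script.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Call condition() repeatedly until it returns something truthy or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page that only becomes ready on the third check
checks = {"count": 0}


def page_ready():
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(page_ready, timeout=2.0, interval=0.01)
print(checks["count"])
```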

# 06 Interactions 2

Piggybacking on the last example, here's how we might extract data that iteratively loads on scroll:

In [None]:
%%writefile interactions2.py

from gazpacho import Soup
import pandas as pd
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.keys import Keys

base = "https://scrape.world/"
endpoint = "population"
url = base + endpoint

options = Options()
options.headless = True
browser = Firefox(executable_path="/usr/local/bin/geckodriver", options=options)
browser.get(url)

poplist = browser.find_element_by_id('infinite-list')

days = 365
n = 0
while n < days:
    browser.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', poplist)
    html = browser.page_source
    soup = Soup(html)
    n = len(soup.find('ul', {'id': 'infinite-list'}).find('li'))

lis = soup.find('ul', {'id': 'infinite-list'}).find('li')

def parse_li(li):
    day, population = li.text.split(' Population ')
    population = int(population)
    day = int(day.split('Day ')[-1])
    return {'day': day, 'population': population}

population = [parse_li(li) for li in lis][:days]
print(population[:20])

In [None]:
!python interactions2.py
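
To see what `parse_li` does without firing up a browser, here's the same parsing logic applied to plain strings. The sketch assumes each list item reads like `Day 1 Population 1000`, which is what the two `split` calls in the script imply. The sample values are made up.

```python
def parse_li_text(text):
    """Split 'Day <n> Population <count>' into a small dictionary."""
    day, population = text.split(" Population ")
    return {"day": int(day.split("Day ")[-1]), "population": int(population)}


samples = ["Day 1 Population 1000", "Day 2 Population 1012"]  # made-up values
rows = [parse_li_text(sample) for sample in samples]
print(rows)
```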

# 07 Downloading

Sometimes we don't want HTML, but instead to extract an image, a video, or an audio clip from a web page. Here's how we might do that:

In [None]:
from pathlib import Path
from shutil import rmtree as delete
from urllib.request import urlretrieve as download
from gazpacho import get, Soup

dir = "media"
Path(dir).mkdir(exist_ok=True)

base = "https://scrape.world"
url = base + "/books"
html = get(url)
soup = Soup(html)

# download images
imgs = soup.find("img")
srcs = [i.attrs["src"] for i in imgs]

for src in srcs:
    name = src.split("/")[-1]
    download(base + src, f"{dir}/{name}")

# download audio
audio = soup.find("audio").find("source").attrs["src"]
name = audio.split("/")[-1]
download(base + audio, f"{dir}/{name}")

# download video
video = soup.find("video").find("source").attrs["src"]
name = video.split("/")[-1]
download(base + video, f"{dir}/{name}")

# clean up
delete(dir)
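
Gluing `base + src` together works when every `src` is a path that starts with `/`. If a page mixes absolute URLs and relative paths, `urljoin` from the standard library handles all of the cases. Here's a sketch that resolves a few `src` values and derives a file name for each one. The paths in `srcs` are hypothetical and not taken from the actual page.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit

page = "https://scrape.world/books"
srcs = [
    "/static/cover.png",                # root-relative (hypothetical)
    "media/clip.mp3",                   # relative to the page (hypothetical)
    "https://cdn.example.com/a/b.mp4",  # already absolute (hypothetical)
]

resolved = []
for src in srcs:
    absolute = urljoin(page, src)
    # Take the file name from the URL path only, ignoring any query string
    name = PurePosixPath(urlsplit(absolute).path).name
    resolved.append((absolute, name))

for absolute, name in resolved:
    print(name, "<-", absolute)
```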

# 08 Scheduling (Local)

Everything up until this point has been (hopefully interesting, but nonetheless) table stakes. We want to take our scraping skills to the next level by building a modern web scraper that can run on a schedule. 

Imagine we want to point our scraper at a page to monitor prices and send us notifications for when a sale is happening. Here's how we'd start building:

In [None]:
%%writefile books.py

from gazpacho import get, Soup
import pandas as pd

def parse(book):
    name = book.find("h4").text
    price = float(book.find("p").text[1:].split(" ")[0])
    return name, price

def fetch_books():
    url = "https://scrape.world/books"
    html = get(url)
    soup = Soup(html)
    books = soup.find("div", {"class": "book-"})
    return [parse(book) for book in books]

data = fetch_books()
books = pd.DataFrame(data, columns=["title", "price"])

string = f"Current Prices:\n```\n{books.to_markdown(index=False, tablefmt='grid')}\n```"

print(string)

**Scheduling** 

In order to schedule this script to execute at some cadence we'll use [hickory](https://github.com/maxhumber/hickory) (`pip install hickory`):

```
hickory schedule books.py --every=30seconds
```

To check the status of a hickory script, run:

```
hickory status
```

And to kill a schedule:

```
hickory kill books.py
```

**Slack over print()**

To send results to Slack instead of printing to a log file, we'll use [`slackclient`](https://github.com/slackapi/python-slackclient), the official Slack API client for Python:

```
pip install slackclient
```

In order to build a Slack Bot, we'll need a Slack API token, which will require us to do the following:

1. Create a new Slack App

Follow this [link](https://api.slack.com/apps) to open up the Apps Portal and click *Create New App*

2. Add permissions

In the menu on the left, find *OAuth and Permissions*. Click it, and scroll down to the *Scopes* section. Click *Add an OAuth Scope*.

Search for the *chat:write* and *chat:write.public* scopes, and add them. At this point, you can install the app to your workspace.

3. Copy the token to a `.env` file

On the same page you'll find your access token under the label *Bot User OAuth Access Token*. Copy this token, and save it to a `.env` file.

It should look like this:

```
SLACK_API_TOKEN=xoxb-000000000-000000000-a0a0a0a0a0a0a0a0a0a0a0a0
```

Once you have a Slack API token, we can adjust the original Python script to send messages to a Slack channel of our choosing:

In [None]:
%%writefile booksbot.py

import os
import sqlite3

from gazpacho import get, Soup
from dotenv import find_dotenv, load_dotenv # pip install python-dotenv
import pandas as pd
from slack import WebClient # pip install slackclient

load_dotenv(find_dotenv())

con = sqlite3.connect("data/books.db")
cur = con.cursor()

slack_token = os.environ["SLACK_API_TOKEN"]
client = WebClient(token=slack_token)

def parse(book):
    name = book.find("h4").text
    price = float(book.find("p").text[1:].split(" ")[0])
    return name, price

def fetch_books():
    url = "https://scrape.world/books"
    html = get(url)
    soup = Soup(html)
    books = soup.find("div", {"class": "book-"})
    return [parse(book) for book in books]

data = fetch_books()
books = pd.DataFrame(data, columns=["title", "price"])
books['date'] = pd.Timestamp("now")

books.to_sql('books', con, if_exists='append', index=False)
average = pd.read_sql("select title, round(avg(price),2) as average from books group by title", con)
df = pd.merge(books[['title', 'price']], average)

string = f"Current Prices:```\n{df.to_markdown(index=False, tablefmt='grid')}\n```"

response = client.chat_postMessage(
    channel="books",
    text=string
)

Schedule with `hickory schedule booksbot.py --every=30seconds` to monitor prices on a 30 second cadence.
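
Each scheduled run appends a fresh set of prices to the `books` table, so the average query gets more meaningful over time. Here's a sketch of that accumulate-then-average pattern using only `sqlite3` and an in-memory database, so nothing is written to disk. The titles, prices, and timestamps are made up, and the `order by` clause is an addition to keep the output order predictable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table books (title text, price real, date text)")

# Two pretend runs, 30 seconds apart (made-up data)
rows = [
    ("Book A", 10.0, "2020-08-27 10:00:00"),
    ("Book B", 20.0, "2020-08-27 10:00:00"),
    ("Book A", 12.0, "2020-08-27 10:00:30"),
    ("Book B", 20.0, "2020-08-27 10:00:30"),
]
con.executemany("insert into books values (?, ?, ?)", rows)

# Same aggregation as in booksbot.py, with a stable ordering added
averages = con.execute(
    "select title, round(avg(price), 2) as average "
    "from books group by title order by title"
).fetchall()
print(averages)

con.close()
```

Comparing the current price against this running average is one simple way to decide whether a "sale" notification is worth sending.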

# 09 Serverless (Lambda)

Let's say we want to build an app that scrapes something every day, scheduled on AWS Lambda. Here's the script we'll schedule:

In [1]:
import json
import os
import sys
from urllib.request import Request, urlopen

import pandas as pd

from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

def post(url, data):
    data = bytes(json.dumps(data).encode("utf-8"))
    request = Request(url=url, data=data, method="POST")
    request.add_header("Content-type", "application/json; charset=UTF-8")
    with urlopen(request) as response:
        response = json.loads(response.read().decode("utf-8"))
    return response

url = "https://scrape.world/demand"

tomorrow = (pd.Timestamp('today') + pd.Timedelta('1 day')).strftime("%Y-%m-%d %H:00")
temperature = 21
data = {"date": tomorrow, "temperature": temperature}
response = post(url, data)

text = f"{tomorrow=} demand will be ~{response['demand']} MW"
print(text)

tomorrow='2020-08-28 16:00' demand will be ~6420.0 MW
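
If you're curious about what the `post` helper actually puts on the wire, here's a sketch that builds the same kind of request but stops short of sending it, so it runs without a network connection. The `build_request` name is made up, and the payload reuses the date and temperature from the output above.

```python
import json
from urllib.request import Request


def build_request(url, data):
    """Build (but don't send) a JSON POST request."""
    body = json.dumps(data).encode("utf-8")
    request = Request(url=url, data=body, method="POST")
    request.add_header("Content-type", "application/json; charset=UTF-8")
    return request


request = build_request(
    "https://scrape.world/demand",
    {"date": "2020-08-28 16:00", "temperature": 21},
)

print(request.get_method())
print(request.get_header("Content-type"))
print(json.loads(request.data))
```

The dictionary is serialised to JSON, encoded to bytes, and tagged with a content type so the server knows how to read it.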


**Serverless**

To get it on Lambda we'll use [chalice](https://aws.github.io/chalice/index).

**Pricing**

The AWS Lambda free usage tier includes [1M free requests per month and 400,000 GB-seconds of compute time per month](https://aws.amazon.com/lambda/pricing/).

**Install**

To install Chalice, create and activate a virtual environment:

```
python -m venv venv
source venv/bin/activate
```

And install the Python package with `pip`:

```
pip install chalice
```

**Credentials**

In order to deploy a Chalice app, you'll need to configure some AWS credentials. If you have previously configured your machine to run boto3 (the AWS SDK for Python) or the AWS CLI, then you can skip this section. Otherwise, sign up for an AWS account and generate a new key [here](https://console.aws.amazon.com/iam/home?#/security_credentials) (click the Access Keys dropdown).

If this is your first time configuring credentials for AWS you can follow these steps to quickly get started:

```
mkdir ~/.aws
cat >> ~/.aws/config
[default]
aws_access_key_id=YOUR_ACCESS_KEY_HERE
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY
region=YOUR_REGION (such as ca-central-1, us-east-1, us-west-1, etc)
```

**Project**

With our credentials configured, the next thing to do is use the `chalice` command to create a new project. I'm going to call this project `energybot`:

```
chalice new-project energybot
```

This will create an `energybot` directory.

`cd` into this directory:

```
cd energybot
```

You should see several files that have been created for you:

```
ls -la
drwxr-xr-x   3 max  staff   96 27 Aug 15:38 .chalice
-rw-r--r--   1 max  staff   37 27 Aug 15:38 .gitignore
-rw-r--r--   1 max  staff  734 27 Aug 15:38 app.py
-rw-r--r--   1 max  staff    0 27 Aug 15:38 requirements.txt
```

You can ignore the `.chalice` directory for now; the two main files we'll focus on are `app.py` and `requirements.txt`.

Let's take a look at the `app.py` file:

```python
from chalice import Chalice

app = Chalice(app_name='helloworld')

@app.route('/')
def index():
    return {'hello': 'world'}
```

The `new-project` command created a sample app that defines a single view, `/`, that when called will return the JSON body `{"hello": "world"}`.

**Deploying**

Let's deploy this app.  Make sure you're in the `energybot` directory and run `chalice deploy`:

```
chalice deploy
```

> Creating deployment package.
> Creating IAM role: helloworld-dev
> Creating lambda function: helloworld-dev
> Creating Rest API
> Resources deployed:
>
>   - Lambda ARN: arn:aws:lambda:us-west-2:12345:function:helloworld-dev
>   - Rest API URL: https://abcd.execute-api.us-west-2.amazonaws.com/api/

You now have an API up and running using API Gateway and Lambda:

```
curl https://9my19y6h9l.execute-api.ca-central-1.amazonaws.com/api/
{"hello": "world"}
```

Try making a change to the returned dictionary from the `index()` function.  You can then redeploy your changes by running `chalice deploy`.

**Customizing**

Let's integrate our `energy.py` file into the app:

```python
import json
import os
import sys
from urllib.request import Request, urlopen

from chalice import Chalice, Rate
import pandas as pd
from slack import WebClient

app = Chalice(app_name='energybot')

if sys.platform == 'darwin':
    from dotenv import load_dotenv, find_dotenv
    load_dotenv(find_dotenv())

slack_token = os.environ["SLACK_API_TOKEN"]
client = WebClient(token=slack_token)

def post(url, data):
    data = bytes(json.dumps(data).encode("utf-8"))
    request = Request(url=url, data=data, method="POST")
    request.add_header("Content-type", "application/json; charset=UTF-8")
    with urlopen(request) as response:
        response = json.loads(response.read().decode("utf-8"))
    return response

@app.route("/")
def index():
    return {"hello": "world"}

@app.schedule(Rate(1, unit=Rate.MINUTES))
def tomorrow_demand(event):
    url = "https://scrape.world/demand"
    tomorrow = (pd.Timestamp('today') + pd.Timedelta('1 day')).strftime("%Y-%m-%d %H:00")
    temperature = 21
    data = {"date": tomorrow, "temperature": temperature}
    response = post(url, data)
    text = f"{tomorrow=} demand will be ~{response['demand']} MW"
    client.chat_postMessage(channel="energy",text=text)
```

Install the new requirements into the Chalice app's virtual environment:

```
pip install pandas slackclient
```

Freeze the dependencies with `pip freeze`:

```
pip freeze > requirements.txt
```

Add `energybot/.chalice/config.json` to your `.gitignore` file before adding environment variables.

Change `config.json` to include your Slack API token:

```
{
  "version": "2.0",
  "app_name": "energybot",
  "stages": {
    "dev": {
      "environment_variables": {
        "SLACK_API_TOKEN": "xoxb-000000000-000000000-a0a0a0a0a0a0a0a0a0a0a0a0"
      },
      "api_gateway_stage": "api"
    }
  }
}
```

**Clean up**

If you're done experimenting with Chalice and you'd like to clean up, you can use the `chalice delete` command, and Chalice will delete all the resources it created when running the `chalice deploy` command.

```
chalice delete
```

> Deleting Rest API: abcd4kwyl4
> Deleting function aws:arn:lambda:region:123456789:helloworld-dev
> Deleting IAM Role helloworld-dev



**Appendix A: Note on Rate Schedule**

[Source](https://aws.github.io/chalice/api.html?highlight=rate#Rate)

An instance of this class can be used as the `expression` value in the [`Chalice.schedule()`](https://aws.github.io/chalice/api.html?highlight=rate#Chalice.schedule) method:

```
@app.schedule(Rate(5, unit=Rate.MINUTES))
def handler(event):
    pass
```

Examples:

```
# Run every minute.
Rate(1, unit=Rate.MINUTES)

# Run every 2 hours.
Rate(2, unit=Rate.HOURS)
```

`unit`

The unit of the provided `value` attribute.  This can be either `Rate.MINUTES`, `Rate.HOURS`, or `Rate.DAYS`.

**Appendix B: Adding Environment Variables**

[Source](https://aws.github.io/chalice/topics/configfile.html?highlight=environment#id1)

Adding environment variables to `.chalice/config.json`:

In the following example, environment variables are specified both as top level keys as well as per stage.  This allows us to provide environment variables that all stages should have as well as stage specific environment variables:

```
{
  "version": "2.0",
  "app_name": "app",
  "environment_variables": {
    "SHARED_CONFIG": "foo",
    "OTHER_CONFIG": "from-top"
  },
  "stages": {
    "dev": {
      "environment_variables": {
        "TABLE_NAME": "dev-table",
        "OTHER_CONFIG": "dev-value"
      }
    },
    "prod": {
      "environment_variables": {
        "TABLE_NAME": "prod-table",
        "OTHER_CONFIG": "prod-value"
      }
    }
  }
}
```
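
As I understand the Chalice docs, a stage-specific value wins over a top-level value with the same name. Here's a sketch that models that precedence with a plain dictionary merge. This is not Chalice's own code, and `resolve_env` is a made-up helper, so treat it as a mental model rather than a guarantee.

```python
import json

config = json.loads("""
{
  "version": "2.0",
  "app_name": "app",
  "environment_variables": {
    "SHARED_CONFIG": "foo",
    "OTHER_CONFIG": "from-top"
  },
  "stages": {
    "dev": {
      "environment_variables": {
        "TABLE_NAME": "dev-table",
        "OTHER_CONFIG": "dev-value"
      }
    },
    "prod": {
      "environment_variables": {
        "TABLE_NAME": "prod-table",
        "OTHER_CONFIG": "prod-value"
      }
    }
  }
}
""")


def resolve_env(config, stage):
    """Start from the top-level variables, then let the stage override them."""
    env = dict(config.get("environment_variables", {}))
    stage_config = config.get("stages", {}).get(stage, {})
    env.update(stage_config.get("environment_variables", {}))
    return env


dev_env = resolve_env(config, "dev")
prod_env = resolve_env(config, "prod")
print(dev_env)
print(prod_env)
```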