![LU Logo](https://www.lu.lv/fileadmin/user_upload/LU.LV/www.lu.lv/Logo/Logo_jaunie/LU_logo_LV_horiz.png)


# Week 15 - Web Development with Flask, Web Scraping with Scrapy

## Lesson Overview

We will cover the following topics in this lesson:

* Flask - a Python framework for server-side web development
* Scrapy - a Python web scraping library

## Learning Objectives

At the end of this lesson, you will be able to:

* Understand the basics of server-side web development
* Create a simple web application using Flask
* Understand the basics of web scraping
* Scrape data from a website using Scrapy

## Lesson Prerequisites

Before exploring this lesson, you should be able to:

* Understand the basics of Python programming - variables, data types, control structures, functions, OOP, and file I/O
* Know how to install Python packages using `pip`
* Know how to set up a virtual environment using `venv`
* Understand the basics of HTML - see MDN Web Docs' [Introduction to HTML](https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML)

## Front End Web Development vs. Back End Web Development

Web development can be divided into two main categories: front end and back end.

### Front End Web Development

Front end web development involves creating the user interface and user experience of a website. This includes designing the layout, colors, fonts, and interactive elements of the website. Front end developers use HTML, CSS, and JavaScript to create the visual elements of a website that users interact with.

### Back End Web Development

Back end web development involves creating the server-side logic and database interactions of a website. This includes handling user requests, processing data, and generating dynamic content. Back end developers use server-side programming languages like Python, PHP, Ruby, Java and others to build the server-side components of a website.

Providing APIs, handling user authentication, and managing databases are some of the tasks that back end developers are responsible for.

### Full Stack Web Development

Full stack web development involves working on both the front end and back end of a website. Full stack developers are proficient in both front end and back end technologies and can build complete web applications from start to finish. 

## Flask - A Python Web Framework

Flask is a lightweight Python web framework that allows you to build web applications quickly and easily. It is designed to be simple and easy to use, making it a great choice for beginners and experienced developers alike.

### How Flask Works

Flask is a micro web framework that provides the basic tools and libraries needed to build web applications. It is built on top of the WSGI (Web Server Gateway Interface) standard, which allows it to work with a variety of web servers.
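To make the WSGI idea concrete, here is a minimal sketch of a WSGI application written without Flask. A WSGI application is a callable that receives the request environment and a `start_response` function and returns the response body as an iterable of bytes. The names `simple_app` and `fake_start_response` are made up for this illustration, and the application is called by hand instead of through a real web server.

```python
def simple_app(environ, start_response):
    """A bare WSGI application: takes the request environ, returns the body."""
    body = f"Hello from {environ['PATH_INFO']}".encode('utf-8')
    headers = [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ]
    start_response('200 OK', headers)  # status line and headers go first
    return [body]                      # body is an iterable of bytes


# A web server would normally make this call; here we imitate it by hand.
captured = {}

def fake_start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers

environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/demo'}
result = b''.join(simple_app(environ, fake_start_response))
print(captured['status'], result)
```

A Flask application object is itself a WSGI callable of this kind, which is why the same app can run on different WSGI servers.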

### Virtual Environment Setup

Before installing Flask, it is HIGHLY recommended to create a virtual environment for your project. This will help you manage dependencies and avoid conflicts with other projects.

There are multiple ways to create a virtual environment in Python. One common way is to use the built-in `venv` module. Here's how you can create a virtual environment using `venv`:

```bash
# Create a new directory for your project
mkdir myproject
cd myproject
python -m venv myvenv
```

Instead of `myvenv`, you can use any name you like for your virtual environment. To activate the virtual environment, use the command that matches your operating system:

```bash
# On Windows
myvenv\Scripts\activate

# On macOS and Linux
source myvenv/bin/activate
```
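If you are unsure whether the activation worked, one way to check is from Python itself. The sketch below assumes an environment created with `venv`; the helper name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print('Interpreter:', sys.executable)
print('Virtual environment active:', in_virtualenv())
```

Run it with the `python` command from the same terminal in which you activated the environment.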

### Installing Flask

Once you have set up and ACTIVATED your virtual environment, you can install Flask using `pip`:

```bash
pip install Flask
```


## Creating a Simple Web Application with Flask


### Hello World Example

In [None]:
# let's create a simple Flask app

from flask import Flask

app = Flask(__name__) # this creates a new Flask app object

# we will be using app.route() decorator to define the URL that will trigger the function below
@app.route('/') # this route means that the function below will be called when the user goes to the root URL of your website
def hello_world():
    return 'Hello, World!'

# in a script you would normally guard the call like this,
# so that the server starts only when the file is run directly:
# if __name__ == '__main__':
#     app.run()

app.run() # starts the development server; without the guard above it would also start whenever this file is imported
# usually you would run this from a terminal rather than from a Jupyter notebook

# use Ctrl+C to stop the server on terminal
# in Jupyter notebook, you can stop the server by clicking on the stop button

 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [09/Dec/2024 16:52:30] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [09/Dec/2024 16:52:30] "GET /favicon.ico HTTP/1.1" 404 -


### Using parameters in routes

The next step is to create a simple web application that takes a parameter from the URL and displays it on the page. Here's an example:

```python
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

@app.route('/greet/<name>')
def greet(name):
    return f'Hello, {name}!'

if __name__ == '__main__':
    app.run()
``` 

In [None]:
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

# note that the URL is case-sensitive
# note the use of <name> in the URL: this is the variable part of the URL
@app.route('/greet/<name>')
def greet(name):
    return f'Hello, {name}!'

# note that <name> matches a single URL segment only (text without slashes)
# so /greet/Janis/Berzins will not work (404)
# but /greet/Janis will work

if __name__ == '__main__':
    app.run()

# use Ctrl+C to stop the server on terminal
# in Jupyter notebook, you can stop the server by clicking on the stop button

 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [09/Dec/2024 16:54:53] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [09/Dec/2024 16:54:59] "GET /greet/Valdis HTTP/1.1" 200 -
127.0.0.1 - - [09/Dec/2024 16:55:05] "GET /greet/LUPython HTTP/1.1" 200 -
127.0.0.1 - - [09/Dec/2024 16:55:11] "GET /greet/LUPython/Latvia HTTP/1.1" 404 -
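The 404 in the last log line comes from the default converter, which does not match slashes. Flask also offers other converters such as `path` and `int`. The sketch below shows them side by side; the routes `/greet-path/...` and `/square/...` are hypothetical additions for illustration, and Flask's test client is used so that no server has to be started.

```python
from flask import Flask

app = Flask(__name__)


@app.route('/greet/<name>')  # default converter: text without slashes
def greet(name):
    return f'Hello, {name}!'


@app.route('/greet-path/<path:full_name>')  # path converter: slashes allowed
def greet_path(full_name):
    return f'Hello, {full_name}!'


@app.route('/square/<int:number>')  # int converter: digits only, passed as int
def square(number):
    return str(number * number)


# The test client sends requests to the app without running a server.
client = app.test_client()
for url in ['/greet/Janis', '/greet/Janis/Berzins',
            '/greet-path/Janis/Berzins', '/square/7', '/square/seven']:
    response = client.get(url)
    print(url, response.status_code, response.get_data(as_text=True)[:40])
```

Requests that do not fit a converter (for example `/square/seven`) are answered with 404 before the view function is ever called.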


### Using query parameters

Query parameters are another way to pass data to a web application. They are added to the URL after a question mark `?` and are in the form `key=value`. Here's an example:

```python

from flask import Flask, request

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

@app.route('/sveiks')
def greet():
    name = request.args.get('name', 'World')
    return f'Hello, {name}!'

if __name__ == '__main__':
    app.run()
```

Now you can access the greet route with a query parameter like this: `http://localhost:5000/sveiks?name=Uldis`
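To see how such a URL breaks down into a path and query parameters, here is a small sketch that uses only the standard library. It is not how Flask is implemented; it only mirrors the idea behind `request.args.get('name', 'World')`. The helper name `greeting_for` is made up for this example.

```python
from urllib.parse import urlsplit, parse_qs


def greeting_for(url, default='World'):
    """Build a greeting from the 'name' query parameter of a URL."""
    parts = urlsplit(url)           # splits scheme, host, path, query, ...
    params = parse_qs(parts.query)  # maps each key to a list of values
    name = params.get('name', [default])[0]
    return f'Hello, {name}!'


url = 'http://localhost:5000/sveiks?name=Uldis'
print(urlsplit(url).path)                 # the part the route is matched against
print(parse_qs(urlsplit(url).query))      # the query parameters
print(greeting_for(url))
print(greeting_for('http://localhost:5000/sveiks'))  # falls back to the default
```

Each key maps to a list because the same key may appear more than once in a query string.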


In [None]:
## Using query parameters

from flask import Flask, request

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

@app.route('/sveiks')
def greet():
    name = request.args.get('name', 'Pasaule') #if no name argument is given, we will use 'Pasaule'
    return f'Hello, {name}!'

if __name__ == '__main__':
    app.run()

# on local server try something like http://127.0.0.1:5000/sveiks?name=Valdis

 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [09/Dec/2024 17:07:54] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [09/Dec/2024 17:08:03] "GET /sveiks?name=Valdis HTTP/1.1" 200 -


### Using templates

## Flask project structure

```
project_root/
│
├── app.py                # Your main Flask application file
├── static/               # Folder for static files (CSS, JS, images, etc.)
│   └── styles.css        # Your CSS file
├── templates/            # Folder for HTML templates
│   ├── base.html         # Your base template
│   └── index.html        # Other templates
└── requirements.txt      # (Optional) Python dependencies file
```
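The `templates/` folder is where Flask looks for HTML files rendered with `render_template`. The sketch below shows the same idea in a single file by using `render_template_string` instead, so the template text lives in a Python string. The template content, the route `/hello/<name>` and the item list are assumptions made for this illustration.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical template; in a real project this would be templates/index.html
PAGE = """<!doctype html>
<title>Greeting</title>
<h1>Hello, {{ name }}!</h1>
<ul>
{% for item in items %}
  <li>{{ item }}</li>
{% endfor %}
</ul>"""


@app.route('/hello/<name>')
def hello(name):
    # Values passed as keyword arguments become variables inside the template.
    return render_template_string(PAGE, name=name, items=['Flask', 'Scrapy'])


# The test client lets us look at the rendered page without starting a server.
client = app.test_client()
response = client.get('/hello/Valdis')
html = response.get_data(as_text=True)
print(html)
```

With a file-based template the view would call `render_template('index.html', name=name, items=...)` in the same way.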


### Flask Learning References 

For further exploration of Flask, you can refer to the following resources:
- Official Flask Documentation: https://flask.palletsprojects.com/en/2.0.x/
- Miguel Grinberg's Flask Mega-Tutorial: https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
- Corey Schafer's Flask Tutorial Series: https://www.youtube.com/playlist?list=PL-osiE80TeTs4UjLw5MM6OjgkjFeUxCYH

#### Handling forms with Flask

#### Using databases with Flask

#### User authentication with Flask

## Web Scraping - what is it?

Web scraping is the process of extracting data from websites. It involves sending HTTP requests to a website, parsing the HTML content, and extracting the data you need. Web scraping is commonly used for data mining, price monitoring, and content aggregation.

Web scraping can be done manually by simply saving the contents of the pages you visit, but it is usually more efficient to automate the process with a web scraping library such as Scrapy.

### Rules of Web Scraping

* Play nice! Don't overload a website with requests, as this can cause server issues and get you banned.

* Before scraping a website, make sure to check the website's terms of service and robots.txt file to ensure that you are not violating any rules or policies.

* If the data you need is available through an API, use the API instead of scraping the website.

* Better still, if the data is available as a public dataset, use the dataset.

* Note the difference between scraping data and using it. Scraping is only the process of extracting data from a website; what you then do with the data is a separate issue. For example, there is a difference between doing research on a website and using the same data to set up a competing service.
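Two of the rules above, checking `robots.txt` and not overloading the site, can be sketched with the standard library alone. The robots.txt content, the `example.com` URLs, the user agent name and the helper `polite_fetch` are all made up for this illustration; in a real project the parser would read the site's actual file, for example with `set_url()` and `read()`.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = 'lesson-bot'  # hypothetical crawler name

# Made-up robots.txt content, parsed offline for the example
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch(urls, fetch, delay_seconds):
    """Call fetch(url) for each allowed URL, pausing between requests."""
    results = []
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # skip pages the site asked crawlers to avoid
        results.append(fetch(url))
        time.sleep(delay_seconds)  # play nice: wait before the next request
    return results


urls = ['https://example.com/public/page', 'https://example.com/private/data']
print('Requested delay:', parser.crawl_delay(USER_AGENT))
# A stand-in "fetch" that only echoes the URL; no network access happens here.
print(polite_fetch(urls, fetch=lambda url: url, delay_seconds=0))
```

Scrapy has its own settings for this kind of behavior, so the sketch is meant to show the principle, not a replacement for them.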

## Scrapy - A Python Web Scraping Library

Scrapy is a powerful web scraping library for Python that makes it easy to extract data from websites. It provides a high-level API for crawling websites and extracting data, making it a great choice for web scraping projects.

### How Scrapy Works

Scrapy works by sending HTTP requests to a website, parsing the HTML content, and extracting the data you need. It provides a set of tools and libraries for building web scrapers, including a built-in web crawler, a powerful selector system, and support for handling cookies and sessions.
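The "parse the HTML, then extract the data" part can be illustrated without Scrapy. The sketch below uses the standard library's `html.parser` on an inline HTML string, so no request is sent. The table content is placeholder data invented for the example, and `TableCellParser` is a hypothetical helper; Scrapy's selectors do the same job with far less code.

```python
from html.parser import HTMLParser

# Placeholder HTML standing in for a downloaded page (values are invented)
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>City A</td><td>1000</td></tr>
  <tr><td>City B</td><td>2500</td></tr>
</table>
"""


class TableCellParser(HTMLParser):
    """Collect the text of <td> cells, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._in_cell = True
            self._row.append('')

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_cell = False
        elif tag == 'tr':
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


parser = TableCellParser()
parser.feed(SAMPLE_HTML)
records = [{'city': city, 'population': int(population)}
           for city, population in parser.rows]
print(records)
```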

### Installing Scrapy

As with Flask, it is best to create and activate a virtual environment before installing Scrapy.

You can install Scrapy using `pip`:

```bash
pip install Scrapy
```

### Simple web scraping example with Scrapy

Let's say we want to scrape the cities and their populations from the Wikipedia page https://en.wikipedia.org/wiki/List_of_cities_in_Latvia
(Note that Wikipedia offers an API for accessing its data, so scraping is not necessary in this case. This is just an example.)

Here is a simple example of how to scrape data from a website using Scrapy:

```python
import scrapy
from scrapy.crawler import CrawlerProcess

url = 'https://en.wikipedia.org/wiki/List_of_cities_in_Latvia'


class CitySpider(scrapy.Spider):
    name = 'city_spider'
    start_urls = [url]

    def parse(self, response):
        # each table row describes one city; the column positions depend on the page layout
        for row in response.css('table.wikitable tr'):
            city = row.css('td:nth-child(2) a::text').get()
            population = row.css('td:nth-child(4)::text').get()
            # rows without both values (such as header rows) are skipped
            if city and population:
                yield {
                    'city': city,
                    'population': population
                }


# pass the spider class, not an instance; Scrapy creates the spider object itself
process = CrawlerProcess()
process.crawl(CitySpider)
process.start()  # blocks until the crawl is finished
# see results in the console
```


### Scrapy Learning References

For further exploration of Scrapy, you can refer to the following resources:

- Official Scrapy Documentation: https://docs.scrapy.org/en/latest/
- Scrapy Tutorial: https://docs.scrapy.org/en/latest/intro/tutorial.html
- YouTube Tutorial: https://www.youtube.com/watch?v=ve_0h4Y8nuI

## Practice

### Flask Practice

1. Create a simple web application using Flask that displays a list of items. The list should be stored in a Python list and displayed on the web page.
2. Add a form to the web application that allows users to add new items to the list.
3. Add a delete button next to each item in the list that allows users to delete items from the list.

### Scrapy Practice

1. Create a Scrapy spider that scrapes data from a website of your choice. The spider should extract at least two fields from the website and save the data to a CSV file.
2. Modify the spider to save the data to a database instead of a CSV file.
3. Add error handling to the spider to handle cases where the website is down or the data is missing.

## Summary

In this lesson, we covered the basics of server-side web development using Flask and web scraping using Scrapy. We learned how to create a simple web application with Flask and how to scrape data from a website using Scrapy. We also discussed the rules of web scraping and best practices for working with web scraping libraries.