<a href="https://colab.research.google.com/github/DolicaAkelloEgwel/python-slides/blob/main/python-for-beginners/python-for-beginners-image-scraper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python For Beginners
# Overview

+ The Python Interpreter
+ Hello World / Variables
+ Variable Reassignment
+ Numbers
+ Lists
+ Combining Things
+ Libraries
+ List Comprehensions

# Using Python Notebooks

Python Notebooks are made up of blocks of code called cells. These cells can be run by clicking on the play button on the left of each code block. If the code runs successfully then you will see a green tick appear on the left. If there is any output then it will appear beneath the cell.

Python Notebooks are handy resources for teaching as they allow me to combine explanation with examples. However, keep in mind that they do not play especially well with version control, so for a more serious project you will probably want to learn how to write standalone Python (`.py`) files.

# Connecting Drive

Run the cell below.

In [None]:
from google.colab import drive

drive.mount("/content/drive")

In the Files panel on the left-hand side, you should now see a `drive` folder. This is your Google Drive, and by the end of the workshop the images that you have created will be saved inside `drive > MyDrive`.

![](connecting-drive.png)

# Some Python Basics

# Calculator

The Python interpreter can be used as a calculator.

**Exercise:** Find the value of `2 + 2` using Python.

In [None]:
# live coding goes here
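Addition isn't the only thing the calculator can do. Once you've tried the exercise, here is a small extra example showing a few other operations. The variable names are just made up for illustration.

```python
# A few more calculator operations (variable names are just for illustration)
difference = 10 - 3  # subtraction
product = 6 * 7      # multiplication
power = 2 ** 10      # exponent: 2 to the power of 10
print(difference, product, power)
```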

# Comments

Comments are pieces of text that are ignored by the Python interpreter. In Python, everything after a `#` symbol on a line is a comment, so a comment can sit on its own line or at the end of a line of code. Comments can be used to explain what a certain part of the code is doing to a fellow coder or your future self.

In [None]:
# I am about to add some numbers
1 + 1  # The numbers are being added
# I have just added some numbers

# Hello World

Variables allow you to store values so that they can be used again or changed later. They are like labelled boxes for storing data. The `print()` command can be used to show the value of a variable.

To store text in a variable, Python uses something called a **string.** A string is a sequence of characters **enclosed in quotation marks.**

In [None]:
my_text = "Hello World!"
print(my_text)

**Exercise:** Create a variable called `greeting` containing the text `My name is [name goes here]` then use the `print()` command to show it.

In [None]:
# live coding goes here

# Reassignment

We can change the value of a variable by using the `=` operator to give it another value. This is known as _reassignment_. Be aware that Notebooks allow you to execute cells in any order, so the value a variable holds depends on which cells you ran last. This can make it a bit trickier to ensure others get the same results you do.

In [None]:
my_text = "hello!"

In [None]:
print(my_text)

In [None]:
my_text = "goodbye!"
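To see why the order matters, here is a small extra example that does everything in a single cell. The variable name `counter` is just made up for illustration. Notice that the right-hand side of `=` is worked out first, using whatever value the variable held at that moment:

```python
counter = 0
print(counter)          # 0
counter = counter + 1   # uses the old value (0), so counter becomes 1
counter = counter + 1   # uses the old value (1), so counter becomes 2
print(counter)          # 2
```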

# Numbers

Numbers in Python come in two main forms: whole numbers (**integers**, or `int`) and numbers with decimal places (**floats**, or `float`).

In [None]:
num = 10
pi = 3.14
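The `type()` command (which comes up again later in the workshop) shows which kind of number a variable holds. This small sketch also shows that dividing with `/` always gives a float, while `//` divides and throws away the remainder:

```python
num = 10
pi = 3.14
print(type(num))  # <class 'int'>
print(type(pi))   # <class 'float'>

half = num / 4    # / always gives a float, even for two integers
whole = num // 4  # // divides and drops the remainder
print(half)       # 2.5
print(whole)      # 2
```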

# Lists

A list is an ordered collection of items stored in a single variable. The items can be of any data type, and a single list can even mix different types.

In [None]:
my_list = ["text-1", "text-2", "text-3"]
print(my_list)

To access an item from a list we need an index. An index is a number that corresponds with the item's place in the list. Because Python has zero-based indexing, the first element in the list (which is `"text-1"` in this case) has an index of 0. The second item has an index of 1, and the third item has an index of 2. 

To access an item in a list you use the name of the list followed by the index of the item you want to access enclosed in square brackets.

In [None]:
print(my_list[0])
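Here is a small extra example of zero-based indexing with the same list. Because the list has three items, the last valid index is 2, and asking for index 3 causes an error:

```python
my_list = ["text-1", "text-2", "text-3"]
print(my_list[1])    # text-2 (the second item)
print(my_list[2])    # text-3 (the third item)
print(len(my_list))  # 3, so the valid indexes are 0, 1 and 2

# Asking for an index that doesn't exist causes an error
try:
    print(my_list[3])
except IndexError as error:
    print("IndexError:", error)  # IndexError: list index out of range
```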

# Loops

Loops allow you to repeat a block of code multiple times. They are especially helpful for doing things with lists.

In [None]:
cool_list = [1, 2, 3, 4, 5]
for num in cool_list:
    print(num)
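Loops become more useful when each pass does some work with the item. As a small sketch, this loop keeps a running total of the numbers in the list. The name `total` is just made up for illustration:

```python
cool_list = [1, 2, 3, 4, 5]
total = 0  # start the running total at zero
for num in cool_list:
    total = total + num  # add the current item to the total
print(total)  # 15
```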

# Combining Things with `+`

The `+` operator doesn't just allow us to add numbers. It can also join together two values of certain other data types, such as two lists or two strings. Below you can see it being used to combine lists.

In [None]:
first_list = ["a", "b", "c"]
second_list = [1, 2, 3]

combined_list = first_list + second_list
print(combined_list)

**Exercise**: What will I get from the code below?

In [None]:
print(second_list + first_list)

**Exercise:** The code below should take a bit of text called `first_part`, combine it with another bit of text called `second_part`, and then `print()` the combined text. However, the lines in the code are out of order. What would be the right order for the code?

```python
combined_text = first_part + second_part
first_part = "Hello, my name is "
print(combined_text)
second_part = "name-goes-here."
```

<details>
<summary>Stuck?</summary>

Remember that in programming, the lines of code are executed in a sequential manner from **top to bottom**. This means that you cannot perform any operations or manipulations with a variable <em>before</em> it has been created or defined in the code.

If you see an error saying that a variable is not defined (a `NameError`), it means that the computer has been asked to use a variable that, from its perspective, does not exist yet at that point in the code.

</details>

In [None]:
# live coding goes here
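One thing to watch out for: `+` only combines values of compatible types. In this small extra example (variable names made up for illustration), joining a string and a number fails with a `TypeError`, and `str()` is used to turn the number into text first:

```python
joined_lists = [1, 2] + [3]     # list + list works
joined_text = "abc" + "def"     # text + text works
print(joined_lists)  # [1, 2, 3]
print(joined_text)   # abcdef

try:
    broken = "Number " + 1      # text + number does not work
except TypeError as error:
    print("TypeError:", error)

fixed = "Number " + str(1)      # str() turns the number into text first
print(fixed)         # Number 1
```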

# Extra: Comprehensions

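List comprehensions are a compact, "Pythonic" way of building a new list by doing something with each item of a sequence. You don't have to do it this way, but you may find it interesting... In the example below, `range(5)` produces the numbers 0 to 4, and the comprehension collects each of them into a list.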

In [None]:
my_list = [i for i in range(5)]
print(my_list)
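To compare the two styles, here is a small sketch that builds the same list of square numbers first with a loop and then with a comprehension. The variable names are just made up for illustration:

```python
# Building a list of squares with a loop
squares_loop = []
for i in range(5):
    squares_loop.append(i * i)  # append() adds an item to the end of the list

# The same thing as a comprehension: [expression for item in sequence]
squares_comp = [i * i for i in range(5)]

print(squares_loop)  # [0, 1, 4, 9, 16]
print(squares_comp)  # [0, 1, 4, 9, 16]
```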

# Using Libraries

+ Code written by other developers
+ Good chance someone has tried to solve the same problem as you
+ Don't have to reinvent the wheel
+ Tested and optimised solutions for common problems

I found this `emoji` library after a quick search on Google. You can find out more about it [here](https://carpedm20.github.io/emoji/docs/).

In Python Notebooks you install a library with the command `%pip install a-helpful-library` but in the terminal/console it's just `pip install a-helpful-library`.

In [None]:
%pip install emoji
import emoji

Now we can use the `emojize` command that is given to us by the `emoji` library.

In [None]:
emojified_text = emoji.emojize("There is a :snake: in my boot!")
print(emojified_text)

# Making an Image Scraper

For this portion of the workshop we'll use Python to download some images from the website Unsplash. We will then use some libraries to add "glitchy" effects to them.

# Setting Up Libraries

First things first, we need to install some libraries in order to download images from the web.

+ `requests` - Retrieves webpages (and other files, such as images) from the web.
+ `BeautifulSoup` - Reads the HTML of a webpage and lets us pick out certain elements, such as images.

To install libraries we need to use the `pip` command. In Notebooks you need to use the `%` symbol but this is not necessary when using `pip` in the terminal.

In [None]:
%pip install requests
%pip install beautifulsoup4

import requests
from bs4 import BeautifulSoup

# Getting Stuff from the Web

[Unsplash](https://unsplash.com) is a website that provides a lot of free images. The code below is a **function** that is capable of downloading images from Unsplash.

Functions in Python are blocks of reusable code that are used to perform specific tasks. They are defined using the `def` keyword and allow you to break down complex tasks into smaller parts. Functions can take input values, called arguments, and return output values. Functions help make your code more modular, more reusable, and more maintainable.
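Before looking at the bigger function, here is a tiny sketch of the same idea. The function name `make_greeting` is made up for illustration: it takes one argument and returns a new piece of text, and it can be reused with different arguments.

```python
# def starts the function; name is the input it expects
def make_greeting(name):
    return "Hello, my name is " + name + "!"  # return sends the result back

# The same function can be reused with different arguments
ada_greeting = make_greeting("Ada")
grace_greeting = make_greeting("Grace")
print(ada_greeting)    # Hello, my name is Ada!
print(grace_greeting)  # Hello, my name is Grace!
```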

In [None]:
def photo_downloader(image_theme):
    """Downloads photos from Unsplash based on a theme.

    Args:
        image_theme: The theme for the type of photo that should be downloaded.

    Returns:
        A list of photos in the bytes format.
    """
    # Create a url for unsplash based on our theme
    source_url = "https://unsplash.com/s/photos/" + image_theme

    # Download the website with the pictures
    response = requests.get(source_url, allow_redirects=True)

    # Tell BeautifulSoup to process the website that we have just downloaded as HTML
    data = BeautifulSoup(response.text, "html.parser")

    # Retrieve the chunks of the HTML that contain photos
    all_found_photos = data.find_all("figure", itemprop="image")

    # Retrieve the photo URLs
    photo_urls = [image.find("a", rel="nofollow") for image in all_found_photos]
    photo_urls = filter(None, photo_urls)

    # Download the photos - the requests library will do this in bytes format
    photo_bytes = [
        requests.get(photo_url["href"], allow_redirects=True).content
        for photo_url in photo_urls
    ]

    # Use a command called len to determine the length of the list - this is equivalent to how many photos were downloaded
    print(f"Downloaded {len(photo_bytes)} images.")

    # Return the list of photos
    return photo_bytes

You might notice that beneath the function _header_ I have placed a special kind of text called a **docstring**. A docstring is a string in triple quotes that describes what a function does. It does not affect how the function behaves, but it makes your code much easier to understand, and Python can show it to you with the `help()` command.
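Unlike ordinary `#` comments, a docstring is kept by Python and attached to the function. Here is a small sketch using a made-up `add_numbers` function to show where it ends up:

```python
def add_numbers(a, b):
    """Adds two numbers together and returns the result."""
    return a + b

print(add_numbers(2, 3))    # 5
print(add_numbers.__doc__)  # Adds two numbers together and returns the result.
help(add_numbers)           # shows the function's header along with its docstring
```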

Now we can run or _call_ the function above by providing it with an argument for the `image_theme`. For this example I'm using the word "robots" but you're free to change it to whatever interests you. I am taking the list that is _returned_ by the function and saving it to a variable called `photo_bytes`.

In [None]:
theme = "robots"
photo_bytes = photo_downloader(theme)
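If you're curious about what the scraping steps inside `photo_downloader` actually do, they can be tried out on a tiny piece of HTML without connecting to Unsplash. This is a sketch only: the HTML and the `example.com` links below are invented, and nothing here describes Unsplash's real pages (which may change over time). It shows how `find_all`, `find` and `filter(None, ...)` work together:

```python
from bs4 import BeautifulSoup

# Made-up HTML shaped like the structure photo_downloader searches for
html = """
<figure itemprop="image">
  <a href="https://example.com/photo-1/download" rel="nofollow">Download</a>
</figure>
<figure itemprop="image">
  <p>This figure has no download link</p>
</figure>
<figure>
  <a href="https://example.com/not-a-photo" rel="nofollow">Skipped</a>
</figure>
"""

data = BeautifulSoup(html, "html.parser")

# Only the first two figures have itemprop="image"
all_found_photos = data.find_all("figure", itemprop="image")
print(len(all_found_photos))  # 2

# find() gives None when a figure has no matching link...
photo_urls = [image.find("a", rel="nofollow") for image in all_found_photos]
# ...and filter(None, ...) removes those None values
photo_urls = list(filter(None, photo_urls))
print([photo_url["href"] for photo_url in photo_urls])
```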

Python has a command called `type()` that can be used to show the data type of a given variable. When I use this with the first item in the list I can see that it has the `bytes` data type.

In [None]:
print(type(photo_bytes[0]))

The bytes format can't be displayed easily, and the image libraries we'll use later don't know how to read it. To work around this, we need a function that converts the bytes to a different format.

Pillow (imported in Python as `PIL`) is a Python library that allows us to work with images. It provides a format for images called (drumroll) `Image`. To make a function that can convert data from bytes to an `Image`, we'll need some extra imports.

In [None]:
from PIL import Image
import io

# Converting Bytes to Images

Here is another function that we'll use for converting the Unsplash data from bytes to the PIL `Image` format. Like before, I've added a docstring to show what the function is for.

In [None]:
def bytes_to_image(image_data):
    """Converts bytes to a PIL Image.

    Args:
        image_data: The photo in the bytes format.

    Returns:
        The converted PIL Image object.
    """
    return Image.open(io.BytesIO(image_data))
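To see what the conversion is doing without downloading anything, this sketch makes a tiny red image in memory, turns it into bytes, and then reads it back with the same `Image.open(io.BytesIO(...))` trick. The 4x3 red image is just a stand-in for a downloaded photo:

```python
import io
from PIL import Image

# Make a tiny 4x3 red image in memory (a stand-in for a downloaded photo)
original = Image.new("RGB", (4, 3), "red")

# Save it as a PNG into an in-memory buffer and grab the raw bytes
buffer = io.BytesIO()
original.save(buffer, format="PNG")
image_data = buffer.getvalue()
print(type(image_data))  # <class 'bytes'>
print(image_data[:8])    # PNG files always start with these 8 bytes

# Wrap the bytes in BytesIO so Image.open can read them like a file
restored = Image.open(io.BytesIO(image_data))
print(restored.size, restored.mode)  # (4, 3) RGB
```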

Now I can convert all the photos in the list using the new function and a comprehension.

In [None]:
downloaded_photos = [bytes_to_image(img_bytes) for img_bytes in photo_bytes]

A PIL `Image` is displayed in a Python Notebook when it is the last line of a cell, whether you refer to it by its name or by its position in a list.

In [None]:
downloaded_photos[0]

# Saving the Images

The `os` library can be used to work with files and folders, including creating new folders. It is part of the Python Standard Library, which means it is already included with Python and doesn't need to be installed.

In [None]:
import os

In [None]:
output_folder_name = "./drive/MyDrive/python-workshop"
os.makedirs(output_folder_name, exist_ok=True)

In [None]:
# Pick a name for the folder in which the images will be saved
picture_folder_name = "scraper-pictures"
# Create a combined path name
scraper_pictures_path = os.path.join(output_folder_name, picture_folder_name)
# Create a new "scraper-pictures" folder
os.makedirs(scraper_pictures_path, exist_ok=True)
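`exist_ok=True` is what makes it safe to run these cells more than once. This sketch shows the difference using a throwaway folder from Python's built-in `tempfile` library, which is only used for the demo so that it doesn't touch your Drive:

```python
import os
import tempfile

# A throwaway temporary folder, so this example doesn't touch your Drive
base_folder = tempfile.mkdtemp()
demo_path = os.path.join(base_folder, "python-workshop", "scraper-pictures")

os.makedirs(demo_path, exist_ok=True)  # creates both new folders at once
os.makedirs(demo_path, exist_ok=True)  # running it again is fine

try:
    os.makedirs(demo_path)  # without exist_ok, an existing folder causes an error
except FileExistsError:
    print("That folder already exists!")

print(os.path.isdir(demo_path))  # True
```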

PIL `Image` objects also have a built-in `.save()` command, which can be used to save the photos that were downloaded.

The function below will take the `Image` object, the `theme` that we chose earlier, the count (a number that we will give to each of the photos), and the folder name. Using this, it will create a filename in the form of `folder-name/theme-XX.jpg` where `XX` is the count in two digit form.

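The advantage of `os.path.join()` is that it puts the correct separator between the parts of a path for the operating system you're using (`/` on macOS and Linux, `\` on Windows), so the same code works on different systems.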
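Here is a small sketch of how the filename is built, using a made-up `example_theme` so it doesn't overwrite your `theme`. The `:02d` part pads the count with zeros to at least two digits; it never cuts digits off, so 123 stays `123`:

```python
import os

example_theme = "robots"

# :02d pads a number with zeros so it is at least two digits wide
filenames = [f"{example_theme}-{count:02d}.jpg" for count in [3, 12, 123]]
print(filenames)  # ['robots-03.jpg', 'robots-12.jpg', 'robots-123.jpg']

# os.path.join puts the right separator for your operating system between the parts
path = os.path.join("python-workshop", "scraper-pictures", filenames[0])
print(path)  # python-workshop/scraper-pictures/robots-03.jpg on Colab, macOS and Linux
```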

In [None]:
def photo_saver(image, theme, count, folder_name):
    """Saves a PIL Image to the disk.

    Args:
        image: The PIL Image to save.
        theme: The image theme.
        count: The image count.
        folder_name: The name of the folder that the image will be saved to.
    """
    # Create a filename for the image - using os.path helps ensure that things go well no matter what type of system you're using
    img_filename = os.path.join(folder_name, f"{theme}-{count:02d}.jpg")

    # Save the image using the filename we have created
    image.save(img_filename)

    # Print a message for assurance that something happened
    print(f"Saved {img_filename}")

Now I can use a loop to go through the images one by one and use the `photo_saver()` function on them. 

<details>
<summary>What's <code>enumerate</code> doing?</summary>

In Python `enumerate` is a special way of looping that lets me go through every item in a list, but also gives me a number corresponding with the item's location in the list.

Example:
```python
list_of_animals = ["chicken", "frog", "dolphin"]
for count, animal in enumerate(list_of_animals):
    print(count, animal)
```
That will give the following output:
```
0 chicken
1 frog
2 dolphin
```

</details>

In [None]:
# Go through the downloaded images one by one and save them into the folder that was just created
for count, img in enumerate(downloaded_photos):
    photo_saver(img, theme, count, scraper_pictures_path)

# Extra: Applying a Glitch Effect

![](glitched-image-example.jpg)

A library called `glitch-this` can be used for adding glitchy effects to images. Here is a [link](https://github.com/TotallyNotChase/glitch-this/wiki/Documentation:-The-glitch-this-library) to its documentation.

Like before it's installed using the command `%pip install glitch-this`.

In [None]:
%pip install glitch-this
from glitch_this import ImageGlitcher

The images are quite large, which means any effects will take longer to apply. To speed things up, we can shrink the images so that their width is at most 800 pixels, keeping their original proportions.

In [None]:
MAX_WIDTH = 800


def image_resize(image):
    """Resizes an image so that its width does not go beyond 800 pixels.

    Args:
        image: The image to resize.

    Returns:
        The resized image.
    """
    if image.size[0] <= MAX_WIDTH:
        return image
    factor = MAX_WIDTH / image.size[0]
    return image.resize((MAX_WIDTH, int(image.size[1] * factor)))


downloaded_photos = [image_resize(photo) for photo in downloaded_photos]
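A PIL image's `.size` is a `(width, height)` pair, so the arithmetic inside `image_resize` can be checked with plain numbers. The `resized_size` helper below is hypothetical: it just repeats the same calculation without needing a real image.

```python
MAX_WIDTH = 800


def resized_size(width, height):
    """Returns the (width, height) that image_resize would give an image of this size."""
    if width <= MAX_WIDTH:
        return (width, height)
    factor = MAX_WIDTH / width  # how much the image shrinks
    return (MAX_WIDTH, int(height * factor))  # shrink the height by the same amount


print(resized_size(3200, 2400))  # (800, 600): factor is 800 / 3200 = 0.25
print(resized_size(1600, 2400))  # (800, 1200): factor is 0.5, so tall photos stay tall
print(resized_size(640, 480))    # (640, 480): already narrow enough, so unchanged
```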

Now that `ImageGlitcher` has been imported, we can use it to create (or _initialise_) a glitcher that we'll use for glitching photos.

In [None]:
glitcher = ImageGlitcher()

Python comes with a library called `random` that can choose a random item from a list. Because it is included with Python there is no need to install it with `pip`.

In [None]:
import random
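Here is a small sketch of `random.choice` on a made-up list. It also shows `random.seed()`, which makes the "random" choices repeatable. That can be handy if you want someone else to get the same result as you:

```python
import random

animals = ["chicken", "frog", "dolphin"]
print(random.choice(animals))  # may be a different animal each time you run it

# Seeding makes the "random" choices repeatable
random.seed(42)
first = random.choice(animals)
random.seed(42)
second = random.choice(animals)
print(first == second)  # True
```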

Now let's pick a random photo from our list of downloaded photos and look at it.

In [None]:
random_photo = random.choice(downloaded_photos)
random_photo

Now we can apply the glitch effect to the random photo and see what it looks like afterwards.

The documentation goes into more detail about what the different parameters for the `glitch_image` command are doing.

The inputs we give to this command are the `src_img` that we want to glitch, a `glitch_amount` float value, and something called `color_offset`. The first two are required, while `color_offset` is optional. The documentation says that `glitch_amount` may be any number from 0.1 to 10.0, so feel free to mess around with this value if you like.

The developers have said setting `color_offset` to `True` makes things look best, so I'll just take their word for it...

In [None]:
glitched_image = glitcher.glitch_image(
    src_img=random_photo, glitch_amount=3.5, color_offset=True
)
glitched_image

Now to make things more interesting we can warp the image even further by using another library called `pixelsort`.

In [None]:
%pip install pixelsort
from pixelsort import pixelsort

The [pixelsort documentation](https://github.com/satyarth/pixelsort) says a bit about what the different inputs do. I was pretty lost while looking it up, so I just messed around until I found some settings that I liked. Try changing these values if you are unsatisfied with the result and see if that makes things better.


In [None]:
sort_image = pixelsort(
    glitched_image, sorting_function="intensity", interval_function="edges"
)

In [None]:
# JPEG files can't store transparency, so convert to plain RGB before saving
sort_image = sort_image.convert("RGB")
sort_image.save(os.path.join(output_folder_name, "glitched-image.jpg"))
sort_image

This image has been saved in `drive > MyDrive > python-workshop`, next to the `scraper-pictures` folder, so you'll be able to find both in your Google Drive after the workshop.

# Recap

+ Python Fundamentals
+ Using Python to download things
+ Going from one type of data to another
+ Using several libraries together to create more interesting programs
+ Python as a tool for adding effects to photos

# FutureCoder

[FutureCoder](https://futurecoder.io) is a helpful, beginner-friendly Python programming course that runs in your browser.

# Feedback

Please fill in the course feedback form.

https://moodle.arts.ac.uk/mod/feedback/view.php?id=951280

# Tips for Learning Programming

+ You absolutely _don't_ need to learn/memorise everything
+ Most people remember a handful of things they use the most and look up the rest
+ Making mistakes is normal - [even the pros do it](https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/issues/123)


# What's Next

+ GANs with Python
+ Version Control and good coding practices (WIP)