# Scraping Data from Pinterest

This notebook lets you create an image dataset from Pinterest.

Creating and curating datasets is an important part of working with AI. AI models learn from datasets, so by creating a dataset you determine what the model should learn.  
By creating and curating an image dataset, you as a designer can train an AI model that matches your visual imagination. That is what makes image datasets so important and interesting.  
In the end, you can use the dataset created in this notebook to train or fine-tune an AI model. You can read more about this under "7. How to continue".

Pinterest is a powerful visual search engine that lets you search through millions of images. It also lets users create collections of images, called 'Boards'.  
Pinterest uses clever algorithms to find good visual results for a search term. That makes it well suited for creating well-curated image datasets.  
You can find out more about Pinterest [here.](https://about.pinterest.com/)

To collect the images from Pinterest, we use a so-called 'crawler' or 'scraper'. You can think of it as a tiny robot that collects data (in our case, images) from a website for us. 
You just have to give it some instructions, which you will do over the course of this notebook. ;)  
You can read more about web scrapers [here.](https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/)
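
To make the idea more concrete, here is a minimal sketch of what a scraper does at its core: it reads the HTML of a page and collects the image addresses it finds. The sample HTML and the class name `ImageLinkCollector` are made up for illustration. The crawler used in this notebook is a separate tool and will differ in detail, for example because it also has to log in to Pinterest.

```python
from html.parser import HTMLParser


class ImageLinkCollector(HTMLParser):
    """Collects the `src` attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # `attrs` is a list of (name, value) pairs for the current tag
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.image_urls.append(value)


# Made-up sample page, standing in for a downloaded website
sample_html = """
<html><body>
  <img src="https://example.com/poster-01.jpg" alt="Poster">
  <p>Some text between the images.</p>
  <img src="https://example.com/poster-02.jpg" alt="Another poster">
</body></html>
"""

collector = ImageLinkCollector()
collector.feed(sample_html)
print(collector.image_urls)
```

A real scraper would then download each collected address and save the file to a folder.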

Continue with the next cell to start the scraping process. To run it, use the play button in the top-left corner. 

## 1. Set requirements

First, you need to set some requirements and install some libraries.  
Click the play button to execute the next cell.

Be aware that setting up all requirements can take some time.

In [1]:
# -----------
# IMPORTS
# -----------
import os
from IPython.display import display, clear_output
import ipywidgets as widgets
import io_widget

# -----------
# SETUP WIDGET
# -----------
Setup = io_widget.Setup()
display(Setup)

# -----------
# VARIABLES
# -----------
HiddenOutput = widgets.Output()

# -----------
# UTILITIES
# -----------
with HiddenOutput:
    # Clone the crawler once; skip the download if the folder already exists
    if not os.path.isdir('/home/jovyan/utilities/pinterest-crawler/'):
        !git clone https://github.com/Francesco-Sch/Pinterest-crawler-for-jupyter-lab /home/jovyan/utilities/pinterest-crawler

    # Install the crawler's dependencies once the folder is present
    if os.path.isdir('/home/jovyan/utilities/pinterest-crawler/'):
        !pip install -r /home/jovyan/utilities/pinterest-crawler/requirements.txt

# Re-render widget after all processes are done
Setup.SetupProcessing = False
clear_output()
display(Setup)

Setup(SetupProcessing=False)

## 2. Enter your Pinterest login credentials

The Pinterest scraper needs access to Pinterest, so you have to enter your user credentials to let the scraper log in.
Do not worry, the credentials are only stored temporarily on this server.

Tip: You could also create a new Pinterest account just for data collection purposes. ;)

CAUTION: This only works if two-factor authentication (2FA) is deactivated for the account.

In [2]:
CrawlerLogin = io_widget.CrawlerLogin()
CrawlerLogin

CrawlerLogin()

## 3. Add Pinterest Links

With the following widgets, you can configure the scraping process.  
First, add at least one Pinterest link. You can add any Pinterest link you want, for example the results of a search or a board you have collected. You can add more than one link, but as the caution below explains, only one link is currently scraped at a time. 
Here are two example links you could use: 

Link for the search term 'Typography':
> https://www.pinterest.de/search/pins/?q=typography&rs=typed&term_meta[]=typography%7Ctyped

Link to a Pinterest Board about Editorial Design:
> https://www.pinterest.de/loxdelux/cover-editorial-design/

After adding your Pinterest links, you can set an output folder for the dataset. The folder will be created inside the `datasets` folder. Try to choose a name that describes the scraped dataset.  
You can also set the maximum number of images you want to scrape.

```
CAUTION: Right now it is only possible to scrape one link at a time. Batch crawling of multiple links is still under development.
```
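
The two example links above follow different patterns: the search link carries the search term in its `q` parameter, while the board link consists of a user name followed by a board name. The sketch below tells them apart using Python's standard library. The function name `describe_pinterest_link` is hypothetical, and the patterns are inferred only from the two examples in this notebook, so other kinds of Pinterest links may not be recognised.

```python
from urllib.parse import urlparse, parse_qs


def describe_pinterest_link(link):
    """Return a (kind, detail) tuple for a Pinterest search or board link."""
    parsed = urlparse(link)
    segments = [part for part in parsed.path.split("/") if part]

    # Search links carry the search term in the `q` query parameter
    if segments[:1] == ["search"]:
        terms = parse_qs(parsed.query).get("q", [])
        return ("search", terms[0] if terms else "")

    # Board links look like /<user>/<board>/ (assumption based on the example)
    if len(segments) == 2:
        return ("board", f"{segments[0]}/{segments[1]}")

    return ("unknown", parsed.path)


search_info = describe_pinterest_link(
    "https://www.pinterest.de/search/pins/?q=typography&rs=typed&term_meta[]=typography%7Ctyped"
)
board_info = describe_pinterest_link(
    "https://www.pinterest.de/loxdelux/cover-editorial-design/"
)
print(search_info)  # ('search', 'typography')
print(board_info)   # ('board', 'loxdelux/cover-editorial-design')
```

A check like this can help you pick a fitting output folder name, such as the search term or the board name.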

In [3]:
CrawlerLinks = io_widget.CrawlerLinks()
CrawlerLinks

CrawlerLinks()

## 4. Validate your settings

Run the next cell to check whether all your settings and inputs are correct.
If everything is correct, you can continue with the next cell. If something is wrong, you will get an error message that tells you what is missing and how to fix it.

In [4]:
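# Each check below touches one setting. A NameError means the cell that
# creates the corresponding widget has not been executed yet.
Validation = io_widget.Validation()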

try:
    CrawlerLogin.CrawlerLoginUserName
except NameError:
    Validation.ValidationStatus = "error"
    Validation.ValidationMessage = "The Username for Pinterest is not set. Execute the cell under '2. Enter your Pinterest login credentials' again."

    display(Validation)
    raise

try:
    CrawlerLogin.CrawlerLoginPassword
except NameError:
    Validation.ValidationStatus = "error"
    Validation.ValidationMessage = "The Password for Pinterest is not set. Execute the cell under '2. Enter your Pinterest login credentials' again."

    display(Validation)
    raise

try:
    CrawlerLinks.CrawlerLinks
except NameError:
    Validation.ValidationStatus = "error"
    Validation.ValidationMessage = "No links to scrape are set. Execute the cell under '3. Add Pinterest Links' again."

    display(Validation)
    raise

try:
    CrawlerLinks.CrawlerImagesAmount
except NameError:
    Validation.ValidationStatus = "error"
    Validation.ValidationMessage = "The number of images is not set. Execute the cell under '3. Add Pinterest Links' again."

    display(Validation)
    raise

try:
    CrawlerLinks.CrawlerOutputFolder
except NameError:
    Validation.ValidationStatus = "error"
    Validation.ValidationMessage = "There is no output folder for the dataset defined. Execute the cell under '3. Add Pinterest Links' again."

    display(Validation)
    raise

Validation.ValidationStatus = "success"
Validation.ValidationMessage = "Everything seems good. Continue with the next cell."

display(Validation)

Validation(ValidationMessage='Everything seems good. Continue with the next cell.', ValidationStatus='success')
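
The checks in the cell above confirm that each setting exists, but they do not look at its value, so an empty user name or an empty list of links would still pass. As a rough sketch of a stricter check, the helper below reports every setting that is missing or empty. The function name `find_missing_settings` and the example values are hypothetical and not part of `io_widget`.

```python
def find_missing_settings(settings):
    """Return the names of all settings that are missing or empty."""
    missing = []
    for name, value in settings.items():
        # None, "", [] and 0 all count as "not set"
        if not value:
            missing.append(name)
    return missing


# Hypothetical values, standing in for what the widgets would hold
example_settings = {
    "username": "me@example.com",
    "password": "",
    "links": ["https://www.pinterest.de/loxdelux/cover-editorial-design/"],
    "images_amount": 100,
    "output_folder": "",
}

missing_settings = find_missing_settings(example_settings)
print(missing_settings)  # ['password', 'output_folder']
```

Collecting all problems in one list lets you report them together instead of stopping at the first one.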

## 5. Start scraping

Now that everything is configured, you are ready to start the scraping process! :D 

Execute the next cell and press the button "Start scraping".  
You will see a success message as soon as the scraper is done.

In [5]:
CrawlerInit = io_widget.CrawlerInit()
ScraperOutput = widgets.Output()

with ScraperOutput:
    display(CrawlerInit)

# Create the output directory (and its parent) if it does not exist yet
os.makedirs(os.path.join('/home/jovyan/datasets', CrawlerLinks.CrawlerOutputFolder), exist_ok=True)


def scraping(change):
    if change.new is True:
        with ScraperOutput:
            # Re-render widget to show process running
            CrawlerInit.CrawlerInitRunning = True
            clear_output()
            display(CrawlerInit)
        with HiddenOutput:
            !python ../utilities/pinterest-crawler/main.py -e '{CrawlerLogin.CrawlerLoginUserName}' -p '{CrawlerLogin.CrawlerLoginPassword}' -d '../datasets/{CrawlerLinks.CrawlerOutputFolder}' -l '{CrawlerLinks.CrawlerLinks[0]}' -g '250' -s '1024' -a '{CrawlerLinks.CrawlerImagesAmount}'

        with ScraperOutput:
            CrawlerInit.CrawlerInitRunning = False
            CrawlerInit.CrawlerInitFinished = True
            clear_output()
            display(CrawlerInit)


CrawlerInit.observe(scraping, names='CrawlerInitClick')

display(ScraperOutput)

Output()
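
The cell above places each value between single quotes inside a shell command, so a password or link that itself contains a single quote could break the call. One possible alternative, sketched here, is to build the command as a list of arguments, which needs no quoting at all. The function name `build_crawler_command` and the example values are hypothetical. The flags are copied from the cell above, and because this notebook does not explain `-g` and `-s`, their values are kept unchanged.

```python
def build_crawler_command(email, password, output_dir, link, max_images):
    """Build the crawler call as an argument list (no shell quoting needed)."""
    return [
        "python", "../utilities/pinterest-crawler/main.py",
        "-e", email,
        "-p", password,
        "-d", output_dir,
        "-l", link,
        "-g", "250",   # value copied from the cell above
        "-s", "1024",  # value copied from the cell above
        "-a", str(max_images),
    ]


command = build_crawler_command(
    email="me@example.com",   # placeholder
    password="pa'ss word",    # placeholder with awkward characters
    output_dir="../datasets/pinterest-editorial",
    link="https://www.pinterest.de/loxdelux/cover-editorial-design/",
    max_images=100,
)

# Each value stays one intact argument, whatever characters it contains
for argument in command:
    print(repr(argument))

# To actually start the crawler, one could pass the list to
# subprocess.run(command, check=True) instead of printing it.
```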

## 6. Explore the collected dataset

The following widget gives you an overview of the image dataset you just scraped.

By clicking the plus and minus buttons in the top-right corner of the widget, you can zoom in and out of the dataset.  
That way, you can take a close, detailed look at the dataset or get a general overview of it.

In [7]:
CrawlerGallery = io_widget.CrawlerGallery()

CrawlerGallery.CrawlerGalleryFolder = f'datasets/{CrawlerLinks.CrawlerOutputFolder}'
CrawlerGallery

CrawlerGallery(CrawlerGalleryFolder='datasets/pinterest-editorial')
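
Besides looking at the gallery, it can be useful to know how many images actually ended up in the folder. The sketch below counts image files by file extension. The helper name `count_images` and the list of extensions are assumptions, since this notebook does not state which file formats the crawler saves. The demo runs on a temporary folder with made-up file names; for your own dataset you would pass the path of your output folder instead.

```python
from collections import Counter
from pathlib import Path
import tempfile

# Assumed set of image formats; adjust it to what your dataset contains
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def count_images(folder):
    """Count image files in `folder`, grouped by lower-cased file extension."""
    counts = Counter()
    for path in Path(folder).iterdir():
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            counts[path.suffix.lower()] += 1
    return counts


# Demo on a temporary folder with made-up file names
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ["a.jpg", "b.JPG", "c.png", "notes.txt"]:
        (Path(demo_folder) / name).touch()
    demo_counts = count_images(demo_folder)

print(dict(demo_counts))
print("Total images:", sum(demo_counts.values()))
```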

## 7. How to continue

Well done, you created a dataset via Pinterest! :)  
You could use it directly to fine-tune StyleGAN3 on your dataset and create your first own AI model.  
You can find out more in this notebook:

- [Fine tune StyleGAN3](#)

Or, if you want to get creative with AI right away, you could use VQGAN-CLIP to create new images from text prompts.  
You can find out more in this notebook:

- [Use CLIP-guided VQGAN to generate images](http://localhost:8888/lab/workspaces/VQGAN/tree/notebooks/Use-CLIP-guided-VQGAN-to-generate-images.ipynb)

If you want to use this notebook again, click the reload button in the top-left corner of this editor window. After pressing it, you can start at the top of this document and execute it cell by cell.