# Scraping Data with Pinterest

This notebook enables you to create an image dataset from Pinterest.

Creating and curating datasets is an important part of working with AI. Datasets enable AI models to learn certain things, so by creating a dataset you determine what the AI model should learn.  
By creating and curating an image dataset, designers can train an AI model that matches their visual imagination. That is why creating and curating image datasets is so important and interesting for designers.  
In the end, you can use the dataset created in this notebook to train or fine-tune an AI model. More about this under "7. How to continue".

Pinterest is a powerful visual search engine that lets you search through millions of images. It also enables users to create collections of images, which are called 'Boards'.  
Pinterest uses clever algorithms to find good visual results for a search term, which makes it well suited to creating curated image datasets.  
You can find out more about Pinterest by following [this link](https://about.pinterest.com/).


Run the next cell to initialize the notebook and install all necessary requirements.

## 1. Set requirements

In [1]:
# -----------
# IMPORTS
# -----------
import os
from IPython.display import display
import ipywidgets as widgets
import svelte_widget

# -----------
# UTILITIES
# -----------
# Clone the crawler only if it is not already present
if not os.path.isdir('/home/jovyan/utilities/pinterest-crawler/'):
    !git clone https://github.com/Francesco-Sch/Pinterest-crawler-for-jupyter-lab /home/jovyan/utilities/pinterest-crawler

# Install the crawler's requirements once the folder exists
if os.path.isdir('/home/jovyan/utilities/pinterest-crawler/'):
    !pip install -r /home/jovyan/utilities/pinterest-crawler/requirements.txt



## 2. Enter your Pinterest login credentials

The Pinterest scraper needs access to Pinterest, so you have to enter user credentials that the scraper can use to log in.
Do not worry, the credentials are only saved temporarily on the server.

Hint: You could also create a new Pinterest account just for data collection purposes. ;)

CAUTION: This works only if two-factor authentication (2FA) is deactivated.

In [2]:
CrawlerLogin = svelte_widget.CrawlerLogin()
CrawlerLogin

CrawlerLogin()

## 3. Add Pinterest Links

With the following widgets you can configure all settings for the scraping process.  
First, add at least one Pinterest link. You can add more links if you want to scrape several sources and create a bigger image dataset. Any Pinterest link works, for example the results of a search or a board you collected.  
Here are two example links you could use:

Link for the search term 'Typography':
> https://www.pinterest.de/search/pins/?q=typography&rs=typed&term_meta[]=typography%7Ctyped

Link to a Pinterest board about editorial design:
> https://www.pinterest.de/loxdelux/cover-editorial-design/

After adding your Pinterest links, you can set an output folder for the dataset. You will find the folder in the `datasets` folder. Try to choose a name that represents the scraped dataset in some way.  
You can also set the maximum number of images you want to scrape.

After you have set all the settings, you can continue with the next section.
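
Before pasting links into the widget, it can help to check which kind of Pinterest link you have. The following is a minimal sketch and not part of the crawler: `classify_pinterest_link` is a hypothetical helper, and its rules (search links start with `/search/`, board links look like `/<user>/<board>/`) are assumptions derived only from the two example links above.

```python
from urllib.parse import urlparse, parse_qs


def classify_pinterest_link(url):
    """Return 'search', 'board', or 'unknown' for a link (hypothetical helper)."""
    parsed = urlparse(url)
    # Accept hosts such as www.pinterest.de or pinterest.com (assumption).
    if "pinterest" not in parsed.netloc.lower().split("."):
        return "unknown"
    segments = [s for s in parsed.path.split("/") if s]
    if segments and segments[0] == "search":
        # Search links carry the search term in the `q` query parameter.
        return "search" if "q" in parse_qs(parsed.query) else "unknown"
    if len(segments) == 2:
        # Board links in the example above have the form /<user>/<board>/.
        return "board"
    return "unknown"


print(classify_pinterest_link(
    "https://www.pinterest.de/search/pins/?q=typography&rs=typed&term_meta[]=typography%7Ctyped"
))
print(classify_pinterest_link("https://www.pinterest.de/loxdelux/cover-editorial-design/"))
```

A link classified as `unknown` is not necessarily unusable; it only means it does not match the two patterns shown here.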

In [5]:
CrawlerLinks = svelte_widget.CrawlerLinks()
CrawlerLinks

CrawlerLinks()

## 4. Validate your settings

Run the next cell to check whether all your settings and inputs are correct.
If everything is correct, you can continue with the scraping process under “5. Start scraping”. If something is wrong, you’ll get an error message that tells you what is missing and how to fix it.

In [6]:
Validation = svelte_widget.Validation()

try:
    CrawlerLogin.CrawlerLoginUserName
except NameError:
    Validation.ValidationStatus = "error"
    Validation.ValidationMessage = "The Username for Pinterest is not set. Execute the cell under '2. Enter your Pinterest login credentials' again."

    display(Validation)
    raise

try:
    CrawlerLogin.CrawlerLoginPassword
except NameError:
    Validation.ValidationStatus = "error"
    Validation.ValidationMessage = "The Password for Pinterest is not set. Execute the cell under '2. Enter your Pinterest login credentials' again."

    display(Validation)
    raise

try:
    CrawlerLinks.CrawlerLinks
except NameError:
    Validation.ValidationStatus = "error"
    Validation.ValidationMessage = "There are no links set, which can be scraped. Execute the cell under '3. Add Pinterest Links' again."

    display(Validation)
    raise

try:
    CrawlerLinks.CrawlerImagesAmount
except NameError:
    Validation.ValidationStatus = "error"
    Validation.ValidationMessage = "There is no amount of images set. Execute the cell under '3. Add Pinterest Links' again."

    display(Validation)
    raise

try:
    CrawlerLinks.CrawlerOutputFolder
except NameError:
    Validation.ValidationStatus = "error"
    Validation.ValidationMessage = "There is no output folder for the dataset defined. Execute the cell under '3. Add Pinterest Links' again."

    display(Validation)
    raise

Validation.ValidationStatus = "success"
Validation.ValidationMessage = "Everything seems good. Continue with the next cell."

display(Validation)

Validation(ValidationMessage='Everything seems good. Continue with the next cell.', ValidationStatus='success')
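
The cell above raises only when a widget variable was never defined, so an empty input would still pass. As a complement, here is a minimal sketch that checks the values themselves. `collect_setting_errors` is a hypothetical helper that takes plain values, so it makes no assumption about how the widgets store them; treating empty values as missing is also an assumption.

```python
def collect_setting_errors(username, password, links, images_amount, output_folder):
    """Return a list of readable problems; an empty list means all checks passed."""
    errors = []
    if not username:
        errors.append("The username for Pinterest is not set.")
    if not password:
        errors.append("The password for Pinterest is not set.")
    if not links:
        errors.append("There are no links set that can be scraped.")
    # The amount may arrive as a number or as text, so convert defensively.
    try:
        amount_ok = int(images_amount) > 0
    except (TypeError, ValueError):
        amount_ok = False
    if not amount_ok:
        errors.append("The number of images must be a positive number.")
    if not output_folder:
        errors.append("There is no output folder defined for the dataset.")
    return errors


print(collect_setting_errors(
    "me@example.com", "secret",
    ["https://www.pinterest.de/loxdelux/cover-editorial-design/"], 100, "editorial",
))
print(collect_setting_errors("", "secret", [], "abc", ""))
```

In this notebook, the arguments would be values such as `CrawlerLogin.CrawlerLoginUserName` and `CrawlerLinks.CrawlerOutputFolder`.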

## 5. Start scraping

If the validation passed, you can click the “Start scraping” button to start the scraping process. A progress bar shows how far the scraping has progressed.

While the scraping is running, the results will appear in the gallery widget below.

In [7]:
CrawlerInit = svelte_widget.CrawlerInit()
ScraperOutput = widgets.Output()

# Create the output directory if it does not exist yet
os.makedirs(os.path.join('/home/jovyan/datasets', CrawlerLinks.CrawlerOutputFolder), exist_ok=True)


def scraping(change):
    if change.new is True:
        with ScraperOutput:
            !python ../utilities/pinterest-crawler/main.py -e '{CrawlerLogin.CrawlerLoginUserName}' -p '{CrawlerLogin.CrawlerLoginPassword}' -d '../datasets/{CrawlerLinks.CrawlerOutputFolder}' -l '{CrawlerLinks.CrawlerLinks[0]}' -g '250' -s '1024' -a '{CrawlerLinks.CrawlerImagesAmount}'


CrawlerInit.observe(scraping, names='CrawlerInitClick')

display(CrawlerInit, ScraperOutput)

CrawlerInit()

Output()
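
The cell above passes only the first link (`CrawlerLinks.CrawlerLinks[0]`) to the crawler. If you added several links, one possible approach is to run the crawler once per link. The sketch below only builds the argument lists and does not start anything. `build_crawler_commands` is a hypothetical helper; the flags and the values `250` and `1024` are copied from the command above, and their exact meaning is not verified here. Using `sys.executable` instead of `python` is also an assumption.

```python
import sys


def build_crawler_commands(username, password, output_dir, links, images_amount,
                           script="../utilities/pinterest-crawler/main.py"):
    """Build one argument list per link (hypothetical helper, nothing is executed)."""
    commands = []
    for link in links:
        commands.append([
            sys.executable, script,
            "-e", username,
            "-p", password,
            "-d", output_dir,
            "-l", link,
            "-g", "250",   # value copied from the notebook cell above
            "-s", "1024",  # value copied from the notebook cell above
            "-a", str(images_amount),
        ])
    return commands


commands = build_crawler_commands(
    "me@example.com", "secret", "../datasets/typography",
    ["https://www.pinterest.de/loxdelux/cover-editorial-design/"], 100,
)
print(len(commands), commands[0][2:4])
```

Each list could then be passed to `subprocess.run(command, check=True)`. Passing the arguments as a list avoids shell quoting problems, for example with a password that contains a quote character.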

## 6. Explore the collected dataset

With the following widget, you get an overview of the image dataset you just scraped.  
By clicking the plus and minus buttons in the top right corner of the widget, you can zoom in and out of the dataset. That way you can take a close, detailed look at the dataset or get a general overview of it.

In [8]:
CrawlerGallery = svelte_widget.CrawlerGallery()

CrawlerGallery.CrawlerGalleryFolder = 'datasets/brutalist-interface-design'
CrawlerGallery

CrawlerGallery(CrawlerGalleryFolder='datasets/brutalist-interface-design')
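
Besides the visual overview, a quick count of the collected files can show whether the scrape reached the expected size. This is a minimal sketch: `count_images` is a hypothetical helper, and the set of file extensions is an assumption about what the crawler saves.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to what the crawler actually writes.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def count_images(folder):
    """Count image files per extension in `folder` and its subfolders."""
    counts = Counter()
    root = Path(folder)
    if not root.is_dir():
        # A missing folder simply yields an empty count.
        return counts
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.suffix.lower()] += 1
    return counts


counts = count_images("datasets/brutalist-interface-design")
print(sum(counts.values()), dict(counts))
```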

## 7. How to continue

Well done, you created a dataset via Pinterest! :)  
You could use it directly to fine-tune StyleGAN3 on your dataset and create your first own AI model.  
You can find out more in this notebook:

- [Fine-tune StyleGAN3](#)

Or, if you want to get creative with AI right away, you could use VQGAN-CLIP to create new images through text prompts.  
You can find out more in this notebook:

- [Use CLIP-guided VQGAN to generate images](#)