# Introduction
## Description


# Web Scraping
## Description

The web scraping part consists of two scraping scripts that build on the Selenium framework. Selenium drives a browser programmatically, which helps to get past the blocking mechanisms that many websites and search engines apply to automated requests.

The first scraping script relies on web search results and on the SVG logos published on Wikipedia. The search is based on a CSV file of company names; for each name, the script searches for the associated SVG logo.
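
The first step of that flow, turning the CSV into search queries, can be sketched as follows. This is a minimal illustration and not the code of `wikipediaScraper`: the column name `name`, the query wording, and the helpers `load_company_names` and `build_search_url` are assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote_plus


def load_company_names(csv_file, column="name", limit=None):
    """Return the non-empty values of `column` from an open CSV file.

    The column name is an assumption; adjust it to the dataset at hand.
    """
    names = []
    for row in csv.DictReader(csv_file):
        name = (row.get(column) or "").strip()
        if name:
            names.append(name)
        if limit is not None and len(names) >= limit:
            break
    return names


def build_search_url(company_name):
    """Build a Wikipedia search URL for '<company> logo svg' (illustrative query)."""
    query = quote_plus(f"{company_name} logo svg")
    return f"https://en.wikipedia.org/w/index.php?search={query}"


# Tiny in-memory stand-in for the company CSV; the second row has no name and is skipped.
sample = io.StringIO("name,domain\nacme corp,acme.com\n,missing.com\nglobex,globex.com\n")
company_names = load_company_names(sample)
search_urls = [build_search_url(name) for name in company_names]
```

Each resulting URL could then be opened with the Selenium driver; rows without a company name are skipped so that no empty query is sent.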

The second scraping script relies on the [Worldvectorlogo website](https://worldvectorlogo.com), which collects SVG logos of companies. The script downloads the SVG logos from the website in alphabetical order.

## Web Scraping Wikipedia
### Prerequisite
1. Download the Chromedriver version that matches the installed Chrome version from [here](https://chromedriver.chromium.org/downloads)
2. Download the company name dataset from [Kaggle](https://www.kaggle.com/datasets/peopledatalabssf/free-7-million-company-dataset)
### Input
- datasetPath: Path to the Kaggle dataset in format dir/.../file
- webdriverPath: Path to Chromedriver in format dir/.../file
- logPath: Path to desired logging file in format dir/.../file
- destPath: Path to Folder where SVG Files should be stored in format dir/.../dir

### Output
- SVG File of scraped logo stored in specified destPath
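
Because a scraped URL can return an HTML error page instead of a logo, it may be worth checking a download before writing it to destPath. The following is a minimal sketch of such a check using only the standard library; `is_svg` is a hypothetical helper and not part of `wikipediaScraper`.

```python
import xml.etree.ElementTree as ET

SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def is_svg(payload: bytes) -> bool:
    """Return True if `payload` parses as XML with an <svg> root element."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return False
    # The root tag is '{namespace}svg' when a namespace is declared, plain 'svg' otherwise.
    return root.tag in (f"{{{SVG_NAMESPACE}}}svg", "svg")


good = b'<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"></svg>'
bad = b"<html><body>Not found</body></html>"
```

Here `is_svg(good)` is true and `is_svg(bad)` is false, so only the first payload would be stored.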

In [None]:
from WebScraping.SVGWebScraper import wikipediaScraper
wikipediaScrape = wikipediaScraper('Data/companies_sorted.csv', 'WebScraping/chromedriver',
                                    'WebScraping/ErrorLog.txt', 'Data/SVGLogo')
wikipediaScrape.scraper()


## Web Scraping Worldvector
### Prerequisite
1. Download the Chromedriver version that matches the installed Chrome version from [here](https://chromedriver.chromium.org/downloads)
### Input
- webdriverPath: Path to Chromedriver in format dir/.../file
- destPath: Path to Folder where SVG Files should be stored in format dir/.../dir
### Output
- SVG File of scraped logo stored in specified destPath

In [None]:
from WebScraping.WorldvectorScraper import worldvectorScraper
worldvectorScrape = worldvectorScraper(
    'WebScraping/chromedriver', 'Data/Worldvector')
worldvectorScrape.scraper()

# Labelling

## Description
The labelling part comprises three labelling strategies:
1. Rule-based labelling
2. Network-based labelling
3. MLGCN

## Rule-based labelling
### Input
- pngFolderPath: Path to the folder containing the PNG files in format dir/.../dir
- textAreaThreshold: Threshold that determines when the label text is assigned to an image (**default**: 0.55)
- destPath: Path where the labels will be stored as a pickle file in format dir/.../file
### Output
- Pickle file where rule-based labelling is stored
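
One plausible reading of textAreaThreshold is a cut-off on the share of the image area that is covered by detected text. The sketch below illustrates that reading only; the actual rules inside `ruleLogoLabelling` are not shown here, and the helper names, the box format `(x0, y0, x1, y1)`, and the label `'icon'` are assumptions.

```python
def text_area_ratio(image_size, text_boxes):
    """Fraction of the image area covered by text boxes (overlapping boxes count twice)."""
    width, height = image_size
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    covered = sum(max(0, x1 - x0) * max(0, y1 - y0) for x0, y0, x1, y1 in text_boxes)
    # Cap at 1.0 because overlapping boxes can push the sum above the image area.
    return min(1.0, covered / (width * height))


def rule_label(image_size, text_boxes, text_area_threshold=0.55):
    """Assign the hypothetical label 'text' once the ratio reaches the threshold."""
    ratio = text_area_ratio(image_size, text_boxes)
    return "text" if ratio >= text_area_threshold else "icon"


# A 100x100 image where text covers 60% of the area, and one where it covers 25%.
wordmark = rule_label((100, 100), [(0, 0, 100, 60)])
symbol = rule_label((100, 100), [(0, 0, 50, 50)])
```

With the default of 0.55, the first image is labelled `'text'` and the second is not.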

In [None]:
from Labelling.RuleBasedLogoDetection import ruleLogoLabelling
ruleLogo = ruleLogoLabelling('Data/PNGFolder', 0.55, 'Labelling/RuleBasedLogo.pkl')
ruleLogo.ruleLabelling()

## Google Vision API
### Input
- credentialsPath: Path to JSON credentials file in format dir/.../file
- imagePath: Path to folder where images are located in format dir/.../dir
- destPath: Path where generated Labels are stored
### Output
- CSV File containing the labels from the Google Cloud Vision API

In [None]:
from Labelling.GoogleVisionAPI import visionApi
credentialsPath = ''
imagePath = ''
destPath = ''
googleVision = visionApi(credentialsPath, imagePath, destPath)
googleVision.api()

## Network-based labelling
### Input
- inputPath: Path to image label dataset in format dir/.../file
- graphConstruction: Boolean value if graph should be constructed (default must be **True**)
- graphSimplify: Boolean value if graph should be simplified
- graphClustering: Boolean value if graph should be clustered
- graphVisualize: Boolean value if graph should be visualized
### Output
- Adjacency Matrix of graph in format pickle
- Majority Vote CSV file from results of clustering
- Weighted Majority Vote CSV file from results of clustering
- Graph Visualization in format HTML
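
To make the adjacency matrix output more concrete, the sketch below builds a label co-occurrence matrix from per-image label lists and round-trips it through pickle. It assumes that two labels are connected whenever they appear on the same image, which may differ from the construction used in `networkLabelling`; `build_label_graph` and the sample labels are hypothetical.

```python
import pickle
from itertools import combinations


def build_label_graph(image_labels):
    """Return (labels, adjacency); adjacency[i][j] counts images that carry both labels."""
    labels = sorted({label for per_image in image_labels.values() for label in per_image})
    index = {label: i for i, label in enumerate(labels)}
    adjacency = [[0] * len(labels) for _ in labels]
    for per_image in image_labels.values():
        # Each unordered label pair of an image adds one to the symmetric matrix.
        for a, b in combinations(sorted(set(per_image)), 2):
            adjacency[index[a]][index[b]] += 1
            adjacency[index[b]][index[a]] += 1
    return labels, adjacency


image_labels = {
    "logo_1.png": ["font", "circle"],
    "logo_2.png": ["font", "circle", "red"],
    "logo_3.png": ["red"],
}
labels, adjacency = build_label_graph(image_labels)

# The matrix can be serialized with pickle and restored unchanged.
restored = pickle.loads(pickle.dumps(adjacency))
```

The row and column order follows the sorted label list, so the label list has to be stored or reproducible alongside the matrix.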

In [None]:
from Labelling.NetworkLabels import networkLabelling
networkLabelling('FeatureCreation/LLD_GoogleLabels.csv',
                 True, True, True, True, 'Labelling/logo_adj.pkl', False)
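
The two vote outputs can be illustrated with a short sketch. It assumes that every label of an image has been mapped to a cluster and, for the weighted variant, that each label carries a weight such as a confidence score; both helpers are hypothetical and not taken from `networkLabelling`.

```python
from collections import Counter, defaultdict


def majority_vote(label_clusters):
    """Return the cluster that most of an image's labels belong to."""
    # On a tie, Counter.most_common keeps the cluster that was seen first.
    return Counter(label_clusters).most_common(1)[0][0]


def weighted_majority_vote(label_clusters, weights):
    """Return the cluster with the highest summed label weight."""
    totals = defaultdict(float)
    for cluster, weight in zip(label_clusters, weights):
        totals[cluster] += weight
    return max(totals, key=totals.get)


# Three labels of one image: two fall into cluster 0, one high-weight label into cluster 1.
clusters = [0, 0, 1]
weights = [0.2, 0.2, 0.9]
plain = majority_vote(clusters)
weighted = weighted_majority_vote(clusters, weights)
```

In this example the plain vote picks cluster 0, while the weighted vote picks cluster 1 because its single label outweighs the other two.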

## MLGCN
### Prerequisite
1. Run the networkLabelling method beforehand to create the adjacency matrix. The matrix must cover all considered labels, so consider leaving the constructed graph unsimplified.
2. Run the word embedding method to transform the used labels into a vector space
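
Since the adjacency matrix and the label embeddings have to cover the same labels, a quick consistency check before training can save a failed run. The sketch below assumes the matrix is a list of rows and the embeddings are a dict from label to vector; the actual pickle layouts may differ, and `check_mlgcn_inputs` is a hypothetical helper.

```python
def check_mlgcn_inputs(labels, adjacency, embeddings):
    """Raise ValueError if the adjacency matrix or the embeddings do not cover all labels."""
    n = len(labels)
    if len(adjacency) != n or any(len(row) != n for row in adjacency):
        raise ValueError(f"adjacency matrix must be {n}x{n}")
    missing = [label for label in labels if label not in embeddings]
    if missing:
        raise ValueError(f"labels without embedding: {missing}")
    # All label vectors must live in the same vector space.
    if len({len(embeddings[label]) for label in labels}) > 1:
        raise ValueError("embeddings must share one dimension")


example_labels = ["circle", "font"]
example_adjacency = [[0, 2], [2, 0]]
example_embeddings = {"circle": [0.1, 0.2], "font": [0.3, 0.4]}
check_mlgcn_inputs(example_labels, example_adjacency, example_embeddings)  # passes silently
```

A simplified graph that dropped a label would fail the first check, because its matrix would be smaller than the label list.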
### Input
- mlgcnSettings:
    - data: Folder where image data is located in format dir/.../dir
    - image_size: Size of image given as an Integer
    - workers: number of workers used for dataloader 
    - epochs: Number of epochs to train
    - epoch_step: Number of steps within epoch
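    - device_ids: ID of the GPU on which the MLGCN model is trained
    - start_epoch: Epoch number at which training starts
    - batch_size: Number of images per batch
    - lr: Learning rate
    - lrp: Learning rate for the pretrained layers (as in the original ML-GCN implementation)
    - momentum: Momentum of the optimizer
    - weight_decay: Weight decay for the model
    - print-freq: Frequency for log printing
    - resume: Model checkpoint in format dir/.../file
    - evaluate: Boolean flag whether the model is run in evaluation mode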
- dataPath: Dataset with extracted Google Vision API Labels in csv Format with path format dir/.../file
- labelEmbeddingPath: Label embedding created from the label embedding method in pkl format
- adjPath: Path to the network adjacency matrix created from the networkLabelling method
- checkpointPath: Path to the logging folder in format dir/.../dir
### Output
- MLGCN labelling file in CSV format containing the assigned cluster numbers

In [None]:
from Labelling.MLGCN.train import mlgcnTrain
from Labelling.MLGCN.vectorCreation import mlgcnVector
trainMLGCN = False
mlgcnSettings = {
    'data': 'LogoData/',
    'image_size': 448,
    'workers': 2,
    'epochs': 20,
    'epoch_step': 30,
    'device_ids': 0,
    'start_epoch': 0,
    'batch_size': 8,
    'lr': 0.1,
    'lrp': 0.1,
    'momentum': 0.9,
    'weight_decay': 0.0001,
    'print-freq': 10,
    'resume': None,
    'evaluate': '0'
    }
# Optionally train the model first, using Labelling/MLGCN/Checkpoint as checkpoint folder.
if trainMLGCN:
    mlgcnTrain(mlgcnSettings, 'FeatureCreation/LLD_GoogleLabels.csv',
               'FeatureCreation/logo_label_embedding.pkl', 'FeatureCreation/logo_adj.pkl', 'Labelling/MLGCN/Checkpoint')
# Create the labelling file from the best checkpoint, whether it was just trained or already existed.
mlgcnVector(mlgcnSettings, 'Labelling/MLGCN/Checkpoint/model_best.pth.tar', 'FeatureCreation/LLD_GoogleLabels.csv',
            'FeatureCreation/logo_adj.pkl', 'Labelling/MLGCN/MLGCNLabelling.csv')

## Final Model
### Description
The following cell describes the final model which produces the results from the 
### Input
- imageType: Type of image that is generated for the demonstration ('font', 'icon' or 'merge')

In [7]:
from ImageCombination.saliencyCombination import saliencyCombine
from PNG_SVG_Conversion.PNGtoSVG import PNGtoSVGConv

imageType = 'merge'
if imageType == 'font':
    !python Model/Font_svg/fontmerge.py University 0000
elif imageType == 'icon':
    # generate image
    !python Model/GANs/StyleGAN2/stylegan2_ada_pytorch/generate.py --class=1 --network=Model/GANs/StyleGAN2/network-snapshot-003400.pkl --outdir=Model/GANs/StyleGAN2/results/Martin --trunc=1 --seeds=30
    # transform to svg
    PNGtoSVGConv('ImageCombination/TestResult/Test.png', 'ImageCombination/TestSVG/TestRes.png',
                 'ImageCombination/TestSVG/quant.png', 'ImageCombination/TestSVG/svg.svg', '/Users/martinbockling/.cargo/bin/vtracer')

elif imageType == 'merge':
    # generate font
    !python Model/Font_svg/fontmerge.py CNN 1420 png
    # generate image
    # !python Model/GANs/StyleGAN2/stylegan2_ada_pytorch/generate.py --class=1 --network=Model/GANs/StyleGAN2/network-snapshot-003400.pkl --outdir=Model/GANs/StyleGAN2/results/Martin --trunc=1 --seeds=30
    # merge image
    saliencyCombine(
        'Model/GANs/generated_img/StyleGAN2/imagesMV/0/seed0104.png', 'CNN1420.png', 'ImageCombination/TestResult/Test.png', 'above', False)
    # convert to svg
    PNGtoSVGConv('ImageCombination/TestResult/Test.png', 'ImageCombination/TestSVG/TestRes.png',
                 'ImageCombination/TestSVG/quant.png', 'ImageCombination/TestSVG/svg.svg', '/Users/martinbockling/.cargo/bin/vtracer')
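
Outside a notebook, the `generate.py` call can also be assembled as an argument list, which keeps each `--flag=value` pair intact. The sketch below only builds and prints the command; `build_generate_command` is a hypothetical helper, and the flags and paths are the ones used in the cell above.

```python
import shlex


def build_generate_command(network, outdir, seeds, class_idx=1, trunc=1):
    """Return the generate.py call as an argument list suitable for subprocess.run."""
    return [
        "python",
        "Model/GANs/StyleGAN2/stylegan2_ada_pytorch/generate.py",
        f"--class={class_idx}",
        f"--network={network}",
        f"--outdir={outdir}",
        f"--trunc={trunc}",
        f"--seeds={seeds}",
    ]


command = build_generate_command(
    "Model/GANs/StyleGAN2/network-snapshot-003400.pkl",
    "Model/GANs/StyleGAN2/results/Martin",
    seeds=30,
)
# shlex.join renders the list as a single shell-safe command line for logging.
command_line = shlex.join(command)
print(command_line)
```

Passing `command` to `subprocess.run(command, check=True)` would execute it, provided the StyleGAN2 repository and the network snapshot exist at these paths.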
