<a href="https://colab.research.google.com/github/Leo-2017/DL101/blob/main/Pattern_Player.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pattern Player 🎹

## **1. Introduction**

## a. What business problem are you solving?

A text generator that helps people without design training design the wallpaper for their own homes.

## b. How can AI/ML technology help to solve that?

AI/ML can help by taking the GPT-2 model (pretrained on a dataset of 8 million web pages) as a foundation and fine-tuning it for text generation on books from Project Gutenberg, so that its output better fits this use case.

## c. What is your AI project's objective?

The objective is to build a language model that acts as a pattern-design assistant: from a user's request, it predicts text suggesting what kind of wallpaper the user would like to have.

## d. Who are the users of your AI application and how will they interact with it? 

The users of the AI application are non-design professionals who want to design and decorate the wallpaper in their own spaces (e.g. their homes) by themselves.
<br>
<br>
Users simply input **keywords** or a **short description**, such as **"How to decorate the wall in living room with Botanical pattern in william morris style?"**, and the application generates a suggestion based on the dataset.
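The keyword option could work by filling a fixed question template, so that a few keywords become a prompt with the same shape as the example above. This is a minimal sketch; `build_prompt` and its template are hypothetical and not part of the notebook:

```python
def build_prompt(pattern: str, style: str, room: str = "living room") -> str:
    """Turn a few user keywords into a question-style prompt for the text generator.

    Hypothetical helper: the notebook itself only passes hand-written prompts.
    """
    # Collapse stray whitespace so the prompt matches the hand-written examples
    pattern, style, room = (" ".join(s.split()) for s in (pattern, style, room))
    return f"How to decorate the wall in {room} with {pattern} pattern in {style} style?"


print(build_prompt("Botanical", "william morris"))
```

Called with `"Botanical"` and `"william morris"`, it reproduces the example prompt above exactly.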

## **2. Benchmark Study**

**Similar existing AI solutions** <br>

Below are the pattern generators I found online. Most of them work in one of two ways: <br>
A. the user uploads an asset (image, illustration, etc.) and the application generates a pattern based on it; or <br>
B. the user picks a pattern from the application's own collection and then uploads a photo of their home to preview it.

**Picsart** <br>
https://picsart.com/pattern-generator <br>

**Patternico** <br>
https://patternico.com/

**Wowpatterns** <br>
https://www.wowpatterns.com/pattern-generator

**Laurentvw** <br>
https://www.laurentvw.com/tools/seamless-pattern-generator/

**Designhill** <br>
https://www.designhill.com/tools/wallpaper-maker

**Patterncooler** <br>
https://www.patterncooler.com/

**Astreetprints** <br>
https://www.astreetprints.com/room-previewer

---
None of the applications above is exactly what I'm looking for. <br>
The following use cases are very similar to what I would like to build with my model:

**DALL·E mini** <br>
https://huggingface.co/spaces/dalle-mini/dalle-mini <br>
https://www.craiyon.com/ <br>
<img src="https://drive.google.com/file/d/1Sy2Zzs8vZOjtu3WeWhsC7B-V5r6cHm4V/view?usp=sharing" alt="dallemini" width="300"/> <br>
**Imagen** <br>
https://imagen.research.google/ <br>

---
Due to limited time and my limited knowledge, the MVP will only be a text generator.

# **3. Data**

## a. Libraries & Dependencies needed

In [None]:
!pip install -Uqq unpackai
# !pip install -q git+https://github.com/unpackai/unpackai
!pip install -Uqq transformers==4.10.2
!pip install -q datasets transformers[sentencepiece]


In [None]:
import torch
import numpy as np
from transformers import (
    AutoTokenizer,
    AutoModel,
    pipeline,
    set_seed,
    Trainer,
    TextDataset,
    DataCollatorForLanguageModeling,
    TrainingArguments,
    AutoModelForSequenceClassification,
)
from unpackai.nlp import Textual, InterpEmbeddingsTokenizer
from ipywidgets import interact
import logging
from fastai import *

## Build a Language Model

### a. What's your data? Is it relevant to AI project you are intending to build? Is it inclusive, complete and unbiased?


#### The dataset I'm using to train this model is collected from [Project Gutenberg](https://gutenberg.org/), a collection of over 60,000 free eBooks.
**Dataset:** 
<br>
1. [Arts and Crafts Essays by members of Arts and Crafts Exhibition Society](https://gutenberg.org/ebooks/36250) Subject: Decorative arts <br>
2. [Principles of Decorative Design by Christopher Dresser](https://gutenberg.org/ebooks/39749) Subject: Decorative arts Decoration and ornament -- Victorian style<br>
3. [The Botanical Magazine, Vol. 02 by William Curtis](https://gutenberg.org/ebooks/17531) Subject: Plants, Cultivated/Ornamental/ Botanical illustration/Periodicals <br>
4. [The Principles of Ornament by James Ward](https://gutenberg.org/ebooks/60034) Subject: Decoration and ornament <br>
5. [The Botanical Magazine, Vol. 08 by William Curtis](https://gutenberg.org/ebooks/24670) Subject: Plants, Cultivated/Ornamental/ Botanical illustration/Periodicals <br>
6. [Evolution in Art: As Illustrated by the Life-histories of Designs by Haddon](https://gutenberg.org/ebooks/46079) Subject: Decoration and ornament, Primitive <br>
7. [Hardy Ornamental Flowering Trees and Shrubs by Angus D. Webster](https://gutenberg.org/ebooks/10852) Subject: Flowering shrubs <br>
8. [Origin and Development of Form and Ornament in Ceramic Art by William Henry Holmes](https://gutenberg.org/ebooks/19953) Subject: Decoration and ornament -- History, Indian art -- North America <br>
9. [Talks About Flowers by Mrs. M. D. Wellcome](https://gutenberg.org/ebooks/40534) Subject: Flowers <br>
10. [Historic Ornament, Vol. 1 (of 2) by James Ward](https://gutenberg.org/ebooks/59746) Subject: Decorative arts -- History <br>
11. [Historic Ornament, Vol. 2 (of 2) by James Ward](https://gutenberg.org/ebooks/59971) Subject: Decorative arts -- History <br>

---

#### Most of the books collected are relevant to the AI project I'm intending to build. I cannot guarantee that the dataset is inclusive, complete and unbiased: many relevant books online are not free to use, so I have included every suitable book I could find on Project Gutenberg.

### b. Collect and construct your dataset

To collect and build my dataset I use the `Textual` helper from `unpackai.nlp`, which downloads the text found at a URL.

**Collect text from a URL.**


Below are all the URLs I collected from [Project Gutenberg](https://gutenberg.org/).
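Project Gutenberg plain-text files wrap each book in header and licence text that is not part of the book itself. Whether `Textual.from_url` already removes it is not shown here; if it does not, a minimal sketch like the following could strip it before training. The marker wording is an assumption based on common Gutenberg files and varies between releases, so the patterns are loose and the text is left unchanged on a side where no marker is found. `strip_gutenberg_boilerplate` is a hypothetical helper:

```python
import re

# Assumption: the book body sits between marker lines such as
# "*** START OF THE PROJECT GUTENBERG EBOOK ... ***" and "*** END OF THE PROJECT GUTENBERG EBOOK ... ***"
# (older files say "THIS" instead of "THE"); matching is case-insensitive.
START_RE = re.compile(
    r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*",
    re.IGNORECASE | re.DOTALL,
)
END_RE = re.compile(r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK", re.IGNORECASE)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return only the book body; if a marker is missing, keep that side of the text."""
    start = START_RE.search(text)
    if start:
        text = text[start.end():]  # drop everything up to and including the START marker
    end = END_RE.search(text)
    if end:
        text = text[:end.start()]  # drop the END marker and the licence after it
    return text.strip()
```

In the notebook it would be applied to each downloaded book, e.g. `strip_gutenberg_boilerplate(ArtsandCraftsEssays.text)`.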


In [None]:
ArtsandCraftsEssays = Textual.from_url("https://gutenberg.org/cache/epub/36250/pg36250.txt")
ArtsandCraftsEssays

Text (393777 chars), textual(),
    train_path, val_path = textual.create_train_val()

In [None]:
PrinciplesofDecorativeDesignbyChristopherDresser = Textual.from_url("https://gutenberg.org/cache/epub/39749/pg39749.txt")
PrinciplesofDecorativeDesignbyChristopherDresser

Text (414489 chars), textual(),
    train_path, val_path = textual.create_train_val()

In [None]:
TheBotanicalMagazineVol02byWilliamCurtis = Textual.from_url("https://gutenberg.org/cache/epub/17531/pg17531.txt")
TheBotanicalMagazineVol02byWilliamCurtis

Text (80332 chars), textual(),
    train_path, val_path = textual.create_train_val()

In [None]:
ThePrinciplesofOrnamentbyJamesWard = Textual.from_url("https://gutenberg.org/files/60034/60034-0.txt")
ThePrinciplesofOrnamentbyJamesWard

Text (234063 chars), textual(),
    train_path, val_path = textual.create_train_val()

In [None]:
TheBotanicalMagazineVol08byWilliamCurtis = Textual.from_url("https://gutenberg.org/cache/epub/24670/pg24670.txt")
TheBotanicalMagazineVol08byWilliamCurtis

Text (103125 chars), textual(),
    train_path, val_path = textual.create_train_val()

In [None]:
EvolutionInArt = Textual.from_url("https://gutenberg.org/files/46079/46079-0.txt")
EvolutionInArt

Text (667372 chars), textual(),
    train_path, val_path = textual.create_train_val()

In [None]:
HardyOrnamentalFloweringTreesandShrubsbyAngusDWebster = Textual.from_url("https://gutenberg.org/cache/epub/10852/pg10852.txt")
HardyOrnamentalFloweringTreesandShrubsbyAngusDWebster

Text (370515 chars), textual(),
    train_path, val_path = textual.create_train_val()

In [None]:
OriginandDevelopmentofFormandOrnamentinCeramicArtbyWilliamHenryHolmes = Textual.from_url("https://gutenberg.org/cache/epub/19953/pg19953.txt")
OriginandDevelopmentofFormandOrnamentinCeramicArtbyWilliamHenryHolmes

Text (72466 chars), textual(),
    train_path, val_path = textual.create_train_val()

In [None]:
TalksAboutFlowersbyMrsMDWellcome = Textual.from_url("https://gutenberg.org/cache/epub/40534/pg40534.txt")
TalksAboutFlowersbyMrsMDWellcome

Text (305686 chars), textual(),
    train_path, val_path = textual.create_train_val()

In [None]:
HistoricOrnamentVol1of2byJamesWard = Textual.from_url("https://gutenberg.org/files/59746/59746-0.txt")
HistoricOrnamentVol1of2byJamesWard

Text (439720 chars), textual(),
    train_path, val_path = textual.create_train_val()

In [None]:
HistoricOrnamentVol2of2byJamesWard = Textual.from_url("https://gutenberg.org/files/59971/59971-0.txt")
HistoricOrnamentVol2of2byJamesWard

Text (522737 chars), textual(),
    train_path, val_path = textual.create_train_val()

### c. Is the size of your dataset large enough to build the baseline model to validate the usefulness and accuracy of the model predictions?


I have collected 11 books, roughly 3.6 million characters in total according to the character counts above, to build the baseline model.
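To judge whether this is enough data, the character counts reported by `Textual` can be turned into a rough training budget. This is a back-of-the-envelope sketch: the 4-characters-per-token ratio and the 128-token block size are assumptions, the counts still include Gutenberg header and licence text, and the batch size and epochs are the values used in `TrainingArguments` later in the notebook:

```python
import math

# Character counts reported by Textual for the 11 books above
char_counts = {
    "Arts and Crafts Essays": 393_777,
    "Principles of Decorative Design": 414_489,
    "The Botanical Magazine, Vol. 02": 80_332,
    "The Principles of Ornament": 234_063,
    "The Botanical Magazine, Vol. 08": 103_125,
    "Evolution in Art": 667_372,
    "Hardy Ornamental Flowering Trees and Shrubs": 370_515,
    "Form and Ornament in Ceramic Art": 72_466,
    "Talks About Flowers": 305_686,
    "Historic Ornament, Vol. 1": 439_720,
    "Historic Ornament, Vol. 2": 522_737,
}

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb for English text
BLOCK_SIZE = 128     # assumption: tokens per training example
BATCH_SIZE = 24      # per_device_train_batch_size in the notebook's TrainingArguments
EPOCHS = 3           # num_train_epochs in the notebook's TrainingArguments

total_chars = sum(char_counts.values())
est_tokens = total_chars // CHARS_PER_TOKEN
blocks = est_tokens // BLOCK_SIZE               # full training examples
steps_per_epoch = math.ceil(blocks / BATCH_SIZE)  # batches per epoch on a single device

print(f"{total_chars:,} characters ~ {est_tokens:,} tokens ~ {blocks:,} blocks")
print(f"~ {steps_per_epoch} steps per epoch, {steps_per_epoch * EPOCHS} steps in total")
```

Under these assumptions the corpus gives about 7,000 training blocks and fewer than 300 optimizer steps per epoch: enough for a baseline fine-tune, but small compared with GPT-2's pretraining data.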


### **4. Model Training**

We skip a separate data-transformation step because tokenization and numericalisation happen as part of model training.
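To show what tokenization and numericalisation mean, here is a toy sketch. The whitespace tokenizer and first-seen vocabulary are simplifications for illustration only; GPT-2 actually uses a byte-level BPE tokenizer with a 50,257-token vocabulary, so its real tokens and ids differ:

```python
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Split text into lower-cased word tokens (a stand-in for GPT-2's subword tokenizer)."""
    return text.lower().replace("?", " ?").split()


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Give each distinct token an integer id in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def numericalise(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids; this list of integers is what the model actually consumes."""
    return [vocab[tok] for tok in tokens]


tokens = tokenize("How to design a pattern? How to choose a pattern?")
vocab = build_vocab(tokens)
ids = numericalise(tokens, vocab)
print(tokens)  # ['how', 'to', 'design', 'a', 'pattern', '?', 'how', 'to', 'choose', 'a', 'pattern', '?']
print(ids)     # [0, 1, 2, 3, 4, 5, 0, 1, 6, 3, 4, 5]
```

Repeated words map to the same id, which is how the model recognises them as the same token.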

In [None]:
pretrained_model = pipeline("text-generation", model='gpt2')


### **5. Fine-tune the model and generate text**

In [None]:
arguments = TrainingArguments(
    output_dir="./write_style",
    overwrite_output_dir=True,  
    num_train_epochs=3,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=64,
    )

In [None]:
# Gather the 11 books downloaded above
books = [
    ArtsandCraftsEssays,
    PrinciplesofDecorativeDesignbyChristopherDresser,
    TheBotanicalMagazineVol02byWilliamCurtis,
    ThePrinciplesofOrnamentbyJamesWard,
    TheBotanicalMagazineVol08byWilliamCurtis,
    EvolutionInArt,
    HardyOrnamentalFloweringTreesandShrubsbyAngusDWebster,
    OriginandDevelopmentofFormandOrnamentinCeramicArtbyWilliamHenryHolmes,
    TalksAboutFlowersbyMrsMDWellcome,
    HistoricOrnamentVol1of2byJamesWard,
    HistoricOrnamentVol2of2byJamesWard,
]

# Hold out the last 10% of every book for validation so both splits cover all 11 books,
# and write each split to a .txt file because TextDataset reads its data from disk
train_text = "\n\n".join(book.text[: int(len(book.text) * 0.9)] for book in books)
val_text = "\n\n".join(book.text[int(len(book.text) * 0.9):] for book in books)
with open("train.txt", "w", encoding="utf-8") as f:
    f.write(train_text)
with open("val.txt", "w", encoding="utf-8") as f:
    f.write(val_text)

In [None]:
# Build the trainer with the Hugging Face classes imported above
# (TextDataset may print a deprecation warning but works with the pinned transformers==4.10.2)
tokenizer = pretrained_model.tokenizer
train_dataset = TextDataset(tokenizer=tokenizer, file_path="train.txt", block_size=128)
val_dataset = TextDataset(tokenizer=tokenizer, file_path="val.txt", block_size=128)
# mlm=False: causal language modelling (predict the next token), which is how GPT-2 is trained
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=pretrained_model.model,
    args=arguments,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

In [None]:
trainer.train()



In [None]:
trainer.save_model()

In [None]:
style_writer = pipeline('text-generation',
                        model='./write_style',
                        tokenizer=pretrained_model.tokenizer
                        )

In [None]:
style_writer("How to design botanical pattern?",
            max_length=150,
            num_return_sequences=1)

Using pad_token, but it is not set yet.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'How to design botanical pattern? This is the first book that directly addresses this subject.  This introduction provides a more technical, more elegant explanation than the usual work of the botanists, and, with its many references, is perhaps best produced in a form which will be freely available to the general public.  While the general method of the manufacture of flowers may be described in more detail elsewhere, in detail a description of how one plants leaves flowers with respect to their parent or object must still be given.  Although many useful tips and exercises are found in the preceding volumes, they should not be neglected if a suitable system are to be developed.  This book is being written for general purposes, not suited for those of ornamental design,'}]

In [None]:
style_writer("How to design pattern in william morris style?",
            max_length=150,
            num_return_sequences=1)

Using pad_token, but it is not set yet.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "How to design pattern in william morris style? Flowers, with white heads and yellow-white berries, are produced from December till April in these beautiful forms. Each shoots four feet high, and grows from about one foot deep. They are very valuable early-growing varieties, and produced not only at our garden plants, but as also being of great value in the making of decorative shrubs, and in the growing centre of the walls of houses.  WILLIAM HUGBETTS (_syn Japonica_).--Common Willingham's Cabbage. North America. 2 vols., London. 416 or 417 c.  WILLIAM HUGBETTS (_syn Japonica_).--Japanese Green"}]

In [None]:
style_writer("How to decorate the wall in living room with Botanical pattern in william morris style?",
            max_length=150,
            num_return_sequences=1)

Using pad_token, but it is not set yet.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'How to decorate the wall in living room with Botanical pattern in william morris style?  The only practical way to do this is by hanging it onto a wall. The larger the wall, the more interesting the flowers will become.  For the younger generations, these flowers should be planted to the inside of a window sill or hanging box.  The decorative flowers are arranged in clusters. The leaves of the flower-bearing flowers are produced in clusters.  Flowers that are larger are usually well-known shrubs like The Botanical Nursery of Wales, or the Almond Tree of the Isles, where they are produced in clusters.   A large number of ornamental flowers usually fall into the branches of this family, but not'}]

In [None]:
style_writer("How to decorate the wall in the corridor with pattern in William Morris’s style?",
             max_length=100,
             num_return_sequences=1)
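The generated outputs above repeat the prompt and stop mid-sentence once `max_length` is reached (for example "...but not"). A hypothetical post-processing helper, not part of the notebook, could tidy each suggestion before it is shown to a user:

```python
import re


def clean_suggestion(prompt: str, generated_text: str) -> str:
    """Drop the echoed prompt and cut the answer back to its last complete sentence.

    Hypothetical post-processing step: generation stops after max_length tokens,
    which usually leaves the final sentence unfinished.
    """
    answer = generated_text[len(prompt):] if generated_text.startswith(prompt) else generated_text
    answer = " ".join(answer.split())  # collapse the double spaces seen in the outputs
    # End positions of every sentence that closes with ., ! or ?
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", answer)]
    return answer[: ends[-1]] if ends else answer
```

In the notebook it would be applied as `clean_suggestion(prompt, result[0]["generated_text"])`, where `result` is the list returned by `style_writer(prompt, ...)`. Abbreviations such as "c." also count as sentence ends here, so the cut is approximate.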

### **6. Model's input and output**

Your ML model's output depends on the machine learning task you initially selected. Supervised machine learning tasks require labeled datasets (x and y) for the ML model to learn how to map your data (x) to the desired output/label (y).
Please describe your data labels (you can also insert the actual example of the label) and discuss if the model output satisfies the project's objective. 

What data type and format (e.g. jpeg images, csv file, pdf file) will you utilize for data pre-processing before you feed them for model training? <br>
The training data is plain text in .txt format. At inference time, the input x is a short prompt, for example: <br>
x="How to design botanical pattern?" <br>
x="How to design pattern in william morris style?" <br>
x="How to decorate the wall in living room with Botanical pattern in william morris style?" <br>


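What's your data label? What do you want to predict? <br>
I want the trained text generator to answer users' questions, or predict what they are looking for, from the keywords they type in. <br>
Because this is a (self-supervised) language-modelling task, no manual labels are needed: the label y for each position in the training text is simply the next token, so the text supplies its own labels. `max_length=150` is not a label; it only caps the length of the generated output in tokens, prompt included.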
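A minimal sketch with made-up token ids makes the "text labels itself" point concrete. With `DataCollatorForLanguageModeling(mlm=False)` the Hugging Face collator copies `input_ids` into `labels` and GPT-2 applies this one-position shift inside the model; the explicit shift below only shows which token each position learns to predict:

```python
from typing import List, Tuple


def next_token_pairs(token_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Build (input, label) sequences for causal language modelling.

    The label at each position is the token that follows it in the text.
    """
    return token_ids[:-1], token_ids[1:]


# Hypothetical token ids for a 6-token sentence
x, y = next_token_pairs([10, 11, 12, 13, 14, 15])
print(x)  # [10, 11, 12, 13, 14]
print(y)  # [11, 12, 13, 14, 15]
```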

### 7. What metrics explain your model performance?

First think in terms of "Business Metrics" or business/practical objectives, not technical metrics. Then apply this thinking to figuring out one or a combination of ML evaluation metrics and human readable metrics that match the business objectives. <br>
For example, for an image classification model to evaluate its prediction results you could use error rate or accuracy, while confusion matrix can be better in explaining how model performs on a particular class. <br>
To learn about the metrics for NLP-related problems, please check: Hugging Face <br>
For general classification and regression problems, please check: neptune.ai's blog <br>

1. What are non-ML baselines you can consider as reference for a model performance?

2. What are your model evaluation metrics and why?

*   Syntactic accuracy
*   Grammatical accuracy
*   Personal perception (human judgement of whether a suggestion is useful)
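Alongside these human judgements, perplexity is a standard automatic metric for language models: the exponential of the average per-token cross-entropy on held-out text, where lower is better. A minimal sketch of the formula follows; the helper names are illustrative:

```python
import math
from typing import Sequence


def perplexity_from_loss(mean_loss: float) -> float:
    """Perplexity is e raised to the mean per-token cross-entropy loss (in nats)."""
    return math.exp(mean_loss)


def perplexity_from_token_probs(probs: Sequence[float]) -> float:
    """Perplexity from the probability the model assigned to each actual next token."""
    if not probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in probs) / len(probs)
    return perplexity_from_loss(mean_nll)


# A model that spreads its probability evenly over 4 candidates for every token
# is as uncertain as a fair 4-way choice, so its perplexity is about 4
print(perplexity_from_token_probs([0.25, 0.25, 0.25, 0.25]))
```

If the `Trainer` was given an evaluation dataset, the same number can be read off its validation loss with `math.exp(trainer.evaluate()["eval_loss"])`, the approach used in the Hugging Face language-modelling example scripts. Perplexity measures fluency on the books' style, not whether a suggestion is useful, so it complements rather than replaces human judgement.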

3. Is the metric you selected intuitive enough to communicate the results of the model training to non-ML guys? <br>
I'm not sure. I think the results so far are not bad, but they could be better with more data.

4. Can your metric explain model's biases? <br>
It definitely can: the data I've fed the model covers only certain design styles, so more comprehensive data is needed to make the output more neutral and less biased.
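One simple way to make this bias visible is to count how often each style keyword appears in the training corpus: styles that barely occur are unlikely to be handled well. A minimal sketch; `keyword_coverage` and the keyword list are hypothetical:

```python
import re
from collections import Counter
from typing import Iterable


def keyword_coverage(corpus: str, keywords: Iterable[str]) -> Counter:
    """Count case-insensitive whole-word occurrences of each style keyword in the corpus."""
    counts: Counter = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, corpus, flags=re.IGNORECASE))
    return counts


sample = "William Morris designed wallpaper. Morris liked botanical motifs. Botanical!"
print(keyword_coverage(sample, ["morris", "botanical", "art deco"]))
```

Run on the combined text of the 11 books with keywords such as "william morris", "botanical", "art deco" or "geometric", a count of zero flags a style the model has effectively never seen.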
