# Problem

- To represent sentences as word embeddings: each word is mapped to an integer index and then to a dense vector.
- We shall use the Keras `Embedding` layer from TensorFlow to do this.

# 1)- Import Key Modules

In [1]:
# support both Python 2 and Python 3 with minimal overhead.
from __future__ import absolute_import, division, print_function

# I am an engineer: I care about errors, not warnings, so let's be a maverick and ignore them.
import warnings
warnings.filterwarnings('ignore')

In [2]:
## TensorFlow 2.x (tf.keras API)
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
import numpy as np

# 2)- Example Data

In [3]:
sent = [
    'the glass of milk',
    'the glass of juice',
    'the cup of tea',
    'I am a good boy',
    'I am a good developer',
    'understand the meaning of words',
    'your videos are good',
]

# 3)- Vectorization

In [4]:
### Vocabulary size
voc_size=10000

### 3.a. One-Hot Representation

In [5]:
# Despite its name, one_hot returns one hashed integer index per word, not a one-hot vector.
onehot_repr=[one_hot(words,voc_size) for words in sent]
print(onehot_repr)

[[3973, 2748, 3351, 8365], [3973, 2748, 3351, 9198], [3973, 5632, 3351, 7310], [7859, 4168, 7526, 615, 1186], [7859, 4168, 7526, 615, 3968], [6238, 3973, 2043, 3351, 9500], [5382, 7964, 8514, 615]]


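Comparing this output with the sentences in cell In [3] shows that each word maps to the same integer wherever it appears: for example, "the" is always 3973, "of" is always 3351, and "good" is always 615.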
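To make the index mapping concrete, here is a minimal sketch of the hashing idea behind `one_hot`. The helper names `hashed_index` and `encode`, and the choice of MD5, are assumptions made for illustration so that the result is stable across runs. Keras uses its own hash function, so the indices produced here will not match the output above.

```python
import hashlib

def hashed_index(word, voc_size):
    """Map a word to an integer in [1, voc_size - 1] via a stable hash (illustrative only)."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (voc_size - 1) + 1

def encode(sentence, voc_size):
    """Lowercase, split on whitespace, and hash every word to an index."""
    return [hashed_index(w, voc_size) for w in sentence.lower().split()]

voc_size = 10000
sample = ['the glass of milk', 'the glass of juice', 'the cup of tea']
encoded = [encode(s, voc_size) for s in sample]

# The same word always receives the same index, in every sentence.
print(encoded[0][0] == encoded[1][0] == encoded[2][0])  # 'the'
print(encoded[0][2] == encoded[2][2])                   # 'of'
```

Because every word is squeezed into a bounded index range, two different words can collide on the same index. A larger `voc_size` makes such collisions less likely.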

### 3.b. Word Embedding Representation

In [6]:
sent_length=8 # passed to the Embedding layer as input_length (input_dim is voc_size)
embedded_docs=pad_sequences(onehot_repr,padding='pre',maxlen=sent_length)
print(embedded_docs)

[[   0    0    0    0 3973 2748 3351 8365]
 [   0    0    0    0 3973 2748 3351 9198]
 [   0    0    0    0 3973 5632 3351 7310]
 [   0    0    0 7859 4168 7526  615 1186]
 [   0    0    0 7859 4168 7526  615 3968]
 [   0    0    0 6238 3973 2043 3351 9500]
 [   0    0    0    0 5382 7964 8514  615]]
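What `padding='pre'` does can be sketched in plain Python. The helper `pad_pre` below is a hypothetical stand-in written for illustration, not the Keras implementation. It assumes that sequences longer than `maxlen` keep their last `maxlen` items, which I believe matches the `pad_sequences` default.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad each sequence with `value` up to `maxlen`; keep the last `maxlen` items if longer."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # drop items from the front if the sequence is too long
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

# Two of the index lists printed in the one-hot step.
onehot_repr = [[3973, 2748, 3351, 8365], [7859, 4168, 7526, 615, 1186]]
for row in pad_pre(onehot_repr, maxlen=8):
    print(row)
```

Padding at the front keeps the real words at the end of every row, so all rows have the same length and can be stacked into one array.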


In [7]:
dim=10 # for embedding layer

In [8]:
model=Sequential()
model.add(Embedding(voc_size,dim,input_length=sent_length)) # input_dim=voc_size (10k), output_dim=dim (10), input_length=sent_length (8)
model.compile('adam','mse')

In [9]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 8, 10)             100000    
Total params: 100,000
Trainable params: 100,000
Non-trainable params: 0
_________________________________________________________________


##### Understanding the summary

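The parameter count is 10,000 (vocabulary size) multiplied by 10 (embedding dimension), which gives 100,000. The output shape (None, 8, 10) is batch size, sentence length, and embedding dimension.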
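The arithmetic can be checked directly. The sentence length does not enter the count, because the same embedding matrix is reused at every position.

```python
voc_size = 10000   # rows of the embedding matrix, one per word index
dim = 10           # columns, i.e. the length of each embedding vector

embedding_params = voc_size * dim
print(embedding_params)  # 100000
```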

In [10]:
# Forward pass: despite the variable name, the output holds embedding vectors, not probabilities.
prob=model.predict(embedded_docs)
prob[0]

array([[-0.00763921, -0.04744663,  0.01065066,  0.02329439, -0.03941736,
         0.00849686,  0.0017413 ,  0.02973366, -0.01896145,  0.02623491],
       [-0.00763921, -0.04744663,  0.01065066,  0.02329439, -0.03941736,
         0.00849686,  0.0017413 ,  0.02973366, -0.01896145,  0.02623491],
       [-0.00763921, -0.04744663,  0.01065066,  0.02329439, -0.03941736,
         0.00849686,  0.0017413 ,  0.02973366, -0.01896145,  0.02623491],
       [-0.00763921, -0.04744663,  0.01065066,  0.02329439, -0.03941736,
         0.00849686,  0.0017413 ,  0.02973366, -0.01896145,  0.02623491],
       [-0.02333568, -0.04619519, -0.04012273,  0.04725363, -0.02203379,
        -0.01118412, -0.01843702, -0.01922396,  0.04244978,  0.01568328],
       [-0.03894145, -0.02323535,  0.0391776 ,  0.01057632,  0.03727842,
        -0.00653756, -0.02123624, -0.01711967, -0.02055456,  0.02044381],
       [-0.01296425,  0.02319032,  0.00405625, -0.04039415,  0.03479341,
         0.02240255, -0.00627903,  0.04904303

In [11]:
prob.shape

(7, 8, 10)

- 7 is the number of text documents (sentences), i.e. rows.
- 8 is sent_length, i.e. the number of token positions in each padded sentence.
- 10 is the embedding dimension, i.e. the length of the vector that represents each token.
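The three axes follow from treating the embedding layer as a lookup table. The sketch below is a plain-Python illustration, not the Keras implementation. The fixed seed and the uniform range of ±0.05 are assumptions chosen only to resemble the magnitudes in the output above.

```python
import random

voc_size, dim = 10000, 10
rng = random.Random(0)  # fixed seed, illustrative only

# Embedding matrix: one row of `dim` small random values per vocabulary index.
table = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(voc_size)]

# Two padded documents taken from embedded_docs.
docs = [
    [0, 0, 0, 0, 3973, 2748, 3351, 8365],
    [0, 0, 0, 7859, 4168, 7526, 615, 1186],
]

# The forward pass is a row lookup for every index in every document.
output = [[table[idx] for idx in doc] for doc in docs]

shape = (len(output), len(output[0]), len(output[0][0]))
print(shape)                          # (2, 8, 10)
print(output[0][0] == output[0][3])   # True: both positions hold padding index 0
```

With all 7 documents instead of 2, the same lookup gives the shape (7, 8, 10).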

In [12]:
embedded_docs[0]

array([   0,    0,    0,    0, 3973, 2748, 3351, 8365])

In [13]:
embedded_docs.shape

(7, 8)

Here, [-0.00763921, -0.04744663,  0.01065066,  0.02329439, -0.03941736,
         0.00849686,  0.0017413 ,  0.02973366, -0.01896145,  0.02623491] is the embedding for index 0, i.e. the padding value. This is why the first four rows of prob[0] are identical.

[0.0119185 , -0.01346827, -0.04068682,  0.03973157, -0.00616077,
         0.0130179 , -0.03786733,  0.04726738, -0.02849425, -0.04487189] is the embedding for index 8365, i.e. the word "milk".
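To see which row of prob[0] belongs to which word, the padded indices can be mapped back to the words of the first sentence. This is a small sketch using the values printed earlier. The `'<pad>'` label is a hypothetical name introduced here for index 0.

```python
sentence = 'the glass of milk'
indices = [3973, 2748, 3351, 8365]               # from the one-hot output
padded = [0, 0, 0, 0, 3973, 2748, 3351, 8365]    # from embedded_docs[0]

# Reverse lookup: index -> word. Index 0 is the padding value.
index_to_word = dict(zip(indices, sentence.split()))
index_to_word[0] = '<pad>'  # hypothetical label for the padding index

labels = [index_to_word[i] for i in padded]
print(labels)
# ['<pad>', '<pad>', '<pad>', '<pad>', 'the', 'glass', 'of', 'milk']
```

Row k of prob[0] is the vector for labels[k], so the last row corresponds to "milk".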

# END OF NOTEBOOK