<a href="https://colab.research.google.com/github/TeamMAMI/MAMI/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

*Note: Enable a GPU runtime for this notebook (Runtime > Change runtime type).*

## Dependencies and Libraries

In [None]:
# install the required packages
!pip install tensorflow_text

In [2]:
# import the libraries
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text  # registers the TF ops used by the BERT preprocessing model
import pandas as pd

## Load the data

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [42]:
# Load the trial data
# data_path points to the trial file (tab-separated despite the .csv extension)

data_path = "/content/drive/Shareddrives/team_MAMI/MAMI/TRIAL/trial.csv"
df = pd.read_csv(data_path, delimiter="\t")
# df = df.sort_values('file_name')
df.head(5)

|   | file_name | misogynous | shaming | stereotype | objectification | violence | Text Transcription |
|---|---|---|---|---|---|---|---|
| 0 | 28.jpg | 0 | 0 | 0 | 0 | 0 | not now, dad. We should burn Jon Snow. stop it... |
| 1 | 30.jpg | 0 | 0 | 0 | 0 | 0 | there may have been a mixcommunication with th... |
| 2 | 33.jpg | 0 | 0 | 0 | 0 | 0 | i shouldn't have sold my boat |
| 3 | 58.jpg | 1 | 0 | 0 | 0 | 1 | Bitches be like, It was my fault i made him mad |
| 4 | 89.jpg | 0 | 0 | 0 | 0 | 0 | find a picture of 4 girls together on FB make ... |
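The file is named `trial.csv`, but the cell above reads it with `delimiter="\t"`, so it is parsed as tab-separated. (`delimiter` is an alias of `sep` in `pandas.read_csv`.) The minimal sketch below uses an in-memory stand-in built from two of the rows shown above. It shows that with the tab separator the commas inside a transcription stay part of the text.

```python
import io

import pandas as pd

# In-memory stand-in for trial.csv, using the columns and two rows shown above
tsv = (
    "file_name\tmisogynous\tshaming\tstereotype\tobjectification\tviolence\tText Transcription\n"
    "33.jpg\t0\t0\t0\t0\t0\ti shouldn't have sold my boat\n"
    "58.jpg\t1\t0\t0\t0\t1\tBitches be like, It was my fault i made him mad\n"
)

# Splitting on tabs yields seven columns; the comma inside the text is kept
good = pd.read_csv(io.StringIO(tsv), delimiter="\t")

print(good.shape)                          # (2, 7)
print(good.loc[1, "Text Transcription"])
```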


In [43]:
caption_data_path = "/content/drive/Shareddrives/team_MAMI/flickr8k/flickr8k_dataset/clean_image_captions.csv"
df_caption = pd.read_csv(caption_data_path, header=None, names=["image_name", "caption"])
df_caption = df_caption.sort_values('image_name')
df_caption

|   | image_name | caption |
|---|---|---|
| 15 | image_0.png | man and a woman sitting on a bench in front of a |
| 63 | image_1.png | little girl in a pink dress blows bubbles out... |
| 88 | image_10.png | little girl in a pink shirt blows bubbles in ... |
| 17 | image_11.png | little boy and a woman sitting on a bench in ... |
| 4 | image_12.png | man and a woman looking at each other in fron... |
| ... | ... | ... |
| 35 | image_95.png | group of people stand in front of a body of w... |
| 93 | image_96.png | group of older women standing in front of a b... |
| 11 | image_97.png | man and a woman sitting on a bench in front o... |
| 54 | image_98.png | people sit on a bench in front of a store |
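`sort_values('image_name')` compares the names as strings. As the output shows, `image_10.png` therefore lands between `image_1.png` and `image_11.png` instead of after `image_9.png`. The next cell attaches captions to `df` by position, so this ordering only matters if `image_N.png` is meant to line up with row `N` of `df`. The notebook does not state that mapping, so treat it as an assumption. If it does hold, a numeric sort can be sketched with the `key` argument of `DataFrame.sort_values` (pandas 1.1+). The small table below is a hypothetical stand-in that reuses the same naming scheme.

```python
import pandas as pd

# Hypothetical caption table using the image_<N>.png naming scheme from above
df_caption = pd.DataFrame({
    "image_name": ["image_10.png", "image_2.png", "image_1.png", "image_0.png"],
    "caption": ["c10", "c2", "c1", "c0"],
})

# String sort: "image_10.png" < "image_2.png" because the character '1' < '2'
lexicographic = df_caption.sort_values("image_name")["image_name"].to_list()

# Numeric sort: extract the integer N from each name and sort on that instead
numeric = df_caption.sort_values(
    "image_name",
    key=lambda s: s.str.extract(r"(\d+)", expand=False).astype(int),
)["image_name"].to_list()

print(lexicographic)  # ['image_0.png', 'image_1.png', 'image_10.png', 'image_2.png']
print(numeric)        # ['image_0.png', 'image_1.png', 'image_2.png', 'image_10.png']
```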


In [44]:
# attach the captions to df by position (assumes both tables list the images in the same order)
cap = df_caption.caption.to_list()
df["caption"] = cap

In [50]:
df_req = df[['Text Transcription', 'caption']].copy()
df_req

|   | Text Transcription | caption |
|---|---|---|
| 0 | not now, dad. We should burn Jon Snow. stop it... | man and a woman sitting on a bench in front of a |
| 1 | there may have been a mixcommunication with th... | little girl in a pink dress blows bubbles out... |
| 2 | i shouldn't have sold my boat | little girl in a pink shirt blows bubbles in ... |
| 3 | Bitches be like, It was my fault i made him mad | little boy and a woman sitting on a bench in ... |
| 4 | find a picture of 4 girls together on FB make ... | man and a woman looking at each other in fron... |
| ... | ... | ... |
| 95 | Rape culture. It's what every oxymoronic femi... | group of people stand in front of a body of w... |
| 96 | walking, running, telereporting, not going to ... | group of older women standing in front of a b... |
| 97 | taking the time to get her pussy wet. always p... | man and a woman sitting on a bench in front o... |
| 98 | what men play with vs what women play with | people sit on a bench in front of a store |


In [67]:
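# join each transcription and its caption with a space so the last and first words don't merge
concatenated = df_req['Text Transcription'] + ' ' + df_req['caption']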
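Element-wise `+` on two string Series joins each pair with no separator, so without an explicit space the last word of a transcription runs straight into the first word of its caption. The small sketch below uses strings taken from the tables above. It also shows that `Series.str.cat` with `sep=" "` is an equivalent way to insert the space.

```python
import pandas as pd

text = pd.Series(["i shouldn't have sold my boat", "what men play with vs what women play with"])
caption = pd.Series(["man and a woman sitting on a bench in front of a",
                     "people sit on a bench in front of a store"])

glued = (text + caption).to_list()               # no separator: "...boatman and a woman..."
spaced = (text + " " + caption).to_list()        # explicit space between the two parts
via_cat = text.str.cat(caption, sep=" ").to_list()  # same result using str.cat

print(glued[0])
print(spaced[0])
```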

## Data Summary and Preprocessing

In [47]:
# checking if the classes are balanced
df['misogynous'].value_counts()

0    56
1    44
Name: misogynous, dtype: int64

The classes are roughly balanced: 56 non-misogynous (0) and 44 misogynous (1) memes.

In [71]:
# split into training (75%) and test (25%) sets, keeping the class ratio in both (stratify)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    concatenated, df['misogynous'], stratify=df['misogynous'], test_size=0.25
)
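There are 100 samples and `test_size=0.25`, so the test set holds 25 memes. `stratify` keeps the 56/44 class ratio in both parts, which gives 14 negatives and 11 positives in the test set. The sketch below checks this with synthetic data that reproduces only the class counts. `random_state` is added here just to make the sketch repeatable; the notebook's own split is unseeded, so rerunning it gives a different partition each time.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: only the class counts (56 zeros, 44 ones) match the trial set
y = pd.Series([0] * 56 + [1] * 44)
X = pd.Series([f"meme text {i}" for i in range(len(y))])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0  # random_state only for a repeatable sketch
)

print(len(X_train), len(X_test))                       # 75 25
print(y_train.value_counts().sort_index().to_list())   # [42, 33]
print(y_test.value_counts().sort_index().to_list())    # [14, 11]
```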

## BERT Implementation

In [72]:
# download the BERT preprocessing model and the pre-trained BERT encoder from TF Hub
bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")
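`hub.KerasLayer` defaults to `trainable=False`, so as written the BERT encoder's weights stay fixed. Only the output layer is trained: 768 weights plus one bias, because this encoder's `pooled_output` is 768-dimensional (H-768). Passing `trainable=True` to `hub.KerasLayer` would fine-tune the encoder as well. To show the same freezing mechanism without downloading BERT, the sketch below uses a small stand-in model. The stand-in layer and its sizes are arbitrary and purely illustrative.

```python
import numpy as np
import tensorflow as tf

# Stand-in "encoder" (NOT BERT): any Keras layer is frozen the same way.
# hub.KerasLayer(..., trainable=False), the default, has this effect on the BERT encoder.
encoder = tf.keras.layers.Dense(8, name="encoder_stand_in")
encoder.trainable = False

inputs = tf.keras.Input(shape=(4,))
features = encoder(inputs)                                          # 4*8 + 8 = 40 frozen weights
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(features)  # 8 + 1 = 9 trainable weights
model = tf.keras.Model(inputs, outputs)

def count(weights):
    """Total number of scalar parameters in a list of weight variables."""
    return int(sum(np.prod(list(w.shape)) for w in weights))

n_trainable = count(model.trainable_weights)
n_frozen = count(model.non_trainable_weights)
print(n_trainable, n_frozen)  # 9 40
```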

Build the classifier with the Keras Functional API.

In [73]:
# initializing BERT layers
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text') # input layer
preprocessed_text = bert_preprocess(text_input)
outputs = bert_encoder(preprocessed_text)

# classification head on top of BERT's pooled_output

# The Dropout layer randomly sets input units to 0 with a frequency of `rate`
# at each step during training, which helps prevent overfitting.
# Source: https://keras.io/api/layers/regularization_layers/dropout/
l = tf.keras.layers.Dropout(0.1, name="dropout")(outputs['pooled_output'])  # rate=0.1: about 10% of units are dropped at random
l = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(l)  # single sigmoid unit: probability that the text is misogynous (1) or not (0)

model = tf.keras.Model(inputs=[text_input], outputs=[l])
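Keras' `Dropout` uses inverted dropout. During training, each unit is zeroed with probability `rate`, and the surviving units are scaled by `1 / (1 - rate)` so the expected activation is unchanged. At inference the layer passes its input through untouched. The sketch below is a NumPy re-implementation of that rule with the notebook's `rate=0.1`, written for illustration (it is not the Keras source). The helper `inverted_dropout` is hypothetical.

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Zero each element with probability `rate`; rescale survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x  # inference: identity
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones(100_000)
y = inverted_dropout(x, rate=0.1, rng=rng)

print(np.mean(y == 0.0))  # fraction dropped, close to 0.1
print(y.max())            # survivors scaled to 1 / 0.9
print(y.mean())           # close to 1.0: the expected activation is preserved
```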

In [74]:
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer_2 (KerasLayer)     {'input_word_ids':   0           ['text[0][0]']                   
                                (None, 128),                                                      
                                 'input_mask': (Non                                               
                                e, 128),                                                          
                                 'input_type_ids':                                                
                                (None, 128)}                                                

## Model Training

In [75]:
# evaluation metrics
METRICS = [
      tf.keras.metrics.BinaryAccuracy(name='accuracy'),
      tf.keras.metrics.Precision(name='precision'),
      tf.keras.metrics.Recall(name='recall')
]

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=METRICS)
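`BinaryAccuracy`, `Precision`, and `Recall` each convert the sigmoid output into a hard 0/1 prediction, treating a probability above 0.5 as positive by default. The sketch below applies the same rule to made-up probabilities, computes the three metrics from the confusion counts, and cross-checks them with scikit-learn. The labels and probabilities are illustrative only.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([1, 1, 1, 0, 0, 0])
y_prob = np.array([0.9, 0.4, 0.3, 0.2, 0.6, 0.1])  # hypothetical sigmoid outputs

# A prediction counts as positive when the probability exceeds the 0.5 threshold
y_pred = (y_prob > 0.5).astype(int)

tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))

accuracy = (tp + tn) / len(y_true)  # BinaryAccuracy
precision = tp / (tp + fp)          # Precision
recall = tp / (tp + fn)             # Recall

# cross-check against scikit-learn
assert accuracy == accuracy_score(y_true, y_pred)
assert precision == precision_score(y_true, y_pred)
assert np.isclose(recall, recall_score(y_true, y_pred))

print(tp, fp, fn, tn)  # 1 1 2 2
print(accuracy, precision, recall)
```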

In [76]:
model.fit(X_train, y_train, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f6c62417cd0>

In [77]:
model.evaluate(X_test, y_test)  # returns [loss, accuracy, precision, recall]



[0.6802049875259399, 0.6000000238418579, 1.0, 0.09090909361839294]
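`model.evaluate` returns the loss followed by the compiled metrics. The test loss is therefore about 0.68, with accuracy 0.60, precision 1.0, and recall about 0.09. The stratified test split holds 25 memes, 11 of them misogynous, and with those counts the numbers fix the confusion matrix. The model flagged exactly one meme as misogynous (correctly) and missed the other ten. Its 0.60 accuracy is thus only slightly above the 0.56 obtained by always predicting the majority class (14/25). The sketch below reproduces this arithmetic from the printed values.

```python
import math

# Values printed by model.evaluate: [loss, accuracy, precision, recall]
loss, accuracy, precision, recall = [0.6802049875259399, 0.6000000238418579, 1.0, 0.09090909361839294]

n_test = 25             # test_size=0.25 of the 100 trial memes
n_pos = 11              # stratified: 44 * 0.25 misogynous memes in the test set
n_neg = n_test - n_pos  # 14 non-misogynous memes

tp = round(recall * n_pos)       # recall = TP / (TP + FN)
fn = n_pos - tp
fp = round(tp / precision) - tp  # precision = TP / (TP + FP)
tn = n_neg - fp

# the reconstructed counts must reproduce the reported accuracy
assert math.isclose((tp + tn) / n_test, accuracy, rel_tol=1e-6)

majority_baseline = n_neg / n_test  # always predicting "not misogynous"

print(tp, fp, fn, tn)     # 1 0 10 14
print(majority_baseline)  # 0.56
```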