# Feature extraction

## Image vectorisation
Images are vectorised using the penultimate layer of the Keras Xception model <cite data-cite="chollet2017">(Chollet, 2017)</cite> pre-trained on ImageNet <cite data-cite="deng2009">(Deng et al., 2009)</cite>.
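
For orientation: in the Keras Xception architecture, the layer just before the classifier is a global average pooling layer. For a 299×299 input it collapses the final 10×10×2048 feature map into a 2048-dimensional vector. The NumPy sketch below only illustrates that pooling step on random data. `global_average_pool` is a hypothetical helper; the actual extraction is done by `extract_all`, whose implementation is not shown here.

```python
import numpy as np


def global_average_pool(feature_map):
    """Average a (height, width, channels) feature map over its spatial axes."""
    feature_map = np.asarray(feature_map, dtype=np.float32)
    if feature_map.ndim != 3:
        raise ValueError("expected a (height, width, channels) array")
    return feature_map.mean(axis=(0, 1))


# Random stand-in for the last convolutional output of Xception
# (assumed shape 10x10x2048 for a 299x299 input image).
rng = np.random.default_rng(0)
fake_feature_map = rng.normal(size=(10, 10, 2048))

image_vector = global_average_pool(fake_feature_map)
print(image_vector.shape)  # (2048,)
```

Each image is therefore represented by one fixed-length vector, regardless of its content.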

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from src.preprocessing.preprocess_dataset import extract_all
import numpy as np
import pandas as pd
import tensorflow_hub as hub
from src.utils.file_utils import save_as_pickle, load_pickle_file

In [3]:
extract_all("xception", "data/train_images/", "data/features/xception.pkl.train")

  "Palette images with Transparency expressed in bytes should be "


Remove an image that was not annotated:

In [4]:
img_embeddings = load_pickle_file("data/features/xception.pkl.train")
img_embeddings.pop("chandler_Friday-Mood-AF.-meme-Friends-ChandlerBing.jpg", None)
save_as_pickle(img_embeddings, "data/features/xception.pkl.train")
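
The same clean-up could be done without hard-coding a file name, by comparing the feature keys with the annotated image names. This is a sketch under the assumption that the features are stored as a dict keyed by image file name, as the `pop` call above suggests. `drop_unannotated` and the toy data are hypothetical.

```python
def drop_unannotated(features, annotated_ids):
    """Return (kept, removed): features restricted to annotated ids, and the dropped keys."""
    annotated = set(annotated_ids)
    kept = {img_id: vec for img_id, vec in features.items() if img_id in annotated}
    removed = sorted(set(features) - annotated)
    return kept, removed


# Toy data standing in for the pickled embeddings and the annotation file.
toy_features = {"a.jpg": [0.1, 0.2], "b.jpg": [0.3, 0.4], "c.jpg": [0.5, 0.6]}
toy_annotated = ["a.jpg", "c.jpg"]

kept, removed = drop_unannotated(toy_features, toy_annotated)
print(removed)  # ['b.jpg']
```

Returning the removed keys makes it easy to see which images were dropped before the pickle file is overwritten.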

In [5]:
extract_all("xception", "data/dev_images/", "data/features/xception.pkl.dev")

## Sentence vectorisation
Meme text is vectorised using the pretrained Universal Sentence Encoder <cite data-cite="cer2018">(Cer et al., 2018)</cite>. The authors neither specify the training dataset nor release it as open source.

In [6]:
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

In [7]:
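def extract_embeddings(model, sents, img_ids, save_file):
    """Embed `sents` with `model` and pickle a dict mapping image id to embedding."""
    embeddings = model(sents)  # one embedding per sentence, in input order
    features = {}
    for i, (embed, sent, img_id) in enumerate(zip(embeddings, sents, img_ids)):
        # Log index, sentence and image id so the alignment can be checked by eye.
        print("{}\t{}\t\t{}".format(i, sent, img_id))
        features[img_id] = embed
    save_as_pickle(features, save_file)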
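
`extract_embeddings` encodes every sentence in a single call, which may use a lot of memory on larger datasets. A minimal sketch of a batched variant follows. `embed_in_batches`, the batch size, and the stand-in `fake_model` are assumptions for illustration and are not part of the original notebook.

```python
def embed_in_batches(model, sents, batch_size=256):
    """Call `model` on successive slices of `sents` and collect the vectors in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    sents = list(sents)
    embeddings = []
    for start in range(0, len(sents), batch_size):
        batch = sents[start:start + batch_size]
        embeddings.extend(model(batch))  # one vector per sentence in the batch
    return embeddings


# Stand-in for the sentence encoder: maps each sentence to a 1-d "embedding".
def fake_model(batch):
    return [[float(len(s))] for s in batch]


vectors = embed_in_batches(fake_model, ["a", "bb", "ccc"], batch_size=2)
print(vectors)  # [[1.0], [2.0], [3.0]]
```

The output order matches the input order, so the result can still be zipped with the image ids.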

In [8]:
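def extract_sentences_and_ids(dataset_path):
    """Read the cleaned CSV and return its text and image-name columns."""
    df = pd.read_csv(dataset_path)
    sents = df["Corrected_text"]  # OCR-extracted meme text after manual correction
    img_ids = df["image_name"]    # file names, used as keys in the feature dict
    return sents, img_ids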

In [9]:
train_sents, train_img_ids = extract_sentences_and_ids("data/train_text_cleaned_final.csv")
extract_embeddings(use, train_sents, train_img_ids, "data/features/use.pkl.train")

0	LOOK THERE MY FRIEND LIGHTYEAR NOW ALL SOHALIKUT TREND PLAY THE 10 YEARS CHALLENGE AT FACEBOOK		10_year_2r94rv.jpg
1	The best of #10 YearChallenge! Completed in less the 4 years. Kudus to @narendramodi ji 8:05 PM - 16 Jan 2019 from Mumbai  India		10_year_10-year-challenge_1547788782.jpeg
2	Sam Thorne @Strippin ( Follow Follow Saw everyone posting these 2009 vs 2019 pics so here's mine 6:23 PM - 12 Jan 2019 O 636 Retweets 3 224 LIKES 65 636		10_year_10yearchallenge-5c75f8b946e0fb0001edc739.JPG
3	10 Year Challenge - Sweet Dee Edition		10_year_10-year-challenge-sweet-dee-edition-40184302.png
4	10 YEAR CHALLENGE WITH NO FILTER 47 Hilarious 10 Year Challenge Memes | What is #10 Year Challenge?		10_year_10-year-challenge-with-no-filter-47-hilarious-10-year-42949168.png
5	1998: Don't get in car with strangers 2008: Don't meet people from the internet alone.  2019: UBER.. Order yourself a stranger from the internet to get into a car with alone.		10_year_10-years-challenge-about-humanity_o_72

1504	WE ARE LOOKING AT YOUR BROWSER HISTORY		gf_girl-friend-memes-18.jpg
1505	I DON'T HAVE A GIRLFRIEND BUT I DO KNOW A WOMAN WHO'D BE REALLY MAD IF SHE HEARD ME SAY THAT		gf_girl-friend-memes-19.jpg
1506	MY GIRLFRIEND WANTED A IM MAD AT YOU AND I'M GONNA BE VERY SPECIFIC IN TELLING YOU WHY CAT. I DIDN'T WANT A CAT. SO WE COMPRIMISED AND GOT A CAT. SAID NO GIRLFRIEND EVER		gf_girl-friend-memes-20.jpg
1507	WHEN SHE KNOWS YOU VE BOUGHT THE RING BUT YOU HAVEN'T GIVEN IT TO HER YET.		gf_guOIKP2.jpg
1508	MY BOYFRIEND SAID HE WAS GOING TO GNE ME SOME VITAMIIND ITOLD HIMI TAKE ASUPPLEMENT FOR THAT		gf_hnxg1.jpg
1509	How you look at her when y'all first meet vs how you look at her after 2 years bc you love her and she makes you happy. himai: kushandwizdom: I love these wholesome memes So wholesome! I love my girlfriend via /r/wholesomememes		gf_how-you-look-at-her-when-yall-first-meet-vs-33263647.png
1510	IF YOUR BOYFRIEND HASN'T BEEN IN THE MILITARY THEN YOU HAVE A GIRLFRIEND		gf_ifyour-boyfr

2925	HOW DO CLOWNFISH TASTE? THEY TASTE FUNNY!		nemo_1t2osr.jpg
2926	WHEN THE TEACHER WANTS TO KNOW WHY - YOU DIDN'T FINISH YOUR HOMEWORK		nemo_1ynij0.jpg
2927	The most amazing thing about Finding Dory is how they managed to put a receding hairline on a fish		nemo_2bc.jpg
2928	WHEN YOUR COOL UNCLE FINDS YOUR SEAWEED STASH CALGATALK		nemo_2vtng7.jpg
2929	WHY IS FINDING NEMO CLOSED? WE DONE FINDED HIM.		nemo_4ebb1da7349f4f91275e1d09bf66ed04ce562092198af0b3100e344a253f0313.jpg
2930	I NEED TO STOP BUYING BOOKS OH LOOK A BOOK SALE		nemo_5d89ae15c2c3c778a78e741824bc6dad.jpg
2931	FISHIE Y U NO WAKE UP?!		nemo_6a8251c2ce93568bacdfa34f6fe2433b.jpg
2932	When your constantly refreshing your meme but the upvotes don't change u/vivaanranka Why aren't you moving?!		nemo_6muwc14wspn21.jpg
2933	NOT ONLY IS MY SHORT TERM MEMORY BAD  BUT SO IS MY SHORT TERM MEMORY		nemo_53b4783efaa16d754b82deab0f5c3d5e.jpg
2934	I'm gonna explore the rock @_official_gerald		nemo_54a4ff83bfa9baf18bb523fbc33617e5--gerald-f

4276	FIVE TIME DRAFT DODGER WANTS MILIRARU		trump_trump-draft-dodger-parade.png
4277	DON'T MIND ME I'M FIXING WHAT OBAM BROKE		trump_TrumpFixingFlag1500-5ada49fbc6733500372d5fc6.jpg
4278	THE MOST DANGEROUS THING IN THE WORLD IS AN IMBECILE WHO THINKS HE'S A GENIUS!		trump_trump-imbecile-genius.jpg
4279	SUPPOSED TO BE PROTECTING US FROM ISIS		trump_trump-isis-twitter-hissy-fit-589d652f3df78c4758024dcc.jpg
4280	We love you		trump_trump_meme1.jpg
4281	REALLY  SICK AND TIRED OF WINNING		trump_trumpmeme-3-5c51da71c9e77c00016f38da.jpg
4282	IF I BECOME PRESIDENT I WILL DEPORT BEIBER		trump_Trump-Memes-6.jpg
4283	WILL YOU BE ON MY CABINET? We need new education secretary with experience.		trump_trump_memes_glowering.png
4284	I LOVE THE POORLY EDUCATED		trump_trump-poorly-educated-wish-granted-589bedd35f9b58819c3dc5e5.jpg
4285	I WILL BUILD A SPACE FORCE AND MAKE THE MARTIANS PAY FOR IT. BELIEVE ME! BIGLY! OCCUPY DEMOCRATS		trump_trump-space-force-martians.png
4286	FORIEGN POLICY? MESS WITH THE 

5623	MY REPORT IS ON HOW I SAW THE DEADPOOL MOVIE ... OH WHATS THAT I'M EXPELLED Img		deadpool_24-Deadpool-Movie-Meme.jpg
5624	Country music is just farm emo		country_o6znkk7rygp11.jpg
5625	I play fornite I can build walls for free		trump_t2r1bswofcz11.jpg
5626	me: *jokes about rape and suicidex kids on the school bus i drive: @fruitofthepoisonousmeme		best_2018_5c85b7ca3d42f.jpeg
5627	SO I'M NOT AMEME PERHAPS I DON'T TRANSLATE WELL TO THE INTERNET MAYBE I'M TOO OLD OF A REFRENCE FOR REDDIT TO GET		dr_evil_uRQDduP.jpg
5628	WHEN YOU SEE A FAKE COUNTRY GIRL TRYIN WAY TOO HATD		country_58382097.jpg
5629	AND THIS IS WHERE I PUT MY OSCAR NOW THAT I HAVE ONE		decaprio_p8DFT1d.png
5630	THIS GIRLS HOT AND LIKES HITLER MEMES		hitler_this-girls-hot-mdmft9.jpg
5631	CALL TECH SUPPORT MY MOUSE IS DEAD		tech_Call-Tech-Support-My-Mouse-Is-Dead-Funny-Technology-Meme-Image.jpg
5633	Canic abo paynesenterprise liam payne R 2011 || 2017 * honestly i wish i could calculate how much money i have spent on th

In [10]:
dev_sents, dev_img_ids = extract_sentences_and_ids("data/dev_text_cleaned_final.csv")
extract_embeddings(use, dev_sents, dev_img_ids, "data/features/use.pkl.dev")

0	ISAW DAD WITH MOM LASTNIGHT I THINK HE WAS STEALING MY MILK.		skeptical_stealing-my-milk.jpg
1	HOW AM I STILL BREATHING IF SHE HAS MY NOSE?		skeptical_breathing+if+she+has+my+nose.jpg
2	YOU MEAN TO TELL ME BIGG BOSSIS BETTER THAN KBC!!! The original photo		skeptical_603b3553d88441537f6c65abac8a1cec.jpg
3	MY SHARE OF THE NATIONAL DEBT IS HOW MUCH?		skeptical_e17ae5f069b21df5599460939047d4ae8db9852ea6ce2c277c15eeea6f7928ef.jpg
4	WAIT A SECOND BILLA AM I SKEPTICAL BABY GROWN UP?  XD		skeptical_75c34fa1-4d2b-45c1-9bda-5ff0f15d241e.jpg
5	LOVES A BAND DOESN'T PAY FOR THEIR MUSIC		skeptical_steve-5c00540f46e0fb00018941d4.jpg
6	You mean to tell me spoons don't actually sound like airplanes?		skeptical_59921.jpg
7	YOU PUT YOUR WHAT IN HER WHAT?		skeptical_iqtkv.jpg
8	YOU THINK I'M GONNA EAT VEGTABLES? made on imgur Skeptical baby - Meme on Imgur		skeptical_vegtables-made-on-imgur-skeptical-baby-meme-on-imgur-54337431.png
9	I'M SMART JUST ASK ME		skeptical_1p65xn.jpg
10	BITCH HOW DA HELL SHE L