# Data Ingestion

Having a raw dataset is nice, but at this point it's still really difficult to work with! It's like we wanted a Rolls-Royce dataset but ended up with a 2007 Toyota Prius. (No offence to Prius owners.)

Let's do some data processing and ingestion to **really** make this baby purr.

In this notebook we'll go from this:

* Image URLs pointing to... something? Maybe it's a 404 page, maybe it's an audio file. Or a grayscale image. Who knows.
* Deeply nested `List` datatypes for representing image information

To all of this, nicely stored in your data lake:

* Clean images, all encoded as JPEG with the same mode (RGB)
* Image URLs pointing to these new clean images in your data lake (e.g. AWS S3), guaranteed to exist and not throw a 404
* A nice thumbnail in your table to help with quick visualization
* A CLIP embedding of each image, in case you want to get fancy with vector search
* All the extremely useful metadata about the image (height, width, num_channels, etc.)
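To make the "who knows what's behind the URL" problem concrete, here is a minimal standard-library sketch. It is not how Daft decodes images and it is not part of this notebook's pipeline; the helper name `sniff_image_format` is made up for illustration. It only checks the well-known leading "magic" bytes of JPEG, PNG and GIF files, which is enough to show that a URL ending in `.jpg` tells you nothing about the payload you actually get back.

```python
from typing import Optional

# Leading byte signatures of a few common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(payload: bytes) -> Optional[str]:
    """Hypothetical helper: guess the image format from the first bytes.

    Returns None when the payload does not start with a known signature,
    e.g. an HTML 404 page or an audio file served from an image URL.
    """
    for magic, name in _SIGNATURES.items():
        if payload.startswith(magic):
            return name
    return None


# Fake payloads standing in for downloaded bytes (not real downloads).
looks_like_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16
looks_like_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
looks_like_404 = b"<!DOCTYPE html><html><body>Not Found</body></html>"

for label, payload in [
    ("jpeg-ish", looks_like_jpeg),
    ("png-ish", looks_like_png),
    ("404 page", looks_like_404),
]:
    print(label, "->", sniff_image_format(payload))
```

A signature check like this only tells you what a file claims to be. Actually decoding the image, normalising it to RGB and re-encoding it as JPEG is the more robust route, and that is the direction this notebook takes.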

Daft makes it super easy and performant to do all of this. Let's try it out.

In [1]:
import daft

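# Load one shard of the dataset (JSON Lines: one document per line)
df = daft.read_json("../data/mmc4_v1.1/uncompressed/docs_no_face_shard_0_v2.jsonl")
# Preview the first few rows and the inferred schema
df.show()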

url Utf8,text_list List[Utf8],"image_info List[Struct[image_name: Utf8, raw_url: Utf8, matched_text_index: Int64, matched_sim: Float64]]",similarity_matrix List[List[Float64]],could_have_url_duplicate Int64
https://gizmodo.com/record-breaking-galaxy-is-so-big-it-acts-like-a-magnify-1614257959,"[NASA's Hubble Space Telescope has just found the most distant lensing galaxy (which are massive enough to act as their own intergalactic microscopes) ever., And thanks to a rare alignment, it might just give us a peak at how our very own galaxy formed all those billions of years ago., When you look more than 9 billion years ago in the early universe, you don't expect to find this type of galaxy lensing at all., It's very difficult to see an alignment between two galaxies in the early universe., Imagine holding a magnifying glass close to you and then moving it much farther away., When you look through a magnifying glass held at arm's length, the chances that you will see an enlarged object are high., But if you move the magnifying glass across the room, your chances of seeing the magnifying glass nearly perfectly aligned with another object beyond it diminishes., Because we've stumbled upon this chance alignment, though, we're able to use the lensing galaxies distorting effects to determine its total mass (including dark matter) by ""gauging the intensity of its lensing effects on the background galaxy's light."", So how much does a record-breaking lensing galaxy weigh?, Over 180 billion times more than our sun.]","[{image_name: f5f8113da82f.jpg, raw_url: https://i.kinja-img.com/gawker-media/image/upload/s--3J8F-YDp--/c_fit,f_auto,fl_progressive,q_80,w_320/836357908648846882.jpg, matched_text_index: 8, matched_sim: 0.3148163855075836, }]","[[0.2267199158668518, 0.18886339664459229, 0.2548035979270935, 0.2684311270713806, 0.19809527695178986, 0.15881140530109406, 0.14154167473316193, 0.24295282363891602, 0.3148163855075836, 0.19724470376968384]]",0
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[{image_name: e39fe8050561.png, raw_url: http://visionplatforminc.com/wp-content/uploads/2018/03/VP-keep-calm-and-stop-telling-lies-500x383.png, matched_text_index: 2, matched_sim: 0.1953807771205902, }, {image_name: 42532abf37cf.png, raw_url: http://visionplatforminc.com/wp-content/uploads/2018/06/comparison-chart-1200x676.png, matched_text_index: 10, matched_sim: 0.29334038496017456, }, {image_name: 6e3961dc4767.jpg, raw_url: http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg, matched_text_index: 14, matched_sim: 0.24410296976566315, }]","[[0.13888245820999146, 
0.15384167432785034, 0.1953807771205902, 0.16709916293621063, 0.18045303225517273, 0.1648285835981369, 0.16796952486038208, 0.17308980226516724, 0.19239084422588348, 0.17934882640838623, 0.19220402836799622, 0.1915411800146103, 0.14698170125484467, 0.18065980076789856, 0.17008471488952637, 0.1634514331817627, 0.18184798955917358], [0.24126467108726501, 0.22723841667175293, 0.1192258670926094, 0.24206474423408508, 0.2564660608768463, 0.2161899358034134, 0.11806226521730423, 0.1627596616744995, 0.09863521158695221, 0.20635978877544403, 0.29334038496017456, 0.208466038107872, 0.21782270073890686, 0.22355303168296814, 0.16166360676288605, 0.13091100752353668, 0.14293897151947021], [0.22902636229991913, 0.1869916319847107, 0.10615022480487823, 0.19604836404323578, 0.24055539071559906, 0.19517618417739868, 0.14319950342178345, 0.23251116275787354, 0.10010193288326263, 0.22138026356697083, 0.1989588886499405, 0.21306291222572327, 0.14781776070594788, 0.21081317961215973, 0.24410296976566315, 0.1561538279056549, 0.1869804859161377]]",0
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","[{image_name: ce9f0026d021.png, raw_url: https://www.printworxuk.com/wp-content/uploads/2017/06/roller-banner.png, matched_text_index: 0, matched_sim: 0.28204163908958435, }, {image_name: a41d8c7a197f.png, raw_url: https://www.printworxuk.com/wp-content/uploads/2015/07/banner-287x300.png, matched_text_index: 4, matched_sim: 0.31874263286590576, }]","[[0.28204163908958435, 0.15614458918571472, 0.174833744764328, 0.2311287373304367, 0.26510486006736755], [0.2898246943950653, 0.16946707665920258, 0.18884825706481934, 0.21569928526878357, 0.31874263286590576]]",0
https://xplorer.com.ng/2019/04/01/10-most-common-android-problems/,"[In this article, I have picked the most common 10 Android problems along with its simplest and most effective solutions., Battery drain is one of the most common problems in an Android smartphone., This is the price we have to pay for having the smartest devices in the world., While there are many reasons behind the battery drain issue, one of the best ways to solve this issue is to enable battery saving mode and reduce your Android smartphone brightness., To resolve the problem of battery drain, you can use DU Battery Saver & Fast Charge., Also, DU Battery Saver helps you to charge your battery fast by removing the apps running in the background of your Android device., Open your Settings menu, click on the location and enable battery saving mode., Always use low brightness in an Android smartphone., Many a times, we face problems connecting to the internet in spite of the Android smartphone being connected to the Wi-Fi., This may solve the problem., Just go to Wi-Fi > Settings > Menu > Advanced and choose to stay connected to Wi-Fi during sleep., Alternatively, you can restart your Android smartphone and Wi-Fi router or enable Airplane mode for atleast a minute., Then, again try to connect with your Android smartphone to Wi-Fi., Several users face syncing problem with Google server in an Android smartphone for many reasons., However, the solution is simple., Firstly, check if you have recently changed your account password., If yes, then you need to update it with the new password., If the problem still persists, then enable Airplane mode for 30 seconds and try again., Alternatively, you can do remove your Google account and add it again., By default, keyboard of the Android smartphone gets hung., If you are facing problems such as Keyboard taking too long to respond, or stopped responding, then it is suggested that you download Google Keyboard, as it is one of the most popular Google keyboard 
apps., You can make it as your default Keyboard app and can improve your typing experience too., Your smartphone screen automatically turned off when you plug your smartphone with charger., To recover this, you need to go to Settings / Applications / Development and tick the ‘Stay awake’ option to keep the screen on when charging., If you use laptop or Wi-Fi network, you can try Airdroid app for wirelessly transferring data in the Android device., Airdroid is one of the best apps for transfer data in PC to mobile and mobile to PC., Sometimes, the USB or USB port in PC doesn’t support each other, which is why you face these type of problem., Google Play Store is the main hub for downloading apps in Android devices., For instance, if you are getting error like not downloading an app from the Google Play Store, then your smartphone may have a serious problem., When you are unable to download Android apps, you can’t do anything in android mobile., Just follow some simple methods to sort out these errors., • You need to remove and add your Google Account., Removing it and adding your account can solve your problems within minutes., • You can clear the data and Cache of Google Play Store and Services., This will fix the problems., The main reason for such an error in an Android smartphone could be due to the device not supporting that particular game., Always ensure that the game you want to download is compatible with your Android OS versions., Another reason for the games not running is because of insufficient RAM space in Android smartphone., Use Clean Master for boosting games in Android mobile., If you are game lover then you can go through our list and download Best Free Car Racing Games for Android Mobile., Android provides limited storage for apps and we don’t have the authority to expand it., So, if you are running Android that frequently shows up insufficient space error, then there is no fix., You just need to install the app CCleaner in your Android device 
that will help you free up some storage.]","[{image_name: 2b05a3b4c88e.jpg, raw_url: https://xplorernet.files.wordpress.com/2019/03/img_20190302_181629_438.jpg, matched_text_index: 8, matched_sim: 0.17679670453071594, }]","[[0.16432377696037292, 0.13561537861824036, 0.1343795508146286, 0.16658496856689453, 0.17377106845378876, 0.14610256254673004, 0.15115123987197876, 0.10028307884931564, 0.17679670453071594, 0.14612779021263123, 0.14604255557060242, 0.15062597393989563, 0.1425563097000122, 0.15112616121768951, 0.1644369661808014, 0.1484353095293045, 0.13976217806339264, 0.15337888896465302, 0.14418822526931763, 0.10579082369804382, 0.1658637821674347, 0.12310595065355301, 0.10974544286727905, 0.07112862169742584, 0.1094629243016243, 0.1460697054862976, 0.12781348824501038, 0.1191161498427391, 0.17566411197185516, 0.14520303905010223, 0.17649254202842712, 0.17334702610969543, 0.1710623949766159, 0.15035775303840637, 0.15698276460170746, 0.1412508189678192, 0.1312795877456665, 0.12560468912124634, 0.14663833379745483, 0.10943581163883209, 0.1346634328365326, 0.11937364935874939, 0.15087033808231354]]",0
http://www.rejcanaynay.com/2017/03/thailand-on-our-way-and-where-we-stayed.html,"[I can't believe it's already been a year since my brother and I went to Thailand., We visited this country last Holy week (it's the only holiday we can leave our business) and now here we are preparing for another trip abroad in the next few weeks., And before our next big adventure come here's what happened on our first trip abroad together that happened 25th of March 2016., It is one heck of an adventure, definitely!, Ever since I met my Thai friends when I was in the US I promised that I will visit them when I got the chance., So when my brother and I decided to finally travel together I suggested Thailand so I could meet my friends that I haven't seen for years., Actually, my brother prefers being a solo traveller but on one of our daily conversations, I said that I want to go abroad as well (parents won't allow me going solo) so we thought why not travel together and visit one country a year every Holy week., Our whole trip was secretly prepared and arranged by my brother who didn't even bother to tell me how much our plane tickets and hotel cost because according to him, he doesn't like to hear me ranting about the amount of money he's going to spend., lol!, Anyway, he booked our flight early in the morning so we could make the most of our first day exploring the city., To be honest, we did not plan our whole trip in Bangkok., No itinerary or even a portable wifi to help us out., Well, we know the tourist spots, the places to visit and even the names of the temples and malls but we don't know how to get there., So when we arrived at Suvarnabhumi Airport we don't know what to do and where to go., Thankfully, their's wifi in the area and the airport staff are very informative as well as their airport signs., We could actually call our hotel for a pick-up service or take a cab but we decided to take the train instead because when we check the location of our place, it is near the 
train station and will only take about 5-minute walking., From the airport rail link station located on the basement level of Suvarnabhumi airport, we purchased the City Link Train ticket (blue line), went to the train and get off at Ratchaprarop station and walk our way to Baiyoke Sky Hotel., We stayed in Baiyoke Sky Hotel, the Thailand's tallest hotel with 88-storeys above Bangkok's skyline located in downtown area of Pathumwan., It is surrounded by various shopping centers like the famous Pratunam Mall which is right in front of the door of the hotel and other malls that are only within few minutes of walking., The location of the hotel might be one heck busy street with lots of tourist and local vendors around but it is surely a very easy access to train station and shopping malls which we totally love, right?!, As the tallest hotel in Thailand, Baiyoke surely has one of the best views you could see the city and as soon as you enter the lobby you can already start seeing scenic view since it is located on the 18th floor of the hotel., I didn't get the chance to explore the hotel because we are mostly out to see the city but I totally appreciate the size of our hotel room., With 2 queen size bed, a huge bathroom and a big space to put all your souvenirs and pasalubongs, it is a perfect size!, Tho my brother felt it was a little too plain because it's too large without so much in it., If you think about it, from the size of our room it can pass as an apartment., Hahaha!, Anyway, they have the best buffet breakfast ever with lots of food to choose from., Sadly, like what I said earlier, I wasn't able to fully explore the hotel due to our limited time but if you're planning to visit Thailand anytime soon I do recommend this staying here., After settling our luggage in our hotel room, refresh a little bit from the long flight and rest for an hour or two, we went straight back to the city and started our tour., Let the next blog post tell where we went and what we 
did in Bangkok, Thailand., The hotel room looks so nice and cozy!, It is!, And very spacious too., Hehehe!, I love how your hotel room is so quirky!, It's too bad you didn't explore more of the hotel, the interior and structure looks amazing., Anyway, taking the train is very brave of you., Given the same circumstances, I'd be too scared and I'd probably just take a cab instead., Haha., I'm looking forward to reading more of your Thailand travel posts., Yea, maybe the nextime we visit I'll make the most of it!, Our hotel I mean., Hehehe!, It's actually my brother who lead me everywhere!, Hahaha!, I also wanted to go to Thailand this year if time permits!, Ang ganda lang talaga!, I really love your shots!, Ang sarap mag travel super!, Nice post babe!, Hopefully, you could visit my blog as well!, Have a great day!, Girl, did your brother treat you?, I'm estimating the hotel room to at least be 5K per night!, Hahahaha., Sana ako din makapagstay sa ganyan., Most hotel/hostel stays ko have rooms at most half of yours., Can't wait to read about your other travels!]","[{image_name: e7e97bef7dac.jpg, raw_url: https://4.bp.blogspot.com/-2PnFWNn2-sU/WNXZ0dIMj4I/AAAAAAAAA6w/rRm8jvmiiF8iJOJNoU25EAVN1arsUOTngCLcB/s1600/IMG_5296.jpg, matched_text_index: 22, matched_sim: 0.2789805829524994, }, {image_name: 2048c6e4deda.jpg, raw_url: https://2.bp.blogspot.com/-wzn0AgkCMM0/Wi7vfliEXlI/AAAAAAAAA0U/n1aOnRYRksYIYBqrJhMNRR57g912-RT3gCLcBGAs/s1600/banner_320x220-1.jpg, matched_text_index: 32, matched_sim: 0.17884205281734467, }, {image_name: 0c0138223e72.jpg, raw_url: http://3.bp.blogspot.com/-s_BdpepawJM/WvkDjDzPOSI/AAAAAAAAAdE/ry4uzElyrFIc7l5vGUEMTmJvYKQMXL4jwCK4BGAYYCw/s1600/rej1.jpg, matched_text_index: 51, matched_sim: 0.20207133889198303, }]","[[0.16788014769554138, 0.16807110607624054, 0.17660807073116302, 0.16046054661273956, 0.1484687775373459, 0.14830976724624634, 0.15153062343597412, 0.18192975223064423, 0.09780848771333694, 0.20111289620399475, 0.1987701952457428, 
0.1798335313796997, 0.13334181904792786, 0.17085158824920654, 0.1297556757926941, 0.18738098442554474, 0.2095731496810913, 0.20506754517555237, 0.2226984053850174, 0.19385895133018494, 0.18633514642715454, 0.2393738180398941, 0.2789805829524994, 0.17799630761146545, 0.20788730680942535, 0.1342388093471527, 0.1542152464389801, 0.19843629002571106, 0.1945418268442154, 0.19845902919769287, 0.25169897079467773, 0.1351592242717743, 0.20035339891910553, 0.09662342816591263, 0.20172712206840515, 0.22258269786834717, 0.07487604022026062, 0.10534877330064774, 0.13552089035511017, 0.17826369404792786, 0.17579905688762665, 0.195256307721138, 0.09662342816591263, 0.13869988918304443, 0.1342388093471527, 0.17038212716579437, 0.12993203103542328, 0.1502522975206375, 0.17649193108081818, 0.1499987095594406, 0.1864454299211502, 0.1348588615655899, 0.10964937508106232, 0.2373974323272705, 0.12805549800395966, 0.11371669173240662, 0.21239037811756134, 0.1517273485660553], [0.06328505277633667, 0.0897543653845787, 0.11740045249462128, 0.12450197339057922, 0.08986086398363113, 0.05190318077802658, 0.054899752140045166, 0.09049399942159653, 0.1061900407075882, 0.12440155446529388, 0.11099850386381149, 0.16072040796279907, 0.07844299077987671, 0.051144666969776154, 0.09035585820674896, 0.09115341305732727, 0.05259387195110321, 0.04426082223653793, 0.09064020216464996, 0.09017255157232285, 0.07099680602550507, 0.05829783156514168, 0.1270027905702591, 0.15298829972743988, 0.1677514761686325, 0.11231721937656403, 0.08095286786556244, 0.0807177722454071, 0.11307176947593689, 0.12081937491893768, 0.09683635085821152, 0.15125393867492676, 0.17884205281734467, 0.10309552401304245, 0.14032134413719177, 0.12450166046619415, 0.10814003646373749, 0.09879916906356812, 0.13005201518535614, 0.10041002184152603, 0.1337868571281433, 0.083918996155262, 0.10309552401304245, 0.11742781102657318, 0.11231721937656403, 0.08733730018138885, 0.09571588039398193, 0.14950618147850037, 0.11001060903072357, 
0.14641420543193817, 0.16747985780239105, 0.1480441689491272, 0.11151068657636642, 0.0997694879770279, 0.1217360645532608, 0.09285322576761246, 0.12006786465644836, 0.11833642423152924], [0.09765944629907608, 0.14265140891075134, 0.10322384536266327, 0.13843294978141785, 0.15226060152053833, 0.08140214532613754, 0.09863670915365219, 0.11834049224853516, 0.12599317729473114, 0.11868564784526825, 0.09369812160730362, 0.1295543611049652, 0.09914474934339523, 0.10492326319217682, 0.1232178807258606, 0.1118573248386383, 0.1045442670583725, 0.0783216804265976, 0.11235539615154266, 0.10495100170373917, 0.09048469364643097, 0.07047991454601288, 0.14126789569854736, 0.15215426683425903, 0.11987308412790298, 0.14345107972621918, 0.12265834957361221, 0.11260366439819336, 0.10020564496517181, 0.12803736329078674, 0.10258281230926514, 0.18734276294708252, 0.12676949799060822, 0.13893039524555206, 0.12609350681304932, 0.09610968828201294, 0.12643373012542725, 0.11855123192071915, 0.1461164355278015, 0.12973664700984955, 0.13251829147338867, 0.08629012107849121, 0.13893039524555206, 0.13067203760147095, 0.14345112442970276, 0.13981223106384277, 0.12986211478710175, 0.1632772982120514, 0.12617920339107513, 0.1640249639749527, 0.19330483675003052, 0.20207133889198303, 0.1166008934378624, 0.13017864525318146, 0.13805781304836273, 0.12805821001529694, 0.12879279255867004, 0.14213477075099945]]",0
https://www.billingshomesforsale.com/homes/24-UNIT-3-S-Broadway-Avenue/Red-Lodge/MT/59068/65239183/,"[Commercial Space on Red Lodge main street., Completely redone in 2006 - high ceilings, southern thermal windows, full bath., Tasteful and ready to go., Dedicated 239 sq., ft. of storage in basement., Taxes not split., Buyer to verify info.]","[{image_name: b10d840ab85f.jpg, raw_url: https://bt-photos.global.ssl.fastly.net/billings/orig_boomver_3_263931-2.jpg, matched_text_index: 3, matched_sim: 0.1556776911020279, }, {image_name: 3342e1b9a1bc.jpg, raw_url: https://bt-photos.global.ssl.fastly.net/billings/orig_boomver_3_263931-1.jpg, matched_text_index: 0, matched_sim: 0.26643791794776917, }, {image_name: 756e81c7a3b7.jpg, raw_url: https://bt-photos.global.ssl.fastly.net/billings/orig_boomver_3_263931-3.jpg, matched_text_index: 1, matched_sim: 0.24308572709560394, }]","[[0.23804685473442078, 0.2319779247045517, 0.12190639227628708, 0.1556776911020279, 0.1269601434469223, 0.1291377693414688, 0.14280642569065094], [0.26643791794776917, 0.23703794181346893, 0.1332261562347412, 0.15318185091018677, 0.1276800036430359, 0.11617893725633621, 0.1399332880973816], [0.2334568202495575, 0.24308572709560394, 0.14550526440143585, 0.15236130356788635, 0.11064980924129486, 0.11011527478694916, 0.13686919212341309]]",0
https://www.acornishmum.com/random-posts/page/2/,"[I have seen so many people who I think of as vaguely sensible intelligent people, sharing absolute rubbish online as fact., People don’t just say ‘a workman is only as good as his tools’ for no reason – it’s because the right tools for the job can help you get things sorted quickly and efficiently., *Collaborative guest post With the summer holidays underway, you may be wondering how to keep the kids happy for another few weeks., It can be tough during the summer to continuously come up with ways to entertainment the family., You will never be everyone’s cup of tea… and that’s okay., When you hear the phrase ‘getting summer ready’ you usually think of all the adverts aimed at women for them to lose weight, get a tan, look their best etc., It means something different in this house of teen and tween boys though., I don’t often join in with tag posts, but I thought I would today as a bit of fun after being tagged by the lovely Laura., Here are some random questions and my answers., I refuse to have any gambling links on my site for a reason, because I would never want to be the reason that someone became addicted to gambling or lost more than they could afford.]","[{image_name: 3dfe7f1b8889.jpg, raw_url: https://www.acornishmum.com/wp-content/uploads/2018/10/light-681540_640.jpg, matched_text_index: 3, matched_sim: 0.18006281554698944, }, {image_name: 63ebb64fea64.jpg, raw_url: https://www.acornishmum.com/wp-content/uploads/2018/08/tools-15539_640.jpg, matched_text_index: 1, matched_sim: 0.21386536955833435, }, {image_name: da29310f2040.jpg, raw_url: https://www.acornishmum.com/wp-content/uploads/2018/05/Vouchercloud-summer-ready.jpg, matched_text_index: 2, matched_sim: 0.29001814126968384, }, {image_name: 8ef4a00d9e7c.jpg, raw_url: https://www.acornishmum.com/wp-content/uploads/2018/03/KindnessonSocial-Media.jpg, matched_text_index: 7, matched_sim: 0.22565722465515137, }, {image_name: 275fa8a92ec7.png, raw_url: 
https://www.acornishmum.com/wp-content/uploads/2018/03/Bloggers-Random.png, matched_text_index: 8, matched_sim: 0.19497966766357422, }, {image_name: d1a15d0b3c3b.jpg, raw_url: https://www.acornishmum.com/wp-content/uploads/2018/06/You-Will-Never-Be-Everyones-Cup-of-Teaand.jpg, matched_text_index: 4, matched_sim: 0.33313706517219543, }, {image_name: 76b7018f1332.jpg, raw_url: https://www.acornishmum.com/wp-content/uploads/2018/02/dice-18208_1280.jpg, matched_text_index: 9, matched_sim: 0.22152987122535706, }, {image_name: 15ea30341800.jpg, raw_url: https://www.acornishmum.com/wp-content/uploads/2018/09/danger-of-believing.jpg, matched_text_index: 0, matched_sim: 0.20117005705833435, }]","[[0.14934484660625458, 0.1899491250514984, 0.18121454119682312, 0.18006281554698944, 0.14075128734111786, 0.15800437331199646, 0.13229826092720032, 0.14702007174491882, 0.13074156641960144, 0.14305973052978516], [0.11303241550922394, 0.21386536955833435, 0.1578568071126938, 0.17234590649604797, 0.10869140923023224, 0.1567077785730362, 0.1547345668077469, 0.11880984902381897, 0.1530669927597046, 0.14681237936019897], [0.09302127361297607, 0.15564016997814178, 0.29001814126968384, 0.20211854577064514, 0.10574855655431747, 0.23577308654785156, 0.11411786079406738, 0.1317010074853897, 0.11174870282411575, 0.12093476951122284], [0.1872631311416626, 0.17265556752681732, 0.23918785154819489, 0.18000495433807373, 0.18595141172409058, 0.12759137153625488, 0.14002291858196259, 0.22565722465515137, 0.126609668135643, 0.18919497728347778], [0.121915802359581, 0.13888205587863922, 0.22823116183280945, 0.13096173107624054, 0.11534067243337631, 0.14627471566200256, 0.11142628639936447, 0.21759909391403198, 0.19497966766357422, 0.1456509679555893], [0.15246962010860443, 0.16278263926506042, 0.19810809195041656, 0.17068225145339966, 0.33313706517219543, 0.17147208750247955, 0.16221892833709717, 0.19443559646606445, 0.12447119504213333, 0.16683489084243774], [0.12707868218421936, 0.19710755348205566, 
0.19396579265594482, 0.19054695963859558, 0.1419774889945984, 0.1311657577753067, 0.14418475329875946, 0.14256486296653748, 0.1435479372739792, 0.22152987122535706], [0.20117005705833435, 0.15798982977867126, 0.19155287742614746, 0.14810237288475037, 0.16591110825538635, 0.14935266971588135, 0.13394591212272644, 0.1700541377067566, 0.12773242592811584, 0.21471425890922546]]",0
https://www.michaeljabalee.com/,"[​We've Moved to a New Location!, Michael Jabalee has specialized in natural healing for the body, mind and spirit for over 23 years., He utilizes techniques such as acupuncture, Auricular Medicine and homeopathy to treat a large number of 21st Century health challenges., In today’s world, there has been a huge rise in complicated diseases., Chemicals, toxins, additives, heavy metals, mold, and stress overload our adrenals and burden our immune systems., Modern acupuncture has been expanded and integrated with European Bio Energetic Medicine by progressive physicians throughout Europe over the last 60 years., Only a tiny handful practice this method in the U.S., This combination has tremendous advantages for effectively treating so many of our 21st century health challenges, such as Lyme, fibromylagia, chronic fatigue, hormone imbalances, toxin overload, and chronic stealth viral/bacterial/fungal/parasitic infections., We accept UVA Health Plan-Aetna.]","[{image_name: 28233104a82c.jpg, raw_url: https://www.michaeljabalee.com/uploads/2/6/8/4/26843574/acupuncture-award-2018_orig.jpg, matched_text_index: 2, matched_sim: 0.2818508744239807, }]","[[0.20271681249141693, 0.2157273292541504, 0.2818508744239807, 0.17973767220973969, 0.15964151918888092, 0.27058082818984985, 0.1491580605506897, 0.223326176404953, 0.20073233544826508]]",0


In [2]:
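# Explode both list columns together so that each row describes exactly one image
df = df.explode("image_info", "similarity_matrix")
# Preview the result: one row per (document, image) pair
df.show()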

url Utf8,text_list List[Utf8],"image_info Struct[image_name: Utf8, raw_url: Utf8, matched_text_index: Int64, matched_sim: Float64]",similarity_matrix List[Float64],could_have_url_duplicate Int64
https://gizmodo.com/record-breaking-galaxy-is-so-big-it-acts-like-a-magnify-1614257959,"[NASA's Hubble Space Telescope has just found the most distant lensing galaxy (which are massive enough to act as their own intergalactic microscopes) ever., And thanks to a rare alignment, it might just give us a peak at how our very own galaxy formed all those billions of years ago., When you look more than 9 billion years ago in the early universe, you don't expect to find this type of galaxy lensing at all., It's very difficult to see an alignment between two galaxies in the early universe., Imagine holding a magnifying glass close to you and then moving it much farther away., When you look through a magnifying glass held at arm's length, the chances that you will see an enlarged object are high., But if you move the magnifying glass across the room, your chances of seeing the magnifying glass nearly perfectly aligned with another object beyond it diminishes., Because we've stumbled upon this chance alignment, though, we're able to use the lensing galaxies distorting effects to determine its total mass (including dark matter) by ""gauging the intensity of its lensing effects on the background galaxy's light."", So how much does a record-breaking lensing galaxy weigh?, Over 180 billion times more than our sun.]","{image_name: f5f8113da82f.jpg, raw_url: https://i.kinja-img.com/gawker-media/image/upload/s--3J8F-YDp--/c_fit,f_auto,fl_progressive,q_80,w_320/836357908648846882.jpg, matched_text_index: 8, matched_sim: 0.3148163855075836, }","[0.2267199158668518, 0.18886339664459229, 0.2548035979270935, 0.2684311270713806, 0.19809527695178986, 0.15881140530109406, 0.14154167473316193, 0.24295282363891602, 0.3148163855075836, 0.19724470376968384]",0
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","{image_name: e39fe8050561.png, raw_url: http://visionplatforminc.com/wp-content/uploads/2018/03/VP-keep-calm-and-stop-telling-lies-500x383.png, matched_text_index: 2, matched_sim: 0.1953807771205902, }","[0.13888245820999146, 0.15384167432785034, 0.1953807771205902, 0.16709916293621063, 0.18045303225517273, 0.1648285835981369, 0.16796952486038208, 0.17308980226516724, 0.19239084422588348, 0.17934882640838623, 0.19220402836799622, 0.1915411800146103, 0.14698170125484467, 0.18065980076789856, 0.17008471488952637, 0.1634514331817627, 0.18184798955917358]",0
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","{image_name: 42532abf37cf.png, raw_url: http://visionplatforminc.com/wp-content/uploads/2018/06/comparison-chart-1200x676.png, matched_text_index: 10, matched_sim: 0.29334038496017456, }","[0.24126467108726501, 0.22723841667175293, 0.1192258670926094, 0.24206474423408508, 0.2564660608768463, 0.2161899358034134, 0.11806226521730423, 0.1627596616744995, 0.09863521158695221, 0.20635978877544403, 0.29334038496017456, 0.208466038107872, 0.21782270073890686, 0.22355303168296814, 0.16166360676288605, 0.13091100752353668, 0.14293897151947021]",0
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","{image_name: 6e3961dc4767.jpg, raw_url: http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg, matched_text_index: 14, matched_sim: 0.24410296976566315, }","[0.22902636229991913, 0.1869916319847107, 0.10615022480487823, 0.19604836404323578, 0.24055539071559906, 0.19517618417739868, 0.14319950342178345, 0.23251116275787354, 0.10010193288326263, 0.22138026356697083, 0.1989588886499405, 0.21306291222572327, 0.14781776070594788, 0.21081317961215973, 0.24410296976566315, 0.1561538279056549, 0.1869804859161377]",0
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","{image_name: ce9f0026d021.png, raw_url: https://www.printworxuk.com/wp-content/uploads/2017/06/roller-banner.png, matched_text_index: 0, matched_sim: 0.28204163908958435, }","[0.28204163908958435, 0.15614458918571472, 0.174833744764328, 0.2311287373304367, 0.26510486006736755]",0
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","{image_name: a41d8c7a197f.png, raw_url: https://www.printworxuk.com/wp-content/uploads/2015/07/banner-287x300.png, matched_text_index: 4, matched_sim: 0.31874263286590576, }","[0.2898246943950653, 0.16946707665920258, 0.18884825706481934, 0.21569928526878357, 0.31874263286590576]",0
https://xplorer.com.ng/2019/04/01/10-most-common-android-problems/,"[In this article, I have picked the most common 10 Android problems along with its simplest and most effective solutions., Battery drain is one of the most common problems in an Android smartphone., This is the price we have to pay for having the smartest devices in the world., While there are many reasons behind the battery drain issue, one of the best ways to solve this issue is to enable battery saving mode and reduce your Android smartphone brightness., To resolve the problem of battery drain, you can use DU Battery Saver & Fast Charge., Also, DU Battery Saver helps you to charge your battery fast by removing the apps running in the background of your Android device., Open your Settings menu, click on the location and enable battery saving mode., Always use low brightness in an Android smartphone., Many a times, we face problems connecting to the internet in spite of the Android smartphone being connected to the Wi-Fi., This may solve the problem., Just go to Wi-Fi > Settings > Menu > Advanced and choose to stay connected to Wi-Fi during sleep., Alternatively, you can restart your Android smartphone and Wi-Fi router or enable Airplane mode for atleast a minute., Then, again try to connect with your Android smartphone to Wi-Fi., Several users face syncing problem with Google server in an Android smartphone for many reasons., However, the solution is simple., Firstly, check if you have recently changed your account password., If yes, then you need to update it with the new password., If the problem still persists, then enable Airplane mode for 30 seconds and try again., Alternatively, you can do remove your Google account and add it again., By default, keyboard of the Android smartphone gets hung., If you are facing problems such as Keyboard taking too long to respond, or stopped responding, then it is suggested that you download Google Keyboard, as it is one of the most popular Google keyboard 
apps., You can make it as your default Keyboard app and can improve your typing experience too., Your smartphone screen automatically turned off when you plug your smartphone with charger., To recover this, you need to go to Settings / Applications / Development and tick the ‘Stay awake’ option to keep the screen on when charging., If you use laptop or Wi-Fi network, you can try Airdroid app for wirelessly transferring data in the Android device., Airdroid is one of the best apps for transfer data in PC to mobile and mobile to PC., Sometimes, the USB or USB port in PC doesn’t support each other, which is why you face these type of problem., Google Play Store is the main hub for downloading apps in Android devices., For instance, if you are getting error like not downloading an app from the Google Play Store, then your smartphone may have a serious problem., When you are unable to download Android apps, you can’t do anything in android mobile., Just follow some simple methods to sort out these errors., • You need to remove and add your Google Account., Removing it and adding your account can solve your problems within minutes., • You can clear the data and Cache of Google Play Store and Services., This will fix the problems., The main reason for such an error in an Android smartphone could be due to the device not supporting that particular game., Always ensure that the game you want to download is compatible with your Android OS versions., Another reason for the games not running is because of insufficient RAM space in Android smartphone., Use Clean Master for boosting games in Android mobile., If you are game lover then you can go through our list and download Best Free Car Racing Games for Android Mobile., Android provides limited storage for apps and we don’t have the authority to expand it., So, if you are running Android that frequently shows up insufficient space error, then there is no fix., You just need to install the app CCleaner in your Android device 
that will help you free up some storage.]","{image_name: 2b05a3b4c88e.jpg, raw_url: https://xplorernet.files.wordpress.com/2019/03/img_20190302_181629_438.jpg, matched_text_index: 8, matched_sim: 0.17679670453071594, }","[0.16432377696037292, 0.13561537861824036, 0.1343795508146286, 0.16658496856689453, 0.17377106845378876, 0.14610256254673004, 0.15115123987197876, 0.10028307884931564, 0.17679670453071594, 0.14612779021263123, 0.14604255557060242, 0.15062597393989563, 0.1425563097000122, 0.15112616121768951, 0.1644369661808014, 0.1484353095293045, 0.13976217806339264, 0.15337888896465302, 0.14418822526931763, 0.10579082369804382, 0.1658637821674347, 0.12310595065355301, 0.10974544286727905, 0.07112862169742584, 0.1094629243016243, 0.1460697054862976, 0.12781348824501038, 0.1191161498427391, 0.17566411197185516, 0.14520303905010223, 0.17649254202842712, 0.17334702610969543, 0.1710623949766159, 0.15035775303840637, 0.15698276460170746, 0.1412508189678192, 0.1312795877456665, 0.12560468912124634, 0.14663833379745483, 0.10943581163883209, 0.1346634328365326, 0.11937364935874939, 0.15087033808231354]",0
http://www.rejcanaynay.com/2017/03/thailand-on-our-way-and-where-we-stayed.html,"[I can't believe it's already been a year since my brother and I went to Thailand., We visited this country last Holy week (it's the only holiday we can leave our business) and now here we are preparing for another trip abroad in the next few weeks., And before our next big adventure come here's what happened on our first trip abroad together that happened 25th of March 2016., It is one heck of an adventure, definitely!, Ever since I met my Thai friends when I was in the US I promised that I will visit them when I got the chance., So when my brother and I decided to finally travel together I suggested Thailand so I could meet my friends that I haven't seen for years., Actually, my brother prefers being a solo traveller but on one of our daily conversations, I said that I want to go abroad as well (parents won't allow me going solo) so we thought why not travel together and visit one country a year every Holy week., Our whole trip was secretly prepared and arranged by my brother who didn't even bother to tell me how much our plane tickets and hotel cost because according to him, he doesn't like to hear me ranting about the amount of money he's going to spend., lol!, Anyway, he booked our flight early in the morning so we could make the most of our first day exploring the city., To be honest, we did not plan our whole trip in Bangkok., No itinerary or even a portable wifi to help us out., Well, we know the tourist spots, the places to visit and even the names of the temples and malls but we don't know how to get there., So when we arrived at Suvarnabhumi Airport we don't know what to do and where to go., Thankfully, their's wifi in the area and the airport staff are very informative as well as their airport signs., We could actually call our hotel for a pick-up service or take a cab but we decided to take the train instead because when we check the location of our place, it is near the 
train station and will only take about 5-minute walking., From the airport rail link station located on the basement level of Suvarnabhumi airport, we purchased the City Link Train ticket (blue line), went to the train and get off at Ratchaprarop station and walk our way to Baiyoke Sky Hotel., We stayed in Baiyoke Sky Hotel, the Thailand's tallest hotel with 88-storeys above Bangkok's skyline located in downtown area of Pathumwan., It is surrounded by various shopping centers like the famous Pratunam Mall which is right in front of the door of the hotel and other malls that are only within few minutes of walking., The location of the hotel might be one heck busy street with lots of tourist and local vendors around but it is surely a very easy access to train station and shopping malls which we totally love, right?!, As the tallest hotel in Thailand, Baiyoke surely has one of the best views you could see the city and as soon as you enter the lobby you can already start seeing scenic view since it is located on the 18th floor of the hotel., I didn't get the chance to explore the hotel because we are mostly out to see the city but I totally appreciate the size of our hotel room., With 2 queen size bed, a huge bathroom and a big space to put all your souvenirs and pasalubongs, it is a perfect size!, Tho my brother felt it was a little too plain because it's too large without so much in it., If you think about it, from the size of our room it can pass as an apartment., Hahaha!, Anyway, they have the best buffet breakfast ever with lots of food to choose from., Sadly, like what I said earlier, I wasn't able to fully explore the hotel due to our limited time but if you're planning to visit Thailand anytime soon I do recommend this staying here., After settling our luggage in our hotel room, refresh a little bit from the long flight and rest for an hour or two, we went straight back to the city and started our tour., Let the next blog post tell where we went and what we 
did in Bangkok, Thailand., The hotel room looks so nice and cozy!, It is!, And very spacious too., Hehehe!, I love how your hotel room is so quirky!, It's too bad you didn't explore more of the hotel, the interior and structure looks amazing., Anyway, taking the train is very brave of you., Given the same circumstances, I'd be too scared and I'd probably just take a cab instead., Haha., I'm looking forward to reading more of your Thailand travel posts., Yea, maybe the nextime we visit I'll make the most of it!, Our hotel I mean., Hehehe!, It's actually my brother who lead me everywhere!, Hahaha!, I also wanted to go to Thailand this year if time permits!, Ang ganda lang talaga!, I really love your shots!, Ang sarap mag travel super!, Nice post babe!, Hopefully, you could visit my blog as well!, Have a great day!, Girl, did your brother treat you?, I'm estimating the hotel room to at least be 5K per night!, Hahahaha., Sana ako din makapagstay sa ganyan., Most hotel/hostel stays ko have rooms at most half of yours., Can't wait to read about your other travels!]","{image_name: e7e97bef7dac.jpg, raw_url: https://4.bp.blogspot.com/-2PnFWNn2-sU/WNXZ0dIMj4I/AAAAAAAAA6w/rRm8jvmiiF8iJOJNoU25EAVN1arsUOTngCLcB/s1600/IMG_5296.jpg, matched_text_index: 22, matched_sim: 0.2789805829524994, }","[0.16788014769554138, 0.16807110607624054, 0.17660807073116302, 0.16046054661273956, 0.1484687775373459, 0.14830976724624634, 0.15153062343597412, 0.18192975223064423, 0.09780848771333694, 0.20111289620399475, 0.1987701952457428, 0.1798335313796997, 0.13334181904792786, 0.17085158824920654, 0.1297556757926941, 0.18738098442554474, 0.2095731496810913, 0.20506754517555237, 0.2226984053850174, 0.19385895133018494, 0.18633514642715454, 0.2393738180398941, 0.2789805829524994, 0.17799630761146545, 0.20788730680942535, 0.1342388093471527, 0.1542152464389801, 0.19843629002571106, 0.1945418268442154, 0.19845902919769287, 0.25169897079467773, 0.1351592242717743, 0.20035339891910553, 
0.09662342816591263, 0.20172712206840515, 0.22258269786834717, 0.07487604022026062, 0.10534877330064774, 0.13552089035511017, 0.17826369404792786, 0.17579905688762665, 0.195256307721138, 0.09662342816591263, 0.13869988918304443, 0.1342388093471527, 0.17038212716579437, 0.12993203103542328, 0.1502522975206375, 0.17649193108081818, 0.1499987095594406, 0.1864454299211502, 0.1348588615655899, 0.10964937508106232, 0.2373974323272705, 0.12805549800395966, 0.11371669173240662, 0.21239037811756134, 0.1517273485660553]",0
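
Each row now carries exactly one `image_info` struct, which is still a little awkward to work with. The next cell pulls its fields up into top-level columns. To picture what that flattening does to a single row, here is a rough plain-Python sketch. It illustrates the idea only and is not how Daft implements it; the `flatten_struct` helper and the trimmed-down `row` dict are made up for this example.

```python
# Hypothetical single row, trimmed to two columns, mirroring the table above.
row = {
    "url": "https://www.printworxuk.com/product/roller-banner/",
    "image_info": {
        "image_name": "ce9f0026d021.png",
        "raw_url": "https://www.printworxuk.com/wp-content/uploads/2017/06/roller-banner.png",
        "matched_text_index": 0,
        "matched_sim": 0.28204163908958435,
    },
}


def flatten_struct(record, struct_key, fields):
    """Copy the chosen fields of a nested dict to the top level and drop the struct."""
    # Keep every column except the struct itself.
    flat = {key: value for key, value in record.items() if key != struct_key}
    # Promote each requested struct field to its own column.
    for field in fields:
        flat[field] = record[struct_key][field]
    return flat


flat_row = flatten_struct(
    row,
    "image_info",
    ["image_name", "raw_url", "matched_text_index", "matched_sim"],
)
print(sorted(flat_row))
```

After flattening, the row has no `image_info` key anymore, and `image_name`, `raw_url`, `matched_text_index` and `matched_sim` sit next to `url` as ordinary columns.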


In [3]:
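# Promote each field of the image_info struct to its own top-level column.
image_info_fields = ["image_name", "raw_url", "matched_text_index", "matched_sim"]
df = df.with_columns({key: df[f"image_info.{key}"] for key in image_info_fields})
# The struct column is now redundant, so drop it.
df = df.exclude("image_info")
df.show()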

url Utf8,text_list List[Utf8],similarity_matrix List[Float64],could_have_url_duplicate Int64,image_name Utf8,raw_url Utf8,matched_text_index Int64,matched_sim Float64
https://gizmodo.com/record-breaking-galaxy-is-so-big-it-acts-like-a-magnify-1614257959,"[NASA's Hubble Space Telescope has just found the most distant lensing galaxy (which are massive enough to act as their own intergalactic microscopes) ever., And thanks to a rare alignment, it might just give us a peak at how our very own galaxy formed all those billions of years ago., When you look more than 9 billion years ago in the early universe, you don't expect to find this type of galaxy lensing at all., It's very difficult to see an alignment between two galaxies in the early universe., Imagine holding a magnifying glass close to you and then moving it much farther away., When you look through a magnifying glass held at arm's length, the chances that you will see an enlarged object are high., But if you move the magnifying glass across the room, your chances of seeing the magnifying glass nearly perfectly aligned with another object beyond it diminishes., Because we've stumbled upon this chance alignment, though, we're able to use the lensing galaxies distorting effects to determine its total mass (including dark matter) by ""gauging the intensity of its lensing effects on the background galaxy's light."", So how much does a record-breaking lensing galaxy weigh?, Over 180 billion times more than our sun.]","[0.2267199158668518, 0.18886339664459229, 0.2548035979270935, 0.2684311270713806, 0.19809527695178986, 0.15881140530109406, 0.14154167473316193, 0.24295282363891602, 0.3148163855075836, 0.19724470376968384]",0,f5f8113da82f.jpg,"https://i.kinja-img.com/gawker-media/image/upload/s--3J8F-YDp--/c_fit,f_auto,fl_progressive,q_80,w_320/836357908648846882.jpg",8,0.3148163855075836
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.13888245820999146, 0.15384167432785034, 0.1953807771205902, 0.16709916293621063, 0.18045303225517273, 0.1648285835981369, 0.16796952486038208, 0.17308980226516724, 0.19239084422588348, 0.17934882640838623, 0.19220402836799622, 0.1915411800146103, 0.14698170125484467, 0.18065980076789856, 0.17008471488952637, 0.1634514331817627, 0.18184798955917358]",0,e39fe8050561.png,http://visionplatforminc.com/wp-content/uploads/2018/03/VP-keep-calm-and-stop-telling-lies-500x383.png,2,0.1953807771205902
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.24126467108726501, 0.22723841667175293, 0.1192258670926094, 0.24206474423408508, 0.2564660608768463, 0.2161899358034134, 0.11806226521730423, 0.1627596616744995, 0.09863521158695221, 0.20635978877544403, 0.29334038496017456, 0.208466038107872, 0.21782270073890686, 0.22355303168296814, 0.16166360676288605, 0.13091100752353668, 0.14293897151947021]",0,42532abf37cf.png,http://visionplatforminc.com/wp-content/uploads/2018/06/comparison-chart-1200x676.png,10,0.2933403849601745
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.22902636229991913, 0.1869916319847107, 0.10615022480487823, 0.19604836404323578, 0.24055539071559906, 0.19517618417739868, 0.14319950342178345, 0.23251116275787354, 0.10010193288326263, 0.22138026356697083, 0.1989588886499405, 0.21306291222572327, 0.14781776070594788, 0.21081317961215973, 0.24410296976566315, 0.1561538279056549, 0.1869804859161377]",0,6e3961dc4767.jpg,http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg,14,0.2441029697656631
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","[0.28204163908958435, 0.15614458918571472, 0.174833744764328, 0.2311287373304367, 0.26510486006736755]",0,ce9f0026d021.png,https://www.printworxuk.com/wp-content/uploads/2017/06/roller-banner.png,0,0.2820416390895843
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","[0.2898246943950653, 0.16946707665920258, 0.18884825706481934, 0.21569928526878357, 0.31874263286590576]",0,a41d8c7a197f.png,https://www.printworxuk.com/wp-content/uploads/2015/07/banner-287x300.png,4,0.3187426328659057
https://xplorer.com.ng/2019/04/01/10-most-common-android-problems/,"[In this article, I have picked the most common 10 Android problems along with its simplest and most effective solutions., Battery drain is one of the most common problems in an Android smartphone., This is the price we have to pay for having the smartest devices in the world., While there are many reasons behind the battery drain issue, one of the best ways to solve this issue is to enable battery saving mode and reduce your Android smartphone brightness., To resolve the problem of battery drain, you can use DU Battery Saver & Fast Charge., Also, DU Battery Saver helps you to charge your battery fast by removing the apps running in the background of your Android device., Open your Settings menu, click on the location and enable battery saving mode., Always use low brightness in an Android smartphone., Many a times, we face problems connecting to the internet in spite of the Android smartphone being connected to the Wi-Fi., This may solve the problem., Just go to Wi-Fi > Settings > Menu > Advanced and choose to stay connected to Wi-Fi during sleep., Alternatively, you can restart your Android smartphone and Wi-Fi router or enable Airplane mode for atleast a minute., Then, again try to connect with your Android smartphone to Wi-Fi., Several users face syncing problem with Google server in an Android smartphone for many reasons., However, the solution is simple., Firstly, check if you have recently changed your account password., If yes, then you need to update it with the new password., If the problem still persists, then enable Airplane mode for 30 seconds and try again., Alternatively, you can do remove your Google account and add it again., By default, keyboard of the Android smartphone gets hung., If you are facing problems such as Keyboard taking too long to respond, or stopped responding, then it is suggested that you download Google Keyboard, as it is one of the most popular Google keyboard 
apps., You can make it as your default Keyboard app and can improve your typing experience too., Your smartphone screen automatically turned off when you plug your smartphone with charger., To recover this, you need to go to Settings / Applications / Development and tick the ‘Stay awake’ option to keep the screen on when charging., If you use laptop or Wi-Fi network, you can try Airdroid app for wirelessly transferring data in the Android device., Airdroid is one of the best apps for transfer data in PC to mobile and mobile to PC., Sometimes, the USB or USB port in PC doesn’t support each other, which is why you face these type of problem., Google Play Store is the main hub for downloading apps in Android devices., For instance, if you are getting error like not downloading an app from the Google Play Store, then your smartphone may have a serious problem., When you are unable to download Android apps, you can’t do anything in android mobile., Just follow some simple methods to sort out these errors., • You need to remove and add your Google Account., Removing it and adding your account can solve your problems within minutes., • You can clear the data and Cache of Google Play Store and Services., This will fix the problems., The main reason for such an error in an Android smartphone could be due to the device not supporting that particular game., Always ensure that the game you want to download is compatible with your Android OS versions., Another reason for the games not running is because of insufficient RAM space in Android smartphone., Use Clean Master for boosting games in Android mobile., If you are game lover then you can go through our list and download Best Free Car Racing Games for Android Mobile., Android provides limited storage for apps and we don’t have the authority to expand it., So, if you are running Android that frequently shows up insufficient space error, then there is no fix., You just need to install the app CCleaner in your Android device 
that will help you free up some storage.]","[0.16432377696037292, 0.13561537861824036, 0.1343795508146286, 0.16658496856689453, 0.17377106845378876, 0.14610256254673004, 0.15115123987197876, 0.10028307884931564, 0.17679670453071594, 0.14612779021263123, 0.14604255557060242, 0.15062597393989563, 0.1425563097000122, 0.15112616121768951, 0.1644369661808014, 0.1484353095293045, 0.13976217806339264, 0.15337888896465302, 0.14418822526931763, 0.10579082369804382, 0.1658637821674347, 0.12310595065355301, 0.10974544286727905, 0.07112862169742584, 0.1094629243016243, 0.1460697054862976, 0.12781348824501038, 0.1191161498427391, 0.17566411197185516, 0.14520303905010223, 0.17649254202842712, 0.17334702610969543, 0.1710623949766159, 0.15035775303840637, 0.15698276460170746, 0.1412508189678192, 0.1312795877456665, 0.12560468912124634, 0.14663833379745483, 0.10943581163883209, 0.1346634328365326, 0.11937364935874939, 0.15087033808231354]",0,2b05a3b4c88e.jpg,https://xplorernet.files.wordpress.com/2019/03/img_20190302_181629_438.jpg,8,0.1767967045307159
http://www.rejcanaynay.com/2017/03/thailand-on-our-way-and-where-we-stayed.html,"[I can't believe it's already been a year since my brother and I went to Thailand., We visited this country last Holy week (it's the only holiday we can leave our business) and now here we are preparing for another trip abroad in the next few weeks., And before our next big adventure come here's what happened on our first trip abroad together that happened 25th of March 2016., It is one heck of an adventure, definitely!, Ever since I met my Thai friends when I was in the US I promised that I will visit them when I got the chance., So when my brother and I decided to finally travel together I suggested Thailand so I could meet my friends that I haven't seen for years., Actually, my brother prefers being a solo traveller but on one of our daily conversations, I said that I want to go abroad as well (parents won't allow me going solo) so we thought why not travel together and visit one country a year every Holy week., Our whole trip was secretly prepared and arranged by my brother who didn't even bother to tell me how much our plane tickets and hotel cost because according to him, he doesn't like to hear me ranting about the amount of money he's going to spend., lol!, Anyway, he booked our flight early in the morning so we could make the most of our first day exploring the city., To be honest, we did not plan our whole trip in Bangkok., No itinerary or even a portable wifi to help us out., Well, we know the tourist spots, the places to visit and even the names of the temples and malls but we don't know how to get there., So when we arrived at Suvarnabhumi Airport we don't know what to do and where to go., Thankfully, their's wifi in the area and the airport staff are very informative as well as their airport signs., We could actually call our hotel for a pick-up service or take a cab but we decided to take the train instead because when we check the location of our place, it is near the 
train station and will only take about 5-minute walking., From the airport rail link station located on the basement level of Suvarnabhumi airport, we purchased the City Link Train ticket (blue line), went to the train and get off at Ratchaprarop station and walk our way to Baiyoke Sky Hotel., We stayed in Baiyoke Sky Hotel, the Thailand's tallest hotel with 88-storeys above Bangkok's skyline located in downtown area of Pathumwan., It is surrounded by various shopping centers like the famous Pratunam Mall which is right in front of the door of the hotel and other malls that are only within few minutes of walking., The location of the hotel might be one heck busy street with lots of tourist and local vendors around but it is surely a very easy access to train station and shopping malls which we totally love, right?!, As the tallest hotel in Thailand, Baiyoke surely has one of the best views you could see the city and as soon as you enter the lobby you can already start seeing scenic view since it is located on the 18th floor of the hotel., I didn't get the chance to explore the hotel because we are mostly out to see the city but I totally appreciate the size of our hotel room., With 2 queen size bed, a huge bathroom and a big space to put all your souvenirs and pasalubongs, it is a perfect size!, Tho my brother felt it was a little too plain because it's too large without so much in it., If you think about it, from the size of our room it can pass as an apartment., Hahaha!, Anyway, they have the best buffet breakfast ever with lots of food to choose from., Sadly, like what I said earlier, I wasn't able to fully explore the hotel due to our limited time but if you're planning to visit Thailand anytime soon I do recommend this staying here., After settling our luggage in our hotel room, refresh a little bit from the long flight and rest for an hour or two, we went straight back to the city and started our tour., Let the next blog post tell where we went and what we 
did in Bangkok, Thailand., The hotel room looks so nice and cozy!, It is!, And very spacious too., Hehehe!, I love how your hotel room is so quirky!, It's too bad you didn't explore more of the hotel, the interior and structure looks amazing., Anyway, taking the train is very brave of you., Given the same circumstances, I'd be too scared and I'd probably just take a cab instead., Haha., I'm looking forward to reading more of your Thailand travel posts., Yea, maybe the nextime we visit I'll make the most of it!, Our hotel I mean., Hehehe!, It's actually my brother who lead me everywhere!, Hahaha!, I also wanted to go to Thailand this year if time permits!, Ang ganda lang talaga!, I really love your shots!, Ang sarap mag travel super!, Nice post babe!, Hopefully, you could visit my blog as well!, Have a great day!, Girl, did your brother treat you?, I'm estimating the hotel room to at least be 5K per night!, Hahahaha., Sana ako din makapagstay sa ganyan., Most hotel/hostel stays ko have rooms at most half of yours., Can't wait to read about your other travels!]","[0.16788014769554138, 0.16807110607624054, 0.17660807073116302, 0.16046054661273956, 0.1484687775373459, 0.14830976724624634, 0.15153062343597412, 0.18192975223064423, 0.09780848771333694, 0.20111289620399475, 0.1987701952457428, 0.1798335313796997, 0.13334181904792786, 0.17085158824920654, 0.1297556757926941, 0.18738098442554474, 0.2095731496810913, 0.20506754517555237, 0.2226984053850174, 0.19385895133018494, 0.18633514642715454, 0.2393738180398941, 0.2789805829524994, 0.17799630761146545, 0.20788730680942535, 0.1342388093471527, 0.1542152464389801, 0.19843629002571106, 0.1945418268442154, 0.19845902919769287, 0.25169897079467773, 0.1351592242717743, 0.20035339891910553, 0.09662342816591263, 0.20172712206840515, 0.22258269786834717, 0.07487604022026062, 0.10534877330064774, 0.13552089035511017, 0.17826369404792786, 0.17579905688762665, 0.195256307721138, 0.09662342816591263, 0.13869988918304443, 
0.1342388093471527, 0.17038212716579437, 0.12993203103542328, 0.1502522975206375, 0.17649193108081818, 0.1499987095594406, 0.1864454299211502, 0.1348588615655899, 0.10964937508106232, 0.2373974323272705, 0.12805549800395966, 0.11371669173240662, 0.21239037811756134, 0.1517273485660553]",0,e7e97bef7dac.jpg,https://4.bp.blogspot.com/-2PnFWNn2-sU/WNXZ0dIMj4I/AAAAAAAAA6w/rRm8jvmiiF8iJOJNoU25EAVN1arsUOTngCLcB/s1600/IMG_5296.jpg,22,0.2789805829524994


## Downloading images

Let's try downloading some of these images to display and see what we're working with!

In [4]:
df.with_column("image", df["raw_url"].url.download().image.decode()).show()

ScanWithTask-MapPartition-LocalLimit [Stage:8]:   0%|          | 0/1 [00:00<?, ?it/s]

FileNotFoundError: File: http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg not found
HTTP status client error (404 Not Found) for url (http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg)

**Error!!!!**

Oops, what happened here?

Turns out, most of these datasets are very messy, and some of these images no longer exist on their original servers. That's why the URL download operation fails with a 404 error.

Even worse, sometimes the download succeeds but the data we get back isn't a valid image at all!
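To make that second failure mode concrete: a server can respond successfully with a body that is an HTML error page rather than image bytes. One rough way to spot this, sketched below in plain Python, is to look at the leading "magic bytes" of the payload. The helper name `looks_like_image` and the sample payloads are made up for illustration, the check only covers JPEG, PNG and GIF, and it is not how Daft validates images.

```python
# Illustrative only: a signature check, not a full image decoder.
IMAGE_SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
}


def looks_like_image(data: bytes):
    """Return the detected format name, or None if no known signature matches."""
    for fmt, signatures in IMAGE_SIGNATURES.items():
        # bytes.startswith accepts a tuple of candidate prefixes
        if data.startswith(signatures):
            return fmt
    return None


# Made-up payloads: an HTML error page and the start of a PNG file
html_error_page = b"<!DOCTYPE html><html><body>404 Not Found</body></html>"
png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8

print(looks_like_image(html_error_page))  # None
print(looks_like_image(png_header))       # png
```

A matching signature only means the payload claims to be an image; a truncated or corrupted file can still fail when it is actually decoded.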

We can get around this by passing `on_error="null"` to both the download and the decode step. This causes Daft to log the error but proceed with execution, returning a `null` entry for any URL that cannot be downloaded or decoded into an image.
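Conceptually, this is the familiar "catch the failure and substitute a null" pattern applied to every row. The sketch below shows the idea in plain Python. It is not Daft's implementation, and `map_null_on_error`, `fake_download` and the example URLs are hypothetical stand-ins so that it runs without network access.

```python
import logging
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

logger = logging.getLogger("null_on_error_sketch")


def map_null_on_error(fn: Callable[[T], U], values: List[Optional[T]]) -> List[Optional[U]]:
    """Apply fn to each value; log failures and emit None instead of raising."""
    results: List[Optional[U]] = []
    for value in values:
        if value is None:
            # Nulls pass straight through, so steps can be chained safely.
            results.append(None)
            continue
        try:
            results.append(fn(value))
        except Exception as exc:
            logger.warning("Skipping %r: %s", value, exc)
            results.append(None)
    return results


# Hypothetical in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {"https://example.com/cat.jpg": b"\xff\xd8\xff fake jpeg bytes"}


def fake_download(url: str) -> bytes:
    if url not in FAKE_WEB:
        raise FileNotFoundError(f"404 Not Found: {url}")
    return FAKE_WEB[url]


urls = ["https://example.com/cat.jpg", "https://example.com/missing.jpg"]
downloaded = map_null_on_error(fake_download, urls)
print([d is not None for d in downloaded])  # [True, False]
```

In this sketch, letting nulls pass through unchanged is what allows a download step and a decode step to be chained: a row that failed to download simply stays null through the next step instead of raising a second error.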

In [5]:
df = df.with_column("image", df["raw_url"].url.download(on_error="null").image.decode(on_error="null"))

df.show()

url Utf8,text_list List[Utf8],similarity_matrix List[Float64],could_have_url_duplicate Int64,image_name Utf8,raw_url Utf8,matched_text_index Int64,matched_sim Float64,image Image[MIXED]
https://gizmodo.com/record-breaking-galaxy-is-so-big-it-acts-like-a-magnify-1614257959,"[NASA's Hubble Space Telescope has just found the most distant lensing galaxy (which are massive enough to act as their own intergalactic microscopes) ever., And thanks to a rare alignment, it might just give us a peak at how our very own galaxy formed all those billions of years ago., When you look more than 9 billion years ago in the early universe, you don't expect to find this type of galaxy lensing at all., It's very difficult to see an alignment between two galaxies in the early universe., Imagine holding a magnifying glass close to you and then moving it much farther away., When you look through a magnifying glass held at arm's length, the chances that you will see an enlarged object are high., But if you move the magnifying glass across the room, your chances of seeing the magnifying glass nearly perfectly aligned with another object beyond it diminishes., Because we've stumbled upon this chance alignment, though, we're able to use the lensing galaxies distorting effects to determine its total mass (including dark matter) by ""gauging the intensity of its lensing effects on the background galaxy's light."", So how much does a record-breaking lensing galaxy weigh?, Over 180 billion times more than our sun.]","[0.2267199158668518, 0.18886339664459229, 0.2548035979270935, 0.2684311270713806, 0.19809527695178986, 0.15881140530109406, 0.14154167473316193, 0.24295282363891602, 0.3148163855075836, 0.19724470376968384]",0,f5f8113da82f.jpg,"https://i.kinja-img.com/gawker-media/image/upload/s--3J8F-YDp--/c_fit,f_auto,fl_progressive,q_80,w_320/836357908648846882.jpg",8,0.3148163855075836,
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.13888245820999146, 0.15384167432785034, 0.1953807771205902, 0.16709916293621063, 0.18045303225517273, 0.1648285835981369, 0.16796952486038208, 0.17308980226516724, 0.19239084422588348, 0.17934882640838623, 0.19220402836799622, 0.1915411800146103, 0.14698170125484467, 0.18065980076789856, 0.17008471488952637, 0.1634514331817627, 0.18184798955917358]",0,e39fe8050561.png,http://visionplatforminc.com/wp-content/uploads/2018/03/VP-keep-calm-and-stop-telling-lies-500x383.png,2,0.1953807771205902,
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.24126467108726501, 0.22723841667175293, 0.1192258670926094, 0.24206474423408508, 0.2564660608768463, 0.2161899358034134, 0.11806226521730423, 0.1627596616744995, 0.09863521158695221, 0.20635978877544403, 0.29334038496017456, 0.208466038107872, 0.21782270073890686, 0.22355303168296814, 0.16166360676288605, 0.13091100752353668, 0.14293897151947021]",0,42532abf37cf.png,http://visionplatforminc.com/wp-content/uploads/2018/06/comparison-chart-1200x676.png,10,0.2933403849601745,
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.22902636229991913, 0.1869916319847107, 0.10615022480487823, 0.19604836404323578, 0.24055539071559906, 0.19517618417739868, 0.14319950342178345, 0.23251116275787354, 0.10010193288326263, 0.22138026356697083, 0.1989588886499405, 0.21306291222572327, 0.14781776070594788, 0.21081317961215973, 0.24410296976566315, 0.1561538279056549, 0.1869804859161377]",0,6e3961dc4767.jpg,http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg,14,0.2441029697656631,
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","[0.28204163908958435, 0.15614458918571472, 0.174833744764328, 0.2311287373304367, 0.26510486006736755]",0,ce9f0026d021.png,https://www.printworxuk.com/wp-content/uploads/2017/06/roller-banner.png,0,0.2820416390895843,
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","[0.2898246943950653, 0.16946707665920258, 0.18884825706481934, 0.21569928526878357, 0.31874263286590576]",0,a41d8c7a197f.png,https://www.printworxuk.com/wp-content/uploads/2015/07/banner-287x300.png,4,0.3187426328659057,
https://xplorer.com.ng/2019/04/01/10-most-common-android-problems/,"[In this article, I have picked the most common 10 Android problems along with its simplest and most effective solutions., Battery drain is one of the most common problems in an Android smartphone., This is the price we have to pay for having the smartest devices in the world., While there are many reasons behind the battery drain issue, one of the best ways to solve this issue is to enable battery saving mode and reduce your Android smartphone brightness., To resolve the problem of battery drain, you can use DU Battery Saver & Fast Charge., Also, DU Battery Saver helps you to charge your battery fast by removing the apps running in the background of your Android device., Open your Settings menu, click on the location and enable battery saving mode., Always use low brightness in an Android smartphone., Many a times, we face problems connecting to the internet in spite of the Android smartphone being connected to the Wi-Fi., This may solve the problem., Just go to Wi-Fi > Settings > Menu > Advanced and choose to stay connected to Wi-Fi during sleep., Alternatively, you can restart your Android smartphone and Wi-Fi router or enable Airplane mode for atleast a minute., Then, again try to connect with your Android smartphone to Wi-Fi., Several users face syncing problem with Google server in an Android smartphone for many reasons., However, the solution is simple., Firstly, check if you have recently changed your account password., If yes, then you need to update it with the new password., If the problem still persists, then enable Airplane mode for 30 seconds and try again., Alternatively, you can do remove your Google account and add it again., By default, keyboard of the Android smartphone gets hung., If you are facing problems such as Keyboard taking too long to respond, or stopped responding, then it is suggested that you download Google Keyboard, as it is one of the most popular Google keyboard 
apps., You can make it as your default Keyboard app and can improve your typing experience too., Your smartphone screen automatically turned off when you plug your smartphone with charger., To recover this, you need to go to Settings / Applications / Development and tick the ‘Stay awake’ option to keep the screen on when charging., If you use laptop or Wi-Fi network, you can try Airdroid app for wirelessly transferring data in the Android device., Airdroid is one of the best apps for transfer data in PC to mobile and mobile to PC., Sometimes, the USB or USB port in PC doesn’t support each other, which is why you face these type of problem., Google Play Store is the main hub for downloading apps in Android devices., For instance, if you are getting error like not downloading an app from the Google Play Store, then your smartphone may have a serious problem., When you are unable to download Android apps, you can’t do anything in android mobile., Just follow some simple methods to sort out these errors., • You need to remove and add your Google Account., Removing it and adding your account can solve your problems within minutes., • You can clear the data and Cache of Google Play Store and Services., This will fix the problems., The main reason for such an error in an Android smartphone could be due to the device not supporting that particular game., Always ensure that the game you want to download is compatible with your Android OS versions., Another reason for the games not running is because of insufficient RAM space in Android smartphone., Use Clean Master for boosting games in Android mobile., If you are game lover then you can go through our list and download Best Free Car Racing Games for Android Mobile., Android provides limited storage for apps and we don’t have the authority to expand it., So, if you are running Android that frequently shows up insufficient space error, then there is no fix., You just need to install the app CCleaner in your Android device 
that will help you free up some storage.]","[0.16432377696037292, 0.13561537861824036, 0.1343795508146286, 0.16658496856689453, 0.17377106845378876, 0.14610256254673004, 0.15115123987197876, 0.10028307884931564, 0.17679670453071594, 0.14612779021263123, 0.14604255557060242, 0.15062597393989563, 0.1425563097000122, 0.15112616121768951, 0.1644369661808014, 0.1484353095293045, 0.13976217806339264, 0.15337888896465302, 0.14418822526931763, 0.10579082369804382, 0.1658637821674347, 0.12310595065355301, 0.10974544286727905, 0.07112862169742584, 0.1094629243016243, 0.1460697054862976, 0.12781348824501038, 0.1191161498427391, 0.17566411197185516, 0.14520303905010223, 0.17649254202842712, 0.17334702610969543, 0.1710623949766159, 0.15035775303840637, 0.15698276460170746, 0.1412508189678192, 0.1312795877456665, 0.12560468912124634, 0.14663833379745483, 0.10943581163883209, 0.1346634328365326, 0.11937364935874939, 0.15087033808231354]",0,2b05a3b4c88e.jpg,https://xplorernet.files.wordpress.com/2019/03/img_20190302_181629_438.jpg,8,0.1767967045307159,
http://www.rejcanaynay.com/2017/03/thailand-on-our-way-and-where-we-stayed.html,"[I can't believe it's already been a year since my brother and I went to Thailand., We visited this country last Holy week (it's the only holiday we can leave our business) and now here we are preparing for another trip abroad in the next few weeks., And before our next big adventure come here's what happened on our first trip abroad together that happened 25th of March 2016., It is one heck of an adventure, definitely!, Ever since I met my Thai friends when I was in the US I promised that I will visit them when I got the chance., So when my brother and I decided to finally travel together I suggested Thailand so I could meet my friends that I haven't seen for years., Actually, my brother prefers being a solo traveller but on one of our daily conversations, I said that I want to go abroad as well (parents won't allow me going solo) so we thought why not travel together and visit one country a year every Holy week., Our whole trip was secretly prepared and arranged by my brother who didn't even bother to tell me how much our plane tickets and hotel cost because according to him, he doesn't like to hear me ranting about the amount of money he's going to spend., lol!, Anyway, he booked our flight early in the morning so we could make the most of our first day exploring the city., To be honest, we did not plan our whole trip in Bangkok., No itinerary or even a portable wifi to help us out., Well, we know the tourist spots, the places to visit and even the names of the temples and malls but we don't know how to get there., So when we arrived at Suvarnabhumi Airport we don't know what to do and where to go., Thankfully, their's wifi in the area and the airport staff are very informative as well as their airport signs., We could actually call our hotel for a pick-up service or take a cab but we decided to take the train instead because when we check the location of our place, it is near the 
train station and will only take about 5-minute walking., From the airport rail link station located on the basement level of Suvarnabhumi airport, we purchased the City Link Train ticket (blue line), went to the train and get off at Ratchaprarop station and walk our way to Baiyoke Sky Hotel., We stayed in Baiyoke Sky Hotel, the Thailand's tallest hotel with 88-storeys above Bangkok's skyline located in downtown area of Pathumwan., It is surrounded by various shopping centers like the famous Pratunam Mall which is right in front of the door of the hotel and other malls that are only within few minutes of walking., The location of the hotel might be one heck busy street with lots of tourist and local vendors around but it is surely a very easy access to train station and shopping malls which we totally love, right?!, As the tallest hotel in Thailand, Baiyoke surely has one of the best views you could see the city and as soon as you enter the lobby you can already start seeing scenic view since it is located on the 18th floor of the hotel., I didn't get the chance to explore the hotel because we are mostly out to see the city but I totally appreciate the size of our hotel room., With 2 queen size bed, a huge bathroom and a big space to put all your souvenirs and pasalubongs, it is a perfect size!, Tho my brother felt it was a little too plain because it's too large without so much in it., If you think about it, from the size of our room it can pass as an apartment., Hahaha!, Anyway, they have the best buffet breakfast ever with lots of food to choose from., Sadly, like what I said earlier, I wasn't able to fully explore the hotel due to our limited time but if you're planning to visit Thailand anytime soon I do recommend this staying here., After settling our luggage in our hotel room, refresh a little bit from the long flight and rest for an hour or two, we went straight back to the city and started our tour., Let the next blog post tell where we went and what we 
did in Bangkok, Thailand., The hotel room looks so nice and cozy!, It is!, And very spacious too., Hehehe!, I love how your hotel room is so quirky!, It's too bad you didn't explore more of the hotel, the interior and structure looks amazing., Anyway, taking the train is very brave of you., Given the same circumstances, I'd be too scared and I'd probably just take a cab instead., Haha., I'm looking forward to reading more of your Thailand travel posts., Yea, maybe the nextime we visit I'll make the most of it!, Our hotel I mean., Hehehe!, It's actually my brother who lead me everywhere!, Hahaha!, I also wanted to go to Thailand this year if time permits!, Ang ganda lang talaga!, I really love your shots!, Ang sarap mag travel super!, Nice post babe!, Hopefully, you could visit my blog as well!, Have a great day!, Girl, did your brother treat you?, I'm estimating the hotel room to at least be 5K per night!, Hahahaha., Sana ako din makapagstay sa ganyan., Most hotel/hostel stays ko have rooms at most half of yours., Can't wait to read about your other travels!]","[0.16788014769554138, 0.16807110607624054, 0.17660807073116302, 0.16046054661273956, 0.1484687775373459, 0.14830976724624634, 0.15153062343597412, 0.18192975223064423, 0.09780848771333694, 0.20111289620399475, 0.1987701952457428, 0.1798335313796997, 0.13334181904792786, 0.17085158824920654, 0.1297556757926941, 0.18738098442554474, 0.2095731496810913, 0.20506754517555237, 0.2226984053850174, 0.19385895133018494, 0.18633514642715454, 0.2393738180398941, 0.2789805829524994, 0.17799630761146545, 0.20788730680942535, 0.1342388093471527, 0.1542152464389801, 0.19843629002571106, 0.1945418268442154, 0.19845902919769287, 0.25169897079467773, 0.1351592242717743, 0.20035339891910553, 0.09662342816591263, 0.20172712206840515, 0.22258269786834717, 0.07487604022026062, 0.10534877330064774, 0.13552089035511017, 0.17826369404792786, 0.17579905688762665, 0.195256307721138, 0.09662342816591263, 0.13869988918304443, 
0.1342388093471527, 0.17038212716579437, 0.12993203103542328, 0.1502522975206375, 0.17649193108081818, 0.1499987095594406, 0.1864454299211502, 0.1348588615655899, 0.10964937508106232, 0.2373974323272705, 0.12805549800395966, 0.11371669173240662, 0.21239037811756134, 0.1517273485660553]",0,e7e97bef7dac.jpg,https://4.bp.blogspot.com/-2PnFWNn2-sU/WNXZ0dIMj4I/AAAAAAAAA6w/rRm8jvmiiF8iJOJNoU25EAVN1arsUOTngCLcB/s1600/IMG_5296.jpg,22,0.2789805829524994,


## Data processing/metadata extraction

In [6]:
# Flag rows whose image is null, i.e. the download or decode failed
df = df.with_column("image_invalid", df["image"].is_null())

# Create a 64x64 thumbnail of the image
df = df.with_column("thumbnail", df["image"].image.resize(64, 64))

# Extract shape of image
# df = df.with_column("image_width", df["image"].image.width())
# df = df.with_column("image_height", df["image"].image.height())
# df = df.with_column("image_channels", df["image"].image.channels())
# df = df.with_column("image_mode", df["image"].image.mode())

# Normalize the image into a hardcoded mode so that everything is standardized
# df = df.with_column("image", df["image"].image.to_mode("RGB"))

df.show()

url Utf8,text_list List[Utf8],similarity_matrix List[Float64],could_have_url_duplicate Int64,image_name Utf8,raw_url Utf8,matched_text_index Int64,matched_sim Float64,image Image[MIXED],image_invalid Boolean,thumbnail Image[MIXED]
https://gizmodo.com/record-breaking-galaxy-is-so-big-it-acts-like-a-magnify-1614257959,"[NASA's Hubble Space Telescope has just found the most distant lensing galaxy (which are massive enough to act as their own intergalactic microscopes) ever., And thanks to a rare alignment, it might just give us a peak at how our very own galaxy formed all those billions of years ago., When you look more than 9 billion years ago in the early universe, you don't expect to find this type of galaxy lensing at all., It's very difficult to see an alignment between two galaxies in the early universe., Imagine holding a magnifying glass close to you and then moving it much farther away., When you look through a magnifying glass held at arm's length, the chances that you will see an enlarged object are high., But if you move the magnifying glass across the room, your chances of seeing the magnifying glass nearly perfectly aligned with another object beyond it diminishes., Because we've stumbled upon this chance alignment, though, we're able to use the lensing galaxies distorting effects to determine its total mass (including dark matter) by ""gauging the intensity of its lensing effects on the background galaxy's light."", So how much does a record-breaking lensing galaxy weigh?, Over 180 billion times more than our sun.]","[0.2267199158668518, 0.18886339664459229, 0.2548035979270935, 0.2684311270713806, 0.19809527695178986, 0.15881140530109406, 0.14154167473316193, 0.24295282363891602, 0.3148163855075836, 0.19724470376968384]",0,f5f8113da82f.jpg,"https://i.kinja-img.com/gawker-media/image/upload/s--3J8F-YDp--/c_fit,f_auto,fl_progressive,q_80,w_320/836357908648846882.jpg",8,0.3148163855075836,,False,
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.13888245820999146, 0.15384167432785034, 0.1953807771205902, 0.16709916293621063, 0.18045303225517273, 0.1648285835981369, 0.16796952486038208, 0.17308980226516724, 0.19239084422588348, 0.17934882640838623, 0.19220402836799622, 0.1915411800146103, 0.14698170125484467, 0.18065980076789856, 0.17008471488952637, 0.1634514331817627, 0.18184798955917358]",0,e39fe8050561.png,http://visionplatforminc.com/wp-content/uploads/2018/03/VP-keep-calm-and-stop-telling-lies-500x383.png,2,0.1953807771205902,,True,
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.24126467108726501, 0.22723841667175293, 0.1192258670926094, 0.24206474423408508, 0.2564660608768463, 0.2161899358034134, 0.11806226521730423, 0.1627596616744995, 0.09863521158695221, 0.20635978877544403, 0.29334038496017456, 0.208466038107872, 0.21782270073890686, 0.22355303168296814, 0.16166360676288605, 0.13091100752353668, 0.14293897151947021]",0,42532abf37cf.png,http://visionplatforminc.com/wp-content/uploads/2018/06/comparison-chart-1200x676.png,10,0.2933403849601745,,True,
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.22902636229991913, 0.1869916319847107, 0.10615022480487823, 0.19604836404323578, 0.24055539071559906, 0.19517618417739868, 0.14319950342178345, 0.23251116275787354, 0.10010193288326263, 0.22138026356697083, 0.1989588886499405, 0.21306291222572327, 0.14781776070594788, 0.21081317961215973, 0.24410296976566315, 0.1561538279056549, 0.1869804859161377]",0,6e3961dc4767.jpg,http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg,14,0.2441029697656631,,True,
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","[0.28204163908958435, 0.15614458918571472, 0.174833744764328, 0.2311287373304367, 0.26510486006736755]",0,ce9f0026d021.png,https://www.printworxuk.com/wp-content/uploads/2017/06/roller-banner.png,0,0.2820416390895843,,True,
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","[0.2898246943950653, 0.16946707665920258, 0.18884825706481934, 0.21569928526878357, 0.31874263286590576]",0,a41d8c7a197f.png,https://www.printworxuk.com/wp-content/uploads/2015/07/banner-287x300.png,4,0.3187426328659057,,True,
https://xplorer.com.ng/2019/04/01/10-most-common-android-problems/,"[In this article, I have picked the most common 10 Android problems along with its simplest and most effective solutions., Battery drain is one of the most common problems in an Android smartphone., This is the price we have to pay for having the smartest devices in the world., While there are many reasons behind the battery drain issue, one of the best ways to solve this issue is to enable battery saving mode and reduce your Android smartphone brightness., To resolve the problem of battery drain, you can use DU Battery Saver & Fast Charge., Also, DU Battery Saver helps you to charge your battery fast by removing the apps running in the background of your Android device., Open your Settings menu, click on the location and enable battery saving mode., Always use low brightness in an Android smartphone., Many a times, we face problems connecting to the internet in spite of the Android smartphone being connected to the Wi-Fi., This may solve the problem., Just go to Wi-Fi > Settings > Menu > Advanced and choose to stay connected to Wi-Fi during sleep., Alternatively, you can restart your Android smartphone and Wi-Fi router or enable Airplane mode for atleast a minute., Then, again try to connect with your Android smartphone to Wi-Fi., Several users face syncing problem with Google server in an Android smartphone for many reasons., However, the solution is simple., Firstly, check if you have recently changed your account password., If yes, then you need to update it with the new password., If the problem still persists, then enable Airplane mode for 30 seconds and try again., Alternatively, you can do remove your Google account and add it again., By default, keyboard of the Android smartphone gets hung., If you are facing problems such as Keyboard taking too long to respond, or stopped responding, then it is suggested that you download Google Keyboard, as it is one of the most popular Google keyboard 
apps., You can make it as your default Keyboard app and can improve your typing experience too., Your smartphone screen automatically turned off when you plug your smartphone with charger., To recover this, you need to go to Settings / Applications / Development and tick the ‘Stay awake’ option to keep the screen on when charging., If you use laptop or Wi-Fi network, you can try Airdroid app for wirelessly transferring data in the Android device., Airdroid is one of the best apps for transfer data in PC to mobile and mobile to PC., Sometimes, the USB or USB port in PC doesn’t support each other, which is why you face these type of problem., Google Play Store is the main hub for downloading apps in Android devices., For instance, if you are getting error like not downloading an app from the Google Play Store, then your smartphone may have a serious problem., When you are unable to download Android apps, you can’t do anything in android mobile., Just follow some simple methods to sort out these errors., • You need to remove and add your Google Account., Removing it and adding your account can solve your problems within minutes., • You can clear the data and Cache of Google Play Store and Services., This will fix the problems., The main reason for such an error in an Android smartphone could be due to the device not supporting that particular game., Always ensure that the game you want to download is compatible with your Android OS versions., Another reason for the games not running is because of insufficient RAM space in Android smartphone., Use Clean Master for boosting games in Android mobile., If you are game lover then you can go through our list and download Best Free Car Racing Games for Android Mobile., Android provides limited storage for apps and we don’t have the authority to expand it., So, if you are running Android that frequently shows up insufficient space error, then there is no fix., You just need to install the app CCleaner in your Android device 
that will help you free up some storage.]","[0.16432377696037292, 0.13561537861824036, 0.1343795508146286, 0.16658496856689453, 0.17377106845378876, 0.14610256254673004, 0.15115123987197876, 0.10028307884931564, 0.17679670453071594, 0.14612779021263123, 0.14604255557060242, 0.15062597393989563, 0.1425563097000122, 0.15112616121768951, 0.1644369661808014, 0.1484353095293045, 0.13976217806339264, 0.15337888896465302, 0.14418822526931763, 0.10579082369804382, 0.1658637821674347, 0.12310595065355301, 0.10974544286727905, 0.07112862169742584, 0.1094629243016243, 0.1460697054862976, 0.12781348824501038, 0.1191161498427391, 0.17566411197185516, 0.14520303905010223, 0.17649254202842712, 0.17334702610969543, 0.1710623949766159, 0.15035775303840637, 0.15698276460170746, 0.1412508189678192, 0.1312795877456665, 0.12560468912124634, 0.14663833379745483, 0.10943581163883209, 0.1346634328365326, 0.11937364935874939, 0.15087033808231354]",0,2b05a3b4c88e.jpg,https://xplorernet.files.wordpress.com/2019/03/img_20190302_181629_438.jpg,8,0.1767967045307159,,False,
http://www.rejcanaynay.com/2017/03/thailand-on-our-way-and-where-we-stayed.html,"[I can't believe it's already been a year since my brother and I went to Thailand., We visited this country last Holy week (it's the only holiday we can leave our business) and now here we are preparing for another trip abroad in the next few weeks., And before our next big adventure come here's what happened on our first trip abroad together that happened 25th of March 2016., It is one heck of an adventure, definitely!, Ever since I met my Thai friends when I was in the US I promised that I will visit them when I got the chance., So when my brother and I decided to finally travel together I suggested Thailand so I could meet my friends that I haven't seen for years., Actually, my brother prefers being a solo traveller but on one of our daily conversations, I said that I want to go abroad as well (parents won't allow me going solo) so we thought why not travel together and visit one country a year every Holy week., Our whole trip was secretly prepared and arranged by my brother who didn't even bother to tell me how much our plane tickets and hotel cost because according to him, he doesn't like to hear me ranting about the amount of money he's going to spend., lol!, Anyway, he booked our flight early in the morning so we could make the most of our first day exploring the city., To be honest, we did not plan our whole trip in Bangkok., No itinerary or even a portable wifi to help us out., Well, we know the tourist spots, the places to visit and even the names of the temples and malls but we don't know how to get there., So when we arrived at Suvarnabhumi Airport we don't know what to do and where to go., Thankfully, their's wifi in the area and the airport staff are very informative as well as their airport signs., We could actually call our hotel for a pick-up service or take a cab but we decided to take the train instead because when we check the location of our place, it is near the 
train station and will only take about 5-minute walking., From the airport rail link station located on the basement level of Suvarnabhumi airport, we purchased the City Link Train ticket (blue line), went to the train and get off at Ratchaprarop station and walk our way to Baiyoke Sky Hotel., We stayed in Baiyoke Sky Hotel, the Thailand's tallest hotel with 88-storeys above Bangkok's skyline located in downtown area of Pathumwan., It is surrounded by various shopping centers like the famous Pratunam Mall which is right in front of the door of the hotel and other malls that are only within few minutes of walking., The location of the hotel might be one heck busy street with lots of tourist and local vendors around but it is surely a very easy access to train station and shopping malls which we totally love, right?!, As the tallest hotel in Thailand, Baiyoke surely has one of the best views you could see the city and as soon as you enter the lobby you can already start seeing scenic view since it is located on the 18th floor of the hotel., I didn't get the chance to explore the hotel because we are mostly out to see the city but I totally appreciate the size of our hotel room., With 2 queen size bed, a huge bathroom and a big space to put all your souvenirs and pasalubongs, it is a perfect size!, Tho my brother felt it was a little too plain because it's too large without so much in it., If you think about it, from the size of our room it can pass as an apartment., Hahaha!, Anyway, they have the best buffet breakfast ever with lots of food to choose from., Sadly, like what I said earlier, I wasn't able to fully explore the hotel due to our limited time but if you're planning to visit Thailand anytime soon I do recommend this staying here., After settling our luggage in our hotel room, refresh a little bit from the long flight and rest for an hour or two, we went straight back to the city and started our tour., Let the next blog post tell where we went and what we 
did in Bangkok, Thailand., The hotel room looks so nice and cozy!, It is!, And very spacious too., Hehehe!, I love how your hotel room is so quirky!, It's too bad you didn't explore more of the hotel, the interior and structure looks amazing., Anyway, taking the train is very brave of you., Given the same circumstances, I'd be too scared and I'd probably just take a cab instead., Haha., I'm looking forward to reading more of your Thailand travel posts., Yea, maybe the nextime we visit I'll make the most of it!, Our hotel I mean., Hehehe!, It's actually my brother who lead me everywhere!, Hahaha!, I also wanted to go to Thailand this year if time permits!, Ang ganda lang talaga!, I really love your shots!, Ang sarap mag travel super!, Nice post babe!, Hopefully, you could visit my blog as well!, Have a great day!, Girl, did your brother treat you?, I'm estimating the hotel room to at least be 5K per night!, Hahahaha., Sana ako din makapagstay sa ganyan., Most hotel/hostel stays ko have rooms at most half of yours., Can't wait to read about your other travels!]","[0.16788014769554138, 0.16807110607624054, 0.17660807073116302, 0.16046054661273956, 0.1484687775373459, 0.14830976724624634, 0.15153062343597412, 0.18192975223064423, 0.09780848771333694, 0.20111289620399475, 0.1987701952457428, 0.1798335313796997, 0.13334181904792786, 0.17085158824920654, 0.1297556757926941, 0.18738098442554474, 0.2095731496810913, 0.20506754517555237, 0.2226984053850174, 0.19385895133018494, 0.18633514642715454, 0.2393738180398941, 0.2789805829524994, 0.17799630761146545, 0.20788730680942535, 0.1342388093471527, 0.1542152464389801, 0.19843629002571106, 0.1945418268442154, 0.19845902919769287, 0.25169897079467773, 0.1351592242717743, 0.20035339891910553, 0.09662342816591263, 0.20172712206840515, 0.22258269786834717, 0.07487604022026062, 0.10534877330064774, 0.13552089035511017, 0.17826369404792786, 0.17579905688762665, 0.195256307721138, 0.09662342816591263, 0.13869988918304443, 
0.1342388093471527, 0.17038212716579437, 0.12993203103542328, 0.1502522975206375, 0.17649193108081818, 0.1499987095594406, 0.1864454299211502, 0.1348588615655899, 0.10964937508106232, 0.2373974323272705, 0.12805549800395966, 0.11371669173240662, 0.21239037811756134, 0.1517273485660553]",0,e7e97bef7dac.jpg,https://4.bp.blogspot.com/-2PnFWNn2-sU/WNXZ0dIMj4I/AAAAAAAAA6w/rRm8jvmiiF8iJOJNoU25EAVN1arsUOTngCLcB/s1600/IMG_5296.jpg,22,0.2789805829524994,,False,


In [7]:
# Encode each image as JPEG and upload it; the resulting URL is stored in a new "image_uri_s3" column.
# The original "image" column is still present after this cell; drop it before writing Parquet to avoid saving the image data in the file.
df = df.with_column("image_uri_s3", df["image"].image.encode("jpeg").url.upload("~/code/multimodal-data-warehouse/data/images/"))

df.show()

url Utf8,text_list List[Utf8],similarity_matrix List[Float64],could_have_url_duplicate Int64,image_name Utf8,raw_url Utf8,matched_text_index Int64,matched_sim Float64,image Image[MIXED],image_invalid Boolean,thumbnail Image[MIXED],image_uri_s3 Utf8
https://gizmodo.com/record-breaking-galaxy-is-so-big-it-acts-like-a-magnify-1614257959,"[NASA's Hubble Space Telescope has just found the most distant lensing galaxy (which are massive enough to act as their own intergalactic microscopes) ever., And thanks to a rare alignment, it might just give us a peak at how our very own galaxy formed all those billions of years ago., When you look more than 9 billion years ago in the early universe, you don't expect to find this type of galaxy lensing at all., It's very difficult to see an alignment between two galaxies in the early universe., Imagine holding a magnifying glass close to you and then moving it much farther away., When you look through a magnifying glass held at arm's length, the chances that you will see an enlarged object are high., But if you move the magnifying glass across the room, your chances of seeing the magnifying glass nearly perfectly aligned with another object beyond it diminishes., Because we've stumbled upon this chance alignment, though, we're able to use the lensing galaxies distorting effects to determine its total mass (including dark matter) by ""gauging the intensity of its lensing effects on the background galaxy's light."", So how much does a record-breaking lensing galaxy weigh?, Over 180 billion times more than our sun.]","[0.2267199158668518, 0.18886339664459229, 0.2548035979270935, 0.2684311270713806, 0.19809527695178986, 0.15881140530109406, 0.14154167473316193, 0.24295282363891602, 0.3148163855075836, 0.19724470376968384]",0,f5f8113da82f.jpg,"https://i.kinja-img.com/gawker-media/image/upload/s--3J8F-YDp--/c_fit,f_auto,fl_progressive,q_80,w_320/836357908648846882.jpg",8,0.3148163855075836,,False,,file:///Users/jaychia/code/multimodal-data-warehouse/data/images/9379a497-bfc8-40b0-9430-972f76e800d5
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.13888245820999146, 0.15384167432785034, 0.1953807771205902, 0.16709916293621063, 0.18045303225517273, 0.1648285835981369, 0.16796952486038208, 0.17308980226516724, 0.19239084422588348, 0.17934882640838623, 0.19220402836799622, 0.1915411800146103, 0.14698170125484467, 0.18065980076789856, 0.17008471488952637, 0.1634514331817627, 0.18184798955917358]",0,e39fe8050561.png,http://visionplatforminc.com/wp-content/uploads/2018/03/VP-keep-calm-and-stop-telling-lies-500x383.png,2,0.1953807771205902,,True,,
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.24126467108726501, 0.22723841667175293, 0.1192258670926094, 0.24206474423408508, 0.2564660608768463, 0.2161899358034134, 0.11806226521730423, 0.1627596616744995, 0.09863521158695221, 0.20635978877544403, 0.29334038496017456, 0.208466038107872, 0.21782270073890686, 0.22355303168296814, 0.16166360676288605, 0.13091100752353668, 0.14293897151947021]",0,42532abf37cf.png,http://visionplatforminc.com/wp-content/uploads/2018/06/comparison-chart-1200x676.png,10,0.2933403849601745,,True,,
http://visionplatforminc.com/dynamic-platform-vs-traditional-business-strategy/,"[Our dynamic platform harmonizes your leadership team around a clear vision in one day and then builds momentum long-term, ensuring that vision becomes a reality., It could be called a “Strategy Platform”., But let’s not., Our approach is so radically different from the disappointing image “strategy” can summon in the minds of business leaders, we don’t like to use the word., Unlike a traditional approach to strategy, our dynamic platform is an online, adaptive, affordable, collaborative method that guards against faulty conclusions or dust-gathering strategy documents because no one knows quite what to do with them., Don’t settle for the traditional approach of just getting people on the same page, offer them a framework for smart improvisation., Wondering if it’s actually possible?, Over a hundred organizations have found it highly beneficial., People lie once every ten minutes., What is this costing your company?, What’s replacing traditional strategic planning?, What does the empowered buyer demand?, A short history of why we need to know how we are unique as an organization., Why do so many business initiatives fail?, Do you have any questions about our business organization and management consulting methods or software?, We're happy to answer them., And what we can do for you.]","[0.22902636229991913, 0.1869916319847107, 0.10615022480487823, 0.19604836404323578, 0.24055539071559906, 0.19517618417739868, 0.14319950342178345, 0.23251116275787354, 0.10010193288326263, 0.22138026356697083, 0.1989588886499405, 0.21306291222572327, 0.14781776070594788, 0.21081317961215973, 0.24410296976566315, 0.1561538279056549, 0.1869804859161377]",0,6e3961dc4767.jpg,http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg,14,0.2441029697656631,,True,,
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","[0.28204163908958435, 0.15614458918571472, 0.174833744764328, 0.2311287373304367, 0.26510486006736755]",0,ce9f0026d021.png,https://www.printworxuk.com/wp-content/uploads/2017/06/roller-banner.png,0,0.2820416390895843,,True,,
https://www.printworxuk.com/product/roller-banner/,"[Perfect for exhibitions, displays, or advertising in small spaces., Do You Have Your Artwork?, Please upload your artwork as a pdf print ready file., Includes Hardware, printed banner, and carry case., Available with digitally printed ‘premium’* 165 micron grey-backed polyester film banners.]","[0.2898246943950653, 0.16946707665920258, 0.18884825706481934, 0.21569928526878357, 0.31874263286590576]",0,a41d8c7a197f.png,https://www.printworxuk.com/wp-content/uploads/2015/07/banner-287x300.png,4,0.3187426328659057,,True,,
https://xplorer.com.ng/2019/04/01/10-most-common-android-problems/,"[In this article, I have picked the most common 10 Android problems along with its simplest and most effective solutions., Battery drain is one of the most common problems in an Android smartphone., This is the price we have to pay for having the smartest devices in the world., While there are many reasons behind the battery drain issue, one of the best ways to solve this issue is to enable battery saving mode and reduce your Android smartphone brightness., To resolve the problem of battery drain, you can use DU Battery Saver & Fast Charge., Also, DU Battery Saver helps you to charge your battery fast by removing the apps running in the background of your Android device., Open your Settings menu, click on the location and enable battery saving mode., Always use low brightness in an Android smartphone., Many a times, we face problems connecting to the internet in spite of the Android smartphone being connected to the Wi-Fi., This may solve the problem., Just go to Wi-Fi > Settings > Menu > Advanced and choose to stay connected to Wi-Fi during sleep., Alternatively, you can restart your Android smartphone and Wi-Fi router or enable Airplane mode for atleast a minute., Then, again try to connect with your Android smartphone to Wi-Fi., Several users face syncing problem with Google server in an Android smartphone for many reasons., However, the solution is simple., Firstly, check if you have recently changed your account password., If yes, then you need to update it with the new password., If the problem still persists, then enable Airplane mode for 30 seconds and try again., Alternatively, you can do remove your Google account and add it again., By default, keyboard of the Android smartphone gets hung., If you are facing problems such as Keyboard taking too long to respond, or stopped responding, then it is suggested that you download Google Keyboard, as it is one of the most popular Google keyboard 
apps., You can make it as your default Keyboard app and can improve your typing experience too., Your smartphone screen automatically turned off when you plug your smartphone with charger., To recover this, you need to go to Settings / Applications / Development and tick the ‘Stay awake’ option to keep the screen on when charging., If you use laptop or Wi-Fi network, you can try Airdroid app for wirelessly transferring data in the Android device., Airdroid is one of the best apps for transfer data in PC to mobile and mobile to PC., Sometimes, the USB or USB port in PC doesn’t support each other, which is why you face these type of problem., Google Play Store is the main hub for downloading apps in Android devices., For instance, if you are getting error like not downloading an app from the Google Play Store, then your smartphone may have a serious problem., When you are unable to download Android apps, you can’t do anything in android mobile., Just follow some simple methods to sort out these errors., • You need to remove and add your Google Account., Removing it and adding your account can solve your problems within minutes., • You can clear the data and Cache of Google Play Store and Services., This will fix the problems., The main reason for such an error in an Android smartphone could be due to the device not supporting that particular game., Always ensure that the game you want to download is compatible with your Android OS versions., Another reason for the games not running is because of insufficient RAM space in Android smartphone., Use Clean Master for boosting games in Android mobile., If you are game lover then you can go through our list and download Best Free Car Racing Games for Android Mobile., Android provides limited storage for apps and we don’t have the authority to expand it., So, if you are running Android that frequently shows up insufficient space error, then there is no fix., You just need to install the app CCleaner in your Android device 
that will help you free up some storage.]","[0.16432377696037292, 0.13561537861824036, 0.1343795508146286, 0.16658496856689453, 0.17377106845378876, 0.14610256254673004, 0.15115123987197876, 0.10028307884931564, 0.17679670453071594, 0.14612779021263123, 0.14604255557060242, 0.15062597393989563, 0.1425563097000122, 0.15112616121768951, 0.1644369661808014, 0.1484353095293045, 0.13976217806339264, 0.15337888896465302, 0.14418822526931763, 0.10579082369804382, 0.1658637821674347, 0.12310595065355301, 0.10974544286727905, 0.07112862169742584, 0.1094629243016243, 0.1460697054862976, 0.12781348824501038, 0.1191161498427391, 0.17566411197185516, 0.14520303905010223, 0.17649254202842712, 0.17334702610969543, 0.1710623949766159, 0.15035775303840637, 0.15698276460170746, 0.1412508189678192, 0.1312795877456665, 0.12560468912124634, 0.14663833379745483, 0.10943581163883209, 0.1346634328365326, 0.11937364935874939, 0.15087033808231354]",0,2b05a3b4c88e.jpg,https://xplorernet.files.wordpress.com/2019/03/img_20190302_181629_438.jpg,8,0.1767967045307159,,False,,file:///Users/jaychia/code/multimodal-data-warehouse/data/images/0c2a9811-a54e-49d7-a3f4-f1b6068d43c9
http://www.rejcanaynay.com/2017/03/thailand-on-our-way-and-where-we-stayed.html,"[I can't believe it's already been a year since my brother and I went to Thailand., We visited this country last Holy week (it's the only holiday we can leave our business) and now here we are preparing for another trip abroad in the next few weeks., And before our next big adventure come here's what happened on our first trip abroad together that happened 25th of March 2016., It is one heck of an adventure, definitely!, Ever since I met my Thai friends when I was in the US I promised that I will visit them when I got the chance., So when my brother and I decided to finally travel together I suggested Thailand so I could meet my friends that I haven't seen for years., Actually, my brother prefers being a solo traveller but on one of our daily conversations, I said that I want to go abroad as well (parents won't allow me going solo) so we thought why not travel together and visit one country a year every Holy week., Our whole trip was secretly prepared and arranged by my brother who didn't even bother to tell me how much our plane tickets and hotel cost because according to him, he doesn't like to hear me ranting about the amount of money he's going to spend., lol!, Anyway, he booked our flight early in the morning so we could make the most of our first day exploring the city., To be honest, we did not plan our whole trip in Bangkok., No itinerary or even a portable wifi to help us out., Well, we know the tourist spots, the places to visit and even the names of the temples and malls but we don't know how to get there., So when we arrived at Suvarnabhumi Airport we don't know what to do and where to go., Thankfully, their's wifi in the area and the airport staff are very informative as well as their airport signs., We could actually call our hotel for a pick-up service or take a cab but we decided to take the train instead because when we check the location of our place, it is near the 
train station and will only take about 5-minute walking., From the airport rail link station located on the basement level of Suvarnabhumi airport, we purchased the City Link Train ticket (blue line), went to the train and get off at Ratchaprarop station and walk our way to Baiyoke Sky Hotel., We stayed in Baiyoke Sky Hotel, the Thailand's tallest hotel with 88-storeys above Bangkok's skyline located in downtown area of Pathumwan., It is surrounded by various shopping centers like the famous Pratunam Mall which is right in front of the door of the hotel and other malls that are only within few minutes of walking., The location of the hotel might be one heck busy street with lots of tourist and local vendors around but it is surely a very easy access to train station and shopping malls which we totally love, right?!, As the tallest hotel in Thailand, Baiyoke surely has one of the best views you could see the city and as soon as you enter the lobby you can already start seeing scenic view since it is located on the 18th floor of the hotel., I didn't get the chance to explore the hotel because we are mostly out to see the city but I totally appreciate the size of our hotel room., With 2 queen size bed, a huge bathroom and a big space to put all your souvenirs and pasalubongs, it is a perfect size!, Tho my brother felt it was a little too plain because it's too large without so much in it., If you think about it, from the size of our room it can pass as an apartment., Hahaha!, Anyway, they have the best buffet breakfast ever with lots of food to choose from., Sadly, like what I said earlier, I wasn't able to fully explore the hotel due to our limited time but if you're planning to visit Thailand anytime soon I do recommend this staying here., After settling our luggage in our hotel room, refresh a little bit from the long flight and rest for an hour or two, we went straight back to the city and started our tour., Let the next blog post tell where we went and what we 
did in Bangkok, Thailand., The hotel room looks so nice and cozy!, It is!, And very spacious too., Hehehe!, I love how your hotel room is so quirky!, It's too bad you didn't explore more of the hotel, the interior and structure looks amazing., Anyway, taking the train is very brave of you., Given the same circumstances, I'd be too scared and I'd probably just take a cab instead., Haha., I'm looking forward to reading more of your Thailand travel posts., Yea, maybe the nextime we visit I'll make the most of it!, Our hotel I mean., Hehehe!, It's actually my brother who lead me everywhere!, Hahaha!, I also wanted to go to Thailand this year if time permits!, Ang ganda lang talaga!, I really love your shots!, Ang sarap mag travel super!, Nice post babe!, Hopefully, you could visit my blog as well!, Have a great day!, Girl, did your brother treat you?, I'm estimating the hotel room to at least be 5K per night!, Hahahaha., Sana ako din makapagstay sa ganyan., Most hotel/hostel stays ko have rooms at most half of yours., Can't wait to read about your other travels!]","[0.16788014769554138, 0.16807110607624054, 0.17660807073116302, 0.16046054661273956, 0.1484687775373459, 0.14830976724624634, 0.15153062343597412, 0.18192975223064423, 0.09780848771333694, 0.20111289620399475, 0.1987701952457428, 0.1798335313796997, 0.13334181904792786, 0.17085158824920654, 0.1297556757926941, 0.18738098442554474, 0.2095731496810913, 0.20506754517555237, 0.2226984053850174, 0.19385895133018494, 0.18633514642715454, 0.2393738180398941, 0.2789805829524994, 0.17799630761146545, 0.20788730680942535, 0.1342388093471527, 0.1542152464389801, 0.19843629002571106, 0.1945418268442154, 0.19845902919769287, 0.25169897079467773, 0.1351592242717743, 0.20035339891910553, 0.09662342816591263, 0.20172712206840515, 0.22258269786834717, 0.07487604022026062, 0.10534877330064774, 0.13552089035511017, 0.17826369404792786, 0.17579905688762665, 0.195256307721138, 0.09662342816591263, 0.13869988918304443, 
0.1342388093471527, 0.17038212716579437, 0.12993203103542328, 0.1502522975206375, 0.17649193108081818, 0.1499987095594406, 0.1864454299211502, 0.1348588615655899, 0.10964937508106232, 0.2373974323272705, 0.12805549800395966, 0.11371669173240662, 0.21239037811756134, 0.1517273485660553]",0,e7e97bef7dac.jpg,https://4.bp.blogspot.com/-2PnFWNn2-sU/WNXZ0dIMj4I/AAAAAAAAA6w/rRm8jvmiiF8iJOJNoU25EAVN1arsUOTngCLcB/s1600/IMG_5296.jpg,22,0.2789805829524994,,False,,file:///Users/jaychia/code/multimodal-data-warehouse/data/images/ac7d0a6d-6b6e-4d8e-8af8-5ff58d1de886


In [11]:
# Drop the large image columns so they are not written to the Parquet files
df = df.exclude("image", "processed_image")

In [12]:
# For the sake of demonstration, limit to 100 rows
df = df.limit(100)

# Execute the entire dataframe, and write the results to a Parquet file!
df.write_parquet("data/tables/mmc4_v1.1.parquet")

ScanWithTask-MapPartition-LocalLimit [Stage:18]:   0%|          | 0/1 [00:00<?, ?it/s]

Error occurred during url_download at index: 3 Object at location http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg not found
Details:
HTTP status client error (404 Not Found) for url (http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg) (falling back to Null)
Error occurred during url_download at index: 1 Object at location http://visionplatforminc.com/wp-content/uploads/2018/03/VP-keep-calm-and-stop-telling-lies-500x383.png not found
Details:
HTTP status client error (404 Not Found) for url (http://visionplatforminc.com/wp-content/uploads/2018/03/VP-keep-calm-and-stop-telling-lies-500x383.png) (falling back to Null)
Error occurred during url_download at index: 2 Object at location http://visionplatforminc.com/wp-content/uploads/2018/06/comparison-chart-1200x676.png not found
Details:
HTTP status client error (404 Not Found) for url (http://visionplatforminc.com/wp-content/uploads/2018/06/comparison-chart-12

DaftCoreException: DaftError::ValueError All images in a column must have the same dtype, but got: UInt16 and UInt8
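
The `DaftCoreException` above says the image column ended up holding both 8-bit (`UInt8`) and 16-bit (`UInt16`) images, and a single column needs one dtype. One way to reason about a fix is to bring every image to the same bit depth before it lands in the column. Here is a minimal pure-Python sketch of that idea, not Daft's own API: `to_uint8` is a hypothetical helper, and the flat lists of samples stand in for real pixel data.

```python
def to_uint8(samples, bit_depth):
    """Scale unsigned integer samples of the given bit depth down to 8 bits.

    Hypothetical helper for illustration only; a real pipeline would do this
    on whole image arrays with an image library.
    """
    if bit_depth == 8:
        return list(samples)  # already 8-bit, copy unchanged
    if bit_depth == 16:
        # Keep the 8 most significant bits of each 16-bit sample.
        return [s >> 8 for s in samples]
    raise ValueError(f"unsupported bit depth: {bit_depth}")


eight_bit = to_uint8([0, 128, 255], bit_depth=8)
sixteen_bit = to_uint8([0, 256, 65535], bit_depth=16)
print(eight_bit)
print(sixteen_bit)
```

Once both inputs are on the same 8-bit scale they could share one column. Whether a bit shift or a proportional rescale is the better conversion depends on your data.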

## And you're done! Your data is now "ingested".

You can now reliably run analysis on your dataset:

1. Image metadata has been extracted so you can run analyses (e.g. plot a histogram of image sizes)
2. Thumbnails have been extracted (for easy visualization)
3. Images have been normalized to the same mode (for easier downstream processing)
4. The raw URL is still available (e.g. to analyze which domains the images were retrieved from)
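
As a taste of the kind of analysis the raw URLs allow, here is a minimal sketch that counts images per source domain using only the Python standard library. The `raw_urls` list is an illustrative stand-in for the URL column (its entries are copied from the output earlier in this notebook), and `count_domains` is a hypothetical helper, not part of Daft.

```python
from collections import Counter
from urllib.parse import urlparse


def count_domains(urls):
    """Count how many URLs come from each network location (domain)."""
    return Counter(urlparse(url).netloc for url in urls)


# Stand-in for the raw URL column; in practice these would come from the table.
raw_urls = [
    "http://visionplatforminc.com/wp-content/uploads/2018/11/monitor-for-Business-Strategy.jpg",
    "http://visionplatforminc.com/wp-content/uploads/2018/03/VP-keep-calm-and-stop-telling-lies-500x383.png",
    "https://4.bp.blogspot.com/-2PnFWNn2-sU/WNXZ0dIMj4I/AAAAAAAAA6w/rRm8jvmiiF8iJOJNoU25EAVN1arsUOTngCLcB/s1600/IMG_5296.jpg",
]

domain_counts = count_domains(raw_urls)
for domain, count in domain_counts.most_common():
    print(domain, count)
```

The same pattern should carry over to the full table once the URL column is pulled out as a plain Python list.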

Now THAT's a beautiful dataset 😍😍