# Loading the Unsplash Research dataset in Pandas dataframes

This notebook loads the Unsplash Research dataset into Pandas dataframes for analysis (a prerequisite for downloading the Unsplash image dataset).


## Loading libraries

In [1]:
import numpy as np
import pandas as pd
import glob

## Loading the datasets in Pandas

Make sure that `path` points to the location of the downloaded dataset files.

In [11]:
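# Glob prefix: '**/*' matches files one directory below the working directory,
# e.g. 'unsplash-research-dataset-lite-latest/photos.tsv000'.
path = '**/*'
documents = ['photos', 'keywords', 'collections', 'conversions', 'colors']
datasets = {}

for doc in documents:
  # A table may be split into several parts (.tsv000, .tsv001, ...)
  files = glob.glob(path + doc + ".tsv*")
  print(files)
  subsets = []
  for filename in files:
    print(filename)
    df = pd.read_csv(filename, sep='\t', header=0)
    subsets.append(df)

  # Stack all parts of this table into a single dataframe
  datasets[doc] = pd.concat(subsets, axis=0, ignore_index=True)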

['unsplash-research-dataset-lite-latest/photos.tsv000']
unsplash-research-dataset-lite-latest/photos.tsv000
['unsplash-research-dataset-lite-latest/keywords.tsv000']
unsplash-research-dataset-lite-latest/keywords.tsv000
['unsplash-research-dataset-lite-latest/collections.tsv000']
unsplash-research-dataset-lite-latest/collections.tsv000
['unsplash-research-dataset-lite-latest/conversions.tsv000']
unsplash-research-dataset-lite-latest/conversions.tsv000
['unsplash-research-dataset-lite-latest/colors.tsv000']
unsplash-research-dataset-lite-latest/colors.tsv000
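
If the glob pattern matches no files, `pd.concat` fails with a `ValueError` that does not mention the path. The following is a minimal sketch of a more defensive loader, not part of the original notebook: the helper name `load_tsv_parts`, the `demo` file names, and the toy rows are all assumptions made for illustration.

```python
import glob
import os
import tempfile

import pandas as pd


def load_tsv_parts(root, name):
    """Concatenate every `<name>.tsv*` part found directly under `root`.

    Hypothetical helper: `root` is the dataset directory, `name` a table name.
    """
    pattern = os.path.join(root, name + ".tsv*")
    files = sorted(glob.glob(pattern))  # sorted so parts load in a stable order
    if not files:
        # Fail early with the pattern in the message instead of letting
        # pd.concat raise on an empty list.
        raise FileNotFoundError(f"no files match {pattern!r}")
    parts = [pd.read_csv(f, sep="\t", header=0) for f in files]
    return pd.concat(parts, axis=0, ignore_index=True)


# Demonstration with made-up data written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i, rows in enumerate(["a\t1\n", "b\t2\nc\t3\n"]):
        with open(os.path.join(tmp, f"demo.tsv00{i}"), "w") as fh:
            fh.write("photo_id\tvalue\n" + rows)
    demo = load_tsv_parts(tmp, "demo")

print(demo.shape)
```

With the real dataset, the call would presumably look like `load_tsv_parts('unsplash-research-dataset-lite-latest', 'photos')`, assuming the files sit in that directory.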


## Exploring the datasets (optional)

Here are the first five rows of each dataset, as an example.

In [21]:
datasets['photos'].head()

Unnamed: 0,photo_id,photo_url,photo_image_url,photo_submitted_at,photo_featured,photo_width,photo_height,photo_aspect_ratio,photo_description,photographer_username,...,photo_location_country,photo_location_city,stats_views,stats_downloads,ai_description,ai_primary_landmark_name,ai_primary_landmark_latitude,ai_primary_landmark_longitude,ai_primary_landmark_confidence,blur_hash
0,bygTaBey1Xk,https://unsplash.com/photos/bygTaBey1Xk,https://images.unsplash.com/uploads/1413387620...,2014-10-15 15:40:40.111061,t,4635,3070,1.51,,jaspervandermeij,...,,,1708356,19085,sea and rock cliff with grasses under cloudy sky,Neist Point,57.428387,-6.783028,30.348906,LcE{wnIVRixt~WR+NGjbxukCWBWB
1,gXSFnk2a9V4,https://unsplash.com/photos/gXSFnk2a9V4,https://images.unsplash.com/reserve/jEs6K0y1Sb...,2014-07-10 18:36:06,t,2448,3264,0.75,Coastline view,kimberlyrichards,...,United States,Tillamook,9895033,74702,aerial photography of seashore,,,,,LXE4G#IARjj]GdWFxaWBDOxaofj[
2,grg6-DNJuaU,https://unsplash.com/photos/grg6-DNJuaU,https://images.unsplash.com/uploads/1412192004...,2014-10-01 19:33:56.393181,t,5184,3456,1.5,,marcusdallcol,...,,,8967968,38338,man surfboarding on ocean wave during daytime,,,,,LcHx?5R%Rjof01bHWBof4ooMoeax
3,sO42hhChB1c,https://unsplash.com/photos/sO42hhChB1c,https://images.unsplash.com/reserve/ijl3tATFRp...,2014-08-19 21:15:40,t,4896,3264,1.5,Hazy Ocean Waters,arturpokusin,...,,,2071752,10860,body of water,,,,,LyOzVsj[aefQ_4j[ayj[IUayj[ay
4,tkk8_HakQ98,https://unsplash.com/photos/tkk8_HakQ98,https://images.unsplash.com/reserve/6vaWXsQuSW...,2014-05-05 18:31:06,t,2000,1333,1.5,Silhouettes In Desert,carlov,...,,,2720281,9081,car on desert during sunset,,,,,"LYEV]I%19ZR+-=s,RkWW00WB%2j["


In [13]:
datasets['keywords'].head()

Unnamed: 0,photo_id,keyword,ai_service_1_confidence,ai_service_2_confidence,suggested_by_user
0,zzwTUqvzIFg,rock,15.485713,,f
1,zzwTUqvzIFg,cross,19.598213,,f
2,zzwTUqvzIFg,eruption,34.787167,,f
3,zzwTUqvzIFg,sunset,30.080654,,f
4,zzwTUqvzIFg,eclipse,39.832775,,f


In [14]:
datasets['collections'].head()

Unnamed: 0,photo_id,collection_id,collection_title,photo_collected_at
0,--2IBUMom1I,1230101,Travel,2017-09-27 11:24:17.575047
1,--2IBUMom1I,9832457,business,2020-04-04 14:26:10.506402
2,--2IBUMom1I,2143051,Travel / Places,2018-05-22 23:20:05.898545
3,--2IBUMom1I,FBJEaBSjBvg,Settings,2022-06-04 03:56:40.892078
4,--2IBUMom1I,162470,Majestical Sunsets,2016-03-15 17:04:25.089589


In [15]:
datasets['conversions'].head()

Unnamed: 0,converted_at,conversion_type,keyword,photo_id,anonymous_user_id,conversion_country
0,2023-05-09 11:03:40.445,download,Mond,jlV2k_Fx0fc,4589085a-75df-417b-93de-22adf2fc627d,DE
1,2023-05-09 11:12:05.109,download,16.9 camel desert,yNGQ830uFB4,e05af0fe-4930-421d-b20d-f904f316e2c3,CN
2,2023-05-09 11:17:33.417,download,bird,BFsm5vldl2I,64fd6739-db67-46e0-99f2-022efb498447,RU
3,2023-05-09 11:32:03.943,download,night sky,-cKXtsJWU-I,2f9c6ac4-02c8-4d0f-82b3-0482a82ab0bf,IN
4,2023-05-09 11:36:56.557,download,zoom background office,CEeoDFpVxxw,a7abbff5-4a50-4c65-b463-18139e2978e9,IN


In [16]:
datasets['colors'].head()

Unnamed: 0,photo_id,hex,red,green,blue,keyword,ai_coverage,ai_score
0,XDPk8ndzNho,5B534C,91,83,76,darkolivegreen,0.065067,0.030752
1,IfL3QovlAbI,371511,55,21,17,black,0.105533,0.203291
2,GKzgF32piaE,8ACCD5,138,204,213,skyblue,0.044867,0.128655
3,T5WR9adosj8,A59A99,165,154,153,darkgray,0.0508,0.048626
4,T5WR9adosj8,7F7575,127,117,117,gray,0.050533,0.030054
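
As a complement to calling `head()` on each table, the row and column counts of all loaded tables can be collected in one overview. This is only a sketch: the `summarize` helper and the toy dataframes are assumptions for illustration, standing in for the `datasets` dictionary built earlier.

```python
import pandas as pd


def summarize(datasets):
    """Return one row per dataset with its row and column counts."""
    return pd.DataFrame(
        {
            "rows": {name: len(df) for name, df in datasets.items()},
            "columns": {name: df.shape[1] for name, df in datasets.items()},
        }
    )


# Toy stand-ins for the real tables (made-up values).
toy = {
    "photos": pd.DataFrame({"photo_id": ["a", "b"], "photo_width": [100, 200]}),
    "keywords": pd.DataFrame(
        {"photo_id": ["a", "a", "b"], "keyword": ["x", "y", "z"]}
    ),
}

summary = summarize(toy)
print(summary)
```

Passing the real `datasets` dictionary instead of `toy` should give the same kind of overview for the five Unsplash tables.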


Next, select the photos for which landmark latitude information is available.

In [39]:
df = datasets['photos']
df[~df['ai_primary_landmark_latitude'].isna()]

Unnamed: 0,photo_id,photo_url,photo_image_url,photo_submitted_at,photo_featured,photo_width,photo_height,photo_aspect_ratio,photo_description,photographer_username,...,photo_location_country,photo_location_city,stats_views,stats_downloads,ai_description,ai_primary_landmark_name,ai_primary_landmark_latitude,ai_primary_landmark_longitude,ai_primary_landmark_confidence,blur_hash
0,bygTaBey1Xk,https://unsplash.com/photos/bygTaBey1Xk,https://images.unsplash.com/uploads/1413387620...,2014-10-15 15:40:40.111061,t,4635,3070,1.51,,jaspervandermeij,...,,,1708356,19085,sea and rock cliff with grasses under cloudy sky,Neist Point,57.428387,-6.783028,30.348906,LcE{wnIVRixt~WR+NGjbxukCWBWB
44,WuAKYGlcgmA,https://unsplash.com/photos/WuAKYGlcgmA,https://images.unsplash.com/photo-1543487546-5...,2018-11-29 10:33:27.297023,t,6000,4000,1.50,,martinpechy,...,,,7136266,6576,green grass mountain near sea,Man O'War Bay,50.621846,-2.274685,0.455769,LnOg1w_4M{ad_4IUofkDt7RikDf6
89,oGBhhnsIk6c,https://unsplash.com/photos/oGBhhnsIk6c,https://images.unsplash.com/photo-156781392398...,2019-09-07 00:09:00.964888,t,3628,4535,0.80,#montenegro #travel #outdoors #adventures #fol...,todorraw,...,,,322072,1700,green leafed plants on hill,Mausoleum of Petar II Petrovic-Njegos,42.399899,18.837534,0.232970,L~J7~@ozaxj@~qkDf6j@-;bIjZfR
104,vg9_eeLjnzw,https://unsplash.com/photos/vg9_eeLjnzw,https://images.unsplash.com/photo-157859641741...,2020-01-09 19:00:32.132759,t,5168,3445,1.50,,alex_akimenko,...,Италия,Манарола,716780,2127,aerial photography of city with high-rise buil...,Parco Nazionale delle Cinque Terre,44.128109,9.712391,0.726395,LCB34$~80z585ARj-U%159EMs-%1
110,xRsPg9Z8Ls0,https://unsplash.com/photos/xRsPg9Z8Ls0,https://images.unsplash.com/photo-158052169873...,2020-02-01 01:49:46.811042,t,2765,3456,0.80,The mighty Gorges Du Verdon.,william_sinclair,...,France,,519254,2052,river between green trees during daytime,Verdon Gorge,43.767623,6.345978,0.294208,LLEC%l9s4TMx55sS-pX9RQ%2Nat6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24769,7wk0ja-DP_c,https://unsplash.com/photos/7wk0ja-DP_c,https://images.unsplash.com/photo-147604102652...,2016-10-09 19:24:37.20838,t,3024,3780,0.80,,lifeasyoshi,...,United States,,8473740,97827,lake water scenery,Blanca Lake,47.936241,-121.342106,41.991177,LjG9BPD%9Fj]_NIUIUay-pM{ofWB
24818,ptp1UHJ6c5M,https://unsplash.com/photos/ptp1UHJ6c5M,https://images.unsplash.com/photo-151733923966...,2018-01-30 19:09:21.081108,t,2000,3008,0.66,Gone with the wind,bertrand1212,...,United States,,710531,4272,"antelope Canyon, Arizona",Antelope Canyon,36.865956,-111.377850,75.364596,LCB2S|=xWBWB-A=xo0NH0zR*S2Na
24899,S-5qu7iwQfc,https://unsplash.com/photos/S-5qu7iwQfc,https://images.unsplash.com/photo-157367666051...,2019-11-13 20:24:50.319771,t,3886,5829,0.67,Early cold morning at Lake Louise. Instagram ...,touann,...,,,1426487,12267,white snow mountain,Banff National Park,51.496846,-115.928056,0.844728,LNFZsfxa4TWX?b%Mf+WBE2tR.8oz
24900,-6hvB84fyYA,https://unsplash.com/photos/-6hvB84fyYA,https://images.unsplash.com/photo-157640551554...,2019-12-15 10:27:00.313398,t,4000,6000,0.67,THE OLD MAN - Scotland Landscape,therawhunter,...,Regno Unito,,5505807,148112,a group of rocks sitting on top of a lush gree...,Old Man of Storr,57.506323,-6.176891,0.708316,LRBDB9EL$+S2WGRjWVae0c-VNGs.
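
The selection above keeps every row with a latitude value. A small sketch of two equivalent or stricter ways to write it follows. The toy dataframe and its values are made up, and only the two column names are taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for datasets['photos'] (made-up values).
toy_photos = pd.DataFrame(
    {
        "photo_id": ["p1", "p2", "p3"],
        "ai_primary_landmark_latitude": [57.4, np.nan, 50.6],
        "ai_primary_landmark_longitude": [-6.8, np.nan, np.nan],
    }
)

# Same selection as df[~df[col].isna()], written with notna()
has_lat = toy_photos[toy_photos["ai_primary_landmark_latitude"].notna()]

# Stricter variant: keep only rows where both coordinates are present
has_both = toy_photos.dropna(
    subset=["ai_primary_landmark_latitude", "ai_primary_landmark_longitude"]
)

print(len(has_lat), len(has_both))
```

Requiring both coordinates is a design choice that matters mainly if the rows are going to be plotted on a map. Whether latitude and longitude are ever missing independently in the real data is not established here.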
