# Stage 4 — Feature Engineering Dataset (Booking + Grid Enrichment)

**Goal:** Create a single dataset at the *listing level* that contains:
- Cleaned Booking listing data (from `booking_clean.parquet`)
- The same **1 km `grid_id`** used in Stage 3, computed with the exact same formula so the keys match
- Grid-level enrichment features, joined on `grid_id`
- Simple engineered features (log counts, distance caps, proximity flags, etc.)
- A `has_enrichment` flag and an **enriched-only** subset for ML
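
The engineered features listed above can be sketched in plain Python for a single listing. This is only an illustration of the three ideas (log counts, distance caps, proximity flags), not the notebook's Spark implementation. The function name `engineer`, the input names `poi_count` and `nearest_dist_m`, and the cap value `DIST_CAP_M` are assumptions made for this example; only the thresholds 200 m and 500 m come from the parameters cell.

```python
import math

# Thresholds taken from the parameters cell (PROX_THRESHOLDS_M).
PROX_THRESHOLDS_M = [200.0, 500.0]
# Assumed cap, chosen only for illustration; the real cap may differ.
DIST_CAP_M = 800.0


def engineer(poi_count, nearest_dist_m):
    """Return illustrative engineered features for one listing (hypothetical helper)."""
    feats = {}

    # log1p compresses heavy-tailed counts and is well defined at zero.
    feats["log_poi_count"] = None if poi_count is None else math.log1p(poi_count)

    # Cap large distances so a few far-away values do not dominate.
    feats["dist_capped_m"] = (
        None if nearest_dist_m is None else min(nearest_dist_m, DIST_CAP_M)
    )

    # One 0/1 flag per threshold; None is kept when the distance is missing.
    for t in PROX_THRESHOLDS_M:
        key = f"within_{int(t)}m"
        feats[key] = None if nearest_dist_m is None else int(nearest_dist_m <= t)

    return feats


near = engineer(0, 150.0)
far = engineer(9, 1200.0)
missing = engineer(None, None)
```

Keeping missing inputs as `None` rather than 0 avoids confusing "no enrichment available" with "nothing nearby", which is the distinction the `has_enrichment` flag is meant to preserve.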

## 1) Imports + Parameters

In [0]:
from pyspark.sql import functions as F

BOOKING_PATH = "dbfs:/tmp/booking_clean/booking_clean.parquet"
GRID_ENRICH_PATH = "dbfs:/tmp/booking_grid_enrichment_final"
OUT_BASE = "dbfs:/tmp/booking_stage4"  

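METERS_PER_DEG_LAT = 111320.0   # approximate meters per degree of latitude
GRID_M = 1000.0                 # grid cell size in meters (1 km)
RADIUS_M = 800.0                # radius in meters

# Distance thresholds in meters for the proximity flags
PROX_THRESHOLDS_M = [200.0, 500.0]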

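The constants `METERS_PER_DEG_LAT` and `GRID_M` suggest that `grid_id` is built by converting latitude and longitude to approximate meters and binning into 1 km cells. The Stage 3 formula is not reproduced in this section, so the sketch below is an assumption: one common equirectangular binning, with a hypothetical `"{row}_{col}"` id format. Whatever formula Stage 3 actually used must be copied verbatim, otherwise the join keys will not line up.

```python
import math

METERS_PER_DEG_LAT = 111320.0  # same value as in the parameters cell
GRID_M = 1000.0                # 1 km cells


def grid_id(lat, lon):
    """Assumed binning scheme; NOT necessarily the Stage 3 formula."""
    # North-south: one degree of latitude is roughly constant in meters.
    row = math.floor(lat * METERS_PER_DEG_LAT / GRID_M)

    # East-west: a degree of longitude shrinks with cos(latitude).
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(lat))
    col = math.floor(lon * meters_per_deg_lon / GRID_M)

    return f"{row}_{col}"


origin = grid_id(0.0, 0.0)
same_cell = grid_id(0.001, 0.001)   # about 111 m away, still in the same cell
next_row = grid_id(0.01, 0.0)       # about 1.1 km north
south = grid_id(-0.001, 0.0)        # floor() sends negatives to the cell below
sample = grid_id(50.3330617, 18.7037022)
```

Using `floor` rather than truncation matters for negative coordinates: truncation would merge the cells on either side of the equator or the prime meridian into one double-sized cell.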

## 2) Load Booking Clean + Grid Enrichment

In [0]:
booking = spark.read.parquet(BOOKING_PATH)
enrich = spark.read.parquet(GRID_ENRICH_PATH).dropDuplicates(["grid_id"])

print("Booking rows:", booking.count())
print("Enriched grids:", enrich.count())

print("Booking columns:", len(booking.columns))
print("Enrichment columns:", len(enrich.columns))

display(booking.limit(5))
display(enrich.limit(5))

Booking rows: 3239391
Enriched grids: 2198
Booking columns: 23
Enrichment columns: 21
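
The load step deduplicates `enrich` on `grid_id` before anything is joined. A plain-Python sketch of why that matters, and of how a `has_enrichment` flag could be derived from a left join, is shown here. The helper name `left_join_with_flag`, the column `poi_count`, and the toy rows are hypothetical; the notebook itself works on Spark DataFrames.

```python
def left_join_with_flag(listings, enrich_rows):
    """Left-join listings to grid enrichment and add a 0/1 has_enrichment flag."""
    # Keep one row per grid_id. Here the first row seen wins; Spark's
    # dropDuplicates keeps an arbitrary row per key instead.
    by_grid = {}
    for row in enrich_rows:
        by_grid.setdefault(row["grid_id"], row)

    out = []
    for listing in listings:
        match = by_grid.get(listing["grid_id"])
        merged = dict(listing)
        if match is not None:
            merged.update({k: v for k, v in match.items() if k != "grid_id"})
        merged["has_enrichment"] = int(match is not None)
        out.append(merged)
    return out


# Toy data: grid "a" appears twice in the enrichment table.
listings = [
    {"hotel_id": 1, "grid_id": "a"},
    {"hotel_id": 2, "grid_id": "b"},
    {"hotel_id": 3, "grid_id": "a"},
]
enrich_rows = [
    {"grid_id": "a", "poi_count": 12},
    {"grid_id": "a", "poi_count": 12},
]

joined = left_join_with_flag(listings, enrich_rows)
enriched_only = [r for r in joined if r["has_enrichment"] == 1]
```

Because each `grid_id` maps to at most one enrichment row, the joined result keeps exactly one row per listing. Without the deduplication, a duplicated grid would fan out every listing in that cell. With 2,198 enriched grids against 3,239,391 listings, some listings may have no match at all, which is the case the flag records.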


hotel_id,url,title,city,country,location,lat,lon,metro_railway_access,description,fine_print,property_highlights,property_information,most_popular_facilities,house_rules,manager_language_spoken,availability,review_score,manager_score,number_of_reviews,top_reviews,property_surroundings,images
8908679,https://www.booking.com/hotel/pl/komfortowe-noclegi.html,Komfortowe Noclegi,Szałsza,Poland,"Ptasia 18, 42-677 Szałsza, Poland",50.3330617,18.7037022,True,"Providing a garden, Komfortowe Noclegi provides accommodations in Szałsza. This homestay offers free private parking, private check-in and check-out, and free Wifi. Górnik Zabrze is 5.6 miles away and Ruch Chorzów Stadium is 18 miles from the homestay. Offering a balcony and garden views, the homestay includes 2 bedrooms, a living room, cable flat-screen TV, an equipped kitchen, and 1 bathroom with a shower. Towels and bed linen are featured in the homestay. The accommodation is non-smoking. Stadion Śląski is 19 miles from the homestay, while Silesia City Center shopping mall is 21 miles away. Katowice Airport is 25 miles from the property.","Please inform Komfortowe Noclegi of your expected arrival time in advance. You can use the Special Requests box when booking, or contact the property directly using the contact details in your confirmation. This property does not accommodate bachelor(ette) or similar parties. Quiet hours are between 21:00:00 and 06:00:00. When staying at the property with children, note that the property is legally obliged to apply standards for the protection of minors to determine the identity of the minors and their relationship with the adult they’re staying with.",Free parking. Non-smoking rooms. Free Wifi,,"List(Free parking, Non-smoking rooms, Free Wifi)","List(List(Check-in, From 3:00 PM to 10:00 PM You need to let the property know what time you'll be arriving in advance.), List(Check-out, From 12:00 AM to 11:30 AM), List(Cancellation/ prepayment, Cancellation and prepayment policies vary according to accommodation type. Enter your stay dates and check the conditions of your selected option.), List(Children & Beds, Child policies Children over 5 are welcome. To see correct prices and occupancy info, add the number and ages of children in your group to your search. 
Crib and extra bed policies Cribs and extra beds aren't available at this property.), List(No age restriction, There's no age requirement for check-in), List(Pets, Pets are allowed on request. Charges may apply.), List(Payments by Booking.com, Booking.com takes your payment for this stay on behalf of the property, but make sure you have cash for any extras once you get there.), List(Smoking, Smoking is not allowed.), List(Parties, Parties/events are not allowed), List(Quiet hours, Guests need be quiet between 9:00 PM and 6:00 AM.))","List(English, Polish)","List(List(1 full bed | 1 twin bed | 2 sofa beds, 5, Two-Bedroom Apartment))",9.8,,48,"List(List(gb, Lesley, Beautiful, spacious and very well equipped living space - the hostess had thought of everything you might need Friendly and responsive hostess Quiet location), List(ee, Roman, Very comfortable home stay, will definately come again), List(fr, Xavier, Everything was perfect, well decorated, clean, with a huge bathroom, a nice kitchen and 2 comfy bedrooms.), List(pl, Stefan, Nice appartment in new house in suburbs of Gliwice. Private free parking, nice garden. Host was very welcoming and nice. -), List(pl, Goran, 1. Welcome dinner prepared; 2. Clean and tidy; 3. Garage offered; 4. Quite and communicative place (closet to A1 highway); 5. Contacting you before arrival; 6. Any kind of vanity sets available; 7. Strongly recommended. Everything was excellent.), List(cz, Denisa, Klidná lokalita na okraji obce. Naprosto luxusní ubytování, skvěle vybavené. Pohodlné postele v oddělených ložnicích. Je super mít soukromí. Prostorná krásná koupelna s vanou i sprchovým koutem. Cítili jsme se snad líp než doma. 😊😉 Paní majitelka byla velice milá a ochotná. Nemám vůbec žádné výhrady. 
Bylo to jedno z nejhezčích ubytování, které jsem měla.), List(at, Karin, Schöne Lage in gepflegter Umgebung, beste Vermieterin die ich je getroffen habe , super freundlich und hilfsbereit, sehr schönes Appartement, alles vorhanden was man sich vorstellen kann, sogar eine Garage fürs Auto vorhanden, kurzum-besser geht nicht Alles war perfekt), List(cz, Natalie, Vše naprosto perfektní! Čisté utulné! Pani hostitelka velmi příjemná ochotná!), List(pl, Magdalena, Przemili właściciele, wszystko czego potrzebujesz jest na miejscu, czyściutko, świetna lokalizacja, miejsce godne polecenia i ponownego odwiedzenia!), List(de, Tobias, Die Freundlichkeit der Gastgeber war unglaublich und allein das wäre die Reise wert gewesen. Es war einer der besten Aufenthalte, die wir je über booking.com gebucht haben. Es gibt absolut nichts kritisch anzumerken. Es hat uns alles gefallen.))","List(List(2.49, km, Zespół Radiostacji), List(3.481, km, Zespół Kościoła Par. PW. św. Bartłomieja), List(3.226, km, Park Miliona Świateł), List(4.434, km, Ewangelicki Kościół Zbawiciela), List(4.69, km, Port Żeglugi Śródlądowej), List(3.856, km, Kraina Dzieci), List(4.554, km, Klinika Onkologiczna), List(5.027, km, Sparta Zabrze), List(4.999, km, Rzeźba Lew Czuwający), List(4.693, km, Zespół Huty Gliwickiej))","List(https://cf.bstatic.com/xdata/images/hotel/max200/383121978.jpg?k=307af5ed04eee85a3f903520b2c004d86e6d9505c7cff8ba4e19712927988822&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398748202.jpg?k=f2a0c5110eeaf803ff602529f910b7b75085587948d3e740486efbb798b0110b&o=, https://cf.bstatic.com/xdata/images/hotel/max200/383122385.jpg?k=438e3a34c5e0f3950e943f82a62b488853689555a2a0f0ab095214afb31997ed&o=, https://cf.bstatic.com/xdata/images/hotel/max200/382440739.jpg?k=304e1016199b376846d6ed47d225942f1100ce9fd150477e2bcd3af7de4b6daa&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398748177.jpg?k=68db0922f3621ba1bbb152a2dec83da9cf600465c80b06ed78cbc6b473dc6d58&o=, 
https://cf.bstatic.com/xdata/images/hotel/max200/383837546.jpg?k=2e76e90b53671e32973a38571fc121bc89622a62d9e71304703b28eb8078154d&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398748198.jpg?k=00eaa6306e45a6554f73a3ec156a85c447c00c9e929b1954ea054eeb13f9df80&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398748254.jpg?k=6ab783b1d9d02c3a7ad5dfc50db21f89679f9bc73a6b0884f5f1fe73bf727e25&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398748392.jpg?k=9e3af5df479c3d154df605c05172f38bc75feaadb24aad1416b24d28e6a7fb71&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398748338.jpg?k=fedb569bf4b94eb73e72bf8991324509a7aa0c8a1b2d04ff7ff62b031bce4c37&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398748289.jpg?k=c7c44db047480c9bed033d724c2827c0639e601a9ee7fdb962c34ddb346172e5&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398748211.jpg?k=864b84d062aaec31f762a334dc861eb30babb14fd842ebd23dd7bc50033a50a1&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398748123.jpg?k=9f8bba4372ce9049e1bb8c16b4d257fa4ca84f4db1bde00660cbb01547fedb2e&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398748053.jpg?k=b0ddf6bfca8496ee60cb01b0d3dd1226407dff7cdfe65c12d665414deebdc46d&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398747952.jpg?k=67a2221cc7dadcb5e49ba3468ad1a0a88a7be5c514abf6517101d730337f37b0&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398747864.jpg?k=e80fec087856d0280ea6b569d290176f8a8a9734ec164778a9cd3d43b7e9f91f&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398747751.jpg?k=5645d4e1a717288489e639f49062b010709130da479c388fd9c3d45db8d01e41&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398747730.jpg?k=9f7264e6f31ad738c2741780560f664d5562b15ab3d0eec17c4d36a1df636f01&o=, https://cf.bstatic.com/xdata/images/hotel/max200/398747597.jpg?k=748159c80fb3d917ca2b38da2bdd96924372926fe722abfb2f6e67c018cdb0ce&o=, 
https://cf.bstatic.com/xdata/images/hotel/max200/398747656.jpg?k=dbe0a5ff355179df0e15fc3c2497afaa7e1de41e2f5f8c462cb2e50980ba77bc&o=, https://cf.bstatic.com/xdata/images/hotel/max200/388368033.jpg?k=210f9264385da0e65a6b30b3c80e49921bd60af0ce18b8fb9af92edf48b2a55b&o=, https://cf.bstatic.com/xdata/images/hotel/max200/382440628.jpg?k=0cbef974a41913fbc8a5fe7545a9e98151dd6be05b46f232d140d2d51eea4593&o=, https://cf.bstatic.com/xdata/images/hotel/max200/385468321.jpg?k=60661c38cfbd4616cbb27f09709e413c6bd2b18c5219d1ad335b6ca8ba29ea3c&o=, https://cf.bstatic.com/xdata/images/hotel/max200/382440830.jpg?k=8f03d3452ef8f25d8290270e87bcccfd205c1d745d4598b42714de4729b19023&o=, https://cf.bstatic.com/xdata/images/hotel/max200/385468719.jpg?k=3cdc92914d4d360e3d16daad80c1a087b7d8ab26fda752d75adf6c4bdf942a67&o=, https://cf.bstatic.com/xdata/images/hotel/max200/382440730.jpg?k=d9816b11153b8aed295d7151015e97c94383d27daf064467a8ff0e17e51f305b&o=, https://cf.bstatic.com/xdata/images/hotel/max200/383529971.jpg?k=716375bdf11edddbfda51b809563d66afaf55b9e6b8a547ac87f6a1c187683d4&o=, https://cf.bstatic.com/xdata/images/hotel/max200/383530130.jpg?k=262a86d0a91b1139ef5ec998b6b519952557a95b4a0fe9256fbfd80a1d410cb1&o=, https://cf.bstatic.com/xdata/images/hotel/max200/382440715.jpg?k=591746917f6a6cadf4ef27ef2e99c4b14cca94a63bc3c42e3ee9ec0dc4a44ac6&o=, https://cf.bstatic.com/xdata/images/hotel/max200/382440769.jpg?k=d7bbfa1c785de2631929e5e623329ca8fb7b9cc477ea3148dc278a4ddcd3f61c&o=, https://cf.bstatic.com/xdata/images/hotel/max200/383530054.jpg?k=6da35d8f7890aa0c81a7f7c965d9e41ef3acdc1713f08f6924a30a1df1410a41&o=, https://cf.bstatic.com/xdata/images/hotel/max200/383837519.jpg?k=1e14fe8a9aaf2544ec4d1afe20b5d17575b04ed2d5f26a87a5a21e8024e0fb26&o=, https://cf.bstatic.com/xdata/images/hotel/max200/382440669.jpg?k=9aefc92d032d547018821514f738785c57ecea1e094e267dbfff2ff9fb9a56b7&o=, 
https://cf.bstatic.com/xdata/images/hotel/max200/388368602.jpg?k=2d3fff3cd2890b71b7672cf750e978636b97035d75dec07df4e4b093fb28e78f&o=, https://cf.bstatic.com/xdata/images/hotel/max200/383101244.jpg?k=eee43049f75e4838aa14f13db6951d3af86862a93e2c0571ca50d40b22a91dfa&o=, https://cf.bstatic.com/xdata/images/hotel/max200/559328842.jpg?k=1193fbe5277213c2c427b3fdee98723ed19dad2da16a0244e79dbed55f5902cd&o=, https://cf.bstatic.com/xdata/images/hotel/max200/559328843.jpg?k=e9cd2a93cc4c9a1f3a7bc4e7848503d786e8e944bcc2f68f6dc9bf1fb0531bf8&o=, https://cf.bstatic.com/xdata/images/hotel/max200/559328844.jpg?k=3e4fde6e795a6988794ffbbd670cadc1eea6fdde5938d41d099c06c458cfcf3b&o=)"
2246827,https://www.booking.com/hotel/tw/sunshine-guest-house.html?label=gen173nr-1BCAso5wFCFHN1bnNoaW5lLWd1ZXN0LWhvdXNlSDNYBGiqAogBAZgBDbgBF8gBDNgBAegBAYgCAagCA7gCjvqQvQbAAgHSAiQ5OTYzZjc2Ny00ZjQ2LTQ5ZTMtYjk0OC1jODQ2YzhiYTU5ZDTYAgXgAgE&sid=ebedb28c4f17f15a343d68e625f8dfbd&keep_landing=1&sb_price_type=total&type=total&,Sunshine Guest House,Hualien City,Taiwan,"No.2, Kangle Street, Hualien City, 970 Hualien City, Taiwan",23.973931951127646,121.61510512232768,True,"Sunshine Guest House offers accommodations in Hualien City, 10 miles from Liyu Lake and 24 miles from Taroko National Park. Popular points of interest nearby include Nanbin Park, Hualien City God Temple, and Meilun Mountain Park. Free Wifi, a tour desk, and a shared lounge are featured. A terrace with sea view, a cable flat-screen TV, and air conditioning are available in some units. At the homestay, every unit is equipped with a private bathroom. A car rental service is available at the homestay. Popular points of interest near Sunshine Guest House include Beibin Park Beach, Pine Garden, and Eastern Railway Site. Hualien Airport is 2.5 miles from the property.","Please inform Sunshine Guest House of your expected arrival time in advance. You can use the Special Requests box when booking, or contact the property directly using the contact details in your confirmation. License number: 09601180720",Family rooms. Free Wifi. Non-smoking rooms. Terrace. Breakfast. Family rooms. Free Wifi. Non-smoking rooms. Terrace. Breakfast,,"List(Family rooms, Free Wifi, Non-smoking rooms, Terrace, Breakfast)","List(List(Check-in, From 3:00 PM to 10:00 PM You need to let the property know what time you'll be arriving in advance.), List(Check-out, From 11:00 AM to 11:30 AM), List(Cancellation/ prepayment, Cancellation and prepayment policies vary according to accommodation type. Enter your stay dates and check the conditions of your selected option.), List(Children & Beds, Child policies Children of all ages are welcome. 
Children 12 and above will be charged as adults at this property. To see correct prices and occupancy info, add the number and ages of children in your group to your search. Crib and extra bed policies Cribs and extra beds aren't available at this property.), List(No age restriction, There's no age requirement for check-in), List(Pets, Pets are not allowed.), List(Accepted payment methods, Cash), List(Smoking, Smoking is not allowed.))",List(Chinese),"List(List(1 queen bed, 3, Double Room with Balcony and Sea View), List(1 queen bed, 2, Deluxe Double Room with Balcony and Sea View), List(2 queen beds, 4, Standard Quadruple Room), List(1 queen bed, 2, Deluxe Double Room with Sea View), List(1 full bed, 2, Standard Double Room))",8.6,,82,"List(List(tw, Chun-chen, Location is good and Host is very nice also the shower is very strong 👍 No), List(tw, 如, 民宿主人很nice，民宿地理位置很棒，很熱心又有耐心的介紹哪邊好玩，哪邊好吃的，有機會是會願意再次回訪入住。), List(tw, Yu, 離東大門夜市很近，停車可以停在旁邊教會。 旁邊就是海堤防，早晨和晚上可以去散步很舒服。 海景房早上可以直接在床上看日出。 老闆很熱心，直接拿出花蓮旅遊地圖推薦山線和海線行程。 一樓客廳可以邊看電視邊吃東西，但是環境要自己收拾好，畢竟還有其他住客。 每天都有清潔阿姨來打掃，不過來的時間挺早，吸塵器有點吵😂 透天厝改建的民宿，隔音不好，遇到團體客和帶小孩的家庭客會很吵。), List(at, Huang, Schönes, geräumiges Zimmer mit netter Ausstattung, unser Zimmer hatte auch eine kleine nette Terrasse. Inhaber waren sehr nett und freundlich, haben uns viele Empfehlungen für Ausflüge und Sehenswürdigkeiten gegeben. 
Können wir auf jeden Fall weiterempfehlen!!!), List(us, Pei-chun, 老闆人超好。剛抵達就給我們地圖，解釋地點。早餐很棒，也無限制提供（冰）保特瓶水、飲水機水。盥洗用具、拖鞋其全。房間超乾淨、也有衣架、冷氣很涼。海景很美，陽台很優！價格超親民、我們有驚艷到。還有、地點就在海邊、可以散步、戲水！非常推薦！！再加、離大東夜市很近，很棒！ 因為是民宿、若有長輩、腳不好、選擇低樓層較好。), List(tw, Joy, 地點很好, 走路就可以到東大門夜市逛, 距離太平洋公園南濱區很近, 旁邊就是海堤可以帶小孩散步看海, 老闆人很親切, 推薦了幾家在地美食, 未來有機會還會選擇入住這裡, 很安靜的地點.))","List(List(0.0, km, Taroko National Park), List(1.029, km, Nanbin Park), List(1.26, km, Qingshui Cliff), List(0.983, km, Pine Garden), List(1.275, km, Meilun Mountain Park), List(1.846, km, Taroko National Park), List(2.202, km, Hualien County Stone Sculptural Museum), List(3.154, km, A Mei Wenhua Village), List(3.809, km, 空地(不可大聲喧嘩)), List(3.985, km, Zhikaxuan Forest Park))","List(https://cf.bstatic.com/xdata/images/hotel/max200/92476975.jpg?k=06062d9b8daff38e4de02e9a8bafd650ae811236b4efaa9784e87499e5a28cb8&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92476771.jpg?k=f260c2d188bfd93af70c529631b4a487563d827ac7066038ce3536e32990263b&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472490.jpg?k=e8a320d87d2571624929a48dd9a25de5acc01e48c16f6c9b6d6698ac9d660320&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92700669.jpg?k=23b604be28b94c25c10c846713363fc4cd0a8b93482f0c1ab1d93e7b7c9b5f11&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472416.jpg?k=2ec39467f4c47bc40fe8f1fa3a58e84b790e056b9d71a5023f71a51291325380&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472038.jpg?k=3bd519d286328924837089c275823f74c421ed7205c09f7c24ed615606dfed9a&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472417.jpg?k=220565946e96ed39e439a1c7fe81ac491b6bd99025c29cccbc9005dca295096a&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472423.jpg?k=793cc601b9d41dca7512a877817105a1b4c7864bd81611719c48e60ded6fdba7&o=, https://cf.bstatic.com/xdata/images/hotel/max200/708540937.jpg?k=5be7499c09e24eb3925e1e76c716d8c3e20539895fd4f397695ca46ef4d0e2ca&o=, 
https://cf.bstatic.com/xdata/images/hotel/max200/708541042.jpg?k=2a8cebd7e270e780a7124d6a32ced8d766b5afb13dfc57575e5c6cf3088ce58b&o=, https://cf.bstatic.com/xdata/images/hotel/max200/708541116.jpg?k=c2bbea92cb2bad471e53a56d2e87c3793b0072123beb2bab27937310abe4a1a1&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92701081.jpg?k=ebc58b4edf518bb10f204c324ae79c5ed98a30ac9bf09b75a7a46594d6ea9ab1&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92700948.jpg?k=80e800dd2923aa876ba748606adbd4a3bd6859d2c9d49f61291421bf328678ea&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472045.jpg?k=5edfb55102127876f19d999e454cd94057d6b7c06634820f0f6ce7c3d46092d3&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472493.jpg?k=209821f85674c7a746335374940df031cfaccdd21db325b3aa3a507f2a86f792&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472495.jpg?k=f031a45f836b69fd8fcf7ec22b4ec00e653bbe6b3c5c1436cfb25e9c56633da3&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472421.jpg?k=cb394b558dd12184bcbb50f1be7f420950fcfc6f6314151a711f2d9912f1daf8&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472425.jpg?k=60d9a333d1c233bdfcb41de1f56353c2f18e73b4087b5876cd6863de22a10f9a&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472427.jpg?k=8e2fb4150a49ee6cf68c2a0229fe76d62783aad033f3b99a1641a93a93351c00&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472041.jpg?k=b11bba3df392222817f709bac81b087c801d4d12e8fcf05d5c29dc75adfb3569&o=, https://cf.bstatic.com/xdata/images/hotel/max200/92472043.jpg?k=adfb8b3faa59b9948c3904f8832bde9370352d2b31a2011a04b360b3df6734b1&o=)"
12275653,https://www.booking.com/hotel/me/srna-igalo.html,"Apartmani Srna, Igalo",Igalo,Montenegro,"Janka Beka, 85347 Igalo, Montenegro",42.460103063041,18.509718935529,False,"Providing a garden, Apartmani Srna, Igalo provides accommodations in Igalo. The property is around a 7-minute walk from Titova Vila Galeb Beach, 1.8 miles from Herceg Novi Clock Tower, and 2.1 miles from Forte Mare Fortress. Private parking can be arranged at an extra charge. Units come with air conditioning, and certain units at the apartment complex have a balcony. At the apartment complex, every unit includes a terrace, a private bathroom, and a flat-screen TV. Roman Mosaics is 18 miles from the apartment, while Sub City Shopping Center is 23 miles away.",This property does not accommodate bachelor(ette) or similar parties.,Private Parking. Wifi in all areas,,"List(Private Parking, Wifi in all areas)",List(),"List(English, Serbian)","List(List(1 twin bed, 1 full bed, 3, One-Bedroom Apartment), List(1 twin bed, 1 full bed, 3, One-Bedroom Apartment))",9.3,,7,"List(List(rs, Milutin, Domacica Zdenka je divna! Izasla nam je u susret za sve sto nam je bilo potrebno. Zaista sve pohvale!), List(no, Oda, Utrolig koselig dame som drev stedet. Sjekket alltid om alt gikk fint med oss, smilte og var blid hele tiden. Første rommet vi lå på var set fortsatt mat fra forrige folkene i fryseren så luktet litt. Men ellers bra! Du får hva du betaler for og damen som eier er verdens hyggeligste dame. 
Ganske bratt opp til leilighetene, så passer ikke å gå for de som er dårlig til bens.))",,"List(https://cf.bstatic.com/xdata/images/hotel/max200/569747117.jpg?k=5daec40afa68ce7c8caf692a3884d754d9e77a6c76eb8d4b71aaeeb34338dda5&o=, https://cf.bstatic.com/xdata/images/hotel/max200/569747205.jpg?k=7c42315780fe09ffd2c0961420d0a9c1180267622715d01e8e4b586870aa540c&o=, https://cf.bstatic.com/xdata/images/hotel/max200/569747223.jpg?k=6671486b998998d5d550feecd6c2c7fc178b4665bfe0d751dd303972996d6405&o=, https://cf.bstatic.com/xdata/images/hotel/max200/569747239.jpg?k=b76844cbe0fae26fb67d7da0917d4694c3e02276bc1c928159791eeff563103a&o=, https://cf.bstatic.com/xdata/images/hotel/max200/569747250.jpg?k=464cedc9a311c3b4ea70fc1d517a65f225e388cbd8125cde1d78c57674556a5b&o=, https://cf.bstatic.com/xdata/images/hotel/max200/652038057.jpg?k=f8b74d1a4b283e5e06daede16b6ab66165645c31e0396988a046a24d84715a20&o=)"
7877735,https://www.booking.com/hotel/pl/filippo-apartments4rent.html,hvile 12- Hvile Stay,Toruń,Poland,"8 Kazimierza Jagiellończyka, Stare Miasto, 87-100 Toruń, Poland",53.011607650726,18.617122400664,True,"Hvile 12- Hvile Stay in Toruń offers accommodations with free Wifi, a 14-minute walk from Planetarium, 0.6 miles from Old Town Hall, and a 12-minute walk from Copernicus Monument. The property is around 1.6 miles from Toruń Wschodni Railway Station, 1.8 miles from Atrium Copernicus Shopping Center, and 2.4 miles from Central Torun Railway Station. The property is a 2-minute walk from Toruń Miasto Railway Station and within 0.4 miles of the city center. The apartment is composed of 1 separate bedroom, a fully equipped kitchenette with a microwave and a fridge, and 1 bathroom. Towels and bed linen are offered in the apartment. The accommodation is non-smoking. Nicolaus Copernicus University is 2.8 miles from the apartment, while Bulwar Filadelfijski Promenade is a 10-minute walk from the property. Bydgoszcz Ignacy Jan Paderewski Airport is 31 miles away.","This property does not accommodate bachelor(ette) or similar parties. If you cause damage to the property during your stay, you could be asked to pay up to 300 zł after check-out, according to this property's Damage Policy . When staying at the property with children, note that the property is legally obliged to apply standards for the protection of minors to determine the identity of the minors and their relationship with the adult they’re staying with.",Parking. Free Wifi. Parking. Free Wifi,"hvile 012 to a place with character and a beautiful finish, located 5 minutes from Toruń's Old Town. Thanks to this location, we have access to all offered by the ""city of Copernicus"" at your fingertips. Our apartment is equipped with all the necessary accessories needed for the travel-standard hotel bedding, towels, coffee machine (!), small household appliances and kitchen equipment uch as pots and frying pan. 
A huge advantage of the location in parking ""B"" are the available parking spaces around the tenement house! We invite you for gingerbread! Hotel standards in home edition, Enjoy the moment with hvile! Our facility is located in the Old Town, in the paid parking zone ""B"" (payable: Monday - Friday, 8AM-6PM, excluding Bank Holidays). There are public parking spots in the area: for the first hour: PLN 1.50, for the second hour: PLN 1.80, for the third hour: PLN 2.10, for the fourth and each consecutive hour: PLN 1.50 PLN. Total: 16 PLN/day.","List(Parking, Free Wifi)","List(List(Check-in, From 4:00 PM to 11:30 PM You need to let the property know what time you'll be arriving in advance.), List(Check-out, Until 11:00 AM), List(Cancellation/ prepayment, Cancellation and prepayment policies vary according to accommodation type. Enter your stay dates and check the conditions of your selected option.), List(Damage policy, If you cause damage to the property during your stay, you could be asked to pay up to 300 zł after check-out, according to this property's Damage Policy .), List(Children & Beds, Child policies Children of all ages are welcome. To see correct prices and occupancy info, add the number and ages of children in your group to your search. Crib and extra bed policies Cribs and extra beds aren't available at this property.), List(Age restriction, The minimum age for check-in is 18), List(Payments by Booking.com, Booking.com takes your payment for this stay on behalf of the property, but make sure you have cash for any extras once you get there.), List(Smoking, Smoking is not allowed.), List(Parties, Parties/events are not allowed), List(Pets, Pets are not allowed.))","List(,English, ,Spanish, ,Norwegian, Polish)","List(List(1 full bed, 2, One-Bedroom Apartment))",9.3,8.7,55,"List(List(pl, Piotrowska, Było czysto. W łazience była suszarka co było dużym plusem. Duży prysznic . 
Wygodne łóżko.), List(pl, Kristina, Pokój w rzeczywistości jest taki jak na zdjęciach. Jest wygodny.), List(pl, Adrian, Mieszkanie świetnie zlokalizowane, w pobliżu bulwar filadelfijski, którym w ciągu kilku minut można dojść do centrum Torunia. Bardzo czysto. W zasadzie to nie ma za wiele wspólnego z mieszkaniem, ale za największy minus uważam głośne tramwaje przejeżdżające przy samym oknie.), List(pl, Adam, bardzo blisko bulwaru i starego miasta, publiczne parkingi w okolicy, ładny wystrój i urokliwa kamienica, w której znajduje się apartament Słychać przejeżdżające tramwaje, ale nie jest to jakoś mocno uciążliwe.), List(pl, Ewa, Nowoczesny apartament dosłownie 3 kroki od Starego Miasta. Czyściutko i przyjemnie, duża komfortowa łazienka. Wszystko na wysokim poziomie. Miło spędzać czas w takim miejscu.), List(pl, Yuliya, Ogólnie wszystko super,czysto duży plus za ekspres do kawy i kapsułki . Blisko starego miasta parking odrazu pod kamienicą. Łóżko bardzo wygodne Minus daje za telewizje powieszoną przy stole leżąc w łóżko nie ma szans obejrzenia TV. A kanapa jest mało wygodna.), List(pl, Skórka, Lokalizacja, pięknie i nowocześnie urządzone wnętrze, czystość apartamentu, dobra komunikacja z właścicielami, łatwa obsługa lokalu Miłe zaskoczenie, w lodówce kostki lodu 😀 Jedyne, do czego można się przyczepić to brak gąbki do mycia naczyń I podstawowych kosmetyków w łazience (żel pod prysznic, szampon)), List(pl, Sebastian, 5 minut piechotą od starówki. Czysto, schludnie. Cisza i spokój. Problem z miejscem parkingowym ale to nie problem apartamentu, a niestety całego Torunia. Na wyposażeniu przydały by się ręczniki papierowe i jakiś odświeżacz do powietrza w łazience.), List(pl, Elżbieta, Apartament w pięknej, odnowionej kamienicy. Nie było problemu ze znalezieniem miejsca parnikowego na tej samej ulicy. Wnętrza czyste, zadbane. Kuchnia wyposażona tak jak potrzeba. Jasny przekaz co do sposób odbioru kluczy. 
Bardzo miły i pomocny personel, który udzielił mi informacji co do późniejszego wymeldowania, które bez problemu zostało zrealizowane. Polecam Nie mam zastrzeżeń tylko klimatyzacja podczas upału bardzo by się przydała), List(pl, Yana, Дуже гарне помешкання! Чудове розташування, все в пішій доступності, з гарним інтер'єром, чистота, зручне ліжко, все що потрібно з кухонного обладнання. З задоволенням повернемося в місто в саме це помешкання.))","List(List(0.291, km, Klasztor Benedyktynek), List(0.381, km, Przytułek Dla Ubogich), List(0.382, km, New Town Market), List(0.405, km, Gospoda Pod Modrym Fartuchem), List(0.432, km, Kościół Ewangelicki pw. św. Trójcy), List(0.515, km, Gospoda Cechowa), List(0.355, km, Zespół Koszar Bramy Lubickiej), List(0.451, km, Apteka pod Lwem), List(0.475, km, Młyn Garbarski), List(0.498, km, Teutonic Castle))","List(https://cf.bstatic.com/xdata/images/hotel/max200/325942604.jpg?k=fd1d51d2439a4cba0e27a170dc7ded3254a6fdb17fe06682c16591bcf0abcf1a&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942587.jpg?k=a34e6ee761473485579d59a432b0850796a0daeef5f63354d995cb0a96708b75&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942560.jpg?k=dc8c3dbf95ee774f88d2617bb0998d74d69e9bbc3fa915f6ba028d1076f13894&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942541.jpg?k=576cfd0dfc370386c67aae04d97c4be7cc3e31881293dc138829ce72caddf55b&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942569.jpg?k=4c8a445c7e9778f5f26d1689073c1779541dfa26d0a9d96b6dccd1db830b9fae&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942547.jpg?k=b25a8bd73ad1c59876cad76c1735bbaa886063f29e65ed64c045a1360186fa09&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942542.jpg?k=7eb035b1e716560cfbeece3589e83790cf0d38d6f53c2bcf51d98fc7edcdb89b&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942545.jpg?k=ca7e1aa099d1edd8390d29a465e6447088d9587e1d5f5adee0cfec26b6284207&o=, 
https://cf.bstatic.com/xdata/images/hotel/max200/325942543.jpg?k=7bfdf02332739b97f41f7f4638ae5552ae9cf6dcce198307a1edc823991fcbfa&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942519.jpg?k=65bf1ae68d05e79a6e5e9c46a9cc98d705993afd6be5098687100bfbcc6283c4&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942571.jpg?k=55889d4ac1900b622254e247a282d438b54123e3c111cf74c70899f602c4a0dc&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942579.jpg?k=5aea7bb65d0bab10e2dc123fb0edbc2f0c6c3824700800a6c80c597d997010cb&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942601.jpg?k=852ef0689ededc264b4dbd26517f7bc7ecf0b5f8ac5b71f790243c0699fd7c73&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942581.jpg?k=753f316a39307e20b7e1703ab12798e5ddfc5943dc690e7ca30fa5b644388433&o=, https://cf.bstatic.com/xdata/images/hotel/max200/325942521.jpg?k=98ae474709df1a4abbff61133b4bfe534fe39b099f1bb931b7fb9800ae395910&o=, https://cf.bstatic.com/xdata/images/hotel/max200/579229124.jpg?k=2465b2b0ad51b9d47376fdc5ab5ed4aaeea9d175ed7e19f47ee447737cec6987&o=)"
1985847,https://www.booking.com/hotel/au/undara-experience.html,Discovery Resorts - Undara,Mount Surprise,Australia,"Undara Volcanic National Park Savannah Way, 4871 Mount Surprise, Australia",-18.2022428262032,144.572267532349,False,"Located in Mount Surprise, Discovery Resorts - Undara has accommodations with a year-round outdoor pool, free WiFi, a garden and a restaurant. Some units are air-conditioned and include a balcony and/or a patio, as well as a seating area. A grill is available on site and hiking can be enjoyed within close proximity of the lodge.","Guests under the age of 18 (minors) must be supervised by a parent or guardian at all times while at the park. It is your responsibility to ensure the personal safety, welfare and protection of all minors in your group at all times during their stay at the park. Guests under the age of 18 can only check in with a parent or official guardian.",Outdoor swimming pool. Non-smoking rooms. Free Wifi. Restaurant. Free parking. Bar. Outdoor swimming pool. Non-smoking rooms. Free Wifi. Restaurant. Free parking. Bar,,"List(Outdoor swimming pool, Non-smoking rooms, Free Wifi, Restaurant, Free parking, Bar)","List(List(Check-in, From 2:00 PM to 5:00 PM), List(Check-out, Until 10:30 AM), List(Cancellation/ prepayment, Cancellation and prepayment policies vary according to accommodation type. Enter your stay dates and check the conditions of your selected option.), List(Children & Beds, Child policies Children of all ages are welcome. Children 17 and above will be charged as adults at this property. To see correct prices and occupancy info, add the number and ages of children in your group to your search. 
Crib and extra bed policies Cribs and extra beds aren't available at this property.), List(Age restriction, The minimum age for check-in is 18), List(Pets, Pets are not allowed.), List(Cards accepted at this property, Cash is not accepted))",List(English),"List(List(1 full bed, 1 sofa bed, 3, Pioneer Hut), List(1 full bed, 3, Railway Carriage - Queen), List(2 twin beds, 3, Railway Carriage - Twin), List(1 king bed, 2, The Homestead), List(2 twin beds, 2, Swag Tent - Twin), List(4 twin beds, 4, Swag Tent - Quad), List(2 twin beds, 1 full bed, 4, Swag Tent - Family))",8.4,,66,"List(List(us, Ralph, Love Undara. It was my third visit. The Tunnel Cave tour was excellent. And the food was pretty good. We stayed in a swag this time. They’re getting very tatty and some are in shambles. Very ghetto. Time to replace!!), List(au, Wayne, Great accommodation and facilities close to all local attractions. Price increase during tourist season), List(au, Tim, Great experience staying in the national park. Stayed in one of the rail carriages which was a unique experience. Bush breakfast and lava tube walking tour were both great. Food was delicious (kangaroo steaks). Wifi was non existent in the room. Had to use my device in the dining area.), List(au, Shelley, The property is amazing - we stayed in the train carriage and it was surprisingly spacious - with nice sitting area, a comfortable bed in good sized space and a good sized bathroom. Every thing about the experience was wonderful.), List(au, Alex, A very comfortable Cabin. Great meal. Excellent tour and other walks. Would highly recomend the experience.), List(au, Angelika, It was a great experience for us and our family from Germany Thee logs are not the most comfortable seating otherwise is was a fun morning), List(au, Ja, I made my own breakfast at the camp kitchen. I liked the walks around the accommodation - The Bluff, Atkinson's Lookout, The Swamp Walk. The Lava tubes/caves were incredible as was our guide. 
Also met a lovely fellow who was clearing and organising the grounds - lovely chap...sorry, forgotten his name. My room was spacious, clean and quiet and close to the fabulous eating spot. Bathroom fab too. No kitchenette in the room...just a kettle. The restaurant was fabulous and food delicious and the eating area was stunning! Enjoyed all my tours - Boardwalk tour, Wind Tunnel tour and Sunset tour with helpful and enthusiastic guides. Would have loved a microwave but not a problem really.), List(au, Kerrie, Nature setting, railway carriage accomodation and the bush breakfast and restuarant and bar Nil), List(au, Nina, Really awesome carriages! Such a cute set up. Lots of space in the common area to get together with the group in the evening.), List(us, Nicosia, Great place, beautiful views, great food and service. Bush Breakfast was fun and the sunrises should cost extra, I recommend.))",List(),"List(https://cf.bstatic.com/xdata/images/hotel/max200/80722901.jpg?k=ef100904dd61e6bce92b2366b83432a7e603574c01b27bb2b7be36c21b9ff5fa&o=, https://cf.bstatic.com/xdata/images/hotel/max200/578663419.jpg?k=e2b5045b0b4870057c41e78ca8c3b22f5675b77dc88a1a6641c7244d8fbb97ca&o=, https://cf.bstatic.com/xdata/images/hotel/max200/82285721.jpg?k=a75da2e679f144b2da0d70999d96cdd1f9b7f5985342b9b4a20c29d04e154749&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80722312.jpg?k=e1da92f807cf01b115ca413dabde1e765974a8ae716a59d897e65ab77725acd6&o=, https://cf.bstatic.com/xdata/images/hotel/max200/79974769.jpg?k=2c4e20970bc2951eff22128078f5f2c3d9baf67bcd1cf95a0cf8eb89bc9847a6&o=, https://cf.bstatic.com/xdata/images/hotel/max200/578663991.jpg?k=271c7105c199866dcdad12f217beccb19827031c91c981e70b009201e4743a4f&o=, https://cf.bstatic.com/xdata/images/hotel/max200/578663990.jpg?k=d717345c009b1e7c6adbf0bec46cba3a2f4d863a91b80e84ed703a0277be44ae&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80719179.jpg?k=7cf38588c5584b3f22501a6d90408717bfd6cd4466b321dce883a24163029ee2&o=, 
https://cf.bstatic.com/xdata/images/hotel/max200/79974767.jpg?k=571e4f4620233e034baa340ae9a4f806eca11dc0d6b5d89d7d5bfd0e5525091f&o=, https://cf.bstatic.com/xdata/images/hotel/max200/79974770.jpg?k=3d8233b0a2c09725c82d8fecfcad4b68fc37a43605567e684b9bf493a80a7b29&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80722600.jpg?k=b482603b439edf475348e1b7ed9b18250f5f9d7a18429f9f9981345dedbc123c&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80713990.jpg?k=96671a96d1e326d943d2f35da03923c15843c253edc70eac6ad279ee80752915&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80722182.jpg?k=421cf06f0e771a219a14f58daed7f7b61ac8ef2f43be7a55c0541236fa9976cb&o=, https://cf.bstatic.com/xdata/images/hotel/max200/79974768.jpg?k=dea6e399c74f9a64f1b841d54e5cbe7c0a44cfca2e839d6287d2b6d0335910d5&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80719931.jpg?k=d0773fb48710bd10c033a16aa1576eaebb1c950f3af1cd45beb2e63909a0fb2d&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80720178.jpg?k=3198b013532d1b6e9888c9472973780c46ee1a8f535bfbd634369202f9b20555&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80722583.jpg?k=86585f2a8b2710d62b2e53eca7e758e7c5d09d08887728ca4752343f870c9d8c&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80721907.jpg?k=c0dff10e52be96b0c2f3e2423502dc60e000a2b35163d1674d67f89d9652aba0&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80721873.jpg?k=4030cfa466e2485d3e90cca21ed636cfc0d0d97a38b529cf6fca75c291fe72c6&o=, https://cf.bstatic.com/xdata/images/hotel/max200/578662445.jpg?k=d86f2ccde98720315bc3c373061c7ea2529c687f5155dff732b5620017f8acc4&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80720256.jpg?k=0b6c88f2c4a6e564228d13cd983882c5c658406145cd5b9a43bcbbabd848d41d&o=, https://cf.bstatic.com/xdata/images/hotel/max200/79974777.jpg?k=e9e68d425c96028b4dcf79e9977a819e20ea70f0c68b9f37094f75f2b7ee4aa1&o=, https://cf.bstatic.com/xdata/images/hotel/max200/80721074.jpg?k=c15e54001f4da9635ee291c47f423a6782873b2573e67982fb21b44d31a2e38a&o=, 
https://cf.bstatic.com/xdata/images/hotel/max200/578663417.jpg?k=422570456d3849d2e393747c36a4839ab69f7942b33dccf821f65de9d8751db1&o=, https://cf.bstatic.com/xdata/images/hotel/max200/578663418.jpg?k=c68bc0729f245b129e65d9520d32ccefac38a761041fbb8f163a25ea2b38f026&o=, https://cf.bstatic.com/xdata/images/hotel/max200/578663422.jpg?k=5f87c95c7eac3921cfc321c5aa7e3623b330bd2cea6b05608efc7d0ba7fd27eb&o=, https://cf.bstatic.com/xdata/images/hotel/max200/578664389.jpg?k=4f45fbd4e860c624843ac8dd3a975d31452d8c704492824d128a34bfc847d9e7&o=, https://cf.bstatic.com/xdata/images/hotel/max200/578664399.jpg?k=3fc59750be3d3f8c1524b7e2e8cec16c1236d067145f4b2f2a64d9a2a24ce05c&o=, https://cf.bstatic.com/xdata/images/hotel/max200/578664400.jpg?k=59b49a29c76c7216c507d80175bf8da783644c4dc18a9f5740e548557c964333&o=, https://cf.bstatic.com/xdata/images/hotel/max200/665235491.jpg?k=a943be311b3c93193fe1ddc779c1482319b7acec52c8e6ad05615e7c0be1c46a&o=)"


count_attractions,count_cafes,count_coworking,count_museums,count_nightlife,count_parks_playgrounds,count_pharmacies,count_restaurants,count_supermarkets,grid_id,grid_lat,grid_lon,min_dist_attractions,min_dist_cafes,min_dist_coworking,min_dist_museums,min_dist_nightlife,min_dist_parks_playgrounds,min_dist_pharmacies,min_dist_restaurants,min_dist_supermarkets
5,55,0,14,15,58,14,110,38,1031_4665,41.90994845673136,12.452050966503547,407.0389752309149,41.74892695746815,,411.8914551385195,23.3515835881438,143.04158551495254,119.41777498035307,11.362537115685626,79.06444594041352
4,29,0,0,1,4,1,64,4,5568_2804,25.19202328360665,55.280871035420496,389.86028782307466,169.85382741875907,,,297.72696380310526,392.51041626647367,127.2910866153603,133.52186396782363,141.05261177859663
2,20,0,0,10,5,13,43,33,1708_4438,39.87196281696668,19.99811947860087,391.87738653868365,177.3337686150511,,,176.5283513279118,421.3363820704234,269.2073670749508,79.5968919570971,207.15845287604589
0,0,0,0,0,0,0,0,0,-8295_3381,30.37694154221444,-86.3670273163289,,,,,,,,,
0,25,0,1,3,29,4,11,20,1217_4911,44.1198711150295,15.235810953063543,,164.09622910411272,,794.3900253445852,308.47120290637906,299.0474921693966,269.10129820425766,61.075687884002924,133.1685934156732


## 3) Recompute `grid_id` on Booking Listings (Exact Stage 3 Logic)

In [0]:
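booking_with_grid = (
    booking
    .withColumn("lat_rad", F.radians("lat"))
    # Row index: metres north of the equator divided by the cell size.
    .withColumn("grid_y", F.floor(F.col("lat") * (METERS_PER_DEG_LAT / GRID_M)).cast("long"))
    # Column index: a degree of longitude shrinks by cos(lat), evaluated at the listing's own latitude.
    .withColumn(
        "grid_x",
        F.floor(F.col("lon") * (METERS_PER_DEG_LAT * F.cos(F.col("lat_rad")) / GRID_M)).cast("long")
    )
    # concat_ws skips null inputs, so this assumes lat/lon are non-null in booking_clean.
    .withColumn("grid_id", F.concat_ws("_", "grid_x", "grid_y"))
    .drop("lat_rad")
)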

display(booking_with_grid.select("hotel_id", "lat", "lon", "grid_x", "grid_y", "grid_id").limit(10))
print("Unique grids in booking:", booking_with_grid.select("grid_id").distinct().count())


hotel_id,lat,lon,grid_x,grid_y,grid_id
8908679,50.3330617,18.7037022,1329,5603,1329_5603
2246827,23.973931951127646,121.61510512232768,12370,2668,12370_2668
12275653,42.460103063041,18.509718935529,1520,4726,1520_4726
7877735,53.011607650726,18.617122400664,1246,5901,1246_5901
1985847,-18.2022428262032,144.572267532349,15288,-2027,15288_-2027
6921514,47.9016024,1.9014035,141,5332,141_5332
4767573,33.6760868104757,135.3375215895768,12537,3748,12537_3748
291872,46.08231778700002,11.102725267410278,857,5129,857_5129
1383532,4.641235779558954,-75.57255568846101,-8386,516,-8386_516
11140846,35.4390352,133.3418563,12093,3945,12093_3945


Unique grids in booking: 673318
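
As a quick cross-check of the formula above, the same arithmetic can be written in plain Python for a single coordinate pair. This is only a sketch and not part of the pipeline: the helper name `grid_id_for` is hypothetical, and it assumes Python's `math` functions agree with Spark's closely enough that the floor lands in the same cell for the rows shown.

```python
import math

METERS_PER_DEG_LAT = 111320.0
GRID_M = 1000.0  # 1 km cells, same constant as the Spark cell above


def grid_id_for(lat: float, lon: float) -> str:
    """Hypothetical single-row mirror of the Spark grid_id expression."""
    # Row index: metres north of the equator divided by the cell size.
    grid_y = math.floor(lat * (METERS_PER_DEG_LAT / GRID_M))
    # Column index: longitude scaled by cos(lat) at the point's own latitude.
    grid_x = math.floor(
        lon * (METERS_PER_DEG_LAT * math.cos(math.radians(lat)) / GRID_M)
    )
    return f"{grid_x}_{grid_y}"


# Coordinates taken from the preview table above.
print(grid_id_for(50.3330617, 18.7037022))
print(grid_id_for(-18.2022428262032, 144.572267532349))
print(grid_id_for(4.641235779558954, -75.57255568846101))
```

The printed IDs should match the `grid_id` column for hotels 8908679, 1985847 and 1383532. Because `floor` rounds toward negative infinity, southern latitudes and western longitudes get indices such as `-2027` and `-8386` rather than values truncated toward zero.
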


## 5) Join Listing Rows with Grid Enrichment

In [0]:
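joined = (
    booking_with_grid.alias("b")
    .join(enrich.alias("e"), on="grid_id", how="left")
    # grid_lat is populated for every enrichment row in the preview above
    # (including grids with zero POIs), so a non-null value marks a successful match.
    .withColumn("has_enrichment", F.when(F.col("e.grid_lat").isNotNull(), F.lit(1)).otherwise(F.lit(0)))
)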

coverage = joined.agg(F.avg("has_enrichment").alias("coverage")).collect()[0]["coverage"]
print("Coverage fraction:", float(coverage))

display(joined.select("hotel_id", "grid_id", "has_enrichment").limit(10))


Coverage fraction: 0.15657881373381602


hotel_id,grid_id,has_enrichment
8908679,1329_5603,0
2246827,12370_2668,0
12275653,1520_4726,1
7877735,1246_5901,1
1985847,15288_-2027,0
6921514,141_5332,0
4767573,12537_3748,0
291872,857_5129,0
1383532,-8386_516,0
11140846,12093_3945,0


## 6) Feature Engineering (Numeric, Interpretable, ML-Friendly)

In [0]:
COUNT_COLS = [c for c in joined.columns if c.startswith("count_")]
DIST_COLS  = [c for c in joined.columns if c.startswith("min_dist_")]

print("COUNT_COLS:", COUNT_COLS)
print("DIST_COLS:", DIST_COLS)

df = joined

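# 1) Fill null counts with 0. Listings without a grid match also get 0 here,
#    so use has_enrichment to tell "no data" apart from "no POIs nearby".
for c in COUNT_COLS:
    df = df.withColumn(c, F.coalesce(F.col(c), F.lit(0)).cast("long"))

# 2) Cap distances at RADIUS_M and keep nulls as nulls.
#    The explicit null branch is needed because F.least skips null inputs
#    and would otherwise turn a missing distance into RADIUS_M.
for c in DIST_COLS:
    df = df.withColumn(
        c,
        F.when(F.col(c).isNull(), F.lit(None).cast("double"))
         .otherwise(F.least(F.col(c).cast("double"), F.lit(RADIUS_M)))
    )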

# 3) log1p counts
for c in COUNT_COLS:
    df = df.withColumn(f"log1p_{c}", F.log1p(F.col(c).cast("double")))

# Helper for flags
def proximity_flag(dist_col: str, thr: float):
    """Return a 0/1 int column: 1 when the row is enriched and dist_col <= thr, else 0."""
    return (
        F.when(
            (F.col("has_enrichment") == 1) &
            F.col(dist_col).isNotNull() &
            (F.col(dist_col) <= F.lit(thr)),
            F.lit(1)
        ).otherwise(F.lit(0)).cast("int")
    )

# 4) Proximity flags + 5) inverse distance
for d in DIST_COLS:
    base = d.replace("min_dist_", "")  # e.g., cafes, supermarkets, parks_playgrounds

    # Proximity flags
    for thr in PROX_THRESHOLDS_M:
        df = df.withColumn(f"{base}_within_{int(thr)}m", proximity_flag(d, thr))

    # Inverse distance (0 if null or not enriched)
    df = df.withColumn(
        f"inv_{base}_dist",
        F.when((F.col("has_enrichment") == 1) & F.col(d).isNotNull(), 1.0 / (1.0 + F.col(d)))
         .otherwise(F.lit(0.0))
         .cast("double")
    )


display(df.select(
    "hotel_id", "grid_id", "has_enrichment",
    *COUNT_COLS,
    *[c for c in df.columns if c.startswith("log1p_count_")][:5]  # preview a few
).limit(5))


COUNT_COLS: ['count_attractions', 'count_cafes', 'count_coworking', 'count_museums', 'count_nightlife', 'count_parks_playgrounds', 'count_pharmacies', 'count_restaurants', 'count_supermarkets']
DIST_COLS: ['min_dist_attractions', 'min_dist_cafes', 'min_dist_coworking', 'min_dist_museums', 'min_dist_nightlife', 'min_dist_parks_playgrounds', 'min_dist_pharmacies', 'min_dist_restaurants', 'min_dist_supermarkets']


hotel_id,grid_id,has_enrichment,count_attractions,count_cafes,count_coworking,count_museums,count_nightlife,count_parks_playgrounds,count_pharmacies,count_restaurants,count_supermarkets,log1p_count_attractions,log1p_count_cafes,log1p_count_coworking,log1p_count_museums,log1p_count_nightlife
8908679,1329_5603,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0
2246827,12370_2668,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0
12275653,1520_4726,1,0,27,0,0,5,6,3,18,18,0.0,3.332204510175204,0.0,0.0,1.791759469228055
7877735,1246_5901,1,22,31,0,19,13,19,9,100,22,3.1354942159291497,3.4657359027997265,0.0,2.995732273553991,2.639057329615259
1985847,15288_-2027,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0
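
To make the five steps above easier to check by hand, here is a plain-Python sketch of what they compute for one listing and one POI category. It is an illustration under assumptions, not the pipeline code: `listing_features` and its output keys are hypothetical names, and the constants are copied from the parameters cell.

```python
import math
from typing import Optional

RADIUS_M = 800.0
PROX_THRESHOLDS_M = [200.0, 500.0]


def listing_features(count: Optional[int],
                     min_dist: Optional[float],
                     has_enrichment: int) -> dict:
    """Hypothetical single-row mirror of the Spark steps for one POI category."""
    # Step 1: a missing count becomes 0.
    count = 0 if count is None else int(count)
    # Step 2: cap the distance at RADIUS_M, but keep a missing distance missing.
    dist = None if min_dist is None else min(float(min_dist), RADIUS_M)
    # Flags and inverse distance only fire for enriched rows with a known distance.
    known = has_enrichment == 1 and dist is not None
    out = {
        "count": count,
        "min_dist": dist,
        "log1p_count": math.log1p(count),                   # step 3
        "inv_dist": 1.0 / (1.0 + dist) if known else 0.0,  # step 5
    }
    for thr in PROX_THRESHOLDS_M:                           # step 4
        out[f"within_{int(thr)}m"] = 1 if known and dist <= thr else 0
    return out


# An unenriched listing, and an enriched one with 27 cafes and an
# illustrative (made-up) nearest-cafe distance of 150 m.
print(listing_features(None, None, 0))
print(listing_features(27, 150.0, 1))
```

An unenriched listing and an enriched listing whose grid has no POIs produce the same zeros in every engineered column, so `has_enrichment` (or the enriched-only subset) is what separates the two cases for a model.
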


## 7) Save Outputs

In [0]:
OUT_FULL = f"{OUT_BASE}/booking_listing_features_full"
OUT_ENRICHED = f"{OUT_BASE}/booking_listing_features_enriched"

# Save full (overwrite)
df.write.mode("overwrite").parquet(OUT_FULL)
print("Saved FULL dataset to:", OUT_FULL)

# Save enriched-only subset
df_enriched = df.filter(F.col("has_enrichment") == 1)
df_enriched.write.mode("overwrite").parquet(OUT_ENRICHED)
print("Saved ENRICHED dataset to:", OUT_ENRICHED)

print("Enriched rows:", df_enriched.count())


Saved FULL dataset to: dbfs:/tmp/booking_stage4/booking_listing_features_full
Saved ENRICHED dataset to: dbfs:/tmp/booking_stage4/booking_listing_features_enriched
Enriched rows: 507220


## 8) Sanity Checks (Quick)

In [0]:
full = spark.read.parquet(OUT_FULL)
enr  = spark.read.parquet(OUT_ENRICHED)

display(full.select("has_enrichment").groupBy("has_enrichment").count())
display(enr.select("hotel_id", "grid_id", "count_cafes", "min_dist_cafes").limit(5))


has_enrichment,count
1,507220
0,2732171


hotel_id,grid_id,count_cafes,min_dist_cafes
11505077,531_6112,12,15.89462229269765
7582063,-149_5414,8,77.98197577344027
51504,16_5501,7,63.93609068902704
4763541,566_4848,25,43.35349802596985
427228,-395_4087,92,91.82577124742332
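
The group counts above can be reconciled with the earlier outputs using a few lines of arithmetic. This is a minimal sketch that hard-codes the numbers printed in this notebook; the variable names are illustrative only.

```python
total_rows = 3239391       # "Booking rows" printed in section 2
enriched_rows = 507220     # has_enrichment == 1
unenriched_rows = 2732171  # has_enrichment == 0

# The left join on a de-duplicated grid table should neither drop nor duplicate listings.
assert enriched_rows + unenriched_rows == total_rows

# The share of enriched rows should agree with the "Coverage fraction" from the join step.
coverage = enriched_rows / total_rows
print(f"coverage = {coverage:.6f}")
```

If the two groups did not sum to the original row count, the most likely cause would be duplicate `grid_id` values in the enrichment table, which is what the `dropDuplicates(["grid_id"])` call at load time guards against.
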


In [0]:
from pyspark.sql import functions as F


ENRICH_COUNT_COLS = [c for c in full.columns if c.startswith("count_")]
ENRICH_LOG_COLS   = [c for c in full.columns if c.startswith("log1p_count_")]
ENRICH_WITHIN_COLS = [c for c in full.columns if c.endswith("_within_200m")
                                           or c.endswith("_within_500m")]
ENRICH_INV_DIST_COLS = [c for c in full.columns if c.startswith("inv_") and c.endswith("_dist")]

# Optional spatial metadata
OPTIONAL_GRID_COLS = ["grid_id"]  # add grid_lat/grid_lon if you want


SCRAPED_FEATURE_COLS = (
    ["hotel_id", "has_enrichment"] +
    OPTIONAL_GRID_COLS +
    ENRICH_COUNT_COLS +
    ENRICH_LOG_COLS +
    ENRICH_WITHIN_COLS +
    ENRICH_INV_DIST_COLS
)

# Subtract 1 for the hotel_id key; the total still includes has_enrichment and grid_id.
print("Number of scraped ML features:", len(SCRAPED_FEATURE_COLS) - 1)
print("Preview feature names:")
print(SCRAPED_FEATURE_COLS[:15], "...")


scraped_features_ml = full.select(*SCRAPED_FEATURE_COLS)

print("Rows:", scraped_features_ml.count())
display(scraped_features_ml.limit(10))


Number of scraped ML features: 47
Preview feature names:
['hotel_id', 'has_enrichment', 'grid_id', 'count_attractions', 'count_cafes', 'count_coworking', 'count_museums', 'count_nightlife', 'count_parks_playgrounds', 'count_pharmacies', 'count_restaurants', 'count_supermarkets', 'log1p_count_attractions', 'log1p_count_cafes', 'log1p_count_coworking'] ...
Rows: 3239391


hotel_id,has_enrichment,grid_id,count_attractions,count_cafes,count_coworking,count_museums,count_nightlife,count_parks_playgrounds,count_pharmacies,count_restaurants,count_supermarkets,log1p_count_attractions,log1p_count_cafes,log1p_count_coworking,log1p_count_museums,log1p_count_nightlife,log1p_count_parks_playgrounds,log1p_count_pharmacies,log1p_count_restaurants,log1p_count_supermarkets,attractions_within_200m,attractions_within_500m,cafes_within_200m,cafes_within_500m,coworking_within_200m,coworking_within_500m,museums_within_200m,museums_within_500m,nightlife_within_200m,nightlife_within_500m,parks_playgrounds_within_200m,parks_playgrounds_within_500m,pharmacies_within_200m,pharmacies_within_500m,restaurants_within_200m,restaurants_within_500m,supermarkets_within_200m,supermarkets_within_500m,inv_attractions_dist,inv_cafes_dist,inv_coworking_dist,inv_museums_dist,inv_nightlife_dist,inv_parks_playgrounds_dist,inv_pharmacies_dist,inv_restaurants_dist,inv_supermarkets_dist
8908679,0,1329_5603,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2246827,0,12370_2668,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12275653,1,1520_4726,0,27,0,0,5,6,3,18,18,0.0,3.332204510175204,0.0,0.0,1.791759469228055,1.9459101490553128,1.3862943611198906,2.9444389791664403,2.9444389791664403,0,0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0.0,0.0153036841356339,0.0,0.0,0.0058273144981756,0.0063266317325675,0.0134012763755067,0.0084283977470315,0.0129830824928303
7877735,1,1246_5901,22,31,0,19,13,19,9,100,22,3.1354942159291497,3.4657359027997265,0.0,2.995732273553991,2.639057329615259,2.995732273553991,2.302585092994046,4.61512051684126,3.1354942159291497,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0.0088892505633872,0.0181223373616043,0.0,0.0091133810065073,0.0257114118144317,0.0060441873660903,0.0440120367088759,0.0160289511214863,0.0191414231546045
1985847,0,15288_-2027,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6921514,0,141_5332,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4767573,0,12537_3748,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
291872,0,857_5129,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1383532,0,-8386_516,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
11140846,0,12093_3945,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
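
The reported feature count can be reproduced from the naming scheme alone. The sketch below rebuilds the column names from the nine POI categories seen in `COUNT_COLS`; the list and variable names here are illustrative and are not read from the saved table.

```python
CATEGORIES = [
    "attractions", "cafes", "coworking", "museums", "nightlife",
    "parks_playgrounds", "pharmacies", "restaurants", "supermarkets",
]
THRESHOLDS_M = [200, 500]

key_cols = ["hotel_id", "has_enrichment", "grid_id"]
count_cols = [f"count_{c}" for c in CATEGORIES]
log_cols = [f"log1p_count_{c}" for c in CATEGORIES]
within_cols = [f"{c}_within_{t}m" for c in CATEGORIES for t in THRESHOLDS_M]
inv_cols = [f"inv_{c}_dist" for c in CATEGORIES]

all_cols = key_cols + count_cols + log_cols + within_cols + inv_cols
numeric_features = count_cols + log_cols + within_cols + inv_cols

print("All selected columns:", len(all_cols))                     # 48
print("Excluding hotel_id:", len(all_cols) - 1)                   # 47
print("Engineered numeric features:", len(numeric_features))      # 45
```

The 47 printed by the notebook therefore counts `has_enrichment` and the string-valued `grid_id` alongside the 45 engineered numeric columns, which is worth keeping in mind when assembling a model input vector.
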


In [0]:
OUT_SCRAPED_FEATURES = f"{OUT_BASE}/scraped_enrichment_features_ml"

scraped_features_ml.write.mode("overwrite").parquet(OUT_SCRAPED_FEATURES)

print("Saved scraped enrichment ML feature table to:")
print(OUT_SCRAPED_FEATURES)


Saved scraped enrichment ML feature table to:
dbfs:/tmp/booking_stage4/scraped_enrichment_features_ml
