# BabiGuide Data Cleaning Exploration

This notebook explores the data cleaning steps applied to the Abidjan local business dataset.

The **final cleaning steps** and the reproducible code are in the Python script `scripts/Cleaned_data.py`.

The **cleaned datasets** are saved in the `data/processed/` folder in both GeoJSON and CSV formats for further analysis.

This notebook mainly serves as an **exploratory environment** to test, inspect, and validate the cleaning process before it is finalized in the script.

Key steps included:
- Filtering and selecting relevant columns.
- Handling missing values (`name`, `amenity`, `shop`, `tourism`).
- Removing duplicates.
- Standardizing business types (amenity mapping and merging categories).
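- Adding dummy reviews and ratings for analysis purposes.
- Simplifying `opening_hours` to a single `HH:MM-HH:MM` range, filling missing values with randomly generated hours per business type, and deriving `duration_hours`.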

In [1]:
import os
import geopandas as gpd

# Data path
file_path = "../data/raw/abidjan_pois.geojson"

# Check if file exists
if os.path.isfile(file_path):
    print("File found!")
    # Load raw data
    gdf = gpd.read_file(file_path)
    # Inspect first few rows
    print(gdf.head())
else:
    print("File not found. Check the path:", file_path)

File found!
            name name:en           amenity man_made  shop tourism  \
0          Powex    None              fuel     None  None    None   
1           None    None            police     None  None    None   
2  ASA Formation    None           college     None  None    None   
3    Lavage Auto    None          car_wash     None  None    None   
4   Orange Money    None  bureau_de_change     None  None    None   

  opening_hours  beds rooms addr:full addr:housenumber addr:street addr:city  \
0          None  None  None      None             None        None      None   
1          None  None  None      None             None        None      None   
2          None  None  None      None             None        None      None   
3          None  None  None      None             None        None      None   
4          None  None  None      None             None        None      None   

  source name:fr       osm_id osm_type                  geometry  
0   None    None  1193538

In [2]:
print("Original rows:", len(gdf))

Original rows: 55875


In [3]:
print("Columns:", gdf.columns)

Columns: Index(['name', 'name:en', 'amenity', 'man_made', 'shop', 'tourism',
       'opening_hours', 'beds', 'rooms', 'addr:full', 'addr:housenumber',
       'addr:street', 'addr:city', 'source', 'name:fr', 'osm_id', 'osm_type',
       'geometry'],
      dtype='object')


In [4]:
print("Missing values per column:\n", gdf.isna().sum())

Missing values per column:
 name                15605
name:en             55630
amenity             29305
man_made            53244
shop                29613
tourism             54151
opening_hours       54428
beds                55874
rooms               55863
addr:full           55844
addr:housenumber    55720
addr:street         54943
addr:city           55049
source              45858
name:fr             54995
osm_id                  0
osm_type                0
geometry                0
dtype: int64
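
Raw counts are easier to compare as shares of the 55,875 rows. A minimal sketch using a few counts copied from the output above; the names `missing` and `missing_share` are illustrative and not part of the notebook:

```python
# Counts copied from the output above; only a few columns are shown.
TOTAL_ROWS = 55875
missing = {"name": 15605, "amenity": 29305, "shop": 29613, "tourism": 54151}

def missing_share(count, total=TOTAL_ROWS):
    """Return the percentage of rows with a missing value, rounded to 0.1."""
    return round(100 * count / total, 1)

shares = {col: missing_share(n) for col, n in missing.items()}
for col, pct in shares.items():
    print(f"{col:<8} {pct:5.1f}%")
```

On the GeoDataFrame itself, the same figures should come from `gdf.isna().mean() * 100`.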


In [5]:
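# Keep only the columns needed downstream
columns_to_keep = ['name', 'amenity', 'shop', 'tourism', 'osm_id', 'osm_type', 'geometry', 'opening_hours']
gdf_clean = gdf[columns_to_keep]
# Keep rows that have an amenity tag; rows tagged only with shop or tourism are dropped here
gdf_clean = gdf_clean.dropna(subset=['amenity'])
# Remove exact duplicate rows, then renumber the index
gdf_clean = gdf_clean.drop_duplicates()
gdf_clean = gdf_clean.reset_index(drop=True)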

In [6]:
print("Number of rows before cleaning:", len(gdf))
print("Number of rows after cleaning:", len(gdf_clean))
print("Missing values per column:\n", gdf_clean.isnull().sum())
print("Columns in cleaned data:\n", gdf_clean.columns)

Number of rows before cleaning: 55875
Number of rows after cleaning: 26570
Missing values per column:
 name              5836
amenity              0
shop             26435
tourism          26562
osm_id               0
osm_type             0
geometry             0
opening_hours    25552
dtype: int64
Columns in cleaned data:
 Index(['name', 'amenity', 'shop', 'tourism', 'osm_id', 'osm_type', 'geometry',
       'opening_hours'],
      dtype='object')
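
The row counts can be reconciled by hand. The raw data had 29,305 missing `amenity` values, and 55,875 - 29,305 = 26,570, which is exactly the cleaned row count. In this run, `dropna` therefore accounts for the whole reduction and `drop_duplicates` removed nothing. A short check of that arithmetic, with illustrative variable names:

```python
rows_before = 55875       # len(gdf)
missing_amenity = 29305   # gdf['amenity'].isna().sum()
rows_after = 26570        # len(gdf_clean)

# Rows left once every record without an amenity tag is dropped
rows_after_dropna = rows_before - missing_amenity
# Whatever is still unaccounted for must have been removed as a duplicate
duplicates_removed = rows_after_dropna - rows_after

print("After dropna:", rows_after_dropna)
print("Removed by drop_duplicates:", duplicates_removed)
```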


In [7]:
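# Fill missing text fields with placeholders ('None' here is a literal string, not a missing value)
gdf_clean['name'] = gdf_clean['name'].fillna('Unknown')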
gdf_clean['shop'] = gdf_clean['shop'].fillna('None')
gdf_clean['tourism'] = gdf_clean['tourism'].fillna('None')

In [8]:
print("Missing values per column:\n", gdf_clean.isnull().sum())

Missing values per column:
 name                 0
amenity              0
shop                 0
tourism              0
osm_id               0
osm_type             0
geometry             0
opening_hours    25552
dtype: int64


In [9]:
import random
import numpy as np

# Seed both generators for reproducibility (only the standard `random` module is used below)
random.seed(42)
np.random.seed(42)

# Add dummy reviews (number of reviews)
gdf_clean['reviews'] = [random.randint(0, 500) for _ in range(len(gdf_clean))]

# Add dummy ratings (1-5 stars, 1 decimal)
gdf_clean['rating'] = [round(random.uniform(1, 5), 1) for _ in range(len(gdf_clean))]

# Check
print(gdf_clean[['name', 'amenity', 'reviews', 'rating']].head())

            name           amenity  reviews  rating
0          Powex              fuel      327     2.1
1        Unknown            police       57     1.0
2  ASA Formation           college       12     3.7
3    Lavage Auto          car_wash      379     2.7
4   Orange Money  bureau_de_change      140     4.4
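
The cell above seeds the global `random` state, so any other code that draws from or reseeds that state changes the generated values. A sketch of an alternative that keeps the randomness inside a private generator object. The helper `make_dummy_scores` is a hypothetical name, and its output is not meant to reproduce the values printed above:

```python
import random

def make_dummy_scores(n_rows, seed=42):
    """Return (reviews, ratings) lists of length n_rows from a private RNG."""
    rng = random.Random(seed)  # independent of the global random state
    # randint includes both bounds, so review counts fall in 0..500
    reviews = [rng.randint(0, 500) for _ in range(n_rows)]
    # Ratings between 1.0 and 5.0, rounded to one decimal
    ratings = [round(rng.uniform(1, 5), 1) for _ in range(n_rows)]
    return reviews, ratings

reviews, ratings = make_dummy_scores(5)
print(reviews)
print(ratings)
```

Calling the helper twice with the same seed returns identical lists, which makes the dummy columns easy to regenerate.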


In [10]:
print(gdf_clean['amenity'].unique())

['fuel' 'police' 'college' 'car_wash' 'bureau_de_change' 'restaurant'
 'community_centre' 'events_venue' 'bank' 'place_of_worship' 'food_court'
 'pub' 'post_office' 'school' 'courthouse' 'bus_station' 'ferry_terminal'
 'clinic' 'marketplace' 'pharmacy' 'university' 'weighbridge' 'doctors'
 'money_transfer' 'bicycle_parking' 'taxi' 'nightclub' 'parking'
 'water_point' 'bar' 'toilets' 'cafe' 'prison' 'driving_school' 'mortuary'
 'ice_cream' 'waste_basket' 'hospital' 'shelter' 'townhall'
 'public_building' 'government' 'nursing_home' 'products' 'atm'
 'waste_disposal' 'internet_cafe' 'drinking_water' 'monastery' 'fast_food'
 'motorcycle_repair' 'social_facility' 'garage auto' 'social_centre'
 'veterinary' 'bench' 'car_rental' 'kindergarten' 'bicycle_repair_station'
 'recycling' 'health_post' 'Salon de coiffure DAME' 'vending_machine'
 'Magasin de meche' 'telephone' 'parking_space' 'post_box'
 'Maria coiffure' 'Garage auto' 'library' 'motocycle_repair' 'stripclub'
 'arts_centre' 'baby_hatc

In [11]:
print(gdf_clean['shop'].unique())

['None' 'yes' 'no' 'beverages' 'hardware' 'car_repair' 'coffee'
 'motorcycle_repair' 'bakery' 'beauty' 'fuel' 'Kiosque café'
 'Buvette traditionnelle' 'money_transfer' 'alcohol' 'kiosk' 'tattoo'
 'computer' 'dry_cleaning' 'Maquis kaplin' 'pastry' 'funeral_directors'
 'religion' 'supermarket' 'ice_cream' 'copyshop' 'chemist' 'car' 'music'
 'optician' 'orange money' 'seafood' 'jewelry']


In [12]:
print(gdf_clean['tourism'].unique())

['None' 'attraction' 'hotel']
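
The unique values above mix standard OSM tags with free-text entries that differ only by case or spelling (`'garage auto'` vs `'Garage auto'`, `'motocycle_repair'`). A minimal normalization sketch, assuming that case-folding and a small typo table are acceptable; `normalize_tag` and `TYPO_FIXES` are illustrative names, not part of the notebook:

```python
# Assumed correction table; extend it as further typos are found.
TYPO_FIXES = {"motocycle_repair": "motorcycle_repair"}

def normalize_tag(value):
    """Trim and collapse whitespace, case-fold, then apply known typo fixes."""
    if value is None:
        return None
    cleaned = " ".join(str(value).split()).casefold()
    return TYPO_FIXES.get(cleaned, cleaned)

samples = ["garage auto", "Garage auto", " motocycle_repair", "fuel"]
normalized = sorted({normalize_tag(s) for s in samples})
print(normalized)  # ['fuel', 'garage auto', 'motorcycle_repair']
```

With pandas this could be applied as `df['amenity'].map(normalize_tag, na_action='ignore')` before the business/non-business split.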


In [13]:
import pandas as pd

df = gdf_clean.copy()

# Replace 'None' strings with real NA
df["amenity"] = df["amenity"].replace("None", pd.NA)
df["shop"] = df["shop"].replace("None", pd.NA)
df["tourism"] = df["tourism"].replace("None", pd.NA)

In [14]:
# Define which amenities are considered local businesses vs non-business/public
business_amenities = [
    'fuel', 'car_wash', 'bureau_de_change', 'restaurant', 'bank', 'food_court',
    'pub', 'clinic', 'pharmacy', 'doctors', 'money_transfer', 'nightclub',
    'bar', 'marketplace', 'driving_school', 'ice_cream', 'atm', 'internet_cafe',
    'cafe', 'fast_food', 'motorcycle_repair', 'garage auto', 'veterinary',
    'car_rental', 'bicycle_repair_station', 'stripclub', 'studio', 'boat_rental',
    'coworking_space', 'cinema', 'dentist', 'brothel', 'casino',
    'mobile_money_agent', 'shipping', 'car_sharing', 'microfinance_bank',
    'theatre', 'music_school', 'conference_centre', "O'TOPAZ, Pâtisserie",
    'charging_station', 'cars', 'tattoos', 'Pressing', 'animal_breeding', 'taxi', 'parking',
    'parking_space', 'motorcycle_parking'
]

non_business_amenities = [
    'college', 'school', 'university', 'kindergarten', 'prep_school', 'language_school',
    'community_centre', 'social_facility', 'social_centre', 'public_building', 'government',
    'townhall', 'reception_desk', 'research_institute', 'Etablissements sanitaires public',
    'place_of_worship', 'monastery', 'church', 'police', 'fire_station', 'ranger_station',
    'prison', 'mortuary', 'nursing_home', 'health_post', 'health_facility', 'medical_imaging',
    'water_point', 'drinking_water', 'waste_basket', 'toilets', 'recycling',
    'waste_transfer_station', 'sanitary_dump_station', 'watering_place', 'arts_centre', 'library', 'baby_hatch',
    'fountain', 'hookah_lounge', 'shelter', 'grave_yard', 'hunting_stand',
    'first_aid', 'fédération', 'transportation', 'clock', 'Administration prive', 'Garba'
]
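
Any amenity that appears in neither list is silently dropped by the later filtering, so it is worth listing those values explicitly. A sketch using set differences; the three collections below are short excerpts chosen for illustration, not the full lists:

```python
# Excerpts only: a few values from the unique() output and from each list.
observed = {"fuel", "police", "hospital", "Garage auto", "bench"}
business = {"fuel", "garage auto", "restaurant"}
non_business = {"police", "school"}

# Values seen in the data that neither list classifies
unclassified = observed - business - non_business
# Values that were accidentally placed in both lists
overlap = business & non_business

print("Unclassified:", sorted(unclassified))
print("In both lists:", sorted(overlap))
```

In this excerpt, `'hospital'`, `'bench'` and the capitalised `'Garage auto'` come out as unclassified, which matches the full lists above: none of them contains those exact strings.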



In [15]:
# Clean shops (drop meaningless "yes"/"no")
df["shop"] = df["shop"].replace({"yes": pd.NA, "no": pd.NA})

# Tourism options
include_attraction = False  # change to True if you want to keep attractions
if include_attraction:
    business_tourism = ["hotel", "attraction"]
else:
    business_tourism = ["hotel"]

# Create unified "business_type" column
def get_business_type(row):
    if pd.notna(row["shop"]):
        return row["shop"]
    elif pd.notna(row["amenity"]) and row["amenity"] in business_amenities:
        return row["amenity"]
    elif pd.notna(row["tourism"]) and row["tourism"] in business_tourism:
        return row["tourism"]
    else:
        return None

df["business_type"] = df.apply(get_business_type, axis=1)

# Drop rows without a business type
df = df.dropna(subset=["business_type"])
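
The row-wise `apply` above encodes a precedence order: `shop` first, then a whitelisted `amenity`, then a whitelisted `tourism` value. A vectorized sketch of the same precedence on a toy frame, assuming pandas is available; the toy data and the shortened lists are illustrative only:

```python
import pandas as pd

business_amenities = ["fuel", "restaurant"]  # excerpt of the full list
business_tourism = ["hotel"]

toy = pd.DataFrame({
    "shop":    ["bakery", None, None, None],
    "amenity": ["restaurant", "fuel", "school", None],
    "tourism": [None, None, None, "hotel"],
})

# Blank out values that are not on the respective whitelist
amenity_ok = toy["amenity"].where(toy["amenity"].isin(business_amenities))
tourism_ok = toy["tourism"].where(toy["tourism"].isin(business_tourism))

# shop wins, then amenity, then tourism; rows matching nothing stay missing
toy["business_type"] = toy["shop"].fillna(amenity_ok).fillna(tourism_ok)
print(toy)
```

The third row (`school`) ends up without a business type and would be removed by the subsequent `dropna`.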

In [16]:
print("Remaining rows after filtering:", len(df))
print(df["business_type"].value_counts().head(20))

Remaining rows after filtering: 17568
business_type
restaurant        4219
pub               3441
money_transfer    1930
cafe              1646
pharmacy           967
bank               756
doctors            694
fuel               694
bar                581
car_wash           471
internet_cafe      407
marketplace        331
clinic             297
fast_food          210
driving_school     113
nightclub          108
food_court          98
ice_cream           86
parking             72
atm                 49
Name: count, dtype: int64


In [17]:
# Count each business_type
counts = df['business_type'].value_counts()

# Show only those with less than 10 occurrences
rare_types = counts[counts < 10]
print(rare_types)


business_type
bakery                    9
pastry                    9
motorcycle_parking        7
casino                    7
brothel                   5
beverages                 5
animal_breeding           5
parking_space             4
car_repair                4
beauty                    4
chemist                   4
charging_station          3
copyshop                  3
optician                  3
stripclub                 2
alcohol                   2
boat_rental               2
microfinance_bank         2
religion                  2
computer                  2
music_school              2
Kiosque café              2
Buvette traditionnelle    1
coffee                    1
hardware                  1
coworking_space           1
shipping                  1
dry_cleaning              1
Maquis kaplin             1
tattoo                    1
funeral_directors         1
theatre                   1
car_sharing               1
supermarket               1
car                       1
music 

In [18]:
# Mapping for fixing / merging business types
mapping = {
    # Merge into restaurant
    'bakery': 'restaurant',
    'pastry': 'restaurant',
    'Maquis kaplin': 'restaurant',
    "O'TOPAZ, Pâtisserie": 'restaurant',
    'seafood': 'restaurant',

    # Merge into cafe
    'coffee': 'cafe',
    'Kiosque café': 'cafe',
    'Buvette traditionnelle': 'cafe',

    # Merge into bar/pub
    'alcohol': 'bar',
    'beverages': 'bar',

    # Merge into pharmacy
    'chemist': 'pharmacy',
    'optician': 'pharmacy',

    # Merge into internet cafe
    'copyshop': 'internet_cafe',
    'computer': 'internet_cafe',

    # Merge into bank
    'microfinance_bank': 'bank',

    # Merge into mobile money agent
    'orange money': 'mobile_money_agent',

    # Merge into marketplace
    'hardware': 'marketplace',
    'jewelry': 'marketplace',
    'supermarket': 'marketplace',

    # Merge into studio
    'tattoo': 'studio',
    'music_school': 'studio',

    # Merge into cinema
    'theatre': 'cinema',

    # Merge into car_rental
    'car_sharing': 'car_rental',

    # Merge into parking
    'motorcycle_parking': 'parking',
    'parking_space': 'parking',

    # Merge into clinic
    'beauty': 'clinic',

    # Merge into veterinary
    'animal_breeding': 'veterinary'

}

to_drop = [
    'casino',
    'brothel',
    'stripclub',
    'religion',
    'car',
    'music',
    'shipping',        
    'dry_cleaning',    
    'funeral_directors',
    'coworking_space',
    'conference_centre',
    'charging_station',
    'boat_rental',
]

# Apply mapping + dropping
df['business_type'] = df['business_type'].replace(mapping)
df = df[~df['business_type'].isin(to_drop)]

# Check result counts again
print(df['business_type'].value_counts())


business_type
restaurant                4240
pub                       3441
money_transfer            1930
cafe                      1650
pharmacy                   974
bank                       758
fuel                       694
doctors                    694
bar                        588
car_wash                   471
internet_cafe              412
marketplace                334
clinic                     301
fast_food                  210
driving_school             113
nightclub                  108
food_court                  98
ice_cream                   86
parking                     83
atm                         49
bicycle_repair_station      46
dentist                     41
car_rental                  39
bureau_de_change            34
kiosk                       30
veterinary                  25
mobile_money_agent          23
studio                      20
taxi                        17
motorcycle_repair           14
cinema                      13
car_repair               
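
Merging and dropping are defined in two separate structures, so they can contradict each other, for example when a merge target is later dropped. A small consistency check, shown on excerpts of `mapping` and `to_drop`; the function `check_mapping` is a hypothetical helper:

```python
def check_mapping(mapping, to_drop):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for source, target in mapping.items():
        if target in mapping:
            problems.append(f"{source!r} maps to {target!r}, which is itself remapped")
        if target in to_drop:
            problems.append(f"{source!r} maps to {target!r}, which is dropped")
        if source in to_drop:
            problems.append(f"{source!r} is both remapped and dropped")
    return problems

mapping_excerpt = {"bakery": "restaurant", "coffee": "cafe", "theatre": "cinema"}
to_drop_excerpt = ["casino", "religion", "car"]

issues = check_mapping(mapping_excerpt, to_drop_excerpt)
print(issues)  # []
```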

In [19]:
print("Remaining rows after filtering:", len(df))

Remaining rows after filtering: 17540


In [20]:
# Find rows that don't match the pattern "HH:MM-HH:MM"
import re

pattern = re.compile(r"^\d{2}:\d{2}-\d{2}:\d{2}$")

bad_rows = df[~df['opening_hours'].astype(str).str.match(pattern, na=False)]

print("Number of bad rows:", len(bad_rows))
print(bad_rows['opening_hours'].unique())

Number of bad rows: 17526
[None 'Mo-Fr 08:30-16:00; Sa 09:00-14:00; PH off'
 'Mo-Fr 07:30-13:00, 14:00-16:30' 'Mo-Su 09:00-23:00' 'Mo-Su 09:00-22:00'
 'Tu-Su 08:00-22:00' '24/7' 'Mo-Su 08:00-15:00' 'Mo-Sa 08:00-17:00'
 'Mo-Fr 08:00-12:00' 'Mo-Fr 11:00-15:00,18:00-02:00; Sa-Su 18:00-02:00'
 'Mo-Fr 09:00-15:00; Sa 09:00-12:00' 'Mo-Fr 09:00-17:00'
 'Mo-Fr 08:00-18:00' 'PH,Mo-Su 11:00-23:00+' '0/7'
 'Mo-Su 07:00-23:30; Sa 09:00-17:00; PH unknown'
 'Mo 07:00-19:00; Tu-Sa 07:30-00:00; Su 07:00-17:00' 'Mo-Fr 07:00-18:00'
 'Mo-Su 09:00-00:00' 'Mo-Su 11:30-15:00; Mo-Su 17:45-23:00' 'Mo-Su'
 'Mo-Fr 12:00-00:00' 'Mo-Su 18:00-04:00' 'Mo-Su 11:00-24:00'
 'Mo-Fr 00:00-24:00' 'Mo-Su 12:00-14:30, 18:00-22:30'
 'Mo-Fr 08:00-12:00, 14:00-18:00' 'Mo-Fr 08:00-15:00; Sa 09:00-12:00'
 'Tu-Su; Su PM off; Mo off' 'Mo-Sa 07:30-19:00' 'Mo-Fr 07:30-16:00'
 'Mo-Sa 09:00-02:00' 'Mo-Su 08:00-20:00' 'Mo-Su 07:00-23:00'
 'Mo-Fr 08:30-15:00' 'Mo-Fr 09:00-14:00'
 'Mo, We 14:00-18:00; Tu, Th-Fr 09:00-18:00; Sa 09:00-13:

In [21]:
import re
import numpy as np

def simplify_opening_hours(value):
    if pd.isna(value):
        return np.nan

    value = str(value).strip()

    # 24/7 patterns
    if re.search(r'24/?7|00:00-24:00|00:00-00:00', value):
        return "00:00-23:59"

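    # Closed or unknown hours; note this discards the whole value, including any valid ranges
    # (e.g. 'Mo-Fr 08:30-16:00; Sa 09:00-14:00; PH off' becomes NaN)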
    if re.search(r'closed|off|unknown|n/a', value, re.IGNORECASE):
        return np.nan

    # Extract all HH:MM-HH:MM patterns
    matches = re.findall(r'(\d{1,2}:\d{2})-(\d{1,2}:\d{2})', value)
    if matches:
        opens = [m[0] for m in matches]
        closes = [m[1] for m in matches]

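        # Zero-pad hours (e.g. "9:00" -> "09:00") so string sorting matches time order
        opens_sorted = sorted(t.zfill(5) for t in opens)
        closes_sorted = sorted(t.zfill(5) for t in closes)

        # Earliest open, latest close (overnight closes such as "02:00" sort first)
        return f"{opens_sorted[0]}-{closes_sorted[-1]}"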

    # Single time like "08:18"
    single = re.match(r'^\d{1,2}:\d{2}$', value)
    if single:
        t = single.group(0)
        return f"{t}-{t}"

    # Couldn’t parse → NaN
    return np.nan

# Apply simplification
df['opening_hours'] = df['opening_hours'].apply(simplify_opening_hours)
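
Comparing `HH:MM` strings treats an overnight close such as `02:00` as earlier than `15:00`, so for a value like `'Mo-Fr 11:00-15:00,18:00-02:00; Sa-Su 18:00-02:00'` the function above reports `15:00` as the latest close. A sketch of an alternative that works in minutes after midnight and treats a close at or before its own open as next-day. The names `span_in_minutes`, `fmt` and `simplify` are illustrative; this is not the notebook's method:

```python
import re

RANGE = re.compile(r"(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})")

def span_in_minutes(value):
    """Return (earliest_open, latest_close) in minutes after midnight, or None."""
    spans = []
    for oh, om, ch, cm in RANGE.findall(value):
        start = int(oh) * 60 + int(om)
        end = int(ch) * 60 + int(cm)
        if end <= start:          # closes after midnight: push to the next day
            end += 24 * 60
        spans.append((start, end))
    if not spans:
        return None
    return min(s for s, _ in spans), max(e for _, e in spans)

def fmt(minutes):
    """Format minutes after midnight as zero-padded HH:MM (wrapping at 24 h)."""
    minutes %= 24 * 60
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

def simplify(value):
    span = span_in_minutes(value)
    if span is None:
        return None
    return f"{fmt(span[0])}-{fmt(span[1])}"

print(simplify("Mo-Fr 11:00-15:00,18:00-02:00; Sa-Su 18:00-02:00"))  # 11:00-02:00
print(simplify("Mo-Fr 07:30-13:00, 14:00-16:30"))                    # 07:30-16:30
print(simplify("Mo-Su"))                                             # None
```

Like the original, this still ignores the day-of-week part and collapses everything into a single daily range.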


In [22]:
import random
import pandas as pd

# Set seed for reproducibility
random.seed(42)

# Define opening hours ranges
opening_hours_range = {
    "restaurant": [(10, 12), (21, 23)],
    "pub": [(16, 19), (0, 2)],
    "money_transfer": [(8, 10), (16, 18)],
    "cafe": [(6, 8), (20, 23)],
    "pharmacy": [(8, 10), (20, 23)],
    "bank": [(8, 9), (14, 16)],
    "fuel": [(0, 0), (23, 23)],  # 24/7
    "doctors": [(9, 11), (16, 19)],
    "bar": [(17, 19), (1, 3)],
    "car_wash": [(8, 9), (17, 19)],
    "internet_cafe": [(9, 11), (22, 0)],
    "marketplace": [(7, 9), (16, 18)],
    "clinic": [(9, 11), (17, 19)],
    "fast_food": [(11, 13), (22, 0)],
    "driving_school": [(9, 11), (15, 17)],
    "nightclub": [(21, 23), (3, 6)],
    "food_court": [(10, 12), (22, 0)],
    "ice_cream": [(11, 13), (20, 22)],
    "parking": [(0, 0), (23, 23)],  # 24/7
    "atm": [(0, 0), (23, 23)],      # 24/7
    "bicycle_repair_station": [(9, 11), (17, 19)],
    "dentist": [(9, 11), (17, 19)],
    "car_rental": [(8, 10), (18, 20)],
    "bureau_de_change": [(9, 11), (17, 19)],
    "kiosk": [(7, 9), (21, 23)],
    "veterinary": [(9, 11), (17, 19)],
    "mobile_money_agent": [(8, 10), (17, 19)],
    "studio": [(10, 12), (20, 22)],
    "taxi": [(0, 0), (23, 23)],      # 24/7
    "motorcycle_repair": [(8, 10), (17, 19)],
    "cinema": [(14, 16), (23, 1)],
    "car_repair": [(8, 10), (18, 20)],
}

def random_opening_hours(btype):
    if btype not in opening_hours_range:
        return "09:00-18:00"

    (open_start, open_end), (close_start, close_end) = opening_hours_range[btype]

    # --- Handle 24/7 ---
    if open_start == 0 and open_end == 0 and close_start == 23 and close_end == 23:
        return "00:00-23:59"

    # --- Pick open hour safely ---
    if open_start > open_end:  # wraparound (e.g. 21–2)
        open_hour = random.choice(list(range(open_start, 24)) + list(range(0, open_end+1)))
    else:
        open_hour = random.randint(open_start, open_end)

    # --- Pick close hour safely ---
    if close_start > close_end:  # wraparound close range
        close_hour = random.choice(list(range(close_start, 24)) + list(range(0, close_end+1)))
    else:
        close_hour = random.randint(close_start, close_end)

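    # --- Ensure closing is not before opening (same day) ---
    # A closing range that wraps midnight, or that lies entirely before the
    # opening range (e.g. pub: opens 16-19, closes 0-2), means next-day closing.
    overnight = (close_start > close_end) or (close_end < open_start)
    if (close_hour <= open_hour) and not overnight:
        close_hour = open_hour + 1
        if close_hour >= 24:
            close_hour = 23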

    return f"{open_hour:02d}:00-{close_hour:02d}:00"

# Fill missing opening hours with randomly generated hours for the row's business type
df['opening_hours'] = df['opening_hours'].replace('None', pd.NA)
df['opening_hours'] = df.apply(
    lambda row: row['opening_hours'] if pd.notna(row['opening_hours']) else random_opening_hours(row['business_type']),
    axis=1
)
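
The open and close branches above repeat the same wraparound logic. A sketch of a single helper that picks an hour from an inclusive range and wraps past midnight using modular arithmetic; `pick_hour` is a hypothetical name, and the optional `rng` argument is an assumption added to make the helper testable:

```python
import random

def pick_hour(start, end, rng=random):
    """Pick an hour uniformly from start..end inclusive, wrapping past midnight."""
    span = (end - start) % 24              # steps from start to end, e.g. 22 -> 0 is 2
    return (start + rng.randint(0, span)) % 24

rng = random.Random(0)
wrapped = {pick_hour(22, 0, rng) for _ in range(200)}   # only 22, 23 or 0 can occur
same_day = {pick_hour(9, 11, rng) for _ in range(200)}  # only 9, 10 or 11 can occur
print(sorted(wrapped), sorted(same_day))
```

For a range such as `(22, 0)` this draws from the same three hours as the list-based `random.choice` in the cell above.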

In [23]:
print(df.head())

           name           amenity  shop tourism       osm_id osm_type  \
0         Powex              fuel  <NA>    <NA>  11935384514    nodes   
3   Lavage Auto          car_wash  <NA>    <NA>   3936374511    nodes   
4  Orange Money  bureau_de_change  <NA>    <NA>   3931437128    nodes   
5   Lavage Auto          car_wash  <NA>    <NA>   3931437125    nodes   
6    Restaurant        restaurant  <NA>    <NA>   3930671817    nodes   

                   geometry opening_hours  reviews  rating     business_type  
0  POINT (-5.41568 6.73264)   00:00-23:59      327     2.1              fuel  
3  POINT (-4.04981 5.32225)   08:00-17:00      379     2.7          car_wash  
4   POINT (-4.0582 5.32378)   11:00-18:00      140     4.4  bureau_de_change  
5    POINT (-4.057 5.32371)   08:00-17:00      125     4.4          car_wash  
6  POINT (-4.05483 5.32221)   10:00-23:00      114     2.8        restaurant  


In [24]:
# Make sure opening_hours is string (NaN -> empty string)
df['opening_hours'] = df['opening_hours'].fillna("").astype(str)

# Split safely
df[['open_str', 'close_str']] = df['opening_hours'].str.split('-', n=1, expand=True)

# Convert only valid times
df['open_time'] = pd.to_datetime(df['open_str'], format="%H:%M", errors='coerce').dt.time
df['close_time'] = pd.to_datetime(df['close_str'], format="%H:%M", errors='coerce').dt.time

# Drop helpers
df = df.drop(columns=['open_str', 'close_str'])


In [25]:
print(df.head())

           name           amenity  shop tourism       osm_id osm_type  \
0         Powex              fuel  <NA>    <NA>  11935384514    nodes   
3   Lavage Auto          car_wash  <NA>    <NA>   3936374511    nodes   
4  Orange Money  bureau_de_change  <NA>    <NA>   3931437128    nodes   
5   Lavage Auto          car_wash  <NA>    <NA>   3931437125    nodes   
6    Restaurant        restaurant  <NA>    <NA>   3930671817    nodes   

                   geometry opening_hours  reviews  rating     business_type  \
0  POINT (-5.41568 6.73264)   00:00-23:59      327     2.1              fuel   
3  POINT (-4.04981 5.32225)   08:00-17:00      379     2.7          car_wash   
4   POINT (-4.0582 5.32378)   11:00-18:00      140     4.4  bureau_de_change   
5    POINT (-4.057 5.32371)   08:00-17:00      125     4.4          car_wash   
6  POINT (-4.05483 5.32221)   10:00-23:00      114     2.8        restaurant   

  open_time close_time  
0  00:00:00   23:59:00  
3  08:00:00   17:00:00  
4  11

In [26]:
# Re-check after cleaning: every row should now match the pattern "HH:MM-HH:MM"
import re

pattern = re.compile(r"^\d{2}:\d{2}-\d{2}:\d{2}$")

bad_rows = df[~df['opening_hours'].astype(str).str.match(pattern, na=False)]

print("Number of bad rows:", len(bad_rows))
print(bad_rows['opening_hours'].unique())

Number of bad rows: 0
[]


In [27]:
import pandas as pd
import datetime

# Replace 24:00 with 00:00 (literal match; %H does not accept hour 24)
df['opening_hours'] = df['opening_hours'].str.replace("24:00", "00:00", regex=False)

# Split into open and close
df[['open_str', 'close_str']] = df['opening_hours'].str.split('-', expand=True)

# Convert to datetime.time
df['open_time'] = pd.to_datetime(df['open_str'], format="%H:%M").dt.time
df['close_time'] = pd.to_datetime(df['close_str'], format="%H:%M").dt.time

# Function to calculate duration
def calculate_duration(open_t, close_t):
    open_dt = datetime.datetime.combine(datetime.date.today(), open_t)
    close_dt = datetime.datetime.combine(datetime.date.today(), close_t)
    
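    # Handle overnight times (identical open and close counts as a full day)
    if close_dt <= open_dt:
        close_dt += datetime.timedelta(days=1)
    
    # total_seconds() includes the days component, unlike .seconds
    return (close_dt - open_dt).total_seconds() / 3600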

# Apply the function
df['duration_hours'] = df.apply(lambda row: calculate_duration(row['open_time'], row['close_time']), axis=1)
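
The duration can also be computed directly from the `HH:MM-HH:MM` string with integer minutes, which avoids building `datetime` objects for every row. A minimal sketch; `duration_hours` is an illustrative function name, and counting a range whose close equals its open as 24 hours is an assumption of this sketch:

```python
def duration_hours(opening_hours):
    """Length of an 'HH:MM-HH:MM' range in hours; close <= open means next day."""
    open_str, close_str = opening_hours.split("-", 1)
    oh, om = map(int, open_str.split(":"))
    ch, cm = map(int, close_str.split(":"))
    minutes = (ch * 60 + cm) - (oh * 60 + om)
    if minutes <= 0:              # overnight range: add one full day
        minutes += 24 * 60
    return minutes / 60

for value in ["08:00-17:00", "18:00-02:00", "00:00-23:59"]:
    print(value, round(duration_hours(value), 2))
```

The three sample ranges give 9.0, 8.0 and 23.98 hours respectively.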


In [28]:
df.drop(columns=['open_str', 'close_str'], inplace=True)


In [29]:
print(df.head())

           name           amenity  shop tourism       osm_id osm_type  \
0         Powex              fuel  <NA>    <NA>  11935384514    nodes   
3   Lavage Auto          car_wash  <NA>    <NA>   3936374511    nodes   
4  Orange Money  bureau_de_change  <NA>    <NA>   3931437128    nodes   
5   Lavage Auto          car_wash  <NA>    <NA>   3931437125    nodes   
6    Restaurant        restaurant  <NA>    <NA>   3930671817    nodes   

                   geometry opening_hours  reviews  rating     business_type  \
0  POINT (-5.41568 6.73264)   00:00-23:59      327     2.1              fuel   
3  POINT (-4.04981 5.32225)   08:00-17:00      379     2.7          car_wash   
4   POINT (-4.0582 5.32378)   11:00-18:00      140     4.4  bureau_de_change   
5    POINT (-4.057 5.32371)   08:00-17:00      125     4.4          car_wash   
6  POINT (-4.05483 5.32221)   10:00-23:00      114     2.8        restaurant   

  open_time close_time  duration_hours  
0  00:00:00   23:59:00       23.983333 