## Step 1: Load the data ----------------------------------

In [1]:
import pandas as pan
import numpy as num
import matplotlib.pyplot as matpl

LasVegasTrip = pan.read_csv("LasVegasTripAdvisorReviews-Dataset.csv", sep=";")
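
The `sep=";"` argument matters because this file separates fields with semicolons rather than commas. The sketch below uses a small made-up sample (the rows and the three-column layout are assumptions for illustration, not the real file) to show what happens with and without it.

```python
import io
import pandas as pan

# Hypothetical two-row sample in a semicolon-separated layout.
sample = "User country;Score;Hotel name\nUSA;5;Hotel A\nUK;4;Hotel B\n"

# With the default comma separator, each whole line lands in a single column.
wrong = pan.read_csv(io.StringIO(sample))

# With sep=";" every field becomes its own column.
right = pan.read_csv(io.StringIO(sample), sep=";")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)
```

A quick look at `.shape` or `.columns` right after loading is usually enough to catch a wrong separator.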

## Step 2: Data cleaning -----------------------------------------

1. Check for missing values:

In [2]:
# Check how many missing values each column has:
LasVegasTrip.isnull().sum()


User country          35
Nr. reviews           35
Nr. hotel reviews     35
Helpful votes         35
Score                 48
Period of stay        45
Traveler type         35
Pool                  35
Gym                   35
Tennis court          35
Spa                   51
status               526
Casino                35
Free internet         35
Hotel name            35
Unnamed: 15          539
Hotel stars           35
Nr. rooms             35
User continent        35
Member years          35
Review month          35
Review weekday        35
dtype: int64
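
Raw counts are easier to judge as a share of all rows. In the output above, `status` and `Unnamed: 15` stand out with far more missing values than any other column. The sketch below shows one way to compute the percentage of missing values per column; the toy frame is invented for illustration.

```python
import numpy as num
import pandas as pan

# Toy frame with made-up values; "status" is entirely empty.
toy = pan.DataFrame({
    "Score": [5.0, num.nan, 4.0, 3.0],
    "Spa": ["YES", None, None, "NO"],
    "status": [None, None, None, None],
})

# isnull() gives True/False per cell; the mean of booleans is the missing share.
missing_share = toy.isnull().mean() * 100

print(missing_share.sort_values(ascending=False))
```

Columns near 100 % are candidates for dropping entirely, while columns with a small share are usually better handled row by row.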

2. Remove missing data:

In [3]:
# Drop the two mostly empty columns, then remove rows that are
# missing any of the key fields:
LasVegasTrip = LasVegasTrip.drop(columns=["Unnamed: 15", "status"])
LasVegasTrip = LasVegasTrip.dropna(
    subset=["Hotel name", "Score", "Traveler type", "User continent"])
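
Instead of naming the mostly empty columns by hand, they can also be dropped by a threshold. This is a minimal sketch on invented data; the 50 % cutoff is an assumption chosen for illustration, not a rule taken from this notebook.

```python
import pandas as pan

# Toy frame with made-up values; "status" has a single non-null entry.
toy = pan.DataFrame({
    "Hotel name": ["A", "B", "C", None],
    "Score": [5.0, None, 4.0, 3.0],
    "status": [None, None, None, "x"],
})

# Keep only columns with at least half of their values present.
min_non_null = int(0.5 * len(toy))
trimmed = toy.dropna(axis="columns", thresh=min_non_null)

# Then drop rows that are missing any of the key fields.
cleaned = trimmed.dropna(subset=["Hotel name", "Score"])

print(list(trimmed.columns))
print(cleaned)
```

Here `thresh` is the minimum number of non-null values a column needs in order to be kept.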

3. Verify the result (Period of stay and Spa were not in the dropna subset, so they still contain missing values):

In [4]:
LasVegasTrip.isnull().sum()

User country          0
Nr. reviews           0
Nr. hotel reviews     0
Helpful votes         0
Score                 0
Period of stay       10
Traveler type         0
Pool                  0
Gym                   0
Tennis court          0
Spa                  16
Casino                0
Free internet         0
Hotel name            0
Hotel stars           0
Nr. rooms             0
User continent        0
Member years          0
Review month          0
Review weekday        0
dtype: int64
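
The output above still shows missing values in `Period of stay` and `Spa`. Two common ways to deal with them are sketched below on invented data: drop the affected rows, or keep them and mark the gaps with a placeholder. The `"Unknown"` label is an assumption; which option fits depends on the later analysis.

```python
import pandas as pan

# Toy frame with made-up values and gaps in the two affected columns.
toy = pan.DataFrame({
    "Period of stay": ["Dec-Feb", None, "Mar-May"],
    "Spa": ["NO", "YES", None],
    "Score": [5.0, 3.0, 4.0],
})

# Option A: drop rows where either column is missing.
dropped = toy.dropna(subset=["Period of stay", "Spa"])

# Option B: keep every row and fill the gaps with a placeholder category.
filled = toy.fillna({"Period of stay": "Unknown", "Spa": "Unknown"})

print(len(dropped))
print(filled.isnull().sum())
```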

## Step 3: Explore the structure -------------------------------------

1. How many rows and columns?

In [5]:
LasVegasTrip.shape

(491, 20)
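
The index in this notebook runs up to 538 while only 491 rows remain, which suggests that 48 rows were removed during cleaning (the same number as the missing `Score` values). The sketch below, on invented data, shows how to count dropped rows and why the index labels keep their gaps afterwards.

```python
import numpy as num
import pandas as pan

# Toy frame with made-up values.
toy = pan.DataFrame({
    "Hotel name": ["A", "A", None, "B", "B"],
    "Score": [5.0, num.nan, num.nan, 4.0, 3.0],
})

rows_before = len(toy)
cleaned = toy.dropna(subset=["Hotel name", "Score"])
rows_dropped = rows_before - len(cleaned)

print(f"dropped {rows_dropped} of {rows_before} rows")

# Surviving rows keep their original index labels, so the index has gaps.
print(list(cleaned.index))

# reset_index(drop=True) renumbers the rows from 0 if a gap-free index is wanted.
renumbered = cleaned.reset_index(drop=True)
print(list(renumbered.index))
```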

2. View first few rows:

In [6]:
print(LasVegasTrip.head())

  User country  Nr. reviews  Nr. hotel reviews  Helpful votes  Score  \
0          USA         11.0                4.0           13.0    5.0   
1          USA        119.0               21.0           75.0    3.0   
2          USA         36.0                9.0           25.0    5.0   
3           UK         14.0                7.0           14.0    4.0   
4       Canada          5.0                5.0            2.0    4.0   

  Period of stay Traveler type Pool  Gym Tennis court Spa Casino  \
0        Dec-Feb       Friends   NO  YES           NO  NO    YES   
1        Dec-Feb      Business   NO  YES           NO  NO    YES   
2        Mar-May      Families   NO  YES           NO  NO    YES   
3        Mar-May       Friends   NO  YES           NO  NO    YES   
4        Mar-May          Solo   NO  YES           NO  NO    YES   

  Free internet                              Hotel name Hotel stars  \
0           YES  Circus Circus Hotel & Casino Las Vegas           3   
1           YES 

3. List column names:

In [7]:
LasVegasTrip.columns

Index(['User country', 'Nr. reviews', 'Nr. hotel reviews', 'Helpful votes',
       'Score', 'Period of stay', 'Traveler type', 'Pool', 'Gym',
       'Tennis court', 'Spa', 'Casino', 'Free internet', 'Hotel name',
       'Hotel stars', 'Nr. rooms', 'User continent', 'Member years',
       'Review month', 'Review weekday'],
      dtype='str')

4. Check data types:

In [8]:
LasVegasTrip.dtypes

User country             str
Nr. reviews          float64
Nr. hotel reviews    float64
Helpful votes        float64
Score                float64
Period of stay           str
Traveler type            str
Pool                     str
Gym                      str
Tennis court             str
Spa                      str
Casino                   str
Free internet            str
Hotel name               str
Hotel stars              str
Nr. rooms            float64
User continent           str
Member years         float64
Review month             str
Review weekday           str
dtype: object
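
`Hotel stars` is stored as text even though it looks numeric, and the review counts are floats even though they are whole numbers. The sketch below shows one possible conversion. It assumes the star ratings use a comma as the decimal mark (for example `"4,5"`), which is a guess about the file and not something the output above confirms.

```python
import pandas as pan

# Toy frame with made-up values; the "4,5" format is an assumption.
toy = pan.DataFrame({
    "Hotel stars": ["3", "4,5", "5"],
    "Nr. reviews": [11.0, 119.0, 36.0],
})

# Replace the decimal comma, then convert; unparseable values become NaN.
toy["Hotel stars"] = pan.to_numeric(
    toy["Hotel stars"].str.replace(",", ".", regex=False),
    errors="coerce",
)

# The nullable Int64 dtype stores whole numbers and still allows missing values.
toy["Nr. reviews"] = toy["Nr. reviews"].astype("Int64")

print(toy.dtypes)
```

Checking `LasVegasTrip['Hotel stars'].unique()` first would show which text values actually occur before any conversion is applied.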

5. Summary of the dataset:

In [9]:
LasVegasTrip.info()

<class 'pandas.DataFrame'>
Index: 491 entries, 0 to 538
Data columns (total 20 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   User country       491 non-null    str    
 1   Nr. reviews        491 non-null    float64
 2   Nr. hotel reviews  491 non-null    float64
 3   Helpful votes      491 non-null    float64
 4   Score              491 non-null    float64
 5   Period of stay     481 non-null    str    
 6   Traveler type      491 non-null    str    
 7   Pool               491 non-null    str    
 8   Gym                491 non-null    str    
 9   Tennis court       491 non-null    str    
 10  Spa                475 non-null    str    
 11  Casino             491 non-null    str    
 12  Free internet      491 non-null    str    
 13  Hotel name         491 non-null    str    
 14  Hotel stars        491 non-null    str    
 15  Nr. rooms          491 non-null    float64
 16  User continent     491 non-null    str    

## Step 4: Understand the Categorical Info -----------------------------------------------------

1. How many unique hotel names are in the dataset?

In [10]:
# Number of unique hotel names:
print(LasVegasTrip['Hotel name'].nunique())

# To list all unique hotel names:
print(LasVegasTrip['Hotel name'].unique())
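
`nunique` and `unique` are methods, so they need parentheses to run. Without them, pandas only returns a reference to the method, which prints as `<bound method ...>` followed by the Series itself. A minimal sketch on invented hotel names:

```python
import pandas as pan

# Toy Series with made-up hotel names.
hotels = pan.Series(["Hotel A", "Hotel A", "Hotel B", "Hotel C"], name="Hotel name")

not_called = hotels.nunique   # a bound method, nothing has been computed yet
n_hotels = hotels.nunique()   # the actual count of distinct values
names = hotels.unique()       # the distinct values in order of first appearance

print(callable(not_called))   # True
print(n_hotels)               # 3
print(list(names))            # ['Hotel A', 'Hotel B', 'Hotel C']
```

A notebook cell only displays the value of its last expression, so wrapping each call in `print()` is one way to see both results from a single cell.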

2. What traveler types are represented?

In [11]:
print(LasVegasTrip['Traveler type'].unique())        # List the unique categories

print(LasVegasTrip['Traveler type'].value_counts())  # Count how many reviews per traveler type
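
When called with parentheses, `value_counts()` returns one row per category, sorted from most to least frequent. Passing `normalize=True` gives shares instead of counts. The traveler types below are invented sample values, not results from the dataset.

```python
import pandas as pan

# Toy Series with made-up traveler types.
travelers = pan.Series(
    ["Friends", "Business", "Families", "Friends",
     "Solo", "Couples", "Couples", "Couples"],
    name="Traveler type",
)

counts = travelers.value_counts()                # reviews per traveler type
shares = travelers.value_counts(normalize=True)  # same, as a fraction of all reviews

print(counts)
print(shares)
```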

3. What type of data does each column hold?

In [12]:
for col in LasVegasTrip.columns: 
    print(col, "→", LasVegasTrip[col].dtype)

User country → str
Nr. reviews → float64
Nr. hotel reviews → float64
Helpful votes → float64
Score → float64
Period of stay → str
Traveler type → str
Pool → str
Gym → str
Tennis court → str
Spa → str
Casino → str
Free internet → str
Hotel name → str
Hotel stars → str
Nr. rooms → float64
User continent → str
Member years → float64
Review month → str
Review weekday → str
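
The dtype alone says little about what a column contains. One possible next step, sketched here on invented data, is to split the columns into numeric and non-numeric groups and look at each group with a suitable summary.

```python
import pandas as pan

# Toy frame with made-up values.
toy = pan.DataFrame({
    "Score": [5.0, 3.0, 5.0],
    "Traveler type": ["Friends", "Business", "Friends"],
    "Pool": ["NO", "NO", "YES"],
})

# Split the column names by whether they hold numbers or not.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

# For text columns, the number of distinct values hints at their role.
for col in categorical_cols:
    print(col, "→", toy[col].nunique(), "unique values")

# For numeric columns, describe() gives count, mean, spread, and range.
print(toy[numeric_cols].describe())
```

Selecting by `"number"` rather than by a specific text dtype keeps the split independent of how the installed pandas version labels string columns.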
