# Pandas Basics 

Most of the visualizations you will generate in this lab will display data stored in *pandas* dataframes. 
**Pandas** is an essential data analysis library for Python. It has functions for analyzing, cleaning, exploring, and manipulating data.

The main data structure introduced in Pandas is called a **Data Frame**.  This is a two-dimensional table of data, similar to an SQL table or a spreadsheet.  Pandas also provides a one-dimensional data structure called a **Series** that we will encounter when accessing a single column or row of a Data Frame.
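
To make the distinction concrete, here is a minimal sketch on a tiny made-up table (the name `df_example` and its values are illustrative assumptions, not part of the lab datasets). It shows that selecting one column or one row of a Data Frame gives back a Series.

```python
import pandas as pd

# A tiny, made-up Data Frame: two rows, two columns (values are illustrative only)
df_example = pd.DataFrame(
    {"country": ["A", "B"], "arrivals": [100.0, 250.0]}
)

# Selecting a single column returns a one-dimensional Series
arrivals = df_example["arrivals"]

# Selecting a single row by position also returns a Series
first_row = df_example.iloc[0]

print(type(df_example).__name__)  # DataFrame
print(type(arrivals).__name__)    # Series
print(type(first_row).__name__)   # Series
```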

> ### Datasets:
>
> **[International tourism, number of arrivals](https://data.worldbank.org/indicator/ST.INT.ARVL)** 
>
>This dataset contains the yearly number of inbound tourists for every country. The data on inbound tourists refer to the number of arrivals, not to the number of people traveling. Thus a person who makes several trips to a country during a given period is counted each time as a new arrival.
>    
>
>**[TripAdvisor European restaurants](https://www.kaggle.com/datasets/stefanoleone992/tripadvisor-european-restaurants)**
>
>This dataset includes restaurants with attributes such as location data, average rating, number of reviews, open hours, cuisine types, awards, etc. The dataset combines the restaurants from the main European countries. In the context of this lab, we will work with a subset of the dataset that includes restaurants in Greece.





## Importing Data

First, we'll import *pandas* and *numpy*. *Numpy* is a very useful library for working with arrays of data in Python.

In [1]:
import numpy as np  # useful for scientific computing in Python
import pandas as pd # primary data structure library

Pandas has a variety of functions named `read_xxx` for reading data in different formats. In the context of this lab, and for the assignment, you will need to read `csv` files. However, Pandas supports several other file formats such as json, excel, and sql.


To read a CSV file, we use `read_csv`. There are many options to `read_csv` that can be used.  For example, you can set the specific delimiter used in your file, instead of the default `sep=','`.
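
As a small sketch of the `sep` option, the snippet below reads semicolon-separated text from an in-memory buffer instead of a file on disk. The text in `raw_text` is a hypothetical stand-in built from a few values shown later in this notebook.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content, standing in for a file on disk
raw_text = "Country;2019;2020\nGreece;34005000;7406000\nAlbania;6406000;2658000\n"

# sep=';' overrides the default comma delimiter
df_semicolon = pd.read_csv(io.StringIO(raw_text), sep=";")

print(df_semicolon.shape)             # (2, 3)
print(df_semicolon.columns.tolist())  # ['Country', '2019', '2020']
```

Without `sep=";"`, the same text would be parsed as a single column, because no commas appear in it.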

Next, let's load our dataset using *pandas*'s `read_csv()` function.

In [17]:
# path to our CSV file (a local file here; read_csv also accepts a URL)
url = "./international_tourism.csv"

# Read the .csv file and store it as a pandas Data Frame
df_tourism = pd.read_csv(url)

# Output object type
type(df_tourism)

pandas.core.frame.DataFrame

## Viewing Data

We can view our Data Frame by calling the `head()` method:

In [18]:
df_tourism.head()

,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Aruba,ABW,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,1481000.0,1667000.0,1739000.0,1832000.0,1758000.0,1863000.0,1897000.0,1951000.0,,
1,Afghanistan,AFG,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,,,,,,,,,,
2,Angola,AGO,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,528000.0,650000.0,595000.0,592000.0,397000.0,261000.0,218000.0,218000.0,,
3,Albania,ALB,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,3514000.0,3256000.0,3673000.0,4131000.0,4736000.0,5118000.0,5927000.0,6406000.0,2658000.0,
4,Andorra,AND,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,7900000.0,7676000.0,7797000.0,7850000.0,8025000.0,8152000.0,8328000.0,8235000.0,5207000.0,


By default, `head()` shows the first 5 rows of our Data Frame.  You can specify the number of rows you'd like to see as follows:

In [19]:
df_tourism.head(10)

,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Aruba,ABW,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,1481000.0,1667000.0,1739000.0,1832000.0,1758000.0,1863000.0,1897000.0,1951000.0,,
1,Afghanistan,AFG,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,,,,,,,,,,
2,Angola,AGO,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,528000.0,650000.0,595000.0,592000.0,397000.0,261000.0,218000.0,218000.0,,
3,Albania,ALB,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,3514000.0,3256000.0,3673000.0,4131000.0,4736000.0,5118000.0,5927000.0,6406000.0,2658000.0,
4,Andorra,AND,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,7900000.0,7676000.0,7797000.0,7850000.0,8025000.0,8152000.0,8328000.0,8235000.0,5207000.0,
5,United Arab Emirates,ARE,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,,,,19313000.0,20894000.0,21805000.0,23092000.0,25282000.0,8084000.0,
6,Argentina,ARG,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,6497000.0,6510000.0,7165000.0,6816000.0,6668000.0,6711000.0,6942000.0,7399000.0,,
7,Armenia,ARM,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,963000.0,1084000.0,1204000.0,1192000.0,1260000.0,1495000.0,1652000.0,1894000.0,375000.0,
8,American Samoa,ASM,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,22600.0,20800.0,21600.0,20300.0,20100.0,20000.0,20200.0,19200.0,900.0,
9,Antigua and Barbuda,ATG,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,819000.0,777000.0,771000.0,894000.0,874000.0,1040000.0,1064000.0,1035000.0,384500.0,


We can also view the bottom 5 rows of the dataset using the `tail()` function.


In [20]:
df_tourism.tail()

,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
214,Kosovo,XKX,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,,,,,,,,,,
215,"Yemen, Rep.",YEM,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,1282000.0,1323000.0,1218000.0,398000.0,,,,,,
216,South Africa,ZAF,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,13069000.0,14318000.0,14530000.0,13952000.0,15121000.0,14975000.0,15004000.0,14797000.0,3886600.098,
217,Zambia,ZMB,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,859000.0,915000.0,947000.0,932000.0,956000.0,1009000.0,1072000.0,1266000.0,502000.0,
218,Zimbabwe,ZWE,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,1794000.0,1833000.0,1880000.0,2057000.0,2168000.0,2423000.0,2580000.0,2294000.0,639000.0,


If we wanted to view the entire Data Frame we would simply write the following:

In [21]:
# Output entire Data Frame
df_tourism

,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Aruba,ABW,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,1481000.0,1667000.0,1739000.0,1832000.0,1758000.0,1863000.0,1897000.0,1951000.0,,
1,Afghanistan,AFG,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,,,,,,,,,,
2,Angola,AGO,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,528000.0,650000.0,595000.0,592000.0,397000.0,261000.0,218000.0,218000.0,,
3,Albania,ALB,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,3514000.0,3256000.0,3673000.0,4131000.0,4736000.0,5118000.0,5927000.0,6406000.0,2658000.000,
4,Andorra,AND,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,7900000.0,7676000.0,7797000.0,7850000.0,8025000.0,8152000.0,8328000.0,8235000.0,5207000.000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
214,Kosovo,XKX,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,,,,,,,,,,
215,"Yemen, Rep.",YEM,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,1282000.0,1323000.0,1218000.0,398000.0,,,,,,
216,South Africa,ZAF,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,13069000.0,14318000.0,14530000.0,13952000.0,15121000.0,14975000.0,15004000.0,14797000.0,3886600.098,
217,Zambia,ZMB,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,859000.0,915000.0,947000.0,932000.0,956000.0,1009000.0,1072000.0,1266000.0,502000.000,


As you can see, we have a table where each row is a record of our data, corresponding to a different country.

When analyzing a dataset, it's always a good idea to get some basic information about your dataframe. We can do this by using the `info()` method.

This method can be used to get a short summary of the dataframe.


In [22]:
df_tourism.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 219 entries, 0 to 218
Data columns (total 66 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    219 non-null    object 
 1   Country Code    219 non-null    object 
 2   Indicator Name  219 non-null    object 
 3   Indicator Code  219 non-null    object 
 4   1960            0 non-null      float64
 5   1961            0 non-null      float64
 6   1962            0 non-null      float64
 7   1963            0 non-null      float64
 8   1964            0 non-null      float64
 9   1965            0 non-null      float64
 10  1966            0 non-null      float64
 11  1967            0 non-null      float64
 12  1968            0 non-null      float64
 13  1969            0 non-null      float64
 14  1970            0 non-null      float64
 15  1971            0 non-null      float64
 16  1972            0 non-null      float64
 17  1973            0 non-null      flo

We can view the data types of our data frame columns by accessing `.dtypes` on our data frame:

In [8]:
df_tourism.dtypes

Country Name       object
Country Code       object
Indicator Name     object
Indicator Code     object
1960              float64
                   ...   
2017              float64
2018              float64
2019              float64
2020              float64
2021              float64
Length: 66, dtype: object

To get the column headers, we can use the data frame's `columns` instance variable.


In [9]:
df_tourism.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021'],
      dtype='object')

Similarly, to get the index of the dataframe we use the `.index` instance variable.


In [10]:
df_tourism.index

RangeIndex(start=0, stop=219, step=1)

Note: By default, the instance variables `index` and `columns` are **NOT** of type `list`.


In [11]:
print(type(df_tourism.columns))
print(type(df_tourism.index))

<class 'pandas.core.indexes.base.Index'>
<class 'pandas.core.indexes.range.RangeIndex'>


To get the index and columns as lists, we can use the `tolist()` method.

In [12]:
df_tourism.columns.tolist()

['Country Name',
 'Country Code',
 'Indicator Name',
 'Indicator Code',
 '1960',
 '1961',
 '1962',
 '1963',
 '1964',
 '1965',
 '1966',
 '1967',
 '1968',
 '1969',
 '1970',
 '1971',
 '1972',
 '1973',
 '1974',
 '1975',
 '1976',
 '1977',
 '1978',
 '1979',
 '1980',
 '1981',
 '1982',
 '1983',
 '1984',
 '1985',
 '1986',
 '1987',
 '1988',
 '1989',
 '1990',
 '1991',
 '1992',
 '1993',
 '1994',
 '1995',
 '1996',
 '1997',
 '1998',
 '1999',
 '2000',
 '2001',
 '2002',
 '2003',
 '2004',
 '2005',
 '2006',
 '2007',
 '2008',
 '2009',
 '2010',
 '2011',
 '2012',
 '2013',
 '2014',
 '2015',
 '2016',
 '2017',
 '2018',
 '2019',
 '2020',
 '2021']

In [13]:
df_tourism.index.tolist()

[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 51,
 52,
 53,
 54,
 55,
 56,
 57,
 58,
 59,
 60,
 61,
 62,
 63,
 64,
 65,
 66,
 67,
 68,
 69,
 70,
 71,
 72,
 73,
 74,
 75,
 76,
 77,
 78,
 79,
 80,
 81,
 82,
 83,
 84,
 85,
 86,
 87,
 88,
 89,
 90,
 91,
 92,
 93,
 94,
 95,
 96,
 97,
 98,
 99,
 100,
 101,
 102,
 103,
 104,
 105,
 106,
 107,
 108,
 109,
 110,
 111,
 112,
 113,
 114,
 115,
 116,
 117,
 118,
 119,
 120,
 121,
 122,
 123,
 124,
 125,
 126,
 127,
 128,
 129,
 130,
 131,
 132,
 133,
 134,
 135,
 136,
 137,
 138,
 139,
 140,
 141,
 142,
 143,
 144,
 145,
 146,
 147,
 148,
 149,
 150,
 151,
 152,
 153,
 154,
 155,
 156,
 157,
 158,
 159,
 160,
 161,
 162,
 163,
 164,
 165,
 166,
 167,
 168,
 169,
 170,
 171,
 172,
 173,
 174,
 175,
 176,
 177,
 178,
 179,
 180,
 181,
 182,
 183,
 184,


To view the dimensions of the dataframe, we use its `shape` instance variable.


In [13]:
# size of dataframe (rows, columns)
df_tourism.shape    

(219, 66)

**Note**: The main types stored in *pandas* objects are `float`, `int`, `bool`, `datetime64[ns]`, `datetime64[ns, tz]`, `timedelta64[ns]`, `category`, and `object` (which includes strings). In addition, these dtypes have item sizes, e.g. `int64` and `int32`.
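
The following sketch, on a made-up frame named `df_types`, illustrates item sizes: the columns are created with explicit dtypes, and `astype()` then converts one of them from `int64` to `int32`. The column names and values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up columns with explicit dtypes
df_types = pd.DataFrame(
    {
        "count": np.array([1, 2, 3], dtype="int64"),
        "ratio": np.array([0.5, 1.5, 2.5], dtype="float64"),
        "flag": [True, False, True],
    }
)
print(df_types.dtypes)

# astype() converts a column to another dtype, here a smaller integer size
df_types["count"] = df_types["count"].astype("int32")
print(df_types["count"].dtype)  # int32
```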


## Cleaning and preparing the dataset

Let's clean the data set to remove a few unnecessary columns. We can use *pandas* `drop()` method as follows:


In [23]:
# axis=0 represents rows (default) and axis=1 represents columns.
df_tourism.drop(['Country Code', 'Indicator Name','Indicator Code'], axis=1, inplace=True)
df_tourism.drop([0], axis=0)  # for info only: returns a copy without row 0; df_tourism itself is unchanged
df_tourism.head()

,Country Name,1960,1961,1962,1963,1964,1965,1966,1967,1968,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Aruba,,,,,,,,,,...,1481000.0,1667000.0,1739000.0,1832000.0,1758000.0,1863000.0,1897000.0,1951000.0,,
1,Afghanistan,,,,,,,,,,...,,,,,,,,,,
2,Angola,,,,,,,,,,...,528000.0,650000.0,595000.0,592000.0,397000.0,261000.0,218000.0,218000.0,,
3,Albania,,,,,,,,,,...,3514000.0,3256000.0,3673000.0,4131000.0,4736000.0,5118000.0,5927000.0,6406000.0,2658000.0,
4,Andorra,,,,,,,,,,...,7900000.0,7676000.0,7797000.0,7850000.0,8025000.0,8152000.0,8328000.0,8235000.0,5207000.0,
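
The difference between calling `drop()` with and without `inplace=True` can be sketched on a small made-up frame (`df_demo` is a hypothetical name; its values are copied from the table above). The `columns=` and `index=` keywords used here are an alternative to passing `axis=1` or `axis=0`.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {
        "Country Name": ["Aruba", "Angola"],
        "Country Code": ["ABW", "AGO"],
        "2019": [1951000.0, 218000.0],
    }
)

# Without inplace=True, drop() returns a new Data Frame and leaves df_demo untouched
without_code = df_demo.drop(columns=["Country Code"])
print(df_demo.columns.tolist())       # ['Country Name', 'Country Code', '2019']
print(without_code.columns.tolist())  # ['Country Name', '2019']

# Dropping a row by its index label
without_first_row = df_demo.drop(index=[0])
print(without_first_row.index.tolist())  # [1]
```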


Let's rename the "Country Name" column to simplify it. For this, we use the `rename()` function by passing in a Python dictionary of old and new names:


In [24]:
df_tourism.rename(columns={'Country Name':'Country'}, inplace=True)
df_tourism

,Country,1960,1961,1962,1963,1964,1965,1966,1967,1968,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Aruba,,,,,,,,,,...,1481000.0,1667000.0,1739000.0,1832000.0,1758000.0,1863000.0,1897000.0,1951000.0,,
1,Afghanistan,,,,,,,,,,...,,,,,,,,,,
2,Angola,,,,,,,,,,...,528000.0,650000.0,595000.0,592000.0,397000.0,261000.0,218000.0,218000.0,,
3,Albania,,,,,,,,,,...,3514000.0,3256000.0,3673000.0,4131000.0,4736000.0,5118000.0,5927000.0,6406000.0,2658000.000,
4,Andorra,,,,,,,,,,...,7900000.0,7676000.0,7797000.0,7850000.0,8025000.0,8152000.0,8328000.0,8235000.0,5207000.000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
214,Kosovo,,,,,,,,,,...,,,,,,,,,,
215,"Yemen, Rep.",,,,,,,,,,...,1282000.0,1323000.0,1218000.0,398000.0,,,,,,
216,South Africa,,,,,,,,,,...,13069000.0,14318000.0,14530000.0,13952000.0,15121000.0,14975000.0,15004000.0,14797000.0,3886600.098,
217,Zambia,,,,,,,,,,...,859000.0,915000.0,947000.0,932000.0,956000.0,1009000.0,1072000.0,1266000.0,502000.000,


Also, as you can see, the default index of the dataset is a numeric range. Since we would like to be able to access the data by a specific country, we can set the "Country" column as the index of the dataframe:

In [25]:
df_tourism.set_index('Country', inplace=True)
df_tourism
# To reset the index, we can use 
# df_tourism.reset_index(inplace=True)


Country,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
Aruba,,,,,,,,,,,...,1481000.0,1667000.0,1739000.0,1832000.0,1758000.0,1863000.0,1897000.0,1951000.0,,
Afghanistan,,,,,,,,,,,...,,,,,,,,,,
Angola,,,,,,,,,,,...,528000.0,650000.0,595000.0,592000.0,397000.0,261000.0,218000.0,218000.0,,
Albania,,,,,,,,,,,...,3514000.0,3256000.0,3673000.0,4131000.0,4736000.0,5118000.0,5927000.0,6406000.0,2658000.000,
Andorra,,,,,,,,,,,...,7900000.0,7676000.0,7797000.0,7850000.0,8025000.0,8152000.0,8328000.0,8235000.0,5207000.000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Kosovo,,,,,,,,,,,...,,,,,,,,,,
"Yemen, Rep.",,,,,,,,,,,...,1282000.0,1323000.0,1218000.0,398000.0,,,,,,
South Africa,,,,,,,,,,,...,13069000.0,14318000.0,14530000.0,13952000.0,15121000.0,14975000.0,15004000.0,14797000.0,3886600.098,
Zambia,,,,,,,,,,,...,859000.0,915000.0,947000.0,932000.0,956000.0,1009000.0,1072000.0,1266000.0,502000.000,
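
Here is a minimal sketch of the round trip between `set_index()` and `reset_index()`, using a two-row made-up frame (`df_demo` is a hypothetical name; the values are taken from this dataset). Both calls return new frames here, since `inplace=True` is not passed.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Country": ["Greece", "Albania"], "2020": [7406000.0, 2658000.0]}
)

# Use the 'Country' column as row labels
indexed = df_demo.set_index("Country")
print(indexed.loc["Greece", "2020"])  # 7406000.0

# Move the index back into a regular column and restore the numeric range index
restored = indexed.reset_index()
print(restored.columns.tolist())  # ['Country', '2020']
```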


We will also add an 'Average' column to hold the mean yearly tourist arrivals by country, as follows:


In [26]:
df_tourism['Average'] = df_tourism.mean(axis = 1, numeric_only=True)
df_tourism

Country,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,Average
Aruba,,,,,,,,,,,...,1667000.0,1739000.0,1832000.0,1758000.0,1863000.0,1897000.0,1951000.0,,,1.379000e+06
Afghanistan,,,,,,,,,,,...,,,,,,,,,,
Angola,,,,,,,,,,,...,650000.0,595000.0,592000.0,397000.0,261000.0,218000.0,218000.0,,,2.493200e+05
Albania,,,,,,,,,,,...,3256000.0,3673000.0,4131000.0,4736000.0,5118000.0,5927000.0,6406000.0,2658000.000,,2.094769e+06
Andorra,,,,,,,,,,,...,7676000.0,7797000.0,7850000.0,8025000.0,8152000.0,8328000.0,8235000.0,5207000.000,,9.276318e+06
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Kosovo,,,,,,,,,,,...,,,,,,,,,,
"Yemen, Rep.",,,,,,,,,,,...,1323000.0,1218000.0,398000.0,,,,,,,1.071833e+06
South Africa,,,,,,,,,,,...,14318000.0,14530000.0,13952000.0,15121000.0,14975000.0,15004000.0,14797000.0,3886600.098,,9.477946e+06
Zambia,,,,,,,,,,,...,915000.0,947000.0,932000.0,956000.0,1009000.0,1072000.0,1266000.0,502000.000,,6.928462e+05
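
Note that `mean()` skips missing values by default, so each country's average is taken only over the years that have data, and a row with no data at all gets `NaN`. A minimal sketch on a made-up frame (`df_demo` and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {"2018": [100.0, np.nan, np.nan], "2019": [200.0, 300.0, np.nan]},
    index=["A", "B", "C"],
)

# axis=1 computes one mean per row, ignoring NaN values
row_means = df_demo.mean(axis=1)
print(row_means)
# A: mean of 100 and 200 -> 150.0
# B: only 300 is present -> 300.0
# C: no values at all    -> NaN
```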


We can check how many null/missing values we have for every country (row) in the dataset as follows. Passing `axis=1` counts across the columns of each row:


In [27]:
df_tourism.isna().sum(axis=1)

Country
Aruba           37
Afghanistan     63
Angola          37
Albania         36
Andorra         40
                ..
Kosovo          63
Yemen, Rep.     56
South Africa    36
Zambia          36
Zimbabwe        36
Length: 219, dtype: int64
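
The `axis` argument decides whether missing values are counted per column or per row. A short sketch on a made-up frame (`df_demo` and its values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {"2019": [1.0, np.nan, 3.0], "2020": [np.nan, np.nan, 6.0]},
    index=["A", "B", "C"],
)

# axis=0 (the default): one count per column
missing_per_column = df_demo.isna().sum()

# axis=1: one count per row
missing_per_row = df_demo.isna().sum(axis=1)

print(missing_per_column)  # 2019 has 1 missing value, 2020 has 2
print(missing_per_row)     # A has 1, B has 2, C has 0
```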

Finally, let's view a quick summary of each column in our dataframe using the `describe()` method.


In [28]:
df_tourism.describe()

,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,Average
count,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,192.0,192.0,196.0,196.0,192.0,188.0,181.0,132.0,0.0,205.0
mean,,,,,,,,,,,...,10654450.0,10968690.0,11140530.0,11508810.0,12263160.0,12956510.0,13660490.0,4685639.0,,8576624.0
std,,,,,,,,,,,...,28952890.0,29219460.0,29214700.0,29661040.0,30710900.0,31206010.0,32009530.0,13056970.0,,23890120.0
min,,,,,,,,,,,...,1300.0,1400.0,2400.0,2500.0,2500.0,3200.0,3600.0,900.0,,1548.0
25%,,,,,,,,,,,...,512000.0,503000.0,484250.0,505500.0,652750.0,818500.0,876000.0,287550.0,,348583.3
50%,,,,,,,,,,,...,1827500.0,1871000.0,1876500.0,2033000.0,2145500.0,2504500.0,2494000.0,877700.0,,1361080.0
75%,,,,,,,,,,,...,6592750.0,6982250.0,6822500.0,6785750.0,7832500.0,8871500.0,9429000.0,2902500.0,,5018833.0
max,,,,,,,,,,,...,205052900.0,206599000.0,205016500.0,206045500.0,207274000.0,211998000.0,217877000.0,117109000.0,,190032600.0


As you can see, there are some columns that are completely empty, containing only NaN values. To delete these columns, we can use the `dropna()` function.

In [29]:
df_tourism.dropna(how='all', axis=1, inplace=True)
df_tourism.head()

Country,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,Average
Aruba,912000.0,957000.0,947000.0,906000.0,972000.0,1211000.0,1178000.0,1225000.0,1184000.0,1304000.0,...,1481000.0,1667000.0,1739000.0,1832000.0,1758000.0,1863000.0,1897000.0,1951000.0,,1379000.0
Afghanistan,,,,,,,,,,,...,,,,,,,,,,
Angola,9000.0,21000.0,45000.0,52000.0,45000.0,51000.0,67000.0,91000.0,107000.0,194000.0,...,528000.0,650000.0,595000.0,592000.0,397000.0,261000.0,218000.0,218000.0,,249320.0
Albania,304000.0,287000.0,119000.0,184000.0,371000.0,317000.0,354000.0,470000.0,557000.0,645000.0,...,3514000.0,3256000.0,3673000.0,4131000.0,4736000.0,5118000.0,5927000.0,6406000.0,2658000.0,2094769.0
Andorra,,,,,9422000.0,10991000.0,11351000.0,11507000.0,11601000.0,11668000.0,...,7900000.0,7676000.0,7797000.0,7850000.0,8025000.0,8152000.0,8328000.0,8235000.0,5207000.0,9276318.0
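
To see the effect of the `how` and `axis` arguments in isolation, here is a minimal sketch on a made-up frame (`df_demo` and its values are assumptions for illustration). `how='all'` removes a column or row only when every value in it is missing, while `how='any'` removes it as soon as one value is missing.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {
        "1960": [np.nan, np.nan, np.nan],
        "2019": [1.0, np.nan, 3.0],
        "2020": [np.nan, np.nan, 6.0],
    },
    index=["A", "B", "C"],
)

# Drop columns that are entirely NaN: removes '1960'
no_empty_columns = df_demo.dropna(how="all", axis=1)

# Drop rows that are entirely NaN: removes 'B'
no_empty_rows = no_empty_columns.dropna(how="all", axis=0)

# Drop rows with at least one NaN: only 'C' is complete
complete_rows = no_empty_columns.dropna(how="any", axis=0)

print(no_empty_columns.columns.tolist())  # ['2019', '2020']
print(no_empty_rows.index.tolist())       # ['A', 'C']
print(complete_rows.index.tolist())       # ['C']
```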


Similarly, to delete countries that have no values for any year, we use the following:

In [30]:
df_tourism.dropna(how='all', axis=0, inplace=True)
df_tourism

Country,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,Average
Aruba,912000.0,957000.0,947000.0,906000.0,972000.0,1211000.0,1178000.0,1225000.0,1184000.0,1304000.0,...,1481000.0,1667000.0,1739000.0,1832000.0,1758000.0,1863000.0,1897000.0,1951000.0,,1.379000e+06
Angola,9000.0,21000.0,45000.0,52000.0,45000.0,51000.0,67000.0,91000.0,107000.0,194000.0,...,528000.0,650000.0,595000.0,592000.0,397000.0,261000.0,218000.0,218000.0,,2.493200e+05
Albania,304000.0,287000.0,119000.0,184000.0,371000.0,317000.0,354000.0,470000.0,557000.0,645000.0,...,3514000.0,3256000.0,3673000.0,4131000.0,4736000.0,5118000.0,5927000.0,6406000.0,2.658000e+06,2.094769e+06
Andorra,,,,,9422000.0,10991000.0,11351000.0,11507000.0,11601000.0,11668000.0,...,7900000.0,7676000.0,7797000.0,7850000.0,8025000.0,8152000.0,8328000.0,8235000.0,5.207000e+06,9.276318e+06
United Arab Emirates,,,,,,,,,,,...,,,,19313000.0,20894000.0,21805000.0,23092000.0,25282000.0,8.084000e+06,1.974500e+07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Samoa,,,,,,,,,,,...,135000.0,125000.0,132000.0,139000.0,146000.0,158000.0,172000.0,181000.0,2.390000e+04,1.305933e+05
"Yemen, Rep.",,,,,,,,,,,...,1282000.0,1323000.0,1218000.0,398000.0,,,,,,1.071833e+06
South Africa,4684000.0,5186000.0,5170000.0,5898000.0,6026000.0,6001000.0,5908000.0,6550000.0,6640000.0,6815000.0,...,13069000.0,14318000.0,14530000.0,13952000.0,15121000.0,14975000.0,15004000.0,14797000.0,3.886600e+06,9.477946e+06
Zambia,163000.0,264000.0,341000.0,362000.0,404000.0,457000.0,492000.0,565000.0,413000.0,515000.0,...,859000.0,915000.0,947000.0,932000.0,956000.0,1009000.0,1072000.0,1266000.0,5.020000e+05,6.928462e+05


***

## Indexing and Slicing


To get the data for column 2020:


In [31]:
df_tourism['2020']  # returns a series

Country
Aruba                            NaN
Angola                           NaN
Albania                 2.658000e+06
Andorra                 5.207000e+06
United Arab Emirates    8.084000e+06
                            ...     
Samoa                   2.390000e+04
Yemen, Rep.                      NaN
South Africa            3.886600e+06
Zambia                  5.020000e+05
Zimbabwe                6.390000e+05
Name: 2020, Length: 205, dtype: float64

To get the data for years 2018 and 2020:


In [32]:
df_tourism[['2018', '2020']] # returns a dataframe

Country,2018,2020
Aruba,1897000.0,
Angola,218000.0,
Albania,5927000.0,2.658000e+06
Andorra,8328000.0,5.207000e+06
United Arab Emirates,23092000.0,8.084000e+06
...,...,...
Samoa,172000.0,2.390000e+04
"Yemen, Rep.",,
South Africa,15004000.0,3.886600e+06
Zambia,1072000.0,5.020000e+05
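
The return type depends on what goes inside the square brackets: a single column name gives a Series, while a list of names gives a Data Frame, even if the list has only one entry. A small sketch on a made-up two-row frame (`df_demo` is a hypothetical name; the values are taken from this dataset):

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"2018": [33072000.0, 5927000.0], "2020": [7406000.0, 2658000.0]},
    index=["Greece", "Albania"],
)

single = df_demo["2020"]      # single name -> Series
as_frame = df_demo[["2020"]]  # list with one name -> Data Frame with one column

print(type(single).__name__, single.shape)      # Series (2,)
print(type(as_frame).__name__, as_frame.shape)  # DataFrame (2, 1)
```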


### .loc[] and .iloc[]

`.loc[]` and `.iloc[]` are indexers used with square brackets. Each takes up to two selectors (a single value, a list, or a slice) separated by a comma.

The first selector picks the rows and the second picks the columns.

`.iloc[]` selects by integer position, whereas `.loc[]` selects by label.
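
One practical difference is worth a small sketch: a label slice includes its end label, while a positional slice excludes its end position, as in regular Python slicing. The frame `df_demo` below is a made-up two-row example using values from this dataset.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {
        "2018": [33072000.0, 5927000.0],
        "2019": [34005000.0, 6406000.0],
        "2020": [7406000.0, 2658000.0],
    },
    index=["Greece", "Albania"],
)

# Label-based slice: the end label '2019' is included
by_label = df_demo.loc["Greece", "2018":"2019"]

# Position-based slice: position 2 is excluded, so columns 0 and 1 are returned
by_position = df_demo.iloc[0, 0:2]

print(by_label.index.tolist())     # ['2018', '2019']
print(by_position.index.tolist())  # ['2018', '2019']
```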



In [33]:
# retrieve the complete row for Greece
df_tourism.loc['Greece']

1995       10712000.0
1996        9782000.0
1997       10588000.0
1998       11364000.0
1999       12606000.0
2000       13567000.0
2001       14678000.0
2002       14918000.0
2003       14785000.0
2004       14268000.0
2005       15938000.0
2006       17284000.0
2007              NaN
2008              NaN
2009              NaN
2010              NaN
2011              NaN
2012              NaN
2013       20112000.0
2014       24272000.0
2015       26114000.0
2016       28071000.0
2017       30161000.0
2018       33072000.0
2019       34005000.0
2020        7406000.0
Average    18185150.0
Name: Greece, dtype: float64

In [27]:
# alternative methods: by integer position (Greece is at position 71 after the cleaning above)
df_tourism.iloc[71]

1995       10712000.0
1996        9782000.0
1997       10588000.0
1998       11364000.0
1999       12606000.0
2000       13567000.0
2001       14678000.0
2002       14918000.0
2003       14785000.0
2004       14268000.0
2005       15938000.0
2006       17284000.0
2007              NaN
2008              NaN
2009              NaN
2010              NaN
2011              NaN
2012              NaN
2013       20112000.0
2014       24272000.0
2015       26114000.0
2016       28071000.0
2017       30161000.0
2018       33072000.0
2019       34005000.0
2020        7406000.0
Average    18185150.0
Name: Greece, dtype: float64

In [34]:
df_tourism[df_tourism.index == 'Greece']

Country,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,Average
Greece,10712000.0,9782000.0,10588000.0,11364000.0,12606000.0,13567000.0,14678000.0,14918000.0,14785000.0,14268000.0,...,,20112000.0,24272000.0,26114000.0,28071000.0,30161000.0,33072000.0,34005000.0,7406000.0,18185150.0


In [35]:
# Greek data for year 2020
df_tourism.loc['Greece', '2020']

7406000.0

In [47]:
# same result using positional indices
df_tourism.iloc[71, 25]

7406000.0

In [31]:
# Greek data for years 2018, 2019 and 2020
df_tourism.loc['Greece', ['2018', '2019', '2020']]

2018    33072000.0
2019    34005000.0
2020     7406000.0
Name: Greece, dtype: float64

In [48]:
# Same result with .loc and slicing
df_tourism.loc['Greece', '2018':'2020']

2018    33072000.0
2019    34005000.0
2020     7406000.0
Name: Greece, dtype: float64

In [49]:
# same result using positional indices
df_tourism.iloc[71, [23, 24, 25]]

2018    33072000.0
2019    34005000.0
2020     7406000.0
Name: Greece, dtype: float64

In [50]:
# same result using slicing
df_tourism.iloc[71, 23:26]

2018    33072000.0
2019    34005000.0
2020     7406000.0
Name: Greece, dtype: float64

In [51]:
# Return all countries for 2019 and 2020
df_tourism.loc[:,['2019', '2020']]

Country,2019,2020
Aruba,1951000.0,
Angola,218000.0,
Albania,6406000.0,2.658000e+06
Andorra,8235000.0,5.207000e+06
United Arab Emirates,25282000.0,8.084000e+06
...,...,...
Samoa,181000.0,2.390000e+04
"Yemen, Rep.",,
South Africa,14797000.0,3.886600e+06
Zambia,1266000.0,5.020000e+05


In [53]:
# Select the first 5 rows and the first 5 columns
df_tourism.iloc[:5, :5]

Country,1995,1996,1997,1998,1999
Aruba,912000.0,957000.0,947000.0,906000.0,972000.0
Angola,9000.0,21000.0,45000.0,52000.0,45000.0
Albania,304000.0,287000.0,119000.0,184000.0,371000.0
Andorra,,,,,9422000.0
United Arab Emirates,,,,,


In [54]:
# Select first 5 rows for years 2018, 2019 and 2020
df_tourism.iloc[:5, 23:26]

Country,2018,2019,2020
Aruba,1897000.0,1951000.0,
Angola,218000.0,218000.0,
Albania,5927000.0,6406000.0,2658000.0
Andorra,8328000.0,8235000.0,5207000.0
United Arab Emirates,23092000.0,25282000.0,8084000.0


### Filtering with conditions

To filter the dataframe based on a condition, we simply pass the condition as a boolean vector.
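
As a minimal sketch of the idea, the snippet below builds a boolean vector (a "mask") from a comparison and uses it to select rows. The frame `df_demo` is made up: the column names mirror the restaurants dataset, but the rows and values are purely illustrative.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {
        "restaurant_name": ["R1", "R2", "R3"],
        "region": ["Crete", "Peloponnese", "Crete"],
        "avg_rating": [4.5, 3.0, 5.0],
    }
)

# The comparison yields a boolean Series, one True/False value per row
mask = df_demo["region"] == "Crete"
crete = df_demo[mask]

# Combine conditions with & (and) or | (or), wrapping each condition in parentheses
top_crete = df_demo[(df_demo["region"] == "Crete") & (df_demo["avg_rating"] >= 5.0)]

print(mask.tolist())                          # [True, False, True]
print(crete["restaurant_name"].tolist())      # ['R1', 'R3']
print(top_crete["restaurant_name"].tolist())  # ['R3']
```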

First, let's import the Greek restaurants dataset into a new dataframe:

In [39]:
df_restaurants = pd.read_csv('tripadvisor_restaurants_greece.csv')
df_restaurants

,restaurant_name,original_location,country,region,province,city,address,latitude,longitude,claimed,...,excellent,very_good,average,poor,terrible,food,service,value,atmosphere,keywords
0,O Andreas,"[""Europe"", ""Greece"", ""Northeast Aegean Islands...",Greece,Northeast Aegean Islands,Thasos,Kallirachi,Kallirachi Greece,,,Unclaimed,...,1.0,0.0,0.0,0.0,0.0,,,,,
1,Yogart,"[""Europe"", ""Greece"", ""Peloponnese"", ""Argolis R...",Greece,Peloponnese,Argolis Region,Tolon,"13 Sekeri, Tolon 210 56 Greece",37.520878,22.859404,Claimed,...,26.0,0.0,1.0,1.0,0.0,5.0,5.0,4.5,,
2,Thalassa Tavern,"[""Europe"", ""Greece"", ""Peloponnese"", ""Argolis R...",Greece,Peloponnese,Argolis Region,Tolon,"14 Atkis st, Tolon 210 56 Greece",37.520290,22.859080,Unclaimed,...,5.0,3.0,0.0,0.0,2.0,4.0,4.5,4.0,,
3,Bob's Snack Cafe,"[""Europe"", ""Greece"", ""Peloponnese"", ""Argolis R...",Greece,Peloponnese,Argolis Region,Tolon,"Sekeri 59, Tolon 21056 Greece",37.517820,22.858550,Claimed,...,10.0,8.0,3.0,0.0,0.0,4.0,4.0,4.0,,
4,Ormos,"[""Europe"", ""Greece"", ""Peloponnese"", ""Argolis R...",Greece,Peloponnese,Argolis Region,Tolon,"Aktis 8, Tolon 21056 Greece",37.520450,22.859852,Claimed,...,21.0,2.0,1.0,0.0,0.0,4.5,5.0,4.5,,"fresh fish, seafood, big portions, visited thi..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33758,Το Giasemi,"[""Europe"", ""Greece"", ""Crete"", ""Heraklion Prefe...",Greece,Crete,Heraklion Prefecture,Fodele,"Fodele, Crete 71500 Greece",35.381607,24.957891,Unclaimed,...,15.0,2.0,0.0,0.0,0.0,5.0,5.0,5.0,,
33759,Virage Café & Bar,"[""Europe"", ""Greece"", ""Crete"", ""Chania Prefectu...",Greece,Crete,Chania Prefecture,Kalamaki,"Kalamaki, Chania Town, Crete 73100 Greece",35.512974,23.967583,Claimed,...,17.0,2.0,5.0,2.0,2.0,4.5,4.5,4.5,,
33760,Salavantes - Garden Restaurant & bar,"[""Europe"", ""Greece"", ""Crete"", ""Chania Prefectu...",Greece,Crete,Chania Prefecture,Kalamaki,"5th km Chania-Kissamos old road, Kalamaki, Cha...",35.512150,23.966623,Claimed,...,91.0,15.0,2.0,0.0,1.0,4.5,5.0,4.5,,"salmon, seafood, friendly staff and delicious ..."
33761,Kalamaki Restaurant Beach Bar,"[""Europe"", ""Greece"", ""Crete"", ""Chania Prefectu...",Greece,Crete,Chania Prefecture,Kalamaki,"PEO Kissamou Chanion Kalamaki Beach, Kalamaki,...",35.513256,23.969404,Claimed,...,85.0,62.0,29.0,6.0,8.0,4.0,4.5,4.0,4.0,"sea bream, filet, tzatziki, burger, great plac..."


Let's view the columns of the dataset:

In [40]:
df_restaurants.columns

Index(['restaurant_name', 'original_location', 'country', 'region', 'province',
       'city', 'address', 'latitude', 'longitude', 'claimed', 'awards',
       'popularity_detailed', 'popularity_generic', 'top_tags', 'price_level',
       'price_range', 'meals', 'cuisines', 'special_diets', 'features',
       'vegetarian_friendly', 'vegan_options', 'gluten_free',
       'original_open_hours', 'open_days_per_week', 'open_hours_per_week',
       'working_shifts_per_week', 'avg_rating', 'total_reviews_count',
       'default_language', 'reviews_count_in_default_language', 'excellent',
       'very_good', 'average', 'poor', 'terrible', 'food', 'service', 'value',
       'atmosphere', 'keywords'],
      dtype='object')

To list the distinct values of a specific column, e.g. `region`, we call `unique()` on that column:

In [41]:
df_restaurants.region.unique()

array(['Northeast Aegean Islands', 'Peloponnese', 'Crete', 'South Aegean',
       'Attica', 'Central Macedonia', 'West Greece', 'Epirus', 'Thessaly',
       'Ionian Islands', 'Central Greece', 'Sporades',
       'East Macedonia and Thrace', 'West Macedonia', nan], dtype=object)
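
Note that `unique()` includes `nan` among the values. As a small sketch on a made-up toy dataframe (the name `toy` and its contents are illustrative assumptions, not part of the restaurants dataset), `nunique()` and `value_counts()` are two related methods that answer "how many distinct values?" and "how many rows per value?":

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_restaurants with one missing region (made-up data)
toy = pd.DataFrame({"region": ["Attica", "Crete", "Attica", np.nan]})

# unique() returns the distinct values, including the missing one
unique_values = toy["region"].unique()

# nunique() counts distinct values and ignores missing values by default
n_unique = toy["region"].nunique()

# value_counts() counts rows per value; dropna=False also counts missing values
counts = toy["region"].value_counts(dropna=False)

print(len(unique_values), n_unique)
print(counts)
```

The bracket form `toy["region"]` is equivalent to the attribute form `toy.region` used above, and it also works for column names that contain spaces or clash with method names.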

In [42]:
# build a boolean Series: True where the restaurant's region is 'Attica', False otherwise
condition = df_restaurants['region'] == 'Attica'
print(condition)

0        False
1        False
2        False
3        False
4        False
         ...  
33758    False
33759    False
33760    False
33761    False
33762    False
Name: region, Length: 33763, dtype: bool
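
Because `True` counts as 1 and `False` as 0, summing a boolean Series tells us how many rows satisfy the condition before we filter anything. The following is a minimal sketch on a made-up toy dataframe (the names `toy`, `mask` and `n_matching` are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_restaurants with one missing region (made-up data)
toy = pd.DataFrame({"region": ["Attica", "Crete", "Attica", np.nan]})

# Comparing a column with a value yields one boolean per row;
# a missing region compares as False
mask = toy["region"] == "Attica"

# Summing the mask counts the rows where the condition holds
n_matching = int(mask.sum())

# Indexing with the mask keeps exactly those rows and their original index labels
filtered = toy[mask]

print(n_matching, len(filtered))
```

Checking the count first is a cheap way to catch a misspelled value, which would produce an all-`False` mask and an empty result.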


In [43]:
# index the DataFrame with the boolean Series to keep only the rows where it is True
df_restaurants[condition]

Unnamed: 0,restaurant_name,original_location,country,region,province,city,address,latitude,longitude,claimed,...,excellent,very_good,average,poor,terrible,food,service,value,atmosphere,keywords
144,Paul Boulangerie Patisserie,"[""Europe"", ""Greece"", ""Attica"", ""Kifissia""]",Greece,Attica,,Kifissia,"4 Levidou, Kifissia 145 62 Greece",38.072334,23.813755,Unclaimed,...,4.0,11.0,5.0,7.0,10.0,3.5,3.0,3.0,,
145,Recipe,"[""Europe"", ""Greece"", ""Attica"", ""Kifissia""]",Greece,Attica,,Kifissia,"12 Kassaveti, Kifissia 145 62 Greece",38.072063,23.814125,Claimed,...,6.0,6.0,2.0,0.0,1.0,4.0,4.5,4.0,,
146,To Koytouki,"[""Europe"", ""Greece"", ""Attica"", ""Kifissia""]",Greece,Attica,,Kifissia,"Lewforos Khfhsίas 308 kai Krnths, Kifissia 145...",38.083504,23.816168,Unclaimed,...,8.0,1.0,0.0,0.0,0.0,4.0,4.0,4.0,,
147,Tilemaxos,"[""Europe"", ""Greece"", ""Attica"", ""Kifissia""]",Greece,Attica,,Kifissia,"Fragopoulou 19, Kifissia 145 61 Greece",38.078064,23.799950,Claimed,...,15.0,15.0,5.0,4.0,1.0,4.0,4.0,3.5,3.5,
148,Alevri kai Nero,"[""Europe"", ""Greece"", ""Attica"", ""Kifissia""]",Greece,Attica,,Kifissia,"45 Patron, Kifissia 145 64 Greece",38.096710,23.798890,Unclaimed,...,0.0,0.0,0.0,0.0,1.0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33746,Casanova Pizza & Vino,"[""Europe"", ""Greece"", ""Attica"", ""Poros""]",Greece,Attica,,Poros,"POROS GREECE, Poros 180 20 Greece",37.498386,23.454212,Claimed,...,55.0,4.0,3.0,1.0,0.0,5.0,4.5,4.5,,"pizza, prosciutto, salad, background music, vi..."
33747,Sail Cocktail Bar,"[""Europe"", ""Greece"", ""Attica"", ""Poros""]",Greece,Attica,,Poros,"Paraliaki Odos Poroy, Poros 180 20 Greece",37.498400,23.455635,Unclaimed,...,5.0,0.0,0.0,0.0,1.0,,,,,
33748,Odyssey Poros,"[""Europe"", ""Greece"", ""Attica"", ""Poros""]",Greece,Attica,,Poros,"Askeli, Kiani Akti Odyssey Apartments, Poros 1...",37.508907,23.470161,Claimed,...,72.0,7.0,3.0,1.0,2.0,5.0,5.0,5.0,,
33749,Sofrano Cafe Bar,"[""Europe"", ""Greece"", ""Attica"", ""Poros""]",Greece,Attica,,Poros,Poros Greece,37.498272,23.453160,Unclaimed,...,5.0,2.0,0.0,0.0,1.0,4.5,5.0,4.5,,
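
The filtered result above still contains every column. As a sketch on a made-up toy dataframe (the data and the name `subset` are illustrative assumptions), `loc` accepts the boolean Series for the rows and a list of column names for the columns, so filtering and column selection can happen in one step:

```python
import pandas as pd

# Toy stand-in for df_restaurants (made-up data, a few of the real column names)
toy = pd.DataFrame(
    {
        "restaurant_name": ["A", "B", "C"],
        "region": ["Attica", "Crete", "Attica"],
        "avg_rating": [4.5, 4.0, 3.5],
    }
)

mask = toy["region"] == "Attica"

# Rows where mask is True, restricted to two columns
subset = toy.loc[mask, ["restaurant_name", "avg_rating"]]

print(subset)
```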


In [44]:
# multiple conditions can be combined in a single expression
df_restaurants[(df_restaurants['region'] == 'Attica') & (df_restaurants['claimed'] == 'Claimed')]

# note: combine conditions element-wise with '&' (and), '|' (or) and '~' (not);
# Python's 'and', 'or' and 'not' raise a ValueError when applied to a Series
# enclose each condition in parentheses, because '&' and '|' bind more tightly than '=='

Unnamed: 0,restaurant_name,original_location,country,region,province,city,address,latitude,longitude,claimed,...,excellent,very_good,average,poor,terrible,food,service,value,atmosphere,keywords
145,Recipe,"[""Europe"", ""Greece"", ""Attica"", ""Kifissia""]",Greece,Attica,,Kifissia,"12 Kassaveti, Kifissia 145 62 Greece",38.072063,23.814125,Claimed,...,6.0,6.0,2.0,0.0,1.0,4.0,4.5,4.0,,
147,Tilemaxos,"[""Europe"", ""Greece"", ""Attica"", ""Kifissia""]",Greece,Attica,,Kifissia,"Fragopoulou 19, Kifissia 145 61 Greece",38.078064,23.799950,Claimed,...,15.0,15.0,5.0,4.0,1.0,4.0,4.0,3.5,3.5,
149,Salmatanis,"[""Europe"", ""Greece"", ""Attica"", ""Kifissia""]",Greece,Attica,,Kifissia,"Syggrou 18, Kifissia 14562 Greece",38.068270,23.815060,Claimed,...,2.0,1.0,0.0,0.0,1.0,3.5,4.0,3.5,,
151,Klytemnistra Live Music Bar Restaurant Cafe,"[""Europe"", ""Greece"", ""Attica"", ""Kifissia""]",Greece,Attica,,Kifissia,"Kolokotroni 9, Kifissia 145 62 Greece",38.072704,23.816504,Claimed,...,5.0,2.0,0.0,0.0,0.0,4.5,4.5,4.5,,
152,The Gozleme,"[""Europe"", ""Greece"", ""Attica"", ""Kifissia""]",Greece,Attica,,Kifissia,"Leoforos Kifisias 238-240 Mela Mall, Kifissia ...",38.072730,23.812500,Claimed,...,0.0,1.0,0.0,0.0,0.0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33743,Mr. Falafel,"[""Europe"", ""Greece"", ""Attica"", ""Piraeus Region...",Greece,Attica,Piraeus Region,Korydallos,"St. Taksiarxon 15 Taksiarxon 15, Korydallos 18...",37.943634,23.646166,Claimed,...,0.0,0.0,1.0,0.0,0.0,,,,,
33744,Simply Burgers Korydallos,"[""Europe"", ""Greece"", ""Attica"", ""Piraeus Region...",Greece,Attica,Piraeus Region,Korydallos,"238 Leoforos Labraki Grigoriou, Korydallos 181...",37.980778,23.649208,Claimed,...,2.0,7.0,7.0,1.0,1.0,4.0,4.5,2.5,,
33746,Casanova Pizza & Vino,"[""Europe"", ""Greece"", ""Attica"", ""Poros""]",Greece,Attica,,Poros,"POROS GREECE, Poros 180 20 Greece",37.498386,23.454212,Claimed,...,55.0,4.0,3.0,1.0,0.0,5.0,4.5,4.5,,"pizza, prosciutto, salad, background music, vi..."
33748,Odyssey Poros,"[""Europe"", ""Greece"", ""Attica"", ""Poros""]",Greece,Attica,,Poros,"Askeli, Kiani Akti Odyssey Apartments, Poros 1...",37.508907,23.470161,Claimed,...,72.0,7.0,3.0,1.0,2.0,5.0,5.0,5.0,,
