# Inbound Travel Data Worldwide

## Notebook Description
This notebook prepares and provides data on inbound tourism worldwide.

## Data Source

**UNWTO - The World Tourism Organization**

The World Tourism Organization (UNWTO) is the United Nations agency responsible for the promotion of responsible, sustainable and universally accessible tourism.

**UNWTO TOURISM DATA DASHBOARD**

The UNWTO Tourism Data Dashboard provides statistics and insights on key indicators for inbound and outbound tourism at the global, regional and national levels.

UNWTO collects the data from countries through annual questionnaires aligned with the International Recommendations for Tourism Statistics 2008 (IRTS 2008), a standard led by UNWTO and approved by the United Nations.

The latest update took place on 22 December 2022.

Access the data: https://www.unwto.org/tourism-statistics/key-tourism-statistics

## Imports

In [1]:
import pandas as pd
import numpy as np

## Data Loading

In [2]:
df_in = pd.read_csv('data/inbound_tourism.csv', sep=';', thousands='.', decimal=',')
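The `thousands='.'` and `decimal=','` arguments tell the parser that the file uses European number formatting. A minimal sketch on a made-up in-memory sample (not the real file) shows how such values are read:

```python
import io

import pandas as pd

# Made-up sample in the same layout: ';' separates fields,
# '.' groups thousands and ',' marks the decimal point
sample = "country;2019;2020\nGERMANY;39.563;12.449\nZIMBABWE;0,3;3\n"

df = pd.read_csv(io.StringIO(sample), sep=";", thousands=".", decimal=",")
print(df)
```

Here `39.563` is read as the number 39563 and `0,3` as 0.3; columns that contain other text are left as strings, which is why the year columns are cleaned further below.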

In [3]:
df_in.columns

Index(['Basic data and indicators', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3',
       'Unnamed: 4', 'Units', 'Notes', 'Series', '1995', '1996', '1997',
       '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006',
       '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015',
       '2016', '2017', '2018', '2019', '2020', '2021', 'Unnamed: 35',
       'Unnamed: 36', 'Unnamed: 37', 'Unnamed: 38', 'Unnamed: 39',
       'Unnamed: 40'],
      dtype='object')

In [4]:
col_year = ['1995', '1996', '1997',
       '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006',
       '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015',
       '2016', '2017', '2018', '2019', '2020', '2021']

In [5]:
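for i in col_year:
    # Strip '.' thousands separators from the raw strings.
    # regex=False treats '.' as a literal dot, not as a regex wildcard.
    # Caution: this also removes decimal points, e.g. '0.3' becomes '03'.
    df_in[i] = df_in[i].astype(str).str.replace('.', '', regex=False)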
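Removing every dot also removes decimal points. A minimal sketch on made-up values contrasts the blanket replacement with a hypothetical alternative that only strips a dot when exactly three digits follow it (assuming such dots are always thousands separators):

```python
import pandas as pd

# Made-up raw values: dot-grouped thousands plus one decimal value
raw = pd.Series(["30.411", "1.234.567", "0.3"])

# Blanket removal of every '.', as in the loop above: '0.3' ends up as 3
naive = pd.to_numeric(raw.str.replace(".", "", regex=False), errors="coerce")

# Remove a '.' only if it is followed by exactly three digits and then
# another '.' or the end of the string, i.e. if it looks like a thousands separator
safer = pd.to_numeric(
    raw.str.replace(r"\.(?=\d{3}(?:\.|$))", "", regex=True), errors="coerce"
)

print(naive.tolist())
print(safer.tolist())
```

The pattern is a heuristic: a genuine decimal value with exactly three decimals (e.g. `1.234` meaning 1.234) would still be misread, so it only helps if decimals never use a dot in the raw strings.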




In [6]:
# Fill missing indicator labels in 'Unnamed: 3' from the neighbouring label column
df_in['Unnamed: 3'] = df_in['Unnamed: 3'].fillna(df_in['Unnamed: 2'])

In [7]:
df_in['Unnamed: 3'] = df_in['Unnamed: 3'].fillna(df_in['Unnamed: 4'])
df_in['Unnamed: 3'].head(10)
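The two `fillna` calls take the first available label in the order 'Unnamed: 3', 'Unnamed: 2', 'Unnamed: 4'. A minimal sketch on a toy frame (labels copied from the output below, layout simplified) expresses the same priority with `combine_first`:

```python
import numpy as np
import pandas as pd

# Toy version of the three label columns
labels = pd.DataFrame({
    "Unnamed: 2": ["Total arrivals", np.nan, np.nan],
    "Unnamed: 3": [np.nan, "Overnights visitors (tourists)", np.nan],
    "Unnamed: 4": [np.nan, np.nan, "of which, cruise passengers"],
})

# Keep 'Unnamed: 3' where present, otherwise fall back to column 2, then column 4
labels["ind"] = (
    labels["Unnamed: 3"]
    .combine_first(labels["Unnamed: 2"])
    .combine_first(labels["Unnamed: 4"])
)
print(labels["ind"].tolist())
```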

0                                  NaN
1                                  NaN
2                       Total arrivals
3       Overnights visitors (tourists)
4    Same-day visitors (excursionists)
5          of which, cruise passengers
6                                  NaN
7                                Total
8                               Africa
9                             Americas
Name: Unnamed: 3, dtype: object

In [8]:
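# Country and category labels appear only on the first row of each block;
# carry them down (.ffill() replaces the deprecated fillna(method='ffill'))
df_in['Basic data and indicators'] = df_in['Basic data and indicators'].ffill()
df_in['Unnamed: 1'] = df_in['Unnamed: 1'].ffill()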
df_in.head()
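A plain column-wise forward fill can carry a category from the last rows of one country into the header row of the next. In the layout shown above those header rows have no indicator label and appear to be dropped in the next step, so this is a precaution rather than a fix. A minimal sketch on a toy frame that fills the category per country:

```python
import numpy as np
import pandas as pd

# Toy hierarchical layout: labels only on the first row of each block
raw = pd.DataFrame({
    "Basic data and indicators": ["AFGHANISTAN", np.nan, np.nan, "ALBANIA", np.nan],
    "Unnamed: 1": [np.nan, "Arrivals", np.nan, np.nan, "Arrivals"],
})

filled = raw.copy()
filled["Basic data and indicators"] = filled["Basic data and indicators"].ffill()
# Fill the category only within each country so labels cannot leak across blocks
filled["Unnamed: 1"] = filled.groupby("Basic data and indicators")["Unnamed: 1"].ffill()
print(filled)
```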

Unnamed: 0,Basic data and indicators,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Units,Notes,Series,1995,1996,...,2018,2019,2020,2021,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40
0,AFGHANISTAN,,,,,,,,,,...,,,,,,,,,,
1,AFGHANISTAN,Arrivals,,,,,,,,,...,,,,,,,,,,
2,AFGHANISTAN,Arrivals,Total arrivals,Total arrivals,,Thousands,,,,,...,,,,,,,,,,
3,AFGHANISTAN,Arrivals,,Overnights visitors (tourists),,Thousands,,,,,...,,,,,,,,,,
4,AFGHANISTAN,Arrivals,,Same-day visitors (excursionists),,Thousands,,,,,...,,,,,,,,,,


In [9]:
# Drop rows without an indicator label (e.g. the country and category header rows)
df_in = df_in.dropna(subset=['Unnamed: 3'])

df_in.head()

Unnamed: 0,Basic data and indicators,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Units,Notes,Series,1995,1996,...,2018,2019,2020,2021,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40
2,AFGHANISTAN,Arrivals,Total arrivals,Total arrivals,,Thousands,,,,,...,,,,,,,,,,
3,AFGHANISTAN,Arrivals,,Overnights visitors (tourists),,Thousands,,,,,...,,,,,,,,,,
4,AFGHANISTAN,Arrivals,,Same-day visitors (excursionists),,Thousands,,,,,...,,,,,,,,,,
5,AFGHANISTAN,Arrivals,,"of which, cruise passengers","of which, cruise passengers",Thousands,,,,,...,,,,,,,,,,
7,AFGHANISTAN,Arrivals by region,Total,Total,,Thousands,,,,,...,,,,,,,,,,


In [10]:
df_in.drop(columns=['Unnamed: 2', 'Unnamed: 4', 'Notes', 'Series',
                    'Unnamed: 35', 'Unnamed: 36', 'Unnamed: 37',
                    'Unnamed: 38', 'Unnamed: 39', 'Unnamed: 40'],
           inplace=True)

In [11]:
df_in.rename(columns=lambda x : x.lower(), inplace=True)
df_in.rename(columns=lambda x : x.replace(' ', '_'), inplace=True)
df_in.columns

Index(['basic_data_and_indicators', 'unnamed:_1', 'unnamed:_3', 'units',
       '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003',
       '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012',
       '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021'],
      dtype='object')

In [12]:
# 'iso3' still holds country names at this point; they are mapped to ISO3 codes further down
df_in = df_in.rename(columns={'basic_data_and_indicators': 'iso3', 'unnamed:_1': 'category', 'unnamed:_3': 'ind'})

In [13]:
df_in.columns

Index(['iso3', 'category', 'ind', 'units', '1995', '1996', '1997', '1998',
       '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007',
       '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016',
       '2017', '2018', '2019', '2020', '2021'],
      dtype='object')

In [14]:
col_to_keep = ['iso3', 'category', 'ind', 'units',
               '2012', '2013', '2014', '2015', '2016',
               '2017', '2018', '2019', '2020', '2021']

In [15]:
df_in = df_in.loc[:,col_to_keep]

In [16]:
df_in.columns

Index(['iso3', 'category', 'ind', 'units', '2012', '2013', '2014', '2015',
       '2016', '2017', '2018', '2019', '2020', '2021'],
      dtype='object')

In [17]:
df_in.head()

Unnamed: 0,iso3,category,ind,units,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
2,AFGHANISTAN,Arrivals,Total arrivals,Thousands,,,,,,,,,,
3,AFGHANISTAN,Arrivals,Overnights visitors (tourists),Thousands,,,,,,,,,,
4,AFGHANISTAN,Arrivals,Same-day visitors (excursionists),Thousands,,,,,,,,,,
5,AFGHANISTAN,Arrivals,"of which, cruise passengers",Thousands,,,,,,,,,,
7,AFGHANISTAN,Arrivals by region,Total,Thousands,,,,,,,,,,


In [18]:
col_year = ['2012', '2013', '2014', '2015', '2016',
            '2017', '2018', '2019', '2020', '2021']

In [19]:
df_in.category.head(10)

2               Arrivals
3               Arrivals
4               Arrivals
5               Arrivals
7     Arrivals by region
8     Arrivals by region
9     Arrivals by region
10    Arrivals by region
11    Arrivals by region
12    Arrivals by region
Name: category, dtype: object

In [20]:
sel_category = ['Arrivals by region']

In [21]:
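# .copy() makes the subset independent of df_in, so later edits do not raise SettingWithCopyWarning
arrivel_region = df_in[df_in['category'].isin(sel_category)].copy()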
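Taking an explicit copy when selecting a subset avoids the `SettingWithCopyWarning` messages that pandas emits on later in-place changes, and guarantees the original frame is untouched. A minimal sketch on a toy frame:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"category": ["Arrivals", "Arrivals by region"], "2019": ["1", "2"]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The filtered frame is an independent copy, so these edits are unambiguous
    subset = df[df["category"].isin(["Arrivals by region"])].copy()
    subset["2019"] = pd.to_numeric(subset["2019"], errors="coerce")
    subset = subset.drop(columns="category")

print(subset)
```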

In [22]:
print(arrivel_region[arrivel_region['iso3'] == 'GERMANY'].head())

         iso3            category                        ind      units  \
2632  GERMANY  Arrivals by region                      Total  Thousands   
2633  GERMANY  Arrivals by region                     Africa  Thousands   
2634  GERMANY  Arrivals by region                   Americas  Thousands   
2635  GERMANY  Arrivals by region  East Asia and the Pacific  Thousands   
2636  GERMANY  Arrivals by region                     Europe  Thousands   

       2012   2013   2014   2015   2016   2017   2018   2019   2020   2021  
2632  30411  31545  32999  34970  35555  37452  38881  39563  12449  11688  
2633    227    246    254    274    263    271    273    284     74     62  
2634   3155   3192   3272   3487   3509   3890   4085   4142    797    874  
2635   2666   2805   3024   3556   3489   3889   3965   3963    579    314  
2636  23121  23899  24915  25951  26568  27635  28681  29489  10595  10048  


In [23]:
arrivel_region = arrivel_region.drop(columns='category')



In [24]:
arrivel_region

Unnamed: 0,iso3,ind,units,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
7,AFGHANISTAN,Total,Thousands,,,,,,,,,,
8,AFGHANISTAN,Africa,Thousands,,,,,,,,,,
9,AFGHANISTAN,Americas,Thousands,,,,,,,,,,
10,AFGHANISTAN,East Asia and the Pacific,Thousands,,,,,,,,,,
11,AFGHANISTAN,Europe,Thousands,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
7781,ZIMBABWE,Europe,Thousands,114,131,144,153,141,223,237,191,37,71
7782,ZIMBABWE,Middle East,Thousands,1,1,2,1,1,3,2,4,03,03
7783,ZIMBABWE,South Asia,Thousands,2,3,2,7,5,9,14,13,3,3
7784,ZIMBABWE,Other not classified,Thousands,,,,,,,,,,


In [25]:
arrivel_region.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2007 entries, 7 to 7785
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   iso3    2007 non-null   object
 1   ind     2007 non-null   object
 2   units   2007 non-null   object
 3   2012    2007 non-null   object
 4   2013    2007 non-null   object
 5   2014    2007 non-null   object
 6   2015    2007 non-null   object
 7   2016    2007 non-null   object
 8   2017    2007 non-null   object
 9   2018    2007 non-null   object
 10  2019    2007 non-null   object
 11  2020    2007 non-null   object
 12  2021    2007 non-null   object
dtypes: object(13)
memory usage: 219.5+ KB


In [26]:
col_num = ['2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021']

for i in col_num:
    # Parse each year column as numbers; entries that cannot be parsed become NaN
    arrivel_region[i] = pd.to_numeric(arrivel_region[i], errors='coerce')
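The loop can also be written as a single `apply` over the year columns. A minimal sketch on a toy frame, where `'nan'` mimics what `astype(str)` leaves behind and `'..'` stands in for any other non-numeric placeholder:

```python
import pandas as pd

# Toy frame: year columns stored as strings
frame = pd.DataFrame({"2019": ["39563", "nan", ".."], "2020": ["12449", "797", "n/a"]})
year_cols = ["2019", "2020"]

# Convert all year columns at once; errors='coerce' turns unparseable values into NaN
frame[year_cols] = frame[year_cols].apply(pd.to_numeric, errors="coerce")
print(frame.dtypes)
```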



In [27]:
arrivel_region.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2007 entries, 7 to 7785
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   iso3    2007 non-null   object 
 1   ind     2007 non-null   object 
 2   units   2007 non-null   object 
 3   2012    1167 non-null   float64
 4   2013    1178 non-null   float64
 5   2014    1169 non-null   float64
 6   2015    1186 non-null   float64
 7   2016    1187 non-null   float64
 8   2017    1180 non-null   float64
 9   2018    1165 non-null   float64
 10  2019    1150 non-null   float64
 11  2020    1081 non-null   float64
 12  2021    956 non-null    float64
dtypes: float64(10), object(3)
memory usage: 219.5+ KB


In [28]:
from transform_esg import melt_pivot
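`melt_pivot` comes from the project's own `transform_esg` module, whose code is not shown here. Judging from its output (one row per country and year, one column per region, a `variable` column later renamed to `year`), a hypothetical stand-in could melt the year columns into rows and pivot the indicators into columns. This is a sketch, not the module's actual implementation:

```python
import pandas as pd


def melt_pivot_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for transform_esg.melt_pivot."""
    # Year columns are the ones whose names are all digits, e.g. '2012'
    year_cols = [c for c in df.columns if str(c).isdigit()]
    # Wide -> long: one row per (country, indicator, year);
    # melt names the year column 'variable' and the numbers 'value'
    long = df.melt(id_vars=["iso3", "ind"], value_vars=year_cols)
    # Long -> wide: one column per indicator (region)
    wide = long.pivot(index=["iso3", "variable"], columns="ind", values="value")
    return wide.reset_index()


# Toy input using the GERMANY figures shown above
toy = pd.DataFrame({
    "iso3": ["GERMANY", "GERMANY"],
    "ind": ["Total", "Africa"],
    "units": ["Thousands", "Thousands"],
    "2019": [39563.0, 284.0],
    "2020": [12449.0, 74.0],
})
result = melt_pivot_sketch(toy)
print(result)
```

`pivot` raises an error if a (country, year, indicator) combination occurs twice, which doubles as a cheap duplicate check.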

In [29]:
arrivel_region_trans = melt_pivot(arrivel_region)

In [30]:
arrivel_region_trans.rename(columns=lambda x : x.lower(), inplace=True)
arrivel_region_trans.rename(columns=lambda x : x.replace(' ', '_'), inplace=True)
arrivel_region_trans.rename({'variable':'year'}, axis=1, inplace=True)
arrivel_region_trans.columns

Index(['iso3', 'year', 'africa', 'americas', 'east_asia_and_the_pacific',
       'europe', 'middle_east', 'other_not_classified', 'south_asia', 'total',
       'of_which,_nationals_residing_abroad'],
      dtype='object', name='ind')

In [31]:
arrivel_region_trans.head()

ind,iso3,year,africa,americas,east_asia_and_the_pacific,europe,middle_east,other_not_classified,south_asia,total,"of_which,_nationals_residing_abroad"
0,AFGHANISTAN,2012,,,,,,,,,
1,AFGHANISTAN,2013,,,,,,,,,
2,AFGHANISTAN,2014,,,,,,,,,
3,AFGHANISTAN,2015,,,,,,,,,
4,AFGHANISTAN,2016,,,,,,,,,


In [32]:
arrivel_region_trans.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2230 entries, 0 to 2229
Data columns (total 11 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   iso3                                 2230 non-null   object 
 1   year                                 2230 non-null   object 
 2   africa                               1261 non-null   float64
 3   americas                             1751 non-null   float64
 4   east_asia_and_the_pacific            1555 non-null   float64
 5   europe                               1765 non-null   float64
 6   middle_east                          979 non-null    float64
 7   other_not_classified                 1032 non-null   float64
 8   south_asia                           1047 non-null   float64
 9   total                                1780 non-null   float64
 10  of_which,_nationals_residing_abroad  249 non-null    float64
dtypes: float64(9), object(2)
memor

In [33]:
arrivel_region_trans['iso3'].nunique()

223

In [34]:
from country_iso_dict import country_iso_dict

193


In [135]:
country_iso_dict = {key.upper(): value for key, value in country_iso_dict.items()}

In [136]:
arrivel_region_iso = arrivel_region_trans.copy()

In [137]:
arrivel_region_iso['iso3'] = arrivel_region_iso['iso3'].map(country_iso_dict).fillna(arrivel_region_iso['iso3'])
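The mapping keeps unmatched names as they are, which makes them easy to spot afterwards. A minimal sketch with hypothetical excerpts of the two lookup tables (the real `country_iso_dict` contents are not shown here), merging the manual fixes into one case-insensitive lookup:

```python
import pandas as pd

# Hypothetical excerpts of the lookup tables used in this notebook
country_iso_dict = {"Afghanistan": "AFG", "Zimbabwe": "ZWE"}
country_iso3_spec = {"UNITED KINGDOM": "GBR"}

names = pd.Series(["AFGHANISTAN", "UNITED KINGDOM", "HONG KONG, CHINA"])

# Upper-case the keys to match the UNWTO spelling, then add the manual fixes
lookup = {k.upper(): v for k, v in country_iso_dict.items()}
lookup.update(country_iso3_spec)

# Unmatched names keep their original text
codes = names.map(lookup).fillna(names)
print(codes.tolist())
```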

In [138]:
arrivel_region_iso

ind,iso3,year,africa,americas,east_asia_and_the_pacific,europe,middle_east,other_not_classified,south_asia,total,"of_which,_nationals_residing_abroad"
0,AFG,2012,,,,,,,,,
1,AFG,2013,,,,,,,,,
2,AFG,2014,,,,,,,,,
3,AFG,2015,,,,,,,,,
4,AFG,2016,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
2225,ZWE,2017,1948.0,121.0,120.0,223.0,3.0,,9.0,2423.0,
2226,ZWE,2018,2064.0,120.0,142.0,237.0,2.0,,14.0,2580.0,
2227,ZWE,2019,1872.0,101.0,113.0,191.0,4.0,,13.0,2294.0,
2228,ZWE,2020,568.0,17.0,14.0,37.0,,,3.0,639.0,


In [139]:
from country_iso_dict import list_off  # reference list of 193 ISO3 codes

iso3_unique = set(arrivel_region_iso['iso3'].unique())
off_unique = set(list_off)

missing_iso = off_unique - iso3_unique
missing_iso

{'BOL',
 'CIV',
 'COD',
 'CZE',
 'FSM',
 'GBR',
 'IRN',
 'KOR',
 'LAO',
 'MDA',
 'PRK',
 'TUR',
 'VEN'}

In [140]:
country_iso3_spec = {
    'BOLIVIA, PLURINATIONAL STATE OF': 'BOL',
    'COTE D´IVOIRE': 'CIV',
    'CONGO, DEMOCRATIC REPUBLIC OF THE': 'COD',
    'CZECH REPUBLIC (CZECHIA)': 'CZE',
    'MICRONESIA, FEDERATED STATES OF': 'FSM',
    'UNITED KINGDOM': 'GBR',
    'IRAN, ISLAMIC REPUBLIC OF': 'IRN',
    'KOREA, REPUBLIC OF': 'KOR',
    'LAO PEOPLE´S DEMOCRATIC REPUBLIC': 'LAO',
    'MOLDOVA, REPUBLIC OF': 'MDA',
    'KOREA, DEMOCRATIC PEOPLE´S REPUBLIC OF': 'PRK',
    'TÜRKIYE': 'TUR',
    'VENEZUELA, BOLIVARIAN REPUBLIC OF': 'VEN'
}

In [141]:
arrivel_region_iso['iso3'] = arrivel_region_iso['iso3'].replace(country_iso3_spec)

In [142]:
arrivel_region_iso['iso3'].nunique()

223

In [143]:
arrivel_region_iso['iso3']

0       AFG
1       AFG
2       AFG
3       AFG
4       AFG
       ... 
2225    ZWE
2226    ZWE
2227    ZWE
2228    ZWE
2229    ZWE
Name: iso3, Length: 2230, dtype: object

In [65]:
from country_iso_dict import list_off

In [145]:
print(len(list_off))

193


In [146]:
# .copy() avoids SettingWithCopyWarning when the 'year' column is converted later
arrivel_region_off = arrivel_region_iso[arrivel_region_iso['iso3'].isin(list_off)].copy()

In [147]:
arrivel_region_off['iso3'].nunique()

193

In [149]:
trans_iso3_unique = set(arrivel_region_iso['iso3'].unique())
off_iso3_unique = set(arrivel_region_off['iso3'].unique())

missing_iso3 = trans_iso3_unique - off_iso3_unique
missing_iso3

{'ABW',
 'AIA',
 'ASM',
 'BMU',
 'BONAIRE',
 'BRITISH VIRGIN ISLANDS',
 'COK',
 'CUW',
 'CYM',
 'GLP',
 'GUF',
 'GUM',
 'HONG KONG, CHINA',
 'MACAO, CHINA',
 'MNP',
 'MSR',
 'MTQ',
 'NCL',
 'NIU',
 'PRI',
 'PYF',
 'REUNION',
 'SABA',
 'SERBIA AND MONTENEGRO',
 'SINT EUSTATIUS',
 'STATE OF PALESTINE',
 'SXM',
 'TAIWAN PROVINCE OF CHINA',
 'TCA',
 'UNITED STATES VIRGIN ISLANDS'}

The entries above are in the UNWTO data but not in `list_off`; names still spelled out in full were not matched by the ISO3 lookup. None of them is a UN member state in its own right:

ABW: Aruba is a constituent country of the Kingdom of the Netherlands.  
AIA: Anguilla is a British Overseas Territory.  
ASM: American Samoa is an unincorporated territory of the United States.  
BMU: Bermuda is a British Overseas Territory.  
BONAIRE: Bonaire is a special municipality of the Netherlands.  
BRITISH VIRGIN ISLANDS: The British Virgin Islands are a British Overseas Territory.  
COK: The Cook Islands are a self-governing state in free association with New Zealand.  
CUW: Curaçao is a constituent country of the Kingdom of the Netherlands.  
CYM: The Cayman Islands are a British Overseas Territory.  
GLP: Guadeloupe is an overseas department and region of France.  
GUF: French Guiana is an overseas department and region of France.  
GUM: Guam is an unincorporated territory of the United States.  
HONG KONG, CHINA: Hong Kong is a special administrative region of China.  
MACAO, CHINA: Macao is a special administrative region of China.  
MNP: The Northern Mariana Islands are a commonwealth in political union with the United States.  
MSR: Montserrat is a British Overseas Territory.  
MTQ: Martinique is an overseas department and region of France.  
NCL: New Caledonia is a special collectivity of France.  
NIU: Niue is a self-governing state in free association with New Zealand.  
PRI: Puerto Rico is an unincorporated territory of the United States.  
PYF: French Polynesia is an overseas collectivity of France.  
REUNION: Réunion is an overseas department and region of France.  
SABA: Saba is a special municipality of the Netherlands.  
SERBIA AND MONTENEGRO: Serbia and Montenegro was a state union that dissolved in 2006; Serbia and Montenegro are now separate UN member states.  
SINT EUSTATIUS: Sint Eustatius is a special municipality of the Netherlands.  
STATE OF PALESTINE: The State of Palestine has non-member observer state status at the United Nations.  
SXM: Sint Maarten is a constituent country of the Kingdom of the Netherlands.  
TAIWAN PROVINCE OF CHINA: Taiwan Province of China (as named in the UNWTO data) is not a UN member state.  
TCA: The Turks and Caicos Islands are a British Overseas Territory.  
UNITED STATES VIRGIN ISLANDS: The United States Virgin Islands are an unincorporated territory of the United States.  

In [151]:
from transform_esg import per_null

In [152]:
per_null(arrivel_region_off)
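`per_null` also comes from the project's `transform_esg` module. Its output looks like the percentage of missing values per column; a hypothetical one-line stand-in under that assumption:

```python
import numpy as np
import pandas as pd


def per_null_sketch(df: pd.DataFrame) -> pd.Series:
    """Hypothetical stand-in for transform_esg.per_null: percent of missing values per column."""
    # isna() gives booleans; their mean is the share of missing values
    return df.isna().mean() * 100


toy = pd.DataFrame({
    "iso3": ["AFG", "AFG", "ZWE", "ZWE"],
    "africa": [np.nan, np.nan, 1948.0, 2064.0],
})
pct = per_null_sketch(toy)
print(pct)
```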

ind
iso3                                    0.000000
year                                    0.000000
africa                                 37.772021
americas                               19.067358
east_asia_and_the_pacific              25.181347
europe                                 18.393782
middle_east                            51.088083
other_not_classified                   53.523316
south_asia                             47.305699
total                                  18.341969
of_which,_nationals_residing_abroad    87.098446
dtype: float64


In [153]:
arrivel_region_off.columns

Index(['iso3', 'year', 'africa', 'americas', 'east_asia_and_the_pacific',
       'europe', 'middle_east', 'other_not_classified', 'south_asia', 'total',
       'of_which,_nationals_residing_abroad'],
      dtype='object', name='ind')

In [154]:
col_lst = ['africa', 'americas', 'east_asia_and_the_pacific',
       'europe', 'middle_east', 'other_not_classified', 'south_asia', 'total',
       'of_which,_nationals_residing_abroad']

In [155]:
from transform_esg import interpol

In [156]:
arrival_regions_imp = interpol(arrivel_region_off, col_lst)
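The imputation method inside `interpol` is not shown. One plausible approach, sketched here as an assumption rather than the module's actual code, is linear interpolation within each country, filling only gaps that lie between two observed years:

```python
import numpy as np
import pandas as pd


def interpol_sketch(df, cols, group_col="iso3"):
    """Hypothetical stand-in for transform_esg.interpol."""
    out = df.sort_values([group_col, "year"]).copy()
    # Interpolate per country so values never bleed across countries;
    # limit_area='inside' leaves leading and trailing gaps as NaN
    out[cols] = out.groupby(group_col)[cols].transform(
        lambda s: s.interpolate(method="linear", limit_area="inside")
    )
    return out


toy = pd.DataFrame({
    "iso3": ["AFG"] * 3 + ["ZWE"] * 3,
    "year": ["2012", "2013", "2014"] * 2,
    "total": [np.nan, np.nan, np.nan, 10.0, np.nan, 30.0],
})
res = interpol_sketch(toy, ["total"])
print(res)
```

Linear interpolation treats the rows as evenly spaced, which holds here because the years are consecutive.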

In [158]:
per_null(arrival_regions_imp)

ind
iso3                                    0.000000
year                                    0.000000
africa                                 28.497409
americas                               13.989637
east_asia_and_the_pacific              19.689119
europe                                 13.471503
middle_east                            40.414508
other_not_classified                   38.860104
south_asia                             36.787565
total                                  13.471503
of_which,_nationals_residing_abroad    84.455959
dtype: float64


In [159]:
arrivel_region_off['iso3'].nunique()

193

In [161]:
import matplotlib.pyplot as plt

In [160]:
country_lst = arrivel_region_off['iso3'].unique().tolist()

In [54]:
# for i in country_lst:
#     arrivel_region_trans[arrivel_region_trans['iso3'] == i].plot(x='year')
#     plt.title(f"{i}")
    

In [162]:
arrivel_region_off['year'] = pd.to_datetime(arrivel_region_off['year'], format='%Y', errors='coerce')
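With `format='%Y'` a bare year is parsed as 1 January of that year, and `errors='coerce'` turns non-matching strings into `NaT`. A minimal sketch on toy values:

```python
import pandas as pd

years = pd.Series(["2012", "2021", "n/a"])
# '2012' -> 2012-01-01; 'n/a' does not match the format and becomes NaT
dates = pd.to_datetime(years, format="%Y", errors="coerce")
print(dates)
```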



In [164]:
arrivel_region_off.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1930 entries, 0 to 2229
Data columns (total 11 columns):
 #   Column                               Non-Null Count  Dtype         
---  ------                               --------------  -----         
 0   iso3                                 1930 non-null   object        
 1   year                                 1930 non-null   datetime64[ns]
 2   africa                               1380 non-null   float64       
 3   americas                             1660 non-null   float64       
 4   east_asia_and_the_pacific            1550 non-null   float64       
 5   europe                               1670 non-null   float64       
 6   middle_east                          1150 non-null   float64       
 7   other_not_classified                 1180 non-null   float64       
 8   south_asia                           1220 non-null   float64       
 9   total                                1670 non-null   float64       
 10  of_which,_na

In [163]:
arrivel_region_off.to_csv('data/arrival_regions.csv')

In [165]:
from sql_functions import get_engine
import psycopg2

In [166]:
schema = 'capstone_travel_index'
engine = get_engine()

In [167]:
# table_name = 'arrival_regions'

# if engine is not None:
#     try:
#         arrivel_region_off.to_sql(name=table_name,  # Name of the SQL table
#                                   con=engine,  # Engine or connection
#                                   if_exists='replace',  # Drop the table before inserting new values
#                                   schema=schema,  # Use the schema defined earlier
#                                   index=False,  # Do not write the DataFrame index as a column
#                                   chunksize=5000,  # Number of rows written per batch
#                                   method='multi')  # Pass multiple values in a single INSERT clause
#         print(f"The {table_name} table was imported successfully.")
#     # Error handling
#     except (Exception, psycopg2.DatabaseError) as error:
#         print(error)
#         engine = None

The arrival_regions table was imported successfully.
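To try the same `to_sql` arguments without a PostgreSQL server, one option is an in-memory SQLite database. This swaps PostgreSQL for SQLite (an assumption for local testing only), and the `schema` argument is left out because SQLite has no schemas in the PostgreSQL sense:

```python
import sqlite3

import pandas as pd

# Toy rows shaped like arrivel_region_off (ZWE totals from the output above)
rows = pd.DataFrame({
    "iso3": ["ZWE", "ZWE", "ZWE"],
    "year": ["2017", "2018", "2019"],
    "total": [2423.0, 2580.0, 2294.0],
})

# pandas accepts a plain sqlite3 connection in place of a SQLAlchemy engine
con = sqlite3.connect(":memory:")
rows.to_sql(
    name="arrival_regions",
    con=con,
    if_exists="replace",
    index=False,
    chunksize=5000,
    method="multi",
)
count = pd.read_sql_query("SELECT COUNT(*) AS n FROM arrival_regions", con)["n"].iloc[0]
print(count)
con.close()
```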
