# 01_PREPROCESSING: World Bank FDI & GDP Data

**Purpose:** Process the World Bank CSV files

**Output:** `ASEAN_FDI_GDP_Data_Final.csv` (225 clean observations: 9 countries × 25 years)

---

## 1. Import libraries

In [17]:
import pandas as pd
import numpy as np
import os
import glob
from datetime import datetime

print("✅ Libraries imported successfully!")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

✅ Libraries imported successfully!
Pandas version: 2.3.3
NumPy version: 2.3.5


## 2. Configuration

In [18]:
# ASEAN countries (ISO Alpha-3 codes): 9 countries (Myanmar and Timor-Leste excluded)
COUNTRIES = {
    'VNM': 'Viet Nam',
    'IDN': 'Indonesia',
    'THA': 'Thailand',
    'PHL': 'Philippines',
    'MYS': 'Malaysia',
    'SGP': 'Singapore',
    'BRN': 'Brunei Darussalam',
    'KHM': 'Cambodia',
    'LAO': 'Lao PDR'
}

# Indicators (World Bank indicator code -> column name)
INDICATORS = {
    'BX.KLT.DINV.CD.WD': 'FDI',
    'NY.GDP.MKTP.KD.ZG': 'GDP_Growth',
    'NE.EXP.GNFS.ZS': 'Exports_pct_GDP',
    'NE.IMP.GNFS.ZS': 'Imports_pct_GDP'
}

# Time period
START_YEAR = 2000
END_YEAR = 2024

# Paths
RAW_DATA_DIR = '../01_Data/01_Raw_CSV/'
PROCESSED_DATA_DIR = '../01_Data/02_Processed/'

# Create the output directory if it does not exist yet
os.makedirs(PROCESSED_DATA_DIR, exist_ok=True)

print("✅ Configuration loaded:")
print(f"   - Countries: {len(COUNTRIES)} ASEAN countries (excluding Myanmar and Timor-Leste)")
print(f"   - Indicators: {len(INDICATORS)}")
print(f"   - Time period: {START_YEAR}-{END_YEAR} ({END_YEAR - START_YEAR + 1} years)")
print(f"   - Expected observations: {len(COUNTRIES) * (END_YEAR - START_YEAR + 1)} = {len(COUNTRIES)} countries × {END_YEAR - START_YEAR + 1} years")

✅ Configuration loaded:
   - Countries: 9 ASEAN countries (excluding Myanmar and Timor-Leste)
   - Indicators: 4
   - Time period: 2000-2024 (25 years)
   - Expected observations: 225 = 9 countries × 25 years
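
The expected observation count is the product of countries and years. The same idea can be turned into an explicit check for which (country, year) pairs are absent from a table. The sketch below is illustrative only: `expected_index`, `missing_pairs`, and the three-country mapping are hypothetical names and toy inputs, not part of the notebook.

```python
import pandas as pd

# Hypothetical subset of the COUNTRIES mapping, kept small for illustration.
countries = {"VNM": "Viet Nam", "IDN": "Indonesia", "THA": "Thailand"}
start_year, end_year = 2000, 2024

# Every (country code, year) pair that a balanced panel should contain.
expected_index = pd.MultiIndex.from_product(
    [sorted(countries), range(start_year, end_year + 1)],
    names=["Country_Code", "Year"],
)


def missing_pairs(df: pd.DataFrame) -> pd.MultiIndex:
    """Return the expected (Country_Code, Year) pairs that are absent from df."""
    observed = pd.MultiIndex.from_frame(df[["Country_Code", "Year"]])
    return expected_index.difference(observed)


# Toy frame with one pair removed on purpose.
toy = expected_index.to_frame(index=False)
toy = toy[~((toy["Country_Code"] == "THA") & (toy["Year"] == 2024))]

print(len(expected_index))       # number of pairs in a balanced panel
print(list(missing_pairs(toy)))  # pairs that the toy frame lacks
```

Running a check like this on the final table would show whether a row is missing entirely, which a plain row count cannot distinguish from a duplicated row.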


## 3. CSV processing function

In [19]:
def process_world_bank_csv(filepath, indicator_name, countries_dict):
    """
    Process one World Bank CSV file and return its values in long format.
    """
    print(f"\n→ Processing file: {os.path.basename(filepath)}")
    
    try:
        # Read the file, skipping the 4 header rows
        df = pd.read_csv(filepath, skiprows=4)
        print(f"  ✓ File loaded: {len(df)} rows")
        
        # Get the country codes
        country_codes = list(countries_dict.keys())
        
        # Keep only the ASEAN countries
        df_filtered = df[df['Country Code'].isin(country_codes)].copy()
        print(f"  ✓ Filtered: {len(df_filtered)} ASEAN countries")
        
        # Collect one record per country and year
        results = []
        
        for idx, row in df_filtered.iterrows():
            country_code = row['Country Code']
            country_name = countries_dict[country_code]
            
            # Loop over each year
            for year in range(START_YEAR, END_YEAR + 1):
                year_str = str(year)
                
                if year_str in df.columns:
                    value = row[year_str]
                    
                    if pd.notna(value):
                        try:
                            value = float(value)
                            results.append({
                                'Country': country_name,
                                'Country_Code': country_code,
                                'Year': year,
                                'Indicator': indicator_name,
                                'Value': value
                            })
                        except (ValueError, TypeError):
                            # Skip values that cannot be converted to a number
                            pass
        
        result_df = pd.DataFrame(results)
        print(f"  ✓ Processed: {len(result_df)} data points")
        
        return result_df
    
    except FileNotFoundError:
        print(f"  ❌ ERROR: File not found: {filepath}")
        return None
    except Exception as e:
        print(f"  ❌ ERROR: {e}")
        return None

print("✅ Function process_world_bank_csv defined")

✅ Function process_world_bank_csv defined
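
The function above walks through rows and years one cell at a time. As an alternative sketch (not the notebook's method), the same wide-to-long reshaping can be done with `DataFrame.melt`. The helper name `wide_to_long` and the toy input are hypothetical, and the numbers are made up.

```python
import pandas as pd


def wide_to_long(df, indicator_name, countries, start_year, end_year):
    """Reshape a World Bank style wide table into long format."""
    year_cols = [str(y) for y in range(start_year, end_year + 1) if str(y) in df.columns]
    keep = df["Country Code"].isin(list(countries))
    subset = df.loc[keep, ["Country Code"] + year_cols]

    # One row per (country, year) instead of one column per year.
    long_df = subset.melt(id_vars="Country Code", var_name="Year", value_name="Value")

    # Non-numeric cells become NaN and are dropped, like the try/except in the loop version.
    long_df["Value"] = pd.to_numeric(long_df["Value"], errors="coerce")
    long_df = long_df.dropna(subset=["Value"])

    long_df["Year"] = long_df["Year"].astype(int)
    long_df["Country"] = long_df["Country Code"].map(countries)
    long_df["Indicator"] = indicator_name
    long_df = long_df.rename(columns={"Country Code": "Country_Code"})
    columns = ["Country", "Country_Code", "Year", "Indicator", "Value"]
    return long_df[columns].reset_index(drop=True)


# Toy input that mimics the World Bank layout (values are made up).
toy = pd.DataFrame({
    "Country Name": ["Viet Nam", "Aruba"],
    "Country Code": ["VNM", "ABW"],
    "2000": [1.5, 9.9],
    "2001": [None, 9.9],
})
result = wide_to_long(toy, "FDI", {"VNM": "Viet Nam"}, 2000, 2001)
print(result)
```

The row order differs from the loop version (year-major rather than country-major), which does not matter once the data is pivoted and sorted later.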


## 4. Find and process the CSV files

In [20]:
# Find the CSV files (sorted so that the order is reproducible)
csv_files = sorted(glob.glob(os.path.join(RAW_DATA_DIR, '*.csv')))
print(f"\nFound {len(csv_files)} CSV files:")
for file in csv_files:
    print(f"  - {os.path.basename(file)}")



Found 4 CSV files:
  - API_BX.KLT.DINV.CD.WD_DS2_en_csv_v2_2672.csv
  - API_NE.EXP.GNFS.ZS_DS2_en_csv_v2_4843.csv
  - API_NE.IMP.GNFS.ZS_DS2_en_csv_v2_2642.csv
  - API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2_2509.csv


In [21]:
# Check the country names in the CSV file

# Read one sample file to inspect the country names
sample_file = csv_files[0]
print(f"\n→ Checking file: {os.path.basename(sample_file)}")

# Read the file (skip the first 4 lines, following the World Bank layout)
df_sample = pd.read_csv(sample_file, skiprows=4)
print(f"\n→ Columns in the file: {list(df_sample.columns)}")

# Show the countries in the file
print(f"\n→ Total countries/regions: {len(df_sample)}")
print("\n→ First 20 countries in the file:")
print(df_sample[['Country Name', 'Country Code']].head(20))

# Look up each country from COUNTRIES (code -> name) in the file
print("\n→ Checking the ASEAN countries listed in COUNTRIES:")
for country_code, country_name in COUNTRIES.items():
    matches = df_sample[df_sample['Country Name'] == country_name]
    if len(matches) > 0:
        print(f"  ✓ Found: {country_name} ({country_code})")
    else:
        print(f"  ✗ Not found: {country_name} ({country_code})")
        # Look for similar names
        similar = df_sample[df_sample['Country Name'].str.contains(country_name.split()[0], case=False, na=False)]
        if len(similar) > 0:
            print(f"    → Possibly: {list(similar['Country Name'].values)}")



→ Checking file: API_BX.KLT.DINV.CD.WD_DS2_en_csv_v2_2672.csv

→ Columns in the file: ['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']

→ Total countries/regions: 266

→ First 20 countries in the file:
                   Country Name Country Code
0                         Aruba          ABW
1   Africa Eastern and Southern          AFE
2                   Afghanistan          AFG
3    Africa Western and Central          AFW
4       

In [22]:
# Find the actual Country Code of the ASEAN countries
print("\n→ Looking up the actual Country Code of the ASEAN countries:")
asean_countries_fullname = ['Vietnam', 'Indonesia', 'Thailand', 'Philippines', 'Malaysia', 'Singapore']

for country_fullname in asean_countries_fullname:
    matches = df_sample[df_sample['Country Name'] == country_fullname]
    if len(matches) > 0:
        code = matches.iloc[0]['Country Code']
        print(f"  ✓ {country_fullname:20s} → Code: {code}")
    else:
        print(f"  ✗ {country_fullname:20s} → Not found")



→ Looking up the actual Country Code of the ASEAN countries:
  ✗ Vietnam              → Not found
  ✓ Indonesia            → Code: IDN
  ✓ Thailand             → Code: THA
  ✓ Philippines          → Code: PHL
  ✓ Malaysia             → Code: MYS
  ✓ Singapore            → Code: SGP


In [23]:
# Look for Vietnam under its name variants
print("\n→ Searching for Vietnam in the file:")
vietnam_variants = df_sample[df_sample['Country Name'].str.contains('Viet', case=False, na=False)]
print(vietnam_variants[['Country Name', 'Country Code']])


→ Searching for Vietnam in the file:
    Country Name Country Code
257     Viet Nam          VNM
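
The search shows that the file spells the country "Viet Nam" while its code is `VNM`, so codes are the safer key. A minimal sketch of checking a code-to-name mapping against the file in one pass follows. The mapping and the small `df_sample` here are hypothetical toy inputs, including a deliberately outdated name and a placeholder code `XXX`.

```python
import pandas as pd

# Hypothetical inputs: a mapping with one outdated name and one unknown code.
countries = {"VNM": "Vietnam", "LAO": "Lao PDR", "XXX": "Placeholder"}
df_sample = pd.DataFrame({
    "Country Name": ["Viet Nam", "Lao PDR", "Aruba"],
    "Country Code": ["VNM", "LAO", "ABW"],
})

# Names as spelled in the file, indexed by country code.
file_names = df_sample.set_index("Country Code")["Country Name"]

# Codes from the mapping that do not occur in the file at all.
missing_codes = sorted(set(countries) - set(file_names.index))

# Codes that exist but whose name is spelled differently in the file.
name_mismatches = {
    code: (name, file_names[code])
    for code, name in countries.items()
    if code in file_names.index and file_names[code] != name
}

print(missing_codes)
print(name_mismatches)
```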


## 5. Process each file

In [24]:
print("\n" + "="*90)
print("PART 2: FILTER AND MERGE THE DATA")
print("="*90)

all_data = []

# FDI
fdi_files = [f for f in csv_files if 'FDI' in f or 'BX.KLT.DINV' in f]
if fdi_files:
    df_fdi = process_world_bank_csv(fdi_files[0], 'FDI', COUNTRIES)
    if df_fdi is not None:
        all_data.append(df_fdi)
else:
    print("\n⚠️ FDI file not found")

# GDP Growth
gdp_files = [f for f in csv_files if 'GDP' in f or 'NY.GDP.MKTP.KD.ZG' in f]
if gdp_files:
    df_gdp = process_world_bank_csv(gdp_files[0], 'GDP_Growth', COUNTRIES)
    if df_gdp is not None:
        all_data.append(df_gdp)
else:
    print("\n⚠️ GDP Growth file not found")

# Exports
export_files = [f for f in csv_files if 'Export' in f or 'NE.EXP' in f]
if export_files:
    df_export = process_world_bank_csv(export_files[0], 'Exports_pct_GDP', COUNTRIES)
    if df_export is not None:
        all_data.append(df_export)
else:
    print("\n⚠️ Exports file not found")

# Imports
import_files = [f for f in csv_files if 'Import' in f or 'NE.IMP' in f]
if import_files:
    df_import = process_world_bank_csv(import_files[0], 'Imports_pct_GDP', COUNTRIES)
    if df_import is not None:
        all_data.append(df_import)
else:
    print("\n⚠️ Imports file not found")

print(f"\n→ Merging {len(all_data)} indicators...")
if len(all_data) > 0:
    df_combined = pd.concat(all_data, ignore_index=True)
    print(f"✓ Merged: {len(df_combined)} records")
else:
    print("❌ No data to process!")


PART 2: FILTER AND MERGE THE DATA

→ Processing file: API_BX.KLT.DINV.CD.WD_DS2_en_csv_v2_2672.csv
  ✓ File loaded: 266 rows
  ✓ Filtered: 9 ASEAN countries
  ✓ Processed: 225 data points

→ Processing file: API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2_2509.csv
  ✓ File loaded: 266 rows
  ✓ Filtered: 9 ASEAN countries
  ✓ Processed: 225 data points

→ Processing file: API_NE.EXP.GNFS.ZS_DS2_en_csv_v2_4843.csv
  ✓ File loaded: 266 rows
  ✓ Filtered: 9 ASEAN countries
  ✓ Processed: 216 data points

→ Processing file: API_NE.IMP.GNFS.ZS_DS2_en_csv_v2_2642.csv
  ✓ File loaded: 266 rows
  ✓ Filtered: 9 ASEAN countries
  ✓ Processed: 216 data points

→ Merging 4 indicators...
✓ Merged: 882 records
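
The four blocks above repeat the same find-then-process pattern. One possible way to express the file matching as a single loop is sketched below. `INDICATOR_FILES` and `match_files` are hypothetical names, and matching is done on the base file name so that directory names cannot cause accidental hits.

```python
import os

# Hypothetical mapping from a file-name fragment to the column name used above.
INDICATOR_FILES = {
    "BX.KLT.DINV.CD.WD": "FDI",
    "NY.GDP.MKTP.KD.ZG": "GDP_Growth",
    "NE.EXP.GNFS.ZS": "Exports_pct_GDP",
    "NE.IMP.GNFS.ZS": "Imports_pct_GDP",
}


def match_files(csv_files, indicator_files):
    """Pair each indicator name with the first file whose name contains its code."""
    matched, missing = {}, []
    for code, name in indicator_files.items():
        hits = [f for f in sorted(csv_files) if code in os.path.basename(f)]
        if hits:
            matched[name] = hits[0]
        else:
            missing.append(name)
    return matched, missing


# Toy file list: the imports file is left out on purpose.
files = [
    "API_BX.KLT.DINV.CD.WD_DS2_en_csv_v2_2672.csv",
    "API_NE.EXP.GNFS.ZS_DS2_en_csv_v2_4843.csv",
    "API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2_2509.csv",
]
matched, missing = match_files(files, INDICATOR_FILES)
print(matched)
print(missing)
```

Each entry of `matched` could then be passed to `process_world_bank_csv(path, name, COUNTRIES)`, and `missing` reported in one place.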


## 6. Convert to wide format

In [25]:
print("\n" + "="*90)
print("PART 3: RESHAPE THE DATA")
print("="*90)

print("\n→ Converting from long format to wide format...")

df_pivot = df_combined.pivot_table(
    index=['Country', 'Country_Code', 'Year'],
    columns='Indicator',
    values='Value',
    aggfunc='first'
).reset_index()

print(f"✓ New shape: {df_pivot.shape[0]} rows × {df_pivot.shape[1]} columns")
print(f"✓ Columns: {list(df_pivot.columns)}")


PART 3: RESHAPE THE DATA

→ Converting from long format to wide format...
✓ New shape: 225 rows × 7 columns
✓ Columns: ['Country', 'Country_Code', 'Year', 'Exports_pct_GDP', 'FDI', 'GDP_Growth', 'Imports_pct_GDP']
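
To make the reshaping concrete, here is a toy version of the same `pivot_table` call with made-up numbers. The last step, clearing `columns.name`, is an optional extra that is not in the notebook: it removes the `Indicator` label that otherwise appears in the top-left corner of printed tables.

```python
import pandas as pd

# Long format: one row per (country, year, indicator). Values are made up.
long_df = pd.DataFrame({
    "Country_Code": ["VNM", "VNM", "IDN", "IDN"],
    "Year": [2000, 2000, 2000, 2000],
    "Indicator": ["FDI", "GDP_Growth", "FDI", "GDP_Growth"],
    "Value": [1.0, 6.8, 2.0, 4.9],
})

# Wide format: one row per (country, year), one column per indicator.
wide_df = long_df.pivot_table(
    index=["Country_Code", "Year"],
    columns="Indicator",
    values="Value",
    aggfunc="first",
).reset_index()

# Drop the leftover axis label so the header row shows only column names.
wide_df.columns.name = None

print(wide_df)
```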


## 7. Clean the data

In [26]:
print("\n" + "="*90)
print("PART 4: CLEAN THE DATA")
print("="*90)

print("\n→ Checking missing values:")
print(df_pivot.isnull().sum())

print("\n→ Handling missing values:")

for col in ['FDI', 'GDP_Growth', 'Exports_pct_GDP', 'Imports_pct_GDP']:
    if col in df_pivot.columns:
        missing_count = df_pivot[col].isnull().sum()
        if missing_count > 0:
            print(f"   {col}: {missing_count} missing values")
            # Interpolate within each country so values never leak across countries
            df_pivot[col] = df_pivot.groupby('Country')[col].transform(
                lambda x: x.interpolate(method='linear', limit_direction='both')
            )
            print("   ✓ Filled by linear interpolation")
        else:
            print(f"   ✓ {col}: Complete (no missing values)")


PART 4: CLEAN THE DATA

→ Checking missing values:
Indicator
Country            0
Country_Code       0
Year               0
Exports_pct_GDP    9
FDI                0
GDP_Growth         0
Imports_pct_GDP    9
dtype: int64

→ Handling missing values:
   ✓ FDI: Complete (no missing values)
   ✓ GDP_Growth: Complete (no missing values)
   Exports_pct_GDP: 9 missing values
   ✓ Filled by linear interpolation
   Imports_pct_GDP: 9 missing values
   ✓ Filled by linear interpolation
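
A small toy example may help show what group-wise linear interpolation with `limit_direction='both'` does. Gaps between two observations are filled on a straight line, while gaps at the start or end of a country's series are filled with the nearest observed value rather than extrapolated. The countries `A` and `B` and all numbers below are made up.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Country": ["A"] * 5 + ["B"] * 5,
    "Year": list(range(2020, 2025)) * 2,
    "Exports_pct_GDP": [10.0, np.nan, 14.0, 16.0, np.nan,
                        np.nan, 50.0, 52.0, np.nan, 56.0],
})

# Interpolate within each country; rows must already be ordered by year.
toy["filled"] = toy.groupby("Country")["Exports_pct_GDP"].transform(
    lambda s: s.interpolate(method="linear", limit_direction="both")
)

print(toy)
```

If the missing values in the real data sit in the final year, this means they are carried forward from the previous year, which is worth keeping in mind when interpreting the last observations.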


## 8. Compute derived variables

In [27]:
print("\n" + "="*90)
print("PART 5: CREATE COMPUTED VARIABLES")
print("="*90)

print("\n→ Computing Trade Openness = (Exports + Imports) / GDP * 100...")

if 'Exports_pct_GDP' in df_pivot.columns and 'Imports_pct_GDP' in df_pivot.columns:
    # Both inputs are already expressed in % of GDP, so their sum is the openness ratio
    df_pivot['Trade_Openness'] = df_pivot['Exports_pct_GDP'] + df_pivot['Imports_pct_GDP']
    print("   ✓ Trade_Openness = Exports + Imports")
    
    # Note: COUNTRIES stores this country as 'Viet Nam', so the 'Vietnam' filter matches no rows
    vietnam_2000 = df_pivot[(df_pivot['Country']=='Vietnam') & (df_pivot['Year']==2000)]['Trade_Openness'].values
    if len(vietnam_2000) > 0:
        print(f"   ✓ Example, Vietnam 2000: {vietnam_2000[0]:.2f}%")


PART 5: CREATE COMPUTED VARIABLES

→ Computing Trade Openness = (Exports + Imports) / GDP * 100...
   ✓ Trade_Openness = Exports + Imports


## 9. Data quality checks

In [28]:
print("\n" + "="*90)
print("PART 6: DATA QUALITY CHECKS")
print("="*90)

print("\n→ Descriptive statistics:")
print(df_pivot[['FDI', 'GDP_Growth', 'Trade_Openness']].describe().round(3))

print("\n→ Data by country:")
for country in sorted(df_pivot['Country'].unique()):
    df_country = df_pivot[df_pivot['Country'] == country]
    print(f"\n   {country}:")
    print(f"      - Number of years: {len(df_country)}")
    print(f"      - Years: {df_country['Year'].min()}-{df_country['Year'].max()}")
    # FDI (BX.KLT.DINV.CD.WD) is reported in current US dollars, not billions
    print(f"      - FDI: {df_country['FDI'].min():.2f} → {df_country['FDI'].max():.2f} USD")
    print(f"      - GDP Growth: {df_country['GDP_Growth'].min():.2f}% → {df_country['GDP_Growth'].max():.2f}%")


PART 6: DATA QUALITY CHECKS

→ Descriptive statistics:
Indicator           FDI  GDP_Growth  Trade_Openness
count      2.250000e+02     225.000         225.000
mean       1.229850e+10       4.781         134.590
std        2.442036e+10       3.301          88.715
min       -4.550355e+09      -9.518          32.972
25%        7.756420e+08       3.555          75.092
50%        4.376053e+09       5.294         117.253
75%        1.260000e+10       6.899         142.721
max        1.519410e+11      14.520         437.327

→ Data by country:

   Brunei Darussalam:
      - Number of years: 25
      - Years: 2000-2024
      - FDI: -290169984.90 → 864905530.00 USD
      - GDP Growth: -3.90% → 4.20%

   Cambodia:
      - Number of years: 25
      - Years: 2000-2024
      - FDI: 81580650.56 → 4394647334.00 USD
      - GDP Growth: -3.56% → 13.30%

   Indonesia:
      - Number of years: 25
      - Years: 2000-2024
      - FDI: -4550355286.00 → 25120732060.00 USD
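
The per-country loop prints one block per country. As an alternative sketch, the same figures can be gathered into a single table with `groupby(...).agg(...)` using named aggregations. The toy data and the output column names (`n_years`, `fdi_min`, and so on) are hypothetical.

```python
import pandas as pd

# Toy panel with made-up values; FDI is in current USD.
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2000, 2001, 2000, 2001],
    "FDI": [1.0e9, -2.0e8, 5.0e8, 7.0e8],
    "GDP_Growth": [5.0, -1.0, 3.0, 4.0],
})

# One row per country with coverage and range information.
summary = toy.groupby("Country").agg(
    n_years=("Year", "count"),
    first_year=("Year", "min"),
    last_year=("Year", "max"),
    fdi_min=("FDI", "min"),
    fdi_max=("FDI", "max"),
    growth_min=("GDP_Growth", "min"),
    growth_max=("GDP_Growth", "max"),
)

print(summary)
```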

## 10. Final sorting and formatting

In [29]:
print("\n" + "="*90)
print("PART 7: FINAL SORTING AND FORMATTING")
print("="*90)

# Sort by Country and Year
df_final = df_pivot.sort_values(['Country', 'Year']).reset_index(drop=True)

# Select the required columns
columns_final = ['Country', 'Country_Code', 'Year', 'FDI', 'GDP_Growth', 
                 'Exports_pct_GDP', 'Imports_pct_GDP', 'Trade_Openness']

columns_final = [col for col in columns_final if col in df_final.columns]
df_final = df_final[columns_final]

print("\n→ Final format:")
print(f"   - Rows: {len(df_final)}")
print(f"   - Columns: {len(df_final.columns)}")
print(f"   - Column names: {list(df_final.columns)}")

print("\n→ First 10 rows:")
print(df_final.head(10))

print("\n→ Last 10 rows:")
print(df_final.tail(10))


PART 7: FINAL SORTING AND FORMATTING

→ Final format:
   - Rows: 225
   - Columns: 8
   - Column names: ['Country', 'Country_Code', 'Year', 'FDI', 'GDP_Growth', 'Exports_pct_GDP', 'Imports_pct_GDP', 'Trade_Openness']

→ First 10 rows:
Indicator            Country Country_Code  Year           FDI  GDP_Growth  \
0          Brunei Darussalam          BRN  2000  5.496072e+08    3.474676   
1          Brunei Darussalam          BRN  2001  6.069464e+07    1.466061   
2          Brunei Darussalam          BRN  2002  2.296720e+08    3.961439   
3          Brunei Darussalam          BRN  2003  1.238209e+08    3.583429   
4          Brunei Darussalam          BRN  2004  1.132059e+08    0.104538   
5          Brunei Darussalam          BRN  2005  1.750685e+08   -0.003925   
6          Brunei Darussalam          BRN  2006  8.783913e+07    4.098413   
7          Brunei Darussalam          BRN  2007  2.576357e+08   -3.763623   
8          Brunei Darussalam   

## 11. Save the file

In [30]:
print("\n" + "="*90)
print("PART 8: SAVE THE FILE")
print("="*90)

# Build a file name with a timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_filename = f"ASEAN_FDI_GDP_Data_Processed_{timestamp}.csv"

# Save the file
df_final.to_csv(os.path.join(PROCESSED_DATA_DIR, output_filename), index=False)
print(f"\n✓ File saved: {output_filename}")

# Also save a copy without the timestamp
df_final.to_csv(os.path.join(PROCESSED_DATA_DIR, 'ASEAN_FDI_GDP_Data_Final.csv'), index=False)
print("✓ File saved: ASEAN_FDI_GDP_Data_Final.csv")


PART 8: SAVE THE FILE

✓ File saved: ASEAN_FDI_GDP_Data_Processed_20251210_145138.csv
✓ File saved: ASEAN_FDI_GDP_Data_Final.csv
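
After writing a CSV it can be useful to read it back and confirm that nothing changed on the way. A minimal round-trip sketch follows. It writes a toy frame with made-up values to a temporary directory, so it does not touch the real output folder.

```python
import os
import tempfile

import pandas as pd

# Toy frame with made-up values; FDI is in current USD.
toy = pd.DataFrame({
    "Country_Code": ["VNM", "VNM"],
    "Year": [2000, 2001],
    "FDI": [1.5e9, 1.3e9],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ASEAN_FDI_GDP_Data_Final.csv")
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Raises an AssertionError if shape, column names, dtypes, or values differ.
pd.testing.assert_frame_equal(toy, reloaded)
print("Round trip OK:", reloaded.shape)
```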



## 12. Validate the data

In [31]:
print("\n" + "="*90)
print("PART 9: VALIDATE THE DATA")
print("="*90)

print("\n→ Checking Vietnam 2000:")
# Note: COUNTRIES stores this country as 'Viet Nam', so the 'Vietnam' filter matches no rows
vietnam_2000 = df_final[(df_final['Country'] == 'Vietnam') & (df_final['Year'] == 2000)]
if len(vietnam_2000) > 0:
    print(vietnam_2000.to_string(index=False))
    fdi_value = vietnam_2000['FDI'].values[0]
    # FDI is stored in current USD, so divide by 1e9 to express it in billions
    print(f"\n   FDI: {fdi_value / 1e9:.3f} billion USD = {fdi_value:,.0f} USD ✓")
else:
    print("   ❌ No data found for Vietnam 2000")


PART 9: VALIDATE THE DATA

→ Checking Vietnam 2000:
   ❌ No data found for Vietnam 2000
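
The check finds nothing because the `Country` column holds the World Bank spelling "Viet Nam" (see `COUNTRIES` and the name search in section 4), not "Vietnam". A minimal sketch of a lookup keyed on `Country_Code` instead is shown below, using a toy frame with made-up values. It also treats FDI as current USD, which is how the descriptive statistics above report it.

```python
import pandas as pd

# Toy frame with made-up FDI values in current USD.
toy = pd.DataFrame({
    "Country": ["Viet Nam", "Viet Nam", "Thailand"],
    "Country_Code": ["VNM", "VNM", "THA"],
    "Year": [2000, 2001, 2000],
    "FDI": [1.0e9, 1.2e9, 3.0e9],
})

# Filtering on the display name depends on the exact spelling.
by_name = toy[(toy["Country"] == "Vietnam") & (toy["Year"] == 2000)]

# Filtering on the ISO code is independent of how the name is spelled.
by_code = toy[(toy["Country_Code"] == "VNM") & (toy["Year"] == 2000)]

print(len(by_name), len(by_code))

fdi_usd = by_code["FDI"].iloc[0]
print(f"FDI: {fdi_usd:,.0f} USD = {fdi_usd / 1e9:.3f} billion USD")
```

The same code-based filter would apply to the Trade Openness example in section 8 and to the correlation cell in section 13.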


## 13. Correlation statistics

In [32]:
print("\n" + "="*90)
print("PART 10: CORRELATION STATISTICS")
print("="*90)

print("\n→ Correlations for Vietnam:")
# Note: COUNTRIES stores this country as 'Viet Nam', so the 'Vietnam' filter matches no rows
df_vietnam = df_final[df_final['Country'] == 'Vietnam']
if len(df_vietnam) > 0:
    corr = df_vietnam[['FDI', 'GDP_Growth', 'Trade_Openness']].corr()
    print(corr.round(3))
else:
    print("   ❌ No data found for Vietnam")


PART 10: CORRELATION STATISTICS

→ Correlations for Vietnam:
   ❌ No data found for Vietnam


## 🎉 DONE!

✅ **Preprocessing complete!**

📄 **Output file:**
- `ASEAN_FDI_GDP_Data_Final.csv`: main file (225 observations, 8 columns)

**Next steps:**
1. Open the notebook `02_Model_LinearRegression.ipynb` to run the linear regression
2. Or run the other models in the `02_Scripts/` folder
