# How the World Got Online: Internet Usage Evolution (2000-2023)

**_The Rise of Internet Adoption: Trends Across Countries_**

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv('internet_usage.csv')
data.head(10)

## 📂 Initial Exploration
- Get a gist of the dataset with `data.head()`
- Check for missing values with `data.isnull().sum()`
- Get column names and data types with `data.info()`

In [None]:
data.info()
data.describe()

## 🔍 Our objectives
- Let's analyze the growth of internet usage globally
- Let's find countries with the fastest and slowest internet adoption

`internet_usage.csv` is in a wide format with 217 rows and 26 columns: two identifier columns (`Country Name`, `Country Code`) plus one column per year from 2000 to 2023.
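Many plotting and grouping tools expect one row per country and year rather than one column per year. The sketch below shows how a wide table like this could be reshaped with `DataFrame.melt`. The toy frame, its values, and the `Usage` column name are hypothetical and only stand in for the real file.

```python
import pandas as pd

# Hypothetical miniature of the wide layout; the numbers are made up.
wide = pd.DataFrame({
    "Country Name": ["A", "B"],
    "Country Code": ["AAA", "BBB"],
    "2000": [1.0, 2.0],
    "2001": [3.0, 4.0],
})

# Turn every year column into (Year, Usage) rows, keeping the identifiers.
long_df = wide.melt(
    id_vars=["Country Name", "Country Code"],
    var_name="Year",
    value_name="Usage",  # assumed name for the value column
)
long_df["Year"] = long_df["Year"].astype(int)  # year labels start out as strings
print(long_df)
```

The long form has one row for each country-year pair, which suits tools such as `sns.lineplot` that take `x`, `y`, and `hue` column names.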

### Data Cleaning
Some cells contain `".."`, which is likely a placeholder for missing data. These values are treated as strings and prevent proper numerical analysis in `.describe()`.
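To see why the placeholder matters, here is a small sketch on a made-up frame (the `toy` data is hypothetical, not taken from `internet_usage.csv`). It shows that `isnull()` does not notice `".."` until it is replaced with `NaN`, and that the year columns only become numeric after an explicit conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the wide table; the values are made up.
toy = pd.DataFrame({
    "Country Name": ["A", "B"],
    "Country Code": ["AAA", "BBB"],
    "2000": ["1.5", ".."],
    "2001": ["..", "4.0"],
})

# ".." is an ordinary string, so pandas does not count it as missing.
missing_before = int(toy.isnull().sum().sum())

# After swapping the placeholder for NaN, the gaps become visible.
toy_clean = toy.replace("..", np.nan)
missing_after = int(toy_clean.isnull().sum().sum())

# The year columns still hold strings until they are converted explicitly.
toy_years = toy_clean[["2000", "2001"]].apply(pd.to_numeric, errors="coerce")

print(missing_before, missing_after)
print(toy_years.dtypes)
```

An alternative, assuming `".."` is the only placeholder in the file, would be to pass `na_values=".."` to `pd.read_csv` so the conversion happens at load time.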

We will fill the missing data with forward/backward fill and linear interpolation, since the dataset is a continuous time series.

In [2]:
# Data Cleaning
# Replace the ".." placeholder with NaN so pandas recognises it as missing
new_data = data.replace("..", np.nan)
# Missing-value count per column (divide by len(new_data) and multiply by 100 for a percentage)
missing_percentage = new_data.isnull().sum()
# print(missing_percentage)

original_data = new_data.copy()  # copy kept for before/after-cleaning comparison

# Convert year columns (2000-2023) to numeric
year_cols = new_data.columns[2:]  # every column after the two identifier columns
new_data[year_cols] = new_data[year_cols].apply(pd.to_numeric, errors='coerce')
# .ffill()/.bfill() replace the deprecated fillna(method=...).
# With the default axis=0 they fill down/up each year column, i.e. from neighbouring rows.
new_data[year_cols] = new_data[year_cols].ffill()
new_data[year_cols] = new_data[year_cols].interpolate(method='linear', axis=1)
new_data[year_cols] = new_data[year_cols].bfill()

# Convert country columns to string (optional, but keeps consistency)
new_data["Country Name"] = new_data["Country Name"].astype(str)
new_data["Country Code"] = new_data["Country Code"].astype(str)
new_data.isna().sum().sum()  # Should return 0 if everything is filled


np.int64(0)
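Because each row is one country's time series, it may be preferable to fill gaps along the year axis so that values never carry over from one country to another. The following is a minimal sketch of that row-wise variant on a made-up frame. The `toy` data is hypothetical, and this is an alternative to consider rather than the method used in the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one row per country, one column per year (values made up).
toy = pd.DataFrame(
    {
        "2000": [np.nan, 10.0],
        "2001": [2.0, np.nan],
        "2002": [np.nan, 30.0],
        "2003": [8.0, np.nan],
    },
    index=["A", "B"],
)

# axis=1 keeps every operation inside a single row (one country).
filled = toy.interpolate(method="linear", axis=1)  # interior gaps; pandas' default also extends the last value over trailing gaps
filled = filled.ffill(axis=1)  # explicit safeguard for trailing gaps
filled = filled.bfill(axis=1)  # leading gaps take the first observed value
print(filled)
```

A row with no observed values at all would stay empty under this approach, so the final `isna().sum().sum()` check would then report those cells instead of returning 0.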