# get-recent-migration-stats

This script finds the percentage of the world population that has a mobile phone for each year from 1980 to 2025.
That data is already published online, so all we need to do is pull it.

Data source: [stats.areppim.com mobile phone penetration table](https://stats.areppim.com/stats/stats_mobilexpenetr.htm)

----------------------

<p>Author: PJ Gibson</p>
<p>Date: 2023-01-18</p>
<p>Contact: peter.gibson@doh.wa.gov</p>
<p>Other Contact: pjgibson25@gmail.com</p>


## 0. Import libs

In [1]:
import pandas as pd
import numpy as np
import requests

## 1. Fetch data

In [20]:
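from io import StringIO

url = 'https://stats.areppim.com/stats/stats_mobilexpenetr.htm'

# Fetch the page HTML (a plain HTTP GET rather than an API call); stop on HTTP errors
page_data = requests.get(url)
page_data.raise_for_status()

# Take the first table on the page and keep rows 3-48 (46 rows, expected to be 1980-2025).
# StringIO is used because newer pandas versions deprecate passing literal HTML strings.
df = pd.read_html(StringIO(page_data.text))[0].iloc[3:49,:]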
df.columns = ['Year',
              'WorldPopulation_millions',
              'ActualSubscribers_millions',
              'ActualPercent',
              'ForecastSubscribers_millions',
              'ForecastPercent',
              'ForecastPercentSaturation']
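
The row slice `iloc[3:49, :]` above keeps 46 rows, which matches the 46 years from 1980 through 2025 inclusive. The sketch below illustrates that arithmetic on a made-up stand-in table; the three header-like rows and two trailing note rows are assumptions for illustration, not the verified layout of the source page.

```python
import pandas as pd

# Hypothetical stand-in for the parsed table: 3 header-like rows,
# one row per year 1980-2025, then 2 trailing note rows (layout assumed).
header_rows = [["title"], ["units"], ["column names"]]
year_rows = [[str(y)] for y in range(1980, 2026)]
footer_rows = [["source note"], ["footnote"]]
raw = pd.DataFrame(header_rows + year_rows + footer_rows, columns=["col0"])

# Same slice as in the cell above: positions 3 through 48 (the end is exclusive).
sliced = raw.iloc[3:49, :]

n_rows = len(sliced)
first_year = int(sliced["col0"].iloc[0])
last_year = int(sliced["col0"].iloc[-1])
print(n_rows, first_year, last_year)  # 46 1980 2025
```

If the source page ever adds or removes rows, a check like this on the real `df` would flag that the hard-coded slice no longer lines up with the expected years.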

### 1.1 Save to 01_Raw

In [21]:
df.to_csv('../../../SupportingDocs/Phone/01_Raw/MobileCellularPhonesGlobalMarketPenetration_1980_2025.csv',header=True,index=False)

## 2. Wrangle Data

Let's make it a bit more usable: keep only the year and percentage columns, and convert both to numeric types.

In [27]:
# Keep only the desired columns and give them shorter names
df_wrangled = df[['Year','ForecastPercentSaturation']]\
                .rename(columns={'Year':'year',
                                 'ForecastPercentSaturation':'perc'})

# Convert percent strings (with a trailing '%') into fractions in [0, 1], and year into integer.
# rstrip('%') only removes a trailing percent sign, so a value without one keeps all its digits.
df_wrangled['perc'] = df_wrangled['perc'].str.rstrip('%').astype(float) / 100
df_wrangled['year'] = df_wrangled['year'].astype(int)
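
As a small worked example of the conversion above, the sketch below applies the same steps to a few made-up values. The helper name `percent_to_fraction` and the sample numbers are hypothetical and are not taken from the source table.

```python
import pandas as pd


def percent_to_fraction(values: pd.Series) -> pd.Series:
    """Convert strings like '12.5%' into floats where 1.0 means 100%."""
    return values.str.rstrip("%").astype(float) / 100


# Made-up sample values, for illustration only.
sample = pd.DataFrame({"year": ["1980", "2000", "2025"],
                       "perc": ["0.0%", "12.5%", "100%"]})

sample["perc"] = percent_to_fraction(sample["perc"])
sample["year"] = sample["year"].astype(int)

print(sample["perc"].tolist())  # [0.0, 0.125, 1.0]
print(sample["year"].tolist())  # [1980, 2000, 2025]
```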

### 2.1 Save to 03_Complete

In [29]:
# Save to complete
df_wrangled.to_csv('../../../SupportingDocs/Phone/03_Complete/mobile_phone_proba_by_year.csv',header=True,index=False)
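
One possible sanity check after saving is to read the file back and confirm its shape. The sketch below does this against an in-memory buffer with made-up values, so it does not touch the real output path; the specific checks (fractions within [0, 1], unique years) are suggestions rather than part of the original workflow.

```python
import io

import pandas as pd

# Made-up stand-in for df_wrangled, for illustration only.
result = pd.DataFrame({"year": [1980, 2000, 2025],
                       "perc": [0.0, 0.125, 1.0]})

# Write with the same options as above, but to an in-memory buffer.
buffer = io.StringIO()
result.to_csv(buffer, header=True, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Basic checks on the reloaded data.
columns = list(reloaded.columns)
in_range = bool(reloaded["perc"].between(0, 1).all())
years_unique = bool(reloaded["year"].is_unique)
print(columns, in_range, years_unique)  # ['year', 'perc'] True True
```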