## Base Dimension Tables

Base dimension tables differ from the rest because they are built directly from the label mappings in the source data:

  - `i94cntyl` is the source of `dim_i94_citres_codes`.
  - `i94prtl` is the source of `dim_i94_port`, and it needs some preprocessing.
  - `i94addrl` is the complete list of states we need to support in `dim_geography`.
  - `i94mode` is the source of `dim_i94_mode`.
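
The preprocessing that `i94prtl` needs is not spelled out here. As a minimal sketch, assuming most labels look like `'CITY, ST'` with trailing padding and an occasional missing comma or trailing qualifier, each label could be split into a city and a state code. The helper name `split_port_label` and the regular expression are illustrative assumptions, not part of `immigration_lib`; the sample entries are copied from the `i94prtl` listing.

```python
import re

# Sample entries copied from the `i94prtl` listing; the real dict is much larger.
sample_ports = {
    'ALC': 'ALCAN, AK             ',
    'LIA': 'LITTLE ROCK, AR (BPS)',
    'MAP': 'MARIPOSA AZ           ',
}

# City, optional comma, two-letter state code, optional qualifier such as "(BPS)".
PORT_PATTERN = re.compile(r'^(?P<city>.+?),?\s+(?P<state>[A-Z]{2})(?:\s+\(.*\))?$')


def split_port_label(label):
    """Hypothetical helper: return (city, state), or (label, None) if no state code is found."""
    cleaned = label.strip()
    match = PORT_PATTERN.match(cleaned)
    if match is None:
        return cleaned, None
    return match.group('city').strip(), match.group('state')


ports = {code: split_port_label(label) for code, label in sample_ports.items()}
```

Labels that do not end in a two-letter code come back with `None` as the state, so they can be reviewed by hand instead of being silently mis-parsed.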



In [2]:
from immigration_lib.i94labels import i94cntyl, i94prtl, i94addrl, i94mode
import pandas as pd

dim_i94_citres_codes = pd.DataFrame(list(i94cntyl.items()), columns = ['id', 'name'])
dim_i94_citres_codes.sort_values(by=['id'], inplace=True)

display(dim_i94_citres_codes.iloc[:3])

|     | id  | name                  |
|-----|-----|-----------------------|
| 255 | 0   | INVALID: STATELESS    |
| 266 | 54  | No Country Code (54)  |
| 267 | 100 | No Country Code (100) |


To obtain the cities, we use the [World Cities Database Free Edition](http://www.geodatasource.com/world-cities-database/free). We need every city in every state listed in `i94addrl`.

In [3]:
i94addrl

{'AL': 'ALABAMA',
 'AK': 'ALASKA',
 'AZ': 'ARIZONA',
 'AR': 'ARKANSAS',
 'CA': 'CALIFORNIA',
 'CO': 'COLORADO',
 'CT': 'CONNECTICUT',
 'DE': 'DELAWARE',
 'DC': 'DIST. OF COLUMBIA',
 'FL': 'FLORIDA',
 'GA': 'GEORGIA',
 'GU': 'GUAM',
 'HI': 'HAWAII',
 'ID': 'IDAHO',
 'IL': 'ILLINOIS',
 'IN': 'INDIANA',
 'IA': 'IOWA',
 'KS': 'KANSAS',
 'KY': 'KENTUCKY',
 'LA': 'LOUISIANA',
 'ME': 'MAINE',
 'MD': 'MARYLAND',
 'MA': 'MASSACHUSETTS',
 'MI': 'MICHIGAN',
 'MN': 'MINNESOTA',
 'MS': 'MISSISSIPPI',
 'MO': 'MISSOURI',
 'MT': 'MONTANA',
 'NC': 'N. CAROLINA',
 'ND': 'N. DAKOTA',
 'NE': 'NEBRASKA',
 'NV': 'NEVADA',
 'NH': 'NEW HAMPSHIRE',
 'NJ': 'NEW JERSEY',
 'NM': 'NEW MEXICO',
 'NY': 'NEW YORK',
 'OH': 'OHIO',
 'OK': 'OKLAHOMA',
 'OR': 'OREGON',
 'PA': 'PENNSYLVANIA',
 'PR': 'PUERTO RICO',
 'RI': 'RHODE ISLAND',
 'SC': 'S. CAROLINA',
 'SD': 'S. DAKOTA',
 'TN': 'TENNESSEE',
 'TX': 'TEXAS',
 'UT': 'UTAH',
 'VT': 'VERMONT',
 'VI': 'VIRGIN ISLANDS',
 'VA': 'VIRGINIA',
 'WV': 'W. VIRGINIA',
 'WA': 'WAS

In [5]:
dim_i94_mode = pd.DataFrame(list(i94mode.items()), columns = ['id', 'name'])
dim_i94_mode.sort_values(by=['id'], inplace=True)

display(dim_i94_mode)

|   | id | name         |
|---|----|--------------|
| 0 | 1  | Air          |
| 1 | 2  | Sea          |
| 2 | 3  | Land         |
| 3 | 9  | Not reported |


In [6]:
import requests
import os

API_TOKEN = os.environ['FREE_REST_API_FOR_COUNTRIES']

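def get_auth_token():
    """Exchange the API token for a session (bearer) token."""
    headers = {
        'Accept': 'application/json',
        'api-token': API_TOKEN,
        'user-email': 'claudio.rdgz@gmail.com'
    }
    r = requests.get("https://www.universal-tutorial.com/api/getaccesstoken", headers=headers)
    # Fail early on an HTTP error instead of hitting a KeyError on the next line.
    r.raise_for_status()
    return r.json()['auth_token']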

session_token = get_auth_token()

### Use the Session Token to Fetch All Cities per State



In [49]:
import json

def get_states(token, country):
    """Return the states of `country`, caching the API response under ./geodata."""
    dataset_name = country.replace(" ", '_')
    full_path_to_file = f"./geodata/{dataset_name}_states.json"

    # Serve from the local cache when the file already exists.
    if os.path.exists(full_path_to_file):
        with open(full_path_to_file) as json_file:
            return json.load(json_file)

    headers = {
        'Accept': 'application/json',
        'Authorization': f'Bearer {token}'
    }
    r = requests.get(f"https://www.universal-tutorial.com/api/states/{country}", headers=headers)
    r.raise_for_status()
    data = r.json()

    # Create the cache directory on first use; open() fails if it is missing.
    os.makedirs("./geodata", exist_ok=True)
    with open(full_path_to_file, 'w') as f:
        json.dump(data, f)
    return data

state_data = get_states(session_token, "United States")
puerto_rico_state_data = get_states(session_token, "Puerto Rico")
virgin_islands_state_data = get_states(session_token, "Virgin Islands (US)")

def process_states(states_from_api):
    """Extract the state names from the API response."""
    return [x['state_name'] for x in states_from_api]

us_states = process_states(state_data)
pr_states = process_states(puerto_rico_state_data)
vi_states = process_states(virgin_islands_state_data)

In [53]:
def get_cities(token, country_name, states):
    """Return {state_name: cities}, caching each response under ./geodata/<country_name>."""
    os.makedirs(f"./geodata/{country_name}", exist_ok=True)
    headers = {
        'Accept': 'application/json',
        'Authorization': f'Bearer {token}'
    }

    state_cities = {}
    for state_name in states:
        full_path_to_file = f"./geodata/{country_name}/{state_name}_cities.json"
        if os.path.exists(full_path_to_file):
            # Serve from the local cache.
            with open(full_path_to_file) as json_file:
                state_cities[state_name] = json.load(json_file)
            continue

        r = requests.get(f"https://www.universal-tutorial.com/api/cities/{state_name}", headers=headers)
        r.raise_for_status()
        data = r.json()
        with open(full_path_to_file, 'w') as f:
            json.dump(data, f)
        state_cities[state_name] = data
    return state_cities

us_cities = get_cities(session_token, "United_States", us_states)
pr_cities = get_cities(session_token, "Puerto_Rico", pr_states)
vi_cities = get_cities(session_token, "Virgin_Islands", vi_states)
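
One way the per-state results could be flattened into rows for `dim_geography` is sketched below. It assumes each city record is a dict with a `city_name` key, which is an assumption about the API response shape and should be checked against the cached JSON files. The helper `flatten_cities` and the sample data are illustrative only.

```python
# Illustrative stand-in for the dict returned by get_cities();
# the 'city_name' key is an assumption about the API's response shape.
sample_state_cities = {
    'Alabama': [{'city_name': 'Huntsville'}, {'city_name': 'Mobile'}],
    'Alaska': [{'city_name': 'Anchorage'}],
}


def flatten_cities(state_cities, country):
    """Hypothetical helper: turn {state: [city records]} into sorted (country, state, city) rows."""
    rows = [
        (country, state_name, city['city_name'])
        for state_name, cities in state_cities.items()
        for city in cities
    ]
    return sorted(rows)


geography_rows = flatten_cities(sample_state_cities, 'United States')
```

Rows in this shape can be passed to `pd.DataFrame(rows, columns=[...])` in the same way the other dimensions are built.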

In [55]:
us_states

['Alabama',
 'Alaska',
 'Arizona',
 'Arkansas',
 'California',
 'Colorado',
 'Connecticut',
 'Delaware',
 'District of Columbia',
 'Florida',
 'Georgia',
 'Hawaii',
 'Idaho',
 'Illinois',
 'Indiana',
 'Iowa',
 'Kansas',
 'Kentucky',
 'Louisiana',
 'Maine',
 'Maryland',
 'Massachusetts',
 'Michigan',
 'Minnesota',
 'Mississippi',
 'Missouri',
 'Montana',
 'Nebraska',
 'Nevada',
 'New Hampshire',
 'New Jersey',
 'New Mexico',
 'New York',
 'North Carolina',
 'North Dakota',
 'Ohio',
 'Oklahoma',
 'Ontario',
 'Oregon',
 'Pennsylvania',
 'Ramey',
 'Rhode Island',
 'South Carolina',
 'South Dakota',
 'Sublimity',
 'Tennessee',
 'Texas',
 'Trimble',
 'Utah',
 'Vermont',
 'Virginia',
 'Washington',
 'West Virginia',
 'Wisconsin',
 'Wyoming']
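
The list above contains entries such as `Ontario`, `Ramey`, `Sublimity`, and `Trimble`, which are not U.S. states. A minimal sketch of how such entries could be flagged by comparing the API names against `i94addrl` follows. It assumes that the only differences between the two sources are letter case and the abbreviations `N.`, `S.`, `W.`, and `DIST.`. The helpers `normalize` and `split_known_states` are hypothetical, and the sample data is a small subset of the two listings.

```python
# Subset of `i94addrl`, copied from the listing earlier in this section.
sample_i94addrl = {
    'NC': 'N. CAROLINA',
    'DC': 'DIST. OF COLUMBIA',
    'OR': 'OREGON',
    'OH': 'OHIO',
}

# Subset of the `us_states` output, including two entries that are not states.
sample_api_states = ['North Carolina', 'District of Columbia', 'Ohio',
                     'Ontario', 'Oregon', 'Ramey']

# Abbreviations used by the I94 labels, expanded so both sources can be compared.
ABBREVIATIONS = {'N.': 'NORTH', 'S.': 'SOUTH', 'W.': 'WEST', 'DIST.': 'DISTRICT'}


def normalize(name):
    """Upper-case a state name and expand the I94 abbreviations."""
    return ' '.join(ABBREVIATIONS.get(word, word) for word in name.upper().split())


def split_known_states(api_states, addr_labels):
    """Hypothetical helper: split API state names into (known, unknown) using the I94 labels."""
    supported = {normalize(label) for label in addr_labels.values()}
    known = [s for s in api_states if normalize(s) in supported]
    unknown = [s for s in api_states if normalize(s) not in supported]
    return known, unknown


known, unknown = split_known_states(sample_api_states, sample_i94addrl)
```

Returning the unmatched names separately, rather than dropping them, makes it easy to inspect what the API returned before deciding to discard it.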

In [6]:
for k, v in i94prtl.items():
    print(k, v)

ALC ALCAN, AK             
ANC ANCHORAGE, AK         
BAR BAKER AAF - BAKER ISLAND, AK
DAC DALTONS CACHE, AK     
PIZ DEW STATION PT LAY DEW, AK
DTH DUTCH HARBOR, AK      
EGL EAGLE, AK             
FRB FAIRBANKS, AK         
HOM HOMER, AK             
HYD HYDER, AK             
JUN JUNEAU, AK            
5KE KETCHIKAN, AK
KET KETCHIKAN, AK         
MOS MOSES POINT INTERMEDIATE, AK
NIK NIKISKI, AK           
NOM NOM, AK               
PKC POKER CREEK, AK       
ORI PORT LIONS SPB, AK
SKA SKAGWAY, AK           
SNP ST. PAUL ISLAND, AK
TKI TOKEEN, AK
WRA WRANGELL, AK          
HSV MADISON COUNTY - HUNTSVILLE, AL
MOB MOBILE, AL            
LIA LITTLE ROCK, AR (BPS)
ROG ROGERS ARPT, AR
DOU DOUGLAS, AZ           
LUK LUKEVILLE, AZ         
MAP MARIPOSA AZ           
NAC NACO, AZ              
NOG NOGALES, AZ           
PHO PHOENIX, AZ           
POR PORTAL, AZ
SLU SAN LUIS, AZ          
SAS SASABE, AZ            
TUC TUCSON, AZ            
YUI YUMA, AZ              
AND ANDRADE, CA         