Moving towards Merging Data: After retrieving and including the ORES data for each article, merge the Wikipedia data and the population data. Both files have fields containing state names for just that purpose. The combined dataset also needs each state labeled with its US Census regional division. The spreadsheet that lists the states in each regional division represents regions, divisions and states hierarchically, so you will need to read this data file and merge it into the resulting dataset.
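A minimal sketch of the hierarchical-to-flat step follows. It assumes a sheet where the region and division names appear only on their own header rows. The toy values are hypothetical, not the Census file. It forward-fills the parent levels, drops the header rows, and left-joins on a normalized state name. Dropping the rows with no STATE matters because pandas matches missing keys to each other in a merge.

```python
import numpy as np
import pandas as pd

# Hypothetical toy version of the hierarchical sheet: a region row,
# a division row, then the states that belong to that division
regions = pd.DataFrame({
    'REGION':   ['Northeast', np.nan, np.nan, np.nan, 'West', np.nan, np.nan],
    'DIVISION': [np.nan, 'New England', np.nan, np.nan, np.nan, 'Pacific', np.nan],
    'STATE':    [np.nan, np.nan, 'Maine', 'Vermont', np.nan, np.nan, 'Oregon'],
})

# Forward-fill the parent levels so every state row carries its region and division
regions[['REGION', 'DIVISION']] = regions[['REGION', 'DIVISION']].ffill()

# Header rows have no state; drop them so they cannot pair with a missing state in a merge
flat = regions.dropna(subset=['STATE']).reset_index(drop=True)
flat['STATE'] = flat['STATE'].str.strip().str.lower()

# Hypothetical article rows, normalized the same way before joining
articles = pd.DataFrame({'title': ['Bangor, Maine', 'Salem, Oregon'],
                         'state': ['Maine', 'Oregon']})
articles['state'] = articles['state'].str.strip().str.lower()

labeled = articles.merge(flat, left_on='state', right_on='STATE', how='left')
print(labeled[['title', 'REGION', 'DIVISION']])
```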

In [1]:
# Import necessary libraries
import pandas as pd

In [3]:
## Reading the cleaned ORES predictions file ('cleaned_data.csv')
ores_df = pd.read_csv('/content/cleaned_data.csv')

## Reading the 'us_cities_by_state_SEPT.2023.csv' file
cities_df = pd.read_csv('/content/us_cities_by_state_SEPT.2023.csv')

## Reading the 'US States by Region - US Census Bureau.xlsx' file
regions_df = pd.read_excel('/content/US States by Region - US Census Bureau.xlsx')

# Read the NST-EST2022-POP.xlsx file
population_df = pd.read_excel('/content/NST-EST2022-POP.xlsx', header=[2,3])

ores_df

Unnamed: 0,title,rev_id,prediction
0,"Abbeville, Alabama",1171163550,C
1,"Adamsville, Alabama",1177621427,C
2,"Addison, Alabama",1168359898,C
3,"Akron, Alabama",1165909508,GA
4,"Abbeville, Alabama",1171163550,C
...,...,...,...
22119,"Wamsutter, Wyoming",1169591845,GA
22120,"Wheatland, Wyoming",1176370621,GA
22121,"Worland, Wyoming",1166347917,GA
22122,"Wright, Wyoming",1166334449,GA


In [4]:
# Removing duplicate articles: keep the first row for each article title
# ('title' in the ORES data, 'page_title' in the cities data)
ores_df.drop_duplicates(subset='title', keep='first', inplace=True)
cities_df.drop_duplicates(subset='page_title', keep='first', inplace=True)

ores_df

Unnamed: 0,title,rev_id,prediction
0,"Abbeville, Alabama",1171163550,C
1,"Adamsville, Alabama",1177621427,C
2,"Addison, Alabama",1168359898,C
3,"Akron, Alabama",1165909508,GA
8,"Alabaster, Alabama",1179139816,C
...,...,...,...
22119,"Wamsutter, Wyoming",1169591845,GA
22120,"Wheatland, Wyoming",1176370621,GA
22121,"Worland, Wyoming",1166347917,GA
22122,"Wright, Wyoming",1166334449,GA
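The gap in the index above (3 jumps to 8) shows that rows were dropped. Below is a minimal sketch, on a hypothetical toy frame, that reports how many rows `drop_duplicates` removes, so the number can be checked before and after the cell runs.

```python
import pandas as pd

# Hypothetical toy frame mimicking ores_df, with one repeated title
ores = pd.DataFrame({
    'title': ['Abbeville, Alabama', 'Adamsville, Alabama', 'Abbeville, Alabama'],
    'rev_id': [1171163550, 1177621427, 1171163550],
    'prediction': ['C', 'C', 'C'],
})

# duplicated() flags every row whose title already appeared earlier
n_dupes = ores.duplicated(subset='title').sum()
deduped = ores.drop_duplicates(subset='title', keep='first')
print(f'{n_dupes} duplicate rows removed, {len(deduped)} articles remain')
```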


In [5]:
# Keep only the article title and state columns from the 'us_cities_by_state_SEPT.2023.csv' dataset,
# then merge 'ores_df' with 'cities_df' so every ORES prediction carries its state.

## Selecting the 'page_title' and 'state' columns from the cities dataframe
cities_df = cities_df[['page_title', 'state']]

## Merging 'ores_df' and 'cities_df' on the 'title' and 'page_title' columns
merged_df = pd.merge(ores_df, cities_df, left_on='title', right_on='page_title', how='left')

merged_df

Unnamed: 0,title,rev_id,prediction,page_title,state
0,"Abbeville, Alabama",1171163550,C,"Abbeville, Alabama",Alabama
1,"Adamsville, Alabama",1177621427,C,"Adamsville, Alabama",Alabama
2,"Addison, Alabama",1168359898,C,"Addison, Alabama",Alabama
3,"Akron, Alabama",1165909508,GA,"Akron, Alabama",Alabama
4,"Alabaster, Alabama",1179139816,C,"Alabaster, Alabama",Alabama
...,...,...,...,...,...
21500,"Wamsutter, Wyoming",1169591845,GA,"Wamsutter, Wyoming",Wyoming
21501,"Wheatland, Wyoming",1176370621,GA,"Wheatland, Wyoming",Wyoming
21502,"Worland, Wyoming",1166347917,GA,"Worland, Wyoming",Wyoming
21503,"Wright, Wyoming",1166334449,GA,"Wright, Wyoming",Wyoming
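A left merge silently keeps titles that found no match, leaving `state` empty for them. Below is a minimal sketch, using hypothetical toy frames in place of `ores_df` and `cities_df`, of listing those titles with `indicator=True`. This option adds a `_merge` column that records where each row came from.

```python
import pandas as pd

# Hypothetical toy frames standing in for ores_df and cities_df
ores = pd.DataFrame({'title': ['Akron, Alabama', 'Wright, Wyoming', 'Springfield']})
cities = pd.DataFrame({'page_title': ['Akron, Alabama', 'Wright, Wyoming'],
                       'state': ['Alabama', 'Wyoming']})

# '_merge' is 'both' for matched rows and 'left_only' for titles with no city match
check = ores.merge(cities, left_on='title', right_on='page_title',
                   how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'title']
print(unmatched.tolist())
```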


Merging the dataframes: We merge the combined dataframe with the 'regions_df' dataframe on the state name. The resulting table is a combined view of US states, their regions and divisions, and the corresponding Wikipedia article details.
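A lookup merge like this one should leave the number of article rows unchanged. Below is a minimal sketch, with hypothetical toy frames, of asking pandas to enforce that through `validate='many_to_one'`. The check raises `pandas.errors.MergeError` if a state appears more than once in the lookup table, instead of silently duplicating articles.

```python
import pandas as pd

# Hypothetical toy frames: many article rows per state, one lookup row per state
articles = pd.DataFrame({'state': ['alabama', 'alabama', 'wyoming']})
regions = pd.DataFrame({'STATE': ['alabama', 'wyoming'],
                        'DIVISION': ['East South Central', 'Mountain']})

# validate='many_to_one' checks that each STATE appears at most once in 'regions'
labeled = articles.merge(regions, left_on='state', right_on='STATE',
                         how='left', validate='many_to_one')
print(len(labeled))  # same number of rows as 'articles'

# A repeated state in the lookup table is rejected
dup_regions = pd.concat([regions, regions.iloc[[0]]], ignore_index=True)
raised = False
try:
    articles.merge(dup_regions, left_on='state', right_on='STATE',
                   how='left', validate='many_to_one')
except pd.errors.MergeError as err:
    raised = True
    print('MergeError:', err)
```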

In [12]:
## Preparing 'regions_df' for the merge with 'merged_df' on the state name

# The spreadsheet is hierarchical: REGION and DIVISION appear only on their header rows,
# so fill the NaN values with the last valid observation to label every state row
regions_df['REGION'] = regions_df['REGION'].ffill()
regions_df['DIVISION'] = regions_df['DIVISION'].ffill()

# Standardize the state names (trim whitespace, lowercase) and trim the division labels
regions_df['STATE'] = regions_df['STATE'].str.strip().str.lower()
regions_df['DIVISION'] = regions_df['DIVISION'].apply(lambda x: x.strip() if isinstance(x, str) else x)


regions_df

Unnamed: 0,REGION,DIVISION,STATE
0,Northeast,,
1,Northeast,New England,
2,Northeast,New England,connecticut
3,Northeast,New England,maine
4,Northeast,New England,massachusetts
...,...,...,...
58,West,Pacific,alaska
59,West,Pacific,california
60,West,Pacific,hawaii
61,West,Pacific,oregon


In [13]:
# Standardize the state names in 'merged_df' the same way (trim whitespace, lowercase);
# 'regions_df' was already standardized in the previous cell
merged_df['state'] = merged_df['state'].str.strip().str.lower()

# Merging 'merged_df' and 'regions_df' on the state name
final_df = pd.merge(merged_df, regions_df, left_on='state', right_on='STATE', how='left')

final_df

Unnamed: 0,title,rev_id,prediction,page_title,state,REGION,DIVISION,STATE
0,"Abbeville, Alabama",1171163550,C,"Abbeville, Alabama",alabama,South,East South Central,alabama
1,"Adamsville, Alabama",1177621427,C,"Adamsville, Alabama",alabama,South,East South Central,alabama
2,"Addison, Alabama",1168359898,C,"Addison, Alabama",alabama,South,East South Central,alabama
3,"Akron, Alabama",1165909508,GA,"Akron, Alabama",alabama,South,East South Central,alabama
4,"Alabaster, Alabama",1179139816,C,"Alabaster, Alabama",alabama,South,East South Central,alabama
...,...,...,...,...,...,...,...,...
21500,"Wamsutter, Wyoming",1169591845,GA,"Wamsutter, Wyoming",wyoming,West,Mountain,wyoming
21501,"Wheatland, Wyoming",1176370621,GA,"Wheatland, Wyoming",wyoming,West,Mountain,wyoming
21502,"Worland, Wyoming",1166347917,GA,"Worland, Wyoming",wyoming,West,Mountain,wyoming
21503,"Wright, Wyoming",1166334449,GA,"Wright, Wyoming",wyoming,West,Mountain,wyoming


In [15]:
# Keep only the columns needed for the analysis
final_df = final_df[['state', 'DIVISION', 'title', 'rev_id', 'prediction']]

In [16]:
# Adding a step to rename the columns to improve the readability of the final dataset

final_df.columns = ['state', 'regional_division', 'article_title', 'revision_id', 'article_quality']
final_df

Unnamed: 0,state,regional_division,article_title,revision_id,article_quality
0,alabama,East South Central,"Abbeville, Alabama",1171163550,C
1,alabama,East South Central,"Adamsville, Alabama",1177621427,C
2,alabama,East South Central,"Addison, Alabama",1168359898,C
3,alabama,East South Central,"Akron, Alabama",1165909508,GA
4,alabama,East South Central,"Alabaster, Alabama",1179139816,C
...,...,...,...,...,...
21500,wyoming,Mountain,"Wamsutter, Wyoming",1169591845,GA
21501,wyoming,Mountain,"Wheatland, Wyoming",1176370621,GA
21502,wyoming,Mountain,"Worland, Wyoming",1166347917,GA
21503,wyoming,Mountain,"Wright, Wyoming",1166334449,GA


In [17]:
# Saving the resulting data to a CSV file

final_df.to_csv('/content/resulting_data.csv', index=False)


# Display the first few rows of the final merged dataset
final_df.head()

Unnamed: 0,state,regional_division,article_title,revision_id,article_quality
0,alabama,East South Central,"Abbeville, Alabama",1171163550,C
1,alabama,East South Central,"Adamsville, Alabama",1177621427,C
2,alabama,East South Central,"Addison, Alabama",1168359898,C
3,alabama,East South Central,"Akron, Alabama",1165909508,GA
4,alabama,East South Central,"Alabaster, Alabama",1179139816,C


In [18]:
# Now we prepare the NST-EST2022-POP.xlsx population estimates so they can be merged
# with the regional information to collect the final population information

# Flatten the two-row header into simple column names, then keep only the 2022 estimate
population_df.columns = ['Geographic Area', 'April 1, 2020', '2020', '2021', '2022']

population_df = population_df[['Geographic Area', '2022']]
population_df = population_df.rename(columns={'2022': 'Population'})

population_df

Unnamed: 0,Geographic Area,Population
0,Northeast,57040406.0
1,Midwest,68787595.0
2,South,128716192.0
3,West,78743364.0
4,.Alabama,5074296.0
5,.Alaska,733583.0
6,.Arizona,7359197.0
7,.Arkansas,3045637.0
8,.California,39029342.0
9,.Colorado,5839926.0


In [19]:
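# Remove leading non-word characters (e.g. the '.' before each state name) and lowercase to match 'state'
population_df.loc[:, 'Geographic Area'] = population_df['Geographic Area'].str.replace(r'^\W+', '', regex=True).str.lower()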
population_df

Unnamed: 0,Geographic Area,Population
0,northeast,57040406.0
1,midwest,68787595.0
2,south,128716192.0
3,west,78743364.0
4,alabama,5074296.0
5,alaska,733583.0
6,arizona,7359197.0
7,arkansas,3045637.0
8,california,39029342.0
9,colorado,5839926.0


In [20]:
merged_data_df = pd.merge(final_df, population_df, left_on='state', right_on='Geographic Area', how='left')

merged_data_df

Unnamed: 0,state,regional_division,article_title,revision_id,article_quality,Geographic Area,Population
0,alabama,East South Central,"Abbeville, Alabama",1171163550,C,alabama,5074296.0
1,alabama,East South Central,"Adamsville, Alabama",1177621427,C,alabama,5074296.0
2,alabama,East South Central,"Addison, Alabama",1168359898,C,alabama,5074296.0
3,alabama,East South Central,"Akron, Alabama",1165909508,GA,alabama,5074296.0
4,alabama,East South Central,"Alabaster, Alabama",1179139816,C,alabama,5074296.0
...,...,...,...,...,...,...,...
21500,wyoming,Mountain,"Wamsutter, Wyoming",1169591845,GA,wyoming,581381.0
21501,wyoming,Mountain,"Wheatland, Wyoming",1176370621,GA,wyoming,581381.0
21502,wyoming,Mountain,"Worland, Wyoming",1166347917,GA,wyoming,581381.0
21503,wyoming,Mountain,"Wright, Wyoming",1166334449,GA,wyoming,581381.0
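The per-capita step later drops the states whose population comes out as zero. Listing the states left without a `Population` value right after this merge shows which state names found no row in `population_df`. The sketch below uses a hypothetical toy frame; the state name `'unmatched state'` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy version of merged_data_df right after the population merge
merged = pd.DataFrame({
    'state': ['alabama', 'alabama', 'unmatched state', 'wyoming'],
    'Population': [5074296.0, 5074296.0, np.nan, 581381.0],
})

# States whose name found no row in population_df during the left merge
missing = merged.loc[merged['Population'].isna(), 'state'].unique()
print(missing)

# Share of article rows affected by the missing population values
print(f"{merged['Population'].isna().mean():.1%} of article rows have no population")
```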


In [21]:
# Removing the 'Geographic Area' column (it duplicates 'state' after the merge)
merged_data_df.drop('Geographic Area', axis=1, inplace=True)

# Changing the name of the 'Population' column to 'population'
merged_data_df.rename(columns={'Population': 'population'}, inplace=True)

# Save the merged DataFrame to a new CSV file
merged_data_df.to_csv('/content/wp_scored_city_articles_by_state.csv', index=False)

Analysis:

The analysis will consist of calculating total-articles-per-population (a ratio representing the number of articles per person) and high-quality-articles-per-population (a ratio representing the number of high-quality articles per person) on a state-by-state and divisional basis. All of these values are “per capita” ratios.
For this analysis, consider "high quality" articles to be articles that ORES predicted would be in either the "FA" (featured article) or "GA" (good article) class.
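In formula form, articles per capita is `article_count / population`, and high-quality articles per capita is `(number of FA or GA articles) / population`. Below is a minimal sketch of both ratios for one hypothetical state with four articles and a population of 2,000. The numbers are toy values, not results from this data.

```python
import pandas as pd

HIGH_QUALITY = ['FA', 'GA']  # ORES classes counted as "high quality" in this analysis

# Hypothetical toy data: one state with 4 articles and a population of 2,000
articles = pd.DataFrame({'article_quality': ['GA', 'C', 'FA', 'B']})
population = 2000

total_per_capita = len(articles) / population
hq_per_capita = articles['article_quality'].isin(HIGH_QUALITY).sum() / population
print(total_per_capita, hq_per_capita)  # 4 / 2000 and 2 / 2000
```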


In [23]:
# Load the merged CSV file into a new dataframe

df_consolidated = pd.read_csv('/content/wp_scored_city_articles_by_state.csv')
df_consolidated

Unnamed: 0,state,regional_division,article_title,revision_id,article_quality,population
0,alabama,East South Central,"Abbeville, Alabama",1171163550,C,5074296.0
1,alabama,East South Central,"Adamsville, Alabama",1177621427,C,5074296.0
2,alabama,East South Central,"Addison, Alabama",1168359898,C,5074296.0
3,alabama,East South Central,"Akron, Alabama",1165909508,GA,5074296.0
4,alabama,East South Central,"Alabaster, Alabama",1179139816,C,5074296.0
...,...,...,...,...,...,...
21500,wyoming,Mountain,"Wamsutter, Wyoming",1169591845,GA,581381.0
21501,wyoming,Mountain,"Wheatland, Wyoming",1176370621,GA,581381.0
21502,wyoming,Mountain,"Worland, Wyoming",1166347917,GA,581381.0
21503,wyoming,Mountain,"Wright, Wyoming",1166334449,GA,581381.0


In [25]:
# Preprocessing before calculating the articles per capita for each state:
# keep one row per state to get each state's population, count the articles
# per state, then divide the article count by the population

import numpy as np

df1 = df_consolidated[~df_consolidated.duplicated(subset=['state', 'regional_division'], keep='last')]

# Population of each state (df1 has one row per state)
state_pop = df1[['state', 'population']].groupby('state').sum().reset_index()
# Number of articles in each state
state_article_cnt = df_consolidated[['state', 'article_title']].groupby('state').count().reset_index()
total_articles_state = state_pop.merge(state_article_cnt, on='state')
total_articles_state.columns = ['state', 'population', 'article_count']
total_articles_state['article_count'] = total_articles_state['article_count'].astype('int')
total_articles_state['articles_per_capita'] = (total_articles_state['article_count'] / total_articles_state['population']).astype('float')

# Handle states whose population is zero (6 states): groupby().sum() also turns a
# missing (NaN) population into 0, and dividing by 0 gives inf
total_articles_state = total_articles_state[total_articles_state['articles_per_capita'] != np.inf]
print('On a state level, the dataframe returns the below number of rows')
print(len(total_articles_state['state'].unique()))
total_articles_state = total_articles_state.reset_index(drop=True)
total_articles_state.head()

On a state level, the dataframe returns the below number of rows
37


Unnamed: 0,state,population,article_count,articles_per_capita
0,alabama,5074296.0,461,9.1e-05
1,alaska,733583.0,148,0.000202
2,arizona,7359197.0,91,1.2e-05
3,arkansas,3045637.0,500,0.000164
4,california,39029342.0,482,1.2e-05
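The same state-level table can also be built in a single `groupby` with pandas named aggregation. Below is a minimal sketch on hypothetical toy rows shaped like `df_consolidated`. Dropping rows with a missing population up front takes the place of the later `!= np.inf` filter.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: the population repeats on every article row of a state
df = pd.DataFrame({
    'state': ['state_a', 'state_a', 'state_a', 'state_b', 'state_c'],
    'article_title': ['a1', 'a2', 'a3', 'b1', 'c1'],
    'population': [1000.0, 1000.0, 1000.0, 500.0, np.nan],
})

per_state = (
    df.dropna(subset=['population'])                  # states with no population match are excluded
      .groupby('state')
      .agg(population=('population', 'first'),        # one population value per state
           article_count=('article_title', 'count'))
      .reset_index()
)
per_state['articles_per_capita'] = per_state['article_count'] / per_state['population']
print(per_state)
```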


In [26]:
# Next we analyze the data by regional divisions

# Population of each regional division: keep one row per state, then sum the state populations
division_pop = df_consolidated.drop_duplicates(subset=['state', 'population']).groupby('regional_division')['population'].sum().reset_index()

# Keep a reference for the high-quality division analysis later on
df_pop_division = division_pop
division_pop

Unnamed: 0,regional_division,population
0,East North Central,47097779.0
1,East South Central,19578002.0
2,Middle Atlantic,12972008.0
3,Mountain,23400976.0
4,New England,9014378.0
5,Pacific,53229044.0
6,South Atlantic,38111498.0
7,West North Central,18032808.0
8,West South Central,41685250.0


In [27]:
# Repeating the same steps as above, but grouping by regional division in this case:
# count the articles in each division and divide by the division population

division_article_cnt = df_consolidated[['regional_division', 'article_title']].groupby('regional_division').count().reset_index()
total_articles_division = division_pop.merge(division_article_cnt, on='regional_division')
total_articles_division.columns=['regional_division', 'population', 'article_count']
total_articles_division['articles_per_capita'] = total_articles_division['article_count'] / (total_articles_division['population'])

print('On a regional division level, the dataframe returns the below number of rows')
print(len(total_articles_division['regional_division'].unique()))
total_articles_division.head()

On a regional division level, the dataframe returns the below number of rows
9


Unnamed: 0,regional_division,population,article_count,articles_per_capita
0,East North Central,47097779.0,4751,0.000101
1,East South Central,19578002.0,1527,7.8e-05
2,Middle Atlantic,12972008.0,2556,0.000197
3,Mountain,23400976.0,1081,4.6e-05
4,New England,9014378.0,1163,0.000129
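The division population above sums only the states that have a population value, while the article count includes every article in the division. If some states lack a population match (the state-level step drops 6 such states), their articles may inflate the division ratio. Below is a minimal sketch, assuming the numerator and denominator should cover the same states. It uses hypothetical toy rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: state_c has articles but no population value
df = pd.DataFrame({
    'state': ['state_a', 'state_a', 'state_b', 'state_c'],
    'regional_division': ['Div 1', 'Div 1', 'Div 1', 'Div 1'],
    'article_title': ['a1', 'a2', 'b1', 'c1'],
    'population': [1000.0, 1000.0, 1000.0, np.nan],
})

# Count articles and sum populations over the same set of states
known = df.dropna(subset=['population'])
per_division = known.groupby('regional_division').agg(article_count=('article_title', 'count'))
per_division['population'] = (known.drop_duplicates(subset='state')
                              .groupby('regional_division')['population'].sum())
per_division['articles_per_capita'] = per_division['article_count'] / per_division['population']
print(per_division)
```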


In [28]:
# Now we calculate the number of high-quality articles per capita for each state.
# The dataset is filtered to include only articles tagged with
# "FA" (Featured Article) or "GA" (Good Article) in the "article_quality" column.

# Filter the articles on the article_quality attribute, then compute article_count
# and articles_per_capita the same way as above, i.e. grouped by state

df3 = df_consolidated[~df_consolidated.duplicated(subset=['state', 'regional_division'], keep='last')]

state_pop = df3[['state', 'population']].groupby('state').sum().reset_index()
hq_state_df = df_consolidated[df_consolidated['article_quality'].isin(['FA', 'GA'])]

state_count = hq_state_df[['state', 'article_title']].groupby('state').count().reset_index()
hq_state_df = state_pop.merge(state_count, on='state')
hq_state_df.columns = ['state', 'population', 'article_count']
hq_state_df['article_count'] = hq_state_df['article_count'].astype('int')
hq_state_df['articles_per_capita'] = (hq_state_df['article_count'] / hq_state_df['population']).astype('float')

# Exclude states whose population is zero (their ratio is infinite)
hq_state_df = hq_state_df[hq_state_df['articles_per_capita'] != np.inf].reset_index(drop=True)

print('On a state level, the high quality dataframe returns the below number of rows')
print(len(hq_state_df['state'].unique()))
hq_state_df.head()

On a state level, the high quality dataframe returns the below number of rows
37


Unnamed: 0,state,population,article_count,articles_per_capita
0,alabama,5074296.0,53,1e-05
1,alaska,733583.0,31,4.2e-05
2,arizona,7359197.0,24,3e-06
3,arkansas,3045637.0,72,2.4e-05
4,california,39029342.0,173,4e-06
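The inner merge `state_pop.merge(state_count, on='state')` drops any state that has no FA or GA article. That state would then be missing from the bottom-10 high-quality table, even though its ratio (0) is the lowest possible. Both state-level tables report 37 states here, but other data could differ. Below is a minimal sketch on hypothetical toy rows. It counts the high-quality flag inside the `groupby`, so such states stay in the table with a count of 0.

```python
import pandas as pd

# Hypothetical toy rows: state_b has no FA/GA article
df = pd.DataFrame({
    'state': ['state_a', 'state_a', 'state_b'],
    'article_quality': ['GA', 'C', 'B'],
    'population': [1000.0, 1000.0, 500.0],
})

per_state = df.groupby('state').agg(
    population=('population', 'first'),
    # Count FA/GA articles per state; states with none get 0 instead of disappearing
    hq_article_count=('article_quality', lambda q: q.isin(['FA', 'GA']).sum()),
)
per_state['hq_articles_per_capita'] = per_state['hq_article_count'] / per_state['population']
print(per_state)
```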


In [29]:
# High-quality articles per capita by regional division

# Filter the articles on the article_quality attribute, then compute article_count
# and articles_per_capita the same way as above, i.e. grouped by regional division

division_pop = df_pop_division  # division populations computed in In [26]

hq_division_df = df_consolidated[df_consolidated['article_quality'].isin(['FA', 'GA'])]
division_count = hq_division_df[['regional_division', 'article_title']].groupby('regional_division').count().reset_index()
hq_division_df = division_pop.merge(division_count, on='regional_division')
hq_division_df.columns = ['regional_division', 'population', 'article_count']
hq_division_df['articles_per_capita'] = hq_division_df['article_count'] / hq_division_df['population']

print('On a regional division level, the high quality dataframe returns the below number of rows')
print(len(hq_division_df['regional_division'].unique()))
hq_division_df.head()

On a regional division level, the high quality dataframe returns the below number of rows
9


Unnamed: 0,regional_division,population,article_count,articles_per_capita
0,East North Central,47097779.0,717,1.5e-05
1,East South Central,19578002.0,316,1.6e-05
2,Middle Atlantic,12972008.0,566,4.4e-05
3,Mountain,23400976.0,304,1.3e-05
4,New England,9014378.0,150,1.7e-05


Results:

The results from your analysis will be produced in the form of data tables. You are being asked to produce six tables in total, showing:

1) Top 10 US states by coverage: The 10 US states with the highest total articles per capita (in descending order).

2) Bottom 10 US states by coverage: The 10 US states with the lowest total articles per capita (in ascending order).

3) Top 10 US states by high quality: The 10 US states with the highest high-quality articles per capita (in descending order).

4) Bottom 10 US states by high quality: The 10 US states with the lowest high-quality articles per capita (in ascending order).

5) Census divisions by total coverage: A rank-ordered list of US census divisions (in descending order) by total articles per capita.

6) Census divisions by high quality coverage: A rank-ordered list of US census divisions (in descending order) by high-quality articles per capita.
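All six tables follow the same pattern: sort by a per-capita column, keep the first n rows, and number them. Below is a minimal sketch of that pattern as a hypothetical helper, `ranked`, built on `DataFrame.nlargest` / `nsmallest`. It runs on a toy per-capita table whose values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy per-capita table shaped like total_articles_state
per_state = pd.DataFrame({
    'state': ['s1', 's2', 's3', 's4'],
    'articles_per_capita': [3e-4, 1e-5, 2e-4, 5e-5],
})

def ranked(df, n, largest=True, col='articles_per_capita'):
    """Hypothetical helper: top (or bottom) n rows by `col`, with a 1-based rank column."""
    picked = df.nlargest(n, col) if largest else df.nsmallest(n, col)
    picked = picked.reset_index(drop=True)
    picked.insert(0, 'rank', range(1, len(picked) + 1))
    return picked

print(ranked(per_state, 2))                  # highest coverage first
print(ranked(per_state, 2, largest=False))   # lowest coverage first
```

Keeping the `articles_per_capita` column in the printed table, as this helper does, lets a reader see how far apart the ranked states are.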


In [43]:
# Top 10 US states by coverage: the 10 US states with the highest total articles per capita (in descending order)
from tabulate import tabulate

top10_state = total_articles_state.sort_values(by='articles_per_capita',
                                               ascending=False).head(10).reset_index(drop=True)

# Copy the 'state' column so inserting the rank column does not modify a view
table = top10_state[['state']].copy()

# Add a serial number column
table.insert(0, 'Serial No.', range(1, 1 + len(table)))
print(tabulate(table, headers='keys', tablefmt='pretty', showindex=False))

+------------+--------------+
| Serial No. |    state     |
+------------+--------------+
|     1      |   vermont    |
|     2      |    maine     |
|     3      |     iowa     |
|     4      |    alaska    |
|     5      | pennsylvania |
|     6      |   michigan   |
|     7      |   wyoming    |
|     8      |   arkansas   |
|     9      |   missouri   |
|     10     |  minnesota   |
+------------+--------------+


In [48]:
# Bottom 10 US states by coverage: the 10 US states with the lowest total articles per capita (in ascending order)

bottom10_state = total_articles_state.sort_values(by='articles_per_capita',
                                                  ascending=True).head(10).reset_index(drop=True)

# Copy the 'state' column so inserting the rank column does not modify a view
table = bottom10_state[['state']].copy()

# Add a serial number column
table.insert(0, 'Serial No.', range(1, 1 + len(table)))
print(tabulate(table, headers='keys', tablefmt='pretty', showindex=False))

+------------+------------+
| Serial No. |   state    |
+------------+------------+
|     1      |   nevada   |
|     2      | california |
|     3      |  arizona   |
|     4      |  virginia  |
|     5      |  florida   |
|     6      |  oklahoma  |
|     7      |   kansas   |
|     8      |  maryland  |
|     9      | wisconsin  |
|     10     | washington |
+------------+------------+


In [52]:
# Top 10 US states by high quality: the 10 US states with the highest high-quality articles per capita (in descending order)

top10_hq_state = hq_state_df.sort_values(by='articles_per_capita',
                                         ascending=False).head(10).reset_index(drop=True)

# Copy the 'state' column so inserting the rank column does not modify a view
table = top10_hq_state[['state']].copy()

# Add a serial number column
table.insert(0, 'Serial No.', range(1, 1 + len(table)))
print(tabulate(table, headers='keys', tablefmt='pretty', showindex=False))

+------------+--------------+
| Serial No. |    state     |
+------------+--------------+
|     1      |   vermont    |
|     2      |   wyoming    |
|     3      |   montana    |
|     4      | pennsylvania |
|     5      |   missouri   |
|     6      |    alaska    |
|     7      |    oregon    |
|     8      |     iowa     |
|     9      |    maine     |
|     10     |  minnesota   |
+------------+--------------+


In [56]:
# Bottom 10 US states by high quality: the 10 US states with the lowest high-quality articles per capita (in ascending order)

bottom10_hq_state = hq_state_df.sort_values(by='articles_per_capita',
                                            ascending=True).head(10).reset_index(drop=True)

# Copy the 'state' column so inserting the rank column does not modify a view
table = bottom10_hq_state[['state']].copy()

# Add a serial number column
table.insert(0, 'Serial No.', range(1, 1 + len(table)))
print(tabulate(table, headers='keys', tablefmt='pretty', showindex=False))


+------------+---------------+
| Serial No. |     state     |
+------------+---------------+
|     1      |   virginia    |
|     2      |    nevada     |
|     3      |    arizona    |
|     4      |  california   |
|     5      |    florida    |
|     6      |   maryland    |
|     7      |    kansas     |
|     8      |   oklahoma    |
|     9      | massachusetts |
|     10     |   louisiana   |
+------------+---------------+


In [57]:
# Census divisions by total coverage: a rank-ordered list of US census divisions by total articles per capita (in descending order)

division_coverage = total_articles_division.sort_values(by='articles_per_capita',
                                                        ascending=False).reset_index(drop=True)

# Copy the division column so inserting the rank column does not modify a view
table = division_coverage[['regional_division']].copy()

# Add a serial number column
table.insert(0, 'Serial No.', range(1, 1 + len(table)))
print(tabulate(table, headers='keys', tablefmt='pretty', showindex=False))

+------------+--------------------+
| Serial No. | regional_division  |
+------------+--------------------+
|     1      |  Middle Atlantic   |
|     2      | West North Central |
|     3      |    New England     |
|     4      | East North Central |
|     5      | East South Central |
|     6      | West South Central |
|     7      |      Mountain      |
|     8      |      Pacific       |
|     9      |   South Atlantic   |
+------------+--------------------+


In [58]:
# Census divisions by high quality coverage:
# a rank-ordered list of US census divisions by high-quality articles per capita (in descending order)

division_hq_coverage = hq_division_df.sort_values(by='articles_per_capita',
                                                  ascending=False).reset_index(drop=True)

# Copy the division column so inserting the rank column does not modify a view
table = division_hq_coverage[['regional_division']].copy()

# Add a serial number column
table.insert(0, 'Serial No.', range(1, 1 + len(table)))
print(tabulate(table, headers='keys', tablefmt='pretty', showindex=False))

+------------+--------------------+
| Serial No. | regional_division  |
+------------+--------------------+
|     1      |  Middle Atlantic   |
|     2      | West North Central |
|     3      |    New England     |
|     4      | East South Central |
|     5      | East North Central |
|     6      | West South Central |
|     7      |      Mountain      |
|     8      |      Pacific       |
|     9      |   South Atlantic   |
+------------+--------------------+
