### Name: Asha Cumberbatch 
### Date: April 8th
### Assignment: Project 2 part 2 - Migration table
### Purpose: The aim of this notebook is to pull the data from the tables on the List of U.S. states and territories by net migration Wikipedia page (https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_net_migration). The page will be web scraped and the data cleaned. 
### The page contains three migration tables. They provide information about net domestic migration, net international migration, and net combined migration (domestic and international migration combined).
### The columns of particular interest in each table are State and Net (domestic, international, or combined) migration per 1,000 inhabitants (2020-2024).

### The data pulled from each table will be saved to a CSV file. These CSV files will be merged with other CSV files created from similar pages and saved as a new data frame. The resulting data frame will be used to gain insight into how these metrics differ by state.

##### The first step is to import the necessary packages: requests (to download the page), BeautifulSoup, imported as bs (to parse the HTML), and pandas, imported as pd (to build and clean the data frames).

In [1]:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

# Migration tables - data pulling

##### Before attempting to scrape the page, Wikipedia's robots.txt file was checked to ensure that scraping was allowed.
##### The request for the page is wrapped in an if statement that prints an error message if the page cannot be retrieved (any status code other than 200).

In [2]:
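url = 'https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_net_migration'
response = requests.get(url, timeout=30)  # timeout stops the request from hanging indefinitely
status = response.status_code
if status == 200:
    page = response.text
    soup = bs(page, 'html.parser')  # naming the parser explicitly avoids BeautifulSoup's "no parser specified" warning
else:
    print(f"Oops! Received status code {status}")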
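The robots.txt check mentioned above was done by hand. As a hedged sketch, the same check can be scripted with Python's standard-library `urllib.robotparser`. To keep the example offline, it parses a short, hypothetical robots.txt excerpt (not Wikipedia's actual file); against the live file one would call `set_url('https://en.wikipedia.org/robots.txt')` and `read()` instead of `parse()`. The helper name `is_allowed` is illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excerpt for illustration only; Wikipedia's real file is much longer.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
Allow: /wiki/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also records when the rules were loaded
    return parser.can_fetch(user_agent, url)


page_url = "https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_net_migration"
print(is_allowed(SAMPLE_ROBOTS_TXT, page_url))  # True under the sample rules
```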

In [3]:
print(soup.prettify())
type(soup)

<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vector-sticky-header-enabled vector-toc-available" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of U.S. states and territories by net migration - Wikipedia
  </title>
  <script>
   (function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1

bs4.BeautifulSoup

# Net domestic migration

##### To help identify and select the right table for each type of migration, a loop prints the number of tables found on the page and the first 500 characters of each one.

In [4]:
tables = soup.find_all('table')  # finds all the tables on the page
print(len(tables))  # prints the number of tables found

for i, table in enumerate(tables):
    print(f"Table {i}:")
    print(table.prettify()[:500])  # for each table identified, prints the first 500 characters
    print("\n")


5
Table 0:
<table class="wikitable sortable">
 <caption>
  U.S. states by net domestic migration (From April 1, 2020 to July 1, 2024)
 </caption>
 <tbody>
  <tr>
   <th scope="col" style="width: 50px;">
    National Rank
   </th>
   <th scope="col" style="width: 150px;">
    State
   </th>
   <th scope="col" style="width: 150px;">
    Total net domestic migration (2020-2024)
    <sup class="reference" id="cite_ref-:0_1-0">
     <a href="#cite_note-:0-1">
      <span class="cite-bracket">
       [
      </s


Table 1:
<table class="wikitable sortable">
 <caption>
  U.S. states by net international migration (From April 1, 2020 to July 1, 2024)
 </caption>
 <tbody>
  <tr>
   <th scope="col" style="width: 50px;">
    National Rank
   </th>
   <th scope="col" style="width: 150px;">
    State
   </th>
   <th scope="col" style="width: 150px;">
    Total net international migration (2020-2024)
    <sup class="reference" id="cite_ref-:0_1-1">
     <a href="#cite_note-:0-1">
      <span class="
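Selecting tables by position (`soup.find()` or `tables[1]`) works as long as the page layout does not change. A hedged alternative is to pick a table by matching text in its `<caption>`. The function `find_table_by_caption` below is hypothetical, and the small HTML string is a stand-in for the real page, with captions modeled on the ones printed above.

```python
from bs4 import BeautifulSoup as bs

# Minimal stand-in for the Wikipedia page: two tables with captions similar to the real ones.
SAMPLE_HTML = """
<table class="wikitable sortable"><caption>U.S. states by net domestic migration</caption>
<tbody><tr><th>State</th></tr><tr><td>Florida</td></tr></tbody></table>
<table class="wikitable sortable"><caption>U.S. states by net international migration</caption>
<tbody><tr><th>State</th></tr><tr><td>California</td></tr></tbody></table>
"""


def find_table_by_caption(page_soup, keyword):
    """Return the first table whose caption contains keyword (case-insensitive), or None."""
    for table in page_soup.find_all('table'):
        caption = table.find('caption')
        if caption is not None and keyword.lower() in caption.get_text().lower():
            return table
    return None


sample_soup = bs(SAMPLE_HTML, 'html.parser')  # separate name so the notebook's soup is not overwritten
international_table = find_table_by_caption(sample_soup, 'international')
print(international_table.tbody.find('td').text)  # California
```

On the real page this could be called as `find_table_by_caption(soup, 'net international migration')`, which keeps working even if another table is added above it.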

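#### An empty list is created to hold the rows that will be pulled from the table. The body (tbody) of the table is also displayed. Because the domestic migration table is the first table with the class 'wikitable sortable', soup.find() returns it.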

In [5]:
net_domestic_migration_list = []
net_domestic_migration_table = soup.find(class_='wikitable sortable').tbody  # find() returns the first matching table, the domestic table
net_domestic_migration_table

<tbody><tr>
<th scope="col" style="width: 50px;">National Rank</th>
<th scope="col" style="width: 150px;">State</th>
<th scope="col" style="width: 150px;">Total net domestic migration (2020-2024)<sup class="reference" id="cite_ref-:0_1-0"><a href="#cite_note-:0-1"><span class="cite-bracket">[</span>1<span class="cite-bracket">]</span></a></sup></th>
<th scope="col" style="width: 150px;">Net domestic migration rate per 1,000 inhabitants (2020-2024)
</th></tr>
<tr>
<td>1</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><span><img alt="" class="mw-file-element" data-file-height="200" data-file-width="300" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/40px-Flag_of_Florida.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/60px-Flag_of_Florida.svg.png 2x" width="23"/></span></span> </span><a href="/wiki/Florida" title="Florida">Florida</a></td>
<td>872,722</td>
<td

In [6]:
# determines the maximum number of cells in any row of the table (net_domestic_migration_table)
max_columns = max([len(row.find_all(['th', 'td'])) for row in net_domestic_migration_table.find_all('tr')])

for row in net_domestic_migration_table.find_all('tr'):  # loops over every row, including the header row (use .find_all('tr')[1:] to skip it)
    cells = row.find_all(['th', 'td'])  # finds all the header (th) and data (td) cells in the row
    row_data = [cell.text.strip() for cell in cells]  # pulls the text from each cell and strips leading/trailing whitespace
    
    # if a row has fewer cells than max_columns, this loop pads it with None at the start
    while len(row_data) < max_columns:  # compares the row length against max_columns
        row_data.insert(0, None)  # index 0 inserts the None placeholder as the first column
    
    net_domestic_migration_list.append(row_data)


for row in net_domestic_migration_list:
    print(row)
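Because this loop is repeated almost verbatim for the international and combined tables, it could be wrapped in a function. The sketch below is a hypothetical refactor (the name `tbody_to_rows` is not from the original notebook) that reproduces the same logic, including the assumption that any missing cell belongs at the start of the row. It is tested on a tiny, made-up HTML table in which one row has no rank cell.

```python
from bs4 import BeautifulSoup as bs


def tbody_to_rows(tbody):
    """Convert a <tbody> into a list of rows (lists of cell text), padding short rows with None at the start."""
    rows = tbody.find_all('tr')
    max_columns = max(len(row.find_all(['th', 'td'])) for row in rows)
    result = []
    for row in rows:
        row_data = [cell.text.strip() for cell in row.find_all(['th', 'td'])]
        # same padding rule as the notebook: missing cells are assumed to be at the start of the row
        row_data = [None] * (max_columns - len(row_data)) + row_data
        result.append(row_data)
    return result


# Hypothetical sample table: the third row is missing its rank cell
sample_tbody = bs(
    "<table><tbody>"
    "<tr><th>Rank</th><th>State</th><th>Rate</th></tr>"
    "<tr><td>1</td><td> Florida </td><td>40.52</td></tr>"
    "<tr><td>Texas</td><td>25.65</td></tr>"
    "</tbody></table>",
    'html.parser',
).tbody

for sample_row in tbody_to_rows(sample_tbody):
    print(sample_row)
```

With this helper, the international rows could be built with `tbody_to_rows(tables[1].tbody)` instead of repeating the loop.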

['National Rank', 'State', 'Total net domestic migration (2020-2024)[1]', 'Net domestic migration rate per 1,000 inhabitants (2020-2024)']
['1', 'Florida', '872,722', '40.52']
['2', 'Texas', '747,730', '25.65']
['3', 'North Carolina', '392,010', '37.54']
['4', 'South Carolina', '314,953', '61.54']
['5', 'Arizona', '252,654', '35.30']
['6', 'Tennessee', '252,180', '36.48']
['7', 'Georgia', '205,811', '19.21']
['8', 'Idaho', '120,350', '65.44']
['9', 'Alabama', '119,132', '23.71']
['10', 'Oklahoma', '93,218', '23.54']
['11', 'Nevada', '81,386', '26.21']
['12', 'Arkansas', '68,640', '22.79']
['13', 'Montana', '53,496', '49.34']
['14', 'Utah', '51,891', '15.86']
['15', 'Maine', '49,132', '36.04']
['16', 'Delaware', '46,357', '46.83']
['17', 'Missouri', '42,234', '6.86']
['18', 'Colorado', '31,172', '5.40']
['19', 'Indiana', '30,239', '4.46']
['20', 'New Hampshire', '29,170', '21.18']
['21', 'Kentucky', '28,781', '6.39']
['22', 'South Dakota', '21,370', '24.10']
['23', 'West Virginia', '10,

In [7]:
unsorted_net_domestic_migration_df = pd.DataFrame(net_domestic_migration_list)
unsorted_net_domestic_migration_df

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | National Rank | State | Total net domestic migration (2020-2024)[1] | Net domestic migration rate per 1,000 inhabita... |
| 1 | 1 | Florida | 872722 | 40.52 |
| 2 | 2 | Texas | 747730 | 25.65 |
| 3 | 3 | North Carolina | 392010 | 37.54 |
| 4 | 4 | South Carolina | 314953 | 61.54 |
| 5 | 5 | Arizona | 252654 | 35.30 |
| 6 | 6 | Tennessee | 252180 | 36.48 |
| 7 | 7 | Georgia | 205811 | 19.21 |
| 8 | 8 | Idaho | 120350 | 65.44 |
| 9 | 9 | Alabama | 119132 | 23.71 |


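#### The scraped rows have been stored in unsorted_net_domestic_migration_df. Note that the header row appears as row 0 and the columns are simply numbered 0-3. In the next step, the first row is used as the column names instead, and the data is sorted alphabetically by state so that the data scraped from each webpage is in the same order when the files are merged.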

In [8]:
# creates a pandas DataFrame from net_domestic_migration_list, using the entries in row 0 as the column names
# the entries from row 1 onward become the data that fills those columns
net_domestic_migration_df = pd.DataFrame(net_domestic_migration_list[1:], columns=net_domestic_migration_list[0]) 

net_domestic_migration_df = net_domestic_migration_df.sort_values(by='State', ascending=True) # sorts the DataFrame alphabetically by the State column

net_domestic_migration_df

|   | National Rank | State | Total net domestic migration (2020-2024)[1] | Net domestic migration rate per 1,000 inhabitants (2020-2024) |
|---|---|---|---|---|
| 8 | 9 | Alabama | 119132 | 23.71 |
| 32 | 33 | Alaska | -19564 | -26.68 |
| 4 | 5 | Arizona | 252654 | 35.3 |
| 11 | 12 | Arkansas | 68640 | 22.79 |
| 50 | 50 | California | -1234030 | -37.04 |
| 17 | 18 | Colorado | 31172 | 5.4 |
| 36 | 37 | Connecticut | -24206 | -6.71 |
| 15 | 16 | Delaware | 46357 | 46.83 |
| 37 | -- | District of Columbia | -29330 | -42.54 |
| 0 | 1 | Florida | 872722 | 40.52 |
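Sorting keeps rows aligned only if every file lists exactly the same states. As a hedged alternative for the later merge step, pandas' `merge()` can join tables on the State column, which matches rows by name regardless of their order. The two small frames below use values from this notebook's output, deliberately in different orders.

```python
import pandas as pd

# Small excerpts of the domestic and international rates, in different row orders
domestic_excerpt = pd.DataFrame({'State': ['Florida', 'Alabama', 'Arizona'],
                                 'domestic migration rate': [40.52, 23.71, 35.30]})
international_excerpt = pd.DataFrame({'State': ['Arizona', 'Florida', 'Alabama'],
                                      'international migration rate': [22.20, 49.18, 7.73]})

# validate='one_to_one' raises a MergeError if a state appears more than once in either frame
merged_excerpt = domestic_excerpt.merge(international_excerpt, on='State',
                                        how='inner', validate='one_to_one')
print(merged_excerpt.sort_values('State').reset_index(drop=True))
```

An inner join also silently drops any state that is missing from one of the tables, so comparing row counts before and after the merge is a cheap extra check.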


#### To make the 'Net domestic migration rate per 1,000 inhabitants (2020-2024)' column easier to reference, it is renamed to 'domestic migration rate'.

In [9]:
net_domestic_migration_df = net_domestic_migration_df.rename(columns={'Net domestic migration rate per 1,000 inhabitants (2020-2024)': 'domestic migration rate'})
net_domestic_migration_df

|   | National Rank | State | Total net domestic migration (2020-2024)[1] | domestic migration rate |
|---|---|---|---|---|
| 8 | 9 | Alabama | 119132 | 23.71 |
| 32 | 33 | Alaska | -19564 | -26.68 |
| 4 | 5 | Arizona | 252654 | 35.3 |
| 11 | 12 | Arkansas | 68640 | 22.79 |
| 50 | 50 | California | -1234030 | -37.04 |
| 17 | 18 | Colorado | 31172 | 5.4 |
| 36 | 37 | Connecticut | -24206 | -6.71 |
| 15 | 16 | Delaware | 46357 | 46.83 |
| 37 | -- | District of Columbia | -29330 | -42.54 |
| 0 | 1 | Florida | 872722 | 40.52 |
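The rate column is named slightly differently in each table ('... per 1,000 inhabitants ...' in the domestic and international tables, '... per 1,000 residents ...' in the combined table), so typing the full name each time is error-prone. One hedged option is to locate the column by a shared substring. The helper `rename_rate_column` is hypothetical and assumes exactly one column header contains 'per 1,000', which is true of the headers printed in this notebook.

```python
import pandas as pd


def rename_rate_column(df, new_name, marker='per 1,000'):
    """Rename the single column whose header contains marker to new_name."""
    matches = [col for col in df.columns if marker in col]
    if len(matches) != 1:
        raise ValueError(f"Expected one column containing {marker!r}, found {matches}")
    return df.rename(columns={matches[0]: new_name})


sample_df = pd.DataFrame({
    'State': ['Alabama'],
    'Total net combined migration per 1,000 residents (2020-2024)': ['31.44'],
})
renamed_df = rename_rate_column(sample_df, 'combined migration rate')
print(list(renamed_df.columns))  # ['State', 'combined migration rate']
```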


#### The scraped table may contain entries that will not be included in the analysis, such as totals for the United States or Washington, D.C. To determine whether any such entries need to be removed, the number of rows in the data frame is checked: a table limited to the 50 states should have 50 rows.

In [10]:
net_domestic_migration_df.shape[0]

51

In [11]:
net_domestic_migration_df = net_domestic_migration_df[~net_domestic_migration_df['State'].isin(["District of Columbia", "United States"])]
print(net_domestic_migration_df)
net_domestic_migration_df.shape[0]

   National Rank           State Total net domestic migration (2020-2024)[1]  \
8              9         Alabama                                     119,132   
32            33          Alaska                                     -19,564   
4              5         Arizona                                     252,654   
11            12        Arkansas                                      68,640   
50            50      California                                  -1,234,030   
17            18        Colorado                                      31,172   
36            37     Connecticut                                     -24,206   
15            16        Delaware                                      46,357   
0              1         Florida                                     872,722   
6              7         Georgia                                     205,811   
42            42          Hawaii                                     -50,754   
7              8           Idaho        

50
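The State values to drop were typed in by hand. In the output above, District of Columbia is the only row whose National Rank is '--' rather than a number, which suggests a hedged way to list candidate rows before deciding what to remove. This sketch assumes non-state rows are the ones without a numeric rank; that holds for the rows shown here but is not guaranteed by the page. (The combined table spells the column 'National rank', so the name would need adjusting there.)

```python
import pandas as pd

sample_ranks = pd.DataFrame({
    'National Rank': ['9', '33', '--'],
    'State': ['Alabama', 'Alaska', 'District of Columbia'],
})

# Ranks that cannot be parsed as numbers become NaN, flagging rows that may not be states
unranked = sample_ranks[pd.to_numeric(sample_ranks['National Rank'], errors='coerce').isna()]
print(unranked['State'].tolist())  # ['District of Columbia']
```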

#### The data type of each column also needs to be verified, and any numeric columns needed for the analysis converted to int or float. All columns are currently stored as strings (object dtype), and the totals include commas as thousands separators; these must be removed before the data can be converted.

In [12]:
net_domestic_migration_df.dtypes

National Rank                                  object
State                                          object
Total net domestic migration (2020-2024)[1]    object
domestic migration rate                        object
dtype: object
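Only the rate columns are converted in this notebook. If the total columns were needed later, values such as '-1,234,030' would need their commas removed before conversion. A hedged sketch using values copied from the output above:

```python
import pandas as pd

totals = pd.Series(['119,132', '-19,564', '-1,234,030'])

# regex=False treats ',' as a literal character rather than a regular expression
cleaned_totals = pd.to_numeric(totals.str.replace(',', '', regex=False))
print(cleaned_totals.tolist())  # [119132, -19564, -1234030]
```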

In [13]:
print(net_domestic_migration_df['domestic migration rate'].unique())  
# shows the unique values in the column exactly as they are stored

['23.71' '-26.68' '35.30' '22.79' '-37.04' '5.40' '-6.71' '46.83' '40.52'
 '19.21' '-34.88' '65.44' '-32.61' '4.46' '-2.97' '-8.14' '6.39' '-27.80'
 '36.04' '-19.48' '-23.14' '-6.73' '-8.40' '-7.48' '6.86' '49.34' '-7.01'
 '26.21' '21.18' '-20.69' '-3.77' '-47.82' '37.54' '-8.28' '-3.22' '23.54'
 '-0.44' '-7.87' '61.54' '24.10' '36.48' '25.65' '15.86' '9.58' '-4.00'
 '-2.82' '5.74' '0.50' '12.66']


##### Before cleaning the 'domestic migration rate' column, an explicit copy of net_domestic_migration_df is made. Because this data frame was created by filtering another data frame (removing District of Columbia), modifying it directly triggers pandas' SettingWithCopyWarning, which warns that the assignment may be made to a copy of a slice of the original data frame. Calling .copy() creates an independent data frame and avoids the warning.

In [14]:
net_domestic_migration_df = net_domestic_migration_df.copy()

net_domestic_migration_df['domestic migration rate'] = net_domestic_migration_df['domestic migration rate'].str.replace(',', '').str.strip()
# removes any commas and surrounding whitespace from the 'domestic migration rate' values (they are still strings at this point)
net_domestic_migration_df

|   | National Rank | State | Total net domestic migration (2020-2024)[1] | domestic migration rate |
|---|---|---|---|---|
| 8 | 9 | Alabama | 119132 | 23.71 |
| 32 | 33 | Alaska | -19564 | -26.68 |
| 4 | 5 | Arizona | 252654 | 35.3 |
| 11 | 12 | Arkansas | 68640 | 22.79 |
| 50 | 50 | California | -1234030 | -37.04 |
| 17 | 18 | Colorado | 31172 | 5.4 |
| 36 | 37 | Connecticut | -24206 | -6.71 |
| 15 | 16 | Delaware | 46357 | 46.83 |
| 0 | 1 | Florida | 872722 | 40.52 |
| 6 | 7 | Georgia | 205811 | 19.21 |
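A small, hedged illustration of why the .copy() call helps: after filtering, .copy() gives an independent data frame, so cleaning it leaves the original untouched and pandas has no reason to raise SettingWithCopyWarning. The frame below is a toy example, not the scraped data.

```python
import warnings

import pandas as pd

original = pd.DataFrame({'State': ['Alabama', 'District of Columbia'],
                         'rate': [' 23.71 ', '-42.54']})

# Filter, then take an explicit copy before modifying
states_only = original[original['State'] != 'District of Columbia'].copy()

with warnings.catch_warnings():
    warnings.simplefilter('error')  # turn any warning (including SettingWithCopyWarning) into an error
    states_only['rate'] = states_only['rate'].str.strip()

print(states_only['rate'].tolist())  # ['23.71']
print(original['rate'].tolist())     # [' 23.71 ', '-42.54']
```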


In [15]:
net_domestic_migration_df['domestic migration rate'] = pd.to_numeric(net_domestic_migration_df['domestic migration rate'], errors='coerce')
# converts all the values in the 'domestic migration rate' column from object to float so it can be used as numeric data

net_domestic_migration_df

|   | National Rank | State | Total net domestic migration (2020-2024)[1] | domestic migration rate |
|---|---|---|---|---|
| 8 | 9 | Alabama | 119132 | 23.71 |
| 32 | 33 | Alaska | -19564 | -26.68 |
| 4 | 5 | Arizona | 252654 | 35.3 |
| 11 | 12 | Arkansas | 68640 | 22.79 |
| 50 | 50 | California | -1234030 | -37.04 |
| 17 | 18 | Colorado | 31172 | 5.4 |
| 36 | 37 | Connecticut | -24206 | -6.71 |
| 15 | 16 | Delaware | 46357 | 46.83 |
| 0 | 1 | Florida | 872722 | 40.52 |
| 6 | 7 | Georgia | 205811 | 19.21 |


In [16]:
net_domestic_migration_df.dtypes

National Rank                                   object
State                                           object
Total net domestic migration (2020-2024)[1]     object
domestic migration rate                        float64
dtype: object
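Because errors='coerce' silently turns any value it cannot parse into NaN, it is worth confirming afterwards that nothing was lost. A hedged check, shown on a toy series that includes one unparseable entry ('n/a' is a hypothetical value, not one found on this page):

```python
import pandas as pd

raw = pd.Series(['23.71', '-26.68', 'n/a'])
rates = pd.to_numeric(raw, errors='coerce')

# Report the original strings that failed to convert
failed = raw[rates.isna()]
print(failed.tolist())  # ['n/a']
print(int(rates.isna().sum()))  # 1
```

Applied to this notebook, `net_domestic_migration_df['domestic migration rate'].isna().sum()` should return 0, since every unique value printed earlier looks like a valid number.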

#### Only the State and 'domestic migration rate' columns are kept in the data frame; it is saved to a new CSV file at the end of the notebook.

In [17]:
net_domestic_migration_df = net_domestic_migration_df[['State','domestic migration rate']]
net_domestic_migration_df

|   | State | domestic migration rate |
|---|---|---|
| 8 | Alabama | 23.71 |
| 32 | Alaska | -26.68 |
| 4 | Arizona | 35.3 |
| 11 | Arkansas | 22.79 |
| 50 | California | -37.04 |
| 17 | Colorado | 5.4 |
| 36 | Connecticut | -6.71 |
| 15 | Delaware | 46.83 |
| 0 | Florida | 40.52 |
| 6 | Georgia | 19.21 |


#### This process is repeated for the net international and net combined migration tables.
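Since the same steps (build the data frame from the scraped rows, drop non-state rows, convert the rate column, keep two columns, sort) are applied to all three tables, they could be collected into one function. `clean_migration_rows` below is a hypothetical sketch, not code from the original notebook. It takes the list of rows produced by the scraping loop and assumes the rate column is the last column, as it is in all three tables shown here. Unlike the notebook, it resets the index after sorting.

```python
import pandas as pd


def clean_migration_rows(rows, new_rate_name, exclude=("District of Columbia", "United States")):
    """Turn scraped rows (header first) into a State / rate data frame sorted by state."""
    df = pd.DataFrame(rows[1:], columns=rows[0])
    df = df[~df['State'].isin(exclude)].copy()
    rate_column = df.columns[-1]  # assumption: the per-1,000 rate is the last column
    df[new_rate_name] = pd.to_numeric(
        df[rate_column].str.replace(',', '', regex=False).str.strip(), errors='coerce'
    )
    return df[['State', new_rate_name]].sort_values('State').reset_index(drop=True)


sample_rows = [
    ['National Rank', 'State', 'Total', 'Rate per 1,000'],
    ['1', 'Florida', '872,722', '40.52'],
    ['--', 'District of Columbia', '-29,330', '-42.54'],
    ['9', 'Alabama', '119,132', '23.71'],
]
print(clean_migration_rows(sample_rows, 'domestic migration rate'))
```

In the notebook this could be called as `clean_migration_rows(net_international_migration_list, 'international migration rate')`.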

# Net international migration

#### The net international migration table is the second table on the page, so it can be selected by its index (1) in tables.

In [18]:
net_international_migration_list = []
net_international_migration_table = tables[1].tbody  # selects the second table (index 1)
net_international_migration_table

<tbody><tr>
<th scope="col" style="width: 50px;">National Rank</th>
<th scope="col" style="width: 150px;">State</th>
<th scope="col" style="width: 150px;">Total net international migration (2020-2024)<sup class="reference" id="cite_ref-:0_1-1"><a href="#cite_note-:0-1"><span class="cite-bracket">[</span>1<span class="cite-bracket">]</span></a></sup></th>
<th scope="col" style="width: 150px;">Net international migration rate per 1,000 inhabitants (2020-2024)
</th></tr>
<tr>
<td>1</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><span><img alt="" class="mw-file-element" data-file-height="200" data-file-width="300" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/40px-Flag_of_Florida.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/60px-Flag_of_Florida.svg.png 2x" width="23"/></span></span> </span><a href="/wiki/Florida" title="Florida">Florida</a></td>
<td>1,059,

In [19]:
max_columns = max([len(row.find_all(['th', 'td'])) for row in net_international_migration_table.find_all('tr')])

for row in net_international_migration_table.find_all('tr'):
    cells = row.find_all(['th', 'td'])
    row_data = [cell.text.strip() for cell in cells]  
    
    while len(row_data) < max_columns:
        row_data.insert(0, None) 
    
    net_international_migration_list.append(row_data)


for row in net_international_migration_list:
    print(row)

['National Rank', 'State', 'Total net international migration (2020-2024)[1]', 'Net international migration rate per 1,000 inhabitants (2020-2024)']
['1', 'Florida', '1,059,143', '49.18']
['2', 'California', '934,230', '23.62']
['3', 'Texas', '820,761', '28.16']
['4', 'New York', '519,395', '25.71']
['5', 'New Jersey', '327,188', '35.22']
['6', 'Illinois', '278,657', '21.73']
['7', 'Massachusetts', '255,102', '36.27']
['8', 'Washington', '206,851', '26.84']
['9', 'Pennsylvania', '198,901', '15.30']
['10', 'North Carolina', '181,262', '17.36']
['11', 'Georgia', '170,551', '15.92']
['12', 'Michigan', '164,465', '16.32']
['13', 'Ohio', '164,274', '13.92']
['14', 'Arizona', '158,932', '22.20']
['15', 'Virginia', '158,813', '18.40']
['16', 'Maryland', '154,183', '24.94']
['17', 'Connecticut', '95,160', '26.38']
['18', 'Indiana', '88,582', '13.05']
['19', 'Colorado', '83,062', '14.38']
['20', 'Minnesota', '81,091', '14.21']
['21', 'Utah', '77,904', '23.81']
['22', 'Tennessee', '73,139', '10.

In [20]:
unsorted_net_international_migration_df = pd.DataFrame(net_international_migration_list)
unsorted_net_international_migration_df

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | National Rank | State | Total net international migration (2020-2024)[1] | Net international migration rate per 1,000 inh... |
| 1 | 1 | Florida | 1059143 | 49.18 |
| 2 | 2 | California | 934230 | 23.62 |
| 3 | 3 | Texas | 820761 | 28.16 |
| 4 | 4 | New York | 519395 | 25.71 |
| 5 | 5 | New Jersey | 327188 | 35.22 |
| 6 | 6 | Illinois | 278657 | 21.73 |
| 7 | 7 | Massachusetts | 255102 | 36.27 |
| 8 | 8 | Washington | 206851 | 26.84 |
| 9 | 9 | Pennsylvania | 198901 | 15.30 |


In [21]:
net_international_migration_df = pd.DataFrame(net_international_migration_list[1:], columns=net_international_migration_list[0]) # _list[0] is the header row

net_international_migration_df = net_international_migration_df.sort_values(by='State', ascending=True)

net_international_migration_df


|   | National Rank | State | Total net international migration (2020-2024)[1] | Net international migration rate per 1,000 inhabitants (2020-2024) |
|---|---|---|---|---|
| 32 | 33 | Alabama | 38850 | 7.73 |
| 44 | 44 | Alaska | 11176 | 15.24 |
| 13 | 14 | Arizona | 158932 | 22.2 |
| 40 | 40 | Arkansas | 18737 | 6.22 |
| 1 | 2 | California | 934230 | 23.62 |
| 18 | 19 | Colorado | 83062 | 14.38 |
| 16 | 17 | Connecticut | 95160 | 26.38 |
| 41 | 41 | Delaware | 17748 | 17.93 |
| 33 | -- | District of Columbia | 34639 | 50.23 |
| 0 | 1 | Florida | 1059143 | 49.18 |


In [22]:
net_international_migration_df = net_international_migration_df.rename(columns={'Net international migration rate per 1,000 inhabitants (2020-2024)': 'international migration rate'})
print(net_international_migration_df)
net_international_migration_df.shape[0]

   National Rank                 State  \
32            33               Alabama   
44            44                Alaska   
13            14               Arizona   
40            40              Arkansas   
1              2            California   
18            19              Colorado   
16            17           Connecticut   
41            41              Delaware   
33            --  District of Columbia   
0              1               Florida   
10            11               Georgia   
35            35                Hawaii   
38            38                 Idaho   
5              6              Illinois   
17            18               Indiana   
29            30                  Iowa   
30            31                Kansas   
23            24              Kentucky   
26            27             Louisiana   
42            42                 Maine   
15            16              Maryland   
6              7         Massachusetts   
11            12              Mich

51

In [23]:
net_international_migration_df = net_international_migration_df[~net_international_migration_df['State'].isin(["District of Columbia", "United States"])]
print(net_international_migration_df)
net_international_migration_df.shape[0]

   National Rank           State  \
32            33         Alabama   
44            44          Alaska   
13            14         Arizona   
40            40        Arkansas   
1              2      California   
18            19        Colorado   
16            17     Connecticut   
41            41        Delaware   
0              1         Florida   
10            11         Georgia   
35            35          Hawaii   
38            38           Idaho   
5              6        Illinois   
17            18         Indiana   
29            30            Iowa   
30            31          Kansas   
23            24        Kentucky   
26            27       Louisiana   
42            42           Maine   
15            16        Maryland   
6              7   Massachusetts   
11            12        Michigan   
19            20       Minnesota   
39            39     Mississippi   
27            28        Missouri   
50            50         Montana   
34            34        Nebr

50

In [24]:
print(net_international_migration_df['international migration rate'].unique()) 

['7.73' '15.24' '22.20' '6.22' '23.62' '14.38' '26.38' '17.93' '49.18'
 '15.92' '21.12' '11.98' '21.73' '13.05' '15.61' '14.34' '15.67' '12.69'
 '10.12' '24.94' '36.27' '16.32' '14.21' '6.74' '9.57' '2.42' '16.99'
 '23.02' '7.94' '35.22' '14.23' '25.71' '17.36' '16.90' '13.92' '10.31'
 '12.53' '15.30' '24.49' '11.86' '7.52' '10.58' '28.16' '23.81' '9.39'
 '18.40' '26.84' '4.24' '10.77' '5.25']


In [25]:
net_international_migration_df = net_international_migration_df.copy()
net_international_migration_df['international migration rate'] = net_international_migration_df['international migration rate'].str.replace(',', '').str.strip()
net_international_migration_df

|   | National Rank | State | Total net international migration (2020-2024)[1] | international migration rate |
|---|---|---|---|---|
| 32 | 33 | Alabama | 38850 | 7.73 |
| 44 | 44 | Alaska | 11176 | 15.24 |
| 13 | 14 | Arizona | 158932 | 22.2 |
| 40 | 40 | Arkansas | 18737 | 6.22 |
| 1 | 2 | California | 934230 | 23.62 |
| 18 | 19 | Colorado | 83062 | 14.38 |
| 16 | 17 | Connecticut | 95160 | 26.38 |
| 41 | 41 | Delaware | 17748 | 17.93 |
| 0 | 1 | Florida | 1059143 | 49.18 |
| 10 | 11 | Georgia | 170551 | 15.92 |


In [26]:
net_international_migration_df['international migration rate'] = pd.to_numeric(net_international_migration_df['international migration rate'], errors='coerce')
net_international_migration_df

|   | National Rank | State | Total net international migration (2020-2024)[1] | international migration rate |
|---|---|---|---|---|
| 32 | 33 | Alabama | 38850 | 7.73 |
| 44 | 44 | Alaska | 11176 | 15.24 |
| 13 | 14 | Arizona | 158932 | 22.2 |
| 40 | 40 | Arkansas | 18737 | 6.22 |
| 1 | 2 | California | 934230 | 23.62 |
| 18 | 19 | Colorado | 83062 | 14.38 |
| 16 | 17 | Connecticut | 95160 | 26.38 |
| 41 | 41 | Delaware | 17748 | 17.93 |
| 0 | 1 | Florida | 1059143 | 49.18 |
| 10 | 11 | Georgia | 170551 | 15.92 |


In [27]:
net_international_migration_df.dtypes

National Rank                                        object
State                                                object
Total net international migration (2020-2024)[1]     object
international migration rate                        float64
dtype: object

In [28]:
net_international_migration_df = net_international_migration_df[['State','international migration rate']]
net_international_migration_df

|   | State | international migration rate |
|---|---|---|
| 32 | Alabama | 7.73 |
| 44 | Alaska | 15.24 |
| 13 | Arizona | 22.2 |
| 40 | Arkansas | 6.22 |
| 1 | California | 23.62 |
| 18 | Colorado | 14.38 |
| 16 | Connecticut | 26.38 |
| 41 | Delaware | 17.93 |
| 0 | Florida | 49.18 |
| 10 | Georgia | 15.92 |


# Net combined migration

#### The net combined migration table is the third table on the page, so it is also selected by its index (2) in tables.

In [29]:
net_combined_migration_list = []
net_combined_migration_table = tables[2].tbody  # Select the third table (index 2)
net_combined_migration_table

<tbody><tr>
<th scope="col" style="width: 50px;">National rank</th>
<th scope="col" style="width: 150px;">State</th>
<th scope="col" style="width: 150px;">Total net combined migration (2020-2024)<sup class="reference" id="cite_ref-:0_1-2"><a href="#cite_note-:0-1"><span class="cite-bracket">[</span>1<span class="cite-bracket">]</span></a></sup></th>
<th scope="col" style="width: 150px;">Total net combined migration per 1,000 residents (2020-2024)
</th></tr>
<tr>
<td>1</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><span><img alt="" class="mw-file-element" data-file-height="200" data-file-width="300" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/40px-Flag_of_Florida.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Flag_of_Florida.svg/60px-Flag_of_Florida.svg.png 2x" width="23"/></span></span> </span><a href="/wiki/Florida" title="Florida">Florida</a></td>
<td>1,931,865</td>
<t

In [30]:
max_columns = max([len(row.find_all(['th', 'td'])) for row in net_combined_migration_table.find_all('tr')])

for row in net_combined_migration_table.find_all('tr'):
    cells = row.find_all(['th', 'td'])
    row_data = [cell.text.strip() for cell in cells]  
    
    
    while len(row_data) < max_columns:
        row_data.insert(0, None)  
    
    net_combined_migration_list.append(row_data)

for row in net_combined_migration_list:
    print(row)

['National rank', 'State', 'Total net combined migration (2020-2024)[1]', 'Total net combined migration per 1,000 residents (2020-2024)']
['1', 'Florida', '1,931,865', '89.69']
['2', 'Texas', '1,568,491', '53.81']
['3', 'North Carolina', '573,272', '54.90']
['4', 'Arizona', '411,586', '57.50']
['5', 'Georgia', '376,362', '35.13']
['6', 'South Carolina', '375,644', '73.39']
['7', 'Tennessee', '325,319', '47.06']
['8', 'Washington', '185,134', '24.02']
['9', 'Alabama', '157,982', '31.44']
['10', 'Nevada', '152,874', '49.23']
['11', 'Pennsylvania', '149,870', '11.53']
['12', 'Idaho', '142,379', '77.42']
['13', 'New Jersey', '134,979', '14.53']
['14', 'Oklahoma', '134,029', '33.85']
['15', 'Utah', '129,795', '39.67']
['16', 'Ohio', '126,256', '10.70']
['17', 'Virginia', '124,316', '14.40']
['18', 'Indiana', '118,821', '17.51']
['19', 'Colorado', '114,234', '19.78']
['20', 'Missouri', '101,152', '16.43']
['21', 'Kentucky', '99,395', '22.06']
['22', 'Michigan', '96,680', '9.59']
['23', 'Mass

In [31]:
unsorted_net_combined_migration_df = pd.DataFrame(net_combined_migration_list)
unsorted_net_combined_migration_df

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | National rank | State | Total net combined migration (2020-2024)[1] | Total net combined migration per 1,000 residen... |
| 1 | 1 | Florida | 1931865 | 89.69 |
| 2 | 2 | Texas | 1568491 | 53.81 |
| 3 | 3 | North Carolina | 573272 | 54.90 |
| 4 | 4 | Arizona | 411586 | 57.50 |
| 5 | 5 | Georgia | 376362 | 35.13 |
| 6 | 6 | South Carolina | 375644 | 73.39 |
| 7 | 7 | Tennessee | 325319 | 47.06 |
| 8 | 8 | Washington | 185134 | 24.02 |
| 9 | 9 | Alabama | 157982 | 31.44 |


In [32]:
net_combined_migration_df = pd.DataFrame(net_combined_migration_list[1:], columns=net_combined_migration_list[0]) # _list[0] is the header row

net_combined_migration_df = net_combined_migration_df.sort_values(by='State', ascending=True)

net_combined_migration_df

|   | National rank | State | Total net combined migration (2020-2024)[1] | Total net combined migration per 1,000 residents (2020-2024) |
|---|---|---|---|---|
| 8 | 9 | Alabama | 157982 | 31.44 |
| 45 | 45 | Alaska | -8388 | -11.44 |
| 3 | 4 | Arizona | 411586 | 57.5 |
| 23 | 24 | Arkansas | 87377 | 29.01 |
| 50 | 50 | California | -530886 | -13.42 |
| 18 | 19 | Colorado | 114234 | 19.78 |
| 24 | 25 | Connecticut | 70954 | 19.67 |
| 26 | 27 | Delaware | 64105 | 64.76 |
| 43 | -- | District of Columbia | 5309 | 7.7 |
| 0 | 1 | Florida | 1931865 | 89.69 |


In [33]:
net_combined_migration_df = net_combined_migration_df.rename(columns={'Total net combined migration per 1,000 residents (2020-2024)': 'combined migration rate'})
print(net_combined_migration_df)
net_combined_migration_df.shape[0]

   National rank                 State  \
8              9               Alabama   
45            45                Alaska   
3              4               Arizona   
23            24              Arkansas   
50            50            California   
18            19              Colorado   
24            25           Connecticut   
26            27              Delaware   
43            --  District of Columbia   
0              1               Florida   
4              5               Georgia   
46            46                Hawaii   
11            12                 Idaho   
48            48              Illinois   
17            18               Indiana   
30            31                  Iowa   
38            39                Kansas   
20            21              Kentucky   
47            47             Louisiana   
27            28                 Maine   
32            33              Maryland   
22            23         Massachusetts   
21            22              Mich

51

In [34]:
net_combined_migration_df = net_combined_migration_df[~net_combined_migration_df['State'].isin(["District of Columbia", "United States"])]
print(net_combined_migration_df)
net_combined_migration_df.shape[0]

   National rank           State Total net combined migration (2020-2024)[1]  \
8              9         Alabama                                     157,982   
45            45          Alaska                                      -8,388   
3              4         Arizona                                     411,586   
23            24        Arkansas                                      87,377   
50            50      California                                    -530,886   
18            19        Colorado                                     114,234   
24            25     Connecticut                                      70,954   
26            27        Delaware                                      64,105   
0              1         Florida                                   1,931,865   
4              5         Georgia                                     376,362   
46            46          Hawaii                                     -20,019   
11            12           Idaho        

50

In [35]:
net_combined_migration_df.dtypes

National rank                                  object
State                                          object
Total net combined migration (2020-2024)[1]    object
combined migration rate                        object
dtype: object

In [36]:
print(net_combined_migration_df['combined migration rate'].unique())  


['31.44' '-11.44' '57.50' '29.01' '-13.42' '19.78' '19.67' '64.76' '89.69'
 '35.13' '-13.76' '77.42' '-10.87' '17.51' '12.63' '6.20' '22.06' '-15.11'
 '46.17' '5.46' '13.13' '9.59' '5.81' '-0.74' '16.43' '51.76' '9.98'
 '49.23' '29.12' '14.53' '10.46' '-22.12' '54.90' '8.62' '10.70' '33.85'
 '12.09' '11.53' '16.62' '73.39' '31.62' '47.06' '53.81' '39.67' '18.97'
 '14.40' '24.02' '11.27' '17.91']


In [37]:
net_combined_migration_df = net_combined_migration_df.copy()
net_combined_migration_df['combined migration rate'] = net_combined_migration_df['combined migration rate'].str.replace(',', '').str.strip()
net_combined_migration_df

|   | National rank | State | Total net combined migration (2020-2024)[1] | combined migration rate |
|---|---|---|---|---|
| 8 | 9 | Alabama | 157982 | 31.44 |
| 45 | 45 | Alaska | -8388 | -11.44 |
| 3 | 4 | Arizona | 411586 | 57.5 |
| 23 | 24 | Arkansas | 87377 | 29.01 |
| 50 | 50 | California | -530886 | -13.42 |
| 18 | 19 | Colorado | 114234 | 19.78 |
| 24 | 25 | Connecticut | 70954 | 19.67 |
| 26 | 27 | Delaware | 64105 | 64.76 |
| 0 | 1 | Florida | 1931865 | 89.69 |
| 4 | 5 | Georgia | 376362 | 35.13 |


In [38]:
net_combined_migration_df['combined migration rate'] = pd.to_numeric(net_combined_migration_df['combined migration rate'], errors='coerce')

In [39]:
net_combined_migration_df.dtypes

National rank                                   object
State                                           object
Total net combined migration (2020-2024)[1]     object
combined migration rate                        float64
dtype: object

In [40]:
net_combined_migration_df = net_combined_migration_df[['State','combined migration rate']]
net_combined_migration_df

|   | State | combined migration rate |
|---|---|---|
| 8 | Alabama | 31.44 |
| 45 | Alaska | -11.44 |
| 3 | Arizona | 57.5 |
| 23 | Arkansas | 29.01 |
| 50 | California | -13.42 |
| 18 | Colorado | 19.78 |
| 24 | Connecticut | 19.67 |
| 26 | Delaware | 64.76 |
| 0 | Florida | 89.69 |
| 4 | Georgia | 35.13 |
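The introduction describes the combined table as combining domestic and international migration, so for each state the combined rate would be expected to roughly equal the sum of the other two rates, with small differences from rounding (Florida: 40.52 + 49.18 = 89.70 versus 89.69). This hedged sanity check uses values copied from this notebook's output; the 0.05 tolerance is an arbitrary choice, and the expectation itself is an assumption rather than something the page states.

```python
import pandas as pd

# Rates copied from the outputs above for a few states
domestic_rates = pd.DataFrame({'State': ['Alabama', 'Alaska', 'Florida'],
                               'domestic migration rate': [23.71, -26.68, 40.52]})
international_rates = pd.DataFrame({'State': ['Alabama', 'Alaska', 'Florida'],
                                    'international migration rate': [7.73, 15.24, 49.18]})
combined_rates = pd.DataFrame({'State': ['Alabama', 'Alaska', 'Florida'],
                               'combined migration rate': [31.44, -11.44, 89.69]})

check = domestic_rates.merge(international_rates, on='State').merge(combined_rates, on='State')
check['difference'] = (check['domestic migration rate']
                       + check['international migration rate']
                       - check['combined migration rate'])

# Flag states where the parts do not add up to the combined rate within the tolerance
tolerance = 0.05
mismatches = check[check['difference'].abs() > tolerance]
print(check.round(2))
print(mismatches['State'].tolist())  # []
```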


In [41]:
net_domestic_migration_df.to_csv('net_domestic_migration.csv', index=False)
net_international_migration_df.to_csv('net_international_migration.csv', index=False)
net_combined_migration_df.to_csv('net_combined_migration_df.csv', index=False)
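The purpose statement says these CSV files will later be merged. A hedged sketch of that step: read the three files back and join them on State. To stay self-contained, the example first writes two-row versions of the files to a temporary directory; the file names match those used in the cell above, and the values are taken from this notebook's output.

```python
import os
import tempfile

import pandas as pd

file_names = ['net_domestic_migration.csv',
              'net_international_migration.csv',
              'net_combined_migration_df.csv']
sample_frames = [
    pd.DataFrame({'State': ['Alabama', 'Alaska'], 'domestic migration rate': [23.71, -26.68]}),
    pd.DataFrame({'State': ['Alabama', 'Alaska'], 'international migration rate': [7.73, 15.24]}),
    pd.DataFrame({'State': ['Alabama', 'Alaska'], 'combined migration rate': [31.44, -11.44]}),
]

with tempfile.TemporaryDirectory() as folder:
    paths = [os.path.join(folder, name) for name in file_names]
    for frame, path in zip(sample_frames, paths):
        frame.to_csv(path, index=False)

    # Read each file back and merge on State; validate guards against duplicate states
    migration = pd.read_csv(paths[0])
    for path in paths[1:]:
        migration = migration.merge(pd.read_csv(path), on='State', validate='one_to_one')

print(migration)
```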

### References
#### Wikipedia contributors. (n.d.). List of U.S. states and territories by net migration. Wikipedia, The Free Encyclopedia. Retrieved April 8, 2025, from https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_net_migration