# Structured Data Download: BLS Example - Multiple States

#### Now, what if you had a list of states and wanted the unemployment statistics for all of them?

#### First, as before, we will need to import the urllib library.

In [1]:
import urllib.request

#### Let's say that you need the states Indiana, Illinois, and California.  We will also need the codes for those states.

#### In the Files_Directory/BLS folder, there is a file called states.csv.  This is a csv file, which means that it is comma separated (columns are separated by commas); here each line holds a single entry, the state's code followed by a period and the state's name.  It lists the three states that we want to use with their associated codes.  Let's read the file so that we can build the states_for_url list:

In [2]:
# Read the whole file as one string; the with statement closes the file for us.
with open('Files_Directory/BLS/states.csv', 'r') as states_file:
    Input_File = states_file.read()

In [3]:
states_for_url = Input_File.splitlines(1)
states_for_url

['21.Indiana\n', '20.Illinois\n', '11.California']

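#### We don't need the \n.  The argument to splitlines controls whether line endings are kept: 1 (True) keeps them and 0 (False) drops them, so we pass 0 instead: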

In [8]:
states_for_url = Input_File.splitlines(0)
states_for_url

['21.Indiana', '20.Illinois', '11.California']
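To see the effect of the `splitlines` argument in isolation, here is a minimal sketch that uses a hard-coded string in place of the file contents. The name `sample_text` and its value are assumptions for illustration; they mirror the output shown above rather than the actual file.

```python
# Hypothetical stand-in for the text read from the states file.
sample_text = "21.Indiana\n20.Illinois\n11.California"

# splitlines(True) is what splitlines(1) means: keep the line endings.
with_newlines = sample_text.splitlines(True)

# splitlines() with no argument behaves like splitlines(0): drop the line endings.
without_newlines = sample_text.splitlines()

print(with_newlines)
print(without_newlines)
```

Since dropping the line endings is the default, calling `splitlines()` with no argument would give the same list as `splitlines(0)`.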

#### Now, from the previous example, we know what the urls should look like:

#### https://download.bls.gov/pub/time.series/la/la.data.21.Indiana

#### https://download.bls.gov/pub/time.series/la/la.data.11.California

#### https://download.bls.gov/pub/time.series/la/la.data.20.Illinois

#### So, we need to create three separate strings, because we will need to use the urllib.request library three times to open three different pages.  Following our previous example, we first need all the urls from which we are going to get our files.  We will use a loop for this.  Since a 'for' loop over a list automatically iterates through all the elements in the list, we do not have to specify a range for the 'for' loop.  'state' is simply a variable that is assigned the element of the list for the current iteration.  Let's look at the following 'for' loop:

In [9]:
urls = []
for state in states_for_url:
    urls.append('https://download.bls.gov/pub/time.series/la/la.data.' + state)
urls

['https://download.bls.gov/pub/time.series/la/la.data.21.Indiana',
 'https://download.bls.gov/pub/time.series/la/la.data.20.Illinois',
 'https://download.bls.gov/pub/time.series/la/la.data.11.California']
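The same list can also be built in a single expression. The sketch below is one possible alternative to the loop above, not a replacement for it; the constant name `BASE_URL` is an assumption introduced here, and the list of states is hard-coded so the snippet runs on its own.

```python
# Common prefix of the urls shown above (BASE_URL is a name chosen for this sketch).
BASE_URL = 'https://download.bls.gov/pub/time.series/la/la.data.'

# Hard-coded here instead of being read from the file.
states_for_url = ['21.Indiana', '20.Illinois', '11.California']

# A list comprehension appends each state entry to the prefix.
urls = [BASE_URL + state for state in states_for_url]

for url in urls:
    print(url)
```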

#### Now we need to go to each url in the urls list, read the file, and then write its contents to a file in our directory.  For this, we will use the same procedure we used in the previous example, except this time we loop through the urls list.

#### Note that we want each output file to be named after the state.  However, each string in the list of states has the code followed by a period followed by the state name.  Let's create a list of state names that has neither the code nor the period.  Since every code here is two characters long, the name starts at position 3:

In [10]:
state_names = []
for state in states_for_url:
    state_names.append(state[3:])
print(state_names)

['Indiana', 'Illinois', 'California']
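Slicing with `[3:]` relies on every code being exactly two characters long. As a hedged alternative, the sketch below splits each entry at its first period instead, so it would also work if a code had a different length. The list is hard-coded so the snippet runs on its own.

```python
# Hard-coded here instead of being read from the file.
states_for_url = ['21.Indiana', '20.Illinois', '11.California']

state_names = []
for state in states_for_url:
    # partition('.') splits at the first period into (code, '.', name).
    code, _, name = state.partition('.')
    state_names.append(name)

print(state_names)
```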


#### In this case, we cannot simply have the 'for' loop give us the element corresponding to the current iteration.  We need not only the url, but also the matching entry in state_names, so we need the position (index) of the current element.  To accomplish this, we will introduce a variable called 'counter', which is just the count of the current iteration.  This means that in the first iteration of the 'for' loop it is 0, in the second it is 1, and so forth.

In [11]:
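# enumerate(urls) or zip(urls, state_names) would also work; an explicit counter is used to show indexing.
for counter in range(len(urls)):
    print(state_names[counter])
    # Download the page and decode the bytes into text.
    html = urllib.request.urlopen(urls[counter]).read().decode('utf-8')
    # The with statement closes the output file automatically.
    with open('Files_Directory/BLS/' + state_names[counter] + '.txt', 'w') as Output_File:
        Output_File.write(html)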

Indiana
Illinois
California
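The loop above can also be written without a counter by pairing the two lists with `zip`. The following is a minimal sketch under stated assumptions: the function names `fetch_text` and `save_states`, and the `fetch` parameter that lets the download step be swapped out, are introduced here for illustration and are not part of the original example.

```python
import os
import urllib.request


def fetch_text(url):
    """Download a url and return its body decoded as UTF-8 text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8')


def save_states(urls, state_names, out_dir, fetch=fetch_text):
    """Write the text behind each url to <out_dir>/<state name>.txt."""
    saved = []
    # zip pairs each url with the state name at the same position.
    for url, name in zip(urls, state_names):
        path = os.path.join(out_dir, name + '.txt')
        with open(path, 'w') as output_file:
            output_file.write(fetch(url))
        saved.append(path)
    return saved
```

With the lists built earlier, a call such as `save_states(urls, state_names, 'Files_Directory/BLS')` would write one file per state, assuming the folder exists and the urls are reachable.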
