# Combining U.S. Census Bureau's GBF/DIME 1980 text files and converting to CSV

The following description comes from the [Summary on ICPSR's study homepage](https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/8378/summary):
> A Geographic Base File is a map in machine-readable form, and Dual Independent Map Encoding is the method used for this collection to encode map features in data files. These files have been created for most metropolitan areas. GBF/DIME does not contain individual house addresses, names, or other means of identifying individuals, and it does not contain statistical information. This collection provides a means to structure, compare, and display data, and relate this information to small geographic areas. ICPSR also has the Special Program Information Tape (SPIT) produced by the Census Bureau (See ICPSR 8372), which contains several computer programs designed for use with the GBF/DIME files.

## Access the ASCII files
The Geographic Base File/Dual Independent Map Encoding (GBF/DIME), 1980 (ICPSR 8378) can be accessed here: https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/8378/datadocumentation#

Downloading all of the ASCII data files at once from https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/8378/versions/V1/download/ascii?path=/pcms/studies/0/0/8/3/08378/V1 produces a single ZIP file. Extracting the ZIP file yields a nested directory structure containing 324 dataset-specific ASCII files, each in its own directory.
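As a quick check after extracting the ZIP file, the data files can be located and counted. The following is a minimal sketch and not part of the original workflow: the function name `find_data_files` and the root path are hypothetical, and the `DS*/*-Data.txt` pattern assumes the directory layout used later in this document (for example `DS0001/08378-0001-Data.txt`).

```python
from pathlib import Path


def find_data_files(root):
    """Return the sorted paths of all '*-Data.txt' files in DS* subdirectories of root."""
    return sorted(Path(root).glob('DS*/*-Data.txt'))


if __name__ == '__main__':
    # '/ICPSR_08378' is an assumed extraction location; change as needed.
    files = find_data_files('/ICPSR_08378')
    # The description above leads us to expect 324 files.
    print(len(files), 'data files found')
```

If the count differs from 324, the extraction path or the pattern probably needs adjusting.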


## Combine the ASCII files
The first section of code reads each ASCII file from its own directory and appends it to a single tall file (here named '08378-all.txt'). The code accounts for the idiosyncrasies of the data files' naming scheme: the files are mostly numbered consecutively, but a few numbers are skipped, so the loop has to step over the missing numbers.

In [4]:
#The following code reads the text files, each stored in its own directory, and writes them to a single tall output file.


#importing os to build file paths
import os

#root directory of the extracted ZIP file (change directory reference as needed)
basedir = '/ICPSR_08378'

#opening the output file in write mode so that rerunning the code does not duplicate records
with open(os.path.join(basedir, '08378-all.txt'), 'w', newline='') as outfile:

    #iterating through dataset 397
    for dsnum in range(1, 398):

        #if statement needed to account for nonsequential DS numbers (189 and 279 through 350 do not exist)
        if dsnum == 189 or 279 <= dsnum <= 350:
            continue

        #creating a string version of the dataset number, zero-padded to four digits
        dsstr = str(dsnum).zfill(4)
        print(dsstr)

        #assigning the path of the input text file
        infile = os.path.join(basedir, 'DS'+dsstr, '08378-'+dsstr+'-Data.txt')

        #writing the lines of the input file to the end of the output file
        with open(infile, 'r') as fhand:
            for line in fhand:
                outfile.write(line)


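Because the conversion step slices each line at fixed character positions, it can help to confirm first that the records in the combined file all have the same width. The following is a minimal sketch of such a check; the function name `line_length_counts` is hypothetical, and the path in the usage comment assumes the combined file was written to `/ICPSR_08378`.

```python
from collections import Counter


def line_length_counts(path):
    """Count how many lines of each length (excluding the line ending) a text file contains."""
    counts = Counter()
    with open(path, 'r') as fhand:
        for line in fhand:
            # Strip only the line ending so that trailing blanks still count toward the width.
            counts[len(line.rstrip('\r\n'))] += 1
    return counts


# Example usage (assumed path; change as needed):
# print(line_length_counts('/ICPSR_08378/08378-all.txt'))
```

A single length in the result suggests that every record follows the same layout. Several lengths suggest that some records are truncated or padded differently and should be compared with the codebook before slicing.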

## Converting text to CSV
The following section of code transforms the tall ASCII file produced by the previous code into a CSV file. The column positions are taken from the codebook that is included in the download of the data.

**Note:** Some columns may have been omitted. Please refer to the codebook from ICPSR for a full listing of variables/columns/fields.

In [3]:
import csv


#Naming the text file to use for input and CSV file for output (change directory references as needed).
infile = '/ICPSR_08378/08378-all.txt'
outfile = open('/ICPSR_08378/08378-all.csv', 'w', newline='')

#Getting the output file set up and adding the header row and explanation row.
csvout = csv.writer(outfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
csvout.writerow(['STPREDIR', 'STNAME', 'STTYPE', 'STSUFDIR', 'NONSTCO', 'MAPNO1', 'MAPNO2','MAPNO3',
                 'MAPNO4','COFLAG', 'LEFTADD1', 'LEFTADD2', 'RGTADD1', 'RGTADD2', 'CENTRA1', 'CENTRA2',
                 'CENTRA3', 'CENTRA4', 'ZIPCOLEF', 'ZIPCORGT', 'SMSA', 'PLACO1', 'PLACO2', 'STCOLEFT', 
                 'CNTCOLFT', 'MCDCCDL', 'BLKLEFT', 'STCORGT', 'CNTCORGT', 'MCDCCDR', 'BLOCKRGT', 
                 'STPLACO1', 'STPLACO2', 'FROMLAT', 'FROMLONG', 'TOLAT', 'TOLONG', 'STPLA1', 'STPLA2',
                 'STPLA3', 'STPLA4', 'LEFTADD3', 'LEFTADD4', 'RGTADD3', 'RGTADD4'])
csvout.writerow(['STREET PREFIX DIRECTION', 'STREET OR NON-STREET FEATURE NAME', 'STREET TYPE', 
                 'STREET SUFFIX DIRECTION', 'NON-STREET FEATURE CODE', 'FROM MAP (BASIC NUMBER)', 
                 'FROM MAP (SUFFIX)', 'FROM MAP (BASIC NUMBER)', 'FROM MAP (SUFFIX)','CODING LIMIT FLAG',
                 'LEFT LOW ADDRESS (SEE FOOTNOTE 1)', 'LEFT HIGH ADDRESS (SEE FOOTNOTE 1)',
                 'RIGHT LOW ADDRESS (SEE FOOTNOTE 1)', 'RIGHT HIGH ADDRESS (SEE FOOTNOTE 1)',
                 'CENSUS TRACT LEFT (BASIC)', 'CENSUS TRACT LEFT (SUFFIX)', 'CENSUS TRACT RIGHT (BASIC)',
                 'CENSUS TRACT RIGHT (SUFFIX)', 'ZIP CODE LEFT', 'ZIP CODE RIGHT', 'SMSA', 
                 'PLACE CODE LEFT', 'PLACE CODE RIGHT', 'FIPS STATE CODE LEFT', 'FIPS COUNTY CODE LEFT',
                 'MINOR CIVIL DIVISION CODE/CENSUS COUNTY DIVISION CODE LEFT', 'BLOCK LEFT',
                 'FIPS STATE CODE RIGHT', 'FIPS COUNTY CODE RIGHT', 
                 'MINOR CIVIL DIVISION CODE/CENSUS COUNTY DIVISION CODE RIGHT', 'BLOCK RIGHT',
                 'FROM STATE PLANE CODE', 'TO STATE PLANE CODE', 'FROM LATITUDE (Y COORDINATE)',
                 'FROM LONGITUDE (X COORDINATE)', 'TO LATITUDE (Y COORDINATE)', 'TO LONGITUDE (X COORDINATE)',
                 'FROM STATE PLANE (Y COORDINATE)', 'FROM STATE PLANE (X COORDINATE)', 
                 'TO STATE PLANE (Y COORDINATE)', 'TO STATE PLANE (X COORDINATE)',
                 'LEFT LOW ADDRESS (SELECTED AREAS OF CHICAGO, MILWAUKEE, AND SAN JUAN ONLY.)',
                 'LEFT HIGH ADDRESS (SELECTED AREAS OF CHICAGO, MILWAUKEE, AND SAN JUAN ONLY.)',
                 'RIGHT LOW ADDRESS (SELECTED AREAS OF CHICAGO, MILWAUKEE, AND SAN JUAN ONLY.)',
                 'RIGHT HIGH ADDRESS (SELECTED AREAS OF CHICAGO, MILWAUKEE, AND SAN JUAN ONLY.)'])


#Reading in the text file and splitting the fixed-width text to columns.
with open(infile) as fhand:
    for line in fhand:
        csvout.writerow([line[:2], line[2:22], line[22:26], line[26:28], line[28], line[45:48],
                         line[48:50], line[50:53], line[53:55], line[55], line[56:62], line[62:68],
                         line[68:74], line[74:80], line[91:95], line[95:97], line[97:101],
                         line[101:103], line[103:108], line[108:113], line[113:117], line[130:134],
                         line[134:138], line[138:140], line[140:143], line[143:146], line[151:154],
                         line[157:159], line[159:162], line[162:165], line[170:173], line[176:178],
                         line[178:180], line[204:210], line[210:217], line[217:223], line[223:230],
                         line[230:237], line[237:244], line[244:251], line[251:258], line[258:268],
                         line[268:278], line[278:288],
                         #position of RGTADD4 is assumed from the three preceding 10-character fields; confirm against the codebook
                         line[288:298]])

outfile.close()

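One way to keep the header row and the slice positions from drifting apart is to store each column name together with its start and end position. The following is a minimal sketch of that design, not the method used above: `FIELDS`, `split_record`, and `convert` are hypothetical names, only the first five fields from the code above are listed, and the sample record is made up for illustration.

```python
import csv
import io

# (name, start, end) in Python slice positions; only the first few
# fields are listed here, so this is not the full record layout.
FIELDS = [
    ('STPREDIR', 0, 2),
    ('STNAME', 2, 22),
    ('STTYPE', 22, 26),
    ('STSUFDIR', 26, 28),
    ('NONSTCO', 28, 29),
]


def split_record(line, fields=FIELDS):
    """Slice one fixed-width record into a list of values, one per field."""
    return [line[start:end] for _, start, end in fields]


def convert(lines, out, fields=FIELDS):
    """Write a header row followed by one CSV row per fixed-width line."""
    writer = csv.writer(out)
    writer.writerow([name for name, _, _ in fields])
    for line in lines:
        writer.writerow(split_record(line, fields))


if __name__ == '__main__':
    # A made-up record, not taken from the GBF/DIME files.
    sample = 'N ' + 'MAIN'.ljust(20) + 'ST  ' + '  ' + ' '
    buf = io.StringIO()
    convert([sample], buf)
    print(buf.getvalue())
```

Because the header and the slices come from the same list, adding or removing a field changes both at once, and the number of header cells always matches the number of data cells.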