## Loading text files from a `zip` archive into a Pandas DataFrame and saving them as a `csv` file using a Python `list`
- 0. Download and inspect the zip file from the source using `wget.download()` method.
- 1. Extract the `Year` value from the text file name.
- 2. Iterate through the list of files in the zip archive.
- 3. Concatenate all DataFrames and create the `csv` file.
- 4. Clean up.

In [2]:
import pandas as pd
import numpy as np
import os
from glob import glob
import wget
from zipfile import ZipFile 

- 0. Download and inspect the zip file from the source using `wget.download()` method.

In [2]:
url = 'http://www.ssa.gov/OACT/babynames/names.zip'
babynames = wget.download(url)
babynames

'names.zip'

Check the contents of the zip file.

In [7]:
zip_file = ZipFile(babynames)

text_files = zip_file.infolist()

for text_file in text_files:
    print(text_file)

<ZipInfo filename='yob1880.txt' compress_type=deflate external_attr=0x20 file_size=24933 compress_size=8461>
<ZipInfo filename='yob1881.txt' compress_type=deflate external_attr=0x20 file_size=24052 compress_size=8174>
<ZipInfo filename='yob1882.txt' compress_type=deflate external_attr=0x20 file_size=26559 compress_size=8910>
<ZipInfo filename='yob1883.txt' compress_type=deflate external_attr=0x20 file_size=26002 compress_size=8681>
<ZipInfo filename='yob1884.txt' compress_type=deflate external_attr=0x20 file_size=28670 compress_size=9522>
<ZipInfo filename='yob1885.txt' compress_type=deflate external_attr=0x20 file_size=28625 compress_size=9556>
<ZipInfo filename='yob1886.txt' compress_type=deflate external_attr=0x20 file_size=29822 compress_size=9870>
<ZipInfo filename='yob1887.txt' compress_type=deflate external_attr=0x20 file_size=29531 compress_size=9750>
<ZipInfo filename='yob1888.txt' compress_type=deflate external_attr=0x20 file_size=33064 compress_size=10845>
<ZipInfo filename=
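
As a side note to the inspection step, the sketch below shows two ways of listing archive members: `namelist()` for plain names and `infolist()` for `ZipInfo` objects with sizes. It builds a tiny in-memory archive so it runs without a download. The member `readme.pdf` and the file contents are placeholders assumed for illustration, not the real SSA data.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED

# Build a tiny in-memory archive so the sketch runs without a download.
# The member names and contents below are illustrative placeholders.
buffer = io.BytesIO()
with ZipFile(buffer, "w", compression=ZIP_DEFLATED) as archive:
    archive.writestr("yob1880.txt", "Mary,F,7065\nAnna,F,2604\n")
    archive.writestr("readme.pdf", "placeholder")

with ZipFile(buffer) as archive:
    # namelist() returns plain strings, in the order the members were added.
    names = archive.namelist()
    # infolist() returns ZipInfo objects carrying metadata such as file_size.
    sizes = {info.filename: info.file_size for info in archive.infolist()}

print(names)
print(sizes)
```

`namelist()` is enough when only the file names are needed; `infolist()` is the better fit when sizes or compression details matter.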

- 1. Extract the `Year` value from the text file name.

In [12]:
zip_file.infolist()[0].filename

'yob1880.txt'

In [13]:
year_str = zip_file.infolist()[0].filename
year_str = year_str[3:7]
year = int(year_str)
year

1880
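
The slice `[3:7]` works because every text file is named `yobYYYY.txt`. A slightly more defensive variant, sketched here with a regular expression, returns `None` for names that do not follow that pattern. The helper `extract_year` and the file name `readme.pdf` are hypothetical and not part of the original notebook.

```python
import re

# Hypothetical helper: matches names shaped like 'yob1880.txt'.
YOB_PATTERN = re.compile(r"yob(\d{4})\.txt$")


def extract_year(filename):
    """Return the year as an int, or None if the name does not match."""
    match = YOB_PATTERN.search(filename)
    return int(match.group(1)) if match else None


print(extract_year("yob1880.txt"))
print(extract_year("readme.pdf"))  # placeholder name for a non-matching member
```

Plain slicing stays simpler when the archive layout is known and stable; the pattern only helps when unexpected members may appear.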

- 2. Iterate through the list of files in the zip archive.

In [15]:
zip_file = ZipFile(babynames)

columns = ['Name', 'Sex', 'Count']  # column names; the files have no header row
df_list = []                        # collects one DataFrame per year

try:
    for text_file in zip_file.infolist():
        if text_file.filename.endswith('.txt'):     # skip non-text members such as the PDF
            year = int(text_file.filename[3:7])     # 'yob1880.txt' -> 1880
            # open the member as a file-like object and close it after reading
            with zip_file.open(text_file.filename) as f:
                df = pd.read_csv(f, header=None, names=columns)
            df['Year'] = year      # add Year column
            df_list.append(df)     # add the df to df_list
except Exception as e:
    print(e)
else:
    print(df_list[0])  # print the first df in df_list

           Name Sex  Count  Year
0          Mary   F   7065  1880
1          Anna   F   2604  1880
2          Emma   F   2003  1880
3     Elizabeth   F   1939  1880
4        Minnie   F   1746  1880
...         ...  ..    ...   ...
1995     Woodie   M      5  1880
1996     Worthy   M      5  1880
1997     Wright   M      5  1880
1998       York   M      5  1880
1999  Zachariah   M      5  1880

[2000 rows x 4 columns]
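
The loop above can also be wrapped in a small function, which makes it easy to try on a toy archive. The following is a minimal sketch under that assumption: `read_yearly_frames` is a hypothetical helper, and the in-memory archive holds made-up rows and a placeholder `readme.pdf` rather than the real SSA files.

```python
import io
from zipfile import ZipFile

import pandas as pd

COLUMNS = ["Name", "Sex", "Count"]


def read_yearly_frames(archive):
    """Return one DataFrame per '.txt' member, tagged with its year."""
    frames = []
    for info in archive.infolist():
        if not info.filename.endswith(".txt"):
            continue  # skip non-text members
        with archive.open(info) as handle:
            frame = pd.read_csv(handle, header=None, names=COLUMNS)
        frame["Year"] = int(info.filename[3:7])  # 'yob1880.txt' -> 1880
        frames.append(frame)
    return frames


# Toy archive with illustrative rows, built in memory.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("yob1880.txt", "Mary,F,7065\nAnna,F,2604\n")
    archive.writestr("yob1881.txt", "Mary,F,6919\n")
    archive.writestr("readme.pdf", "placeholder")

with ZipFile(buffer) as archive:
    frames = read_yearly_frames(archive)

print(len(frames))
print(frames[0])
```

Using `with` for both the archive and each member closes the handles as soon as the data has been read.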


- 3. Concatenate all DataFrames and create the `csv` file.

In [1]:
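# Stack the yearly DataFrames into one and write it out (the output folder must exist).
os.makedirs('data/cleaned', exist_ok=True)
final_df = pd.concat(df_list, ignore_index=True)
final_df.to_csv('data/cleaned/SocialSecurityNamesAllYears.csv', index=False)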
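
To see what this step produces without touching the disk, the sketch below concatenates two toy DataFrames and writes the result to an in-memory buffer. The rows are illustrative, and `ignore_index=True` is one option assumed here so the combined frame gets a fresh 0..n-1 index instead of repeating each year's index.

```python
import io

import pandas as pd

# Two toy frames standing in for the per-year DataFrames (illustrative rows).
frames = [
    pd.DataFrame({"Name": ["Mary"], "Sex": ["F"], "Count": [7065], "Year": [1880]}),
    pd.DataFrame({"Name": ["Mary"], "Sex": ["F"], "Count": [6919], "Year": [1881]}),
]

# Stack the frames vertically and renumber the rows.
combined = pd.concat(frames, ignore_index=True)

# Write to an in-memory buffer instead of a file for demonstration.
csv_buffer = io.StringIO()
combined.to_csv(csv_buffer, index=False)
print(csv_buffer.getvalue())
```

Because `to_csv` is called with `index=False`, the index never reaches the file either way; renumbering mainly helps when the combined frame is used further in memory.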

- 4. Clean up.

In [3]:
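zip_file.close()          # release the file handle first (Windows refuses to delete an open file)
os.remove('names.zip')    # delete the downloaded archive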
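
An alternative to removing the archive by hand is to keep it in a temporary directory that is deleted automatically. This is only a sketch of that idea: the archive is created locally with a made-up row instead of being downloaded, and the variable names are assumptions.

```python
import os
import tempfile
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as workdir:
    archive_path = os.path.join(workdir, "names.zip")

    # Stand-in for the download step: write a tiny archive locally.
    with ZipFile(archive_path, "w") as archive:
        archive.writestr("yob1880.txt", "Mary,F,7065\n")

    # Read what is needed while the directory still exists.
    with ZipFile(archive_path) as archive:
        members = archive.namelist()

    existed_inside = os.path.exists(archive_path)

# Leaving the outer 'with' block removes the directory and the archive.
removed_after = not os.path.exists(archive_path)
print(members, existed_inside, removed_after)
```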